% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/vs_uchime_ref.R
\name{vs_uchime_ref}
\alias{vs_uchime_ref}
\alias{uchime_ref}
\title{Detect chimeras by comparing sequences to a reference database}
\usage{
vs_uchime_ref(
  fasta_input,
  database,
  nonchimeras = NULL,
  chimeras = NULL,
  sizein = TRUE,
  sizeout = TRUE,
  relabel = NULL,
  relabel_sha1 = FALSE,
  fasta_width = 0,
  sample = NULL,
  log_file = NULL,
  threads = 1,
  vsearch_options = NULL,
  tmpdir = NULL
)
}
\arguments{
\item{fasta_input}{(Required). A FASTA file path or a FASTA object with reads.
See \emph{Details}.}

\item{database}{(Required). A FASTA file path or FASTA tibble object
containing the reference sequences. These sequences are assumed to be
chimera-free.}

\item{nonchimeras}{(Optional). Name of the FASTA output file for the
non-chimeric sequences. If \code{NULL} (default), no output is written to
file.}

\item{chimeras}{(Optional). Name of the FASTA output file for the chimeric
sequences. If \code{NULL} (default), no output is written to file.}

\item{sizein}{(Optional). If \code{TRUE} (default), abundance annotations
present in sequence headers are taken into account.}

\item{sizeout}{(Optional). If \code{TRUE} (default), abundance annotations
are added to FASTA headers.}

\item{relabel}{(Optional). Relabel sequences using the given prefix and a
ticker to construct new headers. Defaults to \code{NULL}.}

\item{relabel_sha1}{(Optional). If \code{TRUE}, relabel sequences using the
SHA1 message digest algorithm. Defaults to \code{FALSE}.}

\item{fasta_width}{(Optional). Number of characters per line in the output
FASTA file. Defaults to \code{0}, which eliminates wrapping.}

\item{sample}{(Optional). Add the given sample identifier string to sequence
headers. For instance, if the given string is "ABC", the text ";sample=ABC"
will be added to the header. If \code{NULL} (default), no identifier is added.}

\item{log_file}{(Optional). Name of the log file to capture messages from
\code{VSEARCH}. If \code{NULL} (default), no log file is created.}

\item{threads}{(Optional). Number of computational threads to be used by
\code{VSEARCH}. Defaults to \code{1}.}

\item{vsearch_options}{(Optional). Additional arguments to pass to
\code{VSEARCH}. Defaults to \code{NULL}. See \emph{Details}.}

\item{tmpdir}{(Optional). Path to the directory where temporary files should
be written when tables are used as input or output. Defaults to
\code{NULL}, which resolves to the session-specific temporary directory
(\code{tempdir()}).}
}
\value{
A tibble or \code{NULL}.

If \code{nonchimeras} and \code{chimeras} are specified, the sequences
resulting from chimera detection are written directly to the specified files
in FASTA format, and no tibbles are returned.

If \code{nonchimeras} and \code{chimeras} are \code{NULL}, a FASTA object
containing the non-chimeric sequences is returned. It carries an attribute
\code{"chimeras"} holding a tibble of the chimeric sequences. If no chimeras
are found, the \code{"chimeras"} attribute is an empty data frame.

Additionally, the returned tibble (when applicable) has an attribute
\code{"statistics"} containing a tibble with chimera detection statistics.

The statistics tibble has the following columns:
\itemize{
  \item \code{num_nucleotides}: Total number of nucleotides used as input
  for chimera detection.
  \item \code{num_sequences}: Total number of sequences used as input for
  chimera detection.
  \item \code{min_length_input_seq}: Length of the shortest sequence used
  as input for chimera detection.
  \item \code{max_length_input_seq}: Length of the longest sequence used as
  input for chimera detection.
  \item \code{avg_length_input_seq}: Average length of the sequences used as
  input for chimera detection.
  \item \code{num_non_chimeras}: Number of non-chimeric sequences.
  \item \code{num_chimeras}: Number of chimeric sequences.
  \item \code{input}: Name of the input file/object for the chimera
  detection.
}
}
\description{
\code{vs_uchime_ref} detects chimeras in the input FASTA sequences by
comparing them to a reference database using \code{VSEARCH}'s
\code{uchime_ref} algorithm.
}
\details{
Chimeras in the input FASTA sequences are detected using \code{VSEARCH}'s
\code{uchime_ref}.

\code{fasta_input} can be either a FASTA file or a FASTA object. FASTA objects
are tibbles that contain the columns \code{Header} and \code{Sequence}; see
\code{\link[microseq]{readFasta}}.

\code{database} must be a FASTA file or a FASTA object with high-quality
non-chimeric sequences.

\code{vsearch_options} allows users to pass additional command-line arguments
to \code{VSEARCH} that are not directly supported by this function. Refer to
the \code{VSEARCH} manual for more details.
}
\examples{
\dontrun{
# Define arguments
query_file <- file.path(path.package("Rsearch"), "extdata", "small.fasta")
db <- file.path(path.package("Rsearch"), "extdata", "sintax_db.fasta")

# Detect chimeras with default parameters and write the results to FASTA files
vs_uchime_ref(fasta_input = query_file,
              database = db,
              nonchimeras = "nonchimeras.fa",
              chimeras = "chimeras.fa")

# Detect chimeras with default parameters and return a FASTA tibble
nonchimeras.tbl <- vs_uchime_ref(fasta_input = query_file,
                                 database = db,
                                 nonchimeras = NULL,
                                 chimeras = NULL)

# Get chimeras tibble
chimeras.tbl <- attr(nonchimeras.tbl, "chimeras")

# Get statistics tibble
statistics.tbl <- attr(nonchimeras.tbl, "statistics")
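
# A minimal sketch of the FASTA-object workflow described in Details, assuming
# the microseq package is installed: the reads are loaded as a tibble with
# Header and Sequence columns and passed in place of a file path.
# (query.tbl, from_tbl.tbl and from_tbl_stats.tbl are illustrative names.)
query.tbl <- microseq::readFasta(query_file)
from_tbl.tbl <- vs_uchime_ref(fasta_input = query.tbl,
                              database = db)

# Read the chimera and non-chimera counts reported in the statistics tibble
from_tbl_stats.tbl <- attr(from_tbl.tbl, "statistics")
from_tbl_stats.tbl$num_chimeras
from_tbl_stats.tbl$num_non_chimeras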
}

}
\references{
\url{https://github.com/torognes/vsearch}
}
