% Generated by roxygen2 (4.0.2): do not edit by hand
\name{src_mysql}
\alias{src_mysql}
\alias{tbl.src_mysql}
\title{Connect to MySQL/MariaDB.}
\usage{
src_mysql(dbname, host = NULL, port = 0L, user = "root", password = "",
  ...)

\method{tbl}{src_mysql}(src, from, ...)
}
\arguments{
\item{dbname}{Database name}

\item{host,port}{Host name and port number of database}

\item{user,password}{User name and password. Rather than supplying a
username and password here, it's better to save them in \code{my.cnf},
as described in \code{\link[RMySQL]{MySQL}}. In that case, supply
\code{NULL} to both \code{user} and \code{password}.}

\item{...}{for the src, other arguments passed on to the underlying
database connector, \code{dbConnect}. For the tbl, included for
compatibility with the generic, but otherwise ignored.}

\item{src}{a mysql src created with \code{src_mysql}.}

\item{from}{Either a string giving the name of a table in the database, or
\code{\link{sql}} describing a derived table or compound join.}
}
\description{
Use \code{src_mysql} to connect to an existing MySQL or MariaDB database,
and \code{tbl} to connect to tables within that database.
If you are running a local MySQL database, leave the connection parameters
set to their defaults to connect. If you're connecting to a remote database,
ask your database administrator for the values of these variables.
}
\section{Debugging}{


To see exactly what SQL is being sent to the database, use
\code{\link{show_query}} and \code{\link{explain}}.
}

\section{Grouping}{


Typically you will create a grouped data table by calling the \code{group_by}
method on a mysql tbl: this will take care of capturing
the unevaluated expressions for you.

For best performance, the database should have an index on the variables
that you are grouping by. Use \code{\link{explain}} to check that
the database is using the indexes that you expect.
}

\section{Output}{


All data manipulation on SQL tbls is lazy: the verbs do not run the query
or retrieve the data until you ask for it. Instead, they all return
a new \code{\link{tbl_sql}} object. Use \code{\link{compute}} to run the
query and save the results in a temporary table in the database, or use
\code{\link{collect}} to retrieve the results to R.

Note that \code{do} is not lazy since it must pull the data into R.
It returns a \code{\link{tbl_df}} or \code{\link{grouped_df}}, with one
column for each grouping variable, and one list column that contains the
results of the operation. \code{do} never simplifies its output.
}

\section{Query principles}{


This section attempts to lay out the principles governing the generation
of SQL queries from the manipulation verbs.  The basic principle is that
a sequence of operations should return the same value (modulo class)
regardless of where the data is stored.

\itemize{
 \item \code{arrange(arrange(df, x), y)} should be equivalent to
   \code{arrange(df, y, x)}

 \item \code{select(select(df, a:x), n:o)} should be equivalent to
   \code{select(df, n:o)}

 \item \code{mutate(mutate(df, x2 = x * 2), y2 = y * 2)} should be
    equivalent to \code{mutate(df, x2 = x * 2, y2 = y * 2)}

 \item \code{filter(filter(df, x == 1), y == 2)} should be
    equivalent to \code{filter(df, x == 1, y == 2)}

 \item \code{summarise} should return the summarised output with
   one level of grouping peeled off.
}
}
\examples{
\dontrun{
# Connection basics ---------------------------------------------------------
# To connect to a database first create a src:
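# (dbname has no default, so it must be supplied; "my_database" is a
# placeholder name)
my_db <- src_mysql(dbname = "my_database", host = "blah.com",
  user = "hadley", password = "pass")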
# Then reference a tbl within that src
my_tbl <- tbl(my_db, "my_table")
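# Alternatively, a sketch of connecting with credentials stored in my.cnf,
# as the user/password argument documentation suggests: pass NULL for both.
# "my_database" is a placeholder name, and this assumes my.cnf is set up
# as described in ?RMySQL::MySQL.
my_cnf_db <- src_mysql(dbname = "my_database", user = NULL, password = NULL)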
}

# Here we'll use the Lahman database: to create your own local copy,
# create a local database called "lahman", or tell lahman_mysql() how to
# connect to a database that you can write to

if (!has_lahman("postgres") && has_lahman("mysql")) {
# Methods -------------------------------------------------------------------
batting <- tbl(lahman_mysql(), "Batting")
dim(batting)
colnames(batting)
head(batting)

# Data manipulation verbs ---------------------------------------------------
filter(batting, yearID > 2005, G > 130)
select(batting, playerID:lgID)
arrange(batting, playerID, desc(yearID))
summarise(batting, G = mean(G), n = n())
mutate(batting, rbi2 = 1.0 * R / AB)

# Note that all operations are lazy: they don't do anything until you
# request the data, either by `print()`ing it (which shows the first ten
# rows), by looking at the `head()`, or by `collect()`ing the results locally.

system.time(recent <- filter(batting, yearID > 2010))
system.time(collect(recent))

# Group by operations -------------------------------------------------------
# To perform operations by group, create a grouped object with group_by
players <- group_by(batting, playerID)
group_size(players)

# MySQL doesn't support windowed functions, which means that only
# grouped summaries are really useful:
summarise(players, mean_g = mean(G), best_ab = max(AB))
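# One possible way to check that a grouped query uses the index you expect,
# as suggested in the Grouping section, is to pass the lazy query to
# explain(). The output depends on your MySQL server and schema.
explain(summarise(players, mean_g = mean(G)))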

# When you group by multiple levels, each summarise peels off one level
per_year <- group_by(batting, playerID, yearID)
stints <- summarise(per_year, stints = max(stint))
filter(ungroup(stints), stints > 3)
summarise(stints, max(stints))

# Joins ---------------------------------------------------------------------
player_info <- select(tbl(lahman_mysql(), "Master"), playerID,
  birthYear)
hof <- select(filter(tbl(lahman_mysql(), "HallOfFame"), inducted == "Y"),
 playerID, votedBy, category)

# Match players and their hall of fame data
inner_join(player_info, hof)
# Keep all players, match hof data where available
left_join(player_info, hof)
# Find only players in hof
semi_join(player_info, hof)
# Find players not in hof
anti_join(player_info, hof)

# Arbitrary SQL -------------------------------------------------------------
# You can also provide sql as is, using the sql function:
batting2008 <- tbl(lahman_mysql(),
  sql("SELECT * FROM Batting WHERE YearID = 2008"))
batting2008
}
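
# Query principles on a local data frame ------------------------------------
# A small sketch of some of the equivalences listed under "Query principles",
# using an in-memory data frame (toy data, made up for illustration) so that
# no database is needed. Per those principles, each pair should agree.
df <- data.frame(x = c(1, 1, 2, 2), y = c(2, 1, 2, 1))
all.equal(filter(filter(df, x == 1), y == 2), filter(df, x == 1, y == 2))
all.equal(
  mutate(mutate(df, x2 = x * 2), y2 = y * 2),
  mutate(df, x2 = x * 2, y2 = y * 2)
)
all.equal(arrange(arrange(df, x), y), arrange(df, y, x))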
}

