% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/split-LD.R
\name{snp_ldsplit}
\alias{snp_ldsplit}
\title{Independent LD blocks}
\usage{
snp_ldsplit(corr, thr_r2, min_size, max_size, max_K)
}
\arguments{
\item{corr}{Sparse correlation matrix. Usually, the output of \code{\link[=snp_cor]{snp_cor()}}.}

\item{thr_r2}{Threshold under which squared correlations are ignored.
This avoids counting noise and should give clearer patterns of cost
vs. number of blocks; as a consequence, a splitting cost of 0 is possible.
If this parameter is used, then \code{corr} can be computed using the
same threshold in \code{\link[=snp_cor]{snp_cor()}}, which increases the
sparsity of the resulting matrix.}

\item{min_size}{Minimum number of variants in each block. This is used to
avoid having a disproportionate number of small blocks.}

\item{max_size}{Maximum number of variants in each block. This is used to
avoid blocks that are too large, e.g. to limit the computational and memory
requirements of applications that use these blocks. Some long-range LD
regions may require allowing for large blocks.}

\item{max_K}{Maximum number of blocks to consider. All optimal solutions for K
from 1 to \code{max_K} will be returned. Some values of K might not have any
corresponding solution because of the constraints on block sizes. For example,
splitting 10,000 variants into blocks of at least 500 and at most 2000 variants
implies that there are at least 5 and at most 20 blocks. The choice of K then
depends on the application, but a simple solution is to choose the largest K
for which the cost is lower than some threshold.}
}
\value{
A tibble with six columns:
\itemize{
\item \verb{$n_block}: Number of blocks.
\item \verb{$cost}: The sum of squared correlations outside the blocks.
\item \verb{$perc_kept}: Percentage of initial non-zero values kept within the blocks defined.
\item \verb{$block_num}: Resulting block numbers for each variant.
\item \verb{$all_last}: Last index of each block.
\item \verb{$all_size}: Sizes of the blocks.
}
}
\description{
Split a correlation matrix into blocks that are as independent as possible.
This finds the splitting into blocks that minimizes the sum of squared
correlations between these blocks (i.e. everything outside these blocks).
}
\examples{
\dontrun{

  corr <- readRDS(url("https://www.dropbox.com/s/65u96jf7y32j2mj/spMat.rds?raw=1"))

  THR_R2 <- 0.01

  (res <- snp_ldsplit(corr, thr_r2 = THR_R2, min_size = 10, max_size = 50, max_K = 50))

  library(ggplot2)
  ggplot(res, aes(n_block, cost)) + geom_point() + theme_bw(16) + scale_y_log10()

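  ## Block boundaries of the solution in the 6th row of `res`
  ## (last index of each block, except the final one)
  all_ind <- head(res$all_last[[6]], -1)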

  ## Transform sparse representation into (i,j,x) triplets (0-based indices)
  corrT <- as(corr, "TsparseMatrix")
  ## Keep the upper triangle and pairs with r2 >= THR_R2
  upper <- (corrT@i <= corrT@j & corrT@x^2 >= THR_R2)
  df <- data.frame(
    i = corrT@i[upper] + 1L,
    j = corrT@j[upper] + 1L,
    r2 = corrT@x[upper]^2
  )
  ## Half the distance to the diagonal, used to rotate the plot by 45 degrees
  df$y <- (df$j - df$i) / 2

  ggplot(df) +
    geom_point(aes(i + y, y, color = r2), size = rel(0.5)) +
    coord_fixed() +
    scale_color_gradientn(colours = rev(colorRamps::matlab.like2(100))) +
    theme_minimal() +
    theme(axis.text.y = element_blank(), axis.ticks.y = element_blank()) +
    geom_vline(xintercept = all_ind + 0.5, linetype = 3) +
    labs(x = "Position", y = NULL)
}
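
## Toy illustration of the cost being minimized (hypothetical values, not
## produced by snp_ldsplit; assumes each pair i < j is counted once):
## the cost of a splitting is the sum of squared correlations, above the
## threshold, between variants that fall in different blocks.
toy_r <- matrix(c(1.0, 0.8,  0.3, 0.0,
                  0.8, 1.0,  0.2, 0.05,
                  0.3, 0.2,  1.0, 0.5,
                  0.0, 0.05, 0.5, 1.0), nrow = 4)
toy_r2 <- toy_r^2
toy_r2[toy_r2 < 0.01] <- 0            # ignore small r2, as with thr_r2 = 0.01
toy_last <- c(2, 4)                   # two blocks: variants 1-2 and 3-4
## block index (0-based) of each variant, from the last index of each block
toy_block <- findInterval(1:4, head(toy_last, -1) + 1)
## pairs (i < j) whose two variants are in different blocks
outside <- upper.tri(toy_r2) & outer(toy_block, toy_block, "!=")
(toy_cost <- sum(toy_r2[outside]))    # 0.3^2 + 0.2^2 = 0.13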
}
