\name{ccems-package}
\alias{ccems-package}
\alias{ccems}
\docType{package}
\title{ Combinatorially Complex Equilibrium Model Selection }
\description{
This package performs model selection for equilibriums in general and for quasi-equilibriums of enzyme complexes in particular.
Estimates of dissociation constants K that best describe a dataset are found by 
systematically scanning through all possibilities of each K being infinite and/or plausibly equal to other K values. 
The automatically generated space of models is then fitted to data.  Automation enables searches of spaces 
too large to be specified by hand, e.g. spaces generated by combinatorially complex equilibriums. 
}

\details{
\tabular{ll}{
Package: \tab ccems\cr
Type: \tab Package\cr
Depends: \tab odesolve, snow, PolynomF\cr
Suggests: \tab nws\cr
License: \tab GPL-2\cr
LazyLoad: \tab yes\cr
LazyData: \tab yes\cr
URL: \tab http://epbi-radivot.cwru.edu/ccems\cr
}

Index:
\preformatted{
RNR                     Ribonucleotide Reductase Data
TK1                     Thymidine Kinase 1 Data
ems                     Equilibrium Model Selection
fitModel                Fit Model
mkGrids                 Make Grid Model Space
mkKd2Kj                 Make Kd2Kj Mappings
mkModel                 Make Specific Model
mkSpurs                 Make Spur Model Space
mkg                     Make Generic Model
simulateData            Simulate Data
}

This package automatically generates and fits biochemical equilibrium models using as outputs either average protein mass 
data or enzyme reaction rate data. 
It is currently limited to systems where one central hub protein mediates all of the interactions and the total 
concentrations of the reactants are known to a good approximation, e.g. as in systems that were reconstituted 
from purified reactants. 
It is limited further in that multiple sites for the same ligand must be filled in a predetermined sequence. 

An equilibrium can be specified by any acyclic spanning subgraph of its nodes, where edges are 
dissociation constants. Here, hub protein
oligomerization is viewed as a curtain rod from which threads 
of ligand bound states/complexes hang: each notch down a thread 
corresponds to one additional ligand bound to the hub j-mer where j increases as 
one moves to the right on the curtain rod. At the top of each thread is
a head-node that sits on the rod. The head nodes must be specified, as 
some j values may be absent and some ligand sites (other than the thread 
defining site) may be assumed to be saturated in some j-mers. The last node in 
each thread will be referred to as a tail node. If a ligand has more than one binding site, 
the tail of the thread of one site (other than the last one filled) is 
the head of the thread of the site filled next. 
Thus, head nodes must be stated only for the first site filled.  
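
As a minimal sketch of this rule (an illustration restricted to the monomer nodes of the third example below, 
not a model from the references), the tail R1X1 of the first-filled a-site thread serves as the head of the 
h-site thread, so only R1X0 is listed in \code{heads}. Only the list structure is shown; whether \code{mkg} 
accepts a monomer-only topology is an assumption that has not been checked here.
\preformatted{
topology <- list(
    heads=c("R1X0"),          # head stated only for the first site filled
    sites=list(
        a=list(m=c("R1X1")),  # a-site thread; its tail is R1X1
        h=list(m=c("R1X2"))   # h-site thread; its head is the a-site tail R1X1
    )
)
str(topology)                 # inspect the nested list
}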

In the examples below, E is the concentration of thymidine kinase 1 (TK1) tetramers, S is thymidine, 
t is dTTP, X is ATP and R is the large subunit of ribonucleotide reductase (RNR). The examples are
ordered by CPU consumption: the first takes ~0.5 minutes on 1 core, the second ~1.5 minutes on 2 cores, and 
the third ~2 days on 16 cores. The first fits activity data to a single thread model. It  
is the fastest example because [E] is small enough that total [S] approximates free [S], 
which allows rational polynomials to be used for the system model. 
In the second example there is only one ligand binding 
site (the s-site) and the hub protein forms at most a dimer. 
Thus, the thread topology of the acyclic graph used (to explore K equality 
hypotheses) has only two head nodes and two threads, one for the monomer and one for the dimer.  
The head node of the monomer thread is the free hub protein R1t0 and 
the head node of the dimer thread is the ligand free dimer R2t0.  
Threads contain the names of
only their non-head nodes since their heads have already been specified. 
This structure is assigned to \code{topology} which is then passed to the function \code{mkg} 
to produce a generic model object \code{g}. Together with the data, this
generic model object is then passed to the function \code{ems} (equilibrium model selection) which generates the 
model space, fits it to the data, and returns the \code{topN} (typically 5, 10 or 20) best (lowest AIC) models.  
The third example is more complicated than the second because ATP has multiple R1 binding sites and because R also tetramerizes
and hexamerizes with increases in [ATP]. This problem motivated the development of this R package. 
It is an example of a problem whose solution 
is enabled by this software because its model space is too large to specify by hand. A Linux cluster is needed to execute this example.

The user must have write privileges in the working directory so that the subdirectories
\code{models} and \code{results} can be created to hold model C code (generated 
by \code{mkg}) and html output (generated by \code{ems}), respectively.
}

\note{ This work was supported by the National Cancer Institute (K25CA104791). }

\author{ Tom Radivoyevitch (txr24@case.edu) }
\references{
Radivoyevitch, T. (2008) Equilibrium model selection: dTTP induced R1 dimerization. \emph{BMC Systems Biology} \bold{2}, 15. 

Radivoyevitch, T.  Automated model generation and analysis methods for combinatorially 
complex biochemical equilibriums. (submitted). 
}
\seealso{\code{\link{ems}},  \code{\link{mkg}} }

\keyword{package}
\examples{
## LAPTOP EXAMPLE: Top 3 models with at most three parameters for the 
##                 Berenstein et al. JBC 2000 TK1 data
library(ccems)
topology <- list(  
    heads=c("E1S0"), #one E is a tetramer
    sites=list(                    
        c=list(    # c-site = catalytic site  
            t=c("E1S1","E1S2","E1S3","E1S4")   
        )
    )
)
g <- mkg(topology,activity=TRUE,TCC=FALSE)
dd=subset(TK1,(year==2000),select=c(E,S,k)) # Berenstein et al
names(dd)[1:2]= c("ET","ST")
tops=ems(dd,g,maxTotalPs=3,kIC=4) 
plot(dd$ST,dd$k,type="p",pch=1, xlab="[dT] (uM)", log="x",ylab="k (1/sec)",
          main="Top 3 TK1 Models with 3 parameters or fewer")
lgx=log(dd$ST)
upr=range(lgx)[2]
lwr=range(lgx)[1]
del=(upr-lwr)/50
fineX=exp(seq(lwr,upr,by=del))
newPnts <- data.frame(ET = rep(dd$ET[1],length(fineX)), ST = fineX)
for (i in 1:3) {
  df <- simulateData(tops[[i]],predict=newPnts,typeYP="k")$predict  
  lines(df$ST,df$EY,type="l",lty=i) 
}

## DESKTOP EXAMPLE: This example automatically creates (and fits) the model  
## space of the BMC SB 2008 dTTP induced R1 dimerization reference above.
library(ccems)
topology <- list(  
    heads=c("R1t0","R2t0"),  
    sites=list(       
        s=list(                     # s-site    thread #
            m=c("R1t1"),        # monomer      1
            d=c("R2t1","R2t2")  # dimer        2
        )
    )
) 

g <- mkg(topology,TCC=TRUE) 
data(RNR)
d1 <- subset(RNR,(year==2001)&(fg==1)&(G==0)&(t>0),select=c(R,t,m,year))
d2 <- subset(RNR,year==2006,select=c(R,t,m,year)) 
dd <- rbind(d1,d2)
names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="")#e.g. to form "RT"
rownames(dd) <- 1:dim(dd)[1] # lose big number row names of parent dataframe
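## Illustration of the renaming idiom above with a literal string, assuming
## (not confirmed here) that g$id is "Rt" for this topology; base R only.
idChars <- strsplit("Rt", split="")[[1]]    # split the id into c("R","t")
print(paste(idChars, "T", sep=""))          # append "T" to each: "RT" "tT"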
## top10=ems(dd,g,cpusPerHost=c("localhost"=2),maxTotalPs=2,ptype="SOCK") 


## CLUSTER EXAMPLE: This ATP induced R1 hexamerization example runs 1.8 days
##                  on a 16 core (4 quad proc machines) ROCKS Linux cluster. 

library(ccems)
topology <- list(
    heads=c("R1X0","R2X2","R4X4","R6X6"), 
    sites=list(                # s-sites are already filled only in (j>1)-mers 
        a=list(  #a-site                                                    thread
            m=c("R1X1"),                                            # monomer   1
            d=c("R2X3","R2X4"),                                     # dimer     2
            t=c("R4X5","R4X6","R4X7","R4X8"),                       # tetramer  3
            h=c("R6X7","R6X8","R6X9","R6X10", "R6X11", "R6X12")     # hexamer   4
        ), # tails of a-site threads are heads of h-site threads
        h=list(   # h-site
            m=c("R1X2"),                                            # monomer   5
            d=c("R2X5", "R2X6"),                                    # dimer     6
            t=c("R4X9", "R4X10","R4X11", "R4X12"),                  # tetramer  7
            h=c("R6X13", "R6X14", "R6X15","R6X16", "R6X17", "R6X18")# hexamer   8
        )
    )
)
g=mkg(topology,TCC=TRUE) 
dd=subset(RNR,(year==2002)&(fg==1)&(X>0),select=c(R,X,m,year))
names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="")#i.e. c("RT","XT")

## 29 choose 3(2) is 3654(406), so 3654 + 406 + 29 + 1 = 4090 spurs, but after 
## subtracting those without at least one hexamer complex, and after adding 
## grids, the total number of models is 3410. Of these 3406 converged, see below. 
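## Quick check of the spur count arithmetic quoted above, using only base R.
## The value 29 is taken from the comment above; nK is a name introduced here
## for illustration. This is a sketch of the counting only: it does not call
## ccems and does not reproduce the final total of 3410 models.
nK <- 29
spurCounts <- choose(nK, 3:0)   # ways to pick 3, 2, 1 and 0 of the nK items
print(spurCounts)               # 3654 406 29 1
print(sum(spurCounts))          # 4090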
\dontrun{
cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
top10=ems(dd,g,cpusPerHost=cpusPerHost, maxTotalPs=3, ptype="SOCK",KIC=100) 

# The following are the last few lines of the output. The first line shows that a 
# one parameter model is best (shown are the best AICs of models with 0, 1, 2 or 3  
# parameters). The next shows that it took 1.8 days on 16 cpus to fit 3406 models. 
# And the block that follows shows that the top 5 models are all spur graph models.
# The html file RXglobSOCK.htm in the results directory contains this information 
# and more (e.g. parameter estimates and CI). 
#
# [1] 1000000.00000     -33.16309     -31.73658     -29.99075
#
# Time difference of 2623.881 mins
# Fitted = 3406, out of a total of  3410 
#
# ... making HTML file ... 
#  1 Model  20; nbp= 1; id=IIIIIIIIIIIJIIIIIIIIIIIIIIIII; AIC=-33.1631
#  2 Model 108; nbp= 2; id=IIIIIJIIIIIJIIIIIIIIIIIIIIIII; AIC=-31.7366
#  3 Model  21; nbp= 1; id=IIIIIIIIIIIIJIIIIIIIIIIIIIIII; AIC=-31.5144
#  4 Model 109; nbp= 2; id=IIIIIJIIIIIIJIIIIIIIIIIIIIIII; AIC=-31.4678
#  5 Model 145; nbp= 2; id=IIIIIIIIJIIIJIIIIIIIIIIIIIIII; AIC=-31.4431
}
}
