10/14/94:  Add the 1950 data to the survexp.mnwhite data set.


11/9/94:
    Small change in survdiff2.c suggested by Steve Kaluzny.  Change an
"if (a &b)" to  "if (b && a)" to avoid referencing beyond an array.

    Bug fixes to survexp.cfit.s, residuals.coxph.s and coxph.detail.s.  If
there was a death at time 0 the computation would be wrong.
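
The survdiff2.c change in this entry relies on short-circuit evaluation: with
"&&" the right operand is skipped when the left one is false, whereas "&"
evaluates both.  A minimal Python sketch of the same idea follows; the
function and variable names are illustrative and are not taken from
survdiff2.c.

```python
def next_is_death(status, i):
    """Return True if index i is inside the array and marks a death.

    The bounds test comes first.  `and` short-circuits, so status[i] is
    never evaluated when i is past the end (the role `&&` plays in C).
    """
    return i < len(status) and status[i] == 1


status = [1, 0, 1]
print(next_is_death(status, 2))  # last valid index
print(next_is_death(status, 3))  # past the end, but no IndexError
```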


11/22/94:
    Add as.matrix method to ratetable.s.

    Add the year 1990 and 2000 extrapolated data to all of the rate tables.
This is discussed fully in Biostatistics technical report #55.

    Change the imputation for 1970 Arizona non-white.  Lacking the data, and
because I needed to make an array out of it plus the white data (which has
1970), I had used the 1970 white data as an approximation.  This was a bad
idea; white and non-white are very different, and a plot that I happened to
do pointed this out.  Instead, I have just replicated the 1980 non-white
data into 1970 non-white.


11/25/94:
    Finish the code for survexp, Cox model, individual expected survival.
    Add a test case "expect3".


12/14/94:
    Make an efficiency improvement in plot.survfit and lines.survfit.  When
there is censoring and few deaths, so the plot has long horizontal segments
with many "+" signs, the prior code would pass coordinate sets like
	x= a,b,c,d,e,f,g,....
	y= l,m,m,m,m,m,n,....
to the underlying plot routines.  I now delete the redundant m's in the
middle of this sequence.  The prior behavior could muddy up the look of
dashed or dotted lines, besides being incredibly slow.
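
The thinning described above can be sketched as follows.  This is a Python
illustration under my own assumptions, not the S code: it keeps the first and
last point of every flat run and drops the interior ones, and it ignores the
separate question of where the "+" marks are drawn.

```python
def thin_step_points(x, y):
    """Drop interior points of each run of equal y values.

    A point is kept if it is the first or last point overall, or if its
    y value differs from either neighbour, so each flat run keeps only
    its two end points.
    """
    n = len(y)
    keep = [
        i for i in range(n)
        if i == 0 or i == n - 1 or y[i] != y[i - 1] or y[i] != y[i + 1]
    ]
    return [x[i] for i in keep], [y[i] for i in keep]


x = [1, 2, 3, 4, 5, 6, 7]
y = [1.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.6]
print(thin_step_points(x, y))
```

For the x/y pattern shown in the entry this keeps a, b, f and g, which is
enough to draw the same horizontal segment.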


12/28/94:
    Make an explicit check for the singularity flag (diagonal elements =0) in
chinv2.c.  In the old version round off error could cause these elements to
become not exactly 0, and the warning message for singularity (in the .s
routine) would not be triggered.  I noticed this in the printout of a
model with extra dummy vars that I knew was singular.  Add one more test
case to the library.
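
To illustrate why an explicit flag is safer than testing a floating point
value against 0, here is a small Python sketch of an LDL' (Cholesky-style)
decomposition that records singular pivots in a separate vector.  It is not
the code in chinv2.c; the function name, the tolerance and the return values
are assumptions made for the example.

```python
def cholesky_flags(a, tol=1e-9):
    """LDL' decomposition of a symmetric matrix given as a list of lists.

    Returns (d, l, singular).  singular[j] is True when pivot j is smaller
    than tol times the largest diagonal element, so later code can test the
    flag instead of comparing a rounded number with 0.
    """
    n = len(a)
    scale = max(abs(a[i][i]) for i in range(n)) or 1.0
    l = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    singular = [False] * n
    for j in range(n):
        l[j][j] = 1.0
        pivot = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        if pivot < tol * scale:
            singular[j] = True  # explicit flag; d[j] and the column stay 0
            continue
        d[j] = pivot
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = s / pivot
    return d, l, singular


# Second variable duplicates the first, with round-off in one element.
print(cholesky_flags([[0.1 * 3, 0.3], [0.3, 0.3]])[2])
```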


12/29/94:
    The coxph.detail routine had an incorrect call to coxph.getdata (it
referred to a non-existent arg).


12/29/94:
    Consider the following 2 models:
      fit1 _ coxph(Surv(time, status)~  x + strata(epoch), test2, weight=dummy)
      fit2 _ coxph(Surv(time, status)~  x + strata(epoch), test2, weight=epoch)
and assume that dummy and epoch are identical integer vectors.  The second
model will fail, but the first works ok.  It seems that if the weight
variable is also in a strata statement (and only then) the result of
"model.extract" has storage mode integer instead of double.  The fix was to
add as.double() to my .C calls in coxph.fit and agreg.fit.  Since I don't
see any logical use for a model like fit2 above I don't expect this bug to
affect anyone, but changed it anyway.  I have NO explanation for the S
behavior, not even a guess.


1/17/95: Fix title error in survfit.object.d, as pointed out by S Kaluzny.

1/23/95: The year 1990 and 2000 extrapolated data was wrong for the
usr and azr tables (I confused row major & column major order).  We now agree
with Jan Offord's SAS implementation.

1/24/95:  In ratetable.s, add an error message for invalid subscripts (it's
easy to type 'survexp.us[0,,]' thinking that this will list data for age 0).
Add a print method, mainly to stop the annoying listing of attributes that
trails the table.

1/27/95:  Minor bug in survexp; if y was a Surv object it would choke with
a spurious error message.

1/31/95:  Error in [.ratetable:  the result of 'survexp.uswhite[,,1:2]' was
not a ratetable, due to a misunderstanding of what list(...) would produce.
Also, added a check to is.ratetable to ensure that cutpoints are in
ascending order.
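
The cutpoint check can be pictured with a short Python sketch.  It assumes
that "ascending" means strictly increasing, which the entry does not state,
and the function name is made up for the example.

```python
def cutpoints_ascending(cutpoints):
    """True when every cutpoint is strictly larger than the one before it."""
    return all(a < b for a, b in zip(cutpoints, cutpoints[1:]))


print(cutpoints_ascending([1940, 1950, 1960]))
print(cutpoints_ascending([1940, 1960, 1950]))
```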

2/2/95: Found by Steve Kaluzny -- in survreg an attempted use of the
'init' argument led to a syntax error.


2/7/95:  Bug in the summary function of survexp.uswhite, found by Brian
Ripley.  (I counted the males twice, and then printed a table labeled M & F).


2/25/95: Minor problem found by Frank Harrell.  The 'dfbeta' residual for
a univariate Cox model is returned as a matrix, but all others will be
a vector.  The 'dfbetas' residual fails (but it's the same oversight).

2/28/95: Fix bug found by Frank Harrell.  The summary.survfit function could
fail with a memory exception if the survival object was a matrix and an
explicit time was requested.  (The temp variable "n" should be the number
of rows of surv, not its length).

    Second bug (also Frank).  If more than one variable is going to infinity,
the warning message only mentions one of them.  Also the warning() function
complains, because it was being handed a vector of error messages.


3/14/95: Fix misc errors in survexp.  The main one: when I added individual
survival as an option for Cox models, I accidentally made it so that you
always got individual survival.  Added a set of test cases to 'expect3.s'
to avoid such in the future.

3/14/95: Another error found by Frank -- the robust=T option of coxph leads
to a syntax error.  Add a test case to doovarain.s, which also turned up
a bug in residuals (needed to use naive.var instead of 'var' for a robust
model).

---- at this point Statsci obtained a copy of my code, fixes appear in 3.2

5/23/95: Added a subscript method for survfit objects.  This makes it easy
to plot a subset of curves, when the object contains multiples.

6/8/95: Change the standard error formula in plot.cox.zph.s, following the
published correction in the appendix of my and Pat's paper.  The increase
in width is only on the order of (d+1)/d, where d=number of deaths.

6/21/95: Fix bug in coxph.s.  If the cluster() argument was used along with
the efron approx, it returned the robust variance using a breslow
approximation instead.  The numerical effect of the error was usually very
small.

7/26/95: Fix bug in print.summary.survreg -- the 'digits' option was changed
permanently if specified.
	 Many updates to the help files: over half of them were corrections
suggested by StatSci.


7/28/95: The 'strata' value returned by coxph.detail was incorrect.  (It's
almost never useful, however).


9/1/95: Small bug in coxph.fit.s, pointed out by Dan Sargent.  If `x' was
a vector then nvar=0. (Only arose when coxph.fit was called directly, without
using coxph).

10/25/95:  Correct the robust variance when there are case weights in coxph.
It should be D'WWD, where W is the diagonal matrix of weights, and I was
returning the unweighted form D'D.
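
As a worked illustration of the two forms, the Python sketch below computes
D'WWD for a small matrix.  It treats D as an n by p matrix with one row per
observation and W as diag(w); the function name and the numbers are invented
for the example.

```python
def weighted_crossprod(d, w):
    """Return D'WWD, where d has one row per observation and W = diag(w).

    WD scales row i of D by w[i]; the result is the p by p cross product
    of the scaled rows.  With all weights equal to 1 this reduces to D'D.
    """
    p = len(d[0])
    wd = [[wi * v for v in row] for row, wi in zip(d, w)]
    return [[sum(r[j] * r[k] for r in wd) for k in range(p)] for j in range(p)]


d = [[1.0, 2.0], [3.0, 4.0]]
print(weighted_crossprod(d, [1.0, 1.0]))  # unweighted form D'D
print(weighted_crossprod(d, [2.0, 1.0]))  # weighted form D'WWD
```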

10/27/95:  Added one more check to is.ratetable.  On-the-fly interpolation
only is valid for the last dimension of a ratetable (a design constraint in
pystep.c that would be much work to loosen), so return F for any ratetable
that requests such interpolation inappropriately.  None of the rate tables in
the current distribution or examples are affected by this.

12/22/95: Fix a minor bug found by Dick Repasky; with 'interval2' type data
the Surv routine would reject an exact (uncensored) survival time as
illegal.

12/22/95: The 9/1/95 repair caused null models, Surv(time, stat) ~1, to
fail.  This must be the umpteenth time I've replaced is.null(x) with
length(x)==0 somewhere in the code so as to properly handle a numeric(0)
vector; you'd think I'd learn!
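
The same trap exists in other languages.  As a rough Python analogy (not the
S code), an empty list is not None, so a test for "missing" alone lets the
empty case through; the function name is hypothetical.

```python
def is_null_model(x):
    """True when there are no covariates, whether x is missing or empty."""
    return x is None or len(x) == 0


print([] is None)          # the "is.null" style test misses the empty case
print(is_null_model([]))   # the length test catches it
```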

12/22/95: Documentation error in survexp.d: the 'conditional' argument had
the wrong default.

12/22/95: Added a computation of the intra-class correlation coefficient to
Cox models with the cluster() argument, and to the printout.

12/27/95: Add a subscript method to cox.zph.  This makes it much easier to
plot only 1 of the curves.  Also, change the default transform to 'identity'
for counting processes since the KM gives an error message.  (I really need
to get the KM-for-counting-process functionality added.)

1/6/96:  Change the print.coxph and summary.coxph functions.  When the
coefficient vector contains NAs, then the degrees of freedom should not
count them nor should the Wald test include them.

1/7/96: Major improvement to survdiff, based on concerns of Sarah Lewington.
The code now can do stratified tests, and the output structure has more
information.  Also, we now correctly handle one special case: if one of the
groups is entirely censored before the first death, then expected=0 for that
group and the actual degrees of freedom for the test should be one less.
(This same case gives an NA coefficient in coxph).  Added 'difftest.s' to
the test suite to validate both this special case and the stratified log-rank.

2/24/96: Fix bug in coxfit2.c --- with an offset term the loglik was not
computed correctly.  (Everything else was ok-- betas, var, solution, even
the LR test).  The number was off by the constant:
	sum(status*(offset- mean(offset)))    #status=0 for alive
This only affected the standard models with Efron or Breslow approximation;
the counting process and exact likelihood code didn't have the mistake.
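
A small numerical check of that constant, in Python.  The entry does not say
what the actual mistake in coxfit2.c was, so the "buggy" variant here is only
one hypothetical way to produce a discrepancy of exactly this size: the
offset is centered and used in the risk-set sums but left out of the term
for each death.  Beta is fixed at 0 and all names are invented for the
example.

```python
import math


def breslow_loglik(time, status, offset, offset_in_numerator=True):
    """Breslow partial log-likelihood at beta = 0 with an offset term."""
    total = 0.0
    n = len(time)
    for i in range(n):
        if status[i] == 1:
            risk = sum(math.exp(offset[j]) for j in range(n) if time[j] >= time[i])
            if offset_in_numerator:
                total += offset[i]
            total -= math.log(risk)
    return total


def offset_constant(status, offset):
    """sum(status * (offset - mean(offset))), with status = 0 for alive."""
    mean = sum(offset) / len(offset)
    return sum(s * (o - mean) for s, o in zip(status, offset))


time = [1, 2, 3, 4]
status = [1, 0, 1, 1]
offset = [0.5, -0.2, 0.1, 0.9]
mean = sum(offset) / len(offset)
centered = [o - mean for o in offset]

correct = breslow_loglik(time, status, offset)
buggy = breslow_loglik(time, status, centered, offset_in_numerator=False)
print(correct - buggy, offset_constant(status, offset))
```

Centering the offset consistently leaves the partial likelihood unchanged,
which is why only the log-likelihood value, and not the coefficients or the
LR test, would be off in a case like this.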

2/25/96: Tiny change to the spacing of an error message in coxph.fit.

4/14/96: Update is.ratetable with a verbose option.  This is useful when
creating a new rate table.

6/5/96:  Changes posted to Statsci with one exception: the ICC is commented
out of coxph.  I think it needs more testing and explanation before I 
release it --- the computation is correct, but what does it really mean?
These fixes appear in version 3.4.

--------------------------------------------------------------

7/10/96: Minor fixes to format.Surv (add ... arg) and print.coxph (different
digits default), per suggestions of Steve Kaluzny.

7/29/96: Add the 'expected' option to pyears.  Also add the test file
pyear1.s, change 'summ' to 'summary' in the output structure, and update
the documentation.

8/13/96: Add the ... arg to plot.cox.zph.s, so that things like "ylim" can
be passed in.  This should have been there from the beginning.

8/30/96: Add m$singular.ok <- NULL to coxph.s (couldn't pass the singular.ok
arg to coxph).  Minor bug found by Frank Harrell.

9/17/96: Fix survfit bug found by Kenneth Hess.  If the first event
was not a death, then the modified lower limits were incorrect.  (The
message "Length of longer object is not a multiple of the length of
the shorter object..." is the clue that something didn't match).

11/18/96: Avoid a division by 0 in coxfit2.c;  wtave[i] was never used for
observations i where denom=0 (no deaths), but in creating it we would divide
by zero for such rows; this causes a floating point problem on some
machines.  Pointed out by Thomas Lumley.

2/8/97: Repair minor bug found by Doug McManus.  If x was a factor, then
survfit(Surv(time, status) ~x) would always draw the curves in alphabetical
order, even if that was not the order of the levels.  (The curves were
numerically correct, however).  Now they are drawn in the proper order.

2/18/97: Bug in Surv -- if (start, stop, event] data was all deaths (no
censoring), then it was treated as all censored.

3/18/97: Add the robust score test (robust log-rank) to coxph.  An example
due to Micheal Riggs showed me that the Wald test is not always reliable
in the correlated data case.  (He added a variable, and the Wald test
got smaller by 1; even a random variable should increase the test).

3/25/97: Fix both the robust score test and the Wald test.  They were
always a test of beta=0, rather than beta=initial as they should be.
The Wald test is now part of the output structure rather than being computed
by summary.coxph.

3/26/97: Change the 'rowsum' function (called by residuals.coxph).  Changes
in the standard tapply() method now make it faster than the .C routine
rowsum.c; slowness of tapply was the original reason for writing rowsum.
Remove the .C call and replace it with tapply.  Remove rowsum.c from the
library.
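
For readers who have not met rowsum, the operation is a sum of matrix rows
within groups.  A minimal Python sketch of that operation (an illustration
only, not the tapply-based S replacement; the dictionary return type is my
choice):

```python
def rowsum(rows, group):
    """Sum the rows of a matrix within each group label.

    Returns a dict from group label to the summed row, with groups in
    order of first appearance.
    """
    totals = {}
    for row, g in zip(rows, group):
        if g not in totals:
            totals[g] = [0.0] * len(row)
        totals[g] = [t + v for t, v in zip(totals[g], row)]
    return totals


print(rowsum([[1, 2], [3, 4], [5, 6]], ["a", "b", "a"]))
```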

3/27/97: New feature for survexp: the ratetable() construct:
	     survexp(time ~ group + ratetable(age=age, year=year, sex=sex)...
is no longer necessary in a case like the above, i.e., when the current
data set and the rates data set have the same variable names then the
routine infers the ratetable(..) expression.  

3/28/97: Fix bug in survobrien: if the very last death had only 1 subject
at risk, then a row of all zeros was output.
	Two minor changes suggested by Statsci: limits in print.survfit are
labeled as "LCL" and "UCL" instead of "CI"; fix spelling mistake in 
lines.survfit.

5/2/97: Fix bug in agsurv2.c, found by Nino Kuenzli.  If there were
multiple strata, the confidence bands for the latter ones were systematically
too large.  (Forgot to zero a variable).  Affects the results of 
survfit(coxph(....)).

5/8/97: Bug in residuals.coxph, 'collapse' did not work with deviance
residuals.  Pointed out by a user via Statsci support.  One line fix.
	
6/13/97: The residuals of a Cox model with only a strata statement were wrong.
(Forgot to zero a variable).  Added a test for this, using both
the usual and (start, stop] models.

8/3/97: Add a new feature to plot.survfit and lines.survfit.  The "ptype"
option allows uphill (% dead to date) and cum-hazard plots, as well as
the default.

10/16/97: Small change to 'survexp', adding to the 3/27 change.  Now we
can skip the ratetable() arg for a Cox model as well.

10/22/97: Replace solve() with use of the svd() in coxph.  Some ill-conditioned
X matrices would succeed in the C routine (based on cholesky) and then
fail at the solve statement with "not full rank" errors.  
  Add the first version of code to support penalized Cox models.

11/1/97: Major changes/improvements to the plot.survfit and lines.survfit
functions.
    1. Remove the 8/3 change, and make it more general.  There is now
an arbitrary 'fun' argument that can be any user-defined function.  
    2. Add firstx, firsty option.  The default values of (0,1) give the
old behavior, adding (0,1) to the start of every curve.    
    3. Add xmax arg.  This allows the curves to be restricted to say
0-2 years, without all the "warning, point out of bounds" messages.
    4. Re-introduce explicit step functions.  The type='s' arg to lines
does not always give what we want, in particular the horizontal segment
BEFORE a curve drops off the plot (out of range) is not shown.  We now do
the computation internally that is found in the deprecated function "stepfun".
    5. Add conf.int arg to lines.survfit, and allow "only" as an option.
    6. Explicitly document the "Survival" style x-axis computation.
    7. Allow users to type log='y' as well as log=T.  This allows them
to do log='xy' for instance, to get a logarithmic x-axis.
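
Items 3 and 4 above can be pictured with a short Python sketch: clip the
curve at xmax while keeping the final horizontal piece, then expand the
points into explicit step vertices.  This is an illustration under my own
assumptions (non-empty input, x sorted in increasing order, hypothetical
function names), not the S implementation.

```python
def clip_at_xmax(x, y, xmax):
    """Keep points with x <= xmax and extend the last kept level to xmax."""
    kept = [(a, b) for a, b in zip(x, y) if a <= xmax]
    if kept and kept[-1][0] < xmax and len(kept) < len(x):
        kept.append((xmax, kept[-1][1]))
    return [a for a, _ in kept], [b for _, b in kept]


def step_coords(x, y):
    """Expand (x, y) into the vertices of a step function.

    Each step runs horizontally from (x[i-1], y[i-1]) to (x[i], y[i-1])
    and then vertically to (x[i], y[i]).
    """
    xs, ys = [x[0]], [y[0]]
    for i in range(1, len(x)):
        xs += [x[i], x[i]]
        ys += [y[i - 1], y[i]]
    return xs, ys


cx, cy = clip_at_xmax([0, 2, 5], [1.0, 0.8, 0.5], xmax=3)
print(step_coords(cx, cy))
```

In the example the curve would leave the plot at x=5, and the segment from
x=2 to x=3 at height 0.8 is still drawn.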

11/18/97: Changes to person-years:
   Match.ratetable didn't catch a particular type of invalid data, a 0
value for "sex" in a ratetable (has to be 1:2 or "Male"/"Female"), which
would lead to strange output from the C routine.  Now gives an error message.
   Pyears would fail if the output was a single number, trying to add a
dimnames to it.

12/29/97: Changes to print functions:
   Per a request from StatSci, change the "formal" argument of all of 
them to "x", which matches the formal argument in the generic function.
The old way worked, but would generate an extra frame in the calling
tree.

1/13/98: Enhance the 'survobrien' function: variables protected by I() will
not be replaced by logit-ranks.  Also fix a bug, where if only 1 variable
was replaced by its ranks, then its name was changed to "xx" in the output
data set.

3/1/98: Major oversight in predict.coxph.  If "newdata" is specified, the
code should not try to 're-insert' missing values as it would with the
original data.  

3/21/98: Tune up plot.survfit and lines.survfit.  Beth A had found some
combinations of 'fun' and 'firstx' that did not result in the correct
limits for the plot.  Lines.survfit did not get colors correct for multiple
expected curves.  Lines.survfit now returns the last point on each
curve, like plot.survfit does.  Updated the .d file, including Statsci
edits.  Improved the example for lines, using the PBC data.

7/20/98: Remove the restriction that survival times must be >=0.  The only
real worry was several routines that had "cbind(-1, time)" in order to
turn ordinary survival into an Andersen-Gill form; they just needed to
bind on something smaller than any current data value.
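
A sketch of the idea in Python: instead of a fixed start of -1, use a start
value below every observed time.  Taking min(time) - 1 is my choice for the
example; the entry only says the value must be smaller than any current data
value.

```python
def to_counting_process(time, status):
    """Return (start, stop, event) rows with one shared start value that
    is strictly below every observed time."""
    start = min(time) - 1
    return [(start, t, s) for t, s in zip(time, status)]


# With a time of -3, a fixed start of -1 would give start > stop.
print(to_counting_process([-3, 0, 2], [1, 0, 1]))
```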

7/22/98: Make 2 fixes to the survreg function, one a blunder and the other
an enhancement.  If the minimizer ever picked a far too small value for the
scale parameter, then the loglik would involve the value of a huge
scaled deviate, e.g. pnorm(-85), leading to lots of log(0) messages.
The enhancement catches these; the bug sometimes caused the situation to
arise when computing an initial guess for beta.  We now correctly solve
all but one of the "survreg fails" examples sent to me over time.  For that
one my ridge-stabilized NR just isn't good enough and gets lost.
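
The underflow is easy to reproduce.  The Python sketch below shows the normal
CDF at -85 evaluating to exactly 0 in double precision, plus one possible
guard.  How survreg actually catches the case is not described in the entry;
returning -Inf is an assumption made for the example.

```python
import math


def pnorm(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def safe_log(p):
    """log(p), returning -inf instead of failing when p has underflowed to 0."""
    return math.log(p) if p > 0.0 else float("-inf")


print(pnorm(-85))            # underflows to 0.0
print(safe_log(pnorm(-85)))  # -inf rather than a log(0) error
```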

9/25/98: Bug in survfit(cox model, new data, individual=T) -- there was
an implicit assumption that the input data, used to fit the original Cox
model, was sorted.  Found by Pat Grambsch.
    Repair a typo in lines.survfit, that caused it to squawk in a multi-strata
multi-line-newdata Cox model case.

10/27/98: Changes to coxph and survreg to allow the tolerance of the underlying
Cholesky routines as a parameter; before it was a constant in cholesky2.c

10/28/98: Add code for the random effects Cox model (major effort), and
for smoothing splines.

11/3/98: Rate tables: The survexp.us and survexp.usr tables now go from 1940
to 1990 with real census data.  The survexp.uswhite table is removed since
it is redundant.  The extrapolated year 2000 data has not yet been recreated.
The first year of life for the US table is now broken down into 4 intervals.

A bug in the US tables by race has been eliminated: the survexp and pyears
routines assume that an interpolating dimension (calendar year), if present, 
is the last dimension of the table; this was not true for the old survexp.usr
table.

11/3/98: Update survfit.coxph and survfit.coxph.null to allow negative
survival times.  Also change to the use of "coxph.getdata" to recreate
the X matrix (consistency).

11/4/98: Changes to survfit.coxph: 1-allow the user to choose the estimator
(KM, Aalen or F-H) and the variance estimate (Greenwood, Aalen, F-H)
independently, 2- allow most of the common aliases (Aalen=Breslow=Tsiatis,
Kaplan-Meier=Kalbfleisch-Prentice=exact, Fleming-Harrington=Efron) in
the choice, 3-ensure that the default method agrees with the original
Cox model method for ties.  The latter could have occasionally been wrong.
   Second, added another test set to exercise all of the options.
   Note that SAS phreg always uses variance=Aalen.  The baseline curve=KM 
or Aalen with the first as the default, whatever method was used for ties.

11/4/98: Discovered a bug in the residuals from a null Cox model with
strata.  (I must be blind -- it shows in the test suite ever since I
added the test last year!).  Rather than fix coxfit_null.c, just call
the usual C routine with a dummy variable.  So what if it's slower; this
case almost never happens.  Remove coxfit_null.c from the code base.

11/4/98: Remove a spurious line from the logistic distribution in 
survreg.distributions; it could cause a "log(0)" message.  This occurred
in calculating the deviance, after the fit was all done, so results aren't
affected.

11/4/98: Add summary.ratetable method, which will help the documentation
of survexp.  

12/4/98: Finish a near complete rewrite of survreg to: accommodate
penalized models (the driving reason for the work), improve iteration
method, add residuals and predict methods, allow strata, add the
'testreg' directory.  The same functions that work with coxph --
frailty, pspline, and ridge -- now work with survreg as well.  The
arguments to survreg have changed too, see the manual page.  One can
use 'survreg.old' for old-style calls.

12/21/98: Real 1990 data added to the state rate tables (survexp.mn,
survexp.fl, survexp.az), year 2000 extrapolated data added to all rate
tables.

12/22/98: Per conversations with Bill Dunlap and Tim Hesterberg (Statsci),
as.data.frame.Surv is replaced with data.frameAux.Surv in version 5.

12/22/98: A new run of the test files in version 5 release 2.
   print.summary.survreg was setting options(digits) forever
   agreg with a null model needed one more "as.double" in the C call

1/6/99: Bug pointed out by Brian Ripley: plot.survfit and lines.survfit
could fail if a curve had only 2 points.

1/10/99: In cholesky3.c, zero out the entire column for a redundant variable
rather than just the diagonal.  This is mostly cosmetic: it causes the
redundant column to appear as all zeros in the variance matrix.  Both the
prior and current result are legal generalized inverses, so Wald tests
etc. are unchanged.  (I did this while validating some code because it made
tracking the singularities easier, then decided to keep it for aesthetic
reasons).

1/14/99: Fix typo in list of default distributions:
 frailty(x, dist='t') would fail.

1/14/99: Change default from "sparse=T" to "sparse=(ngroup>5)" in the
frailty functions.  When used as "frailty(sex)" say, the user likely
does want the estimated coefficients printed.

1/14/99: Small problem in summary.survreg for a frailty model, pointed
out by Rob Vierkant.  For a frailty only model it still tried to print
out a table of confidence intervals for the coefficients -- but there are
no coefficients (and shouldn't be).

1/19/99: Enhance predict.survreg to give std.err estimates for predicted
quantiles.  Add another test case, which also ferreted out some oversights
in newdata/strata combinations.  Per comments from B Ripley, make type=
'response' the default.

1/23/99: Redo the "matrix" residuals for survreg.  They now use only S
code and the elements from survreg.distributions.  This makes it possible
for user-written densities to give residuals.  This also showed up a bug
in the old code for interval-censored, 4th column of type "matrix" (2nd deriv
of g wrt log sigma), there was a "1+" missing from the c-code.

2/2/99: Rate tables validated by a second reader -- had 2 dimensions swapped
in the "by race" ones.  Technical report describing the extrapolation
has been written.

2/6/99: Complete the changes to survreg that allow user specified
distributions.  A little more tuning of the start-up estimates has
cut the average # iterations till convergence.

2/9/99: "One last time" run of the test suites.  Post the routines to
our web server.
   Lots and lots of things are still on my to-do list, but we have to
wrap this up and get the frailty material distributed.

Changes made since posting survival5

15Feb99: The predict.coxph function would fail for type='terms' if the
model contained a cluster() statement -- it was trying to create a column
for this and of course the X matrix doesn't have one; cluster() only
affects variance estimation.  Predict now ignores "cluster" terms.

19Feb99: Added a missing line in survexp, without it the routine fails if
the ratetable() call is omitted.  Obviously, the 3/27/97 change was not
thoroughly tested!  (Actually it was, but the changes didn't get
transcribed properly into the master source code).

19Feb99: The above change is also made to pyears.  If there is a ratetable argument and
no ratetable() term in the formula, add it automatically in the same way
as in survexp.

24Feb99: Wrong "cargs" in pspline for the AIC method.  Led to a slew of
warning messages which were actually inconsequential, but annoying.

25Feb99: Added the "df.residual" component to the survreg output.  This
is needed by anova.survreg.  (The anova routine was written by Statsci,
not me, and so isn't in my test suite.  Thanks to Brian Ripley for
pointing this out).

