Technical Reports


Asymptotic Tests For Location, Variance, and Equivalence of Gene Expression Using the Distance Matrix

by Douglas Hayden, Mark Kon

Questions regarding this item should be sent to Douglas Hayden

This technical report summarizes work on asymptotic tests for location, variance, and equivalence of gene expression between two groups of microarrays, developed as an alternative to permutation tests based on the pairwise distance matrix. Under additive errors, the location test is not asymptotically normal without further assumptions; the variance and equivalence tests are asymptotically normal.
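For comparison, the kind of distance-matrix permutation test that the report seeks to replace can be sketched in Python. The statistic below (mean between-group distance minus mean within-group distance) is a hypothetical choice for illustration and is not necessarily the one used in the report. The p-value uses the usual add-one correction so that it is never exactly zero.

```python
import random
from typing import Sequence, Tuple


def distance_statistic(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Hypothetical statistic: mean between-group minus mean within-group distance.

    dist is a symmetric n x n matrix of pairwise distances between arrays;
    labels holds a 0/1 group indicator for each array (each group needs >= 2).
    """
    between, within = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (between if labels[i] != labels[j] else within).append(dist[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_test(
    dist: Sequence[Sequence[float]],
    labels: Sequence[int],
    n_perm: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """One-sided permutation test: large values suggest the groups differ."""
    rng = random.Random(seed)
    observed = distance_statistic(dist, labels)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # random relabelling of the arrays under H0
        if distance_statistic(dist, perm) >= observed:
            exceed += 1
    # Add-one correction counts the observed labelling as one permutation.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [0.1, 0.3, -0.2, 0.0, 0.2, 3.1, 2.8, 3.3, 2.9, 3.0]
    labels = [0] * 5 + [1] * 5
    dist = [[abs(a - b) for b in x] for a in x]
    print(permutation_test(dist, labels))
```

Each permutation costs a full pass over the O(n²) distance matrix, which is the computational burden an asymptotic test avoids.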

Survival Analysis of Longitudinal Microarrays

by Natasa Rajicic

R code for implementing survival analysis of longitudinally collected gene expression data. Methods are described in the linked paper by Rajicic N, Finkelstein DM, and Schoenfeld DA, submitted to Bioinformatics, May 2006.

Production of Deidentified Datasets at the Conclusion of Human Subjects Trials

by A. Korpak, D. Schoenfeld; for MGH Clinical Research Day 2005

This poster, prepared for MGH Clinical Research Day 2005, discusses important considerations in preparing deidentified datasets. Most NIH-sponsored trials require that data be made available for distribution to qualified investigators. Studies involving human subjects face the challenge of preparing a dataset that is both useful and protective of patient confidentiality. This is a new area with few established guidelines; the poster examines specific issues that have arisen in preparing datasets from ARDS Network studies.

Analysis of familial aggregation studies with complex ascertainment schemes

by A. Matthews, D. Finkelstein, R. Betensky

Questions regarding this item should be sent to Dianne Finkelstein

This paper proposes an approach to adjust analyses of family studies for complex ascertainment schemes where the sampling is dependent on the disease history of the entire family. This approach extends that of Tosteson et al. 1991 to handle these types of sampling schemes.

Analysis of Co-aggregation of Cancer Based on Registry Data

by A. Matthews et al.

Questions regarding this item should be sent to Dianne Finkelstein

We conducted an exploratory analysis of co-aggregation of cancers in individuals and families using sibships from over 18,000 families who had been recruited to the registry of the NCI-sponsored multi-institutional Cancer Genetics Network. We found statistically significant familial co-aggregation of lung cancer with pancreatic (p < 0.0001), prostate (p < 0.001), and colorectal cancers (p = 0.003). In addition, we found significant familial co-aggregation of pancreatic and colorectal cancers (p = 0.022) and co-aggregation of hematopoietic and (non-ovarian) gynecologic cancers (p = 0.01).

A set of functions for running Matlab functions in parallel

micaParalize is a set of functions that lets a user easily run ordinary Matlab functions on a multiprocessor machine by dividing the workload among the available processors. It is used for simulations, bootstrap resampling, and function-maximization problems in biostatistics.

micaParalize has been superseded by biopara, which is available on the software page.
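The underlying idea, splitting independent replicates such as bootstrap draws into chunks and handing each chunk to a separate processor, can be sketched in Python with the standard-library `concurrent.futures` module. This does not reproduce the micaParalize or biopara interfaces; `bootstrap_means` and `_bootstrap_chunk` are hypothetical names chosen for illustration. Each chunk receives its own seed, so the result depends only on the number of chunks, not on which processor happens to run which chunk.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor


def _bootstrap_chunk(task):
    """Hypothetical worker: run one chunk of bootstrap replicates with its own RNG."""
    data, n_reps, seed = task
    rng = random.Random(seed)
    n = len(data)
    # Each replicate resamples the data with replacement and records its mean.
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_reps)]


def bootstrap_means(data, n_boot, n_chunks, mapper=map, base_seed=0):
    """Hypothetical driver: split n_boot replicates into n_chunks tasks.

    `mapper` may be the built-in map (serial) or an executor's map (parallel).
    """
    sizes = [n_boot // n_chunks + (1 if i < n_boot % n_chunks else 0)
             for i in range(n_chunks)]
    tasks = [(list(data), size, base_seed + i) for i, size in enumerate(sizes)]
    results = []
    for chunk in mapper(_bootstrap_chunk, tasks):
        results.extend(chunk)  # gather chunks in task order
    return results


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(10.0, 2.0) for _ in range(50)]
    with ProcessPoolExecutor() as pool:
        draws = bootstrap_means(sample, 10_000, 8, mapper=pool.map)
    print("bootstrap SE of the mean:", statistics.stdev(draws))
```

Passing the built-in `map` runs the same chunks serially and gives identical draws, which is convenient for checking results before switching to a process pool.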

One sample log-rank software

by Dianne M. Finkelstein, Alona Muzikansky, David A. Schoenfeld

1s_logrank.xls is a spreadsheet for computing the one-sample log-rank test and confidence intervals for the standardized mortality ratio (SMR), estimating survivorship in the matched standard population, and visually comparing the survivorship of the sample with that of the standard population, as described in the paper and instructions (both included in the zip file). The paper was published as a commentary in the Journal of the National Cancer Institute, Vol. 95, No. 19, Oct 1 2003, pp. 1434–1439.
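The core arithmetic behind a one-sample log-rank test and the SMR can be sketched in Python. This is a minimal illustration, not the spreadsheet's formulas: it assumes each subject's expected cumulative hazard under the standard population has already been computed (here from a hypothetical constant hazard), uses the common statistic Z = (O - E)/sqrt(E), and swaps in a log-scale normal approximation for the SMR confidence interval, which may differ from the interval the spreadsheet reports.

```python
import math
from typing import Sequence, Tuple


def one_sample_logrank(
    events: Sequence[int],
    expected_cumhaz: Sequence[float],
    z_crit: float = 1.959964,
) -> Tuple[int, float, float, float, float, Tuple[float, float]]:
    """Compare a cohort with a standard population.

    events[i] is 1 if subject i had the event during follow-up, else 0.
    expected_cumhaz[i] is the standard population's cumulative hazard over
    subject i's follow-up, i.e. the expected number of events under H0.
    Returns (O, E, Z, two-sided p, SMR, (lower, upper) CI for the SMR).
    """
    observed = sum(events)
    expected = sum(expected_cumhaz)
    if observed == 0:
        raise ValueError("the log-scale SMR interval needs at least one event")
    z = (observed - expected) / math.sqrt(expected)  # approx N(0, 1) under H0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p-value
    smr = observed / expected                         # standardized mortality ratio
    half = z_crit / math.sqrt(observed)               # assumed SE of log(SMR) = 1/sqrt(O)
    ci = (smr * math.exp(-half), smr * math.exp(half))
    return observed, expected, z, p_value, smr, ci


if __name__ == "__main__":
    # Hypothetical cohort: 20 subjects, 5 years each, standard hazard 0.02/year.
    events = [1] * 6 + [0] * 14
    cumhaz = [0.02 * 5.0] * 20
    print(one_sample_logrank(events, cumhaz))
```

In this made-up example O = 6 and E = 2.0, so the SMR is 3.0 and Z is about 2.83.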

Analysis of Failure Time Data From Screening Studies With Missing Observations

by Dianne Finkelstein

Questions regarding this item should be sent to Dianne Finkelstein

"Analysis of Failure Time Data From Screening Studies With Missing Observations" was presented by Dianne Finkelstein, Ph.D., at the 2003 Joint Statistical Meetings in San Francisco.

A program for analysis of failure time data with dependent interval censoring

by Dianne Finkelstein and David Schoenfeld

depcen.exe is a program for estimating survival probabilities and the probabilities of attending visits, as described in the paper "Analysis of Failure Time Data with Dependent Interval Censoring" (Finkelstein D.M., Goggins W.B., and Schoenfeld D.A., Biometrics 2002 58:298-304). The program was implemented in Matlab and runs as a batch job from a DOS command prompt. The time-to-blood-shedding data from the paper are also included: "interval_censr_data.zip" contains the data in .dat format and the .sas file required for setup. When using these data, please cite the article above.

A program for computing sequential boundaries

by David Schoenfeld, Ph.D.

Questions regarding this item should be sent to David Schoenfeld

Gen.m is an m-file (Matlab/Octave) for computing sequential boundaries, as described in the paper "A Simple Algorithm for Designing Group Sequential Clinical Trials" (Schoenfeld, Biometrics 57, 972-974; September 2001). If you have Matlab, download gen.m; gen.m can also be run under Octave, a freely available m-file interpreter, which can be downloaded from the URL below. In addition, sequential.zip contains a compiled version of gen.m that runs from the command line.
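gen.m implements the algorithm in the paper above, which the following sketch does not reproduce. As an independent cross-check, two-sided boundaries of Pocock or O'Brien-Fleming shape can be approximated by plain Monte Carlo simulation of the standardized statistic Z_k = B(t_k)/sqrt(t_k) at each look, where B is Brownian motion and t_k are the information fractions. The function name and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sequential_boundary(
    info_fractions: Sequence[float],
    alpha: float = 0.05,
    shape: str = "pocock",
    n_sim: int = 200_000,
    seed: int = 1,
) -> List[float]:
    """Hypothetical Monte Carlo estimate of two-sided Z-scale boundaries.

    The boundary at look k is c / w_k, with w_k = 1 (Pocock) or
    w_k = sqrt(t_k) (O'Brien-Fleming); c is chosen so that
    P(|Z_k| > c / w_k for some k) = alpha under the null hypothesis.
    """
    if not info_fractions or info_fractions[0] <= 0 or any(
        b <= a for a, b in zip(info_fractions, info_fractions[1:])
    ):
        raise ValueError("information fractions must be positive and increasing")
    if shape == "pocock":
        weights = [1.0 for _ in info_fractions]
    elif shape == "obrien-fleming":
        weights = [math.sqrt(t) for t in info_fractions]
    else:
        raise ValueError(f"unknown shape: {shape}")

    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sim):
        b, prev, m = 0.0, 0.0, 0.0
        for t, w in zip(info_fractions, weights):
            b += math.sqrt(t - prev) * rng.gauss(0.0, 1.0)  # independent increment
            prev = t
            m = max(m, abs(b) / math.sqrt(t) * w)  # weighted |Z_k|
        maxima.append(m)
    maxima.sort()
    # Empirical (1 - alpha) quantile of max_k w_k |Z_k| gives the constant c.
    c = maxima[min(n_sim - 1, math.ceil((1 - alpha) * n_sim) - 1)]
    return [c / w for w in weights]


if __name__ == "__main__":
    looks = [0.2, 0.4, 0.6, 0.8, 1.0]
    print(sequential_boundary(looks, shape="pocock"))
    print(sequential_boundary(looks, shape="obrien-fleming"))
```

The estimates carry Monte Carlo error, so increase `n_sim` for more precision. For five equally spaced looks at alpha = 0.05 they should land near the tabulated constants of about 2.413 (Pocock) and 2.040 (O'Brien-Fleming, final look).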