Gearing up software systems for genome data: A case study of SAS

4/2/2009 Jing Hua Zhao


Here I include code snippets from the application note submitted to the
journal Bioinformatics.

sasgwa.sas contains the macro I used for the European Prospective
Investigation into Cancer (EPIC) obesity genetics analysis. As the three
sources of information it requires are quite standard, I do not provide
example data.

Under Linux, I used the following command to run the program:

sas -work /tmp -memsize 4G sasgwa &

where /tmp is the directory in which SAS stores its temporary (WORK) files
and -memsize 4G sets the memory limit to 4 GB.

For reference, I have given an alternative form of the macro at the end
of the program. To customize the macro to your own data, specify the
appropriate location(s) of your files in the following statements:

libname in ("." "/genetics/data/GWA/EPIC/6-5-7/wide2");
libname out '/tmp/scratch';

There was a memory problem with the CASECONTROL procedure in SAS 9.1.3,
but the LOGISTIC procedure should work.
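
As a rough sketch of the LOGISTIC alternative, assume a hypothetical data
set geno with one row per individual, holding the obesity indicator, age,
and an additive genotype code g (0, 1 or 2 copies of one allele). The names
geno and g are illustrative only and are not taken from sasgwa.sas:

```sas
/* Sketch only: geno and g are hypothetical names, not from sasgwa.sas */
proc logistic data=geno descending;
     /* descending models the probability that obesity=1 */
     model obesity = g age;
run;
```

The coefficient of g is then the log odds ratio of obesity per copy of the
counted allele, adjusted for age.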

To adapt the algorithm to other systems, e.g., Stata, you can include
the chromosomes and physical positions of the SNPs in the map file, so
that the data can be sorted by these variables and the notsorted option
in the by statement can be avoided.
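
A minimal sketch of this idea, assuming in.map holds the variables chr and
pos (as in the selection further down); map_sorted is a hypothetical name.
Once the map is sorted by chromosome and position, by-group processing
works without notsorted:

```sas
/* Sketch only: map_sorted is a hypothetical name, not from sasgwa.sas */
proc sort data=in.map out=map_sorted;
     by chr pos;
run;
data _null_;
     set map_sorted;
     by chr pos;            /* no notsorted needed after the sort */
     if first.chr then put "First SNP on chromosome " chr= rsn= pos=;
run;
```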

Here is how I select just one SNP for an association analysis:

proc sql;
     create table ind as select id from in.id where cohort="1" order by id;
     create table bmi as select id, bmi from in.trait order by id;
quit;
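data trait;
     /* id_flag is 1 for individuals present in ind (cohort "1") */
     merge ind (in=id_flag) bmi;
     by id;
     format bmi 5.2 obesity 1.;
     /* obesity is BMI >= 30; it stays missing when BMI is missing */
     if bmi ne . then obesity=(bmi>=30);
     one=id_flag;
run;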
data rsn;
     input rsn$1-15;
cards;
rs9939609
;
proc sql;
     create table a1a2 as select * from in.a1a216 where rsn in (select rsn from rsn);
     create table map  as select chr, rsn, a, b, pos from in.map
            where rsn in (select rsn from rsn) order by chr, rsn;
quit;
%wtl(in.a1a2, map, trait, snpid=rsn, vlist=age bmi obesity, inc=one);

The remaining code is very similar.

Please contact me if you have problems.


Jing Hua Zhao

MRC Epidemiology Unit
Institute of Metabolic Science
PO Box 285
Addenbrooke's Hospital
Hills Road
Cambridge CB2 0QQ
United Kingdom
Tel: +44 1223 769165
email: jinghua.zhao@mrc-epid.cam.ac.uk
url: http://www.mrc-epid.cam.ac.uk/~jinghua.zhao
