numactl --interleave=all ./testing_zgetrf -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.6.0  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_zgetrf [options] [-h|--help]

ngpu 1
    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      0.35 (   0.01)     ---   
 1000  1000     ---   (  ---  )    132.76 (   0.02)     ---   
   10    10     ---   (  ---  )      0.27 (   0.00)     ---   
   20    20     ---   (  ---  )      0.81 (   0.00)     ---   
   30    30     ---   (  ---  )      1.34 (   0.00)     ---   
   40    40     ---   (  ---  )      3.09 (   0.00)     ---   
   50    50     ---   (  ---  )      2.01 (   0.00)     ---   
   60    60     ---   (  ---  )      4.12 (   0.00)     ---   
   70    70     ---   (  ---  )      1.12 (   0.00)     ---   
   80    80     ---   (  ---  )      1.63 (   0.00)     ---   
   90    90     ---   (  ---  )      2.09 (   0.00)     ---   
  100   100     ---   (  ---  )      2.65 (   0.00)     ---   
  200   200     ---   (  ---  )     10.58 (   0.00)     ---   
  300   300     ---   (  ---  )     22.35 (   0.00)     ---   
  400   400     ---   (  ---  )     35.30 (   0.00)     ---   
  500   500     ---   (  ---  )     50.70 (   0.01)     ---   
  600   600     ---   (  ---  )     66.13 (   0.01)     ---   
  700   700     ---   (  ---  )     83.86 (   0.01)     ---   
  800   800     ---   (  ---  )    101.99 (   0.01)     ---   
  900   900     ---   (  ---  )    118.39 (   0.02)     ---   
 1000  1000     ---   (  ---  )    136.90 (   0.02)     ---   
 2000  2000     ---   (  ---  )    332.12 (   0.06)     ---   
 3000  3000     ---   (  ---  )    512.23 (   0.14)     ---   
 4000  4000     ---   (  ---  )    624.34 (   0.27)     ---   
 5000  5000     ---   (  ---  )    668.85 (   0.50)     ---   
 6000  6000     ---   (  ---  )    759.58 (   0.76)     ---   
 7000  7000     ---   (  ---  )    820.13 (   1.12)     ---   
 8000  8000     ---   (  ---  )    873.60 (   1.56)     ---   
 9000  9000     ---   (  ---  )    892.08 (   2.18)     ---   
10000 10000     ---   (  ---  )    930.85 (   2.86)     ---   
12000 12000     ---   (  ---  )    985.75 (   4.67)     ---   
14000 14000     ---   (  ---  )   1023.04 (   7.15)     ---   
16000 16000     ---   (  ---  )   1050.01 (  10.40)     ---   
18000 18000     ---   (  ---  )   1058.57 (  14.69)     ---   
20000 20000     ---   (  ---  )   1066.84 (  20.00)     ---   

numactl --interleave=all ./testing_zgetrf_gpu -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.6.0  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_zgetrf_gpu [options] [-h|--help]

    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      0.91 (   0.00)     ---  
 1000  1000     ---   (  ---  )    128.24 (   0.02)     ---  
   10    10     ---   (  ---  )      0.06 (   0.00)     ---  
   20    20     ---   (  ---  )      0.29 (   0.00)     ---  
   30    30     ---   (  ---  )      0.60 (   0.00)     ---  
   40    40     ---   (  ---  )      1.29 (   0.00)     ---  
   50    50     ---   (  ---  )      1.18 (   0.00)     ---  
   60    60     ---   (  ---  )      2.25 (   0.00)     ---  
   70    70     ---   (  ---  )      0.54 (   0.00)     ---  
   80    80     ---   (  ---  )      0.82 (   0.00)     ---  
   90    90     ---   (  ---  )      1.10 (   0.00)     ---  
  100   100     ---   (  ---  )      1.45 (   0.00)     ---  
  200   200     ---   (  ---  )      6.56 (   0.00)     ---  
  300   300     ---   (  ---  )     16.09 (   0.00)     ---  
  400   400     ---   (  ---  )     28.03 (   0.01)     ---  
  500   500     ---   (  ---  )     43.38 (   0.01)     ---  
  600   600     ---   (  ---  )     56.20 (   0.01)     ---  
  700   700     ---   (  ---  )     74.53 (   0.01)     ---  
  800   800     ---   (  ---  )     92.45 (   0.01)     ---  
  900   900     ---   (  ---  )    110.40 (   0.02)     ---  
 1000  1000     ---   (  ---  )    132.67 (   0.02)     ---  
 2000  2000     ---   (  ---  )    417.89 (   0.05)     ---  
 3000  3000     ---   (  ---  )    643.55 (   0.11)     ---  
 4000  4000     ---   (  ---  )    766.68 (   0.22)     ---  
 5000  5000     ---   (  ---  )    807.08 (   0.41)     ---  
 6000  6000     ---   (  ---  )    898.85 (   0.64)     ---  
 7000  7000     ---   (  ---  )    954.91 (   0.96)     ---  
 8000  8000     ---   (  ---  )   1005.00 (   1.36)     ---  
 9000  9000     ---   (  ---  )   1015.98 (   1.91)     ---  
10000 10000     ---   (  ---  )   1048.98 (   2.54)     ---  
12000 12000     ---   (  ---  )   1090.31 (   4.23)     ---  
14000 14000     ---   (  ---  )   1115.15 (   6.56)     ---  
16000 16000     ---   (  ---  )   1131.71 (   9.65)     ---  
18000 18000     ---   (  ---  )   1135.59 (  13.69)     ---  
20000 20000     ---   (  ---  )   1127.67 (  18.92)     ---  
