Tracking Research

It can be an onerous task to keep track of all the research that gets published these days. I can’t say that I am great at it either, but there are some reasonably standard Google tools that I have found useful.  Back when I first started in MRI research, my supervisor had a lab meeting in which one person’s job (rotating, thankfully) was to go to the library and photocopy the Table of Contents of all the relevant journals, go through looking for relevant or interesting articles and present to the group.  Those were the days.

Google Reader is an RSS feed aggregator which basically means you can subscribe to RSS feeds and it gives an interface to be able to see them. Sites like PubMed are great for searching for journal articles (PubMed for the medical arena). Their site has changed a lot over the past 10 years. Recently, on the search results page there is an RSS feed symbol (orange symbol in the graphic above) that one can click. In most browsers this will bring up a window in which you can select how to watch the feed. There are lots of offline readers, but I like Google Reader.  Many journals also have RSS feeds of the articles that come out.

Another Google tool that works well is Google Docs. It is a good place to writeup documents that need to be shared with others and it keeps revisions of the documents. I can’t say I have used it for writing a paper, yet, but it is getting very close. The one thing that I haven’t worked out is how to put references in a Google Doc though I am sure there is a way.

Noise in MRI (magnitude) data

Background

Magnitude MRI data has Rician noise distribution by definition ((Hákon Gudbjartsson and Samuel Patz, “The Rician Distribution of Noisy MRI Data,” Magnetic resonance in medicine : official journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine 34, no. 6 (December 1995): 910-914)). It comes about because two channels each with Gaussian noise are squared and added together ((R M Henkelman, “Measurement of signal intensities in the presence of noise in MR images,” Medical Physics 12, no. 2 (April 1985): 232-233.)).  There is a longer description here.

Modeling

The Rician noise is created as $latex y_e(t_i) = \sqrt{ \left[y(t_i) + e_1 \right]^2 + e_2^2 }$, where $latex y$ is the true signal, and $latex e_1$ and $latex e_2$ are random numbers from a Gaussian distribution with zero mean and standard deviation $latex \sigma$.  The standard deviation, $latex \sigma$, for the Gaussian distribution is related to the signal to noise ratio and is typically on the order of 1% – 10% of the signal $latex y$.

Code

It is relatively easy to model this using Matlab or Python. For the code here I am modeling a T2 decay curve and then the noise.

Matlab

[cc lang="matlab"]

%  Setup the initial variables
rho = 100;
t2 = 80; % in ms
te = 10:10:320;  % in ms

%  Create a T2 decay curve
y = rho * exp(-te / t2 );

%  Define the noise to be 5% of the signal
s = 5;

%  Create the two Gaussian random variable vectors
e1 = s * randn(size(y));
e2 = s * randn(size(y));

%  Now create the new, noisy decay curve.
y_e = sqrt( (y+e1).^2 + (e2).^2 );

[/cc]

Python

The Python version is quite similar.

[cc lang="python"]

from __future__ import division

#  Setup the initial variables
rho = 100
t2 = 80 # in ms
te = r_[10:330:10] # in ms

#  Create a T2 decay curve
y = rho * exp( -te / t2 )

#  Define the noise to be 5% of the signal
s = 5;

#  Create the two Gaussian random variable vectors
e1 = normal(0, 5, y.shape)
e2 = normal(0, 5, y.shape)

#  Now create the new, noisy decay curve.
y_e = sqrt( (y+e1)**2 + (e2)**2 );

[/cc]

There are a couple of small gotcha’s that at least tripped me up as I am still relatively new to Python.

  1. The first is that under Python 2.x all data is processed as integer (not doubles, as the default is in Matlab).  Supposedly this is going to change in Python 3, but to get around it for now, the best thing to do is to add the [cci lang="python"]from __future__ import divison[/cci].
  2. To define [cci lang="python"]te[/cci] I had to go to 330, rather than 320 as the generator is an open set on the higher end so it does not include the number.
  3. There are several options for creating the random numbers.  There is a Python module called [cci lang="python"]random[/cci] that could be used.  Instead I used the Numpy [cci lang="python"]normal[/cci] instead as I can pass in the shape parameter.

everydns

Hmm… It seems that everydns.net has been sold to the dyndns people. I have used everydns.net for quite a few years now (even donated!). There are two things that I am nervous about doing myself, one is DNS and the second is MX (mail). I have been very happily pointing my MX records to Google and very happily using everydns.net to point my A and CNAME records where I want. I started looking through for a replacement (just in case, at some point they will likely start charging for the service). I found a few that seem free but then look like they are slightly limiting. One I found seems to be very similar to everydns.net (free and not limiting). One is called http://xname.org and another is http://www.freedns.ws.

Change font size of tick labels

For making figures it is sometimes important (or quite important) to increase the font size of the x or y ticklabels. Here is one way I found to do it:

[cc lang="python"]
fig1 = figure()
for t in gca().get_yticklabels():
t.set_fontsize(14)

fig1.canvas.draw()

[/cc]

For some reason there has to be a [cci lang="python"]fig1.canvas.draw()[/cci] at the end of this to refresh the figure.

Finding coordinates in MRI data volumes

I find myself wanting to run through a list of (x,y,z) coordinates of some data volume (here called “d”) to do some sort of processing on each voxel. What I have come up with is the following…

First, find the set of coordinates that match some criterion. For example, find all coordinates in “d” that are greater than the 70th percentile:
[cc lang="python"]coords = array( nonzero( d > prctile( d, 70 ) ) ).transpose()[/cc]
Now that we have the list of coordinates, we can run through each coordinate and do some sort of processing on it:
[cc lang="python"]
for ii,coord in enumerate( coords ):
r = coord[0]
c = coord[1]

# more stuff here
[/cc]
Obviously if you are using a 3 dimensional volume “d” then you would use:
[cc lang="python"]
s = coord[0]
r = coord[2]
c = coord[2]
[/cc]

Show SQL table output using PHP

I found this code on the internet. This is a nice bit of code to show an arbitrary table with headers and all the entries. There is probably lots more one could do with it, but it works well for general testing and is completely independent of what is in the table.

[cc lang="php"]
function display($table)
{
$db_host = ‘localhost’;
$db_user = ‘‘;
$db_pwd = ‘ ‘;
 
$database = ‘‘;
 
if (!mysql_connect($db_host, $db_user, $db_pwd))
die(“Can’t connect to database”);
 
if (!mysql_select_db($database)) || die(“Can’t select database”);
 
// sending query
$result = mysql_query(“SELECT * FROM {$table}”);
if (!$result) {
die(“Query to show fields from table failed”);
}
 
$fields_num = mysql_num_fields($result);
 
echo “

;Table: {$table}

“;
echo “

“;
// printing table headers
for($i=0; $i<$fields_num; $i++)
{
$field = mysql_fetch_field($result);
echo "

“;
}
echo “

\n”;
// printing table rows
while($row = mysql_fetch_row($result))
{
echo “

“;
 
// $row is array… foreach( .. ) puts every element
// of $row to $cell variable
foreach($row as $cell)
echo “

“;
 
echo “

\n”;
}
echo “

{$field->name}
$cell

“;
mysql_free_result($result);
}
[/cc]

Backups and Revision Control

I seem to be always looking for a reasonable way to use revision control on my home directory (lots of binary files and text files). There is a great comparison of different revision control systems, including Git. I like Git, a little complex, but it is nice. There is a great writeup about using Git for revision control which includes automating the add/commit cycle reasonably often.

Project Euler

A very interesting Math type website is Project Euler. There are over 250 mathematical problems to solve in varying degrees of difficulty. The basic idea is to attempt to solve the problem using snippets of code such that the run time is less than 1 minute.

I have used this website to learn Python and have had great fun figuring out different ways of solving the problems. I can’t say all mine have completed in less than a minute, but getting there.

Python code for reading in Varian FDF files

Below is a Python class that will read in a Varian FDF file, or a Varian “.img” directory (which contains the FDF files). I have used this in the past, but can’t make any claims about it. I offer it up in hopes it is useful to someone.

[cc lang="python"]
import os
import re
from numpy import *
import struct

class Varian:

def __init__(self):
pass

def read( self, filename ):
if filename.endswith(‘.fdf’):
data = self.readFDF( filename )
elif filename.endswith(‘.img’):
data = self.readIMG( filename )
else:
print “Unknown filename %s ” % (filename)

return data

def readFDF(self, filename ):

fp = open( filename, ‘rb’ )

xsize = -1
ysize = -1
zsize = 1
bigendian = -1
done = False

while not done :

line = fp.readline()

if( len( line ) >= 1 and line[0] == chr(12) ):
break

if( len( line ) >= 1 and line[0] != chr(12) ):

if( line.find(‘bigendian’) > 0 ):
endian = line.split(‘=’)[-1].rstrip(‘\n; ‘).strip(‘ ‘)

if( line.find(‘echos’) > 0 ):
nechoes = line.split(‘=’)[-1].rstrip(‘\n; ‘).strip(‘ ‘)

if( line.find(‘echo_no’) > 0 ):
echo_no = line.split(‘=’)[-1].rstrip(‘\n; ‘).strip(‘ ‘)

if( line.find(‘nslices’) > 0 ):
nslices = line.split(‘=’)[-1].rstrip(‘\n; ‘).strip(‘ ‘)

if( line.find(‘slice_no’) > 0 ):
sl = line.split(‘=’)[-1].rstrip(‘\n; ‘).strip(‘ ‘)

if( line.find(‘matrix’) > 0 ):
m = re.findall(‘(\d+)’, line.rstrip())

if len(m) == 2:
xsize, ysize = int(m[0]), int(m[1])
elif len(m) == 3:
xsize, ysize, zsize = int(m[0]), int(m[1]), int(m[2])

fp.seek(-xsize*ysize*zsize*4,2)

if bigendian == 1:
fmt = “>%df” % (xsize*ysize*zsize)
else:
fmt = “<%df” % (xsize*ysize*zsize)

data = struct.unpack(fmt, fp.read(xsize*ysize*zsize*4))
data = array( data ).reshape( [xsize, ysize, zsize ] ).squeeze()

fp.close()

return data

def readIMG(self, directory):

# Get a list of all the FDF files in the directory
try:
files = os.listdir(directory)
except:
print “Could not find the directory %s” % directory
return

files = [ file for file in files if file.endswith('.fdf') ]

data = []
for file in files:
data.append( self.readFDF( directory+’/'+file ) )

data = transpose( array( data ), (1,2,0) )

return data

[/cc]

Python Dicom

There is a great Python package pydicom that implements a nice interface in order to be able to access data within Dicom files.

One application which I wrote up was a dicom directory summarizer which goes through a list of dicom files and summarizes the types of MRI data in the directory.  I found myself getting frustrated trying to figure out which series of data was which given the huge number of dicom files (with really long names too!) in a directory.

The code below may be run within a Dicom directory and should run on Siemens Dicom data (IMA) files. It has been a while that I have run it so I can’t guarantee that it will work, but it should be a good place to start.

[cc lang="python"]
#! /usr/bin/python

import dicom
import os
import re

def blah(val):
return re.compile(‘[\-\w]+\.MR\.[\-\w]+\.\d+\.1\..*’).match(val, 1)

# Get a list of all the files
files = []
for entry in os.listdir(‘.’):
if ~os.path.isdir(entry) & entry.endswith(‘IMA’):
files.append(entry)

# Filter to find the first of each series
firsts = filter( blah, files )

firsts.sort(key=lambda s: int( re.compile(‘[\-\w]+\.MR\.[\-\w]+\.(\d+)\.1\..*’).search(s).group(1)) )

# Read the first and output some interesting stuff
d = dicom.ReadFile(firsts[1])
print ” Patient: ” + d.PatientsName
print “Acquired: ” + d.StudyDate[0:4]+”-”+d.StudyDate[4:6]+”-”+d.StudyDate[6:8] \
+ ” ” + d.StudyTime[0:2] + “:” + d.StudyTime[2:4] + “:” + d.StudyTime[4:6]
print “Comments: ” + d.ImageComments

# Run through the first file of each of the series
for entry in firsts:
d = dicom.ReadFile(entry)

num = re.compile(‘[\-\w]+\.MR\.[\-\w]+\.(\d+)\.1\..*’).search(entry).group(1)

out = “\t” + str(num) + “) ” + d.SeriesDescription

tt = ‘[_\-\w]+\.MR\.[_\-\w]+\.’+str(num)+’\..*’
count = 0
r = re.compile(tt)
for f in files:
if( r.match(f, 1) ):
#print “%s matches %d” % (f, ii)
count = count + 1

if( not re.compile(“.*(FA|TRACEW|TENSOR|ADC|MoCoSeries)$”).match(d.SeriesDescription, 1 ) ):
out += ” (vols=” + str(count)

if( ‘RepetitionTime’ in d ):
out += “, TR=” + str(d.RepetitionTime)

if( ‘EchoTime’ in d ):
out += “, TE=” + str(d.EchoTime)

out += “)”

print out

[/cc]