Sunday, December 14, 2008

Derivative in Python/Numpy


Though I don't use it very often, the following little snippet for Python/NumPy can be useful for determining an array's derivative. The image shows this function applied to a particular array to aid in a classical peak-picking scheme using zero-crossings. NumPy's diff function returns the raw differences between neighboring points, but I'm not aware of a function in scipy or numpy that returns the derivative scaled and padded to the original length the way this one does. If anyone has a better solution please share.


import numpy as N

def derivative(y_data):
    '''Calculates the 1st derivative (forward difference) of a 1D numpy array.'''
    y = y_data[1:] - y_data[:-1]
    dy = y/2  # scaling factor that is not necessary but useful for my application
    # y and dy are one element shorter than y_data, so the mean of dy
    # is appended to make the result the same length as y_data
    return N.append(dy, dy.mean())
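
As a rough illustration of the zero-crossing idea, here is a minimal sketch that picks peaks where the slope changes from positive to non-positive. It uses `numpy.diff`, which computes the same forward differences as `y_data[1:] - y_data[:-1]`. The helper name `pick_peaks` and the synthetic two-peak signal are assumptions made up for this example, not part of the original snippet.

```python
import numpy as np

def pick_peaks(y):
    """Return indices of local maxima found via derivative zero-crossings.

    Hypothetical helper for illustration only.
    """
    dy = np.diff(y)        # forward differences, same as y[1:] - y[:-1]
    sign = np.sign(dy)
    # A peak sits where the slope goes from positive to non-positive.
    crossings = np.where((sign[:-1] > 0) & (sign[1:] <= 0))[0]
    return crossings + 1   # shift from an index into dy to an index into y

# Synthetic signal (assumed for the demo): two Gaussian peaks at x = 3 and x = 7.
x = np.linspace(0, 10, 501)
y = np.exp(-(x - 3.0) ** 2 / 0.1) + 0.5 * np.exp(-(x - 7.0) ** 2 / 0.2)

peaks = pick_peaks(y)
print(x[peaks])
```

On real, noisy data the derivative crosses zero many times, so some smoothing or a minimum-height threshold would likely be needed before a scheme like this is usable.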

TopHat Filter


I'm always on the lookout for new signal processing methods, especially ones related to mass spectrometry and general noise reduction. The tophat filter is a method borrowed from the image processing community that treats a 1D graph as a 2D black-and-white image. It is primarily used to remove the baseline noise that may be contained in a spectrum, which can be especially important for MALDI spectra that have a high background. An example of this processing, along with a number of sample figures, may be found in the following document: Beating the Noise: New Statistical Methods for Detecting Signals in MALDI-TOF Spectra below Noise Level by Tim O.F. Conrad at the Free University of Berlin (pdf). The authors of this pdf are also connected with the OpenMS/TOPP project for proteomics data processing. I've also included a small script that I put together that performs this filter in Python.


import numpy as N
from scipy import ndimage  # used for the tophat filter

def topHat(data, factor):
    '''
    data -- 1D numpy array
    factor -- fraction of data.size used as the width of the structuring
    element; it determines how finely the filter is applied to data.
    A factor of 0.01 is appropriate for the tophat filter of Bruker MALDI mass spectra.
    A smaller factor is faster, but features wider than the structuring
    element are treated as baseline and removed.
    '''
    struct_pts = max(1, int(round(data.size*factor)))  # at least one point
    str_el = N.repeat([1], struct_pts)  # flat structuring element
    tFil = ndimage.white_tophat(data, None, str_el)  # size=None, footprint=str_el

    return tFil
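
To see what the filter does, here is a small sketch on made-up data: a narrow Gaussian peak sitting on a sloping baseline. It calls `ndimage.white_tophat` directly with a flat 51-point footprint and checks that the result equals the signal minus its morphological opening (`ndimage.grey_opening`). The spectrum, the peak width, and the footprint size are all assumptions chosen for illustration, not values from real MALDI data.

```python
import numpy as np
from scipy import ndimage

# Synthetic spectrum (assumed for the demo): narrow peak on a sloping baseline.
x = np.arange(1000, dtype=float)
baseline = 5.0 + 0.01 * x
peak = 20.0 * np.exp(-((x - 500.0) ** 2) / (2 * 3.0 ** 2))
spectrum = baseline + peak

# Flat structuring element, wider than the peak but narrow relative to the baseline trend.
footprint = np.ones(51, dtype=bool)
filtered = ndimage.white_tophat(spectrum, footprint=footprint)

# The white top-hat is the signal minus its morphological opening.
opened = ndimage.grey_opening(spectrum, footprint=footprint)
print(np.allclose(filtered, spectrum - opened))

# Peak height is roughly preserved; the baseline away from the peak is flattened.
print(filtered[500], filtered[100])
```

Values near the ends of the array depend on how `scipy.ndimage` pads the data (the default mode is `reflect`), so the edges of a filtered spectrum are worth inspecting before trusting them.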