Example Jupyter Notebooks¶
In the folder examples, there are currently two Jupyter/IPython notebooks stored:
- Example Analysis.ipynb which contains some possible downstream processing, turning the tabular data from molyso into graphs.
- Example molyso Embedding.ipynb which contains a little peek into using molyso as a library.
The notebooks are embedded into the documentation in a read-only manner. You can see them below.
Example Analysis¶
This Jupyter/IPython notebook explains how to post-process the tabular tab-separated value format produced by molyso to generate various graphs.
If you are unfamiliar with IPython and the NumPy/SciPy/matplotlib/pandas… Stack, we’d suggest you first read a bit about it. We think it’s definitely worth giving it a try, as it is a greatly versatile tool for scientific programming endeavours.
This notebook expects a prepared data file in the current working directory.
An example file is shipped with this notebook, it was created by calling molyso:
python3 -m molyso MM_Cglutamicum_D_aceE.ome.tiff -p -o MM_Cglutamicum_D_aceE_results.tsv
The file format is a tab separated table format, which can easily be opened with Pandas.
See the following explanation of the table columns:
Column name | Contents |
---|---|
about_to_divide | Whether this cell is about to divide, i.e., this is the last occurrence of this cell |
cell_age | The cell age [h], relative to its “birth” |
cellxposition | The horizontal position [µm] of the cell (i.e., of the channel) |
cellyposition | The vertical position [µm] of the cell within the channel |
channel_average_cells | The average count of cells detected in this channel |
channel_in_multipoint | The position number of the cell’s channel within this multipoint position |
channel_orientation | A heuristic result whether high or low y positions represent the open end |
channel_width | The width of the cell’s channel |
division_age | The age [h] at which the cell will divide |
elongation_rate | The elongation rate [µm·s⁻¹] of the cell |
fluorescence_n | The mean fluorescence of the cell (with background subtracted) |
fluorescence_background_n | The background fluorescence of the cell’s image |
fluorescence_count | The count of fluorescences present in the dataset. The other fluorescence_*_n fields occurr dependent on this number. |
fluorescence_raw_n | The mean raw fluorescence value of the cell |
fluorescence_std_n | The standard deviation of the fluorescence of the cell |
length | The cell length [µm] |
multipoint | The multipoint number of the frame of the cell |
timepoint | The timepoint [s] of the cell sighting |
timepoint_num | The timepoint number (within the dataset) of the cell sighting |
uid_cell | A unique id for this tracked cell |
uid_parent | A unique id for the cell’s parent cell |
uid_thiscell | A unique id for this particular cell sighting |
uid_track | A unique id for origin (the whole tracking from one start) |
file_name = 'MM_Cglutamicum_D_aceE_results.tsv'
# Some general setup routines
%matplotlib inline
%config InlineBackend.figure_formats=['svg']
import pandas
import numpy
from matplotlib import pylab
pandas.options.display.max_columns = None
pylab.rcParams['figure.figsize'] = (10, 6)
pylab.rcParams['svg.fonttype'] = 'none'
pylab.rcParams['font.sans-serif'] = ['Arial']
pylab.rcParams['font.family'] = 'sans-serif'
try:
import seaborn
except ImportError:
print("Optional dependency: seaborn to pretty up the graphs.")
# we first open the file via Pandas, it is formatted so that it can be read with the read_table command.
results = pandas.read_table(file_name)
# let's take a sneak peek into the file:
results.head()
about_to_divide | cell_age | cellxposition | cellyposition | channel_average_cells | channel_in_multipoint | channel_orientation | channel_width | division_age | elongation_rate | fluorescence_0 | fluorescence_background_0 | fluorescence_count | fluorescence_raw_0 | fluorescence_std_0 | length | multipoint | timepoint | timepoint_num | uid_cell | uid_parent | uid_thiscell | uid_track | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0.000000 | 169.5 | 162.999583 | 6.920833 | 0 | 0 | 1.235 | 0.886687 | 0.000000 | 7.315575 | 3246.311622 | 1 | 3253.627197 | 71.395714 | 1.040 | 0 | 1069.611671 | 1 | 1 | 0 | 2 | 1 |
1 | 0 | 0.248598 | 171.5 | 191.500000 | 6.920833 | 0 | 0 | 1.235 | 0.886687 | 0.023465 | 17.634413 | 3245.607042 | 1 | 3263.241455 | 82.589760 | 2.405 | 0 | 1964.563235 | 2 | 1 | 0 | 3 | 1 |
2 | 0 | 0.498510 | 171.5 | 210.526302 | 6.920833 | 0 | 0 | 1.235 | 0.886687 | 0.015561 | 18.731400 | 3246.076706 | 1 | 3264.808105 | 69.626511 | 3.315 | 0 | 2864.248744 | 3 | 1 | 0 | 4 | 1 |
3 | 0 | 0.000000 | 169.5 | 198.999583 | 6.920833 | 0 | 0 | 1.235 | NaN | 0.000000 | 16.169823 | 3246.311622 | 1 | 3262.481445 | 79.805779 | 3.380 | 0 | 1069.611671 | 1 | 5 | 0 | 6 | 5 |
4 | 0 | 0.000000 | 169.5 | 245.999583 | 6.920833 | 0 | 0 | 1.235 | NaN | 0.000000 | 9.266991 | 3246.311622 | 1 | 3255.578613 | 85.752457 | 2.470 | 0 | 1069.611671 | 1 | 7 | 0 | 8 | 7 |
# Let's take a look at the growth rate.
# Therefore, we take a look at all division events:
division_events = results.query('about_to_divide == 1')
print("We found %d division events (out of %d overall cell sightings)" % (len(division_events), len(results),))
We found 1165 division events (out of 23236 overall cell sightings)
pylab.title('Scatter plot of detected division events')
pylab.ylabel('Division time [h]')
pylab.xlabel('Experiment time [h]')
pylab.scatter(division_events.timepoint / (60.0*60.0), division_events.division_age)
<matplotlib.collections.PathCollection at 0x7fc000aea240>
As you can see, the points are quite nicely crowded in a meaningful range, with some outliers. As a reminder, the dataset was acquired with a 15 min interval, which produces quite some error.
Let’s look into a unified growth rate …
division_events_on_first_day = results.query('about_to_divide == 1 and timepoint < 24*60*60')
doubling_times = numpy.array(division_events_on_first_day.division_age)
print("Unfiltered growth rate on first day µ=%f" % (numpy.log(2)/doubling_times.mean(),))
Unfiltered growth rate on first day µ=0.483987
That way, the data contains quite some outliers, let’s remove them by applying some biological prior knowledge:
mu_min = 0.01
mu_max = 1.00
filtered_doubling_times = doubling_times[
((numpy.log(2)/mu_min) > doubling_times) & (doubling_times > (numpy.log(2)/mu_max))
]
print("Filtered growth rate on first day µ=%f" % (numpy.log(2)/filtered_doubling_times.mean(),))
Filtered growth rate on first day µ=0.403989
Now, how do we generate an overall growth rate graph from the scattered points? We use the simple moving average to unify many points into a single points (over time). [And group the points by their timepoints to have a measure if enough division events occured within a certain time frame. There are multiple possible approaches here, and if precise µ is desired, the graph should be based on filtered data as well!]
#division_events = division_events.sort('timepoint')
division_events = division_events.query('timepoint < (50.0 * 60.0 * 60.0)')
grouped_division_events = division_events.groupby(by=('timepoint_num',))
window_width = 25
sma_division_age = pandas.rolling_mean(numpy.array(grouped_division_events.mean().division_age), window_width)
sma_time = pandas.rolling_mean(numpy.array(grouped_division_events.mean().timepoint), window_width)
sma_count = pandas.rolling_mean(numpy.array(grouped_division_events.count().division_age), window_width)
sma_division_age[sma_count < 5] = float('nan')
t = sma_time / 60.0 / 60.0
mu = numpy.log(2)/sma_division_age
pylab.title('Growth graph')
pylab.xlabel('Experiment time [h]')
pylab.ylabel('Growth rate µ [h⁻¹]')
pylab.plot(t, mu)
pylab.ylim(0, 0.6)
pylab.xlim(0, 60)
pylab.show()
fluor = results.query('fluorescence_0 == fluorescence_0') # while the example dataset does not contain nans, other data might
fluor = fluor.groupby(by=('timepoint_num'))
fluor_time = pandas.rolling_mean(numpy.array(fluor.timepoint.mean()), window_width) / (60.0*60.0)
fluor_value = pandas.rolling_mean(numpy.array(fluor.fluorescence_0.mean()), window_width)
pylab.title('Growth and fluorescence graph')
pylab.xlabel('Experiment time [h]')
pylab.ylabel('Growth rate µ [h⁻¹]')
pylab.ylim(0, 0.6)
pylab.plot(t, mu)
pylab.twinx()
pylab.ylabel('Fluorescence [a.u.]')
pylab.plot(0, 0, label='µ') # to add a legend entry
pylab.plot(fluor_time, fluor_value, label='Fluorescence', color='yellow')
pylab.xlim(0, 60)
pylab.legend()
pylab.show()
Let’s look into some single cell data, e.g., cell length or fluorescence (note that different timepoints are used then in the paper):
# Dividing cells can be identified by the about_to_divide == 1 flag,
# cells, which resulted from a division have cell_age == 0
pre_division = results.query('about_to_divide==1')
post_division = results.query('cell_age==0')
pylab.subplot(2,2,1)
pylab.title('Cell lengths pre/post-division')
pylab.xlabel('Length [µm]')
pylab.ylabel('Occurrence [#]')
pre_division.length.hist(alpha=0.5, label='Pre-division')
post_division.length.hist(alpha=0.5, label='Post-division')
pylab.legend()
pylab.subplot(2,2,2)
pylab.title('Cell lengths boxplot')
pylab.ylabel('Length [µm]')
pylab.boxplot([pre_division.length, post_division.length], labels=['Pre-division', 'Post-division'])
pylab.show()
fluor = results.query('fluorescence_0 == fluorescence_0') # while the example dataset does not contain nans, other data might
pylab.title('Fluorescences pre/post-media change')
pylab.xlabel('Fluorescence [a.u.]')
pylab.ylabel('Occurrence [#]')
fluor.query('timepoint < 24*60*60').fluorescence_0.hist(alpha=0.5, label='Pre-media change')
# 6h gap for the bacteria to start production
fluor.query('timepoint > 30*60*60').fluorescence_0.hist(alpha=0.5, label='Post-media change')
pylab.legend()
pylab.show()
That’s it so far. We hope this notebook gave you some ideas how to analyze your data.
Example molyso Embedding¶
This example should give a brief overview how molyso can easily be called and embedded. As of now, only the steps until after the cell detection are to be performed.
For the example, a test image shipped with molyso will be used.
# Some general setup routines
%matplotlib inline
%config InlineBackend.figure_formats=['svg']
import numpy
from matplotlib import pylab
pylab.rcParams.update({
'figure.figsize': (10, 6),
'svg.fonttype': 'none',
'font.sans-serif': 'Arial',
'font.family': 'sans-serif',
'image.cmap': 'gray',
})
# the test image can be fetched by the `test_image` function
# it is included in molyso primarily to run testing routines
from molyso.test import test_image
pylab.imshow(test_image())
<matplotlib.image.AxesImage at 0x7f09dbf04f98>
# the central starting point of the molyso highlevel interface
# is the Image class
from molyso.mm.image import Image
image = Image()
image.setup_image(test_image())
# as a first test, let's run the autorotate routine, which will
# automatically correct the rotation detected
image.autorotate()
print("Detected angle: %.4f°" % (image.angle,))
pylab.imshow(image.image)
Detected angle: -1.5074°
<matplotlib.image.AxesImage at 0x7f09c8443e48>
# the next two functions call the low-level steps
# therefore, while not much is to see here, ...
# the magic happens behind the curtains
image.find_channels()
image.find_cells_in_channels()
from molyso.debugging.debugplot import inject_poly_drawing_helper
inject_poly_drawing_helper(pylab)
# embedded debugging functionality can be used
# to produce an image with cells and channels drawn as overlay
image.debug_print_cells(pylab)
# let's look into an important part of the Image-class
# the channels member, which supports the iterator interface
# ... therefore, we call it as parameter to list()
# to get a list of channels
channels = list(image.channels)
# and print some info about it (the .cells member works analogously)
print("The first channel contains: %d cells." % (len(channels[0].cells),))
print("The third channel contains: %d cells." % (len(channels[2].cells),))
The first channel contains: 0 cells.
The third channel contains: 4 cells.
# connectivity to original image data remains,
# as long as it is not removed (due to memory/disk-space consumption reasons)
channel = channels[2]
pylab.imshow(channel.channel_image)
<matplotlib.image.AxesImage at 0x7f09c8369898>
# lets look into the channel's cells ...
print(list(channel.cells))
[<molyso.mm.cell_detection.Cell object at 0x7f09d9e46f48>, <molyso.mm.cell_detection.Cell object at 0x7f09d9dd6248>, <molyso.mm.cell_detection.Cell object at 0x7f09d9e35d88>, <molyso.mm.cell_detection.Cell object at 0x7f09d9e151c8>]
# the last call did not really advance our insights ...
# let's take a look at some properties of the cell:
# *local_*top and *local_*bottom ...
# which coincide witht the pixel positions within the channel image
# Note: there is a top and bottom member as well,
# but these values include the total offset of the channel within the image!
for n, cell in enumerate(channel.cells):
print("Cell #%d from %d to %d" % (n, cell.local_top, cell.local_bottom,))
Cell #0 from 109 to 140
Cell #1 from 142 to 169
Cell #2 from 171 to 215
Cell #3 from 217 to 255
# again, connectivity to the image data remains ...
# lets show all individual cell images of that channel
# long line just to prettify the output ...
from functools import partial
next_subplot = partial(pylab.subplot, int(numpy.sqrt(len(channel.cells)))+1, int(numpy.sqrt(len(channel.cells))))
for n, cell in enumerate(channel.cells):
next_subplot(n+1)
pylab.title('Cell #%d' % (n,))
pylab.imshow(cell.cell_image)
pylab.tight_layout()
This was only a brief overview of some basic functionality. It might get expanded in the future. For now, if you’d like to get deeper insights on the working of molyso, I’d like to ask you to study the source files.
# PS: You can as well turn on Debug printing within IPython to get more insight on the internals
from molyso.debugging import DebugPlot
DebugPlot.force_active = True
DebugPlot.post_figure = 'show'
image.find_channels()
image.find_cells_in_channels()