Example Jupyter Notebooks

In the folder examples, there are currently two Jupyter/IPython notebooks stored:

  • Example Analysis.ipynb which contains some possible downstream processing, turning the tabular data from molyso into graphs.
  • Example molyso Embedding.ipynb which contains a little peek into using molyso as a library.

The notebooks are embedded into the documentation in a read-only manner. You can see them below.

Example Analysis

This Jupyter/IPython notebook explains how to post-process the tabular tab-separated value format produced by molyso to generate various graphs.

If you are unfamiliar with IPython and the NumPy/SciPy/matplotlib/pandas… Stack, we’d suggest you first read a bit about it. We think it’s definitely worth giving it a try, as it is a greatly versatile tool for scientific programming endeavours.

This notebook expects a prepared data file in the current working directory.

An example file is shipped with this notebook, it was created by calling molyso:

python3 -m molyso MM_Cglutamicum_D_aceE.ome.tiff -p -o MM_Cglutamicum_D_aceE_results.tsv

The file format is a tab separated table format, which can easily be opened with Pandas.

See the following explanation of the table columns:

Column name Contents
about_to_divide Whether this cell is about to divide, i.e., this is the last occurrence of this cell
cell_age The cell age [h], relative to its “birth”
cellxposition The horizontal position [µm] of the cell (i.e., of the channel)
cellyposition The vertical position [µm] of the cell within the channel
channel_average_cells The average count of cells detected in this channel
channel_in_multipoint The position number of the cell’s channel within this multipoint position
channel_orientation A heuristic result whether high or low y positions represent the open end
channel_width The width of the cell’s channel
division_age The age [h] at which the cell will divide
elongation_rate The elongation rate [µm·s⁻¹] of the cell
fluorescence_n The mean fluorescence of the cell (with background subtracted)
fluorescence_background_n The background fluorescence of the cell’s image
fluorescence_count The count of fluorescences present in the dataset. The other fluorescence_*_n fields occurr dependent on this number.
fluorescence_raw_n The mean raw fluorescence value of the cell
fluorescence_std_n The standard deviation of the fluorescence of the cell
length The cell length [µm]
multipoint The multipoint number of the frame of the cell
timepoint The timepoint [s] of the cell sighting
timepoint_num The timepoint number (within the dataset) of the cell sighting
uid_cell A unique id for this tracked cell
uid_parent A unique id for the cell’s parent cell
uid_thiscell A unique id for this particular cell sighting
uid_track A unique id for origin (the whole tracking from one start)
file_name = 'MM_Cglutamicum_D_aceE_results.tsv'
# Some general setup routines
%matplotlib inline
%config InlineBackend.figure_formats=['svg']
import pandas
import numpy
from matplotlib import pylab
pandas.options.display.max_columns = None
pylab.rcParams['figure.figsize'] = (10, 6)
pylab.rcParams['svg.fonttype'] = 'none'
pylab.rcParams['font.sans-serif'] = ['Arial']
pylab.rcParams['font.family'] = 'sans-serif'
    import seaborn
except ImportError:
    print("Optional dependency: seaborn to pretty up the graphs.")
# we first open the file via Pandas, it is formatted so that it can be read with the read_table command.

results = pandas.read_table(file_name)

# let's take a sneak peek into the file:
about_to_divide cell_age cellxposition cellyposition channel_average_cells channel_in_multipoint channel_orientation channel_width division_age elongation_rate fluorescence_0 fluorescence_background_0 fluorescence_count fluorescence_raw_0 fluorescence_std_0 length multipoint timepoint timepoint_num uid_cell uid_parent uid_thiscell uid_track
0 0 0.000000 169.5 162.999583 6.920833 0 0 1.235 0.886687 0.000000 7.315575 3246.311622 1 3253.627197 71.395714 1.040 0 1069.611671 1 1 0 2 1
1 0 0.248598 171.5 191.500000 6.920833 0 0 1.235 0.886687 0.023465 17.634413 3245.607042 1 3263.241455 82.589760 2.405 0 1964.563235 2 1 0 3 1
2 0 0.498510 171.5 210.526302 6.920833 0 0 1.235 0.886687 0.015561 18.731400 3246.076706 1 3264.808105 69.626511 3.315 0 2864.248744 3 1 0 4 1
3 0 0.000000 169.5 198.999583 6.920833 0 0 1.235 NaN 0.000000 16.169823 3246.311622 1 3262.481445 79.805779 3.380 0 1069.611671 1 5 0 6 5
4 0 0.000000 169.5 245.999583 6.920833 0 0 1.235 NaN 0.000000 9.266991 3246.311622 1 3255.578613 85.752457 2.470 0 1069.611671 1 7 0 8 7
# Let's take a look at the growth rate.
# Therefore, we take a look at all division events:

division_events = results.query('about_to_divide == 1')

print("We found %d division events (out of %d overall cell sightings)" % (len(division_events), len(results),))
We found 1165 division events (out of 23236 overall cell sightings)
pylab.title('Scatter plot of detected division events')
pylab.ylabel('Division time [h]')
pylab.xlabel('Experiment time [h]')
pylab.scatter(division_events.timepoint / (60.0*60.0), division_events.division_age)
<matplotlib.collections.PathCollection at 0x7fc000aea240>

As you can see, the points are quite nicely crowded in a meaningful range, with some outliers. As a reminder, the dataset was acquired with a 15 min interval, which produces quite some error.

Let’s look into a unified growth rate …

division_events_on_first_day = results.query('about_to_divide == 1 and timepoint < 24*60*60')

doubling_times = numpy.array(division_events_on_first_day.division_age)

print("Unfiltered growth rate on first day µ=%f" % (numpy.log(2)/doubling_times.mean(),))
Unfiltered growth rate on first day µ=0.483987

That way, the data contains quite some outliers, let’s remove them by applying some biological prior knowledge:

mu_min = 0.01
mu_max = 1.00

filtered_doubling_times = doubling_times[
    ((numpy.log(2)/mu_min) > doubling_times) & (doubling_times > (numpy.log(2)/mu_max))

print("Filtered growth rate on first day µ=%f" % (numpy.log(2)/filtered_doubling_times.mean(),))
Filtered growth rate on first day µ=0.403989

Now, how do we generate an overall growth rate graph from the scattered points? We use the simple moving average to unify many points into a single points (over time). [And group the points by their timepoints to have a measure if enough division events occured within a certain time frame. There are multiple possible approaches here, and if precise µ is desired, the graph should be based on filtered data as well!]

#division_events = division_events.sort('timepoint')
division_events = division_events.query('timepoint < (50.0 * 60.0 * 60.0)')

grouped_division_events = division_events.groupby(by=('timepoint_num',))

window_width = 25

sma_division_age = pandas.rolling_mean(numpy.array(grouped_division_events.mean().division_age), window_width)
sma_time = pandas.rolling_mean(numpy.array(grouped_division_events.mean().timepoint), window_width)
sma_count = pandas.rolling_mean(numpy.array(grouped_division_events.count().division_age), window_width)
sma_division_age[sma_count < 5] = float('nan')

t = sma_time / 60.0 / 60.0
mu = numpy.log(2)/sma_division_age

pylab.title('Growth graph')
pylab.xlabel('Experiment time [h]')
pylab.ylabel('Growth rate µ [h⁻¹]')
pylab.plot(t, mu)
pylab.ylim(0, 0.6)
pylab.xlim(0, 60)
fluor = results.query('fluorescence_0 == fluorescence_0')  # while the example dataset does not contain nans, other data might
fluor = fluor.groupby(by=('timepoint_num'))

fluor_time = pandas.rolling_mean(numpy.array(fluor.timepoint.mean()), window_width)  / (60.0*60.0)
fluor_value = pandas.rolling_mean(numpy.array(fluor.fluorescence_0.mean()), window_width)

pylab.title('Growth and fluorescence graph')
pylab.xlabel('Experiment time [h]')
pylab.ylabel('Growth rate µ [h⁻¹]')
pylab.ylim(0, 0.6)
pylab.plot(t, mu)
pylab.ylabel('Fluorescence [a.u.]')
pylab.plot(0, 0, label='µ')  # to add a legend entry
pylab.plot(fluor_time, fluor_value, label='Fluorescence', color='yellow')
pylab.xlim(0, 60)

Let’s look into some single cell data, e.g., cell length or fluorescence (note that different timepoints are used then in the paper):

# Dividing cells can be identified by the about_to_divide == 1 flag,
# cells, which resulted from a division have cell_age == 0

pre_division = results.query('about_to_divide==1')
post_division = results.query('cell_age==0')

pylab.title('Cell lengths pre/post-division')
pylab.xlabel('Length [µm]')
pylab.ylabel('Occurrence [#]')
pre_division.length.hist(alpha=0.5, label='Pre-division')
post_division.length.hist(alpha=0.5, label='Post-division')
pylab.title('Cell lengths boxplot')
pylab.ylabel('Length [µm]')
pylab.boxplot([pre_division.length, post_division.length], labels=['Pre-division', 'Post-division'])
fluor = results.query('fluorescence_0 == fluorescence_0')  # while the example dataset does not contain nans, other data might
pylab.title('Fluorescences pre/post-media change')
pylab.xlabel('Fluorescence [a.u.]')
pylab.ylabel('Occurrence [#]')
fluor.query('timepoint < 24*60*60').fluorescence_0.hist(alpha=0.5, label='Pre-media change')
# 6h gap for the bacteria to start production
fluor.query('timepoint > 30*60*60').fluorescence_0.hist(alpha=0.5, label='Post-media change')

That’s it so far. We hope this notebook gave you some ideas how to analyze your data.

Example molyso Embedding

This example should give a brief overview how molyso can easily be called and embedded. As of now, only the steps until after the cell detection are to be performed.

For the example, a test image shipped with molyso will be used.

# Some general setup routines
%matplotlib inline
%config InlineBackend.figure_formats=['svg']
import numpy
from matplotlib import pylab
    'figure.figsize': (10, 6),
    'svg.fonttype': 'none',
    'font.sans-serif': 'Arial',
    'font.family': 'sans-serif',
    'image.cmap': 'gray',
# the test image can be fetched by the `test_image` function
# it is included in molyso primarily to run testing routines

from molyso.test import test_image
<matplotlib.image.AxesImage at 0x7f09dbf04f98>
# the central starting point of the molyso highlevel interface
# is the Image class

from molyso.mm.image import Image
image = Image()

# as a first test, let's run the autorotate routine, which will
# automatically correct the rotation detected

print("Detected angle: %.4f°" % (image.angle,))
Detected angle: -1.5074°
<matplotlib.image.AxesImage at 0x7f09c8443e48>
# the next two functions call the low-level steps
# therefore, while not much is to see here, ...
# the magic happens behind the curtains


from molyso.debugging.debugplot import inject_poly_drawing_helper

# embedded debugging functionality can be used
# to produce an image with cells and channels drawn as overlay
# let's look into an important part of the Image-class

# the channels member, which supports the iterator interface
# ... therefore, we call it as parameter to list()
# to get a list of channels

channels = list(image.channels)

# and print some info about it (the .cells member works analogously)
print("The first channel contains: %d cells." % (len(channels[0].cells),))
print("The third channel contains: %d cells." % (len(channels[2].cells),))
The first channel contains: 0 cells.
The third channel contains: 4 cells.
# connectivity to original image data remains,
# as long as it is not removed (due to memory/disk-space consumption reasons)

channel = channels[2]
<matplotlib.image.AxesImage at 0x7f09c8369898>
# lets look into the channel's cells ...

[<molyso.mm.cell_detection.Cell object at 0x7f09d9e46f48>, <molyso.mm.cell_detection.Cell object at 0x7f09d9dd6248>, <molyso.mm.cell_detection.Cell object at 0x7f09d9e35d88>, <molyso.mm.cell_detection.Cell object at 0x7f09d9e151c8>]
# the last call did not really advance our insights ...
# let's take a look at some properties of the cell:
# *local_*top and *local_*bottom ...
# which coincide witht the pixel positions within the channel image
# Note: there is a top and bottom member as well,
# but these values include the total offset of the channel within the image!

for n, cell in enumerate(channel.cells):
    print("Cell #%d from %d to %d" % (n, cell.local_top, cell.local_bottom,))
Cell #0 from 109 to 140
Cell #1 from 142 to 169
Cell #2 from 171 to 215
Cell #3 from 217 to 255
# again, connectivity to the image data remains ...
# lets show all individual cell images of that channel

# long line just to prettify the output ...
from functools import partial
next_subplot = partial(pylab.subplot, int(numpy.sqrt(len(channel.cells)))+1, int(numpy.sqrt(len(channel.cells))))

for n, cell in enumerate(channel.cells):
    pylab.title('Cell #%d' % (n,))


This was only a brief overview of some basic functionality. It might get expanded in the future. For now, if you’d like to get deeper insights on the working of molyso, I’d like to ask you to study the source files.

# PS: You can as well turn on Debug printing within IPython to get more insight on the internals
from molyso.debugging import DebugPlot
DebugPlot.force_active = True
DebugPlot.post_figure = 'show'