Example Jupyter Notebooks ¶

In the folder examples, there are currently two Jupyter/IPython notebooks stored:

Example Analysis.ipynb which contains some possible downstream processing, turning the tabular data from molyso into graphs.

Example molyso Embedding.ipynb which contains a little peek into using molyso as a library.

Contents

Example Jupyter Notebooks
- Example Analysis
- Example molyso Embedding

The notebooks are embedded into the documentation in a read-only manner. You can see them below.

Example Analysis ¶

This Jupyter/IPython notebook explains how to post-process the tabular tab-separated value format produced by molyso to generate various graphs.

If you are unfamiliar with IPython and the NumPy/SciPy/matplotlib/pandas… Stack, we’d suggest you first read a bit about it. We think it’s definitely worth giving it a try, as it is a greatly versatile tool for scientific programming endeavours.

This notebook expects a prepared data file in the current working directory.

An example file is shipped with this notebook, it was created by calling molyso:

python3 -m molyso MM_Cglutamicum_D_aceE.ome.tiff -p -o MM_Cglutamicum_D_aceE_results.tsv

The file format is a tab separated table format, which can easily be opened with Pandas.

See the following explanation of the table columns:

Column name	Contents
about_to_divide	Whether this cell is about to divide, i.e., this is the last occurrence of this cell
cell_age	The cell age [h], relative to its “birth”
cellxposition	The horizontal position [µm] of the cell (i.e., of the channel)
cellyposition	The vertical position [µm] of the cell within the channel
channel_average_cells	The average count of cells detected in this channel
channel_in_multipoint	The position number of the cell’s channel within this multipoint position
channel_orientation	A heuristic result whether high or low y positions represent the open end
channel_width	The width of the cell’s channel
division_age	The age [h] at which the cell will divide
elongation_rate	The elongation rate [µm·s⁻¹] of the cell
fluorescence_n	The mean fluorescence of the cell (with background subtracted)
fluorescence_background_n	The background fluorescence of the cell’s image
fluorescence_count	The count of fluorescences present in the dataset. The other fluorescence_*_n fields occurr dependent on this number.
fluorescence_raw_n	The mean raw fluorescence value of the cell
fluorescence_std_n	The standard deviation of the fluorescence of the cell
length	The cell length [µm]
multipoint	The multipoint number of the frame of the cell
timepoint	The timepoint [s] of the cell sighting
timepoint_num	The timepoint number (within the dataset) of the cell sighting
uid_cell	A unique id for this tracked cell
uid_parent	A unique id for the cell’s parent cell
uid_thiscell	A unique id for this particular cell sighting
uid_track	A unique id for origin (the whole tracking from one start)

file_name = 'MM_Cglutamicum_D_aceE_results.tsv'

# Some general setup routines
%matplotlib inline
%config InlineBackend.figure_formats=['svg']
import pandas
import numpy
from matplotlib import pylab
pandas.options.display.max_columns = None
pylab.rcParams['figure.figsize'] = (10, 6)
pylab.rcParams['svg.fonttype'] = 'none'
pylab.rcParams['font.sans-serif'] = ['Arial']
pylab.rcParams['font.family'] = 'sans-serif'
try:
    import seaborn
except ImportError:
    print("Optional dependency: seaborn to pretty up the graphs.")

# we first open the file via Pandas, it is formatted so that it can be read with the read_table command.

results = pandas.read_table(file_name)

# let's take a sneak peek into the file:
results.head()

	cell_age	cellxposition	cellyposition	channel_average_cells	channel_width	division_age	elongation_rate	fluorescence_0	fluorescence_background_0	fluorescence_count	fluorescence_raw_0	fluorescence_std_0	length	timepoint	timepoint_num	uid_cell	uid_thiscell	uid_track
0	0.000000	169.5	162.999583	6.920833	1.235	0.886687	0.000000	7.315575	3246.311622	1	3253.627197	71.395714	1.040	1069.611671	1	1	2	1
1	0.248598	171.5	191.500000	6.920833	1.235	0.886687	0.023465	17.634413	3245.607042	1	3263.241455	82.589760	2.405	1964.563235	2	1	3	1
2	0.498510	171.5	210.526302	6.920833	1.235	0.886687	0.015561	18.731400	3246.076706	1	3264.808105	69.626511	3.315	2864.248744	3	1	4	1
3	0.000000	169.5	198.999583	6.920833	1.235	NaN	0.000000	16.169823	3246.311622	1	3262.481445	79.805779	3.380	1069.611671	1	5	6	5
4	0.000000	169.5	245.999583	6.920833	1.235	NaN	0.000000	9.266991	3246.311622	1	3255.578613	85.752457	2.470	1069.611671	1	7	8	7

# Let's take a look at the growth rate.
# Therefore, we take a look at all division events:

division_events = results.query('about_to_divide == 1')

print("We found %d division events (out of %d overall cell sightings)" % (len(division_events), len(results),))

We found 1165 division events (out of 23236 overall cell sightings)

pylab.title('Scatter plot of detected division events')
pylab.ylabel('Division time [h]')
pylab.xlabel('Experiment time [h]')
pylab.scatter(division_events.timepoint / (60.0*60.0), division_events.division_age)

<matplotlib.collections.PathCollection at 0x7fc000aea240>

As you can see, the points are quite nicely crowded in a meaningful range, with some outliers. As a reminder, the dataset was acquired with a 15 min interval, which produces quite some error.

Let’s look into a unified growth rate …

division_events_on_first_day = results.query('about_to_divide == 1 and timepoint < 24*60*60')

doubling_times = numpy.array(division_events_on_first_day.division_age)

print("Unfiltered growth rate on first day µ=%f" % (numpy.log(2)/doubling_times.mean(),))

Unfiltered growth rate on first day µ=0.483987

That way, the data contains quite some outliers, let’s remove them by applying some biological prior knowledge:

mu_min = 0.01
mu_max = 1.00

filtered_doubling_times = doubling_times[
    ((numpy.log(2)/mu_min) > doubling_times) & (doubling_times > (numpy.log(2)/mu_max))
]

print("Filtered growth rate on first day µ=%f" % (numpy.log(2)/filtered_doubling_times.mean(),))

Filtered growth rate on first day µ=0.403989

Now, how do we generate an overall growth rate graph from the scattered points? We use the simple moving average to unify many points into a single points (over time). [And group the points by their timepoints to have a measure if enough division events occured within a certain time frame. There are multiple possible approaches here, and if precise µ is desired, the graph should be based on filtered data as well!]

#division_events = division_events.sort('timepoint')
division_events = division_events.query('timepoint < (50.0 * 60.0 * 60.0)')

grouped_division_events = division_events.groupby(by=('timepoint_num',))

window_width = 25

sma_division_age = pandas.rolling_mean(numpy.array(grouped_division_events.mean().division_age), window_width)
sma_time = pandas.rolling_mean(numpy.array(grouped_division_events.mean().timepoint), window_width)
sma_count = pandas.rolling_mean(numpy.array(grouped_division_events.count().division_age), window_width)
sma_division_age[sma_count < 5] = float('nan')

t = sma_time / 60.0 / 60.0
mu = numpy.log(2)/sma_division_age

pylab.title('Growth graph')
pylab.xlabel('Experiment time [h]')
pylab.ylabel('Growth rate µ [h⁻¹]')
pylab.plot(t, mu)
pylab.ylim(0, 0.6)
pylab.xlim(0, 60)
pylab.show()

fluor = results.query('fluorescence_0 == fluorescence_0')  # while the example dataset does not contain nans, other data might
fluor = fluor.groupby(by=('timepoint_num'))

fluor_time = pandas.rolling_mean(numpy.array(fluor.timepoint.mean()), window_width)  / (60.0*60.0)
fluor_value = pandas.rolling_mean(numpy.array(fluor.fluorescence_0.mean()), window_width)

pylab.title('Growth and fluorescence graph')
pylab.xlabel('Experiment time [h]')
pylab.ylabel('Growth rate µ [h⁻¹]')
pylab.ylim(0, 0.6)
pylab.plot(t, mu)
pylab.twinx()
pylab.ylabel('Fluorescence [a.u.]')
pylab.plot(0, 0, label='µ')  # to add a legend entry
pylab.plot(fluor_time, fluor_value, label='Fluorescence', color='yellow')
pylab.xlim(0, 60)
pylab.legend()
pylab.show()

Let’s look into some single cell data, e.g., cell length or fluorescence (note that different timepoints are used then in the paper):

# Dividing cells can be identified by the about_to_divide == 1 flag,
# cells, which resulted from a division have cell_age == 0

pre_division = results.query('about_to_divide==1')
post_division = results.query('cell_age==0')

pylab.subplot(2,2,1)
pylab.title('Cell lengths pre/post-division')
pylab.xlabel('Length [µm]')
pylab.ylabel('Occurrence [#]')
pre_division.length.hist(alpha=0.5, label='Pre-division')
post_division.length.hist(alpha=0.5, label='Post-division')
pylab.legend()
pylab.subplot(2,2,2)
pylab.title('Cell lengths boxplot')
pylab.ylabel('Length [µm]')
pylab.boxplot([pre_division.length, post_division.length], labels=['Pre-division', 'Post-division'])
pylab.show()

fluor = results.query('fluorescence_0 == fluorescence_0')  # while the example dataset does not contain nans, other data might
pylab.title('Fluorescences pre/post-media change')
pylab.xlabel('Fluorescence [a.u.]')
pylab.ylabel('Occurrence [#]')
fluor.query('timepoint < 24*60*60').fluorescence_0.hist(alpha=0.5, label='Pre-media change')
# 6h gap for the bacteria to start production
fluor.query('timepoint > 30*60*60').fluorescence_0.hist(alpha=0.5, label='Post-media change')
pylab.legend()
pylab.show()

That’s it so far. We hope this notebook gave you some ideas how to analyze your data.

Example molyso Embedding ¶

This example should give a brief overview how molyso can easily be called and embedded. As of now, only the steps until after the cell detection are to be performed.

For the example, a test image shipped with molyso will be used.

# Some general setup routines
%matplotlib inline
%config InlineBackend.figure_formats=['svg']
import numpy
from matplotlib import pylab
pylab.rcParams.update({
    'figure.figsize': (10, 6),
    'svg.fonttype': 'none',
    'font.sans-serif': 'Arial',
    'font.family': 'sans-serif',
    'image.cmap': 'gray',
})

# the test image can be fetched by the `test_image` function
# it is included in molyso primarily to run testing routines

from molyso.test import test_image
pylab.imshow(test_image())

<matplotlib.image.AxesImage at 0x7f09dbf04f98>

# the central starting point of the molyso highlevel interface
# is the Image class

from molyso.mm.image import Image
image = Image()
image.setup_image(test_image())

# as a first test, let's run the autorotate routine, which will
# automatically correct the rotation detected

image.autorotate()
print("Detected angle: %.4f°" % (image.angle,))
pylab.imshow(image.image)

Detected angle: -1.5074°

<matplotlib.image.AxesImage at 0x7f09c8443e48>

# the next two functions call the low-level steps
# therefore, while not much is to see here, ...
# the magic happens behind the curtains

image.find_channels()
image.find_cells_in_channels()

from molyso.debugging.debugplot import inject_poly_drawing_helper
inject_poly_drawing_helper(pylab)

# embedded debugging functionality can be used
# to produce an image with cells and channels drawn as overlay
image.debug_print_cells(pylab)

# let's look into an important part of the Image-class

# the channels member, which supports the iterator interface
# ... therefore, we call it as parameter to list()
# to get a list of channels

channels = list(image.channels)

# and print some info about it (the .cells member works analogously)
print("The first channel contains: %d cells." % (len(channels[0].cells),))
print("The third channel contains: %d cells." % (len(channels[2].cells),))

The first channel contains: 0 cells.
The third channel contains: 4 cells.

# connectivity to original image data remains,
# as long as it is not removed (due to memory/disk-space consumption reasons)

channel = channels[2]
pylab.imshow(channel.channel_image)

<matplotlib.image.AxesImage at 0x7f09c8369898>

# lets look into the channel's cells ...

print(list(channel.cells))

[<molyso.mm.cell_detection.Cell object at 0x7f09d9e46f48>, <molyso.mm.cell_detection.Cell object at 0x7f09d9dd6248>, <molyso.mm.cell_detection.Cell object at 0x7f09d9e35d88>, <molyso.mm.cell_detection.Cell object at 0x7f09d9e151c8>]

# the last call did not really advance our insights ...
# let's take a look at some properties of the cell:
# *local_*top and *local_*bottom ...
# which coincide witht the pixel positions within the channel image
# Note: there is a top and bottom member as well,
# but these values include the total offset of the channel within the image!

for n, cell in enumerate(channel.cells):
    print("Cell #%d from %d to %d" % (n, cell.local_top, cell.local_bottom,))

Cell #0 from 109 to 140
Cell #1 from 142 to 169
Cell #2 from 171 to 215
Cell #3 from 217 to 255

# again, connectivity to the image data remains ...
# lets show all individual cell images of that channel

# long line just to prettify the output ...
from functools import partial
next_subplot = partial(pylab.subplot, int(numpy.sqrt(len(channel.cells)))+1, int(numpy.sqrt(len(channel.cells))))

for n, cell in enumerate(channel.cells):
    next_subplot(n+1)
    pylab.title('Cell #%d' % (n,))
    pylab.imshow(cell.cell_image)

pylab.tight_layout()

This was only a brief overview of some basic functionality. It might get expanded in the future. For now, if you’d like to get deeper insights on the working of molyso, I’d like to ask you to study the source files.

# PS: You can as well turn on Debug printing within IPython to get more insight on the internals
from molyso.debugging import DebugPlot
DebugPlot.force_active = True
DebugPlot.post_figure = 'show'

image.find_channels()
image.find_cells_in_channels()

Example Jupyter Notebooks¶

Example Analysis¶

Example molyso Embedding¶

Example Jupyter Notebooks ¶

Example Analysis ¶

Example molyso Embedding ¶