## Annual report of my intuition about the brain (2021)

How does the brain work and how can we understand it? I want to make it a habit to report, at the end of each year, some of the thoughts about the brain that marked me most during the past twelve months, with the hope of advancing and structuring the part of my understanding of the brain that is not immediately reflected in journal publications. Enjoy the read! And check out previous year-end write-ups: 2018, 2019, 2020, 2021.

During the last year, I have continued to work on the ideas described in previous year-end write-ups, resulting in a project proposal that is currently under evaluation. I will use this year's write-up to talk about something different, though related: a recent book by Peter Robin Hiesinger, The Self-Assembling Brain.

Hiesinger, based in Berlin, works in the field of developmental neurobiology. However, this book is rather a crossover between multiple disciplines, ranging from developmental neurobiology and circuit neuroscience to artificial intelligence, robotics and many side-branches of these disciplines. Hiesinger masterfully assembles the perspectives of the different fields around his own points of interest. For example, his introductory discussion of the emergence of the field of artificial intelligence in the 1950s is one of the most insightful accounts of this period that I have read. He tells the story of how key figures like von Neumann, Minsky, Rosenblatt and McCarthy, through their relationships and personalities, influenced the further development of the field.

The main hypothesis of Hiesinger's book is that the genetic code does not encode the endpoint of the system (e.g., the map of brain areas, the default mode network, thalamocortical loops, interneuron connectivity, etc.). According to him, and I think that most neuroscientists would agree, the neuronal circuits of the brain are not directly encoded in the genetic code. Instead, the simple genetic code needs to unfold in time in order to generate the complex brain. More importantly, according to Hiesinger, it is necessary to actually run the code in order to find out what the endpoint of the system is. Let's pick two analogies from the book to illustrate this idea of unfolding.

First, in the preface, Hiesinger describes how an alien unfamiliar with life on Earth finds an apple seed. Upon analysing the seed, the alien realizes that it contains complex and intricate genetic codes, and it starts to see beauty and meaning in these patterns. However, an analysis of its structural content alone would not enable the alien to predict the purpose of the apple seed. This is only possible through the development (unfolding) of the seed into an apple tree. Unfolding, therefore, is the addition of both time and energy to the seed.

Second, Hiesinger connects the unfolding idea with the field of cellular automata, in particular with the early work of Stephen Wolfram, a very influential but also controversial figure in complexity research, and his cellular automaton named rule 110. Rule 110 is a very simple rule (described in this Wikipedia article) that is applied to a row of 1s and 0s and results in another binary row. The resulting row is again subject to rule 110, and so on, leading to a two-dimensional pattern, as computed here in Matlab:

The pattern is surprisingly complex, despite the simplicity of the rule. For example, how can one explain the large solitary black triangle in the middle right? Or the vertical line of equally sized triangles in the center that ends so abruptly? The answers are not obvious. These examples show that a very simple rule can lead to very complex patterns. From Hiesinger's point of view, it is important that the endpoint of the system, let's say line 267, cannot be derived from the rule unless the rule is developed (unfolded) for exactly 267 iterations. Hiesinger believes that this analogy can be transferred to the relationship between the genetic code and the architecture of the brain.
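To make the "unfolding" concrete, here is a minimal Python sketch of rule 110. It is a re-implementation for illustration, not the Matlab code used for the figure, and it assumes periodic boundaries and a single active cell at the right edge as the starting row. Each new row is obtained only by applying the rule to the previous row, so row 267 is reached only after 267 calls of `step`.

```python
import numpy as np


def step(row: np.ndarray, rule: int = 110) -> np.ndarray:
    """Apply an elementary cellular automaton rule once (periodic boundaries)."""
    left = np.roll(row, 1)    # left[i] = row[i - 1]
    right = np.roll(row, -1)  # right[i] = row[i + 1]
    # Each neighbourhood (left, centre, right) forms a 3-bit index 0..7
    idx = 4 * left + 2 * row + right
    # Bit number idx of the rule number is the new state of the cell
    return (rule >> idx) & 1


def unfold(width: int = 300, steps: int = 300, rule: int = 110) -> np.ndarray:
    """Start from a single 1 at the right edge and iterate the rule row by row."""
    rows = np.zeros((steps, width), dtype=np.int64)
    rows[0, -1] = 1
    for t in range(1, steps):
        rows[t] = step(rows[t - 1], rule)
    return rows


pattern = unfold(width=300, steps=300)
print(pattern.shape)
```

The `pattern` array can be displayed, for example, with matplotlib's `imshow` and the `"binary"` colormap (1 = black).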

The rest of Hiesinger's book discusses the implications of this concept. Along the way, Hiesinger illustrates how complex the genome is in comparison with the simple rule 110 automaton. Not only is the code richer and more complex, but it is also, due to transcription factor cascades that include feedback loops, a system of rules that, unlike rule 110, change over time with development. Therefore, according to Hiesinger, classical research in developmental biology that tries to map single genes (or a few genes) onto a specific function is misguided. He convincingly argues that the textbook examples of such relationships (e.g., genes coding for cell adhesion molecules involved in establishing synaptic specificity) are probably the exception rather than the rule.

Interestingly, the implication of the unfolding hypothesis for research on artificial intelligence is very similar: to stop treating intelligent systems like engineered systems whose rules can be fully designed. Since the connection between the generative rules and the resulting endpoint cannot be understood unless their unfolding in time is observed, Hiesinger favors research that embraces this limitation. He suggests building models based on a to-be-optimized ("genetic") code and, letting go of full control, making them unfold in time to generate an artificial intelligence. Of course, this idea is reminiscent of the existing field of evolutionary algorithms. However, in classic evolutionary algorithms, evolving properties of the code are more or less directly mapped onto properties of the network or the agent. If I understood the book correctly, it would be in Hiesinger's spirit to make this mapping more indirect through developmental steps that allow for higher complexity, even though this would also obscure the mechanistic connection between rules and models.
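As a toy illustration of such an indirect mapping (my own hypothetical example, not something proposed in the book), the sketch below evolves a "genome" that is simply an 8-bit elementary cellular automaton rule. Selection only sees the "phenotype", the row obtained after unfolding the rule for 64 steps, while mutation only acts on the genome, so the link between the two runs through development. The fitness function (density of active cells close to 0.5), the hill-climbing scheme and all parameters are arbitrary assumptions.

```python
import numpy as np


def develop(genome: int, width: int = 64, steps: int = 64) -> np.ndarray:
    """Hypothetical 'development': unfold an 8-bit rule (the genome) from one seed cell."""
    row = np.zeros(width, dtype=np.int64)
    row[width // 2] = 1
    for _ in range(steps):
        idx = 4 * np.roll(row, 1) + 2 * row + np.roll(row, -1)
        row = (genome >> idx) & 1
    return row  # the 'phenotype' is only the final row


def fitness(genome: int, target_density: float = 0.5) -> float:
    """Hypothetical fitness: closeness of the phenotype's density of 1s to a target."""
    return -abs(float(develop(genome).mean()) - target_density)


def evolve(generations: int = 200, seed: int = 0) -> tuple[int, float]:
    """Simple hill climbing; mutations flip one bit of the genome, never the phenotype."""
    rng = np.random.default_rng(seed)
    genome = int(rng.integers(0, 256))
    best = fitness(genome)
    for _ in range(generations):
        mutant = genome ^ (1 << int(rng.integers(0, 8)))
        f = fitness(mutant)
        if f >= best:  # keep mutants that are at least as good
            genome, best = mutant, f
    return genome, best


best_genome, best_fitness = evolve()
print(best_genome, best_fitness)
```

Even in this tiny example, the loop never inspects the genotype-to-phenotype mapping itself; finding out what a genome "means" requires running `develop`.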

Overall, I find Hiesinger's approach interesting. He shows mastery of other fields as well, and he makes a convincing point that the idea of the unfolding code, the self-assembling brain, is reasonable; he also brings up examples of research that goes in this direction. However, as a note of caution to myself, accepting the idea of self-assembly seemed a bit like giving in when faced with complexity. There is a long history of complexity research that concluded that things are too complex to be understood. Giving in resulted in vague names for the complex phenomena, which seemed to explain away the unknown but in reality only gave it a name. For example, the concepts of emergence, autopoiesis or the free energy principle are, in my opinion, relatively abstract and non-concrete concepts that contributed to halting effective research by preventing incremental progress on more tractable questions. I get similar vibes when Hiesinger states that the connections between the self-organizing rules and the resulting product are too complex to be understood and require unfolding in time. The conclusion from this statement is either that everything is solved, because the final explanation is the unfolding in time of a code that cannot be understood, or that nothing can be solved, because it is too complex. In both cases, there seems to be some sort of logical dead end. But this is just a note of caution to myself.

So, what is the use of the unfolding hypothesis about the organization and self-assembly of the brain? I think it is useful because it might help guide future efforts. I agree with Hiesinger that the field of "artificial intelligence" should shift its focus to self-organized and growing neuronal networks. In my opinion, work on evolutionary algorithms, actor-based reinforcement learning (e.g., something called neuroevolution), neural architecture search or, more generally, AutoML goes in the right direction. Right now it seems a long shot to say this, but my guess is that these forms of artificial neuronal networks will become dominant within 10 years, potentially replacing artificial neuronal networks trained with backpropagation. After finishing the write-up, I came across a blog post by Sebastian Risi that is a good starting point, with up-to-date references on self-assembling algorithms from the perspective of computer science and machine learning; check it out if you want to know more.

For neurobiology, on the other hand, the unfolding hypothesis means that understanding the brain requires understanding its self-assembly. As Hiesinger stresses, self-assembly can happen during development, but it can also happen in the developed circuit through neuronal plasticity (synaptic plasticity on short and long time scales, as well as intrinsic plasticity). I wrote about this self-organizing aspect of neuronal circuits in last year's write-up. Beyond that, if we were to accept the unfolding hypothesis as central to the organization of the brain, we would also have to drop some of the beautiful models of the brain that are based on engineering concepts like modularity, for example the idea of the cortical column, the canonical microcircuit, or the concept of segregated neuronal cell types. All of these concepts have obviously been very useful frameworks or hypotheses for advancing our understanding of the brain, but if unfolding is indeed the main principle of the brain's assembly, these engineering concepts are unlikely (although not impossible) to turn out to be true.

It is possible that most of the ideas are already contained in the first few pages; the rest of the book is less dense and often feels a bit redundant. But the historical perspective at the beginning in particular, and also some later discussions, are very interesting. Language-wise, the book could have benefited from a bit more intervention by the editor to avoid unnatural-sounding sentences, especially during the first couple of pages. But this is only a minor drawback of an otherwise clear and nice presentation.

The book is structured into ten "seminars", each of which is a slightly confusing mix of book chapter and lecture. Each "seminar" is accompanied by a staged discussion between four actors: a developmental biologist, an AI researcher, a circuit neuroscientist and a robotics engineer (see the photo above). In theory, this is a great idea. In practice, it works only half of the time, and the book loses a bit of its natural flow because it somewhat lacks direction. However, these small drawbacks are acceptable because the main ideas are interesting and enthusiastically presented.

Altogether, Hiesinger's book is worth reading, and I can recommend it to anybody interested in the intersection of biological brains, artificial neuronal networks and self-organized systems.

## Large-scale calcium imaging & noise levels

Calcium imaging based on two-photon scanning microscopy is a standard method to record the activity of neurons in the living brain. Due to the point-scanning approach, sampling speed is limited, and the dwell time on each neuron decreases as the number of recorded neurons increases. Therefore, one needs to trade off the number of quasi-simultaneously imaged neurons against the shot noise level of these recordings.

To give a simplified example, one can distribute the laser power in space and time over 100 neurons at 30 Hz, or over 1000 neurons at 3 Hz. Due to the lower sampling rate, each of the 1000 neurons contributes fewer photons per second, so their signal-to-noise ratio (SNR) decreases as well.

A standardized noise level

To compare the shot noise levels across recordings, in our recent paper (Rupprecht et al., 2021) we took advantage of the fact that the slow calcium signal is typically very similar between adjacent frames, so that frame-to-frame differences are dominated by shot noise. Therefore, the noise level can be estimated by

$\nu = \frac{Median_t \mid \Delta F/F_{t+1} - \Delta F/F_t \mid}{\sqrt{f_r}}$

The median excludes outliers that stem from the fast onset dynamics of calcium signals. The normalization by the square root of the frame rate $f_r$ makes the metric comparable across datasets with different frame rates.

Why the square root? Because shot noise follows Poisson statistics: at a higher frame rate, fewer photons are collected per frame, and the relative noise of a single frame increases with the square root of the frame rate. The only downside of this measure is that the units seem a bit arbitrary (% for dF/F, divided by the square root of Hz), but this does not make it less useful. To compute it on a raw dF/F trace (in percent dF/F, without neuropil subtraction), simply use this one-liner in Matlab:

```matlab
noise_level = median(abs(diff(dFF_trace)))/sqrt(framerate);
```

Or in Python:

```python
import numpy as np
noise_level = np.median(np.abs(np.diff(dFF_trace)))/np.sqrt(framerate)
```
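A quick sanity check of the frame-rate normalization, as a minimal sketch under a toy assumption: the trace contains only white shot noise whose per-frame standard deviation grows with the square root of the frame rate (same photon flux, fewer photons per frame). Under this assumption, $\nu$ should come out the same at 3 Hz and at 30 Hz. The noise constant `c` is hypothetical.

```python
import numpy as np


def noise_level(dff_percent: np.ndarray, framerate: float) -> float:
    """Standardized shot-noise level nu for one dF/F trace given in percent."""
    return float(np.median(np.abs(np.diff(dff_percent))) / np.sqrt(framerate))


rng = np.random.default_rng(0)
c = 1.0  # hypothetical noise constant (% dF/F per sqrt(Hz))
nus = {}
for fr in (3.0, 30.0):
    # Toy assumption: per-frame noise SD scales with sqrt(frame rate)
    trace = rng.normal(0.0, c * np.sqrt(fr), size=100_000)
    nus[fr] = noise_level(trace, fr)
print(nus)  # both values similar, i.e. independent of the frame rate
```

For white Gaussian noise with per-frame standard deviation σ, the median absolute frame-to-frame difference is about 0.954·σ (√2 times 0.6745, the median of |N(0,1)|), so here ν ≈ 0.954·c at both frame rates. Real traces additionally contain calcium transients, which the median largely ignores.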

If you want to know more about this metric, check out the Methods section of our paper for more details (bioRxiv / Nature Neuroscience, subsection "Computation of noise levels").

The metric $\nu$ comes in handy if you want to compare the shot noise levels between calcium imaging datasets and understand whether noise levels are relatively high or low. So, what is a “high” noise level?

Comparison of noise levels and neuron numbers across datasets

I collected a couple of publicly available datasets (links and descriptions in the appendix below) and extracted both the number of simultaneously recorded neurons and the shot noise level $\nu$. Each data point stands for one animal, except for the MICrONS dataset, where each data point stands for a separate session in the same animal.

As a reference, I used the Allen Brain Institute Visual Coding dataset. For excitatory neurons, typically 100-200 neurons were recorded, with a standardized noise level of 1 (units omitted for simplicity). If the photons are distributed across a $k$-fold larger number of neurons, the shot noise level should increase by a factor of $\sqrt{k}$ (indicated by the black line). Datasets with inhibitory neurons (de Vries et al., red) have, by experimental design, fewer neurons and therefore lie above the line.

A dataset that I recorded in zebrafish, with typically 800-1500 neurons per recording, lies pretty much on this line. So does the MICrONS dataset, where a mesoscope was used to record from several thousand cortical neurons simultaneously at the cost of a lower frame rate and therefore higher noise levels, and so does the dataset by Sofroniew et al., which recorded ca. 3000 neurons, all from one plane in a large FOV.

Two datasets acquired by Pachitariu and colleagues stand out by pushing the number of simultaneously recorded neurons. In 2018, this came at the expense of increased noise levels (pink). In 2019 (a single mouse; grey), the noise level was impressively low despite ca. 20,000 simultaneously recorded neurons.

In regular experiments, noise levels should not be minimized at the cost of physiological damage, such as laser-induced photodamage or problems due to overexpression of indicators. For example, the mouse from the MICrONS dataset was later used for dense EM reconstruction; any sort of damage to the tissue, which might be invisible at first glance, could complicate the subsequent diffusive penetration with heavy metals or the cutting of nanometer-thick slices. The bottom line: there are often good reasons not to go for the highest signal yield.

Spike inference for high noise levels

To give an idea of what such noise levels look like, here is an example from the MICrONS dataset. Due to the noisiness of the recordings (noise level of ca. 8-9), only large transients can be reliably detected. I used spike inference with CASCADE to de-noise the recording. This example also makes clear that CASCADE extracts useful information but cannot recover anything close to single-spike precision at such a noise level.

Above are shown the smooth inferred spike rates (orange) and also the discrete inferred spikes (black). The discrete spikes (black) are nice to look at, but due to the high noise levels, the discretization into binary spikes is mostly overfitting to noise and should be avoided for real analyses. For analyses, I would use the inferred spike rate (orange).

Conclusion

The noise level $\nu$ can be used to quantitatively compare noise levels across recordings. I hope that other people can use this noise level metric $\nu$ for their work.

As a note of caution, $\nu$ should never be the sole criterion for data quality. Other factors like neuropil contamination, spatial resolution, movement artifacts, potential downsides of overexpression, etc. also play important roles. Low shot noise levels are not a guarantee for anything. High shot noise levels, on the other hand, are always undesirable.


Appendix: Details about the data shown in the scatter plot

de Vries et al. (2020; red and black) describes the Allen Visual Coding Observatory dataset. It includes recordings from more than 100 mice with different transgenic backgrounds in different layers of visual-related cortices. Red dots are datasets from mice that expressed calcium indicators only in interneurons, while black dots are datasets with cortical principal neurons of different layers. The datasets are highly standardized and have low shot noise levels (standardized level of ca. 1.0), with relatively few neurons per dataset (100-200).

Rupprecht et al. (unpublished; green) is a small dataset from transgenic Thy-1 mice in hippocampal CA1 that I recorded as a pilot earlier this year. The number of manually selected neurons is around 400-500, at a standardized noise level of 2.0-3.0. With virally induced expression and higher laser power (here, I used only 20 mW), lower noise levels and higher cell counts could easily be achieved in CA1.

Rupprecht et al. (2021; violet) is a dataset using the small-molecule dye indicator OGB-1 injected into the homolog of olfactory cortex in adult zebrafish. At low laser powers of ca. 30 mW, 800-1500 neurons were recorded simultaneously at a standardized noise level of 2.0-4.0.

Sofroniew et al. (2016; light green) recorded a bit more than 3000 neurons simultaneously at a relatively low imaging rate (1.96 Hz). Unlike all other datasets with >1000 neurons shown in the plot, they recorded from only a single but very large field of view. All neuronal ROIs had been drawn manually, which I really appreciate.

Pachitariu et al. (2018; pink) is a dataset recorded at a relatively low imaging rate (2.5 Hz), covering ca. 10,000 neurons simultaneously. The standardized noise level seems to be rather high according to my calculations.

Pachitariu et al. (2019; black) is a similar dataset that contains ca. 20,000 neurons, but at a much lower standardized noise level (4.0-5.0). The improvement compared to the 2018 dataset was later explained by Marius Pachitariu in this tweet.

MICrONS et al. (2021; red) is a dataset from a single mouse, with each dot representing a different session. Eight imaging planes were recorded simultaneously at laser powers that would not damage the tissue, in order to preserve the brain for later slicing, with the ultimate goal of imaging the ultrastructure with electron microscopy. The number of simultaneously imaged neurons comes close to 10,000, resulting in a relatively high standardized noise level of 7.0-10.0.
[Update, November 2021] As became clear after a discussion with Jake Reimer on GitHub, the MICrONS data that I used were not properly normalized: they were not proper dF/F traces but background-subtracted traces. Unfortunately, the noise measure for this dataset is therefore not very meaningful. My guess is that the true noise level is of the same order of magnitude as shown in the plot above, but I cannot tell for sure.

The black line indicates how the noise level scales with the number of neurons. For $n_1 = 150$ neurons (Allen dataset, de Vries et al.), a standardized noise level of $\nu_1 = 1.0$ can be assumed. For higher numbers of neurons $n_2$, the noise level $\nu_2$ scales as $\nu_2 = \nu_1 \cdot \sqrt{n_2/n_1}$. Deviations from the line indicate where recording conditions were better or worse than these "typical" conditions.
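The reference line can be computed directly from this scaling rule. A minimal sketch, using the Allen reference values $n_1 = 150$ and $\nu_1 = 1.0$ as defaults; the function name is my own, and the value 4.5 is simply the midpoint of the 4.0-5.0 range given above for the Pachitariu et al. (2019) dataset.

```python
import numpy as np


def reference_noise(n_neurons, n_ref: float = 150.0, nu_ref: float = 1.0):
    """Noise level expected if the reference photon budget (n_ref neurons at
    standardized noise nu_ref) were spread over n_neurons neurons."""
    return nu_ref * np.sqrt(np.asarray(n_neurons, dtype=float) / n_ref)


n = np.array([150, 1000, 10_000, 20_000])
print(reference_noise(n).round(2))

# Deviation from the line: ratio of measured to expected noise level
ratio = 4.5 / reference_noise(20_000)  # 4.5 = midpoint of the 4.0-5.0 range (2019 dataset)
print(round(float(ratio), 2))
```

A ratio below 1 means that a recording is less noisy than expected from simply redistributing the photon budget of the reference recording over more neurons.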

## 5 reasons why to use Cascade for spike inference

Our paper "A database and deep learning toolbox for noise-optimized, generalized spike inference from calcium imaging" is out now in Nature Neuroscience. It consists of a large and diverse ground truth database with simultaneous calcium imaging and juxtacellular recordings from almost 300 neurons. We used the database to train a supervised algorithm ("Cascade") to infer spike rates from calcium imaging data.

If you are into calcium imaging, here are 5 reasons why you should use Cascade.

1. You don’t have to install it

You are not familiar with Python? Or you don't want to install dependencies? No problem. Click this link. The link will bring you to an online interface where you can work with Cascade. You can upload your data (dF/F traces) as *.mat or *.npy files and download the results (inferred spike rates). Trying out the package is as quick as it gets. The online interface is a Google Colaboratory notebook and therefore runs on servers provided by Google for free.

However, you can also install Cascade locally on your computer. Just go to the GitHub page and follow the installation instructions (tested on Ubuntu, Windows and Mac). This might be useful if you want to integrate spike inference into an existing workflow. People, including myself, have used Cascade together with CaImAn or Suite2p. You can take the dF/F output of these packages and apply the Cascade "predict" function as shown in the demo scripts. It's a single-line addition to your pipeline.

Personally, I used Cascade with a GPU for the analyses in the paper. For daily work with calcium imaging recordings in mice (typically a few hours of recordings and a few hundred neurons), I run it on a computer without a dedicated GPU because it's fast enough on any reasonable CPU (seconds to minutes). Sometimes, I also use the online Colaboratory notebook. The results are identical whether I use a local installation or the notebook, since the code is identical.

2. You don’t have to tune parameters

For each neuron in a dataset, the algorithm automatically detects the noise level and chooses an appropriately trained model. The models are trained across many different conditions and indicators, so that they ideally generalize to unseen data. Check out Figure 3 in the paper if you want to know more about the details.
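The idea of choosing a model per neuron based on its noise level can be illustrated with a small, hypothetical sketch. This is not Cascade's code: the set of trained noise levels (`trained_levels`) and the nearest-level selection rule are assumptions for illustration, and the traces are synthetic white noise scaled to a chosen $\nu$.

```python
import numpy as np


def noise_levels(dff_percent: np.ndarray, framerate: float) -> np.ndarray:
    """Standardized noise level nu for each row (neuron) of a neurons x time array."""
    return np.median(np.abs(np.diff(dff_percent, axis=1)), axis=1) / np.sqrt(framerate)


def pick_models(nu: np.ndarray, trained_levels: np.ndarray) -> np.ndarray:
    """Index of the trained model whose noise level is closest to each neuron's nu."""
    return np.argmin(np.abs(nu[:, None] - trained_levels[None, :]), axis=1)


trained_levels = np.arange(2, 10)  # hypothetical: one model per integer noise level 2..9
framerate = 30.0
target_nu = np.array([2.0, 5.0, 8.0])
# For white Gaussian noise, median|x[t+1] - x[t]| is about 0.954 * SD,
# so this SD produces synthetic traces with the intended nu.
sd = target_nu * np.sqrt(framerate) / 0.954
rng = np.random.default_rng(1)
traces = rng.normal(0.0, 1.0, size=(3, 50_000)) * sd[:, None]

nu = noise_levels(traces, framerate)
chosen = trained_levels[pick_models(nu, trained_levels)]
print(nu.round(2), chosen)
```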

To get started, you have to choose a model based on the frame rate of your recordings and on the temporal resolution that you want to achieve with spike inference. The FAQ, which can be found both in the GitHub Readme and at the end of the Colab Notebook, gives guidance if any doubts remain.

3. Estimate absolute spike rates

Have you ever wondered whether a specific calcium transient corresponded to a single action potential or a burst of many action potentials? At least it would be nice to have some estimate of spike rates for a given recording. Cascade gives you this estimate.

The next question is a bit trickier: how precise is that estimate? It is as precise as it can get when you apply an algorithm to calcium imaging data for which you do not have associated ground truth. We have quantified this precision in the paper in terms of correlation (=variance explained), absolute errors and typical biases (see Figure 3 and Extended Data Figure 4). In the end, the typical errors depend on data quality and noise levels. However, you should expect that the true spike rate might be as low as 0.5 times or as high as 2.0 times the spike rate estimated by Cascade. Averaged across neurons, the error will be lower. This is not single-spike precision, but, as bad as it sounds, it is as good as it gets. The imprecision is, among other things, due to the unpredictable heterogeneity of the spike-calcium relationship across neurons.

However, absolute estimates with a certain imprecision, which you get from Cascade, are still better than values that have no immediate meaning (dF/F scores).

4. Improve temporal precision

Spike inference, also referred to as temporal "deconvolution" of calcium recordings, improves the temporal resolution by removing the slow calcium transient. The slower the calcium transient, the larger the improvement through deconvolution. And, yes, the algorithm generalizes well across short and long time constants.
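Why removing the slow transient sharpens timing can be seen in a classical toy model that is different from Cascade (which is a supervised network trained on ground truth): a first-order autoregressive (AR(1)) indicator, $c_t = \gamma c_{t-1} + s_t$ with $\gamma = e^{-1/(f_r \tau)}$. Without noise, this model can be inverted exactly, recovering the spike times from a trace that stays elevated for tens of frames after each spike. The frame rate, decay time and spike times below are arbitrary assumptions.

```python
import numpy as np


def calcium_from_spikes(spikes: np.ndarray, gamma: float) -> np.ndarray:
    """Toy AR(1) indicator: c_t = gamma * c_{t-1} + s_t (starting from c = 0)."""
    c = np.zeros(len(spikes))
    prev = 0.0
    for t, s in enumerate(spikes):
        prev = gamma * prev + s
        c[t] = prev
    return c


def deconvolve(c: np.ndarray, gamma: float) -> np.ndarray:
    """Invert the AR(1) model: s_t = c_t - gamma * c_{t-1} (exact only without noise)."""
    s = c.copy()
    s[1:] -= gamma * c[:-1]
    return s


framerate, tau = 30.0, 1.0  # hypothetical frame rate (Hz) and decay time (s)
gamma = np.exp(-1.0 / (framerate * tau))
spikes = np.zeros(90)
spikes[[10, 14, 60]] = [1, 1, 2]  # two close spikes, then two spikes in one frame
c = calcium_from_spikes(spikes, gamma)
s_hat = deconvolve(c, gamma)
print((c > 0.5).sum(), "frames with elevated calcium")
print(np.flatnonzero(s_hat > 0.5))  # -> [10 14 60]
```

With real shot noise, this exact inversion amplifies the noise considerably, which is why practical methods need regularization or, as in Cascade's case, a model learned from ground truth.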

Applications where I have used Cascade myself to achieve improved temporal resolution:

a) Detection of fast sequences on a sub-second time scale (check out Figure 5e in our toolbox paper)

b) Detection of swimming-locked neuronal activity in head-fixed adult zebrafish. Adult zebrafish move on a sub-second time scale, faster than the calcium indicators we used in their brains at room temperature.

c) Locking of neuronal activity to oscillations in the hippocampus (ongoing work, not yet published).

If your observations are masked by slow indicator transients, give it a shot and try out Cascade.

5. De-noise your recordings

When I used Cascade for the first time, I was surprised by how well it de-noised recordings. The Cascade paper is full of examples where this is validated with ground truth.

Or have a look at the spike rate predictions for the Allen Brain Observatory dataset (Figure 6b,f,g,h; Extended Data Figure 10). Shot noise is removed, and slower transients due to movement artifacts are rejected. The algorithm has simply learned very well what an action potential looks like.

One of the most striking examples, however, was when we tested the effect of Cascade on population imaging analyses (Supplementary Figure 11, the only figure not yet included in the preprint). To this end, we used NAOMi to simulate neuronal population patterns and analyzed how well the correlations between neuron pairs were predicted from dF/F traces (red) or from spike rates inferred with Cascade (blue). For dF/F traces, correlations were often overestimated (among other reasons, due to slow calcium transients) or underestimated (due to overwhelming noise). Pairwise correlations computed from Cascade's spike rates are simply closer to the true correlations.
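A toy simulation in plain NumPy (not NAOMi and not Cascade) can illustrate both biases. All settings are hypothetical: neuron B emits exactly the spikes of neuron A delayed by 200 ms, the calcium kernel is exponential with τ = 1 s, and the noise is additive Gaussian noise. The two spike trains are essentially uncorrelated at zero lag, but their slow calcium transients overlap, so the dF/F correlation is strongly inflated; adding heavy noise then pulls it down again.

```python
import numpy as np

rng = np.random.default_rng(0)
framerate, tau, n_frames = 30.0, 1.0, 30_000  # hypothetical recording parameters
lag = 6                                       # 6 frames = 200 ms at 30 Hz

spikes_a = rng.poisson(0.05, n_frames).astype(float)  # ~1.5 Hz Poisson firing
spikes_b = np.roll(spikes_a, lag)                      # the same spikes, 200 ms later

# Exponential calcium kernel, truncated after 5 decay time constants
kernel = np.exp(-np.arange(int(5 * tau * framerate)) / (tau * framerate))
dff_a = np.convolve(spikes_a, kernel)[:n_frames]
dff_b = np.convolve(spikes_b, kernel)[:n_frames]


def corr(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient of two traces."""
    return float(np.corrcoef(x, y)[0, 1])


def add_noise(x: np.ndarray, sd: float = 2.0) -> np.ndarray:
    """Add Gaussian noise (arbitrary units) to mimic heavy shot noise."""
    return x + rng.normal(0.0, sd, x.shape)


c_spikes = corr(spikes_a, spikes_b)                 # ground truth at zero lag
c_clean = corr(dff_a, dff_b)                        # inflated by slow transients
c_noisy = corr(add_noise(dff_a), add_noise(dff_b))  # deflated by noise
print(f"spikes {c_spikes:.2f} | dF/F clean {c_clean:.2f} | dF/F noisy {c_noisy:.2f}")
```

With these settings, the spike correlation should come out close to zero, the noise-free dF/F correlation around 0.8, and the noisy dF/F correlation well below that.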

Therefore, if you want to get the best out of your 2P calcium imaging data, I would recommend using Cascade. The result is simply closer to the true neuronal activity.

## Fast scanning, triplet states and photon yield

In point-scanning microscopy like two-photon or confocal microscopy, a focused laser beam is scanned across the field of view and thereby sequentially builds up an image of the object. In this blog post, I will discuss the idea that scanning faster across the field of view would increase the total amount of collected fluorescence. This idea is based on the experimental finding that high-intensity laser light can induce long-lived, non-fluorescent triplet states of the fluorophore molecule while scanning the sample; when the laser is scanned only slowly across the sample, it would therefore try to image fluorophores that are already in their "dark" triplet states. Imaging dark fluorophores would result in an overall decreased fluorescence yield. I tested this hypothesis directly with resonant scanning two-photon microscopy in living brain tissue together with typical fluorophores (GCaMP6f, OGB-1). The main result from these experiments is that I could not find a substantial effect of triplet states under these realistic conditions, and therefore no advantage in fluorescence yield from an increased scan speed.

Introduction

Fluorophores are molecules that can emit light after absorbing a photon. This fluorescence can be described as a state change of the fluorophore from the ground state to an excited state upon absorption of the incoming photon, followed by a state change from the excited state back to the ground state, together with emission of the outgoing photon (see this Wikipedia article). The lifetime of the fluorescent state is often a few nanoseconds. However, there are often additional excited states that are more long-lived, such as so-called triplet states. Transitions into the triplet state, and from there back to the ground state, are very unlikely because they require a flip of the electron spin, which is forbidden by quantum-mechanical spin selection rules. These low transition probabilities make triplet states less likely to occur during fluorescence microscopy but also render them longer-lived once they are reached. Such long-lived triplet states can have rather undesired consequences, since a fluorophore in the triplet state cannot take part in the fluorescence cycle and therefore becomes "dark" or non-functional from the experimenter's perspective.

When triplet states play a prominent role in laser-scanning microscopy, one can avoid their detrimental effects simply by scanning faster. Scanning slowly across the sample means that the same fluorophores are hit with light over and over within a short time window. In such a scenario, some of the fluorophores are already in a dark triplet state, resulting in lower overall fluorescence. Faster scanning avoids this problem, and it has been shown for confocal microscopy (Borlinghaus, 2006) as well as STED microscopy (Schneider et al., 2015; Wu et al., 2015) that a large signal increase can be achieved simply by scanning faster.

For two-photon microscopy, the situation is less clear. It has been shown that scanning, compared to non-scanning two-photon fluorescence correlation spectroscopy, results in a higher photon yield (Petrášek and Schwille, 2008), but this specific configuration cannot easily be translated to the two-photon imaging used by neuroscientists. However, one publication demonstrated that microsecond-long triplet states do play a large role in two-photon microscopy (Donnert et al., 2007). This paper has been cited as a standard reference to justify the advantages of fast scanning approaches (e.g., Chen et al., 2011), and some two-photon microscopes were explicitly designed with scanning schemes that reduce the triplet-induced loss of fluorescence yield (Gautam et al., 2015; Castanares et al., 2016; Karpf et al., 2020). However, another study showed an effect that seemed to contradict the Donnert et al. results (Ji et al., 2008), and the interpretation of both papers was discussed repeatedly, e.g., by Andrew Hires and on Labrigger. The consensus, if there was any, seemed to be that it probably depends on the sample. The experiments by Donnert et al. had been done with GFP fixed on a coverslip, and one can easily imagine that a fluorescent protein in vivo might behave differently.

So I decided to probe these results with typical samples and procedures used by neuroscientists – using video-rate calcium imaging of neurons in the living brain. I did not do this out of pure curiosity about photophysics, but because of the implications of these photophysics for the design of two-photon imaging modalities. If triplet states were indeed an important factor for two-photon imaging of such samples as suggested by Donnert et al., a slightly modified scanning scheme (or parallelized scanning schemes, as reviewed by Weissenburger and Vaziri, 2018, Figure 3) or adapted laser repetition rates would have huge benefits for the total fluorescence yield.

The main finding of the Donnert et al. paper was the induction of dark triplet states by the scanning laser, and a key prediction was that faster scanning would decrease the probability that an imaged fluorophore is in a dark state; in short, that faster scanning would increase fluorescence yield. Therefore, I wanted to systematically test whether fluorescence in two-photon microscopy depends on scan speed. To study this dependence experimentally in a typical in vivo sample, I used a resonant scanning microscope. For a resonant scanning microscope, the physical scan speed across the sample can be adjusted by changing the scan amplitude (often called the zoom setting). High zoom results in low scan speed across the sample, while low zoom results in high scan speed, enabling a simple characterization of scan speed versus fluorescence yield. According to Donnert et al., higher scan speed should result in higher fluorescence yield than slower scan speed when imaging the same spot.

Experimental approach

For a resonant scanning microscope, the scan speed can be derived from the position x(t) of the laser beam focus:

$x(t) = x_0 \cdot \sin(t \cdot f_{\mathrm{res}} \cdot 2\pi)$,

with the resonant frequency $f_{\mathrm{res}}$ = 8 kHz and the amplitude $x_0$ = 500 μm, which roughly determines the size of the field of view (FOV, here: 1000 μm). The scan speed $v(t)$ is the time derivative of the position $x(t)$ and therefore also follows a sinusoidal trajectory, which reaches its peak in the center of the FOV:

$v(t) = x_0 \cdot f_{\mathrm{res}} \cdot 2\pi \cdot \cos(t \cdot f_{\mathrm{res}} \cdot 2\pi)$,

with the speed at the center of the FOV given by

$v_{\max} = x_0 \cdot f_{\mathrm{res}} \cdot 2\pi$.

For higher zoom values (i.e., a smaller FOV), the amplitude $x_0$, and therefore $v_{\max}$, is reduced by the zoom factor (at least in the microscopy software I used for these experiments!). It is therefore possible to test a wide range of scan speeds just by changing the zoom setting of the resonant scanner. Since the resonant scanner can span zoom levels from 1x to 20x (at much higher zoom settings, a resonant scanner can become unstable due to the small scan amplitude), speeds between 25 μm/μs and 1.25 μm/μs are covered.

This speed should be compared to the lateral resolution of the microscope, which for the microscope used in this experiment is around 0.4 μm FWHM (full width at half maximum of the point spread function). To scan across this resolution-limited spot, the resonant scanner needs 0.016 – 0.32 μs. For a pulsed laser with a repetition rate of 80 MHz, i.e., 12.5 ns between two pulses, this corresponds to 1.3 – 26 pulses per resolution-limited spot. Let’s call this number the cumulative pulse count per resolution-limited spot (CPC) from now on.
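The zoom-to-CPC arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes, as stated above for the software I used, that the scan amplitude (and hence the peak speed) scales as 1/zoom; the function names are illustrative only.

```python
import math

# Parameters taken from the text above
F_RES_HZ = 8e3        # resonant frequency f_res
X0_UM = 500.0         # scan amplitude x_0 at zoom 1 (FOV ~1000 um)
FWHM_UM = 0.4         # lateral resolution (FWHM of the PSF)
REP_RATE_HZ = 80e6    # laser repetition rate (12.5 ns between pulses)

def v_max_um_per_us(zoom: float) -> float:
    """Peak scan speed at the FOV center; the amplitude is assumed to scale as 1/zoom."""
    v_um_per_s = (X0_UM / zoom) * F_RES_HZ * 2 * math.pi
    return v_um_per_s * 1e-6

def cumulative_pulse_count(zoom: float) -> float:
    """Number of laser pulses delivered while the focus crosses one FWHM."""
    dwell_s = FWHM_UM / v_max_um_per_us(zoom) * 1e-6
    return dwell_s * REP_RATE_HZ

for zoom in (1, 20):
    print(f"zoom {zoom:>2}: v_max = {v_max_um_per_us(zoom):5.2f} um/us, "
          f"CPC = {cumulative_pulse_count(zoom):4.1f}")
```

At zoom 1 this gives about 25 μm/μs and a CPC of about 1.3; at zoom 20, about 1.26 μm/μs and a CPC of about 25, matching the range above up to rounding.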

Intuitively, this means that each time a laser pulse tries to excite a fluorophore, this fluorophore has very recently already been hit by a number n of previous pulses, approximately n = CPC/2. If there is a chance that a pulse drives the fluorophore into a dark state that makes the fluorophore non-excitable for a handful of microseconds (e.g., a triplet state), then the CPC can be used to calculate the number of fluorophores remaining to be excited:

$N_{remaining} = N_0 \cdot e^{-CPC/\lambda}$

λ is a constant that depends on the applied laser intensity and on the fluorophore itself. More concretely, 1/λ is the rate (per pulse) at which a fluorophore goes into the dark state, so λ is roughly the average number of pulses a fluorophore receives before it becomes dark. The exponential decay with CPC should hold as long as these pulses arrive within a time window that is shorter than the recovery timescale of the dark state (for triplet states, ca. 1 μs). The main consequence is that the fluorescence yield is proportional to the number of remaining excitable fluorophores, $N_{remaining}$.

Therefore, if these events that generate dark triplet states were very unlikely (e.g., λ ~ 500), the effect of fast scanning would not really make a difference. However, if λ was ~1, the effects would be dramatic. How can we distinguish these scenarios?
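To get a feeling for the two regimes, one can plug the extreme CPC values of about 1.3 (zoom 1) and 26 (zoom 20) into the exponential model and compare the predicted fluorescence for fast and slow scanning. This is a minimal sketch that is only valid under the assumptions stated above (exponential depletion within the ~1 μs recovery window of the dark state):

```python
import math

def remaining_fraction(cpc: float, lam: float) -> float:
    """N_remaining / N_0 = exp(-CPC / lambda)."""
    return math.exp(-cpc / lam)

def fast_to_slow_ratio(lam: float, cpc_fast: float = 1.3, cpc_slow: float = 26.0) -> float:
    """Predicted fluorescence ratio between fast (zoom 1) and slow (zoom 20) scanning."""
    return remaining_fraction(cpc_fast, lam) / remaining_fraction(cpc_slow, lam)

for lam in (1.0, 10.0, 500.0):
    print(f"lambda = {lam:6.1f}: F(zoom 1) / F(zoom 20) = {fast_to_slow_ratio(lam):.3g}")
```

For λ = 500, fast scanning is predicted to be only about 5 % brighter, which would be hard to detect against biological variability; for λ = 1, the two predictions differ by more than ten orders of magnitude, i.e., essentially complete depletion at high zoom.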

Experimental implementation, part I

With a resonant scanning microscope, the experiment is easy to perform for a dye in a solution. You simply have to prepare a homogeneous sample, image it first with high zoom and then with low zoom, and compare the fluorescence in the central region where the scan speed is maximal.

Unfortunately, biological samples are not homogeneous, and also not stationary in the case of calcium indicators in living neurons. To image in a biological but still rather homogeneous sample, I used a transgenic GCaMP6f zebrafish line. I imaged in an explant of a dorsal forebrain region that I knew was labeled very densely and very homogeneously (described by Huang et al., 2020). But the neuronal somata with the nice nuclear exclusion of GCaMP generated a lot of undesired variability.

The solution to circumvent these sources of variability is systematic averaging. In a first approach, I took advantage of the fact that I had written large parts of the microscopy control software myself. I wrote a helper program that performed continuous imaging but randomly moved the stage to different positions every few seconds, thereby averaging across all these inhomogeneously labeled FOVs:

After a minute, the program automatically changed the zoom setting to a different, random value and again recorded a video with intermittent stage movements:

To see an effect, I performed these experiments with rather high laser power (50-80 mW below the objective) and at rather short wavelengths (typically 800 nm, which had been used previously by Donnert et al.).

In the following plot, each data point corresponds to one movie as shown above. Keep in mind that each zoom setting corresponds to a specific CPC; for example, a zoom setting of 1 corresponds to a CPC of 1.3. However, I could not see any dependence of the total fluorescence on the cumulative pulse count CPC (left), so it was not worth determining a numeric value for λ. Interestingly, the total fluorescence decreased slightly but clearly over time (right), but since the zoom setting sequence was randomized, this did not reflect any dependence on the CPC:

This finding also held true for repetitions of the same experiment at different locations of the fish brain and with slightly changed wavelength or average power:

Next, I performed the same experiment with a different fluorophore. I injected OGB-1 into the zebrafish homolog of the piriform cortex. This is a large and relatively uniform region in the zebrafish forebrain that allows the dye to diffuse more or less homogeneously, at least if you’re lucky. When I analyzed the experiments, I again found no visible effect, regardless of the laser power or the laser wavelength:

Together, these experiments strongly suggested that there is indeed no effect of triplet states and therefore no benefit of fast scanning to increase the fluorescence yield.
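Had there been a visible dependence, λ could have been estimated from the randomized-zoom recordings by a log-linear fit, since under the model above $\log F = \log F_0 - \mathrm{CPC}/\lambda$. The following is a hypothetical helper, not the analysis code I used, and it is demonstrated on synthetic data generated with a known λ, not on the recordings:

```python
import numpy as np

def estimate_lambda(cpc, fluorescence):
    """Least-squares fit of log F = log F0 - CPC / lambda.
    Returns (lambda, F0); lambda is inf if the fitted slope is not negative."""
    cpc = np.asarray(cpc, dtype=float)
    log_f = np.log(np.asarray(fluorescence, dtype=float))
    slope, intercept = np.polyfit(cpc, log_f, 1)
    lam = np.inf if slope >= 0 else -1.0 / slope
    return lam, np.exp(intercept)

# Synthetic data (NOT measurements): random zoom order, known lambda, 2 % noise
rng = np.random.default_rng(0)
cpc = rng.uniform(1.3, 26.0, size=40)
true_lambda = 50.0
fluorescence = 100.0 * np.exp(-cpc / true_lambda) * (1 + 0.02 * rng.standard_normal(cpc.size))
lam_hat, f0_hat = estimate_lambda(cpc, fluorescence)
print(f"estimated lambda = {lam_hat:.1f} (value used for the simulation: {true_lambda})")
```

Because the zoom order is randomized, slow drifts such as the decrease of fluorescence over time add scatter to such a fit but should not bias the slope systematically.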

Experimental implementation, part II

However, to further convince myself of this finding, which seemed to be at odds with my expectations from the Donnert et al. paper, I used a second, slightly more direct approach. Instead of comparing sequential recordings at different zoom levels, I thought it would be interesting to record the fluorescence while changing the zoom level continuously. This would enable me to measure the dependence of fluorescence on CPC more quickly and therefore also for a larger set of power settings and wavelengths. To continuously change the zoom setting, I used a DAQ board to generate a low-frequency sine signal that modulated the zoom level between 1 (peak of the sine) and ~20 (trough of the sine) with a period of ca. 13 seconds. (To keep track of the sine signal, I also connected the command signal to the second input channel of the microscope.) This is what these experiments looked like:

Of course, it is important to use not the entire FOV but only the central vertical stripe, which remains more or less stationary while the zoom changes. I used only a narrow vertical window of 7 pixels for the analysis. A single experiment yielded a result such as the one shown below, plotting CPC (top) and fluorescence (bottom). The fluorescence clearly shows some variability stemming from active neurons (after all, we’re still dealing with a living brain here!):

In the plot above, no obvious relationship between CPC and fluorescence can be seen, and when I changed the power at 920 nm between 15 mW and 60 mW (this was the maximum that I could get with this system), I could not see any effect. I therefore show here all experiments performed at 920 nm, pooled across power settings and across two fish (total of ca. 30 recordings, each a few minutes):

For some experiments, the maximum zoom level was around 14, which I extended for a subset of experiments to something closer to 20.
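The continuous-zoom analysis can be sketched as follows. The mapping from the DAQ sine to the zoom level is an assumption (here the zoom factor itself follows the sine, with zoom 1 at the peak and 20 at the trough), the 7-pixel stripe width is taken from the text, and the movie layout (frames, rows, columns, with the resonant axis along the columns) is hypothetical. Since the peak speed scales as 1/zoom, the CPC scales linearly with zoom, starting from about 1.3 at zoom 1.

```python
import numpy as np

ZOOM_MIN, ZOOM_MAX = 1.0, 20.0   # zoom range driven by the DAQ sine
PERIOD_S = 13.0                  # period of the zoom modulation
CPC_AT_ZOOM_1 = 1.3              # pulses per resolution-limited spot at zoom 1

def zoom_command(t_s):
    """Assumed mapping: zoom 1 at the peak of the sine, ZOOM_MAX at its trough."""
    mid = 0.5 * (ZOOM_MAX + ZOOM_MIN)
    half_range = 0.5 * (ZOOM_MAX - ZOOM_MIN)
    return mid - half_range * np.sin(2 * np.pi * np.asarray(t_s) / PERIOD_S)

def central_stripe_trace(movie, width=7):
    """Mean fluorescence per frame in a vertical stripe at the FOV center.
    movie: array (frames, rows, columns), resonant (fast) axis along the columns."""
    start = movie.shape[2] // 2 - width // 2
    return movie[:, :, start:start + width].mean(axis=(1, 2))

def bin_by_cpc(cpc, fluorescence, n_bins=10):
    """Average fluorescence in equally spaced CPC bins (NaN for empty bins)."""
    edges = np.linspace(cpc.min(), cpc.max(), n_bins + 1)
    idx = np.clip(np.digitize(cpc, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    means = np.array([fluorescence[idx == k].mean() if np.any(idx == k) else np.nan
                      for k in range(n_bins)])
    return centers, means

# Synthetic example: a movie whose brightness does not depend on CPC at all
frame_rate = 30.0
t = np.arange(0, 3 * PERIOD_S, 1 / frame_rate)
cpc = CPC_AT_ZOOM_1 * zoom_command(t)        # CPC scales linearly with zoom
rng = np.random.default_rng(1)
movie = 100.0 + 5.0 * rng.standard_normal((t.size, 32, 32))
centers, means = bin_by_cpc(cpc, central_stripe_trace(movie))
print(np.round(means, 1))
```

Without any dark-state effect, the binned curve is flat; a dependence as predicted by Donnert et al. would instead show up as a monotonic decrease of the bin means with CPC.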

I performed the same experiment at 800 nm. At this wavelength, I could also use higher laser power, simply because the laser provided more power in this wavelength range. However, the power cannot be increased arbitrarily. Above a certain threshold, which also depends on the duration of the exposure, all neurons across the zebrafish’s brain region become bright, resulting in a wave of high-calcium neurons that propagates through the brain. To avoid this, I used a maximum power of 75 mW at 800 nm. The result, again pooled across laser powers between 40 and 75 mW, showed an effect, albeit a subtle one:

Fluorescence was indeed slightly higher for very low CPCs. However, the effect was much smaller than expected from the in vitro experiments by Donnert et al. Overall, such a small effect, which in addition only appeared at the less relevant wavelength of 800 nm, seemed of little practical relevance.

Therefore, with fluorescence depending on CPC only in minor ways and only under atypical imaging conditions, the suggested triplet states do not seem to be relevant for in vivo calcium imaging; as a consequence, one would be ill-advised to assume that faster scanning increases the fluorescence yield of two-photon microscopy by avoiding these μs-long dark states.

As a side note, when I tried to analyze the power dependence of this weak effect observed at 800 nm, I came across a weird effect. There was indeed some sort of increased fluorescence at lower zoom levels. However, this increase came with a delay of several seconds during the above experiments, resulting in a hysteresis:

Due to this longer timescale, this effect has nothing to do with the short-lived μs triplet states but is something entirely different. Photophysics is really complicated! This additional observation made me stop the experiments, because I realized that these things would be more difficult to figure out, on top of a very small and probably irrelevant triplet effect.
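One simple way to expose such a hysteresis is to bin the fluorescence by CPC separately for the phases in which CPC increases and decreases; a delayed response then shows up as two branches that do not overlap. The sketch below is hypothetical: it assumes the same sine-driven zoom between 1 and 20 (CPC = 1.3 × zoom) and uses a synthetic trace with an arbitrary 2 s lag, not the recorded data.

```python
import numpy as np

def branch_means(cpc, fluorescence, edges):
    """Mean fluorescence per CPC bin, computed separately for samples recorded
    while CPC was increasing (zooming in) and while it was decreasing (zooming out)."""
    rising = np.gradient(cpc) > 0
    idx = np.digitize(cpc, edges) - 1          # values outside the edges get no bin
    n_bins = len(edges) - 1

    def means(mask):
        return np.array([fluorescence[mask & (idx == k)].mean()
                         if np.any(mask & (idx == k)) else np.nan
                         for k in range(n_bins)])

    return means(rising), means(~rising)

def cpc_of(t_s, period_s=13.0):
    """CPC for a zoom oscillating between 1 and 20 (assumed mapping, CPC = 1.3 * zoom)."""
    return 1.3 * (10.5 - 9.5 * np.sin(2 * np.pi * t_s / period_s))

# Synthetic illustration only: fluorescence follows CPC with an arbitrary 2 s lag
t = np.arange(0, 39.0, 1 / 30.0)
cpc = cpc_of(t)
fluo = 100.0 - 0.2 * cpc_of(t - 2.0)
up, down = branch_means(cpc, fluo, edges=np.linspace(5, 20, 4))
print("rising CPC: ", np.round(up, 2))
print("falling CPC:", np.round(down, 2))
```

For an instantaneous response, both branches would coincide; the gap between them grows with the lag relative to the modulation period.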

Conclusion

I did not find any evidence for a substantial effect of triplet states under typical conditions (calcium imaging of neurons with a Ti:Sa laser and <100 mW power at the sample). I therefore do not see a benefit of faster scanning in terms of fluorescence yield. The triplet states on the timescale of a few microseconds that had been observed by Donnert et al. for fixed samples do not seem to play a major role under the investigated conditions. It is always challenging to convincingly show the absence of an effect, but my experimental results convinced me not to investigate fast scanning and multiplexing schemes further as a means to increase fluorescence yield.

[Update August 2021: Christian Wilms pointed out that the most obvious difference between my experiments and the Donnert et al. study is that the dye molecules were freely diffusible in my experiments but fixed in the Donnert et al. experiments. Consistent with that, he also noticed that the paper from Ji et al., which found results contradictory to the Donnert et al. paper, was mostly based on experiments with freely diffusible dyes, while STED experiments, which clearly showed the effect, are mostly based on fixed fluorophores.]

Multiplexing or other modified distributions of excitation photons in time and space might still be able to increase fluorescence yield for certain fluorophores and conditions, but probably not under conditions similar to the ones I investigated.

The experiments described above are not fully systematic and cover only a specific parameter regime of excitation wavelengths, fluorophores and laser powers. However, anybody with a resonantly scanning two-photon microscope can easily reproduce these findings for any other scenario. Simply switch between high and low zoom settings and check whether the brightness in the center of the FOV changes substantially. Quantifying possible effects requires careful averaging, but a quick, qualitative confirmation or refutation of the above findings would be very easy for any experimenter.

Acknowledgements

The experiments described above were carried out in the lab of Rainer Friedrich at the FMI in Basel. I’m thankful to Christian Wilms, who encouraged me to analyze and write up these experiments after a discussion on Twitter.

## Research and Intuition

So far I have been very fortunate with my long-term scientific mentors and supervisors: both of them are kind, open, creative and stunningly intelligent. I could not wish for more. However, when asked about a role model, I would mention a person who, at a time when I was still studying physics, influenced my approach to research probably more than anyone else: Pina Bausch.

Pina Bausch was a dancer and choreographer who mostly worked in the small town of Wuppertal, Germany, where she developed her own way of modern dance. Her works are creative and inventive in very unexpected ways, and the way she explored body movements as a dancer struck me as surprisingly similar to what I think is research.

Research in its purest form is the exploration of the unknown, the discovery of what is not yet discovered, without a clear path ahead. The question that I’m working on in the broadest sense, “How does the brain work?”, enters the unknown very quickly as soon as you take the question seriously. How, in general, can we see what cannot be seen yet, how can we find ideas that do not yet exist?

Pina Bausch was a master in this art. Her craft was not science or biology but dance. However, I think one can learn some lessons from her. It was typical of her to explore her own movements and to “invent” new ones, like wrist movements or coordinated movements of the elbows and the head, or simply a slowed-down or delayed movement of the fingers. In everyday life we use a rather limited and predefined combination of motor actions, and it takes some creativity to come up with movements that are unexpected and new but still interesting. One way to find new ways to move would be to consciously become aware of one’s own patterns and limitations and then try to systematically break those rules. However, Pina Bausch performed this discovery process in a different way. Her research was guided not by intellectual deduction but by her intuition. In 1992, she said:

“Ich weiß nämlich immer, wonach ich suche, aber ich weiß es eher mit meinem Gefühl als mit meinem Kopf.”

“Because I always know what I’m searching for. But I know it with my heart and with my feeling rather than with my brain.”

This might come across as a bit naive at first glance. Sure, an artist uses her heart, a scientist uses his brain, that sounds more or less normal, doesn’t it? However, when I saw Pina Bausch do this kind of searching, that is, when she danced, I was very impressed.

She seemed to rely on her intuition in every single moment of her explorations; and when I heard her talk about it (unfortunately, I’m only aware of interviews in German without translation), it was also clear that she did not have and did not need an explanation of what was going on. Most impressively for me, her way of exploring the unknown struck me as similar to what is going on in a researcher, no matter the subject. What made her such an excellent researcher?

To me, it seems that the prerequisites of her impressive ability are the following: First of all, of course, a deeply ingrained knowledge of and skill in her art, together with an honest care for the details. There’s no intuition without experience and knowledge. Second, an openness to whatever random things might happen, whether coming from the outside or from within, and a willingness to embrace them. Third, an acceptance of the fact that she doesn’t really know what she’s doing. Or, to put this differently, a certain humility in the face of what is going to happen and what is going on in her own subconscious. I believe that these are qualities that also make for a good researcher in science.

This also reflects my own experience of doing research (at least partially). Even when I was working with mathematical tools, for example when I was modeling diffusion processes in inhomogeneous media during my diploma thesis, I had the impression that my intuition was always a couple of steps ahead of me. Often I could see the shape of the mathematical goal ahead of my derivations, and it would take me several days before I could put it down on paper.

Of course there are other ways to develop new ideas, and for some problems intuition also fails systematically (maybe complex systems?). And of course there are other kinds of research, for example the gradual optimization of methods, or the development of devices to solve a specific problem, or the broad and systematic screening of candidate genes or materials for a defined purpose.

These systematic and stepwise procedures are more predictable than “pure” research, and grant-based funding reinforces this kind of research. A grant proposal typically has a defined number of “aims”. The more clearly defined these aims are, the better the chances that the grant proposal is accepted. This makes sense. It would be ridiculous to fund a project with loosely defined aims, especially if other, competing proposals have a clear and realistic goal.

However, this necessary side effect of grant-based research narrows our perspective to the kind of research that can be described more or less clearly even before it is done. It also narrows the way we talk about research and about results. We do not actively encourage young researchers to use and develop their intuition, as if this had nothing to do with the scientific process. In grants, progress reports, talks and papers, we try to use very concise, precise language, sharp and clean as steel (often complemented by pieces of superficial math that are supposed to demonstrate precision), not only when describing our methods but also when describing and interpreting the results. This is not bad in itself, but it also shapes the way we think about research, and it can lead to a situation where we internally reject ideas or results that do not, at first, satisfy the desired clarity and cleanliness.

I think that researchers in “hard” sciences like neuroscience could also benefit from techniques that use intuitive thinking, and at least I have learnt a lot from the way Pina Bausch approached her subject of study using these techniques. Ultimately, understanding in neuroscience should always aim for descriptions in terms of words or math. But the path towards this goal does not need to be guided by these clear ways of thinking alone. In my experience, the power of intuition is only unleashed if we accept that we cannot really understand the process itself. Therefore, I see the humility that Pina Bausch showed towards her own intuitive thought process not simply as a virtue of a human being, but rather as a tool and a way of thinking that enables creativity.

## Online spike rate inference with Cascade

To infer spike rates from calcium imaging data for a time point t, knowledge about the calcium signal both before and after time t is required. Our algorithm Cascade (Github) uses by default a window that is symmetric in time and feeds this window into a small deep network to use the data points in the window for spike inference (schematic below taken from Fig. 2A of the preprint; CC-BY-NC 4.0):

However, if one wants to perform spike inference not as a post-processing step but rather during the experiment (“online spike inference”), it would be ideal to perform spike inference with a delay as short as possible. This would allow for example to use the result of spike inference for a closed-loop interaction with the animal.

Dario Ringach recently came up with this interesting problem. With the Cascade algorithm already set up, I was curious to check very specifically: How many time points (i.e., imaging frames) are required after time point t to perform reliable spike inference?

Using GCaMP/mouse datasets from the large ground truth database (the database is again described in the preprint), I addressed this question directly by training separate models. For each model, the time window was shifted such that a variable number of data points (between minimally 1 and maximally 32) were used for spike inference. Everything was evaluated at a typical frame rate of 30 Hz, and also at different noise levels of the recordings (color-coded below); a noise level of “2” is pretty decent, while a noise level of “8” is quite noisy – explained with examples (Fig. S3) and equations (Methods) again in the preprint.
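The shifted windows can be sketched as follows: for each time point t, the network input consists of `n_before` frames before and `n_after` frames after t, and the minimal delay of online inference is then roughly `n_after` divided by the frame rate. This is a hypothetical helper for illustration; Cascade’s own preprocessing code may organize the windows differently.

```python
import numpy as np

def make_windows(trace, n_before, n_after):
    """For every time point t, return the samples from t - n_before to t + n_after
    (shape: T x (n_before + 1 + n_after)); positions outside the recording are NaN."""
    trace = np.asarray(trace, dtype=float)
    padded = np.concatenate([np.full(n_before, np.nan), trace, np.full(n_after, np.nan)])
    return np.lib.stride_tricks.sliding_window_view(padded, n_before + 1 + n_after)

def inference_delay_ms(n_after, frame_rate=30.0):
    """Minimal delay of online inference: we must wait for n_after future frames."""
    return 1000.0 * n_after / frame_rate

windows = make_windows(np.arange(10), n_before=3, n_after=2)
print(windows.shape)
print(f"{inference_delay_ms(8):.0f} ms")
```

At 30 Hz, 8 frames after t correspond to about 267 ms, and 4 frames to about 133 ms; a symmetric window is the special case `n_before == n_after`.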

The results are quite clear: For low noise levels (black curve, SEM across datasets as corridor), spike inference seems to reach saturating performance (correlation with ground truth spike rates) at almost 8 frames. This would result in a delay of almost 8 × 33 ms ≈ 260 ms after a spiking event (dashed line).

But let’s have a closer look. The above curve was averaged across 8 datasets, mixing different indicators (GCaMP6f and GCaMP6s) and induction methods (transgenic mouse lines and AAV-based induction). Below, I looked into the curve for each single dataset (for the noise level of 2).

It is immediately clear that for some datasets fewer frames after t are sufficient for almost optimal spike inference, while for others they are not.

For the best datasets, optimal performance is already reached with 4 frames (left panel; delay of ca. 120 ms). These are datasets #10 and #11, which use the fast indicator GCaMP6f, here in addition expressed transgenically. The corresponding spike-triggered linear kernels (right side; copied from Fig. S1 of the preprint) are indeed faster than for other datasets.

Two datasets with GCaMP6s (datasets #15 and #16) stand out as non-ideal, requiring almost 16 frames after t before optimal performance is reached. Probably, expression levels in these experiments, which used AAV-based induction, were very high, resulting in calcium buffering and therefore slower transients. The corresponding spike-triggered linear kernels are indeed much slower than for the other GCaMP6s- or GCaMP6f-based datasets.

The script used to perform the above evaluations can be found on Cascade’s Github repository. Since each data point requires retraining the model from scratch, the script cannot be run on a CPU in reasonable time. On an RTX 2080 Ti, it took 2-3 days to complete.

Conclusions:

1. Only a few frames (down to 4 frames) after time t are sufficient to perform almost ideal spike inference. This is probably because the sharp rise of a spike-triggered calcium transient is more informative than its slow decay.
2. To optimize the experiment for online spike inference, it helps to use a fast indicator (e.g., GCaMP6f). Transgenic expression also seems to be an advantage, since indicator expression and calcium buffering are typically lower with transgenic expression than with viral induction, preventing a slowdown of the indicator due to overexpression.

## Heating up the objective for two-photon imaging

To image neurons in vivo with a large field of view, a large objective is necessary. This big piece of metal and glass is in indirect contact with the brain surface, with only water and maybe a cover slip in between. The objective touching the brain effectively results in local cooling of the brain surface through heat conduction (Roche et al., eLife, 2019; see also Kalmbach and Waters, J Neurophysiology, 2012). Is this a problem?

Maybe it is: Cooling by only a few degrees can reduce capillary blood flow and cause other side effects (Roche et al., eLife, 2019). It has also been shown (in slice work) that minor temperature changes can affect the activity of astrocytic microdomains (Schmidt and Oheim, Biophysical J, 2020), which might in turn affect neuronal plasticity or even neuronal activity.

For a specific experiment, I wanted to briefly test how such a temperature drop affects my results. Roche et al. used a commercial objective heating device with temperature controller, and a brief email exchange with senior author Serge Charpak was quite helpful to get started. However, the tools used by Roche et al. are relatively expensive. In addition, they used a fancy thermocouple element together with a specialized amplifier from National Instruments to probe the temperature below the objective.

Since this was only a brief test experiment, I was hesitant to buy expensive equipment that would maybe never be used again. As a first attempt, I wrapped a heating pad, which is normally used to keep mice at physiological temperature during anesthesia, around the objective; however, this way the immersion medium below the objective could only be heated to something like 28°C, quite a bit below the desired 37°C.

Therefore, I got in touch with Martin Wieckhorst, a very skilled technician from my institute. He suggested a more effective and very simple way to heat the objective. After a layer of insulation tape (Kapton tape, see picture below), we wrapped a constantan wire, which he had available from another project, in spirals around the objective body, followed by another layer of insulation tape. Then, using a lab power supply, we simply sent a current (ca. 1 A at 10 V) through the wire. The wire acts as a resistor (therefore, it is important that adjacent spirals do not touch each other) and produces heat that is taken up by the objective body.
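The heater is just Ohm’s law at work: at ca. 1 A and 10 V, the wire has a resistance of about 10 Ω and dissipates about 10 W as heat. The sketch below also estimates how much constantan wire gives 10 Ω; the resistivity (about 4.9e-7 Ω·m) is an approximate textbook value, and the 0.3 mm diameter is a hypothetical example, not the wire we used.

```python
import math

RHO_CONSTANTAN = 4.9e-7   # approximate resistivity of constantan in Ohm*m (assumption)

def heater_power_w(voltage_v, current_a):
    """Electrical power dissipated as heat in the wire: P = U * I."""
    return voltage_v * current_a

def wire_length_m(resistance_ohm, diameter_mm, rho=RHO_CONSTANTAN):
    """Length of round wire with the given resistance, from R = rho * L / A."""
    area_m2 = math.pi * (diameter_mm * 1e-3 / 2) ** 2
    return resistance_ohm * area_m2 / rho

voltage, current = 10.0, 1.0        # operating point mentioned in the text
resistance = voltage / current      # Ohm's law
print(f"R = {resistance:.0f} Ohm, P = {heater_power_w(voltage, current):.0f} W")
print(f"length for 0.3 mm wire: {wire_length_m(resistance, 0.3):.2f} m")
```

With these numbers, roughly 1.4 m of 0.3 mm wire would give 10 Ω. Constantan is convenient here because its resistance barely changes with temperature, so the heating power stays stable while the objective warms up.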

To measure the temperature below the objective, we needed a sensor that was as small as possible. A typical thermometer head would simply not fit into the space between objective and brain surface. We decided to use a thermistor or RTD (resistance temperature detector). How can we read out the resistance and convert it into temperature? Fortunately, Martin found an old heating block that contained a temperature controller (this one). Such controllers can typically read out standardized thermistors of different kinds or thermocouples.

Next, we bought the sensor itself, a PT100 platinum resistance sensor (I think it was this one) with a very small spatial footprint. Connecting the PT100 to the temperature controller is straightforward once you understand the three-wire connection scheme (explained here), which serves to eliminate the effect of the electrical resistance of the cables on the measurement. Then, we dipped the head of the PT100 into non-corrosive hot glue to prevent a short circuit of the PT100 resistor once it dips into the immersion medium. The immersion medium is at least partially conductive and would otherwise affect the measured resistance and therefore also the measured temperature. Once everything was set up, we checked the sensor in a water bath, using a standard thermometer for calibration. Another way to calibrate would be an ice bath, which is stably at 0°C.
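To illustrate what such a controller does internally, the sketch below converts a PT100 resistance to temperature with the standard Callendar-Van Dusen equation (IEC 60751 coefficients, valid above 0 °C) and shows why lead resistance matters. The lead resistance of 0.5 Ω per wire is a hypothetical value, and the three-wire function only captures the idealized principle (equal lead resistances cancel); real controllers implement this with bridges or current sources.

```python
import math

# Callendar-Van Dusen coefficients for platinum RTDs (IEC 60751), valid for T >= 0 degC
R0 = 100.0          # PT100: 100 Ohm at 0 degC
A = 3.9083e-3       # 1/degC
B = -5.775e-7       # 1/degC^2

def pt100_resistance(t_c):
    """Resistance of an ideal PT100 at temperature t_c (degC, t_c >= 0)."""
    return R0 * (1 + A * t_c + B * t_c ** 2)

def pt100_temperature(r_ohm):
    """Invert R = R0 (1 + A T + B T^2) for T >= 0 degC (physically relevant root)."""
    return (-A + math.sqrt(A ** 2 - 4 * B * (1 - r_ohm / R0))) / (2 * B)

def three_wire_resistance(r_13, r_23):
    """Idealized three-wire principle: r_13 = R_lead1 + R_sensor + R_lead3 is measured
    through the sensor, r_23 = R_lead2 + R_lead3 through the two wires attached to the
    same sensor terminal. With equal lead resistances, the difference is R_sensor."""
    return r_13 - r_23

r_lead = 0.5                                  # hypothetical resistance per lead (Ohm)
r_sensor = pt100_resistance(37.0)             # sensor at 37 degC
r_13 = r_lead + r_sensor + r_lead             # what a two-wire readout would see
r_23 = r_lead + r_lead

print(f"two-wire:   {pt100_temperature(r_13):.2f} degC")
print(f"three-wire: {pt100_temperature(three_wire_resistance(r_13, r_23)):.2f} degC")
```

With 0.5 Ω per lead, a two-wire readout would overestimate 37 °C by about 2.6 °C (the PT100 changes by only about 0.39 Ω per °C in this range), while the three-wire difference recovers the true value.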

The contact surface of my objective with the immersion medium is mostly glass and a bit of plastic; therefore, it took roughly 30-60 min until the temperature below the objective reached a stable value of around 37°C. To prevent the heat from spreading throughout the whole microscope, we used a plastic objective holder, which conducts heat poorly.

Overall, I found this small project very instructive. First, I was surprised to learn how reliable and fast an objective heater based on a simple resistive wire can be. Heating the metal part of the objective to >60°C within minutes was no problem. However, it took much longer until the non-metal parts of the objective also reached the desired temperature. I was also glad to see that the objective (16x Nikon) was not damaged and that its resolution during imaging was not affected by the increased temperature!

Building a very small temperature sensor was more complicated, partly due to the standard three-wire scheme for measuring with thermistors. However, all components that we used were relatively cheap, and I think that these temperature measurement devices are interesting tools that could also be used for other experiments, e.g., to monitor body temperature or to build custom temperature controllers for water baths in slice experiments.

## Temporal dispersion of spike rates from deconvolved calcium imaging data

On Twitter, Richie Hakim asked whether the toolbox Cascade for spike inference (preprint, Github) induces temporal dispersion of the predicted spiking activity compared to ground truth. This kind of temporal dispersion had been observed in a study from last year (Wei et al., PLoS Comp Biol, 2020; also discussed in a previous blog post), suggesting that analyses based on raw or deconvolved calcium imaging data might falsely suggest continuous sequences of neuronal activations, while the true activity patterns are coming in discrete bouts.

To approach this question, I used one of our 27 ground truth datasets (the one recorded for the original GCaMP6f paper). From all recordings in this dataset, I detected events that exceeded a certain ground truth spike rate. Next, I assigned these extracted events to 3 groups and systematically shifted the detected events of groups 1 and 3 by 0.5 seconds in opposite directions. Note that this is a short shift compared to the timescale investigated in the Wei et al. paper. This is what the ground truth looks like. It is clearly not a continuous sequence of activations:

To evaluate whether the three-bout pattern would turn into a continuous sequence after spike inference, I took the dF/F recordings associated with the above ground truth recordings and inferred the spike rates with Cascade’s global model for excitatory neurons (a pretrained network that is available with the toolbox). There is indeed some dispersion due to the difficulty of inferring spike rates from noisy data. But the three bouts are very clearly visible.

This is even more apparent when plotting the average spike rate across neurons:

Therefore, it can be concluded that there are conditions and existing datasets where discrete activity bouts can be clearly distinguished from sequential activations based on spike rates inferred with Cascade.
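The construction of the shifted ground truth can be sketched as follows. The threshold, the ±2 s snippet length and the random assignment of events to the three groups are assumptions for illustration; the actual selection in the analysis may differ. The same cut positions would be applied to the dF/F traces, so that spike rates inferred from the dF/F snippets can be compared with the shifted ground truth.

```python
import numpy as np

def event_cuts(rates, frame_rate, threshold, half_window_s=2.0, shift_s=0.5, seed=0):
    """Find supra-threshold event onsets, assign them randomly to 3 groups, and return
    snippet start indices such that events of group 0 appear shift_s earlier and events
    of group 2 shift_s later than events of group 1 (which sit at the snippet center)."""
    rates = np.asarray(rates, dtype=float)
    half = int(round(half_window_s * frame_rate))
    shift = int(round(shift_s * frame_rate))
    above = rates > threshold
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1      # first frame above threshold
    onsets = onsets[(onsets >= half + shift) & (onsets < rates.size - half - shift)]
    groups = np.random.default_rng(seed).integers(0, 3, size=onsets.size)
    offsets = (groups - 1) * shift                             # -shift, 0, +shift frames
    return onsets - offsets - half, groups, 2 * half + 1

def cut(trace, starts, length):
    """Stack equally long snippets; apply the same cuts to spike rates and to dF/F."""
    trace = np.asarray(trace)
    return np.stack([trace[s:s + length] for s in starts])

# Synthetic illustration: isolated single-frame events every 200 frames
rates = np.zeros(3000)
rates[np.arange(200, 2800, 200)] = 10.0
starts, groups, length = event_cuts(rates, frame_rate=30.0, threshold=5.0)
snippets = cut(rates, starts, length)
print(snippets.shape)
```

Averaging the snippets across events then yields three discrete peaks at -0.5 s, 0 s and +0.5 s; the same average over the inferred spike rates shows how much of this structure survives spike inference.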

This analysis was performed on neurons at a standardized noise level of $2\%\cdot\mathrm{Hz}^{-1/2}$ (see the preprint for a proper definition of the standardized noise level). This is a typical and very decent noise level for population calcium imaging. However, if we perform the same analysis on the same dataset but with a relatively high noise level of $8\%\cdot\mathrm{Hz}^{-1/2}$, the resulting predictions are indeed much more dispersed, since the dF/F patterns are too noisy for more precise predictions. The average spike rate still shows three peaks, but they ride on top of a more broadly distributed, seemingly persistent increase of the spike rate.
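For reference, the standardized noise level can be computed from a dF/F trace as the median absolute frame-to-frame difference divided by the square root of the frame rate. The function below is a sketch of that definition as I understand it from the preprint, not Cascade’s own implementation, and it assumes dF/F is given as a fraction (the factor 100 converts to percent). The second helper additionally assumes purely white Gaussian noise, which real recordings are not, and converts a target noise level into a per-frame noise standard deviation.

```python
import numpy as np

MEDIAN_ABS_STD_NORMAL = 0.6744897501960817   # median of |X| for X ~ N(0, 1)

def standardized_noise_level(dff, frame_rate):
    """Median absolute frame-to-frame change of dF/F (given as a fraction),
    divided by sqrt(frame rate) and expressed in percent."""
    dff = np.asarray(dff, dtype=float)
    return 100 * np.nanmedian(np.abs(np.diff(dff, axis=-1)), axis=-1) / np.sqrt(frame_rate)

def white_noise_sigma_for_level(noise_level, frame_rate):
    """Per-frame standard deviation of white Gaussian noise that yields a given
    standardized noise level (the difference of two samples has std sigma * sqrt(2))."""
    return noise_level * np.sqrt(frame_rate) / (100 * MEDIAN_ABS_STD_NORMAL * np.sqrt(2))

for level in (2, 8):
    sigma = white_noise_sigma_for_level(level, frame_rate=30.0)
    print(f"noise level {level}: sigma ~ {sigma:.2f} dF/F per frame")
```

Under this white-noise assumption and at 30 Hz, a noise level of 2 corresponds to a per-frame standard deviation of roughly 0.11 dF/F, and a level of 8 to roughly 0.46 dF/F.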

If you want to play around with this analysis using different noise levels or different datasets, you do not need to install anything. You can just run this Colaboratory Notebook in your browser and reproduce the above results in less than 5 minutes.

## Annual report of my intuition about the brain (2020)

How does the brain work and how can we understand it? I want to make it a habit to report some of the thoughts about the brain that marked me most during the past twelve months at the end of each year – with the hope to advance and structure the progress in the part of my understanding of the brain that is not immediately reflected in journal publications. Enjoy the read! And check out previous year-end write-ups: 2018, 2019, 2020, 2021.

Doing experiments in neuroscience means opening Pandora’s box. On a daily basis, you’re confronted with the vexing fact that the outcome of experiments is not just slightly but much more complex and variable than any mental model you could come up with. It is rewarding and soothing to read published stories about scientific findings, but they often become stories only because things that did not fit in were omitted or glossed over. This is understandable to some extent, since nobody wants to read 379 side-notes on anecdotal and potentially confusing observations. But it leads to a striking gap between headlines with clear messages and the feeling of being overwhelmed by complexity when doing experiments or going through a raw dataset. It is possible to overcome this complexity with nested analysis pipelines (‘source’ extraction, unsupervised clustering, inclusion criteria, dimensionality reduction, etc.) and to restore simplicity. But the confusion often returns when going back to the raw, unreduced data, because they contain so much more complexity.

In this year’s write-up, I want to address this complexity of the brain from the perspective of self-organized systems, and I will try to point out lines of research that can, in my opinion, contribute to an understanding of these systems in the brain.

Complex systems

Two years ago, I wrote about the limitations of the human mind in dealing with the brain’s complexity and about the reasons behind these limitations (Entanglement of temporal and spatial scales in the brain but not in the mind). This year, I have again been thinking quite a bit about these issues. During summer, in the bookshelf of a friend, I noticed the novel Jurassic Park, which my friend, to my surprise, recommended to me. The book, more so than the movie, tells the story of how a complex system (the Jurassic Park) cannot be controlled because of unexpected interactions among system components that were thought to be separated by design. This perspective is represented in the book by a smart-ass mathematician who works on chaos theory. He not only predicts from the start that everything will go downhill but also delivers lengthy rants about the hubris of men who think they can control complexity.

This took me back to the days when I studied physics myself, actually also with a focus on complex systems: non-linear dynamics, non-equilibrium thermodynamics, chaos control and biophysics. So, with some years in neuroscience behind me, I went back to the theory of complex systems. I started to go through a very easy-to-read book on the topic by Melanie Mitchell: Complexity: A Guided Tour. Melanie Mitchell is herself a researcher in complexity science. She did her PhD work with Douglas Hofstadter, best known for his book Gödel, Escher, Bach. Mitchell summarizes the history and ideas of her field in a refreshingly modest and self-critical way, and I can only recommend the book. As a bonus, it was published in 2009, just before deep learning emerged as a dominant idea – which also suppressed and overshadowed many other interesting lines of thought.

For example, Mitchell brings up John von Neumann’s idea of self-organization in cellular automata, Douglas Hofstadter’s work, Alan Turing’s idea of self-organization in simple reaction-diffusion systems, the cybernetics movement around Norbert Wiener, Hermann Haken’s concept of Synergetics, and Stephen Wolfram’s A New Kind of Science.

Unfortunately, many of these ideas about complex systems were intellectually inspiring and certainly influenced many people, but at the same time they often did not really live up to their promise. They did not have a significant real-world impact outside of the philosophical realm, in contrast to, e.g., the invention of semiconductors or backpropagation. At both extremes of the spectrum, things were a bit detached from reality. On one extreme, ideas around self-organization like the Autopoiesis concept were riddled with ill-defined concepts and connected to ideas like “emergence”, “cognition” or “consciousness” in very vague ways. On the other extreme, many very influential researchers like Douglas Hofstadter or Stephen Wolfram had a very strong mathematical background and were therefore fascinated by beauty and simplicity rather than by truly high-dimensional chaos. I think it’s fascinating to prove that a cellular automaton like the Game of Life is Turing-complete (i.e., that it is a universal computer), but wouldn’t a practical application of such an automaton be more convincing and useful than a theoretical proof?
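Part of the appeal of the Game of Life is how little it takes: each cell only looks at its eight neighbours, yet moving structures such as the "glider" emerge. The sketch below is a standard implementation of the rules on an unbounded grid (live cells stored as a set of coordinates; the function name is my own). It illustrates the rules only, not the Turing-completeness proof.

```python
from collections import Counter

def life_step(live):
    """One synchronous update of Conway's Game of Life; `live` is a set of (row, col) cells."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 live neighbours; survival with 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
state = glider
for _ in range(4):
    state = life_step(state)
# After four generations the glider has the same shape, shifted one cell down and to the right.
print(state == {(r + 1, c + 1) for r, c in glider})
```

Nothing in the two rules mentions gliders; the moving pattern is a property of the dynamics, which is the kind of emergence these approaches were fascinated by.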

It is therefore tempting for an experimentalist and natural scientist to simply trash the entire field as verbal or mathematical acrobatics that will not help to understand complex systems like the brain. However, in the next section I’d like to explain why I think the concept of self-organized systems should still be considered potentially central when it comes to understanding the brain.

Self-organized systems

Over the last years, I have become more and more convinced that complex systems cannot be easily understood by simply describing their behavior. Even for very simple systems like the one described by the Lorenz equations, where the behavior can be captured by some sort of attractor, the low-dimensional description allows one to predict the behavior of the system, but it does not tell much about the generative processes underlying it.
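To make the Lorenz example concrete: here the generative process consists of three coupled differential equations, while the attractor is a low-dimensional summary of their output. The sketch below integrates the equations with the classic parameters (σ = 10, ρ = 28, β = 8/3) using a fourth-order Runge-Kutta step; the step size, run length and starting points are arbitrary choices. Two trajectories that start 10^-8 apart stay on the same bounded attractor but end up far apart.

```python
import math

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def trajectory(start, n_steps=5000, dt=0.01):
    states = [start]
    for _ in range(n_steps):
        states.append(rk4_step(lorenz, states[-1], dt))
    return states

a = trajectory((1.0, 1.0, 1.0))
b = trajectory((1.0, 1.0, 1.0 + 1e-8))  # tiny perturbation of the initial condition
dist = [math.dist(p, q) for p, q in zip(a, b)]
print(dist[100], max(dist))                       # separation after t = 1 vs. overall maximum
print(max(max(abs(v) for v in p) for p in a))     # both runs stay bounded on the attractor
```

Knowing the shape of the attractor lets you say where the system will be found, but it would not let you write down these three equations.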

Low-dimensional descriptions of brain activity are one of the most active areas of current research in neuroscience. These range from descriptions of brain dynamics in terms of oscillatory regimes, to the default mode network of the human brain, or, more recently, to attempts to break down the population activity of thousands of neurons into a low-dimensional manifold. These are beautiful descriptions of neuronal activity, and it is certainly useful to study the brain with these approaches. But do they provide us with a real understanding? From one perspective, one could say that such a condensed description (if it exists, which is not yet clear) would be a form of deep understanding, since any way to compress a description is some sort of understanding. But I think there should be a deeper way of understanding that focuses on the underlying generative processes.
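A toy version of such a "low-dimensional manifold" analysis, assuming a hypothetical population in which each of 30 neurons is a random mixture of two shared latent signals plus private noise. Principal component analysis (done here by subspace iteration on the covariance matrix, to stay within the Python standard library) finds that two dimensions capture almost all of the variance. What it does not recover is how the latent signals were generated.

```python
import math
import random

def simulate_population(n_neurons=30, n_time=1000, noise=0.1, seed=0):
    """Hypothetical data: each neuron mixes two shared latent signals and adds private noise."""
    rng = random.Random(seed)
    latents = [(math.sin(2 * math.pi * t / 200), math.sin(2 * math.pi * t / 77)) for t in range(n_time)]
    loadings = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_neurons)]
    return [[a * z1 + b * z2 + rng.gauss(0, noise) for z1, z2 in latents] for a, b in loadings]

def covariance(activity):
    """Neuron-by-neuron covariance matrix of mean-subtracted activity."""
    n_time = len(activity[0])
    means = [sum(row) / n_time for row in activity]
    centred = [[v - m for v in row] for row, m in zip(activity, means)]
    return [[sum(x * y for x, y in zip(ri, rj)) / n_time for rj in centred] for ri in centred]

def matvec(matrix, vector):
    return [sum(x * y for x, y in zip(row, vector)) for row in matrix]

def orthonormalize(vectors):
    """Gram-Schmidt orthonormalization."""
    basis = []
    for v in vectors:
        for u in basis:
            proj = sum(x * y for x, y in zip(v, u))
            v = [x - proj * y for x, y in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis

def variance_explained(cov, k, n_iter=100, seed=1):
    """Fraction of total variance captured by the top-k principal subspace (subspace iteration)."""
    rng = random.Random(seed)
    q = orthonormalize([[rng.gauss(0, 1) for _ in cov] for _ in range(k)])
    for _ in range(n_iter):
        q = orthonormalize([matvec(cov, v) for v in q])
    captured = sum(sum(x * y for x, y in zip(v, matvec(cov, v))) for v in q)
    return captured / sum(cov[i][i] for i in range(len(cov)))

cov = covariance(simulate_population())
for k in (1, 2, 3):
    print(k, round(variance_explained(cov, k), 3))
```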

Imagine you want to understand artificial neural networks (deep networks). One way would be to investigate information flows and how representations of the input evolve across different layers, becoming less similar to the input and more similar to the respective target label. This is an operative and valuable way to understand what is going on. In my opinion, however, a deeper understanding would come from simply naming the organizing principles that underlie the generation of the network: back-propagation of errors during learning and the definition of the loss function.

Similarly, I think it would be more interesting in neuroscience to understand the generative and organizing principles that underlie the final structure of the brain, instead of studying the representations of information in neuronal activity. It is clear that a part of the organization of the brain is encoded in the genome (e.g., guidance cues for axonal growth, the program of sequential migration of cell types, or the coarse specific connectivity across different cell types). However, the more flexible and possibly also more interesting part is probably neither organized by an external designer (as in the case of a deep network) nor directly by the genome. In the absence of an external designing instance, there must be self-organization at work.

Once we accept that this part of the generative principles underlying brain structure and function is self-organization, it becomes immediately clear that it might be useful to get inspired by complexity science and the study of self-organized systems. This connection between neuroscience and complexity science is probably evident to anybody working on complex systems, but I have the impression that this perspective is sometimes lost in systems neuroscience and in particular in experimental neuroscience.

Self-organizing principles of neuronal networks: properties of single neurons

I believe that the most relevant building blocks of self-organization in the brain are single neurons (and not molecules, synapses or brain areas). A year ago, I argued why I think this makes sense from an evolutionary perspective (Annual report of my intuition about the brain (2019)), and why it would be interesting to understand the objective function of a single cell. This objective function would be the cell-specific generative principle that underlies the self-organization of biological neuronal networks.

Realistically speaking, this is too abstract a way of exploring biological neurons. What would be a less condensed way to describe the self-organizing principles of single neurons, analogous to back-propagation and loss functions for deep networks? I would name two main ingredients: first, the principles that determine the integration of inputs in a single neuron; second, the principles that determine how neurons connect and modify their mutual connections, which basically amounts to plasticity rules between neurons. I am convinced that the integrative properties of neurons and their plasticity rules when interacting with other cells are the main ingredients that together constitute the self-organizing principles of neuronal networks.

This is a somewhat underwhelming conclusion, because both plasticity rules and integrative properties of neurons have been studied very intensely since the 1990s. The detour that this blog post took via self-organization basically reframes what a certain branch of neuroscience has been studying anyway. However, in my opinion it also makes clear why the study of population dynamics and low-dimensional descriptions of neuronal activity aims at a different level of understanding. And it makes the point that the deepest understanding of biosimilar networks can probably be achieved by studying self-organizing agents, their plasticity rules and their single-cell integrative properties, and not by studying the mere behavior of animals or neuronal networks.

Studying self-organized neuronal networks and plasticity rules

So, how can we study these principles of self-organized agents? Unfortunately, the last 30 years have made it quite clear that there is not simply a single universal plasticity rule. Typical plasticity rules (spike-time dependent plasticity; fire-together-wire-together; NMDA-dependent potentiation) usually explain only a small fraction of the variance in the experimental data and can often only be studied under very specific experimental conditions, in most cases only in slice work. Usually, the conditions of the experiment (number of presynaptic action potentials to induce plasticity, spike frequency, etc.) are tuned to achieve strong effects, and the absence of effects under other conditions is not systematically studied and often goes unreported. In addition, plasticity rules in vivo seem to be somewhat different. Neuromodulation and other state-dependent influences might affect plasticity rules in ways that make them almost impossible to study systematically. Moreover, it is very likely that there is not a single plasticity rule that governs the same behavior across all neurons, since diversity of properties has been shown in simulations to make neuronal circuits more robust at many different levels. And evolution would be a fool not to make use of this property that is so easy to achieve – evolution does not care about being hard to reverse engineer. This, however, makes it nearly impossible (although still very valuable!) to dissect these principles systematically in experiments.
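Spike-time dependent plasticity is often modeled with a pair-based exponential window: a presynaptic spike followed by a postsynaptic spike potentiates the synapse, the reverse order depresses it. The sketch below assumes this textbook form with all-to-all pairing; the amplitudes and time constants are hypothetical and not fitted to any data. It illustrates the point about protocols: with the same rule, repeating a +10 ms pre-post pairing at 1 or 10 Hz gives net potentiation, while at 50 Hz each postsynaptic spike also precedes the next presynaptic spike, and the net change becomes negative.

```python
import math

# Hypothetical parameters (amplitudes dimensionless, time constants in ms).
A_PLUS, A_MINUS = 0.010, 0.0105
TAU_PLUS, TAU_MINUS = 20.0, 20.0

def stdp_window(dt):
    """Weight change for one spike pair, with dt = t_post - t_pre in ms."""
    if dt > 0:   # pre before post: potentiation
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:   # post before pre: depression
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0

def net_change(pre_spikes, post_spikes):
    """All-to-all pair-based STDP: sum the window over every pre/post spike pair."""
    return sum(stdp_window(t_post - t_pre) for t_pre in pre_spikes for t_post in post_spikes)

def pairing_protocol(freq_hz, n_pairs=60, delay_ms=10.0):
    """Repeated pre-then-post pairings at a given repetition frequency."""
    period = 1000.0 / freq_hz
    pre = [i * period for i in range(n_pairs)]
    post = [t + delay_ms for t in pre]
    return net_change(pre, post)

for f in (1, 10, 50):
    print(f, "Hz:", round(pairing_protocol(f), 4))
```

Real synapses need not follow this rule, as the paragraph above argues; the sketch only shows how strongly a rule's apparent effect depends on the induction protocol chosen.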

That is why I think that simulations – not experiments – could be the best starting point for understanding these self-organized networks.

There is indeed a large body of work in this direction. If you google “self-organizing neuronal networks”, you will find a huge literature that goes back to the 60s and is often based on very simplistic models of neurons (still heavily inspired by condensed matter physics), but there are also some interesting more recent papers that directly combine modern plasticity rules with the idea of self-organization (e.g. Lazar et al., 2009). And quite a few computational labs study plasticity rules and their effect on the organization of neuronal networks, which is also a kind of self-organization, e.g. the labs of Henning Sprekeler, Claudia Clopath, Tim Vogels, Friedemann Zenke, Richard Naud, all of them influenced by Wulfram Gerstner; or Sophie Denève; Christian Machens; Wolfgang Maass – to name just a few out of many people who work on this topic. I think this is one of the most interesting fields of theoretical neuroscience. Personally, however, I would like to see this field embrace the self-organizing perspective more strongly.

To give a random example, in a study from this year, Naumann and Sprekeler show how specific non-linear properties of neurons can mitigate a well-known problem associated with purely Hebbian plasticity rules (Presynaptic inhibition rapidly stabilises recurrent excitation in the face of plasticity, 2020). They basically take an experimental finding made quite some time ago (presynaptic inhibition of the axonal boutons via GABAB receptors) and build a model that explains how this could make sense in the light of plasticity rules. This is a very valuable way of doing research, also because it takes biological details of neurons into account and gives experimentalists a potential context and explanation for their findings. However, this approach reflects the perspective of a designer or engineer rather than that of somebody who aims at an understanding of a self-organized system. What would be an alternative approach?

From engineered organization to self-organization

I think it would be useful to take the perspective of a neuron and, in addition, an evolutionary perspective. Let’s say a neuron with certain properties (rules on when and how to connect) joins the large pool of a recurrent network. The question that the neuron must solve is: How do I learn to behave meaningfully?

I’d like to give an analogy for how I think this neuron should ideally behave: A person who interacts with others in a social network, be it in real life or in the virtual world, must adjust their actions according to how they are received. Shouting loudly all the time will isolate them, because they are blocking the receiving channels of others, and being silent all the time will equally lead others to drop their connections. To adjust the level of output and to find content that will be well received, it is crucial to listen to feedback.

This is what I think could be the central question from this self-organized perspective on neuronal circuits: How does the neuron get feedback on its own actions? By feedback, I do not mean global error signals about the behavior of the organism via neuromodulatory channels, but feedback on the neuron’s action potentials and its other actions. Where does this feedback come from?

If we reduce the complex network to a single-cell organism that lives by itself, we can immediately see the answer to this question: the feedback comes from the external world. A spike of this single-cell organism has a direct impact on the world, and the world in return acts back upon the organism. It is not clear how this scales up to larger networks, but I think that this inclusion of the external world, as opposed to a machine learning-style input-output task, could be the most important ingredient for the step from engineered network organization to self-organized networks.

(There are many loose connections from here to reinforcement learning using learning agents and also to predictive processing, but let’s not go into that here.)

Conclusion and summary

I’m happy to be convinced otherwise, but today I think that a very promising way to achieve a deep understanding of the brain could consist of the following ingredients, as motivated above:

1) To regard the brain as an at least partially self-organized network,

2) To use simulations together with evolutionary algorithms to explore the generative / self-organizing principles,

3) To consider properties and actions on the level of single neurons as the main parameters that can be modified during this evolutionary process and

4) To include an external world to transition from an externally organized to a self-organized system.
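To make these four ingredients a bit more tangible, here is a deliberately tiny, hypothetical sketch (an illustration, not a worked-out proposal). A single model neuron with two heritable parameters, a gain and a spike threshold, lives in a one-dimensional "world". It senses the world state x, and each of its spikes pushes x back, closing the loop described for the single-cell organism above. A simple evolutionary loop (mutation plus truncation selection, with the best parents kept unchanged) selects parameters that keep x close to zero. The world dynamics, the fitness function and all parameter values are assumptions chosen for brevity, and a network of many such neurons is beyond this sketch.

```python
import random

def run_world(genome, n_steps=200, drift=0.1, kick=1.0, seed=123):
    """Hypothetical closed loop: the world drives the neuron, and its spikes act back on the world."""
    gain, threshold = genome
    rng = random.Random(seed)  # fixed seed, so every genome experiences the same world noise
    x, sq_sum = 0.0, 0.0
    for _ in range(n_steps):
        spike = gain * x > threshold  # the neuron's input is the current world state
        x += drift + rng.gauss(0.0, 0.1) - (kick if spike else 0.0)  # a spike pushes the world back
        sq_sum += x * x
    return -sq_sum / n_steps  # fitness: keep the world state close to zero

def evolve(pop_size=20, n_parents=5, n_generations=30, sigma=0.2, seed=0):
    """Truncation selection with elitism over (gain, threshold) genomes."""
    rng = random.Random(seed)
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(pop_size)]
    history = []
    for _ in range(n_generations):
        ranked = sorted(population, key=run_world, reverse=True)
        parents = ranked[:n_parents]
        history.append(run_world(parents[0]))
        # Parents survive unchanged; the remaining slots are filled with mutated copies.
        children = [tuple(g + rng.gauss(0, sigma) for g in rng.choice(parents))
                    for _ in range(pop_size - n_parents)]
        population = parents + children
    return ranked[0], history

best, history = evolve()
print(best, round(history[0], 3), round(history[-1], 3))
```

Because the best parents are carried over unchanged and the fitness evaluation is deterministic, the best fitness can only stay equal or improve from one generation to the next.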