Hi,

As the on-line version of my article on the Human Cytome Project and the application of cytomics in medicine and drug discovery (pharmaceutical research) evolves, I post the updated version in this newsgroup for reference. The original "question" on a Human Cytome Project was posted in this newsgroup on Monday 1 December 2003.

On-line version (split version):
A Human Cytome Project - an idea
Human Cytome Project and Drug Discovery
Human Cytome Project - How to Explore
A framework for cytome exploration

============================================================================

A framework for cytome exploration
By Peter Van Osta

Goal
To create an analog-to-digital workflow concept which can be applied to ultra-large-scale research of human cellular diversity, in order to improve our understanding of cellular disease processes and to develop better drugs (less attrition due to better functional predictions). To allow for managing highly diverse quantitative processing of cellular structure and function. To create in-silico representations of cellular (and maybe beyond) structure and function, making them accessible to quantitative content and feature extraction.

Introduction
An entire organism is an anisotropic, densely packed 4D grid (or matrix) of a high order of “recursive” information levels. We can study its structure and function at multiple levels, where the structure and function at each level are intertwined with over- and underlying structures and their function. The genotype and the phenotype both exist in a continuum of (bidirectionally) interacting organizational levels. Here I want to present and discuss some ideas on the exploration of the cytome and on the conversion of the spatial, spectral and temporal properties of the cytome and its cells into their in-silico digital representation.
This is a set of ideas about a concept which is still changing and growing, so do not expect anything final or polished yet. A modular and distributed framework should provide a unified approach to managing the quantitative analysis of phenomena related to space (X, Y and Z), spectrum (wavelength) and time (t). We want to go from physics to quantitative features, and finally arrive at a classification and understanding of the underlying biological process. We want to extract attributes from the physical process which give us information about the status and development of the process and its underlying structures. First, we have to create an in-silico digital representation of the analogue reality captured by an instrument. The second stage (after creation of the in-silico representation) is to extract meaningful parts (objects) related to biologically relevant structures and processes. Thirdly, we attach features to the extracted objects, such as area and (spectral) intensity, which represent (relevant) attributes of the observed structure and process. Finally, we have to separate and cluster objects, based on their feature properties, into biologically relevant subgroups, such as healthy versus diseased.

In order to quantify the physical properties of space and time of a biological sample, we must be able to create an appropriate in-silico digital representation of these physical properties. This digital representation is then accessible to algorithms for content extraction. The content, or objects of interest, is then presented to a quantification engine which associates physically meaningful properties or features with the extracted objects. These object features build a multidimensional feature space which can be fed into feature analyzers to find object/feature clusters, trends, associations and correlations.
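The four stages (digitize, extract objects, attach features, classify) can be sketched end to end on a toy 2D image. This is a minimal illustration under stated assumptions, not the framework itself: the function names, the fixed grey-level threshold, 4-connected component labelling as the object extraction method, and the single area rule used as a stand-in "classifier" are all hypothetical choices.

```python
from collections import deque

# Hypothetical sketch of the four stages: digitize -> extract objects
# -> attach features -> classify. Thresholds and names are illustrative only.

def digitize(analog, levels=256, max_value=1.0):
    """Stage 1: quantize analogue readings in [0, max_value] to integer grey levels."""
    return [[min(levels - 1, max(0, int(v / max_value * (levels - 1)))) for v in row]
            for row in analog]

def extract_objects(image, threshold):
    """Stage 2: group above-threshold pixels into 4-connected objects."""
    rows, cols = len(image), len(image[0])
    seen, objects = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or (r, c) in seen:
                continue
            seen.add((r, c))
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] > threshold and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects

def measure(image, objects):
    """Stage 3: attach features (area in pixels, integrated intensity) to each object."""
    return [{"area": len(p), "intensity": sum(image[y][x] for y, x in p)} for p in objects]

def classify(features, min_area):
    """Stage 4: split objects into two subgroups with one illustrative rule."""
    return ["large" if f["area"] >= min_area else "small" for f in features]

analog = [
    [0.0, 0.9, 0.9, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
image = digitize(analog)
objects = extract_objects(image, threshold=128)
features = measure(image, objects)
print(classify(features, min_area=3))  # ['large', 'small']
```

In a real screen each stage would be a far more elaborate, replaceable module; the point is only that each stage consumes the previous stage's output, going from grey levels to objects to feature vectors to group labels.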

Managing the flow
My personal interest is to build a framework in which acquisition, detection and quantification are designed as modules, each using plug-ins to do the actual work, which operate on objects being transferred through the framework. Data representing spatial, temporal and spectral sampling are distributed throughout a data management system to be processed. The data flow through the framework and are handed to plug-in modules which operate on them and transform the content into a representation in another space, for example from physics to features. The focus is not on the individual device which creates the data or on individual algorithms, but on managing the dataflow through a distributed system which converts spatial, spectral and temporal data into a feature (hyper)space for quantitative analysis. The software framework manages the entire flow and transformation of data from physics to features, like a ball which is thrown from player to player. As long as digital information is transferred from module to module, it is nothing more than a chunk of data whose actual layout matters only to the modules which act upon its content. The dimensionality of its content (XYZ, spectral, time) only matters to those modules which have to be aware of it in order to extract content while converting physics into features and finally attributing a meaning to the events being observed. Up- and downscaling of cell-based research is managed dynamically by the system, as changing the scale of processing does not require a change in basic design. Expanding and collapsing data and feature dimensionality is a dynamic process in itself, leading to a continuously variable exploration system. Methods and algorithms for content extraction and feature attribution are overloaded for a multiplicity of data types and dimensionalities.
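A minimal sketch of such a dataflow manager, assuming a hypothetical `Chunk` type that carries an opaque payload plus layout metadata. The `Pipeline` class only moves chunks from plug-in to plug-in and records which stage handled them; only the plug-ins read the layout (here, the X width) to interpret the payload. All names are illustrative, not part of any existing system.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class Chunk:
    """Unit of data passed between modules; the framework never inspects `payload`."""
    payload: Any
    layout: Dict[str, int] = field(default_factory=dict)  # e.g. {"x": 3, "y": 2}
    history: List[str] = field(default_factory=list)      # stages applied so far

Plugin = Callable[[Chunk], Chunk]

class Pipeline:
    """Hypothetical dataflow manager: hands each chunk on, like a ball between players."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Plugin]] = []

    def register(self, stage: str, plugin: Plugin) -> None:
        self.stages.append((stage, plugin))

    def run(self, chunk: Chunk) -> Chunk:
        for stage, plugin in self.stages:
            chunk = plugin(chunk)
            chunk.history.append(stage)
        return chunk

def rows_from_flat(chunk: Chunk) -> Chunk:
    """Detection plug-in: the only code here that needs the XY layout."""
    width = chunk.layout["x"]
    rows = [chunk.payload[i:i + width] for i in range(0, len(chunk.payload), width)]
    return Chunk(rows, chunk.layout, chunk.history)

def row_means(chunk: Chunk) -> Chunk:
    """Quantification plug-in: one feature (mean intensity) per row."""
    features = [sum(row) / len(row) for row in chunk.payload]
    return Chunk(features, {"features": len(features)}, chunk.history)

pipeline = Pipeline()
pipeline.register("detection", rows_from_flat)
pipeline.register("quantification", row_means)
result = pipeline.run(Chunk([1, 2, 3, 4, 5, 6], {"x": 3, "y": 2}))
print(result.payload, result.history)  # [2.0, 5.0] ['detection', 'quantification']
```

Swapping a plug-in, or registering one written for a different dimensionality, does not change `Pipeline` itself, which is the decoupling the paragraph argues for.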
I will mostly focus on imaging technology, but the basic principles should be applicable to any digitized content extraction process. Images are higher-order digital information matrices; they only become images as such when we want to look at them and have to transform them into something meaningful for our visual system. Visualization provides us with a window on the data content, but not necessarily on the data as such.

Probing the sample
We want to extract from the sample its structure and its dynamics, i.e. the flow of its structural changes through time. When applying digital imaging technology to a biological sample, a clear understanding of the physical characteristics of the sample and of its interaction with the “sampling” device is a prerequisite for a successful application of the technology. The basic principle of a digital imaging system is to create an in-silico digital representation of the spatial, temporal and spectral physical process being studied. To achieve this, we try to lay an equidistant sampling grid over the biological specimen. In reality, the physical layout of this sampling grid is never a precise, isotropic, cubic sampling pattern. The inner and outer resolution of the temporal and spectral sampling are determined by the physical characteristics of the sample (electromagnetic spectral range and spectral sampling layout) and by its interaction with the detection technology being used. The instrument which converts the spatial (scale, dimensions), spectral (electromagnetic energy, wavelength) and temporal continuum of the sample into its digital representation allows us to view biology beyond the capacity of our own perceptive system. It rescales space, spectrum and time into a digital representation accessible to human perception (contrast range, color) and to the human mind, and ideally also to quantification.
The digital image acts as a see-through window on part of the physical properties of the biological sample, not on the instrument as such. We want to insert a probe system into the sample which changes its state according to the physical characteristics of the sample. A probe is in general a dual system: a structure/function reporter on one side and an appropriate detector on the other. Ideally, the changes in the probe system are perfectly aligned in spatial, spectral and temporal space with the physical properties of the sample itself. Each probe system senses the state of the specimen with a finite aperture and so provides us with a view of the biological structure. All sensing is done in a 5-dimensional environment: 3D space, spectrum (wavelength) and time. It is the inner and outer resolution of our sampling which changes. 2D imaging is the same as 3D imaging with the third dimension collapsed to one layer, but due to the depth of focus (D.O.F.) of the optical system we use, this layer represents a physical Z-slice. In the spectral domain, we likewise probe electromagnetic energy along the spectral axis with a certain inner and outer resolution. We slide up and down the spectral axis within the spectral limits of the probing system, which transforms analogue electromagnetic energy into its digital representation. A single CCD camera probes the visible spectrum (and beyond) in one sweep, with a rather poor spectral inner resolution. A 3CCD camera uses 3 probes to do its spectral sampling and gives us a threefold increase in inner resolution. Increasing or decreasing the density of the spectral sampling is only a matter of spectral dynamics. By using n cameras (or PMTs, etc.), each individually controlled in its spectral range, we can expand or collapse our spectral inner and outer resolution. We tend to use the term “spectral imaging” for anything which samples the visible spectrum at a finer spectral resolution than a 3CCD camera.
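One way to picture this 5-dimensional view in silico is a nested array indexed as (t, lambda, z, y, x). The sketch below, in plain Python with hypothetical helper names (`make_stack`, `collapse`, `shape`), sums a 3-channel stack (as from a 3CCD camera) over the spectral axis, roughly as a monochrome detector with a flat spectral response would, and then over Z. A summed projection is only one way to collapse an axis: a widefield 2D image is better thought of as one physical Z-slice set by the depth of focus, which corresponds to selecting a layer rather than summing.

```python
# Hypothetical sketch: a 5D stack as nested lists indexed [t][lambda][z][y][x].
AXES = ("t", "lambda", "z", "y", "x")

def make_stack(t, c, z, y, x, value=1):
    """Build a 5D stack filled with a constant value (stand-in for real data)."""
    return [[[[[value] * x for _ in range(y)] for _ in range(z)]
             for _ in range(c)] for _ in range(t)]

def shape(arr):
    """Return the length of each nesting level."""
    dims = []
    while isinstance(arr, list):
        dims.append(len(arr))
        arr = arr[0]
    return tuple(dims)

def add(a, b):
    """Element-wise sum of two equally shaped nested lists (or two scalars)."""
    if isinstance(a, list):
        return [add(p, q) for p, q in zip(a, b)]
    return a + b

def collapse(arr, axis):
    """Integrate (sum) along `axis`, keeping it as length 1 so the data stay 5D."""
    if axis == 0:
        total = arr[0]
        for sub in arr[1:]:
            total = add(total, sub)
        return [total]
    return [collapse(sub, axis - 1) for sub in arr]

rgb = make_stack(t=2, c=3, z=4, y=5, x=6)     # three spectral channels
mono = collapse(rgb, AXES.index("lambda"))     # spectral integration
flat = collapse(mono, AXES.index("z"))         # summed Z projection
print(shape(rgb), shape(mono), shape(flat))
# (2, 3, 4, 5, 6) (2, 1, 4, 5, 6) (2, 1, 1, 5, 6)
print(mono[0][0][0][0][0], flat[1][0][0][4][5])  # 3 12
```

Keeping a collapsed axis at length 1, instead of dropping it, lets downstream modules treat every dataset as 5D regardless of how it was acquired.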
Up- and downscaling our spectral sampling (from broad to narrow, parallel or sequential, continuous or discontinuous) is a matter of applying an appropriate detector array. A system which manages 1 to n spectral probing devices such as cameras or PMTs (or a spectral filter in front of a single detector), each sampling a part of the spectrum and all spatially aligned, can probe the spectrum in a dynamic way. The time axis is also probed with a varying temporal inner and outer resolution; depending on the characteristics of the detection device, the time-slicing can be collapsed or expanded. Time can be sampled continuously or discontinuously (time-lapse). We can lengthen the integration time of the detector to capture weak signals (temporal integration), or shorten the time-slicing down to the minimum achievable with a given detector. To compensate for the sensitivity deficits of a detector, three strategies can be followed, but all three decrease the sampling resolution: spatial, spectral and temporal signal integration, i.e. expanding the physical scale of capture along the spatial, spectral or temporal axis, or along a combination of them. Using a B/W camera instead of a 3CCD camera is a form of spectral integration, but it gives a threefold reduction in spectral sampling. The result of the detection is a 5-dimensional system which expands or collapses each dimension (XYZ, lambda, time) according to the requirements of exploration. The device and the components attached to the exploration core impose the inner and outer resolution limits upon the system. In-silico, these are only high-order matrix arrays representing a 5D space; we could call this a continuously variable in-silico representation. The inner and outer resolution of the probing system are determined by the physical XYZ sampling characteristics of the sampling device, such as its point spread function (PSF).
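Because the PSF sets the inner resolution, a common sanity check is whether the detector's pixels sample it finely enough. The sketch below assumes the Abbe estimate d = λ/(2·NA) for lateral resolution, a Nyquist factor of 2 (some guidelines use about 2.3), and a camera mounted with no extra intermediate magnification; the wavelength, NA, pixel pitch and magnifications are illustrative values, and the function names are hypothetical.

```python
# Back-of-the-envelope check of lateral sampling against optical resolution.
# Assumptions: Abbe limit d = lambda / (2 NA), Nyquist factor 2, no extra
# intermediate magnification between objective and camera.

def abbe_lateral_nm(wavelength_nm: float, na: float) -> float:
    """Smallest resolvable lateral distance (Abbe estimate)."""
    return wavelength_nm / (2.0 * na)

def max_pixel_nm(wavelength_nm: float, na: float, factor: float = 2.0) -> float:
    """Largest pixel size, in the sample plane, that still samples d `factor` times."""
    return abbe_lateral_nm(wavelength_nm, na) / factor

def sample_pixel_nm(camera_pitch_um: float, magnification: float) -> float:
    """Camera pixel pitch projected back onto the specimen, in nm."""
    return camera_pitch_um * 1000.0 / magnification

def nyquist_ok(wavelength_nm, na, camera_pitch_um, magnification, factor=2.0) -> bool:
    """True if the projected pixel is no larger than the Nyquist limit."""
    return sample_pixel_nm(camera_pitch_um, magnification) <= max_pixel_nm(
        wavelength_nm, na, factor)

# Green emission (520 nm), 1.4 NA oil objective, 6.45 um camera pixels.
print(round(max_pixel_nm(520, 1.4), 1))  # 92.9
for mag in (60, 100):
    print(mag, round(sample_pixel_nm(6.45, mag), 1), nyquist_ok(520, 1.4, 6.45, mag))
# 60 107.5 False
# 100 64.5 True
```

Under these assumptions, 6.45 µm pixels undersample a 1.4 NA objective at 60x but satisfy the criterion at 100x; 2x2 binning (spatial integration) at 100x would give 129 nm pixels and fall below the criterion again.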
For a digital microscope, the resolving power of the objective (XYZ) and its depth of field/focus are important issues in experimental design and determine the application range of a device. The interaction of the detection device with the image created by the optics of the system (Nyquist sampling demands, distribution of spectral sensitivity, dynamic range) also plays an important role.

Multiplexing
In order to increase and improve the extraction of content from our experiments, we try to increase their information density by multiplexing. To increase the throughput of exploration, we try to do multiple experiments simultaneously to obtain multiple readouts at once. We miniaturize the experiments (multi-well plates, arrays) and we use biological entities which can be multiplexed in relatively small volumes (cells, tissue samples). We place multiple molecular structural and/or functional markers or labels into each biological unit (labeled molecules, structural contrast), so we can make functional and structural cross-correlations between biological events. The more events and structures we can explore in parallel, the more chance we have to detect potentially meaningful events (shotgun, grid, and mesh or spider-web type exploration). From each structural or functional label we extract multiple attributes as quantitative features. The choice of appropriate markers, and of features which co-change with functional attributes (cell division, apoptosis, cell death …), is what remains open for exploratory research. Arrays are actually a type of miniaturized assay; they allow us to do more experiments on a smaller footprint. The exploration of samples is organised in an array pattern (in general 2D due to technical limitations), ranging from a single tissue slice on a glass slide up to a large-scale grid such as a cell or tissue expression array.
Biological samples, up to and including tissue samples, are small enough to allow for multiplexing experiments, and they do not require large amounts of reagents in huge containers. Multiplexing experiments with entire elephants would be somewhat cumbersome, but DNA, protein, cells and parts of tissue fit nicely into our instruments. Scaffold cultures would allow us to use the third dimension, if we can properly capture its content. Dynamic scaffold culturing would allow us to disassemble the culture for manipulation or content exploration and to reassemble it for continuation of the experiment (the ultimate scaffold culture is the organism itself). DNA and protein arrays are arrays of the first degree, as each sample in the array provides only a scalar readout; there is no further spatial differentiation. Cell arrays are of the second or third degree, depending on the content (how many cells per array coordinate) and the resolution of the readout. In an array of the second degree, each array coordinate is itself an array, because it is not a homogeneous sample (multiple cells), but the readout resolution is limited to the sub-elements. In an array of the third degree, each of the sub-elements is also compartmentalized (e.g. tissue arrays, sub-cellular organelles, nuclear organization) and each array coordinate is explored at sufficient resolution to resolve these compartments. By using arrays with multiple cells at each coordinate, we can create readout cascades at multiple readout resolutions. This way we can combine speed and simplicity for a quick overview with a switch to more detail, to find out about cellular heterogeneity and/or sub-cellular compartmentalization. At each array position we can add further spatial, spectral and temporal multiplexing strategies. Spatial multiplexing in arrays is done in cell-based assays or bead assays. Spectral multiplexing is done by using multiple spectral labels, either static or with spectral-shift signalling (dynamic spectral multiplexing).
Temporal multiplexing is done by sequential readouts at each array position to study dynamics or kinetics. By combining arrays with multiplexing, we can increase the content readout of experiments. By combining DNA, RNA, protein, cell and tissue arrays with each other, we can also multiplex information from different biological processes, e.g. massively parallel RNAi transfection of stem cells. When we construct arrays with compartmentalized elements, we can up- and downscale our exploration without the need to redo an entire experiment, and so extract more content from the experiment when wanted. The experiment is arranged, and its content is extracted, the way Russian dolls fit into each other. When the array consists of living cells or tissue, we can add the time dimension to our experiment and create a 4D array for experimental multiplexing. The granularity or density of the array pattern is determined by the experimental demands and by the upstream and downstream processing capacity. Of course, the optical characteristics of the sample carrier (glass, plastic) will limit the inner and outer resolution of the spatial sampling. The optical and mechanical characteristics of the device used to explore the (sub-)cellular physical domain will also define a spatial, spectral and temporal application domain. The coarse grid-like pattern of samples on a sample carrier is explored at each array position at the appropriate inner and outer resolution, within the optical and physical boundaries of the device used to capture the data. The outer resolution barrier of the individual detector in space and time is extended by spatial and temporal tiling at a range of intervals. Spectral multiplexing is done by using spectral selection devices whose spectral characteristics match the spectral profile of the sample.
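Spatial tiling can be planned by computing stage positions that cover a region larger than one field of view. A minimal sketch, assuming a rectangular region, a fixed field of view, a requested fractional overlap between neighbouring tiles, and a serpentine scan order to shorten stage travel; the function names and the example dimensions are hypothetical. The 1D helper could equally split a long acquisition period into detector-sized time windows.

```python
import math

def tile_positions(length, fov, overlap=0.1):
    """1D start positions of fields of width `fov` covering [0, length].

    Tiles are spread evenly, so the real overlap is at least `overlap`."""
    if fov >= length:
        return [0.0]
    step = fov * (1.0 - overlap)
    n = math.ceil((length - fov) / step) + 1
    spacing = (length - fov) / (n - 1)
    return [i * spacing for i in range(n)]

def tile_grid(width, height, fov_w, fov_h, overlap=0.1):
    """2D tile origins in serpentine order (every other row reversed)."""
    xs = tile_positions(width, fov_w, overlap)
    ys = tile_positions(height, fov_h, overlap)
    grid = []
    for row, y in enumerate(ys):
        row_xs = xs if row % 2 == 0 else xs[::-1]
        grid.extend((x, y) for x in row_xs)
    return grid

# Hypothetical 1000 x 600 um region imaged with a 330 x 250 um field of view.
grid = tile_grid(1000, 600, 330, 250, overlap=0.1)
print(len(grid))  # 12 (4 columns x 3 rows)
```

Spreading the tiles evenly means the actual overlap is never smaller than the requested one, which leaves room for stitching the tiles back into one in-silico representation.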

Feedback loops on the content-flow
The detection cascade is not a one-way, passive flow of events; we can place content-driven feedback systems into the dataflow. Adaptive content generation manages a digitization process driven by the source content. Active feedback and control depend on the degree of automation and flexibility of the detection system. Spatial content capture can be driven by a plug-in which controls the spatial sampling so as to sample within the physical boundaries of a sample (e.g. adaptive tissue scanning in 2D, 3D and beyond). A plug-in is docked into the system to modify its behavior and make it respond to content changes. The decision process can be based on a set of rules, or implemented as a neural network, fuzzy logic or whatever is appropriate. Spatial, spectral and temporal events can drive the process to create content-driven acquisition. Feedback loops cross dimension and scale boundaries: a spectral change can drive a change in spatial layout, etc. A content-driven time-lapse changes its temporal pacing whenever a meaningful event is detected, allowing aniso-temporal (non-uniform) sampling. An acquisition system can be equipped with an active search plug-in which makes it search for interesting regions at low resolution and then switch to high resolution for spectral and/or temporal slicing. Liquid dispensers, incubators, robot arms and other automated components can be controlled by a content-driven control system.

Object extraction
Robustly operating algorithms for object extraction are a prerequisite for a large-scale endeavor. A semi-interactive approach is not acceptable for high-volume processing. The challenges are enormous, as robust, unattended large-scale object extraction has still not been achieved in many cases. The failure rate of the applied object extraction procedures must be below 1 to 0.1 percent if we are to rely on large-scale automated exploration of the human cytome.
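To put the 1 to 0.1 percent target in perspective, a quick calculation helps. The plate format, the number of objects per well and the assumption that failures are independent are all hypothetical choices for illustration, as are the function names.

```python
# Scale of the object-extraction failure budget. Assumptions (hypothetical):
# a 384-well plate, 1,000 segmented objects per well, independent failures.

def expected_failures(n_objects: int, failure_rate: float) -> float:
    """Mean number of mis-extracted objects."""
    return n_objects * failure_rate

def p_clean_well(objects_per_well: int, failure_rate: float) -> float:
    """Probability that not a single object in a well fails."""
    return (1.0 - failure_rate) ** objects_per_well

WELLS, PER_WELL = 384, 1000
for rate in (0.01, 0.001):
    print(f"rate={rate:.1%}: "
          f"{expected_failures(WELLS * PER_WELL, rate):.0f} failures per plate, "
          f"P(clean well)={p_clean_well(PER_WELL, rate):.4f}")
# rate=1.0%: 3840 failures per plate, P(clean well)=0.0000
# rate=0.1%: 384 failures per plate, P(clean well)=0.3677
```

Under these assumptions, even at 0.1 percent roughly two out of three wells still contain at least one mis-extracted object, so downstream statistics have to tolerate a residual error rate rather than assume perfect extraction.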
The detection of appropriate objects for further quantification is done either in-line within the acquisition process or in a separate process dedicated to object extraction. Objects should be aligned with biological structures and processes. The in-silico pixel or voxel representation, however, is basically “unaware” of the meta-information about how the digital density pattern was created. The physical meaning of one data point changes depending on the spatial, temporal and spectral sampling and its inner and outer resolution. The digital data build a (dis)continuous representation of a spatial, spectral and temporal continuum which expands or collapses in an anisotropic way. The content of the data has no meaning for a data-transfer system as such; it only transfers the content along its dependencies. Analytical tools operating on the data content need to be informed about the layout of the data. Detection and quantification algorithms act on the digital information as such; only the back-translation into physically meaningful data requires mapping back onto the real-world layout and dimensions. The resulting discrete representation of the sampled spatial, spectral and temporal grid at each array position is sent to a storage medium (file system, database, …) to provide an audit trail for quality assessment and data validation.

Content extraction
The selected objects are sent to a quantification module which attaches an array of quantitative descriptors (shape, density, …) to each object. We expand or collapse the content extraction according to the relevance of the descriptors for describing the biological phenomenon. Content extraction is multiplexed, just like the experiment itself. Objects belonging to the same biological entity are tagged to allow for a linked exploration of the feature space created for each individual object.
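A minimal sketch of a quantification step that attaches descriptors to each object and tags them by their parent entity. It assumes the sampling metadata (pixel size in µm along Y and X, which may differ) is passed in separately, because the pixel grid itself carries no physical units; the `DetectedObject` fields, the descriptor set and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class DetectedObject:
    """A segmented object; `parent` tags objects of the same biological entity."""
    kind: str                      # e.g. "nucleus" or "spot"
    parent: int                    # id of the cell the object belongs to
    pixels: List[Tuple[int, int]]  # (row, col) indices in the image grid
    values: List[float]            # intensity of each pixel, same order as `pixels`

def quantify(obj: DetectedObject, pixel_size_um: Tuple[float, float]) -> dict:
    """Attach descriptors, translating pixel units back into micrometres."""
    dy, dx = pixel_size_um
    n = len(obj.pixels)
    return {
        "kind": obj.kind,
        "area_um2": n * dy * dx,
        "centroid_um": (sum(r for r, _ in obj.pixels) / n * dy,
                        sum(c for _, c in obj.pixels) / n * dx),
        "mean_intensity": sum(obj.values) / n,
        "integrated_intensity": sum(obj.values),
    }

def linked_features(objects: List[DetectedObject],
                    pixel_size_um: Tuple[float, float]) -> Dict[int, List[dict]]:
    """Group per-object feature records by parent entity for linked exploration."""
    grouped: Dict[int, List[dict]] = defaultdict(list)
    for obj in objects:
        grouped[obj.parent].append(quantify(obj, pixel_size_um))
    return dict(grouped)

objects = [
    DetectedObject("nucleus", 1, [(0, 0), (0, 1), (1, 0), (1, 1)], [10, 12, 14, 16]),
    DetectedObject("spot", 1, [(0, 1)], [30]),
    DetectedObject("nucleus", 2, [(5, 5), (5, 6)], [8, 8]),
]
features = linked_features(objects, pixel_size_um=(0.5, 0.5))
print(len(features[1]), features[1][0]["area_um2"])  # 2 1.0
```

Grouping by `parent` keeps, for example, a nucleus and the spots inside the same cell linked, so their features can be explored together.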
The resulting data arrays can be fed into analytical tools suited to analysing a high-dimensional linked feature space, or feature hyperspace. The dynamics of the attributes of the biological system are not necessarily aligned with the features we extract to create a quantitative representation. An attribute change and the feature we expect to represent this change may not be perfectly aligned, so we may capture only a fraction of the actual change. Changes may occur in a combined spatial, spectral and temporal space of which we can only capture certain features, such as length, intensity, volume, etc. The feature sets can be fed into analytical systems for statistical data analysis, exploratory statistics, classification and clustering. Classification performance can be improved by combining several independent classifiers on the feature sets. The resultant vector of a multiparametric quantification may point in the most meaningful direction to capture a change. Both parametric and nonparametric approaches to classification can be used. We often try to do our experiments on a non-changing background (genetic homogeneity) or to average out the background noise by randomisation. But is what we call noise in many cases a poorly understood, yet possibly meaningful, dynamic behaviour of the system? Trying to describe changes relative to underlying oscillations, e.g. the cell cycle, by using dynamic background reporters could help to find dynamic correlations between events.

Copyright notice and disclaimer
My web pages represent my interests, my opinions and my ideas, not those of my employer or anyone else. I have created these web pages without any commercial goal, solely out of personal and scientific interest. You may download, display, print and copy any material at this website, in unaltered form only, for your personal use or for non-commercial use within your organization.
Should my web pages, or portions of them, be used on any Internet or World Wide Web page or informational presentation, I ask that a link back to my website (and, where appropriate, to the source document) be established. I expect at least a short notice by email when you copy my web pages, or parts of them, for your own use. Any information here is provided in good faith, but no warranty can be made for its accuracy. As this is a work in progress, it is still incomplete and may even be inaccurate. Although care has been taken in preparing the information contained in my web pages, I do not and cannot guarantee its accuracy. Anyone using the information does so at their own risk and shall be deemed to indemnify me from any and all injury or damage arising from such use. To the best of my knowledge, all graphics, text and other presentations not created by me on my web pages are in the public domain and freely available from various sources on the Internet or elsewhere, and/or kindly provided by the owner. If you notice something incorrect or have any questions, send me an email.
Email: pvosta at cs dot com
First on-line version published on 9 Jan. 2005, last update on 16 April 2005.
The author of this webpage is Peter Van Osta, MD.