# Division Day: using PCA in cell biology

In this post I’ll describe a computational method for separating the two sides of a cell biological structure. It’s a simple method that relies on principal component analysis, otherwise known as PCA. Like all things mathematical, there are some great resources on the web if you want to understand this operation in more detail (for example, this great post by Lior Pachter). PCA can be applied to many biological problems; you’ve probably seen it used to find patterns in large datasets, e.g. from proteomic studies. It can also be useful for analysing microscopy data. Since our analysis using this method is unlikely to make it into print any time soon, I thought I’d put it up on Quantixed.

Mitotic spindle in 3D. Kinetochores are green. Microtubules are red.

During mitosis, a cell forms a mitotic spindle to share copied chromosomes equally between the two new cells. Our lab is working on how this process works and how it goes wrong in cancer. The chromosomes attach to the spindle via kinetochores, and during prometaphase they are moved to the middle of the cell. Here, the chromosomes are organised into a disc-like structure called the metaphase plate. The disc is thin in the direction of the spindle axis, but much larger in width and height. To examine the spatial distribution of kinetochores on the plate, we wanted a way to approximately separate kinetochores on one side of the plate from those on the other.

Kinetochores can be detected in 3D confocal images of mitotic cells by particle analysis. They are easily stained and appear as bright spots that a computer can pick out (we use Imaris for this). The Cartesian coordinates of each detected kinetochore were saved as CSV and fed into IgorPro. A procedure then runs in three steps. The code is linked at the bottom of the post; there, it is wrapped in further code that deals with multiple datasets from many cells, experiments etc. The three steps are:

1. PCA
2. Point-to-plane
3. Analysis on each subset

I’ll describe each step and how it works.

1. Principal component analysis

This is used to find the 3rd eigenvector, which can be used to define a plane passing through the centre of the plate. This plane is used for division.

Because the metaphase plate is a disc, it has three dimensions, the third of which (“thickness”) is the smallest. PCA finds the principal component, i.e. the direction in which there is the most variance. Orthogonal to that is the direction with the second-largest variance, and orthogonal to both of those is the direction with the smallest variance. These directions are called eigenvectors, and the variance along each one is its eigenvalue. As there are three dimensions to the data, we can get all three eigenvectors out, and the 3rd eigenvector corresponds to the thickness of the metaphase plate. Metaphase plates in cells grown on coverslips are orientated similarly, but the cells themselves are at random orientations. PCA takes no notice of this and simply reveals the direction of the smallest dimension of a 3D structure. The movie shows this in action for a simulated dataset. The black spots are arranged in a disc shape about the origin. They are rotated about x by 45° (the blue spots). We then run PCA and show the eigenvectors as unit vectors (red lines). The 3rd eigenvector is normal to the plane of division, i.e. the 1st and 2nd eigenvectors lie on the plane of division.
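The author's procedure is written in IgorPro (linked at the end of the post). As a rough, non-authoritative illustration of the same idea, here is a Python/NumPy sketch that repeats the simulation in the movie: a thin disc about the origin, rotated 45° about x, then PCA via a singular value decomposition of the centred coordinates. The disc radius, thickness, point count and the helper name `principal_axes` are all assumptions made for the example.

```python
import numpy as np

def principal_axes(points):
    """PCA of an (n, 3) point cloud.

    Returns (eigenvalues, eigenvectors) ordered from largest to smallest
    variance; eigenvectors are rows, so row 2 is the 3rd eigenvector.
    """
    centred = points - points.mean(axis=0)
    # Rows of vt are the principal directions (unit vectors);
    # s**2 / (n - 1) is the variance (eigenvalue) along each one.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    eigenvalues = s**2 / (len(points) - 1)
    return eigenvalues, vt

# Simulated "metaphase plate": a thin disc in the xy-plane about the origin
# (radius 8, thickness s.d. 0.5; arbitrary illustrative values).
rng = np.random.default_rng(0)
n = 500
r = 8.0 * np.sqrt(rng.random(n))          # uniform density over the disc
theta = 2 * np.pi * rng.random(n)
disc = np.column_stack([r * np.cos(theta), r * np.sin(theta), rng.normal(0, 0.5, n)])

# Rotate 45 degrees about the x-axis, as in the movie.
a = np.deg2rad(45)
rot_x = np.array([[1, 0, 0],
                  [0, np.cos(a), -np.sin(a)],
                  [0, np.sin(a),  np.cos(a)]])
rotated = disc @ rot_x.T

eigenvalues, axes = principal_axes(rotated)
normal = axes[2]       # 3rd eigenvector: normal to the plane of division
print(eigenvalues)     # two large values, one small (the thickness)
print(normal)          # close to +/-(0, -0.707, 0.707); the sign is arbitrary
```

Note that PCA cannot tell the two directions along an eigenvector apart, so the normal may come out pointing either way; that only decides which side gets called "positive" in the next step.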

Also, the centroid needs to be defined. This is simply the Cartesian coordinates given by the average of each dimension, and it is sometimes referred to as the mean vector. In the example this was the origin; in reality it will depend on the position and the overall height of the cell.

A longer route to the eigenvectors is to compute the variance-covariance matrix (sometimes called the dispersion matrix) of the three coordinates over all kinetochores, and then do an eigendecomposition of that matrix. PCA is one command, whereas the matrix calculation needs an extra loop followed by an additional command.
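To show that the two routes agree, here is a small NumPy sketch (not the author's Igor code): it builds the 3 × 3 dispersion matrix with `np.cov`, eigendecomposes it with `np.linalg.eigh`, and compares the result with the SVD route used by many PCA implementations. The test data and the helper name `axes_from_dispersion_matrix` are assumptions for illustration.

```python
import numpy as np

def axes_from_dispersion_matrix(points):
    """Eigenvalues/eigenvectors of the variance-covariance matrix, largest variance first."""
    cov = np.cov(points, rowvar=False)              # 3 x 3; columns of points are x, y, z
    eigenvalues, eigenvectors = np.linalg.eigh(cov) # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]           # reorder: 1st eigenvector = most variance
    return eigenvalues[order], eigenvectors[:, order].T  # rows are eigenvectors

# Anisotropic test cloud (arbitrary widths 6, 3 and 0.5 along x, y, z).
rng = np.random.default_rng(1)
points = rng.normal(size=(300, 3)) * [6.0, 3.0, 0.5]
vals, vecs = axes_from_dispersion_matrix(points)

# The same answer from an SVD of the centred data.
_, s, vt = np.linalg.svd(points - points.mean(axis=0), full_matrices=False)
print(np.allclose(vals, s**2 / (len(points) - 1)))             # True
print(np.allclose(np.abs(vecs @ vt.T), np.eye(3), atol=1e-6))  # True: same axes up to sign
```

Both give the same axes; the dispersion-matrix version just makes the intermediate matrix explicit.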

2. Point-to-plane

The distance of each kinetochore to the plane that we defined is calculated. If the value is positive, the kinetochore lies on the same side of the plane as the normal vector (the 3rd eigenvector, defined above). If it is negative, it is on the other side. The maths behind this is in section 10.3.1 of Geometric Tools for Computer Graphics by Schneider & Eberly (starting on p. 374); there is a PDF version on the web. To save you some time: you just need one equation that defines a plane,

$ax+by+cz+d=0$

where [a b c] is the unit normal vector and [x y z] is any point on the plane. We’ll use the coordinates of the centroid as the point on the plane to find d. Once d is known, a similar expression gives the signed distance of any point [x_i y_i z_i] to the plane,

$ax_{i}+by_{i}+cz_{i}+d$
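The two equations translate directly into code. In this NumPy sketch (helper names are hypothetical), d comes from substituting the centroid into the plane equation, and the signed distances are the dot product of each point with [a b c] plus d. The normal is rescaled to unit length first, because the expression only gives true distances for a unit normal. The worked example uses a plane at z = 2 so the answers can be checked by hand.

```python
import numpy as np

def plane_through_centroid(normal, points):
    """Return (unit normal [a, b, c], d) for the plane through the centroid of points."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)                             # distances need a unit normal
    centroid = np.asarray(points, dtype=float).mean(axis=0)
    d = -np.dot(n, centroid)                              # from a*x0 + b*y0 + c*z0 + d = 0
    return n, d

def signed_distances(points, n, d):
    """Evaluate a*x_i + b*y_i + c*z_i + d for every row of points."""
    return np.asarray(points, dtype=float) @ n + d

# Worked example: the centroid of these points is (1, 0, 2), normal along +z.
pts = np.array([[0.0, 0.0, 1.0],
                [1.0, 1.0, 3.0],
                [2.0, -1.0, 2.0]])
n, d = plane_through_centroid([0, 0, 5], pts)
print(d)                             # -2.0, i.e. the plane z = 2
print(signed_distances(pts, n, d))   # distances -1, +1 and 0
```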

The results are used to sort the kinetochores on each side of the plane into separate waves (Igor’s data arrays) for further calculation. In the movie below, the red dots and blue dots show the positions of the kinetochores on either side of the division plane. It’s a bit of an optical illusion, but the cube is turning in a right-handed fashion.
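Putting steps 1 and 2 together, a minimal NumPy sketch of the split might look like the following. It is an illustration under assumptions, not a port of the Igor procedure: `split_by_plane` is a hypothetical name, the simulated cloud is offset from the origin to mimic a cell sitting somewhere in the image, and points lying exactly on the plane are arbitrarily assigned to the negative side.

```python
import numpy as np

def split_by_plane(points):
    """Split an (n, 3) cloud into the two sides of its PCA division plane.

    The plane passes through the centroid; its normal is the 3rd eigenvector.
    Returns (positive_side, negative_side, distances).
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[2]                       # unit length; its sign is arbitrary
    d = -np.dot(normal, centroid)
    distances = points @ normal + d      # a*x_i + b*y_i + c*z_i + d
    positive = points[distances > 0]     # same side as the normal vector
    negative = points[distances <= 0]    # the other side (plus any points exactly on it)
    return positive, negative, distances

# Simulated plate-like cloud, centred away from the origin (arbitrary values).
rng = np.random.default_rng(2)
cloud = rng.normal(size=(400, 3)) * [5.0, 4.0, 0.6] + [20.0, 15.0, 6.0]
pos, neg, dist = split_by_plane(cloud)
print(len(pos), len(neg))   # roughly half of the points on each side
```

Because the sign of the eigenvector is arbitrary, "positive" and "negative" are just labels; the split itself does not depend on the sign.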

3. Analysis on each subset

Now that the data have been sorted, separate calculations can be carried out on each. In the example, we were interested in how the kinetochores were organised spatially and so we looked at the distance to nearest neighbour. This is done by finding the Euclidean distance from each kinetochore to every other kinetochore and putting the lowest value for each kinetochore into a new wave. However, this calculation can be anything you want. If there are further waves that specify other properties of the kinetochores, e.g. brightness, then these can be similarly processed here.
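The nearest-neighbour calculation described above can be sketched in NumPy as a brute-force distance matrix (a sketch, not the author's code; the function name is hypothetical). Each point's distance to itself is set to infinity so that the row minimum is the distance to its closest *other* point.

```python
import numpy as np

def nearest_neighbour_distances(points):
    """For each point, the Euclidean distance to its closest other point (brute force)."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]   # (n, n, 3) pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # (n, n) distance matrix
    np.fill_diagonal(dist, np.inf)                    # ignore each point's distance to itself
    return dist.min(axis=1)

# Hand-checkable example for one side of the plate.
side = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 3.0, 0.0],
                 [0.0, 3.0, 4.0]])
print(nearest_neighbour_distances(side))   # 1, 1, 3 and 4
```

The n × n matrix is cheap for the number of kinetochores in one cell; for much larger point sets a k-d tree (e.g. SciPy's `cKDTree`) would avoid building it.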

Other notes

The code in its present form (not very streamlined) is fast and could be run on every cell from a number of experiments, reading out positional data for 10,000 kinetochores in ~2 s. For QC, it is possible to display the two separated coordinate sets to check that the division worked (see above). The power of this method is that it doesn’t rely on imaging spindle poles or anything else to work out the orientation of the metaphase plate. It works well for metaphase cells, but any misaligned chromosomes ruin the calculation. It is possible to remove these and still fit the plane, but for our analysis we focused on cells at metaphase with a defined plate.

What else can it be used for?

Other structures in the cell can be segregated in a similar way. For example, the Golgi apparatus has a trans and a cis side, which could be similarly divided (although using the 2nd eigenvector as normal to the plane, rather than the 3rd).
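Generalising the split to a chosen eigenvector is a one-parameter change. The sketch below (hypothetical helper `split_along_component`) uses a toy box-shaped grid of points, not real Golgi data, and simply shows that `component=1` divides the points using the plane normal to the 2nd eigenvector. Whether that split actually separates cis from trans in a real cell is the post's suggestion and is not tested here.

```python
import numpy as np

def split_along_component(points, component):
    """Split points by the plane through the centroid normal to the chosen eigenvector.

    component: 0, 1 or 2 for the 1st, 2nd or 3rd eigenvector (largest to smallest variance).
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[component]
    distances = (points - centroid) @ normal   # equivalent to a*x + b*y + c*z + d
    return points[distances > 0], points[distances <= 0]

# Toy shape: long in x, medium in y, thin in z (illustrative values only).
x, y, z = np.meshgrid(np.linspace(-6, 6, 13), [-2.0, -1.0, 1.0, 2.0], [-0.2, 0.2],
                      indexing="ij")
grid = np.column_stack([x.ravel(), y.ravel(), z.ravel()])
side_a, side_b = split_along_component(grid, component=1)   # 2nd eigenvector as normal
print(len(side_a), len(side_b))   # 52 52: the grid is split by the sign of y
```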

Acknowledgements: I’d like to thank A.G. at WaveMetrics Inc. for encouraging me to try PCA rather than my dispersion matrix approach.

If you want to use it, the code is available here (it seems I can only upload PDF at wordpress.com). I used Pygments for syntax highlighting.

The post title comes from “Division Day”, a great single by Elliott Smith.

# Sticky End

We have a new paper out! You can access it here.

The work was mainly done by Cristina Gutiérrez Caballero, a post-doc in the lab. We had some help from Selena Burgess and Richard Bayliss at the University of Leicester, with whom we have an ongoing collaboration.

The paper in a nutshell

We found that TACC3 binds the plus-ends of microtubules via an interaction with ch-TOG. So TACC3 is a +TIP.

What is a +TIP?

EB3 (red) and TACC3 (green) at the tips of microtubules in mitotic spindle

This is a term used to describe proteins that bind to the plus-ends of microtubules. Microtubules are a major component of the cell’s cytoskeleton. They are polymers of alpha/beta-tubulin that grow and shrink, a feature known as dynamic instability. A microtubule has polarity: the fast-growing end is known as the plus-end, and the slower-growing end is referred to as the minus-end. There are many proteins that bind to the plus-end, and these are termed +TIPs.

OK, so what are TACC3 and ch-TOG?

They are two proteins found on the mitotic spindle. TACC3 is an acronym for transforming acidic coiled-coil protein 3, and ch-TOG stands for colonic and hepatic tumour overexpressed gene. As you can tell from the names, they were discovered due to their altered expression in certain human cancers. TACC3 is a well-known substrate for Aurora A kinase, an enzyme that is often amplified in cancer. The ch-TOG protein is thought to be a microtubule polymerase, i.e. an enzyme that helps microtubules grow. In the paper, we describe how TACC3 and ch-TOG stick together at the microtubule end. TACC3 and ch-TOG are at the very end of the microtubule; they move ahead of other +TIPs such as the “end-binding proteins”, e.g. EB3.

What is the function of TACC3 as a +TIP?

We think that TACC3 is piggybacking on ch-TOG while it is acting as a polymerase, but any biological function or consequence of this piggybacking was difficult to detect. We couldn’t see any clear effect on microtubule dynamics when we removed or overexpressed TACC3. We did find that loss of TACC3 affects how cells migrate, but this is not likely to be due to a change in microtubule dynamics.

I thought TACC3 and ch-TOG were centrosomal proteins…

In the paper we look again at this and find that there are different pools of TACC3, ch-TOG and clathrin (alone and in combination), and describe where each resides in the cell. Although ch-TOG is clearly at centrosomes, we don’t find TACC3 there; instead it is on microtubules that cluster near the centrosomes at the spindle pole. TACC3 is described as a centrosomal protein in lots of other papers, but this is quite misleading.

What else?

We were on the cover – whatever that means in the digital age! We imaged a cell expressing tagged EB3, which is another +TIP. We coloured consecutive frames in different colours and the result looked pretty striking. Biology Open picked it for their cover, which we were really pleased about. Our paper is AOP (advance online publication) at the moment, so hopefully they won’t change their mind by the time it appears in the next issue.

Preprinting

This is the second paper that we have deposited as a preprint at bioRxiv (not counting a third paper that we preprinted after it was accepted). I was keen to preprint this particular paper because, following a meeting last summer, we became aware that two other groups had similar results. Strangely, a week or so after preprinting and submitting to a journal, a paper from a completely different group appeared with a very similar finding! We’d been “scooped”. They had found that the Xenopus homologue of TACC3 was a +TIP in retinal neuronal cultures. The other group had clearly beaten us to it, having submitted their paper some time before our preprint. The reviewers of our paper complained that our data were no longer novel, and our paper was rejected. This was annoying because there were lots of novel findings in our paper that weren’t in theirs (and vice versa). The reviewers did make some other constructive suggestions, which we incorporated into the manuscript. We updated our preprint and then submitted to Biology Open. One advantage of preprinting is that the changes we made can be seen by all. Biology Open were great and took a decision based on the reviews from the other journal and the changes we had made in response to them. Their decision to provisionally accept the paper was made in four days. Like our last experience of publishing in Biology Open, it was very positive.

References

Gutiérrez-Caballero, C., Burgess, S.G., Bayliss, R. & Royle, S.J. (2015) TACC3-ch-TOG track the growing tips of microtubules independently of clathrin and Aurora-A phosphorylation. Biol. Open doi:10.1242/bio.201410843.

Nwagbara, B.U., Faris, A.E., Bearce, E.A., Erdogan, B., Ebbert, P.T., Evans, M.F., Rutherford, E.L., Enzenbacher, T.B. & Lowery, L.A. (2014) TACC3 is a microtubule plus end-tracking protein that promotes axon elongation and also regulates microtubule plus end dynamics in multiple embryonic cell types. Mol. Biol. Cell 25, 3350-3362.

The post title is taken from the last track on The Orb’s U.F.Orb album.