As part of my Biochemistry degree at Oxford, I had to spend a year focusing on a
single research project. My obsession with bioinformatics was already firmly
established when Iain Campbell, a leading NMR spectroscopist and structural
biologist, took me under his wing. At the time, structural biology was
definitely the most computational area of molecular biology, so I was looking
forward to getting stuck into a computational project.

[Image: Fibronectin type III module, structure determined by NMR, from I. Campbell's group]

It was great to be immersed in this technical world – the COSY and NOESY spectra,
generated by sequences of radio-frequency pulses, which link up atoms that are
connected through bonds or close together in space; the rather impressive
cooling process, with liquid helium being poured into huge superconducting
magnets sunk into the ground; the somewhat scary signs warning people with
pacemakers to turn back (the magnetic fields are insanely strong in and around
an NMR machine)...

I learned a lot about NMR and structural biology, from the technical aspects of
chemical shifts, coupling constants, distance restraints, disordered regions
and hydrophobic cores to the elegance of protein structures that manage to fold
perfectly to do something so absolutely specific.

Seeing is believing

But… I had already heard the siren call of simpler, linear protein and DNA
sequences, and a wonderful new institute was about to be set up – the
Sanger Centre – where I had the chance to work on sequencing the human
genome…

… Fast forward 20 years, and I had the pleasure of sitting at the back of the room
for the Protein Data Bank in Europe (PDBe) Scientific Advisory Board, now as
Associate Director of the EBI. I was still in awe of the incredible beauty
and precision of protein structures, and of the skill of structural biologists in
uncovering their details.

[Image: Cryo-electron tomography of sensory cilia]

Some things have not changed in 20 years: dihedral angles are still important,
transitions from order to disorder are still being explored and the methods
are still extremely technically detailed. But other areas have progressed so much
they are almost unrecognisable, not least the ability to look at much larger
complexes with electron microscopy (EM) techniques: single-particle averaging and,
even more impressive to me, electron tomography. Electron tomography lets you
reconstruct a 3D volume of a single sample, at a resolution of around 40 Å, from
images taken as the sample is tilted through a series of angles – no crystal, no
averaging, just this particular sample – rather like a high-resolution 3D
microscope image. These are spine-tingling images.

So often we have to conceptualise and imagine what is going on in cells. Electron
tomography is the closest thing I’ve seen to actually seeing molecular biology
in action. One can see little ribosomes, microtubules, proteasomes and complex
membrane-associated structures inside a cell, all in a single 3D volume.
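
To make the tilt-series idea a little more concrete, here is a toy 2D sketch in Python with NumPy, assuming a simple parallel-ray model with nearest-bin detector sampling. It simulates projections of a single bright point over a limited tilt range and recovers the point's position by unfiltered back-projection. The function names `detector_bins`, `project` and `backproject` are hypothetical illustrations, not any package's API, and real tomography pipelines do much more (aligning the tilt images, working in 3D, and typically using weighted back-projection or iterative reconstruction).

```python
import numpy as np


def detector_bins(n, theta):
    """Detector bin hit by each pixel of an n x n image for ray angle theta (radians).

    Coordinates are measured from the image centre; each pixel is assigned to the
    nearest bin of t = x*cos(theta) + y*sin(theta), offset by n so indices are >= 0.
    """
    centre = (n - 1) / 2
    ys, xs = np.mgrid[0:n, 0:n]  # ys = row index, xs = column index
    t = (xs - centre) * np.cos(theta) + (ys - centre) * np.sin(theta)
    return np.rint(t).astype(int) + n


def project(image, angles_deg):
    """Simulate a tilt series: one 1D projection (sum along parallel rays) per angle."""
    n = image.shape[0]
    sinogram = np.zeros((len(angles_deg), 2 * n))  # 2n bins: wide enough for any ray
    for i, theta in enumerate(np.deg2rad(angles_deg)):
        # Accumulate every pixel's value into its detector bin (repeated bins add up).
        np.add.at(sinogram[i], detector_bins(n, theta).ravel(), image.ravel())
    return sinogram


def backproject(sinogram, angles_deg, n):
    """Unfiltered back-projection: smear each projection back across the image."""
    recon = np.zeros((n, n))
    for i, theta in enumerate(np.deg2rad(angles_deg)):
        recon += sinogram[i][detector_bins(n, theta)]
    return recon


n = 33                              # odd size, so the centre falls on a pixel
phantom = np.zeros((n, n))
phantom[10, 22] = 1.0               # a single bright "particle"
angles = np.arange(-60, 61, 3)      # limited tilt range, -60 to +60 degrees
recon = backproject(project(phantom, angles), angles, n)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(recon), recon.shape))
print(peak)  # prints (10, 22)
```

Unfiltered back-projection blurs the point, and sampling only ±60° leaves some directions unobserved (a 2D analogue of the "missing wedge"), but the brightest reconstructed pixel still lands where the particle was.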

Keeping up

The volume of structural data has grown enormously over the past decade. New
techniques, particularly in EM, are constantly emerging, and structural biology’s
workhorse, X-ray crystallography, is continually being refined with better protein
production and crystallisation techniques and tuneable high-energy X-rays from
synchrotrons. Light microscopy has also improved vastly, notably with
super-resolution techniques.

Integrating all the data produced by these techniques into an overall view is a
formidable task. It involves fitting X-ray structures into EM maps and then
into tomograms, with NMR measurements to provide the dynamics at the atomic
level and light microscopy to illuminate the dynamics at the complex and
organelle level. There are still so many more protein structures to determine
and integrate, and endless discoveries to be made.

Bringing structure to genomics?

All this progress is not just for the benefit of structural biologists. Gerard
Kleywegt, who leads PDBe, has a passion for making this information accessible
to the broader biology community. Molecular biologists, developmental
biologists, geneticists and systems biologists can all make use (or more use) of
structural data.

All too often we forget that a linear sequence shows only how information is
encoded, not how it is used. Most of what happens outside the nucleus, and
certainly the vast majority of the “doing” of life, is carried out by proteins or
RNAs folded into specific structures and collaborating in specific complexes. We
know a lot about these structures and complexes: 4,717 proteins (23% of
protein-coding genes) have at least one structure (many have far more than one),
and these structures cover 42% of the residues in those proteins (around 11% of
all protein residues). Including proteins that we can confidently model from
related structures raises these figures further.
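
As a back-of-envelope consistency check on these figures (a sketch derived only from the numbers quoted above, not part of any original analysis), they imply roughly 20,500 protein-coding genes in total, and that proteins with at least one structure make up about a quarter of all protein residues, a little more than their 23% share of genes. The variable names are illustrative.

```python
# Figures quoted in the text (approximate, as stated there).
proteins_with_structure = 4_717
fraction_of_genes = 0.23       # share of protein-coding genes with >= 1 structure
covered_within_these = 0.42    # share of residues covered, within those proteins
covered_overall = 0.11         # share of all protein residues covered

# Implied total number of protein-coding genes.
implied_genes = proteins_with_structure / fraction_of_genes

# covered_overall = covered_within_these * residue_share, so solve for the share of
# all protein residues that lie in proteins with at least one structure.
residue_share = covered_overall / covered_within_these

print(f"implied protein-coding genes: ~{implied_genes:,.0f}")      # prints ~20,509
print(f"residue share of structured proteins: ~{residue_share:.0%}")  # prints ~26%
```

Because 26% of residues sit in 23% of genes, the structured proteins would on average be somewhat longer than typical proteins, if the quoted percentages are taken at face value.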

I am sure that in my own research area – genomics – we’re not taking enough
advantage of this information. We might think of structural biology as the
final mechanistic explanation of whether and why an allele has an effect, but can
we integrate structural information to make our statistical genetic tests more
powerful? Can we use the collection of protein structures of transcription factors
(often bound to DNA) to help interpret DNase I footprinting results? Or use
protein-complex information to inform epistasis models, potentially at the level
of individual residues or surface patches, not just genes?

Many fields use structural information in all sorts of ways, but I am sure that
integrating different structural techniques, and integrating that structural
information with other experiments and knowledge – chemistry, pathways, gene
expression, proteomics – will be amazing.

Part of me wonders why I chose to stick with the “boring” world of linear, four-base
DNA sequence some 20 years ago. I guess there’s always time to learn some new
tricks…


