CAnGAROO is a program to detect clefts in proteins. The definition of a
cleft is purely geometrical. A cleft is a large inward-facing area on the
surface of a protein. The surface of a protein can be the Van der Waals,
Connolly, Solvent Accessible surface or any other kind of closed surface
around the protein. CAnGAROO computes a surface of the chosen type around
a given protein. This surface is composed of a set of points computed at
a given density (generally between 1 and 3 points/Å²). A special
kind of surface curvature is then calculated for each point of the surface.
The normal vector computed with the curvature is then oriented toward
the inside of the surface if the region around the point is inward-facing,
and toward the outside otherwise. Finally, a simple clustering of the points
according to their curvature, normal direction and relative position in space
gives a set of inward-facing areas.
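The final clustering step can be illustrated with a minimal sketch. It is not CAnGAROO's actual clustering: it assumes each point already carries a boolean inward-facing flag and groups flagged points by spatial proximity only, ignoring the curvature values. The function name `cluster_inward_points` and the `cutoff` parameter are hypothetical.

```python
import math
from collections import deque


def cluster_inward_points(points, inward, cutoff=1.5):
    """Group inward-facing points into connected patches.

    points: list of (x, y, z) tuples
    inward: list of booleans, True where the point is inward-facing
    cutoff: assumed linking distance in Å between neighbouring points
    Returns lists of point indices, largest patch first.
    """
    todo = {i for i, flag in enumerate(inward) if flag}
    clusters = []
    while todo:
        seed = todo.pop()
        patch, queue = [seed], deque([seed])
        while queue:
            i = queue.popleft()
            # Collect unvisited inward-facing points close to point i.
            near = [j for j in todo
                    if math.dist(points[i], points[j]) <= cutoff]
            for j in near:
                todo.remove(j)
                patch.append(j)
                queue.append(j)
        clusters.append(sorted(patch))
    return sorted(clusters, key=len, reverse=True)
```

Each returned patch is a candidate inward-facing area; the largest one comes first.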
Molecular Surfaces
Although the only input CAnGAROO needs is a file containing the coordinates
of points in space that represent a surface (of any kind), it
can also generate three kinds of surfaces:
Figure: Molecular surfaces
The simplest one, the Van der Waals (VDW) surface, is the surface formed
by the union of the Van der Waals spheres of the atoms of the molecule (RED
in the following images).
Figure: Van der Waals surface
The Solvent Accessible surface (YELLOW in the following images) is the
same as the VDW surface except that the value of the radius of a probe
sphere has been added to all the VDW radii.
Figure: Solvent Accessible surface
The Connolly surface, also called the molecular surface (BLUE in the following
images), was defined by F.M. Richards [1]. It is the surface traced by the
underside of a spherical probe rolling over the VDW surface.
Figure: Connolly surface
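As an illustration of how the first two surfaces can be sampled as points, here is a minimal sketch. It is not CAnGAROO's own generator: it assumes atoms are given as (x, y, z, VDW radius) tuples, places points on each sphere with a golden-spiral lattice, and keeps only the points that are not buried in another sphere. The names `sphere_points` and `surface_points` are hypothetical. The Connolly surface is not covered, since it needs the re-entrant patches traced by the probe.

```python
import math


def sphere_points(n):
    """Return n roughly evenly spread unit vectors (golden-spiral lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts


def surface_points(atoms, probe=0.0, density=2.0):
    """Sample a VDW (probe=0) or Solvent Accessible (probe>0) surface.

    atoms:   list of (x, y, z, vdw_radius) in Å
    probe:   probe radius in Å, added to every VDW radius
    density: target number of points per Å² on each sphere
    """
    surface = []
    for i, (x, y, z, r) in enumerate(atoms):
        big_r = r + probe
        n = max(1, round(density * 4.0 * math.pi * big_r * big_r))
        for ux, uy, uz in sphere_points(n):
            p = (x + big_r * ux, y + big_r * uy, z + big_r * uz)
            # Discard points lying inside the expanded sphere of another atom.
            buried = any(
                j != i and math.dist(p, (ax, ay, az)) < ar + probe
                for j, (ax, ay, az, ar) in enumerate(atoms)
            )
            if not buried:
                surface.append(p)
    return surface
```

The burial test compares every point with every atom, which is quadratic in the number of atoms; a real protein would need a spatial grid or neighbour list.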
Surface Curvature
For each point of the surface the surface curvature is computed. The surface
curvature of a point p is computed according to a set of reference
points that are approximately at a given geodesic distance from p.
Ideally, this geodesic distance should equal the depth of the deepest
cleft. As this depth is usually not known in advance, a value of 8 Å has
proved appropriate in many cases. The graphic below illustrates why the
geodesic distance is used instead of the Euclidean distance, which is simpler
to compute: with the Euclidean distance, point n3 would have been selected
as a reference point for p, and this is clearly wrong.
The 2D curvatures of point p according to its set of reference
points and the normal at p are also computed. For more details,
see [2][3].
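One way to approximate geodesic distances on a point-sampled surface is to run a shortest-path search on a graph linking neighbouring points. The sketch below uses Dijkstra's algorithm for that purpose; this is an assumption made for illustration and not necessarily the method used in [2]. The function names, the linking `cutoff` and the `tolerance` around the target distance are all hypothetical.

```python
import heapq
import math


def neighbour_graph(points, cutoff):
    """Link every pair of surface points closer than cutoff (O(n^2))."""
    graph = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d <= cutoff:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def geodesic_distances(graph, source, limit):
    """Dijkstra from source; paths longer than limit are not explored."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist.get(i, math.inf):
            continue  # stale heap entry
        for j, w in graph[i]:
            nd = d + w
            if nd <= limit and nd < dist.get(j, math.inf):
                dist[j] = nd
                heapq.heappush(heap, (nd, j))
    return dist


def reference_points(graph, source, target=8.0, tolerance=0.5):
    """Indices of points whose geodesic distance from source is near target."""
    dist = geodesic_distances(graph, source, target + tolerance)
    return [i for i, d in dist.items() if abs(d - target) <= tolerance]
```

On a hairpin-shaped set of points, two points on opposite arms can be 2 Å apart in space yet more than 20 Å apart along the surface, so only the geodesic criterion keeps the reference points on the same side as p.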
Direction of the Normal
At this stage, one does not know if the normal vector at each point is
pointing inside or outside the surface. If it points inside, then the point
lies on an inward-facing region that is potentially a cleft. The graphic
below illustrates the algorithm that we have developed. A line that originates
from the current point is followed in the direction of the normal at this
point. If this line cuts the surface an even number of times, the normal
is facing outward. Otherwise it is facing inward. This algorithm is numerical
and hence costly in CPU time. For more details, see [2][4].
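The parity test can be sketched numerically as follows. This is only an illustration under a simplifying assumption: the closed surface is taken to be the boundary of a union of spheres (VDW, or Solvent Accessible when `probe` is non-zero), so that a crossing can be detected as a change of the inside/outside state while stepping along the line. The names `inside` and `normal_points_inward` and the `step` parameter are hypothetical, and crossings thinner than `step` would be missed.

```python
import math


def inside(point, atoms, probe=0.0):
    """True if point lies inside at least one (expanded) atom sphere."""
    return any(math.dist(point, (x, y, z)) < r + probe
               for x, y, z, r in atoms)


def normal_points_inward(p, normal, atoms, probe=0.0, step=0.05):
    """Count surface crossings along the line from p in direction normal.

    Returns True for an odd number of crossings (normal facing inward)
    and False for an even number (normal facing outward).
    """
    length = math.sqrt(sum(c * c for c in normal))
    d = tuple(c / length for c in normal)
    # Beyond this distance the line is outside every sphere.
    reach = max(math.dist(p, (x, y, z)) + r + probe
                for x, y, z, r in atoms) + 2.0 * step
    t = step  # start just off the surface
    state = inside(tuple(p[k] + t * d[k] for k in range(3)), atoms, probe)
    crossings = 0
    while t < reach:
        t += step
        now = inside(tuple(p[k] + t * d[k] for k in range(3)), atoms, probe)
        if now != state:
            crossings += 1
            state = now
    return crossings % 2 == 1
```

The cost grows with the number of steps times the number of atoms for every surface point, which is consistent with the remark that the numerical approach is expensive.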
Examples
HIV Protease
The Solvent Accessible surface of HIV Protease was computed
from the protein contained in file 5HVP.PDB of the Protein Data Bank. The
point density is 2 points/Å² and the probe radius is 1 Å. CAnGAROO found
13 potential clefts in 5 min 30 s of CPU time on
a 100 MHz INDY. The largest of these 13 inward-facing regions is a known
HIV protease receptor site. A picture of this cleft is shown below, together
with a set of superimposed ligands. It should be emphasised
that the ligands were not used in the detection of the cleft.
Figure: HIV protease cleft with ligands
p21
The Solvent Accessible surface of the protein p21 was computed
in a similar way to the previous example, using the file 1Q21 from the
Protein Data Bank, but here the water molecules were not removed before
the computation. The point density is 2 points/Å² and the probe radius
is 1 Å. CAnGAROO found 17 potential clefts, the largest one being the GDP
binding site, in 5 min of CPU time on a 100 MHz INDY. The detected cleft
and GDP are shown below.
Figure: p21 with ligand (GDP)
References
[1] F.M. Richards, Ann. Rev. Biophys. Bioeng., 6 (1977) 151.
[2] D.M. Bayada, PhD Thesis, University of Leeds, UK, February 1994.