Telescopes, microscopes, cameras, and many other optical instruments share a common design that has not changed since the days of Galileo.  Rays of light enter one end of a system, hit a lens or mirror, change direction, travel a certain distance, then encounter another lens or mirror, change direction again, and so on.  Eventually the rays hit a film or photodetector and are recorded by chemical or electronic means.


Figuring out how all the rays of light make their way through the optical device is a complicated task, but it can be efficiently formulated using matrices.  We can then use our knowledge of matrices to work out practical things like image locations and magnifications.  The matrix formulation assumes that the rays are all close to being parallel to a given optical axis running the length of the device, and that all lenses are “thin” compared with the distances between lenses.  We’ll pretend there are no mirrors or prisms that change the direction of the optical axis by large angles.


When we talk of a ray, we mean a path that is a straight line between lenses, and which can change direction when passing through a lens.  If it is at height Yl with slope Sl at the left end of a lensless interval of length d, then at the right end of that interval it will still have the same slope Sr = Sl, but the elevation will have changed by d*Sl, i.e. Yr = Yl + d*Sl.  In matrix form,

$$\begin{pmatrix} Y_r \\ S_r \end{pmatrix} = \begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix} \begin{pmatrix} Y_l \\ S_l \end{pmatrix}.$$

Conversely, when a ray goes through a lens its elevation stays the same, Yr = Yl, but its slope changes by an amount proportional to the elevation: Sr = Sl - Yl/f, or

$$\begin{pmatrix} Y_r \\ S_r \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix} \begin{pmatrix} Y_l \\ S_l \end{pmatrix}.$$

The quantity f is the focal length of the lens, and it is typically positive for a convex lens and negative for a concave lens (provided the index of refraction is greater than 1!).

You can already see how the propagation and refraction relations are conveniently expressed in terms of matrices.  But the true power of the matrix formulation only appears when you start putting multiple lenses along the axis at various intervals.  The properties of such an optical system can be deduced by simply multiplying the individual matrices (in the proper order) corresponding to the individual intervals and lenses.  Interestingly enough, we tend to draw optical paths with the rays propagating from left to right, but the matrices that represent the optical components have to be multiplied in the opposite order, right to left: the matrix for the first component the ray meets sits at the far right of the product, next to the ray vector it acts on.
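
The ordering rule is easy to check numerically.  What follows is a minimal sketch in plain Python, not part of the original notes; the helper names (propagation, thin_lens, matmul, apply) and the focal length f = 2 are made up for illustration.  It sends a ray parallel to the axis through a lens and then across a gap of one focal length, and compares the correct product with the reversed one.

```python
# 2x2 ray-transfer matrices stored as nested tuples; a ray is a pair (Y, S).
# All names here are illustrative assumptions, not taken from the notes.

def propagation(d):
    """Lensless interval of length d: Yr = Yl + d*Sl, Sr = Sl."""
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    """Thin lens of focal length f: Yr = Yl, Sr = Sl - Yl/f."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))

def matmul(m, n):
    """Matrix product m*n of two 2x2 matrices."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def apply(m, ray):
    """Apply a 2x2 matrix to a ray (Y, S)."""
    y, s = ray
    return (m[0][0] * y + m[0][1] * s, m[1][0] * y + m[1][1] * s)

f = 2.0  # assumed focal length for the example

# The ray meets the lens first and the gap second, so the lens matrix
# goes on the right of the product.
system = matmul(propagation(f), thin_lens(f))

# Reversed order describes a different device: gap first, then lens.
reversed_system = matmul(thin_lens(f), propagation(f))

# A ray parallel to the axis at height 1 should cross the axis
# one focal length behind the lens.
y_out, s_out = apply(system, (1.0, 0.0))
print(y_out, s_out)                       # 0.0 -0.5
print(apply(reversed_system, (1.0, 0.0)))  # (1.0, -0.5)
```

With the correct order the ray arrives at height 0, the focal point.  With the reversed order it is still at height 1, because that product describes a ray that crosses the gap before it ever reaches the lens.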

As a first example, we shall consider the case of a single convex lens with a point light source a distance a to the left and a screen at distance b to the right.


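At the source all the rays have the same value of Y (call it Y0), while the slope S0 can take a range of values.  (Mathematically S0 can be any positive or negative number; practically there is a finite range for which the ray actually hits a lens of finite diameter.)  Propagation to the left-hand side of the lens is given by

$$\begin{pmatrix} Y_1 \\ S_1 \end{pmatrix} = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} Y_0 \\ S_0 \end{pmatrix},$$

then passing through the lens by

$$\begin{pmatrix} Y_2 \\ S_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix} \begin{pmatrix} Y_1 \\ S_1 \end{pmatrix},$$

then propagating to the right by

$$\begin{pmatrix} Y_3 \\ S_3 \end{pmatrix} = \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} \begin{pmatrix} Y_2 \\ S_2 \end{pmatrix}.$$

Or, all together:

$$\begin{pmatrix} Y_3 \\ S_3 \end{pmatrix} = \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix} \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} Y_0 \\ S_0 \end{pmatrix} = \begin{pmatrix} 1 - b/f & a + b - ab/f \\ -1/f & 1 - a/f \end{pmatrix} \begin{pmatrix} Y_0 \\ S_0 \end{pmatrix}.$$
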
Now the way I drew it, the rays that came out of the light source converge again at a single point on the screen; in other words, the height Y3 at which the rays hit the screen does not depend on the slope S0 with which they left the light source.  Looking at the multiplied-out matrix, you see that this can happen only if the upper-right entry is zero, i.e. a + b = ab/f.  You may be more familiar with this formula as 1/a + 1/b = 1/f, relating the object distance a, the image distance b, and the lens focal length f.  (I started this problem imagining only a single light source, but in fact this argument applies to every single light-emitting point in a complicated real object.)  When this relation is satisfied, you can read off Y3 = (1 – b/f)Y0, which indicates that the magnification of the image is (1 – b/f).  This factor can be either positive or negative; a negative value means that the image is inverted, i.e. upside-down.
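
To make the imaging condition concrete, here is a small sketch that builds the system matrix with exact rational arithmetic and checks both claims.  It is only an illustration: the helper names (propagation, thin_lens, matmul, image_distance) and the numbers a = 30, f = 10 are assumptions made for this example, not values from the notes.

```python
from fractions import Fraction

# Exact 2x2 ray-transfer matrices as nested tuples of Fractions.
# Helper names and sample numbers are illustrative assumptions.

def propagation(d):
    """Lensless interval of length d."""
    return ((Fraction(1), Fraction(d)), (Fraction(0), Fraction(1)))

def thin_lens(f):
    """Thin lens of focal length f."""
    return ((Fraction(1), Fraction(0)), (-1 / Fraction(f), Fraction(1)))

def matmul(m, n):
    """Matrix product m*n of two 2x2 matrices."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def image_distance(a, f):
    """Solve a + b = a*b/f for b (requires a != f)."""
    a, f = Fraction(a), Fraction(f)
    return a * f / (a - f)

a, f = 30, 10                 # assumed object distance and focal length
b = image_distance(a, f)      # screen position that gives a sharp image

# Source -> gap a -> lens -> gap b, multiplied right to left.
system = matmul(propagation(b), matmul(thin_lens(f), propagation(a)))

upper_right = system[0][1]    # vanishes exactly when the image is in focus
magnification = system[0][0]  # equals 1 - b/f

print(b, upper_right, magnification)  # 15 0 -1/2
```

For these numbers the image is inverted and half the size of the object.  Substituting 1/f = 1/a + 1/b into 1 – b/f shows that the magnification can also be written as –b/a, which matches –15/30 here.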


I have more examples, but I’ll let you work them out as homework.