Graphs

Presenting data in an orderly manner often calls for a graphic display. Computer graphics programs have made this easier, but a good graph still requires the application of basic techniques.

The first consideration is whether a graph is needed at all and, if so, which type of graph to use. For accuracy, a well-constructed table of data usually provides more information than a graph. The values obtained and their variability are readily apparent in a table, and interpolation (reading values off the graph) is unnecessary. For visual impact, however, nothing is better than a graphic display.

There is a variety of graph types to choose from, e.g., line graphs, bar graphs, and pie graphs. Each of these has its own characteristics and subdivisions. One also has to decide upon single or multiple graphs, 2-dimensional or 3-dimensional displays, the presence or absence of error bars, and the aesthetics of the display. The latter include such details as legend bars, axis labels, titles, the symbols chosen to represent data, and the patterns used for bar graphs.

The Basics
In a 2-dimensional graph with 2 values (x and y), which value is x and which is y? The answer is always the same: the known value is the abscissa (x) value, and the value that is measured is the ordinate (y) value. For a standard curve of absorption in spectrophotometry, the known concentrations of the standards are placed on the x-axis, while the measured absorbance is placed on the y-axis. For measurements of the diameter of cells, the x-axis would be a micron scale, while the y-axis would be the number of cells with a given diameter.

Unless you are specifically attempting to demonstrate an inverted function, the scales should always be arranged with the lowest value on the left of the x-axis and the lowest value at the bottom of the y-axis. The range of each scale should be determined by the lowest and highest values of your data, with the scale rounded to the nearest ten, hundred, thousand, etc. That is, if the data range from 12 to 93, the scale should be from 10 to 100. It is not necessary to always start from 0, unless you wish to demonstrate the relationship of the data to this value (as in a spectrophotometric standard curve).

The number of intervals placed on the graph will be determined by the point you wish to make, but in general, one should use about 10 divisions of the scale. For our range of 12 to 93, an appropriate scale would be from 10 to 100 (or from 0 to 100 if the relationship to zero matters), with an interval of 10.

Placing smaller intervals on the scale does not convey more information, but merely adds a lot of confusing marks to the graph. The reader can estimate the values of 12 and 93 from such a scale without having every possible value ticked off.
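The rounding rule above can be sketched in a few lines of Python. This is only an illustration of the worked example (12 to 93), not a general-purpose axis routine: the function names axis_scale and tick_values are invented here, and the choice of the largest power of ten not exceeding the data span as the interval is an assumption. Data spans just above a power of ten would produce very few divisions and would need a finer interval.

```python
import math

def axis_scale(lowest, highest):
    """Return (axis_min, axis_max, interval) for data spanning lowest..highest."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    # Assumed rule: interval = largest power of ten not exceeding the data span.
    interval = 10 ** math.floor(math.log10(highest - lowest))
    # Round the lower limit down and the upper limit up to a whole interval.
    axis_min = math.floor(lowest / interval) * interval
    axis_max = math.ceil(highest / interval) * interval
    return axis_min, axis_max, interval

def tick_values(axis_min, axis_max, interval):
    """List the tick positions from axis_min to axis_max inclusive."""
    count = round((axis_max - axis_min) / interval)
    return [axis_min + i * interval for i in range(count + 1)]

axis_min, axis_max, interval = axis_scale(12, 93)
print(axis_min, axis_max, interval)
print(tick_values(axis_min, axis_max, interval))
```

For the 12 to 93 example this gives a scale from 10 to 100 in steps of 10, from which both extreme data values can still be estimated by eye.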

Line Graph vs. Bar Graph or Pie Graph
If the presentation is to highlight various data as a percentage of the total data, then a pie graph is ideal. Pie graphs might be used, for example, to demonstrate the composition of the white cell differential count. They are the most often used graph type for business, particularly for displaying budget details.

Pie graphs are circular presentations that are drawn by summing your data and computing the percent of the total for each data entry. These percent values are then converted to portions of a circle (by multiplying each fraction of the total by 360°, or equivalently each percent by 3.6°), and the appropriate arc of a circle is drawn to represent each one. By connecting the ends of each arc to the center point of the circle, the pie is divided into wedges, the sizes of which demonstrate the relative size of each data entry to the total. If one or more wedges are to be highlighted, those wedges can be drawn slightly out of the perimeter of the circle for what is referred to as an “exploded” view.
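The arithmetic behind the wedges can be sketched as follows. The function name pie_wedges is invented for this example, and the cell counts are made-up numbers for a hypothetical 100-cell differential, used only to show the calculation.

```python
def pie_wedges(counts):
    """Map each label to (percent of total, wedge angle in degrees)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("the data must sum to a positive total")
    wedges = {}
    for label, count in counts.items():
        percent = count * 100 / total   # share of the whole, in percent
        degrees = count * 360 / total   # same share, as part of a full circle
        wedges[label] = (percent, degrees)
    return wedges

# Invented counts, for illustration only (not measured data).
differential = {
    "neutrophils": 60,
    "lymphocytes": 30,
    "monocytes": 6,
    "eosinophils": 3,
    "basophils": 1,
}

for label, (percent, degrees) in pie_wedges(differential).items():
    print(f"{label:12s} {percent:5.1f}%  {degrees:6.1f} degrees")
```

Whatever the data, the wedge angles should sum to 360°, which is a quick check on the arithmetic.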

More typical of data presented in cell biology, however, are the line graph and the bar graph. There is no hard and fast rule for choosing between these graph types, except where the data are noncontinuous. Then, a bar graph must be used. In general, line graphs are used to demonstrate data that are related on a continuous scale, whereas bar graphs are used to demonstrate discontinuous or interval data.

Suppose, for example, that you decide to count the number of T-lymphocytes in 4 slices of tissue, one each from the thymus, Peyer’s patches, a lymph node, and a healing wound on the skin. Let’s label these T, P, L, and S, respectively. The numbers obtained per cubic centimeter of each tissue are T = 200, P = 150, L = 100, and S = 50. Note that there is a rather nice linear decrease in the numbers if T is placed on the left of an x-axis and S on the right. A line graph of these data would produce a nice straight line, with a statistical regression fit and slope. But look at the data! There is no reason to place T (or P, L, or S) to the right or left of any other point on the graph; the placement is totally arbitrary. A line graph for these data would be completely misleading, since it would imply that there is a linear decrease from the thymus to a skin injury and that there is some sort of quantitative relationship among the tissues. There is certainly a decrease, and a bar graph could demonstrate that fact by arranging the tissue types on the x-axis in an order that shows it, but there is no inherent quantitative relationship between the tissue types that would force one and only one graphic display. Certainly, the thymus is not 4 times some value of skin (although the numbers are).
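One way to see how arbitrary the placement is: fit a straight line to the same four counts in different left-to-right orders. The sketch below uses the T, P, L, and S values from the text; the helper name slope and the choice of positions 1 to 4 on the x-axis are assumptions made for the illustration.

```python
def slope(ys):
    """Least-squares slope of ys against positions 1, 2, 3, ... on the x-axis."""
    xs = range(1, len(ys) + 1)
    mean_x = sum(xs) / len(ys)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

counts = {"T": 200, "P": 150, "L": 100, "S": 50}  # values from the text

# The same data, with the tissues arranged in three different orders.
for order in ("TPLS", "TSPL", "SLPT"):
    print(order, slope([counts[tissue] for tissue in order]))
```

The fitted slope changes with the ordering and even reverses sign, so the "regression" describes only the order chosen by the person drawing the graph, not a property of the tissues.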

However, were you to plot the number of lymphocytes with increasing distance from the point of a wound in the skin, an entirely different presentation would be called for. Distance is a continuous variable. We may choose to collect the data in 1-mm intervals, or 1-cm intervals. The range is continuous from 0 to the limit of our measurements. That is, we may wish to measure the value at 1 mm, 1.2 mm, 1.23 mm, or 1.23445 mm. The important point is that the 2-mm position is twice as far from the wound as the 1-mm position. The values placed on the x-axis have a genuine quantitative relationship to one another. Therefore, a line graph would be appropriate, with the points connected by a single line. If we choose to ignore the 1.2 and 1.23 and round these down to a value of 1, then a bar graph would be more appropriate. This latter technique (dividing the data into appropriate intervals and plotting them as a bar graph) produces what is known as a histogram.
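The rounding-down step that turns continuous distances into histogram bars can be sketched like this. It assumes each number in the list is the distance (in millimeters) of one counted lymphocyte from the wound; the function name histogram_bins and the extra distances beyond those mentioned in the text are invented for the example.

```python
import math
from collections import Counter

def histogram_bins(distances_mm, width_mm=1.0):
    """Count measurements per interval; each key is the lower edge of its bin."""
    bins = Counter()
    for distance in distances_mm:
        # Round down to the start of the interval that contains this distance.
        lower_edge = math.floor(distance / width_mm) * width_mm
        bins[lower_edge] += 1
    return dict(sorted(bins.items()))

# 1, 1.2, 1.23, and 1.23445 come from the text; the rest are invented.
distances = [1, 1.2, 1.23, 1.23445, 2, 2.7, 3.1]
print(histogram_bins(distances))
```

Each key becomes one bar and each count its height. Choosing a different width_mm changes the number of bars but not the underlying data.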

Having decided that the data have been collected as a continuous series, and that they will be plotted on a linear graph, there are still decisions to be made. Should the data be placed on the graph as individual points with no lines connecting them (a scattergram)? Should a line be drawn between the points (known as a dot-to-dot)? Should the points be plotted, but curve smoothing be applied? If the latter, what type of smoothing?

There are many algorithms for curve fitting; the 2 most commonly used are linear regression and polynomial regression. It is important to decide which of these is appropriate before graphing the data.

Linear regression is used when there is good reason to suspect a linear relationship within the data (for example, in a spectrophotometric standard curve following the Beer-Lambert law). In general, the y-value can be calculated from the equation for a straight line, y = mx + b, where m is the slope and b is the y-intercept.
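A minimal sketch of an ordinary least-squares fit of y = mx + b is given below. The function name linear_fit is invented, and the concentrations and absorbances are hypothetical standards chosen only to show the calculation. Once m and b are known, the line can be inverted to estimate an unknown concentration from its absorbance.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # y-intercept
    return m, b

# Hypothetical standard curve (invented values, arbitrary units).
concentrations = [0, 2, 4, 6, 8]                # known values, x-axis
absorbances = [0.00, 0.11, 0.19, 0.31, 0.40]    # measured values, y-axis

m, b = linear_fit(concentrations, absorbances)
print(f"slope m = {m:.4f}, intercept b = {b:.4f}")

# Invert y = m*x + b to estimate the concentration of an unknown sample.
unknown_absorbance = 0.25
print(f"estimated concentration = {(unknown_absorbance - b) / m:.2f}")
```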

Computer programs for this can be very misleading. Any set of data can be entered into a program to calculate and plot a linear regression. It is important, however, that there be a valid reason for supposing linearity before using this function. The same is true of polynomial regression. This type of regression fits a curve containing increasing powers of x, that is, y = b0 + b1x + b2x^2 + ... + bnx^n, where the degree n is greater than 1. The mathematics of this can become quite complex, but the graphic displays often look better to the beginning student. It is important to note that the use of polynomial regression must be warranted by the relationship within the data, not by the preferences of the individual drawing the graph.
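The warning can be made concrete with a small experiment. The sketch below fits polynomials of increasing degree by solving the least-squares normal equations with Gauss-Jordan elimination. This is one textbook approach, written out here only to avoid depending on a numerical library, and it is not numerically robust for high degrees. The names polyfit, predict, and sum_squared_error, and the data values, are invented for illustration.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients [c0, c1, ..., c_degree].

    Needs at least degree + 1 distinct x-values, otherwise the system is singular.
    """
    n = degree + 1
    # Augmented normal equations: sum(x^(i+j)) * c_j = sum(y * x^i).
    a = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def predict(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def sum_squared_error(coeffs, xs, ys):
    """Sum of squared vertical distances between the data and the fitted curve."""
    return sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))

# Invented, roughly linear data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

for degree in (1, 2, 3):
    coeffs = polyfit(xs, ys, degree)
    print(degree, sum_squared_error(coeffs, xs, ys))
```

Apart from rounding error, the squared error cannot increase as terms are added, so a higher-degree curve will always appear to fit at least as well. That is a property of the arithmetic, not evidence that the data follow a polynomial relationship.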

For single sets of data, that is the extent of the available options. For multiple sets, the options increase. If the multiple sets are data collected at identical x-values, then error bars (standard deviation or standard error of the mean) can be added to the graph. Plots can also be made in which 2 lines are drawn, one connecting the highest y-values for each x and a second connecting the lowest values (the Hi-Lo graph). The area between the 2 lines presents a graphic depiction of variability at each x-value.
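The quantities needed for error bars and for a Hi-Lo graph can be computed per x-value as sketched below. The function name summarize and the replicate values are invented for the example. The standard deviation used is the sample standard deviation, and the standard error of the mean is taken as that value divided by the square root of the number of replicates.

```python
import math
import statistics

def summarize(replicates):
    """replicates maps each x-value to the list of y-values measured at that x."""
    summary = {}
    for x, ys in sorted(replicates.items()):
        sd = statistics.stdev(ys)  # sample standard deviation; needs >= 2 values
        summary[x] = {
            "mean": statistics.mean(ys),
            "sd": sd,
            "sem": sd / math.sqrt(len(ys)),  # standard error of the mean
            "hi": max(ys),                   # upper line of a Hi-Lo graph
            "lo": min(ys),                   # lower line of a Hi-Lo graph
        }
    return summary

# Invented replicate measurements at two x-values.
data = {1: [2, 4, 6], 2: [5, 7, 9, 11]}

for x, stats in summarize(data).items():
    print(x, stats)
```

Error bars would be drawn from mean - sd to mean + sd (or with sem in place of sd), while the Hi-Lo lines connect the "hi" values and the "lo" values across the x-axis.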

If the data collected involve 2 or more sets with a common x-axis, but varying y-axis (or values), then a multiple graph may be used. The rules for graphing apply to each set of data, with the following provision: keep the number of data sets on any single graph to an absolute minimum. It is far better to have 3 graphs, each with 3 lines (or bars), than to have a single graph with 9 lines. A graph that contains an excess of information (such as 9 lines) is usually ignored by the viewer (as are tables with extensive lists of data). For this same reason, all unnecessary clutter should be removed from the graph; e.g., grid marks on the graph are rarely useful.

Finally, it is possible to plot 2 variables, y and z, against a common value, x. This is done with a 3D graphics program. The same design rules apply to this type of graph, and its construction is best left to a computer graphics program. These graphs often look appealing with their hills and valleys, but rarely impart any more information than 2 separate 2D graphs. Perhaps the main reason is that people are familiar with 2-dimensional graphs, but have a more difficult time visually interpreting 3-dimensional graphs.