We often come across the word statistic when we are studying statistics, and lay people frequently confuse the two words. So, what do statistics and statistic mean? The word “statistics” is derived from the Latin word status or the Italian word statista, both of which refer to the political state of a country. In the early years of its development, statistics had the connotation of a collection of facts about a country or its citizens gathered for administrative and political purposes. In order to govern well, the government collected data on the condition of the country’s population. This activity was called a census. At that time, statistics meant “the science of the state”. In subsequent developments, however, statistics has been used not only to study the condition of countries’ populations but far more widely. Sir R. A. Fisher viewed statistics as “the mathematics applied to observational data”.
What is the definition of statistics? (Please note that, as the name of a field of study, the word statistics is singular.) Authors of books on statistics have provided various definitions. To mention just two of them: Spiegel (1981), a mathematics professor at Rensselaer Polytechnic Institute, describes statistics in his book “Theory and Problems of Statistics” as “the scientific methods for collecting, organizing, summarizing, presenting and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.” Lind (1999) notes that statistics is “the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions”.
Although various definitions of statistics have been given in the literature, they have some aspects in common. Statistics, as explained by Spiegel (1981), comprises two phases, namely descriptive statistics and inferential (or inductive) statistics. The former only describes the set of data under study without drawing conclusions about the “larger” set of data that contains it. Inferential statistics, on the other hand, attempts to draw conclusions about the population based on the results of sampling. Because sampling involves chance, there is always some probability of drawing a false conclusion. Inferential statistics therefore relies on probability theory.
Now, what are the meanings of population and sample? These terms often come up when we are talking about methods of collecting data. In general, there are two methods by which we gather data, namely census and sampling. Suppose that you are asked to find the average height of all students enrolled in a university. The first way to accomplish the task is to get a complete list of the students registered in the university, measure the height of every student on the list (with no exception), and then calculate the average height. This method is called a census. Alternatively, we can measure the height of only a small number of students on the list. Then, based on the data in hand (and after doing some calculation), we draw a conclusion about the average height of all students enrolled in the university. This method is called sampling. Obviously, a census requires more time and resources than sampling. Lind explains population and sample as follows: “Population is the entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest. On the other hand, sample is a portion, or part, of the population of interest.” In the example above, the population is all the students enrolled in the university. Once the heights of all the students have been gathered, that set of measurements can also be called the population. Now, suppose that sampling has been carried out and that, among all the registered students in the university (let’s say 10,000 students), we have measured the height of only 100 students. The 100 heights obtained from this measurement are the sample data.
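The difference between a census and sampling in the height example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 10,000 heights are synthetic values drawn from a normal distribution with a mean of 165 cm and a standard deviation of 8 cm, and the function names and random seeds are hypothetical choices, not part of the original example.

```python
import random
import statistics


def make_population(size=10_000, seed=0):
    """Create synthetic heights (cm) for all enrolled students.

    Assumption for illustration only: heights are normal(165, 8).
    """
    rng = random.Random(seed)
    return [rng.gauss(165.0, 8.0) for _ in range(size)]


def census_mean(population):
    """Census: use the height of every student, with no exception."""
    return statistics.fmean(population)


def sample_mean(population, n=100, seed=1):
    """Sampling: measure only n students chosen at random without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    return statistics.fmean(sample)


population = make_population()
print("census average height:", round(census_mean(population), 2))
print("sample average height:", round(sample_mean(population), 2))
```

The two printed averages are usually close but not identical, because the sample contains only 100 of the 10,000 students.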
Let’s look at another example. Suppose that 10,000 units were produced in a manufacturing process this morning, and we want to know the proportion of defective units. With the census method, all the units have to be inspected. If 10 units are found to be defective, we conclude that the proportion of defective units is $10/10{,}000 = 0.001$. This number is the population proportion because we obtained it after collecting data from the entire population, i.e. all the units produced this morning. Now suppose that, instead of inspecting all the units produced this morning, we decide to apply the sampling method. For example, among the 10,000 units produced this morning, only 400 were inspected and 8 of them turned out to be defective. From this sampling result, the proportion of defective units is $8/400 = 0.02$. This number is a sample proportion because it was calculated from the sampling result. It is easily seen that the population proportion differs from the sample proportion. This can happen because chance plays a part in the sampling process. Inferential statistics tells us how to infer a population’s parameters from sampling results. This is “the magic” of statistics.
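The arithmetic in the defective-units example, and the role of chance in sampling, can be sketched as follows. The figures 10, 10,000, 8 and 400 come from the example; the simulation function, its parameter names, and the seeds are hypothetical and serve only to show that different samples from the same population give different proportions.

```python
import random


def proportion(defective, inspected):
    """Proportion of defective units among the units inspected."""
    return defective / inspected


# Figures from the example in the text.
population_proportion = proportion(10, 10_000)  # census: every unit inspected
sample_proportion = proportion(8, 400)          # sampling: only 400 units inspected


def simulate_sample_proportion(n_units=10_000, n_defective=10,
                               n_inspected=400, seed=0):
    """Draw one random sample without replacement and return its defect proportion.

    Units are coded 1 (defective) or 0 (good); the seed is an arbitrary choice.
    """
    rng = random.Random(seed)
    units = [1] * n_defective + [0] * (n_units - n_defective)
    inspected = rng.sample(units, n_inspected)
    return proportion(sum(inspected), n_inspected)


print("population proportion:", population_proportion)
print("sample proportion:", sample_proportion)
# Several simulated samples from the same population, one per seed.
print([simulate_sample_proportion(seed=s) for s in range(5)])
```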
We have discussed the meanings of census, sampling, population, and sample. These four terms are often found in the world of statistics. We are left with the task of discussing the difference between statistics and statistic. The former has already been elaborated on. Now, what is a statistic? Theoretically, a statistic is any function of the random variables constituting a random sample. For example, to find the average height of all the students in our first example, we carried out sampling by collecting data on the height of 100 students. Suppose that the height of the i-th student was $x_i$. Then the statistic used to estimate the average height of all students is the sample mean $\bar{x} = \frac{1}{100}\sum_{i=1}^{100} x_i$. From this statistic (and after doing some calculation) we can estimate the desired average. In the second example, the statistic used to estimate the proportion of defective units is $p = x/n$, where $x$ is the number of defective units found in the sample and $n$ is the number of units inspected. Once the statistic $p$ is known, we can (after doing some calculation) infer the population proportion of defective units.
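The idea that a statistic is a function of the sample can be made concrete with a short Python sketch. The two functions below correspond to the sample mean and the sample proportion from the two examples; the function names and the small data sets are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def sample_mean(heights):
    """Statistic for the first example: the average of the sampled heights."""
    return fmean(heights)


def sample_proportion(defect_flags):
    """Statistic for the second example: defective units found / units inspected.

    Each flag is 1 for a defective unit and 0 for a good one.
    """
    return sum(defect_flags) / len(defect_flags)


# Hypothetical sample data, not taken from the text.
heights_cm = [158.0, 162.5, 170.0, 171.5, 168.0]
defect_flags = [0, 0, 1, 0, 0, 0, 0, 0]

print("sample mean height:", sample_mean(heights_cm))
print("sample proportion defective:", sample_proportion(defect_flags))
```

Each function takes only the sample as input, so a different sample generally yields a different value of the statistic.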
References
Lind, D. A., W. G. Marchal, and S. A. Wathen, Statistical Techniques in Business and Economics, 10th Ed., McGraw-Hill Irwin, 1999.
Spiegel, M. R., Theory and Problems of Statistics, McGraw-Hill Inc., 1981.