Statistics is a branch of mathematics dealing with the collection, analysis, presentation and explanation of data. The field began with the development of probability theory related to games of chance, the first treatment of which is attributed to Cardan in 1550. http://www.math.utep.edu/Faculty/mleung/probabilityandstatistics/beg.html The mathematical foundations of modern probability theory were developed through the cooperative efforts of Pierre de Fermat and Blaise Pascal in the 17th century, and the Dutch scientist Christiaan Huygens published the first book on the subject in 1657. http://math.truman.edu/~thammond/history/Huygens.html http://www.math.utep.edu/Faculty/mleung/probabilityandstatistics/beg.html The field of statistics has applications beyond determining probabilities in gambling games. The insurance industry uses statistics to develop predictive models of human survival for selling insurance and annuities, scientific investigations use it to study biological, behavioral and physical phenomena, and sports use it for data summaries. http://www.scribd.com/doc/3530536/Applications-of-Statistics The value of statistics in its various applications lies in its ability to describe data and in its predictive power for making decisions; the latter application is known as inferential statistics. http://faculty.vassar.edu/lowry/webtext.html
Additional information on this page includes scientists and mathematicians who were important in the development of modern statistics. There is also a glossary and a fast facts section to help explain some terms and concepts. A featured video discusses how to calculate various measures of central tendency for data sets. Links, notable quotes, blogs and news sections are also part of this page for further information.
Notable Figures in the Development of Statistics
- Thomas Bayes (1702-1761), developer of Bayes' theorem, which established the mathematical foundation for probability inference. http://www.morris.umn.edu/~sungurea/introstat/history/w98/Bayes.html http://www-history.mcs.st-and.ac.uk/Mathematicians/Bayes.html
- Pafnuty Chebyshev (1821-1894), Russian mathematician who developed what is known as Chebyshev's theorem, which bounds the probability that a value lies far from the mean even when the probability distribution of the population is unknown. http://www.saintmarys.edu/~psmith/345act13.html http://www-history.mcs.st-and.ac.uk/Biographies/Chebyshev.html
- Cardan, who in 1550 gave the first crude definition of probability. http://www.math.utep.edu/Faculty/mleung/probabilityandstatistics/beg.html
- Pierre de Fermat (1601-1665), who stated Fermat's last theorem and, with Blaise Pascal, laid the mathematical foundations of probability theory. http://www.math.utep.edu/Faculty/mleung/probabilityandstatistics/beg.html
- Andrey Kolmogorov (1903-1987), mathematician who published a monograph that laid the foundation for advanced probability theory and random processes.
- Blaise Pascal (1623-1662), French philosopher and mathematician who, with Pierre de Fermat, developed a mathematical model of probability. http://www.math.utep.edu/Faculty/mleung/probabilityandstatistics/beg.html
- Karl Pearson (1857-1936), applied statistics to biology and medicine. In 1900, he developed the chi-square test and popularized the concept that populations may have skewed distributions, as opposed to normal distributions. He also worked in the fields of heredity and evolution. http://www.morris.umn.edu/~sungurea/introstat/history/w98/Pearson.html http://www-history.mcs.st-and.ac.uk/Mathematicians/Pearson.html
- Charles Spearman (1863-1945), English psychologist who created Spearman's rank correlation coefficient and pioneered factor analysis. http://www.psych.cornell.edu/Darlington/factor.htm http://www.york.ac.uk/depts/maths/histstat/spearman_biog.htm
- Ronald Aylmer Fisher (1890-1962), developed analysis of variance (ANOVA) and made important contributions regarding experimental design. Other contributions included the development of extreme value theory and of P-values for determining the reliability of statistical predictions. His work demonstrated that "uncertainty may be capable of precise quantitative assessment." http://www.morris.umn.edu/~sungurea/introstat/history/w98/RAFisher.html http://scienceworld.wolfram.com/biography/FisherRonald.html
- John Wilder Tukey (1915-2000), developed methods for robust data analysis, time series analysis methods, graphical methods for exploratory data analysis (stem-and-leaf diagrams and box-and-whisker plots), and paired and multiple comparisons and, in cooperative work with James Cooley, developed what is known as the Fast Fourier Transform. http://www.morris.umn.edu/~sungurea/introstat/history/w98/Tukey.html http://www.swlearning.com/quant/kohler/stat/biographical_sketches/bio15.1.html
- Christiaan Huygens (1629-1695), Dutch scientist and mathematician who wrote the first book on probability theory, "The Value of all Chances in Games of Fortune." This theory was applied to vital statistics for humans, which led to its use in annuities. http://www.surveyor.in-berlin.de/himmel/Bios/Huygens-e.html http://math.truman.edu/~thammond/history/Huygens.html
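As a rough illustration of two results named in the list above, the following Python sketch applies Bayes' theorem and the bound given by Chebyshev's theorem. The function names and the numbers used (a 1% prior, a 95% detection rate and a 5% false-positive rate) are hypothetical and were chosen only for this example.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of observing the evidence E.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


def chebyshev_bound(k):
    """Upper bound on P(|X - mean| >= k standard deviations), for k > 0.

    The bound 1 / k**2 holds for any distribution with a finite variance.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k**2)


# Hypothetical screening test: 1% prior, 95% detection, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))   # roughly 0.161

# At most 25% of any distribution lies 2 or more standard deviations from the mean.
print(chebyshev_bound(2))    # 0.25
```

The posterior of about 16% shows why the prior matters: even an accurate test gives mostly false alarms when the condition is rare.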
Glossary of Terms Used in Statistics
- Correlation: A measure of the strength and direction of the relationship between two variables, x and y. It is closely related to regression analysis. Correlations can be either positive or negative.
- Distribution: A representation of the data in a set, usually presented graphically as a histogram or as a scatter plot.
- Estimate: An approximation of an unknown value, such as a population parameter, calculated from known data values
- Experiment: A controlled procedure carried out to collect data for study
- Experimental unit: The unit of data that is sampled and studied
- Mean: A measure of central tendency of the data, also known as an average. There are arithmetic, geometric and harmonic means.
- Median: The middle number in an ordered data set. If the set contains an even number of values, the middle two numbers are averaged to arrive at this number.
- Mode: The number which occurs most frequently in a set of numbers. A measure of central tendency in a data set.
- Outliers: Data points that do not seem representative of a data set because their values are unusually large or small. They are sometimes caused by measurement errors.
- Parameter: A constant value used to signify a characteristic of the population. Examples are measures of central tendency or variance.
- Population: The complete group of objects from which data are drawn
- Sample: A portion of the population that is used to estimate characteristics of the entire population
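- Sampling distribution: The distribution of the values that a statistic, such as the sample mean, takes across repeated samples from the same population
- Statistic: A numerical value, such as a mean or variance, calculated from sample data
- Statistical inference: The process of drawing conclusions about a population from the analysis of sample data
- Variance: A mathematical measure of the variability within a data set, calculated as the average squared deviation from the mean.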
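Several of the glossary terms above can be computed directly. The sketch below uses Python's standard statistics module on a small made-up data set; the helper pearson_r and all of the numbers are assumptions introduced here for illustration and are not part of the original page.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 8.0]   # made-up values

# The three kinds of mean named in the glossary.
arithmetic = statistics.mean(data)            # (2 + 4 + 4 + 8) / 4
geometric = statistics.geometric_mean(data)   # 4th root of 2 * 4 * 4 * 8
harmonic = statistics.harmonic_mean(data)     # 4 / (1/2 + 1/4 + 1/4 + 1/8)

# Variance: a population parameter when the data are the whole population,
# or a sample statistic used to estimate that parameter.
population_variance = statistics.pvariance(data)   # divides by n
sample_variance = statistics.variance(data)        # divides by n - 1


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


xs = [1, 2, 3, 4]
positive = pearson_r(xs, [2, 4, 6, 8])   # close to +1: y rises with x
negative = pearson_r(xs, [8, 6, 4, 2])   # close to -1: y falls as x rises

print(arithmetic, population_variance)   # 4.5 4.75
```

For positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean; this data set follows that ordering.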
Quotes About Statistics
- He uses statistics as a drunken man uses lampposts - for support rather than for illumination. - Andrew Lang
- Then there is the man who drowned crossing a stream with an average depth of six inches. - W.I.E. Gates
- Satan delights equally in statistics and in quoting scripture. - H.G. Wells, The Undying Fire
- The theory of probabilities is at bottom nothing but common sense reduced to calculus. - Pierre-Simon Laplace, Théorie analytique des probabilités, 1820
- I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live. - Louis D. Brandeis
Measures of Central Tendency in a Data Set
There are two forms of statistical representation of data, descriptive and inferential. Descriptive representations summarize a collection of data. Inferential representations use data from samples to draw conclusions or make predictions about entire populations. In descriptive statistics, the measures of central tendency, which indicate where the middle of a data set lies, are important; they include the mean (arithmetic average), the median and the mode. The presenter in the featured video describes how to find each of these measures for a given data set and how outlier data affect the proper choice of a measure of central tendency.
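The effect of an outlier on each measure can be seen in a short Python sketch. The data set and the helper name summarize are made up for illustration and are not taken from the video.

```python
import statistics


def summarize(values):
    """Return the three common measures of central tendency for a data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


# Made-up values, for example salaries in thousands.
typical = [30, 32, 32, 35, 38]
with_outlier = typical + [400]   # one unusually large value

print(summarize(typical))        # {'mean': 33.4, 'median': 32, 'mode': 32}
print(summarize(with_outlier))   # {'mean': 94.5, 'median': 33.5, 'mode': 32}
```

A single extreme value nearly triples the mean while the median and mode barely move, which is why the median is often preferred for data sets that contain outliers.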
Featured Video
Statistics Theories
Wikipedia: Statistical Theory
IOP: Statistical theories of atomic transport in crystalline solids
Matthew R. Watkins: Statistical Theory of Numbers
Vanderbilt Medical Center: From Batting Averages to the Bard, Statistical Theories Apply