INTRODUCTION TO DESCRIPTIVE STATISTICS
Descriptive statistics is the discipline of quantitatively describing the main
features of a collection of data, or the quantitative description itself.
Descriptive statistics are distinguished from inferential statistics (or inductive
statistics) in that descriptive statistics aim to summarize a sample, rather than
use the data to learn about the population that the sample is thought to represent.
This generally means that descriptive statistics, unlike inferential statistics,
are not developed on the basis of probability theory. Even when a data analysis
draws its main conclusions using inferential statistics, descriptive statistics are
generally also presented. For example, in a paper reporting on a study involving
human subjects, there typically appears a table giving the overall sample size,
sample sizes in important subgroups (e.g., for each treatment or exposure group),
and demographic or clinical characteristics such as the average age, the proportion
of subjects of each sex, and the proportion of subjects with related comorbidities.
Some measures that are commonly used to describe a data set are measures of central
tendency and measures of variability or dispersion. Measures of central tendency
include the mean, median and mode, while measures of variability include the
standard deviation (or variance) and the minimum and maximum values of the
variables. Skewness and kurtosis describe the shape of the distribution.
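As a rough illustration of these measures, the sketch below uses Python's standard
`statistics` module on a small sample. The `ages` values are invented for the
example and do not come from any real study.

```python
import statistics

# Hypothetical sample: ages of eight study participants (made-up values).
ages = [23, 25, 25, 29, 31, 34, 38, 51]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    # Measures of variability (sample versions, dividing by n - 1)
    "stdev": statistics.stdev(ages),
    "variance": statistics.variance(ages),
    "min": min(ages),
    "max": max(ages),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The sample (n - 1) versions of the variance and standard deviation are used here
because the data are treated as a sample; `statistics.pvariance` and
`statistics.pstdev` would apply if the data were the whole population.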
Use in statistical analysis
Descriptive statistics provide simple summaries about the sample and about the
observations that have been made. Such summaries may be either quantitative, i.e.
summary statistics, or visual, i.e. simple-to-understand graphs. These summaries
may either form the basis of the initial description of the data as part of a more
extensive statistical analysis, or they may be sufficient in and of themselves for
a particular investigation.
For example, the shooting percentage in basketball is a descriptive statistic that
summarizes the performance of a player or a team. It is the number of shots made
divided by the number of shots taken. A player who shoots 33%, for instance, is
making approximately one shot in every three. The percentage summarizes or
describes multiple discrete events.
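The calculation can be sketched in a few lines of Python. The function name
`shooting_percentage` and the list of shots are hypothetical, chosen only to show
how many discrete events collapse into one number.

```python
def shooting_percentage(made: int, attempted: int) -> float:
    """Return shots made as a percentage of shots attempted."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    if not 0 <= made <= attempted:
        raise ValueError("made must be between 0 and attempted")
    return 100.0 * made / attempted


# Hypothetical shot log for one player: True means the shot was made.
shots = [True, False, False, True, False, False, True, False, False]

# sum() counts the True values, len() counts all attempts.
pct = shooting_percentage(sum(shots), len(shots))
print(f"{pct:.1f}%")
```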
Consider also the grade point average. This single number describes the general
performance of a student across the range of their course experiences.
The use of descriptive and summary statistics has an extensive history and, indeed,
the simple tabulation of populations and of economic data was the first form in
which the topic of statistics appeared. More recently, a collection of
summarisation techniques has been formulated under the heading of exploratory data
analysis; an example of such a technique is the box plot.
In the business world, descriptive statistics provide a useful summary of security
returns when researchers perform empirical and analytical analyses, as they give a
historical account of return behavior.
Univariate analysis
Univariate analysis involves describing the distribution of a single variable,
including its central tendency (such as the mean, median, and mode) and dispersion
(the range and quantiles of the data set, together with measures of spread such as
the variance and standard deviation). The shape of the distribution may also be
described via indices such as skewness and kurtosis. Characteristics of a
variable's distribution may also be depicted in graphical or tabular format,
including histograms and stem-and-leaf displays.
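The dispersion and shape indices mentioned above can be sketched as follows. The
helper functions and the `values` sample are hypothetical. Skewness and kurtosis
are computed here from population central moments, which is only one of several
conventions in use; other software may apply small-sample corrections and give
slightly different numbers.

```python
import statistics


def central_moment(data, k):
    """k-th central moment, dividing by n (population convention)."""
    mu = statistics.fmean(data)
    return sum((x - mu) ** k for x in data) / len(data)


def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (0 for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


# Hypothetical right-skewed sample with one large value.
values = [2, 4, 4, 5, 7, 9, 25]

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(values, n=4)

print("range:", max(values) - min(values))
print("quartiles:", q1, q2, q3)
print("interquartile range:", q3 - q1)
print("skewness:", round(skewness(values), 3))
print("excess kurtosis:", round(excess_kurtosis(values), 3))
```

A positive skewness here reflects the long right tail created by the single large
value.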
Bivariate analysis
When a sample consists of more than one variable, descriptive statistics may be
used to describe the relationship between pairs of variables. In this case,
descriptive statistics include:
· Cross-tabulations and contingency tables
· Graphical representation via scatterplots
· Quantitative measures of dependence
· Descriptions of conditional distributions
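As a minimal sketch of the first and last items in this list, the following Python
builds a contingency table with `collections.Counter` and derives a conditional
distribution from it. The group and outcome labels, the counts, and the helper
`conditional_distribution` are all invented for illustration.

```python
from collections import Counter

# Hypothetical (group, outcome) pairs for eight subjects.
records = (
    [("treatment", "improved")] * 3
    + [("treatment", "unchanged")] * 1
    + [("control", "improved")] * 1
    + [("control", "unchanged")] * 3
)

# Cross-tabulation: count how often each (group, outcome) pair occurs.
counts = Counter(records)
groups = sorted({group for group, _ in records})
outcomes = sorted({outcome for _, outcome in records})


def conditional_distribution(group):
    """Distribution of outcome given the group (row proportions)."""
    total = sum(counts[(group, outcome)] for outcome in outcomes)
    return {outcome: counts[(group, outcome)] / total for outcome in outcomes}


# Print the contingency table, one row per group.
print(f"{'':<12}" + "".join(f"{o:>12}" for o in outcomes))
for g in groups:
    print(f"{g:<12}" + "".join(f"{counts[(g, o)]:>12}" for o in outcomes))

# Print the conditional distribution of outcome within each group.
for g in groups:
    print(g, conditional_distribution(g))
```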
The main reason for differentiating univariate and bivariate analysis is that
bivariate analysis is not only simple descriptive analysis; it also describes the
relationship between two different variables.
Quantitative measures of dependence include correlation (such as Pearson's r when
both variables are continuous, or Spearman's rho if one or both are not) and
covariance (which reflects the scale the variables are measured on). The slope, in
regression analysis, also reflects the relationship between variables. The
unstandardised slope indicates the unit change in the criterion variable for a
one-unit change in the predictor. The standardised slope indicates this change in
standardised (z-score) units.
Highly skewed data are often transformed by taking logarithms. The use of
logarithms makes graphs more symmetrical and more similar in appearance to the
normal distribution, making them easier to interpret intuitively.
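The dependence measures described in this section (covariance, Pearson's r,
Spearman's rho, and the regression slope) can be illustrated with a short
pure-Python sketch. All function names and the `hours` and `score` data are
hypothetical; the formulas are the usual sample versions, with Spearman's rho
computed as Pearson's r on average ranks.

```python
import statistics


def covariance(x, y):
    """Sample covariance (divides by n - 1); depends on the units of x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


def slope(x, y):
    """Unstandardised slope of the least-squares line of y on x."""
    return covariance(x, y) / statistics.variance(x)


# Hypothetical data: hours studied (predictor) and exam score (criterion).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 90]

b = slope(hours, score)
# Standardised slope: the same change expressed in z-score units.
beta = b * statistics.stdev(hours) / statistics.stdev(score)

print("covariance:", covariance(hours, score))
print("Pearson r:", round(pearson_r(hours, score), 3))
print("Spearman rho:", round(spearman_rho(hours, score), 3))
print("unstandardised slope:", round(b, 3))
print("standardised slope:", round(beta, 3))
```

With a single predictor, the standardised slope coincides with Pearson's r.
Re-expressing `hours` in minutes would multiply the covariance by 60 but leave both
correlations unchanged, which is what is meant by covariance reflecting the
measurement scale.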
Tags: DESCRIPTIVE STATISTICS
