INTRODUCTION TO DATA ANALYSIS: PART 1
Data analysis
Data analysis is the process of inspecting, cleaning, transforming, and
modeling data with the goal of discovering useful information, suggesting
conclusions, and supporting decision-making. It has multiple facets and
approaches, encompassing diverse techniques under a variety of names in
different business, science, and social science domains.
Data mining is a particular data analysis technique that focuses on modeling
and knowledge discovery for predictive rather than purely descriptive
purposes. Business intelligence covers data analysis that relies heavily on
aggregation, focusing on business information. In statistical applications,
some people divide data analysis into descriptive statistics, exploratory
data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on
discovering new features in the data, and CDA on confirming or falsifying
existing hypotheses. Predictive analytics focuses on the application of
statistical or structural models for predictive forecasting or
classification, while text analytics applies statistical, linguistic, and
structural techniques to extract and classify information from textual
sources, a form of unstructured data. All are varieties of data analysis.
Data integration is a
precursor to data analysis, and data analysis is closely linked to data visualization and
data dissemination. The term data analysis is sometimes used
as a synonym for data modeling.
Initial data analysis
The most important distinction between the initial data analysis phase and
the main analysis phase is that during initial data analysis one refrains from
any analysis aimed at answering the original research question. The
initial data analysis phase is guided by the following four questions:
Quality of data
The quality of the data should be checked as early as possible. Data
quality can be assessed in several ways, using different types of analyses:
frequency counts, descriptive statistics (mean, standard deviation, median),
and normality checks (skewness, kurtosis, frequency histograms).
Other initial data quality checks include:
· Checking coding schemes: variables are compared with coding schemes of variables external to the data set, and possibly corrected if coding schemes are not comparable.
· Test for common-method variance
The choice of analyses to assess the data quality during the initial data
analysis phase depends on the analyses that will be conducted in the main
analysis phase.
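As a rough illustration of the checks listed above, the following Python sketch computes frequency counts, basic descriptive statistics, skewness and excess kurtosis using only the standard library. The scores and the describe helper are hypothetical and not part of any particular package; the value 99 is included to show how a miscoded or out-of-range entry shows up in these summaries.

```python
import statistics
from collections import Counter


def describe(values):
    """Return basic descriptive statistics plus skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments; skewness and kurtosis are built from these.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": statistics.stdev(values),        # sample standard deviation
        "skewness": m3 / m2 ** 1.5,            # 0 for a symmetric distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3,   # 0 for a normal distribution
    }


# Hypothetical scores; 99 looks like an out-of-range or miscoded value.
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 99]

frequencies = Counter(scores)   # frequency count per distinct value
summary = describe(scores)

print(sorted(frequencies.items()))
print(summary)
```

A mean far from the median, together with a large positive skewness, is the kind of signal that would prompt a closer look at individual values before the main analysis.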
Quality of measurements
The quality of the measurement instruments should only be checked during the
initial data analysis phase when this is not the focus or research question of
the study. One should check whether the structure of the measurement
instruments corresponds to the structure reported in the literature.
There are two ways to assess measurement quality:
· Confirmatory factor analysis
· Analysis of homogeneity (internal consistency), which gives an indication of the reliability of a measurement instrument. During this analysis, one inspects the variances of the items and the scales, the Cronbach's α of the scales, and the change in Cronbach's α when an item is deleted from a scale.
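The homogeneity analysis described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the scale totals) and complete data with no missing answers. The function names and the item scores are hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a scale; `items` is a list of per-item score lists."""
    k = len(items)
    item_variances = [statistics.variance(scores) for scores in items]
    # Scale score of each respondent = sum of that respondent's item scores.
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - sum(item_variances) / statistics.variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scale: 4 items answered by 6 respondents.
# The last item is deliberately unrelated to the other three.
scale = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 3, 4],
    [2, 5, 4, 1, 3, 2],
]

alpha = cronbach_alpha(scale)
if_deleted = alpha_if_item_deleted(scale)

print(f"alpha = {alpha:.2f}")
for index, value in enumerate(if_deleted, start=1):
    print(f"alpha if item {index} deleted = {value:.2f}")
```

An item whose removal raises alpha noticeably, like the last item in this made-up scale, is a candidate for closer inspection.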
Initial transformations
After assessing the quality of the data and of the measurements, one might
decide to impute missing data, or to perform initial transformations of one or
more variables, although this can also be done during the main analysis phase.
Possible transformations of variables are:
· Square root transformation (if the distribution differs moderately from normal)
· Log-transformation (if the distribution differs substantially from normal)
· Inverse transformation (if the distribution differs severely from normal)
· Make categorical (ordinal / dichotomous) (if the distribution differs severely from normal, and no transformations help)
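To see how the transformations listed above affect a distribution, the following Python sketch applies each one to a small right-skewed variable and reports the skewness before and after. The data and the skewness helper are hypothetical. The sketch also assumes strictly positive values, which the log and inverse transformations require.

```python
import math


def skewness(values):
    """Moment-based skewness; 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed, strictly positive variable.
raw = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]

transformed = {
    "raw": raw,
    "sqrt": [math.sqrt(x) for x in raw],
    "log": [math.log(x) for x in raw],    # requires x > 0
    "inverse": [1 / x for x in raw],      # requires x != 0; reverses the ordering
}

for name, values in transformed.items():
    print(f"{name:8s} skewness = {skewness(values):.2f}")
```

Which transformation works best depends on the data, so it is worth comparing several rather than assuming that a stronger one is always better.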
Did the implementation of the study fulfill the intentions of the research design?
One should check the success of the randomization procedure, for instance by checking whether
background and substantive variables are equally distributed within and across
groups.
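One simple way to check whether a background variable is equally distributed across randomized groups is to compare group means on a standardized scale. The Python sketch below is only an illustration: the ages, group names and helper function are hypothetical, and the pooled standard deviation used here assumes groups of equal size.

```python
import statistics


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation.

    Averaging the two variances assumes equally sized groups.
    """
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = ((var_a + var_b) / 2) ** 0.5
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd


# Hypothetical background variable (age) for two randomized groups.
age = {
    "treatment": [34, 29, 41, 38, 25, 33, 36, 30],
    "control": [31, 35, 28, 40, 27, 37, 32, 34],
}

smd = standardized_mean_difference(age["treatment"], age["control"])
print(f"standardized mean difference in age: {smd:.2f}")
```

Values close to zero suggest the groups are balanced on that variable. The same check would be repeated for each background and substantive variable of interest.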
If the study did not need and/or use a randomization procedure, one should check the success of the non-random sampling, for instance by checking whether all subgroups of the population of interest are represented in the sample.
Other possible data distortions that should be checked are:
· Dropout (this should be identified during the initial data analysis phase)
· Item nonresponse (whether this is random or not should be assessed during the initial data analysis phase)
· Treatment quality (using manipulation checks)
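Dropout and item nonresponse from the list above can be quantified with a short Python sketch. The records and field names are hypothetical, and treating a participant who answered no items as a dropout is an assumption made here for illustration. The sketch only counts missing answers; assessing whether nonresponse is random requires further analysis.

```python
# Hypothetical responses; None marks a missing answer.
responses = [
    {"id": 1, "q1": 4, "q2": 3, "q3": 5},
    {"id": 2, "q1": 2, "q2": None, "q3": 4},
    {"id": 3, "q1": None, "q2": None, "q3": None},
    {"id": 4, "q1": 5, "q2": 4, "q3": None},
    {"id": 5, "q1": 3, "q2": None, "q3": 2},
]
items = ["q1", "q2", "q3"]

# Assumption of this sketch: a participant with no answers at all is a dropout.
dropouts = [r["id"] for r in responses if all(r[i] is None for i in items)]
dropout_rate = len(dropouts) / len(responses)

# Item nonresponse is computed among the remaining participants only.
remaining = [r for r in responses if r["id"] not in dropouts]
nonresponse_rate = {
    i: sum(r[i] is None for r in remaining) / len(remaining) for i in items
}

print("dropouts:", dropouts, "rate:", dropout_rate)
print("item nonresponse:", nonresponse_rate)
```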
Characteristics of data sample
In any report or article, the structure of the sample must be accurately
described. It is especially important to exactly determine the structure of the
sample (and specifically the size of the subgroups) when subgroup analyses will
be performed during the main analysis phase.
The characteristics of the data sample can be assessed by looking at:
· Basic statistics of important variables
· Scatter plots
· Correlations and associations
· Cross-tabulations
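Subgroup sizes, a cross-tabulation and a correlation can be obtained with the Python standard library alone, as in the sketch below. The sample records, variable names and pearson_r helper are hypothetical and only meant to illustrate the idea.

```python
from collections import Counter


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


# Hypothetical sample: (group, sex, age, score) per participant.
sample = [
    ("treatment", "f", 34, 14.0),
    ("treatment", "m", 29, 11.5),
    ("treatment", "f", 41, 16.0),
    ("control", "m", 38, 15.0),
    ("control", "f", 25, 10.0),
    ("control", "m", 33, 13.5),
]

# Subgroup sizes and a group-by-sex cross-tabulation.
group_sizes = Counter(group for group, _, _, _ in sample)
crosstab = Counter((group, sex) for group, sex, _, _ in sample)

# Correlation between two numeric variables.
ages = [age for _, _, age, _ in sample]
scores = [score for _, _, _, score in sample]
r = pearson_r(ages, scores)

print(dict(group_sizes))
print(dict(crosstab))
print(f"correlation between age and score: {r:.2f}")
```

Knowing the exact cell counts in the cross-tabulation matters most when subgroup analyses are planned, since very small cells limit what those analyses can show.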