Understanding Design and Analysis of Research Experiments - Statistical Analysis of Experimental Data

Types of experimental data
Data analysis is the application of one or more statistical techniques to a set of data collected from one of three types of research projects: (1) designed experiments, (2) sample surveys and (3) observational studies. Let us briefly describe these three types of research data. In designed experiments, some form of treatment is applied to experimental units and responses are observed. For example, in a food processing experiment, portions of fruit, such as slices of apples, are treated with different preservatives and the shelf lives of the portions are determined. Most experiments in agri-food research are designed experiments because researchers often want to determine the effects of different treatments.

Survey and observational data also occur in agricultural research. In sample surveys, data are collected on units according to a plan, called a survey design, but no treatments are applied to the units. For example, in a geographic survey of pathogenic variation of wheat stem rust in western Canada, isolates of the rust (the survey units) are collected from different regions in western Canada, and disease severity is measured after the isolates are inoculated onto wheat seedlings. In observational studies, data are collected on units that are available, rather than on units chosen according to a plan. An example is a study at a veterinary clinic in which dogs that enter the clinic are diagnosed according to their skin condition, and blood samples are drawn for measurement of trace elements.

Types of variables
Any research experiment involves measuring or characterizing variables whose values can be used to assess the effects of different treatments or, in a survey, the inherent attributes of the research materials. There are roughly two classes of variables in agricultural research:

Measurement variables;
Categorical variables.

Measurement variables are all those whose differing states can be expressed in a numerically ordered fashion. There are two types of measurement variables: continuous and discontinuous variables. Continuous variables are those which, at least theoretically, can assume an infinite number of values between any two fixed points. For example, between wheat seedling lengths of 2.5 and 2.6 cm there are infinitely many possible lengths, such as 2.57 cm. Agricultural researchers frequently work with this type of measurement variable. Examples include plant height, growth areas, grain volume, body weight of animals, body temperature and growth rate. Discontinuous variables, also known as meristic or discrete variables, have only certain fixed numerical values, with no intermediate values possible in between. For example, the number of segments in a certain insect appendage may be 4 or 5 or 6 but never 4.3 or 5.5. Other examples are the number of piglets per litter, the number of weeds in a given quadrat and the number of spikelets per wheat spike.

Categorical variables cannot be measured but can be categorized according to their attributes (nominal variables) or ranked according to their magnitude (ordinal variables). For example, a plant breeder is interested in breeding for genetic resistance to a certain disease. She may need a rating scale of 0 to 4 to assess the disease severity (0 = no symptoms to 4 = maximum infection). The disease severity is an ordinal variable. Sometimes she may be interested only in the disease incidence, so that all plants are categorized as diseased or not diseased. The disease incidence is a nominal variable.
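The classification described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name classify_variable and its three yes/no arguments are hypothetical and are not part of any statistical package. The examples are those given in the text.

```python
from enum import Enum


class VariableType(Enum):
    CONTINUOUS = "continuous measurement variable"
    DISCRETE = "discontinuous (discrete) measurement variable"
    ORDINAL = "ordinal categorical variable"
    NOMINAL = "nominal categorical variable"


def classify_variable(measured, intermediate_values=False, ranked=False):
    """Classify a variable from three yes/no answers (hypothetical helper).

    measured            -- can its states be expressed as measured or counted numbers?
    intermediate_values -- for measured variables: are values between two points possible?
    ranked              -- for categorical variables: can the categories be ranked?
    """
    if measured:
        if intermediate_values:
            return VariableType.CONTINUOUS
        return VariableType.DISCRETE
    if ranked:
        return VariableType.ORDINAL
    return VariableType.NOMINAL


# Examples taken from the text above.
examples = {
    "plant height": classify_variable(measured=True, intermediate_values=True),
    "piglets per litter": classify_variable(measured=True),
    "disease severity (0 to 4 rating)": classify_variable(measured=False, ranked=True),
    "disease incidence": classify_variable(measured=False),
}

for name, kind in examples.items():
    print(f"{name}: {kind.value}")
```

In practice the classification is a judgment made by the researcher; the sketch only records the outcome of that judgment in a consistent form.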

Which statistical method should be used?
Exactly which statistical method(s) should be used? The answer to this question requires an examination of (i) whether each variable in the experiment is a dependent (measurement) variable or an independent ('treatment') variable and (ii) whether the variables are continuous or discrete. The table below also distinguishes a single variable (univariate) from several variables (multivariate). With such an examination, it is relatively straightforward to identify appropriate statistical methods for analyzing the experimental data at hand. The following table summarizes the matching of statistical methods with the types of data. However, this does not imply that the suggested statistical methods are the ONLY ones for the data. Data analysis is a dynamic and exploratory process, so alternative statistical methods may sometimes be used.

	Dependent variable (y)
Independent variable (x)	UD	UC	MVD	MVC
None	One-way chi-square test	Univariate descriptive statistics	Two-way chi-square test; Log-linear analysis	Principal components; Factor analysis
Univariate Discrete (UD)	z-test or ANOVA with frequency data	One-way ANOVA	log-linear analysis; logit model	One-way MANOVA
Univariate Continuous (UC)	Logistic regression	Simple Regression	???	Simple multivariate regression
Multivariate Discrete (MVD)	Logit model	Block, nested or factorial ANOVA	Log-linear analysis; logit model	Multi-way MANOVA
Multivariate Continuous (MVC)	Logistic regression	Multiple regression	Discriminant analysis	Multivariate regression; Canonical correlations
Mixed discrete and continuous	???	Analysis of covariance	???	Multivariate analysis of covariance

Note: (i) Question marks in a cell indicate that no appropriate statistical analysis has been identified; (ii) All statistical analyses appearing in this table can be carried out with appropriate SAS/STAT procedures (see the SAS/STAT manual for details).
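For readers who prefer to look up the table programmatically, the following is a minimal Python sketch of the same matching. The dictionary name METHODS, the function name suggest_methods and the short codes "None" and "Mixed" are hypothetical choices made for illustration; the cell contents are copied from the table, and cells marked with question marks are stored as empty lists.

```python
# Codes for the variable types used in the table above.
UD, UC, MVD, MVC = "UD", "UC", "MVD", "MVC"
NONE, MIXED = "None", "Mixed"  # hypothetical codes for the first and last rows

# (independent variable type, dependent variable type) -> suggested methods.
METHODS = {
    (NONE, UD): ["One-way chi-square test"],
    (NONE, UC): ["Univariate descriptive statistics"],
    (NONE, MVD): ["Two-way chi-square test", "Log-linear analysis"],
    (NONE, MVC): ["Principal components", "Factor analysis"],
    (UD, UD): ["z-test or ANOVA with frequency data"],
    (UD, UC): ["One-way ANOVA"],
    (UD, MVD): ["Log-linear analysis", "Logit model"],
    (UD, MVC): ["One-way MANOVA"],
    (UC, UD): ["Logistic regression"],
    (UC, UC): ["Simple regression"],
    (UC, MVD): [],  # no appropriate analysis identified in the table
    (UC, MVC): ["Simple multivariate regression"],
    (MVD, UD): ["Logit model"],
    (MVD, UC): ["Block, nested or factorial ANOVA"],
    (MVD, MVD): ["Log-linear analysis", "Logit model"],
    (MVD, MVC): ["Multi-way MANOVA"],
    (MVC, UD): ["Logistic regression"],
    (MVC, UC): ["Multiple regression"],
    (MVC, MVD): ["Discriminant analysis"],
    (MVC, MVC): ["Multivariate regression", "Canonical correlations"],
    (MIXED, UD): [],  # no appropriate analysis identified in the table
    (MIXED, UC): ["Analysis of covariance"],
    (MIXED, MVD): [],  # no appropriate analysis identified in the table
    (MIXED, MVC): ["Multivariate analysis of covariance"],
}


def suggest_methods(independent, dependent):
    """Return the methods the table suggests for one combination of types."""
    key = (independent, dependent)
    if key not in METHODS:
        raise ValueError(f"Unknown combination of variable types: {key}")
    return METHODS[key]


# Example: several preservative treatments (UD) and shelf life (UC).
print(suggest_methods(UD, UC))
```

As with the table itself, the result is a starting point rather than the only valid choice of analysis.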

Which statistical software should be used?
With the leap in desktop computing power over the past few years, many statistical software packages that traditionally targeted mainframe computers now have user-friendly PC versions. This enables individual researchers to carry out statistical analysis with little help from computer specialists. However, apart from the real danger of (i) misapplication of statistical methods by researchers with limited statistical backgrounds and (ii) invalid interpretation of outputs from the analysis, it can also be difficult to choose which statistical software to use for analyzing the research data. General Statistical Resources lists the statistical software packages most commonly used by researchers in different disciplines. A brief description of each package is also given to facilitate your choice. It should be noted that SAS has dominated the statistical computing market for the last 20 years, and it is no surprise that most AAFRD researchers have chosen SAS as their workhorse for statistical analysis. For this reason, How to Use SAS for Data Analysis describes some basic SAS commands and their appropriate use. That appendix is for AAFRD researchers who are not familiar with the SAS system but who are considering using SAS for data analysis.
