# Understanding Design and Analysis of Research Experiments - Statistical Analysis of Experimental Data

### Types of experimental data
Data analysis is the application of one or more statistical techniques to a set of data as collected from one of the following three types of research projects: (1) designed experiments, (2) sample surveys and (3) observational studies. Let us briefly describe these three types of research data. In designed experiments, some form of treatment is applied to experimental units and responses are observed. For example, in a food processing experiment, portions of fruit, such as slices of apples, are treated with different preservatives and the shelf lives of the portions of fruit are determined. Most experiments in agri-food research are designed experiments because the researchers often want to determine effects of different treatments.
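As a rough illustration of how data from such a designed experiment might be analyzed, the sketch below computes the F statistic of a one-way analysis of variance for shelf lives under three preservatives. The preservative names and shelf-life values are hypothetical numbers made up for this example, not results from any study, and one-way ANOVA is only one of several analyses that could suit such data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of responses per treatment.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Variation of the treatment means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual units around their own treatment mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical shelf lives (days) of apple slices, three slices per preservative.
shelf_life_days = {
    "preservative_A": [4, 5, 6],
    "preservative_B": [7, 8, 9],
    "preservative_C": [4, 6, 5],
}

f_stat, df_b, df_w = one_way_anova_f(list(shelf_life_days.values()))
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

The F statistic would then be compared with the F distribution on the stated degrees of freedom, a step a statistical package normally performs.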

Agricultural research also produces survey and observational data. In sample surveys, data are collected on units according to a plan, called a survey design, but no treatments are applied to the units. For example, in a geographic survey of pathogenic variation of wheat stem rust in western Canada, isolates of the rust (the survey units) are collected from different regions in western Canada, and disease severity is measured after the isolates are inoculated onto wheat seedlings. In observational studies, data are collected on units that are available, rather than on units chosen according to a plan. An example is a study at a veterinary clinic in which dogs that enter the clinic are diagnosed according to their skin condition, and blood samples are drawn for measurement of trace elements.

### Types of variables
Any research experiment involves measuring or characterizing some variables whose values can be used to assess the effects of different treatments or, in a survey, the inherent attributes of the research materials. There are roughly two classes of variables in agricultural research:

• Measurement variables;
• Categorical variables.

Measurement variables are all those whose differing states can be expressed in a numerically ordered fashion. There are two types of measurement variables: continuous and discontinuous. Continuous variables are those which, at least theoretically, can assume an infinite number of values between any two fixed points. For example, between the lengths 2.5 cm and 2.6 cm of two wheat seedlings, an infinite number of lengths, such as 2.57 cm, are possible if one measures enough wheat seedlings. Agricultural researchers frequently work with variables of this type. Examples include plant height, growth areas, grain volume, body weight of animals, body temperature and growth rate. Discontinuous variables, also known as meristic or discrete variables, have only certain fixed numerical values, with no intermediate values possible in between. For example, the number of segments in a certain insect appendage may be 4 or 5 or 6 but never 4.3 or 5.5. Examples are number of piglets per litter, number of weeds in a given quadrat and number of spikelets per wheat spike.

Categorical variables cannot be measured but can be categorized according to their attributes (nominal variables) or ranked according to their magnitude (ordinal variables). For example, a plant breeder is interested in breeding for genetic resistance to a certain disease. She may need a rating system of 0 - 4 to assess the disease severity (0 = no symptoms to 4 = maximum infection). The disease severity is an ordinal variable. Sometimes she may be only interested in the disease incidence so that all plants are categorized as diseased or not diseased. The disease incidence is a nominal variable.
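The relationship between the two categorical variables in this example can be sketched in a few lines of Python. The ratings listed are hypothetical, and the rule that any rating above 0 counts as "diseased" is an assumption made for illustration; a breeder might choose a different cut-off.

```python
from statistics import median

# Ordinal scale from the text: 0 = no symptoms ... 4 = maximum infection.
SEVERITY_SCALE = range(0, 5)


def incidence_from_severity(rating):
    """Collapse an ordinal severity rating into a nominal incidence class.

    Assumption for this sketch: any rating above 0 counts as diseased.
    """
    if rating not in SEVERITY_SCALE:
        raise ValueError(f"rating must be an integer from 0 to 4, got {rating!r}")
    return "diseased" if rating > 0 else "not diseased"


# Hypothetical severity ratings for five plants.
ratings = [0, 2, 4, 0, 1]

incidence = [incidence_from_severity(r) for r in ratings]
proportion_diseased = incidence.count("diseased") / len(incidence)

# An ordinal variable supports rank-based summaries such as the median;
# a nominal variable supports only counts and proportions.
print("median severity:", median(ratings))
print("proportion diseased:", proportion_diseased)
```

Collapsing severity into incidence discards the ranking information, which is why the two variables call for different statistical methods.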

### Which statistical method should be used?
Exactly which statistical method(s) should be used? The answer to this question requires an examination of (i) whether the variables in the experiment are dependent (measurement) variables or independent ('treatment') variables and (ii) whether the variables are continuous or discrete. With such an examination, it is relatively straightforward to identify appropriate statistical methods for analyzing the experimental data at hand. The following table summarizes the matching of statistical methods with the types of data. However, this does not imply that the suggested statistical methods are the ONLY ones for the data. In fact, data analysis is a dynamic and exploratory process, so alternative statistical methods may sometimes be used.

| Independent variable (x) | Dependent variable (y): UD | Dependent variable (y): UC | Dependent variable (y): MVD | Dependent variable (y): MVC |
|---|---|---|---|---|
| None | One-way chi-square test | Univariate descriptive statistics | Two-way chi-square test; Log-linear analysis | Principal components; Factor analysis |
| Univariate Discrete (UD) | z-test or ANOVA with frequency data | One-way ANOVA | Log-linear analysis; logit model | One-way MANOVA |
| Univariate Continuous (UC) | Logistic regression | Simple regression | ??? | Simple multivariate regression |
| Multivariate Discrete (MVD) | Logit model | Block, nested or factorial ANOVA | Log-linear analysis; logit model | Multi-way MANOVA |
| Multivariate Continuous (MVC) | Logistic regression | Multiple regression | Discriminant analysis | Multivariate regression; Canonical correlations |
| Mixed discrete and continuous | ??? | Analysis of covariance | ??? | Multivariate analysis of covariance |

Note: (i) Question marks in a cell indicate that no appropriate statistical analysis is identified; (ii) All statistical analyses appearing in this table can be carried out through appropriate SAS/STAT procedures (see the SAS/STAT manual for details).
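The table can also be read as a simple lookup keyed on the type of the independent and dependent variables. The Python sketch below encodes it that way; the dictionary layout, the function name `suggest_methods` and the use of an empty list for "???" cells are choices made for this illustration only and are not part of the original material.

```python
# Keys are (independent variable type, dependent variable type).
# An empty list stands for a "???" cell: no appropriate analysis identified.
METHODS = {
    ("None", "UD"): ["One-way chi-square test"],
    ("None", "UC"): ["Univariate descriptive statistics"],
    ("None", "MVD"): ["Two-way chi-square test", "Log-linear analysis"],
    ("None", "MVC"): ["Principal components", "Factor analysis"],
    ("UD", "UD"): ["z-test or ANOVA with frequency data"],
    ("UD", "UC"): ["One-way ANOVA"],
    ("UD", "MVD"): ["Log-linear analysis", "Logit model"],
    ("UD", "MVC"): ["One-way MANOVA"],
    ("UC", "UD"): ["Logistic regression"],
    ("UC", "UC"): ["Simple regression"],
    ("UC", "MVD"): [],
    ("UC", "MVC"): ["Simple multivariate regression"],
    ("MVD", "UD"): ["Logit model"],
    ("MVD", "UC"): ["Block, nested or factorial ANOVA"],
    ("MVD", "MVD"): ["Log-linear analysis", "Logit model"],
    ("MVD", "MVC"): ["Multi-way MANOVA"],
    ("MVC", "UD"): ["Logistic regression"],
    ("MVC", "UC"): ["Multiple regression"],
    ("MVC", "MVD"): ["Discriminant analysis"],
    ("MVC", "MVC"): ["Multivariate regression", "Canonical correlations"],
    ("Mixed", "UD"): [],
    ("Mixed", "UC"): ["Analysis of covariance"],
    ("Mixed", "MVD"): [],
    ("Mixed", "MVC"): ["Multivariate analysis of covariance"],
}


def suggest_methods(independent, dependent):
    """Return the methods the table suggests for the given variable types."""
    try:
        return METHODS[(independent, dependent)]
    except KeyError:
        raise ValueError(
            f"unknown combination: independent={independent!r}, dependent={dependent!r}"
        ) from None


# One discrete treatment factor with a continuous response.
print(suggest_methods("UD", "UC"))
```

As the text notes, the suggestions are starting points rather than the only valid analyses.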

### Which statistical software should be used?
With the leap in desktop computing power over the past few years, many statistical software packages that traditionally targeted mainframe computers now have user-friendly PC versions. This enables individual researchers to carry out statistical analysis with little help from computer specialists. However, apart from the real danger of (i) misapplication of statistical methods by researchers with limited statistical backgrounds and (ii) invalid interpretation of outputs from the analysis, there is also the difficulty of choosing which statistical software to use for analyzing the research data. General Statistical Resources lists the statistical software packages most commonly used by researchers in different disciplines. A brief description of each package is also given to facilitate your choice. It should be noted that SAS has dominated the statistical computing market for the last 20 years, and it is no surprise that most AAFRD researchers have chosen SAS as their workhorse for statistical analysis. For this reason, How to Use SAS for Data Analysis describes some basic SAS commands and their appropriate use. This appendix is for those AAFRD researchers who are not familiar with the SAS system but who are considering using SAS for data analysis.

### Other Documents in the Series

• Understanding Design and Analysis of Research Experiments
• Understanding Design and Analysis of Research Experiments - Statistical Considerations in Initial Research Planning
• Understanding Design and Analysis of Research Experiments - Experimental Designs
• Understanding Design and Analysis of Research Experiments - Statistical Analysis of Experimental Data - Current Document
• Understanding Design and Analysis of Research Experiments - General Statistical Resources
• Understanding Design and Analysis of Research Experiments - How to Use SAS for Data Analysis
• Understanding Design and Analysis of Research Experiments - References