Before starting any analysis, classify the data set as either continuous or attribute; in some cases it is a mixture of both types. Continuous data are variables that can be measured on a continuous scale, such as time, temperature, strength, or monetary value. A quick test is to divide the value in half and see whether the result still makes sense.

Attribute, or discrete, data can be assigned to defined groupings and then counted. Examples are classifications of positive and negative, location, vendors' materials, product or process types, and satisfaction scales such as poor, fair, good, and excellent. Once an item is classified, it can be counted and its frequency of occurrence determined.

The next determination to make is whether the data are inputs or outputs. Output variables are often referred to as CTQs (critical-to-quality characteristics) or performance measures. Input variables are what drive the resulting outcomes. We generally characterize a product, process, or service delivery outcome (the Y) as a function of the input variables X1, X2, X3, … Xn, that is, Y = f(X1, X2, X3, …, Xn). The Y's are driven by the X's.

The Y outcomes can be either continuous or discrete data. Examples of continuous Y's are cycle time, cost, and productivity. Examples of discrete Y's are delivery performance (late or on time), invoice accuracy (accurate or not accurate), and application errors (wrong address, misspelled name, missing age, etc.).

The X inputs can also be either continuous or discrete. Examples of continuous X's are temperature, pressure, speed, and volume. Examples of discrete X's are process step (intake, examination, treatment, and discharge), product type (A, B, C, and D), and vendor material (A, B, C, and D).

Another set of X inputs to always consider is the stratification factors. These are variables that may influence product, process, or service delivery performance and must not be overlooked. If we capture this information during data collection, we can analyze it to determine whether it is important. Examples are time of day, day of the week, month of the year, season, location, region, or shift.

Once the inputs have been separated from the outputs and the data classified as continuous or discrete, selecting the statistical tool to apply boils down to answering the question, "What do we want to know?" The following are common questions, and we'll address each one separately.

What is the baseline performance? Did the changes made to the process, product, or service delivery make a difference? What are the relationships between the multiple input X's and the output Y's? If there are relationships, do they make a significant difference? That's enough questions to be statistically dangerous, so let's tackle them one at a time.

What is baseline performance?

Continuous Data – Plot the data in a time-based sequence using an X-MR chart (individuals and moving range control charts), or subgroup the data and use an Xbar-R chart (averages and range control charts). The centerline of the chart estimates the average of the data over time, thus establishing the baseline. The MR or R chart estimates the variation over time and is used to set the upper and lower 3-standard-deviation control limits for the X or Xbar chart. Create a Histogram of the data to see a graphical representation of its distribution, test it for normality (the p-value should be greater than .05), and compare it to the specifications to assess capability.
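
The X-MR arithmetic is short enough to sketch directly. The Python below is a minimal illustration, not Minitab's implementation: it assumes individual observations listed in time order and uses the standard constants for moving ranges of size 2 (2.66 for the X chart limits, 3.267 for the MR chart upper limit, and d2 = 1.128). The Cpk helper, its use of MR-bar / 1.128 as the within sigma, the function names, and the cycle-time data are illustrative assumptions.

```python
from statistics import mean

# Standard individuals-chart constants for moving ranges of size 2
E2 = 2.66   # 3 / d2, used for the X chart limits
D4 = 3.267  # upper limit factor for the MR chart (its lower limit is 0)
D2 = 1.128  # d2, converts MR-bar into a within-process sigma estimate


def xmr_limits(values):
    """Centerline and 3-sigma limits for an individuals (X) and moving range (MR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations in time order")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "x_center": x_bar,
        "x_ucl": x_bar + E2 * mr_bar,
        "x_lcl": x_bar - E2 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": D4 * mr_bar,
        "sigma_within": mr_bar / D2,
    }


def cpk(values, lsl, usl):
    """Cpk from the within sigma: distance to the nearer spec limit in 3-sigma units."""
    limits = xmr_limits(values)
    mu, sigma = limits["x_center"], limits["sigma_within"]
    return min(usl - mu, mu - lsl) / (3 * sigma)


# Hypothetical cycle times (minutes) in the order they were observed
cycle_times = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
limits = xmr_limits(cycle_times)
print(limits["x_center"], limits["x_lcl"], limits["x_ucl"])
print(cpk(cycle_times, lsl=11.0, usl=13.0))
```

With this data the centerline is 12.05 minutes and MR-bar is about 0.343, so the X chart limits fall at roughly 11.14 and 12.96. Keep in mind that the control limits come from the data, while the specification limits passed to `cpk` come from the customer.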

Minitab Statistical Software Tools are Variables Control Charts, Histograms, Graphical Summary, Normality Test, and Capability Study (Between/Within).

Discrete Data – Plot the data in a time-based sequence using a P Chart (proportion defective chart), C Chart (count of defects chart), nP Chart (number defective chart: sample size n times proportion defective), or U Chart (defects per unit chart). The centerline supplies the baseline average performance. The upper and lower control limits are set 3 standard deviations above and below the average, which (under a normal approximation) accounts for 99.73% of all the expected activity over time. This gives an estimate of the worst- and best-case performance before any improvements are made. Create a Pareto Chart to view the distribution of the categories and their frequencies of occurrence. When the control charts exhibit only natural patterns of variation over time (only common cause variation, no special causes), the centerline, or average value, establishes the capability.
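
For the P Chart, the centerline is the overall proportion defective, p-bar, and the limits are p-bar ± 3·sqrt(p-bar(1 − p-bar)/n). The sketch below assumes a constant sample size n per subgroup (Minitab also handles varying sizes); the function names and the late-delivery counts are hypothetical.

```python
from math import sqrt


def p_chart_limits(defectives, sample_size):
    """Centerline and 3-sigma limits for a P chart with a constant sample size."""
    p_bar = sum(defectives) / (sample_size * len(defectives))
    sigma = sqrt(p_bar * (1 - p_bar) / sample_size)
    # A proportion cannot go below 0 or above 1, so clip the limits
    lcl = max(0.0, p_bar - 3 * sigma)
    ucl = min(1.0, p_bar + 3 * sigma)
    return p_bar, lcl, ucl


def special_cause_points(defectives, sample_size):
    """Indexes of subgroups whose proportion falls outside the control limits."""
    _, lcl, ucl = p_chart_limits(defectives, sample_size)
    return [i for i, d in enumerate(defectives) if not lcl <= d / sample_size <= ucl]


# Hypothetical data: late deliveries per week, 200 shipments each week
weekly_late = [12, 9, 14, 10, 11, 8, 13, 35]
print(p_chart_limits(weekly_late, 200))
print(special_cause_points(weekly_late, 200))
```

Here p-bar is 0.07 and the upper limit is about 0.124, so the last week (35 of 200, or 0.175) is flagged. It would be investigated as a special cause before the centerline is accepted as the baseline.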

Minitab Statistical Software Tools are Attributes Control Charts and Pareto Analysis.

Did the changes made to the process, product, or service delivery make a difference?

Discrete X – Continuous Y – To test whether two group averages (5W-30 vs. Synthetic Oil) affect gas mileage, use a 2 Sample T-Test. If environmental factors may influence the test results, run both oils on the same test units (for example, the same cars) and use a Paired T-Test. Plot the results on a Boxplot and assess the T statistic and its p-value to make a decision (a p-value less than or equal to .05 indicates that a difference exists at the 95% confidence level). If there is a difference, pick the group with the best overall average to meet the goal.
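
To make the T statistic and p-value concrete, here is a minimal sketch of a pooled (equal-variance) 2 sample t-test and a paired t-test using only the Python standard library. The p-value is obtained by numerically integrating the Student's t density with Simpson's rule, an implementation choice for this sketch rather than how Minitab computes it; the function names and the mileage figures are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, integrating the density with Simpson's rule."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x):
        return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    central = area * h / 3  # probability between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central)


def two_sample_t(a, b):
    """Pooled (equal-variance) 2 sample t-test. Returns (t, df, p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, t_two_sided_p(t, df)


def paired_t(a, b):
    """Paired t-test on the differences a[i] - b[i]. Returns (t, df, p)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1, t_two_sided_p(t, n - 1)


# Hypothetical gas mileage (mpg) for the same five cars run on each oil
oil_5w30 = [27.5, 30.2, 25.8, 33.1, 29.0]
synthetic = [28.3, 30.9, 26.9, 33.8, 29.6]
print(two_sample_t(oil_5w30, synthetic))  # car-to-car spread hides the oil effect
print(paired_t(oil_5w30, synthetic))      # pairing removes the car-to-car spread
```

On this data the pooled test gives p of roughly 0.66 while the paired test gives p below 0.001. That illustrates why pairing helps when an outside factor (here, the car itself) adds variation.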

To test whether two or more group averages (5W-30, 5W-40, 10W-30, 10W-40, or Synthetic) affect gas mileage, use ANOVA (analysis of variance). Randomize the order of the testing to reduce any time-dependent environmental influences on the test results. Plot the results on a Boxplot or Histogram and assess the F statistic and its p-value to make a decision (a p-value less than or equal to .05 indicates that a difference exists at the 95% confidence level). If there is a difference, pick the group with the best overall average to meet the objective.
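
A one-way ANOVA compares the variation between group averages with the variation within groups: F = (SS_between / (k − 1)) / (SS_within / (N − k)). The sketch below computes F and its p-value with the standard library only; the p-value comes from Simpson's-rule integration of the F density, which this sketch restricts to three or more groups (two groups go to the t-test instead). The function names and mileage data are hypothetical.

```python
from math import exp, lgamma, log
from statistics import mean


def f_upper_p(f, d1, d2, steps=4000):
    """P(F > f) for the F distribution with d1 >= 2, via Simpson's rule on the density."""
    if d1 < 2:
        raise ValueError("density is unbounded at 0 when d1 < 2")
    log_c = (lgamma((d1 + d2) / 2) - lgamma(d1 / 2) - lgamma(d2 / 2)
             + (d1 / 2) * log(d1 / d2))

    def pdf(x):
        return exp(log_c) * x ** (d1 / 2 - 1) * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)

    h = f / steps
    area = pdf(0.0) + pdf(f)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - area * h / 3)


def one_way_anova(groups):
    """One-way ANOVA for three or more groups. Returns (F, df_between, df_within, p)."""
    if len(groups) < 3:
        raise ValueError("use a 2 sample t-test for two groups")
    means = [mean(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = len(groups) - 1, n_total - len(groups)
    f = (ss_between / df_b) / (ss_within / df_w)
    return f, df_b, df_w, f_upper_p(f, df_b, df_w)


# Hypothetical gas mileage (mpg), four randomized runs per oil
mileage = {
    "5W-30": [28.0, 28.4, 27.8, 28.2],
    "5W-40": [27.6, 27.9, 27.4, 27.7],
    "10W-30": [28.1, 27.9, 28.3, 28.0],
    "10W-40": [27.3, 27.5, 27.1, 27.6],
    "Synthetic": [29.0, 29.3, 28.8, 29.1],
}
f, df_b, df_w, p = one_way_anova(list(mileage.values()))
best = max(mileage, key=lambda oil: mean(mileage[oil]))
print(f, df_b, df_w, p, best)
```

With F near 35 on (4, 15) degrees of freedom the p-value is far below .05, and the oil with the highest average mileage is the one to pick.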

In either of the above cases, to determine whether the inputs cause a difference in the variation of the output, use a Test for Equal Variances (homogeneity of variance). Use the p-values to make a decision (a p-value less than or equal to .05 indicates that a difference exists at the 95% confidence level). If there is a difference, pick the group with the lowest standard deviation.
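
One widely used equal-variances test is Levene's test, which runs a one-way ANOVA on each observation's absolute deviation from its group center. The sketch below uses deviations from the group median (the Brown-Forsythe variant) and returns the W statistic with its degrees of freedom; W is then referred to the F distribution, which Minitab does for you. This is an illustrative choice, not necessarily the exact method Minitab reports, and the function names and vendor data are hypothetical.

```python
from statistics import mean, median, stdev


def brown_forsythe(groups):
    """Levene-type statistic using absolute deviations from each group's median.

    Returns (W, df1, df2); W is compared with the F distribution on (df1, df2).
    """
    devs = []
    for g in groups:
        med = median(g)
        devs.append([abs(x - med) for x in g])
    means = [mean(d) for d in devs]
    n_total = sum(len(d) for d in devs)
    grand = sum(sum(d) for d in devs) / n_total
    ss_between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, means))
    ss_within = sum((z - m) ** 2 for d, m in zip(devs, means) for z in d)
    df1, df2 = len(groups) - 1, n_total - len(groups)
    return (ss_between / df1) / (ss_within / df2), df1, df2


# Hypothetical cycle times for two vendors' materials
samples = {
    "vendor A": [10.0, 10.1, 9.9, 10.0, 10.2, 9.8],
    "vendor B": [9.0, 11.0, 8.5, 11.5, 10.0, 10.0],
}
w, df1, df2 = brown_forsythe(list(samples.values()))
most_consistent = min(samples, key=lambda name: stdev(samples[name]))
print(w, df1, df2, most_consistent)
```

Here W is about 6.8 on (1, 10) degrees of freedom, which exceeds the 5% critical value of F(1, 10), about 4.96, so the spreads differ and the vendor with the lower standard deviation is preferred.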

Minitab Statistical Software Tools are 2 Sample T-Test, Paired T-Test, ANOVA, Test for Equal Variances, Boxplot, Histogram, and Graphical Summary.

Continuous X – Continuous Y – Plot the input X versus the output Y using a Scatter Plot, or, if there are multiple input X variables, use a Matrix Plot. The plot gives a graphical representation of the relationship between the variables. If it appears that a relationship may exist between one or more of the X input variables and the output Y variable, conduct a Linear Regression of one input X versus the output Y. Repeat as required for each X – Y relationship.

The Linear Regression model provides an R² statistic, an F statistic, and a p-value. For a single X – Y relationship to be considered significant, R² should be greater than .36 (36% of the variation in the output Y is explained by the observed changes in the input X), F should be much greater than 1, and the p-value should be .05 or less.

Minitab Statistical Software Tools are Scatter Plot, Matrix Plot, and Fitted Line Plot.

Discrete X – Discrete Y – In this kind of analysis, categories, or groups, are compared to other categories, or groups. For example, "Which cruise line had the highest customer satisfaction?" The discrete X variable is the cruise line (RCI, Carnival, and Princess Cruise Companies). The discrete Y variable is the frequency of passengers' responses on their satisfaction surveys by category (poor, fair, good, excellent, and ideal) relating to their vacation experience.

Conduct a cross-tabulation table analysis, or Chi Square analysis, to examine whether levels of satisfaction differed among passengers based on the cruise line they vacationed on. Percentages can be used for the evaluation, and the Chi Square analysis provides a p-value to further quantify whether the differences are significant. The overall p-value associated with the Chi Square analysis should be .05 or less. The cells (category combinations) that make the largest contribution to the Chi Square statistic drive the observed differences.
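
Each cell's contribution to the Chi Square statistic is (observed − expected)² / expected, where expected = row total × column total / grand total, and the degrees of freedom are (rows − 1)(columns − 1). The sketch below computes these with the standard library. Its p-value helper uses the closed-form chi-square tail, which only holds for an even number of degrees of freedom (the 3 × 5 cruise table has 8); that restriction, the function names, and the response counts are assumptions of this sketch.

```python
from math import exp


def chi_square_table(table):
    """Pearson chi-square for a rows x columns table of counts.

    Returns (statistic, df, contributions), where contributions[i][j] is the
    (observed - expected)^2 / expected term for cell (i, j).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = [
        [(obs - r * c / n) ** 2 / (r * c / n) for obs, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]
    statistic = sum(sum(row) for row in contributions)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df, contributions


def chi2_upper_p(x, df):
    """Exact P(chi-square > x); this closed form only holds for even df."""
    if df % 2:
        raise ValueError("this sketch only handles even degrees of freedom")
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (x / 2) / k
        total += term
    return exp(-x / 2) * total


ratings = ["poor", "fair", "good", "excellent", "ideal"]
survey = {  # hypothetical response counts per cruise line
    "RCI": [10, 20, 40, 50, 30],
    "Carnival": [25, 35, 45, 30, 15],
    "Princess": [5, 15, 35, 55, 40],
}
stat, df, contrib = chi_square_table(list(survey.values()))
lines = list(survey)
row_i, col_j = max(((i, j) for i in range(len(lines)) for j in range(len(ratings))),
                   key=lambda ij: contrib[ij[0]][ij[1]])
print(stat, df, chi2_upper_p(stat, df))
print("largest contribution:", lines[row_i], ratings[col_j])
```

With these counts the statistic is about 45.7 on 8 degrees of freedom, well beyond significance, and the Carnival "poor" cell contributes the most.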

Minitab Statistical Software Tools are Table Analysis, Matrix Analysis, and Chi Square Analysis.

Continuous X – Discrete Y – Does the cost per gallon of fuel influence consumer satisfaction? The continuous X is the cost per gallon of fuel. The discrete Y is the consumer satisfaction rating (unhappy, indifferent, or happy). Plot the data using Dot Plots stratified on Y. The statistical technique is Logistic Regression. Once again, the p-values are used to validate whether a significant difference exists. P-values of .05 or less mean that we have at least 95% confidence that a significant difference exists. Use the most frequently occurring ratings to help make your determination.

Minitab Statistical Software Tools are Dot Plots stratified on Y and Logistic Regression Analysis.

What are the relationships between the multiple input X's and the output Y's? If there are relationships, do they make a difference?

Continuous X – Continuous Y – The graphical analysis is a Matrix Scatter Plot, in which multiple input X's can be evaluated against the output Y characteristic. The statistical analysis technique is multiple regression. Assess the scatter plots to look for relationships between the X input variables and the output Y. Also look for multicollinearity, where one input X variable is correlated with another input X variable. This is analogous to double dipping, so we identify those conflicting inputs and systematically eliminate them from the model.

Multiple regression is a powerful tool, but it requires proceeding with caution. Run the model with all of the variables included, then evaluate the T statistics and F statistics to identify the first set of insignificant variables to remove from the model. During the second iteration of the regression model, turn on the variance inflation factors, or VIFs, which are used to quantify potential multicollinearity issues (VIFs in the 5 to 10 range or higher signal issues). Evaluate the Matrix Plot to identify X's that are related to other X's. Remove the variables with high VIFs and the largest p-values, but remove only one of the related X variables in a questionable pair. Review the remaining p-values and remove variables with large p-values from the model. Don't be surprised if this process requires a few more iterations.

When the multiple regression model is finalized, all VIFs will be less than 5 and all p-values will be less than .05. The R² value should be 90% or greater. This is a significant model, and the regression equation can be used to make predictions as long as we keep the input variables within the min and max range values that were used to create the model.
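
The acceptance criteria and the no-extrapolation rule translate into two small checks. In this sketch the dictionaries of VIFs, p-values, coefficients, and input ranges are hypothetical stand-ins for values you would copy from the regression output.

```python
def model_is_final(vifs, p_values, r_squared):
    """Acceptance checks from the text: all VIFs < 5, all p-values < .05, R-squared >= 90%."""
    return (all(v < 5 for v in vifs.values())
            and all(p < 0.05 for p in p_values.values())
            and r_squared >= 0.90)


def predict(intercept, coefficients, ranges, inputs):
    """Evaluate the regression equation, refusing to extrapolate beyond the modeled range."""
    for name, value in inputs.items():
        low, high = ranges[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the modeled range [{low}, {high}]")
    return intercept + sum(coef * inputs[name] for name, coef in coefficients.items())


# Hypothetical final model output
coefficients = {"temperature": 0.5, "speed": -2.0}
ranges = {"temperature": (150, 200), "speed": (1, 6)}
print(model_is_final({"temperature": 1.6, "speed": 1.6},
                     {"temperature": 0.001, "speed": 0.02}, 0.93))
print(predict(10.0, coefficients, ranges, {"temperature": 160, "speed": 2}))
```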

Minitab Statistical Software Tools are Regression Analysis, Stepwise Regression Analysis, Scatter Plots, Matrix Plots, Fitted Line Plots, Graphical Summary, and Histograms.

Discrete X and Continuous X – Continuous Y

This case requires designed experiments. Both discrete and continuous X's can be used as input variables, but their settings are predetermined in the design of the experiment. The analysis technique is the ANOVA mentioned earlier.

Here is an example. The goal is to reduce the number of unpopped kernels in a bag of popped popcorn (the output Y). Discrete X's could be the type of popping corn, type of oil, and shape of the popping vessel. Continuous X's might be the amount of oil, amount of popping corn, cooking time, and cooking temperature. Specific settings for each of the input X's are selected and included in the statistical experiment.
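
As a sketch of how such an experiment could be laid out, the code below builds a two-level full factorial design for the seven popcorn X's and randomizes the run order. The factor names and low/high settings are hypothetical placeholders. Seven factors at two levels give 2^7 = 128 runs; in practice a fractional factorial design is often used to cut that number, and the measured unpopped-kernel counts would then be analyzed with ANOVA.

```python
import itertools
import random

# Hypothetical low/high settings for each input X in the popcorn example
FACTORS = {
    "corn_type": ["type 1", "type 2"],      # discrete
    "oil_type": ["oil 1", "oil 2"],          # discrete
    "vessel_shape": ["flat", "round"],       # discrete
    "oil_amount_tbsp": [2, 3],               # continuous
    "corn_amount_cups": [0.5, 0.75],         # continuous
    "cook_time_min": [3.0, 4.0],             # continuous
    "cook_temp_f": [400, 450],               # continuous
}


def full_factorial(factors, seed=None):
    """Every combination of factor settings, returned in randomized run order."""
    names = list(factors)
    runs = [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]
    random.Random(seed).shuffle(runs)  # randomize to spread time-dependent effects
    return runs


plan = full_factorial(FACTORS, seed=7)
print(len(plan))  # 2 levels ** 7 factors = 128 runs
print(plan[0])
```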