# WGU – MBA – C207 – Data Driven Decision Making

Activities (RBM stage)
second step involves the process that converts inputs to outputs (actions necessary to produce results – training, evaluating, developing)
Alternative hypothesis
The argument that a sample is not equal to, greater than, or less than the value stated in the null hypothesis
Analysis of Variance (ANOVA)
a technique used to determine if there is sufficient evidence from sample data of three or more populations to conclude that the means of the populations are not all equal. A statistical method that helps identify the sources of variability by comparing their means or averages; it compares the variation within a sample to the variation between samples to see if any differences are the result of some contributing factor or if the differences occur by chance alone.
Analytics
The discovery, analysis, and communication of meaningful patterns in data.
Autocorrelation
The correlation of a variable with lagged (earlier) values of itself over time, so that observations in a time series are not independent of one another
Balanced Scorecard
An approach using multiple measures to evaluate performance, including financial measures, and the non-financial measures of customers, internal business processes, and learning and growth.
Bar chart
A graph that measures the distribution of data over discrete groups or categories.
Benchmarks
Standards or points of reference for an industry or sector that can be used for comparison and evaluation.
Big Data
very large amounts of data; an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications
Blind Study
A study performed where the participants are not told if they are in the treatment group or control group
body mass index (BMI)
A measure, based on a person’s weight and height, that is used to classify people as underweight or overweight.
Business process
A sequence of logically related and time-based work activities to provide a specific output for a customer.
Central Limit Theorem
A theorem that states that the greater the sample size, the closer the sample mean tends to be to the mean of the entire population and the more closely the distribution of sample means resembles a normal distribution
Cluster Analysis
The process of arranging terms or values based on different variables into “natural” groups
Cointegration
Occurs when two time series are moving with a common pattern due to a connection between the two time series
Combination
The number of different unordered possibilities for a certain situation.
Complement
The occurrence of an event not happening, the opposite
Confidence interval
An interval estimate used to indicate reliability
Continuous Data
Data that can lie at any point along a range of values
Control chart
A graphic display of process data over time and against established control limits, and that has a centerline that assists in detecting a trend of plotted values toward either control limit. A modified run chart that also provides upper and/or lower limits that a process should not exceed.
Control limits
The area composed of three standard deviations on either side of the centerline, or mean, of a normal distribution of data plotted on a control chart that reflects the expected variation in the data. See also specification limits.
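A minimal sketch of the three-standard-deviation rule, assuming a small set of hypothetical measurements and using the overall population standard deviation. Control-chart practice often estimates the spread differently (for example from subgroup ranges), so treat this as an illustration only:

```python
import statistics

# Hypothetical process measurements (illustrative values only).
measurements = [10, 12, 11, 13, 9, 11, 10, 12]

center_line = statistics.mean(measurements)  # centerline (mean)
sigma = statistics.pstdev(measurements)      # population standard deviation

upper_control_limit = center_line + 3 * sigma
lower_control_limit = center_line - 3 * sigma

# Points outside the limits would signal more variation than expected.
out_of_control = [m for m in measurements
                  if m > upper_control_limit or m < lower_control_limit]

print(center_line, round(lower_control_limit, 2), round(upper_control_limit, 2))
print(out_of_control)
```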
Criterion-reference test
compare an individual to certain defined standards
Cumulative Average-Time Learning Model
A learning curve model in which the cumulative average time per unit declines by a constant percentage each time the cumulative quantity of units produced is doubled
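This model is commonly written as y = a * X^b, where a is the time for the first unit, X is the cumulative quantity, and b = ln(learning rate) / ln(2). A short sketch, assuming a hypothetical first-unit time of 100 hours and an 80% learning rate:

```python
import math

def cumulative_average_time(first_unit_time, learning_rate, units):
    """Cumulative average time per unit after `units` units (y = a * X**b)."""
    b = math.log(learning_rate) / math.log(2)  # learning-curve exponent
    return first_unit_time * units ** b

# Hypothetical inputs: 100 hours for the first unit, 80% learning rate.
for units in (1, 2, 4, 8):
    average = cumulative_average_time(100, 0.80, units)
    # Columns: cumulative units, average time per unit, total time.
    print(units, round(average, 2), round(average * units, 2))
```

Each doubling of cumulative output multiplies the average time per unit by the learning rate (100, 80, 64, 51.2 hours in this example).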
Cumulative distributions
The probability that a random variable will be found at a value less than or equal to a given number
Customer satisfaction
A measure of the extent to which customers are satisfied with the products and related services they received from a supplier.
Cycle time
The total elapsed time to move a unit of work from the beginning to the end of a physical process, as defined by the producer and the customer.
Data Management
The management, including cleaning and storage, of collected data.
Data Mining
the process of discovering patterns in large data sets; performed on big data to decipher patterns from these large databases
Data Set
A collection of related data records on a storage device.
Davenport Kim Three Stage Model
A decision making model developed by Thomas Davenport and Jinho Kim that consists of three stages: framing the problem, solving the problem, and communicating results
Dependent Variable
The variable whose value depends on one or more variables in the equation; typically the cost or activity to be predicted
Detractor
A category of customer used in the calculation of the Net Promoter Score that indicates an unhappy customer.
Discrete Data
Data that can only take on whole values and has clear boundaries
Double Blind Study
A study performed where neither the treatment allocator nor the participant knows which group the participant is in
Epidemiology
study of the incidence, distribution and possible control of diseases and other factors relating to health
Event
An outcome that occurs
Experience Curve
A curve that shows the decline in cost per unit in various business functions of the value chain as the amount of these activities increases
Heteroscedasticity
A regression in which the variances in y for the values of x are not equal
Histogram
A graph that displays continuous data. This type of graph has vertical bars that show the counts or numbers in each range of data. A vertical bar chart that shows the distribution of data across groups or categories.
Homoscedasticity
A regression in which the variances in y for the values of x are equal or close to equal
Hypothesis
A proposed explanation used as a starting point for future examination
Impact (RBM stage)
last step when applying results-based management is to study the long-term effects that the output will have (economic, environmental, cultural, or political change)
Incidence
measures the number of new cases that arise in a population over the course of a designated time period
# new cases / person-time units
Incremental Unit-Time Learning Model
A learning curve model in which the incremental unit time (the time needed to produce the last unit) declines by a constant percentage each time the cumulative quantity of units produced is doubled
Independent Variable
The variable presumed to influence another variable (dependent variable); typically it is the level of activity or cost driver
Information Bias
A prejudice in the data that results when either the interviewer or the respondent has an agenda and is not presenting impartial questions or giving truly honest responses, respectively
Input (RBM stage)
the first step of RBM is to define the resources, human or financial, used by the RBM system (people, funds, information)
Interquartile range
The difference between the third quartile (75th percentile) and the first quartile (25th percentile) of the sample or population; the spread of the middle 50 percent of the data
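A small sketch of the calculation as Q3 minus Q1, assuming hypothetical data and the default "exclusive" method of Python's `statistics.quantiles` (other quartile conventions can give slightly different cut points):

```python
import statistics

# Hypothetical ordered observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```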
Interval Data
Data that is ordered within a range and with each data point being an equal interval apart
Item Response Theory (IRT)
model of designing, analyzing and scoring tests
Key Performance Indicator (KPI)
A performance measurement that organizations use to quantify their level of success.
Laspeyres Index
a comparison of the same quantity of goods with the same weight over a period of time
Line graph
A graph that illustrates relationships between two changing variables with a line or curve that connects a series of successive data points
Lower limit control
The minimum value on a control chart that a process should not fall below
Mean
An average, calculated by adding a series of elements in a data set together and dividing by the total number in the series
Measurement Bias
A prejudice in the data that results when the sample is not representative of the population being tested
Median
The value or quantity lying at the midpoint of a frequency distribution
Mode
the value that appears most often in a set of data
Multicollinearity
A flaw in a multiple regression equation that occurs when two variables thought to be independent are actually correlated with each other
Multiple Linear Regression
A statistical method used to model the relationship between one dependent (or response) variable and two or more independent (or explanatory) variables by fitting a linear equation to observed data
Multiplication Principle
When the probabilities of multiple events are multiplied together to determine the likelihood of all of those events occurring
Mutually exclusive events
When two or more events are not able to occur at the same time
Nominal Data
Sometimes called categorical data or qualitative data, this data type is used to label subjects or data by name
Data quality issues from misspelled data or missing data
Non parametric test
A test that does not assume a particular structure (such as a normal distribution) for the population.
Norm-referenced test
compare an individual to other individuals
Normal distribution
data tending to occur around a central value with no bias right or left
Null hypothesis
The argument that there is no difference between two samples or that a sample has not changed over time
Omission Error
An error because something (for example, data or survey response) is missing.
Operating Income
Earnings before Interest and Taxes.
Ordinal Data
Data that places data objects into an order according to some quality with higher order indicating more of that quality
Outcome (RBM stage)
the short-term effect that the outputs will have (greater efficiency, more viability, better decision making, social action, or changed public opinion)
Outlier
An observation point that is significantly distant from the other observations in the dataset
Output (RBM stage)
third step when the outputs have been created by the RBM activities (goods and services, publications, systems, evaluations, skills changes)
Paasche Index
calculates the difference over time between the weighted totals of the quantities purchased at each time
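The Laspeyres and Paasche indexes can be contrasted with a short sketch, assuming the usual textbook price-index forms (Laspeyres weights prices by base-period quantities, Paasche by current-period quantities) and a hypothetical two-item basket:

```python
# Hypothetical prices and quantities for a two-item basket.
base_prices = {"bread": 2.0, "milk": 1.0}
base_quantities = {"bread": 10, "milk": 20}
current_prices = {"bread": 3.0, "milk": 1.0}
current_quantities = {"bread": 8, "milk": 25}

def basket_cost(prices, quantities):
    """Total cost of buying `quantities` at `prices`."""
    return sum(prices[item] * quantities[item] for item in prices)

# Laspeyres: same (base-period) quantities priced in both periods.
laspeyres = 100 * (basket_cost(current_prices, base_quantities)
                   / basket_cost(base_prices, base_quantities))

# Paasche: current-period quantities priced in both periods.
paasche = 100 * (basket_cost(current_prices, current_quantities)
                 / basket_cost(base_prices, current_quantities))

print(round(laspeyres, 1), round(paasche, 1))
```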
Parametric test
A test that assumes there is a structure (such as a normal distribution) to the population, often appearing when the mean or standard deviation is important.
Passives
A category of customer used in the calculation of the Net Promoter Score that indicates a satisfied but unenthusiastic customer.
Percentile
the percent of the population that falls below a certain value
Permutation
The number of different ordered possibilities for a certain situation
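A quick sketch contrasting permutations (order matters) with combinations (order does not), using Python's built-in `math.perm` and `math.comb` and a hypothetical choice of 3 items from 5:

```python
import math

n, r = 5, 3  # hypothetical: choose 3 items from 5

ordered = math.perm(n, r)    # permutations: 5 * 4 * 3
unordered = math.comb(n, r)  # combinations: permutations / 3!

print(ordered, unordered)
```

Every unordered group of 3 can be arranged in 3! = 6 ways, which is why the permutation count is six times the combination count.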
Prevalence
measures the number of cases of a particular disease that exist in a population
# of cases/total pop
Probability
The chance of an event occurring
Probability density function
Often used to represent probabilities of continuous data, a probability density function (pdf) is a curve for which the area below it over an interval gives the probability that a continuous random variable falls within that interval
Probability distributions
A set of probabilities that are attached to the different possible outcomes in a survey, experiment, or procedure
Probability mass function
Often used to represent probabilities of discrete data, a probability mass function (pmf) gives the probability that a discrete random variable is exactly equal to some value
Proportion
a ratio in which a part of a group is compared to the whole group
R-squared
The measure of the “goodness of fit” of the regression line and the percentage of variation in the dependent variable that is explained by the independent variable
Random Variation
The variability of a process which might be caused by irregular fluctuations due to chance that cannot be anticipated, detected, or eliminated
Range
The difference between the minimum and maximum value in a given measurable set
Rate
measure of an event occurring over a period of time
Ratio
measures one quantity in relation to another quantity
Ratio Data
Similar to interval data in that the data is ordered within a range with each data point being an equal interval apart, but it also has a natural zero point which indicates none of the given quality.
Regression Analysis
A statistical analysis tool that quantifies the relationship between a dependent variable and one or more independent variables. ie seasonality, trend, random variation
Relational Database
A database structured to recognize relations among stored items of information.
Reliable Data
Data that is consistent and repeatable
Return on Investment (ROI)
The ratio of income earned on the investment to the investment made to earn that income.
Run chart
A line chart that shows performance measurements over time; helps to uncover trends or aberrations in processes
Sampling with replacement
When a piece of the population can be selected more than once
Sampling without replacement
When a piece of the population cannot be selected more than once
Scatter diagram
A graphic that uses dots to show relationships or correlations between variables.
Significance level
A number used as the cutoff for deciding whether the probability of a result equal to or more extreme than what was observed is statistically meaningful
Simple Composite Index
created when a researcher gathers data from many different sources without weighing any data more significantly than any other data
Simple Index Number
shows the change in price or quantity of a single good or service over time
Simple Linear Regression
A form of regression analysis with only one independent variable
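A minimal least-squares sketch for one independent variable, also computing R-squared as the share of variation in y explained by x. The data points are hypothetical and the formulas are the standard textbook ones:

```python
# Hypothetical paired observations (x = independent, y = dependent).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope = covariation of x and y divided by variation of x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R-squared = 1 - (unexplained variation / total variation).
predicted = [intercept + slope * xi for xi in x]
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_total = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_residual / ss_total

print(round(slope, 2), round(intercept, 2), round(r_squared, 2))
```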
Specification limits
The area, on either side of the centerline, or mean, of data plotted on a control chart that meets the customer’s requirements for a product or service. This area may be greater than or less than the area defined by the control limits
Standard deviation
The square root of the variance, a measure of how spread out the numbers are
Measure that is used to quantify the amount of variation or dispersion of a set of data values
how far, on average, data points are from the mean
Standard Error (SE) of Estimate
The “average” deviation of the data points from the regression line or curve
Standard score
Also called Z-scores; they measure the distance of a piece of data from the mean relative to the spread of the entire population; a method for comparing two data sets that have different scales.
Statistics
The science that deals with the interpretation of numerical facts or data through theories of probability. Also, the numerical facts or data themselves.
Systematic Errors
Errors in measurement that are constant within a data set, sometimes caused by faulty equipment or bias
Random errors
are caused by unknown and unpredictable changes in the experiment. These changes may occur in the measuring instruments or in the environmental conditions.
test statistic
One value used to test the hypothesis, it is a numerical summary of the data set
The Result Chain
1) Resources – inputs and activities 2) Results – outputs then outcomes then impact
Time Series Analysis
Regression analysis that uses time as the independent variable
Trend
In data analysis, a general slope upward or downward over a long period of time
Trial
An experiment, a test of the performance or qualities of something or someone
Triple Blind Study
A study performed where neither the treatment allocator nor the participant nor the response gatherer knows which group the participant is in
True Score Model
average score an individual would achieve if he or she were to take the test infinite times; observed score is the true score plus random error
Valid Data
Data resulting from a test that accurately measures what it is intended to measure
Variance
The average of the squared differences from the mean of the related sample
Weighted Composite Index
created when a researcher applies more weight to certain goods or services than others as they are calculating the index number
Z-score
A statistical measure that indicates the number of standard deviations a data point is from its mean
less than 0 represents an element less than the mean
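A short sketch tying together variance, standard deviation, and the z-score, assuming hypothetical data treated as a whole population (so the population formulas `pvariance` and `pstdev` apply):

```python
import statistics

# Hypothetical data set, treated as the entire population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
variance = statistics.pvariance(data)  # average squared difference from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

def z_score(value):
    """Number of standard deviations `value` lies from the mean."""
    return (value - mean) / std_dev

print(mean, variance, std_dev)
print(z_score(9), z_score(2))  # positive: above the mean; negative: below it
```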
Pareto chart
A bar chart that sorts data into categories, then prioritizes those categories to help project teams identify the most significant factors or the biggest contributors to problems.
80/20 rule
states that 80% of quality management problems are the result of a small number – about 20% – of causes
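A sketch of the Pareto idea: sort categories from largest to smallest and find the "vital few" that account for about 80% of the problems. The defect categories and counts are hypothetical:

```python
# Hypothetical defect counts by category.
defects = {"scratches": 50, "dents": 30, "misalignment": 10,
           "discoloration": 6, "other": 4}

total = sum(defects.values())
ranked = sorted(defects.items(), key=lambda item: item[1], reverse=True)

cumulative = 0
vital_few = []
for category, count in ranked:
    # Keep the category if 80% had not yet been reached before adding it.
    if 100 * cumulative < 80 * total:
        vital_few.append(category)
    cumulative += count
    print(f"{category:15s} {count:3d} {100 * cumulative / total:6.1f}%")

print(vital_few)
```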
expected value
the expected value of a random variable is, intuitively, the long-run average value of repetitions of the experiment it represents
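For a discrete random variable, the expected value is the sum of each outcome multiplied by its probability. A minimal sketch, assuming a fair six-sided die and exact fractions to avoid rounding:

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid probability distribution sums to 1.
total_probability = sum(distribution.values())

# Expected value = sum of (outcome * probability).
expected_value = sum(face * prob for face, prob in distribution.items())

print(total_probability, expected_value, float(expected_value))
```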
cohort study
A study that observes and follows people moving forward in time from the time of their entry into the study.
linear programming
A mathematical tool used to optimize a function (the objective function) subject to various constraints, all of which are linear. Often used to find the combination of products that will maximize profits or minimize costs.
correlation
The extent or degree of statistical association among two or more variables.
response bias
This misuse occurs when the respondents to a survey say what they believe the questioner wants to hear. This bias can occur as a result of the wording of a question.
Cost-effectiveness analysis
is a form of economic analysis that compares the relative costs and outcomes (effects) of two or more courses of action.
Association and causality
This statistical misuse occurs when a researcher notices a relationship between two variables and assumes that one variable is the cause of the other. In reality, these variables might both be caused by a separate variable. In this case, they would merely be correlated, which means they show up together. Or there might be no relationship at all.
operationalization
refers to the development of specific research procedures that allow for observation and measurement of abstract concepts
conscious bias
occurs when the surveyor is actively seeking a certain response to support his or her theory or cause
Bayes’ Theorem
A formula that calculates conditional probabilities, important in understanding how new information affects the probabilities of different outcomes.
conditional probability
the probability of an event occurring given that another event has occurred
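Bayes' Theorem can be written as P(A|B) = P(B|A) * P(A) / P(B). A sketch under assumed numbers for a hypothetical screening test (1% of people have the condition, the test detects 95% of true cases, and it wrongly flags 5% of healthy people):

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B) = P(B | A) * P(A) / P(B), with P(B) from total probability."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(B)
    return likelihood * prior / evidence

# Hypothetical inputs (not from any real test).
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)

print(round(posterior, 3))
```

Even with a positive result, the probability of having the condition stays well below 95% here, because the condition is rare to begin with.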
Chi-square test
any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true
SIPOC diagram
A diagram that defines the boundaries of a process and shows how its Suppliers, Inputs, Processes, Outputs, and Customers affect process quality.
Holistic view.
Ishikawa – 7 Basic Tools of Quality
1) Run Chart
2) Check sheet
3) Cause and effect diagram (fishbone diagram)
4) Histogram
5) Flow Chart
6) Scatter Diagram
7) Pareto chart
Six Sigma
A highly disciplined, data-driven approach that uses statistical analysis to measure and improve a company’s operational performance by identifying and eliminating defects in manufacturing and service processes; the term itself is commonly defined as 3.4 defects per million opportunities.
lean operations
Popularized by Six Sigma, business practices that use as little time, inventory, supplies and work as possible to create a dependable product or service. The less that is used, the less waste occurs, and the more money the business saves. Accuracy is also very important in POS (Point of Sale) systems, and the most accurate systems produce products and services without flaws, so nothing needs to be thrown away.
International Organization for Standardization (ISO)
Established a certification program that guarantees that an organization is dedicated to quality concepts and is continually working to ensure that it is producing highest level of quality possible. The certification shows that an organization has a quality management system in place to monitor and control quality issues and is continuing to meet the needs of customers and stakeholders with high-quality products and services.
Affinity diagram
A tool that helps teams sort verbal data or ideas into categories for further investigation or evaluation.
attribute data
Data that shows whether a result meets a requirement or not (yes/no, pass/fail).
check sheet
A structured form or table that lets practitioners collect and record data in a simple format; by putting marks on a table or image, team members can track and record information about the number, time, and location of events or problems
Common cause variation
Variation that occurs as a natural part of a process.
Chance cause
Non-assignable cause
Noise
Natural pattern
Critical path
Generally, but not always, the sequence of schedule activities that determines the duration of the project. It is the longest path through the project. See also critical path methodology.
CTQ tree
A tree diagram that shows how customer needs or Critical to Quality characteristics can be quantified and measured.
Design of experiments
A method that uses statistical models to determine which combinations of variables are most likely to lead to the desired quality results; the method tests multiple factors at once to see how they interact in producing an outcome.
Early finish dates
The early start dates for activities plus the amount of time that the activities will take.
Early start dates
The earliest dates that activities can start, based on the completion of any predecessor activities.
Failure Mode and Effects Analysis
An analytical procedure in which each potential failure mode in every component of a product is analyzed to determine its effect on the reliability of that component and, by itself or in combination with other possible failure modes, on the reliability of the product or system and on the required function of the component; or the examination of a product (at the system and/or lower levels) for all ways that a failure may occur. For each potential failure, an estimate is made of its effect on the total system and of its impact. In addition, a review is undertaken of the action planned to minimize the probability of failure and to minimize its effects.
Fishbone Diagram
A diagram that shows the underlying causes of a problem or event; also known as a cause-and-effect diagram.
flowchart
A graphical representation of the flow of information in which symbols are used to represent operations, data, reports generated, equipment, etc.
oval = begin/end
rectangle = process step
diamond = question
inverted triangle = waiting area
Interrelationship digraph
A diagram that places the contributors to a problem in a circle and uses arrows to show the cause-and-effect relationships among the contributors.
(relationship diagram)
Late finish dates
The latest dates that activities can finish without delaying the project, based on the completion of any successor activities.
Late start dates
The latest dates that activities can start without delaying the project. Late start dates are equal to the late finish dates for tasks minus the amount of time it takes to complete the tasks.
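The early and late start/finish dates can be computed with a forward pass and a backward pass through the network. A minimal sketch, assuming a hypothetical four-task project with durations in days; tasks with no slack (early start equals late start) form the critical path:

```python
# Hypothetical project: durations in days and predecessor lists.
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
order = ["A", "B", "C", "D"]  # tasks listed so predecessors come first

# Forward pass: early start = latest early finish among predecessors.
early_start, early_finish = {}, {}
for task in order:
    early_start[task] = max((early_finish[p] for p in predecessors[task]), default=0)
    early_finish[task] = early_start[task] + durations[task]

project_duration = max(early_finish.values())

# Backward pass: late finish = earliest late start among successors.
late_start, late_finish = {}, {}
for task in reversed(order):
    successors = [t for t in order if task in predecessors[t]]
    late_finish[task] = min((late_start[s] for s in successors), default=project_duration)
    late_start[task] = late_finish[task] - durations[task]

# Tasks with zero slack lie on the critical path.
critical_path = [t for t in order if early_start[t] == late_start[t]]

print(project_duration, critical_path)
```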
Lean Management
approach that seeks to maximize customer value while minimizing waste.
Lower control limit
The minimum value on a control chart that a process should not fall below.
Matrix diagram
A table or chart that shows the strength of relationships between items or sets of items.
Metrics
Measurements that allow teams to gauge results objectively.
Network diagram
A graphic representation of the schedule that shows the sequence of project activities.
Option ranking
An indicator that explains how well an option will satisfy a criterion in a prioritization matrix.
Plan-Do-Check-Act Cycle
A four-step method that practitioners use to create plans to solve a problem (Plan), run an experiment to see if the plan will work (Do), check the experiment results (Check), and implement changes to processes or policies (Act).
Prioritization matrix
A table or chart that helps a team prioritize multiple options, based on how well these options satisfy preselected criteria.
Process decision program chart
A tree diagram designed specifically to help uncover countermeasures or contingency plans so problems can be solved or avoided.
Quality assurance
The function responsible for providing assurance that products or services are consistently maintained at a high level of quality. [CMA]
Quality control
A process, such as statistical sampling, that monitors the quality of operations. [CMA]
Quality management
quality and the means to achieve it.
focus on customer, strong leadership, engaged employees, focus on process, systems approach, commitment to continuous improvement, fact-based decision making, collaborative relationships with suppliers
Rolled throughput yield
A statistical calculation that shows the probability of something passing completely through a process with no rework or defects.
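Rolled throughput yield is usually computed as the product of the first-pass yields of each process step. A sketch, assuming a hypothetical three-step process:

```python
import math

# Hypothetical first-pass yields for three sequential process steps.
step_yields = [0.95, 0.90, 0.98]

# Probability a unit passes every step with no rework or defects.
rolled_throughput_yield = math.prod(step_yields)

print(round(rolled_throughput_yield, 4))
```

Because the yields multiply, the overall yield is always lower than the yield of the weakest single step.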
Sampling
the process of selecting research participants or survey respondents from a population.
Special cause variation
Abnormal variation that is not a natural part of a process.
Assignable cause
Signal
Unnatural pattern
Statistical Process Control
Methods that rely on statistics and measurements to monitor work and analyze improvements to processes.
Tree diagram
A hierarchical tool that uses successive steps to break a topic down into its components.
Upper control limit
The maximum value on a control chart that a process should not exceed.
Voice of the Customer
A planning technique used to provide products, services, and results that truly reflect customer requirements by translating those customer requirements into the appropriate technical requirements for each phase of project product development.
Weighted score
A score calculated by multiplying a weighting factor by an option ranking; weighted scores for each option in a prioritization matrix are added together to help team prioritize options.
Weighting factor
An indicator of how important a criterion is to the completion of an objective
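A sketch of a prioritization matrix: each option ranking is multiplied by the weighting factor for its criterion, and the weighted scores are summed per option. The criteria, weights, and rankings below are hypothetical:

```python
# Hypothetical weighting factors (importance of each criterion).
weights = {"cost": 5, "speed": 3, "risk": 2}

# Hypothetical option rankings (how well each option satisfies each criterion).
rankings = {
    "Option A": {"cost": 3, "speed": 1, "risk": 2},
    "Option B": {"cost": 1, "speed": 3, "risk": 3},
}

# Weighted score = weighting factor * option ranking, summed over criteria.
totals = {
    option: sum(weights[criterion] * rank for criterion, rank in ranks.items())
    for option, ranks in rankings.items()
}

best_option = max(totals, key=totals.get)
print(totals, best_option)
```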
Critical Success Factors
The important things an entity must do to be successful, such as quality measures, customer service, or efficiency.
Economic Value Added (EVA)
Net income (after taxes) earned in excess of the amount of net income required to earn the company’s cost of capital.
Net Promoter Score
A management tool designed to collect data indicating the relative loyalty of customers and their willingness to recommend a company’s products or services.
Promoter
A category of customer used in the calculation of the Net Promoter Score that indicates a loyal and enthusiastic customer.
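The Net Promoter Score is conventionally the percentage of promoters minus the percentage of detractors. A sketch, assuming the common 0 to 10 survey scale (9 to 10 = promoter, 7 to 8 = passive, 0 to 6 = detractor) and hypothetical ratings:

```python
def net_promoter_score(ratings):
    """Percent promoters (9-10) minus percent detractors (0-6)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical survey responses on a 0-10 scale.
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
nps = net_promoter_score(ratings)

print(nps)
```

Passives count toward the total number of respondents but are neither added nor subtracted.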
box plot
(a.k.a. box and whisker diagram) is a standardized way of displaying the distribution of data based on the five number summary: minimum, first quartile, median, third quartile, and maximum.
t-test
helps you compare whether two groups have different average values (for example, whether men and women have different average heights).
bell curve
a graph of a normal (Gaussian) distribution, with a large rounded peak tapering away at each end.
fact-based decision making
These informed decisions reduce bias and foster trust in decisions and plans.
seasonality
Regular pattern of volatility, usually within a single year.
Cyclicality
Repetition of up (peaks) or down movements (troughs) that follow or counteract a business cycle that can last several years.
irregularity
One-time deviations from expectations caused by unforeseen circumstances such as war, natural disasters, poor weather, labor strikes, single-occurrence company-specific surprises or macroeconomic shocks.
trend
A general slope upward or downward over a long period of time.
random variation
The variability of a process which might be caused by irregular fluctuations due to chance that cannot be anticipated, detected, or eliminated.
cause-and-effect diagram
Also known as a fishbone diagram; identifies possible causes for an effect or problem.
variable data
Data that shows how well a result meets a requirement, often shown on a scale or as a rating.
Results-based management (RBM)
is a management strategy which uses feedback loops to achieve strategic goals.
uses results as the central measurement of performance. It has been adopted by many nonprofit and governmental institutions.
input – activities – output – outcome – impact
data quality management
cleans data and reduces the amount of incomplete data
Observational studies
are also known as quasi-experimental studies. An observational study is sometimes used because it is impractical or impossible to control the conditions of the study.
prospective cohort study
observes people going forward in time from the time of their entry into the study.
experimental study
all variable measurements and manipulations are under the researcher’s control, including the subjects or participants.
experimental units
subjects or objects under observation
treatments
the procedures applied to each subject
responses
the effects of the experimental treatments
Construct validity
means that the construct has been generally accepted in the field. Closely related to this is content validity, which refers to whether the construct measures what it claims to measure.
Content validity
may be questioned if the construct is too wide or too narrow.
Internal validity
concerns biases that may find their way into the data that is collected. These may be systematic biases, intentional biases, or self-serving biases. Any of these may lead to questions about your study’s internal validity.
statistical validity
do your results stand up to statistical scrutiny? This can be validated through the use of hypothesis testing
p-value
is the probability, assuming the null hypothesis is true, of obtaining a result equal to or more extreme than what was observed; it is compared with the significance level of a hypothesis test.
t-statistic
determines whether specific individual variables are significantly related to the dependent variable.
F-value
in the ANOVA output displays the result of the ratio between the mean square of the regression and the mean square of the residual. This gives a relationship between the variability between the groups and within the groups.
decision tree
A diagram of possible alternatives and their expected consequences used to formulate possible courses of actions in order to make decisions.
Descriptive Analytics
which use data aggregation and data mining techniques to provide insight into the past and answer: “What has happened?”
Predictive Analytics
which use statistical models and forecasting techniques to understand the future and answer: “What could happen?”
Prescriptive Analytics
which use optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?”
intersection
denoted A∩B, is the collection of elements that are in both A and B.
union
denoted A∪B, is the collection of elements that are in A or B (or both); its probability is the chance of, for instance, Bob wearing a black suit OR a black pair of shoes.
KPIs (advantages)
Educate management on company performance
Can be used as a tool across an entire organization
Data-driven results make it easier to quantify performance
If used over time, can create an internal benchmarking system
KPIs (disadvantages)
Can be expensive and time-consuming to set up and use
Requires frequent, even on-going, maintenance and monitoring
Small changes in KPIs may be viewed as meaningful, but may not be statistically significant
Results are often only a rough guide rather than a concrete measurement
Once designed and set up, difficult to change
Balanced Scorecard (advantages)
Improves organization alignment
Improves internal and external communication
Links company operations with its strategy
Emphasizes strategy and organizational results
Balanced Scorecard (disadvantages)
Requires time and effort to establish a meaningful scorecard
Does not illustrate a full picture of the company performance, particularly financial data
Sometimes difficult to maintain momentum
Requires a wide cross-section of the organization departments in developing the system
May not encourage desired behavior changes
consumer price index
A measure of the price level of a defined “basket” of consumer items purchased by households.
logistic regression
can be applied when the dependent variable is a categorical, binary variable, such as male/female, dead/living, gas/electric, etc.
ordinary least squares (OLS)
a regression analysis method in which homoscedasticity is assumed; homoscedasticity occurs when all of the random variables have the same general finite variance.
weighted least squares
reflects the behavior of the random errors in the model; and it can be used with functions that are either linear or nonlinear in the parameters
bivariate charts
charts with a y axis and an x axis; the independent variable goes on the x axis
cross over analysis
identify the crossover point: the point at which we are indifferent between plans
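For two plans with linear costs (fixed cost plus variable cost per unit), the crossover point is the volume at which the total costs are equal. A sketch with hypothetical cost figures:

```python
def crossover_point(fixed_a, variable_a, fixed_b, variable_b):
    """Volume at which total cost of plan A equals total cost of plan B."""
    if variable_a == variable_b:
        raise ValueError("Cost lines are parallel and never cross")
    return (fixed_b - fixed_a) / (variable_a - variable_b)

# Hypothetical plans: A = 1000 fixed + 5 per unit, B = 400 fixed + 8 per unit.
units = crossover_point(fixed_a=1000, variable_a=5, fixed_b=400, variable_b=8)

cost_a = 1000 + 5 * units
cost_b = 400 + 8 * units
print(units, cost_a, cost_b)
```

Below the crossover volume the plan with the lower fixed cost is cheaper; above it, the plan with the lower variable cost wins.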
controllable inputs
inputs that are directly controlled by the decision maker
cumulative
# of new cases in certain time period/person-time units
decision analysis
weighing all outcomes
deviation score
calculated by subtracting the mean from an individual score
forecasting
judgmental – sales, consumer input
time-series – seasonality
associative – predictive
GIGO
garbage in/garbage out
linear relationship
measured by strength
when the relationship is strong, the points bunch around a straight line
maximin
greatest min calculated
maximax
maximum payoff determined
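Maximin and maximax can be compared on a small payoff table: maximin picks the decision whose worst payoff is best (pessimistic), while maximax picks the decision with the best possible payoff (optimistic). The decisions and payoffs below are hypothetical:

```python
# Hypothetical payoffs for each decision under three possible states of nature.
payoffs = {
    "expand": [200, 50, -120],
    "hold": [90, 60, 20],
    "outsource": [60, 40, 30],
}

# Maximax: choose the decision with the largest maximum payoff.
maximax_choice = max(payoffs, key=lambda decision: max(payoffs[decision]))

# Maximin: choose the decision with the largest minimum payoff.
maximin_choice = max(payoffs, key=lambda decision: min(payoffs[decision]))

print(maximax_choice, maximin_choice)
```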
monte carlo simulation
used to approximate the probability of certain outcomes
trial runs, simulations
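A minimal Monte Carlo sketch: estimate the probability that two dice total 10 or more by running many simulated trials, then compare with the exact value (6 of 36 outcomes). The seed and trial count are arbitrary choices for illustration:

```python
import random

def estimate_probability(trials, seed=42):
    """Estimate P(sum of two dice >= 10) from repeated simulated rolls."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total >= 10:
            hits += 1
    return hits / trials

estimate = estimate_probability(100_000)
exact = 6 / 36

print(round(estimate, 4), round(exact, 4))
```

More trials generally bring the estimate closer to the exact probability, which is the practical appeal of simulation when no exact formula is available.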
probabilistic inputs
outside direct control
skewness
measure of degree to which data “leans” toward one side
7 basic quality tools for non-numeric data
affinity diagram
interrelationship digraph
tree diagram
matrix diagram
prioritization matrix
process decision program chart
network diagram
uniform probability
any outcome has the same probability as any other outcome, e.g., rolling a die
valid probability
P(x) is between 0 and 1, and the sum of all probabilities = 1
what if analysis
simulation analysis
select different values for the probabilistic inputs, then compute the outcomes
quality control
uncover defects
recognize problems
inspection/repair
reactive
quality assurance
prevent defects
understand intricacies
training
proactive
DMAIC
six sigma 5-step framework
define, measure, analyze, improve, control processes