This is a rather contemplative essay, but I think the subject was important for me to ponder.
Numbers allow us to analyze the world. Acquiring and analyzing data to answer questions and arrive at conclusions has long been the method of science. We often need to know what the data show. To do that, we analyze the data through analogy, quantification, comparison with other data, and other statistical processes.
Wikipedia defines statistics as follows:
“Statistics is the discipline that concerns the
collection, organization, analysis, interpretation, and presentation of data.”
In line with that definition, statisticians are now often known as data scientists. In data analysis, two main methods are used: descriptive statistics, in which data are summarized with indexes such as the mean or the standard deviation, and inferential statistics, in which conclusions are drawn from data subject to random variation, such as observational errors and sampling variation. Descriptive statistics are concerned with the distributions of samples and populations, while inferential statistics are mathematical and grounded in probability theory.
on probability theory. According to Wikipedia: “Descriptive statistics is
distinguished from inferential statistics (or inductive statistics), in that
descriptive statistics aims to summarize a sample, rather than use the data to
learn about the population that the sample of data is thought to represent.”
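To make the distinction concrete, here is a minimal Python sketch using only the standard library. The porosity values are made up for illustration: summarizing the sample with a mean and standard deviation is descriptive, while estimating a confidence interval for the population mean (here with a rough normal approximation) is inferential.

```python
import math
import statistics

# Hypothetical sample: porosity measurements (%) from 12 core plugs.
sample = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5,
          13.2, 14.7, 15.8, 14.1, 15.3, 14.6]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)  # sample standard deviation
print(f"sample mean = {mean:.2f}, sample stdev = {stdev:.2f}")

# Inferential statistics: use the sample to estimate the population.
# A rough 95% confidence interval for the population mean, using the
# normal approximation (a t-distribution would be more exact here).
n = len(sample)
margin = 1.96 * stdev / math.sqrt(n)
print(f"95% CI for population mean: {mean - margin:.2f} to {mean + margin:.2f}")
```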
Forms of data are often complementary. Numerical data tied to spatial data, as in Geographic Information Systems (GIS), is often very valuable and useful. Data analysis in science often involves discovering formulas and recipes that hold true when determining the best course of action.
I have seen colleagues in the oil & gas business develop large databases that have proved very useful in scientific analysis; I have even helped to develop some. Federal, state, and local departments and regulatory agencies also develop large databases that are used in scientific papers and policy proposals, and for practical comparisons and high-grading/prioritization. Making government data available to citizens, academics, and industry is also a common and very useful practice.
Geospatial databases allow information to be mapped in space, typically in 2D but also in 3D where applicable. The added dimension of place allows for high-grading/low-grading and prioritization of project needs or of resource development.
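As a rough illustration of how the spatial dimension supports high-grading, here is a hypothetical Python sketch. The prospect names, coordinates, scores, and the infrastructure hub are all invented; the point is only that ranking by score weighted against distance lets nearby, high-scoring prospects rise to the top.

```python
import math

# Hypothetical prospect locations (easting/northing in km) with an
# estimated resource score; all values are illustrative only.
prospects = [
    {"name": "A", "x": 12.0, "y": 45.0, "score": 7.5},
    {"name": "B", "x": 30.5, "y": 42.1, "score": 9.1},
    {"name": "C", "x": 14.2, "y": 47.8, "score": 6.0},
]

# Assumed infrastructure tie-in point (hypothetical coordinates).
hub = (13.0, 46.0)

def distance_km(p, q):
    return math.hypot(p["x"] - q[0], p["y"] - q[1])

# High-grade: rank by score per km of distance to infrastructure,
# so nearby, high-scoring prospects rise to the top of the list.
ranked = sorted(prospects,
                key=lambda p: p["score"] / max(distance_km(p, hub), 0.1),
                reverse=True)
for p in ranked:
    print(p["name"], round(distance_km(p, hub), 1), "km, score", p["score"])
```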
Data can be scanned for patterns and signals. Both humans and machines can do that scanning; humans do it better in some respects, and machines do it better in others. Thus, one could hypothesize that humans and machines collaborating on data analytics is currently the optimal arrangement. As long as we are still teaching machines what we figure out about data, and they are revealing hidden patterns in data that we cannot readily see, the partnership will remain optimized. Humans teach. Machines learn. Then machines teach, in a kind of feedback loop. Humans have been scanning their environments for patterns since deep in their evolutionary past; doing so has aided our survival, our search for food, and our search for mates. However, machines have advantages in scanning data for patterns: they can work very fast and scan vast amounts of data without getting tired. They have speed and stamina, and they only make errors if the humans who directed them made errors first. The ability of machines to analyze large datasets quickly makes them very valuable assistants to humans. Still, it is up to humans to fully interpret the results of data analysis.
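A toy example of the kind of mechanical scanning machines excel at: a simple outlier scan in Python. The readings are invented; real systems apply rules like this, tirelessly, across millions of rows.

```python
import statistics

# Hypothetical daily production readings; the spike at index 7 is the
# kind of "signal" a machine can flag without fatigue at any scale.
readings = [101, 99, 103, 98, 100, 102, 97, 160, 101, 99]

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)

# Flag anything more than 2 standard deviations from the mean
# (a crude rule; robust methods based on the median handle extreme
# values, which inflate the mean and stdev, more gracefully).
outliers = [(i, x) for i, x in enumerate(readings)
            if abs(x - mean) > 2 * stdev]
print(outliers)  # [(7, 160)]
```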
Knowing a subject involves knowing its numbers and its statistics. We need to know how much of this and how much of that, and we need to pick out any trends that show up in the data. We use data to interpret the past, assess the present, and predict the future. We use it to support our decisions in policy and action. We use data and data analysis to communicate our proposed actions and to back them up. We follow the data, which usually doesn’t lie. The data suggests courses of action, and we must decide on them.
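As a small sketch of picking out a trend, the following fits a least-squares line to hypothetical annual values using Python’s standard library (statistics.linear_regression requires Python 3.10+) and then naively extrapolates it, with all the usual caveats about extrapolation.

```python
import statistics

# Hypothetical annual values; is there a trend?
years = [2018, 2019, 2020, 2021, 2022, 2023]
values = [40.1, 42.3, 41.8, 44.0, 45.2, 46.9]

# Fit a least-squares line: the slope is the trend per year.
slope, intercept = statistics.linear_regression(years, values)
print(f"trend: {slope:+.2f} per year")

# Naive extrapolation to "predict the future".
print(f"2024 projection: {slope * 2024 + intercept:.1f}")
```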
Data can tell stories. Those stories can support business intelligence, or BI. Microsoft and other companies sell tools to enhance BI. According to Microsoft:
What is business intelligence?
“Business intelligence (BI) uncovers insights for
making strategic decisions. Business intelligence tools analyze historical and
current data and present findings in intuitive visual formats.”
What is data storytelling?
“Data storytelling is the concept of building a
compelling narrative based on complex data and analytics that help tell your
story and influence and inform a particular audience.”
Microsoft gives four steps for utilizing business intelligence:
Step 1: Collect and transform data from multiple
sources
Business intelligence tools typically use the extract,
transform, and load (ETL) method to aggregate structured and unstructured data
from multiple sources. This data is then transformed and remodeled before being
stored in a central location, so applications can easily analyze and query it
as one comprehensive data set.
Step 2: Uncover trends and inconsistencies
Data mining, or data discovery, typically uses automation
to quickly analyze data to find patterns and outliers which provide insight
into the current state of business. BI tools often feature several types of
data modeling and analytics—including exploratory, descriptive, statistical,
and predictive—that further explore data, predict trends, and make
recommendations.
Step 3: Use data visualization to present findings
Business intelligence reporting uses data visualizations
to make findings easier to understand and share. Reporting methods include
interactive data dashboards, charts, graphs, and maps that help users see
what’s going on in the business right now.
Step 4: Take action on insights in real time
Viewing current and historical data in context with
business activities gives companies the ability to quickly move from insights
to action. Business intelligence enables real time adjustments and long-term
strategic changes that eliminate inefficiencies, adapt to market shifts,
correct supply problems, and solve customer issues.
This can be summarized as 1) collecting data
from multiple sources; 2) finding meaningful patterns in that data; 3) communicating
that data to others through visualization; and 4) taking action based on
findings.
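Here is a minimal, standard-library Python sketch of step 1’s extract, transform, and load (ETL) idea. This is my own illustration, not Microsoft’s method: the sources, field names, and values are hypothetical, and the 6.29 bbl-per-m³ conversion is the usual oilfield approximation. Two inconsistent sources are extracted, transformed to a common schema, and loaded into one central store that can be queried as a single data set.

```python
import sqlite3

# Extract: two hypothetical sources with inconsistent schemas.
source_a = [{"well": "W-1", "oil_bbl": 120}, {"well": "W-2", "oil_bbl": 95}]
source_b = [{"well_name": "W-3", "oil_m3": 20.0}]  # different names and units

# Transform: normalize field names and units (1 m^3 is about 6.29 bbl).
rows = [(r["well"], float(r["oil_bbl"])) for r in source_a]
rows += [(r["well_name"], r["oil_m3"] * 6.29) for r in source_b]

# Load: store everything in one central table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE production (well TEXT, oil_bbl REAL)")
db.executemany("INSERT INTO production VALUES (?, ?)", rows)

# Downstream tools can now query it as one comprehensive data set.
query = "SELECT well, oil_bbl FROM production ORDER BY oil_bbl DESC"
for well, bbl in db.execute(query):
    print(well, round(bbl, 1))
```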
We can also
use data in biased ways by “cherry-picking” it to fit our preconceived
narratives. Microsoft gives the three key elements of data storytelling: 1)
build your narrative; 2) use visuals to enlighten; and 3) show data to support.
As a geologist, I remember when computerized mapping was new(ish) and many maps and cross-sections were still done by hand. My supervisor at the time was rightly excited by what he called “the power of the computer.” It allowed us to analyze more information faster and to test ideas quickly to see if the data supported them. I always enjoyed making maps by hand because I could skew and orient them the way I thought they should be skewed and oriented. However, it would not be long before the computer could do that as well, and so much more. We could now analyze large datasets almost instantly. Sure, we still needed to weed out bad data, but now even that could be done faster and more effectively. The advantages were massive. Later we could add GIS layers to our maps and pick and choose different layers as needed. We used to plot out large maps and lay them out on large tables for analysis. Now we can use computers, often with two or even three large monitors synced as one, to scan and analyze more effectively.
The final parts of data analysis are communicating the results to other decision-makers and acting on them. Thus, the data supports the action. That is why cherry-picking is problematic: the data is used selectively to tell a story that is often predetermined, fitting a certain narrative instead of being freshly interpreted. As a scientist, I know that the goal is the truth of the situation, whether it fits preconceived ideas or not. That is not always easy, but it must be the way science is done. I have seen many media stories where data analysis conclusions are interpreted differently by those with biases than by those without them. We need to be aware of biased data analysis and call out cherry-picking when it occurs.
A Big Think article by Jonny Thomson explains the McNamara fallacy:
“The McNamara fallacy is what occurs when decision-makers rely solely on quantitative metrics while ignoring qualitative factors. In other words, it’s when you look at raw numbers rather than the nuances that matter in the decision-making process.”
The article goes on to suggest that over-reliance on numerical data comes at the expense of stories and subtle details. Perhaps the following quote sums up the article best:
“Data is a great starting point, and a great many idiotic
and dangerous things are done when we ignore data, but it doesn’t always make
for the best decisions.”
The article gives three ways to avoid the McNamara fallacy: 1) dig into the details, looking at the numbers behind the numbers; 2) consider the root causes, looking into the reasons behind the numbers; and 3) pick up the phone, because getting the details is apparently easier in person, including by phone.
References:
Statistics. Wikipedia.
What is data storytelling? Microsoft Power BI.
What is business intelligence? Microsoft Power BI.
The “McNamara fallacy”: When data leads to the worst decision. Jonny Thomson. Big Think. October 18, 2024.