Review: The Little Book of Data

The Little Book of Data, Justin Evans. HarperCollins Leadership (ISBN: 9781400248353) 2025.

Summary: Stories of how people have used data to solve big problems and how that might apply in one’s own work.

Some of us are in data denial. We don’t think we need to understand it. Or it’s too complicated. Or it’s just intimidating. And for some, it’s just downright evil. Justin Evans passionately believes that when we are in data denial, we forfeit a key resource for advancing our careers and our organizations. Data can help us solve big problems. Fundamentally, it’s not about math but about ideas. Once an undergrad English major himself, Evans says anyone can understand this stuff if we don’t “give power to the twerps.” And while there are real concerns with surveillance capitalism, data is a Promethean fire. We wield a great power that charges us with responsibility.

Evans learned about the power of data to solve problems in a career that included work with Nielsen, Comcast, Samsung, and a start-up along the way. He wrote The Little Book of Data to tell stories of how data has solved a variety of big problems. And he helps us consider the opportunities this presents each of us in our chosen work.

But first he begins with a personal account of how we “shed” vast amounts of data every day. Our email accounts, our rideshare apps, GPS, streaming services, medical information systems…and so much more. A whole infrastructure has been created to identify, store, and utilize that information. And chances are, in whatever line of work you are in, data is there to help with the problems you are trying to solve.

For example, we are introduced to:

  • Herman Hollerith, who worked with the Census Bureau preparing for the 1890 census. There were an unprecedented number of variables on which they were to collect information. All of it would need to be cross-matchable. Hollerith created the punch card to collect this information and a tabulating machine to analyze different combinations of data. And so was born the enterprise we now know as IBM.
  • More contemporarily, we meet Priya, who developed analytics to study websites used to traffic women, enabling the NGO she worked for to build cases to rescue underage women.
  • Barry Glick started working for a company that had a division distributing maps to gas stations. It was called MapQuest. He figured out how to connect the vector data of driving directions to raster data used to make visual maps. And then they put it online…
  • Sharon Greene was an epidemiologist in New York City when COVID broke loose. Her team figured out a way to use daily testing data to identify hot spots and surge resources to them, resulting in dropping death rates in each of these spots.
  • Adam Greene developed textual analysis to identify loneliness among senior adults through phone conversation, helping seniors get more socially connected.

The stories help illustrate different aspects of data science, from the development of artificial intelligence to how we use data to count, track, spot anomalies like impending earthquakes, match genetic attributes, certify grades of meat, and measure performance. We learn about the use of data to crystallize complex information by meeting railroad nerd Henry Varnum Poor. Poor went from editing a railroad journal to creating an objective resource to help those investing in railroads. Poor’s Manual of Railroads provided information on road miles, rolling stock, passenger numbers, freight tonnage…and the names of each director. Eventually this became Standard & Poor’s, which crystallized all this data into a rating, AAA to D (bankrupt).

Along the way, Evans tells stories from his own career journey. Each of the chapters concludes with a “key points” summary, thought starters, and “Where do we go next?”. Evans combines inspiring stories with a “see, you can understand this” approach.

Most of the book was pretty positive about the potential of the world of big data. But Evans includes a chapter on data bullies along the way, those who use their expertise to conceal information. He offers a humorous account of how he asked such people to break down their claims and explain everything he didn’t understand.

At the end of the book, he returns to the power of large tech firms and the issue of secrecy, illustrating it with how the AI industry secretly used large amounts of copyrighted material to train its Large Language Models. He argues that our data might be tagged in such a way as to establish provenance, allowing its licensed or unlicensed use to be tracked. He also argues for data advocates in industries where data-driven decisions have implications for the rest of us, health insurance companies for example, so that the use of that data would be less opaque.

On the whole, Evans’s approach is to illustrate different ways data has been used to solve problems that matter. He helps readers think about the problems they are trying to solve in this light. Data thus becomes a useful tool instead of an amorphous, intimidating reality. For me, one of the biggest takeaways was that data ultimately isn’t about crunching numbers but about asking good questions. Then we look for the data sets that will help us answer those questions. I found this an encouraging and empowering approach. Evans acknowledges the realities of our world, including the AI explosion, and helps us see the opportunity all this data represents.

____________________

Disclosure of Material Connection: I received a complimentary copy of this book for review from the publisher through LibraryThing’s Early Reviewers Program.

Finally, thanks for visiting Bob on Books. People aren’t reading blogs like they used to, so I appreciate that you spent time here. Feel free to “look around” – see the tabs at the top of the website, and the right hand column. And use the buttons below to share this post. Blessings! [Adapted from Enough Light, a blog I follow.]

Review: Weapons of Math Destruction

Weapons of Math Destruction, Cathy O’Neil. New York: Broadway Books, 2017.

Summary: An insider account of the algorithms that affect our lives, from going to college, to the ads we see online, to our chances of getting a job, being arrested, getting credit and insurance.

Big Data is indeed BIG. Mathematical algorithms shape who will see this post on their Facebook newsfeed. If you go to Amazon or another online bookseller, algorithms will suggest other books like this one you might be interested in. Have you seen all those ads about credit scores? They are more important than you might imagine. Algorithms used by employers and insurance companies determine your employability and insurability in part through these scores. Far more than another credit card (bad idea, by the way) or a mortgage is on the line. These algorithms seem objective, but how they are formulated, and the assumptions made in doing so, mean the difference between useful tools that benefit people and “black boxes” that thwart the flourishing of others, often unknown to them.

Cathy O’Neil should know. A tenure-track math professor, she made the jump to Wall Street and became a “quant” who helped develop mathematical algorithms and witnessed, in the crash of 2008, the harm some of these caused. And she began to notice how algorithms often painfully impacted the lives of many others. She describes how a teacher was fired because of the weighting of performance scores of a single class, despite other evaluations finding her an excellent teacher (afterwards it was found that there were a high number of erasures on tests for students who would have been in her class the previous year, suggesting these had been altered to improve scores).

As she looked at the algorithms responsible for such injustices, she came to dub them “Weapons of Math Destruction” or WMDs and she identified three characteristics of these WMDs:

  1. Opacity: those whose lives are affected by them have no idea of the factors and weighting of those factors that contributed to their “score”.
  2. Scale: how widely an algorithm is applied across industries and sectors of life can affect how much of one’s life is touched by a single formula. For example, the FICO scores mentioned above affect not only credit, but the ability to get a job, the cost of auto insurance, and your ability to rent an apartment.
  3. Damage: WMDs can reinforce other factors perpetuating a cycle of poverty, or incarceration.

She also shows that what makes these algorithms destructive is the use of proxy measurements. For example, an employer may not know directly how savvy someone is as a marketer, and so they use a “proxy” measurement of how many Twitter followers that person has. Or age is used as a proxy for how safe a driver one is. For a group, the proxy may work well, yet be utterly inaccurate for an individual who falls within that proxy group.
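A toy sketch may make the proxy problem concrete. Everything here is my own illustration, not something from O’Neil’s book: the drivers, the accident counts, and the function names are all invented, and a real insurer’s formula would be far more involved. It simply scores each driver by the average for their age group and shows how far off that can be for one careful young driver.

```python
# Hypothetical illustration only: every number below is invented.
from statistics import mean

# Each record: (age_group, accidents over five years) for an imaginary driver.
drivers = [
    ("under_25", 3), ("under_25", 2), ("under_25", 0), ("under_25", 3),
    ("over_25", 0), ("over_25", 1), ("over_25", 0), ("over_25", 1),
]


def group_average(group):
    """Average accident count for everyone in one age group."""
    return mean(count for g, count in drivers if g == group)


def proxy_score(group):
    """Score a driver by age group alone, ignoring their own record."""
    return group_average(group)


# At the group level the proxy holds up: the younger group averages more accidents.
young_avg = group_average("under_25")
older_avg = group_average("over_25")

# At the individual level it can fail badly. The third driver is under 25
# and has a clean record, but the proxy assigns the full group average.
safe_group, safe_actual = drivers[2]
error = proxy_score(safe_group) - safe_actual

print(f"under_25 average: {young_avg}, over_25 average: {older_avg}")
print(f"clean-record young driver: actual {safe_actual}, proxy error {error}")
```

The gap between that driver’s real record and the assigned score is the kind of individual-level error the paragraph above describes, and the driver would never see the group average that produced it.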

Then in successive chapters, she chronicles some of the ways WMDs operate in different parts of life. She discusses the U.S. News & World Report college rankings, and the use of algorithms in admissions processes. Social media uses algorithms to target advertising, which means some will see ads for for-profit schools and payday lenders, and others for upscale furnishing or Viagra, based on clicks, likes, searches, and comments. Policing strategies, including locations for intensified “stop and frisk” policing, are shaped by another set of algorithms. Resume filters may use scoring algorithms that discriminate by address, and psych exam algorithms may render others unemployable in a certain industry. Scheduling algorithms may promote efficiency at the expense of the ability of workers to sleep on a regular schedule, or arrange childcare, or work enough hours to qualify for health insurance. Algorithms sometimes shut people out from credit or low cost insurance when in fact they are good risks. She concludes by showing how algorithms determine ads and news we see (and don’t see). In an afterword she explores the flaws in algorithms revealed by the election of Donald Trump (algorithms, for example, predicted Clinton would easily win Michigan and Wisconsin, where consequently she did not campaign, and lost by small margins).

In her conclusion, she not only makes the case for a code of ethics for mathematicians but also argues that regulation and audits of these algorithms are necessary. The value assumptions, as well as the mathematical methods, of many algorithms are flawed, and yet opacity means those whose lives are most affected don’t even know what hit them.

She helps us see both the sinister and useful sides of these algorithms. They may reveal where a proactive intervention may save a family from descending into family violence, or provide extra assistance to a child in danger of falling behind in a key subject. Or they may be used to invade personal rights, or even to perpetuate socio-economic divides in a society. The reality is that the problem is not the math but the old GIGO problem (garbage in, garbage out). The values and assumptions of the humans who devise the formulas and weightings of values and the use of proxies determine what may be destructive outcomes for some people. Yet all of this can be hidden behind an app, a program, an algorithm.

The massive explosion in storage capacities and processing speeds, and the way our interests, health status, travel patterns, spending patterns, fitness, diet and sleep habits, our political inclinations and more may be tracked via our online and smartphone usage, make O’Neil’s warning an urgent one. We create mountains of data that may be increasingly mined by government and private interests. Perhaps as important as asking whether this will be governed in ways that contribute to our flourishing is asking whether we will be alert enough to care.

____________________________

Disclosure of Material Connection: I received this book free from the publisher. I was not required to write a positive review. The opinions I have expressed are my own.

Forget the NSA, It’s the Data Brokers We Should Fear!

It may be that the issue of the government surveilling our every move is the least of our worries. In our online and highly networked world it appears, according to an article in today’s Washington Post, that data brokers may know more about us than some of our family members. These brokers harvest profiles of us not only from public records but also via our credit card and shopper card usage and our online behavior. It is not at all an accident that my grocery store sends me emails with digital coupons for products I purchase or competing brands.

What is disturbing are the inferences that might be made from sites we could visit out of mere curiosity. For example, what inferences might be drawn from visits to sites referencing high cholesterol or diabetes? My wife, who is a web newbie, noticed ads showing up in email after visiting a few sites she was casually curious about, even though she had not subscribed to the emails. Of course, all our online activity is logged, whether it is social media, web searches or purchases from online vendors. Supposedly, this information cannot be used to determine insurance rates, to make job offers, or determine credit worthiness. Given other abuses of “big data” I’m not reassured.

Funny and scary at the same time is the fact that these brokers segment us into categories based on all this information, such as “Bible Lifestyle” or “Affluent Baby Boomer”. Some are even less complimentary, such as “Rural Everlasting” which describes older people of “low educational attainment and low net worths.” What drives much of this is the effort to tailor marketing to our interests, whether it is those book recommendations on Amazon, or the ads on Facebook, or even the content we see on our newsfeeds.

Most of what the data brokers know about us is not known to us. How many of you have ever heard of these companies: Acxiom, CoreLogic, Datalogix, eBureau, ID Analytics, Intelius, PeekYou, Rapleaf and Recorded Future? The article discusses the lack of transparency in this industry, which does not deal directly with the public, but obtains its information from third parties and from each other and markets it to various vendors.

At the very least, it seems utterly reasonable that we have access to these profiles, just as we do to credit records, our own health records, personnel records and more. It would also seem proper, although this could be complicated, to know who else has had access to these records. Who knows what this profile says about us? And it seems that there should be established protocols to amend erroneous information that could have a harmful impact upon us. So far, however, the FTC and Congress seem unwilling to afford these opportunities to us or consider any further regulation of this industry.

What all this means is very simple. For most of us, there is very little about our lives that is private: our online activity, many of our purchases via online or physical stores, our medical records, our employment, salary, and housing, our hobbies, beliefs and relationships. Many of us actually accept this or even value the convenience of advertising tailored to our interests, which may be why there is so little stink about the NSA revelations. The act of writing this blog makes a public record of my thoughts and whatever activities, family history, or anything else I post. There is probably no way to completely avoid this, although going off the net and paying only in cash would greatly reduce the “data points”.

The Latin phrase, Coram Deo, means “in the presence of God.” It reflects the idea of being consciously aware that one’s life is lived every moment under the watchful care of God and for God’s glory. Perhaps the greatest source of freedom in the midst of our highly surveilled lives comes from living unashamedly before God. Perhaps Coram Deo might be our greatest protection from Coram Data.