Is it possible to use old data gathered over the years to glean new insights that provide a game-changing competitive advantage once thought impossible?

To find out, look no further than this year’s Chicago Cubs, who currently enjoy the best record in baseball, making them favorites to win their first World Series since 1908. Such an outcome would certainly come as a shock given how many fans believe the Cubbies are baseball’s most cursed franchise. But then again, many felt the same about the Boston Red Sox, who went 86 years without a World Series title before finally popping champagne in 2004.

What’s interesting about these two clubs is what they have in common – current Cubs’ president Theo Epstein, who was the general manager of those Red Sox championship teams.

“How could one man turn around the fortunes of baseball’s two most famously futile franchises?
In a word, data.”

As one of the earliest and most successful adopters of baseball’s new statistics, developed by a slew of data scientists known as sabermetricians (a name derived from SABR, the Society for American Baseball Research), Epstein was able to capitalize on insights showing that traditional metrics like batting average, home runs and RBI were not the best predictors of success as measured by wins and losses. Using data gathered over more than a century, augmented with new data not previously captured, the data scientists were able to define new measures that translated directly into wins and losses. Those clubs that adopted the new metrics, like the Oakland A’s and the aforementioned Cubs and Red Sox, benefited almost immediately.

So, how did baseball finally see the light after more than a century? Through three simple steps for working with data that are available to virtually every business: collect, process and analyze.


Baseball has been collecting data as long as there’s been baseball. Scorebooks became box scores in local papers, which in turn became team and player stats. Standings told the story of wins, losses and championships. These were baseball’s equivalents of point-of-sale systems, ledger books and annual reports.

And for decades that data sat there, siloed in newspapers, yearbooks and team offices until a group of geeky fans decided to apply their PhD educations in statistics to their love of baseball.


They began by aggregating the reams of data available into a central location. This involved painstaking review of box scores and newspaper accounts to generate a granular history of the game. No detail was too small, since it might hold the key to the Holy Grail – determining what best translates into wins and losses. Once the historical data was finalized, they kept it fresh with real-time updates of new data.


Then came the heavy lifting. It wasn’t enough to simply report numbers – they had to translate into wins and losses. So these data scientists and statisticians created algorithms to simulate games using historical data. Eventually, their algorithms were able to roughly predict game outcomes based upon the inputs of hits, walks, outs and other stats available to them. But they determined it wasn’t enough. They needed additional data. So they began capturing how and where balls were hit (spray charts) and other data points to provide a better predictive picture.
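To make the idea of turning raw stats into expected wins concrete, here is a minimal sketch of one well-known sabermetric formula, the Pythagorean expectation, which estimates a team's winning percentage from runs scored and runs allowed. It is offered only as an illustration and is not necessarily what the analysts described above used. The function names, the exponent of 2 and the run totals in the example are assumptions chosen for illustration, not real team data.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Estimate winning percentage from runs scored and runs allowed.

    Uses the classic form RS^x / (RS^x + RA^x). The default exponent
    of 2 is the traditional choice; other values are sometimes used.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        # No information either way, so treat the team as average.
        return 0.5
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float,
                  games: int = 162, exponent: float = 2.0) -> float:
    """Scale the estimated winning percentage to a season of `games`."""
    return pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games


if __name__ == "__main__":
    # Hypothetical season totals, made up for this example.
    pct = pythagorean_win_pct(800, 600)
    print(f"Estimated winning percentage: {pct:.3f}")
    print(f"Expected wins over 162 games: {expected_wins(800, 600):.1f}")
```

The business parallel is the same: pick the handful of inputs that actually drive the outcome you care about, then check how well a simple model built on them predicts that outcome before adding more data.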


This data sat largely unused for nearly two decades. For all their insight, all their passion for the game, these baseball geeks could not impact what was taking place on the field.

Enter Theo Epstein. Following on the success of Billy Beane’s Oakland Athletics, as described in the book Moneyball, Epstein put the lessons learned into practice. And the rest, as they say, is history.

The Author

Paul Sydlowski is a self-confessed data junkie. He is also an experienced leader with a career in big business (Procter & Gamble, Abbott Laboratories, Conoco/DuPont) and in the SMB trenches. He has a record of maximizing performance by identifying, developing and delivering solutions to operating or cultural challenges. He is passionate about organizational culture, customer service, analytics, and people development as the foundation for finding the best way to deliver business value.

Paul is the Director of Business Development for INNERJOIN Technologies. Innerjoin is a team of data scientists who assist with the collection, processing and analysis needed to transform siloed data into actionable insights. If data is not available, Innerjoin can begin the collection process through CRM implementations in conjunction with POS systems, web portals and mobile apps. Automated data hygiene, append and warehousing will put that data in one secure, cohesive location. Our data experts put algorithms and machine learning to work, just as sabermetricians did for baseball, in order to develop predictive models that provide game-changing insights, allowing strategies to be focused with laser-like precision.