The Moneyball of Hiring: What Can Baseball Teach Us About Fair Recruitment?

CJ Choquer
Finding Needles in Haystacks
11 min read · Sep 22, 2021


The business world loves a good sports analogy. And if you want to perform like a leading sports team (or build a corporate culture like one), you should be recruiting like one. So, why don’t we? Because hiring is hard, undervalued in many companies, and worse, we think we’re already doing it right. But we can do better. You can hire like the Red Sox, and Applied can help.

Applied is the Moneyball of hiring. We use behavioural science to assess candidates, which saves companies time (and therefore money), surfaces candidates who are too frequently overlooked (because of their name, their face, their education, or their experience), and helps hiring managers mitigate their biases. In this article, I’m going to talk through why and how Applied is shaking up the hiring sector in the same way that Billy Beane shook up baseball at the Oakland A’s.

👇 If you’re not interested in baseball, I’ve posted a few resources at the bottom for how to hire and source candidates without bias.

Moneyball is a book written by Michael Lewis (also famous for Flash Boys, The Big Short, and The Undoing Project) about how the Oakland Athletics used statistics to recruit baseball players with a better ROI (return on investment), setting an American League record winning streak along the way. Their run changed professional baseball forever.

Applied is a hiring tool that candidates love. It debiases the process, saves time, and helps companies hire more ethically and efficiently. It uses behavioural and scenario-based questions to help companies assess candidates. Candidates’ applications are masked so that hiring managers aren’t influenced by their gender, the colour of their skin, or whether they went to an Ivy League school.

*Warning: this blog is FULL of Brad Pitt memes.

What is noisy data?

In order to fully explain how baseball changed and how hiring is changing, we need to look at noisiness in data. In science and engineering, a signal is a function that carries information to or about something. When you flick a switch in your house, it sends a signal to the light to turn on (or off). A signal in data is how useful, accurate, or predictive the data is, whereas noise is the inaccurate, distracting, or irrelevant data. For example, on a Reddit message board, there will be messages related to the topic of discussion and a lot more that are spam, trolls, or divergent from the topic (noisy messages). In any dataset, there are going to be some misleading points that distract you from finding an accurate answer. At Applied, we see CVs and resumes as being full of “noise” — lots of signifiers that distract from what the candidate is really trying to signal: that they have the skills for the job.
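To make the signal/noise idea concrete, here is a toy sketch in Python (my own illustration, not from the book or from Applied): a candidate’s true skill is the signal, each individual observation of them is noisy, and one reading alone can easily mislead.

```python
import random
import statistics

# Toy illustration: each candidate has a true skill level (the signal), but any
# single observation of them is that skill plus a random distortion (the noise).
random.seed(1)

true_skill = 7.0                                                  # the signal we care about
readings = [true_skill + random.gauss(0, 2) for _ in range(10)]   # ten noisy observations

print("one noisy reading:", round(readings[0], 1))                       # can be far from 7
print("average of ten readings:", round(statistics.mean(readings), 1))   # much closer to 7
```

Averaging several structured readings gets much closer to the underlying skill, which is roughly what a multi-question, multi-reviewer assessment is trying to do.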

In Moneyball, this is seen when the scouts are talking about the players’ stats and whether or not they can “get on base.” Brad Pitt’s character tries to make it clear that the thing that matters in winning games the next year is getting players on base. All the rest is noise — “throwing funny” isn’t a good reason not to recruit someone.

Nate Silver (the famous statistician who founded FiveThirtyEight) wrote The Signal and the Noise, a book about the art and science of prediction. Silver also created a statistical model called PECOTA that ranks baseball players based on a wider range of historical baseball stats. He wasn’t the Jonah Hill character in Moneyball, but he does cover it in a chapter about the predictability of recruiting baseball players. I bring up this book because it will help tease out this long sports analogy and help explain why Applied is the best way to assess candidates.

Your process is bad, and you should feel bad

What do you look for and how do you assess it?

In Silver’s book, he describes the data collected by baseball scouts as being both quantitative (numbers) and qualitative (first-hand observation). Like many other statheads, economists, and scientists out there, Silver concludes that it has to be a bit of both in order to give good predictions. Meaning, baseball teams can’t just rely on a statistical model like PECOTA; they also need scouts and recruiters to go out into the field and make observations about players to reach a more rounded decision.

Baseball is an incredibly controlled environment: there are only 10 positions to play at one time, there are 9 innings, and there are specific rules to follow. This is wildly different to corporate structures, where the rules change depending on a company’s revenue, number of employees, what industry it’s in, or what countries it operates in. Not only that, there are tens if not hundreds of different roles and jobs, at all sorts of levels of seniority. If a game that controlled hasn’t figured out how to measure potential or create an exact predictive model for successful players, how could we expect every HR or People team to do the same for employees?

In baseball, there are stats that can be tracked easily and aspects that only scouts can pick up on. We’re going to look at both through the examples of eligibility, sift, and interview questions.

In the book, Silver maps out the intellectual and psychological behaviours that separate good baseball players from great baseball players (if you use Applied, this is what we call “spread” in the app, which I’ll come back to). These skills are qualitative, which means there isn’t an objective number you can allocate to a player — i.e. it’s not quantitative like a stat in baseball or, in professional careers, a CFA certification, an MD, or a CSCS card.

Silver breaks these qualities down into five groups. Here is how they look in Applied:

Applied Required Skills Tagging

Eligibility — PECOTA

Let’s start with the easy bit — stats. They are straightforward, and there really isn’t a right or wrong answer, but the scouts will want to know them. These can be input into the eligibility section. Our app is definitely not built for all the stats that go into PECOTA, and that is of course why PECOTA exists: to rank players based on those numerous stats. To keep in line with what the league has been tracking since its beginnings, we could include runs, hits, putouts, assists, and errors. However, as the book states, these aren’t leading indicators of whether a player will be good, so to keep things simple for the purposes of using Applied, we’ll stick to one — their PECOTA ranking.

Applied Eligibility View

Even though PECOTA takes in stats that could be considered “CV-like,” Silver’s main approach is correct in that past performance in certain areas does not predict future performance. “His findings are counterintuitive to most fans. ‘When you try to predict future E.R.A. (Earned Run Averages) with past E.R.A.’s, you’re making a mistake,’ Silver said. Silver found that the most predictive statistics, by a considerable margin, are a pitcher’s strikeout rate and walk rate. Home runs allowed, lefty-righty breakdowns and other data tell less about a pitcher’s future.” These sorts of models aren’t perfect and they’re not always right, which is why they shouldn’t be the be-all and end-all for making decisions.

What this illustrates is that data can’t give us everything. All models are wrong, but some of them are useful. This is because in order to have a good dataset, you need to categorise. And when you categorise, you need to simplify, which often strips out much-needed nuance. When we simplify, we make shortcuts. And when we have shortcuts, we often have bias. It’s why we built the Sift — while you can’t eliminate bias, you can work around it by structuring your recruitment process and creating interventions for these shortcuts.

The Sift — Scouts’ Observations

Now we have the Sift. The Applied Sift is our way of assessing applications by masking the identity of the candidate and reorganising their answers to mitigate hiring managers’ biases, such as the rank-order effect (among many others). The goal of sift questions is to get candidates to think as though they already have the job and are dropped into a situation where they have to react, plan, solve, decide, prioritise, etc. One of the examples Silver talks about in his book is preparedness and work ethic. A typical sift question in Applied for a baseball player (or for a scout to observe) would be “What is your pre-game routine?” or, even better, “Your pre-game routine has been interrupted by x. This means you have to change your routine and plan a different approach. What are the first steps you take? Who do you need to talk to, and is there anything you need to support you?”
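For the mechanics, here is a minimal sketch (not Applied’s actual implementation; the field names and shuffling rule are illustrative assumptions) of what masking identities and batching answers question by question might look like:

```python
import random

# Hypothetical sift-style review (illustrative field names, not Applied's schema):
# identifying fields are masked, and answers are grouped question by question and
# shuffled, so reviewers score each answer on its own rather than reading one
# candidate's application from top to bottom.
IDENTIFYING_FIELDS = {"name", "email", "school"}

def mask(application: dict) -> dict:
    """Drop identifying fields, keeping only the content to be scored."""
    return {k: v for k, v in application.items() if k not in IDENTIFYING_FIELDS}

def batch_by_question(applications: list[dict], questions: list[str]) -> dict:
    """Group answers by question and shuffle each batch to dampen ordering effects."""
    batches = {}
    for q in questions:
        answers = [(i, mask(app)[q]) for i, app in enumerate(applications)]
        random.shuffle(answers)
        batches[q] = answers
    return batches

applications = [
    {"name": "A", "school": "X", "q1": "I'd re-plan the routine...", "q2": "..."},
    {"name": "B", "school": "Y", "q1": "First I'd talk to the coach...", "q2": "..."},
]
print(batch_by_question(applications, ["q1", "q2"]))
```

Reviewing one question at a time, in a shuffled order, is the kind of intervention that blunts the rank-order effect mentioned above.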

After interviewing many scouts, Silver concluded that there were five characteristics scouts looked for. It’s rather convenient, too, that he groups these under five headings, as noted in the spider graph above:

  • competitiveness and self-confidence
  • preparedness and work ethic
  • concentration and focus
  • adaptiveness and learning ability
  • stress management and humility
Applied Feedback Graph

It’s arguable that these are the five qualities any hiring manager would be looking for, regardless of whether the person is playing baseball, selling your product, or building it.

But the thing is, these players aren’t going to be filling out or answering sift questions — they’re going to be playing. So why do the sift? Scouts take notes and make observations, and will most likely have some form they report back to their team so it can make an informed decision about who to recruit. Not only that, before they set out to make their observations, they’re probably given a set of criteria (a rubric or scorecard) to make those notes useful to the rest of the scouts. So the scout would be the person writing their observations into the sift questions, against the specific criteria set out in the review guide (also known as a rubric or scorecard). Then, once completed, the rest of the team would act as reviewers and score those observations against the same guide. It would also be an interesting test to compare scouts’ notes based on the rubric/review guide.
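To illustrate that workflow, here is a hypothetical sketch of turning scouts’ rubric scores into a team decision; the criteria and numbers are made up:

```python
# Hypothetical rubric aggregation (criteria and scores are made up): each reviewer
# scores every player against the same criteria, and the shortlist is based on the
# averaged totals rather than on gut feel.
criteria = ["preparedness", "focus", "adaptiveness"]

reviewer_scores = {
    "reviewer_1": {"player_A": {"preparedness": 4, "focus": 3, "adaptiveness": 5},
                   "player_B": {"preparedness": 2, "focus": 4, "adaptiveness": 3}},
    "reviewer_2": {"player_A": {"preparedness": 5, "focus": 3, "adaptiveness": 4},
                   "player_B": {"preparedness": 3, "focus": 4, "adaptiveness": 2}},
}

totals = {}  # player -> list of per-reviewer rubric totals
for scores in reviewer_scores.values():
    for player, marks in scores.items():
        totals.setdefault(player, []).append(sum(marks[c] for c in criteria))

ranking = sorted(totals, key=lambda p: sum(totals[p]) / len(totals[p]), reverse=True)
print(ranking)  # players ordered by average rubric total
```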

How can we know if these are the right questions or observations to be making?

At Applied, we use three metrics to assess whether the questions in our library are good questions. These are:

  • Maturity = how frequently a question is used — the higher the score the better, as it means the statistics behind the question are robust
  • Agreement = whether reviewers agree on which answers are good, which indicates the question is well designed
  • Spread = whether the question separates the field of applicants
Applied Library Question
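To make those three metrics a little more concrete, here is a toy calculation; the formulas are my own simplifications, not Applied’s actual definitions:

```python
import statistics

# Toy versions of the three metrics (my own simplifications, not Applied's formulas).
# scores[candidate] = the scores different reviewers gave that candidate's answer (1-5).
scores = {
    "candidate_1": [4, 4, 5],
    "candidate_2": [2, 3, 2],
    "candidate_3": [3, 5, 1],   # reviewers disagree here
}

# Maturity: simply how many times the question has been scored so far.
maturity = sum(len(s) for s in scores.values())

# Agreement: how closely reviewers score the same answer, normalised to 0-1
# (2 is the largest possible population standard deviation on a 1-5 scale).
agreement = 1 - statistics.mean(statistics.pstdev(s) for s in scores.values()) / 2

# Spread: how far apart candidates' average scores are, i.e. whether the
# question actually separates the field.
spread = statistics.pstdev(statistics.mean(s) for s in scores.values())

print(f"maturity={maturity}, agreement={agreement:.2f}, spread={spread:.2f}")
```

Roughly speaking: high agreement means reviewers converge on the same rating for the same answer, and high spread means candidates’ averages end up far apart, so the question is doing the work of separating them.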

Coming back to the concept of spread: a lower score for the spread metric means the question may not separate good applicants from great applicants. I think even with a library full of questions with high spread ratings, hiring managers would still get nervous about bringing fewer people in for interviews and would still want to interview applicants rather than rely on sift questions alone. So why are interviews so important in hiring?

Interviewing — Do they Fit This Team?

Interviews are pretty good predictors of whether or not someone will be a good candidate for the role. In Schmidt & Hunter’s study, structured interviews are third from the top in terms of predictive validity. That being said, I can’t imagine anyone sitting down and doing hundreds of interviews in one go with every person who applied. It’s why you need some sort of sift to make a shortlist, save time, and surface the best candidates.

But why even do an interview if you have the stats (eligibility) and observations (sift assessment)? Because those alone don’t tell you how well someone will do. You have to factor in who else is on your team and what skills, knowledge, or attributes might be missing. You also can’t ask every single question upfront — it’s time-consuming for both candidates and reviewers.

Collaboration, motivation, and values are all factors in building a really great team. You’re not evaluating this person in isolation; they have to work with a group of people you already employ (or play with). Could this person have an approach that no one else on the team has? Once you know they have the essential skills to do the job (through sift questions), interviews are a great follow-up for further observation, skills testing, and values fit.

Hiring’s not that hard, right?

How can you not be romantic about baseball?

At the end of the film, Jonah Hill’s character shows Brad Pitt a clip of an unlikely player hitting a home run: “Jeremy’s about to realise that the ball’s gone 60 feet over the fence. He hit a home run and didn’t even realise it.”

Getting the chance to step up to bat is one thing; hitting one out of the park when you didn’t think you could is another. Many candidates who’ve gone through the Applied platform know this exact feeling — being the underdog, the overlooked, or the misfit — and getting a fair chance not just to be considered, but to hit a home run. So many people reach out to share their stories and their joy at applying through the platform, whether they got the job or not.

Regardless of whether they got the job, Applied made them feel like they hit a home run.

Resources

The Two Friends Who Changed How We Think About How We Think by Michael Lewis, New Yorker

What Works by Iris Bohnet

Moneyball: The Art of Winning an Unfair Game by Michael Lewis

The Signal and the Noise by Nate Silver

Cognitive Bias Cheat Sheet

Recency Bias

Attribution Bias

Group Think

The Halo & Horn Effect

Contrast Effect

Confirmation Bias

A/B Testing with Applied

Improving Diversity

But is it faster?
