Calorie Counting or Wavelength? A Knowledge Framework
August 9, 2025
Takeaway — We can categorize surveys of answers to falsifiable questions into calorie counting problems and Wavelength problems. A calorie counting problem’s answer is best approximated by roughly averaging the survey responses. A Wavelength problem’s answer, by contrast, is best approximated by a single respondent’s reply.
Modern connectivity brings with it an ever-increasing deluge of information. Somewhat paradoxically, this trend makes finding signal and deriving sense from it increasingly difficult. Improving our epistemological understanding of how to handle existing and form new knowledge is paramount.
I offer a small contribution to the larger conversation: how to approximate the truth of a falsifiable question when all you have is a survey of answers and no better heuristic avails.
I want to call two things out. First, this model is unhelpful at best for non-falsifiable questions, like value judgments or policy decisions. And second, this is a fallback model. If you have literally any better way to assess what knowledge to trust (gut doesn’t count), use that: for example, if you can identify someone as particularly expert, if there is a preexisting good process like peer review, or even if you can find Aumann agreement. See tailcalled, Aumann-Agreement Is Common, LessWrong (2023).
Hopefully I’m not reinventing the wheel here. I’d love links to relevant work, if you have them. And I’d love to hear your thoughts more generally.
Restaurants without nutrition facts present difficulties to calorie counters. The diet-conscious must make estimates (and so, concessions) to eat out. I researched this issue in a college entrepreneurship course with a team of CS PhDs. (Mostly their efforts: I was admittedly along for the ride.)
The solution was surprising. We found that we could reasonably estimate the calories in a photographed dish by (1) collecting 10-15 estimates from MTurk, (2) shaving outliers, and (3) averaging the remaining values (with a little special sauce). The result fell within 5-10% of the actual count. This held even though no individual estimate was that strong (MTurk respondents had no extrinsic motive to be correct, and every motive to be fast).
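The shave-then-average mechanics are simple enough to sketch in Python. Everything here is illustrative: the function name, the trim fraction, and the sample guesses are mine, not our actual parameters or data, and the special sauce is omitted.

```python
def crowd_calorie_estimate(estimates, trim_fraction=0.2):
    """Rough crowd estimate: drop the most extreme answers, then average the rest.

    `estimates` is a list of individual calorie guesses (e.g., 10-15 MTurk replies).
    `trim_fraction` is the share shaved from each tail; 0.2 is an illustrative value.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    ordered = sorted(estimates)
    k = int(len(ordered) * trim_fraction)          # how many to shave from each tail
    kept = ordered[k:len(ordered) - k] or ordered  # keep everything if trimming would empty the list
    return sum(kept) / len(kept)

# Example: 12 noisy guesses for a dish that is actually ~650 kcal.
guesses = [400, 550, 600, 610, 620, 640, 660, 680, 700, 720, 900, 1500]
print(round(crowd_calorie_estimate(guesses)))  # lands near the true count despite the outliers
```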
“Calorie counting” represents knowledge that is susceptible to mass discernment. If everyone shares their thoughts, the group can muddle their way into something roughly correct. Francis Galton also observed this in 1907 after holding a competition to guess an ox’s weight. When Galton averaged the 787 entries, he found the crowd had, as a whole, guessed to within a pound of the ox’s actual weight. See Kenneth F. Wallis, Revisiting Francis Galton’s Forecasting Competition, 29 Stat. Sci. 420 (2014).
Calorie counting appears in other fields, too. Combining two separate forecasts of airline passenger traffic yields more accurate predictions than either forecast alone. John Bates & Clive Granger, The Combination of Forecasts, 20 Operational Rsch. Q. 451 (1969). The corporate case for diversity is that it improves strategy and decisionmaking. See, e.g., McKinsey & Company, Diversity Matters Even More: The Case for Holistic Impact (2023). Prediction markets emerged from a similar intuition. And the epistemic concept of group knowledge is distinct but related. See Joshua Habgood-Coote, Group Knowledge, Questions, and the Division of Epistemic Labour, 6 Ergo 33 (2019) (describing that, for example, NASA collectively can build a space shuttle, even if no individual employee could on their own).
Wavelength is a cooperative guessing game. Each round, a clue-giver receives a spectrum. The spectrum’s extremes are described by adjectives: e.g., “round” vs. “pointy”. And the spectrum has a bullseye target region: guesses closer to its center award more points.
The clue-giver crafts a prompt to help the group guess within the target region. A clue-giver might provide “curled-up hedgehog” for a target region whose center is right between “round” and “pointy,” reasoning that it is both round and pointy.
In my experience, players rapidly discover that compromise is a losing strategy. Often, one or more players have an accurate intuition about the clue-giver’s thought process. Thoughtlessly deviating from that intuition to split the difference shifts the group’s guess away from the bullseye’s center in expectation. The optimal process is to determine whose reasoning is most correct, align on that answer, and then make only small adjustments where logical.
“Wavelength problems” require engagement with each respondent’s rationale. Common heuristics fail us. Confidence is insufficient: the most correct player may not know they are the most correct. Empirics fall short too: the most correct player often changes from round to round. And experience falters as well: round-specific insights tend to outweigh general game skill.
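To make the contrast with calorie counting concrete, here is a small, purely illustrative simulation; it is my own toy model, not anything from the game or the calorie study. In the “calorie” regime every respondent is independently noisy around the truth, so averaging wins. In the “wavelength” regime most respondents share a common bias and only one respondent is on the right track, so following that respondent wins, provided you can identify them.

```python
import random

def simulate(regime, trials=10_000, n=12, seed=0):
    """Compare 'average everyone' vs. 'follow the best-placed respondent' in two toy regimes.

    Assumptions (mine, for illustration only):
      - calorie regime: every answer = truth + independent noise.
      - wavelength regime: one respondent is near the truth; the rest share a common bias.
    Returns the mean absolute error of each strategy.
    """
    rng = random.Random(seed)
    avg_err = best_err = 0.0
    for _ in range(trials):
        truth = rng.uniform(0, 100)
        if regime == "calorie":
            answers = [truth + rng.gauss(0, 15) for _ in range(n)]
            insider = 0  # no one is special; arbitrarily follow respondent 0
        else:  # "wavelength"
            bias = rng.gauss(0, 25)  # shared misreading of the clue
            answers = [truth + bias + rng.gauss(0, 5) for _ in range(n)]
            insider = rng.randrange(n)
            answers[insider] = truth + rng.gauss(0, 5)  # the one player on the giver's wavelength
        avg_err += abs(sum(answers) / n - truth)
        best_err += abs(answers[insider] - truth)
    return avg_err / trials, best_err / trials

for regime in ("calorie", "wavelength"):
    avg_err, best_err = simulate(regime)
    print(f"{regime:10s}  averaging: {avg_err:5.1f}   follow the insider: {best_err:5.1f}")
```

The catch, of course, is the game’s actual difficulty: knowing which respondent is the insider, which is exactly why the heuristics above matter.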
“Disagree and commit” is the best-known real-world example of this philosophy. Greg Ballard, Agree and Commit, Disagree and Commit, Stanford eCorner (2006). Lesser known is Stanislav Petrov’s world-saving Cold War decision. Compare Anton Barba-Kay, There Is No Ethical Automation: Stanislav Petrov’s Ordeal by Protocol, 23 J. Mil. Ethics 277 (2024), with Paul Scharre, Army of None: Autonomous Weapons and the Future of War (2018) (describing how soldiers’ misplaced trust in automation led to the shooting down of allied aircraft).
Translating theory to practice is often difficult. People have biases, ulterior motives, and cognitive barriers. But I think these models handle bad answers reasonably gracefully.
The calorie counting approach removes the worst answers in step 2 by shaving outliers. Then step 3 dilutes any bad answers that escaped filtering, given reasonable n.
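As a quick illustration with made-up numbers (reusing the shave-then-average idea rather than our actual pipeline): a single wild answer drags a plain mean far off, while shaving the tails before averaging barely moves the result.

```python
from statistics import mean

honest = [600, 610, 620, 640, 660, 680, 700, 720]   # plausible guesses for a ~650 kcal dish
polluted = sorted(honest + [5000])                   # one wildly bad answer slips into the survey

print(round(mean(polluted)))         # plain mean: dragged to ~1137 by the single outlier
print(round(mean(polluted[1:-1])))   # shave one answer from each tail first: stays near ~661
```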
The Wavelength approach is resilient, too. A single deviant answer probably won’t affect the outcome because it is unlikely to be chosen. Its deviance hopefully even reduces its selection chances.
That said, the models don’t handle unknown systemic problems well. But I don’t think they are uniquely bad at handling population issues relative to other approaches, either.
The real theory-to-practice difficulty lies in categorizing questions as calorie counting or Wavelength. Hindsight makes this determination easy. By observing a past survey and its outcome, we can back into the correct methodology. That class of problem is then categorized for subsequent encounters.
But often, that luxury is absent. And some situations are distinct enough to render past experiences unhelpful. I wish I had more to say here, but I really don’t have a strong sense of how to confidently categorize novel problems. I’d love to hear what you think novel categorization involves. :)