The Hidden Choices Behind Geographic Data

Data Literacy

Geographic data, or information organized by location, may look precise, but it’s shaped by choices. How boundaries are drawn, how data is grouped, and how small numbers are handled can all change the story you see.

Geographic data, or information tied to a specific place, often appears clean and straightforward: ZIP code, census tract (a small area used to collect population data), town, county, state, country. Nice, neat boxes on a map. But those boxes do a lot of heavy lifting. They help shape real-world decisions, like where hospitals and schools are built and how funding and resources are distributed. In public health, they’re used to track disease outbreaks and identify communities that may need more support.

So, looking at data by place seems like a great way to make sense of what’s happening, right?

It sounds simple enough. But “place” is far more complicated than it seems.

For starters, geographic units weren’t designed with data analysis in mind… at all. ZIP codes, for example, were created by the United States Postal Service to deliver mail, not to define communities. Which honestly makes sense when you consider that Santa Claus and Smokey Bear both have their own ZIP codes, and somewhere in Michigan, there’s a ZIP code that is literally just three boats floating around (“How the ZIP code organized America, and has it gone too far?”, Planet Money, NPR). Not exactly your standard “population unit” (basically a group of people you can consistently count and compare in a meaningful way).

Census tracts, defined by the U.S. Census Bureau, are more data-friendly, but they don’t always reflect how people actually think about their neighborhoods. And then there are towns and counties, which are shaped by history, politics, and administrative decisions that often have little to do with how people live, access care, or experience health today (2024 County Government Primer).

So, while maps may look precise, the boundaries on them don’t always match real life. Analysts are often working with imperfect, and at times absurd, ways of grouping people into what we call “community.”

Data availability adds another wrinkle. In theory, the smaller the geography, the more detailed the insight. In practice, smaller areas often come with smaller numbers, and that’s where things get tricky. A small town might only have a handful of events (like deaths) in a given year, which means rates can swing wildly from one year to the next. It can look like something alarming is happening when really, it’s just math being a little dramatic.
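To make the "math being dramatic" point concrete, here is a minimal sketch in Python. The town, the county, and every count in it are invented for illustration; nothing here comes from a real dataset. It computes a crude rate per 100,000 people for each year and compares how far the rates swing.

```python
def rate_per_100k(events, population):
    """Crude rate: events per 100,000 people."""
    return events * 100_000 / population


# Hypothetical yearly death counts, invented for illustration only.
small_town = {"population": 2_000, "deaths": [1, 3, 0, 2]}
large_county = {"population": 200_000, "deaths": [190, 205, 198, 210]}


def yearly_rates(area):
    """One crude rate per year for the given area."""
    return [rate_per_100k(d, area["population"]) for d in area["deaths"]]


town_rates = yearly_rates(small_town)
county_rates = yearly_rates(large_county)

# The spread between the highest and lowest yearly rate in each area.
town_spread = max(town_rates) - min(town_rates)
county_spread = max(county_rates) - min(county_rates)

print("Town rates per 100,000:", town_rates)
print("County rates per 100,000:", county_rates)
print("Town spread:", town_spread, "| County spread:", county_spread)
```

In this made-up example, a difference of one or two deaths moves the town's rate from 0 to 150 per 100,000, while the county's rate stays between 95 and 105. Nothing changed in the town except a few events.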

And sometimes, the population itself isn’t even stable. Take Nantucket, for example: an island off the coast of Massachusetts with a year-round population of roughly 14,000–15,000 that can swell to 50,000 or even 80,000 in the summer months (Nantucket Current, “New Data Shows Explosive Growth of Nantucket’s Summer Population”). So what exactly is your denominator? Better put, what total population are you using to calculate your rate, and who’s included in it? Are you counting only year-round residents, or everyone on the island at a given time?

The answer, of course, depends on what you’re trying to measure. If you’re studying long-term health outcomes, you might focus on year-round residents. But if you’re looking at things like emergency services or seasonal risk, the summer population may be more relevant. Same place, yet very different answers depending on who you decide “counts.”
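A rough sketch of how much the denominator matters, using the approximate Nantucket population figures mentioned above. The event count (60 emergency calls in July) is a made-up number chosen only to show the arithmetic.

```python
def rate_per_1k(events, population):
    """Events per 1,000 people."""
    return events * 1_000 / population


# Hypothetical event count, invented for illustration.
july_calls = 60

# Approximate population figures taken from the text above.
denominators = {
    "year-round residents": 14_000,
    "summer population (low estimate)": 50_000,
    "summer population (high estimate)": 80_000,
}

# Same numerator, three different denominators.
rates = {
    label: rate_per_1k(july_calls, population)
    for label, population in denominators.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} calls per 1,000 people")
```

The same 60 calls look more than five times as frequent when divided by year-round residents as when divided by the high summer estimate. Neither number is wrong; they answer different questions.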

On the flip side, zooming out to the county or state level smooths things out nicely, but at a cost. That stability can hide important local differences, like pockets of higher risk that get averaged away. In some cities, neighboring ZIP codes just a few miles apart can have life expectancy differences of 10, 15, or even 20 years (Mapping Inequality; “How Neighborhoods Shape Health and Opportunity,” Community Health and Economic Prosperity, NCBI Bookshelf). When those areas are combined into a single town, county, or state average, those stark disparities disappear, replaced by a number that looks “normal” but tells only part of the story. It’s the classic tradeoff: do you want detail, or do you want stability? (Spoiler: you rarely get both.)
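Here is a small sketch of how a gap gets "averaged away." The two ZIP codes and their numbers are hypothetical. The population-weighted average is also a simplification: real life expectancy estimates come from life tables, not from averaging smaller areas.

```python
# Hypothetical neighboring ZIP codes; names and numbers are invented.
zip_areas = [
    {"name": "ZIP A", "population": 20_000, "life_expectancy": 70.0},
    {"name": "ZIP B", "population": 30_000, "life_expectancy": 85.0},
]


def weighted_average(areas, field):
    """Population-weighted average of one field across areas."""
    total_population = sum(a["population"] for a in areas)
    weighted_sum = sum(a[field] * a["population"] for a in areas)
    return weighted_sum / total_population


combined = weighted_average(zip_areas, "life_expectancy")

# The gap that the combined figure no longer shows.
values = [a["life_expectancy"] for a in zip_areas]
gap = max(values) - min(values)

print(f"Combined average: {combined:.1f} years")
print(f"Gap between ZIP codes: {gap:.1f} years")
```

The combined figure of 79 years sits comfortably in the middle and gives no hint that one of the two areas is 15 years behind the other.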

⚖️ EQUITY ALERT: Those differences aren’t random. They often come from long-standing unfair systems, like redlining, housing segregation, and unequal access to resources, that shape where people live and what they have access to. These same systems also influence how boundaries are drawn in the first place, often reflecting patterns of race and income.

Not everyone who lives near each other shares the same resources or experiences. Even something as simple as distance can mean very different things. For example, living two miles from a grocery store might be manageable for someone with a car or in a place with good public transportation, but much harder in a community without those options (Mapping Inequality). When we average data across larger areas, we risk smoothing over not just differences, but inequities.

And even if you’ve made peace with your geographic unit and your sample size, there’s still the small matter of actually working with the data. Geographic identifiers are notoriously inconsistent. The same town might appear as “St. Louis,” “Saint Louis,” or “St Louis” depending on the dataset. Even something as simple as Foxborough vs. Foxboro in Massachusetts can turn into a surprisingly passionate debate, especially among people who grew up there and are convinced their version is the only correct one. Directions get abbreviated, punctuation comes and goes, and suddenly your clean merge turns into a detective story.
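One common way to keep the detective story short is to normalize place names before matching datasets. The sketch below is one possible approach, not a standard method: the function name, the alias table, and the sample numbers are all hypothetical, and the "st" to "saint" rule only makes sense for town names (in street addresses, "St" usually means "Street").

```python
import re

# Hypothetical alias table mapping known variants to one spelling.
ALIASES = {"foxboro": "foxborough"}


def normalize_place_name(name):
    """Lowercase, drop punctuation, expand 'st', and apply aliases."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())  # remove punctuation
    cleaned = re.sub(r"\s+", " ", cleaned).strip()  # collapse whitespace
    words = ["saint" if w == "st" else w for w in cleaned.split(" ")]
    cleaned = " ".join(words)
    return ALIASES.get(cleaned, cleaned)


# Two hypothetical datasets that spell the same towns differently.
populations = {"St. Louis": 300_000, "Foxborough": 18_000}
clinics = {"Saint Louis": 12, "Foxboro": 1}

# Re-key one dataset by normalized name, then match the other against it.
population_by_key = {
    normalize_place_name(name): value for name, value in populations.items()
}

merged = {}
for raw_name, clinic_count in clinics.items():
    key = normalize_place_name(raw_name)
    if key in population_by_key:
        merged[key] = {
            "population": population_by_key[key],
            "clinics": clinic_count,
        }

print(merged)
```

Without the normalization step, neither town would match across the two datasets. With it, both do, though any rule like this needs checking against real data, since two different places can also collapse into the same cleaned-up name.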

Taken together, all of this points to a simple but important truth: geographic data isn’t nearly as precise or straightforward as it seems. Every decision, from how you define a place to how you group data and handle small numbers, shapes the story your analysis tells.

So… how can you make sense of all of this?

You don’t have to be an analyst to think critically about geographic data – you just need to ask a few good questions.

Start with this: What is this data trying to show? Is it giving a big-picture view across a region, or trying to say something about a specific community? The answer can change how you interpret what you’re seeing.

When you come across maps or statistics, it’s worth pausing and asking a few simple questions:

  • What area is this actually showing?
  • How were the boundaries chosen?
  • Who is included… and who might be missing?
  • What differences might be hidden when everything is averaged together?
  • Is this based on a small number of people, where even a few cases could change the results?
  • Do the numbers stay steady over time, or do they jump around from year to year?

You don’t need to have all the answers, but asking the questions can help you better understand what the data is (and isn’t) telling you.

It also helps to keep a few things in mind. Data from smaller areas can show more detail, but it can also be less reliable. Larger areas are more stable, but they can hide important differences. And the boundaries used, from ZIP codes to counties, aren’t perfect; they’re just one way of grouping people and places.

At the end of the day, looking at data by place isn’t just about what’s on the map. It’s about understanding the choices behind it and thinking carefully about what the data really shows. Because those choices shape the story you see.

Link to Original Substack Post