Video Summary

How to locate (almost) any photo

colsto

Main takeaways
01

Use a global building footprint dataset (Google Open Buildings) to collapse a massive search area into a finite candidate list.

02

Create a simple 3D terrain sketch from the photo and match it against a DEM to produce a topographical fingerprint.

03

Filter and rank candidates with automated scoring (terrain match + tree/land-cover filters) and then manually verify the top results in GIS.

04

Key improvements: higher-resolution DEMs and more precise terrain templates increase accuracy; workflow scales globally.

Questions answered

Which primary datasets are used in this workflow?

The workflow uses Google Open Buildings for global building footprints, a 30‑meter resolution digital elevation model (DEM) for terrain, and ESA WorldCover (10 m) land-cover classification to filter vegetation.

Why not simply download satellite imagery for the entire search area?

Downloading all imagery would be impractical (roughly 15 TB for the area), and imagery of remote areas can be too grainy to match. Using datasets lets you filter the search space instead of brute-forcing imagery.

How does terrain fingerprinting work at scale?

Create a 3D sketch of terrain visible in the photo, convert that into a template, then compare it mathematically against DEM-derived terrain around each candidate building to score matches automatically.

How are candidates prioritized and reduced for manual review?

After initial density filtering, candidates are scored by terrain-match quality and a tree/land-cover percentage filter; the top-ranked sites (e.g., top 100–300) are exported to GIS for manual verification.

What are the main suggested improvements to increase accuracy?

Use a higher-resolution DEM where available and create a more precise terrain template (sketch). Both changes improve matching fidelity and reduce false positives.

The Challenge of Finding an Off-Grid Location 00:18

"This place is actually findable, and everything you need is on the screen right here."

  • The speaker discovered a Twitter account showcasing an off-grid compound located in the Atlantic rainforest, a vast area comparable in size to the entire East Coast of the United States.

  • The compound is so remote that it has no address, tax records, or online presence, which is what makes the search difficult.

  • Although the task appears daunting, the speaker believes the location can still be identified with the right tools and data.

Initial Approach and Data Limitations 00:49

"So while that would have been the simplest thing to do, that idea was pretty quickly out."

  • The initial idea was to download satellite imagery for the whole area and use a Python script to match the photo against the imagery tiles. This proved impractical because of the sheer volume of data needed (around 15 terabytes) and the poor resolution of imagery in remote areas.

  • To streamline the search, the speaker highlights the importance of using a comprehensive dataset to filter down from, ideally a building dataset.

The Importance of a Comprehensive Building Dataset 01:16

"So all in all, I'm looking for a building dataset that spans these multiple states, is entirely exhaustive of all buildings, including off-grid structures."

  • The speaker stresses that an effective building dataset must cover multiple geographical regions and include all types of buildings, particularly off-grid structures that may not appear in conventional datasets created for taxation.

  • The Google Open Buildings dataset is identified as an exceptional resource: it provides building footprints across the globe, generated by applying machine learning to satellite imagery.

Filtering Candidates Using Density and Matching Techniques 03:08

"I just made a quick Python script that filters based on the density."

  • Once the Google Open Buildings dataset was downloaded, the speaker employed a Python script to filter results based on building density in the remote area, successfully narrowing the candidates down considerably.

  • The next step involved matching the remaining buildings with available photographic data. However, the condition of the data and the timing of its collection were uncertain, limiting further filtering.
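The density filter described above might look something like the following minimal sketch. The summary does not show the speaker's script, so the grid-binning approach, the function name `filter_by_density`, and the values of `cell_deg` and `max_per_cell` are all assumptions chosen for illustration.

```python
from collections import Counter


def filter_by_density(buildings, cell_deg=0.01, max_per_cell=3):
    """Keep buildings that sit in sparsely built grid cells.

    buildings: list of (building_id, lat, lon) tuples.
    cell_deg: grid cell size in degrees (illustrative value).
    max_per_cell: most buildings a cell may hold and still count as
        "remote" (illustrative value, not taken from the video).
    """
    def cell(lat, lon):
        # Floor division bins coordinates into a regular lat/lon grid.
        return (int(lat // cell_deg), int(lon // cell_deg))

    counts = Counter(cell(lat, lon) for _, lat, lon in buildings)
    return [b for b in buildings if counts[cell(b[1], b[2])] <= max_per_cell]
```

A fixed grid is crude (a dense town split across a cell boundary can be undercounted), but it needs only a single pass over the data, which matters when the input holds millions of footprints.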

Utilizing Topographic Features for Identification 03:56

"Every ridge, valley, and dimension can be used as a topographical fingerprint that makes this view pretty much unique."

  • The speaker illustrates how topographical features in photographs serve as unique identifiers for potential locations, facilitating the identification process for the remaining candidates.

  • By creating a 3D mock-up of the terrain visible in the photo and comparing it against digital elevation models (DEMs), the speaker can mathematically score each candidate's surrounding terrain against the constructed model.
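One plausible way to compare a terrain template with DEM heights is a Pearson correlation over the two elevation grids. The summary does not say which metric the speaker used, so treat this as an assumed stand-in; `terrain_match_score` is a hypothetical name.

```python
import math


def terrain_match_score(template, patch):
    """Pearson correlation between two equal-sized elevation grids.

    template, patch: lists of rows of heights. Subtracting the mean makes
    the score ignore absolute elevation, and the normalization makes it
    ignore vertical exaggeration in a hand-built sketch.
    Returns a value in [-1, 1]; 0.0 if either grid is perfectly flat.
    """
    a = [h for row in template for h in row]
    b = [h for row in patch for h in row]
    if len(a) != len(b):
        raise ValueError("template and patch must have the same size")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]

    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0
```

Because the score depends only on shape, a rough sketch with the right ridges and valleys can still score well even if its heights are off by a constant or a scale factor.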

Establishing Scale and Orientation of the Terrain 05:35

"Now we have size and we also now have orientation here."

  • To accurately match the 3D mock-up with the DEM, the speaker determined the scale of the target structure using measurements from reference images, then adjusted the model's orientation based on recognizable celestial features (specifically, the Milky Way galaxy).

  • This process ensured that each candidate could be evaluated against the known dimensions and orientation of the actual location, setting the stage for further analysis.
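Once scale and orientation are known, points in the 3D sketch can be mapped onto DEM cells around a candidate. The sketch below assumes a camera-centered frame (meters to the right and meters forward), a compass heading in degrees clockwise from north, and a north-up DEM with 30 m cells. The function names and the frame convention are illustrative, not taken from the video.

```python
import math


def meters_per_model_unit(known_length_m, model_length_units):
    """Scale factor from a real-world length and its length in the 3D model."""
    return known_length_m / model_length_units


def to_dem_offsets(points, heading_deg, dem_cell_m=30.0):
    """Rotate camera-frame points into DEM row/column offsets.

    points: list of (right_m, forward_m, height_m) relative to the camera.
    heading_deg: view direction, degrees clockwise from north.
    Returns a list of (d_row, d_col, height_m); rows grow southward,
    as in a north-up raster.
    """
    theta = math.radians(heading_deg)
    offsets = []
    for right, forward, height in points:
        east = forward * math.sin(theta) + right * math.cos(theta)
        north = forward * math.cos(theta) - right * math.sin(theta)
        offsets.append((round(-north / dem_cell_m), round(east / dem_cell_m), height))
    return offsets
```

For example, with a heading of 90 degrees (facing east), a ridge 300 m straight ahead lands ten columns to the east of the candidate building and in the same row.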

Prioritizing Candidates Using Python Code 08:13

"What this Python code does is generate a list of prioritized sites."

  • After aligning the model with the DEM, the speaker utilized Python code to score the remaining candidates based on how closely their terrains matched the constructed model.

  • The resulting top 100 priority sites were then exported into a geographic information system (GIS) for manual examination, enabling a more focused and efficient search process.
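Exporting the prioritized sites for GIS review can be as simple as writing a GeoJSON file, which most GIS tools can open directly. The summary does not say which export format the speaker used, so GeoJSON is an assumption here, as are the function name `top_sites_geojson` and the `id`, `lat`, `lon`, and `score` field names.

```python
import json


def top_sites_geojson(scored, n=100):
    """Return a GeoJSON FeatureCollection string of the n best-scoring sites.

    scored: list of dicts with "id", "lat", "lon", and "score" keys
        (a hypothetical record layout).
    """
    ranked = sorted(scored, key=lambda s: s["score"], reverse=True)[:n]
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [s["lon"], s["lat"]]},
            "properties": {"id": s["id"], "rank": rank, "score": s["score"]},
        }
        for rank, s in enumerate(ranked, start=1)
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Storing the rank as a property lets the reviewer sort or label the points in the GIS and work through them from the most to the least likely.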

Filtering for Accurate Photo Location Identification 08:31

"I added one more filter to the scoring: basically, what percentage of the area directly around it is dense green foliage."

  • The speaker demonstrates a method to quickly disqualify images based on the surrounding terrain. For example, in one photo, houses are surrounded by trees, which indicates it's not the correct location.

  • A significant improvement to the identification process involves using WorldCover, a dataset from the European Space Agency that classifies land cover at a 10-meter resolution.

  • By focusing on the dark green (tree cover) areas, the filter removes obviously incorrect matches without over-penalizing candidates that have some non-tree land nearby.
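The foliage percentage could be computed roughly as follows, assuming the WorldCover raster has already been read into a grid of class codes. In the WorldCover legend, code 10 is "Tree cover". The function name `tree_fraction` and the window radius are illustrative assumptions.

```python
TREE_COVER = 10  # ESA WorldCover class code for "Tree cover"


def tree_fraction(landcover, row, col, radius=5):
    """Fraction of cells in a square window around (row, col) that are trees.

    landcover: list of rows of WorldCover class codes (10 m cells).
    radius: half-width of the window in cells (illustrative value);
        the window is clipped at the raster edges.
    """
    r0, r1 = max(0, row - radius), min(len(landcover), row + radius + 1)
    c0, c1 = max(0, col - radius), min(len(landcover[0]), col + radius + 1)
    cells = [landcover[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(1 for v in cells if v == TREE_COVER) / len(cells)
```

With 10 m cells, `radius=5` covers a window of about 110 m on each side, so the result describes only the area directly around the building.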

Ranking and Manual Verification Process 09:24

"If it's in the top like 500 or 1,000, that will speed up the search a ton."

  • The speaker explains how more than a million buildings are ranked by terrain match and tree coverage to surface the likeliest matches for a given photo.

  • Rather than relying solely on the top-ranked results, the approach allows for a manual check of the top 300 candidates, significantly streamlining the search process.

  • While assessing the candidates, the speaker makes observations about certain visible features, such as white trees, which might play a role in further refining the scoring, although this is not yet implemented.
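The summary does not give the speaker's scoring formula. One way to combine the two signals, sketched here purely as an assumption, is to subtract a penalty from the terrain score only when a candidate's tree fraction falls well outside what the photo suggests. The tolerance band is what keeps the filter from over-penalizing small differences. All names and default values below are hypothetical.

```python
def combined_score(terrain_score, tree_frac,
                   expected_tree_frac=0.8, tolerance=0.2, weight=0.5):
    """Terrain score minus a penalty for implausible tree cover.

    expected_tree_frac: tree fraction estimated from the photo (assumed).
    tolerance: deviation allowed before any penalty applies (assumed).
    weight: strength of the penalty beyond the tolerance (assumed).
    """
    gap = max(0.0, abs(tree_frac - expected_tree_frac) - tolerance)
    return terrain_score - weight * gap


def rank_candidates(candidates, top_n=300):
    """Return the ids of the top_n candidates by combined score.

    candidates: list of (building_id, terrain_score, tree_frac) tuples.
    """
    ranked = sorted(candidates,
                    key=lambda c: combined_score(c[1], c[2]),
                    reverse=True)
    return [c[0] for c in ranked[:top_n]]
```

A candidate with a slightly weaker terrain match but plausible surroundings can then outrank one with a strong terrain match in the middle of open farmland.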

Scalability and Practical Applications of the Method 10:35

"To me, the coolest part of all this is just how scalable it is."

  • The scalability of the methodology allows it to be applied worldwide, making it useful in various scenarios, such as search and rescue operations where identifying terrain based on available images is crucial.

  • The speaker highlights that even a rudimentary 3D model can yield meaningful results, suggesting that more experienced 3D modelers could achieve even better accuracy.

  • The impressive aspect of this proof of concept is its capacity to pinpoint a photo's location among millions of buildings, illustrating the power of terrain analysis.