Developer-guideFeatured

Create Geotagged PDFs from Photos with EXIF and WebP

14 min read
Alexander Georges
developer-guide for create geotagged PDFs from photos

As the founder of WebP2PDF I've handled thousands of user requests where the core requirement wasn't just "convert images to PDF" but to preserve photographic context — GPS, date, camera model, and other EXIF fields. This guide is a practical, hands-on developer reference for how to create geotagged PDFs from photos using WebP images and EXIF data, with workflows that work for single documents, multi-page reports, and large archival batches.

You'll get a clear picture of how EXIF metadata is stored in modern WebP files, options for copying that metadata into a PDF container (XMP), trade-offs for compatibility and privacy, and step-by-step patterns you can use on the client, server, or in a serverless pipeline. I'll include benchmarks, a comparison table of approaches, command-line snippets, and troubleshooting tips gathered from operating a live browser-based converter used by thousands of people.

Why embed geotags in PDFs? Use cases and benefits

Geotagged PDFs are useful when images need to carry spatial context in a portable, printable container. PDFs are the lingua franca for reporting — inspectors, field surveyors, real estate professionals, journalists, and legal teams often prefer PDFs because they are easy to annotate, sign, and archive.

Key use cases:

  • Multi-page field reports where each photo shows location and timestamp.
  • Project documentation and asset inventories where a single PDF bundles photos with searchable metadata.
  • Legal or insurance submissions where a stable document format (PDF/A) is required but the photos' geolocation must be preserved for chain-of-custody.
  • Archival storage where an organization wants to keep images and their EXIF together in a single file while retaining the ability to extract GPS coordinates later.

Technical background: EXIF, XMP, WebP metadata and PDF

EXIF is the standard container for camera metadata including GPS tags; many mobile devices write GPSLatitude, GPSLongitude, and related tags in decimal or DMS format. WebP supports embedding EXIF and XMP sections inside the RIFF container, so a WebP file can carry full photographic metadata.

When you create geotagged PDFs from photos you have two main strategies:

  1. Keep the image file embedded in the PDF with its EXIF block intact; the PDF page contains the image stream and the EXIF chunk remains retrievable by tools that extract image data from a PDF.
  2. Copy the EXIF GPS fields into the PDF's XMP metadata or custom PDF metadata fields (often under http://ns.adobe.com/xap/1.0/ or an EXIF-specific XMP schema) so that the PDF itself is the carrier of the GPS information, not just the embedded image.

PDF viewers rarely display GPS data out of the box. The point of embedding geotags is long-term preservation, programmatic extraction, and enabling workflows (mapping, indexing) that can read GPS from the PDF or its embedded images.

How PDF geotagging works: practical approaches

There are three practical approaches I recommend, depending on goals for compatibility, size, and toolchain simplicity:

  • Embed image with EXIF intact — Lowest implementation complexity when your PDF generator preserves image binary and doesn't strip metadata. This works well if you need to ensure the original image is extractable later.
  • Copy EXIF to PDF XMP — Best for making the PDF itself self-describing. XMP metadata is searchable and sits in the PDF metadata stream. Extraction tools (exiftool, custom parsers, GIS software) can read these fields.
  • Create a GeoPDF-style sidecar layer — More complex: create spatial layers or annotations in the PDF that represent coordinates; useful for interactive maps but often requires specialized tooling (GeoPDF tools).

Recommended toolset

My production toolset is a combination of small, reliable building blocks so I can scale from a browser to serverless functions:

  • exiftool (CLI) — extract and normalize EXIF to JSON or XMP. Stable and battle-tested.
  • pdf-lib (JavaScript) or pikepdf (Python) — programmatic PDF creation and XMP metadata writing without heavy external dependencies.
  • WebP encoder/decoder (libwebp, sharp, browser-native) — ensure images are not re-compressed unnecessarily.
  • Ghostscript — optional for PDF/A conformance or flattening.
  • WebP2PDF.com — when you need a simple browser-based conversion with EXIF preservation.

Step-by-step: Create a geotagged PDF from WebP photos (recommended server-side pattern)

High level steps:

  1. Extract EXIF/GPS from the WebP image.
  2. Create a PDF page for the image and embed the image bytes without recompressing where possible.
  3. Write the GPS fields into the PDF metadata (XMP) and, optionally, as a per-page annotation (custom metadata dictionary).
  4. Output or validate the PDF, then optionally convert to PDF/A for archival.

Here's a minimal command-line + pseudo-code flow that works well for batch processing and can be run in a serverless function or CI job.

exiftool -j image.webp > image-exif.json

Section spacer

In production I prefer JSON extraction so I can normalize coordinate formats (decimal degrees) and timestamp formats (ISO 8601). Example minimal Node.js pseudocode using pdf-lib (described, not shown fully): extract the JSON, create a PDF, embed the image stream, then set doc.setMetadata({ 'XMP:GPSLatitude': ..., 'XMP:GPSLongitude': ... }). Many libraries provide a way to inject raw XMP XML into the PDF metadata stream.

Client-side pattern (browser-based) with privacy controls

Client-side conversion is ideal when photos are sensitive — keep data local to the user's browser. The flow in the browser is similar but uses browser APIs:

  1. User selects files via an <input type="file"> or drag-and-drop.
  2. Use FileReader to get the binary WebP; use a small JS EXIF parser to extract GPS without sending files to the server.
  3. Create a PDF with a client-side PDF library, embed the image as a page, add XMP metadata with GPS payload, then offer the PDF for download.

When building this, be mindful of memory: generating PDFs with hundreds of high-resolution images in the browser can exceed memory caps. For large batches, upload to a secure server or use chunked serverless jobs.

Troubleshooting: common issues and fixes

Below are practical problems we regularly see and steps to resolve them:

  • Missing GPS after conversion — Check if your PDF generator re-encodes the image and strips metadata. If so, either embed the original image stream or copy EXIF into XMP explicitly.
  • Incorrect coordinate format — EXIF sometimes stores DMS; convert to decimal degrees when writing XMP. Use a small utility to normalize values (e.g., decimal = degrees + minutes/60 + seconds/3600).
  • Orientation wrong — EXIF Orientation may be ignored; apply orientation during image embedding or set page rotation in the PDF.
  • Margins or scaling — Decide if images should be fit-to-page or placed with specific margins. Compute target DPI for printing (300 DPI often recommended) and scale accordingly.
  • Viewer doesn't show GPS — Most PDF viewers don't surface GPS fields; use exiftool or a map visualization script to extract and display the coordinates.

Privacy and legal considerations

Geotags reveal sensitive location data. Always provide a clear UI option to strip EXIF before conversion. For legal evidence, maintain an audit trail (who created the PDF and when). If your workflow includes cloud storage, encrypt or apply access controls to geotagged documents.

Comparison table: embedding strategies

Section spacer

ApproachPreserves original EXIFViewer compatibilitySize overheadBest for
Embed image with EXIF intactYes (image chunk preserved)Low (image extractors can read it)Low (no extra copy)Archival, extractable originals
Copy EXIF to PDF XMPNo (unless also embed the original)Medium (tools can read PDF XMP)Small (~1–10 KB per image)Self-describing PDFs, search/indexing
GeoPDF / spatial annotationsVariesLow (requires specialized viewers)MediumInteractive mapping and overlays

Section spacer

Example benchmark: file sizes and overhead (practical numbers)

Benchmarks vary by image content and compression settings. Below are representative numbers from a synthetic sample of 50 smartphone photos (12 MP) converted to WebP with perceptual compression and then packaged into PDFs using a pipeline that preserves binary images and optionally copies XMP metadata.

MetricAvg per imageNotes
WebP file size1.8 MBPerceptual quality ~85/100, typical for smartphone photos
JPEG equivalent3.0 MBWebP ~40% smaller on this sample
PDF page with embedded WebP (no XMP)1.85 MB~+50 KB overhead for PDF objects
PDF page with XMP GPS1.855 MB~+5 KB per image for GPS/XMP
Batch 50-image PDF (no XMP)~92.5 MBAggregate of pages + PDF container overhead
Batch 50-image PDF (with XMP)~93 MB~+250 KB total for GPS/XMP

Section spacer

Conclusion from these numbers: embedding GPS as XMP adds negligible size compared to the image data, and preserving the original WebP stream is the most storage-efficient approach if you want extractability later.

Developer guide: detailed sample workflows

Below are three concrete workflows: a single-file quick conversion, a serverless batch, and a browser-first privacy-preserving flow. These are patterns I've used in production at scale with WebP2PDF.com.

Quick CLI workflow (single file)

Use exiftool to read GPS and write into a newly created PDF's XMP using a lightweight PDF writer. If you use a library that supports injecting raw XMP XML into the metadata stream, populate GPS fields there.

exiftool -j image.webp > image.json

Section spacer

Then parse image.json to extract GPSLatitude and GPSLongitude and create a minimal XMP packet to write into the PDF metadata. Many PDF libraries expose a way to set the /Metadata stream to an XML string.

Serverless batch pattern (recommended for scale)

1) Upload photos to temporary cloud storage (S3 or equivalent) with server-side encryption. 2) Fire a serverless job per photo that extracts EXIF and writes a small JSON sidecar. 3) When all images for a document are processed, a final job pulls image bytes and sidecars and builds a multi-page PDF, adding XMP metadata at the document level and optionally per-page custom properties.

Benefits: parallelizable, low memory per job, and easy retries. Keep a separate step for validation and PDF/A conversion if required.

Browser-first (local conversion, privacy-first)

1) Read files with FileReader or the modern File System Access API. 2) Use a client-side EXIF parser that understands WebP EXIF blocks. 3) Build PDF pages with pdf-lib or a lighter-weight client-side writer, inject XMP metadata, then create a blob URL for download. This keeps data off your servers and gives users control to strip or include geotags.

Implementation details: converting EXIF formats and precision

EXIF stores GPS in a couple of ways: as rational DMS fields (degrees, minutes, seconds) and as directional indicators (N/S, E/W). Normalizing to signed decimal degrees avoids ambiguity: decimal = degrees + minutes/60 + seconds/3600, and apply negative sign for S/W.

Precision matters: store GPS to at least 6 decimal places for sub-meter precision when needed; 5 decimals ~1.1 meters at the equator, 6 decimals ~0.11 meters. For many applications 4–5 decimals are sufficient.

Mapping and visualization after PDF creation

Embedding geotags is only useful if they can be extracted. For quick extraction use exiftool:

exiftool -json document.pdf

Section spacer

This will show any XMP or embedded image EXIF fields that exiftool can see. For mapping, write a small script to parse coordinates from the PDF metadata or extract embedded images and read their EXIF, then produce a GeoJSON file for visualization in any mapping library (Leaflet, Mapbox, OpenLayers).

When to use PDF versus keeping original images

Use PDF when your workflow needs bundling, annotation, signatures, or a printable fixed-layout report. Keep the original images when you need full fidelity for image analysis, machine vision, or when applications expect standalone images with EXIF. The best approach is often hybrid: keep originals in archival storage and produce geotagged PDFs as reports.

Compatibility checklist before production deployment

  • Verify your chosen PDF generator preserves image binary if you need extractability.
  • Confirm exiftool or your parsing library can read WebP EXIF blocks in your environment.
  • Decide whether to write XMP at the document level or per-page; per-page metadata is easier to map to pages but requires more careful structure in the PDF.
  • Test with representative devices (iPhone HEIC converted to WebP, Android WebP) — camera EXIF variants can surprise you.
  • Include an opt-out for geotags in user-facing tools.

Tools and libraries I trust and why

When building WebP-to-PDF pipelines that preserve geotags, pick reliable primitives:

  • exiftool — universal metadata extraction and writing. It understands many vendor-specific tags and can produce JSON for programmatic flows.
  • pdf-lib (JS) — small, dependency-free PDF creation that lets you inject metadata and embed binary images without recompression.
  • pikepdf (Python) — powerful for PDF manipulation built on qpdf, good for server pipelines.
  • sharp / libvips — image transformations when you must resize or re-encode (try to avoid re-encoding if you want to preserve original EXIF).

Practical examples and micro-optimizations

When assembling multi-page PDFs from a folder of WebP photos, follow these micro-optimizations I use in production:

  1. Stream images into the PDF writer to avoid loading all images into memory.
  2. When adding XMP, build a single XMP packet with repeated GPS elements per page instead of many separate metadata objects, reducing overhead.
  3. If the PDF will be printed, calculate target pixel dimensions based on print DPI (e.g., 300 DPI) and downsample if originals far exceed that size to save storage.
  4. If you expect programmatic extraction, create a small machine-readable table-of-contents page at the beginning of the PDF listing filenames, ISO timestamps, and decimal GPS coordinates.

References and further reading

For standards and compatibility checks refer to authoritative resources:

Troubleshooting checklist (quick)

If your PDF is missing geotags, run this checklist:

  1. Run exiftool -json file.pdf to inspect embedded metadata.
  2. Check if the PDF generator stripped metadata — test by embedding a small image with known EXIF and verify extractability.
  3. Confirm GPS format normalization — ensure North/South and East/West are handled correctly.
  4. Validate any per-page metadata placement — some libraries add only document-level metadata by default.
  5. Test on multiple viewers and extract with exiftool to isolate viewer limitations from missing metadata.

Actionable checklist before releasing a feature that embeds GPS

  • Provide explicit user consent and an option to remove geotags.
  • Document the exact metadata fields you write (GPSLatitude, GPSLongitude, GPSAltitude, GPSDateStamp).
  • Validate coordinate precision and format.
  • Audit storage and access policies for geotagged PDFs.
  • Include a mapping export (GeoJSON) option for users who want to visualize locations externally.

Frequently Asked Questions About create geotagged PDFs from photos

Section spacer

How do I preserve EXIF data when converting WebP to PDF?

The most reliable method is to embed the original WebP binary into the PDF page stream without re-encoding, and/or copy the EXIF fields into the PDF's XMP metadata. Use tools like exiftool to extract and normalize EXIF to JSON, then write XMP into the PDF metadata stream. Confirm with exiftool -json file.pdf after conversion.

Can I extract GPS from a PDF later and plot it on a map?

Yes. Use extraction tools (exiftool or a small parser) to read XMP or embedded image EXIF. Convert coordinates to decimal degrees if necessary and output GeoJSON for mapping libraries. Many production workflows build a small CSV or GeoJSON sidecar automatically during conversion for easy visualization.

Is it better to store GPS in PDF XMP or keep image EXIF?

It depends on goals. Keeping EXIF in the embedded image preserves the original file for extraction; copying GPS to PDF XMP makes the PDF self-contained and easier to index. For archival safety, do both: embed the original and write XMP fields. That adds negligible overhead and maximizes interoperability.

How much extra size does embedding GPS metadata add to a PDF?

Very little. GPS-only XMP is usually 1–10 KB per photo depending on how many fields you include. In practical tests with smartphone WebP images (~1.8 MB each), adding XMP increased the PDF by roughly 0.1%–0.5% per image — a minor cost versus the benefit of a self-describing document.

Do common PDF viewers show geotags to end users?

Most PDF viewers do not display GPS data as a first-class field in the UI. The metadata is present for extraction by specialized tools or custom scripts. If you want viewers to see locations directly, include a small caption on the page with coordinates or a tiny map thumbnail derived from the GPS values when creating the PDF.

Section spacer

Advertisement