Context
We're building ANNCSU Viewer, an open-source address viewer for Italy's national address archive (~14M addresses). We adopted several architectural patterns from geocoding-sdk — H3 tiling, Jaccard similarity, LRU cache, smart query detection — but had to reimplement them because the SDK is tightly coupled to the Saudi Arabia dataset schema.
We'd like to contribute upstream to make the SDK reusable across different countries and schemas.
Problem
The GeoSDK class in geocoder-h3.ts (~1685 lines) is a monolith that mixes:
- DuckDB WASM lifecycle and tile loading (generic)
- H3 tile index management (generic)
- LRU caching (generic)
- FTS/BM25 and Jaccard search (generic algorithm, schema-specific field names)
- Arabic/English language detection (Saudi-specific)
- Address field references like full_address_ar, full_address_en (Saudi-specific)
- Admin hierarchy with Saudi region names (Saudi-specific)
This makes it impossible to use the SDK with a different dataset without forking the entire class.
Proposal
Refactor into composable modules:
1. @tabaqat/core — Generic infrastructure
- DuckDB WASM initialization and lifecycle
- H3 tile index loading and spatial filtering
- HTTP tile fetching with parallel downloads
- LRU cache (search cache + admin cache + grid-based cache)
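To illustrate how the cache could live in core without knowing anything about the schema, here is a minimal sketch of an LRU cache with a per-entry TTL. The class name LruCache, its constructor parameters, and the injectable clock are assumptions made for this example, not the SDK's current implementation.

```typescript
// Hypothetical helper: a small, schema-agnostic LRU cache with a per-entry TTL.
// It relies on Map preserving insertion order: the first key is the least recently used.
class LruCache<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>()
  private maxSize: number
  private ttlMs: number
  private now: () => number

  // `now` is injectable so that expiry can be tested without waiting.
  constructor(maxSize: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxSize = Math.max(1, maxSize)
    this.ttlMs = ttlMs
    this.now = now
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {
      // Expired entries are dropped lazily, on access.
      this.entries.delete(key)
      return undefined
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key)
    this.entries.set(key, entry)
    return entry.value
  }

  set(key: K, value: V): void {
    this.entries.delete(key)
    if (this.entries.size >= this.maxSize) {
      // Evict the least recently used entry (the first key in the Map).
      const oldest = this.entries.keys().next()
      if (!oldest.done) this.entries.delete(oldest.value)
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }

  get size(): number {
    return this.entries.size
  }
}
```

The search, admin, and grid-based caches could then be three instances of the same class, differing only in key type and size.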
2. @tabaqat/search — Pluggable search engine
- FTS/BM25 index creation and querying
- Jaccard similarity fallback
- Multi-term CONTAINS filtering
- Configurable via a SearchConfig interface (field names, stemmer, etc.)
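As a rough sketch of what "configurable" could mean here, the Jaccard fallback below takes its field names from a config object instead of hard-coding them. The shape of SearchConfig (fields, stemmer, minSimilarity), the token-based tokenizer, and the function names are assumptions for illustration; the existing SDK may compute similarity differently (for example inside DuckDB).

```typescript
// Hypothetical shape: the proposal only says "field names, stemmer, etc."
interface SearchConfig {
  fields: string[]        // columns to compare the query against
  stemmer?: string        // would be passed to FTS index creation; unused in this sketch
  minSimilarity: number   // hits scoring below this are dropped
}

type Row = Record<string, unknown>

// Lowercase and split on whitespace and common punctuation.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[\s,.;:'"()\/-]+/).filter(Boolean))
}

// Jaccard similarity on token sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const setA = tokenize(a)
  const setB = tokenize(b)
  if (setA.size === 0 && setB.size === 0) return 0
  let intersection = 0
  setA.forEach((token) => {
    if (setB.has(token)) intersection++
  })
  return intersection / (setA.size + setB.size - intersection)
}

// Score each row by its best-matching configured field, then filter and sort.
function jaccardFallback(
  query: string,
  rows: Row[],
  config: SearchConfig,
): Array<{ row: Row; score: number }> {
  return rows
    .map((row) => ({
      row,
      score: Math.max(0, ...config.fields.map((f) => jaccard(query, String(row[f] ?? '')))),
    }))
    .filter((hit) => hit.score >= config.minSimilarity)
    .sort((x, y) => y.score - x.score)
}
```

With this shape, the Saudi dataset would pass fields: ['full_address_ar', 'full_address_en'] and ANNCSU would pass fields: ['ODONIMO'], with no change to the search code itself.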
3. @tabaqat/schema — Schema adapter interface
interface SchemaAdapter {
  // Field mappings
  addressFields: string[] // Fields to search
  displayAddress: (row: Record<string, unknown>) => string

  // Optional features
  language?: {
    detect: (query: string) => string
    fields: Record<string, string> // language → field name
    stemmers: Record<string, string> // language → stemmer
  }

  // Municipality / admin hierarchy
  municipality?: {
    nameField: string
    codeField: string
  }

  // Postcode
  postcode?: {
    pattern: RegExp
    field: string
  }
}
Example adapters:
// Saudi Arabia (current behavior, no breaking changes)
const saudiAdapter: SchemaAdapter = {
  addressFields: ['full_address_ar', 'full_address_en'],
  displayAddress: (row) => row.full_address_en as string,
  language: {
    detect: (q) => /[\u0600-\u06FF]/.test(q) ? 'ar' : 'en',
    fields: { ar: 'full_address_ar', en: 'full_address_en' },
    stemmers: { ar: 'arabic', en: 'porter' },
  },
  municipality: { nameField: 'district_name_en', codeField: 'district_id' },
  postcode: { pattern: /^\d{5}$/, field: 'postcode' },
}

// Italy (ANNCSU)
const anncsuAdapter: SchemaAdapter = {
  addressFields: ['ODONIMO'],
  displayAddress: (row) =>
    `${row.ODONIMO} ${row.CIVICO}${row.ESPONENTE ? ' ' + row.ESPONENTE : ''}`,
  municipality: { nameField: 'NOME_COMUNE', codeField: 'CODICE_ISTAT' },
  postcode: { pattern: /^\d{5}$/, field: 'CAP' },
}
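To show how generic code might consume an adapter, the sketch below routes a query to the right fields using only what the adapter declares. This is one possible reading of "smart query detection". The helper planQuery and the QueryPlan type are hypothetical names introduced for this example, and the SchemaAdapter shown is a trimmed copy of the interface above, kept only so the snippet stands alone.

```typescript
// Trimmed copy of the proposed SchemaAdapter (only the parts used here).
interface SchemaAdapter {
  addressFields: string[]
  language?: {
    detect: (query: string) => string
    fields: Record<string, string> // language → field name
  }
  postcode?: { pattern: RegExp; field: string }
}

// Hypothetical result type: which column(s) a query should be matched against.
type QueryPlan =
  | { kind: 'postcode'; field: string; value: string }
  | { kind: 'text'; fields: string[]; value: string }

// Hypothetical helper: no Saudi- or Italy-specific names appear in this function.
function planQuery(query: string, adapter: SchemaAdapter): QueryPlan {
  const value = query.trim()

  // 1. Postcode lookup, if the adapter declares a pattern and the query matches it.
  //    (Assumes the pattern has no global flag, so test() is stateless.)
  if (adapter.postcode && adapter.postcode.pattern.test(value)) {
    return { kind: 'postcode', field: adapter.postcode.field, value }
  }

  // 2. Language-specific field, if the adapter supports language detection.
  if (adapter.language) {
    const field = adapter.language.fields[adapter.language.detect(value)]
    if (field) return { kind: 'text', fields: [field], value }
  }

  // 3. Otherwise search every declared address field.
  return { kind: 'text', fields: adapter.addressFields, value }
}
```

With the adapters above, a query such as '00184' would resolve to the CAP column for ANNCSU, while an Arabic query against the Saudi adapter would resolve to full_address_ar.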
4. @tabaqat/geocoder — High-level API
const geocoder = createGeocoder({
  dataUrl: 'https://data.example.com/tiles',
  schema: anncsuAdapter,
  // Optional overrides
  h3Resolution: 5,
  maxTiles: 50,
  cacheSize: 100,
  cacheTtlMs: 5 * 60 * 1000,
})

const results = await geocoder.geocode('Roma, Via Appia 1')
What we'd contribute
- Schema adapter interface and refactoring of field references out of the core
- Italian ANNCSU adapter as a second real-world schema
- Tests for the adapter pattern (ensuring Saudi behavior doesn't break)
- Documentation for creating new country adapters