feat(rust/sedona-raster-functions): add RS_Contains, RS_Intersects, RS_Within UDFs#615
feat(rust/sedona-raster-functions): add RS_Contains, RS_Intersects, RS_Within UDFs#615Kontinuation wants to merge 12 commits intoapache:mainfrom
Conversation
d54ef15 to
38aab51
Compare
38aab51 to
cda9308
Compare
There was a problem hiding this comment.
Pull request overview
Adds raster spatial predicate UDFs to sedona-raster-functions, including CRS-aware handling and new executor iteration patterns to support raster↔geometry and raster↔raster comparisons.
Changes:
- Introduces
RS_Intersects,RS_Contains, andRS_WithinUDF implementations (raster-geom, geom-raster, raster-raster) using raster convex hull WKB. - Adds
crs_utilshelpers for CRS resolution and WKB/coordinate transformation via the global PROJ engine. - Extends
RasterExecutorwithexecute_raster_wkb_crs_voidandexecute_raster_raster_void, plus tests and benchmark entries.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| rust/sedona-raster-functions/src/rs_spatial_predicates.rs | New UDF kernels + CRS-aware predicate evaluation + unit tests. |
| rust/sedona-raster-functions/src/register.rs | Registers the new spatial predicate UDFs in the default function set. |
| rust/sedona-raster-functions/src/lib.rs | Exposes new crs_utils and rs_spatial_predicates modules. |
| rust/sedona-raster-functions/src/executor.rs | Adds new raster↔raster and raster↔(WKB,CRS) iteration helpers + tests. |
| rust/sedona-raster-functions/src/crs_utils.rs | New CRS resolution + WKB/coord reprojection utilities. |
| rust/sedona-raster-functions/benches/native-raster-functions.rs | Adds benchmark cases for the new UDFs. |
| rust/sedona-raster-functions/Cargo.toml | Adds new deps for tg predicates, CRS transform, and WKB parsing. |
| docs/reference/sql/rs_contains.qmd | Adds SQL reference page for RS_Contains. |
| docs/reference/sql/rs_intersects.qmd | Adds SQL reference page for RS_Intersects. |
| docs/reference/sql/rs_within.qmd | Adds SQL reference page for RS_Within. |
| c/sedona-proj/src/transform.rs | Makes with_global_proj_engine public for cross-crate reuse. |
| c/sedona-proj/src/lib.rs | Makes st_transform module public. |
| Cargo.lock | Lockfile updates for new dependencies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cda9308 to
84b5edd
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you!
A few patterns I think we should update before merging this but in general this looks very cool!
| //! and (raster, raster). Rasters are compared via their convex hulls. | ||
| //! | ||
| //! CRS transformation rules: | ||
| //! - If a raster or geometry does not have a defined SRID, it is assumed to be in WGS84 |
There was a problem hiding this comment.
In SedonaDB I've tried hard to avoid this assumption (I am guessing this is a hangover from Sedona not propagating CRSes from GeoParquet). We can make it easy to apply this assumption explicitly if it causes issues (e.g., ST_ApplyDefaultCRS() or a Python dataframe level function that does it for all spatial columns).
There was a problem hiding this comment.
This strange semantics comes from Apache Sedona. Should we simply throw an error when one of the input parameter does not have a CRS?
There was a problem hiding this comment.
If both of them don't have a CRS it would technically be OK (but you can throw an error for now).
| executor.execute_raster_wkb_crs_void(|raster_opt, maybe_wkb, maybe_geom_crs| { | ||
| match (raster_opt, maybe_wkb) { | ||
| (Some(raster), Some(geom_wkb)) => { | ||
| raster_wkb.clear(); | ||
| write_convexhull_wkb(raster, &mut raster_wkb)?; |
There was a problem hiding this comment.
I imagine this is not a huge performance issue, but technically this is unnecessarily recomputing the raster convex hull for the raster-is-a-scalar/geometry-is-a-column case.
There was a problem hiding this comment.
The raster-is-a-scalar case should be rare in practice, leaving this case unoptimized to simplify the code should be a sane trade off.
| /// Evaluate a spatial predicate with CRS handling | ||
| /// | ||
| /// Rules: | ||
| /// - If no CRS defined, assume WGS84 | ||
| /// - If both same CRS, compare directly | ||
| /// - Otherwise, try transforming one side to the other's CRS for comparison. | ||
| /// If that fails, transform both to WGS84 and compare. | ||
| fn evaluate_predicate_with_crs<Op: tg::BinaryPredicate>( | ||
| wkb_a: &[u8], | ||
| crs_a: Option<&str>, | ||
| wkb_b: &[u8], | ||
| crs_b: Option<&str>, | ||
| from_a_to_b: bool, | ||
| ) -> Result<bool> { |
There was a problem hiding this comment.
I expected to see with_global_proj_engine() outside the loop with a reference to the engine passed in here. I think it would be better to keep that in the UDF execute because it will be easier to grab the session engine from the ConfigOptions when we get there.
There was a problem hiding this comment.
Moved with_global_proj_engine to call once per batch.
| /// Cached default CRS (WGS84 longitude/latitude). Initialized once on first access. | ||
| static DEFAULT_CRS: LazyLock<Arc<dyn CoordinateReferenceSystem + Send + Sync>> = | ||
| LazyLock::new(|| lnglat().expect("lnglat() should always succeed")); |
There was a problem hiding this comment.
I commented above too, but I don't think this is a good place to apply a default CRS. In SedonaDB it is practically quite difficult to get something without a CRS (when there's a raster reader, the reader would be a good place to apply that).
| /// Run a closure with the active CRS engine, abstracting away the engine choice. | ||
| /// | ||
| /// We keep this API engine-agnostic to allow future engines beyond PROJ; the | ||
| /// current implementation uses PROJ via `with_global_proj_engine`. | ||
| pub fn with_crs_engine<R, F: FnMut(&dyn CrsEngine) -> Result<R>>(mut func: F) -> Result<R> { | ||
| with_global_proj_engine(|engine| func(engine)) | ||
| } |
There was a problem hiding this comment.
I think we can apply this at the UDF level where it's a bit more explicit (and amortized over a batch).
| pub fn crs_transform_wkb( | ||
| wkb: &[u8], | ||
| from_crs: &dyn CoordinateReferenceSystem, | ||
| to_crs: &dyn CoordinateReferenceSystem, | ||
| ) -> Result<Vec<u8>> { |
There was a problem hiding this comment.
The identical CRS check should probably happen before this to avoid the clone. For a bbox the clone is trivial but this could also be a complex geometry (e.g., division boundary) on the other side of the predicate.
There was a problem hiding this comment.
Moved the CRS equivalence check before calling this function. This will eliminate one allocation and WKB buffer copying when CRSes are equivalent.
| fn crs_string_from_sedona_type(sedona_type: &SedonaType) -> Result<Option<String>> { | ||
| match sedona_type { | ||
| SedonaType::Wkb(_, crs) | SedonaType::WkbView(_, crs) => { | ||
| if let Some(crs) = crs { | ||
| Ok(Some( | ||
| crs.to_authority_code()?.unwrap_or_else(|| crs.to_json()), | ||
| )) |
There was a problem hiding this comment.
I believe we have a CRS abbreviator already (and the executor should probably not automatically abbreviate so that functions can use the full definitions if they are provided).
There was a problem hiding this comment.
I decide to change execute_raster_wkb_crs_void to pass &dyn CoordinateReferenceSystem instead of a CRS string to the closure.
|
|
||
| impl ItemWkbAccessor { | ||
| #[inline] | ||
| fn get(&self, i: usize) -> Option<&[u8]> { |
There was a problem hiding this comment.
The pattern of switching on type for every element is probably not a great one.
Possibly a Box<dyn Iterator<Item = ...>>? In the context of raster iteration where it is less likely there will be billions of them this is probably OK. You could also use that for raster/CRS iteration and use the existing executor to handle the arrays (it's almost always free to cast the CRS array to string view and there is less variation in the raster type than the geometry type).
There was a problem hiding this comment.
I have thought about eliminating this dynamic dispatching when implementing it, and I deliberately leave it that way. Here are the reasons:
Box<dyn Iterator<Item = ...>>performs dynamic dispatching implicitly, I'm not sure if it is better than pattern-matching a enum.- RS functions involving a raster and a geometry input are usually heavy weight, the cost of dispatching does not add up too much overhead.
There was a problem hiding this comment.
Can you add a comment here before all the switches explaining this?
There was a problem hiding this comment.
Comment added for both ItemWkbAccessor and GeomWkbCrsAccessor
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
…to the closure instead of Crs
|
The benchmark no longer works because of the semantics changes. I have to add a |
paleolimbot
left a comment
There was a problem hiding this comment.
This looks great...thank you!
I would prefer to avoid adding ST_ApplyDefaultCrs()...documented functions carry some compatibility/stability considerations and we just need this to exist for a benchmark. I left two suggestions about how I think we can avoid this!
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| title: ST_ApplyDefaultCRS |
There was a problem hiding this comment.
I would give this an SD_* prefix (and then you don't have to document it). I am not sure we want documentation for this because it's an implementation detail (all actual users should just do ST_SetSRID() or ST_SetCrs()).
Alternatively, just put the definition in the raster benchmarks file since that's the only place we need it!
There was a problem hiding this comment.
Renamed ST_ApplyDefaultCRS to SD_ApplyDefaultCRS and moved it to the benchmark file.
| fn crs_from_sedona_type(sedona_type: &SedonaType) -> Crs { | ||
| match sedona_type { | ||
| SedonaType::Wkb(_, crs) | SedonaType::WkbView(_, crs) => crs.clone(), | ||
| _ => None, | ||
| } | ||
| } | ||
|
|
||
| fn is_item_crs_type(sedona_type: &SedonaType) -> bool { | ||
| matches!( | ||
| sedona_type, | ||
| SedonaType::Arrow(DataType::Struct(fields)) | ||
| if fields.len() == 2 && fields[0].name() == "item" && fields[1].name() == "crs" | ||
| ) | ||
| } |
There was a problem hiding this comment.
No need to do it here but these could both be methods on the SedonaType
There was a problem hiding this comment.
We can do this in a separate PR and replace other ad-hoc item-crs schema checks with SedonaType::is_item_crs_type.
Summary
RS_Contains,RS_Intersects, andRS_Withinspatial predicate UDFs tosedona-raster-functionscrs_utilsmodule for CRS resolution and coordinate transformation helpersexecutor.rswithexecute_raster_wkb_crs_void(raster+geometry iteration) andexecute_raster_raster_void(two-raster iteration) patternssedona-proj::st_transformmodule andwith_global_proj_enginefunction public for cross-crate CRS transformationnative-raster-functions.rs