SOLR-13309: Add IntRangeField for Lucene's IntRange #4141
gerlowskija wants to merge 17 commits into apache:main
This commit adds a new field type, IntRangeField, that can be used to
hold singular or multi-dimensional (up to 4) ranges of integers.
Field values are represented using brackets and the "TO" operator, with
commas used to delimit dimensions (when a particular field is defined as
having more than 1 dimension), e.g.
- [-1 TO 5]
- [1,2 TO 5,10]
- [1 TO 1]
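The value syntax above can be illustrated with a small parser. This is a minimal sketch and not the PR's actual parsing code: the class name `RangeSyntaxSketch`, the `parse` method, and the rejection of ranges whose minimum exceeds their maximum are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical illustration of the "[min TO max]" syntax; not IntRangeField's real parser. */
final class RangeSyntaxSketch {

    /** Returns {mins, maxs}, where mins[i]..maxs[i] is the range in dimension i. */
    static int[][] parse(String value) {
        String s = value.trim();
        if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
            throw new IllegalArgumentException("Expected [min TO max], got: " + value);
        }
        // Split the bracket contents into the lower and upper bound lists.
        String[] sides = s.substring(1, s.length() - 1).trim().split("\\s+TO\\s+");
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected exactly one TO operator: " + value);
        }
        int[] mins = parseDims(sides[0]);
        int[] maxs = parseDims(sides[1]);
        // The description above allows 1 to 4 dimensions; both sides must agree.
        if (mins.length != maxs.length || mins.length > 4) {
            throw new IllegalArgumentException("Bad dimension count: " + value);
        }
        for (int i = 0; i < mins.length; i++) {
            if (mins[i] > maxs[i]) { // assumption: inverted ranges are rejected
                throw new IllegalArgumentException("min > max in dimension " + i + ": " + value);
            }
        }
        return new int[][] {mins, maxs};
    }

    /** Parses a comma-delimited list of ints, one per dimension. */
    private static int[] parseDims(String side) {
        String[] parts = side.split(",", -1);
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"[-1 TO 5]", "[1,2 TO 5,10]", "[1 TO 1]"}) {
            int[][] r = parse(v);
            System.out.println(v + " -> mins=" + Arrays.toString(r[0]) + " maxs=" + Arrays.toString(r[1]));
        }
    }
}
```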
IntRangeField does not support docValues or uninversion, so it is
used primarily for querying. The field can be stored and returned
in search-results. Searches on these range-fields mostly rely on a
QParser, {!myRange}, which supports "intersects", "crosses", "within",
and "contains" semantics via a "criteria" local param. e.g.
- {!myRange field=price_range criteria=within}[1 TO 5]
  Matches docs whose 'price_range' field falls fully within [1 TO 5].
  A doc with [2 TO 3] would match; [3 TO 6] or [8 TO 10] would not.
- {!myRange field=price_range criteria=crosses}[1,10 TO 5,20]
  Matches docs whose 'price_range' field is partially but not fully
  contained within [1,10 TO 5,20]. A doc with [2,11 TO 6,21] would
  match, but [3,11 TO 5,19] would not.
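The four criteria can be sketched as plain comparisons over per-dimension bounds. This is an illustration under stated assumptions, not the PR's or Lucene's implementation: `RangeCriteriaSketch` and its methods are hypothetical names, and "crosses" is modeled as "intersects but not within", following the wording of the example above.

```java
/** Hypothetical illustration of the "criteria" semantics; not the QParser's real code. */
final class RangeCriteriaSketch {

    enum Criteria { INTERSECTS, WITHIN, CONTAINS, CROSSES }

    /** Returns true if the doc's range matches the query range under the given criteria. */
    static boolean matches(Criteria c, int[] docMin, int[] docMax, int[] qMin, int[] qMax) {
        switch (c) {
            case INTERSECTS:
                return intersects(docMin, docMax, qMin, qMax);
            case WITHIN: // doc range lies fully inside the query range
                return within(docMin, docMax, qMin, qMax);
            case CONTAINS: // doc range fully encloses the query range
                return within(qMin, qMax, docMin, docMax);
            case CROSSES: // assumption: overlaps the query range without being fully inside it
                return intersects(docMin, docMax, qMin, qMax)
                        && !within(docMin, docMax, qMin, qMax);
            default:
                throw new IllegalArgumentException("Unknown criteria: " + c);
        }
    }

    /** True if the two ranges overlap in every dimension. */
    private static boolean intersects(int[] aMin, int[] aMax, int[] bMin, int[] bMax) {
        for (int d = 0; d < aMin.length; d++) {
            if (aMin[d] > bMax[d] || aMax[d] < bMin[d]) {
                return false;
            }
        }
        return true;
    }

    /** True if the inner range is enclosed by the outer range in every dimension. */
    private static boolean within(int[] inMin, int[] inMax, int[] outMin, int[] outMax) {
        for (int d = 0; d < inMin.length; d++) {
            if (inMin[d] < outMin[d] || inMax[d] > outMax[d]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] qMin = {1, 10};
        int[] qMax = {5, 20};
        // The two documents from the "crosses" example above.
        System.out.println(matches(Criteria.CROSSES, new int[] {2, 11}, new int[] {6, 21}, qMin, qMax));
        System.out.println(matches(Criteria.CROSSES, new int[] {3, 11}, new int[] {5, 19}, qMin, qMax));
    }
}
```

Under these assumptions the sketch reproduces the outcomes described above: [2 TO 3] is within [1 TO 5] while [3 TO 6] and [8 TO 10] are not, and [2,11 TO 6,21] crosses [1,10 TO 5,20] while [3,11 TO 5,19] does not.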
TODO
- renaming of QParser, 'myRange' stinks
- general cleanup
- switch around 'external', 'internal', 'native' representations.
Still a lot of cleanup to be done here, but I thought this was ready to publish as a "draft" so folks can provide feedback on the general approach.
So, I wanted to highlight some design choices for potential reviewers - these are decisions I made in putting this together that 100% need a second set of eyes:
Still TODO
I ended up reversing course on this. The field can now be queried using the standard query parsers (Lucene, etc.). Queries sent this way assume "contains" semantics (i.e. they match documents whose field value entirely contains the query range). As a bit of syntactic sugar, users may specify only a single bound or point as a shorthand.
I'm pretty uncertain about all the naming and other syntax here, so I've marked IntRangeQParser and the underlying field type as 'experimental' for now. The underlying implementation is unlikely to change, but IMO it's very likely we'll be tweaking the syntax a bit over time.
https://issues.apache.org/jira/browse/SOLR-13309
Description
Lucene offers a variety of 'Range' field types, where the value stored in the field is itself a range (e.g. [1 TO 5]). Lucene then allows efficient search on these using its RangeFieldQuery.
Solr offers no similar functionality, despite having access to these underlying Lucene capabilities. We should start exposing these, beginning with what's probably the most popular option, ints.
Solution
This commit adds a new field type, IntRangeField, that can be used to hold singular or multi-dimensional (up to 4) ranges of integers.
Field values are represented using brackets and the "TO" operator, with commas used to delimit dimensions (when a particular field is defined as having more than 1 dimension), e.g.
- [-1 TO 5]
- [1,2 TO 5,10]
- [1 TO 1]
IntRangeField does not support docValues or uninversion, meaning it's primarily only used for querying. The field can be stored and returned in search-results. Searches on these range-fields rely on a new QParserPlugin implementation, {!numericRange}, which supports "intersects", "crosses", "within", and "contains" semantics via a "criteria" local param. e.g.
- {!numericRange field=price_range criteria=within}[1 TO 5]
  Matches docs whose 'price_range' field falls fully within [1 TO 5]. A doc with [2 TO 3] would match; [3 TO 6] or [8 TO 10] would not.
- {!numericRange field=price_range criteria=crosses}[1,10 TO 5,20]
  Matches docs whose 'price_range' field is partially but not fully contained within [1,10 TO 5,20]. A doc with [2,11 TO 6,21] would match, but [3,11 TO 5,19] would not.
Tests
New test classes: IntRangeFieldTest and IntRangeQParserPluginTest.