-
Notifications
You must be signed in to change notification settings - Fork 526
Description
Bug Summary
Searching "query": "time:>=T" returns documents with time < T, where:
Tis a numeric literal containing a valid integer nanosecond unix timestamp whereT % 1000 > 0- Indexed document(s) exist with nanoseconds timestamp
xwhereT-N < x < Tfor "small N" (*) - Field
timein the index is configured withfast_precision: nanosecondsandoutput_format: unix_timestamp_nanos.
(*) Black-box behavior probing gives the impression that timestamps are being erroneously rounded down to lower precisions internally at some point during query execution, but further probing shows that the "boundary" points (the smallest U>T for which time:>=U stops returning documents at time T) are very unusual and don't seem to obviously correspond to either the integer microsecond or floating point nanosecond rounding error boundaries we'd expect. See "Example Repro" section below for more details.
Repro Steps
- Ingest some data with nanosecond precision timestamps (reference index configuration is provided below)
- Select some sample timestamp
Tfrom any document, as long as the last 3 decimal digits ofTare not999. It's fine if the last 3 digits are all000. - Use the REST API to search
"query": "time:>={T+1}", "sort_by":"-time", "max_hits":1"whereTthe sample timestamp selected above. - You will find that the document at time
Tis incorrectly included in the result! - Optional: Experiment with different time offsets (incorrect behavior repros with offsets much larger than 1 nano, though exactly what offset ahead of
Tstops incorrectly returning the document at timeTseems unusual and doesn't align with most normal precision rounding boundaries one would expect).
Example Repro
❯ curl -X POST "http://127.0.0.1:7280/api/v1/fluentbit-logs/search" \
-H "Content-Type: application/json" \
-d '{"max_hits":1,"query":"time:>=1751128201918247001","sort_by":"-time"}'
{
"num_hits": 7354717336,
"hits": [
{
"time": 1751128201918247000
/* ... */
}
],
"elapsed_time_micros": 291577,
"errors": []
}
Bisecting this "time:>=T+N" query to find when Quickwit stops returning the document at time T found this:
❯ curl -X POST "http://127.0.0.1:7280/api/v1/fluentbit-logs/search" \
-H "Content-Type: application/json" \
-d '{"max_hits":1,"query":"time:>=1751128201918247595","sort_by":"-time"}'
...
"time": 1751128201918247000
❯ curl -X POST "http://127.0.0.1:7280/api/v1/fluentbit-logs/search" \
-H "Content-Type: application/json" \
-d '{"max_hits":1,"query":"time:>=1751128201918247596","sort_by":"-time"}'
...
"time": 1751128201918249000
Notice how the query for time:>= (T + 595) still incorrectly returns the document at time T, while querying time:>=(T + 596) finally got it to move on to the (presumably) next document >= T.
Expected behavior
Quickwit should NOT have returned a "hit" document with "time": 1751128201918247000 in response to a query for time:>=1751128201918247595, since 1751128201918247000 < 1751128201918247595 (equivalently 000 < 595).
In general:
- Querying
time:>=Tshould never return documents withtime < T. - Querying
field:>=Xshould never return documents withfield < X.
Some black-box brainstorming
This odd boundary at least suggests that whatever the issue is internally, it's not as simple as integer nanoseconds being rounded down to integer microseconds, and probably not even nanoseconds being simply rounded to a IEEE-753 64-bit float!
If the timestamps were just being rounded into f64, then we'd expect the boundary point to be N > 40, rather than the observed boundary of N > 595 (since (1751128201918247000.0 >= 1751128201918247040.0) == true and (1751128201918247000.0 >= 1751128201918247041.0) == false).
Configuration
> quickwit --version
Quickwit 0.8.2 (x86_64-unknown-linux-gnu 2024-09-03T11:26:51Z 0f28194)
index.yaml
version: 0.8
index_id: fluentbit-logs
doc_mapping:
mode: lenient
field_mappings:
- name: time
type: datetime
input_formats:
- iso8601
output_format: unix_timestamp_nanos
fast_precision: nanoseconds
fast: true
# ...
timestamp_field: time
# ...