Checkpoints
Checkpoints are a simple yet powerful way to store state between the stages of a query, or between queries (for example, successive runs of a scheduled query).
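Usage
Setting a checkpoint
Add the --checkpoint flag to a stage. Its value has the form name:column, and it sets the checkpoint name to the value of column in the first row of that stage's results:
... --checkpoint "api_next_page:result.nextPage"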
Using a checkpoint
$CHECKPOINTS.{CHECKPOINT NAME}$
You can read any checkpoint's value with the token format above, where {CHECKPOINT NAME} is the name given when the checkpoint was set with the --checkpoint flag. For example:
api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$
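To make the two halves concrete, here is a minimal Python sketch of the behavior described on this page. It is not the tool's implementation. The functions set_checkpoint and expand are hypothetical. An unset checkpoint expanding to an empty string follows the scheduled-query description below. Leaving a checkpoint unchanged when a stage returns no rows is an assumption.

```python
import re

# Hypothetical model of checkpoint behavior; NOT the tool's real implementation.
TOKEN = re.compile(r"\$CHECKPOINTS\.([^$]+)\$")


def set_checkpoint(store: dict, spec: str, rows: list) -> None:
    """Apply a --checkpoint "name:column" spec to one stage's results."""
    name, column = spec.split(":", 1)
    # Only the first row is used. If the stage returned no rows, this sketch
    # leaves the checkpoint unchanged (an assumption, not documented behavior).
    if rows:
        store[name] = rows[0].get(column)


def expand(text: str, store: dict) -> str:
    """Replace every $CHECKPOINTS.name$ token; unset checkpoints expand to ""."""
    def replace(match: re.Match) -> str:
        value = store.get(match.group(1))
        return "" if value is None else str(value)
    return TOKEN.sub(replace, text)


store: dict = {}
query = "api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$"
print(expand(query, store))  # nextPage is empty: no checkpoint set yet
set_checkpoint(store, "api_next_page:result.nextPage", [{"result.nextPage": "abc123"}])
print(expand(query, store))  # now ends with nextPage=abc123
```

In this model the checkpoint store is a plain dictionary keyed by checkpoint name. Tokens are substituted as text before the query is sent.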
Use cases
Scheduled Queries
Checkpoints help scheduled queries avoid duplicate work. A scheduled query can store a value from a stage's results as a checkpoint, which the next scheduled run of the query can then use. Typical checkpoint values include the last page visited, a timestamp, or a hash.
Example:
api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$ --checkpoint "api_next_page:result.nextPage"
The first time the query runs, no checkpoint exists yet, so the nextPage value is empty. The --checkpoint flag then sets a checkpoint named api_next_page to the value of the result.nextPage column in the first row of results. The next time the query runs, $CHECKPOINTS.api_next_page$ retrieves that checkpoint, so nextPage uses the stored value.
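The run-to-run flow of the scheduled query above can be simulated in Python. This is a hypothetical sketch, not the tool's scheduler. The fake paginated API (fake_api, PAGES), its page tokens, and saving checkpoints as a JSON string between runs are all made-up assumptions. Only the token format, the name:column spec, and the first-row rule come from this page.

```python
import json
import re

# Hypothetical simulation of a scheduled query across runs. The fake API,
# its page data, and saving checkpoints as a JSON string are assumptions.
TOKEN = re.compile(r"\$CHECKPOINTS\.([^$]+)\$")
QUERY = "api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$"
SPEC = "api_next_page:result.nextPage"

# Fake paginated API: page token -> (next page token, items on this page).
# The last page points at itself, so later runs find nothing new.
PAGES = {"": ("p2", ["a", "b"]), "p2": ("p3", ["c"]), "p3": ("p3", [])}


def fake_api(request: str) -> list:
    """Return one result row for the page named in the nextPage parameter."""
    page = request.rsplit("nextPage=", 1)[1]
    next_page, items = PAGES[page]
    return [{"result.nextPage": next_page, "result.items": items}]


def run_once(saved: str) -> tuple:
    """One scheduled run: load checkpoints, run the query, save checkpoints."""
    store = json.loads(saved) if saved else {}
    # Expand tokens; a checkpoint that doesn't exist yet becomes "".
    request = TOKEN.sub(lambda m: str(store.get(m.group(1), "")), QUERY)
    rows = fake_api(request)
    name, column = SPEC.split(":", 1)
    if rows:
        store[name] = rows[0][column]  # first row only
    return json.dumps(store), rows


saved = ""  # nothing has been saved before the first run
for run in range(1, 5):
    saved, rows = run_once(saved)
    print(run, rows[0]["result.items"])
# 1 ['a', 'b']
# 2 ['c']
# 3 []
# 4 []
```

Each run starts where the previous one stopped, so no page is fetched twice. That is the duplicate work the checkpoint avoids.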
Between Stages
Although there are ways to maintain state across stages (see --enrich, --labelStage, --appendStage, etc.), checkpoints provide a simple way to set a value from a stage's results and access that value in a later stage.
Example:
api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$ --checkpoint "api_next_page:result.nextPage"
|| ...
|| ...
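|| addcolumn api_next_page $CHECKPOINTS.api_next_page$
Here the first stage sets the api_next_page checkpoint, and the final stage reads it back with $CHECKPOINTS.api_next_page$ to add an api_next_page column holding that value.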
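The stage-to-stage flow can be modeled in Python as a rough sketch. It assumes each stage's $CHECKPOINTS...$ tokens are expanded just before that stage runs, which this page does not state explicitly. The stage functions, row data, and the column-dropping middle stage are hypothetical stand-ins. The middle stage only shows that, in this model, the checkpoint stays readable after the result.nextPage column has been dropped from the rows.

```python
import re

# Hypothetical model of a multi-stage query sharing one checkpoint store.
# Stage functions and data are made up; only the token format and the
# first-row rule come from this page.
TOKEN = re.compile(r"\$CHECKPOINTS\.([^$]+)\$")


def expand(text: str, store: dict) -> str:
    """Replace $CHECKPOINTS.name$ tokens with stored values ("" if unset)."""
    return TOKEN.sub(lambda m: str(store.get(m.group(1), "")), text)


def api_stage(_rows: list, store: dict) -> list:
    # Stands in for: api get ... --checkpoint "api_next_page:result.nextPage"
    rows = [{"id": 1, "result.nextPage": "p2"}, {"id": 2, "result.nextPage": "p2"}]
    store["api_next_page"] = rows[0]["result.nextPage"]  # first row only
    return rows


def middle_stage(rows: list, store: dict) -> list:
    # Stands in for the elided "|| ..." stages; this one keeps only "id".
    return [{"id": row["id"]} for row in rows]


def addcolumn_stage(rows: list, store: dict) -> list:
    # Stands in for: addcolumn api_next_page $CHECKPOINTS.api_next_page$
    # The token is expanded when this stage runs, after the first stage set it.
    value = expand("$CHECKPOINTS.api_next_page$", store)
    return [{**row, "api_next_page": value} for row in rows]


store: dict = {}
rows: list = []
for stage in (api_stage, middle_stage, addcolumn_stage):
    rows = stage(rows, store)
print(rows)
# [{'id': 1, 'api_next_page': 'p2'}, {'id': 2, 'api_next_page': 'p2'}]
```

The store sits outside the rows, so intermediate stages can reshape the results freely without losing the value.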