Key Commands
The crul query language is extensive and supports many specific operations for transforming a data set into what you need. However, a much smaller set of key commands accounts for most of what you will use when authoring queries.
These key commands fall into four categories. Each category and its key commands are described in more detail below.
Key Data Retrieval Commands
Data retrieval commands are used to retrieve data from a web source, such as an API, web page, etc.
api
The api command makes REST requests to an endpoint and returns the response in tabular form. It currently supports XML, CSV, and JSON response formats, and returns other formats as raw data.
api get https://api.github.com/orgs/netflix/members
open
The open command opens a web page, fulfills network requests, renders JavaScript, and so on, then processes the page's content into a tabular data set.
open https://news.ycombinator.com
requests
The requests command opens a web page and monitors its network requests. Once the page has fully loaded, the response includes a rich data set covering request sources and destinations, full request and response payloads, timing data, and more.
requests https://news.ycombinator.com
addcolumn
The addcolumn command adds a new column to a data set. The value can include tokens (a column name wrapped in $ characters) so that the new column contains the values of other columns in the same row:
addcolumn newcolumn "the oldcolumn value is: $oldcolumn$"
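To make the token substitution concrete, here is a minimal Python sketch of the idea. It is not crul's implementation: the function name add_column is hypothetical, and leaving unknown tokens untouched is an assumption made for illustration.

```python
import re

def add_column(rows, name, template):
    """Return new rows with column `name` set to `template`, where each
    $col$ token is replaced by that row's value for `col`.
    Assumption: tokens naming a missing column are left unchanged."""
    def render(row):
        return re.sub(
            r"\$([^$\s]+)\$",
            lambda m: str(row[m.group(1)]) if m.group(1) in row else m.group(0),
            template,
        )
    return [{**row, name: render(row)} for row in rows]

rows = [{"oldcolumn": "a"}, {"oldcolumn": "b"}]
result = add_column(rows, "newcolumn", "the oldcolumn value is: $oldcolumn$")
print(result[0]["newcolumn"])  # the oldcolumn value is: a
```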
seed
The seed command takes an array of JSON objects, each of which becomes a row to process.
seed '[{"col1": "val1", "col2": "val2"},{"col1": "val3", "col2": "val4"}]'
Key Data Filtering Commands
Data filtering commands are used to limit our data set according to keywords and specific values.
find
The find command does a text search for the provided string in each row. A row is included in the results if the string exists anywhere in the row.
open https://news.ycombinator.com
|| find comments
filter
The filter command runs more complex expressions that compare column values in each row to one another or to specific values, combined with boolean logic. See the filter expressions documentation for more details on constructing filters.
open https://news.ycombinator.com
|| filter '(nodeName == "A")'
table
The table command keeps only the provided columns. This is a great way to reduce a data set for better performance, or to produce a clean final set of results.
open https://news.ycombinator.com
|| table nodeName attributes.href innerText
head
The head command includes only the first N rows of results. This is helpful when developing queries, as it lets you limit expansion while testing. For example, you may have a result set with 20 links that you want to expand, but to test, you might first use ... || head 1 prior to the api/open/requests command to limit the expansion to only the first link.
devices
|| head 3
unique
The unique command removes rows with duplicate values in a particular column.
open https://news.ycombinator.com
|| unique nodeName
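The deduplication described above can be sketched in Python as follows. This is an illustration only, not crul's implementation: the helper name unique mirrors the command for readability, and keeping the first row seen for each value is an assumption.

```python
def unique(rows, column):
    """Keep one row per distinct value in `column`.
    Assumption: the first row seen for each value is the one kept.
    Values must be hashable for this sketch to work."""
    seen = set()
    kept = []
    for row in rows:
        value = row.get(column)
        if value not in seen:
            seen.add(value)
            kept.append(row)
    return kept

rows = [
    {"nodeName": "A", "innerText": "x"},
    {"nodeName": "DIV", "innerText": "y"},
    {"nodeName": "A", "innerText": "z"},
]
deduped = unique(rows, "nodeName")
print([r["nodeName"] for r in deduped])  # ['A', 'DIV']
```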
contains
The contains command offers finer-grained filtering than the find command: you specify a column to check for a particular string, and only rows containing that string in that column are included in the results. Also see the regex command for similar functionality with regex patterns, and the excludes command for the opposite use case (removing rows that contain a string).
devices
|| contains name "iP"
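The difference between find and contains can be sketched in Python. This is only an illustration of the semantics described above, not crul's implementation; the sample rows are made up, and two details are assumptions: matching is case-sensitive, and find looks at column values only.

```python
def find(rows, needle):
    """Keep rows where `needle` appears in any column value (assumed scope)."""
    return [r for r in rows if any(needle in str(v) for v in r.values())]

def contains(rows, column, needle):
    """Keep rows where `needle` appears in the given column only."""
    return [r for r in rows if needle in str(r.get(column, ""))]

# Hypothetical rows, loosely modeled on a device list.
rows = [
    {"name": "iPhone 12", "note": "phone"},
    {"name": "Pixel 5", "note": "not an iPad"},
    {"name": "Galaxy", "note": "android"},
]
anywhere = find(rows, "iP")               # matches on name or note
in_name = contains(rows, "name", "iP")    # matches on name only
print(len(anywhere), len(in_name))  # 2 1
```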
Key Data Import/Export Commands
Data import/export commands are used to either start a query by importing data from an existing local or external data set, or to export results to local or external stores (such as Kafka, Amazon S3, etc.)
freeze
The freeze command either stores results locally in a file, which can later be thawed (see below), or pushes results to a preconfigured store, such as a Kafka topic or an Amazon S3 bucket, using the --store flag. Freezing locally overwrites the frozen results if they already exist.
open https://news.ycombinator.com
|| freeze hn_home_page_raw
thaw
The thaw command fills a query pipeline with results that were previously frozen locally. It can also thaw frozen results from read/write external stores, such as Amazon S3.
If you upload your own JSON, NDJSON, or CSV file to the cellar, you can also fill a query pipeline with it using the thaw command.
open https://news.ycombinator.com
|| freeze hn_home_page_raw
|| thaw hn_home_page_raw
Key Data Processing Commands
Data processing commands are used to convert/flatten, merge, or compare data sets.
normalize
The normalize command takes a column containing an array and expands each element of the array into its own row, keeping the top-level columns in each new row. If you have a large number of columns with keys like data.0.thing1, data.1.thing1, data.2.thing1 (note the array index: 0, 1, 2, ...), you can normalize the data column.
api get https://pokeapi.co/api/v2/pokemon
|| normalize results
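The row expansion can be sketched in Python. This is a rough illustration of the idea, not crul's implementation. How crul names the resulting columns is not specified here, so merging each element's keys directly into the row is an assumption, as is the sample data.

```python
def normalize(rows, column):
    """Expand the array in `column` into one row per element,
    copying the other top-level columns into every new row.
    Assumption: dict elements are merged into the row by key;
    non-dict elements are stored back under `column`."""
    out = []
    for row in rows:
        top = {k: v for k, v in row.items() if k != column}
        for element in row.get(column, []):
            if isinstance(element, dict):
                out.append({**top, **element})
            else:
                out.append({**top, column: element})
    return out

# Hypothetical response shape: one row holding an array of results.
rows = [{"count": 2, "results": [{"name": "bulbasaur"}, {"name": "ivysaur"}]}]
expanded = normalize(rows, "results")
print(len(expanded))  # 2
```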
join
The join command joins two data sets on a shared key. It relies on labeled stages, which means you must first label an earlier stage before joining it with the results of the current stage.
seed '[{"shared": "hi", "unique1": "value1"}]' --labelStage "joinme"
|| seed '[{"shared": "hi", "unique2": "value2"}]'
|| join shared joinme
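As a rough mental model, joining on a shared key can be sketched in Python using the same rows as the query above. This is not crul's implementation: the sketch assumes inner-join behavior (rows without a match are dropped) and lets the current stage's values win when column names collide.

```python
def join(current, labeled, key):
    """Combine rows from `current` and `labeled` that share the same `key` value.
    Assumptions: inner join; on column collisions the current row's value wins."""
    index = {}
    for row in labeled:
        if key in row:
            index.setdefault(row[key], []).append(row)
    joined = []
    for row in current:
        if key in row:
            for match in index.get(row[key], []):
                joined.append({**match, **row})
    return joined

labeled_stage = [{"shared": "hi", "unique1": "value1"}]   # the stage labeled "joinme"
current_stage = [{"shared": "hi", "unique2": "value2"}]
joined = join(current_stage, labeled_stage, "shared")
print(joined)
```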
sort
The sort command sorts results by the provided column. Use the --order flag to choose ascending or descending order.
devices
|| sort viewport.width
diff
The diff command returns results that do not exist in a previous set of results. This is particularly powerful in combination with the freeze command, as it allows only new results to be pushed.
seed '[{"shared": "hi"}]'
|| freeze todiff
|| seed '[{"shared": "hi"},{"new": "hello"}]'
|| diff todiff
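The "only new results" behavior can be sketched in Python with the same rows as the query above. This is an illustration, not crul's implementation; comparing whole rows (rather than a chosen column) is an assumption.

```python
import json

def diff(current, previous):
    """Return rows in `current` that do not appear in `previous`.
    Assumption: rows are compared as whole objects, ignoring key order."""
    def fingerprint(row):
        return json.dumps(row, sort_keys=True)
    seen = {fingerprint(r) for r in previous}
    return [r for r in current if fingerprint(r) not in seen]

frozen = [{"shared": "hi"}]                        # the earlier, frozen results
latest = [{"shared": "hi"}, {"new": "hello"}]      # the current results
new_rows = diff(latest, frozen)
print(new_rows)  # [{'new': 'hello'}]
```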