Expanding links from a webpage (Hacker News)
A common use case for the crul query language is using expanding stages to open many pages linked from a single webpage and return the results as one consolidated data set. For example, a recipe site might list many recipes in a recipe directory. We can use crul to collect the links to every recipe in the directory, then expand each of those links and filter for the recipe ingredients, or for whatever other data we need.
Example: Hacker News Comments
Full Query
open https://news.ycombinator.com/news
|| find comments
|| filter "(nodeName == 'A') and (parentElement.attributes.class == 'subline')"
|| open $attributes.href$
|| filter "(attributes.class == 'comment')"
Stages 1-3: Filtering for specific links
open https://news.ycombinator.com/news
|| find comments
|| filter "(nodeName == 'A') and (parentElement.attributes.class == 'subline')"
The first stage opens the Hacker News front page and processes it into a tabular structure. Think of crul as a browser that opens the page, renders its content, fulfills network requests, and so on, then converts that rendered content into a tabular format.
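To make the "tabular structure" idea concrete, here is a minimal Python sketch that flattens raw HTML into one row per element using only the standard library's html.parser. It is a mental model, not how crul works internally. It skips rendering, JavaScript, and network requests. The row shape is an assumption chosen to mirror the field names used in the query (nodeName, attributes, parentElement), plus a hypothetical text column that holds each element's direct text.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not kept on the open-element stack.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class RowExtractor(HTMLParser):
    """Flatten an HTML document into one dict ("row") per element. Illustrative only."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.stack = []  # rows for elements that are currently open

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        row = {
            "nodeName": tag.upper(),  # browsers report HTML tag names in upper case
            "attributes": dict(attrs),
            "parentElement": {"attributes": parent["attributes"]} if parent else None,
            "text": "",  # hypothetical column: direct text of this element only
        }
        self.rows.append(row)
        if tag not in VOID_TAGS:
            self.stack.append(row)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["nodeName"] == tag.upper():
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack:
            self.stack[-1]["text"] += data


def page_to_rows(html):
    """Parse an HTML string and return its elements as a flat list of rows."""
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Feeding it a fragment such as '<span class="subline"><a href="item?id=1">42 comments</a></span>' yields a SPAN row and an A row whose parentElement.attributes.class is subline. That is the same shape the later filter stage relies on.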
Next, we use find to search for the keyword comments. This narrows our data set to only the rows that contain the string comments somewhere in their values.
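A rough Python equivalent of this find step could look like the following. It assumes a case-sensitive substring match against every value in the row, including nested ones. crul's exact matching rules may differ. The rows are plain dicts, and the sample data is made up.

```python
def contains_keyword(value, keyword):
    """Return True if keyword appears in any leaf value, searching nested dicts and lists."""
    if isinstance(value, dict):
        return any(contains_keyword(v, keyword) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_keyword(v, keyword) for v in value)
    return value is not None and keyword in str(value)


def find_rows(rows, keyword):
    """Keep only the rows that mention keyword somewhere in their values."""
    return [row for row in rows if contains_keyword(row, keyword)]


# Hypothetical rows, shaped like the output of the open stage.
rows = [
    {"nodeName": "A", "attributes": {"href": "item?id=1"}, "text": "42 comments"},
    {"nodeName": "A", "attributes": {"href": "https://example.com"}, "text": "A story"},
    {"nodeName": "DIV", "attributes": {"class": "comment"}, "text": "Nice post"},
]
matches = find_rows(rows, "comments")  # only the "42 comments" link survives
```

Because this is a plain substring test, the DIV with class comment is dropped. It contains "comment" but not "comments". This is one reason a more precise filter follows.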
We then apply a filter expression that narrows the result set to just the links to comment sections: anchor elements (nodeName == 'A') whose parent element has the class subline. We now have a list of comment links to pass into our next expanding stage.
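The filter expression compares values at nested paths on each row. Here is a minimal Python version of the same predicate. It assumes rows are nested dicts. It also assumes that a missing step in a path, such as an element with no parent, simply makes the comparison false. The example rows are hypothetical.

```python
def get_path(row, path):
    """Follow a dotted path like 'parentElement.attributes.class'; None if any step is missing."""
    value = row
    for key in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def is_comment_link(row):
    # Mirrors: (nodeName == 'A') and (parentElement.attributes.class == 'subline')
    return (
        get_path(row, "nodeName") == "A"
        and get_path(row, "parentElement.attributes.class") == "subline"
    )


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


rows = [
    {"nodeName": "A", "attributes": {"href": "item?id=1"},
     "parentElement": {"attributes": {"class": "subline"}}},
    {"nodeName": "A", "attributes": {"href": "from?site=example.com"},
     "parentElement": {"attributes": {"class": "sitebit"}}},
    {"nodeName": "SPAN", "attributes": {"class": "subline"}, "parentElement": None},
]
links = [row["attributes"]["href"] for row in filter_rows(rows, is_comment_link)]
```

Only the first row passes. The second row is a link under a different parent, and the third row is the subline container itself rather than a link inside it.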
Stages 4-5: Opening links and filtering for comments
...
|| open $attributes.href$
|| filter "(attributes.class == 'comment')"
With our list of comment links, we open each link asynchronously (throttled by domain policy and by the number of available browser workers), then filter the results to include only the elements on each page whose class is comment, that is, the elements that contain a comment.
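The expanding open stage fans out from many link rows to many pages, then merges the results back into one data set. The asyncio sketch below approximates that behavior. Everything related to throttling is an assumption. max_workers stands in for the available browser workers, and per_domain stands in for a domain policy. fake_open is a hypothetical page opener that returns canned rows instead of touching the network. Relative hrefs (for example item?id=1) are resolved against the front-page URL with urljoin, the way a browser resolves them.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urljoin, urlparse

BASE_URL = "https://news.ycombinator.com/news"


async def fake_open(url):
    """Hypothetical page opener: returns canned rows instead of rendering url."""
    await asyncio.sleep(0.01)  # pretend to wait on the network
    item = url.rsplit("=", 1)[-1]
    return [
        {"nodeName": "DIV", "attributes": {"class": "comment"}, "text": f"comment on {item}", "source": url},
        {"nodeName": "A", "attributes": {"class": "hnuser"}, "text": "someone", "source": url},
    ]


async def expand_and_filter(hrefs, open_page, max_workers=4, per_domain=2):
    """Open every href concurrently under two limits, then merge and filter the rows."""
    workers = asyncio.Semaphore(max_workers)  # global cap: available browser workers
    domain_limits = defaultdict(lambda: asyncio.Semaphore(per_domain))  # cap per host

    async def open_one(href):
        url = urljoin(BASE_URL, href)
        # Always take the domain slot before a worker slot so acquisition order is consistent.
        async with domain_limits[urlparse(url).netloc], workers:
            return await open_page(url)

    pages = await asyncio.gather(*(open_one(h) for h in hrefs))  # results keep input order
    # Consolidate all pages into one data set and keep only class == 'comment' rows.
    return [
        row
        for page_rows in pages
        for row in page_rows
        if (row.get("attributes") or {}).get("class") == "comment"
    ]


if __name__ == "__main__":
    comments = asyncio.run(expand_and_filter(["item?id=1", "item?id=2"], fake_open))
    for row in comments:
        print(row["source"], row["text"])
```

The two semaphores separate two kinds of limit: a cap on total concurrency, and a cap on how hard any single host is hit. Every link in this example points at the same host, so per_domain is the limit that actually binds.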
We now have a data set of most comments from the top postings on Hacker News.
Note: Some comments may be missing because they are hidden in expandable sections, but handling those is beyond the scope of this example!