Converting an HTML Table to a Dataset
HTML tables arrange data into rows and columns. Because every table is made up of cells grouped into rows, it can be converted to a dataset with the parseHTMLTable command.
Let's take a look at a pair of examples.
Example 1
Full Query
echo "<table>
<tr>
<th>Company</th>
<th>Contact</th>
<th>Country</th>
</tr>
<tr>
<td>Alfreds Futterkiste</td>
<td>Maria Anders</td>
<td>Germany</td>
</tr>
<tr>
<td>Centro comercial Moctezuma</td>
<td>Francisco Chang</td>
<td>Mexico</td>
</tr>
</table>"
|| parseHTMLTable echo
Stage 1: Making the sample HTML table
echo "<table>
<tr>
<th>Company</th>
<th>Contact</th>
<th>Country</th>
</tr>
<tr>
<td>Alfreds Futterkiste</td>
<td>Maria Anders</td>
<td>Germany</td>
</tr>
<tr>
<td>Centro comercial Moctezuma</td>
<td>Francisco Chang</td>
<td>Mexico</td>
</tr>
</table>"
The echo command creates a column called echo whose value is the first argument passed to it, here the sample HTML table.
Stage 2: Converting the HTML table into a dataset
...
|| parseHTMLTable echo
This stage uses the parseHTMLTable command to construct a dataset from the value of the echo column.
The HTML table is parsed and converted into a dataset with five columns: Company, Contact, Country, hash, and sequence. Company, Contact, and Country are the HTML table's columns. hash is the MD5 hash of a row's values. sequence is the zero-based ordinal position of the row.
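To make the shape of the resulting dataset concrete, here is a minimal Python sketch of the same idea. It is not the implementation of parseHTMLTable. The names TableRowCollector and html_table_to_records are hypothetical, and two details are assumptions: the hash is computed over the cell values joined with "|", and sequence counts data rows only, starting after the header row.

```python
import hashlib
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed


def html_table_to_records(html):
    """Turn the first row into column names and the rest into records."""
    collector = TableRowCollector()
    collector.feed(html)
    collector.close()
    if not collector.rows:
        return []
    header, *body = collector.rows
    records = []
    for sequence, cells in enumerate(body):
        record = dict(zip(header, cells))
        # Assumption: MD5 over the "|"-joined values; the real command may differ.
        record["hash"] = hashlib.md5("|".join(cells).encode("utf-8")).hexdigest()
        record["sequence"] = sequence  # zero-based position of the data row
        records.append(record)
    return records


sample = """<table>
<tr><th>Company</th><th>Contact</th><th>Country</th></tr>
<tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr>
<tr><td>Centro comercial Moctezuma</td><td>Francisco Chang</td><td>Mexico</td></tr>
</table>"""

records = html_table_to_records(sample)
for record in records:
    print(record)
```

Each printed record carries the three table columns plus hash and sequence, mirroring the five columns described above.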
Example 2
Full Query
open https://www.w3schools.com/html/html_tables.asp --html
|| filter "(nodeName == 'TABLE')"
|| head 1
|| parseHTMLTable outerHTML
Stage 1: Open a web page
open https://www.w3schools.com/html/html_tables.asp --html
Open a web page in a browser and wait for all JavaScript and external assets to load. We use the --html flag to include the HTML source of the rendered web page for each returned element.
NOTE: The --html flag has speed implications as it includes both the outerHTML and innerHTML per element.
Stage 2-3: Filtering for the first table
...
|| filter "(nodeName == 'TABLE')"
|| head 1
The filter matches all rows that are TABLE elements. We pluck out the first table by limiting the rows returned with the head command, passing 1 as the limit.
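For readers more familiar with general-purpose code, the two stages behave roughly like the Python sketch below. The element rows are hypothetical stand-ins for what the open stage returns; only the nodeName and outerHTML column names come from the query above.

```python
from itertools import islice

# Hypothetical element rows, shaped like the output of the open stage.
elements = [
    {"nodeName": "DIV", "outerHTML": "<div>intro</div>"},
    {"nodeName": "TABLE", "outerHTML": "<table><tr><td>first</td></tr></table>"},
    {"nodeName": "TABLE", "outerHTML": "<table><tr><td>second</td></tr></table>"},
]

# filter "(nodeName == 'TABLE')": keep only the rows that are TABLE elements.
tables = (row for row in elements if row["nodeName"] == "TABLE")

# head 1: keep at most one row, the first match.
first_table = list(islice(tables, 1))

print(first_table[0]["outerHTML"])
```

Only the first matching row survives, so the next stage receives exactly one table's source.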
Stage 4: Parse/convert the HTML table to a dataset
...
|| parseHTMLTable outerHTML
Using the parseHTMLTable command, we convert the full HTML table source found in the outerHTML column into a dataset.
NOTE: outerHTML contains the element itself along with its inner contents, whereas innerHTML contains its inner contents only.
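The difference can be illustrated with a short Python sketch. It uses the standard library's XML parser rather than a browser DOM, so it assumes well-formed markup, and the helper names outer_html and inner_html are hypothetical.

```python
import xml.etree.ElementTree as ET


def outer_html(element):
    """Serialize the element itself together with everything inside it."""
    return ET.tostring(element, encoding="unicode")


def inner_html(element):
    """Serialize only what sits inside the element: its text and children."""
    parts = [element.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)


table = ET.fromstring("<table><tr><td>Germany</td></tr></table>")

print(outer_html(table))  # <table><tr><td>Germany</td></tr></table>
print(inner_html(table))  # <tr><td>Germany</td></tr>
```

Because the inner form drops the enclosing table tags, the outerHTML column is the one that holds a complete table for parseHTMLTable to read.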