Using gwebcmd

Basics

gwebcmd solves two tasks:

  1. Downloading data
  2. Converting data to CSV

You may use a single WebToCSV command or two commands: WebToText and TextToCSV.

For example:

gwebcmd.exe WebToCSV http://www.nasdaq.com/symbol/aapl/dividend-history aapl.csv

gwebcmd.exe WebToText http://www.nasdaq.com/symbol/aapl/dividend-history aapl.htm

gwebcmd.exe TextToCSV aapl.htm aapl.csv

gwebcmd allows converting HTML, XML, JSON, CSV, and plain text files.

Below we discuss specific topics including tuning parsers.

You may find examples of various cases in the downloaded package.

Parsing HTML

gwebcmd allows extracting data from HTML tables.

You may tune the parser using the /rootPath option with a table number.

For example, you may download the dividend history page from nasdaq.com like this:

gwebcmd.exe WebToText http://www.nasdaq.com/symbol/aapl/dividend-history aapl.htm

Then you may get available tables using the command:

gwebcmd.exe HtmlTables aapl.htm tables.htm

Open tables.htm and find the number of the desired table. Then use a command like this:

gwebcmd.exe TextToCSV aapl.htm aapl.csv /rootPath=6

or

gwebcmd.exe WebToCSV http://www.nasdaq.com/symbol/aapl/dividend-history aapl.csv /rootPath=6

Parsing XML

gwebcmd allows extracting regular data from XML.

For example, we have the following file, test.xml:

<?xml version="1.0" encoding="utf-8"?>
<root>
    <parent>
        <row id="1"><f1>data11</f1><f2>data12</f2></row>
        <row id="2"><f1>data21</f1><f2>data22</f2></row>
        <row id="3"><f1>data31</f1><f2>data32</f2></row>
    </parent>
</root>

If we run the basic command

gwebcmd.exe TextToCSV test.xml test.csv 

we get the following result in test.csv:

id;"f1";"f2"
1;"data11";"data12"
2;"data21";"data22"
3;"data31";"data32"

gwebcmd selects the "best" root of the data. This command is equivalent to

gwebcmd.exe TextToCSV test.xml test.csv /rootPath=root.parent.row

You may change the root path to get data from the specified node. For example:

gwebcmd.exe TextToCSV test.xml test.csv /rootPath=root.parent

returns the following:

row.id;"row.f1";"row.f2"
1;"data11";"data12"
2;"data21";"data22"
3;"data31";"data32"

You may exclude specific nodes with the /skippedNodes option. For example:

gwebcmd.exe TextToCSV test.xml test.csv /rootPath=root.parent /skippedNodes=row.f2

returns:

row.id;"row.f1"
1;"data11"
2;"data21"
3;"data31"
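
To make the idea of a root path more concrete, here is a minimal Python sketch of the same flattening. It is not gwebcmd's implementation: the function names are hypothetical, the quoting rule (quote a column unless all its values are numeric) is only inferred from the sample output above, and the sketch does not reproduce the row. prefixes that gwebcmd adds for a higher-level root.

```python
import xml.etree.ElementTree as ET

# Same records as test.xml above (XML declaration omitted for brevity).
XML_TEXT = """<root>
    <parent>
        <row id="1"><f1>data11</f1><f2>data12</f2></row>
        <row id="2"><f1>data21</f1><f2>data22</f2></row>
        <row id="3"><f1>data31</f1><f2>data32</f2></row>
    </parent>
</root>"""


def select_rows(xml_text, root_path):
    """Return the elements addressed by a dotted path such as 'root.parent.row'."""
    root = ET.fromstring(xml_text)
    first, *rest = root_path.split(".")
    if first != root.tag:
        raise ValueError(f"document element is {root.tag!r}, not {first!r}")
    return root.findall("/".join(rest)) if rest else [root]


def rows_to_table(rows, skipped=()):
    """Flatten each row: attributes first, then child elements, minus skipped names."""
    header, records = [], []
    for row in rows:
        record = dict(row.attrib)
        record.update({child.tag: (child.text or "") for child in row})
        for name in record:
            if name not in header and name not in skipped:
                header.append(name)
        records.append(record)
    return header, [[rec.get(name, "") for name in header] for rec in records]


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def to_csv(header, body, sep=";"):
    """Quote a column unless all its values are numeric (an inferred rule)."""
    numeric = [all(is_number(r[i]) for r in body) for i in range(len(header))]

    def fmt(value, i):
        return value if numeric[i] else '"' + value.replace('"', '""') + '"'

    lines = [header] + body
    return "\n".join(sep.join(fmt(v, i) for i, v in enumerate(line)) for line in lines)


header, body = rows_to_table(select_rows(XML_TEXT, "root.parent.row"))
print(to_csv(header, body))

# Dropping a child node, similar in spirit to /skippedNodes.
header, body = rows_to_table(select_rows(XML_TEXT, "root.parent.row"), skipped={"f2"})
print(to_csv(header, body))
```

The sketch treats attributes and child elements of each row alike, which is how the id attribute ends up as the first column.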

Parsing JSON

Parsing JSON is similar to parsing XML, as both formats are hierarchical.

For example, a file has the content:

{"root":
  {"parent":[
    {"id":1,"f1":"data11","f2":"data12"},
    {"id":2,"f1":"data21","f2":"data22"},
    {"id":3,"f1":"data31","f2":"data32"}
  ]}
}

Run the command:

gwebcmd.exe TextToCSV test.json test.csv

The command returns:

id;"f1";"f2"
1;"data11";"data12"
2;"data21";"data22"
3;"data31";"data32"
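
How gwebcmd chooses the root of a JSON document is not described here. As an illustration only, the Python sketch below assumes a simple rule: take the first array of objects found depth-first and write it as rows. The function names are hypothetical, and the output is close to the sample above but not identical (the csv module quotes the id header as well).

```python
import csv
import io
import json

# Same records as the sample file above.
JSON_TEXT = """{"root":
  {"parent":[
    {"id":1,"f1":"data11","f2":"data12"},
    {"id":2,"f1":"data21","f2":"data22"},
    {"id":3,"f1":"data31","f2":"data32"}
  ]}
}"""


def find_records(node):
    """Return the first non-empty list of dicts found depth-first, or None."""
    if isinstance(node, list) and node and all(isinstance(x, dict) for x in node):
        return node
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        children = []
    for child in children:
        found = find_records(child)
        if found is not None:
            return found
    return None


def json_to_csv(text, sep=";"):
    records = find_records(json.loads(text)) or []
    # Collect column names in first-seen order.
    header = []
    for rec in records:
        for key in rec:
            if key not in header:
                header.append(key)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=sep, quoting=csv.QUOTE_NONNUMERIC,
                        lineterminator="\n")
    writer.writerow(header)
    for rec in records:
        writer.writerow([rec.get(key, "") for key in header])
    return out.getvalue()


print(json_to_csv(JSON_TEXT))
```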

Parsing Plain Text

gwebcmd can extract data from plain text files if the text is laid out in visually aligned ("visible") columns like these:

id  f1      f2
1   data11  data12
2   data21  data22
3   data31  data32

Just use the standard command:

gwebcmd.exe TextToCSV test.txt test.csv
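
One way to picture "visible" columns is as runs of character positions separated by gaps that are blank on every line. The Python sketch below implements that idea. It is an assumption about the general technique, not a description of gwebcmd's parser; the function name is hypothetical, and the sketch handles spaces only, not tabs.

```python
SAMPLE = """id  f1      f2
1   data11  data12
2   data21  data22
3   data31  data32
"""


def split_fixed_columns(text):
    """Split aligned text into cells using positions that are blank on every line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    width = max(len(ln) for ln in lines)
    padded = [ln.ljust(width) for ln in lines]
    # A position is a gap if every line has a space there.
    gap = [all(ln[i] == " " for ln in padded) for i in range(width)]
    spans, start = [], None
    for i, is_gap in enumerate(gap + [True]):  # trailing True closes the last span
        if not is_gap and start is None:
            start = i
        elif is_gap and start is not None:
            spans.append((start, i))
            start = None
    return [[ln[a:b].strip() for a, b in spans] for ln in padded]


for row in split_fixed_columns(SAMPLE):
    print(";".join(row))
```

Unlike a plain whitespace split, this approach keeps a cell with an internal space in one piece, as long as other rows have text at that position.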

Parsing CSV

You may also use gwebcmd to convert CSV files: change separators and encoding, and add calculated columns, like this:

gwebcmd.exe TextToCSV test.csv test-65001.csv /outputEncoding=65001 /add=Symbol=AAPL;Date=Date()

See available options here.

Customizing Output CSV

gwebcmd has useful options to customize the output.

The most useful options are:

/datetimeformat=<format>

/separator=<separator>|Tab

/add=<header=value>[<separator>...]

For example:

gwebcmd.exe WebToCSV https://finance.yahoo.com/q/hp?s=AAPL aapl.csv /datetimeformat=yyyy-MM-dd /separator=; /add=Symb=AAPL
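
As a rough illustration of what /add and /separator do to the output, the Python sketch below appends constant columns to every row and joins the cells with the chosen separator. The function names are hypothetical, the sample rows are made up for the example, and date reformatting is left out because the input date formats are not described here.

```python
def add_constant_columns(rows, additions):
    """rows[0] is the header; additions maps a new header to its constant value."""
    if not rows:
        return []
    header = rows[0] + list(additions)
    body = [row + list(additions.values()) for row in rows[1:]]
    return [header] + body


def render(rows, separator=";"):
    """Join cells with the separator; the keyword Tab stands for a tab character."""
    sep = "\t" if separator == "Tab" else separator
    return "\n".join(sep.join(row) for row in rows)


# Illustrative rows only, not real quotes.
rows = [["Date", "Close"], ["2015-01-02", "100.00"], ["2015-01-05", "101.00"]]
print(render(add_constant_columns(rows, {"Symb": "AAPL"}), separator=";"))
```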

Batch Files

Here is a simple batch file to get the data for the list of tickers from the tickers.txt file:

@echo off

@for /F %%i in (tickers.txt) do (
    echo %%i
    gwebcmd.exe WebToCSV https://finance.yahoo.com/q/hp?s=%%i %%i.csv /datetimeformat=yyyy-MM-dd /add=Symb=%%i
    gwebcmd.exe sleep 500
)

Use the sleep command to add a delay between requests, so that web servers do not ban your IP address.

You may do the same job more easily using gwebcmd task files.

For example, create the task.txt file with the following content:

CsvFileName  URL
csv\aapl.csv http://ichart.finance.yahoo.com/table.csv?s=AAPL&ignore=.csv
csv\goog.csv http://ichart.finance.yahoo.com/table.csv?s=GOOG&ignore=.csv

and the batch file can be much simpler:

@echo off

gwebcmd.exe WebToCSV @task.txt /datetimeformat=yyyy-MM-dd /delay=500

With task files, use the /delay option instead of the sleep command to add a delay between requests.
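
To show what a task file expresses, the Python sketch below reads the same task.txt content and builds one WebToCSV command line per row, with a pause between runs. It is only a sketch under assumptions: the column meanings are inferred from the header names, the delay is assumed to be in milliseconds, and the function names are hypothetical.

```python
import subprocess
import time

# Raw string so that the backslashes in the Windows paths are kept as-is.
TASK_TEXT = r"""CsvFileName  URL
csv\aapl.csv http://ichart.finance.yahoo.com/table.csv?s=AAPL&ignore=.csv
csv\goog.csv http://ichart.finance.yahoo.com/table.csv?s=GOOG&ignore=.csv
"""


def parse_task_file(text):
    """Return one dict per task row, keyed by the header names."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def build_commands(tasks, options=("/datetimeformat=yyyy-MM-dd",)):
    """Build the equivalent single-URL command line for each task."""
    return [["gwebcmd.exe", "WebToCSV", t["URL"], t["CsvFileName"], *options]
            for t in tasks]


def run_all(commands, delay_ms=500):
    """Run the commands one by one, pausing between requests (needs gwebcmd.exe)."""
    for command in commands:
        subprocess.run(command, check=True)
        time.sleep(delay_ms / 1000)


for command in build_commands(parse_task_file(TASK_TEXT)):
    print(" ".join(command))
```

The run_all function is defined but not called here, so the sketch can be read and tried without gwebcmd installed.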

Multiple Pages

gwebcmd detects several common schemes of next-page URLs.

In such cases, you may limit the number of pages to load with the /pages option, using a command like this:

gwebcmd.exe WebToCSV "http://www.google.com/finance/option_chain?q=AAPL&output=json" ^
                  aapl.csv /rootPath=calls,puts /pages=20