GUIDE: Converting htm Web Pages Into csv Text Files with Gnumeric on Ubuntu 9.10
I found another way to convert data from one file format into another text format. This time I am extracting data from http://econ.mpob.gov.my/upk/daily/bh_ffbapr10.htm and converting it into csv (comma-separated values) format. Once the data has been converted, the Palo ETL (extract-transform-load) tool can be used to load it into a MySQL database. I have used jodconverter before, but it was not able to convert from htm to csv.
In this method I am using Gnumeric as the converter, run from the command line on my Ubuntu Linux Netbook Remix 9.10. I am using the command line so that the conversion can be run as a batch process. I think the same method can be used on any other Linux distribution that has the Gnome desktop environment.
The steps are really very simple (# indicates the command line prompt):
- Download and install Gnumeric (this is only needed once):
# sudo apt-get install gnumeric
- Download the web page:
# wget http://econ.mpob.gov.my/upk/daily/bh_ffbapr10.htm
- Convert the web page into csv format:
# ssconvert bh_ffbapr10.htm bh_ffbapr10.csv
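Since the point of using the command line is batch processing, the conversion step can be wrapped in a small shell script. The following is only a rough sketch, not something from my original setup: the function names csv_name and convert_all are made up for illustration, and it assumes the downloaded .htm files all sit in one directory and that ssconvert is on the PATH.

```shell
#!/bin/sh
# Sketch only: csv_name and convert_all are hypothetical helper names.

# Derive the output name by swapping the file extension for .csv,
# e.g. bh_ffbapr10.htm becomes bh_ffbapr10.csv.
csv_name() {
    printf '%s\n' "${1%.*}.csv"
}

# Run ssconvert on every .htm file in the given directory
# (defaults to the current directory).
convert_all() {
    dir="${1:-.}"
    for f in "$dir"/*.htm; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        ssconvert "$f" "$(csv_name "$f")"
    done
}
```

After loading these functions into the shell, running convert_all with no argument would convert every .htm file in the current directory and write each csv file next to its source.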
This also works for the following xls files, which we had problems converting with jodconverter before:
- http://www.bnm.gov.my/files/publication/msb/2010/2/xls/1.3.xls
- http://www.bnm.gov.my/files/publication/msb/2010/2/xls/1.16.xls
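The same download-then-convert steps can be applied to a list of URLs such as the xls files above. This is a minimal sketch under assumptions: the function names csv_for_url and fetch_and_convert are hypothetical, each file is saved by wget under the last part of its URL, and the csv file is named after it (so 1.3.xls would become 1.3.csv).

```shell
#!/bin/sh
# Sketch only: csv_for_url and fetch_and_convert are hypothetical names.

# Work out the csv file name for a URL: keep the last path component
# and swap its extension for .csv.
csv_for_url() {
    file="${1##*/}"
    printf '%s\n' "${file%.*}.csv"
}

# Download each URL with wget, then convert the saved file with ssconvert.
# The conversion is skipped for a URL when the download fails.
fetch_and_convert() {
    for url in "$@"; do
        file="${url##*/}"
        wget "$url" && ssconvert "$file" "$(csv_for_url "$url")"
    done
}
```

Calling fetch_and_convert with the two URLs above as arguments would download both files into the current directory and then convert each one in turn.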
- rajaiskandar @krimnet's blog
Copyright © 2008-2011