Wednesday, July 8, 2020

Downloading a table from a website to a Pandas DataFrame - html to pandas

In some cases the table can be downloaded with a single pandas command:

pd.read_html(url)
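
Note that pd.read_html returns a list of DataFrames, one per table found on the page, not a single DataFrame. A minimal sketch of this behaviour, using a small made-up HTML snippet instead of a live page and assuming an HTML parser such as lxml is installed:

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML snippet standing in for a downloaded page.
html = """
<table>
  <thead>
    <tr><th>Country</th><th>Population</th></tr>
  </thead>
  <tbody>
    <tr><td>A</td><td>100</td></tr>
    <tr><td>B</td><td>200</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> element.
tables = pd.read_html(StringIO(html))

# Pick the table we want by its position in the list.
df = tables[0]
print(len(tables))
print(df)
```

If a page contains several tables, the index selects which one to keep.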

If pd.read_html(url) fails with an HTTP 403 "Forbidden" response, we can fetch the page with the requests library instead, sending browser-like header information to avoid this error.

Example: downloading world population data from the site www.worldometers.info:

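from io import StringIO

import requests
import pandas as pd

url = r'https://www.worldometers.info/world-population/population-by-country/'

# Browser-like headers so the server does not reject the request
header = {
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
  "X-Requested-With": "XMLHttpRequest"
}

r = requests.get(url, headers=header)
r.raise_for_status()  # stop here if the server still answers with an error such as 403

# World population: the first table on the page.
# StringIO avoids the warning newer pandas versions give for a raw HTML string.
dfpop = pd.read_html(StringIO(r.text))[0]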
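
The two approaches can be combined: try plain pd.read_html(url) first and fall back to requests only when the server answers 403. This is a minimal sketch, assuming pandas raises urllib.error.HTTPError for the rejected request. The helper name read_tables, the HEADERS constant and the 30-second timeout are illustrative choices, not part of the original example:

```python
from io import StringIO
from urllib.error import HTTPError

import pandas as pd
import requests

# Browser-like headers; the exact values are an assumption and may need
# adjusting for other sites.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}


def read_tables(url, headers=HEADERS, timeout=30):
    """Return all tables found at url as a list of DataFrames."""
    try:
        # Simple case: pandas downloads and parses the page itself.
        return pd.read_html(url)
    except HTTPError as err:
        # Only handle "403 Forbidden"; re-raise every other HTTP error.
        if err.code != 403:
            raise

    # Fallback: fetch the page with requests and custom headers.
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly if the server still refuses
    # Wrap the HTML text in StringIO so pandas treats it as a file-like object.
    return pd.read_html(StringIO(response.text))
```

With this helper, the example above reduces to dfpop = read_tables(url)[0].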