Visualization
I used Selenium to web-scrape the Fortune 100 website and drew a radial chart based on revenue and market value.
Selenium
D3.js
I usually use BeautifulSoup in Python to scrape data from a page’s HTML. However, the Fortune 100 page does not render all of its HTML until you click a button to see more, which makes BeautifulSoup useless by itself. Someone else had already faced and solved this problem, so I used this blog post to learn how to use Selenium.
I used webdriver, which offers features like waiting a few seconds for the next page to load. To use it, I downloaded a Chrome webdriver and opened a browser with these lines of code.
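# Open browser
from selenium import webdriver

# Selenium 3 style call; Selenium 4 expects webdriver.Chrome(service=Service(path)) instead
driver = webdriver.Chrome('/path_to_chromedriver/chromedriver')
url = 'https://fortune.com/fortune500/2019/search/'
driver.get(url)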
To overcome the delayed page rendering, my program waits up to 10 seconds for the rows to appear before reading them from the page.
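from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Find all the rows on the page
# Wait up to 10 seconds before throwing an exception
# Calls the ExpectedCondition every 500 milliseconds until it returns successfully
rows = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'rt-tr-group')))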
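To make the polling idea concrete, here is a minimal sketch in plain Python of what an explicit wait does conceptually. It is not Selenium's implementation: `wait_until` and `rows_ready` are hypothetical names, and the stand-in condition just pretends the rows show up on the third check.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose rows appear on the third check.
attempts = []


def rows_ready():
    attempts.append(1)
    return ["row"] * 10 if len(attempts) >= 3 else []


rows = wait_until(rows_ready, timeout=2.0, poll_interval=0.01)
```

The benefit over a fixed `time.sleep` is that the wait ends as soon as the rows exist, and it fails loudly if they never do.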
Finally, this block of code clicks the “Next” button directly on the website, which renders the next 10 companies and adds them to the HTML.
# Go to next page
page += 1
next_button = driver.find_element(By.XPATH, "//div[contains(@class,'-next')]")  # find_element_by_xpath was removed in Selenium 4.3
next_button.click()
time.sleep(5)
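The snippets above are pieces of a loop. A rough sketch of how they might fit together is below. To keep it runnable without a browser, the two page interactions are passed in as functions: `read_rows` and `go_to_next_page` are hypothetical names, and the fake pages hold placeholder company labels, not real data. With Selenium, `read_rows` would wrap the `WebDriverWait` call and `go_to_next_page` would wrap the click on the “Next” button.

```python
import time


def scrape_all_pages(read_rows, go_to_next_page, max_pages, pause_seconds=5.0):
    """Collect rows from up to `max_pages` pages.

    read_rows():       returns the rows visible on the current page.
    go_to_next_page(): clicks "Next"; returns False when there is no next page.
    """
    collected = []
    page = 1
    while page <= max_pages:
        collected.extend(read_rows())
        # Stop after the last requested page or when "Next" is unavailable.
        if page == max_pages or not go_to_next_page():
            break
        page += 1
        time.sleep(pause_seconds)  # give the next page time to render
    return collected


# Fake three-page site with 10 placeholder rows per page.
pages = [[f"company {10 * p + i + 1}" for i in range(10)] for p in range(3)]
state = {"index": 0}


def read_rows():
    return pages[state["index"]]


def go_to_next_page():
    if state["index"] + 1 >= len(pages):
        return False
    state["index"] += 1
    return True


companies = scrape_all_pages(read_rows, go_to_next_page, max_pages=10, pause_seconds=0)
```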
My first chart was based on a very nice visual I saw on Tableau Public. Tableau can create polygons from a list of vertices, similar to filling in shapes in CSS using paths. I coded this chart using the same concept and D3-arc.
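To illustrate the polygon idea, here is a small Python sketch that computes the vertices of one ring segment, the kind of shape an arc generator draws for each company. It is only an approximation of the concept, not the chart's actual code: `annular_sector_vertices` is a hypothetical helper, and the radii 50 and 120 are placeholder values. Angles follow the d3-arc convention of radians measured clockwise from 12 o'clock, in screen coordinates where y points down.

```python
import math


def annular_sector_vertices(inner_radius, outer_radius, start_angle, end_angle, steps=16):
    """Return (x, y) vertices outlining a ring segment as a closed polygon."""

    def point(radius, angle):
        # Angle 0 points straight up; positive angles go clockwise (y axis points down).
        return (radius * math.sin(angle), -radius * math.cos(angle))

    angles = [start_angle + (end_angle - start_angle) * i / steps for i in range(steps + 1)]
    outer_edge = [point(outer_radius, a) for a in angles]
    inner_edge = [point(inner_radius, a) for a in reversed(angles)]
    # Walk along the outer edge, then back along the inner edge.
    return outer_edge + inner_edge


# One slice of a 100-company radial chart (placeholder radii).
slice_angle = 2 * math.pi / 100
vertices = annular_sector_vertices(50, 120, 0, slice_angle, steps=4)
```

In a chart like this one, the outer radius would presumably come from a scale applied to the selected metric, so larger companies reach further from the center.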
I also learned to use transitions that reorder the companies and color the appropriate level based on which metric is selected.
My second chart was based on a college tuition ranking I saw on Visual Capitalist. Like the first chart, it uses scales and arcs, but it also includes circles to show revenue.
Fun Fact: Technology companies (Apple, Amazon, Google, Microsoft, Facebook, Oracle) tend to have the highest top-line sales.