Social: scraping LinkedIn follower counts with Python

Update January 2022: LinkedIn has changed their interface and the following code is now obsolete.

Introduction

This code was completed as the final project deliverable for my master's program prerequisite, Introduction to Python, at Boston University's Metropolitan College.

I pull follower numbers for several companies every month as part of our competitive analysis. I had previously attempted to automate this web scraping in R and was successful with other social channels (like Facebook), but LinkedIn proved elusive, so I had been gathering those numbers by hand.

I originally submitted a more extensive code base, since the project had to handle cases such as a user who hadn't entered credentials. I've since simplified it, and the following code is what I currently have deployed for my regular monthly reporting.

STEP ONE: Create your account file

Create a spreadsheet with the company names (column A), the URL of each company's LinkedIn page (column B), and the CSS selector for the aggregate follower count (column C). The script below reads this file as linkedin_urls.csv with the column headers company, URL, and Selector; a sketch of the layout follows the list below.

To find the CSS selector:

  • Start on their LinkedIn page
  • Right click on the follower number (be sure not to confuse it with the number of employees) and click "inspect"
  • Look at the source code to make sure it looks like the right code (you may need to right click and inspect that element again, as it doesn't always jump to the right spot)
  • Right click on the code in the inspect window. One of the options in the menu should be "copy" with an arrow. From the copy submenu, select "Selector." This is the value that goes in the Excel file.
  • In my experience, most of the pages will have the same CSS selector for this element but not all will, particularly if you're an admin for one of the pages.
  • I have also used the Selector Gadget extension in Chrome, but found that for this project, some of the results were hit or miss.
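
For reference, here is a minimal sketch of that account file built with pandas. The company, URL, and selector values are placeholders; you would paste in the real selector you copied in the steps above.

import pandas as pd

# Hypothetical example row; replace with your own companies and copied selectors
accounts = pd.DataFrame({
    'company': ['Example Co'],
    'URL': ['https://www.linkedin.com/company/example-co/'],
    'Selector': ['.placeholder-follower-count-selector']
})
accounts.to_csv('linkedin_urls.csv', index=False)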

STEP TWO: Create your credentials file

Create an Excel file with your credentials. I include the site (column A), login name (column B), and password (column C), and I use the same credentials file for different channels so that I don't hardcode any credentials. The get_credentials() function below reads this file as login_creds.xlsx with the column headers Site, Login, and Password.
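
If you prefer to generate this file programmatically, here is a minimal sketch of the layout that get_credentials() expects; the values are placeholders.

import pandas as pd

# Hypothetical example row; replace with your own login details
creds = pd.DataFrame({
    'Site': ['LinkedIn'],
    'Login': ['you@example.com'],
    'Password': ['your-password']
})
creds.to_excel('login_creds.xlsx', index=False)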

Keep in mind that LinkedIn will occasionally check that you're a real person by throwing up a verification test.
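
I complete those checks by hand when they appear. If you want the script to wait for you, a minimal sketch (not part of my deployed code) is to pause for a keypress right after logging in:

# Pause so you can finish LinkedIn's verification check in the browser window
input('Complete any verification prompt, then press Enter to continue...')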

STEP THREE: Run the code

import pandas as pd
from bs4 import BeautifulSoup 
from selenium import webdriver
from datetime import date

month_start = date.today().replace(day=1)
BROWSER = webdriver.Chrome(executable_path='{Insert the path to your executable file}')

def get_credentials(site):
    """Looks up the username and password for a given site in the credentials file"""
    creds = pd.read_excel('login_creds.xlsx')
    # .iloc[0] pulls out the cell value itself rather than a one-element Series
    username = creds.loc[creds['Site']==site, 'Login'].iloc[0]
    password = creds.loc[creds['Site']==site, 'Password'].iloc[0]
    return {'Username': username, 'Password': password}
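
As a quick sanity check (assuming your file has a LinkedIn row), you can confirm the lookup works before automating anything:

# Prints the stored username to confirm the lookup; avoid printing the password
print(get_credentials('LinkedIn')['Username'])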

The following login process assumes the login will succeed, because I've already worked out the kinks. It logs in and then checks for the profile rail card, an element of my own profile that only appears when I'm logged in. If the login isn't working for you, inspect the result of this function: it should return a non-empty list, and an empty list means the login failed (see the check sketched after the function).

def login_linkedin(site):
    """Opens a LinkedIn login window and inputs the credentials"""
    print('Logging you in to LinkedIn...')
    creds = get_credentials(site)
    USERNAME = creds.get('Username')
    PASSWORD = creds.get('Password')
    BROWSER.get('https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin')
    elementID = BROWSER.find_element_by_id('username')
    elementID.send_keys(USERNAME)
    elementID = BROWSER.find_element_by_id('password')
    elementID.send_keys(PASSWORD)
    elementID.submit()
    soup = BeautifulSoup(BROWSER.page_source, 'lxml')
    result = soup.select('.profile-rail-card__actor-link')
    return result
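
If the login is giving you trouble, a minimal check using the function above looks like this (a non-empty list means the profile element was found):

# An empty list here means the login did not complete
result = login_linkedin('LinkedIn')
if not result:
    print('Login failed - check your credentials or watch for a verification prompt.')

Once the login works, the rest of the script pulls each company's follower count and writes the month's export:
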
def get_linkedin(company, url, css):
    """Opens a specific webpage, pulls the page source, then uses CSS selectors
    to strip follower numbers using BeautifulSoup"""
    BROWSER.get(url)
    soup = BeautifulSoup(BROWSER.page_source, 'lxml')
    followers = []
    result = soup.select(css)
    # Collect every digit character that appears in the selected markup
    for i in str(result):
        if i.isdigit():
            followers.append(i)
    followers = ''.join(followers)
    print(f'Checking LinkedIn follower counts for {company}...{followers}')
    return followers

login_linkedin('LinkedIn')
li = pd.read_csv('linkedin_urls.csv')
li['Followers'] = li.apply(lambda row: get_linkedin(row.company, row.URL, row.Selector), axis=1)
out_path = "filename" + str(month_start) + '.xlsx'
li.to_excel(out_path)
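
One optional addition that isn't part of my deployed script: once the export is written, you can close the automated browser window.

# Optional cleanup: close the Selenium-controlled browser when the run is finished
BROWSER.quit()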