Analyzing Finance Job Market

Published: Thu, Oct 6, 2022

Updated: Mon, Apr 17, 2023

Python, Jupyter Lab/Notebook, Anaconda, GitHub, Git, Visual Studio Code

Data Manipulation Library:

Pandas

Data Visualization Libraries:

Seaborn, Plotly

Below is the Jupyter Notebook for this project, generated with Jupyter2Svelte:

Finance Job Market Analysis


As a finance and management information systems major, I wanted to learn more about the current (2022) job market before entering it within the next couple of years.


  • Where are the most jobs located?
  • Which industries have the most jobs?
  • How can I best prepare myself for getting a job, what skills are in demand beyond an education in Finance?
  • Would my MIS degree make me more marketable?

Data Collection

For this project I developed a Python script, linkedin-job-scraper, to scrape public job listings on LinkedIn. LinkedIn is one of the most popular platforms for job seekers, making it one of the best websites to gather data on the job market.


From September 28th through October 4th, 2022, I scraped LinkedIn's job listings each day with the search keyword "Finance" and the location set to United States. I filtered for listings posted in the last 24 hours, the seniority/experience levels Internship, Entry Level, and Associate, and the job types Full Time, Internship, and Other. I was looking to collect information on full-time jobs available to those with about 0-3 years of experience in the field ("entry jobs"), and these filters would produce the most relevant data for my questions. Queries used LinkedIn's default "Most relevant" sorting in an attempt to get the best data pertaining to the "Finance" keyword.

Because LinkedIn will only display up to 1000 listings per search, I was unfortunately unable to scrape the entirety of the listings posted each day. For the widest coverage, I used separate search queries for each seniority level and further broke down those queries by filtering each for remote, hybrid, or on-site. Gathering more data would have required highly specific filters, which would introduce bias.
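The query breakdown described above can be sketched as a cross product of the two filters. This is only an illustration: the dictionary keys and the build_queries helper are hypothetical names, not LinkedIn's actual URL parameters and not code from linkedin-job-scraper.

```python
from itertools import product

# Filter values taken from the text; the labels are illustrative only,
# not LinkedIn's real query parameters.
SENIORITY_LEVELS = ["Internship", "Entry level", "Associate"]
WORKPLACE_TYPES = ["Remote", "Hybrid", "On-site"]
MAX_RESULTS_PER_SEARCH = 1000  # display cap mentioned in the text


def build_queries(keyword="Finance", location="United States"):
    """Return one search description per seniority/workplace combination."""
    return [
        {
            "keyword": keyword,
            "location": location,
            "seniority": seniority,
            "workplace": workplace,
        }
        for seniority, workplace in product(SENIORITY_LEVELS, WORKPLACE_TYPES)
    ]


queries = build_queries()
# Upper bound on listings reachable in one daily pass under the display cap
daily_ceiling = len(queries) * MAX_RESULTS_PER_SEARCH
```

Splitting the search this way raises the ceiling from one capped result set to one capped result set per combination, without narrowing the "Finance" keyword itself.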

Data Structure

Column           Definition
date_scraped     date the listing was scraped
title            title of the job listing
full_url         LinkedIn URL of the job posting
company          company name
company_url      company's LinkedIn URL
location         job's location
description      raw HTML of the job's description
seniority_level  job's seniority level
employment_type  job's employment type (Full Time, Part Time etc.)
job_function     job's expected functions
industries       industries the company is in

Data Cleaning

Import Data

import pandas as pd
import glob

# Read every daily CSV export and combine them into one DataFrame
files = glob.glob("data/*.csv")
dfs = []
for file in files:
    dfs.append(pd.read_csv(file))

linkedin = pd.concat(dfs)
date_scraped title full_url company company_url location description seniority_level employment_type job_function industries
0 2022-10-04 Quantitative analyst (finance) Lucas Group, A Korn Ferry Company Charlotte, NC \n <p>Lucas Group, a Korn Ferry company... Associate Full-time Finance Banking
1 2022-10-03 Accounting and finance associates EVERESTX Talent Solutions Pennsylvania, United States \n <strong>Overview of the Role:</stron... Not Assigned Not Assigned Not Assigned Not Assigned
2 2022-09-28 Associate/ Consulting Associate - Litigation, ... Charles River Associates Washington, DC \n <strong>About Charles River Associat... Entry level Full-time Legal Business Consulting and Services
3 2022-09-28 Associate/ Consulting Associate - Litigation, ... Charles River Associates Chicago, IL \n <strong>About Charles River Associat... Entry level Full-time Legal Business Consulting and Services
4 2022-10-04 Senior Financial Analyst (Remote) Capital Search Group McLean, VA \n Microsoft has become a corporate lea... Not Assigned Not Assigned Not Assigned Not Assigned
date_scraped title full_url company company_url location description seniority_level employment_type job_function industries
count 60017 60017 60017 60016 60017 60017 60016 51890 52288 52288 52288
unique 7 12294 29607 7205 7213 3071 14787 6 4 693 1009
top 2022-09-30 Remote Tax Professional Aston Carter United States \n <strong><u>What You'll Do...<br><br>... Associate Full-time Accounting/Auditing and Finance Not Assigned
freq 11299 2443 7 3339 3339 3738 2474 23092 40980 10160 9708

Drop Duplicates

Out of the 31,383 job listings that were scraped, 29,607 were unique.

linkedin.drop_duplicates(subset="full_url", inplace=True)
date_scraped title full_url company company_url location description seniority_level employment_type job_function industries
count 29607 29607 29607 29607 29607 29607 29606 29400 29422 29422 29422
unique 7 12294 29607 7205 7213 3071 14647 6 4 691 1000
top 2022-09-30 Remote Tax Professional Aston Carter United States \n <strong><u>What You'll Do...<br><br>... Associate Full-time Not Assigned Not Assigned
freq 5669 1214 1 1749 1749 1833 1229 11325 20245 8380 8381
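A small sketch of what deduplicating on full_url does, using made-up rows rather than the scraped data. Rows that share a URL are treated as the same listing, and the first occurrence is kept.

```python
import pandas as pd

# Toy rows standing in for scraped listings (all values are made up)
toy = pd.DataFrame(
    {
        "date_scraped": ["2022-09-28", "2022-09-29", "2022-09-29"],
        "title": ["Analyst", "Analyst", "Associate"],
        "full_url": [
            "https://example.com/jobs/1",
            "https://example.com/jobs/1",
            "https://example.com/jobs/2",
        ],
    }
)

# keep="first" is the default, so the earliest row for each URL survives
deduped = toy.drop_duplicates(subset="full_url")
n_removed = len(toy) - len(deduped)
```

Returning a new DataFrame instead of using inplace=True keeps the raw rows available for comparison, which makes it easy to report how many duplicates were removed.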

Handle Nulls

date_scraped         0
title                0
full_url             0
company              0
company_url          0
location             0
description          1
seniority_level    207
employment_type    185
job_function       185
industries         185
dtype: int64

First, let's drop the row with the missing description.

linkedin.dropna(subset="description", inplace=True)
date_scraped         0
title                0
full_url             0
company              0
company_url          0
location             0
description          0
seniority_level    207
employment_type    185
job_function       185
industries         185
dtype: int64

I will fill the rest of the missing data with the string "Not Assigned".

linkedin.fillna("Not Assigned", inplace=True)
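The two null-handling steps can also be chained without inplace=True. The sketch below uses a made-up two-column frame, so the counts differ from the real data.

```python
import pandas as pd

# Toy frame with one missing description and one missing seniority level
toy = pd.DataFrame(
    {
        "description": ["<p>Role A</p>", None, "<p>Role C</p>"],
        "seniority_level": ["Associate", "Entry level", None],
    }
)

# Per-column null counts, the same kind of summary shown above
null_counts = toy.isna().sum()

cleaned = (
    toy.dropna(subset=["description"])  # a listing without a description is unusable
    .fillna("Not Assigned")             # remaining gaps become an explicit category
)
```

Filling with an explicit label rather than dropping the rows keeps listings that are missing only metadata, at the cost of a "Not Assigned" category that has to be excluded from later counts.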

Data Types

Int64Index: 29606 entries, 0 to 4625
Data columns (total 11 columns):
 #   Column           Dtype 
---  ------           ----- 
 0   date_scraped     object
 1   title            object
 2   full_url         object
 3   company          object
 4   company_url      object
 5   location         object
 6   description      object
 7   seniority_level  object
 8   employment_type  object
 9   job_function     object
 10  industries       object
dtypes: object(11)
memory usage: 2.7+ MB
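Every column is stored as object, including date_scraped. If date arithmetic or per-day grouping were needed, the column could be parsed first. A minimal sketch on made-up dates, assuming the YYYY-MM-DD format seen in the sample rows:

```python
import pandas as pd

toy = pd.DataFrame(
    {"date_scraped": ["2022-09-28", "2022-09-30", "2022-09-30", "2022-10-04"]}
)

# Parse the strings into datetime64 values
toy["date_scraped"] = pd.to_datetime(toy["date_scraped"], format="%Y-%m-%d")

# Listings per scrape day, and the length of the collection window in days
per_day = toy.groupby(toy["date_scraped"].dt.date).size()
span_days = (toy["date_scraped"].max() - toy["date_scraped"].min()).days
```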

Remove Unnecessary Listings

If a listing does not mention finance in its title or its description, the listing can be discarded, as only finance-related job postings are relevant.

# Keep only rows that mention finance; "financ" is chosen to include words such as "financial"
def mentions_finance(x):
    return 'financ' in x["description"].lower() or 'financ' in x["title"].lower()

linkedin = linkedin[linkedin.apply(mentions_finance, axis=1)]
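The same filter can be written with pandas string methods instead of a row-wise apply, which tends to be faster on tens of thousands of rows. A sketch on made-up listings:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "title": ["Financial Analyst", "Warehouse Associate", "Staff Accountant"],
        "description": ["Build models.", "Load trucks.", "Support the finance team."],
    }
)

# Case-insensitive substring match on either column;
# regex=False treats "financ" as plain text
mask = (
    toy["title"].str.contains("financ", case=False, regex=False)
    | toy["description"].str.contains("financ", case=False, regex=False)
)
kept = toy[mask]
```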

Export Cleaned Data

After cleaning, we are left with 28,634 listings out of the 31,383 originally scraped.

linkedin.to_csv("data/cleaned.csv", index=False)


Loading the data

After cleaning there are 28,634 unique listings.

import pandas as pd

# Visualization
%matplotlib inline
import seaborn as sns

linkedin = pd.read_csv("data/cleaned.csv")

date_scraped title full_url company company_url location description seniority_level employment_type job_function industries
0 2022-10-04 Quantitative analyst (finance) Lucas Group, A Korn Ferry Company Charlotte, NC \n <p>Lucas Group, a Korn Ferry company... Associate Full-time Finance Banking
date_scraped title full_url company company_url location description seniority_level employment_type job_function industries
count 28634 28634 28634 28634 28634 28634 28634 28634 28634 28634 28634
unique 7 11956 28634 6985 6992 3027 14095 6 4 670 975
top 2022-09-30 Remote Tax Professional Aston Carter United States \n <strong><u>What You'll Do...<br><br>... Associate Full-time Not Assigned Not Assigned
freq 5568 1214 1 1589 1589 1792 1229 10997 19586 8293 8294


locations = pd.DataFrame(linkedin["location"].value_counts()).reset_index()
locations.rename(columns={"index": "Location", "location": "Count"}, inplace=True)

palette = reversed(sns.color_palette("Blues", 10))
sns.barplot(data=locations.iloc[:10], y="Location", x="Count", palette=palette).set(title='Top 10 Locations')
Figure: Bar chart of the top 10 locations

import plotly.graph_objects as go

# Drop location "United States"
locations.drop(index=0, inplace=True)

# Get State Instead of City and State
locations["Location"] = locations["Location"].map(lambda x: x[-2:])
locations = locations.groupby(locations["Location"]).aggregate('sum').reset_index()

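# Map Figure
# The Choropleth arguments and the geo scope are assumed; the original cell is cut off here
fig = go.Figure(data=go.Choropleth(
    locations=locations["Location"],  # two-letter state codes
    z=locations["Count"],             # number of listings per state
    locationmode='USA-states',
    colorscale='Blues',
))

fig.update_layout(
    title_text='Job Listings by State',
    geo = dict(
        scope='usa',  # limit the map to the United States
    ),
)
fig.show()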

Jobs with "United States" as the location are most likely remote positions. New York City and Chicago have a much higher share of jobs relative to other cities. Since New York City and Chicago are home to the major US securities markets, it makes sense that those two cities would have the greatest number of finance jobs.

The states of California, New York, Texas, Illinois, and Florida have the most jobs.
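One caveat on the state totals: taking the last two characters of the location assumes every value ends in a two-letter state code. The sample rows include "Pennsylvania, United States", for which x[-2:] yields "es". A stricter extraction might look like the sketch below; state_code is a hypothetical helper, and the pattern only checks the shape of the suffix, not whether it is a real state.

```python
import re

# Matches a trailing ", XX" where XX is two capital letters
STATE_SUFFIX = re.compile(r",\s*([A-Z]{2})$")


def state_code(location):
    """Return the trailing two-letter code, or None if the location has none."""
    match = STATE_SUFFIX.search(location.strip())
    return match.group(1) if match else None


examples = ["Charlotte, NC", "Washington, DC", "Pennsylvania, United States", "United States"]
codes = [state_code(loc) for loc in examples]
```

Locations that return None could then be dropped or counted separately instead of being grouped under a meaningless two-letter key.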


Not Assigned                                                                          8294
Staffing and Recruiting                                                               3112
Financial Services                                                                    1691
Financial Services and Retail                                                         1120
Accounting                                                                             924
Newspaper Publishing, Online Audio and Video Media, Book and Periodical Publishing       1
Research Services, Staffing and Recruiting, Executive Offices                            1
Staffing and Recruiting, Retail, Hospitality                                             1
Financial Services, Biotechnology Research, Pharmaceutical Manufacturing                 1
Consumer Services, Retail, Hospitality                                                   1
Name: industries, Length: 975, dtype: int64

Not Assigned can be dropped. Also, some listings have multiple different industries separated by a comma.

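industries = []
def get_industries(x):
    # Collect each comma-separated industry, skipping the "Not Assigned" placeholder
    for industry in str(x).split(', '):
        if "Not Assigned" not in industry:
            industries.append(industry)

# Derive industries
linkedin["industries"].apply(get_industries)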

industries = pd.Series(industries, name="Count").value_counts().reset_index()
industries.rename(columns={"index": "Industry"}, inplace=True)

palette = reversed(sns.color_palette("Blues", 15))
sns.barplot(data=industries.iloc[:15], y="Industry", x="Count", palette=palette).set(title='Top 15 Industries')
Figure: Bar chart of the top 15 industries

Aston Carter                1589
H&R Block                   1278
Grant Thornton LLP (US)      728
RemoteWorker US              620
PwC                          474
Eureka Multifamily Group       1
University of Phoenix          1
5th Avenue Recruiting          1
JES Holdings, LLC              1
Jamestown                      1
Name: company, Length: 6985, dtype: int64

Interestingly, Staffing and Recruiting is the most common industry among the postings. This is likely because staffing firms post many listings; several of the most frequent companies in the dataset are staffing firms.

Besides Staffing and Recruiting, many of the top 15 industries are directly in the financial industry, as expected.

Software development, IT, Information/Internet, and Technology, where information systems domain knowledge may be useful, rank 4th, 6th, 12th, and 13th respectively.
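The per-industry counts behind this ranking come from splitting the comma-separated industries field. As an alternative to collecting values in a list, the split can be done with pandas itself. A sketch on made-up values:

```python
import pandas as pd

toy = pd.Series(
    [
        "Financial Services, Retail",
        "Staffing and Recruiting",
        "Not Assigned",
        "Financial Services",
    ],
    name="industries",
)

per_industry = (
    toy[toy != "Not Assigned"]  # drop the placeholder category
    .str.split(", ")            # one list of industries per listing
    .explode()                  # one row per (listing, industry) pair
    .value_counts()
)
```

Note that a listing tagged with several industries is counted once for each of them, so the counts sum to more than the number of listings.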


import re

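# Tuples containing the skill name and its regex pattern
# 0 is the name and 1 is the regex pattern
skills_search = [
    ('Data Analysis', 'data anal'),
    ('SQL', 'sql'),
    ('Database', 'database'),
    ('Excel', 'excel |excel.|excel,'),
    ('Tableau', 'tableau'),
    ('Power BI', 'power bi')
]

skills = []
def get_skills(x):
    # Record each skill at most once per description
    for skill in skills_search:
        # Case-insensitive matching is assumed, since the patterns are lowercase
        if re.search(skill[1], x, re.IGNORECASE) is not None:
            skills.append(skill[0])

# Assumed call: search every job description
linkedin["description"].apply(get_skills)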

skills = pd.Series(skills, name="Count").value_counts().reset_index()
skills.rename(columns={"index": "Skill"}, inplace=True)

skills['% of Listings'] = skills['Count'] / linkedin.shape[0] * 100

length = len(skills_search)
palette = reversed(sns.color_palette("Blues", length))
sns.barplot(data=skills.iloc[:length], y="Skill", x='% of Listings', palette=palette).set(title='Most In Demand Skills')
Figure: Bar chart of the most in-demand skills

Excel is a must-have skill: it is mentioned in 25% of the job listings.
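One caveat on the Excel figure: the pattern list writes Excel as 'excel |excel.|excel,'. In a regular expression an unescaped . matches any character, so 'excel.' also matches the first six letters of "excellent". A stricter pattern using word boundaries might look like the sketch below (case-insensitive matching is an assumption here); it would likely lower the share, and it would still match the verb "excel".

```python
import re

# Pattern as written in the skills list, and a word-boundary alternative
loose = re.compile(r"excel |excel.|excel,", re.IGNORECASE)
strict = re.compile(r"\bexcel\b", re.IGNORECASE)

samples = [
    "Excellent communication skills",
    "Advanced Excel, SQL",
    "Proficient in Excel.",
]

loose_hits = [loose.search(s) is not None for s in samples]
strict_hits = [strict.search(s) is not None for s in samples]
```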


Where are the most jobs located?

New York City and Chicago have the greatest proportion of the jobs relative to other cities in the United States.

The states of California, New York, Texas, Illinois, and Florida have the most jobs overall.

Which industries have the most jobs?

Apart from Staffing and Recruiting, jobs directly in the financial services industry are unsurprisingly the most common, but tech is common as well. Software development, IT, Information/Internet, and Technology rank 4th, 6th, 12th, and 13th respectively.

How can I best prepare myself for getting a job, what skills are in demand beyond an education in Finance?

Excel is the most in-demand skill, mentioned in 25% of the job descriptions. Database and Data Analysis are second and third, both around 10%.

SQL, Tableau, and Power BI were mentioned in about 1% of the postings; however, these skills may set a job candidate apart from others seeking the same role.

Would my MIS degree make me more marketable?

Database knowledge and data analysis skills, both topics covered in the MIS degree program, are in high demand. Also, IT and tech are among the top 15 industries, and in these industries information systems domain knowledge may be useful.