Analyzing the Finance Job Market
Technologies:
Data Manipulation Library:
Pandas
Data Visualization Libraries:
Seaborn, Plotly
Below is the Jupyter Notebook for this project, generated with Jupyter2Svelte:
Finance Job Market Analysis ¶
Purpose ¶
As a finance and management information systems (MIS) major, I wanted to learn more about the current (2022) job market before entering it within the next couple of years.
Questions ¶
- Where are the most jobs located?
- Which industries have the most jobs?
- How can I best prepare myself for getting a job, and what skills are in demand beyond an education in Finance?
- Would my MIS degree make me more marketable?
Data Collection ¶
For this project I developed a Python script, linkedin-job-scraper, to scrape public job listings on LinkedIn. LinkedIn is one of the most popular platforms for job seekers, making it one of the best websites to gather data on the job market.
Methodology ¶
From September 28th through October 4th, 2022, I scraped LinkedIn's job listings each day using the search keyword "Finance" and a location of United States, filtering for listings posted in the last 24 hours, the seniority/experience levels of Internship, Entry Level, and Associate, and the job types of Full Time, Internship, and Other. I was looking to collect information on full-time jobs available to those with about 0-3 years of experience in the field ("entry jobs"), and these filters would produce the most relevant data for my questions. Queries used LinkedIn's default "Most relevant" sorting in an attempt to get the best data pertaining to the "Finance" keyword.
Because LinkedIn will only display up to 1,000 listings per search, I was unfortunately unable to scrape the entirety of the listings posted each day. For the widest coverage, I used separate search queries for each seniority level and further broke those queries down by filtering each for remote, hybrid, or on-site work. Gathering more data would not have been possible without using highly specific filters, which would have introduced bias.
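As a rough sketch of that query fan-out (the parameter names here are illustrative, not the scraper's actual API):

from itertools import product

# One query per seniority level x workplace type, per the methodology above.
# Parameter names are illustrative; the real queries are built by linkedin-job-scraper.
seniority_levels = ["Internship", "Entry level", "Associate"]
workplace_types = ["On-site", "Hybrid", "Remote"]

queries = [
    {
        "keywords": "Finance",
        "location": "United States",
        "posted_within": "24h",
        "seniority": seniority,
        "workplace": workplace,
    }
    for seniority, workplace in product(seniority_levels, workplace_types)
]
print(len(queries))  # 9 separate searches per day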
Data Structure ¶
Column | Definition |
---|---|
date_scraped | date the listing was scraped |
title | title of the job listing |
full_url | LinkedIn URL of the job posting |
company | company name |
company_url | company's LinkedIn URL |
location | job's location |
description | raw HTML of the job's description |
seniority_level | job's seniority level |
employment_type | job's employment type (Full Time, Part Time, etc.) |
job_function | job's expected functions |
industries | industries the company is in |
import pandas as pd
import glob

# Load every scraped CSV and combine them into a single DataFrame
files = glob.glob("data/*.csv")
dfs = []
for file in files:
    dfs.append(pd.read_csv(file))
linkedin = pd.concat(dfs)

display(linkedin.head())
linkedin.describe()
date_scraped | title | full_url | company | company_url | location | description | seniority_level | employment_type | job_function | industries | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-10-04 | Quantitative analyst (finance) | https://www.linkedin.com/jobs/view/quantitativ... | Lucas Group, A Korn Ferry Company | https://www.linkedin.com/company/lucas-group?t... | Charlotte, NC | \n <p>Lucas Group, a Korn Ferry company... | Associate | Full-time | Finance | Banking |
1 | 2022-10-03 | Accounting and finance associates | https://www.linkedin.com/jobs/view/accounting-... | EVERESTX Talent Solutions | https://www.linkedin.com/company/everestxtalen... | Pennsylvania, United States | \n <strong>Overview of the Role:</stron... | Not Assigned | Not Assigned | Not Assigned | Not Assigned |
2 | 2022-09-28 | Associate/ Consulting Associate - Litigation, ... | https://www.linkedin.com/jobs/view/associate-c... | Charles River Associates | https://www.linkedin.com/company/charles-river... | Washington, DC | \n <strong>About Charles River Associat... | Entry level | Full-time | Legal | Business Consulting and Services |
3 | 2022-09-28 | Associate/ Consulting Associate - Litigation, ... | https://www.linkedin.com/jobs/view/associate-c... | Charles River Associates | https://www.linkedin.com/company/charles-river... | Chicago, IL | \n <strong>About Charles River Associat... | Entry level | Full-time | Legal | Business Consulting and Services |
4 | 2022-10-04 | Senior Financial Analyst (Remote) | https://www.linkedin.com/jobs/view/senior-fina... | Capital Search Group | https://www.linkedin.com/company/capital-searc... | McLean, VA | \n Microsoft has become a corporate lea... | Not Assigned | Not Assigned | Not Assigned | Not Assigned |
date_scraped | title | full_url | company | company_url | location | description | seniority_level | employment_type | job_function | industries | |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 60017 | 60017 | 60017 | 60016 | 60017 | 60017 | 60016 | 51890 | 52288 | 52288 | 52288 |
unique | 7 | 12294 | 29607 | 7205 | 7213 | 3071 | 14787 | 6 | 4 | 693 | 1009 |
top | 2022-09-30 | Remote Tax Professional | https://www.linkedin.com/jobs/view/finance-and... | Aston Carter | https://www.linkedin.com/company/aston-carter?... | United States | \n <strong><u>What You'll Do...<br><br>... | Associate | Full-time | Accounting/Auditing and Finance | Not Assigned |
freq | 11299 | 2443 | 7 | 3339 | 3339 | 3738 | 2474 | 23092 | 40980 | 10160 | 9708 |
Drop Duplicates ¶
Out of the 60,017 rows that were scraped, 29,607 were unique job listings.
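A quick way to verify those counts before dropping duplicates (a small check, not part of the original notebook):

# Sanity check: total rows scraped vs. unique job URLs
print(linkedin.shape[0], linkedin["full_url"].nunique())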
# Keep only the first occurrence of each unique job URL
linkedin.drop_duplicates(subset="full_url", inplace=True)
linkedin.describe()
date_scraped | title | full_url | company | company_url | location | description | seniority_level | employment_type | job_function | industries | |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 29607 | 29607 | 29607 | 29607 | 29607 | 29607 | 29606 | 29400 | 29422 | 29422 | 29422 |
unique | 7 | 12294 | 29607 | 7205 | 7213 | 3071 | 14647 | 6 | 4 | 691 | 1000 |
top | 2022-09-30 | Remote Tax Professional | https://www.linkedin.com/jobs/view/quantitativ... | Aston Carter | https://www.linkedin.com/company/aston-carter?... | United States | \n <strong><u>What You'll Do...<br><br>... | Associate | Full-time | Not Assigned | Not Assigned |
freq | 5669 | 1214 | 1 | 1749 | 1749 | 1833 | 1229 | 11325 | 20245 | 8380 | 8381 |
Handle Nulls ¶
linkedin.isna().sum()
date_scraped         0
title                0
full_url             0
company              0
company_url          0
location             0
description          1
seniority_level    207
employment_type    185
job_function       185
industries         185
dtype: int64
First, let's drop the row with the missing description.
linkedin.dropna(subset="description", inplace=True)
linkedin.isna().sum()
date_scraped         0
title                0
full_url             0
company              0
company_url          0
location             0
description          0
seniority_level    207
employment_type    185
job_function       185
industries         185
dtype: int64
I will fill the rest of the missing data with the string "Not Assigned".
linkedin.fillna("Not Assigned", inplace=True)
Data Types ¶
linkedin.info(show_counts=False)
Int64Index: 29606 entries, 0 to 4625
Data columns (total 11 columns):
 #   Column           Dtype
---  ------           -----
 0   date_scraped     object
 1   title            object
 2   full_url         object
 3   company          object
 4   company_url      object
 5   location         object
 6   description      object
 7   seniority_level  object
 8   employment_type  object
 9   job_function     object
 10  industries       object
dtypes: object(11)
memory usage: 2.7+ MB
Remove Unnecessary Listings ¶
If a listing does not mention finance in its title or its description, the listing can be discarded, as only finance-related job postings are relevant.
# Keep rows that mention finance; "financ" is chosen to also match words such as "financial"
def mentions_finance(x):
    return 'financ' in x["description"].lower() or 'financ' in x["title"].lower()

linkedin = linkedin[linkedin.apply(mentions_finance, axis=1)]
Export cleaned data ¶
After cleaning, we are left with 28,634 listings out of the 29,607 unique listings that were scraped.
display(linkedin.shape[0])
linkedin.to_csv("data/cleaned.csv", index=False)
28634
import pandas as pd
# Visualization
%matplotlib inline
import seaborn as sns
linkedin = pd.read_csv("data/cleaned.csv")
display(linkedin.head(1))
linkedin.describe()
date_scraped | title | full_url | company | company_url | location | description | seniority_level | employment_type | job_function | industries | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-10-04 | Quantitative analyst (finance) | https://www.linkedin.com/jobs/view/quantitativ... | Lucas Group, A Korn Ferry Company | https://www.linkedin.com/company/lucas-group?t... | Charlotte, NC | \n <p>Lucas Group, a Korn Ferry company... | Associate | Full-time | Finance | Banking |
date_scraped | title | full_url | company | company_url | location | description | seniority_level | employment_type | job_function | industries | |
---|---|---|---|---|---|---|---|---|---|---|---|
count | 28634 | 28634 | 28634 | 28634 | 28634 | 28634 | 28634 | 28634 | 28634 | 28634 | 28634 |
unique | 7 | 11956 | 28634 | 6985 | 6992 | 3027 | 14095 | 6 | 4 | 670 | 975 |
top | 2022-09-30 | Remote Tax Professional | https://www.linkedin.com/jobs/view/quantitativ... | Aston Carter | https://www.linkedin.com/company/aston-carter?... | United States | \n <strong><u>What You'll Do...<br><br>... | Associate | Full-time | Not Assigned | Not Assigned |
freq | 5568 | 1214 | 1 | 1589 | 1589 | 1792 | 1229 | 10997 | 19586 | 8293 | 8294 |
Locations ¶
# Count listings per location, then plot the ten most common
locations = pd.DataFrame(linkedin["location"].value_counts()).reset_index()
locations.rename(columns={"index": "Location", "location": "Count"}, inplace=True)

palette = reversed(sns.color_palette("Blues", 10))
sns.barplot(data=locations.iloc[:10], y="Location", x="Count", palette=palette).set(title='Top 10 Locations')
[Figure: "Top 10 Locations" bar chart]
import plotly.graph_objects as go

# Drop the "United States" location (the most common value, at index 0)
locations.drop(index=0, inplace=True)

# Reduce "City, ST" locations to their two-letter state code; locations that
# do not end in a state code (e.g. "Pennsylvania, United States") will not
# map onto the choropleth
locations["Location"] = locations["Location"].map(lambda x: x[-2:])
locations = locations.groupby(locations["Location"]).aggregate('sum').reset_index()

# Map Figure
fig = go.Figure(data=go.Choropleth(
    locations=locations["Location"],
    z=locations["Count"],
    locationmode='USA-states',
    colorscale='Blues',
    autocolorscale=False,
    marker_line_color='black',
))
fig.update_layout(
    title_text='Job Listings by State',
    geo=dict(
        scope='usa',
    ))
fig.show()
Jobs with "United States" as the location are most likely remote positions. New York City and Chicago have a much higher share of jobs relative to other cities. Since New York City and Chicago are home to the major US securities markets, it makes sense that those two cities would have the greatest number of finance jobs.
The states of California, New York, Texas, Illinois, and Florida have the most jobs.
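Those top states can be read directly off the grouped locations frame from above (a quick check, not part of the original notebook):

# Five states with the most listings, from the grouped DataFrame above
print(locations.sort_values("Count", ascending=False).head(5))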
Industries ¶
linkedin["industries"].value_counts()
Not Assigned                                                                           8294
Staffing and Recruiting                                                                3112
Financial Services                                                                     1691
Financial Services and Retail                                                          1120
Accounting                                                                              924
                                                                                       ...
Newspaper Publishing, Online Audio and Video Media, Book and Periodical Publishing       1
Research Services, Staffing and Recruiting, Executive Offices                             1
Staffing and Recruiting, Retail, Hospitality                                              1
Financial Services, Biotechnology Research, Pharmaceutical Manufacturing                  1
Consumer Services, Retail, Hospitality                                                    1
Name: industries, Length: 975, dtype: int64
"Not Assigned" values can be dropped. Also, some listings have multiple industries separated by commas, so those need to be split apart.
# Split comma-separated industry strings and collect every industry, skipping "Not Assigned"
industries = []
def get_industries(x):
    for industry in str(x).split(', '):
        if "Not Assigned" not in industry:
            industries.append(industry)

# Derive industries
linkedin["industries"].map(get_industries)

industries = pd.Series(industries, name="Count").value_counts().reset_index()
industries.rename(columns={"index": "Industry"}, inplace=True)

palette = reversed(sns.color_palette("Blues", 15))
sns.barplot(data=industries.iloc[:15], y="Industry", x="Count", palette=palette).set(title='Top 15 Industries')
[Figure: "Top 15 Industries" bar chart]
linkedin["company"].value_counts()
Aston Carter                1589
H&R Block                   1278
Grant Thornton LLP (US)      728
RemoteWorker US              620
PwC                          474
                            ...
Eureka Multifamily Group       1
University of Phoenix          1
5th Avenue Recruiting          1
JES Holdings, LLC              1
Jamestown                      1
Name: company, Length: 6985, dtype: int64
Interestingly, Staffing and Recruiting is the most common industry among the postings. This is likely because staffing firms each make many postings: several of the most frequently repeated companies in the dataset are staffing firms.
Besides Staffing and Recruiting, many of the top 15 industries are directly in the financial industry, as expected.
Software Development, IT, Information/Internet, and Technology, where information systems domain knowledge may be useful, rank 4th, 6th, 12th, and 13th, respectively.
Skills ¶
import re

# List of (skill name, regex pattern) tuples
# The Excel pattern requires a space, literal period, or comma after "excel"
# so that words such as "excellent" do not match
skills_search = [
    ('Data Analysis', 'data anal'),
    ('SQL', 'sql'),
    ('Database', 'database'),
    ('Excel', r'excel |excel\.|excel,'),
    ('Tableau', 'tableau'),
    ('Power BI', 'power bi')
]

skills = []
def get_skills(x):
    for name, pattern in skills_search:
        # Lowercase the description so the patterns match regardless of case
        if re.search(pattern, x.lower()) is not None:
            skills.append(name)

linkedin["description"].map(get_skills)
skills = pd.Series(skills, name="Count").value_counts().reset_index()
skills.rename(columns={"index": "Skill"}, inplace=True)
skills['% of Listings'] = skills['Count'] / linkedin.shape[0] * 100
length = len(skills_search)
palette = reversed(sns.color_palette("Blues", length))
sns.barplot(data=skills.iloc[:length], y="Skill", x='% of Listings', palette=palette).set(title='Most In Demand Skills')
[Figure: "Most In Demand Skills" bar chart]
Excel is a must-have skill: it is mentioned in 25% of the job listings.
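That figure can be read off the skills frame computed above (a quick check, not part of the original notebook):

# Look up Excel's share of listings in the skills frame from above
print(skills.loc[skills["Skill"] == "Excel", "% of Listings"])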
Findings ¶
Where are the most jobs located? ¶
New York City and Chicago have the greatest proportion of the jobs relative to other cities in the United States.
The states of California, New York, Texas, Illinois, and Florida have the most jobs overall.
Which industries have the most jobs? ¶
Unsurprisingly, jobs directly in the financial services industry are the most common, but tech is common as well. Software Development, IT, Information/Internet, and Technology rank 4th, 6th, 12th, and 13th, respectively.
How can I best prepare myself for getting a job, and what skills are in demand beyond an education in Finance? ¶
Excel is the most in-demand skill, being mentioned in 25% of the job descriptions. Database and Data Analysis are second and third, both around 10%.
SQL, Tableau, and Power BI were mentioned in only about 1% of the postings; however, these skills may set a job candidate apart from others seeking the same role.
Would my MIS degree make me more marketable? ¶
Database knowledge and data analysis skills are in high demand, and both are topics covered in the MIS degree program. Also, IT and tech are among the top 15 industries, and information systems domain knowledge may be useful in those industries.