#install.packages("openxlsx")
#install.packages("devtools")
#install.packages("wesanderson")
#install.packages("RColorBrewer")
#install.packages("tidytext")
library(tidytuesdayR)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(ggthemes)
library(readxl) # to read the excel files
library(openxlsx)
library(stringr)
library(wesanderson)
library(RColorBrewer)
library(gridExtra)
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library(tidytext)
Define your research question below. What about the data interests you? What is a specific question you want to find out about the data?
Given your question, what is your expectation about the data?
I have taken the College tuition, diversity, and pay from TidyTuesday (week of 10th March 2020).The data comes from different sources but the original source is the US Department of Education.
The easily available data is from tuitiontracker.org but since the dataset is in a very wide format, the data was converted to long format and only selected tables are available in the tidytuesday dataset.
I am very interested in understanding how the type of college chosen in the undergraduate degree influences and shapes one’s career prospects and their quality of education.
In most cases, the cost of attending public college is significantly cheaper than that of private colleges. Students who have their degrees from private colleges especially the elite universities and ivy-league colleges spend way more than the students who attend public colleges.The highly ranked colleges have very low acceptance rate making the process of getting into top colleges incredibly difficult.
Although, attending a highly ranked private college can be prestigious and may have better infrastructure and career prospects, it is important to not choose a college purely based on the ranking but instead evaluate all the aspects before choosing the college that would be well aligned with the one’s goal and interests.
I am also interested in understanding if gender, race and ethnicity has an influence in college selection.
Being an immigrant and having attended graduate college in a private university in the US which had a very high tuition and having worked in the industry for a couple years, my thought process of choosing a college has drifted away from emphasizing on the ranking to considering all the aspects holistically, the most important criteria`s to evaluate is
->the curriculum and faculty:ensure that curriculum offered is diverse and in-depth that will enhance career opportunities
->to be able to choose a program that is closely aligned with the interests
->networking opportunities and a strong alumni network
->return of investment
-> safety and living expenses
-> open to different cultural/ racial/ ethnic and socio-economic backgrounds
Is the Earning potential of student and the quality of education influenced by the type of college they attend?Does this relationship change across the different states in the USA?
Typically the private schools have better infrastructure, facility and funding opportunities, that I would like to study if attending private/public college affect the earning potential at entry level and at much later in the careers.
I would also like to study how a private/public college degree would contribute to aquiring STEM skills and to the betterment of society.
Does factors like gender, tuition-cost, race, income level affect the selection of type of college(public vs private)?
There has always been systemic bias in accessibility to education. Women, students from lower socio-economic backgrounds and minor racial/ethnic groups typically do not have the financial and moral support than those who come from affluent background and from more priveleged racial groups.
I am also interested in performing some sort of exploratory analysis such as
Number of private vs public colleges in the data set considered across the different states in the United States of America
Analyze the change in tuition rates over the last 20 years for different type of colleges and for different type of programs in the United States
Basically my intent is to compare if the students attending private and public college have different career prospects and quality of education and if so, how does the difference look.
Also, I am interested in understanding if gender or race/ethnicity influence in the decision of attending a particular type of school and if location play a role too in college selection.
My expectation of the data is the choice of college is definitely influenced by the socioeconomic status. Most likely, students from lower socioeconomic status would attend colleges that have a reasonable tuition fees and also provides scholarship that will aid them in paying off their tuitions. Students from minor racial/ ethnic groups or students who are first generational learners would probably be inclined to attend colleges that would be have a strong diversity and inclusion policy, lower tuition fees or provide need based scholarships. I would expect % women enrollment to be lesser than % men enrollment in most parts of the USA.
And I think the earning potential of the student may vary at the beginning of their careers i.e. students who attend prestigious private colleges may have a better start in their careers compared to students who attend public colleges but this difference may weaken as they advance in their careers.
The quality of education(making world a better place and stem %) maybe influenced partly by the college one attends but the most important aspect is one’s personal interest and critical thinking skills.
Load the data below and use
dplyr::glimpse()
orskimr::skim()
on the data. You should upload the data file into thedata
directory.
# Load the tidytuesday data
devtools::install_github("thebioengineer/tidytuesdayR")
## Skipping install of 'tidytuesdayR' from a github remote, the SHA1 (51dfba6f) has not changed since last install.
## Use `force = TRUE` to force installation
tuesdata <- tidytuesdayR::tt_load(2020, week = 11)
## --- Compiling #TidyTuesday Information for 2020-03-10 ----
## --- There are 5 files available ---
## --- Starting Download ---
##
## Downloading file 1 of 5: `diversity_school.csv`
## Downloading file 2 of 5: `historical_tuition.csv`
## Downloading file 3 of 5: `salary_potential.csv`
## Downloading file 4 of 5: `tuition_cost.csv`
## Downloading file 5 of 5: `tuition_income.csv`
## --- Download complete ---
The data could be uploaded either in 2 ways, directly access from “tuesdata” into the environment or by writing the data into the folder.
# write the cost data into /data folder
write_excel_csv(tuesdata$tuition_cost,
file=here::here("Data/cost.csv"))
# write the tuition income data into /data folder
write_excel_csv(tuesdata$tuition_income,
file=here::here("Data/income.csv"))
# write the salary data into /data folder
write_excel_csv(tuesdata$salary_potential,
file=here::here("Data/salary.csv"))
# write the historical tuition data into /data folder
write_excel_csv(tuesdata$historical_tuition,
file=here::here("Data/historical_tuition.csv"))
# write the diversity data into /data folder
write_excel_csv(tuesdata$diversity_school,
file=here::here("Data/diversity.csv"))
#alternative way to load data
cost<-tuesdata$tuition_cost
income<-tuesdata$tuition_income
salary<-tuesdata$salary_potential
hist_tuition<-tuesdata$historical_tuition
diversity<-tuesdata$diversity_school
This section would be not be required if we directly loaded the data into the environment.
# read the cost data set from /data folder
cost<-read_csv(here::here("Data/cost.csv"),
na="NA",skip=0)
## Rows: 2973 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, state, state_code, type, degree_length
## dbl (5): room_and_board, in_state_tuition, in_state_total, out_of_state_tuit...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# read the income data set from /data folder
income<-read_csv(here::here("Data/income.csv"),
na="NA",skip=0)
## Rows: 209012 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): name, state, campus, income_lvl
## dbl (3): total_price, year, net_cost
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# read the salary data set from /data folder
salary<-read_csv(here::here("Data/salary.csv"),
na="NA",skip=0)
## Rows: 935 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): name, state_name
## dbl (5): rank, early_career_pay, mid_career_pay, make_world_better_percent, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# read the historical tuition data set from /data folder
hist_tuition<-read_csv(here::here("Data/historical_tuition.csv"), na="NA",skip=0)
## Rows: 270 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): type, year, tuition_type
## dbl (1): tuition_cost
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# read the diversity data set from /data folder
diversity<-read_csv(here::here("Data/diversity.csv"),na="NA", skip=0)
## Rows: 50655 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): name, state, category
## dbl (2): total_enrollment, enrollment
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
If there are any quirks that you have to deal with NA
coded as something else, or it is multiple tables, please make some notes here about what you need to do before you start transforming the data in the next section.
Make sure your data types are correct!
In the following code chunk,we read the tuition income data and use the glimpse and the skim command to get a sense of the data, total number of observations, number of observations and variables,the data type of each variable and the skim also provides mini histogram plots of the continuous variables which gives us an idea about the underlying distribution. Missingness in each variable is also captured by the skim command.
# tuition income data
# verify using glimpse and skim
glimpse(income) # 209,012 observations and 7 variables
## Rows: 209,012
## Columns: 7
## $ name <chr> "Piedmont International University", "Piedmont Internation…
## $ state <chr> "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC"…
## $ total_price <dbl> 20174, 20174, 20174, 20174, 20514, 20514, 20514, 20514, 20…
## $ year <dbl> 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 2018…
## $ campus <chr> "On Campus", "On Campus", "On Campus", "On Campus", "On Ca…
## $ net_cost <dbl> 11475.00, 11451.00, 16229.00, 15592.00, 11668.39, 11643.99…
## $ income_lvl <chr> "0 to 30,000", "30,001 to 48,000", "48_001 to 75,000", "75…
# 4 character variables and 3 numeric variables
skimr::skim(income) # 209,012 observations and 7 variables
Name | income |
Number of rows | 209012 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 6 | 75 | 0 | 3664 | 0 |
state | 0 | 1 | 2 | 2 | 0 | 51 | 0 |
campus | 0 | 1 | 9 | 10 | 0 | 2 | 0 |
income_lvl | 0 | 1 | 11 | 17 | 0 | 5 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
total_price | 0 | 1 | 30102.19 | 13824.70 | 4906 | 19186.00 | 26286.0 | 38831 | 114083.00 | ▇▆▂▁▁ |
year | 0 | 1 | 2014.38 | 2.46 | 2010 | 2012.00 | 2014.0 | 2017 | 2018.00 | ▅▇▃▇▇ |
net_cost | 0 | 1 | 16784.92 | 8887.16 | -15101 | 10148.56 | 16207.1 | 22350 | 95674.84 | ▁▇▁▁▁ |
# 4 character variables and 3 numeric variables
Using the glimpse command, we were able to know that this dataset had 209,012 observations and 7 variables-out of which 4 are character variables and 3 are numeric variables. The variables name,state, campus and income_level are character variables and the remaining variables: total_price, year and net_cost are in the numeric format.
The variables present in the data set are:
I also wanted to plot the histograms separately to get understand the underlying distribution, assess the center/spread/skewness of the distribution.
# additional checks on the distribution of numeric variables
# total price
plot1<-ggplot(data=income,
aes(x=total_price))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(income$total_price,na.rm=TRUE), col = "orange",
lwd = 2)+
annotate("text", x=35000, y=15000, label="Mean of total price",angle=90)+
geom_vline(xintercept=median(income$total_price,na.rm=TRUE), col = "red",
lwd = 2)+
annotate("text", x=22000, y=15000, label="Median of total price",angle=90)+
labs(x="Total price in USD",
y="Count",
title="Histogram of Total price(in USD)")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
# mean exceeds median-> heavily right skewed!!
# net_cost
plot2<-ggplot(data=income,
aes(x=net_cost))+
geom_histogram(fill="light blue")+
geom_vline(xintercept=mean(income$net_cost,
na.rm=TRUE), col = "orange",
lwd = 2)+
annotate("text", x=20000, y=15000, label="Mean of net cost",angle=90)+
geom_vline(xintercept=median(income$net_cost,
na.rm=TRUE),
col = "red",
lwd = 2)+
annotate("text", x=12000, y=15000, label="Median of net cost",angle=90)+
labs(x="Net cost in USD",
y="Count",
title="Histogram of Net cost(in USD)")
# reasonably normal
# mean and median almost coincides!
grid.arrange(plot1,plot2,nrow=2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# check for missing values
sum(is.na(income)) # no missing values
## [1] 0
Some comments on the data
It can be noted that the variable state
is in character format and needs to be converted to factors especially useful while making exploratory plots.
Also,the variable year
is in the numeric format which will have to be converted to a factor format in order to study how the net_cost and total_price varies over the years.
Although the variable campus
is in the categorical format, we are not very concerned about transforming it to the factor format as in the next section, I will be excluding the off-campus type observation since I am only interested in the on-campus observations.
Also,from the tiny histogram plots provided from the output of skim command, the variable total_price data seems to be heavily right skewed. I was particularly interested in the distribution of total_price, the mean exceeds the median and the data is heavily right skewed, typically log transformation is applied to right-skewed data to address non-normality but in this case since the primary interest is to visualize the data and not to perform apply modelling techniques, we will use the data as it is.
The distribution of net cost variable looks reasonably normal, the mean and median almost coincide.
No missing values observed in the tuition income data.
In the following code chunk, we use the glimpse and study commands to get an idea about the tuition cost data.
# tuition cost data
# verify using glimpse and skim
glimpse(cost) # 2973 observations and 10 variables
## Rows: 2,973
## Columns: 10
## $ name <chr> "Aaniiih Nakoda College", "Abilene Christian Univ…
## $ state <chr> "Montana", "Texas", "Georgia", "Minnesota", "Cali…
## $ state_code <chr> "MT", "TX", "GA", "MN", "CA", "CO", "NY", "NY", "…
## $ type <chr> "Public", "Private", "Public", "For Profit", "For…
## $ degree_length <chr> "2 Year", "4 Year", "2 Year", "2 Year", "4 Year",…
## $ room_and_board <dbl> NA, 10350, 8474, NA, 16648, 8782, 16030, 11660, 1…
## $ in_state_tuition <dbl> 2380, 34850, 4128, 17661, 27810, 9440, 38660, 537…
## $ in_state_total <dbl> 2380, 45200, 12602, 17661, 44458, 18222, 54690, 1…
## $ out_of_state_tuition <dbl> 2380, 34850, 12550, 17661, 27810, 20456, 38660, 9…
## $ out_of_state_total <dbl> 2380, 45200, 21024, 17661, 44458, 29238, 54690, 2…
# 5 character variables and 5 numeric variables
skimr::skim(cost) # 2973 observations and 10 variables
Name | cost |
Number of rows | 2973 |
Number of columns | 10 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1.00 | 8 | 67 | 0 | 2938 | 0 |
state | 52 | 0.98 | 4 | 14 | 0 | 50 | 0 |
state_code | 0 | 1.00 | 2 | 2 | 0 | 55 | 0 |
type | 0 | 1.00 | 5 | 10 | 0 | 4 | 0 |
degree_length | 0 | 1.00 | 5 | 6 | 0 | 3 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
room_and_board | 1094 | 0.63 | 10095.28 | 3288.55 | 30 | 7935 | 10000 | 12424.5 | 21300 | ▁▅▇▃▁ |
in_state_tuition | 0 | 1.00 | 16491.29 | 14773.84 | 480 | 4890 | 10099 | 27124.0 | 59985 | ▇▂▂▁▁ |
in_state_total | 0 | 1.00 | 22871.73 | 18948.39 | 962 | 5802 | 17669 | 35960.0 | 75003 | ▇▅▂▂▁ |
out_of_state_tuition | 0 | 1.00 | 20532.73 | 13255.65 | 480 | 9552 | 17486 | 29208.0 | 59985 | ▇▆▅▂▁ |
out_of_state_total | 0 | 1.00 | 26913.16 | 17719.73 | 1376 | 11196 | 23214 | 39054.0 | 75003 | ▇▅▅▂▁ |
# 5 character variables and 5 numeric variables
Based on the output of glimpse and skim command,2973 observations and 10 variables-out of which 5 are character variables and 5 are numeric variables. The variables name
,state
,state_code
,type
and degree_length
are character variables and the remaining variables: room_and_board
, in_state_tuition
,in_state_total
,out_of_state_tuition
, out_of_state_total
are in the numeric format.
The variables present in the data set are:
The skim command not only provides the total # of observations, total # of variables and their data type’s but also provides a peek of the distribution of the continuous variables by their small histogram plots. I also tried plotting histograms using ggplots.
Numeric variables
# histogram of in-state tuition expenses
ggplot(data=cost,
aes(x=in_state_tuition))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(cost$in_state_tuition
,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=18000, y=200, label="Mean",angle=90)+
geom_vline(xintercept=median(cost$in_state_tuition,
na.rm=TRUE), col = "red",
lwd = 2)+
annotate("text", x=8000, y=200, label="Median",angle=90)+
labs(x="In state tuition expenses in USD",
y="Count",
title="Histogram of instate tuition expenses(in USD)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# histogram of out-state tuition expenses
ggplot(data=cost,
aes(x=out_of_state_tuition))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(cost$out_of_state_tuition,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=23000, y=100, label="Mean",angle=90)+
geom_vline(xintercept=median(cost$out_of_state_tuition,na.rm=TRUE),
col = "red",
lwd = 2)+
annotate("text", x=15000, y=100, label="Median",angle=90)+
labs(x="Out of state tuition expenses in USD",
y="Count",
title="Histogram of out of state tuition expenses(in USD)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# histogram of in-state total expenses
ggplot(data=cost,
aes(x=in_state_total))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(cost$in_state_total
,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=24000, y=150, label="Mean",angle=90)+
geom_vline(xintercept=median(cost$in_state_total,
na.rm=TRUE), col = "red",
lwd = 2)+
annotate("text", x=16000, y=150, label="Median",angle=90)+
labs(x="In state total expenses in USD",
y="Count",
title="Histogram of instate total expenses(in USD)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# histogram of out-state total expenses
ggplot(data=cost,
aes(x=out_of_state_total))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(cost$out_of_state_total,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=29000, y=100, label="Mean",angle=90)+
geom_vline(xintercept=median(cost$out_of_state_total,na.rm=TRUE),
col = "red",
lwd = 2)+
annotate("text", x=20000, y=100, label="Median",angle=90)+
labs(x="Out of state total expenses in USD",
y="Count",
title="Histogram of out of state total expenses(in USD)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# histogram of room and board expenses
ggplot(data=cost,
aes(x=room_and_board))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(cost$room_and_board
,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=11000, y=50, label="Mean",angle=90)+
geom_vline(xintercept=median(cost$room_and_board,
na.rm=TRUE), col = "red",
lwd = 2)+
annotate("text", x=9000, y=50, label="Median",angle=90)+
labs(x="Room and Board expenses in USD",
y="Count",
title="Histogram of Room and board expenses(in USD)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1094 rows containing non-finite values (`stat_bin()`).
From the histogram plots,the distribution of the variable room_and_board
is reasonably normal, where mean and median coincides.
The distribution of the variables: in_state_tuition
,in_state_total
, out_of_state_tuition
and out_of_state_total
is heavily right skewed with the mean exceeding the median values.
Character variables
There are 52 observations with missing values for the state variable and 1094 missing values for room and board, we assess the NA values for the state variable in the data transformation section.
We retain the missing values for room and board, since our question of interest is not related to room and board
and those missing observations in room and board
have complete observations for the other variables that I am interested in exploring further.
# filter out the state with NA's to understand why they are missing!
cost_na<-cost%>%filter(is.na(state))
# looks like the state names are missing however the state codes are present
# get the different state codes and counts of the states with missing names!
tabyl(cost_na$state_code)
## cost_na$state_code n percent
## AS 1 0.01923077
## DC 8 0.15384615
## GU 1 0.01923077
## PR 41 0.78846154
## VI 1 0.01923077
# based on the state code's fill the name of the state using mutate command
cost<-cost%>%
mutate(state.upd=case_when(!is.na(state)~state,
is.na(state)&state_code=="AS"~"American Samoa",
is.na(state)&state_code=="DC"~"District of Columbia",
is.na(state)&state_code=="GU"~"Guam",
is.na(state)&state_code=="PR"~"Peurto Rico",
is.na(state)&state_code=="VI"~"Virgin Islands",
TRUE~"NA"))
#verify
tabyl(cost$state.upd) # no NA's
## cost$state.upd n percent
## Alabama 54 0.0181634712
## Alaska 6 0.0020181635
## American Samoa 1 0.0003363606
## Arizona 34 0.0114362597
## Arkansas 46 0.0154725866
## California 254 0.0854355869
## Colorado 38 0.0127817020
## Connecticut 36 0.0121089808
## Delaware 9 0.0030272452
## District of Columbia 8 0.0026908846
## Florida 88 0.0295997309
## Georgia 79 0.0265724857
## Guam 1 0.0003363606
## Hawaii 14 0.0047090481
## Idaho 13 0.0043726875
## Illinois 125 0.0420450723
## Indiana 62 0.0208543559
## Iowa 52 0.0174907501
## Kansas 52 0.0174907501
## Kentucky 44 0.0147998655
## Louisiana 34 0.0114362597
## Maine 27 0.0090817356
## Maryland 45 0.0151362260
## Massachusetts 93 0.0312815338
## Michigan 78 0.0262361251
## Minnesota 71 0.0238816011
## Mississippi 32 0.0107635385
## Missouri 73 0.0245543222
## Montana 22 0.0073999327
## Nebraska 33 0.0110998991
## Nevada 10 0.0033636058
## New Hampshire 21 0.0070635721
## New Jersey 54 0.0181634712
## New Mexico 24 0.0080726539
## New York 221 0.0743356879
## North Carolina 117 0.0393541877
## North Dakota 18 0.0060544904
## Ohio 127 0.0427177935
## Oklahoma 40 0.0134544231
## Oregon 40 0.0134544231
## Pennsylvania 160 0.0538176926
## Peurto Rico 41 0.0137907837
## Rhode Island 11 0.0036999664
## South Carolina 57 0.0191725530
## South Dakota 18 0.0060544904
## Tennessee 62 0.0208543559
## Texas 150 0.0504540868
## Utah 14 0.0047090481
## Vermont 19 0.0063908510
## Virgin Islands 1 0.0003363606
## Virginia 79 0.0265724857
## Washington 60 0.0201816347
## West Virginia 30 0.0100908174
## Wisconsin 67 0.0225361588
## Wyoming 8 0.0026908846
# use the state.upd column going forward but rename it to state
cost<-cost%>%
select(-state)%>%
mutate(state=state.upd)%>%
select(-state.upd)
# rearrange the columns, state before state_code
cost<-cost%>%relocate("state",.after="name")
It was observed those 52 missing values of state was having complete observations for the state_code with which I was able to address the missing values in state.
# check the different levels in the variable type, degree length and state
unique(cost$type) # 4 types: public,private, not for profit and others
## [1] "Public" "Private" "For Profit" "Other"
unique(cost$degree_length) # 2 year, 4 year and others
## [1] "2 Year" "4 Year" "Other"
unique(cost$state) #55 unique tates
## [1] "Montana" "Texas" "Georgia"
## [4] "Minnesota" "California" "Colorado"
## [7] "New York" "Michigan" "Virginia"
## [10] "Florida" "South Carolina" "Alabama"
## [13] "North Carolina" "Alaska" "Connecticut"
## [16] "Pennsylvania" "Mississippi" "West Virginia"
## [19] "Maryland" "Ohio" "Iowa"
## [22] "Kansas" "Wisconsin" "Illinois"
## [25] "Tennessee" "Arizona" "Massachusetts"
## [28] "American Samoa" "District of Columbia" "Peurto Rico"
## [31] "Indiana" "Washington" "Arkansas"
## [34] "Kentucky" "New Jersey" "South Dakota"
## [37] "Missouri" "Maine" "Louisiana"
## [40] "Nebraska" "Vermont" "North Dakota"
## [43] "Oregon" "Idaho" "Utah"
## [46] "Hawaii" "Rhode Island" "Oklahoma"
## [49] "Nevada" "Wyoming" "New Mexico"
## [52] "New Hampshire" "Delaware" "Guam"
## [55] "Virgin Islands"
# can also tabyl command to obtain the unique levels and their corresponding counts
tabyl(cost$type)
## cost$type n percent
## For Profit 107 0.0359905819
## Other 1 0.0003363606
## Private 1281 0.4308779011
## Public 1584 0.5327951564
tabyl(cost$degree_length)
## cost$degree_length n percent
## 2 Year 1120 0.3767238480
## 4 Year 1852 0.6229397915
## Other 1 0.0003363606
tabyl(cost$state)
## cost$state n percent
## Alabama 54 0.0181634712
## Alaska 6 0.0020181635
## American Samoa 1 0.0003363606
## Arizona 34 0.0114362597
## Arkansas 46 0.0154725866
## California 254 0.0854355869
## Colorado 38 0.0127817020
## Connecticut 36 0.0121089808
## Delaware 9 0.0030272452
## District of Columbia 8 0.0026908846
## Florida 88 0.0295997309
## Georgia 79 0.0265724857
## Guam 1 0.0003363606
## Hawaii 14 0.0047090481
## Idaho 13 0.0043726875
## Illinois 125 0.0420450723
## Indiana 62 0.0208543559
## Iowa 52 0.0174907501
## Kansas 52 0.0174907501
## Kentucky 44 0.0147998655
## Louisiana 34 0.0114362597
## Maine 27 0.0090817356
## Maryland 45 0.0151362260
## Massachusetts 93 0.0312815338
## Michigan 78 0.0262361251
## Minnesota 71 0.0238816011
## Mississippi 32 0.0107635385
## Missouri 73 0.0245543222
## Montana 22 0.0073999327
## Nebraska 33 0.0110998991
## Nevada 10 0.0033636058
## New Hampshire 21 0.0070635721
## New Jersey 54 0.0181634712
## New Mexico 24 0.0080726539
## New York 221 0.0743356879
## North Carolina 117 0.0393541877
## North Dakota 18 0.0060544904
## Ohio 127 0.0427177935
## Oklahoma 40 0.0134544231
## Oregon 40 0.0134544231
## Pennsylvania 160 0.0538176926
## Peurto Rico 41 0.0137907837
## Rhode Island 11 0.0036999664
## South Carolina 57 0.0191725530
## South Dakota 18 0.0060544904
## Tennessee 62 0.0208543559
## Texas 150 0.0504540868
## Utah 14 0.0047090481
## Vermont 19 0.0063908510
## Virgin Islands 1 0.0003363606
## Virginia 79 0.0265724857
## Washington 60 0.0201816347
## West Virginia 30 0.0100908174
## Wisconsin 67 0.0225361588
## Wyoming 8 0.0026908846
I will be converting the character variables type
, state
, state_code
to factor format in the next section. I am not too concerned about the converting the degree_length
variable to factor format as I will be only retaining 4 year degree_length observations(in the next section) which is more pertianing to my research question of interest.
# salary potential data
# verify using glimpse and skim
glimpse(salary) # 935 observations and 7 variables
## Rows: 935
## Columns: 7
## $ rank <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ name <chr> "Auburn University", "University of Alabama …
## $ state_name <chr> "Alabama", "Alabama", "Alabama", "Alabama", …
## $ early_career_pay <dbl> 54400, 57500, 52300, 54500, 48400, 46600, 49…
## $ mid_career_pay <dbl> 104500, 103900, 97400, 93500, 90500, 89100, …
## $ make_world_better_percent <dbl> 51, 59, 50, 61, 52, 53, 48, 57, 56, 58, 60, …
## $ stem_percent <dbl> 31, 45, 15, 30, 3, 12, 27, 17, 17, 20, 8, 7,…
# 2 character variables and 5 numeric variables
skimr::skim(salary) # 935 observations and 7 variables
Name | salary |
Number of rows | 935 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 6 | 64 | 0 | 934 | 0 |
state_name | 0 | 1 | 4 | 14 | 0 | 50 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
rank | 0 | 1.00 | 11.48 | 7.07 | 1 | 5 | 11 | 17 | 25 | ▇▇▆▆▅ |
early_career_pay | 0 | 1.00 | 51359.68 | 8365.51 | 32500 | 46000 | 50000 | 55500 | 91200 | ▂▇▂▁▁ |
mid_career_pay | 0 | 1.00 | 92805.78 | 15856.48 | 60100 | 81650 | 89900 | 100650 | 158200 | ▃▇▃▁▁ |
make_world_better_percent | 33 | 0.96 | 53.88 | 9.19 | 33 | 48 | 52 | 58 | 94 | ▂▇▃▁▁ |
stem_percent | 0 | 1.00 | 16.96 | 15.40 | 0 | 7 | 13 | 23 | 100 | ▇▃▁▁▁ |
# 2 character variables and 5 numeric variables
From the glimpse and skim command, we were able to observed that the salary potential data set had 935 observations and 7 variables out of which name
and state_name
variables are character variables and the rest of the variables:early_career_pay
,mid_career_pay
,make_world_better_percent
and stem_percent
. rank
are numeric.
I also used ggplots to get a better idea about the distribution of the continuous variables.
# histogram of early career pay
ggplot(data=salary,
aes(x=early_career_pay))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(salary$early_career_pay
,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=53000, y=30, label="Mean",angle=90)+
geom_vline(xintercept=median(salary$early_career_pay,
na.rm=TRUE), col = "red",
lwd = 2)+
annotate("text", x=48000, y=30,label="Median",angle=90)+
labs(x="Early career pay in USD",
y="Count",
title="Histogram of early career pay(in USD)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# histogram of mid career pay
ggplot(data=salary,
aes(x=mid_career_pay))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(salary$mid_career_pay
,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=95000, y=30, label="Mean",angle=90)+
geom_vline(xintercept=median(salary$mid_career_pay,
na.rm=TRUE), col = "red",
lwd = 2)+
annotate("text", x=85000, y=30,label="Median",angle=90)+
labs(x="Mid career pay in USD",
y="Count",
title="Histogram of mid career pay(in USD)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# histogram of make world a better percent
ggplot(data=salary,
aes(x=make_world_better_percent))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(salary$make_world_better_percent
,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=55, y=25, label="Mean",angle=90)+
geom_vline(xintercept=median(salary$make_world_better_percent,
na.rm=TRUE), col = "red",
lwd = 2)+
annotate("text", x=50, y=25,label="Median",angle=90)+
labs(x="Make world better in percent",
title="Histogram of make world better")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 33 rows containing non-finite values (`stat_bin()`).
# histogram of stem
ggplot(data=salary,
aes(x=stem_percent))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(salary$stem_percent
,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=20, y=50, label="Mean",angle=90)+
geom_vline(xintercept=median(salary$stem_percent,
na.rm=TRUE), col = "red",
lwd = 2)+
annotate("text", x=10, y=50,label="Median",angle=90)+
labs(x="Stem percent",
title="Histogram of stem (in percent)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# not-normal, right skew is observed!
shapiro.test(salary$early_career_pay)
##
## Shapiro-Wilk normality test
##
## data: salary$early_career_pay
## W = 0.94447, p-value < 2.2e-16
shapiro.test(salary$mid_career_pay)
##
## Shapiro-Wilk normality test
##
## data: salary$mid_career_pay
## W = 0.94412, p-value < 2.2e-16
shapiro.test(salary$make_world_better_percent)
##
## Shapiro-Wilk normality test
##
## data: salary$make_world_better_percent
## W = 0.93619, p-value < 2.2e-16
shapiro.test(salary$stem_percent)
##
## Shapiro-Wilk normality test
##
## data: salary$stem_percent
## W = 0.81815, p-value < 2.2e-16
From the histogram plots, the right skew is more prominently seen in the stem_percent
, mid_career_pay
, early_career_pay
and in make_world_better_percent
.
The variable state_name
is in character format,I will convert it to factor format in the next section.
# hist tuition data
# verify using glimpse and skim
glimpse(hist_tuition) # 270 observations and 4 variables
## Rows: 270
## Columns: 4
## $ type <chr> "All Institutions", "All Institutions", "All Institutions…
## $ year <chr> "1985-86", "1985-86", "1985-86", "1985-86", "1985-86", "1…
## $ tuition_type <chr> "All Constant", "4 Year Constant", "2 Year Constant", "Al…
## $ tuition_cost <dbl> 10893, 12274, 7508, 4885, 5504, 3367, 13822, 16224, 7421,…
# 3 character variables and 1 numeric variables
skimr::skim(hist_tuition) # 270 observations and 4 variables
Name | hist_tuition |
Number of rows | 270 |
Number of columns | 4 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
type | 0 | 1 | 6 | 16 | 0 | 3 | 0 |
year | 0 | 1 | 7 | 7 | 0 | 19 | 0 |
tuition_type | 0 | 1 | 11 | 15 | 0 | 6 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
tuition_cost | 0 | 1 | 17597.37 | 9201.45 | 2981 | 9796.5 | 16983 | 23574 | 41468 | ▇▆▇▂▂ |
# 3 character variables and 1 numeric variables
In the historical tuition dataset, there are 270 observations with 4 variables out of which 1 is a numeric variable and the remaining three are character variables
The histogram plot of the continuous variable is plotted below:
# histogram of tuition cost
ggplot(data=hist_tuition,
aes(x=tuition_cost))+
geom_histogram(fill="lightblue")+
geom_vline(xintercept=mean(hist_tuition$tuition_cost
,na.rm=TRUE),
col = "orange",
lwd = 2)+
annotate("text", x=19000, y=10, label="Mean",angle=90)+
geom_vline(xintercept=median(hist_tuition$tuition_cost,
na.rm=TRUE), col = "red",
lwd = 2)+
annotate("text", x=16000, y=10,label="Median",angle=90)+
labs(x="Tuition cost(in USD)",
y="count",
title="Histogram of tuition cost (in USD)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# just to verify
shapiro.test(hist_tuition$tuition_cost) #p<0.05, not normal
##
## Shapiro-Wilk normality test
##
## data: hist_tuition$tuition_cost
## W = 0.94332, p-value = 1.071e-08
The variable tuition_cost
seems is bi-modal and right skewed, this is verified by using shapiro test for normality. Since the primary focus of the project is to visualize the data and not to use modelling techniques, I will be using the data as it is without any transformation.
The variables type
, tuition type
and year
are in character format, I will be converting them into factor format in the next section. Also, I will be including only the observations that have “4 year current” and “4 year constant” tuition_type
since I am am interested in understanding how the public/private institutions have different tuition cost for the 4 year degree programs.
# for the character variables
tabyl(hist_tuition$tuition_type) # 6 types
## hist_tuition$tuition_type n percent
## 2 Year Constant 45 0.1666667
## 2 Year Current 45 0.1666667
## 4 Year Constant 45 0.1666667
## 4 Year Current 45 0.1666667
## All Constant 45 0.1666667
## All Current 45 0.1666667
tabyl(hist_tuition$type) # 3 types: private,public and all institutions
## hist_tuition$type n percent
## All Institutions 114 0.4222222
## Private 78 0.2888889
## Public 78 0.2888889
tabyl(hist_tuition$year) # different years starting from 1985 to 1986, 1995-96, 2000 to 2017(in a year increments)
## hist_tuition$year n percent
## 1985-86 18 0.06666667
## 1995-96 18 0.06666667
## 2000-01 18 0.06666667
## 2001-02 18 0.06666667
## 2002-03 18 0.06666667
## 2003-04 18 0.06666667
## 2004-05 18 0.06666667
## 2005-06 18 0.06666667
## 2006-07 6 0.02222222
## 2007-08 6 0.02222222
## 2008-09 6 0.02222222
## 2009-10 12 0.04444444
## 2010-11 6 0.02222222
## 2011-12 6 0.02222222
## 2012-13 12 0.04444444
## 2013-14 18 0.06666667
## 2014-15 18 0.06666667
## 2015-16 18 0.06666667
## 2016-17 18 0.06666667
There are 3 type of colleges: public, private and all institutions; historical tuition is collected for the years 1985-86, 1995-1996 and for the years 2000 to 2017 at each year increments. There are 6 levels in the variable tuition_type
: 2 year current, 2 year constant, 4 year current , 4 year constant, all current and all constant.
# hist tuition data
# verify using glimpse and skim
glimpse(diversity) # 50,655 observations and 5 variables
## Rows: 50,655
## Columns: 5
## $ name <chr> "University of Phoenix-Arizona", "University of Phoen…
## $ total_enrollment <dbl> 195059, 195059, 195059, 195059, 195059, 195059, 19505…
## $ state <chr> "Arizona", "Arizona", "Arizona", "Arizona", "Arizona"…
## $ category <chr> "Women", "American Indian / Alaska Native", "Asian", …
## $ enrollment <dbl> 134722, 876, 1959, 31455, 13984, 1019, 58209, 19039, …
# 3 character variables and 2 numeric variables
skimr::skim(diversity) # 50,655 observations and 5 variables
Name | diversity |
Number of rows | 50655 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 341 | 0.99 | 5 | 84 | 0 | 4574 | 0 |
state | 341 | 0.99 | 4 | 14 | 0 | 50 | 0 |
category | 0 | 1.00 | 5 | 34 | 0 | 11 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
total_enrollment | 0 | 1 | 4386.69 | 8217.53 | 1 | 391 | 1391 | 4504 | 195059 | ▇▁▁▁▁ |
enrollment | 0 | 1 | 774.19 | 2545.01 | 0 | 6 | 59 | 379 | 134722 | ▇▁▁▁▁ |
# 3 character variables and 2 numeric variables
Notes on the data
It can be noted that the variables category
and state
are character variables, which I will convert to factor format in the next section . It is observed that there are 341 missing values in the variables name
and state
which would be addressed in the data transformation section.
# for the categorical variables
diversity_cat_table<-tabyl(diversity$category)
# gender, race and ethnicity variables are all lumped into one column, numbers of only women are available, numbers of men need to be calculated by subtracting from total enrollment, and the long format data needs to be converted to wide format data with one column each for each gender and each race/ethnic groups!
diversity_cat_table
## diversity$category n percent
## American Indian / Alaska Native 4605 0.09090909
## Asian 4605 0.09090909
## Black 4605 0.09090909
## Hispanic 4605 0.09090909
## Native Hawaiian / Pacific Islander 4605 0.09090909
## Non-Resident Foreign 4605 0.09090909
## Total Minority 4605 0.09090909
## Two Or More Races 4605 0.09090909
## Unknown 4605 0.09090909
## White 4605 0.09090909
## Women 4605 0.09090909
# check the number of total observations
sum(diversity_cat_table$n) # 50655 observation, tallies with the number of rows in the dataset and no NA observations!
## [1] 50655
Although the histogram plots of the enrollment
and total_enrollment
imply that the data is right-skewed, since the enrollment and total_enrollment are lumped for the different gender, racial and ethnic groups, the histogram plots wouldn’t be very meaningful in this case.
If the data needs to be transformed in any way (values recoded, pivoted, etc), do it here. Examples include transforming a continuous variable into a categorical using
case_when()
, etc.
Bonus points (5 points) for datasets that require merging of tables, but only if you reason through whether you should use
left_join
,inner_join
, orright_join
on these tables. No credit will be provided if you don’t.
Show your transformed table here. Use tools such as
glimpse()
,skim()
orhead()
to illustrate your point.
Are the values what you expected for the variables? Why or Why not?
From the earlier section, it was noted that the income levels
, year
and state
variables are all in the character format, it was required to be transformed to factor format in order to use them in visualization and creating summary tables.
# check the number and names of the different levels in the categorical variable
income_tab<-tabyl(income$income_lvl)# 5 categories
# verify if there are any missing values
sum(income_tab$n) #209012
## [1] 209012
nrow(income) #209012, no missing values
## [1] 209012
# convert income levels,year and state to factor format
income<-income%>%
mutate(income_level=case_when(income_lvl=="0 to 30,000"~1,
income_lvl=="30,001 to 48,000"~2,
income_lvl=="48,001 to 75,000"~3,
income_lvl=="75,001 to 110,000"~4,
TRUE~5),
income_level=factor(income_level,
levels=c(1,2,3,4,5),
labels(c("Less than $30,000",
"Between $30,001 and $48,000",
"Between $48,001 and $75,000",
"Between $75,001 to $110,000",
"Over $110,000"))),
year=as.factor(year),
state=as.factor(state))
# drop income_lvl
income<-income%>%
select(-income_lvl)
#verify
skimr::skim(income) # 3 factor variables,income_level,year and state is created
Name | income |
Number of rows | 209012 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 2 |
factor | 3 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 6 | 75 | 0 | 3664 | 0 |
campus | 0 | 1 | 9 | 10 | 0 | 2 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 51 | NY: 16152, CA: 14104, PA: 13436, TX: 11133 |
year | 0 | 1 | FALSE | 9 | 201: 27845, 201: 26390, 201: 24574, 201: 24547 |
income_level | 0 | 1 | FALSE | 4 | 5: 80256, 1: 44969, 2: 43384, 4: 40403 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
total_price | 0 | 1 | 30102.19 | 13824.70 | 4906 | 19186.00 | 26286.0 | 38831 | 114083.00 | ▇▆▂▁▁ |
net_cost | 0 | 1 | 16784.92 | 8887.16 | -15101 | 10148.56 | 16207.1 | 22350 | 95674.84 | ▁▇▁▁▁ |
Since, I am interested only in the on-campus type programs to answer my research question of interest, I excluded the off campus type observations from the dataset.
nrow(income) #209012 observations
## [1] 209012
income_on<-income%>%
filter(campus=="On Campus")
skimr::skim(income_on) #75391 observations
Name | income_on |
Number of rows | 75391 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 2 |
factor | 3 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 6 | 75 | 0 | 2043 | 0 |
campus | 0 | 1 | 9 | 9 | 0 | 1 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 51 | NY: 6661, PA: 5133, CA: 4314, TX: 4156 |
year | 0 | 1 | FALSE | 8 | 201: 9990, 201: 9800, 201: 9415, 201: 9375 |
income_level | 0 | 1 | FALSE | 4 | 5: 29665, 1: 15483, 2: 15323, 4: 14920 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
total_price | 0 | 1 | 33244.63 | 15145.87 | 4906 | 20714 | 30466.00 | 44278.00 | 96636.00 | ▇▇▅▁▁ |
net_cost | 0 | 1 | 18193.89 | 8559.27 | -15101 | 12181 | 17577.91 | 23128.32 | 95674.84 | ▁▇▁▁▁ |
The transformed income data set has 75391 observations(limited only to on-campus type observations ) and 7 variables and the variables state
, year
, income_lvl
are converted to factor format.
It was noted that there were 52 missing observations for state
variable and in the following code chunk, I have assessed the reason for missingness and manupulated data in order to address the missigness.
# filter out the state with NA's to understand why they are missing!
cost_na<-cost%>%filter(is.na(state))
# looks like the state names are missing however the state codes are present
# get the different state codes and counts of the states with missing names!
tabyl(cost_na$state_code)
## [1] cost_na$state_code n percent
## <0 rows> (or 0-length row.names)
# based on the state code's fill the name of the state using mutate command
cost<-cost%>%
mutate(state.upd=case_when(!is.na(state)~state,
is.na(state)&state_code=="AS"~"American Samoa",
is.na(state)&state_code=="DC"~"District of Columbia",
is.na(state)&state_code=="GU"~"Guam",
is.na(state)&state_code=="PR"~"Peurto Rico",
is.na(state)&state_code=="VI"~"Virgin Islands",
TRUE~"NA"))
#verify
tabyl(cost$state.upd) # no NA's
## cost$state.upd n percent
## Alabama 54 0.0181634712
## Alaska 6 0.0020181635
## American Samoa 1 0.0003363606
## Arizona 34 0.0114362597
## Arkansas 46 0.0154725866
## California 254 0.0854355869
## Colorado 38 0.0127817020
## Connecticut 36 0.0121089808
## Delaware 9 0.0030272452
## District of Columbia 8 0.0026908846
## Florida 88 0.0295997309
## Georgia 79 0.0265724857
## Guam 1 0.0003363606
## Hawaii 14 0.0047090481
## Idaho 13 0.0043726875
## Illinois 125 0.0420450723
## Indiana 62 0.0208543559
## Iowa 52 0.0174907501
## Kansas 52 0.0174907501
## Kentucky 44 0.0147998655
## Louisiana 34 0.0114362597
## Maine 27 0.0090817356
## Maryland 45 0.0151362260
## Massachusetts 93 0.0312815338
## Michigan 78 0.0262361251
## Minnesota 71 0.0238816011
## Mississippi 32 0.0107635385
## Missouri 73 0.0245543222
## Montana 22 0.0073999327
## Nebraska 33 0.0110998991
## Nevada 10 0.0033636058
## New Hampshire 21 0.0070635721
## New Jersey 54 0.0181634712
## New Mexico 24 0.0080726539
## New York 221 0.0743356879
## North Carolina 117 0.0393541877
## North Dakota 18 0.0060544904
## Ohio 127 0.0427177935
## Oklahoma 40 0.0134544231
## Oregon 40 0.0134544231
## Pennsylvania 160 0.0538176926
## Peurto Rico 41 0.0137907837
## Rhode Island 11 0.0036999664
## South Carolina 57 0.0191725530
## South Dakota 18 0.0060544904
## Tennessee 62 0.0208543559
## Texas 150 0.0504540868
## Utah 14 0.0047090481
## Vermont 19 0.0063908510
## Virgin Islands 1 0.0003363606
## Virginia 79 0.0265724857
## Washington 60 0.0201816347
## West Virginia 30 0.0100908174
## Wisconsin 67 0.0225361588
## Wyoming 8 0.0026908846
# use the state.upd column going forward but rename it to state
cost<-cost%>%
select(-state)%>%
mutate(state=state.upd)%>%
select(-state.upd)
# rearrange the columns, state before state_code
cost<-cost%>%relocate("state",.after="name")
# verify
skimr::skim(cost) # no missing values in state variable
Name | cost |
Number of rows | 2973 |
Number of columns | 10 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 8 | 67 | 0 | 2938 | 0 |
state | 0 | 1 | 4 | 20 | 0 | 55 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 55 | 0 |
type | 0 | 1 | 5 | 10 | 0 | 4 | 0 |
degree_length | 0 | 1 | 5 | 6 | 0 | 3 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
room_and_board | 1094 | 0.63 | 10095.28 | 3288.55 | 30 | 7935 | 10000 | 12424.5 | 21300 | ▁▅▇▃▁ |
in_state_tuition | 0 | 1.00 | 16491.29 | 14773.84 | 480 | 4890 | 10099 | 27124.0 | 59985 | ▇▂▂▁▁ |
in_state_total | 0 | 1.00 | 22871.73 | 18948.39 | 962 | 5802 | 17669 | 35960.0 | 75003 | ▇▅▂▂▁ |
out_of_state_tuition | 0 | 1.00 | 20532.73 | 13255.65 | 480 | 9552 | 17486 | 29208.0 | 59985 | ▇▆▅▂▁ |
out_of_state_total | 0 | 1.00 | 26913.16 | 17719.73 | 1376 | 11196 | 23214 | 39054.0 | 75003 | ▇▅▅▂▁ |
It was observed those 52 missing values of state was having complete observations for the state_code with which I was able to address the missing values in state.
The variablesstate
and type
are converted to factor formats.
cost<-cost%>%mutate(across(.cols=c(state,type),.fns=as.factor))
skimr::skim(cost) # 2 factor variables are created
Name | cost |
Number of rows | 2973 |
Number of columns | 10 |
_______________________ | |
Column type frequency: | |
character | 3 |
factor | 2 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 8 | 67 | 0 | 2938 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 55 | 0 |
degree_length | 0 | 1 | 5 | 6 | 0 | 3 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 55 | Cal: 254, New: 221, Pen: 160, Tex: 150 |
type | 0 | 1 | FALSE | 4 | Pub: 1584, Pri: 1281, For: 107, Oth: 1 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
room_and_board | 1094 | 0.63 | 10095.28 | 3288.55 | 30 | 7935 | 10000 | 12424.5 | 21300 | ▁▅▇▃▁ |
in_state_tuition | 0 | 1.00 | 16491.29 | 14773.84 | 480 | 4890 | 10099 | 27124.0 | 59985 | ▇▂▂▁▁ |
in_state_total | 0 | 1.00 | 22871.73 | 18948.39 | 962 | 5802 | 17669 | 35960.0 | 75003 | ▇▅▂▂▁ |
out_of_state_tuition | 0 | 1.00 | 20532.73 | 13255.65 | 480 | 9552 | 17486 | 29208.0 | 59985 | ▇▆▅▂▁ |
out_of_state_total | 0 | 1.00 | 26913.16 | 17719.73 | 1376 | 11196 | 23214 | 39054.0 | 75003 | ▇▅▅▂▁ |
State and type variables are converted to factor formats.
Since I wanted to limit the scope of the data set, I am primarily interested in 4 years degree observations to answer my research question of interest.
cost_4yr<-cost%>%filter(degree_length=="4 Year")
nrow(cost_4yr) #1852 observations
## [1] 1852
# assess the NA observations in room and board
cost_room_na<-cost_4yr%>%filter(is.na(room_and_board))
nrow(cost_room_na) # 245 missing values for room and board but the in-state and out of state tuition values are still present
## [1] 245
#final cost dataset
cost_final<-cost_4yr
#verify
skimr::skim(cost_final)
Name | cost_final |
Number of rows | 1852 |
Number of columns | 10 |
_______________________ | |
Column type frequency: | |
character | 3 |
factor | 2 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 8 | 66 | 0 | 1829 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 53 | 0 |
degree_length | 0 | 1 | 6 | 6 | 0 | 1 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 53 | New: 163, Cal: 135, Pen: 129, Tex: 82 |
type | 0 | 1 | FALSE | 3 | Pri: 1209, Pub: 597, For: 46, Oth: 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
room_and_board | 245 | 0.87 | 10624.67 | 3070.93 | 950 | 8617.00 | 10400 | 12713.0 | 21300 | ▁▅▇▃▁ |
in_state_tuition | 0 | 1.00 | 23212.36 | 14664.22 | 480 | 9932.25 | 19960 | 33854.5 | 59985 | ▇▅▆▃▂ |
in_state_total | 0 | 1.00 | 32431.50 | 17533.60 | 1430 | 18199.00 | 28287 | 44845.5 | 75003 | ▃▇▅▃▂ |
out_of_state_tuition | 0 | 1.00 | 26910.22 | 12630.54 | 480 | 17683.75 | 25392 | 34863.5 | 59985 | ▂▇▇▃▂ |
out_of_state_total | 0 | 1.00 | 36129.35 | 15994.29 | 1430 | 24951.00 | 34888 | 46670.0 | 75003 | ▂▆▇▅▂ |
The final cost data set has only 1852 observations limited to only observations with 4 yr degree_length programs.
salary<-salary%>%
rename(state=state_name)%>%
mutate(state=as.factor(state))
#verify
skimr::skim(salary)
Name | salary |
Number of rows | 935 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 1 |
factor | 1 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 6 | 64 | 0 | 934 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 50 | Ala: 25, Cal: 25, Flo: 25, Geo: 25 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
rank | 0 | 1.00 | 11.48 | 7.07 | 1 | 5 | 11 | 17 | 25 | ▇▇▆▆▅ |
early_career_pay | 0 | 1.00 | 51359.68 | 8365.51 | 32500 | 46000 | 50000 | 55500 | 91200 | ▂▇▂▁▁ |
mid_career_pay | 0 | 1.00 | 92805.78 | 15856.48 | 60100 | 81650 | 89900 | 100650 | 158200 | ▃▇▃▁▁ |
make_world_better_percent | 33 | 0.96 | 53.88 | 9.19 | 33 | 48 | 52 | 58 | 94 | ▂▇▃▁▁ |
stem_percent | 0 | 1.00 | 16.96 | 15.40 | 0 | 7 | 13 | 23 | 100 | ▇▃▁▁▁ |
State_name
column was renamed to State
and it was converted to a factor format. We retain the NA observations of make_world_better_percent
as it is since those observations have complete values for the other earning potential and quality of education variables.
hist_tuition<-hist_tuition%>%
mutate(across(where(is.character),as.factor))
#verify
skimr::skim(hist_tuition)
Name | hist_tuition |
Number of rows | 270 |
Number of columns | 4 |
_______________________ | |
Column type frequency: | |
factor | 3 |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
type | 0 | 1 | FALSE | 3 | All: 114, Pri: 78, Pub: 78 |
year | 0 | 1 | FALSE | 19 | 198: 18, 199: 18, 200: 18, 200: 18 |
tuition_type | 0 | 1 | FALSE | 6 | 2 Y: 45, 2 Y: 45, 4 Y: 45, 4 Y: 45 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
tuition_cost | 0 | 1 | 17597.37 | 9201.45 | 2981 | 9796.5 | 16983 | 23574 | 41468 | ▇▆▇▂▂ |
It was noted that year, type and tuition_type were all in the character format, they were converted to the factor format which would help in data visualization and summarization.
It was observed that there were 341 missing values in the variables name
and state
, the next step would be to explore the reason behind the missingness, we could either remove the NA observations in the state
and name
or group the NA observations into a separate category.
diversity_na<-diversity%>%filter(is.na(name)|is.name(state))
#341 observations
# implies that the same observations had missing values for both the name and state variable
# remove the NA observations from the data set since there is no way to merge with the other data sets, both the university name and state values are missing!
total_na<-length(unique(diversity_na$total_enrollment))
total_na # 30 distinct colleges
## [1] 30
It can be observed that the 341 missing observations have missing values for the name and state, by assessing the distinct entries of the total enrollment variable, these missing observations are from 30 different colleges.
If I had to assign them to a category called “NA”, it may not be a great option in this case as we would lump the enrollment from the 30 colleges with missing observations. Thus, I will be removing the NA observations from the diversity data set.
diversity<-diversity%>%filter(!is.na(name)|!is.na(state))
#50314 observations, 341 observations are removed!
skimr::skim(diversity) # 50314 observations and 5 variables, 3 character and 2 numeric variables,no missing values noted
Name | diversity |
Number of rows | 50314 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 5 | 84 | 0 | 4574 | 0 |
state | 0 | 1 | 4 | 14 | 0 | 50 | 0 |
category | 0 | 1 | 5 | 34 | 0 | 11 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
total_enrollment | 0 | 1 | 4392.17 | 8227.65 | 1 | 395 | 1394 | 4504 | 195059 | ▇▁▁▁▁ |
enrollment | 0 | 1 | 775.26 | 2548.21 | 0 | 6 | 59 | 381 | 134722 | ▇▁▁▁▁ |
Thus after excluding the NA observations in the diversity data, there were 50314 observations with 5 variables , 3 of them are character variables and the remaining 2 are numeric variables.
In the diversity data, it was observed that the data was in long-format where the gender and race categories were all lumped in one column. My next step in data transformation is to create separate columns for the Women, Men and for each of the race-ethnic groups i.e. we would be converting the long format of data to the wide-format.
There was also no separate row for the Men enrollment in the original diversity data, so we will have to subtract the women enrollment from the total enrollment in order to obtain the men enrollment for each college, this is easier to implement when the data is in long format.
Also, when we are comparing enrollment in different types of college across different states in the USA, we will have to account for the fact that different states have different population/student population as well.
For example, states like Texas, California, Florida, New York would certainly have more number of students enrolling in colleges than states like Rhode Island, New Hampshire,Wyoming.
It wouldn’t be appropriate to just report the number of students from different gender/ different racial/ethnic groups attending public/ private institutions across the different states, instead it would be reasonable to report the proportion of enrollment for each of the gender and racial ethnic groups. This is achieved by creating new columns for proportions for enrollment per college for Male, Female, Asian, Black, White, American Indian/Alaskan native, Native Hawaiian/ Pacific islander, Two or more races,unknown, non-resident foreign,total minority and Hispanic groups. I will also be creating additional column for enrollment of resident students.
# convert the variables state in factor format
diversity<-diversity%>%
mutate(state=as.factor(state))
skimr::skim(diversity)
Name | diversity |
Number of rows | 50314 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
character | 2 |
factor | 1 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 5 | 84 | 0 | 4574 | 0 |
category | 0 | 1 | 5 | 34 | 0 | 11 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 50 | Cal: 5027, New: 3344, Tex: 2915, Pen: 2739 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
total_enrollment | 0 | 1 | 4392.17 | 8227.65 | 1 | 395 | 1394 | 4504 | 195059 | ▇▁▁▁▁ |
enrollment | 0 | 1 | 775.26 | 2548.21 | 0 | 6 | 59 | 381 | 134722 | ▇▁▁▁▁ |
# convert long to wide data format
diversity_wide<-pivot_wider(diversity,names_from=category,values_from=enrollment)
#check the data type!
skimr::skim(diversity_wide) #4574 observations and 14 variables,12 are numeric and 1 is factor and 1 is character variable
Name | diversity_wide |
Number of rows | 4574 |
Number of columns | 14 |
_______________________ | |
Column type frequency: | |
character | 1 |
factor | 1 |
numeric | 12 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 5 | 84 | 0 | 4574 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 50 | Cal: 457, New: 304, Tex: 265, Pen: 249 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
total_enrollment | 0 | 1 | 4392.17 | 8228.47 | 1 | 395.25 | 1394.0 | 4504.00 | 195059 | ▇▁▁▁▁ |
Women | 0 | 1 | 2480.10 | 4708.32 | 0 | 220.25 | 815.0 | 2705.50 | 134722 | ▇▁▁▁▁ |
American Indian / Alaska Native | 0 | 1 | 31.71 | 111.37 | 0 | 1.00 | 6.0 | 23.00 | 2659 | ▇▁▁▁▁ |
Asian | 0 | 1 | 250.96 | 830.57 | 0 | 5.00 | 24.0 | 122.75 | 11257 | ▇▁▁▁▁ |
Black | 0 | 1 | 569.42 | 1390.84 | 0 | 37.00 | 135.0 | 460.75 | 31455 | ▇▁▁▁▁ |
Hispanic | 0 | 1 | 662.05 | 2028.28 | 0 | 19.00 | 90.0 | 365.75 | 44870 | ▇▁▁▁▁ |
Native Hawaiian / Pacific Islander | 0 | 1 | 11.96 | 39.95 | 0 | 0.00 | 2.0 | 8.00 | 1019 | ▇▁▁▁▁ |
White | 0 | 1 | 2303.23 | 4259.66 | 0 | 140.00 | 691.0 | 2528.00 | 61498 | ▇▁▁▁▁ |
Two Or More Races | 0 | 1 | 129.50 | 392.62 | 0 | 3.00 | 24.0 | 103.00 | 19039 | ▇▁▁▁▁ |
Unknown | 0 | 1 | 234.40 | 1207.42 | 0 | 6.00 | 43.0 | 196.00 | 65163 | ▇▁▁▁▁ |
Non-Resident Foreign | 0 | 1 | 198.94 | 704.63 | 0 | 0.00 | 9.0 | 84.00 | 11495 | ▇▁▁▁▁ |
Total Minority | 0 | 1 | 1655.60 | 3707.10 | 0 | 128.25 | 388.5 | 1335.00 | 68332 | ▇▁▁▁▁ |
typeof(diversity_wide) # list
## [1] "list"
# add a column for the enrollment for Men using mutate command
diversity_wide<-diversity_wide%>%mutate(Men=(total_enrollment-Women),Resident=((total_enrollment)-(diversity_wide$`Non-Resident Foreign`)))
# verify!
skimr::skim(diversity_wide) #14 columns
Name | diversity_wide |
Number of rows | 4574 |
Number of columns | 16 |
_______________________ | |
Column type frequency: | |
character | 1 |
factor | 1 |
numeric | 14 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 5 | 84 | 0 | 4574 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 50 | Cal: 457, New: 304, Tex: 265, Pen: 249 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
total_enrollment | 0 | 1 | 4392.17 | 8228.47 | 1 | 395.25 | 1394.0 | 4504.00 | 195059 | ▇▁▁▁▁ |
Women | 0 | 1 | 2480.10 | 4708.32 | 0 | 220.25 | 815.0 | 2705.50 | 134722 | ▇▁▁▁▁ |
American Indian / Alaska Native | 0 | 1 | 31.71 | 111.37 | 0 | 1.00 | 6.0 | 23.00 | 2659 | ▇▁▁▁▁ |
Asian | 0 | 1 | 250.96 | 830.57 | 0 | 5.00 | 24.0 | 122.75 | 11257 | ▇▁▁▁▁ |
Black | 0 | 1 | 569.42 | 1390.84 | 0 | 37.00 | 135.0 | 460.75 | 31455 | ▇▁▁▁▁ |
Hispanic | 0 | 1 | 662.05 | 2028.28 | 0 | 19.00 | 90.0 | 365.75 | 44870 | ▇▁▁▁▁ |
Native Hawaiian / Pacific Islander | 0 | 1 | 11.96 | 39.95 | 0 | 0.00 | 2.0 | 8.00 | 1019 | ▇▁▁▁▁ |
White | 0 | 1 | 2303.23 | 4259.66 | 0 | 140.00 | 691.0 | 2528.00 | 61498 | ▇▁▁▁▁ |
Two Or More Races | 0 | 1 | 129.50 | 392.62 | 0 | 3.00 | 24.0 | 103.00 | 19039 | ▇▁▁▁▁ |
Unknown | 0 | 1 | 234.40 | 1207.42 | 0 | 6.00 | 43.0 | 196.00 | 65163 | ▇▁▁▁▁ |
Non-Resident Foreign | 0 | 1 | 198.94 | 704.63 | 0 | 0.00 | 9.0 | 84.00 | 11495 | ▇▁▁▁▁ |
Total Minority | 0 | 1 | 1655.60 | 3707.10 | 0 | 128.25 | 388.5 | 1335.00 | 68332 | ▇▁▁▁▁ |
Men | 0 | 1 | 1912.07 | 3663.95 | 0 | 129.00 | 567.5 | 1867.00 | 60337 | ▇▁▁▁▁ |
Resident | 0 | 1 | 4193.22 | 7786.88 | 1 | 382.00 | 1359.0 | 4356.50 | 191704 | ▇▁▁▁▁ |
# check for further missing values
sum(is.na(diversity_wide))
## [1] 0
Since the data set considered is very large, I have considered a set of inclusion-exclusion criterias to obtain a reasonable sized data set to answer my research question of interest.
I am primarily interested only in the on-campus programs and 4 year degree type programs obtained only from private and public colleges.This implies, I will be excluding observations from:
-> 2 year degree length program -> off-campus programs ->“for profit” and “other” type of colleges
I will filter out the observations that are aligned with my inclusion criteria`s after I perform the merge operation between tuition_cost, tuition_income,salary and diversity datasets in the subsequent sections. I have stated the reasons as to why I chose this exclusion critera in the next sections.
Since I am interested in studying if the type of college attended would influence the earning potential and quality of education, I will have to merge 2 datasets: tuition cost
and salary potential
using left join.
The tuition cost
data has the information about the name of the college, the state in which the college is located and the type of the college(“public”,“private”,“For Profit”, “Other” ) and the degree length and the salary_potential
data consists of two earning potential indicators-early_career_pay
and mid_career_pay
and two quality of education indicators- make_world_better_percent
and stem_percent
.
# make sure the variable names are clean before merging
diversity_wide<-clean_names(diversity_wide)
salary<-clean_names(salary)
cost<-clean_names(cost)
# merge using left join
merged1<-merge(x=cost,y=salary,by=c("name","state"),all.x=TRUE)
skimr::skim(merged1)
Name | merged1 |
Number of rows | 2973 |
Number of columns | 15 |
_______________________ | |
Column type frequency: | |
character | 3 |
factor | 2 |
numeric | 10 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 8 | 67 | 0 | 2938 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 55 | 0 |
degree_length | 0 | 1 | 5 | 6 | 0 | 3 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 55 | Cal: 254, New: 221, Pen: 160, Tex: 150 |
type | 0 | 1 | FALSE | 4 | Pub: 1584, Pri: 1281, For: 107, Oth: 1 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
room_and_board | 1094 | 0.63 | 10095.28 | 3288.55 | 30 | 7935 | 10000 | 12424.5 | 21300 | ▁▅▇▃▁ |
in_state_tuition | 0 | 1.00 | 16491.29 | 14773.84 | 480 | 4890 | 10099 | 27124.0 | 59985 | ▇▂▂▁▁ |
in_state_total | 0 | 1.00 | 22871.73 | 18948.39 | 962 | 5802 | 17669 | 35960.0 | 75003 | ▇▅▂▂▁ |
out_of_state_tuition | 0 | 1.00 | 20532.73 | 13255.65 | 480 | 9552 | 17486 | 29208.0 | 59985 | ▇▆▅▂▁ |
out_of_state_total | 0 | 1.00 | 26913.16 | 17719.73 | 1376 | 11196 | 23214 | 39054.0 | 75003 | ▇▅▅▂▁ |
rank | 2380 | 0.20 | 11.81 | 7.03 | 1 | 6 | 12 | 18.0 | 25 | ▇▇▇▆▅ |
early_career_pay | 2380 | 0.20 | 50892.58 | 7929.03 | 32500 | 45900 | 49700 | 54500.0 | 88800 | ▂▇▂▁▁ |
mid_career_pay | 2380 | 0.20 | 91941.82 | 15055.51 | 60100 | 81500 | 89600 | 99700.0 | 158200 | ▃▇▃▁▁ |
make_world_better_percent | 2401 | 0.19 | 53.83 | 8.81 | 33 | 48 | 53 | 59.0 | 94 | ▂▇▃▁▁ |
stem_percent | 2380 | 0.20 | 16.63 | 14.78 | 0 | 7 | 14 | 22.0 | 97 | ▇▃▁▁▁ |
skimr::skim(merged1) # missingness rate is very high in the variables are interested
Name | merged1 |
Number of rows | 2973 |
Number of columns | 15 |
_______________________ | |
Column type frequency: | |
character | 3 |
factor | 2 |
numeric | 10 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 8 | 67 | 0 | 2938 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 55 | 0 |
degree_length | 0 | 1 | 5 | 6 | 0 | 3 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 55 | Cal: 254, New: 221, Pen: 160, Tex: 150 |
type | 0 | 1 | FALSE | 4 | Pub: 1584, Pri: 1281, For: 107, Oth: 1 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
room_and_board | 1094 | 0.63 | 10095.28 | 3288.55 | 30 | 7935 | 10000 | 12424.5 | 21300 | ▁▅▇▃▁ |
in_state_tuition | 0 | 1.00 | 16491.29 | 14773.84 | 480 | 4890 | 10099 | 27124.0 | 59985 | ▇▂▂▁▁ |
in_state_total | 0 | 1.00 | 22871.73 | 18948.39 | 962 | 5802 | 17669 | 35960.0 | 75003 | ▇▅▂▂▁ |
out_of_state_tuition | 0 | 1.00 | 20532.73 | 13255.65 | 480 | 9552 | 17486 | 29208.0 | 59985 | ▇▆▅▂▁ |
out_of_state_total | 0 | 1.00 | 26913.16 | 17719.73 | 1376 | 11196 | 23214 | 39054.0 | 75003 | ▇▅▅▂▁ |
rank | 2380 | 0.20 | 11.81 | 7.03 | 1 | 6 | 12 | 18.0 | 25 | ▇▇▇▆▅ |
early_career_pay | 2380 | 0.20 | 50892.58 | 7929.03 | 32500 | 45900 | 49700 | 54500.0 | 88800 | ▂▇▂▁▁ |
mid_career_pay | 2380 | 0.20 | 91941.82 | 15055.51 | 60100 | 81500 | 89600 | 99700.0 | 158200 | ▃▇▃▁▁ |
make_world_better_percent | 2401 | 0.19 | 53.83 | 8.81 | 33 | 48 | 53 | 59.0 | 94 | ▂▇▃▁▁ |
stem_percent | 2380 | 0.20 | 16.63 | 14.78 | 0 | 7 | 14 | 22.0 | 97 | ▇▃▁▁▁ |
# analyzing the trends in missingness
# check the counts in each category of the following variables
tabyl(merged1$degree_length)
## merged1$degree_length n percent
## 2 Year 1120 0.3767238480
## 4 Year 1852 0.6229397915
## Other 1 0.0003363606
tabyl(merged1$type)
## merged1$type n percent
## For Profit 107 0.0359905819
## Other 1 0.0003363606
## Private 1281 0.4308779011
## Public 1584 0.5327951564
# filter only the NA observations in the quality of education and salary potential variables
merged1_na<-merged1%>%filter(is.na(early_career_pay)&is.na(mid_career_pay)&is.na(make_world_better_percent)&is.na(stem_percent)) # 2380 observations with missing NA in all the 4 variables
# check the counts in each category of the following variables
tabyl(merged1_na$degree_length) # almost all the 2 year degree length data has NA observations, for the 4 year degree length, 1261 out of 1852 observations are missing
## merged1_na$degree_length n percent
## 2 Year 1118 0.4697478992
## 4 Year 1261 0.5298319328
## Other 1 0.0004201681
tabyl(merged1_na$type)# around 907 observations out of 1281 private college observations are missing, 1365 public out of 1584 observations are missing, all the for profit and other college observations are missing!
## merged1_na$type n percent
## For Profit 107 0.0449579832
## Other 1 0.0004201681
## Private 907 0.3810924370
## Public 1365 0.5735294118
# final merged data set
# thus from the missingness analysis, we can omit the NA observations if they were NA in all the 4 variables, since there was only one complete observation for the 2 year degree data, I have excluded the 2 year degree length observations from my analysis
merged1_final<-merged1%>%filter(!is.na(early_career_pay)|!is.na(mid_career_pay)|!is.na(make_world_better_percent)|!is.na(stem_percent))%>%
filter(degree_length!="2 Year"|type!="For Profit"|type!="Other") #593 observations
# only 593 observations that can be used to answer our question of interest
tabyl(merged1_final$type)
## merged1_final$type n percent
## For Profit 0 0.0000000
## Other 0 0.0000000
## Private 374 0.6306914
## Public 219 0.3693086
# verify
skimr::skim(merged1_final)
Name | merged1_final |
Number of rows | 593 |
Number of columns | 15 |
_______________________ | |
Column type frequency: | |
character | 3 |
factor | 2 |
numeric | 10 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 11 | 56 | 0 | 593 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 40 | 0 |
degree_length | 0 | 1 | 6 | 6 | 0 | 2 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 40 | Ken: 24, Ala: 22, Geo: 22, Iow: 22 |
type | 0 | 1 | FALSE | 2 | Pri: 374, Pub: 219, For: 0, Oth: 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
room_and_board | 29 | 0.95 | 10730.87 | 2738.66 | 1698 | 8892 | 10353 | 12702.5 | 18127 | ▁▃▇▅▂ |
in_state_tuition | 0 | 1.00 | 26355.69 | 16289.35 | 4118 | 10032 | 25600 | 40652.0 | 58230 | ▇▂▃▃▃ |
in_state_total | 0 | 1.00 | 36561.78 | 18618.05 | 4258 | 19392 | 34280 | 52192.0 | 75003 | ▆▇▆▆▃ |
out_of_state_tuition | 0 | 1.00 | 30826.91 | 12861.58 | 4118 | 20800 | 28898 | 41010.0 | 58230 | ▂▇▇▅▃ |
out_of_state_total | 0 | 1.00 | 41033.00 | 15511.94 | 6670 | 29504 | 38783 | 52856.0 | 75003 | ▂▇▇▆▃ |
rank | 0 | 1.00 | 11.81 | 7.03 | 1 | 6 | 12 | 18.0 | 25 | ▇▇▇▆▅ |
early_career_pay | 0 | 1.00 | 50892.58 | 7929.03 | 32500 | 45900 | 49700 | 54500.0 | 88800 | ▂▇▂▁▁ |
mid_career_pay | 0 | 1.00 | 91941.82 | 15055.51 | 60100 | 81500 | 89600 | 99700.0 | 158200 | ▃▇▃▁▁ |
make_world_better_percent | 21 | 0.96 | 53.83 | 8.81 | 33 | 48 | 53 | 59.0 | 94 | ▂▇▃▁▁ |
stem_percent | 0 | 1.00 | 16.63 | 14.78 | 0 | 7 | 14 | 22.0 | 97 | ▇▃▁▁▁ |
# merge using left join
merged2<-merge(x=cost,y=diversity_wide,by=c("name","state"),all.x=TRUE)
skimr::skim(merged2)
Name | merged2 |
Number of rows | 2973 |
Number of columns | 24 |
_______________________ | |
Column type frequency: | |
character | 3 |
factor | 2 |
numeric | 19 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 8 | 67 | 0 | 2938 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 55 | 0 |
degree_length | 0 | 1 | 5 | 6 | 0 | 3 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 55 | Cal: 254, New: 221, Pen: 160, Tex: 150 |
type | 0 | 1 | FALSE | 4 | Pub: 1584, Pri: 1281, For: 107, Oth: 1 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
room_and_board | 1094 | 0.63 | 10095.28 | 3288.55 | 30 | 7935.00 | 10000.0 | 12424.50 | 21300 | ▁▅▇▃▁ |
in_state_tuition | 0 | 1.00 | 16491.29 | 14773.84 | 480 | 4890.00 | 10099.0 | 27124.00 | 59985 | ▇▂▂▁▁ |
in_state_total | 0 | 1.00 | 22871.73 | 18948.39 | 962 | 5802.00 | 17669.0 | 35960.00 | 75003 | ▇▅▂▂▁ |
out_of_state_tuition | 0 | 1.00 | 20532.73 | 13255.65 | 480 | 9552.00 | 17486.0 | 29208.00 | 59985 | ▇▆▅▂▁ |
out_of_state_total | 0 | 1.00 | 26913.16 | 17719.73 | 1376 | 11196.00 | 23214.0 | 39054.00 | 75003 | ▇▅▅▂▁ |
total_enrollment | 819 | 0.72 | 6185.05 | 8270.69 | 15 | 1351.50 | 3132.0 | 7623.00 | 81459 | ▇▁▁▁▁ |
women | 819 | 0.72 | 3486.93 | 4631.49 | 0 | 772.75 | 1881.0 | 4381.00 | 48329 | ▇▁▁▁▁ |
american_indian_alaska_native | 819 | 0.72 | 46.25 | 138.14 | 0 | 4.00 | 14.0 | 39.00 | 2659 | ▇▁▁▁▁ |
asian | 819 | 0.72 | 321.60 | 839.51 | 0 | 14.00 | 55.0 | 222.00 | 10381 | ▇▁▁▁▁ |
black | 819 | 0.72 | 774.88 | 1478.91 | 0 | 74.00 | 243.5 | 827.50 | 18914 | ▇▁▁▁▁ |
hispanic | 819 | 0.72 | 933.05 | 2436.55 | 0 | 47.00 | 175.0 | 688.75 | 44870 | ▇▁▁▁▁ |
native_hawaiian_pacific_islander | 819 | 0.72 | 14.56 | 36.60 | 0 | 1.00 | 4.0 | 13.00 | 653 | ▇▁▁▁▁ |
white | 819 | 0.72 | 3347.52 | 4385.80 | 0 | 729.50 | 1838.5 | 4119.00 | 41898 | ▇▁▁▁▁ |
two_or_more_races | 819 | 0.72 | 175.40 | 290.96 | 0 | 18.00 | 62.0 | 193.00 | 2801 | ▇▁▁▁▁ |
unknown | 819 | 0.72 | 299.74 | 919.59 | 0 | 25.00 | 105.0 | 295.50 | 27213 | ▇▁▁▁▁ |
non_resident_foreign | 819 | 0.72 | 272.05 | 797.11 | 0 | 5.00 | 38.5 | 160.00 | 11495 | ▇▁▁▁▁ |
total_minority | 819 | 0.72 | 2265.74 | 3996.32 | 0 | 283.25 | 766.5 | 2441.00 | 56561 | ▇▁▁▁▁ |
men | 819 | 0.72 | 2698.13 | 3769.05 | 0 | 540.25 | 1283.0 | 3277.50 | 33130 | ▇▁▁▁▁ |
resident | 819 | 0.72 | 5913.01 | 7802.33 | 11 | 1298.00 | 3046.0 | 7315.75 | 80423 | ▇▁▁▁▁ |
# create new columns to compute the % gender, % race and ethnicity groups in each of the colleges
merged2<-merged2%>%mutate(across(.cols=women:resident,.fns=~((.x/total_enrollment)*100),.names ="percent_{.col}"))
#check the # of entries in the type and degree_length
tabyl(merged2$type)
## merged2$type n percent
## For Profit 107 0.0359905819
## Other 1 0.0003363606
## Private 1281 0.4308779011
## Public 1584 0.5327951564
tabyl(merged2$degree_length)
## merged2$degree_length n percent
## 2 Year 1120 0.3767238480
## 4 Year 1852 0.6229397915
## Other 1 0.0003363606
# retain only 4 year degree length observations
merged2.upd<-merged2%>%
filter(degree_length=="4 Year"&(type=="Private"|type=="Public")) #1806 observations
#check the # of entries in the type and degree_length
tabyl(merged2.upd$type)
## merged2.upd$type n percent
## For Profit 0 0.0000000
## Other 0 0.0000000
## Private 1209 0.6694352
## Public 597 0.3305648
tabyl(merged2.upd$degree_length)
## merged2.upd$degree_length n percent
## 4 Year 1806 1
# check for NA's
sum(is.na(merged2.upd$total_enrollment)) #542 observations
## [1] 542
# remove those NA observations
merged2.final<-merged2.upd%>%
filter(!is.na(total_enrollment))# 1264 observations
# relocate the men column next to women column
merged2.final<-merged2.final%>%relocate("percent_men",.after="percent_women")
#verify!
skimr::skim(merged2.final)
Name | merged2.final |
Number of rows | 1264 |
Number of columns | 37 |
_______________________ | |
Column type frequency: | |
character | 3 |
factor | 2 |
numeric | 32 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 11 | 58 | 0 | 1264 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 50 | 0 |
degree_length | 0 | 1 | 6 | 6 | 0 | 1 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 50 | New: 84, Pen: 82, Cal: 81, Tex: 66 |
type | 0 | 1 | FALSE | 2 | Pri: 906, Pub: 358, For: 0, Oth: 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
room_and_board | 100 | 0.92 | 10633.34 | 2998.55 | 950.00 | 8634.00 | 10359.00 | 12714.50 | 21300.00 | ▁▅▇▃▁ |
in_state_tuition | 0 | 1.00 | 25218.35 | 14996.93 | 480.00 | 10849.00 | 24375.00 | 36180.50 | 59985.00 | ▇▆▇▅▂ |
in_state_total | 0 | 1.00 | 35010.44 | 17662.86 | 1430.00 | 19454.75 | 32742.00 | 48450.00 | 75003.00 | ▃▇▆▅▂ |
out_of_state_tuition | 0 | 1.00 | 28369.28 | 12797.39 | 480.00 | 18629.50 | 27124.00 | 36683.50 | 59985.00 | ▂▇▇▃▂ |
out_of_state_total | 0 | 1.00 | 38161.37 | 15809.88 | 1430.00 | 26802.00 | 36709.50 | 48899.50 | 75003.00 | ▂▆▇▅▂ |
total_enrollment | 0 | 1.00 | 5807.69 | 8620.18 | 20.00 | 1142.50 | 2523.00 | 6304.75 | 81459.00 | ▇▁▁▁▁ |
women | 0 | 1.00 | 3249.67 | 4687.62 | 0.00 | 652.00 | 1502.00 | 3671.25 | 48329.00 | ▇▁▁▁▁ |
american_indian_alaska_native | 0 | 1.00 | 32.90 | 105.22 | 0.00 | 3.00 | 9.00 | 26.00 | 1754.00 | ▇▁▁▁▁ |
asian | 0 | 1.00 | 301.61 | 851.85 | 0.00 | 13.00 | 55.00 | 187.25 | 10381.00 | ▇▁▁▁▁ |
black | 0 | 1.00 | 658.51 | 1219.67 | 0.00 | 67.00 | 196.50 | 669.00 | 14751.00 | ▇▁▁▁▁ |
hispanic | 0 | 1.00 | 573.88 | 1633.40 | 0.00 | 41.00 | 139.00 | 458.00 | 31211.00 | ▇▁▁▁▁ |
native_hawaiian_pacific_islander | 0 | 1.00 | 11.09 | 29.67 | 0.00 | 0.75 | 3.00 | 9.00 | 429.00 | ▇▁▁▁▁ |
white | 0 | 1.00 | 3374.73 | 4968.27 | 0.00 | 594.50 | 1517.50 | 3738.75 | 41898.00 | ▇▁▁▁▁ |
two_or_more_races | 0 | 1.00 | 165.70 | 296.36 | 0.00 | 17.00 | 56.00 | 167.00 | 2801.00 | ▇▁▁▁▁ |
unknown | 0 | 1.00 | 294.58 | 1083.90 | 0.00 | 25.00 | 106.00 | 290.25 | 27213.00 | ▇▁▁▁▁ |
non_resident_foreign | 0 | 1.00 | 394.69 | 987.26 | 0.00 | 16.00 | 73.00 | 269.00 | 11495.00 | ▇▁▁▁▁ |
total_minority | 0 | 1.00 | 1743.70 | 3189.62 | 0.00 | 230.75 | 585.00 | 1803.00 | 39763.00 | ▇▁▁▁▁ |
men | 0 | 1.00 | 2558.02 | 4037.99 | 0.00 | 463.25 | 1014.00 | 2675.50 | 33130.00 | ▇▁▁▁▁ |
resident | 0 | 1.00 | 5413.00 | 7933.87 | 11.00 | 1100.50 | 2462.50 | 5911.25 | 80423.00 | ▇▁▁▁▁ |
percent_women | 0 | 1.00 | 56.09 | 15.59 | 0.00 | 51.15 | 57.36 | 63.16 | 100.00 | ▁▁▇▅▁ |
percent_men | 0 | 1.00 | 43.91 | 15.59 | 0.00 | 36.84 | 42.64 | 48.85 | 100.00 | ▁▅▇▁▁ |
percent_american_indian_alaska_native | 0 | 1.00 | 1.02 | 5.98 | 0.00 | 0.15 | 0.31 | 0.59 | 100.00 | ▇▁▁▁▁ |
percent_asian | 0 | 1.00 | 3.63 | 4.97 | 0.00 | 0.95 | 1.94 | 4.44 | 60.99 | ▇▁▁▁▁ |
percent_black | 0 | 1.00 | 13.99 | 20.58 | 0.00 | 3.67 | 6.59 | 14.11 | 100.00 | ▇▁▁▁▁ |
percent_hispanic | 0 | 1.00 | 7.87 | 9.17 | 0.00 | 2.75 | 5.18 | 9.24 | 92.98 | ▇▁▁▁▁ |
percent_native_hawaiian_pacific_islander | 0 | 1.00 | 0.21 | 0.66 | 0.00 | 0.01 | 0.10 | 0.21 | 15.57 | ▇▁▁▁▁ |
percent_white | 0 | 1.00 | 60.02 | 22.94 | 0.00 | 48.81 | 65.47 | 75.93 | 100.00 | ▂▂▃▇▃ |
percent_two_or_more_races | 0 | 1.00 | 2.51 | 2.11 | 0.00 | 1.24 | 2.26 | 3.26 | 28.77 | ▇▁▁▁▁ |
percent_unknown | 0 | 1.00 | 5.41 | 6.52 | 0.00 | 1.25 | 3.38 | 7.31 | 62.89 | ▇▁▁▁▁ |
percent_non_resident_foreign | 0 | 1.00 | 5.33 | 8.07 | 0.00 | 1.04 | 2.83 | 6.57 | 96.62 | ▇▁▁▁▁ |
percent_total_minority | 0 | 1.00 | 29.24 | 21.97 | 0.00 | 14.73 | 23.20 | 35.22 | 100.00 | ▇▇▂▁▁ |
percent_resident | 0 | 1.00 | 94.67 | 8.07 | 3.38 | 93.43 | 97.17 | 98.96 | 100.00 | ▁▁▁▁▇ |
# convert the wide to a long data
merged2_gender_long<-pivot_longer(data=merged2.final,
cols=c("percent_men","percent_women"),
names_to="percent_gender",
values_to="enrollment")
# remove the string "percent_" in the merged2_long columns
merged2_gender_long <-merged2_gender_long %>%
mutate(gender= str_remove_all(percent_gender, "percent_"))
#keep the columns that are required to plot compare gender
gender<-merged2_gender_long%>%
select(name,state,state_code,type, degree_length,gender, enrollment, total_enrollment)
#verify
skimr::skim(gender)
Name | gender |
Number of rows | 2528 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 4 |
factor | 2 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 11 | 58 | 0 | 1264 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 50 | 0 |
degree_length | 0 | 1 | 6 | 6 | 0 | 1 | 0 |
gender | 0 | 1 | 3 | 5 | 0 | 2 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 50 | New: 168, Pen: 164, Cal: 162, Tex: 132 |
type | 0 | 1 | FALSE | 2 | Pri: 1812, Pub: 716, For: 0, Oth: 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
enrollment | 0 | 1 | 50.00 | 16.73 | 0 | 40.95 | 50 | 59.05 | 100 | ▁▂▇▂▁ |
total_enrollment | 0 | 1 | 5807.69 | 8618.48 | 20 | 1142.50 | 2523 | 6304.75 | 81459 | ▇▁▁▁▁ |
# convert the wide to a long data
merged2_race_long<-pivot_longer(data=merged2.final,
cols=percent_american_indian_alaska_native:percent_resident,
names_to="percent_race",
values_to="enrollment")
# remove the string "percent_" in the merged2_long columns
merged2_race_long <-merged2_race_long %>%
mutate(race= str_remove_all(percent_race, "percent_"))
#keep the columns that are required to plot compare gender
race<-merged2_race_long%>%
select(name,state,state_code,type, degree_length,race, enrollment, total_enrollment)
#verify
skimr::skim(race)
Name | race |
Number of rows | 13904 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 4 |
factor | 2 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 11 | 58 | 0 | 1264 | 0 |
state_code | 0 | 1 | 2 | 2 | 0 | 50 | 0 |
degree_length | 0 | 1 | 6 | 6 | 0 | 1 | 0 |
race | 0 | 1 | 5 | 32 | 0 | 11 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
state | 0 | 1 | FALSE | 50 | New: 924, Pen: 902, Cal: 891, Tex: 726 |
type | 0 | 1 | FALSE | 2 | Pri: 9966, Pub: 3938, For: 0, Oth: 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
enrollment | 0 | 1 | 20.36 | 31.55 | 0 | 0.81 | 4.07 | 21.58 | 100 | ▇▁▁▁▁ |
total_enrollment | 0 | 1 | 5807.69 | 8617.08 | 20 | 1142.50 | 2523.00 | 6304.75 | 81459 | ▇▁▁▁▁ |
When I was working with the individual dataset and merging the datasets, I came across a lot of NA observations, it took me a while for me to figure out if they had a pattern to it or they were missing at random. For example the 2 year degree type observations had most of the observations missing for early career pay, mid career pay, stem % and make world better percent % , so then I had to apply a bunch of inclusion/exclusion criterias.
Also, reshaping the diversity data was a bit of challenge since the original data was in long format with all the gender and race variables lumped into one column, I had to convert to wide format in order to use them to create summary tables and then later for making boxplots and bar plots, I needed a long format data for which I had to reshape the data to a long format again.
I think after performing all the data transformation and merging of the datasets, I am happy with the outcome.
Use
group_by()
andsummarize()
to make a summary of the data here. The summary should be relevant to your research question
What are your findings about the summary? Are they what you expected?
Make at least two plots that help you answer your question on the transformed or summarized data. Use scales and/or labels to make each plot informative.
To answer out first research question of interest,
->we will to calculate the median early career pay, median mid career pay for the private/public colleges across different states of USA, this would provide an idea about the earning potential of a student at different stages in their career.
-> also to compare the quality of education, we will be comparing the median stem % and median make world better % between private and public colleges across different states in USA.
->apart from calculating the median of the variable of interest, it is also important to understand the spread of the distribution, for which we calculate the minimum and maximum values and a boxplot is plotted in the data visualization section
-> I created a comprehensive table which included the minimum,median and maximum value of early career pay, mid career pay, stem percent and make world better percent across private and public colleges in the different states in U.S.A and also created 4 separate tables one for each variable of interest.
# create a summary table which calculates the min,median and max of the early_career_pay, mid_career_pay, stem %, make world better %
merged1_stats<-merged1_final%>%
group_by(state_code,type)%>%
summarise(number_of_colleges=n(),
across(early_career_pay:stem_percent,~min(.x, na.rm=TRUE),.names = "minimum_{.col}"),
across(early_career_pay:stem_percent,~median(.x, na.rm=TRUE),.names = "median_{.col}"),
across(early_career_pay:stem_percent,~max(.x,na.rm=TRUE),.names = "maximum_{.col}"))
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
merged1_stats%>%gt::gt()
type | number_of_colleges | minimum_early_career_pay | minimum_mid_career_pay | minimum_make_world_better_percent | minimum_stem_percent | median_early_career_pay | median_mid_career_pay | median_make_world_better_percent | median_stem_percent | maximum_early_career_pay | maximum_mid_career_pay | maximum_make_world_better_percent | maximum_stem_percent |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AK | |||||||||||||
Private | 1 | 50300 | 90000 | 67 | 6 | 50300 | 90000 | 67.0 | 6.0 | 50300 | 90000 | 67 | 6 |
Public | 2 | 56000 | 101000 | 54 | 10 | 57550 | 101400 | 55.5 | 15.0 | 59100 | 101800 | 57 | 20 |
AL | |||||||||||||
Private | 10 | 34800 | 65100 | 52 | 0 | 41500 | 76450 | 64.0 | 12.5 | 54500 | 93500 | 77 | 30 |
Public | 12 | 38900 | 70700 | 49 | 4 | 44750 | 79800 | 59.5 | 10.5 | 57500 | 104500 | 61 | 45 |
AR | |||||||||||||
Private | 8 | 34600 | 61200 | 47 | 3 | 43200 | 77350 | 55.5 | 17.0 | 49200 | 90300 | 58 | 33 |
Public | 8 | 40500 | 70500 | 49 | 2 | 44950 | 79800 | 56.0 | 7.5 | 52500 | 98000 | 84 | 22 |
AZ | |||||||||||||
Private | 2 | 41200 | 73500 | 69 | 0 | 41900 | 75050 | 69.0 | 3.5 | 42600 | 76600 | 69 | 7 |
Public | 2 | 49600 | 90000 | 51 | 12 | 52850 | 95500 | 53.5 | 17.5 | 56100 | 101000 | 56 | 23 |
CA | |||||||||||||
Private | 13 | 58500 | 109400 | 39 | 0 | 64500 | 120600 | 49.5 | 16.0 | 88800 | 158200 | 59 | 97 |
Public | 1 | 63000 | 114700 | 53 | 28 | 63000 | 114700 | 53.0 | 28.0 | 63000 | 114700 | 53 | 28 |
CO | |||||||||||||
Private | 5 | 33600 | 60100 | 48 | 0 | 53100 | 94800 | 53.0 | 12.0 | 55700 | 105100 | 70 | 23 |
Public | 9 | 44400 | 81400 | 47 | 3 | 48300 | 87100 | 53.0 | 19.0 | 75600 | 139600 | 60 | 93 |
CT | |||||||||||||
Private | 12 | 49700 | 85900 | 36 | 2 | 55550 | 100650 | 52.5 | 19.0 | 70300 | 138300 | 63 | 31 |
Public | 6 | 48500 | 83600 | 43 | 2 | 50450 | 87950 | 49.0 | 10.0 | 59100 | 105200 | 68 | 24 |
DE | |||||||||||||
Private | 3 | 45100 | 76600 | 42 | 2 | 45400 | 77300 | 50.0 | 4.0 | 49800 | 86900 | 50 | 25 |
Public | 2 | 46300 | 81900 | 43 | 10 | 52450 | 93650 | 52.5 | 16.0 | 58600 | 105400 | 62 | 22 |
FL | |||||||||||||
Private | 10 | 46100 | 84400 | 44 | 2 | 49650 | 89100 | 53.0 | 8.0 | 58300 | 105400 | 88 | 48 |
Public | 8 | 47300 | 83900 | 40 | 4 | 49500 | 89350 | 51.0 | 15.5 | 55800 | 102800 | 62 | 29 |
GA | |||||||||||||
Private | 11 | 44400 | 79400 | 34 | 3 | 46800 | 84900 | 54.0 | 19.0 | 62000 | 110800 | 62 | 26 |
Public | 11 | 43300 | 77900 | 45 | 7 | 47000 | 84400 | 51.0 | 10.0 | 54400 | 100700 | 69 | 22 |
HI | |||||||||||||
Private | 3 | 45400 | 84000 | 55 | 3 | 51700 | 93500 | 56.0 | 12.0 | 52200 | 93900 | 57 | 15 |
Public | 2 | 44000 | 77200 | 56 | 14 | 47800 | 84400 | 57.5 | 16.0 | 51600 | 91600 | 59 | 18 |
IA | |||||||||||||
Private | 19 | 43900 | 78000 | 41 | 0 | 48000 | 86900 | 49.0 | 14.0 | 53400 | 99900 | 60 | 65 |
Public | 3 | 47100 | 85300 | 46 | 8 | 54100 | 99400 | 48.0 | 16.0 | 56100 | 101300 | 48 | 31 |
ID | |||||||||||||
Private | 2 | 46200 | 80600 | 59 | 7 | 49950 | 87050 | 63.5 | 10.5 | 53700 | 93500 | 68 | 14 |
Public | 4 | 44900 | 77600 | 54 | 5 | 50450 | 89100 | 57.5 | 13.0 | 52300 | 97700 | 64 | 22 |
IL | |||||||||||||
Private | 16 | 49400 | 89800 | 41 | 0 | 54200 | 96000 | 46.0 | 16.5 | 64600 | 118500 | 88 | 64 |
Public | 4 | 51200 | 95000 | 44 | 10 | 54650 | 99650 | 47.0 | 25.5 | 62600 | 115300 | 48 | 36 |
IN | |||||||||||||
Private | 17 | 46900 | 84500 | 40 | 0 | 50000 | 89900 | 53.0 | 21.0 | 75000 | 135800 | 59 | 96 |
Public | 2 | 45600 | 83700 | 48 | 7 | 46750 | 84650 | 51.5 | 7.5 | 47900 | 85600 | 55 | 8 |
KS | |||||||||||||
Private | 11 | 37100 | 66100 | 55 | 0 | 44700 | 79100 | 61.0 | 7.0 | 48700 | 91800 | 73 | 14 |
Public | 7 | 43500 | 78000 | 51 | 4 | 47100 | 84900 | 54.0 | 9.0 | 53000 | 98600 | 64 | 24 |
KY | |||||||||||||
Private | 16 | 35300 | 64100 | 50 | 0 | 40950 | 75300 | 54.0 | 8.0 | 50000 | 90900 | 74 | 27 |
Public | 8 | 40700 | 73100 | 46 | 5 | 45550 | 81900 | 53.0 | 11.0 | 51700 | 96400 | 57 | 19 |
LA | |||||||||||||
Private | 7 | 33900 | 61500 | 37 | 0 | 46700 | 86300 | 60.5 | 4.0 | 48400 | 88600 | 74 | 35 |
Public | 9 | 40100 | 71000 | 51 | 5 | 48200 | 88200 | 57.0 | 13.0 | 53100 | 100700 | 76 | 24 |
MA | |||||||||||||
Private | 20 | 52600 | 98200 | 36 | 0 | 62600 | 114100 | 47.0 | 25.0 | 86300 | 155200 | 71 | 86 |
Public | 1 | 67900 | 117500 | 53 | 47 | 67900 | 117500 | 53.0 | 47.0 | 67900 | 117500 | 53 | 47 |
MD | |||||||||||||
Private | 13 | 47900 | 82600 | 40 | 0 | 51100 | 94100 | 52.5 | 11.0 | 67400 | 118400 | 63 | 88 |
Public | 5 | 49600 | 87000 | 47 | 2 | 50400 | 89800 | 51.0 | 12.0 | 51200 | 91200 | 65 | 19 |
ME | |||||||||||||
Private | 9 | 36300 | 64600 | 37 | 0 | 46900 | 84300 | 51.0 | 7.0 | 61300 | 112000 | 69 | 37 |
Public | 8 | 37100 | 66200 | 47 | 5 | 42550 | 74700 | 59.5 | 7.5 | 67000 | 121800 | 73 | 48 |
MI | |||||||||||||
Private | 12 | 47400 | 84700 | 40 | 0 | 50700 | 91000 | 50.5 | 16.5 | 69900 | 125000 | 57 | 74 |
Public | 10 | 45900 | 82200 | 46 | 6 | 50400 | 90500 | 48.5 | 15.0 | 65000 | 116300 | 54 | 81 |
MN | |||||||||||||
Private | 6 | 47400 | 84700 | 39 | 0 | 50850 | 93900 | 49.0 | 16.5 | 58800 | 109900 | 52 | 45 |
Public | 3 | 49700 | 85900 | 50 | 10 | 50200 | 86800 | 51.0 | 11.0 | 51600 | 91600 | 53 | 15 |
MO | |||||||||||||
Private | 11 | 42500 | 77200 | 37 | 0 | 46500 | 81200 | 56.0 | 7.0 | 54200 | 95300 | 68 | 13 |
Public | 6 | 43400 | 77700 | 48 | 9 | 46900 | 84200 | 51.0 | 21.0 | 67300 | 122600 | 54 | 80 |
MS | |||||||||||||
Private | 6 | 34800 | 62400 | 56 | 2 | 41150 | 73050 | 60.5 | 19.0 | 49300 | 88700 | 76 | 32 |
Public | 8 | 32500 | 61900 | 55 | 2 | 41300 | 74400 | 63.0 | 11.5 | 51100 | 94100 | 70 | 27 |
MT | |||||||||||||
Private | 2 | 46000 | 82900 | 55 | 12 | 47100 | 87150 | 57.0 | 18.5 | 48200 | 91400 | 59 | 25 |
Public | 2 | 53700 | 96200 | 54 | 31 | 57450 | 104400 | 59.0 | 46.0 | 61200 | 112600 | 64 | 61 |
NE | |||||||||||||
Private | 7 | 43800 | 77700 | 45 | 0 | 48600 | 84800 | 57.0 | 9.0 | 53500 | 95800 | 74 | 23 |
Public | 4 | 40600 | 73400 | 50 | 3 | 42450 | 75750 | 60.0 | 6.0 | 53400 | 93000 | 79 | 11 |
NV | |||||||||||||
Private | 1 | 54000 | 95500 | 39 | 2 | 54000 | 95500 | 39.0 | 2.0 | 54000 | 95500 | 39 | 2 |
OH | |||||||||||||
Private | 14 | 48300 | 88500 | 39 | 0 | 53700 | 99450 | 46.0 | 21.0 | 66600 | 117800 | 78 | 45 |
Public | 2 | 48300 | 88600 | 51 | 20 | 50050 | 90550 | 51.5 | 20.5 | 51800 | 92500 | 52 | 21 |
OK | |||||||||||||
Private | 8 | 40700 | 74100 | 51 | 0 | 46000 | 83050 | 58.5 | 8.0 | 58700 | 107200 | 71 | 41 |
Public | 11 | 37300 | 67400 | 48 | 5 | 44500 | 77600 | 55.0 | 10.0 | 47900 | 90800 | 66 | 15 |
OR | |||||||||||||
Private | 11 | 40500 | 76400 | 44 | 0 | 50000 | 89600 | 57.0 | 8.0 | 58500 | 104900 | 82 | 28 |
Public | 8 | 44000 | 73900 | 46 | 5 | 51300 | 93150 | 53.5 | 11.5 | 65400 | 108300 | 79 | 34 |
PA | |||||||||||||
Private | 19 | 53400 | 98400 | 37 | 5 | 59300 | 112300 | 43.0 | 25.0 | 75900 | 136100 | 72 | 66 |
TN | |||||||||||||
Private | 14 | 41200 | 74900 | 46 | 3 | 45950 | 82350 | 59.5 | 9.0 | 65400 | 119100 | 68 | 39 |
Public | 6 | 43700 | 77300 | 48 | 8 | 46850 | 79750 | 56.0 | 11.0 | 53100 | 95100 | 68 | 28 |
TX | |||||||||||||
Private | 11 | 51800 | 93500 | 33 | 0 | 55500 | 98900 | 50.0 | 17.0 | 71000 | 129500 | 61 | 45 |
Public | 7 | 50700 | 94800 | 48 | 3 | 56000 | 101200 | 57.0 | 22.0 | 61300 | 106500 | 86 | 53 |
UT | |||||||||||||
Private | 1 | 54500 | 91500 | 57 | 9 | 54500 | 91500 | 57.0 | 9.0 | 54500 | 91500 | 57 | 9 |
Public | 5 | 46400 | 84400 | 57 | 7 | 52400 | 93900 | 60.0 | 12.0 | 54800 | 102600 | 65 | 22 |
VA | |||||||||||||
Private | 10 | 43000 | 80600 | 40 | 1 | 52500 | 92850 | 52.5 | 14.0 | 65900 | 123200 | 86 | 23 |
Public | 12 | 44900 | 81800 | 42 | 9 | 52500 | 94050 | 52.0 | 18.5 | 65900 | 120600 | 68 | 47 |
VT | |||||||||||||
Private | 8 | 42000 | 75200 | 39 | 0 | 49950 | 89200 | 51.0 | 15.5 | 60400 | 109800 | 63 | 20 |
Public | 4 | 41500 | 75100 | 41 | 4 | 47050 | 85050 | 49.5 | 7.0 | 53200 | 99400 | 52 | 22 |
WA | |||||||||||||
Private | 13 | 41100 | 73600 | 39 | 0 | 53600 | 97800 | 52.0 | 18.0 | 60600 | 113700 | 60 | 33 |
Public | 6 | 43500 | 78400 | 45 | 9 | 51400 | 90850 | 50.0 | 15.0 | 55900 | 103700 | 56 | 20 |
WI | |||||||||||||
Private | 12 | 45600 | 81900 | 45 | 0 | 48150 | 87000 | 53.0 | 15.5 | 65700 | 117600 | 94 | 69 |
Public | 10 | 46000 | 81300 | 44 | 8 | 50000 | 89400 | 49.0 | 15.5 | 57300 | 104400 | 53 | 38 |
WY | |||||||||||||
Public | 1 | 52400 | 98800 | 58 | 25 | 52400 | 98800 | 58.0 | 25.0 | 52400 | 98800 | 58 | 25 |
# you can also make individual tables for each of the variable of interest
#early_career_pay
early_career_pay_stats<-merged1_final%>%
group_by(state_code,type)%>%
summarise(number_of_colleges=n(),
minimum_early_career_pay=min(early_career_pay,na.rm=TRUE),
median_early_career_pay=median(early_career_pay,na.rm=TRUE),
maximum_early_career_pay=max(early_career_pay,na.rm=TRUE))
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
early_career_pay_stats%>%gt::gt()
type | number_of_colleges | minimum_early_career_pay | median_early_career_pay | maximum_early_career_pay |
---|---|---|---|---|
AK | ||||
Private | 1 | 50300 | 50300 | 50300 |
Public | 2 | 56000 | 57550 | 59100 |
AL | ||||
Private | 10 | 34800 | 41500 | 54500 |
Public | 12 | 38900 | 44750 | 57500 |
AR | ||||
Private | 8 | 34600 | 43200 | 49200 |
Public | 8 | 40500 | 44950 | 52500 |
AZ | ||||
Private | 2 | 41200 | 41900 | 42600 |
Public | 2 | 49600 | 52850 | 56100 |
CA | ||||
Private | 13 | 58500 | 64500 | 88800 |
Public | 1 | 63000 | 63000 | 63000 |
CO | ||||
Private | 5 | 33600 | 53100 | 55700 |
Public | 9 | 44400 | 48300 | 75600 |
CT | ||||
Private | 12 | 49700 | 55550 | 70300 |
Public | 6 | 48500 | 50450 | 59100 |
DE | ||||
Private | 3 | 45100 | 45400 | 49800 |
Public | 2 | 46300 | 52450 | 58600 |
FL | ||||
Private | 10 | 46100 | 49650 | 58300 |
Public | 8 | 47300 | 49500 | 55800 |
GA | ||||
Private | 11 | 44400 | 46800 | 62000 |
Public | 11 | 43300 | 47000 | 54400 |
HI | ||||
Private | 3 | 45400 | 51700 | 52200 |
Public | 2 | 44000 | 47800 | 51600 |
IA | ||||
Private | 19 | 43900 | 48000 | 53400 |
Public | 3 | 47100 | 54100 | 56100 |
ID | ||||
Private | 2 | 46200 | 49950 | 53700 |
Public | 4 | 44900 | 50450 | 52300 |
IL | ||||
Private | 16 | 49400 | 54200 | 64600 |
Public | 4 | 51200 | 54650 | 62600 |
IN | ||||
Private | 17 | 46900 | 50000 | 75000 |
Public | 2 | 45600 | 46750 | 47900 |
KS | ||||
Private | 11 | 37100 | 44700 | 48700 |
Public | 7 | 43500 | 47100 | 53000 |
KY | ||||
Private | 16 | 35300 | 40950 | 50000 |
Public | 8 | 40700 | 45550 | 51700 |
LA | ||||
Private | 7 | 33900 | 46700 | 48400 |
Public | 9 | 40100 | 48200 | 53100 |
MA | ||||
Private | 20 | 52600 | 62600 | 86300 |
Public | 1 | 67900 | 67900 | 67900 |
MD | ||||
Private | 13 | 47900 | 51100 | 67400 |
Public | 5 | 49600 | 50400 | 51200 |
ME | ||||
Private | 9 | 36300 | 46900 | 61300 |
Public | 8 | 37100 | 42550 | 67000 |
MI | ||||
Private | 12 | 47400 | 50700 | 69900 |
Public | 10 | 45900 | 50400 | 65000 |
MN | ||||
Private | 6 | 47400 | 50850 | 58800 |
Public | 3 | 49700 | 50200 | 51600 |
MO | ||||
Private | 11 | 42500 | 46500 | 54200 |
Public | 6 | 43400 | 46900 | 67300 |
MS | ||||
Private | 6 | 34800 | 41150 | 49300 |
Public | 8 | 32500 | 41300 | 51100 |
MT | ||||
Private | 2 | 46000 | 47100 | 48200 |
Public | 2 | 53700 | 57450 | 61200 |
NE | ||||
Private | 7 | 43800 | 48600 | 53500 |
Public | 4 | 40600 | 42450 | 53400 |
NV | ||||
Private | 1 | 54000 | 54000 | 54000 |
OH | ||||
Private | 14 | 48300 | 53700 | 66600 |
Public | 2 | 48300 | 50050 | 51800 |
OK | ||||
Private | 8 | 40700 | 46000 | 58700 |
Public | 11 | 37300 | 44500 | 47900 |
OR | ||||
Private | 11 | 40500 | 50000 | 58500 |
Public | 8 | 44000 | 51300 | 65400 |
PA | ||||
Private | 19 | 53400 | 59300 | 75900 |
TN | ||||
Private | 14 | 41200 | 45950 | 65400 |
Public | 6 | 43700 | 46850 | 53100 |
TX | ||||
Private | 11 | 51800 | 55500 | 71000 |
Public | 7 | 50700 | 56000 | 61300 |
UT | ||||
Private | 1 | 54500 | 54500 | 54500 |
Public | 5 | 46400 | 52400 | 54800 |
VA | ||||
Private | 10 | 43000 | 52500 | 65900 |
Public | 12 | 44900 | 52500 | 65900 |
VT | ||||
Private | 8 | 42000 | 49950 | 60400 |
Public | 4 | 41500 | 47050 | 53200 |
WA | ||||
Private | 13 | 41100 | 53600 | 60600 |
Public | 6 | 43500 | 51400 | 55900 |
WI | ||||
Private | 12 | 45600 | 48150 | 65700 |
Public | 10 | 46000 | 50000 | 57300 |
WY | ||||
Public | 1 | 52400 | 52400 | 52400 |
From the above summary table of the minimum,maximum and median early career pay across private/public colleges in the different states of USA, we noted the following:
-> Amongst the private colleges, the highest median early career pay was recorded for California, followed by Massachusetts and then by Pennsylvania. The lowest median early career pay was recorded for the Kansas state and the second lowest for Mississipi
->Amongst the public colleges, the highest median early career pay was noted for Massachusetts and then by California and the lowest was noted for Mississipi and the second lowest for Nebraska.
-> It must also be noted that in this data, there are 20 private colleges and 1 public college in Massachusetts, 13 private and 1 public college in California.
->The highest number of public colleges was in AL and VA(12 in each state)and the lowest number was in CA, MA, WY,NV (with only 1 public college in each state) and the second lowest was noted in AR, AZ, DE, HI(2 public colleges in each state)
-> By assessing the min and max values, if we compare the range of early career pay between Mississipi and California public colleges, it looks the max early career pay in Mississipi exceeds that of California, this may not true since we had only 1 public college in california in our data set and thus doesn’t provide an accurate representation of all the public colleges in CA, especially in a state like CA where there are a lot of public colleges. Mississipi on the other hand, have 10 public colleges, so the data is a little more reliable.
->If we compare MI and MA, the maximum early pay is only little higher for MA public colleges compared to MI public colleges, this is due to same issue as above since MA has data from only 1 public college, so we can’t make any definitive conclusions from this.
->It is interesting to note that Colorado,having 9 public colleges has the highest early career pay $75,600. I think this seems like a reasonable early career pay for Colorado but most likely the recent graduates from public colleges in CO may not make the highest pay compared to CA and MI. A solid comparison can’t be made at this point since we have just one public college in CA and in MI(in the data).
-> The lowest min early career pay amongst the public colleges was recorded for MS $32,500. This was calculated from 8 public colleges in OK and the highest min early career pay was recorded for MA and again we can’t consider this to be a very credible value since this data is obtained from only 1 public college
->The highest number of private colleges was in Massachusetts(20), followed by Iowa(19) and Indiana(17)and the lowest number was in AK,NV,UT(1 each) and the second lowest in AZ, ID, MT(1 each).
-> Amongst the private college,the highest median and max early career pay was noted for California followed by MA and then by Pennsylvania and Kansas, Mississipi recorded the lowest median and max early career pay. Colorado had recorded the lowest minimum early career pay compared to the other states.
#mid_career_pay
mid_career_pay_stats<-merged1_final%>%
group_by(state_code,type)%>%
summarise(number_of_colleges=n(),
minimum_mid_career_pay=min(mid_career_pay,na.rm=TRUE),
median_mid_career_pay=median(mid_career_pay,na.rm=TRUE),
maximum_mid_career_pay=max(mid_career_pay,na.rm=TRUE))
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
mid_career_pay_stats%>%gt::gt()
type | number_of_colleges | minimum_mid_career_pay | median_mid_career_pay | maximum_mid_career_pay |
---|---|---|---|---|
AK | ||||
Private | 1 | 90000 | 90000 | 90000 |
Public | 2 | 101000 | 101400 | 101800 |
AL | ||||
Private | 10 | 65100 | 76450 | 93500 |
Public | 12 | 70700 | 79800 | 104500 |
AR | ||||
Private | 8 | 61200 | 77350 | 90300 |
Public | 8 | 70500 | 79800 | 98000 |
AZ | ||||
Private | 2 | 73500 | 75050 | 76600 |
Public | 2 | 90000 | 95500 | 101000 |
CA | ||||
Private | 13 | 109400 | 120600 | 158200 |
Public | 1 | 114700 | 114700 | 114700 |
CO | ||||
Private | 5 | 60100 | 94800 | 105100 |
Public | 9 | 81400 | 87100 | 139600 |
CT | ||||
Private | 12 | 85900 | 100650 | 138300 |
Public | 6 | 83600 | 87950 | 105200 |
DE | ||||
Private | 3 | 76600 | 77300 | 86900 |
Public | 2 | 81900 | 93650 | 105400 |
FL | ||||
Private | 10 | 84400 | 89100 | 105400 |
Public | 8 | 83900 | 89350 | 102800 |
GA | ||||
Private | 11 | 79400 | 84900 | 110800 |
Public | 11 | 77900 | 84400 | 100700 |
HI | ||||
Private | 3 | 84000 | 93500 | 93900 |
Public | 2 | 77200 | 84400 | 91600 |
IA | ||||
Private | 19 | 78000 | 86900 | 99900 |
Public | 3 | 85300 | 99400 | 101300 |
ID | ||||
Private | 2 | 80600 | 87050 | 93500 |
Public | 4 | 77600 | 89100 | 97700 |
IL | ||||
Private | 16 | 89800 | 96000 | 118500 |
Public | 4 | 95000 | 99650 | 115300 |
IN | ||||
Private | 17 | 84500 | 89900 | 135800 |
Public | 2 | 83700 | 84650 | 85600 |
KS | ||||
Private | 11 | 66100 | 79100 | 91800 |
Public | 7 | 78000 | 84900 | 98600 |
KY | ||||
Private | 16 | 64100 | 75300 | 90900 |
Public | 8 | 73100 | 81900 | 96400 |
LA | ||||
Private | 7 | 61500 | 86300 | 88600 |
Public | 9 | 71000 | 88200 | 100700 |
MA | ||||
Private | 20 | 98200 | 114100 | 155200 |
Public | 1 | 117500 | 117500 | 117500 |
MD | ||||
Private | 13 | 82600 | 94100 | 118400 |
Public | 5 | 87000 | 89800 | 91200 |
ME | ||||
Private | 9 | 64600 | 84300 | 112000 |
Public | 8 | 66200 | 74700 | 121800 |
MI | ||||
Private | 12 | 84700 | 91000 | 125000 |
Public | 10 | 82200 | 90500 | 116300 |
MN | ||||
Private | 6 | 84700 | 93900 | 109900 |
Public | 3 | 85900 | 86800 | 91600 |
MO | ||||
Private | 11 | 77200 | 81200 | 95300 |
Public | 6 | 77700 | 84200 | 122600 |
MS | ||||
Private | 6 | 62400 | 73050 | 88700 |
Public | 8 | 61900 | 74400 | 94100 |
MT | ||||
Private | 2 | 82900 | 87150 | 91400 |
Public | 2 | 96200 | 104400 | 112600 |
NE | ||||
Private | 7 | 77700 | 84800 | 95800 |
Public | 4 | 73400 | 75750 | 93000 |
NV | ||||
Private | 1 | 95500 | 95500 | 95500 |
OH | ||||
Private | 14 | 88500 | 99450 | 117800 |
Public | 2 | 88600 | 90550 | 92500 |
OK | ||||
Private | 8 | 74100 | 83050 | 107200 |
Public | 11 | 67400 | 77600 | 90800 |
OR | ||||
Private | 11 | 76400 | 89600 | 104900 |
Public | 8 | 73900 | 93150 | 108300 |
PA | ||||
Private | 19 | 98400 | 112300 | 136100 |
TN | ||||
Private | 14 | 74900 | 82350 | 119100 |
Public | 6 | 77300 | 79750 | 95100 |
TX | ||||
Private | 11 | 93500 | 98900 | 129500 |
Public | 7 | 94800 | 101200 | 106500 |
UT | ||||
Private | 1 | 91500 | 91500 | 91500 |
Public | 5 | 84400 | 93900 | 102600 |
VA | ||||
Private | 10 | 80600 | 92850 | 123200 |
Public | 12 | 81800 | 94050 | 120600 |
VT | ||||
Private | 8 | 75200 | 89200 | 109800 |
Public | 4 | 75100 | 85050 | 99400 |
WA | ||||
Private | 13 | 73600 | 97800 | 113700 |
Public | 6 | 78400 | 90850 | 103700 |
WI | ||||
Private | 12 | 81900 | 87000 | 117600 |
Public | 10 | 81300 | 89400 | 104400 |
WY | ||||
Public | 1 | 98800 | 98800 | 98800 |
From the above summary table of the minimum,maximum and median mid career pay across private/public colleges in the different states of USA, we noted the following:
-> Amongst the private colleges, the highest median mid career pay was recorded for California, followed by Massachusetts and Pennsylvania. The lowest median mid career pay was recorded for the Mississipi state and the second lowest for Arizona.
->Amongst the public colleges, the highest median mid career pay was noted for Massachusetts and then by California and the lowest was noted for Mississipi and the second lowest for ME.
->By eyeballing,the median mid career pay for the private and public colleges within each state,in some of the states have a slightly higher average mid career pay for public colleges compared to private which seems like an interesting observation however we need to factor in the number of colleges in each of the private/public college in each of the state.
->Sometimes there could be less number colleges in one category which would provide biased mean.It is difficult to compare the mean mid pay from private and public college from the summary table as the trend is not consistent across all the states.
-> It must also be noted that in this data, there are 20 private colleges and 1 public college in Massachusetts, 13 private and 1 public college in California.
->The highest number of public colleges was in AL and VA(12 in each state)and the lowest number was in CA, MA, WY,NV (with only 1 public college in each state) and the second lowest was noted in AR, AZ, DE, HI(2 public colleges in each state)
-> By assessing the min and max values, if we compare the range of mid career pay between Mississipi and California public colleges, it looks the max mid-career pay in Mississipi is less that of California’s, this may be true however this is not an accurate representation of all the public colleges in CA since there was on;y 1 public college in CA, especially in a state like CA where there are a lot of public colleges. Mississipi on the other hand, have 10 public colleges, so the data is a little more reliable.
->If we compare MI and MA, the maximum mid career pay is higher for MA public colleges compared to MI public colleges, however inferences can’t be drawn definitely since MA has data from only 1 public college.
->It is interesting to note that the highest maximum mid career pay amongst the public colleges was observed for Colorado with $139,600 followed by MO($122,600), this was higher than the CA and MA public colleges. One could interpret this in two ways, since CA and MA has a lot of public colleges and based on the location factor and cost of living, one would expect to see the highest mid career salaries made in these two states but we must factor in that both these states had data from only one public college which is definitely doesn’t represent the earning potential of students of students from California public colleges.
-> The lowest min mid-career pay amongst the public colleges was recorded for MS . This was calculated from 8 public colleges in MS and the trend is similar for the min early career pay as well.The highest min mid career pay was recorded for MA and again we can’t consider this to be a very credible value since this data is obtained from only 1 public college
->The highest number of private colleges was in Massachusetts(20), followed by Iowa(19) and Indiana(17)and the lowest number was in AK,NV,UT(1 each) and the second lowest in AZ, ID, MT(1 each).
-> Amongst the private college,the highest median and max mid-career pay was noted for California followed by MA and then by Pennsylvania and Mississipi and Arizona recorded the lowest median mid-career pay. Colorado had recorded the lowest minimum mid career pay compared to the other states. The trend is very similar to early career pay.
Since right-skew was prominently noticed for make world better %, median was used instead of mean as median is more reliable as a measure of central tendency when the data is skewed or when outliers are present.
# make world better percent
make_world_better_stats<-merged1_final%>%
group_by(state_code,type)%>%
summarise(number_of_colleges=n(),
minimum_make_world_better_percent=min(make_world_better_percent,na.rm=TRUE),
median_make_world_better_percent=median(make_world_better_percent,na.rm=TRUE),
maximum_make_world_better_percent=max(make_world_better_percent,na.rm=TRUE))
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
make_world_better_stats%>%gt::gt()
type | number_of_colleges | minimum_make_world_better_percent | median_make_world_better_percent | maximum_make_world_better_percent |
---|---|---|---|---|
AK | ||||
Private | 1 | 67 | 67.0 | 67 |
Public | 2 | 54 | 55.5 | 57 |
AL | ||||
Private | 10 | 52 | 64.0 | 77 |
Public | 12 | 49 | 59.5 | 61 |
AR | ||||
Private | 8 | 47 | 55.5 | 58 |
Public | 8 | 49 | 56.0 | 84 |
AZ | ||||
Private | 2 | 69 | 69.0 | 69 |
Public | 2 | 51 | 53.5 | 56 |
CA | ||||
Private | 13 | 39 | 49.5 | 59 |
Public | 1 | 53 | 53.0 | 53 |
CO | ||||
Private | 5 | 48 | 53.0 | 70 |
Public | 9 | 47 | 53.0 | 60 |
CT | ||||
Private | 12 | 36 | 52.5 | 63 |
Public | 6 | 43 | 49.0 | 68 |
DE | ||||
Private | 3 | 42 | 50.0 | 50 |
Public | 2 | 43 | 52.5 | 62 |
FL | ||||
Private | 10 | 44 | 53.0 | 88 |
Public | 8 | 40 | 51.0 | 62 |
GA | ||||
Private | 11 | 34 | 54.0 | 62 |
Public | 11 | 45 | 51.0 | 69 |
HI | ||||
Private | 3 | 55 | 56.0 | 57 |
Public | 2 | 56 | 57.5 | 59 |
IA | ||||
Private | 19 | 41 | 49.0 | 60 |
Public | 3 | 46 | 48.0 | 48 |
ID | ||||
Private | 2 | 59 | 63.5 | 68 |
Public | 4 | 54 | 57.5 | 64 |
IL | ||||
Private | 16 | 41 | 46.0 | 88 |
Public | 4 | 44 | 47.0 | 48 |
IN | ||||
Private | 17 | 40 | 53.0 | 59 |
Public | 2 | 48 | 51.5 | 55 |
KS | ||||
Private | 11 | 55 | 61.0 | 73 |
Public | 7 | 51 | 54.0 | 64 |
KY | ||||
Private | 16 | 50 | 54.0 | 74 |
Public | 8 | 46 | 53.0 | 57 |
LA | ||||
Private | 7 | 37 | 60.5 | 74 |
Public | 9 | 51 | 57.0 | 76 |
MA | ||||
Private | 20 | 36 | 47.0 | 71 |
Public | 1 | 53 | 53.0 | 53 |
MD | ||||
Private | 13 | 40 | 52.5 | 63 |
Public | 5 | 47 | 51.0 | 65 |
ME | ||||
Private | 9 | 37 | 51.0 | 69 |
Public | 8 | 47 | 59.5 | 73 |
MI | ||||
Private | 12 | 40 | 50.5 | 57 |
Public | 10 | 46 | 48.5 | 54 |
MN | ||||
Private | 6 | 39 | 49.0 | 52 |
Public | 3 | 50 | 51.0 | 53 |
MO | ||||
Private | 11 | 37 | 56.0 | 68 |
Public | 6 | 48 | 51.0 | 54 |
MS | ||||
Private | 6 | 56 | 60.5 | 76 |
Public | 8 | 55 | 63.0 | 70 |
MT | ||||
Private | 2 | 55 | 57.0 | 59 |
Public | 2 | 54 | 59.0 | 64 |
NE | ||||
Private | 7 | 45 | 57.0 | 74 |
Public | 4 | 50 | 60.0 | 79 |
NV | ||||
Private | 1 | 39 | 39.0 | 39 |
OH | ||||
Private | 14 | 39 | 46.0 | 78 |
Public | 2 | 51 | 51.5 | 52 |
OK | ||||
Private | 8 | 51 | 58.5 | 71 |
Public | 11 | 48 | 55.0 | 66 |
OR | ||||
Private | 11 | 44 | 57.0 | 82 |
Public | 8 | 46 | 53.5 | 79 |
PA | ||||
Private | 19 | 37 | 43.0 | 72 |
TN | ||||
Private | 14 | 46 | 59.5 | 68 |
Public | 6 | 48 | 56.0 | 68 |
TX | ||||
Private | 11 | 33 | 50.0 | 61 |
Public | 7 | 48 | 57.0 | 86 |
UT | ||||
Private | 1 | 57 | 57.0 | 57 |
Public | 5 | 57 | 60.0 | 65 |
VA | ||||
Private | 10 | 40 | 52.5 | 86 |
Public | 12 | 42 | 52.0 | 68 |
VT | ||||
Private | 8 | 39 | 51.0 | 63 |
Public | 4 | 41 | 49.5 | 52 |
WA | ||||
Private | 13 | 39 | 52.0 | 60 |
Public | 6 | 45 | 50.0 | 56 |
WI | ||||
Private | 12 | 45 | 53.0 | 94 |
Public | 10 | 44 | 49.0 | 53 |
WY | ||||
Public | 1 | 58 | 58.0 | 58 |
From the above summary table of the minimum,maximum and median make world better % across private/public colleges in the different states of USA, we noted the following:
Private Colleges
-> Amongst the private colleges, the highest median make world better % was recorded for Arizona(69%) and Alaska(67%) and the lowest median make world better % was for Nevada(39%). It was surprising to observe that the median make world better % was pretty low for CA(49.5%), MA(47%) and PA(43%), these three states were having the highest average early and mid career pay.
->By assessing the min make world better % for the private colleges, the highest min make world better % was observed for Arizona(69%) and Alaska (67%) and the lowest % was noted for TX(33%), GA(34%),MA(36%) and CA(39%). The low % in make world better is not expected for states like MA and CA. Note:Only 2 private colleges are in AZ and 1 is in Alaska, so this data may not truly represent all the other private colleges in AZ and AK.
->By assessing the max make world better % for the private colleges, the highest max make world better % was observed for WI(94%) and for FL(88%) and IL(88%) and the smallest max make world better % was noted for NV(39%) and DE(50%).It was also intriguing to note that CA(59%) and MA(71%) had only moderate min make world better % rankings.
Public Colleges
->Amongst the public colleges, the highest median make world better % was observed for Mississipi(63%) and Nebraska(60%),both of these states had the lowest median early and mid career pay, so it was interesting to note this observation. The lowest median make world better % was observed for IA(48%) and IL(47%).
->By assessing the min make world better % for the public colleges, the highest min make world better % was observed for WY(58%) and UT(57%) and the lowest % was noted for FL(40%) and VT(41%) . The MIN make world better % for states like MA and CA(53% each) was almost comparable to the highest min make world better %.
->By assessing the max make world better % for the public colleges, the highest max make world better % was observed for TX(86%)and AR(84%) and the smallest max make world better % was noted for IL(48%) and IA(48%).It was also intriguing to note that CA(53%) and MA(53%) had pretty low max make world better % rankings.
->By eyeballing the median world better % between the public and private colleges across the different states-in some of the states the public colleges seem to have a better median world % compared to the private colleges and vice-versa. The largest diff in make world better % between the public and private college was observed for Arizona(approx 16%, with private college % being higher) and Alaska(approx 12%,with private college % being higher), for the rest of the states the difference was very small.
#stem percent
stem_stats<-merged1_final%>%
group_by(state_code,type)%>%
summarise(number_of_colleges=n(),
minimum_stem_percent=min(stem_percent,na.rm=TRUE),
median_stem_percent=median(stem_percent,na.rm=TRUE),
maximum_stem_percent=max(stem_percent,na.rm=TRUE))
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
stem_stats%>%gt::gt()
type | number_of_colleges | minimum_stem_percent | median_stem_percent | maximum_stem_percent |
---|---|---|---|---|
AK | ||||
Private | 1 | 6 | 6.0 | 6 |
Public | 2 | 10 | 15.0 | 20 |
AL | ||||
Private | 10 | 0 | 12.5 | 30 |
Public | 12 | 4 | 10.5 | 45 |
AR | ||||
Private | 8 | 3 | 17.0 | 33 |
Public | 8 | 2 | 7.5 | 22 |
AZ | ||||
Private | 2 | 0 | 3.5 | 7 |
Public | 2 | 12 | 17.5 | 23 |
CA | ||||
Private | 13 | 0 | 16.0 | 97 |
Public | 1 | 28 | 28.0 | 28 |
CO | ||||
Private | 5 | 0 | 12.0 | 23 |
Public | 9 | 3 | 19.0 | 93 |
CT | ||||
Private | 12 | 2 | 19.0 | 31 |
Public | 6 | 2 | 10.0 | 24 |
DE | ||||
Private | 3 | 2 | 4.0 | 25 |
Public | 2 | 10 | 16.0 | 22 |
FL | ||||
Private | 10 | 2 | 8.0 | 48 |
Public | 8 | 4 | 15.5 | 29 |
GA | ||||
Private | 11 | 3 | 19.0 | 26 |
Public | 11 | 7 | 10.0 | 22 |
HI | ||||
Private | 3 | 3 | 12.0 | 15 |
Public | 2 | 14 | 16.0 | 18 |
IA | ||||
Private | 19 | 0 | 14.0 | 65 |
Public | 3 | 8 | 16.0 | 31 |
ID | ||||
Private | 2 | 7 | 10.5 | 14 |
Public | 4 | 5 | 13.0 | 22 |
IL | ||||
Private | 16 | 0 | 16.5 | 64 |
Public | 4 | 10 | 25.5 | 36 |
IN | ||||
Private | 17 | 0 | 21.0 | 96 |
Public | 2 | 7 | 7.5 | 8 |
KS | ||||
Private | 11 | 0 | 7.0 | 14 |
Public | 7 | 4 | 9.0 | 24 |
KY | ||||
Private | 16 | 0 | 8.0 | 27 |
Public | 8 | 5 | 11.0 | 19 |
LA | ||||
Private | 7 | 0 | 4.0 | 35 |
Public | 9 | 5 | 13.0 | 24 |
MA | ||||
Private | 20 | 0 | 25.0 | 86 |
Public | 1 | 47 | 47.0 | 47 |
MD | ||||
Private | 13 | 0 | 11.0 | 88 |
Public | 5 | 2 | 12.0 | 19 |
ME | ||||
Private | 9 | 0 | 7.0 | 37 |
Public | 8 | 5 | 7.5 | 48 |
MI | ||||
Private | 12 | 0 | 16.5 | 74 |
Public | 10 | 6 | 15.0 | 81 |
MN | ||||
Private | 6 | 0 | 16.5 | 45 |
Public | 3 | 10 | 11.0 | 15 |
MO | ||||
Private | 11 | 0 | 7.0 | 13 |
Public | 6 | 9 | 21.0 | 80 |
MS | ||||
Private | 6 | 2 | 19.0 | 32 |
Public | 8 | 2 | 11.5 | 27 |
MT | ||||
Private | 2 | 12 | 18.5 | 25 |
Public | 2 | 31 | 46.0 | 61 |
NE | ||||
Private | 7 | 0 | 9.0 | 23 |
Public | 4 | 3 | 6.0 | 11 |
NV | ||||
Private | 1 | 2 | 2.0 | 2 |
OH | ||||
Private | 14 | 0 | 21.0 | 45 |
Public | 2 | 20 | 20.5 | 21 |
OK | ||||
Private | 8 | 0 | 8.0 | 41 |
Public | 11 | 5 | 10.0 | 15 |
OR | ||||
Private | 11 | 0 | 8.0 | 28 |
Public | 8 | 5 | 11.5 | 34 |
PA | ||||
Private | 19 | 5 | 25.0 | 66 |
TN | ||||
Private | 14 | 3 | 9.0 | 39 |
Public | 6 | 8 | 11.0 | 28 |
TX | ||||
Private | 11 | 0 | 17.0 | 45 |
Public | 7 | 3 | 22.0 | 53 |
UT | ||||
Private | 1 | 9 | 9.0 | 9 |
Public | 5 | 7 | 12.0 | 22 |
VA | ||||
Private | 10 | 1 | 14.0 | 23 |
Public | 12 | 9 | 18.5 | 47 |
VT | ||||
Private | 8 | 0 | 15.5 | 20 |
Public | 4 | 4 | 7.0 | 22 |
WA | ||||
Private | 13 | 0 | 18.0 | 33 |
Public | 6 | 9 | 15.0 | 20 |
WI | ||||
Private | 12 | 0 | 15.5 | 69 |
Public | 10 | 8 | 15.5 | 38 |
WY | ||||
Public | 1 | 25 | 25.0 | 25 |
Since right-skew was prominently noticed for stem %, median was used instead of mean as median is more reliable as a measure of central tendency when the data is skewed or when outliers are present.
Amongst the public colleges, MA(47%) has the highest median stem % followed by MT(46%) and CA(28.00%) and the least mean stem % is noted for NE(6%),VT(7.0%), AR(7.5%).It is interesting to note that CA has a pretty low stem % but this could be due to CA having only one public college in the data and this particular college had a low stem % but that doesn’t represent the public colleges in CA accurately. CO (93%) has the maximum stem % however it’s median stem is 19%. Indiana has the least maximum stem percent(8%).MA(47%) has the highest min stem percent and MD,AR,CT, MS(2% for each state) have the lowest min stem percent.
Amongst the private colleges, PA(25.00%) and MA(25.00%) has the highest median stem % and the least median stem % is noted for LA(4%),AZ(3.5%) and NV(2%). The median stem % of CA private colleges(13 colleges) is 16% only since the minimum stem % is 0 and the maximum stem % is 97.
It is intriguing to note that the private colleges on have lower median stem % compared to the public colleges.
How Gender, Race and Ethnicity impact the choice of private/ public colleges across the USA
# create a summary table
gender.stats<-merged2.final%>%
group_by(state,type)%>%
summarise(number_of_colleges=n(),
across(c(percent_women,percent_men),~min(.x, na.rm=TRUE),.names = "minimum_{.col}"),
across(c(percent_women,percent_men),~mean(.x, na.rm=TRUE),.names = "average_{.col}"),
across(c(percent_women,percent_men),~max(.x,na.rm=TRUE),.names = "maximum_{.col}"))
## `summarise()` has grouped output by 'state'. You can override using the
## `.groups` argument.
gender.stats%>%gt::gt()
type | number_of_colleges | minimum_percent_women | minimum_percent_men | average_percent_women | average_percent_men | maximum_percent_women | maximum_percent_men |
---|---|---|---|---|---|---|---|
Alabama | |||||||
Private | 15 | 5.37634409 | 33.6250000 | 51.91116 | 48.08884 | 66.37500 | 94.62366 |
Public | 11 | 49.39024390 | 29.7819002 | 61.67802 | 38.32198 | 70.21810 | 50.60976 |
Alaska | |||||||
Private | 2 | 54.34782609 | 33.6787565 | 60.33453 | 39.66547 | 66.32124 | 45.65217 |
Arizona | |||||||
Private | 3 | 44.91180461 | 33.0188679 | 55.24636 | 44.75364 | 66.98113 | 55.08820 |
Public | 3 | 42.95310016 | 41.1081032 | 51.22506 | 48.77494 | 58.89190 | 57.04690 |
Arkansas | |||||||
Private | 10 | 33.14794216 | 35.9788360 | 49.92745 | 50.07255 | 64.02116 | 66.85206 |
Public | 9 | 54.07879029 | 34.8788927 | 58.46318 | 41.53682 | 65.12111 | 45.92121 |
California | |||||||
Private | 76 | 0.00000000 | 0.6072874 | 54.81189 | 45.18811 | 99.39271 | 100.00000 |
Public | 5 | 49.63775869 | 37.0936491 | 56.05517 | 43.94483 | 62.90635 | 50.36224 |
Colorado | |||||||
Private | 6 | 37.40157480 | 31.2248996 | 57.21153 | 42.78847 | 68.77510 | 62.59843 |
Public | 6 | 28.31264676 | 34.3734440 | 49.11059 | 50.88941 | 65.62656 | 71.68735 |
Connecticut | |||||||
Private | 12 | 32.80423280 | 10.9551657 | 58.09519 | 41.90481 | 89.04483 | 67.19577 |
Public | 6 | 50.13707734 | 32.6075687 | 56.47142 | 43.52858 | 67.39243 | 49.86292 |
Delaware | |||||||
Private | 1 | 43.19085487 | 56.8091451 | 43.19085 | 56.80915 | 43.19085 | 56.80915 |
Public | 2 | 56.24779541 | 37.2299295 | 59.50893 | 40.49107 | 62.77007 | 43.75220 |
Florida | |||||||
Private | 28 | 0.00000000 | 30.2799404 | 49.15176 | 50.84824 | 69.72006 | 100.00000 |
Public | 10 | 54.22673325 | 40.7673861 | 56.37840 | 43.62160 | 59.23261 | 45.77327 |
Georgia | |||||||
Private | 24 | 0.04741584 | 0.0000000 | 59.75706 | 40.24294 | 100.00000 | 99.95258 |
Public | 20 | 30.15708166 | 30.9974425 | 59.32227 | 40.67773 | 69.00256 | 69.84292 |
Hawaii | |||||||
Private | 2 | 56.66723872 | 34.6153846 | 61.02593 | 38.97407 | 65.38462 | 43.33276 |
Public | 1 | 60.60142712 | 39.3985729 | 60.60143 | 39.39857 | 60.60143 | 39.39857 |
Idaho | |||||||
Private | 3 | 49.14285714 | 41.6184971 | 53.32397 | 46.67603 | 58.38150 | 50.85714 |
Public | 4 | 46.84669287 | 39.3587361 | 53.46854 | 46.53146 | 60.64126 | 53.15331 |
Illinois | |||||||
Private | 46 | 35.56596607 | 9.7560976 | 61.04623 | 38.95377 | 90.24390 | 64.43403 |
Public | 9 | 44.90695614 | 29.8995845 | 57.03811 | 42.96189 | 70.10042 | 55.09304 |
Indiana | |||||||
Private | 19 | 0.00000000 | 32.7269386 | 50.26330 | 49.73670 | 67.27306 | 100.00000 |
Public | 4 | 55.07092468 | 36.5624317 | 60.35421 | 39.64579 | 63.43757 | 44.92908 |
Iowa | |||||||
Private | 25 | 32.94360385 | 8.8695652 | 57.39885 | 42.60115 | 91.13043 | 67.05640 |
Public | 3 | 43.98141426 | 41.7756539 | 51.26467 | 48.73533 | 58.22435 | 56.01859 |
Kansas | |||||||
Private | 11 | 40.06069803 | 34.7436941 | 53.20483 | 46.79517 | 65.25631 | 59.93930 |
Public | 8 | 49.32568844 | 37.2260386 | 54.62502 | 45.37498 | 62.77396 | 50.67431 |
Kentucky | |||||||
Private | 17 | 21.79487179 | 26.5253137 | 56.05595 | 43.94405 | 73.47469 | 78.20513 |
Public | 8 | 51.56532628 | 39.4195251 | 56.96244 | 43.03756 | 60.58047 | 48.43467 |
Louisiana | |||||||
Private | 7 | 53.26433121 | 14.6922184 | 65.51076 | 34.48924 | 85.30778 | 46.73567 |
Public | 11 | 48.82850780 | 23.7755587 | 62.17425 | 37.82575 | 76.22444 | 51.17149 |
Maine | |||||||
Private | 8 | 50.02770083 | 26.5795207 | 61.89320 | 38.10680 | 73.42048 | 49.97230 |
Public | 8 | 13.86792453 | 27.8516295 | 57.95685 | 42.04315 | 72.14837 | 86.13208 |
Maryland | |||||||
Private | 9 | 0.00000000 | 13.6034732 | 59.56967 | 40.43033 | 86.39653 | 100.00000 |
Public | 8 | 52.41806909 | 27.1943824 | 60.01231 | 39.98769 | 72.80562 | 47.58193 |
Massachusetts | |||||||
Private | 48 | 19.43835015 | 0.4434590 | 59.50762 | 40.49238 | 99.55654 | 80.56165 |
Public | 9 | 12.69205077 | 30.0238663 | 57.45505 | 42.54495 | 69.97613 | 87.30795 |
Michigan | |||||||
Private | 29 | 0.00000000 | 8.8410992 | 50.39129 | 49.60871 | 91.15890 | 100.00000 |
Public | 11 | 26.29947880 | 40.3508772 | 53.03614 | 46.96386 | 59.64912 | 73.70052 |
Minnesota | |||||||
Private | 13 | 27.79922780 | 5.8160237 | 57.18617 | 42.81383 | 94.18398 | 72.20077 |
Public | 4 | 56.25752106 | 37.4596960 | 58.24378 | 41.75622 | 62.54030 | 43.74248 |
Mississippi | |||||||
Private | 8 | 33.33333333 | 31.5751094 | 57.70600 | 42.29400 | 68.42489 | 66.66667 |
Public | 8 | 48.59469659 | 18.6943620 | 62.61132 | 37.38868 | 81.30564 | 51.40530 |
Missouri | |||||||
Private | 26 | 2.15053763 | 0.0000000 | 61.51486 | 38.48514 | 100.00000 | 97.84946 |
Public | 9 | 22.61574074 | 32.3437500 | 54.29603 | 45.70397 | 67.65625 | 77.38426 |
Montana | |||||||
Private | 4 | 48.36146971 | 29.8119964 | 59.83198 | 40.16802 | 70.18800 | 51.63853 |
Public | 1 | 32.47002398 | 67.5299760 | 32.47002 | 67.52998 | 32.47002 | 67.52998 |
Nebraska | |||||||
Private | 11 | 44.85294118 | 2.2593320 | 62.60753 | 37.39247 | 97.74067 | 55.14706 |
Public | 4 | 57.34870317 | 35.8225108 | 59.90584 | 40.09416 | 64.17749 | 42.65130 |
Nevada | |||||||
Private | 1 | 57.65158807 | 42.3484119 | 57.65159 | 42.34841 | 57.65159 | 42.34841 |
Public | 2 | 64.55026455 | 23.5443038 | 70.50298 | 29.49702 | 76.45570 | 35.44974 |
New Hampshire | |||||||
Private | 8 | 47.11019371 | 30.3149606 | 59.95840 | 40.04160 | 69.68504 | 52.88981 |
Public | 5 | 51.98764161 | 25.7582617 | 58.08928 | 41.91072 | 74.24174 | 48.01236 |
New Jersey | |||||||
Private | 15 | 0.00000000 | 25.7552483 | 43.07105 | 56.92895 | 74.24475 | 100.00000 |
Public | 9 | 25.03287620 | 36.6027532 | 54.19591 | 45.80409 | 63.39725 | 74.96712 |
New Mexico | |||||||
Private | 1 | 65.31219029 | 34.6878097 | 65.31219 | 34.68781 | 65.31219 | 34.68781 |
Public | 7 | 32.15796897 | 36.0406091 | 55.86579 | 44.13421 | 63.95939 | 67.84203 |
New York | |||||||
Private | 83 | 0.00000000 | 0.1943257 | 46.24819 | 53.75181 | 99.80567 | 100.00000 |
Public | 1 | 84.65792708 | 15.3420729 | 84.65793 | 15.34207 | 84.65793 | 15.34207 |
North Carolina | |||||||
Private | 35 | 41.98581560 | 1.5906681 | 60.34770 | 39.65230 | 98.40933 | 58.01418 |
Public | 15 | 44.26137868 | 27.8352490 | 58.95017 | 41.04983 | 72.16475 | 55.73862 |
North Dakota | |||||||
Private | 3 | 36.08695652 | 34.6030616 | 51.97229 | 48.02771 | 65.39694 | 63.91304 |
Public | 6 | 46.01613888 | 39.3548387 | 54.96614 | 45.03386 | 60.64516 | 53.98386 |
Ohio | |||||||
Private | 46 | 0.00000000 | 9.5940959 | 53.61896 | 46.38104 | 90.40590 | 100.00000 |
Public | 9 | 48.81479008 | 41.9344902 | 53.46294 | 46.53706 | 58.06551 | 51.18521 |
Oklahoma | |||||||
Private | 10 | 42.50320376 | 29.1666667 | 56.49564 | 43.50436 | 70.83333 | 57.49680 |
Public | 10 | 54.20319752 | 35.5088496 | 59.83497 | 40.16503 | 64.49115 | 45.79680 |
Oregon | |||||||
Private | 12 | 44.09030544 | 31.2138728 | 59.32605 | 40.67395 | 68.78613 | 55.90969 |
Public | 8 | 46.29107981 | 34.9528137 | 55.82764 | 44.17236 | 65.04719 | 53.70892 |
Pennsylvania | |||||||
Private | 66 | 0.00000000 | 0.6622517 | 60.29091 | 39.70909 | 99.33775 | 100.00000 |
Public | 16 | 36.06615686 | 34.8564426 | 54.77032 | 45.22968 | 65.14356 | 63.93384 |
Rhode Island | |||||||
Private | 7 | 32.20396988 | 31.7269076 | 52.26290 | 47.73710 | 68.27309 | 67.79603 |
Public | 2 | 54.77641663 | 31.0149288 | 61.88074 | 38.11926 | 68.98507 | 45.22358 |
South Carolina | |||||||
Private | 18 | 43.55909695 | 8.3513319 | 57.05298 | 42.94702 | 91.64867 | 56.44090 |
Public | 9 | 19.93318486 | 29.7920892 | 56.90451 | 43.09549 | 70.20791 | 80.06682 |
South Dakota | |||||||
Private | 4 | 56.78160920 | 35.8024691 | 60.24194 | 39.75806 | 64.19753 | 43.21839 |
Public | 8 | 22.05146533 | 29.4851794 | 55.96087 | 44.03913 | 70.51482 | 77.94853 |
Tennessee | |||||||
Private | 28 | 29.93630573 | 8.8034188 | 57.74361 | 42.25639 | 91.19658 | 70.06369 |
Public | 6 | 44.55419349 | 38.9719730 | 56.32104 | 43.67896 | 61.02803 | 55.44581 |
Texas | |||||||
Private | 38 | 11.67883212 | 32.8674121 | 51.16689 | 48.83311 | 67.13259 | 88.32117 |
Public | 28 | 43.26044598 | 11.6117046 | 58.63384 | 41.36616 | 88.38830 | 56.73955 |
Utah | |||||||
Private | 2 | 44.87600052 | 39.4268518 | 52.72457 | 47.27543 | 60.57315 | 55.12400 |
Public | 2 | 44.51213708 | 46.5747091 | 48.96871 | 51.03129 | 53.42529 | 55.48786 |
Vermont | |||||||
Private | 9 | 20.28867102 | 30.1115242 | 53.37629 | 46.62371 | 69.88848 | 79.71133 |
Public | 3 | 42.86713287 | 44.4306161 | 48.28991 | 51.71009 | 55.56938 | 57.13287 |
Virginia | |||||||
Private | 22 | 0.18099548 | 3.5714286 | 58.61043 | 41.38957 | 96.42857 | 99.81900 |
Public | 12 | 11.05882353 | 31.8485086 | 54.91656 | 45.08344 | 68.15149 | 88.94118 |
Washington | |||||||
Private | 11 | 51.36035536 | 17.7204659 | 61.30630 | 38.69370 | 82.27953 | 48.63964 |
Public | 6 | 51.34211811 | 44.3841522 | 53.63577 | 46.36423 | 55.61585 | 48.65788 |
West Virginia | |||||||
Private | 4 | 41.72185430 | 43.5555556 | 52.28120 | 47.71880 | 56.44444 | 58.27815 |
Public | 9 | 39.09595559 | 38.0678183 | 53.42305 | 46.57695 | 61.93218 | 60.90404 |
Wisconsin | |||||||
Private | 20 | 24.05693950 | 3.1393889 | 65.38474 | 34.61526 | 96.86061 | 75.94306 |
Public | 2 | 49.24419842 | 47.8318292 | 50.70618 | 49.29382 | 52.16817 | 50.75580 |
Wyoming | |||||||
Public | 1 | 51.95787832 | 48.0421217 | 51.95788 | 48.04212 | 51.95788 | 48.04212 |
Amongst the public colleges, the highest mean % women enrollment is noted for NY(84.65%) and NV(70.50%) and the lowest was noted for Montana(32.47%) and Vermont(48.28%). The highest mean % women enrollment for CA and MA public colleges is 56.05% and 57.4%. The % enrollment of men in public colleges would be (100-% women enrollment) and hence the lowest mean % enrollment of men is noted for NY and NV and the highest for Montana and Vermont.
One of the limitation of this data set is only women enrollment was provided and by taking the difference from the total enrollment and women enrollment, men enrollment was calculated. This calculation holds good only for binary gender groups and doesn’t include the other gender groups.
# create a summary table
race_ethnicity.stats<-merged2.final%>%
group_by(state,type)%>%
summarise(number_of_colleges=n(),
across(percent_american_indian_alaska_native:percent_resident,~min(.x, na.rm=TRUE),.names = "minimum_{.col}"),
across(percent_american_indian_alaska_native:percent_resident,~mean(.x, na.rm=TRUE),.names = "average_{.col}"),
across(percent_american_indian_alaska_native:percent_resident,~max(.x,na.rm=TRUE),.names = "maximum_{.col}"))
## `summarise()` has grouped output by 'state'. You can override using the
## `.groups` argument.
race_ethnicity.stats%>%gt::gt()
type | number_of_colleges | minimum_percent_american_indian_alaska_native | minimum_percent_asian | minimum_percent_black | minimum_percent_hispanic | minimum_percent_native_hawaiian_pacific_islander | minimum_percent_white | minimum_percent_two_or_more_races | minimum_percent_unknown | minimum_percent_non_resident_foreign | minimum_percent_total_minority | minimum_percent_resident | average_percent_american_indian_alaska_native | average_percent_asian | average_percent_black | average_percent_hispanic | average_percent_native_hawaiian_pacific_islander | average_percent_white | average_percent_two_or_more_races | average_percent_unknown | average_percent_non_resident_foreign | average_percent_total_minority | average_percent_resident | maximum_percent_american_indian_alaska_native | maximum_percent_asian | maximum_percent_black | maximum_percent_hispanic | maximum_percent_native_hawaiian_pacific_islander | maximum_percent_white | maximum_percent_two_or_more_races | maximum_percent_unknown | maximum_percent_non_resident_foreign | maximum_percent_total_minority | maximum_percent_resident |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Alabama | ||||||||||||||||||||||||||||||||||
Private | 15 | 0.00000000 | 0.00000000 | 7.5268817 | 0.0000000 | 0.00000000 | 0.8960573 | 0.00000000 | 0.00000000 | 0.00000000 | 14.960470 | 92.315627 | 0.5611901 | 0.8230765 | 53.2076651 | 2.177298 | 0.09911336 | 35.29526 | 0.87036975 | 5.4710553 | 1.49497673 | 57.738713 | 98.50502 | 2.1505376 | 5.2320675 | 98.5507246 | 5.595930 | 1.07526882 | 81.22846 | 3.27034884 | 28.8000000 | 7.6843734 | 98.550725 | 100.00000 |
Public | 11 | 0.14495380 | 0.15041364 | 7.2784810 | 0.7118845 | 0.00000000 | 3.7687987 | 0.00000000 | 1.06963312 | 0.54347826 | 12.615776 | 94.492899 | 0.6471454 | 1.6061984 | 28.5768747 | 2.108923 | 0.08325071 | 58.64694 | 1.67328864 | 3.4049145 | 3.25246816 | 34.695681 | 96.74753 | 1.5345269 | 4.9791422 | 89.6901613 | 3.497715 | 0.20879469 | 80.48395 | 3.01101722 | 6.2289895 | 5.5071010 | 92.498641 | 99.45652 |
Alaska | ||||||||||||||||||||||||||||||||||
Private | 2 | 6.52173913 | 0.00000000 | 0.0000000 | 2.1739130 | 0.00000000 | 55.4404145 | 2.17391304 | 0.00000000 | 0.00000000 | 10.869565 | 100.000000 | 10.7738229 | 0.8635579 | 1.1226252 | 2.382293 | 0.34542314 | 72.28542 | 4.62754374 | 7.5993092 | 0.00000000 | 20.115266 | 100.00000 | 15.0259067 | 1.7271157 | 2.2452504 | 2.590674 | 0.69084629 | 89.13043 | 7.08117444 | 15.1986183 | 0.0000000 | 29.360967 | 100.00000 |
Arizona | ||||||||||||||||||||||||||||||||||
Private | 3 | 0.94979647 | 1.41509434 | 2.8301887 | 2.4764151 | 0.23584906 | 6.5934066 | 0.00000000 | 0.00000000 | 0.00000000 | 14.150943 | 96.698113 | 23.5257010 | 2.0184101 | 5.2624602 | 10.743414 | 1.76995983 | 47.53874 | 3.07078458 | 4.3819283 | 1.68859818 | 46.390730 | 98.31140 | 67.0329670 | 2.4423338 | 7.4626866 | 15.468114 | 4.39560440 | 75.23585 | 4.61329715 | 7.3113208 | 3.3018868 | 93.406593 | 100.00000 |
Public | 3 | 1.22407425 | 1.68922577 | 3.2051976 | 16.3732114 | 0.20468998 | 51.9773450 | 3.50755167 | 0.78686158 | 4.49738314 | 31.291733 | 84.193164 | 1.8475064 | 4.5607107 | 3.3692364 | 19.338836 | 0.21780097 | 55.51847 | 4.04685803 | 1.4155663 | 9.68501602 | 33.380949 | 90.31498 | 3.0247248 | 6.3294913 | 3.5830684 | 22.267734 | 0.22492660 | 62.37141 | 4.82584371 | 2.5357515 | 15.8068362 | 36.506771 | 95.50262 |
Arkansas | ||||||||||||||||||||||||||||||||||
Private | 10 | 0.00000000 | 0.11123471 | 3.7894737 | 0.3527337 | 0.00000000 | 0.5291005 | 0.00000000 | 0.00000000 | 0.88987764 | 11.239478 | 89.634146 | 1.0391711 | 1.2956800 | 25.1500961 | 5.387377 | 0.05549444 | 59.34981 | 1.46284304 | 1.5496239 | 4.70990631 | 34.390662 | 95.29009 | 1.8292683 | 5.6701031 | 94.2157953 | 14.650767 | 0.17636684 | 83.34444 | 3.74787053 | 6.4606742 | 10.3658537 | 94.994438 | 99.11012 |
Public | 9 | 0.11937923 | 0.59678256 | 4.3923865 | 1.5917230 | 0.00000000 | 4.6955830 | 0.07958615 | 0.00000000 | 0.44110016 | 18.655224 | 95.084630 | 0.6727289 | 1.8417061 | 25.4832340 | 4.471255 | 0.13035704 | 60.82006 | 3.06935664 | 0.8577156 | 2.65358582 | 35.668637 | 97.34641 | 2.7232796 | 4.5674740 | 91.3251094 | 9.677892 | 0.78962211 | 77.60373 | 7.06741091 | 3.1487889 | 4.9153701 | 93.712694 | 99.55890 |
California | ||||||||||||||||||||||||||||||||||
Private | 76 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 0.0000000 | 0.00000000 | 0.00000000 | 0.00000000 | 0.000000 | 3.378378 | 0.4990811 | 11.3811370 | 5.9086256 | 18.109358 | 0.62276228 | 40.82164 | 3.89647241 | 6.4061661 | 12.35476169 | 40.417436 | 87.64524 | 5.0000000 | 60.9929078 | 21.8303946 | 48.186528 | 3.10663308 | 96.93252 | 9.63130173 | 62.2549020 | 96.6216216 | 87.009063 | 100.00000 |
Public | 5 | 0.13450310 | 3.45315262 | 2.1743811 | 22.6423746 | 0.21528525 | 21.7405924 | 4.76568948 | 4.15431174 | 1.41426046 | 39.171152 | 89.056338 | 0.4288308 | 15.7942713 | 3.4476867 | 25.740584 | 0.41514383 | 36.04037 | 5.76817383 | 6.5548161 | 5.81012715 | 51.594690 | 94.18987 | 1.0017678 | 31.7335616 | 4.6767351 | 29.217812 | 0.73976707 | 51.87298 | 6.52314316 | 8.2334974 | 10.9436615 | 63.161434 | 98.58574 |
Colorado | ||||||||||||||||||||||||||||||||||
Private | 6 | 0.43541364 | 0.78740157 | 1.4056225 | 6.6929134 | 0.00000000 | 61.4031277 | 2.54126846 | 0.00000000 | 0.00000000 | 18.110236 | 89.313236 | 0.6500631 | 2.8170575 | 4.3132906 | 9.728886 | 0.16946646 | 69.32110 | 4.73317466 | 4.4980127 | 3.76895278 | 22.411939 | 96.23105 | 1.1811024 | 4.6481321 | 6.7565232 | 15.117289 | 0.39370079 | 81.88976 | 8.36961780 | 8.6120765 | 10.6867643 | 28.811903 | 100.00000 |
Public | 6 | 0.21804764 | 0.63307834 | 0.8177262 | 6.9942972 | 0.05031869 | 54.4711158 | 2.86377709 | 1.76115398 | 0.00000000 | 16.101979 | 88.527340 | 4.4050785 | 1.5726292 | 2.6744927 | 13.681386 | 0.31768426 | 63.58903 | 3.91688512 | 7.0134684 | 2.82934492 | 26.568156 | 97.17066 | 23.3975204 | 3.8745387 | 6.0558022 | 25.745086 | 0.60333480 | 72.27951 | 5.98786600 | 15.1315789 | 11.4726602 | 41.281984 | 100.00000 |
Connecticut | ||||||||||||||||||||||||||||||||||
Private | 12 | 0.00000000 | 0.45161290 | 1.0582011 | 6.3873538 | 0.00000000 | 33.4232325 | 1.05820106 | 0.00000000 | 0.70967742 | 13.214913 | 75.900597 | 0.3499549 | 3.6141543 | 10.5126543 | 9.080962 | 0.09848934 | 58.38210 | 1.94630596 | 6.9648024 | 9.05057636 | 25.602520 | 90.94942 | 1.7994859 | 12.8161479 | 33.2258065 | 14.000000 | 0.21190522 | 75.97122 | 4.43417639 | 19.3372320 | 24.0994028 | 49.483871 | 99.29032 |
Public | 6 | 0.15447798 | 1.29600829 | 5.0977733 | 7.2642327 | 0.01847575 | 57.8019699 | 1.12567204 | 2.57056452 | 0.20161290 | 21.524494 | 89.827060 | 0.2255267 | 3.7201589 | 10.6672862 | 10.894874 | 0.09255368 | 63.54261 | 2.16476152 | 6.3233951 | 2.36883308 | 27.765161 | 97.63117 | 0.3110420 | 8.7411929 | 16.7444272 | 15.120968 | 0.16801075 | 69.33989 | 2.59126159 | 8.6374134 | 10.1729400 | 33.229653 | 99.79839 |
Delaware | ||||||||||||||||||||||||||||||||||
Private | 1 | 0.19880716 | 2.73359841 | 10.6858847 | 1.3916501 | 0.00000000 | 19.6322068 | 1.29224652 | 0.49701789 | 63.56858847 | 16.302187 | 36.431412 | 0.1988072 | 2.7335984 | 10.6858847 | 1.391650 | 0.00000000 | 19.63221 | 1.29224652 | 0.4970179 | 63.56858847 | 16.302187 | 36.43141 | 0.1988072 | 2.7335984 | 10.6858847 | 1.391650 | 0.00000000 | 19.63221 | 1.29224652 | 0.4970179 | 63.5685885 | 16.302187 | 36.43141 |
Public | 2 | 0.07495591 | 1.06891062 | 5.4320988 | 5.1398681 | 0.09259259 | 12.2583580 | 2.53968254 | 1.49470899 | 4.91244030 | 19.069665 | 91.141975 | 0.1739346 | 2.7787234 | 35.8750214 | 5.790834 | 0.10315324 | 41.41798 | 3.08926360 | 3.8858580 | 6.88523250 | 47.810930 | 93.11477 | 0.2729134 | 4.4885362 | 66.3179441 | 6.441799 | 0.11371390 | 70.57760 | 3.63884467 | 6.2770071 | 8.8580247 | 76.552195 | 95.08756 |
Florida | ||||||||||||||||||||||||||||||||||
Private | 28 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 0.5890052 | 0.00000000 | 0.00000000 | 0.00000000 | 0.000000 | 64.899108 | 0.3882364 | 1.9424517 | 20.4182202 | 12.287918 | 0.23402674 | 51.01961 | 2.02478246 | 5.1267983 | 6.55795435 | 37.295635 | 93.44205 | 1.4414414 | 7.5471698 | 88.5262117 | 39.814815 | 1.85185185 | 100.00000 | 5.40540541 | 19.7552020 | 35.1008916 | 94.187298 | 100.00000 |
Public | 10 | 0.08466035 | 1.62371312 | 2.6378897 | 8.3478813 | 0.00000000 | 11.4553517 | 2.10038299 | 0.40040040 | 1.59607545 | 24.940048 | 91.316040 | 0.2292512 | 3.9775278 | 9.8004227 | 20.966752 | 0.18384279 | 55.80071 | 3.21755099 | 1.6367859 | 4.18716044 | 38.375347 | 95.81284 | 0.6110141 | 7.0785903 | 18.7411295 | 62.912719 | 0.48525041 | 70.74341 | 4.87362362 | 3.2006308 | 8.6839605 | 80.151179 | 98.40392 |
Georgia | ||||||||||||||||||||||||||||||||||
Private | 24 | 0.00000000 | 0.00000000 | 2.7280477 | 0.0000000 | 0.00000000 | 0.0000000 | 0.00000000 | 0.00000000 | 0.00000000 | 9.974425 | 80.620428 | 0.2957594 | 2.9849375 | 35.7465735 | 4.155618 | 0.09349994 | 43.71465 | 1.70773236 | 7.2180669 | 4.08316281 | 44.984121 | 95.91684 | 0.7262164 | 15.4445122 | 100.0000000 | 10.329068 | 0.35149385 | 85.25149 | 5.95647194 | 24.6800731 | 19.3795717 | 100.000000 | 100.00000 |
Public | 20 | 0.06923709 | 0.10172940 | 4.2579681 | 1.1508951 | 0.00000000 | 4.6998983 | 0.56265985 | 0.03662109 | 0.34469552 | 16.361488 | 78.575447 | 0.2471032 | 3.6693637 | 33.5559831 | 5.969450 | 0.11231804 | 49.35042 | 2.59409998 | 1.6700013 | 2.83126455 | 46.148318 | 97.16874 | 0.4337866 | 15.5047817 | 91.4417887 | 19.715698 | 0.27541312 | 81.80744 | 4.10615924 | 4.2027194 | 21.4245532 | 94.217425 | 99.65530 |
Hawaii | ||||||||||||||||||||||||||||||||||
Private | 2 | 0.41187575 | 15.80573194 | 5.5152395 | 5.8780842 | 2.19667067 | 16.5094340 | 9.61538462 | 2.36828557 | 2.14078374 | 54.384761 | 85.807448 | 0.6413515 | 21.0741287 | 5.7179767 | 9.494731 | 8.88135420 | 22.78192 | 13.27688743 | 9.9649846 | 8.16666783 | 59.086430 | 91.83333 | 0.8708273 | 26.3425254 | 5.9207139 | 13.111378 | 15.56603774 | 29.05440 | 16.93839025 | 17.5616836 | 14.1925519 | 63.788099 | 97.85922 |
Public | 1 | 0.50968400 | 20.54026504 | 1.1722732 | 11.6462793 | 9.96432212 | 22.8338430 | 28.77166157 | 0.33129460 | 4.23037717 | 72.604485 | 95.769623 | 0.5096840 | 20.5402650 | 1.1722732 | 11.646279 | 9.96432212 | 22.83384 | 28.77166157 | 0.3312946 | 4.23037717 | 72.604485 | 95.76962 | 0.5096840 | 20.5402650 | 1.1722732 | 11.646279 | 9.96432212 | 22.83384 | 28.77166157 | 0.3312946 | 4.2303772 | 72.604485 | 95.76962 |
Idaho | ||||||||||||||||||||||||||||||||||
Private | 3 | 0.52447552 | 0.57142857 | 1.1428571 | 5.1428571 | 0.00000000 | 63.3741259 | 1.71428571 | 3.42857143 | 0.57142857 | 10.285714 | 93.094406 | 0.9537534 | 1.5230550 | 1.5703699 | 9.407099 | 0.27908026 | 75.69288 | 1.92287096 | 5.3730200 | 3.27787530 | 15.656228 | 96.72212 | 1.7142857 | 2.7972028 | 1.9230769 | 15.297203 | 0.43706294 | 85.71429 | 2.09790210 | 6.6433566 | 6.9055944 | 23.076923 | 99.42857 |
Public | 4 | 0.59387232 | 0.90613383 | 0.7667286 | 4.9953532 | 0.18800205 | 71.7849430 | 2.31588354 | 2.67193309 | 2.60223048 | 11.640335 | 89.887557 | 1.2274179 | 1.3941204 | 1.1870040 | 7.730865 | 0.25449549 | 76.66083 | 2.75868094 | 3.1647368 | 5.62185138 | 14.552583 | 94.37815 | 2.3001859 | 1.9615783 | 1.6061547 | 9.668421 | 0.31943132 | 83.08550 | 3.21312596 | 3.6786060 | 10.1124432 | 17.285284 | 97.39777 |
Illinois | ||||||||||||||||||||||||||||||||||
Private | 46 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 2.5225225 | 0.00000000 | 0.00000000 | 0.00000000 | 0.000000 | 51.671309 | 0.2809328 | 4.9724991 | 10.1155932 | 10.165297 | 0.19748833 | 61.80488 | 2.47353844 | 5.0380218 | 4.95174849 | 28.205349 | 95.04825 | 1.1702128 | 15.3543307 | 59.4594595 | 32.447113 | 0.84772370 | 100.00000 | 7.98747063 | 22.3085461 | 48.3286908 | 95.135135 | 100.00000 |
Public | 9 | 0.08639787 | 0.89756535 | 4.8427116 | 4.9814877 | 0.03838035 | 5.2005373 | 0.00000000 | 0.49963619 | 2.04293629 | 19.980597 | 77.933097 | 0.1691045 | 6.0218893 | 20.4775320 | 12.008570 | 0.12310473 | 49.25631 | 1.89414275 | 3.3167118 | 6.73263146 | 40.694344 | 93.26737 | 0.2596953 | 18.6956988 | 68.7967761 | 31.727494 | 0.25304136 | 77.45816 | 2.76066178 | 9.6334677 | 22.0669030 | 78.257532 | 97.95706 |
Indiana | ||||||||||||||||||||||||||||||||||
Private | 19 | 0.00000000 | 0.00000000 | 2.3869347 | 1.4925373 | 0.00000000 | 3.4825871 | 0.41493776 | 0.00000000 | 0.00000000 | 5.800575 | 86.287996 | 0.2926445 | 2.0916343 | 12.6468480 | 4.713684 | 0.06698609 | 69.18740 | 2.45706312 | 2.6448209 | 5.89892250 | 22.268860 | 94.10108 | 0.7462687 | 4.7048198 | 92.2885572 | 12.336892 | 0.32618826 | 87.77565 | 6.50112867 | 7.4788681 | 13.7120036 | 96.268657 | 100.00000 |
Public | 4 | 0.12587751 | 1.02777170 | 3.4331948 | 2.1999146 | 0.04373497 | 66.3202609 | 1.91157625 | 1.74939864 | 0.69975946 | 9.490488 | 92.429644 | 0.2103463 | 1.1449611 | 7.8016160 | 2.803577 | 0.07384205 | 79.68787 | 2.21488553 | 2.4348506 | 3.62804966 | 14.249228 | 96.37195 | 0.2730790 | 1.3578093 | 16.6805735 | 3.350278 | 0.11619463 | 88.06035 | 2.83698703 | 4.1103849 | 7.5703558 | 24.205416 | 99.30024 |
Iowa | ||||||||||||||||||||||||||||||||||
Private | 25 | 0.00000000 | 0.49592632 | 1.0272759 | 0.3478261 | 0.00000000 | 3.1250000 | 0.00000000 | 0.00000000 | 0.00000000 | 5.043478 | 31.568088 | 0.3556412 | 2.8814917 | 4.6048481 | 4.795584 | 0.44292186 | 72.39513 | 1.80619894 | 4.5756429 | 8.14253662 | 14.886686 | 91.85746 | 1.0273973 | 28.1250000 | 16.1920715 | 12.891344 | 7.05962988 | 92.46575 | 4.26758939 | 20.9302326 | 68.4319120 | 44.791667 | 100.00000 |
Public | 3 | 0.16683350 | 0.90543260 | 2.5874837 | 3.1606304 | 0.05868545 | 67.1271271 | 1.76056338 | 2.25519785 | 5.24815560 | 8.936955 | 87.610944 | 0.2029154 | 2.3862456 | 2.7585718 | 4.382269 | 0.07848785 | 74.27715 | 1.93805610 | 4.2445687 | 9.73173972 | 11.746545 | 90.26826 | 0.2323218 | 3.5235235 | 2.8461795 | 5.772439 | 0.09002468 | 83.55969 | 2.10210210 | 5.9859860 | 12.3890557 | 14.497831 | 94.75184 |
Kansas | ||||||||||||||||||||||||||||||||||
Private | 11 | 0.30349014 | 0.32154341 | 2.5723473 | 2.8938907 | 0.00000000 | 46.0085837 | 0.96463023 | 0.45523520 | 0.00000000 | 8.360129 | 95.176849 | 1.0486529 | 1.4372847 | 8.7345840 | 7.369121 | 0.31167230 | 69.91048 | 2.52445367 | 6.7549818 | 1.90877293 | 21.425768 | 98.09123 | 1.8261752 | 4.8820179 | 18.0257511 | 12.535211 | 0.60698027 | 79.04584 | 6.26992561 | 21.6309013 | 4.8231511 | 31.673820 | 100.00000 |
Public | 8 | 0.37955261 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 0.0000000 | 0.00000000 | 0.00000000 | 0.00000000 | 13.547920 | 74.235081 | 13.0673817 | 1.9318196 | 3.7788280 | 5.377596 | 0.09570302 | 59.66256 | 3.02190640 | 3.6233675 | 9.44084053 | 27.273234 | 90.55916 | 100.0000000 | 6.2687563 | 5.5285095 | 8.596199 | 0.14536057 | 77.71092 | 4.67778868 | 16.9294853 | 25.7649186 | 100.000000 | 100.00000 |
Kentucky | ||||||||||||||||||||||||||||||||||
Private | 17 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 62.3072178 | 0.00000000 | 0.00000000 | 0.00000000 | 0.000000 | 92.288711 | 0.2648063 | 1.0410234 | 7.9457530 | 2.382887 | 0.09232473 | 76.81884 | 1.82047481 | 7.0383214 | 2.59557407 | 13.547269 | 97.40443 | 0.8522727 | 3.3165105 | 18.3903072 | 5.965909 | 0.26609899 | 96.15385 | 5.42874769 | 20.1060204 | 7.7112893 | 28.130783 | 100.00000 |
Public | 8 | 0.15769213 | 0.37097358 | 3.7006877 | 1.4657980 | 0.05353797 | 29.0765172 | 1.28483532 | 0.27828023 | 1.26649077 | 7.111835 | 92.638529 | 0.2218325 | 1.3789247 | 12.8920323 | 2.420546 | 0.10569806 | 73.88816 | 2.15688352 | 3.0859078 | 3.85001475 | 19.175918 | 96.14999 | 0.3644798 | 3.4738648 | 53.1398417 | 3.770697 | 0.21108179 | 90.20992 | 3.63619498 | 11.7678100 | 7.3614705 | 57.889182 | 98.73351 |
Louisiana | ||||||||||||||||||||||||||||||||||
Private | 7 | 0.00000000 | 0.00000000 | 8.2202650 | 0.5000000 | 0.00000000 | 0.3333333 | 0.00000000 | 0.00000000 | 0.00000000 | 21.367928 | 89.605649 | 0.5354838 | 4.0767140 | 34.7178595 | 4.855675 | 0.08057357 | 46.91237 | 1.93524502 | 3.1221840 | 3.76389907 | 46.201551 | 96.23610 | 0.9693053 | 12.9032258 | 92.3333333 | 12.494226 | 0.17421603 | 70.27464 | 3.55411955 | 7.1593533 | 10.3943506 | 93.583333 | 100.00000 |
Public | 11 | 0.14265335 | 0.22202487 | 12.8374165 | 0.6181645 | 0.00000000 | 2.2202487 | 0.00000000 | 0.07284205 | 0.00000000 | 17.318486 | 93.350661 | 0.7328287 | 1.7430960 | 30.9478606 | 3.422808 | 0.06242392 | 54.30882 | 2.35380103 | 3.4826453 | 2.94571532 | 39.262818 | 97.05428 | 1.8595041 | 7.1041802 | 90.7637655 | 9.096816 | 0.14253898 | 71.90970 | 5.22537447 | 12.1730861 | 6.6493394 | 93.472469 | 100.00000 |
Maine | ||||||||||||||||||||||||||||||||||
Private | 8 | 0.00000000 | 1.32969656 | 0.7772021 | 1.2754705 | 0.00000000 | 60.0974553 | 0.00000000 | 0.00000000 | 0.00000000 | 7.228094 | 84.715026 | 0.2935869 | 3.6471025 | 3.5751147 | 5.240684 | 0.05904883 | 70.18189 | 3.07155363 | 8.5203383 | 5.41067965 | 15.887091 | 94.58932 | 0.5181347 | 6.3157895 | 6.0195987 | 12.686981 | 0.23331778 | 87.75988 | 6.48199446 | 29.9352199 | 15.2849741 | 30.470914 | 100.00000 |
Public | 8 | 0.28301887 | 0.35149385 | 0.5660377 | 0.7547170 | 0.00000000 | 63.3006782 | 0.00000000 | 2.54716981 | 0.00000000 | 2.452830 | 91.559910 | 1.3500610 | 0.8111373 | 1.8375999 | 1.622170 | 0.07849710 | 77.58514 | 1.75485137 | 12.1270052 | 2.83354204 | 7.454316 | 97.16646 | 2.7160494 | 1.9458946 | 3.2098765 | 2.962963 | 0.18867925 | 95.00000 | 2.46045694 | 21.0248681 | 8.4400904 | 10.740741 | 100.00000 |
Maryland | ||||||||||||||||||||||||||||||||||
Private | 9 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 47.6137002 | 0.00000000 | 0.00000000 | 0.60157335 | 0.000000 | 82.360097 | 0.2394598 | 5.2044198 | 11.5754389 | 5.121979 | 0.09840825 | 65.31147 | 2.14089430 | 4.9964023 | 5.31152498 | 24.380600 | 94.68848 | 0.7959479 | 13.6580573 | 29.3151319 | 7.876655 | 0.21707670 | 94.53125 | 5.12820513 | 18.7264151 | 17.6399027 | 39.819528 | 99.39843 |
Public | 8 | 0.03191829 | 0.15959145 | 7.9023823 | 1.9470156 | 0.00000000 | 1.4363230 | 1.91509735 | 0.51961548 | 1.51074956 | 22.486926 | 89.435046 | 0.1963727 | 2.3802636 | 43.4784274 | 4.144900 | 0.09675264 | 39.51956 | 3.35984981 | 2.8240257 | 3.99984778 | 53.656566 | 96.00015 | 0.3876853 | 4.7072022 | 83.4240562 | 7.088902 | 0.18685768 | 73.03893 | 4.41603719 | 5.3578640 | 10.5649537 | 90.640913 | 98.48925 |
Massachusetts | ||||||||||||||||||||||||||||||||||
Private | 48 | 0.00000000 | 1.06382979 | 0.8571429 | 2.4316109 | 0.00000000 | 13.3484163 | 0.00000000 | 0.00000000 | 0.00000000 | 8.376609 | 61.381074 | 0.2022323 | 6.1958632 | 7.3737226 | 7.665594 | 0.07901937 | 54.86901 | 2.98601752 | 9.6945052 | 10.93403489 | 24.502449 | 89.06597 | 0.6787330 | 23.1597073 | 35.8123570 | 22.034495 | 0.49751244 | 91.48936 | 7.69230769 | 39.8933333 | 38.6189258 | 59.267735 | 100.00000 |
Public | 9 | 0.10770888 | 1.10741971 | 2.2044088 | 2.8056112 | 0.00000000 | 60.5727924 | 0.66800267 | 1.80360721 | 0.39660057 | 9.485638 | 95.662027 | 0.3806267 | 2.7624481 | 5.6671475 | 7.166784 | 0.06016574 | 74.60185 | 2.21910697 | 5.6799776 | 1.46189842 | 18.256279 | 98.53810 | 1.2024048 | 6.3961814 | 8.3852691 | 10.531995 | 0.19093079 | 88.24315 | 3.79686758 | 16.5632458 | 4.3379735 | 23.146649 | 99.60340 |
Michigan | ||||||||||||||||||||||||||||||||||
Private | 29 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 28.1250000 | 0.00000000 | 0.00000000 | 0.00000000 | 0.000000 | 78.730252 | 0.4275021 | 2.9250017 | 11.8024390 | 3.860043 | 0.30408100 | 66.64480 | 2.07622499 | 6.9982826 | 4.96162253 | 21.395291 | 95.03838 | 3.1250000 | 23.0386052 | 65.6250000 | 13.399649 | 5.75022462 | 100.00000 | 4.79123888 | 22.9110512 | 21.2697484 | 71.875000 | 100.00000 |
Public | 11 | 0.23213249 | 0.54009140 | 1.3382167 | 1.6481195 | 0.00000000 | 54.4056857 | 0.00000000 | 0.43038176 | 1.59799155 | 6.916467 | 84.603465 | 1.1481290 | 2.5091218 | 8.9676443 | 3.402197 | 0.07528080 | 71.69357 | 2.31556707 | 3.0733725 | 6.81511362 | 18.417939 | 93.18489 | 8.6414624 | 7.4588440 | 18.1197268 | 4.670904 | 0.12691276 | 83.19519 | 3.14271684 | 6.1654288 | 15.3965347 | 31.644789 | 98.40201 |
Minnesota | ||||||||||||||||||||||||||||||||||
Private | 13 | 0.04823927 | 0.46674446 | 0.7001167 | 0.9334889 | 0.00000000 | 50.4587156 | 0.00000000 | 0.00000000 | 0.00000000 | 3.383897 | 86.444766 | 0.6100444 | 4.2869347 | 3.2824122 | 4.028990 | 0.05300242 | 73.40600 | 2.73655982 | 7.8374794 | 3.75858135 | 14.997944 | 96.24142 | 3.5460993 | 8.9450656 | 8.1081081 | 6.319883 | 0.18832392 | 94.63244 | 6.37065637 | 31.8639517 | 13.5552340 | 23.869713 | 100.00000 |
Public | 4 | 0.10363888 | 1.02286402 | 1.6646611 | 1.8852788 | 0.02005616 | 55.5236385 | 2.02291696 | 1.56609857 | 1.94544725 | 8.190692 | 96.672041 | 0.9301768 | 4.0683518 | 5.8917111 | 2.951201 | 0.05694801 | 77.77681 | 2.80483632 | 3.1548336 | 2.36512843 | 16.703225 | 97.63487 | 2.6073004 | 12.0526631 | 17.9413525 | 4.751646 | 0.10771993 | 86.81567 | 4.14123279 | 5.4352186 | 3.3279595 | 39.664871 | 98.05455 |
Mississippi | ||||||||||||||||||||||||||||||||||
Private | 8 | 0.00000000 | 0.00000000 | 10.2941176 | 0.0000000 | 0.00000000 | 0.4153686 | 0.00000000 | 0.00000000 | 0.00000000 | 13.602941 | 92.174960 | 0.6126403 | 1.3950833 | 43.1429884 | 1.786073 | 0.03380897 | 47.56644 | 0.45078958 | 2.3691127 | 2.64306054 | 47.421384 | 97.35694 | 2.3809524 | 4.1567696 | 98.5555556 | 3.500243 | 0.17015070 | 84.92647 | 1.48274186 | 8.8964511 | 7.8250401 | 99.111111 | 100.00000 |
Public | 8 | 0.04500450 | 0.27480077 | 14.1225614 | 0.3297609 | 0.00000000 | 2.8029678 | 0.00000000 | 0.00000000 | 0.00000000 | 21.219393 | 96.126726 | 0.2495251 | 0.9840197 | 50.9659836 | 1.488892 | 0.10082563 | 41.83147 | 1.15531855 | 0.7814013 | 2.44256433 | 54.944564 | 97.55744 | 0.5164366 | 1.9064125 | 93.0475405 | 3.089508 | 0.30228085 | 75.02111 | 2.13628989 | 3.6003600 | 3.8732744 | 95.603188 | 100.00000 |
Missouri | ||||||||||||||||||||||||||||||||||
Private | 26 | 0.00000000 | 0.34465780 | 0.9913259 | 0.7877893 | 0.00000000 | 42.7336156 | 0.00000000 | 0.00000000 | 0.00000000 | 5.328377 | 87.315634 | 0.5843973 | 1.9178370 | 8.6849277 | 4.605814 | 0.19764224 | 72.59577 | 2.43146465 | 5.6032551 | 3.37889531 | 18.422083 | 96.62110 | 2.2304833 | 7.5268817 | 37.0743634 | 16.129032 | 0.94395280 | 94.29988 | 5.94795539 | 23.5625705 | 12.6843658 | 49.013060 | 100.00000 |
Public | 9 | 0.07812500 | 0.15625000 | 3.3564815 | 1.0284539 | 0.00000000 | 7.6562500 | 0.10755357 | 1.85185185 | 0.70312500 | 10.835467 | 82.453704 | 0.6143217 | 1.3297326 | 14.4737154 | 2.462243 | 0.09693745 | 66.59804 | 1.97585449 | 4.6847848 | 7.76436645 | 20.952804 | 92.23563 | 2.9217887 | 2.6157407 | 82.6562500 | 3.830394 | 0.20569078 | 80.66581 | 2.80986432 | 13.5062411 | 17.5462963 | 87.031250 | 99.29688 |
Montana | ||||||||||||||||||||||||||||||||||
Private | 4 | 1.25000000 | 0.00000000 | 0.5820722 | 1.6298021 | 0.23282887 | 21.3038417 | 0.00000000 | 0.00000000 | 0.00000000 | 10.277778 | 95.332671 | 20.3219605 | 1.1499986 | 1.6457009 | 4.146048 | 0.36368964 | 63.52879 | 2.48790826 | 4.1854713 | 2.17043496 | 30.115306 | 97.82957 | 76.2514552 | 2.1486124 | 3.1777557 | 8.236347 | 0.49652433 | 78.68056 | 6.65342602 | 9.4444444 | 4.6673287 | 78.696158 | 100.00000 |
Public | 1 | 1.67865707 | 0.95923261 | 1.1031175 | 1.6306954 | 0.00000000 | 77.5539568 | 0.04796163 | 6.37889688 | 10.64748201 | 5.419664 | 89.352518 | 1.6786571 | 0.9592326 | 1.1031175 | 1.630695 | 0.00000000 | 77.55396 | 0.04796163 | 6.3788969 | 10.64748201 | 5.419664 | 89.35252 | 1.6786571 | 0.9592326 | 1.1031175 | 1.630695 | 0.00000000 | 77.55396 | 0.04796163 | 6.3788969 | 10.6474820 | 5.419664 | 89.35252 |
Nebraska | ||||||||||||||||||||||||||||||||||
Private | 11 | 0.00000000 | 0.00000000 | 1.7366136 | 2.7900147 | 0.00000000 | 57.0907987 | 0.00000000 | 0.00000000 | 0.00000000 | 9.261939 | 95.227606 | 0.5116739 | 2.2486971 | 6.4570358 | 6.353040 | 0.28972707 | 76.04051 | 2.02700758 | 4.6043600 | 1.46794375 | 17.887182 | 98.53206 | 1.0660981 | 9.0092278 | 15.4411765 | 14.705882 | 0.73529412 | 90.73806 | 6.75477239 | 16.2060937 | 4.7723935 | 32.598039 | 100.00000 |
Public | 4 | 0.27056277 | 0.54755043 | 1.6504329 | 3.3008658 | 0.02705628 | 75.6676558 | 1.16341991 | 1.81277056 | 0.00000000 | 11.147186 | 91.801948 | 0.6945519 | 1.8367904 | 3.8739236 | 5.259272 | 0.18187061 | 77.82262 | 2.05152180 | 5.6469099 | 2.63254251 | 13.897930 | 97.36746 | 1.1869436 | 4.7348485 | 7.2829132 | 6.599424 | 0.36267722 | 78.84199 | 2.73656446 | 8.4438040 | 8.1980519 | 17.326931 | 100.00000 |
Nevada | ||||||||||||||||||||||||||||||||||
Private | 1 | 2.98363811 | 1.82868142 | 3.9461020 | 3.1761309 | 0.96246391 | 69.2974013 | 0.28873917 | 15.20692974 | 2.30991338 | 13.185756 | 97.690087 | 2.9836381 | 1.8286814 | 3.9461020 | 3.176131 | 0.96246391 | 69.29740 | 0.28873917 | 15.2069297 | 2.30991338 | 13.185756 | 97.69009 | 2.9836381 | 1.8286814 | 3.9461020 | 3.176131 | 0.96246391 | 69.29740 | 0.28873917 | 15.2069297 | 2.3099134 | 13.185756 | 97.69009 |
Public | 2 | 0.42194093 | 2.05026455 | 1.8849206 | 16.1375661 | 0.82671958 | 43.5443038 | 2.54629630 | 5.05952381 | 0.00000000 | 26.620370 | 99.867725 | 1.7982721 | 6.2290704 | 6.6386628 | 19.179894 | 1.24317695 | 55.86607 | 3.22814112 | 5.7505777 | 0.06613757 | 38.317218 | 99.93386 | 3.1746032 | 10.4078762 | 11.3924051 | 22.222222 | 1.65963432 | 68.18783 | 3.90998594 | 6.4416315 | 0.1322751 | 50.014065 | 100.00000 |
New Hampshire | ||||||||||||||||||||||||||||||||||
Private | 8 | 0.00000000 | 0.60975610 | 1.0752688 | 1.7582418 | 0.00000000 | 25.0635486 | 0.00000000 | 6.58607753 | 0.00000000 | 5.376344 | 84.693554 | 0.5773033 | 2.7186457 | 5.3122559 | 4.107853 | 0.18477460 | 62.27110 | 1.57831958 | 18.3472421 | 4.90250116 | 14.479152 | 95.09750 | 1.6195618 | 12.3054938 | 17.1321384 | 7.480315 | 0.98425197 | 79.77642 | 3.89012385 | 62.8853353 | 15.3064465 | 30.422356 | 100.00000 |
Public | 5 | 0.11764706 | 0.86011770 | 0.9683276 | 2.1276596 | 0.00000000 | 70.1176471 | 1.25643666 | 8.67460157 | 0.04034698 | 6.609325 | 96.355097 | 0.2754972 | 1.7038658 | 1.4185501 | 2.876995 | 0.01976929 | 79.01674 | 1.66209426 | 11.7142714 | 1.31221523 | 7.956772 | 98.68778 | 0.4979629 | 2.3529412 | 2.0803296 | 3.647059 | 0.04526935 | 84.47261 | 2.00000000 | 20.0000000 | 3.6449031 | 9.647059 | 99.95965 |
New Jersey | ||||||||||||||||||||||||||||||||||
Private | 15 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 9.4142259 | 0.00000000 | 0.00000000 | 0.00000000 | 0.000000 | 69.044898 | 0.1996066 | 4.0991606 | 13.3710016 | 10.922956 | 0.15660179 | 54.53586 | 1.29638963 | 8.0912045 | 7.32722061 | 30.045716 | 92.67278 | 0.7268322 | 15.8506429 | 52.0958084 | 28.770442 | 0.56323605 | 100.00000 | 4.18410042 | 38.1794651 | 30.9551020 | 80.588822 | 100.00000 |
Public | 9 | 0.05635920 | 3.90323331 | 5.3140097 | 8.4066062 | 0.00000000 | 27.5596468 | 1.16608362 | 1.66861144 | 0.35005834 | 24.270712 | 83.064062 | 0.2038684 | 7.0899629 | 11.8147674 | 17.752549 | 0.20353163 | 50.54686 | 2.06493237 | 7.2281380 | 3.09539482 | 39.129612 | 96.90461 | 0.6280530 | 17.8564719 | 19.9606686 | 32.239430 | 0.64200977 | 73.71062 | 3.18150035 | 13.3612468 | 16.9359384 | 61.750246 | 99.64994 |
New Mexico | ||||||||||||||||||||||||||||||||||
Private | 1 | 2.08126858 | 0.99108028 | 23.9841427 | 20.7135778 | 0.39643211 | 33.7958375 | 0.00000000 | 17.74033697 | 0.29732408 | 48.166501 | 99.702676 | 2.0812686 | 0.9910803 | 23.9841427 | 20.713578 | 0.39643211 | 33.79584 | 0.00000000 | 17.7403370 | 0.29732408 | 48.166501 | 99.70268 | 2.0812686 | 0.9910803 | 23.9841427 | 20.713578 | 0.39643211 | 33.79584 | 0.00000000 | 17.7403370 | 0.2973241 | 48.166501 | 99.70268 |
Public | 7 | 2.53878702 | 0.81782290 | 0.2057613 | 7.4074074 | 0.09402915 | 11.5226337 | 0.47793084 | 0.80233406 | 0.14587892 | 35.825106 | 92.900799 | 14.6967194 | 1.6622964 | 2.8202712 | 38.631326 | 0.24253118 | 31.05496 | 2.17870431 | 5.5063989 | 3.20678809 | 60.231849 | 96.79321 | 75.1028807 | 3.2753915 | 4.5967287 | 71.043034 | 0.50761421 | 55.99436 | 3.86579139 | 21.3663199 | 7.0992008 | 85.995624 | 99.85412 |
New York | ||||||||||||||||||||||||||||||||||
Private | 83 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 0.8598452 | 0.00000000 | 0.00000000 | 0.00000000 | 0.000000 | 49.529781 | 0.2509328 | 4.0604171 | 6.5272589 | 7.937922 | 0.13055783 | 66.48749 | 1.81357899 | 5.9502243 | 6.84161446 | 20.720667 | 93.15839 | 1.6798419 | 14.8435280 | 59.2019544 | 82.287188 | 3.40768278 | 100.00000 | 5.70012392 | 40.6980803 | 50.4702194 | 97.162511 | 100.00000 |
Public | 1 | 0.16386727 | 10.07783695 | 9.6169603 | 16.1102007 | 0.27652601 | 46.3334699 | 3.40024580 | 1.61818927 | 12.40270381 | 39.645637 | 87.597296 | 0.1638673 | 10.0778370 | 9.6169603 | 16.110201 | 0.27652601 | 46.33347 | 3.40024580 | 1.6181893 | 12.40270381 | 39.645637 | 87.59730 | 0.1638673 | 10.0778370 | 9.6169603 | 16.110201 | 0.27652601 | 46.33347 | 3.40024580 | 1.6181893 | 12.4027038 | 39.645637 | 87.59730 |
North Carolina | ||||||||||||||||||||||||||||||||||
Private | 35 | 0.00000000 | 0.00000000 | 3.4831461 | 0.2219756 | 0.00000000 | 0.3074558 | 0.00000000 | 0.00000000 | 0.00000000 | 10.744681 | 81.716700 | 0.6779701 | 1.8996102 | 28.5820150 | 3.581534 | 0.09036037 | 52.36635 | 1.93381175 | 7.6767985 | 3.19155315 | 36.765301 | 96.80845 | 1.9292605 | 14.3037336 | 95.4545455 | 10.286225 | 0.51361068 | 85.49107 | 4.01129944 | 30.1282051 | 18.2832997 | 98.113208 | 100.00000 |
Public | 15 | 0.24409187 | 0.37493305 | 3.2249675 | 1.3390466 | 0.02601795 | 11.4739170 | 0.79674521 | 0.94394144 | 0.21424746 | 11.366914 | 89.046456 | 1.5548657 | 2.5315037 | 27.5982550 | 4.897334 | 0.10334125 | 54.65034 | 2.86331034 | 3.1809657 | 2.62008187 | 39.548610 | 97.37992 | 14.8189504 | 8.2718380 | 76.0114479 | 7.342683 | 0.24026085 | 85.69289 | 4.59290188 | 8.1949652 | 10.9535438 | 84.142058 | 99.78575 |
North Dakota | ||||||||||||||||||||||||||||||||||
Private | 3 | 1.54639175 | 0.00000000 | 2.7411890 | 1.7391304 | 0.17799929 | 78.6956522 | 0.00000000 | 0.00000000 | 0.86956522 | 10.323959 | 93.917526 | 2.5462036 | 0.7577387 | 4.8969958 | 3.225991 | 1.03199110 | 80.31317 | 1.81461245 | 2.5382209 | 2.87507758 | 14.273532 | 97.12492 | 4.3478261 | 1.1391954 | 7.8260870 | 4.948454 | 2.60869565 | 81.75258 | 3.91304348 | 7.5115700 | 6.0824742 | 20.434783 | 99.13043 |
Public | 6 | 0.73913338 | 0.18501388 | 2.3077955 | 1.6613549 | 0.08137248 | 69.2082111 | 1.76307046 | 0.55504163 | 2.68270120 | 8.184716 | 87.947214 | 1.2560680 | 0.9218368 | 3.8843095 | 3.794283 | 0.24022673 | 78.61656 | 2.60341936 | 2.4711061 | 6.21219071 | 12.700144 | 93.78781 | 1.7576318 | 1.4918288 | 6.4754857 | 4.897361 | 0.46253469 | 83.88970 | 2.96022202 | 4.0889672 | 12.0527859 | 16.558742 | 97.31730 |
Ohio | ||||||||||||||||||||||||||||||||||
Private | 46 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 0.5167959 | 0.00000000 | 0.00000000 | 0.00000000 | 0.000000 | 72.000000 | 0.2670138 | 1.7891142 | 10.7333653 | 3.175560 | 0.07934518 | 71.33020 | 2.28516381 | 6.3545010 | 3.98573686 | 18.329563 | 96.01426 | 1.5625000 | 14.3255037 | 92.7648579 | 10.052678 | 0.47846890 | 100.00000 | 10.88888889 | 42.8888889 | 28.0000000 | 94.315245 | 100.00000 |
Public | 9 | 0.14717459 | 0.05711022 | 5.0859430 | 0.5886508 | 0.00000000 | 1.4848658 | 0.85665334 | 0.68875430 | 0.45688178 | 8.264657 | 88.944306 | 0.2287374 | 1.8620747 | 19.7214203 | 2.594275 | 0.07086432 | 64.48507 | 2.30621284 | 2.9103394 | 5.82101015 | 26.783584 | 94.17899 | 0.5886508 | 3.2424320 | 95.3169617 | 4.316249 | 0.11281321 | 84.81281 | 3.45564660 | 5.3214033 | 11.0556941 | 97.087379 | 99.54312 |
Oklahoma | ||||||||||||||||||||||||||||||||||
Private | 10 | 1.93939394 | 0.00000000 | 3.9085861 | 0.0000000 | 0.00000000 | 45.5104551 | 0.00000000 | 0.00000000 | 0.00000000 | 17.343016 | 66.666667 | 5.3816774 | 2.0479018 | 12.3772645 | 5.324622 | 0.17304398 | 57.06790 | 3.95654188 | 3.0050619 | 10.66598638 | 29.261052 | 89.33401 | 16.6666667 | 8.2020202 | 27.6413690 | 7.842574 | 0.37202381 | 70.23749 | 8.03436079 | 9.7960356 | 33.3333333 | 42.857143 | 100.00000 |
Public | 10 | 1.49073328 | 0.00000000 | 2.3325062 | 1.2489927 | 0.00000000 | 13.6583400 | 0.00000000 | 0.00000000 | 0.20145044 | 21.191136 | 91.152019 | 8.7512386 | 1.4789009 | 14.2124162 | 6.925812 | 0.17863313 | 54.59074 | 7.64041271 | 2.3912953 | 3.83055596 | 39.187414 | 96.16944 | 21.1070999 | 3.0285036 | 79.3311845 | 16.255778 | 0.65017157 | 67.74129 | 17.59305211 | 7.7100646 | 8.8479810 | 82.070911 | 99.79855 |
Oregon | ||||||||||||||||||||||||||||||||||
Private | 12 | 0.16895969 | 1.19521912 | 0.9633911 | 3.0222222 | 0.19267823 | 57.3076923 | 0.79908676 | 1.56028369 | 0.00000000 | 14.666667 | 90.035971 | 0.9628141 | 5.1739799 | 2.1958425 | 7.599922 | 0.61496163 | 67.77156 | 6.27319696 | 4.7615602 | 4.64616427 | 22.820717 | 95.35384 | 2.5531915 | 13.9285714 | 5.8130401 | 10.473458 | 1.53846154 | 79.57447 | 11.94605010 | 7.9623288 | 9.9640288 | 32.967033 | 100.00000 |
Public | 8 | 0.45438658 | 2.08263352 | 1.2233485 | 5.2778749 | 0.24466970 | 53.8461538 | 0.26450653 | 2.19538513 | 1.00938967 | 15.630988 | 86.989542 | 1.2139270 | 5.2652446 | 2.1188789 | 7.632290 | 0.75706647 | 66.50313 | 3.88804520 | 6.8906769 | 5.73074021 | 20.875452 | 94.26926 | 2.2721051 | 9.9615519 | 3.1990179 | 9.618718 | 2.41362209 | 76.01971 | 5.72249533 | 25.3611018 | 13.0104582 | 26.819757 | 98.99061 |
Pennsylvania | ||||||||||||||||||||||||||||||||||
Private | 66 | 0.00000000 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 17.7130045 | 0.00000000 | 0.00000000 | 0.00000000 | 0.000000 | 39.798206 | 0.1989786 | 3.3304171 | 9.5445353 | 5.173646 | 0.08829435 | 67.25971 | 2.20629738 | 6.5500583 | 5.64806678 | 20.542169 | 94.35193 | 0.8011146 | 16.4534838 | 67.3758865 | 13.359274 | 0.63357973 | 100.00000 | 7.78210117 | 36.2745098 | 60.2017937 | 77.905074 | 100.00000 |
Public | 16 | 0.06337136 | 0.09784736 | 3.2722746 | 1.1153712 | 0.05885815 | 1.1741683 | 1.59883721 | 0.39786149 | 0.19569472 | 7.284768 | 92.930506 | 0.1592404 | 1.9026603 | 12.6461132 | 4.477350 | 0.09547283 | 73.67511 | 2.32262608 | 3.0026774 | 1.71875386 | 21.603463 | 98.28125 | 0.3270349 | 10.1880752 | 85.5185910 | 10.265435 | 0.16272510 | 89.19484 | 3.71819961 | 8.0070392 | 7.0694945 | 93.248532 | 99.80431 |
Rhode Island | ||||||||||||||||||||||||||||||||||
Private | 7 | 0.12249898 | 1.12508273 | 2.0416497 | 4.8730549 | 0.06142506 | 31.7680686 | 0.08190008 | 2.97515887 | 1.89850310 | 10.462735 | 70.559412 | 0.4110147 | 5.4439026 | 3.6833485 | 7.316944 | 0.10987466 | 62.76910 | 1.90956765 | 8.7936652 | 9.56258072 | 18.874652 | 90.43742 | 0.8599509 | 15.7615353 | 5.9906328 | 9.737501 | 0.19854401 | 77.41010 | 4.09541444 | 12.9811630 | 29.4405880 | 32.218713 | 98.10150 |
Public | 2 | 0.26552411 | 2.88161092 | 4.8458150 | 7.7726148 | 0.03620783 | 65.0156232 | 1.66647379 | 9.03830575 | 0.17359102 | 18.520307 | 96.632672 | 0.3179258 | 3.0520540 | 6.1203962 | 10.557989 | 0.07596759 | 66.89318 | 2.02206074 | 9.1899633 | 1.77045974 | 22.146393 | 98.22954 | 0.3703275 | 3.2224971 | 7.3949774 | 13.343363 | 0.11572735 | 68.77074 | 2.37764770 | 9.3416209 | 3.3673285 | 25.772480 | 99.82641 |
South Carolina | ||||||||||||||||||||||||||||||||||
Private | 18 | 0.00000000 | 0.00000000 | 1.4157014 | 0.2136752 | 0.00000000 | 0.0000000 | 0.00000000 | 0.00000000 | 0.00000000 | 10.977034 | 94.208494 | 0.2769395 | 1.0605547 | 40.1867030 | 2.587914 | 0.06761614 | 46.07725 | 1.23779295 | 6.2452569 | 2.25997292 | 45.417521 | 97.74003 | 1.0718114 | 3.0825023 | 98.8543372 | 5.694981 | 0.38610039 | 79.97587 | 3.19662244 | 36.1411087 | 5.7915058 | 99.615385 | 100.00000 |
Public | 9 | 0.12008406 | 0.66046232 | 6.1627854 | 0.5403783 | 0.03002101 | 3.7826479 | 0.03002101 | 0.33200531 | 0.15010507 | 13.034726 | 92.821522 | 0.3312356 | 1.8439523 | 27.3711694 | 2.943275 | 0.09573345 | 61.30750 | 2.15979787 | 1.8345840 | 2.11275703 | 34.745164 | 97.88724 | 0.9743875 | 5.8661146 | 93.9057340 | 5.707127 | 0.25055679 | 80.49057 | 4.05974338 | 4.8309179 | 7.1784783 | 95.286701 | 99.84989 |
South Dakota | ||||||||||||||||||||||||||||||||||
Private | 4 | 0.28188865 | 0.22988506 | 1.7241379 | 1.1980268 | 0.00000000 | 65.7064472 | 0.00000000 | 0.16181230 | 0.00000000 | 7.701149 | 98.390805 | 1.2928915 | 0.7700026 | 4.7755315 | 4.141070 | 0.13636877 | 83.44967 | 2.04877840 | 2.4818948 | 0.90379675 | 13.164643 | 99.09620 | 2.3462783 | 1.2345679 | 11.3854595 | 7.544582 | 0.40453074 | 90.34483 | 3.38266385 | 8.5048011 | 1.6091954 | 24.417010 | 100.00000 |
Public | 8 | 0.82047916 | 0.00000000 | 0.0000000 | 0.0000000 | 0.00000000 | 3.7140855 | 0.00000000 | 0.00000000 | 0.00000000 | 7.703200 | 94.419198 | 24.4195678 | 0.9537036 | 1.6379058 | 2.286078 | 0.18111660 | 64.41244 | 1.83529354 | 1.2716429 | 3.00224907 | 31.313665 | 96.99775 | 95.3749124 | 2.0019691 | 4.7259600 | 3.931380 | 0.50690786 | 85.94435 | 3.43102216 | 4.0766318 | 5.5808020 | 95.935529 | 100.00000 |
Tennessee | ||||||||||||||||||||||||||||||||||
Private | 28 | 0.00000000 | 0.00000000 | 3.4697509 | 0.0000000 | 0.00000000 | 0.0000000 | 0.00000000 | 0.00000000 | 0.00000000 | 7.918149 | 90.524988 | 0.3252366 | 2.0583604 | 23.7164958 | 3.712725 | 0.09281991 | 60.51757 | 1.76878686 | 5.1718908 | 2.63611455 | 31.674425 | 97.36389 | 1.7441860 | 7.6146934 | 100.0000000 | 19.023622 | 0.59842520 | 87.18861 | 5.92105263 | 16.6103984 | 9.4750118 | 100.000000 | 100.00000 |
Public | 6 | 0.14992504 | 1.00808685 | 3.6687539 | 1.2407223 | 0.00000000 | 20.3168273 | 2.20477996 | 0.37922215 | 0.42527940 | 9.577564 | 88.984919 | 0.2546569 | 1.8373055 | 24.2343711 | 3.131386 | 0.08429978 | 60.84336 | 3.09108378 | 1.2563444 | 5.26718887 | 32.633103 | 94.73281 | 0.3956087 | 2.9678522 | 63.6977955 | 5.686876 | 0.18791415 | 82.90841 | 5.08357235 | 2.2154090 | 11.0150807 | 69.159189 | 99.57472 |
Texas | ||||||||||||||||||||||||||||||||||
Private | 38 | 0.00000000 | 0.00000000 | 2.2988506 | 2.9197080 | 0.00000000 | 1.5544041 | 0.00000000 | 0.00000000 | 0.00000000 | 19.744842 | 76.468811 | 0.4585051 | 3.0085943 | 23.4126611 | 20.448949 | 0.17174096 | 42.95614 | 1.85604476 | 2.2780272 | 5.40934134 | 49.356496 | 94.59066 | 1.2195122 | 16.6440115 | 85.4858549 | 66.091954 | 1.48148148 | 72.54062 | 4.44078947 | 13.9376997 | 23.5311886 | 97.435897 | 100.00000 |
Public | 28 | 0.06619010 | 0.38287504 | 0.6221869 | 2.7168110 | 0.00000000 | 2.0121790 | 0.00000000 | 0.21180831 | 0.52210233 | 28.105470 | 76.393159 | 0.3932651 | 6.1144762 | 15.6597864 | 27.711786 | 0.18342767 | 40.06689 | 2.41757050 | 1.6085496 | 5.84424550 | 52.480312 | 94.15575 | 0.8812950 | 19.9393808 | 84.0787757 | 92.983850 | 1.62723601 | 70.44774 | 7.67691717 | 6.0636788 | 23.6068413 | 94.638602 | 99.47790 |
Utah | ||||||||||||||||||||||||||||||||||
Private | 2 | 0.36740585 | 1.94528277 | 0.5150243 | 5.5274898 | 0.49808893 | 72.4615624 | 2.87092925 | 1.17438656 | 0.12625171 | 12.314657 | 96.181603 | 0.4950085 | 2.8707407 | 4.9634148 | 6.474334 | 0.54100090 | 77.57706 | 3.12323526 | 1.9828800 | 1.97232412 | 18.467735 | 98.02768 | 0.6226112 | 3.7961986 | 9.4118054 | 7.421179 | 0.58391287 | 82.69256 | 3.37554127 | 2.7913734 | 3.8183965 | 24.620813 | 99.87375 |
Public | 2 | 0.36603221 | 1.22139169 | 1.2329506 | 8.6244645 | 0.37759112 | 49.7110272 | 1.63751252 | 3.86482627 | 1.55274717 | 14.198197 | 91.102650 | 0.3829209 | 3.1333041 | 1.2526803 | 8.993592 | 0.46009494 | 58.96308 | 2.38785034 | 19.2014275 | 5.22504882 | 16.610442 | 94.77495 | 0.3998096 | 5.0452166 | 1.2724100 | 9.362719 | 0.54259876 | 68.21514 | 3.13818816 | 34.5380288 | 8.8973505 | 19.022688 | 98.44725 |
Vermont | ||||||||||||||||||||||||||||||||||
Private | 9 | 0.19098549 | 0.65645514 | 0.0000000 | 0.4347826 | 0.00000000 | 57.9868709 | 1.08043217 | 0.49656226 | 0.00000000 | 6.521739 | 89.006623 | 0.6739817 | 2.2384566 | 3.8700412 | 5.081177 | 0.12708871 | 69.09190 | 3.01438427 | 12.2761315 | 3.62683738 | 15.005129 | 96.37316 | 1.7218543 | 6.7114094 | 12.9102845 | 9.409190 | 0.37174721 | 85.59969 | 5.66248257 | 26.9565217 | 10.9933775 | 25.601751 | 100.00000 |
Public | 3 | 0.24891101 | 0.76923077 | 1.2912259 | 1.4915694 | 0.00000000 | 80.3438083 | 1.95804196 | 1.29701686 | 0.76923077 | 7.198444 | 95.698507 | 0.6237871 | 1.6515307 | 2.0697896 | 2.998046 | 0.03368132 | 84.49991 | 2.28911520 | 3.3656885 | 2.46845153 | 9.665949 | 97.53155 | 0.9090909 | 3.0180460 | 3.4265734 | 4.145924 | 0.06993007 | 89.16991 | 2.57467330 | 4.7552448 | 4.3014935 | 11.309894 | 99.23077 |
Virginia | ||||||||||||||||||||||||||||||||||
Private | 22 | 0.00000000 | 0.00000000 | 4.1134752 | 0.6172840 | 0.00000000 | 0.9259259 | 0.00000000 | 0.00000000 | 0.00000000 | 14.588859 | 90.043290 | 0.4647529 | 2.3696297 | 20.4559626 | 4.547600 | 0.12292265 | 62.79519 | 2.72368219 | 3.3897708 | 3.13049068 | 30.684550 | 96.86951 | 1.4351476 | 7.9628015 | 98.4567901 | 13.106655 | 0.99290780 | 82.84704 | 5.42986425 | 27.2505187 | 9.9567100 | 99.074074 | 100.00000 |
Public | 12 | 0.14864541 | 0.45771144 | 4.1380964 | 1.8323408 | 0.00000000 | 2.9253731 | 0.00000000 | 0.05882353 | 0.00000000 | 13.650939 | 91.517782 | 0.2622088 | 4.7385796 | 22.3630494 | 4.938981 | 0.21013459 | 57.47387 | 3.16721640 | 4.3530848 | 2.49287401 | 35.680170 | 97.50713 | 0.3569710 | 14.8270035 | 85.2338308 | 10.139642 | 0.76470588 | 81.41176 | 5.29038986 | 8.8358209 | 8.4822181 | 89.547038 | 100.00000 |
Washington | ||||||||||||||||||||||||||||||||||
Private | 11 | 0.14154282 | 3.39110776 | 0.9345794 | 3.4941764 | 0.00000000 | 47.9994500 | 0.26497085 | 1.45081387 | 0.38924275 | 18.498368 | 90.884092 | 0.6174093 | 6.9774762 | 2.8470087 | 8.162984 | 0.47652631 | 66.16276 | 5.94665967 | 4.3613593 | 4.44781491 | 25.028064 | 95.55219 | 1.1658718 | 13.9007287 | 6.0521932 | 12.400636 | 2.10993892 | 76.41296 | 8.77565464 | 10.1746185 | 9.1159082 | 32.710023 | 99.61076 |
Public | 6 | 0.43160691 | 2.15690922 | 1.4873838 | 6.5134870 | 0.18592297 | 46.9252412 | 4.48970490 | 1.03585657 | 0.45034368 | 23.552457 | 85.090657 | 0.9861372 | 6.7831796 | 3.1227578 | 9.580315 | 0.43934539 | 62.69559 | 6.25798570 | 5.6243575 | 4.51033049 | 27.169721 | 95.48967 | 2.5835506 | 19.4310468 | 5.3804219 | 13.171783 | 1.16111535 | 74.36919 | 7.72908367 | 9.5770828 | 14.9093426 | 34.637370 | 99.54966 |
West Virginia | ||||||||||||||||||||||||||||||||||
Private | 4 | 0.25396825 | 0.19854401 | 0.0000000 | 1.3245033 | 0.09474183 | 61.2979631 | 0.00000000 | 2.05162144 | 0.33112583 | 2.980132 | 93.699668 | 0.5367935 | 1.1090403 | 5.6047722 | 1.909421 | 0.27063041 | 76.93228 | 1.35697207 | 8.7008041 | 3.57928715 | 10.787629 | 96.42071 | 0.9000474 | 2.7001421 | 9.9005211 | 2.558029 | 0.66225166 | 88.07947 | 2.84579749 | 15.2534344 | 6.3003316 | 17.148271 | 99.66887 |
Public | 9 | 0.00000000 | 0.16648169 | 4.2227935 | 0.4854369 | 0.00000000 | 48.3356449 | 0.00000000 | 0.44901457 | 0.33296337 | 7.386785 | 92.980291 | 0.3231175 | 0.9062822 | 7.3868638 | 1.935252 | 0.03845896 | 78.28907 | 1.56114182 | 6.7828578 | 2.77695728 | 12.151117 | 97.22304 | 0.6434051 | 1.8680377 | 12.8745838 | 4.282316 | 0.08568980 | 88.01039 | 2.97172237 | 39.8751734 | 7.0197087 | 15.316315 | 99.66704 |
Wisconsin | ||||||||||||||||||||||||||||||||||
Private | 20 | 0.00000000 | 0.17123288 | 0.2849003 | 1.4245014 | 0.00000000 | 56.1322729 | 0.00000000 | 0.46044370 | 0.00000000 | 3.988604 | 87.829181 | 0.5601772 | 2.6697187 | 6.1876671 | 5.939914 | 0.17927190 | 73.88952 | 2.26783759 | 5.0565536 | 3.24934109 | 17.804587 | 96.75066 | 2.0547945 | 7.9470199 | 22.0939386 | 16.785266 | 0.66225166 | 95.15670 | 3.90469887 | 21.0526316 | 12.1708185 | 41.816660 | 100.00000 |
Public | 2 | 0.19810698 | 2.97160467 | 1.6606344 | 2.3419204 | 0.14903130 | 69.6015849 | 1.64998935 | 0.44023773 | 2.17917676 | 9.367681 | 96.263572 | 0.2480848 | 3.1198240 | 5.0565994 | 7.334288 | 0.17356914 | 77.85487 | 2.64097531 | 0.6139873 | 2.95780213 | 18.573341 | 97.04220 | 0.2980626 | 3.2680434 | 8.4525644 | 12.326656 | 0.19810698 | 86.10815 | 3.63196126 | 0.7877369 | 3.7364275 | 27.779001 | 97.82082 |
Wyoming | ||||||||||||||||||||||||||||||||||
Public | 1 | 0.56162246 | 1.33385335 | 0.9438378 | 5.8424337 | 0.25741030 | 74.1497660 | 2.37909516 | 7.80811232 | 6.72386895 | 11.318253 | 93.276131 | 0.5616225 | 1.3338534 | 0.9438378 | 5.842434 | 0.25741030 | 74.14977 | 2.37909516 | 7.8081123 | 6.72386895 | 11.318253 | 93.27613 | 0.5616225 | 1.3338534 | 0.9438378 | 5.842434 | 0.25741030 | 74.14977 | 2.37909516 | 7.8081123 | 6.7238690 | 11.318253 | 93.27613 |
I am particularly interested in the the White, Asian, black,Hispanic, american indian alaska native and native Hawaiian pacific islander race and ethnic groups and the following inferences can be made from the summary table:
Public colleges
->By assessing the mean % enrollment for the White, Asian, black,Hispanic, american indian alaska native and native Hawaiian pacific islander race and ethnic groups in the public colleges, it can be observed that the states in the Mid-west and North-Central region have more % white students enrolled in the colleges compared to the other race/ethnic groups. States such as California, New Mexico and Hawaii have the least % white students enrolled in public colleges.
->the highest Hispanic % enrollment was observed in New Mexico, Texas,California, Florida and Arizona which is quite expected and the least % Hispanic enrollment was seen in states like Mississipi , Maine, Montana etc
->the highest Asian % enrollment as expected was observed in Hawaii, California and New York, New Jersey and the least was seen in North Dakota, West Virginia and Maine
-> the highest american indian alaska native % enrollment was observed in South Dakota, New Mexico and Kansas and the least % was in Illinois. New York and Pennsylvania
->the highest native Hawaiian pacific islander % enrollment was noted in Hawaii, Nevada and Oregon and the least % was observed in New Hamphsire and Montana.
Private colleges
->By assessing the mean % enrollment for the White, Asian, black,Hispanic, american indian alaska native and native Hawaiian pacific islander race and ethnic groups in the private colleges, it can be observed that the states in the Mid-west and North-Central region such as South and North Dakota, Utah, West Virginia, Kentucky and ebraska have more % white students enrolled in the colleges compared to the other race/ethnic groups. States such as California, New Mexico,Hawaii,Delaware have the least % white students enrolled in private colleges.
->the highest Hispanic % enrollment was observed in New Mexico, Texas,California, Florida and Arizona which is quite expected and the least % Hispanic enrollment was seen in states like Alabama,West Virginia, Mississipi and Delaware etc
->the highest Asian % enrollment as expected was observed in Hawaii,California,Washington,Massachusetts and the least was seen in Alabama, South Dakota and North Dakota
-> the highest american indian alaska native % enrollment was observed in Arizona,Montana and Alaska and the least % was in New Jersey, Pennsylvania and Delaware.
->the highest native Hawaiian pacific islander % enrollment was noted in Hawaii, Arizona and North Dakota and the least % was observed in Minnesota, Mississipi and Delaware
I expected a higher early career pay, higher stem % and higher making world better % for the private colleges and similar mid career pay for both the private and public college students. From the data, definitive conclusions couldn’t be made, it seems like both private and public colleges have similar earning potential and quality of education.
I also expected higher % men enrollment compared to % women enrollment, this would apply to both private and public schools.The inference made from the data about % enrollment based on gender was infact reverse of what I had expected. More % female enrollment was noticed in both private and public colleges.
I also expected certain race/ethnic groups choosing colleges based on location but the choice of private/public school would be influenced by economic factors as well. The inferences from the data aligns well with my expectation.
ggplot(data=merged1_stats,
aes(x=median_early_career_pay,
y=reorder(state_code,median_early_career_pay),fill=type))+
geom_col(position="dodge")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
scale_fill_brewer(palette="Pastel2")+
labs(title=str_wrap("Median Early career pay(in USD) for the private vs public college across the different states",width=60),
x="Median Early career pay(USD)",
y="State Name")+
guides(fill=guide_legend(title="College Type"))
From the above plot, it is evident that Massachusetts had the highest median early career, out of which the public college in MA had a slightly higher median early career pay than the private colleges in MA. This was followed by California having the second highest median early career pay with the private college having slightly higher median early career pay than the public colleges.
Missouri and Alaska seem to be have the lowest median early career pay. In around 10 states, it seems like the median early career pay is slightly higher for the public colleges compared to the private colleges and in 11 states the opposite trend is observed and in the rest of the states both the private and public colleges have similar median early career pay.
Only in 5 states(MA,AK,MT,DE,AZ), the differences seem to be more pronounced with public colleges having higher median career pay.
Note: NV and PA have early career pay data only for the private college and Wyoming has the data only for public college, so a comparison between private and public college couldn’t be made.
#create a manual color palette
pal <- colorRampPalette(c("red", "lightyellow","lightblue","lightgreen","pink","violet","orange","green","blue"))
pal<-pal(52)
merged1_final%>%
group_by(state_code,type)%>%
summarise(median_early_career_pay=median(early_career_pay,na.rm=TRUE))%>%
ggplot()+
labs(title=str_wrap("Comparison of median early career pay in USD between the private and public colleges for different states in the United States",width=60),
x="Median Early career pay(in USD)",
y="State Name")+
scale_fill_manual(values=pal)+
geom_bar(aes(x=median_early_career_pay,
y=reorder_within(state_code,-median_early_career_pay,type),
fill=state_code),stat="identity",show.legend=TRUE)+
facet_wrap(~type,scales="free_y")+
scale_y_reordered()+
guides(fill=guide_legend(title="State Name"))
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
The interpretations of this plot are the same as the interpretations for the plot above. In this plot, the median early career pay for private and public colleges are plotted in different panels and thus it would be easier to interpret the changes in median early career pay within each type of college for the different states. The plot is more readable as the bars are arranged in a sequential order.
From the above bar plots, we were able to only compare the median early career pay i.e. only the center of the data for each type of college and for each state. It would be useful to understand the spread of the data as well for which boxplots can be used. Since we have 52 states and most color palettes have limited color options, I manually created a color palette by clubbing color options from inbuilt color palettes.
# create a manual palette using ColorBrewer Palettes
mycolors<- c(brewer.pal(name="Dark2", n = 8), brewer.pal(name="Paired", n = 6),
brewer.pal(name="Accent", n = 8),
brewer.pal(name="Set3", n = 12),
brewer.pal(name="Set2", n = 8),
brewer.pal(name="Set1", n = 9),
brewer.pal(name="Pastel1",n=1))
## Warning in brewer.pal(name = "Pastel1", n = 1): minimal value for n is 3, returning requested palette with 3 different levels
# boxplot to compare the median career pay for the different types of colleges at different states
ggplot(data=merged1_final,
aes(x=early_career_pay,
y=state_code,
fill=state))+
geom_boxplot()+
facet_wrap(~type)+
scale_fill_manual(values=mycolors)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title=str_wrap("Median career pay(in USD) for the private vs public college across the different states",width=60),
x="Median career pay(USD)",
y="State Name")+
guides(fill=guide_legend(title="State Name"))
Amongst the private colleges, the early career pay was the highest in California and the least in Alabama and Mississipi. The interquartile range(difference between Q3 and Q1) seems largest for CA,ME and PA and the smallest for DE, KS, MT, MT and NV.
Amongst the public colleges, the early career pay was the highest in MA, the variability is very low, that is due to just one public college in MA present in the data. The least early career pay was noted for Mississipi. The largest IQR was observed for Oregon, Alaska and Virginia.
ggplot(data=merged1_stats,
aes(x=median_mid_career_pay,
y=reorder(state_code,median_mid_career_pay),
fill=type))+
geom_col(position="dodge")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
scale_fill_brewer(palette="Pastel2")+
labs(title=str_wrap("Median Mid career pay(in USD) for the private vs public college across the different states",width=60),
x="Median Mid career pay(USD)",
y="State Name")+
guides(fill=guide_legend(title="College Type"))
From the above plot, it is evident that California had the highest median mid career pay, out of which the private colleges in CA had a slightly higher median mid career pay than the public colleges in CA. This was followed by Massachusetts having the second highest median mid career pay with the public college having slightly higher median mid career pay than the private colleges.
Missouri and Alaska seem to be have the lowest median mid career pay. in around 18 states, it seems like the median mid career pay is slightly higher for the public colleges compared to the private colleges and in 14 states the opposite trend is observed and in the rest of the states both the private and public colleges have similar median early career pay.
Only in 5 states(AK,MT,IA,DE,AZ), the differences seem to be more pronounced with public colleges having higher median career pay than private colleges
Note: NV and PA have early career pay data only for the private college and Wyoming has the data only for public college, so a comparison between private and public college couldn’t be made.
merged1_final%>%
group_by(state_code,type)%>%
summarise(median_mid_career_pay=median(mid_career_pay,na.rm=TRUE))%>%
ggplot()+
labs(title=str_wrap("Comparison of median mid career pay in USD between the private and public colleges for different states in the United States",width=60),
x="Median mid career pay(in USD)",
y="State Name",
col="State Name")+
scale_fill_manual(values=pal)+
geom_bar(aes(x=median_mid_career_pay,
y=reorder_within(state_code,-median_mid_career_pay,type),
fill=state_code),stat="identity",show.legend=TRUE)+
facet_wrap(~type,scales="free_y")+
scale_y_reordered()+
guides(fill=guide_legend(title="State Name"))
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
The interpretations of this plot are the same as the interpretations for the plot above. In this plot, the median mid career pay for private and public colleges are plotted in different panels and thus it would be easier to interpret the changes in median mid career pay within each type of college for the different states. The plot is more readable as the bars are arranged in a sequential order.
# boxplot to compare the mid career pay for the different types of colleges at different states
ggplot(data=merged1_final,
aes(x=mid_career_pay,
y=state_code,
fill=state))+
geom_boxplot()+
facet_wrap(~type)+
scale_fill_manual(values=mycolors)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title=str_wrap("Mid career pay(in USD) for the private vs public college across the different states",width=60),
x="Mid career pay(USD)",
y="State Name")+
guides(fill=guide_legend(title="State Name"))
Amongst the private colleges, the mid career pay was the highest in California and the least in Colorado. The interquartile range(difference between Q3 and Q1) seems largest for CA,ME,PA,VA and VT and the smallest for DE, KS, MT, MT and NV.
Amongst the public colleges, the mid career pay was the highest in Colorado(this is the one of the two outlier points in the plot). The least mid career pay was noted for Mississipi. The largest IQR was observed for Oregon,Virginia and Vermont.
ggplot(data=merged1_stats,
aes(x=median_make_world_better_percent,
y=reorder(state_code,-median_make_world_better_percent),
fill=type))+
geom_col(position="dodge")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
scale_fill_brewer(palette="Pastel2")+
labs(title=str_wrap("Median Make world better(in percent) for private vs public college across the different states",width=60),
x="Median make world better(in percent)",
y="State Name")+
guides(fill=guide_legend(title="College Type"))
merged1_final%>%
group_by(state_code,type)%>%
summarise(median_make_world_better_percent=median(make_world_better_percent,na.rm=TRUE))%>%
ggplot()+
labs(title=str_wrap("Comparison of median make world better in percent between the private and public colleges for different states in the United States",width=60),
x="Median make world better (in percent)",
y="State Name")+
scale_fill_manual(values=pal)+
geom_bar(aes(x=median_make_world_better_percent,
y=reorder_within(state_code,-median_make_world_better_percent,type),
fill=state_code),stat="identity",show.legend=TRUE)+
facet_wrap(~type,scales="free_y")+
scale_y_reordered()+
guides(fill=guide_legend(title="State Name"))
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
From the above plots, the following inferences can be made:
-> Amongst the private colleges,Arizona has the highest median make world better % followed by Alaska and Nevada has the lowest median make world better %. CA and MA have pretty low median make world better % which is not quite expected.
->Comparing the public colleges across USA, Mississipi has the highest median make world better % followed by Nebraska. The least median make world better % is taken by Illionis and Iowa.The median make world better % seem to gone up a bit for CA and MA amongst the public colleges but it must be noted that we had only 1 public college for each in CA and MA and thus making the accuracy of the data questionable.
-> Public and Private colleges do not seem to have a noticeable difference in the making world better %.
# boxplot to compare the make world better percent for the different types of colleges at different states
ggplot(data=merged1_final,
aes(x=make_world_better_percent,
y=state_code,
fill=state))+
geom_boxplot()+
facet_wrap(~type)+
scale_fill_manual(values=mycolors)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title=str_wrap("title=Make world better in percent for the private vs public college across the different states",width=60),
x="Make world better (in percent)",
y="State Name")+
guides(fill=guide_legend(title="State Name"))
## Warning: Removed 21 rows containing non-finite values (`stat_boxplot()`).
Amongst, the private colleges, Wisconsin has the highest make world better % and Texas has the lowest make world better %. The spread of the data is highest for Lousiana.
Comparing the public colleges across the different states in the US,Texas has the highest make world better % and Vermont and Florida has the lowest make world better %. The spread of the data is highest for Nebraska.
ggplot(data=merged1_stats,
aes(x=median_stem_percent,
y=reorder(state_code,median_stem_percent),
fill=type))+
geom_col(position="dodge")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
scale_fill_brewer(palette="Pastel2")+
labs(title=str_wrap("Median Stem(in percent) for private vs public college across the different states",width=60),
x="Median Stem(in percent)",
y="State Name")+
guides(fill=guide_legend(title="College Type"))
By assessing the above plot,
Amonsgt the public colleges,the median stem % is the highest for MA and MI and the least for ME. The difference in stem % between private and public colleges is prominent for MA,MI CA,IL,CO, MO,FL, AZ ,AK, LA and DE with stem % being higher for the public colleges compared to private colleges
Amonsgt the private colleges,PA and MA have the highest median stem % and NV has the lowest median stem % . For 6 states(MS,GA,CT,MN,AR, IN), the difference in stem % between the private and public colleges is prominent,
merged1_final%>%
group_by(state_code,type)%>%
summarise(median_stem_percent=median(stem_percent,na.rm=TRUE))%>%
ggplot()+
labs(title=str_wrap("Comparison of median stem in percent between the private and public colleges for different states in the United States",width=60),
x="Median stem (in percent)",
y="State Name")+
scale_fill_manual(values=pal)+
geom_bar(aes(x=median_stem_percent,
y=reorder_within(state_code,-median_stem_percent,type),
fill=state_code),stat="identity",show.legend=TRUE)+
facet_wrap(~type,scales="free_y")+
scale_y_reordered()+
guides(fill=guide_legend(title="State Name"))
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
# boxplot to compare the stem percent for the different types of colleges at different states
ggplot(data=merged1_final,
aes(x=stem_percent,
y=state_code,
fill=state))+
geom_boxplot()+
facet_wrap(~type)+
scale_fill_manual(values=mycolors)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title=str_wrap("Stem percent for the private vs public college across the different states",width=60),
x="Stem(in percent)",
y="State Name")+
guides(fill=guide_legend(title="State Name"))
Comparing the private colleges in the different states, the highest median stem % was recorded for CA and IN. The lowest stem % was noted for AZ, NE. The variability in the stem % was high for PA and CA.
Comparing the public colleges, highest median stem % was recorded for Colorado and lowest for MD and TX.Variability was high in stem % of MO.
The second question of interest was to study if gender and race/ethnicity has an impact on selecting private/public colleges, this is studief for the different states in USA.
ggplot(gender,
aes(x=enrollment,
y=reorder(state_code,enrollment),
fill=gender))+
geom_bar(position="dodge",stat="identity")+
facet_wrap(~type)+
labs(title=str_wrap("Comparison of % enrollment by different gender groups across private/public colleges in USA",width=60),
x="Enrollment in %",
y="State Name")+
guides(fill=guide_legend(title="College Type"))
gender%>%
group_by(state_code,type,gender)%>%
summarise(average_gender_percent=mean(enrollment,na.rm=TRUE))%>%
ggplot()+
labs(title=str_wrap("Comparison of gender in percent between the private and public colleges for different states in the United States",width=60),
x="Enrollment (in percent)",
y="State Name")+
scale_fill_manual(values=pal)+
geom_bar(aes(x=average_gender_percent,
y=reorder(state_code,-average_gender_percent),
fill=state_code),stat="identity",show.legend=TRUE)+
facet_wrap(~type+gender,scales="free_y")+
scale_y_reordered()+
guides(fill=guide_legend(title="State Name"))
## `summarise()` has grouped output by 'state_code', 'type'. You can override
## using the `.groups` argument.
The above two plots provide similar interpretation, they are just different ways to presenting the same data.
Amongst the public colleges, the highest % women enrollment was noted in NY,the least % women enrollment was noted in MT, the highest % men enrollment was seen in MT and the lowest in NY.
Out of the private colleges, the highest % women enrollment was noted in WI,the least % women enrollment was noted in NJ, the highest % men enrollment was seen in NJ and the lowest in WI.
# boxplot to compare the percent of men and women enrolled at different types of colleges in different states
ggplot(data=gender,
aes(x=enrollment,
y=state_code,
fill=type))+
geom_boxplot()+
facet_wrap(~gender+type)+
scale_fill_manual(values=mycolors)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title=str_wrap("Percent of men and women enrolled in private vs public college across the different states",width=60),
x="Enrollment by Gender(in percent)",
y="State Name")+
guides(fill=guide_legend(title="College Type"))
Out of the private colleges, % enrollment of Men was highest in NJ and lowest in WI. The range of % men enrollment is the largest for New Jersey followed by New York and smallest for New Mexico and Nevada.
Out of the private colleges, % enrollment of women was highest in WI and lowest in New Jersey. The range of % women enrollment is the largest for New Jersey followed by New York and smallest for New Mexico and Nevada.
Comparing the public colleges, % enrollment of Men was highest in MT and lowest in NY. The range of % men enrollment is the largest for SD and smallest for MT.
Out of the public colleges, % enrollment of women was highest in NY and lowest in MT. The variability in the data for both NY and MT is very small and is largest for SD.
Since there are too many race and ethnic groups, I picked those race groups that I am interested to explore further.
race.final<-race%>%
filter(race=="american_indian_alaska_native"|
race=="white"|
race=="asian"|
race=="black"|
race=="hispanic")
ggplot(race.final,
aes(x=enrollment,
y=state_code,
fill=race))+
geom_bar(position="stack",stat="identity")+
facet_wrap(~type)+
labs(title=str_wrap("Comparison of different race-ethnic groups enrollment in private and public colleges across the different states in USA",width=60),
x="% Enrollment",
y="State Name")+
guides(fill=guide_legend(title="Race"))
Amongst the private colleges,NY has the highest student enrollment, out of which almost 3/4th of the student population are whites .
% Asian population is most prominent in California. % Black student population is noticeable in states like NC,TX,TN,GA and FL. Hispanic student population more commonly seen in CA and TX.
American Indian Alaskan Native is a really small race/ethnic population group, small numbers are noticed in CA, NY and NC
Out of the public colleges,the enrollment rates are generally lower than the private colleges.The largest white student population is noticed in PA, Blacks are seen in GA. Small Asian population noticed in TX and it is interesting to see that CA has only a really tiny Asian population enrolled in the public colleges. Kansas has a small native american population.Hispanics most commonly seen in TX and then in GA.
compare<-cost_final%>%
filter(type=="Private"|type=="Public")%>%
group_by(state_code,type)%>%
summarise(n=n())
## `summarise()` has grouped output by 'state_code'. You can override using the
## `.groups` argument.
ggplot(compare,
aes(x=n,
y=state_code,
fill=type))+
geom_bar(stat="identity")+
facet_wrap(~type)+
labs(title=str_wrap("Comparison of number of private and public collegs",width=60),
x="Number of colleges",
y="State name")+
guides(fill=guide_legend(title="Race"))
In this data, the largest number of the private colleges was present in New York, followed by California and the smallest number of private colleges were located in AK,NV and NM. The largest number of public colleges was located in PA followed by TX and the least number of public colleges were in DC,WY and VI.
ggplot(data=hist_tuition,
aes(x=as.factor(year),
y=tuition_cost,
group=tuition_type))+
geom_point(aes(color=tuition_type))+
geom_line(aes(color=tuition_type))+
facet_grid(.~type)+
theme_minimal()+
theme(legend.position = "bottom")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
labs(title=str_wrap("Line plot of historical tuition data for the years 1985 to 2017",width=60),
x="Year",
y="Tuition cost in USD")+
guides(fill=guide_legend(title="Tuition type"))
From the above line plot, increasing trend in the tuition cost is observed for all the different tuition types and for the different type of colleges.
Especially for the private colleges, the increasing trend is more prominent. In the private colleges, the “4 yr current” and “all year current” rates have constantly gone up across the years whereas the “2 yr current”, “2 yr constant”,“4 yr constant” and “all year constant” tuition costs slightly went up and then dropped a bit. This trend is expected. Also, the baseline tuition fees i.e. tuition cost at 1985-86 year is highest for “4 year constant” and “all constant” , followed by 2 year constant and then by “all current”,“4 year current” and “2 year current”.
For the public colleges, the baseline tuition fees is same as observed for the private colleges. However, the increasing trend is not very dramatic as observed for the private colleges. The trend is gradual over the years and the rate at which the tuition cost increases for the different tuition types are fairly similar.
I had two research questions:one was to identify how the earning potential and quality of education are different for students attending private and public colleges across the different states in the USA. The second question was to assess if gender/ race/ethnicity has an influence on college selection. For example, would a student belonging to a particular gender/race group pick a public/private college or have preference in attending college in a particular state.
Overall,I was able to draw some inferences from the summary tables and the bar plots/boxplots.
To answer the first question of interest, I used two variables i.e. early_career_pay
and mid_career_pay
to evaluate the earning potential of a student. I used two more variablesmake_world_better_percent
and stem_percent
to assess if a attending particular type of college at a particular location influences the quality of education one would receive.
For early-career pay:
There are a few states which have higher median early career pay for public colleges compared to private colleges and vice-versa and in only 5 states (MA,AK,MT,DE,AZ), the differences seem to be more pronounced with public colleges having higher median career pay.
Also,data was not available for some of the important states like CA, MA which has a lot of public colleges and hence a true estimate of the median career pay is not available for some of the states and hence this makes it harder to conclude if there is a significant difference between the earning potential of students attending private and public colleges.
From the boxplots, we were able to assess the spread of the data and the variability in data for some of the states likes CA and ME(for private colleges) and VA,VT (for public colleges) was pretty high. Due to the variability being different for different states and presence of outliers, median seem to be a more reliable measure to assess the central tendency of the data.
Overall, with the given data, it is very hard to draw an inference that attending a particular type of college actually influences the earning potential of a student.
The largest difference in the median early career pay amongst the 5 states where the difference in median early career pay was noticeable is $10,000. I would think that just by a difference of $10,000,I would conclude that both the public and private colleges seem to have comparable median early career pay.
Also, while comparing the early career pay, It is important to account for the cost of living which includes house pricing,food/healthcare/transportation costs in each of the state before we make a comparison between the early career pay
I used websites like nerdwallet.com to understand the differences in living cost between cities. For example: an annual income of $100,000 in San Francisco equals to $67,550 in Portland,OR and equals to $81,389 in Boston,MA.
This comparison is only for a single person, it would dramatically change if there are more dependents/children and in that case, child care cost will also play a huge role in assessing the cost of living.
For mid-career pay:
My conclusions for comparing median mid career pay between private and public colleges are similar to that of the early career pay since there were some states where the student from public colleges had higher median mid career pay than the private colleges and vice-versa and only in 5 states(AK,MT,IA,DE,AZ), the differences seem to be more pronounced with public colleges having higher median career pay than private colleges. Infact, the trends between early career pay and mid career pay across the states is similar.
The largest difference in median mid career pay of `$15,000 was noted in Arizona. It is also interesting to see the public colleges having similar and better earning potential in some states.
Thus, with the given constraints of the data,I did not see a convincing difference between the earning potential of private and public colleges.
For Make world a better percent:
Public and private colleges didn’t seem to have an observable difference in make world a better % thereby implying the quality of education is comparable between the private and public colleges across the states.
It was interesting to see that students graduating from colleges/states that were having high earning potential didn’t seem to have high % for making the world better.
For Stem percent:
In some of the states, the private colleges seems to have higher stem % rating that the public colleges and the vice-versa was observed in some states. Prominent difference in stem % rating was observed in 13 states(LA,DE,AK,AZ,FL,HI,MO,CO,VA, IL,CA,MT,MA)with public colleges have higher ratings than the private colleges and and 6 states (VT,AR,MN,CT,GA,MS) with private colleges having higher stem ratings than the public colleges.
Although, this is not very appropriate to make to make inferences from the plot since some of the states had one or two colleges and making inferences from a limited data could potentially result in incorrect conclusions. For example CA, MA had only 1 public college and MT had 2 public and 2 private colleges, so we really can’t take the prominent difference as an accurate reflection of all the colleges in the respective states.
However, by assessing the differences between the private and public college stem ratings, public colleges seem to be performing well in terms of imparting stem knowledge across most states and on par with private colleges.
Gender
From the summary table and the data visualizations, it seems like the % women enrollment is slightly higher in both public and private colleges compared to % men enrollment. Public colleges in a couple states like NY and MT had higher % men enrollment than women but the rest of the states pretty much have comparable enrollment % or %women enrollment exceeding %men enrollment.
Race
Race did seem to have an influence in type of college selection which kind of aligns with literature. In some states, I was able to notice similarities in race % enrollment between private and public colleges and in some states, I couldn’t find that connection, this could be due to some states having very less public colleges in the data set and hence a reliable comparison can’t be made.
For example: from the barplots, it was observed that Hispanics enrolled in colleges in areas where they are densely populated like Texas, California, New Mexico etc. Similarly, Asians were noticed prominently in California,Texas and Whites enrollment is prominent in Central North and Mid West colleges. Blacks were more prominently seen in Texas and Tennesse.American Indian and Alaska native were present in small numbers in NC and CA,
I was expecting to see that early career pay
being influenced by the type of college one would attend as I expected that the private college students may have better exposure to career fairs, networking opportunities and the brand name would help them to get a really good entry level job whereas students from public colleges would also get new grad jobs but may not have a comparable compensation like the private college grads. But form this data,I didn’t observe this relationship and it seemed like both the private and public colleges have similar early career pay.
For the mid career pay
, my expectation aligns with the findings, I wasn’t expecting to see the a remarkable difference in the mid career pay between students from private and public colleges.The rationale being, once a student enters industry, promotions and pay rise would be based on performance and not based on the type of college one attended.
I expected to see the private colleges having better stem % and higher making the world better percent since they would have more funding, infrastructure to support stem activities and in general projects that would help student to have an impact. But I was surprised to see that some of the public colleges having higher stem % and making world better % compared to the private colleges and overall from the data, it seemed like the public and private college students had similar quality of education.
My expectation of % enrollment based on gender was actually the opposite, I was expecting to see more % men enrollment atleast in some of the states but findings from the data was opposite. This could also be because the data as such didn’t have % men enrollment. I subtracted the women enrollment from total enrollment to obtain % men enrollment and the trend observed could be due to this assumption.
My expectation of % enrollment based on race was pretty accurate. Based on literature and online sources, it seemed like certain race/ethnic groups prefer to attend colleges in certain parts of the country however the difference in % enrollment between the private and public colleges was not clear from the plots.
I would expect there shouldn’t be a significant difference in the %enrollment based on race/ethnicity between private and public colleges, economic factors might play a role here as private colleges are mostly much more expensive than public colleges.
There were some limitations in the data set due to which some of the conclusions may not be very reliable
-> some states had very limited data from public/private college, thus the comparisons made for some of the states will not be accurate
->there were lot of missing data and I had to exclude them in order to analyze, this could potentially cause bias in interpretations
->only women enrollment was provided, I calculated the men enrollment by subtracting from the total enrollment, by doing so, I didn’t account for non-binary gender groups.
->the income levels in the income data was in the categorical format. I was hoping to study the impact of economic factors in college selection, having the income in continuous format would have helped the analysis.