Define your research question below. What about the data interests you? What is a specific question you want to find out about the data?
Research Question:
For this project, I used a dataset from “Tidy Tuesday” called “Art History”. The data is interesting to me first and foremost because I love art. Second, as per “Tidy Tuesday”: “The data…assess(es) the demographic representation of artists through editions of Janson’s History of Art and Gardner’s Art Through the Ages, two of the most popular art history textbooks used in the American education system”. Hence, this dataset is rich in artists’ demographic information, which can help answer questions about racism in art and the representation of non-white artists in art textbooks.
My primary research question is:
How are American artists of different races represented in two of the most popular art history textbooks used in the American education system?
My specific question is:
Whether the representation of non-white races differs before and after the year 2000. It is known that reverse racism has been increasing since the turn of the new millennium. In 2008, the USA elected its first Black president. One might therefore expect to see more representation of other races in art books, as in the nation as a whole. I will explore whether the representation of non-white artists has increased after 2000 or not.
I will use numeric metrics to describe the space that artists of each race took up in all editions of both textbooks. The area in square millimeters represents both the text and the figure of a particular artist per single page of a book.
Given your question, what is your expectation about the data?
I am expecting that more White American artists are represented in American art books compared to Americans from other races. There is, however, an expectation that the representation of non-white artists increased after the year 2000.
Load the data below and use dplyr::glimpse() or skimr::skim() on the data. You should upload the data file into the data directory.
# Reading in the data manually from the Tidy Tuesday GitHub repository:
artists <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-17/artists.csv', na = "NA")
## Rows: 3162 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): artist_name, artist_nationality, artist_nationality_other, artist_g...
## dbl (6): edition_number, year, space_ratio_per_page_total, artist_unique_id,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Save dataset:
artists %>% write_csv(file = "art_history_data.csv")
# OR
# write_csv(artists, file = "art_history_data2023.csv")
# Exploring data:
# View(artists)
glimpse(artists)
## Rows: 3,162
## Columns: 14
## $ artist_name <chr> "Aaron Douglas", "Aaron Douglas", "Aaron Do…
## $ edition_number <dbl> 9, 10, 11, 12, 13, 14, 15, 16, 14, 15, 16, …
## $ year <dbl> 1991, 1996, 2001, 2005, 2009, 2013, 2016, 2…
## $ artist_nationality <chr> "American", "American", "American", "Americ…
## $ artist_nationality_other <chr> "American", "American", "American", "Americ…
## $ artist_gender <chr> "Male", "Male", "Male", "Male", "Male", "Ma…
## $ artist_race <chr> "Black or African American", "Black or Afri…
## $ artist_ethnicity <chr> "Not Hispanic or Latino origin", "Not Hispa…
## $ book <chr> "Gardner", "Gardner", "Gardner", "Gardner",…
## $ space_ratio_per_page_total <dbl> 0.3533658, 0.3739470, 0.3032593, 0.3770489,…
## $ artist_unique_id <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 6, 6, 6, 6…
## $ moma_count_to_year <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ whitney_count_to_year <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ artist_race_nwi <chr> "Non-White", "Non-White", "Non-White", "Non…
skim(artists)
Name | artists |
Number of rows | 3162 |
Number of columns | 14 |
_______________________ | |
Column type frequency: | |
character | 8 |
numeric | 6 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
artist_name | 0 | 1.00 | 4 | 99 | 0 | 413 | 0 |
artist_nationality | 0 | 1.00 | 3 | 18 | 0 | 52 | 0 |
artist_nationality_other | 0 | 1.00 | 5 | 8 | 0 | 6 | 0 |
artist_gender | 0 | 1.00 | 3 | 6 | 0 | 3 | 0 |
artist_race | 0 | 1.00 | 3 | 41 | 0 | 6 | 0 |
artist_ethnicity | 58 | 0.98 | 25 | 29 | 0 | 2 | 0 |
book | 0 | 1.00 | 6 | 7 | 0 | 2 | 0 |
artist_race_nwi | 0 | 1.00 | 5 | 9 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
edition_number | 0 | 1 | 8.22 | 4.40 | 1.00 | 5.00 | 8.00 | 12.00 | 16.0 | ▇▇▆▅▆ |
year | 0 | 1 | 1994.24 | 19.20 | 1926.00 | 1986.00 | 1996.00 | 2009.00 | 2020.0 | ▁▁▃▇▇ |
space_ratio_per_page_total | 0 | 1 | 0.53 | 0.39 | 0.09 | 0.31 | 0.41 | 0.59 | 3.8 | ▇▁▁▁▁ |
artist_unique_id | 0 | 1 | 201.76 | 114.18 | 1.00 | 108.00 | 189.00 | 305.75 | 413.0 | ▆▇▇▆▆ |
moma_count_to_year | 0 | 1 | 4.31 | 7.79 | 0.00 | 0.00 | 1.00 | 5.00 | 64.0 | ▇▁▁▁▁ |
whitney_count_to_year | 0 | 1 | 1.96 | 5.19 | 0.00 | 0.00 | 0.00 | 0.00 | 40.0 | ▇▁▁▁▁ |
artists %>% tabyl(artist_nationality)
## artist_nationality n percent
## American 908 0.2871600253
## Argentine 1 0.0003162555
## Armenian-American 10 0.0031625553
## Australian 7 0.0022137887
## Austrian 36 0.0113851992
## Austrian-American 5 0.0015812777
## Belgian 30 0.0094876660
## Brazilian 1 0.0003162555
## British 317 0.1002530044
## Canadian 14 0.0044275775
## Chinese 5 0.0015812777
## Columbian 2 0.0006325111
## Congolese 5 0.0015812777
## Cuban 3 0.0009487666
## Cuban-American 5 0.0015812777
## Czech 3 0.0009487666
## Danish-American 6 0.0018975332
## Danish-French 16 0.0050600886
## Dutch 50 0.0158127767
## Dutch-American 18 0.0056925996
## French 870 0.2751423150
## French Polynesian 6 0.0018975332
## German 256 0.0809614168
## German-American 11 0.0034788109
## German-French 13 0.0041113219
## Hungarian 10 0.0031625553
## Hungarian-American 4 0.0012650221
## Hungarian-French 10 0.0031625553
## Indian 13 0.0041113219
## Iranian 3 0.0009487666
## Italian 74 0.0234029096
## Italian-American 10 0.0031625553
## Japanese 56 0.0177103099
## Korean 3 0.0009487666
## Latvian 2 0.0006325111
## Mexican 52 0.0164452878
## N/A 23 0.0072738773
## New Zealander 4 0.0012650221
## Norwegian 21 0.0066413662
## Pakistani-American 3 0.0009487666
## Peruvian 2 0.0006325111
## Polynesian 6 0.0018975332
## Russian 62 0.0196078431
## Russian-French 16 0.0050600886
## Scottish 16 0.0050600886
## Spanish 94 0.0297280202
## Swedish 5 0.0015812777
## Swiss 44 0.0139152435
## Swiss-French 7 0.0022137887
## Swiss-German 22 0.0069576218
## Thai 1 0.0003162555
## Uruguayan 1 0.0003162555
artists %>% tabyl(artist_race)
## artist_race n percent
## American Indian or Alaska Native 12 0.003795066
## Asian 79 0.024984187
## Black or African American 83 0.026249209
## N/A 29 0.009171410
## Native Hawaiian or Other Pacific Islander 23 0.007273877
## White 2936 0.928526249
If there are any quirks that you have to deal with, such as NA coded as something else, or multiple tables, please make some notes here about what you need to do before you start transforming the data in the next section.
vis_dat(artists)
## Warning: `gather_()` was deprecated in tidyr 1.2.0.
## ℹ Please use `gather()` instead.
## ℹ The deprecated feature was likely used in the visdat package.
## Please report the issue at <https://github.com/ropensci/visdat/issues>.
#Must assign values coded as "N/A" as `NA`
artists <- artists %>% mutate(
artist_race = na_if(artist_race, "N/A"))%>% drop_na(artist_race)
There is some missingness in artist ethnicity, as shown by “skim” and “vis_dat” (n=58). However, it is not a problem because ethnicity is not part of the analysis.
There is also missingness in artist race, which I detected with the “View” and “tabyl” functions. These missing values do not show up in either “skim” or “vis_dat” because they are coded as the string “N/A”. I converted “N/A” to NA, i.e. missing. Since only about 0.9% of race values were missing (n=29), I removed those rows from the analysis.
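The recoding step described here can be checked on a small made-up example. The sketch below uses a hypothetical toy tibble (not the real data) to count values coded as the string "N/A", convert them with na_if(), and report the share of rows that would be dropped. The object names toy, n_coded_na, share_coded_na and toy_clean are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data: "N/A" is a string, so is.na() does not see it
toy <- tibble(
  artist_name = c("A", "B", "C", "D"),
  artist_race = c("White", "N/A", "Asian", "White")
)

# Count and share of rows coded as the string "N/A"
n_coded_na <- sum(toy$artist_race == "N/A")
share_coded_na <- mean(toy$artist_race == "N/A")

# Recode "N/A" to a real NA, then drop those rows
toy_clean <- toy %>%
  mutate(artist_race = na_if(artist_race, "N/A")) %>%
  drop_na(artist_race)

n_coded_na
share_coded_na
nrow(toy_clean)
```

Running the same two summary lines on the real artist_race column before dropping rows would give the count and percentage to report in the notes.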
Make sure your data types are correct!
glimpse(artists)
## Rows: 3,133
## Columns: 14
## $ artist_name <chr> "Aaron Douglas", "Aaron Douglas", "Aaron Do…
## $ edition_number <dbl> 9, 10, 11, 12, 13, 14, 15, 16, 14, 15, 16, …
## $ year <dbl> 1991, 1996, 2001, 2005, 2009, 2013, 2016, 2…
## $ artist_nationality <chr> "American", "American", "American", "Americ…
## $ artist_nationality_other <chr> "American", "American", "American", "Americ…
## $ artist_gender <chr> "Male", "Male", "Male", "Male", "Male", "Ma…
## $ artist_race <chr> "Black or African American", "Black or Afri…
## $ artist_ethnicity <chr> "Not Hispanic or Latino origin", "Not Hispa…
## $ book <chr> "Gardner", "Gardner", "Gardner", "Gardner",…
## $ space_ratio_per_page_total <dbl> 0.3533658, 0.3739470, 0.3032593, 0.3770489,…
## $ artist_unique_id <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 6, 6, 6, 6…
## $ moma_count_to_year <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ whitney_count_to_year <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ artist_race_nwi <chr> "Non-White", "Non-White", "Non-White", "Non…
Data types seem to be correct: string values appear as character and numerical values are doubles.
Character variables include: artist name, artist nationality, artist nationality other, artist gender, artist race, artist ethnicity, book, artist race non-white.
Numerical variables include: edition number, year, space ratio per page total, artist unique id, moma count to year, whitney count to year
If the data needs to be transformed in any way (values recoded, pivoted, etc), do it here. Examples include transforming a continuous variable into a categorical using case_when(), etc.
## Creating a new dataset called "artists_Amer2" in which I am:
# Selecting only American artists,
# Converting area in book from cm2 to mm2 (x100), and
# Categorizing year into "before_2000" and "2020_andafter" (the latter label covers 2000 through 2020).
artists_Amer2 <- artists %>%
  filter(artist_nationality == "American") %>%
  mutate(bookrep_mm = space_ratio_per_page_total * 100) %>%
  mutate(reverse_racism = case_when(
    year >= 1920 & year < 2000 ~ "before_2000",
    year >= 2000 & year <= 2020 ~ "2020_andafter"
  ))
class(artists_Amer2$reverse_racism)
## [1] "character"
# reverse_racism is character. Change reverse_racism to factor
artists_Amer2<-artists_Amer2%>% mutate(reverse_racism = factor(reverse_racism))
class(artists_Amer2$reverse_racism)
## [1] "factor"
levels(artists_Amer2$reverse_racism)
## [1] "2020_andafter" "before_2000"
#need to re-order levels of reverse_racism: bring "before_2000" first
artists_Amer2 <- artists_Amer2%>% mutate(reverse_racism = reverse_racism %>%
fct_relevel("before_2000"))
levels(artists_Amer2$reverse_racism)
## [1] "before_2000" "2020_andafter"
## Exploring variables of interest:
tabyl(artists$year) #the frequency of total artist representation per year (distribution)
## artists$year n percent
## 1926 19 0.006064475
## 1936 47 0.015001596
## 1948 60 0.019150974
## 1959 86 0.027449729
## 1963 62 0.019789339
## 1969 76 0.024257900
## 1970 68 0.021704437
## 1975 84 0.026811363
## 1977 90 0.028726460
## 1980 114 0.036386850
## 1986 253 0.080753272
## 1991 311 0.099265879
## 1995 185 0.059048835
## 1996 156 0.049792531
## 2001 353 0.112671561
## 2005 162 0.051707628
## 2007 163 0.052026811
## 2009 160 0.051069263
## 2011 153 0.048834982
## 2013 173 0.055218640
## 2016 179 0.057133738
## 2020 179 0.057133738
tabyl(artists_Amer2$reverse_racism) # the sample seems to be split fairly evenly between the two periods
## artists_Amer2$reverse_racism n percent
## before_2000 417 0.4607735
## 2020_andafter 488 0.5392265
Since I am interested only in American (American born) artists, I filtered the data to rows where the nationality is exactly “American”. I also converted area per page from cm2 to mm2 by multiplying by 100. Finally, I divided the timeline (years) at the year 2000, which (softly) demarcates the start of a reverse racism period.
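The split at the year 2000 and the ordering of the factor levels can also be done in a single step. The following is a minimal sketch on a hypothetical toy tibble; the column name period and the label "2000_and_after" are illustrative assumptions and are not the names used in the analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with years on both sides of the cut point
toy <- tibble(year = c(1926, 1999, 2000, 2020))

toy <- toy %>%
  mutate(
    # Label each row by period; the year 2000 itself falls in the later period
    period = case_when(
      year < 2000 ~ "before_2000",
      year >= 2000 ~ "2000_and_after"
    ),
    # Passing levels explicitly sets the order, so no later releveling is needed
    period = factor(period, levels = c("before_2000", "2000_and_after"))
  )

levels(toy$period)
table(toy$period)
```

Setting the levels inside factor() avoids the default alphabetical ordering, which is what placed the later period first in the original factor.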
Bonus points (5 points) for datasets that require merging of tables, but only if you reason through whether you should use left_join, inner_join, or right_join on these tables. No credit will be provided if you don’t.
Show your transformed table here. Use tools such as glimpse(), skim() or head() to illustrate your point.
artists_Amer2 %>% glimpse
## Rows: 905
## Columns: 16
## $ artist_name <chr> "Aaron Douglas", "Aaron Douglas", "Aaron Do…
## $ edition_number <dbl> 9, 10, 11, 12, 13, 14, 15, 16, 2, 3, 4, 7, …
## $ year <dbl> 1991, 1996, 2001, 2005, 2009, 2013, 2016, 2…
## $ artist_nationality <chr> "American", "American", "American", "Americ…
## $ artist_nationality_other <chr> "American", "American", "American", "Americ…
## $ artist_gender <chr> "Male", "Male", "Male", "Male", "Male", "Ma…
## $ artist_race <chr> "Black or African American", "Black or Afri…
## $ artist_ethnicity <chr> "Not Hispanic or Latino origin", "Not Hispa…
## $ book <chr> "Gardner", "Gardner", "Gardner", "Gardner",…
## $ space_ratio_per_page_total <dbl> 0.3533658, 0.3739470, 0.3032593, 0.3770489,…
## $ artist_unique_id <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8…
## $ moma_count_to_year <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1…
## $ whitney_count_to_year <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ artist_race_nwi <chr> "Non-White", "Non-White", "Non-White", "Non…
## $ bookrep_mm <dbl> 35.33658, 37.39470, 30.32593, 37.70489, 39.…
## $ reverse_racism <fct> before_2000, before_2000, 2020_andafter, 20…
Artist nationality is only “American” now. A new column/variable called “bookrep_mm” has been added. A new column/variable called “reverse_racism” has been added.
Are the values what you expected for the variables? Why or Why not?
Yes. “bookrep_mm” which represents the area in squared millimeter (mm2) instead of cm2 is 100x the value of “space_ratio_per_page_total”. “reverse_racism” is a factor of 2 levels created from the variable “year”.
Use group_by() and summarize() to make a summary of the data here. The summary should be relevant to your research question.
## Creating another dataset (artists_Amer2_Race) where American artists are grouped by race:
artists_Amer2_Race<-artists_Amer2%>% group_by(artist_race)%>% summarize(mean_bookrep_mm = mean(bookrep_mm, na.rm = TRUE))
#Arrange race descending,starting with races having the highest representation in art books downwards
artists_Amer2_Race%>% arrange(desc(mean_bookrep_mm)) %>% gt::gt()
artist_race | mean_bookrep_mm |
---|---|
American Indian or Alaska Native | 50.61391 |
White | 41.02466 |
Black or African American | 39.75356 |
Native Hawaiian or Other Pacific Islander | 35.88631 |
Asian | 25.91803 |
What are your findings about the summary? Are they what you expected?
The summary shows that American Indian or Alaska Native (AIANs) art(ists) have the highest representation in art books, followed by White, Black, Native Hawaiian or Other Pacific Islander, and finally Asian Americans. This is not what I expected. I was expecting that white art(ists) would rank first in representation in the 2 art textbooks.
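A mean area per page does not show how many entries or distinct artists sit behind each group, so it may help to report counts next to the mean. The sketch below does this on a hypothetical toy tibble; the group labels and values are made up, and only the column names artist_unique_id, artist_race and bookrep_mm follow the analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: one row per artist per edition
toy <- tibble(
  artist_unique_id = c(1, 1, 2, 3, 3, 3),
  artist_race = c("Group A", "Group A", "Group B", "Group B", "Group B", "Group B"),
  bookrep_mm = c(50, 60, 40, 30, 35, 45)
)

toy_summary <- toy %>%
  group_by(artist_race) %>%
  summarize(
    n_rows = n(),                               # artist-edition entries
    n_artists = n_distinct(artist_unique_id),   # distinct artists
    mean_bookrep_mm = mean(bookrep_mm),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_bookrep_mm))

toy_summary
```

In this toy example the group with the higher mean has fewer entries, which is the kind of pattern worth checking before reading a high mean as high overall representation.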
## Creating another dataset (artists_Amer3_Race) where American artists are grouped by race and reverse_racism:
artists_Amer3_Race<-artists_Amer2%>% group_by(artist_race,reverse_racism)%>% summarize(mean_bookrep_mm = mean(bookrep_mm, na.rm = TRUE))
## `summarise()` has grouped output by 'artist_race'. You can override using the
## `.groups` argument.
#Arrange race descending,starting with races having the highest representation in art books downwards, split by racism era
artists_Amer3_Race%>% arrange(desc(mean_bookrep_mm)) %>% gt::gt()
artist_race | reverse_racism | mean_bookrep_mm |
---|---|---|
American Indian or Alaska Native | 2020_andafter | 51.70879 |
American Indian or Alaska Native | before_2000 | 48.69787 |
White | before_2000 | 42.03978 |
White | 2020_andafter | 40.07465 |
Black or African American | 2020_andafter | 40.01563 |
Black or African American | before_2000 | 39.02251 |
Native Hawaiian or Other Pacific Islander | 2020_andafter | 35.88631 |
Asian | 2020_andafter | 25.91803 |
What are your findings about the summary? Are they what you expected?
More information is revealed when the data are stratified by reverse racism period (before and after the year 2000).
The representation of American Indian or Alaska Native artists increased after the year 2000. This race still has the highest representation in art books.
American artists of Asian race do not appear in the data before 2000. They are represented after the year 2000, but their mean area per page (25.9) is still the lowest of all groups.
Black and African American artists also have more representation in art books after the year 2000.
Native Hawaiian or Other Pacific Islander artists are represented in art books only after 2000.
The representation of White artists in art books has decreased since 2000.
These findings are what I expected. With reverse racism happening since the turn of the millennium, more racial rights have been acquired and racism is becoming less pronounced. I am glad to see that happen in art books too!
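The size of each change can be made explicit by putting the two periods side by side. The sketch below re-enters the means from the grouped summary table by hand (an assumption made only to keep the example self-contained; in the analysis one would start from artists_Amer3_Race) and computes the difference between periods. The names period_means, period_change and change are illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Means copied by hand from the grouped summary table
period_means <- tribble(
  ~artist_race, ~reverse_racism, ~mean_bookrep_mm,
  "American Indian or Alaska Native", "before_2000", 48.69787,
  "American Indian or Alaska Native", "2020_andafter", 51.70879,
  "White", "before_2000", 42.03978,
  "White", "2020_andafter", 40.07465,
  "Black or African American", "before_2000", 39.02251,
  "Black or African American", "2020_andafter", 40.01563,
  "Native Hawaiian or Other Pacific Islander", "2020_andafter", 35.88631,
  "Asian", "2020_andafter", 25.91803
)

# One row per race, one column per period, plus the difference between them
period_change <- period_means %>%
  pivot_wider(names_from = reverse_racism, values_from = mean_bookrep_mm) %>%
  mutate(change = `2020_andafter` - before_2000) %>%
  arrange(desc(change))

period_change
```

Groups with no entries before 2000 get NA for the change, which keeps them visibly separate from groups whose mean rose or fell.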
Make at least two plots that help you answer your question on the transformed or summarized data. Use scales and/or labels to make each plot informative.
#Figure 1: Scatter plot:Mean area per page, by artists' race
ggplot(artists_Amer3_Race) + aes(x = artist_race, y = mean_bookrep_mm, color=reverse_racism) + geom_point(alpha = 0.5) +
labs(title = "American Artist: Mean Area per Page by Artist's Race",
y = "Mean Area per Page (mm2)",
x = "Artist Race", color = "Reverse Racism") + theme(axis.text.x = element_text(angle = 90))
ggsave("Figure1.Midterm.jpg")
## Saving 7 x 6 in image
# Figure 2: Boxplot: Area in book by race, faceted by reverse racism
# The fill aesthetic maps to artist_race, so the legend is titled "Artist Race"
ggplot(artists_Amer2) + aes(x = artist_race, y = bookrep_mm, fill = artist_race) + geom_boxplot(alpha = 0.2) + facet_wrap(vars(reverse_racism)) + labs(title = "American Artist: Area in Book by Race",
y = "Area per Page (mm2)", x = "Artist Race",
fill = "Artist Race") + theme(axis.text.x = element_blank())
ggsave("Figure2.Midterm.jpg")
## Saving 7 x 5 in image
Summarize your research question and findings below.
This analysis was conducted to explore the amount of representation of non-white races in Janson’s History of Art and Gardner’s Art Through the Ages, two of the most popular art history textbooks used in the American education system. The area in millimeters squared of both the text and the figure of a particular artist, divided by the area in millimeters squared of a single page of the respective edition, is used to measure representation of the art(ist) and their race. I stratified the findings by 2 time periods (prior to 2000, and from 2000 onward) to see if racial representation changed over the years, especially with reverse racism pronounced in the 2000s.
The general representation in art books was in favor of the American Indian or Alaska Native race, then White Americans, Black Americans, Native Hawaiian or Other Pacific Islander and, with the least representation, American artists of Asian race.
When art(ist) representation was further stratified by timeline, I found out that 2 races were not even represented before 2000 (Native Hawaiian or Other Pacific Islander and Asian). There was an increased representation of American Indian or Alaska Native and Black art(ists) after 2000 and a decrease in representation of white art(ists).
Are your findings what you expected? Why or Why not?
The first part of the results (AIANs represented more than white artists) is not exactly what I expected. I must also say that I have used data only on artists whose nationality is recorded as American. American artists of another origin (e.g. German-American, French-American, etc.) were not included in this analysis. Had they been included, we might have seen other results (but that’s another research project). However, the second part of my findings is exciting. It comes as a nice surprise to see more racial diversity after the year 2000 in the two most important art textbooks used in the American education system.
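For the follow-up mentioned here, one possible way to include hyphenated nationalities is to match any label that contains "American" instead of requiring an exact match. This is a minimal sketch on a hypothetical toy tibble, not part of the analysis; the object names only_american and any_american are illustrative.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical toy data with exact and hyphenated nationality labels
toy <- tibble(
  artist_name = c("A", "B", "C", "D"),
  artist_nationality = c("American", "German-American", "French", "Cuban-American")
)

# Exact match, as in the analysis
only_american <- toy %>%
  filter(artist_nationality == "American")

# Broader match: any nationality label containing "American"
any_american <- toy %>%
  filter(str_detect(artist_nationality, "American"))

nrow(only_american)
nrow(any_american)
```

A pattern match like this would need a check against the full list of nationality labels to make sure it does not pick up unintended values.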