Please submit your .Rmd and .html files in Sakai. If you are working together, both people should submit the files.
This data set was downloaded from kaggle.com [https://www.kaggle.com/datasets/rtatman/animal-bites]. It records over 9,000 animal bites that occurred near Louisville, Kentucky, from 1985 through 2017, and consists of 15 variables. The following are the names and descriptions of the variables from the original data set that I will use in this project:
bite_date: The date the bite occurred
SpeciesIDDesc: The species of animal that did the biting
GenderIDDesc: Gender (of the animal)
Define your research question below. What about the data interests you? What is a specific question you want to find out about the data?
Getting bitten by an animal can lead to serious injury and exposure to rabies. This data set of animal bites can be used to learn which animals bite people most often and how many bites occur each year. This information can encourage caution around animals that are known to bite often. For this project, I aim to address the following two research questions:
How does the number of bites differ by the gender and species of the animal?
Is there a trend in the number of bites over time?
Given your question, what is your expectation about the data?
There are several species and many observations recorded in this data set. In terms of gender, it is difficult to predict whether there will be a difference in the number of bites. For species, since many people own dogs and cats, I expect to see a larger share of bites recorded for these animals. For the second question, because of the long period (1985-2021) over which the data were collected, I expect the number of bites to vary over time.
Load the data below and use dplyr::glimpse() or skimr::skim() on the data. You should upload the data file into the data directory.
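# Load the packages used in this analysis
library(tidyverse)  # dplyr, tidyr, and ggplot2: glimpse(), separate(), ggplot(), ...
library(readxl)     # read_excel()
library(janitor)    # clean_names(), tabyl()
library(skimr)      # skim()
# Import the data
Health_AnimalBites <- read_excel("data/Health_AnimalBites.xlsx")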
# Glimpse and skim the data to explore the data set composition and distribution
glimpse(Health_AnimalBites)
## Rows: 9,003
## Columns: 15
## $ bite_date <chr> "1985-05-05", "1986-02-12", "1987-05-07", "1988-10-0…
## $ SpeciesIDDesc <chr> "DOG", "DOG", "DOG", "DOG", "DOG", "DOG", "DOG", "DO…
## $ BreedIDDesc <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ GenderIDDesc <chr> "FEMALE", "UNKNOWN", "UNKNOWN", "MALE", "FEMALE", "U…
## $ color <chr> "LIG. BROWN", "BRO & BLA", NA, "BLA & BRO", "BLK-WHT…
## $ vaccination_yrs <dbl> 1, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA…
## $ vaccination_date <chr> "1985-06-20", NA, NA, NA, NA, NA, "1990-02-13", NA, …
## $ victim_zip <chr> "40229", "40218", "40219", NA, NA, "40211", "40203",…
## $ AdvIssuedYNDesc <chr> "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO"…
## $ WhereBittenIDDesc <chr> "BODY", "BODY", "BODY", "BODY", "BODY", "BODY", "BOD…
## $ quarantine_date <chr> "1985-05-05", "1986-02-12", "1990-05-07", "1990-10-0…
## $ DispositionIDDesc <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN…
## $ head_sent_date <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ release_date <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ ResultsIDDesc <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN…
# there are 9,003 rows and 15 columns
skim(Health_AnimalBites)
Name | Health_AnimalBites |
---|---|
Number of rows | 9003 |
Number of columns | 15 |
Column type frequency: | |
character | 12 |
logical | 2 |
numeric | 1 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
bite_date | 317 | 0.96 | 10 | 10 | 0 | 2702 | 0 |
SpeciesIDDesc | 118 | 0.99 | 3 | 7 | 0 | 9 | 0 |
GenderIDDesc | 2526 | 0.72 | 4 | 7 | 0 | 3 | 0 |
color | 2576 | 0.71 | 2 | 10 | 0 | 713 | 0 |
vaccination_date | 4888 | 0.46 | 10 | 10 | 0 | 2107 | 0 |
victim_zip | 1838 | 0.80 | 4 | 10 | 0 | 233 | 0 |
AdvIssuedYNDesc | 6438 | 0.28 | 2 | 3 | 0 | 2 | 0 |
WhereBittenIDDesc | 616 | 0.93 | 4 | 7 | 0 | 3 | 0 |
quarantine_date | 6983 | 0.22 | 10 | 10 | 0 | 602 | 0 |
DispositionIDDesc | 7468 | 0.17 | 4 | 8 | 0 | 4 | 0 |
head_sent_date | 8608 | 0.04 | 10 | 10 | 0 | 325 | 0 |
ResultsIDDesc | 7460 | 0.17 | 7 | 8 | 0 | 3 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
BreedIDDesc | 9003 | 0 | NaN | : |
release_date | 9003 | 0 | NaN | : |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
vaccination_yrs | 5265 | 0.42 | 1.45 | 0.85 | 1 | 1 | 1 | 1 | 11 | ▇▁▁▁▁ |
# there are 12 character variables, 2 logical variables, and 1 numeric variable
If there are any quirks that you have to deal with, such as NA coded as something else or data split across multiple tables, please make some notes here about what you need to do before you start transforming the data in the next section.
Make sure your data types are correct!
Missing values are stored as blank cells, which read_excel() already reads as NA, so no additional options are needed when importing the data. The variable names should be cleaned so that they follow a consistent snake_case style. In addition, bite_date should be split into year, month, and day variables, which should then be converted to numeric.
We can change these in two steps before transforming the variables:
# Clean the names of all columns in the data set
bites_cleaned <- clean_names(Health_AnimalBites) %>%
glimpse()
## Rows: 9,003
## Columns: 15
## $ bite_date <chr> "1985-05-05", "1986-02-12", "1987-05-07", "1988-1…
## $ species_id_desc <chr> "DOG", "DOG", "DOG", "DOG", "DOG", "DOG", "DOG", …
## $ breed_id_desc <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ gender_id_desc <chr> "FEMALE", "UNKNOWN", "UNKNOWN", "MALE", "FEMALE",…
## $ color <chr> "LIG. BROWN", "BRO & BLA", NA, "BLA & BRO", "BLK-…
## $ vaccination_yrs <dbl> 1, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA,…
## $ vaccination_date <chr> "1985-06-20", NA, NA, NA, NA, NA, "1990-02-13", N…
## $ victim_zip <chr> "40229", "40218", "40219", NA, NA, "40211", "4020…
## $ adv_issued_yn_desc <chr> "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "…
## $ where_bitten_id_desc <chr> "BODY", "BODY", "BODY", "BODY", "BODY", "BODY", "…
## $ quarantine_date <chr> "1985-05-05", "1986-02-12", "1990-05-07", "1990-1…
## $ disposition_id_desc <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKN…
## $ head_sent_date <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ release_date <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ results_id_desc <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKN…
# Split bite_date, the date variable needed for the analysis, into year, month, and day
# (the new columns are character at first; they are converted to numeric below)
bites_cleaned <- bites_cleaned %>%
separate(col = bite_date,
into = c("bite_year", "bite_month", "bite_day"),
sep = "-",
remove = FALSE)
# View a few observations to confirm that the variable was properly split
bites_cleaned %>%
select(bite_date, bite_year, bite_month, bite_day) %>% # just show these columns
slice(1:20) # show first 20 rows
## # A tibble: 20 × 4
## bite_date bite_year bite_month bite_day
## <chr> <chr> <chr> <chr>
## 1 1985-05-05 1985 05 05
## 2 1986-02-12 1986 02 12
## 3 1987-05-07 1987 05 07
## 4 1988-10-02 1988 10 02
## 5 1989-08-29 1989 08 29
## 6 1989-11-24 1989 11 24
## 7 1990-02-08 1990 02 08
## 8 1990-02-22 1990 02 22
## 9 1990-08-02 1990 08 02
## 10 1990-08-19 1990 08 19
## 11 1990-08-31 1990 08 31
## 12 1990-10-20 1990 10 20
## 13 1991-02-09 1991 02 09
## 14 1991-07-05 1991 07 05
## 15 1991-09-14 1991 09 14
## 16 1991-10-09 1991 10 09
## 17 1991-11-07 1991 11 07
## 18 1992-02-08 1992 02 08
## 19 1992-02-27 1992 02 27
## 20 1992-03-06 1992 03 06
# Check the class of the variable
class(bites_cleaned$bite_year)
## [1] "character"
class(bites_cleaned$bite_month)
## [1] "character"
class(bites_cleaned$bite_day)
## [1] "character"
# Since all three are character, convert them to numeric
bites_cleaned$bite_year <- as.numeric(bites_cleaned$bite_year)
bites_cleaned$bite_month <- as.numeric(bites_cleaned$bite_month)
bites_cleaned$bite_day <- as.numeric(bites_cleaned$bite_day)
# Confirm that it was successfully changed to numeric
class(bites_cleaned$bite_year)
## [1] "numeric"
class(bites_cleaned$bite_month)
## [1] "numeric"
class(bites_cleaned$bite_day)
## [1] "numeric"
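One possible alternative is to parse bite_date as a Date first and then extract the numeric parts. This base-R sketch assumes that every non-missing bite_date follows the YYYY-MM-DD pattern seen in the glimpse() output. Unlike splitting on "-", as.Date() returns NA for text that is not a valid calendar date, so impossible entries show up as missing values. The values below are made up for illustration and are not taken from the data set.

```r
# Made-up example values standing in for the bite_date column
bite_date <- c("1985-05-05", "2010-01-02", NA, "5013-07-15", "2010-13-45")

# Parse the text as dates; missing or invalid dates (month 13) become NA
parsed <- as.Date(bite_date, format = "%Y-%m-%d")

# Extract the year, month, and day as numbers
bite_year  <- as.numeric(format(parsed, "%Y"))
bite_month <- as.numeric(format(parsed, "%m"))
bite_day   <- as.numeric(format(parsed, "%d"))

print(data.frame(bite_date, parsed, bite_year, bite_month, bite_day))
```

In the tidyverse pipeline, lubridate's ymd(), year(), month(), and day() would do the same work inside mutate().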
If the data needs to be transformed in any way (values recoded, pivoted, etc.), do it here. Examples include transforming a continuous variable into a categorical variable using case_when(), etc.
Because the data were collected over a long period, a decade variable will make the analysis easier. The decade variable is based on bite_year, so the first step is to determine the range of bite_year.
summarize()
# determine the range of bite_year
bites_cleaned %>%
arrange(bite_year) %>% # use this to confirm that the output is correct by viewing the table
summarize(min(bite_year, na.rm = TRUE), max(bite_year, na.rm = TRUE))
## # A tibble: 1 × 2
## `min(bite_year, na.rm = TRUE)` `max(bite_year, na.rm = TRUE)`
## <dbl> <dbl>
## 1 1952 5013
# the range identified the max as "5013", which is beyond present day, so we filter to identify possible typos or misreading of the date
# load lubridate, which provides ymd() for parsing dates
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
# mutate and filter
bites_cleaned %>% mutate(bite_date = ymd(bite_date)) %>% filter(bite_date > ymd("2023-01-01"))
## # A tibble: 5 × 18
## bite_date bite_year bite_month bite_day speci…¹ breed…² gende…³ color vacci…⁴
## <date> <dbl> <dbl> <dbl> <chr> <lgl> <chr> <chr> <dbl>
## 1 2101-02-18 2101 2 18 CAT NA FEMALE BLACK NA
## 2 5013-07-15 5013 7 15 DOG NA FEMALE WHITE 1
## 3 2201-01-21 2201 1 21 CAT NA MALE GRAY NA
## 4 2201-02-21 2201 2 21 DOG NA MALE TAN … 1
## 5 2201-05-01 2201 5 1 DOG NA MALE BROWN 1
## # … with 9 more variables: vaccination_date <chr>, victim_zip <chr>,
## # adv_issued_yn_desc <chr>, where_bitten_id_desc <chr>,
## # quarantine_date <chr>, disposition_id_desc <chr>, head_sent_date <chr>,
## # release_date <lgl>, results_id_desc <chr>, and abbreviated variable names
## # ¹species_id_desc, ²breed_id_desc, ³gender_id_desc, ⁴vaccination_yrs
# keep only plausible years (before 2023); compare with the number 2023, not the string '2023'
bites_cleaned <- bites_cleaned %>%
filter(bite_year < 2023)
# filter() also drops the 317 rows with a missing bite_year; remaining observations = 8681
# confirm that the rows were removed (check bite_date, the variable that was filtered)
bites_cleaned %>%
filter(ymd(bite_date) > ymd("2023-01-01"))
## # A tibble: 0 × 18
## # … with 18 variables: bite_date <chr>, bite_year <dbl>, bite_month <dbl>,
## # bite_day <dbl>, species_id_desc <chr>, breed_id_desc <lgl>,
## # gender_id_desc <chr>, color <chr>, vaccination_yrs <dbl>,
## # vaccination_date <chr>, victim_zip <chr>, adv_issued_yn_desc <chr>,
## # where_bitten_id_desc <chr>, quarantine_date <chr>,
## # disposition_id_desc <chr>, head_sent_date <chr>, release_date <lgl>,
## # results_id_desc <chr>
bites_cleaned %>%
summarize(min(bite_year, na.rm = TRUE), max(bite_year, na.rm = TRUE))
## # A tibble: 1 × 2
## `min(bite_year, na.rm = TRUE)` `max(bite_year, na.rm = TRUE)`
## <dbl> <dbl>
## 1 1952 2021
The data for bite_date ranges from 1952 through 2021.
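The year filter has two details that are easy to miss. This base-R sketch illustrates both with made-up years. First, bite_year is numeric, so it should be compared with the number 2023. Comparing it with the string '2023' makes R convert the years to text and compare them alphabetically, which only gives the right answer when every year has four digits. Second, dplyr's filter() keeps only rows where the condition is TRUE, so rows with a missing bite_year are dropped as well. The which() call below reproduces that behavior.

```r
# Made-up years: a valid year, a typo (5013), a three-digit entry, and a missing value
bite_year <- c(2010, 5013, 999, NA)

# Number compared with text: R converts 999 to "999", which sorts after "2023"
text_comparison <- bite_year < "2023"

# Number compared with number: the intended result
numeric_comparison <- bite_year < 2023

# Keep only positions where the condition is TRUE (NA is dropped, as in filter())
kept <- bite_year[which(numeric_comparison)]

print(text_comparison)
print(numeric_comparison)
print(kept)
```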
mutate()
# using case_when() within mutate(), we can create a categorical variable for each decade within the data set using the bite_year variable
bites_cleaned <- bites_cleaned %>%
mutate(
decade = case_when(
bite_year <= 1959 ~ "50's",
bite_year >= 1960 & bite_year <= 1969 ~ "60's",
bite_year >= 1970 & bite_year <= 1979 ~ "70's",
bite_year >= 1980 & bite_year <= 1989 ~ "80's",
bite_year >= 1990 & bite_year <= 1999 ~ "90's",
bite_year >= 2000 & bite_year <= 2009 ~ "2000's",
bite_year >= 2010 & bite_year <= 2019 ~ "2010's",
bite_year >= 2020 ~ "2020's")
)
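# temporarily make decade a factor with chronological levels so tabyl() lists the decades in order (bites_cleaned itself is not changed)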
bites_cleaned %>%
mutate(decade =
factor(decade,
levels = c("50's", "60's", "70's", "80's", "90's", "2000's", "2010's", "2020's")
)
) %>%
# view to confirm order of the categories is correct
tabyl(decade)
## decade n percent
## 50's 2 0.0002303882
## 60's 0 0.0000000000
## 70's 0 0.0000000000
## 80's 6 0.0006911646
## 90's 36 0.0041469877
## 2000's 17 0.0019582997
## 2010's 8618 0.9927427716
## 2020's 2 0.0002303882
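The decade labels can also be computed arithmetically instead of writing one case_when() condition per decade. This avoids boundary slips at years such as 1959. The base-R sketch below uses made-up years and labels such as "1950s", which are chosen for illustration and differ from the labels used above. Setting the factor levels to every decade in the observed range keeps empty decades in the count table, as tabyl() does for the 60's and 70's.

```r
# Made-up years, including the boundary years 1959 and 2019
bite_year <- c(1952, 1959, 1985, 2010, 2019, 2021)

# Integer division by 10, times 10, gives the first year of each decade
decade_start <- (bite_year %/% 10) * 10

# Every decade between the earliest and latest year, in chronological order
all_decades <- seq(min(decade_start), max(decade_start), by = 10)

# Factor with one level per decade, including decades with no bites
decade <- factor(paste0(decade_start, "s"), levels = paste0(all_decades, "s"))

print(table(decade))
```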
Since the 2010s account for almost all of the recorded bites (8,618 of 8,681, about 99%), I will use only this decade for the analysis.
I will create a data set for only the 2010s.
# Use filter() to subset the bites_cleaned data to the object bites_2010
bites_2010 <- bites_cleaned %>%
filter(decade == "2010's")
# use group_by() and mutate() to create the count variable `num_bites`: the number of bites recorded in each row's year
bites_2010 <- bites_2010 %>%
group_by(bite_year) %>%
mutate(num_bites = n())
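The difference between adding a count with group_by() + mutate() and producing one with group_by() + summarize() can be sketched in base R with made-up years. ave() attaches each year's count to every record of that year, which is why num_bites repeats the same value across all rows of a year. table() returns a single count per year instead.

```r
# Made-up bite years for six records
bite_year <- c(2010, 2010, 2010, 2011, 2011, 2018)

# Like group_by(bite_year) %>% mutate(num_bites = n()): one value per record
num_bites <- ave(rep(1L, length(bite_year)), bite_year, FUN = length)

# Like group_by(bite_year) %>% summarize(num_bites = n()): one value per year
per_year <- table(bite_year)

print(data.frame(bite_year, num_bites))
print(per_year)
```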
Bonus points (5 points) for datasets that require merging of tables, but only if you reason through whether you should use left_join, inner_join, or right_join on these tables. No credit will be provided if you don’t.
There were no tables to merge for this data set.
Show your transformed table here. Use tools such as glimpse(), skim(), or head() to illustrate your point.
# use glimpse() to check whether the data is ready for analysis
bites_2010 %>%
glimpse() %>%
select(bite_date, bite_year, bite_month, bite_day, species_id_desc, gender_id_desc, decade, num_bites) # view only the variables of interest
## Rows: 8,618
## Columns: 20
## Groups: bite_year [9]
## $ bite_date <chr> "2010-01-01", "2010-01-02", "2010-01-02", "2010-0…
## $ bite_year <dbl> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2…
## $ bite_month <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ bite_day <dbl> 1, 2, 2, 2, 2, 2, 3, 4, 4, 5, 5, 6, 7, 7, 7, 8, 8…
## $ species_id_desc <chr> "DOG", "DOG", "DOG", "CAT", "DOG", "DOG", "CAT", …
## $ breed_id_desc <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ gender_id_desc <chr> "FEMALE", "MALE", "UNKNOWN", "FEMALE", "UNKNOWN",…
## $ color <chr> "WHT", "BLK-BRN", NA, NA, "BLK", "BRN-WHT", "BLK-…
## $ vaccination_yrs <dbl> 1, 3, NA, NA, NA, 1, NA, 1, 1, 1, 1, 1, 3, 3, 1, …
## $ vaccination_date <chr> "2009-10-22", "2008-02-07", NA, NA, NA, "2010-01-…
## $ victim_zip <chr> "40228", "40291", "40219", "40291", "40216", "400…
## $ adv_issued_yn_desc <chr> "NO", "NO", "YES", "NO", "NO", "NO", "YES", "NO",…
## $ where_bitten_id_desc <chr> "BODY", "HEAD", "BODY", "BODY", "BODY", "HEAD", "…
## $ quarantine_date <chr> "2010-01-04", "2010-01-04", "2010-01-04", "2010-0…
## $ disposition_id_desc <chr> "RELEASED", "RELEASED", "UNKNOWN", "RELEASED", "U…
## $ head_sent_date <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ release_date <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ results_id_desc <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKN…
## $ decade <chr> "2010's", "2010's", "2010's", "2010's", "2010's",…
## $ num_bites <int> 1131, 1131, 1131, 1131, 1131, 1131, 1131, 1131, 1…
## # A tibble: 8,618 × 8
## # Groups: bite_year [9]
## bite_date bite_year bite_month bite_day species_id_…¹ gende…² decade num_b…³
## <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <int>
## 1 2010-01-01 2010 1 1 DOG FEMALE 2010's 1131
## 2 2010-01-02 2010 1 2 DOG MALE 2010's 1131
## 3 2010-01-02 2010 1 2 DOG UNKNOWN 2010's 1131
## 4 2010-01-02 2010 1 2 CAT FEMALE 2010's 1131
## 5 2010-01-02 2010 1 2 DOG UNKNOWN 2010's 1131
## 6 2010-01-02 2010 1 2 DOG FEMALE 2010's 1131
## 7 2010-01-03 2010 1 3 CAT UNKNOWN 2010's 1131
## 8 2010-01-04 2010 1 4 DOG FEMALE 2010's 1131
## 9 2010-01-04 2010 1 4 DOG MALE 2010's 1131
## 10 2010-01-05 2010 1 5 DOG MALE 2010's 1131
## # … with 8,608 more rows, and abbreviated variable names ¹species_id_desc,
## # ²gender_id_desc, ³num_bites
# View a table of the number of bites by year to see how the observations are distributed
bites_2010 %>%
tabyl(bite_year, num_bites)
## bite_year 1 1051 1131 1145 1148 1176 1180 801 985
## 2010 0 0 1131 0 0 0 0 0 0
## 2011 0 0 0 0 1148 0 0 0 0
## 2012 0 0 0 0 0 0 1180 0 0
## 2013 0 0 0 1145 0 0 0 0 0
## 2014 0 0 0 0 0 1176 0 0 0
## 2015 0 0 0 0 0 0 0 0 985
## 2016 0 1051 0 0 0 0 0 0 0
## 2017 0 0 0 0 0 0 0 801 0
## 2018 1 0 0 0 0 0 0 0 0
The categories of decade correspond to the values of bite_year. The cross-tabulation of bite_year against num_bites shows that every year maps to exactly one count, and that count equals the number of rows for that year. Thus, the data are ready for analysis.
Are the values what you expected for the variables? Why or why not?
Use group_by() and summarize() to make a summary of the data here. The summary should be relevant to your research question.
To answer the first question, we can apply group_by() and summarize() to the gender and species variables to count the number of bites in each category.
# create a factor version of gender_id_desc called "gender" with the levels FEMALE and MALE
# (values not listed as levels, such as "UNKNOWN", become NA)
bites_2010 <- bites_2010 %>%
mutate(gender = factor(gender_id_desc, levels = c("FEMALE", "MALE")))
# use group_by() and summarize() to answer question #1 by gender
bites_2010 %>%
group_by(gender) %>%
summarize(num_bites = n())
## # A tibble: 3 × 2
## gender num_bites
## <fct> <int>
## 1 FEMALE 1979
## 2 MALE 3763
## 3 <NA> 2876
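In the gender summary, the <NA> row counts both truly missing values and any other recorded value, such as "UNKNOWN". This happens because factor() turns values that are not among its levels into NA. The base-R sketch below uses made-up values to contrast two levels with three levels. Keeping "UNKNOWN" as its own level would let the summary separate unknown gender from missing gender.

```r
# Made-up values in the style of gender_id_desc
gender_id_desc <- c("FEMALE", "MALE", "UNKNOWN", NA, "MALE")

# Only FEMALE and MALE as levels: "UNKNOWN" silently becomes NA
gender_two <- factor(gender_id_desc, levels = c("FEMALE", "MALE"))

# UNKNOWN kept as its own level: only the truly missing value is NA
gender_three <- factor(gender_id_desc, levels = c("FEMALE", "MALE", "UNKNOWN"))

# useNA = "ifany" adds an NA column to the counts when NAs are present
print(table(gender_two, useNA = "ifany"))
print(table(gender_three, useNA = "ifany"))
```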
# create a factor version of species_id_desc called "species" (levels in alphabetical order)
bites_2010 <- bites_2010 %>%
mutate(species = factor(species_id_desc))
# use group_by() and summarize() to answer question #1 by species
bites_2010 %>%
group_by(species_id_desc) %>%
summarize(num_bites = n())
## # A tibble: 10 × 2
## species_id_desc num_bites
## <chr> <int>
## 1 BAT 76
## 2 CAT 1527
## 3 DOG 6872
## 4 FERRET 4
## 5 HORSE 5
## 6 OTHER 8
## 7 RABBIT 3
## 8 RACCOON 21
## 9 SKUNK 1
## 10 <NA> 101
What are your findings about the summary? Are they what you expected?
After grouping the data by gender (female/male), female animals accounted for 1,979 bites and male animals for 3,763. After grouping the data by species (bat/cat/dog/ferret/horse/other/rabbit/raccoon/skunk), the number of bites was highest for dogs (6,872), followed by cats (1,527), bats (76), raccoons (21), other animals (8), horses (5), ferrets (4), rabbits (3), and skunks (1). A further 101 bites had no species recorded.
It was surprising to find that male animals bit 1,784 more people than female animals did. However, 2,876 observations (about one third of the 2010s data) have a missing or unknown gender, which could change the distribution. For species, it is not surprising that many more bites were recorded for dogs and cats, since they are very common household pets. Although far smaller in number, the 76 recorded bites from bats were surprising.
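To put the species counts in proportion, they can be converted into percentages of all 8,618 bites from the 2010s. The counts below are copied from the species summary table above. The <NA> row is relabeled MISSING, a label chosen only for this sketch.

```r
# Bite counts by species for the 2010s, copied from the summary table
species_counts <- c(BAT = 76, CAT = 1527, DOG = 6872, FERRET = 4, HORSE = 5,
                    OTHER = 8, RABBIT = 3, RACCOON = 21, SKUNK = 1,
                    MISSING = 101)

# Share of all recorded bites, in percent, rounded to one decimal place
species_pct <- round(100 * prop.table(species_counts), 1)

# Species ordered from most to fewest bites
print(sort(species_pct, decreasing = TRUE))
```

On these shares, dogs account for roughly four out of five bites, and cats for most of the rest.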
Make at least two plots that help you answer your question on the transformed or summarized data. Use scales and/or labels to make each plot informative.
The following plots show the number of bites per year during the chosen period (the 2010s). The first two break the yearly counts down by gender and by species, and the third shows the overall number of bites per year.
# create a histogram of bite_year filled by gender (binwidth = 1 gives one bin per year)
gender_bites <- ggplot(bites_2010) +
aes(x = bite_year,
fill = gender) +
geom_histogram(binwidth = 1) +
scale_fill_manual(values = c("orange", "yellow")) +
labs(title = "Number of Bites by Year and Gender",
x = "Year of Bite Occurrence",
y = "Number of Bites")
#output
gender_bites
# create a histogram of bite_year filled by species (binwidth = 1 gives one bin per year)
species_bites <- ggplot(bites_2010) +
aes(x = bite_year,
fill = species) +
geom_histogram(binwidth = 1) +
scale_fill_brewer(palette = "Paired") +
labs(title = "Number of Bites by Year and Species",
x = "Year of Bite Occurrence",
y = "Number of Bites")
#output
species_bites
# create a line graph object for bite_year by num_bites
line_graph <- ggplot(bites_2010) +
aes(x = bite_year,
y = num_bites) +
geom_line() +
labs(title = "Number of Bites by Year",
x = "Year of Bite Occurrence",
y = "Number of Bites")
#output
line_graph
Summarize your research question and findings below.
The first research question I posed was about differences in the number of bites across the gender and species categories of the animals in the data set. The histogram of the number of bites per year by gender shows that female and male animals follow a very similar pattern from year to year. However, male animals have a noticeably higher number of recorded bites than females. The histogram of the number of bites per year by species shows that dogs have an overwhelmingly higher number of recorded bites than any other animal. These observations are consistent with the summary tables explored earlier in the analysis. Compared with dogs and cats, the remaining species had very few recorded bites.
To answer the second research question and see the trend more clearly, I plotted a separate graph of the number of bites per year in the 2010s. The line graph shows that the number of recorded bites was high, at roughly 1,130 to 1,180 per year, with little variation from 2010 through 2014. After 2014, the number of bites generally declined: 985 in 2015, 1,051 in 2016, and 801 in 2017. In 2018 there was only one recorded bite, which explains the sharp drop at the end of the line. Very few bites were recorded in the other decades of this data set, so it is possible that most bites occurring in 2018 were simply not recorded.
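The trend described above can be quantified from the yearly counts in the bite_year by num_bites table. This base-R sketch copies those counts by hand and computes the change from each year to the next.

```r
# Yearly bite counts for 2010-2018, copied from the bite_year by num_bites table
yearly <- c(`2010` = 1131, `2011` = 1148, `2012` = 1180, `2013` = 1145,
            `2014` = 1176, `2015` = 985, `2016` = 1051, `2017` = 801,
            `2018` = 1)

# Change from the previous year (negative = fewer bites)
change <- diff(yearly)

# Percent change relative to the previous year, rounded to one decimal place
pct_change <- round(100 * change / head(yearly, -1), 1)

# Average and range for the well-recorded years 2010-2016
well_recorded <- yearly[as.character(2010:2016)]

print(pct_change)
print(mean(well_recorded))
print(range(well_recorded))
```

On these counts, 2015 and 2017 show the largest drops before the near-empty 2018 (about 16% and 24%), while 2016 briefly rose.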
Overall, the information I extracted from this data set helps show which animals bite people most frequently and which other animals people are likely to be bitten by. It is also worth noting that for the years with many recorded observations (2010-2016), the number of bites remains relatively consistent, without any surprising spikes.
Are your findings what you expected? Why or why not?
As I expected, the number of bites varied across species and gender, as well as over time. However, I was surprised by the large amount of missing data in the gender variable and by the dates that were entered incorrectly. There may be other misclassified observations that I did not come across, which could have skewed the results. Additionally, I did not expect the bulk of the recorded bites to have occurred between 2010 and 2017.