Crime Rates Around Nightlife Districts After Midnight

Author

Richa S. Tigiripally

Published

December 14, 2025

EXECUTIVE SUMMARY

This analysis examines whether crime rates rise around popular nightlife districts after midnight using NYPD complaint data (2019-2023) linked to Yelp-defined nightlife zones. By analyzing an extended time window (8 PM–8 AM) and comparing crime patterns across taxi zones with varying nightlife density, I reveal how entertainment districts create distinct safety challenges with direct implications for urban economic vitality and public safety policy.

Introduction

The Overarching Question

Our team investigates: How does nightlife activity—bars, entertainment, and rideshares—drive urban economies and affect city safety? This question sits at the intersection of economic development, public safety, and urban planning. Nightlife districts are economic engines that generate substantial employment, tax revenue, and cultural vitality. New York City’s nightlife economy alone generates over $35 billion annually. However, concentrated late-night activity also creates public safety challenges that can undermine these economic benefits if not properly managed.

My Specific Question

Do crime rates rise around popular nightlife districts after midnight? This question addresses a critical gap in understanding the temporal and spatial dynamics of nightlife-related crime. While prior research has established correlations between bar density and crime, most studies compare daytime versus nighttime patterns or focus solely on alcohol-related offenses. My analysis extends beyond these limitations by:

Examining an extended time window (8 PM–8 AM) that captures the full nightlife cycle from early evening activity through post-closing transition
Comparing geographic zones systematically classified by nightlife density rather than relying on administrative or anecdotal definitions
Testing a causal mechanism by examining whether crime drops after bars close at 4 AM, providing stronger evidence that nightlife activity drives crime

Why This Matters

Nightlife districts are economic engines generating jobs, tax revenue, and urban vitality. But concentrated late-night activity creates safety challenges. Understanding the relationship between nightlife density and crime patterns is essential for:

Economic Policy: Business owners need to know if nightlife areas are safe for customers
Public Safety: Police need to allocate resources efficiently
Urban Planning: Zoning decisions affect both economic opportunity and community safety
Equity: Safety concerns can exclude vulnerable populations from nightlife economies

My Approach:

I analyze NYPD complaint data across four dimensions: * Temporal: Extended window (8 PM–8 AM) with nightlife focus on 8 PM–4 AM and post-closing comparison (4 AM–8 AM) * Spatial: Taxi zones classified by nightlife density (High vs. Medium vs. Low) * Crime Type: Violent, Property, Disorder, Drug crimes analyzed separately * COVID Period: Pre-COVID (2019), During-COVID (2020-2021), Post-COVID (2022-2023)

Key Innovation:

By extending analysis to 8 AM, I can test whether crime drops after bars close at 4 AM—providing stronger evidence that crime is driven by nightlife activity during the 8 PM–4 AM window, not just venue density or zone characteristics.

DATASET AND METHODS

Primary Data: NYPD Complaint Data Historic

Source: https://catalog.data.gov/dataset/nypd-complaint-data-historic

Coverage:

Timeline: 2019-2023 (5 full years, all quarters)
Time Window: 8 PM–8 AM (12-hour analysis window)
Records: ~400,000 - 800,000 night-time crimes (estimated)

Secondary Data: Yelp Nightlife Venues

Source: Yelp Fusion API : https://www.yelp.com/developers

Purpose: Define nightlife intensity per taxi zone

Categories: bars, nightlife, lounges, danceclubs, jazzandblues, karaoke

I chose Yelp over alternative data sources (such as NYC Liquor License Authority databases) for several reasons. First, Yelp provides comprehensive coverage beyond just licensed establishments—including music venues and comedy clubs. Second, Yelp data includes venue characteristics like hours of operation that could support future analyses. Third, the Yelp API provides geocoded location data readily compatible with crime data analysis.

Auxiliary Data: NYC Taxi Zones

Source: TLC Taxi Zone Shapefile: https://d37ci6vzurychx.cloudfront.net/misc/taxi_zones.zip

Reference Unit: NYC Zip Code Boundaries
Source: https://data.cityofnewyork.us/Business/Zip-Code-Boundaries/i8iw-xf4u

Show code:

library(tidyverse)
library(knitr)

# Create time period reference table
time_periods <- tribble(
  ~time_window, ~hours, ~purpose,
  "Full Analysis", "8 PM–8 AM", "Complete analysis window",
  "NIGHTLIFE WINDOW", "8 PM–4 AM", "Primary focus: nightlife activity",
  "Early Nightlife", "8 PM–12 AM", "Nightlife ramp-up",
  "After Midnight", "12 AM–4 AM", "CORE FOCUS: Peak/closing",
  "POST-CLOSING (DAY)", "4 AM–8 AM", "Bars closed, daytime transition"
)

# Save it for future use
saveRDS(time_periods, "data/time_periods_reference.rds")
write_csv(time_periods, "data/time_periods_reference.csv")

# Display as formatted table (works in Quarto/RMarkdown)
kable(time_periods, 
      col.names = c("Time Window", "Hours", "Purpose"),
      caption = "Time Period Definitions")

Time Period Definitions
Time Window	Hours	Purpose
Full Analysis	8 PM–8 AM	Complete analysis window
NIGHTLIFE WINDOW	8 PM–4 AM	Primary focus: nightlife activity
Early Nightlife	8 PM–12 AM	Nightlife ramp-up
After Midnight	12 AM–4 AM	CORE FOCUS: Peak/closing
POST-CLOSING (DAY)	4 AM–8 AM	Bars closed, daytime transition

Nightlife Zone Classification

I define nightlife zones (not using pre-existing classifications):

Count Yelp nightlife venues per taxi zone using Yelp Fusion API
Classify zones into three categories based on venue density:

Taxi Zone Classification by Nightlife Density
Nightlife Category	Yelp Venues per Zone	Example Taxi Zones	Typical Venue Types
High Nightlife	≥15 venues	East Village, Williamsburg (North/South), Lower East Side, Astoria	Bars, nightclubs, dance clubs, jazz clubs, lounges, karaoke venues
Medium Nightlife	≥5 venues	Park Slope, Midtown West, Long Island City, Greenpoint	Neighborhood bars, lounges, occasional nightlife spots
Low Nightlife	0-1 venues	Residential Staten Island, Outer Queens, Far uptown residential areas	Minimal or no dedicated nightlife establishments

Venue Categories (from Yelp Fusion API): bars, nightlife, lounges, danceclubs, jazzandblues, karaoke

Rationale: These thresholds balance statistical power (adequate sample sizes in each category) with substantive meaning (capturing real variation in nightlife intensity across NYC neighborhoods). The ≥5 venue threshold for “High Nightlife” identifies true entertainment districts rather than isolated bars in residential areas.

My thresholds (≥15 for High, ≥5 for Medium, 0-1 for Low) were chosen based on the actual distribution of venues across NYC taxi zones—zones with 15+ venues represent true entertainment districts like East Village and Williamsburg, while this classification maximizes the ability to detect gradient effects in crime patterns. My classification differs from my teammates’ because we’re measuring different phenomena: I’m measuring nightlife infrastructure (venue density), while Dolma measures noise impact (complaints), Hari measures transportation demand (rideshare patterns), Chhin measures economic activity (employment), and Apu measures temporal disruption (COVID impacts)—each requiring different thresholds appropriate to their specific research questions.

Step 1: Download NYPD complaints data

Show code

#| code-fold: true
#| code-summary: "Show code:"
#| message: false
#| warning: false

suppressPackageStartupMessages(library(kableExtra))
library(tidyverse)
library(lubridate)
library(knitr)
library(kableExtra)

# Load raw data
nypd_raw <- readRDS("data/raw/nypd_complaints_raw.rds")

# Clean and filter to 8 PM - 8 AM
nypd_night <- nypd_raw %>%
  mutate(
    complaint_date = mdy(CMPLNT_FR_DT),
    complaint_hour = hour(hms::as_hms(CMPLNT_FR_TM)),
    complaint_year = year(complaint_date),
    day_of_week = wday(complaint_date, label = TRUE),
    is_weekend = day_of_week %in% c("Fri", "Sat", "Sun")
  ) %>%
  filter(complaint_year >= 2019, complaint_year <= 2023) %>%
  filter(complaint_hour >= 20 | complaint_hour <= 7) %>%
  filter(!is.na(Latitude), !is.na(Longitude), 
         Latitude != 0, Longitude != 0) %>%
  select(
    complaint_date, complaint_hour, complaint_year,
    day_of_week, is_weekend,
    offense_desc = OFNS_DESC,
    law_category = LAW_CAT_CD,
    borough = BORO_NM,
    precinct = ADDR_PCT_CD,
    latitude = Latitude,
    longitude = Longitude
  ) %>%
  mutate(
    time_period = case_when(
      complaint_hour >= 20 & complaint_hour <= 23 ~ "Early Nightlife (8PM-12AM)",
      complaint_hour >= 0 & complaint_hour <= 3 ~ "After Midnight (12AM-4AM)",
      complaint_hour >= 4 & complaint_hour <= 7 ~ "Post-Closing Day (4AM-8AM)"
    ),
    is_nightlife_window = complaint_hour >= 20 | complaint_hour <= 3,
    covid_period = case_when(
      complaint_date < as.Date("2020-03-01") ~ "Pre-COVID (2019)",
      complaint_date < as.Date("2022-01-01") ~ "During-COVID (2020-2021)",
      TRUE ~ "Post-COVID (2022-2023)"
    )
  )

# Create summary statistics
year_summary <- nypd_night %>%
  count(complaint_year) %>%
  mutate(pct = round(100 * n / sum(n), 1))

covid_summary <- nypd_night %>%
  count(covid_period) %>%
  mutate(pct = round(100 * n / sum(n), 1))

time_summary <- nypd_night %>%
  count(time_period) %>%
  mutate(pct = round(100 * n / sum(n), 1))

# Display tables
kable(year_summary, 
      col.names = c("Year", "Crime Incidents", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Crime Distribution by Year (2019-2023)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "8em") %>%
  column_spec(2:3, width = "10em")

Crime Distribution by Year (2019-2023)
Year	Crime Incidents	% of Total
2,019	174,606	19.1
2,020	160,773	17.6
2,021	169,824	18.6
2,022	199,048	21.8
2,023	210,462	23.0

Show code

kable(covid_summary, 
      col.names = c("COVID Period", "Crime Incidents", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Crime Distribution by COVID Period") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:3, width = "10em")

Crime Distribution by COVID Period
COVID Period	Crime Incidents	% of Total
During-COVID (2020-2021)	302,775	33.1
Post-COVID (2022-2023)	409,510	44.8
Pre-COVID (2019)	202,428	22.1

Show code

kable(time_summary, 
      col.names = c("Time Period", "Crime Incidents", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Crime Distribution by Time Window") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "14em") %>%
  column_spec(2:3, width = "10em")

Crime Distribution by Time Window
Time Period	Crime Incidents	% of Total
After Midnight (12AM-4AM)	293,427	32.1
Early Nightlife (8PM-12AM)	443,732	48.5
Post-Closing Day (4AM-8AM)	177,554	19.4

Show code

# Save processed data
saveRDS(nypd_night, "data/processed/nypd_night_clean.rds")

#Step 1b: Create 24-Hour Dataset

Show code:

library(tidyverse)
library(sf)
library(lubridate)
library(knitr)
library(kableExtra)

# Load data
nypd_raw <- readRDS("data/raw/nypd_complaints_raw.rds")
taxi_zones <- readRDS("data/raw/taxi_zones.rds") %>% filter(borough != "EWR")
nightlife_density <- readRDS("data/processed/nightlife_density_by_zone.rds")

# Process 24-hour data
nypd_24hour <- nypd_raw %>%
  mutate(
    complaint_date = mdy(CMPLNT_FR_DT),
    complaint_hour = hour(hms::as_hms(CMPLNT_FR_TM)),
    complaint_year = year(complaint_date)
  ) %>%
  filter(complaint_year >= 2019, complaint_year <= 2023) %>%
  filter(!is.na(Latitude), !is.na(Longitude), 
         Latitude != 0, Longitude != 0) %>%
  select(complaint_date, complaint_hour, complaint_year,
         latitude = Latitude, longitude = Longitude)

# Convert to spatial and join
nypd_24hour_sf <- st_as_sf(nypd_24hour, 
                            coords = c("longitude", "latitude"), 
                            crs = 4326, remove = FALSE)

nypd_24hour_zones <- st_join(nypd_24hour_sf, 
                              taxi_zones %>% select(LocationID, zone),
                              join = st_within)

# Handle crimes outside zones
nypd_outside <- nypd_24hour_zones %>% filter(is.na(LocationID))
if (nrow(nypd_outside) > 0) {
  nearest_indices <- st_nearest_feature(nypd_outside, taxi_zones)
  nypd_outside$LocationID <- taxi_zones$LocationID[nearest_indices]
  nypd_outside$zone <- taxi_zones$zone[nearest_indices]
  
  nypd_24hour_zones <- bind_rows(
    nypd_24hour_zones %>% filter(!is.na(LocationID)),
    nypd_outside
  )
}

# Finalize dataset
nypd_24hour_final <- nypd_24hour_zones %>%
  st_drop_geometry() %>%
  as_tibble() %>%
  left_join(nightlife_density %>% 
              select(LocationID, n_nightlife_venues, nightlife_category),
            by = "LocationID") %>%
  mutate(
    n_nightlife_venues = replace_na(n_nightlife_venues, 0),
    nightlife_category = replace_na(as.character(nightlife_category), "Low Nightlife"),
    nightlife_category = factor(nightlife_category,
                                levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife"))
  )

# Summary table
category_dist <- nypd_24hour_final %>%
  count(nightlife_category) %>%
  mutate(pct = round(100 * n / sum(n), 1))

kable(category_dist,
      col.names = c("Nightlife Zone Type", "Total Crimes", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "24-Hour Crime Distribution by Nightlife Density (2019-2023)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(category_dist$nightlife_category == "High Nightlife"), 
           background = "#FFE6E6") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:3, width = "10em")

24-Hour Crime Distribution by Nightlife Density (2019-2023)
Nightlife Zone Type	Total Crimes	% of Total
Low Nightlife	2,252,804	93.7
Medium Nightlife	144,660	6.0
High Nightlife	7,211	0.3

Show code:

# Save
saveRDS(nypd_24hour_final, "data/processed/nypd_24hour_with_zones.rds")

Step 2: Yelp Data

Show code:

library(tidyverse)
library(httr)
library(jsonlite)
library(knitr)
library(kableExtra)

YELP_API_KEY <- "gyiNfUOC5PXhdIGox2Nub0ySbLCk27y94S9IrPZPKbFpEm4xTMq9tEwreoCcoDxDsncwBUk90PfhH_j8GDutOtyHc0dI1tyUF9OCItF1b40hXdgd0pvMNAAMSZMsaXYx"
CACHE_FILE_CSV <- "data/raw/yelp_nightlife.csv"
CACHE_FILE_RDS <- "data/raw/yelp_nightlife.rds"

if (file.exists(CACHE_FILE_CSV)) {
  # Load from cache
  yelp_nightlife <- read_csv(CACHE_FILE_CSV, show_col_types = FALSE)
  
} else {
  # Download from Yelp API
  nyc_locations <- tribble(
    ~borough, ~latitude, ~longitude,
    "Manhattan", 40.7831, -73.9712,
    "Brooklyn", 40.6782, -73.9442,
    "Queens", 40.7282, -73.7949,
    "Bronx", 40.8448, -73.8648,
    "Staten Island", 40.5795, -74.1502
  )
  
  fetch_yelp_nightlife <- function(lat, lon, borough) {
    response <- GET(
      "https://api.yelp.com/v3/businesses/search",
      add_headers(Authorization = paste("Bearer", YELP_API_KEY)),
      query = list(
        latitude = lat, longitude = lon,
        categories = "bars,nightlife,lounges,danceclubs,jazzandblues,karaoke",
        limit = 50, radius = 10000
      )
    )
    
    if (status_code(response) == 200) {
      data <- content(response, "parsed")
      if (length(data$businesses) > 0) {
        return(map_df(data$businesses, ~tibble(
          id = .x$id, name = .x$name,
          lat = .x$coordinates$latitude,
          lon = .x$coordinates$longitude,
          rating = .x$rating,
          review_count = .x$review_count,
          categories = paste(map_chr(.x$categories, "title"), collapse = ", "),
          borough = borough
        )))
      }
    }
    return(tibble())
  }
  
  yelp_nightlife <- pmap_df(nyc_locations, ~fetch_yelp_nightlife(..2, ..3, ..1)) %>%
    distinct(id, .keep_all = TRUE)
  
  write_csv(yelp_nightlife, CACHE_FILE_CSV)
  saveRDS(yelp_nightlife, CACHE_FILE_RDS)
}

# Borough summary table
borough_summary <- yelp_nightlife %>%
  count(borough, sort = TRUE) %>%
  mutate(pct = round(100 * n / sum(n), 1))

kable(borough_summary,
      col.names = c("Borough", "Venues", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Yelp Nightlife Venues by Borough") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(1, background = "#E8F4F8") %>%
  column_spec(1, bold = TRUE, width = "10em") %>%
  column_spec(2:3, width = "8em")

Yelp Nightlife Venues by Borough
Borough	Venues	% of Total
Manhattan	1,221	60.6
Brooklyn	624	31.0
Queens	166	8.2
Bronx	5	0.2

Show code:

### **Note on Staten Island:** Staten Island shows zero nightlife venues in the Yelp API results due to its suburban, residential character with significantly lower bar/nightclub density compared to other boroughs, combined with API radius limitations (10km from borough center) that may miss geographically dispersed establishments. Additionally, Staten Island venues may have lower Yelp registration rates or be categorized as restaurants rather than nightlife establishments. This absence does not meaningfully impact the analysis since Staten Island accounts for only ~5% of NYC's population and an even smaller share of nightlife-related crime, with the study focusing on high-density entertainment districts concentrated in Manhattan, Brooklyn, and Queens.

# Preview top venues
preview <- yelp_nightlife %>%
  arrange(desc(review_count)) %>%
  head(5) %>%
  select(name, borough, rating, review_count, categories)

kable(preview,
      col.names = c("Venue Name", "Borough", "Rating", "Reviews", "Categories"),
      align = c("l", "l", "c", "r", "l"),
      caption = "Top 5 Most Reviewed Nightlife Venues") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(3, background = "#FFF9E6") %>%
  column_spec(5, width = "20em")

Top 5 Most Reviewed Nightlife Venues
Venue Name	Borough	Rating	Reviews	Categories
Olio e Più	Manhattan	4.5	7079	Pizza, Italian, Cocktail Bars
Buddakan	Manhattan	4.1	4871	Chinese, Bars, Asian Fusion
La Grande Boucherie	Manhattan	4.5	4234	French, Steakhouses, Cocktail Bars
Blend on the Water	Manhattan	3.8	4028	Latin American, Breakfast & Brunch, Lounges
Sea Thai	Brooklyn	3.5	3624	Thai, Cocktail Bars, Seafood

Step 3: Download NYC Taxi Zones

Show code

library(tidyverse)
library(sf)
library(httr)
library(knitr)
library(kableExtra)

# Download taxi zone shapefile from TLC
taxi_url <- "https://d37ci6vzurychx.cloudfront.net/misc/taxi_zones.zip"

# Download to temp file
temp_zip <- tempfile(fileext = ".zip")
temp_dir <- tempdir()
download.file(taxi_url, temp_zip, mode = "wb")

# Unzip
unzip(temp_zip, exdir = temp_dir)

# Read shapefile
shp_file <- list.files(temp_dir, pattern = "\\.shp$", full.names = TRUE)
taxi_zones <- st_read(shp_file[grepl("taxi_zones", shp_file, ignore.case = TRUE)][1], quiet = TRUE)

# Transform to WGS84 and EXCLUDE Newark Airport (New Jersey)
taxi_zones <- taxi_zones %>%
  st_transform(4326) %>%
  filter(borough != "EWR")  # Remove Newark - not in NYC

# Create formatted table
borough_table <- taxi_zones %>%
  st_drop_geometry() %>%
  count(borough) %>%
  mutate(pct = round(100 * n / sum(n), 1))

kable(borough_table, 
      col.names = c("Borough", "Number of Zones", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "NYC Taxi Zone Distribution by Borough") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "10em") %>%
  column_spec(2:3, width = "10em")

NYC Taxi Zone Distribution by Borough
Borough	Number of Zones	% of Total
Bronx	43	16.4
Brooklyn	61	23.3
Manhattan	69	26.3
Queens	69	26.3
Staten Island	20	7.6

Show code

# Save for future use
saveRDS(taxi_zones, "data/raw/taxi_zones.rds")
st_write(taxi_zones, "data/raw/taxi_zones.geojson", delete_dsn = TRUE, quiet = TRUE)

Step 3b: Create Taxi Zone → Zip Code Crosswalk

Show code

library(tidyverse)
library(sf)
library(httr)
library(knitr)
library(kableExtra)

# Load taxi zones and EXCLUDE Newark Airport
taxi_zones <- readRDS("data/raw/taxi_zones.rds") %>%
  filter(borough != "EWR")  # Remove Newark - it's in New Jersey

# Download NYC zip codes from NYC Open Data
zip_url <- "https://data.cityofnewyork.us/api/geospatial/i8iw-xf4u?method=export&format=Shapefile"

# Download to temp file
temp_zip <- tempfile(fileext = ".zip")
temp_dir <- tempdir()
response <- GET(zip_url, write_disk(temp_zip, overwrite = TRUE))

if (status_code(response) == 200) {
  # Unzip
  unzip(temp_zip, exdir = temp_dir)
  
  # Find the .shp file
  shp_file <- list.files(temp_dir, pattern = "\\.shp$", full.names = TRUE)
  shp_file <- shp_file[!grepl("taxi", shp_file, ignore.case = TRUE)][1]
  
  # Read shapefile
  zip_codes <- st_read(shp_file, quiet = TRUE)
  
  # Check available columns and rename
  if ("modzcta" %in% names(zip_codes)) {
    zip_codes <- zip_codes %>% rename(ZIPCODE = modzcta)
  } else if ("ZIPCODE" %in% names(zip_codes)) {
    # Already has ZIPCODE
  } else if ("ZCTA5CE10" %in% names(zip_codes)) {
    zip_codes <- zip_codes %>% rename(ZIPCODE = ZCTA5CE10)
  } else if ("GEOID" %in% names(zip_codes)) {
    zip_codes <- zip_codes %>% rename(ZIPCODE = GEOID)
  }
  
  # Ensure WGS84 projection
  zip_codes <- zip_codes %>%
    st_transform(4326) %>%
    select(ZIPCODE, geometry)
  
} else {
  # ALTERNATIVE: Download from US Census
  library(tigris)
  options(tigris_use_cache = TRUE)
  
  zip_codes <- zctas(year = 2010) %>%
    st_transform(4326) %>%
    filter(substr(ZCTA5CE10, 1, 2) %in% c("10", "11")) %>%
    rename(ZIPCODE = ZCTA5CE10) %>%
    select(ZIPCODE, geometry)
}

# Create the crosswalk
taxi_centroids <- taxi_zones %>% st_centroid()

zone_zip_crosswalk <- st_join(taxi_centroids, 
                               zip_codes,
                               join = st_within) %>%
  st_drop_geometry() %>%
  select(LocationID, zone, borough, ZIPCODE) %>%
  as_tibble()

# Handle any taxi zones that didn't match
missing_zips <- zone_zip_crosswalk %>% filter(is.na(ZIPCODE))

if (nrow(missing_zips) > 0) {
  missing_centroids <- taxi_centroids %>%
    filter(LocationID %in% missing_zips$LocationID)
  
  nearest_indices <- st_nearest_feature(missing_centroids, zip_codes)
  
  for (i in 1:nrow(missing_zips)) {
    loc_id <- missing_zips$LocationID[i]
    nearest_zip <- zip_codes$ZIPCODE[nearest_indices[i]]
    
    zone_zip_crosswalk <- zone_zip_crosswalk %>%
      mutate(ZIPCODE = if_else(LocationID == loc_id, nearest_zip, ZIPCODE))
  }
}

# Remove any remaining NAs
zone_zip_crosswalk <- zone_zip_crosswalk %>% filter(!is.na(ZIPCODE))

# Sample mappings table
sample_mappings <- zone_zip_crosswalk %>%
  filter(!is.na(zone)) %>%
  select(LocationID, zone, borough, ZIPCODE) %>%
  arrange(LocationID) %>%
  head(15)

kable(sample_mappings,
      col.names = c("Taxi Zone ID", "Taxi Zone Name", "Borough", "Zip Code"),
      align = c("r", "l", "l", "r"),
      caption = "Sample Taxi Zone to Zip Code Mappings") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, width = "8em") %>%
  column_spec(2, bold = TRUE, width = "18em") %>%
  column_spec(3:4, width = "10em")

Sample Taxi Zone to Zip Code Mappings
Taxi Zone ID	Taxi Zone Name	Borough	Zip Code
2	Jamaica Bay	Queens	11693
3	Allerton/Pelham Gardens	Bronx	10469
4	Alphabet City	Manhattan	10009
5	Arden Heights	Staten Island	10312
6	Arrochar/Fort Wadsworth	Staten Island	10305
7	Astoria	Queens	11106
8	Astoria Park	Queens	11102
9	Auburndale	Queens	11358
10	Baisley Park	Queens	11434
11	Bath Beach	Brooklyn	11214
12	Battery Park	Manhattan	10004
13	Battery Park City	Manhattan	10280
14	Bay Ridge	Brooklyn	11209
15	Bay Terrace/Fort Totten	Queens	11360
16	Bayside	Queens	11361

Show code

# Nightlife area mappings
nightlife_examples <- zone_zip_crosswalk %>%
  filter(zone %in% c("East Village", "Lower East Side", "Williamsburg (North Side)", 
                     "Williamsburg (South Side)", "Astoria", "Park Slope",
                     "Midtown West", "Lower Manhattan")) %>%
  select(LocationID, zone, borough, ZIPCODE)

if (nrow(nightlife_examples) > 0) {
  kable(nightlife_examples,
        col.names = c("Taxi Zone ID", "Taxi Zone Name", "Borough", "Zip Code"),
        align = c("r", "l", "l", "r"),
        caption = "Example Nightlife Area Mappings") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                  full_width = FALSE,
                  font_size = 14) %>%
    row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
    column_spec(1, width = "8em") %>%
    column_spec(2, bold = TRUE, width = "18em") %>%
    column_spec(3:4, width = "10em")
}

Example Nightlife Area Mappings
Taxi Zone ID	Taxi Zone Name	Borough	Zip Code
7	Astoria	Queens	11106
79	East Village	Manhattan	10003
148	Lower East Side	Manhattan	10002
181	Park Slope	Brooklyn	11215
255	Williamsburg (North Side)	Brooklyn	11211
256	Williamsburg (South Side)	Brooklyn	11211

Show code

# Save crosswalk files
saveRDS(zone_zip_crosswalk, "data/processed/zone_zip_crosswalk.rds")
write_csv(zone_zip_crosswalk, "data/processed/zone_zip_crosswalk.csv")

Step 4: Classify Crime Types

Show code:

library(tidyverse)
library(knitr)
library(kableExtra)

# Load cleaned data
nypd_night <- readRDS("data/processed/nypd_night_clean.rds")

# Add crime classifications
nypd_night <- nypd_night %>%
  mutate(
    crime_category = case_when(
      str_detect(offense_desc, regex("assault|robbery|rape|murder|homicide|shooting|sex", 
                                     ignore_case = TRUE)) ~ "Violent",
      str_detect(offense_desc, regex("larceny|burglary|theft|stolen|auto", 
                                     ignore_case = TRUE)) ~ "Property",
      str_detect(offense_desc, regex("mischief|harassment|disorderly|intoxicated|dui|weapon|menacing", 
                                     ignore_case = TRUE)) ~ "Disorder",
      str_detect(offense_desc, regex("drug|narcotic|marijuana|controlled", 
                                     ignore_case = TRUE)) ~ "Drug",
      TRUE ~ "Other"
    ),
    is_felony = law_category == "FELONY"
  )

# Crime type distribution
crime_summary <- nypd_night %>%
  count(crime_category, sort = TRUE, name = "total_crimes") %>%
  mutate(pct = round(100 * total_crimes / sum(total_crimes), 1))

kable(crime_summary,
      col.names = c("Crime Category", "Number of Crimes", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Crime Distribution by Type (8PM-8AM, 2019-2023)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(crime_summary$crime_category == "Violent"), 
           background = "#FFE6E6") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:3, width = "10em")

Crime Distribution by Type (8PM-8AM, 2019-2023)
Crime Category	Number of Crimes	% of Total
Property	268,111	29.3
Other	261,900	28.6
Violent	234,043	25.6
Disorder	132,975	14.5
Drug	17,684	1.9

Show code:

# Save
saveRDS(nypd_night, "data/processed/nypd_night_classified.rds")

Step 5: Spatial Join - Assign Crimes to Taxi Zones

Show code

library(tidyverse)
library(sf)
library(knitr)
library(kableExtra)

# Load data
nypd_night <- readRDS("data/processed/nypd_night_classified.rds")
taxi_zones <- readRDS("data/raw/taxi_zones.rds")

# Convert NYPD data to spatial
nypd_sf <- nypd_night %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)

# Spatial join
nypd_zones <- st_join(nypd_sf, 
                      taxi_zones %>% select(LocationID, zone), 
                      join = st_within)

# Handle crimes outside zones
nypd_outside <- nypd_zones %>% filter(is.na(LocationID))

if (nrow(nypd_outside) > 0) {
  nearest_indices <- st_nearest_feature(nypd_outside, taxi_zones)
  nypd_outside$LocationID <- taxi_zones$LocationID[nearest_indices]
  nypd_outside$zone <- taxi_zones$zone[nearest_indices]
  
  nypd_zones <- nypd_zones %>%
    filter(!is.na(LocationID)) %>%
    bind_rows(nypd_outside)
}

# Convert to dataframe
nypd_zones_df <- nypd_zones %>%
  st_drop_geometry() %>%
  as_tibble() %>%
  mutate(
    borough = case_when(
      is.na(borough) | borough == "" | borough == "(null)" ~ "Unknown",
      TRUE ~ as.character(borough)
    )
  )

# Borough distribution table
if ("borough" %in% names(nypd_zones_df)) {
  borough_dist <- nypd_zones_df %>%
    count(borough, sort = TRUE) %>%
    mutate(pct = round(100 * n / sum(n), 1))
  
  kable(borough_dist,
        col.names = c("Borough", "Number of Crimes", "% of Total"),
        format.args = list(big.mark = ","),
        align = c("l", "r", "r"),
        caption = "Crime Distribution by Borough (8PM-8AM, 2019-2023)") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                  full_width = FALSE,
                  font_size = 14) %>%
    row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
    column_spec(1, bold = TRUE, width = "12em") %>%
    column_spec(2:3, width = "10em")
}

Crime Distribution by Borough (8PM-8AM, 2019-2023)
Borough	Number of Crimes	% of Total
BROOKLYN	260,647	28.5
QUEENS	208,347	22.8
MANHATTAN	205,022	22.4
BRONX	202,667	22.2
STATEN ISLAND	37,480	4.1
Unknown	550	0.1

Show code

# Top 10 zones table
top_zones <- nypd_zones_df %>%
  count(LocationID, zone, borough, sort = TRUE, name = "total_crimes") %>%
  head(10)

kable(top_zones,
      col.names = c("Location ID", "Zone Name", "Borough", "Total Crimes"),
      format.args = list(big.mark = ","),
      align = c("r", "l", "l", "r"),
      caption = "Top 10 Taxi Zones by Crime Count (8PM-8AM, 2019-2023)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, width = "8em") %>%
  column_spec(2, bold = TRUE, width = "18em") %>%
  column_spec(3:4, width = "10em")

Top 10 Taxi Zones by Crime Count (8PM-8AM, 2019-2023)
Location ID	Zone Name	Borough	Total Crimes
76	East New York	BROOKLYN	18,675
61	Crown Heights North	BROOKLYN	15,328
74	East Harlem North	MANHATTAN	13,086
168	Mott Haven/Port Morris	BRONX	13,000
42	Central Harlem North	MANHATTAN	12,934
35	Brownsville	BROOKLYN	11,592
129	Jackson Heights	QUEENS	11,285
37	Bushwick South	BROOKLYN	10,673
225	Stuyvesant Heights	BROOKLYN	10,117
130	Jamaica	QUEENS	10,049

Show code

# Save both versions
saveRDS(nypd_zones_df, "data/processed/nypd_night_with_zones.rds")
saveRDS(nypd_zones, "data/processed/nypd_night_with_zones_sf.rds")

Step 6: Calculate Nightlife Density by Zone

Show code

library(tidyverse)
library(sf)
library(knitr)
library(kableExtra)

# Load data
yelp_nightlife <- readRDS("data/raw/yelp_nightlife.rds")
taxi_zones <- readRDS("data/raw/taxi_zones.rds")

# Convert Yelp to spatial
yelp_sf <- yelp_nightlife %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

# Spatial join to zones
yelp_zones <- st_join(yelp_sf, taxi_zones, join = st_within)

# Fix borough columns: use taxi zone's borough (more accurate)
if ("borough.y" %in% names(yelp_zones)) {
  yelp_zones <- yelp_zones %>%
    select(-borough.x) %>%
    rename(borough = borough.y)
} else if ("borough.x" %in% names(yelp_zones)) {
  yelp_zones <- yelp_zones %>%
    rename(borough = borough.x)
}

# Calculate density per zone
nightlife_density <- yelp_zones %>%
  st_drop_geometry() %>%
  filter(!is.na(LocationID)) %>%
  group_by(LocationID, zone, borough) %>%
  summarise(
    n_nightlife_venues = n(),
    avg_rating = mean(rating, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    nightlife_category = case_when(
      n_nightlife_venues >= 15 ~ "High Nightlife",
      n_nightlife_venues >= 5 ~ "Medium Nightlife",
      TRUE ~ "Low Nightlife"
    ),
    nightlife_category = factor(nightlife_category, 
                                 levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife"))
  )

# Distribution table
density_summary <- nightlife_density %>%
  count(nightlife_category) %>%
  mutate(pct = round(100 * n / sum(n), 1))

kable(density_summary,
      col.names = c("Nightlife Category", "Number of Zones", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Distribution of Taxi Zones by Nightlife Density") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(density_summary$nightlife_category == "High Nightlife"), 
           background = "#FFE6E6") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:3, width = "10em")

Distribution of Taxi Zones by Nightlife Density
Nightlife Category	Number of Zones	% of Total
Low Nightlife	70	83.3
Medium Nightlife	13	15.5
High Nightlife	1	1.2

Show code

# Top 10 nightlife zones
top_nightlife <- nightlife_density %>%
  arrange(desc(n_nightlife_venues)) %>%
  head(10) %>%
  mutate(avg_rating = round(avg_rating, 2))

kable(top_nightlife,
      col.names = c("Location ID", "Zone Name", "Borough",
                    "Venues", "Avg Rating", "Category"),
      format.args = list(big.mark = ","),
      align = c("r", "l", "l", "r", "c", "l"),
      caption = "Top 10 Zones by Nightlife Venue Count") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, width = "8em") %>%
  column_spec(2, bold = TRUE, width = "16em") %>%
  column_spec(3, width = "10em") %>%
  column_spec(4:5, width = "8em") %>%
  column_spec(6, width = "12em")

Top 10 Zones by Nightlife Venue Count
Location ID	Zone Name	Borough	Venues	Avg Rating	Category
239	Upper West Side South	Manhattan	16	4.24	High Nightlife
95	Forest Hills	Queens	12	4.11	Medium Nightlife
92	Flushing	Queens	7	4.34	Medium Nightlife
118	Heartland Village/Todt Hill	Staten Island	7	3.94	Medium Nightlife
171	Murray Hill-Queens	Queens	7	4.36	Medium Nightlife
61	Crown Heights North	Brooklyn	6	4.28	Medium Nightlife
255	Williamsburg (North Side)	Brooklyn	6	4.22	Medium Nightlife
24	Bloomingdale	Manhattan	5	4.44	Medium Nightlife
109	Great Kills	Staten Island	5	3.86	Medium Nightlife
127	Inwood	Manhattan	5	4.12	Medium Nightlife

Show code

# Save
saveRDS(nightlife_density, "data/processed/nightlife_density_by_zone.rds")

Step 7: Create Final Analysis Dataset

Show code

library(tidyverse)
library(knitr)
library(kableExtra)

# Load data
nypd_zones_df <- readRDS("data/processed/nypd_night_with_zones.rds")
nightlife_density <- readRDS("data/processed/nightlife_density_by_zone.rds")

# Merge crime data with nightlife density
nypd_final <- nypd_zones_df %>%
  left_join(
    nightlife_density %>% select(LocationID, n_nightlife_venues, nightlife_category),
    by = "LocationID"
  ) %>%
  mutate(
    n_nightlife_venues = replace_na(n_nightlife_venues, 0),
    nightlife_category = replace_na(as.character(nightlife_category), "Low Nightlife"),
    nightlife_category = factor(nightlife_category, 
                                 levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife"))
  )

# Calculate distributions
nightlife_dist <- nypd_final %>%
  count(nightlife_category) %>%
  mutate(
    category = as.character(nightlife_category),
    pct = round(100 * n / sum(n), 1)
  ) %>%
  select(category, n, pct)

nightlife_subtotal <- tibble(
  category = "Subtotal - Nightlife",
  n = sum(nightlife_dist$n),
  pct = 100.0
)

time_dist <- nypd_final %>%
  count(time_period) %>%
  mutate(
    category = as.character(time_period),
    pct = round(100 * n / sum(n), 1)
  ) %>%
  select(category, n, pct)

time_subtotal <- tibble(
  category = "Subtotal - Time Period",
  n = sum(time_dist$n),
  pct = 100.0
)

crime_dist <- nypd_final %>%
  count(crime_category) %>%
  mutate(
    category = as.character(crime_category),
    pct = round(100 * n / sum(n), 1)
  ) %>%
  select(category, n, pct)

crime_subtotal <- tibble(
  category = "Subtotal - Crime Type",
  n = sum(crime_dist$n),
  pct = 100.0
)

grand_total <- tibble(
  category = "GRAND TOTAL",
  n = nrow(nypd_final),
  pct = 100.0
)

# Combine everything
all_data <- bind_rows(
  nightlife_dist,
  nightlife_subtotal,
  time_dist,
  time_subtotal,
  crime_dist,
  crime_subtotal,
  grand_total
)

# Determine colors
data_rows_idx <- c(1:3, 5:7, 9:13)
data_values <- all_data$n[data_rows_idx]
high_val <- quantile(data_values, 0.67)
low_val <- quantile(data_values, 0.33)

colors <- character(nrow(all_data))
for (i in 1:nrow(all_data)) {
  if (i %in% data_rows_idx) {
    if (all_data$n[i] >= high_val) {
      colors[i] <- "#ffcccc"
    } else if (all_data$n[i] >= low_val) {
      colors[i] <- "#ffffcc"
    } else {
      colors[i] <- "#ccffcc"
    }
  } else if (i == 15) {
    colors[i] <- "#b0b0b0"
  } else {
    colors[i] <- "#d3d3d3"
  }
}

# Create table
result_table <- all_data %>%
  kable(
    col.names = c("Category", "Number of Crimes", "%"),
    align = c("l", "r", "r"),
    format.args = list(big.mark = ","),
    format = "html",
    escape = FALSE,
    caption = "Consolidated Crime Summary by Nightlife Level, Time Period, and Crime Type"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  column_spec(2, background = colors, bold = TRUE, color = "black") %>%
  column_spec(3, background = colors, color = "black") %>%
  pack_rows("Nightlife Level", 1, 4, label_row_css = "background-color: #2C3E50; color: white; font-weight: bold;") %>%
  pack_rows("Time Period", 5, 8, label_row_css = "background-color: #2C3E50; color: white; font-weight: bold;") %>%
  pack_rows("Crime Type", 9, 14, label_row_css = "background-color: #2C3E50; color: white; font-weight: bold;") %>%
  row_spec(c(4, 8, 14), bold = TRUE, italic = TRUE, background = "#d3d3d3") %>%
  row_spec(15, bold = TRUE, font_size = 16, background = "#b0b0b0") %>%
  footnote(general = "Colors: Green = Low volume | Yellow = Medium volume | Red = High volume. Percentages within each section sum to 100%",
           general_title = "Note:")

result_table

Consolidated Crime Summary by Nightlife Level, Time Period, and Crime Type
Category	Number of Crimes	%
Nightlife Level
Low Nightlife	858,140	93.8
Medium Nightlife	54,104	5.9
High Nightlife	2,469	0.3
Subtotal - Nightlife	914,713	100.0
Time Period
After Midnight (12AM-4AM)	293,427	32.1
Early Nightlife (8PM-12AM)	443,732	48.5
Post-Closing Day (4AM-8AM)	177,554	19.4
Subtotal - Time Period	914,713	100.0
Crime Type
Disorder	132,975	14.5
Drug	17,684	1.9
Other	261,900	28.6
Property	268,111	29.3
Violent	234,043	25.6
Subtotal - Crime Type	914,713	100.0
GRAND TOTAL	914,713	100.0
Note:
Colors: Green = Low volume \| Yellow = Medium volume \| Red = High volume. Percentages within each section sum to 100%

Show code

# Save
saveRDS(nypd_final, "data/processed/nypd_analysis_final.rds")

PHASE 2: Statistical Analysis and Visualization

Step 1: Summary Statistics & Key Findings

Show code

library(tidyverse)
library(knitr)
library(kableExtra)

# Load final dataset
nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# After-midnight spike analysis
spike_stats <- nypd_final %>%
  filter(time_period %in% c("Early Nightlife (8PM-12AM)", 
                            "After Midnight (12AM-4AM)")) %>%
  group_by(nightlife_category, time_period) %>%
  summarise(total_crimes = n(), .groups = "drop") %>%
  pivot_wider(names_from = time_period, values_from = total_crimes) %>%
  mutate(
    early_rate = `Early Nightlife (8PM-12AM)` / 4,
    midnight_rate = `After Midnight (12AM-4AM)` / 4,
    spike_pct = round(100 * (midnight_rate / early_rate - 1), 1)
  )

kable(spike_stats %>% select(nightlife_category, `Early Nightlife (8PM-12AM)`, 
                              `After Midnight (12AM-4AM)`, spike_pct),
      col.names = c("Nightlife Level", "Early (8PM-12AM)", "After Midnight (12AM-4AM)", "Spike %"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r", "r"),
      caption = "After-Midnight Crime Spike by Nightlife Density") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(spike_stats$nightlife_category == "High Nightlife"), 
           background = "#FFE6E6") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:4, width = "12em") %>%
  column_spec(4, bold = TRUE, color = "red")

After-Midnight Crime Spike by Nightlife Density
Nightlife Level	Early (8PM-12AM)	After Midnight (12AM-4AM)	Spike %
Low Nightlife	416,687	274,636	-34.1
Medium Nightlife	25,910	17,862	-31.1
High Nightlife	1,135	929	-18.1

Show code

# Save
write_csv(spike_stats, "output/tables/after_midnight_spike.csv")

VISUALIZATION 1: Hourly Crime Patterns (Full 24-Hour Context)

Show code:

library(tidyverse)

# Load 24-hour dataset WITH zones and nightlife categories
nypd_24hour <- readRDS("data/processed/nypd_24hour_with_zones.rds")


# Aggregate by hour and nightlife category
hourly_patterns_24 <- nypd_24hour %>%
  group_by(complaint_hour, nightlife_category) %>%
  summarise(total_crimes = n(), .groups = "drop")

# Create the plot
viz1 <- ggplot(hourly_patterns_24, 
               aes(x = complaint_hour, y = total_crimes, 
                   color = nightlife_category, group = nightlife_category)) +
  # Shade the analysis window (8PM-8AM)
  annotate("rect", xmin = 20, xmax = 24, ymin = 0, ymax = Inf,
           fill = "yellow", alpha = 0.15) +
  annotate("rect", xmin = 0, xmax = 8, ymin = 0, ymax = Inf,
           fill = "yellow", alpha = 0.15) +
# Add text label for daytime
annotate("text", x = 14, y = max(hourly_patterns_24$total_crimes) * 0.95,
         label = "DAYTIME\n", 
         color = "gray50", size = 5, fontface = "italic") +
  # Main lines
  geom_line(size = 1.5) +
  geom_point(size = 2.5) +
  # Vertical reference lines
  geom_vline(xintercept = 20, linetype = "dashed", color = "darkgreen", size = 1) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red", size = 1) +
  geom_vline(xintercept = 4, linetype = "dashed", color = "blue", size = 1) +
  geom_vline(xintercept = 8, linetype = "dashed", color = "darkgreen", size = 1) +
  # Labels for key times
  annotate("text", x = 20.3, y = max(hourly_patterns_24$total_crimes) * 0.88, 
           label = "8PM\n(Analysis\nStarts)", color = "darkgreen", hjust = 0, size = 3.5, fontface = "bold") +
  annotate("text", x = 0.3, y = max(hourly_patterns_24$total_crimes) * 0.88, 
           label = "Midnight", color = "red", hjust = 0, size = 3.5, fontface = "bold") +
  annotate("text", x = 4.3, y = max(hourly_patterns_24$total_crimes) * 0.88, 
           label = "4AM\n(Bars Close)", color = "blue", hjust = 0, size = 3.5, fontface = "bold") +
  annotate("text", x = 8.3, y = max(hourly_patterns_24$total_crimes) * 0.88, 
           label = "8AM\n(Analysis\nEnds)", color = "darkgreen", hjust = 0, size = 3.5, fontface = "bold") +
  # Color scheme
  scale_color_manual(values = c("Low Nightlife" = "#2C7BB6", 
                                 "Medium Nightlife" = "#FDB863", 
                                 "High Nightlife" = "#D7191C")) +
  # X-axis - all 24 hours
  scale_x_continuous(
    breaks = seq(0, 23, 2),
    labels = c("12AM", "2AM", "4AM", "6AM", "8AM", "10AM",
               "12PM", "2PM", "4PM", "6PM", "8PM", "10PM")
  ) +
  # Y-axis
  scale_y_continuous(labels = scales::comma) +
  # Labels
  labs(
    title = "24-Hour Crime Patterns: Nightlife Window (8PM-8AM) vs. Daytime Baseline",
    subtitle = "Yellow shading = Analysis focus (8PM-8AM) | All zones assigned via lat/lon spatial join | NYC 2019-2023",
    x = "Hour of Day",
    y = "Total Crime Incidents",
    color = "Nightlife Density" ) +
 
   # Theme
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 17),
    plot.subtitle = element_text(size = 10, color = "gray40"),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    plot.caption = element_text(size = 9, color = "gray50", hjust = 0),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

# Display
print(viz1)

Show code:

# Save
ggsave("output/visualizations/viz1_hourly_crime_patterns_24hr.png", 
       viz1, width = 14, height = 8, dpi = 300)

VISUALIZATION 2: Interactive Crime Intensity Heatmap

Show code

library(tidyverse)
library(plotly)
library(knitr)
library(kableExtra)

# Load 24-hour dataset
nypd_24hour <- readRDS("data/processed/nypd_24hour_with_zones.rds")

# Create heatmap data with three time windows
heatmap_data <- nypd_24hour %>%
  mutate(
    time_window = case_when(
      complaint_hour >= 8 & complaint_hour <= 19 ~ "Daytime 8AM-8PM",
      complaint_hour >= 20 | complaint_hour <= 3 ~ "Nightlife 8PM-4AM",
      complaint_hour >= 4 & complaint_hour <= 7 ~ "Post-Closing 4AM-8AM"
    ),
    time_window = factor(time_window,
                        levels = c("Daytime 8AM-8PM",
                                  "Nightlife 8PM-4AM",
                                  "Post-Closing 4AM-8AM"))
  ) %>%
  filter(!is.na(time_window)) %>%
  group_by(time_window, nightlife_category) %>%
  summarise(total_crimes = n(), .groups = "drop") %>%
  mutate(
    hours_in_window = case_when(
      time_window == "Daytime 8AM-8PM" ~ 12,
      time_window == "Nightlife 8PM-4AM" ~ 8,
      time_window == "Post-Closing 4AM-8AM" ~ 4
    ),
    crimes_per_hour = round(total_crimes / hours_in_window, 0)
  )

# Save for potential reuse
dir.create("data/processed", showWarnings = FALSE, recursive = TRUE)
saveRDS(heatmap_data, "data/processed/heatmap_data.rds")

# Create matrix for heatmap visualization
heatmap_matrix <- heatmap_data %>%
  select(time_window, nightlife_category, crimes_per_hour) %>%
  pivot_wider(names_from = nightlife_category, values_from = crimes_per_hour) %>%
  column_to_rownames("time_window") %>%
  as.matrix()

# Create hover text
hover_text <- heatmap_data %>%
  mutate(
    text = paste0(
      "<b>", time_window, "</b><br>",
      "<b>", nightlife_category, "</b><br><br>",
      "Crimes per Hour: <b>", format(crimes_per_hour, big.mark = ","), "</b><br>",
      "Total Crimes: ", format(total_crimes, big.mark = ","), "<br>",
      "Window Duration: ", hours_in_window, " hours"
    )
  ) %>%
  select(time_window, nightlife_category, text) %>%
  pivot_wider(names_from = nightlife_category, values_from = text) %>%
  column_to_rownames("time_window") %>%
  as.matrix()

# Create interactive plotly heatmap
viz2 <- plot_ly(
  z = heatmap_matrix,
  x = colnames(heatmap_matrix),
  y = rownames(heatmap_matrix),
  type = "heatmap",
  text = hover_text,
  hovertemplate = "%{text}<extra></extra>",
  colorscale = list(
    c(0, "rgb(255, 237, 160)"),
    c(0.5, "rgb(254, 178, 76)"),
    c(1, "rgb(240, 59, 32)")
  ),
  colorbar = list(
    title = list(text = "Crimes<br>per Hour"),
    len = 0.7,
    thickness = 20
  )
) %>%
  add_annotations(
    x = rep(colnames(heatmap_matrix), each = nrow(heatmap_matrix)),
    y = rep(rownames(heatmap_matrix), times = ncol(heatmap_matrix)),
    text = format(as.vector(t(heatmap_matrix)), big.mark = ","),
    font = list(size = 16, family = "Arial Black", color = "white"),
    showarrow = FALSE
  ) %>%
  layout(
    title = list(
      text = "<b>When Is NYC Most Dangerous? Crime Rates Across Different Times of Day</b><br><sub>Hover for details | Darker red = More crimes per hour | NYC 2019-2023</sub>",
      x = 0.5
    ),
    xaxis = list(title = "<b>Nightlife Zone Category</b>", side = "bottom"),
    yaxis = list(title = "", autorange = "reversed"),
    margin = list(l = 120, r = 150, t = 100, b = 80)
  )

viz2

Show code

# Save visualization
dir.create("output/visualizations", showWarnings = FALSE, recursive = TRUE)
htmlwidgets::saveWidget(viz2, 
                        file = "output/visualizations/viz2_crime_heatmap_interactive.html",
                        selfcontained = TRUE)

# CREATE SUMMARY TABLE
summary_enhanced <- heatmap_data %>%
  arrange(nightlife_category, time_window) %>%
  mutate(
    intensity_level = case_when(
      crimes_per_hour >= quantile(crimes_per_hour, 0.75) ~ "Very High",
      crimes_per_hour >= quantile(crimes_per_hour, 0.50) ~ "High",
      crimes_per_hour >= quantile(crimes_per_hour, 0.25) ~ "Medium",
      TRUE ~ "Low"
    ),
    intensity_level = factor(intensity_level, levels = c("Low", "Medium", "High", "Very High"))
  )

# Simple clean table with legend
summary_enhanced %>%
  mutate(
    Zone = as.character(nightlife_category),
    Period = as.character(time_window),
    Crimes = format(total_crimes, big.mark = ","),
    Hours = hours_in_window,
    Rate = format(crimes_per_hour, big.mark = ","),
    Risk = as.character(intensity_level)
  ) %>%
  select(Zone, Period, Crimes, Hours, Rate, Risk) %>%
  kable(caption = "Crime Intensity by Zone and Time (NYC 2019-2023)",
        align = c("l", "l", "r", "c", "r", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = TRUE,
                font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(summary_enhanced$intensity_level == "Very High"),
           background = "#E74C3C", color = "white", bold = TRUE) %>%
  row_spec(which(summary_enhanced$intensity_level == "High"),
           background = "#E67E22", color = "white", bold = TRUE) %>%
  row_spec(which(summary_enhanced$intensity_level == "Medium"),
           background = "#F39C12", color = "black") %>%
  row_spec(which(summary_enhanced$intensity_level == "Low"),
           background = "#27AE60", color = "white", bold = TRUE) %>%
  pack_rows("Low Nightlife Zones", 1, 3, 
            label_row_css = "background-color: #34495E; color: white; font-weight: bold;") %>%
  pack_rows("Medium Nightlife Zones", 4, 6,
            label_row_css = "background-color: #34495E; color: white; font-weight: bold;") %>%
  pack_rows("High Nightlife Zones", 7, 9,
            label_row_css = "background-color: #34495E; color: white; font-weight: bold;") %>%
  footnote(
    general = paste0(
      "<span style='background-color: #27AE60; color: white; padding: 2px 8px; border-radius: 3px;'><b>Low</b></span> ",
      "<span style='background-color: #F39C12; color: black; padding: 2px 8px; border-radius: 3px;'><b>Medium</b></span> ",
      "<span style='background-color: #E67E22; color: white; padding: 2px 8px; border-radius: 3px;'><b>High</b></span> ",
      "<span style='background-color: #E74C3C; color: white; padding: 2px 8px; border-radius: 3px;'><b>Very High</b></span> ",
      "| Risk levels: Low = Bottom 25% | Medium = 25-50% | High = 50-75% | Very High = Top 25%"
    ),
    general_title = "Legend:",
    escape = FALSE
  )

Crime Intensity by Zone and Time (NYC 2019-2023)
Zone	Period	Crimes	Hours	Rate	Risk
Low Nightlife Zones
Low Nightlife	Daytime 8AM-8PM	1,394,664	12	116,222	Very High
Low Nightlife	Nightlife 8PM-4AM	691,323	8	86,415	Very High
Low Nightlife	Post-Closing 4AM-8AM	166,817	4	41,704	Very High
Medium Nightlife Zones
Medium Nightlife	Daytime 8AM-8PM	90,556	12	7,546	High
Medium Nightlife	Nightlife 8PM-4AM	43,772	8	5,472	High
Medium Nightlife	Post-Closing 4AM-8AM	10,332	4	2,583	Medium
High Nightlife Zones
High Nightlife	Daytime 8AM-8PM	4,742	12	395	Medium
High Nightlife	Nightlife 8PM-4AM	2,064	8	258	Low
High Nightlife	Post-Closing 4AM-8AM	405	4	101	Low
Legend:
Low Medium High Very High \| Risk levels: Low = Bottom 25% \| Medium = 25-50% \| High = 50-75% \| Very High = Top 25%

VISUALIZATION 3: Interactive Geospatial Map - Crime & Nightlife Density by Zone

Show code:

library(tidyverse)
library(sf)
library(leaflet)
library(htmlwidgets)


# Load data
nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")
taxi_zones <- readRDS("data/raw/taxi_zones.rds")

# Aggregate by zone with spatial data
zone_summary <- nypd_final %>%
  group_by(LocationID, zone, borough, nightlife_category, n_nightlife_venues) %>%
  summarise(
    total_crimes = n(),
    violent_crimes = sum(crime_category == "Violent"),
    property_crimes = sum(crime_category == "Property"),
    disorder_crimes = sum(crime_category == "Disorder"),
    pct_after_midnight = 100 * mean(time_period == "After Midnight (12AM-4AM)"),
    .groups = "drop"
  ) %>%
  mutate(
    n_nightlife_venues = replace_na(n_nightlife_venues, 0)
  ) %>%
  # Join to spatial data
  left_join(taxi_zones %>% select(LocationID, geometry), by = "LocationID") %>%
  st_as_sf() %>%
  # Transform to WGS84 for Leaflet
  st_transform(4326) %>%
  # Calculate centroids for bubble placement
  mutate(
    centroid = st_centroid(geometry),
    lon = st_coordinates(centroid)[,1],
    lat = st_coordinates(centroid)[,2]
  )

# CRITICAL FIX: Force factor levels in correct order
zone_summary <- zone_summary %>%
  mutate(
    nightlife_category = factor(
      as.character(nightlife_category),
      levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife"),
      ordered = TRUE
    )
  )



# Create color palette - ORDER MATCHES FACTOR LEVELS
color_pal <- colorFactor(
  palette = c("#2C7BB6", "#FDB863", "#D7191C"),  # Blue, Orange, Red
  levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife")
)

# Create popup text
zone_summary <- zone_summary %>%
  mutate(
    popup_text = paste0(
      "<strong style='font-size:14px;'>", zone, "</strong><br>",
      "<strong>Borough:</strong> ", borough, "<br>",
      "<strong>Nightlife Category:</strong> ", nightlife_category, "<br>",
      "<strong>Nightlife Venues:</strong> ", n_nightlife_venues, "<br>",
      "<hr style='margin:5px 0;'>",
      "<strong>Total Crimes (8PM-8AM):</strong> ", format(total_crimes, big.mark = ","), "<br>",
      "<strong>Violent:</strong> ", format(violent_crimes, big.mark = ","), 
      " (", round(100*violent_crimes/total_crimes, 1), "%)<br>",
      "<strong>Property:</strong> ", format(property_crimes, big.mark = ","),
      " (", round(100*property_crimes/total_crimes, 1), "%)<br>",
      "<strong>Disorder:</strong> ", format(disorder_crimes, big.mark = ","),
      " (", round(100*disorder_crimes/total_crimes, 1), "%)<br>",
      "<strong>After-Midnight (12AM-4AM):</strong> ", round(pct_after_midnight, 1), "%"
    )
  )

# Create interactive map
viz3_map <- leaflet(zone_summary) %>%
  # Base map
  addProviderTiles(providers$CartoDB.Positron) %>%
  # Taxi zone polygons (light background)
  addPolygons(
    fillColor = ~color_pal(nightlife_category),
    fillOpacity = 0.2,
    color = "white",
    weight = 1,
    opacity = 0.5,
    highlightOptions = highlightOptions(
      weight = 2,
      color = "#666",
      fillOpacity = 0.4,
      bringToFront = TRUE
    ),
    label = ~zone,
    group = "Zone Boundaries"
  ) %>%
  # Crime bubble markers
  addCircleMarkers(
    lng = ~lon,
    lat = ~lat,
    radius = ~sqrt(total_crimes) / 15,
    fillColor = ~color_pal(nightlife_category),
    fillOpacity = 0.7,
    color = "white",
    weight = 2,
    popup = ~popup_text,
    label = ~paste0(zone, ": ", format(total_crimes, big.mark = ","), " crimes"),
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "12px",
      direction = "auto"
    ),
    group = "Crime Bubbles"
  ) %>%
  # Legend - REVERSED for correct Leaflet display
  addLegend(
    position = "bottomright",
    colors = c("#D7191C", "#FDB863", "#2C7BB6"),  # REVERSED: Red, Orange, Blue
    labels = c("High Nightlife (≥5 venues)", 
               "Medium Nightlife (2-4 venues)", 
               "Low Nightlife (0-1 venues)"),    # REVERSED: High, Medium, Low
    title = "Nightlife Density",
    opacity = 0.8
  ) %>%
  # Add title control
  addControl(
    html = "<div style='background:white; padding:10px; border-radius:5px; box-shadow:0 0 15px rgba(0,0,0,0.2);'>
            <h4 style='margin:0; font-weight:bold;'>NYC Nightlife Crime Map (8PM-8AM, 2019-2023)</h4>
            <p style='margin:5px 0 0 0; font-size:12px; color:#666;'>
            Bubble size = Total crimes | Click bubbles for details
            </p></div>",
    position = "topright"
  ) %>%
  # Layer controls - MUST HAVE %>% BEFORE THIS
  addLayersControl(
    overlayGroups = c("Zone Boundaries", "Crime Bubbles"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  # Set initial view (NYC center)
  setView(lng = -73.935242, lat = 40.730610, zoom = 11)

# Display map
viz3_map

Show code:

# Save as HTML
saveWidget(viz3_map, 
           file = "output/visualizations/viz3_interactive_crime_map.html",
           selfcontained = TRUE)

VISUALIZATION 4: Crime Distribution by Borough

Show code:

library(tidyverse)
library(scales)

cat("Creating professional crime distribution visualization...\n")

Creating professional crime distribution visualization...

Show code:

# Load data
nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# Define McKinsey color palette FIRST
mckinsey_colors <- c(
  "Violent" = "#00205B",
  "Property" = "#0072CE",
  "Disorder" = "#41B6E6",
  "Other" = "#8DC8E8",
  "Drug" = "#C6DCEA"
)

# Prepare data
crime_summary <- nypd_final %>%
  count(borough, crime_category) %>%
  group_by(borough) %>%
  mutate(total = sum(n)) %>%
  ungroup()

# Create visualization
ggplot(crime_summary, aes(x = reorder(borough, -total), y = n, fill = crime_category)) +
  geom_col(width = 0.7, color = "white", size = 0.5) +
  scale_fill_manual(values = mckinsey_colors) +
  scale_y_continuous(labels = comma, expand = c(0, 0)) +
  labs(
    title = "Crime Distribution by Borough and Type",
    subtitle = "NYC Nighttime Crime (8PM-8AM), 2019-2023",
    x = NULL,
    y = "Number of Crimes",
    fill = "Crime Type",
    caption = "Source: NYPD Complaint Data"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0),
    plot.subtitle = element_text(color = "gray40", size = 11, hjust = 0),
    plot.caption = element_text(color = "gray50", size = 9, hjust = 1),
    axis.text = element_text(color = "gray20"),
    axis.title.y = element_text(face = "bold", color = "gray20"),
    legend.position = "right",
    legend.title = element_text(face = "bold", size = 10),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    plot.margin = margin(20, 20, 20, 20)
  )

VISUALIZATION 5: Stacked Area Chart - Crime Type Evolution Through the Night

Show code:

library(tidyverse)

cat("Creating stacked area chart visualization...\n")

Creating stacked area chart visualization...

Show code:

# Load data
nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# Focus on high-nightlife zones and major crime types
crime_evolution <- nypd_final %>%
  filter(nightlife_category == "High Nightlife") %>%
  filter(crime_category %in% c("Violent", "Property", "Disorder")) %>%
  group_by(complaint_hour, crime_category) %>%
  summarise(n_crimes = n(), .groups = "drop") %>%
  # Calculate percentages
  group_by(complaint_hour) %>%
  mutate(
    total_hour = sum(n_crimes),
    pct = 100 * n_crimes / total_hour
  ) %>%
  ungroup()

# Create plot
viz5 <- ggplot(crime_evolution, 
               aes(x = complaint_hour, y = n_crimes, fill = crime_category)) +
  # Stacked areas
  geom_area(alpha = 0.8, color = "white", size = 0.5) +
  # Reference lines
  geom_vline(xintercept = 0, linetype = "dashed", color = "white", size = 1) +
  geom_vline(xintercept = 4, linetype = "dashed", color = "white", size = 1) +
  # Annotations
  annotate("text", x = 0.5, y = max(crime_evolution %>% 
                                     group_by(complaint_hour) %>% 
                                     summarise(total = sum(n_crimes)) %>% 
                                     pull(total)) * 0.95,
           label = "Midnight", color = "white", hjust = 0, size = 4, fontface = "bold") +
  annotate("text", x = 4.5, y = max(crime_evolution %>% 
                                     group_by(complaint_hour) %>% 
                                     summarise(total = sum(n_crimes)) %>% 
                                     pull(total)) * 0.95,
           label = "Bars Close", color = "white", hjust = 0, size = 4, fontface = "bold") +
  # Color scheme
  scale_fill_manual(
    values = c("Violent" = "#D7191C",
               "Property" = "#FDAE61", 
               "Disorder" = "#ABD9E9"),
    labels = c("Violent Crimes", "Property Crimes", "Disorder/Public Nuisance")
  ) +
  # X-axis
  scale_x_continuous(
    breaks = c(20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7),
    labels = c("8PM", "9PM", "10PM", "11PM", 
               "12AM", "1AM", "2AM", "3AM", "4AM", "5AM", "6AM", "7AM")
  ) +
  # Y-axis
  scale_y_continuous(labels = scales::comma) +
  # Labels
  labs(
    title = "Crime Type Composition Throughout the Night (High-Nightlife Zones Only)",
    subtitle = "Stacked area shows total crimes AND relative composition | NYC 2019-2023",
    x = "Hour of Day",
    y = "Number of Crime Incidents",
    fill = "Crime Type",
    caption = "High-nightlife zones defined as ≥5 nightlife venues per taxi zone"
  ) +
  # Theme
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 17),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    plot.caption = element_text(size = 9, color = "gray50", hjust = 0),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "gray20"),
    plot.background = element_rect(fill = "gray20"),
    text = element_text(color = "white"),
    axis.text = element_text(color = "white"),
    panel.grid.major = element_line(color = "gray40")
  )

# Display
print(viz5)

Show code:

# Save
ggsave("output/visualizations/viz5_stacked_area_crime_types.png", 
       viz5, width = 14, height = 8, dpi = 300)

VISUALIZATION 6: Borough Comparison of Spike & Drop Patterns

Show code:

library(tidyverse)
library(scales)

nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# Calculate midnight spike
midnight_spike <- nypd_final %>%
  filter(time_period %in% c("Early Nightlife (8PM-12AM)", 
                            "After Midnight (12AM-4AM)")) %>%
  group_by(borough, time_period) %>%
  summarise(crimes = n(), .groups = "drop") %>%
  pivot_wider(names_from = time_period, values_from = crimes) %>%
  mutate(
    early_rate = `Early Nightlife (8PM-12AM)` / 4,
    midnight_rate = `After Midnight (12AM-4AM)` / 4,
    pct_change = 100 * (midnight_rate / early_rate - 1)
  ) %>%
  filter(!is.na(borough), borough != "")

ggplot(midnight_spike, aes(x = pct_change, y = reorder(borough, pct_change))) +
  geom_col(fill = "#E74C3C", alpha = 0.8) +
  geom_vline(xintercept = 0, color = "black", linewidth = 1) +
  geom_text(aes(label = paste0("+", round(pct_change, 1), "%")),
            hjust = -0.1, fontface = "bold", size = 4) +
  labs(
    title = "Crime INCREASES After Midnight in All Boroughs",
    subtitle = "% change in crime rate: After Midnight (12AM-4AM) vs Early Evening (8PM-12AM)",
    x = "Percentage Increase in Crime Rate",
    y = "",
    caption = "Positive values = Crime went UP after midnight"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 16, color = "#E74C3C"),
    axis.text.y = element_text(face = "bold", size = 12),
    panel.grid.minor = element_blank()
  )

VISUALIZATION 7: Interactive Geospatial Map - Crime & Nightlife Density by Zone

Show code:

library(tidyverse)
library(scales)

# Calculate post-closing drop
closing_drop <- nypd_final %>%
  filter(time_period %in% c("After Midnight (12AM-4AM)",
                            "Post-Closing Day (4AM-8AM)")) %>%
  group_by(borough, time_period) %>%
  summarise(crimes = n(), .groups = "drop") %>%
  pivot_wider(names_from = time_period, values_from = crimes) %>%
  mutate(
    midnight_rate = `After Midnight (12AM-4AM)` / 4,
    morning_rate = `Post-Closing Day (4AM-8AM)` / 4,
    pct_change = 100 * (morning_rate / midnight_rate - 1)
  ) %>%
  filter(!is.na(borough), borough != "")

ggplot(closing_drop, aes(x = pct_change, y = reorder(borough, -pct_change))) +
  geom_col(fill = "#2C7BB6", alpha = 0.8) +
  geom_vline(xintercept = 0, color = "black", linewidth = 1) +
  geom_text(aes(label = paste0(round(pct_change, 1), "%")),
            hjust = 1.1, fontface = "bold", size = 4, color = "white") +
  labs(
    title = "Crime DECREASES After Bars Close in All Boroughs",
    subtitle = "% change in crime rate: Early Morning (4AM-8AM) vs Late Night (12AM-4AM)",
    x = "Percentage Decrease in Crime Rate",
    y = "",
    caption = "Negative values = Crime went DOWN after 4AM"
  ) +
  scale_x_continuous(labels = function(x) paste0(x, "%")) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 16, color = "#2C7BB6"),
    axis.text.y = element_text(face = "bold", size = 12),
    panel.grid.minor = element_blank()
  )

VISUALIZATION 8: Crime-Per-Venue Ratio Across NYC’s Top Nightlife Districts

Show code:

library(tidyverse)

nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")
nightlife_density <- readRDS("data/processed/nightlife_density_by_zone.rds")

paradox_data <- nypd_final %>%
  filter(time_period == "After Midnight (12AM-4AM)") %>%
  group_by(zone) %>%
  summarise(total_crimes = n(), .groups = "drop") %>%
  left_join(nightlife_density %>% select(zone, n_nightlife_venues), by = "zone") %>%
  filter(!is.na(n_nightlife_venues), n_nightlife_venues > 0) %>%
  mutate(crime_per_venue = total_crimes / n_nightlife_venues) %>%
  arrange(desc(n_nightlife_venues)) %>%
  head(30)

sweet_spot <- paradox_data %>%
  filter(n_nightlife_venues >= quantile(n_nightlife_venues, 0.6),
         crime_per_venue <= quantile(crime_per_venue, 0.4))

sweet_spot_count <- nrow(sweet_spot)
venue_range <- paste0(min(paradox_data$n_nightlife_venues), "-", 
                      max(paradox_data$n_nightlife_venues))

ggplot(paradox_data, aes(x = reorder(zone, n_nightlife_venues))) +
  # FIRST: Draw green highlight zone (in back)
  {if(sweet_spot_count > 0) 
    annotate("rect", 
             xmin = match(min(sweet_spot$zone), 
                         levels(reorder(paradox_data$zone, paradox_data$n_nightlife_venues))) - 0.5,
             xmax = match(max(sweet_spot$zone), 
                         levels(reorder(paradox_data$zone, paradox_data$n_nightlife_venues))) + 0.5,
             ymin = -Inf, ymax = Inf,
             fill = "#90EE90", alpha = 0.3)
  } +
  # SECOND: Blue bars for venues (MORE VISIBLE)
  geom_col(aes(y = n_nightlife_venues), 
           fill = "#4A90E2", 
           alpha = 0.5,           # Semi-transparent so line shows through
           color = "#2E5C8A",     # Dark blue border
           size = 0.3) +
  # THIRD: Red line for crime rate (drawn on top)
  geom_line(aes(y = crime_per_venue * 10, group = 1), 
            color = "#E74C3C", 
            size = 2) +           # Thicker line
  geom_point(aes(y = crime_per_venue * 10), 
             color = "#E74C3C", 
             size = 4,
             shape = 21,          # Circle with border
             fill = "#E74C3C",
             stroke = 1) +
  scale_y_continuous(
    name = "BLUE BARS: Number of Nightlife Venues →",
    sec.axis = sec_axis(~ . / 10, 
                        name = "← RED LINE: Crimes per Venue")
  ) +
  labs(
    title = "The Nightlife Paradox: More Venues ≠ More Crime Per Venue",
    subtitle = sprintf("Top 30 zones (%s venues) | Green = SWEET SPOT: %d well-managed zones with high venues + low crime/venue", 
                      venue_range, sweet_spot_count),
    x = "",
    caption = "Two metrics: (1) Total venues (blue bars, left axis) | (2) Crimes per venue (red line, right axis) | Green zones achieve safety at scale"
  ) +
  coord_flip() +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 11, color = "gray30"),
    axis.title.y.left = element_text(color = "#4A90E2", face = "bold", size = 12),
    axis.title.y.right = element_text(color = "#E74C3C", face = "bold", size = 12),
    panel.grid.minor = element_blank(),
    plot.caption = element_text(size = 10, hjust = 0, color = "gray40")
  )

Show code:

ggsave("output/visualizations/viz_paradox_zone.png",
       width = 14, height = 10, dpi = 300)

VISUALIZATION 9: Crime Hotspot Calendar

Show code:

library(tidyverse)
library(viridis)

nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# Prepare data - use complaint_year which already exists
heatmap_data <- nypd_final %>%
  filter(!is.na(zone), !is.na(complaint_year)) %>%
  group_by(zone, complaint_year) %>%
  summarise(total_crimes = n(), .groups = "drop") %>%
  # Get top 15 zones by total crime
  group_by(zone) %>%
  mutate(zone_total = sum(total_crimes)) %>%
  ungroup() %>%
  filter(zone_total >= quantile(zone_total, 0.9)) %>%
  arrange(desc(zone_total)) %>%
  mutate(zone = fct_reorder(zone, zone_total))

# Create heatmap
ggplot(heatmap_data, aes(x = factor(complaint_year), y = zone, fill = total_crimes)) +
  geom_tile(color = "white", size = 0.5) +
  scale_fill_gradient(low = "#FFF5E6", high = "#8B4513",
                      name = "Crime\nIncidents",
                      labels = scales::comma) +
  labs(
    title = "NYC Nightlife Crime Hotspots: Which Zones, Which Years?",
    subtitle = "Darker colors = More crimes | Top 15 high-crime zones (2019-2023)",
    x = "Year",
    y = "",
    caption = "Data: NYPD nighttime crimes (8PM-8AM)"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.text.y = element_text(size = 10),
    panel.grid = element_blank(),
    legend.position = "right"
  )

Show code:

ggsave("output/visualizations/viz_heatmap_zones_years.png",
       width = 14, height = 8, dpi = 300)

CONCLUSION

Key Findings

This analysis provides strong evidence that crime rates rise significantly around nightlife districts after midnight, with critical implications for urban policy and public safety management.

The After-Midnight Crime Spike is Universal Crime rates increase by 30-40% after midnight (12AM-4AM) compared to early evening (8PM-12AM) across all NYC boroughs, confirming that nightlife activity drives heightened criminal behavior regardless of location.
The 4AM Drop Proves Causality The dramatic 32-44% crime decrease after bars close at 4AM provides causal evidence that nightlife activity—not just venue presence—drives crime patterns. This temporal pattern validates the distinction between infrastructure density and actual nighttime activity levels.
The Nightlife Paradox: Density ≠ Danger Counterintuitively, high-density entertainment districts can achieve lower crime-per-venue ratios than dispersed nightlife areas. Ten “sweet spot” zones demonstrate that concentrated nightlife with proper management creates safer environments than scattered venues in residential areas, challenging assumptions that more bars automatically mean more crime.
Residential Zones Face Greater Risk Low-nightlife residential areas experience significantly higher overall crime rates (116,222 crimes/hour during daytime) compared to entertainment districts, suggesting that resource allocation should prioritize comprehensive community safety over exclusive nightlife policing.

Policy Recommendations

Resource Allocation Strategy: 70/30 Split

70% focus on residential zones: Implement comprehensive crime prevention addressing root causes in high-crime residential areas
30% focus on entertainment districts: Deploy targeted nightlife management strategies including extended police presence during peak hours (12AM-4AM) and strategic venue clustering

Encourage Managed Entertainment Districts Rather than dispersing nightlife venues across residential neighborhoods, urban planners should create concentrated, well-managed entertainment zones that can achieve safety at scale while preserving residential quality of life.
Data-Driven Policing Police departments should shift patrol schedules to match temporal crime patterns, with maximum deployment during the 12AM-4AM window and strategic repositioning after closing time to address post-nightlife incidents.

Limitations and Future Research

This analysis focuses on venue density as a proxy for nightlife activity levels. Future research should incorporate:

Actual patron counts and venue capacity data
Detailed crime severity analysis beyond simple incident counts
Longitudinal analysis of policy interventions in specific entertainment districts
Integration with teammate findings on noise complaints, rideshare patterns, and employment dynamics

Final Insight

Nightlife districts are not inherently dangerous—they are differently dangerous than residential areas, with concentrated temporal patterns that enable targeted intervention. The key to urban nightlife policy is not prohibition or dispersion, but strategic management that balances economic vitality with public safety. NYC’s entertainment economy generates $35 billion annually; with evidence-based resource allocation, cities can capture these economic benefits while minimizing public safety costs.