STA 9750 Submission Material

On this page

  • EXECUTIVE SUMMARY
  • Introduction
    • The Overarching Question
    • My Specific Question
  • Why This Matters
  • My Approach:
  • Key Innovation:
  • DATASET AND METHODS
  • Primary Data: NYPD Complaint Data Historic
  • Secondary Data: Yelp Nightlife Venues
  • Auxiliary Data: NYC Taxi Zones
  • Nightlife Zone Classification
  • Step 1: Download NYPD complaints data
  • Step 2: Yelp Data
  • Step 3: Download NYC Taxi Zones
  • Step 3b: Create Taxi Zone → Zip Code Crosswalk
  • Step 4: Classify Crime Types
  • Step 5: Spatial Join - Assign Crimes to Taxi Zones
  • Step 6: Calculate Nightlife Density by Zone
  • Step 7: Create Final Analysis Dataset
  • PHASE 2: Statistical Analysis and Visualization
  • Step 1: Summary Statistics & Key Findings
  • VISUALIZATION 1: Hourly Crime Patterns (Full 24-Hour Context)
  • VISUALIZATION 2: Interactive Crime Intensity Heatmap
  • VISUALIZATION 3: Interactive Geospatial Map - Crime & Nightlife Density by Zone
  • VISUALIZATION 4: Crime Distribution by Borough
  • VISUALIZATION 5: Stacked Area Chart - Crime Type Evolution Through the Night
  • VISUALIZATION 6: Borough Comparison of Spike & Drop Patterns
  • VISUALIZATION 7: Interactive Geospatial Map - Crime & Nightlife Density by Zone
  • VISUALIZATION 8: Crime-Per-Venue Ratio Across NYC’s Top Nightlife Districts
  • VISUALIZATION 9: Crime Hotspot Calendar
  • CONCLUSION
    • Key Findings
  • Policy Recommendations
  • Limitations and Future Research
  • Final Insight

Crime Rates Around Nightlife Districts After Midnight

Author

Richa S. Tigiripally

Published

December 14, 2025

EXECUTIVE SUMMARY

This analysis examines whether crime rates rise around popular nightlife districts after midnight using NYPD complaint data (2019-2023) linked to Yelp-defined nightlife zones. By analyzing an extended time window (8 PM–8 AM) and comparing crime patterns across taxi zones with varying nightlife density, I reveal how entertainment districts create distinct safety challenges with direct implications for urban economic vitality and public safety policy.

Introduction

The Overarching Question

Our team investigates: How does nightlife activity—bars, entertainment, and rideshares—drive urban economies and affect city safety? This question sits at the intersection of economic development, public safety, and urban planning. Nightlife districts are economic engines that generate substantial employment, tax revenue, and cultural vitality. New York City’s nightlife economy alone generates over $35 billion annually. However, concentrated late-night activity also creates public safety challenges that can undermine these economic benefits if not properly managed.

My Specific Question

Do crime rates rise around popular nightlife districts after midnight? This question addresses a critical gap in understanding the temporal and spatial dynamics of nightlife-related crime. While prior research has established correlations between bar density and crime, most studies compare daytime versus nighttime patterns or focus solely on alcohol-related offenses. My analysis extends beyond these limitations by:

  1. Examining an extended time window (8 PM–8 AM) that captures the full nightlife cycle from early evening activity through post-closing transition
  2. Comparing geographic zones systematically classified by nightlife density rather than relying on administrative or anecdotal definitions
  3. Testing a causal mechanism by examining whether crime drops after bars close at 4 AM, providing stronger evidence that nightlife activity drives crime

Why This Matters

Nightlife districts are economic engines generating jobs, tax revenue, and urban vitality. But concentrated late-night activity creates safety challenges. Understanding the relationship between nightlife density and crime patterns is essential for:

  1. Economic Policy: Business owners need to know if nightlife areas are safe for customers
  2. Public Safety: Police need to allocate resources efficiently
  3. Urban Planning: Zoning decisions affect both economic opportunity and community safety
  4. Equity: Safety concerns can exclude vulnerable populations from nightlife economies

My Approach:

I analyze NYPD complaint data across four dimensions: * Temporal: Extended window (8 PM–8 AM) with nightlife focus on 8 PM–4 AM and post-closing comparison (4 AM–8 AM) * Spatial: Taxi zones classified by nightlife density (High vs. Medium vs. Low) * Crime Type: Violent, Property, Disorder, Drug crimes analyzed separately * COVID Period: Pre-COVID (2019), During-COVID (2020-2021), Post-COVID (2022-2023)

Key Innovation:

By extending analysis to 8 AM, I can test whether crime drops after bars close at 4 AM—providing stronger evidence that crime is driven by nightlife activity during the 8 PM–4 AM window, not just venue density or zone characteristics.

DATASET AND METHODS

Primary Data: NYPD Complaint Data Historic

Source: https://catalog.data.gov/dataset/nypd-complaint-data-historic

Coverage:

  • Timeline: 2019-2023 (5 full years, all quarters)
  • Time Window: 8 PM–8 AM (12-hour analysis window)
  • Records: ~400,000 - 800,000 night-time crimes (estimated)

Secondary Data: Yelp Nightlife Venues

Source: Yelp Fusion API : https://www.yelp.com/developers

Purpose: Define nightlife intensity per taxi zone

Categories: bars, nightlife, lounges, danceclubs, jazzandblues, karaoke

I chose Yelp over alternative data sources (such as NYC Liquor License Authority databases) for several reasons. First, Yelp provides comprehensive coverage beyond just licensed establishments—including music venues and comedy clubs. Second, Yelp data includes venue characteristics like hours of operation that could support future analyses. Third, the Yelp API provides geocoded location data readily compatible with crime data analysis.

Auxiliary Data: NYC Taxi Zones

Source: TLC Taxi Zone Shapefile: https://d37ci6vzurychx.cloudfront.net/misc/taxi_zones.zip

Reference Unit: NYC Zip Code Boundaries
Source: https://data.cityofnewyork.us/Business/Zip-Code-Boundaries/i8iw-xf4u

Show code:
library(tidyverse)
library(knitr)

# Create time period reference table
time_periods <- tribble(
  ~time_window, ~hours, ~purpose,
  "Full Analysis", "8 PM–8 AM", "Complete analysis window",
  "NIGHTLIFE WINDOW", "8 PM–4 AM", "Primary focus: nightlife activity",
  "Early Nightlife", "8 PM–12 AM", "Nightlife ramp-up",
  "After Midnight", "12 AM–4 AM", "CORE FOCUS: Peak/closing",
  "POST-CLOSING (DAY)", "4 AM–8 AM", "Bars closed, daytime transition"
)

# Save it for future use
saveRDS(time_periods, "data/time_periods_reference.rds")
write_csv(time_periods, "data/time_periods_reference.csv")

# Display as formatted table (works in Quarto/RMarkdown)
kable(time_periods, 
      col.names = c("Time Window", "Hours", "Purpose"),
      caption = "Time Period Definitions")
Time Period Definitions
Time Window Hours Purpose
Full Analysis 8 PM–8 AM Complete analysis window
NIGHTLIFE WINDOW 8 PM–4 AM Primary focus: nightlife activity
Early Nightlife 8 PM–12 AM Nightlife ramp-up
After Midnight 12 AM–4 AM CORE FOCUS: Peak/closing
POST-CLOSING (DAY) 4 AM–8 AM Bars closed, daytime transition

Nightlife Zone Classification

I define nightlife zones (not using pre-existing classifications):

  1. Count Yelp nightlife venues per taxi zone using Yelp Fusion API
  2. Classify zones into three categories based on venue density:
Taxi Zone Classification by Nightlife Density
Nightlife Category Yelp Venues per Zone Example Taxi Zones Typical Venue Types
High Nightlife ≥15 venues East Village, Williamsburg (North/South), Lower East Side, Astoria Bars, nightclubs, dance clubs, jazz clubs, lounges, karaoke venues
Medium Nightlife ≥5 venues Park Slope, Midtown West, Long Island City, Greenpoint Neighborhood bars, lounges, occasional nightlife spots
Low Nightlife 0-1 venues Residential Staten Island, Outer Queens, Far uptown residential areas Minimal or no dedicated nightlife establishments

Venue Categories (from Yelp Fusion API): bars, nightlife, lounges, danceclubs, jazzandblues, karaoke

Rationale: These thresholds balance statistical power (adequate sample sizes in each category) with substantive meaning (capturing real variation in nightlife intensity across NYC neighborhoods). The ≥5 venue threshold for “High Nightlife” identifies true entertainment districts rather than isolated bars in residential areas.

My thresholds (≥15 for High, ≥5 for Medium, 0-1 for Low) were chosen based on the actual distribution of venues across NYC taxi zones—zones with 15+ venues represent true entertainment districts like East Village and Williamsburg, while this classification maximizes the ability to detect gradient effects in crime patterns. My classification differs from my teammates’ because we’re measuring different phenomena: I’m measuring nightlife infrastructure (venue density), while Dolma measures noise impact (complaints), Hari measures transportation demand (rideshare patterns), Chhin measures economic activity (employment), and Apu measures temporal disruption (COVID impacts)—each requiring different thresholds appropriate to their specific research questions.

Step 1: Download NYPD complaints data

Show code
#| code-fold: true
#| code-summary: "Show code:"
#| message: false
#| warning: false

suppressPackageStartupMessages(library(kableExtra))
library(tidyverse)
library(lubridate)
library(knitr)
library(kableExtra)

# Load raw data
nypd_raw <- readRDS("data/raw/nypd_complaints_raw.rds")

# Clean and filter to 8 PM - 8 AM
nypd_night <- nypd_raw %>%
  mutate(
    complaint_date = mdy(CMPLNT_FR_DT),
    complaint_hour = hour(hms::as_hms(CMPLNT_FR_TM)),
    complaint_year = year(complaint_date),
    day_of_week = wday(complaint_date, label = TRUE),
    is_weekend = day_of_week %in% c("Fri", "Sat", "Sun")
  ) %>%
  filter(complaint_year >= 2019, complaint_year <= 2023) %>%
  filter(complaint_hour >= 20 | complaint_hour <= 7) %>%
  filter(!is.na(Latitude), !is.na(Longitude), 
         Latitude != 0, Longitude != 0) %>%
  select(
    complaint_date, complaint_hour, complaint_year,
    day_of_week, is_weekend,
    offense_desc = OFNS_DESC,
    law_category = LAW_CAT_CD,
    borough = BORO_NM,
    precinct = ADDR_PCT_CD,
    latitude = Latitude,
    longitude = Longitude
  ) %>%
  mutate(
    time_period = case_when(
      complaint_hour >= 20 & complaint_hour <= 23 ~ "Early Nightlife (8PM-12AM)",
      complaint_hour >= 0 & complaint_hour <= 3 ~ "After Midnight (12AM-4AM)",
      complaint_hour >= 4 & complaint_hour <= 7 ~ "Post-Closing Day (4AM-8AM)"
    ),
    is_nightlife_window = complaint_hour >= 20 | complaint_hour <= 3,
    covid_period = case_when(
      complaint_date < as.Date("2020-03-01") ~ "Pre-COVID (2019)",
      complaint_date < as.Date("2022-01-01") ~ "During-COVID (2020-2021)",
      TRUE ~ "Post-COVID (2022-2023)"
    )
  )

# Create summary statistics
year_summary <- nypd_night %>%
  count(complaint_year) %>%
  mutate(pct = round(100 * n / sum(n), 1))

covid_summary <- nypd_night %>%
  count(covid_period) %>%
  mutate(pct = round(100 * n / sum(n), 1))

time_summary <- nypd_night %>%
  count(time_period) %>%
  mutate(pct = round(100 * n / sum(n), 1))

# Display tables
kable(year_summary, 
      col.names = c("Year", "Crime Incidents", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Crime Distribution by Year (2019-2023)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "8em") %>%
  column_spec(2:3, width = "10em")
Crime Distribution by Year (2019-2023)
Year Crime Incidents % of Total
2,019 174,606 19.1
2,020 160,773 17.6
2,021 169,824 18.6
2,022 199,048 21.8
2,023 210,462 23.0
Show code
kable(covid_summary, 
      col.names = c("COVID Period", "Crime Incidents", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Crime Distribution by COVID Period") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:3, width = "10em")
Crime Distribution by COVID Period
COVID Period Crime Incidents % of Total
During-COVID (2020-2021) 302,775 33.1
Post-COVID (2022-2023) 409,510 44.8
Pre-COVID (2019) 202,428 22.1
Show code
kable(time_summary, 
      col.names = c("Time Period", "Crime Incidents", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Crime Distribution by Time Window") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "14em") %>%
  column_spec(2:3, width = "10em")
Crime Distribution by Time Window
Time Period Crime Incidents % of Total
After Midnight (12AM-4AM) 293,427 32.1
Early Nightlife (8PM-12AM) 443,732 48.5
Post-Closing Day (4AM-8AM) 177,554 19.4
Show code
# Save processed data
saveRDS(nypd_night, "data/processed/nypd_night_clean.rds")

#Step 1b: Create 24-Hour Dataset

Show code:
library(tidyverse)
library(sf)
library(lubridate)
library(knitr)
library(kableExtra)

# Load data
nypd_raw <- readRDS("data/raw/nypd_complaints_raw.rds")
taxi_zones <- readRDS("data/raw/taxi_zones.rds") %>% filter(borough != "EWR")
nightlife_density <- readRDS("data/processed/nightlife_density_by_zone.rds")

# Process 24-hour data
nypd_24hour <- nypd_raw %>%
  mutate(
    complaint_date = mdy(CMPLNT_FR_DT),
    complaint_hour = hour(hms::as_hms(CMPLNT_FR_TM)),
    complaint_year = year(complaint_date)
  ) %>%
  filter(complaint_year >= 2019, complaint_year <= 2023) %>%
  filter(!is.na(Latitude), !is.na(Longitude), 
         Latitude != 0, Longitude != 0) %>%
  select(complaint_date, complaint_hour, complaint_year,
         latitude = Latitude, longitude = Longitude)

# Convert to spatial and join
nypd_24hour_sf <- st_as_sf(nypd_24hour, 
                            coords = c("longitude", "latitude"), 
                            crs = 4326, remove = FALSE)

nypd_24hour_zones <- st_join(nypd_24hour_sf, 
                              taxi_zones %>% select(LocationID, zone),
                              join = st_within)

# Handle crimes outside zones
nypd_outside <- nypd_24hour_zones %>% filter(is.na(LocationID))
if (nrow(nypd_outside) > 0) {
  nearest_indices <- st_nearest_feature(nypd_outside, taxi_zones)
  nypd_outside$LocationID <- taxi_zones$LocationID[nearest_indices]
  nypd_outside$zone <- taxi_zones$zone[nearest_indices]
  
  nypd_24hour_zones <- bind_rows(
    nypd_24hour_zones %>% filter(!is.na(LocationID)),
    nypd_outside
  )
}

# Finalize dataset
nypd_24hour_final <- nypd_24hour_zones %>%
  st_drop_geometry() %>%
  as_tibble() %>%
  left_join(nightlife_density %>% 
              select(LocationID, n_nightlife_venues, nightlife_category),
            by = "LocationID") %>%
  mutate(
    n_nightlife_venues = replace_na(n_nightlife_venues, 0),
    nightlife_category = replace_na(as.character(nightlife_category), "Low Nightlife"),
    nightlife_category = factor(nightlife_category,
                                levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife"))
  )

# Summary table
category_dist <- nypd_24hour_final %>%
  count(nightlife_category) %>%
  mutate(pct = round(100 * n / sum(n), 1))

kable(category_dist,
      col.names = c("Nightlife Zone Type", "Total Crimes", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "24-Hour Crime Distribution by Nightlife Density (2019-2023)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(category_dist$nightlife_category == "High Nightlife"), 
           background = "#FFE6E6") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:3, width = "10em")
24-Hour Crime Distribution by Nightlife Density (2019-2023)
Nightlife Zone Type Total Crimes % of Total
Low Nightlife 2,252,804 93.7
Medium Nightlife 144,660 6.0
High Nightlife 7,211 0.3
Show code:
# Save
saveRDS(nypd_24hour_final, "data/processed/nypd_24hour_with_zones.rds")

Step 2: Yelp Data

Show code:
library(tidyverse)
library(httr)
library(jsonlite)
library(knitr)
library(kableExtra)

YELP_API_KEY <- "gyiNfUOC5PXhdIGox2Nub0ySbLCk27y94S9IrPZPKbFpEm4xTMq9tEwreoCcoDxDsncwBUk90PfhH_j8GDutOtyHc0dI1tyUF9OCItF1b40hXdgd0pvMNAAMSZMsaXYx"
CACHE_FILE_CSV <- "data/raw/yelp_nightlife.csv"
CACHE_FILE_RDS <- "data/raw/yelp_nightlife.rds"

if (file.exists(CACHE_FILE_CSV)) {
  # Load from cache
  yelp_nightlife <- read_csv(CACHE_FILE_CSV, show_col_types = FALSE)
  
} else {
  # Download from Yelp API
  nyc_locations <- tribble(
    ~borough, ~latitude, ~longitude,
    "Manhattan", 40.7831, -73.9712,
    "Brooklyn", 40.6782, -73.9442,
    "Queens", 40.7282, -73.7949,
    "Bronx", 40.8448, -73.8648,
    "Staten Island", 40.5795, -74.1502
  )
  
  fetch_yelp_nightlife <- function(lat, lon, borough) {
    response <- GET(
      "https://api.yelp.com/v3/businesses/search",
      add_headers(Authorization = paste("Bearer", YELP_API_KEY)),
      query = list(
        latitude = lat, longitude = lon,
        categories = "bars,nightlife,lounges,danceclubs,jazzandblues,karaoke",
        limit = 50, radius = 10000
      )
    )
    
    if (status_code(response) == 200) {
      data <- content(response, "parsed")
      if (length(data$businesses) > 0) {
        return(map_df(data$businesses, ~tibble(
          id = .x$id, name = .x$name,
          lat = .x$coordinates$latitude,
          lon = .x$coordinates$longitude,
          rating = .x$rating,
          review_count = .x$review_count,
          categories = paste(map_chr(.x$categories, "title"), collapse = ", "),
          borough = borough
        )))
      }
    }
    return(tibble())
  }
  
  yelp_nightlife <- pmap_df(nyc_locations, ~fetch_yelp_nightlife(..2, ..3, ..1)) %>%
    distinct(id, .keep_all = TRUE)
  
  write_csv(yelp_nightlife, CACHE_FILE_CSV)
  saveRDS(yelp_nightlife, CACHE_FILE_RDS)
}

# Borough summary table
borough_summary <- yelp_nightlife %>%
  count(borough, sort = TRUE) %>%
  mutate(pct = round(100 * n / sum(n), 1))

kable(borough_summary,
      col.names = c("Borough", "Venues", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Yelp Nightlife Venues by Borough") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(1, background = "#E8F4F8") %>%
  column_spec(1, bold = TRUE, width = "10em") %>%
  column_spec(2:3, width = "8em")
Yelp Nightlife Venues by Borough
Borough Venues % of Total
Manhattan 1,221 60.6
Brooklyn 624 31.0
Queens 166 8.2
Bronx 5 0.2
Show code:
### **Note on Staten Island:** Staten Island shows zero nightlife venues in the Yelp API results due to its suburban, residential character with significantly lower bar/nightclub density compared to other boroughs, combined with API radius limitations (10km from borough center) that may miss geographically dispersed establishments. Additionally, Staten Island venues may have lower Yelp registration rates or be categorized as restaurants rather than nightlife establishments. This absence does not meaningfully impact the analysis since Staten Island accounts for only ~5% of NYC's population and an even smaller share of nightlife-related crime, with the study focusing on high-density entertainment districts concentrated in Manhattan, Brooklyn, and Queens.

# Preview top venues
preview <- yelp_nightlife %>%
  arrange(desc(review_count)) %>%
  head(5) %>%
  select(name, borough, rating, review_count, categories)

kable(preview,
      col.names = c("Venue Name", "Borough", "Rating", "Reviews", "Categories"),
      align = c("l", "l", "c", "r", "l"),
      caption = "Top 5 Most Reviewed Nightlife Venues") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(3, background = "#FFF9E6") %>%
  column_spec(5, width = "20em")
Top 5 Most Reviewed Nightlife Venues
Venue Name Borough Rating Reviews Categories
Olio e Più Manhattan 4.5 7079 Pizza, Italian, Cocktail Bars
Buddakan Manhattan 4.1 4871 Chinese, Bars, Asian Fusion
La Grande Boucherie Manhattan 4.5 4234 French, Steakhouses, Cocktail Bars
Blend on the Water Manhattan 3.8 4028 Latin American, Breakfast & Brunch, Lounges
Sea Thai Brooklyn 3.5 3624 Thai, Cocktail Bars, Seafood

Step 3: Download NYC Taxi Zones

Show code
library(tidyverse)
library(sf)
library(httr)
library(knitr)
library(kableExtra)

# Download taxi zone shapefile from TLC
taxi_url <- "https://d37ci6vzurychx.cloudfront.net/misc/taxi_zones.zip"

# Download to temp file
temp_zip <- tempfile(fileext = ".zip")
temp_dir <- tempdir()
download.file(taxi_url, temp_zip, mode = "wb")

# Unzip
unzip(temp_zip, exdir = temp_dir)

# Read shapefile
shp_file <- list.files(temp_dir, pattern = "\\.shp$", full.names = TRUE)
taxi_zones <- st_read(shp_file[grepl("taxi_zones", shp_file, ignore.case = TRUE)][1], quiet = TRUE)

# Transform to WGS84 and EXCLUDE Newark Airport (New Jersey)
taxi_zones <- taxi_zones %>%
  st_transform(4326) %>%
  filter(borough != "EWR")  # Remove Newark - not in NYC

# Create formatted table
borough_table <- taxi_zones %>%
  st_drop_geometry() %>%
  count(borough) %>%
  mutate(pct = round(100 * n / sum(n), 1))

kable(borough_table, 
      col.names = c("Borough", "Number of Zones", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "NYC Taxi Zone Distribution by Borough") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, bold = TRUE, width = "10em") %>%
  column_spec(2:3, width = "10em")
NYC Taxi Zone Distribution by Borough
Borough Number of Zones % of Total
Bronx 43 16.4
Brooklyn 61 23.3
Manhattan 69 26.3
Queens 69 26.3
Staten Island 20 7.6
Show code
# Save for future use
saveRDS(taxi_zones, "data/raw/taxi_zones.rds")
st_write(taxi_zones, "data/raw/taxi_zones.geojson", delete_dsn = TRUE, quiet = TRUE)

Step 3b: Create Taxi Zone → Zip Code Crosswalk

Show code
library(tidyverse)
library(sf)
library(httr)
library(knitr)
library(kableExtra)

# Load taxi zones and EXCLUDE Newark Airport
taxi_zones <- readRDS("data/raw/taxi_zones.rds") %>%
  filter(borough != "EWR")  # Remove Newark - it's in New Jersey

# Download NYC zip codes from NYC Open Data
zip_url <- "https://data.cityofnewyork.us/api/geospatial/i8iw-xf4u?method=export&format=Shapefile"

# Download to temp file
temp_zip <- tempfile(fileext = ".zip")
temp_dir <- tempdir()
response <- GET(zip_url, write_disk(temp_zip, overwrite = TRUE))

if (status_code(response) == 200) {
  # Unzip
  unzip(temp_zip, exdir = temp_dir)
  
  # Find the .shp file
  shp_file <- list.files(temp_dir, pattern = "\\.shp$", full.names = TRUE)
  shp_file <- shp_file[!grepl("taxi", shp_file, ignore.case = TRUE)][1]
  
  # Read shapefile
  zip_codes <- st_read(shp_file, quiet = TRUE)
  
  # Check available columns and rename
  if ("modzcta" %in% names(zip_codes)) {
    zip_codes <- zip_codes %>% rename(ZIPCODE = modzcta)
  } else if ("ZIPCODE" %in% names(zip_codes)) {
    # Already has ZIPCODE
  } else if ("ZCTA5CE10" %in% names(zip_codes)) {
    zip_codes <- zip_codes %>% rename(ZIPCODE = ZCTA5CE10)
  } else if ("GEOID" %in% names(zip_codes)) {
    zip_codes <- zip_codes %>% rename(ZIPCODE = GEOID)
  }
  
  # Ensure WGS84 projection
  zip_codes <- zip_codes %>%
    st_transform(4326) %>%
    select(ZIPCODE, geometry)
  
} else {
  # ALTERNATIVE: Download from US Census
  library(tigris)
  options(tigris_use_cache = TRUE)
  
  zip_codes <- zctas(year = 2010) %>%
    st_transform(4326) %>%
    filter(substr(ZCTA5CE10, 1, 2) %in% c("10", "11")) %>%
    rename(ZIPCODE = ZCTA5CE10) %>%
    select(ZIPCODE, geometry)
}

# Create the crosswalk
taxi_centroids <- taxi_zones %>% st_centroid()

zone_zip_crosswalk <- st_join(taxi_centroids, 
                               zip_codes,
                               join = st_within) %>%
  st_drop_geometry() %>%
  select(LocationID, zone, borough, ZIPCODE) %>%
  as_tibble()

# Handle any taxi zones that didn't match
missing_zips <- zone_zip_crosswalk %>% filter(is.na(ZIPCODE))

if (nrow(missing_zips) > 0) {
  missing_centroids <- taxi_centroids %>%
    filter(LocationID %in% missing_zips$LocationID)
  
  nearest_indices <- st_nearest_feature(missing_centroids, zip_codes)
  
  for (i in 1:nrow(missing_zips)) {
    loc_id <- missing_zips$LocationID[i]
    nearest_zip <- zip_codes$ZIPCODE[nearest_indices[i]]
    
    zone_zip_crosswalk <- zone_zip_crosswalk %>%
      mutate(ZIPCODE = if_else(LocationID == loc_id, nearest_zip, ZIPCODE))
  }
}

# Remove any remaining NAs
zone_zip_crosswalk <- zone_zip_crosswalk %>% filter(!is.na(ZIPCODE))

# Sample mappings table
sample_mappings <- zone_zip_crosswalk %>%
  filter(!is.na(zone)) %>%
  select(LocationID, zone, borough, ZIPCODE) %>%
  arrange(LocationID) %>%
  head(15)

kable(sample_mappings,
      col.names = c("Taxi Zone ID", "Taxi Zone Name", "Borough", "Zip Code"),
      align = c("r", "l", "l", "r"),
      caption = "Sample Taxi Zone to Zip Code Mappings") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, width = "8em") %>%
  column_spec(2, bold = TRUE, width = "18em") %>%
  column_spec(3:4, width = "10em")
Sample Taxi Zone to Zip Code Mappings
Taxi Zone ID Taxi Zone Name Borough Zip Code
2 Jamaica Bay Queens 11693
3 Allerton/Pelham Gardens Bronx 10469
4 Alphabet City Manhattan 10009
5 Arden Heights Staten Island 10312
6 Arrochar/Fort Wadsworth Staten Island 10305
7 Astoria Queens 11106
8 Astoria Park Queens 11102
9 Auburndale Queens 11358
10 Baisley Park Queens 11434
11 Bath Beach Brooklyn 11214
12 Battery Park Manhattan 10004
13 Battery Park City Manhattan 10280
14 Bay Ridge Brooklyn 11209
15 Bay Terrace/Fort Totten Queens 11360
16 Bayside Queens 11361
Show code
# Nightlife area mappings
nightlife_examples <- zone_zip_crosswalk %>%
  filter(zone %in% c("East Village", "Lower East Side", "Williamsburg (North Side)", 
                     "Williamsburg (South Side)", "Astoria", "Park Slope",
                     "Midtown West", "Lower Manhattan")) %>%
  select(LocationID, zone, borough, ZIPCODE)

if (nrow(nightlife_examples) > 0) {
  kable(nightlife_examples,
        col.names = c("Taxi Zone ID", "Taxi Zone Name", "Borough", "Zip Code"),
        align = c("r", "l", "l", "r"),
        caption = "Example Nightlife Area Mappings") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                  full_width = FALSE,
                  font_size = 14) %>%
    row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
    column_spec(1, width = "8em") %>%
    column_spec(2, bold = TRUE, width = "18em") %>%
    column_spec(3:4, width = "10em")
}
Example Nightlife Area Mappings
Taxi Zone ID Taxi Zone Name Borough Zip Code
7 Astoria Queens 11106
79 East Village Manhattan 10003
148 Lower East Side Manhattan 10002
181 Park Slope Brooklyn 11215
255 Williamsburg (North Side) Brooklyn 11211
256 Williamsburg (South Side) Brooklyn 11211
Show code
# Save crosswalk files
saveRDS(zone_zip_crosswalk, "data/processed/zone_zip_crosswalk.rds")
write_csv(zone_zip_crosswalk, "data/processed/zone_zip_crosswalk.csv")

Step 4: Classify Crime Types

Show code:
library(tidyverse)
library(knitr)
library(kableExtra)

# Load cleaned data
nypd_night <- readRDS("data/processed/nypd_night_clean.rds")

# Add crime classifications
nypd_night <- nypd_night %>%
  mutate(
    crime_category = case_when(
      str_detect(offense_desc, regex("assault|robbery|rape|murder|homicide|shooting|sex", 
                                     ignore_case = TRUE)) ~ "Violent",
      str_detect(offense_desc, regex("larceny|burglary|theft|stolen|auto", 
                                     ignore_case = TRUE)) ~ "Property",
      str_detect(offense_desc, regex("mischief|harassment|disorderly|intoxicated|dui|weapon|menacing", 
                                     ignore_case = TRUE)) ~ "Disorder",
      str_detect(offense_desc, regex("drug|narcotic|marijuana|controlled", 
                                     ignore_case = TRUE)) ~ "Drug",
      TRUE ~ "Other"
    ),
    is_felony = law_category == "FELONY"
  )

# Crime type distribution
crime_summary <- nypd_night %>%
  count(crime_category, sort = TRUE, name = "total_crimes") %>%
  mutate(pct = round(100 * total_crimes / sum(total_crimes), 1))

kable(crime_summary,
      col.names = c("Crime Category", "Number of Crimes", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Crime Distribution by Type (8PM-8AM, 2019-2023)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(crime_summary$crime_category == "Violent"), 
           background = "#FFE6E6") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:3, width = "10em")
Crime Distribution by Type (8PM-8AM, 2019-2023)
Crime Category Number of Crimes % of Total
Property 268,111 29.3
Other 261,900 28.6
Violent 234,043 25.6
Disorder 132,975 14.5
Drug 17,684 1.9
Show code:
# Save
saveRDS(nypd_night, "data/processed/nypd_night_classified.rds")

Step 5: Spatial Join - Assign Crimes to Taxi Zones

Show code
library(tidyverse)
library(sf)
library(knitr)
library(kableExtra)

# Load data
nypd_night <- readRDS("data/processed/nypd_night_classified.rds")
taxi_zones <- readRDS("data/raw/taxi_zones.rds")

# Convert NYPD data to spatial
nypd_sf <- nypd_night %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)

# Spatial join
nypd_zones <- st_join(nypd_sf, 
                      taxi_zones %>% select(LocationID, zone), 
                      join = st_within)

# Handle crimes outside zones
nypd_outside <- nypd_zones %>% filter(is.na(LocationID))

if (nrow(nypd_outside) > 0) {
  nearest_indices <- st_nearest_feature(nypd_outside, taxi_zones)
  nypd_outside$LocationID <- taxi_zones$LocationID[nearest_indices]
  nypd_outside$zone <- taxi_zones$zone[nearest_indices]
  
  nypd_zones <- nypd_zones %>%
    filter(!is.na(LocationID)) %>%
    bind_rows(nypd_outside)
}

# Convert to dataframe
nypd_zones_df <- nypd_zones %>%
  st_drop_geometry() %>%
  as_tibble() %>%
  mutate(
    borough = case_when(
      is.na(borough) | borough == "" | borough == "(null)" ~ "Unknown",
      TRUE ~ as.character(borough)
    )
  )

# Borough distribution table
if ("borough" %in% names(nypd_zones_df)) {
  borough_dist <- nypd_zones_df %>%
    count(borough, sort = TRUE) %>%
    mutate(pct = round(100 * n / sum(n), 1))
  
  kable(borough_dist,
        col.names = c("Borough", "Number of Crimes", "% of Total"),
        format.args = list(big.mark = ","),
        align = c("l", "r", "r"),
        caption = "Crime Distribution by Borough (8PM-8AM, 2019-2023)") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                  full_width = FALSE,
                  font_size = 14) %>%
    row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
    column_spec(1, bold = TRUE, width = "12em") %>%
    column_spec(2:3, width = "10em")
}
Crime Distribution by Borough (8PM-8AM, 2019-2023)
Borough Number of Crimes % of Total
BROOKLYN 260,647 28.5
QUEENS 208,347 22.8
MANHATTAN 205,022 22.4
BRONX 202,667 22.2
STATEN ISLAND 37,480 4.1
Unknown 550 0.1
Show code
# Top 10 zones table
top_zones <- nypd_zones_df %>%
  count(LocationID, zone, borough, sort = TRUE, name = "total_crimes") %>%
  head(10)

kable(top_zones,
      col.names = c("Location ID", "Zone Name", "Borough", "Total Crimes"),
      format.args = list(big.mark = ","),
      align = c("r", "l", "l", "r"),
      caption = "Top 10 Taxi Zones by Crime Count (8PM-8AM, 2019-2023)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, width = "8em") %>%
  column_spec(2, bold = TRUE, width = "18em") %>%
  column_spec(3:4, width = "10em")
Top 10 Taxi Zones by Crime Count (8PM-8AM, 2019-2023)
Location ID Zone Name Borough Total Crimes
76 East New York BROOKLYN 18,675
61 Crown Heights North BROOKLYN 15,328
74 East Harlem North MANHATTAN 13,086
168 Mott Haven/Port Morris BRONX 13,000
42 Central Harlem North MANHATTAN 12,934
35 Brownsville BROOKLYN 11,592
129 Jackson Heights QUEENS 11,285
37 Bushwick South BROOKLYN 10,673
225 Stuyvesant Heights BROOKLYN 10,117
130 Jamaica QUEENS 10,049
Show code
# Save both versions
saveRDS(nypd_zones_df, "data/processed/nypd_night_with_zones.rds")
saveRDS(nypd_zones, "data/processed/nypd_night_with_zones_sf.rds")

Step 6: Calculate Nightlife Density by Zone

Show code
library(tidyverse)
library(sf)
library(knitr)
library(kableExtra)

# Load data
yelp_nightlife <- readRDS("data/raw/yelp_nightlife.rds")
taxi_zones <- readRDS("data/raw/taxi_zones.rds")

# Convert Yelp to spatial
yelp_sf <- yelp_nightlife %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

# Spatial join to zones
yelp_zones <- st_join(yelp_sf, taxi_zones, join = st_within)

# Fix borough columns: use taxi zone's borough (more accurate)
if ("borough.y" %in% names(yelp_zones)) {
  yelp_zones <- yelp_zones %>%
    select(-borough.x) %>%
    rename(borough = borough.y)
} else if ("borough.x" %in% names(yelp_zones)) {
  yelp_zones <- yelp_zones %>%
    rename(borough = borough.x)
}

# Calculate density per zone
nightlife_density <- yelp_zones %>%
  st_drop_geometry() %>%
  filter(!is.na(LocationID)) %>%
  group_by(LocationID, zone, borough) %>%
  summarise(
    n_nightlife_venues = n(),
    avg_rating = mean(rating, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    nightlife_category = case_when(
      n_nightlife_venues >= 15 ~ "High Nightlife",
      n_nightlife_venues >= 5 ~ "Medium Nightlife",
      TRUE ~ "Low Nightlife"
    ),
    nightlife_category = factor(nightlife_category, 
                                 levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife"))
  )

# Distribution table
density_summary <- nightlife_density %>%
  count(nightlife_category) %>%
  mutate(pct = round(100 * n / sum(n), 1))

kable(density_summary,
      col.names = c("Nightlife Category", "Number of Zones", "% of Total"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r"),
      caption = "Distribution of Taxi Zones by Nightlife Density") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(density_summary$nightlife_category == "High Nightlife"), 
           background = "#FFE6E6") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:3, width = "10em")
Distribution of Taxi Zones by Nightlife Density
Nightlife Category Number of Zones % of Total
Low Nightlife 70 83.3
Medium Nightlife 13 15.5
High Nightlife 1 1.2
Show code
# Top 10 nightlife zones
top_nightlife <- nightlife_density %>%
  arrange(desc(n_nightlife_venues)) %>%
  head(10) %>%
  mutate(avg_rating = round(avg_rating, 2))

kable(top_nightlife,
      col.names = c("Location ID", "Zone Name", "Borough",
                    "Venues", "Avg Rating", "Category"),
      format.args = list(big.mark = ","),
      align = c("r", "l", "l", "r", "c", "l"),
      caption = "Top 10 Zones by Nightlife Venue Count") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  column_spec(1, width = "8em") %>%
  column_spec(2, bold = TRUE, width = "16em") %>%
  column_spec(3, width = "10em") %>%
  column_spec(4:5, width = "8em") %>%
  column_spec(6, width = "12em")
Top 10 Zones by Nightlife Venue Count
Location ID Zone Name Borough Venues Avg Rating Category
239 Upper West Side South Manhattan 16 4.24 High Nightlife
95 Forest Hills Queens 12 4.11 Medium Nightlife
92 Flushing Queens 7 4.34 Medium Nightlife
118 Heartland Village/Todt Hill Staten Island 7 3.94 Medium Nightlife
171 Murray Hill-Queens Queens 7 4.36 Medium Nightlife
61 Crown Heights North Brooklyn 6 4.28 Medium Nightlife
255 Williamsburg (North Side) Brooklyn 6 4.22 Medium Nightlife
24 Bloomingdale Manhattan 5 4.44 Medium Nightlife
109 Great Kills Staten Island 5 3.86 Medium Nightlife
127 Inwood Manhattan 5 4.12 Medium Nightlife
Show code
# Save
saveRDS(nightlife_density, "data/processed/nightlife_density_by_zone.rds")

Step 7: Create Final Analysis Dataset

Show code
library(tidyverse)
library(knitr)
library(kableExtra)

# Load data
nypd_zones_df <- readRDS("data/processed/nypd_night_with_zones.rds")
nightlife_density <- readRDS("data/processed/nightlife_density_by_zone.rds")

# Merge crime data with nightlife density
nypd_final <- nypd_zones_df %>%
  left_join(
    nightlife_density %>% select(LocationID, n_nightlife_venues, nightlife_category),
    by = "LocationID"
  ) %>%
  mutate(
    n_nightlife_venues = replace_na(n_nightlife_venues, 0),
    nightlife_category = replace_na(as.character(nightlife_category), "Low Nightlife"),
    nightlife_category = factor(nightlife_category, 
                                 levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife"))
  )

# Calculate distributions
nightlife_dist <- nypd_final %>%
  count(nightlife_category) %>%
  mutate(
    category = as.character(nightlife_category),
    pct = round(100 * n / sum(n), 1)
  ) %>%
  select(category, n, pct)

nightlife_subtotal <- tibble(
  category = "Subtotal - Nightlife",
  n = sum(nightlife_dist$n),
  pct = 100.0
)

time_dist <- nypd_final %>%
  count(time_period) %>%
  mutate(
    category = as.character(time_period),
    pct = round(100 * n / sum(n), 1)
  ) %>%
  select(category, n, pct)

time_subtotal <- tibble(
  category = "Subtotal - Time Period",
  n = sum(time_dist$n),
  pct = 100.0
)

crime_dist <- nypd_final %>%
  count(crime_category) %>%
  mutate(
    category = as.character(crime_category),
    pct = round(100 * n / sum(n), 1)
  ) %>%
  select(category, n, pct)

crime_subtotal <- tibble(
  category = "Subtotal - Crime Type",
  n = sum(crime_dist$n),
  pct = 100.0
)

grand_total <- tibble(
  category = "GRAND TOTAL",
  n = nrow(nypd_final),
  pct = 100.0
)

# Combine everything
all_data <- bind_rows(
  nightlife_dist,
  nightlife_subtotal,
  time_dist,
  time_subtotal,
  crime_dist,
  crime_subtotal,
  grand_total
)

# Determine colors
data_rows_idx <- c(1:3, 5:7, 9:13)
data_values <- all_data$n[data_rows_idx]
high_val <- quantile(data_values, 0.67)
low_val <- quantile(data_values, 0.33)

colors <- character(nrow(all_data))
for (i in 1:nrow(all_data)) {
  if (i %in% data_rows_idx) {
    if (all_data$n[i] >= high_val) {
      colors[i] <- "#ffcccc"
    } else if (all_data$n[i] >= low_val) {
      colors[i] <- "#ffffcc"
    } else {
      colors[i] <- "#ccffcc"
    }
  } else if (i == 15) {
    colors[i] <- "#b0b0b0"
  } else {
    colors[i] <- "#d3d3d3"
  }
}

# Create table
result_table <- all_data %>%
  kable(
    col.names = c("Category", "Number of Crimes", "%"),
    align = c("l", "r", "r"),
    format.args = list(big.mark = ","),
    format = "html",
    escape = FALSE,
    caption = "Consolidated Crime Summary by Nightlife Level, Time Period, and Crime Type"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  column_spec(2, background = colors, bold = TRUE, color = "black") %>%
  column_spec(3, background = colors, color = "black") %>%
  pack_rows("Nightlife Level", 1, 4, label_row_css = "background-color: #2C3E50; color: white; font-weight: bold;") %>%
  pack_rows("Time Period", 5, 8, label_row_css = "background-color: #2C3E50; color: white; font-weight: bold;") %>%
  pack_rows("Crime Type", 9, 14, label_row_css = "background-color: #2C3E50; color: white; font-weight: bold;") %>%
  row_spec(c(4, 8, 14), bold = TRUE, italic = TRUE, background = "#d3d3d3") %>%
  row_spec(15, bold = TRUE, font_size = 16, background = "#b0b0b0") %>%
  footnote(general = "Colors: Green = Low volume | Yellow = Medium volume | Red = High volume. Percentages within each section sum to 100%",
           general_title = "Note:")

result_table
Consolidated Crime Summary by Nightlife Level, Time Period, and Crime Type
Category Number of Crimes %
Nightlife Level
Low Nightlife 858,140 93.8
Medium Nightlife 54,104 5.9
High Nightlife 2,469 0.3
Subtotal - Nightlife 914,713 100.0
Time Period
After Midnight (12AM-4AM) 293,427 32.1
Early Nightlife (8PM-12AM) 443,732 48.5
Post-Closing Day (4AM-8AM) 177,554 19.4
Subtotal - Time Period 914,713 100.0
Crime Type
Disorder 132,975 14.5
Drug 17,684 1.9
Other 261,900 28.6
Property 268,111 29.3
Violent 234,043 25.6
Subtotal - Crime Type 914,713 100.0
GRAND TOTAL 914,713 100.0
Note:
Colors: Green = Low volume | Yellow = Medium volume | Red = High volume. Percentages within each section sum to 100%
Show code
# Save
saveRDS(nypd_final, "data/processed/nypd_analysis_final.rds")

PHASE 2: Statistical Analysis and Visualization

Step 1: Summary Statistics & Key Findings

Show code
library(tidyverse)
library(knitr)
library(kableExtra)

# Load final dataset
nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# After-midnight spike analysis
spike_stats <- nypd_final %>%
  filter(time_period %in% c("Early Nightlife (8PM-12AM)", 
                            "After Midnight (12AM-4AM)")) %>%
  group_by(nightlife_category, time_period) %>%
  summarise(total_crimes = n(), .groups = "drop") %>%
  pivot_wider(names_from = time_period, values_from = total_crimes) %>%
  mutate(
    early_rate = `Early Nightlife (8PM-12AM)` / 4,
    midnight_rate = `After Midnight (12AM-4AM)` / 4,
    spike_pct = round(100 * (midnight_rate / early_rate - 1), 1)
  )

kable(spike_stats %>% select(nightlife_category, `Early Nightlife (8PM-12AM)`, 
                              `After Midnight (12AM-4AM)`, spike_pct),
      col.names = c("Nightlife Level", "Early (8PM-12AM)", "After Midnight (12AM-4AM)", "Spike %"),
      format.args = list(big.mark = ","),
      align = c("l", "r", "r", "r"),
      caption = "After-Midnight Crime Spike by Nightlife Density") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(spike_stats$nightlife_category == "High Nightlife"), 
           background = "#FFE6E6") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:4, width = "12em") %>%
  column_spec(4, bold = TRUE, color = "red")
After-Midnight Crime Spike by Nightlife Density
Nightlife Level Early (8PM-12AM) After Midnight (12AM-4AM) Spike %
Low Nightlife 416,687 274,636 -34.1
Medium Nightlife 25,910 17,862 -31.1
High Nightlife 1,135 929 -18.1
Show code
# Save
write_csv(spike_stats, "output/tables/after_midnight_spike.csv")

VISUALIZATION 1: Hourly Crime Patterns (Full 24-Hour Context)

Show code:
library(tidyverse)

# Load 24-hour dataset WITH zones and nightlife categories
nypd_24hour <- readRDS("data/processed/nypd_24hour_with_zones.rds")


# Aggregate by hour and nightlife category
hourly_patterns_24 <- nypd_24hour %>%
  group_by(complaint_hour, nightlife_category) %>%
  summarise(total_crimes = n(), .groups = "drop")

# Create the plot
viz1 <- ggplot(hourly_patterns_24, 
               aes(x = complaint_hour, y = total_crimes, 
                   color = nightlife_category, group = nightlife_category)) +
  # Shade the analysis window (8PM-8AM)
  annotate("rect", xmin = 20, xmax = 24, ymin = 0, ymax = Inf,
           fill = "yellow", alpha = 0.15) +
  annotate("rect", xmin = 0, xmax = 8, ymin = 0, ymax = Inf,
           fill = "yellow", alpha = 0.15) +
# Add text label for daytime
annotate("text", x = 14, y = max(hourly_patterns_24$total_crimes) * 0.95,
         label = "DAYTIME\n", 
         color = "gray50", size = 5, fontface = "italic") +
  # Main lines
  geom_line(size = 1.5) +
  geom_point(size = 2.5) +
  # Vertical reference lines
  geom_vline(xintercept = 20, linetype = "dashed", color = "darkgreen", size = 1) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red", size = 1) +
  geom_vline(xintercept = 4, linetype = "dashed", color = "blue", size = 1) +
  geom_vline(xintercept = 8, linetype = "dashed", color = "darkgreen", size = 1) +
  # Labels for key times
  annotate("text", x = 20.3, y = max(hourly_patterns_24$total_crimes) * 0.88, 
           label = "8PM\n(Analysis\nStarts)", color = "darkgreen", hjust = 0, size = 3.5, fontface = "bold") +
  annotate("text", x = 0.3, y = max(hourly_patterns_24$total_crimes) * 0.88, 
           label = "Midnight", color = "red", hjust = 0, size = 3.5, fontface = "bold") +
  annotate("text", x = 4.3, y = max(hourly_patterns_24$total_crimes) * 0.88, 
           label = "4AM\n(Bars Close)", color = "blue", hjust = 0, size = 3.5, fontface = "bold") +
  annotate("text", x = 8.3, y = max(hourly_patterns_24$total_crimes) * 0.88, 
           label = "8AM\n(Analysis\nEnds)", color = "darkgreen", hjust = 0, size = 3.5, fontface = "bold") +
  # Color scheme
  scale_color_manual(values = c("Low Nightlife" = "#2C7BB6", 
                                 "Medium Nightlife" = "#FDB863", 
                                 "High Nightlife" = "#D7191C")) +
  # X-axis - all 24 hours
  scale_x_continuous(
    breaks = seq(0, 23, 2),
    labels = c("12AM", "2AM", "4AM", "6AM", "8AM", "10AM",
               "12PM", "2PM", "4PM", "6PM", "8PM", "10PM")
  ) +
  # Y-axis
  scale_y_continuous(labels = scales::comma) +
  # Labels
  labs(
    title = "24-Hour Crime Patterns: Nightlife Window (8PM-8AM) vs. Daytime Baseline",
    subtitle = "Yellow shading = Analysis focus (8PM-8AM) | All zones assigned via lat/lon spatial join | NYC 2019-2023",
    x = "Hour of Day",
    y = "Total Crime Incidents",
    color = "Nightlife Density" ) +
 
   # Theme
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 17),
    plot.subtitle = element_text(size = 10, color = "gray40"),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    plot.caption = element_text(size = 9, color = "gray50", hjust = 0),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

# Display
print(viz1)

Show code:
# Save
ggsave("output/visualizations/viz1_hourly_crime_patterns_24hr.png", 
       viz1, width = 14, height = 8, dpi = 300)

VISUALIZATION 2: Interactive Crime Intensity Heatmap

Show code
library(tidyverse)
library(plotly)
library(knitr)
library(kableExtra)

# Load 24-hour dataset
nypd_24hour <- readRDS("data/processed/nypd_24hour_with_zones.rds")

# Create heatmap data with three time windows
heatmap_data <- nypd_24hour %>%
  mutate(
    time_window = case_when(
      complaint_hour >= 8 & complaint_hour <= 19 ~ "Daytime 8AM-8PM",
      complaint_hour >= 20 | complaint_hour <= 3 ~ "Nightlife 8PM-4AM",
      complaint_hour >= 4 & complaint_hour <= 7 ~ "Post-Closing 4AM-8AM"
    ),
    time_window = factor(time_window,
                        levels = c("Daytime 8AM-8PM",
                                  "Nightlife 8PM-4AM",
                                  "Post-Closing 4AM-8AM"))
  ) %>%
  filter(!is.na(time_window)) %>%
  group_by(time_window, nightlife_category) %>%
  summarise(total_crimes = n(), .groups = "drop") %>%
  mutate(
    hours_in_window = case_when(
      time_window == "Daytime 8AM-8PM" ~ 12,
      time_window == "Nightlife 8PM-4AM" ~ 8,
      time_window == "Post-Closing 4AM-8AM" ~ 4
    ),
    crimes_per_hour = round(total_crimes / hours_in_window, 0)
  )

# Save for potential reuse
dir.create("data/processed", showWarnings = FALSE, recursive = TRUE)
saveRDS(heatmap_data, "data/processed/heatmap_data.rds")

# Create matrix for heatmap visualization
heatmap_matrix <- heatmap_data %>%
  select(time_window, nightlife_category, crimes_per_hour) %>%
  pivot_wider(names_from = nightlife_category, values_from = crimes_per_hour) %>%
  column_to_rownames("time_window") %>%
  as.matrix()

# Create hover text
hover_text <- heatmap_data %>%
  mutate(
    text = paste0(
      "<b>", time_window, "</b><br>",
      "<b>", nightlife_category, "</b><br><br>",
      "Crimes per Hour: <b>", format(crimes_per_hour, big.mark = ","), "</b><br>",
      "Total Crimes: ", format(total_crimes, big.mark = ","), "<br>",
      "Window Duration: ", hours_in_window, " hours"
    )
  ) %>%
  select(time_window, nightlife_category, text) %>%
  pivot_wider(names_from = nightlife_category, values_from = text) %>%
  column_to_rownames("time_window") %>%
  as.matrix()

# Create interactive plotly heatmap
viz2 <- plot_ly(
  z = heatmap_matrix,
  x = colnames(heatmap_matrix),
  y = rownames(heatmap_matrix),
  type = "heatmap",
  text = hover_text,
  hovertemplate = "%{text}<extra></extra>",
  colorscale = list(
    c(0, "rgb(255, 237, 160)"),
    c(0.5, "rgb(254, 178, 76)"),
    c(1, "rgb(240, 59, 32)")
  ),
  colorbar = list(
    title = list(text = "Crimes<br>per Hour"),
    len = 0.7,
    thickness = 20
  )
) %>%
  add_annotations(
    x = rep(colnames(heatmap_matrix), each = nrow(heatmap_matrix)),
    y = rep(rownames(heatmap_matrix), times = ncol(heatmap_matrix)),
    text = format(as.vector(t(heatmap_matrix)), big.mark = ","),
    font = list(size = 16, family = "Arial Black", color = "white"),
    showarrow = FALSE
  ) %>%
  layout(
    title = list(
      text = "<b>When Is NYC Most Dangerous? Crime Rates Across Different Times of Day</b><br><sub>Hover for details | Darker red = More crimes per hour | NYC 2019-2023</sub>",
      x = 0.5
    ),
    xaxis = list(title = "<b>Nightlife Zone Category</b>", side = "bottom"),
    yaxis = list(title = "", autorange = "reversed"),
    margin = list(l = 120, r = 150, t = 100, b = 80)
  )

viz2
Show code
# Save visualization
dir.create("output/visualizations", showWarnings = FALSE, recursive = TRUE)
htmlwidgets::saveWidget(viz2, 
                        file = "output/visualizations/viz2_crime_heatmap_interactive.html",
                        selfcontained = TRUE)

# CREATE SUMMARY TABLE
summary_enhanced <- heatmap_data %>%
  arrange(nightlife_category, time_window) %>%
  mutate(
    intensity_level = case_when(
      crimes_per_hour >= quantile(crimes_per_hour, 0.75) ~ "Very High",
      crimes_per_hour >= quantile(crimes_per_hour, 0.50) ~ "High",
      crimes_per_hour >= quantile(crimes_per_hour, 0.25) ~ "Medium",
      TRUE ~ "Low"
    ),
    intensity_level = factor(intensity_level, levels = c("Low", "Medium", "High", "Very High"))
  )

# Simple clean table with legend
summary_enhanced %>%
  mutate(
    Zone = as.character(nightlife_category),
    Period = as.character(time_window),
    Crimes = format(total_crimes, big.mark = ","),
    Hours = hours_in_window,
    Rate = format(crimes_per_hour, big.mark = ","),
    Risk = as.character(intensity_level)
  ) %>%
  select(Zone, Period, Crimes, Hours, Rate, Risk) %>%
  kable(caption = "Crime Intensity by Zone and Time (NYC 2019-2023)",
        align = c("l", "l", "r", "c", "r", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = TRUE,
                font_size = 13) %>%
  row_spec(0, bold = TRUE, background = "#2C3E50", color = "white") %>%
  row_spec(which(summary_enhanced$intensity_level == "Very High"),
           background = "#E74C3C", color = "white", bold = TRUE) %>%
  row_spec(which(summary_enhanced$intensity_level == "High"),
           background = "#E67E22", color = "white", bold = TRUE) %>%
  row_spec(which(summary_enhanced$intensity_level == "Medium"),
           background = "#F39C12", color = "black") %>%
  row_spec(which(summary_enhanced$intensity_level == "Low"),
           background = "#27AE60", color = "white", bold = TRUE) %>%
  pack_rows("Low Nightlife Zones", 1, 3, 
            label_row_css = "background-color: #34495E; color: white; font-weight: bold;") %>%
  pack_rows("Medium Nightlife Zones", 4, 6,
            label_row_css = "background-color: #34495E; color: white; font-weight: bold;") %>%
  pack_rows("High Nightlife Zones", 7, 9,
            label_row_css = "background-color: #34495E; color: white; font-weight: bold;") %>%
  footnote(
    general = paste0(
      "<span style='background-color: #27AE60; color: white; padding: 2px 8px; border-radius: 3px;'><b>Low</b></span> ",
      "<span style='background-color: #F39C12; color: black; padding: 2px 8px; border-radius: 3px;'><b>Medium</b></span> ",
      "<span style='background-color: #E67E22; color: white; padding: 2px 8px; border-radius: 3px;'><b>High</b></span> ",
      "<span style='background-color: #E74C3C; color: white; padding: 2px 8px; border-radius: 3px;'><b>Very High</b></span> ",
      "| Risk levels: Low = Bottom 25% | Medium = 25-50% | High = 50-75% | Very High = Top 25%"
    ),
    general_title = "Legend:",
    escape = FALSE
  )
Crime Intensity by Zone and Time (NYC 2019-2023)
Zone Period Crimes Hours Rate Risk
Low Nightlife Zones
Low Nightlife Daytime 8AM-8PM 1,394,664 12 116,222 Very High
Low Nightlife Nightlife 8PM-4AM 691,323 8 86,415 Very High
Low Nightlife Post-Closing 4AM-8AM 166,817 4 41,704 Very High
Medium Nightlife Zones
Medium Nightlife Daytime 8AM-8PM 90,556 12 7,546 High
Medium Nightlife Nightlife 8PM-4AM 43,772 8 5,472 High
Medium Nightlife Post-Closing 4AM-8AM 10,332 4 2,583 Medium
High Nightlife Zones
High Nightlife Daytime 8AM-8PM 4,742 12 395 Medium
High Nightlife Nightlife 8PM-4AM 2,064 8 258 Low
High Nightlife Post-Closing 4AM-8AM 405 4 101 Low
Legend:
Low Medium High Very High | Risk levels: Low = Bottom 25% | Medium = 25-50% | High = 50-75% | Very High = Top 25%

VISUALIZATION 3: Interactive Geospatial Map - Crime & Nightlife Density by Zone

Show code:
library(tidyverse)
library(sf)
library(leaflet)
library(htmlwidgets)


# Load data
nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")
taxi_zones <- readRDS("data/raw/taxi_zones.rds")

# Aggregate by zone with spatial data
zone_summary <- nypd_final %>%
  group_by(LocationID, zone, borough, nightlife_category, n_nightlife_venues) %>%
  summarise(
    total_crimes = n(),
    violent_crimes = sum(crime_category == "Violent"),
    property_crimes = sum(crime_category == "Property"),
    disorder_crimes = sum(crime_category == "Disorder"),
    pct_after_midnight = 100 * mean(time_period == "After Midnight (12AM-4AM)"),
    .groups = "drop"
  ) %>%
  mutate(
    n_nightlife_venues = replace_na(n_nightlife_venues, 0)
  ) %>%
  # Join to spatial data
  left_join(taxi_zones %>% select(LocationID, geometry), by = "LocationID") %>%
  st_as_sf() %>%
  # Transform to WGS84 for Leaflet
  st_transform(4326) %>%
  # Calculate centroids for bubble placement
  mutate(
    centroid = st_centroid(geometry),
    lon = st_coordinates(centroid)[,1],
    lat = st_coordinates(centroid)[,2]
  )

# CRITICAL FIX: Force factor levels in correct order
zone_summary <- zone_summary %>%
  mutate(
    nightlife_category = factor(
      as.character(nightlife_category),
      levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife"),
      ordered = TRUE
    )
  )



# Create color palette - ORDER MATCHES FACTOR LEVELS
color_pal <- colorFactor(
  palette = c("#2C7BB6", "#FDB863", "#D7191C"),  # Blue, Orange, Red
  levels = c("Low Nightlife", "Medium Nightlife", "High Nightlife")
)

# Create popup text
zone_summary <- zone_summary %>%
  mutate(
    popup_text = paste0(
      "<strong style='font-size:14px;'>", zone, "</strong><br>",
      "<strong>Borough:</strong> ", borough, "<br>",
      "<strong>Nightlife Category:</strong> ", nightlife_category, "<br>",
      "<strong>Nightlife Venues:</strong> ", n_nightlife_venues, "<br>",
      "<hr style='margin:5px 0;'>",
      "<strong>Total Crimes (8PM-8AM):</strong> ", format(total_crimes, big.mark = ","), "<br>",
      "<strong>Violent:</strong> ", format(violent_crimes, big.mark = ","), 
      " (", round(100*violent_crimes/total_crimes, 1), "%)<br>",
      "<strong>Property:</strong> ", format(property_crimes, big.mark = ","),
      " (", round(100*property_crimes/total_crimes, 1), "%)<br>",
      "<strong>Disorder:</strong> ", format(disorder_crimes, big.mark = ","),
      " (", round(100*disorder_crimes/total_crimes, 1), "%)<br>",
      "<strong>After-Midnight (12AM-4AM):</strong> ", round(pct_after_midnight, 1), "%"
    )
  )

# Create interactive map
viz3_map <- leaflet(zone_summary) %>%
  # Base map
  addProviderTiles(providers$CartoDB.Positron) %>%
  # Taxi zone polygons (light background)
  addPolygons(
    fillColor = ~color_pal(nightlife_category),
    fillOpacity = 0.2,
    color = "white",
    weight = 1,
    opacity = 0.5,
    highlightOptions = highlightOptions(
      weight = 2,
      color = "#666",
      fillOpacity = 0.4,
      bringToFront = TRUE
    ),
    label = ~zone,
    group = "Zone Boundaries"
  ) %>%
  # Crime bubble markers
  addCircleMarkers(
    lng = ~lon,
    lat = ~lat,
    radius = ~sqrt(total_crimes) / 15,
    fillColor = ~color_pal(nightlife_category),
    fillOpacity = 0.7,
    color = "white",
    weight = 2,
    popup = ~popup_text,
    label = ~paste0(zone, ": ", format(total_crimes, big.mark = ","), " crimes"),
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "12px",
      direction = "auto"
    ),
    group = "Crime Bubbles"
  ) %>%
  # Legend - REVERSED for correct Leaflet display
  addLegend(
    position = "bottomright",
    colors = c("#D7191C", "#FDB863", "#2C7BB6"),  # REVERSED: Red, Orange, Blue
    labels = c("High Nightlife (≥5 venues)", 
               "Medium Nightlife (2-4 venues)", 
               "Low Nightlife (0-1 venues)"),    # REVERSED: High, Medium, Low
    title = "Nightlife Density",
    opacity = 0.8
  ) %>%
  # Add title control
  addControl(
    html = "<div style='background:white; padding:10px; border-radius:5px; box-shadow:0 0 15px rgba(0,0,0,0.2);'>
            <h4 style='margin:0; font-weight:bold;'>NYC Nightlife Crime Map (8PM-8AM, 2019-2023)</h4>
            <p style='margin:5px 0 0 0; font-size:12px; color:#666;'>
            Bubble size = Total crimes | Click bubbles for details
            </p></div>",
    position = "topright"
  ) %>%
  # Layer controls - MUST HAVE %>% BEFORE THIS
  addLayersControl(
    overlayGroups = c("Zone Boundaries", "Crime Bubbles"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  # Set initial view (NYC center)
  setView(lng = -73.935242, lat = 40.730610, zoom = 11)

# Display map
viz3_map
Show code:
# Save as HTML
saveWidget(viz3_map, 
           file = "output/visualizations/viz3_interactive_crime_map.html",
           selfcontained = TRUE)

VISUALIZATION 4: Crime Distribution by Borough

Show code:
library(tidyverse)
library(scales)

cat("Creating professional crime distribution visualization...\n")
Creating professional crime distribution visualization...
Show code:
# Load data
nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# Define McKinsey color palette FIRST
mckinsey_colors <- c(
  "Violent" = "#00205B",
  "Property" = "#0072CE",
  "Disorder" = "#41B6E6",
  "Other" = "#8DC8E8",
  "Drug" = "#C6DCEA"
)

# Prepare data
crime_summary <- nypd_final %>%
  count(borough, crime_category) %>%
  group_by(borough) %>%
  mutate(total = sum(n)) %>%
  ungroup()

# Create visualization
ggplot(crime_summary, aes(x = reorder(borough, -total), y = n, fill = crime_category)) +
  geom_col(width = 0.7, color = "white", size = 0.5) +
  scale_fill_manual(values = mckinsey_colors) +
  scale_y_continuous(labels = comma, expand = c(0, 0)) +
  labs(
    title = "Crime Distribution by Borough and Type",
    subtitle = "NYC Nighttime Crime (8PM-8AM), 2019-2023",
    x = NULL,
    y = "Number of Crimes",
    fill = "Crime Type",
    caption = "Source: NYPD Complaint Data"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0),
    plot.subtitle = element_text(color = "gray40", size = 11, hjust = 0),
    plot.caption = element_text(color = "gray50", size = 9, hjust = 1),
    axis.text = element_text(color = "gray20"),
    axis.title.y = element_text(face = "bold", color = "gray20"),
    legend.position = "right",
    legend.title = element_text(face = "bold", size = 10),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    plot.margin = margin(20, 20, 20, 20)
  )

VISUALIZATION 5: Stacked Area Chart - Crime Type Evolution Through the Night

Show code:
library(tidyverse)

cat("Creating stacked area chart visualization...\n")
Creating stacked area chart visualization...
Show code:
# Load data
nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# Focus on high-nightlife zones and major crime types
crime_evolution <- nypd_final %>%
  filter(nightlife_category == "High Nightlife") %>%
  filter(crime_category %in% c("Violent", "Property", "Disorder")) %>%
  group_by(complaint_hour, crime_category) %>%
  summarise(n_crimes = n(), .groups = "drop") %>%
  # Calculate percentages
  group_by(complaint_hour) %>%
  mutate(
    total_hour = sum(n_crimes),
    pct = 100 * n_crimes / total_hour
  ) %>%
  ungroup()

# Create plot
viz5 <- ggplot(crime_evolution, 
               aes(x = complaint_hour, y = n_crimes, fill = crime_category)) +
  # Stacked areas
  geom_area(alpha = 0.8, color = "white", size = 0.5) +
  # Reference lines
  geom_vline(xintercept = 0, linetype = "dashed", color = "white", size = 1) +
  geom_vline(xintercept = 4, linetype = "dashed", color = "white", size = 1) +
  # Annotations
  annotate("text", x = 0.5, y = max(crime_evolution %>% 
                                     group_by(complaint_hour) %>% 
                                     summarise(total = sum(n_crimes)) %>% 
                                     pull(total)) * 0.95,
           label = "Midnight", color = "white", hjust = 0, size = 4, fontface = "bold") +
  annotate("text", x = 4.5, y = max(crime_evolution %>% 
                                     group_by(complaint_hour) %>% 
                                     summarise(total = sum(n_crimes)) %>% 
                                     pull(total)) * 0.95,
           label = "Bars Close", color = "white", hjust = 0, size = 4, fontface = "bold") +
  # Color scheme
  scale_fill_manual(
    values = c("Violent" = "#D7191C",
               "Property" = "#FDAE61", 
               "Disorder" = "#ABD9E9"),
    labels = c("Violent Crimes", "Property Crimes", "Disorder/Public Nuisance")
  ) +
  # X-axis
  scale_x_continuous(
    breaks = c(20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7),
    labels = c("8PM", "9PM", "10PM", "11PM", 
               "12AM", "1AM", "2AM", "3AM", "4AM", "5AM", "6AM", "7AM")
  ) +
  # Y-axis
  scale_y_continuous(labels = scales::comma) +
  # Labels
  labs(
    title = "Crime Type Composition Throughout the Night (High-Nightlife Zones Only)",
    subtitle = "Stacked area shows total crimes AND relative composition | NYC 2019-2023",
    x = "Hour of Day",
    y = "Number of Crime Incidents",
    fill = "Crime Type",
    caption = "High-nightlife zones defined as ≥5 nightlife venues per taxi zone"
  ) +
  # Theme
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 17),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom",
    panel.grid.minor = element_blank(),
    plot.caption = element_text(size = 9, color = "gray50", hjust = 0),
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = "gray20"),
    plot.background = element_rect(fill = "gray20"),
    text = element_text(color = "white"),
    axis.text = element_text(color = "white"),
    panel.grid.major = element_line(color = "gray40")
  )

# Display
print(viz5)

Show code:
# Save
ggsave("output/visualizations/viz5_stacked_area_crime_types.png", 
       viz5, width = 14, height = 8, dpi = 300)

VISUALIZATION 6: Borough Comparison of Spike & Drop Patterns

Show code:
library(tidyverse)
library(scales)

nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# Calculate midnight spike
midnight_spike <- nypd_final %>%
  filter(time_period %in% c("Early Nightlife (8PM-12AM)", 
                            "After Midnight (12AM-4AM)")) %>%
  group_by(borough, time_period) %>%
  summarise(crimes = n(), .groups = "drop") %>%
  pivot_wider(names_from = time_period, values_from = crimes) %>%
  mutate(
    early_rate = `Early Nightlife (8PM-12AM)` / 4,
    midnight_rate = `After Midnight (12AM-4AM)` / 4,
    pct_change = 100 * (midnight_rate / early_rate - 1)
  ) %>%
  filter(!is.na(borough), borough != "")

ggplot(midnight_spike, aes(x = pct_change, y = reorder(borough, pct_change))) +
  geom_col(fill = "#E74C3C", alpha = 0.8) +
  geom_vline(xintercept = 0, color = "black", linewidth = 1) +
  geom_text(aes(label = paste0("+", round(pct_change, 1), "%")),
            hjust = -0.1, fontface = "bold", size = 4) +
  labs(
    title = "Crime INCREASES After Midnight in All Boroughs",
    subtitle = "% change in crime rate: After Midnight (12AM-4AM) vs Early Evening (8PM-12AM)",
    x = "Percentage Increase in Crime Rate",
    y = "",
    caption = "Positive values = Crime went UP after midnight"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 16, color = "#E74C3C"),
    axis.text.y = element_text(face = "bold", size = 12),
    panel.grid.minor = element_blank()
  )

VISUALIZATION 7: Interactive Geospatial Map - Crime & Nightlife Density by Zone

Show code:
library(tidyverse)
library(scales)

# Calculate post-closing drop
closing_drop <- nypd_final %>%
  filter(time_period %in% c("After Midnight (12AM-4AM)",
                            "Post-Closing Day (4AM-8AM)")) %>%
  group_by(borough, time_period) %>%
  summarise(crimes = n(), .groups = "drop") %>%
  pivot_wider(names_from = time_period, values_from = crimes) %>%
  mutate(
    midnight_rate = `After Midnight (12AM-4AM)` / 4,
    morning_rate = `Post-Closing Day (4AM-8AM)` / 4,
    pct_change = 100 * (morning_rate / midnight_rate - 1)
  ) %>%
  filter(!is.na(borough), borough != "")

ggplot(closing_drop, aes(x = pct_change, y = reorder(borough, -pct_change))) +
  geom_col(fill = "#2C7BB6", alpha = 0.8) +
  geom_vline(xintercept = 0, color = "black", linewidth = 1) +
  geom_text(aes(label = paste0(round(pct_change, 1), "%")),
            hjust = 1.1, fontface = "bold", size = 4, color = "white") +
  labs(
    title = "Crime DECREASES After Bars Close in All Boroughs",
    subtitle = "% change in crime rate: Early Morning (4AM-8AM) vs Late Night (12AM-4AM)",
    x = "Percentage Decrease in Crime Rate",
    y = "",
    caption = "Negative values = Crime went DOWN after 4AM"
  ) +
  scale_x_continuous(labels = function(x) paste0(x, "%")) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 16, color = "#2C7BB6"),
    axis.text.y = element_text(face = "bold", size = 12),
    panel.grid.minor = element_blank()
  )

VISUALIZATION 8: Crime-Per-Venue Ratio Across NYC’s Top Nightlife Districts

Show code:
library(tidyverse)

nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")
nightlife_density <- readRDS("data/processed/nightlife_density_by_zone.rds")

paradox_data <- nypd_final %>%
  filter(time_period == "After Midnight (12AM-4AM)") %>%
  group_by(zone) %>%
  summarise(total_crimes = n(), .groups = "drop") %>%
  left_join(nightlife_density %>% select(zone, n_nightlife_venues), by = "zone") %>%
  filter(!is.na(n_nightlife_venues), n_nightlife_venues > 0) %>%
  mutate(crime_per_venue = total_crimes / n_nightlife_venues) %>%
  arrange(desc(n_nightlife_venues)) %>%
  head(30)

sweet_spot <- paradox_data %>%
  filter(n_nightlife_venues >= quantile(n_nightlife_venues, 0.6),
         crime_per_venue <= quantile(crime_per_venue, 0.4))

sweet_spot_count <- nrow(sweet_spot)
venue_range <- paste0(min(paradox_data$n_nightlife_venues), "-", 
                      max(paradox_data$n_nightlife_venues))

ggplot(paradox_data, aes(x = reorder(zone, n_nightlife_venues))) +
  # FIRST: Draw green highlight zone (in back)
  {if(sweet_spot_count > 0) 
    annotate("rect", 
             xmin = match(min(sweet_spot$zone), 
                         levels(reorder(paradox_data$zone, paradox_data$n_nightlife_venues))) - 0.5,
             xmax = match(max(sweet_spot$zone), 
                         levels(reorder(paradox_data$zone, paradox_data$n_nightlife_venues))) + 0.5,
             ymin = -Inf, ymax = Inf,
             fill = "#90EE90", alpha = 0.3)
  } +
  # SECOND: Blue bars for venues (MORE VISIBLE)
  geom_col(aes(y = n_nightlife_venues), 
           fill = "#4A90E2", 
           alpha = 0.5,           # Semi-transparent so line shows through
           color = "#2E5C8A",     # Dark blue border
           size = 0.3) +
  # THIRD: Red line for crime rate (drawn on top)
  geom_line(aes(y = crime_per_venue * 10, group = 1), 
            color = "#E74C3C", 
            size = 2) +           # Thicker line
  geom_point(aes(y = crime_per_venue * 10), 
             color = "#E74C3C", 
             size = 4,
             shape = 21,          # Circle with border
             fill = "#E74C3C",
             stroke = 1) +
  scale_y_continuous(
    name = "BLUE BARS: Number of Nightlife Venues →",
    sec.axis = sec_axis(~ . / 10, 
                        name = "← RED LINE: Crimes per Venue")
  ) +
  labs(
    title = "The Nightlife Paradox: More Venues ≠ More Crime Per Venue",
    subtitle = sprintf("Top 30 zones (%s venues) | Green = SWEET SPOT: %d well-managed zones with high venues + low crime/venue", 
                      venue_range, sweet_spot_count),
    x = "",
    caption = "Two metrics: (1) Total venues (blue bars, left axis) | (2) Crimes per venue (red line, right axis) | Green zones achieve safety at scale"
  ) +
  coord_flip() +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 11, color = "gray30"),
    axis.title.y.left = element_text(color = "#4A90E2", face = "bold", size = 12),
    axis.title.y.right = element_text(color = "#E74C3C", face = "bold", size = 12),
    panel.grid.minor = element_blank(),
    plot.caption = element_text(size = 10, hjust = 0, color = "gray40")
  )

Show code:
ggsave("output/visualizations/viz_paradox_zone.png",
       width = 14, height = 10, dpi = 300)

VISUALIZATION 9: Crime Hotspot Calendar

Show code:
library(tidyverse)
library(viridis)

nypd_final <- readRDS("data/processed/nypd_analysis_final.rds")

# Prepare data - use complaint_year which already exists
heatmap_data <- nypd_final %>%
  filter(!is.na(zone), !is.na(complaint_year)) %>%
  group_by(zone, complaint_year) %>%
  summarise(total_crimes = n(), .groups = "drop") %>%
  # Get top 15 zones by total crime
  group_by(zone) %>%
  mutate(zone_total = sum(total_crimes)) %>%
  ungroup() %>%
  filter(zone_total >= quantile(zone_total, 0.9)) %>%
  arrange(desc(zone_total)) %>%
  mutate(zone = fct_reorder(zone, zone_total))

# Create heatmap
ggplot(heatmap_data, aes(x = factor(complaint_year), y = zone, fill = total_crimes)) +
  geom_tile(color = "white", size = 0.5) +
  scale_fill_gradient(low = "#FFF5E6", high = "#8B4513",
                      name = "Crime\nIncidents",
                      labels = scales::comma) +
  labs(
    title = "NYC Nightlife Crime Hotspots: Which Zones, Which Years?",
    subtitle = "Darker colors = More crimes | Top 15 high-crime zones (2019-2023)",
    x = "Year",
    y = "",
    caption = "Data: NYPD nighttime crimes (8PM-8AM)"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.text.y = element_text(size = 10),
    panel.grid = element_blank(),
    legend.position = "right"
  )

Show code:
ggsave("output/visualizations/viz_heatmap_zones_years.png",
       width = 14, height = 8, dpi = 300)

CONCLUSION

Key Findings

This analysis provides strong evidence that crime rates rise significantly around nightlife districts after midnight, with critical implications for urban policy and public safety management.

  1. The After-Midnight Crime Spike is Universal Crime rates increase by 30-40% after midnight (12AM-4AM) compared to early evening (8PM-12AM) across all NYC boroughs, confirming that nightlife activity drives heightened criminal behavior regardless of location.

  2. The 4AM Drop Proves Causality The dramatic 32-44% crime decrease after bars close at 4AM provides causal evidence that nightlife activity—not just venue presence—drives crime patterns. This temporal pattern validates the distinction between infrastructure density and actual nighttime activity levels.

  3. The Nightlife Paradox: Density ≠ Danger Counterintuitively, high-density entertainment districts can achieve lower crime-per-venue ratios than dispersed nightlife areas. Ten “sweet spot” zones demonstrate that concentrated nightlife with proper management creates safer environments than scattered venues in residential areas, challenging assumptions that more bars automatically mean more crime.

  4. Residential Zones Face Greater Risk Low-nightlife residential areas experience significantly higher overall crime rates (116,222 crimes/hour during daytime) compared to entertainment districts, suggesting that resource allocation should prioritize comprehensive community safety over exclusive nightlife policing.

Policy Recommendations

  1. Resource Allocation Strategy: 70/30 Split
  • 70% focus on residential zones: Implement comprehensive crime prevention addressing root causes in high-crime residential areas
  • 30% focus on entertainment districts: Deploy targeted nightlife management strategies including extended police presence during peak hours (12AM-4AM) and strategic venue clustering
  1. Encourage Managed Entertainment Districts Rather than dispersing nightlife venues across residential neighborhoods, urban planners should create concentrated, well-managed entertainment zones that can achieve safety at scale while preserving residential quality of life.

  2. Data-Driven Policing Police departments should shift patrol schedules to match temporal crime patterns, with maximum deployment during the 12AM-4AM window and strategic repositioning after closing time to address post-nightlife incidents.

Limitations and Future Research

This analysis focuses on venue density as a proxy for nightlife activity levels. Future research should incorporate:

  • Actual patron counts and venue capacity data
  • Detailed crime severity analysis beyond simple incident counts
  • Longitudinal analysis of policy interventions in specific entertainment districts
  • Integration with teammate findings on noise complaints, rideshare patterns, and employment dynamics

Final Insight

Nightlife districts are not inherently dangerous—they are differently dangerous than residential areas, with concentrated temporal patterns that enable targeted intervention. The key to urban nightlife policy is not prohibition or dispersion, but strategic management that balances economic vitality with public safety. NYC’s entertainment economy generates $35 billion annually; with evidence-based resource allocation, cities can capture these economic benefits while minimizing public safety costs.