The goal of the Long Island Sound Watershed Water Quality Dashboard is to provide users with an interactive mapping tool for exploring publicly available stream and river water quality data, changes in water quality over time, and water quality data linked with common human impacts, including watershed land use/land cover, septic density, dams, and wastewater treatment plants.
Support for this project was provided by the Long Island Sound Study and the College of Agriculture, Health and Natural Resources at the University of Connecticut.
Methods
Water Quality Data Acquisition
All water quality data within the LIS water quality dashboard were downloaded from the Water Quality Portal (WQP; https://www.waterqualitydata.us/) with the ‘dataRetrieval’ package (De Cicco et al. 2022) in R Statistical Software (R Core Team 2021). The dashboard includes all available stream/river site locations within the U.S. portion of the LIS watershed sampled between 1957 and 2021.
Watershed Delineation
We snapped all locations in the site list to the nearest stream reach (based on the National Hydrography Dataset (NHD); USGS, 2019a) within 500 m, and used these updated locations as “pour-point” locations for watershed delineations. Each point was delineated using the StreamStats R package (https://github.com/markwh/streamstats.git). Results were manually inspected for accuracy, with a focus on small watersheds, since those locations have a stronger tendency to snap to incorrect stream reaches. Where the correct location was clear, we manually adjusted the site to the accurate “pour-point” location and re-ran the delineation with the online StreamStats tools (Ries et al., 2017; USGS, 2019b). We eliminated 235 sites because the location did not clearly coincide with a stream reach.
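The snapping step can be illustrated with a minimal sketch. It is written in Python purely for illustration (the project's own workflow is in R) and assumes coordinates are already in a projected system with metre units. The function name `snap_to_reach` and the dictionary layout of `reaches` are hypothetical and do not come from the repository.

```python
import math

def snap_to_reach(point, reaches, max_dist=500.0):
    """Snap a site to the closest stream reach within max_dist metres.

    point   -- (x, y) in projected metres (assumption of this sketch)
    reaches -- {reach_id: [(x, y), ...]} polyline vertices (hypothetical layout)
    Returns (reach_id, (x, y), distance) or None if nothing is close enough.
    """
    px, py = point
    best = None
    for reach_id, vertices in reaches.items():
        # Walk each straight segment of the polyline.
        for (ax, ay), (bx, by) in zip(vertices, vertices[1:]):
            dx, dy = bx - ax, by - ay
            seg_len2 = dx * dx + dy * dy
            if seg_len2 == 0:
                t = 0.0
            else:
                # Position of the perpendicular foot, clamped to the segment.
                t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
                t = max(0.0, min(1.0, t))
            sx, sy = ax + t * dx, ay + t * dy
            dist = math.hypot(px - sx, py - sy)
            if dist <= max_dist and (best is None or dist < best[2]):
                best = (reach_id, (sx, sy), dist)
    return best
```

A site that returns `None` here has no reach within the search distance, which corresponds to the kind of location that would need manual review or removal.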
Water Quality Data Cleaning
Data were filtered to include only sites that had a corresponding watershed delineation and one or more of the following common water quality measurements: "Ammonia and ammonium", "Total dissolved solids", "pH", "Specific conductance", "Nitrate", "Orthophosphate", "Phosphate-phosphorus", "Phosphate-phosphorus as P", "Soluble Reactive Phosphorus (SRP)".
We took the following steps to clean the data, which are documented within the LIS_WaterQualityData GitHub repository (Haredkb/LIS_WaterQualityData.git) in the code “SVIC_LISDatabase_CleanUp.R”: (1) We removed data points whose ResultDetectionConditionText was "Detected Not Quantified", "Present Above Quantification Limit", or "Systematic Contamination". (2) We eliminated duplicate samples. (3) We converted all negative values to 0 and created a data column to denote when measurements were below detection limits. (4) We converted all measurements to consistent units as follows:
- Nitrate: All data were converted to mg N L-1. Units were identified via the Measure.Unit.Code or USGSPCode.
- Orthophosphate: All phosphate data were converted to units of mg P L-1. The characteristic names "Orthophosphate", "Phosphate-phosphorus", "Phosphate-phosphorus as P", and "Soluble Reactive Phosphorus (SRP)" were all treated as Orthophosphate.
- Ammonia and Ammonium: All data were converted to mg N L-1.
- Total dissolved solids (TDS): Data were filtered for units of “mgL-1”; the remainder were removed from the analysis.
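The cleaning steps above can be sketched as follows. This is an illustration in Python, not the code in “SVIC_LISDatabase_CleanUp.R”. The record field names, the unit labels in `UNIT_FACTORS`, and the rule that only negative values set the below-detection flag are assumptions made for the example. The conversion factors themselves are the mass fraction of N or P in each ion, computed from standard atomic masses.

```python
ATOMIC_MASS = {"N": 14.007, "O": 15.999, "H": 1.008, "P": 30.974}

def mass_fraction(element, formula):
    """Mass fraction of `element` in an ion given as {element: atom count}."""
    total = sum(ATOMIC_MASS[el] * count for el, count in formula.items())
    return ATOMIC_MASS[element] * formula[element] / total

# Illustrative unit labels (assumed, not the WQP vocabulary) mapped to the
# multiplier that yields mg of N or P per litre.
UNIT_FACTORS = {
    "mg/l as N": 1.0,
    "ug/l as N": 0.001,
    "mg/l as NO3": mass_fraction("N", {"N": 1, "O": 3}),
    "mg/l as NH4": mass_fraction("N", {"N": 1, "H": 4}),
    "mg/l as P": 1.0,
    "ug/l as P": 0.001,
    "mg/l as PO4": mass_fraction("P", {"P": 1, "O": 4}),
}

REMOVED_CONDITIONS = {
    "Detected Not Quantified",
    "Present Above Quantification Limit",
    "Systematic Contamination",
}

def clean_records(records):
    """Apply steps (1)-(4) to a list of dicts with hypothetical keys
    site, date, characteristic, value, unit, condition."""
    seen = set()
    cleaned = []
    for rec in records:
        # (1) Drop results carrying one of the listed detection conditions.
        if rec.get("condition") in REMOVED_CONDITIONS:
            continue
        # (2) Drop exact duplicates of an earlier record.
        key = (rec["site"], rec["date"], rec["characteristic"],
               rec["value"], rec["unit"])
        if key in seen:
            continue
        seen.add(key)
        # (3) Negative values become 0 and are flagged (assumed flag rule).
        below_detection = rec["value"] < 0
        value = max(rec["value"], 0.0)
        # (4) Convert to mg N/L or mg P/L.
        cleaned.append({**rec,
                        "value_mg_per_l": value * UNIT_FACTORS[rec["unit"]],
                        "below_detection": below_detection})
    return cleaned
```

For example, a nitrate result reported as 4.43 mg/L as NO3 converts to roughly 1.0 mg N/L, because nitrogen makes up about 22.6% of the mass of the nitrate ion.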
Site Influence
For each site in the data portal, we determined whether it was likely affected by a dam, diversion, or wastewater treatment plant (WWTP). For dams, we snapped locations from the National Inventory of Dams (USACE, 2021) to NHD stream reaches. We flagged each water quality site as potentially dam influenced in two categories: <10 km or <50 km downstream from the nearest dam. We considered only dams with a purpose of “Water Supply” and a minimum storage of 100 acre-feet. For WWTPs, we snapped locations from the EPA Wastewater Treatment facilities database (https://echo.epa.gov/tools/web-services/facility-search-water) to NHD stream reaches and flagged any sites located downstream of a WWTP within the stream network. For diversions, we used the Connecticut Department of Energy and Environmental Protection hydrography data (CT DEEP, 2011), which identify stream reaches with varying diversion types (“Aqueduct”, “Canal, Lock, or Sluice Gate”, “Underground Aqueduct”, “Spillway”, or “Ditch or Canal”). We do not include diversion information for states other than Connecticut. Dates were not considered in the analysis, so the site influence analysis reflects only current conditions.
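The dam rule can be expressed as a small sketch. It is written in Python for illustration only and assumes that the network distance from each upstream dam to the site has already been computed. The field names (`purposes`, `storage_acre_ft`, `network_km`) and the flag names are hypothetical.

```python
def dam_qualifies(dam):
    """Keep only water-supply dams with at least 100 acre-feet of storage."""
    return "Water Supply" in dam["purposes"] and dam["storage_acre_ft"] >= 100

def dam_flags(upstream_dams):
    """Return the two dam-influence flags for one water quality site.

    upstream_dams -- list of dicts describing dams upstream of the site, each
                     with 'purposes', 'storage_acre_ft', and 'network_km'
                     (distance along the stream network; hypothetical keys).
    """
    distances = [d["network_km"] for d in upstream_dams if dam_qualifies(d)]
    if not distances:
        return {"dam_within_10km": False, "dam_within_50km": False}
    nearest = min(distances)
    return {"dam_within_10km": nearest < 10, "dam_within_50km": nearest < 50}
```

Returning both flags keeps the two distance categories independent, so a site 8 km below a qualifying dam is flagged in both, while a site 30 km below is flagged only in the 50 km category.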
Baseflow Condition
We conducted baseflow regression using the ‘bfi’ function within the USGS-R ‘DVstats’ package version 0.3.4 (Lorenz, 2017) for all available site locations with streamflow data. We used ‘flowfill’ from the R package ‘baytrends’ (Murphy et al. 2023) to fill data gaps of less than seven days. For data gaps longer than seven days, we ran separate baseflow regressions for each time period. The result is a set of daily baseflow metrics (proportion of streamflow as baseflow and proportion of streamflow as quickflow) for all streamflow sites and collection time periods. For each streamflow site and day, we categorized conditions as baseflow when the proportion of baseflow was greater than 75%.
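The gap handling and the 75% threshold can be sketched as below. This Python example is an illustration only and does not reproduce ‘bfi’ or ‘flowfill’. The function names are hypothetical, and because the text does not say how a gap of exactly seven missing days is treated, the sketch assumes it starts a new time period.

```python
from datetime import date

def split_at_long_gaps(dates, max_fill_days=6):
    """Group sorted daily dates into time periods.

    Runs of up to max_fill_days missing days stay in the same period (they
    would be gap-filled); longer runs start a new period (assumed cutoff).
    """
    periods = []
    for d in dates:
        missing = (d - periods[-1][-1]).days - 1 if periods else None
        if periods and missing <= max_fill_days:
            periods[-1].append(d)
        else:
            periods.append([d])
    return periods

def is_baseflow_day(baseflow_proportion, threshold=0.75):
    """A day counts as baseflow when the proportion exceeds the threshold."""
    return baseflow_proportion > threshold
```

Each list returned by `split_at_long_gaps` corresponds to one time period that would receive its own baseflow run.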
For each water quality sample collected, we determined whether it was collected during baseflow or nonbaseflow conditions. Because the majority of sites within the water quality dataset are not directly associated with streamflow data, we joined each water quality site with all available streamflow records within the same HUC06 basin (USGS, 2016). For each HUC, we determined that all water quality samples collected on a given day were collected during baseflow conditions when all of the streamflow sites within that HUC were at baseflow (i.e., baseflow proportion greater than 75%). For time periods with no flow records for a given HUC06, we used all records within the LIS basin; if all were at greater than 75% baseflow, that period was considered to be under baseflow conditions. The workflow can be found within the repository code “2_NHD_DownloadSetUp.R”.
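The assignment rule for samples can be sketched as follows. This is a Python illustration, not the repository workflow. The tuple layout of `gauge_records`, the function name, and the choice to return `None` when no streamflow record exists anywhere on the sample date are assumptions of the example.

```python
def sample_is_baseflow(sample_huc06, sample_date, gauge_records, threshold=0.75):
    """Decide whether a sample was collected under baseflow conditions.

    gauge_records -- iterable of (huc06, date, baseflow_proportion) tuples for
                     every streamflow site in the LIS basin (hypothetical layout).
    Returns True/False, or None if no gauge reported on that date (assumption).
    """
    same_day = [r for r in gauge_records if r[1] == sample_date]
    in_huc = [r for r in same_day if r[0] == sample_huc06]
    # Fall back to every gauge in the basin when the HUC06 has no records.
    pool = in_huc if in_huc else same_day
    if not pool:
        return None
    # Baseflow only if every gauge in the pool exceeds the threshold.
    return all(proportion > threshold for _, _, proportion in pool)
```

Requiring every gauge in the pool to exceed the threshold is the conservative reading of the rule: a single gauge at or below 75% baseflow marks the day as nonbaseflow for all samples in that HUC06.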
Code Repository
All of the code used to download, clean, and analyze the information presented in the LIS Water Quality Dashboard is available at https://github.com/Haredkb/LIS_WaterQualityData.git.
Citations
CT DEEP—Connecticut Department of Energy and Environmental Protection. (2011). Connecticut Hydrography. Retrieved from https://services1.arcgis.com/FjPcSmEFuDYlIdKC/ArcGIS/rest/services/Connecticut_Hydrography_Set/FeatureServer.
De Cicco, L.A., Hirsch, R.M., Lorenz, D., Watkins, W.D., Johnson, M., 2022, dataRetrieval: R packages for discovering and retrieving water data available from Federal hydrologic web services, v.2.7.12, doi:10.5066/P9X4L3GE
Lorenz D (2017). DVstats: Functions to manipulate daily-values data. R package version 0.3.4.
Murphy R, Perry E, Keisman J, Harcum J, Leppo EW (2023). baytrends: Long Term Water Quality Trend Analysis. R package version 2.0.9.
Ries, K. G., III, Newson, J. K., Smith, M. J., Guthrie, J. D., Steeves, P. A., Haluska, T., Kolb, K., Thompson, R. F., Santoro, R. D., & Vraga, H. W. (2017). StreamStats, version 4 (USGS numbered series No. 2017–3046), StreamStats, version 4, fact sheet. U.S. Geological Survey. https://doi.org/10.3133/fs20173046
USACE—U.S. Army Corps of Engineers. National Inventory of Dams. 2021. Available online: https://nid.sec.usace.army.mil/ (accessed on 15 January 2023).
USGS—U.S. Geological Survey, 2016, National Water Information System data available on the World Wide Web (USGS Water Data for the Nation), accessed January 10, 2023, at URL http://waterdata.usgs.gov/nwis/.
USGS—U.S. Geological Survey, 2019a, National Hydrography Dataset (ver. USGS National Hydrography Dataset Best Resolution (NHD) for Hydrologic Unit (HU) 4 - 2001 (published 20191002)), accessed 05 Feb, 2023 at URL https://www.usgs.gov/national-hydrography/access-national-hydrography-products
USGS—U.S. Geological Survey, 2019b, The StreamStats program, online at https://streamstats.usgs.gov/ss/, accessed on 05 Feb 2023.