Sufficiently Minimal
https://sufficientlyminimal.netlify.com/
Recent content on Sufficiently Minimal
Hugo -- gohugo.io
en-us
Sun, 28 Jan 2018 00:00:00 +0000

Congaree Big Trees
https://sufficientlyminimal.netlify.com/2018/01/28/congaree-big-trees/
Sun, 28 Jan 2018 00:00:00 +0000
Introduction
This post expands on a term project completed for SCHC 312 at the University of South Carolina during the Fall 2016 semester, taught by Dr. John Grego. The project involved validating a big-tree database and then mapping the results. Ultimately, though, it left us with more questions than answers.
Congaree National Park is South Carolina’s only national park. It is best known for its abundance of massive trees, which are also some of the oldest preserved individuals of their species in any floodplain ecosystem.

Charleston Air Quality
https://sufficientlyminimal.netlify.com/2018/01/27/charleston-air-quality/
Sat, 27 Jan 2018 00:00:00 +0000
Intro
Thanks to BrendaSo for posting this neat and tidy air pollution data. The data set got me interested in plotting a few different air pollutants in Charleston, SC over the past several years.
First, here’s some code to read the full data set and get the parts I’m interested in.
    library(tidyverse)

    # Read the full data set, keep South Carolina rows, and drop identifier columns
    pol_sc <- read_csv("pollution_us_2000_2016.csv") %>%
      filter(State == "South Carolina") %>%
      select(-c(X1, `State Code`, `County Code`, `Site Num`, State, City)) %>%
      mutate(Date = `Date Local`) %>%
      # Keep observations from 2006 onward
      filter(as.integer(substr(Date, 1, 4)) >= 2006)
Below are time series plots showing daily average concentrations for a selection of air pollutants, with points colored by the EPA’s Air Quality Index for that day.

Twin Peaks Season 1 Script Analysis
https://sufficientlyminimal.netlify.com/2018/01/03/twin-peaks-season-1-script-analysis/
Wed, 03 Jan 2018 00:00:00 +0000
Intro
The early ’90s series Twin Peaks by David Lynch and Mark Frost is probably my favorite show of all time. Its eerie blend of humor in the midst of sorrow, and its clash between characters of pure good and pure evil, make the show unique. After working through a copy of Text Mining with R by Julia Silge and David Robinson, I decided to get my hands on transcripts of Twin Peaks episodes to apply some newly learned tidytext principles.

UFO Sightings
https://sufficientlyminimal.netlify.com/2017/12/22/ufo-sightings/
Fri, 22 Dec 2017 00:00:00 +0000
This analysis looks at data on sightings 🔭 of unidentified flying objects 🛸 covering 1904-2014. The data set contains 80332 sightings from countries all over the world, with dates, times, locations, and descriptions, among other features. The data is provided by the National UFO Reporting Center (NUFORC) by way of Kaggle.
Introduction to the Data
There are two data sets available on Kaggle, one titled complete.csv and one titled scrubbed.

Bayesian Normal Model
https://sufficientlyminimal.netlify.com/2017/11/29/bayesian-normal-model/
Wed, 29 Nov 2017 00:00:00 +0000
Inference for unknown mean and known variance
For a sampling model \(Y|\mu, \sigma^2 \stackrel{iid}{\sim} N(\mu,\sigma^2)\), we wish to make inference on \(\mu\) assuming \(\sigma^2\) is known. The first step is to derive an expression for \(\pi(\mu|Y=y,\sigma^2)\), which is proportional to \(\pi(\mu|\sigma^2)\pi(y|\mu,\sigma^2)\). The following prior structure will be adopted:
\[Y|\mu,\sigma^2 \sim N(\mu,\sigma^2)\]
\[\mu \sim N(\mu_0,\sigma_0^2)\]
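For reference, this prior structure is conjugate, so the derivation ends at a normal posterior. What follows is the standard textbook result for a single observation \(y\), given here as a preview and not taken from the post itself. The symbols \(\mu_1\) and \(\sigma_1^2\) are notation introduced only for this sketch:
\[\mu|Y=y,\sigma^2 \sim N(\mu_1,\sigma_1^2)\]
\[\frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}, \qquad \mu_1 = \sigma_1^2\left(\frac{\mu_0}{\sigma_0^2} + \frac{y}{\sigma^2}\right)\]
In words, the posterior precision is the sum of the prior and data precisions, and the posterior mean is a precision-weighted average of \(\mu_0\) and \(y\).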
Now we can derive an expression for \(\pi(\mu|Y=y,\sigma^2) \propto \pi(\mu|\sigma^2)\pi(y|\mu,\sigma^2)\).
\[\pi(\mu|\sigma^2)\pi(y|\mu,\sigma^2) \propto e^{-\frac{1}{2\sigma_0^2}(\mu-\mu_0)^2} e^{-\frac{1}{2\sigma^2} (y-\mu)^2}\]

Gamma Poisson
https://sufficientlyminimal.netlify.com/2017/11/16/gamma-poisson/
Thu, 16 Nov 2017 00:00:00 +0000
Bayesian Inference with Count Data
If we model the count of events \(X\) over a certain interval as a Poisson random variable with rate \(\lambda\), then \(X|\lambda \sim Poisson(\lambda)\). This can be thought of as a sampling model for \(X\), where \(\lambda\) is itself a random variable. We can adopt a \(Gamma(\alpha,\beta)\) prior for \(\lambda\) to allow for conjugacy of the prior and posterior.
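To make the conjugacy concrete, here is a short sketch of the standard result for a single observed count \(x\). It assumes the rate parameterization of the Gamma prior, with density proportional to \(\lambda^{\alpha-1}e^{-\beta\lambda}\); the post does not say which parameterization it uses, so treat that as an assumption:
\[\pi(\lambda|x) \propto \lambda^{x}e^{-\lambda} \cdot \lambda^{\alpha-1}e^{-\beta\lambda} = \lambda^{\alpha+x-1}e^{-(\beta+1)\lambda}\]
\[\lambda|x \sim Gamma(\alpha+x,\beta+1)\]
The posterior stays in the Gamma family: the shape increases by the observed count and the rate increases by one for the single interval observed.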
To derive the posterior and predictive distributions, we first need to find the joint distribution of \(X\) and \(\lambda\), \(f(x,\lambda)\).

Beta Binomial
https://sufficientlyminimal.netlify.com/2017/11/01/beta-binomial/
Wed, 01 Nov 2017 00:00:00 +0000
Bayesian Inference with Binary Data
The Beta-Binomial model is a classic Bayesian framework for working with dichotomous data. If we observe a Bernoulli process over a total of \(n\) trials, where each trial has “success” probability \(\theta\), then the total number of successes in \(n\) trials, \(X\), is a binomial random variable taking values \(X = 0,1,2,...,n\).
\[p(X=x|\theta) = {{n}\choose{x}} \theta^x (1-\theta)^{n-x}\]
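As a concrete instance of this formula, take the illustrative values \(n = 10\), \(x = 3\), and \(\theta = 0.5\) (chosen here for the example, not taken from the post):
\[p(X=3|\theta=0.5) = {{10}\choose{3}} (0.5)^3 (0.5)^{7} = \frac{120}{1024} \approx 0.117\]
Read as a function of \(\theta\) with \(x\) held fixed at the observed value, the same expression gives the likelihood.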
The binomial distribution serves as the likelihood of this model.

A Comparison of Imputation Algorithms for Modeling Water Quality
https://sufficientlyminimal.netlify.com/2017/05/01/a-comparison-of-imputation-algorithms-for-modeling-water-quality/
Mon, 01 May 2017 00:00:00 +0000
Approved by:
Primary Advisor: Dr. Edsel Peña; Professor, Dept. of Statistics
Second Reader: Dr. John Grego; Professor & Chair, Dept. of Statistics
Submitted in partial fulfillment of the requirements for
graduation with Honors from the South Carolina Honors College.
Abstract
This project addresses the need for predictive modeling tools to forecast expected concentrations of fecal bacteria in recreational waters in the Charleston, SC area. Data was provided by Charleston Waterkeeper, a water quality monitoring organization that has been measuring Enterococcus faecalis concentrations at 15 recreational sites since 2013.

About
https://sufficientlyminimal.netlify.com/about/
Thu, 05 May 2016 21:48:51 -0700
Hello! My name is Carter Allen. I am a PhD student in biostatistics at the Medical University of South Carolina. I graduated from the University of South Carolina in the Spring of 2017 with a B.S. in statistics. I am interested in building modern data science tools, studying and refining best practices for data science workflows, Bayesian statistics, water quality modeling, and much more. I enjoy teaching statistics and statistical computing, and I also help researchers with statistical questions from time to time.