Now that it's possible to collect a bunch of data from the RealEarth API, it might be useful to visualize some of this data over time. This might reveal some useful Karl features for when it's time to figure out how to build predictive models from the raw imagery data. Below is an example gif of Karl from the afternoon of 8/13/18. This gif was generated using the embedded python code below, which collects, parses, and edits images from the University of Wisconsin-Madison RealEarth API.
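One step in building a gif like this is picking out and ordering the frames for a given afternoon. Below is a minimal sketch of that step. It assumes the collected images were saved with UTC timestamps in their filenames. The `karl_YYYYMMDD_HHMMSS.png` pattern and the `afternoon_frames` helper are hypothetical and are not the original code. Conversion to local time uses a fixed UTC-7 offset, which is only correct during daylight saving time, as on 8/13/18.

```python
import re
from datetime import date, datetime, timedelta, timezone
from typing import Optional

# Hypothetical file naming: karl_YYYYMMDD_HHMMSS.png, stamped in UTC.
NAME_RE = re.compile(r"karl_(\d{8}_\d{6})\.png$")
# Fixed UTC-7 offset = Pacific Daylight Time (valid in summer only).
PDT = timezone(timedelta(hours=-7))


def local_time(name: str) -> Optional[datetime]:
    """Parse the UTC stamp in a filename and convert it to Pacific time."""
    m = NAME_RE.search(name)
    if m is None:
        return None  # not an image file that follows the naming convention
    utc = datetime.strptime(m.group(1), "%Y%m%d_%H%M%S").replace(tzinfo=timezone.utc)
    return utc.astimezone(PDT)


def afternoon_frames(names: list[str], day: date,
                     start_hour: int = 12, end_hour: int = 18) -> list[str]:
    """Filenames imaged on `day` (local time) in [start_hour, end_hour), oldest first."""
    stamped = []
    for name in names:
        t = local_time(name)
        if t is not None and t.date() == day and start_hour <= t.hour < end_hour:
            stamped.append((t, name))
    # Sorting on the timestamp puts the frames in animation order.
    return [name for _, name in sorted(stamped)]
```

The ordered filenames could then be opened and written out as an animation. For example, Pillow's `Image.save(..., save_all=True, append_images=...)` can do this.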
Predicting Karl, i.e. Karlcast, is going to require collecting a huge number of images of Karl and relating those images back to time of day, atmospheric conditions, ocean water temperature, etc. Following up on the previous post about image collection, I've written a couple of python pipelines. The first initializes and then populates a pandas dataframe with data from a given GOES-16 satellite channel, and the second updates a given dataframe with new information. Regular updating is probably the easiest way to build a big enough dataframe, as it requires interacting only with the RealEarth API. It has its drawbacks, though: I'll have to wait around for enough data to be posted, and both the pipeline and the data live on my local machine.
Before Karl's shape and extent can be predicted, it's a good idea to take a more global look at the data. To that end, I've simply converted images of Karl to flattened numpy arrays and taken their sum, which gives a measurement of total Karl. As can be seen below, there is a clear relationship between time and Karl over a four-day testing window. Next, we can bin that data by hour to investigate the shape of daily trends. There is clearly a strong trend by time of day, as well as a fair bit of heterogeneity around the peak and a long 'inactive' trough. Both the heterogeneity and the inactivity are important considerations in how a downstream ML model is built. The heterogeneity provides some structure that could aid in fine-scale predictions, while the weight of the inactive periods may need to be adjusted to avoid a simple bimodal Karl prediction.
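The update and summarize steps can be sketched without pandas. This is a stdlib stand-in, not the pipeline from the notebook. Each image is assumed to already be a flattened list of pixel values keyed by its timestamp. The names `update_records`, `total_karl`, and `hourly_profile` are hypothetical. 'Total Karl' is just the sum of the pixel values, as described above. The hourly profile averages those totals by hour of day.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical store: image timestamp -> flattened pixel values.
# A plain list of ints stands in for the flattened numpy array.
Store = dict[datetime, list[int]]


def update_records(store: Store, new: list[tuple[datetime, list[int]]]) -> int:
    """Add only images whose timestamp is not already stored.

    Returns how many images were added, so a periodic job can tell
    whether the API had posted anything new since the last poll.
    """
    added = 0
    for ts, pixels in new:
        if ts not in store:
            store[ts] = pixels
            added += 1
    return added


def total_karl(pixels: list[int]) -> int:
    """'Total Karl': the sum of all pixel values in one flattened image."""
    return sum(pixels)


def hourly_profile(store: Store) -> dict[int, float]:
    """Mean total Karl for each hour of the day (0-23) present in the data."""
    bins = defaultdict(list)
    for ts, pixels in store.items():
        bins[ts.hour].append(total_karl(pixels))
    return {hour: mean(values) for hour, values in sorted(bins.items())}
```

With a pandas dataframe indexed by timestamp, the hourly binning would be roughly `df.groupby(df.index.hour).mean()`.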
To go along with the previous post on scraping web data from the RealEarth API, here's some embedded javascript showing near-real-time cloud and water vapor data around San Francisco. The map is centered on San Francisco, but it can be used to view any other region in the continental U.S. It was adapted with minor adjustments from RealEarth. The labels option allows Google Maps and other masks to be overlaid.
Following up on the previous post about exploring the Darksky API, I've developed a short notebook that allows for the collection of near-real-time data from geostationary satellites. The launch of the GOES-R satellites in 2016 came with (I collect) heaps of excitement from the earth imaging community, as these satellites provide real-time, high-resolution imagery from a number of channels of interest. Many of these data products are overlaid on maps by the RealEarth project out of the University of Wisconsin-Madison. One cool use that is not dissimilar from mine is the FogToday project by Logan Williams.
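Here is a minimal sketch of that kind of periodic collection, using only the standard library. The endpoint path and the query parameter names (`products`, `center`, `zoom`, `width`, `height`) are my assumption about the RealEarth image API and should be checked against its documentation. The default zoom, image size, output folder, and `karl_...png` file naming are also hypothetical. The channel name is the one used in this post.

```python
import time
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

# ASSUMPTION: endpoint and parameter names are unverified guesses at the
# RealEarth image API; check them before relying on this.
BASE_URL = "https://realearth.ssec.wisc.edu/api/image"
SF_CENTER = (37.77, -122.42)  # approximate latitude/longitude of San Francisco


def image_url(product: str = "G16-ABI-CONUS-BAND02",
              center: tuple[float, float] = SF_CENTER,
              zoom: int = 8, width: int = 512, height: int = 512) -> str:
    """Build a request URL for the latest image of one product (hypothetical defaults)."""
    params = {
        "products": product,
        "center": f"{center[0]},{center[1]}",
        "zoom": zoom,
        "width": width,
        "height": height,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def collect(out_dir: str, interval_s: int = 600, n_polls: int = 6) -> list[Path]:
    """Download one image per poll into UTC-timestamped files, sleeping between polls."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i in range(n_polls):
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
        path = out / f"karl_{stamp}.png"
        urllib.request.urlretrieve(image_url(), path)
        saved.append(path)
        if i < n_polls - 1:
            time.sleep(interval_s)
    return saved
```

If `collect` is run repeatedly from a scheduler, it builds a longer record than the few days that RealEarth keeps.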
GOES satellite imagery is stored in a number of places, including AWS and Google Cloud, both of which keep ~100-terabyte (something like 120-day) historical datastreams for download. RealEarth has a much smaller dataset, on the order of a few days, but it has a very simple API. By building a pipeline that periodically collects data from the API, it should be possible to build imagery databases over time. That seems like a good starting point, so in this linked notebook I show how to collect specific imagery data (in this case, cloud/fog data) from the API. These images are then read in and converted to flattened numpy arrays, which can be used for downstream applications such as ML. Below is an example image from August 8, 2018 at 5pm, centered on San Francisco and imaged in the G16-ABI-CONUS-BAND02 channel. As you can see, Karl is making an appearance.
The Bay Area midsummer is heralded by the arrival of a monolithic, tempestuous atmospheric phenomenon known colloquially as 'Karl the Fog'. It is massive and beautiful. It charges up San Francisco's western slopes and through the Golden Gate in the late morning, erupts vertically into mist from Twin Peaks in the afternoon, and on hotter days cascades down the City's eastern slopes late in the day. From the East Bay, it can occlude the City entirely, giving the impression that there is no San Francisco at all; from the inverted perspective of the City, there is no outside.
This phenomenon is caused by the geography and weather patterns of the area. California's interior forms a sprawling, continuous valley in which summer temperatures regularly climb into the triple digits. As this hot air rises, it draws in new air to replace it, from the ocean and through the mile-wide Golden Gate. It is the condensation of water vapor in this cool, moist ocean air that gives shape to Karl. Given this relatively simple model of how Karl forms, it may be possible to forecast the fog (thus, Karlcast) using weather data from the Central Valley. This data could be combined with historical fog data, for example from the geostationary satellite GOES-16, to uncover the relationship between heat in the Central Valley and the movement and growth of Karl.
To begin to approach this problem, I've written some python to collect real-time and historical weather data from the Darksky API. This code is available as a jupyter notebook on my github. Below are some visualizations, made with bokeh, of the weather data that I've collected. The first plot shows the last 120 days of hourly temperature and humidity data from Stockton, the endpoint of the SF Bay system. The second plot shows the relationship between temperature and humidity during this time. As could be expected, the two are negatively correlated: more humid days are cooler.
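Here is a rough sketch of both steps: requesting the history one day at a time, and checking the temperature/humidity correlation. The URL follows Dark Sky's 'Time Machine' format (`/forecast/[key]/[latitude],[longitude],[time]`) as I understand it. The Dark Sky API has since been shut down, so treat the request part as illustrative. The Stockton coordinates are approximate. `time_machine_urls` and `pearson` are hypothetical helpers, not the notebook's code.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Optional

# Approximate coordinates of Stockton, CA (rounded; an assumption).
STOCKTON = (37.96, -121.29)


def time_machine_urls(api_key: str, days: int = 120,
                      end: Optional[datetime] = None) -> list[str]:
    """One Time Machine URL per day for the `days` days before `end`, oldest first.

    Assumes each historical request returns a full day of hourly data,
    so one request per day is enough to rebuild an hourly series.
    """
    end = end or datetime.now(timezone.utc)
    lat, lon = STOCKTON
    urls = []
    for d in range(days, 0, -1):
        t = int((end - timedelta(days=d)).timestamp())  # UNIX time
        urls.append(f"https://api.darksky.net/forecast/{api_key}/{lat},{lon},{t}")
    return urls


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

As I recall, each hourly entry in a Dark Sky response carried `temperature` and `humidity` fields. Pairing those up across the whole window and passing them to `pearson` should give a negative coefficient if the relationship in the second plot holds.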