Demystifying big data

How Target predicts pregnancy—and how it’s transforming transportation

Michael Pack / August 03, 2016

The term Big Data (BD) has become ambiguous, confusing and often misused in the transportation industry. It is a broad term that has become as much a marketing strategy and industry buzzword as anything else. There’s big money in BD, but what’s all the hype about? Is it really something that the transportation industry can benefit from? If so, how?

Industries outside of transportation are already using BD methodologies to change the way they serve, target or build brand loyalty among their customers. For example, the mega retailer Target employs data scientists and statisticians to better leverage its wide-ranging data assets, including credit card transactions, online searches, online and in-store registries (weddings, babies, etc.), demographics, market research, stock/inventory and social media buzz about products and the company brand.

BD methodologies and analytics are then used to analyze these data, find patterns and even predict shoppers’ future habits. In a famous example of the reach and often surprising insights gained from BD analytics, a front-page New York Times article in 2012 explained how Target predicts that if a 23-year-old woman (we’ll call her Jenny) living in Atlanta, Ga., purchases the following items in March:

  • Cocoa butter lotion;
  • A large purse;
  • Zinc and magnesium supplements; and
  • A blue rug.

Then there is an 87% chance that she is pregnant with a baby boy and that her due date is in late August.

Target could leverage this knowledge to its advantage by mailing Jenny customized coupons and fliers for 20% off all baby items. Because Target predicts Jenny is having a baby boy, blue colors could be used in the flier, which would also show photos of women about Jenny’s age/demographic holding a baby boy, and include all of the products that Target predicts Jenny will need to buy in the coming months. All of this information is based on market research into the known behavior of those with Jenny’s income, level of education, geography and purchasing habits.

This is only one small, but powerful, example of Target’s leveraging of BD concepts (statistics, data science, behavior science, data fusion, information technology) to better manage its business and better serve its customers. While some may consider these types of BD analytics to be an invasion of privacy, in practice they translate to brand loyalty, increased sales and much more for the retailer.


How this translates to the transportation industry

The transportation industry is awash with data. Advancements in technology have provided accessibility to national and local datasets never before available to the transportation community. These new datasets are growing bigger and more diverse every day and include:

  • Real-time data from on-board vehicular sensors that support connected and autonomous vehicle technologies;
  • Traveler location data from mobile devices and transportation infrastructure; and
  • Remote-sensing data with a range of spatial, temporal and spectral resolutions that promise to transform how infrastructure and the surrounding environment are measured.

So, if the data exists, why then is the transportation industry so far behind Target’s BD approaches to serving customers and improving operations? First, we’ve used these data for individual purposes. We’ve failed to combine them in ways that reveal unsuspected or unobserved patterns that could ultimately lead to a more efficient, safe and sustainable transportation system.

Second, there is a gulf between the BD and transportation communities. The gulf remains in part because:

  • Many agencies have strict guidelines on what technologies can and cannot be used within their own enterprises;
  • An agency may lack the personnel resources and know-how to develop and maintain big-data technologies;
  • Many of these big-data archives are large, and agencies may lack the ability to procure storage and processing capacity; and
  • Internal politics may exist within an agency that cast investments in big-data technologies and tools as risky when leadership does not perceive an immediate tangible benefit.

Transportation systems management and operation (TSM&O) strategies in most states have, to date, been widely based on the same old rudimentary data and outdated analytical concepts. The standard transportation operations approach can be oversimplified as:

The most advanced systems in operation today attempt to fill in knowledge gaps and/or be more predictive in nature through the use of simulation technologies. However, even these systems don’t start truly predicting the effects of operator actions until some event has already happened. So, even in the best cases, we are still reactive rather than proactive. Most analytics used on these data only provide minimal insights and point out what has already happened, not how to better react and anticipate before and during the next event.

BD technologies like the ones used in the Target example have the power to change this old way of thinking.

BD as a framework

To many, BD is about size and complexity of data; to others, it’s about storage and retrieval technologies that make it possible to explore large datasets. But BD is really just a new way of thinking, acting, deriving insights, and even communicating. BD is a framework that can be used as a platform to optimize decision-making of practitioners in transportation. It is about methodologies, tools and techniques that enable knowledge discovery and accessibility. These enablers can allow us to do great things using existing data that we haven’t been able to do before—at least not easily or not without significant investments. BD centers on new ways of leveraging data to find correlation and meaning where we might not have thought of looking previously.
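As a small illustration of finding correlation across datasets that are usually kept apart, the sketch below lines up hourly speeds with hourly rainfall and measures how strongly they move together. Everything in it is assumed for the example: the dataset names, the hourly values and the helper function pearson are hypothetical, and a real analysis would need far more data and care before drawing any conclusion.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical datasets keyed by hour of day (made-up values).
speed_mph = {8: 42.0, 9: 55.0, 10: 61.0, 11: 38.0}
rain_mm = {8: 4.0, 9: 1.0, 10: 0.0, 11: 6.0, 12: 2.0}

# Fuse the two sources on the hours they have in common.
shared_hours = sorted(speed_mph.keys() & rain_mm.keys())
r = pearson(
    [rain_mm[h] for h in shared_hours],
    [speed_mph[h] for h in shared_hours],
)
print(f"hours compared: {shared_hours}, correlation: {r:.2f}")
```

With these made-up numbers the correlation is strongly negative: the wetter hours are the slower ones. Neither dataset shows that on its own.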

A graphic representation of the CATT Lab’s Regional Integrated  Transportation Information System.

The CATT Lab

The CATT Lab operates the largest transportation BD fusion, sharing and analytics platform in the U.S. This platform, called the Regional Integrated Transportation Information System (RITIS), continually ingests over 6 billion real-time streaming data records per day including those from commercial probe-based speed/travel-time vendors such as Here, INRIX, TomTom and Google/Waze; high-definition signal data; social media/crowdsourced data; connected-vehicle data; credit card point of sale transactions; first responder computer-aided dispatch; first responder radio communications; traffic volume measurements; incident and event data; and origin-destination reidentification data.

By the close of 2016, this number is expected to jump to over 8 trillion records per day. An additional 250,000 commercial vehicles will also provide RITIS with the following real-time streaming data: hard-braking events; antilock braking system (ABS) engagement; stability control; temperature sensor measurements; wiper use; headlight use; air bag deployments; seatbelt use; tire pressure sensor readings; and emissions and oxygen sensor data.

As these data grow bigger and more diverse each day, the transportation industry is increasingly overwhelmed. Providing data, tools and knowledge to empower transportation professionals and researchers to extract previously untouchable information from these datasets is critical.

Data, tools and domain expertise are the three components required to effectively use BD in RITIS. Without these three basic components, potential user communities will be unable to derive the meaningful insights that help to make informed decisions, derive new knowledge and ultimately move both the industry and society forward. Within RITIS and the CATT Laboratory, focus is on the visualization of BD, as most people’s ability to understand and work with large numbers and data is extremely limited.

The RITIS platform uses the latest Hadoop-style technologies mixed with relational databases to enable users to ask demanding questions through a suite of visual analytics, multivariate high-dimensionality statistical analytics, and robust APIs and download tools that aim to make the data more accessible to end users. All of these applications are available through web interfaces. The ultimate goal of RITIS has been to make these data as easily accessible and understandable to as many end users as possible to allow for insights and knowledge discovery.

A key innovation in RITIS, enabled by new BD methodologies, is the realization of self-creating, dynamic and predictive dashboards and visual analytics.

Whereas most analytics and dashboard systems force the users to configure settings, define parameters and otherwise ask questions of the system, RITIS is employing forward-thinking, machine-learning BD concepts that dynamically and continually analyze incoming data, compare it to historic data, search for patterns, look for correlations and/or statistical outliers and then report out on only the interesting and important elements that are discovered. In this sense, the RITIS dashboards are “self-creating” and ever-changing. They don’t rely solely on humans to ask questions. They are predictive. They are intelligent. They are searching for problems and patterns that humans might not ever think to ask about.
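As a rough illustration of the compare-to-history step described above, the sketch below flags incoming readings that fall far outside a road segment’s historical range and reports only those. It is a minimal example under stated assumptions, not the RITIS implementation: the function name flag_outliers, the segment labels, the speed values and the three-standard-deviation threshold are all hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(history, incoming, threshold=3.0):
    """Return (segment, value, z_score) for readings far from that segment's history.

    history:  dict mapping segment name -> list of past readings
    incoming: dict mapping segment name -> latest reading
    """
    flagged = []
    for segment, value in incoming.items():
        past = history.get(segment, [])
        if len(past) < 2:
            continue  # not enough history to estimate spread
        mu = mean(past)
        sigma = stdev(past)
        if sigma == 0:
            continue  # constant history; a z-score is undefined
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            flagged.append((segment, value, round(z, 2)))
    return flagged


# Hypothetical speeds in mph (made-up values).
history = {
    "segment_A": [61, 59, 60, 62, 58, 60],
    "segment_B": [45, 47, 44, 46, 45, 46],
}
incoming = {"segment_A": 60, "segment_B": 18}

report = flag_outliers(history, incoming)
for segment, value, z in report:
    print(f"{segment}: reading {value} is {z} standard deviations from normal")
```

Here only segment_B is reported, because its incoming reading falls far below its history. segment_A stays silent because nothing unusual is happening there, and the user never had to ask.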

The end result is a more informed government, a better served citizen, and more efficient, sustainable and safer mobility services.

Big is relative

Big data methodologies are for everyone regardless of the size of your data. Big is relative. Many agencies struggle to process any amount of data that is larger than can be managed in an Excel worksheet. For them, BD is anything that exceeds Excel’s worksheet limits.
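To make the Excel comparison concrete: a modern Excel worksheet holds at most 1,048,576 rows. The sketch below shows one common way past that ceiling, reading records one at a time and keeping only running totals, so the file never has to fit in a worksheet or in memory. The column names segment and speed_mph, the function name and the sample values are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

EXCEL_MAX_ROWS = 1_048_576  # row limit of a modern Excel worksheet


def mean_speed_by_segment(rows):
    """Stream rows once, keeping only a running [total, count] per segment."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        entry = totals[row["segment"]]
        entry[0] += float(row["speed_mph"])
        entry[1] += 1
    return {seg: total / count for seg, (total, count) in totals.items()}


# A tiny in-memory stand-in for a file that could have far more rows
# than EXCEL_MAX_ROWS; csv.DictReader reads any open file the same way.
sample = io.StringIO("segment,speed_mph\nA,60\nA,50\nB,30\n")
averages = mean_speed_by_segment(csv.DictReader(sample))
print(averages)
```

Because only one row is held at a time, the same function works unchanged whether the input has three rows or three hundred million.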

What’s important to remember is we’re all using the same methodologies, hardware platforms and technologies to advance our industry for the better. With BD technologies, we can work on bigger problems faster. We can leverage the power of integrating traditional transportation datasets with non-transportation data. We start to answer questions we never even knew we needed to ask. We will thus finally begin to move away from being a reactive industry to being a proactive one.

About the Author

Pack is director of the University of Maryland CATT Laboratory.
