Demystifying big data

Aug. 3, 2016

How Target predicts pregnancy—and how it’s transforming transportation

The term Big Data (BD) has become ambiguous, confusing and often misused in the transportation industry. It is a broad term that has become as much a marketing strategy and industry buzzword as anything else. There’s big money in BD, but what’s all the hype about? Is it really something that the transportation industry can benefit from? If so, how?

Industries outside of transportation are already using BD methodologies to change the way they serve, target or build brand loyalty among their customers. For example, the mega retailer Target employs data scientists and statisticians to better leverage its wide-ranging data assets, including credit card transactions, online searches, online and in-store registries (weddings, babies, etc.), demographics, market research, stock/inventory and social media buzz about products and the company brand.

BD methodologies and analytics are then used to analyze these data, find patterns and even predict shoppers' future habits. In a famous example of the reach and often surprising insights gained from BD analytics, a front-page New York Times article in 2012 explained how Target predicts that if a 23-year-old woman (we’ll call her Jenny) living in Atlanta, Ga., purchases the following items in March:

  • Cocoa butter lotion;
  • A large purse;
  • Zinc and magnesium supplements; and
  • A blue rug.

Then there is an 87% chance that she is pregnant with a baby boy and that her due date is in late August.

Target could leverage this knowledge to its advantage—mailing customized coupons and fliers to Jenny for 20% off all baby items. Because Target predicts Jenny is having a baby boy, blue colors could be used in the flier, which would also show photos of women about Jenny’s age/demographic holding a baby boy, and include all of the products that Target predicts Jenny will need to buy in the coming months. All of this information is based on market research into the known behavior of those with Jenny’s income, level of education, geography and purchasing habits.

This is only one small, but powerful, example of Target’s leveraging of BD concepts—statistics, data science, behavior science, data fusion, information technology—to better manage its business and better serve its customers. While some may consider these types of BD analytics to be an invasion of privacy, in practice it translates to brand loyalty, increased sales and much more for the retailer.

How this translates to the transportation industry

The transportation industry is awash with data. Advancements in technology have provided access to national and local datasets never before available to the transportation community. These new datasets are growing bigger and more diverse every day and include:

  • Real-time data from on-board vehicular sensors that support connected and autonomous vehicle technologies;
  • Traveler location data from mobile devices and transportation infrastructure; and
  • Remote-sensing data with a range of spatial, temporal and spectral resolutions that promise to transform how infrastructure and the surrounding environment are measured.

So, if the data exists, why then is the transportation industry so far behind Target’s BD approaches to serving customers and improving operations? First, we’ve used these data for individual purposes. We’ve failed to combine them in ways that reveal unsuspected or unobserved patterns that could ultimately lead to a more efficient, safe and sustainable transportation system.

Second, there is a gulf between the BD and transportation communities. The gulf remains in part because:

  • Many agencies have strict guidelines on what technologies can and cannot be used within their own enterprises;
  • An agency may lack the personnel resources and know-how to develop and maintain big-data technologies;
  • Many of these big-data archives are large, and agencies may lack the ability to procure storage and processing capacity; and
  • Internal politics within an agency may cast investments in big-data technologies and tools as risky when leadership does not perceive an immediate, tangible benefit.

Transportation systems management and operation (TSM&O) strategies in most states have, to date, been widely based on the same old rudimentary data and outdated analytical concepts. The standard transportation operations approach can be oversimplified as:

The most advanced systems in operation today attempt to fill in knowledge gaps and/or be more predictive in nature through the use of simulation technologies. However, even these systems don’t start truly predicting the effects of operator actions until some event has already happened. So, even in the best cases, we are still reactive rather than proactive. Most analytics used on these data only provide minimal insights and point out what has already happened, not how to better react and anticipate before and during the next event.

BD technologies like the ones used in the Target example have the power to change this old way of thinking.

BD as a framework

To many, BD is about size and complexity of data; to others, it’s about storage and retrieval technologies that make it possible to explore large datasets. But BD is really just a new way of thinking, acting, deriving insights, and even communicating. BD is a framework that can be used as a platform to optimize decision-making of practitioners in transportation. It is about methodologies, tools and techniques that enable knowledge discovery and accessibility. These enablers can allow us to do great things using existing data that we haven’t been able to do before—at least not easily or not without significant investments. BD centers on new ways of leveraging data to find correlation and meaning where we might not have thought of looking previously.

A graphic representation of the CATT Lab’s Regional Integrated Transportation Information System.

The CATT Lab

The CATT Lab operates the largest transportation BD fusion, sharing and analytics platform in the U.S. This platform, called the Regional Integrated Transportation Information System (RITIS), continually ingests over 6 billion real-time streaming data records per day including those from commercial probe-based speed/travel-time vendors such as Here, INRIX, TomTom and Google/Waze; high-definition signal data; social media/crowdsourced data; connected-vehicle data; credit card point of sale transactions; first responder computer-aided dispatch; first responder radio communications; traffic volume measurements; incident and event data; and origin-destination reidentification data.

By the close of 2016, this number is expected to jump to over 8 trillion records per day. An additional 250,000 commercial vehicles also will provide to RITIS the following real-time streaming data: hard-braking events; antilock braking system (ABS) engagement; stability control; temperature sensor measurements; wiper use; headlight use; air bag deployments; seatbelt use; tire pressure sensor readings; and emissions and oxygen sensor data.

As these data grow bigger and more diverse each day, the transportation industry is increasingly overwhelmed. Providing data, tools and knowledge to empower transportation professionals and researchers to extract previously untouchable information from these datasets is critical.

Data, tools and domain expertise are the three components required to effectively use BD in RITIS. Without these three basic components, potential user communities will be unable to derive the meaningful insights that help to make informed decisions, derive new knowledge and ultimately move both the industry and society forward. Within RITIS and the CATT Laboratory, the focus is on the visualization of BD, as most people’s ability to understand and work with large numbers and data is extremely limited.

The RITIS platform uses the latest Hadoop-style technologies mixed with relational databases to enable users to ask demanding questions through a suite of visual analytics, multivariate high-dimensionality statistical analytics, and robust APIs and download tools that aim to make the data more accessible to end users. All of these applications are available through web interfaces. The ultimate goal of RITIS has been to make these data as easily accessible and understandable to as many end users as possible to allow for insights and knowledge discovery.

A key innovation in RITIS, enabled by new BD methodologies, is the realization of self-creating, dynamic and predictive dashboards and visual analytics.

Whereas most analytics and dashboard systems force the users to configure settings, define parameters and otherwise ask questions of the system, RITIS is employing forward-thinking, machine-learning BD concepts that dynamically and continually analyze incoming data, compare it to historic data, search for patterns, look for correlations and/or statistical outliers and then report out on only the interesting and important elements that are discovered. In this sense, the RITIS dashboards are “self-creating” and ever-changing. They don’t rely solely on humans to ask questions. They are predictive. They are intelligent. They are searching for problems and patterns that humans might not ever think to ask about.
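To make the idea concrete, the short Python sketch below shows one simple way a system might compare incoming readings to historic data and report only the statistical outliers. It is an illustration under stated assumptions, not a description of how RITIS is built: the function name find_outliers, the segment names, the speed values and the z-score threshold of 3 are all hypothetical, and a production system would likely use far richer models.

```python
from statistics import mean, stdev


def find_outliers(historic, incoming, threshold=3.0):
    """Flag incoming readings that sit far from the historic norm.

    historic:  dict mapping a segment name to a list of past readings
    incoming:  dict mapping a segment name to its newest reading
    threshold: hypothetical cutoff, in standard deviations (z-score)
    """
    flagged = []
    for segment, value in incoming.items():
        history = historic.get(segment, [])
        if len(history) < 2:
            continue  # too little history to estimate a spread
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # history never varied, so a z-score is undefined
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            flagged.append((segment, value, round(z, 2)))
    # Report the most unusual readings first.
    flagged.sort(key=lambda item: abs(item[2]), reverse=True)
    return flagged


# Made-up speeds (mph) for two hypothetical road segments.
historic_speeds = {
    "segment_a": [61, 59, 60, 62, 58, 60],
    "segment_b": [45, 47, 44, 46, 45, 46],
}
incoming_speeds = {"segment_a": 60, "segment_b": 12}

alerts = find_outliers(historic_speeds, incoming_speeds)
print(alerts)
```

In this sketch only segment_b is reported, because its newest speed falls far below its own history, while segment_a stays silent. No one had to ask about segment_b; the comparison surfaced it.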

The end result is a more informed government, a better served citizen, and more efficient, sustainable and safer mobility services.

Big is relative

Big data methodologies are for everyone regardless of the size of your data. Big is relative. Many agencies struggle to process any amount of data that is larger than can be managed in an Excel worksheet. For them, BD is anything that exceeds Excel’s worksheet limits.

What’s important to remember is we’re all using the same methodologies, hardware platforms and technologies to advance our industry for the better. With BD technologies, we can work on bigger problems faster. We can leverage the power of integrating traditional transportation datasets with non-transportation data. We start to answer questions we never even knew we needed to ask. We will thus finally begin to move away from being a reactive industry to being a proactive one.

About The Author: Pack is director of the University of Maryland CATT Laboratory.
