Alternative Data Use Cases: Where Investment Strategies Meet Big Data

Alternative data refers to information sourced from non-traditional streams outside standard financial reporting. This encompasses everything from satellite imagery of store parking lots to sentiment analysis of tweets. The applications for investors are far reaching even if the concept remains relatively new.

While some progressive hedge funds tapped alternative data sources over the past decade, adoption has rapidly accelerated. Recent estimates peg the global alternative data market at over $187 billion in 2021 – a figure expected to nearly double to $334 billion by 2028, which implies a compound annual growth rate of roughly 9%.

Myriad Factors Driving the Alternative Data Explosion

So what explains this explosion in behind-the-scenes data fueling Wall Street? At the highest level, increased complexity calls for enhanced signals. Quant funds and institutional managers alike seek to augment evaluations of company and sector performance as markets grow more opaque.

Specifically, the accelerated rise of factors like electronic trading, algorithmic management and high frequency strategies demands ever more real-time data flows. Millisecond advantages now separate profitable trades from losers – forcing funds into a technological arms race.

At the same time, expansion of the private markets creates huge demand for non-public data on unicorns like Stripe or SpaceX. Even basic operating metrics prove elusive on these closely held darlings. Again, alt data provides visibility.

Finally, the core driver remains pursuing untapped alpha. With public datasets rapidly commoditized, obscure alternative data vectors allow quant managers to maintain an analytical edge over rivals.

This guide explores exactly how funds leverage various alternative data types toward this end of outsized returns. Let's examine the top 5 emerging use cases where alt data intersects with investment strategies:

Top 5 Alternative Data Use Cases in Investing

1. Social Media Networks: The Voices of the Market

Scraping intelligence from social platforms represents a highly popular alternative data approach. Platforms like Twitter, Reddit and StockTwits contain rich sentiment data and discourse on financial topics from retail investors, institutions, analysts and media personalities.

Using natural language processing (NLP), quantitative strategies can parse large textual datasets to detect patterns that anticipate price movements or volatility shifts. Basic sentiment analysis tags commentary as bullish, bearish or neutral. More advanced NLP classifies tone, urgency and emotional intensity using linguistic models. For example, urgency detected in bearish chatter often proves highly valuable as a contrarian indicator.
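To make the basic tagging step concrete, here is a minimal sketch of a lexicon-based tagger. The word lists and the function names (`tag_sentiment`, `net_sentiment`) are made up for illustration. Production systems typically rely on much larger lexicons or trained language models, so treat this as a toy version of the idea and not as how any particular fund does it.

```python
import re

# Hypothetical word lists for illustration only; real systems use far larger
# lexicons or trained language models.
BULLISH_WORDS = {"buy", "long", "moon", "breakout", "upgrade", "beat"}
BEARISH_WORDS = {"sell", "short", "crash", "dump", "downgrade", "miss"}


def tag_sentiment(post: str) -> str:
    """Label a post 'bullish', 'bearish' or 'neutral' by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", post.lower())
    bullish_hits = sum(token in BULLISH_WORDS for token in tokens)
    bearish_hits = sum(token in BEARISH_WORDS for token in tokens)
    if bullish_hits > bearish_hits:
        return "bullish"
    if bearish_hits > bullish_hits:
        return "bearish"
    return "neutral"


def net_sentiment(posts: list[str]) -> float:
    """Return (bullish posts - bearish posts) / total posts, a score in [-1, 1]."""
    if not posts:
        return 0.0
    labels = [tag_sentiment(post) for post in posts]
    return (labels.count("bullish") - labels.count("bearish")) / len(labels)


if __name__ == "__main__":
    sample = ["Time to buy the breakout", "Analysts downgrade after earnings miss"]
    print([tag_sentiment(post) for post in sample], net_sentiment(sample))
```

A score like `net_sentiment` could be tracked per ticker over time. A lexicon this simple cannot capture the tone and urgency measures mentioned above, which need a richer model.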

TheLimitNews built a stocks chatbot named FinBrain that performs sentiment analysis in real-time across financial subreddits and StockTwits. FinBrain achieved 72% directional accuracy on next-day price movements for covered tickers. The bot continues ingesting user commentary across social platforms, evolving its NLP model.

Of course challenges abound around sampling, bots and misinformation with social data. Critics also highlight risks like the GameStop short squeeze erupting from Reddit threads. Still, money managers allocate significant resources toward social alternative data for its real-time purview into market psychology.

2. Supply Chain Analysis: Following the Transaction Trail

Global supply chains now function as the circulatory system of the world economy. Consequently, data trails documenting supply chain flows unlock business insights and predictive signals for commodity prices, freight costs, retail performance and beyond. By tapping into transport data, storage infrastructure patterns and inventory levels, investors better track sector dynamics and company fundamentals.

Supply chain data applications span:

Satellite monitoring – Leveraging image feeds from companies like Orbital Insight which perform pattern analysis on activity metrics for storage facilities, ports, and transport assets

IoT sensor analytics – Drawing telemetry from container fleets, logistics infrastructure and factory equipment using Internet of Things connectivity

Blockchain transaction analysis – Utilizing shared ledgers to trace procurement flows, verify certifications and establish provenance across supply network stakeholders

Esteemed hedge fund manager Paul Tudor Jones proclaimed that data from the supply chain will be as important as annual report figures. Retail earnings announced yesterday rely on activity from months earlier. Real-time data on consumer demand and order fulfillment proves more telling.

Predicting box office figures for major film releases offers an apt example. One analytics firm built a model integrating trailer sentiment, social media momentum, director/cast data, genre patterns and historical comparisons. This hybrid dataset generates forecasts comparable to insider advance ticket sales – allowing traders to make bets on entertainment stocks ahead of weekend receipts.

Supply chain integration and IoT expansion will only expand this crucial alternative data category.

3. Internet of Things: From Sensor to Signal

Speaking of IoT advancement, streaming sensor data represents an alt data category brimming with potential. Connecting objects through embedded sensors and internet connectivity unlocks quantification capabilities previously unimaginable. Machines, products, infrastructure and assets now create data trails documenting performance, usage, environmentals and more.

Manufacturing proves an ideal testing ground for IoT analytics thanks to assets like connected shop floors and networked equipment. IoT pioneer Axis Technologies helped specialty chemicals company Reflex improve utilization by analyzing sensor and process data in their polymer production network. Alternative data showed certain equipment bottlenecks depressing yields. By optimizing workflow balancing, Reflex increased utilization over 20% – an efficiency gain benefiting margins and multiplier potential as output scaled up.

This example demonstrates how IoT data grants visibility into production efficiencies, informing tactics like predictive equipment maintenance, inventory balancing and automation opportunities. McKinsey estimates factory efficiency use cases for IoT could unlock over $3 trillion in value through higher productivity and lower costs.

Now consider consumer IoT data documenting usage patterns and feature preferences from networked products and applications. Again, this alternative data monitors engagement, delight and usability – all vital to cash flow forecasting for investors.

On a macro level, surges in electric vehicle registrations as collected by Crossbow Automotive or wind turbine operations data from Angel Aero detect important demand shifts. The buildout of alt data infrastructure has only just begun.

4. Web Trends: Quantifying What's Cool

A variety of web analytics offer investor utility for gauging brand traction including search trends, site traffic, consumer reviews and earned media momentum. As digital presence increasingly mirrors commercial success, tracking web data provides vital demand signals.

Of note, web data offers crucial early visibility by its real-time nature when contrasted against quarterly earnings reports. The street already knows about your ecommerce sales explosion by the time financials reflect it for the last quarter. But rapid search and traffic upticks this week reveal your viral moment as it's unfolding.

Web data also assists investors by differentiating situational signals from noise. For instance, a single sensationalized news mention creates limited impact. But sustained elevation across traffic, search queries, review sentiment and press citations signal meaningful momentum.
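One way to encode the difference between a one-off spike and sustained momentum is to require several metrics to sit above their own baselines for several periods in a row. The sketch below does that with illustrative defaults: a 1.5x threshold, a four-period baseline, three confirming metrics and a two-period streak. Those are assumptions, not recommended settings, and the function names are hypothetical.

```python
def is_elevated(series: list[float], baseline_periods: int, threshold: float) -> list[bool]:
    """Flag each period after the baseline window whose value exceeds
    threshold times the mean of the baseline window."""
    baseline = sum(series[:baseline_periods]) / baseline_periods
    return [value > threshold * baseline for value in series[baseline_periods:]]


def sustained_momentum(
    metrics: dict[str, list[float]],
    baseline_periods: int = 4,
    threshold: float = 1.5,
    min_metrics: int = 3,
    min_streak: int = 2,
) -> bool:
    """Return True when at least `min_metrics` metrics are elevated together
    for `min_streak` consecutive periods. Assumes equal-length series."""
    flags = [is_elevated(s, baseline_periods, threshold) for s in metrics.values()]
    streak = 0
    for period_flags in zip(*flags):
        # Extend the streak only if enough metrics confirm each other this period.
        streak = streak + 1 if sum(period_flags) >= min_metrics else 0
        if streak >= min_streak:
            return True
    return False


if __name__ == "__main__":
    one_off = {
        "traffic": [100, 100, 100, 100, 400, 100, 100],
        "search": [50, 50, 50, 50, 50, 50, 50],
        "reviews": [10, 10, 10, 10, 10, 10, 10],
    }
    print(sustained_momentum(one_off))
```

In the usage example only one metric jumps, and only for a single period, so the check returns False. Sustained lifts across traffic, search and reviews would flip it to True.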

Casting a wider alternative data net across multiple websites also improves result integrity. Scraping reviews from several retailers, and not from Amazon alone, avoids the bias of sampling a single ecommerce platform that happens to report faster. Comparing those patterns against discussion among technicians on Yahoo Finance forums helps prevent false signals.

In practice, web data informs everything from M&A targets (private company demand surging) to bankruptcy candidates (brand fading rapidly). Elite hedge funds spend heavily to access specialist web data feeds from players like SimilarWeb and Apptopia. Expect this alt category to grow in sophistication and impact.

5. Foot Traffic: Quantifying Store Activity

One largely untapped alternative data category set for expansion captures physical store traffic through sensors, cameras and mobile data. Contactless payment systems like Square track checkout velocity. Apps with location access like Foursquare chronicle store visitation tallies. Retail optimization platforms like Placer.ai deploy camera analytics to monitor foot traffic.

As digital and physical retail channels converge, benchmarking IRL and URL traffic provides invaluable mirroring for investor decisions. Site clicks preview checkout conversions. Foot traffic provides the even earlier demand signal.

Suppose an athletic footwear brand launches an aggressive Instagram marketing campaign for a new running shoe with sleep tracking features. Conversion uplift directly tracks the immediate sales impact on their ecommerce site. But before clicks even occur, the first indication appears through increased foot traffic and 'pings' recorded from smartphones entering brick-and-mortar locations. Investors monitoring this physical traffic alternative data obtain the earliest possible reads on initiative success.
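Whether foot traffic really leads sales for a given brand is an empirical question. A simple first check is a lead-lag correlation: shift the foot traffic series forward by 0, 1, 2... periods and see which shift lines up best with sales. The sketch below is one way to do that with plain Python. The function names and the toy numbers in the usage example are invented for illustration.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series with non-zero variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def best_lead(leading: list[float], lagging: list[float], max_lag: int) -> tuple[int, float]:
    """Return (lag, correlation) for the lag in 0..max_lag at which `leading`,
    shifted forward by that many periods, correlates most strongly with
    `lagging`. Assumes both series have the same length."""
    results = []
    for lag in range(max_lag + 1):
        shifted_leading = leading[: len(leading) - lag]
        aligned_lagging = lagging[lag:]
        results.append((lag, pearson(shifted_leading, aligned_lagging)))
    return max(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Invented weekly figures: sales echo foot traffic two weeks later.
    foot_traffic = [10, 12, 30, 45, 20, 11, 10, 13, 28, 40]
    sales = [20.0, 22.0] + [2.0 * visits for visits in foot_traffic[:8]]
    print(best_lead(foot_traffic, sales, max_lag=4))
```

A high correlation at a positive lag is suggestive at best. With short histories, a result like this should be confirmed out of sample before anyone trades on it.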

Looking ahead, location datasets and mobility analytics sit positioned for hockey stick adoption curves as IoT proliferation takes hold. Apps sharing user insight like SafeGraph or Unacast already contribute data augmenting macro retail sector analytics. Extrapolate similar opt-in tracking across industrial sites, commercial facilities and smart urban infrastructure and investable geographical intelligence abounds.

The democratization of location data beckons. Its alternative status days are numbered.


Now that we've spanned major alternative data categories from social platforms to sensor systems, how do investment teams actually capture and activate these external signals? The next section details key elements of assembling a high-functioning alternative data stack.

Constructing the Alternative Data Stack

Streamlining an institutional alternative data pipeline requires specific capabilities for intake, enrichment and modeling centered around three pillars:

1. Data Aggregation – Leveraging APIs, bulk transfers, web scrapers and specialty feeds to ingest target alt data signals in a consistent format.

2. Quantitative Analysis – Running statistical analytics, machine learning and AI techniques on aggregated alternative data to derive alpha factors and signals.

3. Model Productionalization – Finalizing reproducible models and metrics to assist portfolio management, risk analysis and position cost basis.

Examining each layer in detail:

Pillar 1: Data Aggregation

Data aggregation forms the foundation for downstream alternative data usage. Key requirements include:

  • Connectivity – Accessing streams via cloud or on-premise connectivity from origination points like social media firehose, supply chain EDI messages, sensor telemetry protocols etc. Options span batch uploads, APIs, FTP streams, web scrapers and commercial feeds.

  • Pre-processing – Cleaning raw sourcing for analysis suitability via parsing, standardization, error correction etc.

  • Storage – Landing aggregated alternative data in specialized cloud data lake architectures for transformation. Snowflake, AWS S3 and Google BigQuery provide examples of scalable cloud storage engines tailored for these workloads.

Once aggregated and stored, alternative data feeds downstream for further synthesis.
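As a rough illustration of the pre-processing step, the sketch below maps two differently shaped raw payloads onto one shared record type and drops exact duplicates. The `Observation` schema, the payload field names and the function names are all assumptions made up for this example. Real feeds will look different.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """Hypothetical common schema for any alternative data point."""
    source: str
    entity: str
    metric: str
    value: float
    observed_at: datetime  # always timezone-aware UTC


def from_social_post(raw: dict) -> Observation:
    # Assumed payload shape: {"ticker": " abc ", "score": "0.4", "ts": 1700000000}
    return Observation(
        source="social",
        entity=raw["ticker"].strip().upper(),
        metric="sentiment_score",
        value=float(raw["score"]),
        observed_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def from_foot_traffic(raw: dict) -> Observation:
    # Assumed payload shape: {"brand": "Abc", "visits": 1520, "date": "2023-11-14"}
    return Observation(
        source="foot_traffic",
        entity=raw["brand"].strip().upper(),
        metric="store_visits",
        value=float(raw["visits"]),
        observed_at=datetime.fromisoformat(raw["date"]).replace(tzinfo=timezone.utc),
    )


def deduplicate(observations: list[Observation]) -> list[Observation]:
    """Drop exact duplicate records while preserving arrival order."""
    seen: set[Observation] = set()
    unique = []
    for observation in observations:
        if observation not in seen:
            seen.add(observation)
            unique.append(observation)
    return unique
```

Normalizing identifiers, units and timestamps at this stage means the analysis layer never has to know which vendor a record came from.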

Pillar 2: Quantitative Analysis

With target alt datasets landing in accessible repositories, quantitative analysts get to work examining signals.

  • Statistical Analytics – Surface-level timeseries analyses assess seasonality, cyclicality and covariance as precursor visualizations before more advanced model fitting.
  • Machine Learning – Supervised and unsupervised ML modeling uncovers hidden alpha signals across alternative data domain areas like web scraped ratings or supply chain API calls.
  • Predictive Modeling – Final modeling yields singular KPIs, classifications (buy signals, fraud likelihood etc.) or forecasted metrics to power investment decisions.

Tools like Python, R and MATLAB enable manipulation alongside scaled data science platforms like Databricks for building ML models on cloud infrastructure. The derived alpha factors and signals then flow into production.
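For a taste of the surface-level statistical pass, the sketch below estimates seasonality by computing the sample autocorrelation of a series at several lags and picking the strongest one. The function names and the toy daily series are illustrative assumptions. In practice analysts would more likely reach for a statistics library than hand-rolled code.

```python
def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation of `series` at `lag` (requires non-zero variance)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((value - mean) ** 2 for value in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


def dominant_period(series: list[float], max_lag: int) -> int:
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


if __name__ == "__main__":
    # Invented daily store visits with a weekend bump, repeated for six weeks.
    visits = [10, 12, 11, 13, 15, 30, 28] * 6
    print(dominant_period(visits, max_lag=10))
```

A strong peak at lag 7 in daily data points to a weekly cycle that should be removed or modeled before hunting for subtler signals.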

Pillar 3: Model Productionalization

In productionalization, data scientists work closely with engineers and portfolio managers to properly activate analytical alternative data pipelines through:

  • Signal Activation – Finalizing models into platforms ingesting live alternative data streams with continuous outputs to trading systems.
  • Monitoring – Ensuring stable operations via logs analysis, smoke testing and robust instrumentation for the deployed models.
  • Integration – Connecting model output dashboards, notifications and programmatic signals into surrounding investment technology systems.

For example, social media sentiment classifier models get embedded into live dashboards alongside trading applications. Risk signals from supply chain data notify exposure tracking platforms. The end game centers operationalizing alternative data models to enhance investing outcomes.
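Monitoring can start very simply. The sketch below checks a deployed signal for two common failure modes, a feed that has gone stale and a value outside its expected range. The function name, the 15-minute freshness window and the [-1, 1] range are assumptions for illustration and would be tuned per signal.

```python
from datetime import datetime, timedelta, timezone


def check_signal_health(
    last_update: datetime,
    latest_value: float,
    now: datetime,
    max_age: timedelta = timedelta(minutes=15),
    valid_range: tuple[float, float] = (-1.0, 1.0),
) -> list[str]:
    """Return a list of problem labels; an empty list means the signal looks healthy."""
    problems = []
    if now - last_update > max_age:
        problems.append("stale")
    low, high = valid_range
    # NaN fails this comparison too, so missing values are flagged as well.
    if not (low <= latest_value <= high):
        problems.append("out_of_range")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(check_signal_health(now - timedelta(hours=1), 0.3, now))
```

Checks like this would typically run on a schedule and feed the notification systems mentioned above, so a broken upstream feed halts the signal before it reaches a trading system.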

Supporting Technologies

Specialized cloud software lays the foundation for alternative data stack operations end-to-end. Examples include:

Workflow Orchestration – Tools like Apache Airflow arrange chains of alt data processes spanning scrape jobs, data validation tests, machine learning model scoring, signal checks etc.

Model Development – End-to-end ML platforms such as MLflow and Kubeflow streamline modeling with version control, packaging and testing.

Stream Processing – Distributed stream engines like Kafka and Spark rapidly handle near-real-time analysis of high throughput alternative data feeds.

Container Deployment – Docker and Kubernetes efficiently deploy alternative data applications through isolated application containers.
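Orchestration tools express a pipeline as a graph of tasks and their dependencies. The sketch below illustrates only that core idea using Python's standard `graphlib` module. It is not Airflow's API, and the task names are hypothetical stand-ins for the scrape, validation, scoring and signal-check steps mentioned above.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# Task names are hypothetical placeholders.
PIPELINE = {
    "scrape": set(),
    "validate": {"scrape"},
    "score_model": {"validate"},
    "check_signal": {"score_model"},
}


def execution_order(tasks: dict[str, set[str]]) -> list[str]:
    """Return a run order in which every task follows its prerequisites.
    Raises graphlib.CycleError if the dependencies are circular."""
    return list(TopologicalSorter(tasks).static_order())


if __name__ == "__main__":
    for task_name in execution_order(PIPELINE):
        print("running", task_name)
```

A real orchestrator adds scheduling, retries, logging and parallel execution on top of this ordering. That extra machinery is the main reason to adopt one over a hand-written script.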

As covered above, a finely tuned alternative data stack provides the assembly lines mass producing trade-worthy signals from external data feeds. Well worth the investments for funds playing advanced ball.


Even with rock solid data pipelines, alternative data introduces new challenges. The final section discusses limitations managers face along with the road ahead for alternative data capabilities.

Challenges and the Road Ahead

While adoption expands briskly, alternative data still poses hurdles spanning data quality, model rigor and product innovation.

Accessing Relevant Signals – The firehose of available datapoints creates extreme noise. Honing collections on targeted value drivers proves easier said than done. Providers also closely protect their special datasets.

Preventing Model Overfit – Complex quantitative models built on niche alternative data can fail dramatically outside limited domains. Analysts must emphasize generalizability.

Handling Data Biases – Many alternative datasets contain sampling biases. Social media skews young. Web traffic concentrates geographically. Income and demographics affect smartphone ownership, which in turn skews the mobility trails that get collected. Correcting for bias requires awareness.

Evaluating New Data Types – Speculative alternative data categories often lack the historical baselines needed for time-series comparisons. This complicates utility assessments.
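One standard way to handle the sampling biases described above is post-stratification: weight each group by its population share divided by its sample share. The sketch below is a minimal version. The group labels, the shares and the function names are invented for illustration, and it assumes the population shares are known from an outside source such as census data.

```python
def poststratification_weights(
    sample_counts: dict[str, int], population_shares: dict[str, float]
) -> dict[str, float]:
    """Weight per group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


def weighted_mean(values_by_group: dict[str, list[float]], weights: dict[str, float]) -> float:
    """Mean of all values after applying each group's weight."""
    numerator = sum(
        weights[group] * value
        for group, values in values_by_group.items()
        for value in values
    )
    denominator = sum(weights[group] * len(values) for group, values in values_by_group.items())
    return numerator / denominator


if __name__ == "__main__":
    # Invented example: the sample is 80% under-35s, the population only 40%.
    counts = {"under_35": 80, "35_plus": 20}
    shares = {"under_35": 0.4, "35_plus": 0.6}
    print(poststratification_weights(counts, shares))
```

In the usage example, under-35 voices are down-weighted to 0.5 and the over-35 group is up-weighted to about 3. Reweighting only corrects for the variables you stratify on, so hidden biases remain.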

The road ahead for alternative data brings further advancement across analytics techniques, data sourcing diversity and product offerings. A few developments to track include:

  • Integrating New Data Categories – Fast-evolving sources like lidar scans, VR behavior, augmented visual feeds and smart city metrics
  • Streamlining Access – Consolidation amongst alt data platforms and analytics vendors
  • Custom Synthesis – Automated pipelines building tailored datasets by spec
  • Advancing Techniques – State-of-the-art ML, better bias correction and synthetic modeling to assist vetting

The lack of remaining public datasets may position alternative data as the dominant signal category in coming years. The time to start is now before the gold rush stalls.

For more on capitalizing on alternative data, check out the Bright Data blog for use case examples and data scraping tutorials tailored to investors.

The world runs on data. Choose what type carefully in navigating markets primed for disruption.