Twitter volume spikes analysis and its application in trading

Disclaimer: this article is based on a study by Yuexin Mao, Wei Wei and Bing Wang, “Twitter Volume Spikes: Analysis and Application in Stock Trading”. We would like to thank them for this study. All copyrights belong to the original article’s respective owners, its authors.

Stocks are a popular topic on Twitter. The number of tweets about a stock varies from day to day and sometimes exhibits a significant spike. The paper investigates Twitter volume spikes related to S&P 500 stocks and whether they are useful for stock trading. The authors develop a strategy that combines a Bayesian classifier with a stock bottom-picking method and demonstrate that it can achieve a significant gain in a short amount of time. Simulation over half a year of stock market data indicates that it achieves, on average, an 8.6% gain in 27 trading days and a 15.0% gain in 55 trading days. Statistical tests show that the gain is statistically significant.

TWITTER VOLUME SPIKE ANALYSIS

The authors first investigate whether the number of tweets about a stock spikes around its earnings dates. Suppose that a company’s earnings date is day t. The analysis checks whether the number of tweets about the company’s stock spikes around t, in particular on days t−1, t and t+1. The data collection period contains 509 earnings days for the stocks considered. Of these, 79.2% are surrounded by a Twitter volume spike, confirming the authors’ expectation that people indeed tweet more about a stock around its earnings dates.

Figure: Time difference (in days) from an earnings day to the closest day that has a Twitter volume spike. A negative value corresponds to the time difference to the closest Twitter volume spike in the past.
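The time difference described above can be sketched in a few lines of Python. This is only an illustration, not the authors’ code: the function names are hypothetical, days are assumed to be integer trading-day indices, and ties between a past and a future spike are resolved in favour of the past one, which is an arbitrary choice.

```python
def closest_spike_offset(earnings_day, spike_days):
    """Signed offset (in days) from an earnings day to the nearest spike day.

    Negative means the nearest spike lies in the past. Days are assumed to be
    integer trading-day indices (an assumption made for this sketch).
    """
    if not spike_days:
        return None
    offsets = [s - earnings_day for s in spike_days]
    # Smallest absolute offset wins; on a tie the past spike is preferred.
    return min(offsets, key=lambda d: (abs(d), d))


def surrounded_by_spike(earnings_day, spike_days, window=1):
    """True if a spike falls on days t-window .. t+window around earnings day t."""
    offset = closest_spike_offset(earnings_day, spike_days)
    return offset is not None and abs(offset) <= window


# Made-up example data: two spikes, two earnings days.
example_spikes = [9, 55]
example_earnings = [10, 50]
example_offsets = [closest_spike_offset(e, example_spikes) for e in example_earnings]
example_fraction = sum(
    surrounded_by_spike(e, example_spikes) for e in example_earnings
) / len(example_earnings)
print(example_offsets, example_fraction)  # [-1, 5] 0.5
```

Applied to every earnings day in the data set, the fraction computed this way corresponds to the kind of figure reported above (79.2% in the study).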

Twitter volume spikes close to earnings days are likely caused by the earnings days themselves. Since earnings days are public information known in advance, these spikes come as no surprise, and they cannot be used to build a trading strategy because the price already reflects them. The authors therefore needed a way to determine whether a given spike was expected. Option implied volatility can serve as an indicator of whether a Twitter volume spike is expected, that is, whether it is related to a scheduled event.

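Figure: Average daily implied volatility around a Twitter volume spike on day t, for short-term and longer-term options.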

Assume that, for a stock, a Twitter volume spike happens on day t. The figure plots the average daily implied volatility for both short-term options, i.e., those that expire within 30 days after t, and longer-term options, i.e., those that expire 30 to 60 days after t. For short-term options, the daily average implied volatility indeed increases before t and decreases after t. For longer-term options, the trend is not clear. The authors found that 37.3% of the Twitter volume spikes are expected. Note that this percentage is a very conservative estimate and serves more as a lower bound, showing that a fair share of spikes are expected.

The authors now investigate potential causes of Twitter volume spikes. Specifically, they consider the following five factors:

1. Stock breakout point,
2. Intraday price change rate,
3. Interday price change rate,
4. Earnings day, and
5. Stock option implied volatility.

Then, the authors calculate the correlation of each of these five factors with Twitter volume spikes.

The correlation analysis resulted in the following figure:

Figure: CDF of the correlations between Twitter volume spikes and each of the five factors, over all stocks.

The y axis shows the CDF (cumulative distribution function) of the correlations between Twitter volume spikes and each of the five factors over all the stocks. Twitter volume spikes have the strongest correlation with earnings days (with a median of 0.37), which confirms the earlier result that a significant fraction of Twitter volume spikes occurs around earnings days. The correlation between Twitter volume spikes and implied volatility has a median value of 0.14, much stronger than the correlation with the remaining factors.
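To make the per-stock correlation and its CDF concrete, here is a minimal sketch. It assumes that each factor and the spike signal are encoded as daily 0/1 indicator series and that the correlation is the Pearson coefficient; this article does not state which correlation measure the study uses, so both choices are assumptions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def empirical_cdf(values):
    """Sorted (value, cumulative fraction) pairs, i.e. the points of a CDF plot."""
    ordered = sorted(values)
    return [(v, (i + 1) / len(ordered)) for i, v in enumerate(ordered)]


# Made-up daily indicators for one stock: 1 = spike / earnings day, 0 = otherwise.
spike_indicator = [0, 1, 0, 0, 1, 0]
earnings_indicator = [0, 1, 0, 0, 0, 0]
r = pearson(spike_indicator, earnings_indicator)
print(round(r, 4))  # 0.6325

# One correlation per stock (made-up values) gives the median and the CDF.
per_stock_correlations = [0.10, 0.37, 0.52]
print(median(per_stock_correlations), empirical_cdf(per_stock_correlations))
```

Repeating the calculation for each of the five factors would yield one CDF curve per factor.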

APPLICATION IN STOCK TRADING

Two trading strategies were developed, both using Twitter volume spikes as trading signals. For comparison, the authors also consider a baseline strategy that purchases a stock on a random day and a strategy that uses trading volume spikes.

The first strategy is based solely on a Bayesian classifier. For each Twitter volume spike, the classifier estimates the probability that buying the stock leads to a profit after a given number of days, and the stock is bought only when this probability is sufficiently large (above 0.7).
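The decision rule can be illustrated with a small categorical naive Bayes model. This is a sketch under stated assumptions, not the authors’ implementation: the article only says “Bayesian classifier”, so naive Bayes with Laplace smoothing is a stand-in, the class and function names are hypothetical, and the two discretized features (interday and intraday price direction) and the training rows are made up.

```python
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (illustrative stand-in)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
        self.values = defaultdict(set)              # feature index -> values seen in training
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[(label, i)][value] += 1
                self.values[i].add(value)
        return self

    def predict_proba(self, row):
        """Posterior probability of each label for one feature row."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            p = count / total  # prior
            for i, value in enumerate(row):
                numerator = self.feature_counts[(label, i)][value] + self.alpha
                denominator = count + self.alpha * len(self.values[i])
                p *= numerator / denominator
            scores[label] = p
        norm = sum(scores.values())
        return {label: s / norm for label, s in scores.items()}


def should_buy(model, row, threshold=0.7):
    """Buy only if the estimated probability of profit exceeds the threshold."""
    return model.predict_proba(row).get(True, 0.0) > threshold


# Made-up training data: (interday direction, intraday direction) -> profitable?
train_rows = [("up", "up"), ("up", "down"), ("up", "up"),
              ("down", "down"), ("down", "up"), ("down", "down")]
train_labels = [True, True, True, False, False, False]
model = NaiveBayes().fit(train_rows, train_labels)

p_profit = model.predict_proba(("up", "up"))[True]
print(round(p_profit, 3), should_buy(model, ("up", "up")))  # 0.857 True
print(should_buy(model, ("down", "up")))                    # False
```

The 0.7 threshold is the one quoted above; raising it would trade fewer signals for higher estimated confidence.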

To evaluate the strategy, the data from February 21, 2012 to October 19, 2012 was used as training data, and the data from October 20, 2012 to March 31, 2013 was used as test data. This results in 573 Twitter volume spikes in the training set, and 672 Twitter volume spikes in the test set.

The implied volatility factor was excluded from training and testing because it requires option data and hence does not provide a fair comparison with the other strategies.

The results of the above simple strategy are encouraging, indicating that Twitter volume spikes are indeed useful in stock trading. On the other hand, the strategy does not consider the trend of a stock. For instance, it may buy a stock when the price of the stock is increasing, which may not lead to profit. The authors therefore propose an enhanced strategy that takes trends into consideration.

Enhanced strategy using bottom-picking method

The authors combine the Twitter volume spike strategy with a ZigZag-based algorithm (derived from the ZigZag indicator) that identifies turning points for a given movement rate λ, defined as the minimum price difference ratio between two adjacent turning points.

The stock price turning point identification algorithm for a given λ is described as follows:

(1) Start the search from the first point in the dataset. Search forward until a potential turning point is found, i.e., one of the two conditions holds: (i) the price increases by at least λ from the start point, or (ii) the price decreases by at least λ from the start point. Continue the search.

(a) If condition (i) holds (i.e., the price moves upward), update the potential turning point when finding a point that is larger than the previous potential turning point. When finding a point that drops at least λ compared to the current potential turning point, set the current potential turning point to be a downward turning point.

(b) If condition (ii) holds (i.e., the price moves downward), update the potential turning point when finding a point that is smaller than the previous potential turning point. When finding a point that rises at least λ compared to the current potential turning point, set the current potential turning point to be an upward turning point.

(2) Restart the search from the turning point. If the turning point is an upward turning point, go to Step (1a). If the turning point is a downward turning point, go to Step (1b). Repeat until the end of the data set.
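The steps above can be sketched as a single pass over the price series. This is a minimal illustration rather than the authors’ code: the function name is hypothetical, and λ is assumed to be a relative change, so “increases by at least λ” is read as price ≥ reference × (1 + λ) and “drops at least λ” as price ≤ reference × (1 − λ).

```python
def find_turning_points(prices, lam):
    """Return (index, kind) pairs, where kind is 'down' (a high after which the
    price turns downward) or 'up' (a low after which the price turns upward)."""
    turning_points = []
    n = len(prices)
    if n == 0:
        return turning_points

    start = prices[0]
    direction = None  # +1 while tracking an upward move, -1 for a downward move
    candidate = 0     # index of the current potential turning point
    i = 1

    # Step (1): search forward until the price has moved by at least lam.
    while i < n and direction is None:
        if prices[i] >= start * (1 + lam):
            direction, candidate = 1, i
        elif prices[i] <= start * (1 - lam):
            direction, candidate = -1, i
        i += 1

    # Steps (1a), (1b) and (2): extend the candidate or confirm a turning point.
    for j in range(i, n):
        price = prices[j]
        if direction == 1:
            if price > prices[candidate]:
                candidate = j  # new high: move the potential turning point
            elif price <= prices[candidate] * (1 - lam):
                turning_points.append((candidate, "down"))
                direction, candidate = -1, j  # continue with step (1b)
        else:
            if price < prices[candidate]:
                candidate = j  # new low: move the potential turning point
            elif price >= prices[candidate] * (1 + lam):
                turning_points.append((candidate, "up"))
                direction, candidate = 1, j  # continue with step (1a)
    return turning_points


example_prices = [100, 103, 106, 104, 99, 97, 98, 103, 101]
example_turns = find_turning_points(example_prices, 0.05)
print(example_turns)  # [(2, 'down'), (5, 'up')]
```

In this made-up series the upward turning point at index 5 is only confirmed at index 7, once the price has risen by at least λ from the low. A turning point is therefore always identified with some delay.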

Figure: Stock price chart (top) and tweets ratio over time (bottom).

For a given stock, the top panel shows the price chart and the bottom panel shows the tweets ratio over time, i.e., the number of tweets on a day divided by the average number of tweets over the past 70 days. A day with a tweets ratio above a threshold K has a Twitter volume spike.
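The tweets ratio and the spike rule translate directly into code. The sketch below assumes that the 70-day average excludes the current day and that days without a full 70-day history are skipped; the value K = 3 in the example is made up, since this article does not state the threshold, and the function names are hypothetical.

```python
def tweets_ratio(counts, window=70):
    """Tweets on each day divided by the mean of the previous `window` days.

    Returns None for days without a full history or with a zero average.
    """
    ratios = []
    for t, count in enumerate(counts):
        if t < window:
            ratios.append(None)
            continue
        average = sum(counts[t - window:t]) / window
        ratios.append(count / average if average > 0 else None)
    return ratios


def spike_days(counts, k, window=70):
    """Indices of days whose tweets ratio is above the threshold k."""
    ratios = tweets_ratio(counts, window)
    return [t for t, r in enumerate(ratios) if r is not None and r > k]


# Made-up daily tweet counts: 70 quiet days, then a burst on day 71.
example_counts = [10] * 70 + [12, 45, 11]
example_spike_days = spike_days(example_counts, k=3)
print(example_spike_days)  # [71]
```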

Thus, an additional factor is added to the strategy: whether the price is near an upward turning point of the ZigZag.

The authors confirm that there is indeed strong evidence that the profit is positive, and the enhanced strategy outperforms the random strategy as well as the strategy that uses stock trading volume spikes.

Figure: Fraction of winning trades using the enhanced strategy.

This figure plots the fraction of winning trades using the enhanced strategy. A significant fraction of the trades leads to profit. For instance, when using intraday and interday price change rates, as many as 89.3% of the trades lead to profit in 29 days.

Simulation over half a year of stock market data demonstrates that both strategies lead to substantial profits, and that the enhanced strategy significantly outperforms the basic strategy and a bottom-picking method that uses trading volume spikes. This suggests that Twitter volume spikes can indeed provide a statistical trading edge and are worth considering by traders.