
Wednesday, September 24, 2014

Weighted Moving Averages on Google Sheets

Abstract

A moving average (MA) process, also called a weighted moving average (WMA), is a type of signal filter that computes a weighted average over a finite sequence of past samples of the original signal. Implementing such a scheme in a worksheet is not always straightforward because of the handling of missing values. This article proposes a convenient way to implement such filters on Google Sheets.

Keywords

Weighted Moving Average, MA, WMA, Noise Removal, Filtering, Google Sheets, Spreadsheets.

Introduction

The moving average signal over a window of \(n\) samples ending at sample \(j\) is defined by:
\[y(j) = \frac{\sum_{i=0}^{n-1} w(i) \cdot x(j-n+1+i)}{\sum_{i=0}^{n-1} w(i)}\]
The spreadsheet function \(AVERAGE()\) performs this computation in the particular case where all \(w(i)\) are equal to one. In this case, it is very easy to deal with missing values, as the only thing you need to do is count the number of non-missing values and divide the sum by this count.

When the \(w(i)\) differ, counting the non-missing values is not enough: the final normalization must use the sum of the weight coefficients that correspond to non-missing samples.
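As a small worked illustration with made-up numbers (not taken from the article's spreadsheet), take weights \(w = (1, 2, 3)\) and a window \(x = (10, \text{missing}, 30)\). Normalizing by the weights that actually met a sample gives
\[y = \frac{1 \cdot 10 + 3 \cdot 30}{1 + 3} = \frac{100}{4} = 25,\]
whereas treating the missing sample as zero and normalizing by all the weights would give
\[y = \frac{1 \cdot 10 + 2 \cdot 0 + 3 \cdot 30}{1 + 2 + 3} = \frac{100}{6} \approx 16.7,\]
which is pulled toward zero by a sample that was never measured.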

Proposed Solution

The proposed solution is to use the function \(SUMPRODUCT()\) twice, first to get the weighted sum, and second to get the sum of the coefficients that have multiplied non-missing data.

Assume that:
  • The spreadsheet has a page called "Filter1".
  • The averaging coefficients are on column Filter1!A.
  • Cell Filter1!B1 has the formula "=COUNT(A:A)", which counts the number of averaging coefficients.
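  • The current data cell is B22 (the formula itself goes in another column of the same row, since a formula in B22 that referenced B22 would be circular).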
Then the claim is that the following formula will calculate the correct value of WMA:
\[\begin{array}{l}
=\\
SUMPRODUCT( \\
\qquad OFFSET(B22, -Filter1!$B$1 + 1, 0, Filter1!$B$1, 1), \\
\qquad OFFSET(Filter1!$A$1, 0, 0, Filter1!$B$1, 1)) \\
/ \\
SUMPRODUCT(\\
\qquad ARRAYFORMULA(\\
\qquad \qquad N(ISNUMBER(\\
\qquad \qquad \qquad OFFSET(B22, -Filter1!$B$1 + 1, 0, Filter1!$B$1, 1)))), \\
\qquad OFFSET(Filter1!$A$1, 0, 0, Filter1!$B$1, 1))
\end {array}\]
The details of this expression are as follows:
  • \(Filter1!$B$1\) is \(n\).
  • \(OFFSET(current\_cell, -n+1, 0, n, 1)\) is used to produce a range of \(n\) cells, of which the current cell is the last one.
  • \(OFFSET(Filter1!$A$1, 0, 0, Filter1!$B$1, 1)\) are the weighting coefficients.
  • \(ARRAYFORMULA(N(ISNUMBER(OFFSET(\cdots))))\) applies the function \(N()\) to the boolean result of \(ISNUMBER()\) for each cell in the current range, which will produce an array of zeroes where there is missing data and ones where there is data.
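The logic of the formula can also be sketched outside the spreadsheet. The following Python function is only an illustration under stated assumptions: the name `weighted_moving_average`, the use of `None` for a missing cell, and the sample numbers are all hypothetical and not part of the original spreadsheet.

```python
from typing import Optional, Sequence


def weighted_moving_average(
    data: Sequence[Optional[float]],
    weights: Sequence[float],
    j: int,
) -> Optional[float]:
    """WMA at index j over the window data[j-n+1 .. j], skipping missing samples.

    Assumption: a missing cell is represented by None.
    """
    n = len(weights)
    if j - n + 1 < 0:
        return None  # not enough history for a full window
    window = data[j - n + 1 : j + 1]
    # Numerator: weighted sum; missing samples contribute nothing.
    numerator = sum(w * x for w, x in zip(weights, window) if x is not None)
    # Denominator: sum of the weights that multiplied an actual sample.
    denominator = sum(w for w, x in zip(weights, window) if x is not None)
    return numerator / denominator if denominator != 0 else None


data = [10.0, None, 30.0, 40.0]
weights = [1.0, 2.0, 3.0]
print(weighted_moving_average(data, weights, 2))  # 25.0
print(weighted_moving_average(data, weights, 3))  # 36.0
```

The two sums play the roles of the two \(SUMPRODUCT()\) calls: the first accumulates the weighted data, the second accumulates only the weights that met a non-missing sample.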

Results

The formula has been tested against some weighting data. The resulting spreadsheet has a plot of the original data, along with the \(AVERAGE()\) data and WMA data for comparison.

Conclusion

A spreadsheet formula for the correct calculation of a weighted moving average has been derived and successfully tested on Google Sheets. The proposed formula deals with missing values in a way similar to the \(AVERAGE()\) function, avoiding the distortions that would be caused either by using zero in place of the missing values or by packing the original series.

Tuesday, September 23, 2014

Exponential Moving Averages on Google Sheets

Abstract

The exponential moving average (EMA) is a way to remove noise from a data series. Unfortunately, spreadsheets offer no straightforward, useful support for it. This article examines the problems involved and proposes a one-line formula that adds an EMA to a spreadsheet.

Keywords

Exponential Moving Average, EMA, Noise Removal, Filtering, Google Sheets, Spreadsheets.

Introduction

Traditionally, an exponential moving average is implemented in spreadsheets with a recursive formula, i.e., as an auto-regressive (AR) process or an infinite impulse response (IIR) filter. The formula is the following:
\[y(n) = \alpha \cdot x(n) + (1 - \alpha) \cdot y(n-1)\]
There are some problems with this approach:
  1. Strictly speaking, this is not a moving average. A moving average assumes a finite-length sliding window over which the data is weighted. Here the window grows instead of sliding.
  2. The previous item implies that the average takes every single data sample into account. The formula actually implements an IIR filter: this kind of averaging never forgets a value, although old values become less and less relevant over time. It would be nice to have control over the window in which the exponential average takes place, especially when one intends to use a small window.
  3. It does not deal properly with missing values. Missing data values are common in a spreadsheet, and the previous approach does not allow for them. Converting missing values to zero would unacceptably distort the average. If one instead packs the original series (dropping the gaps) and applies the formula to the result, the weights applied in the averaging process no longer reflect the actual distance in time between the samples.
The function AVERAGE(range) deals with this problem quite simply because it applies the same weight to every sample, so it is just a matter of dividing the sum by the number of non-blank entries. In a non-uniform average like the EMA, we need to keep track of which weights were actually applied, given the missing values, and adjust the normalizing factor accordingly.
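To make the first two problems concrete, here is a minimal Python sketch of the recursive formula together with the weights it implicitly puts on each sample. The function names, the choice of seeding the recursion with the first sample, and the sample values are assumptions made for this illustration only.

```python
def recursive_ema(samples, alpha):
    """Apply y(n) = alpha * x(n) + (1 - alpha) * y(n-1) to a gap-free series.

    Assumption: the recursion is seeded with the first sample.
    """
    y = samples[0]
    out = [y]
    for x in samples[1:]:
        y = alpha * x + (1 - alpha) * y
        out.append(y)
    return out


def implied_weights(count, alpha):
    """Weights the recursion ends up applying to each sample, oldest first."""
    # A sample k steps in the past is weighted alpha * (1 - alpha) ** k ...
    newest_first = [alpha * (1 - alpha) ** k for k in range(count - 1)]
    # ... except the seed sample, which keeps the remaining weight.
    newest_first.append((1 - alpha) ** (count - 1))
    return newest_first[::-1]


samples = [1.0, 2.0, 3.0, 4.0]
alpha = 0.5
last = recursive_ema(samples, alpha)[-1]
weights = implied_weights(len(samples), alpha)
print(last)     # 3.125
print(weights)  # [0.125, 0.125, 0.25, 0.5]
```

Every sample keeps a non-zero weight no matter how old it is, so the window grows with the series instead of sliding over it.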

Proposed solution

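The function SERIESSUM(a, n, m, x) takes a base \(a\), an initial power \(n\), a power increment \(m\), and a list of \(k\) coefficients \(x_0, \dotsc, x_{k-1}\), and is defined as
\[SERIESSUM(a, n, m, x) = \sum_{i=0}^{k-1} x_i a^{n+m i}.\]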
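As a rough illustration of what this function computes, the following Python sketch reproduces the power series with the same argument order used in this article. The name `seriessum` and the sample numbers are hypothetical; the sum is assumed to run over however many coefficients are supplied.

```python
def seriessum(a, n, m, coefficients):
    """Sum of coefficients[i] * a ** (n + m * i) over all supplied coefficients."""
    return sum(c * a ** (n + m * i) for i, c in enumerate(coefficients))


# With a = 0.5, n = 3, m = -1 the powers are 3, 2, 1, so the weights are
# 0.125, 0.25, 0.5: the last coefficient gets the largest weight.
print(seriessum(0.5, 3, -1, [1, 1, 1]))  # 0.875
```

With a negative increment and a base below one, the weights grow toward the end of the coefficient list, which is the property the proposed formula relies on.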
The proposed solution is to use this function twice, first to calculate the weighted sum, and then a second time to calculate the sum of the weights where data is not missing.
Assume that:
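  • Cell F1 contains the geometric progression ratio \((\alpha)\), the factor by which a sample's weight shrinks for each step further into the past (it plays the role of \(1 - \alpha\) in the recursive formula above);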
  • Cell F2 contains the window size \((n)\);
  • Column B contains the raw data;
  • The current cell is B22.
Then the claim is that the following formula will calculate the correct value of EMA:
\[\begin{array}{l}
=\\
SERIESSUM($F$1, $F$2, -1, \\
\qquad ARRAYFORMULA(N(OFFSET(B22, -$F$2 + 1, 0, $F$2, 1))))\\
/ \\
SERIESSUM($F$1,  $F$2, -1, \\
\qquad ARRAYFORMULA(\\
\qquad \qquad N(ISNUMBER(OFFSET(B22, -$F$2 + 1, 0, $F$2, 1)))))
\end{array}\]
The details of this expression are as follows:
  • \(OFFSET(current\_cell, -n+1, 0, n, 1)\) is used to produce a range of \(n\) cells, of which the current cell is the last one.
  • \(ARRAYFORMULA(N(OFFSET(\dotsc)))\) will apply the \(N()\) function to each element of the argument range to generate a new range with zero values in the missing data cells. Without this trick, \(SERIESSUM()\) would use non-missing values as if they were contiguous.
  • \(ARRAYFORMULA(N(ISNUMBER(OFFSET(\dotsc))))\) will generate a range composed of ones where data is not missing and zeros where data is missing.
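The whole expression can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: `seriessum` and `windowed_ema` are hypothetical names, `None` stands for a missing cell, and the data values are invented for the example.

```python
def seriessum(a, n, m, coefficients):
    """Sum of coefficients[i] * a ** (n + m * i) over all supplied coefficients."""
    return sum(c * a ** (n + m * i) for i, c in enumerate(coefficients))


def windowed_ema(data, ratio, n, j):
    """EMA at index j over the window data[j-n+1 .. j], skipping missing samples.

    Assumption: a missing cell is represented by None.
    """
    if j - n + 1 < 0:
        return None  # not enough history for a full window
    window = data[j - n + 1 : j + 1]
    # Counterpart of N(OFFSET(...)): missing cells become zero.
    values = [0.0 if x is None else x for x in window]
    # Counterpart of N(ISNUMBER(OFFSET(...))): one where data exists, else zero.
    present = [0.0 if x is None else 1.0 for x in window]
    numerator = seriessum(ratio, n, -1, values)
    denominator = seriessum(ratio, n, -1, present)
    return numerator / denominator if denominator != 0 else None


data = [8.0, None, 4.0]
result = windowed_ema(data, 0.5, 3, 2)
print(result)  # (8 * 0.125 + 4 * 0.5) / (0.125 + 0.5) = 3 / 0.625, about 4.8
```

Because the missing sample keeps its slot in the window, the remaining samples retain the weights that match their true distance in time, and only the normalizing factor changes.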

Results

The formula has been tested against some weighting data. The resulting spreadsheet has a plot of the original data, along with the \(AVERAGE()\) data and EMA data for comparison.

Conclusion

A spreadsheet formula for the correct calculation of an exponential moving average has been derived and successfully tested on Google Sheets. The proposed formula deals with missing values in a way similar to the \(AVERAGE()\) function, avoiding the distortions that would be caused either by using zero in place of the missing values or by packing the original series.