# A histogram of stock returns with normal curve overlay

A histogram is a graphical representation of the frequency distribution of a set of data.

## Part 1

Part 1 provides an introduction to frequency tables and histogram chart. The table is setup by counting bin values, then using the Excel Analysis Toolpak and finally, the Excel **FREQUENCY** function.
The Excel version is available in the `Part 1`

worksheet of the associated file.

### 1.1. Frequency table of the data from bin counts

We start with a list of stock prices and returns for a major Australian retailer over one month period for November 2012
(Figure 1). The daily closing prices `Cprice`

are in column B, and the daily log returns `LogR`

in column C are returned by the Excel **LN** function.

Descriptive statistics are shown in the range `E7:F15`

. Each statistic is returned by an Excel function. A formula is used for the arithmetic Range statistic in cell `F9`

.

In addition to the data summary provided by the descriptive statistics, an analyst might be interested in the number of returns above or below the average, or within plus or minus one standard deviation.

A frequency table tells us how often values occur in a table. In finance, it is often assumed that the stock returns series is normally distributed.

In this example, four bins are used to count the frequency values:

- Bin 1:
`LogR`

≤ (μ - 1σ) - Bin 2: (μ - 1σ) <
`LogR`

< μ - Bin 3: μ <
`LogR`

< (μ + 1σ) - Bin 4: (μ + 1σ) ≤
`LogR`

To count the frequencies in the `LogR`

vector (see Figure 1):

- Estimate the values for the frequency table bins (shown in the range
`E17:F19`

). - Copy and Paste Special the log return
`values`

in the`LogR`

vector (to column I). - Sort the copied
`values`

into ascending order. - Count the frequencies (see column J).

This task can be more easily performed with the histogram tool from the Excel Analysis ToolPak

### 1.2. Histogram using the Analysis ToolPak

The Analysis Tools should be available on the Excel Ribbon under

. If it is not there, then you need to install the Analysis Toolpak (ATP) AddIn.To use the Histogram tool:

- Setup the values for the frequency table bins (see the range
`K5:L8`

shown in Figure 2). This is an array formula linking the range`E17:19`

from Figure 1. - On the
**Data**tab of the Excel Ribbon, select to display the Histogram dialog box (see Figure 3) - Complete the following properties (see Figure 3)
- Input
- Input Range: the LogR range from column C
- Bin Range: $L$6:$L$8. The bin label in cell
`L5`

has been omitted. - Output options
- Output Range: the upper left cell of the target - $M$5
- Select (tick) - Chart Output
- Click the OK button

The output is shown in Figure 4. The output table (range `M5:N9`

) has bin labels for the first three bins, whilst the last bin is given the default label of `More`

.

The chart output option generates the column chart for the four bins. Its format can be edited as required.

The frequency table values in Figure 4 have no link to the source data. Each value in the range `N6:N9`

is a constant as shown by the formula bar entry for cell `N6`

.

This means that the Histogram tool must be run each time the table or the chart need updating.

### 1.3 FREQUENCY function

The **FREQUENCY** function provides a way of linking the frequency table to the source data, and also allows use of dynamic tables and charts used in dashboard type management reports in Part 2 of this document.

The **FREQUENCY** function:

- Syntax:
**FREQUENCY(data_array,bins_array)** - Arguments
- data_array: an array of values from which the frequencies will be counted.
- bins_array: an array of intervals into which the values in data_array will be grouped.

To use the **FREQUENCY** function:

- Setup the values for the frequency table bins (see the range
`K13:L16`

shown in Figure 5). - The FREQUENCY function is an array function. You must select the correct size range for the target, the return values. In this case the frequency vector is 4 rows by 1 column (4 x 1).
Select the range
`M14:M17`

. - Enter the formula:
`FREQUENCY(LogR,L14:L17)`

, see cell`M14`

in Figure 5. The empty cell at`L17`

, corresponds to bin number 4, labeled More at cell`L9`

in Figure 4. - Complete the formula by pressing Control+Shift+Enter.
- The frequency array is shown in Figure 6. Whilst the results are the same as those obtained previously, the
**FREQUENCY**function array maintains links to the source data.

## Part 2

In this part we extend the material from Part 1 to demonstrate an interactive analysis platform for a stock return distribution system with a normal curve overlay. The material is available in the `Part 2 Analysis and Charts`

worksheet of the associated file.

### 2.1 A dynamic analyser of stock returns

The data used is a sample of share price data for the calendar year 2012, from the Australian retailer, Woolworths Limited, ASX code WOW. The data source is the Yahoo Finance web site, and the data has been imported to the WOW worksheet.

#### 2.1.1 Setup the returns vector

To do this (see the range `A:C`

in Figure 7).

- Insert a list of trading days for 2012 and the last day of 2011 (see range column
`A`

). - In column
`B`

, use the Excel**VLOOKUP**function to map the price data in the WOW worksheet to each date in the master list of column`A`

. The formula is shown in the formula bar. - In column
`C`

, insert a vector of daily log returns using the Excel**LN**function. Be careful of the date vector direction.

#### 2.1.2 Setup the stock analyser selection panel

The stock analyser selection panel is shown in the range `E3:H7`

of Figure 7. This provides an easy way to change the sample analysis date and number of observations.

The analyser selection panel cells link to the return data for the 2012 calendar year.
Firstly, the analyst selects an Analysis date (cell `G5`

), then the length of the analysis sample is selected from the Sample (days) list (cell `G6`

).
Satisfactory operation requires that the selection falls within the 2012 data window.

The year 2012 had 253 trading days, and suppose that the analyser permits a sample length (`Sample (days)`

) of 20 to 90 trading days within this 1 year window.
The setup task is simplified with named ranges, and a Workarea of helper cells.

This means that the earliest analysis date (the furthest back in time) will be 11 May 2012. If 31 December 2012 is day 1, then 11 May 2012 is day 164, thus providing a maximum of 90 analysis days including 11 May 2012. [164 + 90 - 1 (because of the date common to both periods) = 253].

To setup the date vectors, create the following names:

- DateLong: RefersTo
`='Part2 Analysis and Charts'!$A$6:$A$258`

. (253 rows). - DateShort: RefersTo
`='Part2 Analysis and Charts'!$A$6:$A$169`

. (164 rows).

The range `G5:G6`

contains two data validation cells. To do add the Data Validation items:

- Select cell
`G5`

- On the
**Data**tab of the Excel ribbon, select - Allow: List
- Source:
`=DateShort`

- Click the OK button.
- Apply a date format to the cell.
- In the Workarea, set up a column vector
`{20,21,...,90}`

. Name this vector`DayVector`

. - Select cell
`G6`

- On the
**Data**tab of the Excel ribbon, select - Allow: List
- Source:
`=DayVector`

- Click the OK button.
- Add the Names:
- Name: Analysis_Date
- RefersTo:
`='Part2 Analysis and Charts'!$G$5`

- Name: Sample__days. Note the double underscore occurs when you apply the
`Sample (days)`

.
item to the label - RefersTo:
`='Part2 Analysis and Charts'!$G$6`

The stock analyser selector panel controls the dynamic vector named `LogRVector`

. In turn the `LogRVector`

uses a nested vector named `DateVector`

as one of its arguments. Here are the details:

Add the Names (several names are setup in section 2.2.3:

- Name: DateVector
- RefersTo:
`=OFFSET(INDEX(DateShort,1,1),PositionTD-1,0,Height)`

- Name: LogRVector
- RefersTo:
`=OFFSET(DateVector,0,2)`

The argument names come from the labels shown in Figure 8.

#### 2.1.3 Setup the stock analyser summary statistics

Other Names have been used in the range `N6:O27`

shown by the red border in Figure 8. The labels in column `N`

has been applied as names to the values in column `0`

.
All cell formulae for the Workarea as shown in the shaded section `P6:Q27`

of Figure 8. The `Holidays`

name refers to rows 31 and 32.

### 2.2 A dynamic histogram of stock returns

In this example, the dynamic histogram uses a fixed number of bins, but allows the intervals and bins widths to vary with the data. Optimal bin numbers and intervals is discussed in Cimbala (2013), Doane (1976) - Doane's formula, Scott (1979) - Scott's normal reference rule, and Sturges (1926) - Sturges' formula.

Given the assumption of a normal distribution in log returns, six symmetrical to the mean bins are used. Three below the mean, and three above the mean.
Remember that the `bin_array`

argument to the **FREQUENCY** function is the upper limit to the values in a particular bin. So bin 3, labeled -1 in Figure 9 has an upper limit of Mu.

To achieve symmetry around the mean, we need to determine the Max{|Min - Mu|, |Max - Mu|}. This statistic is denoted *m* and is used to setup the bins values as shown in Figure 9.

We calculate the m statistic in cell Q35. The Excel **ABS** function returns the absolute value. The six bins are labeled {-3, -2, ... , -1} in column Q. The -4 bin is a place holder
to achieve symmetry in the left tail of the normal curve.

The frequencies in column `R`

use the same method as the **FREQUENCY** function table in Figure 6.
In column `S`

, each frequency value is divided by the vector total to return the relative frequency.

### 2.3 The normal curve overlay

The values for the normal probability density function in column T of Figure 8 are returned by the Excel **NORMDIST** function.

The **NORMDIST** function:

- Syntax:
**NORMDIST(x,mu,sigma, cumulative)** - Arguments
- x: the value at which to evaluate the distribution function .
- mu: the mean or average of the distribution.
- sigma: the standard deviation of the distribution.
- cumulative: TRUE for the cumulative normal distribution function; FALSE for the normal probability density function.

To scale the frequency and bell curve values, the Relative Normal frequency is calculated in column `U`

.

### 2.4 The histogram and normal curve charts.

Looking at Figure 8, the three area marked 1, 2, and 3 are the series for the charts in Figures 10, 11, and 12.
First item 2, the `RelFreq`

, and 3, the `RelNormal`

are graphed as an Excel column chart (Figure 10).

Then the item 2 series column chart is converted to a line chart (Figure 11).

Finally, the normal curve line chart is smoothed by selecting the series, then

, then ticking .Associated Excel file: histogramwithnormalcurveoverlay.xlsx

### References

Cimbala, J.M., (2013), 'Histograms' *Penn State University*, Unpublished Manuscript.

Doane D.P., (1976), 'Aesthetic frequency classification' *American Statistician*, pp.181-183.

Scott D.W., (1979), 'On optimal and data-based histograms', *Biometrika*, pp.605-610.

Sturges H.A, (1926), 'The choice of a class interval' *Journal of the American Statistical Association*, pp.65-66.

- This example was developed in Excel 2010
**Revised:**Wednesday 2nd of November 2016 - 04:44 PM, Pacific Time (PT)