x-y scatter plot with correlated random numbers


In this module:


Random number generator


Pseudo random numbers \(\text{N(0,1) i.i.d.}\) are generated by the Excel RAND function and then passed as the probability argument to the NORM.S.INV function. The correlation structure is incorporated by using the Cholesky decomposition method.


Cholesky decomposition


Generate \(Z_1,Z_2\)

  1. Generate \(Z_1,Z_2 \sim \text{N(0,1) i.i.d.}\)
  2. Set \(X=Z_1\) and \(Y=\rho Z_1 + Z_2 \sqrt{1-\rho^2}\), where \(\rho\) is the correlation coefficient

The X Y data worksheet


\(Z_1,Z_2\) are generated on the worksheet named: X Y data. Pseudo random numbers \(\text{N(0,1) i.i.d.}\) are generated with the nested NORM.S.INV(RAND()) formula.

  1. Cell B3: =IF(ROW()-2<=NoObs,NORM.S.INV(RAND()),"")
  2. Cell C3: =IF(ROW()-2<=NoObs,rho*B3+SQRT(1-rho^2)*NORM.S.INV(RAND()),"") where rho and NoObs are links to the selector and validator panel
  3. Copy B3:C3 and paste to the range B4:B5002. Name B3:B5002 as X, and C3:C5002 as Y

Summary statistics and the X-Y scatter plot


Refer to the X Y Interface worksheet shown in the figure 1 Excel Web App #1.


Fig 1: Excel Web App #1: - the correlated random (number) generator (RNG) on the left, and X Y scatter plot on the right

The selector / validator panel


The correlation selector

  1. D5: the correlation selector - is set up by Data Validation
  2. Select Data > Data Tools > Data Validation
  3. In the Data Validation dialog box - Settings tab
  4. Set Validation criteria; Allow: Decimal; Data: between; Minimum: -1; Maximum 1
  5. In the Data Validation dialog box - Input Message tab
  6. Set Title: Correlation value; Input message: Enter rho in the range ... -1 <= rho <= +1

The "No of points" validator

  1. D6: the "No of point"s validator - is set up by Data Validation
  2. Select Data > Data Tools > Data Validation
  3. In the Data Validation dialog box - Setting tab
  4. Set Validation Criteria; Allow: List; Source: 625,1250,2500,5000
  5. In the Data Validation dialog box - Input Message tab
  6. Set Title: Number of observations; Input message: 625, 1250, 2500, 5000, ... Select 625 to reduce worksheet recalculation time

The descriptive statistics panel


Note: INDIRECT(D$10) uses the value in cell D10, X, to point to the range named X.

Formulas in the summary statistics section:


The X Y scatter plot


  1. To select the data, type XV,YV in the Name Box. If you are using the Web App file, you will need to Unhide the X Y data worksheet before you select the data. Create the chart on that worksheet, then move it to the X Y interface worksheet
  2. Insert the chart - follow the ribbon sequence: Insert > Charts > Charts Dialog Launcher > All Charts tab > X Y (Scatter) > Scatter. Click OK to display the chart shown in figure 2
  3. xlf-xy-scatter-static
    Fig 2: X Y Scatter plot - with Number of points set to 1,250
  4. Using the Insert > Chart sequence does not retain the reference to the dynamic range names, instead a static name is applied (see figure 4). When a vector length shorter than the static range is selected, the X Y chart collapses as shown in figure 3. The x axis (the Horizontal (Value) Axis) displays the observation numbers, with the observation 626 onwards displaying as blank (the static vector being 1,250 data points in length)
  5. xlf-xy-scatter-static-collapse
    Fig 3: X Y Scatter plot - with Number of points set to 625. The scatter plot has collapsed because the Series references are static
  6. To display the Edit Series dialog box, use the ribbon sequence Design > Data > Select Data to display the Select Data Source dialog box, then, in Legend Entries (Series) select Series1, then click Edit. The Edit Series dialog box will now be displayed - figure 4
  7. xlf-xy-scatter-static-edit-series
    Fig 4: X Y Scatter - Edit Series dialog box - showing that Series Values are static - hard coded references
  8. In step 1, the dynamic ranges XY and YV were selected, but these were overwritten by Series X values: ='X Y data!$B$3:$B$1252 and Series Y values: ='X Y data!$C$3:$C$1252 as shown in the Edit Series dialog box in figure 4. The series reference need to be replaced with links to the dynamic vectors
  9. Link the chart series to the dynamic range - by editing the series values. Set Series X values: ='xlf-x-y-scatter.xlsx'!XV, and Series Y values: ='xlf-x-y-scatter.xlsx'!YV
  10. Format the X Y chart - from the Chart Tools tab, select Format > Current Selection > Chart Elements box then the Chart Area element.
  11. From the same ribbon tab, select Size, and set Height: 12.5 cm, and Width: 12.5 cm
  12. A two vector X Y plot has only one series, and the Series name value in figure 4 was left blank, thus the default name is Series1. Select Series1 from the Chart Element box
  13. Click Format Selection to display the Format Data Series task pane. This is the Office 2013 implementation of an Office 2010 dialog box. An example is shown in figure 5
  14. xlf-format-data-series-task-pane
    Fig 5: Excel 2013 task pane - showing Format Data Series options
  15. Select Marker > Marker Options, then Built-in, and set Size: 2
  16. Select Series Options > Horizontal (Value) Axis. This will display the Format Axis task pane
  17. Select Axis Options (the three column icon), then set Bounds; Minimum: -4.0, and Maximum: 4.0. Vertical axis crosses; Axis value: 0.0
  18. Repeat step 10 for the Vertical (value) Axis. Select the axis from the Axis Options > Elements List box - figure 6
  19. xlf-format-axis-task-pane
    Fig 6: Format Axis task pane - showing Axis Options elements list
  20. Add a trendline - Design > Chart Layouts > Add Chart Element > Trendline > More Trendline options to display the Format Trendline task pane.
  21. Select the Fill & Line tab, then set Line: Solid; Color: Standard Colors, Red; Width: 1.5 pt; Dash type: Solid; then Close the task pane.
  22. Hide the chart title - Chart Tools > Design > Add Chart Elements > Chart Title > None
  23. Cut and paste the chart to the X Y interface worksheet
  24. The completed chart appears in figure 7
  25. xlf-xy-scatter-completed-2
    Fig 7: X Y scatter - completed. Includes gridlines. Selector / validator options set to correlation 0.5 and 1250 points

Selected Excel functions used in this module and associated statistical functions.


Excel functions Description
NORM.S.INV(probability)
[2010]
Returns the inverse of the standard normal cumulative distribution \(\text{N(0,1) i.i.d.}\).
RAND()Returns an evenly distributed random number \(0 \le x \lt 1\). The rand function takes no arguments
ROW([reference])Returns the row number of a reference
SQRT(number)Returns the positive square root of a number \(\sqrt{x}; \quad x^{0.5} \)