正規性検定 - Engineering Skills

データ解析を始めるにあたって、データ集合が正規分布に従っている（正規分布で近似できる）かどうかを確認することは重要な第一歩です。統計検定において正規分布であるかどうかの検定があります。実は種類がたくさんあり、算出方法も色々です。今回は勉強を兼ねてデータの正規性検定手法と検定値算出方法を列挙します。扱う手法は以下の通りです。

(1) Anderson-Darling検定
(2) Shapiro-Wilk検定
(3) Shapiro-Francia検定
(4) Kormogorof-Smirnov検定
(5) Lilliefors検定
(6) Cramér–von Mises検定
(7) D'Agostino's K-squared検定
(8) Jarque-Bera検定

以下では、正規分布の累積分布関数（Cumulative Distirution Function, CDF）を[math]F(x)[/math]、その逆関数を[math]\Phi^{-1}()[/math]と表します。サンプルデータ数は[math]n[/math]です。

Anderson-Darling検定

Anderson-Darling検定はWilbur Anderson とDonald A. Darlingに提案された手法です[1]。もともとの手法は平均や分散が既知か未知の場合などで計算方法が異なります。大抵の場合、これらは未知なので一般的な平均未知、分散未知の場合の計算方法を示します。

データを昇順にソート
データ[math]X_i[/math]を平均、標準偏差により[math]Z_i[/math]へ正規化

正規分布

積分

[math] S = \displaystyle\sum_{i=1}^{n} \frac{2i-1}{n} [ ln ( F ( Z_i ) ) + ln ( 1 - F ( Z_{n + i + 1} ) ) ] [/math]
[math] A^2 = - n - S [/math]
[math] A^* = A^2 \times (1 + \displaystyle\frac{0.75}{n} + \displaystyle\frac{2.25}{n^2}) [/math]

P値の計算について、先ほどの[math] A^2 [/math]を用いて下記の近似式により任意の検定統計量のP値が計算されます。[2]

[math] \mathsf{if} ~(A^* \leqq 0.2) ~P = 1 - exp(-13.436 + 101.14A^*- 223.73{A^*}^2) [/math] [math] \mathsf{else~if} ~(0.2 < A^* \leqq 0.34) ~P = 1 - exp(-8.318 + 42.796A^*- 59.938{A^*}^2) [/math] [math] \mathsf{else~if} ~(0.34 < A^* < 0.6) ~P = exp(0.9177 - 4.279A^*- 1.38{A^*}^2) [/math] [math] \mathsf{else} ~P = exp(1.2937 - 5.709A^*+ 0.0186{A^*}^2) [/math]

Shapiro-Wilk検定

Shapiro-Wilk検定はSamuel Sanford ShapiroとMartin Wilkによって提案されました[3]。この手法はサンプルデータ集合の順序統計量と正規分布の順序統計量の相関（共分散）に基づいた検定です。小サンプルについて検証が行われ[4]、次いでアルゴリズムが開発されたいったそうです[5][6]。(読んでません…)。次に示すのはRoyston[7]によって提案された近似計算手法です。検定統計量Wが小さい（相関が弱い）場合に"データが正規分に従わない"という帰無仮説は棄却されます。

データを昇順にソート
平均を計算

正規分布

[math] \tilde{m_i} = \Phi^{-1} ( \frac{i-3/8}{n+1/4} ) [/math]
[math] c_i = \frac{\tilde{m_i}} {|\mathbf{\tilde{m}}|} [/math]
[math] \tilde{a_n} = -2.706056u^5 + 4.434685u^4 - 2.071190u^3 - 0.147981u^2 + 0.221157u + c_n, \tilde{a_1} = \tilde{a_n} [/math]
もし[math]n>5[/math]なら、下記も計算 [math] \tilde{a_{n-1}} = -3.582633u^5+5.682633u^4-1.752461u^3-0.293762u^2+0.042981u+c_{n-1}, \tilde{a_2} = \tilde{a_{n-1}} [/math]
[math] \mathsf{if} ~n \leqq 5\mathsf{then} ~\phi =(|\mathbf{\tilde{m}}| - 2{\tilde{m}_n}^2)/(1-2{\tilde{a}_n}^2) [/math]
[math] \mathsf{else} ~\phi =(|\mathbf{\tilde{m}}| - 2{\tilde{m}_n}^2 - 2{\tilde{m}_{n-1}}^2)/(1-2{\tilde{a}_n}^2 - 2{\tilde{a}_{n-1}}^2) [/math]
[math] \mathsf{if} ~n \leqq 5\mathsf{then} ~\tilde{a_i} = \displaystyle\frac{\tilde{m_i}}{\sqrt{\phi}} ~for ~i = 2,...,n-1 [/math]
[math] \mathsf{else} ~ ~\tilde{a_i} = \displaystyle\frac{\tilde{m_i}}{\sqrt{\phi}} ~for ~i = 3,...,n-2 [/math]
[math] W = \frac{{(\sum_{i=1}^{n}\tilde{a_i}X_i)}^2}{\sum_{i=1}^{n}{(X_i-\bar{X})}^2}[/math]

標準偏差

[math]\sigma[/math]

正規分布

[math] \mathsf{if} ~4 \leqq n \leqq 11 ~\mathsf{then}[/math]
[math] \mathsf{else if} ~n > 11 ~\mathsf{then}[/math]
[math] P = \Phi^{-1}(\frac{g(W)-\mu}{\sigma}) [/math]

Shapiro-Francia検定

Shapiro-Francia検定はShapiro-Wilk検定の簡略版です[9]。サンプルサイズが大きい場合、この検定はShapiro-Wilk検定の同党の検出力となります。

データを昇順にソート
平均を計算
[math] \tilde{m_i} = \Phi^{-1} ( \frac{i-3/8}{n+1/4} ) [/math]
[math] c_i = \frac{\tilde{m_i}} {|\mathbf{\tilde{m}}|} [/math]
[math] W' = \frac{{(\sum_{i=1}^{n}\tilde{c_i}X_i)}^2}{\sum_{i=1}^{n}{(X_i-\bar{X})}^2}[/math]

標準偏差

[math]\sigma[/math]

正規分布

[math] \mu = 1.0521u-1.2725, u = ln(ln(n))-ln(n) [/math]
[math] \sigma = -0.2758v+1.030, v=ln(ln(n))+\frac{2}{ln(n)} [/math]
[math] g(W) = ln(1-W)) [/math]
[math] P = \Phi^{-1}(\frac{g(W)-\mu}{\sigma}) [/math]

Kormogorov-Smirnov検定

Kormogorov-Smirnov検定は二標本の母集団の確率分布が異なるものであるか検定する手法で、Andrey KolmogorovとNikolai Smirnovの名にちなみます[10][11] 。1標本の場合は、帰無仮説において指定された累積分布関数と比較します。ここでは正規分布関数の累積分布関数と比較することになります。正規分布に関する検定については、後述するLillieforsによる改良が知られています。ただし、Kormogorov-Smirnov検定は分布の裾の部分よりも中央値付近の方に強く依存し多くのサンプルサイズを必要とします。正規性の検定に関してはAnderson-Darling検定やShapiro-Wilk検定の方が強力な手法のようです。

データを昇順にソート
データ[math]X_i[/math]を平均、標準偏差により[math]Z_i[/math]へ正規化
[math] D_n^+ = sup(\displaystyle\frac{i} {n}-F(Z_i)) [/math]
[math] D_n^- = sup(F(Z_i)-\displaystyle\frac{i-1} {n}) [/math]
[math] D_n = max(D_n^+,D_n^-) [/math]

定量

[math] Prob(\sqrt{n}D_n) = 2\displaystyle\sum_{i=1}^{\infty} (-1)^{i-1} e^{-2i^{2}x^{2}} [/math]

Lillifors検定

Lillifors検定はKormogorov-Smirnov検定の修正版で検定統計量は同一、検定対象を正規分布のみとして扱います。下記ではJuergen Grossによって"R"のnortestに実装された計算方法を記述します。Lillifors検定のP値はDalla, Wilonson[12]により近似式が発表されています。ただし P値が0.1以下で信頼性があるとのことで、0.1以上ではStphens[13]の方法で計算しています。以下ではP値の計算方法のみ記述します。

[math] \mathsf{if} ~n \leqq 100 ~\mathsf{then} ~KD = D_n, ND = n [/math]
[math] \mathsf{else} ~KD = D_n{(\displaystyle\frac{n}{100})}^{0.49}, ND = 100 [/math]
[math] \small \displaystyle P = exp(-7.01256KD^{2}(ND + 2.78019) + 2.99587KD \sqrt{ND + 2.78019} - 0.122119 + \frac{0.974598}{\sqrt{ND}} + \frac{1.67997}{ND}) [/math]
[math] \displaystyle KK = \sqrt{n} - 0.01 + \frac{0.85}{\sqrt{n}D_n} [/math]
[math] \small \mathsf{if} ~(KK \leqq 0.302) ~P = 1 [/math] [math] \small \mathsf{else~if} ~(0.302 < KK \leqq 0.5) ~P = 2.76773 - 19.828315KK + 80.709644{KK}^2 - 138.55152{KK}^3 + 81.218052{KK}^4 [/math] [math] \small \mathsf{else~if} ~(0.5 < KK \leqq 0.9) ~P = -4.901232 + 40.662806KK - 97.490286{KK}^2+ 94.029866{KK}^3 - 32.355711{KK}^4 [/math] [math] \small \mathsf{else~if} ~(0.9 < KK \leqq 1.31) ~P = 6.198765 - 19.558097KK + 23.186922{KK}^2 - 12.234627{KK}^3 + 2.423045{KK}^4 [/math] [math] \small \mathsf{else} ~P = 0 [/math]

Cramér–von Mises検定

Cramér–von Mises検定もEDFに基づいた検定になります。

データを昇順にソート
データ[math]X_i[/math]を平均、標準偏差により[math]Z_i[/math]へ正規化

正規分布

積分

[math] W = n{w}^2 = \displaystyle \frac{1}{12n} + \sum_{i=1}^{n}{[\frac{2i-1}{2n}-F(Z_i)]}^2 [/math]
[math] \displaystyle W' = W (1.0 + \frac{0.5}{n}) [/math]

D'Agostino's K-squared検定

D'Agostino's K-squared検定は標本歪度（Skewness）、標本尖度（Kurtsis）から計算されます[15][16][17]。標本歪度、標本尖度それぞれに基づく検定と、両者の結合にから生成される包括的な検定が定義されます。

標本歪度[math] g1 [/math]、標本尖度[math] g2 [/math]は以下のように定義されます。
[math] g1 = \displaystyle \frac{\frac{1}{n}\sum_{i=1}^{n}{(x_i-\bar{x})}^3} {{(\frac{1}{n}\sum_{i=1}^{n}{(x_i-\bar{x})}^2)}^\frac{3}{2}} [/math]
[math] g2 = \displaystyle \frac{\frac{1}{n}\sum_{i=1}^{n}{(x_i-\bar{x})}^4} {{(\frac{1}{n}\sum_{i=1}^{n}{(x_i-\bar{x})}^2)}^{2}} - 3 [/math]
これらは標本が正規分布に従っていると標本歪度、標本尖度の平均、分散などが分析でき検定を行えます。ただし検定量の正規分布への収束性が遅いため修正量について検定が行われます。

D'Agostino's Skewness test

[math] \displaystyle {\mu}_2 = \frac{6(n+2)}{(n+1)(n+2)} [/math]
[math] \displaystyle {\gamma}_2 = \frac{36(n-7)({n}^2+2n-5)}{(n-2)(n+5)(n+7)(n+9)} [/math]
[math] \displaystyle W_2 = \sqrt{2{\gamma}_2 + 4} - 1 [/math]
[math] \displaystyle \delta = \frac{1}{\sqrt{\ln{W}}} [/math]
[math] \displaystyle {\alpha}^2 = \frac{2}{({W}^2-1)} [/math]
[math] \displaystyle Z_1 ( g_1 ) = \delta \ln ( \frac{g_1}{\alpha \sqrt{{\mu}_2} } ) [/math]
[math] P = 2 \times \Phi^{-1}( Z_1 ( g_1 ) ) [/math] (両側検定)

D'Agostino's Kurtsis test

[math] \displaystyle {\mu}_1 = - \frac{6}{(n+1)} [/math]
[math] \displaystyle {\mu}_2 = \frac{24n(n-2)(n-3)}{ {(n+1)}^2 (n+3)(n+5)} [/math]
[math] \displaystyle {\gamma}_1 = \frac{6({n}^2-5n+2)}{(n+7)(n+9)} \sqrt{ \frac{6(n+3)(n+5)}{n(n-2)(n-3)} } [/math]
[math] \displaystyle A= 6 + \frac{8}{{\gamma}_1} ( 1 + \sqrt {1 + \frac{4}{ {{\gamma}_1}^2 }}) [/math]
[math] \displaystyle Z_2 ( g_2 ) = \sqrt{\frac{9A}{2}} \{ 1 - \frac{2}{9A} - {( \frac{1-1/A}{1+\frac{g_2-\mu_1}{\sqrt{{\mu}_2}}\sqrt{2/(A-4)} } )}^{\frac{1}{3}} \} [/math]
※Anscombe-Glynn kurtosis test(アンスコム・グリン検定[18])とも呼ばれます。
[math] P = 2 \times \Phi^{-1}( Z_2 ( g_2 ) ) [/math] (両側検定)

D'Agostino's Omnibus test

[math] \displaystyle K^2 = {Z_1 ( g_1 )}^2 + {Z_2 ( g_2 )}^2 [/math]

正規分布

カイ二乗分布

Jarque-Bera検定

Jarque-Bera検定はD'Agostino's K-squared検定と同様に標本歪度、標本尖度の結合に基づく検定統計量を計算します[19][20][21]。

検定統計量JBは以下です。
[math] JB = \displaystyle n\{ \frac{{g_1}^2}{6} + \frac{{(g_2 - 3)}^2}{24} \}[/math]
JBが自由度2のカイ二乗分布に従う事からP値が計算されます。

まとめ

色々な手法を列挙しました。主観が入りますが、物理、物性に関わるエンジニアの実務に関してはAnderson-Darling検定、Shapiro-Wilk検定の二つを知っていれば十分だと思います。とはいえ、背景知識を持っているに越したことはありません。こちらで公開しているツールでは紹介した手法の検定を簡単に行えるようにしているので使ってみてください。

[1] Anderson, T. W.; Darling, D. A. (1952). "Asymptotic theory of certain "goodness-of-fit" criteria based on stochastic processes". Annals of Mathematical Statistics. 23: 193–212.

[2] Ralph B. D'Agostino (1986). "Tests for the Normal Distribution". In D'Agostino, R.B.; Stephens, M.A. (eds.). Goodness-of-Fit Techniques. New York: Marcel Dekker. ISBN 0-8247-7487-6.

[3] Shapiro, S. S.; Wilk, M. B. (1965). "An analysis of variance test for normality (complete samples)". Biometrika. 52 (3–4): 591–611.

[4] Sarhan AE, Greenberg BG. Estimation of location and scale parameters by order statistics from singly and doubly censored samples. The Annals of Mathematical Statistics. 1956; 27(2):427–451.

[5] Davis C, Stephens M. Approximating the Covariance Matrix of Normal Order Statistics. Applied Statistics. 1978; 27(2):206-212.

[6] Royston JP. Algorithm AS 177: Expected normal order statistics (exact and approximate). Journal of the Royal Statistical Society. Series C (Applied Statistics). 1982; 31(2):161–165.

[7] Royston P. Approximating the Shapiro-Wilk W-test for non-normality. Statistics and Computing. 1992; 2(3):117-119.

[8] Blom G. Statistical Estimates and Transformed Beta-Variables. New York, NY. John Wiley & Sons, 1958.

[9] Shapiro SS, Francia R. An approximate analysis of variance test for normality. Journal of the American Statistical Association. 1972; 67(337):215–216.

[10] Kolmogorov A (1933). "Sulla determinazione empirica di una legge di distribuzione". G. Ist. Ital. Attuari. 4: 83–91.

[11] Smirnov N (1948). "Table for estimating the goodness of fit of empirical distributions". Annals of Mathematical Statistics.

[12] Dallal, G.E. and Wilkinson, L. (1986): An analytic approximation to the distribution of Lilliefors’ test for normality. The American Statistician, 40, 294–296.

[13] Stephens, M.A. (1974): EDF statistics for goodness of ﬁt and some comparisons. Journal of the American Statistical Association, 69, 730–737.

[14] Stephens, M.A. (1986): Tests based on EDF statistics. In: D'Agostino, R.B. and Stephens, M.A., eds.: Goodness-of-Fit Techniques. Marcel Dekker, New York.

[15] Pearson, Egon S. (1931). “Note on tests for normality”. Biometrika 22 (3/4): 423–424.

[16] D’Agostino, Ralph B. (1970). “Transformation to normality of the null distribution of g1”. Biometrika 57 (3): 679–681.

[17] D’Agostino, Ralph B.; Albert Belanger; Ralph B. D’Agostino, Jr (1990). “A suggestion for using powerful and informative tests of normality”. The American Statistician 44 (4): 316–321.

[18] Anscombe, Francis J., and William J. Glynn. "Distribution of the kurtosis statistic b 2 for normal samples." Biometrika 70.1 (1983): 227-234.

[19] Jarque, Carlos M.; Bera, Anil K. (1980). “Efficient tests for normality, homoscedasticity and serial independence of regression residuals”. Economics Letters 6 (3): 255–259.

[20] Jarque, Carlos M.; Bera, Anil K. (1981). “Efficient tests for normality, homoscedasticity and serial independence of regression residuals: Monte Carlo evidence”. Economics Letters 7 (4): 313–318.

[21] Jarque, Carlos M.; Bera, Anil K. (1987). “A test for normality of observations and regression residuals”. International Statistical Review 55 (2): 163–172.