Design of Experiments (2): One-Way Layout Experiments and Analysis of Variance

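Before moving on to experimental design proper, this post covers the basics: the one-way layout experiment and the analysis of variance (ANOVA).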

One-way layout experiments

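An experiment in which a single factor is set to several levels and each level is run repeatedly is called a one-way layout experiment. Put simply, it is a basic experiment that varies only one parameter. When we obtain results like the ones below, we analyze them with a test called the analysis of variance. This post walks through that test step by step.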

f:id:OceanOne:20200824003829p:plain:h105

The box plot below visualizes the data above. Condition B appears to differ from the others, but what does the analysis say?

f:id:OceanOne:20200825233329p:plain:w350

ANOVA for a one-way layout

First, the grand mean of the example data is 4.8. Splitting each value into this mean and its deviation from the mean gives the following.

f:id:OceanOne:20200824003936p:plain:h115

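The sum of the squared terms in the second term on the right-hand side (the sum of squares) measures the total variation around the mean (11.46 in this example). This quantity is called the total sum of squares ([math] \displaystyle S_T [/math]). Dividing it by the number of observations gives the variance, and dividing by the number of observations minus one gives the unbiased variance.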
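The sketch below illustrates the definition by computing the total sum of squares directly as the sum of squared deviations from the grand mean. The article's raw data exist only as the image above, so these values are a hypothetical toy dataset (three levels with three observations each). The function name `total_sum_of_squares` is also illustrative.

```python
# Hypothetical toy data: level name -> observations (NOT the article's data)
data = {
    "A1": [3.0, 5.0, 4.0],
    "A2": [7.0, 9.0, 8.0],
    "A3": [5.0, 7.0, 6.0],
}

def total_sum_of_squares(groups):
    """S_T: sum of squared deviations of every observation from the grand mean."""
    values = [x for obs in groups.values() for x in obs]
    grand_mean = sum(values) / len(values)
    return sum((x - grand_mean) ** 2 for x in values)

s_t = total_sum_of_squares(data)
n = sum(len(obs) for obs in data.values())
print(s_t)      # 30.0
print(s_t / n)  # variance: S_T divided by the number of observations
```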

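This variation contains both the effect of the experimental levels (the variation they cause) and random error, and the analysis of variance is the test that compares the two. The comparison of interest is between the level means, but the test works through the variation (variance) among those means, which is why it is called the analysis of variance.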

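The second term on the right-hand side can be split further into two parts: the deviation of each level mean from the grand mean, and the deviation of each value from its level mean.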

f:id:OceanOne:20200824003852p:plain:h115

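In this decomposition, the sum of squares of the second term on the right-hand side is called the factor sum of squares ([math] \displaystyle S_A [/math]). The sum of squares of the third term is called the error sum of squares ([math] \displaystyle S_E [/math]). The decomposition shows that each experimental result can be split into the mean [math] \mu [/math], the effect of factor A [math] a_{i} [/math], and the error [math] e_{ij} [/math]. This gives the following structural model, where i indexes the level and j the repetition within that level: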

[math] \displaystyle x_{ij} = \mu + a_{i} + e_{ij} [/math]

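When the squares are expanded, the cross-product term vanishes because the deviations from each level mean sum to zero within that level. By this additivity of sums of squares, the factor and error sums of squares add up to the total sum of squares: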

[math] \displaystyle S_T = S_A + S_E [/math]
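The sketch below checks the structural model and the identity [math] \displaystyle S_T = S_A + S_E [/math] numerically. It splits each observation into three parts: the grand mean (the estimate of [math] \mu [/math]), the level effect [math] a_i [/math] (level mean minus grand mean), and the residual [math] e_{ij} [/math] (observation minus level mean). The data are a hypothetical toy dataset, not the article's values, and the function name `decompose` is illustrative.

```python
import math

# Hypothetical toy data: level name -> observations (NOT the article's data)
data = {
    "A1": [3.0, 5.0, 4.0],
    "A2": [7.0, 9.0, 8.0],
    "A3": [5.0, 7.0, 6.0],
}

def decompose(groups):
    """Split x_ij into grand mean + level effect a_i + residual e_ij.

    Returns (S_T, S_A, S_E)."""
    values = [x for obs in groups.values() for x in obs]
    mu = sum(values) / len(values)  # grand mean
    s_t = s_a = s_e = 0.0
    for obs in groups.values():
        level_mean = sum(obs) / len(obs)
        a_i = level_mean - mu  # effect of this level
        for x in obs:
            e_ij = x - level_mean  # random error (residual)
            # the model x_ij = mu + a_i + e_ij reconstructs every observation
            assert math.isclose(x, mu + a_i + e_ij)
            s_t += (x - mu) ** 2
            s_a += a_i ** 2  # the level effect counts once per observation
            s_e += e_ij ** 2
    return s_t, s_a, s_e

s_t, s_a, s_e = decompose(data)
print(s_t, s_a, s_e)  # 30.0 24.0 6.0
```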

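In this example, the level effects (each level mean minus the grand mean 4.8) are [math] -0.6 [/math], [math] 1.0 [/math] and [math] -0.4 [/math]. Each effect is counted once for each of the 4 observations at its level, so

[math] \displaystyle S_A = \{ {(-0.6)}^2 + {(1.0)}^2 + {(-0.4)}^2 \} \times 4 = 6.08 [/math]
[math] \displaystyle S_E = S_T - S_A = 11.46 - 6.08 = 5.38 [/math]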
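The arithmetic above takes only a few lines to reproduce. The level effects, the 4 observations per level, the grand mean 4.8 and [math] S_T = 11.46 [/math] all come from the text. The level means printed at the end are simply the grand mean plus each effect.

```python
# Values taken from the worked example in the text
grand_mean = 4.8
effects = [-0.6, 1.0, -0.4]  # level mean minus grand mean, for the 3 levels
n_per_level = 4              # observations per level
s_t = 11.46                  # total sum of squares

# S_A: each level's squared effect counts once per observation at that level
s_a = sum(a ** 2 for a in effects) * n_per_level
# S_E follows from the additivity S_T = S_A + S_E
s_e = s_t - s_a

level_means = [grand_mean + a for a in effects]
print(round(s_a, 2), round(s_e, 2))        # 6.08 5.38
print([round(m, 2) for m in level_means])  # [4.2, 5.8, 4.4]
```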

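To test whether the levels differ, we check whether [math] \displaystyle S_A [/math] is significantly large compared with [math] \displaystyle S_E [/math]. Strictly speaking, the comparison uses the two after each is divided by its degrees of freedom, that is, the mean squares.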

Here the null hypothesis is

[math] \displaystyle \mu_{A1} = \mu_{A2} = \mu_{A3} [/math]

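The test rejects this hypothesis if even one level mean differs from the others. So when there are three or more levels, a significant result does not tell you which specific levels differ. To identify which levels differ significantly, you need a further analysis such as a multiple comparison test.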

ANOVA table

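Dividing each sum of squares by its degrees of freedom gives the mean square. The ratio of the mean square of factor A to that of error E is the test statistic [math] \displaystyle F_0 [/math]. Let N be the total number of observations and m the number of levels. The degrees of freedom then follow the same logic as the denominator of the unbiased variance: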

Total degrees of freedom: [math] \displaystyle \phi_T = N - 1 [/math]
Factor degrees of freedom: [math] \displaystyle \phi_A = m - 1 [/math]
Error degrees of freedom: [math] \displaystyle \phi_E = \phi_T - \phi_A = N - m [/math]
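A small helper makes the degrees-of-freedom bookkeeping explicit. The function name is illustrative. The call uses the example's values, N = 12 observations and m = 3 levels, which match the 3 levels of 4 observations each used for [math] S_A [/math].

```python
def anova_degrees_of_freedom(n_total, n_levels):
    """Return (phi_T, phi_A, phi_E) for a one-way layout."""
    phi_t = n_total - 1           # total: the N - 1 of the unbiased variance
    phi_a = n_levels - 1          # factor: m level means minus the grand mean
    phi_e = n_total - n_levels    # error: what remains
    assert phi_t == phi_a + phi_e  # degrees of freedom add up like the sums of squares
    return phi_t, phi_a, phi_e

print(anova_degrees_of_freedom(12, 3))  # (11, 2, 9)
```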

All of this is summarized in a table called the analysis of variance (ANOVA) table:

Source	Sum of squares	Degrees of freedom	Mean square	[math] \displaystyle F_0 [/math]	[math] \displaystyle P [/math]-value
A	[math] \displaystyle S_A [/math]	[math] \displaystyle \phi_A = m - 1 [/math]	[math] \displaystyle V_A = S_A/\phi_A [/math]	[math] \displaystyle {V_A}/{V_E} [/math]	[math] \displaystyle P_A [/math]
E	[math] \displaystyle S_E [/math]	[math] \displaystyle \phi_E = N - m [/math]	[math] \displaystyle V_E = S_E/\phi_E [/math]
T	[math] \displaystyle S_T [/math]	[math] \displaystyle \phi_T = N - 1 [/math]

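Under the null hypothesis, the test statistic [math] \displaystyle F_0 [/math] follows an [math] \displaystyle F [/math] distribution with [math] \displaystyle (\phi_A , \phi_E) [/math] degrees of freedom. The [math] \displaystyle P [/math]-value is computed from this distribution.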
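In practice the [math] P [/math]-value comes from a statistics library, for example `scipy.stats.f.sf(F0, phi_A, phi_E)` in SciPy. The code below is a dependency-free sketch. It uses the standard identity [math] \displaystyle P(F > f) = I_x(\phi_E/2, \phi_A/2) [/math] with [math] \displaystyle x = \phi_E/(\phi_E + \phi_A f) [/math], where [math] I_x [/math] is the regularized incomplete beta function. That function is evaluated with the classic continued-fraction method described in Numerical Recipes. The function names are illustrative, and the code has not been validated as a numerical library.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # evaluate the continued fraction on the side where it converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_sf(f0, df1, df2):
    """Upper-tail probability P(F > f0) for F ~ F(df1, df2)."""
    if f0 <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f0)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, x)

print(round(f_sf(5.0855, 2, 9), 4))  # 0.0333
```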

For this example, the table is:

Source	Sum of squares	Degrees of freedom	Mean square	[math] \displaystyle F_0 [/math]	[math] \displaystyle P [/math]-value
A	[math] \displaystyle 6.08 [/math]	[math] \displaystyle 2 [/math]	[math] \displaystyle 3.04 [/math]	[math] \displaystyle 5.0855 [/math]	[math] \displaystyle 0.0333 [/math]
E	[math] \displaystyle 5.38 [/math]	[math] \displaystyle 9 [/math]	[math] \displaystyle 0.5978 [/math]
T	[math] \displaystyle 11.46 [/math]	[math] \displaystyle 11 [/math]
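The sketch below reproduces this table from [math] S_A = 6.08 [/math], [math] S_E = 5.38 [/math], N = 12 and m = 3. Because [math] \phi_A = 2 [/math] here, the [math] F [/math] tail probability has a closed form: [math] \displaystyle P(F > F_0) = \left( \frac{\phi_E}{\phi_E + 2F_0} \right)^{\phi_E/2} [/math]. It follows from the fact that [math] I_x(a, 1) = x^a [/math]. The sketch relies on this shortcut, so it rejects other values of [math] \phi_A [/math]. The general case needs something like SciPy's `scipy.stats.f.sf`. The function name `one_way_anova_table` is illustrative.

```python
def one_way_anova_table(s_a, s_e, n_total, n_levels):
    """Return (V_A, V_E, F0, P) for a one-way layout.

    The P-value uses the closed form of the F(2, phi_E) upper tail,
    so this sketch only supports n_levels == 3 (phi_A == 2)."""
    phi_a = n_levels - 1
    phi_e = n_total - n_levels
    if phi_a != 2:
        raise ValueError("closed-form P-value assumes phi_A == 2")
    v_a = s_a / phi_a  # mean square of factor A
    v_e = s_e / phi_e  # mean square of error
    f0 = v_a / v_e     # test statistic F_0
    p = (phi_e / (phi_e + 2.0 * f0)) ** (phi_e / 2.0)  # P(F > F_0), F ~ F(2, phi_E)
    return v_a, v_e, f0, p

v_a, v_e, f0, p = one_way_anova_table(6.08, 5.38, 12, 3)
print(f"V_A={v_a:.2f}  V_E={v_e:.4f}  F0={f0:.4f}  P={p:.4f}")
# V_A=3.04  V_E=0.5978  F0=5.0855  P=0.0333
```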

Computing the sums of squares

When the sums of squares for ANOVA are actually computed, equivalent shortcut formulas are normally used.

◆Total sum of squares◆

For the total sum of squares, start from the variance formula

[math] \displaystyle \frac{1}{N} \sum_{i=1}^{N}{(x_i-\bar{x})}^2 = \frac{1}{N} \sum_{i=1}^{N}{x_i}^2 - {\bar{x}}^2 [/math]

which gives

[math] \displaystyle S_T = \sum_{i=1}^{N}{(x_i-\bar{x})}^2 = \sum_{i=1}^{N}{x_i}^2 -N {\bar{x}}^2 [/math]
[math] \displaystyle S_T = \sum_{i=1}^{N}{x_i}^2 -N {( \frac{1}{N}\sum_{i=1}^{N}{x_i} )}^2 [/math]
[math] \displaystyle S_T = \sum_{i=1}^{N}{x_i}^2 -\frac{1}{N}{( \sum_{i=1}^{N}{x_i} )}^2 [/math]

Writing [math] \displaystyle T [/math] for the total of all the data gives:

[math] \displaystyle S_T = \sum_{i=1}^{N}{x_i}^2 -\frac{{T}^2 }{N}[/math]

The second term on the right-hand side is called the correction term ([math] \displaystyle CT [/math]) and is used again below:

[math] \displaystyle S_T = \sum_{i=1}^{N}{x_i}^2 - CT [/math]
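The shortcut is easy to check against the definition. The data below are a hypothetical toy example, not the article's values, and the function names are illustrative.

```python
# Hypothetical toy data (NOT the article's data), pooled over all levels
values = [3.0, 5.0, 4.0, 7.0, 9.0, 8.0, 5.0, 7.0, 6.0]

def s_t_definition(xs):
    """S_T as the sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def s_t_shortcut(xs):
    """S_T = (sum of x_i^2) - CT, with correction term CT = T^2 / N."""
    total = sum(xs)               # T: total of all the data
    ct = total ** 2 / len(xs)     # correction term
    return sum(x * x for x in xs) - ct

print(s_t_definition(values), s_t_shortcut(values))  # 30.0 30.0
```

The shortcut needs no mean in advance, which suits hand calculation. In floating-point arithmetic, however, subtracting two large, nearly equal numbers can lose precision when the data have a large mean relative to their spread. For that reason, software often uses the deviation form instead.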

◆Factor sum of squares◆

For the factor sum of squares, let [math] \displaystyle n_i [/math] be the number of observations at level i and [math] \displaystyle \bar {x_{i \cdot}} [/math] the mean of level i. Then

[math] \displaystyle S_A = \sum_{i=1}^{m} {\sum_{j=1}^{n_i} {( \bar {x_{i \cdot}} - \bar{x} ) ^2 }} [/math]
[math] \displaystyle S_A = \sum_{i=1}^{m} { n_i {( \bar {x_{i \cdot}} - \bar{x} ) ^2 }} [/math]

Expanding the square:

[math] \displaystyle S_A = \sum_{i=1}^{m} { n_i \{ (\bar {x_{i \cdot}})^2 - 2 \bar {x_{i \cdot}} \bar {x} + (\bar {x})^2 \} } [/math]

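Let [math] \displaystyle T_{i\cdot}[/math] be the total of level i, so that [math] \displaystyle \bar {x_{i \cdot}} = T_{i\cdot}/n_i [/math] and [math] \displaystyle \sum_{i=1}^{m}{n_i} = N [/math]. Then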

[math] \displaystyle S_A = \sum_{i=1}^{m} n_i {(\frac{T_{i\cdot}}{n_i})}^2 - 2 \bar{x}\sum_{i=1}^{m}{n_i \bar {x_{i \cdot}}} + {\bar{x}}^2 \sum_{i=1}^{m}{n_i} [/math]
[math] \displaystyle S_A = \sum_{i=1}^{m} \frac{ {T_{i\cdot}}^2}{ {n_i} } - 2 \bar{x}\sum_{i=1}^{m}{T_{i\cdot}} + {\bar{x}}^2 \cdot N [/math]
[math] \displaystyle S_A = \sum_{i=1}^{m} \frac{ {T_{i\cdot}}^2}{ {n_i} } - 2 \bar{x}(N\bar{x}) + N{\bar{x}}^2 [/math]
[math] \displaystyle S_A = \sum_{i=1}^{m} \frac{ {T_{i\cdot}}^2}{ {n_i} } - N{\bar{x}}^2 [/math]

As with [math] S_T [/math], the second term equals [math] CT [/math] (because [math] \displaystyle N{\bar{x}}^2 = T^2/N [/math]), so

[math] \displaystyle S_A = \sum_{i=1}^{m} \frac{ {T_{i\cdot}}^2}{ {n_i} } - CT[/math]
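The derivation allows a different number of observations [math] n_i [/math] at each level. The check below therefore uses a hypothetical unbalanced dataset (not the article's data). It compares the defining formula for [math] S_A [/math] with the shortcut [math] \displaystyle \sum_{i} T_{i\cdot}^2/n_i - CT [/math]. The function names are illustrative.

```python
# Hypothetical unbalanced toy data: level -> observations (NOT the article's data)
groups = {
    "A1": [3.0, 5.0],
    "A2": [7.0, 9.0, 8.0],
    "A3": [5.0, 7.0, 6.0, 6.0],
}

def s_a_definition(groups):
    """S_A = sum_i n_i * (level mean - grand mean)^2."""
    values = [x for obs in groups.values() for x in obs]
    grand_mean = sum(values) / len(values)
    return sum(len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
               for obs in groups.values())

def s_a_shortcut(groups):
    """S_A = sum_i T_i^2 / n_i - CT, with CT = T^2 / N."""
    n_total = sum(len(obs) for obs in groups.values())
    grand_total = sum(sum(obs) for obs in groups.values())
    ct = grand_total ** 2 / n_total  # correction term
    return sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - ct

print(round(s_a_definition(groups), 4), round(s_a_shortcut(groups), 4))  # 19.5556 19.5556
```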