Monthly Archives: May 2013

Moment Generating Function and Probability Generating Function

Moment generating functions (mgf) and probability generating functions (pgf) are useful tools in probability theory. Loss Models draws heavily on probability, so mgfs and pgfs are essential techniques there. This post collects some notes on them.

The moment generating function of a (univariate) random variable X with density f is defined as

M_{X}(t) = E[e^{tX}] = \int_{-\infty}^{\infty}e^{tx}f(x)\mathrm{d}x

More generally, if X=(X_{1}, X_{2}, \dots, X_{n})^{T} is a random vector and t is a vector of the same dimension, we use t^{T}X instead of tX:

M_{X}(t) = E[e^{t^{T}X}]

The definition may look complicated, so why define the mgf this way? As Wikipedia explains, it can be used to find all the moments of the distribution. Expanding e^{tX} as a Taylor series gives

e^{tX} = 1 + tX + \frac{t^{2}X^{2}}{2!} + \frac{t^{3}X^{3}}{3!}+\cdots+\frac{t^{n}X^{n}}{n!}+\cdots

Taking expectations term by term,

M_{X}(t) = 1 + tE[X]+ \frac{t^{2}E[X^{2}]}{2!} + \frac{t^{3}E[X^{3}]}{3!} + \cdots + \frac{t^{n}E[X^{n}]}{n!}+\cdots

Differentiating M_{X}(t) n times with respect to t and setting t = 0 therefore gives the n-th moment: E[X^{n}] = M_{X}^{(n)}(0).
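As a quick numerical check of this recipe, the sketch below approximates the first few derivatives of an mgf at t = 0 with central finite differences. It assumes, purely for illustration, that X is exponential with rate λ = 2, whose mgf λ/(λ - t) (finite only for t < λ) is a standard closed form with E[X^n] = n!/λ^n. The helper names are hypothetical.

```python
import math


def mgf_exponential(t, lam=2.0):
    """mgf of an Exponential(rate=lam) variable, finite only for t < lam."""
    if t >= lam:
        raise ValueError("the mgf is infinite for t >= lam")
    return lam / (lam - t)


def nth_derivative_at_zero(m, n, h=1e-2):
    """Approximate the n-th derivative of m at 0 by an n-th central difference."""
    # delta^n m(0) = sum_k (-1)^k C(n, k) m((n/2 - k) h), then divide by h^n
    total = sum((-1) ** k * math.comb(n, k) * m((n / 2 - k) * h)
                for k in range(n + 1))
    return total / h ** n


for n in range(1, 4):
    approx = nth_derivative_at_zero(mgf_exponential, n)
    exact = math.factorial(n) / 2.0 ** n  # E[X^n] = n!/lam^n with lam = 2
    print(n, approx, exact)
```

The finite-difference error shrinks like h², so the approximations agree with n!/λ^n to several decimal places.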

If X_{1}, X_{2}, \dots, X_{n} are independent random variables and S_{n} = \sum\limits_{i=1}^{n}a_{i}X_{i}, then the mgf of S_{n} is

M_{S_{n}}(t) = M_{X_{1}}(a_{1}t)M_{X_{2}}(a_{2}t)\cdots M_{X_{n}}(a_{n}t)
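The product rule can be checked exactly on small discrete distributions. This sketch assumes, for illustration only, X_1 uniform on {0, 1, 2}, X_2 ~ Bernoulli(0.3) and weights a = (2, -1). It computes M_S(t) by enumerating the joint pmf (independence makes it the product of the marginals) and compares the result with M_{X_1}(2t)M_{X_2}(-t).

```python
import math
from itertools import product

# Hypothetical example distributions: value -> probability
X1 = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
X2 = {0: 0.7, 1: 0.3}
a1, a2 = 2, -1


def mgf(pmf, t):
    """M_X(t) = E[e^{tX}] for a finite discrete pmf."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())


def mgf_of_weighted_sum(t):
    """E[e^{tS}] with S = a1*X1 + a2*X2, by enumerating the joint pmf."""
    return sum(p1 * p2 * math.exp(t * (a1 * x1 + a2 * x2))
               for (x1, p1), (x2, p2) in product(X1.items(), X2.items()))


for t in (-0.5, 0.1, 0.7):
    lhs = mgf_of_weighted_sum(t)
    rhs = mgf(X1, a1 * t) * mgf(X2, a2 * t)
    print(t, lhs, rhs)  # the two columns agree up to rounding
```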

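Note that some distributions have no mgf, because E[e^{tX}] is not finite on any open interval around t = 0. The lognormal distribution is an example: all of its moments E[X^{n}] are finite, yet for every t > 0 the limit \lim\limits_{n\rightarrow\infty}\sum\limits_{i=0}^{n}\frac{t^{i}E[X^{i}]}{i!} does not exist (the series diverges).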
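To see why the series diverges for the lognormal, recall that if ln X ~ N(μ, σ²) then E[X^n] = exp(nμ + n²σ²/2). The sketch below assumes μ = 0, σ = 1 and t = 0.1, values chosen only for illustration. It tracks the logarithm of the n-th series term t^n E[X^n]/n!. The n²/2 term eventually dominates ln n!, so the terms grow without bound instead of shrinking.

```python
import math


def log_term(n, t=0.1, mu=0.0, sigma=1.0):
    """log of t^n E[X^n] / n! for a lognormal X with log-mean mu, log-sd sigma."""
    log_moment = n * mu + 0.5 * n ** 2 * sigma ** 2  # log E[X^n]
    return n * math.log(t) + log_moment - math.lgamma(n + 1)  # lgamma(n+1) = ln n!


for n in (1, 5, 10, 20, 50):
    print(n, round(log_term(n), 2))
```

Working on the log scale avoids overflow, since the terms themselves quickly exceed the range of a float.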

The probability generating function is defined as

G(z) = E[z^{X}]

Substituting z = e^{t} recovers the mgf:

G(e^{t}) = E[e^{tX}] = M_{X}(t)

Reading the Wikipedia introduction to the pgf, I got the impression that the pgf is mainly intended for discrete random variables. Indeed, the standard definition below assumes that X takes nonnegative integer values.

For a univariate random variable X taking values in \{0, 1, 2, \dots\} with probability mass function p(x), the pgf is

G(z) = E(z^{X}) = \sum\limits_{x=0}^{\infty}p(x)z^{x}
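As an illustration, assume X is Poisson with mean λ = 3, whose pgf has the well-known closed form G(z) = e^{λ(z-1)}. The sketch evaluates the power series Σ p(x) z^x directly and compares it with the closed form. The series is truncated at a point where the remaining probability mass is negligible. It also checks the relation G(e^{t}) = M_X(t) against the Poisson mgf e^{λ(e^{t}-1)}. The names and the truncation point are illustrative choices.

```python
import math

LAM = 3.0    # assumed Poisson mean, for illustration only
X_MAX = 100  # truncation point; P(X > 100) is negligible for LAM = 3


def poisson_pmf(lam, x_max):
    """p(0), ..., p(x_max) via the recurrence p(x) = p(x - 1) * lam / x."""
    probs = [math.exp(-lam)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * lam / x)
    return probs


def pgf_series(probs, z):
    """G(z) = sum_x p(x) z^x, evaluated from the (truncated) pmf."""
    return sum(p * z ** x for x, p in enumerate(probs))


probs = poisson_pmf(LAM, X_MAX)
for z in (0.0, 0.5, 1.0):
    print(z, pgf_series(probs, z), math.exp(LAM * (z - 1)))  # series vs closed form

t = 0.5
print(pgf_series(probs, math.exp(t)), math.exp(LAM * (math.exp(t) - 1)))  # G(e^t) vs M_X(t)
```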

For a random vector X = (X_{1}, \dots, X_{d}) with nonnegative integer components, the definition is

G(z) = G(z_{1},\dots,z_{d}) = E(z_{1}^{X_{1}}\cdots z_{d}^{X_{d}}) = \sum\limits_{x_{1},\dots,x_{d}=0}^{\infty}p(x_{1},\dots,x_{d})z_{1}^{x_{1}}\cdots z_{d}^{x_{d}}

By definition G is a power series whose coefficients p(x) are nonnegative and sum to 1, so it converges absolutely for |z|\leq 1. Differentiating k times and letting z tend to 1 from below (z = 1^{-}) gives the factorial moments:

E\left[\frac{X!}{(X-k)!}\right] = G^{(k)}(1^{-}),\ k \geq 0
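For a Poisson variable with mean λ (assumed here, with λ = 3, for illustration), G(z) = e^{λ(z-1)} gives G^{(k)}(1) = λ^k, so the k-th factorial moment should equal λ^k. The sketch computes E[X!/(X-k)!] directly from the pmf. It uses math.perm(x, k) = x!/(x-k)!, which returns 0 when k > x, and compares the result with λ^k.

```python
import math

LAM = 3.0    # assumed Poisson mean
X_MAX = 150  # truncation point for the infinite sum


def factorial_moment(k, lam=LAM, x_max=X_MAX):
    """E[X!/(X-k)!] for X ~ Poisson(lam), summed directly over the pmf."""
    p = math.exp(-lam)  # p(0)
    total = p * math.perm(0, k)
    for x in range(1, x_max + 1):
        p *= lam / x  # p(x) from p(x - 1)
        total += p * math.perm(x, k)
    return total


for k in range(4):
    print(k, factorial_moment(k), LAM ** k)
```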

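If X_{1}, X_{2}, \dots, X_{n} are independent random variables and S_{n} = \sum\limits_{i=1}^{n}a_{i}X_{i}, then the pgf of S_{n} is

G_{S_{n}}(z) = G_{X_{1}}(z^{a_{1}})G_{X_{2}}(z^{a_{2}})\cdots G_{X_{n}}(z^{a_{n}})

For a plain sum (all a_{i} = 1), this reduces to G_{S_{n}}(z) = G_{X_{1}}(z)G_{X_{2}}(z)\cdots G_{X_{n}}(z).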
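In the plain-sum case, multiplying pgfs is the same as convolving pmfs. The coefficient of z^s in G_{X_1}(z)G_{X_2}(z) collects every pair (x_1, x_2) with x_1 + x_2 = s. The sketch below assumes, for illustration, X_1 ~ Binomial(2, 0.5) and X_2 ~ Bernoulli(0.3). It stores each pgf as its list of coefficients and reads the pmf of X_1 + X_2 off the polynomial product.

```python
# pgf of a nonnegative integer variable stored as coefficients: g[x] = P(X = x)
g1 = [0.25, 0.5, 0.25]  # assumed X1 ~ Binomial(2, 0.5)
g2 = [0.7, 0.3]         # assumed X2 ~ Bernoulli(0.3)


def poly_mul(a, b):
    """Coefficients of the product polynomial, i.e. the convolution of a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj  # every pair with x1 + x2 = i + j
    return out


def pgf(coeffs, z):
    """Evaluate G(z) = sum_x p(x) z^x."""
    return sum(p * z ** x for x, p in enumerate(coeffs))


g_sum = poly_mul(g1, g2)  # pmf of X1 + X2: P(S = 0), ..., P(S = 3)
print(g_sum)
print(pgf(g_sum, 0.6), pgf(g1, 0.6) * pgf(g2, 0.6))  # same value both ways
```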

In particular, if S = X_{1}-X_{2} (that is, a_{1} = 1 and a_{2} = -1), we have

G_{S}(z) = G_{X_{1}}(z)G_{X_{2}}(1/z)
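The difference formula can be checked by evaluating both sides at a few nonzero z. S = X_1 - X_2 may be negative, so G_S(z) = Σ_s P(S = s) z^s is a Laurent series rather than a power series, and z = 0 must be avoided. The sketch assumes, for illustration, X_1 ~ Binomial(2, 0.5) and X_2 ~ Bernoulli(0.3).

```python
from itertools import product

# Hypothetical example pmfs: value -> probability
X1 = {0: 0.25, 1: 0.5, 2: 0.25}  # Binomial(2, 0.5)
X2 = {0: 0.7, 1: 0.3}            # Bernoulli(0.3)


def pgf(pmf, z):
    """G(z) = E[z^X] for a finite pmf (negative values allowed, so z != 0)."""
    return sum(p * z ** x for x, p in pmf.items())


def pmf_of_difference(a, b):
    """pmf of S = X1 - X2 for independent X1, X2, by enumeration."""
    out = {}
    for (x1, p1), (x2, p2) in product(a.items(), b.items()):
        out[x1 - x2] = out.get(x1 - x2, 0.0) + p1 * p2
    return out


S = pmf_of_difference(X1, X2)
for z in (0.5, 1.0, 2.0):
    print(z, pgf(S, z), pgf(X1, z) * pgf(X2, 1 / z))  # G_S(z) vs G_X1(z) G_X2(1/z)
```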

Note: all the material in this post comes from wikipedia.org; check it out if you want more detail.

Hotelling-Williams T-test (1)

Recently I have been trying to compare the performance of two measures. This turns out to be a problem of comparing two correlation coefficients \rho_{12} and \rho_{13}, where subscript 1 denotes the observations and subscripts 2 and 3 denote the two measures. To be honest, I had no idea how to approach it at first. Many thanks to my supervisor Dr. Dennis Cheung, who sent me a PPT on correlation coefficients that includes the Hotelling-Williams T test [Steiger].

The Hotelling-Williams T statistic, which follows a t distribution with N-3 degrees of freedom under the null hypothesis \rho_{12} = \rho_{13}, is:

t_{(N-3)} = (r_{12}-r_{13})\sqrt{\frac{(N-1)(1+r_{23})}{2(N-1)|R|/(N-3)+\bar{r}^{2}(1-r_{23})^{3}}}

  • N = number of observations
  • r_{12} = sample correlation between the observations and measure 2
  • r_{13} = sample correlation between the observations and measure 3
  • r_{23} = sample correlation between the two measures
  • |R| = 1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2(r_{12})(r_{13})(r_{23}), the determinant of the 3 × 3 sample correlation matrix
  • \bar{r} = (r_{12} + r_{13})/2
  • \rho denotes a population correlation and r the corresponding sample correlation
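A direct translation of the statistic into Python is sketched below. It follows Steiger (1980), where the last term of the denominator is \bar{r}^{2}(1-r_{23})^{3}. The function name `williams_t` and the input correlations are hypothetical. The function returns only the statistic and its degrees of freedom, because a p-value needs a t-distribution CDF, which the standard library does not provide (if SciPy is available, scipy.stats.t.sf(t, df) gives the upper tail).

```python
import math


def williams_t(r12, r13, r23, n):
    """Hotelling-Williams t for H0: rho12 = rho13 (two dependent correlations
    sharing variable 1). Returns (t, degrees_of_freedom)."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    # |R|: determinant of the 3x3 sample correlation matrix
    det_r = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar ** 2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3


# Made-up correlations, for illustration only
t, df = williams_t(r12=0.5, r13=0.3, r23=0.4, n=103)
print(round(t, 3), df)  # 2.097 100
```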

The Hotelling-Williams T test worked well for my hypothesis test. It showed a significant difference between the two measures, which explains the phenomena I had observed. The relationship is linear in my case, but I doubt whether the Hotelling-Williams T test is appropriate for nonlinear cases, such as a log relationship. The [crr] blog has a post on a similar problem: the correlations between frequency measures and word processing time. The post is very detailed and introduces two further tests. One is the Vuong test [Vuong, 1989], which is suggested for nonlinear problems such as word processing time versus log frequency, where a nonlinear regression model is required. It suits this case because it is based on a comparison of log-likelihoods. The other was developed by Clarke (2007) [Clarke], who suspected that the Vuong test is conservative for small N. However, after running a simulation study, the [crr] blogger concluded that the Hotelling-Williams T test performs best, followed by the Vuong test. The blogger recommends the Vuong test unless the correlation between the variables is very low.
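For readers who want to try the two alternatives, here is a minimal sketch of both statistics. Both are computed from per-observation log-likelihoods ℓ_{1i} and ℓ_{2i} of two fitted non-nested models, for example processing time regressed on each measure. The sketch assumes the basic forms of the two tests. For Vuong, z = √n · mean(m)/sd(m), with m_i = ℓ_{1i} − ℓ_{2i} and the 1/n variance, is compared with a standard normal. Clarke's sign test counts how many m_i are positive and compares that count with Binomial(n, 1/2). Neither sketch includes the corrections for differing numbers of parameters that are often applied in practice. The function names and data are hypothetical.

```python
import math
from statistics import fmean, pstdev


def vuong_test(ll1, ll2):
    """Basic Vuong z statistic and two-sided normal p-value (no parameter correction)."""
    m = [a - b for a, b in zip(ll1, ll2)]
    sd = pstdev(m)  # 1/n variance of the differences
    if sd == 0:
        raise ValueError("log-likelihood differences have zero variance")
    z = math.sqrt(len(m)) * fmean(m) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value
    return z, p


def clarke_test(ll1, ll2):
    """Clarke-style sign test: B = observations favouring model 1,
    two-sided p-value from Binomial(n, 1/2); exact ties are dropped."""
    m = [a - b for a, b in zip(ll1, ll2) if a != b]
    n = len(m)
    b = sum(1 for d in m if d > 0)
    k = min(b, n - b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, min(1.0, 2 * tail)


# Made-up per-observation log-likelihoods, for illustration only
ll_model1 = [-1.0, -1.2, -0.8, -1.1, -0.9]
ll_model2 = [-1.5, -1.4, -1.6, -1.3, -1.45]
print(vuong_test(ll_model1, ll_model2))   # z > 0 favours model 1
print(clarke_test(ll_model1, ll_model2))  # (5, 0.0625)
```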

I have not yet worked out the core idea behind the Hotelling-Williams T test; I will cover it in the next post.

  1. [crr] http://crr.ugent.be/archives/546
  2. [Vuong] Vuong, Q.H. (1989). Likelihood Ratio Tests for Model Selection and non-nested Hypotheses. Econometrica, 57, 307-333.
  3. [Clarke] Clarke, K.A. (2007). A Simple Distribution-Free Test for Nonnested Model Selection. Political Analysis, 15, 347-363.
  4. [Steiger] Steiger, J.H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245-251.
