學而樂: 2015

2015년 11월 17일 화요일

Taylor’s Formula

If $ f(x) = a_0 + a_1 x + a_2 x^2 + \cdots $,
$$
a_n = \frac{f^{(n)}(0)}{n!}
$$
with another base point
$$
f(x) = f(b) + f’(b)(x-b) + \frac{f’’(b)}{2} (x-b)^2 + \frac{ f^{(3)}(b)}{3!}(x-b)^3 + \cdots
$$
For example, $\sqrt{x}$ is not appropriate to expand at $0$ because it’s not differentiable at $0$. so use $ b= 1$,
$$
x^{\frac{1}{2}} = 1 + \frac{1}{2}(x-1) + \frac{ \left( \frac{1}{2} \right) \left( \frac{1}{2} - 1 \right) }{2!} (x-1)^2 + \cdots
$$

2015년 11월 15일 일요일

likelihood function

어떤 probability distribution(parameter $\theta$)에서 sample $\mathbb x$를 뽑을 확률을 $p$라고 하면,
$$
L(\theta | \mathbb x) = p (=p(\mathbb x | \theta) )
$$
이다. 예를들어 Bernoulli distribution($Bin(n,\pi)$)은 다음과 같이 정의되는데,
$$
f(x;\pi) = \pi ^ x (1-\pi)^{(1-x)} , x = 0 , 1, \pi \text{ is unknown parameter}
$$
likelihood function은 다음과 같다.
$$
L(\pi | x) = \frac{x!}{(n-x)!x!} \pi ^x (1-\pi)^{(n-x)}
$$
계산상 편의를 이유로 Log를 취한 log likelihood가 자주 쓰인다.

2015년 11월 12일 목요일

Gibbs sampling

1. initial value $X^{(0)}$
2. update $j$th component by $p(x_j | x_1^{(i+1)}, \cdots , x_{j-1}^{(i+1)}, x_{j+1}^{(i)}, \cdots , x_n^{(i)})$
3. repeat

(from wikipedia)

cf. block gibbs sampling
update $j$th component by $p(x_j | x_1^{(i)}, \cdots , x_{j-1}^{(i)}, x_{j+1}^{(i)}, \cdots , x_n^{(i)})$

2015년 10월 20일 화요일

2015년 9월 7일 월요일

divergence theorem

$
\iint\limits_S F \cdot dS = \iiint\limits_E \nabla F dV
$

2015년 8월 3일 월요일

Euclidean algorithm

출처 : https://en.wikipedia.org/wiki/Euclidean_algorithm#/media/File:Euclidean_algorithm_1071_462.gif

wikipedia의 Euclidean Algorithm항목에 있는 gif animation인데 호제법을 이해하는데 이보다 뛰어난 그림을 본 적이 없다. 아주 직관적이다.

2015년 5월 6일 수요일

Pattern Recognition이

Clustering이랑 결국 같은건가?

그러니까..
3, 5, 7, ( ), 11 에서 괄호 안에 들어갈 숫자를 맞추는 작업이
clustering과 본질적으로 같은 문제인가?

Is Patter Recognition equal to clustering problem?
I mean, for example, if there is a series with an unknown number, is guessing that substantially equal to the problem of clustering?

2015년 5월 5일 화요일

R로 어떻게 비벼보려고 했는데..

결국 그냥 Matlab쓰기로..
심지어 거의 10만원을 들여 정품 사용자가 되었다.
ㅜㅜ

Anscombe's quartet

$ (x, y) $형식인 11개의 점을 하나의 dataset으로 총 네개의 dataset인데, 각 데이터의 mean, var, cor, linear regression이 같다. graphical하게 보면 많이 다르고.
http://en.wikipedia.org/wiki/Anscombe's_quartet

Dijkstra, Anscombe발음

Edsger Wybe Dijkstra
위키피디아에 찾아보면 제일 첫줄에 발음이 나온다 ㅎㅎ
이 아저씨 full name도 아주 어렵다. 독일 사람들 이름이 이렇게 어려웠나..
‘에스커 위버 다익스트라’정도 된다. 에스 ‘커’ 는 그 ‘허’와 ‘커’ 중간쯤 되는 바람새는 발음.
2002년에 돌아가셨으니 당시에 굉장히 큰 뉴스였을텐데 전혀 몰랐다.

이걸 찾아보려한건 아니고 Anscombe발음 찾다가 저기까지 갔다. ‘앤스컴’정도로 읽는다.

2015년 4월 19일 일요일

Tupper's self-referential formula

처음 봤을 때 엄청 신기했던건데 youtube에 간단한 설명도 올라와서 링크

formula설명은 http://en.wikipedia.org/wiki/Tupper%27s_self-referential_formula
동영상은 https://www.youtube.com/watch?v=_s5RFgd59ao

2015년 4월 2일 목요일

Softmax action selection

reinforcement learning에서, $\epsilon$-greedy action의 단점은 가장 높은 Q값 이외에 나머지를 exploration할 때 각 Q값을 고려하지 않는다는 점이다. 그래서 나머지에도 가중치를 부여해서 확률에 따라 선택하게 하는 것이 softmax action selection.
가장 common한 것은 Gibbs distribution.(=Boltzman distribution)
$$
\frac{ e^{Q_t (a) / \tau } }{\sum_{b=1}^n e^{Q_t (b) / \tau }}
$$
$\tau$는 양수이며 온도(temperature)라 불린다. $\tau$가 높으면 나머지가 equi-probable해지고 낮으면 각 선택지의 확률차이가 커진다. $0$에 가까워지면 $\epsilon$-greedy와 동일해진다.
Softmax가 노리는 효과는 다른 방법으로도 달성될 수 있는데 $Q_t (a)$에 임의의 작은 값들을 더하는 것이다. (본인 주: 정확히 같은 효과라기보다 선택지들을 선택할 때 uniform distribution을 따르지 않도록 만든다는 뜻인듯)
$\epsilon$-greedy와 softmax중에 무엇이 더 ‘좋은’ 방법인지는 알려져 있지 않고, task에 따라 다르다. 둘 다 변수를 하나만 조절해야 한다($\tau$ or $\epsilon$)는 점은 같다.

원문은 이 책의 2.3챕터
Original text is in chapter 2.3 of this book.
Pdf file can be easily got from google search.

pursuit method는 softmax보다 더욱 강화된 형태. 예를들어, softmax를 이용해서 probability가 조정된 상태에서 다시 Q$_{max}$에 대해 확률값을 더 키워준다.

2015년 3월 5일 목요일

Proof of the Optimality of Huffman Code

웹에 있는 것들 중에 http://www.utdallas.edu/~sizheng/CS4349.d/l-notes.d/L9.pdf 파일이 가장 쉬운 것 같다.

Lemma 1. 가장 작은 두 글자가 가장 long leaf에 sibling으로 있는 것이 optimal 하다.

직관적으로도 그렇고, 계산도 pdf에 있다.

Lemma 2. optimal한 tree를 통해 (longer or original) optimal한 tree를 얻는다. 다시말해,
x,y를 합해서 z라는 노드를 만들었다면, [z를 포함하면서 optimal한 tree]에서 z만 x,y로 바꿔 [x,y를 포함하는 optimal tree]를 얻는다.

contradiction으로 증명. 이하 [x를 포함하는 optimal tree]를 $T_x$로 줄여 표시함.
$T_z$에서 z만 x,y로 나눈 트리를 $T_{x,y}$라 하고, $T_{x,y}$와 다른 optimal tree(물론 이때 Lemma 1을 만족해야 한다)의 존재 $T_{x,y} '$을 가정한다. $T_{x,y} '$을 통해 $T_z '$을 얻을 수 있는데, $T_z '$의 비용이 $T_z$보다 작음을 계산을 통해(pdf에 나옴) 보일 수 있다. 이는 $T_z$가 optimal하다는 가정에 위배.

위 두 Lemma를 통해 증명 끝.

I think http://www.utdallas.edu/~sizheng/CS4349.d/l-notes.d/L9.pdf $\leftarrow$ this file is most understandable.

Lemma 1. Let the most frequent two letters are, then the two letters must be at the longest leaf as a sibling. That's optimal.
Reference that pdf file, which shows concrete calcuation.

Lemma 2. We can get a longer optimal tree by using a smaller optimal tree. For example, if a node z is got by x, y node, we can get [the optimal tree including x, y] from [the optimal tree including z] by only substituting z by x, y.
Let's say [the optimal tree including x] as $T_x$, and $T_{x,y}$ is the tree including x, y which are split from z. And assume the optimal tree $T_{x,y} '$ that is not equal to $T_{x,y}$. Surely $T_{x,y} '$ must be subject to Lemma 1. Then we can get $T_z '$ from $T_{x,y} '$, and show cost of $T_z '$ is smaller then the cost of $T_z$. See the pdf file for precise calculation. This is contradictory.

From these, proof is done immediately.