Implementing Multiplicative LSTM (Workshop Track in ICLR 2017) in TensorFlow

Introduction

As the title says, I implemented Multiplicative LSTM, which was presented in the Workshop Track of ICLR 2017.

Paper

Ben Krause, Iain Murray, Steve Renals and Liang Lu, "Multiplicative LSTM for sequence modelling," Workshop Track in ICLR 2017.
URL https://openreview.net/forum?id=SJCS5rXFl&noteId=SJCS5rXFl

Implementation

ICLR 2017 review comments

The following is quoted from the review:

Comment: The paper presents a new way of doing multiplicative / tensored recurrent weights in RNNs. The multiplicative weights are input dependent. Results are presented on language modeling (PTB and Hutter). We found the paper to be clearly written, and the idea well motivated. However, as pointed out by the reviewers, the results were not state of the art. We feel that is that this is because the authors did not make a strong attempt at regularizing the training. Better results on a larger set of tasks would have probably made this paper easier to accept.

Pros:
- interesting idea, and reasonable results
Cons:
- only shown on language modeling tasks
- results were not very strong, when compared to other methods (which typically used strong regularization and training like batch normalization etc).
- reviewers did not find the experiments convincing enough, and felt that a fair comparison would be to compare with dynamic weights on the competing RNNs.

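Compared with other methods, the experimental results are still somewhat too weak to demonstrate the method's effectiveness.
The evaluation covers only language modeling tasks, and the comparison against other methods is also insufficient.

Prior work: Multiplicative RNN (mRNN) *1

An extension in which the weight matrix $W_{hh}$ applied to the hidden state of the previous time step is factorized (in a tensor sense) so that it depends on the input at the current time step. It has been shown to be effective for character-level language modeling. In addition, experiments that used it as a generative model to produce text showed that the mRNN captures high-level linguistic and grammatical structure.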

Before the extension
$\begin{eqnarray} \hat{h}_{t} &=& W_{hh} h_{t-1} + W_{hx}x_{t} \end{eqnarray}$

After the extension
$\begin{eqnarray} m_{t} &=& (W_{mx} x_{t}) \odot (W_{mh} h_{t-1})\\ \hat{h}_{t} &=& W_{hm} m_{t} + W_{hx}x_{t} \end{eqnarray}$
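
The two equations after the extension can be sketched in NumPy as follows. This is only an illustrative sketch, not the TensorFlow implementation: the function name `mrnn_candidate`, the sizes, and the random weights are assumptions made for the example, and $m_{t}$ is assumed to have the same dimension as $h_{t}$.

```python
import numpy as np

def mrnn_candidate(x_t, h_prev, W_mx, W_mh, W_hm, W_hx):
    """Compute m_t and h_hat_t for one time step (hypothetical helper)."""
    # m_t = (W_mx x_t) * (W_mh h_{t-1}), elementwise product
    m_t = (W_mx @ x_t) * (W_mh @ h_prev)
    # h_hat_t = W_hm m_t + W_hx x_t
    h_hat = W_hm @ m_t + W_hx @ x_t
    return m_t, h_hat

rng = np.random.default_rng(0)
input_dim, hidden_dim = 5, 4  # hypothetical sizes
x_t = rng.standard_normal(input_dim)
h_prev = rng.standard_normal(hidden_dim)
W_mx = rng.standard_normal((hidden_dim, input_dim))
W_mh = rng.standard_normal((hidden_dim, hidden_dim))
W_hm = rng.standard_normal((hidden_dim, hidden_dim))
W_hx = rng.standard_normal((hidden_dim, input_dim))

m_t, h_hat = mrnn_candidate(x_t, h_prev, W_mx, W_mh, W_hm, W_hx)
print(m_t.shape, h_hat.shape)
```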

Why multiplicative?

Taking a product expresses the co-occurrence (conjunction) of the context and the character, somewhat like a joint probability.
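
One way to see this: substituting $m_{t}$ into $\hat{h}_{t}$ gives $W_{hm}\,\mathrm{diag}(W_{mx} x_{t})\,W_{mh}\,h_{t-1}$ for the recurrent term, so the matrix applied to $h_{t-1}$ effectively changes with each input character. The snippet below checks this identity numerically. The helper `effective_W_hh`, the sizes, and the random weights are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # hypothetical sizes
W_mx = rng.standard_normal((hidden_dim, input_dim))
W_mh = rng.standard_normal((hidden_dim, hidden_dim))
W_hm = rng.standard_normal((hidden_dim, hidden_dim))
h_prev = rng.standard_normal(hidden_dim)

x_a = np.eye(input_dim)[0]  # one-hot vector for a first character
x_b = np.eye(input_dim)[1]  # one-hot vector for a second character

def effective_W_hh(x_t):
    """Input-dependent recurrent matrix W_hm diag(W_mx x_t) W_mh."""
    return W_hm @ np.diag(W_mx @ x_t) @ W_mh

# Recurrent term written with the elementwise product ...
product_form = W_hm @ ((W_mx @ x_a) * (W_mh @ h_prev))
# ... and the same term written with an input-dependent matrix.
matrix_form = effective_W_hh(x_a) @ h_prev

same_result = np.allclose(product_form, matrix_form)
same_matrix = np.allclose(effective_W_hh(x_a), effective_W_hh(x_b))
print(same_result, same_matrix)
```

The two forms agree, while different characters yield different effective matrices (with random weights as used here).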

LSTM

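In the paper, the authors use the following LSTM formulation. Note that it differs slightly from the commonly used one. For example, $h_{t}$ is computed by applying the output gate before the activation function, i.e. $\tanh(c_{t} \odot o_{t})$ rather than the more common $o_{t} \odot \tanh(c_{t})$, but I do not think the paper gives much qualitative explanation of why this is fine.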

$\begin{eqnarray} \hat{h}_{t} &=& W_{hh} h_{t-1} + W_{hx} x_{t} \\ i_{t} &=& \sigma (W_{ix} x_{t} + W_{ih} h_{t-1}) \\ o_{t} &=& \sigma (W_{ox} x_{t} + W_{oh} h_{t-1})\\ f_{t} &=& \sigma (W_{fx} x_{t} + W_{fh} h_{t-1}) \\ c_{t} &=& f_{t} \odot c_{t-1} + i_{t} \odot \hat{h}_{t} \\ h_{t} &=& \tanh(c_{t} \odot o_{t}) \end{eqnarray}$
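
A minimal NumPy sketch of one step of this LSTM variant is given below. It assumes that the candidate $\hat{h}_{t}$ is computed from $h_{t-1}$ through $W_{hh}$, as in the plain RNN formula earlier, and it omits biases because the equations have none. The dictionary keys, sizes, and random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W):
    """One step of the LSTM variant; W maps hypothetical names to matrices."""
    h_hat = W["hh"] @ h_prev + W["hx"] @ x_t          # candidate (no tanh here)
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev)   # input gate
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev)   # output gate
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev)   # forget gate
    c_t = f_t * c_prev + i_t * h_hat
    h_t = np.tanh(c_t * o_t)                          # gate applied inside tanh
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # hypothetical sizes
W = {k: 0.1 * rng.standard_normal((hidden_dim, input_dim))
     for k in ("hx", "ix", "ox", "fx")}
W.update({k: 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
          for k in ("hh", "ih", "oh", "fh")})

x_t = np.eye(input_dim)[1]    # one-hot input character
h_t, c_t = lstm_step(x_t, np.zeros(hidden_dim), np.zeros(hidden_dim), W)
print(h_t, c_t)
```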

Proposed method

A model that introduces the mechanism of the multiplicative RNN, proposed in the prior work, into the LSTM above.

$\begin{eqnarray} m_{t} &=& (W_{mx} x_{t}) \odot (W_{mh} h_{t-1}) \\ \hat{h}_{t} &=& W_{hm} m_{t} + W_{hx}x_{t} \\ i_{t} &=& \sigma (W_{ix} x_{t} + W_{im} m_{t}) \\ o_{t} &=& \sigma (W_{ox} x_{t} + W_{om} m_{t})\\ f_{t} &=& \sigma (W_{fx} x_{t} + W_{fm} m_{t}) \end{eqnarray}$
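
The equations above can be sketched as a single NumPy step function, run here over a short sequence of one-hot characters. This is an illustrative sketch, not the TensorFlow code. It assumes that the updates of $c_{t}$ and $h_{t}$ are carried over unchanged from the LSTM section and that $m_{t}$ has the same size as $h_{t}$. The names `init_weights` and `mlstm_step`, the sizes, and the character ids are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_weights(input_dim, hidden_dim, rng):
    """Small random weights; m_t is assumed to have the same size as h_t."""
    W = {}
    for name in ("mx", "hx", "ix", "ox", "fx"):   # matrices applied to x_t
        W[name] = 0.1 * rng.standard_normal((hidden_dim, input_dim))
    for name in ("mh", "hm", "im", "om", "fm"):   # applied to h_{t-1} or m_t
        W[name] = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
    return W

def mlstm_step(x_t, h_prev, c_prev, W):
    m_t = (W["mx"] @ x_t) * (W["mh"] @ h_prev)
    h_hat = W["hm"] @ m_t + W["hx"] @ x_t
    i_t = sigmoid(W["ix"] @ x_t + W["im"] @ m_t)
    o_t = sigmoid(W["ox"] @ x_t + W["om"] @ m_t)
    f_t = sigmoid(W["fx"] @ x_t + W["fm"] @ m_t)
    # Assumption: cell and hidden updates are the same as in the LSTM section.
    c_t = f_t * c_prev + i_t * h_hat
    h_t = np.tanh(c_t * o_t)
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # hypothetical sizes
W = init_weights(input_dim, hidden_dim, rng)
h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for char_id in [0, 2, 1, 2]:  # a toy character sequence
    x = np.eye(input_dim)[char_id]
    h, c = mlstm_step(x, h, c, W)
print(h)
```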

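The number of parameters increases by the sizes of $W_{mx}$ and $W_{mh}$. As the review above points out, performance does not improve dramatically, so this may be an architecture whose benefit depends on the task.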
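
As a rough illustration of the overhead, the weights can be counted directly from the equations. The sketch below assumes no bias terms, assumes that $m_{t}$ has the same dimension as $h_{t}$, and uses made-up layer sizes; the function names are hypothetical.

```python
def lstm_weight_count(input_dim, hidden_dim):
    # Four blocks (h_hat, i, o, f), each with one input matrix
    # and one recurrent matrix.
    return 4 * (hidden_dim * input_dim + hidden_dim * hidden_dim)

def mlstm_weight_count(input_dim, hidden_dim):
    # The same four blocks (recurrent matrices now act on m_t),
    # plus W_mx and W_mh.
    extra = hidden_dim * input_dim + hidden_dim * hidden_dim
    return lstm_weight_count(input_dim, hidden_dim) + extra

n_lstm = lstm_weight_count(256, 1024)    # hypothetical sizes
n_mlstm = mlstm_weight_count(256, 1024)
print(n_lstm, n_mlstm, n_mlstm / n_lstm)  # 5242880 6553600 1.25
```

Under these assumptions the two extra matrices add a quarter of the LSTM's weight count, regardless of the sizes chosen.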

References

I. Sutskever, J. Martens, and G. E. Hinton. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 1017–1024, 2011.
- Presentation video by the authors: Generating Text with Recurrent Neural Networks - TechTalks.tv

*1: Reference 1