Problem description:

The task here is video classification.

One sample is a single frame.

A group of samples is a sequence of consecutive frames from one video.

I have a neural network that achieves 65% accuracy on a single sample (frame).

Suppose I use the output of the second-to-last layer (a 4096-unit dense layer) as a feature extractor and train an LSTM on groups of these features.

Would the prediction for a group of samples always be better than the single-sample approach?
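For comparison, the simplest way to predict on a group of samples without any recurrent model is to average the single-frame network's softmax outputs over the clip and take the argmax. The sketch below assumes the per-frame probabilities are already available as a `(num_frames, num_classes)` array; the function name `clip_prediction_by_averaging` and the example numbers are hypothetical. Because averaging ignores frame order, it is a useful reference point for judging whether a sequence model gains anything from temporal structure.

```python
import numpy as np

def clip_prediction_by_averaging(frame_probs):
    """Order-agnostic baseline: average per-frame class probabilities.

    frame_probs: array-like of shape (num_frames, num_classes); each row is
    the softmax output of the single-frame CNN for one frame.
    Returns the predicted class index for the whole clip.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    return int(frame_probs.mean(axis=0).argmax())

# Hypothetical 4-frame clip whose true class is 0; the second frame alone
# would be misclassified as class 1.
frame_probs = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
])
print(frame_probs.argmax(axis=1))                 # per-frame predictions: [0 1 0 0]
print(clip_prediction_by_averaging(frame_probs))  # clip prediction: 0
```

Averaging can correct an isolated frame error like this one, but it offers no guarantee: if most frames of a clip are misclassified in the same way, the clip-level prediction is wrong too.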

Source: Paper

In this paper, they use a CNN as a feature extractor and then use its features to train an LSTM.
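A minimal NumPy sketch of this CNN+LSTM data flow, assuming each frame has already been passed through the CNN and its 4096-d second-to-last dense layer output collected into a `(num_frames, 4096)` array. The class name `LSTMClassifier`, the hidden size of 256 and the 10 classes are hypothetical choices, and the weights are random and untrained: the code only shows how per-frame features are consumed in temporal order to produce a single clip-level prediction. In practice you would use and train a framework layer such as PyTorch's `torch.nn.LSTM` or Keras's `LSTM`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMClassifier:
    """Hypothetical single-layer LSTM over per-frame CNN features.

    Weights are randomly initialised and untrained (illustration only).
    """

    def __init__(self, feat_dim=4096, hidden_dim=256, num_classes=10, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(feat_dim + hidden_dim)
        # One stacked matrix for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, scale, size=(4 * hidden_dim, feat_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.W_out = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=(num_classes, hidden_dim))
        self.b_out = np.zeros(num_classes)
        self.hidden_dim = hidden_dim

    def forward(self, features):
        """features: array of shape (num_frames, feat_dim), one row per frame."""
        H = self.hidden_dim
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state
        for x_t in features:  # frames in temporal order
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i = sigmoid(z[0:H])          # input gate
            f = sigmoid(z[H:2 * H])      # forget gate
            g = np.tanh(z[2 * H:3 * H])  # candidate cell state
            o = sigmoid(z[3 * H:4 * H])  # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        # Classify the whole clip from the final hidden state.
        logits = self.W_out @ h + self.b_out
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()

# Random stand-ins for 16 frames of 4096-d features from one clip.
clip = np.random.default_rng(1).normal(size=(16, 4096))
probs = LSTMClassifier().forward(clip)
print(probs.shape, probs.argmax())
```

Reading out only the last hidden state is one common design choice; averaging the per-step outputs is another.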

The LSTM on top of the CNN consistently appears to improve the results.

The only reasons I can think of for the CNN+LSTM not performing better are:

1) The extracted features do not contain time-dependent information that the LSTM can exploit.

2)
