
Question
Could you help me to solve this question?
Sign language recognition is the task of recognizing the signs performed by deaf people. Sign gestures can be
isolated (one sign per video) or continuous (several signs per video). In this assignment, you will develop a deep
learning model for sign language recognition at the sentence level. The task is similar to video captioning:
the input is a sign video, and the output is the sign(s) performed by the signer, in text form.
An example of the system's input and output is shown below.
[Figure: example system — the frames of a sign video are fed to an encoder, and a decoder produces the output sentence "In the name of Allah".]
In this assignment, you need to do the following:

1. Read and prepare the dataset.
   - The dataset is provided as images (access link).
   - There are 10 different sentences, each performed by three signers.
   - 80 frames are extracted from each video sample.
   - Use word embeddings to represent the ground-truth text when you feed it to the model.
   - For each frame, use MobileNetV2 to extract features from the last fully connected layer before the
     classification layer of the pretrained MobileNetV2 model. You will use these features as inputs for the
     following questions. (Feature-extraction and tokenization sketches follow this item.)
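Below is a minimal sketch of the per-frame feature extraction, assuming PyTorch and torchvision (the assignment does not prescribe a framework). In torchvision's MobileNetV2 the classifier is the only fully connected layer, so the sketch takes the 1280-dimensional globally pooled features immediately before the classification layer as the penultimate representation; the function name `extract_frame_features` and the preprocessing constants are illustrative, not given in the assignment.

```python
import torch
import torchvision
from torchvision import transforms

# Pretrained MobileNetV2; only its convolutional backbone is used.
weights = torchvision.models.MobileNet_V2_Weights.DEFAULT
mobilenet = torchvision.models.mobilenet_v2(weights=weights).eval()

# Standard ImageNet preprocessing expected by the pretrained weights.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_frame_features(pil_frames):
    """Map one video's frames (a list of 80 PIL images) to an (80, 1280) feature tensor."""
    batch = torch.stack([preprocess(f) for f in pil_frames])   # (80, 3, 224, 224)
    fmap = mobilenet.features(batch)                            # (80, 1280, 7, 7)
    pooled = torch.nn.functional.adaptive_avg_pool2d(fmap, 1)   # (80, 1280, 1, 1)
    return pooled.flatten(1)                                    # (80, 1280)
```

Running this once per video and caching the resulting (80, 1280) tensors keeps the later models cheap to train.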
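For the ground-truth sentences, one straightforward preparation is a word-level vocabulary with padding, start-of-sentence, and end-of-sentence tokens; the helper names `build_vocab` and `encode` and the `max_len` value below are illustrative. The resulting token ids are what the decoders' `nn.Embedding` layers turn into word embeddings.

```python
from collections import Counter

SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]

def build_vocab(sentences):
    """Word-level vocabulary over the ground-truth sentences."""
    counter = Counter(w for s in sentences for w in s.lower().split())
    itos = SPECIALS + sorted(counter)
    stoi = {w: i for i, w in enumerate(itos)}
    return stoi, itos

def encode(sentence, stoi, max_len=16):
    """Sentence -> padded list of token ids framed by <sos>/<eos>."""
    ids = [stoi.get(w, stoi["<unk>"]) for w in sentence.lower().split()]
    ids = [stoi["<sos>"]] + ids + [stoi["<eos>"]]
    return ids + [stoi["<pad>"]] * (max_len - len(ids))
```

Pretrained embeddings (e.g., word2vec) could instead be loaded into the embedding layer; the sketches below simply learn the embeddings from scratch.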
2. Encoder-decoder: Develop an encoder-decoder model to recognize the sign video (video captioning).
   - Select the architecture that gives the best results. (A minimal seq2seq sketch follows this item.)
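A minimal encoder-decoder sketch, assuming the cached (80, 1280) frame features and the token ids from the sketches above. The GRU choice, layer sizes, and the class name `Seq2Sign` are illustrative starting points, not a prescribed architecture.

```python
import torch.nn as nn

class Seq2Sign(nn.Module):
    """GRU encoder over the 80 frame-feature vectors, GRU decoder over word embeddings."""
    def __init__(self, vocab_size, feat_dim=1280, emb_dim=128, hid_dim=256):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hid_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, frame_feats, tgt_ids):
        # frame_feats: (B, 80, 1280); tgt_ids: (B, T) shifted-right target sentence
        _, h = self.encoder(frame_feats)                    # final hidden state summarizes the video
        dec_out, _ = self.decoder(self.embed(tgt_ids), h)   # teacher-forced decoding: (B, T, hid_dim)
        return self.out(dec_out)                            # word logits: (B, T, vocab_size)
```

Training would use teacher forcing: feed the target sentence shifted right and apply `nn.CrossEntropyLoss(ignore_index=pad_id)` to the logits; at inference, decode greedily from `<sos>` until `<eos>`.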
3. Attention: Develop an encoder-decoder model with attention to recognize the sign video (video captioning).
   (An attention sketch follows this item.)
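One way to add attention is Luong-style dot-product attention, where each decoder step attends over all 80 encoder states instead of relying only on the final hidden state; the sketch below applies it in parallel over the teacher-forced decoder outputs. The dimensions and the class name `AttnSeq2Sign` are again illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnSeq2Sign(nn.Module):
    """As Seq2Sign, but every decoder step attends over all 80 encoder states (dot-product attention)."""
    def __init__(self, vocab_size, feat_dim=1280, emb_dim=128, hid_dim=256):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hid_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(2 * hid_dim, vocab_size)

    def forward(self, frame_feats, tgt_ids):
        enc_out, h = self.encoder(frame_feats)                  # (B, 80, hid), (1, B, hid)
        dec_out, _ = self.decoder(self.embed(tgt_ids), h)       # (B, T, hid)
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))    # similarity of each word step to each frame
        weights = F.softmax(scores, dim=-1)                     # (B, T, 80) attention weights
        context = torch.bmm(weights, enc_out)                   # (B, T, hid) attended video summary per step
        return self.out(torch.cat([dec_out, context], dim=-1))  # (B, T, vocab_size)
```

The attention weights can also be visualized as a (words x frames) map to check which frames each predicted word attends to.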
4. Report the results of the models using the word error rate (WER) metric. (A WER sketch follows this item.)
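WER is the word-level Levenshtein distance (substitutions + insertions + deletions) between the reference sentence and the predicted sentence, divided by the number of reference words. A small self-contained implementation is sketched below; libraries such as `jiwer` compute the same metric.

```python
def wer(reference, hypothesis):
    """Word error rate: edit distance over words, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one deleted word out of five reference words -> WER = 0.2
print(wer("in the name of allah", "in name of allah"))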
5. Transformer: Develop a transformer model for the same problem. (A transformer sketch follows this item.)
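For the transformer, PyTorch's built-in `nn.Transformer` can serve as the encoder-decoder backbone: project the 1280-d frame features to the model dimension on the encoder side, embed word ids on the decoder side, and apply a causal mask to the target. The learned positional parameters, layer sizes, and the class name `SignTransformer` are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SignTransformer(nn.Module):
    """Transformer encoder-decoder over frame features (source) and word ids (target)."""
    def __init__(self, vocab_size, feat_dim=1280, d_model=256,
                 nhead=4, num_layers=2, max_len=32):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, d_model)                     # 1280-d frame feature -> d_model
        self.embed = nn.Embedding(vocab_size, d_model)                  # word id -> d_model
        self.pos_src = nn.Parameter(torch.zeros(1, 80, d_model))        # learned positions for the 80 frames
        self.pos_tgt = nn.Parameter(torch.zeros(1, max_len, d_model))   # learned positions for target words
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=4 * d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, frame_feats, tgt_ids):
        # frame_feats: (B, 80, 1280); tgt_ids: (B, T) shifted-right targets
        src = self.in_proj(frame_feats) + self.pos_src[:, :frame_feats.size(1)]
        tgt = self.embed(tgt_ids) + self.pos_tgt[:, :tgt_ids.size(1)]
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(frame_feats.device)                     # no peeking at future words
        dec = self.transformer(src, tgt, tgt_mask=causal)               # (B, T, d_model)
        return self.out(dec)                                            # (B, T, vocab_size)
```

Training mirrors the recurrent models (teacher forcing with cross-entropy over the shifted targets), so all three models can be compared with the same WER evaluation.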