Building a CNN model for labeling video data
Assuming you have already downloaded and extracted the Kinetics dataset from GitHub, let's proceed further:
# Define the path to the Kinetics Human action dataset
# Specify the directories
dance = "<your_path>/datasets/Ch9/Kinetics/dance"
brush = "<your_path>/datasets/Ch9/Kinetics/brushing"
new_video_data = "<your_path>/datasets/Ch9/Kinetics/test"
# Load video data and get the maximum number of frames
dance_video, _ = load_videos_from_directory(dance)
brushing_video, _ = load_videos_from_directory(brush)
test_video, _ = load_videos_from_directory(new_video_data)
# Calculate the overall maximum number of frames
max_frames = max(dance_video.shape[1], brushing_video.shape[1])
# Pad shorter clips with zero frames up to max_frames so both
# classes share the same shape and can be concatenated
dance_video = np.pad(dance_video,
    ((0, 0), (0, max_frames - dance_video.shape[1]), (0, 0), (0, 0), (0, 0)))
brushing_video = np.pad(brushing_video,
    ((0, 0), (0, max_frames - brushing_video.shape[1]), (0, 0), (0, 0), (0, 0)))
# Combine data from both classes
video_data = np.concatenate([dance_video, brushing_video])
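The load_videos_from_directory helper is assumed to have been defined earlier in the chapter. If you don't have it at hand, here is a minimal sketch of what it might look like, assuming OpenCV (cv2) for decoding and 64×64 RGB frames to match the model's input_shape below; the exact implementation in your codebase may differ:
import os
import cv2  # OpenCV (pip install opencv-python) -- an assumed dependency
import numpy as np

def load_videos_from_directory(directory, frame_size=(64, 64)):
    """Load all videos in a directory into one 5D array.

    Returns (videos, filenames), where videos has shape
    (num_videos, max_frames, height, width, 3); shorter clips
    are zero-padded to the longest clip in the directory.
    """
    clips, names = [], []
    for filename in sorted(os.listdir(directory)):
        capture = cv2.VideoCapture(os.path.join(directory, filename))
        frames = []
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            frames.append(cv2.resize(frame, frame_size))
        capture.release()
        if frames:
            clips.append(np.array(frames, dtype=np.float32) / 255.0)
            names.append(filename)
    max_len = max(clip.shape[0] for clip in clips)
    videos = np.zeros((len(clips), max_len, frame_size[1], frame_size[0], 3),
                      dtype=np.float32)
    for i, clip in enumerate(clips):
        videos[i, :clip.shape[0]] = clip
    return videos, names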
IV. One-hot encoding: Create labels and perform one-hot encoding:
labels = np.array([0] * len(dance_video) + [1] * \
    len(brushing_video))
# Convert the integer labels to one-hot vectors
labels_one_hot = keras.utils.to_categorical(labels, num_classes=2)
# Check the size of the dataset
print("Total samples:", len(video_data))
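As a quick sanity check, each integer label becomes a two-element vector with a 1 at the index of its class:
# Label 0 (dance)    -> [1., 0.]
# Label 1 (brushing) -> [0., 1.]
print(labels_one_hot[0], labels_one_hot[-1])  # [1. 0.] [0. 1.]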
V. Split the video frames into training and test sets: The training set will be used to train the model, while the test set will be used to evaluate the model’s performance:
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(video_data, \
labels_one_hot, test_size=0.2, random_state=42)
In machine learning, the random_state parameter ensures reproducibility. When you set a specific random_state value (in this case, 42), the data splitting process becomes deterministic: every run of the code produces exactly the same train–test split. This is particularly important for experimentation, sharing code, and comparing results between different models or algorithms, because anyone who runs the code will obtain the same training and test sets, making results directly comparable.
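A quick way to convince yourself of this, sketched here with a hypothetical toy array:
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10)
# Two calls with the same random_state yield identical splits
a_train, a_test = train_test_split(data, test_size=0.2, random_state=42)
b_train, b_test = train_test_split(data, test_size=0.2, random_state=42)
print(np.array_equal(a_train, b_train))  # True
print(np.array_equal(a_test, b_test))    # True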
VI. Define the CNN model: Now, we'll define the architecture of the CNN model using the Keras API. The architecture can vary, depending on the specific requirements of your task. Here's a basic example:
model = keras.Sequential(
    [
        layers.Conv3D(32, kernel_size=(3, 3, 3), activation="relu",
                      input_shape=(max_frames, 64, 64, 3)),
        layers.MaxPooling3D(pool_size=(2, 2, 2)),
        layers.Conv3D(64, kernel_size=(3, 3, 3), activation="relu"),
        layers.MaxPooling3D(pool_size=(2, 2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # Two output nodes for binary classification with softmax activation
        layers.Dense(2, activation="softmax"),
    ]
)
In this example, we define a simple CNN architecture with two pairs of convolutional and max-pooling layers, followed by a flattening layer and a dense layer with softmax activation for classification. Adjust the number of filters, kernel sizes, and other parameters based on your specific task requirements.
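To verify that the layer shapes line up (for example, max_frames must be large enough to survive two 3×3×3 convolutions and two rounds of 2×2×2 pooling), you can print a summary of the architecture:
model.summary()  # Shows each layer's output shape and parameter count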
VII. Compile the model: Before training the model, we need to compile it by specifying the loss function, optimizer, and metrics to evaluate during training:
model.compile(loss="categorical_crossentropy", optimizer="adam", \
    metrics=["accuracy"])
In this example, we’re using categorical cross-entropy as the loss function, the Adam optimizer, and accuracy as the evaluation metric. Adjust these settings based on your specific problem.
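If you need finer control, the string shortcuts can be replaced with explicit objects; for example, an Adam optimizer with a custom learning rate (the 1e-4 value here is only an illustrative choice, not a recommendation from this chapter):
model.compile(
    loss=keras.losses.CategoricalCrossentropy(),
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),  # illustrative rate
    metrics=["accuracy"],
)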
VIII. Train the model: Now, let's proceed to train the CNN model using the preprocessed video frames. The fit method is used for this purpose:
model.fit(X_train, y_train, epochs=10, batch_size=32, \
validation_data=(X_test, y_test))
In this code snippet, X_train and y_train represent the training data (the preprocessed video frames and their corresponding labels). The batch_size parameter determines the number of samples processed in each training iteration, and epochs specifies the number of complete passes through the training dataset. Additionally, validation_data is provided to evaluate the model on the test dataset during training.
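The same fit call also returns a History object; capturing it lets you inspect how the metrics evolved across epochs:
history = model.fit(X_train, y_train, epochs=10, batch_size=32,
                    validation_data=(X_test, y_test))
# Metrics recorded at the end of each epoch
print(history.history["accuracy"])      # training accuracy per epoch
print(history.history["val_accuracy"])  # validation accuracy per epoch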
IX. Evaluate the model: After training the model, we need to evaluate its performance on the test set to assess its accuracy and generalization capability:
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print("Test Loss:", test_loss)
print("Test Accuracy:", test_accuracy)
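Finally, the unlabeled clips loaded earlier from the test directory (test_video) can be labeled with model.predict. A sketch, assuming the clips were preprocessed the same way as the training data; they are first truncated or zero-padded to max_frames, and the class names follow the label order used above (0 = dance, 1 = brushing):
# Truncate or zero-pad the new clips to the training frame count
num_frames = test_video.shape[1]
if num_frames >= max_frames:
    test_video = test_video[:, :max_frames]
else:
    test_video = np.pad(test_video,
        ((0, 0), (0, max_frames - num_frames), (0, 0), (0, 0), (0, 0)))

# Predict class probabilities and map them back to label names
probabilities = model.predict(test_video)
class_names = ["dance", "brushing"]  # order matches labels 0 and 1 above
for i, p in enumerate(probabilities):
    print(f"Video {i}: {class_names[np.argmax(p)]} ({p.max():.2f})")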