
Score Prediction from User Logs with BERT

This project aims to predict user scores based on sequences of their interaction logs with a quiz system, without knowledge of the correct answers or the choices made by the user. The primary challenge is to infer the score from patterns in user behavior during the quiz.

Problem statement:

  • Actions: Distinct types of interactions like changing an answer, requesting a hint, visiting educational resources (EDA), and interacting with a chatbot.
  • States: Representations of the user’s quiz attempt at each point in the action log, including time spent, number of hints requested, and other engagement metrics.
  • Labels: Final scores of students, which are the target predictions of the model.

Implementation steps:

  • 1- Data Collection and Preprocessing: Gather and preprocess data into a format compatible with the Transformer model.
  • 2- Feature Engineering: Develop comprehensive features that encapsulate diverse aspects of user interactions.
  • 3- Model Training: Train the model on the prepared dataset, adjusting parameters as necessary.
  • 4- Model Evaluation: Validate the model’s performance on a separate test set to ensure its effectiveness.

Data Collection and Preprocessing

The dataset consists of user interaction logs from a quiz system (DaTu). Each log entry records the type of action performed by the user and the corresponding time interval. The actions include various interactions such as changing an answer, requesting hints, visiting educational resources, and interacting with a chatbot. The raw data is preprocessed to convert actions and time intervals into a sequence of tokens. Each action is assigned a unique token (e.g., ‘UA’ for changing an answer, ‘FH’ for requesting the first hint), and time intervals are binned into categories (e.g., ‘0’ for 0-1 seconds, ‘1’ for 1-5 seconds). Additionally, problem IDs (‘Q1’, ‘Q2’, ‘Q3’) are incorporated to indicate the specific problem being attempted.

Token assigned to each action:

| Action | Token | Action | Token |
| --- | --- | --- | --- |
| Change answer | ‘UA’ | First answer | ‘FA’ |
| Paste answer | ‘PA’ | Update answer explanation | ‘UE’ |
| Request first hint | ‘FH’ | Request another hint | ‘UH’ |
| Respond to hint feedback | ‘RH’ | New answer explanation | ‘FE’ |
| Freeform code run | ‘RF’ | Run code | ‘RC’ |
| User request | ‘B’ | Update confidence | ‘C’ |
| Complete sub-module | ‘M’ | Streamlit interaction | ‘S’ |
| Problem | ‘Q1’, ‘Q2’, ‘Q3’ | Time | ‘T’ |

Each time interval is binned as follows:

| Interval (s) | Token | Interval (s) | Token |
| --- | --- | --- | --- |
| 0–1 | ‘0’ | 1–5 | ‘1’ |
| 5–10 | ‘2’ | 10–15 | ‘3’ |
| 15–20 | ‘4’ | 20–30 | ‘5’ |
| 30–60 | ‘6’ | 60–120 | ‘7’ |
| 120–300 | ‘8’ | >300 | ‘MAX’ |
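
To make the encoding concrete, here is a minimal sketch of the preprocessing step under the mappings above. The log-entry format, the function names, and the ‘T’ + bin spelling of time tokens (e.g. ‘T3’) are assumptions for illustration:

```python
# Upper bounds (seconds) for each time bin, per the table above.
TIME_BINS = [(1, "0"), (5, "1"), (10, "2"), (15, "3"), (20, "4"),
             (30, "5"), (60, "6"), (120, "7"), (300, "8")]

def time_token(seconds):
    """Bin a raw interval in seconds into its time token."""
    for upper, tok in TIME_BINS:
        if seconds < upper:
            return f"T{tok}"
    return "TMAX"  # intervals longer than 300 s

def encode_log(entries):
    """Turn [(problem_id, action_token, seconds), ...] into a token sequence."""
    sequence, current_problem = [], None
    for problem, action, seconds in entries:
        if problem != current_problem:  # mark a switch to a new problem
            sequence.append(problem)
            current_problem = problem
        sequence.append(action)
        sequence.append(time_token(seconds))
    return sequence

# First answer on Q1 after 12 s, then an answer change after 3 s:
print(encode_log([("Q1", "FA", 12), ("Q1", "UA", 3)]))
# -> ['Q1', 'FA', 'T3', 'UA', 'T1']
```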

Model Training

The core model architecture used is a DistilBERT-based Transformer model, enhanced with Low-Rank Adaptation (LoRA) for efficient fine-tuning. The model incorporates custom embeddings for the action-time tokens. The training loop includes standard components such as loss calculation, gradient descent optimization, and learning rate scheduling. Curriculum learning is employed to gradually introduce more complex sequences, starting with simpler ones, to enhance model robustness.
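
The post does not include the model code; the sketch below shows one plausible wiring with the Hugging Face transformers and peft libraries. The regression head, the vocabulary size, and the LoRA settings (r=8, alpha=16) are illustrative assumptions; q_lin and v_lin are DistilBERT’s attention projection layers:

```python
import torch.nn as nn
from transformers import DistilBertModel
from peft import LoraConfig, get_peft_model

class ScoreRegressor(nn.Module):
    """DistilBERT encoder with LoRA adapters and a scalar regression head."""

    def __init__(self, vocab_size):
        super().__init__()
        base = DistilBertModel.from_pretrained("distilbert-base-uncased")
        base.resize_token_embeddings(vocab_size)  # custom action-time vocabulary
        lora_cfg = LoraConfig(
            r=8, lora_alpha=16, lora_dropout=0.1,  # illustrative LoRA settings
            target_modules=["q_lin", "v_lin"],     # DistilBERT attention projections
        )
        self.encoder = get_peft_model(base, lora_cfg)
        self.head = nn.Linear(base.config.dim, 1)  # predicted score

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        return self.head(hidden[:, 0])  # regress from the first token's state
```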

Curriculum Learning Implementation

Curriculum learning is applied by sorting the training data based on sequence length and complexity. Initially, the model is trained on shorter, simpler sequences. As training progresses, more complex sequences are gradually introduced. This approach helps the model to build a strong foundation before tackling more difficult examples.
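
The staging schedule itself is not spelled out; a minimal sketch, assuming sequence length as the complexity proxy and three stages:

```python
def curriculum_stages(examples, num_stages=3):
    """Yield progressively larger training subsets, easiest first."""
    # Sort by sequence length, the stand-in for complexity.
    ordered = sorted(examples, key=lambda ex: len(ex["tokens"]))
    for stage in range(1, num_stages + 1):
        # Each stage trains on a growing prefix of the sorted data,
        # so longer, harder sequences enter gradually.
        yield ordered[: int(len(ordered) * stage / num_stages)]
```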

Data Augmentation Algorithm

Input:

  • sequences: A list of sequences where each sequence is a list of tokens.
  • max_changes: The maximum number of changes allowed per sequence.
  • action_prob: The probability of performing an action (repeat, skip, insert).
  • time_prob: The probability of varying time intervals.

Output:

  • all_sequences: A list of original and augmented sequences.

Algorithm:

  1. Initialize all_sequences as an empty list.
  2. For each sequence in sequences:
    1. Append the original sequence to all_sequences.
    2. Initialize augmented_sequence as an empty list, number_of_changes to 0, and index i to 0.
    3. While i is less than the length of the sequence:
      1. If number_of_changes exceeds max_changes:
        • Append augmented_sequence followed by the remaining part of the original sequence to all_sequences.
        • Reset augmented_sequence to the first i elements of the original sequence and reset number_of_changes to 0.
      2. Set token to the i-th element of the sequence.
      3. If token starts with ‘Q’ (a problem marker): append token to augmented_sequence, increment i by 1, and continue with the next iteration.
      4. Else if token starts with ‘T’ (a time token):
        • If a random number is less than time_prob, vary the time interval: extract the time value from token, draw a new value from a normal distribution centred on it, clamp it to the bounds [0, 8], append the resulting time token to augmented_sequence, and increment number_of_changes by 1.
        • Otherwise, append token to augmented_sequence unchanged.
      5. Else (an action token): generate a random number p and increment number_of_changes by 1.
        • If p < action_prob: repeat the action by appending the action, a random short time interval (T0, T1, or T2), and the action again to augmented_sequence.
        • Else if p < 2 × action_prob: skip the action and its time token by incrementing i by 2, then continue with the next iteration.
        • Else if p < 2.5 × action_prob: insert a random action and time interval before the current action in augmented_sequence.
        • Else: append the action unchanged and decrement number_of_changes by 1.
      6. Increment i by 1.
    4. Append augmented_sequence to all_sequences.
  3. Return all_sequences.
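
A direct Python rendering of this routine, as a sketch: the pool of insertable actions is a hypothetical choice, ‘TMAX’ is treated as bin 8 when jittering, and skipping advances past both the action and its time token:

```python
import random

def augment(sequences, max_changes, action_prob, time_prob,
            action_pool=("UA", "UH", "RC")):
    """Sketch of the routine above; action_pool is a hypothetical set
    of action tokens eligible for random insertion."""
    all_sequences = []
    for seq in sequences:
        all_sequences.append(list(seq))              # keep the original
        aug, changes, i = [], 0, 0
        while i < len(seq):
            if changes > max_changes:
                # Emit the variant built so far plus the untouched tail,
                # then restart from the original prefix.
                all_sequences.append(aug + list(seq[i:]))
                aug, changes = list(seq[:i]), 0
            token = seq[i]
            if token.startswith("Q"):                # problem marker: copy as-is
                aug.append(token)
                i += 1
                continue
            if token.startswith("T"):                # time token: maybe jitter it
                if random.random() < time_prob:
                    old = int(token[1:]) if token[1:].isdigit() else 8  # 'TMAX' ~ 8
                    new = min(max(round(random.gauss(old, 1)), 0), 8)
                    aug.append(f"T{new}")
                    changes += 1
                else:
                    aug.append(token)
                i += 1
                continue
            p = random.random()                      # action token
            changes += 1
            if p < action_prob:                      # repeat with a short pause
                aug += [token, f"T{random.randint(0, 2)}", token]
            elif p < 2 * action_prob:                # skip the action + its time
                i += 2
                continue
            elif p < 2.5 * action_prob:              # insert a random action first
                aug += [random.choice(action_pool), f"T{random.randint(0, 2)}", token]
            else:                                    # keep unchanged
                aug.append(token)
                changes -= 1
            i += 1
        all_sequences.append(aug)
    return all_sequences
```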

Hyperparameter Comparison Table

| Hyperparameter | BERT | DistilBERT | DistilBERT + Dropout | DistilBERT + Curriculum Learning | DistilBERT + LoRA | DistilBERT + Data Augmentation |
| --- | --- | --- | --- | --- | --- | --- |
| Model Name | bert-base-uncased | distilbert-base-uncased | distilbert-base-uncased | distilbert-base-uncased | distilbert-base-uncased | distilbert-base-uncased |
| Learning Rate | 8e-5 | 8e-5 | 8e-5 | 8e-5 | 8e-5 | 8e-5 |
| Batch Size | 8 | 8 | 8 | 8 | 8 | 8 |
| Max Sequence Length | 512 | 512 | 512 | 512 | 512 | 512 |
| Epochs | 50 | 50 | 50 | 50 | 50 | 50 |
| Warmup Steps | 4 | 4 | 4 | 4 | 4 | 4 |
| Gradient Clipping | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Dropout Rate | 0.1 (default) | 0.1 (default) | 0.3 (custom) | 0.1 (default) | 0.1 (default) | 0.1 (default) |
| Curriculum Learning | No | No | No | Yes | No | No |
| Low-Rank Adaptation | No | No | No | No | Yes | No |
| Data Augmentation | No | No | No | No | No | Yes |
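
Under these settings, the shared training loop looks roughly like the following; model and train_loader are assumed from the earlier setup, and the MSE loss follows from the regression framing:

```python
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Hyperparameters from the table above.
EPOCHS, LR, CLIP, WARMUP = 50, 8e-5, 1.0, 4
optimizer = AdamW(model.parameters(), lr=LR)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP,
    num_training_steps=EPOCHS * len(train_loader))
loss_fn = torch.nn.MSELoss()

for epoch in range(EPOCHS):
    for batch in train_loader:
        optimizer.zero_grad()
        preds = model(batch["input_ids"], batch["attention_mask"]).squeeze(-1)
        loss = loss_fn(preds, batch["score"].float())
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP)  # clip at 1.0
        optimizer.step()
        scheduler.step()
```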

Results

| Model | MSE | MAE | R² | Max error |
| --- | --- | --- | --- | --- |
| BERT | 0.056 | 0.178 | 0.322 | 0.567 |
| DistilBERT | 0.055 | 0.183 | 0.335 | 0.543 |
| DistilBERT + dropout | 0.057 | 0.181 | 0.341 | 0.552 |
| DistilBERT + curriculum | 0.052 | 0.170 | 0.062 | 0.521 |
| LoRA + BERT + curriculum | 0.049 | 0.179 | 0.113 | 0.517 |
| LoRA + DistilBERT + curriculum | 0.054 | 0.172 | 0.030 | 0.540 |
| DistilBERT + data augmentation | 0.048 | 0.178 | 0.125 | 0.467 |

Conclusion

The proposed approach effectively leverages Transformer models and advanced techniques like curriculum learning and LoRA to predict user scores from interaction logs. The comprehensive feature engineering and targeted fine-tuning strategies result in a robust model capable of providing accurate predictions, demonstrating the potential for enhancing educational tools with advanced machine learning methodologies.

