- Web location for the ICNALE Corpus
- Zip file for deep learning demo using ICNALE data
- Python Packages to Install
- Fine-tuning BERT on CEFR scores with ICNALE data
Step 1: Import libraries
Step 2: Setup BERT
Step 3: Load the ICNALE Corpus data
Step 4: Divide the data up into training, validation, and test sets
Step 5: Preprocessing Function
Step 6: Set up the metrics we use for model training and evaluation
Step 7: Display the distribution of CEFR scores in the training, validation, and test sets
Step 8: Set up the training scheme
Step 9: Run the Trainer
Step 10: Evaluate the final model on the held-out test dataset
Step 11: Examine the confusion matrix for this model
Step 12: Reset model to run a regression
Step 13: Redefine the preprocess function
Reset the data using the new preprocess function
Define metrics for regression
The metrics we use for regression differ from the metrics we use for classification.
• The error is the predicted value minus the actual value. MSE (mean squared error) squares each error and takes the average, which penalizes larger errors more heavily. MAE (mean absolute error) takes the absolute value of each error and averages those, so large errors are not penalized as severely. R2 (R squared) is the proportion of variance in the actual scores that the predictions account for; in a simple linear fit it equals the squared correlation between predicted and actual values, and it falls as the relation between actual and predicted scores weakens.
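The three regression metrics described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's own code: the function name `regression_metrics` and the sample scores are hypothetical, and R2 is computed here as one minus the ratio of the residual sum of squares to the total sum of squares, which is an assumption about the definition the notebook uses.

```python
def regression_metrics(predicted, actual):
    """Return MSE, MAE, and R2 for two equal-length sequences of scores.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and equal in length")

    n = len(actual)
    # Error = predicted value minus actual value.
    errors = [p - a for p, a in zip(predicted, actual)]

    # MSE: average of the squared errors (large errors dominate).
    mse = sum(e * e for e in errors) / n
    # MAE: average of the absolute errors (large errors count linearly).
    mae = sum(abs(e) for e in errors) / n

    # R2 = 1 - SS_res / SS_tot (assumed definition, see lead-in).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "mae": mae, "r2": r2}


# Made-up scores: three exact predictions and one that is off by 2.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
metrics = regression_metrics(predicted, actual)
print(metrics)
```

With these made-up scores the single error of 2 contributes 4 to the squared-error sum but only 2 to the absolute-error sum, so MSE comes out at 1.0 while MAE is 0.5. That gap is the heavier penalty on large errors mentioned above.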