Model Evaluation Metrics

Performance analysis of the medical language model

Perplexity

Lower perplexity indicates better language modeling performance

QA F1 Score

Higher F1 score indicates better question-answering performance

Training History

Details of previous model training sessions and their performance

No training sessions have been recorded yet.

Understanding the Metrics

Perplexity

Perplexity measures how well a probability model predicts a sample; it is the exponential of the average negative log-likelihood per token. Lower perplexity indicates the model is better at predicting the sample.

For language models, it can be interpreted as the model's uncertainty when predicting the next word: a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k words at each step. A lower perplexity means the model is more confident in its predictions.
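As a sketch of the definition above, the following computes perplexity from a list of per-token probabilities. The function name and interface are illustrative, not part of this model's codebase:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    the model assigned to each token in the sequence."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to every token has perplexity 4:
# on average, it is as uncertain as a uniform choice among 4 words.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # → ~4.0
```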

F1 Score

F1 score is a measure of a model's accuracy on a specific dataset. It is the harmonic mean of precision and recall, where precision is the fraction of the model's positive predictions that are correct, and recall is the fraction of actual positive cases the model identified.

For question-answering tasks, a higher F1 score indicates better performance in providing accurate and relevant answers.
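For extractive QA, F1 is commonly computed over token overlap between the predicted and reference answers, as in SQuAD-style evaluation. A minimal sketch (the function name and the example answers are illustrative):

```python
from collections import Counter

def qa_f1(prediction, ground_truth):
    """Token-overlap F1: harmonic mean of precision and recall
    over the bags of tokens in the two answer strings."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # predicted tokens that are correct
    recall = overlap / len(gold_tokens)     # gold tokens that were predicted
    return 2 * precision * recall / (precision + recall)

# Prediction has 3 tokens, 2 of which match the 2-token reference:
# precision = 2/3, recall = 1, F1 = 0.8
print(qa_f1("acute myocardial infarction", "myocardial infarction"))  # → ~0.8
```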