The development of perceptual skills is an important aspect of veterinary education. The authors investigated veterinary student competency in lameness evaluation at two stages, before (third year) and during (fourth/fifth year) clinical rotations. Students evaluated horses in videos, where horses were presented during trot on a straight line and in circles. Eye-tracking data were recorded during assessment on the straight line to follow student gaze. On completing the task, students filled in a structured questionnaire. Results showed that the experienced students outperformed inexperienced students, although even experienced students may classify one in four horses incorrectly. Mistakes largely arose from classifying an incorrect limb as lame. The correct detection of sound horses was at chance level. While the experienced student cohort primarily looked at upper body movement (head and sacrum) during lameness assessment, the inexperienced cohort focused on limb movement. Student self-assessment of performance was realistic, and task difficulty was most commonly rated between 3 and 4 out of 5. The inexperienced students named a considerably greater number of visual lameness features than the experienced students. Future dedicated training based on the findings presented here may help students to develop more reliable lameness assessment skills.
- equine lameness
- gait scoring
- eye tracking
- lameness examination
- observer agreement
An important aspect of veterinary education is the development of perceptual skills. In equine veterinary education, the ability to correctly identify, differentiate and classify lameness is one such skill. However, even experienced veterinarians struggle with this task, especially for mild lameness.1–6 Currently, only a small number of studies have investigated student performance in equine gait scoring2 7 and the nature of their visual assessments remains largely unexplored. In order to develop a systematic approach to lameness education, leading to reliable performance of our future practitioners, it would be beneficial to understand student performance better.
The experience-dependent ability to assess gait abnormalities has been investigated in numerous studies, covering not only horses2 7 but also human beings,8–13 dogs14 and pigs.15 Gait assessment skills improve with experience. Intraobserver agreement (the same person judging a scenario multiple times) often becomes more consistent,2 7 9 12 15 although this may simply reflect mistakes being made more consistently.16 17 Interobserver agreement (different people judging the same scenario) also often increases with experience.7 9 10 12 15 For visual gait assessment in horses, a past landmark study highlighted the strong subjectivity and unreliability of gait evaluation independent of experience level. The authors hypothesised that high intraobserver variation in less experienced observers may be rooted in a lack of confidence and a less systematic assessment protocol compared with their experienced counterparts.2
For a veterinary student, lameness assessment skills are commonly developed through a combination of taught classes and practical exposure on clinical rotations and work placements. During practical sessions, students learn from the experts of their respective institutions. Dedicated textbooks support the expansion of knowledge regarding the gait examination,18–21 although authors vary in the weighting of different lameness pointers. At the Royal Veterinary College (RVC), at the time of performing this study, the didactic aspects of equine lameness were first covered in year 3 of the BVetMed curriculum. At that point, clinical observation and decision-making skills were starting to be developed. The gained knowledge is then applied during intramural clinical rotations in years 4 and 5 where students spend one week on equine orthopaedics. Here, students follow the clinician on duty when performing lameness examinations of horses admitted to the RVC’s Equine Referral Hospital. Students with an interest in equine work typically also choose to perform their extramural studies (EMS) in equine practice or referral hospitals, where a larger number of lameness cases can be followed.
To date, gait assessment studies have focused largely on performance metrics as outcome variables. However, during perceptual tasks it is equally important to quantify the approach taken by participants in order to relate performance and approach. In the case of lameness assessment, this approach would be quantifiable through the pattern of visual attention allocation. Eye tracking allows the recording of gaze (where a participant is looking) when making judgements, opening the door to insights into cognitive processes that were previously inaccessible. Because the field of sharp vision is narrow (1–2° visual angle for foveal, that is, central vision), human beings have to look at those parts in a scene where they expect to find the information that they are seeking.22 Eye tracking has been used in thousands of studies.23 In human medicine, for example, it has helped to understand errors made when interpreting diagnostic images.24–29
In the present study, the authors investigated the approaches and abilities of a random subsection of veterinary students from two different stages in their degree programme when evaluating horses for lameness. They used a combination of gait scoring, eye tracking and questionnaires to quantify the percentage of horses evaluated correctly, sources of judgement error, visual assessment strategies and student attitudes. The aim of this work is to provide a comprehensive baseline for the development of more effective courses on equine lameness identification.
Materials and methods
The task was to assess horses on a PC monitor. Thirty-four video recordings (25 Hz, resolution 1024 x 768 pixels) were selected to compile a database comprising an approximately equal presence of forelimb and hindlimb lameness in both the left and right limbs in different views (Fig 1): 15 clips showed the horses trotting away from and towards the observer on a straight line (condition ‘straight line’), 15 horses trotted on a circle on either the left or the right rein (condition ‘circle’) and 4 horses were shown trotting in lateral view (condition ‘lateral’). The lateral condition was only used to explore student opinion regarding this view, but not for in-depth analysis of performance. All video recordings were edited in Premiere Pro CS4 (Adobe, USA) into clips of approximately 30–50 seconds duration. Each clip presented the horse in the chosen view without sound, the video being repeated at least once. Clips were presented on a 21-inch CRT monitor (Dell Trinitron, refresh rate: 100 Hz).
The veterinary reports for all horses presented on straight line and circle were obtained in order to establish the lame limb and lameness score of each horse from the expert clinical examination based on visual examination and final diagnosis. Four video recordings of horses filmed in lateral view were scored by three experienced assessors, who independently agreed on the lame leg; the average of the three scores was taken. For all horses classed as lame, the experimenter (SDS) verified that a lameness consistent with the expert opinion was visually detectable in the affected limb, to ensure that the clip could be shown to students and that the lameness was visible to the trained eye. In total, 14 out of 15 horses were deemed lame on the straight line, 12 out of 15 on the circle and 3 out of 4 in lateral view. Horses were unilaterally forelimb or hindlimb lame with a lameness grade of up to 5 out of 10 (UK grading scale, see Arkell and others7 or popular textbooks) across limbs and conditions. The lame horses on the straight line had a mean (SD) lameness score of 2.38 (1.30) out of 10 for forelimb lameness and 3.00 (1.26) out of 10 for hindlimb lameness. On the circle this amounted to 3.67 (1.21) out of 10 for forelimb lameness and 3.50 (1.05) out of 10 for hindlimb lameness and in lateral view to 3.67 (0.58) out of 10 for hindlimb lameness.
In total, 20 third-year BVetMed students (the ‘inexperienced’ cohort) as well as 12 fourth/fifth-year BVetMed students (the ‘experienced’ cohort, who had performed equine rotations and some of whom had also done at least one EMS placement in an equine hospital or practice) were recruited from the RVC student body to participate in the study. All participants volunteered to take part in the study by responding to an email invitation (inexperienced and part of the experienced cohort) or after additional encouragement from lecturing staff (part of the experienced cohort).
Eye tracking and gait scoring
Eye-tracking data were collected in parallel to the scoring session using a stand-alone eye tracker (X120, Tobii Technology, Sweden). The eye tracker was positioned underneath the monitor and calibrated for each individual participant at the beginning of the assessment session using the default five-point calibration procedure. Eye-tracking data were recorded at 60 Hz, with a small subset of data recorded at 120 Hz at the beginning of the study. The viewing distance between student and eye tracker was approximately 70 cm, ranging dynamically between 50 and 80 cm according to the calibration volume of the eye tracker.
Video clips were randomised in Matlab (The MathWorks, USA) and presented as seven segments within the proprietary eye-tracker software (Tobii Studio V.2). Each student was presented with the seven segments, each containing several video clips, in random order. This provided the best trade-off at the time between software limitations and workload arising from randomisation. For the assessment of each horse, an on-screen questionnaire was shown after each video clip with the following questions and single-choice answers:
Is the horse lame? If it is lame, which is the most affected leg? (Answers: sound, left fore, right fore, left hind, right hind)
Which lameness score would you assign to that leg? (Answers: 0/10 (sound), 1/10, 2/10, 3/10, 4/10, 5/10, 6/10, 7/10, 8/10, 9/10, 10/10 (non-weight bearing))
Is a further leg affected? (Answers: no, left fore, right fore, left hind, right hind)
After completion of the task, students were given a questionnaire which contained questions relating to their self-evaluation and approaches when assessing horses for lameness. Students were asked to
rate the difficulty of the lameness assessment task on a scale from 1 (very simple) to 5 (very difficult);
estimate the percentage of horses they expected to have evaluated correctly;
rank the difficulty of assessing horses on the straight line, on a circle and in lateral view;
note down what movement features were used when assessing forelimb and hindlimb lameness on the straight line and circle.
In addition, the cohort of experienced students was asked to estimate how many lameness cases they had seen so far.
Analysis of gait scoring
Scores for straight line and circle were analysed using custom-written software in Matlab (The MathWorks) after exporting data from Tobii Studio. For each student, the response for each horse was compared with the gold standard derived from clinical records. The following metrics were calculated to quantify performance: (1) the percentage of horses correctly classified as lame and sound, (2) the percentage of horses classified lame in the correct limb and (3) the percentage of incorrect classifications. Incorrect classifications were attributed to the following five reasons: students considering a lame horse sound, students considering a sound horse lame and students selecting the incorrect limb(s) as affected, further split into students selecting the incorrect contralateral limb as affected and students selecting an incorrect ipsilateral limb as affected. Variation in the allocated grades was quantified for each horse across all those students who had deemed the horse primarily lame in the forelimb or hindlimb. Score range as well as the difference between each score and the median score were calculated for each cohort. Agreement across students (interobserver agreement) was calculated through the free-marginal multirater kappa30–32 metric using an online kappa calculator (http://justusrandolph.net/kappa). Kappa was calculated for the presence or absence of lameness in general, forelimb lameness and hindlimb lameness. For this metric, the side of lameness was not considered.
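The comparison of each response with the clinical reference can be illustrated with a minimal Python sketch. This is not the authors' Matlab code. The function names, the label strings and the example answers are hypothetical, and the separate 'diagonal limb' category is an assumption added here because the text does not say how a limb that is wrong in both side and end was counted.

```python
from collections import Counter

def classify_response(truth: str, answer: str) -> str:
    """Compare one student answer with the clinical reference for one horse.

    Labels are 'sound' or '<side> <end>', e.g. 'left fore' (hypothetical format).
    """
    if truth == answer:
        return "correct"
    if truth == "sound":
        return "sound horse considered lame"
    if answer == "sound":
        return "lame horse considered sound"
    truth_side, truth_end = truth.split()
    answer_side, answer_end = answer.split()
    if truth_end == answer_end:
        # Same end of the horse, other side (left v right).
        return "incorrect contralateral limb"
    if truth_side == answer_side:
        # Same side, other end (forelimb v hindlimb).
        return "incorrect ipsilateral limb"
    # Assumption: wrong side AND wrong end is reported separately.
    return "incorrect diagonal limb"

def summarise(truths, answers):
    """Return the percentage of horses falling into each outcome category."""
    counts = Counter(classify_response(t, a) for t, a in zip(truths, answers))
    return {label: 100.0 * n / len(truths) for label, n in counts.items()}

# Hypothetical answers of one student for four horses.
truths = ["left fore", "sound", "right hind", "left hind"]
answers = ["left fore", "left fore", "sound", "right hind"]
summary = summarise(truths, answers)
```

Keeping the outcome as a single label per horse makes it straightforward to report both the percentage correct and the breakdown of error sources from the same tally.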
Statistical analysis was performed in IBM SPSS V.20, where a Shapiro-Wilk test was run to check data for normal distribution and decide on the use of parametric or non-parametric tests. The percentages of lame horses correctly classed as lame and correctly classed as lame in the correct limb were compared between student cohorts for straight line and circle using the Mann-Whitney U test. Consistency in allocated grades was tested in the same way. Within student cohorts, a Wilcoxon signed-rank matched-pairs test was used to examine differences between performance for forelimb and hindlimb lameness and between straight line and circle. Regression analysis was performed to determine whether there was an association between the caseload seen and performance of the experienced cohort.
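For readers unfamiliar with the Mann-Whitney U test, the statistic itself can be sketched in a few lines of Python. This is an illustration only: the analysis reported here was run in SPSS, the function name and the example percentages are hypothetical, and the sketch returns only the U statistics, not the P values.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs (a, b) with a from x and b from y in which a > b;
    ties contribute 0.5. U_x + U_y always equals len(x) * len(y).
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical percentage-correct values for two small groups.
inexperienced = [33.0, 50.0, 50.0]
experienced = [67.0, 69.0, 75.0]
u_inexp, u_exp = mann_whitney_u(inexperienced, experienced)
```

Because the statistic depends only on the ordering of values between the groups, it suits percentage scores that are not normally distributed.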
Analysis of visual assessment
For the assessment of the 15 horses trotting on a straight line away from/towards the observer, eye-tracking data were analysed to determine the preferred regions students examine visually. The constant change in direction of horses on the circle proved prohibitive for analysis due to the manual workload. In order to assign the gaze data to discrete regions on the horse, the positions of the os sacrum, head and feet in the video recordings were digitised using Tracker (www.cabrillo.edu/~dbrown/tracker). In Matlab (The MathWorks), point of gaze was then calculated for each frame and assigned to 1 out of 18 bins of an equally spaced horizontal grid (Fig 2). The grid was calculated based on the tracked os sacrum (trot away from observer) or head (trot towards observer) position with the feet as reference points. For each participant and each of the 15 horses, the percentage viewing time for each region was then calculated as the sum of frames spent looking at each region divided by the total viewing time. The total viewing time was calculated as the sum of all frames where gaze data were tracked on the monitor. For each participant, the mean viewing time was then calculated for each of the 18 discrete regions across all videos in which at least 60 per cent of the total video playing time was being tracked.
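The binning and viewing-time calculation can be sketched as follows. This is a simplified Python illustration, not the authors' Matlab implementation: the exact grid geometry of Fig 2 is not reproduced, so the sketch assumes a one-dimensional grid of equal bins between two reference coordinates, and it treats gaze outside the grid like an untracked frame. All names and example values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

def assign_bin(gaze: float, start: float, end: float, n_bins: int = 18) -> Optional[int]:
    """Return the 1-based bin holding `gaze` on an equal grid from start to end.

    Returns None if the gaze coordinate lies outside the grid.
    """
    if end <= start or not (start <= gaze <= end):
        return None
    width = (end - start) / n_bins
    # The upper edge of the grid belongs to the last bin.
    return min(int((gaze - start) / width) + 1, n_bins)

def viewing_time(bins: Sequence[Optional[int]], n_bins: int = 18) -> Tuple[List[float], float]:
    """Return (percentage viewing time per bin, fraction of frames tracked).

    `bins` holds one entry per video frame; None marks an untracked frame.
    """
    tracked = [b for b in bins if b is not None]
    if not tracked:
        return [0.0] * n_bins, 0.0
    pct = [100.0 * tracked.count(i) / len(tracked) for i in range(1, n_bins + 1)]
    return pct, len(tracked) / len(bins)

# Hypothetical clip of four frames, one of them untracked.
pct, tracked_fraction = viewing_time([1, 1, 2, None])
include_clip = tracked_fraction >= 0.60  # inclusion threshold described in the text
```

Dividing by tracked frames rather than by all frames means that blinks and tracking losses do not deflate the viewing percentages, while the 60 per cent threshold excludes clips with too little usable data.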
Statistical analysis was carried out in SPSS as described above. A Mann-Whitney U test was performed to compare the viewing times allocated to each of the 18 regions (Figs 2 and 5) between the two student cohorts, separately for trot away from and towards the observer.
Analysis of questionnaire responses
From the answers to the questionnaires, the difficulty ratings for the task were extracted and average values and frequencies calculated. The percentage of horses students thought they had evaluated correctly (self-rating) and the objective percentage of horses which were evaluated correctly, as described above, across all 34 horses and all three views were calculated. The difference between self-assessment and objective assessment was then calculated for each student and mean differences for each scenario and student cohort were calculated to examine bias in self-rating. The lameness pointers noted down by the students were extracted from the questionnaires, counted and ranked by nomination for assessment on the straight line and circle.
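The bias in self-rating described above amounts to a per-student difference followed by a cohort average. A minimal Python sketch, with hypothetical function names and invented example percentages, might look like this:

```python
from statistics import mean, stdev

def self_rating_bias(self_rated, objective):
    """Return (mean, SD) of self-rated minus objective percentage correct.

    A negative mean indicates that students underestimated their performance.
    """
    diffs = [s - o for s, o in zip(self_rated, objective)]
    spread = stdev(diffs) if len(diffs) > 1 else 0.0
    return mean(diffs), spread

# Invented values for three students (percentage of horses correct).
self_rated = [50.0, 60.0, 70.0]
objective = [60.0, 75.0, 80.0]
bias_mean, bias_sd = self_rating_bias(self_rated, objective)
```

Computing the difference per student before averaging keeps each student's self-rating paired with that student's own objective score.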
Statistical analysis was carried out in SPSS as described above. Differences in self-rating were compared between the inexperienced and experienced student cohort using an independent-samples t-test.
In this study, the authors report most results as the median (the mid-point between 50 per cent of data points which are bigger and 50 per cent that are smaller) and interquartile range (IQR) to estimate central tendency and variation for sometimes skewed data distributions. When splitting a data distribution into four equal parts from lowest to highest value, quartiles mark the thresholds between each of these four parts. The IQR reports data variation and quantifies the spread between the first and third quartile: it is the range in which the ‘middle’ 50 per cent of data points can be found, and is similar in meaning to the standard deviation in normally distributed data. Further details can be found in standard statistics textbooks.
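As a small worked example of these two summary statistics, the following Python sketch computes the median and IQR with the standard library. The data are invented, and quartile conventions differ between software packages, so values from another program may differ slightly for small samples.

```python
from statistics import median, quantiles

def median_iqr(values):
    """Return (median, IQR), where IQR is the third minus the first quartile."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points splitting data into four parts
    return median(values), q3 - q1

# Invented percentage-correct scores for seven students.
scores = [1, 2, 3, 4, 5, 6, 7]
mid, iqr = median_iqr(scores)
```

For these seven values the first and third quartiles fall on 2 and 6, so the middle half of the data spans 4 units around a median of 4.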
The median (IQR) percentage of lame horses correctly classed as generally lame by the inexperienced and experienced cohort for straight line and circle combined was 92 (10) per cent and 98 (10) per cent, respectively. There was no significant difference between cohorts (P≥0.307). The median (IQR) of sound horses correctly classed as sound across all views was 50 (30) per cent for both cohorts, not being significantly different (P=0.924).
For forelimb lameness, the median (IQR) of horses classified lame in the correct limb (Fig 3) for the inexperienced and experienced cohort was 50 (38) per cent and 69 (25) per cent, respectively, on the straight line as well as 83 (42) per cent and 100 (17) per cent, respectively, on the circle. For hindlimb lameness, these values decreased to 33 (33) per cent and 67 (25) per cent, respectively, on the straight line and 50 (33) per cent and 67 (33) per cent, respectively, on the circle. Differences in percentage correct between inexperienced and experienced cohort were significant (P≤0.032) except for forelimb lameness on the circle (P=0.053). Within student cohorts, differences were significant between forelimb and hindlimb lameness on the circle (P≤0.032), but not on the straight line (P≥0.313) and between trotting on the straight line and circle for forelimb lameness (P≤0.003), but not for hindlimb lameness (P≥0.253).
Incorrect classifications were mainly the result of declaring an incorrect limb as lame, with a median (IQR) of 71 (9) per cent and 60 (17) per cent for the inexperienced and experienced cohort, respectively. The incorrect contralateral and ipsilateral limbs were selected in similar proportion.
Mean grades did not differ significantly between cohorts for forelimb lameness (P=0.067). However, for hindlimb lameness, grades differed significantly between cohorts (P<0.001), with the experienced cohort’s median grade being 1 grade below that of the inexperienced cohort. The inexperienced cohort did not grade more than 6 grades away from the group’s median grade, while the experienced student group graded no more than 3.5 grades away (Fig 4).
Within each student cohort, agreement (Table 1) was highest for the presence or absence of lameness on the circle (kappa=0.70 for inexperienced and kappa=0.89 for experienced students) and lowest for the presence or absence of hindlimb lameness during evaluation away from/towards the observer (kappa=0.15 for inexperienced and kappa=0.20 for experienced students).
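The agreement statistic reported here, a free-marginal kappa, compares the observed proportion of agreeing rater pairs with the agreement expected if every category were equally likely (1/k for k categories). The Python sketch below illustrates that calculation. It is not the online calculator used in the study, it assumes that every horse was rated by the same number of students, and the function name and example ratings are hypothetical.

```python
from collections import Counter

def free_marginal_kappa(ratings, n_categories):
    """Free-marginal multirater kappa.

    `ratings` holds one list per horse with one category label per rater.
    Assumes at least two raters and the same number of raters for every horse.
    """
    n_raters = len(ratings[0])
    pair_total = n_raters * (n_raters - 1)
    agreement = 0.0
    for case in ratings:
        counts = Counter(case)
        # Proportion of ordered rater pairs that chose the same category.
        agreement += sum(c * (c - 1) for c in counts.values()) / pair_total
    p_observed = agreement / len(ratings)
    p_expected = 1.0 / n_categories
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical lame/sound judgements by three raters for two horses.
kappa = free_marginal_kappa([["lame", "lame", "lame"],
                             ["lame", "sound", "lame"]], n_categories=2)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which puts the reported range of 0.15 to 0.89 into context.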
For the experienced student cohort, there was no association between the number of cases seen as part of their degree and the percentage of correctly evaluated horses for the nine students that provided data (R²=0.0039).
During trot away from the observer, the percentage viewing time was significantly different between the two cohorts for regions 5, 6, 7, 8, 9, 13 and 14 (P≤0.041), but not for the remaining regions (P≥0.114). During trot towards the observer, the percentage viewing time was significantly different between the two cohorts for regions 6, 7, 12, 13 and 14 (P≤0.047), but not for the remaining regions (P≥0.075). Differences in distribution of gaze data are illustrated in Fig 5. There was considerable variation across students in the distribution of allocated time across body regions for both trotting directions.
The inexperienced cohort rated the difficulty of performing a lameness examination (Table 2) with a mean (SD) score of 4.10 (0.48) out of 5, ranging from 3 to 5. The experienced cohort rated the difficulty with a score of 3.35 (0.47) out of 5, ranging from 3 to 4. The median (IQR) number of lameness cases seen during their time at university by the experienced cohort was 24 (30) cases, ranging from 6 to 100. Assessment away from/towards the observer was judged most difficult by the majority of inexperienced students (12/20 students, 60 per cent), while assessment away from/towards the observer (5/12 students, 42 per cent) and in lateral view (6/12 students, 50 per cent) were ranked as most difficult by a comparable number of experienced students.
Compared with the calculated performance (Fig 6), students underestimated their skill by −11 (15) per cent (inexperienced cohort) and −13 (12) per cent (experienced cohort). The difference between self-rating and objective assessment of performance was not significantly different between cohorts.
Both the inexperienced and experienced cohort most frequently named the use of head movement for the detection of forelimb lameness on the straight line and circle. Both groups also most frequently named pelvis movement on the straight and stride length on the circle for the detection of hindlimb lameness (Tables 3 and 4). The inexperienced cohort named a total of 19 and 18 features for forelimb lameness assessment on straight line and circle, respectively, compared with 9 and 6 features named by the experienced cohort. The inexperienced cohort named a total of 16 and 21 features for hindlimb lameness assessment on straight line and circle, respectively, compared with 5 and 7 features named by the experienced cohort.
Lameness assessment skills found in this study highlighted the need for better training of students. Both inexperienced and experienced student cohorts, while in general reliably classifying horses as lame, were weak in correctly determining the affected limb, and only performed at the level of chance when classifying a horse as sound. The percentage of horses classified as lame in the correct limb significantly differed between the two cohorts, highlighting what the authors hope is the effect of learning throughout the veterinary degree. However, the variation in performance across experienced students was not explained by the number of lame horses that they had previously observed. This finding substantiates the view that just looking at many horses does not necessarily improve skill level. Rather, the amount of time engaged in deliberate practice, the act of practising and refining a skill actively, accounts for most of the expertise level accomplished.33 Even for the experienced cohort, performance was still not at a level which we would deem acceptable for reliable and repeatable diagnoses: there was large variation in performance across students and a median of less than 70 per cent of horses being correctly classified, with the exception of forelimb lameness on the circle. Consequently, students close to graduation would on average still incorrectly assess one out of four horses on the straight line, with individual participants assessing only 30 per cent of horses correctly. For the inexperienced cohort, individual performance was as low as 0 per cent correct. If these results generalise to other veterinary students, it is clearly important to ensure more targeted teaching of lameness assessment skills, to improve student performance, especially of those choosing an equine career path.
Future training should specifically target detection of mild lameness, a task difficult even for expert assessors.5 34 Also, since most mistakes resulted from classifying the incorrect limb as lame, an emphasis should be placed on how to recognise the correct limb.
We found differences in performance between assessment on straight line and circle as well as between forelimb and hindlimb lameness. Only on the circle did experienced students show an outstanding performance for the detection of forelimb lameness with a median of 100 per cent. On the circle, the mean lameness score derived from clinical records was higher (3.67) compared with the straight line (2.38). This is likely to have made lameness easier to see: with increasing lameness, the head nod converges on almost a single vertical head excursion per stride,35–37 resulting in an unambiguous nod-down pattern. This pattern is easier to see than differences in the two excursions between the two steps of a stride during milder lameness since the eyes have to follow movement at only half the step frequency, where there is only a single minimum and maximum during the whole stride.
In line with findings for more experienced clinicians,5 agreement amongst students was lowest for the presence or absence of hindlimb lameness on both straight line and circle. The variation in assigned scores across students compared well with previous findings, where final year students graded no more than three grades away from a horse’s median grade, while more experienced assessors graded no more than two grades and experts graded no more than one grade away.7
While students did not perform at a competent level, their self-evaluation reflected their ‘objective’ performance, showing a realistic view of their own competence, albeit with a slight underestimate. This finding is a positive sign both for their safety as practitioners, as they recognise their own limitations, and for learning. The self-evaluation of experienced medical practitioners often does not bear any correlation with actual capabilities,38 a finding also observed when reviewing self-assessments of inexperienced and experienced medical personnel.39 However, increasing self-awareness may lead to better self-evaluation40 and identification of the need for learning. Inexperienced students, in particular, repeatedly mentioned struggling with the task due to the lack of learning opportunities and uncertainty regarding lameness features.
Errors in classifying the lame limb were rooted in what the authors consider to be both conceptual and perceptual mistakes. Selection of the incorrect contralateral limb (left v right limb) is likely to have been related to a conceptual error, where the student did not correctly remember the rules relating a movement feature to the lame limb. In contrast, the selection of the incorrect ipsilateral limb (forelimb v hindlimb) may be related to a perceptual error due to the interaction of forelimb and hindlimb lameness: especially for hindlimb lameness, compensatory forelimb movements are a regular and notable occurrence.41 42 For forelimb lameness, compensatory hindlimb movements develop only with more pronounced lameness and to a lesser visible extent.41 42 This linkage between the forelimb and hindlimb movements may lead the student to mistake compensatory movement for primary lameness, distracting them from further lameness evaluation. In future, study designs should account for this source of error by explicitly requesting observers to evaluate all limbs and, if more than one appears lame, to indicate whether each lameness is assumed to be primary or ‘compensatory’.
Students performed at chance level for the classification of sound horses as non-lame. This finding highlights the need to not only train students in recognising lameness, but also provide training in recognising soundness and familiarising students with the spectrum of movement in sound horses. Knowing what is ‘normal’ and what is not is a key element for the development of discrimination skills, which in turn is an important component of expertise development.43 Research into the teaching of radiographic interpretation has shown that allowing students to compare abnormal and normal radiographs during training resulted in better performance in a subsequent detection test compared with students who were trained with abnormal radiographs only.44 Similarly, facilitating ‘learning by contrast’ or ‘comparison learning’,45 46 for students learning how to analyse ECG, had a positive effect on subsequent student performance of the task.47 48 The need for greater standardisation of gait analysis training and a greater emphasis on what is normal to improve inter-rater reliability has equally been highlighted for human gait analysis.49 Hence, in future, it will be advisable to incorporate a greater number of sound horses into student training.
Visual assessment strategies differed between the two experience levels. The mapped eye-tracking data showed that the cohort of inexperienced students dedicated more time to the assessment of limb movement compared with the experienced cohort, both with the horse moving away from and towards the observer. The experienced students dedicated most time to the assessment of the pelvis area (trot away from the observer) and the head area (trot towards the observer). These findings matched the lameness features named by the two student cohorts: the inexperienced cohort named 16–21 features for assessment of forelimb and hindlimb lameness on straight line and circle, many of them relating to movement of the limbs. In contrast, the experienced cohort named only five to nine features across conditions, a large proportion referring to movement of head and pelvis. These findings indicate that as part of the development of diagnostic expertise students may discard redundant lameness features and rely on the few features which they assume to be most reliable. However, as a possible confounding factor, experienced students may have been taught by the same veterinarian as part of their clinical lameness rotations, which could have caused the shift in feature selection. Variation in the distribution of gaze data and described lameness features between students of the same cohort as well as between cohorts highlighted inconsistency in assessment approach and evaluation protocol, which should be addressed in future training especially for students early in their careers to ensure that they acquire a more consistent skill set.
In this study, the authors investigated veterinary student competency in lameness evaluation before and during clinical rotations. The authors found that the performance level of both cohorts was inadequate in most scenarios, where mistakes largely arose from classifying an incorrect limb as lame. They further found substantial differences in the assessment strategy, where the experienced student cohort primarily looked at upper body movement (head and sacrum) and named only a few lameness pointers, while the inexperienced cohort focused to a large extent on limb movement and named a large variety of lameness pointers. The authors conclude that students require a higher level of perceptual training before commencing clinical rotations in order to clarify relevant lameness pointers and acquire the necessary discriminatory skill allowing them to differentiate sound from lame horses and correctly pinpoint the affected limb(s). The authors have since released a free online training tool which aims to develop these skills, which can be found at www.lamenesstrainer.com. The authors hope that in future systematic training based on the findings presented here, which takes into consideration common misconceptions and (possibly incorrect) ‘intuitive’ approaches, may help students to develop more reliable lameness assessment skills.
The authors thank Jon Ward of Acuity ETS for supporting access to the eye tracker for the duration of the study, Professor Roger K Smith for the video recordings of clinical cases, Thilo Pfau and Renate Weller for help with participant recruitment and all of their participants for their time.
Funding SDS’s PhD was funded by the Mellon Trust via the Royal Veterinary College, UK.
Provenance and peer review Not commissioned; externally peer reviewed.