

A practical tool for locomotion scoring in sheep: reliability when used by veterinary surgeons and sheep farmers
  1. J. W. Angell, BVSc MRCVS,
  2. P. J. Cripps, BSc(Hons) BVSc MSc PhD MRCVS,
  3. D. H. Grove-White, BVSc MSc DBR PhD DipECBHM FRCVS and
  4. J. S. Duncan, BSc(Hons) BVSc DipECSRHM PhD MRCVS
  1. Department of Veterinary Epidemiology, Institute of Infection and Global Health, The University of Liverpool, Leahurst Campus, Neston, Wirral, Cheshire CH64 7TE, UK
  1. E-mail for correspondence jwa{at}


A four-point locomotion scoring tool for sheep was developed and tested on 10 general practice veterinary surgeons (VS) and 10 sheep farmers. Thirty-four video clips of sheep displaying different locomotion scores were recorded and randomly assorted. Following a set period of training using four other video clips typical of the four locomotion scores, participants then scored the 34 test clips. The participants repeated the training and the exercise one month later. There were high levels of intraobserver repeatability: weighted κ (κW) 0.81 for VS and 0.83 for farmers. There was no difference in intraobserver repeatability between vets and farmers (Wilcoxon signed rank P=0.8). When considering the overall distribution of scores within the video package, there were high levels of interobserver repeatability: mean κW 0.73 for VS and 0.72 for farmers. However, the repeatability for the individual locomotion scores was only fair to moderate. It is therefore recommended that when observations are repeated on different occasions they are made by the same observer.

  • Lameness
  • Locomotion
  • Sheep
  • Mobility
  • Repeatability
  • Reliability


Lameness in sheep is a priority welfare concern for the UK sheep industry (Phythian and others 2011). Current understanding and consensus of opinion have led to recommendations to farmers to identify and treat lame sheep early (FAWC 2011).

Locomotion scoring identifies lame individuals and can be used to determine flock prevalence. Previous scoring tools, for example those of Kaler and others (2009), Ley and others (1989), Phythian and others (2012) and Welsh and others (1993), all have limitations, being either overly detailed or overly simplistic. Furthermore, they were all tested using small numbers of experienced researchers, which could reduce their generalisability.

The aim of this study was to develop a locomotion scoring tool for use by farmers and veterinary surgeons (VS) to assess lameness severity in individual sheep, and severity and prevalence in flocks.

Materials and methods

Locomotion scoring

A four-point system was developed by combining the Kaler system (Kaler and others 2009) with the DairyCo Mobility Scoring System (DairyCo 2009):

0: (SOUND) Bears weight evenly on all four feet and walks with an even rhythm.

1: (MILDLY LAME) Steps are uneven but it is not clear which limb or limbs are affected.

2: (MODERATELY LAME) Steps are uneven and the stride may be shortened; the affected limb or limbs are identifiable.

3: (SEVERELY LAME) Mobility is severely compromised such that the sheep frequently stops walking or lies down due to obvious discomfort. The affected limb or limbs are clearly identifiable and may be held off the ground while walking or standing.
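The four definitions above can be read as a short decision sequence. The sketch below is only an illustration of that reading: the three yes/no inputs (`uneven_steps`, `limb_identifiable`, `severely_compromised`) and the names `LocomotionScore` and `assign_score` are hypothetical simplifications introduced here, not part of the published tool, which relies on an observer's overall judgement.

```python
from enum import IntEnum


class LocomotionScore(IntEnum):
    """The four-point scale described in the text."""
    SOUND = 0
    MILDLY_LAME = 1
    MODERATELY_LAME = 2
    SEVERELY_LAME = 3


def assign_score(uneven_steps: bool,
                 limb_identifiable: bool,
                 severely_compromised: bool) -> LocomotionScore:
    """Map three hypothetical yes/no observations onto the scale."""
    # Score 3: mobility severely compromised (stops, lies down, limb held up).
    if severely_compromised:
        return LocomotionScore.SEVERELY_LAME
    # Score 0: even weight bearing and an even rhythm.
    if not uneven_steps:
        return LocomotionScore.SOUND
    # Score 2: uneven steps and the affected limb(s) can be identified.
    if limb_identifiable:
        return LocomotionScore.MODERATELY_LAME
    # Score 1: uneven steps but the affected limb(s) are not clear.
    return LocomotionScore.MILDLY_LAME
```

Under this reading, the only difference between scores 1 and 2 is whether the affected limb can be identified.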

Ethical approval was provided by the University of Liverpool (VREC13).

Thirty-eight video clips of sheep walking and standing were made, representing all four scores. To ensure a range of severities was represented, these were scored by three experienced sheep VS to collectively determine the ‘true’ score. Four of the clips were used to train the participants. The other 34 were shown in random order to the participants. If there was more than one sheep visible, a red circle was drawn around the relevant individual.
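As a minimal sketch of how the 38 clips could be split into four training clips and 34 randomly ordered test clips, assuming clips are identified by simple integer IDs (the IDs, the function name `split_and_shuffle` and the seed are all hypothetical; the article does not state how the random order was generated):

```python
import random


def split_and_shuffle(clip_ids, training_ids, seed=None):
    """Return (training clips, test clips in random order)."""
    training = set(training_ids)
    # Every clip not reserved for training becomes a test clip.
    test = [clip for clip in clip_ids if clip not in training]
    # A seeded generator makes the presentation order reproducible.
    random.Random(seed).shuffle(test)
    return list(training_ids), test


training_clips, test_clips = split_and_shuffle(range(1, 39), [1, 2, 3, 4], seed=1)
```

Fixing the seed would allow the same presentation order to be reused at a second session, although whether the order was kept the same one month later is not stated in the text.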

Study population

The tool was tested on a convenience, non-random sample of 10 general practice VS and 10 sheep farmers.


Each participant was trained using clips typical of each score. Participants then watched the test clips taking as long as needed and were allowed to watch each clip as many times as necessary. No help was given during the test period. The training and assessment were repeated one month later.

Data analysis

Intraobserver agreement

Bias between attempts

For each clip, the score from an individual's second attempt was subtracted from their initial score. Differences were investigated using one-sample t tests.
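The bias calculation can be sketched as follows, assuming each observer's scores are held in two equal-length lists, one per attempt. The function name `bias_t_statistic` is hypothetical. Only the t statistic and its degrees of freedom are computed; obtaining the P value additionally requires the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def bias_t_statistic(first_attempt, second_attempt):
    """One-sample t statistic for the mean of (first - second) against zero."""
    # Second-attempt score subtracted from the initial score, clip by clip.
    diffs = [a - b for a, b in zip(first_attempt, second_attempt)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("t statistic is undefined when all differences are equal")
    t = mean_diff / (sd_diff / sqrt(n))
    return mean_diff, t, n - 1  # mean bias, t statistic, degrees of freedom
```

A positive mean difference indicates that the observer tended to score higher on the first attempt than on the second.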

Exact agreement

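Per cent agreement for an observer was determined from the number of observations that matched exactly:

$$\text{Per cent agreement} = \frac{\text{number of clips given the same score on both attempts}}{\text{total number of clips scored}} \times 100$$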

The mean per cent agreement was calculated for VS and farmers and compared using the chi-squared test. Similar data were compiled for the one and two point differences.
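A minimal sketch of the exact agreement calculation, together with the corresponding one, two and three point differences, assuming two equal-length lists of scores for one observer (the function name `agreement_profile` is hypothetical):

```python
from collections import Counter


def agreement_profile(first_attempt, second_attempt):
    """Percentage of clips whose two scores differ by 0, 1, 2 or 3 points."""
    n = len(first_attempt)
    # Count how often each absolute difference occurs.
    counts = Counter(abs(a - b) for a, b in zip(first_attempt, second_attempt))
    return {points: 100.0 * counts.get(points, 0) / n for points in range(4)}


profile = agreement_profile([0, 1, 2, 3], [0, 2, 2, 1])
```

The entry for a difference of 0 is the exact per cent agreement; the remaining entries give the one, two and three point disagreements.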

Pairwise κ

Weighted κ (κW) was calculated between each pair of observations by each observer using quadratic weights and interpreted using Landis and Koch (1977) (Table 2).
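For readers unfamiliar with quadratic weighting, the sketch below computes κW for two sets of scores on the four-point scale and labels the result with the Landis and Koch (1977) bands. It is a plain Python illustration rather than the authors' STATA code, and the function names are hypothetical. Disagreements are penalised by the squared distance between the two scores, so a two-point disagreement costs four times as much as a one-point disagreement.

```python
def weighted_kappa(scores_a, scores_b, categories=(0, 1, 2, 3)):
    """Quadratic weighted kappa between two equal-length lists of scores."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(scores_a)
    # Observed proportions in a k x k cross-tabulation.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(scores_a, scores_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when only one score was ever used")
    return 1.0 - weighted_observed / weighted_expected


def landis_koch(kappa):
    """Verbal label for a kappa value (Landis and Koch 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, `landis_koch(0.73)` returns `"substantial"`.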

Interobserver agreement

κ between observers

For each observer, a κW was calculated against each of the other nine members of their group. The mean of these nine values was that individual's interobserver agreement.
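The averaging step can be sketched generically, assuming the scores are held in a dictionary keyed by observer. The names `mean_pairwise_agreement` and `exact_agreement` are hypothetical. To keep the example short, the usage below passes a simple proportion of exact matches as a stand-in agreement measure; in the study the pairwise measure was κW.

```python
from statistics import mean


def mean_pairwise_agreement(scores_by_observer, agreement_fn):
    """For each observer, average agreement_fn over all other group members."""
    result = {}
    for name, scores in scores_by_observer.items():
        pairwise = [agreement_fn(scores, other_scores)
                    for other, other_scores in scores_by_observer.items()
                    if other != name]
        result[name] = mean(pairwise)
    return result


def exact_agreement(a, b):
    """Stand-in measure: proportion of clips scored identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


group = {"A": [0, 1, 2, 3], "B": [0, 1, 2, 2], "C": [0, 1, 1, 1]}
summary = mean_pairwise_agreement(group, exact_agreement)
```

With ten observers per group, each observer's value is the mean of nine pairwise comparisons, as described in the text.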

κ for locomotion scores

To examine the repeatability of recording different severities of locomotion score, unweighted κ was obtained for all clips that had received the given score.
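The text does not give the exact calculation used for the score-specific κ. One common approach with several observers, shown here purely as an assumption, is the category-specific κ of Fleiss, in which each score is treated as "this score" versus "any other score". The function name `category_kappa` and the data layout (one list of observer scores per clip) are hypothetical.

```python
def category_kappa(ratings, category):
    """Fleiss-type unweighted kappa for a single score category.

    ratings: one list per clip, each holding one score per observer,
    with the same number of observers for every clip.
    """
    n_clips = len(ratings)
    n_raters = len(ratings[0])
    # Number of observers who gave this score to each clip.
    counts = [sum(1 for s in clip if s == category) for clip in ratings]
    p = sum(counts) / (n_clips * n_raters)  # overall proportion of this score
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        raise ValueError("kappa is undefined if the score is never or always used")
    # Pairwise disagreement about this category within each clip.
    disagreement = sum(c * (n_raters - c) for c in counts)
    return 1.0 - disagreement / (n_clips * n_raters * (n_raters - 1) * p * q)
```

A value of 1 means that, for every clip, the observers were unanimous about whether the clip deserved the score in question.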

Median scores

Median scores for each clip were calculated for both VS and farmers. Differences were assessed using the Wilcoxon signed-rank test.
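A sketch of the two steps, assuming each group's scores are stored as one list per observer with clips in the same order. The names `clip_medians` and `signed_rank_sums` are hypothetical. The code follows the classic Wilcoxon procedure of discarding zero differences and averaging tied ranks; statistical packages may treat zeros differently, and the P value is not computed here.

```python
from statistics import median


def clip_medians(scores_by_observer):
    """Median score per clip across a list of observers' score lists."""
    return [median(clip_scores) for clip_scores in zip(*scores_by_observer)]


def signed_rank_sums(x, y):
    """Return (W+, W-): rank sums of positive and negative paired differences."""
    # Zero differences are discarded before ranking (an assumption, see above).
    diffs = sorted((a - b for a, b in zip(x, y) if a != b), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        # Find the run of tied absolute differences starting at position i.
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus
```

In use, `clip_medians` would be applied to the VS group and to the farmer group separately, and the two resulting lists of 34 medians passed to `signed_rank_sums`.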

Statistical significance was set at P<0.05. All analyses used STATA V.13 (StataCorp, Texas, USA).


Results

All participants found the tool easy to use. They found it hardest to distinguish between scores 0 and 1.

The mean number (and proportion) of the 34 clips assigned each score in the first set of observations was: score 0: 8.7 (26 per cent), score 1: 9.9 (29 per cent), score 2: 9.8 (29 per cent) and score 3: 5.7 (17 per cent).
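The percentages quoted follow directly from dividing each mean count by the 34 test clips, as this short check shows (the variable names are illustrative only):

```python
# Mean number of clips given each score, as reported in the text.
mean_counts = {0: 8.7, 1: 9.9, 2: 9.8, 3: 5.7}
n_clips = 34

# Convert each mean count to a whole-number percentage of the 34 clips.
percentages = {score: round(100 * count / n_clips)
               for score, count in mean_counts.items()}
print(percentages)  # {0: 26, 1: 29, 2: 29, 3: 17}
```

The mean counts sum to 34.1 rather than exactly 34, and the percentages to 101, because of rounding.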

Intraobserver agreement

Bias between attempts

Bias was present within and between observers and was significant for three VS and five farmers (Table 1). The largest differences in scores were −0.21 and 0.25, respectively.


Table 1: Intraobserver and interobserver agreement for veterinary surgeon and farmer observers

Exact agreement

The mean overall exact agreement within individual observers was 65.0 per cent (SD 8.7) for VS and 68.3 per cent (SD 12.2) for farmers (P=0.5).

Pairwise κ

The mean κW at intraobserver level was 0.81 for VS and 0.83 for farmers (P=0.8).

Interobserver agreement

κ between observers

The mean κW at interobserver level was 0.73 (SD 0.04) for VS and 0.72 (SD 0.04) for farmers (P=0.8) (Table 1).

κ for locomotion scores

Overall, for score 3 there is substantial agreement between observers. For other scores, there is moderate or fair agreement (Table 2).


Table 2: Interobserver agreement for individual locomotion scores

Median scores

The median score assigned to each video clip by VS was not significantly different from that assigned by farmers (P=0.18) (Table 2).


Discussion

There were score differences between observation attempts; however, the authors consider that this bias, while present, is too small to invalidate the scoring system. The variation in locomotion scores (Table 2) indicates bias between observers and may have led to smaller κ values than if the scores had equal prevalence within the video package (Byrt and others 1993). However, given that the lowest prevalence score (score 3) had the highest levels of repeatability between observers, a more equal prevalence would likely have had little impact on the κ values. Both intraobserver and interobserver repeatability were substantial, indicating that this tool could be used reliably to monitor lameness in individuals over time and that it enables different observers to measure lameness reliably across farms. However, the interobserver repeatability of individual locomotion scores was slight to moderate, except for score 3. Therefore, while different observers scored similar proportions of sheep with each locomotion score, their ability to give the same individual the same score was unsatisfactory.

The large number and two types of observers in this study suggest that the tool is applicable to industry users.




  • Provenance: Not commissioned; externally peer reviewed

  • Funding This study was supported by a grant from the British Veterinary Association Animal Welfare Foundation, from the Norman Hayward Fund and also by a grant from Hybu Cig Cymru/Meat Promotion Wales. The authors are grateful to all the VS and farmers who willingly agreed to take part, including those who willingly provided their sheep.

  • Competing interests None.
