
Use of free text clinical records in identifying syndromes and analysing health data
  1. K. Lam, BSc, BVetMed, CertVA, MRCVS (1)
  2. T. Parkin, BSc, BVSc, PhD (3)
  3. C. Riggs, BVSc, PhD, DEO, DipECVS, MRCVS (2)
  4. K. Morgan, MA, VetMB, PhD, MRCVS (4)

  (1) Department of Veterinary Regulation and International Liaison, Hong Kong Jockey Club, Sha Tin Racecourse, Sha Tin, NT, Hong Kong
  (2) Department of Veterinary Clinical Services, Hong Kong Jockey Club, Sha Tin Racecourse, Sha Tin, NT, Hong Kong
  (3) Centre for Preventive Medicine, Animal Health Trust, Lanwades Park, Kentford, Newmarket CB8 7UU
  (4) Epidemiology Group, Faculty of Veterinary Science, University of Liverpool, Leahurst, Neston CH64 7TE


The analysis of data in clinical records could be useful to epidemiologists in planning analytical studies and identifying new research initiatives. This paper describes the development of a systematic, replicable technique that compresses many words of free text into fewer content categories, on the basis of explicit, user-defined coding rules, and sorts a large volume of records accurately and reliably. The method was used to assign the reasons for retirement from racing in Hong Kong of 3727 thoroughbred racehorses, between the 1992/93 and 2003/04 racing seasons, to the categories of a user-defined dictionary. An automated process successfully categorised 95 per cent of the records. The remaining 5 per cent were assigned manually to one of the dictionary categories. The whole process, from initial screening to the categorisation of all the records, took approximately 100 man-hours to complete.
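The general idea of dictionary-based coding can be illustrated with a minimal sketch. It is not the authors' implementation: the category names, the keywords, and the function names `categorise` and `sort_records` are hypothetical and chosen only for illustration. Each free-text record is assigned to the first category whose keyword appears in it as a whole word or phrase, and records that match nothing are set aside for manual assignment.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical user-defined dictionary: category -> keywords.
# These entries are illustrative assumptions, not taken from the paper.
DICTIONARY: Dict[str, List[str]] = {
    "musculoskeletal": ["tendon", "fracture", "lame", "lameness"],
    "respiratory": ["bleeder", "epistaxis", "roarer"],
    "poor performance": ["poor performance", "slow"],
}


def categorise(text: str) -> Optional[str]:
    """Return the first category with a whole-word keyword match, else None."""
    lowered = text.lower()
    for category, keywords in DICTIONARY.items():
        for keyword in keywords:
            # \b anchors prevent partial matches such as "slow" in "slowly".
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                return category
    return None


def sort_records(records: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split records into automatically coded ones and those left for manual review."""
    coded: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for record in records:
        category = categorise(record)
        if category is None:
            unmatched.append(record)
        else:
            coded.setdefault(category, []).append(record)
    return coded, unmatched


if __name__ == "__main__":
    sample = [
        "Retired: tendon injury, left fore",
        "Known bleeder, second episode",
        "Owner's request",
    ]
    coded, unmatched = sort_records(sample)
    print(coded)
    print(unmatched)
```

In a sketch of this kind the order of the dictionary entries acts as an explicit tie-breaking rule when a record mentions more than one condition, and the unmatched list corresponds to the records that would need manual assignment.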
