Analysis of the data held in clinical records could help epidemiologists to plan analytical studies and identify new research initiatives. This paper describes the development of a systematic, replicable technique for compressing many words of text into fewer content categories on the basis of explicit, user-defined coding rules, and for sorting a large volume of records accurately and reliably. The method was used to assign the reasons for retirement from racing in Hong Kong of 3727 thoroughbred racehorses between the 1992/93 and 2003/04 racing seasons to categories in a user-defined dictionary. An automated process successfully categorised 95 per cent of the records; the remaining 5 per cent were assigned manually to one of the dictionary categories. The whole process, from initial screening to the categorisation of all the records, took approximately 100 man-hours.
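The general idea of dictionary-based categorisation with a manual fallback can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category names, keywords, sample records and the function names `categorise` and `sort_records` are all hypothetical, and the rule that the first matching category wins is an assumption made to keep the example short.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical user-defined dictionary: category -> keywords that trigger it.
# These entries are illustrative only and are not taken from the paper.
DICTIONARY: Dict[str, List[str]] = {
    "tendon injury": ["tendon", "tendonitis"],
    "fracture": ["fracture", "fractured"],
    "respiratory": ["bleeder", "epistaxis", "roarer"],
}


def categorise(text: str, dictionary: Dict[str, List[str]] = DICTIONARY) -> Optional[str]:
    """Return the first category whose keywords appear in the text, else None."""
    # Lower-case the text and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    for category, keywords in dictionary.items():
        if words & set(keywords):
            return category
    return None


def sort_records(records: List[Tuple[int, str]]) -> Tuple[Dict[int, str], List[int]]:
    """Split records into automatically categorised ones and ones left for manual review."""
    assigned: Dict[int, str] = {}
    manual: List[int] = []
    for record_id, text in records:
        category = categorise(text)
        if category is None:
            manual.append(record_id)  # no rule matched: needs a human decision
        else:
            assigned[record_id] = category
    return assigned, manual


if __name__ == "__main__":
    # Made-up example records, not data from the study.
    sample = [
        (1, "Retired due to fractured sesamoid"),
        (2, "Owner's request"),
        (3, "Tendon injury, left fore"),
    ]
    auto, review = sort_records(sample)
    print(auto)
    print(review)
```

In a sketch like this, the records returned in the manual list correspond to the share that an automated pass cannot place and that a person must assign to a dictionary category by hand.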