Do Small Training Yards Punch Above Their Weight ? : R Code

September 17, 2013 By jasonhathorn in R Code

####################################################################
#
# R code to accompany the blog post 'Do Small Training Yards Punch Above Their Weight?'
#
# SmallTrainerAnalysis.R
#
# J. Hathorn
#
# v1.0
#
#
# Code to look at whether small yards punch above their weight
#
# written 10-Sep-13
#
#
###################################################################

#rm(list=ls())

library(foreign)
# library(maptools)  # not used by this script (maptools has since been retired from CRAN)

# read in database files from RI
#
setwd("C:/Program Files (x86)/RaceForm Interactive")

RIhorse.data <-read.dbf("horse.dbf")
RIouting.data <-read.dbf("outing.dbf")
RIrace.data <-read.dbf("race.dbf")
RIsire.data <-read.dbf("sire.dbf")
RItrainer.data <- read.dbf("trainer.dbf")
RIcourse.data<-read.dbf("course.dbf")

# #############################################################
# set date parameters to focus on races between chosen dates
# flat season Lincoln to the November Handicap
chosenDateSt<-"2012-03-31"
chosenDateEd<-"2012-11-10"

# set dates for determining yard sizes: the 12 months up to the 2012 November Handicap
chosenDateSt1<-"2011-11-11"
chosenDateEd1<-"2012-11-10"
#################################################
#
# extract GB course id list from course db
z<-which(RIcourse.data$CCOUNTRY == "GB")

GBcourseids<-RIcourse.data$CID[z]
GBcoursenames<-RIcourse.data$CNAME[z]
#
# extract GB/IRE trainer lists from trainer db
z<-which(RItrainer.data$TCOUNTRY == "GB")
GBtrainers<-RItrainer.data$TID[z]
z<-which(RItrainer.data$TCOUNTRY == "IRE")
IREtrainers<-RItrainer.data$TID[z]
GBIREtrainers<-append(GBtrainers,IREtrainers)
#
#
##################################################

# select all outings (flat and NH) between the chosen dates to categorise trainers
tmpidx<-which(RIouting.data$ODATE>=chosenDateSt1 & RIouting.data$ODATE<=chosenDateEd1)
T1.data<-RIouting.data[tmpidx,]

# match the course and add a country variable
z<-match(T1.data$OCOURSEID,RIcourse.data$CID)
T1.data$COCOUNTRY<-NA
T1.data$COCOUNTRY<-RIcourse.data$CCOUNTRY[z]

T1a.data<-T1.data
# select outings that took place on GB and IRE courses and append the two data frames
tmpidx<-which(T1.data$COCOUNTRY == "GB")
T1.data<-T1.data[tmpidx,]

tmpidx<-which(T1a.data$COCOUNTRY == "IRE")
T2.data<-T1a.data[tmpidx,]

T1.data<-rbind(T1.data,T2.data)
# put the horse and trainer IDs into unique variables and add trainer name and country and horse name
horse<-T1.data$OHORSEID
trainer<-T1.data$OTRAINERID
uhorse<-unique(horse)
z<-match(uhorse,horse)
utrainer<-trainer[z]

# match the trainer name/domicile set up domestic/foreign variable
z<-match(utrainer,RItrainer.data$TID)
utrainerhome<-RItrainer.data$TCOUNTRY[z]
utrainername<-RItrainer.data$TSTYLENAME[z]
# flag trainers based in GB/IRE as domestic (%in% maps NA to FALSE, so
# the result has exactly one entry per horse)
domestic<-as.integer(utrainerhome %in% c("GB","IRE"))

# match the horse name and calculate the horse age
z<-match(uhorse,RIhorse.data$HID)
uhorsename<-RIhorse.data$HNAME[z]
uhorsefdate<-RIhorse.data$HFOALDATE[z]
uhorseage<-(as.Date(chosenDateEd)-uhorsefdate)/365
# aggregate to num horses by trainer
trainerid<-tapply(utrainer,utrainer,mean)
numhorses<-tapply(uhorse,utrainer,length)
z<-match(trainerid,RItrainer.data$TID)
trainerctry<-RItrainer.data$TCOUNTRY[z]
# one 1/0 flag per trainer; building the vector by index assignment can
# leave it shorter than trainerid, and it would then be recycled below
trainerdomestic<-as.integer(trainerctry %in% c("GB","IRE"))
# allocate trainers to overseas/tiny/small/medium/large categories in the variable size
tiny<-5
sml<-25
med<-75
# pre-allocate one entry per trainer
size<-rep(NA_character_,length(numhorses))

t1<-which(trainerdomestic==0)
size[t1]<-"OVS"

t1<-which(trainerdomestic==1 & numhorses<= tiny)
size[t1]<-"TINY"
t1<-which(trainerdomestic==1 & numhorses>tiny & numhorses <=sml)
size[t1]<-"SMALL"
t1<-which(trainerdomestic==1 & numhorses>sml & numhorses <=med)
size[t1]<-"MEDIUM"
t1<-which(trainerdomestic==1 & numhorses>med)
size[t1]<-"LARGE"

size<-as.factor(size)
# total/avg horses by size category
horsesbycat<-tapply(numhorses,size,sum)
avgbycat<-tapply(numhorses,size,mean,na.rm=TRUE)

# match the size by trainer back to the unique horse vector, will be useful later
#
z<-match(utrainer,trainerid)
utrainersize<-size[z]

###############################################
#
# go to the race file and calculate how many GB handicap and pattern races and runners in the period examined
#
tmpidx<-which(RIrace.data$RDATE>=chosenDateSt & RIrace.data$RDATE<=chosenDateEd & RIrace.data$RFJ=="F")
R1.data<-RIrace.data[tmpidx,]

z<-match(R1.data$RCOURSEID,RIcourse.data$CID)
R1.data$COCOUNTRY<-NA
R1.data$COCOUNTRY<-RIcourse.data$CCOUNTRY[z]

# select races that took place on GB courses
tmpidx<-which(R1.data$COCOUNTRY == "GB")
R1.data<-R1.data[tmpidx,]

# match the winning horse into uhorse to find trainer id/name, trainer home and size category
z<-match(R1.data$RWINHRSID,uhorse)
R1.data$trainerid<-utrainer[z]
R1.data$trainername<-utrainername[z]
R1.data$trainersize<-utrainersize[z]
R1.data$trainerhome<-utrainerhome[z]
R1.data$horseage<-uhorseage[z]

# set up 1/0 values for aggregation later by trainer category
# (%in% gives FALSE for winners with no size category, so they score 0)
R1.data$ovs<-as.integer(R1.data$trainersize %in% "OVS")
R1.data$tiny<-as.integer(R1.data$trainersize %in% "TINY")
R1.data$sml<-as.integer(R1.data$trainersize %in% "SMALL")
R1.data$med<-as.integer(R1.data$trainersize %in% "MEDIUM")
R1.data$lge<-as.integer(R1.data$trainersize %in% "LARGE")
# match the winning horse into the RIhorse d f to get the sire id
z<-match(R1.data$RWINHRSID,RIhorse.data$HID)
R1.data$WINSIREID<-RIhorse.data$HSIREID[z]

# produce pattern only and handicap only d f
#
z<-which(R1.data$RPATTERN != "NOT" & R1.data$RISHCAP=="FALSE")
Patterns<-R1.data[z,]

z<-which(R1.data$RISHCAP== "TRUE")
Hcaps<-R1.data[z,]

# summarise winners for patterns/hcaps by sire ID in total and by trainer category
#
PatternWinsSireIDTmp<-tapply(Patterns$WINSIREID,Patterns$WINSIREID,mean)
PatternWinsSireTmp<-tapply(Patterns$WINSIREID,Patterns$WINSIREID,length)
PatternWinsSireOvsTmp<-tapply(Patterns$ovs,Patterns$WINSIREID,sum)
PatternWinsSireTinyTmp<-tapply(Patterns$tiny,Patterns$WINSIREID,sum)
PatternWinsSireSmlTmp<-tapply(Patterns$sml,Patterns$WINSIREID,sum)
PatternWinsSireMedTmp<-tapply(Patterns$med,Patterns$WINSIREID,sum)
PatternWinsSireLgeTmp<-tapply(Patterns$lge,Patterns$WINSIREID,sum)

HcapWinsSireIDTmp<-tapply(Hcaps$WINSIREID,Hcaps$WINSIREID,mean)
HcapWinsSireTmp<-tapply(Hcaps$WINSIREID,Hcaps$WINSIREID,length)
HcapWinsSireOvsTmp<-tapply(Hcaps$ovs,Hcaps$WINSIREID,sum)
HcapWinsSireTinyTmp<-tapply(Hcaps$tiny,Hcaps$WINSIREID,sum)
HcapWinsSireSmlTmp<-tapply(Hcaps$sml,Hcaps$WINSIREID,sum)
HcapWinsSireMedTmp<-tapply(Hcaps$med,Hcaps$WINSIREID,sum)
HcapWinsSireLgeTmp<-tapply(Hcaps$lge,Hcaps$WINSIREID,sum)

# summarise winners for patterns/hcaps by trainer size
PatternWinsTrainers<-tapply(Patterns$trainersize,Patterns$trainersize,length)
HcapWinsTrainers<-tapply(Hcaps$trainersize,Hcaps$trainersize,length)

#########################
#
# find out number of runs by category for each race type
#
z<-match(T1.data$OTRAINERID,utrainer)
T1.data$trainername<-utrainername[z]
T1.data$trainersize<-utrainersize[z]
T1.data$trainerhome<-utrainerhome[z]
# horse age must be matched on the horse ID, not the trainer ID
z<-match(T1.data$OHORSEID,uhorse)
T1.data$horseage<-uhorseage[z]

# set up 1/0 values for aggregation later by trainer category
# (%in% gives FALSE for outings with no size category, so they score 0)
T1.data$ovs<-as.integer(T1.data$trainersize %in% "OVS")
T1.data$tiny<-as.integer(T1.data$trainersize %in% "TINY")
T1.data$sml<-as.integer(T1.data$trainersize %in% "SMALL")
T1.data$med<-as.integer(T1.data$trainersize %in% "MEDIUM")
T1.data$lge<-as.integer(T1.data$trainersize %in% "LARGE")

# bring in race types

z<-match(T1.data$ORACEID,R1.data$RID)
T1.data$RPATTERN<-R1.data$RPATTERN[z]
T1.data$RISHCAP<-R1.data$RISHCAP[z]

# match each horse into the RIhorse d f to get the sire id
z<-match(T1.data$OHORSEID,RIhorse.data$HID)
T1.data$SIREID<-RIhorse.data$HSIREID[z]

# produce pattern only and handicap only outing d f
#
z<-which(T1.data$RPATTERN != "NOT" & T1.data$RISHCAP=="FALSE")
OutPatterns<-T1.data[z,]

z<-which(T1.data$RISHCAP== "TRUE")
OutHcaps<-T1.data[z,]

# summarise runners for patterns/hcaps by trainer size
PatternRunsTrainers<-tapply(OutPatterns$trainersize,OutPatterns$trainersize,length)
HcapRunsTrainers<-tapply(OutHcaps$trainersize,OutHcaps$trainersize,length)

# summarise runners for patterns/hcaps by sire ID
PatternRunsSireID<-tapply(OutPatterns$SIREID,OutPatterns$SIREID,mean)
PatternRunsSire<-tapply(OutPatterns$SIREID,OutPatterns$SIREID,length)
PatternRunsSireOvs<-tapply(OutPatterns$ovs,OutPatterns$SIREID,sum)
PatternRunsSireTiny<-tapply(OutPatterns$tiny,OutPatterns$SIREID,sum)
PatternRunsSireSml<-tapply(OutPatterns$sml,OutPatterns$SIREID,sum)
PatternRunsSireMed<-tapply(OutPatterns$med,OutPatterns$SIREID,sum)
PatternRunsSireLge<-tapply(OutPatterns$lge,OutPatterns$SIREID,sum)

HcapRunsSireID<-tapply(OutHcaps$SIREID,OutHcaps$SIREID,mean)
HcapRunsSire<-tapply(OutHcaps$SIREID,OutHcaps$SIREID,length)
HcapRunsSireOvs<-tapply(OutHcaps$ovs,OutHcaps$SIREID,sum)
HcapRunsSireTiny<-tapply(OutHcaps$tiny,OutHcaps$SIREID,sum)
HcapRunsSireSml<-tapply(OutHcaps$sml,OutHcaps$SIREID,sum)
HcapRunsSireMed<-tapply(OutHcaps$med,OutHcaps$SIREID,sum)
HcapRunsSireLge<-tapply(OutHcaps$lge,OutHcaps$SIREID,sum)

# calc the average age of the horses run by each trainer category in handicaps
#
HcapAge.ovs<-sum(OutHcaps$ovs*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$ovs),na.rm=TRUE)
HcapAge.tiny<-sum(OutHcaps$tiny*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$tiny),na.rm=TRUE)
HcapAge.sml<-sum(OutHcaps$sml*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$sml),na.rm=TRUE)
HcapAge.med<-sum(OutHcaps$med*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$med),na.rm=TRUE)
HcapAge.lge<-sum(OutHcaps$lge*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$lge),na.rm=TRUE)
#
PtrnAge.ovs<-sum(OutPatterns$ovs*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$ovs),na.rm=TRUE)
PtrnAge.tiny<-sum(OutPatterns$tiny*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$tiny),na.rm=TRUE)
PtrnAge.sml<-sum(OutPatterns$sml*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$sml),na.rm=TRUE)
PtrnAge.med<-sum(OutPatterns$med*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$med),na.rm=TRUE)
PtrnAge.lge<-sum(OutPatterns$lge*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$lge),na.rm=TRUE)

# not every sire with runners has a winner, so match the winner tables back onto the runner sire IDs and set missing counts to 0
z<-match(PatternRunsSireID,PatternWinsSireIDTmp)
PatternWinsSire<-PatternWinsSireTmp[z]
PatternWinsSireOvs<-PatternWinsSireOvsTmp[z]
PatternWinsSireTiny<-PatternWinsSireTinyTmp[z]
PatternWinsSireSml<-PatternWinsSireSmlTmp[z]
PatternWinsSireMed<-PatternWinsSireMedTmp[z]
PatternWinsSireLge<-PatternWinsSireLgeTmp[z]

z<-which(is.na(PatternWinsSire))
PatternWinsSire[z]<-0
z<-which(is.na(PatternWinsSireOvs))
PatternWinsSireOvs[z]<-0
z<-which(is.na(PatternWinsSireTiny))
PatternWinsSireTiny[z]<-0
z<-which(is.na(PatternWinsSireSml))
PatternWinsSireSml[z]<-0
z<-which(is.na(PatternWinsSireMed))
PatternWinsSireMed[z]<-0
z<-which(is.na(PatternWinsSireLge))
PatternWinsSireLge[z]<-0

# repeat for handicaps

z<-match(HcapRunsSireID,HcapWinsSireIDTmp)
HcapWinsSire<-HcapWinsSireTmp[z]
HcapWinsSireOvs<-HcapWinsSireOvsTmp[z]
HcapWinsSireTiny<-HcapWinsSireTinyTmp[z]
HcapWinsSireSml<-HcapWinsSireSmlTmp[z]
HcapWinsSireMed<-HcapWinsSireMedTmp[z]
HcapWinsSireLge<-HcapWinsSireLgeTmp[z]

z<-which(is.na(HcapWinsSire))
HcapWinsSire[z]<-0
z<-which(is.na(HcapWinsSireOvs))
HcapWinsSireOvs[z]<-0
z<-which(is.na(HcapWinsSireTiny))
HcapWinsSireTiny[z]<-0
z<-which(is.na(HcapWinsSireSml))
HcapWinsSireSml[z]<-0
z<-which(is.na(HcapWinsSireMed))
HcapWinsSireMed[z]<-0
z<-which(is.na(HcapWinsSireLge))
HcapWinsSireLge[z]<-0

# calc IVs per sire
PatternIVsire<-(PatternWinsSire/sum(PatternWinsSire))/(PatternRunsSire/sum(PatternRunsSire))
HcapIVsire<-(HcapWinsSire/sum(HcapWinsSire))/(HcapRunsSire/sum(HcapRunsSire))

# calc IV adjusted runners by sire
#
PatternRunsSire.IV<- PatternRunsSire*PatternIVsire
PatternRunsSireOvs.IV<- PatternRunsSireOvs*PatternIVsire
PatternRunsSireTiny.IV<- PatternRunsSireTiny*PatternIVsire
PatternRunsSireSml.IV<- PatternRunsSireSml*PatternIVsire
PatternRunsSireMed.IV<- PatternRunsSireMed*PatternIVsire
PatternRunsSireLge.IV<- PatternRunsSireLge*PatternIVsire

HcapRunsSire.IV<- HcapRunsSire*HcapIVsire
HcapRunsSireOvs.IV<- HcapRunsSireOvs*HcapIVsire
HcapRunsSireTiny.IV<- HcapRunsSireTiny*HcapIVsire
HcapRunsSireSml.IV<- HcapRunsSireSml*HcapIVsire
HcapRunsSireMed.IV<- HcapRunsSireMed*HcapIVsire
HcapRunsSireLge.IV<- HcapRunsSireLge*HcapIVsire
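The match-and-zero-fill steps above (lining the winner tables up with the runner tables by sire, then setting missing counts to 0) can also be expressed as cross-tabulations. The following is an alternative sketch in base R, not the author's code: giving `table()` explicit factor levels taken from the runners' sire IDs means sires with runs but no wins appear with a count of 0 rather than NA. The small data frames are hypothetical stand-ins for the outing and winner data.

```r
# Alternative sketch (not the original script): count runs and wins per
# sire and yard category with table(), using the runners' sire IDs as
# factor levels so that sires with runs but no wins get 0 rather than NA.
# The toy data frames are illustrative, not Raceform Interactive data.
runs <- data.frame(
  sire = c(101, 101, 102, 103, 103, 103),
  size = c("LARGE", "SMALL", "SMALL", "MEDIUM", "LARGE", "LARGE")
)
wins <- data.frame(
  sire = c(101, 103),
  size = c("LARGE", "LARGE")
)

sire_levels <- sort(unique(runs$sire))
size_levels <- c("OVS", "TINY", "SMALL", "MEDIUM", "LARGE")

# rows = sires, columns = yard categories
runs_tab <- table(factor(runs$sire, levels = sire_levels),
                  factor(runs$size, levels = size_levels))
wins_tab <- table(factor(wins$sire, levels = sire_levels),
                  factor(wins$size, levels = size_levels))

runs_by_sire <- rowSums(runs_tab)  # plays the role of PatternRunsSire
wins_by_sire <- rowSums(wins_tab)  # PatternWinsSire, already zero-filled

# sire Impact Value, as in PatternIVsire
sire_iv <- (wins_by_sire / sum(wins_by_sire)) /
  (runs_by_sire / sum(runs_by_sire))

# multiply each sire's row by its IV (the vector recycles down the rows),
# then total the IV-adjusted runs by yard category
adj_runs_by_size <- colSums(runs_tab * sire_iv)
print(adj_runs_by_size)
```

With real data, `runs` would presumably be built from the `SIREID` and `trainersize` columns of `OutPatterns` (or `OutHcaps`), and `wins` from the `WINSIREID` and `trainersize` columns of `Patterns` (or `Hcaps`).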

Do Small Training Yards Punch Above Their Weight?

September 17, 2013 By jasonhathorn in Horse Racing, Trainer Research

Introduction

When George Margarson’s Lucky Kristale won the Group 2 Duchess of Cambridge Stakes in July 2013, it was newsworthy, not only because she won at 20-1 but because she is trained at a yard that has so far sent out fewer than 20 different horses to race on the flat in 2013. Lucky Kristale’s subsequent win in the Group 2 Lowther Stakes at York showed her Newmarket win to be no fluke, with an engagement in the Group 1 Cheveley Park likely to be next on the agenda. Small training yards don’t often win Pattern races in the UK. Of 266 such races that took place on the flat in 2012, just 6 were won by yards that had fewer than 25 horses in training. So how well do small yards perform? Is the flexibility of training a small string outweighed by the advantages of having a large number of horses in training? Are smaller yards able to judge when they have a horse that is capable of winning a Pattern race, and how good are they at placing their horses in handicaps? In short, do small training yards punch above their weight?

These questions are best answered by considering yards in aggregate. It is difficult to draw strong conclusions about an individual trainer’s ability when they don’t have many horses in training. However, we can classify trainers by yard size and then examine how each yard classification (small, medium and large) performs. The number of horses in each category means that, provided the question is framed correctly, the conclusions will have some significance.

The analysis that underpins this piece was carried out in the R statistical environment accessing Raceform Interactive data. The R code is posted elsewhere for interested readers.

Training Yard Classification

Training yards were classified as Tiny, Small, Medium or Large using the criteria in Table 1 below. The Tiny category was included so that results for the Small category were not influenced by yards that have only the occasional runner. All races under both Flat and National Hunt (NH) codes that took place in Great Britain (GB) and Ireland in the 12 months to the date of the 2012 November Handicap were considered, and the number of different horses each trainer ran in this 12-month period determined that trainer’s size classification. An Overseas category is included so that the occasional runner from abroad is not misclassified.

Yard Size	Horses in Training	Yards	Number of Horses	Average Horses per Yard
Tiny	5 or fewer	957	1,915	2
Small	6 to 25	558	7,253	13
Medium	26 to 75	203	8,533	42
Large	more than 75	56	6,967	124
Overseas	based outside GB/Ireland

Table 1: Yard classification, yard and horse numbers
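In the R script these bands correspond to the cut-offs `tiny <- 5`, `sml <- 25` and `med <- 75`, applied to the number of distinct horses each domestic trainer ran, with each upper bound inclusive. As a minimal sketch (the helper `classify_yard()` and the toy counts are illustrative, not part of the original analysis), the same banding can be written with base R's `cut()`:

```r
# Sketch: band yards by the number of distinct horses run in the
# 12-month window. classify_yard() is a hypothetical helper; the
# thresholds default to the script's tiny = 5, sml = 25, med = 75.
classify_yard <- function(num_horses, domestic,
                          tiny = 5, sml = 25, med = 75) {
  # right = TRUE makes each band's upper bound inclusive:
  # (-Inf, 5], (5, 25], (25, 75], (75, Inf)
  size <- as.character(cut(num_horses,
                           breaks = c(-Inf, tiny, sml, med, Inf),
                           labels = c("TINY", "SMALL", "MEDIUM", "LARGE"),
                           right = TRUE))
  size[!domestic] <- "OVS"  # trainers based outside GB/Ireland
  factor(size, levels = c("TINY", "SMALL", "MEDIUM", "LARGE", "OVS"))
}

# toy data: five trainers (not real yards)
horses   <- c(3, 5, 6, 25, 120)
domestic <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
print(classify_yard(horses, domestic))
```

Using `cut()` guarantees that every domestic trainer lands in exactly one band, which is the property the chain of `which()` calls in the script relies on.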

Table 1 shows that a substantial number of horses are in training in small yards. In aggregate there are more horses in training in Small and Tiny yards combined (9,168) than in any other category. This suggests that smaller yards are, collectively, successful at attracting and retaining horses in training.

Pattern Race Analysis

With yards now classified by size, the results of all 266 Pattern races that took place on the Flat in 2012 were examined by yard size and are presented in Table 2, which shows the number of winners and runners and the Impact Value (IV) for each category. Impact Values are defined as IV = %winners / %runners and measure opportunity-adjusted performance. An IV of 1 is what you would expect given the opportunity; an IV below 1 is worse than expected and one above 1 is better than expected.

Yard Size	Winners	Runners	%Winners	%Runners	IV
Tiny	0	14	0.0%	0.6%	0.00
Small	5	145	1.9%	6.0%	0.31
Medium	57	736	21.4%	30.7%	0.70
Large	194	1,437	72.9%	59.9%	1.22
Overseas	10	65	3.8%	2.7%	1.39
TOTAL	266	2,397

Table 2: Pattern race results 2012 wins/runs/IVs by stable classification
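Because IV is simply the ratio of two shares, every derived column of Table 2 can be recomputed from its Winners and Runners columns. A minimal R sketch using the counts from the table (the function name `impact_value()` is hypothetical):

```r
# Sketch: Impact Value IV = %winners / %runners for each yard category.
# Winner and runner counts are the Pattern-race figures from Table 2.
impact_value <- function(winners, runners) {
  (winners / sum(winners)) / (runners / sum(runners))
}

pattern <- data.frame(
  size    = c("Tiny", "Small", "Medium", "Large", "Overseas"),
  winners = c(0, 5, 57, 194, 10),
  runners = c(14, 145, 736, 1437, 65)
)
pattern$pct_winners <- round(100 * pattern$winners / sum(pattern$winners), 1)
pattern$pct_runners <- round(100 * pattern$runners / sum(pattern$runners), 1)
pattern$iv <- round(impact_value(pattern$winners, pattern$runners), 2)
print(pattern)
```

Recomputed this way, Large yards account for 59.9% of runners, and Overseas yards for 3.8% of winners and 2.7% of runners, which is what the IVs of 1.22 and 1.39 imply.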

Small yards were not well represented in Pattern races in 2012, accounting for fewer than 7% of runners. Moreover, when small yards did have runners, they did not win as often as would be expected, posting an IV of 0.31. Medium-sized yards did not win Pattern races as often as would be expected either, posting an IV of 0.70. Larger yards, whilst having the largest proportion of runners, delivered more winners than would be expected, with an IV of 1.22. From a small sample, overseas yards were adept at targeting GB Pattern races in 2012, with an IV of 1.39.

The information in Table 2 does not take into account the quality of the horses in each category of yard. Larger yards are likely to have better quality horses and are thus more likely to win Pattern races. There are a number of ways in which this quality bias could be corrected. One option would be to take into account the cost of the horses in each type of yard. However, not all horses pass through the sales ring, so any adjustment based upon sales information would be incomplete. Another option, adopted here, is to adjust the number of runners in each stable size category by the Impact Value of the sire of each of the runners in Pattern races in 2012. Thus if Galileo’s stock had an IV of 2, and his stock ran 10 times in Pattern races, the number of runs would be adjusted to give a sire-adjusted run number of 20. Since successful sires are better represented at the larger training yards, this approach allows for the possibility that small yards win fewer Pattern races simply because of the breeding of their horses in training. The effect of this adjustment is to increase the number of runners (and thus decrease the IV) for stables with horses by successful sires, and to decrease the number of runners (and thus increase the IV) for stables with horses by less successful sires.
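The sire adjustment can be sketched in a few lines of R: compute each sire's IV from the runs, weight every run by its sire's IV, and total the weighted runs by yard category. The helper `sire_adjusted_iv()` and the toy data are hypothetical; the toy sire A reproduces the Galileo illustration in the paragraph above (an IV of 2, so 10 runs count as 20).

```r
# Sketch of the sire-quality adjustment: a run by a horse whose sire has
# IV s counts as s runs. sire_adjusted_iv() is a hypothetical helper.
sire_adjusted_iv <- function(sire, size, won) {
  # Impact Value of each sire: %winners / %runners
  wins_by_sire <- tapply(won, sire, sum)
  runs_by_sire <- tapply(won, sire, length)
  sire_iv <- (wins_by_sire / sum(wins_by_sire)) /
    (runs_by_sire / sum(runs_by_sire))

  # weight every run by its sire's IV, then total by yard category
  weight   <- as.vector(sire_iv[as.character(sire)])
  adj_runs <- tapply(weight, size, sum)
  wins     <- tapply(won, size, sum)
  runs     <- tapply(won, size, length)

  data.frame(
    runs      = as.vector(runs),
    adj_runs  = as.vector(adj_runs),
    iv_raw    = as.vector((wins / sum(wins)) / (runs / sum(runs))),
    iv_adj    = as.vector((wins / sum(wins)) / (adj_runs / sum(adj_runs))),
    row.names = names(runs)
  )
}

# Toy data (not real results): sire A has 10 of the 40 runs and 2 of the
# 4 wins, so its IV is (2/4) / (10/40) = 2 and its 10 runs count as 20.
sire <- c(rep("A", 10), rep("B", 30))
size <- c(rep("LARGE", 8), rep("SMALL", 2),
          rep("LARGE", 10), rep("SMALL", 20))
won  <- c(1, 1, rep(0, 8),              # sire A: both wins from LARGE yards
          1, rep(0, 9), 1, rep(0, 19))  # sire B: one LARGE win, one SMALL win
print(sire_adjusted_iv(sire, size, won))
```

Because a sire's adjusted runs equal runs × (wins/W)/(runs/R) = wins × R/W, the adjusted runner column always sums to the raw total R, which is why the totals in Tables 3 and 5 are unchanged. It also means that, as in the script, a sire with no winners gets an IV of 0 and its runners drop out of the adjusted counts.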

Yard Size	Winners	Runners	Runners Adjusted	IV raw	IV adjusted
Tiny	0	14	12	0.00	0.00
Small	5	145	96	0.31	0.47
Medium	57	736	638	0.70	0.81
Large	194	1,437	1,567	1.22	1.12
Overseas	10	65	85	1.39	1.06
TOTAL	266	2,397	2,397

Table 3: Pattern race results 2012 wins/runs/IVs/adjusted IVs by stable classification

The sire adjustment has reduced the number of runners from small yards from 145 to 96, and medium-sized yards also receive some relief. However, the adjustment is not enough to take the IVs for small and medium-sized yards to 1: small yards now post an IV of 0.47 and medium yards 0.81. Larger yards still have more winners than expected relative to small and medium-sized yards, even after an adjustment for the quality of the horses in each yard category.

Handicap Race Analysis

Are the results seen for Pattern races replicated in handicaps? Since handicaps are a test of the best horse at the weights, an additional set of skills is brought to bear in placing horses in them, and the quality of the horse should be less important in this type of race.

Yard Size	Winners	Runners	%Winners	%Runners	IV
Tiny	33	524	1.1%	1.6%	0.65
Small	644	8,406	21.0%	26.4%	0.80
Medium	1,256	13,129	41.0%	41.3%	0.99
Large	1,128	9,741	36.9%	30.6%	1.20
Overseas	0	8	0.0%	0.0%	0.00
TOTAL	3,061	31,808

Table 4: Handicap race results 2012 wins/runs/IVs by stable classification

In handicaps small yards have an IV of 0.80, or 20% fewer winners than expected. Medium-sized yards deliver wins in line with expectations, whilst large yards deliver more wins in handicaps than expected, with an IV of 1.20. Results when an adjustment for horse quality via Sire Impact Values is applied are reported in Table 5 below.

Yard Size	Winners	Runners	Runners Adjusted	IV raw	IV adjusted
Tiny	33	524	463	0.65	0.74
Small	644	8,406	8,016	0.80	0.83
Medium	1,256	13,129	13,051	0.99	1.00
Large	1,128	9,741	10,273	1.20	1.14
Overseas	0	8	5	0.00	0.00
TOTAL	3,061	31,808	31,808

Table 5: Handicap race results 2012 wins/runs/IVs/adjusted IVs by stable classification

Whilst there is some improvement in the Impact Value for smaller yards, their IV of 0.83 is equivalent to 17% fewer winners than expected. Medium-sized yards again deliver wins in line with expectations, whilst large yards deliver more wins in handicaps than expected, with an IV of 1.14.

Average Horse Age & Yard Classification

A possible explanation for smaller yards posting IVs lower than 1 in handicaps is that they keep a greater proportion of exposed horses in training. A proxy for an exposed horse is its age. The average horse age by stable classification, split by Pattern races and Handicaps, is given in Table 6 below.

Yard Size	Pattern Horse Age (average)	Handicaps Horse Age (average)
Tiny	4.9	4.9
Small	5.7	6.5
Medium	5.0	5.5
Large	4.2	5.2
Overseas	5.1	6.0

Table 6: Average horse age by stable classification
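The ages in Table 6 are derived from each horse's foaling date. The script measures age at the end of the 2012 season (`chosenDateEd`) as days since foaling divided by 365, then averages it per category via the 1/0 category flags. A more compact sketch groups directly with `tapply()`; the helper `age_years()` and the foaling dates below are made up for illustration.

```r
# Sketch: age in years at a reference date, then mean age of runners
# per yard category. Foaling dates and categories are made up.
age_years <- function(foal_date, ref_date) {
  # same convention as the script: days since foaling divided by 365
  as.numeric(difftime(as.Date(ref_date), as.Date(foal_date),
                      units = "days")) / 365
}

runs <- data.frame(
  size      = c("SMALL", "SMALL", "LARGE", "LARGE", "LARGE"),
  foal_date = c("2006-03-01", "2007-04-15", "2009-02-10",
                "2009-05-01", "2008-01-20")
)
runs$age <- age_years(runs$foal_date, "2012-11-10")  # end of 2012 season

# mean age per category; na.rm skips runs with no foaling date
avg_age <- tapply(runs$age, runs$size, mean, na.rm = TRUE)
print(round(avg_age, 1))
```

Grouping with `tapply(..., na.rm = TRUE)` drops runs with a missing foaling date from both the numerator and the denominator of each average.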

The table confirms that small yards have, on average, older horses in training than medium and large yards. This is the case for both Pattern and Handicap races. Whilst an age difference would only favour younger horses in Pattern races if the weight-for-age (WFA) scale were incorrect, the difference in horse age in handicaps by yard classification suggests that small yards are running more exposed horses than larger yards, and that this is a contributory factor in their posting IVs below 1 in such races.

Summary

So do small training yards punch above their weight? There are a large number of small training yards in Great Britain and Ireland. In 2012 they were responsible, in aggregate, for about a quarter of the runners in flat handicaps. Small-yard representation in Pattern races in 2012 was far lower, at fewer than 7% of runners. Moreover, given this number of runners, the percentage of winners from small yards was lower than might be expected, even after a correction for horse quality is applied. The Impact Value for small training yards was 0.47 in Pattern races, although it is possible that the correction for horse quality, via Sire Impact Values, does not go far enough. It could be that the best offspring of a sire end up at the larger yards, while the smaller yards end up with (say) the lesser Galileo yearlings. The correction used would not account for this.

In handicaps the Impact Value for smaller yards was 0.83 in 2012; in contrast, larger yards posted an IV of 1.14. One explanation for the difference in performance in handicaps is that the smaller yards are running more exposed horses, and the difference in horse age across yard sizes suggests this is the case. Another explanation is that there is substantial value in having more horses in training, because it enables the trainer to categorise his horses more accurately, which leads to better placing.

The results presented suggest that it is the large training yards that are the ones punching above their weight. Training a large number of horses in one yard, whilst being able to keep the average horse age lower than smaller yards, appears to confer a substantial advantage in terms of the results produced on the racecourse.


onemilefourandten

Jason Hathorn's horse racing blog
