Measuring Training Yard Success: Impact Values from Maiden, Handicap & Pattern Races

Introduction

The champion trainer for the season is decided on total prize money earned. This measure favours the very largest training yards, particularly those with access to the offspring of top stallions. As a result it is a somewhat unsatisfactory measure of training yard success. Impact Values (IVs) correct for yard size by taking into account the number of runners as well as the number of winners, so the playing field between yards of differing sizes is, to a good degree, levelled when this measure is used. Whilst there are also limitations in using this measure across all races and for all trainers, the net is cast wider. In this blog post IVs for different categories of race, namely maidens, handicaps and pattern races, are calculated, both raw and adjusted for Sire IVs (SA), and then combined to produce a composite IV measure. Measuring IVs in different race categories enables a more complete picture of training yard success to be built. A by-product of the approach is that trainers whose results are most and least influenced by the success of particular stallions can be identified.
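As a minimal sketch of how an Impact Value is formed, assume three hypothetical yards (the names and figures below are invented for illustration and are not Raceform data). A yard's IV is its share of all winners divided by its share of all runners, so a value above 1 means the yard wins more often than its number of runners alone would suggest.

```r
# Toy data: hypothetical yards, not real results
yards <- data.frame(
  yard = c("Yard A", "Yard B", "Yard C"),
  wins = c(30, 10, 10),
  runs = c(200, 100, 200),
  stringsAsFactors = FALSE
)

# IV = (share of all winners) / (share of all runners)
yards$IV <- (yards$wins / sum(yards$wins)) / (yards$runs / sum(yards$runs))

print(yards)
```

Yard A and Yard C send out the same number of runners, but Yard A's larger share of the winners gives it the higher IV.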

Data & Methodology

The analysis that underpins this piece was carried out in the R statistical environment using Raceform Interactive data for the 2012 flat season. The R code is posted elsewhere for interested readers. To qualify for inclusion in the tables that follow, a training yard must have sent out at least 50 runners in handicaps and 100 runners in total over the course of the 2012 flat season, and be based in Great Britain (GB). A total of 139 yards met these criteria. These yards were then split into 2 groups according to how many different horses had been raced: 66 yards raced at least 40 different horses and are the focus of the analysis in this blog piece. The other 73 yards, smaller in size, were analysed separately and may be the subject of a further blog post. Since we know that on average larger yards deliver higher IVs than smaller yards (see my earlier blog post on this subject), smaller yards that perform well may not have appeared in the listings reported below, and it is more appropriate to analyse their results separately.
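The qualification rules above can be sketched as a simple filter. The column names (runs.hcaps, runs.all, country, runner.h) follow those used in the accompanying R script, but the four trainers and their figures are hypothetical, so treat this as an illustration of the thresholds rather than the analysis itself.

```r
# Hypothetical trainer summary; the values are invented for illustration
trainers <- data.frame(
  tname      = c("T1", "T2", "T3", "T4"),
  country    = c("GB", "GB", "IRE", "GB"),
  runs.hcaps = c(120, 60, 200, 30),   # runs in handicaps
  runs.all   = c(300, 110, 400, 150), # runs in total
  runner.h   = c(85, 22, 90, 45),     # number of different horses raced
  stringsAsFactors = FALSE
)

minruns  <- 50  # minimum handicap runs; the total must be at least 2 * minruns
slcutoff <- 40  # yards with at least this many horses count as large

qualifies <- trainers$runs.hcaps >= minruns &
  trainers$runs.all >= 2 * minruns &
  trainers$country == "GB"
qualified <- trainers[qualifies, ]

large <- qualified[qualified$runner.h >= slcutoff, ]
small <- qualified[qualified$runner.h <  slcutoff, ]
```

Here T3 is excluded for being based outside GB and T4 for having too few handicap runs; T1 falls in the large group and T2 in the small group.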

Impact Values – Maidens

Maiden race IVs are likely to favour large yards with access to potential pattern class horses. Table 1 shows the top 10 yards ranked by Sire adjusted IV in maidens. Raw IVs are also reported. Note the dominance of the Richard Hannon yard in terms of number of winners and runners, and the small difference between its raw and Sire adjusted IVs compared with the larger differences for Saeed bin Suroor and William Haggas. The large number of horses at the Hannon yard appears to confer a substantial advantage in being able to place horses to good effect within maidens. The same comments apply to Richard Fahey's results. In both yards the large number of horses at their disposal appears to outweigh any advantage given to other yards via ostensibly better bred horses.

Rank  Trainer               Wins  Runs  IV raw  IV SA
1     Mrs K Burke             14    49    2.70   3.02
2     Saeed bin Suroor        33   121    2.58   1.93
3     Peter Chapple-Hyam      14    68    1.95   1.81
4     William Haggas          42   182    2.18   1.73
5     Henry Candy              9    63    1.35   1.63
6     Richard Hannon          82   471    1.64   1.58
7     Jeremy Noseda           18    98    1.74   1.57
8     John Quinn               7    39    1.70   1.55
9     Richard Fahey           35   225    1.47   1.49
10    David Simcock           20   108    1.75   1.42

Table 1: Top 10 training yards by Sire adjusted IV in maidens

Impact Values – Handicaps

Table 2 shows the top 10 yards ranked by Sire adjusted IV in handicaps. Raw IVs are also reported. Sir Mark Prescott Bt tops the table, although in common with the majority of the trainers in the top 10 his Sire adjusted IV is substantially lower than his raw IV. Noteworthy are the results of Chris Wall and Michael Appleby, whose IVs are hardly affected by the relative success of the sires of their horses in training. Part of this result is due to their lack of relative success in maidens, suggesting their horses are likely to be highly competitive when they move out of maidens into handicap company: Chris Wall's IV in maidens was 0.43, whilst Michael Appleby sent out no maiden winners in 2012. In contrast Sir Mark Prescott Bt, along with 7 other trainers, delivered IVs above 1 in both maiden and handicap company. The other 7 were Marcus Tregoning, Luca Cumani, Sir Michael Stoute, Ed Dunlop, James Fanshawe, Roger Varian and Mick Channon.

Rank  Trainer               Wins  Runs  IV raw  IV SA
1     Sir Mark Prescott Bt    31   131    2.45   1.98
2     Marcus Tregoning        16    83    1.99   1.88
3     Sir Michael Stoute      25   115    2.25   1.83
4     Luca Cumani             24   129    1.92   1.75
5     Chris Wall              18   104    1.79   1.74
6     Roger Varian            28   155    1.87   1.61
7     Peter Chapple-Hyam      10    63    1.64   1.53
8     William Haggas          30   166    1.87   1.52
9     Michael Appleby         24   162    1.53   1.52
10    Tom Dascombe            33   211    1.62   1.51

Table 2: Top 10 training yards by Sire adjusted IV in handicaps

Impact Values – Pattern Races

Table 3 shows the top 20 yards ranked by Sire adjusted IV in pattern races. Raw IVs are also reported. The results for individual trainers are more difficult to interpret than those for maidens and handicaps because of small sample sizes. The Richard Hannon and John Gosden yards dominate the table in terms of number of winners and runners; however, the Sire adjusted IVs for both trainers are noticeably lower than their raw IVs. It is possible this result is an artefact created by their substantial relative success in producing pattern class winners during the 2012 flat season. A number of yards that perform well on the IV measure in maiden company do not appear in the table below.

Rank  Trainer               Wins  Runs  IV raw  IV SA
1     Ann Duffield             2     4    4.47   6.12
2     Alan McCabe              1     5    1.79   2.92
3     David Simcock            2    14    1.28   2.05
4     Sir Henry Cecil         16    60    2.39   1.89
5     David O’Meara            4    20    1.79   1.89
6     Roger Charlton          10    46    1.94   1.84
7     David Barron             2    13    1.38   1.58
8     Roger Varian             9    52    1.55   1.56
9     Sir Michael Stoute       6    41    1.31   1.42
10    Richard Fahey            9    81    0.99   1.36
11    Mrs K Burke              1    12    0.75   1.32
12    Chris Wall               2    15    1.19   1.27
13    Clive Cox                7    35    1.79   1.27
14    Henry Candy              1    11    0.81   1.23
15    Richard Hannon          21   139    1.35   1.16
16    Mark Johnston            8    61    1.17   1.15
17    Marcus Tregoning         2    17    1.05   1.15
18    Luca Cumani              4    26    1.38   1.10
19    John Gosden             23   130    1.58   1.09
20    Mahmood Al Zarooni       9    69    1.17   1.06

Table 3: Top 20 training yards by Sire adjusted IV in pattern races

Impact Values – Composite Measure

A composite IV is calculated by combining the IVs for maidens, handicaps and pattern races by trainer, weighting by the proportion of runs that each trainer had in each category. Thus a trainer without runners in pattern races would not be penalised for non-participation, and the biggest contributor to each trainer's IV is the category of race in which they had the biggest proportion of runners. The composite measure was also adjusted for Sire IV. Using this measure Sir Mark Prescott Bt was the top trainer on the flat in 2012, followed by William Haggas and Marcus Tregoning. Noteworthy results were produced by Henry Candy, David Barron, Michael Appleby and Chris Wall, each of whom saw their IV increase after taking the Sire IV adjustment into account. For 16 of the 20 trainers we see the opposite, suggesting that the adjustment for bloodstock quality used here via a Sire adjusted IV does not go far enough. I will return to this subject in another blog article. Thanks to Declan Meagher and others for making this point on the separate blog post 'Do Small Training Yards Punch Above Their Weight?'.

Rank  Trainer               IV raw  IV SA
1     Sir Mark Prescott Bt    1.81   1.55
2     William Haggas          1.87   1.55
3     Marcus Tregoning        1.62   1.53
4     Saeed bin Suroor        1.86   1.52
5     Roger Varian            1.76   1.51
6     Peter Chapple-Hyam      1.59   1.51
7     Henry Candy             1.31   1.49
8     Sir Michael Stoute      1.91   1.48
9     Sir Henry Cecil         1.86   1.42
10    David Barron            1.25   1.42
11    Richard Hannon          1.52   1.41
12    Luca Cumani             1.59   1.40
13    Jeremy Noseda           1.53   1.39
14    Mrs K Burke             1.39   1.38
15    Michael Appleby         1.29   1.36
16    Chris Wall              1.34   1.35
17    Ralph Beckett           1.64   1.34
18    Roger Charlton          1.49   1.30
19    Tom Dascombe            1.37   1.29
20    David Simcock           1.34   1.27

Table 4: Top 20 training yards by composite IV adjusted for Sire
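The run-weighted composite described in this section can be sketched for a single trainer as follows. The helper name composite_iv and the figures are hypothetical; the weighting mirrors the raw composite in the accompanying script, which weights each category IV by the trainer's own runs in that category (the Sire adjusted version there uses Sire adjusted runs as weights instead).

```r
# Composite IV: each category IV weighted by the trainer's runs in that category
composite_iv <- function(iv, runs) {
  sum(iv * runs) / sum(runs)
}

# Hypothetical trainer with no pattern race runners
iv   <- c(mdns = 1.2, hcaps = 1.8, ptns = 0)
runs <- c(mdns = 50,  hcaps = 150, ptns = 0)

comp <- composite_iv(iv, runs)
print(comp)
```

Because the pattern category carries zero weight, the composite depends only on the maiden and handicap IVs, with handicaps counting three times as heavily as maidens for this trainer.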

Training Yards Success & Relationship with Sire Quality

How many training yards are able to deliver improved IVs after the Sire adjustment is taken into account? Remember that for successful yards the natural direction for the Sire adjustment to take an IV is downwards, because the better quality Sires make an outsized contribution in terms of siring winners. Yards that are able to increase their IVs after this adjustment is applied are therefore worthy of note. There are 10 yards out of the 66 (see Table 5 below) that were able to deliver an adjusted composite IV both greater than 1 and higher than their raw composite IV. Henry Candy and David Barron's results are noteworthy.

Rank  Trainer               IV comp  IV comp SA  Difference
1     Henry Candy              1.31        1.49        0.18
2     David Barron             1.25        1.42        0.18
3     Michael Appleby          1.29        1.36        0.07
4     Chris Wall               1.34        1.35        0.02
5     Brian Ellison            1.25        1.26        0.01
6     Kevin Ryan               1.11        1.22        0.11
7     James Given              1.01        1.15        0.14
8     John Quinn               1.12        1.15        0.03
9     Marco Botti              1.06        1.06        0.01
10    Alan Swinbank            1.00        1.02        0.01

Table 5: Top 10 trainers with improved IVs after Sire adjustment ranked on Sire adjusted IV
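The direction of the Sire adjustment can be illustrated with a toy example. The ten outings, two trainers (A and B) and two sires (X and Y) below are hypothetical; the steps follow the approach in the accompanying script, where each run is weighted by the IV of the horse's sire before the trainer IV is recomputed.

```r
# Hypothetical outings: trainer A mostly runs offspring of the stronger sire X
outings <- data.frame(
  trainer = c(rep("A", 5), rep("B", 5)),
  sire    = c("X", "X", "X", "X", "Y", "X", "Y", "Y", "Y", "Y"),
  winner  = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)
outings$runner <- 1

# Sire IVs: share of winners over share of runners, by sire
sire.wins <- tapply(outings$winner, outings$sire, sum)
sire.runs <- tapply(outings$runner, outings$sire, sum)
sire.IV   <- (sire.wins / sum(sire.wins)) / (sire.runs / sum(sire.runs))

# Sire adjusted runs: each run counts as the IV of the horse's sire
outings$runner.SA <- as.numeric(sire.IV[match(outings$sire, names(sire.IV))])

trainer.wins    <- tapply(outings$winner, outings$trainer, sum)
trainer.runs    <- tapply(outings$runner, outings$trainer, sum)
trainer.runs.SA <- tapply(outings$runner.SA, outings$trainer, sum)

trainer.IV    <- (trainer.wins / sum(trainer.wins)) / (trainer.runs / sum(trainer.runs))
trainer.IV.SA <- (trainer.wins / sum(trainer.wins)) / (trainer.runs.SA / sum(trainer.runs.SA))

print(round(cbind(raw = trainer.IV, SA = trainer.IV.SA), 2))
```

Trainer A, whose runners are mostly by the more successful sire, sees the IV fall after adjustment, while trainer B's IV rises: the same pattern the tables show for yards with better and more modestly bred horses.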

What of yards that see falls in their IVs after the Sire adjustment is applied? Table 6 ranks the 10 training yards most affected by the Sire IV adjustment. These yards are still highly successful: they still post IVs substantially greater than 1. However, this metric suggests that these training yards are more reliant than others on the quality of their bloodstock for their success.

Rank  Trainer               IV comp  IV comp SA
57    Roger Varian             1.76        1.51
58    Sir Mark Prescott Bt     1.81        1.55
59    James Fanshawe           1.50        1.24
60    Ralph Beckett            1.64        1.34
61    Mahmood Al Zarooni       1.51        1.19
62    William Haggas           1.87        1.55
63    Saeed bin Suroor         1.86        1.52
64    John Gosden              1.65        1.24
65    Sir Michael Stoute       1.91        1.48
66    Sir Henry Cecil          1.86        1.42

Table 6: Bottom 10 trainers with reduced IVs after Sire adjustment
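The ranking behind Tables 5 and 6 comes from the difference between the Sire adjusted and raw composite IVs. A short sketch, using four trainers' composite figures as reported in those tables (the column names are illustrative, and differences computed from rounded table values can differ in the last digit from those in Table 5):

```r
# Composite IVs as reported in Tables 5 and 6 (raw, then Sire adjusted)
comp <- data.frame(
  trainer   = c("Sir Henry Cecil", "Chris Wall", "Henry Candy", "John Gosden"),
  IVcomp    = c(1.86, 1.34, 1.31, 1.65),
  IVcomp.SA = c(1.42, 1.35, 1.49, 1.24),
  stringsAsFactors = FALSE
)

# Positive difference: the IV improves once sire quality is accounted for
comp$diff <- comp$IVcomp.SA - comp$IVcomp

# Rank from biggest improvement to biggest fall
comp <- comp[order(-comp$diff), ]
print(comp)
```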

Summary

In this post the criterion used for measuring training yard success is a Sire adjusted Impact Value derived from results delivered in maidens, handicaps and pattern races. Using this measure Sir Mark Prescott Bt was the top trainer on the flat in 2012. It is probable that the Sire IV adjustment used does not go far enough in terms of correcting for quality, and another blog post will address this point. A small number of trainers produce IVs that improve after an adjustment for Sire quality is made. These training yards are of particular interest.

Measuring Training Yard Success: R code

####################################################################
#
#
# Measuring Training Yard Success: Impact Values from Maidens, Handicaps and Pattern Races
# J. Hathorn
#
# v1.0
#
#
# written 18-Sep-13
#
#
###################################################################

#rm(list=ls())

library(foreign)
#library(maptools)  # not needed: nothing in this script calls maptools (read.dbf comes from foreign)

# read in database files from RI
#
setwd("C:/Program Files (x86)/RaceForm Interactive")

RIhorse.data <-read.dbf("horse.dbf")
RIouting.data <-read.dbf("outing.dbf")
RIrace.data <-read.dbf("race.dbf")
RIsire.data <-read.dbf("sire.dbf")
RItrainer.data <- read.dbf("trainer.dbf")
RIcourse.data<-read.dbf("course.dbf")

# #############################################################
# set date parameters to focus on races between chosen dates
# flat season Lincoln to the November Handicap
chosenDateSt<-c("2012-03-31")
chosenDateEd<-c("2012-11-10")

# set dates for determining yard sizes, set the previous year to the November Handicap
#chosenDateSt1<-c("2011-11-11")
#chosenDateEd1<-c("2012-11-10")
#################################################
#
# extract GB course id list from course db
z<-which(RIcourse.data$CCOUNTRY == "GB")

GBcourseids<-RIcourse.data$CID[z]
GBcoursenames<-RIcourse.data$CNAME[z]
#
# extract GB/IRE trainer lists from trainer db
z<-which(RItrainer.data$TCOUNTRY == "GB")
GBtrainers<-RItrainer.data$TID[z]
z<-which(RItrainer.data$TCOUNTRY == "IRE")
IREtrainers<-RItrainer.data$TID[z]
GBIREtrainers<-append(GBtrainers,IREtrainers)
#
#
##################################################

# select outings on the flat between the chosen dates
tmpidx<-which(RIouting.data$ODATE>=chosenDateSt & RIouting.data$ODATE<=chosenDateEd & RIouting.data$OFJ=="F")
T1.data<-RIouting.data[tmpidx,]

# match the course and add a country variable
z<-match(T1.data$OCOURSEID,RIcourse.data$CID)
T1.data$COCOUNTRY<-NA
T1.data$COCOUNTRY<-RIcourse.data$CCOUNTRY[z]

T1a.data<-T1.data
# select outings that took place on GB courses
tmpidx<-which(T1.data$COCOUNTRY == "GB")
T1.data<-T1.data[tmpidx,]

# #######################
# GB races only
# code to include IRE races if so desired
#tmpidx<-which(T1a.data$COCOUNTRY == "IRE")
#T2.data<-T1a.data[tmpidx,]
#T1.data<-rbind(T1.data,T2.data)

# ######################
# age restriction if required
# reduce T1 to horses of the desired age given by the parameter agecheck
#agecheck<-2
#tmpidx<-which(T1.data$OAGE==agecheck)
#T1.data<-T1.data[tmpidx,]

# match each horse into the RIhorse d f to get the sire id
z<-match(T1.data$OHORSEID,RIhorse.data$HID)
T1.data$SIREID<-RIhorse.data$HSIREID[z]

# attach some of the race conditions ie age, stakes, handicap etc from the RIrace d f
z<-match(T1.data$ORACEID,RIrace.data$RID)
T1.data$RCOND<-RIrace.data$RCOND[z]
T1.data$RAGE<-RIrace.data$RAGE[z]
T1.data$RANIMAL<-RIrace.data$RANIMAL[z]
T1.data$RPATTERN<-RIrace.data$RPATTERN[z]
T1.data$RISHCAP<-RIrace.data$RISHCAP[z]

# set a 1/0 variable for winners and a 1 variable for runners will help aggregation later
T1.data$runner<-1
T1.data$winner<-0
z<-which(T1.data$OPOS==1)
T1.data$winner[z]<-1

# SP turned into a probability
T1.data$SPprob<-1/(1+T1.data$OSPVAL)

# reformat the ORF rating variable
T1.data$ORF<-as.character(T1.data$ORF)
T1.data$ORF<-gsub("\\?","",T1.data$ORF)
T1.data$ORF<-gsub("\\+","",T1.data$ORF)
T1.data$ORF<-as.numeric(T1.data$ORF)
tmpidx<-which(T1.data$ORF==0)
T1.data$ORF[tmpidx]<-NA
# reformat the OJC rating variable
T1.data$OJC<-as.character(T1.data$OJC)
T1.data$OJC<-gsub("\\?","",T1.data$OJC)
T1.data$OJC<-gsub("\\+","",T1.data$OJC)
T1.data$OJC<-as.numeric(T1.data$OJC)
tmpidx<-which(T1.data$OJC==0)
T1.data$OJC[tmpidx]<-NA

####################################################
#
# produce summary variables by horse – runs/wins/ratings etc
#
hid<-tapply(T1.data$OHORSEID,T1.data$OHORSEID,mean,na.rm=TRUE)
hruns<-tapply(T1.data$runner,T1.data$OHORSEID,sum,na.rm=TRUE)
hwins<-tapply(T1.data$winner,T1.data$OHORSEID,sum,na.rm=TRUE)
hwinner<-pmin(1,hwins)
hrunner<-pmin(1,hruns)
hORmax<-tapply(T1.data$OJC,T1.data$OHORSEID,max,na.rm=TRUE)
hRFmax<-tapply(T1.data$ORF,T1.data$OHORSEID,max,na.rm=TRUE)
z<-which(hORmax==-Inf)
hORmax[z]<-NA
z<-which(hRFmax==-Inf)
hRFmax[z]<-NA

z<-match(hid,T1.data$OHORSEID)
htrainerid<-T1.data$OTRAINERID[z]
z<-match(hid,RIhorse.data$HID)
hname<-RIhorse.data$HNAME[z]
# put these into a d f
HSummary<-data.frame(hid,hname,htrainerid,hruns,hwins,hwinner,hrunner,hORmax,hRFmax)

# ###############
# produce population wide summary stats
univ.OR.med<-median(HSummary$hORmax,na.rm=TRUE)
univ.RF.med<-median(HSummary$hRFmax,na.rm=TRUE)
univ.RF.sd<-sd(HSummary$hRFmax,na.rm=TRUE)
univ.RF.1sdup<-univ.RF.med+univ.RF.sd
univ.winners<-sum(HSummary$hwinner)
univ.runners<-sum(HSummary$hrunner)
univ.winpct<-univ.winners/univ.runners

HSummary$hRF1sdup<-0
z<-which(HSummary$hRFmax>univ.RF.1sdup)
HSummary$hRF1sdup[z]<-1
univ.RF.1sduppct<-sum(HSummary$hRF1sdup)/univ.runners

# ######################################################
#
# now take the horse summary df and produce a trainer summary based upon the horse summary d f

trainer.h<-tapply(HSummary$htrainerid,HSummary$htrainerid,mean,na.rm=TRUE)
wins.h<-tapply(HSummary$hwins,HSummary$htrainerid,sum,na.rm=TRUE)
runs.h<-tapply(HSummary$hruns,HSummary$htrainerid,sum,na.rm=TRUE)
winspct.h<-wins.h/runs.h
winner.h<-tapply(HSummary$hwinner,HSummary$htrainerid,sum,na.rm=TRUE)
runner.h<-tapply(HSummary$hrunner,HSummary$htrainerid,sum,na.rm=TRUE)
winpct.h<-winner.h/runner.h
#ORmax.med.h<-tapply(HSummary$hORmax,HSummary$htrainerid,median,na.rm=TRUE)
RFmax.med.h<-tapply(HSummary$hRFmax,HSummary$htrainerid,median,na.rm=TRUE)
RFmax.sd.h<-tapply(HSummary$hRFmax,HSummary$htrainerid,sd,na.rm=TRUE)
RFmax.up1sd.h<-RFmax.med.h+RFmax.sd.h
RF.1sdup.h<-tapply(HSummary$hRF1sdup,HSummary$htrainerid,sum,na.rm=TRUE)
RF.1sduppct.h<-RF.1sdup.h/runner.h

TrainerHorses<-data.frame(trainer.h,wins.h,runs.h,winspct.h,winner.h,runner.h,winpct.h,RFmax.med.h,RFmax.sd.h,RFmax.up1sd.h,RF.1sdup.h,RF.1sduppct.h)
z<-match(TrainerHorses$trainer.h,RItrainer.data$TID)
TrainerHorses$tname.h<-RItrainer.data$TSTYLENAME[z]

# write out this d f to a CSV file
#fname<-"c:/Racing Research/Trainer Research/trainerhorses.csv"
#write.csv(TrainerHorses,file=fname)

# ####################################################
#
# now go back to the Outing d f and split into race categories to get IVs etc by trainer
#
# split the races into different data frames
# Maidens
# Handicaps
# Pattern

# ###################################################
#
# put maidens into their own d f
z<-which(T1.data$RANIMAL=="MDN")
T2.data<-T1.data[z,]

# set up Sire IVs in maidens
sire.ID<-tapply(T2.data$SIREID,T2.data$SIREID,mean,na.rm=TRUE)
sire.wins <- tapply(T2.data$winner,T2.data$SIREID,sum,na.rm=TRUE)
total.wins<-sum(sire.wins)
sire.runs <- tapply(T2.data$runner,T2.data$SIREID,sum,na.rm=TRUE)
total.runs<-sum(sire.runs)
sire.IV<-(sire.wins/total.wins)/(sire.runs/total.runs)

# bring sire IV back into the T2 d f and calc a sire adjusted run variable
z<-match(T2.data$SIREID,sire.ID)
T2.data$sire.IV<-sire.IV[z]
T2.data$runner.SA<-T2.data$runner*T2.data$sire.IV

# calc the trainer IVs in maidens
trainerID<-tapply(T2.data$OTRAINERID,T2.data$OTRAINERID,mean,na.rm=TRUE)
trainer.wins <- tapply(T2.data$winner,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.wins<-sum(trainer.wins)
trainer.runs <- tapply(T2.data$runner,T2.data$OTRAINERID,sum,na.rm=TRUE)
trainer.runs.SA <- tapply(T2.data$runner.SA,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.runs<-sum(trainer.runs)
total.runs.SA<-sum(trainer.runs.SA)
trainer.IV<-(trainer.wins/total.wins)/(trainer.runs/total.runs)
trainer.IV.SA<-(trainer.wins/total.wins)/(trainer.runs.SA/total.runs.SA)

#copy over to maiden specific variables
wins.mdns<-trainer.wins
runs.mdns<-trainer.runs
runs.mdns.SA<-trainer.runs.SA
IV.mdns<-trainer.IV
IV.SA.mdns<-trainer.IV.SA

# put into a maiden trainer summary d f
TrainerMdns<-data.frame(trainerID,wins.mdns,runs.mdns,runs.mdns.SA,IV.mdns,IV.SA.mdns)

# #####################################################
#
# put handicaps into their own d f
z<-which(T1.data$RISHCAP=="TRUE")
T2.data<-T1.data[z,]

# set up Sire IVs in handicaps
sire.ID<-tapply(T2.data$SIREID,T2.data$SIREID,mean,na.rm=TRUE)
sire.wins <- tapply(T2.data$winner,T2.data$SIREID,sum,na.rm=TRUE)
total.wins<-sum(sire.wins)
sire.runs <- tapply(T2.data$runner,T2.data$SIREID,sum,na.rm=TRUE)
total.runs<-sum(sire.runs)
sire.IV<-(sire.wins/total.wins)/(sire.runs/total.runs)

# bring sire IV back into the T2 d f and calc a sire adjusted run variable
z<-match(T2.data$SIREID,sire.ID)
T2.data$sire.IV<-sire.IV[z]
T2.data$runner.SA<-T2.data$runner*T2.data$sire.IV

# calc the trainer IVs in handicaps
trainerID<-tapply(T2.data$OTRAINERID,T2.data$OTRAINERID,mean,na.rm=TRUE)
trainer.wins <- tapply(T2.data$winner,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.wins<-sum(trainer.wins)
trainer.runs <- tapply(T2.data$runner,T2.data$OTRAINERID,sum,na.rm=TRUE)
trainer.runs.SA <- tapply(T2.data$runner.SA,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.runs<-sum(trainer.runs)
total.runs.SA<-sum(trainer.runs.SA)
trainer.IV<-(trainer.wins/total.wins)/(trainer.runs/total.runs)
trainer.IV.SA<-(trainer.wins/total.wins)/(trainer.runs.SA/total.runs.SA)

#copy over to handicap specific variables
wins.hcaps<-trainer.wins
runs.hcaps<-trainer.runs
runs.hcaps.SA<-trainer.runs.SA
IV.hcaps<-trainer.IV
IV.SA.hcaps<-trainer.IV.SA

# put into a handicap trainer summary d f
TrainerHcaps<-data.frame(trainerID,wins.hcaps,runs.hcaps,runs.hcaps.SA,IV.hcaps,IV.SA.hcaps)

# ###################################################
#
# put patterns into their own d f
z<-which(T1.data$RPATTERN !="NOT" & T1.data$RISHCAP == "FALSE")
T2.data<-T1.data[z,]

# set up Sire IVs in patterns
sire.ID<-tapply(T2.data$SIREID,T2.data$SIREID,mean,na.rm=TRUE)
sire.wins <- tapply(T2.data$winner,T2.data$SIREID,sum,na.rm=TRUE)
total.wins<-sum(sire.wins)
sire.runs <- tapply(T2.data$runner,T2.data$SIREID,sum,na.rm=TRUE)
total.runs<-sum(sire.runs)
sire.IV<-(sire.wins/total.wins)/(sire.runs/total.runs)

# bring sire IV back into the T2 d f and calc a sire adjusted run variable
z<-match(T2.data$SIREID,sire.ID)
T2.data$sire.IV<-sire.IV[z]
T2.data$runner.SA<-T2.data$runner*T2.data$sire.IV

# calc the trainer IVs in patterns
trainerID<-tapply(T2.data$OTRAINERID,T2.data$OTRAINERID,mean,na.rm=TRUE)
trainer.wins <- tapply(T2.data$winner,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.wins<-sum(trainer.wins)
trainer.runs <- tapply(T2.data$runner,T2.data$OTRAINERID,sum,na.rm=TRUE)
trainer.runs.SA <- tapply(T2.data$runner.SA,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.runs<-sum(trainer.runs)
total.runs.SA<-sum(trainer.runs.SA)
trainer.IV<-(trainer.wins/total.wins)/(trainer.runs/total.runs)
trainer.IV.SA<-(trainer.wins/total.wins)/(trainer.runs.SA/total.runs.SA)

#copy over to pattern specific variables
wins.ptns<-trainer.wins
runs.ptns<-trainer.runs
runs.ptns.SA<-trainer.runs.SA
IV.ptns<-trainer.IV
IV.SA.ptns<-trainer.IV.SA
# put into a pattern trainer summary d f
TrainerPtns<-data.frame(trainerID,wins.ptns,runs.ptns,runs.ptns.SA,IV.ptns,IV.SA.ptns)

# ###################################################
#
# merge the trainer summary d f s
#
Temp<-merge(TrainerMdns,TrainerHcaps,by.x="trainerID",by.y="trainerID",all.x=TRUE,all.y=TRUE)
Trainers<-merge(Temp,TrainerPtns,by.x="trainerID",by.y="trainerID",all.x=TRUE,all.y=TRUE)

z<-match(Trainers$trainerID,RItrainer.data$TID)
Trainers$tname<-RItrainer.data$TSTYLENAME[z]
Trainers$country<-RItrainer.data$TCOUNTRY[z]

# merge in the ratings d f
#
TrainersAll<-merge(Trainers,TrainerHorses,by.x="trainerID",by.y="trainer.h",all.x=TRUE,all.y=TRUE)
#
#
# clean up some of the variables, replace NA by 0
# (a trainer absent from a race category has no runs, wins or IV there)
#
zerovars<-c("runs.mdns","runs.hcaps","runs.ptns",
"runs.mdns.SA","runs.hcaps.SA","runs.ptns.SA",
"wins.mdns","wins.hcaps","wins.ptns",
"IV.mdns","IV.SA.mdns","IV.hcaps","IV.SA.hcaps","IV.ptns","IV.SA.ptns")
for (v in zerovars) {
z<-which(is.na(TrainersAll[[v]]))
TrainersAll[[v]][z]<-0
}

TrainersAll$wins.all<-TrainersAll$wins.mdns+TrainersAll$wins.hcaps+TrainersAll$wins.ptns
TrainersAll$runs.all<-TrainersAll$runs.mdns+TrainersAll$runs.hcaps+TrainersAll$runs.ptns
TrainersAll$runs.all.SA<-TrainersAll$runs.mdns.SA+TrainersAll$runs.hcaps.SA+TrainersAll$runs.ptns.SA

# produce summary stats
#
# composite IVs weighted by all runs in maidens, handicaps and pattern races

TrainersAll$IVcomp1<-(TrainersAll$IV.mdns*sum(TrainersAll$runs.mdns,na.rm=TRUE)+TrainersAll$IV.hcaps*sum(TrainersAll$runs.hcaps,na.rm=TRUE)
+TrainersAll$IV.ptns*sum(TrainersAll$runs.ptns,na.rm=TRUE))/(sum(TrainersAll$runs.mdns,na.rm=TRUE)+sum(TrainersAll$runs.hcaps,na.rm=TRUE)+sum(TrainersAll$runs.ptns,na.rm=TRUE))

TrainersAll$IVcomp1.SA<-(TrainersAll$IV.SA.mdns*sum(TrainersAll$runs.mdns.SA,na.rm=TRUE)+TrainersAll$IV.SA.hcaps*sum(TrainersAll$runs.hcaps.SA,na.rm=TRUE)
+TrainersAll$IV.SA.ptns*sum(TrainersAll$runs.ptns.SA,na.rm=TRUE))/(sum(TrainersAll$runs.mdns.SA,na.rm=TRUE)+sum(TrainersAll$runs.hcaps.SA,na.rm=TRUE)+sum(TrainersAll$runs.ptns.SA,na.rm=TRUE))

TrainersAll$IVcomp2<-(TrainersAll$IV.mdns*TrainersAll$runs.mdns+TrainersAll$IV.hcaps*TrainersAll$runs.hcaps
+TrainersAll$IV.ptns*TrainersAll$runs.ptns)/(TrainersAll$runs.mdns+TrainersAll$runs.hcaps+TrainersAll$runs.ptns)

TrainersAll$IVcomp2.SA<-(TrainersAll$IV.SA.mdns*TrainersAll$runs.mdns.SA+TrainersAll$IV.SA.hcaps*TrainersAll$runs.hcaps.SA
+TrainersAll$IV.SA.ptns*TrainersAll$runs.ptns.SA)/(TrainersAll$runs.mdns.SA+TrainersAll$runs.hcaps.SA+TrainersAll$runs.ptns.SA)

# difference variables, hcaps – mdns
TrainersAll$IVdiff.hcapsmdns<-TrainersAll$IV.hcaps-TrainersAll$IV.mdns
TrainersAll$IVdiff.hcapsmdns.SA<-TrainersAll$IV.SA.hcaps-TrainersAll$IV.SA.mdns

# quality differences using composites
TrainersAll$IVdiff.comp1.SAraw<-TrainersAll$IVcomp1.SA-TrainersAll$IVcomp1
TrainersAll$IVdiff.comp2.SAraw<-TrainersAll$IVcomp2.SA-TrainersAll$IVcomp2
# reduce the list to those trainers that have had >=50 runs in handicaps, are GB based and have had at least 2*50 runs in total
minruns<-50
z<-which(TrainersAll$runs.hcaps >= minruns & TrainersAll$country=="GB" & TrainersAll$runs.all >= 2*minruns)
Temp<-TrainersAll[z,]
TrainersAll50GB<-Temp[order(-Temp$IVcomp2.SA),]

# write out this d f to a CSV file
fname<-"c:/Racing Research/Trainer Research/trainersall50gb.csv"
write.csv(TrainersAll50GB,file=fname)

# split into small and large yards by number of different horses raced
slcutoff<-40
z<-which(TrainersAll50GB$runner.h < slcutoff)
TrainersSmall50GB<-TrainersAll50GB[z,]
fname<-"c:/Racing Research/Trainer Research/trainerssmall50gb.csv"
write.csv(TrainersSmall50GB,file=fname)
z<-which(TrainersAll50GB$runner.h >= slcutoff)
TrainersLarge50GB<-TrainersAll50GB[z,]
fname<-"c:/Racing Research/Trainer Research/trainerslarge50gb.csv"
write.csv(TrainersLarge50GB,file=fname)

Owners Facilities: What Makes For a Good One?

In the post this morning I’ve had a letter from ARC Racing in which various improvements for owners on racedays are highlighted. The letter set me thinking: what makes for a good owners facility? In broad terms two things matter most, excellent viewing and comfortable facilities. To expand on this, there are four criteria against which I’d judge whether a racecourse has a good owners facility.

Location

The owners facility should at the least either have paddock views or be located in the stands with uninterrupted views opposite, or near to opposite, the finish line. If the owners facility is located away from the track there should be an owners area located in the stands with uninterrupted views opposite, or near to opposite, the finish line.

Comfort

The owners facility should be large enough to fit the majority of owners and their guests seated.

Food & Drink

Food and drink should be available to a reasonable standard. Haute cuisine doesn’t have to be on offer; decent home cooking or a buffet is fine. I don’t mind paying as an owner if there is a decent selection on offer at a good value price. Tea and coffee should not come in paper cups: tea should be served from teapots and the coffee should not be instant.

Badge Requests

It’s understandable that meetings such as Glorious Goodwood or Royal Ascot have restrictions on the number of badges, extra badges and paddock passes. Both of these courses at the big meetings deal with owners’ requests efficiently and with the minimum of fuss. Not all meetings are in such demand for badges, and on these occasions flexibility on the part of courses is a plus.

Course Reviews

I’m going to post reviews of the courses I’ve visited in forthcoming blog pieces. More to follow!

Do Small Training Yards Punch Above Their Weight?: R Code

####################################################################
#
# R Code to accompany the blog post 'Do Small Training Yards Punch Above Their Weight?'

#

# SmallTrainerAnalysis.R
#
# J. Hathorn
#
# v1.0
#
#
# Code to look at whether small yards punch above their weight
#
# written 10-Sep-13
#
#
###################################################################

#rm(list=ls())

library(foreign)
library(maptools)

# read in database files from RI
#
setwd("C:/Program Files (x86)/RaceForm Interactive")

RIhorse.data <-read.dbf("horse.dbf")
RIouting.data <-read.dbf("outing.dbf")
RIrace.data <-read.dbf("race.dbf")
RIsire.data <-read.dbf("sire.dbf")
RItrainer.data <- read.dbf("trainer.dbf")
RIcourse.data<-read.dbf("course.dbf")

# #############################################################
# set date parameters to focus on races between chosen dates
# flat season Lincoln to the November Handicap
chosenDateSt<-c("2012-03-31")
chosenDateEd<-c("2012-11-10")

# set dates for determining yard sizes, set the previous year to the November Handicap
chosenDateSt1<-c("2011-11-11")
chosenDateEd1<-c("2012-11-10")
#################################################
#
# extract GB course id list from course db
z<-which(RIcourse.data$CCOUNTRY == "GB")

GBcourseids<-RIcourse.data$CID[z]
GBcoursenames<-RIcourse.data$CNAME[z]
#
# extract GB/IRE trainer lists from trainer db
z<-which(RItrainer.data$TCOUNTRY == "GB")
GBtrainers<-RItrainer.data$TID[z]
z<-which(RItrainer.data$TCOUNTRY == "IRE")
IREtrainers<-RItrainer.data$TID[z]
GBIREtrainers<-append(GBtrainers,IREtrainers)
#
#
##################################################

# select outings between the yard-size dates to categorise trainers (note: no flat/jumps filter is applied here)
tmpidx<-which(RIouting.data$ODATE>=chosenDateSt1 & RIouting.data$ODATE<=chosenDateEd1)
T1.data<-RIouting.data[tmpidx,]

# match the course and add a country variable
z<-match(T1.data$OCOURSEID,RIcourse.data$CID)
T1.data$COCOUNTRY<-NA
T1.data$COCOUNTRY<-RIcourse.data$CCOUNTRY[z]

T1a.data<-T1.data
# select outings that took place on GB and IRE courses and append the 2 d f
tmpidx<-which(T1.data$COCOUNTRY == "GB")
T1.data<-T1.data[tmpidx,]

tmpidx<-which(T1a.data$COCOUNTRY == "IRE")
T2.data<-T1a.data[tmpidx,]

T1.data<-rbind(T1.data,T2.data)
# put the horse and trainer IDs into unique variables and add trainer name and country and horse name
horse<-T1.data$OHORSEID
trainer<-T1.data$OTRAINERID
uhorse<-unique(horse)
z<-match(uhorse,horse)
utrainer<-trainer[z]

# match the trainer name/domicile set up domestic/foreign variable
z<-match(utrainer,RItrainer.data$TID)
utrainerhome<-RItrainer.data$TCOUNTRY[z]
utrainername<-RItrainer.data$TSTYLENAME[z]
# start from a full-length vector of zeros so every horse's trainer gets a value
domestic<-rep(0,length(utrainer))
z<-which(utrainerhome=="GB")
domestic[z]<-1
z<-which(utrainerhome=="IRE")
domestic[z]<-1

# match the horse name and calculate the horse age
z<-match(uhorse,RIhorse.data$HID)
uhorsename<-RIhorse.data$HNAME[z]
uhorsefdate<-RIhorse.data$HFOALDATE[z]
uhorseage<-(as.Date(chosenDateEd)-uhorsefdate)/365
# aggregate to num horses by trainer
trainerid<-tapply(utrainer,utrainer,mean)
numhorses<-tapply(uhorse,utrainer,length)
z<-match(trainerid,RItrainer.data$TID)
trainerctry<-RItrainer.data$TCOUNTRY[z]
# start from a full-length vector of zeros so it lines up with numhorses
trainerdomestic<-rep(0,length(trainerid))
z<-which(trainerctry=="GB")
trainerdomestic[z]<-1
z<-which(trainerctry=="IRE")
trainerdomestic[z]<-1
# allocate trainers to overseas/tiny/small/medium/large categories in the variable size
tiny<-5
sml<-25
med<-75
size<-rep(NA_character_,length(trainerid))

t1<-which(trainerdomestic==0)
size[t1]<-"OVS"

t1<-which(trainerdomestic==1 & numhorses<= tiny)
size[t1]<-"TINY"
t1<-which(trainerdomestic==1 & numhorses>tiny & numhorses <=sml)
size[t1]<-"SMALL"
t1<-which(trainerdomestic==1 & numhorses>sml & numhorses <=med)
size[t1]<-"MEDIUM"
t1<-which(trainerdomestic==1 & numhorses>med)
size[t1]<-"LARGE"

size<-as.factor(size)
# total/avg horses by size category
horsesbycat<-tapply(numhorses,size,sum)
avgbycat<-tapply(numhorses,size,mean,na.rm=TRUE)

# match the size by trainer back to the unique horse vector, will be useful later
#
z<-match(utrainer,trainerid)
utrainersize<-size[z]

###############################################
#
# go to the race file and calculate how many GB handicap and pattern races and runners in the period examined
#
tmpidx<-which(RIrace.data$RDATE>=chosenDateSt & RIrace.data$RDATE<=chosenDateEd & RIrace.data$RFJ=="F")
R1.data<-RIrace.data[tmpidx,]

z<-match(R1.data$RCOURSEID,RIcourse.data$CID)
R1.data$COCOUNTRY<-NA
R1.data$COCOUNTRY<-RIcourse.data$CCOUNTRY[z]

# select outings that took place on GB courses
tmpidx<-which(R1.data$COCOUNTRY == "GB")
R1.data<-R1.data[tmpidx,]

# match the winning horse into uhorse to find trainer id/name, trainer home and size category
z<-match(R1.data$RWINHRSID,uhorse)
R1.data$trainerid<-utrainer[z]
R1.data$trainername<-utrainername[z]
R1.data$trainersize<-utrainersize[z]
R1.data$trainerhome<-utrainerhome[z]
R1.data$horseage<-uhorseage[z]

# set up 1/0 values for aggregation later by trainer category
z<-which(R1.data$trainersize=="OVS")
R1.data$ovs<-NA
R1.data$ovs[z]<-1
z<-which(is.na(R1.data$ovs))
R1.data$ovs[z]<-0

z<-which(R1.data$trainersize=="TINY")
R1.data$tiny<-NA
R1.data$tiny[z]<-1
z<-which(is.na(R1.data$tiny))
R1.data$tiny[z]<-0

z<-which(R1.data$trainersize=="SMALL")
R1.data$sml<-NA
R1.data$sml[z]<-1
z<-which(is.na(R1.data$sml))
R1.data$sml[z]<-0

z<-which(R1.data$trainersize=="MEDIUM")
R1.data$med<-NA
R1.data$med[z]<-1
z<-which(is.na(R1.data$med))
R1.data$med[z]<-0

z<-which(R1.data$trainersize=="LARGE")
R1.data$lge<-NA
R1.data$lge[z]<-1
z<-which(is.na(R1.data$lge))
R1.data$lge[z]<-0
# match the winning horse into the RIhorse d f to get the sire id
z<-match(R1.data$RWINHRSID,RIhorse.data$HID)
R1.data$WINSIREID<-RIhorse.data$HSIREID[z]

# produce pattern-only and handicap-only data frames
#
z<-which(R1.data$RPATTERN != "NOT" & R1.data$RISHCAP=="FALSE")
Patterns<-R1.data[z,]

z<-which(R1.data$RISHCAP== "TRUE")
Hcaps<-R1.data[z,]

# summarise winners for patterns/hcaps by sire ID in total and by trainer category
#
PatternWinsSireIDTmp<-tapply(Patterns$WINSIREID,Patterns$WINSIREID,mean)
PatternWinsSireTmp<-tapply(Patterns$WINSIREID,Patterns$WINSIREID,length)
PatternWinsSireOvsTmp<-tapply(Patterns$ovs,Patterns$WINSIREID,sum)
PatternWinsSireTinyTmp<-tapply(Patterns$tiny,Patterns$WINSIREID,sum)
PatternWinsSireSmlTmp<-tapply(Patterns$sml,Patterns$WINSIREID,sum)
PatternWinsSireMedTmp<-tapply(Patterns$med,Patterns$WINSIREID,sum)
PatternWinsSireLgeTmp<-tapply(Patterns$lge,Patterns$WINSIREID,sum)

HcapWinsSireIDTmp<-tapply(Hcaps$WINSIREID,Hcaps$WINSIREID,mean)
HcapWinsSireTmp<-tapply(Hcaps$WINSIREID,Hcaps$WINSIREID,length)
HcapWinsSireOvsTmp<-tapply(Hcaps$ovs,Hcaps$WINSIREID,sum)
HcapWinsSireTinyTmp<-tapply(Hcaps$tiny,Hcaps$WINSIREID,sum)
HcapWinsSireSmlTmp<-tapply(Hcaps$sml,Hcaps$WINSIREID,sum)
HcapWinsSireMedTmp<-tapply(Hcaps$med,Hcaps$WINSIREID,sum)
HcapWinsSireLgeTmp<-tapply(Hcaps$lge,Hcaps$WINSIREID,sum)

# summarise winners for patterns/hcaps by trainer size
PatternWinsTrainers<-tapply(Patterns$trainersize,Patterns$trainersize,length)
HcapWinsTrainers<-tapply(Hcaps$trainersize,Hcaps$trainersize,length)

#########################
#
# find out number of runs by category for each race type
#
z<-match(T1.data$OTRAINERID,utrainer)
T1.data$trainername<-utrainername[z]
T1.data$trainersize<-utrainersize[z]
T1.data$trainerhome<-utrainerhome[z]
# horse age must be looked up by horse, not by trainer
zh<-match(T1.data$OHORSEID,uhorse)
T1.data$horseage<-uhorseage[zh]

# set up 1/0 values for aggregation later by trainer category
z<-which(T1.data$trainersize=="OVS")
T1.data$ovs<-NA
T1.data$ovs[z]<-1
z<-which(is.na(T1.data$ovs))
T1.data$ovs[z]<-0

z<-which(T1.data$trainersize=="TINY")
T1.data$tiny<-NA
T1.data$tiny[z]<-1
z<-which(is.na(T1.data$tiny))
T1.data$tiny[z]<-0

z<-which(T1.data$trainersize=="SMALL")
T1.data$sml<-NA
T1.data$sml[z]<-1
z<-which(is.na(T1.data$sml))
T1.data$sml[z]<-0

z<-which(T1.data$trainersize=="MEDIUM")
T1.data$med<-NA
T1.data$med[z]<-1
z<-which(is.na(T1.data$med))
T1.data$med[z]<-0

z<-which(T1.data$trainersize=="LARGE")
T1.data$lge<-NA
T1.data$lge[z]<-1
z<-which(is.na(T1.data$lge))
T1.data$lge[z]<-0

# bring in race types

z<-match(T1.data$ORACEID,R1.data$RID)
T1.data$RPATTERN<-R1.data$RPATTERN[z]
T1.data$RISHCAP<-R1.data$RISHCAP[z]

# match each horse into the RIhorse d f to get the sire id
z<-match(T1.data$OHORSEID,RIhorse.data$HID)
T1.data$SIREID<-RIhorse.data$HSIREID[z]

# produce pattern-only and handicap-only outing data frames
#
z<-which(T1.data$RPATTERN != "NOT" & T1.data$RISHCAP=="FALSE")
OutPatterns<-T1.data[z,]

z<-which(T1.data$RISHCAP== "TRUE")
OutHcaps<-T1.data[z,]

# summarise runners for patterns/hcaps by trainer size
PatternRunsTrainers<-tapply(OutPatterns$trainersize,OutPatterns$trainersize,length)
HcapRunsTrainers<-tapply(OutHcaps$trainersize,OutHcaps$trainersize,length)

# summarise runners for patterns/hcaps by sire ID
PatternRunsSireID<-tapply(OutPatterns$SIREID,OutPatterns$SIREID,mean)
PatternRunsSire<-tapply(OutPatterns$SIREID,OutPatterns$SIREID,length)
PatternRunsSireOvs<-tapply(OutPatterns$ovs,OutPatterns$SIREID,sum)
PatternRunsSireTiny<-tapply(OutPatterns$tiny,OutPatterns$SIREID,sum)
PatternRunsSireSml<-tapply(OutPatterns$sml,OutPatterns$SIREID,sum)
PatternRunsSireMed<-tapply(OutPatterns$med,OutPatterns$SIREID,sum)
PatternRunsSireLge<-tapply(OutPatterns$lge,OutPatterns$SIREID,sum)

HcapRunsSireID<-tapply(OutHcaps$SIREID,OutHcaps$SIREID,mean)
HcapRunsSire<-tapply(OutHcaps$SIREID,OutHcaps$SIREID,length)
HcapRunsSireOvs<-tapply(OutHcaps$ovs,OutHcaps$SIREID,sum)
HcapRunsSireTiny<-tapply(OutHcaps$tiny,OutHcaps$SIREID,sum)
HcapRunsSireSml<-tapply(OutHcaps$sml,OutHcaps$SIREID,sum)
HcapRunsSireMed<-tapply(OutHcaps$med,OutHcaps$SIREID,sum)
HcapRunsSireLge<-tapply(OutHcaps$lge,OutHcaps$SIREID,sum)

# calc the average age of the horses run by each trainer category in handicaps
#
HcapAge.ovs<-sum(OutHcaps$ovs*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$ovs),na.rm=TRUE)
HcapAge.tiny<-sum(OutHcaps$tiny*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$tiny),na.rm=TRUE)
HcapAge.sml<-sum(OutHcaps$sml*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$sml),na.rm=TRUE)
HcapAge.med<-sum(OutHcaps$med*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$med),na.rm=TRUE)
HcapAge.lge<-sum(OutHcaps$lge*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$lge),na.rm=TRUE)
#
PtrnAge.ovs<-sum(OutPatterns$ovs*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$ovs),na.rm=TRUE)
PtrnAge.tiny<-sum(OutPatterns$tiny*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$tiny),na.rm=TRUE)
PtrnAge.sml<-sum(OutPatterns$sml*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$sml),na.rm=TRUE)
PtrnAge.med<-sum(OutPatterns$med*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$med),na.rm=TRUE)
PtrnAge.lge<-sum(OutPatterns$lge*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$lge),na.rm=TRUE)

 
# as the winners tapply won’t include all runners, match back into winner variables to line up winners with runners
z<-match(PatternRunsSireID,PatternWinsSireIDTmp)
PatternWinsSire<-PatternWinsSireTmp[z]
PatternWinsSireOvs<-PatternWinsSireOvsTmp[z]
PatternWinsSireTiny<-PatternWinsSireTinyTmp[z]
PatternWinsSireSml<-PatternWinsSireSmlTmp[z]
PatternWinsSireMed<-PatternWinsSireMedTmp[z]
PatternWinsSireLge<-PatternWinsSireLgeTmp[z]

z<-which(is.na(PatternWinsSire))
PatternWinsSire[z]<-0
z<-which(is.na(PatternWinsSireOvs))
PatternWinsSireOvs[z]<-0
z<-which(is.na(PatternWinsSireTiny))
PatternWinsSireTiny[z]<-0
z<-which(is.na(PatternWinsSireSml))
PatternWinsSireSml[z]<-0
z<-which(is.na(PatternWinsSireMed))
PatternWinsSireMed[z]<-0
z<-which(is.na(PatternWinsSireLge))
PatternWinsSireLge[z]<-0

# repeat for handicaps

z<-match(HcapRunsSireID,HcapWinsSireIDTmp)
HcapWinsSire<-HcapWinsSireTmp[z]
HcapWinsSireOvs<-HcapWinsSireOvsTmp[z]
HcapWinsSireTiny<-HcapWinsSireTinyTmp[z]
HcapWinsSireSml<-HcapWinsSireSmlTmp[z]
HcapWinsSireMed<-HcapWinsSireMedTmp[z]
HcapWinsSireLge<-HcapWinsSireLgeTmp[z]

z<-which(is.na(HcapWinsSire))
HcapWinsSire[z]<-0
z<-which(is.na(HcapWinsSireOvs))
HcapWinsSireOvs[z]<-0
z<-which(is.na(HcapWinsSireTiny))
HcapWinsSireTiny[z]<-0
z<-which(is.na(HcapWinsSireSml))
HcapWinsSireSml[z]<-0
z<-which(is.na(HcapWinsSireMed))
HcapWinsSireMed[z]<-0
z<-which(is.na(HcapWinsSireLge))
HcapWinsSireLge[z]<-0

# calc IVs per sire
PatternIVsire<-(PatternWinsSire/sum(PatternWinsSire))/(PatternRunsSire/sum(PatternRunsSire))
HcapIVsire<-(HcapWinsSire/sum(HcapWinsSire))/(HcapRunsSire/sum(HcapRunsSire))

# calc IV adjusted runners by sire
#
PatternRunsSire.IV<- PatternRunsSire*PatternIVsire
PatternRunsSireOvs.IV<- PatternRunsSireOvs*PatternIVsire
PatternRunsSireTiny.IV<- PatternRunsSireTiny*PatternIVsire
PatternRunsSireSml.IV<- PatternRunsSireSml*PatternIVsire
PatternRunsSireMed.IV<- PatternRunsSireMed*PatternIVsire
PatternRunsSireLge.IV<- PatternRunsSireLge*PatternIVsire

HcapRunsSire.IV<- HcapRunsSire*HcapIVsire
HcapRunsSireOvs.IV<- HcapRunsSireOvs*HcapIVsire
HcapRunsSireTiny.IV<- HcapRunsSireTiny*HcapIVsire
HcapRunsSireSml.IV<- HcapRunsSireSml*HcapIVsire
HcapRunsSireMed.IV<- HcapRunsSireMed*HcapIVsire
HcapRunsSireLge.IV<- HcapRunsSireLge*HcapIVsire

 

Do Small Training Yards Punch Above Their Weight?

Introduction

When George Margarson’s Lucky Kristale won the Group 2 Duchess of Cambridge Stakes in July 2013 it was a newsworthy event, not only because she won at 20-1 but because she is trained at a yard that has so far sent out fewer than 20 different horses to race on the flat in 2013. Lucky Kristale’s subsequent win in the Group 2 Lowther Stakes at York showed her Newmarket win to be no fluke, with an engagement in the Group 1 Cheveley Park Stakes likely to be next on the agenda. Small training yards don’t often win Pattern races in the UK. Of the 266 such races that took place on the flat in 2012, just 6 were won by yards that had fewer than 25 horses in training. So how well do small yards perform? Is the flexibility of training a small string outweighed by the advantages of having a large number of horses in training? Are smaller yards able to judge when they have a horse that is capable of winning a Pattern race, and how good are they at placing their horses in handicaps? In short, do small training yards punch above their weight?

These questions are best answered by considering yards in aggregate. It is difficult to draw strong conclusions about an individual trainer’s ability when they don’t have many horses in training. However, we can classify trainers by yard size and then examine how each yard classification – small, medium and large – performs. Because each category contains a large number of horses, the conclusions should carry some statistical weight, provided the question is framed correctly.

The analysis that underpins this piece was carried out in the R statistical environment accessing Raceform Interactive data. The R code is posted elsewhere for interested readers.

 

Training Yard Classification

Training yards were classified as Tiny, Small, Medium and Large using the criteria in Table 1 below. The Tiny category was included so that results for the Small category were not influenced by yards that have only the occasional runner. All races under both flat and NH codes that took place in Great Britain (GB) and Ireland in the 12 months to the date of the 2012 November Handicap were considered. The number of different horses that ran in this 12-month period determined the size classification of each trainer. An Overseas category is included so that the occasional runner from abroad is not misclassified.

Yard Size   Horses in Training         Yards   Number of Horses   Average
Tiny        fewer than 5                 957              1,915         2
Small       between 5 and 25             558              7,253        13
Medium      between 25 and 75            203              8,533        42
Large       more than 75                  56              6,967       124
Overseas    based outside GB/Ireland

Table 1: Yard classification, yard and horse numbers

Table 1 shows that a substantial number of horses are in training in small yards. In aggregate there are more horses in training in Small/Tiny yards than in any other category. This suggests that smaller yards have a good degree of success in attracting and retaining horses in training.
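
The classification rule described above can be sketched in a few lines of R. This is a minimal illustration rather than the code used for the analysis: the function name classify_yard and the toy data are made up, and the upper bound of each band is assumed to be inclusive (so a yard with exactly 5 horses counts as Tiny).

```r
# Hypothetical helper: classify yards by the number of different horses raced.
# Thresholds follow Table 1 (5, 25 and 75); upper bounds are assumed inclusive.
classify_yard <- function(num_horses, domestic, tiny = 5, sml = 25, med = 75) {
  size <- as.character(cut(num_horses,
                           breaks = c(-Inf, tiny, sml, med, Inf),
                           labels = c("TINY", "SMALL", "MEDIUM", "LARGE")))
  size[!domestic] <- "OVS"  # yards based outside GB/Ireland
  factor(size, levels = c("TINY", "SMALL", "MEDIUM", "LARGE", "OVS"))
}

# Toy data, invented for illustration only
num_horses <- c(2, 5, 13, 42, 124, 30)
domestic   <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
yard_size  <- classify_yard(num_horses, domestic)
print(table(yard_size))
```

Using cut() keeps the band boundaries in one place, so changing a threshold does not require editing several separate conditions.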

Pattern Race Analysis

With yards now classified by size, the results of all 266 Pattern races that took place on the Flat in 2012 were examined by yard size and are presented in Table 2. The number of winners and runners and Impact Values (IV) are presented. Impact Values are defined as IV= %winners/%runners and represent opportunity adjusted performance. An IV of 1 represents what you would expect given the opportunity, less than 1 is worse than expected, greater than 1 better than expected.

Yard Size   Winners   Runners   %Winners   %Runners   IV
Tiny              0        14       0.0%       0.6%   0.00
Small             5       145       1.9%       6.0%   0.31
Medium           57       736      21.4%      30.7%   0.70
Large           194     1,437      72.9%      59.9%   1.22
Overseas         10        65       3.8%       2.7%   1.39
TOTAL           266     2,397

Table 2: Pattern race results 2012 wins/runs/IVs by stable classification
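
As a check on the definition IV = %winners/%runners, the IV column can be recomputed from the winner and runner counts in Table 2. The sketch below is illustrative only; the function name impact_value is an assumption, not part of the original analysis code.

```r
# Impact Value: share of winners divided by share of runners
impact_value <- function(wins, runs) {
  (wins / sum(wins)) / (runs / sum(runs))
}

# Winner and runner counts taken from Table 2
wins <- c(Tiny = 0, Small = 5, Medium = 57, Large = 194, Overseas = 10)
runs <- c(Tiny = 14, Small = 145, Medium = 736, Large = 1437, Overseas = 65)

iv <- round(impact_value(wins, runs), 2)
print(iv)
```

Because both shares are taken over the same set of races, the run-weighted average of the IVs across all categories is always 1.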

Small yards were not well represented in Pattern races in 2012, comprising fewer than 7% of runners. In addition, when small yards did have runners, they did not win as often as would be expected, posting an IV of 0.31. Medium sized yards did not win Pattern races as often as would be expected either, posting an IV of 0.70. Larger yards, whilst having the largest proportion of runners, delivered more winners than would be expected, with an IV of 1.22. From a small sample, overseas yards were adept at targeting GB Pattern races in 2012, with an IV of 1.39.

The information in Table 2 does not take into account the quality of the horses in each category of yard. Larger yards are likely to have better quality horses and are thus more likely to win Pattern races. There are a number of ways in which this quality bias could be corrected. One option would be to take into account the cost of the horses in each type of yard. However, not all horses pass through the sales ring, so any adjustment based upon sales information would be incomplete. Another option, adopted here, is to adjust the number of runners in each stable size category by the Impact Value of the sire of each of the runners in Pattern races in 2012. Thus if Galileo’s stock had an IV of 2, and his stock ran 10 times in Pattern races, the number of runs would be adjusted to give a sire adjusted run number of 20. Since the successful sires have a higher representation at the larger training yards, this approach takes into account the lower probability of small yards winning Pattern races due to the breeding of their horses in training. The effect of this adjustment is to increase the number of runners (and thus decrease the IV) for stables with horses by successful sires and decrease the number of runners (and thus increase the IV) for stables with horses by less successful sires.
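
The sire adjustment can be illustrated with a small worked example in R. The data frame below is invented for illustration (two sires, two yard sizes, ten runs) and the variable names are assumptions; the sketch simply weights each run by the IV of the runner's sire and then recomputes the IV by yard size.

```r
# Toy outings, invented for illustration: one row per run
runs <- data.frame(
  sire = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  size = c("LARGE", "LARGE", "LARGE", "SMALL", "SMALL",
           "SMALL", "SMALL", "LARGE", "SMALL", "SMALL"),
  won  = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 0)
)

# Impact Value per sire: share of winners / share of runners
sire_wins <- tapply(runs$won, runs$sire, sum)
sire_runs <- tapply(runs$won, runs$sire, length)
sire_iv   <- (sire_wins / sum(sire_wins)) / (sire_runs / sum(sire_runs))

# Weight each run by the IV of the runner's sire
runs$adj <- as.numeric(sire_iv[as.character(runs$sire)])

# Aggregate by yard size and compute raw and sire adjusted IVs
wins_by_size <- tapply(runs$won, runs$size, sum)
runs_by_size <- tapply(runs$won, runs$size, length)
adj_by_size  <- tapply(runs$adj, runs$size, sum)

iv_raw <- (wins_by_size / sum(wins_by_size)) / (runs_by_size / sum(runs_by_size))
iv_adj <- (wins_by_size / sum(wins_by_size)) / (adj_by_size / sum(adj_by_size))
print(round(cbind(runs_by_size, adj_by_size, iv_raw, iv_adj), 2))
```

In this toy example the large yards run mostly the stock of the more successful sire, so their adjusted runner count rises and their IV falls, while the opposite happens for the small yards. The adjusted runner counts sum to the same total as the raw counts, since the sire IVs average to 1 when weighted by runs.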

Yard Size   Winners   Runners   Runners Adjusted   IV raw   IV adjusted
Tiny              0        14                 12     0.00          0.00
Small             5       145                 96     0.31          0.47
Medium           57       736                638     0.70          0.81
Large           194     1,437              1,567     1.22          1.12
Overseas         10        65                 85     1.39          1.06
TOTAL           266     2,397              2,397

Table 3: Pattern race results 2012 wins/runs/IVs/adjusted IVs by stable classification

The Sire adjustment has reduced the number of runners from small yards from 145 to 96. Medium sized yards also receive some relief. However the adjustment is not enough to take the IVs for small and medium sized yards to 1, with small yards now reporting an IV of 0.47 and medium yards 0.81. Larger yards still have more winners than expected relative to small and medium sized yards, even after making an adjustment for the quality of the horses in each yard category.

Handicap Race Analysis

Are the results seen for Pattern races replicated in Handicaps? Since handicaps are a test of the best horse at the weights, an additional set of skills is brought to bear in placing horses in them. Quality of horse should be less important in these types of race.

Yard Size   Winners   Runners   %Winners   %Runners   IV
Tiny             33       524       1.1%       1.6%   0.65
Small           644     8,406      21.0%      26.4%   0.80
Medium        1,256    13,129      41.0%      41.3%   0.99
Large         1,128     9,741      36.9%      30.6%   1.20
Overseas          0         8       0.0%       0.0%   0.00
TOTAL         3,061    31,808

Table 4: Handicap race results 2012 wins/runs/IVs by stable classification

In handicaps small yards have an IV of 0.80, so 20% fewer winners than expected. Medium sized yards deliver wins in line with expectations, whilst large yards deliver more wins in handicaps than expected with an IV of 1.20. Results when an adjustment for horse quality via Sire Impact Values is applied are reported in Table 5 below.

Yard Size   Winners   Runners   Runners Adjusted   IV raw   IV adjusted
Tiny             33       524                463     0.65          0.74
Small           644     8,406              8,016     0.80          0.83
Medium        1,256    13,129             13,051     0.99          1.00
Large         1,128     9,741             10,273     1.20          1.14
Overseas          0         8                  5     0.00          0.00
TOTAL         3,061    31,808             31,808

Table 5: Handicap race results 2012 wins/runs/IVs/adjusted IVs by stable classification

Whilst there is some improvement in Impact Values for smaller yards, the IV of 0.83 is equivalent to 17% fewer winners than expected. Medium sized yards again deliver wins in-line with expectations, whilst large yards deliver more wins in handicaps than expected with an IV of 1.14.

Average Horse Age & Yard Classification

A possible explanation for smaller yards posting IVs lower than 1 in handicaps is that they keep a greater proportion of exposed horses in training. A proxy for an exposed horse is its age. The average horse age by stable classification, split by Pattern races and Handicaps, is given in Table 6 below.

 

Yard Size   Pattern Horse Age (average)   Handicaps Horse Age (average)
Tiny                                4.9                             4.9
Small                               5.7                             6.5
Medium                              5.0                             5.5
Large                               4.2                             5.2
Overseas                            5.1                             6.0

Table 6: Average horse age by stable classification

The table confirms that small yards have, on average, older horses in training than medium and large yards. This is the case for both Pattern and Handicap races. Whilst an age difference would only favour younger horses in Pattern races if the WFA scale is incorrect, the difference in horse age in handicaps by yard classification suggests that small yards are running more exposed horses than larger yards, and this is a contributory factor in them posting IVs less than 1 in such races.
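
The average age calculation can be sketched as follows. The outings, foal dates and cut-off date below are invented for illustration, and age is approximated as days since the foal date divided by 365. Note that averaging over outings, as here, gives more weight to horses that run more often.

```r
# Illustrative cut-off date and toy outings (all values are assumptions)
end_date <- as.Date("2012-11-10")
outings <- data.frame(
  size      = c("SMALL", "SMALL", "LARGE", "LARGE"),
  foal_date = as.Date(c("2006-11-10", "2007-11-10", "2008-11-10", "2009-11-10"))
)

# Approximate age in years at the cut-off date
outings$age <- as.numeric(end_date - outings$foal_date) / 365

# Average age per run within each yard size category
avg_age <- tapply(outings$age, outings$size, mean)
print(round(avg_age, 1))
```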

Summary

So do small training yards punch above their weight? There are a large number of small training yards in Great Britain and Ireland. In 2012 they were responsible, in aggregate, for about a quarter of the runners in flat handicaps. Small yard representation in Pattern races in 2012 was far less, accounting for fewer than 7% of runners. Moreover, given this number of runners, the percentage of winners from small yards was less than might be expected, even after a correction for horse quality is applied. The Impact Value for small training yards was 0.47 in Pattern races, although it is possible that the correction for horse quality applied, via Sire Impact Values, does not go far enough. It could be that the best offspring of a Sire end up at the larger yards and the smaller yards end up with (say) the less good Galileo yearlings. The correction used would not account for this.

In Handicaps the Impact Value for smaller yards was 0.83 in 2012, whereas larger yards posted an IV of 1.14. One explanation for the difference in performance in handicaps is that the smaller yards are running more exposed horses. The difference in horse age across yard size suggests this is the case. Another explanation for the performance difference is that there is substantial value in having more horses in training because it enables the trainer to categorise his horses more accurately, which leads to better placing.

The results presented suggest that it is the large training yards that are the ones punching above their weight. Training a large number of horses in one yard, whilst being able to keep the average horse age lower than smaller yards, appears to confer a substantial advantage in terms of the results produced on the racecourse.