High Class Novice Chase Candidates: Numbers, Yard Concentration 2008-13

In the last week Nicky Henderson complained about the programme for Novice Chasers, his comments culminating with the line “And that’s why there will be no chasers in three or four years time”. A forthright summary of his comments can be found on Dan Kelly’s excellent blog, which first covers the ongoing concerns about the Betfair Chase distance, then moves on to the Novice Chase programme in the context of Nicky Henderson’s comments. So, leaving the programme book aside, how does the pipeline of high class horses going Novice Chasing look year by year?

Using Racing Post Ratings (RPR), the number of horses rated 145+, 150+, 155+ and 160+ is given in Table 1 below for each of the years 2008-2013 inclusive. To qualify, horses must be with GB based trainers, never have run in a Chase, have achieved the rating at a GB track and have run within twelve months of the end of April of each of the years considered. These filters are designed to capture high class Hurdlers that are candidates for Novice Chasing. The filters will include Hurdlers that won’t go Chasing, and exclude recruits to Novice Chasing from overseas, so the list isn’t complete. Still, these effects should be the same year on year and not affect a year on year comparison. Table 1 shows the pool of candidate horses rated 145+ has varied between 46 and 70 in the last six years, with no clear trend. The numbers for 2013 suggest a healthy pool of candidate horses for Novice Chasing relative to the recent past.

Year RPR 145+ RPR 150+ RPR 155+ RPR 160+
2008 46 29 15 9
2009 70 41 23 10
2010 59 34 18 11
2011 63 38 21 13
2012 55 35 18 13
2013 64 40 22 13

Table 1: High Class Novice Chase Candidates 2008-13

Using horses rated 145+, how has the concentration of horses by training yard changed over the last six years? Table 2 shows the number of training yards that have one only, 2 to 5, and 5 plus high class Novice Chase candidates. In 2008 17 yards had one candidate; in 2013 there were 19 such yards. No real pattern exists year by year. However, it is in the yards with more than one candidate that the picture has changed. In 2009 there were 11 yards with 2 to 5 candidates. By 2013 this had dropped to just four yards, while the number of yards with 5 plus candidates had risen from one in 2008 to four. The view that high class Novice Chase candidates have become increasingly concentrated at the largest training yards is borne out by the data. Table 3 shows the same information but represented by total number of horses. The number of candidates in 2013 at smaller yards (29) is the lowest it has been in the last six years and the number in the larger yards (35) the highest. Yard concentration has increased.

Year    1 horse only rated 145+    2 to 5 horses rated 145+    5 plus horses rated 145+
2008    17                         7                           1
2009    13                         11                          3
2010    17                         8                           2
2011    19                         6                           3
2012    11                         8                           3
2013    19                         4                           4

Table 2: Number of yards with Novice Chase candidates rated 145+

Year    up to 5 horses    5+ horses    Total horses rated 145+
2008    39                7            46
2009    39                31           70
2010    41                18           59
2011    37                26           63
2012    31                24           55
2013    29                35           64

Table 3: Novice Chase Candidates Yard Concentration
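
As a rough check on the concentration claim, the Table 3 figures can be turned into the share of 145+ candidates housed at yards with 5+ such horses. This is a minimal Python sketch (the original analysis was not done this way); the function name `large_yard_share` is purely illustrative, and the only inputs are the numbers printed in Table 3.

```python
# Table 3 figures: year -> (horses at yards with up to 5, horses at yards with 5+)
table3 = {
    2008: (39, 7),
    2009: (39, 31),
    2010: (41, 18),
    2011: (37, 26),
    2012: (31, 24),
    2013: (29, 35),
}


def large_yard_share(small, large):
    """Percentage of 145+ candidates housed at yards with 5+ such horses."""
    return 100.0 * large / (small + large)


# Share by year, rounded to one decimal place
shares = {
    year: round(large_yard_share(small, large), 1)
    for year, (small, large) in table3.items()
}

for year, share in shares.items():
    print(year, f"{share:.1f}%")
```

On these figures 2013 is the only year in which more than half of the candidates sit in the larger yards.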

The falling field sizes in Novice Chases cannot be blamed upon the number of horses that could go Novice Chasing. Candidate numbers are healthy. So either the programme book or yard concentration is to blame. The changes made to the Novice Chase programme in the last year or two should have led to an increase in field sizes. The only explanation for their falling in the 2013-14 season so far is the refusal of the larger yards to race their best horses against each other. The campaigning of horses is largely a matter for the trainers and their owners. However, if the BHA react to the concentration of the best horses in a few yards by making changes to the programme book to reflect campaigning realities, it is difficult to imagine this leading to a dearth of Novice Chasers in a few years’ time.

Some trainers would argue that Novice Chasing is different from Novice Hurdling and their concern is primarily one of horse welfare. The first implication is that anyone arguing the opposite position does not have horse welfare at heart. Not a position anyone wishes to inhabit lightly. The further implication is that a series of uncompetitive races should exist so that high class horses can learn the ropes. This will then benefit their long-term career, which, in turn, benefits racing.

Perhaps, to address both small field sizes and welfare concerns, a series of zero prize money Australian style ‘Barrier Trial’ Novice Chases at racecourses could be introduced, with the cost of hosting these races borne entirely by the owners. No handicap marks would be awarded and no betting would be available. These trials would allow for legitimate schooling in public in near race conditions. Lowly handicapped horses could take part knowing their handicap marks will be unaffected; better horses could make their own way home, learning the ropes as desired by trainers. The quid pro quo would be that the Novice Chase programme would be further reduced. Welfare concerns are addressed by the existence of Barrier Trials, whilst field sizes in Novice Chases would increase because of the reduced number of races, improving the viewing spectacle for the racing public.

Trainer Form: Signal or Noise?

Introduction

On day two of the November 2013 Cheltenham meeting David Pipe drew a blank from his seven runners. For many people this result pointed to the Pipe stable being out of form. But whatever might have been ailing the yard on Saturday had disappeared by Sunday, with four winners, including The Greatwood, one of the most competitive handicap hurdles in the calendar. No doubt on Sunday evening the Pipe yard was marked out as one to follow. So what of trainer form: is it possible to identify yards that are in or out of form?

In the woefully misnamed ‘Statistics’ section of the Racing Post the ‘Hot Trainers’ table uses Strike Rate (winners to runners) over the last 14 days, whilst the ‘In Form’ table in the ‘Trainerspot GB’ table uses Run To Form, again over the last fortnight. Neither table is useful. Both are based upon too few runners to be able to draw any meaningful conclusions. These tables are an excellent example of attempting to draw inference from a small information set – whilst this instinct helped us survive when faced with perceived mortal dangers in the past, the very same instinct is likely to mislead in the more prosaic setting of horse racing. Whilst the data in the ‘In Form’ table isn’t useful, this is only because there are too few observations to be able to draw any firm conclusions. However, the idea of considering trainer form as an average or median of how close to form the horses under the trainer’s care are running makes intuitive sense.

The analysis that underpins this piece was carried out in the R statistical environment accessing Raceform Interactive data focussing on the 2010-11 and 2011-12 National Hunt seasons. Thanks to Simon Rowlands and James Willoughby for their input.

Trainer Form Variable Definitions

The starting point for the analysis that follows is to define and calculate a Run To Form (RTF) variable. RTF is defined as follows: the maximum RPR achieved by each horse in its runs up to and including the race in question, subtracted from the Racing Post Rating (RPR) it achieved in that race. A horse has to have run more than three times to qualify for consideration. This filter is used to reduce the influence of lightly raced progressive horses. The maximum value RTF can take is zero, which occurs when a horse equals or sets its best rating.
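
The RTF calculation can be sketched in a few lines of Python. This is an illustration rather than the code used for the analysis: it uses the sign convention under which the maximum value is zero (RPR in the race minus the best RPR to date), and it assumes that "more than three runs" counts the current race, so a horse qualifies from its fourth run onwards. The function name `run_to_form` and the ratings in the example are hypothetical.

```python
def run_to_form(rprs, min_runs=3):
    """Return the RTF for each run, or None where the horse does not yet qualify.

    rprs: Racing Post Ratings for one horse in chronological order.
    RTF = RPR in this race minus the best RPR up to and including this race,
    so it is always zero or negative.
    """
    rtf = []
    best = None
    for run_number, rpr in enumerate(rprs, start=1):
        best = rpr if best is None else max(best, rpr)
        # Assumption: the horse qualifies once it has had more than min_runs
        # runs, counting the current race.
        if run_number > min_runs:
            rtf.append(rpr - best)
        else:
            rtf.append(None)
    return rtf


# Hypothetical ratings for a single horse
example_rtf = run_to_form([100, 110, 105, 108, 112, 101])
print(example_rtf)
```

In the example the fourth run is 2lb below the horse's best to date, the fifth sets a new best (RTF of zero) and the sixth is 11lb below it.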

Trainer Form Absolute (TFA)

The next step is to define and calculate a measure of trainer form. Trainer Form Absolute (TFA) is the median RTF for all the runners of a trainer over a particular time period. Using the 2010-11 National Hunt season, Graph 1 shows a histogram of TFA for those trainers that had at least 50 runners over the season. 132 trainers qualified. Note the negative skew. This is a common characteristic of form data in horse racing. It is difficult to run close to form, whereas there are many and varied reasons for horses running below form. The negative skew in RTF at the horse level aggregates to negative skew in TFA at the trainer level. Graph 2 shows the same information as a Box Plot.
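
A minimal Python sketch of the TFA step follows, assuming the RTF values are already available as (trainer, RTF) pairs. The function name `trainer_form_absolute`, the trainer labels and the figures are hypothetical; the 50-runner threshold is the one quoted above, lowered in the example only so that the toy data qualifies.

```python
from statistics import median


def trainer_form_absolute(runs, min_runners=50):
    """Median RTF per trainer, keeping only trainers with enough runners.

    runs: iterable of (trainer, rtf) pairs for the period of interest.
    """
    by_trainer = {}
    for trainer, rtf in runs:
        by_trainer.setdefault(trainer, []).append(rtf)
    return {
        trainer: median(values)
        for trainer, values in by_trainer.items()
        if len(values) >= min_runners
    }


# Hypothetical data: trainer C has too few runners and drops out
runs = [
    ("A", 0), ("A", -4), ("A", -10),
    ("B", -6), ("B", -12), ("B", -20),
    ("C", -3),
]
tfa = trainer_form_absolute(runs, min_runners=3)
print(tfa)
```

The median is used rather than the mean, as in the definition above, which limits the influence of the occasional run that is far below form.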

Graph 1: Distribution of trainer TFA NH season 2010-11

Graph 2: Box Plot of trainer TFA NH season 2010-11

So does a ranking of TFA indicate in-form trainers at the top and out-of-form trainers at the bottom? Inspection of Graphs 1 and 2 highlights the problem with using TFA as a measure of trainer form. The range of TFA across trainers is so wide that the top and bottom of the TFA list won’t change often enough to be able to identify trainers in and out of form. For example, if Nicky Henderson normally runs at a -5lb TFA and is currently running at -8lb TFA he would still be near the top of the TFA list, whereas he is running 3lb below his normal TFA rate. It is also of note that the variability of TFA per trainer is correlated with their TFA. Either the better horses, who are at the better yards, run more consistently, or the better trainers are able to get their horses, who are better than those elsewhere, to run more consistently. Or a combination of the two. In this context better means those yards with the highest TFA values.

Trainer Form Relative (TFR)

Given the concerns about using TFA, a measure of relative run to form can be defined and calculated that takes into account the usual RTF per trainer. Trainer Form Relative (TFR) is defined as the difference between TFA in a particular time period and TFA in a previous time period. TFR enables a direct comparison between trainers with widely different absolute levels of form (TFAs). Now Nicky Henderson’s -3lb TFR can be compared with a trainer whose TFA is normally -12lb and is currently running at -9lb, to give a +3lb TFR.
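
The TFR arithmetic is simple enough to show directly. The sketch below, in Python, uses the two examples from the text; the function name `trainer_form_relative` and the label "Other trainer" are illustrative only.

```python
def trainer_form_relative(tfa_current, tfa_baseline):
    """TFR: current-period TFA minus the trainer's baseline TFA, in lb."""
    return tfa_current - tfa_baseline


# (current TFA, baseline TFA) for the two examples in the text
examples = {
    "Nicky Henderson": (-8, -5),
    "Other trainer": (-9, -12),  # hypothetical label for the second example
}

tfr = {
    name: trainer_form_relative(current, baseline)
    for name, (current, baseline) in examples.items()
}
print(tfr)
```

A negative TFR means the yard is running below its own usual level, whatever that level is, which is what makes trainers with very different TFAs comparable.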

For the analysis that follows TFR is calculated for each of the seven months October 2011 through to April 2012. TFRs are calculated by taking the TFA in each month by trainer and subtracting the TFA posted by trainer for the previous season 2010-11.  So, starting with October 2011, how is its TFR related to the TFRs posted one month later? Graph 3 below shows a significant relationship. At first sight this appears to be clear evidence that trainer form in October 2011 helps predict trainer form in November 2011.

Graph 3: Relationship between TFR October to November 2011  

What happens if we compare TFR in October 2011 with months further out? If form is temporary the relationship should decline through time. If October form predicts November form, it shouldn’t predict December form to the same degree. In the jargon we can postulate an autoregressive AR(1) process, under which the correlation between two months weakens as the gap between them grows. What we see in the data is that the relationship shown in Graph 3 is as strong between October and November as it is between October and other months. See Table 1 below for correlation coefficients between t and t+1, t+2, t+3 using November 2011 as month t.

Correlation of November 2011 TFR with

December 2011    0.44
January 2012     0.51
February 2012    0.43
March 2012       0.47
April 2012       0.37

Table 1: Correlation of TFR months t with t+1, t+2, t+3 etc
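
To see how far the figures above depart from a temporary-form story, recall that for a stationary AR(1) process with coefficient phi the correlation at a gap of k months is phi to the power k. The Python sketch below takes the one-month correlation (0.44) as a stand-in for phi, which is an assumption made purely for illustration, and compares the implied decay with the observed correlations.

```python
# Observed correlations with November 2011 TFR, keyed by gap in months
observed = {1: 0.44, 2: 0.51, 3: 0.43, 4: 0.47, 5: 0.37}

# Assumption: treat the one-month correlation as the AR(1) coefficient
phi = observed[1]

# Under a stationary AR(1) the correlation at gap k is phi ** k
expected = {lag: round(phi ** lag, 3) for lag in observed}

for lag in observed:
    print(f"t+{lag}: observed {observed[lag]:.2f}, AR(1) implies {expected[lag]:.3f}")
```

Under this assumption the correlation with January should already have fallen below 0.2 and the correlation with April should be close to zero, whereas the observed values stay around 0.4 to 0.5 throughout.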

We shouldn’t observe the same level of correlation across the months. It suggests form has a permanent component to it – an oxymoron. So how to explain this result? Imagine we can fast forward one year. We calculate TFA per trainer based upon the season 2011-12. This enables us to compare the results of a trainer in 2010-11 with the next season 2011-12. If we classify TFR as ABOVE or BELOW zero, and then further classify according to whether a trainer had a BETTER or WORSE season in 2011-12 relative to 2010-11, we see how TFR looks month by month in Table 2 below.

             BETTER 2011-12 form        WORSE 2011-12 form
             ABOVE      BELOW           ABOVE      BELOW
ABOVE        76         42              8          28
BELOW        33         16              29         109

Table 2: classification of form by month by year

Remember we have peeked into the future in calculating Table 2. It isn’t available until the end of the second season. The number of observations in the ABOVE-ABOVE and BELOW-BELOW categories is too high for the TFR measure to be useful as an indicator of form. In other words, when a trainer is ABOVE or BELOW form in a particular month it is likely that they will continue in that category for the next month and the next month after that and for the duration of the season. The problem here is that the comparison used in the TFR calculation, namely the form of the trainer from the previous season, is likely to suffer from bias. For some trainers it will be higher or lower than the true level of form a trainer can expect, and those that posted towards the extremes of TFA are likely to revert back to some degree in the next season. Bias such as this is difficult to remove and as a result relative measures of trainer form such as TFR are as flawed in their own way as absolute measures of trainer form such as TFA.

Summary

I started this analysis with the prior view that trainer form probably exists. My view now is that if it does exist it is very difficult to measure. Absolute measures of trainer form do not exhibit enough variability, whereas relative measures have problems in deciding on an appropriate comparison. Some might find the measures of trainer form defined above too simplistic, arguing that more complex definitions are required. This is entirely possible. But as complexity of definition increases, particularly if it is one of many derivations tried, so does the risk that the measure will work only for the sample of data on which it was tried. If you torture the data for long enough it will tell you anything.

There is another possible use of trainer form – but more in how it is perceived by the market. Consider Graph 4 below. It shows TFA on the x-axis and Strike Rate on the y-axis by trainer for the 2010-11 season. It is the same data expressed in different ways.

Graph 4: Strike Rate vs trainer form

Strike Rate is a popular measure of form, RTF (and its variants TFA and TFR) less so. Yet Strike Rate is noisier and contains less information than RTF. Given the popularity of trainer form as an idea, and the popularity of Strike Rate as a proxy for trainer form, it is possible that the runners of trainers with a high/low Strike Rate relative to their TFA could have odds that are too far away from their correct values as the market considers these trainers, based upon a faulty premise, to be in or out of form. Armed with the appropriate data this is a testable proposition.

Winning Distances: Trip, Going, Field Size, Race Class & Handicapping

Introduction

The distance that horses finish relative to each other in horse racing is an important consideration in deciding what rating to apply to each horse post-race by public and private handicappers. Whilst it is obvious that trip will affect winning distances, what of going, field size and race class? To what extent do these factors make a contribution, and does the official handicapper take these factors into account in handicapping horses post-race?

Method, Dataset & Definitions

The analysis that underpins this piece was carried out in the R statistical environment accessing Raceform Interactive (RI) data for turf handicap races that took place during the 2011, 2012 and 2013 flat seasons in Great Britain. Races were placed into categories as follows:

Trip – Sprint (up to 6.5f), Mile (6.5-9.5f), Mid-distance (9.5-12.5f) and Long-distance (12.5f+)

Going – Heavy (HY), Soft (S), Good-Soft (GS), Good (G), Good-Firm (GF) and Firm (F)

Field Size – Tiny (fewer than 5 runners), Small (5 to 12 runners) and Large (more than 12 runners)

Race Class – High (Classes 1, 2 and 3) and Low (Classes 4, 5 and 6)
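
The category definitions above can be expressed as simple lookup functions. The Python sketch below is illustrative only: the function names are hypothetical, each trip boundary is assumed to belong to the shorter category (so 6.5f counts as a Sprint), and any field of fewer than five runners is treated as Tiny, which is an assumption about where that boundary sits.

```python
def trip_category(furlongs):
    """Trip band for a race distance in furlongs (upper bounds assumed inclusive)."""
    if furlongs <= 6.5:
        return "SPRINT"
    if furlongs <= 9.5:
        return "MILE"
    if furlongs <= 12.5:
        return "MID"
    return "LONG"


def field_size_category(runners):
    """Field size band; assumption: fewer than five runners counts as TINY."""
    if runners < 5:
        return "TINY"
    if runners <= 12:
        return "SMALL"
    return "LARGE"


def race_class_category(race_class):
    """HIGH for Classes 1 to 3, LOW for Classes 4 to 6."""
    if race_class in (1, 2, 3):
        return "HIGH"
    if race_class in (4, 5, 6):
        return "LOW"
    raise ValueError(f"unexpected race class: {race_class}")


# A hypothetical 7f Class 4 handicap with 14 runners
print(trip_category(7), field_size_category(14), race_class_category(4))
```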

The number of races that took place in each category is given in Table 1 below.

Trip 2011 2012 2013
LONG 494 483 510
MID 450 421 448
MILE 1145 1103 1170
SPRINT 350 339 348

Table 1: Number of races by year by trip

Ground classifications used were those applied by RI rather than the official going. Proportions of races by Going category are given in Table 2a below. The effect of the wet weather in 2012 and the dry summer in 2013 can be seen in the proportion of races that took place on Soft and on Good-Firm in each year.

Going 2011 2012 2013
F 0.0% 0.6% 0.6%
GF 14.4% 14.6% 22.3%
G 65.5% 41.1% 57.4%
GS 14.8% 21.7% 12.6%
S 4.7% 17.1% 5.9%
HY 0.5% 4.9% 1.2%

Table 2a: Proportion of races by year by going category

The number of races by year by Field Size is given in Table 2b below. Field Sizes fell in 2013. The argument that fast going was responsible for the drop in Field Sizes is spurious. In 2011 there were 287 races on GF with small Field Sizes. In 2012 this decreased slightly to 262 races. In 2013 there was a substantial increase to 475 races. Other factors are responsible for the drop experienced in 2013.

Field Sizes 2011 2012 2013
LARGE 576 574 491
SMALL 1863 1772 1985
TINY 52 32 120

Table 2b: Number of races by year by field size

In Table 2c below the relationship between Race Class and Field Size is shown. As expected there is a greater proportion of High Class races with Large Field Sizes. Races categorised as TINY were excluded from the analysis that follows.

Race Class LARGE SMALL TINY
HIGH 558 963 29
LOW 1083 4656 175

Table 2c: Number of races by class and field size

 

Winning Distances and Going

Graph 1 below shows winning distances by Trip for each Going category. Winning distance is defined as the distance between the winner of a race and the horse coming third. There aren’t many races that take place on Firm Going and as a result its black line representation on the graph should be treated with caution. Notice how winning distances are similar for the GS, G and GF categories, whereas for Soft and Heavy Going winning distances are quite different. There is also a non-linear relationship between winning distance and Going as Trip increases in distance. The minimum number of categories of Going that best describes winning distances is three: Heavy, Soft and an amalgamation of the other Going categories. We know from Table 2a that few races take place on Heavy going. As a consequence, excluding these races, rather than amalgamating them with the Soft going category, will improve the balance of the analysis that follows.

Graph 1: Winning distance by going category

Winning Distance and Race Class

On average winning distances are greater in Low Class races. Graph 2 below shows median winning distance by Trip by Race Class. The relationship is linear with trip for Low Class races. For High Class longer distance races the median winning distance is lower than for mid distance races. This is counterintuitive. It could be explained by High Class long distance races being run at a different pace – more of a crawl and sprint, resulting in compressed winning distances, rather than an end to end gallop.

Graph 2: Winning distance and Race Class

Winning Distance and Field Size

Winning distances are higher in Small Field Size races. Graph 3 below shows the median winning distance for Small and Large Field Sizes. It is possible the Field Size and Race Class winning distance effects are related due to the high relative proportion of High Class races with Large Field Sizes.

Graph 3: Winning Distance and Field Size

Contributions to Winning Distances

The information presented above shows that winning distances are affected by Trip, Going, Field Size and Race Class. Since some of these categories are related to each other, analysis of variance (ANOVA) is used to attempt to disentangle the effects and see if all or just a subset of categories are important. In addition we can identify interaction (non-linear) effects, such as that between winning distance and Going. In Table 3 below a summary of the ANOVA table is presented. Apart from the obvious result that Trip and Going are highly significant in terms of explaining winning distances, Field Size and Race Class are important in their own right. In addition two interaction variables are included – Trip with Going and Trip with Race Class. The former is intuitive, the latter less so.

Category      F Value    p-value
Trip          187.626    2e-16
Going         91.278     2e-16
Field Size    85.227     2e-16
Race Class    64.553     1e-15
Trip*Going    8.237      2e-05
Trip*Class    6.904      0.000122

Table 3: ANOVA table of contributions to Winning Distance

Winning distances and Subsequent Handicap Changes

The official handicapper has published details of his handicapping policy. Given the wide range of inputs that he states go into his handicapping decisions, we should find a relationship between changes in handicap mark and the race categories examined in the previous section. A variable that takes into account handicap mark changes and winning distances is defined as follows:

lbPerL = (Mark change for winner – Mark change for 3rd)/(Winning distance 1st-3rd)
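
As a worked illustration of the lbPerL definition, a minimal Python sketch follows. The function name `lb_per_length` and the example figures are hypothetical, and returning None when the first and third are not separated is an assumption about how such races would be handled.

```python
def lb_per_length(winner_change, third_change, distance_1st_to_3rd):
    """Pounds of relative handicap change per length between first and third.

    winner_change, third_change: post-race changes in handicap mark (lb).
    distance_1st_to_3rd: winning distance from first to third (lengths).
    """
    # Assumption: races with no measurable distance are left out
    if distance_1st_to_3rd <= 0:
        return None
    return (winner_change - third_change) / distance_1st_to_3rd


# Hypothetical race: winner raised 6lb, third raised 1lb, 2.5 lengths apart
example = lb_per_length(6, 1, 2.5)
print(example)
```

In this hypothetical race the handicapper has valued each length between first and third at 2lb.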

Graph 4 below shows winning distance on the x-axis and handicap changes winner to third on the y-axis. Whilst there is a relationship (correlation=0.6) there are other factors in addition to winning distances that are used to revise handicap ratings.

Graph 4: Winning distance and handicap changes

Pounds Per Length and Going

Handicap changes per length are lower for races that take place on Soft going. The median difference is 0.25 lbPerL. So for winning distances of 2 lengths, median handicap changes on Soft going are ca. 0.5lb less than on quicker Going.

Graph 5: Pounds per Length and Going

Pounds Per Length and Race Class

Handicap changes per length are higher for High Class races. The difference is 0.34 lbPerL. With winning distances lower in High Class races, it appears as if the handicapper applies a standard handicap increase to the rating of winners regardless of Race Class.

Graph 6: Pounds per Length and Race Class

Pounds Per Length and Field Size

Handicap changes per length are higher for races with larger Field Sizes. The difference is 0.33 lbPerL. As with Race Class, it appears as if the handicapper applies a standard handicap increase to the rating of winners regardless of Field Size.

Graph 7: Pounds per Length and Field Size

Understanding the Contributors to Handicap Changes

ANOVA is used to check if the differences seen in the graphs above are statistically significant. Table 4 below shows the handicapper does take into account Going, Field Size and Race Class in the handicap changes he applies to winning horses – the p-values show that each category explains a significant component of the lbPerL variable. In the next section we examine if sufficient account is taken of the different race categories.

Category      F Value    p-value
Trip          106.119    < 2e-16
Going         27.191     1.90e-07
Race Class    45.673     1.52e-11
Field Size    42.151     9.10e-11

Table 4: Contributors to Handicap Changes (lbPerL)

Is Sufficient Account Taken of Different Race Categories?

If the handicapper takes sufficient account of race categories it should be the case that horses run equally well in their next race. The variable PctBtn (thanks to Simon Rowlands of Timeform for suggesting this variable) is defined as the percentage of horses beaten next time out by the winner of each race. If the handicapper has done his job, there should be no difference in the average PctBtn variable by race category. ANOVA is used again. Table 5 contains the results. The results for Field Size are statistically significant. It appears as if the handicapper does not raise the handicap mark of winners of large Field Size races by enough, since they beat a higher proportion of their rivals next time out than winners of races in other categories.

Category      F Value    p-value
Trip          0.442      0.7230
Race Class    0.098      0.7545
Field Size    4.821      0.0282
Going         0.668      0.4137

Table 5: Contributors to PctBtn next time out
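
For readers unfamiliar with PctBtn, one common way of computing a percentage-of-rivals-beaten figure is sketched below in Python. The exact formula used in the analysis is not given, so dividing by the number of rivals (runners minus one) is an assumption, and the function name `pct_beaten` is illustrative.

```python
def pct_beaten(finishing_position, runners):
    """Percentage of rivals beaten by a horse finishing in the given position.

    Assumption: rivals are counted as runners minus one, so a winner scores
    100 and the last horse home scores 0.
    """
    if runners < 2:
        return None  # no rivals to beat
    return 100.0 * (runners - finishing_position) / (runners - 1)


# Hypothetical next runs for three previous winners: (position, runners)
next_runs = [(1, 10), (4, 10), (8, 8)]
scores = [pct_beaten(position, runners) for position, runners in next_runs]
print([round(score, 1) for score in scores])
```

Averaging such scores by race category gives a quantity that can be compared across Trip, Going, Race Class and Field Size, which is the comparison summarised in Table 5.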

Summary

In addition to the obvious effect of Trip and Going on winning distances, Field Size and Race Class are also significant contributors. Whilst the handicapper appears to take these factors into account in setting handicap marks, in the case of large Field Size handicaps it appears that winners are insufficiently penalised. It is a small step to suggest that placed horses from large Field Size races are worthy of particular attention next time out.

Yearling Sales vs. Breeze Ups: Where Should You Buy?

Introduction

There are two options for buying racehorses with a flat campaign in mind – the Yearling Sales or Breeze Ups. At the Yearling Sales, which take place in the Autumn, horses are sold unbroken, whereas at the Breeze Ups, which take place the following Spring, part of the sales process is that the horses catalogued breeze for 2 furlongs. Breeze times are used by many as an input into their buying decision.

Breeze Up sales have their critics. The charges are that young horses are pushed too hard, too young, with a preparation that instils poor habits (‘boiled brains’), and that the horses do not train on and are more prone to a lack of soundness. Added to that is a perception that horses sold at the Breeze Ups are expensive for what you get. There also exists the perfect totem for Breeze Up critics – The Green Monkey, who breezed in sub 10 seconds, was bought for $16m by Coolmore and never won a race. So is what happened with The Green Monkey indicative of what happens with many Breeze Up graduates? Or can a variant of Godwin’s Law be invoked, so that anyone who mentions The Green Monkey in a discussion regarding the merits of Breeze Up sales has automatically lost the argument?

In this blog post the graduates from Tattersalls Yearling Sales from the Autumn of 2011 are compared with the graduates from the Spring 2012 Breeze Up Sales that took place at Kempton (Ready to Run), Tattersalls Guineas, Tattersalls Craven and DBS. Racecourse performance is compared, along with prices paid.

Methodology

The analysis that underpins this piece was carried out in the R statistical environment accessing Raceform Interactive (RFI) data. This is the same data used by the Racing Post.  Sales information and racecourse performances (wins, runs, ratings) for the 2012 and 2013 flat seasons were collected. For analysis of racecourse performance horses were categorised as coming from either yearling sales, or breeze ups. Horses that sold at more than one sale were placed according to the category of the latest sale.

Catalogue Numbers, Appearance Rates and Withdrawals

Table 1 shows the number of horses in each sale category and the number of horses that went on to compete in a race. The appearance rate for Breeze Ups is 72%, higher than the 68% for the Yearling Sales. The underlying data is from RFI, which records all GB and Irish racing and higher grade overseas races. Sales graduates that raced overseas at a lower level will not be included and so there is a degree of under-reporting in Table 1. It is possible the Yearling Sales would be more affected by this than the Breeze Ups when overseas buyers are considered. Even taking this into account, there is little evidence that horses prepared for Breeze Ups are less likely to reach the racecourse.

The withdrawn columns show the numbers/percentages of horses that are withdrawn from sale but subsequently race. The withdrawal rate is twice as high for Breeze Ups. Because horses have to do more at the Breeze Ups than the Yearling Sales, it is not surprising a higher withdrawal rate exists. Consignors do not wish to jeopardise the sale value by sending horses to the Breeze Ups that are not ready to do themselves justice.

Category          Catalogued    Raced    Raced (%)    Withdrawn    Withdrawn (%)
Yearling Sales    1670          1137     68%          78           4.7%
Breeze Ups        601           434      72%          58           9.7%
Grand Total       2271          1571     69%          136          6.0%

Table 1: Catalogue sizes, appearances and withdrawals by category
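
The percentages in Table 1 follow directly from the counts, as the short Python sketch below shows. It uses only the catalogued, raced and withdrawn figures from the table; the helper name `pct` is illustrative.

```python
# Table 1 counts: category -> (catalogued, raced, withdrawn)
table1 = {
    "Yearling Sales": (1670, 1137, 78),
    "Breeze Ups": (601, 434, 58),
}


def pct(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Appearance and withdrawal rates per category
rates = {
    category: {"raced": pct(raced, catalogued), "withdrawn": pct(withdrawn, catalogued)}
    for category, (catalogued, raced, withdrawn) in table1.items()
}

# Ratio of Breeze Up to Yearling Sales withdrawal rates
withdrawal_ratio = (58 / 601) / (78 / 1670)

print(rates)
print(round(withdrawal_ratio, 2))
```

The withdrawal ratio comes out at a little over two, in line with the statement that the withdrawal rate is twice as high for Breeze Ups.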

Sales Prices

Table 2 shows the number of horses that were sold by category, as well as median, average and maximum prices. Sales exclude vendor buybacks and horses not sold, and the yearling sale numbers exclude those horses that went on to be sold at Breeze Ups. The median sale price was £38,000 at the Yearling Sales and £30,000 at the Breeze Ups. Averages and maximum sale prices were also higher for the Yearling Sales.  These numbers exclude any saving on training fees that accrues from buying ca. 6 months later at the Breeze Ups.

Category          Number    Median (£)    Average (£)    Maximum (£)
Yearling Sales    809       38,000        65,972         700,000
Breeze Ups        291       30,000        42,711         300,000

Table 2: Sale Information by category

Ratings Achieved by Sale Category

Table 3 shows, for each sale category, the median of the maximum rating achieved by each horse over its racing career in 2012 (2 year old) and 2013 (3 year old). Medians are also given for ratings achieved first time out, and considering the racing career in 2012 as a 2 year old only. The results for Yearling Sales and Breeze Ups are very similar. At the end of their 2 year old career ratings are both 69. At the end of their 3 year old career they are both 74. Horses sold at both types of sale are of similar quality and progress similarly from age 2 to age 3.

Ratings for first time out runs are lower for Breeze Up horses at 57.5 versus 60 for Yearling Sales graduates. This is an unexpected result. Breeze Up horses progress further from their first run to their maximum than horses sold at the Yearling Sales. The difference isn’t large, however the view that Breeze Up horses are as ready as they can be for racing isn’t borne out by the data. A possible explanation is as follows: Breeze Up consignors are concerned to get their horses to the sales and do not want them to break down, so they err on the side of caution and under rather than over prepare their horses. When these horses arrive with trainers, the trainers fear the horses have been over prepared, given the reputation that exists for Breeze Up graduates, and the horses are trained more cautiously than Yearling Sales graduates that have been in their charge for longer. The end result is that Breeze Up graduates post lower ratings first time out than Yearling Sales graduates.

Category          Rating    Rating 1TO    Rating 2yo    Highest Rated
Yearling Sales    74        60            69            118
Breeze Ups        74        57.5          69            118

Table 3: Ratings by category

Win Rates and Runs per Horse

Win rates for Yearling Sales and Breeze Ups are similar, with 58% of Breeze Up and 55% of Yearling Sale graduates going on to win races. The number of runs per horse is higher for Breeze Ups graduates. The difference is 1.2 races per horse when all races as a 2 year old and 3 year old are considered. It is possible that the Breeze Up preparation selects horses that are able to withstand racing, and buyers are able to identify these horses at the Breeze Ups.

Category Win Rate Runs/Horse all Runs/Horse 2yo
Yearling Sales 55.3%                7.5                    3.7
Breeze Ups 58.0%                8.7                    4.2

Table 4: Win rates and runs per horse

Summary

A comparison of racecourse performance and sales prices for ca. 1,500 Yearling Sale (2011) and Breeze Up (2012) graduates shows the following:

  • Breeze Up horses sold more cheaply
  • A similar proportion of horses reached the racecourse
  • Ratings achieved were similar
  • Breeze Up horses progressed somewhat more from first run to their maximum rating
  • Win rates were similar
  • Breeze Up horses ran more often

There is little evidence that the criticisms levelled at Breeze Ups are justified, with both types of sale offering opportunities to buyers.

More detailed performance information is available from www.breezeupwinners.com

Does Buying at Tattersalls Book 1 Lead To Guaranteed Success? Prices Paid vs. Racecourse Performance For The 2007 Graduates

Introduction

Record prices paid at the recent Tattersalls 2013 Book 1 Yearling Sale have hit the headlines. A Galileo filly, full sister to Oaks winner Was, sold for a record breaking G5m (G = guineas). The median price paid for a yearling came in at G130,000, an increase of 30% over 2012. Today (14th October 2013) the Book 2 sale starts, followed by Book 3 one week later. Yearlings are categorised into the three books by Tattersalls based upon a range of criteria, including pedigree and conformation. Book 1 is the most prestigious and its graduates typically sell for more than Book 2 graduates, which in turn sell for more than Book 3 graduates. So how do the graduates of Tattersalls Yearling Sales perform on the racecourse? In common with all of the sales companies, any Tattersalls graduate winning a prestigious race results in a tweet and/or email proclaiming where the horse was sold. But how do the graduates of the sales perform in aggregate? Trainer George Baker in a recent blog post alludes to the reality that some of these graduates will end up plying their trade at a basement level. In this blog post the racecourse performance of all of the 2007 graduates from Books 1, 2 and 3 of the Tattersalls Yearling Sale is examined. The maximum rating achieved by each horse between 2008 and the end of the 2012 flat season was extracted from the Raceform database, including information from maidens, handicaps and pattern races. These ratings and race performances were then compared with each horse’s yearling sale price.

Yearling Sale Prices By Book

Over 1,500 yearlings were catalogued at Tattersalls Yearling Sale in 2007. Excluding those withdrawn, not sold or bought back, 1,136 yearlings were sold. The Book 1 median was G80,000, twice the Book 2 median of G40,000, with the Book 3 median coming in at G12,000.

Book Sold Median (G) Max (G)
1 447        80,000 1,000,000
2 393         40,000     300,000
3 296         12,000       72,000

Table 1: Tattersalls 2007 Yearling Sale Prices

 

Sale Prices & Subsequent Ratings

How do the graduates from this sale perform on the racecourse? Graph 1 below shows the relationship between prices paid and subsequent maximum rating achieved by each horse. The y axis has rating and the x axis sale price. The relationship is noisy. The correlation between price paid and subsequent rating is 0.20. If log prices are used so that the effect of some of the higher priced lots is dampened, the correlation increases to 0.28. At first glance it doesn’t appear as if much of a relationship exists at all. Does this suggest the work of bloodstock agents, trainers and owners trying to identify the best yearlings is of limited benefit?


Graph 1: Tattersalls 2007 Yearling sale price (G) vs subsequent rating
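
The effect of taking logs before correlating can be illustrated with a minimal Python sketch. The five price and rating pairs are invented purely for illustration (they are not the Tattersalls data), and `pearson` is a hypothetical helper name.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented example data: sale price in guineas and subsequent maximum rating.
prices = [8_000, 20_000, 45_000, 90_000, 1_000_000]
ratings = [60, 72, 70, 85, 88]

raw_corr = pearson(prices, ratings)
log_corr = pearson([math.log(p) for p in prices], ratings)
print(f"raw: {raw_corr:.2f}, log: {log_corr:.2f}")
```

With one very expensive lot in the sample, the log transform stops that single price from dominating the calculation, which is the dampening effect described in the text.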

 

Book Membership & Ratings Achieved

In common with much of the data in horse racing, aggregation enables relationships to be identified. Table 2 gives the median rating achieved across all of the graduates for each of Books 1, 2 and 3. The best horse from Book 1 posted a rating of 135, the best horses in Books 2 and 3 posted similar ratings of 120 and 119 respectively. The median rating achieved by Book 1 graduates was 78, for Book 2 graduates 73.5 and for Book 3 graduates 68. So a relationship between price paid and subsequent rating does exist when the results are aggregated to the Book level. Note that improvements in ratings become progressively more expensive to buy. In trading up from Book 3 to 2, an extra G28,000 bought you an additional 5.5 points of rating, whilst in trading up from Book 2 to 1 you needed to spend an extra G40,000 to garner an additional 4.5 rating points.

Book Median Rating Max Rating
1 78 135
2 73.5 120
3 68 119

Table 2: Ratings achieved across Books 1, 2 & 3
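
The trading-up arithmetic can be checked with a short Python sketch using the medians from Tables 1 and 2. The function name `cost_per_point` is a hypothetical one chosen for this example.

```python
# Median sale price (guineas) and median rating by Book, from Tables 1 and 2.
median_price = {1: 80_000, 2: 40_000, 3: 12_000}
median_rating = {1: 78.0, 2: 73.5, 3: 68.0}

def cost_per_point(from_book, to_book):
    """Extra guineas paid per extra median rating point when trading up."""
    extra_price = median_price[to_book] - median_price[from_book]
    extra_rating = median_rating[to_book] - median_rating[from_book]
    return extra_price / extra_rating

print(round(cost_per_point(3, 2)))  # Book 3 -> Book 2
print(round(cost_per_point(2, 1)))  # Book 2 -> Book 1
```

On these medians each extra rating point costs roughly G5,100 when moving from Book 3 to Book 2 and roughly G8,900 when moving from Book 2 to Book 1.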

Win Rates in Maidens, Handicaps & Pattern Races

Table 3 gives the number of individual winners that came out of each book by race category. Table 4 shows the same information expressed as a percentage of horses that sold in each book. The numbers do not sum to the total column because a horse can be a winner in each of the three race categories but only once in total. About half of all graduates from the Tattersalls 2007 Yearling Sales are still maidens and the proportion of yearlings that won at least one race seems to be little affected by the Book in which they were sold. However the benefits of buying from Book 1 become clear. Nearly twice as many graduates from Book 1 go on to win pattern races compared with the graduates of Books 2 and 3. Book 1 graduates also win the highest proportion of maidens. This result should probably be upgraded because they are likely to have to contest open maidens, which by their nature are the most competitive. Their sales price and stallion fee would preclude them from competing in auction and median auction races. There is also a knock on effect when open maiden horses go on to compete in handicaps. Race standards applied by the handicapper, allied to his ‘on a line through’ methodology, mean that the handicap marks of Book 1 graduates may leave less room for manoeuvre than those of the graduates of other books. It is also likely that Book 1 graduates will be trained with a view to possible Pattern company participation, thus competing in maiden company closer to full fitness than the graduates of other books. As a result Book 1 graduates that end up in handicaps could well be doing so on marks that most closely reflect their ability. All of these arguments can be reversed when the graduates of Book 3 are considered.

Book Maidens Handicaps Patterns Total
1 155 126 34 239
2 118 121 16 188
3 84 98 10 152
Total 357 345 60 579

Table 3: Individual winners by Book in Maidens, Handicaps & Pattern Races

Book Maidens Handicaps Patterns Total
1 34.7% 28.2% 7.6% 53.5%
2 30.0% 30.8% 4.1% 47.8%
3 28.4% 33.1% 3.4% 51.4%
All 31.4% 30.4% 5.3% 51.0%

Table 4: Percentage of individual winners by Book in Maidens, Handicaps & Pattern Races
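
As a check on how Table 4 follows from Table 3, the Python sketch below divides each winner count by the number of yearlings sold in that Book (taken from Table 1). The dictionary layout and the name `winner_percentages` are assumptions of this example.

```python
# Yearlings sold per Book (Table 1) and individual winners per Book (Table 3).
sold = {"1": 447, "2": 393, "3": 296}
winners = {  # Book: (maidens, handicaps, patterns, total)
    "1": (155, 126, 34, 239),
    "2": (118, 121, 16, 188),
    "3": (84, 98, 10, 152),
}

def winner_percentages(book):
    """Winners in each category as a percentage of horses sold in the Book."""
    return tuple(round(100 * w / sold[book], 1) for w in winners[book])

for book in sold:
    print(book, winner_percentages(book))
```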

Differentiation Within Books: Does Paying More Work Within Books?

In aggregate the more expensive horses perform better on the racecourse. Is there much difference in subsequent performance if the more expensive Book 1 graduates are compared with those that sold more cheaply from Book 1? Each Book was sorted and split into a top half and bottom half group based upon sale price. The median rating of each group was calculated. Table 5 shows the median price and rating for each of the top half and bottom half by Book. There is a clear relationship between sales price and subsequent rating within each book. The more expensive Book 1 graduates ended up with higher ratings than cheaper Book 1 yearlings. The same is true of Books 2 and 3. In each case the difference in median ratings is about 9 points. It is noteworthy that the incremental cost of each additional rating point depends on your starting rating. In Book 3 it costs G1,800 for every extra rating point, whilst in Book 2 it is G5,455 per point and in Book 1 G21,250. In this respect yearlings trade in much the same way as other trophy assets.

When pattern race winners are considered the more expensive graduates of Books 1 and 2 have more winners than those that sold more cheaply – it is most striking in Book 1, with 24 pattern race winners versus 10 from the bottom half. Table 6 gives this information by Book. The usual caveats apply with respect to interpretation given the small sample sizes.

When median ratings are compared the more expensive graduates of Book 1 performed best, followed by the more expensive graduates of Book 2. However the next best performer is a tie between the more expensive Book 3 graduates and the cheaper yearlings from Book 1. Yet the more expensive Book 3 graduates have a median sales price less than half that of the cheaper Book 1 graduates, albeit with fewer pattern race winners. If there can ever be value in buying yearlings it appears that, at least in 2007, buying the most expensive Book 3 graduates paid off on the racecourse. It is possible this result is an artefact of the 2007 yearling draft; looking at the results from other years would answer this query.

Book Median Price Top Half (G) Median Price Bottom Half (G) Median Rating Top Half Median Rating Bottom Half
1 165,000 46,500 82 73
2 70,000 24,000 79 69
3 21,000 6,500 73 64.5

Table 5: Prices paid and ratings within books
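
The top half and bottom half split can be sketched in Python as follows. The six lots are invented for illustration, the function names are hypothetical, and sending the middle lot of an odd-sized Book to the bottom half is an assumption, since the text does not say how ties or odd counts were handled.

```python
from statistics import median

def split_by_price(lots):
    """Sort (price, rating) lots by price; return (top_half, bottom_half)."""
    ordered = sorted(lots, key=lambda lot: lot[0], reverse=True)
    mid = len(ordered) // 2  # assumption: any odd middle lot goes to the bottom half
    return ordered[:mid], ordered[mid:]

def summarise(half):
    """Median price and median rating for one half of a Book."""
    prices, ratings = zip(*half)
    return median(prices), median(ratings)

# Invented lots for one Book: (sale price in guineas, maximum rating).
lots = [(30_000, 75), (9_000, 62), (15_000, 70),
        (5_000, 66), (22_000, 71), (7_000, 60)]
top, bottom = split_by_price(lots)
print(summarise(top), summarise(bottom))
```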

 

Book Top Half Bottom Half
1 24 10
2 10 6
3 6 4

Table 6: Pattern winners by book top and bottom half

Summary

Results from the Tattersalls Yearling Sale from 2007 show a noisy relationship between individual sales price and subsequent rating. However in aggregate the relationship becomes clear – the more expensive yearlings, taken as a group, subsequently performed better on the racecourse. It is when pattern races are considered that the benefits from buying at Book 1 were at their most apparent. The median sale in Book 2 took place at G40,000. In Book 1 this doubled to G80,000. Whilst it might seem poor value that spending twice as much resulted in an increase of just 4.5 rating points in the median ratings for Book 1 versus Book 2, it nearly doubled the chances of buying a yearling that went on to win a pattern race. Yearlings are priced off the right-hand tail of the distribution of expected future ratings, and it is the right-hand skewness inherent in the expected future ratings of Book 1 yearlings that causes them to sell so much more expensively than yearlings catalogued in Books 2 and 3. The lottery ticket you buy when shopping at Book 1 has a much greater chance of coming up. When prices within Books are considered the same relationships are confirmed. Buying the more expensive graduates from within each Book resulted in higher ratings than attempting to bargain hunt amongst the cheaper yearlings in each Book. In Book 1 buying the more expensive yearlings resulted in nearly 2.5x as many pattern race winners. Now the noisiness of the relationship shown in Graph 1 above means that bargains were available at all prices and in all books, however the probability of buying a bargain yearling that subsequently performed well at the racecourse was maximised if you bought from amongst the more expensive Book 3 graduates.

Top Rated Selections: Often A Long Wait Between Drinks – Why?

Introduction

Tune in to Racing UK or ATR and the chances are the focus will be on picking the winner of the next race. The Racing Post has pages of form and commentary distilled into selections, naps and tips, typically resulting in one selection per race being made. Tipsters’ tables contain one selection per race, and Tom Segal’s Pricewise column in the Racing Post usually recommends one and occasionally two selections in a handful of Saturday races. For any gambler the key measure of success is the amount of money made or lost over a reasonable time period, and implicit in the various pieces of advice on offer is that one selection per race is the way to achieve gambling success. It seems obvious – there can only be one winner, I just need to find it! One of the consequences of making one selection per race is that you are maximising the chances of sustaining a long losing run. The volatility of your profits/losses is also maximised, as is the path dependency of your trading strategy. None of these are attractive characteristics. Apart from the effect on your bank balance, losing runs can lead to self-doubt as methodology and existence of a trading edge are questioned, yet the length of the losing run may be just noise, in line with what you might expect given the size of your trading edge. So what sort of losing runs might you expect given different degrees of edge over the market? In this blog post Monte Carlo Simulation (MC) is used to compare losing runs given different degrees of trading edge and at different odds.

Methodology

A ten runner race is set up with a set of book odds where the book sums to a 7% over-round. A rating is attached to each horse, and the true odds of each horse winning are defined to be a function of the book odds and its rating. The function works so that highly rated horses have lower true odds than the book odds and vice versa for lowly rated horses. One of the parameters in the function is the degree to which the ratings have an edge over the market. The greater the edge the more the book odds are adjusted. The approach is Bayesian in nature. The ratings used are arbitrary – they express in numerical form the likelihood of a particular horse winning – the results presented here are not specific to the use of rating systems. Implicit in any bet placed by a gambler in a probabilistic setting is a set of underlying decisions based upon preferences or rankings that can be thought of as a set of ratings, even if they aren’t expressed as such.

Monte Carlo methods are used to run the race 30,000 times (defined as one simulation, this is roughly equivalent to betting on 15 races a week for 40 years) using the true odds, as defined earlier, to determine the probability of each horse winning. If the winner coincides with the horse that is also top rated, the gambler wins. The book odds associated with the top rated selection and the level of edge are kept constant per simulation run. The process is repeated so that simulations are run at 4 different book odds and 4 levels of edge, to give 16 simulations in total. The book odds chosen are evens, 3/1, 6/1 and 9/1 and the levels of edge are chosen to correspond to differing levels of Return on Capital (RoC) of 10%, 5%, 0% (break-even) and -7%. The latter case represents someone with no edge whose losses over time equal the book over-round.

Relationship Between Edge,  Book Odds and True Odds

Table 1 below gives the relationship between the odds at which you back and the true odds for given levels of edge. So backing at 6/1 with a 10% edge represents true odds of 5.3/1. At a 5% edge backing a 3/1 shot represents true odds of 2.8/1, and backing an even money shot with a 10% edge has true odds of 4/5. The difference between book and true odds is small and sets the context for the analysis that follows. Whilst not the subject of this blog post, tables such as this can be used to give trigger levels at which bets become interesting for a given level of perceived trading edge.

Book Odds with 7% over-round 10% edge 5% edge break-even no edge
evens 0.8 0.9 1.0 1.1
3/1 2.6 2.8 3.0 3.3
6/1 5.3 5.6 6.0 6.5
9/1 8.1 8.5 8.9 9.7

Table 1: Book odds and true odds for differing levels of edge
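
The function used to derive Table 1 is not given in the text. A simple reading, offered here only as an assumption, is that the true win probability is whatever makes a level-stakes bet at the book price return the stated RoC. The Python sketch below follows that reading; `true_odds` is a hypothetical name.

```python
def true_odds(book_odds, roc):
    """True fractional odds implied by backing at `book_odds` (fractional)
    with a Return on Capital of `roc`, e.g. 0.10 for a 10% edge.

    Assumption of this sketch: expected return per unit staked equals
    1 + roc, so win probability = (1 + roc) / (1 + book_odds).
    """
    win_prob = (1 + roc) / (1 + book_odds)
    return 1 / win_prob - 1

for label, odds in [("evens", 1), ("3/1", 3), ("6/1", 6), ("9/1", 9)]:
    row = [round(true_odds(odds, roc), 1) for roc in (0.10, 0.05, 0.0, -0.07)]
    print(label, row)
```

Under this assumption the figures agree with Table 1 to within roughly 0.1 in every cell, which suggests the simplification is close to, though not identical with, the rating-based function used for the simulations.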

Relationship Between Edge, Book Odds and Losing Run Length

Table 2 below gives the maximum losing run that resulted from each simulation. The longest losing run experienced from betting at constant odds of evens with a 10% edge was 14 races, at 9/1 with a 10% edge 80 races. The reason it is often a long wait between drinks for top rated selections is the size of the trading edge compared with the odds at which horses are backed. Since the numbers in Table 2 represent the extreme case of the simulation, the length of losing run that occurs 5% of the time is reported in Table 3. Note how the length of losing run changes little with edge. If you typically bet at 6/1 and think you have a 5% edge, and you are on your 17th losing wager, there are no obvious signs from Tables 2 and 3 that you are experiencing anything other than a losing run that occurs one time in twenty. If Pricewise has a 10% edge and gives 3 selections a week all at 9/1, these results suggest that at worst he could go half a year without selecting a winner. Note that in practice gamblers will be betting wherever value is perceived regardless of book odds, and the fixing of odds across all simulations is artificial. However it would be straightforward to weight the results to reflect the proportion of bets you typically placed at various odds.

In finance one criterion used to judge the quality of returns delivered by investment managers is the Sharpe Ratio. This penalises returns by the volatility of the return stream. Inspection of Table 3 shows that the highest Sharpe Ratio would come from betting even money shots. To emphasise, there is no suggestion that betting even money represents greater value than betting at bigger odds. The simulations are set up so the Return on Capital achieved is the same, and the value inherent in the even money shot is the same as in the 6/1 shot. However the path to terminal wealth followed by betting at evens is inherently less volatile than betting at bigger odds.
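
A rough per-bet version of this comparison can be sketched in Python. It treats each level-stakes bet as a single return with a zero risk-free rate, and takes the win probability implied by the chosen RoC; both are simplifying assumptions of this example and `per_bet_sharpe` is a hypothetical name.

```python
import math

def per_bet_sharpe(book_odds, roc):
    """Mean profit per unit staked divided by its standard deviation."""
    win_prob = (1 + roc) / (1 + book_odds)  # probability implied by the RoC
    mean = roc                              # win: +book_odds, lose: -1
    std = (1 + book_odds) * math.sqrt(win_prob * (1 - win_prob))
    return mean / std

for label, odds in [("evens", 1), ("3/1", 3), ("6/1", 6), ("9/1", 9)]:
    print(label, round(per_bet_sharpe(odds, 0.10), 3))
```

With the same 10% RoC at every price the mean is identical, so the ratio falls as the odds lengthen purely because the spread of outcomes widens.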

Book Odds with 7% over-round 10% edge 5% edge break-even no edge
evens 14 14 16 19
3/1 24 27 27 27
6/1 59 59 60 65
9/1 80 80 80 80

Table 2: Book odds and maximum losing runs  for differing levels of edge

Book Odds with 7% over-round 10% edge 5% edge break-even no edge
evens 3.2 3.5 3.8 4.2
3/1 8.7 9.3 9.8 10.6
6/1 16.9 17.8 18.8 20.3
9/1 25.2 26.4 27.9 30.3

Table 3: Book odds and 95% probability losing runs
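
A stripped-down version of the losing run simulation can be sketched in Python. It is not the ten runner, rating-based set-up described in the Methodology: it simply draws independent win/lose outcomes for the top rated selection, with the win probability implied by a 10% RoC at each price. The function names and the seed are assumptions of this example, so the run lengths it prints will not match Tables 2 and 3 exactly.

```python
import random

def longest_losing_run(results):
    """Length of the longest run of False (losing) entries in `results`."""
    longest = current = 0
    for won in results:
        current = 0 if won else current + 1
        longest = max(longest, current)
    return longest

def simulate(win_prob, n_races=30_000, seed=1):
    """Simulate `n_races` independent bets; True means the selection won."""
    rng = random.Random(seed)
    return [rng.random() < win_prob for _ in range(n_races)]

# Win probabilities implied by a 10% Return on Capital at each book price
# (an assumption of this sketch, not the rating-based function in the text).
for label, odds in [("evens", 1), ("3/1", 3), ("6/1", 6), ("9/1", 9)]:
    win_prob = 1.10 / (1 + odds)
    print(label, longest_losing_run(simulate(win_prob)))
```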

Relationship Between Edge, Book Odds and Time to Last Cumulative Loss

The results presented so far are unaffected by staking plans. In Table 4 below the number given represents the last race in the simulation at which cumulative profits are negative. This gives a sense of the number of races needed for the signal inherent in the edge to outweigh the noise. For level stakes betting at 6/1 with a 10% edge, cumulative profits are never negative after the 2,324th race. Note the substantial step up in the wait for cumulative profitability when betting at 9/1 compared with 6/1 at the 5% edge level, and when betting at 6/1 compared with 3/1 at the 10% edge level. The results highlight the increased path dependency inherent in betting at higher odds. The range of possible outcomes is such that it can take much longer to move into positive cumulative profitability.

Note that employing staking plans such as the Kelly Criterion would potentially improve this level staking result so that the race numbers were lower, particularly for the higher odds results presented. However, since the cumulative profits/losses will have meandered around zero, the effect on the broad thrust of the conclusion reached is likely to be small.

Book Odds with 7% over-round 10% edge 5% edge
evens              12            432
3/1            628         2,673
6/1         2,324         2,919
9/1         2,923         8,444

Table 4: Book odds and the number of the last race at which cumulative profits were negative
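
The quantity reported in Table 4 can be sketched in Python for a single simulated path at level stakes. As in the other sketches, outcomes are drawn independently with a supplied win probability rather than from the rating-based race model, and `last_negative_race` is a hypothetical name.

```python
import random

def last_negative_race(book_odds, win_prob, n_races=30_000, seed=1):
    """1-based number of the last race after which cumulative level-stakes
    profit is below zero, or 0 if it is never below zero."""
    rng = random.Random(seed)
    profit, last_negative = 0.0, 0
    for race in range(1, n_races + 1):
        # A winner pays the book odds; a loser costs the one-unit stake.
        profit += book_odds if rng.random() < win_prob else -1.0
        if profit < 0:
            last_negative = race
    return last_negative

# Example: 6/1 with the win probability implied by a 10% RoC (an assumption).
print(last_negative_race(6, 1.10 / 7))
```

Because the result depends on one particular path, changing the seed can move the answer considerably, which is itself an illustration of the path dependency discussed above.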

Conclusions

If you choose to bet on the horse that represents your top pick in a race, and you adopt this as a betting approach over many races, you are maximising the total profits you can expect to accrue over time. However this approach has costs associated with it. Whilst maximising expected total profits, you are also maximising both the volatility of your trading profits and exposure to path dependency.

Losing run length is primarily driven by the odds at which you back horses. It is difficult to identify that you have lost your edge in the middle of a losing run because losing run length is primarily driven by the odds at which you bet rather than the size of your betting edge. What may appear to be a loss of ability could be an unlucky run that is merely noise. The reason it is often a long wait between drinks for top rated selections is the size of the trading edge compared with the odds at which horses are backed.

Betting at shorter prices reduces trading profit volatility, path dependency and losing run length. Splitting your stake across more than one selection in a race will (subject to your edge being similar across all runners in a race) increase the probability that your edge will be reflected in your trading profits. These profits will not be as large as if you had made one winning selection, however what you make will be made far more often. Betting on a number of horses in a race effectively creates one shorter priced aggregate bet. This has a number of attractive features – it reduces losing run length, reduces trading profit volatility and reduces exposure to path dependency. The cost of this approach is that over the long run total profits will be less than betting on one selection only. The trade-off between the two approaches is interesting. Given the associated drawbacks, it is surprising the one selection per race approach appears to be so little questioned and so popular.
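
One way to see how several selections combine into a single shorter priced bet is to split the stake so that every selection returns the same amount, a method usually called dutching. The text does not specify a staking split, so equal-return staking is an assumption of this Python sketch, and `dutch` is a hypothetical name.

```python
def dutch(fractional_odds, total_stake=1.0):
    """Split `total_stake` so each selection returns the same amount.

    Returns (stakes, aggregate fractional odds of the combined bet).
    """
    implied = [1 / (1 + odds) for odds in fractional_odds]  # implied probabilities
    total_implied = sum(implied)
    stakes = [total_stake * p / total_implied for p in implied]
    aggregate_odds = 1 / total_implied - 1
    return stakes, aggregate_odds

stakes, combined = dutch([3, 6])  # back a 3/1 shot and a 6/1 shot together
print([round(s, 3) for s in stakes], round(combined, 2))
```

Backing a 3/1 shot and a 6/1 shot this way behaves like a single bet at about 1.55/1: it wins more often than either selection alone, but pays less when it does.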

Jamie Spencer – Riding Style & Results: What Does The Data Tell Us?

Introduction

The recent criticism by Luca Cumani of two of Jamie Spencer’s rides this year on Mount Athos (“it’s on record that he was given two very bad rides”) has caused a good deal of comment and publicity. Simon Holt devotes his column in the Racing Post Weekender this week (25th September edition) to a discussion of Jamie Spencer’s riding style, concluding that “this is a jockey with a bit of star quality and his career record provides impressive defence against the critics”. As Simon Holt points out, the hold up style he adopts can lend itself to criticism if a horse is perceived as being delivered too late, such as his recent ride on York Glory in the 2013 renewal of the Beverley Bullet. However much of the criticism levelled appears to be founded on one or two rides, rather than on his performance over many rides. In this blog post all of Jamie Spencer’s rides in the 2013 flat season (to 25th September) were examined in terms of riding style, Impact Values and ratings. The rides of a number of other jockeys (Ryan Moore, Richard Hughes, James Doyle and Joe Fanning) were also examined. With each of these jockeys having had over 400 rides this season to date, there is plenty of data to interrogate.

Definition of Running Style/Early Pace Position

The analysis that underpins this piece was carried out in the R statistical environment accessing Raceform Interactive (RFI) data. This is the same data used by the Racing Post. In-running comments were used to identify the Early Pace Position (EPP) adopted by each horse in each race contained within the database. In this blog post the terms EPP and ‘running style’ are used interchangeably. Five categories of running style were defined: leading (1), prominent (2), midfield (3), held up (4) and in rear (5). Armed with an EPP by horse by race, the most frequently adopted running style by each horse can be identified. These EPPs can be used in conjunction with the remainder of the information contained within the RFI database to examine the relationship between running style and jockey performance. Horses had to have run at least 3 times for an EPP to be assigned to the horse, so if for example, Orfevre ran three times, twice in the lead (style 1) and once prominently (style 2), he’d be assigned an EPP of 1. After this parsing exercise we have the running style adopted by each horse in each race in which it took part, and the running style each horse has adopted most frequently in the past.
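
The assignment rule can be sketched in Python (the original analysis was carried out in R, and its code is not shown). The function name `usual_epp` is hypothetical, and breaking ties towards the more forward style is an assumption, since the text does not say how ties were resolved.

```python
from collections import Counter

def usual_epp(styles, min_runs=3):
    """Most frequent Early Pace Position (1-5) over a horse's runs, or None
    if the horse has fewer than `min_runs` runs."""
    if len(styles) < min_runs:
        return None
    counts = Counter(styles)
    best = max(counts.values())
    # Tie-break towards the more forward style (lower number): an assumption.
    return min(style for style, n in counts.items() if n == best)

print(usual_epp([1, 1, 2]))  # the Orfevre example from the text
print(usual_epp([4, 5]))     # too few runs, so no EPP is assigned
```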

Jockey Rides, Horse Ability and Starting Prices

To help put Jamie Spencer’s riding style in context the following jockeys were chosen for comparison: Ryan Moore, Richard Hughes, James Doyle and Joe Fanning. The first two are vying for champion jockey in 2013, James Doyle has recently been appointed Prince Khalid Abdullah’s jockey, whilst Joe Fanning is known for adopting front running tactics and should provide a contrast with the riding style adopted by Jamie Spencer. Table 1 gives information about the ability of horses ridden by each jockey (using the median rating across all rides) and betting market expectations using average and median Starting Prices (SPs). All rides in the 2013 flat season were considered. Ryan Moore rides horses with the most ability, posting a median RPR of 81, followed by Jamie Spencer, Richard Hughes, James Doyle and then Joe Fanning. Note that the SPs for Ryan Moore and Richard Hughes’s rides are close, suggesting the betting markets typically rate their chances similarly. Jamie Spencer comes next in terms of market expectations, with James Doyle last after Joe Fanning, even though, on average, Doyle rides more highly rated horses than Fanning.

Jockey Median Rating of Rides Average SP Median SP
J Fanning 69 10.4 7.0
R Hughes 76 5.7 4.0
J Doyle 73 12.7 7.5
J Spencer 78 8.7 6.0
R Moore 81 5.5 4.0

Table 1: Median ratings of rides and SP information for selected jockeys

Early Pace Position Profiles

The proportion of horses in each EPP category is given in Table 2, along with wins per category and associated Impact Values (IVs). Impact Value has its usual definition: the share of winners that came from a category divided by the share of runners in that category. The IV for front runners is 1.88. As is widely known, front runners win more frequently than horses with other running styles. IVs for EPP styles 2 (prominent) and 3 (mid-division) are similar at 0.94 and 0.99 respectively, with hold up horses performing somewhat worse at 0.83 and horses that race in rear reporting the lowest IV of 0.66. The IVs reported here by riding style suggest that the most important decision a jockey can take is whether to front run or not. After that, racing prominently or in mid-division has similar outcomes, whilst the further back you race from a midfield position, the less likely it is that you will win races. There is an important caveat here – the EPP adopted is not entirely in the jockey’s hands, but conditioned on a number of factors, only some of which are in his control. However we do know that on average horses do appear to have a favoured EPP, and this is useful for some of the analysis that follows.

EPP wins runs proportion IV
1 963 4466 12% 1.88
2 408 3787 10% 0.94
3 1714 15097 39% 0.99
4 814 8550 22% 0.83
5 479 6329 17% 0.66
TOTAL 4378 38229 100% 1.00

Table 2: EPP running styles, proportions and Impact Values
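
The Impact Values in Table 2 can be reproduced from the wins and runs columns with a few lines of Python. The sketch uses the standard definition (share of all wins divided by share of all runs); the variable and function names are chosen for this example.

```python
# (wins, runs) by Early Pace Position, copied from Table 2.
table2 = {1: (963, 4466), 2: (408, 3787), 3: (1714, 15097),
          4: (814, 8550), 5: (479, 6329)}

total_wins = sum(w for w, _ in table2.values())
total_runs = sum(r for _, r in table2.values())

def impact_value(wins, runs):
    """Share of all wins divided by share of all runs."""
    return (wins / total_wins) / (runs / total_runs)

for epp, (wins, runs) in table2.items():
    print(epp, round(impact_value(wins, runs), 2))
```

An IV above 1 means a category wins more often than its share of runners would suggest, and an IV below 1 means it wins less often.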

Jockey Riding Styles

Perceptions are borne out by the data – Jamie Spencer rides far fewer horses in mid-division than other jockeys, preferring to hold them up or ride them in rear. Table 3 takes every ride of each jockey and amalgamates by EPP. The differences in EPP adopted by Jamie Spencer are substantial compared with the other jockeys in the table. Note that he rides at least as many front runners and prominent horses as Ryan Moore and James Doyle. It is the mid-division category that he eschews, with more than half of his rides categorised as either held up or in rear.

Jockey EPP1 EPP2 EPP3 EPP4 EPP5
J Fanning 19% 13% 43% 16% 9%
R Hughes 17% 7% 35% 20% 21%
J Doyle 10% 6% 37% 29% 19%
J Spencer 13% 5% 25% 27% 31%
R Moore 10% 8% 40% 23% 20%

Table 3: Riding styles adopted by jockey

Does this result hold when checked against the most frequent riding style of the horses ridden by our jockeys? Table 4 shows there is some evidence that Jamie Spencer tends to ride more horses that have a hold up running style. However, this could have been caused by the fact that he might be the only jockey to have sat on the horse and thus contributed to its running style. This makes interpretation more difficult. On balance, however, comparing the riding proportions in Tables 3 and 4 shows that Jamie Spencer does appear to ride his mounts with more restraint than is usually the case.

Jockey EPP1 EPP2 EPP3 EPP4 EPP5
J Fanning 11% 15% 33% 20% 21%
R Hughes 6% 13% 31% 29% 22%
J Doyle 4% 10% 29% 32% 25%
J Spencer 4% 8% 31% 30% 27%
R Moore 6% 10% 33% 30% 20%

Table 4: Riding styles by horse

Relationship between Running Style (EPP) and Ratings Achieved

A measure that compares the rating of each run relative to the maximum rating the horse has achieved is defined as the Relative to Maximum – RTM. Table 5 below shows average RTM by running style. On average horses run ca. 18lb below their maximum rating. This is no surprise – ratings are negatively skewed – bounded on the upside by ability and the relatively rare confluence of a set of circumstances that allows a horse to achieve its maximum rating, and exposed to substantial downside as any number of events (going, draw, pace, opposition, trip and so on) cause horses to run below their best. Horses with a prominent running style (EPP 2) are most likely to perform below their best. Remember from Table 2 that horses that race prominently deliver lower IVs than those that race in mid-division. It is possible that the pressure of racing prominently conspires against these horses. The best RTM numbers reported are for horses that are held up or ridden in rear. Given the IVs for these categories are substantially lower than 1, a likely explanation for them running more closely to their maximum rating is that they are running on past beaten horses to be placed rather than winning. This has implications for their handicap ratings relative to their ability. In Tables 2 and 5 we have IVs and RTM values by running style classification for all races that took place on the flat in 2013. These tables give us a sense of how often horses win given their running style, and to what degree they run close to their maximum form. Now we turn to the same information at the individual jockey level.

Early Pace Position (EPP) Relative to Maximum (RTM) – average
1 -18.4
2 -19.0
3 -17.9
4 -17.7
5 -17.0

Table 5: RTM by EPP style category 
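
The RTM calculation can be sketched in Python as each run's rating minus the best rating that horse has posted. The two horses and their ratings are invented, `rtm_values` is a hypothetical name, and taking the maximum over all of a horse's runs in the sample is an assumption, as the text does not say over which period the maximum was measured.

```python
from statistics import mean

def rtm_values(runs):
    """`runs` maps horse name -> list of ratings, one per run. Returns each
    run's rating minus that horse's maximum rating (zero or negative)."""
    values = []
    for ratings in runs.values():
        best = max(ratings)
        values.extend(r - best for r in ratings)
    return values

# Invented ratings for two hypothetical horses.
runs = {"Horse A": [70, 85, 62], "Horse B": [95, 90]}
print(mean(rtm_values(runs)))
```

Because every horse contributes at least one zero (its best run) and the rest are negative, the average is always at or below zero, consistent with the negative skew described above.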

Jockey Performance: Impact Values & RTM Ratings

Two approaches to measuring jockey ability are those used by John Whitley, often mentioned by James Willoughby on Racing UK, and Timeform. In this blog post two measures already employed, Impact Values (IVs) and Relative to Maximum (RTM), are calculated at the jockey level. Impact Values by jockey by running style are reported in Table 6 below, RTMs by running style are reported in Table 7. Ryan Moore performs the best across both measures. On average his rides perform about 8lb better than average (-8lb vs -18lb) and ca. 3lb better than the other jockeys considered here. Particularly noteworthy is his performance on front runners, where he performs nearly 10lb better than average, with an IV of 3.9. Joe Fanning performs best when he rides front runners. Richard Hughes, James Doyle and Jamie Spencer perform similarly to each other based upon RTMs – about 5lb better than average, but ca. 3lb behind Ryan Moore. If Starting Prices and horse ability are considered, James Doyle performs particularly well. Perhaps the betting market has underestimated his abilities – if so, his recent appointment by Prince Khalid Abdullah and the likely increase in the quality of his mounts are likely to change this.

Turning to Jamie Spencer: he performs best on front running rides, delivering similar IVs to Richard Hughes and yet performing 2.5lb better on average. What of his hold up rides? Considering horses that are held up or ridden in rear (EPP 4 and 5), Jamie Spencer’s rides perform second only to Ryan Moore in terms of RTM. Yet the IVs for both of these categories are the second lowest of the jockeys considered here. There are a couple of interpretations. The first is that hold up horses are running into places, achieving respectable ratings and yet not winning. The second is that the horses are being ridden in a style that maximises their chances of running close to their maximum ratings, and the IVs will, over many more rides, reflect this.

Jockey EPP1 EPP2 EPP3 EPP4 EPP5
J Fanning 2.52 1.41 1.13 0.39 0.93
R Hughes 2.76 2.25 2.04 1.65 1.75
J Doyle 1.35 1.14 0.97 1.69 1.79
J Spencer 2.73 2.24 1.69 1.21 1.05
R Moore 3.90 1.74 2.21 1.34 2.07

Table 6: Impact Values by jockey by EPP classification

Table 7 below shows RTM averages by jockey by EPP category. A discussion of IVs and RTM by jockey follows Table 7.

Jockey EPP1 EPP2 EPP3 EPP4 EPP5
J Fanning -15.3 -20.4 -17.8 -16.4 -15.5
R Hughes -13.5 -11.8 -12.9 -13.0 -12.9
J Doyle -12.6 -15.1 -11.7 -13.1 -13.9
J Spencer -11.0 -14.7 -13.3 -11.6 -12.6
R Moore -8.5 -10.2 -9.8 -9.4 -10.0

Table 7: RTM by jockey by EPP classification
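
As a rough check on the "ca. 3lb behind Ryan Moore" comparison, the Table 7 figures can be averaged by jockey. The sketch below assumes a simple unweighted mean across the five EPP categories. The number of rides in each category is not reported here, so the published averages, which are presumably weighted by rides, may differ slightly.

```r
# RTM averages by jockey and EPP category, copied from Table 7
rtm <- rbind(
  "J Fanning" = c(-15.3, -20.4, -17.8, -16.4, -15.5),
  "R Hughes"  = c(-13.5, -11.8, -12.9, -13.0, -12.9),
  "J Doyle"   = c(-12.6, -15.1, -11.7, -13.1, -13.9),
  "J Spencer" = c(-11.0, -14.7, -13.3, -11.6, -12.6),
  "R Moore"   = c( -8.5, -10.2,  -9.8,  -9.4, -10.0)
)
colnames(rtm) <- paste0("EPP", 1:5)

# unweighted mean across EPP categories
# (an assumption: ride counts per category are not available)
rtm.mean <- rowMeans(rtm)

# gap in lb between Ryan Moore and each jockey
# (positive = Moore's rides run closer to their maximum rating)
gap.to.moore <- rtm.mean["R Moore"] - rtm.mean

print(round(rtm.mean, 1))
print(round(gap.to.moore, 1))
```

On this unweighted basis Richard Hughes, James Doyle and Jamie Spencer each sit between 3lb and 4lb behind Ryan Moore.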

Summary

  • Ryan Moore is viewed by many as the best rider in the UK – the analysis in this blog post supports this view.
  • James Doyle rides as well as Richard Hughes and Jamie Spencer and has done so on longer priced horses with less ability.
  • Jamie Spencer rides horses further back than their usual position in races, and in doing so enables them to run closer to their maximum rating. The data suggests riding further back is a matter of choice. Whilst riding horses further back typically compromises their chances of winning races to a degree, the Impact Values for Jamie Spencer’s hold up rides are significantly above the average and also greater than 1. However they are also below those reported for Richard Hughes, James Doyle and Ryan Moore. It is possible that over time and over many more rides, the fact that his mounts are running closer to maximum ratings will be reflected in higher Impact Values than delivered in the 2013 flat season.

Measuring Training Yard Success: Impact Values from Maiden, Handicap & Pattern Races

Introduction

The champion trainer for the season is decided using total prize money earned. This measure favours the very largest training yards, particularly those that have access to the offspring of top stallions. As a result it is a somewhat unsatisfactory measure of training yard success. Since Impact Values (IVs) correct for yard size by taking into account the number of runners as well as number of winners, the playing field between yards of differing sizes is, to a good degree, made level when this measure is used. Whilst there are also limitations with using this measure across all races and for all trainers, the net is cast wider. In this blog post IVs for different categories of race, namely maidens, handicaps and pattern races, are calculated, both raw and adjusted for Sire IVs (SA), then combined to produce a composite IV measure. Measuring IVs in different race categories enables a more complete picture of training yard success to be built. A by-product of the approach used is that trainers whose results are most and least influenced by the success of particular stallions can be identified.
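
To make the size correction concrete, here is a minimal sketch of an Impact Value as a yard's share of winners divided by its share of runners, the same form used in the R code posted with this piece. The yard names and figures are hypothetical and chosen only to show that two yards of very different size but equal strike rate receive the same IV.

```r
# Impact Value: share of all winners divided by share of all runners
impact.value <- function(wins, runs) {
  (wins / sum(wins)) / (runs / sum(runs))
}

# hypothetical yards: 'Large' and 'Small' both win with 15% of their runners
yards <- data.frame(
  yard = c("Large", "Small", "Rest"),
  wins = c(60, 6, 134),
  runs = c(400, 40, 1560)
)
yards$IV <- impact.value(yards$wins, yards$runs)
print(yards)
```

With an overall strike rate of 10% in this toy population, both the large and the small yard come out with an IV of 1.5, whereas a prize money table would separate them widely.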

Data & Methodology

The analysis that underpins this piece was carried out in the R statistical environment accessing Raceform Interactive data for the 2012 flat season. The R code is posted elsewhere for interested readers. To qualify for inclusion in the tables that follow, a training yard must have sent out at least 50 runners in handicaps and 100 runners in total over the course of the 2012 flat season, and be based in Great Britain (GB). A total of 139 yards met these criteria. These yards were then split into 2 groups according to how many different horses had been raced: 66 yards raced at least 40 different horses and are the focus of the analysis in this blog piece. The other 73 yards, smaller in size, were analysed separately and may be the subject of a further blog post. Since we know that on average larger yards deliver higher IVs than smaller yards (see my earlier blog post on this subject), smaller yards that perform well may not have appeared in the listings reported below and it is more appropriate to analyse their results separately.
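
The qualification rules can be written as a simple filter on a trainer summary table. The sketch below uses column names that follow the posted R code (`runs.hcaps`, `runs.all`, `country`, `runner.h`), but the four yards and their figures are invented for illustration. In the posted code `runs.all` is the sum of runs in maidens, handicaps and pattern races.

```r
# hypothetical trainer summary; column names follow the posted R code
trainers <- data.frame(
  tname      = c("Yard A", "Yard B", "Yard C", "Yard D"),
  country    = c("GB", "GB", "IRE", "GB"),
  runs.hcaps = c(120, 55, 200, 30),    # runs in handicaps
  runs.all   = c(300, 110, 400, 150),  # runs in maidens, handicaps and pattern races
  runner.h   = c(65, 22, 90, 35),      # number of different horses raced
  stringsAsFactors = FALSE
)

minruns  <- 50  # minimum handicap runs; total runs must be at least 2 * minruns
slcutoff <- 40  # yards racing at least this many horses are treated as larger yards

qualifies <- trainers$runs.hcaps >= minruns &
  trainers$runs.all >= 2 * minruns &
  trainers$country == "GB"
qualified <- trainers[qualifies, ]

large <- qualified[qualified$runner.h >= slcutoff, ]
small <- qualified[qualified$runner.h <  slcutoff, ]

print(large$tname)
print(small$tname)
```

Here Yard C is excluded as it is not GB based and Yard D has too few handicap runs, leaving one larger and one smaller yard.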

Impact Values – Maidens

Maiden race IVs are likely to favour large yards with access to potential pattern class horses. Table 1 shows the top 10 yards ranked by sire adjusted IV in maidens. Raw IVs are also reported. Note the dominance of the Richard Hannon yard in terms of winners and runners, and the small difference between its raw and sire adjusted IVs compared with the larger differences between IVs for Saeed bin Suroor and William Haggas. The large number of horses at the Hannon yard appears to confer a substantial advantage in being able to place horses to good effect within maidens. The same comments apply to Richard Fahey’s results. In both yards the large number of horses at their disposal appears to outweigh any advantage given to other yards via ostensibly better bred horses.

Rank Trainer wins runs IV raw IV SA
1 Mrs K Burke 14 49 2.70 3.02
2 Saeed bin Suroor 33 121 2.58 1.93
3 Peter Chapple-Hyam 14 68 1.95 1.81
4 William Haggas 42 182 2.18 1.73
5 Henry Candy 9 63 1.35 1.63
6 Richard Hannon 82 471 1.64 1.58
7 Jeremy Noseda 18 98 1.74 1.57
8 John Quinn 7 39 1.70 1.55
9 Richard Fahey 35 225 1.47 1.49
10 David Simcock 20 108 1.75 1.42

Table 1: Top 10 training yards by Sire adjusted IV in maidens
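
A raw IV is a yard's strike rate divided by the strike rate of all runners in the category, so the figures in Table 1 can be used to back out the overall maiden strike rate. The short check below copies the wins, runs and raw IVs from Table 1; nothing else is assumed.

```r
# wins, runs and raw IV copied from Table 1
maidens <- data.frame(
  trainer = c("Mrs K Burke", "Saeed bin Suroor", "Peter Chapple-Hyam",
              "William Haggas", "Henry Candy", "Richard Hannon",
              "Jeremy Noseda", "John Quinn", "Richard Fahey", "David Simcock"),
  wins   = c(14, 33, 14, 42, 9, 82, 18, 7, 35, 20),
  runs   = c(49, 121, 68, 182, 63, 471, 98, 39, 225, 108),
  IV.raw = c(2.70, 2.58, 1.95, 2.18, 1.35, 1.64, 1.74, 1.70, 1.47, 1.75)
)

# strike rate divided by raw IV should recover the same
# overall strike rate for every yard, up to rounding of the IVs
maidens$strike   <- maidens$wins / maidens$runs
maidens$baseline <- maidens$strike / maidens$IV.raw
print(round(maidens$baseline, 3))
```

Every row implies an overall strike rate of about 0.106, so an IV of 2.0 in maidens corresponds to winning with roughly one runner in five.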

Impact Values – Handicaps

Table 2 shows the top 10 yards ranked by Sire adjusted IV in handicaps. Raw IVs are also reported. Sir Mark Prescott Bt tops the table, although in common with the majority of the trainers in the top 10 his Sire adjusted IV is substantially lower than his raw IV. Noteworthy are the results of Chris Wall and Michael Appleby, whose IVs are hardly affected by the relative success of the sires of their horses in training. Part of this result is due to their lack of relative success in maidens, suggesting their horses are likely to be highly competitive when they move out of maidens into handicap company: Chris Wall’s IV in maidens was 0.43, whilst Michael Appleby sent out no maiden winners in 2012. In contrast Sir Mark Prescott Bt, along with 7 other trainers, delivered IVs above 1 in both maiden and handicap company. The other 7 were Marcus Tregoning, Luca Cumani, Sir Michael Stoute, Ed Dunlop, James Fanshawe, Roger Varian and Mick Channon.

Rank Trainer wins runs IV raw IV SA
1 Sir Mark Prescott Bt 31 131 2.45 1.98
2 Marcus Tregoning 16 83 1.99 1.88
3 Sir Michael Stoute 25 115 2.25 1.83
4 Luca Cumani 24 129 1.92 1.75
5 Chris Wall 18 104 1.79 1.74
6 Roger Varian 28 155 1.87 1.61
7 Peter Chapple-Hyam 10 63 1.64 1.53
8 William Haggas 30 166 1.87 1.52
9 Michael Appleby 24 162 1.53 1.52
10 Tom Dascombe 33 211 1.62 1.51

Table 2: Top 10 training yards by Sire adjusted IV in handicaps

Impact Values – Pattern Races

Table 3 shows the top 20 yards ranked by Sire adjusted IV in pattern races. Raw IVs are also reported. The results are more difficult to interpret than maidens and handicaps for individual trainers because of small sample sizes. The Richard Hannon and John Gosden yards dominate the table in terms of number of winners and runners. However, the Sire adjusted IVs for both trainers are noticeably lower than their raw IVs. It is possible this result is an artefact created by their substantial relative success in producing pattern class winners during the 2012 flat season. A number of yards that perform well on the IV measure in maiden company do not appear in the table below.

Rank Trainer wins runs IV raw IV SA
1 Ann Duffield 2 4 4.47 6.12
2 Alan McCabe 1 5 1.79 2.92
3 David Simcock 2 14 1.28 2.05
4 Sir Henry Cecil 16 60 2.39 1.89
5 David O’Meara 4 20 1.79 1.89
6 Roger Charlton 10 46 1.94 1.84
7 David Barron 2 13 1.38 1.58
8 Roger Varian 9 52 1.55 1.56
9 Sir Michael Stoute 6 41 1.31 1.42
10 Richard Fahey 9 81 0.99 1.36
11 Mrs K Burke 1 12 0.75 1.32
12 Chris Wall 2 15 1.19 1.27
13 Clive Cox 7 35 1.79 1.27
14 Henry Candy 1 11 0.81 1.23
15 Richard Hannon 21 139 1.35 1.16
16 Mark Johnston 8 61 1.17 1.15
17 Marcus Tregoning 2 17 1.05 1.15
18 Luca Cumani 4 26 1.38 1.10
19 John Gosden 23 130 1.58 1.09
20 Mahmood Al Zarooni 9 69 1.17 1.06

Table 3: Top 20 training yards by Sire adjusted IV in pattern races

Impact Values – Composite Measure

A composite IV is calculated by combining the IVs for maidens, handicaps and pattern races by trainer, weighting by the proportion of runs that each trainer had in each category. Thus a trainer without runners in pattern races would not be penalised for his non-participation, and the biggest contributor to each trainer’s IV is from the category of race in which they had the biggest proportion of runners. The composite measure was also adjusted for Sire IV. Using this measure Sir Mark Prescott Bt was the top trainer on the flat in 2012, followed by William Haggas and Marcus Tregoning. Noteworthy results were produced by Henry Candy, David Barron, Michael Appleby and Chris Wall, each of whom saw their IV increase after taking the Sire IV adjustment into account. For the other 16 of the 20 trainers we see the opposite, suggesting that the adjustment for bloodstock quality used here via a Sire adjusted IV does not go far enough. I will return to this subject in another blog article. Thanks to Declan Meagher and others for making this point on the separate blog post “Do Small Training Yards Punch Above Their Weight?”.

Rank Trainer IV raw IV SA
1 Sir Mark Prescott Bt 1.81 1.55
2 William Haggas 1.87 1.55
3 Marcus Tregoning 1.62 1.53
4 Saeed bin Suroor 1.86 1.52
5 Roger Varian 1.76 1.51
6 Peter Chapple-Hyam 1.59 1.51
7 Henry Candy 1.31 1.49
8 Sir Michael Stoute 1.91 1.48
9 Sir Henry Cecil 1.86 1.42
10 David Barron 1.25 1.42
11 Richard Hannon 1.52 1.41
12 Luca Cumani 1.59 1.40
13 Jeremy Noseda 1.53 1.39
14 Mrs K Burke 1.39 1.38
15 Michael Appleby 1.29 1.36
16 Chris Wall 1.34 1.35
17 Ralph Beckett 1.64 1.34
18 Roger Charlton 1.49 1.30
19 Tom Dascombe 1.37 1.29
20 David Simcock 1.34 1.27

Table 4: Top 20 training yards by composite IV adjusted for Sire
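
The composite is a run-weighted average of the category IVs for each trainer. The sketch below shows the weighting for a single hypothetical trainer with no pattern-race runners; the function name `composite.iv` and the figures are illustrative only. For the Sire adjusted composite, the posted R code uses the Sire adjusted run counts as the weights.

```r
# run-weighted composite of the category IVs for one trainer
composite.iv <- function(iv, runs) {
  sum(iv * runs) / sum(runs)
}

# hypothetical trainer with no pattern-race runners
iv   <- c(maidens = 1.20, handicaps = 1.60, pattern = 0)
runs <- c(maidens = 50,   handicaps = 150,  pattern = 0)

# (1.20 * 50 + 1.60 * 150) / 200
composite.iv(iv, runs)
```

Because the pattern category carries zero weight, the result of 1.5 depends only on the maiden and handicap IVs, with handicaps counting three times as heavily as maidens.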

Training Yards Success & Relationship with Sire Quality

How many training yards are able to deliver improved IVs after the Sire adjustment is taken into account? Remember that for successful yards the natural direction for the Sire adjustment to take the IV is downwards. This is because the better quality Sires make an outsized contribution in terms of siring winners. So the yards that are able to increase their IVs after this adjustment is applied are worthy of note. There are 10 yards out of the 66 (see Table 5 below) that were able to deliver an adjusted composite IV both greater than 1 and higher than their raw composite IV. Henry Candy and David Barron’s results are noteworthy.

Rank Trainer IV comp IV comp SA Difference
1 Henry Candy 1.31 1.49 0.18
2 David Barron 1.25 1.42 0.18
3 Michael Appleby 1.29 1.36 0.07
4 Chris Wall 1.34 1.35 0.02
5 Brian Ellison 1.25 1.26 0.01
6 Kevin Ryan 1.11 1.22 0.11
7 James Given 1.01 1.15 0.14
8 John Quinn 1.12 1.15 0.03
9 Marco Botti 1.06 1.06 0.01
10 Alan Swinbank 1.00 1.02 0.01

Table 5: Top 10 trainers with improved IVs after Sire adjustment ranked on Sire adjusted IV
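
The direction of the Sire adjustment can be illustrated with a toy example. In the posted R code each run is weighted by the IV of the runner's sire, and the Sire adjusted IV divides the share of winners by the share of these weighted runs. In the sketch below the three trainers, their results and the sire IVs are all hypothetical; the sire IVs are simply given rather than calculated from the toy data.

```r
# hypothetical runs: one row per run, with the IV of the runner's sire
runs <- data.frame(
  trainer = rep(c("A", "B", "C"), each = 4),
  winner  = c(1, 1, 0, 0,   1, 1, 0, 0,   0, 0, 0, 0),
  sire.IV = c(0.5, 0.5, 1.0, 1.0,   1.5, 1.5, 1.0, 1.0,   1.0, 1.0, 1.0, 1.0)
)
runs$runner    <- 1
runs$runner.SA <- runs$runner * runs$sire.IV  # a run by a well bred horse counts for more

wins      <- tapply(runs$winner,    runs$trainer, sum)
n.runs    <- tapply(runs$runner,    runs$trainer, sum)
n.runs.SA <- tapply(runs$runner.SA, runs$trainer, sum)

IV.raw <- (wins / sum(wins)) / (n.runs / sum(n.runs))
IV.SA  <- (wins / sum(wins)) / (n.runs.SA / sum(n.runs.SA))

print(round(cbind(IV.raw, IV.SA), 2))
```

Trainers A and B have the same raw IV of 1.5, but A's runners are by less successful sires, so A's IV rises to 2.0 after adjustment while B's falls to 1.2. This is the pattern that separates the yards in Table 5 from most of the other successful yards.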

What of yards that see falls in their IVs after the Sire adjustment is applied? Table 6 ranks the 10 training yards most affected by the Sire IV adjustment. These yards are still highly successful – they still post IVs substantially greater than 1. However, using this metric suggests that these training yards are more reliant than others on the quality of their bloodstock for their success.

Rank Trainer IV comp IV comp SA
57 Roger Varian 1.76 1.51
58 Sir Mark Prescott Bt 1.81 1.55
59 James Fanshawe 1.50 1.24
60 Ralph Beckett 1.64 1.34
61 Mahmood Al Zarooni 1.51 1.19
62 William Haggas 1.87 1.55
63 Saeed bin Suroor 1.86 1.52
64 John Gosden 1.65 1.24
65 Sir Michael Stoute 1.91 1.48
66 Sir Henry Cecil 1.86 1.42

Table 6: Bottom 10 trainers with reduced IVs after Sire adjustment

Summary

In this blog post the criterion used for measuring training yard success is a Sire Adjusted Impact Value derived from results delivered in maidens, handicaps and pattern races. Using this measure Sir Mark Prescott Bt was the top trainer on the flat in 2012. It is probable the Sire IV adjustment used does not go far enough in terms of correcting for quality and another blog post will address this point. A small number of trainers produce IVs that improve after an adjustment for Sire quality is made. These training yards are of particular interest.

Measuring Training Yard Success: R code

####################################################################
#
#
# Measuring Training Yard Success: Impact Values from Maidens, Handicaps and Pattern Races
# J. Hathorn
#
# v1.0
#
#
# written 18-Sep-13
#
#
###################################################################

#rm(list=ls())

library(foreign) # provides read.dbf
#library(maptools) # not used by any of the code below

# read in database files from RI
#
setwd("C:/Program Files (x86)/RaceForm Interactive")

RIhorse.data <-read.dbf("horse.dbf")
RIouting.data <-read.dbf("outing.dbf")
RIrace.data <-read.dbf("race.dbf")
RIsire.data <-read.dbf("sire.dbf")
RItrainer.data <- read.dbf("trainer.dbf")
RIcourse.data<-read.dbf("course.dbf")

# #############################################################
# set date parameters to focus on races between chosen dates
# flat season Lincoln to the November Handicap
chosenDateSt<-c("2012-03-31")
chosenDateEd<-c("2012-11-10")

# set dates for determining yard sizes, set the previous year to the November Handicap
#chosenDateSt1<-c("2011-11-11")
#chosenDateEd1<-c("2012-11-10")
#################################################
#
# extract GB course id list from course db
z<-which(RIcourse.data$CCOUNTRY == "GB")

GBcourseids<-RIcourse.data$CID[z]
GBcoursenames<-RIcourse.data$CNAME[z]
#
# extract GB/IRE trainer lists from trainer db
z<-which(RItrainer.data$TCOUNTRY == "GB")
GBtrainers<-RItrainer.data$TID[z]
z<-which(RItrainer.data$TCOUNTRY == "IRE")
IREtrainers<-RItrainer.data$TID[z]
GBIREtrainers<-append(GBtrainers,IREtrainers)
#
#
##################################################

# select outings on the flat between the chosen dates
tmpidx<-which(RIouting.data$ODATE>=chosenDateSt & RIouting.data$ODATE<=chosenDateEd & RIouting.data$OFJ=="F")
T1.data<-RIouting.data[tmpidx,]

# match the course and add a country variable
z<-match(T1.data$OCOURSEID,RIcourse.data$CID)
T1.data$COCOUNTRY<-NA
T1.data$COCOUNTRY<-RIcourse.data$CCOUNTRY[z]

T1a.data<-T1.data
# select outings that took place on GB courses and append the 2 d f
tmpidx<-which(T1.data$COCOUNTRY == "GB")
T1.data<-T1.data[tmpidx,]

# #######################
# GB races only
# code to include IRE races if so desired
#tmpidx<-which(T1a.data$COCOUNTRY == "IRE")
#T2.data<-T1a.data[tmpidx,]
#T1.data<-rbind(T1.data,T2.data)

# ######################
# age restriction if required
# reduce T1 to horses of the desired age given by the parameter agecheck
#agecheck<-2
#tmpidx<-which(T1.data$OAGE==agecheck)
#T1.data<-T1.data[tmpidx,]

# match each horse into the RIhorse d f to get the sire id
z<-match(T1.data$OHORSEID,RIhorse.data$HID)
T1.data$SIREID<-RIhorse.data$HSIREID[z]

# attach some of the race conditions ie age, stakes, handicap etc from the RIrace d f
z<-match(T1.data$ORACEID,RIrace.data$RID)
T1.data$RCOND<-RIrace.data$RCOND[z]
T1.data$RAGE<-RIrace.data$RAGE[z]
T1.data$RANIMAL<-RIrace.data$RANIMAL[z]
T1.data$RPATTERN<-RIrace.data$RPATTERN[z]
T1.data$RISHCAP<-RIrace.data$RISHCAP[z]

# set a 1/0 variable for winners and a 1 variable for runners will help aggregation later
T1.data$runner<-1
T1.data$winner<-0
z<-which(T1.data$OPOS==1)
T1.data$winner[z]<-1

# SP turned into a probability
T1.data$SPprob<-1/(1+T1.data$OSPVAL)

# reformat the ORF rating variable
T1.data$ORF<-as.character(T1.data$ORF)
T1.data$ORF<-gsub("\\?","",T1.data$ORF)
T1.data$ORF<-gsub("\\+","",T1.data$ORF)
T1.data$ORF<-as.numeric(T1.data$ORF)
tmpidx<-which(T1.data$ORF==0)
T1.data$ORF[tmpidx]<-NA
# reformat the OJC rating variable
T1.data$OJC<-as.character(T1.data$OJC)
T1.data$OJC<-gsub("\\?","",T1.data$OJC)
T1.data$OJC<-gsub("\\+","",T1.data$OJC)
T1.data$OJC<-as.numeric(T1.data$OJC)
tmpidx<-which(T1.data$OJC==0)
T1.data$OJC[tmpidx]<-NA

####################################################
#
# produce summary variables by horse – runs/wins/ratings etc
#
hid<-tapply(T1.data$OHORSEID,T1.data$OHORSEID,mean,na.rm=TRUE)
hruns<-tapply(T1.data$runner,T1.data$OHORSEID,sum,na.rm=TRUE)
hwins<-tapply(T1.data$winner,T1.data$OHORSEID,sum,na.rm=TRUE)
hwinner<-pmin(1,hwins)
hrunner<-pmin(1,hruns)
hORmax<-tapply(T1.data$OJC,T1.data$OHORSEID,max,na.rm=TRUE)
hRFmax<-tapply(T1.data$ORF,T1.data$OHORSEID,max,na.rm=TRUE)
z<-which(hORmax==-Inf)
hORmax[z]<-NA
z<-which(hRFmax==-Inf)
hRFmax[z]<-NA

z<-match(hid,T1.data$OHORSEID)
htrainerid<-T1.data$OTRAINERID[z]
z<-match(hid,RIhorse.data$HID)
hname<-RIhorse.data$HNAME[z]
# put these into a d f
HSummary<-data.frame(hid,hname,htrainerid,hruns,hwins,hwinner,hrunner,hORmax,hRFmax)

# ###############
# produce population wide summary stats
univ.OR.med<-median(HSummary$hORmax,na.rm=TRUE)
univ.RF.med<-median(HSummary$hRFmax,na.rm=TRUE)
univ.RF.sd<-sd(HSummary$hRFmax,na.rm=TRUE)
univ.RF.1sdup<-univ.RF.med+univ.RF.sd
univ.winners<-sum(HSummary$hwinner)
univ.runners<-sum(HSummary$hrunner)
univ.winpct<-univ.winners/univ.runners

HSummary$hRF1sdup<-0
z<-which(HSummary$hRFmax>univ.RF.1sdup)
HSummary$hRF1sdup[z]<-1
univ.RF.1sduppct<-sum(HSummary$hRF1sdup)/univ.runners

# ######################################################
#
# now take the horse summary df and produce a trainer summary based upon the horse summary d f

trainer.h<-tapply(HSummary$htrainerid,HSummary$htrainerid,mean,na.rm=TRUE)
wins.h<-tapply(HSummary$hwins,HSummary$htrainerid,sum,na.rm=TRUE)
runs.h<-tapply(HSummary$hruns,HSummary$htrainerid,sum,na.rm=TRUE)
winspct.h<-wins.h/runs.h
winner.h<-tapply(HSummary$hwinner,HSummary$htrainerid,sum,na.rm=TRUE)
runner.h<-tapply(HSummary$hrunner,HSummary$htrainerid,sum,na.rm=TRUE)
winpct.h<-winner.h/runner.h
#ORmax.med.h<-tapply(HSummary$hORmax,HSummary$htrainerid,median,na.rm=TRUE)
RFmax.med.h<-tapply(HSummary$hRFmax,HSummary$htrainerid,median,na.rm=TRUE)
RFmax.sd.h<-tapply(HSummary$hRFmax,HSummary$htrainerid,sd,na.rm=TRUE)
RFmax.up1sd.h<-RFmax.med.h+RFmax.sd.h
RF.1sdup.h<-tapply(HSummary$hRF1sdup,HSummary$htrainerid,sum,na.rm=TRUE)
RF.1sduppct.h<-RF.1sdup.h/runner.h

TrainerHorses<-data.frame(trainer.h,wins.h,runs.h,winspct.h,winner.h,runner.h,winpct.h,RFmax.med.h,RFmax.sd.h,RFmax.up1sd.h,RF.1sdup.h,RF.1sduppct.h)
z<-match(TrainerHorses$trainer.h,RItrainer.data$TID)
TrainerHorses$tname.h<-RItrainer.data$TSTYLENAME[z]

# write out this d f to a CSV file
#fname<-"c:/Racing Research/Trainer Research/trainerhorses.csv"
#write.csv(TrainerHorses,file=fname)

# ####################################################
#
# now go back to the Outing d f and split into race categories to get IVs etc by trainer
#
# split the races into different data frames
# Maidens
# Handicaps
# Pattern

# ###################################################
#
# put maidens into their own d f
z<-which(T1.data$RANIMAL=="MDN")
T2.data<-T1.data[z,]

# set up Sire IVs in maidens
sire.ID<-tapply(T2.data$SIREID,T2.data$SIREID,mean,na.rm=TRUE)
sire.wins <- tapply(T2.data$winner,T2.data$SIREID,sum,na.rm=TRUE)
total.wins<-sum(sire.wins)
sire.runs <- tapply(T2.data$runner,T2.data$SIREID,sum,na.rm=TRUE)
total.runs<-sum(sire.runs)
sire.IV<-(sire.wins/total.wins)/(sire.runs/total.runs)

# bring sire IV back into the T2 d f and calc a sire adjusted run variable
z<-match(T2.data$SIREID,sire.ID)
T2.data$sire.IV<-sire.IV[z]
T2.data$runner.SA<-T2.data$runner*T2.data$sire.IV

# calc the trainer IVs in maidens
trainerID<-tapply(T2.data$OTRAINERID,T2.data$OTRAINERID,mean,na.rm=TRUE)
trainer.wins <- tapply(T2.data$winner,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.wins<-sum(trainer.wins)
trainer.runs <- tapply(T2.data$runner,T2.data$OTRAINERID,sum,na.rm=TRUE)
trainer.runs.SA <- tapply(T2.data$runner.SA,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.runs<-sum(trainer.runs)
total.runs.SA<-sum(trainer.runs.SA)
trainer.IV<-(trainer.wins/total.wins)/(trainer.runs/total.runs)
trainer.IV.SA<-(trainer.wins/total.wins)/(trainer.runs.SA/total.runs.SA)

#copy over to maiden specific variables
wins.mdns<-trainer.wins
runs.mdns<-trainer.runs
runs.mdns.SA<-trainer.runs.SA
IV.mdns<-trainer.IV
IV.SA.mdns<-trainer.IV.SA

# put into a maiden trainer summary d f
TrainerMdns<-data.frame(trainerID,wins.mdns,runs.mdns,runs.mdns.SA,IV.mdns,IV.SA.mdns)

# #####################################################
#
# put handicaps into their own d f
z<-which(T1.data$RISHCAP=="TRUE")
T2.data<-T1.data[z,]

# set up Sire IVs in handicaps
sire.ID<-tapply(T2.data$SIREID,T2.data$SIREID,mean,na.rm=TRUE)
sire.wins <- tapply(T2.data$winner,T2.data$SIREID,sum,na.rm=TRUE)
total.wins<-sum(sire.wins)
sire.runs <- tapply(T2.data$runner,T2.data$SIREID,sum,na.rm=TRUE)
total.runs<-sum(sire.runs)
sire.IV<-(sire.wins/total.wins)/(sire.runs/total.runs)

# bring sire IV back into the T2 d f and calc a sire adjusted run variable
z<-match(T2.data$SIREID,sire.ID)
T2.data$sire.IV<-sire.IV[z]
T2.data$runner.SA<-T2.data$runner*T2.data$sire.IV

# calc the trainer IVs in handicaps
trainerID<-tapply(T2.data$OTRAINERID,T2.data$OTRAINERID,mean,na.rm=TRUE)
trainer.wins <- tapply(T2.data$winner,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.wins<-sum(trainer.wins)
trainer.runs <- tapply(T2.data$runner,T2.data$OTRAINERID,sum,na.rm=TRUE)
trainer.runs.SA <- tapply(T2.data$runner.SA,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.runs<-sum(trainer.runs)
total.runs.SA<-sum(trainer.runs.SA)
trainer.IV<-(trainer.wins/total.wins)/(trainer.runs/total.runs)
trainer.IV.SA<-(trainer.wins/total.wins)/(trainer.runs.SA/total.runs.SA)

#copy over to handicap specific variables
wins.hcaps<-trainer.wins
runs.hcaps<-trainer.runs
runs.hcaps.SA<-trainer.runs.SA
IV.hcaps<-trainer.IV
IV.SA.hcaps<-trainer.IV.SA

# put into a handicap trainer summary d f
TrainerHcaps<-data.frame(trainerID,wins.hcaps,runs.hcaps,runs.hcaps.SA,IV.hcaps,IV.SA.hcaps)

# ###################################################
#
# put patterns into their own d f
z<-which(T1.data$RPATTERN !="NOT" & T1.data$RISHCAP == "FALSE")
T2.data<-T1.data[z,]

# set up Sire IVs in patterns
sire.ID<-tapply(T2.data$SIREID,T2.data$SIREID,mean,na.rm=TRUE)
sire.wins <- tapply(T2.data$winner,T2.data$SIREID,sum,na.rm=TRUE)
total.wins<-sum(sire.wins)
sire.runs <- tapply(T2.data$runner,T2.data$SIREID,sum,na.rm=TRUE)
total.runs<-sum(sire.runs)
sire.IV<-(sire.wins/total.wins)/(sire.runs/total.runs)

# bring sire IV back into the T2 d f and calc a sire adjusted run variable
z<-match(T2.data$SIREID,sire.ID)
T2.data$sire.IV<-sire.IV[z]
T2.data$runner.SA<-T2.data$runner*T2.data$sire.IV

# calc the trainer IVs in patterns
trainerID<-tapply(T2.data$OTRAINERID,T2.data$OTRAINERID,mean,na.rm=TRUE)
trainer.wins <- tapply(T2.data$winner,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.wins<-sum(trainer.wins)
trainer.runs <- tapply(T2.data$runner,T2.data$OTRAINERID,sum,na.rm=TRUE)
trainer.runs.SA <- tapply(T2.data$runner.SA,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.runs<-sum(trainer.runs)
total.runs.SA<-sum(trainer.runs.SA)
trainer.IV<-(trainer.wins/total.wins)/(trainer.runs/total.runs)
trainer.IV.SA<-(trainer.wins/total.wins)/(trainer.runs.SA/total.runs.SA)

#copy over to pattern specific variables
wins.ptns<-trainer.wins
runs.ptns<-trainer.runs
runs.ptns.SA<-trainer.runs.SA
IV.ptns<-trainer.IV
IV.SA.ptns<-trainer.IV.SA
# put into a pattern trainer summary d f
TrainerPtns<-data.frame(trainerID,wins.ptns,runs.ptns,runs.ptns.SA,IV.ptns,IV.SA.ptns)

# ###################################################
#
# merge the trainer summary d f s
#
Temp<-merge(TrainerMdns,TrainerHcaps,by.x="trainerID",by.y="trainerID",all.x=TRUE,all.y=TRUE)
Trainers<-merge(Temp,TrainerPtns,by.x="trainerID",by.y="trainerID",all.x=TRUE,all.y=TRUE)

z<-match(Trainers$trainerID,RItrainer.data$TID)
Trainers$tname<-RItrainer.data$TSTYLENAME[z]
Trainers$country<-RItrainer.data$TCOUNTRY[z]

# merge in the ratings d f
#
TrainersAll<-merge(Trainers,TrainerHorses,by.x="trainerID",by.y="trainer.h",all.x=TRUE,all.y=TRUE)
#
#
# clean up some of the variables, replace NA by 0
#
z<-which(is.na(TrainersAll$runs.mdns))
TrainersAll$runs.mdns[z]<-0
z<-which(is.na(TrainersAll$runs.hcaps))
TrainersAll$runs.hcaps[z]<-0
z<-which(is.na(TrainersAll$runs.ptns))
TrainersAll$runs.ptns[z]<-0
z<-which(is.na(TrainersAll$runs.mdns.SA))
TrainersAll$runs.mdns.SA[z]<-0
z<-which(is.na(TrainersAll$runs.hcaps.SA))
TrainersAll$runs.hcaps.SA[z]<-0
z<-which(is.na(TrainersAll$runs.ptns.SA))
TrainersAll$runs.ptns.SA[z]<-0
z<-which(is.na(TrainersAll$wins.mdns))
TrainersAll$wins.mdns[z]<-0
z<-which(is.na(TrainersAll$wins.hcaps))
TrainersAll$wins.hcaps[z]<-0
z<-which(is.na(TrainersAll$wins.ptns))
TrainersAll$wins.ptns[z]<-0
z<-which(is.na(TrainersAll$IV.mdns))
TrainersAll$IV.mdns[z]<-0
z<-which(is.na(TrainersAll$IV.SA.mdns))
TrainersAll$IV.SA.mdns[z]<-0
z<-which(is.na(TrainersAll$IV.hcaps))
TrainersAll$IV.hcaps[z]<-0
z<-which(is.na(TrainersAll$IV.SA.hcaps))
TrainersAll$IV.SA.hcaps[z]<-0
z<-which(is.na(TrainersAll$IV.ptns))
TrainersAll$IV.ptns[z]<-0
z<-which(is.na(TrainersAll$IV.SA.ptns))
TrainersAll$IV.SA.ptns[z]<-0

TrainersAll$wins.all<-TrainersAll$wins.mdns+TrainersAll$wins.hcaps+TrainersAll$wins.ptns
TrainersAll$runs.all<-TrainersAll$runs.mdns+TrainersAll$runs.hcaps+TrainersAll$runs.ptns
TrainersAll$runs.all.SA<-TrainersAll$runs.mdns.SA+TrainersAll$runs.hcaps.SA+TrainersAll$runs.ptns.SA

# produce summary stats
#
# composite IVs weighted by all runs in maidens, handicaps and pattern races

TrainersAll$IVcomp1<-(TrainersAll$IV.mdns*sum(TrainersAll$runs.mdns,na.rm=TRUE)+TrainersAll$IV.hcaps*sum(TrainersAll$runs.hcaps,na.rm=TRUE)
+TrainersAll$IV.ptns*sum(TrainersAll$runs.ptns,na.rm=TRUE))/(sum(TrainersAll$runs.mdns,na.rm=TRUE)+sum(TrainersAll$runs.hcaps,na.rm=TRUE)+sum(TrainersAll$runs.ptns,na.rm=TRUE))

TrainersAll$IVcomp1.SA<-(TrainersAll$IV.SA.mdns*sum(TrainersAll$runs.mdns.SA,na.rm=TRUE)+TrainersAll$IV.SA.hcaps*sum(TrainersAll$runs.hcaps.SA,na.rm=TRUE)
+TrainersAll$IV.SA.ptns*sum(TrainersAll$runs.ptns.SA,na.rm=TRUE))/(sum(TrainersAll$runs.mdns.SA,na.rm=TRUE)+sum(TrainersAll$runs.hcaps.SA,na.rm=TRUE)+sum(TrainersAll$runs.ptns.SA,na.rm=TRUE))

TrainersAll$IVcomp2<-(TrainersAll$IV.mdns*TrainersAll$runs.mdns+TrainersAll$IV.hcaps*TrainersAll$runs.hcaps
+TrainersAll$IV.ptns*TrainersAll$runs.ptns)/(TrainersAll$runs.mdns+TrainersAll$runs.hcaps+TrainersAll$runs.ptns)

TrainersAll$IVcomp2.SA<-(TrainersAll$IV.SA.mdns*TrainersAll$runs.mdns.SA+TrainersAll$IV.SA.hcaps*TrainersAll$runs.hcaps.SA
+TrainersAll$IV.SA.ptns*TrainersAll$runs.ptns.SA)/(TrainersAll$runs.mdns.SA+TrainersAll$runs.hcaps.SA+TrainersAll$runs.ptns.SA)

# difference variables, hcaps – mdns
TrainersAll$IVdiff.hcapsmdns<-TrainersAll$IV.hcaps-TrainersAll$IV.mdns
TrainersAll$IVdiff.hcapsmdns.SA<-TrainersAll$IV.SA.hcaps-TrainersAll$IV.SA.mdns

# quality differences using composites
TrainersAll$IVdiff.comp1.SAraw<-TrainersAll$IVcomp1.SA-TrainersAll$IVcomp1
TrainersAll$IVdiff.comp2.SAraw<-TrainersAll$IVcomp2.SA-TrainersAll$IVcomp2
# reduce the list to those trainers that have had >=50 runs in handicaps and are GB based and at least 2*50 runs in total
minruns<-50
z<-which(TrainersAll$runs.hcaps >= minruns & TrainersAll$country=="GB" & TrainersAll$runs.all >= 2*minruns)
Temp<-TrainersAll[z,]
TrainersAll50GB<-Temp[order(-Temp$IVcomp2.SA),]

# write out this d f to a CSV file
fname<-"c:/Racing Research/Trainer Research/trainersall50gb.csv"
write.csv(TrainersAll50GB,file=fname)

slcutoff<-40
z<-which(TrainersAll50GB$runner.h < slcutoff)
TrainersSmall50GB<-TrainersAll50GB[z,]
fname<-"c:/Racing Research/Trainer Research/trainerssmall50gb.csv"
write.csv(TrainersSmall50GB,file=fname)
z<-which(TrainersAll50GB$runner.h >= slcutoff)
TrainersLarge50GB<-TrainersAll50GB[z,]
fname<-"c:/Racing Research/Trainer Research/trainerslarge50gb.csv"
write.csv(TrainersLarge50GB,file=fname)
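
The maiden, handicap and pattern blocks in the listing above repeat the same calculation on different subsets of the outings. As an optional tidy-up, that calculation could be wrapped in a single function. The sketch below is not part of the posted code: `calcTrainerIV` is a hypothetical helper, and the toy ids are invented purely to show the call. It assumes a data frame with the columns `SIREID`, `OTRAINERID`, `winner` and `runner` as built in the listing.

```r
# hypothetical helper: raw and Sire adjusted trainer IVs for one race category
# d needs the columns SIREID, OTRAINERID, winner and runner
calcTrainerIV <- function(d) {
  # sire IVs within this race category
  sire.wins <- tapply(d$winner, d$SIREID, sum, na.rm = TRUE)
  sire.runs <- tapply(d$runner, d$SIREID, sum, na.rm = TRUE)
  sire.IV   <- (sire.wins / sum(sire.wins)) / (sire.runs / sum(sire.runs))

  # weight each run by the IV of the runner's sire
  idx <- match(as.character(d$SIREID), names(sire.IV))
  d$runner.SA <- d$runner * as.vector(sire.IV[idx])

  # aggregate by trainer
  wins    <- tapply(d$winner,    d$OTRAINERID, sum, na.rm = TRUE)
  runs    <- tapply(d$runner,    d$OTRAINERID, sum, na.rm = TRUE)
  runs.SA <- tapply(d$runner.SA, d$OTRAINERID, sum, na.rm = TRUE)

  data.frame(
    trainerID = names(wins),
    wins      = as.vector(wins),
    runs      = as.vector(runs),
    runs.SA   = as.vector(runs.SA),
    IV        = as.vector((wins / sum(wins)) / (runs / sum(runs))),
    IV.SA     = as.vector((wins / sum(wins)) / (runs.SA / sum(runs.SA))),
    stringsAsFactors = FALSE
  )
}

# toy data to show the call; the trainer and sire ids are invented
toy <- data.frame(
  OTRAINERID = c(1, 1, 1, 1, 2, 2, 2, 2),
  SIREID     = c(10, 10, 11, 11, 10, 10, 10, 12),
  winner     = c(1, 0, 1, 0, 0, 0, 0, 1),
  runner     = 1
)
toyIV <- calcTrainerIV(toy)
print(toyIV)
```

With a helper of this kind each category would reduce to one call on the relevant subset, for example on the rows where `RANIMAL` is "MDN" for maidens, followed by renaming the columns before the merge.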

Owners Facilities: What Makes For a Good One?

In the post this morning I’ve had a letter from ARC Racing in which various improvements for owners on racedays are highlighted. The letter set me thinking: what makes for a good owners facility? In broad terms two things matter most: excellent viewing and comfortable facilities. To expand on this there are four criteria against which I’d judge whether a racecourse has a good owners facility.

Location

The owners facility should at the least either have paddock views or be located in the stands with uninterrupted views opposite, or near to opposite, the finish line. If the owners facility is located away from the track there should be an owners area located in the stands with uninterrupted views opposite, or near to opposite, the finish line.

Comfort

The owners facility should be large enough to fit the majority of owners and their guests seated.

Food & Drink

Food and drink should be available to a reasonable standard. Haute cuisine doesn’t have to be on offer; decent home cooking or a buffet is fine. I don’t mind paying as an owner if there is a decent selection on offer at a good value price. Tea and coffee should not come in paper cups, with tea served from tea pots and coffee that isn’t instant.

Badge Requests

It’s understandable that meetings such as Glorious Goodwood or Royal Ascot have restrictions on number of badges, extra badges and paddock passes. Both of these courses at the big meetings deal with owners’ requests efficiently and with the minimum of fuss. Not all meetings are in such demand for badges and on these occasions flexibility on the part of courses is a plus.

Course Reviews

I’m going to post reviews of the courses I’ve visited in forthcoming blog pieces. More to follow!