Does Buying at Tattersalls Book 1 Lead To Guaranteed Success? Prices Paid vs. Racecourse Performance For The 2007 Graduates

Introduction

Record prices paid at the recent Tattersalls 2013 Book 1 Yearling Sale have hit the headlines. A Galileo filly, full sister to Oaks winner Was, sold for a record-breaking G5m (G = guineas). The median price paid for a yearling came in at G130,000, an increase of 30% over 2012. Today (14th October 2013) the Book 2 sale starts, followed by Book 3 one week later. Yearlings are categorised into the three books by Tattersalls based upon a range of criteria, including pedigree and conformation. Book 1 is the most prestigious and its graduates typically sell for more than Book 2 graduates, which in turn sell for more than Book 3 graduates.

So how do the graduates of Tattersalls Yearling Sales perform on the racecourse? In common with all of the sales companies, any Tattersalls graduate winning a prestigious race results in a tweet and/or email proclaiming where the horse was sold. But how do the graduates of the sales perform in aggregate? Trainer George Baker, in a recent blog post, alludes to the reality that some of these graduates will end up plying their trade at a basement level.

In this blog post the racecourse performance of all of the 2007 graduates from Books 1, 2 and 3 of the Tattersalls Yearling Sale is examined. The maximum rating achieved by each horse between 2008 and the end of the 2012 flat season was extracted from the Raceform database, including information from maidens, handicaps and pattern races, and the ratings and race performance were compared with each horse’s yearling sale price.

Yearling Sale Prices By Book

Over 1,500 yearlings were catalogued at Tattersalls Yearling Sale in 2007. Excluding those withdrawn, not sold or bought back, 1,136 yearlings were sold. The Book 1 median was G80,000, twice the Book 2 median of G40,000, with the Book 3 median coming in at G12,000.

Book   Sold   Median (G)     Max (G)
1       447       80,000   1,000,000
2       393       40,000     300,000
3       296       12,000      72,000

Table 1: Tattersalls 2007 Yearling Sale Prices

 

Sale Prices & Subsequent Ratings

How do the graduates from this sale perform on the racecourse? Graph 1 below shows the relationship between prices paid and the subsequent maximum rating achieved by each horse, with rating on the y axis and sale price on the x axis. The relationship is noisy. The correlation between price paid and subsequent rating is 0.20. If log prices are used, so that the effect of some of the higher priced lots is dampened, the correlation increases to 0.28. At first glance it doesn’t appear as if much of a relationship exists at all. Does this suggest the work of bloodstock agents, trainers and owners trying to identify the best yearlings is of limited benefit?


Graph 1: Tattersalls 2007 Yearling sale price (G) vs subsequent rating
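The comparison of raw-price and log-price correlations can be sketched as follows. This is a minimal illustration only: the `sales` list is made-up data, not the 2007 sale results, and the `pearson` helper is written out by hand rather than taken from the original analysis.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (price in guineas, maximum rating) pairs, NOT the 2007 data.
sales = [(6_500, 62), (12_000, 70), (40_000, 68), (80_000, 81),
         (165_000, 77), (300_000, 90), (1_000_000, 84)]
prices = [p for p, _ in sales]
ratings = [r for _, r in sales]

raw_corr = pearson(prices, ratings)
# Taking logs compresses the handful of very expensive lots.
log_corr = pearson([math.log(p) for p in prices], ratings)
print(f"raw: {raw_corr:.2f}, log: {log_corr:.2f}")
```

Taking logs stops one or two seven-figure lots from dominating the calculation, which is why the log-price correlation is usually the more informative of the two for sale data.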

 

Book Membership & Ratings Achieved

In common with much of the data in horse racing, aggregation enables relationships to be identified. Table 2 gives the median and maximum rating achieved across all of the graduates for each of Books 1, 2 and 3. The best horse from Book 1 posted a rating of 135, whilst the best horses in Books 2 and 3 posted similar ratings of 120 and 119 respectively. The median rating achieved by Book 1 graduates was 78, for Book 2 graduates 73.5 and for Book 3 graduates 68. So a relationship between price paid and subsequent rating does exist when the results are aggregated to the Book level. Note that improvements in ratings become progressively more expensive to buy. In trading up from Book 3 to 2, an extra G28,000 on the median price bought an additional 5.5 points of median rating, whilst in trading up from Book 2 to 1 you needed to spend an extra G40,000 to garner an additional 4.5 rating points.

Book   Median Rating   Max Rating
1               78.0          135
2               73.5          120
3               68.0          119

Table 2: Ratings achieved across Books 1, 2 & 3
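The trading-up arithmetic can be made explicit as a cost per rating point. The sketch below uses only the medians from Tables 1 and 2; the `cost_per_point` helper name is illustrative.

```python
# Median price (guineas) and median rating per book, from Tables 1 and 2.
books = {1: (80_000, 78.0), 2: (40_000, 73.5), 3: (12_000, 68.0)}

def cost_per_point(cheaper, dearer):
    """Extra guineas paid per extra median rating point when trading up."""
    price_lo, rating_lo = books[cheaper]
    price_hi, rating_hi = books[dearer]
    return (price_hi - price_lo) / (rating_hi - rating_lo)

book3_to_2 = cost_per_point(3, 2)   # 28,000 / 5.5
book2_to_1 = cost_per_point(2, 1)   # 40,000 / 4.5
print(f"Book 3 -> 2: G{book3_to_2:,.0f} per point")
print(f"Book 2 -> 1: G{book2_to_1:,.0f} per point")
```

On these medians the step from Book 3 to Book 2 works out at roughly G5,100 per rating point, and the step from Book 2 to Book 1 at roughly G8,900 per point.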

Win Rates in Maidens, Handicaps & Pattern Races

Table 3 gives the number of individual winners that came out of each book by race category; Table 4 shows the same information expressed as a percentage of horses sold in each book. The numbers do not sum to the total column because a horse can be a winner in each of the three race categories but is counted only once in the total. About half of all graduates from the Tattersalls 2007 Yearling Sales are still maidens, and the proportion of yearlings that won at least one race seems to be little affected by the Book in which they were sold.

However, the benefits of buying from Book 1 become clear in the better races. As a proportion of horses sold, roughly twice as many graduates from Book 1 (7.6%) go on to win pattern races compared with the graduates of Books 2 (4.1%) and 3 (3.4%). Book 1 graduates also win the highest proportion of maidens. This result should probably be upgraded because they are likely to have to contest open maidens, which by their nature are the most competitive. Their sales price and stallion fee would preclude them from competing in auction and median auction races.

There is also a knock-on effect when open maiden horses go on to compete in handicaps. Race standards applied by the handicapper, allied to his ‘on a line through’ methodology, mean that the handicap marks of Book 1 graduates may leave less room for manoeuvre than those of the graduates of other books. It is also likely that Book 1 graduates will be trained with a view to possible Pattern company participation, thus competing in maiden company closer to full fitness than the graduates of other books. As a result, Book 1 graduates that end up in handicaps could well be doing so on marks that most closely reflect their ability. All of these arguments can be reversed when the graduates of Book 3 are considered.

Book    Maidens   Handicaps   Patterns   Total
1           155         126         34     239
2           118         121         16     188
3            84          98         10     152
Total       357         345         60     579

Table 3: Individual winners by Book in Maidens, Handicaps & Pattern Races

Book    Maidens   Handicaps   Patterns   Total
1         34.7%       28.2%       7.6%   53.5%
2         30.0%       30.8%       4.1%   47.8%
3         28.4%       33.1%       3.4%   51.4%
All       31.4%       30.4%       5.3%   51.0%

Table 4: Percentage of individual winners by Book in Maidens, Handicaps & Pattern Races
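The percentages in Table 4 follow directly from the winner counts in Table 3 and the numbers sold in Table 1. A short sketch of that calculation, with illustrative variable names, is:

```python
# Horses sold per book (Table 1) and individual winners (Table 3).
sold = {1: 447, 2: 393, 3: 296}
winners = {  # book: (maidens, handicaps, patterns, any race)
    1: (155, 126, 34, 239),
    2: (118, 121, 16, 188),
    3: (84, 98, 10, 152),
}

def win_rates(book):
    """Individual winners as a percentage of horses sold in the book."""
    return tuple(round(100 * w / sold[book], 1) for w in winners[book])

for book in sold:
    print(book, win_rates(book))

# Ratio of Book 1 to Book 2 pattern-winner rates.
pattern_ratio = (34 / 447) / (16 / 393)
print(round(pattern_ratio, 2))
```

The pattern-winner rate for Book 1 comes out at a little under 1.9 times that of Book 2, which is the "nearly twice" figure quoted in the text.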

Differentiation Within Books: Does Paying More Work Within Books?

In aggregate the more expensive horses perform better on the racecourse. Is there much difference in subsequent performance if the more expensive Book 1 graduates are compared with those that sold more cheaply from Book 1? Each Book was sorted by sale price and split into a top half and a bottom half, and the median rating of each group was calculated. Table 5 shows the median price and rating for the top half and bottom half of each Book. There is a clear relationship between sales price and subsequent rating within each book: the more expensive Book 1 graduates ended up with higher ratings than the cheaper Book 1 yearlings, and the same is true of Books 2 and 3. In each case the difference in median ratings is about 9 points. It is noteworthy that the incremental cost of each additional rating point depends on your starting rating. In Book 3 it costs G1,800 for every extra rating point, whilst in Book 2 it is G5,455 per point and in Book 1 G21,250. In this respect yearlings trade in much the same way as other trophy assets.

When pattern race winners are considered the more expensive graduates of Books 1 and 2 have more winners than those that sold more cheaply – it is most striking in Book 1, with 24 pattern race winners versus 10 from the bottom half. Table 6 gives this information by Book. The usual caveats apply with respect to interpretation given the small sample sizes.

When median ratings are compared, the more expensive graduates of Book 1 performed best, followed by the more expensive graduates of Book 2. However, the next best performer is a tie between the more expensive Book 3 graduates and the cheaper yearlings from Book 1. Yet the more expensive Book 3 graduates have a median sales price less than half that of the cheaper Book 1 graduates, albeit with fewer pattern race winners. If there can ever be value in buying yearlings it appears that, at least in 2007, buying the most expensive Book 3 graduates paid off on the racecourse. It is possible this result is an artefact of the 2007 yearling draft; looking at the results from other years would answer this query.

       Median Price (G)           Median Rating
Book   Top Half   Bottom Half     Top Half   Bottom Half
1       165,000        46,500         82.0          73.0
2        70,000        24,000         79.0          69.0
3        21,000         6,500         73.0          64.5

Table 5: Prices paid and ratings within books
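The top-half and bottom-half split behind Table 5 can be sketched as below. The lots are hypothetical, and sending the middle lot of an odd-sized book to the bottom half is an assumption; the post does not say how that case was handled.

```python
from statistics import median

def split_by_price(lots):
    """Split (price, rating) lots into top and bottom halves by price."""
    ordered = sorted(lots, key=lambda lot: lot[0], reverse=True)
    half = len(ordered) // 2   # assumption: odd middle lot goes to the bottom half
    return ordered[:half], ordered[half:]

def summarise(lots):
    """Median price and median rating of a group of lots."""
    return median(p for p, _ in lots), median(r for _, r in lots)

# Hypothetical lots for one book: (price in guineas, maximum rating).
book = [(150_000, 85), (90_000, 79), (70_000, 82), (40_000, 74),
        (30_000, 70), (20_000, 76)]
top, bottom = split_by_price(book)
print("top half:", summarise(top))
print("bottom half:", summarise(bottom))
```

Using medians rather than means keeps a single very expensive or very highly rated horse from distorting either half.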

 

Book   Top Half   Bottom Half
1            24            10
2            10             6
3             6             4

Table 6: Pattern winners by book top and bottom half

Summary

Results from the Tattersalls Yearling Sale from 2007 show a noisy relationship between individual sales price and subsequent rating. However, in aggregate the relationship becomes clear – the more expensive yearlings, taken as a group, subsequently performed better on the racecourse.

It is when pattern races are considered that the benefits from buying at Book 1 were at their most apparent. The median sale in Book 2 took place at G40,000. In Book 1 this doubled to G80,000. Whilst it might seem poor value that spending twice as much resulted in an increase of just 4.5 rating points in the median rating for Book 1 versus Book 2, it nearly doubled the chances of buying a yearling that went on to win a pattern race. Yearlings are priced off the right-hand tail of the distribution of expected future ratings, and it is the right-hand skewness inherent in the expected future ratings of Book 1 yearlings that causes them to sell so much more expensively than yearlings catalogued in Books 2 and 3. The lottery ticket you buy when shopping at Book 1 has a much greater chance of coming up.

When prices within Books are considered the same relationships are confirmed. Buying the more expensive graduates from within each Book resulted in higher ratings than attempting to bargain hunt amongst the cheaper yearlings in each Book. In Book 1 buying the more expensive yearlings resulted in nearly 2.5x as many pattern race winners. The noisiness of the relationship shown in Graph 1 above means that bargains were available at all prices and in all books. However, the probability of buying a bargain yearling that subsequently performed well at the racecourse was maximised if you bought from amongst the more expensive Book 3 graduates.

Top Rated Selections: Often A Long Wait Between Drinks – Why?

Introduction

Tune in to Racing UK or ATR and the chances are the focus will be on picking the winner of the next race. The Racing Post has pages of form and commentary distilled into selections, naps and tips, typically resulting in one selection per race being made. Tipsters’ tables contain one selection per race, and Tom Segal’s Pricewise column in the Racing Post usually recommends one and occasionally two selections in a handful of Saturday races. For any gambler the key measure of success is the amount of money made or lost over a reasonable time period, and implicit in the various pieces of advice on offer is that one selection per race is the way to achieve gambling success. It seems obvious – there can only be one winner, I just need to find it!

One of the consequences of making one selection per race is that you are maximising the chances of sustaining a long losing run. The volatility of your profits/losses is also maximised, as is the path dependency of your trading strategy. None of these are attractive characteristics. Apart from the effect on your bank balance, losing runs can lead to self-doubt as the methodology and the existence of a trading edge are questioned, yet the length of the losing run may be just noise, in line with what you might expect given the size of your trading edge. So what sort of losing runs might you expect given different degrees of edge over the market?

In this blog post Monte Carlo Simulation (MC) is used to compare losing runs given different degrees of trading edge and at different odds.

Methodology

A ten-runner race is set up with a set of book odds where the book sums to a 7% over-round. A rating is attached to each horse, and the true odds of each horse winning are defined to be a function of the book odds and its rating. The function works so that highly rated horses have lower true odds than the book odds, and vice versa for lowly rated horses. One of the parameters in the function is the degree to which the ratings have an edge over the market. The greater the edge, the more the book odds are adjusted. The approach is Bayesian in nature. The ratings used are arbitrary – they express in numerical form the likelihood of a particular horse winning – and the results presented here are not specific to the use of rating systems. Implicit in any bet placed by a gambler in a probabilistic setting is a set of underlying decisions based upon preferences or rankings that can be thought of as a set of ratings, even if they aren’t expressed as such.

Monte Carlo methods are used to run the race 30,000 times (defined as one simulation; this is roughly equivalent to betting on 15 races a week for 40 years) using the true odds, as defined earlier, to determine the probability of each horse winning. If the winner coincides with the horse that is also top rated, the gambler wins. The book odds associated with the top rated selection and the level of edge are kept constant per simulation run. The process is repeated so that simulations are run at 4 different book odds and 4 levels of edge, to give 16 simulations in total. The book odds chosen are evens, 3/1, 6/1 and 9/1, and the levels of edge are chosen to correspond to differing levels of Return on Capital (RoC) of 10%, 5%, 0% (break-even) and -7%. The latter case represents someone with no edge whose losses over time equal the book over-round.

Relationship Between Edge, Book Odds and True Odds

Table 1 below gives the relationship between the odds at which you back and the true odds for given levels of edge. So backing at 6/1 with a 10% edge represents true odds of 5.3/1. At a 5% edge backing a 3/1 shot represents true odds of 2.8/1, and backing an even money shot with a 10% edge has true odds of 4/5. The difference between book and true odds is small and sets the context for the analysis that follows. Whilst not the subject of this blog post, tables such as this can be used to give trigger levels at which bets become interesting for a given level of perceived trading edge.

Book odds (7% over-round)   10% edge   5% edge   break-even   no edge
evens                            0.8       0.9          1.0       1.1
3/1                              2.6       2.8          3.0       3.3
6/1                              5.3       5.6          6.0       6.5
9/1                              8.1       8.5          8.9       9.7

Table 1: Book odds and true odds for differing levels of edge
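One simple way to relate book odds, edge and true odds is to ask what win probability would deliver a given Return on Capital at level stakes. The sketch below makes that assumption; it is not the Bayesian rating function used in the post, which is not specified, so its output lands close to Table 1 rather than matching it exactly. The `true_odds` name is illustrative.

```python
def true_odds(book_odds, roc):
    """Fractional true odds implied by backing at `book_odds` with return `roc`.

    Assumes a level stake, so the expected profit per unit staked is
    p * (book_odds + 1) - 1 = roc, where p is the true win probability.
    """
    p = (1 + roc) / (book_odds + 1)
    return 1 / p - 1

for label, odds in [("evens", 1), ("3/1", 3), ("6/1", 6), ("9/1", 9)]:
    row = [true_odds(odds, roc) for roc in (0.10, 0.05, 0.0, -0.07)]
    print(label, " ".join(f"{x:.2f}" for x in row))
```

For example, under this assumption a 3/1 shot backed with a 10% edge has true odds of about 2.6/1, in line with the table.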

Relationship Between Edge, Book Odds and Losing Run Length

Table 2 below gives the maximum losing run that resulted from each simulation. The longest losing run experienced from betting at constant odds of evens with a 10% edge was 14 races, and at 9/1 with a 10% edge it was 80 races. The reason it is often a long wait between drinks for top rated selections is the size of the trading edge compared with the odds at which horses are backed. Since the numbers in Table 2 represent the extreme case of each simulation, the length of losing run that occurs 5% of the time is reported in Table 3. Note how the length of losing run changes little with edge. If you typically bet at 6/1 and think you have a 5% edge, and you are on your 17th losing wager, there are no obvious signs from Tables 2 and 3 that you are experiencing anything other than a losing run that occurs one time in twenty. If Pricewise has a 10% edge and gives 3 selections a week, all at 9/1, these results suggest that at worst he could go half a year without selecting a winner. Note that in practice gamblers will be betting wherever value is perceived regardless of book odds, and the fixing of odds across all simulations is artificial. However, it would be straightforward to weight the results to reflect the proportion of bets you typically place at various odds.

In finance one criterion used to judge the quality of returns delivered by investment managers is the Sharpe Ratio. This penalises returns by the volatility of the return stream. Inspection of Table 3 shows that the highest Sharpe Ratio would come from betting even money shots. To emphasise, there is no suggestion that betting even money represents greater value than betting at bigger odds. The simulations are set up so that the Return on Capital achieved is the same, and the value inherent in the even money shot is the same as in the 6/1 shot. However, the path to terminal wealth followed by betting at evens is inherently less volatile than betting at bigger odds.
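The Sharpe Ratio point can be illustrated with a per-bet calculation. The sketch assumes a one-unit level stake, ignores any risk-free rate, and takes the win probability to be the one that delivers the stated Return on Capital; `per_bet_sharpe` is an illustrative name, not a function from the original analysis.

```python
import math

def per_bet_sharpe(book_odds, roc):
    """Mean over standard deviation of the profit on a one-unit level stake."""
    p = (1 + roc) / (book_odds + 1)       # win probability implied by RoC
    win, lose = book_odds, -1.0           # profit if the bet wins / loses
    mean = p * win + (1 - p) * lose       # equals roc by construction
    var = p * (win - mean) ** 2 + (1 - p) * (lose - mean) ** 2
    return mean / math.sqrt(var)

for label, odds in [("evens", 1), ("3/1", 3), ("6/1", 6), ("9/1", 9)]:
    print(label, round(per_bet_sharpe(odds, 0.10), 3))
```

The expected profit per bet is identical at every price, so the ratio falls as the odds lengthen purely because the spread of outcomes widens.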

Book odds (7% over-round)   10% edge   5% edge   break-even   no edge
evens                             14        14           16        19
3/1                               24        27           27        27
6/1                               59        59           60        65
9/1                               80        80           80        80

Table 2: Book odds and maximum losing runs for differing levels of edge

Book odds (7% over-round)   10% edge   5% edge   break-even   no edge
evens                            3.2       3.5          3.8       4.2
3/1                              8.7       9.3          9.8      10.6
6/1                             16.9      17.8         18.8      20.3
9/1                             25.2      26.4         27.9      30.3

Table 3: Book odds and 95% probability losing runs
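A losing-run simulation of the kind summarised in Tables 2 and 3 might look like the sketch below. It is a simplified stand-in for the method in the post: it simulates only whether the top rated selection wins, derives the win probability from an assumed Return on Capital, and uses a nearest-rank percentile over runs of at least one loser. The exact figures it prints depend on the seed and on those choices.

```python
import random

def losing_runs(win_prob, n_races, rng):
    """Lengths of every run of consecutive losers, including a trailing one."""
    runs, current = [], 0
    for _ in range(n_races):
        if rng.random() < win_prob:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs

def percentile(values, q):
    """Nearest-rank percentile (integer q in 1..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * q // 100))  # ceiling division
    return ordered[rank - 1]

rng = random.Random(2013)               # fixed seed for repeatability
p_win = (1 + 0.10) / (6 + 1)            # 6/1 with an assumed 10% RoC
runs = losing_runs(p_win, 30_000, rng)
print("longest losing run:", max(runs))
print("95th percentile run:", percentile(runs, 95))
```

Re-running with different seeds gives a feel for how much the maximum moves around compared with the 95th percentile, which is why the latter is the more stable guide.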

Relationship Between Edge, Book Odds and Time to Last Cumulative Loss

The results presented so far are unaffected by staking plans. In Table 4 below the number given represents the last race in the simulation at which cumulative profits are negative. This gives a sense of the number of races needed for the signal inherent in the edge to outweigh the noise. For level stakes betting at 6/1 with a 10% edge, cumulative profitability is always positive after the 2,324th race. Note the substantial step up in the wait for cumulative profitability when moving from 6/1 to 9/1 at the 5% edge level, and when moving from 3/1 to 6/1 at the 10% edge level. The results highlight the increased path dependency inherent in betting at higher odds. The range of possible outcomes is such that it can take much longer to move into positive cumulative profitability.

Note that employing staking plans such as the Kelly Criterion would potentially improve this level staking result so that the race numbers were lower, particularly for the higher odds results presented. However, since the cumulative profits/losses will have meandered around zero, the effect on the broad thrust of the conclusion reached is likely to be small.

Book odds (7% over-round)   10% edge   5% edge
evens                             12       432
3/1                              628     2,673
6/1                            2,324     2,919
9/1                            2,923     8,444

Table 4: Book odds and the last race at which cumulative profits were negative
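The quantity reported in Table 4 can be computed by tracking cumulative level-stakes profit and remembering the last race at which it was below zero. The sketch below again assumes the win probability implied by a given Return on Capital; `last_negative_race` is an illustrative name and the numbers it prints will differ from the table because the random draws differ.

```python
import random

def last_negative_race(book_odds, roc, n_races, rng):
    """1-based index of the last race after which cumulative profit is below zero.

    Returns 0 if level-stakes profit never dips below zero.
    """
    p = (1 + roc) / (book_odds + 1)
    profit, last_negative = 0.0, 0
    for race in range(1, n_races + 1):
        profit += book_odds if rng.random() < p else -1.0
        if profit < 0:
            last_negative = race
    return last_negative

rng = random.Random(7)   # fixed seed for repeatability
for label, odds in [("evens", 1), ("3/1", 3), ("6/1", 6), ("9/1", 9)]:
    print(label, last_negative_race(odds, 0.10, 30_000, rng))
```

Because the answer hinges on a single path of results, it varies a good deal from run to run, which is itself a demonstration of the path dependency described above.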

Conclusions

If you choose to bet on the horse that represents your top pick in a race, and you adopt this as a betting approach over many races, you are maximising the total profits you can expect to accrue over time. However this approach has costs associated with it. Whilst maximising expected total profits, you are also maximising both the volatility of your trading profits and exposure to path dependency.

Losing run length is primarily driven by the odds at which you back horses rather than by the size of your betting edge, which makes it difficult to identify that you have lost your edge in the middle of a losing run. What may appear to be a loss of ability could simply be an unlucky run that is merely noise. The reason it is often a long wait between drinks for top rated selections is the size of the trading edge compared with the odds at which horses are backed.

Betting at shorter prices minimises trading profit volatility and path dependency, and reduces losing run length. Splitting your stake across more than one selection in a race will (subject to your edge being similar across all runners in a race) increase the probability that your edge will be reflected in your trading profits. These profits will not be as large as if you had made one winning selection, but what you make will be made far more often. Betting on a number of horses in a race effectively creates one shorter priced aggregate bet. This has a number of attractive features – it reduces losing run length, reduces trading profit volatility and reduces exposure to path dependency. The cost of this approach is that over the long run total profits will be less than betting on one selection only. The trade-off between the two approaches is interesting. Given the associated drawbacks, it is surprising that the one selection per race approach appears to be so little questioned and so popular.
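The idea that several selections combine into one shorter priced bet can be shown with a small calculation. The sketch below assumes stakes are split so that the payout is the same whichever selection wins (often called dutching); that staking rule and the function names are illustrative choices, not something prescribed in the post.

```python
def aggregate_odds(book_odds_list):
    """Fractional odds of the single bet created by backing several runners.

    Stakes are split in proportion to 1 / (odds + 1) so the payout is the
    same whichever selection wins.
    """
    implied = sum(1 / (odds + 1) for odds in book_odds_list)
    return 1 / implied - 1

def stake_split(book_odds_list, total_stake=1.0):
    """Stake on each runner that equalises the payout."""
    weights = [1 / (odds + 1) for odds in book_odds_list]
    return [total_stake * w / sum(weights) for w in weights]

print(aggregate_odds([6, 6]))       # two 6/1 shots in the same race
print(stake_split([3, 9], 10.0))    # 10 units across a 3/1 and a 9/1 shot
```

Backing two 6/1 shots this way is equivalent to a single bet at 5/2, which is where the shorter losing runs and lower volatility come from.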

Jamie Spencer – Riding Style & Results: What Does The Data Tell Us?

Introduction

The recent criticism by Luca Cumani of two of Jamie Spencer’s rides this year on Mount Athos (“it’s on record that he was given two very bad rides”) has caused a good deal of comment and publicity. Simon Holt devotes his column in the Racing Post Weekender this week (25th September edition) to a discussion of Jamie Spencer’s riding style, concluding that “this is a jockey with a bit of star quality and his career record provides impressive defence against the critics”. As Simon Holt points out, the hold up style he adopts can lend itself to criticism if a horse is perceived as being delivered too late, such as his recent ride on York Glory in the 2013 renewal of the Beverley Bullet. However, much of the criticism levelled appears to be founded on one or two rides, rather than on his performance over many rides.

In this blog post all of Jamie Spencer’s rides in the 2013 flat season (to 25th September) were examined in terms of riding style, Impact Values and ratings. The rides of a number of other jockeys (Ryan Moore, Richard Hughes, James Doyle and Joe Fanning) were also examined. With each of these jockeys having had over 400 rides this season to date, there is plenty of data to interrogate.

Definition of Running Style/Early Pace Position

The analysis that underpins this piece was carried out in the R statistical environment accessing Raceform Interactive (RFI) data. This is the same data used by the Racing Post. In-running comments were used to identify the Early Pace Position (EPP) adopted by each horse in each race contained within the database. In this blog post the terms EPP and ‘running style’ are used interchangeably. Five categories of running style were defined: leading (1), prominent (2), midfield (3), held up (4) and in rear (5). Armed with an EPP by horse by race, the running style most frequently adopted by each horse can be identified. These EPPs can be used in conjunction with the remainder of the information contained within the RFI database to examine the relationship between running style and jockey performance. Horses had to have run at least 3 times for an EPP to be assigned to the horse, so if, for example, Orfevre ran three times, twice in the lead (style 1) and once prominently (style 2), he’d be assigned an EPP of 1. After this parsing exercise we have the running style adopted by each horse in each race in which it took part, and the running style each horse has adopted most frequently in the past.
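The step of assigning each horse its usual running style can be sketched as follows. The original work was done in R; this Python version is only an illustration, and the tie-break rule and the `usual_epp` name are assumptions rather than details given in the post.

```python
from collections import Counter

EPP_LABELS = {1: "leading", 2: "prominent", 3: "midfield",
              4: "held up", 5: "in rear"}
MIN_RUNS = 3  # runs needed before a horse is given a usual style

def usual_epp(epp_history):
    """Most frequent EPP in a horse's past runs, or None with too few runs.

    Ties are broken in favour of the more forward style (lower number);
    that tie-break is an assumption, not something the post specifies.
    """
    if len(epp_history) < MIN_RUNS:
        return None
    counts = Counter(epp_history)
    best = max(counts.values())
    return min(epp for epp, n in counts.items() if n == best)

print(usual_epp([1, 1, 2]))   # the Orfevre example from the text
print(usual_epp([4, 5]))      # too few runs, so no style is assigned
```

The first call returns style 1 (leading), matching the worked example, and the second returns no style because the horse has run fewer than three times.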

Jockey Rides, Horse Ability and Starting Prices

To help put Jamie Spencer’s riding style in context the following jockeys were chosen for comparison: Ryan Moore, Richard Hughes, James Doyle and Joe Fanning. The first two are vying for champion jockey in 2013, James Doyle has recently been appointed Prince Khalid Abdullah’s jockey, whilst Joe Fanning is known for adopting front running tactics and should provide a contrast with the riding style adopted by Jamie Spencer. Table 1 gives information about the ability of the horses ridden by each jockey (using the median rating across all rides) and betting market expectations (using average and median Starting Prices, SPs). All rides in the 2013 flat season were considered. Ryan Moore rides horses with the most ability, posting a median RPR of 81, followed by Jamie Spencer, Richard Hughes, James Doyle and then Joe Fanning. Note that the SPs for Ryan Moore’s and Richard Hughes’s rides are close, suggesting the betting markets typically rate their chances similarly. Jamie Spencer comes next in terms of market expectations, with James Doyle last after Joe Fanning, even though, on average, Doyle rides more highly rated horses than Fanning.

Jockey      Median Rating of Rides   Average SP   Median SP
J Fanning                       69         10.4         7.0
R Hughes                        76          5.7         4.0
J Doyle                         73         12.7         7.5
J Spencer                       78          8.7         6.0
R Moore                         81          5.5         4.0

Table 1: Median ratings of rides and SP information for selected jockeys

Early Pace Position Profiles

The proportion of horses in each EPP category is given in Table 2, along with wins per category and associated Impact Values (IVs). Impact Value has its usual definition: the share of all wins achieved by a category divided by its share of all runs. The IV for front runners is 1.88. As is widely known, front runners win more frequently than horses with other running styles. IVs for EPP styles 2 (prominent) and 3 (mid-division) are similar at 0.94 and 0.99 respectively, with hold up horses performing somewhat worse at 0.83 and horses that race in rear reporting the lowest IV of 0.66. The IVs reported here by riding style suggest that the most important decision a jockey can take is whether to front run or not. After that, racing prominently or in mid-division has similar outcomes, whilst the further back you race from a midfield position, the less likely it is that you will win races. There is an important caveat here – the EPP adopted is not entirely in the jockey’s hands, but conditioned on a number of factors, only some of which are in his control. However, we do know that on average horses do appear to have a favoured EPP, and this is useful for some of the analysis that follows.

EPP      Wins     Runs   Proportion     IV
1         963    4,466          12%   1.88
2         408    3,787          10%   0.94
3       1,714   15,097          39%   0.99
4         814    8,550          22%   0.83
5         479    6,329          17%   0.66
TOTAL   4,378   38,229         100%   1.00

Table 2: EPP running styles, proportions and Impact Values
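The Impact Values in Table 2 can be reproduced from the wins and runs columns, taking an IV to be a category’s share of all wins divided by its share of all runs. A short sketch with illustrative names:

```python
# (wins, runs) by EPP category, from Table 2.
table2 = {1: (963, 4466), 2: (408, 3787), 3: (1714, 15097),
          4: (814, 8550), 5: (479, 6329)}

total_wins = sum(w for w, _ in table2.values())
total_runs = sum(r for _, r in table2.values())

def impact_value(wins, runs):
    """Share of all wins divided by share of all runs."""
    return (wins / total_wins) / (runs / total_runs)

ivs = {epp: round(impact_value(w, r), 2) for epp, (w, r) in table2.items()}
print(ivs)   # {1: 1.88, 2: 0.94, 3: 0.99, 4: 0.83, 5: 0.66}
```

An IV of 1.00 means a category wins exactly as often as its number of runners would suggest, so front runners win nearly twice as often as expected and horses ridden in rear about two-thirds as often.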

Jockey Riding Styles

Perceptions are borne out by the data – Jamie Spencer rides far fewer horses in mid-division than other jockeys, preferring to hold them up or ride them in rear. Table 3 takes every ride of each jockey and amalgamates by EPP. The differences in EPP adopted by Jamie Spencer are substantial compared with the other jockeys in the table. Note that he rides at least as many front runners and prominent horses as Ryan Moore and James Doyle; it is the mid-division category that he eschews, with more than half of his rides categorised as either held up or in rear.

Jockey      EPP1   EPP2   EPP3   EPP4   EPP5
J Fanning    19%    13%    43%    16%     9%
R Hughes     17%     7%    35%    20%    21%
J Doyle      10%     6%    37%    29%    19%
J Spencer    13%     5%    25%    27%    31%
R Moore      10%     8%    40%    23%    20%

Table 3: Riding styles adopted by jockey

Does this result hold when checked against the most frequent riding style of the horses ridden by our jockeys? Table 4 shows there is some evidence that Jamie Spencer tends to ride more horses that have a hold up running style. However, this could be because he may be the only jockey to have sat on a given horse and so has contributed to its recorded running style. This makes interpretation more difficult. On balance, however, comparing the riding proportions in Tables 3 and 4 shows that Jamie Spencer does appear to ride his mounts with more restraint than is usually the case.

Jockey      EPP1   EPP2   EPP3   EPP4   EPP5
J Fanning    11%    15%    33%    20%    21%
R Hughes      6%    13%    31%    29%    22%
J Doyle       4%    10%    29%    32%    25%
J Spencer     4%     8%    31%    30%    27%
R Moore       6%    10%    33%    30%    20%

Table 4: Riding styles by horse

Relationship between Running Style (EPP) and Ratings Achieved

A measure that compares the rating of each run with the maximum rating the horse has achieved is defined as the Relative to Maximum (RTM). Table 5 below shows average RTM by running style. On average horses run ca. 18lb below their maximum rating. This is no surprise – ratings are negatively skewed – bounded on the upside by ability and the relatively rare confluence of a set of circumstances that allows a horse to achieve its maximum rating, and exposed to substantial downside as any number of events (going, draw, pace, opposition, trip and so on) cause horses to run below their best.

Horses with a prominent running style (EPP 2) are most likely to perform below their best. Remember from Table 2 that horses that race prominently deliver lower IVs than those that race in mid-division. It is possible that the pressure of racing prominently conspires against these horses. The best RTM numbers reported are for horses that are held up or ridden in rear. Given the IVs for these categories are substantially lower than 1, a likely explanation for them running more closely to their maximum rating is that they are running on past beaten horses to be placed rather than winning. This has implications for their handicap ratings relative to their ability.

In Tables 2 and 5 we have IVs and RTM values by running style classification for all races that took place on the flat in 2013. These tables give us a sense of how often horses win given their riding style, and to what degree they run close to their maximum form. Now we turn to the same information at the individual jockey level.

Early Pace Position (EPP)   Average RTM (lb)
1                                      -18.4
2                                      -19.0
3                                      -17.9
4                                      -17.7
5                                      -17.0

Table 5: RTM by EPP style category 
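The RTM calculation can be sketched as below. The runs are hypothetical, and taking each horse’s maximum over all of its runs in the sample (including the run being measured) is an assumption about how the measure was built.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical runs: (horse, EPP for that run, rating achieved in lb).
runs = [("A", 1, 80), ("A", 1, 62), ("A", 2, 71),
        ("B", 4, 95), ("B", 4, 90), ("B", 5, 70)]

# Best rating each horse has achieved (ratings assumed to be positive).
best = defaultdict(int)
for horse, _, rating in runs:
    best[horse] = max(best[horse], rating)

# RTM for each run is the rating minus the horse's best: zero or negative.
rtm_by_epp = defaultdict(list)
for horse, epp, rating in runs:
    rtm_by_epp[epp].append(rating - best[horse])

average_rtm = {epp: mean(vals) for epp, vals in sorted(rtm_by_epp.items())}
print(average_rtm)
```

Because every horse’s best run scores zero and every other run scores below zero, the average is always negative, and a value closer to zero means horses in that category ran nearer to their best.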

Jockey Performance: Impact Values & RTM Ratings

Two approaches to measuring jockey ability are those used by John Whitley, often mentioned by James Willoughby on Racing UK, and Timeform. In this blog post the two measures already employed, Impact Values (IVs) and Relative to Maximum (RTM), are calculated at the jockey level. Impact Values by jockey by running style are reported in Table 6 below, and RTMs by running style are reported in Table 7. Ryan Moore performs the best across both measures. On average his rides perform about 8lb better than average (ca. -10lb vs -18lb) and ca. 3lb better than the other jockeys considered here. Particularly noteworthy is his performance on front runners, where he performs nearly 10lb better than average, with an IV of 3.9. Joe Fanning performs best when he rides front runners. Richard Hughes, James Doyle and Jamie Spencer perform similarly to each other based upon RTMs – about 5lb better than average, but ca. 3lb behind Ryan Moore. If Starting Prices and horse ability are considered, James Doyle performs particularly well. Perhaps the betting market has underestimated his abilities – if so, his recent appointment by Prince Khalid Abdullah and the probable increase in the quality of his mounts are likely to change this.

Turning to Jamie Spencer: he performs best on front running rides, delivering similar IVs to Richard Hughes and yet performing 2.5lb better on average. What of his hold up rides? Considering horses that are held up or ridden in rear (EPP 4 and 5), Jamie Spencer’s rides perform second only to Ryan Moore’s in terms of RTM. Yet his IVs for both of these categories are the second lowest of the jockeys considered here. There are a couple of interpretations. The first is that hold up horses are running into places, achieving respectable ratings and yet not winning. The second is that the horses are being ridden in a style that maximises their chances of running close to their maximum ratings, and the IVs will, over many more rides, reflect this.

Jockey      EPP1   EPP2   EPP3   EPP4   EPP5
J Fanning   2.52   1.41   1.13   0.39   0.93
R Hughes    2.76   2.25   2.04   1.65   1.75
J Doyle     1.35   1.14   0.97   1.69   1.79
J Spencer   2.73   2.24   1.69   1.21   1.05
R Moore     3.90   1.74   2.21   1.34   2.07

Table 6: Impact Values by jockey by EPP classification

Table 7 below shows RTM averages by jockey by EPP category. A discussion of IVs and RTM by jockey follows table 7.

Jockey       EPP1    EPP2    EPP3    EPP4    EPP5
J Fanning   -15.3   -20.4   -17.8   -16.4   -15.5
R Hughes    -13.5   -11.8   -12.9   -13.0   -12.9
J Doyle     -12.6   -15.1   -11.7   -13.1   -13.9
J Spencer   -11.0   -14.7   -13.3   -11.6   -12.6
R Moore      -8.5   -10.2    -9.8    -9.4   -10.0

Table 7: RTM by jockey by EPP classification
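
The rankings quoted in the discussion can be checked directly from Tables 6 and 7. The sketch below simply transcribes the two tables and ranks the jockeys within each EPP category; the object names (iv, rtm, iv.rank, rtm.rank) are mine and are not taken from the original analysis.

```r
# Values transcribed from Tables 6 and 7; object names are illustrative
jockeys <- c("J Fanning", "R Hughes", "J Doyle", "J Spencer", "R Moore")
epp <- paste0("EPP", 1:5)

iv <- matrix(c(2.52, 1.41, 1.13, 0.39, 0.93,
               2.76, 2.25, 2.04, 1.65, 1.75,
               1.35, 1.14, 0.97, 1.69, 1.79,
               2.73, 2.24, 1.69, 1.21, 1.05,
               3.90, 1.74, 2.21, 1.34, 2.07),
             nrow = 5, byrow = TRUE, dimnames = list(jockeys, epp))

rtm <- matrix(c(-15.3, -20.4, -17.8, -16.4, -15.5,
                -13.5, -11.8, -12.9, -13.0, -12.9,
                -12.6, -15.1, -11.7, -13.1, -13.9,
                -11.0, -14.7, -13.3, -11.6, -12.6,
                 -8.5, -10.2,  -9.8,  -9.4, -10.0),
              nrow = 5, byrow = TRUE, dimnames = list(jockeys, epp))

# Rank within each EPP column: 1 = highest IV, or RTM closest to zero
iv.rank <- apply(-iv, 2, rank)
rtm.rank <- apply(-rtm, 2, rank)
dimnames(iv.rank) <- dimnames(iv)
dimnames(rtm.rank) <- dimnames(rtm)

# Jamie Spencer on held-up rides (EPP 4 and 5)
iv.rank["J Spencer", c("EPP4", "EPP5")]
rtm.rank["J Spencer", c("EPP4", "EPP5")]

# Front-running RTM gap between Jamie Spencer and Richard Hughes, in lb
rtm["J Spencer", "EPP1"] - rtm["R Hughes", "EPP1"]
```

With five jockeys, an IV rank of 4 corresponds to "second lowest" and an RTM rank of 2 to "second only to Ryan Moore".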

Summary

  • Ryan Moore is viewed by many as the best rider in the UK – the analysis in this blog post supports this view.
  • James Doyle rides as well as Richard Hughes and Jamie Spencer and  has done so on longer priced horses with less ability.
  • Jamie Spencer rides horses further back than their usual position in races, and in doing so enables them to run closer to their maximum rating. The data suggests riding further back is a matter of choice. Whilst riding horses further back typically compromises their chances of winning races to a degree, the Impact Values for Jamie Spencer’s hold-up rides are significantly above the average and also greater than 1. However they are also below those delivered by Richard Hughes, James Doyle and Ryan Moore. It is possible that over time and over many more rides, the fact that his mounts are running closer to maximum ratings will be reflected in higher Impact Values than delivered in the 2013 flat season.

Measuring Training Yard Success: Impact Values from Maiden, Handicap & Pattern Races

Introduction

The champion trainer for the season is decided using total prize money earned. This measure favours the very largest training yards, particularly those that have access to the offspring of top stallions. As a result it is a somewhat unsatisfactory measure of training yard success. Since Impact Values (IVs) correct for yard size by taking into account the number of runners as well as the number of winners, the playing field between yards of differing sizes is, to a good degree, made level when this measure is used. Whilst there are also limitations with using this measure across all races and for all trainers, the net is cast wider. In this blog post IVs for different categories of race, namely maidens, handicaps and pattern races, are calculated, both raw and adjusted for Sire IVs (SA), then combined to produce a composite IV measure. Measuring IVs in different race categories enables a more complete picture of training yard success to be built. A by-product of the approach used is that trainers whose results are most and least influenced by the success of particular stallions can be identified.
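
To make the size correction concrete, here is a minimal sketch of an Impact Value as a yard's share of all winners divided by its share of all runs, which is the form used in the accompanying R code. The three yards and their figures are hypothetical, and the function name impact.value is my own.

```r
# Impact Value: a yard's share of all winners divided by its share of all runs.
# The three yards and their figures are hypothetical.
yards <- data.frame(yard = c("A", "B", "C"),
                    wins = c(30, 6, 4),
                    runs = c(200, 40, 160),
                    stringsAsFactors = FALSE)

impact.value <- function(wins, runs) {
  (wins / sum(wins)) / (runs / sum(runs))
}

yards$IV <- impact.value(yards$wins, yards$runs)
yards
```

Yards A and B differ fivefold in size but have the same strike rate, so they receive the same IV; an IV of 1 would mean a yard wins exactly in proportion to its runs.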

Data & Methodology

The analysis that underpins this piece was carried out in the R statistical environment accessing Raceform Interactive data for the 2012 flat season. The R code is posted elsewhere for interested readers. To qualify for inclusion in the tables that follow, a training yard must have sent out at least 50 runners in handicaps and 100 runners in total over the course of the 2012 flat season, and be based in Great Britain (GB). A total of 139 yards met these criteria. These yards were then split into two groups according to how many different horses had been raced – 66 yards raced at least 40 different horses and are the focus of the analysis in this blog piece. The other 73 yards, smaller in size, were analysed separately and may be the subject of a further blog post. Since we know that on average larger yards deliver higher IVs than smaller yards (see my earlier blog post on this subject), smaller yards that perform well may not have appeared in the listings reported below, and it is more appropriate to analyse their results separately.
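
The qualification rules above amount to a simple filter followed by a split on yard size. The sketch below applies them to a made-up yard summary; the data frame and its column names (runs.hcaps, runs.all, horses) are illustrative assumptions rather than the Raceform field names.

```r
# Hypothetical yard-level summary; column names are illustrative
yards <- data.frame(
  yard       = c("A", "B", "C", "D"),
  country    = c("GB", "GB", "IRE", "GB"),
  runs.hcaps = c(120, 55, 200, 30),
  runs.all   = c(300, 110, 400, 150),
  horses     = c(85, 22, 90, 40),
  stringsAsFactors = FALSE
)

min.hcap.runs  <- 50   # at least 50 runners in handicaps
min.total.runs <- 100  # at least 100 runners in total
size.cutoff    <- 40   # yards racing at least 40 different horses form the main group

qualifies <- yards$runs.hcaps >= min.hcap.runs &
             yards$runs.all   >= min.total.runs &
             yards$country    == "GB"
qualified <- yards[qualifies, ]

larger  <- qualified[qualified$horses >= size.cutoff, ]
smaller <- qualified[qualified$horses <  size.cutoff, ]
```

In this toy data yard A lands in the larger group, yard B in the smaller group, yard C is excluded for being based outside GB and yard D for having too few handicap runners.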

Impact Values – Maidens

Maiden race IVs are likely to favour large yards with access to potential pattern class horses. Table 1 shows the top 10 yards ranked by sire adjusted IV in maidens. Raw IVs are also reported. Note the dominance of the Richard Hannon yard and the small difference between its raw and sire adjusted IVs compared with the larger differences between IVs for Saeed bin Suroor and William Haggas. The large number of horses at the Hannon yard appears to confer a substantial advantage in being able to place horses to good effect within maidens. The same comments apply to Richard Fahey’s results. In both yards the large number of horses at their disposal appears to outweigh any advantage given to other yards via ostensibly better bred horses.

Rank Trainer wins runs IV raw IV SA
1 Mrs K Burke 14 49 2.70 3.02
2 Saeed bin Suroor 33 121 2.58 1.93
3 Peter Chapple-Hyam 14 68 1.95 1.81
4 William Haggas 42 182 2.18 1.73
5 Henry Candy 9 63 1.35 1.63
6 Richard Hannon 82 471 1.64 1.58
7 Jeremy Noseda 18 98 1.74 1.57
8 John Quinn 7 39 1.70 1.55
9 Richard Fahey 35 225 1.47 1.49
10 David Simcock 20 108 1.75 1.42

Table 1: Top 10 training yards by Sire adjusted IV in maidens
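
Because a raw IV is algebraically a yard's strike rate divided by the overall strike rate, the overall maiden strike rate can be backed out from any row of Table 1. The sketch below does this for three rows as a consistency check; the figures are transcribed from the table and the column names are mine.

```r
# Figures transcribed from Table 1 (first, sixth and ninth rows)
tab1 <- data.frame(trainer = c("Mrs K Burke", "Richard Hannon", "Richard Fahey"),
                   wins    = c(14, 82, 35),
                   runs    = c(49, 471, 225),
                   IV.raw  = c(2.70, 1.64, 1.47),
                   stringsAsFactors = FALSE)

tab1$strike.rate <- tab1$wins / tab1$runs
# raw IV = yard strike rate / overall strike rate, so divide back out
tab1$implied.base <- tab1$strike.rate / tab1$IV.raw
round(tab1[, c("strike.rate", "implied.base")], 3)
```

All three rows imply an overall maiden strike rate of roughly 10 to 11 per cent, which is what one would expect if the IVs were computed against a common base.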

Impact Values – Handicaps

Table 2 shows the top 10 yards ranked by Sire adjusted IV in handicaps. Raw IVs are also reported. Sir Mark Prescott Bt tops the table, although in common with the majority of the trainers in the top 10 his Sire adjusted IV is substantially lower than his raw IV. Noteworthy are the results of Chris Wall and Michael Appleby, whose IVs are hardly affected by the relative success of the sires of their horses in training. Part of this result is due to their lack of relative success in maidens, suggesting their horses are likely to be highly competitive when they move out of maidens into handicap company – Chris Wall’s IV in maidens was 0.43, whilst Michael Appleby sent out no maiden winners in 2012. In contrast Sir Mark Prescott Bt, along with 7 other trainers, delivered IVs above 1 in both maiden and handicap company. The other 7 were Marcus Tregoning, Luca Cumani, Sir Michael Stoute, Ed Dunlop, James Fanshawe, Roger Varian and Mick Channon.

Rank Trainer wins runs IV raw IV SA
1 Sir Mark Prescott Bt 31 131 2.45 1.98
2 Marcus Tregoning 16 83 1.99 1.88
3 Sir Michael Stoute 25 115 2.25 1.83
4 Luca Cumani 24 129 1.92 1.75
5 Chris Wall 18 104 1.79 1.74
6 Roger Varian 28 155 1.87 1.61
7 Peter Chapple-Hyam 10 63 1.64 1.53
8 William Haggas 30 166 1.87 1.52
9 Michael Appleby 24 162 1.53 1.52
10 Tom Dascombe 33 211 1.62 1.51

Table 2: Top 10 training yards by Sire adjusted IV in handicaps

Impact Values – Pattern Races

Table 3 shows the top 20 yards ranked by Sire adjusted IV in pattern races. Raw IVs are also reported. The results are more difficult to interpret than maidens and handicaps for individual trainers because of small sample sizes. The Richard Hannon and John Gosden yards dominate the table in terms of number of winners and runners; however, the Sire adjusted IVs for both trainers are noticeably lower than their raw IVs. It is possible this result is an artefact created by their substantial relative success in producing pattern class winners during the 2012 flat season. A number of yards that perform well on the IV measure in maiden company do not appear in the table below.

Rank Trainer wins runs IV raw IV SA
1 Ann Duffield 2 4 4.47 6.12
2 Alan McCabe 1 5 1.79 2.92
3 David Simcock 2 14 1.28 2.05
4 Sir Henry Cecil 16 60 2.39 1.89
5 David O’Meara 4 20 1.79 1.89
6 Roger Charlton 10 46 1.94 1.84
7 David Barron 2 13 1.38 1.58
8 Roger Varian 9 52 1.55 1.56
9 Sir Michael Stoute 6 41 1.31 1.42
10 Richard Fahey 9 81 0.99 1.36
11 Mrs K Burke 1 12 0.75 1.32
12 Chris Wall 2 15 1.19 1.27
13 Clive Cox 7 35 1.79 1.27
14 Henry Candy 1 11 0.81 1.23
15 Richard Hannon 21 139 1.35 1.16
16 Mark Johnston 8 61 1.17 1.15
17 Marcus Tregoning 2 17 1.05 1.15
18 Luca Cumani 4 26 1.38 1.10
19 John Gosden 23 130 1.58 1.09
20 Mahmood Al Zarooni 9 69 1.17 1.06

Table 3: Top 20 training yards by Sire adjusted IV in pattern races
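
One way to see how much the small samples matter is to put an interval around each strike rate. This was not part of the original analysis: the sketch below uses binom.test from base R's stats package to compute exact 95% intervals for three rows of Table 3, with wins and runs transcribed from the table.

```r
# Wins and runs transcribed from Table 3
ptn <- data.frame(trainer = c("Ann Duffield", "Richard Hannon", "John Gosden"),
                  wins = c(2, 21, 23),
                  runs = c(4, 139, 130),
                  stringsAsFactors = FALSE)

# Exact (Clopper-Pearson) 95% interval for each strike rate
ci <- t(mapply(function(w, n) binom.test(w, n)$conf.int, ptn$wins, ptn$runs))
ptn$lower <- ci[, 1]
ptn$upper <- ci[, 2]
ptn$width <- ptn$upper - ptn$lower
ptn
```

The interval for a yard with 4 pattern runners is far wider than for one with well over 100, which is why the rankings at the top of Table 3 should be read with caution.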

Impact Values – Composite Measure

A composite IV is calculated by combining the IVs for maidens, handicaps and pattern races by trainer, weighting by the proportion of runs that each trainer had in each category. Thus a trainer without runners in pattern races would not be penalised for his non-participation, and the biggest contributor to each trainer’s IV is from the category of race in which they had the biggest proportion of runners. The composite measure was also adjusted for Sire IV. Using this measure Sir Mark Prescott Bt was the top trainer on the flat in 2012, followed by William Haggas and Marcus Tregoning. Noteworthy results were produced by Henry Candy, David Barron, Michael Appleby and Chris Wall, each of whom saw their IV increase after taking the Sire IV adjustment into account. For 16 of the 20 trainers we see the opposite, suggesting that the adjustment for bloodstock quality used here via a Sire adjusted IV does not go far enough. I will return to this subject in another blog article. Thanks to Declan Meagher and others for making this point on the separate blog post “Do Small Training Yards Punch Above Their Weight?”.

Rank Trainer IV raw IV SA
1 Sir Mark Prescott Bt 1.81 1.55
2 William Haggas 1.87 1.55
3 Marcus Tregoning 1.62 1.53
4 Saeed bin Suroor 1.86 1.52
5 Roger Varian 1.76 1.51
6 Peter Chapple-Hyam 1.59 1.51
7 Henry Candy 1.31 1.49
8 Sir Michael Stoute 1.91 1.48
9 Sir Henry Cecil 1.86 1.42
10 David Barron 1.25 1.42
11 Richard Hannon 1.52 1.41
12 Luca Cumani 1.59 1.40
13 Jeremy Noseda 1.53 1.39
14 Mrs K Burke 1.39 1.38
15 Michael Appleby 1.29 1.36
16 Chris Wall 1.34 1.35
17 Ralph Beckett 1.64 1.34
18 Roger Charlton 1.49 1.30
19 Tom Dascombe 1.37 1.29
20 David Simcock 1.34 1.27

Table 4: Top 20 training yards by composite IV adjusted for Sire
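
The composite measure is a runs-weighted average of a trainer's category IVs. A minimal sketch for a single hypothetical trainer follows; the IVs and run counts are invented for illustration and the function name composite.iv is mine.

```r
# Hypothetical trainer: IV and number of runs in each race category
iv   <- c(maidens = 1.20, handicaps = 1.60, pattern = 0.00)
runs <- c(maidens = 50,   handicaps = 150,  pattern = 0)

# Weight each category IV by the trainer's own runs in that category
composite.iv <- function(iv, runs) {
  sum(iv * runs) / sum(runs)
}

composite.iv(iv, runs)
# base R's weighted.mean gives the same figure
weighted.mean(iv, runs)
```

Because the pattern category carries zero weight for this trainer, having no pattern runners neither helps nor hurts the composite.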

Training Yards Success & Relationship with Sire Quality

How many training yards are able to deliver improved IVs after the Sire adjustment is taken into account? Remember for successful yards the natural direction for the Sire adjustment to take your IV is downwards. This is because the better quality Sires make an outsized contribution in terms of siring winners. So the yards that are able to increase their IVs after this adjustment is applied are worthy of note. There are 10 yards out of the 66 – see Table 5 below –  that were able to deliver an adjusted composite IV both greater than 1 and higher than their raw composite IV. Henry Candy and David Barron’s results are noteworthy.

Rank Trainer IV comp IV comp SA Difference
1 Henry Candy 1.31 1.49 0.18
2 David Barron 1.25 1.42 0.18
3 Michael Appleby 1.29 1.36 0.07
4 Chris Wall 1.34 1.35 0.02
5 Brian Ellison 1.25 1.26 0.01
6 Kevin Ryan 1.11 1.22 0.11
7 James Given 1.01 1.15 0.14
8 John Quinn 1.12 1.15 0.03
9 Marco Botti 1.06 1.06 0.01
10 Alan Swinbank 1.00 1.02 0.01

Table 5: Top 10 trainers with improved IVs after Sire adjustment ranked on Sire adjusted IV
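
The direction of the Sire adjustment is easiest to see on a toy example. The sketch below follows the same steps as the accompanying R code (compute an IV per sire, weight each run by its sire's IV, then recompute the trainer IV on the weighted runs), but the two trainers, two sires and all figures are hypothetical.

```r
# Hypothetical runs and wins by trainer and sire (all figures illustrative)
d <- data.frame(trainer = c("X", "X", "Y", "Y"),
                sire    = c("S1", "S2", "S1", "S2"),
                runs    = c(8, 2, 2, 8),
                wins    = c(3, 0, 0, 1),
                stringsAsFactors = FALSE)

# Sire IV: each sire's share of winners over its share of runs
sire.wins <- tapply(d$wins, d$sire, sum)
sire.runs <- tapply(d$runs, d$sire, sum)
sire.IV   <- (sire.wins / sum(sire.wins)) / (sire.runs / sum(sire.runs))

# Weight every run by the IV of the runner's sire
d$runs.SA <- d$runs * as.vector(sire.IV[d$sire])

t.wins    <- tapply(d$wins, d$trainer, sum)
t.runs    <- tapply(d$runs, d$trainer, sum)
t.runs.SA <- tapply(d$runs.SA, d$trainer, sum)

IV.raw <- (t.wins / sum(t.wins)) / (t.runs / sum(t.runs))
IV.SA  <- (t.wins / sum(t.wins)) / (t.runs.SA / sum(t.runs.SA))
round(cbind(IV.raw, IV.SA), 2)
```

Trainer X, whose runners are mostly by the more successful sire, sees the IV fall after adjustment, while trainer Y, working mainly with the weaker sire, sees it rise. That is the pattern the yards in Table 5 show.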

What of yards that see falls in their IVs after the Sire adjustment is applied? Table 6 ranks the 10 training yards most affected by the Sire IV adjustment. These yards are still highly successful – they still post IVs substantially greater than 1. However, using this metric suggests that these training yards are more reliant than others on the quality of their bloodstock for their success.

Rank Trainer IVcomp IVcomp SA
57 Roger Varian 1.76 1.51
58 Sir Mark Prescott Bt 1.81 1.55
59 James Fanshawe 1.50 1.24
60 Ralph Beckett 1.64 1.34
61 Mahmood Al Zarooni 1.51 1.19
62 William Haggas 1.87 1.55
63 Saeed bin Suroor 1.86 1.52
64 John Gosden 1.65 1.24
65 Sir Michael Stoute 1.91 1.48
66 Sir Henry Cecil 1.86 1.42

Table 6: Bottom 10 trainers with reduced IVs after Sire adjustment

Summary

In this paper the criterion used for measuring training yard success is a Sire Adjusted Impact Value derived from results delivered in maidens, handicaps and pattern races. Using this measure Sir Mark Prescott Bt was the top trainer on the flat in 2012. It is probable the Sire IV adjustment used does not go far enough in terms of correcting for quality and another blog post will address this point. A small number of trainers produce IVs that improve after an adjustment for Sire quality is made. These training yards are of particular interest.

Measuring Training Yard Success: R code

####################################################################
#
#
# Measuring Training Yard Success: Impact Values from Maidens, Handicaps and Pattern Races
# J. Hathorn
#
# v1.0
#
#
# written 18-Sep-13
#
#
###################################################################

#rm(list=ls())

library(foreign)
library(maptools)

# read in database files from RI
#
setwd("C:/Program Files (x86)/RaceForm Interactive")

RIhorse.data <-read.dbf("horse.dbf")
RIouting.data <-read.dbf("outing.dbf")
RIrace.data <-read.dbf("race.dbf")
RIsire.data <-read.dbf("sire.dbf")
RItrainer.data <- read.dbf("trainer.dbf")
RIcourse.data<-read.dbf("course.dbf")

# #############################################################
# set date parameters to focus on races between chosen dates
# flat season Lincoln to the November Handicap
chosenDateSt<-c("2012-03-31")
chosenDateEd<-c("2012-11-10")

# set dates for determining yard sizes, set the previous year to the November Handicap
#chosenDateSt1<-c("2011-11-11")
#chosenDateEd1<-c("2012-11-10")
#################################################
#
# extract GB course id list from course db
z<-which(RIcourse.data$CCOUNTRY == "GB")

GBcourseids<-RIcourse.data$CID[z]
GBcoursenames<-RIcourse.data$CNAME[z]
#
# extract GB/IRE trainer lists from trainer db
z<-which(RItrainer.data$TCOUNTRY == "GB")
GBtrainers<-RItrainer.data$TID[z]
z<-which(RItrainer.data$TCOUNTRY == "IRE")
IREtrainers<-RItrainer.data$TID[z]
GBIREtrainers<-append(GBtrainers,IREtrainers)
#
#
##################################################

# select outings on the flat between the chosen dates
tmpidx<-which(RIouting.data$ODATE>=chosenDateSt & RIouting.data$ODATE<=chosenDateEd & RIouting.data$OFJ=="F")
T1.data<-RIouting.data[tmpidx,]

# match the course and add a country variable
z<-match(T1.data$OCOURSEID,RIcourse.data$CID)
T1.data$COCOUNTRY<-NA
T1.data$COCOUNTRY<-RIcourse.data$CCOUNTRY[z]

T1a.data<-T1.data
# select outings that took place on GB courses
tmpidx<-which(T1.data$COCOUNTRY == "GB")
T1.data<-T1.data[tmpidx,]

# #######################
# GB races only
# code to include IRE races if so desired
#tmpidx<-which(T1a.data$COCOUNTRY == "IRE")
#T2.data<-T1a.data[tmpidx,]
#T1.data<-rbind(T1.data,T2.data)

# ######################
# age restriction if required
# reduce T1 to horses of the desired age given by the parameter agecheck
#agecheck<-2
#tmpidx<-which(T1.data$OAGE==agecheck)
#T1.data<-T1.data[tmpidx,]

# match each horse into the RIhorse d f to get the sire id
z<-match(T1.data$OHORSEID,RIhorse.data$HID)
T1.data$SIREID<-RIhorse.data$HSIREID[z]

# attach some of the race conditions ie age, stakes, handicap etc from the RIrace d f
z<-match(T1.data$ORACEID,RIrace.data$RID)
T1.data$RCOND<-RIrace.data$RCOND[z]
T1.data$RAGE<-RIrace.data$RAGE[z]
T1.data$RANIMAL<-RIrace.data$RANIMAL[z]
T1.data$RPATTERN<-RIrace.data$RPATTERN[z]
T1.data$RISHCAP<-RIrace.data$RISHCAP[z]

# set a 1/0 variable for winners and a 1 variable for runners will help aggregation later
T1.data$runner<-1
T1.data$winner<-0
z<-which(T1.data$OPOS==1)
T1.data$winner[z]<-1

# SP turned into a probability
T1.data$SPprob<-1/(1+T1.data$OSPVAL)

# reformat the ORF rating variable
T1.data$ORF<-as.character(T1.data$ORF)
T1.data$ORF<-gsub("\\?","",T1.data$ORF)
T1.data$ORF<-gsub("\\+","",T1.data$ORF)
T1.data$ORF<-as.numeric(T1.data$ORF)
tmpidx<-which(T1.data$ORF==0)
T1.data$ORF[tmpidx]<-NA
# reformat the OJC rating variable
T1.data$OJC<-as.character(T1.data$OJC)
T1.data$OJC<-gsub("\\?","",T1.data$OJC)
T1.data$OJC<-gsub("\\+","",T1.data$OJC)
T1.data$OJC<-as.numeric(T1.data$OJC)
tmpidx<-which(T1.data$OJC==0)
T1.data$OJC[tmpidx]<-NA

####################################################
#
# produce summary variables by horse – runs/wins/ratings etc
#
hid<-tapply(T1.data$OHORSEID,T1.data$OHORSEID,mean,na.rm=TRUE)
hruns<-tapply(T1.data$runner,T1.data$OHORSEID,sum,na.rm=TRUE)
hwins<-tapply(T1.data$winner,T1.data$OHORSEID,sum,na.rm=TRUE)
hwinner<-pmin(1,hwins)
hrunner<-pmin(1,hruns)
hORmax<-tapply(T1.data$OJC,T1.data$OHORSEID,max,na.rm=TRUE)
hRFmax<-tapply(T1.data$ORF,T1.data$OHORSEID,max,na.rm=TRUE)
z<-which(hORmax==-Inf)
hORmax[z]<-NA
z<-which(hRFmax==-Inf)
hRFmax[z]<-NA

z<-match(hid,T1.data$OHORSEID)
htrainerid<-T1.data$OTRAINERID[z]
z<-match(hid,RIhorse.data$HID)
hname<-RIhorse.data$HNAME[z]
# put these into a d f
HSummary<-data.frame(hid,hname,htrainerid,hruns,hwins,hwinner,hrunner,hORmax,hRFmax)

# ###############
# produce population wide summary stats
univ.OR.med<-median(HSummary$hORmax,na.rm=TRUE)
univ.RF.med<-median(HSummary$hRFmax,na.rm=TRUE)
univ.RF.sd<-sd(HSummary$hRFmax,na.rm=TRUE)
univ.RF.1sdup<-univ.RF.med+univ.RF.sd
univ.winners<-sum(HSummary$hwinner)
univ.runners<-sum(HSummary$hrunner)
univ.winpct<-univ.winners/univ.runners

HSummary$hRF1sdup<-0
z<-which(HSummary$hRFmax>univ.RF.1sdup)
HSummary$hRF1sdup[z]<-1
univ.RF.1sduppct<-sum(HSummary$hRF1sdup)/univ.runners

# ######################################################
#
# now take the horse summary df and produce a trainer summary based upon the horse summary d f

trainer.h<-tapply(HSummary$htrainerid,HSummary$htrainerid,mean,na.rm=TRUE)
wins.h<-tapply(HSummary$hwins,HSummary$htrainerid,sum,na.rm=TRUE)
runs.h<-tapply(HSummary$hruns,HSummary$htrainerid,sum,na.rm=TRUE)
winspct.h<-wins.h/runs.h
winner.h<-tapply(HSummary$hwinner,HSummary$htrainerid,sum,na.rm=TRUE)
runner.h<-tapply(HSummary$hrunner,HSummary$htrainerid,sum,na.rm=TRUE)
winpct.h<-winner.h/runner.h
#ORmax.med.h<-tapply(HSummary$hORmax,HSummary$htrainerid,median,na.rm=TRUE)
RFmax.med.h<-tapply(HSummary$hRFmax,HSummary$htrainerid,median,na.rm=TRUE)
RFmax.sd.h<-tapply(HSummary$hRFmax,HSummary$htrainerid,sd,na.rm=TRUE)
RFmax.up1sd.h<-RFmax.med.h+RFmax.sd.h
RF.1sdup.h<-tapply(HSummary$hRF1sdup,HSummary$htrainerid,sum,na.rm=TRUE)
RF.1sduppct.h<-RF.1sdup.h/runner.h

TrainerHorses<-data.frame(trainer.h,wins.h,runs.h,winspct.h,winner.h,runner.h,winpct.h,RFmax.med.h,RFmax.sd.h,RFmax.up1sd.h,RF.1sdup.h,RF.1sduppct.h)
z<-match(TrainerHorses$trainer.h,RItrainer.data$TID)
TrainerHorses$tname.h<-RItrainer.data$TSTYLENAME[z]

# write out this d f to a CSV file
#fname<-"c:/Racing Research/Trainer Research/trainerhorses.csv"
#write.csv(TrainerHorses,file=fname)

# ####################################################
#
# now go back to the Outing d f and split into race categories to get IVs etc by trainer
#
# split the races into different data frames
# Maidens
# Handicaps
# Pattern

# ###################################################
#
# put maidens into their own d f
z<-which(T1.data$RANIMAL=="MDN")
T2.data<-T1.data[z,]

# set up Sire IVs in maidens
sire.ID<-tapply(T2.data$SIREID,T2.data$SIREID,mean,na.rm=TRUE)
sire.wins <- tapply(T2.data$winner,T2.data$SIREID,sum,na.rm=TRUE)
total.wins<-sum(sire.wins)
sire.runs <- tapply(T2.data$runner,T2.data$SIREID,sum,na.rm=TRUE)
total.runs<-sum(sire.runs)
sire.IV<-(sire.wins/total.wins)/(sire.runs/total.runs)

# bring sire IV back into the T2 d f and calc a sire adjusted run variable
z<-match(T2.data$SIREID,sire.ID)
T2.data$sire.IV<-sire.IV[z]
T2.data$runner.SA<-T2.data$runner*T2.data$sire.IV

# calc the trainer IVs in maidens
trainerID<-tapply(T2.data$OTRAINERID,T2.data$OTRAINERID,mean,na.rm=TRUE)
trainer.wins <- tapply(T2.data$winner,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.wins<-sum(trainer.wins)
trainer.runs <- tapply(T2.data$runner,T2.data$OTRAINERID,sum,na.rm=TRUE)
trainer.runs.SA <- tapply(T2.data$runner.SA,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.runs<-sum(trainer.runs)
total.runs.SA<-sum(trainer.runs.SA)
trainer.IV<-(trainer.wins/total.wins)/(trainer.runs/total.runs)
trainer.IV.SA<-(trainer.wins/total.wins)/(trainer.runs.SA/total.runs.SA)

#copy over to maiden specific variables
wins.mdns<-trainer.wins
runs.mdns<-trainer.runs
runs.mdns.SA<-trainer.runs.SA
IV.mdns<-trainer.IV
IV.SA.mdns<-trainer.IV.SA

# put into a maiden trainer summary d f
TrainerMdns<-data.frame(trainerID,wins.mdns,runs.mdns,runs.mdns.SA,IV.mdns,IV.SA.mdns)

# #####################################################
#
# put handicaps into their own d f
z<-which(T1.data$RISHCAP=="TRUE")
T2.data<-T1.data[z,]

# set up Sire IVs in handicaps
sire.ID<-tapply(T2.data$SIREID,T2.data$SIREID,mean,na.rm=TRUE)
sire.wins <- tapply(T2.data$winner,T2.data$SIREID,sum,na.rm=TRUE)
total.wins<-sum(sire.wins)
sire.runs <- tapply(T2.data$runner,T2.data$SIREID,sum,na.rm=TRUE)
total.runs<-sum(sire.runs)
sire.IV<-(sire.wins/total.wins)/(sire.runs/total.runs)

# bring sire IV back into the T2 d f and calc a sire adjusted run variable
z<-match(T2.data$SIREID,sire.ID)
T2.data$sire.IV<-sire.IV[z]
T2.data$runner.SA<-T2.data$runner*T2.data$sire.IV

# calc the trainer IVs in handicaps
trainerID<-tapply(T2.data$OTRAINERID,T2.data$OTRAINERID,mean,na.rm=TRUE)
trainer.wins <- tapply(T2.data$winner,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.wins<-sum(trainer.wins)
trainer.runs <- tapply(T2.data$runner,T2.data$OTRAINERID,sum,na.rm=TRUE)
trainer.runs.SA <- tapply(T2.data$runner.SA,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.runs<-sum(trainer.runs)
total.runs.SA<-sum(trainer.runs.SA)
trainer.IV<-(trainer.wins/total.wins)/(trainer.runs/total.runs)
trainer.IV.SA<-(trainer.wins/total.wins)/(trainer.runs.SA/total.runs.SA)

#copy over to handicap specific variables
wins.hcaps<-trainer.wins
runs.hcaps<-trainer.runs
runs.hcaps.SA<-trainer.runs.SA
IV.hcaps<-trainer.IV
IV.SA.hcaps<-trainer.IV.SA

# put into a handicap trainer summary d f
TrainerHcaps<-data.frame(trainerID,wins.hcaps,runs.hcaps,runs.hcaps.SA,IV.hcaps,IV.SA.hcaps)

# ###################################################
#
# put patterns into their own d f
z<-which(T1.data$RPATTERN !="NOT" & T1.data$RISHCAP == "FALSE")
T2.data<-T1.data[z,]

# set up Sire IVs in patterns
sire.ID<-tapply(T2.data$SIREID,T2.data$SIREID,mean,na.rm=TRUE)
sire.wins <- tapply(T2.data$winner,T2.data$SIREID,sum,na.rm=TRUE)
total.wins<-sum(sire.wins)
sire.runs <- tapply(T2.data$runner,T2.data$SIREID,sum,na.rm=TRUE)
total.runs<-sum(sire.runs)
sire.IV<-(sire.wins/total.wins)/(sire.runs/total.runs)

# bring sire IV back into the T2 d f and calc a sire adjusted run variable
z<-match(T2.data$SIREID,sire.ID)
T2.data$sire.IV<-sire.IV[z]
T2.data$runner.SA<-T2.data$runner*T2.data$sire.IV

# calc the trainer IVs in patterns
trainerID<-tapply(T2.data$OTRAINERID,T2.data$OTRAINERID,mean,na.rm=TRUE)
trainer.wins <- tapply(T2.data$winner,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.wins<-sum(trainer.wins)
trainer.runs <- tapply(T2.data$runner,T2.data$OTRAINERID,sum,na.rm=TRUE)
trainer.runs.SA <- tapply(T2.data$runner.SA,T2.data$OTRAINERID,sum,na.rm=TRUE)
total.runs<-sum(trainer.runs)
total.runs.SA<-sum(trainer.runs.SA)
trainer.IV<-(trainer.wins/total.wins)/(trainer.runs/total.runs)
trainer.IV.SA<-(trainer.wins/total.wins)/(trainer.runs.SA/total.runs.SA)

#copy over to pattern specific variables
wins.ptns<-trainer.wins
runs.ptns<-trainer.runs
runs.ptns.SA<-trainer.runs.SA
IV.ptns<-trainer.IV
IV.SA.ptns<-trainer.IV.SA
# put into a pattern trainer summary d f
TrainerPtns<-data.frame(trainerID,wins.ptns,runs.ptns,runs.ptns.SA,IV.ptns,IV.SA.ptns)

# ###################################################
#
# merge the trainer summary d f s
#
Temp<-merge(TrainerMdns,TrainerHcaps,by.x="trainerID",by.y="trainerID",all.x=TRUE,all.y=TRUE)
Trainers<-merge(Temp,TrainerPtns,by.x="trainerID",by.y="trainerID",all.x=TRUE,all.y=TRUE)

z<-match(Trainers$trainerID,RItrainer.data$TID)
Trainers$tname<-RItrainer.data$TSTYLENAME[z]
Trainers$country<-RItrainer.data$TCOUNTRY[z]

# merge in the ratings d f
#
TrainersAll<-merge(Trainers,TrainerHorses,by.x="trainerID",by.y="trainer.h",all.x=TRUE,all.y=TRUE)
#
#
# clean up some of the variables, replace NA by 0
#
z<-which(is.na(TrainersAll$runs.mdns))
TrainersAll$runs.mdns[z]<-0
z<-which(is.na(TrainersAll$runs.hcaps))
TrainersAll$runs.hcaps[z]<-0
z<-which(is.na(TrainersAll$runs.ptns))
TrainersAll$runs.ptns[z]<-0
z<-which(is.na(TrainersAll$runs.mdns.SA))
TrainersAll$runs.mdns.SA[z]<-0
z<-which(is.na(TrainersAll$runs.hcaps.SA))
TrainersAll$runs.hcaps.SA[z]<-0
z<-which(is.na(TrainersAll$runs.ptns.SA))
TrainersAll$runs.ptns.SA[z]<-0
z<-which(is.na(TrainersAll$wins.mdns))
TrainersAll$wins.mdns[z]<-0
z<-which(is.na(TrainersAll$wins.hcaps))
TrainersAll$wins.hcaps[z]<-0
z<-which(is.na(TrainersAll$wins.ptns))
TrainersAll$wins.ptns[z]<-0
z<-which(is.na(TrainersAll$IV.mdns))
TrainersAll$IV.mdns[z]<-0
z<-which(is.na(TrainersAll$IV.SA.mdns))
TrainersAll$IV.SA.mdns[z]<-0
z<-which(is.na(TrainersAll$IV.hcaps))
TrainersAll$IV.hcaps[z]<-0
z<-which(is.na(TrainersAll$IV.SA.hcaps))
TrainersAll$IV.SA.hcaps[z]<-0
z<-which(is.na(TrainersAll$IV.ptns))
TrainersAll$IV.ptns[z]<-0
z<-which(is.na(TrainersAll$IV.SA.ptns))
TrainersAll$IV.SA.ptns[z]<-0

TrainersAll$wins.all<-TrainersAll$wins.mdns+TrainersAll$wins.hcaps+TrainersAll$wins.ptns
TrainersAll$runs.all<-TrainersAll$runs.mdns+TrainersAll$runs.hcaps+TrainersAll$runs.ptns
TrainersAll$runs.all.SA<-TrainersAll$runs.mdns.SA+TrainersAll$runs.hcaps.SA+TrainersAll$runs.ptns.SA

# produce summary stats
#
# composite IVs weighted by all runs in maidens, handicaps and pattern races

TrainersAll$IVcomp1<-(TrainersAll$IV.mdns*sum(TrainersAll$runs.mdns,na.rm=TRUE)+TrainersAll$IV.hcaps*sum(TrainersAll$runs.hcaps,na.rm=TRUE)
+TrainersAll$IV.ptns*sum(TrainersAll$runs.ptns,na.rm=TRUE))/(sum(TrainersAll$runs.mdns,na.rm=TRUE)+sum(TrainersAll$runs.hcaps,na.rm=TRUE)+sum(TrainersAll$runs.ptns,na.rm=TRUE))

TrainersAll$IVcomp1.SA<-(TrainersAll$IV.SA.mdns*sum(TrainersAll$runs.mdns.SA,na.rm=TRUE)+TrainersAll$IV.SA.hcaps*sum(TrainersAll$runs.hcaps.SA,na.rm=TRUE)
+TrainersAll$IV.SA.ptns*sum(TrainersAll$runs.ptns.SA,na.rm=TRUE))/(sum(TrainersAll$runs.mdns.SA,na.rm=TRUE)+sum(TrainersAll$runs.hcaps.SA,na.rm=TRUE)+sum(TrainersAll$runs.ptns.SA,na.rm=TRUE))

TrainersAll$IVcomp2<-(TrainersAll$IV.mdns*TrainersAll$runs.mdns+TrainersAll$IV.hcaps*TrainersAll$runs.hcaps
+TrainersAll$IV.ptns*TrainersAll$runs.ptns)/(TrainersAll$runs.mdns+TrainersAll$runs.hcaps+TrainersAll$runs.ptns)

TrainersAll$IVcomp2.SA<-(TrainersAll$IV.SA.mdns*TrainersAll$runs.mdns.SA+TrainersAll$IV.SA.hcaps*TrainersAll$runs.hcaps.SA
+TrainersAll$IV.SA.ptns*TrainersAll$runs.ptns.SA)/(TrainersAll$runs.mdns.SA+TrainersAll$runs.hcaps.SA+TrainersAll$runs.ptns.SA)

# difference variables, hcaps – mdns
TrainersAll$IVdiff.hcapsmdns<-TrainersAll$IV.hcaps-TrainersAll$IV.mdns
TrainersAll$IVdiff.hcapsmdns.SA<-TrainersAll$IV.SA.hcaps-TrainersAll$IV.SA.mdns

# quality differences using composites
TrainersAll$IVdiff.comp1.SAraw<-TrainersAll$IVcomp1.SA-TrainersAll$IVcomp1
TrainersAll$IVdiff.comp2.SAraw<-TrainersAll$IVcomp2.SA-TrainersAll$IVcomp2
# reduce the list to those trainers that have had >=50 runs in handicaps and are GB based and at least 2*50 runs in total
minruns<-50
z<-which(TrainersAll$runs.hcaps >= minruns & TrainersAll$country=="GB" & TrainersAll$runs.all >= 2*minruns)
Temp<-TrainersAll[z,]
TrainersAll50GB<-Temp[order(-Temp$IVcomp2.SA),]

# write out this d f to a CSV file
fname<-"c:/Racing Research/Trainer Research/trainersall50gb.csv"
write.csv(TrainersAll50GB,file=fname)

# split the qualifying yards by number of different horses raced (runner.h)
slcutoff<-40
z<-which(TrainersAll50GB$runner.h < slcutoff)
TrainersSmall50GB<-TrainersAll50GB[z,]
fname<-"c:/Racing Research/Trainer Research/trainerssmall50gb.csv"
write.csv(TrainersSmall50GB,file=fname)
z<-which(TrainersAll50GB$runner.h >= slcutoff)
TrainersLarge50GB<-TrainersAll50GB[z,]
fname<-"c:/Racing Research/Trainer Research/trainerslarge50gb.csv"
write.csv(TrainersLarge50GB,file=fname)
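
The maiden, handicap and pattern blocks in the script above repeat the same steps on different subsets of the outings. As a possible tidy-up, and not the code used to produce the tables, those steps could be folded into one helper. The function name trainer.ivs and the toy data are hypothetical; the column names (SIREID, OTRAINERID, runner, winner) are those used in the script.

```r
# Hypothetical helper: raw and sire adjusted IVs by trainer for one race category.
# Expects columns SIREID, OTRAINERID, runner and winner, as in T2.data.
trainer.ivs <- function(df) {
  sire.wins <- tapply(df$winner, df$SIREID, sum, na.rm = TRUE)
  sire.runs <- tapply(df$runner, df$SIREID, sum, na.rm = TRUE)
  sire.IV   <- (sire.wins / sum(sire.wins)) / (sire.runs / sum(sire.runs))

  # weight each run by the IV of the runner's sire
  z <- match(as.character(df$SIREID), names(sire.IV))
  df$runner.SA <- df$runner * as.vector(sire.IV)[z]

  wins    <- tapply(df$winner, df$OTRAINERID, sum, na.rm = TRUE)
  runs    <- tapply(df$runner, df$OTRAINERID, sum, na.rm = TRUE)
  runs.SA <- tapply(df$runner.SA, df$OTRAINERID, sum, na.rm = TRUE)

  data.frame(trainerID = names(wins),   # returned as character here
             wins      = as.vector(wins),
             runs      = as.vector(runs),
             runs.SA   = as.vector(runs.SA),
             IV        = as.vector((wins / sum(wins)) / (runs / sum(runs))),
             IV.SA     = as.vector((wins / sum(wins)) / (runs.SA / sum(runs.SA))),
             stringsAsFactors = FALSE)
}

# Toy check with made-up outings: two trainers, two sires, one winner
toy <- data.frame(OTRAINERID = c(1, 1, 2, 2),
                  SIREID     = c(10, 11, 10, 11),
                  runner     = 1,
                  winner     = c(1, 0, 0, 0))
res <- trainer.ivs(toy)
res
```

With a helper like this, each category would need only its own subsetting line followed by a single call, for example on the maiden subset, before the results are renamed and merged as in the script.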

Owners Facilities: What Makes For a Good One?

In the post this morning I’ve had a letter from ARC Racing in which various improvements for owners on racedays are highlighted. The letter set me thinking: what makes for a good owners facility? In broad terms two things matter most – excellent viewing and comfortable facilities. To expand on this there are four criteria against which I’d judge whether a racecourse has a good owners facility.

Location

The owners facility should at the least either have paddock views or be located in the stands with uninterrupted views opposite, or near to opposite, the finish line. If the owners facility is located away from the track there should be an owners area located in the stands with uninterrupted views opposite, or near to opposite, the finish line.

Comfort

The owners facility should be large enough to fit the majority of owners and their guests seated.

Food & Drink

Food and drink should be available to a reasonable standard. Haute cuisine doesn’t have to be on offer; decent home cooking or a buffet is fine. I don’t mind paying as an owner if there is a decent selection on offer at a good value price. Tea and coffee should not come in paper cups: tea should be served from teapots, and the coffee shouldn’t be instant.

Badge Requests

It’s understandable that meetings such as Glorious Goodwood or Royal Ascot have restrictions on number of badges, extra badges and paddock passes. Both of these courses at the big meetings deal with owners requests efficiently and with the minimum of fuss. Not all meetings are in such demand for badges and on these occasions flexibility on the part of courses is a plus.

Course Reviews

I’m going to post reviews of the courses I’ve visited in forthcoming blog pieces. More to follow!

Do Small Training Yards Punch Above Their Weight?: R Code

####################################################################
#
# R Code to accompany the blog post 'Do Small Training Yards Punch Above Their Weight?'
#
# SmallTrainerAnalysis.R
#
# J. Hathorn
#
# v1.0
#
#
# Code to look at whether small yards punch above their weight
#
# written 10-Sep-13
#
#
###################################################################

#rm(list=ls())

library(foreign)
library(maptools)

# read in database files from RI
#
setwd("C:/Program Files (x86)/RaceForm Interactive")

RIhorse.data <-read.dbf("horse.dbf")
RIouting.data <-read.dbf("outing.dbf")
RIrace.data <-read.dbf("race.dbf")
RIsire.data <-read.dbf("sire.dbf")
RItrainer.data <- read.dbf("trainer.dbf")
RIcourse.data<-read.dbf("course.dbf")

# #############################################################
# set date parameters to focus on races between chosen dates
# flat season Lincoln to the November Handicap
chosenDateSt<-c("2012-03-31")
chosenDateEd<-c("2012-11-10")

# set dates for determining yard sizes, set the previous year to the November Handicap
chosenDateSt1<-c("2011-11-11")
chosenDateEd1<-c("2012-11-10")
#################################################
#
# extract GB course id list from course db
z<-which(RIcourse.data$CCOUNTRY == "GB")

GBcourseids<-RIcourse.data$CID[z]
GBcoursenames<-RIcourse.data$CNAME[z]
#
# extract GB/IRE trainer lists from trainer db
z<-which(RItrainer.data$TCOUNTRY == "GB")
GBtrainers<-RItrainer.data$TID[z]
z<-which(RItrainer.data$TCOUNTRY == "IRE")
IREtrainers<-RItrainer.data$TID[z]
GBIREtrainers<-append(GBtrainers,IREtrainers)
#
#
##################################################

# select outings on the flat between the chosen dates to categorise trainers
tmpidx<-which(RIouting.data$ODATE>=chosenDateSt1 & RIouting.data$ODATE<=chosenDateEd1)
T1.data<-RIouting.data[tmpidx,]

# match the course and add a country variable
z<-match(T1.data$OCOURSEID,RIcourse.data$CID)
T1.data$COCOUNTRY<-NA
T1.data$COCOUNTRY<-RIcourse.data$CCOUNTRY[z]

T1a.data<-T1.data
# select outings that took place on GB and IRE courses and append the 2 d f
tmpidx<-which(T1.data$COCOUNTRY == "GB")
T1.data<-T1.data[tmpidx,]

tmpidx<-which(T1a.data$COCOUNTRY == "IRE")
T2.data<-T1a.data[tmpidx,]

T1.data<-rbind(T1.data,T2.data)
# put the horse and trainer IDs into unique variables and add trainer name and country and horse name
horse<-T1.data$OHORSEID
trainer<-T1.data$OTRAINERID
uhorse<-unique(horse)
z<-match(uhorse,horse)
utrainer<-trainer[z]

# match the trainer name/domicile set up domestic/foreign variable
z<-match(utrainer,RItrainer.data$TID)
utrainerhome<-RItrainer.data$TCOUNTRY[z]
utrainername<-RItrainer.data$TSTYLENAME[z]
# start from a full-length vector of zeros so every horse has a value
domestic<-rep(0,length(utrainerhome))
z<-which(utrainerhome=="GB")
domestic[z]<-1
z<-which(utrainerhome=="IRE")
domestic[z]<-1
z<-which(is.na(domestic))
domestic[z]<-0

# match the horse name and calculate the horse age
z<-match(uhorse,RIhorse.data$HID)
uhorsename<-RIhorse.data$HNAME[z]
uhorsefdate<-RIhorse.data$HFOALDATE[z]
uhorseage<-(as.Date(chosenDateEd)-uhorsefdate)/365
# aggregate to num horses by trainer
trainerid<-tapply(utrainer,utrainer,mean)
numhorses<-tapply(uhorse,utrainer,length)
z<-match(trainerid,RItrainer.data$TID)
trainerctry<-RItrainer.data$TCOUNTRY[z]
# 1 if the trainer is based in GB or IRE, 0 otherwise (unknown country counts as 0)
trainerdomestic<-as.numeric(trainerctry %in% c("GB","IRE"))
# allocate trainers to overseas/tiny/small/medium/large categories in the variable size
tiny<-5
sml<-25
med<-75
size<-rep(NA_character_,length(numhorses))

t1<-which(trainerdomestic==0)
size[t1]<-"OVS"

t1<-which(trainerdomestic==1 & numhorses<=tiny)
size[t1]<-"TINY"
t1<-which(trainerdomestic==1 & numhorses>tiny & numhorses<=sml)
size[t1]<-"SMALL"
t1<-which(trainerdomestic==1 & numhorses>sml & numhorses<=med)
size[t1]<-"MEDIUM"
t1<-which(trainerdomestic==1 & numhorses>med)
size[t1]<-"LARGE"
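#
# Illustrative aside (not part of the original script): the same yard-size
# bucketing could be sketched with cut(), whose default right-closed intervals
# (0,5], (5,25], (25,75], (75,Inf] mirror the <= comparisons above. The
# thresholds are repeated as literals and the demo* yard sizes are made up.
demoHorses<-c(3,5,6,25,26,75,76)
demoSize<-cut(demoHorses,breaks=c(0,5,25,75,Inf),labels=c("TINY","SMALL","MEDIUM","LARGE"))
# overseas trainers would still need to be labelled "OVS" separately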

size<-as.factor(size)
# total/avg horses by size category
horsesbycat<-tapply(numhorses,size,sum)
avgbycat<-tapply(numhorses,size,mean,na.rm=TRUE)

# match the size by trainer back to the unique horse vector, will be useful later
#
z<-match(utrainer,trainerid)
utrainersize<-size[z]

###############################################
#
# go to the race file and calculate how many GB handicap and pattern races and runners in the period examined
#
tmpidx<-which(RIrace.data$RDATE>=chosenDateSt & RIrace.data$RDATE<=chosenDateEd & RIrace.data$RFJ=="F")
R1.data<-RIrace.data[tmpidx,]

z<-match(R1.data$RCOURSEID,RIcourse.data$CID)
R1.data$COCOUNTRY<-NA
R1.data$COCOUNTRY<-RIcourse.data$CCOUNTRY[z]

# select races that took place on GB courses
tmpidx<-which(R1.data$COCOUNTRY == "GB")
R1.data<-R1.data[tmpidx,]

# match the winning horse into uhorse to find trainer id/name, trainer home and size category
z<-match(R1.data$RWINHRSID,uhorse)
R1.data$trainerid<-utrainer[z]
R1.data$trainername<-utrainername[z]
R1.data$trainersize<-utrainersize[z]
R1.data$trainerhome<-utrainerhome[z]
R1.data$horseage<-uhorseage[z]

# set up 1/0 values for aggregation later by trainer category
# %in% returns FALSE for NA, so races whose winner has no size category get 0
R1.data$ovs<-as.numeric(R1.data$trainersize %in% "OVS")
R1.data$tiny<-as.numeric(R1.data$trainersize %in% "TINY")
R1.data$sml<-as.numeric(R1.data$trainersize %in% "SMALL")
R1.data$med<-as.numeric(R1.data$trainersize %in% "MEDIUM")
R1.data$lge<-as.numeric(R1.data$trainersize %in% "LARGE")

# match the winning horse into the RIhorse data frame to get the sire id
z<-match(R1.data$RWINHRSID,RIhorse.data$HID)
R1.data$WINSIREID<-RIhorse.data$HSIREID[z]

# produce pattern-only and handicap-only data frames
#
z<-which(R1.data$RPATTERN != "NOT" & R1.data$RISHCAP=="FALSE")
Patterns<-R1.data[z,]

z<-which(R1.data$RISHCAP=="TRUE")
Hcaps<-R1.data[z,]

# summarise winners for patterns/hcaps by sire ID in total and by trainer category
#
PatternWinsSireIDTmp<-tapply(Patterns$WINSIREID,Patterns$WINSIREID,mean)
PatternWinsSireTmp<-tapply(Patterns$WINSIREID,Patterns$WINSIREID,length)
PatternWinsSireOvsTmp<-tapply(Patterns$ovs,Patterns$WINSIREID,sum)
PatternWinsSireTinyTmp<-tapply(Patterns$tiny,Patterns$WINSIREID,sum)
PatternWinsSireSmlTmp<-tapply(Patterns$sml,Patterns$WINSIREID,sum)
PatternWinsSireMedTmp<-tapply(Patterns$med,Patterns$WINSIREID,sum)
PatternWinsSireLgeTmp<-tapply(Patterns$lge,Patterns$WINSIREID,sum)

HcapWinsSireIDTmp<-tapply(Hcaps$WINSIREID,Hcaps$WINSIREID,mean)
HcapWinsSireTmp<-tapply(Hcaps$WINSIREID,Hcaps$WINSIREID,length)
HcapWinsSireOvsTmp<-tapply(Hcaps$ovs,Hcaps$WINSIREID,sum)
HcapWinsSireTinyTmp<-tapply(Hcaps$tiny,Hcaps$WINSIREID,sum)
HcapWinsSireSmlTmp<-tapply(Hcaps$sml,Hcaps$WINSIREID,sum)
HcapWinsSireMedTmp<-tapply(Hcaps$med,Hcaps$WINSIREID,sum)
HcapWinsSireLgeTmp<-tapply(Hcaps$lge,Hcaps$WINSIREID,sum)

# summarise winners for patterns/hcaps by trainer size
PatternWinsTrainers<-tapply(Patterns$trainersize,Patterns$trainersize,length)
HcapWinsTrainers<-tapply(Hcaps$trainersize,Hcaps$trainersize,length)

#########################
#
# find out number of runs by category for each race type
#
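z<-match(T1.data$OTRAINERID,utrainer)
T1.data$trainername<-utrainername[z]
T1.data$trainersize<-utrainersize[z]
T1.data$trainerhome<-utrainerhome[z]
# horse age is a per-horse attribute, so look it up by horse ID rather than by trainer
zh<-match(T1.data$OHORSEID,uhorse)
T1.data$horseage<-uhorseage[zh]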

# set up 1/0 values for aggregation later by trainer category
# %in% returns FALSE for NA, so outings with no size category get 0
T1.data$ovs<-as.numeric(T1.data$trainersize %in% "OVS")
T1.data$tiny<-as.numeric(T1.data$trainersize %in% "TINY")
T1.data$sml<-as.numeric(T1.data$trainersize %in% "SMALL")
T1.data$med<-as.numeric(T1.data$trainersize %in% "MEDIUM")
T1.data$lge<-as.numeric(T1.data$trainersize %in% "LARGE")

# bring in race types

z<-match(T1.data$ORACEID,R1.data$RID)
T1.data$RPATTERN<-R1.data$RPATTERN[z]
T1.data$RISHCAP<-R1.data$RISHCAP[z]

# match each horse into the RIhorse data frame to get the sire id
z<-match(T1.data$OHORSEID,RIhorse.data$HID)
T1.data$SIREID<-RIhorse.data$HSIREID[z]

# produce pattern-only and handicap-only outing data frames
#
z<-which(T1.data$RPATTERN != "NOT" & T1.data$RISHCAP=="FALSE")
OutPatterns<-T1.data[z,]

z<-which(T1.data$RISHCAP=="TRUE")
OutHcaps<-T1.data[z,]

# summarise runners for patterns/hcaps by trainer size
PatternRunsTrainers<-tapply(OutPatterns$trainersize,OutPatterns$trainersize,length)
HcapRunsTrainers<-tapply(OutHcaps$trainersize,OutHcaps$trainersize,length)

# summarise runners for patterns/hcaps by sire ID
PatternRunsSireID<-tapply(OutPatterns$SIREID,OutPatterns$SIREID,mean)
PatternRunsSire<-tapply(OutPatterns$SIREID,OutPatterns$SIREID,length)
PatternRunsSireOvs<-tapply(OutPatterns$ovs,OutPatterns$SIREID,sum)
PatternRunsSireTiny<-tapply(OutPatterns$tiny,OutPatterns$SIREID,sum)
PatternRunsSireSml<-tapply(OutPatterns$sml,OutPatterns$SIREID,sum)
PatternRunsSireMed<-tapply(OutPatterns$med,OutPatterns$SIREID,sum)
PatternRunsSireLge<-tapply(OutPatterns$lge,OutPatterns$SIREID,sum)

HcapRunsSireID<-tapply(OutHcaps$SIREID,OutHcaps$SIREID,mean)
HcapRunsSire<-tapply(OutHcaps$SIREID,OutHcaps$SIREID,length)
HcapRunsSireOvs<-tapply(OutHcaps$ovs,OutHcaps$SIREID,sum)
HcapRunsSireTiny<-tapply(OutHcaps$tiny,OutHcaps$SIREID,sum)
HcapRunsSireSml<-tapply(OutHcaps$sml,OutHcaps$SIREID,sum)
HcapRunsSireMed<-tapply(OutHcaps$med,OutHcaps$SIREID,sum)
HcapRunsSireLge<-tapply(OutHcaps$lge,OutHcaps$SIREID,sum)

# calc the average age of the horses run by each trainer category in handicaps
#
HcapAge.ovs<-sum(OutHcaps$ovs*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$ovs),na.rm=TRUE)
HcapAge.tiny<-sum(OutHcaps$tiny*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$tiny),na.rm=TRUE)
HcapAge.sml<-sum(OutHcaps$sml*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$sml),na.rm=TRUE)
HcapAge.med<-sum(OutHcaps$med*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$med),na.rm=TRUE)
HcapAge.lge<-sum(OutHcaps$lge*as.numeric(OutHcaps$horseage),na.rm=TRUE)/sum(as.numeric(OutHcaps$lge),na.rm=TRUE)
#
PtrnAge.ovs<-sum(OutPatterns$ovs*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$ovs),na.rm=TRUE)
PtrnAge.tiny<-sum(OutPatterns$tiny*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$tiny),na.rm=TRUE)
PtrnAge.sml<-sum(OutPatterns$sml*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$sml),na.rm=TRUE)
PtrnAge.med<-sum(OutPatterns$med*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$med),na.rm=TRUE)
PtrnAge.lge<-sum(OutPatterns$lge*as.numeric(OutPatterns$horseage),na.rm=TRUE)/sum(as.numeric(OutPatterns$lge),na.rm=TRUE)

 
# the winners tapply only covers sires with at least one winner, so match back against the runner sire IDs to line winners up with runners
z<-match(PatternRunsSireID,PatternWinsSireIDTmp)
PatternWinsSire<-PatternWinsSireTmp[z]
PatternWinsSireOvs<-PatternWinsSireOvsTmp[z]
PatternWinsSireTiny<-PatternWinsSireTinyTmp[z]
PatternWinsSireSml<-PatternWinsSireSmlTmp[z]
PatternWinsSireMed<-PatternWinsSireMedTmp[z]
PatternWinsSireLge<-PatternWinsSireLgeTmp[z]

z<-which(is.na(PatternWinsSire))
PatternWinsSire[z]<-0
z<-which(is.na(PatternWinsSireOvs))
PatternWinsSireOvs[z]<-0
z<-which(is.na(PatternWinsSireTiny))
PatternWinsSireTiny[z]<-0
z<-which(is.na(PatternWinsSireSml))
PatternWinsSireSml[z]<-0
z<-which(is.na(PatternWinsSireMed))
PatternWinsSireMed[z]<-0
z<-which(is.na(PatternWinsSireLge))
PatternWinsSireLge[z]<-0
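#
# Illustrative aside (not part of the original script): a toy version of the
# align-then-zero-fill step above. match() gives NA for sires that have runners
# but no winners, and those NAs become 0. The demo* names and values are made up.
demoRunIDs<-c(101,102,103)       # hypothetical sires with runners
demoWinIDs<-c(103,101)           # hypothetical sires with winners
demoWinCounts<-c(2,5)            # winners for sires 103 and 101 respectively
demoAligned<-demoWinCounts[match(demoRunIDs,demoWinIDs)]
demoAligned[is.na(demoAligned)]<-0   # sire 102 had runners but no winners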

# repeat for handicaps

z<-match(HcapRunsSireID,HcapWinsSireIDTmp)
HcapWinsSire<-HcapWinsSireTmp[z]
HcapWinsSireOvs<-HcapWinsSireOvsTmp[z]
HcapWinsSireTiny<-HcapWinsSireTinyTmp[z]
HcapWinsSireSml<-HcapWinsSireSmlTmp[z]
HcapWinsSireMed<-HcapWinsSireMedTmp[z]
HcapWinsSireLge<-HcapWinsSireLgeTmp[z]

z<-which(is.na(HcapWinsSire))
HcapWinsSire[z]<-0
z<-which(is.na(HcapWinsSireOvs))
HcapWinsSireOvs[z]<-0
z<-which(is.na(HcapWinsSireTiny))
HcapWinsSireTiny[z]<-0
z<-which(is.na(HcapWinsSireSml))
HcapWinsSireSml[z]<-0
z<-which(is.na(HcapWinsSireMed))
HcapWinsSireMed[z]<-0
z<-which(is.na(HcapWinsSireLge))
HcapWinsSireLge[z]<-0

# calc IVs per sire
PatternIVsire<-(PatternWinsSire/sum(PatternWinsSire))/(PatternRunsSire/sum(PatternRunsSire))
HcapIVsire<-(HcapWinsSire/sum(HcapWinsSire))/(HcapRunsSire/sum(HcapRunsSire))
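#
# Illustrative aside (not part of the original analysis): a toy check of the
# impact value (IV) formula used above, on made-up numbers. An IV of 1 means a
# sire's share of winners equals its share of runners; above 1 means the sire
# wins more often than its number of runners alone would suggest.
demoWins<-c(A=6,B=3,C=1)         # hypothetical winners per sire
demoRuns<-c(A=40,B=30,C=30)      # hypothetical runners per sire
demoIV<-(demoWins/sum(demoWins))/(demoRuns/sum(demoRuns))
# A: (6/10)/(40/100) = 1.5   B: (3/10)/(30/100) = 1   C: (1/10)/(30/100) = 1/3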

# calc IV adjusted runners by sire
#
PatternRunsSire.IV<- PatternRunsSire*PatternIVsire
PatternRunsSireOvs.IV<- PatternRunsSireOvs*PatternIVsire
PatternRunsSireTiny.IV<- PatternRunsSireTiny*PatternIVsire
PatternRunsSireSml.IV<- PatternRunsSireSml*PatternIVsire
PatternRunsSireMed.IV<- PatternRunsSireMed*PatternIVsire
PatternRunsSireLge.IV<- PatternRunsSireLge*PatternIVsire

HcapRunsSire.IV<- HcapRunsSire*HcapIVsire
HcapRunsSireOvs.IV<- HcapRunsSireOvs*HcapIVsire
HcapRunsSireTiny.IV<- HcapRunsSireTiny*HcapIVsire
HcapRunsSireSml.IV<- HcapRunsSireSml*HcapIVsire
HcapRunsSireMed.IV<- HcapRunsSireMed*HcapIVsire
HcapRunsSireLge.IV<- HcapRunsSireLge*HcapIVsire