The journey started with a spark of inspiration from Twitter. When I came across Dylan Drummey’s post about bat keypoint tracking with MLB video, I immediately recognized its potential application at Driveline Baseball. That same day, I dove headfirst into the project by starting to annotate 1,000 side-view hitting images.
This article was written by Clayton Thompson and Adam Bloebaum in the Driveline R&D department.
Understanding Annotation and Model Training
Annotation is a crucial first step in training computer vision models. Think of it as teaching a child to recognize objects: you first need to show them many examples while pointing out what they’re looking at. In our case, we were teaching our model to recognize two specific points on a baseball bat: the cap (end) and the knob (handle). This process involved manually marking these points on thousands of images, creating a dataset of precise pixel coordinates for each keypoint.

This annotation process is fundamental to pose estimation models, which differ from traditional object detection models. While object detection tells you where an object is in an image (like finding a bat), pose estimation goes further by identifying specific points on that object (like pinpointing exactly where the cap and knob are). By providing thousands of correctly labeled keypoint examples, we’re essentially creating a training manual from which the model learns the visual patterns that indicate these specific locations.
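To make the idea of a keypoint annotation concrete, here is a minimal sketch of turning one annotated image into a single text label. The article does not say which label format was used; this assumes a YOLO-style pose layout (class index, normalized bounding box, then normalized keypoint x/y pairs). The function name `to_pose_label`, the `pad` margin, and the single class index are hypothetical.

```python
def to_pose_label(cap, knob, img_w, img_h, pad=10, class_id=0):
    """Build one YOLO-style pose label line from two pixel keypoints.

    cap, knob: (x, y) pixel coordinates of the annotated bat keypoints.
    pad: hypothetical margin (pixels) added around the keypoints to form a box.
    """
    xs = [cap[0], knob[0]]
    ys = [cap[1], knob[1]]
    # Bounding box that encloses both keypoints, padded and clipped to the image
    x0 = max(min(xs) - pad, 0)
    y0 = max(min(ys) - pad, 0)
    x1 = min(max(xs) + pad, img_w)
    y1 = min(max(ys) + pad, img_h)
    # Box centre and size, normalized to the 0-1 range
    box = [(x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h,
           (x1 - x0) / img_w, (y1 - y0) / img_h]
    # Keypoints normalized the same way: cap first, then knob
    kpts = [cap[0] / img_w, cap[1] / img_h, knob[0] / img_w, knob[1] / img_h]
    return " ".join([str(class_id)] + [f"{v:.6f}" for v in box + kpts])


# Example: cap at (100, 200) and knob at (300, 400) in a 1000 x 1000 image
print(to_pose_label((100, 200), (300, 400), 1000, 1000, pad=0))
```

An object detection label would stop after the four box values; the trailing keypoint pairs are what make it a pose label.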


Model Architecture
While we had previous experience using YOLOv8 for computer vision tasks at Driveline Baseball, this project required a different approach. We needed a custom pose estimation model specifically designed to track bat keypoints. This was our first venture into training a custom pose model, and it came with its own set of challenges.


The initial results were promising: our model could track the bat’s movement relatively well. However, our ultimate goal was more ambitious. We wanted to estimate bat speed from simple side-view video. This meant we needed to build a comprehensive dataset pairing side-view video swings with our marker-based motion capture data to provide ground truth velocities for training.


Data Collection Pipeline
Our first major undertaking was setting up a robust data collection system. We repurposed a 2014 Edgertronic SC1 camera, synchronizing it with our lab setup to automatically trigger with every swing in the mocap lab. The real challenge wasn’t just collecting the data; it was ensuring perfect pairing between our mocap data and side-view videos. We spent nearly a month developing sophisticated pipelines to automatically match and verify this data each night.
Processing and Point of Contact Detection
With our data collection system in place, we faced our next challenge: identifying the key frames in each swing. While we had the pixel positions of the cap and knob, we needed a reliable way to identify the point of contact. Our initial approach of using cap velocity to detect the start of the swing proved too inconsistent, so we shifted our focus to finding the point of contact.
This led us to develop another model, this time for ball detection. We manually annotated roughly 10,000 images, but we worked smarter by implementing semi-automatic annotation. Our existing model could predict ball locations, which we used to assist in drawing detection bounding boxes. This significantly accelerated our annotation process while maintaining accuracy.


Identifying the exact point of contact required careful consideration. In a right-side view video, the ball typically moves right to left through the frame. We initially tried to identify the point of contact by detecting the first frame where the ball changed direction, but this approach came with significant limitations.
Our initial method required the ball to first appear in the right third of the frame and show movement in the next frame. This strict requirement, combined with our model’s tendency to generate false positives, created reliability issues. We needed a robust method to filter out outlier detections while maintaining the ability to track the ball’s full trajectory.
The handedness of the batter introduced additional complexity. Since our algorithm expected right-to-left ball movement, we had to flip all left-handed hitter videos to appear right-handed. This wasn’t just about ball tracking: we wanted our positional keypoints for left-handed swings to mirror and match our right-handed data for consistency in our analysis. For example, all of Adam’s test swings had to be flipped to maintain this standardization.
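Flipping a left-handed swing amounts to mirroring every frame, and every keypoint x-coordinate, about the vertical centre line of the image. The following is a minimal sketch of that idea, not the pipeline’s actual code; the function names are hypothetical, and pixel columns are assumed to be indexed from 0 to `frame_width - 1`.

```python
import numpy as np


def mirror_x(x, frame_width):
    """Mirror pixel x-coordinates about the vertical centre line of the frame."""
    return (frame_width - 1) - np.asarray(x, dtype=float)


def flip_frame(frame):
    """Horizontally flip an image array shaped (height, width) or (height, width, channels)."""
    return frame[:, ::-1]


# A keypoint at column 10 of a 1280-pixel-wide frame lands at column 1269 after flipping
print(mirror_x([0, 10], 1280))
```

Applying the same mirror to both the video and the keypoints keeps left-handed and right-handed swings in one consistent coordinate convention.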


Our solution evolved to use both bat and ball positions, finding the frame where the ball was closest to the bat’s sweet spot (defined as 1/3 of the distance from the cap to the knob). We ultimately chose to use the frame just before actual contact as our reference point.
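The closest-approach search can be sketched in a few lines. This is an illustration under stated assumptions rather than the production algorithm: keypoints and ball centres are per-frame pixel arrays, frames with no ball detection are stored as NaN, and “the frame just before contact” is taken to be the frame preceding the closest approach. The function names are hypothetical.

```python
import numpy as np


def sweet_spot(cap, knob, fraction=1 / 3):
    """Point located `fraction` of the way from the cap to the knob, per frame."""
    cap = np.asarray(cap, dtype=float)
    knob = np.asarray(knob, dtype=float)
    return cap + fraction * (knob - cap)


def contact_frames(cap, knob, ball):
    """Return (closest_frame, reference_frame) for arrays shaped (n_frames, 2).

    closest_frame: frame where the ball is nearest the sweet spot.
    reference_frame: the frame before it (an assumed reading of
    "the frame just before actual contact").
    """
    ball = np.asarray(ball, dtype=float)
    dist = np.linalg.norm(ball - sweet_spot(cap, knob), axis=1)
    closest = int(np.nanargmin(dist))  # NaN rows (no detection) are ignored
    return closest, max(closest - 1, 0)
```

With a bat whose cap sits at (0, 0) and knob at (30, 0), the sweet spot is (10, 0), so a ball at (12, 0) would register as the closest approach.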


The Scaling Factor Challenge
One of our most significant challenges came from an unexpected source: perspective. Different hitters naturally stand at varying distances from the camera, which created a fundamental problem in our 2D to 3D coordinate conversion. Our initial attempt to solve this through position normalization proved to be a critical mistake, one that had far-reaching implications for our bat speed predictions.
The normalization approach essentially forced all swings into the same positional constraints, regardless of their actual spatial relationships. This was like trying to measure the height of buildings by making all their photos the same size: you lose the crucial information about their true scale. But the problem went deeper than just spatial relationships. By normalizing the positions, we were inadvertently altering the fundamental physics of each swing.
# df holds per-frame keypoint pixel coordinates in columns x1, y1, x2, y2
# Get the overall range for x and y coordinates
x_min = min(df['x1'].min(), df['x2'].min())
y_min = min(df['y1'].min(), df['y2'].min())
x_max = max(df['x1'].max(), df['x2'].max())
y_max = max(df['y1'].max(), df['y2'].max())
# Normalize all coordinates using the same range
df['x1_norm'] = (df['x1'] - x_min) / (x_max - x_min)
df['x2_norm'] = (df['x2'] - x_min) / (x_max - x_min)
df['y1_norm'] = (df['y1'] - y_min) / (y_max - y_min)
df['y2_norm'] = (df['y2'] - y_min) / (y_max - y_min)
In baseball, longer swings provide more time for acceleration, allowing the bat to potentially achieve higher speeds. When we normalized our position data, we were effectively compressing or expanding these swing paths artificially. A long, powerful swing taken from far away might be compressed to look like a shorter swing, while a compact swing taken close to the camera might be stretched out. This normalization was essentially erasing the natural relationship between swing path length and potential bat speed, making our velocity predictions unreliable. As a result, our model performed poorly.


We first tried to solve this using monocular depth modeling, a technique that tries to estimate depth from a single camera view. While the approach showed promise in extracting depth information from our 2D videos, it proved too computationally intensive and inconsistent for real-time analysis.


The arbitrary nature of depth predictions from a single viewpoint made it difficult to establish reliable measurements across different swings and batting stances. For each frame, a neural network generated a depth map, which we then used to read off the depth at each keypoint.


The breakthrough came when we realized we could use the bat itself as a reference object. Since we know the actual length of a bat, we could calculate a scaling factor based on its apparent size in the video:
scaling_factor = actual_bat_length / max_apparent_bat_length
This elegant solution provided a proxy for depth, as a bat appears smaller when farther from the camera and larger when closer. Initially, we considered using the apparent bat length at point of contact to determine our scaling factor. Let’s look at some examples that illustrate both the promise and limitations of this approach:
Looking at two left-handed hitters, we see similar apparent bat lengths around point of contact, suggesting this could be a reliable depth proxy.




When examining right-handed hitters, we found consistently larger apparent bat lengths, exactly what we’d expect since they stand closer to the camera. This validated our basic premise: apparent bat length could indeed indicate relative depth. However, we also observed nearly 20-pixel differences between right-handed swings at similar distances.




While apparent bat length can indicate depth, it’s significantly affected by vertical bat angle. This led us to modify our approach. Instead of using the bat length at point of contact, we decided to use the maximum apparent bat length throughout the entire swing.
import numpy as np

# Euclidean distance between the two keypoints gives the apparent bat length in each frame
np_bat_length = np.sqrt((df['x2'] - df['x1'])**2 + (df['y2'] - df['y1'])**2)
# Maximum apparent bat length over the whole swing
max_appearing_bat_length = np_bat_length.max()
# Scaling factor: known bat length divided by the maximum apparent length
scaling_factor = bat_length / max_appearing_bat_length
This proved more robust because it reduced the impact of vertical bat angle variations. We don’t need to know the exact distance from the camera; we just need a reliable way to tell if one hitter is generally closer or farther away than another.
With this refined approach, we applied the scaling factor to our positional data:
x_scaled = pos_x * scaling_factor
This method dramatically improved our model’s accuracy by preserving both true spatial relationships and natural swing dynamics. The beauty of this solution lies in its simplicity: we’re using the bat itself to tell us roughly how far away the hitter is standing.
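A short sketch shows why this works as a depth proxy. It assumes a pandas DataFrame with per-frame keypoint columns `x1`, `y1`, `x2`, `y2` (one bat end each) and a known bat length in real-world units; the function name `scale_positions` and the sample numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def scale_positions(df, bat_length):
    """Scale pixel keypoints using the maximum apparent bat length as a ruler.

    Returns the scaled coordinates and the scaling factor that was applied.
    """
    apparent = np.hypot(df["x2"] - df["x1"], df["y2"] - df["y1"])
    factor = bat_length / apparent.max()
    return df[["x1", "y1", "x2", "y2"]] * factor, factor


# Two synthetic "hitters" making the same swing, one twice as far from the camera
near = pd.DataFrame({"x1": [0, 0], "y1": [0, 0], "x2": [300, 400], "y2": [0, 0]})
far = near * 0.5  # everything appears half as large in pixels

scaled_near, f_near = scale_positions(near, bat_length=0.84)
scaled_far, f_far = scale_positions(far, bat_length=0.84)
print(f_near, f_far)
```

Unlike min-max normalization, which forces every swing into the same 0 to 1 box, this applies one real-world ruler per swing, so the two hitters end up with matching scaled coordinates while a genuinely longer swing path stays longer.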
Data Smoothing and Signal Processing
The Butterworth filter is particularly favored in biomechanics because of its “maximally flat” frequency response, meaning it minimizes distortion of the true signal within its passband. This characteristic makes it ideal for human movement analysis, where maintaining the integrity of the movement pattern while removing sensor noise is crucial. In traditional biomechanics applications like gait analysis or pitching mechanics, these filters effectively separate the relatively low-frequency human movement from high-frequency noise.
However, bat swing analysis presented a unique challenge that made traditional biomechanics filtering approaches less suitable. Bat speed can change extremely rapidly during a swing, with the fastest acceleration occurring just before contact. These rapid changes in velocity create high-frequency components in our signal that are actually meaningful data, not noise.


When we tried to apply a fourth-order Butterworth filter, we found it was smoothing out these crucial high-frequency components, effectively erasing the most important part of our signal: the rapid acceleration phase of the swing. Even reducing to a second-order filter didn’t solve the problem; we were still losing critical information about the swing’s peak velocity phase.
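The effect is easy to reproduce on synthetic data. The sketch below uses SciPy’s `butter` and `filtfilt` on a made-up speed trace with a sharp peak. The frame rate, cutoff frequency, and signal shape are all assumptions chosen for illustration; none of them come from the article.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0    # assumed sampling rate in Hz (hypothetical)
cutoff = 10.0  # assumed low-pass cutoff in Hz (hypothetical)

t = np.arange(0, 0.5, 1 / fs)
# Synthetic "bat speed" trace: a narrow peak near contact (not real swing data)
raw = 70.0 * np.exp(-((t - 0.3) / 0.01) ** 2)


def lowpass(signal, order):
    """Zero-phase Butterworth low-pass filter of the given order."""
    b, a = butter(order, cutoff, btype="low", fs=fs)
    return filtfilt(b, a, signal)


peak_4th = lowpass(raw, 4).max()
peak_2nd = lowpass(raw, 2).max()
print(f"raw peak {raw.max():.1f}, 4th order {peak_4th:.1f}, 2nd order {peak_2nd:.1f}")
```

Because the peak itself is made of high-frequency content, any cutoff low enough to remove jitter also flattens the peak, and lowering the filter order does not rescue it.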
The Swingathon and Model Refinement
In December 2024, we recognized that our model, while performing well for bat speeds between 50-75 mph, needed improvement outside this range. This led to an intensive data collection effort we dubbed “The Swingathon”: a 15-hour session capturing 900 swings with various bat types and swing characteristics.


These tee-based swings required us to modify our point of contact detection algorithm. We developed a method that analyzed the ball’s pixel position changes (delta) frame by frame. One significant advantage of this approach was its independence from swing direction: the algorithm worked equally well for both right- and left-handed hitters without requiring video flipping. A stationary ball on the tee would have a delta of zero, making the initial state easy to identify.
When the bat made contact, it would cause a large change in the ball’s delta, giving us a rough window of frames where contact likely occurred.


However, identifying the exact point of contact required more precision. We used a two-step process: first, the change in delta helped us identify a narrow range of potential contact frames. Then, within this range, we performed a frame-by-frame analysis of the distance between the ball and the bat’s sweet spot to pinpoint the exact frame of contact. This hybrid approach combined the efficiency of delta-based detection with the precision of spatial analysis.
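The two-step process can be sketched as follows. This is an illustration, not the production algorithm: the movement threshold `delta_thresh`, the look-back `window`, and the function name are hypothetical, and every frame is assumed to have a ball detection.

```python
import numpy as np


def tee_contact_frame(ball, cap, knob, delta_thresh=2.0, window=3):
    """Estimate the contact frame for a swing off a tee.

    ball, cap, knob: arrays shaped (n_frames, 2) of pixel coordinates.
    Returns a frame index, or None if the ball never moves.
    """
    ball = np.asarray(ball, dtype=float)
    cap = np.asarray(cap, dtype=float)
    knob = np.asarray(knob, dtype=float)

    # Step 1: frame-to-frame ball displacement; about zero while the ball sits on the tee
    delta = np.linalg.norm(np.diff(ball, axis=0), axis=1)
    moving = np.flatnonzero(delta > delta_thresh)
    if moving.size == 0:
        return None
    first_move = int(moving[0]) + 1  # first frame in which the ball has moved
    lo = max(first_move - window, 0)
    hi = first_move + 1              # candidate frames are lo .. first_move

    # Step 2: within the window, pick the frame where the ball is nearest the sweet spot
    sweet = cap + (knob - cap) / 3.0
    dist = np.linalg.norm(ball[lo:hi] - sweet[lo:hi], axis=1)
    return lo + int(np.argmin(dist))
```

Because step 1 only looks at the magnitude of the ball’s displacement, it does not care which way the ball travels, which is why no video flipping is needed.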
Keypoint Confidences
A pose estimation model generates two types of confidence scores: object detection confidence (how certain it is about identifying the object and its bounding box) and keypoint confidence (how certain it is about the specific keypoint locations). Initially, we were filtering our predictions based solely on object detection confidence, requiring a threshold of 0.85 or higher. This approach, while seemingly conservative, was actually causing us to discard potentially valuable data.


Another breakthrough came when we began investigating the relationship between these two confidence types. Our analysis revealed something surprising: even in frames where object detection confidence was low (often due to motion blur), the model was frequently making accurate keypoint predictions. This insight led us to launch a comprehensive investigation into keypoint confidence reliability.
We manually annotated 3,200 images, focusing particularly on cases where our original pipeline had struggled (object detection < 0.8). These were frames that our previous approach would have discarded and interpolated. For each image, we compared the ground truth pixel coordinates with our model’s predictions, analyzing both the miss distances and the model’s reported confidence in each prediction.


The results were revealing. Our model’s keypoint predictions were remarkably accurate, with maximum miss distances of only 14-15 pixels even in challenging cases. After filtering obvious outliers (confidence < 0.99), the statistical analysis showed impressive precision:
Knob Keypoint Statistics
| Statistic | Confidence | Miss Distance (pixels) |
|---|---|---|
| Count | 3236 | 3236 |
| Mean | 0.999590 | 2.6262 |
| Standard Deviation | 0.000465 | 1.9098 |
| Minimum | 0.992733 | 0.0413 |
| 25th Percentile | 0.999581 | 1.3077 |
| Median (50th Percentile) | 0.999720 | 2.1717 |
| 75th Percentile | 0.999807 | 3.3687 |
| Maximum | 0.999948 | 14.7283 |
Cap Keypoint Statistics
| Statistic | Confidence | Miss Distance (pixels) |
|---|---|---|
| Count | 3193 | 3193 |
| Mean | 0.999610 | 2.4354 |
| Standard Deviation | 0.000531 | 1.9019 |
| Minimum | 0.990198 | 0.0345 |
| 25th Percentile | 0.999588 | 1.2188 |
| Median (50th Percentile) | 0.999742 | 2.0070 |
| 75th Percentile | 0.999820 | 3.0375 |
| Maximum | 0.999949 | 15.6513 |
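Summaries like the two tables above can be produced from paired annotations and predictions with a few lines of pandas. This is a sketch of one way to do it, assuming per-image ground truth and predicted pixel coordinates plus the model’s keypoint confidence; the function and column names are hypothetical, and the sample values are made up.

```python
import numpy as np
import pandas as pd


def miss_stats(truth_xy, pred_xy, conf, min_conf=0.99):
    """Summarize keypoint confidence and miss distance (pixels).

    truth_xy, pred_xy: arrays shaped (n_images, 2); conf: array shaped (n_images,).
    Rows with confidence below `min_conf` are dropped as outliers first.
    """
    truth_xy = np.asarray(truth_xy, dtype=float)
    pred_xy = np.asarray(pred_xy, dtype=float)
    df = pd.DataFrame({
        "confidence": np.asarray(conf, dtype=float),
        "miss_distance_px": np.linalg.norm(pred_xy - truth_xy, axis=1),
    })
    # describe() reports count, mean, std, min, quartiles, and max
    return df[df["confidence"] >= min_conf].describe()


stats = miss_stats(
    truth_xy=[[0, 0], [0, 0], [0, 0]],
    pred_xy=[[3, 4], [0, 0], [6, 8]],
    conf=[0.999, 0.5, 0.9999],
)
print(stats)
```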
These findings changed our approach to swing analysis:
- Independent Keypoint Evaluation
  - Previously, if either the cap or knob confidence was low, we’d discard the entire frame. Our statistical analysis showed that often one keypoint would maintain high confidence even when the other struggled. By evaluating keypoints independently, we could now keep the high-confidence keypoint and only interpolate the other. This immediately reduced our data loss by roughly 50% in problematic frames.
- Selective Interpolation
  - Before this analysis, we were interpolating an average of 10 frames per video due to poor object detection confidence. This not only created potential inaccuracies but also required significant computational overhead. With our new keypoint-based approach, we virtually eliminated the need for interpolation. When interpolation is needed, it’s now targeted to specific keypoints rather than entire frames, preserving more of the original data.
- Salvaging “Poor” Bounding Box Predictions
  - Perhaps most significantly, we discovered that a poor bounding box confidence didn’t necessarily indicate poor keypoint detection. In frames with motion blur, the model might struggle to define the exact boundaries of the bat (leading to low box confidence) while still accurately identifying the cap and knob positions. This insight allowed us to keep many frames we previously would have discarded, particularly during the crucial high-speed portions of swings.
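The per-keypoint logic described in the list above can be sketched as a small helper that is run once for the cap and once for the knob. The confidence threshold and the use of linear interpolation are assumptions for illustration; the article does not state which interpolation method the pipeline uses, and the function name is hypothetical.

```python
import numpy as np


def fill_low_confidence(coords, conf, threshold=0.99):
    """Interpolate one keypoint only in frames where its own confidence is low.

    coords: (n_frames, 2) pixel coordinates for a single keypoint.
    conf: (n_frames,) keypoint confidence for that same keypoint.
    Returns the repaired coordinates and a boolean mask of trusted frames.
    """
    coords = np.asarray(coords, dtype=float).copy()
    conf = np.asarray(conf, dtype=float)
    good = conf >= threshold
    if good.sum() < 2:
        return coords, good  # too few trusted frames to interpolate between
    frames = np.arange(len(coords))
    for axis in range(coords.shape[1]):
        # Linear interpolation between the nearest trusted frames (assumed method)
        coords[~good, axis] = np.interp(frames[~good], frames[good], coords[good, axis])
    return coords, good


# The cap is unreliable in frame 1; the knob would be handled separately and left untouched
cap_fixed, cap_trusted = fill_low_confidence(
    coords=[[0, 0], [999, 999], [20, 20]],
    conf=[0.9999, 0.50, 0.9998],
)
print(cap_fixed[1], cap_trusted)
```

Because each keypoint is checked on its own, a frame with a blurry cap but a clear knob keeps its original knob position instead of being thrown away entirely.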
These improvements dramatically reduced the need for additional model training. Previously, we would have concluded that more annotation was needed to improve performance in motion-blurred frames. Instead, by better utilizing our existing predictions, we’ve created a more robust system while actually reducing computational overhead.
Real-Time Processing Pipeline
A critical requirement of our system was the ability to generate bat speed predictions quickly enough to be useful for our same-day motion capture reports. To achieve this, we developed an automated processing pipeline that starts the moment a swing is recorded.
We implemented a watchdog script that continuously monitors our camera system for new videos. As soon as our side-view camera captures a swing, the script automatically identifies the new file and immediately begins processing. This automation eliminates any manual steps between video capture and analysis, allowing us to generate bat speed predictions within 10 seconds of the swing.
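A folder watcher of this kind can be sketched with nothing but the standard library. The article does not describe how its watchdog script is implemented, so this is a simple polling loop under assumed details: videos land in one folder, they share a known file suffix, and `process` is whatever function runs the analysis. All names here are hypothetical.

```python
import time
from pathlib import Path


def find_new_videos(folder, seen, suffix=".mov"):
    """Return video files in `folder` that have not been seen yet, and remember them."""
    new = sorted(p for p in Path(folder).glob(f"*{suffix}") if p not in seen)
    seen.update(new)
    return new


def watch(folder, process, poll_seconds=1.0, max_polls=None):
    """Poll `folder` and call `process(path)` once for each new video.

    max_polls=None runs forever; a number limits the loop (useful for testing).
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for video in find_new_videos(folder, seen):
            process(video)
        polls += 1
        time.sleep(poll_seconds)


# Example usage (blocks until interrupted):
# watch("/path/to/camera/output", process=print)
```

A real deployment would also need to confirm that the camera has finished writing a file before processing it, for example by waiting until its size stops changing.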


This rapid processing capability was essential for efficiently pairing our predictions with the corresponding mocap data and maintaining an organized database of swing analyses. The quick turnaround time ensures that coaches and players can access swing data immediately following their motion capture session in the Launchpad.
Engineering Precision through LOESS Refinement
Every measurement system requires calibration, and our bat speed prediction model was no exception. As we analyzed our results, we discovered systematic differences between our predictions and actual measurements, differences that followed subtle but consistent patterns across different speeds and angles.
To address these patterns, we implemented a LOESS (Locally Estimated Scatterplot Smoothing) calibration system. This statistical approach allowed us to map the relationship between our initial predictions and actual measurements with remarkable precision. By applying this calibration curve to our predictions, we could automatically adjust for any systematic differences in our model’s output.
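To show what a LOESS calibration curve does, here is a minimal NumPy sketch of locally weighted linear regression with tricube weights, one common form of LOESS. The span (`frac`), the local-linear degree, and the synthetic data are assumptions for illustration; the article does not give the settings or implementation used.

```python
import numpy as np


def loess_predict(x, y, x_new, frac=0.5):
    """Evaluate a basic LOESS curve (local linear fits, tricube weights) at x_new.

    x: raw model predictions; y: matching ground truth measurements.
    frac: share of the data used in each local fit (assumed value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    k = min(max(int(np.ceil(frac * len(x))), 2), len(x))  # points per local fit
    fitted = np.empty(len(x_new))
    for i, x0 in enumerate(x_new):
        dist = np.abs(x - x0)
        bandwidth = max(np.sort(dist)[k - 1], 1e-12)  # distance to the k-th nearest point
        weights = np.clip(1.0 - (dist / bandwidth) ** 3, 0.0, None) ** 3
        # Weighted least squares for a local line centred on x0
        design = np.column_stack([np.ones_like(x), x - x0])
        weighted = design * weights[:, None]
        coef = np.linalg.lstsq(weighted.T @ design, weighted.T @ y, rcond=None)[0]
        fitted[i] = coef[0]  # intercept is the fitted value at x0
    return fitted


# Synthetic calibration data (not real measurements): ground truth = 1.1 * raw - 4
raw_speed = np.linspace(40, 90, 51)
true_speed = 1.1 * raw_speed - 4
print(loess_predict(raw_speed, true_speed, [60.0, 70.0, 80.0]))
```

Each new prediction is adjusted using only nearby calibration points, which lets the correction differ between, say, 50 mph and 80 mph swings instead of applying one global offset.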


The development of this calibration system brought its own engineering challenges. We needed to determine the optimal allocation of our dataset between training, validation, and testing phases. Through careful analysis, we discovered that we could use our real-time predictions as a natural test set, allowing us to dedicate more data to developing our calibration curve without compromising our model’s base performance.




The results exceeded our expectations. Our calibrated system now produces highly accurate measurements across the full range of bat speeds and attack angles. For example, if our system initially predicts a bat speed of 70 mph, our calibration adjusts it to 73.09 mph, bringing it in line with our motion capture measurements. Similarly, attack angle predictions are refined with impressive precision: a raw prediction of 8 degrees is adjusted to 8.99 degrees, matching our ground truth data.
Adjusted Bat Speeds (mph)
50.00 -> 46.64
55.00 -> 51.68
60.00 -> 56.65
65.00 -> 64.32
70.00 -> 73.09
75.00 -> 79.03
80.00 -> 84.95
Adjusted Attack Angles (degrees)
-8.00 -> -18.59
-4.00 -> -10.93
0.00 -> -3.65
4.00 -> 2.66
8.00 -> 8.99
12.00 -> 16.95
16.00 -> 27.32
As with any precision measurement system, we’re continuing to refine our calibration through extensive real-world testing. Over the coming months, we’ll be collecting more validation data to ensure our calibration remains accurate across all possible scenarios.
Conclusion: Beyond Bat Speed
Our journey from initial concept to working system has been one of continuous problem-solving and innovation. What began as a simple bat-tracking project evolved into a sophisticated system that bridges the gap between computer vision and biomechanics. The challenges we encountered, from depth perception to signal processing, pushed us to develop novel solutions that may have applications beyond baseball.
The success of our scaling factor approach demonstrates that sometimes the simplest solutions are the most elegant. By using the bat itself as a measuring stick, we solved a complex 3D reconstruction problem without the need for multiple cameras or complex depth estimation algorithms. This principle of finding simple, robust solutions to complex problems guided our entire development process.
Looking ahead, we see numerous possibilities for expanding this system. The ability to accurately track bat movement from a single camera view opens doors for applications ranging from amateur player development to professional game analysis. We could potentially extend this technique to other sports equipment or movement analysis where traditional motion capture systems are impractical.
Most importantly, this project has shown that with careful consideration of the underlying physics and biomechanics, computer vision can provide reliable quantitative measurements from simple video input. As we continue to refine our system, we’re excited about the potential to make sophisticated swing analysis more accessible to players and coaches at all levels of the game.
Editor’s Note: Driveline Baseball filed for intellectual property protection with the US Patent and Trademark Office on this technology in 2024. Reference # US 63/634,678.