Piotr Skalski – Sport player recognition. How to Detect, Track, and Identify Players

– player and number detection with RF-DETR
– player tracking with SAM2
– team clustering with SigLIP, UMAP and K-means
– number recognition with SmolVLM2

1. detection: we start with an RF-DETR model fine-tuned to detect players, jersey numbers, referees, the ball, and the rim
model + dataset: https://universe.roboflow.com/roboflow-jvuqo/basketball-player-detection-3-ycjdo/dataset/6
rf-detr repository: https://github.com/roboflow/rf-detr


2. tracking: SAM2 tracks objects across video using visual prompts like boxes or points.
We use a fine-tuned RF-DETR to detect all players in the first frame, pass these detections to SAM2, and track them in the following frames.
SAM2 tutorial: https://www.youtube.com/watch?v=Dv003fTyO-Y

3. Team clustering: every basketball game differs in uniforms, courts, and visuals, so we use unsupervised learning to build a general solution without manual annotation.
We sample frames, detect players, and keep only the central region of each bounding box, which avoids noise and usually captures the most relevant player details. We then generate embeddings for these crops with SigLIP, reduce them with UMAP, and cluster them with K-means into two groups, each corresponding to a team.
I used the same approach in my Football AI: https://www.youtube.com/watch?v=aBVGKoNZQUw
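
To make the "core of each box" idea concrete, here is a minimal sketch of the cropping step only. The function name `central_region` and the default `scale=0.4` are assumptions for illustration, not values from the project. The SigLIP, UMAP and K-means steps are left out.

```python
def central_region(box, scale=0.4):
    """Shrink an (x1, y1, x2, y2) box around its center.

    `scale` is the fraction of the original width and height to keep.
    The default of 0.4 is a hypothetical value, not the project's setting.
    """
    x1, y1, x2, y2 = box
    center_x = (x1 + x2) / 2
    center_y = (y1 + y2) / 2
    half_width = (x2 - x1) * scale / 2
    half_height = (y2 - y1) * scale / 2
    return (
        center_x - half_width,
        center_y - half_height,
        center_x + half_width,
        center_y + half_height,
    )


# Keep the middle half of a 100 x 200 player box.
core = central_region((0, 0, 100, 200), scale=0.5)
```

The resulting box would be used to crop the frame before the crop is passed to the embedding model.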


4. numbers OCR: Reading player numbers from small and blurry crops is not easy. Traditional OCR models struggle with this task. For this reason, we decided to use SmolVLM2, fine-tuned on a custom multi-modal dataset.
The dataset contains jersey number crops collected from the 2025 NBA Playoffs. These numbers were first auto-annotated using a pre-trained SmolVLM2 model, then manually refined. Finally, a LoRA adapter was trained on the dataset, which we now load for inference.
model + dataset: https://universe.roboflow.com/roboflow-jvuqo/basketball-jersey-numbers-ocr/dataset/3


5. pair player detections with recognized numbers: The next challenge is to assign each detected jersey number to the correct tracked player. To achieve this, we use the mask IoS metric.
IoS is similar to IoU, but with one key difference. While IoU measures the overlap ratio relative to the union of two areas, IoS measures the overlap relative to the smaller area. This means that if a smaller object is completely inside a larger one, IoS equals 1.
We use masks instead of bounding boxes because they provide much more accurate matches. The process is similar to player tracking: we first detect players in the initial video frame using RF-DETR, then track them across frames with SAM2.1. As a result, for every video frame we obtain precise segmentation masks for each player. Next, we run RF-DETR again, this time to detect jersey numbers on each frame. We convert the number bounding boxes into masks and then compute batch IoS between player masks and number masks. Finally, we keep only pairs with an almost perfect overlap (0.9 or higher). By selecting the indices of the corresponding rows and columns, we determine which player detections and jersey number detections belong together.
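
The matching step can be sketched with plain NumPy. This is an illustration under assumptions, not the project code: the helper names (`box_to_mask`, `mask_ios_batch`, `match_numbers_to_players`) are hypothetical, masks are assumed to be boolean arrays of shape (N, H, W), and boxes are assumed to be integer pixel coordinates. Only the 0.9 threshold comes from the description above.

```python
import numpy as np


def box_to_mask(box, height, width):
    """Rasterize an integer (x1, y1, x2, y2) box into a boolean mask."""
    x1, y1, x2, y2 = box
    mask = np.zeros((height, width), dtype=bool)
    mask[y1:y2, x1:x2] = True
    return mask


def mask_ios_batch(masks_a, masks_b):
    """Pairwise IoS for mask stacks of shape (N, H, W) and (M, H, W).

    Returns an (N, M) array: intersection area divided by the smaller
    of the two mask areas.
    """
    pixels = masks_a.shape[1] * masks_a.shape[2]
    a = masks_a.reshape(-1, pixels).astype(np.float64)
    b = masks_b.reshape(-1, pixels).astype(np.float64)
    intersection = a @ b.T  # count of shared pixels per pair
    smaller = np.minimum(a.sum(axis=1)[:, None], b.sum(axis=1)[None, :])
    # Empty masks give IoS 0 instead of a division by zero.
    return np.divide(
        intersection, smaller,
        out=np.zeros_like(intersection), where=smaller > 0,
    )


def match_numbers_to_players(player_masks, number_masks, threshold=0.9):
    """Return (player_index, number_index) pairs with IoS >= threshold."""
    ios = mask_ios_batch(player_masks, number_masks)
    player_idx, number_idx = np.where(ios >= threshold)
    return list(zip(player_idx.tolist(), number_idx.tolist()))


# Toy frame: two players side by side, two detected numbers.
H, W = 20, 20
players = np.stack([box_to_mask((0, 0, 10, 20), H, W),
                    box_to_mask((10, 0, 20, 20), H, W)])
numbers = np.stack([box_to_mask((2, 5, 6, 9), H, W),    # fully inside player 0
                    box_to_mask((8, 5, 14, 9), H, W)])  # straddles both players
pairs = match_numbers_to_players(players, numbers)
```

In this toy frame the first number lies entirely inside player 0, so its IoS is 1 and the pair is kept. The second number is split between the two players (IoS of about 0.33 and 0.67), so it is discarded.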


6. validate numbers: Since the players’ positions relative to the camera are constantly changing, the visibility of their jersey numbers also varies. It is therefore unwise to rely on a single prediction from the number recognition model. Under certain conditions, a 2 may be misread as a 7, and 23 may be mistaken for 2, 3, or 25. Validation of the recognized number across multiple samples is necessary. To increase confidence, we require three identical results across consecutive predictions.
We also introduce spacing between individual predictions, for two reasons. First, each SmolVLM2 prediction is relatively expensive. Second, spacing ensures sufficient variation in the player’s position relative to the camera between samples. In particular, running predictions on consecutive frames makes little sense, as the player’s position is likely almost unchanged.
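
The validation logic could look roughly like the sketch below. The class name `NumberValidator`, its method names, and the `min_frame_gap=5` default are assumptions for illustration. Only the rule of three identical consecutive predictions comes from the description above.

```python
class NumberValidator:
    """Confirm a jersey number after repeated, spaced-out agreement."""

    def __init__(self, required_matches=3, min_frame_gap=5):
        self.required_matches = required_matches
        self.min_frame_gap = min_frame_gap  # hypothetical spacing, in frames
        self._last_frame = {}   # tracker_id -> frame of the last prediction
        self._candidate = {}    # tracker_id -> most recent predicted number
        self._streak = {}       # tracker_id -> length of the current run
        self._confirmed = {}    # tracker_id -> validated number

    def should_predict(self, tracker_id, frame_index):
        """Skip confirmed players and frames too close to the last sample."""
        if tracker_id in self._confirmed:
            return False
        last = self._last_frame.get(tracker_id)
        return last is None or frame_index - last >= self.min_frame_gap

    def update(self, tracker_id, frame_index, number):
        """Record one prediction; return the confirmed number or None."""
        self._last_frame[tracker_id] = frame_index
        if tracker_id in self._candidate and self._candidate[tracker_id] == number:
            self._streak[tracker_id] += 1
        else:
            # A different result restarts the run of identical predictions.
            self._candidate[tracker_id] = number
            self._streak[tracker_id] = 1
        if self._streak[tracker_id] >= self.required_matches:
            self._confirmed[tracker_id] = number
        return self._confirmed.get(tracker_id)


validator = NumberValidator()
result = None
for frame_index, predicted in [(0, "23"), (5, "2"), (10, "23"), (15, "23"), (20, "23")]:
    if validator.should_predict(tracker_id=1, frame_index=frame_index):
        result = validator.update(1, frame_index, predicted)
```

Here the misread "2" at frame 5 resets the run, so the number is confirmed as "23" only after the three matching predictions at frames 10, 15 and 20. Once a player is confirmed, `should_predict` returns False, which avoids further SmolVLM2 calls for that player.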

Code link


Also: Football AI Tutorial: From Basics to Advanced Stats with Python