
Fingerspelling Detection
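The detection videos in this section are scored at the event level, using temporal intersection over union (IoU) with a threshold of 0.5. The following is a minimal sketch of how such counting could work, not the authors' evaluation code. The names `temporal_iou` and `count_events`, the greedy one-to-one matching, and the rule that an unmatched detection counts as a FP are all assumptions made for illustration.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (start, end), in seconds or frames


def temporal_iou(a: Event, b: Event) -> float:
    """Intersection over union of two 1-D intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def count_events(gt: List[Event], pred: List[Event],
                 threshold: float = 0.5) -> Tuple[int, int, int]:
    """Return (TP, FN, FP) under greedy one-to-one matching (an assumption)."""
    used = set()  # indices of detections already matched
    tp = 0
    for g in gt:
        best_j, best_iou = None, threshold
        for j, p in enumerate(pred):
            if j in used:
                continue
            iou = temporal_iou(g, p)
            if iou > best_iou:  # strictly above the threshold
                best_j, best_iou = j, iou
        if best_j is not None:
            used.add(best_j)
            tp += 1
    fn = len(gt) - tp           # ground-truth events with no match
    fp = len(pred) - len(used)  # detections with no match (assumed rule)
    return tp, fn, fp


# Toy example with made-up intervals
print(count_events([(0, 10), (20, 30)], [(1, 10), (40, 50)]))
```

In this toy example, the first ground-truth event overlaps a detection with IoU 0.9 and is counted as a TP, the second has no overlapping detection and is a FN, and the leftover detection is a FP.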
Videos depicting fingerspelling detections (red dotted line). The line is at 1 when fingerspelling is present and at 0 otherwise. The numbers of true positives (TP), false negatives (FN) and false positives (FP) are also shown. An event counts as a TP if the intersection over union (IoU) between the detection and the ground truth is above 0.5, and as a FN otherwise.
Fingerspelling Classification
Fingerspelling is a critical component of British Sign Language (BSL), used to spell proper names, technical terms, and words that lack established lexical signs. Fingerspelling recognition is challenging due to the rapid pace of signing and common letter omissions by native signers, while existing BSL fingerspelling datasets are either small in scale or temporally and letter-wise inaccurate. In this work, we introduce a new large-scale BSL fingerspelling dataset, FS23K, constructed using an iterative annotation framework. In addition, we propose a fingerspelling recognition model that explicitly accounts for bi-manual interactions and mouthing cues. As a result, with refined annotations, our approach halves the character error rate (CER) compared to the prior state of the art on fingerspelling recognition. These findings demonstrate the effectiveness of our method and highlight its potential to support future research in sign language understanding and scalable, automated annotation pipelines.
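For readers unfamiliar with the metric, character error rate is commonly defined as the total edit distance between predicted and reference strings divided by the total number of reference characters. The sketch below assumes that standard definition; the page does not specify the exact evaluation protocol, and the function names `edit_distance` and `cer` are illustrative.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits over total reference characters."""
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


# Toy example: one dropped letter in a six-letter word
print(cer(["darwin"], ["darwn"]))
```

Under this definition, halving the CER means halving the number of character edits needed per reference character.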

BSL alphabet. Unlike many other sign languages, BSL employs bi-manual fingerspelling, which poses additional challenges for recognition due to frequent occlusions between the two hands. Note that these examples show a left-handed signer.
FS23K Dataset
FS23K contains two datasets: temporal boundaries (133K) and words (23K). Both derive from the BOBSL dataset, which contains over 1400 hours of interpreted data from the BBC. We use the Transpeller automatic annotations (also from BOBSL), which are noisy. The temporal boundaries dataset contains cleaned, time-aligned entries from the Transpeller annotations, with false positives and unavailable videos removed. The word-level dataset is a subset of the temporal boundaries dataset in which the word is fully spelt out and all letters are present. Signers often abbreviate words when fingerspelling, for example spelling only 'd' to communicate 'Darwin'.
The numbers of letters, frames and hours refer to the FS23K word-level dataset.
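The relationship between the two datasets can be pictured as a filter over the temporal boundaries entries. This is a sketch only: the entry fields `word`, `letters`, `start` and `end` are a hypothetical schema invented for illustration, not the released FS23K format, and the matching rule is an assumption.

```python
from typing import Dict, List


def normalise(text: str) -> str:
    """Lowercase and keep alphabetic characters only."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def is_fully_spelt(entry: Dict) -> bool:
    """True if the spelt letters cover the whole word (hypothetical fields)."""
    return normalise(entry["letters"]) == normalise(entry["word"])


def word_level_subset(entries: List[Dict]) -> List[Dict]:
    """Keep only entries where every letter of the word is spelt."""
    return [e for e in entries if is_fully_spelt(e)]


# Made-up entries: the abbreviated 'd' for 'Darwin' is filtered out
boundaries = [
    {"word": "Darwin", "letters": "d", "start": 12.0, "end": 12.4},
    {"word": "Darwin", "letters": "darwin", "start": 40.2, "end": 41.9},
]
subset = word_level_subset(boundaries)
print(len(subset))
```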

Histogram of letter distribution in FS23K. The letters a (16,577) and e (13,754) occur most frequently, whereas q (143) and x (322) appear least often. This imbalance reflects the natural distribution of letters in in-the-wild BBC broadcast data.
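A letter histogram like this can be reproduced from any list of word-level labels. The sketch below uses a made-up word list rather than FS23K data, and `letter_histogram` is an illustrative name.

```python
from collections import Counter
from typing import Iterable
import string


def letter_histogram(words: Iterable[str]) -> Counter:
    """Count occurrences of each letter a-z across all words."""
    counts = Counter()
    for word in words:
        counts.update(ch for ch in word.lower() if ch in string.ascii_lowercase)
    return counts


# Toy word list (not taken from the dataset)
hist = letter_histogram(["darwin", "bbc", "essex"])
for letter in string.ascii_lowercase:
    if hist[letter]:
        print(letter, hist[letter])
```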
Network Architecture

Results

[1] K. R. Prajwal, H. Bull, L. Momeni, S. Albanie, G. Varol, and A. Zisserman. Weakly-supervised fingerspelling recognition in British Sign Language videos. In British Machine Vision Conference, 2022.
Citation
Recognising BSL Fingerspelling in Continuous Signing Sequences
Alyssa Chan*, Taein Kwon*, Andrew Zisserman
* Co-first authors
@article{chan2026recognising,
title={Recognising BSL Fingerspelling in Continuous Signing Sequences},
author={Chan, Alyssa and Kwon, Taein and Zisserman, Andrew},
journal={arXiv preprint arXiv:2603.19523},
year={2026}
}
Acknowledgements
This research is funded by the UKRI EPSRC Programme Grant SignGPT EP/Z535370/1, an SNSF Postdoc.Mobility Fellowship P500PT_225450, and a Royal Society Research Professorship RSRP\R\241003. We are grateful for comments from Ryan Wong.
Team
| Alyssa Chan | Taein Kwon | Andrew Zisserman |
|---|---|---|