Captioning is a vitally important service that makes media content accessible to everyone. According to the National Association of the Deaf in the U.S., captioning “provides a critical link to communication, information, education, news, and entertainment for more than 36 million Americans who are deaf or hard of hearing.” That number is even higher today, according to a recent petition to the FCC for rulemaking on caption quality, and the statement holds true for millions more around the world.
The history of captioning aligns with the technological evolution seen from the latter part of the 20th century to the present. Captions were introduced in the 1970s when the first open-captioned shows were run on American public television stations. The U.S. government funded captioning technology research, and in 1976 the U.S. Federal Communications Commission (FCC) reserved Line 21 of the broadcasting signal for closed caption transmission. Then in 1979, the National Captioning Institute (NCI) was founded as a non-profit organization and charged with the mission “to promote and provide access to television programs for the deaf and hard-of-hearing community through closed captioning technology.”
NCI developed the first decoder box, began closed-captioning offline broadcasts in 1980, and moved to live closed-captioning using stenographers in 1982. They later partnered with the ITT Corporation to develop the first caption-decoding microchip to be built directly into television sets, which led to the Television Decoder Circuitry Act in 1990. Captioning was off and running!
A specialized workforce was required to produce live closed captions. The early captioners were courtroom and deposition stenographers, as many at NCI are to this day. Steno captioners need up to six months of on-the-job training, with an additional 12-18 months to complete all sports training.
Captioning work is intense and stressful. Back in 1985, offline captions, which are prepared in advance, averaged a reading speed of 120 words per minute for adults and 90 for children. Steno captioners working on live programming, however, were able to reach staggering typing speeds of up to 300 words per minute. Early steno captioners would sit in a locked room full of monitors, watching along with viewers and captioning the action on the screen live. A text editor would serve as coordinator and director, flagging repeats so the captioner could take a break.
Darlene Parker, Steno Captioning and Realtime Relations Director at NCI, was the sixth steno captioner in the U.S., starting in 1984 after working in the District of Columbia courts. She describes how exciting and rewarding it was to apply her skills to helping deaf and hard-of-hearing audiences understand important news of the day. After captioning a presidential debate, she remembers receiving a letter from a deaf viewer expressing appreciation for being able to watch the debate rather than having to wait until the next day to read about it in the newspaper. Ms. Parker also worked on a wide variety of programming and sporting events, including the Super Bowl and the World Series, commenting, “As a sports fan, I couldn’t believe I was getting paid to caption sports!” She noted that the extremely fast pace of sports talk shows makes them among the most challenging programming to caption.
Ms. Parker explained that because steno machines use coded combinations of phonemes instead of letters, “steno captioning is more akin to playing chords on a piano rather than typing on a computer keyboard.” But at breakneck speeds of up to 300 wpm, mistakes are sometimes unavoidable. NCI reports that, since its inception, the accuracy of live captioning has never fallen below 95%. It surpassed 99% in 1992, after a decade spent developing specialized techniques, dictionaries and software.
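To make the “chords on a piano” analogy concrete, here is a minimal, purely illustrative Python sketch of the kind of dictionary lookup a steno system performs: each stroke (a chord of keys pressed at once) maps to a word or phrase, and multi-stroke entries are matched greedily. The chord strings and dictionary entries below are invented for illustration and do not reflect NCI’s actual steno theory or software.

```python
# Illustrative only: a toy steno dictionary mapping strokes (chords) to text.
# Multi-stroke entries are written with "/" between strokes, as in many steno systems.
STENO_DICTIONARY = {
    "-T": "the",
    "PREZ/TKEPBT": "president",  # hypothetical two-stroke entry
    "TKPWAEUPL": "game",         # hypothetical single-stroke entry
}

def translate_strokes(strokes):
    """Translate a sequence of strokes, preferring the longest multi-stroke match."""
    words, i = [], 0
    while i < len(strokes):
        # Try the longest candidate entry starting at position i, then shorter ones.
        for j in range(len(strokes), i, -1):
            candidate = "/".join(strokes[i:j])
            if candidate in STENO_DICTIONARY:
                words.append(STENO_DICTIONARY[candidate])
                i = j
                break
        else:
            # Unknown chord: fall back to the raw stroke.
            words.append(strokes[i])
            i += 1
    return " ".join(words)

print(translate_strokes(["-T", "PREZ", "TKEPBT"]))  # -> "the president"
print(translate_strokes(["-T", "TKPWAEUPL"]))       # -> "the game"
```

Because whole words and phrases come out of single chords rather than individual keystrokes, a skilled captioner with a well-tuned personal dictionary can keep pace with speech at several hundred words per minute.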
Unfortunately, the elite steno captioner workforce is not a cost-effective solution for today’s greatly expanded live captioning needs. This workforce is decreasing in favor of a new technology: speaker-dependent Automatic Speech Recognition (ASR), introduced into captioning workflows at the turn of the 21st century.
Live captioning is now increasingly produced with the help of voice writers and in some cases even in a completely automated fashion. ASR represents the next tech wave driving captioning for media accessibility forward. We’ll shed more light on it in an upcoming post.
Legislation in the U.S. has long been a key force behind advancing accessibility through captioning and other audiovisual localization services like sign language and audio description.
The Americans with Disabilities Act of 1990 took the first step in this direction by prohibiting discrimination on the basis of disability. It also stipulated that certain public facilities were required to provide access to televised information, films or slide shows. The Television Decoder Circuitry Act of 1990 mandated that, as of July 1993, all TVs with screens 13 inches or larger sold or manufactured in the U.S. contain the closed-caption decoder chip, while the Telecommunications Act of 1996 made similar provisions and further authorized the FCC to issue rules and a timetable for the captioned-programming requirement.
Two years later, Section 508 of the Rehabilitation Act Amendments of 1998 required federal agencies to develop, procure, maintain and use accessible technology. The FCC also set captioning quotas for broadcasters, measured in broadcast hours. For content first shown before January 1, 1998, the FCC required that 75% (chosen at the broadcaster’s discretion) be captioned; for content first shown after that date, the quota increased to 100%. That 100% requirement, however, applies only to network affiliates in the top 25 news markets. For affiliates in smaller markets, the practice is discretionary and varies widely. There are further exceptions for smaller and newer networks, as well as for programming broadcast between 2 a.m. and 6 a.m.
The next captioning legislation milestone was the Twenty-First Century Communications and Video Accessibility Act of 2010, which brought accessibility legislation up to date with 21st-century digital, broadband and mobile technologies. The FCC Regulation for IP Captioning of 2012 extended closed captioning requirements to internet-delivered video as well.
In 1985, NCI decoders were in 100,000 homes and NCI was captioning 3,600 hours of material annually. That stands in stark contrast to the hundreds of thousands of hours of captioning now produced in the U.S. alone. Today’s focus is on continually improving caption quality.
Caption quality was first addressed in a February 2014 FCC declaratory ruling that defined quality standards for the accuracy, synchronicity, completeness and placement of captions. However, the “best practices” guidance it set for live captioning via human and Electronic Newsroom Technique (ENT) workflows has proven inadequate, according to experts and organizations representing the deaf community. It also offered no guidance regarding the use of ASR. This led to a 2019 petition to the FCC asking for rulemaking to address caption quality concerns.
Captioning has come a long way since the 1970s, changing the lives of those who can’t hear, and even those who can (think of TVs in an airport, gym or noisy bar). In our next post, we’ll speak with a leading representative of the deaf and hard-of-hearing community for his perspectives on how well captioning serves the community’s needs now, and what they look for in captioning’s future.
AppTek.ai is a global leader in artificial intelligence (AI) and machine learning (ML) technologies for automatic speech recognition (ASR), neural machine translation (NMT), natural language processing/understanding (NLP/U), large language models (LLMs) and text-to-speech (TTS). The AppTek platform delivers industry-leading solutions for organizations across a breadth of global markets such as media and entertainment, call centers, government, enterprise business, and more. Built by scientists and research engineers who are recognized among the best in the world, AppTek’s solutions cover a wide array of languages/dialects, channels, domains and demographics.