Credits also contain logos/symbols (near the end), and often have stylistic flairs as well. Video compression is based on making predictions and then encoding only the information (in Shannon's sense) needed to correct the deltas from those predictions. Credits sliding steadily at a consistent rate are exactly the sort of motion codecs are optimized to predict; for instance, the same algorithms save space by predicting repeated pixel patterns during a slow camera pan.
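To make the "predict, then encode the delta" idea concrete, here is a toy sketch in Python. It is not how any real codec is implemented: the frame is a small grid of random integers standing in for pixels, the scroll speed is assumed to be known in advance, and every function name (make_credit_roll, predict_shifted, and so on) is made up for illustration. It only shows that once the predictor knows the credits move up by a fixed number of rows, the residual is zero everywhere except the rows newly revealed at the bottom.

```python
import random

def make_credit_roll(height, width, seed=0):
    """Hypothetical stand-in for a tall credits image: random 8-bit 'pixels'."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(width)] for _ in range(height)]

def frame_at(roll, top, frame_height):
    """The visible window of the roll when its top edge is at row `top`."""
    return [row[:] for row in roll[top:top + frame_height]]

def predict_shifted(prev_frame, dy):
    """Predict the next frame by moving the previous frame up by dy rows.
    Rows revealed at the bottom have no reference, so they are predicted as 0."""
    width = len(prev_frame[0])
    return prev_frame[dy:] + [[0] * width for _ in range(dy)]

def residual(actual, predicted):
    """Per-pixel delta the encoder would still have to transmit."""
    return [[a - p for a, p in zip(a_row, p_row)]
            for a_row, p_row in zip(actual, predicted)]

def nonzero_count(res):
    return sum(1 for row in res for v in row if v != 0)

roll = make_credit_roll(height=64, width=16)
prev = frame_at(roll, top=0, frame_height=32)
curr = frame_at(roll, top=2, frame_height=32)  # credits scrolled up 2 rows

naive = residual(curr, prev)                       # "nothing moved" prediction
compensated = residual(curr, predict_shifted(prev, 2))  # "moved up 2" prediction

print("nonzero deltas, no motion prediction:", nonzero_count(naive))
print("nonzero deltas, with motion prediction:", nonzero_count(compensated))
```

With the shift taken into account, only the two new rows at the bottom carry any information; a real encoder would additionally have to search for the motion vector and then transform and entropy-code the residual.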
Still, I've often thought it would be nice if text were a more first-class citizen within video codecs. I think it's more a toolchain/workflow problem than a shortcoming in video compression technology as such. Whoever is mastering a Blu-Ray or prepping a Hollywood film for Netflix is usually not the same person cutting and assembling the original content. For innumerable reasons (access to raw sources, low return on time spent, chicken-egg playback compatibility), it just doesn't make sense to (for instance) extract the burned-in stylized subtitles and bake them into the codec as text+font data, as opposed to just merging them into the film as pixels and calling it a day.
Fun fact: nearly every Pixar Blu-Ray is split into multiple forking playback paths for different languages, such that if you watch it in French, any scenes with diegetic text (newspapers, signs on buildings) are re-rendered in French. Obviously that's hugely inefficient; yet at 50GB, there's storage to spare, so why not? The end result is a nice touch and a seamless experience.
Text with video is difficult to do correctly for a few different reasons. Just rendering text well is a complicated task that's often done poorly. Allowing arbitrary text styling leads to more complexity. However, for the sake of accessibility (and/or regulations) you need some level of styling ability.
All of this is separate from complexities like syncing text to the video/audio content or handling multiple simultaneous speakers. And that, in turn, is separate from the workflow/tooling issues you mentioned.
The MPEG-4 spec kind of punted on text and supports fairly basic timed text subtitles. Text essentially has a timestamp where it appears and a duration. There's minimal ability to style the text and there are limits on the availability of fonts, though it does allow for Unicode, so most languages are covered. It's possible to do tricks where you style words at timestamps to give a karaoke effect or identify speakers, but that's all on the creation side and is very tricky.
The Matroska spec has much more robust support for text, but it mostly just preserves the original subtitle/text encoding in the file and lets the player software figure out what to do with that particular format and display it as an overlay on the video.
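As a rough illustration of the "timestamp plus duration" model and the karaoke trick, here is a small Python sketch. It is not the MPEG-4 Timed Text sample format; the Cue class, its fields, and both helper functions are hypothetical names chosen for this example. The karaoke effect is approximated by emitting one cue per word, each repeating the full line with a different highlighted character range, which is why the burden falls on whoever authors the cues.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Cue:
    """Hypothetical timed-text cue: when it appears, for how long, and what it says."""
    start_ms: int
    duration_ms: int
    text: str
    # Optional (start_char, end_char) range to render in a highlight style.
    highlight: Optional[Tuple[int, int]] = None

def active_cues(cues: List[Cue], t_ms: int) -> List[Cue]:
    """Cues a player should be showing at playback time t_ms."""
    return [c for c in cues if c.start_ms <= t_ms < c.start_ms + c.duration_ms]

def karaoke_cues(line: str, start_ms: int, word_ms: int) -> List[Cue]:
    """Author-side trick: one cue per word, each highlighting the word being sung.
    Assumes words are separated by single spaces and last word_ms each."""
    cues = []
    pos = 0
    for i, word in enumerate(line.split(" ")):
        cues.append(Cue(start_ms + i * word_ms, word_ms, line,
                        (pos, pos + len(word))))
        pos += len(word) + 1  # skip the word and the following space
    return cues

cues = karaoke_cues("la la land", start_ms=1000, word_ms=500)
for t in (1000, 1600, 2100):
    cue = active_cues(cues, t)[0]
    lo, hi = cue.highlight
    print(t, cue.text[:lo] + "[" + cue.text[lo:hi] + "]" + cue.text[hi:])
```

Real lyrics would need per-word timings rather than a fixed word_ms, which is a big part of why this is tricky to author.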
It's unfortunate text doesn't get more first class love from multimedia specs. There's a lot that could be done, titles and credits as you mention, but also better integration of descriptive or reference text or hyperlink-able anchors.
MPEG-4 (taken as the whole body of standards, not as two particular video codecs) actually has provisions for text content, vector video layers and even rudimentary 3D objects. On the other hand, I'm almost sure that there are no practical implementations of any of that.
Oh, and that's only the beginning. The MPEG-4 standard also includes some pretty wacky kitchen-sink features like animated human faces and bodies (defined in MPEG-4 part 2 as "FBA objects"), and an XML format for representing musical notation (MPEG-4 part 23, SMR).