Facebook.com already has decades-worth of natural language text and audio/video from uploads and "live" sessions. That is a deep pool, and wide too because Facebook probably has content in all currently-spoken natural languages, with the exception of those exclusively used by uncontacted peoples. That is a data moat.