Example from a news story. After seeing the video, I must admit I’m not sure if ...

echelon · on Nov 28, 2021

I work really closely on deep fake tech [1] and I'd say I'm relatively current with the state of the art in the literature. This was not deepfaked. The person recorded it themselves and is lying.

The video quality is too good. The lighting and movements lack mistakes. It can't be first order model, Wav2Lip, or any of the relatively new audio-to-video models.

The audio doesn't suffer from spectral noise, and it matches the lip movements closely enough that it can't be TTS. Voice conversion (VC) introduces pitch issues that are readily apparent, and it's incredibly hard to train VC models without a ton of parallel audio data from source and target speakers.

This is absolutely a lie (not a deepfake) and I'd bet money on it.

[1] I created https://fakeyou.com (cartoon and celebrity TTS) and real-time voice-to-voice mapping for VTubers, and am currently working on ML blendshapes.

devenson · on Nov 28, 2021

Friend was suckered by the exact same scam. Hers was NOT a deepfake, although I assumed it was at first. They cajoled her into making the video. She did so reluctantly, so she seemed a bit "off" in the video, but it was indeed her. She was out $300, was super embarrassed, and then completely tormented by how helpless she was in trying to recover her account.

blueblisters · on Nov 28, 2021

I agree. I think the most telling sign is the hand gesture for the number 3 when he says "just invested 300 bucks". I don't think deepfake models can understand intent yet.

jcims · on Nov 28, 2021

There are two ways to think of this as a deepfake:

1 - This is an actual video of the guy in his home, but they changed/synthesized the audio and then worked on the lip movements to make it match.

2 - This is a video of an actor impersonating the guy, possibly to the extent of impersonating his voice (although his timbre might make that a little tricky), and then they just deepfake the face on to the actor. An example of this is deeptomcruise on TikTok, something that you should treat yourself to if you haven't seen yet - https://www.tiktok.com/@deeptomcruise (first two today aren't great, here's a good one - https://www.tiktok.com/@deeptomcruise/video/7018171271095553... ? )

I'm not even convinced there's any alteration here, but even if there was, both of the above could be possible. Adobe demoed something called VoCo in 2016 that never saw the light of day; I'm not sure if something approximating it is available today: https://www.youtube.com/watch?v=I3l4XLZ59iw&t=260s

whoisjuan · on Nov 28, 2021

What do you mean? It’s not like a deepfake is a model that produces a new video from scratch in a completely random setting.

It needs a base video to be modified. They need a video of a person talking to do the deepfake.

PeterisP · on Nov 28, 2021

Exactly. The point is that any relevant gestures must come from the timing in an existing video. Current deepfake tech can manipulate lips and facial expressions, but it can't make the video lift three fingers at the proper time when "300" is being said.

So this is either an indication of a very elaborate deepfake that managed to surface an amazingly coincidental source video (which should be possible to find in her archives), or a sign that it's not a deepfake but a real recording.

jhardy54 · on Nov 28, 2021

Or: it’s a deepfake where the attacker made a video and attached the victim’s face in post-processing.

harperlee · on Nov 28, 2021

Or: the base video is well chosen from among the victim’s, the message is crafted so as not to deviate much from the base, and the deepfake is used to have him say other things.

quitit · on Nov 28, 2021

I think the issue is that there would be insufficient audio sources to generate new spoken language.

pyinstallwoes · on Nov 30, 2021

That is not an issue. You can generate new spoken language with not much more than a word.

quitit · on Nov 30, 2021

Sincerely: I’d love to hear an example of that.

The reason for my skepticism: state-of-the-art language pronunciation from the best this planet has to offer still requires a full phoneme library recorded in studio conditions. A voice sample taken from a user’s Instagram page doesn’t seem like the kind of source material that would be useful for making convincing speech.

idontwantthis · on Nov 28, 2021

Wouldn’t it be a real person making the gesture and his face is deepfaked on?

yonixw · on Nov 28, 2021

I think you are right. If you find a similar person with a similar voice (even commission them for $5 on Fiverr) and only deepfake the face in a low-quality video, this is very much achievable with today's state of the art.

ricardobeat · on Nov 28, 2021

Pay attention to the hand. His index finger is turned inwards; that would be a very odd “number 3 gesture”. More likely it's a random hand movement from the video used.

jcims · on Nov 28, 2021

The perfection in the audio is precisely what made me skeptical.

k4runa · on Nov 28, 2021

It's exactly the same situation with my friend. “I just invested $300 into Bitcoin and got $10,000 back. Gotta try it,” except the numbers are higher... but she actually says it in the video too.

lelandfe · on Nov 28, 2021

Telling that a major news outlet can't get anything more than a "we're looking into it" response from IG support.

stjohnswarts · on Nov 28, 2021

I mean, what if they are looking into it though? Would you prefer a half-baked response?

deadmutex · on Nov 28, 2021

Another link, same story: https://www.youtube.com/watch?v=vqr0oER03SE

pangolinplayer · on Nov 28, 2021

Yeah. It's almost like the post is a scam and we are all being duped into promoting the non-deepfake crypto scam.