I tried it on the photo linked below and it makes the string instruments look like they're made of brass. Not trying to bash it since I thought it was pretty impressive overall, but I'm curious about what leads to this type of failure.
Thanks for the feedback! It's made of two models: one creates a caption, and the second takes the caption plus the black-and-white image and colorizes it. If you click the edit button you can see the text that generated that colorization. If the text is incorrect, you can edit it and recolorize, which often leads to a better result. Some cases are still hard though, especially damaged photos.
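For anyone curious, here's a rough sketch of that two-step flow in Python. Everything in it is made up for illustration (`stub_captioner`, `stub_colorizer`, `colorize`, the tint values); it's not the actual models or code behind the site, just the shape of "caption first, then colorize from caption + image, with an optional edited caption".

```python
from typing import Callable, List, Optional, Tuple

# Toy stand-ins: a grayscale image is rows of 0-255 ints,
# a color image is rows of (r, g, b) tuples.
GrayImage = List[List[int]]
ColorImage = List[List[Tuple[int, int, int]]]


def stub_captioner(gray: GrayImage) -> str:
    """Hypothetical captioning model. Always returns the same (possibly wrong) caption."""
    return "a jazz orchestra with brass instruments"


def stub_colorizer(gray: GrayImage, caption: str) -> ColorImage:
    """Hypothetical text-conditioned colorizer.

    The caption picks the tint, so a wrong caption gives wrong colors.
    """
    tint = (255, 200, 0) if "brass" in caption else (150, 90, 40)
    return [[tuple(v * c // 255 for c in tint) for v in row] for row in gray]


def colorize(
    gray: GrayImage,
    captioner: Callable[[GrayImage], str],
    colorizer: Callable[[GrayImage, str], ColorImage],
    caption_override: Optional[str] = None,
) -> Tuple[str, ColorImage]:
    """Run caption -> colorize; use the user's edited caption if one is given."""
    caption = caption_override if caption_override is not None else captioner(gray)
    return caption, colorizer(gray, caption)


if __name__ == "__main__":
    gray = [[255, 0]]
    # First pass: caption comes from the captioning step.
    print(colorize(gray, stub_captioner, stub_colorizer))
    # Second pass: the user edits the caption and recolorizes.
    print(colorize(gray, stub_captioner, stub_colorizer,
                   caption_override="a jazz orchestra with wooden string instruments"))
```

In this toy version the colorizer only ever sees the caption and the gray pixels, so if the caption says "brass", everything gets the brass tint. That's one plausible way a caption error could turn into a color error, and it's why editing the text and rerunning can help.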
I assume you're using CLIP or BLIP to generate the caption, and then img2img or something like that for the colorization. What model are you using for the colorization step?
https://commons.wikimedia.org/wiki/File:Jazzing_orchestra_19...