Running a multimodal LLaVA model, camera, and speech synthesis Image by Enoc Valenzuela, UnsplashModern large multimodal models (LMMs) can process not only text but also different types of data. Indeed, “a picture is worth a thousand words,” and this functionality can be crucial during the interaction with the real world. In this “weekend project,” I…
