
An Introduction To Fine-Tuning Pre-Trained Transformers Models | by Ram Vegiraju | Feb, 2024

Simplified utilizing the HuggingFace Trainer object

Image from Unsplash by Markus Spiske

HuggingFace serves as a home to many popular open-source NLP models. Many of these models are effective as is, but they often require some training or fine-tuning to improve performance for your specific use case. As the LLM explosion continues, we will take a…


Enhancing Vision-Language Models with Chain of Manipulations: A Leap Towards Faithful Visual Reasoning and Error Traceability

Large Vision-Language Models (VLMs) trained to understand images have proven viable in broad scenarios such as visual question answering, visual grounding, and optical character recognition, capitalizing on the strength of Large Language Models (LLMs) in general world knowledge. Humans mark or process the provided photos for convenience and rigor to address the intricate…


A Weekend AI Project: Making a Visual Assistant for People with Vision Impairments | by Dmitrii Eliuseev | Feb, 2024

Running a multimodal LLaVA model, camera, and speech synthesis

Image by Enoc Valenzuela, Unsplash

Modern large multimodal models (LMMs) can process not only text but also different types of data. Indeed, “a picture is worth a thousand words,” and this functionality can be crucial during the interaction with the real world. In this “weekend project,” I…
