Bridging Multimedia Modalities: Enhanced Multimodal AI Understanding and Intelligent Agents