Papers
-
MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields (2024. 4. 12. 17:01)
https://arxiv.org/abs/2302.02978
Abstract (excerpt): Previous research has demonstrated the advantages of integrating data from multiple sources over traditional unimodal data, leading to the emergence of numerous novel multimodal applications. We propose a multimodal classification benchmark MuG with eight ..
Classification ..
-
AlphaDAPR: An AI-based Explainable Expert Support System for Art Therapy (2024. 4. 1. 09:39)
https://dl.acm.org/doi/abs/10.1145/3581641.3584087
Abstract (excerpt, Proceedings of the 28th International Conference on Intelligent User Interfaces): Sketch-based drawing assessments in art therapy are widely used to understand individuals’ cognitive and psychological states, such as cognitive impairment or mental disorders. Along with self-report measures ..
-
A Picture May Be Worth a Thousand Lives: An Interpretable Artificial Intelligence Strategy for Predictions of Suicide Risk from Social Media Images (2024. 1. 15. 19:40)
https://arxiv.org/abs/2302.09488
Abstract (excerpt): The promising research on Artificial Intelligence usages in suicide prevention has principal gaps, including black box methodologies, inadequate outcome measures, and scarce research on non-verbal inputs, such as social media images (desp..
-
MiniGPT-v2: Large Language Model As a Unified Interface for Vision-Language Multi-task Learning (2023. 12. 15. 23:03)
https://arxiv.org/abs/2310.09478
Abstract (excerpt): Large language models have shown their remarkable capabilities as a general interface for various language-related applications. Motivated by this, we target to build a unified interface for completing many vision-language tasks including image description ..
-
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (2023. 12. 12. 21:12)
https://arxiv.org/abs/2305.11175
Abstract (excerpt): Large language models (LLMs) have notably accelerated progress towards artificial general intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing them with immense potential across a range of applications. However, in ..
The latest technology in this field is ..