Our experience of the world is multimodal: we see objects, hear sounds, feel textures, smell odors, and taste flavors. In recent years, a broad and impactful body of research has emerged in artificial intelligence under the umbrella of multimodal machine learning, characterized by the study of multiple modalities. As we formalize a long-term vision for multimodal research, it is important to reflect on its foundational principles and core technical challenges. What is multimodal? Answering this question is complicated by the multi-disciplinary nature of the problem, which spans many domains and research fields. Two key principles have driven many multimodal innovations: the heterogeneity of modalities and the interconnections between them. The talk will synthesize historical and recent progress into a research-oriented taxonomy centered around six core technical challenges: representation, alignment, reasoning, generation, transference, and quantification. It will conclude with open questions and unsolved challenges essential to a long-term research vision for multimodal learning.