Medical image segmentation is a pivotal technology in computer vision and is particularly valuable for extracting critical diagnostic information from multi-modal medical images such as CT and MRI. However, existing techniques still have significant limitations in collaborative modeling across modalities, precise representation of structural boundaries, and effective integration of multi-scale semantic information. To address these challenges, this paper proposes MicFormer-HMD, an enhanced architecture that builds on the original MicFormer framework. First, a Hybrid Gating Module performs dynamic feature selection before cross-modal interaction by combining parameterized convolutions with gating functions, which adaptively suppresses noise and enhances discriminative feature representations. Second, a Multi-Branch Fusion Attention module pairs a multi-branch dilated convolution architecture with a dual attention calibration mechanism, significantly improving the model's ability to capture and integrate multi-scale contextual information. Third, Dynamic Snake Convolution is incorporated; its deformable kernels adaptively conform to the complex morphology of cardiac anatomical structures, strengthening geometric perception. MicFormer-HMD shows clear advantages in cardiac image segmentation, with particular improvements in preserving the continuity of thin-walled tissue and the connectivity of complex vascular structures.
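To make the first two ideas concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It shows two generic techniques the abstract names: gated feature selection (a feature convolution multiplied elementwise by a sigmoid gate from a second convolution) and multi-branch dilated convolution (the same kernel applied at several dilation rates, with the branch outputs fused). Several simplifying assumptions apply. 1D Python lists stand in for multi-channel image feature maps. Kernels are fixed rather than learned. Branches are fused by plain averaging instead of the paper's dual attention calibration. Dynamic Snake Convolution is not covered. The function names `conv1d`, `gated_select`, and `multi_branch_dilated` are hypothetical.

```python
import math
from typing import List, Sequence


def conv1d(x: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """Same-length 1D cross-correlation with zero padding and a dilation rate.

    Assumes an odd-length kernel so each output stays centered on its input.
    """
    k = len(kernel)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * dilation  # dilated tap position
            if 0 <= idx < len(x):              # zero padding outside the signal
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_select(x: List[float], feat_kernel: List[float],
                 gate_kernel: List[float]) -> List[float]:
    """Gated feature selection: feature response scaled by a sigmoid gate in (0, 1).

    Gate values near 0 suppress a position (e.g. noise); values near 1 keep it.
    """
    feat = conv1d(x, feat_kernel)
    gate = conv1d(x, gate_kernel)
    return [f * sigmoid(g) for f, g in zip(feat, gate)]


def multi_branch_dilated(x: List[float], kernel: List[float],
                         dilations: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Apply one kernel at several dilation rates and average the branches.

    Larger dilations widen the receptive field without adding weights, so the
    fused output mixes local and wider context (simple averaging is an assumption).
    """
    branches = [conv1d(x, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(multi_branch_dilated(impulse, [1.0, 1.0, 1.0], dilations=(1, 2)))
    print(gated_select([2.0, 4.0], [1.0], [0.0]))
```

With an impulse input, dilation 1 spreads the response to adjacent positions while dilation 2 reaches positions two steps away, so averaging the two branches yields `[0.5, 0.5, 1.0, 0.5, 0.5]`. A gate kernel of zero gives a gate of exactly 0.5, which halves the feature response. A learned module would replace these fixed kernels with trainable multi-channel 2D or 3D convolutions.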