Multi-Source Transfer Learning with Cross-Domain Attention for High-Resolution Remote Sensing Image Classification

Authors

  • Sampada Thigale, Cusrow Wadia Institute of Technology, Pune

Keywords

Remote Sensing Image Classification, Transfer Learning, Convolutional Neural Networks, Cross-Domain Attention, Land-Cover Mapping, Feature Fusion

Abstract

Remote sensing image classification is a critical task in geospatial intelligence, enabling land-cover mapping, urban planning, disaster monitoring, and environmental assessment at regional and global scales. The high spatial and spectral variability of satellite and aerial imagery, combined with the scarcity of labelled training data for specific geographic regions, presents substantial challenges for machine learning approaches. This paper introduces the Multi-Source Transfer Learning Network (MSTL-Net), a novel framework that transfers knowledge from two complementary pre-trained convolutional neural network backbones—one trained on large-scale natural image datasets and one fine-tuned on an intermediate remote sensing source domain—through a cross-domain attention fusion module to produce discriminative feature representations for target domain classification. The key innovation of MSTL-Net lies in its cross-domain attention mechanism, which dynamically weights and fuses intermediate feature maps from both backbones based on their relevance to the target classification task, enabling the model to exploit complementary domain-specific and general visual knowledge simultaneously. A multi-scale pooling aggregation module further captures spatial context at multiple granularities, addressing the scale variability inherent in satellite imagery. The model is trained using a progressive unfreezing strategy that stabilizes optimization and prevents catastrophic forgetting of pre-trained representations. We evaluate MSTL-Net on the UC Merced Land Use dataset and a custom high-resolution multi-spectral dataset comprising seven land-cover classes from satellite imagery of the Indian subcontinent. Our model achieves an overall accuracy of 95.7%, an average accuracy of 94.3%, and a Kappa coefficient of 0.951, surpassing state-of-the-art transfer learning baselines including fine-tuned ResNet-50, Inception-V3, DenseNet-121, and EfficientNet-B4.
Ablation experiments rigorously validate the contribution of each model component. These results establish MSTL-Net as a robust and computationally efficient framework for high-accuracy remote sensing image classification with limited labelled data.
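The abstract describes the cross-domain attention fusion and multi-scale pooling modules only at a high level. The NumPy sketch below shows one plausible reading, not the authors' implementation: a single linear gate (the names `w_gate` and `b_gate`, and the gate's form, are assumptions) pools each backbone's feature map to a channel descriptor, applies a softmax over the two sources to obtain relevance weights, and takes a weighted sum; a simple grid-average stand-in then illustrates multi-scale pooling aggregation.

```python
import numpy as np

def cross_domain_attention_fuse(feat_general, feat_domain, w_gate, b_gate):
    """Fuse feature maps from two backbones with a learned source gate.

    feat_general, feat_domain: (C, H, W) intermediate feature maps from the
    natural-image backbone and the remote-sensing backbone, respectively.
    w_gate: (2, 2*C) gating weights; b_gate: (2,) bias. In the paper these
    would be learned; here they are illustrative placeholders.
    """
    # Global average pooling yields one descriptor per channel per source.
    desc = np.concatenate([feat_general.mean(axis=(1, 2)),
                           feat_domain.mean(axis=(1, 2))])   # shape (2*C,)
    logits = w_gate @ desc + b_gate                          # shape (2,)
    # Softmax over the two sources: relevance weights summing to 1.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Dynamically weighted sum of the two feature maps.
    return weights[0] * feat_general + weights[1] * feat_domain

def multi_scale_pool(feat, scales=(1, 2, 4)):
    """Average-pool the fused map over several grid sizes and concatenate,
    capturing spatial context at multiple granularities."""
    c, h, w = feat.shape
    pooled = []
    for s in scales:
        for i in range(s):
            for j in range(s):
                patch = feat[:, i*h//s:(i+1)*h//s, j*w//s:(j+1)*w//s]
                pooled.append(patch.mean(axis=(1, 2)))
    # For scales (1, 2, 4) this gives a vector of length C * (1 + 4 + 16).
    return np.concatenate(pooled)
```

The softmax gate keeps the fusion a convex combination of the two sources, so neither backbone's features can be scaled beyond their original range.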
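The progressive unfreezing strategy is likewise only named, not specified. A common form of such a schedule, sketched below under assumed details (the stage length and the group names are hypothetical), trains the classifier head first and releases one deeper backbone group every few epochs, so the pre-trained early-layer representations are disturbed last:

```python
def unfreeze_schedule(epoch, layer_groups, epochs_per_stage=3):
    """Return which layer groups are trainable at the given epoch.

    layer_groups is ordered from the classifier head (index 0) back toward
    the earliest backbone layers. The head trains from epoch 0; one extra
    group is unfrozen every `epochs_per_stage` epochs, stabilizing
    optimization and limiting catastrophic forgetting.
    """
    n_unfrozen = min(len(layer_groups), 1 + epoch // epochs_per_stage)
    return layer_groups[:n_unfrozen]

# Hypothetical grouping of one backbone, head-first:
groups = ["head", "block4", "block3", "block2", "block1"]
```

With `epochs_per_stage=3`, epoch 0 trains only `head`, epoch 7 trains `head`, `block4`, and `block3`, and all groups are trainable once the schedule is exhausted.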

Published

15-07-2024

Section

Articles