2026/5/27

Mohammad Dehnavi

Academic rank: Assistant Professor
ORCID:
Education: PhD.
H-Index:
Faculty: Faculty of Electrical Engineering
ScholarId:
E-mail: m.dehnavi [at] mail.kut.ac.ir
ScopusId:
Phone: 083-38305001
ResearchGate:

Research

Title
Real-time visual feature matching on FPGA platforms through adaptive local-patches processing
Type
JournalPaper
Keywords
FPGA · Scale-invariant feature transform · Hardware architecture · Feature extraction
Year
2026
Journal Multimedia Tools and Applications
DOI
Researchers Mohammad Dehnavi, Ehsan Pourshakib, Milad Sarhadi, Bijan Alizadeh

Abstract

The Scale-Invariant Feature Transform (SIFT) algorithm is widely recognized as a robust method for image feature extraction due to its invariance to scale and rotation. However, its high computational complexity poses significant challenges for deployment in low-cost, real-time systems. A large portion of this computational load is attributed to descriptor generation and feature matching. Prior FPGA-based implementations have addressed this challenge by reading patches from the base Gaussian layer and regenerating the target Gaussian layers through additional processing, which introduces considerable memory and logic overhead. In this work, a hardware architecture is proposed in which all Gaussian layers are stored in off-chip memory to avoid regeneration, allowing patches to be accessed directly from the corresponding scale for descriptor generation. A descriptor matching architecture is also designed, in which off-chip memory is used to minimize the use of on-chip resources. The system is implemented on a Zynq UltraScale+ MPSoC platform and connected to a live camera for real-time operation. Experimental evaluation shows that the system can process up to 3,000 keypoints per Full HD frame at 30 frames per second, with a total processing time of 33 ms per frame. Compared to state-of-the-art methods, it achieves a 5-percentage-point improvement in matching accuracy while using only 59K LUTs, 87K FFs, 717 DSPs, and 9.2 Mb of memory. These results confirm that the proposed system is well suited for high-performance, real-time visual processing in resource-constrained environments.
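
The throughput figures quoted in the abstract imply a rough timing budget, which the short Python sketch below works out. It is only arithmetic on the stated numbers (Full HD, 30 frames per second, up to 3,000 keypoints per frame). The function names are illustrative, and the per-keypoint figure assumes keypoints are handled strictly one after another, which is a simplifying assumption and not a description of the paper's architecture.

```python
# Back-of-envelope timing budget from the figures quoted in the abstract.
# All names here are illustrative; none come from the paper.

FRAME_WIDTH = 1920           # Full HD width in pixels
FRAME_HEIGHT = 1080          # Full HD height in pixels
FRAMES_PER_SECOND = 30       # frame rate stated in the abstract
KEYPOINTS_PER_FRAME = 3000   # upper bound stated in the abstract


def frame_period_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def keypoints_per_second(keypoints, fps):
    """Keypoint throughput needed to sustain the frame rate."""
    return keypoints * fps


def per_keypoint_budget_us(keypoints, fps):
    """Average time per keypoint in microseconds.

    Assumption: keypoints are processed serially within one frame period.
    A pipelined design could overlap them, so treat this as a rough bound.
    """
    return frame_period_ms(fps) * 1000.0 / keypoints


def pixel_rate(width, height, fps):
    """Input pixels per second for the given resolution and frame rate."""
    return width * height * fps


if __name__ == "__main__":
    print(f"frame period: {frame_period_ms(FRAMES_PER_SECOND):.2f} ms")
    print(f"keypoints/s:  {keypoints_per_second(KEYPOINTS_PER_FRAME, FRAMES_PER_SECOND)}")
    print(f"per keypoint: {per_keypoint_budget_us(KEYPOINTS_PER_FRAME, FRAMES_PER_SECOND):.2f} us")
    print(f"pixels/s:     {pixel_rate(FRAME_WIDTH, FRAME_HEIGHT, FRAMES_PER_SECOND)}")
```

At 30 frames per second the frame period is about 33.3 ms, consistent with the 33 ms total processing time reported above, and sustaining 3,000 keypoints per frame corresponds to 90,000 keypoints per second.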