18 Shahrivar 1403

Mohammad Dehnavi

Academic rank: Assistant Professor
Address: Kermanshah University of Technology - Faculty of Electrical Engineering - Department of Electrical Engineering (Electronics and Telecommunications specializations)
Education: PhD / Electrical Engineering - Digital Electronics
Phone: 083-38305001
Faculty: Faculty of Electrical Engineering

Research Details

Title
Fcd-cnn: FPGA-based CU depth decision for HEVC intra encoder using CNN
Research type: Published article
Keywords
FPGA · Video compression · Hardware architecture · HEVC
Researchers: Hossein Dehnavi (first author), Mohammad Dehnavi (second author), Sajad Haghzad Klidbary (third author)

Abstract

Video compression for storage and transmission has always been a focal point for researchers in the field of image processing. Their efforts aim to reduce the data volume required for video representation while maintaining its quality. HEVC is one of the efficient standards for video compression, receiving special attention due to the increasing demand for high-resolution videos. The main step in video compression involves dividing the coding unit (CU) blocks into smaller blocks that have a uniform texture. In traditional methods, the Discrete Cosine Transform (DCT) is applied, followed by rate-distortion optimization (RDO) to decide on partitioning. This paper presents a novel convolutional neural network (CNN) and its hardware implementation as an alternative to DCT, aimed at speeding up partitioning and reducing the hardware resources required. The proposed hardware uses an efficient, lightweight CNN to partition CUs with low hardware resources in real-time applications. This CNN is trained for different Quantization Parameters (QPs) and block sizes to prevent overfitting. Furthermore, the system's input size is fixed at 16 × 16, and other input sizes are scaled to this dimension. Loop unrolling, data reuse, and resource sharing are applied in the hardware implementation to save resources. The hardware architecture is fixed for all block sizes and QPs; only the coefficients of the CNN are changed. In terms of compression quality, the proposed hardware achieves a 4.42% BD-BR and −0.19 BD-PSNR compared to HM16.5. The proposed system can process a 64 × 64 CU in 4914 clock cycles at 150 MHz. The hardware resources utilized by the proposed system include 13,141 LUTs, 15,885 flip-flops, 51 BRAMs, and 74 DSPs.
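The reported timing (4914 clock cycles per 64 × 64 CU at 150 MHz) can be turned into a rough latency and frame-rate estimate. The sketch below is only a back-of-the-envelope reading of those two numbers, not a result from the paper. It assumes a single hardware instance that handles 64 × 64 CUs strictly one after another, that the 4914 cycles cover the full depth decision for one such CU, and that frames are padded up to a whole number of 64 × 64 blocks. The function names and the 1920 × 1080 example frame are hypothetical and chosen for illustration.

```python
import math

# Figures taken from the abstract.
CYCLES_PER_CU64 = 4914      # clock cycles to process one 64x64 CU
CLOCK_HZ = 150e6            # 150 MHz operating frequency


def cu_latency_us(cycles: int, clock_hz: float) -> float:
    """Time to process one 64x64 CU, in microseconds."""
    return cycles / clock_hz * 1e6


def cus_per_frame(width: int, height: int, cu_size: int = 64) -> int:
    """Number of 64x64 CUs covering a frame (assumption: frame padded up)."""
    return math.ceil(width / cu_size) * math.ceil(height / cu_size)


def frames_per_second(width: int, height: int,
                      cycles: int = CYCLES_PER_CU64,
                      clock_hz: float = CLOCK_HZ) -> float:
    """Upper-bound frame rate, assuming one instance and sequential CUs."""
    cycles_per_frame = cus_per_frame(width, height) * cycles
    return clock_hz / cycles_per_frame


if __name__ == "__main__":
    print(f"latency per 64x64 CU: {cu_latency_us(CYCLES_PER_CU64, CLOCK_HZ):.2f} us")
    print(f"64x64 CUs in a 1920x1080 frame: {cus_per_frame(1920, 1080)}")
    print(f"estimated frame rate: {frames_per_second(1920, 1080):.2f} fps")
```

Under these assumptions, one 64 × 64 CU takes about 32.76 µs, and a 1920 × 1080 frame (510 such CUs) would be processed at roughly 60 frames per second. The estimate ignores memory transfer and any overlap between CUs, so it should be read as an order-of-magnitude check only.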