Research Experience


My primary research fields are computer vision, deep learning, and their real-life applications. Despite the latest breakthroughs in deep learning, computer vision algorithms still face many challenges. Numerous variations in real-world data make it difficult for models to generalize across diverse domains. Deploying computationally expensive models on low-cost, resource-constrained devices is another big challenge. I am interested in tackling these challenges by developing more robust and efficient systems for computer vision tasks.
I am also interested in vision and AI algorithms that can improve safety and security in society. One way to do that is to explore how machines can better recognize and understand human appearance, behaviors, attributes, and interactions in an interpretable manner. Specifically, I am interested in working on image and video recognition, biometrics, and action recognition. My final-year undergraduate thesis was on "Efficient Violence Detection from Surveillance Footage" (published at IJCNN 2021). Currently, I am continuing in the field of computer vision as an M.Sc. student at the University of Saskatchewan under the supervision of Dr. Mrigank Rochan.



Publications (4)

ORCID Profile

Efficient Two Stream Network for Violence Detection Using Separable Convolutional LSTM

Zahidul Islam, Mohammad Rukonuzzaman, Raiyan Ahmed, Md. Hasanul Kabir, Moshiur Farazi
International Joint Conference on Neural Networks (IJCNN) 2021
July 2021

Abstract - Automatically detecting violence from surveillance footage is a subset of activity recognition that deserves special attention because of its wide applicability in unmanned security monitoring systems, internet video filtration, etc. In this work, we propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet, where one stream takes in background-suppressed frames as inputs and the other stream processes the difference of adjacent frames. We employed simple and fast input pre-processing techniques that highlight the moving objects in the frames by suppressing non-moving backgrounds and capture the motion in between frames. As violent actions are mostly characterized by body movements, these inputs help produce discriminative features. SepConvLSTM is constructed by replacing the convolution operation at each gate of ConvLSTM with a depthwise separable convolution, which enables producing robust long-range spatio-temporal features while using substantially fewer parameters. We experimented with three fusion methods to combine the output feature maps of the two streams. Evaluation of the proposed methods was done on three standard public datasets. Our model outperforms the previous best accuracy on the larger and more challenging RWF-2000 dataset by more than a 2% margin while matching state-of-the-art results on the smaller datasets. Our experiments lead us to conclude that the proposed models are superior in terms of both computational efficiency and detection accuracy.
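
To give a rough sense of why swapping in depthwise separable convolutions reduces the parameter count, the following is a minimal back-of-the-envelope sketch, not the paper's implementation. The kernel size (3), the channel widths (64), the four-gate layout, and all function names are assumptions chosen for illustration; biases are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution:
    one k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def convlstm_gate_params(conv_fn, k, c_in, c_hidden, gates=4):
    """Weights across all gates, assuming each gate convolves both
    the input and the previous hidden state."""
    per_gate = conv_fn(k, c_in, c_hidden) + conv_fn(k, c_hidden, c_hidden)
    return gates * per_gate


# Illustrative sizes only (assumed, not taken from the paper).
standard = convlstm_gate_params(conv_params, k=3, c_in=64, c_hidden=64)
separable = convlstm_gate_params(separable_conv_params, k=3, c_in=64, c_hidden=64)

print(standard, separable)
```

Under these assumed sizes, the separable variant needs roughly an eighth of the weights of the standard one; the exact saving depends on the kernel size and channel widths actually used.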

Paper

Towards Building A Robust Large-Scale Bangla Text Recognition Solution Using A Unique Multiple-Domain Character-Based Document Recognition Approach

AKM Shahariar Azad Rabby, Md. Majedul Islam, Zahidul Islam, Nazmul Hasan, Fuad Rahman
2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)
13-16 Dec. 2021

Abstract - Bangla is one of the world’s top ten popular languages in terms of the number of speakers. It also happens to have a complex script primarily because of the presence of complex characters e.g. graphemes, that are composed of multiple single characters, and the characteristic shorthands e.g. vowel diacritics, and consonant diacritics, making the number of classes of this script recognition quite large, varied and challenging. In this paper, we present a unique large-scale Bangla document OCR solution based on character-level recognition modules. We have tested our approach on two independent domains - printed and handwritten documents. We also applied our solution to three subdomains within the printed domain - computer-composed documents, letterpress documents, and typewritten documents. Our extensive experiments show that our approach achieves state-of-the-art performance on handwritten and printed documents.

Paper

Data-driven Forecasting of Weather in Bangladesh Leveraging Transformer Network and Strong Inter-feature Correlation

Zahidul Islam, Raisa Fariha
2022 25th IEEE International Conference on Computer and Information Technology (ICCIT)
17-18 Dec. 2022

Abstract - Accurate weather forecasting is indispensable for countries like Bangladesh because of their reliance on agriculture and vulnerability to frequently occurring natural disasters such as floods, cyclones, and riverbank erosion. Bangladesh has a fairly regular annual weather pattern where the weather features, such as temperature, humidity, and rainfall, are highly correlated. Leveraging this inter-feature correlation between temperature and rainfall, we propose a flexible transformer-based neural network that can forecast monthly temperature or rainfall by analyzing the past few data points of either one or both of these weather features. We evaluated the proposed method using a public dataset called Bangladesh Weather Dataset, which contains 115 years of Bangladesh weather data comprising month-wise average temperature and rainfall measurements. Our method demonstrates substantial improvements over the previously proposed approaches for this task in both metrics: mean squared error and mean absolute error. Our proposed transformer network is also significantly more lightweight, computationally efficient, and accurate. This transformer-aided method would pave the way to understand and leverage the complex relationship between the weather features and open up possibilities of coming up with even more robust methods of forecasting weather data.
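
As a small illustration of the forecasting setup described above (predicting the next monthly value from the past few data points, scored with mean squared error and mean absolute error), here is a hedged sketch. It does not reproduce the transformer network: the predictor is a naive "repeat the last value" baseline, the temperature values are made up, and every function name is hypothetical.

```python
def make_windows(series, window):
    """Pair each run of `window` past values with the value that follows it."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up monthly average temperatures, for illustration only.
temps = [18.0, 20.0, 25.0, 28.0, 29.0, 28.0]

pairs = make_windows(temps, window=3)
targets = [target for _, target in pairs]

# Naive stand-in for a learned model: predict the most recent observed value.
predictions = [history[-1] for history, _ in pairs]

mse = mean_squared_error(targets, predictions)
mae = mean_absolute_error(targets, predictions)
print(mse, mae)
```

A learned model would replace the naive predictor and be compared against such baselines on the same windows and metrics.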


A Comparative Analysis of Efficient Convolutional Neural Network Based Methods for Plant Disease Classification

Ridwan Mahbub, Samiha Anuva, Ifrad Khan, Zahidul Islam
2022 25th IEEE International Conference on Computer and Information Technology (ICCIT)
17-18 Dec. 2022

Abstract - Ensuring global food sufficiency and security is one of the prime challenges of the twenty-first century. The most effective approach to tackle this challenge is to ensure a healthy agricultural ecosystem. A potential barrier, in this case, would be the different diseases that commonly infest plants and cause great damage to production. To keep plants disease-free, most countries still rely on human intervention-based approaches. One issue with this approach is that farmers don't get the help they need at the right time owing to manpower shortages. This paves the way for the development and implementation of automated mechanisms to detect and classify plant disease. Using heavy-weight convolutional neural network (CNN) driven solutions is often not practical, as farmers are not equipped with devices capable of running such heavy applications. This is why lightweight CNN architectures capable of operating on mobile and embedded devices are crucial. In this work, we present a comparative analysis and overview of different efficient CNN-based methodologies proposed for plant disease classification. Moreover, we fine-tuned off-the-shelf state-of-the-art efficient CNN architectures using transfer learning to analyze and determine the right balance of model size and accuracy.




Reviewing Experience