Obstacle Detection During Navigation Using Convolutional Neural Networks with LSTM

05 May 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

This research presents a novel approach to obstacle detection during navigation that combines Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks. The primary objective is to generate accurate captions that describe the content of images, a capability that is crucial for applications such as autonomous driving and assistive technologies for the visually impaired. We systematically analyze the architecture of our model, which consists of three main components: a CNN for feature extraction, an LSTM for sequence generation, and a mechanism for sentence formulation. By employing transfer learning with the Inception v3 architecture, we improve the model's performance while reducing computational cost. Our experiments use the Flickr8k dataset, which comprises 8,000 images, each accompanied by five descriptive sentences. We introduce a simplified version of the Gated Recurrent Unit (GRU) as an alternative to the LSTM and show that it achieves comparable performance with fewer parameters, which improves training efficiency. The model is evaluated using the Bilingual Evaluation Understudy (BLEU) score, which quantifies the quality of generated captions against reference sentences. Our architecture achieves a BLEU score of approximately 80% on the training set and approximately 75% on the test set, indicating that it can produce semantically and grammatically correct captions. Additionally, we explore the integration of attention mechanisms to help the model focus on relevant image features during caption generation. The findings suggest that our approach not only meets the challenges of automatic image captioning but also holds potential for broader applications in image understanding and navigation systems. Future work will involve expanding the dataset and refining the model to further improve accuracy and robustness in diverse scenarios.
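
As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of sentence-level BLEU in plain Python. It follows the standard definition (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) with no smoothing. The abstract does not state which n-gram order, smoothing, or corpus-level aggregation the authors used, so those choices, along with the function names `ngrams`, `modified_precision`, and `bleu`, are assumptions made for this example only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU in [0, 1] (assumed variant, see lead-in)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical captions, not taken from Flickr8k.
references = [
    "a dog runs on the grass".split(),
    "a dog is running across a field".split(),
]
candidate = "a dog runs on the field".split()
score = bleu(candidate, references)
```

The function returns a value between 0 and 1; multiplying by 100 gives the percentage form used in the abstract. Flickr8k provides five reference sentences per image, which would be passed together as `references` for each generated caption.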
