Paolo Cudrano
PhD Candidate,
Continual Learning
for Embodied AI,
AIRLab,
Politecnico di Milano,
Italy
Publications
You can also find my articles on my Google Scholar profile.
2024
The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry
Paolo Cudrano*, Xiaoyu Luo*, and Matteo Matteucci
In Third Conference on Lifelong Learning Agents (CoLLAs), Jul 2024
As robotics continues to advance, the need for adaptive and continuously learning embodied agents increases, particularly in the realm of assistance robotics. Quick adaptability and long-term information retention are essential to operate in the dynamic environments typical of humans’ everyday lives. A lifelong learning paradigm is thus required, but it is scarcely addressed by the current robotics literature. This study empirically investigates the impact of catastrophic forgetting and the effectiveness of knowledge transfer in neural networks trained continuously in an embodied setting. We focus on the task of visual odometry, which is of primary importance for embodied agents, as it enables their self-localization. We experiment on the simple continual scenario of discrete transitions between indoor locations, akin to a robot navigating different apartments. In this regime, we observe initially satisfactory performance with high transferability between environments, followed by a specialization phase in which the model prioritizes current environment-specific knowledge at the expense of generalization. Conventional regularization strategies and increased model capacity prove ineffective in mitigating this phenomenon. Rehearsal is instead mildly beneficial, but at the cost of a substantial memory footprint. Incorporating action information, as commonly done in embodied settings, facilitates quicker convergence but exacerbates specialization, making the model overly reliant on its motion expectations and less adept at correctly interpreting visual cues. These findings emphasize the open challenges of balancing adaptation and memory retention in lifelong robotics and contribute valuable insights into the application of a lifelong paradigm to embodied agents.
Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?
Cristian Sbrolli, Paolo Cudrano, and Matteo Matteucci
In International Joint Conference on Neural Networks (IJCNN), Jun 2024
Recent advancements in deep generative models, particularly with the application of CLIP (Contrastive Language-Image Pretraining) to Denoising Diffusion Probabilistic Models (DDPMs), have demonstrated remarkable effectiveness in text-to-image generation. The well-structured embedding space of CLIP has also been extended to image-to-shape generation with DDPMs, yielding notable results. Despite these successes, some fundamental questions arise: Does CLIP ensure the best results in shape generation from images? Can we leverage conditioning to bring explicit 3D knowledge into the generative process and obtain better quality? This study introduces CISP (Contrastive Image-Shape Pre-training), designed to enhance 3D shape synthesis guided by 2D images. CISP aims to enrich the CLIP framework by aligning 2D images with 3D shapes in a shared embedding space, specifically capturing 3D characteristics potentially overlooked by CLIP’s text-image focus. Our comprehensive analysis assesses CISP’s guidance performance against CLIP-guided models, focusing on generation quality, diversity, and coherence of the produced shapes with the conditioning image. We find that, while matching CLIP in generation quality and diversity, CISP substantially improves coherence with input images, underscoring the value of incorporating 3D knowledge into generative models. These findings suggest a promising direction for advancing the synthesis of 3D visual content by integrating multimodal systems with 3D representations.
RadarLCD: Learnable Radar-based Loop Closure Detection Pipeline
Mirko Usuelli, Matteo Frosi, Paolo Cudrano, Simone Mentasti, and Matteo Matteucci
In International Joint Conference on Neural Networks (IJCNN), Jun 2024
Loop Closure Detection (LCD) is an essential task in robotics and computer vision, serving as a fundamental component for various applications across diverse domains. These applications encompass object recognition, image retrieval, and video analysis. LCD consists of identifying whether a robot has returned to a previously visited location, referred to as a loop, and then estimating the related roto-translation with respect to the analyzed location. Despite the numerous advantages of radar sensors, such as their ability to operate under diverse weather conditions and provide a wider range of view compared to other commonly used sensors (e.g., cameras or LiDARs), integrating radar data remains an arduous task due to intrinsic noise and distortion. To address this challenge, this research introduces RadarLCD, a novel supervised deep learning pipeline specifically designed for Loop Closure Detection using the FMCW (Frequency Modulated Continuous Wave) radar sensor. RadarLCD, a learning-based LCD methodology explicitly designed for radar systems, makes a significant contribution by leveraging the pre-trained HERO (Hybrid Estimation Radar Odometry) model. Although HERO was originally developed for radar odometry, its features are used to select key points crucial for LCD tasks. The methodology is evaluated across a variety of FMCW radar dataset scenes and compared to state-of-the-art systems such as Scan Context for place recognition and ICP for loop closure. The results demonstrate that RadarLCD surpasses these alternatives in multiple aspects of Loop Closure Detection.
OptimusLine: Consistent Road Line Detection Through Time
Paolo Cudrano*, Simone Mentasti*, Riccardo Erminio Filippo Cortelazzo*, and Matteo Matteucci
In 2024 IEEE Intelligent Vehicles Symposium (IV), Jun 2024
In the field of autonomous vehicles, the detection of road line markings is a crucial yet versatile component. It provides real-time guidance for navigation and low-level vehicle control, while also enabling the generation of lane-level HD maps. These maps require high precision to provide low-level details to all future map users. At the same time, control-oriented detection pipelines require a high inference frequency and high robustness to be deployed on a safety-critical system. With this work, we present OptimusLine, a versatile line detection pipeline that tackles both scenarios with ease. Built around a frame-by-frame transformer-based neural model performing image segmentation, OptimusLine achieves state-of-the-art performance, and we analyze its computational impact. To provide robustness to perturbations when deployed on an actual vehicle, OptimusLine introduces a scheme exploiting temporal links between consecutive frames. By enforcing temporal consistency on each new line prediction, OptimusLine can generate more robust line descriptions and produce an estimate of its prediction uncertainty.
Semantic interpretation of raw survey vehicle sensory data for lane-level HD map generation
Matteo Bellusci, Paolo Cudrano, Simone Mentasti, Riccardo Erminio Filippo Cortelazzo, and Matteo Matteucci
Robotics and Autonomous Systems, Feb 2024
High-definition (HD) maps provide a complementary source of information for Advanced Driver Assistance Systems (ADAS), allowing them to better understand the vehicle’s surroundings and make more informed decisions. HD maps are also largely employed in virtual testing phases to evaluate the behavior of ADAS components under simulated conditions. With the advent of autonomous sensorized vehicles, raw machine-oriented data will be increasingly available. The proposed pipeline aims to provide a high-level semantic interpretation of raw vehicle sensory data to derive, in an automated fashion, lane-oriented HD maps of the environment. We first present RoadStarNet, a deep learning architecture designed to extract and classify road line markings from imagery data. We show how to obtain a semantic Bird’s-Eye View (BEV) mapping of the extracted road line markings by exploiting frame-by-frame localization information. Then, we present how to progress to a graph-based representation that allows modeling complex road line markings’ structures practically, as this representation can be leveraged to produce a Lanelet2 format HD map. Lastly, we experimentally evaluate the proposed approach in real-world scenarios in terms of accuracy and coverage performance.
2023
Continual Cross-Dataset Adaptation in Road Surface Classification
Paolo Cudrano, Matteo Bellusci, Giuseppe Macino, and Matteo Matteucci
In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Sep 2023
Accurate road surface classification is crucial for autonomous vehicles (AVs) to optimize driving conditions, enhance safety, and enable advanced road mapping. However, deep learning models for road surface classification suffer from poor generalization when tested on unseen datasets. Updating these models with new information also requires taking the original training dataset into account to avoid catastrophic forgetting. This is, however, inefficient if not impossible, e.g., when the data is collected in streams or in large amounts. To overcome this limitation and enable fast and efficient cross-dataset adaptation, we propose to employ continual learning finetuning methods designed to retain past knowledge while adapting to new data, thus effectively avoiding forgetting. Experimental results demonstrate the superiority of this approach over naive finetuning, achieving performance close to fresh retraining. While solving this known problem, we also provide a general description of how the same technique can be adopted in other AV scenarios. We highlight the potential computational and economic benefits that continual-based adaptation can bring to the AV industry, while also reducing the greenhouse gas emissions caused by unnecessary joint retraining.
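To make the retention idea concrete, here is a toy, pure-Python sketch (my own illustration, not the paper’s method or code) of regularization-based continual finetuning: the fit on the new task is anchored to the previously trained weight, trading a little plasticity on the new data for memory of the old.

```python
# Toy illustration of anchored (regularized) finetuning for a 1-parameter
# linear model y ~ w * x. All names and numbers here are hypothetical.

def fit(xs, ys, w_anchor=None, lam=0.0):
    """Least-squares fit of w, optionally penalized by lam * (w - w_anchor)^2.

    Closed-form minimizer of  sum_i (w*x_i - y_i)^2 + lam*(w - w_anchor)^2.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    if w_anchor is None:
        return sxy / sxx                      # plain least squares
    return (sxy + lam * w_anchor) / (sxx + lam)

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# "Old" dataset follows y = 2x; "new" dataset follows y = 3x.
old_x, old_y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
new_x, new_y = [1.0, 2.0, 3.0], [3.0, 6.0, 9.0]

w_old = fit(old_x, old_y)                                  # exactly 2.0
w_naive = fit(new_x, new_y)                                # exactly 3.0, forgets the old task
w_cl = fit(new_x, new_y, w_anchor=w_old, lam=14.0)         # compromise: 2.5

# The anchored weight forgets far less on the old task than naive finetuning.
print(mse(w_naive, old_x, old_y), mse(w_cl, old_x, old_y))
```

The anchor term here plays the role that parameter-regularization methods play in continual learning: forgetting on the old data drops while the new task is still partially learned, with no need to store the old dataset itself.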
Semantic Bird’s-Eye View Road Line Mapping
Matteo Bellusci, Paolo Cudrano, Simone Mentasti, Riccardo Erminio Filippo Cortelazzo, and Matteo Matteucci
In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Sep 2023
The development of Autonomous Vehicles (AVs) today requires precise and reliable detection of road line markings. Indeed, recognizing road line markings from camera images acquired by the vehicle plays a crucial role in ensuring its safe navigation and improving its driving performance. Road line detection is of key importance in real-time scenarios for navigation purposes, as well as offline for the generation of HD maps. In recent years, deep neural networks have proven effective in performing this task. In particular, Convolutional Neural Networks (CNNs) have helped develop multiple Advanced Driver Assistance Systems (ADAS), now fully integrated into common commercial vehicles. This paper presents a novel CNN-based pipeline for recognizing road line markings from front-view camera images in an online setup, and it shows how these detections can be aggregated offline into aerial-like maps as a first step toward the creation of HD maps. The proposed architecture comprises a multi-decoder to accurately classify image pixels representing different classes of road line markings, as well as those related to the drivable area. The mapping system then projects the extracted road line points into the Bird’s-Eye View (BEV) space and integrates the extracted information with accurate localization measurements for georeferencing. Experimental evaluations on real-world data, including data acquired with instrumented vehicles, reveal the effectiveness of the proposed pipeline in both frame-by-frame detection and integrated mapping quality.
Beyond Image-Plane-Level: A Dataset for Validating End-to-End Line Detection Algorithms for Autonomous Vehicles
Simone Mentasti, Paolo Cudrano, Stefano Arrigoni, Matteo Matteucci, and Federico Cheli
In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Sep 2023
In response to the accelerating deployment of autonomous vehicles and the growing reliance on deep learning-based algorithms, multiple line detection datasets have been released in the last few years. However, current datasets tend to focus only on image-plane-level line detection, neglecting the broader scope of this task and thus limiting their utility for comprehensive validation. To address this limitation, this study proposes a novel, custom-acquired dataset designed to enhance the validation of complete line detection pipelines. In particular, our dataset was recorded in controlled environments, capturing the RTK-GNSS position of road lines and the position of a vehicle equipped with a wide field of view camera. The dataset, which was recorded on closed race tracks, presents realistic challenges for line detection algorithms, particularly in narrow sections and rapid chicanes of the roads where lateral lines are not always visible. Furthermore, it offers researchers a unique resource, providing precise ground truth information for both road lines and vehicle positions, enabling the evaluation of complete line detection pipelines from the segmentation phase to the mapping one. Finally, the dataset also reflects the diverse and challenging situations faced by autonomous vehicles in the real world (i.e., multiple weather conditions, sharp cornering sections, tunnels, etc.), making it a valuable tool for enhancing the performance and safety of autonomous vehicles. The complete dataset is made available to researchers at https://airlab.deib.polimi.it/datasets-and-tools/.
CISPc: Embedding Images and Point Clouds in a Joint Concept Space by Contrastive Learning
Cristian Sbrolli, Paolo Cudrano, and Matteo Matteucci
In Image Analysis and Processing – ICIAP 2023, Sep 2023
In recent years, deep learning models have achieved remarkable success in computer vision tasks, but their ability to process and reason about multi-modal data has remained limited. The emergence of models leveraging a contrastive loss to learn a joint embedding space for images and text has sparked research in multi-modal unsupervised alignment. This paper proposes a contrastive model for the multi-modal alignment of images and 3D representations. In particular, we study the alignment of images and raw point clouds in a learned latent space. The effectiveness of the proposed model is demonstrated through various experiments, including 3D shape retrieval from a single image, testing on out-of-distribution data, and latent space analysis.
2022
Clothoid-Based Lane-Level High-Definition Maps: Unifying Sensing and Control Models
Paolo Cudrano*, Barbara Gallazzi*, Matteo Frosi, Simone Mentasti, and Matteo Matteucci
IEEE Vehicular Technology Magazine, Dec 2022
Autonomous vehicles rely on lane-level high-definition (HD) maps for self-localization and trajectory planning. Current mapping, however, relies on simple line models, while clothoid curves have unexplored potential. Clothoids, well known in road design, are often chosen to model the vehicle trajectory in planning and control systems, as they describe the road with higher fidelity. For this reason, we propose two vision-based pipelines for generating lane-level HD maps using clothoid models. The first pipeline performs mapping with known poses, requiring precise real-time kinematic (RTK) GPS measurements; the second copes with noisy localizations, solving the simultaneous localization and mapping (SLAM) problem. Both pipelines rely on a line detection algorithm to identify each line marking and perform a graph-based optimization to estimate the map.
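For background (standard road-geometry material, not taken from the article itself): a clothoid is the curve whose curvature varies linearly with arc length, which is what lets a single primitive describe straights, circular arcs, and smooth transitions between them:

```latex
% Clothoid (Euler spiral): curvature linear in arc length s
\kappa(s) = \kappa_0 + c\,s,
\qquad
\theta(s) = \theta_0 + \kappa_0\,s + \tfrac{c}{2}\,s^2,
\qquad
x(s) = x_0 + \int_0^s \cos\theta(t)\,dt,
\quad
y(s) = y_0 + \int_0^s \sin\theta(t)\,dt .
```

Setting $c = 0$ recovers a circular arc and $\kappa_0 = c = 0$ a straight line, which is why a clothoidal spline can represent an entire lane with few parameters and curvature that is continuous for a controller to track.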
IC3D: Image-Conditioned 3D Diffusion for Shape Generation
Cristian Sbrolli, Paolo Cudrano, Matteo Frosi, and Matteo Matteucci
arXiv preprint arXiv:2211.10865, Nov 2022 (Updated Sep 2023)
In recent years, Denoising Diffusion Probabilistic Models (DDPMs) have obtained state-of-the-art results in many generative tasks, outperforming GANs and other classes of generative models. In particular, they have reached impressive results in various image generation sub-tasks, among which conditional generation tasks such as text-guided image synthesis. Given the success of DDPMs in 2D generation, they have more recently been applied to 3D shape generation, outperforming previous approaches and reaching state-of-the-art results. However, 3D data pose additional challenges, such as the choice of the 3D representation, which impacts design choices and model efficiency. While reaching state-of-the-art generation quality, existing 3D DDPM works make little or no use of guidance, being mainly unconditional or class-conditional. In this paper, we present IC3D, the first Image-Conditioned 3D Diffusion model that generates 3D shapes by image guidance. It is also the first 3D DDPM model that adopts voxels as a 3D representation. To guide our DDPM, we present and leverage CISP (Contrastive Image-Shape Pre-training), a model jointly embedding images and shapes by contrastive pre-training, inspired by text-to-image DDPM works. Our generative diffusion model outperforms the state-of-the-art in 3D generation quality and diversity. Furthermore, in a side-by-side human evaluation, our generated shapes are preferred over those of a state-of-the-art single-view 3D reconstruction model in terms of quality and coherence with the query image.
Clothoidal Mapping of Road Line Markings for Autonomous Driving High-Definition Maps
Barbara Gallazzi*, Paolo Cudrano*, Matteo Frosi, Simone Mentasti, and Matteo Matteucci
In 2022 IEEE Intelligent Vehicles Symposium (IV), Jun 2022
Lane-level HD maps are crucial for trajectory planning and control in current autonomous vehicles. For this reason, appropriate line models should be adopted to define them. Whereas mapping algorithms often rely on inaccurate representations, clothoid curves possess peculiar smoothness properties that make them desirable representations of road lines in control algorithms. We propose a multi-stage pipeline for the generation of lane-level HD maps from monocular vision relying on clothoidal spline models. We obtain measurements of the line positions using a line detection algorithm, and we exploit a graph-based optimization framework to reach an optimal fitting. An iterative greedy procedure reduces the model complexity by removing unnecessary clothoids. We validate our system on a real-world dataset, which we make publicly available for further research at https://airlab.deib.polimi.it/datasets-and-tools/.
Detection and mapping of crop weeds and litter for agricultural robots
Paolo Cudrano, Simone Mentasti, Emanuele Locatelli, Matteo Nicolò, Samuele Portanti, Alessandro Romito, Sotirios Stavrakopoulos, Gülce Topal, Mirko Usuelli, Matteo Zinzani, and Matteo Matteucci
In 2022 AEIT International Annual Conference (AEIT), Oct 2022
Agricultural robotics is a rapidly growing research field. Using robots to help or substitute human workers presents numerous advantages. Many tasks, like field monitoring and harvesting, are relatively simple but time-consuming. Robots, instead, can perform these tasks with high precision and without interruptions, guaranteeing a continuous analysis of the field and a constant stream of information delivered to the farmers. The availability of such fine-grained information can be exploited to use the soil more efficiently and decrease the need for pesticides. The development of a robust platform for autonomous field navigation and monitoring is the first step toward these goals. We propose a pipeline to control a small robot in a crop field without the need for expensive sensors, such as RTK-GPS or 3D lidars. Additionally, we present an algorithm for the detection and mapping of weeds and undesired objects such as litter, proving the capability of the system to autonomously monitor the state of the field while traversing it.
2020
Advances in centerline estimation for autonomous lateral control
Paolo Cudrano, Simone Mentasti, Matteo Matteucci, Mattia Bersani, Stefano Arrigoni, and Federico Cheli
In 2020 IEEE Intelligent Vehicles Symposium (IV), Oct 2020
The ability of autonomous vehicles to maintain an accurate trajectory within their road lane is crucial for safe operation. This requires detecting the road lines and estimating the car’s relative pose within its lane. Lateral lines are usually retrieved from camera images. Still, most works on line detection are limited to image-mask retrieval and do not provide a usable representation in world coordinates. In this paper, we propose a complete perception pipeline based on monocular vision that retrieves all the information required by a vehicle lateral control system: the road line equations, the centerline, the vehicle heading, and the lateral displacement. We evaluate our system by acquiring data with accurate geometric ground truth. To act as a benchmark for further research, we make this new dataset publicly available at http://airlab.deib.polimi.it/datasets/.
Robust vehicle pose estimation from vision and INS fusion
Mattia Bersani, Simone Mentasti, Paolo Cudrano, Michele Vignati, Matteo Matteucci, and Federico Cheli
In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Sep 2020
The vehicle’s relative pose, referenced to road boundaries or lanes, is fundamental for an autonomous vehicle to perform motion planning. The estimate of the relative pose includes the difference between the car heading angle and the road center-line direction (i.e., the relative heading angle), together with the lateral displacement of the vehicle from the road center-line. This information can be derived either by an inertial navigation system equipped with GPS receivers or by a vision system based on cameras. However, both solutions present some disadvantages. GPS-based estimates of the heading angle lose accuracy during low-speed manoeuvres or within tunnels. On the other hand, the line-fitting estimation typical of vision systems suffers from limited line accuracy and is hard to implement in urban environments or high-curvature track scenarios. This work presents an innovative integrated algorithm that fuses these two approaches, providing the vehicle’s relative pose with respect to the road lane by combining data coming from an inertial navigation system and a line detection algorithm. This integrated solution can then always feed the planning algorithm with an accurate estimate of the vehicle pose in multiple challenging scenarios.