Ph.D. student in Smart Computing at the University of Florence, I work on Artificial Intelligence, developing applications to Particle Physics and Medical Physics. Passionate about innovation and scientific outreach, I never turn down a chance to go out and discuss new ideas.
In June 2020 I obtained my master's degree in Physics and Astrophysics with a thesis for which I spent three months at CERN, thanks to a scholarship funded by the National Institute for Nuclear Physics (INFN).
news
selected publications
-
Lamarr: LHCb ultra-fast simulation based on machine learning models
in 21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2022)
arXiv preprint – Submitted on 20 Mar 2023
About 90% of the computing resources available to the LHCb experiment have been spent to produce simulated data samples for Run 2 of the Large Hadron Collider at CERN. The upgraded LHCb detector will be able to collect larger data samples, requiring many more simulated events to analyze the data to be collected in Run 3. Simulation is a key necessity of analysis to interpret signal versus background and to measure efficiencies. The needed simulation will far exceed the pledged resources, requiring an evolution in technologies and techniques to produce these simulated data samples. In this contribution, we discuss Lamarr, a Gaudi-based framework to speed up the simulation production by parametrizing both the detector response and the reconstruction algorithms of the LHCb experiment. Deep Generative Models powered by several algorithms and strategies are employed to effectively parametrize the high-level response of the single components of the LHCb detector, encoding within neural networks the experimental errors and uncertainties introduced in the detection and reconstruction phases. Where possible, models are trained directly on real data, statistically subtracting any background components through the application of weights. Embedding Lamarr in the general LHCb Gauss simulation framework makes it possible to combine its execution with any of the available generators in a seamless way. The resulting software package enables a simulation process completely independent of the Detailed Simulation used to date.
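The weight-based background subtraction mentioned in the abstract can be illustrated with a minimal, self-contained sketch. The event values and weights below are invented for illustration; in practice the per-event weights would come from a statistical procedure such as sPlot, arranged so that background contributions cancel on average:

```python
# Toy sketch: per-event weights make background contributions cancel,
# so weighted statistics estimate signal-only quantities.
# All numbers are invented for illustration.

events = [
    # (observable value, per-event weight)
    (0.9, 1.1), (1.0, 0.9), (1.1, 1.0),      # signal-like events, weights ~ +1
    (5.0, 0.2), (4.9, -0.1), (5.1, -0.1),    # background-like events, weights summing ~ 0
]

naive_mean = sum(x for x, _ in events) / len(events)            # biased by background
w_sum = sum(w for _, w in events)
weighted_mean = sum(x * w for x, w in events) / w_sum           # close to the signal mean

print(f"naive mean: {naive_mean:.2f}, weighted mean: {weighted_mean:.2f}")
```

Here the unweighted mean is pulled toward the background population, while the weighted mean recovers a value close to the signal-only mean of ~1.0.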
-
Hyperparameter Optimization as a Service on INFN Cloud
Matteo Barbetti, and Lucio Anderlini
in 21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2022)
arXiv preprint – Submitted on 13 Jan 2023
The simplest and often most effective way of parallelizing the training of complex Machine Learning models is to execute several training instances on multiple machines, possibly scanning the hyperparameter space to optimize the underlying statistical model and the learning procedure. Often, such a meta-learning procedure is limited by the ability to securely access a common database organizing the knowledge of previous and ongoing trials. Exploiting opportunistic GPUs provided in different environments represents a further challenge when designing such optimization campaigns. In this contribution we discuss how a set of REST APIs can be used to access a dedicated service based on INFN Cloud to monitor and possibly coordinate multiple training instances, with gradient-free optimization techniques, via simple HTTP requests. The service, named Hopaas (Hyperparameter OPtimization As A Service), consists of a web interface and a set of APIs implemented with a FastAPI back-end running through Uvicorn and NGINX in a virtual instance of INFN Cloud. The optimization algorithms are currently based on Bayesian techniques as provided by Optuna. A Python front-end is also made available for quick prototyping. We present applications to hyperparameter optimization campaigns performed by combining private, INFN Cloud, and CINECA resources.
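The coordination pattern behind such a service is an ask-and-tell loop: each training instance asks a central broker for a hyperparameter suggestion, runs a trial, and reports the result back. A toy in-process sketch follows; the `TrialBroker` class, the objective function, and the random sampler (standing in for Optuna's Bayesian sampler and the actual Hopaas HTTP APIs) are all hypothetical:

```python
import random

class TrialBroker:
    """Toy stand-in for a central optimization service: hands out
    hyperparameter suggestions ("ask") and records results ("tell")."""

    def __init__(self, space, seed=42):
        self.space = space                 # {name: (low, high)}
        self.rng = random.Random(seed)
        self.history = []                  # list of (params, score) pairs

    def ask(self):
        # Random search stands in here for a Bayesian sampler.
        return {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.space.items()}

    def tell(self, params, score):
        self.history.append((params, score))

    def best(self):
        return min(self.history, key=lambda t: t[1])

# Each "training instance" runs this loop; in the real service the
# ask/tell calls would be HTTP requests to the broker.
broker = TrialBroker({"lr": (1e-4, 1e-1), "dropout": (0.0, 0.5)})
for _ in range(20):
    params = broker.ask()
    loss = (params["lr"] - 0.01) ** 2 + (params["dropout"] - 0.2) ** 2  # toy objective
    broker.tell(params, loss)

best_params, best_loss = broker.best()
```

Because the broker owns the trial history, any number of workers on heterogeneous resources can join or leave the campaign without coordinating with each other.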
-
Lamarr: the ultra-fast simulation option for the LHCb experiment
L. Anderlini, M. Barbetti, G. Corti, and 7 more authors
in 41st International Conference on High Energy Physics (ICHEP 2022)
PoS 414 (2023) 233 – Published on 15 Jun 2023
During Run 2 of the Large Hadron Collider at CERN, the LHCb experiment has spent more than 80% of the pledged CPU time to produce simulated data samples. The upcoming upgraded version of the experiment will be able to collect larger data samples, requiring many more simulated events to analyze the data to be collected in Run 3. Simulation is a key necessity of analysis to interpret signal versus background and to measure efficiencies. The needed simulation will far exceed the pledged resources, requiring an evolution in technologies and techniques to produce these simulated samples. In this contribution, we discuss Lamarr, a Gaudi-based framework to deliver simulated samples by parametrizing both the detector response and the reconstruction algorithms. Generative Models powered by several algorithms and strategies are employed to effectively parametrize the high-level response of the single components of the LHCb detector, encoding within neural networks the experimental errors and uncertainties introduced in the detection and reconstruction process. Where possible, models are trained directly on real data, resulting in a simulation process completely independent of the detailed simulation used to date.
-
Towards Reliable Neural Generative Modeling of Detectors
Lucio Anderlini, Matteo Barbetti, Denis Derkach, and 3 more authors
in 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2021)
J. Phys.: Conf. Ser. 2438 (2023) 012130 – Published on 15 Feb 2023
The increasing luminosities of future data taking at the Large Hadron Collider and next-generation collider experiments require an unprecedented amount of simulated events to be produced. Such large-scale productions demand a significant amount of valuable computing resources. This brings a demand for new approaches to event generation and to the simulation of detector responses. In this paper, we discuss the application of generative adversarial networks (GANs) to the simulation of LHCb experiment events. We emphasize the main pitfalls in the application of GANs and study the systematic effects in detail. The presented results are based on the Geant4 simulation of the LHCb Cherenkov detector.
-
A full detector description using neural network driven simulation
F. Ratnikov, A. Rogachev, S. Mokhnenko, and 7 more authors
in 15th Pisa Meeting on Advanced Detectors
Nucl. Instrum. Meth. A 1046 (2023) 167591 – Published on 11 Jan 2023
The abundance of data arriving in the new runs of the Large Hadron Collider creates tough requirements for the amount of necessary simulated events and thus for the speed of generating such events. Current approaches can suffer from long generation times and from the substantial storage resources needed to preserve the simulated datasets. The development of new fast generation techniques is thus crucial for the proper functioning of the experiments. We present a novel approach to simulating LHCb detector events using generative machine learning algorithms and other statistical tools. The approach combines the speed and flexibility of neural networks and encapsulates knowledge about the detector in the form of statistical patterns. Whenever possible, the algorithms are trained using real data, which enhances their robustness against differences between real data and simulation. We discuss the particularities of neural-network detector simulation implementations and the corresponding systematic uncertainties.
-
\(\mathtt{scikinC}\): a tool for deploying machine learning as binaries
Lucio Anderlini, and Matteo Barbetti
in Computational Tools for High Energy Physics and Cosmology
PoS 409 (2022) 034 – Published on 14 Jul 2022
Machine Learning plays a major role in Computational Physics, providing arbitrarily good approximations of arbitrarily complex functions, for example with Artificial Neural Networks and Boosted Decision Trees. Unfortunately, the integration of Machine Learning models trained with Python frameworks into production code, often developed in C, C++, or FORTRAN, is a notoriously complicated task. We present scikinC, a transpiler for scikit-learn and Keras models into plain C functions, intended to be compiled into shared objects and linked to other applications. An example application to the parametrization of a detector is discussed.
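The core idea of such a transpiler can be conveyed with a toy sketch: walk a fitted model's structure and emit an equivalent C function. This is not scikinC's actual output format or API; the tree encoding and the generated `predict` signature below are invented for illustration:

```python
# Toy illustration of model-to-C transpilation: turn a tiny decision
# tree (hard-coded here in place of a fitted scikit-learn model) into
# the source of a plain C function with no Python dependency.

tree = {
    "feature": 0, "threshold": 0.5,
    "left":  {"value": 0.1},
    "right": {"feature": 1, "threshold": 1.5,
              "left": {"value": 0.7}, "right": {"value": 0.9}},
}

def emit(node, indent="  "):
    """Recursively emit C code for one tree node."""
    if "value" in node:                       # leaf: return the prediction
        return f"{indent}return {node['value']}f;\n"
    return (f"{indent}if (x[{node['feature']}] <= {node['threshold']}f) {{\n"
            + emit(node["left"], indent + "  ")
            + f"{indent}}} else {{\n"
            + emit(node["right"], indent + "  ")
            + f"{indent}}}\n")

c_source = "float predict(const float *x) {\n" + emit(tree) + "}\n"
print(c_source)
```

The emitted string can be written to a `.c` file, compiled into a shared object, and called from C, C++, or FORTRAN code with no Python runtime involved, which is the deployment model the paper describes.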