The Industry 4.0 concept, with its ongoing digitalisation and automation, has fundamentally changed the nature of industrial enterprises. Alongside the advantages of connectivity, it has introduced new challenges, including the expansion of the so-called “attack surface”. On top of the physical threats that industrial organisations already faced, cyber and cyber-physical threats have been added.
Given the top priorities of OT – including enterprise productivity, process reliability and, most importantly, people’s safety – security has been one of the key items on the industrial agenda for about a decade. To spot suspicious activity and recognise cyber-attacks in time, industrial organisations use SIEM (Security Information and Event Management) systems, which collect event logs from their security systems and correlate them to reveal anomalous patterns.
Still, criminals do not stop and constantly come up with new ways to compromise an industrial enterprise. Taking advantage of the fact that physical and cyber security systems work independently of each other, they mix cyber and physical approaches to conduct attacks.
But what are the consequences? A successful cyber or physical attack on connected industrial control systems and networks can disrupt operations or even deny critical services to society. Consider, for example, the attack on Colonial Pipeline, which halted pipeline operations for six days, leading to a fuel crisis and increased prices in the eastern U.S.
The risks are significant and real. By 2025, 30% of critical infrastructure organisations will experience a security breach that will result in the halting of an operations- or mission-critical cyber-physical system, according to Gartner.
To provide Italian industrial companies with the right solution in time, Sababa Security joined forces with Gruppo Iren and the University of Genoa and, with support and funding from the Start 4.0 Competence Center, developed a machine learning (ML) algorithm capable of collecting, processing, and correlating security events from cyber, physical, and cyber-physical security systems. The project took one year from start to finish, and we asked the project leaders to share some insights and lessons learned.
A cybersecurity project like this required realistic data from the field, scientific research, commercial guidance, and funds. Start 4.0 supported the project with substantial funding, while Gruppo Iren became the source of raw data and the model for the solution. The University of Genoa was in charge of sorting and classifying the provided data, training the AI and testing the algorithm, while Sababa Security supervised and led the activity.
As cybersecurity is a hot topic in the industrial world, such projects have become common over the last 7-8 years. However, this one was unique for the entire team, as it started in June 2020, in the middle of the pandemic. Therefore, unlike other cybersecurity projects, it left more space for reflection, owing to the lack of personal contact and exchange of opinions with colleagues over a cup of coffee.
The 12 months allocated for the project were split into 3 stages: data classification, algorithm feeding, and testing.
Stage 1 – Getting the real data from the real plant. The first stage, which consisted of collecting data inside Iren’s environment, was the most challenging one.
“We collected information from different sources, which is usually not merged together, but once it is, it guarantees more visibility, security and control across both IT and OT operations”, says Mario Marchese, Professor and Head of the Laboratory Satellite Communications and Networking at the University of Genoa.
Faced with a large and heterogeneous volume of raw data, the team's first effort was so-called feature extraction: selecting the data that are informative, useful, and suitable for Machine Learning given the task to be performed, which in this case was the detection of anomalies and unusual behaviours.
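To make the idea of feature extraction more concrete, here is a minimal sketch in Python. It is not the project's actual pipeline: the event fields, the list of sources and the chosen features are all hypothetical, picked only to show how a raw log record can be turned into numeric values that an ML model could consume.

```python
from datetime import datetime

# Hypothetical raw event: the field names are illustrative, not the project's schema.
raw_event = {
    "timestamp": "2021-03-14T02:30:00",
    "source": "vpn",
    "user": "operator7",
    "action": "login",
    "bytes": 18234,
}

# Assumed set of event sources, used for one-hot encoding.
KNOWN_SOURCES = ["firewall", "vpn", "scada", "access_control"]


def extract_features(event):
    """Turn one raw event into a flat dictionary of numeric features."""
    ts = datetime.fromisoformat(event["timestamp"])
    features = {
        "hour": ts.hour,                       # time of day often matters for anomalies
        "is_weekend": int(ts.weekday() >= 5),  # Saturday is 5, Sunday is 6
        "bytes": float(event.get("bytes", 0)),
    }
    # One-hot encode the source so heterogeneous events share one representation.
    for name in KNOWN_SOURCES:
        features[f"source_{name}"] = int(event["source"] == name)
    return features


features = extract_features(raw_event)
print(features)
```

Fields such as the user name are deliberately left out of this sketch; deciding which of them carry useful signal is exactly the selection work described above.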
“Having so many different pieces of information, the most complicated part has been their integration, which consisted of defining dynamics that an Artificial Intelligence algorithm would be able to detect and process”, explains Fabio Patrone, researcher and co-leader of Professor Marchese. “It was extremely difficult to figure out which data were actually useful and which behaviours required special attention. For example, let’s consider a bunch of simple events: I am standing at the window, a car is passing by, and a bird flies between me and the car. Most probably this last event is irrelevant. But what if the bird flies there every day at the same time? It may be suspicious and therefore needs to be taken into account”.
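The bird example can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the algorithm used in the project: the event names, the one-hour time bucket and the `min_days` threshold are assumptions. The function reports events that show up in the same hour of the day on several distinct days, which is the kind of regularity that turns an apparently irrelevant event into one worth examining.

```python
from collections import defaultdict
from datetime import datetime


def recurring_events(events, min_days=3):
    """Return (event_type, hour) pairs observed on at least min_days distinct days."""
    days_seen = defaultdict(set)
    for ts_str, event_type in events:
        ts = datetime.fromisoformat(ts_str)
        # Bucket by hour of day; a finer bucket is an equally valid choice.
        days_seen[(event_type, ts.hour)].add(ts.date())
    return {key for key, days in days_seen.items() if len(days) >= min_days}


# Hypothetical observations: the bird appears around 09:00 on three different days.
events = [
    ("2021-03-01T09:02:00", "bird"),
    ("2021-03-01T14:40:00", "car"),
    ("2021-03-02T09:05:00", "bird"),
    ("2021-03-03T09:01:00", "bird"),
    ("2021-03-03T17:15:00", "car"),
]

print(recurring_events(events))
```

With these sample observations only the bird at 09:00 is reported; the cars pass at different hours and are ignored.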
In this initial part of the project, the participation of Iren was of fundamental importance: many of its employees were involved, from the Plant Manager to the ICT Security Specialist.
Stage 2 – Solution modelling and prototype selection. The next step was the selection of the best design and algorithm to put together the prototype of the solution. This phase saw a strong synergy between the Sababa Development Team and the researchers from Professor Marchese’s Laboratory.
“Sababa, as project leader, coordinated the players involved, bringing together academic technical expertise with knowledge from the industrial sector. In each phase of the project, and especially in this one, Sababa defined and monitored the technological and organisational requirements pursued, involving targeted professionals to adequately manage the most critical aspects”, comments Matteo Oliveri, Cybersecurity Advisor at Sababa Security, “In the implementation phase, we installed, managed and made available the technological infrastructure needed to define, develop and test the solution, contributing to the integration of the software components used”.
Stage 3 – Testing. The third and last part of the project was the testing phase – one of the most appreciated in Professor Marchese’s laboratory, but also one of the most complex and time-consuming, as Machine Learning requires granular configuration for the algorithm to work properly across safety-relevant environments and for anomaly detection to be performed efficiently.
“It is always satisfying to see how a simple idea written on a piece of paper can become reality, even if the path to your goal is not exactly the one you had planned. There is always something that doesn’t go right or that takes longer than expected, but that’s the beauty of ambitious plans”, comments Fabio Patrone.
In projects like this, testing can never be skipped, especially when the researchers are aiming for very practical results, are funded by the government and are working for one of the biggest industrial enterprises in the country.
Although it is not the first project in this area, the goal of creating an AI algorithm capable of detecting relevant heterogeneous data and correlating them was ambitious, and it was delivered with great results, especially considering the historical period in which it all started. The solution, with its modular design, is able to ingest and clean up a large variety of events originating from cyber (firewall, VPN logs), cyber-physical (SCADA, IoT-related events) and physical security systems (people and vehicle access control), thus providing a holistic and multidimensional view of the infrastructure in terms of security and resilience.
A final aspect not to be underestimated is the scalability of the solution: even though the raw data and sources will differ from one site to another, the SIEM system can be fully adapted and easily implemented in different environments.
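As a rough illustration of what correlating events across the three domains can look like, the Python sketch below maps records onto one common structure and then applies a single hand-written rule: flag an action on a cyber-physical system when the same person has no badge entry in the preceding hours. Everything here is an assumption made for the example (the field names, the rule, the eight-hour window), and a fixed rule is used in place of the ML model that the project actually developed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    time: datetime
    domain: str   # "cyber", "cyber-physical" or "physical"
    actor: str
    action: str


def normalize(raw):
    """Map a raw record (hypothetical field names) onto the common Event schema."""
    return Event(datetime.fromisoformat(raw["ts"]), raw["domain"], raw["who"], raw["what"])


def unmatched_plant_actions(events, window=timedelta(hours=8)):
    """Flag cyber-physical actions by an actor with no badge entry in the preceding window."""
    badge_ins = [e for e in events if e.domain == "physical" and e.action == "badge_in"]
    flagged = []
    for e in events:
        if e.domain != "cyber-physical":
            continue
        present = any(
            b.actor == e.actor and timedelta(0) <= e.time - b.time <= window
            for b in badge_ins
        )
        if not present:
            flagged.append(e)
    return flagged


# Hypothetical sample data: alice badged in before acting, bob did not.
raw_records = [
    {"ts": "2021-03-01T08:00:00", "domain": "physical", "who": "alice", "what": "badge_in"},
    {"ts": "2021-03-01T09:30:00", "domain": "cyber-physical", "who": "alice", "what": "setpoint_change"},
    {"ts": "2021-03-01T10:00:00", "domain": "cyber-physical", "who": "bob", "what": "setpoint_change"},
]

events = [normalize(r) for r in raw_records]
flagged = unmatched_plant_actions(events)
print([(e.actor, e.action) for e in flagged])
```

Neither the SCADA log nor the access-control log looks suspicious on its own in this example; the mismatch only appears once the two are read together, which is the point of a multidimensional view.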