Data infrastructure for cybersecurity
Get acquainted with cyber-security terms!!!
Why cybersecurity?
Recent data breaches in Canada, as published on cyberlands.io, are summarized below:
| Cybersecurity Breach | Description | Impact |
| --- | --- | --- |
| Bell Canada (2017) | Unauthorized access to customer information including email addresses and phone numbers. | Nearly 1.9 million customers affected. |
| Bell Canada (2018) | Unauthorized access to customer information including names and email addresses. | Up to 100,000 customers affected. |
| Home Depot Canada (2020) | Internal system error leaked customer names, email addresses, order numbers, and last four digits of payment cards. | A "very small number" of customers affected. |
| TIO Networks (2017) | Unauthorized access to personal information of customers and billers. | 1.6 million customers affected. |
| TransUnion (2019) | Third party accessed personal information of about 37,000 Canadian customers. | Client names, dates of birth, addresses, credit and loan information, and potentially social insurance numbers compromised. |
| Canada Post (2018) | Unauthorized access to order records of 4,500 customers. | Names, postal codes, dates of delivery, OCS reference numbers, Canada Post tracking numbers, and OCS corporate names and business addresses compromised. |
| Nissan Canada (2017) | Unauthorized access to customer information including name, address, vehicle information, and banking information. | All current and former customers affected. |
| Superior Plus (2021) | Ransomware attack led to temporary disabling of certain computer systems and applications. | No evidence of personal data compromise. |
| Yves Rocher (2019) | Personal data of about 2.5 million Canadian customers left exposed on an unsecured database. | First and last names, dates of birth, phone numbers, email addresses, and zip codes compromised. |
| LifeLabs (2019) | Unauthorized access to personal information of an estimated 15 million Canadians. | Roughly 40% of Canada's total population affected. |
| CVS and Walmart Canada (2015) | Canadian information technology vendor leaked credit card information from online photo processing websites. | Data on millions of users possibly compromised. |
| Rogers Communications (2015) | Massive company and client data leaks after an employee was targeted. | Emails and contact details of 50 to 70 mid-size businesses compromised. |
| Homewood Health (2021) | Documents stolen in a ransomware attack put up for sale on Marketo. | No evidence of unauthorized access to any of Homewood Health's client application systems. |
| Panasonic Canada (2022) | Cyberattack led to unauthorized access to internal systems, processes, and networks. | Scope of data affected not disclosed. |
Additionally, according to the survey here, 83% of companies have been hit by at least one data breach. The average cost of a data breach is summarized below (with very large outliers taken out):
| Region | Average Cost (USD) |
| --- | --- |
| US | 9.44 million |
| Worldwide | 4.35 million |
Data breaches cause customers to lose faith in your product and may lead to significant business loss. Therefore, we need a way to ensure the cybersecurity of our company. But the question is: how?
Passive Defense: you-know-it
The typical architecture of a software system looks like the following diagram.
From left to right in the diagram:
user
: at the user layer, you can add MFA (multi-factor authentication) to be more sure that the person logging in is who they claim to be.
device
: at the device level, there is MDM (mobile device management) to recognize foreign devices (the message you see when you buy a new phone and try to log in to Gmail).
web server
: EDR (endpoint detection and response), which focuses on endpoint-level security.
application server
database
: encryption at rest encrypts the stored data with a key (think of the private key you use to connect to GitHub via SSH every day XD, although in practice the data itself is usually encrypted with a symmetric key such as AES). Even if a hacker breaches the database, they can't decipher the data without the key.
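To make the MFA item a bit more concrete, here is a minimal sketch of one common second factor: the time-based one-time password (TOTP, RFC 6238), i.e. the six-digit code an authenticator app shows. The function names `totp` and `verify` and the demo secret are illustrative assumptions, not taken from any product mentioned in this post.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant)."""
    if at_time is None:
        at_time = time.time()
    # Number of 30-second steps since the Unix epoch, as a big-endian 64-bit int.
    counter = int(at_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, at_time: Optional[float] = None) -> bool:
    """Compare the submitted code against the expected one in constant time."""
    return hmac.compare_digest(totp(secret, at_time), submitted)


# Demo secret from the RFC 6238 test vectors; a real secret is random per user.
DEMO_SECRET = b"12345678901234567890"
print(totp(DEMO_SECRET))
```

The server and the user's phone share the secret once at enrollment; after that, a stolen password alone is not enough to log in because the attacker also needs the current code.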
The defense mechanisms mentioned above are mostly passive. What if we wish to fight back? Let's get THREAT HUNTING started.
Active Defense: Threat hunting
The diagram below shows the typical pattern of a cyber-attack. It consists of
probing period
: the period of time during which hackers are poking and probing around in your system to find a vulnerability.
recovery period
: the average time it takes for the alarm to be triggered; mean time to identify = 200 days, reference here.
containment period
: the average time to contain the attack after the alarm has been triggered; mean time to contain = 70 days, same reference as above.
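Both numbers are plain averages over incident timelines. The sketch below shows the arithmetic with made-up incidents, chosen so that the averages land on the 200 and 70 days quoted above; `make_incident`, `INCIDENTS`, and `mean_days_between` are hypothetical names used only for illustration.

```python
from datetime import date, timedelta
from statistics import mean

START = date(2022, 1, 1)  # arbitrary breach date for the made-up incidents


def make_incident(days_to_identify, days_to_contain):
    """Build one fictional incident timeline from two durations."""
    identified = START + timedelta(days=days_to_identify)
    contained = identified + timedelta(days=days_to_contain)
    return {"breached": START, "identified": identified, "contained": contained}


# Fictional data, not taken from any report.
INCIDENTS = [make_incident(180, 60), make_incident(220, 80)]


def mean_days_between(incidents, start_key, end_key):
    """Average number of days between two milestones across incidents."""
    return mean((i[end_key] - i[start_key]).days for i in incidents)


mtti = mean_days_between(INCIDENTS, "breached", "identified")
mttc = mean_days_between(INCIDENTS, "identified", "contained")
print("mean time to identify:", mtti)
print("mean time to contain:", mttc)
print("total breach lifecycle:", mtti + mttc)
```

Put together, 200 days to identify plus 70 days to contain means an attacker has roughly 270 days from breach to containment.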
200 days is a long time. What if we could shorten it and catch attackers while they are still probing around in the system? The answer is to increase observability and make data-driven decisions, as illustrated in the figure below.
You can think of it as Datadog (which also has threat intelligence and anomaly detection features): a way to hook up to all the log files so we can monitor them in real time.
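As a rough illustration of what monitoring logs for suspicious behavior can mean, here is a small sketch that flags IP addresses with repeated failed logins inside a short time window. The log format, the event name `LOGIN_FAILED`, and the function `find_suspicious_ips` are all assumptions made up for this example; a real monitoring product has its own formats and rules.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_suspicious_ips(lines, threshold=3, window=timedelta(minutes=1)):
    """Flag IPs with `threshold` or more failed logins within `window`.

    Assumes lines are in chronological order and look like:
    <ISO timestamp> <EVENT> key=value key=value ...
    """
    recent = defaultdict(deque)  # ip -> timestamps of recent failures
    flagged = set()
    for line in lines:
        timestamp, event, *fields = line.split()
        if event != "LOGIN_FAILED":
            continue
        attrs = dict(field.split("=", 1) for field in fields)
        now = datetime.fromisoformat(timestamp)
        failures = recent[attrs["ip"]]
        failures.append(now)
        # Drop failures that have fallen out of the sliding window.
        while now - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(attrs["ip"])
    return flagged


SAMPLE_LOG = [
    "2024-01-01T00:00:01 LOGIN_FAILED ip=203.0.113.7 user=alice",
    "2024-01-01T00:00:05 LOGIN_FAILED ip=203.0.113.7 user=alice",
    "2024-01-01T00:00:09 LOGIN_OK ip=198.51.100.2 user=bob",
    "2024-01-01T00:00:12 LOGIN_FAILED ip=203.0.113.7 user=root",
    "2024-01-01T00:05:00 LOGIN_FAILED ip=198.51.100.2 user=bob",
]
print(find_suspicious_ips(SAMPLE_LOG))
```

The point is less the specific rule and more the shape: every log source feeds one place, and that place turns raw lines into a short list of things worth a human look.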
Once we find all that "suspicious behavior" via the monitoring system, it's time for cybersecurity analysts to rise and shine!!!!
SIEM
Let's imagine a day in the life of a cybersecurity analyst:
too many blind spots to investigate
1000 alerts per day; which ones are the most important?
have to use 100+ siloed scripting tools to investigate 100+ different data sources
Here comes the savior, SIEM (security information and event management)
: it is a centralized place that takes in all the log data from different feeds such as threat intel, EDR, NDR, etc., and outputs high-fidelity alerts by weeding out the bad ones.
So analysts can focus on investigating the important ones!
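One way to picture the "weeding out" step is a toy correlation rule: group events by host and keep only the hosts that several independent feeds agree on. This is a sketch under made-up assumptions (the `Event` fields, the severity scale, and the thresholds are all hypothetical), not how any particular SIEM product scores alerts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str    # which feed reported it, e.g. "edr", "ndr", "threat_intel"
    host: str      # affected machine
    severity: int  # 1 (low) to 5 (critical), a made-up scale


def high_fidelity_alerts(events, min_sources=2, min_severity=3):
    """Keep hosts reported by several feeds with at least one serious event."""
    by_host = defaultdict(list)
    for event in events:
        by_host[event.host].append(event)

    alerts = []
    for host, host_events in by_host.items():
        sources = {e.source for e in host_events}
        top_severity = max(e.severity for e in host_events)
        if len(sources) >= min_sources and top_severity >= min_severity:
            alerts.append((host, top_severity, sorted(sources)))
    # Most severe first, then by host name for a stable order.
    return sorted(alerts, key=lambda a: (-a[1], a[0]))


EVENTS = [
    Event("edr", "web-01", 4),
    Event("ndr", "web-01", 3),
    Event("edr", "laptop-7", 2),
    Event("threat_intel", "db-02", 5),
]
print(high_fidelity_alerts(EVENTS))
```

Four raw events collapse into a single alert for `web-01`, which is the kind of reduction that lets an analyst start with the alert most likely to matter.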
Note:
In a perfect world, this would be 100% automated. In reality, tools like SIEM move you closer to automation but never get you to 100%.
As time goes by, you have to work with both new systems and legacy systems. You end up with more and more SIEM systems that are hard to integrate (or integration is simply not necessary and we treat each SIEM as distributed storage).
We could use something like a federated search to grab data from multiple sources (SIEMs) without needing to centralize all of them.
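The idea can be sketched as a fan-out query: send the same search to every SIEM in parallel, then merge the hits, while the data stays where it is. `InMemorySiem` below is a stand-in invented for this example; a real SIEM would be queried through its own API, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemorySiem:
    """Hypothetical stand-in for one SIEM holding its own records."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def search(self, field, value):
        # Tag each hit with the SIEM it came from so the analyst can trace it.
        return [dict(r, siem=self.name) for r in self.records if r.get(field) == value]


def federated_search(siems, field, value):
    """Run the same query on every SIEM in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(siems))) as pool:
        per_siem = pool.map(lambda siem: siem.search(field, value), siems)
        merged = [hit for hits in per_siem for hit in hits]
    return sorted(merged, key=lambda record: record["ts"])


legacy = InMemorySiem("legacy", [
    {"ts": 3, "ip": "203.0.113.7", "event": "login_failed"},
    {"ts": 5, "ip": "198.51.100.2", "event": "login_ok"},
])
cloud = InMemorySiem("cloud", [
    {"ts": 1, "ip": "203.0.113.7", "event": "port_scan"},
])
print(federated_search([legacy, cloud], "ip", "203.0.113.7"))
```

Nothing is copied into a central store; only the matching records travel, which is why this approach suits a mix of new and legacy systems.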
XDR
XDR (extended detection and response)
: in short, it's SIEM on steroids, integrating SIEM with other new systems to make the cybersecurity analyst's life easier.
XDR also uses more ML-based classification methods (instead of rule-based ones) to detect suspicious incidents.
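To show why a data-driven detector can catch what a fixed rule misses, here is a deliberately tiny contrast. The baseline check is a plain z-score outlier test, the simplest statistical stand-in for a learned model and not an actual ML classifier; the limits, the history, and both function names are assumptions made up for this sketch.

```python
from statistics import mean, stdev


def rule_based(value, limit=100):
    """Fixed rule: alert only when the count crosses a hard-coded limit."""
    return value > limit


def baseline_based(history, value, z_limit=3.0):
    """Alert when the value is far from what this user normally does."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


# Made-up daily file-download counts for one user over a week.
HISTORY = [10, 12, 11, 9, 13, 10, 11]
TODAY = 40

print("rule-based alert:", rule_based(TODAY))
print("baseline alert:", baseline_based(HISTORY, TODAY))
```

40 downloads is well under the global limit of 100, so the rule stays quiet, but it is far outside this user's usual range, so the baseline check fires. Real systems replace the z-score with trained models, but the shift from "one threshold for everyone" to "what is normal for this entity" is the same.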
SOAR
SOAR (security orchestration, automation and response)
: it is a data platform that orchestrates the handling of each incident (think of cron + JIRA), creating a cohesive (wow big word!) workflow that automates some of the common tasks in the world of cybersecurity.
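The "cron + JIRA" picture can be sketched as a playbook table: each alert type maps to an ordered list of automated steps, and every step leaves a note on a ticket. The step functions, alert fields, and `PLAYBOOKS` mapping are hypothetical and only record what they would do; a real platform would call out to actual systems.

```python
def isolate_host(alert, ticket):
    ticket.append(f"isolated {alert['host']}")


def reset_password(alert, ticket):
    ticket.append(f"reset password for {alert['user']}")


def notify_analyst(alert, ticket):
    ticket.append("analyst notified")


# Alert type -> ordered steps to run automatically (made-up playbooks).
PLAYBOOKS = {
    "malware": [isolate_host, notify_analyst],
    "credential_theft": [reset_password, notify_analyst],
}


def run_playbook(alert):
    """Open a ticket, run each step in order, and return the ticket log."""
    ticket = [f"opened ticket for {alert['type']}"]
    # Unknown alert types fall back to a human instead of being dropped.
    for step in PLAYBOOKS.get(alert["type"], [notify_analyst]):
        step(alert, ticket)
    return ticket


print(run_playbook({"type": "malware", "host": "web-01"}))
```

The routine steps run without a human, and the ticket keeps a trail of what was done so the analyst picks up from a known state.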
Summary
The tools mentioned above are the common data infrastructure that cybersecurity teams use to make data-driven decisions and keep the company secure. Just a little recap:
SIEM
: a centralized place to aggregate and monitor security-related log data.
XDR
: SIEM on steroids, with more data sources and more ML.
SOAR
: a platform that orchestrates (and JIRAs) the work of your security analyst team.