Data loss prevention software detects potential data breaches/data ex-filtration transmissions and prevents them by monitoring, detecting and blocking sensitive data while in-use (endpoint actions), in-motion (network traffic), and at-rest (data storage). In data leakage incidents, sensitive data is disclosed to unauthorized parties by either malicious intent or an inadvertent mistake. Sensitive data includes private or company information, intellectual property (IP), financial or patient information, credit-card data and other information.
Contents
- Categories
- Standard measures
- Advanced measures
- Designated solutions
- Network
- Endpoint
- Data identification
- Data leak detection
- Data at rest
- Data in use
- Data in motion
- References
The terms "data loss" and "data leak" are related and are often used interchangeably. Data loss incidents turn into data leak incidents in cases where media containing sensitive information is lost and subsequently acquired by an unauthorized party. However, a data leak is possible without losing the data on the originating side. Other terms associated with data leakage prevention are information leak detection and prevention (ILDP), information leak prevention (ILP), content monitoring and filtering (CMF), information protection and control (IPC) and extrusion prevention system (EPS), as opposed to intrusion prevention system.
Categories
The technological means employed for dealing with data leakage incidents can be divided into categories: standard security measures, advanced/intelligent security measures, access control and encryption and designated DLP systems.
Standard measures
Standard security measures, such as firewalls, intrusion detection systems (IDSs) and antivirus software, are commonly available products that guard computers against outsider and insider attacks. The use of a firewall, for example, prevents the access of outsiders to the internal network and an intrusion detection system detects intrusion attempts by outsiders. Inside attacks can be averted through antivirus scans that detect Trojan horses that send confidential information, and by the use of thin clients that operate in a client-server architecture with no personal or sensitive data stored on a client device.
Advanced measures
Advanced security measures employ machine learning and temporal reasoning algorithms for detecting abnormal access to data (e.g., databases or information retrieval systems) or abnormal email exchange, honeypots for detecting authorized personnel with malicious intentions and activity-based verification (e.g., recognition of keystroke dynamics) and user activity monitoring for detecting abnormal data access.
Designated solutions
Designated DLP solutions detect and prevent unauthorized attempts to copy or send sensitive data, intentionally or unintentionally, mainly by personnel who are authorized to access the sensitive information. In order to classify certain information as sensitive, these solutions use mechanisms, such as exact data matching, structured data fingerprinting, statistical methods, rule and regular expression matching, published lexicons, conceptual definitions and keywords.
Network
Network (a.k.a. data in motion
Endpoint
Endpoint (a.k.a. data in use
Data identification
DLP solutions include techniques for identifying confidential or sensitive information. Sometimes confused with discovery, data identification is a process by which organizations use a DLP technology to determine what to look for.
Data is classified as structured or unstructured. Structured data resides in fixed fields within a file such as a spreadsheet, while unstructured data refers to free-form text or media as in text documents, PDF files and video. An estimated 80% of all data is unstructured and 20% structured. Data classification is divided into content analysis, focused on structured data and contextual analysis which looks at the place of origin or the application or system that generated the data.
Methods for describing sensitive content are abundant. They can be divided into precise and imprecise methods.
Precise methods involve Content Registration and trigger almost zero false positive incidents.
All other methods are imprecise and can include: keywords, lexicons, regular expressions, extended regular expressions, meta data tags, bayesian analysis and statistical analysis techniques such as Machine Learning, etc.
The strength of the analysis engine directly relates to its accuracy. The accuracy of DLP identification is important to lowering/avoiding false positives and negatives. Accuracy can depend on many variables, some of which may be situational or technological. Testing for accuracy is recommended to ensure a solution has virtually zero false positives/negatives. High false positive rates cause the system to be considered DLD not DLP.
Data leak detection
Sometimes a data distributor gives sensitive data to one or more third parties. Some time later, some of the data is found in an unauthorized place (e.g., on the web or on a user's laptop). The distributor must then investigate the source of the leak.
Data at-rest
"Data at rest" specifically refers to old archived information. This information is of great concern to businesses and government institutions simply because the longer data is left unused in storage, the more likely it might be retrieved by unauthorized individuals. Protecting such data involves methods such as access control, data encryption and data retention policies.
Data in-use
"Data in use" refers to data that the user is currently interacting with. DLP systems that protect data in-use may monitor and flag unauthorized activities. These activities include screen-capture, copy/paste, print and fax operations involving sensitive data. It can be intentional or unintentional attempts to transmit sensitive data over communication channels.
Data in-motion
"Data in motion" is data that is traversing through a network to an endpoint destination. Networks can be internal or external. DLP systems that protect data in-motion monitor sensitive data traveling across a network through various communication channels.