Pattern Description Languages
From TracingWiki
Analyzing kernel traces for pattern matching requires an unambiguous, easy-to-use pattern description language. The language should be able to express a wide variety of patterns from several fields of interest, such as security, performance debugging and software testing.
In intrusion detection, domain specific languages are used to describe security rules and attack scenarios for validation in audit trails (some of those languages are also used for lexical and syntactic analysis). Furthermore, in parallel computing, other languages are used to express performance-based properties to be validated in MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) traces. Similarly, kernel tracers such as DTrace and SystemTap provide languages dedicated to performing run-time trace analysis.
Language Requirements
- Simplicity: The language should only provide the minimal set of operators sufficient to describe the variety of patterns.
- Expressiveness: Using the language, it should be possible to describe all sorts of known problematic patterns as well as new ones which may be defined in the future.
- Tracer-independent: The language shouldn't depend on the internal format of the kernel trace. Whether the kernel events are collected using one tracer or another, the pattern description should not vary at all.
- Unambiguous: The language should have a rigorously defined syntax and semantics allowing only one clear interpretation of any given described pattern.
- Multi-event pattern support: Some languages, such as the one used to write Snort rules, provide operators that process only one event at a time, whereas many patterns are composed of multiple events. It should be possible to describe such patterns, including the ordering between the events.
- On-line/Off-line independence: The language should not make any a priori assumption about whether the analysis is performed on-line or off-line.
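As a rough illustration of the last two requirements, the following Python sketch matches an ordered, multi-event pattern over a stream of events. The `Event` record and its fields, as well as the `find_ordered_pattern` helper, are hypothetical names chosen for this example; they do not come from any tracer. The matcher consumes the events one at a time, so the same code could be fed from a stored trace or from a live stream.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence


@dataclass(frozen=True)
class Event:
    """Tracer-independent view of a kernel event (hypothetical fields)."""
    timestamp: int
    name: str
    pid: int


def find_ordered_pattern(events: Iterable[Event],
                         names: Sequence[str]) -> List[List[Event]]:
    """Return every occurrence of `names` seen in order within one process.

    Unrelated events may be interleaved between the steps of the pattern.
    `names` is assumed to be non-empty.
    """
    progress: Dict[int, List[Event]] = {}  # pid -> events matched so far
    matches: List[List[Event]] = []
    for ev in events:
        matched = progress.setdefault(ev.pid, [])
        if ev.name == names[len(matched)]:
            matched.append(ev)
            if len(matched) == len(names):
                matches.append(matched)
                progress[ev.pid] = []  # start looking for the next occurrence
    return matches


trace = [
    Event(1, "open", 100),
    Event(2, "open", 200),
    Event(3, "read", 100),
    Event(4, "close", 200),
    Event(5, "close", 100),
]
found = find_ordered_pattern(trace, ["open", "read", "close"])
```

Here only process 100 completes the open, read, close sequence; process 200 never reads, so it produces no match.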
Taxonomy of Pattern Description Languages
Pattern Description Languages can be divided into three categories: Domain Specific Languages, General Purpose Languages and Automata-Based Languages. Domain Specific Languages can be further divided into Declarative and Imperative languages. The languages studied below are good candidates for event-based programming; they are examined to determine how well they can describe the problematic patterns that can be found in a kernel trace.
Domain Specific Languages
Domain specific languages (DSLs), previously known as special purpose programming languages, are usually dedicated to solving a particular problem or implementing a well-defined domain-specific task, as opposed to general-purpose programming languages such as C or Java. Examples of domain specific languages include VHDL (a hardware description language), Csound (a language for creating audio files), and DOT (the input language of Graphviz, a graph visualization tool).
DSL Advantages
A new DSL is justified whenever it helps formulate a particular problem or solution more clearly and more easily than preexisting languages do. For instance, it should support the level of abstraction of the specific domain and hide the low-level complexity and implementation details, so that field experts can easily understand, maintain, update and develop DSL-based programs.
Declarative DSLs
Declarative programming, in contrast with imperative programming, consists of describing what is to be done rather than how to do it. In other words, the logic of a computation is described rather than the control flow of the program. Snort, a popular intrusion detection system, employs a declarative DSL to describe a large number of security-related rules, which are validated by inspecting network packets.
Snort
Snort is a free, open source, extensible intrusion prevention and detection system that can be used as a packet sniffer, a packet logger, or a network-based intrusion detection system (NIDS). It operates by checking network packets against a database of security rules. Rules are used to catch malicious packets before they can cause any damage to the system.
Snort employs a declarative DSL for rule descriptions, and network packets are inspected according to the specified rules. As an example, a rule could state that any packet directed to port 80 and containing the string “cmd.exe” is considered malicious, so that measures can be taken to deal with it.
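The example rule above can be sketched in Python as plain data plus one generic matching function, which is the essence of the declarative style: the rule says what to look for, not how to scan the packet. The `Packet` and `Rule` classes and the `matches` function are hypothetical names for this illustration and are not part of Snort. The comment shows roughly how such a rule might be written in Snort's own syntax; the message text and the `sid` value are made up.

```python
from dataclasses import dataclass

# Roughly equivalent Snort rule (illustrative msg and sid values):
#   alert tcp any any -> any 80 (msg:"cmd.exe access"; content:"cmd.exe"; sid:1000001;)


@dataclass(frozen=True)
class Packet:
    """Simplified packet: only the fields this example needs."""
    dst_port: int
    payload: bytes


@dataclass(frozen=True)
class Rule:
    """Declarative description of what makes a packet suspicious."""
    dst_port: int
    content: bytes
    msg: str


def matches(rule: Rule, packet: Packet) -> bool:
    """Generic engine step: check one packet against one rule."""
    return packet.dst_port == rule.dst_port and rule.content in packet.payload


cmd_rule = Rule(dst_port=80, content=b"cmd.exe", msg="cmd.exe access")
web_attack = Packet(80, b"GET /scripts/cmd.exe?/c+dir HTTP/1.0")
ssh_traffic = Packet(22, b"cmd.exe")

for pkt in (web_attack, ssh_traffic):
    if matches(cmd_rule, pkt):
        print("ALERT:", cmd_rule.msg)
```

Only the first packet satisfies both constraints, so a single alert is printed.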
Snort rules are a set of declarative instructions designed to express a pattern to look for in network packets. Many parts of a packet can be inspected, such as the source and destination IP addresses, the source and destination port numbers, the protocol options and the packet payload. Rules are composed of two parts: the rule header and the rule options. The header contains the rule's action, the protocol, the source and destination IP addresses and netmasks, and the source and destination port numbers. The rule options are used for operations such as content matching, TCP flag testing or payload size checking.
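To make the two-part structure concrete, here is a deliberately naive Python sketch that splits a rule string into its header fields and its options. The `split_rule` helper is hypothetical and is not how Snort parses rules; it assumes a seven-field header and option values that contain no semicolons or parentheses. The sample rule text, including its message and `sid`, is made up for the illustration.

```python
from typing import Dict, Tuple


def split_rule(text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a rule into (header fields, options). Naive: no escaping support."""
    header_text, _, rest = text.partition("(")
    options_text = rest.rstrip().rstrip(")")

    # Header: action, protocol, source address/port, direction, destination.
    fields = ("action", "protocol", "src_ip", "src_port",
              "direction", "dst_ip", "dst_port")
    header = dict(zip(fields, header_text.split()))

    # Options: "keyword:value;" pairs inside the parentheses.
    options: Dict[str, str] = {}
    for item in options_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(":")
        options[key.strip()] = value.strip().strip('"')
    return header, options


sample = ('alert tcp any any -> 192.168.1.0/24 80 '
          '(msg:"cmd.exe access"; content:"cmd.exe"; sid:1000001;)')
header, options = split_rule(sample)
```

After the call, `header` holds the seven header fields (for example `header["action"]` is `"alert"`) and `options` maps each option keyword to its value.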
A rule header is made of the following fields: rule action, protocol, IP addresses, and port numbers. In the first part of the header, one of the following actions can be specified:
- Log action: logs the packet that caused the rule to fire.
- Alert action: logs the packet that caused the rule to fire and generates an alert using the selected alert method.
- Pass action: lets the packet through without any further processing.
- Drop action: drops the packet (in-line mode only).
- SDrop action: silently drops the packet (in-line mode only).
- Reject action: rejects the packet (in-line mode only).
The next field in the rule header is the protocol. The three protocols that Snort supports are TCP, UDP and ICMP (the IP protocol stands for any of the previous three). Then, the source and destination IP addresses are specified in the header. The keyword “any” matches any IP address. The direction operator “->” separates the source of the packet from its destination. Furthermore, a netmask in CIDR notation can be appended to an address to designate a block of addresses. In the last part of the header, port numbers are specified. A single port or a range of ports can be given, either directly or by negation.
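The address and port parts of the header can be sketched in Python as two small predicates. This is an illustration under stated assumptions, not Snort's implementation: `ip_matches` and `port_matches` are hypothetical helpers, address blocks are written in CIDR form, ranges use `low:high` with either bound optional, and a leading `!` negates the specification. The standard `ipaddress` module does the netmask arithmetic.

```python
import ipaddress


def ip_matches(spec: str, addr: str) -> bool:
    """True if `addr` satisfies an address spec: any, a.b.c.d, a.b.c.d/n, or !spec."""
    if spec == "any":
        return True
    negate = spec.startswith("!")
    if negate:
        spec = spec[1:]
    network = ipaddress.ip_network(spec, strict=False)
    hit = ipaddress.ip_address(addr) in network
    return hit != negate


def port_matches(spec: str, port: int) -> bool:
    """True if `port` satisfies a port spec: any, N, low:high, :high, low:, or !spec."""
    if spec == "any":
        return True
    negate = spec.startswith("!")
    if negate:
        spec = spec[1:]
    if ":" in spec:
        low, high = spec.split(":")
        lo = int(low) if low else 0
        hi = int(high) if high else 65535
        hit = lo <= port <= hi
    else:
        hit = port == int(spec)
    return hit != negate


# A header such as "any any -> 192.168.1.0/24 !6000:6010" would then be checked as:
header_ok = (ip_matches("any", "10.0.0.5") and port_matches("any", 40000)
             and ip_matches("192.168.1.0/24", "192.168.1.77")
             and port_matches("!6000:6010", 80))
```

Keeping each field as its own predicate mirrors the way the header is described: every field is an independent constraint, and the header matches only when all of them hold.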
Snort provides 15 different rule options that are used for operations such as pattern matching or testing the IP and TCP fields. For instance, one can test the TCP flags or the TTL (time-to-live, in the IP header) for certain values. All rule options deal with one packet at a time. Writing a Snort rule therefore consists of defining a set of constraints on the fields of each packet. Snort validates the packets one at a time, and does not allow a rule to fire based on constraints that span two separate packets. This is mainly because Snort was designed to process packets on-line with a low impact on the system's performance. Supporting more complex rules that span several packets would make run-time packet inspection significantly more expensive.
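The single-packet nature of rule options can be sketched in Python by modelling each option as a predicate over one packet and a rule as the conjunction of its options. The `Packet` fields and the `ttl_equals`, `flags_equal`, `payload_larger_than` and `rule_fires` helpers are hypothetical names for this example and do not reproduce Snort's option set. Note that `rule_fires` keeps no state between calls, which is exactly why a constraint relating two different packets cannot be expressed this way.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence


@dataclass(frozen=True)
class Packet:
    """Simplified packet with a few header fields (hypothetical layout)."""
    ttl: int
    tcp_flags: FrozenSet[str]
    payload: bytes


# A rule option is a test applied to exactly one packet.
Option = Callable[[Packet], bool]


def ttl_equals(value: int) -> Option:
    return lambda p: p.ttl == value


def flags_equal(*flags: str) -> Option:
    wanted = frozenset(flags)
    return lambda p: p.tcp_flags == wanted


def payload_larger_than(size: int) -> Option:
    return lambda p: len(p.payload) > size


def rule_fires(options: Sequence[Option], packet: Packet) -> bool:
    """A rule fires when every option holds for this packet alone."""
    return all(option(packet) for option in options)


# Example: flag packets that have both SYN and FIN set and a TTL of 1.
suspicious = [ttl_equals(1), flags_equal("S", "F")]
odd_packet = Packet(ttl=1, tcp_flags=frozenset({"S", "F"}), payload=b"")
normal_packet = Packet(ttl=64, tcp_flags=frozenset({"S"}), payload=b"")
```

Because each call inspects one packet and nothing else, the cost per packet stays bounded by the number of options in the rule, which is consistent with the low-overhead, on-line design described above.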
