Re: Kernel audit output is inconsistent, hard to parse

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Linda Knippers <linda.knippers@hp.com>
To: John Dennis <jdennis@redhat.com>
Cc: linux-audit@redhat.com
Subject: Re: Kernel audit output is inconsistent, hard to parse
Date: Thu, 31 Jan 2008 16:11:38 -0500	[thread overview]
Message-ID: <47A2398A.2050309@hp.com> (raw)
In-Reply-To: <479FAF24.8050109@redhat.com>

At one time we talked about converting to a binary record format.
Maybe some structure would be helpful in defining and enforcing
rules for what information in the records can/should be and what it
all means.

-- ljk

John Dennis wrote:
> The format of audit messages from the kernel is a mess. The
> bottom line is one cannot parse the audit messages without special
> case knowledge of each audit message because the data formatting does
> not follow any regular rules.
> 
> I don't know how it got this way, but it really needs to be fixed.
> 
> The primary offense is string formatting, specifically the use or
> non-use of the functions audit_log_hex() audit_log_untrustedstring(),
> and audit_log_n_untrustedstring(); depending on circumstances.
> 
> The net result is a field value might be one of the following cases:
> 
> 1) a string without quotes (maybe a string, maybe an int, etc.)
> 
> 2) a string enclosed in quotes (implies a string with no escaped chars)
> 
> 3) a string which is represented as a sequence of hex values (not
>    enclosed in quotes, but how do you distinguish this from case 1?)
> 
> Given the name=value formatting it is absolutely impossible to
> correctly interpret the value component unless you know how
> audit_log_format was invoked to generate the name=value pair. This
> will be dependent on the kernel version, the field name, and the audit
> record type. To be specific, during parsing only case 2 is
> unambiguous. You cannot determine between case 1 and case
> 3. Heuristics based on each character being in the hexadecimal
> character set fail for a significant subset of data, thus you don't
> know if the value is a string encoded in hexadecimal which needs to be
> decoded or a string which happens to be composed of hexadecimal
> characters but is not encoded.
> 
> Thus we have the situation where to correctly parse the name=value
> pair one must know the audit record type, the field name, and the
> kernel version. That is just plain CRAZY and UNNECESSARY. Trying hide
> this logic in auparse is just a band-aid over the problem compounded
> by the fact auparse does not always get it right either. This is in
> conjunction with the fact auparse has no way to know the kernel
> version of the audit data it is attempting to parse (nor should it
> even have tables based on kernel version).
> 
> The answer is to make the output parsable without special case
> knowledge. It would appear many of these problems were introduced with
> the functions audit_log_hex() audit_log_untrustedstring(), and
> audit_log_n_untrustedstring() which attempt to correct for a double
> quote, white space, or non-printable character in the output
> string. However these are not used uniformly nor do they follow any
> common approach for string representations in user land (why not?).
> 
> All field values without exception need to be enclosed in quotes to
> delimit the value. Special characters inside the quotes need to
> be escaped, following some standard convention. Please, lets not
> invent a new encoding, this problem has already been solved
> elsewhere many times before!
> 
> Also note the function audit_log_n_untrustedstring() in audit.c has a
> bug and ignores the len parameter (it iterates till it finds a NULL
> terminator even though it's supposed to stop after n chars).
> 
> Suggested Fix:
> --------------
> 
> Most of these problems can easily be fixed if there is exactly one
> central place to format an audit field value. The function
> audit_log_vformat() could very easily ensure consistent formatting via
> % format specifiers in the format string, e.g.:
> 
>     audit_log_format("n=%d path=%s", n, path);
> 
> Building audit output piecewise would be deprecated, e.g. these types
> of sequences would be eliminated.
> 
>     audit_log_format(ab, " n=%d", n);
>     audit_log_format(ab, " name=");
>     audit_log_foo();
> 
> and replaced with:
> 
>     audit_log_format(ab, " n=%d name=%s", foo_to_string(foo));
> 
> Whenever audit_log_vformat() encounters a % format specifier it
> formats to a string, then it converts the string to an escaped quoted
> string, and then inserts the escaped quoted string into the buffer
> (e.g. n="123" name="foo bar\n" )
> 
> This way the formatting is consistent, easy to apply, and is never
> special cased by the caller.
> 
> There are no performance penalties of any note, calling a routine to
> escape only needs to be done when the format specifier is %s. Currently
> this is already done for a subset of output strings, so all we're doing
> is removing the responsibility for escaping from the caller and doing
> it consistently instead of in a subset of cases.
> 
> I don't really care what the encoding is. I only care that it is an
> encoding with wide support. Backslash quoting is very popular,
> familiar and has many implementations. The MIME quoted-printable
> transfer encoding would be another option but might pose some problems
> with line endings. I think backslash quoting would be a good choice. I
> suspect everyone reading this message already knows exactly how to
> interpret a string with backslash escapes.
> 
> Auparse is not the answer:
> --------------------------
> 
> Auparse is not the answer to irregular kernel audit message
> formatting. First of all it forces auparse to have special case logic
> which is not 100% robust and is tied to the kernel source code
> version.
> 
> Second, in it's current implementation auparse confuses transfer
> decoding and substitution, two entirely different concepts needing to
> be applied in entirely different circumstances, but which have been
> conflated.
> 
> auparse_get_field_str() returns the field value in it's encoded form,
> this is almost never of value to the caller. The caller wants the
> field value to be unencoded so it can operate on it. If you want the
> field value to be unencoded you have to call
> auparse_interpret_field(). But auparse_interpret_field() performs two
> distinctly different operations, it both decodes AND performs
> contextual substitution. Contextual substitution only has meaning when
> applied on the same host and at approximately the same time as when
> the audit record was generated. Contextual substitution is mainly of
> value for human readable output, it is difficult to utilize with
> automated machine processing. At the moment it is not possible to get
> a decoded value from auparse without it also performing undesired
> substitution.
> 
> While we're at it:
> ------------------
> 
> If we do fix the format of audit messages we might as well fix some
> other inconsistencies at the same time.
> 
> 1) The initial part of AVC messages do not follow the standard
>    name=value formatting used everywhere else in audit.
> 
>    a) It includes the string "avc:" which is redundant with the audit
>    record type (e.g. type=AVC), the string "avc:" should be removed,
>    it serves no purpose and only makes parsing much harder because of
>    the inconsistency.
> 
>    b) denied|granted are bare words without a field name, it should be
>    seresult="denied", once again to avoid special case parsing.
> 
>    c) The list of operations are enclosed in curly braces {} without a
>    field name, this should be seperms=xxx, where xxx is a list. The
>    use of curly braces to encode a list in audit data is unique. We
>    should define how any audit message should encode a list of values
>    and use that consistently for all audit data. While one could
>    define a syntax such as "[value1, value2]" or some such, it might
>    be informative to look at how other transfer mechanisms such as
>    structured markup and ldap handle this case. They both utilize the
>    concept of multi-valued attributes. Thus there is no list
>    structure, but an attribute is allowed to repeat itself and in the
>    process implicitly creates a list of values for the attribute. Thus
>    {read write} might be represented as seperms="read"
>    seperms="write". This regularity makes parsing much easier, it
>    avoids special case syntax.
> 
> 2) (Note, this is not a kernel issue) The host data is currently
>    prepended to the audit record with the format host=xxx. Is this an
>    encoded string or not? It should be encoded and it should be
>    encoded in exactly the same format as the name/value pairs in the
>    audit records. The same holds true for the record type, it should
>    follow the same syntax as every other name/value pair.
> 
> 3) The string "audit(ssssss.mmmm:iiii):" is a critical delimiter, it
>    separates record properties (e.g. host, type, timestamp) from
>    record data, which must be a sequence of name="value" pairs. But
>    the time stamp should really follow the name/value pair encoding
>    used elsewhere.
> 
> Desired syntax:
> ---------------
> 
> Records consist of a sequence of name="value" pairs.
> 
> Ordering of name/value pairs is significant for multi-valued
> attributes (i.e. where name appears more than once), insignificant
> otherwise.
> 
> The value MUST be enclosed in double quotes with interior characters
> properly escaped.
> 
> White space between name, '=', and "value" is insignificant and
> ignored.
> 
> The audit record is partitioned into two parts
> 
>   a) record properties (i.e. host, record type, timestamp)
> 
>   b) record data
> 
> The partition of properties and data occurs at a colon delimiter,
> i.e. properties : data
> 
> The current formatting of the record timestamp
> (e.g. audit(ssss.mmm:iii) is inconsistent with
> all other name/value pairs. It should be "seconds="sss"
> milliseconds="mmm" serial="iii", this allows parsing to be regular and
> consistent.
> 
> Thus an audit record with consistent syntax would look like this,
> where brackets [] indicate optional components:
> 
> [host=""] type="" seconds="" milliseconds="" serial="" : name="" [name=""]
> 
> What has to change and what's optional:
> ---------------------------------------
> 
> The formatting of name/value pairs in the kernel must be fixed, it is
> simply impossible to correctly parse in it's current state.
> 
> The rest of the suggested changes are syntactic sugar which would make
> parsing easier because of regular syntax, but they are not
> critical. We could retain the existing formats if backwards
> compatibility is felt to trump syntactic cleanliness and ease in
> parsing. It's a judgment call over when and how to introduce change
> and the anticipated impact.
> 
>

next prev parent reply	other threads:[~2008-01-31 21:11 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-29 22:56 Kernel audit output is inconsistent, hard to parse John Dennis
2008-01-29 23:37 ` Miloslav Trmac
2008-01-30  1:15   ` John Dennis
2008-01-30 14:21 ` Steve Grubb
2008-01-30 15:34   ` Paul Moore
2008-01-30 16:01     ` Steve Grubb
2008-01-30 16:35       ` Paul Moore
2008-01-30 16:39     ` Eric Paris
2008-01-30 16:41       ` Eric Paris
2008-01-30 16:19   ` John Dennis
2008-01-30 14:26 ` Paul Moore
2008-01-30 16:30 ` Eric Paris
2008-01-30 17:43   ` John Dennis
2008-01-31 21:11 ` Linda Knippers [this message]
2008-01-31 21:21   ` Steve Grubb
2008-01-31 21:59     ` Paul Moore
2008-01-31 22:43       ` Casey Schaufler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47A2398A.2050309@hp.com \
    --to=linda.knippers@hp.com \
    --cc=jdennis@redhat.com \
    --cc=linux-audit@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.