* get_field_str() and interpret_field() bug with multi-word fields
@ 2008-08-12 17:49 Jonathan Kelly
2008-08-12 18:05 ` LC Bruzenak
` (2 more replies)
0 siblings, 3 replies; 37+ messages in thread
From: Jonathan Kelly @ 2008-08-12 17:49 UTC (permalink / raw)
To: Linux-audit; +Cc: William Kelly, Bret Piatt
[-- Attachment #1.1: Type: text/plain, Size: 4393 bytes --]
Hello,
When using the python auparse library to call AuParser.interpret_field()
on a multi-word field, only the first word in the field is returned.
Using get_field_str() instead of interpret_field() yields the same
output. I have verified that this issue exists in the C library, as
well as the Python. I suspect that this may be an issue for multi-word
fields in general, but have not noticed any other than 'op'.
Here is some sample code and input/output:
---
#/usr/bin/python
from auparse import *
parser = AuParser(AUSOURCE_LOGS)
parser.search_add_item("type", "=", "USER_CHAUTHTOK",
AUSEARCH_STOP_EVENT)
account_changes = []
while(parser.search_next_event() == True):
for record in range(parser.get_num_records()):
event = {}
event ['timestamp'] = parser.get_timestamp().sec
for field in range(parser.get_num_fields()):
key = parser.get_field_name()
value = parser.interpret_field()
event[key] = value
parser.next_field()
if event['type'] == 'USER_CHAUTHTOK':
account_changes.append(event)
parser.next_record()
parser.parse_next_event()
print account_changes
---
#include <auparse.h>
#include <stdio.h>
#include <libaudit.h>
int main(void)
{
auparse_state_t *au = auparse_init(AUSOURCE_LOGS, NULL);
if (au == NULL)
exit(1);
if ( ausearch_add_item(au, "type", "=", "USER_CHAUTHTOK",
AUSEARCH_RULE_CLEAR))
exit(1);
if ( ausearch_set_stop(au, AUSEARCH_STOP_EVENT) )
exit(1);
while (ausearch_next_event(au) > 0) {
if (auparse_find_field(au, "op")) {
printf("interpret: op=%s\n",
auparse_interpret_field(au));
printf("str: op=%s\n",
auparse_get_field_str(au));
}
auparse_next_event(au);
}
auparse_destroy(au);
return 0;
}
---
(audit.log)
type=USER_CHAUTHTOK msg=audit(1218562665.856:1103638): user pid=13396
uid=0 auid=502 msg='op=adding user acct=testuser exe="/usr/sbin/useradd"
(hostname=?, addr=?, terminal=pts/0 res=success)'
type=USER_CHAUTHTOK msg=audit(1218562665.895:1103662): user pid=13396
uid=0 auid=502 msg='op=adding home directory acct=testuser
exe="/usr/sbin/useradd" (hostname=?, addr=?, terminal=pts/0
res=success)'
type=USER_CHAUTHTOK msg=audit(1218562670.415:1103686): user pid=13401
uid=0 auid=502 msg='op=deleting user entries acct=testuser
exe="/usr/sbin/userdel" (hostname=?, addr=?, terminal=pts/0
res=success)'
type=USER_CHAUTHTOK msg=audit(1218562670.416:1103687): user pid=13401
uid=0 auid=502 msg='op=deleting group acct=testuser
exe="/usr/sbin/userdel" (hostname=?, addr=?, terminal=pts/0 res=failed)'
(python with full event)
{'auid': '502', 'exe': '"/usr/sbin/useradd"', 'uid': '0', 'timestamp':
1218562665, 'hostname': '?', 'pid': '13396', 'terminal': 'pts/0', 'res':
'success', 'addr': '?', 'acct': 'testuser', 'type': 'USER_CHAUTHTOK',
'op': 'adding'},
{'auid': '502', 'exe': '"/usr/sbin/useradd"', 'uid': '0', 'timestamp':
1218562665, 'hostname': '?', 'pid': '13396', 'terminal': 'pts/0', 'res':
'success', 'addr': '?', 'acct': 'testuser', 'type': 'USER_CHAUTHTOK',
'op': 'adding'},
{'auid': '502', 'exe': '"/usr/sbin/userdel"', 'uid': '0', 'timestamp':
1218562670, 'hostname': '?', 'pid': '13401', 'terminal': 'pts/0', 'res':
'success', 'addr': '?', 'acct': 'testuser', 'type': 'USER_CHAUTHTOK',
'op': 'deleting'},
{'auid': '502', 'exe': '"/usr/sbin/userdel"', 'uid': '0', 'timestamp':
1218562670, 'hostname': '?', 'pid': '13401', 'terminal': 'pts/0', 'res':
'failed', 'addr': '?', 'acct': 'testuser', 'type': 'USER_CHAUTHTOK',
'op': 'deleting'}]
(c with just op field)
interpret: op=adding
str: op=adding
interpret: op=adding
str: op=adding
interpret: op=deleting
str: op=deleting
interpret: op=deleting
str: op=deleting
---
Unfortunately, my C is a little too rusty for me to attempt a patch
myself, but I hope this gives you everything you need to get this fixed!
Best regards,
Jonathan Kelly
[-- Attachment #1.2: Type: text/html, Size: 11116 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 17:49 get_field_str() and interpret_field() bug with multi-word fields Jonathan Kelly
@ 2008-08-12 18:05 ` LC Bruzenak
2008-08-12 18:52 ` John Dennis
2008-08-12 18:16 ` John Dennis
2008-08-12 19:16 ` Steve Grubb
2 siblings, 1 reply; 37+ messages in thread
From: LC Bruzenak @ 2008-08-12 18:05 UTC (permalink / raw)
To: Jonathan Kelly; +Cc: William Kelly, Linux-audit, Bret Piatt
On Tue, 2008-08-12 at 12:49 -0500, Jonathan Kelly wrote:
> Hello,
>
>
>
> When using the python auparse library to call
> AuParser.interpret_field() on a multi-word field, only the first word
> in the field is returned. Using get_field_str() instead of
> interpret_field() yields the same output. I have verified that this
> issue exists in the C library, as well as the Python. I suspect that
> this may be an issue for multi-word fields in general, but have not
> noticed any other than 'op'.
>
>
Line forms here...see the following thread:
https://www.redhat.com/archives/linux-audit/2008-June/msg00005.html
LCB.
--
LC (Lenny) Bruzenak
lenny@magitekltd.com
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 18:05 ` LC Bruzenak
@ 2008-08-12 18:52 ` John Dennis
2008-08-12 19:02 ` LC Bruzenak
0 siblings, 1 reply; 37+ messages in thread
From: John Dennis @ 2008-08-12 18:52 UTC (permalink / raw)
To: LC Bruzenak; +Cc: William Kelly, Bret Piatt, Linux-audit
[-- Attachment #1.1: Type: text/plain, Size: 1361 bytes --]
LC Bruzenak wrote:
> On Tue, 2008-08-12 at 12:49 -0500, Jonathan Kelly wrote:
>
>> Hello,
>>
>>
>>
>> When using the python auparse library to call
>> AuParser.interpret_field() on a multi-word field, only the first word
>> in the field is returned. Using get_field_str() instead of
>> interpret_field() yields the same output. I have verified that this
>> issue exists in the C library, as well as the Python. I suspect that
>> this may be an issue for multi-word fields in general, but have not
>> noticed any other than 'op'.
>>
>>
>>
>
> Line forms here...see the following thread:
> https://www.redhat.com/archives/linux-audit/2008-June/msg00005.html
>
> LCB.
>
>
The line started a while ago ...
https://www.redhat.com/archives/linux-audit/2008-January/msg00082.html
(the discussion "While we're at it" is irrelevant to the current topic)
FWIW, I think the proper encoding should be that all string values are
enclosed in double quotes and the string encoding follows the same
backslash escaping defined for the C language which was subsequently
adopted by many other system components which would make it instantly
familiar and parseable by many tools. This would be a very simple and
welcome fix.
More complaints here:
https://www.redhat.com/archives/linux-audit/2008-June/msg00009.html
--
John Dennis <jdennis@redhat.com>
[-- Attachment #1.2: Type: text/html, Size: 2221 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 18:52 ` John Dennis
@ 2008-08-12 19:02 ` LC Bruzenak
0 siblings, 0 replies; 37+ messages in thread
From: LC Bruzenak @ 2008-08-12 19:02 UTC (permalink / raw)
To: John Dennis; +Cc: William Kelly, Bret Piatt, Linux-audit
On Tue, 2008-08-12 at 14:52 -0400, John Dennis wrote:
...
>
> FWIW, I think the proper encoding should be that all string values are
> enclosed in double quotes and the string encoding follows the same
> backslash escaping defined for the C language which was subsequently
> adopted by many other system components which would make it instantly
> familiar and parseable by many tools. This would be a very simple and
> welcome fix.
IIRC there were objections to the double-quotes but that would work for
me too.
LCB.
--
LC (Lenny) Bruzenak
lenny@magitekltd.com
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 17:49 get_field_str() and interpret_field() bug with multi-word fields Jonathan Kelly
2008-08-12 18:05 ` LC Bruzenak
@ 2008-08-12 18:16 ` John Dennis
2008-08-12 21:13 ` Steve Grubb
2008-08-12 19:16 ` Steve Grubb
2 siblings, 1 reply; 37+ messages in thread
From: John Dennis @ 2008-08-12 18:16 UTC (permalink / raw)
To: Jonathan Kelly; +Cc: William Kelly, Linux-audit, Bret Piatt
[-- Attachment #1.1: Type: text/plain, Size: 1544 bytes --]
Jonathan Kelly wrote:
>
> Hello,
>
>
>
> When using the python auparse library to call
> AuParser.interpret_field() on a multi-word field, only the first word
> in the field is returned. Using get_field_str() instead of
> interpret_field() yields the same output. I have verified that this
> issue exists in the C library, as well as the Python. I suspect that
> this may be an issue for multi-word fields in general, but have not
> noticed any other than 'op'.
>
The thing to note here is that only the characters up to the first white
space were included in the field.
Unfortunately string handling in audit is seriously broken and has been
for a long time. The audit code does not know how to handle strings with
embedded spaces, quotes, etc. The fundamental problem is the format for
string encoding was never defined. There is a horrible hack the kernel
uses when a string has a space in it, it converts the string to a
sequence of hex characters, thus there is no space in the value of the
key=value pair. Auparse has a hard coded list of keys it expects might
have hex encoded strings in it, if the key (msg in this instance) is in
the list then the interpret function will decode the hex string.
You might try encoding the msg in hex to see if it starts working.
Or you might try convincing the audit maintainers to adopt sane string
handling rules (but good luck on this front, there have been many many
complaints about this for a long time and nothing has happened :-(
--
John Dennis <jdennis@redhat.com>
[-- Attachment #1.2: Type: text/html, Size: 3652 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 18:16 ` John Dennis
@ 2008-08-12 21:13 ` Steve Grubb
2008-08-12 22:10 ` Matthew Booth
0 siblings, 1 reply; 37+ messages in thread
From: Steve Grubb @ 2008-08-12 21:13 UTC (permalink / raw)
To: linux-audit; +Cc: William Kelly, Bret Piatt
On Tuesday 12 August 2008 14:16:07 John Dennis wrote:
> Unfortunately string handling in audit is seriously broken and has been
> for a long time. The audit code does not know how to handle strings with
> embedded spaces, quotes, etc. The fundamental problem is the format for
> string encoding was never defined.
John, the format is well defined. It was decided on by a group of people lead
by an influential kernel developer. He has long since quit working on the
audit code, but it was his suggestion that this is the only thing that would
fly upstream. There was a suggestion to use XDR encoding/decoding, but that
was nixed. The main concern at the time was that they didn't want any fancy
scheme inside the kernel for a user space problem. They wanted compactness in
the algorithm.
The encoding scheme is simple, if any character in a field meets this test:
if (*p == '"' || *p < 0x21 || *p > 0x7f)
You have to encode. Its that simple. This scheme is unicode friendly as well
as taking care of spaces. I didn't invent this, but I have to ensure that
everything continues to work.
If somebody has a better idea/code in hand when we start the 2.0 code, I'd
like to consider it. The pre-requisites are it has to be backward compatible,
it has to handle unicode, it has to handle fields with odd characters.
-Steve
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 21:13 ` Steve Grubb
@ 2008-08-12 22:10 ` Matthew Booth
2008-08-12 23:01 ` Eric Paris
0 siblings, 1 reply; 37+ messages in thread
From: Matthew Booth @ 2008-08-12 22:10 UTC (permalink / raw)
To: Steve Grubb; +Cc: William Kelly, linux-audit, Bret Piatt
Steve Grubb wrote:
> If somebody has a better idea/code in hand when we start the 2.0 code, I'd
> like to consider it. The pre-requisites are it has to be backward compatible,
> it has to handle unicode, it has to handle fields with odd characters.
I have thought for some time now that the kernel would do better to
produce binary records. This would have many advantages, including:
* Very simple parsing
* Much faster to parse
* Faster to produce
* Much easier to specify
The production of text would then be the problem of the audit daemon. If
the current text based nightmare were frozen, they could even live
side-by-side.
Matt
--
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services
M: +44 (0)7977 267231
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 22:10 ` Matthew Booth
@ 2008-08-12 23:01 ` Eric Paris
0 siblings, 0 replies; 37+ messages in thread
From: Eric Paris @ 2008-08-12 23:01 UTC (permalink / raw)
To: Matthew Booth; +Cc: William Kelly, Bret Piatt, linux-audit
On Tue, 2008-08-12 at 23:10 +0100, Matthew Booth wrote:
> Steve Grubb wrote:
> > If somebody has a better idea/code in hand when we start the 2.0 code, I'd
> > like to consider it. The pre-requisites are it has to be backward compatible,
> > it has to handle unicode, it has to handle fields with odd characters.
>
> I have thought for some time now that the kernel would do better to
> produce binary records. This would have many advantages, including:
>
> * Very simple parsing
> * Much faster to parse
> * Faster to produce
> * Much easier to specify
>
> The production of text would then be the problem of the audit daemon. If
> the current text based nightmare were frozen, they could even live
> side-by-side.
I've heard this binary audit data talk before. What would it actually
look like?
I'm perfectly fine if someone comes up with some patches that make
wholesale interface changes, but you better be d@m^ sure that I can run
that kernel on RHEL5 and it will work.
-Eric
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 17:49 get_field_str() and interpret_field() bug with multi-word fields Jonathan Kelly
2008-08-12 18:05 ` LC Bruzenak
2008-08-12 18:16 ` John Dennis
@ 2008-08-12 19:16 ` Steve Grubb
2008-08-12 19:58 ` John Dennis
2 siblings, 1 reply; 37+ messages in thread
From: Steve Grubb @ 2008-08-12 19:16 UTC (permalink / raw)
To: linux-audit; +Cc: William Kelly, Bret Piatt
On Tuesday 12 August 2008 13:49:37 Jonathan Kelly wrote:
> When using the python auparse library to call AuParser.interpret_field()
> on a multi-word field, only the first word in the field is returned.
> Using get_field_str() instead of interpret_field() yields the same
> output. I have verified that this issue exists in the C library, as
> well as the Python. I suspect that this may be an issue for multi-word
> fields in general, but have not noticed any other than 'op'.
I am going to expose the encoder in libaudit in the next release so that
fields that have spaces can be escaped such that the parser can handle it.
Since the next release or two will be going into RHEL5, any other changes are
a non-starter.
-Steve
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 19:16 ` Steve Grubb
@ 2008-08-12 19:58 ` John Dennis
2008-08-12 20:11 ` Eric Paris
0 siblings, 1 reply; 37+ messages in thread
From: John Dennis @ 2008-08-12 19:58 UTC (permalink / raw)
To: Steve Grubb; +Cc: William Kelly, linux-audit, Bret Piatt
[-- Attachment #1.1: Type: text/plain, Size: 1836 bytes --]
Steve Grubb wrote:
> On Tuesday 12 August 2008 13:49:37 Jonathan Kelly wrote:
>
>> When using the python auparse library to call AuParser.interpret_field()
>> on a multi-word field, only the first word in the field is returned.
>> Using get_field_str() instead of interpret_field() yields the same
>> output. I have verified that this issue exists in the C library, as
>> well as the Python. I suspect that this may be an issue for multi-word
>> fields in general, but have not noticed any other than 'op'.
>>
>
> I am going to expose the encoder in libaudit in the next release so that
> fields that have spaces can be escaped such that the parser can handle it.
> Since the next release or two will be going into RHEL5, any other changes are
> a non-starter.
>
This is another band-aid compounding the problem by entrenching the hack
further into more code which will then have to be modified when a
correct solution is finally put into place.
Providing an encoder entry point for code which generates audit
key/value pairs does not fix the ambiguous parsing problem. During
parsing one still has to know a priori which field names in which
records are candidates for the hack, it is impossible to correctly parse
otherwise. It also does not address the problem that decoding the hex
value is only performed by "interpret", which is also the wrong
location, it needs to be done during parsing and the field value needs
to be returned with the string restored to it's correct value prior to
transport encoding.
So many people have complained about this; I do not understand the
resistance to fixing it. The argument it would break something which is
broken to begin with does not seem like a reasonable justification to
me. The sooner it's fixed the better IMHO.
--
John Dennis <jdennis@redhat.com>
[-- Attachment #1.2: Type: text/html, Size: 2394 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 19:58 ` John Dennis
@ 2008-08-12 20:11 ` Eric Paris
2008-08-12 20:32 ` Steve Grubb
2008-08-12 20:57 ` John Dennis
0 siblings, 2 replies; 37+ messages in thread
From: Eric Paris @ 2008-08-12 20:11 UTC (permalink / raw)
To: John Dennis; +Cc: William Kelly, Bret Piatt, linux-audit
On Tue, 2008-08-12 at 15:58 -0400, John Dennis wrote:
> So many people have complained about this; I do not understand the
> resistance to fixing it. The argument it would break something which
> is broken to begin with does not seem like a reasonable justification
> to me. The sooner it's fixed the better IMHO.
Show me the code and I'll start trying to fix the kernel based on that
code as best we can. But before you start read over the article
Can user-space bugs be kernel regressions?
http://lwn.net/Articles/292143/
As soon as you grasp that article send me the code and we'll work
together to fix this problem!
-Eric
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 20:11 ` Eric Paris
@ 2008-08-12 20:32 ` Steve Grubb
2008-08-12 21:09 ` John Dennis
2008-08-13 0:33 ` Klaus Heinrich Kiwi
2008-08-12 20:57 ` John Dennis
1 sibling, 2 replies; 37+ messages in thread
From: Steve Grubb @ 2008-08-12 20:32 UTC (permalink / raw)
To: Eric Paris; +Cc: William Kelly, linux-audit, Bret Piatt
On Tuesday 12 August 2008 16:11:42 Eric Paris wrote:
> As soon as you grasp that article send me the code and we'll work
> together to fix this problem!
And any code created needs to be backwards compatible. you could have new user
space/new kernel, or new user space/old kernel, and new kernel/old user
space. You have no way of dictating which versions of anything people will
use.
-Steve
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 20:32 ` Steve Grubb
@ 2008-08-12 21:09 ` John Dennis
2008-08-12 21:24 ` Steve Grubb
2008-08-13 0:33 ` Klaus Heinrich Kiwi
1 sibling, 1 reply; 37+ messages in thread
From: John Dennis @ 2008-08-12 21:09 UTC (permalink / raw)
To: Steve Grubb; +Cc: William Kelly, linux-audit, Bret Piatt
[-- Attachment #1.1: Type: text/plain, Size: 1019 bytes --]
Steve Grubb wrote:
> On Tuesday 12 August 2008 16:11:42 Eric Paris wrote:
>
>> As soon as you grasp that article send me the code and we'll work
>> together to fix this problem!
>>
>
> And any code created needs to be backwards compatible. you could have new user
> space/new kernel, or new user space/old kernel, and new kernel/old user
> space.
The fact you can have any combination of kernel, user code, and
historical log files is precisely why this need to be fixed ASAP. Why?
Because there is no value in being backwards compatible with a data
stream you can't read when any of the three components (kernel, user
libraries, files) are permuted.
The longer we archive log data with this problem the worse the problem
gets. The fact the triplet <kernel, user libraries, file> have to be in
sync is untenable in the long run.
> You have no way of dictating which versions of anything people will
> use.
>
>
Thank you for making my point :-)
> -Steve
>
--
John Dennis <jdennis@redhat.com>
[-- Attachment #1.2: Type: text/html, Size: 1760 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 21:09 ` John Dennis
@ 2008-08-12 21:24 ` Steve Grubb
2008-08-12 22:37 ` John Dennis
0 siblings, 1 reply; 37+ messages in thread
From: Steve Grubb @ 2008-08-12 21:24 UTC (permalink / raw)
To: John Dennis; +Cc: William Kelly, linux-audit, Bret Piatt
On Tuesday 12 August 2008 17:09:18 John Dennis wrote:
> The fact you can have any combination of kernel, user code, and
> historical log files is precisely why this need to be fixed ASAP. Why?
> Because there is no value in being backwards compatible with a data
> stream you can't read when any of the three components (kernel, user
> libraries, files) are permuted.
John, you are very wrong here. We are about to role out remote logging for the
audit system. Anyone who works on production systems knows that they stay
deployed for many years because re-deploying takes manhours and is therefore
a cost sink. The less you touch a system, the better off you are financially.
So, in the future you will likely have a RHEL6 machine aggregating RHEL5
machines. They will not be happy if they find that they have to upgrade all
the machines just to do reports. There's no way I'm going to tell people we
are cutting you off, you have to upgrade.
-Steve
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 21:24 ` Steve Grubb
@ 2008-08-12 22:37 ` John Dennis
0 siblings, 0 replies; 37+ messages in thread
From: John Dennis @ 2008-08-12 22:37 UTC (permalink / raw)
To: Steve Grubb; +Cc: William Kelly, linux-audit, Bret Piatt
[-- Attachment #1.1: Type: text/plain, Size: 2339 bytes --]
Steve Grubb wrote:
> On Tuesday 12 August 2008 17:09:18 John Dennis wrote:
>
>> The fact you can have any combination of kernel, user code, and
>> historical log files is precisely why this need to be fixed ASAP. Why?
>> Because there is no value in being backwards compatible with a data
>> stream you can't read when any of the three components (kernel, user
>> libraries, files) are permuted.
>>
>
> John, you are very wrong here.
I respectfully disagree.
> We are about to role out remote logging for the
> audit system. ... So, in the future you will likely have a RHEL6 machine aggregating RHEL5
> machines.
This is exactly the problem I trying to avoid. Once the log data is
divorced from the user space tools necessary to correctly parse it there
are going to be enormous problems.
Let me be clear, I'm worried about the scenario where an audit log file
was archived from some random system in MegaCorp, then many years later
an auditor investigating MegaCorp decides that log file has critical
information in it. Is MegaCorp going to be able to satisfy the
regulatory requirements to correctly extract the audit data when the
sys-admin who set up the logging left the company years ago, the
information about the system has since been lost, the system has since
been re-installed with a new OS, and no one bothered to archive the
matching version of auparse with the log file?
Don't forget, many auditing regulations require the raw log data to be
preserved, not an interpreted version of the log data. This means one
cannot just run auparse over the file to re-format it prior to archiving
it unless one is willing to store two copies, the raw file and an
interpreted version. People don't want to store two versions of data for
obvious reasons. They want to store the raw data and correctly read it
at any point in the future with one tool. The current scheme does not
satisfy those requirements, nor is it scalable.
I believe it's an absolute requirement that audit log files can be
correctly parsed independent of any external information.
> They will not be happy if they find that they have to upgrade all
> the machines just to do reports. There's no way I'm going to tell people we
> are cutting you off, you have to upgrade.
>
> -Steve
>
--
John Dennis <jdennis@redhat.com>
[-- Attachment #1.2: Type: text/html, Size: 3187 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 20:32 ` Steve Grubb
2008-08-12 21:09 ` John Dennis
@ 2008-08-13 0:33 ` Klaus Heinrich Kiwi
2008-08-13 15:09 ` Eric Paris
2008-08-13 22:35 ` Casey Schaufler
1 sibling, 2 replies; 37+ messages in thread
From: Klaus Heinrich Kiwi @ 2008-08-13 0:33 UTC (permalink / raw)
To: Steve Grubb; +Cc: William Kelly, linux-audit, Bret Piatt
On Tue, 2008-08-12 at 16:32 -0400, Steve Grubb wrote:
>
> And any code created needs to be backwards compatible. you could have new user
> space/new kernel, or new user space/old kernel, and new kernel/old user
> space. You have no way of dictating which versions of anything people will
> use.
But isn't this the exact situation we face now? I remember people asking
in this list about audit for RHEL4, and in the end the problem was that
they had their kernel upgraded so userspace and kernel were not in
sync...
I think that if we take this discussion to extremes, we'd be talking
about a 'self-descriptive meta language' so that upgrades to
userspace/kernel are well covered (can you say "xml"?)
On the other hand, I do agree that the way it's currently implemented is
prone to error when it comes to reporting/analyzing data. e.g., I
remember fixing a 'mode' field in an audit object where it was being
displayed as hex where we would expect an octal. In the future, when
parsing this field, how would I know which radix a mode=003 field is?
Fundamentally, was the record generated in patched kernel or not? If we
take this change applied to every kernel and userspace change of the
audit records, how can auparse possibly cover all cases?
I also feel that stricter message format rules for userspace would help.
Nowadays is difficult to 'reconstruct' the meaning of an audit record
just by looking at their fields. Some fields don't even have a value,
other use the same field twice in the same record. And for most people
wanting to analyze audit data the fields=value pairs are the only
reliable source.
-Klaus
--
Klaus Heinrich Kiwi <klausk@linux.vnet.ibm.com>
Linux Security Development, IBM Linux Technology Center
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-13 0:33 ` Klaus Heinrich Kiwi
@ 2008-08-13 15:09 ` Eric Paris
2008-08-13 16:25 ` Klaus Heinrich Kiwi
2008-08-13 16:29 ` John Dennis
2008-08-13 22:35 ` Casey Schaufler
1 sibling, 2 replies; 37+ messages in thread
From: Eric Paris @ 2008-08-13 15:09 UTC (permalink / raw)
To: Klaus Heinrich Kiwi; +Cc: William Kelly, Bret Piatt, linux-audit
On Tue, 2008-08-12 at 21:33 -0300, Klaus Heinrich Kiwi wrote:
> On Tue, 2008-08-12 at 16:32 -0400, Steve Grubb wrote:
> >
> > And any code created needs to be backwards compatible. you could have new user
> > space/new kernel, or new user space/old kernel, and new kernel/old user
> > space. You have no way of dictating which versions of anything people will
> > use.
>
> But isn't this the exact situation we face now? I remember people asking
> in this list about audit for RHEL4, and in the end the problem was that
> they had their kernel upgraded so userspace and kernel were not in
> sync...
As I recall in the RHEL4 case the problem was the RH was carrying their
own audit patches and so we didn't have to deal with upstream policies.
Look at how strict those upstream policies. It doesn't even matter if
backwards compatibility for BUGS are broken, new kernel + ancient
userspace MUST be able to work somehow.
> I think that if we take this discussion to extremes, we'd be talking
> about a 'self-descriptive meta language' so that upgrades to
> userspace/kernel are well covered (can you say "xml"?)
HAHAHA, kernel output xml? dream on :) I'm willing to do wholesale
output changes, but something that heavy in kernel is impossible to
push. I can just see Al cussing up a storm as he read that.
> On the other hand, I do agree that the way it's currently implemented is
> prone to error when it comes to reporting/analyzing data. e.g., I
> remember fixing a 'mode' field in an audit object where it was being
> displayed as hex where we would expect an octal. In the future, when
> parsing this field, how would I know which radix a mode=003 field is?
> Fundamentally, was the record generated in patched kernel or not? If we
> take this change applied to every kernel and userspace change of the
> audit records, how can auparse possibly cover all cases?
>
> I also feel that stricter message format rules for userspace would help.
> Nowadays is difficult to 'reconstruct' the meaning of an audit record
> just by looking at their fields. Some fields don't even have a value,
> other use the same field twice in the same record. And for most people
> wanting to analyze audit data the fields=value pairs are the only
> reliable source.
I'm all for this, but I still firmly believe that if I am a user of the
audit subsystem and I decide to use my own ugly format that completely
sucks its not audit's fault when it can't parse those things. aka if
you can't use auparse to look at SELinux records, too bad, that's
SELinux's problem to parse them. But everything the audit system
controls could and should be moved to a better standard.
-Eric
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-13 15:09 ` Eric Paris
@ 2008-08-13 16:25 ` Klaus Heinrich Kiwi
2008-08-13 17:02 ` Steve Grubb
2008-08-14 18:25 ` Stephen Smalley
2008-08-13 16:29 ` John Dennis
1 sibling, 2 replies; 37+ messages in thread
From: Klaus Heinrich Kiwi @ 2008-08-13 16:25 UTC (permalink / raw)
To: Eric Paris; +Cc: William Kelly, Bret Piatt, linux-audit
On Wed, 2008-08-13 at 11:09 -0400, Eric Paris wrote:
> HAHAHA, kernel output xml? dream on :) I'm willing to do
> wholesale
> output changes, but something that heavy in kernel is impossible to
> push. I can just see Al cussing up a storm as he read that.
That's exactly my point. There's no sense in discussing a 'ideal' format
for audit stream coming out of the kernel, since it's well agreed
(thankfully) that the kernel part should be as minimal as possible.
I like Mathew's idea of having a binary format though. Maybe it's
possible to carry the legacy format for some time while we have a more
robust (and extensible) binary format in parallel? And then having a
binary format version tag within each record?
I know I know, at the time I have more questions than answers. I only
wanted to express my feeling that there is indeed a problem with the
current format.
I know you and Steve tried before to talk with the SELinux guys trying
to have a saner format for AVCs and stuff. Do you feel that's an
impossible barrier to cross or maybe we try again and convince them that
stricter formatting rules will bring more users for their audit data?
-Klaus
--
Klaus Heinrich Kiwi <klausk@linux.vnet.ibm.com>
Linux Security Development, IBM Linux Technology Center
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-13 16:25 ` Klaus Heinrich Kiwi
@ 2008-08-13 17:02 ` Steve Grubb
2008-08-13 17:30 ` LC Bruzenak
2008-08-13 18:49 ` Linda Knippers
2008-08-14 18:25 ` Stephen Smalley
1 sibling, 2 replies; 37+ messages in thread
From: Steve Grubb @ 2008-08-13 17:02 UTC (permalink / raw)
To: Klaus Heinrich Kiwi; +Cc: William Kelly, linux-audit, Bret Piatt
On Wednesday 13 August 2008 12:25:09 Klaus Heinrich Kiwi wrote:
> I like Mathew's idea of having a binary format though. Maybe it's
> possible to carry the legacy format for some time while we have a more
> robust (and extensible) binary format in parallel? And then having a
> binary format version tag within each record?
Yes, there would have to be a migration path. I think we talked about XDR as a
possibility 4 years ago because its already inside the kernel. The kernel
guys at the time wanted to re-use something already inside or something that
was compact in its representation.
What I believe lead to text based was the general feeling that logs should be
human readable with less, tail, or vi if need be.
A problem with binary representations will be what happens with aggregated
big-endian and little-endian system logs?
> I know I know, at the time I have more questions than answers. I only
> wanted to express my feeling that there is indeed a problem with the
> current format.
There is a problem with any format. How would changing to binary help when we
realize that we forgot auid in CONFIG_CHANGE? The only thing that might help
is to stab a version number into each record because its size is going to
change. This is going to lead to much more complex code in the parser.
The current technique is flexible in that the field is either there or not but
it parses either way. For example, we recently added ses to syscall records.
The auparse library can handle it being there or not. Now and in the future.
The application that uses those logs may have to decide whether that's
important or not. I don't think that is a judgment call for a library to
make.
In a binary representation, you would have a version number to describe what
structure to cast the pointer to. If you have new log with old user space, it
won't parse because it won't have the template to cast with.
> I know you and Steve tried before to talk with the SELinux guys trying
> to have a saner format for AVCs and stuff. Do you feel that's an
> impossible barrier to cross or maybe we try again and convince them that
> stricter formatting rules will bring more users for their audit data?
I don't know their recent thoughts on this. USER_AVC is seriously broken and
unusable. I've been thinking about linking auparse to sending user space
events to make sure that only parsable events are sent (it would go to syslog
with an error so that its not lost forever). No app should consider sending
an event as a performance impact, so this should be doable - but not in the
1.7.x seres. :)
-Steve
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-13 17:02 ` Steve Grubb
@ 2008-08-13 17:30 ` LC Bruzenak
2008-08-13 18:49 ` Linda Knippers
1 sibling, 0 replies; 37+ messages in thread
From: LC Bruzenak @ 2008-08-13 17:30 UTC (permalink / raw)
To: Steve Grubb; +Cc: William Kelly, linux-audit, Bret Piatt
On Wed, 2008-08-13 at 13:02 -0400, Steve Grubb wrote:
>
>
> There is a problem with any format. How would changing to binary help
> when we realize that we forgot auid in CONFIG_CHANGE? The only thing
> that might help is to stab a version number into each record because
> its size is going to change. This is going to lead to much more
> complex code in the parser.
I was thinking along those lines also - that if there was an identifier
embedded in the audit data that the parser could read, it could know if
it could parse it or not.
Then the matching parser format could be used if it were separated out
as a plugin. Not sure it would need to be each record but maybe just in
the auditd start event?
Maybe this isn't really practicable per se, but if there is a way to
future-proof the data/format pair rather than maintain perpetual
backwards compatibility over advancements I'd think it worthwhile.
LCB.
--
LC (Lenny) Bruzenak
lenny@magitekltd.com
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-13 17:02 ` Steve Grubb
2008-08-13 17:30 ` LC Bruzenak
@ 2008-08-13 18:49 ` Linda Knippers
2008-08-13 19:58 ` John Dennis
1 sibling, 1 reply; 37+ messages in thread
From: Linda Knippers @ 2008-08-13 18:49 UTC (permalink / raw)
To: Steve Grubb; +Cc: William Kelly, linux-audit, Bret Piatt
Steve Grubb wrote:
> On Wednesday 13 August 2008 12:25:09 Klaus Heinrich Kiwi wrote:
>> I like Mathew's idea of having a binary format though. Maybe it's
>> possible to carry the legacy format for some time while we have a more
>> robust (and extensible) binary format in parallel? And then having a
>> binary format version tag within each record?
>
> Yes, there would have to be a migration path. I think we talked about XDR as a
> possibility 4 years ago because its already inside the kernel. The kernel
> guys at the time wanted to re-use something already inside or something that
> was compact in its representation.
>
> What I believe lead to text based was the general feeling that logs should be
> human readable with less, tail, or vi if need be.
LAuS had a binary record format and I don't recall people complaining
about it.
> A problem with binary representations will be what happens with aggregated
> big-endian and little-endian system logs?
Just define a standard.
>
>> I know I know, at the time I have more questions than answers. I only
>> wanted to express my feeling that there is indeed a problem with the
>> current format.
>
> There is a problem with any format. How would changing to binary help when we
> realize that we forgot auid in CONFIG_CHANGE? The only thing that might help
> is to stab a version number into each record because its size is going to
> change. This is going to lead to much more complex code in the parser.
>
> The current technique is flexible in that the field is either there or not but
> it parses either way. For example, we recently added ses to syscall records.
> The auparse library can handle it being there or not. Now and in the future.
> The application that uses those logs may have to decide whether that's
> important or not. I don't think that is a judgment call for a library to
> make.
>
> In a binary representation, you would have a version number to describe what
> structure to cast the pointer to. If you have new log with old user space, it
> won't parse because it won't have the template to cast with.
Is that any different from not being able to parse something the tools
don't know about?
-- ljk
>
>
>> I know you and Steve tried before to talk with the SELinux guys trying
>> to have a saner format for AVCs and stuff. Do you feel that's an
>> impossible barrier to cross or maybe we try again and convince them that
>> stricter formatting rules will bring more users for their audit data?
>
> I don't know their recent thoughts on this. USER_AVC is seriously broken and
> unusable. I've been thinking about linking auparse to sending user space
> events to make sure that only parsable events are sent (it would go to syslog
> with an error so that its not lost forever). No app should consider sending
> an event as a performance impact, so this should be doable - but not in the
> 1.7.x seres. :)
>
> -Steve
>
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-13 18:49 ` Linda Knippers
@ 2008-08-13 19:58 ` John Dennis
0 siblings, 0 replies; 37+ messages in thread
From: John Dennis @ 2008-08-13 19:58 UTC (permalink / raw)
To: Linda Knippers; +Cc: William Kelly, Bret Piatt, linux-audit
Linda Knippers wrote:
> Steve Grubb wrote:
>> In a binary representation, you would have a version number to
>> describe what structure to cast the pointer to. If you have new log
>> with old user space, it won't parse because it won't have the
>> template to cast with.
>
> Is that any different from not being able to parse something the tools
> don't know about?
It's useful to distinguish between two entirely different concepts which
are at play here, but unfortunately get confused and intermingled,
parsing and interpretation. A well designed protocol is always parsable
by any version of the parser and any version of the input stream. This
can be achieved because the protocol stream is well defined and any
unknown protocol elements can be "stepped over". Once parsed any given
protocol element is subject to interpretation, this is version specific.
For example in the v2 protocol a "security identifier" (i.e. sid) might
have been added, only a v2 tool could properly interpret the "sid" but a
v1 parser could still parse the v2 stream (in fact a v1 parser should
also be able to know the type of the unknown "sid", e.g. integer,
string, etc.)
Extensible protocol design is a mature (and relatively simple) computer
science discipline. The paradigm most likely to be familiar to people is
ASN.1, but there are a host of other equally valid approaches, both
binary and text based (I am not advocating for any given protocol design
paradigm, but I do advocate we adopt one).
The reality is the audit stream is a protocol. The problem is the audit
stream never had the principles of protocol design applied to it. We've
tried to compensate for the lack of protocol rigorousness in the data
stream by building a parser with special case exceptions and heuristics
which is inherently unsustainable. If instead the audit data stream
followed the rigours of a network protocol most of these issues would
simply vanish.
--
John Dennis <jdennis@redhat.com>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-13 16:25 ` Klaus Heinrich Kiwi
2008-08-13 17:02 ` Steve Grubb
@ 2008-08-14 18:25 ` Stephen Smalley
2008-08-15 13:58 ` Matteo Michelini
1 sibling, 1 reply; 37+ messages in thread
From: Stephen Smalley @ 2008-08-14 18:25 UTC (permalink / raw)
To: Klaus Heinrich Kiwi; +Cc: linux-audit, Bret Piatt
On Wed, 2008-08-13 at 13:25 -0300, Klaus Heinrich Kiwi wrote:
> On Wed, 2008-08-13 at 11:09 -0400, Eric Paris wrote:
> > HAHAHA, kernel output xml? dream on :) I'm willing to do
> > wholesale
> > output changes, but something that heavy in kernel is impossible to
> > push. I can just see Al cussing up a storm as he read that.
>
> That's exactly my point. There's no sense in discussing a 'ideal' format
> for audit stream coming out of the kernel, since it's well agreed
> (thankfully) that the kernel part should be as minimal as possible.
>
> I like Mathew's idea of having a binary format though. Maybe it's
> possible to carry the legacy format for some time while we have a more
> robust (and extensible) binary format in parallel? And then having a
> binary format version tag within each record?
>
> I know I know, at the time I have more questions than answers. I only
> wanted to express my feeling that there is indeed a problem with the
> current format.
>
> I know you and Steve tried before to talk with the SELinux guys trying
> to have a saner format for AVCs and stuff. Do you feel that's an
> impossible barrier to cross or maybe we try again and convince them that
> stricter formatting rules will bring more users for their audit data?
If you want to ask the "SELinux guys", ask on the selinux@tycho.nsa.gov
list. But in this case: we've always been willing to take changes to
the AVC audit format; we have merely pointed out that it has to be done
in a way that provides full backward compatibility both in kernel and in
the userland, as we are not allowed to break existing userland with new
kernel and we'd like new userland to still work on old kernels. Patches
that meet those standards accepted.
--
Stephen Smalley
National Security Agency
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-14 18:25 ` Stephen Smalley
@ 2008-08-15 13:58 ` Matteo Michelini
2008-08-15 14:10 ` Steve Grubb
2008-08-15 14:15 ` Stephen Smalley
0 siblings, 2 replies; 37+ messages in thread
From: Matteo Michelini @ 2008-08-15 13:58 UTC (permalink / raw)
To: Stephen Smalley; +Cc: linux-audit, Bret Piatt
I'm working on a binary format for the linux-audit system as part of a
university research project.
The goal is having something similar to BSM trails.
What do you think about it?
2008/8/14, Stephen Smalley <sds@tycho.nsa.gov>:
>
> On Wed, 2008-08-13 at 13:25 -0300, Klaus Heinrich Kiwi wrote:
>> On Wed, 2008-08-13 at 11:09 -0400, Eric Paris wrote:
>> > HAHAHA, kernel output xml? dream on :) I'm willing to do
>> > wholesale
>> > output changes, but something that heavy in kernel is impossible to
>> > push. I can just see Al cussing up a storm as he read that.
>>
>> That's exactly my point. There's no sense in discussing a 'ideal' format
>> for audit stream coming out of the kernel, since it's well agreed
>> (thankfully) that the kernel part should be as minimal as possible.
>>
>> I like Mathew's idea of having a binary format though. Maybe it's
>> possible to carry the legacy format for some time while we have a more
>> robust (and extensible) binary format in parallel? And then having a
>> binary format version tag within each record?
>>
>> I know I know, at the time I have more questions than answers. I only
>> wanted to express my feeling that there is indeed a problem with the
>> current format.
>>
>> I know you and Steve tried before to talk with the SELinux guys trying
>> to have a saner format for AVCs and stuff. Do you feel that's an
>> impossible barrier to cross or maybe we try again and convince them that
>> stricter formatting rules will bring more users for their audit data?
>
> If you want to ask the "SELinux guys", ask on the selinux@tycho.nsa.gov
> list. But in this case: we've always been willing to take changes to
> the AVC audit format; we have merely pointed out that it has to be done
> in a way that provides full backward compatibility both in kernel and in
> the userland, as we are not allowed to break existing userland with new
> kernel and we'd like new userland to still work on old kernels. Patches
> that meet those standards accepted.
>
> --
> Stephen Smalley
> National Security Agency
>
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit
>
--
Matteo Michelini (Milan - Italy)
http://www.michelini.co.uk
Linux registered user: #332873
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-15 13:58 ` Matteo Michelini
@ 2008-08-15 14:10 ` Steve Grubb
2008-08-15 15:27 ` Matteo Michelini
2008-08-15 14:15 ` Stephen Smalley
1 sibling, 1 reply; 37+ messages in thread
From: Steve Grubb @ 2008-08-15 14:10 UTC (permalink / raw)
To: linux-audit; +Cc: Bret Piatt
On Friday 15 August 2008 09:58:54 Matteo Michelini wrote:
> I'm working on a binary format for the linux-audit system as part of a
> university research project.
Big-endian/little-endian in aggregated logs? Will the kernel authors allow the
encoder in the kernel? XDR was the only option we had last time. Versioning
of structs? How do old user space tools work with new kernel that may change
layout? Patents?
-Steve
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-15 14:10 ` Steve Grubb
@ 2008-08-15 15:27 ` Matteo Michelini
0 siblings, 0 replies; 37+ messages in thread
From: Matteo Michelini @ 2008-08-15 15:27 UTC (permalink / raw)
To: Steve Grubb; +Cc: linux-audit, Bret Piatt
2008/8/15, Steve Grubb <sgrubb@redhat.com>:
> On Friday 15 August 2008 09:58:54 Matteo Michelini wrote:
>> I'm working on a binary format for the linux-audit system as part of a
>> university research project.
>
> Big-endian/little-endian in aggregated logs? Will the kernel authors allow
> the
> encoder in the kernel? XDR was the only option we had last time. Versioning
> of structs? How do old user space tools work with new kernel that may change
> layout? Patents?
>
I must design and implement something that is really close to the
FreeBSD BSM implementation, because in userspace we have a tool (an
IDS) that works with BSM trails format only.
I'm designing the patch with the big-endian encoding format.
My idea is only to add this capability to the existing text-based format.
The FreeBSD BSM implementation is BSD License..
> -Steve
>
--
Matteo Michelini (Milan - Italy)
http://www.michelini.co.uk
Linux registered user: #332873
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-15 13:58 ` Matteo Michelini
2008-08-15 14:10 ` Steve Grubb
@ 2008-08-15 14:15 ` Stephen Smalley
1 sibling, 0 replies; 37+ messages in thread
From: Stephen Smalley @ 2008-08-15 14:15 UTC (permalink / raw)
To: Matteo Michelini; +Cc: linux-audit, Bret Piatt
On Fri, 2008-08-15 at 15:58 +0200, Matteo Michelini wrote:
> I'm working on a binary format for the linux-audit system as part of a
> university research project.
>
> The goal is having something similar to BSM trails.
> What do you think about it?
If your question is whether we would be ok with converting SELinux avc
messages and other SELinux audit messages to a binary format if the rest
of the audit system converted over to a binary format, then I think we'd
be fine with that as long as there was a system setting that preserved
the old text-based audit format for compatibility with existing
userland. And obviously someone would have to do the work of converting
SELinux userland to either understand the new format or to always use
audit interfaces or utilities that internally convert the new binary
format to the old text-based format (e.g. audit2allow is typically fed
ausearch output, so as long as that remains text, audit2allow doesn't
have to care what the raw format is).
>
> 2008/8/14, Stephen Smalley <sds@tycho.nsa.gov>:
> >
> > On Wed, 2008-08-13 at 13:25 -0300, Klaus Heinrich Kiwi wrote:
> >> On Wed, 2008-08-13 at 11:09 -0400, Eric Paris wrote:
> >> > HAHAHA, kernel output xml? dream on :) I'm willing to do
> >> > wholesale
> >> > output changes, but something that heavy in kernel is impossible to
> >> > push. I can just see Al cussing up a storm as he read that.
> >>
> >> That's exactly my point. There's no sense in discussing a 'ideal' format
> >> for audit stream coming out of the kernel, since it's well agreed
> >> (thankfully) that the kernel part should be as minimal as possible.
> >>
> >> I like Mathew's idea of having a binary format though. Maybe it's
> >> possible to carry the legacy format for some time while we have a more
> >> robust (and extensible) binary format in parallel? And then having a
> >> binary format version tag within each record?
> >>
> >> I know I know, at the time I have more questions than answers. I only
> >> wanted to express my feeling that there is indeed a problem with the
> >> current format.
> >>
> >> I know you and Steve tried before to talk with the SELinux guys trying
> >> to have a saner format for AVCs and stuff. Do you feel that's an
> >> impossible barrier to cross or maybe we try again and convince them that
> >> stricter formatting rules will bring more users for their audit data?
> >
> > If you want to ask the "SELinux guys", ask on the selinux@tycho.nsa.gov
> > list. But in this case: we've always been willing to take changes to
> > the AVC audit format; we have merely pointed out that it has to be done
> > in a way that provides full backward compatibility both in kernel and in
> > the userland, as we are not allowed to break existing userland with new
> > kernel and we'd like new userland to still work on old kernels. Patches
> > that meet those standards accepted.
> >
> > --
> > Stephen Smalley
> > National Security Agency
> >
> > --
> > Linux-audit mailing list
> > Linux-audit@redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-audit
> >
>
>
--
Stephen Smalley
National Security Agency
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-13 15:09 ` Eric Paris
2008-08-13 16:25 ` Klaus Heinrich Kiwi
@ 2008-08-13 16:29 ` John Dennis
1 sibling, 0 replies; 37+ messages in thread
From: John Dennis @ 2008-08-13 16:29 UTC (permalink / raw)
To: Eric Paris; +Cc: William Kelly, linux-audit, Bret Piatt
[-- Attachment #1.1: Type: text/plain, Size: 1558 bytes --]
Eric Paris wrote:
> On Tue, 2008-08-12 at 21:33 -0300, Klaus Heinrich Kiwi wrote:
>
>> I think that if we take this discussion to extremes, we'd be talking
>> about a 'self-descriptive meta language' so that upgrades to
>> userspace/kernel are well covered (can you say "xml"?)
>>
>
> HAHAHA, kernel output xml? dream on :) I'm willing to do wholesale
> output changes, but something that heavy in kernel is impossible to
> push. I can just see Al cussing up a storm as he read that.
>
Just to be clear no one is suggesting XML or anything heavy weight.
Rather what is being suggested are trivial changes. For example string
values are always enclosed in double quotes with interior characters
properly escaped, or that non-decimal integer values include a radix
prefix. I think one could simply summarize this as saying the lexical
structure of value tokens match the lexical structure of the C
programming language tokens which is pretty simple but unambiguous (plus
there is a wealth of code to generate and parse these simple ubiquitous
tokens).
The implementation would be equally simple. Code which generates audit
data calls a printf style varargs function which takes a format string
and optional parameters. This single simple call is responsible for
formatting a few basic data types which observes the token rules.
To handle backward compatibility auparse could insulate users from the
format changes by looking for either the old or new format, preferring
the newer version.
--
John Dennis <jdennis@redhat.com>
[-- Attachment #1.2: Type: text/html, Size: 2087 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-13 0:33 ` Klaus Heinrich Kiwi
2008-08-13 15:09 ` Eric Paris
@ 2008-08-13 22:35 ` Casey Schaufler
1 sibling, 0 replies; 37+ messages in thread
From: Casey Schaufler @ 2008-08-13 22:35 UTC (permalink / raw)
To: Klaus Heinrich Kiwi; +Cc: William Kelly, Bret Piatt, linux-audit
Klaus Heinrich Kiwi wrote:
> On Tue, 2008-08-12 at 16:32 -0400, Steve Grubb wrote:
>
>> And any code created needs to be backwards compatible. you could have new user
>> space/new kernel, or new user space/old kernel, and new kernel/old user
>> space. You have no way of dictating which versions of anything people will
>> use.
>>
>
> But isn't this the exact situation we face now? I remember people asking
> in this list about audit for RHEL4, and in the end the problem was that
> they had their kernel upgraded so userspace and kernel were not in
> sync...
>
> I think that if we take this discussion to extremes, we'd be talking
> about a 'self-descriptive meta language' so that upgrades to
> userspace/kernel are well covered (can you say "xml"?)
>
Before y'all go laughing too hard at this suggestion, the Irix
audit trail format is a 'self-descriptive meta language' that only
misses being XML by having preceded XML by a year or two.
> On the other hand, I do agree that the way it's currently implemented is
> prone to error when it comes to reporting/analyzing data. e.g., I
> remember fixing a 'mode' field in an audit object where it was being
> displayed as hex where we would expect an octal. In the future, when
> parsing this field, how would I know which radix a mode=003 field is?
> Fundamentally, was the record generated in patched kernel or not? If we
> take this change applied to every kernel and userspace change of the
> audit records, how can auparse possibly cover all cases?
>
> I also feel that stricter message format rules for userspace would help.
> Nowadays is difficult to 'reconstruct' the meaning of an audit record
> just by looking at their fields. Some fields don't even have a value,
> other use the same field twice in the same record. And for most people
> wanting to analyze audit data the fields=value pairs are the only
> reliable source.
>
> -Klaus
>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 20:11 ` Eric Paris
2008-08-12 20:32 ` Steve Grubb
@ 2008-08-12 20:57 ` John Dennis
2008-08-12 21:18 ` Steve Grubb
2008-08-12 22:59 ` Eric Paris
1 sibling, 2 replies; 37+ messages in thread
From: John Dennis @ 2008-08-12 20:57 UTC (permalink / raw)
To: Eric Paris; +Cc: William Kelly, Bret Piatt, linux-audit
[-- Attachment #1.1: Type: text/plain, Size: 1600 bytes --]
Eric Paris wrote:
> On Tue, 2008-08-12 at 15:58 -0400, John Dennis wrote:
>
>
>> So many people have complained about this; I do not understand the
>> resistance to fixing it. The argument it would break something which
>> is broken to begin with does not seem like a reasonable justification
>> to me. The sooner it's fixed the better IMHO.
>>
>
> Show me the code and I'll start trying to fix the kernel based on that
> code as best we can. But before you start read over the article
>
> Can user-space bugs be kernel regressions?
> http://lwn.net/Articles/292143/
>
> As soon as you grasp that article send me the code and we'll work
> together to fix this problem!
>
>
Perhaps you should grasp the concept this is not a user space bug but a
flawed implementation. Anyone with the most basic understanding of
parsing and protocols would never defend the current implementation (the
fact it's in the kernel does not suspend the laws of computer science
and justify it).
Let me give you a simple example, suppose this key/value pair was in an
audit record:
foo=00
How does one know which of the possible values foo has:
1) it's the integer zero (but in what radix? does the leading zero imply
octal or is it just an insignificant digit?)
2) it's the hexadecimal encoding of a single character string containing
one null byte.
3) it's the 2 character string "00" consisting of two zero characters.
The fact is it's ambiguous, it could be any of the above. It's ambiguous
because the audit stream is an improperly specified protocol.
--
John Dennis <jdennis@redhat.com>
[-- Attachment #1.2: Type: text/html, Size: 2240 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 20:57 ` John Dennis
@ 2008-08-12 21:18 ` Steve Grubb
2008-08-12 21:40 ` John Dennis
2008-08-12 22:59 ` Eric Paris
1 sibling, 1 reply; 37+ messages in thread
From: Steve Grubb @ 2008-08-12 21:18 UTC (permalink / raw)
To: John Dennis; +Cc: William Kelly, linux-audit, Bret Piatt
On Tuesday 12 August 2008 16:57:59 John Dennis wrote:
> Let me give you a simple example, suppose this key/value pair was in an
> audit record:
>
> foo=00
>
> How does one know which of the possible values foo has:
>
> 1) it's the integer zero (but in what radix? does the leading zero imply
> octal or is it just an insignificant digit?)
>
> 2) it's the hexadecimal encoding of a single character string containing
> one null byte.
>
> 3) it's the 2 character string "00" consisting of two zero characters.
>
> The fact is it's ambiguous, it could be any of the above. It's ambiguous
> because the audit stream is an improperly specified protocol.
John, this is the way that the kernel works. The kernel and user space
utilities that interface to it are developed together with an understanding
of how the numbers are represented. Go take a look in /proc/1/stat. What does
any of that mean? Is it parsable? I'll guarantee its defined well enough
program can use it.
The point is that all of /proc is written without implicit parsing rules.
That's the way it is when dealing with kernel and its user space utilities.
There is no field in the kernel that is unhandled by the audit system and
without knowing specifically what's in it.
-Steve
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 21:18 ` Steve Grubb
@ 2008-08-12 21:40 ` John Dennis
2008-08-12 21:53 ` Steve Grubb
0 siblings, 1 reply; 37+ messages in thread
From: John Dennis @ 2008-08-12 21:40 UTC (permalink / raw)
To: Steve Grubb; +Cc: William Kelly, linux-audit, Bret Piatt
[-- Attachment #1.1: Type: text/plain, Size: 1989 bytes --]
Steve Grubb wrote:
> On Tuesday 12 August 2008 16:57:59 John Dennis wrote:
>
>> Let me give you a simple example, suppose this key/value pair was in an
>> audit record:
>>
>> foo=00
>>
>> How does one know which of the possible values foo has:
>>
>> 1) it's the integer zero (but in what radix? does the leading zero imply
>> octal or is it just an insignificant digit?)
>>
>> 2) it's the hexadecimal encoding of a single character string containing
>> one null byte.
>>
>> 3) it's the 2 character string "00" consisting of two zero characters.
>>
>> The fact is it's ambiguous, it could be any of the above. It's ambiguous
>> because the audit stream is an improperly specified protocol.
>>
>
> John, this is the way that the kernel works. The kernel and user space
> utilities that interface to it are developed together with an understanding
> of how the numbers are represented. Go take a look in /proc/1/stat. What does
> any of that mean? Is it parsable? I'll guarantee its defined well enough
> program can use it.
>
Bad example, proc works because it's (mostly) well defined. Sorry, but
the audit stream is not well defined, it can change at any time. We have
existence proofs of it changing. That doesn't even take into account
"user audit data". A properly specified protocol is immune to these
problems.
> The point is that all of /proc is written without implicit parsing rules.
> That's the way it is when dealing with kernel and its user space utilities.
> There is no field in the kernel that is unhandled by the audit system and
> without knowing specifically what's in it.
>
I'm sorry Steve, but this simply doesn't work. How the heck am I
supposed to correctly parse an audit log file from 5 years ago if either
I don't know the kernel version that produced it or have available the
matching user space tools from that era? This is going to be an absolute
nightmare for IPA and other compliance tools.
--
John Dennis <jdennis@redhat.com>
[-- Attachment #1.2: Type: text/html, Size: 2557 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 21:40 ` John Dennis
@ 2008-08-12 21:53 ` Steve Grubb
2008-08-12 22:11 ` John Dennis
0 siblings, 1 reply; 37+ messages in thread
From: Steve Grubb @ 2008-08-12 21:53 UTC (permalink / raw)
To: John Dennis; +Cc: William Kelly, linux-audit, Bret Piatt
On Tuesday 12 August 2008 17:40:00 John Dennis wrote:
> Bad example, proc works because it's (mostly) well defined.
What does the 25th field in /proc/1/stat mean? You can't tell without looking
at the kernel source code.
> > The point is that all of /proc is written without implicit parsing rules.
> > That's the way it is when dealing with kernel and its user space
> > utilities. There is no field in the kernel that is unhandled by the audit
> > system and without knowing specifically what's in it.
>
> I'm sorry Steve, but this simply doesn't work. How the heck am I
> supposed to correctly parse an audit log file from 5 years ago if either
> I don't know the kernel version that produced it
ausearch --start today -m DAEMON_START
----
time->Tue Aug 12 08:03:52 2008
node=127.0.0.1 type=DAEMON_START msg=audit(1218542632.238:4562): auditd start,
ver=1.7.4 format=raw kernel=2.6.26-0.17.rc3.sg3.fc9.x86_64 auid=4294967295
pid=2139 res=success
> or have available the matching user space tools from that era? This is going
> to be an absolute nightmare for IPA and other compliance tools.
With backwards compatibility you don't have to worry about having tools of
that era.
-Steve
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 21:53 ` Steve Grubb
@ 2008-08-12 22:11 ` John Dennis
2008-08-12 22:46 ` Steve Grubb
0 siblings, 1 reply; 37+ messages in thread
From: John Dennis @ 2008-08-12 22:11 UTC (permalink / raw)
To: Steve Grubb; +Cc: William Kelly, linux-audit, Bret Piatt
Steve Grubb wrote:
> With backwards compatibility you don't have to worry about having tools of
> that era.
>
This does not address the problem of matching a log file to a kernel
version nor unanticipated user space provided audit records.
--
John Dennis <jdennis@redhat.com>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 22:11 ` John Dennis
@ 2008-08-12 22:46 ` Steve Grubb
0 siblings, 0 replies; 37+ messages in thread
From: Steve Grubb @ 2008-08-12 22:46 UTC (permalink / raw)
To: John Dennis; +Cc: William Kelly, linux-audit, Bret Piatt
On Tuesday 12 August 2008 18:11:48 John Dennis wrote:
> > With backwards compatibility you don't have to worry about having tools
> > of that era.
> >
>
> This does not address the problem of matching a log file to a kernel
> version
The kernel version is recorded in the daemon startup event.
> nor unanticipated user space provided audit records.
The onus is with the developers to test their work. I did that for all the
audit records I added throughout user space.
-Steve
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: get_field_str() and interpret_field() bug with multi-word fields
2008-08-12 20:57 ` John Dennis
2008-08-12 21:18 ` Steve Grubb
@ 2008-08-12 22:59 ` Eric Paris
1 sibling, 0 replies; 37+ messages in thread
From: Eric Paris @ 2008-08-12 22:59 UTC (permalink / raw)
To: John Dennis; +Cc: William Kelly, Bret Piatt, linux-audit
On Tue, 2008-08-12 at 16:57 -0400, John Dennis wrote:
> Eric Paris wrote:
> > On Tue, 2008-08-12 at 15:58 -0400, John Dennis wrote:
> >
> >
> > > So many people have complained about this; I do not understand the
> > > resistance to fixing it. The argument it would break something which
> > > is broken to begin with does not seem like a reasonable justification
> > > to me. The sooner it's fixed the better IMHO.
> > >
> >
> > Show me the code and I'll start trying to fix the kernel based on that
> > code as best we can. But before you start read over the article
> >
> > Can user-space bugs be kernel regressions?
> > http://lwn.net/Articles/292143/
> >
> > As soon as you grasp that article send me the code and we'll work
> > together to fix this problem!
> >
> >
>
> Perhaps you should grasp the concept this is not a user space bug but
> a flawed implementation. Anyone with the most basic understanding of
> parsing and protocols would never defend the current implementation
> (the fact it's in the kernel does not suspend the laws of computer
> science and justify it).
Although I don't completely agree with you I do not disagree. The
problem is that you are simply wrong in your proposed solution. Any
kernel change MUST be backwards compatible with old userspace. Period.
End of story. If you want to break that rule just give up. Bitch till
you turn blue, any kernel change MUST be backwards compatible with old
userspace.
Given that fact. And it's a fact. I'm willing to start working on
moving forward with a new interface. Most likely its going to have to
be a little tunable /proc/sys/kernel/old_audit_format which is going to
emit records EXACTLY the way we do it today. We can add that to the
list of things that will get removed (In about 2+ years.) I may even be
amenable to accepting a compile time option for old/new kernel output
format. But in either case the patch set need to address ALL
non-extensible kernel->userspace data flows (AUDIT_SIGNAL_GET or
whatever that message interface is sucks, ioctls suck) But I'll do/help
write that kind of work.
So as long as you somehow MAINTAIN COMPATIBILITY WITH OLD USERSPACE
please design the better interface. I'd love to see some patches and as
long as you maintain the one rule I've laid down I'll gladly work with
you. (There is another rule, you aren't allowed to add even slightly
complex parsers into the kernel, but I don't think you want to do that)
We need to remember a couple of things though.
#1 the kernel does NOT under ANY circumstances validate audit data. If
userspace sends us crap that doesn't fit the system, too bad, crap goes
out.
#2 is really just #1. What it really means is that patches that change
security/selinux are not acceptable. Users of the audit system can do
any darn thing they want. the audit parsing utilities can't possibly
know what the point of a message from SMACK means and so if SMACK
doesn't follow your convention its not your problem. Same goes for
SELinux or any other USER of the audit system. The kernel does not and
WILL NOT validate messages.
-Eric
>
> Let me give you a simple example, suppose this key/value pair was in
> an audit record:
>
> foo=00
>
> How does one know which of the possible values foo has:
>
> 1) it's the integer zero (but in what radix? does the leading zero
> imply octal or is it just an insignificant digit?)
>
> 2) it's the hexadecimal encoding of a single character string
> containing one null byte.
>
> 3) it's the 2 character string "00" consisting of two zero characters.
>
> The fact is it's ambiguous, it could be any of the above. It's
> ambiguous because the audit stream is an improperly specified
> protocol.
> --
> John Dennis <jdennis@redhat.com>
^ permalink raw reply [flat|nested] 37+ messages in thread
* get_field_str() and interpret_field() bug with multi-word fields
@ 2008-08-13 16:57 Jonathan Kelly
0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Kelly @ 2008-08-13 16:57 UTC (permalink / raw)
To: Linux-audit; +Cc: William Kelly, Bret Piatt
[-- Attachment #1.1: Type: text/plain, Size: 898 bytes --]
Hi again,
For what it's worth, I dug through the code a bit, and am pretty sure that this particular issue exists in lines 71-78 of ellist.c:
ptr = strtok_r(buf, " ", &saved);
if (ptr == NULL)
return -1;
do { // If there's an '=' sign, its a keeper
nvnode n;
char *val = strchr(ptr, '=');
if (val) {
Basically, it's splitting the string at " " and discarding anything that doesn't contain '=', which is what is resulting in anything after the initial space in a field being discarded. Splitting at '\s\w+=' (pardon my regexp) instead would allow for the desired results, unless I'm mistaken, but would require some significant recoding of that function (beyond my capacity as a C programmer without much fail and gnashing of teeth). I hope this is helpful!
Best regards,
Jonathan Kelly
[-- Attachment #1.2: Type: text/html, Size: 2246 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2008-08-15 15:28 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-12 17:49 get_field_str() and interpret_field() bug with multi-word fields Jonathan Kelly
2008-08-12 18:05 ` LC Bruzenak
2008-08-12 18:52 ` John Dennis
2008-08-12 19:02 ` LC Bruzenak
2008-08-12 18:16 ` John Dennis
2008-08-12 21:13 ` Steve Grubb
2008-08-12 22:10 ` Matthew Booth
2008-08-12 23:01 ` Eric Paris
2008-08-12 19:16 ` Steve Grubb
2008-08-12 19:58 ` John Dennis
2008-08-12 20:11 ` Eric Paris
2008-08-12 20:32 ` Steve Grubb
2008-08-12 21:09 ` John Dennis
2008-08-12 21:24 ` Steve Grubb
2008-08-12 22:37 ` John Dennis
2008-08-13 0:33 ` Klaus Heinrich Kiwi
2008-08-13 15:09 ` Eric Paris
2008-08-13 16:25 ` Klaus Heinrich Kiwi
2008-08-13 17:02 ` Steve Grubb
2008-08-13 17:30 ` LC Bruzenak
2008-08-13 18:49 ` Linda Knippers
2008-08-13 19:58 ` John Dennis
2008-08-14 18:25 ` Stephen Smalley
2008-08-15 13:58 ` Matteo Michelini
2008-08-15 14:10 ` Steve Grubb
2008-08-15 15:27 ` Matteo Michelini
2008-08-15 14:15 ` Stephen Smalley
2008-08-13 16:29 ` John Dennis
2008-08-13 22:35 ` Casey Schaufler
2008-08-12 20:57 ` John Dennis
2008-08-12 21:18 ` Steve Grubb
2008-08-12 21:40 ` John Dennis
2008-08-12 21:53 ` Steve Grubb
2008-08-12 22:11 ` John Dennis
2008-08-12 22:46 ` Steve Grubb
2008-08-12 22:59 ` Eric Paris
-- strict thread matches above, loose matches on Subject: below --
2008-08-13 16:57 Jonathan Kelly
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox