public inbox for linux-audit@redhat.com
* Backwards-compatible string encoding
@ 2009-03-27 15:18 Joshua Roys
  2009-03-27 16:41 ` John Dennis
From: Joshua Roys @ 2009-03-27 15:18 UTC (permalink / raw)
  To: linux-audit@redhat.com

Hello all,

I have just run into the problem that many of you have: trying to parse 
the audit logs.

Yesterday I read through the linux-audit mail archive.  Here are the 
related topics I have found:
  https://www.redhat.com/archives/linux-audit/2006-March/msg00093.html
  https://www.redhat.com/archives/linux-audit/2006-March/msg00158.html
  https://www.redhat.com/archives/linux-audit/2007-November/msg00036.html
  https://www.redhat.com/archives/linux-audit/2008-January/msg00082.html
  https://www.redhat.com/archives/linux-audit/2008-March/msg00024.html
  https://www.redhat.com/archives/linux-audit/2008-May/msg00029.html
  https://www.redhat.com/archives/linux-audit/2008-June/msg00005.html
  https://www.redhat.com/archives/linux-audit/2008-August/msg00078.html
  https://www.redhat.com/archives/linux-audit/2009-March/msg00018.html

From these I gather the following requirements (correct me if I am wrong):
- must be backwards-compatible (doesn't break user-space on FC2, etc.)
- kernel does not verify incoming user-space strings
- kernel must output strings in a "simple" format (e.g. no XML :-)
- it must be possible to write a parser that guarantees all (relevant)
input ends up in the output
- use disk space efficiently
- handle UTF-8

Based on things other people have proposed, how does this sound:
- radix prefixes for any non-base10 number (I think audit mostly does 
this already?)
- hex-encode a string (and do not quote it) if it:
-- contains non-ASCII or non-printable characters
- quote a string if it:
-- contains whitespace or '=' or '"' (in which case you have to output 
something like '\"')
-- consists entirely of {hex,octal,base10} digits

Or we could save ourselves a little more headache, at the cost of 
space/readability, and hex-encode on '=' and '"' too.  Looking at 
auparse, we may have to hex-encode strings that contain an embedded '"'.
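
To illustrate why the simpler variant is easier on the reader side, here is a minimal user-space sketch of a decoder for it. It assumes '=' and '"' are always hex-encoded, so a quoted value never contains an escape sequence. The names `decode_value` and `hexval` are made up for this example and are not part of auparse.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical reader for the simpler variant. Returns a malloc'd,
 * NUL-terminated copy of the decoded value, or NULL if malloc fails.
 * Note: a hex-encoded value containing a 0x00 byte would appear
 * truncated to callers that treat the result as a C string.
 */
char *decode_value(const char *tok)
{
    size_t len = strlen(tok), i;
    char *out;

    /* Case 1: quoted, so strip the surrounding quotes. */
    if (len >= 2 && tok[0] == '"' && tok[len - 1] == '"') {
        out = malloc(len - 1);
        if (!out)
            return NULL;
        memcpy(out, tok + 1, len - 2);
        out[len - 2] = '\0';
        return out;
    }

    /* Case 2: unquoted, even length, only hex digits: hex-encoded. */
    if (len > 0 && len % 2 == 0) {
        for (i = 0; i < len && hexval((unsigned char)tok[i]) >= 0; i++)
            ;
        if (i == len) {
            out = malloc(len / 2 + 1);
            if (!out)
                return NULL;
            for (i = 0; i < len; i += 2)
                out[i / 2] = (char)(hexval((unsigned char)tok[i]) * 16 +
                                    hexval((unsigned char)tok[i + 1]));
            out[len / 2] = '\0';
            return out;
        }
    }

    /* Case 3: bare string, copy as-is. */
    out = malloc(len + 1);
    if (!out)
        return NULL;
    memcpy(out, tok, len + 1);
    return out;
}
```

The three cases stay unambiguous only because strings made up entirely of hex digits are quoted on output; without that rule, case 2 would misread a bare value such as `beef`.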

Check if you need to encode first, then check for quoting.  Something 
like...

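// somewhere in kernel/audit.c ?
// hexencode() and quote() are placeholders for helpers that do not exist yet
char *audit_log_sane_string(const char *str, size_t slen)
{
    int quoteme = 0;
    size_t i, numhex = 0;

    for (i = 0; i < slen; i++) {
        unsigned char c = str[i];

        // explicit ASCII range test; isprint() can misjudge bytes >= 0x80
        if (c < 0x20 || c > 0x7e)
            return hexencode(str, slen);
        if (isspace(c) || c == '=' || c == '"')
            quoteme = 1;
        if (isxdigit(c))
            numhex++; // xdigit covers base8,10,16
    }

    // quote if flagged above or if every character is a hex digit
    if (quoteme || numhex == slen)
        return quote(str, slen);

    // str has an explicit length, so copy exactly slen bytes;
    // the right gfp flags would depend on the caller
    return kstrndup(str, slen, GFP_KERNEL);
}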
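
To try the idea outside the kernel, here is a minimal user-space sketch of the same decision logic with the two helpers filled in. The helper bodies are assumptions, not existing audit code: `hexencode` emits upper-case hex pairs, and `quote` backslash-escapes both '"' and '\' (escaping the backslash as well is an addition made here so that the output can be unescaped unambiguously). The function name `sane_string` is likewise only for this example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: upper-case hex dump of slen bytes, unquoted. */
static char *hexencode(const char *str, size_t slen)
{
    static const char digits[] = "0123456789ABCDEF";
    char *out = malloc(2 * slen + 1);
    size_t i;

    if (!out)
        return NULL;
    for (i = 0; i < slen; i++) {
        unsigned char c = (unsigned char)str[i];

        out[2 * i] = digits[c >> 4];
        out[2 * i + 1] = digits[c & 0x0f];
    }
    out[2 * slen] = '\0';
    return out;
}

/* Hypothetical helper: wrap in double quotes, escaping '"' and '\'. */
static char *quote(const char *str, size_t slen)
{
    char *out = malloc(2 * slen + 3); /* worst case: every byte escaped */
    char *p = out;
    size_t i;

    if (!out)
        return NULL;
    *p++ = '"';
    for (i = 0; i < slen; i++) {
        if (str[i] == '"' || str[i] == '\\')
            *p++ = '\\';
        *p++ = str[i];
    }
    *p++ = '"';
    *p = '\0';
    return out;
}

/* User-space stand-in for the kernel function sketched above. */
char *sane_string(const char *str, size_t slen)
{
    int quoteme = 0;
    size_t i, numhex = 0;
    char *out;

    for (i = 0; i < slen; i++) {
        unsigned char c = (unsigned char)str[i];

        /* Encoding check comes first: any non-ASCII or control byte. */
        if (c < 0x20 || c > 0x7e)
            return hexencode(str, slen);
        if (c == ' ' || c == '=' || c == '"')
            quoteme = 1;
        if (isxdigit(c))
            numhex++;
    }
    if (quoteme || numhex == slen)
        return quote(str, slen);

    out = malloc(slen + 1);
    if (!out)
        return NULL;
    memcpy(out, str, slen);
    out[slen] = '\0';
    return out;
}
```

With these helpers, `sane_string("foo", 3)` comes back unchanged, while `sane_string("a b", 3)` and `sane_string("1234", 4)` come back wrapped in double quotes. The caller owns the returned buffer and must free it.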

Oh, and if anyone has ideas for making shadow-utils play nicer with 
audit, I possibly have that kind of time on my hands.  Also, getting rid 
of the extra punctuation [:(,)] would be great.

What do you all think?

Joshua Roys


Thread overview: 3+ messages
2009-03-27 15:18 Backwards-compatible string encoding Joshua Roys
2009-03-27 16:41 ` John Dennis
2009-04-09 19:55   ` Joshua Roys