From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joshua Roys
Subject: Backwards-compatible string encoding
Date: Fri, 27 Mar 2009 11:18:48 -0400
Message-ID: <49CCEE58.5010800@gtri.gatech.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-audit-bounces@redhat.com
To: "linux-audit@redhat.com"
List-Id: linux-audit@redhat.com

Hello all,

I have just run into the same problem that many of you have: trying to parse the audit logs. Yesterday I read through the linux-audit mail archive.
Here are the related topics I have found:

https://www.redhat.com/archives/linux-audit/2006-March/msg00093.html
https://www.redhat.com/archives/linux-audit/2006-March/msg00158.html
https://www.redhat.com/archives/linux-audit/2007-November/msg00036.html
https://www.redhat.com/archives/linux-audit/2008-January/msg00082.html
https://www.redhat.com/archives/linux-audit/2008-March/msg00024.html
https://www.redhat.com/archives/linux-audit/2008-May/msg00029.html
https://www.redhat.com/archives/linux-audit/2008-June/msg00005.html
https://www.redhat.com/archives/linux-audit/2008-August/msg00078.html
https://www.redhat.com/archives/linux-audit/2009-March/msg00018.html

From these I see the following requirements (correct me if I am wrong):
- must be backwards-compatible (doesn't break user-space on FC2, etc.)
- kernel does no verification of incoming user-space strings
- kernel must output strings in a "simple" format (e.g. no XML :-)
- it must be possible to write a parser that guarantees all (relevant) input ends up in the output
- use disk space efficiently
- handle UTF-8

Based on things other people have proposed, how does this sound:
- radix prefixes for any non-base10 number (I think audit mostly does this already?)
- hex-encode strings (and do not quote) if:
-- contains non-ASCII or non-printable characters
- quote strings if:
-- contains whitespace or '=' or '"' (in which case you have to output something like '\"')
-- entirely {hex,octal,base10} characters

Or we could save ourselves a little more headache, at the cost of space/readability, and hex-encode on '=' and '"' too. Looking at auparse, we may have to hex-encode strings with an embedded '"' anyway.

Check if you need to encode first, then check for quoting. Something like...

// somewhere in kernel/audit.c ?
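// hexencode() and quote() are placeholders, not existing kernel functions
char *audit_log_sane_string(const char *str, size_t slen)
{
	int quoteme = 0;
	size_t i, numhex = 0;

	for (i = 0; i < slen; i++) {
		unsigned char c = str[i];	// avoid negative values in ctype calls

		// non-ASCII or non-printable: hex-encode the whole string
		if (c > 0x7f || !isprint(c))
			return hexencode(str, slen);
		// characters that would confuse a key=value parser
		if (isspace(c) || c == '=' || c == '"')
			quoteme = 1;
		if (isxdigit(c))
			numhex++;	// xdigit covers base8,10,16
	}

	// quote if needed, or if the string could be mistaken for a number
	if (quoteme || numhex == slen)
		return quote(str, slen);
	return strdup(str);	// kstrdup...?
}

Oh, and if anyone has ideas for making shadow-utils play nicer with audit, I possibly have that kind of time on my hands. Also, getting rid of the extra punctuation [:(,)] would be great.

What do you all think?

Joshua Roys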
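
For anyone who wants to try the rules outside the kernel, here is a minimal user-space sketch of the "encode first, then quote" check. It is only an illustration under assumptions: the name sane_string() and the bodies of hexencode() and quote() are hypothetical, upper-case hex output is an arbitrary choice, and escaping the backslash alongside '"' is an assumption added so that the quoted form can be decoded unambiguously. It follows the first variant of the proposal (quote on '=' and '"'), not the hex-encode-everything variant.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: upper-case hex dump of len bytes, unquoted. */
static char *hexencode(const char *str, size_t len)
{
    static const char digits[] = "0123456789ABCDEF";
    char *out = malloc(2 * len + 1);
    size_t i;

    if (!out)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)str[i];

        out[2 * i] = digits[c >> 4];
        out[2 * i + 1] = digits[c & 0x0f];
    }
    out[2 * len] = '\0';
    return out;
}

/* Hypothetical helper: wrap in double quotes, escaping '"' and '\'. */
static char *quote(const char *str, size_t len)
{
    char *out = malloc(2 * len + 3); /* worst case: every byte escaped */
    size_t i, j = 0;

    if (!out)
        return NULL;
    out[j++] = '"';
    for (i = 0; i < len; i++) {
        if (str[i] == '"' || str[i] == '\\')
            out[j++] = '\\';
        out[j++] = str[i];
    }
    out[j++] = '"';
    out[j] = '\0';
    return out;
}

/* Encode first, then quote, else copy unchanged. Caller frees the result. */
char *sane_string(const char *str, size_t slen)
{
    int quoteme = 0;
    size_t i, numhex = 0;
    char *out;

    assert(str != NULL);
    for (i = 0; i < slen; i++) {
        unsigned char c = (unsigned char)str[i];

        /* non-ASCII (including UTF-8 bytes) or non-printable */
        if (c > 0x7f || !isprint(c))
            return hexencode(str, slen);
        if (isspace(c) || c == '=' || c == '"')
            quoteme = 1;
        if (isxdigit(c))
            numhex++;
    }
    /* all-hex-digit strings are quoted so they cannot pass for numbers */
    if (quoteme || numhex == slen)
        return quote(str, slen);

    out = malloc(slen + 1);
    if (out) {
        memcpy(out, str, slen);
        out[slen] = '\0';
    }
    return out;
}
```

With these rules, sane_string("root", 4) comes back unchanged, sane_string("a b", 3) comes back as "a b" with the quotes included, sane_string("dead", 4) is quoted because it consists only of hex digits, and sane_string("a\nb", 3) becomes 610A62. An empty string also ends up quoted, since zero hex digits equals a length of zero.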