A change to string encoding

public inbox for linux-audit@redhat.com

* A change to string encoding
@ 2009-03-10 11:07 Matthew Booth
  2009-03-10 15:58 ` Steve Grubb
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Matthew Booth @ 2009-03-10 11:07 UTC (permalink / raw)
  To: linux-audit

The problem with current string encoding is that it is parsable, but
non-human readable. It also complicates parsing by requiring 2 different
decoding methods to be implemented.
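
To illustrate the two decoding methods, here is a minimal sketch. It
assumes the current format emits an untrusted string either wrapped in
double quotes (when it contains only printable bytes with no space or
quote) or as bare hex digits. The function name decode_audit_value is
hypothetical, and the sketch assumes the caller already knows the field
holds an encoded string rather than, say, a number.

```python
def decode_audit_value(raw: str) -> bytes:
    """Decode one string-valued audit field (assumed dual format)."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        # Method 1: a quoted value is literal text; strip the quotes.
        return raw[1:-1].encode("ascii")
    # Method 2: an unquoted value is a run of hex digit pairs.
    return bytes.fromhex(raw)


print(decode_audit_value('"bash"'))            # quoted form
print(decode_audit_value("2F746D702F612062"))  # hex form of "/tmp/a b"
```

The hex form is what makes a value such as "/tmp/a b" unreadable and
unsearchable in the raw log, even though only one byte needed escaping.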

It occurs to me that a URL encoding scheme would also meet the parsing
requirements. Additionally:

1. It is always human readable.
2. There is only 1 encoding scheme.
3. Substring matching on encoded strings will always succeed.

URL encoding is just one way to achieve this, and has the advantage of
being widely implemented. However, the minimal requirements would be a
scheme which encoded only separator characters (whitespace in this case)
without the use of those separators.
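
A minimal sketch of such a scheme, using URL-style percent escapes. The
names encode_min and decode_min are hypothetical. Beyond whitespace, the
sketch also escapes '%' itself (so decoding is unambiguous) and any
control or non-ASCII byte (so the output stays single-line ASCII); those
two additions are assumptions, not part of the minimal requirement
stated above.

```python
def encode_min(data: bytes) -> str:
    """Percent-encode only separators, '%', and non-printable bytes."""
    out = []
    for b in data:
        if b == 0x25 or b <= 0x20 or b >= 0x7F:
            out.append(f"%{b:02X}")   # escaped as %XX
        else:
            out.append(chr(b))        # printable ASCII passes through
    return "".join(out)


def decode_min(text: str) -> bytes:
    """Inverse of encode_min; assumes well-formed input."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


encoded = encode_min(b"/tmp/a b")
print(encoded)                  # /tmp/a%20b
print("/tmp/a" in encoded)      # True: substring match still works
print(decode_min(encoded))
```

Because everything except the escaped bytes passes through unchanged, a
search term that contains no escaped bytes matches the encoded value
directly.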

I'm sure this has been considered before. Given that it's a road I'm
considering heading down, what were the reasons for not doing it?

Thanks,

Matt
-- 
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services

M:       +44 (0)7977 267231
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

* Re: A change to string encoding
  2009-03-10 11:07 A change to string encoding Matthew Booth
@ 2009-03-10 15:58 ` Steve Grubb
  2009-03-10 18:13   ` Matthew Booth
  2009-03-10 19:55   ` John Dennis
  2009-03-10 16:22 ` Tomas Mraz
  2009-03-10 18:51 ` Eric Paris
  2 siblings, 2 replies; 6+ messages in thread
From: Steve Grubb @ 2009-03-10 15:58 UTC (permalink / raw)
  To: linux-audit

On Tuesday 10 March 2009 07:07:17 am Matthew Booth wrote:
> The problem with current string encoding is that it is parsable, but
> non-human readable.

There are times when it has things that would never be human readable.

> URL encoding is just one way to achieve this, and has the advantage of
> being widely implemented. 

Inside the kernel?

> I'm sure this has been considered before. Given that it's a road I'm
> considering heading down, what were the reasons for not doing it?

Can you encode data structures in it? The kernel developer at the time wanted 
something that was either already in the kernel or something that could be 
implemented in a couple lines of code and something that works for any kind 
of encoding that needed to be done. So, I think minimal amount of code and 
maximum flexibility is what drove the decision.

-Steve

* Re: A change to string encoding
  2009-03-10 11:07 A change to string encoding Matthew Booth
  2009-03-10 15:58 ` Steve Grubb
@ 2009-03-10 16:22 ` Tomas Mraz
  2009-03-10 18:51 ` Eric Paris
  2 siblings, 0 replies; 6+ messages in thread
From: Tomas Mraz @ 2009-03-10 16:22 UTC (permalink / raw)
  To: linux-audit

On Tue, 2009-03-10 at 11:07 +0000, Matthew Booth wrote:
> The problem with current string encoding is that it is parsable, but
> non-human readable. It also complicates parsing by requiring 2 different
> decoding methods to be implemented.
> 
> It occurs to me that a URL encoding scheme would also meet the parsing
> requirements. Additionally:
> 
> 1. It is always human readable.
> 2. There is only 1 encoding scheme.
> 3. Substring matching on encoded strings will always succeed.
> 
> URL encoding is just one way to achieve this, and has the advantage of
> being widely implemented. However, the minimal requirements would be a
> scheme which encoded only separator characters (whitespace in this case)
> without the use of those separators.
> 
> I'm sure this has been considered before. Given that it's a road I'm
> considering heading down, what were the reasons for not doing it?

It was already discussed here without a conclusion:
http://marc.info/?l=linux-audit&m=120978583018941&w=2
-- 
Tomas Mraz
No matter how far down the wrong road you've gone, turn back.
                                              Turkish proverb

* Re: A change to string encoding
  2009-03-10 15:58 ` Steve Grubb
@ 2009-03-10 18:13   ` Matthew Booth
  2009-03-10 19:55   ` John Dennis
  1 sibling, 0 replies; 6+ messages in thread
From: Matthew Booth @ 2009-03-10 18:13 UTC (permalink / raw)
  To: Steve Grubb; +Cc: linux-audit

On Tue, 2009-03-10 at 11:58 -0400, Steve Grubb wrote:
> On Tuesday 10 March 2009 07:07:17 am Matthew Booth wrote:
> > The problem with current string encoding is that it is parsable, but
> > non-human readable.
> 
> There are times when it has things that would never be human readable.

Do you have an exhaustive list? Off the top of my head I can think of
the record which contains a struct sockaddr.

> > URL encoding is just one way to achieve this, and has the advantage of
> > being widely implemented. 
> 
> Inside the kernel?

No, I mean the format output by the audit daemon.

> > I'm sure this has been considered before. Given that it's a road I'm
> > considering heading down, what were the reasons for not doing it?
> 
> Can you encode data structures in it? The kernel developer at the time wanted 
> something that was either already in the kernel or something that could be 
> implemented in a couple lines of code and something that works for any kind 
> of encoding that needed to be done. So, I think minimal amount of code and 
> maximum flexibility is what drove the decision.

Sounds like this falls into the 'easy to produce' category. We also
need to consider how this data is consumed. I'll take this opportunity
to suggest again that a binary format produced by the kernel would be no
worse than a hex encoded binary format, and a whole lot easier for the
audit daemon to mangle into something usable on the output side.

Matt

* Re: A change to string encoding
  2009-03-10 11:07 A change to string encoding Matthew Booth
  2009-03-10 15:58 ` Steve Grubb
  2009-03-10 16:22 ` Tomas Mraz
@ 2009-03-10 18:51 ` Eric Paris
  2 siblings, 0 replies; 6+ messages in thread
From: Eric Paris @ 2009-03-10 18:51 UTC (permalink / raw)
  To: Matthew Booth; +Cc: linux-audit

On Tue, 2009-03-10 at 11:07 +0000, Matthew Booth wrote:
> The problem with current string encoding is that it is parsable, but
> non-human readable. It also complicates parsing by requiring 2 different
> decoding methods to be implemented.
> 
> It occurs to me that a URL encoding scheme would also meet the parsing
> requirements. Additionally:
> 
> 1. It is always human readable.
> 2. There is only 1 encoding scheme.
> 3. Substring matching on encoded strings will always succeed.
> 
> URL encoding is just one way to achieve this, and has the advantage of
> being widely implemented. However, the minimal requirements would be a
> scheme which encoded only separator characters (whitespace in this case)
> without the use of those separators.
> 
> I'm sure this has been considered before. Given that it's a road I'm
> considering heading down, what were the reasons for not doing it?

Lack of code.  And history I guess.  What we have is fast and easy.  Any
encoding scheme must meet both of those.  It's come up before with the
basic agreement that what we have isn't great.  "It works" is about the
best thing you can say about it.

Backwards compatibility is a big issue.  Any new code (in the kernel at
least) has to allow us to continue outputting the way we do for some
time.  I've said it before and I'll say it again, I'm willing to
entertain a new string encoding system in the kernel but I don't have
the time to write it.

There was talk that someone in the IPA project was going to write an
audit plugin that would re-encode strings to something they liked, but I
haven't seen it.

As long as you have some way to maintain backwards compatibility and
have the time to write it, I think just about any other string encoding
scheme would make people happier than what we have today...

-Eric

* Re: A change to string encoding
  2009-03-10 15:58 ` Steve Grubb
  2009-03-10 18:13   ` Matthew Booth
@ 2009-03-10 19:55   ` John Dennis
  1 sibling, 0 replies; 6+ messages in thread
From: John Dennis @ 2009-03-10 19:55 UTC (permalink / raw)
  To: Steve Grubb; +Cc: linux-audit

Steve Grubb wrote:
> Can you encode data structures in it? The kernel developer at the time wanted 
> something that was either already in the kernel or something that could be 
> implemented in a couple lines of code and something that works for any kind 
> of encoding that needed to be done. So, I think minimal amount of code and 
> maximum flexibility is what drove the decision.
>   
The comment was *not* about encoding data structures; rather, it was
about string encoding.

I have provided code in the past which encodes a string according to the
ISO C99 standard. It does not tax the kernel or use excessive resources,
nor is it complicated in any sense whatsoever (it's just a per-character
table lookup), and it wraps the result in double quotes.
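
As an illustration only (this is not the code referred to above, which
does not appear in this thread), a per-character table lookup producing
a C-style quoted string could look like the sketch below. The names
ESCAPE_TABLE and encode_c_string are hypothetical, and the choice of
three-digit octal escapes for non-printable bytes is an assumption, made
because fixed-width octal escapes cannot run into a following digit.

```python
# Named escapes defined by C, plus the quote and backslash themselves.
_NAMED = {
    0x07: "\\a", 0x08: "\\b", 0x09: "\\t", 0x0A: "\\n",
    0x0B: "\\v", 0x0C: "\\f", 0x0D: "\\r",
    0x22: '\\"', 0x5C: "\\\\",
}


def _entry(b: int) -> str:
    if b in _NAMED:
        return _NAMED[b]
    if 0x20 <= b <= 0x7E:
        return chr(b)            # printable ASCII, including space
    return "\\%03o" % b          # anything else: fixed-width octal


# Built once; encoding is then one lookup per input byte.
ESCAPE_TABLE = [_entry(b) for b in range(256)]


def encode_c_string(data: bytes) -> str:
    """Return data as a double-quoted string with C-style escapes."""
    return '"' + "".join(ESCAPE_TABLE[b] for b in data) + '"'


print(encode_c_string(b"/tmp/a b"))   # "/tmp/a b"
print(encode_c_string(b'a"b\n'))      # "a\"b\n"
```

With this approach a name containing a space stays readable as-is, and
only the quote, the backslash, and non-printable bytes are rewritten.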

Yes, this would change the output of the kernel audit data, which does
have the potential to break existing user code. However, it has often been
stated that only the official audit libraries should ever be used to read
audit data. If that recommendation still holds, then the audit libraries
should be capable of gracefully handling either the old or new format,
providing a transparent transition.

I hope at some point we can start to address this recurring issue.

-- 
John Dennis <jdennis@redhat.com>
