From: Timothy Miller <miller@techsource.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Painlessly shrinking kernel messages (Re: kernel support for non-english user messages)
Date: Thu, 10 Apr 2003 19:58:46 -0400
Message-ID: <3E960536.5010900@techsource.com>
In-Reply-To: 1050010963.12494.132.camel@dhcp22.swansea.linux.org.uk
Alan Cox wrote:
>Not a totally crazy idea. You could also do 5pack and some of the other
>string tricks people have used in time. You also dont need to do word
>boundaries.
>
My Google search for '5pack' didn't turn up anything relevant.
Things that come to mind include converting to a character set that
needs fewer than 8 bits per character and then packing the characters
into bytes, or perhaps making a list of every quintuplet of characters
that ever occurs and assigning each one a code.
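A minimal sketch of the first idea (a reduced character set packed at 5 bits per character), not necessarily what "5pack" refers to. The 32-symbol alphabet below is a hypothetical choice; any message containing a character outside it (digits, capitals, '%') would have to be left uncompressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-symbol alphabet: each symbol fits in 5 bits. */
static const char alphabet5[] = " abcdefghijklmnopqrstuvwxyz.,:-\n";

/* Returns the 5-bit code for c, or -1 if c is not in the alphabet. */
static int code5(char c)
{
	const char *p;

	if (c == '\0')
		return -1;	/* strchr() would match the terminator */
	p = strchr(alphabet5, c);
	return p ? (int)(p - alphabet5) : -1;
}

/* Packs the n characters of s into out, 5 bits each, MSB first.
 * out needs (5 * n + 7) / 8 bytes. Returns the number of bytes
 * written, or -1 if s contains a character outside the alphabet. */
static long pack5(const char *s, size_t n, uint8_t *out)
{
	uint32_t acc = 0;	/* bit accumulator (only low bits matter) */
	int bits = 0;		/* number of pending bits in acc */
	long len = 0;

	assert(s != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		int c = code5(s[i]);

		if (c < 0)
			return -1;
		acc = (acc << 5) | (uint32_t)c;
		bits += 5;
		while (bits >= 8) {
			bits -= 8;
			out[len++] = (uint8_t)(acc >> bits);
		}
	}
	if (bits > 0)	/* flush the remainder, zero-padded */
		out[len++] = (uint8_t)(acc << (8 - bits));
	return len;
}

/* Unpacks n 5-bit symbols from in into out (which needs n + 1 bytes). */
static void unpack5(const uint8_t *in, size_t n, char *out)
{
	uint32_t acc = 0;
	int bits = 0;
	size_t j = 0;

	for (size_t i = 0; i < n; i++) {
		while (bits < 5) {
			acc = (acc << 8) | in[j++];
			bits += 8;
		}
		bits -= 5;
		out[i] = alphabet5[(acc >> bits) & 0x1f];
	}
	out[n] = '\0';
}
```

With this scheme an 11-character message such as "link is up\n" packs into 7 bytes instead of 11, at the cost of a restricted alphabet.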
I initially considered ignoring word boundaries but rejected the idea,
because part of the "painless" factor is that the substitution could be
done by hand without much thought. Still, I will run a test that ignores
word boundaries and see what results I get. Of course, if we are willing
to do some post-compile magic, then we can use all sorts of gnarly
tricks. But then the complexity is not much different from the idea
someone else mentioned: removing all messages from the kernel entirely
by converting them to numbers or hashes and decoding them outside the
kernel.
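A minimal sketch of the "numbers or hashes" alternative, assuming 32-bit FNV-1a as the hash (a swapped-in choice; nothing in the thread names one). The kernel would emit only the ID, and a userspace catalogue built from the same format strings would map it back. The `msg_entry` type and `msg_lookup()` helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a format string, used as its message ID. */
static uint32_t msg_id(const char *fmt)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	assert(fmt != NULL);
	while (*fmt) {
		h ^= (uint8_t)*fmt++;
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/* Hypothetical userspace catalogue entry: ID -> original format string. */
struct msg_entry {
	uint32_t id;
	const char *fmt;
};

/* Looks up an ID in the catalogue; returns NULL if it is unknown. */
static const char *msg_lookup(const struct msg_entry *cat, size_t n,
			      uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (cat[i].id == id)
			return cat[i].fmt;
	return NULL;
}
```

Hash collisions would have to be detected when the catalogue is built, and, as noted below, nothing can be decoded before userspace is running.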
Someone raised the valid point that boot messages need to be handled
properly by the kernel before any services are up. Separating boot
messages from non-boot messages would require manual intervention, which
goes against the painless factor. And is the slice of the pie containing
only non-boot messages large enough to be worth it? There seem to be
quite a lot of boot messages that could benefit from some sort of
entirely in-kernel compression.
>
>For embedded at least this is far from ludicrous as a concept. The
>tricky piece for all of these is working out how to grab each printk
>format string and do things to it. That lets you do compression,
>removal, internationalisation, cataloguing ..
>
>
Hmmm...
- Make gcc produce assembler output
- Find all calls to printk
- Cross-reference those against all static strings
- Compress the strings
- Run through gas, etc.
The problem with this approach is that we have to deal with different
architectures. The upside is that any unsupported arch simply doesn't
run the compression tool and uses regular printk.
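Whichever way the strings get compressed, printk would need a decoder on the kernel side. A minimal sketch, assuming single-byte codes 0x80 + i that stand for entry i of a compile-time dictionary; `dict` and `expand_msg()` are hypothetical names, and the three dictionary words are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dictionary: byte 0x80 + i stands for dict[i]. */
static const char *const dict[] = { "device", "interrupt", "memory" };
#define NDICT (sizeof(dict) / sizeof(dict[0]))

/* Expands a compressed message into out (capacity cap, including NUL).
 * Bytes below 0x80 are copied literally; 0x80 + i becomes dict[i].
 * Returns the expanded length, or -1 on an unknown code or overflow. */
static long expand_msg(const unsigned char *in, char *out, size_t cap)
{
	size_t len = 0;

	assert(cap > 0);
	for (; *in; in++) {
		const char *w;
		char lit[2] = { 0, 0 };

		if (*in < 0x80) {
			lit[0] = (char)*in;	/* plain ASCII byte */
			w = lit;
		} else if ((size_t)(*in - 0x80) < NDICT) {
			w = dict[*in - 0x80];	/* dictionary word */
		} else {
			return -1;
		}
		for (; *w; w++) {
			if (len + 1 >= cap)	/* keep room for the NUL */
				return -1;
			out[len++] = *w;
		}
	}
	out[len] = '\0';
	return (long)len;
}
```

An unsupported arch would skip both the compression tool and this expansion step and print its strings unchanged.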
How about:
- Use perl or yacc or something to parse the kernel source for strings
- Compress them
- Make the substitutions inline in the source as part of the
pre-processing stage
- Compile
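The "parse the kernel source for strings" step could start as simply as the sketch below, which pulls the first string literal after `printk(` from one source line. It is only a sketch under loose assumptions: it ignores literals split across lines or built from adjacent pieces, keeps escapes such as `\n` in source form, and would also match names like `dprintk(`. `extract_printk_literal()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the first string literal following "printk(" on a source line
 * into out (still in escaped source form). Returns its length, or -1
 * if the line has no printk call, no complete literal, or out is too
 * small. */
static long extract_printk_literal(const char *line, char *out, size_t cap)
{
	const char *p = strstr(line, "printk(");
	size_t len = 0;

	assert(cap > 0);
	if (!p)
		return -1;
	p = strchr(p, '"');	/* skips e.g. KERN_INFO */
	if (!p)
		return -1;
	for (p++; *p && *p != '"'; p++) {
		if (*p == '\\' && p[1] != '\0') {
			if (len + 2 >= cap)
				return -1;
			out[len++] = *p++;	/* backslash; next char below */
		} else if (len + 1 >= cap) {
			return -1;
		}
		out[len++] = *p;
	}
	if (*p != '"')
		return -1;		/* unterminated on this line */
	out[len] = '\0';
	return (long)len;
}
```

A real tool would more likely work on preprocessed output, where adjacent literals and macros such as KERN_INFO have already been resolved.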
Heck, we could just embed this functionality directly into the
preprocessor. Unfortunately, doing that conveniently is somewhat beyond
my current knowledge of the tools involved.
Just as a note, I worked on my test program to make it more accurate.
For 128 codes, the actual reduction is 38946 bytes. In this version of
the algorithm, I check whether any of the shorter words are contained in
any of the longer ones; when substituting the shorter word would shrink
the kernel more than substituting the longer one, I add the longer
word's count to the shorter word's and delete the longer word.
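One reading of that merge step, as a sketch. The cost model is an assumption: each occurrence replaced by a one-byte code saves len - 1 bytes, and the dictionary copy of the word is ignored. `struct word`, `savings()` and `merge_substrings()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct word {
	const char *text;
	long count;	/* occurrences across all messages */
	int deleted;
};

/* Assumed cost model: a 1-byte code saves len - 1 bytes per occurrence;
 * the dictionary copy of the word itself is not counted. */
static long savings(const struct word *w)
{
	return w->count * ((long)strlen(w->text) - 1);
}

/* If a shorter word occurs inside a longer one and encoding the shorter
 * word saves more, fold the longer word's count into the shorter word
 * and drop the longer word. */
static void merge_substrings(struct word *w, size_t n)
{
	assert(w != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			if (i == j || w[i].deleted || w[j].deleted)
				continue;
			if (strlen(w[i].text) >= strlen(w[j].text))
				continue;	/* w[i] must be the shorter word */
			if (strstr(w[j].text, w[i].text) == NULL)
				continue;
			if (savings(&w[i]) > savings(&w[j])) {
				w[i].count += w[j].count;
				w[j].deleted = 1;
			}
		}
	}
}
```

For example, with "device" (50 uses) and "devices" (10 uses), the estimated savings are 250 vs. 60 bytes, so "devices" is folded into "device", which ends up with a count of 60. The result depends on iteration order, which a more careful version would have to address.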
If we were to outlaw some of the lower characters, such as most
non-printing characters and all lowercase letters, that would give us
184 codes to work with, which saves 42692 bytes. If we were to go to
two-character codes, where the first byte is 128-255 and the second is
1-255, that brings the number of codes up to 32640. It turns out that,
with my current algorithm, this doesn't buy anything, and it also
violates the painless factor by giving people a huge list of words they
have to pick from when writing kernel messages. Also, it turns out that
only just over 500 different words would save more than 2 bytes by
being encoded.
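The 32640 figure is 128 choices for the first byte times 255 for the second. A sketch of the mapping between a code index and its two bytes; `encode2()` and `decode2()` are hypothetical names, and excluding 0 from the second byte is what keeps a NUL terminator from ever appearing inside a code.

```c
#include <assert.h>
#include <stdint.h>

/* First byte 128..255 (128 values), second byte 1..255 (255 values). */
#define NCODES (128u * 255u)	/* = 32640 */

/* Maps code index k (0 <= k < NCODES) to its two bytes. */
static void encode2(unsigned k, uint8_t out[2])
{
	assert(k < NCODES);
	out[0] = (uint8_t)(128 + k / 255);
	out[1] = (uint8_t)(1 + k % 255);
}

/* Inverse of encode2(). */
static unsigned decode2(const uint8_t in[2])
{
	assert(in[0] >= 128 && in[1] >= 1);
	return (unsigned)(in[0] - 128) * 255 + (unsigned)(in[1] - 1);
}
```

Index 0 encodes as 0x80 0x01 and the last index, 32639, as 0xFF 0xFF.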
I need to get a LOT more clever about this before it's worth doing.
I'll try the no-word-boundaries approach. And we'll see how interested
other people are in having to DEAL with it.
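A minimal sketch of what "ignoring word boundaries" could mean: treat the message text as one byte stream and look for the most frequent fixed-length substring, spaces included. The length k is a free parameter, the counting is a naive O(n^2 k) scan, and overlapping matches are counted, which overstates the savings for strings like "aaaa". `best_ngram()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts (overlapping) occurrences of the k bytes at text + pos. */
static size_t count_ngram(const char *text, size_t n, size_t pos, size_t k)
{
	size_t c = 0;

	for (size_t i = 0; i + k <= n; i++)
		if (memcmp(text + i, text + pos, k) == 0)
			c++;
	return c;
}

/* Finds the most frequent k-byte substring of text, treating spaces as
 * ordinary bytes. Copies it into best (k + 1 bytes) and returns its
 * count, or returns 0 if text is shorter than k. */
static size_t best_ngram(const char *text, size_t k, char *best)
{
	size_t n = strlen(text), best_count = 0;

	assert(k > 0);
	for (size_t pos = 0; pos + k <= n; pos++) {
		size_t c = count_ngram(text, n, pos, k);

		if (c > best_count) {
			best_count = c;
			memcpy(best, text + pos, k);
			best[k] = '\0';
		}
	}
	return best_count;
}
```

Since k is fixed, ranking by frequency is the same as ranking by count * (k - 1) saved bytes under a one-byte-code model; comparing different lengths would need that savings estimate instead.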
BTW, should I faint or something because THE Alan Cox responded to my
first post to lkml? :)
You hate it when people say that sort of thing, don't you? :)