From: Timothy Miller <miller@techsource.com>
To: linux-kernel@vger.kernel.org
Subject: Painlessly shrinking kernel messages (Re: kernel support for non-english user messages)
Date: Thu, 10 Apr 2003 18:08:45 -0400
Message-ID: <3E95EB6D.4020004@techsource.com>
I took the liberty of reading the FAQ (yeah, I saw 9.16) and joining the
list after reading an interesting recent discussion on i18n of kernel
messages. In short, the primary maintainers of the kernel don't want
it, and I agree with them.
HOWEVER, the discussion inspired me to think about ways of reducing some
of the unfortunate but necessary bloat caused by keeping all of those
strings in RAM. Naturally, any way to do this must be absolutely
painless, so I came up with the following set of restrictions:
- Absolutely no requirement to change existing strings, unless you feel
like it
- Must be easy to use
- Must actually shrink the kernel
- The impact on the way kernel messages appear should be minimized
To be brief, the idea I came up with was to identify the 128 most common
words in kernel messages and replace them with single character values
above 127 which printk would decode on the way out. Once the list was
determined, there would be a header file people could use, at their
leisure, to make substitutions. So, for instance, instead of having this:
printk("invalid: ...");
We would have this:
#define MSG_INVALID "\200"
...
printk(MSG_INVALID "...");
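A minimal sketch of what the decode step on the way out could look like, written as ordinary user-space C rather than as a patch to printk. The names msg_words, msg_expand and MSG_FAILED are hypothetical, and the table holds only two illustrative entries instead of the full 128. Bytes of 0x80 and above that have a table entry are expanded to their word; everything else is copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decode table: slot i holds the word for byte value 0x80 + i.
 * Only two slots are filled here; the proposal would fill all 128. */
static const char *const msg_words[128] = {
    "invalid",  /* "\200" */
    "failed",   /* "\201" */
};

#define MSG_INVALID "\200"
#define MSG_FAILED  "\201"  /* hypothetical second entry */

/* Expand a compressed message into out, writing at most outsz bytes and
 * always NUL-terminating. Returns the length of the fully expanded text,
 * which is larger than outsz - 1 when the output had to be truncated. */
static size_t msg_expand(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        const char *word = (c >= 0x80) ? msg_words[c - 0x80] : NULL;

        if (word != NULL) {
            /* High byte with a table entry: emit the whole word. */
            size_t len = strlen(word);
            for (size_t i = 0; i < len; i++, n++) {
                if (n + 1 < outsz)
                    out[n] = word[i];
            }
        } else {
            /* Plain byte, or a high byte with no entry: copy as-is. */
            if (n + 1 < outsz)
                out[n] = (char)c;
            n++;
        }
    }
    out[n < outsz ? n : outsz - 1] = '\0';
    return n;
}
```

With this sketch, expanding MSG_INVALID ": x " MSG_FAILED into a large enough buffer yields "invalid: x failed". Adjacent string literals are concatenated by the compiler, so existing format strings need no other change.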
To judge the practicality of this, I used 'strings' on an uncompressed
kernel image (2.4.20, IIRC) and then ran it through this:
tr '[:lower:]' '[:upper:]' | tr '[:blank:]' '\n' | sort | uniq -c | tr ' ' 0
This gave me a list of all words found in the kernel along with their
counts. Then I ran it through a positively awful little C program which
I wrote to determine not the 128 most frequent, but rather, the 128 that
would result in the maximum shrinkage (maximize count * (length-1)).
The results of that run are given below. The results of the test are
that this approach might save up to 62424 bytes of kernel space which is
only about 3% of the kernel image size I got the strings from, but it's
nearly 27% of the total output I got from 'strings'. Is it worth it?
Maybe not yet, but then again, there may be an even more intelligent
approach to this compression that we could use, hopefully one which
wouldn't require any more effort to use.
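The selection step described above can be sketched as follows. This is not the program I actually ran, just an illustration of the metric: each word is scored by count * (length-1), the words are sorted by that score, and the best ones are kept. The names word_count, word_savings and pick_best are hypothetical, and the score ignores the space the decode table itself would take.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct word_count {
    const char *word;
    unsigned long count;
};

/* Bytes saved by replacing every occurrence of the word with one byte. */
static unsigned long word_savings(const struct word_count *w)
{
    size_t len = strlen(w->word);

    return len > 1 ? w->count * (unsigned long)(len - 1) : 0;
}

/* qsort comparator: larger savings sort first. */
static int by_savings_desc(const void *a, const void *b)
{
    unsigned long sa = word_savings(a);
    unsigned long sb = word_savings(b);

    return (sa < sb) - (sa > sb);
}

/* Sort words by savings, largest first, and return the total savings of
 * the first `keep` entries (128 in the proposal). */
static unsigned long pick_best(struct word_count *words, size_t n, size_t keep)
{
    unsigned long total = 0;

    assert(words != NULL || n == 0);
    qsort(words, n, sizeof *words, by_savings_desc);
    for (size_t i = 0; i < n && i < keep; i++)
        total += word_savings(&words[i]);
    return total;
}
```

For example, CONTROLLER at 290 occurrences scores 290 * 9 = 2610, which beats CO at 1665 occurrences (1665 * 1 = 1665), even though CO is far more frequent.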
Here are the results:
count string
-------- --------
37 GIGABIT
102 BLOCK
62 NULL
871 [^_]
26 INTERFACE
23 MICROSYSTEMS
75 RAGE
338 SE
226 TECH
113 DEVICE
214 <3>
838 PC
19 <3>INIT_MODULE:
35 REGISTER
41 <3>EXT3-FS
656 UWVS
57 NETWORK
32 SUPPORT
97 COMPUTER
878 [^_
137 NET
198 MODE
534 INC
33 INTERNATIONAL
59 CARDBUS
203 TECHNO
119 TECHNOLOGY
46 CORP.
31 EXT2-FS
290 CONTROLLER
64 ASSERTION
83 DATA/FAX
249 DATA
60 KERNEL:
304 CONTROL
33 INVALID
322 %D
486 PCI
185 INC.
61 ERROR
80 PORT
154 IDE
74 INODE
102 <4>
88 KERNEL
52 ELECTRONICS
44 <3>EXT3
117 FAILED
70 AUDIO
83 HOST
27 SEMICONDUCTOR
50 CHIPS
63 DEVFS
117 ETHERNET
299 ID
291 COM
46 CANNOT
24 TRANSACTION
238 TO
79 TECHNOLOGIES
63 %08X
98 D$$
37 PROCESS
288 CORP
56 DATA/FAX/VOICE
39 COMMUNICATIONS
44 10/100
38 SERIAL
146 CORPORATION
236 TEC
107 MICRO
26 MICROSYSTEM
95 ADAPTER
324 NO
50 POWER
121 56K
27 ACCELERATOR
33 RESEARCH
21 INTEGRATED
271 PRO
19 TECHNOLOGIES,
237 LT
43 CHIPSET
28 NETWORKS
317 L$
40 <3>EXT3-FS:
1665 CO
192 BRIDGE
13 MICROELECTRONICS
157 JOURNAL
147 FOR
91 9D$
18 CYBERSERIAL
54 CYBER
56 MEMORY
34 DATA/FAX/VOICE/SPKP
49 SMART
207 LTD
137 TCP
57 CACHE
407 T$
160 <6>
26 GRAPHICS
888 D$
140 SYSTEMS
249 AT
6 JOURNAL->J_COMMITTING_TRANSACTION
142 MODEM
32 CHANNEL
131 %S:
394 %S
14 COMMIT_TRANSACTION
63 FILE
28 SMARTDAA)
67 CHIP
30 WINMODEM
113 NOT
139 ETH
331 DEV
197 FO
52 VIDEO
73 ELECTRONIC
67 EXT3
99 CARD
1336 IN
222 SYSTEM
197 AD
53 COMMUNICATION
Total reduction: 62424
Comments?
NOTE: I realize that some of those words probably aren't actually
"strings" in the kernel. This is a feasibility test, not a suggested list.