Re: Swap Compression

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: rmoser <mlmoser@comcast.net>
To: linux-kernel@vger.kernel.org
Subject: Re: Swap Compression
Date: Mon, 28 Apr 2003 22:58:46 -0400	[thread overview]
Message-ID: <200304282258460250.00DF10C2@smtp.comcast.net> (raw)
In-Reply-To: 200304282258310030.00DED562@smtp.comcast.net

[-- Attachment #1: Type: text/plain, Size: 1616 bytes --]

*********** REPLY SEPARATOR  ***********

On 4/28/2003 at 6:25 PM Timothy Miller wrote:

>rmoser wrote:
>
>>So what's the best way to do this?  I was originally thinking like this:
>>
>>Grab some swap data
>>Stuff it into fcomp_push()
>>When you have 100k of data, seal it up
>>Write that 100k block
>>
>>But does swap compress in 64k blocks?  80x86 has 64k segments.
>>The calculation for compression performance loss, 256/size_of_input,
>>gives a loss of 0.003906 (0.3906%) for a dataset 65536 bytes long.
>>So would it be better to just compress segments, whatever size they
>>may be, and index those?  This would, of course, be much more efficient
>>in terms of finding the data to uncompress.  (And system dependant)
>>
>>
>We're not using DOS here.  x86 has arbitrarily-sized segments.  It is
>the PAGES you want to swap, and they are typically 4k.
>
>
>BTW, I see swap compression as being more valuable for over-loaded
>servers than for PDA's.  Yes, I see the advantage for a PDA, but you
>only get any significant benefit out of this if you have the horsepower
>and RAM required for doing heavy-weight compression.  If you can't
>compress a page very much, you don't benefit much from swapping it to RAM.

Feh.  Selectable compression is the eventual goal.  If you want serious
heavy-weight compression, go with fcomp2.  Ahh heck I'll attatch the
.plan file for fcomp2; this is rediculously RAM and CPU intensive in good
implimentation, so don't even think for a second that you'll want to use
this on your cpu-overloaded boxen.

And no implimenting this right away, it's too much to code!

--Bluefox Icy

[-- Attachment #2: __fcomp2.plan --]
[-- Type: application/octet-stream, Size: 7901 bytes --]

fcomp
Fox Compression 2:  Size is everything

by Bluefox Icy

This algoritm is copyright Bluefox Icy under the LGPL as it exists on April
28, 2003.  LGPL v2.0

Definitions:
  Base 0:  A number that starts at 0.  So 1 byte base 0 counters are ranged
  from 0 to 255.

  Base 1:  A number that starts at 1.  So 1 byte base 1 counters are ranged
  from 1 to 256, with 0x00 being 1 and 0x01 being 2 and 0xFF being 256.

  Order of Redundancy:  A numeric count of how much a specific string is used
  to reference other data in a file's compressed output.

fcomp uses a compression algorithm that is focussed on speed only.  It uses
little RAM, and was intended for kernel-level RAM compression and for packing
executable files on 1 Mhz 6502 processors (its creation purpose).  fcomp2 is
quite different.

fcomp2 uses compression and decompression routines by Bluefox Icy.  It was
created literally with size in mind only.  fcomp2 compression should be sane,
but there's always the warning that I wrote the boyer-moore search myself, and
someone really needs to FSA-prove that it works.  Replace it with KMP,  brute
force, or your own equivalent BM if you want a 100% guarentee.

fcomp2 compressed data is a mixed backpointer/table data format.  Officially,
fcomp2 is a floating size backpointer format; backpointer size and max distance
is capable of switching along the way.  Most implimentations will want to use
a fixed size backpointer though, usually 24 bit (16 MB).

fcomp2 uses a pointer table and various pointer types.  The pointer types are
quite diverse.  A general pointer container looks like so:

TYPE    SIZE    DESC
Byte       1    Type of pointer
X          X    The pointer itself
3Byte      3    24 bit forward reference indicating the location of the next
                pointer.  It is base 0, and a 0 indicates EOS (just like in
		fcomp)

Literal pointers (1) point back into the stream and may be 8, 16, 24, or 32
bit. These pointers are standard go-back-and-copy-N pointers, like fcomp uses.
They provide a pointer within the window that the scan is using.

TYPE    SIZE    DESC
Byte       1    Size N in bytes of pointer data
NByte      N    Distance back to look, base 1 (0 == 1, range is 1..2^N)
NByte      N    Distance to copy, base 1 (0 == 1, range is 0..2^N)

Table pointers (2) are extra power.  Table pointers are pointers that reference
a table which holds the absolute locations in the stream (not including the
table) of data that each pointer points to.  In reality, all back pointers can
be replaced with table pointers.

Please note that by stream, we mean the actual pointer-and-data stream that is
processed in the compression and NOT the supplimentary header or table.

TYPE    SIZE    DESC
Byte       1    Size N in bytes of pointer data.  Increasing the size increases
                your range in the table, i.e. 1 byte can access entries 0..255,
		2 can access 0..65535, 3 0..16777383, and so on.
NByte      N    Index of the entry to copy
NByte      N    Number of bytes base 0 to skip from the beginning
NByte      N    Number of bytes base 1 to copy

The final type of pointer is a NULL (0) pointer.  If the pointer is null, it
simply is blank, i.e. X = 0 in the general pointer container definition.  A
null pointer is always 4 bytes long including the general pointer container
defined above, and 0 bytes long not including it.  It looks like this:

TYPE    SIZE    DESC

Yes, this is complete.

The stream format is:

fcomp2 stream:
TYPE   SIZE    DESC
3Byte     3    Length of X base 0 (for consistency)
stream    X    a bunch of bytes to take as pure stream
Pointer   Y    A pointer inside a general pointer container

The last 2 repeat until EOS, which is where the general pointer container has
its next pointer as a 0.  The stream should always end with a pointer with a
forward reference of 0, either Literal, Table, or NULL.

The fcomp2 table is pretty simple.  It points to offsets in the stream of data,
which is read by a decompressor and used to give further
compression/decompression performance.  This works when things are out of reach
of backpointers.  Also, a very good implimentation may take the time to
optimize the table and stream; that is, it may move all the most commonly used
table entries to the beginning and use the smallest pointer data size possible.

The fcomp2 table is in itself not built for size.  Data of multiple redundancy
is initially entered into the table as a 32 bit (4 byte) offset.  The length is
not stored; the compressor will find an offset from this within the reach of
the next table entry at compression time, while the decompressor reads entry,
skip, and copy.  To do this more easily, the compressor should keep the data
table in the order the data appears in the file until compression is finished,
and then sort the table based on order of redundancy (aka how many times each
entry is used) from greatest to least.  As the table is sorted, the stream and
table both may be adjusted so as to squeeze out more space (i.e. the most used
table entries are referenced as 8 bit pointers, changing the pointers to point
at these and use 8 bit storage changes the compressed stream).  Note that
altering the size of the compressed stream should not alter the table, as it is
in reference to the uncompressed stream.

fcomp2-table:
TYPE    SIZE    DESC
Dword      4    A 32 bit offset of the data in the uncompressed stream

An fcomp2 data file contains a few things.  First are 2 longs:  Table and
Stream offsets in the file.  Second is some data, the table, and the stream.
The "some data" can be anything (encryption flags, CRC's, etc).

fcomp2 data file:
TYPE   SIZE    DESC
DWord     4    32 bit offset base 0 (this is AT 0) of where the table is in the
               fire
DWord     4    32 bit base 0 offset of the fcomp2 stream
stream    X    Whatever you want until the table occurs
fcomp2-table Y  The table
fcomp2-stream Z The stream.  The beginning of this is exactly where a pointer to
               offset 0 in the table would point to

I would encourage the open source community to experiment with implimenting
this compression schema.  It is VERY RAM intensive, but a good implimentation
should be able to squeeze great compression ratios from this.  Here are the
requirements for the best implimentation:

1.  Intelligent Table Generation
  Table entries are each 4 bytes in size.  A table should not be generated at a
  point less than 4 bytes from the previous table.
2.  Optimized Table Sorting
  The entries most used in the table should be placed at the beginning; the
  table should be sorted in descending reference order.  Then the pointers must
  be adjusted (must as in this will break the stream if you don't do this) in
  the compressed stream to point to the new locations of the table entries.  In
  adjusting the pointers, the compressor should shrink them to the smallest
  possible size given the index number they must reference and the
  distance/vector they must skip and copy.  For the best optimization, this
  step should also take into account the distance/vector sizes in the pointers,
  to assess how many pointers would gain and lose size in each range of index
  size (1/2/3/4 bytes) and compute the total size gain (positive or negative)
  for each pointer given the positions, and adjust according to these final
  scores.
3.  Intelligent Table Discarding
  When table entries in the table are close enough together, the pointers which
  reference them may be capable of using only a small set from a larger set of
  table entries.  If this is the case, the compressor should remove those table
  entries and readjust all related pointers.

If a compressor is written, it does not have to follow any of the above to have
compatible output with fcomp2 binary files; however, the above guidelines give
the best compression ratios for fcomp2 compressed files.

next prev parent reply	other threads:[~2003-04-29  2:48 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-04-25 22:48 Re: Swap Compression rmoser
2003-04-26  9:17 ` Jörn Engel
     [not found]   ` <200304261148590300.00CE9372@smtp.comcast.net>
     [not found]     ` <20030426160920.GC21015@wohnheim.fh-wedel.de>
2003-04-26 23:41       ` Some code for " rmoser
2003-04-27  2:24       ` rmoser
2003-04-27  9:04         ` Jörn Engel
2003-04-27 17:24           ` rmoser
2003-04-27 17:51             ` Jörn Engel
2003-04-27 18:22               ` William Lee Irwin III
2003-04-27 18:31               ` rmoser
2003-04-27 19:04                 ` Jörn Engel
2003-04-27 19:57                   ` Livio Baldini Soares
2003-04-27 20:24                     ` rmoser
     [not found]                   ` <200304271609460030.01FC8C2B@smtp.comcast.net>
2003-04-27 20:10                     ` rmoser
2003-04-27 21:52                   ` rmoser
2003-04-27 21:55                     ` Re: Swap Compression -- Try 2 rmoser
2003-04-28  8:52                   ` Swap Compression Eric W. Biederman
2003-04-28 10:26                     ` Jörn Engel
     [not found]                 ` <3EAE8899.2010208@techsource.com>
     [not found]                   ` <200304291521120230.0462A551@smtp.comcast.net>
2003-04-29 19:21                     ` rmoser
     [not found]         ` <3EADAA5D.1090408@techsource.com>
     [not found]           ` <200304282258310030.00DED562@smtp.comcast.net>
2003-04-29  2:58             ` rmoser [this message]
  -- strict thread matches above, loose matches on Subject: below --
2003-05-09  3:21 Perez-Gonzalez, Inaky
2003-05-08  3:17 Perez-Gonzalez, Inaky
2003-05-08  8:07 ` Jörn Engel
2003-04-29 20:07 Timothy Miller
2003-04-29 20:40 ` rmoser
2003-04-29 21:14   ` John Bradford
2003-04-30  0:59     ` rmoser
2003-04-30  2:48       ` Con Kolivas
2003-04-30 12:59         ` Jörn Engel
2003-04-30 19:18           ` rmoser
2003-05-01 22:07           ` rmoser
2003-05-02  2:46             ` jw schultz
2003-04-25 22:32 rmoser
2003-04-28 21:35 ` Timothy Miller
2003-04-29  0:43   ` Con Kolivas
     [not found] <200304251640110420.0069172B@smtp.comcast.net>
2003-04-25 20:48 ` rmoser
2003-04-25 21:14   ` Jörn Engel
2003-04-25 21:17   ` John Bradford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200304282258460250.00DF10C2@smtp.comcast.net \
    --to=mlmoser@comcast.net \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox