* Can compression at filesystem level improve overall performance?
@ 2004-03-19 14:25 Erik Terpstra
  2004-03-19 16:29 ` Redeeman
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Erik Terpstra @ 2004-03-19 14:25 UTC (permalink / raw)
  To: reiserfs-list

Hello everyone,

Over the last couple of years I have noticed that most of my systems
are bottlenecked by data throughput rather than by CPU performance.

Is it fair to say that today compression at the filesystem level would 
improve overall performance?

If this is the case, it probably wouldn't be too hard to implement as a 
module in Reiser4?

Any thoughts?

--Erik.


* Re: Can compression at filesystem level improve overall performance?
  2004-03-19 14:25 Can compression at filesystem level improve overall performance? Erik Terpstra
@ 2004-03-19 16:29 ` Redeeman
  2004-03-19 16:53   ` Nikita Danilov
  2004-03-19 18:59 ` Hans Reiser
  2004-03-23  0:17 ` Miguel
  2 siblings, 1 reply; 22+ messages in thread
From: Redeeman @ 2004-03-19 16:29 UTC (permalink / raw)
  To: Reiserfs Mailinglist

On Fri, 2004-03-19 at 15:25, Erik Terpstra wrote:
> Hello everyone,
> 
> For the last couple of years I noticed that the performance of most of 
> my systems has it's bottleneck in data throughput rather than CPU 
> performance.
> 
> Is it fair to say that today compression at the filesystem level would 
> improve overall performance?
> 
> If this is the case, it probably wouldn't be too hard to implement as a 
> module in Reiser4?
> 
> Any thoughts?
The more aggressively you compress, the more CPU it takes, and that will
make it slower, but I think a small compression algorithm for filesystem
purposes could be written... However, I doubt it will be worth it;
hard drives are really cheap nowadays. But maybe some algorithm to
compress cleartext only, or something...


> 
> --Erik.
-- 
Regards, Redeeman
redeeman@metanurb.dk



* Re: Can compression at filesystem level improve overall performance?
  2004-03-19 16:29 ` Redeeman
@ 2004-03-19 16:53   ` Nikita Danilov
  2004-03-21 14:29     ` Sean Johnson
                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Nikita Danilov @ 2004-03-19 16:53 UTC (permalink / raw)
  To: Redeeman; +Cc: Reiserfs Mailinglist

Redeeman writes:
 > On Fri, 2004-03-19 at 15:25, Erik Terpstra wrote:
 > > Hello everyone,
 > > 
 > > For the last couple of years I noticed that the performance of most of 
 > > my systems has it's bottleneck in data throughput rather than CPU 
 > > performance.
 > > 
 > > Is it fair to say that today compression at the filesystem level would 
 > > improve overall performance?
 > > 
 > > If this is the case, it probably wouldn't be too hard to implement as a 
 > > module in Reiser4?
 > > 
 > > Any thoughts?
 > the more agressive you compress it the more cpu it takes, and that will
 > make it slower, but i think a small compression algorithm for filesystem
 > purpose could be written... however, i doubt it will be worth it,
 > harddrives are really cheap nowadays.. but maybe some algortihm to
 > compress cleartext only, or something..

That's a common misconception. :)

The goal of compression is to conserve disk bandwidth rather than space.

By compressing, it is possible to transfer data (== the uncompressed
data the user works with) at a rate higher than the raw device bandwidth.
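
A rough way to see the bandwidth argument is sketched below. This is an illustrative model only, not something from the thread: it assumes disk reads and decompression overlap perfectly, and the function name and all numbers are hypothetical.

```python
def effective_read_rate(disk_mb_s, ratio, decompress_mb_s):
    """Rate at which uncompressed data reaches the user, in MB/s.

    disk_mb_s       -- raw device bandwidth (compressed bytes per second)
    ratio           -- uncompressed size / compressed size (>= 1.0)
    decompress_mb_s -- decompressor output rate (uncompressed MB/s)

    Assumes reading and decompressing overlap perfectly, so the slower
    of the two stages sets the pace.
    """
    return min(disk_mb_s * ratio, decompress_mb_s)


# Hypothetical numbers: 50 MB/s disk, 2:1 ratio, 200 MB/s decompressor.
fast_cpu = effective_read_rate(50.0, 2.0, 200.0)
# Same disk and ratio, but a decompressor slower than the disk.
slow_cpu = effective_read_rate(50.0, 2.0, 30.0)
print(fast_cpu, slow_cpu)  # 100.0 30.0
```

Under these assumptions the user sees twice the raw device rate in the first case, while in the second case the CPU rather than the disk becomes the bottleneck, which is the trade-off raised earlier in the thread.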

 > 
 > 
 > > 
 > > --Erik.
 > -- 
 > Regards, Redeeman
 > redeeman@metanurb.dk
 > 

Nikita.


* Re: Can compression at filesystem level improve overall performance?
  2004-03-19 14:25 Can compression at filesystem level improve overall performance? Erik Terpstra
  2004-03-19 16:29 ` Redeeman
@ 2004-03-19 18:59 ` Hans Reiser
  2004-03-23  0:17 ` Miguel
  2 siblings, 0 replies; 22+ messages in thread
From: Hans Reiser @ 2004-03-19 18:59 UTC (permalink / raw)
  To: Erik Terpstra; +Cc: reiserfs-list

Erik Terpstra wrote:

> Hello everyone,
>
> For the last couple of years I noticed that the performance of most of 
> my systems has it's bottleneck in data throughput rather than CPU 
> performance.
>
> Is it fair to say that today compression at the filesystem level would 
> improve overall performance?
>
> If this is the case, it probably wouldn't be too hard to implement as 
> a module in Reiser4?
>
> Any thoughts?
>
> --Erik.
>
>
We are working on it.  Actually, it is hard to code because we need to
compress at flush-to-disk time rather than at each write.  Flush-related
plugins are the hardest plugins.

-- 
Hans



* Re: Can compression at filesystem level improve overall performance?
  2004-03-19 16:53   ` Nikita Danilov
@ 2004-03-21 14:29     ` Sean Johnson
  2004-03-21 23:17       ` Can compression at filesystem level improve overall The Amazing Dragon
  2004-03-22  8:01     ` Can compression at filesystem level improve overall performance? Kris Van Bruwaene
  2004-03-22 18:00     ` Scott Young
  2 siblings, 1 reply; 22+ messages in thread
From: Sean Johnson @ 2004-03-21 14:29 UTC (permalink / raw)
  To: Reiserfs Mailinglist

On Fri, 2004-03-19 at 11:53, Nikita Danilov wrote:
> That's common misconception. :)
> 
> The goal of compression is to conserve disk bandwidth rather than space.
> 
> By compressing it is possible to transfer data (== uncompressed data
> user works with), at a rate higher than raw device bandwidth.

I am far from any kind of authority on filesystems, but doesn't compression
make data corruption a significantly nastier bugaboo?

Thanks,

Sean



* Re: Can compression at filesystem level improve overall
  2004-03-21 14:29     ` Sean Johnson
@ 2004-03-21 23:17       ` The Amazing Dragon
  2004-03-21 23:23         ` Sean Johnson
  2004-03-22  9:14         ` Hans Reiser
  0 siblings, 2 replies; 22+ messages in thread
From: The Amazing Dragon @ 2004-03-21 23:17 UTC (permalink / raw)
  To: sean; +Cc: reiserfs-list

> From: Sean Johnson <sean@gutenpress.org>
> On Fri, 2004-03-19 at 11:53, Nikita Danilov wrote:
> > That's common misconception. :)
> > 
> > The goal of compression is to conserve disk bandwidth rather than space.
> > 
> > By compressing it is possible to transfer data (== uncompressed data
> > user works with), at a rate higher than raw device bandwidth.
> 
> I am far from any kind of authority on filesystems, but doesn't compression
> make data corruption a significantly nastier bugaboo?

Potentially. Depending upon the encoding, losing one block of encoded
data maps to losing many blocks of decoded data. Also, losing the first
block of data might make it impossible to recover later blocks.

But these aren't issues, since you do error correction near the physical
layer, and backups just to make sure. You do, don't you?


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \   (    |         EHeM@cs.pdx.edu      PGP 8881EF59         |    )   /
  \_  \   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
    \___\_|_/82 04 A1 3C C7 B1 37 2A*E3 6E 84 DA 97 4C 40 E6\_|_/___/




* Re: Can compression at filesystem level improve overall
  2004-03-21 23:17       ` Can compression at filesystem level improve overall The Amazing Dragon
@ 2004-03-21 23:23         ` Sean Johnson
  2004-03-22  9:14         ` Hans Reiser
  1 sibling, 0 replies; 22+ messages in thread
From: Sean Johnson @ 2004-03-21 23:23 UTC (permalink / raw)
  To: reiserfs-list

On Sun, 2004-03-21 at 18:17, The Amazing Dragon wrote:
> > From: Sean Johnson <sean@gutenpress.org>
> > On Fri, 2004-03-19 at 11:53, Nikita Danilov wrote:
> > > That's common misconception. :)
> > > 
> > > The goal of compression is to conserve disk bandwidth rather than space.
> > > 
> > > By compressing it is possible to transfer data (== uncompressed data
> > > user works with), at a rate higher than raw device bandwidth.
> > 
> > I am far from any kind of authority on filesystems, but doesn't compression
> > make data corruption a significantly nastier bugaboo?
> 
> Potentially. Depending upon the encoding losing one block of encoded data
> maps to losing many blocks of decoded data. Also losing the first block
> of data might make it impossible to recover later blocks.
> 
> But these aren't issues since you do error correction near the physical
> layer, and backups just you make sure. You do, don't you?

Heh, in the words of my previous team lead "...sure we do backups, it's 
just the restores that we have problems with ..." *grin*

Sean




* Re: Can compression at filesystem level improve overall performance?
  2004-03-19 16:53   ` Nikita Danilov
  2004-03-21 14:29     ` Sean Johnson
@ 2004-03-22  8:01     ` Kris Van Bruwaene
  2004-03-22 18:00     ` Scott Young
  2 siblings, 0 replies; 22+ messages in thread
From: Kris Van Bruwaene @ 2004-03-22  8:01 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Reiserfs Mailinglist

Nikita Danilov wrote:

>Redeeman writes:
> > On Fri, 2004-03-19 at 15:25, Erik Terpstra wrote:
> > > Is it fair to say that today compression at the filesystem level would 
> > > improve overall performance?
> > the more agressive you compress it the more cpu it takes, and that will
> > make it slower, but i think a small compression algorithm for filesystem
> > purpose could be written... however, i doubt it will be worth it,
> > harddrives are really cheap nowadays.. but maybe some algortihm to
> > compress cleartext only, or something..
>
>That's common misconception. :)
>
>The goal of compression is to conserve disk bandwidth rather than space.
>
>By compressing it is possible to transfer data (== uncompressed data
>user works with), at a rate higher than raw device bandwidth.
>  
>
Something else to consider: the gain might not be so impressive, since
many files are already heavily compressed: apart from the obvious ones
(.zip .gz .bz2), most audio (.mp3) and video is natively compressed
(mpeg2/4), and many office files as well (presentations, pdf...).
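
The effect is easy to measure. The sketch below uses Python's zlib as a stand-in compressor and pseudo-random bytes as a stand-in for already-compressed media; both choices are assumptions made for illustration, not anything proposed in the thread.

```python
import random
import zlib


def compression_ratio(data, level=6):
    """Return compressed size / original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Repetitive plain text compresses very well.
text = b"the quick brown fox jumps over the lazy dog\n" * 1000

# Pseudo-random bytes stand in for .mp3/.zip-style content, which looks
# statistically random to a general-purpose compressor.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(64 * 1024))

print("text : %.3f" % compression_ratio(text))
print("noise: %.3f" % compression_ratio(noise))
```

A filesystem could run the same kind of check on a sample of each file and skip compression when the ratio stays close to 1.0.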




* Re: Can compression at filesystem level improve overall
  2004-03-21 23:17       ` Can compression at filesystem level improve overall The Amazing Dragon
  2004-03-21 23:23         ` Sean Johnson
@ 2004-03-22  9:14         ` Hans Reiser
  1 sibling, 0 replies; 22+ messages in thread
From: Hans Reiser @ 2004-03-22  9:14 UTC (permalink / raw)
  To: reiserfs-list; +Cc: sean, Edward Shishkin

The Amazing Dragon (Elliott Mitchell) wrote:

>>From: Sean Johnson <sean@gutenpress.org>
>>On Fri, 2004-03-19 at 11:53, Nikita Danilov wrote:
>>    
>>
>>>That's common misconception. :)
>>>
>>>The goal of compression is to conserve disk bandwidth rather than space.
>>>
>>>By compressing it is possible to transfer data (== uncompressed data
>>>user works with), at a rate higher than raw device bandwidth.
>>>      
>>>
>>I am far from any kind of authority on filesystems, but doesn't compression
>>make data corruption a significantly nastier bugaboo?
>>    
>>
>
>Potentially. Depending upon the encoding losing one block of encoded data
>maps to losing many blocks of decoded data. Also losing the first block
>of data might make it impossible to recover later blocks.
>  
>
I think it will just make you lose the compression atom, but Edward can 
say more when he gets back from vacation.
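
One way to picture the "compression atom" point is sketched below: if data is compressed in independent fixed-size units, damage to one unit leaves the others readable. The 4 KiB atom size, the use of zlib, and the function names are assumptions for illustration; nothing here describes how Reiser4 actually lays out compressed data.

```python
import zlib

ATOM_SIZE = 4096  # hypothetical atom size, chosen only for illustration


def compress_atoms(data, atom_size=ATOM_SIZE):
    """Compress each atom_size slice of data as an independent zlib stream."""
    return [zlib.compress(data[i:i + atom_size])
            for i in range(0, len(data), atom_size)]


def recover_atoms(atoms):
    """Decompress every atom; a damaged atom yields None instead of data."""
    recovered = []
    for blob in atoms:
        try:
            recovered.append(zlib.decompress(blob))
        except zlib.error:
            recovered.append(None)
    return recovered


data = bytes(range(256)) * 64  # 16 KiB, i.e. four atoms
atoms = compress_atoms(data)

# Flip the last byte of atom 1 (part of its checksum trailer) to simulate
# a bad block on disk.
atoms[1] = atoms[1][:-1] + bytes([atoms[1][-1] ^ 0xFF])

result = recover_atoms(atoms)
lost = [i for i, atom in enumerate(result) if atom is None]
print(lost)  # [1]
```

With one long compressed stream instead of independent atoms, the same damage could make everything after the bad block unreadable, which is the worry raised above.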

>But these aren't issues since you do error correction near the physical
>layer, and backups just you make sure. You do, don't you?
>
>
>  
>


-- 
Hans



* Re: Can compression at filesystem level improve overall performance?
  2004-03-19 16:53   ` Nikita Danilov
  2004-03-21 14:29     ` Sean Johnson
  2004-03-22  8:01     ` Can compression at filesystem level improve overall performance? Kris Van Bruwaene
@ 2004-03-22 18:00     ` Scott Young
  2004-03-22 20:04       ` Hans Reiser
  2 siblings, 1 reply; 22+ messages in thread
From: Scott Young @ 2004-03-22 18:00 UTC (permalink / raw)
  To: reiserfs-list


> 
> That's common misconception. :)
> 
> The goal of compression is to conserve disk bandwidth rather than space.
> 
> By compressing it is possible to transfer data (== uncompressed data
> user works with), at a rate higher than raw device bandwidth.

I will be doing some research on an algorithm that speeds up data
transfers over a network by adaptively selecting a compression
algorithm.  It can be applied to filesystem reads and writes too.

When the send queue is reasonably full on the server, it starts
compressing data at the tail of the queue while sending the data at the
head of the queue.  If the output stream catches up to the segment
currently being compressed, then that segment is sent uncompressed.  If
the compressed data is not significantly smaller, then the uncompressed
data is sent instead.  For network applications that are not network
interface bound (like rsync over a 100 Mbit connection), the buffer will
be empty most of the time, so little compression would be needed or
wanted, as it would only slow the application down.

The compression algorithm is chosen from a pool of algorithms and varied
depending on the history of buffer overflows and under-runs.  Slower,
better compression algorithms are used when the buffer is mostly full
and the compression is observably effective.  The idea here is to
minimize the time between the client requesting the data and the client
having usable data.  This can be seen as a
time-versus-amount-of-usable-data-on-client graph: some applications
prefer low latency for the initial stream of data (such as a web page),
whereas others care more about the total time to retrieve a very large
piece of data (such as scp scott@1.2.3.4/SomeBigDocument.sxw
/home/scott over a 56k modem).
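
The head/tail scheme described above can be sketched as a small single-threaded simulation. Everything here is an assumption made for illustration: zlib stands in for the pool of algorithms, `MIN_SAVING` is a made-up threshold, and "concurrency" is modelled by giving the compressor a fixed budget of segments per segment sent.

```python
import zlib
from collections import deque

MIN_SAVING = 0.10  # hypothetical: require at least a 10% size reduction


def drain_queue(segments, compress_per_send=1):
    """Send from the head of the queue while compressing from the tail.

    For each segment sent, the compressor may process up to
    `compress_per_send` not-yet-tried segments, starting at the tail.
    Returns a list of (was_compressed, payload) tuples in send order.
    """
    queue = deque({"raw": s, "packed": None, "tried": False} for s in segments)
    sent = []
    while queue:
        budget = compress_per_send
        for item in reversed(queue):
            # Stop when out of budget or when we reach the segment that is
            # about to be sent: the sender has caught up with us.
            if budget == 0 or item is queue[0]:
                break
            if item["tried"]:
                continue
            item["tried"] = True
            packed = zlib.compress(item["raw"])
            # Keep the compressed form only if it is significantly smaller.
            if len(packed) <= (1 - MIN_SAVING) * len(item["raw"]):
                item["packed"] = packed
            budget -= 1
        head = queue.popleft()
        if head["packed"] is not None:
            sent.append((True, head["packed"]))
        else:
            sent.append((False, head["raw"]))
    return sent


def decode(sent):
    """Receiver side: undo compression where the flag says it was applied."""
    return b"".join(zlib.decompress(p) if c else p for c, p in sent)


text = b"filesystem compression trades CPU for bandwidth\n" * 200
sent = drain_queue([text] * 4)
flags = [was_compressed for was_compressed, _ in sent]
print(flags)  # [False, False, True, True]
```

In this run the first two segments go out raw because the sender reaches them before the compressor does, while the last two were compressed in the meantime. A real implementation would run the compressor in its own thread and would vary the algorithm, which this sketch does not attempt.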

Adapting this to filesystem concepts, the server can be seen as the
write process and the client can be seen as the read process.  The idea
can be applied to Reiser4 by compressing the overwrite set while the
journal data is being written, and then compressing the tail of the
relocate set moving backwards until the write stream catches up to the
compression.  It could also take into account the estimated
decompression time when reading the data back, and use it for deciding
whether the compression ratio is good enough to write the compressed
data instead of the uncompressed data.
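
The decision mentioned above, whether a compressed copy is worth writing once decompression time is counted, could look roughly like the following. The cost model is an assumption: it treats reading and decompressing as strictly sequential (no overlap), and the function name and figures are hypothetical.

```python
def should_store_compressed(raw_mb, packed_mb, disk_mb_s, decompress_mb_s):
    """Return True if reading the compressed copy back is expected to be faster.

    raw_mb          -- uncompressed size in MB
    packed_mb       -- compressed size in MB
    disk_mb_s       -- raw device read bandwidth in MB/s
    decompress_mb_s -- decompressor output rate in MB/s

    Pessimistic model: read time and decompression time are simply added.
    """
    read_raw = raw_mb / disk_mb_s
    read_packed = packed_mb / disk_mb_s + raw_mb / decompress_mb_s
    return read_packed < read_raw


# Hypothetical case: 100 MB file that compresses to 50 MB on a 50 MB/s disk.
print(should_store_compressed(100, 50, 50, 200))  # True:  1.0 s + 0.5 s  < 2.0 s
print(should_store_compressed(100, 50, 50, 80))   # False: 1.0 s + 1.25 s > 2.0 s
```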

Another interesting twist would be to cache the compressed data if the
same data is going to be sent from the server several times.  This
reduces CPU overhead on the server (and possibly its memory
requirements for caching the data, as well as the amount of data that
needs to be read from the drive), but it is complicated in the context
of a network algorithm and is mostly application-dependent.  This is
research for another day, maybe in the form of a derived-data plugin for
ReiserFS where an application tells the filesystem how to construct the
file, and the filesystem can store the original, the result, or both,
depending on space needs and performance analysis, with copy-on-write
metadata flags when appropriate.

I haven't started coding the adaptive compression algorithm yet, but I
have a general idea about how I am going to implement it.  For the
proof-of-concept, I want to write this using sockets and some basic
library compression algorithms (gzip, bzip2, and maybe a simple MTF +
Adaptive Huffman).  Later variants may work with TCP or other protocols
around that layer.  Any suggestions will be appreciated.


Scott Young






* Re: Can compression at filesystem level improve overall performance?
  2004-03-22 18:00     ` Scott Young
@ 2004-03-22 20:04       ` Hans Reiser
  2004-03-23  3:03         ` Scott Young
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Reiser @ 2004-03-22 20:04 UTC (permalink / raw)
  To: Scott Young; +Cc: reiserfs-list, Edward Shishkin

Scott Young wrote:

>>That's common misconception. :)
>>
>>The goal of compression is to conserve disk bandwidth rather than space.
>>
>>By compressing it is possible to transfer data (== uncompressed data
>>user works with), at a rate higher than raw device bandwidth.
>>    
>>
>
>I will be doing some research on an algorithm that speeds up data
>transfers over a network by adaptively selecting a compression
>algorithm.  It can be applied to filesystem reads and writes too.  When
>the send queue is reasonably full on the server, it starts compressing
>data at the tail of the queue while sending the data at the head of the
>queue.  If the output stream catches up to segment currently being
>compressed, then that segment is sent uncompressed.  If the compressed
>data is not significantly smaller, then the uncompressed data is sent
>instead.  For network applications that are not network interface bound
>(like rsync over a 100mbit connection), the buffer will be empty most of
>the time and therefore little compression would be needed or wanted as
>it would only slow the application down.  Compression is chosen from a
>pool of algorithms and varied depending on the history of buffer
>overflows and under-runs.  Slower, better compression algorithms are
>used when the buffer is mostly full and the compression is observably
>effective.  The idea here is to minimize the time between the client
>requesting the data and having the usable data in a minimal amount of
>time.  This can be seen as a time-verses-amount-of-usable-data-on-client
>graph, and some applications prefer a low latency for the initial stream
>of data (such as a web page) whereas some prefer the time to retrieve a
>very large piece of data (such as scp scott@1.2.3.4/SomeBigDocument.sxw
>/home/scott over a 56k modem).
>
>Adapting this to filesystem concepts, the server can be seen as the
>write process and the client can be seen as the read process. 
>
I don't understand.  Why not view the client as the disk drive and the 
bus as the network?

> The idea
>can be applied to Reiser4 by compressing the overwrite set while the
>journal data is being written, and then compressing the tail of the
>relocate set moving backwards until the write stream catches up to the
>compression.  It could also take into account the estimated
>decompression time when reading the data back, and use it for deciding
>whether the compression ratio is good enough to write the compressed
>data instead of the uncompressed data.
>  
>
I didn't understand the above.

>Another interesting twist would be to cache the compressed data if the
>same data is going to be sent from the server several times.  This
>reduces CPU overhead on the server (and possibly it's memory
>requirements for caching the data, and reduces the amount of data that
>needs to be read from the drive), but it is complicated in the context
>of a network algorithm and is mostly application-dependent.  This is
>research for another day, maybe in the form of a derived-data plugin for
>ReiserFS where an application tells the filesystem how to construct the
>file, and the filesystem can store the original, the result, or both,
>depending on space needs and performance analysis, with copy-on-write
>metadata flags when appropriate.
>  
>
I didn't understand the above.

>I haven't started coding the adaptive compression algorithm yet, but I
>have a general idea about how I am going to implement it.  For the
>proof-of-concept, I want to write this using sockets and some basic
>library compression algorithms (gzip, bzip2, and maybe a simple MTF +
>Adaptive Huffman).  Later variants may work with TCP or other protocols
>around that layer.  Any suggestions will be appreciated.
>  
>
I think we need to use adaptive compression in Reiser4, based on the 
type of file being compressed,  and anyone who finds it interesting to 
develop heuristics for selecting compression strategies is welcome to 
help and join the fun.

>
>Scott Young
>
>
>
>
>
>
>  
>


-- 
Hans



* Re: Can compression at filesystem level improve overall performance?
  2004-03-19 14:25 Can compression at filesystem level improve overall performance? Erik Terpstra
  2004-03-19 16:29 ` Redeeman
  2004-03-19 18:59 ` Hans Reiser
@ 2004-03-23  0:17 ` Miguel
  2 siblings, 0 replies; 22+ messages in thread
From: Miguel @ 2004-03-23  0:17 UTC (permalink / raw)
  To: Erik Terpstra; +Cc: reiserfs-list

On Fri, 19 Mar 2004 15:25:49 +0100
Erik Terpstra <erik@solidcode.net> wrote:

> Hello everyone,
> 
> For the last couple of years I noticed that the performance of most of
> 
> my systems has it's bottleneck in data throughput rather than CPU 
> performance.
> 
> Is it fair to say that today compression at the filesystem level would
> 
> improve overall performance?
> 
> If this is the case, it probably wouldn't be too hard to implement as
> a module in Reiser4?
> 
> Any thoughts?
> 
> --Erik.
> 
There's some publication on this issue:

http://ssrc.cse.ucsc.edu/Papers/ucsc-crl-03-04.pdf (Measuring the
compressibility of metadata and small files for disk/NVRAM hybrid
storage systems) at http://ssrc.cse.ucsc.edu/publications.shtml

I don't know if its statements are correct, but it's a beginning.

-- 
La resistencia es fútil todos seréis asimilados


* Re: Can compression at filesystem level improve overall performance?
  2004-03-22 20:04       ` Hans Reiser
@ 2004-03-23  3:03         ` Scott Young
  2004-03-23 10:59           ` Hans Reiser
  2004-03-29  5:16           ` Tom Vier
  0 siblings, 2 replies; 22+ messages in thread
From: Scott Young @ 2004-03-23  3:03 UTC (permalink / raw)
  To: reiserfs-list

On Mon, 2004-03-22 at 15:04, Hans Reiser wrote:
> Scott Young wrote:
> 
> >>That's common misconception. :)
> >>
> >>The goal of compression is to conserve disk bandwidth rather than space.
> >>
> >>By compressing it is possible to transfer data (== uncompressed data
> >>user works with), at a rate higher than raw device bandwidth.
> >>    
> >>
> >
> >I will be doing some research on an algorithm that speeds up data
> >transfers over a network by adaptively selecting a compression
> >algorithm.  It can be applied to filesystem reads and writes too.  When
> >the send queue is reasonably full on the server, it starts compressing
> >data at the tail of the queue while sending the data at the head of the
> >queue.  If the output stream catches up to segment currently being
> >compressed, then that segment is sent uncompressed.  If the compressed
> >data is not significantly smaller, then the uncompressed data is sent
> >instead.  For network applications that are not network interface bound
> >(like rsync over a 100mbit connection), the buffer will be empty most of
> >the time and therefore little compression would be needed or wanted as
> >it would only slow the application down.  Compression is chosen from a
> >pool of algorithms and varied depending on the history of buffer
> >overflows and under-runs.  Slower, better compression algorithms are
> >used when the buffer is mostly full and the compression is observably
> >effective.  The idea here is to minimize the time between the client
> >requesting the data and having the usable data in a minimal amount of
> >time.  This can be seen as a time-verses-amount-of-usable-data-on-client
> >graph, and some applications prefer a low latency for the initial stream
> >of data (such as a web page) whereas some prefer the time to retrieve a
> >very large piece of data (such as scp scott@1.2.3.4/SomeBigDocument.sxw
> >/home/scott over a 56k modem).
> >
> >Adapting this to filesystem concepts, the server can be seen as the
> >write process and the client can be seen as the read process. 
> >
> I don't understand.  Why not view the client as the disk drive and the 
> bus as the network?

Interesting view.  The "server," as I see it, is the source of the data,
and the client is the ultimate destination.  In the context of a
filesystem, one could see the disk as the ultimate destination. 
However, data on a disk has no use at all until it is read and used by
some application.  Therefore, I see writing to the filesystem as the
server, reading from the filesystem as the client, and the bus and
disk drive together as the network (at least when translating the
concept from a network context to a filesystem context).

> > The idea
> >can be applied to Reiser4 by compressing the overwrite set while the
> >journal data is being written, and then compressing the tail of the
> >relocate set moving backwards until the write stream catches up to the
> >compression.  It could also take into account the estimated
> >decompression time when reading the data back, and use it for deciding
> >whether the compression ratio is good enough to write the compressed
> >data instead of the uncompressed data.
> >  
> >
> I didn't understand the above.

What I'm saying is that you can start writing uncompressed data to the
drive while the yet-to-be-written data is being compressed.  The goal is
to have some segments of data compressed, and have them compressed
before they come up next for writing to the drive.  Virtually no time is
lost if the data cannot be compressed because the data will be sent to
the disk at full speed anyway, whether or not the system had time to
compress it.  The repacker could compress the data that was written to
the drive before it could be compressed if the repacker thinks that
compression would speed up reading of that data (or significantly reduce
space usage, which will generally happen at the same time as data moving
faster because of compression).


> >Another interesting twist would be to cache the compressed data if the
> >same data is going to be sent from the server several times.  This
> >reduces CPU overhead on the server (and possibly it's memory
> >requirements for caching the data, and reduces the amount of data that
> >needs to be read from the drive), but it is complicated in the context
> >of a network algorithm and is mostly application-dependent.  This is
> >research for another day, maybe in the form of a derived-data plugin for
> >ReiserFS where an application tells the filesystem how to construct the
> >file, and the filesystem can store the original, the result, or both,
> >depending on space needs and performance analysis, with copy-on-write
> >metadata flags when appropriate.
> >  
> >
> I didn't understand the above.

Yeah, I realize now what I wrote was an incoherent mess.  I think too
much and write too little sometimes :-)  I was combining the idea for
networking with another different idea for ReiserFS, and that
combination came out unclear.

As an example, take a web server that has HTTP compression enabled. 
Instead of compressing the data once per request, simply store the
compressed version and send that when the next request occurs.  The
server isn't compressing data all the time, so the CPU can be used for
other tasks.  To make life easier on the web-server developers, the
filesystem could have an interface that allows for defining a file as a
derivative of another with some transformation done to that file, such
as a compression algorithm.  It'd be like telling the filesystem that
"File B is what happens when you do X to file A."  The web server
developer would not have to worry about storing the compressed file to
increase performance because it would be handled by the filesystem. 
Furthermore, the web server developer would not have to use the
compression libraries directly, making their job much easier.
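
At the application level, the "compress once, serve many times" idea from the web server example might look like the sketch below. The class and its (name, version) key are hypothetical; a filesystem-level version would hide this behind the derived-file interface described above rather than in application code.

```python
import gzip


class CompressedCache:
    """Keep one gzip-compressed copy per (name, version) pair."""

    def __init__(self):
        self._cache = {}
        self.compressions = 0  # how many times we actually ran the compressor

    def get(self, name, version, content):
        """Return the compressed body, compressing only on the first request."""
        key = (name, version)
        if key not in self._cache:
            self._cache[key] = gzip.compress(content)
            self.compressions += 1
        return self._cache[key]


cache = CompressedCache()
page = b"<html><body>" + b"hello " * 500 + b"</body></html>"

first = cache.get("index.html", 1, page)
second = cache.get("index.html", 1, page)       # served from the cache
updated = cache.get("index.html", 2, page + b"<!-- edited -->")

print(cache.compressions)  # 2: once per version, not once per request
```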

There could be a "live" derivative, where any change to the original
file reflects in the derivative file, or they could be constant, where
any change in the original does not reflect in the derivative.  When I
say "constant" I mean that the source file can be thought of as being
constant because any change to the original will not be reflected in the
derivative.  Think of the simplest case: a derivative file with the
identity transform applied to the original file, or simply stated,
copying a file.  A "live" derivative with the identity transform would
act like a hard link, and a constant derivative with the identity
transform would act like copying the entire file and storing it
separately (Underneath the covers, the transform could just link the
derived file to the original with copy-on-write enabled, thereby
improving performance).  This becomes complicated depending on what
files are actually stored and what is the fastest way to arrange
things.  The derivative could simply be calculated from the original,
and space can be saved by not storing the derivative on disk, or CPU
time could be saved by storing it on disk (e.g. an MD5 derived-file). 
Sometimes the derivative may load faster if it is calculated from the
original (e.g. if /dev/urandom is implemented as a derived stream from
some seed file.  In this case the file size could be infinite too, so
storing it would not be prudent).  If the derivative file is not stored
on the disk, and it is marked as a constant derivative, then the
original file would have to use copy-on-write so that the constant
derivative file isn't altered.
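
The "live" versus "constant" distinction can be modelled in a few lines. This is only an in-memory sketch with hypothetical class names; holding a reference to the immutable original bytes stands in for the copy-on-write behaviour described above.

```python
import hashlib


class SourceFile:
    """A plain file whose contents can be replaced."""

    def __init__(self, data):
        self.data = data


class Derived:
    """A file defined as transform(source).

    live=True  -- computed from the source's current contents on every read.
    live=False -- bound to the source contents as they were at creation
                  time (a snapshot, standing in for copy-on-write).
    """

    def __init__(self, source, transform, live):
        self._source = source
        self._transform = transform
        self._live = live
        # bytes objects are immutable, so keeping a reference is a safe snapshot
        self._snapshot = None if live else source.data

    def read(self):
        base = self._source.data if self._live else self._snapshot
        return self._transform(base)


def identity(data):
    return data


def md5_hex(data):
    return hashlib.md5(data).hexdigest().encode("ascii")


original = SourceFile(b"version 1")
hard_link_like = Derived(original, identity, live=True)
copy_like = Derived(original, identity, live=False)
checksum = Derived(original, md5_hex, live=True)

original.data = b"version 2"
print(hard_link_like.read())  # b'version 2'
print(copy_like.read())       # b'version 1'
```

The identity transform reproduces the hard-link and copy cases from the paragraph above, and `checksum` shows a derivative that is cheaper to recompute than to store.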

Your idea of /etc/.passwd as the combination of small per-user password
files could be implemented using derived files.  They would need some
way of moving the writes to the combined file back to the original small
files, but that could be a part of the derived-data plugin for combined
files.

As another example, gcc could create a "live" derived file a.out
that means "a.out is the live derivative of files src1.c, src2.c,
src3.c, src4.c when applied using the command gcc -o a.out src1.c src2.c
src3.c src4.c".  Of course it would be encoded in the syntax that the
derived-file plugin would understand.  When the source files change, the
developer wouldn't have to rerun gcc to run the executable.  The file
would automagically be compiled from the sources.

The more I think about it, it seems that derived-files could be
implemented as plugin(s) with only one extra feature added to the core
filesystem: copy-on-write.  (Copy-on-write could be extended to only
write changes, and have the new version be constructed as a derived file
from the original, thereby compactly maintaining multiple versions of a
file and improving write performance, but I digress)

> >I haven't started coding the adaptive compression algorithm yet, but I
> >have a general idea about how I am going to implement it.  For the
> >proof-of-concept, I want to write this using sockets and some basic
> >library compression algorithms (gzip, bzip2, and maybe a simple MTF +
> >Adaptive Huffman).  Later variants may work with TCP or other protocols
> >around that layer.  Any suggestions will be appreciated.
> >  
> >
> I think we need to use adaptive compression in Reiser4, based on the 
> type of file being compressed,  

I think it should be based on how well compression is working during the
actual write.  That way you don't have the overhead of compression when
there is no benefit of compression, and CPU-bound applications will not
run slower because of CPU-intensive compression.  The type of file could
be used, but I'd like to see it as only a suggestion to the adaptive
compression heuristic.
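
A heuristic driven by what is observed during the write, rather than by file type alone, might be shaped like this. The thresholds, the parameter names, and the mapping onto zlib levels are all assumptions made for illustration.

```python
import zlib


def choose_level(queue_fill, recent_ratio, cpu_busy):
    """Pick a zlib level (0 means store uncompressed) from runtime observations.

    queue_fill   -- fraction of the write queue in use, 0.0 to 1.0
    recent_ratio -- compressed/original size seen on recent segments
    cpu_busy     -- fraction of CPU already used by other work
    """
    # Compression is not paying off, or the CPU is the scarce resource.
    if recent_ratio > 0.9 or cpu_busy > 0.9:
        return 0
    # Queue nearly empty: the disk keeps up, so compression would only add latency.
    if queue_fill < 0.25:
        return 0
    # Queue mostly full and compression effective: spend more CPU per byte.
    if queue_fill > 0.75:
        return 9
    return 1  # in between: cheap, fast compression


segment = b"log line: everything is fine\n" * 100
level = choose_level(queue_fill=0.8, recent_ratio=0.3, cpu_busy=0.2)
payload = zlib.compress(segment, level)
print(level, len(segment), len(payload))
```

A file-type hint, as suggested in the quoted text, could be folded in as the initial value of `recent_ratio` before any real measurements exist.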

> and anyone who finds it interesting to 
> develop heuristics for selecting compression strategies is welcome to 
> help and join the fun.

And it shall be fun!

I have many other ideas relating to compression.  In high school I
designed a filesystem that used the ideas in rsync to find identical
segments of data between files, and only write the matching segments
once.  It could barely be called a filesystem (It was written as a
filesystem-within-a-file using Java and I didn't implement deletion or
mounting), but it gave me many ideas and directions to go for making the
general idea work in a real filesystem.  One thing I realized is that an
array of millions or billions of random hashes can be compressed, and a
Trie is a good basis for implementing a compressed data structure for
random hashes (I could further describe the complete data structure I
designed, but that is probably worth writing a coherent peer-reviewed
paper instead of quickly writing an incoherent email on the topic). 
Furthermore, I realized that my filesystem algorithm would significantly
slow down the system, and only be applicable in specific circumstances
(such as when there is a dedicated file server that has plenty of CPU
and RAM to give up).  This leads to the network algorithm for
compressing when compressing is good:  it embodies a general heuristic 
for algorithms that have parts that don't have to be done but can have
positive effects if they are done.  For example, compression does not have
to be done to send data across a network, and an rsync-like signature
search does not need to be done to write data to a filesystem, but using
them will lead to a performance improvement in some circumstances.
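
A much simplified sketch of the write-matching-segments-once idea.  It
only matches blocks on fixed boundaries, whereas rsync finds matches at
any byte offset, and like my old prototype it has no deletion.  The
class name, the 4096-byte block size, and the use of SHA-1 are
assumptions for illustration only.

```python
import hashlib

class DedupStore:
    """Toy store in which each distinct block is kept only once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # digest -> block bytes (one copy per unique block)
        self.files = {}    # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha1(block).digest()
            # Only the first occurrence of a block is actually stored.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())
```

Two files that share a block cost one block of storage for the shared
part, at the price of a hash computation and a lookup on every write.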

If/when I implement these ideas, I will contribute them to ReiserFS. 
I'd have more time to work on this if I didn't have college and a
part-time job, but there's no rush to implement all these features.








^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Can compression at filesystem level improve overall performance?
  2004-03-23  3:03         ` Scott Young
@ 2004-03-23 10:59           ` Hans Reiser
  2004-03-24 16:19             ` Scott Young
  2004-03-29  5:16           ` Tom Vier
  1 sibling, 1 reply; 22+ messages in thread
From: Hans Reiser @ 2004-03-23 10:59 UTC (permalink / raw)
  To: Scott Young; +Cc: reiserfs-list

Scott Young wrote:

>On Mon, 2004-03-22 at 15:04, Hans Reiser wrote:
>  
>
>>Scott Young wrote:
>>
>>    
>>
>>>>That's common misconception. :)
>>>>
>>>>The goal of compression is to conserve disk bandwidth rather than space.
>>>>
>>>>By compressing it is possible to transfer data (== uncompressed data
>>>>user works with), at a rate higher than raw device bandwidth.
>>>>   
>>>>
>>>>        
>>>>
>>>I will be doing some research on an algorithm that speeds up data
>>>transfers over a network by adaptively selecting a compression
>>>algorithm.  It can be applied to filesystem reads and writes too.  When
>>>the send queue is reasonably full on the server, it starts compressing
>>>data at the tail of the queue while sending the data at the head of the
>>>queue.  If the output stream catches up to the segment currently being
>>>compressed, then that segment is sent uncompressed.  If the compressed
>>>data is not significantly smaller, then the uncompressed data is sent
>>>instead.  For network applications that are not network interface bound
>>>(like rsync over a 100mbit connection), the buffer will be empty most of
>>>the time and therefore little compression would be needed or wanted as
>>>it would only slow the application down.  Compression is chosen from a
>>>pool of algorithms and varied depending on the history of buffer
>>>overflows and under-runs.  Slower, better compression algorithms are
>>>used when the buffer is mostly full and the compression is observably
>>>effective.  The idea here is to minimize the time between the client
>>>requesting the data and the client having usable data.  This can be
>>>seen as a time-versus-amount-of-usable-data-on-client graph:  some
>>>applications prefer low latency for the initial stream of data (such
>>>as a web page), whereas others prefer a short total time to retrieve a
>>>very large piece of data (such as scp scott@1.2.3.4:SomeBigDocument.sxw
>>>/home/scott over a 56k modem).
>>>
>>>Adapting this to filesystem concepts, the server can be seen as the
>>>write process and the client can be seen as the read process. 
>>>
>>>      
>>>
>>I don't understand.  Why not view the client as the disk drive and the 
>>bus as the network?
>>    
>>
>
>Interesting view.  The "server," as I see it, is the source of the data,
>and the client is the ultimate destination.  In the context of a
>filesystem, one could see the disk as the ultimate destination. 
>However, data on a disk has no use at all until it is read and used by
>some application.  Therefore, I see writing to the filesystem as a
>server, reading from the filesystem as a client, and the bus and
>disk drive together as the network (at least when translating the
>concept from a network context to a filesystem context).
>
>  
>
>>>The idea
>>>can be applied to Reiser4 by compressing the overwrite set while the
>>>journal data is being written, and then compressing the tail of the
>>>relocate set moving backwards until the write stream catches up to the
>>>compression.  It could also take into account the estimated
>>>decompression time when reading the data back, and use it for deciding
>>>whether the compression ratio is good enough to write the compressed
>>>data instead of the uncompressed data.
>>> 
>>>
>>>      
>>>
>>I didn't understand the above.
>>    
>>
>
>What I'm saying is that you can start writing uncompressed data to the
>drive while the yet-to-be-written data is being compressed.  The goal is
>to have some segments of data compressed, and have them compressed
>before they come up next for writing to the drive.  Virtually no time is
>lost if the data cannot be compressed because the data will be sent to
>the disk at full speed anyway, whether or not the system had time to
>compress it.  The repacker could compress the data that was written to
>the drive before it could be compressed if the repacker thinks that
>compression would speed up reading of that data (or significantly reduce
>space usage, which will generally happen at the same time as data moving
>faster because of compression).
>  
>
Too much complexity.  LZW is fast and effective.

>
>  
>
>>>Another interesting twist would be to cache the compressed data if the
>>>same data is going to be sent from the server several times.  This
>>>reduces CPU overhead on the server (and possibly its memory
>>>requirements for caching the data, and reduces the amount of data that
>>>needs to be read from the drive), but it is complicated in the context
>>>of a network algorithm and is mostly application-dependent.  This is
>>>research for another day, maybe in the form of a derived-data plugin for
>>>ReiserFS where an application tells the filesystem how to construct the
>>>file, and the filesystem can store the original, the result, or both,
>>>depending on space needs and performance analysis, with copy-on-write
>>>metadata flags when appropriate.
>>> 
>>>
>>>      
>>>
>>I didn't understand the above.
>>    
>>
>
>Yeah, I realize now what I wrote was an incoherent mess.  I think too
>much and write too little sometimes :-)  I was combining the idea for
>networking with another different idea for ReiserFS, and that
>combination came out unclear.
>
>As an example, take a web server that has HTTP compression enabled. 
>Instead of compressing the data once per request, simply store the
>compressed version and send that when the next request occurs.  The
>server isn't compressing data all the time, so the CPU can be used for
>other tasks.  To make life easier on the web-server developers, the
>filesystem could have an interface that allows for defining a file as a
>derivative of another with some transformation done to that file, such
>as a compression algorithm.  It'd be like telling the filesystem that
>"File B is what happens when you do X to file A."  The web server
>developer would not have to worry about storing the compressed file to
>increase performance because it would be handled by the filesystem. 
>Furthermore, the web server developer would not have to use the
>compression libraries directly, making their job much easier.
>
>There could be a "live" derivative, where any change to the original
>file reflects in the derivative file, or they could be constant, where
>any change in the original does not reflect in the derivative.  When I
>say "constant" I mean that the source file can be thought of as being
>constant because any change to the original will not be reflected in the
>derivative.  Think of the simplest case: a derivative file with the
>identity transform applied to the original file, or simply stated,
>copying a file.  A "live" derivative with the identity transform would
>act like a hard link, and a constant derivative with the identity
>transform would act like copying the entire file and storing it
>separately (Underneath the covers, the transform could just link the
>derived file to the original with copy-on-write enabled, thereby
>improving performance).  This becomes complicated depending on what
>files are actually stored and what is the fastest way to arrange
>things.  The derivative could simply be calculated from the original,
>and space can be saved by not storing the derivative on disk, or CPU
>time could be saved by storing it on disk (e.g. an MD5 derived-file). 
>Sometimes the derivative may load faster if it is calculated from the
>original (e.g. if /dev/urandom is implemented as a derived stream from
>some seed file.  In this case the file size could be infinite too, so
>storing it would not be prudent).  If the derivative file is not stored
>on the disk, and it is marked as a constant derivative, then the
>original file would have to use copy-on-write so that the constant
>derivative file isn't altered.
>
>Your idea of /etc/.passwd as the combination of small per-user password
>files could be implemented using derived files.  They would need some
>way of moving the writes to the combined file back to the original small
>files, but that could be a part of the derived-data plugin for combined
>files.
>
>As another example, gcc could create a "live" derived file a.out that
>means "a.out is the live derivative of files src1.c, src2.c,
>src3.c, src4.c when applied using the command gcc -o a.out src1.c src2.c
>src3.c src4.c".  Of course it would be encoded in the syntax that the
>derived-file plugin would understand.  When the source files change, the
>developer wouldn't have to rerun gcc to run the executable.  The file
>would automagically be compiled from the sources.
>
>The more I think about it, it seems that derived-files could be
>implemented as plugin(s) with only one extra feature added to the core
>filesystem: copy-on-write.  (Copy-on-write could be extended to only
>write changes, and have the new version be constructed as a derived file
>from the original, thereby compactly maintaining multiple versions of a
>file and improving write performance, but I digress)
>  
>
Consider our use of compression atoms; it may reduce the incentive for 
what you describe.

>  
>
>>>I haven't started coding the adaptive compression algorithm yet, but I
>>>have a general idea about how I am going to implement it.  For the
>>>proof-of-concept, I want to write this using sockets and some basic
>>>library compression algorithms (gzip, bzip2, and maybe a simple MTF +
>>>Adaptive Huffman).  Later variants may work with TCP or other protocols
>>>around that layer.  Any suggestions will be appreciated.
>>> 
>>>
>>>      
>>>
>>I think we need to use adaptive compression in Reiser4, based on the 
>>type of file being compressed,  
>>    
>>
>
>I think it should be based on how well compression is working during the
>actual write.  That way you don't have the overhead of compression when
>there is no benefit of compression, and CPU-bound applications will not
>run slower because of CPU-intensive compression.  The type of file could
>be used, but I'd like to see it as only a suggestion to the adaptive
>compression heuristic.
>
>  
>
>>and anyone who finds it interesting to 
>>develop heuristics for selecting compression strategies is welcome to 
>>help and join the fun.
>>    
>>
>
>And it shall be fun!
>
>I have many other ideas relating to compression.  In high school I
>designed a filesystem that used the ideas in rsync to find identical
>segments of data between files, and only write the matching segments
>once.  It could barely be called a filesystem (It was written as a
>filesystem-within-a-file using java and I didn't implement deletion or
>mounting), but it gave me many ideas and directions to go for making the
>general idea work in a real filesystem.  One thing I realized is that an
>array of millions or billions of random hashes can be compressed, and a
>Trie is a good basis for implementing a compressed data structure for
>random hashes (I could further describe the complete data structure I
>designed, but that is probably worth writing a coherent peer-reviewed
>paper instead of quickly writing an incoherent email on the topic). 
>  
>
I think ATT and MS have done work in this area.  It is an interesting 
topic, but you need to worry (deeply) about adding additional seeks.

>Furthermore, I realized that my filesystem algorithm would significantly
>slow down the system, and only be applicable in specific circumstances
>(such as when there is a dedicated file server that has plenty of CPU
>and RAM to give up).  This leads to the network algorithm for
>compressing when compressing is good:  it embodies a general heuristic 
>for algorithms that have parts that don't have to be done but can have
>positive effects if they are done.  For example, compression does not have
>to be done to send data across a network, and an rsync-like signature
>search does not need to be done to write data to a filesystem, but using
>them will lead to a performance improvement in some circumstances.
>
>If/when I implement these ideas, I will contribute them to ReiserFS. 
>I'd have more time to work on this if I didn't have college and a
>part-time job, but there's no rush to implement all these features.
>


-- 
Hans


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Can compression at filesystem level improve overall performance?
  2004-03-23 10:59           ` Hans Reiser
@ 2004-03-24 16:19             ` Scott Young
  2004-03-29  5:25               ` Tom Vier
  0 siblings, 1 reply; 22+ messages in thread
From: Scott Young @ 2004-03-24 16:19 UTC (permalink / raw)
  To: reiserfs-list

Oops... I forgot to use "reply to list" instead of "reply to sender."



> >What I'm saying is that you can start writing uncompressed data to the
> >drive while the yet-to-be-written data is being compressed.  The goal is
> >to have some segments of data compressed, and have them compressed
> >before they come up next for writing to the drive.  Virtually no time is
> >lost if the data cannot be compressed because the data will be sent to
> >the disk at full speed anyway, whether or not the system had time to
> >compress it.  The repacker could compress the data that was written to
> >the drive before it could be compressed if the repacker thinks that
> >compression would speed up reading of that data (or significantly reduce
> >space usage, which will generally happen at the same time as data moving
> >faster because of compression).
> >  
> >
> Too much complexity.  LZW is fast and effective.

If you apply LZW when the data is not compressible, or when running a
CPU bound application, then you would slow the system down.  My approach
tries to eliminate that effect (at least during writes) by changing the
compression algorithm (possibly to no compression) depending on how the
compression is doing in relation to the drive writes.  Set up a race
between the compression and the drive write, and the data will always be
written at an optimal speed, whether the data is coming from an IO or
CPU intensive application.
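
A deterministic toy simulation of that race, in Python.  One segment is
written per tick from the head of the queue while a compressor works
backwards from the tail; `race_write` and `compress_ticks` are names
invented here, and a real implementation would use actual threads and
real I/O rather than a tick counter.

```python
import zlib

def race_write(segments, compress_ticks=2):
    """Simulate the writer/compressor race over a queue of segments.

    Returns (index, is_compressed, payload) tuples in write order.
    A segment is written compressed only if the compressor finished it
    before the writer reached it and the result is actually smaller.
    """
    n = len(segments)
    ready = {}       # index -> compressed bytes finished in time
    written = []
    target = n - 1   # segment the compressor is currently working on
    progress = 0     # ticks already spent on the current target
    for head in range(n):            # each iteration is one write tick
        packed = ready.get(head)
        if packed is not None and len(packed) < len(segments[head]):
            written.append((head, True, packed))
        else:
            written.append((head, False, segments[head]))
        # The compressor only works on segments the writer has not reached.
        if target > head:
            progress += 1
            if progress == compress_ticks:
                ready[target] = zlib.compress(segments[target])
                target -= 1
                progress = 0
    return written
```

When the compressor is slow relative to the drive (a large
`compress_ticks`), everything simply goes out uncompressed at full
speed, which is the point of the race.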


> >As an example, take a web server that has HTTP compression enabled. 
> >Instead of compressing the data once per request, simply store the
> >compressed version and send that when the next request occurs.  The
> >server isn't compressing data all the time, so the CPU can be used for
> >other tasks.  To make life easier on the web-server developers, the
> >filesystem could have an interface that allows for defining a file as a
> >derivative of another with some transformation done to that file, such
> >as a compression algorithm.  It'd be like telling the filesystem that
> >"File B is what happens when you do X to file A."  The web server
> >developer would not have to worry about storing the compressed file to
> >increase performance because it would be handled by the filesystem. 
> >Furthermore, the web server developer would not have to use the
> >compression libraries directly, making their job much easier.
> >
> >There could be a "live" derivative, where any change to the original
> >file reflects in the derivative file, or they could be constant, where
> >any change in the original does not reflect in the derivative.  When I
> >say "constant" I mean that the source file can be thought of as being
> >constant because any change to the original will not be reflected in the
> >derivative.  Think of the simplest case: a derivative file with the
> >identity transform applied to the original file, or simply stated,
> >copying a file.  A "live" derivative with the identity transform would
> >act like a hard link, and a constant derivative with the identity
> >transform would act like copying the entire file and storing it
> >separately (Underneath the covers, the transform could just link the
> >derived file to the original with copy-on-write enabled, thereby
> >improving performance).  This becomes complicated depending on what
> >files are actually stored and what is the fastest way to arrange
> >things.  The derivative could simply be calculated from the original,
> >and space can be saved by not storing the derivative on disk, or CPU
> >time could be saved by storing it on disk (e.g. an MD5 derived-file). 
> >Sometimes the derivative may load faster if it is calculated from the
> >original (e.g. if /dev/urandom is implemented as a derived stream from
> >some seed file.  In this case the file size could be infinite too, so
> >storing it would not be prudent).  If the derivative file is not stored
> >on the disk, and it is marked as a constant derivative, then the
> >original file would have to use copy-on-write so that the constant
> >derivative file isn't altered.
> >
> >Your idea of /etc/.passwd as the combination of small per-user password
> >files could be implemented using derived files.  They would need some
> >way of moving the writes to the combined file back to the original small
> >files, but that could be a part of the derived-data plugin for combined
> >files.
> >
> >As another example, gcc could create a "live" derived file a.out that
> >means "a.out is the live derivative of files src1.c, src2.c,
> >src3.c, src4.c when applied using the command gcc -o a.out src1.c src2.c
> >src3.c src4.c".  Of course it would be encoded in the syntax that the
> >derived-file plugin would understand.  When the source files change, the
> >developer wouldn't have to rerun gcc to run the executable.  The file
> >would automagically be compiled from the sources.
> >
> >The more I think about it, it seems that derived-files could be
> >implemented as plugin(s) with only one extra feature added to the core
> >filesystem: copy-on-write.  (Copy-on-write could be extended to only
> >write changes, and have the new version be constructed as a derived file
> >from the original, thereby compactly maintaining multiple versions of a
> >file and improving write performance, but I digress)
> >  
> >
> Consider our use of compression atoms; it may reduce the incentive for 
> what you describe.

I'm not exactly sure about what you mean by a compression atom, but what
I describe goes beyond mere compression.  It would allow the application
developers to use, as an example, compressed versions of a file and not
have to worry about how, when, and if the compressed version of a file
is stored.  All they would have to say is "I want a compressed version
of this file," and the filesystem would efficiently deliver that
request, possibly storing the result to speed up future identical
requests.  Derived data is mostly related to the semantic layer while
the compress-as-you-write idea is completely related to the storage
layer.
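
To make "I want a compressed version of this file" concrete, here is an
in-memory sketch of a "live" derived file.  The `DerivedStore` class and
its methods are hypothetical and exist only for this example; a real
plugin would also decide whether to keep the result on disk.

```python
import zlib

class DerivedStore:
    """Toy model of 'file B is what happens when you do X to file A'."""

    def __init__(self):
        self.files = {}     # name -> (version counter, bytes)
        self.derived = {}   # derived name -> (source name, transform)
        self.cache = {}     # derived name -> (source version, result)
        self.calls = 0      # how many times a transform actually ran

    def write(self, name, data):
        version = self.files.get(name, (0, b""))[0] + 1
        self.files[name] = (version, data)

    def derive(self, name, source, transform):
        self.derived[name] = (source, transform)

    def read(self, name):
        if name not in self.derived:
            return self.files[name][1]
        source, transform = self.derived[name]
        version, data = self.files[source]
        cached = self.cache.get(name)
        # Recompute only when the source changed since the cached result.
        if cached is None or cached[0] != version:
            self.calls += 1
            cached = (version, transform(data))
            self.cache[name] = cached
        return cached[1]
```

The application would only call something like
`store.derive("page.html.gz", "page.html", zlib.compress)` once and then
read `page.html.gz` as an ordinary file.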


> >I have many other ideas relating to compression.  In high school I
> >designed a filesystem that used the ideas in rsync to find identical
> >segments of data between files, and only write the matching segments
> >once.  It could barely be called a filesystem (It was written as a
> >filesystem-within-a-file using java and I didn't implement deletion or
> >mounting), but it gave me many ideas and directions to go for making the
> >general idea work in a real filesystem.  One thing I realized is that an
> >array of millions or billions of random hashes can be compressed, and a
> >Trie is a good basis for implementing a compressed data structure for
> >random hashes (I could further describe the complete data structure I
> >designed, but that is probably worth writing a coherent peer-reviewed
> >paper instead of quickly writing an incoherent email on the topic). 
> >  
> >
> I think ATT and MS have done work in this area.  It is an interesting 
> topic, but you need to worry (deeply) about adding additional seeks.

I believe MS calls it Shadow Copy, and all it does is write what was
changed in a file and keep the old versions.  Using rsync for finding
matching data is a bit more effective at reducing space usage, and
storing the signatures on all of the files on the filesystem leads to
yet another algorithm I have devised: a network algorithm that can
quickly find matching blocks of data between a file being downloaded and
any file on the filesystem (like rsync, but without the need to specify
source files), AND it can easily be layered on top of other systems like
BitTorrent or Freenet.
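
For reference, a sketch of the rsync-style signature search this builds
on: a cheap rolling checksum slides over the incoming data one byte at a
time, and a strong hash confirms each candidate.  The helper names and
the choice of SHA-256 as the strong hash are assumptions for this
example, and all known blocks are assumed to have the same size.

```python
import hashlib

M = 1 << 16

def weak_checksum(block):
    """Two-part weak checksum in the style of rsync."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, old, new, size):
    """Slide the window one byte: drop `old`, append `new`."""
    a = (a - old + new) % M
    b = (b - size * old + a) % M
    return a, b

def build_index(blocks):
    """Map weak checksum -> set of strong digests of the known blocks."""
    index = {}
    for block in blocks:
        index.setdefault(weak_checksum(block), set()).add(
            hashlib.sha256(block).digest())
    return index

def find_matches(data, index, size):
    """Return every offset in `data` where a known block occurs."""
    matches = []
    if len(data) < size:
        return matches
    a, b = weak_checksum(data[:size])
    pos = 0
    while True:
        candidates = index.get((a, b))
        if candidates and hashlib.sha256(data[pos:pos + size]).digest() in candidates:
            matches.append(pos)
        if pos + size >= len(data):
            break
        a, b = roll(a, b, data[pos], data[pos + size], size)
        pos += 1
    return matches
```

With signatures for every file on the filesystem in `index`, the same
loop works without having to name a source file up front.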

And yes, additional seeks are a major problem.  One solution to this
would be to do a complete copy-on-write when modifying a file (or at
least copy enough to minimize seeks), and have the repacker encode the
old version of the file using the signature-search technique.  There is
no need for additional seeks in the most recent version of the file, and
the old versions are stored in a compact manner, especially if that
particular file is large with few changes.  Of course, some performance
is sacrificed by copying the file on write, but much of this loss can be
overcome, and the result is the world's fastest versioning filesystem. 
Additional metadata could identify the program and the user that made
that specific version of a file, and any malicious changes can be
selectively rolled back.  An administrator may use the metadata to audit
the changes a program made to the filesystem, or a program could be run in
a jail where the changes it makes to the filesystem are only visible to
itself.  Versioning opens up a plethora of possibilities for data
security.
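
A small sketch of the bookkeeping side of this: every write appends a
complete new version tagged with the user and program that made it, so
changes can be audited and rolled back.  `VersionedFile` and its methods
are invented for illustration, and the compact encoding of old versions
by the repacker is deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class Version:
    data: bytes
    user: str
    program: str

class VersionedFile:
    """Toy copy-on-write file that never overwrites an old version."""

    def __init__(self):
        self.versions = []

    def write(self, data, user, program):
        # Copy-on-write: append a complete new version.
        self.versions.append(Version(data, user, program))

    def read(self):
        return self.versions[-1].data

    def audit(self, program):
        """Indexes of all versions written by `program`."""
        return [i for i, v in enumerate(self.versions) if v.program == program]

    def roll_back(self, program, user="root"):
        """Restore the newest version that `program` did not write.

        The restored contents are appended as a new version, so the
        history, including the unwanted changes, stays available.
        """
        for v in reversed(self.versions):
            if v.program != program:
                self.write(v.data, user, "rollback")
                return
        raise LookupError("no version written by another program")
```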

Looking at the grid of security Hans has previously described (files x
users x programs), instead of having the value at each point be rw, r-,
-w, or --, versioning would add in rv and -v, where the touched files
are written using versioning control and possibly rj and -j for a
changes-are-only-visible-to-the-program-itself-or-admins-who-want-to-
see-those-changes jail.
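
One possible encoding of that extended grid, purely as an illustration.
The `AccessGrid` class and the policy names are assumptions; nothing
here reflects an actual ReiserFS interface.

```python
# Second character of a mode selects how writes are handled.
WRITE_POLICY = {"w": "in place", "v": "versioned", "j": "jailed", "-": "denied"}

class AccessGrid:
    """Sparse (file, user, program) -> mode table with a default mode."""

    def __init__(self, default="--"):
        self.default = default
        self.cells = {}  # (file, user, program) -> two-character mode

    def grant(self, file, user, program, mode):
        if len(mode) != 2 or mode[0] not in "r-" or mode[1] not in WRITE_POLICY:
            raise ValueError(f"bad mode: {mode!r}")
        self.cells[(file, user, program)] = mode

    def can_read(self, file, user, program):
        return self.cells.get((file, user, program), self.default)[0] == "r"

    def write_policy(self, file, user, program):
        mode = self.cells.get((file, user, program), self.default)
        return WRITE_POLICY[mode[1]]
```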






^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Can compression at filesystem level improve overall performance?
  2004-03-23  3:03         ` Scott Young
  2004-03-23 10:59           ` Hans Reiser
@ 2004-03-29  5:16           ` Tom Vier
  2004-03-30  3:34             ` Scott Young
  1 sibling, 1 reply; 22+ messages in thread
From: Tom Vier @ 2004-03-29  5:16 UTC (permalink / raw)
  To: reiserfs-list

On Mon, Mar 22, 2004 at 10:03:36PM -0500, Scott Young wrote:
> What I'm saying is that you can start writing uncompressed data to the
> drive while the yet-to-be-written data is being compressed.  The goal is

i like your race idea. one problem is all writes now take significant cpu
(depending on the method). i'm wondering about a daemon (in userspace)
that scans for files that are worth compressing. that's what disk doubler
used to do. i'd be interested to see some benchmarks, to see if compression
has much effect on total throughput.

> As an example, take a web server that has HTTP compression enabled. 
> Instead of compressing the data once per request, simply store the
> compressed version and send that when the next request occurs.  The

the server should cache the compressed versions in files. i'm pretty sure
apache does.

> There could be a "live" derivative, where any change to the original
> file reflects in the derivative file, or they could be constant, where
> any change in the original does not reflect in the derivative.  When I
<snip>

i dunno about doing all this in the kernel. as it is, i wish the kernel
wasn't so complicated. 8) what about a preload lib that puts everything
through bitkeeper or cvs?

> developer wouldn't have to rerun gcc to run the executable.  The file
> would automagically be compiled from the sources.

ide's can do this. and do you really want to have it run every time you
save? i save often, many times when i know there are going to be problems
cuz i haven't finished what i'm working on. a "lint" button would be better,
imho.

> The more I think about it, it seems that derived-files could be
> implemented as plugin(s) with only one extra feature added to the core
> filesystem: copy-on-write.  (Copy-on-write could be extended to only
> write changes, and have the new version be constructed as a derived file
> from the original, thereby compactly maintaining multiple versions of a
> file and improving write performance, but I digress)

cow is something that i've wanted to have ever since using hard links for
kernel trees (cp -al).

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Can compression at filesystem level improve overall performance?
  2004-03-24 16:19             ` Scott Young
@ 2004-03-29  5:25               ` Tom Vier
  0 siblings, 0 replies; 22+ messages in thread
From: Tom Vier @ 2004-03-29  5:25 UTC (permalink / raw)
  To: reiserfs-list

On Wed, Mar 24, 2004 at 11:19:23AM -0500, Scott Young wrote:
> Additional metadata could identify the program and the user that made
> that specific version of a file, and any malicious changes can be
> selectively rolled back.  An administrator may use the metadata to audit
> the changes a file made to the filesystem, or a program could be run in
> a jail where the changes it makes to the filesystem are only visible to
> itself.  Versioning opens up a plethora of possibilities for data
> security.

or you could just chattr +i, and avoid changes in the first place. 8)

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Can compression at filesystem level improve overall performance?
  2004-03-29  5:16           ` Tom Vier
@ 2004-03-30  3:34             ` Scott Young
  2004-03-30  4:53               ` Tom Vier
  0 siblings, 1 reply; 22+ messages in thread
From: Scott Young @ 2004-03-30  3:34 UTC (permalink / raw)
  To: reiserfs-list

On Mon, 2004-03-29 at 00:16, Tom Vier wrote:
> On Mon, Mar 22, 2004 at 10:03:36PM -0500, Scott Young wrote:
> > What I'm saying is that you can start writing uncompressed data to the
> > drive while the yet-to-be-written data is being compressed.  The goal is
> 
> i like your race idea. one problem is all writes now take significant cpu
> (depending on the method). 

The neat part about it is that the CPU will be used, but in a way that
it is generally not a problem.  It shouldn't slow down cpu-bound
applications because in that case the buffer will be mostly empty while
it computes.  However, an app writing compressible data might slow down
another CPU-intensive one.  I have realized that the compression should
be done in a separate thread (because it would otherwise block as it's
running), and all writes could be compressed using the same low-priority
thread (or threads, since there would be 1 thread per processor).
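
A userspace sketch of the one-worker-per-processor arrangement, using a
thread pool.  `compress_all` is a made-up name, and thread priority is
not modelled here at all, since Python's standard library offers no
portable way to lower it.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_all(chunks, workers=None):
    """Compress chunks on a shared pool, one worker per processor by default.

    Results come back in the same order as the input chunks.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))
```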

> i'm wondering about a daemon (in userspace)
> that scans for files that are worth compressing. that's what disk doubler
> used to do. i'd be interested to see some benchmarks, to see if compression
> has much effect on total throughput.

This is what the repacker is for.  It squeezes slums together to
increase fanout (and therefore decrease number of seeks), and eventually
some other compression methods may be implemented to speed up total
throughput.


> > As an example, take a web server that has HTTP compression enabled. 
> > Instead of compressing the data once per request, simply store the
> > compressed version and send that when the next request occurs.  The
> 
> the server should cache the compressed versions in files. i'm pretty sure
> apache does.

It probably does, but it's not particularly easy for the webserver
developers.  How do they know if the uncompressed version of a file has
changed?  It's much more complicated to do this in an application that
can not easily detect file changes.  

> 
> > There could be a "live" derivative, where any change to the original
> > file reflects in the derivative file, or they could be constant, where
> > any change in the original does not reflect in the derivative.  When I
> <snip>
> 
> i dunno about doing all this in the kernel. as it is, i wish the kernel
> wasn't so complicated. 8) what about a preload lib that puts everything
> through bitkeeper or cvs?

I don't quite understand what you mean about putting things through
bitkeeper or cvs, but it would probably be good to implement (at least
part of) this in VFS, as a replacement for VFS, as a new layer
underneath VFS, or as a module in VFS.  Why complicate the filesystem
code (except to maybe send higher-level instructions to the FS so the FS
can optimize its execution of those instructions, like sending the
instruction "copy A to B" to the FS so the FS can do copy-on-write
instead of having to take a bunch of "read this from A" and "write this
to B" instructions that have the same effect)?

> > developer wouldn't have to rerun gcc to run the executable.  The file
> > would automagically be compiled from the sources.
> 
> ide's can do this. and do you really want to have it run every time you
> save? i save often, many times when i know there are going to be problems
> cuz i haven't finished what i'm working on. 

No, it would compile when you access the executable and it realizes that
its source files have changed, and there could be a userspace program
(running at nice level 20) that propagates recent changes.

There would also be no need for a makefile, since all of that
information would be stored in the instructions to create the derived
executable.  It would also be trivial to generate the makefile by
running a program that uses the stored instructions in the derived
executable to generate the makefile.  Wouldn't it be nice to have your
makefile generated as you compile the code?  It would be good if the
system could store and express things that are somewhat evident from the
user's actions.
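
As a sketch of how little information is needed, here is a hypothetical
record for a derived executable, a staleness check that would run when
the executable is accessed, and the makefile rule generated from the
same record.  The `DerivedRecord` type and both functions are invented
for this example; modification times are passed in as a plain dict to
keep it self-contained.

```python
from dataclasses import dataclass

@dataclass
class DerivedRecord:
    target: str
    sources: list
    command: str

def is_stale(record, mtimes):
    """True if the target is missing or older than any of its sources.

    `mtimes` maps file name -> modification time.
    """
    if record.target not in mtimes:
        return True
    return any(mtimes[s] > mtimes[record.target] for s in record.sources)

def to_makefile(record):
    """Render the stored instructions as a single makefile rule."""
    return f"{record.target}: {' '.join(record.sources)}\n\t{record.command}\n"
```

For the a.out example, `to_makefile` yields a rule whose prerequisites
are the four source files and whose recipe is the stored gcc command.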

> a "lint" button would be better, imho.

A programming language that is designed well and doesn't need things
like lint would be better yet, but that is a bit off-topic.  If the
systems designers made it easier to make applications, then the
applications designers would have more time to create buttons like that.






^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Can compression at filesystem level improve overall performance?
  2004-03-30  3:34             ` Scott Young
@ 2004-03-30  4:53               ` Tom Vier
  2004-03-31  4:51                 ` Scott Young
  2004-04-08 11:47                 ` Stewart Smith
  0 siblings, 2 replies; 22+ messages in thread
From: Tom Vier @ 2004-03-30  4:53 UTC (permalink / raw)
  To: Scott Young; +Cc: reiserfs-list

On Mon, Mar 29, 2004 at 10:34:36PM -0500, Scott Young wrote:
> This is what the repacker is for.  It squeezes slums together to
> increase fanout (and therefore decrease number of seeks), and eventually
> some other compression methods may be implemented to speed up total
> throughput.

a little off topic, but
an online defragger is an interesting idea. i think i remember the topic
coming up for ext2 a long time ago. iirc, reiserfs can lose performance over
time (usage, actually), too.

> It probably does, but it's not particularly easy for the webserver

why not? it just sends the gzip file (or runs gzip, if there isn't a cached
.gz version). i've noticed that when i use lynx now, it often says it's
using ~/tmp/*.gz, so i think many sites are now using it. dynamic content is
another issue, but that's just an issue of what's the cheapest way to get
the data to the user, buy more throughput or buy more cpu.

> developers.  How do they know if the uncompressed version of a file has
> changed?  It's much more complicated to do this in an application that
> can not easily detect file changes.  

mtime. 8)

> > i dunno about doing all this in the kernel. as it is, i wish the kernel
> > wasn't so complicated. 8) what about a preload lib that puts everything
> > through bitkeeper or cvs?
> 
> I don't quite understand what you mean about putting things through
> bitkeeper or cvs, but it would probably be good to implement (at least

i meant all the versioning stuff. bitkeeper seems fairly complicated, and i
wouldn't want that in the kernel.

> part of) this in VFS, as a replacement for VFS, as a new layer
> underneath VFS, or as a module in VFS.  Why complicate the filesystem
> code (except to maybe send higher-level instructions to the FS so the FS
> can optimize its execution of those instructions, like sending the

i think someone recently announced (or proposed) a user mode fs interface.
that would be nice for things like this (if you insist on putting it all in
the fs 8).

> instruction "copy A to B" to the FS so the FS can do copy-on-write
> instead of having to take a bunch of "read this from A" and "write this
> to B" instructions that have the same effect)?

yeah, cow would be neat.
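
To make the "copy A to B" case concrete, here is a toy copy-on-write
block store in Python.  It is a sketch under assumptions, not a
description of how Reiser4 or any real filesystem does it: the CowStore
class, its method names, and the per-block reference counts are all
made up for illustration, and copy() assumes the destination name does
not exist yet.

```python
class CowStore:
    """Toy block store: files are lists of block ids, blocks are refcounted."""

    def __init__(self):
        self.blocks = {}   # block id -> bytes
        self.refs = {}     # block id -> number of files referencing it
        self.files = {}    # file name -> list of block ids
        self._next_id = 0

    def _alloc(self, data):
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        return bid

    def create(self, name, chunks):
        self.files[name] = [self._alloc(c) for c in chunks]

    def copy(self, src, dst):
        """'copy A to B': share every block and copy no data."""
        ids = list(self.files[src])
        for bid in ids:
            self.refs[bid] += 1
        self.files[dst] = ids

    def write(self, name, index, data):
        """Overwrite one block; a shared block is replaced, never modified in place."""
        bid = self.files[name][index]
        if self.refs[bid] > 1:
            self.refs[bid] -= 1
            self.files[name][index] = self._alloc(data)
        else:
            self.blocks[bid] = data

    def read(self, name):
        return b"".join(self.blocks[b] for b in self.files[name])


store = CowStore()
store.create("A", [b"hello ", b"world"])
store.copy("A", "B")                  # no new blocks are allocated here
blocks_after_copy = len(store.blocks)
store.write("B", 1, b"there")         # only now does B get a block of its own
```

With a stream of separate "read this from A" and "write this to B"
requests the store could not know the two files are related, which is
why the higher-level instruction matters.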

> > ide's can do this. and do you really want to have it run every time you
> > save? i save often, many times when i know there are going to be problems
> > cuz i haven't finished what i'm working on. 
> 
> No, it would compile when you access the executable and it realizes that
> its source files have changed, and there could be a userspace program
> (running at nice level 20) that propagates recent changes.

you could just alias cmd="make && cmd".

> There would also be no need for a makefile, since all of that
> information would be stored in the instructions to create the derived

but what format would they be in? how would you describe all the
dependencies? you'd end up with something just like a makefile. now you're
moving make into the kernel.

> executable.  It would also be trivial to generate the makefile by
> running a program that uses the stored instructions in the derived
> executable to generate the makefile.  Wouldn't it be nice to have your
> makefile generated as you compile the code?  It would be good if the
> system could store and express things that are somewhat evident from the
> user's actions.

ide's can generate makefiles, i think (i only use emacs).

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE


* Re: Can compression at filesystem level improve overall performance?
  2004-03-30  4:53               ` Tom Vier
@ 2004-03-31  4:51                 ` Scott Young
  2004-04-08 21:46                   ` Tom Vier
  2004-04-08 11:47                 ` Stewart Smith
  1 sibling, 1 reply; 22+ messages in thread
From: Scott Young @ 2004-03-31  4:51 UTC (permalink / raw)
  To: reiserfs-list



> > It probably does, but it's not particularly easy for the webserver
> 
> why not? it just sends the gzip file (or runs gzip, if there isn't a cached
> .gz version). i've noticed that when i use lynx now, it often says it's
> using ~/tmp/*.gz, so i think many sites are now using it. dynamic content is
> another issue, but that's just an issue of what's the cheapest way to get
> the data to the user, buy more throughput or buy more cpu.

If you make things easy and efficient for the webserver developers, then
they can focus their efforts on other optimizations.

> > developers.  How do they know if the uncompressed version of a file has
> > changed?  It's much more complicated to do this in an application that
> > can not easily detect file changes.  
> 
> mtime. 8)

So now they need to somehow store the mtime of the original file with
the gzipped file.  It does not seem so simple anymore, since you have to
deal with storing an mtime in a file, setting up permissions on this
file, not to mention that you're wasting space and you have to deal with
different scalability issues when working with millions or billions of
these mtime data entries and storing both the compressed and original
versions of the file.  The webserver developers should be able to tell
the FS (or some FS interface like VFS) that "B is a compressed version
of A, give me B" just as easily as they can tell the FS that "B is a
copy of A, give me B."


> > > i dunno about doing all this in the kernel. as it is, i wish the kernel
> > > wasn't so complicated. 8) what about a preload lib that puts everything
> > > through bitkeeper or cvs?
> > 
> > I don't quite understand what you mean about putting things through
> > bitkeeper or cvs, but it would probably be good to implement (at least
> 
> i meant all the versioning stuff. bitkeeper seems fairly complicated, and i
> wouldn't want that in the kernel.

It would probably be some relatively simple code in Reiser4's
storage-layer (for efficiency), and then the complicated stuff could be
done in userspace.  It could look like
/some/file/metas/versions/linear/5 from the FS, or maybe just a linked
list: /some/file/metas/versions/previous/versions/previous.  The
userspace programs would still need to do their own thing with diffing,
patching, figuring out how to show a versioning tree from a linear
versioning view, and transacting with the FS, but they would have this
new tool for efficient file versioning.
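
A rough Python model of the two views mentioned above, the linked list
reached through "previous" and the numbered "linear" view.  The class
names are hypothetical, and the choice that the linear view is numbered
oldest-first starting at 0 is an assumption made for the sketch, not
something specified anywhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One stored version; 'previous' plays the role of .../versions/previous."""
    data: bytes
    previous: Optional["Version"] = None


class VersionedFile:
    def __init__(self, data):
        self.head = Version(data)

    def write(self, data):
        # Every write keeps the old contents reachable through 'previous'.
        self.head = Version(data, previous=self.head)

    def linear(self):
        """Oldest-first list of contents, mimicking a .../versions/linear/<n> view."""
        chain = []
        node = self.head
        while node is not None:
            chain.append(node.data)
            node = node.previous
        chain.reverse()
        return chain


f = VersionedFile(b"v0")
f.write(b"v1")
f.write(b"v2")
history = f.linear()
```

The storage layer only has to keep the chain; diffing, patching, and
building a version tree on top of it stay in userspace, as described.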

> > part of) this in VFS, as a replacement for VFS, as a new layer
> > underneath VFS, or as a module in VFS.  Why complicate the filesystem
> > code (except to maybe send higher-level instructions to the FS so the FS
> > can optimize its execution of those instructions, like sending the
> 
> i think someone recently announced (or proposed) a user mode fs interface.
> that would be nice for things like this (if you insist on putting it all in
> the fs 8).

That's what Reiser4 modules are for :)  Now if only VFS was modular....
Could you please post a link to this user mode fs interface?  I'd be
interested in reading about it.

> > > ide's can do this. and do you really want to have it run every time you
> > > save? i save often, many times when i know there are going to be problems
> > > cuz i haven't finished what i'm working on. 
> > 
> > No, it would compile when you access the executable and it realizes that
> > its source files have changed, and there could be a userspace program
> > (running at nice level 20) that propagates recent changes.
> 
> you could just alias cmd="make && cmd".

I do something similar all the time.  However, with this method you
don't get a userspace program that tries to recompile while you're
rewriting, reviewing, and executing your code.  Wouldn't it be nice to
let the computer work more while you wait less?  This alias hack also
lacks information indicating how cmd was generated, so you can't easily
have utilities that take advantage of this information (like a makefile
generator or an automatic recompiler).

> 
> > There would also be no need for a makefile, since all of that
> > information would be stored in the instructions to create the derived
> 
> but what format would they be in? how would you describe all the
> dependencies? you'd end up with something just like a makefile. now you're
> moving make into the kernel.

With modules, Reiser4 is a very extensible filesystem.  The dependencies
could be stored in whatever format is most efficient for storing the
dependencies, and those dependencies could be used in userspace in
whatever complicated way the userspace programs want to use them.  This
is the same as what I'm saying about versioning, but in a slightly
different example.  The FS would just store some more information at the
request of userspace programs.  The FS should provide: 1. a convenient
and powerful way of expressing this information and 2. an efficient way
of storing this information.  Every filesystem so far has failed in
doing this to some extent, and I'm ready for something with more
expressive power.

> > executable.  It would also be trivial to generate the makefile by
> > running a program that uses the stored instructions in the derived
> > executable to generate the makefile.  Wouldn't it be nice to have your
> > makefile generated as you compile the code?  It would be good if the
> > system could store and express things that are somewhat evident from the
> > user's actions.
> 
> ide's can generate makefiles, i think (i only use emacs).

Don't you consider Emacs to be an IDE?  I think it could even be its
own operating system :)

Joking aside, when you compile using emacs, it could store and use the
fact that the resulting executable was generated by running a bunch of
commands you typed in.  With this information, a userspace program could
help generate the makefile.  I don't have much to say about other IDEs
because I haven't used any Unix IDEs and they probably have no problem
generating makefiles.  IDEs can wrap up so much information because
everything is built within them.  This information-wrapping could be
done in other places to achieve the same effects.







* Re: Can compression at filesystem level improve overall performance?
  2004-03-30  4:53               ` Tom Vier
  2004-03-31  4:51                 ` Scott Young
@ 2004-04-08 11:47                 ` Stewart Smith
  1 sibling, 0 replies; 22+ messages in thread
From: Stewart Smith @ 2004-04-08 11:47 UTC (permalink / raw)
  To: Tom Vier; +Cc: Scott Young, reiserfs-list


On Tue, 2004-03-30 at 14:53, Tom Vier wrote:
> an online defragger is an interesting idea. i think i remember the topic
> coming up for ext2 a long time ago. iirc, reiserfs can lose performance over
> time (usage, actually), too.

XFS has this, xfs_fsr (part of xfsdump package on debian, might be
called that on other distros too...)

Although... it's pretty hard to get XFS to fragment in the first place,
which is the best way to do things - but it's a hard way :)

-- 
Stewart Smith (stewart@flamingspork.com)
http://www.flamingspork.com/




* Re: Can compression at filesystem level improve overall performance?
  2004-03-31  4:51                 ` Scott Young
@ 2004-04-08 21:46                   ` Tom Vier
  0 siblings, 0 replies; 22+ messages in thread
From: Tom Vier @ 2004-04-08 21:46 UTC (permalink / raw)
  To: reiserfs-list

On Tue, Mar 30, 2004 at 11:51:18PM -0500, Scott Young wrote:
> If you make things easy and efficient for the webserver developers, then
> they can focus their efforts on other optimizations.

i don't know how apache handles caching compressed static content, but it
can't be that hard. i don't think there's any reason to put that in the
kernel.

> > > developers.  How do they know if the uncompressed version of a file has
> > > changed?  It's much more complicated to do this in an application that
> > > can not easily detect file changes.  
> > 
> > mtime. 8)
> 
> So now they need to somehow store the mtime of the original file with
> the gzipped file.  It does not seem so simple anymore, since you have to

no, just check if the mtime of the .html is > the compressed cached version.
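
A minimal sketch of that check in Python, assuming the cached copy
lives next to the original as "<name>.gz".  The helper names
gz_is_stale and ensure_gz are made up for this example; this is not how
apache or any particular webserver does it.

```python
import gzip
import os
import shutil


def gz_is_stale(src, gz_path):
    """True if there is no cached copy or the source is newer than it."""
    if not os.path.exists(gz_path):
        return True
    return os.stat(src).st_mtime_ns > os.stat(gz_path).st_mtime_ns


def ensure_gz(src, gz_path=None):
    """Return the path of an up-to-date compressed copy, rebuilding it if stale."""
    gz_path = gz_path or src + ".gz"
    if gz_is_stale(src, gz_path):
        with open(src, "rb") as fin, gzip.open(gz_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return gz_path
```

No separate mtime record is stored: the cached file's own mtime serves
as the marker.  One caveat of the strict ">" comparison is that an edit
landing within the same timestamp tick as the compression would go
unnoticed on filesystems with coarse timestamps.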

> deal with storing an mtime in a file, setting up permissions on this
> file, not to mention that you're wasting space and you have to deal with

wasting space how?

> different scalability issues when working with millions or billions of
> these mtime data entries and storing both the compressed and original

that has to be done somewhere (i don't think the kernel is the right place).

> versions of the file.  The webserver developers should be able to tell
> the FS (or some FS interface like VFS) that "B is a compressed version
> of A, give me B" just as easily as they can tell the FS that "B is a
> copy of A, give me B."

> > i meant all the versioning stuff. bitkeeper seems fairly complicated, and i
> > wouldn't want that in the kernel.
> 
> It would probably be some relatively simple code in Reiser4's
> storage-layer (for efficiency), and then the complicated stuff could be
> done in userspace.  It could look like

if it's handing this complexity off to a daemon, then i have less of a
problem with it. i still think apps can (and already do) deal with this
stuff. i just don't think it's useful or necessary. one man's trash is
another's treasure, though.

> /some/file/metas/versions/linear/5 from the FS, or maybe just a linked
> list: /some/file/metas/versions/previous/versions/previous.  The
> userspace programs would still need to do their own thing with diffing,
> patching, figuring out how to show a versioning tree from a linear
> versioning view, and transacting with the FS, but they would have this
> new tool for efficient file versioning.

> > i think someone recently announced (or proposed) a user mode fs interface.
> > that would be nice for things like this (if you insist on putting it all in
> > the fs 8).
> 
> That's what Reiser4 modules are for :)  Now if only VFS was modular....
> Could you please post a link to this user mode fs interface?  I'd be
> interested in reading about it.

i don't have one. i just remember someone mentioning it on l-k.

> I do something similar all the time.  However, with this method you
> don't get a userspace program that tries to recompile while you're
> rewriting, reviewing, and executing your code.  Wouldn't it be nice to
> let the computer work more while you wait less?  This alias hack also
> lacks information indicating how cmd was generated, so you can't easily
> have utilities that take advantage of this information (like a makefile
> generator or an automatic recompiler).

an ide can do all this.

> > > There would also be no need for a makefile, since all of that
> > > information would be stored in the instructions to create the derived
> > 
> > but what format would they be in? how would you describe all the
> > dependencies? you'd end up with something just like a makefile. now you're
> > moving make into the kernel.
> 
> With modules, Reiser4 is a very extensible filesystem.  The dependencies
> could be stored in whatever format is most efficient for storing the
> dependencies, and those dependencies could be used in userspace in
> whatever complicated way the userspace programs want to use them.  This

so now you need an in-kernel 'make', right?

> is the same as what i'm saying about versioning, but in a slightly
> different example.  The FS would just store some more information at the
> request of userspace programs.  The FS should provide: 1. a convenient
> and powerful way of expressing this information and 2. an efficient way
> of storing this information.  Every filesystem so far has failed in
> doing this to some extent, and I'm ready for something with more
> expressive power.

> > ide's can generate makefiles, i think (i only use emacs).
> 
> Don't you consider Emacs to be an IDE?  I think it could even be its
> own operating system :)
> 
> Joking aside, when you compile using emacs, it could store and use the
> fact that the resulting executable was generated by running a bunch of
> commands you typed in.  With this information, a userspace program could
> help generate the makefile.  I don't have much to say about other IDEs
> because I haven't used any Unix IDEs and they probably have no problem
> generating makefiles.  IDEs can wrap up so much information because
> everything is built within them.  This information-wrapping could be
> done in other places to achieve the same effects.

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE


end of thread, other threads:[~2004-04-08 21:46 UTC | newest]

Thread overview: 22+ messages
2004-03-19 14:25 Can compression at filesystem level improve overall performance? Erik Terpstra
2004-03-19 16:29 ` Redeeman
2004-03-19 16:53   ` Nikita Danilov
2004-03-21 14:29     ` Sean Johnson
2004-03-21 23:17       ` Can compression at filesystem level improve overall The Amazing Dragon
2004-03-21 23:23         ` Sean Johnson
2004-03-22  9:14         ` Hans Reiser
2004-03-22  8:01     ` Can compression at filesystem level improve overall performance? Kris Van Bruwaene
2004-03-22 18:00     ` Scott Young
2004-03-22 20:04       ` Hans Reiser
2004-03-23  3:03         ` Scott Young
2004-03-23 10:59           ` Hans Reiser
2004-03-24 16:19             ` Scott Young
2004-03-29  5:25               ` Tom Vier
2004-03-29  5:16           ` Tom Vier
2004-03-30  3:34             ` Scott Young
2004-03-30  4:53               ` Tom Vier
2004-03-31  4:51                 ` Scott Young
2004-04-08 21:46                   ` Tom Vier
2004-04-08 11:47                 ` Stewart Smith
2004-03-19 18:59 ` Hans Reiser
2004-03-23  0:17 ` Miguel
