File system compression, not at the block layer

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* File system compression, not at the block layer
@ 2004-04-23 17:26 Timothy Miller
  2004-04-23 17:30 ` Miquel van Smoorenburg
  2004-04-28  1:00 ` David Lang
  0 siblings, 2 replies; 43+ messages in thread
From: Timothy Miller @ 2004-04-23 17:26 UTC (permalink / raw)
  To: Linux Kernel Mailing List

This is probably just another of my silly "they already thought of that 
and someone is doing exactly this" ideas.

I get the impression that a lot of people interested in doing FS 
compression want to do it at the block layer.  This gets complicated, 
because you need to allocate partial physical blocks.

Well, why not do the compression at the highest layer?

The idea is something akin to changing this (syntax variation intentional):

    tar cf - somefiles* > file

To this:

    tar cf - somefiles* | gzip > file

Except doing it transparently and for all files.

This way, the disk cache is all compressed data, and only decompressed 
as it's read or written by a process.

For files below a certain size, this is obviously pointless, since you 
can't save any space.  But in many cases, this could speed up the I/O 
for large files that are compressable.  (Space is cheap.  The only 
reason to compress is for speed.)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 17:26 File system compression, not at the block layer Timothy Miller
@ 2004-04-23 17:30 ` Miquel van Smoorenburg
  2004-04-23 17:41   ` Theodore Ts'o
  2004-04-28  1:00 ` David Lang
  1 sibling, 1 reply; 43+ messages in thread
From: Miquel van Smoorenburg @ 2004-04-23 17:30 UTC (permalink / raw)
  To: linux-kernel

In article <408951CE.3080908@techsource.com>,
Timothy Miller  <miller@techsource.com> wrote:
>Well, why not do the compression at the highest layer?
>[...] doing it transparently and for all files.

http://e2compr.sourceforge.net/

Mike.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 17:30 ` Miquel van Smoorenburg
@ 2004-04-23 17:41   ` Theodore Ts'o
  2004-04-23 17:57     ` Jörn Engel
                       ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Theodore Ts'o @ 2004-04-23 17:41 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-kernel

On Fri, Apr 23, 2004 at 05:30:21PM +0000, Miquel van Smoorenburg wrote:
> In article <408951CE.3080908@techsource.com>,
> Timothy Miller  <miller@techsource.com> wrote:
> >Well, why not do the compression at the highest layer?
> >[...] doing it transparently and for all files.
> 
> http://e2compr.sourceforge.net/

It's been done (see the above URL), but given how cheap disk space has
gotten, and how the speed of CPU has gotten faster much more quickly
than disk access has, many/most people have not be interested in
trading off performance for space.  As a result, there are race
conditions in e2compr (which is why it never got merged into
mainline), and there hasn't been sufficient interest to either (a)
forward port e2compr to more recent kernels revisions, or (b) find and
fix the race conditions.

							- Ted

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 17:41   ` Theodore Ts'o
@ 2004-04-23 17:57     ` Jörn Engel
  2004-04-23 18:14     ` Timothy Miller
  2004-04-27 20:34     ` Pavel Machek
  2 siblings, 0 replies; 43+ messages in thread
From: Jörn Engel @ 2004-04-23 17:57 UTC (permalink / raw)
  To: Theodore Ts'o, Miquel van Smoorenburg, linux-kernel

On Fri, 23 April 2004 13:41:47 -0400, Theodore Ts'o wrote:
> 
> It's been done (see the above URL), but given how cheap disk space has
> gotten, and how the speed of CPU has gotten faster much more quickly
> than disk access has, many/most people have not be interested in
> trading off performance for space.

Also, most diskspace today is filled by data that is already
compressed.

Jörn

-- 
Ninety percent of everything is crap.
-- Sturgeon's Law

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 17:41   ` Theodore Ts'o
  2004-04-23 17:57     ` Jörn Engel
@ 2004-04-23 18:14     ` Timothy Miller
  2004-04-23 18:34       ` Paul Jackson
  2004-04-27 20:34     ` Pavel Machek
  2 siblings, 1 reply; 43+ messages in thread
From: Timothy Miller @ 2004-04-23 18:14 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Miquel van Smoorenburg, linux-kernel

Theodore Ts'o wrote:
> On Fri, Apr 23, 2004 at 05:30:21PM +0000, Miquel van Smoorenburg wrote:
> 
>>In article <408951CE.3080908@techsource.com>,
>>Timothy Miller  <miller@techsource.com> wrote:
>>
>>>Well, why not do the compression at the highest layer?
>>>[...] doing it transparently and for all files.
>>
>>http://e2compr.sourceforge.net/
> 
> 
> It's been done (see the above URL), but given how cheap disk space has
> gotten, and how the speed of CPU has gotten faster much more quickly
> than disk access has, many/most people have not be interested in
> trading off performance for space.  As a result, there are race
> conditions in e2compr (which is why it never got merged into
> mainline), and there hasn't been sufficient interest to either (a)
> forward port e2compr to more recent kernels revisions, or (b) find and
> fix the race conditions.

Well, performance has been my only interest.  Aside from the embedded 
space (which already uses cramfs or something, right?), the only real 
benefit to FS compression is the fact that it would reduce the amount of 
data that you have to read from disk.  If your IDE drive gives you 
50MB/sec, and your file compresses by 50%, then you get 100MB/sec 
reading that file.

In a private email, one gentleman (who can credit himself if he likes) 
pointed out that compression doesn't reduce the number of seeks, and 
since seek times dominate, the benefit of compression would diminish.

SO... in addition to the brilliance of AS, is there anything else that 
can be done (using compression or something else) which could aid in 
reducing seek time?

Nutty idea:  Interleave files on the disk.  So, any given file will have 
its blocks allocated at, say, intervals of every 17 blocks.  Make up for 
the sequential performance hit with compression or something, but to get 
to the beginning of groups of files, seek time is reduced.  Maybe. 
Probably not, but hey.  :)

Another idea is to actively fragment the disk based on access patterns. 
  The most frequently accessed blocks are grouped together so as to 
maximize over-all throughput.  The problem with this is that, well, say 
boot time is critical -- booting wouldn't happen enough to get enough 
attention so that its blocks get optimized (they would get dispersed as 
a result of more common activities); but database access could benefit 
in the long-term.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 18:14     ` Timothy Miller
@ 2004-04-23 18:34       ` Paul Jackson
  2004-04-23 20:14         ` Joel Jaeggli
  0 siblings, 1 reply; 43+ messages in thread
From: Paul Jackson @ 2004-04-23 18:34 UTC (permalink / raw)
  To: Timothy Miller; +Cc: tytso, miquels, linux-kernel

> SO... in addition to the brilliance of AS, is there anything else that 
> can be done (using compression or something else) which could aid in 
> reducing seek time?

Buy more disks and only use a small portion of each for all but the
most infrequently accessed data.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 18:34       ` Paul Jackson
@ 2004-04-23 20:14         ` Joel Jaeggli
  2004-04-23 20:34           ` Richard B. Johnson
  2004-04-23 21:15           ` Timothy Miller
  0 siblings, 2 replies; 43+ messages in thread
From: Joel Jaeggli @ 2004-04-23 20:14 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Timothy Miller, tytso, miquels, linux-kernel

On Fri, 23 Apr 2004, Paul Jackson wrote:

> > SO... in addition to the brilliance of AS, is there anything else that 
> > can be done (using compression or something else) which could aid in 
> > reducing seek time?
> 
> Buy more disks and only use a small portion of each for all but the
> most infrequently accessed data.

faster drives. The biggest disks at this point are far slower that the 
fastest... the average read service time on a maxtor atlas 15k is like 
5.7ms on 250GB western digital sata, 14.1ms, so that more than twice as 
many reads can be executed on the fastest disks you can buy now... of 
course then you pay for it in cost, heat, density, and controller costs. 
everthing is a tradeoff though.

> 

-- 
-------------------------------------------------------------------------- 
Joel Jaeggli  	       Unix Consulting 	       joelja@darkwing.uoregon.edu    
GPG Key Fingerprint:     5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:14         ` Joel Jaeggli
@ 2004-04-23 20:34           ` Richard B. Johnson
  2004-04-23 20:44             ` Måns Rullgård
                               ` (4 more replies)
  2004-04-23 21:15           ` Timothy Miller
  1 sibling, 5 replies; 43+ messages in thread
From: Richard B. Johnson @ 2004-04-23 20:34 UTC (permalink / raw)
  To: Joel Jaeggli; +Cc: Paul Jackson, Timothy Miller, tytso, miquels, linux-kernel

On Fri, 23 Apr 2004, Joel Jaeggli wrote:

> On Fri, 23 Apr 2004, Paul Jackson wrote:
>
> > > SO... in addition to the brilliance of AS, is there anything else that
> > > can be done (using compression or something else) which could aid in
> > > reducing seek time?
> >
> > Buy more disks and only use a small portion of each for all but the
> > most infrequently accessed data.
>
> faster drives. The biggest disks at this point are far slower that the
> fastest... the average read service time on a maxtor atlas 15k is like
> 5.7ms on 250GB western digital sata, 14.1ms, so that more than twice as
> many reads can be executed on the fastest disks you can buy now... of
> course then you pay for it in cost, heat, density, and controller costs.
> everthing is a tradeoff though.
>

If you want to have fast disks, then you should do what I
suggested to Digital 20 years ago when they had ST-506
interfaces and SCSI was available only from third-parties.
It was called "striping" (I'm serious!). Not the so-called
RAID crap that took the original idea and destroyed it.
If you have 32-bits, you design an interface board for 32
disks. The interface board strips each bit to the data that
each disk gets. That makes the whole array 32 times faster
than a single drive and, of course, 32 times larger.

There is no redundancy in such an array, just brute-force
speed. One can add additional bits and CRC correction which
would allow the failure (or removal) of one drive at a time.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.26 on an i686 machine (5557.45 BogoMips).
            Note 96.31% of all statistics are fiction.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:34           ` Richard B. Johnson
@ 2004-04-23 20:44             ` Måns Rullgård
  2004-04-23 20:59               ` Richard B. Johnson
  2004-04-23 21:31             ` Joel Jaeggli
                               ` (3 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: Måns Rullgård @ 2004-04-23 20:44 UTC (permalink / raw)
  To: linux-kernel

"Richard B. Johnson" <root@chaos.analogic.com> writes:

> On Fri, 23 Apr 2004, Joel Jaeggli wrote:
>
>> On Fri, 23 Apr 2004, Paul Jackson wrote:
>>
>> > > SO... in addition to the brilliance of AS, is there anything else that
>> > > can be done (using compression or something else) which could aid in
>> > > reducing seek time?
>> >
>> > Buy more disks and only use a small portion of each for all but the
>> > most infrequently accessed data.
>>
>> faster drives. The biggest disks at this point are far slower that the
>> fastest... the average read service time on a maxtor atlas 15k is like
>> 5.7ms on 250GB western digital sata, 14.1ms, so that more than twice as
>> many reads can be executed on the fastest disks you can buy now... of
>> course then you pay for it in cost, heat, density, and controller costs.
>> everthing is a tradeoff though.
>>
>
> If you want to have fast disks, then you should do what I
> suggested to Digital 20 years ago when they had ST-506
> interfaces and SCSI was available only from third-parties.
> It was called "striping" (I'm serious!). Not the so-called
> RAID crap that took the original idea and destroyed it.
> If you have 32-bits, you design an interface board for 32
> disks. The interface board strips each bit to the data that
> each disk gets. That makes the whole array 32 times faster
> than a single drive and, of course, 32 times larger.

For best performance, the spindles should be synchronized too.  This
might be tricky with disks not intended for such operation, of course.

-- 
Måns Rullgård
mru@kth.se


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:44             ` Måns Rullgård
@ 2004-04-23 20:59               ` Richard B. Johnson
  2004-04-23 21:14                 ` Ben Greear
  2004-04-23 21:18                 ` Timothy Miller
  0 siblings, 2 replies; 43+ messages in thread
From: Richard B. Johnson @ 2004-04-23 20:59 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: linux-kernel

On Fri, 23 Apr 2004, [iso-8859-1] Måns Rullgård wrote:

> "Richard B. Johnson" <root@chaos.analogic.com> writes:
>
> > On Fri, 23 Apr 2004, Joel Jaeggli wrote:
> >
> >> On Fri, 23 Apr 2004, Paul Jackson wrote:
> >>
> >> > > SO... in addition to the brilliance of AS, is there anything else that
> >> > > can be done (using compression or something else) which could aid in
> >> > > reducing seek time?
> >> >
> >> > Buy more disks and only use a small portion of each for all but the
> >> > most infrequently accessed data.
> >>
> >> faster drives. The biggest disks at this point are far slower that the
> >> fastest... the average read service time on a maxtor atlas 15k is like
> >> 5.7ms on 250GB western digital sata, 14.1ms, so that more than twice as
> >> many reads can be executed on the fastest disks you can buy now... of
> >> course then you pay for it in cost, heat, density, and controller costs.
> >> everthing is a tradeoff though.
> >>
> >
> > If you want to have fast disks, then you should do what I
> > suggested to Digital 20 years ago when they had ST-506
> > interfaces and SCSI was available only from third-parties.
> > It was called "striping" (I'm serious!). Not the so-called
> > RAID crap that took the original idea and destroyed it.
> > If you have 32-bits, you design an interface board for 32
> > disks. The interface board strips each bit to the data that
> > each disk gets. That makes the whole array 32 times faster
> > than a single drive and, of course, 32 times larger.
>
> For best performance, the spindles should be synchronized too.  This
> might be tricky with disks not intended for such operation, of course.

Actually not. You need a FIFO to cache your bits into buffers of bytes
anyway. Depending upon the length of the FIFO, you can "rubber-band" a
lot of rotational latency. When you are dealing with a lot of drives,
you are never going to have all the write currents turn on at the same
time anyway because they are (very) soft-sectored, i.e., block
replacement, etc.

Your argument was used to shout down the idea. Actually, I think
it was lost in the NIH syndrome anyway.

>
> --
> Måns Rullgård
> mru@kth.se
>


Cheers,
Dick Johnson
Penguin : Linux version 2.4.26 on an i686 machine (5557.45 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:59               ` Richard B. Johnson
@ 2004-04-23 21:14                 ` Ben Greear
  2004-04-23 21:25                   ` Timothy Miller
  2004-04-23 21:18                 ` Timothy Miller
  1 sibling, 1 reply; 43+ messages in thread
From: Ben Greear @ 2004-04-23 21:14 UTC (permalink / raw)
  To: root; +Cc: Måns Rullgård, linux-kernel

Richard B. Johnson wrote:

> Actually not. You need a FIFO to cache your bits into buffers of bytes
> anyway. Depending upon the length of the FIFO, you can "rubber-band" a
> lot of rotational latency. When you are dealing with a lot of drives,
> you are never going to have all the write currents turn on at the same
> time anyway because they are (very) soft-sectored, i.e., block
> replacement, etc.

Wouldn't this pretty much guarantee worst-case latency scenario for reading, since
on average at least one of your 32 disks is going to require a full rotation
(and probably a seek) to find it's bit?

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 21:14                 ` Ben Greear
@ 2004-04-23 21:25                   ` Timothy Miller
  2004-04-24  4:58                     ` Ben Greear
  0 siblings, 1 reply; 43+ messages in thread
From: Timothy Miller @ 2004-04-23 21:25 UTC (permalink / raw)
  To: Ben Greear; +Cc: root, Måns Rullgård, linux-kernel



Ben Greear wrote:
> Richard B. Johnson wrote:
> 
>> Actually not. You need a FIFO to cache your bits into buffers of bytes
>> anyway. Depending upon the length of the FIFO, you can "rubber-band" a
>> lot of rotational latency. When you are dealing with a lot of drives,
>> you are never going to have all the write currents turn on at the same
>> time anyway because they are (very) soft-sectored, i.e., block
>> replacement, etc.
> 
> 
> Wouldn't this pretty much guarantee worst-case latency scenario for 
> reading, since
> on average at least one of your 32 disks is going to require a full 
> rotation
> (and probably a seek) to find it's bit?


Only for the first bit of a block.  For large streams of reads, the 
fifos will keep things going, except for occasionally as drives drift in 
their relative rotation positions which can cause some delays.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 21:25                   ` Timothy Miller
@ 2004-04-24  4:58                     ` Ben Greear
  2004-04-27 15:45                       ` Timothy Miller
  0 siblings, 1 reply; 43+ messages in thread
From: Ben Greear @ 2004-04-24  4:58 UTC (permalink / raw)
  To: Timothy Miller; +Cc: root, linux-kernel

Timothy Miller wrote:

>> Wouldn't this pretty much guarantee worst-case latency scenario for 
>> reading, since
>> on average at least one of your 32 disks is going to require a full 
>> rotation
>> (and probably a seek) to find it's bit?
> 
> 
> 
> Only for the first bit of a block.  For large streams of reads, the 
> fifos will keep things going, except for occasionally as drives drift in 
> their relative rotation positions which can cause some delays.

So how is that better than using a striping raid that stripes at the
block level or multi-block level?

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-24  4:58                     ` Ben Greear
@ 2004-04-27 15:45                       ` Timothy Miller
  0 siblings, 0 replies; 43+ messages in thread
From: Timothy Miller @ 2004-04-27 15:45 UTC (permalink / raw)
  To: Ben Greear; +Cc: root, linux-kernel



Ben Greear wrote:
> Timothy Miller wrote:
> 
>>> Wouldn't this pretty much guarantee worst-case latency scenario for 
>>> reading, since
>>> on average at least one of your 32 disks is going to require a full 
>>> rotation
>>> (and probably a seek) to find it's bit?
>>
>>
>>
>>
>> Only for the first bit of a block.  For large streams of reads, the 
>> fifos will keep things going, except for occasionally as drives drift 
>> in their relative rotation positions which can cause some delays.
> 
> 
> So how is that better than using a striping raid that stripes at the
> block level or multi-block level?
> 


It's only better for large streaming writes.  The FIFOs I'm talking 
about above would certainly be smaller than typical RAID0 stripes.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:59               ` Richard B. Johnson
  2004-04-23 21:14                 ` Ben Greear
@ 2004-04-23 21:18                 ` Timothy Miller
  2004-04-24  1:28                   ` Horst von Brand
  2004-04-24  2:24                   ` Tom Vier
  1 sibling, 2 replies; 43+ messages in thread
From: Timothy Miller @ 2004-04-23 21:18 UTC (permalink / raw)
  To: root; +Cc: Måns Rullgård, linux-kernel



Richard B. Johnson wrote:

> 
> Actually not. You need a FIFO to cache your bits into buffers of bytes
> anyway. Depending upon the length of the FIFO, you can "rubber-band" a
> lot of rotational latency. When you are dealing with a lot of drives,
> you are never going to have all the write currents turn on at the same
> time anyway because they are (very) soft-sectored, i.e., block
> replacement, etc.
> 
> Your argument was used to shout down the idea. Actually, I think
> it was lost in the NIH syndrome anyway.
> 


In a drive with multiple platters and therefore multiple heads, you 
could read/write from all heads simultaneously.  Or is that how they 
already do it?



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 21:18                 ` Timothy Miller
@ 2004-04-24  1:28                   ` Horst von Brand
  2004-04-24  2:24                   ` Tom Vier
  1 sibling, 0 replies; 43+ messages in thread
From: Horst von Brand @ 2004-04-24  1:28 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Linux Kernel Mailing List

Timothy Miller <miller@techsource.com> said:

[...]

> In a drive with multiple platters and therefore multiple heads, you 
> could read/write from all heads simultaneously.  Or is that how they 
> already do it?

No. Current disks have bad blocks (way too small on disk to be able to
ensure 100% OK), and they are remapped by the drive firmware to spare
cilinders. To have the exact same blocks broken on each surface would be a
real lottery.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 21:18                 ` Timothy Miller
  2004-04-24  1:28                   ` Horst von Brand
@ 2004-04-24  2:24                   ` Tom Vier
  2004-04-24  7:36                     ` Willy Tarreau
  2004-04-27 15:43                     ` Timothy Miller
  1 sibling, 2 replies; 43+ messages in thread
From: Tom Vier @ 2004-04-24  2:24 UTC (permalink / raw)
  To: linux-kernel

On Fri, Apr 23, 2004 at 05:18:44PM -0400, Timothy Miller wrote:
> In a drive with multiple platters and therefore multiple heads, you 
> could read/write from all heads simultaneously.  Or is that how they 
> already do it?

fwih, there was once a drive that did this. the problem is track alignment.
these days, you'd need seperate motors for each head.

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-24  2:24                   ` Tom Vier
@ 2004-04-24  7:36                     ` Willy Tarreau
  2004-04-24 16:02                       ` Eric D. Mudama
  2004-04-25  3:05                       ` Horst von Brand
  2004-04-27 15:43                     ` Timothy Miller
  1 sibling, 2 replies; 43+ messages in thread
From: Willy Tarreau @ 2004-04-24  7:36 UTC (permalink / raw)
  To: Tom Vier; +Cc: linux-kernel

On Fri, Apr 23, 2004 at 10:24:58PM -0400, Tom Vier wrote:
> On Fri, Apr 23, 2004 at 05:18:44PM -0400, Timothy Miller wrote:
> > In a drive with multiple platters and therefore multiple heads, you 
> > could read/write from all heads simultaneously.  Or is that how they 
> > already do it?
> 
> fwih, there was once a drive that did this. the problem is track alignment.
> these days, you'd need seperate motors for each head.

I think they now all do it. Haven't you noticed that drives with many
platters are always faster than their cousins with fewer platters ? And
I don't speak about access time, but about sequential reads.

Willy


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-24  7:36                     ` Willy Tarreau
@ 2004-04-24 16:02                       ` Eric D. Mudama
  2004-04-25  3:05                       ` Horst von Brand
  1 sibling, 0 replies; 43+ messages in thread
From: Eric D. Mudama @ 2004-04-24 16:02 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Tom Vier, linux-kernel

On Sat, Apr 24 at  9:36, Willy Tarreau wrote:
>On Fri, Apr 23, 2004 at 10:24:58PM -0400, Tom Vier wrote:
>> On Fri, Apr 23, 2004 at 05:18:44PM -0400, Timothy Miller wrote:
>> > In a drive with multiple platters and therefore multiple heads, you 
>> > could read/write from all heads simultaneously.  Or is that how they 
>> > already do it?
>> 
>> fwih, there was once a drive that did this. the problem is track alignment.
>> these days, you'd need seperate motors for each head.
>
>I think they now all do it. Haven't you noticed that drives with many
>platters are always faster than their cousins with fewer platters ? And
>I don't speak about access time, but about sequential reads.

Only one read/write element can be active at one time in a modern disk
drive.  The issue is that while the drive's headstack was originally
in alignment, all sorts of factors can cause it to fall out of
alignment.  If that occurs, the heads might not line up with each
other, meaning that when you used to line up with A1 and B1 (side A,
cylinder 1) your two heads now align with A1 and B40.

Every surface has embedded servo information, which allows the drive
to work around mechanical variability and handling damage.  The
difference in position between adjacent heads in a drive factors into
a parameter called "head switch skew".  Head switch skew is "how long
does it take us to seek to the next sequential LBA after reading the
last LBA on a track/head?"  Track-to-track skew is how long to seek
and settle on the adjacent track on the same head.

These two parameters are used to generate the drive's format, which in
turn account for the sequential throughput.  (higher skews means lower
usage duty cycle means lower overall throughput.)  If the skews are
set too low, the drive blows revs because it can't settle in time for
the LBA it needs to read.

In general, a drive with lots of heads will perform better on most
workloads because it doesn't have to seek as far radially to cover the
same amount of data.  However, a single-headed and a multi-headed
drive of the same generation should be virtually identical in
sequential throughput... within a few percent.  If anything, the
single-headed drive should be a bit faster because track-to-track
skews are typically smaller than headswitch skews.

--eric

-- 
Eric D. Mudama
edmudama@mail.bounceswoosh.org

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-24  7:36                     ` Willy Tarreau
  2004-04-24 16:02                       ` Eric D. Mudama
@ 2004-04-25  3:05                       ` Horst von Brand
  2004-04-25  7:29                         ` Willy Tarreau
  1 sibling, 1 reply; 43+ messages in thread
From: Horst von Brand @ 2004-04-25  3:05 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Linux Kernel Mailing List

Willy Tarreau <willy@w.ods.org> said:
> On Fri, Apr 23, 2004 at 10:24:58PM -0400, Tom Vier wrote:
> > On Fri, Apr 23, 2004 at 05:18:44PM -0400, Timothy Miller wrote:
> > > In a drive with multiple platters and therefore multiple heads, you 
> > > could read/write from all heads simultaneously.  Or is that how they 
> > > already do it?
> > 
> > fwih, there was once a drive that did this. the problem is track alignment.
> > these days, you'd need seperate motors for each head.

> I think they now all do it.

No.

>                             Haven't you noticed that drives with many
> platters are always faster than their cousins with fewer platters ? And
> I don't speak about access time, but about sequential reads.

Have you ever wondered how they squeeze 16 or more platters into that slim
enclosure? If you take them apart, the question evaporates: There are 2 or
3 platters in them, no more. The "many platters" are an artifact of BIOS'
"disk geometry" description.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-25  3:05                       ` Horst von Brand
@ 2004-04-25  7:29                         ` Willy Tarreau
  2004-04-25 19:50                           ` Eric D. Mudama
  0 siblings, 1 reply; 43+ messages in thread
From: Willy Tarreau @ 2004-04-25  7:29 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Linux Kernel Mailing List

On Sat, Apr 24, 2004 at 11:05:05PM -0400, Horst von Brand wrote:

> >                             Haven't you noticed that drives with many
> > platters are always faster than their cousins with fewer platters ? And
> > I don't speak about access time, but about sequential reads.
> 
> Have you ever wondered how they squeeze 16 or more platters into that slim
> enclosure? If you take them apart, the question evaporates: There are 2 or
> 3 platters in them, no more. The "many platters" are an artifact of BIOS'
> "disk geometry" description.

I know, I was speaking about physical platters of course. Mark Hann told
me in private that he disagreed with me, so I checked recent disks 
(36, 73, 147 GB SCSI with 1, 2, 4 platters) and he was right, they have
exactly the same spec concerning speed. But I said that I remember the
times when I regularly did this test on disks that I was integrating about
7-8 years ago, they were 2.1, 4.3, 6.4 GB (1,2,3 platters), and I'm fairly
certain that the 1-platter performed at about 5 MB/s while the 6.4 was around
12 MB/s. BTW, the 9GB SCSI I have in my PC does about 28 MB/s for 1 platter,
while its 18 GB equivalent (2 platters) does about 51. So I think that what
I observed remained true for such capacities, but changed on bigger disks
because of mechanical constraints. Afterall, what's 18 GB now ? Less than
one twentieth of the biggest disk.

Anyway, this is off-topic, so that's my last post on LKML on the subject.

Regards,
Willy

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-25  7:29                         ` Willy Tarreau
@ 2004-04-25 19:50                           ` Eric D. Mudama
  0 siblings, 0 replies; 43+ messages in thread
From: Eric D. Mudama @ 2004-04-25 19:50 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Horst von Brand, Linux Kernel Mailing List

On Sun, Apr 25 at  9:29, Willy Tarreau wrote:
>I know, I was speaking about physical platters of course. Mark Hann told
>me in private that he disagreed with me, so I checked recent disks 
>(36, 73, 147 GB SCSI with 1, 2, 4 platters) and he was right, they have
>exactly the same spec concerning speed. But I said that I remember the
>times when I regularly did this test on disks that I was integrating about
>7-8 years ago, they were 2.1, 4.3, 6.4 GB (1,2,3 platters), and I'm fairly
>certain that the 1-platter performed at about 5 MB/s while the 6.4 was around
>12 MB/s. BTW, the 9GB SCSI I have in my PC does about 28 MB/s for 1 platter,
>while its 18 GB equivalent (2 platters) does about 51. So I think that what
>I observed remained true for such capacities, but changed on bigger disks
>because of mechanical constraints. Afterall, what's 18 GB now ? Less than
>one twentieth of the biggest disk.
>
>Anyway, this is off-topic, so that's my last post on LKML on the subject.

Let me throw in a final $.02...

Are you sure your 9GB and 18GB drives are of the same "generation" of
technology?  SCSI drive platters have gotten smaller and smaller to
shorten the seek distance (they use 2.5" media now inside 3.5" drives)
for random operations, and I'm wondering if your 18GB is in fact a
generation ahead of your 9GB.

Are you sure your 9GB SCSI drive only has 1 platter in it?

--eric

-- 
Eric D. Mudama
edmudama@mail.bounceswoosh.org


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-24  2:24                   ` Tom Vier
  2004-04-24  7:36                     ` Willy Tarreau
@ 2004-04-27 15:43                     ` Timothy Miller
  2004-04-28  0:29                       ` Tom Vier
  1 sibling, 1 reply; 43+ messages in thread
From: Timothy Miller @ 2004-04-27 15:43 UTC (permalink / raw)
  To: Tom Vier; +Cc: linux-kernel



Tom Vier wrote:
> On Fri, Apr 23, 2004 at 05:18:44PM -0400, Timothy Miller wrote:
> 
>>In a drive with multiple platters and therefore multiple heads, you 
>>could read/write from all heads simultaneously.  Or is that how they 
>>already do it?
> 
> 
> fwih, there was once a drive that did this. the problem is track alignment.
> these days, you'd need seperate motors for each head.
> 

Oh, yeah.  Forget the separate motors.  Would definately need that to 
move heads independently.

The problem is track alignment.  Don't drives dedicate one track on one 
platter as an alignment track?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-27 15:43                     ` Timothy Miller
@ 2004-04-28  0:29                       ` Tom Vier
  0 siblings, 0 replies; 43+ messages in thread
From: Tom Vier @ 2004-04-28  0:29 UTC (permalink / raw)
  To: Timothy Miller; +Cc: linux-kernel

On Tue, Apr 27, 2004 at 11:43:58AM -0400, Timothy Miller wrote:
> The problem is track alignment.  Don't drives dedicate one track on one 
> platter as an alignment track?

it used to be one whole plater was for servo alignment, i think. embedded
servo signals have been around for at least 7 years.

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:34           ` Richard B. Johnson
  2004-04-23 20:44             ` Måns Rullgård
@ 2004-04-23 21:31             ` Joel Jaeggli
  2004-04-23 22:20               ` Ian Stirling
  2004-04-23 23:34             ` Paul Jackson
                               ` (2 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: Joel Jaeggli @ 2004-04-23 21:31 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Paul Jackson, Timothy Miller, tytso, miquels, linux-kernel

On Fri, 23 Apr 2004, Richard B. Johnson wrote:
> 
> If you want to have fast disks, then you should do what I
> suggested to Digital 20 years ago when they had ST-506
> interfaces and SCSI was available only from third-parties.
> It was called "striping" (I'm serious!). Not the so-called
> RAID crap that took the original idea and destroyed it.
> If you have 32-bits, you design an interface board for 32
> disks. The interface board strips each bit to the data that
> each disk gets. That makes the whole array 32 times faster
> than a single drive and, of course, 32 times larger.
> 
> There is no redundancy in such an array, just brute-force
> speed. One can add additional bits and CRC correction which
> would allow the failure (or removal) of one drive at a time.

except disks no longer encode one bit at a time (with prml), and you're
still serializing requests across all the spindles instead of dividing
requests between spindles... it's pretty clear that in the forseeable
future capacity grown will continue to far outstrip access speed in
spinning magnetic media. I would agree that any serious improvement is
likely to come for more creativly arranging the data at the block or
filesystem level, netapps log-structured raid4 being one direction to 
head... 
 
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.4.26 on an i686 machine (5557.45 BogoMips).
>             Note 96.31% of all statistics are fiction.
> 
> 

-- 
-------------------------------------------------------------------------- 
Joel Jaeggli  	       Unix Consulting 	       joelja@darkwing.uoregon.edu    
GPG Key Fingerprint:     5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 21:31             ` Joel Jaeggli
@ 2004-04-23 22:20               ` Ian Stirling
  0 siblings, 0 replies; 43+ messages in thread
From: Ian Stirling @ 2004-04-23 22:20 UTC (permalink / raw)
  To: Joel Jaeggli
  Cc: Richard B. Johnson, Paul Jackson, Timothy Miller, tytso, miquels,
	linux-kernel

Joel Jaeggli wrote:
> On Fri, 23 Apr 2004, Richard B. Johnson wrote:
> 
>>If you want to have fast disks, then you should do what I
>>suggested to Digital 20 years ago when they had ST-506
>>interfaces and SCSI was available only from third-parties.

> except disks no longer encode one bit at a time (with prml), and you're
> still serializing requests across all the spindles instead of dividing
> requests between spindles... it's pretty clear that in the forseeable
> future capacity grown will continue to far outstrip access speed in
> spinning magnetic media. I would agree that any serious improvement is

I happened to do some sums about a week ago.

My first drive was ST225R, which was 60M,3600RPM and the whole drive could be
read in 2 or 3 mins.
My new 160G drive is 7200RPM, and reads in around 50 mins.

It's not a complete coincidence that sqrt(160/.06) is about 50, and the number
of revs to read the drive is pretty much dead on 50 times.

The areal density of disk drives tends to go up both by adding more tracks, and
by squeezing the data into each track more densely.

While you can speed up the disk maybe 5 times if you are willing to pay the price,
the increasing number of tracks means that you'r still going to need lots more
revs to read the drive.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:34           ` Richard B. Johnson
  2004-04-23 20:44             ` Måns Rullgård
  2004-04-23 21:31             ` Joel Jaeggli
@ 2004-04-23 23:34             ` Paul Jackson
  2004-04-27 15:42               ` Timothy Miller
  2004-04-24  1:18             ` Horst von Brand
  2004-04-26 10:22             ` Jörn Engel
  4 siblings, 1 reply; 43+ messages in thread
From: Paul Jackson @ 2004-04-23 23:34 UTC (permalink / raw)
  To: root; +Cc: joelja, miller, tytso, miquels, linux-kernel

> If you want to have fast disks, then you should do what I
> suggested to Digital 20 years ago when they had ST-506
> interfaces and SCSI was available only from third-parties.
> It was called "striping" (I'm serious!).

That gets your bandwidth up, but does nothing for latency.

Depending on your workload, that may or may not be critical.

As a former SGI employee noted:

  "Money can buy bandwidth, but latency is forever" -- John Mashey

To get latency down, you need fast rotating disks and short strokes
(waste most of the disk on little used data, or on nothing at all).
And even that won't get you much faster than 20 years ago.

That, or lots of main memory, or if the data is pretty much
read-only, perhaps some complicated data duplication.

But we're not in such bad shape there - folks have been dealing
with that speed difference for at least 20 years ;).

It's the speed difference between the processor and main memory
that's more challenging now - as it approaches speed differences
we once saw between processor and disk.

To heck with disk compression - it's time for main memory compression.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 23:34             ` Paul Jackson
@ 2004-04-27 15:42               ` Timothy Miller
  2004-04-27 16:02                 ` Jörn Engel
  0 siblings, 1 reply; 43+ messages in thread
From: Timothy Miller @ 2004-04-27 15:42 UTC (permalink / raw)
  To: Paul Jackson; +Cc: root, joelja, tytso, miquels, linux-kernel



Paul Jackson wrote:

> 
> To heck with disk compression - it's time for main memory compression.
> 

I think nVidia and ATI chips do that with the Z buffer.  Definately 
improves bandwidth utilization.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-27 15:42               ` Timothy Miller
@ 2004-04-27 16:02                 ` Jörn Engel
  0 siblings, 0 replies; 43+ messages in thread
From: Jörn Engel @ 2004-04-27 16:02 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Paul Jackson, root, joelja, tytso, miquels, linux-kernel

On Tue, 27 April 2004 11:42:11 -0400, Timothy Miller wrote:
> Paul Jackson wrote:
> 
> >To heck with disk compression - it's time for main memory compression.
> 
> I think nVidia and ATI chips do that with the Z buffer.  Definately 
> improves bandwidth utilization.
           ^^^^^^^^^

Well stated.  For general purpose cpus with unpredictable access
patterns, compression makes latency even worse, so you need even
bigger caches.

On the other hand, memory compression makes memory bigger, and memory
of course is a disk cache, so it does improve latency somewhere.

Jörn

-- 
Victory in war is not repetitious.
-- Sun Tzu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:34           ` Richard B. Johnson
                               ` (2 preceding siblings ...)
  2004-04-23 23:34             ` Paul Jackson
@ 2004-04-24  1:18             ` Horst von Brand
  2004-04-26 10:22             ` Jörn Engel
  4 siblings, 0 replies; 43+ messages in thread
From: Horst von Brand @ 2004-04-24  1:18 UTC (permalink / raw)
  To: root; +Cc: Linux Kernel Mailing List

"Richard B. Johnson" <root@chaos.analogic.com> said:

[...]

> If you want to have fast disks, then you should do what I
> suggested to Digital 20 years ago when they had ST-506
> interfaces and SCSI was available only from third-parties.
> It was called "striping" (I'm serious!). Not the so-called
> RAID crap that took the original idea and destroyed it.
> If you have 32-bits, you design an interface board for 32
> disks. The interface board strips each bit to the data that
> each disk gets. That makes the whole array 32 times faster
> than a single drive and, of course, 32 times larger.

But seeks are just as slow as before... and weigh in more as sectors are
shorter (for the same visible sector size, 1/32th). I'm not so sure this is
a win overall.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:34           ` Richard B. Johnson
                               ` (3 preceding siblings ...)
  2004-04-24  1:18             ` Horst von Brand
@ 2004-04-26 10:22             ` Jörn Engel
  4 siblings, 0 replies; 43+ messages in thread
From: Jörn Engel @ 2004-04-26 10:22 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Timothy Miller, linux-kernel

On Fri, 23 April 2004 16:34:21 -0400, Richard B. Johnson wrote:
> 
> If you want to have fast disks, then you should do what I
> suggested to Digital 20 years ago when they had ST-506
> interfaces and SCSI was available only from third-parties.
> It was called "striping" (I'm serious!). Not the so-called
> RAID crap that took the original idea and destroyed it.
> If you have 32-bits, you design an interface board for 32
> disks. The interface board strips each bit to the data that
> each disk gets. That makes the whole array 32 times faster
> than a single drive and, of course, 32 times larger.
> 
> There is no redundancy in such an array, just brute-force
> speed. One can add additional bits and CRC correction which
> would allow the failure (or removal) of one drive at a time.

...and so you add latency to the ever-growing list of concepts you
publically prove to be unaware of.

Those 32 disks now have something like 32x50MB/s or 1.6GB/s, great.
Seek time is still 10ms, though, so now each seek costs as much as
16MB of continuous data transfer.  Nice.  So readahead will be 64MB,
and disk cache 1GB, just to get rid of some seeks again?  Sure.

If you were a little smarter and used the so-called RAID crap, you
would have stripes of about the readahead size (or more) and seeks get
spread up between disks.  Sure, transfer speed will usually be lower
than 1.6GB/s, but who cares.  The point is that each seek will only
cost you as much as 500kB of continuous transfer.

But like so many other things, you will refuse to understand this as
well, right?  Well, at least don't try to convince the unaware,
please.

Jörn

-- 
There's nothing better for promoting creativity in a medium than
making an audience feel "Hmm  I could do better than that!"
-- Douglas Adams in a slashdot interview

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 20:14         ` Joel Jaeggli
  2004-04-23 20:34           ` Richard B. Johnson
@ 2004-04-23 21:15           ` Timothy Miller
  2004-04-23 21:36             ` Joel Jaeggli
  1 sibling, 1 reply; 43+ messages in thread
From: Timothy Miller @ 2004-04-23 21:15 UTC (permalink / raw)
  To: Joel Jaeggli; +Cc: Paul Jackson, tytso, miquels, linux-kernel



Joel Jaeggli wrote:
> On Fri, 23 Apr 2004, Paul Jackson wrote:
> 
> 
>>>SO... in addition to the brilliance of AS, is there anything else that 
>>>can be done (using compression or something else) which could aid in 
>>>reducing seek time?
>>
>>Buy more disks and only use a small portion of each for all but the
>>most infrequently accessed data.
> 
> 
> faster drives. The biggest disks at this point are far slower that the 
> fastest... the average read service time on a maxtor atlas 15k is like 
> 5.7ms on 250GB western digital sata, 14.1ms, so that more than twice as 
> many reads can be executed on the fastest disks you can buy now... of 
> course then you pay for it in cost, heat, density, and controller costs. 
> everthing is a tradeoff though.


I had this idea of packing a bunch of those really tiny Toshiba 
quarter-sized drives and some sort of RAID0 controller into a box the 
size of a 3.5" hard drive.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 21:15           ` Timothy Miller
@ 2004-04-23 21:36             ` Joel Jaeggli
  0 siblings, 0 replies; 43+ messages in thread
From: Joel Jaeggli @ 2004-04-23 21:36 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Paul Jackson, tytso, miquels, linux-kernel

On Fri, 23 Apr 2004, Timothy Miller wrote:

> 
> 
> Joel Jaeggli wrote:
> > On Fri, 23 Apr 2004, Paul Jackson wrote:
> > 
> > 
> >>>SO... in addition to the brilliance of AS, is there anything else that 
> >>>can be done (using compression or something else) which could aid in 
> >>>reducing seek time?
> >>
> >>Buy more disks and only use a small portion of each for all but the
> >>most infrequently accessed data.
> > 
> > 
> > faster drives. The biggest disks at this point are far slower that the 
> > fastest... the average read service time on a maxtor atlas 15k is like 
> > 5.7ms on 250GB western digital sata, 14.1ms, so that more than twice as 
> > many reads can be executed on the fastest disks you can buy now... of 
> > course then you pay for it in cost, heat, density, and controller costs. 
> > everthing is a tradeoff though.
> 
> 
> I had this idea of packing a bunch of those really tiny Toshiba 
> quarter-sized drives and some sort of RAID0 controller into a box the 
> size of a 3.5" hard drive.

they're deathly slow... I'll send you an hdparm from an 4GB ibm microdrive 
the next time I have it mounted...
 
> 

-- 
-------------------------------------------------------------------------- 
Joel Jaeggli  	       Unix Consulting 	       joelja@darkwing.uoregon.edu    
GPG Key Fingerprint:     5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 17:41   ` Theodore Ts'o
  2004-04-23 17:57     ` Jörn Engel
  2004-04-23 18:14     ` Timothy Miller
@ 2004-04-27 20:34     ` Pavel Machek
  2004-04-28 22:57       ` Timothy Miller
  2 siblings, 1 reply; 43+ messages in thread
From: Pavel Machek @ 2004-04-27 20:34 UTC (permalink / raw)
  To: Theodore Ts'o, Miquel van Smoorenburg, linux-kernel

Hi!

> > >Well, why not do the compression at the highest layer?
> > >[...] doing it transparently and for all files.
> > 
> > http://e2compr.sourceforge.net/
> 
> It's been done (see the above URL), but given how cheap disk space has
> gotten, and how the speed of CPU has gotten faster much more quickly
> than disk access has, many/most people have not be interested in
> trading off performance for space.  As a result, there are race

Is CPU_speed / disk_throughput increasing? If so, compression
might help once again. CPU_speed / net_throughput probably is
increasing, so compressedNFS would probably make sense.
				Pavel
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-27 20:34     ` Pavel Machek
@ 2004-04-28 22:57       ` Timothy Miller
  2004-04-29  9:46         ` Jörn Engel
  0 siblings, 1 reply; 43+ messages in thread
From: Timothy Miller @ 2004-04-28 22:57 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Theodore Ts'o, Miquel van Smoorenburg, linux-kernel



Pavel Machek wrote:
> Hi!
> 
> 
>>>>Well, why not do the compression at the highest layer?
>>>>[...] doing it transparently and for all files.
>>>
>>>http://e2compr.sourceforge.net/
>>
>>It's been done (see the above URL), but given how cheap disk space has
>>gotten, and how the speed of CPU has gotten faster much more quickly
>>than disk access has, many/most people have not be interested in
>>trading off performance for space.  As a result, there are race
> 
> 
> Is CPU_speed / disk_throughput increasing? If so, compression
> might help once again. CPU_speed / net_throughput probably is
> increasing, so compressedNFS would probably make sense.


I've always felt that way, but every time I mention it, people tell me 
it's not worth the CPU overhead.  For many years, I have felt that there 
should be an IP socket type which was inherently compressed.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-28 22:57       ` Timothy Miller
@ 2004-04-29  9:46         ` Jörn Engel
  2004-04-29  9:52           ` Pavel Machek
  0 siblings, 1 reply; 43+ messages in thread
From: Jörn Engel @ 2004-04-29  9:46 UTC (permalink / raw)
  To: Timothy Miller
  Cc: Pavel Machek, Theodore Ts'o, Miquel van Smoorenburg,
	linux-kernel

On Wed, 28 April 2004 18:57:08 -0400, Timothy Miller wrote:
> 
> I've always felt that way, but every time I mention it, people tell me 
> it's not worth the CPU overhead.  For many years, I have felt that there 
> should be an IP socket type which was inherently compressed.

Ever heard of ssh? ;)

Depending on speed of network and cpus involved, scp can be faster
than nfs.

Jörn

-- 
And spam is a useful source of entropy for /dev/random too!
-- Jasmine Strong

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-29  9:46         ` Jörn Engel
@ 2004-04-29  9:52           ` Pavel Machek
  2004-04-29 10:09             ` Jörn Engel
  2004-04-29 17:17             ` Tim Connors
  0 siblings, 2 replies; 43+ messages in thread
From: Pavel Machek @ 2004-04-29  9:52 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Timothy Miller, Theodore Ts'o, Miquel van Smoorenburg,
	linux-kernel

Hi!

> > I've always felt that way, but every time I mention it, people tell me 
> > it's not worth the CPU overhead.  For many years, I have felt that there 
> > should be an IP socket type which was inherently compressed.
> 
> Ever heard of ssh? ;)

Its too high level, and if you want compression but not encryption
that's tricky to do.

> Depending on speed of network and cpus involved, scp can be faster
> than nfs.

Well... but that's due to nfs being broken, right?
								Pavel
-- 
934a471f20d6580d5aad759bf0d97ddc

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-29  9:52           ` Pavel Machek
@ 2004-04-29 10:09             ` Jörn Engel
  2004-04-29 10:19               ` Pavel Machek
  2004-04-29 17:17             ` Tim Connors
  1 sibling, 1 reply; 43+ messages in thread
From: Jörn Engel @ 2004-04-29 10:09 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Timothy Miller, Theodore Ts'o, Miquel van Smoorenburg,
	linux-kernel

On Thu, 29 April 2004 11:52:37 +0200, Pavel Machek wrote:
> 
> > Ever heard of ssh? ;)
> 
> Its too high level, and if you want compression but not encryption
> that's tricky to do.
> 
> > Depending on speed of network and cpus involved, scp can be faster
> > than nfs.
> 
> Well... but that's due to nfs being broken, right?

I don't think nfs is broken because of missing compression, but yes,
the difference is by design.

Jörn

-- 
A victorious army first wins and then seeks battle.
-- Sun Tzu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-29 10:09             ` Jörn Engel
@ 2004-04-29 10:19               ` Pavel Machek
  0 siblings, 0 replies; 43+ messages in thread
From: Pavel Machek @ 2004-04-29 10:19 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Timothy Miller, Theodore Ts'o, Miquel van Smoorenburg,
	linux-kernel

On Čt 29-04-04 12:09:42, Jörn Engel wrote:
> On Thu, 29 April 2004 11:52:37 +0200, Pavel Machek wrote:
> > 
> > > Ever heard of ssh? ;)
> > 
> > Its too high level, and if you want compression but not encryption
> > that's tricky to do.
> > 
> > > Depending on speed of network and cpus involved, scp can be faster
> > > than nfs.
> > 
> > Well... but that's due to nfs being broken, right?
> 
> I don't think nfs is broken because of missing compression, but yes,
> the difference is by design.

Well, scp is easy, scp is linear copy of file. nfs is little more
tricky. I'm not talking about compression, due to various reasons
(UDP?), nfs is not always able to get the wire speed.
							Pavel
-- 
934a471f20d6580d5aad759bf0d97ddc

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re:  File system compression, not at the block layer
  2004-04-29  9:52           ` Pavel Machek
  2004-04-29 10:09             ` Jörn Engel
@ 2004-04-29 17:17             ` Tim Connors
  1 sibling, 0 replies; 43+ messages in thread
From: Tim Connors @ 2004-04-29 17:17 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Timothy Miller, Theodore Ts'o, Miquel van Smoorenburg,
	linux-kernel

Pavel Machek <pavel@suse.cz> said on Thu, 29 Apr 2004 11:52:37 +0200:
> Hi!
> 
> > > I've always felt that way, but every time I mention it, people tell me 
> > > it's not worth the CPU overhead.  For many years, I have felt that there 
> > > should be an IP socket type which was inherently compressed.
> > 
> > Ever heard of ssh? ;)
> 
> Its too high level, and if you want compression but not encryption
> that's tricky to do.

Just today we were trying to transfer ~350GB from a shell of a machine
(running knopix, with a very small amount of installed software, and
absolutely no disk space left) holding 4 disks to our raid disks --
the only thing installed was ssh, with even rsh being a symlink to ssh
(I was going to remove a whole bunch of packages to free up some space
so I could install rsh, but they didn't let me - it took them long
enough to get it to "work" in the first place).

Problem was that rsync combined with ssh was reading/writing at about
2MB/sec, given the age of the CPU. That will take a day more than
they have.

To put it bluntly, ssh is a *shit* solution on a secured net where
people care about performance.

-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
White dwarf seeks red giant star for binary relationship

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-23 17:26 File system compression, not at the block layer Timothy Miller
  2004-04-23 17:30 ` Miquel van Smoorenburg
@ 2004-04-28  1:00 ` David Lang
  2004-04-28 10:09   ` Jörn Engel
  1 sibling, 1 reply; 43+ messages in thread
From: David Lang @ 2004-04-28  1:00 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Linux Kernel Mailing List

to answer the fundamental question that was asked in this thread but not
answered.

the reason why we want to compress at the block level instead of over the
entire file is that sometimes we want to do random seeks into the middle
of the file or replace a chunk in the middle of a file (edits, inserts,
etc). by doing the compression in a block the worst that you have to do is
to read that one block, decompress it and get your data out (or modify the
block, compress it and put it back on disk). if your unit of compression
is the entire file each of these options will require manipulating basicly
the entire file (Ok, reads you can possibly stop after you found your
data)

as for those who say that compression isn't useful becouse CPU's are so
much faster then disks, I will argue that that's exactly when the
compression becomes most useful, if your CPU would otherwise be idle
waiting for the data to move to/from disk then the compression is
essentially free and you save time overall by transfering less data
through your IO bottleneck. however the right way to do this may be to put
a compression engine on the drive and allow the OS to request/send either
compressed or uncompressed data, that way if it's CPU bound or the data is
already compressed it won't spend CPU time on it, but if it will compress
the CPU and drive interface compress it to ease the bandwidth load between
them.

David Lang

 On Fri, 23 Apr 2004, Timothy Miller wrote:

> Date: Fri, 23 Apr 2004 13:26:38 -0400
> From: Timothy Miller <miller@techsource.com>
> To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Subject: File system compression, not at the block layer
>
> This is probably just another of my silly "they already thought of that
> and someone is doing exactly this" ideas.
>
> I get the impression that a lot of people interested in doing FS
> compression want to do it at the block layer.  This gets complicated,
> because you need to allocate partial physical blocks.
>
> Well, why not do the compression at the highest layer?
>
> The idea is something akin to changing this (syntax variation intentional):
>
>     tar cf - somefiles* > file
>
> To this:
>
>     tar cf - somefiles* | gzip > file
>
> Except doing it transparently and for all files.
>
> This way, the disk cache is all compressed data, and only decompressed
> as it's read or written by a process.
>
> For files below a certain size, this is obviously pointless, since you
> can't save any space.  But in many cases, this could speed up the I/O
> for large files that are compressable.  (Space is cheap.  The only
> reason to compress is for speed.)
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-28  1:00 ` David Lang
@ 2004-04-28 10:09   ` Jörn Engel
  2004-04-28 10:21     ` Nikita Danilov
  0 siblings, 1 reply; 43+ messages in thread
From: Jörn Engel @ 2004-04-28 10:09 UTC (permalink / raw)
  To: David Lang; +Cc: Timothy Miller, Linux Kernel Mailing List

On Tue, 27 April 2004 18:00:11 -0700, David Lang wrote:
> 
> to answer the fundamental question that was asked in this thread but not
> answered.
> 
> the reason why we want to compress at the block level instead of over the
> entire file is that sometimes we want to do random seeks into the middle
> of the file or replace a chunk in the middle of a file (edits, inserts,
> etc). by doing the compression in a block the worst that you have to do is
> to read that one block, decompress it and get your data out (or modify the
> block, compress it and put it back on disk). if your unit of compression
> is the entire file each of these options will require manipulating basicly
> the entire file (Ok, reads you can possibly stop after you found your
> data)

*IF* your unit of compression...

If that is the complete block device, you're stupid and deserve what
you get.  If it is the file, same thing.  No difference.

Do it at the file system level or don't do it at all.

Jörn

-- 
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
   Raph Levien, 1979

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: File system compression, not at the block layer
  2004-04-28 10:09   ` Jörn Engel
@ 2004-04-28 10:21     ` Nikita Danilov
  0 siblings, 0 replies; 43+ messages in thread
From: Nikita Danilov @ 2004-04-28 10:21 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Lang, Timothy Miller, Linux Kernel Mailing List

JЖrn Engel writes:
 > On Tue, 27 April 2004 18:00:11 -0700, David Lang wrote:
 > > 
 > > to answer the fundamental question that was asked in this thread but not
 > > answered.
 > > 
 > > the reason why we want to compress at the block level instead of over the
 > > entire file is that sometimes we want to do random seeks into the middle
 > > of the file or replace a chunk in the middle of a file (edits, inserts,
 > > etc). by doing the compression in a block the worst that you have to do is
 > > to read that one block, decompress it and get your data out (or modify the
 > > block, compress it and put it back on disk). if your unit of compression
 > > is the entire file each of these options will require manipulating basicly
 > > the entire file (Ok, reads you can possibly stop after you found your
 > > data)
 > 
 > *IF* your unit of compression...
 > 
 > If that is the complete block device, you're stupid and deserve what
 > you get.  If it is the file, same thing.  No difference.
 > 
 > Do it at the file system level or don't do it at all.

File system where unit of disk space allocation is smaller than disk
block (i.e., several files can use portions of the same disk block) can
efficiently use various "units of compression": 100 bytes, device block
size, N-blocks, etc.

 > 
 > JЖrn

Nikita.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2004-04-29 17:18 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-23 17:26 File system compression, not at the block layer Timothy Miller
2004-04-23 17:30 ` Miquel van Smoorenburg
2004-04-23 17:41   ` Theodore Ts'o
2004-04-23 17:57     ` Jörn Engel
2004-04-23 18:14     ` Timothy Miller
2004-04-23 18:34       ` Paul Jackson
2004-04-23 20:14         ` Joel Jaeggli
2004-04-23 20:34           ` Richard B. Johnson
2004-04-23 20:44             ` Måns Rullgård
2004-04-23 20:59               ` Richard B. Johnson
2004-04-23 21:14                 ` Ben Greear
2004-04-23 21:25                   ` Timothy Miller
2004-04-24  4:58                     ` Ben Greear
2004-04-27 15:45                       ` Timothy Miller
2004-04-23 21:18                 ` Timothy Miller
2004-04-24  1:28                   ` Horst von Brand
2004-04-24  2:24                   ` Tom Vier
2004-04-24  7:36                     ` Willy Tarreau
2004-04-24 16:02                       ` Eric D. Mudama
2004-04-25  3:05                       ` Horst von Brand
2004-04-25  7:29                         ` Willy Tarreau
2004-04-25 19:50                           ` Eric D. Mudama
2004-04-27 15:43                     ` Timothy Miller
2004-04-28  0:29                       ` Tom Vier
2004-04-23 21:31             ` Joel Jaeggli
2004-04-23 22:20               ` Ian Stirling
2004-04-23 23:34             ` Paul Jackson
2004-04-27 15:42               ` Timothy Miller
2004-04-27 16:02                 ` Jörn Engel
2004-04-24  1:18             ` Horst von Brand
2004-04-26 10:22             ` Jörn Engel
2004-04-23 21:15           ` Timothy Miller
2004-04-23 21:36             ` Joel Jaeggli
2004-04-27 20:34     ` Pavel Machek
2004-04-28 22:57       ` Timothy Miller
2004-04-29  9:46         ` Jörn Engel
2004-04-29  9:52           ` Pavel Machek
2004-04-29 10:09             ` Jörn Engel
2004-04-29 10:19               ` Pavel Machek
2004-04-29 17:17             ` Tim Connors
2004-04-28  1:00 ` David Lang
2004-04-28 10:09   ` Jörn Engel
2004-04-28 10:21     ` Nikita Danilov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox