Poor performance during disk writes

public inbox for linux-kernel@vger.kernel.org

* Poor performance during disk writes
@ 2001-12-18  0:53 jlm
  2001-12-18  1:09 ` Reid Hekman
  2001-12-18 18:46 ` Andre Hedrick
  0 siblings, 2 replies; 20+ messages in thread
From: jlm @ 2001-12-18  0:53 UTC
  To: linux-kernel

I have been witnessing what I believe to be poor performance from my
computer ever since I have moved into the 2.4.x kernel versions. Combing
through the unofficial archives of this mailing list reveals some others
having similar problems, but I haven't seen any real resolution, or
maybe their problems are completely different than mine.

The problem simply is that whenever the computer does a big disk write,
everything else is put on hold. Maybe this isn't a problem, but just the
way it was written. I have tested this on a 2.2.x kernel and it also
does it, but to a much lesser extent, to the point that I noticed the
performance loss in the upgrade to a 2.4.x kernel and decided to
investigate further.

But also, I do not have any performance problems with disk reads.
Programs can be loading up all they want and I am able to use my
computer for other things during that time. It just seems to me (maybe
I'm wrong) that the computer should be able to send small bits of data
to the disk for writing during the off cycles and not affect the rest of
the system (which is what I imagine it is doing for reads).

To test, I've got a 142Meg file. I copy it around, making sure to copy
from one disk to another. Of course the copy goes fine, because it goes
into the cache (as I've been reading here), but eventually it needs to write
out to disk (or when I do a sync) and here is where the computer hangs
for a bit. If an mp3 is playing, it halts for 5 seconds at a time, mouse
movement on the screen is VERY jerky, Gkrellm will stop updating for
seconds and even just in console I can't type in stuff for a bit.

I've been using hdparm to try and tweak hard disk access, but I'm not so
sure this is the problem, and it's making me more confused about the
entire situation. hdparm doesn't allow me to set using_dma, which it
seems ought to be a necessity for getting a decent speed out of your
hard drives (not that speed is the problem here), but despite that I
still get a 51MB/s cache read speed in testing. Confusing: is the hard
drive using (u)dma or not? Also, unmaskirq makes things a bit slower.

So, the questions: Is there a way for me to stop this, some configure
option? Is it a bug/performance issue that needs to be addressed in the
kernel? Should I just go back to the 2.2.x kernel series and shut up
already?

I'm running 3 hard drives (30G Maxtor, 20G Seagate, and 2.1G Quantum
Fireball) on an AMD k6-2 3dnow with a Gigabyte GA-5AX MOBO and the ALI
Aladin V chipset.

Thanks for your time, and let me know if you need any more info/output
from dmesg or something.

-- 
MACINTOSH = Machine Always Crashes If Not The Operating System Hangs
"Life would be so much easier if we could just look at the source code."
- Dave Olson


* Re: Poor performance during disk writes
  2001-12-18  0:53 Poor performance during disk writes jlm
@ 2001-12-18  1:09 ` Reid Hekman
  2001-12-18  1:36   ` jlm
  2001-12-18 18:46 ` Andre Hedrick
  1 sibling, 1 reply; 20+ messages in thread
From: Reid Hekman @ 2001-12-18  1:09 UTC
  To: jlm; +Cc: linux-kernel


> So, the questions: Is there a way for me to stop this, some configure
> option? Is it a bug/performance issue that needs to be addressed in the
> kernel? Should I just go back to the 2.2.x kernel series and shut up
> already?
> 
> I'm running 3 hard drives (30G Maxtor, 20G Seagate, and 2.1G Quantum
> Fireball) on an AMD k6-2 3dnow with a Gigabyte GA-5AX MOBO and the ALI
> Aladin V chipset.
> 
> Thanks for your time, and let me know if you need any more info/output
> from dmesg or something.

Specific kernel version, df, & hdparm output would all be helpful. 


Regards,
Reid




* Re: Poor performance during disk writes
  2001-12-18  1:09 ` Reid Hekman
@ 2001-12-18  1:36   ` jlm
  2001-12-18  2:01     ` Reid Hekman
  0 siblings, 1 reply; 20+ messages in thread
From: jlm @ 2001-12-18  1:36 UTC
  To: linux-kernel

On Mon, 2001-12-17 at 20:09, Reid Hekman wrote:

> Specific kernel version, df, & hdparm output would all be helpful. 
/usr 24> uname -a
Linux PC2 2.4.16 #1 Sun Dec 2 15:26:09 EST 2001 i586 unknown
/usr 25> df . 
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hdb1              6047724   3472788   2267728  61% /usr
/usr 26> hdparm -I /dev/hdb

/dev/hdb:

non-removable ATA device, with non-removable media
        Model Number:           ST320413A                               
        Serial Number:          6ED2305M            
        Firmware Revision:      3.39    
Standards:
        Supported: 1 2 3 4 5 
        Likely used: 5
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        bytes/track:    0               (obsolete)
        bytes/sector:   0               (obsolete)
        current sector capacity: 16514064
        LBA user addressable sectors = 39102336
Capabilities:
        LBA, IORDY(can be disabled)
        Buffer size: 512.0kB    Queue depth: 1
        Standby timer values: spec'd by standard
        r/w multiple sector transfer: Max = 16  Current = 16
        DMA: mdma0 mdma1 *mdma2 udma0 udma1 udma2 udma3 udma4 udma5 
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4 
             Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    READ BUFFER cmd
           *    WRITE BUFFER cmd
           *    Host Protected Area feature set
           *    look-ahead
           *    write cache
           *    Power Management feature set
                Security Mode feature set
                SMART feature set
                SET MAX security extension
           *    DOWNLOAD MICROCODE cmd
Security: 
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
HW reset results:
        CBLID- above Vih
        Device num = 1
Checksum: correct

My hdb hard drive is where I found the problem originally. Also, I'm
running the ext2 filesystem.




* Re: Poor performance during disk writes
  2001-12-18  1:36   ` jlm
@ 2001-12-18  2:01     ` Reid Hekman
  0 siblings, 0 replies; 20+ messages in thread
From: Reid Hekman @ 2001-12-18  2:01 UTC
  To: jlm; +Cc: linux-kernel

On Mon, 2001-12-17 at 19:36, jlm wrote:
> On Mon, 2001-12-17 at 20:09, Reid Hekman wrote:
> 
> > Specific kernel version, df, & hdparm output would all be helpful. 
> /usr 24> uname -a
> Linux PC2 2.4.16 #1 Sun Dec 2 15:26:09 EST 2001 i586 unknown

Is PCI IDE support for your chipset compiled in? PCI DMA by default?

> non-removable ATA device, with non-removable media
>         Model Number:           ST320413A                               
>         Serial Number:          6ED2305M            
>         Firmware Revision:      3.39    
[...]
> Capabilities:
>         LBA, IORDY(can be disabled)
>         Buffer size: 512.0kB    Queue depth: 1
>         Standby timer values: spec'd by standard
>         r/w multiple sector transfer: Max = 16  Current = 16
>         DMA: mdma0 mdma1 *mdma2 udma0 udma1 udma2 udma3 udma4 udma5 
>              Cycle time: min=120ns recommended=120ns

Can you set udma on the drive instead?


Regards,
Reid



* Re: Poor performance during disk writes
  2001-12-18  0:53 Poor performance during disk writes jlm
  2001-12-18  1:09 ` Reid Hekman
@ 2001-12-18 18:46 ` Andre Hedrick
  2001-12-18 17:42   ` Gérard Roudier
  1 sibling, 1 reply; 20+ messages in thread
From: Andre Hedrick @ 2001-12-18 18:46 UTC
  To: jlm; +Cc: linux-kernel



File './Bonnie.2276', size: 1073741824, volumes: 1
Writing with putc()...  done:  72692 kB/s  83.7 %CPU
Rewriting...            done:  25355 kB/s  12.0 %CPU
Writing intelligently...done: 103022 kB/s  40.5 %CPU
Reading with getc()...  done:  37188 kB/s  67.5 %CPU
Reading intelligently...done:  40809 kB/s  11.4 %CPU
Seeker 2...Seeker 1...Seeker 3...start 'em...done...done...done...
              ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
       1*1024 72692 83.7 103022 40.5 25355 12.0 37188 67.5 40809 11.4  382.1  2.4

Maybe this is the kind of performance you want out of your ATA subsystem.
Maybe if I could get a patch into the kernels, we could all have stable
and fast IO.

Regards,


Andre Hedrick
CEO/President, LAD Storage Consulting Group
Linux ATA Development
Linux Disk Certification Project

 On 17 Dec 2001, jlm wrote:

> I have been witnessing what I believe to be poor performance from my
> computer ever since I have moved into the 2.4.x kernel versions. Combing
> through the unofficial archives of this mailing list reveals some others
> having similar problems, but I haven't seen any real resolution, or
> maybe their problems are completely different than mine.
> 
> The problem simply is that whenever the computer does a big disk write,
> everything else is put on hold. Maybe this isn't a problem, but just the
> way it was written. I have tested this on a 2.2.x kernel and it also
> does it, but to a much lesser extent, to the point that I noticed the
> performance loss in the upgrade to a 2.4.x kernel and decided to
> investigate further.
> 
> But also, I do not have any performance problems with disk reads.
> Programs can be loading up all they want and I am able to use my
> computer for other things during that time. It just seems to me (maybe
> I'm wrong) that the computer should be able to send small bits of data
> to the disk for writing during the off cycles and not affect the rest of
> the system (which is what I imagine it is doing for reads).
> 
> To test, I've got a 142Meg file. I copy it around, making sure to copy
> from one disk to another. Of course the copy goes fine, because it goes
> into the cache (as I've been reading here), but eventually it needs to write
> out to disk (or when I do a sync) and here is where the computer hangs
> for a bit. If an mp3 is playing, it halts for 5 seconds at a time, mouse
> movement on the screen is VERY jerky, Gkrellm will stop updating for
> seconds and even just in console I can't type in stuff for a bit.
> 
> I've been using hdparm to try and tweak hard disk access, but I'm not so
> sure this is the problem, and it's making me more confused about the
> entire situation. hdparm doesn't allow me to set using_dma, which it
> seems ought to be a necessity for getting a decent speed out of your
> hard drives (not that speed is the problem here), but despite that I
> still get a 51MB/s cache read speed in testing. Confusing: is the hard
> drive using (u)dma or not? Also, unmaskirq makes things a bit slower.
> 
> So, the questions: Is there a way for me to stop this, some configure
> option? Is it a bug/performance issue that needs to be addressed in the
> kernel? Should I just go back to the 2.2.x kernel series and shut up
> already?
> 
> I'm running 3 hard drives (30G Maxtor, 20G Seagate, and 2.1G Quantum
> Fireball) on an AMD k6-2 3dnow with a Gigabyte GA-5AX MOBO and the ALI
> Aladin V chipset.
> 
> Thanks for your time, and let me know if you need any more info/output
> from dmesg or something.
> 
> -- 
> MACINTOSH = Machine Always Crashes If Not The Operating System Hangs
> "Life would be so much easier if we could just look at the source code."
> - Dave Olson



* Re: Poor performance during disk writes
  2001-12-18 18:46 ` Andre Hedrick
@ 2001-12-18 17:42   ` Gérard Roudier
  2001-12-18 20:34     ` Andre Hedrick
  2001-12-21 16:29     ` Poor performance during disk writes Troy Benjegerdes
  0 siblings, 2 replies; 20+ messages in thread
From: Gérard Roudier @ 2001-12-18 17:42 UTC
  To: Andre Hedrick; +Cc: jlm, linux-kernel



On Tue, 18 Dec 2001, Andre Hedrick wrote:

> File './Bonnie.2276', size: 1073741824, volumes: 1
> Writing with putc()...  done:  72692 kB/s  83.7 %CPU
> Rewriting...            done:  25355 kB/s  12.0 %CPU
> Writing intelligently...done: 103022 kB/s  40.5 %CPU
> Reading with getc()...  done:  37188 kB/s  67.5 %CPU
> Reading intelligently...done:  40809 kB/s  11.4 %CPU
> Seeker 2...Seeker 1...Seeker 3...start 'em...done...done...done...
>               ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
>        1*1024 72692 83.7 103022 40.5 25355 12.0 37188 67.5 40809 11.4  382.1  2.4
>
> Maybe this is the kind of performance you want out of your ATA subsystem.
> Maybe if I could get a patch into the kernels, we could all have stable
> and fast IO.

I see a lot of waste rather than performance here. Bonnie says
that your subsystem can sustain 103 MB/s write but only 41 MB/s read.
That looks like about 60% of the throughput wasted on reads.

Note that if you intend to use it only for write-only applications,
the performance is not that bad, even if just dropping the data on the floor
would give you infinite throughput without any difference in
functionality. :-)


Gérard Roudier
Not CEO, not President of anything.

> Regards,
>
>
> Andre Hedrick
> CEO/President, LAD Storage Consulting Group
> Linux ATA Development
> Linux Disk Certification Project



* Re: Poor performance during disk writes
  2001-12-18 17:42   ` Gérard Roudier
@ 2001-12-18 20:34     ` Andre Hedrick
  2001-12-18 19:09       ` Gérard Roudier
  2001-12-21 16:29     ` Poor performance during disk writes Troy Benjegerdes
  1 sibling, 1 reply; 20+ messages in thread
From: Andre Hedrick @ 2001-12-18 20:34 UTC
  To: Gérard Roudier; +Cc: jlm, linux-kernel

On Tue, 18 Dec 2001, Gérard Roudier wrote:

> 
> 
> On Tue, 18 Dec 2001, Andre Hedrick wrote:
> 
> > File './Bonnie.2276', size: 1073741824, volumes: 1
> > Writing with putc()...  done:  72692 kB/s  83.7 %CPU
> > Rewriting...            done:  25355 kB/s  12.0 %CPU
> > Writing intelligently...done: 103022 kB/s  40.5 %CPU
> > Reading with getc()...  done:  37188 kB/s  67.5 %CPU
> > Reading intelligently...done:  40809 kB/s  11.4 %CPU
> > Seeker 2...Seeker 1...Seeker 3...start 'em...done...done...done...
> >               ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
> >               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
> > Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
> >        1*1024 72692 83.7 103022 40.5 25355 12.0 37188 67.5 40809 11.4  382.1  2.4
> >
> > Maybe this is the kind of performance you want out of your ATA subsystem.
> > Maybe if I could get a patch into the kernels, we could all have stable
> > and fast IO.
> 
> I see a lot of waste rather than performance here. Bonnie says
> that your subsystem can sustain 103 MB/s write but only 41 MB/s read.
> That looks like about 60% of the throughput wasted on reads.
> 
> Note that if you intend to use it only for write-only applications,
> the performance is not that bad, even if just dropping the data on the floor
> would give you infinite throughput without any difference in
> functionality. :-)

Well, since somebody paid/is paying me to make write performance go through the
roof -- that is what I did. Now if you look closely you can see that in
writing we are doing a boat load more work than in reading. If somebody wants
me to throttle the reads more, then they know how to get it done.

Regards,

Andre Hedrick
Linux Disk Certification Project                Linux ATA Development



* Re: Poor performance during disk writes
  2001-12-18 20:34     ` Andre Hedrick
@ 2001-12-18 19:09       ` Gérard Roudier
  2001-12-19 23:26         ` jlm
  0 siblings, 1 reply; 20+ messages in thread
From: Gérard Roudier @ 2001-12-18 19:09 UTC
  To: Andre Hedrick; +Cc: jlm, linux-kernel



On Tue, 18 Dec 2001, Andre Hedrick wrote:

> On Tue, 18 Dec 2001, Gérard Roudier wrote:
>
> >
> >
> > On Tue, 18 Dec 2001, Andre Hedrick wrote:
> >
> > > File './Bonnie.2276', size: 1073741824, volumes: 1
> > > Writing with putc()...  done:  72692 kB/s  83.7 %CPU
> > > Rewriting...            done:  25355 kB/s  12.0 %CPU
> > > Writing intelligently...done: 103022 kB/s  40.5 %CPU
> > > Reading with getc()...  done:  37188 kB/s  67.5 %CPU
> > > Reading intelligently...done:  40809 kB/s  11.4 %CPU
> > > Seeker 2...Seeker 1...Seeker 3...start 'em...done...done...done...
> > >               ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
> > >               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
> > > Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
> > >        1*1024 72692 83.7 103022 40.5 25355 12.0 37188 67.5 40809 11.4  382.1  2.4
> > >
> > > Maybe this is the kind of performance you want out of your ATA subsystem.
> > > Maybe if I could get a patch into the kernels, we could all have stable
> > > and fast IO.
> >
> > I see a lot of waste rather than performance here. Bonnie says
> > that your subsystem can sustain 103 MB/s write but only 41 MB/s read.
> > That looks like about 60% of the throughput wasted on reads.
> >
> > Note that if you intend to use it only for write-only applications,
> > the performance is not that bad, even if just dropping the data on the floor
> > would give you infinite throughput without any difference in
> > functionality. :-)
>
> Well, since somebody paid/is paying me to make write performance go through the
> roof -- that is what I did. Now if you look closely you can see that in
> writing we are doing a boat load more work than in reading. If somebody wants
> me to throttle the reads more, then they know how to get it done.

I am not the one that will pay you for that, as you can guess. :-)

I was just curious about the technical reasons, if any, for so large a
difference. The CPU and the memory subsystem are certainly not the
issue. But I do not want to prevent you from earning from that kind of
improvement. Hence, let me go back to free SCSI.

  Gérard.

> Regards,
>
> Andre Hedrick
> Linux Disk Certification Project                Linux ATA Development



* Re: Poor performance during disk writes
  2001-12-18 19:09       ` Gérard Roudier
@ 2001-12-19 23:26         ` jlm
  2001-12-20 10:49           ` Helge Hafting
  0 siblings, 1 reply; 20+ messages in thread
From: jlm @ 2001-12-19 23:26 UTC
  To: linux-kernel

On Tue, 2001-12-18 at 14:09, Gérard Roudier wrote:
> 
> 
> On Tue, 18 Dec 2001, Andre Hedrick wrote:
> 
> > On Tue, 18 Dec 2001, Gérard Roudier wrote:
> >
> > >
> > >
> > > On Tue, 18 Dec 2001, Andre Hedrick wrote:
> > >
> > > > File './Bonnie.2276', size: 1073741824, volumes: 1
> > > > Writing with putc()...  done:  72692 kB/s  83.7 %CPU
> > > > Rewriting...            done:  25355 kB/s  12.0 %CPU
> > > > Writing intelligently...done: 103022 kB/s  40.5 %CPU
> > > > Reading with getc()...  done:  37188 kB/s  67.5 %CPU
> > > > Reading intelligently...done:  40809 kB/s  11.4 %CPU
> > > > Seeker 2...Seeker 1...Seeker 3...start 'em...done...done...done...
> > > >               ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
> > > >               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
> > > > Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
> > > >        1*1024 72692 83.7 103022 40.5 25355 12.0 37188 67.5 40809 11.4  382.1  2.4
> > > >
> > > > Maybe this is the kind of performance you want out of your ATA subsystem.
> > > > Maybe if I could get a patch into the kernels, we could all have stable
> > > > and fast IO.

I think people might be missing the issue that I'm having, here. Let me
see if I can clarify. I'm not too concerned about write speed. I don't
care too much if the hard drive can only write one byte per second. The
problem is that when the kernel decides to write out to the disk, it is
pre-empting everything else. All output to the user in X, the sound
card, and also text typing in the console is put "on the back burner"
while the disk is written to.

It seems to me that smaller chunks of data can be written to the disk
without disrupting my use of the computer (which is the case with
untarring a small file, for instance), so if the kernel has got a lot to
write to disk, just do that as a bunch of smaller writes and we should
be fine.

So I guess I don't really care what mode the hard drive is operating in
(udma, mdma, dma or plain ide), I just don't want to have to go get a
cup of coffee while the hard drive saves some data. Is there a "don't
pre-empt the rest of the system" switch for the eide drives? Is there
something fundamental/unique going on here that I'm missing?

Thanks for listening.

> > >
> > > I see a lot of waste rather than performance here. Bonnie says
> > > that your subsystem can sustain 103 MB/s write but only 41 MB/s read.
> > > That looks like about 60% of the throughput wasted on reads.
> > >
> > > Note that if you intend to use it only for write-only applications,
> > > the performance is not that bad, even if just dropping the data on the floor
> > > would give you infinite throughput without any difference in
> > > functionality. :-)
> >
> > Well, since somebody paid/is paying me to make write performance go through the
> > roof -- that is what I did. Now if you look closely you can see that in
> > writing we are doing a boat load more work than in reading. If somebody wants
> > me to throttle the reads more, then they know how to get it done.
> 
> I am not the one that will pay you for that, as you can guess. :-)
> 
> I was just curious about the technical reasons, if any, for so large a
> difference. The CPU and the memory subsystem are certainly not the
> issue. But I do not want to prevent you from earning from that kind of
> improvement. Hence, let me go back to free SCSI.




* Re: Poor performance during disk writes
  2001-12-19 23:26         ` jlm
@ 2001-12-20 10:49           ` Helge Hafting
  2001-12-20 11:16             ` Oops in 2.4.14-pre6 and 2.4.14-pre9aa1 Andre Margis
  0 siblings, 1 reply; 20+ messages in thread
From: Helge Hafting @ 2001-12-20 10:49 UTC
  To: jlm, linux-kernel

jlm wrote:

> I think people might be missing the issue that I'm having, here. Let me
> see if I can clarify. I'm not too concerned about write speed. I don't
> care too much if the hard drive can only write one byte per second. The
> problem is that when the kernel decides to write out to the disk, it is
> pre-empting everything else. All output to the user in X, the sound
> card, and also text typing in the console is put "on the back burner"
> while the disk is written to.

There may be a problem here, and maybe not:

All of the actions above _may_ require disk access. The shell you type
into could be swapped out, for example. A slow disk will be a problem
in that case: swap-in won't happen until the disk head seeks to
the relevant position, and that may be delayed by the write, even
if the CPU is capable of doing other work while I/O is going on.

> It seems to me that smaller chunks of data can be written to the disk
> without disrupting my use of the computer (which is the case with
> untarring a small file, for instance), so if the kernel has got a lot to
> write to disk, just do that as a bunch of smaller writes and we should
> be fine.
> 
> So I guess I don't really care what mode the hard drive is operating in
> (udma, mdma, dma or plain ide), I just don't want to have to go get a
> cup of coffee while the hard drive saves some data. Is there a "don't

Devices generally get the CPU before anything else. A good disk system
doesn't need much CPU. Running IDE in PIO mode requires a lot
of CPU, though. Using any of the DMA modes avoids that.

> pre-empt the rest of the system" switch for the eide drives? Is there
> something fundamental/unique going on here that I'm missing?

Enabling DMA (mdma, udma, etc.) is that switch. It lets the CPU do other work (such as
redrawing X) while the disk is busy.  Plain ide is what you don't want.

The problem of waiting for other files or swapping while a really big
write is going on is different. Get more drives, so the big writes go
to one drive while you get stuff swapped in (or other file access)
on other drive(s). The kernel is capable of getting fast response
from one drive while another is completely bogged down with
enormous writes.

Helge Hafting


* Oops in 2.4.14-pre6 and 2.4.14-pre9aa1
  2001-12-20 10:49           ` Helge Hafting
@ 2001-12-20 11:16             ` Andre Margis
  0 siblings, 0 replies; 20+ messages in thread
From: Andre Margis @ 2001-12-20 11:16 UTC
  To: linux-kernel

I'm running an application server on a DELL POWEREDGE 8450 with 4x P-III 700
MHz CPUs and 4GB of memory. After 40 days running kernel 2.4.14-pre6, my system
reported the following error in /var/adm/messages:

Dec 13 13:52:42 front01 kernel: invalid operand: 0000
Dec 13 13:52:42 front01 kernel: CPU:    3
Dec 13 13:52:42 front01 kernel: EIP:    0010:[<c012e764>]    Not tainted
Dec 13 13:52:42 front01 kernel: EFLAGS: 00010282
Dec 13 13:52:42 front01 kernel: eax: 00000880   ebx: c855e280   ecx: c855e280   edx: 00000000
Dec 13 13:52:42 front01 kernel: esi: fe000ff4   edi: 00000000   ebp: 00000ff1   esp: ce439eb0
Dec 13 13:52:42 front01 kernel: ds: 0018   es: 0018   ss: 0018
Dec 13 13:52:42 front01 kernel: Process ps.bin (pid: 29154, stackpage=ce439000)
Dec 13 13:52:42 front01 kernel: Stack: c855e280 fe000ff4 e58f3003 00000ff1 f40252a3 c01f4bd8 c021bb70 c01504aa
Dec 13 13:52:42 front01 kernel:        c855e280 c012eebc c011daf8 00000003 00000003 bffffff1 eef09920 c011dbb6
Dec 13 13:52:42 front01 kernel:        ca07b400 eef09920 bffffff1 e58f3000 00000003 00000000 ca07b400 ca07b41c
Dec 13 13:52:42 front01 kernel: Call Trace: [<c01504aa>] [<c012eebc>] [<c011daf8>] [<c011dbb6>] [<c011dc3a>]
Dec 13 13:52:42 front01 kernel:    [<c014f66a>] [<c014f8db>] [<c01343d7>] [<c0106f73>]
Dec 13 13:52:42 front01 kernel:
Dec 13 13:52:42 front01 kernel: Code: 0f 0b ba 00 e0 ff ff 80 63 18 eb 21 e2 f6 42 05 20 0f 85 55

I changed the kernel to 2.4.15-pre9aa1, and today, one week later, the same
error occurred again, with this message:
Dec 20 01:21:58 front01 kernel: invalid operand: 0000
Dec 20 01:21:58 front01 kernel: CPU:    3
Dec 20 01:21:58 front01 kernel: EIP:    0010:[<c013153b>]    Not tainted
Dec 20 01:21:58 front01 kernel: EFLAGS: 00010202
Dec 20 01:21:58 front01 kernel: eax: 00000840   ebx: c5ba3ec0   ecx: c5ba3ec0   edx: 00000000
Dec 20 01:21:58 front01 kernel: esi: fe000f81   edi: 00000000   ebp: 00000f7b   esp: c2cebeb0
Dec 20 01:21:58 front01 kernel: ds: 0018   es: 0018   ss: 0018
Dec 20 01:21:58 front01 kernel: Process ps.bin (pid: 19556, stackpage=c2ceb000)
Dec 20 01:21:58 front01 kernel: Stack: c5ba3ec0 fe000f81 cd7b1006 00000f7b c0238fb0 c0154d8a d51db340 d51db340
Dec 20 01:21:58 front01 kernel:        c5ba3ec0 c0131d98 c011ffb8 00000006 00000006 bfffff7b c8b8e740 c0120076
Dec 20 01:21:58 front01 kernel:        c294b820 c8b8e740 bfffff7b cd7b1000 00000006 00000000 c294b83c c294b820
Dec 20 01:21:58 front01 kernel: Call Trace: [<c0154d8a>] [<c0131d98>] [<c011ffb8>] [<c0120076>] [<c0120119>]
Dec 20 01:21:58 front01 kernel:    [<c0153f3a>] [<c01541ab>] [<c0138117>] [<c0106f73>]
Dec 20 01:21:58 front01 kernel:
Dec 20 01:21:58 front01 kernel: Code: 0f 0b 8b 43 18 a8 80 74 02 0f 0b b9 00 e0 ff ff 80 63 18 eb

ksymoops:

ksymoops 2.4.1 on i686 2.4.15-pre9.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.15-pre9/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Error (expand_objects): cannot stat(/lib/reiserfs.o) for reiserfs
Error (expand_objects): cannot stat(/lib/sym53c8xx.o) for sym53c8xx
Error (expand_objects): cannot stat(/lib/qla2x00.o) for qla2x00
Error (expand_objects): cannot stat(/lib/megaraid.o) for megaraid
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/net/ipv4/netfilter/netfilter.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/net/ipv6/netfilter/netfilter.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/net/fc/fc.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/net/wan/wan.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/net/appletalk/appletalk.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/net/tokenring/tr.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/net/pcmcia/pcmcia_net.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/net/wireless/wireless_net.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/misc/misc.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/cdrom/driver.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/media/radio/radio.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/media/video/video.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/media/media.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/sound/sounddrivers.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/parport/driver.o
Warning (read_object): no symbols in /lib/modules/2.4.15-pre9/build/drivers/hotplug/vmlinux-obj.o
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says c01a3160, System.map says c0158b20.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol vg  , lvm-mod says c89d8b20, /lib/modules/2.4.15-pre9/kernel/drivers/md/lvm-mod.o says c89d8780.  Ignoring /lib/modules/2.4.15-pre9/kernel/drivers/md/lvm-mod.o entry
Warning (map_ksym_to_module): cannot match loaded module reiserfs to a unique module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module sym53c8xx to a unique module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module qla2x00 to a unique module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module megaraid to a unique module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module scsi_mod to a unique module object.  Trace may not be reliable.
Dec 20 01:21:58 front01 kernel: invalid operand: 0000
Dec 20 01:21:58 front01 kernel: CPU:    3
Dec 20 01:21:58 front01 kernel: EIP:    0010:[<c013153b>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Dec 20 01:21:58 front01 kernel: EFLAGS: 00010202
Dec 20 01:21:58 front01 kernel: eax: 00000840   ebx: c5ba3ec0   ecx: c5ba3ec0   edx: 00000000
Dec 20 01:21:58 front01 kernel: esi: fe000f81   edi: 00000000   ebp: 00000f7b   esp: c2cebeb0
Dec 20 01:21:58 front01 kernel: ds: 0018   es: 0018   ss: 0018
Dec 20 01:21:58 front01 kernel: Process ps.bin (pid: 19556, stackpage=c2ceb000)
Dec 20 01:21:58 front01 kernel: Stack: c5ba3ec0 fe000f81 cd7b1006 00000f7b c0238fb0 c0154d8a d51db340 d51db340
Dec 20 01:21:58 front01 kernel:        c5ba3ec0 c0131d98 c011ffb8 00000006 00000006 bfffff7b c8b8e740 c0120076
Dec 20 01:21:58 front01 kernel:        c294b820 c8b8e740 bfffff7b cd7b1000 00000006 00000000 c294b83c c294b820
Dec 20 01:21:58 front01 kernel: Call Trace: [<c0154d8a>] [<c0131d98>] [<c011ffb8>] [<c0120076>] [<c0120119>]

Warning (Oops_read): Code line not seen, dumping what data is available

>>EIP; c013153b <__free_pages_ok+4b/238>   <=====
Trace; c0154d8a <proc_base_lookup+22a/23c>
Trace; c0131d98 <__free_pages+1c/20>
Trace; c011ffb8 <access_one_page+244/2a0>
Trace; c0120076 <access_mm+62/7c>
Trace; c0120119 <access_process_vm+89/c4>
Trace; c0153f3a <proc_pid_cmdline+62/e8>
Trace; c01541ab <proc_info_read+53/110>
Trace; c0138117 <sys_read+8f/c4>
Trace; c0106f73 <system_call+33/38>

26 warnings and 6 errors issued.  Results may not be reliable.

In all of these oopses the system stays up, but if you run a ps command,
that process freezes.

It's still possible to reboot the machine using reboot -f.

Any help?

Thanks in advance

Andre Margis

* Re: Poor performance during disk writes
  2001-12-18 17:42   ` Gérard Roudier
  2001-12-18 20:34     ` Andre Hedrick
@ 2001-12-21 16:29     ` Troy Benjegerdes
  1 sibling, 0 replies; 20+ messages in thread
From: Troy Benjegerdes @ 2001-12-21 16:29 UTC (permalink / raw)
  To: Gérard Roudier; +Cc: Andre Hedrick, jlm, linux-kernel

On Tue, Dec 18, 2001 at 06:42:49PM +0100, Gérard Roudier wrote:
> 
> 
> On Tue, 18 Dec 2001, Andre Hedrick wrote:
> 
> > File './Bonnie.2276', size: 1073741824, volumes: 1
> > Writing with putc()...  done:  72692 kB/s  83.7 %CPU
> > Rewriting...            done:  25355 kB/s  12.0 %CPU
> > Writing intelligently...done: 103022 kB/s  40.5 %CPU
> > Reading with getc()...  done:  37188 kB/s  67.5 %CPU
> > Reading intelligently...done:  40809 kB/s  11.4 %CPU
> > Seeker 2...Seeker 1...Seeker 3...start 'em...done...done...done...
> >               ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
> >               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
> > Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
> >        1*1024 72692 83.7 103022 40.5 25355 12.0 37188 67.5 40809 11.4  382.1  2.4
> >
> > Maybe this is the kind of performance you want out your ATA subsystem.
> > Maybe if I could get a patch in to the kernels we could all have stable
> > and fast IO.
> 
> I rather see lots of wasting rather than performance, here. Bonnie says
> that your subsystem can sustain 103 MB/s write but only 41 MB/s read. This
> looks about 60% throughput wasted for read.

Uh, well, um, what drive is he writing to?? He could very well have 2 gig
of memory in this box, with half the writes cached. 41MB/s seems
reasonable for most common IDE disks. Of course I know Andre has some
rather 'uncommon' IDE drives :P

Does bonnie actually do any sort of 'sync' operation to ensure the data
written is on the disk? Is that 100MB/sec write real, or just an effect of
block layer caching?
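Gérard's "about 60%" and Troy's caching question can both be checked with quick arithmetic. The sketch below is plain C with made-up function names: it computes the read shortfall from the Bonnie figures quoted above (kB/s) and applies a deliberately crude test of whether the 1 GB test file could fit in RAM at all. The 2 GB memory size is only Troy's hypothetical; the thread does not state the machine's actual RAM.

```c
/* Hypothetical helper: how far read throughput falls short of write
 * throughput, as a percentage of the write figure (both in kB/s). */
double read_shortfall_pct(double write_kbs, double read_kbs)
{
    return 100.0 * (1.0 - read_kbs / write_kbs);
}

/* Crude upper-bound check for the caching question (sizes in MB):
 * if the test file is smaller than RAM, a write benchmark that never
 * forces data to disk may partly be measuring the page cache.
 * Real kernels never give all of RAM to dirty data, so "fits" here
 * only means caching cannot be ruled out. */
int file_fits_in_ram(long file_mb, long ram_mb)
{
    return file_mb < ram_mb;
}
```

With the Bonnie numbers, read_shortfall_pct(103022, 40809) comes out a little over 60, consistent with Gérard's estimate, while file_fits_in_ram(1024, 2048) is true, so the numbers alone cannot distinguish a slow read path from a cached write.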

> 
> Note that if you intend to use it only for write-only applications,
> performance are not that bad, even if just dropping the data on the floor
> would give you infinite throughput without any difference in
> functionnality. :-)
> 
> 
> Gérard Roudier
> Not CEO, not President of anything.
> 
> > Regards,
> >
> >
> > Andre Hedrick
> > CEO/President, LAD Storage Consulting Group
> > Linux ATA Development
> > Linux Disk Certification Project
> 

-- 
Troy Benjegerdes | master of mispeeling | 'da hozer' |  hozer@drgw.net
-----"If this message isn't misspelled, I didn't write it" -- Me -----
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's 
why I draw cartoons. It's my life." -- Charles Schulz

* Re: Poor performance during disk writes
@ 2001-12-20 13:27 Dieter Nützel
  2001-12-20 14:51 ` safemode
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Dieter Nützel @ 2001-12-20 13:27 UTC (permalink / raw)
  To: Helge Hafting; +Cc: jlm, Andre Hedrick, Linux Kernel List

On Thursday, 20.12.2001, 10:49 Helge Hafting wrote:
> jlm wrote:
[-]
> > So I guess I don't really care what mode the hard drive is operating in
> > (udma, mdma, dma or plain ide), I just don't want to have to go get a
> > cup of coffee while the hard drive saves some data. Is there a "don't
>
> Devices generally get the cpu before anything else.  A good disk system
> don't need much cpu.  Running IDE in PIO mode require a lot
> of cpu though.   Using any of the DMA modes avoids that.

Amen...
Sorry Helge, you are surely right in theory, but try dbench 32 (or
bonnie/bonnie++) while playing an MP3/Ogg Vorbis file in parallel...
That's my first test on any "new" kernel version.

Even with a 1 GHz Athlon II, 640 MB, U160 DDYS 18 GB, 10k IBM disk (on an
AHA-2940UW) it stutters like mad. I am running all my kernels _with_ Robert
Love's preempt + lock-break patches, and they don't solve the problem.
CPU load is (very) low, but it does not work like it should.
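One cheap way to put a number on "stutters like mad" is to measure how late a periodic sleeper wakes up while the write load runs. The following is a sketch under stated assumptions (a POSIX system providing clock_gettime() with CLOCK_MONOTONIC and nanosleep(); max_sleep_overshoot_us() is a name invented here). It is not the latency-measurement patch Robert Love maintains, just a userspace probe.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Difference b - a in microseconds. */
static long elapsed_us(const struct timespec *a, const struct timespec *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000000L
         + (b->tv_nsec - a->tv_nsec) / 1000L;
}

/*
 * Sleep for period_us, `iterations` times, and return the worst-case
 * overshoot (actual sleep minus requested sleep) in microseconds.
 * A periodic task such as an audio player needs to wake on time, so a
 * large overshoot under write load corresponds to an audible hiccup.
 */
long max_sleep_overshoot_us(int iterations, long period_us)
{
    struct timespec req;
    long worst = 0;

    req.tv_sec = period_us / 1000000L;
    req.tv_nsec = (period_us % 1000000L) * 1000L;

    for (int i = 0; i < iterations; i++) {
        struct timespec before, after;

        clock_gettime(CLOCK_MONOTONIC, &before);
        nanosleep(&req, NULL);          /* EINTR ignored for brevity */
        clock_gettime(CLOCK_MONOTONIC, &after);

        long over = elapsed_us(&before, &after) - period_us;
        if (over > worst)
            worst = over;
    }
    return worst;
}
```

Calling max_sleep_overshoot_us(1000, 10000) once on an idle system and once while dbench 32 runs gives two worst-case figures whose difference is a rough, repeatable measure of the hiccup, independent of whether the CPU load looks low.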

> > pre-empt the rest of the system" switch for the eide drives? Is there
> > something fundamental/unique going on here that I'm missing?
> dma, udma, etc. is that switch.  It lets the cpu do other work (such as
> redrawing X) while the disk is busy.  Plain ide is what you don't want.

See above: the whole system shows some bad hiccups.

> The problem of waiting for other files or swapping while a really big
> write is going on is different.  Get more drives, so the big writes go
> to one drive while you get stuff swapped in (or other file access)
> on other drive(s).  The kernel is capable of getting fast response
> from one drive while another is completely bogged down with
> enormous writes.

Tried this already. Whether I put my test files (MP3/Ogg Vorbis) in /dev/shm
or on another disk, it does not change anything.

There must be something in the VFS?

-Dieter

-- 
Dieter Nützel
Graduate Student, Computer Science
University of Hamburg
@home: Dieter.Nuetzel@hamburg.de

* Re: Poor performance during disk writes
  2001-12-20 13:27 Dieter Nützel
@ 2001-12-20 14:51 ` safemode
  2001-12-20 17:40 ` William Lee Irwin III
       [not found] ` <0112201629230E.01835@manta>
  2 siblings, 0 replies; 20+ messages in thread
From: safemode @ 2001-12-20 14:51 UTC (permalink / raw)
  To: Dieter Nützel; +Cc: Helge Hafting, jlm, Andre Hedrick, Linux Kernel List

On Thu, 2001-12-20 at 08:27, Dieter Nützel wrote:
> On Thursday, 20.12.201, 10:49 Helge Hafting wrote:
> > jlm wrote:
> [-]
> > > So I guess I don't really care what mode the hard drive is operating in
> > > (udma, mdma, dma or plain ide), I just don't want to have to go get a
> > > cup of coffee while the hard drive saves some data. Is there a "don't
> >
> > Devices generally get the cpu before anything else.  A good disk system
> > don't need much cpu.  Running IDE in PIO mode require a lot
> > of cpu though.   Using any of the DMA modes avoids that.
> 
> Amen..
> Sorry, Helge sure you are right in theory but try dbench 32 (maybe 
> bonnie/bonnie++) and playing an MP3/Ogg-Vorbis in parallel...
> That's my first test on any "new" kernel version.
> 
> Even with an 1 GHz Athlon II, 640 MB, U160 DDYS 18 GB, 10k IBM disk (on an 
> AHA-2940UW) it stutters like mad. I am running all my kernel _with_ Robert 
> Love's preempt + lock-break patches and it doesn't solve the problem.
> CPU load is (very) low but it do not work like it should.

Try it with vanilla 2.4.17-rc1. I just did, with nice -n 20 dbench 32, and
I'm getting no stuttering at all; it worked quite nicely. Of course, I'm
using an ext3 fs, which matters more than your CPU speed or RAM. This kind
of discussion has been had and argued over here many times already. Too
many factors go into this, and in the end, dbench is _meant_ to preempt
everything else. If you want a real test, find a real program you really
use and use it.


 
> > > pre-empt the rest of the system" switch for the eide drives? Is there
> > > something fundamental/unique going on here that I'm missing?
> > dma, udma, etc. is that switch.  It lets the cpu do other work (such as
> > redrawing X) while the disk is busy.  Plain ide is what you don't want.
> 
> See above the whole system show some bad hiccup.
> 
> > The problem of waiting for other files or swapping while a really big
> > write is going on is different.  Get more drives, so the big writes go
> > to one drive while you get stuff swapped in (or other file access)
> > on other drive(s).  The kernel is capable of getting fast response
> > from one drive while another is completely bogged down with
> > enormous writes.
> 
> Tried this already. Neither I put my test files (MP3/Ogg-Vorbis) in /dev/shm 
> or a nother disk it do not change anything.
> 
> There must be something in the VFS?
> 
> -Dieter
> 
> -- 
> Dieter Nützel
> Graduate Student, Computer Science
> University of Hamburg
> @home: Dieter.Nuetzel@hamburg.de

* Re: Poor performance during disk writes
  2001-12-20 13:27 Dieter Nützel
  2001-12-20 14:51 ` safemode
@ 2001-12-20 17:40 ` William Lee Irwin III
  2001-12-20 18:19   ` Andrew Morton
       [not found] ` <0112201629230E.01835@manta>
  2 siblings, 1 reply; 20+ messages in thread
From: William Lee Irwin III @ 2001-12-20 17:40 UTC (permalink / raw)
  To: Linux Kernel List

On Thu, Dec 20, 2001 at 02:27:17PM +0100, Dieter Nützel wrote:
> Amen..
> Sorry, Helge sure you are right in theory but try dbench 32 (maybe 
> bonnie/bonnie++) and playing an MP3/Ogg-Vorbis in parallel...
> That's my first test on any "new" kernel version.
> 
> Even with an 1 GHz Athlon II, 640 MB, U160 DDYS 18 GB, 10k IBM disk (on an 
> AHA-2940UW) it stutters like mad. I am running all my kernel _with_ Robert 
> Love's preempt + lock-break patches and it doesn't solve the problem.
> CPU load is (very) low but it do not work like it should.

I tried this on my 600MHz Athlon with 768MB of RAM and a U160 DDYS 36GB
10Krpm IBM disk on an Adaptec 39160, and I managed to get it not to
stutter at all. I was also using preempt + lockbreak and a few others.
The crucial patch appeared to be from Andrew Morton, and it involved
tuning the elevator to avoid read starvation. A significantly helpful
hardware suggestion regarding the sound card and drivers came from
Linus himself, though.

Linus and others pointed out that applications are able to cause some
drivers to generate a large number of interrupts by using small buffers
and unfriendly ioctls, especially esd. My workaround was to swap out the
sound hardware and disable esd. If this is happening to you, /proc/profile
should show handle_IRQ_event() and schedule() very high up. On the other
hand, this shows up as a steady drain on system resources and excessive
system time, not as stuttering or skipping.
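The small-buffer point can be quantified. Assuming, for illustration only, CD-quality stereo (44.1 kHz, 16-bit samples) and a driver that raises one interrupt per completed fragment, the interrupt rate is simply the byte rate divided by the fragment size. The function name is hypothetical, and real drivers may batch or coalesce interrupts differently.

```c
/* Approximate sound-card interrupt rate, assuming one interrupt per
 * completed fragment of playback data. */
long audio_irqs_per_sec(long sample_rate_hz, int channels,
                        int bytes_per_sample, long fragment_bytes)
{
    long bytes_per_sec = sample_rate_hz * channels * bytes_per_sample;

    return bytes_per_sec / fragment_bytes;  /* integer, rounds down */
}
```

At 176400 bytes per second, audio_irqs_per_sec(44100, 2, 2, 256) gives 689 interrupts per second, while a 4096-byte fragment gives 43: a sixteen times larger buffer cuts the interrupt load by roughly the same factor, purely through the application's buffer choice.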

Andrew, I don't have the URL for that still floating around. Can you
point Dieter to it?

Cheers,
Bill

* Re: Poor performance during disk writes
  2001-12-20 17:40 ` William Lee Irwin III
@ 2001-12-20 18:19   ` Andrew Morton
  2001-12-20 18:29     ` Dave Jones
  2001-12-21 16:50     ` Jens Axboe
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew Morton @ 2001-12-20 18:19 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Linux Kernel List

William Lee Irwin III wrote:
> 
> Andrew, I don't have the URL for that still floating around. Can you
> point Dieter to it?

It's here.

You need to run

	elvtune -b N /dev/hdXX

where N=0 is "disable", N=1 is minimum read latency, N=6 is
a reasonable setting.


--- linux-2.4.17-pre6/drivers/block/elevator.c	Thu Jul 19 20:59:41 2001
+++ linux-akpm/drivers/block/elevator.c	Sat Dec  8 11:10:36 2001
@@ -74,11 +74,10 @@ inline int bh_rq_in_between(struct buffe
 	return 0;
 }
 
-
 int elevator_linus_merge(request_queue_t *q, struct request **req,
 			 struct list_head * head,
 			 struct buffer_head *bh, int rw,
-			 int max_sectors)
+			 int max_sectors, int max_bomb_segments)
 {
 	struct list_head *entry = &q->queue_head;
 	unsigned int count = bh->b_size >> 9, ret = ELEVATOR_NO_MERGE;
@@ -116,6 +115,56 @@ int elevator_linus_merge(request_queue_t
 		}
 	}
 
+	/*
+	 * If we failed to merge a read anywhere in the request
+	 * queue, we really don't want to place it at the end
+	 * of the list, behind lots of writes.  So place it near
+	 * the front.
+	 *
+	 * We don't want to place it in front of _all_ writes: that
+	 * would create lots of seeking, and isn't tunable.
+	 * We try to avoid promoting this read in front of existing
+	 * reads.
+	 *
+	 * max_bomb_segments becomes the maximum number of write
+	 * requests which we allow to remain in place in front of
+	 * a newly introduced read.  We weight things a little bit,
+	 * so large writes are more expensive than small ones, but it's
+	 * requests which count, not sectors.
+	 */
+	if (max_bomb_segments && rw == READ && ret == ELEVATOR_NO_MERGE) {
+		int cur_latency = 0;
+		struct request * const cur_request = *req;
+
+		entry = head->next;
+		while (entry != &q->queue_head) {
+			struct request *__rq;
+
+			if (entry == &q->queue_head)
+				BUG();
+			if (entry == q->queue_head.next &&
+					q->head_active && !q->plugged)
+				BUG();
+			__rq = blkdev_entry_to_request(entry);
+
+			if (__rq == cur_request) {
+				/*
+				 * This is where the old algorithm placed it.
+				 * There's no point pushing it further back,
+				 * so leave it here, in sorted order.
+				 */
+				break;
+			}
+			if (__rq->cmd == WRITE) {
+				cur_latency += 1 + __rq->nr_sectors / 64;
+				if (cur_latency >= max_bomb_segments) {
+					*req = __rq;
+					break;
+				}
+			}
+			entry = entry->next;
+		}
+	}
 	return ret;
 }
 
@@ -144,7 +193,7 @@ void elevator_linus_merge_req(struct req
 int elevator_noop_merge(request_queue_t *q, struct request **req,
 			struct list_head * head,
 			struct buffer_head *bh, int rw,
-			int max_sectors)
+			int max_sectors, int max_bomb_segments)
 {
 	struct list_head *entry;
 	unsigned int count = bh->b_size >> 9;
@@ -188,7 +237,7 @@ int blkelvget_ioctl(elevator_t * elevato
 	output.queue_ID			= elevator->queue_ID;
 	output.read_latency		= elevator->read_latency;
 	output.write_latency		= elevator->write_latency;
-	output.max_bomb_segments	= 0;
+	output.max_bomb_segments	= elevator->max_bomb_segments;
 
 	if (copy_to_user(arg, &output, sizeof(blkelv_ioctl_arg_t)))
 		return -EFAULT;
@@ -207,9 +256,12 @@ int blkelvset_ioctl(elevator_t * elevato
 		return -EINVAL;
 	if (input.write_latency < 0)
 		return -EINVAL;
+	if (input.max_bomb_segments < 0)
+		return -EINVAL;
 
 	elevator->read_latency		= input.read_latency;
 	elevator->write_latency		= input.write_latency;
+	elevator->max_bomb_segments	= input.max_bomb_segments;
 	return 0;
 }
 
--- linux-2.4.17-pre6/drivers/block/ll_rw_blk.c	Mon Nov  5 21:01:11 2001
+++ linux-akpm/drivers/block/ll_rw_blk.c	Sat Dec  8 11:10:36 2001
@@ -690,7 +690,8 @@ again:
 	} else if (q->head_active && !q->plugged)
 		head = head->next;
 
-	el_ret = elevator->elevator_merge_fn(q, &req, head, bh, rw,max_sectors);
+	el_ret = elevator->elevator_merge_fn(q, &req, head, bh,
+				rw, max_sectors, elevator->max_bomb_segments);
 	switch (el_ret) {
 
 		case ELEVATOR_BACK_MERGE:
--- linux-2.4.17-pre6/include/linux/elevator.h	Thu Feb 15 16:58:34 2001
+++ linux-akpm/include/linux/elevator.h	Sat Dec  8 11:10:36 2001
@@ -5,8 +5,9 @@ typedef void (elevator_fn) (struct reque
 			    struct list_head *,
 			    struct list_head *, int);
 
-typedef int (elevator_merge_fn) (request_queue_t *, struct request **, struct list_head *,
-				 struct buffer_head *, int, int);
+typedef int (elevator_merge_fn)(request_queue_t *, struct request **,
+				struct list_head *, struct buffer_head *bh,
+				int rw, int max_sectors, int max_bomb_segments);
 
 typedef void (elevator_merge_cleanup_fn) (request_queue_t *, struct request *, int);
 
@@ -16,6 +17,7 @@ struct elevator_s
 {
 	int read_latency;
 	int write_latency;
+	int max_bomb_segments;
 
 	elevator_merge_fn *elevator_merge_fn;
 	elevator_merge_cleanup_fn *elevator_merge_cleanup_fn;
@@ -24,13 +26,13 @@ struct elevator_s
 	unsigned int queue_ID;
 };
 
-int elevator_noop_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_noop_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_noop_merge_req(struct request *, struct request *);
-
-int elevator_linus_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_linus_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_linus_merge_req(struct request *, struct request *);
+elevator_merge_fn elevator_noop_merge;
+elevator_merge_cleanup_fn elevator_noop_merge_cleanup;
+elevator_merge_req_fn elevator_noop_merge_req;
+
+elevator_merge_fn elevator_linus_merge;
+elevator_merge_cleanup_fn elevator_linus_merge_cleanup;
+elevator_merge_req_fn elevator_linus_merge_req;
 
 typedef struct blkelv_ioctl_arg_s {
 	int queue_ID;
@@ -54,22 +56,6 @@ extern void elevator_init(elevator_t *, 
 #define ELEVATOR_FRONT_MERGE	1
 #define ELEVATOR_BACK_MERGE	2
 
-/*
- * This is used in the elevator algorithm.  We don't prioritise reads
- * over writes any more --- although reads are more time-critical than
- * writes, by treating them equally we increase filesystem throughput.
- * This turns out to give better overall performance.  -- sct
- */
-#define IN_ORDER(s1,s2)				\
-	((((s1)->rq_dev == (s2)->rq_dev &&	\
-	   (s1)->sector < (s2)->sector)) ||	\
-	 (s1)->rq_dev < (s2)->rq_dev)
-
-#define BHRQ_IN_ORDER(bh, rq)			\
-	((((bh)->b_rdev == (rq)->rq_dev &&	\
-	   (bh)->b_rsector < (rq)->sector)) ||	\
-	 (bh)->b_rdev < (rq)->rq_dev)
-
 static inline int elevator_request_latency(elevator_t * elevator, int rw)
 {
 	int latency;
@@ -85,7 +71,7 @@ static inline int elevator_request_laten
 ((elevator_t) {								\
 	0,				/* read_latency */		\
 	0,				/* write_latency */		\
-									\
+	0,				/* max_bomb_segments */		\
 	elevator_noop_merge,		/* elevator_merge_fn */		\
 	elevator_noop_merge_cleanup,	/* elevator_merge_cleanup_fn */	\
 	elevator_noop_merge_req,	/* elevator_merge_req_fn */	\
@@ -95,7 +81,7 @@ static inline int elevator_request_laten
 ((elevator_t) {								\
 	8192,				/* read passovers */		\
 	16384,				/* write passovers */		\
-									\
+	0,				/* max_bomb_segments */		\
 	elevator_linus_merge,		/* elevator_merge_fn */		\
 	elevator_linus_merge_cleanup,	/* elevator_merge_cleanup_fn */	\
 	elevator_linus_merge_req,	/* elevator_merge_req_fn */	\
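
To make the read-promotion loop in the patch easier to follow, here is a userspace model of it. This is a sketch, not kernel code: struct sim_request and promote_read() are hypothetical names, the queue is a plain array with the in-flight request already removed, and the returned index plays the role of *req, i.e. the request the new read is placed behind (-1 standing for "no promotion, keep the old placement at the tail").

```c
#include <stddef.h>

enum rq_dir { RQ_READ, RQ_WRITE };

/* Hypothetical stand-in for struct request: only the fields the
 * promotion loop looks at. */
struct sim_request {
    enum rq_dir cmd;
    unsigned long nr_sectors;
};

/*
 * Walk the queue from the front, charging each queued write
 * 1 + nr_sectors/64 "segments". Once the charge reaches
 * max_bomb_segments, the new read goes right after that write.
 *
 * queue[0..n-1]  pending requests, front first
 * sorted_after   index the old sorted insert would place the read
 *                after, or -1 if it would go at the tail
 * Returns the index to insert after, or sorted_after if no promotion.
 */
long promote_read(const struct sim_request *queue, size_t n,
                  long sorted_after, int max_bomb_segments)
{
    int cur_latency = 0;

    if (max_bomb_segments <= 0)          /* 0 disables the feature */
        return sorted_after;

    for (size_t i = 0; i < n; i++) {
        if ((long)i == sorted_after)     /* never push it further back */
            break;
        if (queue[i].cmd == RQ_WRITE) {
            cur_latency += 1 + (int)(queue[i].nr_sectors / 64);
            if (cur_latency >= max_bomb_segments)
                return (long)i;
        }
    }
    return sorted_after;
}
```

With a queue of three 8-sector writes, a read, and a 128-sector write, max_bomb_segments = 1 places the new read right after the first write (index 0), while 6 lets all four writes stay ahead of it (index 4), since the 128-sector write costs 1 + 128/64 = 3. That mirrors the elvtune -b trade-off described above: small N gives the lowest read latency, larger N keeps more writes in sorted order and so avoids extra seeking.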

* Re: Poor performance during disk writes
  2001-12-20 18:19   ` Andrew Morton
@ 2001-12-20 18:29     ` Dave Jones
  2001-12-21 16:50       ` Jens Axboe
  2001-12-21 16:50     ` Jens Axboe
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Jones @ 2001-12-20 18:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: William Lee Irwin III, Linux Kernel List

On Thu, 20 Dec 2001, Andrew Morton wrote:

> You need to run
> 	elvtune -b N /dev/hdXX
> where N=0 is "disable", N=1 is minimum read latency, N=6 is
> a reasonable setting.

I'm curious: why was max_bomb_segments dropped the last time
it was in the tree? I recall it happening, but the reason
escapes me.

Dave.

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


* Re: Poor performance during disk writes
  2001-12-20 18:29     ` Dave Jones
@ 2001-12-21 16:50       ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2001-12-21 16:50 UTC (permalink / raw)
  To: Dave Jones; +Cc: Andrew Morton, William Lee Irwin III, Linux Kernel List

On Thu, Dec 20 2001, Dave Jones wrote:
> On Thu, 20 Dec 2001, Andrew Morton wrote:
> 
> > You need to run
> > 	elvtune -b N /dev/hdXX
> > where N=0 is "disable", N=1 is minimum read latency, N=6 is
> > a reasonable setting.
> 
> I'm curious, why was max_bomb_segments dropped the last time
> it was in the tree ? I recall it happening, but the reason
> escapes me.

It fooled me too the first time, but read Andrew's patch: it isn't
related at all.

-- 
Jens Axboe


* Re: Poor performance during disk writes
  2001-12-20 18:19   ` Andrew Morton
  2001-12-20 18:29     ` Dave Jones
@ 2001-12-21 16:50     ` Jens Axboe
  1 sibling, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2001-12-21 16:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: William Lee Irwin III, Linux Kernel List

On Thu, Dec 20 2001, Andrew Morton wrote:
> --- linux-2.4.17-pre6/drivers/block/ll_rw_blk.c	Mon Nov  5 21:01:11 2001
> +++ linux-akpm/drivers/block/ll_rw_blk.c	Sat Dec  8 11:10:36 2001
> @@ -690,7 +690,8 @@ again:
>  	} else if (q->head_active && !q->plugged)
>  		head = head->next;
>  
> -	el_ret = elevator->elevator_merge_fn(q, &req, head, bh, rw,max_sectors);
> +	el_ret = elevator->elevator_merge_fn(q, &req, head, bh,
> +				rw, max_sectors, elevator->max_bomb_segments);

The merge function can just grab max_bomb_segments à la

	int mbs = q->elevator.max_bomb_segments;

so there is no need to change the merge functions' signatures.
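A minimal sketch of that suggestion, using hypothetical stand-in structures (sim_elevator_t, sim_request_queue_t and sim_promotion_enabled() are invented for illustration). In 2.4 the real request_queue_t embeds its elevator_t as a member named elevator, which is what q->elevator refers to; the point is that the merge function reads the tunable from the queue it already receives instead of taking an extra parameter.

```c
#define SIM_READ  0
#define SIM_WRITE 1

/* Hypothetical, stripped-down stand-in for elevator_t. */
typedef struct {
    int read_latency;
    int write_latency;
    int max_bomb_segments;
} sim_elevator_t;

/* Hypothetical stand-in for request_queue_t: the elevator is embedded. */
typedef struct {
    sim_elevator_t elevator;
} sim_request_queue_t;

/*
 * No max_bomb_segments parameter: the value comes from the queue.
 * Returns nonzero if the read-promotion pass would run for a request
 * travelling in direction rw.
 */
int sim_promotion_enabled(const sim_request_queue_t *q, int rw)
{
    int mbs = q->elevator.max_bomb_segments;   /* the suggested one-liner */

    return mbs > 0 && rw == SIM_READ;
}
```

The design benefit is that only the code which uses the tunable needs to know about it, so the elevator_merge_fn typedef and every implementation, elevator_noop_merge included, could keep their existing parameter lists.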

-- 
Jens Axboe


* Re: Poor performance during disk writes
       [not found]   ` <200112201436.fBKEa2m26640@zero.tech9.net>
@ 2001-12-20 19:02     ` Robert Love
  0 siblings, 0 replies; 20+ messages in thread
From: Robert Love @ 2001-12-20 19:02 UTC (permalink / raw)
  To: Dieter Nützel; +Cc: vda, Helge Hafting, jlm, Andre Hedrick, Linux Kernel List

On Thu, 2001-12-20 at 09:35, Dieter Nützel wrote:

> > Robert maintains latency measurement patch, do you use it?
> 
> Yes, I did the ReiserFS lock-break tests for him.
>  
> > Does it show where are the problems?
> 
> NO, we have no clue, yet :-(

Want to see if it's the VM?  Rik van Riel has updated his 2.4-ac VM for
new kernels and added reverse page mapping (a neat feature).

It is available at:
	http://www.surriel.com/patches/2.4/2.4.16-rmap-6

Give it a whirl; you might be impressed.  If not, maybe we can scratch
the VM as the problem and stare meanly at the VFS ;-)

(Note: my lock-break patch will fail on the new VM.  Ignore it.  The rest
is still fine.  Perhaps I'll do a lock-break for this VM later.)

	Robert Love

Thread overview: 20+ messages
2001-12-18  0:53 Poor performance during disk writes jlm
2001-12-18  1:09 ` Reid Hekman
2001-12-18  1:36   ` jlm
2001-12-18  2:01     ` Reid Hekman
2001-12-18 18:46 ` Andre Hedrick
2001-12-18 17:42   ` Gérard Roudier
2001-12-18 20:34     ` Andre Hedrick
2001-12-18 19:09       ` Gérard Roudier
2001-12-19 23:26         ` jlm
2001-12-20 10:49           ` Helge Hafting
2001-12-20 11:16             ` Oops in 2.4.14-pre6 and 2.4.14-pre9aa1 Andre Margis
2001-12-21 16:29     ` Poor performance during disk writes Troy Benjegerdes
  -- strict thread matches above, loose matches on Subject: below --
2001-12-20 13:27 Dieter Nützel
2001-12-20 14:51 ` safemode
2001-12-20 17:40 ` William Lee Irwin III
2001-12-20 18:19   ` Andrew Morton
2001-12-20 18:29     ` Dave Jones
2001-12-21 16:50       ` Jens Axboe
2001-12-21 16:50     ` Jens Axboe
     [not found] ` <0112201629230E.01835@manta>
     [not found]   ` <200112201436.fBKEa2m26640@zero.tech9.net>
2001-12-20 19:02     ` Robert Love
