public inbox for linux-kernel@vger.kernel.org
From: "Jeff V. Merkey" <jmerkey@vger.timpanogas.org>
To: Andrew Morton <akpm@zip.com.au>
Cc: linux-kernel@vger.kernel.org
Subject: Re: queue_nr_requests needs to be selective
Date: Fri, 1 Mar 2002 16:20:16 -0700	[thread overview]
Message-ID: <20020301162016.A12413@vger.timpanogas.org> (raw)
In-Reply-To: <3C7FE7DD.98121E87@zip.com.au>; from akpm@zip.com.au on Fri, Mar 01, 2002 at 12:43:09PM -0800

On Fri, Mar 01, 2002 at 12:43:09PM -0800, Andrew Morton wrote:
> "Jeff V. Merkey" wrote:
> > 
> > Linus/Alan/Linux,
> > 
> > Performance numbers can be increased dramatically (> 300 MB/S)
> > by increasing queue_nr_requests in ll_rw_blk.c on large RAID
> > controllers that are hosting a lot of drives.
> 
> I don't immediately see why increasing the queue length should
> increase bandwidth in this manner.  One possibility is that
> the shorter queue results in tasks sleeping in __get_request_wait
> more often, and the real problem is the "request starvation" thing.

This is the case.  We end up sleeping with 8,000 buffer head requests
(4K block size) queued per adapter.  After I made the change and
increased the size to 1024, this number increased to 17,000
buffer heads queued per adapter.  Performance went up and processor
utilization went down.

From the profiling I ran, we were sleeping in __get_request_wait
far too much.  This value should be maintained on a per-card
basis, and for RAID controllers that present a single virtual disk
backed by many physical disks (i.e. on 3Ware this number is 8), we
should make the queue 8 X the default.  Each driver would need to
set this value based on how many actual drives are attached
to the controller.
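To sketch what I mean (purely illustrative -- these are not the
actual ll_rw_blk.c symbols, and a real patch would hook this into
each driver's queue setup; the cap value is an assumption):

```c
/* Hypothetical per-adapter sizing helper.  The idea: a RAID
 * controller that hides N physical disks behind one virtual
 * disk gets N times the per-disk default number of request
 * slots, so an 8-disk 3Ware array gets 8 x 128 = 1024 slots
 * instead of sharing a single disk's allotment. */

#define NR_REQUESTS_DEFAULT 128
#define NR_REQUESTS_MAX     4096    /* arbitrary safety cap */

int queue_nr_requests_for(int nr_physical_disks)
{
        int n;

        if (nr_physical_disks < 1)
                nr_physical_disks = 1;
        n = NR_REQUESTS_DEFAULT * nr_physical_disks;
        if (n > NR_REQUESTS_MAX)
                n = NR_REQUESTS_MAX;
        return n;
}
```

A driver that knows its drive count would call this once at
queue-init time instead of taking the global default.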

> 
> The "request starvation" thing could conceivably result in more
> seeky behaviour.  In your kernel, disk writeouts come from
> two places:

This is not happening.  The elevator code sits above this level,
and I am seeing that the requests are ordered, for the most part,
at this layer.

> 
> - Off the tail of the dirty buffer LRU
> 
> - Basically random guess, from the page LRU.

There are two scenarios: one does not use Linus' buffer cache but
a custom cache maintained between SCI nodes; the other uses Linus'
buffer cache.  We are seeing > 300 MB/S on the SCI cache.

> 
> It's competition between these two writeout sources which causes
> decreased bandwidth - I've seen kernels in which ext2 writeback
> performance was down 40% due to this.
> 
> Anyway.   You definitely need to try 2.4.19-pre1.   Max sleep times
> in __get_request_wait will be improved, and it's possible that the
> bandwidth will improve.  Or not.  My gut feel is that it won't
> help.
> 

How about just increasing the value of queue_nr_requests or making
it adapter-specific?


> And yes, 128 requests is too few.  It used to be ~1000.  I think
> this change was made in a (misguided, unsuccessful) attempt to
> manage latency for readers.  The request queue is the only mechanism
> we have for realigning out-of-order requests and it needs to be
> larger so it can do this better. I've seen 15-25% throughput
> improvements from a 1024-slot request queue.


> 
> And if a return to a large request queue damages latency (it doesn't)
> then we need to fix that latency *without* damaging request merging.
> 
> First step: please test 2.4.19-pre1 or -pre2.  Also 2.4.19-pre1-ac2
> may provide some surprises..
> 

I will test, but unless this value is higher, I am skeptical I will
see the needed improvement.  The issue here is that we sleep too
much: what's really happening is that we are forcing 8 disk drives
to share 64/128 request buffers rather than providing each physical
disk with what it really needs.
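The arithmetic behind that claim, as a back-of-the-envelope check
(the 128-slot pool and 8 spindles are the numbers from above; the
helper name is made up for illustration):

```c
/* With one request pool shared by every physical disk behind a
 * virtual device, each spindle effectively sees only total/N
 * slots: 128 shared slots across 8 disks is 16 per disk, far
 * short of the 128 a standalone disk would get. */
int slots_per_disk(int shared_slots, int nr_disks)
{
        if (nr_disks < 1)
                nr_disks = 1;
        return shared_slots / nr_disks;
}
```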


Jeff


Thread overview: 22+ messages
2002-03-01 20:22 queue_nr_requests needs to be selective Jeff V. Merkey
2002-03-01 20:43 ` Andrew Morton
2002-03-01 21:03   ` Alan Cox
2002-03-01 23:22     ` Jeff V. Merkey
2002-03-01 23:20   ` Jeff V. Merkey [this message]
2002-03-01 23:23     ` Andrew Morton
2002-03-02  0:27       ` Jeff V. Merkey
2002-03-02  0:49         ` Andrew Morton
2002-03-02  2:16           ` Jeff V. Merkey
2002-03-02  3:50             ` Andrew Morton
2002-03-02  4:34               ` Jeff V. Merkey
2002-03-02  7:33               ` Jeff V. Merkey
2002-03-02  9:10               ` Jens Axboe
2002-03-02  9:22                 ` Andrew Morton
2002-03-04  9:09                   ` Jens Axboe
2002-03-02  0:51 ` Mike Anderson
2002-03-02  4:39   ` Jeff V. Merkey
2002-03-02  5:59     ` Jeff V. Merkey
2002-03-02  6:01       ` Jeff V. Merkey
2002-03-02  6:16         ` Jeff V. Merkey
2002-03-04  7:16       ` Mike Anderson
2002-03-04 17:39         ` Jeff V. Merkey
