From: Jens Axboe <axboe@suse.de>
To: Al Boldi <a1426z@gawab.com>
Cc: linux-kernel@vger.kernel.org, David Chinner <dgc@sgi.com>
Subject: Re: [PATCH] Direct I/O bio size regression
Date: Mon, 24 Apr 2006 21:49:10 +0200
Message-ID: <20060424194910.GK29724@suse.de>
In-Reply-To: <200604242006.11758.a1426z@gawab.com>

On Mon, Apr 24 2006, Al Boldi wrote:
> David Chinner wrote:
> > On Mon, Apr 24, 2006 at 11:05:08AM +0200, Jens Axboe wrote:
> > > On Mon, Apr 24 2006, Jens Axboe wrote:
> > > > > Index: 2.6.x-xfs-new/fs/bio.c
> > > > > ===================================================================
> > > > > --- 2.6.x-xfs-new.orig/fs/bio.c   2006-02-06 11:57:50.000000000 +1100
> > > > > +++ 2.6.x-xfs-new/fs/bio.c        2006-04-24 15:46:16.849484424 +1000
> > > > > @@ -304,7 +304,7 @@ int bio_get_nr_vecs(struct block_device
> > > > >   request_queue_t *q = bdev_get_queue(bdev);
> > > > >   int nr_pages;
> > > > >
> > > > > - nr_pages = ((q->max_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > > > + nr_pages = ((q->max_hw_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > > >   if (nr_pages > q->max_phys_segments)
> > > > >           nr_pages = q->max_phys_segments;
> > > > >   if (nr_pages > q->max_hw_segments)
> > > > > @@ -446,7 +446,7 @@ int bio_add_page(struct bio *bio, struct
> > > > >            unsigned int offset)
> > > > >  {
> > > > >   struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> > > > > - return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
> > > > > + return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
> > > > >  }
> > > > >
> > > > >  struct bio_map_data {
> > > >
> > > > Clearly correct, I'll make sure this gets merged right away.
> > >
> > > Spoke too soon... The last part is actually on purpose, to prevent
> > > really huge requests as part of normal file system IO.
> >
> > I don't understand why this was considered necessary. It
> > doesn't appear to be explained anywhere in the code, so can you
> > explain the problem that large filesystem I/Os pose to the block
> > layer? We _need_ to be able to drive really huge requests from the
> > filesystem down to the disks, especially for direct I/O.....
> > FWIW, we've just got XFS to the point where we could issue large
> > I/Os (up to 8MB on 16k pages) with a default configuration kernel
> > and filesystem using md+dm on an Altix. That makes an artificial
> > 512KB filesystem I/O size limit a pretty major step backwards in
> > terms of performance for default configs.....
> >
> > > That's why we
> > > have bio_add_pc_page(). The first hunk may then make things work
> > > suboptimally if we don't also apply the last hunk.
> >
> > bio_add_pc_page() requires a request queue to be passed to it.  It's
> > called only from scsi layers in the context of mapping pages into a
> > bio from sg_io(). The comment for bio_add_pc_page() says it is for use
> > with REQ_PC queues only, and that appears to be used only by ide-cd
> > cdroms. Is that comment correct?
> >
> > Also, using bio_add_pc_page() in a filesystem or in the generic
> > direct I/O code seems like a gross layering violation to me, because
> > they are supposed to know nothing about request queues.
> >
> > > The best approach is probably to tune max_sectors on the system itself.
> > > That's why it is exposed, after all.
> >
> > You mean /sys/block/sd*/queue/max_sectors_kb?
> 
> On my system max_hw_sectors_kb is fixed at 1024, and max_sectors_kb defaults
> to 512, which leads to terribly fluctuating throughput.
> 
> Setting max_sectors_kb = max_hw_sectors_kb makes things even worse.
> 
> Tuning max_sectors_kb to ~192 only stabilizes this situation.

That sounds pretty strange. Do you have a test case?

-- 
Jens Axboe

