public inbox for linux-scsi@vger.kernel.org
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Jens Axboe <Jens.Axboe@oracle.com>,
	linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: Actually using the sg table/chain code
Date: Wed, 16 Jan 2008 10:46:56 -0600	[thread overview]
Message-ID: <1200502016.3136.11.camel@localhost.localdomain> (raw)
In-Reply-To: <478E32BD.50505@panasas.com>


On Wed, 2008-01-16 at 18:37 +0200, Boaz Harrosh wrote:
> On Wed, Jan 16 2008 at 18:11 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> > On Wed, Jan 16 2008 at 17:09 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >> On Wed, 2008-01-16 at 16:01 +0200, Boaz Harrosh wrote:
> >>> On Tue, Jan 15 2008 at 19:35 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> >>>> On Tue, Jan 15 2008 at 18:49 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >>>>> On Tue, 2008-01-15 at 18:09 +0200, Boaz Harrosh wrote:
> >>>>>> On Tue, Jan 15 2008 at 17:52 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >>>>>>> I thought, now we had this new shiny code to increase the scatterlist
> >>>>>>> table size I'd try it out.  It turns out there's a pretty vast block
> >>>>>>> conspiracy that prevents us going over 128 entries in a scatterlist.
> >>>>>>>
> >>>>>>> The first problems are in SCSI:  The host parameters sg_tablesize and
> >>>>>>> max_sectors are used to set the queue limits max_hw_segments and
> >>>>>>> max_sectors respectively (the former is the maximum number of entries
> >>>>>>> the HBA can tolerate in a scatterlist for each transaction, the latter
> >>>>>>> is a total transfer cap on the maximum number of 512 byte sectors).
> >>>>>>> The default settings, assuming the HBA doesn't vary them are
> >>>>>>> sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS
> >>>>>>> (1024).  A quick calculation shows the latter is actually 512k or 128
> >>>>>>> pages (at 4k pages), hence the persistent 128 entry limit.
> >>>>>>>
> >>>>>>> However, raising max_sectors and sg_tablesize together still doesn't
> >>>>>>> help:  There's actually an insidious limit sitting in the block layer as
> >>>>>>> well.  This is what blk_queue_max_sectors says:
> >>>>>>>
> >>>>>>> void blk_queue_max_sectors(struct request_queue *q, unsigned int
> >>>>>>> max_sectors)
> >>>>>>> {
> >>>>>>> 	if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
> >>>>>>> 		max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
> >>>>>>> 		printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
> >>>>>>> 	}
> >>>>>>>
> >>>>>>> 	if (BLK_DEF_MAX_SECTORS > max_sectors)
> >>>>>>> 		q->max_hw_sectors = q->max_sectors = max_sectors;
> >>>>>>> 	else {
> >>>>>>> 		q->max_sectors = BLK_DEF_MAX_SECTORS;
> >>>>>>> 		q->max_hw_sectors = max_sectors;
> >>>>>>> 	}
> >>>>>>> }
> >>>>>>>
> >>>>>>> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS which is
> >>>>>>> defined in blkdev.h to .... 1024, thus also forcing the queue down to
> >>>>>>> 128 scatterlist entries.
> >>>>>>>
> >>>>>>> Once I raised this limit as well, I was able to transfer over 128
> >>>>>>> scatterlist elements during benchmark test runs of normal I/O (actually
> >>>>>>> kernel compiles seem best, they hit 608 scatterlist entries).
> >>>>>>>
> >>>>>>> So my question, is there any reason not to raise this limit to something
> >>>>>>> large (like 65536) or even eliminate it altogether?
> >>>>>>>
> >>>>>>> James
> >>>>>>>
> >>>>>> I have an old branch here where I swept through the scsi drivers just
> >>>>>> to remove the SG_ALL limit. Unfortunately some drivers literally mean
> >>>>>> 255 when using SG_ALL. So I went driver by driver and carefully inspected
> >>>>>> the code to change it to something driver-specific if they really meant
> >>>>>> 255.
> >>>>>>
> >>>>>> I have used sg_tablesize = ~0; to indicate "I don't care, any will do",
> >>>>>> and a driver-specific constant if there is a real limit, removing
> >>>>>> SG_ALL at the end.
> >>>>>>
> >>>>>> Should I freshen up this branch and send it?
> >>>>> By all means; however, I think having the defined constant SG_ALL is
> >>>>> useful (even if it is eventually just set to ~0): it means I can support
> >>>>> any scatterlist size.  Having the drivers that can't support SG_ALL set
> >>>>> sg_tablesize correctly is pretty vital.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> James
> >>>> OK will do.
> >>>>
> >>>> I have found the old branch and am looking. I agree with you about
> >>>> SG_ALL. I will fix it to have a patch per changed driver, without changing
> >>>> SG_ALL, and then a final patch to just change SG_ALL.
> >>>>
> >>>> Boaz
> >>> Hi James,
> >>> Reinspecting the code, what should I do with drivers that do not support
> >>> chaining due to software that still does sglist++?
> >>>
> >>> Should I set their sg_tablesize to SG_MAX_SINGLE_ALLOC, or hard-code it to
> >>> 128, and put a FIXME in the commit message?
> >>>
> >>> Or should we fix them first and serialize this effort on top of those fixes?
> >>> (Also in light of the other email where you removed the chaining flag.)
> >> How many of them are left?
> >>
> >> The correct value is clearly SCSI_MAX_SG_SEGMENTS which fortunately
> >> "[PATCH] remove use_sg_chaining" moved into a shared header.  Worst
> >> case, just use that and add a fixme comment giving the real value (if
> >> there is one).
> >>
> >> James
> >>
> >>
> > 
> > I have 9 up to now and 10 more drivers to check. All but one are
> > software-limited, iterating one by one with SCp.buffer++, so once that is
> > fixed they should be able to go back to SG_ALL. But for now I will set them
> > to SCSI_MAX_SG_SEGMENTS as you requested. I have not checked the drivers
> > that did not use SG_ALL, but I trust those usually have smaller tables.
> > 
> > Boaz
> > 
> > 
> Hi James,
> 
> Looking at the patches I just realized that I made a mistake and did
> not work on top of your "[PATCH] remove use_sg_chaining".
> Rebasing should be easy, but I think my patch should go first because
> there are some 10-15 drivers that are not chaining-ready but will work
> perfectly after my patch that sets sg_tablesize to SCSI_MAX_SG_SEGMENTS.
> 
> Should I rebase, or should "[PATCH] remove use_sg_chaining" be rebased?

The order doesn't matter; the two patches are completely orthogonal.
Just send the list what you have ... I'm rebasing a lot of stuff fairly
often at this stage in the merge cycle.

James
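[Editor's note: the arithmetic and clamping behaviour discussed in the quoted
blk_queue_max_sectors() can be sketched as a minimal standalone model. This is
not kernel code; it assumes 4 KiB pages and the 2.6-era constants quoted in the
thread (BLK_DEF_MAX_SECTORS = SCSI_DEFAULT_MAX_SECTORS = 1024).]

```python
# Minimal model of the limits discussed in the thread (assumes 4 KiB pages).
PAGE_SIZE = 4096
PAGE_SHIFT = 12
BLK_DEF_MAX_SECTORS = 1024        # from blkdev.h, as quoted above
SCSI_DEFAULT_MAX_SECTORS = 1024   # default host max_sectors

def queue_limits(max_sectors):
    """Mirror blk_queue_max_sectors(): return (max_sectors, max_hw_sectors)."""
    if (max_sectors << 9) < PAGE_SIZE:        # enforce at least one page
        max_sectors = 1 << (PAGE_SHIFT - 9)
    if BLK_DEF_MAX_SECTORS > max_sectors:
        return max_sectors, max_sectors
    return BLK_DEF_MAX_SECTORS, max_sectors   # soft limit capped at 1024

# With the defaults: 1024 sectors * 512 bytes = 512 KiB = 128 pages, so the
# queue never needs more than 128 scatterlist entries.
soft, hw = queue_limits(SCSI_DEFAULT_MAX_SECTORS)
pages = (soft * 512) // PAGE_SIZE
print(soft, hw, pages)                        # 1024 1024 128

# Raising the host's max_sectors alone doesn't help: the soft limit stays
# clamped to BLK_DEF_MAX_SECTORS, which is the "insidious limit" in question.
print(queue_limits(65536))                    # (1024, 65536)
```

This illustrates why raising sg_tablesize and max_sectors together is not
enough until BLK_DEF_MAX_SECTORS itself is raised or removed.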




Thread overview: 15+ messages
2008-01-15 15:52 Actually using the sg table/chain code James Bottomley
2008-01-15 16:09 ` Boaz Harrosh
2008-01-15 16:49   ` James Bottomley
2008-01-15 17:35     ` Boaz Harrosh
2008-01-16 14:01       ` Boaz Harrosh
2008-01-16 15:09         ` James Bottomley
2008-01-16 16:11           ` Boaz Harrosh
2008-01-16 16:37             ` Boaz Harrosh
2008-01-16 16:46               ` James Bottomley [this message]
2008-01-15 19:52 ` Jeff Garzik
2008-01-15 20:14   ` James Bottomley
2008-01-16 15:06 ` Jens Axboe
2008-01-16 15:47   ` James Bottomley
2008-01-16 16:08     ` Jens Axboe
2008-02-22 16:13     ` Mike Christie
