From: scameron@beardog.cce.hp.com
To: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi@vger.kernel.org, stephenmcameron@gmail.com,
dab@hp.com, scameron@beardog.cce.hp.com
Subject: Re: SCSI mid layer and high IOPS capable devices
Date: Tue, 11 Dec 2012 16:46:26 -0600
Message-ID: <20121211224626.GB20898@beardog.cce.hp.com>
In-Reply-To: <50C6ED1A.7040404@acm.org>
On Tue, Dec 11, 2012 at 09:21:46AM +0100, Bart Van Assche wrote:
> On 12/11/12 01:00, scameron@beardog.cce.hp.com wrote:
> >I tried using scsi_debug with fake_rw and also the scsi_ram driver
> >that was recently posted to get some idea of what the maximum IOPS
> >that could be pushed through the SCSI midlayer might be, and the
> >numbers were a little disappointing (was getting around 150k iops
> >with scsi_debug with reads and writes faked, and around 3x that
> >with the block driver actually doing the i/o).
>
> With which request size was that ?
4k. (I'm thinking the request size shouldn't matter much, since
fake_rw=1 causes the i/o not to actually be done -- no data is
transferred. Similarly, scsi_ram has a flag to discard reads and
writes, which I was using.)
> I see about 330K IOPS @ 4 KB and
> about 540K IOPS @ 512 bytes with the SRP protocol, a RAM disk at the
> target side, a single SCSI LUN and a single IB cable. These results have
> been obtained on a setup with low-end CPU's. Had you set rq_affinity to
> 2 in your tests ?
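(For anyone following along: rq_affinity is a per-queue sysfs knob; 2 forces the
block-layer completion to run on the exact CPU that submitted the request, while
1 only groups completions by cache domain. A minimal sketch of setting it -- the
helper name and the optional sysfs-root argument are mine, added so it can be
exercised outside of a real /sys:)

```shell
# set_rq_affinity: write and read back queue/rq_affinity for a block device.
# $1 = device name (e.g. sdb), $2 = value (0, 1, or 2),
# $3 = sysfs root, defaults to /sys/block (overridable for testing).
set_rq_affinity() {
    dev=$1
    val=$2
    sysroot=${3:-/sys/block}
    echo "$val" > "$sysroot/$dev/queue/rq_affinity" &&
    cat "$sysroot/$dev/queue/rq_affinity"
}
```

e.g. `set_rq_affinity sdb 2` on a real system.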
No, hadn't done anything with rq_affinity. I had spread interrupts
around by turning off irqbalance and echoing things into /proc/irq/*, and
running a bunch of dd processes (one per cpu) like this:
taskset -c $cpu dd if=/dev/blah of=/dev/null bs=4k iflag=direct &
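(The full loop I use amounts to one pinned dd per online CPU. A sketch -- the
gen_dd_cmds helper is mine and just prints the command lines as a dry run, so
you can inspect them or pipe them to sh; /dev/blah is the placeholder device
from above:)

```shell
# gen_dd_cmds: print one taskset-pinned, direct-I/O dd command per CPU.
# $1 = device to read (placeholder), $2 = CPU count, defaults to nproc.
gen_dd_cmds() {
    dev=$1
    ncpus=${2:-$(nproc)}
    cpu=0
    while [ "$cpu" -lt "$ncpus" ]; do
        echo "taskset -c $cpu dd if=$dev of=/dev/null bs=4k iflag=direct &"
        cpu=$((cpu + 1))
    done
}
```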
And the hardware in this case should route the interrupts back to the processor
which submitted the i/o (the submitted command contains info that lets the hw
know which msix vector we want the io to come back on.)
I would be curious to see what kind of results you would get with scsi_debug
with fake_rw=1. I am starting to suspect that trying to put an "upper limit"
on scsi LLD IOPS performance by seeing what scsi_debug will do with fake_rw=1
is not really valid (or maybe I'm doing it wrong), as I know of one case in
which a real HW scsi driver beats scsi_debug fake_rw=1 at IOPS on the very
same system -- which seems like it shouldn't be possible. Kind of mysterious.
Another mystery I haven't been able to clear up -- I'm using code like
this to set affinity hints:

	int i, cpu;

	cpu = cpumask_first(cpu_online_mask);
	for (i = 0; i < h->noqs; i++) {
		int idx = i ? i + 1 : i;
		int rc;

		rc = irq_set_affinity_hint(h->qinfo[idx].msix_vector,
					get_cpu_mask(cpu));
		if (rc)
			dev_warn(&h->pdev->dev,
				"Failed to hint affinity of vector %d to cpu %d\n",
				h->qinfo[idx].msix_vector, cpu);
		cpu = cpumask_next(cpu, cpu_online_mask);
	}
and those hints are set (querying /proc/irq/*/affinity_hint shows that my hints
are in there) but the hints are not "taken", that is, /proc/irq/*/smp_affinity
does not match the hints.
Doing this:

	for x in `seq $first_irq $last_irq`
	do
		cat /proc/irq/$x/affinity_hint > /proc/irq/$x/smp_affinity
	done

(where first_irq and last_irq specify the range of irqs assigned
to my driver) makes the hints be "taken".
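(A quick way to see whether the hints were taken is to compare affinity_hint
against smp_affinity per IRQ. A sketch -- the check_hints helper is mine, and
the optional third argument overrides /proc/irq so it can be tried against a
fake directory tree:)

```shell
# check_hints: for each IRQ in [first, last], report whether smp_affinity
# matches affinity_hint. $3 = irq root, defaults to /proc/irq.
check_hints() {
    first=$1
    last=$2
    root=${3:-/proc/irq}
    for irq in $(seq "$first" "$last"); do
        hint=$(cat "$root/$irq/affinity_hint")
        cur=$(cat "$root/$irq/smp_affinity")
        if [ "$hint" = "$cur" ]; then
            echo "irq $irq: hint taken"
        else
            echo "irq $irq: hint $hint, actual $cur"
        fi
    done
}
```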
I noticed nvme doesn't seem to suffer from this, somehow the hints are
taken automatically (er, I don't recall if /proc/irq/*/smp_affinity matches
affinity_hints for nvme, but interrupts seem spread around without doing
anything special). I haven't seen anything in the nvme code related to affinity
that I'm not already doing as well in my driver, so it is a mystery to me why
that difference in behavior occurs.
-- steve