From: scameron@beardog.cce.hp.com
To: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi@vger.kernel.org, stephenmcameron@gmail.com,
	dab@hp.com, scameron@beardog.cce.hp.com
Subject: Re: SCSI mid layer and high IOPS capable devices
Date: Tue, 11 Dec 2012 16:46:26 -0600	[thread overview]
Message-ID: <20121211224626.GB20898@beardog.cce.hp.com> (raw)
In-Reply-To: <50C6ED1A.7040404@acm.org>

On Tue, Dec 11, 2012 at 09:21:46AM +0100, Bart Van Assche wrote:
> On 12/11/12 01:00, scameron@beardog.cce.hp.com wrote:
> >I tried using scsi_debug with fake_rw and also the scsi_ram driver
> >that was recently posted to get some idea of what the maximum IOPS
> >that could be pushed through the SCSI midlayer might be, and the
> >numbers were a little disappointing (was getting around 150k iops
> >with scsi_debug with reads and writes faked, and around 3x that
> >with the block driver actually doing the i/o).
> 
> With which request size was that ? 

4k.  (I'd expect the request size not to matter much, since fake_rw=1
means the i/o is never actually performed -- no data is transferred.
Similarly, with scsi_ram I was using a flag that discards reads and
writes.)
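For reference, the baseline measurement looks roughly like this (a minimal
sketch: the scsi_debug parameter names are the module's documented ones,
/dev/sdX stands in for whatever sd device scsi_debug creates, and the
commands are printed rather than executed since loading modules and reading
raw devices needs root):

```shell
# Dry-run sketch of a scsi_debug fake_rw baseline; commands are printed,
# not executed, since modprobe and raw-device reads require root.
setup="modprobe scsi_debug dev_size_mb=1024 fake_rw=1 delay=0"
bench="taskset -c 0 dd if=/dev/sdX of=/dev/null bs=4k iflag=direct"
printf '%s\n%s\n' "$setup" "$bench"
```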

> I see about 330K IOPS @ 4 KB and 
> about 540K IOPS @ 512 bytes with the SRP protocol, a RAM disk at the 
> target side, a single SCSI LUN and a single IB cable. These results have 
> been obtained on a setup with low-end CPU's. Had you set rq_affinity to 
> 2 in your tests ?

No, I hadn't done anything with rq_affinity.  I had spread interrupts
around by turning off irqbalance and echoing masks into
/proc/irq/*/smp_affinity, and by running a bunch of dd processes (one per
cpu) like this:

	taskset -c $cpu dd if=/dev/blah of=/dev/null bs=4k iflag=direct &

And the hardware in this case should route the interrupts back to the processor
that submitted the i/o (the submitted command contains info that lets the hw
know which MSI-X vector we want the completion to come back on).
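Spelled out, that fan-out is just one dd per online CPU (a sketch:
/dev/blah is a placeholder device, and the command lines are printed
rather than launched so the snippet is side-effect free):

```shell
# Build one taskset+dd command line per online CPU; print instead of run.
dev=/dev/blah
cmds=""
for cpu in $(seq 0 $(( $(nproc) - 1 ))); do
	cmds="${cmds}taskset -c $cpu dd if=$dev of=/dev/null bs=4k iflag=direct &
"
done
printf '%s' "$cmds"
```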

I would be curious to see what kind of results you get with scsi_debug
and fake_rw=1.  I suspect that using scsi_debug with fake_rw=1 to put an
"upper limit" on SCSI LLD IOPS performance is not really valid (or maybe
I'm doing it wrong): I know of one case in which a real hardware SCSI
driver beats scsi_debug with fake_rw=1 at IOPS on the very same system,
which seems like it shouldn't be possible.  Kind of mysterious.

Another mystery I haven't been able to clear up: I'm using code like
this to set affinity hints --

        int i, cpu;

        cpu = cpumask_first(cpu_online_mask);
        for (i = 0; i < h->noqs; i++) {
                int idx = i ? i + 1 : i;
                int rc;
                rc = irq_set_affinity_hint(h->qinfo[idx].msix_vector,
                                        get_cpu_mask(cpu));

                if (rc)
                        dev_warn(&h->pdev->dev, "Failed to hint affinity of vector %d to cpu %d\n",
                                        h->qinfo[idx].msix_vector, cpu);
                cpu = cpumask_next(cpu, cpu_online_mask);
        }

and those hints are set (querying /proc/irq/*/affinity_hint shows that my
hints are in there), but the hints are not "taken"; that is,
/proc/irq/*/smp_affinity does not match the hints.

doing this:

for x in `seq $first_irq $last_irq`
do
	cat /proc/irq/$x/affinity_hint > /proc/irq/$x/smp_affinity
done

(where first_irq and last_irq specify the range of irqs assigned
to my driver) makes the hints be "taken".
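To avoid hard-coding first_irq and last_irq, the IRQ numbers can be
scraped out of /proc/interrupts by driver name (a sketch: "mydrv" is a
hypothetical driver name, and the root-only write into smp_affinity is
left commented out so the parser itself can be exercised without root):

```shell
# Print the IRQ numbers whose /proc/interrupts line mentions the given
# driver name; input comes from stdin so the parser is testable.
irqs_for() {
	awk -F: -v drv="$1" '$0 ~ drv { gsub(/[ \t]/, "", $1); print $1 }'
}

# As root, one would then do:
# for x in $(irqs_for mydrv < /proc/interrupts); do
#	cat /proc/irq/$x/affinity_hint > /proc/irq/$x/smp_affinity
# done
```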

I noticed nvme doesn't seem to suffer from this; somehow the hints are
taken automatically (er, I don't recall whether /proc/irq/*/smp_affinity
matches affinity_hint for nvme, but interrupts seem to be spread around
without doing anything special).  I haven't seen anything affinity-related
in the nvme code that I'm not already doing in my driver, so it is a
mystery to me why that difference in behavior occurs.

-- steve



Thread overview: 21+ messages
2012-12-11  0:00 SCSI mid layer and high IOPS capable devices scameron
2012-12-11  8:21 ` Bart Van Assche
2012-12-11 22:46   ` scameron [this message]
2012-12-13 11:40     ` Bart Van Assche
2012-12-13 18:03       ` scameron
2012-12-13 17:18         ` Bart Van Assche
2012-12-13 15:22 ` Bart Van Assche
2012-12-13 17:25   ` scameron
2012-12-13 16:47     ` Bart Van Assche
2012-12-13 16:49       ` Christoph Hellwig
2012-12-14  9:44         ` Bart Van Assche
2012-12-14 16:44           ` scameron
2012-12-14 16:15             ` Bart Van Assche
2012-12-14 19:55               ` scameron
2012-12-14 19:28                 ` Bart Van Assche
2012-12-14 21:06                   ` scameron
2012-12-15  9:40                     ` Bart Van Assche
2012-12-19 14:23                       ` Christoph Hellwig
2012-12-13 21:20       ` scameron
2012-12-14  0:22       ` Jack Wang
     [not found]         ` <CADzpL0TMT31yka98Zv0=53N4=pDZOc9+gacnvDWMbj+iZg4H5w@mail.gmail.com>
     [not found]           ` <006301cdd99c$35099b40$9f1cd1c0$@com>
     [not found]             ` <CADzpL0S5cfCRQftrxHij8KOjKj55psSJedmXLBQz1uQm_SC30A@mail.gmail.com>
2012-12-14  4:59               ` Jack Wang
