From: scameron@beardog.cce.hp.com
To: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi@vger.kernel.org, stephenmcameron@gmail.com,
dab@hp.com, scameron@beardog.cce.hp.com
Subject: Re: SCSI mid layer and high IOPS capable devices
Date: Tue, 11 Dec 2012 16:46:26 -0600
Message-ID: <20121211224626.GB20898@beardog.cce.hp.com>
In-Reply-To: <50C6ED1A.7040404@acm.org>
On Tue, Dec 11, 2012 at 09:21:46AM +0100, Bart Van Assche wrote:
> On 12/11/12 01:00, scameron@beardog.cce.hp.com wrote:
> >I tried using scsi_debug with fake_rw and also the scsi_ram driver
> >that was recently posted to get some idea of what the maximum IOPS
> >that could be pushed through the SCSI midlayer might be, and the
> >numbers were a little disappointing (was getting around 150k iops
> >with scsi_debug with reads and writes faked, and around 3x that
> >with the block driver actually doing the i/o).
>
> With which request size was that ?
4k. (I'm thinking the request size shouldn't matter much, since
fake_rw=1 causes the i/o not to actually be done -- no data is
transferred. Similarly, scsi_ram has a flag to discard reads and
writes, which I was using.)
> I see about 330K IOPS @ 4 KB and
> about 540K IOPS @ 512 bytes with the SRP protocol, a RAM disk at the
> target side, a single SCSI LUN and a single IB cable. These results have
> been obtained on a setup with low-end CPU's. Had you set rq_affinity to
> 2 in your tests ?
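(For anyone following along: rq_affinity is a per-queue sysfs knob; 2 forces the
block-layer completion to run on the exact CPU that submitted the request, while
1 only groups completions by cache domain. A minimal sketch of setting it -- the
helper name and the optional sysfs-root argument are mine, added so it can be
exercised outside of a real /sys:)

```shell
# set_rq_affinity: write and read back queue/rq_affinity for a block device.
# $1 = device name (e.g. sdb), $2 = value (0, 1, or 2),
# $3 = sysfs root, defaults to /sys/block (overridable for testing).
set_rq_affinity() {
    dev=$1
    val=$2
    sysroot=${3:-/sys/block}
    echo "$val" > "$sysroot/$dev/queue/rq_affinity" &&
    cat "$sysroot/$dev/queue/rq_affinity"
}
```

e.g. `set_rq_affinity sdb 2` on a real system.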
No, hadn't done anything with rq_affinity. I had spread interrupts
around by turning off irqbalance and echoing things into /proc/irq/*, and
running a bunch of dd processes (one per cpu) like this:
taskset -c $cpu dd if=/dev/blah of=/dev/null bs=4k iflag=direct &
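(The full loop I use amounts to one pinned dd per online CPU. A sketch -- the
gen_dd_cmds helper is mine and just prints the command lines as a dry run, so
you can inspect them or pipe them to sh; /dev/blah is the placeholder device
from above:)

```shell
# gen_dd_cmds: print one taskset-pinned, direct-I/O dd command per CPU.
# $1 = device to read (placeholder), $2 = CPU count, defaults to nproc.
gen_dd_cmds() {
    dev=$1
    ncpus=${2:-$(nproc)}
    cpu=0
    while [ "$cpu" -lt "$ncpus" ]; do
        echo "taskset -c $cpu dd if=$dev of=/dev/null bs=4k iflag=direct &"
        cpu=$((cpu + 1))
    done
}
```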
And the hardware in this case should route the interrupts back to the processor
which submitted the i/o (the submitted command contains info that lets the hw
know which msix vector we want the io to come back on.)
I would be curious to see what kind of results you would get with scsi_debug
with fake_rw=1. I am starting to suspect that trying to put an "upper limit"
on scsi LLD IOPS performance by seeing what scsi_debug will do with fake_rw=1
is not really valid (or maybe I'm doing it wrong), as I know of one case in
which a real HW scsi driver beats scsi_debug fake_rw=1 at IOPS on the very
same system -- which seems like it shouldn't be possible. Kind of mysterious.
Another mystery I haven't been able to clear up -- I'm using code like
this to set affinity hints:

	int i, cpu;

	cpu = cpumask_first(cpu_online_mask);
	for (i = 0; i < h->noqs; i++) {
		int idx = i ? i + 1 : i;
		int rc;

		rc = irq_set_affinity_hint(h->qinfo[idx].msix_vector,
					get_cpu_mask(cpu));
		if (rc)
			dev_warn(&h->pdev->dev,
				"Failed to hint affinity of vector %d to cpu %d\n",
				h->qinfo[idx].msix_vector, cpu);
		cpu = cpumask_next(cpu, cpu_online_mask);
	}
and those hints are set (querying /proc/irq/*/affinity_hint shows that my hints
are in there) but the hints are not "taken", that is, /proc/irq/*/smp_affinity
does not match the hints.
Doing this:

	for x in `seq $first_irq $last_irq`
	do
		cat /proc/irq/$x/affinity_hint > /proc/irq/$x/smp_affinity
	done

(where first_irq and last_irq specify the range of irqs assigned
to my driver) makes the hints be "taken".
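(A quick way to see whether the hints were taken is to compare affinity_hint
against smp_affinity per IRQ. A sketch -- the check_hints helper is mine, and
the optional third argument overrides /proc/irq so it can be tried against a
fake directory tree:)

```shell
# check_hints: for each IRQ in [first, last], report whether smp_affinity
# matches affinity_hint. $3 = irq root, defaults to /proc/irq.
check_hints() {
    first=$1
    last=$2
    root=${3:-/proc/irq}
    for irq in $(seq "$first" "$last"); do
        hint=$(cat "$root/$irq/affinity_hint")
        cur=$(cat "$root/$irq/smp_affinity")
        if [ "$hint" = "$cur" ]; then
            echo "irq $irq: hint taken"
        else
            echo "irq $irq: hint $hint, actual $cur"
        fi
    done
}
```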
I noticed nvme doesn't seem to suffer from this, somehow the hints are
taken automatically (er, I don't recall if /proc/irq/*/smp_affinity matches
affinity_hints for nvme, but interrupts seem spread around without doing
anything special). I haven't seen anything in the nvme code related to affinity
that I'm not already doing as well in my driver, so it is a mystery to me why
that difference in behavior occurs.
-- steve