linux-scsi.vger.kernel.org archive mirror
From: "Jack Wang" <jack_wang@usish.com>
To: 'Bart Van Assche' <bvanassche@acm.org>, scameron@beardog.cce.hp.com
Cc: linux-scsi@vger.kernel.org, stephenmcameron@gmail.com, dab@hp.com
Subject: RE: SCSI mid layer and high IOPS capable devices
Date: Fri, 14 Dec 2012 08:22:26 +0800
Message-ID: <005c01cdd991$18d59bf0$4a80d3d0$@com>
In-Reply-To: <50CA0692.2010903@acm.org>

On 12/13/12 18:25, scameron@beardog.cce.hp.com wrote:
> On Thu, Dec 13, 2012 at 04:22:33PM +0100, Bart Van Assche wrote:
>> On 12/11/12 01:00, scameron@beardog.cce.hp.com wrote:
>>> The driver, like nvme, has a submit and reply queue per cpu.
>>
>> This is interesting. If my interpretation of the POSIX spec is 
>> correct then aio_write() allows overlapping writes to be queued, and all 
>> writes submitted by the same thread have to be performed in the order 
>> they were submitted by that thread. What if a thread submits a first 
>> write via aio_write(), gets rescheduled on another CPU and submits a 
>> second overlapping write also via aio_write() ? If a block driver 
>> uses one queue per CPU, does that mean that such writes that were 
>> issued in order can be executed in a different order by the driver 
>> and/or hardware than the order in which the writes were submitted ?
>>
>> See also the aio_write() man page, The Open Group Base Specifications 
>> Issue 7, IEEE Std 1003.1-2008
>> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/aio_write.html).
>
> It is my understanding that the low level driver is free to re-order 
> the i/o's any way it wants, as is the hardware.  It is up to the 
> layers above to enforce any ordering requirements.  For a long time 
> there was a bug in the cciss driver that all i/o's submitted to the 
> driver got reversed in order -- adding to head of a list instead of to 
> the tail, or vice versa, I forget which -- and it caused no real 
> problems (apart from some slight performance issues that were mostly
> masked by the Smart Array's cache).
> It was caught by firmware guys noticing LBAs coming in in weird orders 
> for supposedly sequential workloads.
>
> So in your scenario, I think the overlapping writes should not be 
> submitted by the block layer to the low level driver concurrently, as 
> the block layer is aware that the lld is free to re-order things.  (I 
> am very certain that this is the case for scsi low level drivers and 
> block drivers using a request_fn interface -- less certain about block 
> drivers using the make_request interface to submit i/o's, as this 
> interface is pretty new to me.)

As far as I know there are basically two choices:
1. Allow the LLD to reorder any pair of write requests. The only way
    for higher layers to ensure the order of (overlapping) writes is then
    to separate these in time. Or in other words, limit write request
    queue depth to one.
2. Do not allow the LLD to reorder overlapping write requests. This
    allows higher software layers to queue write requests (queue depth
    > 1).
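
The difference between the two choices can be illustrated with a toy model.
This is only a sketch under stated assumptions: the "disk" is a 16-byte
in-memory buffer, the "LLD" picks the next write at random, and every name
below (overlaps, apply_writes, dispatch, outcomes, WRITES) is hypothetical
and not taken from any kernel interface.

```python
import random

def overlaps(a, b):
    """True if two (offset, data) writes share at least one byte."""
    return a[0] < b[0] + len(b[1]) and b[0] < a[0] + len(a[1])

def apply_writes(writes, size=16):
    """Apply writes in the given order to a toy disk and return its contents."""
    disk = bytearray(b"." * size)
    for offset, data in writes:
        disk[offset:offset + len(data)] = data
    return bytes(disk)

def dispatch(writes, rng, keep_overlapping_order):
    """Toy LLD: return the writes in the order they get executed.

    keep_overlapping_order=False models choice 1 (any pair may be swapped).
    keep_overlapping_order=True models choice 2 (a write is held back while
    an earlier-submitted overlapping write is still pending).
    """
    pending = list(range(len(writes)))
    executed = []
    while pending:
        if keep_overlapping_order:
            ready = [i for i in pending
                     if not any(j < i and overlaps(writes[i], writes[j])
                                for j in pending)]
        else:
            ready = pending
        pick = rng.choice(ready)
        pending.remove(pick)
        executed.append(writes[pick])
    return executed

def outcomes(writes, keep_overlapping_order, seeds=range(200)):
    """Set of distinct final disk contents seen over many random schedules."""
    return {apply_writes(dispatch(writes, random.Random(s), keep_overlapping_order))
            for s in seeds}

# Two overlapping writes (A then B) plus one independent write (C).
WRITES = [(0, b"AAAA"), (2, b"BBBB"), (8, b"CC")]

print("submission order:", apply_writes(WRITES))
print("choice 1 results:", sorted(outcomes(WRITES, False)))
print("choice 2 results:", sorted(outcomes(WRITES, True)))
```

With choice 2 every schedule ends with the same contents as executing the
writes in submission order, even though all three writes are queued at once.
With choice 1 the final contents depend on which of A and B happens to run
last, so the submitter would have to wait for A to complete before queuing B.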

From my experience with block and SCSI drivers, option (1) doesn't look
attractive from a performance point of view. From what I have seen
performance with QD=1 is several times lower than performance with QD > 1.
But maybe I overlooked something ?
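
One way to see why QD=1 caps throughput is Little's law (requests in flight =
throughput x latency). The sketch below assumes a fixed per-request latency
and a device-side IOPS ceiling; both numbers are made up for illustration and
are not measurements from this thread, and the function name iops is
hypothetical. Real devices do not keep latency constant as queue depth grows,
so this is only a first-order model.

```python
def iops(queue_depth, latency_us, device_max_iops):
    """Upper bound on IOPS from Little's law, capped by the device limit.

    queue_depth:     number of requests kept in flight
    latency_us:      assumed per-request latency in microseconds
    device_max_iops: assumed ceiling the device cannot exceed
    """
    unconstrained = queue_depth * 1_000_000 // latency_us
    return min(unconstrained, device_max_iops)

# Illustrative numbers only: 100 us latency, 200k IOPS device ceiling.
for qd in (1, 8, 32):
    print(qd, iops(qd, latency_us=100, device_max_iops=200_000))
```

In this model a single outstanding request can never exceed 1 / latency
requests per second, no matter how fast the device is; only a deeper queue
lets throughput approach the device ceiling.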



Bart.

I have seen low queue depth improve sequential performance, and high queue
depth improve random performance.

Jack
