* [SCSI] qla2xxx: Question about out-of-order SCSI command processing
From: Alireza Haghdoost @ 2013-12-02 22:12 UTC
To: linux-scsi; +Cc: Andrew Vasquez, linux-driver, Jerry Fredin
Hi,
We are working on a very high I/O throughput application and are facing
a challenge in sending I/O requests to a SAN drive in order. I would
appreciate it if you could help us with an explanation of this
unexpected behavior of the SCSI layer or the qla2xxx driver.
The problem is that I/O requests arrive out of order at the SAN
controller if we submit them with a gap of about 1us. In other words,
we observed that if we issue I/O requests very fast, they arrive at the
SAN controller out of order. For example, we observe I/O requests
dispatched from the block layer request_queue in the order shown in the
left column, which then arrive at the SAN controller in the order shown
in the right column:
Dispatched from     Arrived at
block layer         SAN controller
================    ==============
LBA1                LBA2
LBA2                LBA3
LBA3                LBA1
LBA4                LBA4
We tracked the request ordering from the application layer down to the
block layer, and it seems the block layer pulls requests off the
request_queue in order. These requests are also handed to the
low-level SCSI driver (qla2xxx) in order. However, the ordering has
changed by the time the requests arrive at the SAN controller.
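The comparison in the table above can be done mechanically on traces of any length. A minimal Python sketch, assuming the dispatch order and the arrival order are each available as a list of request identifiers (the function name `find_overtaken` and the trace format are hypothetical, not part of any tracing tool):

```python
def find_overtaken(dispatched, arrived):
    """Return the requests that arrived after a request dispatched later."""
    # Position of every request in the dispatch order.
    rank = {req: i for i, req in enumerate(dispatched)}
    overtaken = []
    highest_seen = -1
    for req in arrived:
        if rank[req] < highest_seen:
            # A later-dispatched request has already arrived.
            overtaken.append(req)
        else:
            highest_seen = rank[req]
    return overtaken


dispatched = ["LBA1", "LBA2", "LBA3", "LBA4"]
arrived = ["LBA2", "LBA3", "LBA1", "LBA4"]
print(find_overtaken(dispatched, arrived))
```

For the four requests in the table, only LBA1 is reported: LBA2 and LBA3 kept their relative order and LBA4 was not affected.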
There could be multiple reasons for such undesirable behavior. Since
the SCSI commands enter the qla2xxx TCQ queue in order, I think the
low-level SCSI driver is not pulling these commands off the TCQ in
order.
I was wondering if there is any way to tune qla2xxx to behave like a
FIFO, so that it does not change the ordering of the SCSI commands?
Note that we are using kernel 3.11 and libaio to perform async I/O.
Moreover, I don't see any out-of-order requests arriving at the SAN
controller initially, when the load on the host is limited. However, I
start to see an out-of-order pattern when /sys/block/sdX/inflight is
above 500. I don't think qla2xxx is overloaded yet, since it can
accommodate up to 2048 in-flight SCSI commands. Moreover, I don't see
out-of-order requests arriving at the SAN if I submit the requests with
a 5us gap. In this case, the number of in-flight SCSI commands can also
shoot up to 2048, but the ordering is maintained.
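The in-flight figure mentioned above comes from /sys/block/sdX/inflight, which holds two counters: reads in flight and writes in flight. A minimal Python sketch for sampling it, assuming the 500 threshold from this report; the helper names `parse_inflight`, `read_inflight` and `over_threshold` are hypothetical:

```python
from pathlib import Path


def parse_inflight(text):
    """Split the sysfs line into (reads_in_flight, writes_in_flight)."""
    reads, writes = (int(field) for field in text.split())
    return reads, writes


def read_inflight(device):
    """Read the counters for a block device name such as 'sdb'."""
    return parse_inflight(Path(f"/sys/block/{device}/inflight").read_text())


def over_threshold(text, threshold=500):
    """True when reads plus writes in flight exceed the threshold."""
    return sum(parse_inflight(text)) > threshold


# Example with a captured sample instead of a live device.
sample = "     312      201\n"
print(parse_inflight(sample), over_threshold(sample))
```

Sampling this file in a loop alongside the arrival trace would show whether reordering correlates with queue occupancy or only with the submission gap.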
I would appreciate it if you could provide an explanation for this
undesirable behavior of the SCSI subsystem.
Thanks,
Alireza
* Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing
From: James Bottomley @ 2013-12-02 23:38 UTC
To: Alireza Haghdoost; +Cc: linux-scsi, Andrew Vasquez, linux-driver, Jerry Fredin
On Mon, 2013-12-02 at 16:12 -0600, Alireza Haghdoost wrote:
> Hi,
>
> We are working on a very high I/O throughput application and are facing
> a challenge in sending I/O requests to a SAN drive in order. I would
> appreciate it if you could help us with an explanation of this
> unexpected behavior of the SCSI layer or the qla2xxx driver.
>
> The problem is that I/O requests arrive out of order at the SAN
> controller if we submit them with a gap of about 1us. In other words,
> we observed that if we issue I/O requests very fast, they arrive at the
> SAN controller out of order. For example, we observe I/O requests
> dispatched from the block layer request_queue in the order shown in the
> left column, which then arrive at the SAN controller in the order shown
> in the right column:
Well this would be because we don't guarantee order at any granularity
below barriers. We won't reorder across barriers but below them we can
reorder the commands and, of course, we use simple tags for queuing
which entitles the underlying storage hardware to reorder within its
internal queue. Previously, when everything was single threaded issue,
you mostly got FIFO behaviour because reorder really only occurred on
error or busy, but I would imagine that's changing now with multiqueue.
James
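The guarantee described in this reply (no reordering across barriers, free reordering between them) can be written down as a check on an observed arrival order. A minimal Python sketch, assuming the submitted stream is a list of request identifiers with a marker object where each barrier was issued; `BARRIER`, `epochs` and `respects_barriers` are hypothetical names used only for illustration:

```python
BARRIER = object()  # hypothetical marker for a barrier in the submitted stream


def epochs(submitted):
    """Split the submitted stream into groups separated by barriers."""
    groups, current = [], []
    for item in submitted:
        if item is BARRIER:
            groups.append(current)
            current = []
        else:
            current.append(item)
    groups.append(current)
    return groups


def respects_barriers(submitted, arrived):
    """True if no request arrives before a request from an earlier epoch."""
    epoch_of = {}
    for number, group in enumerate(epochs(submitted)):
        for req in group:
            epoch_of[req] = number
    seen = [epoch_of[req] for req in arrived]
    # Order inside an epoch is free; epoch numbers must never decrease.
    return seen == sorted(seen)


submitted = ["A", "B", BARRIER, "C", "D"]
print(respects_barriers(submitted, ["B", "A", "D", "C"]))
print(respects_barriers(submitted, ["A", "C", "B", "D"]))
```

The first arrival order swaps requests only inside an epoch and passes; the second lets C overtake B across the barrier and fails. By this definition the LBA table in the original report is permitted behaviour, since no barrier separates those four requests.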
* Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing
From: Bart Van Assche @ 2013-12-03 12:09 UTC
To: Alireza Haghdoost, linux-scsi; +Cc: Andrew Vasquez, linux-driver, Jerry Fredin
On 12/02/13 23:12, Alireza Haghdoost wrote:
> Note that we are using kernel 3.11 and libaio to perform async IO.
I think libaio can reorder commands before these reach the SCSI core.
Bart.
* Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing
From: Bart Van Assche @ 2013-12-03 12:26 UTC
To: James Bottomley, Alireza Haghdoost
Cc: linux-scsi, Andrew Vasquez, linux-driver, Jerry Fredin
On 12/03/13 00:38, James Bottomley wrote:
> Well this would be because we don't guarantee order at any granularity
> below barriers. We won't reorder across barriers but below them we can
> reorder the commands and, of course, we use simple tags for queuing
> which entitles the underlying storage hardware to reorder within its
> internal queue. Previously, when everything was single threaded issue,
> you mostly got FIFO behaviour because reorder really only occurred on
> error or busy, but I would imagine that's changing now with multiqueue.
Reordering SCSI commands was fine as long as hard disks were the only
supported storage medium. This is because most hard disk controllers do
not perform writes in the order these writes are submitted to their
controller. However, with several SSD models it is possible to tell the
controller to preserve write order. Furthermore, the optimizations that
are possible by using atomic writes are only safe if it is guaranteed
that none of the layers between the application and the SCSI target
changes the order in which an application submitted these atomic writes.
In other words, although it was safe in the past to reorder the writes
submitted between two successive barriers such reordering would
eliminate several of the benefits of atomic writes. A quote from the
draft SCSI atomics specification
(http://www.t10.org/cgi-bin/ac.pl?t=d&f=13-064r7.pdf):
<quote>
Atomic writes may:
a) increase write endurance
A) reducing writes increases the life of a flash-based SSD
b) increase performance
A) reducing writes results in fewer system calls, fewer I/Os over
the SCSI transport protocol, and fewer interrupts
c) improve reliability for non-journaled data
d) simplify applications
A) reduce or eliminate journaling
B) keep applications from managing atomicity
</quote>
Bart.
* Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing
From: James Bottomley @ 2013-12-03 13:25 UTC
To: Bart Van Assche
Cc: Alireza Haghdoost, linux-scsi, Andrew Vasquez, linux-driver,
Jerry Fredin
On Tue, 2013-12-03 at 13:26 +0100, Bart Van Assche wrote:
> On 12/03/13 00:38, James Bottomley wrote:
> > Well this would be because we don't guarantee order at any granularity
> > below barriers. We won't reorder across barriers but below them we can
> > reorder the commands and, of course, we use simple tags for queuing
> > which entitles the underlying storage hardware to reorder within its
> > internal queue. Previously, when everything was single threaded issue,
> > you mostly got FIFO behaviour because reorder really only occurred on
> > error or busy, but I would imagine that's changing now with multiqueue.
>
> Reordering SCSI commands was fine as long as hard disks were the only
> supported storage medium. This is because most hard disk controllers do
> not perform writes in the order these writes are submitted to their
> controller.
Well, no, we could have used Ordered instead of Simple tags ... that
would preserve submission order according to spec. This wouldn't really
work for SATA because NCQ only has simple tags. The point is that our
granular unit of ordering is between two barriers, which is way above
the request/tag level so we didn't bother to enforce tag ordering. We
discussed it over the course of several years, because strict ordering
would have relieved us of the need to do barriers. However, handling
strict ordering in the face of requeuing events like QUEUE FULL or BUSY
is hard so we didn't bother.
James
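Why requeuing makes strict ordering hard can be seen in a toy model: while a rejected command waits to be retried, a command issued behind it is accepted first. A minimal Python sketch under that assumption (it models one later command slipping ahead per rejection and is not how the SCSI mid-layer is implemented; `deliver` and `rejected_once` are hypothetical names):

```python
def deliver(commands, rejected_once):
    """Return the order in which the target accepts the commands.

    commands: identifiers in submission order.
    rejected_once: identifiers the target rejects (QUEUE FULL or BUSY)
                   on their first attempt and accepts on the retry.
    """
    pending = list(commands)
    already_rejected = set()
    accepted = []
    while pending:
        cmd = pending.pop(0)
        if cmd in rejected_once and cmd not in already_rejected:
            already_rejected.add(cmd)
            # The retry goes out after the next command that is already
            # on its way, so that command overtakes the rejected one.
            pending.insert(1, cmd)
        else:
            accepted.append(cmd)
    return accepted


print(deliver([1, 2, 3, 4], rejected_once={2}))
```

With command 2 rejected once, the target sees 1, 3, 2, 4. Preserving order would require holding back every command behind the rejected one, which is the cost the discussion above refers to.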
> However, with several SSD models it is possible to tell the
> controller to preserve write order. Furthermore, the optimizations that
> are possible by using atomic writes are only safe if it is guaranteed
> that none of the layers between the application and the SCSI target
> changes the order in which an application submitted these atomic writes.
> In other words, although it was safe in the past to reorder the writes
> submitted between two successive barriers such reordering would
> eliminate several of the benefits of atomic writes. A quote from the
> draft SCSI atomics specification
> (http://www.t10.org/cgi-bin/ac.pl?t=d&f=13-064r7.pdf):
> <quote>
> Atomic writes may:
> a) increase write endurance
> A) reducing writes increases the life of a flash-based SSD
> b) increase performance
> A) reducing writes results in fewer system calls, fewer I/Os over
> the SCSI transport protocol, and fewer interrupts
> c) improve reliability for non-journaled data
> d) simplify applications
> A) reduce or eliminate journaling
> B) keep applications from managing atomicity
> </quote>
>
> Bart.
* Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing
From: Alireza Haghdoost @ 2013-12-03 17:19 UTC
To: Bart Van Assche; +Cc: linux-scsi, Andrew Vasquez, linux-driver, Jerry Fredin
On Tue, Dec 3, 2013 at 6:09 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> I think libaio can reorder commands before these reach the SCSI core.
Thanks for your comments. I think libaio sits above the block layer.
Since we observed that ordering is maintained in the block layer, I
don't think libaio reorders I/O requests within an I/O context.
* Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing
From: Alireza Haghdoost @ 2013-12-03 17:46 UTC
To: James Bottomley
Cc: Bart Van Assche, linux-scsi, Andrew Vasquez, linux-driver,
Jerry Fredin
On Tue, Dec 3, 2013 at 7:25 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> Well, no, we could have used Ordered instead of Simple tags ... that
> would preserve submission order according to spec. This wouldn't really
> work for SATA because NCQ only has simple tags.
Thanks a lot, James, for your comments. Is it possible to configure TCQ
to use Ordered tags instead of Simple tags? I understand that NCQ
does not support Ordered tags, but I think it would be nice to keep
this functionality as an option for other SCSI targets like qla2xxx.
I can see the discussion about tag ordering on the mailing list.
However, I am not sure whether it is functional right now.
> The point is that our
> granular unit of ordering is between two barriers, which is way above
> the request/tag level so we didn't bother to enforce tag ordering.
Does a barrier force a flush of all in-flight SCSI commands? Based on my
understanding, if we put a barrier between multiple requests, it won't
return until TCQ has processed all in-flight SCSI commands. That means
we cannot keep a fixed load on TCQ, and it would certainly reduce the
throughput of our application.
> However, handling
> strict ordering in the face of requeuing events like QUEUE FULL or BUSY
> is hard so we didn't bother.
We have a piece of code that monitors in-flight requests and avoids
QUEUE FULL events. However, would you please let us know of a case that
causes a BUSY event? Does it mean the SCSI target is busy processing
other requests from within the same host machine?
* Re: [SCSI] qla2xxx: Question about out-of-order SCSI command processing
From: James Bottomley @ 2013-12-03 18:34 UTC
To: Alireza Haghdoost
Cc: Bart Van Assche, linux-scsi, Andrew Vasquez, linux-driver,
Jerry Fredin
On Tue, 2013-12-03 at 11:46 -0600, Alireza Haghdoost wrote:
> On Tue, Dec 3, 2013 at 7:25 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > Well, no, we could have used Ordered instead of Simple tags ... that
> > would preserve submission order according to spec. This wouldn't really
> > work for SATA because NCQ only has simple tags.
>
> Thanks a lot, James, for your comments. Is it possible to configure TCQ
> to use Ordered tags instead of Simple tags? I understand that NCQ
> does not support Ordered tags, but I think it would be nice to keep
> this functionality as an option for other SCSI targets like qla2xxx.
> I can see the discussion about tag ordering on the mailing list.
> However, I am not sure whether it is functional right now.
It's set in the scsi_populate_tag() inline function (scsi_tcq.h).
That's currently hard coded to simple tags.
> > The point is that our
> > granular unit of ordering is between two barriers, which is way above
> > the request/tag level so we didn't bother to enforce tag ordering.
>
> Does a barrier force a flush of all in-flight SCSI commands?
A flush barrier does, yes ... that's the predominant implementation.
> Based on my
> understanding, if we put a barrier between multiple requests, it won't
> return until TCQ has processed all in-flight SCSI commands. That means
> we cannot keep a fixed load on TCQ, and it would certainly reduce the
> throughput of our application.
Yes, that's what we see in filesystems with barriers enabled. It's the
price we pay for integrity.
> > However, handling
> > strict ordering in the face of requeuing events like QUEUE FULL or BUSY
> > is hard so we didn't bother.
>
> We have a piece of code that monitors in-flight requests and avoids
> QUEUE FULL events. However, would you please let us know of a case that
> causes a BUSY event? Does it mean the SCSI target is busy processing
> other requests from within the same host machine?
BUSY is a catch all status. It's different from QUEUE FULL because
queue tracking algorithms use QUEUE FULL to determine the optimal number
of in-flight commands (we can do that in the mid-layer today with the
queue full tracking code). BUSY means the command needs retrying
because of some other condition on the initiator that isn't connected
with the task queues, so it isn't counted against the queue full status
tracking for reducing command flows. It's most often returned by
multi-initiator devices in the presence of management (or even
statistics type) conditions, or because of scheduling or caching issues.
James
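The distinction drawn in this reply is that both statuses cause a retry, but only QUEUE FULL feeds the queue-depth tracking. A minimal Python sketch of such a tracker, using the SAM status codes BUSY (0x08) and TASK SET FULL (0x28, the status commonly called QUEUE FULL); the class `DepthTracker` and its depth-clamping rule are assumptions for illustration, not the mid-layer's actual algorithm:

```python
GOOD = 0x00
BUSY = 0x08
TASK_SET_FULL = 0x28  # SAM status commonly referred to as QUEUE FULL


class DepthTracker:
    """Hypothetical per-device tracker of the allowed queue depth."""

    def __init__(self, depth):
        self.depth = depth
        self.retries = 0

    def on_status(self, status, in_flight):
        """Handle a completion status; return True if a retry is needed.

        in_flight counts outstanding commands including the one that
        just completed with this status.
        """
        if status == TASK_SET_FULL:
            # The target could not queue this command, so it can hold at
            # most the commands that were outstanding besides it.
            self.depth = max(1, min(self.depth, in_flight - 1))
            self.retries += 1
            return True
        if status == BUSY:
            # Retry, but leave the depth estimate alone: BUSY says
            # nothing about how full the task queue is.
            self.retries += 1
            return True
        return False


tracker = DepthTracker(depth=64)
tracker.on_status(BUSY, in_flight=40)
print(tracker.depth)
tracker.on_status(TASK_SET_FULL, in_flight=40)
print(tracker.depth)
```

After the BUSY completion the depth stays at 64; after the QUEUE FULL completion with 40 commands outstanding it drops to 39.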
Thread overview: 8+ messages
2013-12-02 22:12 [SCSI] qla2xxx: Question about out-of-order SCSI command processing Alireza Haghdoost
2013-12-02 23:38 ` James Bottomley
2013-12-03 12:26   ` Bart Van Assche
2013-12-03 13:25     ` James Bottomley
2013-12-03 17:46       ` Alireza Haghdoost
2013-12-03 18:34         ` James Bottomley
2013-12-03 12:09 ` Bart Van Assche
2013-12-03 17:19   ` Alireza Haghdoost