* Adjusting minimum packet size or "wait to merge requests" in SRP
@ 2009-10-28 18:47 Chris Worley
From: Chris Worley @ 2009-10-28 18:47 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel
It appears that SRP tries to coalesce and fragment initiator I/O
requests into 64KB packets, as that looks to be the size requested
to/from the device on the target side (and the I/O scheduler is
disabled on the target).
Is there a way to control this, where no coalescing occurs when
latency is an issue and requests are small, and no fragmentation
occurs when requests are large?
Or, am I totally wrong in my assumption that SRP is coalescing/fragmenting data?
Thanks,
Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Adjusting minimum packet size or "wait to merge requests" in SRP
@ 2009-10-28 19:14 Bart Van Assche
From: Bart Van Assche @ 2009-10-28 19:14 UTC (permalink / raw)
To: Chris Worley; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel

On Wed, Oct 28, 2009 at 7:47 PM, Chris Worley <worleys-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> It appears that SRP tries to coalesce and fragment initiator I/O
> requests into 64KB packets, as that looks to be the size requested
> to/from the device on the target side (and the I/O scheduler is
> disabled on the target).
>
> Is there a way to control this, where no coalescing occurs when
> latency is an issue and requests are small, and no fragmentation
> occurs when requests are large?
>
> Or, am I totally wrong in my assumption that SRP is coalescing/fragmenting data?

Regarding avoiding coalescing of I/O requests: which I/O scheduler is
being used on the initiator system, and how has it been configured via
sysfs?

Adjusting the constant MAX_RDMA_SIZE in scst/srpt/src/ib_srpt.h might
help to avoid fragmentation of large requests by the SRP protocol.
Please post a follow-up message to the mailing list with your findings,
so that MAX_RDMA_SIZE can be converted from a compile-time constant
into a sysfs variable if that turns out to be useful.

Bart.
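Bart's sysfs question can be answered from the shell; a minimal sketch, assuming a standard block-layer sysfs layout ($DEV is a placeholder for the SRP block device, e.g. sdb):

```shell
# Show the active I/O scheduler for the device; the one in use is
# printed in brackets, e.g. "[noop] anticipatory deadline cfq".
cat /sys/block/$DEV/queue/scheduler

# Dump all queue tunables at once (max_sectors_kb, nr_requests, ...)
# so the full initiator-side configuration can be posted to the list.
grep . /sys/block/$DEV/queue/*
```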
* Re: Adjusting minimum packet size or "wait to merge requests" in SRP
@ 2009-10-28 19:38 Chris Worley
From: Chris Worley @ 2009-10-28 19:38 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel

On Wed, Oct 28, 2009 at 1:14 PM, Bart Van Assche <bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
<snip>
> Regarding avoiding coalescing of I/O requests: which I/O scheduler is
> being used on the initiator system and how has it been configured via
> sysfs?

There is no scheduler running on either the target or the initiator on
the drives in question (sorry, I worded that incorrectly initially), or
so I've been told (this information is second-hand).

I did see iostat output from the initiator in his case, with long waits
and service times that I'm guessing were due to some
coalescing/merging. There was also a hint in the iostat output that a
scheduler was enabled: there were occasional non-zero values under the
[rw]rqm/s columns, which, if I understand iostat correctly, means a
scheduler is merging requests.

So you're saying there is no hold-off for merging on the initiator side
of the IB/SRP stack?

> Adjusting the constant MAX_RDMA_SIZE in scst/srpt/src/ib_srpt.h might
> help to avoid fragmentation of large requests by the SRP protocol.
> Please post a follow-up message to the mailing list with your findings
> such that MAX_RDMA_SIZE can be converted from a compile-time constant
> to a sysfs variable if this would be useful.

Will do.

Thanks,

Chris
* Re: Adjusting minimum packet size or "wait to merge requests" in SRP
@ 2009-10-28 19:58 David Dillow
From: David Dillow @ 2009-10-28 19:58 UTC (permalink / raw)
To: Chris Worley
Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel

On Wed, 2009-10-28 at 13:38 -0600, Chris Worley wrote:
> There is no scheduler running on either target or initiator on the
> drives in question (sorry I worded that incorrectly initially), or so
> I've been told (this information is second-hand).

So, noop scheduler, then?

Under noop, the block layer will send requests as soon as it can
without merging. If it has more requests outstanding than the queue
length on the SRP initiator, then it will merge the new request with
the queued ones if possible.

<snip>
> So you're saying there is no hold-off for merging on the initiator
> side of the IB/SRP stack?

The SRP initiator just hands off requests as quickly as they are sent
to it by the block layer. You can control how big those requests are by
tuning /sys/block/$DEV/queue/max_sectors_kb up to
.../max_hw_sectors_kb, which gets set by the max_sect parameter when
adding the SRP target.

You can potentially get some hold-off by using a non-noop scheduler for
the block device; see /sys/block/$DEV/queue/scheduler. 'as' or
'deadline' may fit your bill, but they have a habit of breaking up
requests into smaller chunks.

Also, you want 'options ib_srp srp_sg_tablesize=255' in
/etc/modprobe.conf, as by default it only allows 12 scatter/gather
entries, which only guarantees a 48KB request size. Using 255
guarantees you can send a 1020KB request. Of course, if the pages
coalesce in the request, you can send much larger requests before
running out of S/G entries. max_sectors_kb will limit what gets sent in
either case.

--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
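The guarantee arithmetic above, as a quick sketch: it assumes 4 KB pages, where each scatter/gather entry maps at least one page, so the guaranteed request size is simply entries times page size. The helper name and $DEV are placeholders, not part of ib_srp.

```shell
# Minimum request size (in KB) guaranteed by a given number of S/G
# entries, assuming 4 KB pages: each entry covers at least one page.
sg_guarantee_kb() {
    echo $(( $1 * 4 ))
}

sg_guarantee_kb 12    # default srp_sg_tablesize -> 48  (KB)
sg_guarantee_kb 255   # tuned srp_sg_tablesize   -> 1020 (KB)

# The corresponding initiator-side knobs from the thread:
#   cat  /sys/block/$DEV/queue/max_hw_sectors_kb   # set via max_sect
#   echo 512 > /sys/block/$DEV/queue/max_sectors_kb
#   echo 'options ib_srp srp_sg_tablesize=255' >> /etc/modprobe.conf
```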
* Re: Adjusting minimum packet size or "wait to merge requests" in SRP
@ 2009-10-28 20:25 Chris Worley
From: Chris Worley @ 2009-10-28 20:25 UTC (permalink / raw)
To: David Dillow
Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel

On Wed, Oct 28, 2009 at 1:58 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> So, noop scheduler, then?

Yes, "elevator=noop" on both sides. Again, sorry to be unclear about
that.

> Under noop, the block layer will send requests as soon as it can without
> merging. If it has more requests outstanding than the queue length on
> the SRP initiator, then it will merge the new request with the queued
> ones if possible.

So, noop will merge requests when the queue is full, but will not hold
off to merge?

<snip>
> The SRP initiator just hands off requests as quick as they are sent to
> it by the block layer. You can control how big those requests are by
> tuning /sys/block/$DEV/queue/max_sectors_kb up to .../max_hw_sectors_kb
> which gets set by the max_sect parameter when adding the SRP target.

So the block layer may also hold off on small requests, and decreasing
max_sectors_kb will force it to flush to the SRP initiator ASAP (or is
this just used for fragmentation of large requests)? Note that I'm
trying to minimize latency for very small requests.

Thanks,

Chris
* Re: Adjusting minimum packet size or "wait to merge requests" in SRP
@ 2009-10-28 21:05 David Dillow
From: David Dillow @ 2009-10-28 21:05 UTC (permalink / raw)
To: Chris Worley
Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, scst-devel

On Wed, 2009-10-28 at 16:25 -0400, Chris Worley wrote:
> So, noop will merge requests when the queue is full, but not hold-off
> to merge?

Correct.

> So the block layer may also hold-off on small requests, and decreasing
> max_sectors_kb will force it to flush to the SRP initiator ASAP (or is
> this just used for fragmentation of large requests)?

It is just used for breaking up large requests. The deadline, as, and
cfq schedulers may have some hold-off -- I've not checked -- but noop
does not.

You can check the length of the queue by looking at
/sys/class/scsi_disk/$TARGET/device/queue_depth. That may well be 63,
which is the maximum queue depth for the SRP initiator unless you patch
the source. Keep in mind that those 63 requests are shared across all
LUNs on that connection, so you may queue up before that point if you
are driving many LUNs.

> Note that I'm trying to minimize latency for very small requests.

Reads or writes? Are you doing direct I/O or plain read/write? File
system or block device access? Are you using the SCSI devices (/dev/sda
etc.) or DM multipath (/dev/mpath/*)?

The SRP initiator is playing the cards it has been dealt, but you could
be getting coalescing from the rest of the system -- for example, I
have no idea whether the SRP target code will do read-ahead and turn a
4KB request into a 64KB one, but I suspect it is possible. You can also
turn on SCSI logging to see what is being handed to the initiator, to
be sure which side of the connection this is occurring on.
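The checks Dave suggests can be done from the shell; a sketch under stated assumptions: $TARGET is a placeholder SCSI address (e.g. 8:0:0:0), and the logging value written below is an assumption on my part -- any non-zero bitmask enables some SCSI mid-layer logging, but consult your kernel's documentation for the exact bits.

```shell
# Per-LUN queue depth advertised by the initiator (often 63 for ib_srp,
# shared across all LUNs on the connection):
cat /sys/class/scsi_disk/$TARGET/device/queue_depth

# Enable SCSI mid-layer logging (as root) to see the requests handed to
# the initiator; write 0 to the same file to turn it back off:
echo 1 > /proc/sys/dev/scsi/logging_level
```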
* Re: Adjusting minimum packet size or "wait to merge requests" in SRP
@ 2009-10-28 19:51 Roland Dreier
From: Roland Dreier @ 2009-10-28 19:51 UTC (permalink / raw)
To: Chris Worley; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel

> It appears that SRP tries to coalesce and fragment initiator I/O
> requests into 64KB packets, as that looks to be the size requested
> to/from the device on the target side (and the I/O scheduler is
> disabled on the target).

There is no code in the SRP initiator that does anything to change I/O
requests that I know of, so I think this is happening somewhere higher
in the stack.

 - R.
* Re: [Scst-devel] Adjusting minimum packet size or "wait to merge requests" in SRP
@ 2009-10-29 18:30 Vladislav Bolkhovitin
From: Vladislav Bolkhovitin @ 2009-10-29 18:30 UTC (permalink / raw)
To: Chris Worley; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, scst-devel

Chris Worley, on 10/28/2009 09:47 PM wrote:
> It appears that SRP tries to coalesce and fragment initiator I/O
> requests into 64KB packets, as that looks to be the size requested
> to/from the device on the target side (and the I/O scheduler is
> disabled on the target).
>
> Is there a way to control this, where no coalescing occurs when
> latency is an issue and requests are small, and no fragmentation
> occurs when requests are large?
>
> Or, am I totally wrong in my assumption that SRP is coalescing/fragmenting data?

You can see the size of the requests you are receiving on the target
side at any time, either by enabling "scsi" logging (hopefully you know
how to do that) or by looking in /proc/scsi_tgt/sgv. In the latter file
you will see general statistics for power-of-2 allocations, i.e. a
request for 10K will increase the 16K row.

Vlad
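The power-of-2 bucketing Vlad describes can be sketched as a small helper -- this illustrates only the rounding rule (a size is counted under the next power-of-2 KB row), and is not SCST code:

```shell
# Which row of /proc/scsi_tgt/sgv a request of the given size (in KB)
# is counted under: round up to the next power of 2, so 10K -> 16K.
sgv_bucket_kb() {
    local kb=$1 bucket=1
    while [ "$bucket" -lt "$kb" ]; do
        bucket=$(( bucket * 2 ))
    done
    echo "$bucket"
}

sgv_bucket_kb 10   # prints 16
sgv_bucket_kb 64   # prints 64
```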
end of thread, other threads: [~2009-10-29 18:30 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-28 18:47 Adjusting minimum packet size or "wait to merge requests" in SRP Chris Worley
[not found] ` <f3177b9e0910281147u5a47f75ao8bbe156d5b04969c-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-10-28 19:14 ` Bart Van Assche
[not found] ` <e2e108260910281214y5e3b5f4u24438986672e81b3-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-10-28 19:38 ` Chris Worley
[not found] ` <f3177b9e0910281238n1e53653eq3e667010caf8e745-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-10-28 19:58 ` David Dillow
[not found] ` <1256759902.3544.9.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
2009-10-28 20:25 ` Chris Worley
[not found] ` <f3177b9e0910281325i5ef5ce86u758ed665329232f2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-10-28 21:05 ` David Dillow
2009-10-28 19:51 ` Roland Dreier
2009-10-29 18:30 ` [Scst-devel] " Vladislav Bolkhovitin