From: Bernd Schubert <bs_lists-ivAEE9vf7JuUmYeGgvxl9AC/G2K4zDHf@public.gmane.org>
To: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Bernd Schubert <bschubert-LfVdkaOWEx8@public.gmane.org>
Subject: Re: srp sg_tablesize
Date: Sat, 21 Aug 2010 20:20:54 +0200 [thread overview]
Message-ID: <201008212020.55028.bs_lists@aakef.fastmail.fm> (raw)
In-Reply-To: <AANLkTimFS=QkHd9+393mS1gQ5ZnL79jSDQaUZ8C_Xd2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Saturday, August 21, 2010, Bart Van Assche wrote:
> On Sat, Aug 21, 2010 at 6:27 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> > On Sat, 2010-08-21 at 13:14 +0200, Bart Van Assche wrote:
> > > On Fri, Aug 20, 2010 at 9:49 AM, Bernd Schubert
> > >
> > > <bs_lists-ivAEE9vf7JuUmYeGgvxl9AC/G2K4zDHf@public.gmane.org> wrote:
> > > > In ib_srp.c sg_tablesize is defined as 255. With that value we see
> > > > lots of IO requests of size 1020. As I already wrote on linux-scsi,
> > > > that is really sub-optimal for DDN storage, as lots of IO requests
> > > > of size 1020 come up.
> > > >
> > > > Now the question is whether we can safely increase it. Is there a
> > > > definition somewhere of the real hardware-supported size? And
> > > > shouldn't we not only increase sg_tablesize, but also set the
> > > > .dma_boundary value?
> > >
> > > (resending as plain text)
> > >
> > > The request size of 1020 indicates that there are fewer than 60 data
> > > buffer descriptors in the SRP_CMD request. So you are probably hitting
> > > a limit other than srp_sg_tablesize.
> >
> > 4 KB * 255 descriptors = 1020 KB
> >
> > IIRC, we verified that we were seeing 255 entries in the S/G list with a
> > few printk()s, but it has been a few years.
> >
> > I'm not sure how you came up with 60 descriptors -- could you elaborate
> > please?
>
> The original message mentions "size 1020" but not the unit of that
> size. So I guessed that this referred to an SRP_CMD information unit
> of 1020 bytes. And in a SRP_CMD message of 1020 bytes there fit at
> most 59 descriptors ((1020-68)/16). Now that I see your computation,
> I'm afraid that my guess about the meaning of the original message was
> wrong. Looks like I have been delving too deep into the SRP protocol
Er, sorry, I really meant 1020K I/Os. That is something that can easily be
monitored on DDN storage.
> > ...
>
> > > Did this occur with buffered (asynchronous) or unbuffered (direct)
> > > I/O? And in the first case, which I/O scheduler did you use?
> >
> > I'm sure Bernd will speak for his situation, but we've seen it with both
> > buffered and unbuffered, with the deadline and noop schedulers (mostly
> > on vendor 2.6.18 kernels). CFQ never gave us larger than 512 KB
> > requests. Our main use is Lustre, which does unbuffered IO from the
> > kernel.
>
> If ib_srp is already sending SRP commands with 255 descriptors,
> changing the configuration of the I/O scheduler or the I/O mode will
> not help.
>
> What might help - depending on how the target is implemented - is
> using an I/O depth larger than one. ib_srp sends all SRP_CMDs with the
It depends on whether we enable the write-back cache or not. The older S2A
architecture does not mirror the cache at all, and therefore the write-back
cache is supposed to be disabled. The recent SFA architecture mirrors the
write-back cache, so it is supposed to be enabled. With the write-back cache
enabled, an 'improved' command processing is done (I don't know the details
myself). However, cache mirroring is an expensive operation on a system that
can do 10 GB/s, and I/Os only go through the cache if their size is not a
multiple of 1024 KiB; 1 MiB I/Os are sent directly to the disks. And that
leaves us with SRP, where we see too many 1020 KiB requests, which all have
to be processed by the write-back cache...
> task attribute SIMPLE, so a target is allowed to process these
> requests concurrently. For the ib_srpt target I see the following
> results over a single QDR link and a NULLIO target (fio
> --bs=$((1020*1024)) --ioengine=psync --buffered=0 --rw=read --thread
> --numjobs=${threads} --group_reporting --gtod_reduce=1 --name=${dev}
> --filename=${dev}):
>
> I/O depth Bandwidth (MB/s)
> 1 1270
> 2 2300
> 4 2500
> 8 2670
> 16 2700
>
> That last result is close to the bandwidth reported by ib_rdma_bw.
How exactly do you do that? Is that something I could try with our storage as
well? I guess it would require a special firmware version, which I also do not
have access to.
Thanks,
Bernd
--
Bernd Schubert
DataDirect Networks
Thread overview: 12+ messages
2010-08-20  7:49 srp sg_tablesize — Bernd Schubert
2010-08-20 14:15 ` David Dillow
2010-08-24 19:47   ` Bernd Schubert
2010-08-24 20:23     ` David Dillow
2010-08-21 11:14 ` Bart Van Assche
2010-08-21 16:27   ` David Dillow
2010-08-21 17:28     ` Bart Van Assche
2010-08-21 18:20       ` Bernd Schubert [this message]
2010-08-21 20:50         ` David Dillow
2010-08-22  7:15         ` Bart Van Assche
2010-08-21 20:38     ` David Dillow
2010-08-21 18:04 ` Bernd Schubert