From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: 'Chuck Lever' <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: "'Amrani,
Ram'" <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>,
"'Elior,
Ariel'" <Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>,
"'Kalderon,
Michal'"
<Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
'Hariprasad S'
<hariprasad-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>,
'Faisal Latif'
<faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
'Doug Ledford' <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: RE: NFSoRDMA Fails for max_sge Less Than 18
Date: Wed, 11 Jan 2017 13:53:40 -0600 [thread overview]
Message-ID: <038901d26c44$67d6c4f0$37844ed0$@opengridcomputing.com> (raw)
In-Reply-To: <28B0D906-7BDB-4B87-94E9-6BE263BFBFF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> Hi Steve-
>
>
> > On Jan 11, 2017, at 12:04 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> wrote:
> >
> > Hey Chuck,
> >
> >>>
> >>> Browsing the code of other drivers it can be seen that this ability is
> > either
> >> hardcoded or is
> >>> learnt by the driver from the device.
> >>
> >> In the latter case, there's no way for me to know what that
> >> capability is by looking at kernel code. There's also no way
> >> for me to know about out-of-tree drivers or pre-release devices.
> >
> > But shouldn't NFS always limit its sge depths based on
ib_device_attr->max_sge?
> > I don't think it is reasonable to assume any minimum value supported by all
> > devices...
>
> The previous "minimum" requirement assumed 2 SGEs. RPC-over-RDMA just
> doesn't work with fewer than that.
>
> So you would rather have RPC-over-RDMA automatically reduce the inline
> threshold setting for each connection if the current system setting
> cannot be supported by the device? I'll consider that.
>
That seems reasonable.
> Keeping logic in the ULPs to handle devices with tiny capabilities
> duplicates a lot of complexity and impedes the introduction of new
> features like larger inline thresholds.
>
> The one abstract feature ULPs might really want is the ability to
> send medium-sized data payloads in place. More than a handful of
> SGEs is needed for that capability (in the kernel, where such payloads
> are typically in the page cache).
>
> It might be cool to have an API similar to rdma_rw that allows ULPs
> to use a scatterlist for Send and Receive operations. It could hide
> the driver and device maximum SGE values.
>
I'm not sure what you mean by "in place"? (sorry for being not up to speed on
this whole issue) But perhaps some API like this could be added to rdma_rw...
>
> >> It's not feasible for me to stock my lab with more than a
> >> couple of devices anyway.
> >>
> >> For all these reasons, I rely on HCA vendors for smoke testing
> >> NFS/RDMA with their devices.
> >>
> >> [1] was posted for review on public mailing lists for weeks. I
> >> received no review comments or reports of testing successes or
> >> failures from any vendor, until Broadcom's report in late
> >> December, three months after [1] appeared in a kernel release
> >> candidate.
> >>
> >> This may sound like sour grapes, but this is a review and
> >> testing gap, and I think the community should have the ability
> >> to address it.
> >>
> >> HCA vendors, especially, have to focus on kernel release
> >> candidate testing if functional ULPs are a critical release
> >> criterion for them.
> >>
> >
> > You're absolutely right. I'm querying Chelsio to see how this might have
> > slipped through the cracks. Did this initial change land in linux-4.9?
>
> I believe so.
>
>
> > I have one nit though, your patch series are always very long and thus, to
me,
> > tedious to review. It would be nice to see 5-8 patches submitted for review
vs
> > 15+.
>
> I cap my patch series around 20 for just this reason. That
> seemed to be the average number being posted for other ULPs.
>
> The flip side is that sometimes it takes several quarters to
> get a full set of changes upstream. Splitting features across
> kernel releases means the feature can't be reviewed together,
> and is sometimes more difficult for distribution backports.
>
> Could also go with more smaller patches, where each patch is
> easier to review, or capping at 8 patches but each patch
> is more complex.
>
> It's also OK to suggest series reorganization whenever you
> feel the ennui. ;-)
>
>
> >>> If I'm not mistaken, this issue affects nes and
> >>> cxgb3/4 drivers, and perhaps others.
> >>
> >> ocrdma and Oracle's HCA.
> >>
> >>
> >>> E.g., for cxgb4:
> >>>
> >>> #define T4_MAX_RECV_SGE 4
> >>
> >> Yet, without hard-coded max_sge values in kernel drivers, it's
> >> difficult to say whether 4 is truly the lower bound.
> >>
> >>
> >>> static int c4iw_query_device(struct ib_device *ibdev, struct
> > ib_device_attr
> >> *props,
> >>> struct ib_udata *uhw)
> >>> {
> >>> ...
> >>> props->max_sge = T4_MAX_RECV_SGE;
> >>>
> >>> ***
> >
> > FYI: cxgb4 supports 4 max for recv wrs, and 17 max for send wrs. Perhaps
17
> > avoided any problems for cxgb4 with the original code?
>
> The original code needed only two SGEs for sending, and one for
> receiving.
>
> IIRC the RPC-over-RDMA receive path still needs just one SGE.
>
No I mean the code that bumps it up to 18. Would that cause an immediate
failure if cxgb4 supported 17 and only enforces it at post_send() time?
(haven't looked in detail at your patches...sorry). Our QA ran testing on 4.9
and didn't see this issue, so that's why I'm asking. They have not yet run
NFS/RDMA testing on 4.9-rc. I've asked them to do a quick regression test asap.
>
> > Note: the ib_device_attr only has a max_sge that pertains to both send and
recv,
> > so cxgb4 sets it to the min value. We should probably add a max_recv_sge
and
> > max_send_sge to ib_device_attr...
>
> I could go for that too.
>
I'm swamped right now to add this, but the changes should be trivial...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-01-11 19:53 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-11 7:41 NFSoRDMA Fails for max_sge Less Than 18 Amrani, Ram
[not found] ` <SN1PR07MB2207F28F05DC6E22B03CC516F8660-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-01-11 16:38 ` Chuck Lever
[not found] ` <FE817A76-28A7-4AEC-AF1E-01DE15790E43-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-11 17:04 ` Steve Wise
2017-01-11 19:40 ` Chuck Lever
[not found] ` <28B0D906-7BDB-4B87-94E9-6BE263BFBFF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-11 19:53 ` Steve Wise [this message]
2017-01-11 20:09 ` Chuck Lever
[not found] ` <383D1FFB-346B-40E9-A174-606F13AFF849-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-11 20:18 ` Steve Wise
2017-01-11 20:35 ` Chuck Lever
[not found] ` <A7A39994-66C6-4467-837B-288348C0CC53-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-13 22:42 ` Steve Wise
2017-01-13 22:43 ` Chuck Lever
[not found] ` <906A1B75-67E9-4D10-AD87-18F694F54818-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-13 22:45 ` Steve Wise
2017-01-19 17:19 ` Amrani, Ram
2017-01-11 21:11 ` Jason Gunthorpe
[not found] ` <20170111211123.GD28917-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-11 21:15 ` Steve Wise
2017-01-11 21:34 ` Jason Gunthorpe
[not found] ` <20170111213416.GA30681-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-11 21:48 ` Steve Wise
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='038901d26c44$67d6c4f0$37844ed0$@opengridcomputing.com' \
--to=swise-7bpotxp6k4+p2yhjcf5u+vpxobypeauw@public.gmane.org \
--cc=Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org \
--cc=Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org \
--cc=Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org \
--cc=chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
--cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
--cc=hariprasad-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.