* NFSoRDMA Fails for max_sge Less Than 18
@ 2017-01-11 7:41 Amrani, Ram
[not found] ` <SN1PR07MB2207F28F05DC6E22B03CC516F8660-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Amrani, Ram @ 2017-01-11 7:41 UTC (permalink / raw)
To: Chuck Lever
Cc: Elior, Ariel, Kalderon, Michal,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Hariprasad S,
Steve Wise, Faisal Latif, Doug Ledford
Hi Chuck,
We discovered that your recent work (see [1]) on NFSoRDMA broke that functionality on our device.
This seems to stem from a new requirement of a minimum of 18 SGEs for NFSoRDMA to work.
Our device supports only 4 SGEs, and it seems other devices also have limitations in that
regard which would prevent NFSoRDMA from working on them.
Mounting NFS over RDMA fails with the message: "Cannot allocate memory".
After enabling RPC debug information we've found this is due to this piece of code
from net/sunrpc/xprtrdma/verbs.c:
	if (ia->ri_device->attrs.max_sge < RPCRDMA_MAX_SEND_SGES) {
		dprintk("RPC: %s: insufficient sge's available\n",
			__func__);
		return -ENOMEM;
	}
Our device supports 4 SGEs while the minimum is now 18 SGEs, for a PAGE_SIZE of 4KB:
#define RPCRDMA_MAX_INLINE (65536) /* max inline thresh */
RPCRDMA_MAX_SEND_PAGES = PAGE_SIZE + RPCRDMA_MAX_INLINE - 1,
RPCRDMA_MAX_PAGE_SGES = (RPCRDMA_MAX_SEND_PAGES >> PAGE_SHIFT) + 1,
RPCRDMA_MAX_SEND_SGES = 1 + 1 + RPCRDMA_MAX_PAGE_SGES + 1,
On kernel 4.8 and before, NFSoRDMA worked well with our device as only 2 SGEs were required.
The code looked like this:
#define RPCRDMA_MAX_IOVS (2)
	if (ia->ri_device->attrs.max_sge < RPCRDMA_MAX_IOVS) {
		dprintk("RPC: %s: insufficient sge's available\n",
			__func__);
		return -ENOMEM;
	}
Browsing the code of other drivers it can be seen that this ability is either hardcoded or is
learnt by the driver from the device. If I'm not mistaken, this issue affects nes and
cxgb3/4 drivers, and perhaps others.
E.g., for cxgb4:
#define T4_MAX_RECV_SGE 4
static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
			     struct ib_udata *uhw)
{
	...
	props->max_sge = T4_MAX_RECV_SGE;
***
[1] https://patchwork.kernel.org/patch/9333951/
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <SN1PR07MB2207F28F05DC6E22B03CC516F8660-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-01-11 16:38 ` Chuck Lever
[not found] ` <FE817A76-28A7-4AEC-AF1E-01DE15790E43-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2017-01-11 16:38 UTC (permalink / raw)
To: Amrani, Ram
Cc: Elior, Ariel, Kalderon, Michal,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Hariprasad S,
Steve Wise, Faisal Latif, Doug Ledford
> On Jan 11, 2017, at 2:41 AM, Amrani, Ram <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org> wrote:
>
> Hi Chuck,
> We discovered that your recent work (see [1]) on NFSoRDMA broke that functionality on our device.
> This seems to stem from a new requirement of a minimum of 18 SGEs for NFSoRDMA to work.
This issue was reported weeks ago by Broadcom, and I have a fix
pending for v4.11.
http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=commit;h=a300d316ac76ad000e14c4d309afdcdb6c0bd9ac
The above fix reduces the minimum requirement to 5 SGEs, so it
probably won't address the issue for your device (though
Broadcom reported that the fix worked for them).
> Our device supports only 4 SGEs, and it seems other devices also have limitations in that
> regard which would prevent NFSoRDMA from working on them.
Of course NFS/RDMA should work for all in-tree drivers.
I will revisit [1] and see if there's any way to manage with 4
SGEs. I think reducing the minimum to a single partial or whole
page should be enough.
If not, I will send a revert for [1] for v4.10-rc.
> Mounting NFS over RDMA fails with the message: "Cannot allocate memory".
> After enabling RPC debug information we've found this is due to this piece of code
> from net/sunrpc/xprtrdma/verbs.c:
>
> if (ia->ri_device->attrs.max_sge < RPCRDMA_MAX_SEND_SGES) {
> dprintk("RPC: %s: insufficient sge's available\n",
> __func__);
> return -ENOMEM;
> }
>
> Our device supports 4 SGEs while the minimum is now 18 SGEs, for a PAGE_SIZE of 4KB:
>
> #define RPCRDMA_MAX_INLINE (65536) /* max inline thresh */
>
> RPCRDMA_MAX_SEND_PAGES = PAGE_SIZE + RPCRDMA_MAX_INLINE - 1,
> RPCRDMA_MAX_PAGE_SGES = (RPCRDMA_MAX_SEND_PAGES >> PAGE_SHIFT) + 1,
> RPCRDMA_MAX_SEND_SGES = 1 + 1 + RPCRDMA_MAX_PAGE_SGES + 1,
>
> On kernel 4.8 and before, NFSoRDMA worked well with our device as only 2 SGEs were required.
> The code looked like this:
> #define RPCRDMA_MAX_IOVS (2)
>
> if (ia->ri_device->attrs.max_sge < RPCRDMA_MAX_IOVS) {
> dprintk("RPC: %s: insufficient sge's available\n",
> __func__);
> return -ENOMEM;
> }
>
> Browsing the code of other drivers it can be seen that this ability is either hardcoded or is
> learnt by the driver from the device.
In the latter case, there's no way for me to know what that
capability is by looking at kernel code. There's also no way
for me to know about out-of-tree drivers or pre-release devices.
It's not feasible for me to stock my lab with more than a
couple of devices anyway.
For all these reasons, I rely on HCA vendors for smoke testing
NFS/RDMA with their devices.
[1] was posted for review on public mailing lists for weeks. I
received no review comments or reports of testing successes or
failures from any vendor, until Broadcom's report in late
December, three months after [1] appeared in a kernel release
candidate.
This may sound like sour grapes, but this is a review and
testing gap, and I think the community should have the ability
to address it.
HCA vendors, especially, have to focus on kernel release
candidate testing if functional ULPs are a critical release
criterion for them.
> If I'm not mistaken, this issue affects nes and
> cxgb3/4 drivers, and perhaps others.
ocrdma and Oracle's HCA.
> E.g., for cxgb4:
>
> #define T4_MAX_RECV_SGE 4
Yet, without hard-coded max_sge values in kernel drivers, it's
difficult to say whether 4 is truly the lower bound.
> static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> struct ib_udata *uhw)
> {
> ...
> props->max_sge = T4_MAX_RECV_SGE;
>
> ***
> [1] https://patchwork.kernel.org/patch/9333951/
>
--
Chuck Lever
--
* RE: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <FE817A76-28A7-4AEC-AF1E-01DE15790E43-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-01-11 17:04 ` Steve Wise
2017-01-11 19:40 ` Chuck Lever
0 siblings, 1 reply; 16+ messages in thread
From: Steve Wise @ 2017-01-11 17:04 UTC (permalink / raw)
To: 'Chuck Lever', 'Amrani, Ram'
Cc: 'Elior, Ariel', 'Kalderon, Michal',
linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Hariprasad S',
'Faisal Latif', 'Doug Ledford'
Hey Chuck,
> >
> > Browsing the code of other drivers it can be seen that this ability is
> > either hardcoded or is learnt by the driver from the device.
>
> In the latter case, there's no way for me to know what that
> capability is by looking at kernel code. There's also no way
> for me to know about out-of-tree drivers or pre-release devices.
But shouldn't NFS always limit its sge depths based on ib_device_attr->max_sge?
I don't think it is reasonable to assume any minimum value supported by all
devices...
>
> It's not feasible for me to stock my lab with more than a
> couple of devices anyway.
>
> For all these reasons, I rely on HCA vendors for smoke testing
> NFS/RDMA with their devices.
>
> [1] was posted for review on public mailing lists for weeks. I
> received no review comments or reports of testing successes or
> failures from any vendor, until Broadcom's report in late
> December, three months after [1] appeared in a kernel release
> candidate.
>
> This may sound like sour grapes, but this is a review and
> testing gap, and I think the community should have the ability
> to address it.
>
> HCA vendors, especially, have to focus on kernel release
> candidate testing if functional ULPs are a critical release
> criterion for them.
>
You're absolutely right. I'm querying Chelsio to see how this might have
slipped through the cracks. Did this initial change land in linux-4.9?
I have one nit though, your patch series are always very long and thus, to me,
tedious to review. It would be nice to see 5-8 patches submitted for review vs
15+.
>
> > If I'm not mistaken, this issue affects nes and
> > cxgb3/4 drivers, and perhaps others.
>
> ocrdma and Oracle's HCA.
>
>
> > E.g., for cxgb4:
> >
> > #define T4_MAX_RECV_SGE 4
>
> Yet, without hard-coded max_sge values in kernel drivers, it's
> difficult to say whether 4 is truly the lower bound.
>
>
> > static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> >                              struct ib_udata *uhw)
> > {
> > ...
> > props->max_sge = T4_MAX_RECV_SGE;
> >
> > ***
FYI: cxgb4 supports 4 max for recv wrs, and 17 max for send wrs. Perhaps 17
avoided any problems for cxgb4 with the original code?
Note: the ib_device_attr only has a max_sge that pertains to both send and recv,
so cxgb4 sets it to the min value. We should probably add a max_recv_sge and
max_send_sge to ib_device_attr...
Steve.
--
* Re: NFSoRDMA Fails for max_sge Less Than 18
2017-01-11 17:04 ` Steve Wise
@ 2017-01-11 19:40 ` Chuck Lever
[not found] ` <28B0D906-7BDB-4B87-94E9-6BE263BFBFF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2017-01-11 19:40 UTC (permalink / raw)
To: Steve Wise
Cc: Amrani, Ram, Elior, Ariel, Kalderon, Michal,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hariprasad S, Faisal Latif,
Doug Ledford
Hi Steve-
> On Jan 11, 2017, at 12:04 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>
> Hey Chuck,
>
>>>
>>> Browsing the code of other drivers it can be seen that this ability is
>>> either hardcoded or is learnt by the driver from the device.
>>
>> In the latter case, there's no way for me to know what that
>> capability is by looking at kernel code. There's also no way
>> for me to know about out-of-tree drivers or pre-release devices.
>
> But shouldn't NFS always limit its sge depths based on ib_device_attr->max_sge?
> I don't think it is reasonable to assume any minimum value supported by all
> devices...
The previous "minimum" requirement assumed 2 SGEs. RPC-over-RDMA just
doesn't work with fewer than that.
So you would rather have RPC-over-RDMA automatically reduce the inline
threshold setting for each connection if the current system setting
cannot be supported by the device? I'll consider that.
Keeping logic in the ULPs to handle devices with tiny capabilities
duplicates a lot of complexity and impedes the introduction of new
features like larger inline thresholds.
The one abstract feature ULPs might really want is the ability to
send medium-sized data payloads in place. More than a handful of
SGEs is needed for that capability (in the kernel, where such payloads
are typically in the page cache).
It might be cool to have an API similar to rdma_rw that allows ULPs
to use a scatterlist for Send and Receive operations. It could hide
the driver and device maximum SGE values.
>> It's not feasible for me to stock my lab with more than a
>> couple of devices anyway.
>>
>> For all these reasons, I rely on HCA vendors for smoke testing
>> NFS/RDMA with their devices.
>>
>> [1] was posted for review on public mailing lists for weeks. I
>> received no review comments or reports of testing successes or
>> failures from any vendor, until Broadcom's report in late
>> December, three months after [1] appeared in a kernel release
>> candidate.
>>
>> This may sound like sour grapes, but this is a review and
>> testing gap, and I think the community should have the ability
>> to address it.
>>
>> HCA vendors, especially, have to focus on kernel release
>> candidate testing if functional ULPs are a critical release
>> criterion for them.
>>
>
> You're absolutely right. I'm querying Chelsio to see how this might have
> slipped through the cracks. Did this initial change land in linux-4.9?
I believe so.
> I have one nit though, your patch series are always very long and thus, to me,
> tedious to review. It would be nice to see 5-8 patches submitted for review vs
> 15+.
I cap my patch series around 20 for just this reason. That
seemed to be the average number being posted for other ULPs.
The flip side is that sometimes it takes several quarters to
get a full set of changes upstream. Splitting features across
kernel releases means the feature can't be reviewed together,
and is sometimes more difficult for distribution backports.
Could also go with more smaller patches, where each patch is
easier to review, or capping at 8 patches but each patch
is more complex.
It's also OK to suggest series reorganization whenever you
feel the ennui. ;-)
>>> If I'm not mistaken, this issue affects nes and
>>> cxgb3/4 drivers, and perhaps others.
>>
>> ocrdma and Oracle's HCA.
>>
>>
>>> E.g., for cxgb4:
>>>
>>> #define T4_MAX_RECV_SGE 4
>>
>> Yet, without hard-coded max_sge values in kernel drivers, it's
>> difficult to say whether 4 is truly the lower bound.
>>
>>
>>> static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
>>>                              struct ib_udata *uhw)
>>> {
>>> ...
>>> props->max_sge = T4_MAX_RECV_SGE;
>>>
>>> ***
>
> FYI: cxgb4 supports 4 max for recv wrs, and 17 max for send wrs. Perhaps 17
> avoided any problems for cxgb4 with the original code?
The original code needed only two SGEs for sending, and one for
receiving.
IIRC the RPC-over-RDMA receive path still needs just one SGE.
> Note: the ib_device_attr only has a max_sge that pertains to both send and recv,
> so cxgb4 sets it to the min value. We should probably add a max_recv_sge and
> max_send_sge to ib_device_attr...
I could go for that too.
--
Chuck Lever
--
* RE: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <28B0D906-7BDB-4B87-94E9-6BE263BFBFF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-01-11 19:53 ` Steve Wise
2017-01-11 20:09 ` Chuck Lever
2017-01-11 21:11 ` Jason Gunthorpe
1 sibling, 1 reply; 16+ messages in thread
From: Steve Wise @ 2017-01-11 19:53 UTC (permalink / raw)
To: 'Chuck Lever'
Cc: 'Amrani, Ram', 'Elior, Ariel',
'Kalderon, Michal', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
'Hariprasad S', 'Faisal Latif',
'Doug Ledford'
> Hi Steve-
>
>
> > On Jan 11, 2017, at 12:04 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
> >
> > Hey Chuck,
> >
> >>>
> >>> Browsing the code of other drivers it can be seen that this ability is
> >>> either hardcoded or is learnt by the driver from the device.
> >>
> >> In the latter case, there's no way for me to know what that
> >> capability is by looking at kernel code. There's also no way
> >> for me to know about out-of-tree drivers or pre-release devices.
> >
> > But shouldn't NFS always limit its sge depths based on
> > ib_device_attr->max_sge?
> > I don't think it is reasonable to assume any minimum value supported by all
> > devices...
>
> The previous "minimum" requirement assumed 2 SGEs. RPC-over-RDMA just
> doesn't work with fewer than that.
>
> So you would rather have RPC-over-RDMA automatically reduce the inline
> threshold setting for each connection if the current system setting
> cannot be supported by the device? I'll consider that.
>
That seems reasonable.
> Keeping logic in the ULPs to handle devices with tiny capabilities
> duplicates a lot of complexity and impedes the introduction of new
> features like larger inline thresholds.
>
> The one abstract feature ULPs might really want is the ability to
> send medium-sized data payloads in place. More than a handful of
> SGEs is needed for that capability (in the kernel, where such payloads
> are typically in the page cache).
>
> It might be cool to have an API similar to rdma_rw that allows ULPs
> to use a scatterlist for Send and Receive operations. It could hide
> the driver and device maximum SGE values.
>
I'm not sure what you mean by "in place"? (sorry for being not up to speed on
this whole issue) But perhaps some API like this could be added to rdma_rw...
>
> >> It's not feasible for me to stock my lab with more than a
> >> couple of devices anyway.
> >>
> >> For all these reasons, I rely on HCA vendors for smoke testing
> >> NFS/RDMA with their devices.
> >>
> >> [1] was posted for review on public mailing lists for weeks. I
> >> received no review comments or reports of testing successes or
> >> failures from any vendor, until Broadcom's report in late
> >> December, three months after [1] appeared in a kernel release
> >> candidate.
> >>
> >> This may sound like sour grapes, but this is a review and
> >> testing gap, and I think the community should have the ability
> >> to address it.
> >>
> >> HCA vendors, especially, have to focus on kernel release
> >> candidate testing if functional ULPs are a critical release
> >> criterion for them.
> >>
> >
> > You're absolutely right. I'm querying Chelsio to see how this might have
> > slipped through the cracks. Did this initial change land in linux-4.9?
>
> I believe so.
>
>
> > I have one nit though, your patch series are always very long and thus,
> > to me, tedious to review. It would be nice to see 5-8 patches submitted
> > for review vs 15+.
>
> I cap my patch series around 20 for just this reason. That
> seemed to be the average number being posted for other ULPs.
>
> The flip side is that sometimes it takes several quarters to
> get a full set of changes upstream. Splitting features across
> kernel releases means the feature can't be reviewed together,
> and is sometimes more difficult for distribution backports.
>
> Could also go with more smaller patches, where each patch is
> easier to review, or capping at 8 patches but each patch
> is more complex.
>
> It's also OK to suggest series reorganization whenever you
> feel the ennui. ;-)
>
>
> >>> If I'm not mistaken, this issue affects nes and
> >>> cxgb3/4 drivers, and perhaps others.
> >>
> >> ocrdma and Oracle's HCA.
> >>
> >>
> >>> E.g., for cxgb4:
> >>>
> >>> #define T4_MAX_RECV_SGE 4
> >>
> >> Yet, without hard-coded max_sge values in kernel drivers, it's
> >> difficult to say whether 4 is truly the lower bound.
> >>
> >>
> >>> static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
> >>>                              struct ib_udata *uhw)
> >>> {
> >>> ...
> >>> props->max_sge = T4_MAX_RECV_SGE;
> >>>
> >>> ***
> >
> > FYI: cxgb4 supports 4 max for recv wrs, and 17 max for send wrs.
> > Perhaps 17 avoided any problems for cxgb4 with the original code?
>
> The original code needed only two SGEs for sending, and one for
> receiving.
>
> IIRC the RPC-over-RDMA receive path still needs just one SGE.
>
No I mean the code that bumps it up to 18. Would that cause an immediate
failure if cxgb4 supported 17 and only enforces it at post_send() time?
(haven't looked in detail at your patches...sorry). Our QA ran testing on 4.9
and didn't see this issue, so that's why I'm asking. They have not yet run
NFS/RDMA testing on 4.9-rc. I've asked them to do a quick regression test asap.
>
> > Note: the ib_device_attr only has a max_sge that pertains to both send
> > and recv, so cxgb4 sets it to the min value. We should probably add a
> > max_recv_sge and max_send_sge to ib_device_attr...
>
> I could go for that too.
>
I'm swamped right now to add this, but the changes should be trivial...
--
* Re: NFSoRDMA Fails for max_sge Less Than 18
2017-01-11 19:53 ` Steve Wise
@ 2017-01-11 20:09 ` Chuck Lever
[not found] ` <383D1FFB-346B-40E9-A174-606F13AFF849-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2017-01-11 20:09 UTC (permalink / raw)
To: Steve Wise
Cc: Amrani, Ram, Elior, Ariel, Kalderon, Michal,
List Linux RDMA Mailing, Hariprasad S, Faisal Latif, Doug Ledford
> On Jan 11, 2017, at 2:53 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>
>> Hi Steve-
>>
>>
>>> On Jan 11, 2017, at 12:04 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>>>
>>> Hey Chuck,
>>>
>>>>>
>>>>> Browsing the code of other drivers it can be seen that this ability is
>>>>> either hardcoded or is learnt by the driver from the device.
>>>>
>>>> In the latter case, there's no way for me to know what that
>>>> capability is by looking at kernel code. There's also no way
>>>> for me to know about out-of-tree drivers or pre-release devices.
>>>
>>> But shouldn't NFS always limit its sge depths based on
>>> ib_device_attr->max_sge?
>>> I don't think it is reasonable to assume any minimum value supported by all
>>> devices...
>>
>> The previous "minimum" requirement assumed 2 SGEs. RPC-over-RDMA just
>> doesn't work with fewer than that.
>>
>> So you would rather have RPC-over-RDMA automatically reduce the inline
>> threshold setting for each connection if the current system setting
>> cannot be supported by the device? I'll consider that.
>>
>
> That seems reasonable.
>
>> Keeping logic in the ULPs to handle devices with tiny capabilities
>> duplicates a lot of complexity and impedes the introduction of new
>> features like larger inline thresholds.
>>
>> The one abstract feature ULPs might really want is the ability to
>> send medium-sized data payloads in place. More than a handful of
>> SGEs is needed for that capability (in the kernel, where such payloads
>> are typically in the page cache).
>>
>> It might be cool to have an API similar to rdma_rw that allows ULPs
>> to use a scatterlist for Send and Receive operations. It could hide
>> the driver and device maximum SGE values.
>>
>
> I'm not sure what you mean by "in place"? (sorry for being not up to speed on
> this whole issue) But perhaps some API like this could be added to rdma_rw...
"in place" == the SGE array would point to struct pages
containing parts of the message payload. That's basically what
this "support large inline threshold" patch is doing.
If the device supports only 4 SGEs, then the largest
message size that can be sent this way is just one or two
pages.
Some would prefer to send much larger payloads this way.
I guess what I'm asking is whether 4 SGEs is going to be typical
of HCAs going forward, or whether there is a definite trend for
adding more in new device designs.
>>>> It's not feasible for me to stock my lab with more than a
>>>> couple of devices anyway.
>>>>
>>>> For all these reasons, I rely on HCA vendors for smoke testing
>>>> NFS/RDMA with their devices.
>>>>
>>>> [1] was posted for review on public mailing lists for weeks. I
>>>> received no review comments or reports of testing successes or
>>>> failures from any vendor, until Broadcom's report in late
>>>> December, three months after [1] appeared in a kernel release
>>>> candidate.
>>>>
>>>> This may sound like sour grapes, but this is a review and
>>>> testing gap, and I think the community should have the ability
>>>> to address it.
>>>>
>>>> HCA vendors, especially, have to focus on kernel release
>>>> candidate testing if functional ULPs are a critical release
>>>> criterion for them.
>>>>
>>>
>>> You're absolutely right. I'm querying Chelsio to see how this might have
>>> slipped through the cracks. Did this initial change land in linux-4.9?
>>
>> I believe so.
>>
>>
>>> I have one nit though, your patch series are always very long and thus,
>>> to me, tedious to review. It would be nice to see 5-8 patches submitted
>>> for review vs 15+.
>>
>> I cap my patch series around 20 for just this reason. That
>> seemed to be the average number being posted for other ULPs.
>>
>> The flip side is that sometimes it takes several quarters to
>> get a full set of changes upstream. Splitting features across
>> kernel releases means the feature can't be reviewed together,
>> and is sometimes more difficult for distribution backports.
>>
>> Could also go with more smaller patches, where each patch is
>> easier to review, or capping at 8 patches but each patch
>> is more complex.
>>
>> It's also OK to suggest series reorganization whenever you
>> feel the ennui. ;-)
>>
>>
>>>>> If I'm not mistaken, this issue affects nes and
>>>>> cxgb3/4 drivers, and perhaps others.
>>>>
>>>> ocrdma and Oracle's HCA.
>>>>
>>>>
>>>>> E.g., for cxgb4:
>>>>>
>>>>> #define T4_MAX_RECV_SGE 4
>>>>
>>>> Yet, without hard-coded max_sge values in kernel drivers, it's
>>>> difficult to say whether 4 is truly the lower bound.
>>>>
>>>>
>>>>> static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props,
>>>>>                              struct ib_udata *uhw)
>>>>> {
>>>>> ...
>>>>> props->max_sge = T4_MAX_RECV_SGE;
>>>>>
>>>>> ***
>>>
>>> FYI: cxgb4 supports 4 max for recv wrs, and 17 max for send wrs.
>>> Perhaps 17 avoided any problems for cxgb4 with the original code?
>>
>> The original code needed only two SGEs for sending, and one for
>> receiving.
>>
>> IIRC the RPC-over-RDMA receive path still needs just one SGE.
>>
>
> No I mean the code that bumps it up to 18. Would that cause an immediate
> failure if cxgb4 supported 17 and only enforces it at post_send() time?
"mount" would fail immediately if the driver reported max_sge == 17.
The check that Ram mentioned happens at mount time, before anything
has been sent.
> (haven't looked in detail at your patches...sorry). Our QA ran testing on 4.9
> and didn't see this issue, so that's why I'm asking. They have not yet run
> NFS/RDMA testing on 4.9-rc. I've asked them to do a quick regression test asap.
That's curious!
>>> Note: the ib_device_attr only has a max_sge that pertains to both send
>>> and recv, so cxgb4 sets it to the min value. We should probably add a
>>> max_recv_sge and max_send_sge to ib_device_attr...
>>
>> I could go for that too.
>>
>
> I'm swamped right now to add this, but the changes should be trivial...
Maybe I could get to it, but no promises.
--
Chuck Lever
--
* RE: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <383D1FFB-346B-40E9-A174-606F13AFF849-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-01-11 20:18 ` Steve Wise
2017-01-11 20:35 ` Chuck Lever
0 siblings, 1 reply; 16+ messages in thread
From: Steve Wise @ 2017-01-11 20:18 UTC (permalink / raw)
To: 'Chuck Lever'
Cc: 'Amrani, Ram', 'Elior, Ariel',
'Kalderon, Michal', 'List Linux RDMA Mailing',
'Hariprasad S', 'Faisal Latif',
'Doug Ledford'
<snip>
> >> It might be cool to have an API similar to rdma_rw that allows ULPs
> >> to use a scatterlist for Send and Receive operations. It could hide
> >> the driver and device maximum SGE values.
> >>
> >
> > I'm not sure what you mean by "in place"? (sorry for being not up to
> > speed on this whole issue) But perhaps some API like this could be
> > added to rdma_rw...
>
> "in place" == the SGE array would point to struct pages
> containing parts of the message payload. That's basically what
> this "support large inline threshold" patch is doing.
>
> If the device supports only 4 SGEs, then the largest
> message size that can be sent this way is just one or two
> pages.
>
> Some would prefer to send much larger payloads this way.
>
I'm not sure how the API would do this w/o having to send multiple ULP protocol
SEND messages and thus that moves the logic into the ULP. IE if the device only
supports 4 SGE, and that only allows 2 pages worth of inline data, then the ULP
needs to create multiple SEND messages with ULP headers in each, I would think.
Not sure how this could be done below the ULP...
> I guess what I'm asking is whether 4 SGEs is going to be typical
> of HCAs going forward, or whether there is a definite trend for
> adding more in new device designs.
>
The iWARP spec mandates 4 as the minimum. That's where the 4 came from for
iWARP devices...
<snip>
> >> The original code needed only two SGEs for sending, and one for
> >> receiving.
> >>
> >> IIRC the RPC-over-RDMA receive path still needs just one SGE.
> >>
> >
> > No I mean the code that bumps it up to 18. Would that cause an immediate
> > failure if cxgb4 supported 17 and only enforces it at post_send() time?
>
> "mount" would fail immediately if the driver reported max_sge == 17.
> The check that Ram mentioned happens at mount time, before anything
> has been sent.
>
Hmm...
>
> > (haven't looked in detail at your patches...sorry). Our QA ran testing
> > on 4.9 and didn't see this issue, so that's why I'm asking. They have
> > not yet run NFS/RDMA testing on 4.9-rc. I've asked them to do a quick
> > regression test asap.
>
Correction: I meant 4.10-rc above.
But still, I believe Chelsio tested 4.9, so perhaps it isn't the "mount" that
causes a failure but trying to send something with an SGE > 4 that happens
immediately after the mount? And since cxgb4 supports up to 17, the failure
wouldn't be seen until some inline message was attempted that required 18
sges...
> That's curious!
>
>
> >>> Note: the ib_device_attr only has a max_sge that pertains to both
> >>> send and recv, so cxgb4 sets it to the min value. We should probably
> >>> add a max_recv_sge and max_send_sge to ib_device_attr...
> >>
> >> I could go for that too.
> >>
> >
> > I'm swamped right now to add this, but the changes should be trivial...
>
> Maybe I could get to it, but no promises.
(I'm holding my breath! ;))
Steve.
--
* Re: NFSoRDMA Fails for max_sge Less Than 18
2017-01-11 20:18 ` Steve Wise
@ 2017-01-11 20:35 ` Chuck Lever
[not found] ` <A7A39994-66C6-4467-837B-288348C0CC53-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2017-01-11 20:35 UTC (permalink / raw)
To: Steve Wise
Cc: Amrani, Ram, Elior, Ariel, Kalderon, Michal,
List Linux RDMA Mailing, Hariprasad S, Faisal Latif, Doug Ledford
> On Jan 11, 2017, at 3:18 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>
> <snip>
>
>>>> It might be cool to have an API similar to rdma_rw that allows ULPs
>>>> to use a scatterlist for Send and Receive operations. It could hide
>>>> the driver and device maximum SGE values.
>>>>
>>>
>>> I'm not sure what you mean by "in place"? (sorry for being not up to speed
>>> on this whole issue) But perhaps some API like this could be added to
>>> rdma_rw...
>>
>> "in place" == the SGE array would point to struct pages
>> containing parts of the message payload. That's basically what
>> this "support large inline threshold" patch is doing.
>>
>> If the device supports only 4 SGEs, then the largest
>> message size that can be sent this way is just one or two
>> pages.
>>
>> Some would prefer to send much larger payloads this way.
>>
>
> I'm not sure how the API would do this w/o having to send multiple ULP protocol
> SEND messages and thus that moves the logic into the ULP. IE if the device only
> supports 4 SGE, and that only allows 2 pages worth of inline data, then the ULP
> needs to create multiple SEND messages with ULP headers in each, I would think.
> Not sure how this could be done below the ULP...
Right, me neither. It was just a thought.
>> I guess what I'm asking is whether 4 SGEs is going to be typical
>> of HCAs going forward, or whether there is a definite trend for
>> adding more in new device designs.
>>
>
> The iWARP spec mandates 4 as the minimum. That's where the 4 came from for
> iWARP devices...
>
> <snip>
>
>>>> The original code needed only two SGEs for sending, and one for
>>>> receiving.
>>>>
>>>> IIRC the RPC-over-RDMA receive path still needs just one SGE.
>>>>
>>>
>>> No I mean the code that bumps it up to 18. Would that cause an immediate
>>> failure if cxgb4 supported 17 and only enforces it at post_send() time?
>>
>> "mount" would fail immediately if the driver reported max_sge == 17.
>> The check that Ram mentioned happens at mount time, before anything
>> has been sent.
>>
>
> Hmm...
>
>>
>>> (haven't looked in detail at your patches...sorry). Our QA ran testing on
>>> 4.9 and didn't see this issue, so that's why I'm asking. They have not yet
>>> run NFS/RDMA testing on 4.9-rc. I've asked them to do a quick regression
>>> test asap.
>>
>
> Correction: I meant 4.10-rc above.
>
> But still, I believe Chelsio tested 4.9, so perhaps it isn't the "mount" that
> causes a failure but trying to send something with an SGE > 4 that happens
> immediately after the mount? And since cxgb4 supports up to 17, the failure
> wouldn't be seen until some inline message was attempted that required 18
> sges...
The original check for 18 or 19 was too aggressive (that's the
bug here). With the default inline threshold settings, RPC-over-RDMA
won't ever use more than 4 (or at most 5) SGEs for RDMA Send.
So if somehow the mount was allowed, and no changes were made to the
default settings, everything should still work fine for cxgb4.
>> That's curious!
>>
>>
>>>>> Note: the ib_device_attr only has a max_sge that pertains to both send and
>>>>> recv, so cxgb4 sets it to the min value. We should probably add a
>>>>> max_recv_sge and max_send_sge to ib_device_attr...
>>>>
>>>> I could go for that too.
>>>>
>>>
>>> I'm swamped right now to add this, but the changes should be trivial...
>>
>> Maybe I could get to it, but no promises.
>
> (I'm holding my breath! ;))
>
> Steve.
>
--
Chuck Lever
* Re: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <28B0D906-7BDB-4B87-94E9-6BE263BFBFF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-11 19:53 ` Steve Wise
@ 2017-01-11 21:11 ` Jason Gunthorpe
[not found] ` <20170111211123.GD28917-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
1 sibling, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2017-01-11 21:11 UTC (permalink / raw)
To: Chuck Lever
Cc: Steve Wise, Amrani, Ram, Elior, Ariel, Kalderon, Michal,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Hariprasad S, Faisal Latif,
Doug Ledford
On Wed, Jan 11, 2017 at 02:40:37PM -0500, Chuck Lever wrote:
> It might be cool to have an API similar to rdma_rw that allows ULPs
> to use a scatterlist for Send and Receive operations. It could hide
> the driver and device maximum SGE values.
That would be good, it could bounce buffer, build temporary MRs,
or repeat RDMA WRITE/READ as appropriate..
The ULP really is not the right place to put this trade off logic
since it will be very device specific..
Jason
* RE: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <20170111211123.GD28917-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-01-11 21:15 ` Steve Wise
2017-01-11 21:34 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Steve Wise @ 2017-01-11 21:15 UTC (permalink / raw)
To: 'Jason Gunthorpe', 'Chuck Lever'
Cc: 'Amrani, Ram', 'Elior, Ariel',
'Kalderon, Michal', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
'Hariprasad S', 'Faisal Latif',
'Doug Ledford'
>
> On Wed, Jan 11, 2017 at 02:40:37PM -0500, Chuck Lever wrote:
>
> > It might be cool to have an API similar to rdma_rw that allows ULPs
> > to use a scatterlist for Send and Receive operations. It could hide
> > the driver and device maximum SGE values.
>
> That would be good, it could bounce buffer, build temporary MRs,
> or repeat RDMA WRITE/READ as appropriate..
>
We're talking SEND opcodes where the ULP transfers data in the SEND that
includes the command request or reply instead of having the target/server side
issue RDMA opcodes.
> The ULP really is not the right place to put this trade off logic
> since it will be very device specific..
>
A temporary REG_MR would do the trick...assuming it can contain the SGE depth as
well. But issuing multiple SEND operations involves ULP protocol headers so
that couldn't be done by a core service...
Steve.
* Re: NFSoRDMA Fails for max_sge Less Than 18
2017-01-11 21:15 ` Steve Wise
@ 2017-01-11 21:34 ` Jason Gunthorpe
[not found] ` <20170111213416.GA30681-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2017-01-11 21:34 UTC (permalink / raw)
To: Steve Wise
Cc: 'Chuck Lever', 'Amrani, Ram',
'Elior, Ariel', 'Kalderon, Michal',
linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Hariprasad S',
'Faisal Latif', 'Doug Ledford'
On Wed, Jan 11, 2017 at 03:15:04PM -0600, Steve Wise wrote:
> > That would be good, it could bounce buffer, build temporary MRs,
> > or repeat RDMA WRITE/READ as appropriate..
>
> We're talking SEND opcodes where the ULP transfers data in the SEND that
> includes the command request or reply instead of having the target/server side
> issue RDMA opcodes.
Yes, I know, handling READ/WRITE also would be the 'full generality'
of a core API in this area.
> > The ULP really is not the right place to put this trade off logic
> > since it will be very device specific..
>
> A temporary REG_MR would do the trick...assuming it can contain the SGE depth as
> well. But issuing multiple SEND operations involves ULP protocol headers so
> that couldn't be done by a core service...
Right, for SEND the core API would need to pack the given SG list into
whatever the device limit is, using bounce buffers if necessary.
For NFS, since the data is coming out of the page cache, an MR should
always be buildable for the data? That means 1 send sge for the
header and 1 for the MR?
Basically the core API would be a packing API - input a SG list and a
MR pool and you get back another SG list that fits within the device
limits and can be used in a WR.
The core figures out how to optimally compute the packed SG list.
Jason
* RE: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <20170111213416.GA30681-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-01-11 21:48 ` Steve Wise
0 siblings, 0 replies; 16+ messages in thread
From: Steve Wise @ 2017-01-11 21:48 UTC (permalink / raw)
To: 'Jason Gunthorpe'
Cc: 'Chuck Lever', 'Amrani, Ram',
'Elior, Ariel', 'Kalderon, Michal',
linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Hariprasad S',
'Faisal Latif', 'Doug Ledford'
> On Wed, Jan 11, 2017 at 03:15:04PM -0600, Steve Wise wrote:
>
> > > That would be good, it could bounce buffer, build temporary MRs,
> > > or repeat RDMA WRITE/READ as appropriate..
> >
> > We're talking SEND opcodes where the ULP transfers data in the SEND that
> > includes the command request or reply instead of having the target/server side
> > issue RDMA opcodes.
>
> Yes, I know, handling READ/WRITE also would be the 'full generality'
> of a core API in this area.
>
> > > The ULP really is not the right place to put this trade off logic
> > > since it will be very device specific..
> >
> > A temporary REG_MR would do the trick...assuming it can contain the SGE depth as
> > well. But issuing multiple SEND operations involves ULP protocol headers so
> > that couldn't be done by a core service...
>
> Right, for SEND the core API would need to pack the given SG list into
> whatever the device limit is, using bounce buffers if necessary.
>
> For NFS, since the data is coming out of the page cache, a MR should
> be always be buildable for the data? That means 1 send sge for the
> header and 1 for the MR?
>
Yes, but the REG_MR has limits on how many pages can be registered, so it could
be 1 for the header and n MRs. And if n causes the send_wr max_sge to be
exceeded something else would have to be done. Maybe bounce buffers could be
used? I'm not sure how though. And maybe the API just fails in this case and
the ULP will have to _not_ send this as inline?
> Basically the core API would be a packing API - input a SG list and a
> MR pool and you get back another SG list that fits within the device
> limits and can be used in a WR.
>
> The core figures out how to optimally compute the packed SG list.
>
Are you volunteering? :) Or should we sign up Christoph? :):)
* RE: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <A7A39994-66C6-4467-837B-288348C0CC53-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-01-13 22:42 ` Steve Wise
2017-01-13 22:43 ` Chuck Lever
0 siblings, 1 reply; 16+ messages in thread
From: Steve Wise @ 2017-01-13 22:42 UTC (permalink / raw)
To: 'Chuck Lever'
Cc: 'Amrani, Ram', 'Elior, Ariel',
'Kalderon, Michal', 'List Linux RDMA Mailing',
'Hariprasad S', 'Faisal Latif',
'Doug Ledford'
> >>
> >>> (haven't looked in detail at your patches...sorry). Our QA ran testing on
> >>> 4.9 and didn't see this issue, so that's why I'm asking. They have not yet
> >>> run NFS/RDMA testing on 4.9-rc. I've asked them to do a quick regression
> >>> test asap.
> >>
> >
> > Correction: I meant 4.10-rc above.
> >
> > But still, I believe Chelsio tested 4.9, so perhaps it isn't the "mount" that
> > causes a failure but trying to send something with an SGE > 4 that happens
> > immediately after the mount? And since cxgb4 supports up to 17, the failure
> > wouldn't be seen until some inline message was attempted that required 18
> > sges...
>
> The original check for 18 or 19 was too aggressive (that's the
> bug here). With the default inline threshold settings, RPC-over-RDMA
> won't ever use more than 4 (or at most 5) SGEs for RDMA Send.
>
> So if somehow the mount was allowed, and no changes were made to the
> default settings, everything should still work fine for cxgb4.
>
Hey Chuck, Chelsio confirmed mounts fail with 4.10-rc. So can you get a fix
in 4.10-rc (and back to 4.9 if the regression is there) to ensure at least a
max_sge of 4 is supported?
Steveo
* Re: NFSoRDMA Fails for max_sge Less Than 18
2017-01-13 22:42 ` Steve Wise
@ 2017-01-13 22:43 ` Chuck Lever
[not found] ` <906A1B75-67E9-4D10-AD87-18F694F54818-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2017-01-13 22:43 UTC (permalink / raw)
To: Steve Wise
Cc: Amrani, Ram, Elior, Ariel, Kalderon, Michal,
List Linux RDMA Mailing, Hariprasad S, Faisal Latif, Doug Ledford
> On Jan 13, 2017, at 5:42 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>
>>>>
>>>>> (haven't looked in detail at your patches...sorry). Our QA ran testing on
>>>>> 4.9 and didn't see this issue, so that's why I'm asking. They have not yet
>>>>> run NFS/RDMA testing on 4.9-rc. I've asked them to do a quick regression
>>>>> test asap.
>>>>
>>>
>>> Correction: I meant 4.10-rc above.
>>>
>>> But still, I believe Chelsio tested 4.9, so perhaps it isn't the "mount" that
>>> causes a failure but trying to send something with an SGE > 4 that happens
>>> immediately after the mount? And since cxgb4 supports up to 17, the failure
>>> wouldn't be seen until some inline message was attempted that required 18
>>> sges...
>>
>> The original check for 18 or 19 was too aggressive (that's the
>> bug here). With the default inline threshold settings, RPC-over-RDMA
>> won't ever use more than 4 (or at most 5) SGEs for RDMA Send.
>>
>> So if somehow the mount was allowed, and no changes were made to the
>> default settings, everything should still work fine for cxgb4.
>>
>
> Hey Chuck, Chelsio confirmed mounts fail with 4.10-rc. So can you get a fix
> in 4.10-rc (and back to 4.9 if the regression is there) to ensure at least a
> max_sge of 4 is supported?
Posted earlier today:
Available in the "nfs-rdma-for-4.10-rc" topic branch of this git repo:
git://git.linux-nfs.org/projects/cel/cel-2.6.git
--
Chuck Lever
* RE: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <906A1B75-67E9-4D10-AD87-18F694F54818-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-01-13 22:45 ` Steve Wise
2017-01-19 17:19 ` Amrani, Ram
1 sibling, 0 replies; 16+ messages in thread
From: Steve Wise @ 2017-01-13 22:45 UTC (permalink / raw)
To: 'Chuck Lever'
Cc: 'Amrani, Ram', 'Elior, Ariel',
'Kalderon, Michal', 'List Linux RDMA Mailing',
'Hariprasad S', 'Faisal Latif',
'Doug Ledford'
> > Hey Chuck, Chelsio confirmed mounts fail with 4.10-rc. So can you get a fix
> > in 4.10-rc (and back to 4.9 if the regression is there) to ensure at least a
> > max_sge of 4 is supported?
>
> Posted earlier today:
>
> Available in the "nfs-rdma-for-4.10-rc" topic branch of this git repo:
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
Great! I'll ask Chelsio to verify it.
Thanks,
Steve.
* RE: NFSoRDMA Fails for max_sge Less Than 18
[not found] ` <906A1B75-67E9-4D10-AD87-18F694F54818-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-13 22:45 ` Steve Wise
@ 2017-01-19 17:19 ` Amrani, Ram
1 sibling, 0 replies; 16+ messages in thread
From: Amrani, Ram @ 2017-01-19 17:19 UTC (permalink / raw)
To: Chuck Lever, Steve Wise
Cc: Elior, Ariel, Kalderon, Michal, List Linux RDMA Mailing,
Hariprasad S, Faisal Latif, Doug Ledford
>
> Posted earlier today:
>
> Available in the "nfs-rdma-for-4.10-rc" topic branch of this git repo:
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
>
> --
> Chuck Lever
>
>
Thanks Chuck.
It works fine now.
Ram
Thread overview: 16+ messages
-- links below jump to the message on this page --
2017-01-11 7:41 NFSoRDMA Fails for max_sge Less Than 18 Amrani, Ram
[not found] ` <SN1PR07MB2207F28F05DC6E22B03CC516F8660-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-01-11 16:38 ` Chuck Lever
[not found] ` <FE817A76-28A7-4AEC-AF1E-01DE15790E43-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-11 17:04 ` Steve Wise
2017-01-11 19:40 ` Chuck Lever
[not found] ` <28B0D906-7BDB-4B87-94E9-6BE263BFBFF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-11 19:53 ` Steve Wise
2017-01-11 20:09 ` Chuck Lever
[not found] ` <383D1FFB-346B-40E9-A174-606F13AFF849-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-11 20:18 ` Steve Wise
2017-01-11 20:35 ` Chuck Lever
[not found] ` <A7A39994-66C6-4467-837B-288348C0CC53-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-13 22:42 ` Steve Wise
2017-01-13 22:43 ` Chuck Lever
[not found] ` <906A1B75-67E9-4D10-AD87-18F694F54818-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-01-13 22:45 ` Steve Wise
2017-01-19 17:19 ` Amrani, Ram
2017-01-11 21:11 ` Jason Gunthorpe
[not found] ` <20170111211123.GD28917-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-11 21:15 ` Steve Wise
2017-01-11 21:34 ` Jason Gunthorpe
[not found] ` <20170111213416.GA30681-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-11 21:48 ` Steve Wise