* [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4
@ 2017-02-23 17:03 Jeff Layton
2017-02-23 17:03 ` [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols Jeff Layton
` (5 more replies)
0 siblings, 6 replies; 36+ messages in thread
From: Jeff Layton @ 2017-02-23 17:03 UTC (permalink / raw)
To: bfields, trond.myklebust; +Cc: schumaker.anna, linux-nfs
RFC5661 says:
Where an NFSv4.1 implementation supports operation over the IP
network protocol, any transport used between NFS and IP MUST be among
the IETF-approved congestion control transport protocols.
...and RFC7530 has similar verbiage. The NFS server has never enforced
this requirement, however, so a user could issue NFSv4 calls against
the server via UDP.
This patchset adds a small bit of infrastructure to the sunrpc layer
to enforce this requirement, and has the nfs and nfsd layers set the
appropriate flags for it. It also has knfsd skip registering a UDP
port for NFSv4, using the same flags.
Lightly tested by hand, but it's fairly straightforward.
Jeff Layton (4):
sunrpc: flag transports as using IETF approved congestion control
protocols
sunrpc: turn bitfield flags in svc_version into bools
nfs/nfsd/sunrpc: enforce congestion control protocol requirement for
NFSv4
sunrpc: don't register UDP port with rpcbind when version needs
congestion control
fs/nfs/callback_xdr.c | 6 ++++--
fs/nfsd/nfs2acl.c | 1 -
fs/nfsd/nfs3acl.c | 1 -
fs/nfsd/nfs4proc.c | 13 +++++++------
include/linux/sunrpc/svc.h | 12 ++++++++----
include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svc.c | 22 +++++++++++++++++++++-
net/sunrpc/svcsock.c | 1 +
net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++
9 files changed, 44 insertions(+), 15 deletions(-)
--
2.9.3
^ permalink raw reply [flat|nested] 36+ messages in thread* [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 17:03 [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton @ 2017-02-23 17:03 ` Jeff Layton 2017-02-23 19:42 ` Tom Talpey 2017-02-23 17:03 ` [PATCH 2/4] sunrpc: turn bitfield flags in svc_version into bools Jeff Layton ` (4 subsequent siblings) 5 siblings, 1 reply; 36+ messages in thread From: Jeff Layton @ 2017-02-23 17:03 UTC (permalink / raw) To: bfields, trond.myklebust; +Cc: schumaker.anna, linux-nfs Signed-off-by: Jeff Layton <jlayton@redhat.com> --- include/linux/sunrpc/svc_xprt.h | 1 + net/sunrpc/svcsock.c | 1 + net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ 3 files changed, 4 insertions(+) diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 7440290f64ac..f8aa9452b63c 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -67,6 +67,7 @@ struct svc_xprt { #define XPT_CACHE_AUTH 11 /* cache auth info */ #define XPT_LOCAL 12 /* connection from loopback interface */ #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */ +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */ struct svc_serv *xpt_server; /* service for transport */ atomic_t xpt_reserved; /* space on outq that is rsvd */ diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index de066acdb34e..1956b8b96b2d 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv) svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class, &svsk->sk_xprt, serv); set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags); + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags); if (sk->sk_state == TCP_LISTEN) { dprintk("setting up TCP socket for listening\n"); set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags); diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 39652d390a9c..96b4797c2c54 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv, spin_lock_init(&cma_xprt->sc_ctxt_lock); spin_lock_init(&cma_xprt->sc_map_lock); + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags); + if (listener) set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags); -- 2.9.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 17:03 ` [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols Jeff Layton @ 2017-02-23 19:42 ` Tom Talpey 2017-02-23 20:00 ` Jeff Layton 2017-02-23 20:15 ` Chuck Lever 0 siblings, 2 replies; 36+ messages in thread From: Tom Talpey @ 2017-02-23 19:42 UTC (permalink / raw) To: Jeff Layton, bfields, trond.myklebust; +Cc: schumaker.anna, linux-nfs On 2/23/2017 12:03 PM, Jeff Layton wrote: > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > include/linux/sunrpc/svc_xprt.h | 1 + > net/sunrpc/svcsock.c | 1 + > net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ There's a possibly-important detail here. Not all RDMA transports have "IETF-approved congestion control", for example, RoCEv1. However, iWARP and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol, RoCEv1 may not fall under this restriction. Net-net, inspecting only the RDMA attribute of the transport may be insufficient here. It could be argued however that the xprtrdma layer, with its rpcrdma crediting, provides such congestion. But that needs to be made explicit, and perhaps, discussed in IETF. Initially, I think it would be important to flag this as a point for the future. For now, it may be best to flag RoCEv1 as not supporting congestion. Tom. > 3 files changed, 4 insertions(+) > > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h > index 7440290f64ac..f8aa9452b63c 100644 > --- a/include/linux/sunrpc/svc_xprt.h > +++ b/include/linux/sunrpc/svc_xprt.h > @@ -67,6 +67,7 @@ struct svc_xprt { > #define XPT_CACHE_AUTH 11 /* cache auth info */ > #define XPT_LOCAL 12 /* connection from loopback interface */ > #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */ > +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */ > > struct svc_serv *xpt_server; /* service for transport */ > atomic_t xpt_reserved; /* space on outq that is rsvd */ > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > index de066acdb34e..1956b8b96b2d 100644 > --- a/net/sunrpc/svcsock.c > +++ b/net/sunrpc/svcsock.c > @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv) > svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class, > &svsk->sk_xprt, serv); > set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags); > + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags); > if (sk->sk_state == TCP_LISTEN) { > dprintk("setting up TCP socket for listening\n"); > set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags); > diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c > index 39652d390a9c..96b4797c2c54 100644 > --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c > +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c > @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv, > spin_lock_init(&cma_xprt->sc_ctxt_lock); > spin_lock_init(&cma_xprt->sc_map_lock); > > + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags); > + > if (listener) > set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags); > > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 19:42 ` Tom Talpey @ 2017-02-23 20:00 ` Jeff Layton 2017-02-23 20:06 ` Tom Talpey 2017-02-23 20:15 ` Chuck Lever 1 sibling, 1 reply; 36+ messages in thread From: Jeff Layton @ 2017-02-23 20:00 UTC (permalink / raw) To: Tom Talpey, bfields, trond.myklebust Cc: schumaker.anna, linux-nfs, Chuck Lever, linux-rdma On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote: > On 2/23/2017 12:03 PM, Jeff Layton wrote: > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > > --- > > include/linux/sunrpc/svc_xprt.h | 1 + > > net/sunrpc/svcsock.c | 1 + > > net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ > > There's a possibly-important detail here. Not all RDMA transports have > "IETF-approved congestion control", for example, RoCEv1. However, iWARP > and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol, > RoCEv1 may not fall under this restriction. > > Net-net, inspecting only the RDMA attribute of the transport may be > insufficient here. > > It could be argued however that the xprtrdma layer, with its rpcrdma > crediting, provides such congestion. But that needs to be made > explicit, and perhaps, discussed in IETF. Initially, I think it would > be important to flag this as a point for the future. For now, it may > be best to flag RoCEv1 as not supporting congestion. > > Tom. > (cc'ing Chuck and the linux-rdma list) Thanks Tom, that's very interesting. Not being well versed in the xprtrdma layer, what condition should we use here to set the flag? git grep shows that the string "ROCEV1" only shows up in the bxnt_en driver. Is there some way to determine this generically for any given RDMA driver? > > 3 files changed, 4 insertions(+) > > > > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h > > index 7440290f64ac..f8aa9452b63c 100644 > > --- a/include/linux/sunrpc/svc_xprt.h > > +++ b/include/linux/sunrpc/svc_xprt.h > > @@ -67,6 +67,7 @@ struct svc_xprt { > > #define XPT_CACHE_AUTH 11 /* cache auth info */ > > #define XPT_LOCAL 12 /* connection from loopback interface */ > > #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */ > > +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */ > > > > struct svc_serv *xpt_server; /* service for transport */ > > atomic_t xpt_reserved; /* space on outq that is rsvd */ > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > > index de066acdb34e..1956b8b96b2d 100644 > > --- a/net/sunrpc/svcsock.c > > +++ b/net/sunrpc/svcsock.c > > @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv) > > svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class, > > &svsk->sk_xprt, serv); > > set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags); > > + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags); > > if (sk->sk_state == TCP_LISTEN) { > > dprintk("setting up TCP socket for listening\n"); > > set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags); > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c > > index 39652d390a9c..96b4797c2c54 100644 > > --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c > > +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c > > @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv, > > spin_lock_init(&cma_xprt->sc_ctxt_lock); > > spin_lock_init(&cma_xprt->sc_map_lock); > > > > + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags); > > + > > if (listener) > > set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags); > > > > -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 20:00 ` Jeff Layton @ 2017-02-23 20:06 ` Tom Talpey 2017-02-23 20:11 ` J. Bruce Fields 2017-02-23 20:17 ` Chuck Lever 0 siblings, 2 replies; 36+ messages in thread From: Tom Talpey @ 2017-02-23 20:06 UTC (permalink / raw) To: Jeff Layton, bfields, trond.myklebust Cc: schumaker.anna, linux-nfs, Chuck Lever, linux-rdma On 2/23/2017 3:00 PM, Jeff Layton wrote: > On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote: >> On 2/23/2017 12:03 PM, Jeff Layton wrote: >>> Signed-off-by: Jeff Layton <jlayton@redhat.com> >>> --- >>> include/linux/sunrpc/svc_xprt.h | 1 + >>> net/sunrpc/svcsock.c | 1 + >>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ >> >> There's a possibly-important detail here. Not all RDMA transports have >> "IETF-approved congestion control", for example, RoCEv1. However, iWARP >> and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol, >> RoCEv1 may not fall under this restriction. >> >> Net-net, inspecting only the RDMA attribute of the transport may be >> insufficient here. >> >> It could be argued however that the xprtrdma layer, with its rpcrdma >> crediting, provides such congestion. But that needs to be made >> explicit, and perhaps, discussed in IETF. Initially, I think it would >> be important to flag this as a point for the future. For now, it may >> be best to flag RoCEv1 as not supporting congestion. >> >> Tom. >> > > (cc'ing Chuck and the linux-rdma list) > > Thanks Tom, that's very interesting. > > Not being well versed in the xprtrdma layer, what condition should we > use here to set the flag? git grep shows that the string "ROCEV1" only > shows up in the bxnt_en driver. Is there some way to determine this > generically for any given RDMA driver? I would not code RoCEv1 as an exception, I would code iWARP and RoCEv2 as the only eligible ones. There are any number of other possibilities, none of which should be automatically flagged as congestion-controlled. I'm also not sure I'm comfortable with hardcoding such a list into RPC. But it may be the best you can do for now. Chuck, are you aware of a verbs interface to obtain the RDMA transport type? Tom. > > >>> 3 files changed, 4 insertions(+) >>> >>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h >>> index 7440290f64ac..f8aa9452b63c 100644 >>> --- a/include/linux/sunrpc/svc_xprt.h >>> +++ b/include/linux/sunrpc/svc_xprt.h >>> @@ -67,6 +67,7 @@ struct svc_xprt { >>> #define XPT_CACHE_AUTH 11 /* cache auth info */ >>> #define XPT_LOCAL 12 /* connection from loopback interface */ >>> #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */ >>> +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */ >>> >>> struct svc_serv *xpt_server; /* service for transport */ >>> atomic_t xpt_reserved; /* space on outq that is rsvd */ >>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c >>> index de066acdb34e..1956b8b96b2d 100644 >>> --- a/net/sunrpc/svcsock.c >>> +++ b/net/sunrpc/svcsock.c >>> @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv) >>> svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class, >>> &svsk->sk_xprt, serv); >>> set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags); >>> + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags); >>> if (sk->sk_state == TCP_LISTEN) { >>> dprintk("setting up TCP socket for listening\n"); >>> set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags); >>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c >>> index 39652d390a9c..96b4797c2c54 100644 >>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c >>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c >>> @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv, >>> spin_lock_init(&cma_xprt->sc_ctxt_lock); >>> spin_lock_init(&cma_xprt->sc_map_lock); >>> >>> + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags); >>> + >>> if (listener) >>> set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags); >>> >>> > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 20:06 ` Tom Talpey @ 2017-02-23 20:11 ` J. Bruce Fields 2017-02-23 20:26 ` Jason Gunthorpe 2017-02-23 20:32 ` Jeff Layton 2017-02-23 20:17 ` Chuck Lever 1 sibling, 2 replies; 36+ messages in thread From: J. Bruce Fields @ 2017-02-23 20:11 UTC (permalink / raw) To: Tom Talpey Cc: Jeff Layton, trond.myklebust, schumaker.anna, linux-nfs, Chuck Lever, linux-rdma On Thu, Feb 23, 2017 at 03:06:25PM -0500, Tom Talpey wrote: > On 2/23/2017 3:00 PM, Jeff Layton wrote: > >On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote: > >>On 2/23/2017 12:03 PM, Jeff Layton wrote: > >>>Signed-off-by: Jeff Layton <jlayton@redhat.com> > >>>--- > >>> include/linux/sunrpc/svc_xprt.h | 1 + > >>> net/sunrpc/svcsock.c | 1 + > >>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ > >> > >>There's a possibly-important detail here. Not all RDMA transports have > >>"IETF-approved congestion control", for example, RoCEv1. However, iWARP > >>and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol, > >>RoCEv1 may not fall under this restriction. > >> > >>Net-net, inspecting only the RDMA attribute of the transport may be > >>insufficient here. > >> > >>It could be argued however that the xprtrdma layer, with its rpcrdma > >>crediting, provides such congestion. But that needs to be made > >>explicit, and perhaps, discussed in IETF. Initially, I think it would > >>be important to flag this as a point for the future. For now, it may > >>be best to flag RoCEv1 as not supporting congestion. > >> > >>Tom. > >> > > > >(cc'ing Chuck and the linux-rdma list) > > > >Thanks Tom, that's very interesting. > > > >Not being well versed in the xprtrdma layer, what condition should we > >use here to set the flag? git grep shows that the string "ROCEV1" only > >shows up in the bxnt_en driver. Is there some way to determine this > >generically for any given RDMA driver? > > I would not code RoCEv1 as an exception, I would code iWARP and RoCEv2 > as the only eligible ones. There are any number of other possibilities, > none of which should be automatically flagged as congestion-controlled. > > I'm also not sure I'm comfortable with hardcoding such a list into RPC. > But it may be the best you can do for now. Chuck, are you aware of a > verbs interface to obtain the RDMA transport type? If this gets too complicated--we've been allowing NFSv4/UDP for years, letting this one (arguable?) exception through in RDMA a little longer won't kill us. (And if we really shouldn't be doing NFSv4 over some RDMA transports--is it worth supporting them at all, if the only support we can get is NFSv3-only?) --b. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 20:11 ` J. Bruce Fields @ 2017-02-23 20:26 ` Jason Gunthorpe 2017-02-23 20:33 ` Tom Talpey 2017-02-23 20:32 ` Jeff Layton 1 sibling, 1 reply; 36+ messages in thread From: Jason Gunthorpe @ 2017-02-23 20:26 UTC (permalink / raw) To: J. Bruce Fields Cc: Tom Talpey, Jeff Layton, trond.myklebust, schumaker.anna, linux-nfs, Chuck Lever, linux-rdma On Thu, Feb 23, 2017 at 03:11:09PM -0500, J. Bruce Fields wrote: > (And if we really shouldn't be doing NFSv4 over some RDMA transports--is > it worth supporting them at all, if the only support we can get is > NFSv3-only?) This seems like a strange comment - NFSv4 should be supported on all RDMA transports, surely? Largely RDMA lives in its own congestion management world. If a site is running RDMA they have done something to mitigate interactions with TCP style congestion control on the same wire. Jason ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 20:26 ` Jason Gunthorpe @ 2017-02-23 20:33 ` Tom Talpey 2017-02-23 20:55 ` Jason Gunthorpe 0 siblings, 1 reply; 36+ messages in thread From: Tom Talpey @ 2017-02-23 20:33 UTC (permalink / raw) To: Jason Gunthorpe, J. Bruce Fields Cc: Jeff Layton, trond.myklebust, schumaker.anna, linux-nfs, Chuck Lever, linux-rdma On 2/23/2017 3:26 PM, Jason Gunthorpe wrote: > On Thu, Feb 23, 2017 at 03:11:09PM -0500, J. Bruce Fields wrote: > >> (And if we really shouldn't be doing NFSv4 over some RDMA transports--is >> it worth supporting them at all, if the only support we can get is >> NFSv3-only?) > > This seems like a strange comment - NFSv4 should be supported on all > RDMA transports, surely? > > Largely RDMA lives in its own congestion management world. If a site > is running RDMA they have done something to mitigate interactions with > TCP style congestion control on the same wire. The key words are "IETF-approved". Mitigation and Interaction are operational decisions, not protocol design. We could argue that the requirement is bogus, or that all RDMA transports comply, or that the RPCRDMA layer provides it, but none of these arguments would grant IETF approval. That said, I think there's a lot of room for interpretation here. Tom. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 20:33 ` Tom Talpey @ 2017-02-23 20:55 ` Jason Gunthorpe 2017-02-24 15:08 ` Tom Talpey 0 siblings, 1 reply; 36+ messages in thread From: Jason Gunthorpe @ 2017-02-23 20:55 UTC (permalink / raw) To: Tom Talpey Cc: J. Bruce Fields, Jeff Layton, trond.myklebust, schumaker.anna, linux-nfs, Chuck Lever, linux-rdma On Thu, Feb 23, 2017 at 03:33:53PM -0500, Tom Talpey wrote: > The key words are "IETF-approved". Mitigation and Interaction are > operational decisions, not protocol design. We are talking about this bit from RFC 3530 ? Where an NFS version 4 implementation supports operation over the IP network protocol, the supported transports between NFS and IP MUST be among the IETF-approved congestion control transport protocols, which include TCP and SCTP. This gives most of RDMA an out as it is not over the IP protocol. The only obvious troubled one is RoCEv2.. Jason ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 20:55 ` Jason Gunthorpe @ 2017-02-24 15:08 ` Tom Talpey 2017-02-24 17:17 ` Jeff Layton 0 siblings, 1 reply; 36+ messages in thread From: Tom Talpey @ 2017-02-24 15:08 UTC (permalink / raw) To: Jason Gunthorpe Cc: J. Bruce Fields, Jeff Layton, trond.myklebust, schumaker.anna, linux-nfs, Chuck Lever, linux-rdma On 2/23/2017 3:55 PM, Jason Gunthorpe wrote: > On Thu, Feb 23, 2017 at 03:33:53PM -0500, Tom Talpey wrote: > >> The key words are "IETF-approved". Mitigation and Interaction are >> operational decisions, not protocol design. > > We are talking about this bit from RFC 3530 ? > > Where an NFS version 4 implementation supports operation over the IP > network protocol, the supported transports between NFS and IP MUST be > among the IETF-approved congestion control transport protocols, which > include TCP and SCTP. > > This gives most of RDMA an out as it is not over the IP protocol. The > only obvious troubled one is RoCEv2.. RFC7530 has updated this text somewhat, but it's similar, yes. https://tools.ietf.org/html/rfc7530#section-3.1 The specification language specifically calls out IP-based transports, which is why I mentioned that RoCEv1, being non-IP-based and not even truly routable, could obtain a bye. But the NFS layer IMO should really not be digging down to this level. I think it would be much better if each transport could expose a relevant attribute, which NFS could optionally inspect. As you mention, RoCEv2 is a bit of a pickle. It's UDP/IP-based, and it does have end-to-end congestion control, but technically speaking it is not "IETF approved". I'm not sure what call to make there. Tom. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-24 15:08 ` Tom Talpey @ 2017-02-24 17:17 ` Jeff Layton 2017-02-24 18:03 ` Jason Gunthorpe 0 siblings, 1 reply; 36+ messages in thread From: Jeff Layton @ 2017-02-24 17:17 UTC (permalink / raw) To: Tom Talpey, Jason Gunthorpe Cc: J. Bruce Fields, trond.myklebust, schumaker.anna, linux-nfs, Chuck Lever, linux-rdma On Fri, 2017-02-24 at 10:08 -0500, Tom Talpey wrote: > On 2/23/2017 3:55 PM, Jason Gunthorpe wrote: > > On Thu, Feb 23, 2017 at 03:33:53PM -0500, Tom Talpey wrote: > > > > > The key words are "IETF-approved". Mitigation and Interaction are > > > operational decisions, not protocol design. > > > > We are talking about this bit from RFC 3530 ? > > > > Where an NFS version 4 implementation supports operation over the IP > > network protocol, the supported transports between NFS and IP MUST be > > among the IETF-approved congestion control transport protocols, which > > include TCP and SCTP. > > > > This gives most of RDMA an out as it is not over the IP protocol. The > > only obvious troubled one is RoCEv2.. > > RFC7530 has updated this text somewhat, but it's similar, yes. > > https://tools.ietf.org/html/rfc7530#section-3.1 > > The specification language specifically calls out IP-based transports, > which is why I mentioned that RoCEv1, being non-IP-based and not even > truly routable, could obtain a bye. But the NFS layer IMO should really > not be digging down to this level. I think it would be much better if > each transport could expose a relevant attribute, which NFS could > optionally inspect. > > As you mention, RoCEv2 is a bit of a pickle. It's UDP/IP-based, and it > does have end-to-end congestion control, but technically speaking it > is not "IETF approved". I'm not sure what call to make there. > > Tom. I'm perfectly willing to forgo the "IETF approved" verbiage since it's ambiguous anyway. We can certainly just use more weasel words in the comments as well. RFC5661 has a bit more to say on the matter: NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- based transports with the following attributes: o The transport supports reliable delivery of data, which NFSv4.1 requires but neither NFSv4.1 nor RPC has facilities for ensuring [34]. o The transport delivers data in the order it was sent. Ordered delivery simplifies detection of transmit errors, and simplifies the sending of arbitrary sized requests and responses via the record marking protocol [3]. I'd rather rely on those attributes instead of any sort of IETF approval anyway. Do all RDMA transports (RoCEv2, in particular) have those characteristics? If not, then perhaps there is some way to have different RDMA transport drivers set flags in some structure as to whether they fulfill those requirements, and then have the xprtrdma driver check for those flags. In any case, for now I think we should just give all RDMA transports a pass, and clean that up later. I'm mostly interested in excluding UDP over IP for now -- being more strict with RDMA can come later. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-24 17:17 ` Jeff Layton @ 2017-02-24 18:03 ` Jason Gunthorpe 0 siblings, 0 replies; 36+ messages in thread From: Jason Gunthorpe @ 2017-02-24 18:03 UTC (permalink / raw) To: Jeff Layton Cc: Tom Talpey, J. Bruce Fields, trond.myklebust, schumaker.anna, linux-nfs, Chuck Lever, linux-rdma > I'd rather rely on those attributes instead of any sort of IETF > approval anyway. Do all RDMA transports (RoCEv2, in particular) have > those characteristics? The NFS-RDMA driver only works with RDMA RC transports which are defined to provide all those characteristics. > In any case, for now I think we should just give all RDMA transports a > pass, and clean that up later. I'm mostly interested in excluding UDP > over IP for now -- being more strict with RDMA can come later. Makes sense to me. At the end of the day I think everything supported by NFS-RDMA should be permitted to use NFSv4, and we should ignore sketchy spec language that suggests otherwise. Jason ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 20:11 ` J. Bruce Fields 2017-02-23 20:26 ` Jason Gunthorpe @ 2017-02-23 20:32 ` Jeff Layton 1 sibling, 0 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-23 20:32 UTC (permalink / raw) To: J. Bruce Fields, Tom Talpey Cc: trond.myklebust, schumaker.anna, linux-nfs, Chuck Lever, linux-rdma On Thu, 2017-02-23 at 15:11 -0500, J. Bruce Fields wrote: > On Thu, Feb 23, 2017 at 03:06:25PM -0500, Tom Talpey wrote: > > On 2/23/2017 3:00 PM, Jeff Layton wrote: > > > On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote: > > > > On 2/23/2017 12:03 PM, Jeff Layton wrote: > > > > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > > > > > --- > > > > > include/linux/sunrpc/svc_xprt.h | 1 + > > > > > net/sunrpc/svcsock.c | 1 + > > > > > net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ > > > > > > > > There's a possibly-important detail here. Not all RDMA transports have > > > > "IETF-approved congestion control", for example, RoCEv1. However, iWARP > > > > and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol, > > > > RoCEv1 may not fall under this restriction. > > > > > > > > Net-net, inspecting only the RDMA attribute of the transport may be > > > > insufficient here. > > > > > > > > It could be argued however that the xprtrdma layer, with its rpcrdma > > > > crediting, provides such congestion. But that needs to be made > > > > explicit, and perhaps, discussed in IETF. Initially, I think it would > > > > be important to flag this as a point for the future. For now, it may > > > > be best to flag RoCEv1 as not supporting congestion. > > > > > > > > Tom. > > > > > > > > > > (cc'ing Chuck and the linux-rdma list) > > > > > > Thanks Tom, that's very interesting. > > > > > > Not being well versed in the xprtrdma layer, what condition should we > > > use here to set the flag? git grep shows that the string "ROCEV1" only > > > shows up in the bxnt_en driver. Is there some way to determine this > > > generically for any given RDMA driver? > > > > I would not code RoCEv1 as an exception, I would code iWARP and RoCEv2 > > as the only eligible ones. There are any number of other possibilities, > > none of which should be automatically flagged as congestion-controlled. > > > > I'm also not sure I'm comfortable with hardcoding such a list into RPC. > > But it may be the best you can do for now. Chuck, are you aware of a > > verbs interface to obtain the RDMA transport type? > > If this gets too complicated--we've been allowing NFSv4/UDP for years, > letting this one (arguable?) exception through in RDMA a little longer > won't kill us. > That's my feeling too. This is still an improvement over the status quo, and hopefully anyone with RDMA hardware will have a bit more clue as to whether it can properly support v4. We can always further restrict when rdma_create_xprt sets the flag in a later patch if we figure out some way to determine this generically. I will plan to add a comment that we're setting this RDMA svc_xprt universally even though it may not always be true. > (And if we really shouldn't be doing NFSv4 over some RDMA transports--is > it worth supporting them at all, if the only support we can get is > NFSv3-only?) > I'd be inclined to leave them working and just deny the use of v4 on such transports. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 20:06 ` Tom Talpey 2017-02-23 20:11 ` J. Bruce Fields @ 2017-02-23 20:17 ` Chuck Lever 1 sibling, 0 replies; 36+ messages in thread From: Chuck Lever @ 2017-02-23 20:17 UTC (permalink / raw) To: Tom Talpey Cc: Jeff Layton, bfields, trond.myklebust, schumaker.anna, linux-nfs, linux-rdma -- Chuck Lever > On Feb 23, 2017, at 3:06 PM, Tom Talpey <tom@talpey.com> wrote: > >> On 2/23/2017 3:00 PM, Jeff Layton wrote: >>> On Thu, 2017-02-23 at 14:42 -0500, Tom Talpey wrote: >>>> On 2/23/2017 12:03 PM, Jeff Layton wrote: >>>> Signed-off-by: Jeff Layton <jlayton@redhat.com> >>>> --- >>>> include/linux/sunrpc/svc_xprt.h | 1 + >>>> net/sunrpc/svcsock.c | 1 + >>>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ >>> >>> There's a possibly-important detail here. Not all RDMA transports have >>> "IETF-approved congestion control", for example, RoCEv1. However, iWARP >>> and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol, >>> RoCEv1 may not fall under this restriction. >>> >>> Net-net, inspecting only the RDMA attribute of the transport may be >>> insufficient here. >>> >>> It could be argued however that the xprtrdma layer, with its rpcrdma >>> crediting, provides such congestion. But that needs to be made >>> explicit, and perhaps, discussed in IETF. Initially, I think it would >>> be important to flag this as a point for the future. For now, it may >>> be best to flag RoCEv1 as not supporting congestion. >>> >>> Tom. >>> >> >> (cc'ing Chuck and the linux-rdma list) >> >> Thanks Tom, that's very interesting. >> >> Not being well versed in the xprtrdma layer, what condition should we >> use here to set the flag? git grep shows that the string "ROCEV1" only >> shows up in the bxnt_en driver. Is there some way to determine this >> generically for any given RDMA driver? > > I would not code RoCEv1 as an exception, I would code iWARP and RoCEv2 > as the only eligible ones. There are any number of other possibilities, > none of which should be automatically flagged as congestion-controlled. > > I'm also not sure I'm comfortable with hardcoding such a list into RPC. > But it may be the best you can do for now. Chuck, are you aware of a > verbs interface to obtain the RDMA transport type? Yes, I can have a look when I get to Connectathon. > > Tom. > >> >> >>>> 3 files changed, 4 insertions(+) >>>> >>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h >>>> index 7440290f64ac..f8aa9452b63c 100644 >>>> --- a/include/linux/sunrpc/svc_xprt.h >>>> +++ b/include/linux/sunrpc/svc_xprt.h >>>> @@ -67,6 +67,7 @@ struct svc_xprt { >>>> #define XPT_CACHE_AUTH 11 /* cache auth info */ >>>> #define XPT_LOCAL 12 /* connection from loopback interface */ >>>> #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */ >>>> +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */ >>>> >>>> struct svc_serv *xpt_server; /* service for transport */ >>>> atomic_t xpt_reserved; /* space on outq that is rsvd */ >>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c >>>> index de066acdb34e..1956b8b96b2d 100644 >>>> --- a/net/sunrpc/svcsock.c >>>> +++ b/net/sunrpc/svcsock.c >>>> @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv) >>>> svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class, >>>> &svsk->sk_xprt, serv); >>>> set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags); >>>> + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags); >>>> if (sk->sk_state == TCP_LISTEN) { >>>> dprintk("setting up TCP socket for listening\n"); >>>> set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags); >>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c >>>> index 39652d390a9c..96b4797c2c54 100644 >>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c >>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c >>>> @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv, >>>> spin_lock_init(&cma_xprt->sc_ctxt_lock); >>>> spin_lock_init(&cma_xprt->sc_map_lock); >>>> >>>> + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags); >>>> + >>>> if (listener) >>>> set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags); >>>> >>>> >> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols 2017-02-23 19:42 ` Tom Talpey 2017-02-23 20:00 ` Jeff Layton @ 2017-02-23 20:15 ` Chuck Lever 1 sibling, 0 replies; 36+ messages in thread From: Chuck Lever @ 2017-02-23 20:15 UTC (permalink / raw) To: Tom Talpey Cc: Jeff Layton, bfields, trond.myklebust, schumaker.anna, linux-nfs -- Chuck Lever > On Feb 23, 2017, at 2:42 PM, Tom Talpey <tom@talpey.com> wrote: > >> On 2/23/2017 12:03 PM, Jeff Layton wrote: >> Signed-off-by: Jeff Layton <jlayton@redhat.com> >> --- >> include/linux/sunrpc/svc_xprt.h | 1 + >> net/sunrpc/svcsock.c | 1 + >> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ > > There's a possibly-important detail here. Not all RDMA transports have > "IETF-approved congestion control", for example, RoCEv1. However, iWARP > and (arguably) RoCEv2 do. On the other hand, as a nonroutable protocol, > RoCEv1 may not fall under this restriction. Probably a discussion that could be handled in rfc5667bis. But it's not clear what IETF-approved congestion control is, precisely. We'd have to say something about InfiniBand (a transport which is not specified by the IETF) too. > Net-net, inspecting only the RDMA attribute of the transport may be > insufficient here. The transport implementation would have to set it in the svc_xprt rather than having it be a constant field in the xprt class. > It could be argued however that the xprtrdma layer, with its rpcrdma > crediting, provides such congestion. But that needs to be made > explicit, and perhaps, discussed in IETF. Initially, I think it would > be important to flag this as a point for the future. For now, it may > be best to flag RoCEv1 as not supporting congestion. > > Tom. > >> 3 files changed, 4 insertions(+) >> >> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h >> index 7440290f64ac..f8aa9452b63c 100644 >> --- a/include/linux/sunrpc/svc_xprt.h >> +++ b/include/linux/sunrpc/svc_xprt.h >> @@ -67,6 +67,7 @@ struct svc_xprt { >> #define XPT_CACHE_AUTH 11 /* cache auth info */ >> #define XPT_LOCAL 12 /* connection from loopback interface */ >> #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */ >> +#define XPT_CONG_CTRL 14 /* IETF approved congestion control protocol */ >> >> struct svc_serv *xpt_server; /* service for transport */ >> atomic_t xpt_reserved; /* space on outq that is rsvd */ >> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c >> index de066acdb34e..1956b8b96b2d 100644 >> --- a/net/sunrpc/svcsock.c >> +++ b/net/sunrpc/svcsock.c >> @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv) >> svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class, >> &svsk->sk_xprt, serv); >> set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags); >> + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags); >> if (sk->sk_state == TCP_LISTEN) { >> dprintk("setting up TCP socket for listening\n"); >> set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags); >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c >> index 39652d390a9c..96b4797c2c54 100644 >> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c >> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c >> @@ -571,6 +571,8 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv, >> spin_lock_init(&cma_xprt->sc_ctxt_lock); >> spin_lock_init(&cma_xprt->sc_map_lock); >> >> + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags); >> + >> if (listener) >> set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags); >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 2/4] sunrpc: turn bitfield flags in svc_version into bools 2017-02-23 17:03 [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton 2017-02-23 17:03 ` [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols Jeff Layton @ 2017-02-23 17:03 ` Jeff Layton 2017-02-23 17:03 ` [PATCH 3/4] nfs/nfsd/sunrpc: enforce congestion control protocol requirement for NFSv4 Jeff Layton ` (3 subsequent siblings) 5 siblings, 0 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-23 17:03 UTC (permalink / raw) To: bfields, trond.myklebust; +Cc: schumaker.anna, linux-nfs It's just simpler to read this way, IMO. Also, no need to explicitly set vs_hidden to false in the nfsacl ones. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfs/callback_xdr.c | 4 ++-- fs/nfsd/nfs2acl.c | 1 - fs/nfsd/nfs3acl.c | 1 - fs/nfsd/nfs4proc.c | 2 +- include/linux/sunrpc/svc.h | 9 +++++---- net/sunrpc/svc.c | 2 +- 6 files changed, 9 insertions(+), 10 deletions(-) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index eb094c6011d8..e9836f611d9c 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -1083,7 +1083,7 @@ struct svc_version nfs4_callback_version1 = { .vs_proc = nfs4_callback_procedures1, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, .vs_dispatch = NULL, - .vs_hidden = 1, + .vs_hidden = true, }; struct svc_version nfs4_callback_version4 = { @@ -1092,5 +1092,5 @@ struct svc_version nfs4_callback_version4 = { .vs_proc = nfs4_callback_procedures1, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, .vs_dispatch = NULL, - .vs_hidden = 1, + .vs_hidden = true, }; diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index d08cd88155c7..838f90f3f890 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -376,5 +376,4 @@ struct svc_version nfsd_acl_version2 = { .vs_proc = nfsd_acl_procedures2, .vs_dispatch = nfsd_dispatch, .vs_xdrsize = NFS3_SVC_XDRSIZE, - .vs_hidden = 0, }; diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 0c890347cde3..dcb5f79076c0 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -266,6 +266,5 @@ struct svc_version nfsd_acl_version3 = { .vs_proc = nfsd_acl_procedures3, .vs_dispatch = nfsd_dispatch, .vs_xdrsize = NFS3_SVC_XDRSIZE, - .vs_hidden = 0, }; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 74a6e573e061..2b73b37c2b36 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2481,7 +2481,7 @@ struct svc_version nfsd_version4 = { .vs_proc = nfsd_procedures4, .vs_dispatch = nfsd_dispatch, .vs_xdrsize = NFS4_SVC_XDRSIZE, - .vs_rpcb_optnl = 1, + .vs_rpcb_optnl = true, }; /* diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 7321ae933867..96467c95f02e 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -400,10 +400,11 @@ struct svc_version { struct svc_procedure * vs_proc; /* per-procedure info */ u32 vs_xdrsize; /* xdrsize needed for this version */ - unsigned int vs_hidden : 1, /* Don't register with portmapper. - * Only used for nfsacl so far. */ - vs_rpcb_optnl:1;/* Don't care the result of register. - * Only used for nfsv4. */ + /* Don't register with rpcbind */ + bool vs_hidden; + + /* Don't care if the rpcbind registration fails */ + bool vs_rpcb_optnl; /* Override dispatch function (e.g. when caching replies). * A return value of 0 means drop the request. diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 75f290bddca1..85bcdea67a3f 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -385,7 +385,7 @@ static int svc_uses_rpcbind(struct svc_serv *serv) for (i = 0; i < progp->pg_nvers; i++) { if (progp->pg_vers[i] == NULL) continue; - if (progp->pg_vers[i]->vs_hidden == 0) + if (!progp->pg_vers[i]->vs_hidden) return 1; } } -- 2.9.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 3/4] nfs/nfsd/sunrpc: enforce congestion control protocol requirement for NFSv4 2017-02-23 17:03 [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton 2017-02-23 17:03 ` [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols Jeff Layton 2017-02-23 17:03 ` [PATCH 2/4] sunrpc: turn bitfield flags in svc_version into bools Jeff Layton @ 2017-02-23 17:03 ` Jeff Layton 2017-02-23 17:03 ` [PATCH 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control Jeff Layton ` (2 subsequent siblings) 5 siblings, 0 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-23 17:03 UTC (permalink / raw) To: bfields, trond.myklebust; +Cc: schumaker.anna, linux-nfs NFSv4 requires an "IETF approved congestion control transport". In practical terms, that means that you should not run NFSv4 over UDP. The server has never enforced that requirement though. This patchset fixes that by adding a new flag to the svc_version that states that it requires a congestion-controlled transport, and a new flag to the svc_xprt that declares that the transport fulfills that requirement. We then use those flags to enforce the requirement in svc_process_common. In practical terms, that means that we set the new xprt flag in TCP and RDMA transports, and the the new svc_version flag in NFSv4 svc_version structs. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfs/callback_xdr.c | 2 ++ fs/nfsd/nfs4proc.c | 13 +++++++------ include/linux/sunrpc/svc.h | 3 +++ net/sunrpc/svc.c | 13 +++++++++++++ 4 files changed, 25 insertions(+), 6 deletions(-) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index e9836f611d9c..fd0284c1dc32 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -1084,6 +1084,7 @@ struct svc_version nfs4_callback_version1 = { .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, .vs_dispatch = NULL, .vs_hidden = true, + .vs_need_cong_ctrl = true, }; struct svc_version nfs4_callback_version4 = { @@ -1093,4 +1094,5 @@ struct svc_version nfs4_callback_version4 = { .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, .vs_dispatch = NULL, .vs_hidden = true, + .vs_need_cong_ctrl = true, }; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 2b73b37c2b36..f82fcaa2e1d9 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2476,12 +2476,13 @@ static struct svc_procedure nfsd_procedures4[2] = { }; struct svc_version nfsd_version4 = { - .vs_vers = 4, - .vs_nproc = 2, - .vs_proc = nfsd_procedures4, - .vs_dispatch = nfsd_dispatch, - .vs_xdrsize = NFS4_SVC_XDRSIZE, - .vs_rpcb_optnl = true, + .vs_vers = 4, + .vs_nproc = 2, + .vs_proc = nfsd_procedures4, + .vs_dispatch = nfsd_dispatch, + .vs_xdrsize = NFS4_SVC_XDRSIZE, + .vs_rpcb_optnl = true, + .vs_need_cong_ctrl = true, }; /* diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 96467c95f02e..65b1c707c40b 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -406,6 +406,9 @@ struct svc_version { /* Don't care if the rpcbind registration fails */ bool vs_rpcb_optnl; + /* Only allowed on IETF-approved congestion controlled xprts */ + bool vs_need_cong_ctrl; + /* Override dispatch function (e.g. when caching replies). * A return value of 0 means drop the request. * vs_dispatch == NULL means use default dispatcher. diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 85bcdea67a3f..028c91a07950 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1169,6 +1169,19 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) !(versp = progp->pg_vers[vers])) goto err_bad_vers; + /* + * Some protocol versions (namely NFSv4) require "IETF approved + * congestion control protocols". IOW, UDP is not allowed. We mark + * those when setting up the svc_xprt, and verify that for them here. + * + * The spec is not very clear about what error should be returned when + * someone tries to access a server that is listening on UDP for lower + * versions though. RPC_PROG_MISMATCH seems to be the closest fit. + */ + if (versp->vs_need_cong_ctrl && + !test_bit(XPT_CONG_CTRL, &rqstp->rq_xprt->xpt_flags)) + goto err_bad_vers; + procp = versp->vs_proc + proc; if (proc >= versp->vs_nproc || !procp->pc_func) goto err_bad_proc; -- 2.9.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control 2017-02-23 17:03 [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton ` (2 preceding siblings ...) 2017-02-23 17:03 ` [PATCH 3/4] nfs/nfsd/sunrpc: enforce congestion control protocol requirement for NFSv4 Jeff Layton @ 2017-02-23 17:03 ` Jeff Layton 2017-02-23 17:17 ` [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton 5 siblings, 0 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-23 17:03 UTC (permalink / raw) To: bfields, trond.myklebust; +Cc: schumaker.anna, linux-nfs Signed-off-by: Jeff Layton <jlayton@redhat.com> --- net/sunrpc/svc.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 028c91a07950..984f1e73722f 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -976,6 +976,13 @@ int svc_register(const struct svc_serv *serv, struct net *net, if (vers->vs_hidden) continue; + /* + * Don't register a UDP port if we need congestion + * control. + */ + if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP) + continue; + error = __svc_register(net, progp->pg_name, progp->pg_prog, i, family, proto, port); -- 2.9.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 2017-02-23 17:03 [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton ` (3 preceding siblings ...) 2017-02-23 17:03 ` [PATCH 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control Jeff Layton @ 2017-02-23 17:17 ` Jeff Layton 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton 5 siblings, 0 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-23 17:17 UTC (permalink / raw) To: bfields, trond.myklebust; +Cc: schumaker.anna, linux-nfs On Thu, 2017-02-23 at 12:03 -0500, Jeff Layton wrote: > RFC5661 says: > > Where an NFSv4.1 implementation supports operation over the IP > network protocol, any transport used between NFS and IP MUST be among > the IETF-approved congestion control transport protocols. > > ...and RFC7530 has similar verbiage. The NFS server has never enforced > this requirement, however, so a user could issue NFSv4 calls against > the server via UDP. > > This patchset adds a small bit of infrastructure to the sunrpc layer > to enforce this requirement, and has the nfs and nfsd layers set the > appropriate flags for it. It also has knfsd skip registering a UDP > port for NFSv4, using the same flags. > > Lightly tested by hand, but it's fairly straightforward. > > Jeff Layton (4): > sunrpc: flag transports as using IETF approved congestion control > protocols > sunrpc: turn bitfield flags in svc_version into bools > nfs/nfsd/sunrpc: enforce congestion control protocol requirement for > NFSv4 > sunrpc: don't register UDP port with rpcbind when version needs > congestion control > > fs/nfs/callback_xdr.c | 6 ++++-- > fs/nfsd/nfs2acl.c | 1 - > fs/nfsd/nfs3acl.c | 1 - > fs/nfsd/nfs4proc.c | 13 +++++++------ > include/linux/sunrpc/svc.h | 12 ++++++++---- > include/linux/sunrpc/svc_xprt.h | 1 + > net/sunrpc/svc.c | 22 +++++++++++++++++++++- > net/sunrpc/svcsock.c | 1 + > net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 ++ > 9 files changed, 44 insertions(+), 15 deletions(-) > I probably should have sent this as an RFC first. I'm not 100% clear on whether PROG_MISMATCH is the right return code there. Also, there is still a small wart after this patchset. The high/low program versions reported look a little odd: $ rpcinfo -T udp knfsdsrv nfs 4 rpcinfo: RPC: Program/version mismatch; low version = 3, high version = 4 program 100003 version 4 is not available We could try to fix this and report different values depending on the socket type, but I'm not sure I really care. AFAIK, this is just informative anyway, and it's not _technically_ wrong. The server does support version 4, just not the UDP socket where we sent the RPC ping. Thoughts? -- Jeff Layton <jlayton@poochiereds.net> ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-23 17:03 [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton ` (4 preceding siblings ...) 2017-02-23 17:17 ` [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton @ 2017-02-24 18:25 ` Jeff Layton 2017-02-24 18:25 ` [PATCH v2 1/4] sunrpc: turn bitfield flags in svc_version into bools Jeff Layton ` (6 more replies) 5 siblings, 7 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-24 18:25 UTC (permalink / raw) To: bfields, trond.myklebust Cc: schumaker.anna, linux-nfs, chuck.lever, tom, jgunthorpe v2: comment clarifications, and commit log cleanup. No functional changes. RFC5661 says: NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- based transports with the following attributes: o The transport supports reliable delivery of data, which NFSv4.1 requires but neither NFSv4.1 nor RPC has facilities for ensuring [34]. o The transport delivers data in the order it was sent. Ordered delivery simplifies detection of transmit errors, and simplifies the sending of arbitrary sized requests and responses via the record marking protocol [3]. ...and then some hand-wavy stuff about congestion control. RFC7530 doesn't mention needing reliable and ordered delivery, but it does need congestion control. In practical terms, that means we should be excluding NFSv4 from UDP transports. The NFS server has never enforced this requirement, however, so a user could issue NFSv4 calls against the server via UDP. This patchset adds a small bit of infrastructure to the sunrpc layer to enforce this requirement, and has the nfs and nfsd layers set the appropriate flags for it on their server-side transports. It also has the rpcbind client skip registering the protocol version on a UDP port when that flag is set. Lightly tested by hand, but it's fairly straightforward. Jeff Layton (4): sunrpc: turn bitfield flags in svc_version into bools sunrpc: flag transports as having both reliable and ordered delivery, and congestion control nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 sunrpc: don't register UDP port with rpcbind when version needs congestion control fs/nfs/callback_xdr.c | 6 ++++-- fs/nfsd/nfs2acl.c | 1 - fs/nfsd/nfs3acl.c | 1 - fs/nfsd/nfs4proc.c | 13 +++++++------ include/linux/sunrpc/svc.h | 12 ++++++++---- include/linux/sunrpc/svc_xprt.h | 1 + net/sunrpc/svc.c | 23 ++++++++++++++++++++++- net/sunrpc/svcsock.c | 1 + net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++ 9 files changed, 51 insertions(+), 15 deletions(-) -- 2.9.3 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 1/4] sunrpc: turn bitfield flags in svc_version into bools 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton @ 2017-02-24 18:25 ` Jeff Layton 2017-02-24 18:25 ` [PATCH v2 2/4] sunrpc: flag transports as having both reliable and ordered delivery, and congestion control Jeff Layton ` (5 subsequent siblings) 6 siblings, 0 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-24 18:25 UTC (permalink / raw) To: bfields, trond.myklebust Cc: schumaker.anna, linux-nfs, chuck.lever, tom, jgunthorpe It's just simpler to read this way, IMO. Also, no need to explicitly set vs_hidden to false in the nfsacl ones. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfs/callback_xdr.c | 4 ++-- fs/nfsd/nfs2acl.c | 1 - fs/nfsd/nfs3acl.c | 1 - fs/nfsd/nfs4proc.c | 2 +- include/linux/sunrpc/svc.h | 9 +++++---- net/sunrpc/svc.c | 2 +- 6 files changed, 9 insertions(+), 10 deletions(-) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index eb094c6011d8..e9836f611d9c 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -1083,7 +1083,7 @@ struct svc_version nfs4_callback_version1 = { .vs_proc = nfs4_callback_procedures1, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, .vs_dispatch = NULL, - .vs_hidden = 1, + .vs_hidden = true, }; struct svc_version nfs4_callback_version4 = { @@ -1092,5 +1092,5 @@ struct svc_version nfs4_callback_version4 = { .vs_proc = nfs4_callback_procedures1, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, .vs_dispatch = NULL, - .vs_hidden = 1, + .vs_hidden = true, }; diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index d08cd88155c7..838f90f3f890 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -376,5 +376,4 @@ struct svc_version nfsd_acl_version2 = { .vs_proc = nfsd_acl_procedures2, .vs_dispatch = nfsd_dispatch, .vs_xdrsize = NFS3_SVC_XDRSIZE, - .vs_hidden = 0, }; diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 0c890347cde3..dcb5f79076c0 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -266,6 +266,5 @@ struct svc_version nfsd_acl_version3 = { .vs_proc = nfsd_acl_procedures3, .vs_dispatch = nfsd_dispatch, .vs_xdrsize = NFS3_SVC_XDRSIZE, - .vs_hidden = 0, }; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 74a6e573e061..2b73b37c2b36 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2481,7 +2481,7 @@ struct svc_version nfsd_version4 = { .vs_proc = nfsd_procedures4, .vs_dispatch = nfsd_dispatch, .vs_xdrsize = NFS4_SVC_XDRSIZE, - .vs_rpcb_optnl = 1, + .vs_rpcb_optnl = true, }; /* diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 7321ae933867..96467c95f02e 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -400,10 +400,11 @@ struct svc_version { struct svc_procedure * vs_proc; /* per-procedure info */ u32 vs_xdrsize; /* xdrsize needed for this version */ - unsigned int vs_hidden : 1, /* Don't register with portmapper. - * Only used for nfsacl so far. */ - vs_rpcb_optnl:1;/* Don't care the result of register. - * Only used for nfsv4. */ + /* Don't register with rpcbind */ + bool vs_hidden; + + /* Don't care if the rpcbind registration fails */ + bool vs_rpcb_optnl; /* Override dispatch function (e.g. when caching replies). * A return value of 0 means drop the request. diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 75f290bddca1..85bcdea67a3f 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -385,7 +385,7 @@ static int svc_uses_rpcbind(struct svc_serv *serv) for (i = 0; i < progp->pg_nvers; i++) { if (progp->pg_vers[i] == NULL) continue; - if (progp->pg_vers[i]->vs_hidden == 0) + if (!progp->pg_vers[i]->vs_hidden) return 1; } } -- 2.9.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v2 2/4] sunrpc: flag transports as having both reliable and ordered delivery, and congestion control 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton 2017-02-24 18:25 ` [PATCH v2 1/4] sunrpc: turn bitfield flags in svc_version into bools Jeff Layton @ 2017-02-24 18:25 ` Jeff Layton 2017-02-24 18:25 ` [PATCH v2 3/4] nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 Jeff Layton ` (4 subsequent siblings) 6 siblings, 0 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-24 18:25 UTC (permalink / raw) To: bfields, trond.myklebust Cc: schumaker.anna, linux-nfs, chuck.lever, tom, jgunthorpe NFSv4 requires a transport protocol with reliable and ordered delivery, and congestion control in most cases. On an IP network, that means that NFSv4 over UDP should be forbidden. The situation with RDMA is a bit more nuanced, but most RDMA transports are suitable for this. For now, we assume that all RDMA transports are suitable, but we may need to revise that at some point. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- include/linux/sunrpc/svc_xprt.h | 1 + net/sunrpc/svcsock.c | 1 + net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++ 3 files changed, 10 insertions(+) diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 7440290f64ac..c832acf509d8 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -67,6 +67,7 @@ struct svc_xprt { #define XPT_CACHE_AUTH 11 /* cache auth info */ #define XPT_LOCAL 12 /* connection from loopback interface */ #define XPT_KILL_TEMP 13 /* call xpo_kill_temp_xprt before closing */ +#define XPT_CONG_CTRL 14 /* has reliable and ordered delivery, and congestion control */ struct svc_serv *xpt_server; /* service for transport */ atomic_t xpt_reserved; /* space on outq that is rsvd */ diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index de066acdb34e..1956b8b96b2d 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1306,6 +1306,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv) svc_xprt_init(sock_net(svsk->sk_sock->sk), &svc_tcp_class, &svsk->sk_xprt, serv); set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags); + set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags); if (sk->sk_state == TCP_LISTEN) { dprintk("setting up TCP socket for listening\n"); set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags); diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 39652d390a9c..0476b412c7b1 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -571,6 +571,14 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv, spin_lock_init(&cma_xprt->sc_ctxt_lock); spin_lock_init(&cma_xprt->sc_map_lock); + /* + * Note that this implies that the underlying transport support + * reliable and ordered delivery, and (generally) some form of + * congestion control. For now, we assume that all supported RDMA + * transports are suitable here. + */ + set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags); + if (listener) set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags); -- 2.9.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v2 3/4] nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton 2017-02-24 18:25 ` [PATCH v2 1/4] sunrpc: turn bitfield flags in svc_version into bools Jeff Layton 2017-02-24 18:25 ` [PATCH v2 2/4] sunrpc: flag transports as having both reliable and ordered delivery, and congestion control Jeff Layton @ 2017-02-24 18:25 ` Jeff Layton 2017-02-24 18:25 ` [PATCH v2 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control Jeff Layton ` (3 subsequent siblings) 6 siblings, 0 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-24 18:25 UTC (permalink / raw) To: bfields, trond.myklebust Cc: schumaker.anna, linux-nfs, chuck.lever, tom, jgunthorpe NFSv4 requires reliable and ordered delivery, and (in most cases) congestion control. In practical terms, that means that you should not run NFSv4 over UDP. The server has never enforced that requirement, however. This patchset fixes this by adding a new flag to the svc_version that states that it has these transport requirements. With that, we can check that the transport has XPT_CONG_CTRL set before processing an RPC. If it doesn't we reject it with RPC_PROG_MISMATCH. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfs/callback_xdr.c | 2 ++ fs/nfsd/nfs4proc.c | 13 +++++++------ include/linux/sunrpc/svc.h | 3 +++ net/sunrpc/svc.c | 14 ++++++++++++++ 4 files changed, 26 insertions(+), 6 deletions(-) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index e9836f611d9c..fd0284c1dc32 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -1084,6 +1084,7 @@ struct svc_version nfs4_callback_version1 = { .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, .vs_dispatch = NULL, .vs_hidden = true, + .vs_need_cong_ctrl = true, }; struct svc_version nfs4_callback_version4 = { @@ -1093,4 +1094,5 @@ struct svc_version nfs4_callback_version4 = { .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, .vs_dispatch = NULL, .vs_hidden = true, + .vs_need_cong_ctrl = true, }; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 2b73b37c2b36..f82fcaa2e1d9 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2476,12 +2476,13 @@ static struct svc_procedure nfsd_procedures4[2] = { }; struct svc_version nfsd_version4 = { - .vs_vers = 4, - .vs_nproc = 2, - .vs_proc = nfsd_procedures4, - .vs_dispatch = nfsd_dispatch, - .vs_xdrsize = NFS4_SVC_XDRSIZE, - .vs_rpcb_optnl = true, + .vs_vers = 4, + .vs_nproc = 2, + .vs_proc = nfsd_procedures4, + .vs_dispatch = nfsd_dispatch, + .vs_xdrsize = NFS4_SVC_XDRSIZE, + .vs_rpcb_optnl = true, + .vs_need_cong_ctrl = true, }; /* diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 96467c95f02e..45a7da2b169e 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -406,6 +406,9 @@ struct svc_version { /* Don't care if the rpcbind registration fails */ bool vs_rpcb_optnl; + /* Need xprt with reliable, ordered delivery and congestion control */ + bool vs_need_cong_ctrl; + /* Override dispatch function (e.g. when caching replies). * A return value of 0 means drop the request. * vs_dispatch == NULL means use default dispatcher. diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 85bcdea67a3f..668e01d8abb8 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1169,6 +1169,20 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) !(versp = progp->pg_vers[vers])) goto err_bad_vers; + /* + * Some protocol versions (namely NFSv4) require reliable and ordered + * delivery and usually some form of congestion control. IOW, UDP is + * not allowed. We mark those when setting up the svc_xprt, and verify + * that here. + * + * The spec is not very clear about what error should be returned when + * someone tries to access a server that is listening on UDP for lower + * versions though. RPC_PROG_MISMATCH seems to be the closest fit. + */ + if (versp->vs_need_cong_ctrl && + !test_bit(XPT_CONG_CTRL, &rqstp->rq_xprt->xpt_flags)) + goto err_bad_vers; + procp = versp->vs_proc + proc; if (proc >= versp->vs_nproc || !procp->pc_func) goto err_bad_proc; -- 2.9.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v2 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton ` (2 preceding siblings ...) 2017-02-24 18:25 ` [PATCH v2 3/4] nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 Jeff Layton @ 2017-02-24 18:25 ` Jeff Layton 2017-02-24 18:38 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Chuck Lever ` (2 subsequent siblings) 6 siblings, 0 replies; 36+ messages in thread From: Jeff Layton @ 2017-02-24 18:25 UTC (permalink / raw) To: bfields, trond.myklebust Cc: schumaker.anna, linux-nfs, chuck.lever, tom, jgunthorpe Signed-off-by: Jeff Layton <jlayton@redhat.com> --- net/sunrpc/svc.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 668e01d8abb8..bb4a1a50305b 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -976,6 +976,13 @@ int svc_register(const struct svc_serv *serv, struct net *net, if (vers->vs_hidden) continue; + /* + * Don't register a UDP port if we need congestion + * control. + */ + if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP) + continue; + error = __svc_register(net, progp->pg_name, progp->pg_prog, i, family, proto, port); -- 2.9.3 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton ` (3 preceding siblings ...) 2017-02-24 18:25 ` [PATCH v2 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control Jeff Layton @ 2017-02-24 18:38 ` Chuck Lever 2017-02-24 18:53 ` Jeff Layton 2017-02-24 18:53 ` Tom Talpey 2017-02-24 21:25 ` J. Bruce Fields 6 siblings, 1 reply; 36+ messages in thread From: Chuck Lever @ 2017-02-24 18:38 UTC (permalink / raw) To: Jeff Layton Cc: J. Bruce Fields, Trond Myklebust, Anna Schumaker, Linux NFS Mailing List, Tom Talpey, jgunthorpe > On Feb 24, 2017, at 10:25 AM, Jeff Layton <jlayton@redhat.com> wrote: > > v2: comment clarifications, and commit log cleanup. No functional changes. > > RFC5661 says: > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- > based transports with the following attributes: > > > o The transport supports reliable delivery of data, which NFSv4.1 > requires but neither NFSv4.1 nor RPC has facilities for ensuring > [34]. > > o The transport delivers data in the order it was sent. Ordered > delivery simplifies detection of transmit errors, and simplifies > the sending of arbitrary sized requests and responses via the > record marking protocol [3]. > > ...and then some hand-wavy stuff about congestion control. RFC7530 > doesn't mention needing reliable and ordered delivery, but it does need > congestion control. > > In practical terms, that means we should be excluding NFSv4 from UDP > transports. The NFS server has never enforced this requirement, > however, so a user could issue NFSv4 calls against the server via UDP. RPC-over-RDMA Version One requires the use of RDMA Reliable Connections, which is a layer above the link layer that provides reliable, in-order delivery using connection semantics. This meets all stated transport requirements in RFC 5661. The language of RFC 5661 says that UDP by itself must not be used for NFSv4. IMO the use of Reliable Connections with RPC-over-RDMA makes this a non-issue for NFSv4, even for RoCE v2. rfc5667bis-06 was submitted this morning to address this. > This patchset adds a small bit of infrastructure to the sunrpc layer to > enforce this requirement, and has the nfs and nfsd layers set the > appropriate flags for it on their server-side transports. It also has > the rpcbind client skip registering the protocol version on a UDP port > when that flag is set. > > Lightly tested by hand, but it's fairly straightforward. > > Jeff Layton (4): > sunrpc: turn bitfield flags in svc_version into bools > sunrpc: flag transports as having both reliable and ordered delivery, > and congestion control > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 > sunrpc: don't register UDP port with rpcbind when version needs > congestion control > > fs/nfs/callback_xdr.c | 6 ++++-- > fs/nfsd/nfs2acl.c | 1 - > fs/nfsd/nfs3acl.c | 1 - > fs/nfsd/nfs4proc.c | 13 +++++++------ > include/linux/sunrpc/svc.h | 12 ++++++++---- > include/linux/sunrpc/svc_xprt.h | 1 + > net/sunrpc/svc.c | 23 ++++++++++++++++++++++- > net/sunrpc/svcsock.c | 1 + > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++ > 9 files changed, 51 insertions(+), 15 deletions(-) > > -- > 2.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-24 18:38 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Chuck Lever @ 2017-02-24 18:53 ` Jeff Layton 2017-02-24 21:23 ` J. Bruce Fields 0 siblings, 1 reply; 36+ messages in thread From: Jeff Layton @ 2017-02-24 18:53 UTC (permalink / raw) To: Chuck Lever Cc: J. Bruce Fields, Trond Myklebust, Anna Schumaker, Linux NFS Mailing List, Tom Talpey, jgunthorpe On Fri, 2017-02-24 at 10:38 -0800, Chuck Lever wrote: > > On Feb 24, 2017, at 10:25 AM, Jeff Layton <jlayton@redhat.com> wrote: > > > > v2: comment clarifications, and commit log cleanup. No functional changes. > > > > RFC5661 says: > > > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- > > based transports with the following attributes: > > > > > > o The transport supports reliable delivery of data, which NFSv4.1 > > requires but neither NFSv4.1 nor RPC has facilities for ensuring > > [34]. > > > > o The transport delivers data in the order it was sent. Ordered > > delivery simplifies detection of transmit errors, and simplifies > > the sending of arbitrary sized requests and responses via the > > record marking protocol [3]. > > > > ...and then some hand-wavy stuff about congestion control. RFC7530 > > doesn't mention needing reliable and ordered delivery, but it does need > > congestion control. > > > > In practical terms, that means we should be excluding NFSv4 from UDP > > transports. The NFS server has never enforced this requirement, > > however, so a user could issue NFSv4 calls against the server via UDP. > > RPC-over-RDMA Version One requires the use of RDMA Reliable > Connections, which is a layer above the link layer that > provides reliable, in-order delivery using connection > semantics. This meets all stated transport requirements in > RFC 5661. > > The language of RFC 5661 says that UDP by itself must not be > used for NFSv4. IMO the use of Reliable Connections with > RPC-over-RDMA makes this a non-issue for NFSv4, even for RoCE > v2. > > rfc5667bis-06 was submitted this morning to address this. > Thanks, I may plagiarize you and update the comment in rdma_create_xprt if that's ok: + /* + * RPC-over-RDMA Version One requires the use of RDMA Reliable + * Connections, which is a layer above the link layer that provides + * reliable, in-order delivery using connection semantics. + */ I won't bother to re-post just for that though. > > This patchset adds a small bit of infrastructure to the sunrpc layer to > > enforce this requirement, and has the nfs and nfsd layers set the > > appropriate flags for it on their server-side transports. It also has > > the rpcbind client skip registering the protocol version on a UDP port > > when that flag is set. > > > > Lightly tested by hand, but it's fairly straightforward. > > > > Jeff Layton (4): > > sunrpc: turn bitfield flags in svc_version into bools > > sunrpc: flag transports as having both reliable and ordered delivery, > > and congestion control > > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 > > sunrpc: don't register UDP port with rpcbind when version needs > > congestion control > > > > fs/nfs/callback_xdr.c | 6 ++++-- > > fs/nfsd/nfs2acl.c | 1 - > > fs/nfsd/nfs3acl.c | 1 - > > fs/nfsd/nfs4proc.c | 13 +++++++------ > > include/linux/sunrpc/svc.h | 12 ++++++++---- > > include/linux/sunrpc/svc_xprt.h | 1 + > > net/sunrpc/svc.c | 23 ++++++++++++++++++++++- > > net/sunrpc/svcsock.c | 1 + > > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++ > > 9 files changed, 51 insertions(+), 15 deletions(-) > > > > -- > > 2.9.3 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Chuck Lever > > > -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-24 18:53 ` Jeff Layton @ 2017-02-24 21:23 ` J. Bruce Fields 0 siblings, 0 replies; 36+ messages in thread From: J. Bruce Fields @ 2017-02-24 21:23 UTC (permalink / raw) To: Jeff Layton Cc: Chuck Lever, Trond Myklebust, Anna Schumaker, Linux NFS Mailing List, Tom Talpey, jgunthorpe On Fri, Feb 24, 2017 at 01:53:33PM -0500, Jeff Layton wrote: > On Fri, 2017-02-24 at 10:38 -0800, Chuck Lever wrote: > > > On Feb 24, 2017, at 10:25 AM, Jeff Layton <jlayton@redhat.com> wrote: > > > > > > v2: comment clarifications, and commit log cleanup. No functional changes. > > > > > > RFC5661 says: > > > > > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- > > > based transports with the following attributes: > > > > > > > > > o The transport supports reliable delivery of data, which NFSv4.1 > > > requires but neither NFSv4.1 nor RPC has facilities for ensuring > > > [34]. > > > > > > o The transport delivers data in the order it was sent. Ordered > > > delivery simplifies detection of transmit errors, and simplifies > > > the sending of arbitrary sized requests and responses via the > > > record marking protocol [3]. > > > > > > ...and then some hand-wavy stuff about congestion control. RFC7530 > > > doesn't mention needing reliable and ordered delivery, but it does need > > > congestion control. > > > > > > In practical terms, that means we should be excluding NFSv4 from UDP > > > transports. The NFS server has never enforced this requirement, > > > however, so a user could issue NFSv4 calls against the server via UDP. > > > > RPC-over-RDMA Version One requires the use of RDMA Reliable > > Connections, which is a layer above the link layer that > > provides reliable, in-order delivery using connection > > semantics. This meets all stated transport requirements in > > RFC 5661. > > > > The language of RFC 5661 says that UDP by itself must not be > > used for NFSv4. IMO the use of Reliable Connections with > > RPC-over-RDMA makes this a non-issue for NFSv4, even for RoCE > > v2. > > > > rfc5667bis-06 was submitted this morning to address this. > > > > Thanks, I may plagiarize you and update the comment in rdma_create_xprt > if that's ok: > > + /* > + * RPC-over-RDMA Version One requires the use of RDMA Reliable > + * Connections, which is a layer above the link layer that provides > + * reliable, in-order delivery using connection semantics. > + */ > > I won't bother to re-post just for that though. Unless we thnk this is esepcially important I'd rather leave the comment as is ("we assume that all supported RDMA transports are suitable here.") instead of getting into more detail. --b. > > > > This patchset adds a small bit of infrastructure to the sunrpc layer to > > > enforce this requirement, and has the nfs and nfsd layers set the > > > appropriate flags for it on their server-side transports. It also has > > > the rpcbind client skip registering the protocol version on a UDP port > > > when that flag is set. > > > > > > Lightly tested by hand, but it's fairly straightforward. > > > > > > Jeff Layton (4): > > > sunrpc: turn bitfield flags in svc_version into bools > > > sunrpc: flag transports as having both reliable and ordered delivery, > > > and congestion control > > > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 > > > sunrpc: don't register UDP port with rpcbind when version needs > > > congestion control > > > > > > fs/nfs/callback_xdr.c | 6 ++++-- > > > fs/nfsd/nfs2acl.c | 1 - > > > fs/nfsd/nfs3acl.c | 1 - > > > fs/nfsd/nfs4proc.c | 13 +++++++------ > > > include/linux/sunrpc/svc.h | 12 ++++++++---- > > > include/linux/sunrpc/svc_xprt.h | 1 + > > > net/sunrpc/svc.c | 23 ++++++++++++++++++++++- > > > net/sunrpc/svcsock.c | 1 + > > > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++ > > > 9 files changed, 51 insertions(+), 15 deletions(-) > > > > > > -- > > > 2.9.3 > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > > Chuck Lever > > > > > > > > -- > Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton ` (4 preceding siblings ...) 2017-02-24 18:38 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Chuck Lever @ 2017-02-24 18:53 ` Tom Talpey 2017-02-24 21:22 ` J. Bruce Fields 2017-02-24 21:25 ` J. Bruce Fields 6 siblings, 1 reply; 36+ messages in thread From: Tom Talpey @ 2017-02-24 18:53 UTC (permalink / raw) To: Jeff Layton, bfields, trond.myklebust Cc: schumaker.anna, linux-nfs, chuck.lever, jgunthorpe On 2/24/2017 1:25 PM, Jeff Layton wrote: > v2: comment clarifications, and commit log cleanup. No functional changes. > > RFC5661 says: > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- > based transports with the following attributes: > > > o The transport supports reliable delivery of data, which NFSv4.1 > requires but neither NFSv4.1 nor RPC has facilities for ensuring > [34]. > > o The transport delivers data in the order it was sent. Ordered > delivery simplifies detection of transmit errors, and simplifies > the sending of arbitrary sized requests and responses via the > record marking protocol [3]. > > ...and then some hand-wavy stuff about congestion control. RFC7530 > doesn't mention needing reliable and ordered delivery, but it does need > congestion control. Snipping some stuff for a pedantic response :-) There are several good reasons why RFC7530 does not specify reliable and ordered. The most obvious being, it doesn't need them. Because it has a session, it can handle out-of-order messages at its layer. This is in fact critical to supporting trunking and multipathing. And with the session comes the ability to detect replays, so reliability can be obviated there too. In fact, apart from congestion control, with the proper session support, NFSv4.1 can run very nicely over an unreliable unordered transport. Now, NFS4.0, and NFSv3 and NFSv2 before it, are another matter entirely. Note that RDMA transports provide remote direct placement only in RC (Reliable Connected) endpoints, which is why rpcrdma uses that mode. Tom. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-24 18:53 ` Tom Talpey @ 2017-02-24 21:22 ` J. Bruce Fields 0 siblings, 0 replies; 36+ messages in thread From: J. Bruce Fields @ 2017-02-24 21:22 UTC (permalink / raw) To: Tom Talpey Cc: Jeff Layton, trond.myklebust, schumaker.anna, linux-nfs, chuck.lever, jgunthorpe On Fri, Feb 24, 2017 at 01:53:21PM -0500, Tom Talpey wrote: > On 2/24/2017 1:25 PM, Jeff Layton wrote: > >v2: comment clarifications, and commit log cleanup. No functional changes. > > > >RFC5661 says: > > > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- > > based transports with the following attributes: > > > > > > o The transport supports reliable delivery of data, which NFSv4.1 > > requires but neither NFSv4.1 nor RPC has facilities for ensuring > > [34]. > > > > o The transport delivers data in the order it was sent. Ordered > > delivery simplifies detection of transmit errors, and simplifies > > the sending of arbitrary sized requests and responses via the > > record marking protocol [3]. > > > >...and then some hand-wavy stuff about congestion control. RFC7530 > >doesn't mention needing reliable and ordered delivery, but it does need > >congestion control. > > Snipping some stuff for a pedantic response :-) > > There are several good reasons why RFC7530 does not specify reliable and > ordered. OK, I'm dropping "reliable and ordered" from the comments and applying. --b. > The most obvious being, it doesn't need them. Because it has > a session, it can handle out-of-order messages at its layer. This is in > fact critical to supporting trunking and multipathing. And with the > session comes the ability to detect replays, so reliability can be > obviated there too. > > In fact, apart from congestion control, with the proper session support, > NFSv4.1 can run very nicely over an unreliable unordered transport. > Now, NFS4.0, and NFSv3 and NFSv2 before it, are another matter entirely. > > Note that RDMA transports provide remote direct placement only in RC > (Reliable Connected) endpoints, which is why rpcrdma uses that mode. > > Tom. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton ` (5 preceding siblings ...) 2017-02-24 18:53 ` Tom Talpey @ 2017-02-24 21:25 ` J. Bruce Fields 2017-02-24 21:34 ` Jeff Layton 6 siblings, 1 reply; 36+ messages in thread From: J. Bruce Fields @ 2017-02-24 21:25 UTC (permalink / raw) To: Jeff Layton Cc: trond.myklebust, schumaker.anna, linux-nfs, chuck.lever, tom, jgunthorpe The one other minor thing we could do is skip adding the UDP listener entirely in the v4-only case. I think that's a job for rpc.nfsd? --b. On Fri, Feb 24, 2017 at 01:25:21PM -0500, Jeff Layton wrote: > v2: comment clarifications, and commit log cleanup. No functional changes. > > RFC5661 says: > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- > based transports with the following attributes: > > > o The transport supports reliable delivery of data, which NFSv4.1 > requires but neither NFSv4.1 nor RPC has facilities for ensuring > [34]. > > o The transport delivers data in the order it was sent. Ordered > delivery simplifies detection of transmit errors, and simplifies > the sending of arbitrary sized requests and responses via the > record marking protocol [3]. > > ...and then some hand-wavy stuff about congestion control. RFC7530 > doesn't mention needing reliable and ordered delivery, but it does need > congestion control. > > In practical terms, that means we should be excluding NFSv4 from UDP > transports. The NFS server has never enforced this requirement, > however, so a user could issue NFSv4 calls against the server via UDP. > > This patchset adds a small bit of infrastructure to the sunrpc layer to > enforce this requirement, and has the nfs and nfsd layers set the > appropriate flags for it on their server-side transports. It also has > the rpcbind client skip registering the protocol version on a UDP port > when that flag is set. > > Lightly tested by hand, but it's fairly straightforward. > > Jeff Layton (4): > sunrpc: turn bitfield flags in svc_version into bools > sunrpc: flag transports as having both reliable and ordered delivery, > and congestion control > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 > sunrpc: don't register UDP port with rpcbind when version needs > congestion control > > fs/nfs/callback_xdr.c | 6 ++++-- > fs/nfsd/nfs2acl.c | 1 - > fs/nfsd/nfs3acl.c | 1 - > fs/nfsd/nfs4proc.c | 13 +++++++------ > include/linux/sunrpc/svc.h | 12 ++++++++---- > include/linux/sunrpc/svc_xprt.h | 1 + > net/sunrpc/svc.c | 23 ++++++++++++++++++++++- > net/sunrpc/svcsock.c | 1 + > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++ > 9 files changed, 51 insertions(+), 15 deletions(-) > > -- > 2.9.3 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-24 21:25 ` J. Bruce Fields @ 2017-02-24 21:34 ` Jeff Layton 2017-02-24 21:44 ` J. Bruce Fields 0 siblings, 1 reply; 36+ messages in thread From: Jeff Layton @ 2017-02-24 21:34 UTC (permalink / raw) To: J. Bruce Fields Cc: trond.myklebust, schumaker.anna, linux-nfs, chuck.lever, tom, jgunthorpe On Fri, 2017-02-24 at 16:25 -0500, J. Bruce Fields wrote: > The one other minor thing we could do is skip adding the UDP listener > entirely in the v4-only case. I think that's a job for rpc.nfsd? > > --b. > Yeah I think we'd need to fix that in rpc.nfsd. Maybe it's time to just start doing having it do TCP-only by default anyway? Make it so you have to explicitly enable UDP listeners if you want them? Does anyone seriously run NFS over UDP these days for anything other than interop testing? :) > On Fri, Feb 24, 2017 at 01:25:21PM -0500, Jeff Layton wrote: > > v2: comment clarifications, and commit log cleanup. No functional changes. > > > > RFC5661 says: > > > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- > > based transports with the following attributes: > > > > > > o The transport supports reliable delivery of data, which NFSv4.1 > > requires but neither NFSv4.1 nor RPC has facilities for ensuring > > [34]. > > > > o The transport delivers data in the order it was sent. Ordered > > delivery simplifies detection of transmit errors, and simplifies > > the sending of arbitrary sized requests and responses via the > > record marking protocol [3]. > > > > ...and then some hand-wavy stuff about congestion control. RFC7530 > > doesn't mention needing reliable and ordered delivery, but it does need > > congestion control. > > > > In practical terms, that means we should be excluding NFSv4 from UDP > > transports. The NFS server has never enforced this requirement, > > however, so a user could issue NFSv4 calls against the server via UDP. > > > > This patchset adds a small bit of infrastructure to the sunrpc layer to > > enforce this requirement, and has the nfs and nfsd layers set the > > appropriate flags for it on their server-side transports. It also has > > the rpcbind client skip registering the protocol version on a UDP port > > when that flag is set. > > > > Lightly tested by hand, but it's fairly straightforward. > > > > Jeff Layton (4): > > sunrpc: turn bitfield flags in svc_version into bools > > sunrpc: flag transports as having both reliable and ordered delivery, > > and congestion control > > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 > > sunrpc: don't register UDP port with rpcbind when version needs > > congestion control > > > > fs/nfs/callback_xdr.c | 6 ++++-- > > fs/nfsd/nfs2acl.c | 1 - > > fs/nfsd/nfs3acl.c | 1 - > > fs/nfsd/nfs4proc.c | 13 +++++++------ > > include/linux/sunrpc/svc.h | 12 ++++++++---- > > include/linux/sunrpc/svc_xprt.h | 1 + > > net/sunrpc/svc.c | 23 ++++++++++++++++++++++- > > net/sunrpc/svcsock.c | 1 + > > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++ > > 9 files changed, 51 insertions(+), 15 deletions(-) > > > > -- > > 2.9.3 -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-24 21:34 ` Jeff Layton @ 2017-02-24 21:44 ` J. Bruce Fields 2017-02-27 11:59 ` Jeff Layton 0 siblings, 1 reply; 36+ messages in thread From: J. Bruce Fields @ 2017-02-24 21:44 UTC (permalink / raw) To: Jeff Layton Cc: trond.myklebust, schumaker.anna, linux-nfs, chuck.lever, tom, jgunthorpe On Fri, Feb 24, 2017 at 04:34:24PM -0500, Jeff Layton wrote: > On Fri, 2017-02-24 at 16:25 -0500, J. Bruce Fields wrote: > > The one other minor thing we could do is skip adding the UDP listener > > entirely in the v4-only case. I think that's a job for rpc.nfsd? > > > > --b. > > > > Yeah I think we'd need to fix that in rpc.nfsd. > > Maybe it's time to just start doing having it do TCP-only by default > anyway? Make it so you have to explicitly enable UDP listeners if you > want them? Does anyone seriously run NFS over UDP these days for > anything other than interop testing? :) I thought I remembered somebody floating this on linux-nfs a couple years ago and finding there were still a couple vocal users. Or maybe that was NFSv2. I can't find the thread now. I'm pretty conservative about anything that might break people's ancient but working setups on upgrade, but maybe it's time. Just switching the default to off in nfs-utils first would be the way to go, I think, then if that goes well we could think about phasing out kernel support. --b. > > > > On Fri, Feb 24, 2017 at 01:25:21PM -0500, Jeff Layton wrote: > > > v2: comment clarifications, and commit log cleanup. No functional changes. > > > > > > RFC5661 says: > > > > > > NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- > > > based transports with the following attributes: > > > > > > > > > o The transport supports reliable delivery of data, which NFSv4.1 > > > requires but neither NFSv4.1 nor RPC has facilities for ensuring > > > [34]. > > > > > > o The transport delivers data in the order it was sent. Ordered > > > delivery simplifies detection of transmit errors, and simplifies > > > the sending of arbitrary sized requests and responses via the > > > record marking protocol [3]. > > > > > > ...and then some hand-wavy stuff about congestion control. RFC7530 > > > doesn't mention needing reliable and ordered delivery, but it does need > > > congestion control. > > > > > > In practical terms, that means we should be excluding NFSv4 from UDP > > > transports. The NFS server has never enforced this requirement, > > > however, so a user could issue NFSv4 calls against the server via UDP. > > > > > > This patchset adds a small bit of infrastructure to the sunrpc layer to > > > enforce this requirement, and has the nfs and nfsd layers set the > > > appropriate flags for it on their server-side transports. It also has > > > the rpcbind client skip registering the protocol version on a UDP port > > > when that flag is set. > > > > > > Lightly tested by hand, but it's fairly straightforward. > > > > > > Jeff Layton (4): > > > sunrpc: turn bitfield flags in svc_version into bools > > > sunrpc: flag transports as having both reliable and ordered delivery, > > > and congestion control > > > nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 > > > sunrpc: don't register UDP port with rpcbind when version needs > > > congestion control > > > > > > fs/nfs/callback_xdr.c | 6 ++++-- > > > fs/nfsd/nfs2acl.c | 1 - > > > fs/nfsd/nfs3acl.c | 1 - > > > fs/nfsd/nfs4proc.c | 13 +++++++------ > > > include/linux/sunrpc/svc.h | 12 ++++++++---- > > > include/linux/sunrpc/svc_xprt.h | 1 + > > > net/sunrpc/svc.c | 23 ++++++++++++++++++++++- > > > net/sunrpc/svcsock.c | 1 + > > > net/sunrpc/xprtrdma/svc_rdma_transport.c | 8 ++++++++ > > > 9 files changed, 51 insertions(+), 15 deletions(-) > > > > > > -- > > > 2.9.3 > > -- > Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-24 21:44 ` J. Bruce Fields @ 2017-02-27 11:59 ` Jeff Layton 2017-02-27 12:08 ` Tom Talpey 0 siblings, 1 reply; 36+ messages in thread From: Jeff Layton @ 2017-02-27 11:59 UTC (permalink / raw) To: J. Bruce Fields Cc: trond.myklebust, schumaker.anna, linux-nfs, chuck.lever, tom, jgunthorpe On Fri, 2017-02-24 at 16:44 -0500, J. Bruce Fields wrote: > On Fri, Feb 24, 2017 at 04:34:24PM -0500, Jeff Layton wrote: > > On Fri, 2017-02-24 at 16:25 -0500, J. Bruce Fields wrote: > > > The one other minor thing we could do is skip adding the UDP listener > > > entirely in the v4-only case. I think that's a job for rpc.nfsd? > > > > > > --b. > > > > > > > Yeah I think we'd need to fix that in rpc.nfsd. > > > > Maybe it's time to just start doing having it do TCP-only by default > > anyway? Make it so you have to explicitly enable UDP listeners if you > > want them? Does anyone seriously run NFS over UDP these days for > > anything other than interop testing? :) > > I thought I remembered somebody floating this on linux-nfs a couple > years ago and finding there were still a couple vocal users. Or maybe > that was NFSv2. I can't find the thread now. > > I'm pretty conservative about anything that might break people's ancient > but working setups on upgrade, but maybe it's time. > > Just switching the default to off in nfs-utils first would be the way to > go, I think, then if that goes well we could think about phasing out > kernel support. > > --b. > Ok, I posted a patch a couple of days ago as an RFC. It's pretty straightforward and works. I don't see any need to turn off kernel support just yet. If we do have users who need it, turning it back on is pretty trivial with nfs.conf. What I'd really like is to eventually have distros move to a default nfsd configuration that is v4-only. Have the kernel only listen for v4 calls on TCP, turn off lockd and statd, and make mountd not open any IP sockets. What we'd need to make that happen, I think is a [global] stanza in nfs.conf with a single 'nfsd_v3' boolean that defaults to off. If someone needs to serve v3, they could turn that on and everything would be reenabled. That would take a bit of plumbing through various daemons though. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-27 11:59 ` Jeff Layton @ 2017-02-27 12:08 ` Tom Talpey 2017-02-27 12:55 ` Jeff Layton 0 siblings, 1 reply; 36+ messages in thread From: Tom Talpey @ 2017-02-27 12:08 UTC (permalink / raw) To: Jeff Layton, J. Bruce Fields Cc: trond.myklebust, schumaker.anna, linux-nfs, chuck.lever, jgunthorpe On 2/27/2017 6:59 AM, Jeff Layton wrote: > What we'd need to make that happen, I think is a [global] stanza in > nfs.conf with a single 'nfsd_v3' boolean that defaults to off. If Don't forget v2! And maybe even v4.0 if you're encouraging non-legacy operation. RFC3530 was published 14 years ago, btw. RFC1813 in 1995, and RFC1094 in 1989. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-27 12:08 ` Tom Talpey @ 2017-02-27 12:55 ` Jeff Layton 2017-02-27 14:20 ` J. Bruce Fields 0 siblings, 1 reply; 36+ messages in thread From: Jeff Layton @ 2017-02-27 12:55 UTC (permalink / raw) To: Tom Talpey, J. Bruce Fields Cc: trond.myklebust, schumaker.anna, linux-nfs, chuck.lever, jgunthorpe On Mon, 2017-02-27 at 07:08 -0500, Tom Talpey wrote: > On 2/27/2017 6:59 AM, Jeff Layton wrote: > > What we'd need to make that happen, I think is a [global] stanza in > > nfs.conf with a single 'nfsd_v3' boolean that defaults to off. If > > Don't forget v2! And maybe even v4.0 if you're encouraging non-legacy > operation. RFC3530 was published 14 years ago, btw. RFC1813 in 1995, > and RFC1094 in 1989. I think v2 already defaults to off these days? But yeah, I could see us adding a similar boolean for v2. Maybe we don't need a new switch at all, and just need to have everything look at the [nfsd] vers2= and vers3= config file options? I think wiring nfsd and mountd up properly for this would be fairly easy here. statd is a little tougher since we don't want to run it or sm- notify at all if v2/3 are disabled. I wonder if there is any way we can make systemd look at this config file and decide whether to start statd based on whether either of those options is set? I'd have no issue with eventually defaulting with v4.0 disabled as well, but there are a fair number of clients in the field that don't support v4.1 (or don't support it well). I think we'd need to wait and see how much grief we get about disabling v3 by default before we go there. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements 2017-02-27 12:55 ` Jeff Layton @ 2017-02-27 14:20 ` J. Bruce Fields 0 siblings, 0 replies; 36+ messages in thread From: J. Bruce Fields @ 2017-02-27 14:20 UTC (permalink / raw) To: Jeff Layton Cc: Tom Talpey, trond.myklebust, schumaker.anna, linux-nfs, chuck.lever, jgunthorpe On Mon, Feb 27, 2017 at 07:55:55AM -0500, Jeff Layton wrote: > On Mon, 2017-02-27 at 07:08 -0500, Tom Talpey wrote: > > On 2/27/2017 6:59 AM, Jeff Layton wrote: > > > What we'd need to make that happen, I think is a [global] stanza in > > > nfs.conf with a single 'nfsd_v3' boolean that defaults to off. If > > > > Don't forget v2! And maybe even v4.0 if you're encouraging non-legacy > > operation. RFC3530 was published 14 years ago, btw. RFC1813 in 1995, > > and RFC1094 in 1989. Looking just at the RHEL history.... I think we enabled experimental v4 in 2005 in RHEL4, but regretted that. It wasn't a default until RHEL6 in 2010. Other OS's were different, but in general I think implementation lagged specification by a lot. Ditto to some degree for 4.1. > I think v2 already defaults to off these days? But yeah, I could see us > adding a similar boolean for v2. Maybe we don't need a new switch at > all, and just need to have everything look at the [nfsd] vers2= and > vers3= config file options? > > I think wiring nfsd and mountd up properly for this would be fairly easy > here. statd is a little tougher since we don't want to run it or sm- > notify at all if v2/3 are disabled. I wonder if there is any way we can > make systemd look at this config file and decide whether to start statd > based on whether either of those options is set? Neil might have ideas--see https://lwn.net/Articles/701549/. --b. > I'd have no issue with eventually defaulting with v4.0 disabled as well, > but there are a fair number of clients in the field that don't support > v4.1 (or don't support it well). I think we'd need to wait and see how > much grief we get about disabling v3 by default before we go there. ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2017-02-27 14:20 UTC | newest] Thread overview: 36+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-02-23 17:03 [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton 2017-02-23 17:03 ` [PATCH 1/4] sunrpc: flag transports as using IETF approved congestion control protocols Jeff Layton 2017-02-23 19:42 ` Tom Talpey 2017-02-23 20:00 ` Jeff Layton 2017-02-23 20:06 ` Tom Talpey 2017-02-23 20:11 ` J. Bruce Fields 2017-02-23 20:26 ` Jason Gunthorpe 2017-02-23 20:33 ` Tom Talpey 2017-02-23 20:55 ` Jason Gunthorpe 2017-02-24 15:08 ` Tom Talpey 2017-02-24 17:17 ` Jeff Layton 2017-02-24 18:03 ` Jason Gunthorpe 2017-02-23 20:32 ` Jeff Layton 2017-02-23 20:17 ` Chuck Lever 2017-02-23 20:15 ` Chuck Lever 2017-02-23 17:03 ` [PATCH 2/4] sunrpc: turn bitfield flags in svc_version into bools Jeff Layton 2017-02-23 17:03 ` [PATCH 3/4] nfs/nfsd/sunrpc: enforce congestion control protocol requirement for NFSv4 Jeff Layton 2017-02-23 17:03 ` [PATCH 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control Jeff Layton 2017-02-23 17:17 ` [PATCH 0/4] nfs/nfsd/sunrpc: enforce requirement for congestion control protocols in NFSv4 Jeff Layton 2017-02-24 18:25 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Jeff Layton 2017-02-24 18:25 ` [PATCH v2 1/4] sunrpc: turn bitfield flags in svc_version into bools Jeff Layton 2017-02-24 18:25 ` [PATCH v2 2/4] sunrpc: flag transports as having both reliable and ordered delivery, and congestion control Jeff Layton 2017-02-24 18:25 ` [PATCH v2 3/4] nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 Jeff Layton 2017-02-24 18:25 ` [PATCH v2 4/4] sunrpc: don't register UDP port with rpcbind when version needs congestion control Jeff Layton 2017-02-24 18:38 ` [PATCH v2 0/4] nfs/nfsd/sunrpc: enforce NFSv4 transport requirements Chuck Lever 2017-02-24 18:53 ` Jeff Layton 2017-02-24 21:23 ` J. Bruce Fields 2017-02-24 18:53 ` Tom Talpey 2017-02-24 21:22 ` J. Bruce Fields 2017-02-24 21:25 ` J. Bruce Fields 2017-02-24 21:34 ` Jeff Layton 2017-02-24 21:44 ` J. Bruce Fields 2017-02-27 11:59 ` Jeff Layton 2017-02-27 12:08 ` Tom Talpey 2017-02-27 12:55 ` Jeff Layton 2017-02-27 14:20 ` J. Bruce Fields
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).