* [PATCH 0/7] Remaining rpcbind patches for 2.6.27
From: Chuck Lever @ 2008-06-30 22:38 UTC
To: trond.myklebust; +Cc: linux-nfs
Hi Trond-
Seven patches that implement kernel RPC service registration via rpcbind v4.
This allows the kernel to advertise IPv4-only services on hosts with IPv6
addresses, for example.
---
Chuck Lever (7):
SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
SUNRPC: Quickly detect missing portmapper during RPC service registration
SUNRPC: introduce new rpc_task flag that fails requests on xprt disconnect
SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
SUNRPC: None of rpcb_create's callers wants a privileged source port
SUNRPC: Introduce a specific rpcb_create for contacting localhost
SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET
include/linux/sunrpc/clnt.h | 3
include/linux/sunrpc/sched.h | 2
net/sunrpc/clnt.c | 2
net/sunrpc/rpcb_clnt.c | 290 +++++++++++++++++++++++++++++++++++++-----
4 files changed, 263 insertions(+), 34 deletions(-)
--
Chuck Lever
chu ckl eve rat ora cle dot com
* [PATCH 1/7] SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET
From: Chuck Lever @ 2008-06-30 22:38 UTC
To: trond.myklebust; +Cc: linux-nfs
The rpcbind versions 3 and 4 SET and UNSET procedures take the same
arguments as the GETADDR procedure, but the kernel's version 3 and 4
procedure tables encode them with the version 2 mapping encoder.
While definitely a bug, this hasn't been a problem so far since the
kernel hasn't used version 3 or 4 SET and UNSET. But this will change
in just a moment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/rpcb_clnt.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 2b3a013..756e4f0 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -425,6 +425,10 @@ static void rpcb_getport_done(struct rpc_task *child, void *data)
rpcb_wake_rpcbind_waiters(xprt, status);
}
+/*
+ * XDR functions for rpcbind
+ */
+
static int rpcb_encode_mapping(struct rpc_rqst *req, __be32 *p,
struct rpcbind_args *rpcb)
{
@@ -580,14 +584,14 @@ static struct rpc_procinfo rpcb_procedures2[] = {
};
static struct rpc_procinfo rpcb_procedures3[] = {
- PROC(SET, mapping, set),
- PROC(UNSET, mapping, set),
+ PROC(SET, getaddr, set),
+ PROC(UNSET, getaddr, set),
PROC(GETADDR, getaddr, getaddr),
};
static struct rpc_procinfo rpcb_procedures4[] = {
- PROC(SET, mapping, set),
- PROC(UNSET, mapping, set),
+ PROC(SET, getaddr, set),
+ PROC(UNSET, getaddr, set),
PROC(GETADDR, getaddr, getaddr),
PROC(GETVERSADDR, getaddr, getaddr),
};
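For readers following along, here is a minimal userspace sketch of why the encoder choice matters. It assumes the RFC 1833 wire layouts: a version 2 mapping is four 32-bit words, while a version 3/4 rpcb argument is two words followed by three XDR strings. The struct and function names are hypothetical and are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical userspace model; this is not the kernel's rpcbind_args. */
struct rpcb_args_model {
	uint32_t prog;
	uint32_t vers;
	uint32_t prot;		/* version 2 only */
	uint16_t port;		/* version 2 only */
	const char *netid;	/* versions 3 and 4 */
	const char *addr;	/* versions 3 and 4: universal address */
	const char *owner;	/* versions 3 and 4 */
};

/* Append one big-endian 32-bit word; returns the next write position. */
static unsigned char *put_u32(unsigned char *p, uint32_t v)
{
	uint32_t be = htonl(v);

	memcpy(p, &be, sizeof(be));
	return p + sizeof(be);
}

/* XDR string: length word, the bytes, zero padding to a 4-byte boundary. */
static unsigned char *put_string(unsigned char *p, const char *s)
{
	size_t len = strlen(s);
	size_t padded = (len + 3) & ~(size_t)3;

	p = put_u32(p, (uint32_t)len);
	memset(p, 0, padded);
	memcpy(p, s, len);
	return p + padded;
}

/* Version 2 "mapping" arguments. Returns the number of bytes written. */
size_t encode_mapping(unsigned char *buf, const struct rpcb_args_model *a)
{
	unsigned char *p = buf;

	p = put_u32(p, a->prog);
	p = put_u32(p, a->vers);
	p = put_u32(p, a->prot);
	p = put_u32(p, a->port);
	return (size_t)(p - buf);
}

/* Version 3/4 "rpcb" arguments. The caller supplies a large enough buffer. */
size_t encode_rpcb(unsigned char *buf, const struct rpcb_args_model *a)
{
	unsigned char *p = buf;

	p = put_u32(p, a->prog);
	p = put_u32(p, a->vers);
	p = put_string(p, a->netid);
	p = put_string(p, a->addr);
	p = put_string(p, a->owner);
	return (size_t)(p - buf);
}
```

The two encodings share only their first two words, so a daemon decoding a version 3 or 4 SET would misread a mapping-encoded request.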
* [PATCH 2/7] SUNRPC: Introduce a specific rpcb_create for contacting localhost
From: Chuck Lever @ 2008-06-30 22:38 UTC
To: trond.myklebust; +Cc: linux-nfs
Add rpcb_create_local() for use by rpcb_register() and upcoming IPv6
registration functions.
Ensure any errors encountered by rpcb_create_local() are properly
reported.
We can also use a statically allocated constant loopback socket address
instead of one allocated on the stack and initialized every time the
function is called.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/rpcb_clnt.c | 42 +++++++++++++++++++++++++++++++-----------
1 files changed, 31 insertions(+), 11 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 756e4f0..7f48031 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -112,6 +112,29 @@ static void rpcb_wake_rpcbind_waiters(struct rpc_xprt *xprt, int status)
rpc_wake_up_status(&xprt->binding, status);
}
+static const struct sockaddr_in rpcb_inaddr_loopback = {
+ .sin_family = AF_INET,
+ .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+ .sin_port = htons(RPCBIND_PORT),
+};
+
+static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr,
+ size_t addrlen, u32 version)
+{
+ struct rpc_create_args args = {
+ .protocol = XPRT_TRANSPORT_UDP,
+ .address = addr,
+ .addrsize = addrlen,
+ .servername = "localhost",
+ .program = &rpcb_program,
+ .version = version,
+ .authflavor = RPC_AUTH_UNIX,
+ .flags = RPC_CLNT_CREATE_NOPING,
+ };
+
+ return rpc_create(&args);
+}
+
static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
size_t salen, int proto, u32 version,
int privileged)
@@ -157,10 +180,6 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
*/
int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
{
- struct sockaddr_in sin = {
- .sin_family = AF_INET,
- .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
- };
struct rpcbind_args map = {
.r_prog = prog,
.r_vers = vers,
@@ -180,14 +199,15 @@ int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
"rpcbind\n", (port ? "" : "un"),
prog, vers, prot, port);
- rpcb_clnt = rpcb_create("localhost", (struct sockaddr *) &sin,
- sizeof(sin), XPRT_TRANSPORT_UDP, RPCBVERS_2, 1);
- if (IS_ERR(rpcb_clnt))
- return PTR_ERR(rpcb_clnt);
+ rpcb_clnt = rpcb_create_local((struct sockaddr *)&rpcb_inaddr_loopback,
+ sizeof(rpcb_inaddr_loopback),
+ RPCBVERS_2);
+ if (!IS_ERR(rpcb_clnt)) {
+ error = rpc_call_sync(rpcb_clnt, &msg, 0);
+ rpc_shutdown_client(rpcb_clnt);
+ } else
+ error = PTR_ERR(rpcb_clnt);
- error = rpc_call_sync(rpcb_clnt, &msg, 0);
-
- rpc_shutdown_client(rpcb_clnt);
if (error < 0)
printk(KERN_WARNING "RPC: failed to contact local rpcbind "
"server (errno %d).\n", -error);
* [PATCH 3/7] SUNRPC: None of rpcb_create's callers wants a privileged source port
From: Chuck Lever @ 2008-06-30 22:38 UTC
To: trond.myklebust; +Cc: linux-nfs
Clean up: Callers that required a privileged source port now use
rpcb_create_local(), so we can remove the @privileged argument from
rpcb_create().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/rpcb_clnt.c | 12 +++++-------
1 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 7f48031..f106740 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -136,8 +136,7 @@ static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr,
}
static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
- size_t salen, int proto, u32 version,
- int privileged)
+ size_t salen, int proto, u32 version)
{
struct rpc_create_args args = {
.protocol = proto,
@@ -147,7 +146,8 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
.program = &rpcb_program,
.version = version,
.authflavor = RPC_AUTH_UNIX,
- .flags = RPC_CLNT_CREATE_NOPING,
+ .flags = (RPC_CLNT_CREATE_NOPING |
+ RPC_CLNT_CREATE_NONPRIVPORT),
};
switch (srvaddr->sa_family) {
@@ -161,8 +161,6 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
return NULL;
}
- if (!privileged)
- args.flags |= RPC_CLNT_CREATE_NONPRIVPORT;
return rpc_create(&args);
}
@@ -251,7 +249,7 @@ int rpcb_getport_sync(struct sockaddr_in *sin, u32 prog, u32 vers, int prot)
__func__, NIPQUAD(sin->sin_addr.s_addr), prog, vers, prot);
rpcb_clnt = rpcb_create(NULL, (struct sockaddr *)sin,
- sizeof(*sin), prot, RPCBVERS_2, 0);
+ sizeof(*sin), prot, RPCBVERS_2);
if (IS_ERR(rpcb_clnt))
return PTR_ERR(rpcb_clnt);
@@ -361,7 +359,7 @@ void rpcb_getport_async(struct rpc_task *task)
task->tk_pid, __func__, bind_version);
rpcb_clnt = rpcb_create(clnt->cl_server, sap, salen, xprt->prot,
- bind_version, 0);
+ bind_version);
if (IS_ERR(rpcb_clnt)) {
status = PTR_ERR(rpcb_clnt);
dprintk("RPC: %5u %s: rpcb_create failed, error %ld\n",
* [PATCH 4/7] SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
From: Chuck Lever @ 2008-06-30 22:39 UTC
To: trond.myklebust; +Cc: linux-nfs
rpcbind version 4 registration will reuse part of rpcb_register, so just
split it out into a separate function now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/rpcb_clnt.c | 48 ++++++++++++++++++++++++++++++------------------
1 files changed, 30 insertions(+), 18 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index f106740..f242c2d 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -164,6 +164,30 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
return rpc_create(&args);
}
+static int rpcb_register_call(struct sockaddr *addr, size_t addrlen,
+ u32 version, struct rpc_message *msg,
+ int *result)
+{
+ struct rpc_clnt *rpcb_clnt;
+ int error = 0;
+
+ *result = 0;
+
+ rpcb_clnt = rpcb_create_local(addr, addrlen, version);
+ if (!IS_ERR(rpcb_clnt)) {
+ error = rpc_call_sync(rpcb_clnt, msg, 0);
+ rpc_shutdown_client(rpcb_clnt);
+ } else
+ error = PTR_ERR(rpcb_clnt);
+
+ if (error < 0)
+ printk(KERN_WARNING "RPC: failed to contact local rpcbind "
+ "server (errno %d).\n", -error);
+ dprintk("RPC: registration status %d/%d\n", error, *result);
+
+ return error;
+}
+
/**
* rpcb_register - set or unset a port registration with the local rpcbind svc
* @prog: RPC program number to bind
@@ -185,33 +209,21 @@ int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
.r_port = port,
};
struct rpc_message msg = {
- .rpc_proc = &rpcb_procedures2[port ?
- RPCBPROC_SET : RPCBPROC_UNSET],
.rpc_argp = &map,
.rpc_resp = okay,
};
- struct rpc_clnt *rpcb_clnt;
- int error = 0;
dprintk("RPC: %sregistering (%u, %u, %d, %u) with local "
"rpcbind\n", (port ? "" : "un"),
prog, vers, prot, port);
- rpcb_clnt = rpcb_create_local((struct sockaddr *)&rpcb_inaddr_loopback,
- sizeof(rpcb_inaddr_loopback),
- RPCBVERS_2);
- if (!IS_ERR(rpcb_clnt)) {
- error = rpc_call_sync(rpcb_clnt, &msg, 0);
- rpc_shutdown_client(rpcb_clnt);
- } else
- error = PTR_ERR(rpcb_clnt);
+ msg.rpc_proc = &rpcb_procedures2[RPCBPROC_UNSET];
+ if (port)
+ msg.rpc_proc = &rpcb_procedures2[RPCBPROC_SET];
- if (error < 0)
- printk(KERN_WARNING "RPC: failed to contact local rpcbind "
- "server (errno %d).\n", -error);
- dprintk("RPC: registration status %d/%d\n", error, *okay);
-
- return error;
+ return rpcb_register_call((struct sockaddr *)&rpcb_inaddr_loopback,
+ sizeof(rpcb_inaddr_loopback),
+ RPCBVERS_2, &msg, okay);
}
/**
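The error path in rpcb_register_call() relies on the kernel's error-pointer idiom (IS_ERR/PTR_ERR). A minimal userspace sketch of that idiom follows, assuming the usual convention that the top 4095 pointer values encode negative errno codes. Every model_* name is hypothetical, and the "RPC call" is reduced to a canned status.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value, and decode it again. */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Stand-in for an RPC client; it only carries a canned call status. */
struct model_client {
	int call_status;
};

/* Stand-in for rpcb_create_local(): returns a client or an error pointer. */
struct model_client *model_create_local(int fail_errno, int call_status)
{
	struct model_client *clnt;

	if (fail_errno)
		return model_err_ptr(-fail_errno);
	clnt = malloc(sizeof(*clnt));
	if (!clnt)
		return model_err_ptr(-ENOMEM);
	clnt->call_status = call_status;
	return clnt;
}

/* Same shape as the patch: clear *result, call, then report any error. */
int model_register_call(int fail_errno, int call_status, int *result)
{
	struct model_client *clnt;
	int error;

	*result = 0;

	clnt = model_create_local(fail_errno, call_status);
	if (!model_is_err(clnt)) {
		error = clnt->call_status;
		if (error == 0)
			*result = 1;	/* pretend the daemon answered "true" */
		free(clnt);
	} else
		error = (int)model_ptr_err(clnt);

	return error;
}
```

The point of the shape is that a failed client creation and a failed call both reach the same error report, with *result already cleared.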
* [PATCH 5/7] SUNRPC: introduce new rpc_task flag that fails requests on xprt disconnect
From: Chuck Lever @ 2008-06-30 22:39 UTC
To: trond.myklebust; +Cc: linux-nfs
Introduce a new RPC client capability that allows RPC consumers to specify
that if a connection cannot be established to the remote RPC service, a
submitted request should fail immediately.
Useful for in-kernel RPC applications that want to connect and do just one
or a handful of idempotent requests, but don't want any long waits if the
remote service is not available.
No, Tom, I'm not going to call it "squishy." :-)
Note that because the UDP transport uses an unconnected socket, this new
flag is a no-op for UDP.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/sched.h | 2 ++
net/sunrpc/clnt.c | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 64981a2..dcbdb60 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -130,12 +130,14 @@ struct rpc_task_setup {
#define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */
#define RPC_TASK_KILLED 0x0100 /* task was killed */
#define RPC_TASK_SOFT 0x0200 /* Use soft timeouts */
+#define RPC_TASK_ONESHOT 0x0400 /* fail if can't connect */
#define RPC_IS_ASYNC(t) ((t)->tk_flags & RPC_TASK_ASYNC)
#define RPC_IS_SWAPPER(t) ((t)->tk_flags & RPC_TASK_SWAPPER)
#define RPC_DO_ROOTOVERRIDE(t) ((t)->tk_flags & RPC_TASK_ROOTCREDS)
#define RPC_ASSASSINATED(t) ((t)->tk_flags & RPC_TASK_KILLED)
#define RPC_IS_SOFT(t) ((t)->tk_flags & RPC_TASK_SOFT)
+#define RPC_IS_ONESHOT(t) ((t)->tk_flags & RPC_TASK_ONESHOT)
#define RPC_TASK_RUNNING 0
#define RPC_TASK_QUEUED 1
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 09631f6..171d53c 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1029,6 +1029,8 @@ call_connect_status(struct rpc_task *task)
switch (status) {
case -ENOTCONN:
+ if (RPC_IS_ONESHOT(task))
+ break;
case -EAGAIN:
task->tk_action = call_bind;
if (!RPC_IS_SOFT(task))
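The control flow in the hunk above can be modeled in userspace roughly as follows. This is a sketch under assumptions: the flag values are copied from the sched.h hunk, but the model_* names are hypothetical, and the hunk does not show what the kernel does for a non-negative status or after the switch. The model treats those as "proceed" and "fail" respectively.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_TASK_SOFT		0x0200	/* value of RPC_TASK_SOFT in the patch */
#define MODEL_TASK_ONESHOT	0x0400	/* value of RPC_TASK_ONESHOT in the patch */

enum model_action {
	MODEL_PROCEED,		/* connected; carry on with the request */
	MODEL_RETRY_BIND,	/* go back and try binding/connecting again */
	MODEL_FAIL,		/* give up and fail the request */
};

/* Decide what to do with a connect status, mirroring the patched switch. */
enum model_action model_connect_status(int status, unsigned int flags)
{
	if (status >= 0)
		return MODEL_PROCEED;	/* assumption: not shown in the hunk */

	switch (status) {
	case -ENOTCONN:
		if (flags & MODEL_TASK_ONESHOT)
			break;		/* ONESHOT: do not retry */
		/* fall through */
	case -EAGAIN:
		return MODEL_RETRY_BIND;
	}

	return MODEL_FAIL;		/* assumption: falling out means failure */
}
```

Note that in this model, as in the hunk, ONESHOT only short-circuits the -ENOTCONN case; -EAGAIN still retries.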
* [PATCH 6/7] SUNRPC: Quickly detect missing portmapper during RPC service registration
From: Chuck Lever @ 2008-06-30 22:39 UTC
To: trond.myklebust; +Cc: linux-nfs
Currently the in-kernel RPC server uses an unconnected UDP socket to
contact the local user-space portmapper daemon because UDP is less
overhead than TCP. However, attempting to register or unregister a
local RPC service with a portmapper that isn't listening for requests
results in a long timeout wait (35 seconds per request, by my
calculations).
Using TCP, the absence of a listener results in an immediate ECONNREFUSED.
Setting the ONESHOT flag causes the RPC client to fail RPC requests as soon
as this occurs.
Therefore, this patch changes rpcb_register() to use TCP to contact the
local portmapper. Starting up in-kernel RPC services should no longer
hang if there is no local portmapper listening for requests.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/rpcb_clnt.c | 22 ++++++++++++++++++++--
1 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index f242c2d..c33345a 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -67,6 +67,12 @@ enum {
#define RPCB_OWNER_STRING "rpcb"
#define RPCB_MAXOWNERLEN sizeof(RPCB_OWNER_STRING)
+/*
+ * Number of seconds before timing out a registration request
+ * to our local user space rpcbind daemon.
+ */
+#define RPCB_REGISTER_TO_SECS (10UL)
+
static void rpcb_getport_done(struct rpc_task *, void *);
static struct rpc_program rpcb_program;
@@ -118,14 +124,26 @@ static const struct sockaddr_in rpcb_inaddr_loopback = {
.sin_port = htons(RPCBIND_PORT),
};
+/*
+ * TCP is always used to contact the local rpcbind daemon because we
+ * get an immediate indication of whether there is a remote listener
+ * when a connection is attempted.
+ */
+static const struct rpc_timeout rpcb_local_tcp_timeout = {
+ .to_initval = RPCB_REGISTER_TO_SECS * HZ,
+ .to_maxval = RPCB_REGISTER_TO_SECS * HZ,
+ .to_retries = 0,
+};
+
static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr,
size_t addrlen, u32 version)
{
struct rpc_create_args args = {
- .protocol = XPRT_TRANSPORT_UDP,
+ .protocol = XPRT_TRANSPORT_TCP,
.address = addr,
.addrsize = addrlen,
.servername = "localhost",
+ .timeout = &rpcb_local_tcp_timeout,
.program = &rpcb_program,
.version = version,
.authflavor = RPC_AUTH_UNIX,
@@ -175,7 +193,7 @@ static int rpcb_register_call(struct sockaddr *addr, size_t addrlen,
rpcb_clnt = rpcb_create_local(addr, addrlen, version);
if (!IS_ERR(rpcb_clnt)) {
- error = rpc_call_sync(rpcb_clnt, msg, 0);
+ error = rpc_call_sync(rpcb_clnt, msg, RPC_TASK_ONESHOT);
rpc_shutdown_client(rpcb_clnt);
} else
error = PTR_ERR(rpcb_clnt);
* [PATCH 7/7] SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
From: Chuck Lever @ 2008-06-30 22:39 UTC
To: trond.myklebust; +Cc: linux-nfs
Introduce a new API to register RPC services on IPv6 interfaces to allow
the NFS server and lockd to advertise on IPv6 networks.
Unlike rpcb_register(), the new rpcb_v4_register() function uses rpcbind
protocol version 4 to contact the local rpcbind daemon. The version 4
SET/UNSET procedures allow services to register address families besides
AF_INET, register at specific network interfaces, and register transport
protocols besides UDP and TCP. All of this functionality is exposed via
the new rpcb_v4_register() kernel API.
A user-space rpcbind daemon implementation that supports version 4 of the
rpcbind protocol is required in order to make use of this new API.
Note that rpcbind version 3 is sufficient to support the new rpcbind
facilities listed above, but most extant implementations use version 4.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/clnt.h | 3 +
net/sunrpc/rpcb_clnt.c | 178 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 177 insertions(+), 4 deletions(-)
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 764fd4c..e5bfe01 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -125,6 +125,9 @@ void rpc_shutdown_client(struct rpc_clnt *);
void rpc_release_client(struct rpc_clnt *);
int rpcb_register(u32, u32, int, unsigned short, int *);
+int rpcb_v4_register(const u32 program, const u32 version,
+ const struct sockaddr *address,
+ const char *netid, int *result);
int rpcb_getport_sync(struct sockaddr_in *, u32, u32, int);
void rpcb_getport_async(struct rpc_task *);
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index c33345a..038dbb3 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -90,6 +90,7 @@ struct rpcbind_args {
static struct rpc_procinfo rpcb_procedures2[];
static struct rpc_procinfo rpcb_procedures3[];
+static struct rpc_procinfo rpcb_procedures4[];
struct rpcb_info {
u32 rpc_vers;
@@ -124,6 +125,12 @@ static const struct sockaddr_in rpcb_inaddr_loopback = {
.sin_port = htons(RPCBIND_PORT),
};
+static const struct sockaddr_in6 rpcb_in6addr_loopback = {
+ .sin6_family = AF_INET6,
+ .sin6_addr = IN6ADDR_LOOPBACK_INIT,
+ .sin6_port = htons(RPCBIND_PORT),
+};
+
/*
* TCP is always used to contact the local rpcbind daemon because we
* get an immediate indication of whether there is a remote listener
@@ -210,13 +217,38 @@ static int rpcb_register_call(struct sockaddr *addr, size_t addrlen,
* rpcb_register - set or unset a port registration with the local rpcbind svc
* @prog: RPC program number to bind
* @vers: RPC version number to bind
- * @prot: transport protocol to use to make this request
+ * @prot: transport protocol to register
* @port: port value to register
- * @okay: result code
+ * @okay: OUT: result code
+ *
+ * RPC services invoke this function to advertise their contact
+ * information via the system's rpcbind daemon. RPC services
+ * invoke this function once for each [program, version, transport]
+ * tuple they wish to advertise.
+ *
+ * Callers may also unregister RPC services that are no longer
+ * available by setting the passed-in port to zero. This removes
+ * all registered transports for [program, version] from the local
+ * rpcbind database.
+ *
+ * Returns zero if the registration request was dispatched
+ * successfully and a reply was received. The rpcbind daemon's
+ * boolean result code is stored in *okay.
*
- * port == 0 means unregister, port != 0 means register.
+ * Returns an errno value and sets *okay to zero if there was
+ * some problem that prevented the rpcbind request from being
+ * dispatched, or if the rpcbind daemon did not respond within
+ * the timeout.
*
- * This routine supports only rpcbind version 2.
+ * This function uses rpcbind protocol version 2 to contact the
+ * local rpcbind daemon.
+ *
+ * Registration works over both AF_INET and AF_INET6, and services
+ * registered via this function are advertised as available for any
+ * address. If the local rpcbind daemon is listening on AF_INET6,
+ * services registered via this function will be advertised on
+ * IN6ADDR_ANY (i.e., available for all AF_INET and AF_INET6
+ * addresses).
*/
int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
{
@@ -244,6 +276,144 @@ int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
RPCBVERS_2, &msg, okay);
}
+/*
+ * Fill in AF_INET family-specific arguments to register
+ */
+static int rpcb_register_netid4(struct sockaddr_in *address_to_register,
+ struct rpc_message *msg)
+{
+ struct rpcbind_args *map = msg->rpc_argp;
+ unsigned short port = ntohs(address_to_register->sin_port);
+ char buf[32];
+
+ /* Construct AF_INET universal address */
+ snprintf(buf, sizeof(buf),
+ NIPQUAD_FMT".%u.%u",
+ NIPQUAD(address_to_register->sin_addr.s_addr),
+ port >> 8, port & 0xff);
+ map->r_addr = buf;
+
+ dprintk("RPC: %sregistering [%u, %u, %s, '%s'] with "
+ "local rpcbind\n", (port ? "" : "un"),
+ map->r_prog, map->r_vers,
+ map->r_addr, map->r_netid);
+
+ msg->rpc_proc = &rpcb_procedures4[RPCBPROC_UNSET];
+ if (port)
+ msg->rpc_proc = &rpcb_procedures4[RPCBPROC_SET];
+
+ return rpcb_register_call((struct sockaddr *)&rpcb_inaddr_loopback,
+ sizeof(rpcb_inaddr_loopback),
+ RPCBVERS_4, msg, msg->rpc_resp);
+}
+
+/*
+ * Fill in AF_INET6 family-specific arguments to register
+ */
+static int rpcb_register_netid6(struct sockaddr_in6 *address_to_register,
+ struct rpc_message *msg)
+{
+ struct rpcbind_args *map = msg->rpc_argp;
+ unsigned short port = ntohs(address_to_register->sin6_port);
+ char buf[64];
+
+ /* Construct AF_INET6 universal address */
+ snprintf(buf, sizeof(buf),
+ NIP6_FMT".%u.%u",
+ NIP6(address_to_register->sin6_addr),
+ port >> 8, port & 0xff);
+ map->r_addr = buf;
+
+ dprintk("RPC: %sregistering [%u, %u, %s, '%s'] with "
+ "local rpcbind\n", (port ? "" : "un"),
+ map->r_prog, map->r_vers,
+ map->r_addr, map->r_netid);
+
+ msg->rpc_proc = &rpcb_procedures4[RPCBPROC_UNSET];
+ if (port)
+ msg->rpc_proc = &rpcb_procedures4[RPCBPROC_SET];
+
+ return rpcb_register_call((struct sockaddr *)&rpcb_in6addr_loopback,
+ sizeof(rpcb_in6addr_loopback),
+ RPCBVERS_4, msg, msg->rpc_resp);
+}
+
+/**
+ * rpcb_v4_register - set or unset a port registration with the local rpcbind
+ * @program: RPC program number of service to (un)register
+ * @version: RPC version number of service to (un)register
+ * @address: address family, IP address, and port to (un)register
+ * @netid: netid of transport protocol to (un)register
+ * @result: result code from rpcbind RPC call
+ *
+ * RPC services invoke this function to advertise their contact
+ * information via the system's rpcbind daemon. RPC services
+ * invoke this function once for each [program, version, address,
+ * netid] tuple they wish to advertise.
+ *
+ * Callers may also unregister RPC services that are no longer
+ * available by setting the port number in the passed-in address
+ * to zero. Callers pass a netid of "" to unregister all
+ * transport netids associated with [program, version, address].
+ *
+ * Returns zero if the registration request was dispatched
+ * successfully and a reply was received. The rpcbind daemon's
+ * result code is stored in *result.
+ *
+ * Returns an errno value and sets *result to zero if there was
+ * some problem that prevented the rpcbind request from being
+ * dispatched, or if the rpcbind daemon did not respond within
+ * the timeout.
+ *
+ * This function uses rpcbind protocol version 4 to contact the
+ * local rpcbind daemon. The local rpcbind daemon must support
+ * version 4 of the rpcbind protocol in order for these functions
+ * to register a service successfully.
+ *
+ * Supported netids include "udp" and "tcp" for UDP and TCP over
+ * IPv4, and "udp6" and "tcp6" for UDP and TCP over IPv6,
+ * respectively.
+ *
+ * The contents of @address determine the address family and the
+ * port to be registered. The usual practice is to pass INADDR_ANY
+ * as the raw address, but specifying a non-zero address is also
+ * supported by this API if the caller wishes to advertise an RPC
+ * service on a specific network interface.
+ *
+ * Note that passing in INADDR_ANY does not create the same service
+ * registration as IN6ADDR_ANY. The former advertises an RPC
+ * service on any IPv4 address, but not on IPv6. The latter
+ * advertises the service on all IPv4 and IPv6 addresses.
+ */
+int rpcb_v4_register(const u32 program, const u32 version,
+ const struct sockaddr *address, const char *netid,
+ int *result)
+{
+ struct rpcbind_args map = {
+ .r_prog = program,
+ .r_vers = version,
+ .r_netid = netid,
+ .r_owner = RPCB_OWNER_STRING,
+ };
+ struct rpc_message msg = {
+ .rpc_argp = &map,
+ .rpc_resp = result,
+ };
+
+ *result = 0;
+
+ switch (address->sa_family) {
+ case AF_INET:
+ return rpcb_register_netid4((struct sockaddr_in *)address,
+ &msg);
+ case AF_INET6:
+ return rpcb_register_netid6((struct sockaddr_in6 *)address,
+ &msg);
+ }
+
+ return -EAFNOSUPPORT;
+}
+
/**
* rpcb_getport_sync - obtain the port for an RPC service on a given host
* @sin: address of remote peer
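The universal address strings built by rpcb_register_netid4() and rpcb_register_netid6() can be reproduced in userspace with a sketch like the one below. It dispatches on the address family the same way rpcb_v4_register() does. The function name is hypothetical, and the IPv6 branch assumes that NIP6_FMT prints eight zero-padded 16-bit hex groups separated by colons.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Build an rpcbind universal address: the printable host address
 * followed by the high and low bytes of the port, dot-separated.
 * Returns 0 on success or -EAFNOSUPPORT for other address families.
 */
int model_universal_addr(const struct sockaddr *sa, char *buf, size_t len)
{
	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
		const unsigned char *a =
			(const unsigned char *)&sin->sin_addr.s_addr;
		unsigned int port = ntohs(sin->sin_port);

		snprintf(buf, len, "%u.%u.%u.%u.%u.%u",
			 a[0], a[1], a[2], a[3], port >> 8, port & 0xff);
		return 0;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 =
			(const struct sockaddr_in6 *)sa;
		const unsigned char *a = sin6->sin6_addr.s6_addr;
		unsigned int port = ntohs(sin6->sin6_port);

		/* Assumed NIP6_FMT layout: eight %04x groups. */
		snprintf(buf, len,
			 "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"
			 "%02x%02x:%02x%02x:%02x%02x:%02x%02x.%u.%u",
			 a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7],
			 a[8], a[9], a[10], a[11], a[12], a[13], a[14], a[15],
			 port >> 8, port & 0xff);
		return 0;
	}
	}

	return -EAFNOSUPPORT;
}
```

For example, 127.0.0.1 with port 2049 becomes "127.0.0.1.8.1", since 2049 is 8 * 256 + 1. In the patch, a port of zero ends the string in ".0.0" and selects UNSET instead of SET.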
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
From: J. Bruce Fields @ 2008-07-03 20:45 UTC
To: Chuck Lever; +Cc: trond.myklebust, linux-nfs
On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
> Hi Trond-
>
> Seven patches that implement kernel RPC service registration via rpcbind v4.
> This allows the kernel to advertise IPv4-only services on hosts with IPv6
> addresses, for example.
This is Trond's bailiwick, but I read through all 7 quickly and they
looked good to me....
--b.
>
> ---
>
> Chuck Lever (7):
> SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
> SUNRPC: Quickly detect missing portmapper during RPC service registration
> SUNRPC: introduce new rpc_task flag that fails requests on xprt disconnect
> SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
> SUNRPC: None of rpcb_create's callers wants a privileged source port
> SUNRPC: Introduce a specific rpcb_create for contacting localhost
> SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET
>
>
> include/linux/sunrpc/clnt.h | 3
> include/linux/sunrpc/sched.h | 2
> net/sunrpc/clnt.c | 2
> net/sunrpc/rpcb_clnt.c | 290 +++++++++++++++++++++++++++++++++++++-----
> 4 files changed, 263 insertions(+), 34 deletions(-)
>
> --
> Chuck Lever
> chu ckl eve rat ora cle dot com
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
From: Trond Myklebust @ 2008-07-07 18:20 UTC
To: J. Bruce Fields; +Cc: Chuck Lever, linux-nfs
On Thu, 2008-07-03 at 16:45 -0400, J. Bruce Fields wrote:
> On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
> > Hi Trond-
> >
> > Seven patches that implement kernel RPC service registration via rpcbind v4.
> > This allows the kernel to advertise IPv4-only services on hosts with IPv6
> > addresses, for example.
>
> This is Trond's bailiwick, but I read through all 7 quickly and they
> looked good to me....
They look more or less OK to me too; however, I'm a bit unhappy about the
RPC_TASK_ONESHOT name: it isn't at all descriptive.
I also have questions about the change to a TCP socket here. Why not
just implement connected UDP sockets?
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
From: Chuck Lever @ 2008-07-07 18:43 UTC
To: Trond Myklebust; +Cc: J. Bruce Fields, linux-nfs
On Jul 7, 2008, at 2:20 PM, Trond Myklebust wrote:
> On Thu, 2008-07-03 at 16:45 -0400, J. Bruce Fields wrote:
>> On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
>>> Hi Trond-
>>>
>>> Seven patches that implement kernel RPC service registration via
>>> rpcbind v4.
>>> This allows the kernel to advertise IPv4-only services on hosts
>>> with IPv6
>>> addresses, for example.
>>
>> This is Trond's bailiwick, but I read through all 7 quickly and they
>> looked good to me....
>
> They look more or less OK to me too, however I'm a bit unhappy about
> the
> RPC_TASK_ONESHOT name: it isn't at all descriptive.
Open to suggestions. I thought RPC_TASK_FAIL_WITHOUT_CONNECTION was a
bit wordy ;-)
> I also have questions about the change to a TCP socket here. Why not
> just implement connected UDP sockets?
Changing rpcb_register() to use a TCP socket is less work overall, and
we get a positive handshake between the kernel and user space when
the TCP connection is opened.
Other services might also want to use TCP+ONESHOT for several short
requests over a real network with actual packet loss, but they might
find CUDP+ONESHOT less practical/reliable (or even forbidden in the
case of NFSv4). So we would end up with something of a one-off
implementation for rpcb_register.
The downside of using TCP in this case is that it's more overhead: 8
packets instead of two for registration in the common case, and it
leaves a single privileged port in TIME_WAIT for each registered
service. I don't think this matters much as registration happens
quite infrequently.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
2008-07-07 18:43 ` Chuck Lever
@ 2008-07-07 18:51 ` Trond Myklebust
2008-07-07 19:44 ` Chuck Lever
0 siblings, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2008-07-07 18:51 UTC (permalink / raw)
To: Chuck Lever; +Cc: J. Bruce Fields, linux-nfs
On Mon, 2008-07-07 at 14:43 -0400, Chuck Lever wrote:
> On Jul 7, 2008, at 2:20 PM, Trond Myklebust wrote:
> > On Thu, 2008-07-03 at 16:45 -0400, J. Bruce Fields wrote:
> >> On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
> >>> Hi Trond-
> >>>
> >>> Seven patches that implement kernel RPC service registration via
> >>> rpcbind v4.
> >>> This allows the kernel to advertise IPv4-only services on hosts
> >>> with IPv6
> >>> addresses, for example.
> >>
> >> This is Trond's bailiwick, but I read through all 7 quickly and they
> >> looked good to me....
> >
> > They look more or less OK to me too, however I'm a bit unhappy about
> > the
> > RPC_TASK_ONESHOT name: it isn't at all descriptive.
>
> Open to suggestions. I thought RPC_TASK_FAIL_WITHOUT_CONNECTION was a
> bit wordy ;-)
RPC_TASK_CONNECT_ONCE ?
> > I also have questions about the change to a TCP socket here. Why not
> > just implement connected UDP sockets?
>
> Changing rpcb_register() to use a TCP socket is less work overall, and
> we get a positive handshake between the kernel and user space when
> the TCP connection is opened.
>
> Other services might also want to use TCP+ONESHOT for several short
> requests over a real network with actual packet loss, but they might
> find CUDP+ONESHOT less practical/reliable (or even forbidden in the
> case of NFSv4). So we would end up with something of a one-off
> implementation for rpcb_register.
I don't see what that has to do with anything: the connection failed
codepath in call_connect_status() should be the same in both the TCP and
the UDP case.
> The downside of using TCP in this case is that it's more overhead: 8
> packets instead of two for registration in the common case, and it
> leaves a single privileged port in TIME_WAIT for each registered
> service. I don't think this matters much as registration happens
> quite infrequently.
The problem is that registration usually happens at boot time, which is
also when most of the NFS 'mount' requests will be eating privileged
ports.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
2008-07-07 18:51 ` Trond Myklebust
@ 2008-07-07 19:44 ` Chuck Lever
[not found] ` <76bd70e30807071244v4db1c366uc7599d2dd806bf1b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2008-07-07 19:44 UTC (permalink / raw)
To: Trond Myklebust; +Cc: J. Bruce Fields, linux-nfs
On Mon, Jul 7, 2008 at 2:51 PM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
> On Mon, 2008-07-07 at 14:43 -0400, Chuck Lever wrote:
>> On Jul 7, 2008, at 2:20 PM, Trond Myklebust wrote:
>> > On Thu, 2008-07-03 at 16:45 -0400, J. Bruce Fields wrote:
>> >> On Mon, Jun 30, 2008 at 06:38:35PM -0400, Chuck Lever wrote:
>> >>> Hi Trond-
>> >>>
>> >>> Seven patches that implement kernel RPC service registration via
>> >>> rpcbind v4.
>> >>> This allows the kernel to advertise IPv4-only services on hosts
>> >>> with IPv6
>> >>> addresses, for example.
>> >>
>> >> This is Trond's bailiwick, but I read through all 7 quickly and they
>> >> looked good to me....
>> >
>> > They look more or less OK to me too, however I'm a bit unhappy about
>> > the
>> > RPC_TASK_ONESHOT name: it isn't at all descriptive.
>>
>> Open to suggestions. I thought RPC_TASK_FAIL_WITHOUT_CONNECTION was a
>> bit wordy ;-)
>
> RPC_TASK_CONNECT_ONCE ?
That's not the semantic I was really going for. FAIL_ON_CONNRESET is
probably closer.
>> > I also have questions about the change to a TCP socket here. Why not
>> > just implement connected UDP sockets?
>>
>> Changing rpcb_register() to use a TCP socket is less work overall, and
>> we get a positive handshake between the kernel and user space when
>> the TCP connection is opened.
>>
>> Other services might also want to use TCP+ONESHOT for several short
>> requests over a real network with actual packet loss, but they might
>> find CUDP+ONESHOT less practical/reliable (or even forbidden in the
>> case of NFSv4). So we would end up with something of a one-off
>> implementation for rpcb_register.
>
> I don't see what that has to do with anything: the connection failed
> codepath in call_connect_status() should be the same in both the TCP and
> the UDP case.
If you would like connected UDP, I won't object to you implementing
it. However, I never tested whether a connected UDP socket will give
the desired semantics without extra code in the UDP transport (for
example, an ->sk_error callback). I don't think it's worth the hassle
if we have to add code to UDP that only this tiny use case would need.
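For reference, the semantics in question can be observed from user space.
This is only a sketch, not kernel code, and both helper names below are
made up for illustration. It assumes a Linux host with a working loopback
interface and no firewall rule dropping the ICMP port-unreachable reply.
On a connect()ed datagram socket, the error from a missing listener is
reported back on a later socket call as ECONNREFUSED instead of being
silently lost.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch series: find a loopback UDP
 * port that currently has no listener by binding a scratch socket to port 0,
 * reading back the port the kernel chose, and closing the socket again.
 */
static int pick_unused_udp_port(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int port = -1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&sin, &len) == 0)
		port = ntohs(sin.sin_port);
	close(fd);
	return port;
}

/*
 * Hypothetical helper: connect() a UDP socket to 127.0.0.1:port, send one
 * byte, then wait up to a second for a reply.  Returns 0 if a datagram
 * came back, otherwise the errno reported by the failing call.
 */
static int connected_udp_probe(int port)
{
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	struct sockaddr_in sin;
	char byte = 0;
	int err = 0;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return errno;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons((unsigned short)port);

	/* The first call that fails short-circuits the chain and sets errno. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
	    connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    send(fd, &byte, 1, 0) < 0 ||
	    recv(fd, &byte, 1, 0) < 0)
		err = errno;
	close(fd);
	return err;
}
```

Calling connected_udp_probe(pick_unused_udp_port()) should come back with
ECONNREFUSED under the assumptions above. Whether the in-kernel UDP
transport sees the same error without an ->sk_error hook is a separate
question that this sketch does not answer.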
>> The downside of using TCP in this case is that it's more overhead: 8
>> packets instead of two for registration in the common case, and it
>> leaves a single privileged port in TIME_WAIT for each registered
>> service. I don't think this matters much as registration happens
>> quite infrequently.
>
> The problem is that registration usually happens at boot time, which is
> also when most of the NFS 'mount' requests will be eating privileged
> ports.
You're talking about the difference between supporting say 1358 mounts
at boot time versus 1357 mounts at boot time.
In most cases, a client with hundreds of mounts will use up exactly
one extra privileged TCP port to register NLM during the first
lockd_up() call. If these are all NFSv4 mounts, it will use exactly
zero extra ports, since the NFSv4 callback service is not even
registered.
Considering that _each_ mount operation can take between 2 and 5
privileged ports, while registering NFSD and NLM both would take
exactly two ports at boot time, I think that registration is the wrong
place to optimize.
--
Chuck Lever
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
[not found] ` <76bd70e30807071244v4db1c366uc7599d2dd806bf1b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-07-07 20:51 ` Trond Myklebust
2008-07-07 21:19 ` J. Bruce Fields
2008-07-10 17:27 ` Chuck Lever
0 siblings, 2 replies; 28+ messages in thread
From: Trond Myklebust @ 2008-07-07 20:51 UTC (permalink / raw)
To: chucklever; +Cc: J. Bruce Fields, linux-nfs
On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
> If you would like connected UDP, I won't object to you implementing
> it. However, I never tested whether a connected UDP socket will give
> the desired semantics without extra code in the UDP transport (for
> example, an ->sk_error callback). I don't think it's worth the hassle
> if we have to add code to UDP that only this tiny use case would need.
>
OK. I'll set these patches aside until I have time to look into adding
connected UDP support.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
2008-07-07 20:51 ` Trond Myklebust
@ 2008-07-07 21:19 ` J. Bruce Fields
2008-07-07 22:13 ` Trond Myklebust
2008-07-10 17:27 ` Chuck Lever
1 sibling, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-07-07 21:19 UTC (permalink / raw)
To: Trond Myklebust; +Cc: chucklever, linux-nfs
On Mon, Jul 07, 2008 at 04:51:17PM -0400, Trond Myklebust wrote:
> On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
>
> > If you would like connected UDP, I won't object to you implementing
> > it. However, I never tested whether a connected UDP socket will give
> > the desired semantics without extra code in the UDP transport (for
> > example, an ->sk_error callback). I don't think it's worth the hassle
> > if we have to add code to UDP that only this tiny use case would need.
> >
>
> OK. I'll set these patches aside until I have time to look into adding
> connected UDP support.
I'm curious--why weren't you convinced by this argument?:
"You're talking about the difference between supporting say 1358
mounts at boot time versus 1357 mounts at boot time.
"In most cases, a client with hundreds of mounts will use up
exactly one extra privileged TCP port to register NLM during the
first lockd_up() call. If these are all NFSv4 mounts, it will
use exactly zero extra ports, since the NFSv4 callback service
is not even registered.
"Considering that _each_ mount operation can take between 2 and
5 privileged ports, while registering NFSD and NLM both would
take exactly two ports at boot time, I think that registration
is the wrong place to optimize."
I'll admit to not following this carefully, but that seemed reasonable
to me.
--b.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
2008-07-07 21:19 ` J. Bruce Fields
@ 2008-07-07 22:13 ` Trond Myklebust
2008-07-07 22:56 ` J. Bruce Fields
0 siblings, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2008-07-07 22:13 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: chucklever, linux-nfs
On Mon, 2008-07-07 at 17:19 -0400, J. Bruce Fields wrote:
> On Mon, Jul 07, 2008 at 04:51:17PM -0400, Trond Myklebust wrote:
> > On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
> >
> > > If you would like connected UDP, I won't object to you implementing
> > > it. However, I never tested whether a connected UDP socket will give
> > > the desired semantics without extra code in the UDP transport (for
> > > example, an ->sk_error callback). I don't think it's worth the hassle
> > > if we have to add code to UDP that only this tiny use case would need.
> > >
> >
> > OK. I'll set these patches aside until I have time to look into adding
> > connected UDP support.
>
> I'm curious--why weren't you convinced by this argument?:
>
> "You're talking about the difference between supporting say 1358
> mounts at boot time versus 1357 mounts at boot time.
Where did you get those figures from? Firstly, the total number of
privileged ports is much smaller. Secondly, the number of _free_
privileged ports can vary wildly depending on the user's setup.
> "In most cases, a client with hundreds of mounts will use up
> exactly one extra privileged TCP port to register NLM during the
> first lockd_up() call. If these are all NFSv4 mounts, it will
> use exactly zero extra ports, since the NFSv4 callback service
> is not even registered.
>
> "Considering that _each_ mount operation can take between 2 and
> 5 privileged ports, while registering NFSD and NLM both would
> take exactly two ports at boot time, I think that registration
> is the wrong place to optimize."
>
> I'll admit to not following this carefully, but that seemed reasonable
> to me.
Like it or not, this _is_ a user interface change: if someone has set up
their iptables firewall or is using the tcp_wrapper library to limit
access to the portmapper (a common practice), then this change is
forcing them to change that.
It is not as if UDP connections are prohibitively difficult to implement
either. The entire framework is already there for the TCP case, and so
the following patch (as yet untested) should be close...
--------------------------------------------------------------------------
commit 161c60bc13899b0def4251cffa492ae6faa00b93
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Mon Jul 7 17:43:12 2008 -0400
SUNRPC: Add connected sockets for UDP
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 4486c59..2e49f5a 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -580,8 +580,8 @@ static int xs_udp_send_request(struct rpc_task *task)
req->rq_svec->iov_len);
status = xs_sendpages(transport->sock,
- xs_addr(xprt),
- xprt->addrlen, xdr,
+ NULL,
+ 0, xdr,
req->rq_bytes_sent);
dprintk("RPC: xs_udp_send_request(%u) = %d\n",
@@ -1445,13 +1445,13 @@ static inline void xs_reclassify_socket6(struct socket *sock)
}
#endif
-static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
+static int xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
{
struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
+ struct sock *sk = sock->sk;
+ int ret;
if (!transport->inet) {
- struct sock *sk = sock->sk;
-
write_lock_bh(&sk->sk_callback_lock);
sk->sk_user_data = xprt;
@@ -1463,8 +1463,6 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
sk->sk_no_check = UDP_CSUM_NORCV;
sk->sk_allocation = GFP_ATOMIC;
- xprt_set_connected(xprt);
-
/* Reset to new socket */
transport->sock = sock;
transport->inet = sk;
@@ -1472,6 +1470,39 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
write_unlock_bh(&sk->sk_callback_lock);
}
xs_udp_do_set_buffer_size(xprt);
+ ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
+
+ if (ret == 0) {
+ spin_lock_bh(&xprt->transport_lock);
+ if (sk->sk_state == TCP_ESTABLISHED)
+ xprt_set_connected(xprt);
+ spin_unlock_bh(&xprt->transport_lock);
+ }
+ return ret;
+}
+
+/*
+ * We need to preserve the port number so the reply cache on the server can
+ * find our cached RPC replies when we get around to reconnecting.
+ */
+static void xs_sock_reuse_connection(struct rpc_xprt *xprt)
+{
+ int result;
+ struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
+ struct sockaddr any;
+
+ dprintk("RPC: disconnecting xprt %p to reuse port\n", xprt);
+
+ /*
+ * Disconnect the transport socket by doing a connect operation
+ * with AF_UNSPEC. This should return immediately...
+ */
+ memset(&any, 0, sizeof(any));
+ any.sa_family = AF_UNSPEC;
+ result = kernel_connect(transport->sock, &any, sizeof(any), 0);
+ if (result)
+ dprintk("RPC: AF_UNSPEC connect return code %d\n",
+ result);
}
/**
@@ -1491,25 +1522,35 @@ static void xs_udp_connect_worker4(struct work_struct *work)
if (xprt->shutdown || !xprt_bound(xprt))
goto out;
- /* Start by resetting any existing state */
- xs_close(xprt);
-
- if ((err = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock)) < 0) {
- dprintk("RPC: can't create UDP transport socket (%d).\n", -err);
- goto out;
- }
- xs_reclassify_socket4(sock);
+ if (!sock) {
+ if ((err = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock)) < 0) {
+ dprintk("RPC: can't create UDP transport socket (%d).\n", -err);
+ goto out;
+ }
+ xs_reclassify_socket4(sock);
- if (xs_bind4(transport, sock)) {
- sock_release(sock);
- goto out;
- }
+ if (xs_bind4(transport, sock)) {
+ sock_release(sock);
+ goto out;
+ }
+ } else
+ xs_sock_reuse_connection(xprt);
dprintk("RPC: worker connecting xprt %p to address: %s\n",
xprt, xprt->address_strings[RPC_DISPLAY_ALL]);
- xs_udp_finish_connecting(xprt, sock);
- status = 0;
+ status = xs_udp_finish_connecting(xprt, sock);
+ if (status < 0) {
+ switch (status) {
+ case -ECONNREFUSED:
+ case -ECONNRESET:
+ /* retry with existing socket, after a delay */
+ break;
+ default:
+ /* get rid of existing socket, and retry */
+ xs_close(xprt);
+ }
+ }
out:
xprt_wake_pending_tasks(xprt, status);
xprt_clear_connecting(xprt);
@@ -1532,54 +1573,40 @@ static void xs_udp_connect_worker6(struct work_struct *work)
if (xprt->shutdown || !xprt_bound(xprt))
goto out;
- /* Start by resetting any existing state */
- xs_close(xprt);
-
- if ((err = sock_create_kern(PF_INET6, SOCK_DGRAM, IPPROTO_UDP, &sock)) < 0) {
- dprintk("RPC: can't create UDP transport socket (%d).\n", -err);
- goto out;
- }
- xs_reclassify_socket6(sock);
+ if (!sock) {
+ if ((err = sock_create_kern(PF_INET6, SOCK_DGRAM, IPPROTO_UDP, &sock)) < 0) {
+ dprintk("RPC: can't create UDP transport socket (%d).\n", -err);
+ goto out;
+ }
+ xs_reclassify_socket6(sock);
- if (xs_bind6(transport, sock) < 0) {
- sock_release(sock);
- goto out;
- }
+ if (xs_bind6(transport, sock) < 0) {
+ sock_release(sock);
+ goto out;
+ }
+ } else
+ xs_sock_reuse_connection(xprt);
dprintk("RPC: worker connecting xprt %p to address: %s\n",
xprt, xprt->address_strings[RPC_DISPLAY_ALL]);
- xs_udp_finish_connecting(xprt, sock);
- status = 0;
+ status = xs_udp_finish_connecting(xprt, sock);
+ if (status < 0) {
+ switch (status) {
+ case -ECONNREFUSED:
+ case -ECONNRESET:
+ /* retry with existing socket, after a delay */
+ break;
+ default:
+ /* get rid of existing socket, and retry */
+ xs_close(xprt);
+ }
+ }
out:
xprt_wake_pending_tasks(xprt, status);
xprt_clear_connecting(xprt);
}
-/*
- * We need to preserve the port number so the reply cache on the server can
- * find our cached RPC replies when we get around to reconnecting.
- */
-static void xs_tcp_reuse_connection(struct rpc_xprt *xprt)
-{
- int result;
- struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
- struct sockaddr any;
-
- dprintk("RPC: disconnecting xprt %p to reuse port\n", xprt);
-
- /*
- * Disconnect the transport socket by doing a connect operation
- * with AF_UNSPEC. This should return immediately...
- */
- memset(&any, 0, sizeof(any));
- any.sa_family = AF_UNSPEC;
- result = kernel_connect(transport->sock, &any, sizeof(any), 0);
- if (result)
- dprintk("RPC: AF_UNSPEC connect return code %d\n",
- result);
-}
-
static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
{
struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -1650,7 +1677,7 @@ static void xs_tcp_connect_worker4(struct work_struct *work)
}
} else
/* "close" the socket, preserving the local port */
- xs_tcp_reuse_connection(xprt);
+ xs_sock_reuse_connection(xprt);
dprintk("RPC: worker connecting xprt %p to address: %s\n",
xprt, xprt->address_strings[RPC_DISPLAY_ALL]);
@@ -1710,7 +1737,7 @@ static void xs_tcp_connect_worker6(struct work_struct *work)
}
} else
/* "close" the socket, preserving the local port */
- xs_tcp_reuse_connection(xprt);
+ xs_sock_reuse_connection(xprt);
dprintk("RPC: worker connecting xprt %p to address: %s\n",
xprt, xprt->address_strings[RPC_DISPLAY_ALL]);
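The error handling that both UDP connect workers gain in the patch above
boils down to one small decision. A stand-alone sketch of it follows. The
enum and function names are hypothetical and do not appear in xprtsock.c;
they only restate the switch statement from the diff.

```c
#include <errno.h>

/* Hypothetical names, introduced only for this sketch. */
enum xs_udp_retry_action {
	XS_UDP_CONNECTED,	/* connect succeeded, nothing to clean up */
	XS_UDP_KEEP_SOCKET,	/* retry later on the same socket and port */
	XS_UDP_CLOSE_SOCKET,	/* drop the socket, next attempt makes a new one */
};

/*
 * Map the (negative errno) status returned by the connect step to the
 * action the worker takes, mirroring the switch in the patch.
 */
static enum xs_udp_retry_action xs_udp_retry_action(int status)
{
	if (status >= 0)
		return XS_UDP_CONNECTED;

	switch (status) {
	case -ECONNREFUSED:
	case -ECONNRESET:
		/* the peer is reachable but not listening yet */
		return XS_UDP_KEEP_SOCKET;
	default:
		/* anything else: get rid of the existing socket */
		return XS_UDP_CLOSE_SOCKET;
	}
}
```

Keeping the socket on ECONNREFUSED and ECONNRESET preserves the bound
source port across the retry, which is the same reason the AF_UNSPEC
disconnect helper is shared with the TCP path.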
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
2008-07-07 22:13 ` Trond Myklebust
@ 2008-07-07 22:56 ` J. Bruce Fields
2008-07-08 1:56 ` Chuck Lever
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-07-07 22:56 UTC (permalink / raw)
To: Trond Myklebust; +Cc: chucklever, linux-nfs
On Mon, Jul 07, 2008 at 06:13:42PM -0400, Trond Myklebust wrote:
> On Mon, 2008-07-07 at 17:19 -0400, J. Bruce Fields wrote:
> > On Mon, Jul 07, 2008 at 04:51:17PM -0400, Trond Myklebust wrote:
> > > On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
> > >
> > > > If you would like connected UDP, I won't object to you implementing
> > > > it. However, I never tested whether a connected UDP socket will give
> > > > the desired semantics without extra code in the UDP transport (for
> > > > example, an ->sk_error callback). I don't think it's worth the hassle
> > > > if we have to add code to UDP that only this tiny use case would need.
> > > >
> > >
> > > OK. I'll set these patches aside until I have time to look into adding
> > > connected UDP support.
> >
> > I'm curious--why weren't you convinced by this argument?:
> >
> > "You're talking about the difference between supporting say 1358
> > mounts at boot time versus 1357 mounts at boot time.
>
> Where did you get those figures from? Firstly, the total number of
> privileged ports is much smaller. Secondly, the number of _free_
> privileged ports can vary wildly depending on the user's setup.
So by default (from min/max_resvport and Chuck's "between 2 and 5"
estimate of privileged ports per mount) you'd get (1024-665)/(2 to 5)
mounts, so between 71 and 179 mounts, not taking into account what else
they might be used for. So that's a bit closer to the point where 1
port plus or minus might make a difference, OK.
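Spelled out, that estimate is just an integer division. The sketch below
assumes the default reserved port range is 665 through 1023 inclusive
(359 ports) and that nothing else is using it. The function name is made
up for illustration.

```c
/*
 * Hypothetical helper for illustration: number of mounts that fit in the
 * reserved port range [min_port, max_port] if each mount consumes
 * ports_per_mount privileged ports and nothing else uses the range.
 */
static int max_mounts(int min_port, int max_port, int ports_per_mount)
{
	int nports = max_port - min_port + 1;

	if (nports <= 0 || ports_per_mount <= 0)
		return 0;
	return nports / ports_per_mount;
}
```

max_mounts(665, 1023, 5) gives 71 and max_mounts(665, 1023, 2) gives 179,
which is where the "between 71 and 179" range comes from.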
> > "In most cases, a client with hundreds of mounts will use up
> > exactly one extra privileged TCP port to register NLM during the
> > first lockd_up() call. If these are all NFSv4 mounts, it will
> > use exactly zero extra ports, since the NFSv4 callback service
> > is not even registered.
> >
> > "Considering that _each_ mount operation can take between 2 and
> > 5 privileged ports, while registering NFSD and NLM both would
> > take exactly two ports at boot time, I think that registration
> > is the wrong place to optimize."
> >
> > I'll admit to not following this carefully, but that seemed reasonable
> > to me.
>
> Like it or not, this _is_ a user interface change: if someone has set up
> their iptables firewall or is using the tcp_wrapper library to limit
> access to the portmapper (a common practice), then this change is
> forcing them to change that.
Yeah, I can buy that. OK, thanks for the explanation.
--b.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
2008-07-07 22:56 ` J. Bruce Fields
@ 2008-07-08 1:56 ` Chuck Lever
0 siblings, 0 replies; 28+ messages in thread
From: Chuck Lever @ 2008-07-08 1:56 UTC (permalink / raw)
To: J. Bruce Fields, Trond Myklebust; +Cc: linux-nfs
On Mon, Jul 7, 2008 at 6:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Mon, Jul 07, 2008 at 06:13:42PM -0400, Trond Myklebust wrote:
>> On Mon, 2008-07-07 at 17:19 -0400, J. Bruce Fields wrote:
>> > On Mon, Jul 07, 2008 at 04:51:17PM -0400, Trond Myklebust wrote:
>> > > On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
>> > >
>> > > > If you would like connected UDP, I won't object to you implementing
>> > > > it. However, I never tested whether a connected UDP socket will give
>> > > > the desired semantics without extra code in the UDP transport (for
>> > > > example, an ->sk_error callback). I don't think it's worth the hassle
>> > > > if we have to add code to UDP that only this tiny use case would need.
>> > > >
>> > >
>> > > OK. I'll set these patches aside until I have time to look into adding
>> > > connected UDP support.
>> >
>> > I'm curious--why weren't you convinced by this argument?:
>> >
>> > "You're talking about the difference between supporting say 1358
>> > mounts at boot time versus 1357 mounts at boot time.
>>
>> Where did you get those figures from?
It's just an example.
>> Firstly, the total number of
>> privileged ports is much smaller. Secondly, the number of _free_
>> privileged ports can vary wildly depending on the user's setup.
That's my point. An admin could _maybe_ get one more NFS mount if she
disables a network service above port 665. Are we going to stop users
from enabling non-NFS network services that might use an extra
privileged port?
The right way to fix this is to fix the way Linux does NFS mounts. It
would be nice if we had a transport capability for performing RPCs
against multiple remote servers from one datagram socket (not
broadcast RPC, but simply sharing one socket amongst multiple datagram
RPC consumers on the client).
This kind of transport would definitely help the mount port limitation
-- the kernel mount client and the kernel rpcbind client could do all
their UDP-based traffic through a single privileged socket. Would it
be hard to share an rpc_xprt between several rpc_clients for a group
of low-volume RPC services?
The RPC client could also provide a small pool of stream transport
sockets for the same purpose. They can stay connected to a remote
until they time out, or until another local consumer needs to send to
a unique unconnected remote and there aren't any free sockets in the
pool.
nfs-utils uses the whole range of privileged ports, being careful to
avoid well-known ports for long-lived services. The kernel should be
able to do the same for short-lived sockets, such as those used to
make a MNT or rpcbind request.
We could even force the RPC service registration code to use a low
numbered privileged port to avoid the ports normally used by
long-lived RPC sockets.
> So by default (from min/max_resvport and Chuck's "between 2 and 5"
> estimate of privileged ports per mount) you'd get (1024-665)/(2 to 5)
> mounts, so between 71 and 179 mounts, not taking into account what else
> they might be used for. So that's a bit closer to the point where 1
> port plus or minus might make a difference, OK.
Doing a mount takes some time. If it takes as long as a second, for
example, for each mount to complete, that gives enough time for one or
more TCP ports used and abandoned for the first mounts to become
available again for later ones. If some of the ports are UDP and some
TCP, that means even more ports are available, since the port range is
effectively twice as large, and only the TCP ports remain in
TIME_WAIT. A MNT request, for instance, goes over UDP by default.
So it's not as simple as this calculation implies -- it depends on
timing, other enabled non-NFS services, and which transports are used
for NFS. One extra port is noise.
In many cases someone who needs more than a hundred or so mounts can
use unprivileged source ports, as Trond suggested on this list last
week.
>> > "In most cases, a client with hundreds of mounts will use up
>> > exactly one extra privileged TCP port to register NLM during the
>> > first lockd_up() call. If these are all NFSv4 mounts, it will
>> > use exactly zero extra ports, since the NFSv4 callback service
>> > is not even registered.
>> >
>> > "Considering that _each_ mount operation can take between 2 and
>> > 5 privileged ports, while registering NFSD and NLM both would
>> > take exactly two ports at boot time, I think that registration
>> > is the wrong place to optimize."
>> >
>> > I'll admit to not following this carefully, but that seemed reasonable
>> > to me.
>>
>> Like it or not, this _is_ a user interface change: if someone has set up
>> their iptables firewall or is using the tcp_wrapper library to limit
>> access to the portmapper (a common practice), then this change is
>> forcing them to change that.
It would have been nice to hear such a specific complaint months ago
when I first suggested doing this on linux-nfs.
>> It is not as if UDP connections are prohibitively difficult to implement
>> either. The entire framework is already there for the TCP case, and so
>> the following patch (as yet untested) should be close...
I don't think CUDP is hard to implement or a bad idea. It just seemed
like a lot to do for a simple change that can be accomplished with an
existing transport implementation.
>> --------------------------------------------------------------------------
>> commit 161c60bc13899b0def4251cffa492ae6faa00b93
>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>> Date: Mon Jul 7 17:43:12 2008 -0400
>>
>> SUNRPC: Add connected sockets for UDP
I haven't looked closely at this in a while, but datagram socket
disconnection events may be reported to kernel datagram sockets via
->sk_error(), and not via an error code from sendmsg() or connect().
Also, do we need a set of shorter reconnect timeout values especially
for UDP sockets?
Is there any possibility that UDP-based RPC servers may see a difference?
A supplemental change might be to make the kernel's mount and rpcbind
clients use RPC_TASK_ONESHOT or whatever we call it.
>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>
>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>> index 4486c59..2e49f5a 100644
>> --- a/net/sunrpc/xprtsock.c
>> +++ b/net/sunrpc/xprtsock.c
>> @@ -580,8 +580,8 @@ static int xs_udp_send_request(struct rpc_task *task)
>> req->rq_svec->iov_len);
>>
>> status = xs_sendpages(transport->sock,
>> - xs_addr(xprt),
>> - xprt->addrlen, xdr,
>> + NULL,
>> + 0, xdr,
>> req->rq_bytes_sent);
This is the only place an address and length are passed to
xs_sendpages(), so you could eliminate these arguments and simplify
xs_sendpages() as a result.
However, if we want both a connected UDP capability and a shared UDP
socket capability, we may need to keep these arguments, and make the
UDP connection logic added below conditional.
>> dprintk("RPC: xs_udp_send_request(%u) = %d\n",
>> @@ -1445,13 +1445,13 @@ static inline void xs_reclassify_socket6(struct socket *sock)
>> }
>> #endif
>>
>> -static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
>> +static int xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
>> {
>> struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
>> + struct sock *sk = sock->sk;
>> + int ret;
>>
>> if (!transport->inet) {
>> - struct sock *sk = sock->sk;
>> -
>> write_lock_bh(&sk->sk_callback_lock);
>>
>> sk->sk_user_data = xprt;
>> @@ -1463,8 +1463,6 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
>> sk->sk_no_check = UDP_CSUM_NORCV;
>> sk->sk_allocation = GFP_ATOMIC;
>>
>> - xprt_set_connected(xprt);
>> -
>> /* Reset to new socket */
>> transport->sock = sock;
>> transport->inet = sk;
>> @@ -1472,6 +1470,39 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
>> write_unlock_bh(&sk->sk_callback_lock);
>> }
>> xs_udp_do_set_buffer_size(xprt);
>> + ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
>> +
>> + if (ret == 0) {
>> + spin_lock_bh(&xprt->transport_lock);
>> + if (sk->sk_state == TCP_ESTABLISHED)
>> + xprt_set_connected(xprt);
>> + spin_unlock_bh(&xprt->transport_lock);
>> + }
>> + return ret;
>> +}
>> +
>> +/*
>> + * We need to preserve the port number so the reply cache on the server can
>> + * find our cached RPC replies when we get around to reconnecting.
>> + */
>> +static void xs_sock_reuse_connection(struct rpc_xprt *xprt)
>> +{
>> + int result;
>> + struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
>> + struct sockaddr any;
>> +
>> + dprintk("RPC: disconnecting xprt %p to reuse port\n", xprt);
>> +
>> + /*
>> + * Disconnect the transport socket by doing a connect operation
>> + * with AF_UNSPEC. This should return immediately...
>> + */
>> + memset(&any, 0, sizeof(any));
>> + any.sa_family = AF_UNSPEC;
>> + result = kernel_connect(transport->sock, &any, sizeof(any), 0);
>> + if (result)
>> + dprintk("RPC: AF_UNSPEC connect return code %d\n",
>> + result);
>> }
>>
>> /**
>> @@ -1491,25 +1522,35 @@ static void xs_udp_connect_worker4(struct work_struct *work)
>> if (xprt->shutdown || !xprt_bound(xprt))
>> goto out;
>>
>> - /* Start by resetting any existing state */
>> - xs_close(xprt);
>> -
>> - if ((err = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock)) < 0) {
>> - dprintk("RPC: can't create UDP transport socket (%d).\n", -err);
>> - goto out;
>> - }
>> - xs_reclassify_socket4(sock);
>> + if (!sock) {
>> + if ((err = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock)) < 0) {
>> + dprintk("RPC: can't create UDP transport socket (%d).\n", -err);
>> + goto out;
>> + }
>> + xs_reclassify_socket4(sock);
>>
>> - if (xs_bind4(transport, sock)) {
>> - sock_release(sock);
>> - goto out;
>> - }
>> + if (xs_bind4(transport, sock)) {
>> + sock_release(sock);
>> + goto out;
>> + }
>> + } else
>> + xs_sock_reuse_connection(xprt);
>>
>> dprintk("RPC: worker connecting xprt %p to address: %s\n",
>> xprt, xprt->address_strings[RPC_DISPLAY_ALL]);
>>
>> - xs_udp_finish_connecting(xprt, sock);
>> - status = 0;
>> + status = xs_udp_finish_connecting(xprt, sock);
>> + if (status < 0) {
>> + switch (status) {
>> + case -ECONNREFUSED:
>> + case -ECONNRESET:
>> + /* retry with existing socket, after a delay */
>> + break;
>> + default:
>> + /* get rid of existing socket, and retry */
>> + xs_close(xprt);
>> + }
>> + }
>> out:
>> xprt_wake_pending_tasks(xprt, status);
>> xprt_clear_connecting(xprt);
>> @@ -1532,54 +1573,40 @@ static void xs_udp_connect_worker6(struct work_struct *work)
>> if (xprt->shutdown || !xprt_bound(xprt))
>> goto out;
>>
>> - /* Start by resetting any existing state */
>> - xs_close(xprt);
>> -
>> - if ((err = sock_create_kern(PF_INET6, SOCK_DGRAM, IPPROTO_UDP, &sock)) < 0) {
>> - dprintk("RPC: can't create UDP transport socket (%d).\n", -err);
>> - goto out;
>> - }
>> - xs_reclassify_socket6(sock);
>> + if (!sock) {
>> + if ((err = sock_create_kern(PF_INET6, SOCK_DGRAM, IPPROTO_UDP, &sock)) < 0) {
>> + dprintk("RPC: can't create UDP transport socket (%d).\n", -err);
>> + goto out;
>> + }
>> + xs_reclassify_socket6(sock);
>>
>> - if (xs_bind6(transport, sock) < 0) {
>> - sock_release(sock);
>> - goto out;
>> - }
>> + if (xs_bind6(transport, sock) < 0) {
>> + sock_release(sock);
>> + goto out;
>> + }
>> + } else
>> + xs_sock_reuse_connection(xprt);
>>
>> dprintk("RPC: worker connecting xprt %p to address: %s\n",
>> xprt, xprt->address_strings[RPC_DISPLAY_ALL]);
>>
>> - xs_udp_finish_connecting(xprt, sock);
>> - status = 0;
>> + status = xs_udp_finish_connecting(xprt, sock);
>> + if (status < 0) {
>> + switch (status) {
>> + case -ECONNREFUSED:
>> + case -ECONNRESET:
>> + /* retry with existing socket, after a delay */
>> + break;
>> + default:
>> + /* get rid of existing socket, and retry */
>> + xs_close(xprt);
>> + }
>> + }
>> out:
>> xprt_wake_pending_tasks(xprt, status);
>> xprt_clear_connecting(xprt);
>> }
>>
>> -/*
>> - * We need to preserve the port number so the reply cache on the server can
>> - * find our cached RPC replies when we get around to reconnecting.
>> - */
>> -static void xs_tcp_reuse_connection(struct rpc_xprt *xprt)
>> -{
>> - int result;
>> - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
>> - struct sockaddr any;
>> -
>> - dprintk("RPC: disconnecting xprt %p to reuse port\n", xprt);
>> -
>> - /*
>> - * Disconnect the transport socket by doing a connect operation
>> - * with AF_UNSPEC. This should return immediately...
>> - */
>> - memset(&any, 0, sizeof(any));
>> - any.sa_family = AF_UNSPEC;
>> - result = kernel_connect(transport->sock, &any, sizeof(any), 0);
>> - if (result)
>> - dprintk("RPC: AF_UNSPEC connect return code %d\n",
>> - result);
>> -}
>> -
>> static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
>> {
>> struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
>> @@ -1650,7 +1677,7 @@ static void xs_tcp_connect_worker4(struct work_struct *work)
>> }
>> } else
>> /* "close" the socket, preserving the local port */
>> - xs_tcp_reuse_connection(xprt);
>> + xs_sock_reuse_connection(xprt);
>>
>> dprintk("RPC: worker connecting xprt %p to address: %s\n",
>> xprt, xprt->address_strings[RPC_DISPLAY_ALL]);
>> @@ -1710,7 +1737,7 @@ static void xs_tcp_connect_worker6(struct work_struct *work)
>> }
>> } else
>> /* "close" the socket, preserving the local port */
>> - xs_tcp_reuse_connection(xprt);
>> + xs_sock_reuse_connection(xprt);
>>
>> dprintk("RPC: worker connecting xprt %p to address: %s\n",
>> xprt, xprt->address_strings[RPC_DISPLAY_ALL]);
--
Chuck Lever
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
2008-07-07 20:51 ` Trond Myklebust
2008-07-07 21:19 ` J. Bruce Fields
@ 2008-07-10 17:27 ` Chuck Lever
2008-07-11 18:40 ` J. Bruce Fields
1 sibling, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2008-07-10 17:27 UTC (permalink / raw)
To: Trond Myklebust; +Cc: J. Bruce Fields, Linux NFS Mailing List
On Jul 7, 2008, at 4:51 PM, Trond Myklebust wrote:
> On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
>
>> If you would like connected UDP, I won't object to you implementing
>> it. However, I never tested whether a connected UDP socket will give
>> the desired semantics without extra code in the UDP transport (for
>> example, an ->sk_error callback). I don't think it's worth the
>> hassle
>> if we have to add code to UDP that only this tiny use case would
>> need.
>>
>
> OK. I'll set these patches aside until I have time to look into adding
> connected UDP support.
That's not completely necessary... the one-shot + TCP changes just
make it nicer when the local rpcbind is not listening. Without these,
the cases where the rpcbind daemon isn't running, or doesn't support
rpcbind v3/v4 and the kernel was built with CONFIG_SUNRPC_REGISTER_V4,
will cause some delays before failing, but otherwise shouldn't be a
problem.
I think you can drop the patch to change rpcb registration to go over
TCP for now unless you already have a CUDP implementation you are
happy with.
I know a lot of folks are waiting for IPv6 support to appear, and I
don't want this detail to hang it up.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
2008-07-10 17:27 ` Chuck Lever
@ 2008-07-11 18:40 ` J. Bruce Fields
2008-07-11 19:11 ` Chuck Lever
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-07-11 18:40 UTC (permalink / raw)
To: Chuck Lever; +Cc: Trond Myklebust, Linux NFS Mailing List
On Thu, Jul 10, 2008 at 01:27:47PM -0400, Chuck Lever wrote:
>
> On Jul 7, 2008, at 4:51 PM, Trond Myklebust wrote:
>
>> On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
>>
>>> If you would like connected UDP, I won't object to you implementing
>>> it. However, I never tested whether a connected UDP socket will give
>>> the desired semantics without extra code in the UDP transport (for
>>> example, an ->sk_error callback). I don't think it's worth the
>>> hassle
>>> if we have to add code to UDP that only this tiny use case would
>>> need.
>>>
>>
>> OK. I'll set these patches aside until I have time to look into adding
>> connected UDP support.
>
> That's not completely necessary... the one-shot + TCP changes just make
> it nicer when the local rpcbind is not listening. Without these, the
> cases where the rpcbind daemon isn't running, or doesn't support rpcbind
> v3/v4 and the kernel was built with CONFIG_SUNRPC_REGISTER_V4, will cause
> some delays before failing, but otherwise shouldn't be a problem.
>
> I think you can drop the patch to change rpcb registration to go over
> TCP for now unless you already have a CUDP implementation you are happy
> with.
So actually in your original series of 7 I think that'd mean dropping
numbers 5 and 6 and keeping the rest?
I've lost track of the status of the 3 series you submitted on the 30th:
"Remaining rpcbind patches for 2.6.27"
- this one, probably ready after dropping 2 patches
"rpcbind v4 support in net/sunrpc/svc*"
- Do you still want this considered for 2.6.27?
"NLM clean-ups for IPv6 support"
- I think you were saying there's still a bug being
tracked down in this series?
--b.
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
2008-07-11 18:40 ` J. Bruce Fields
@ 2008-07-11 19:11 ` Chuck Lever
[not found] ` <76bd70e30807111211m567e9f8cv38a975bbc9df5758-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2008-07-11 19:11 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Trond Myklebust, Linux NFS Mailing List
On Fri, Jul 11, 2008 at 2:40 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Thu, Jul 10, 2008 at 01:27:47PM -0400, Chuck Lever wrote:
>>
>> On Jul 7, 2008, at 4:51 PM, Trond Myklebust wrote:
>>
>>> On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
>>>
>>>> If you would like connected UDP, I won't object to you implementing
>>>> it. However, I never tested whether a connected UDP socket will give
>>>> the desired semantics without extra code in the UDP transport (for
>>>> example, an ->sk_error callback). I don't think it's worth the
>>>> hassle
>>>> if we have to add code to UDP that only this tiny use case would
>>>> need.
>>>>
>>>
>>> OK. I'll set these patches aside until I have time to look into adding
>>> connected UDP support.
>>
>> That's not completely necessary... the one-shot + TCP changes just make
>> it nicer when the local rpcbind is not listening. Without these, the
>> cases where the rpcbind daemon isn't running, or doesn't support rpcbind
>> v3/v4 and the kernel was built with CONFIG_SUNRPC_REGISTER_V4, will cause
>> some delays before failing, but otherwise shouldn't be a problem.
>>
>> I think you can drop the patch to change rpcb registration to go over
>> TCP for now unless you already have a CUDP implementation you are happy
>> with.
>
> So actually in your original series of 7 I think that'd mean dropping
> numbers 5 and 6 and keeping the rest?
So, 5/7 adds "one shot" support to the RPC client. I think that might
be interesting for other kernel services, like making rpcbind queries
over TCP, or NFSv4 callback. I'd like to advocate for keeping that
one so others can build on it (with whatever name for the create flag
we can agree on), but it's not really necessary for subsequent
patches.
6/7 changes the rpcb_register logic to use "one shot" + TCP -- that's
the one that is controversial and can be dropped.
> I've lost track of the status of the 3 series you submitted on the 30th:
>
> "Remaining rpcbind patches for 2.6.27"
> - this one, probably ready after dropping 2 patches
Yes.
> "rpcbind v4 support in net/sunrpc/svc*"
> - Do you still want this considered for 2.6.27?
Yes, please.
The default CONFIG setting added in this patch set should cause the
kernel to continue using portmap instead of rpcbindv3/v4 for RPC
service registration, so by default there shouldn't be any change in
behavior.
> "NLM clean-ups for IPv6 support"
> - I think you were saying there's still a bug being
> tracked down in this series?
There are probably a few bugs in this series, but I'd still like to
consider it for 2.6.27. I think the architecture that is laid out
here is pretty solid, and we will have time to exercise this and get
it right in linux-next and during .27's -rc period. Even if the whole
series is not included, I think there are good cleanups here that
should be solid enough to include.
The bug I mentioned last night is with lock recovery with NFSv3 over
IPv4, so it's not something that prevents NFSv2/v3 from working in
general. We haven't had a decent lock recovery test until just
recently.
Can we assume this is going in for now, and start the review and
integration process? I've already made a few minor changes in this
series since I posted these, so I'm sure I will have to post at least
one refresh. But it would be useful to review this series carefully
now even if some or all of it is not going into 2.6.27.
Thanks for asking!
--
Chuck Lever
* Re: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
[not found] ` <76bd70e30807111211m567e9f8cv38a975bbc9df5758-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-07-14 19:56 ` J. Bruce Fields
[not found] ` <76bd70e30807141430o783ef431pb61eae97b42e00b4@mail.gmail.com>
2008-07-14 20:03 ` [PATCH 1/5] SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET J. Bruce Fields
1 sibling, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-07-14 19:56 UTC (permalink / raw)
To: chucklever; +Cc: Trond Myklebust, Linux NFS Mailing List
On Fri, Jul 11, 2008 at 03:11:29PM -0400, Chuck Lever wrote:
> On Fri, Jul 11, 2008 at 2:40 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Thu, Jul 10, 2008 at 01:27:47PM -0400, Chuck Lever wrote:
> >>
> >> On Jul 7, 2008, at 4:51 PM, Trond Myklebust wrote:
> >>
> >>> On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
> >>>
> >>>> If you would like connected UDP, I won't object to you implementing
> >>>> it. However, I never tested whether a connected UDP socket will give
> >>>> the desired semantics without extra code in the UDP transport (for
> >>>> example, an ->sk_error callback). I don't think it's worth the
> >>>> hassle
> >>>> if we have to add code to UDP that only this tiny use case would
> >>>> need.
> >>>>
> >>>
> >>> OK. I'll set these patches aside until I have time to look into adding
> >>> connected UDP support.
> >>
> >> That's not completely necessary... the one-shot + TCP changes just make
> >> it nicer when the local rpcbind is not listening. Without these, the
> >> cases where the rpcbind daemon isn't running, or doesn't support rpcbind
> >> v3/v4 and the kernel was built with CONFIG_SUNRPC_REGISTER_V4, will cause
> >> some delays before failing, but otherwise shouldn't be a problem.
> >>
> >> I think you can drop the patch to change rpcb registration to go over
> >> TCP for now unless you already have a CUDP implementation you are happy
> >> with.
> >
> > So actually in your original series of 7 I think that'd mean dropping
> > numbers 5 and 6 and keeping the rest?
>
> So, 5/7 adds "one shot" support to the RPC client. I think that might
> be interesting for other kernel services, like making rpcbind queries
> over TCP, or NFSv4 callback. I'd like to advocate for keeping that
> one so others can build on it (with whatever name for the create flag
> we can agree on), but it's not really necessary for subsequent
> patches.
>
> 6/7 changes the rpcb_register logic to use "one shot" + TCP -- that's
> the one that is controversial and can be dropped.
May as well at least apply the other 5? Trond is carrying other
net/sunrpc/rpcb_clnt.c patches, so they probably need to go in his tree.
I guess I'll go ahead and send along versions based on latest
trond/devel.
--b.
* [PATCH 1/5] SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET
[not found] ` <76bd70e30807111211m567e9f8cv38a975bbc9df5758-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-07-14 19:56 ` J. Bruce Fields
@ 2008-07-14 20:03 ` J. Bruce Fields
2008-07-14 20:03 ` [PATCH 2/5] SUNRPC: Introduce a specific rpcb_create for contacting localhost J. Bruce Fields
1 sibling, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-07-14 20:03 UTC (permalink / raw)
To: Trond Myklebust
Cc: Linux NFS Mailing List, chucklever, Chuck Lever, J. Bruce Fields
From: Chuck Lever <chuck.lever@oracle.com>
The rpcbind versions 3 and 4 SET and UNSET procedures use the same
arguments as the GETADDR procedure.
While definitely a bug, this hasn't been a problem so far since the
kernel hasn't used version 3 or 4 SET and UNSET. But this will change
in just a moment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---
net/sunrpc/rpcb_clnt.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 24e93e0..0021fad 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -426,6 +426,10 @@ static void rpcb_getport_done(struct rpc_task *child, void *data)
map->r_status = status;
}
+/*
+ * XDR functions for rpcbind
+ */
+
static int rpcb_encode_mapping(struct rpc_rqst *req, __be32 *p,
struct rpcbind_args *rpcb)
{
@@ -581,14 +585,14 @@ static struct rpc_procinfo rpcb_procedures2[] = {
};
static struct rpc_procinfo rpcb_procedures3[] = {
- PROC(SET, mapping, set),
- PROC(UNSET, mapping, set),
+ PROC(SET, getaddr, set),
+ PROC(UNSET, getaddr, set),
PROC(GETADDR, getaddr, getaddr),
};
static struct rpc_procinfo rpcb_procedures4[] = {
- PROC(SET, mapping, set),
- PROC(UNSET, mapping, set),
+ PROC(SET, getaddr, set),
+ PROC(UNSET, getaddr, set),
PROC(GETADDR, getaddr, getaddr),
PROC(GETVERSADDR, getaddr, getaddr),
};
--
1.5.5.rc1
* [PATCH 2/5] SUNRPC: Introduce a specific rpcb_create for contacting localhost
2008-07-14 20:03 ` [PATCH 1/5] SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET J. Bruce Fields
@ 2008-07-14 20:03 ` J. Bruce Fields
2008-07-14 20:03 ` [PATCH 3/5] SUNRPC: None of rpcb_create's callers wants a privileged source port J. Bruce Fields
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-07-14 20:03 UTC (permalink / raw)
To: Trond Myklebust
Cc: Linux NFS Mailing List, chucklever, Chuck Lever, J. Bruce Fields
From: Chuck Lever <chuck.lever@oracle.com>
Add rpcb_create_local() for use by rpcb_register() and upcoming IPv6
registration functions.
Ensure any errors encountered by rpcb_create_local() are properly
reported.
We can also use a statically allocated constant loopback socket address
instead of one allocated on the stack and initialized every time the
function is called.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---
net/sunrpc/rpcb_clnt.c | 42 +++++++++++++++++++++++++++++++-----------
1 files changed, 31 insertions(+), 11 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 0021fad..35c1ded 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -116,6 +116,29 @@ static void rpcb_map_release(void *data)
kfree(map);
}
+static const struct sockaddr_in rpcb_inaddr_loopback = {
+ .sin_family = AF_INET,
+ .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+ .sin_port = htons(RPCBIND_PORT),
+};
+
+static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr,
+ size_t addrlen, u32 version)
+{
+ struct rpc_create_args args = {
+ .protocol = XPRT_TRANSPORT_UDP,
+ .address = addr,
+ .addrsize = addrlen,
+ .servername = "localhost",
+ .program = &rpcb_program,
+ .version = version,
+ .authflavor = RPC_AUTH_UNIX,
+ .flags = RPC_CLNT_CREATE_NOPING,
+ };
+
+ return rpc_create(&args);
+}
+
static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
size_t salen, int proto, u32 version,
int privileged)
@@ -161,10 +184,6 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
*/
int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
{
- struct sockaddr_in sin = {
- .sin_family = AF_INET,
- .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
- };
struct rpcbind_args map = {
.r_prog = prog,
.r_vers = vers,
@@ -184,14 +203,15 @@ int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
"rpcbind\n", (port ? "" : "un"),
prog, vers, prot, port);
- rpcb_clnt = rpcb_create("localhost", (struct sockaddr *) &sin,
- sizeof(sin), XPRT_TRANSPORT_UDP, RPCBVERS_2, 1);
- if (IS_ERR(rpcb_clnt))
- return PTR_ERR(rpcb_clnt);
+ rpcb_clnt = rpcb_create_local((struct sockaddr *)&rpcb_inaddr_loopback,
+ sizeof(rpcb_inaddr_loopback),
+ RPCBVERS_2);
+ if (!IS_ERR(rpcb_clnt)) {
+ error = rpc_call_sync(rpcb_clnt, &msg, 0);
+ rpc_shutdown_client(rpcb_clnt);
+ } else
+ error = PTR_ERR(rpcb_clnt);
- error = rpc_call_sync(rpcb_clnt, &msg, 0);
-
- rpc_shutdown_client(rpcb_clnt);
if (error < 0)
printk(KERN_WARNING "RPC: failed to contact local rpcbind "
"server (errno %d).\n", -error);
--
1.5.5.rc1
* [PATCH 3/5] SUNRPC: None of rpcb_create's callers wants a privileged source port
2008-07-14 20:03 ` [PATCH 2/5] SUNRPC: Introduce a specific rpcb_create for contacting localhost J. Bruce Fields
@ 2008-07-14 20:03 ` J. Bruce Fields
2008-07-14 20:03 ` [PATCH 4/5] SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier J. Bruce Fields
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-07-14 20:03 UTC (permalink / raw)
To: Trond Myklebust
Cc: Linux NFS Mailing List, chucklever, Chuck Lever, J. Bruce Fields
From: Chuck Lever <chuck.lever@oracle.com>
Clean up: Callers that required a privileged source port now use
rpcb_create_local(), so we can remove the @privileged argument from
rpcb_create().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---
net/sunrpc/rpcb_clnt.c | 12 +++++-------
1 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 35c1ded..691bd21 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -140,8 +140,7 @@ static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr,
}
static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
- size_t salen, int proto, u32 version,
- int privileged)
+ size_t salen, int proto, u32 version)
{
struct rpc_create_args args = {
.protocol = proto,
@@ -151,7 +150,8 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
.program = &rpcb_program,
.version = version,
.authflavor = RPC_AUTH_UNIX,
- .flags = RPC_CLNT_CREATE_NOPING,
+ .flags = (RPC_CLNT_CREATE_NOPING |
+ RPC_CLNT_CREATE_NONPRIVPORT),
};
switch (srvaddr->sa_family) {
@@ -165,8 +165,6 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
return NULL;
}
- if (!privileged)
- args.flags |= RPC_CLNT_CREATE_NONPRIVPORT;
return rpc_create(&args);
}
@@ -255,7 +253,7 @@ int rpcb_getport_sync(struct sockaddr_in *sin, u32 prog, u32 vers, int prot)
__func__, NIPQUAD(sin->sin_addr.s_addr), prog, vers, prot);
rpcb_clnt = rpcb_create(NULL, (struct sockaddr *)sin,
- sizeof(*sin), prot, RPCBVERS_2, 0);
+ sizeof(*sin), prot, RPCBVERS_2);
if (IS_ERR(rpcb_clnt))
return PTR_ERR(rpcb_clnt);
@@ -365,7 +363,7 @@ void rpcb_getport_async(struct rpc_task *task)
task->tk_pid, __func__, bind_version);
rpcb_clnt = rpcb_create(clnt->cl_server, sap, salen, xprt->prot,
- bind_version, 0);
+ bind_version);
if (IS_ERR(rpcb_clnt)) {
status = PTR_ERR(rpcb_clnt);
dprintk("RPC: %5u %s: rpcb_create failed, error %ld\n",
--
1.5.5.rc1
* [PATCH 4/5] SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
2008-07-14 20:03 ` [PATCH 3/5] SUNRPC: None of rpcb_create's callers wants a privileged source port J. Bruce Fields
@ 2008-07-14 20:03 ` J. Bruce Fields
2008-07-14 20:03 ` [PATCH 5/5] SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon J. Bruce Fields
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-07-14 20:03 UTC (permalink / raw)
To: Trond Myklebust
Cc: Linux NFS Mailing List, chucklever, Chuck Lever, J. Bruce Fields
From: Chuck Lever <chuck.lever@oracle.com>
rpcbind version 4 registration will reuse part of rpcb_register, so just
split it out into a separate function now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---
net/sunrpc/rpcb_clnt.c | 48 ++++++++++++++++++++++++++++++------------------
1 files changed, 30 insertions(+), 18 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 691bd21..8b75c30 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -168,6 +168,30 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
return rpc_create(&args);
}
+static int rpcb_register_call(struct sockaddr *addr, size_t addrlen,
+ u32 version, struct rpc_message *msg,
+ int *result)
+{
+ struct rpc_clnt *rpcb_clnt;
+ int error = 0;
+
+ *result = 0;
+
+ rpcb_clnt = rpcb_create_local(addr, addrlen, version);
+ if (!IS_ERR(rpcb_clnt)) {
+ error = rpc_call_sync(rpcb_clnt, msg, 0);
+ rpc_shutdown_client(rpcb_clnt);
+ } else
+ error = PTR_ERR(rpcb_clnt);
+
+ if (error < 0)
+ printk(KERN_WARNING "RPC: failed to contact local rpcbind "
+ "server (errno %d).\n", -error);
+ dprintk("RPC: registration status %d/%d\n", error, *result);
+
+ return error;
+}
+
/**
* rpcb_register - set or unset a port registration with the local rpcbind svc
* @prog: RPC program number to bind
@@ -189,33 +213,21 @@ int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
.r_port = port,
};
struct rpc_message msg = {
- .rpc_proc = &rpcb_procedures2[port ?
- RPCBPROC_SET : RPCBPROC_UNSET],
.rpc_argp = &map,
.rpc_resp = okay,
};
- struct rpc_clnt *rpcb_clnt;
- int error = 0;
dprintk("RPC: %sregistering (%u, %u, %d, %u) with local "
"rpcbind\n", (port ? "" : "un"),
prog, vers, prot, port);
- rpcb_clnt = rpcb_create_local((struct sockaddr *)&rpcb_inaddr_loopback,
- sizeof(rpcb_inaddr_loopback),
- RPCBVERS_2);
- if (!IS_ERR(rpcb_clnt)) {
- error = rpc_call_sync(rpcb_clnt, &msg, 0);
- rpc_shutdown_client(rpcb_clnt);
- } else
- error = PTR_ERR(rpcb_clnt);
+ msg.rpc_proc = &rpcb_procedures2[RPCBPROC_UNSET];
+ if (port)
+ msg.rpc_proc = &rpcb_procedures2[RPCBPROC_SET];
- if (error < 0)
- printk(KERN_WARNING "RPC: failed to contact local rpcbind "
- "server (errno %d).\n", -error);
- dprintk("RPC: registration status %d/%d\n", error, *okay);
-
- return error;
+ return rpcb_register_call((struct sockaddr *)&rpcb_inaddr_loopback,
+ sizeof(rpcb_inaddr_loopback),
+ RPCBVERS_2, &msg, okay);
}
/**
--
1.5.5.rc1
* [PATCH 5/5] SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
2008-07-14 20:03 ` [PATCH 4/5] SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier J. Bruce Fields
@ 2008-07-14 20:03 ` J. Bruce Fields
0 siblings, 0 replies; 28+ messages in thread
From: J. Bruce Fields @ 2008-07-14 20:03 UTC (permalink / raw)
To: Trond Myklebust
Cc: Linux NFS Mailing List, chucklever, Chuck Lever, J. Bruce Fields
From: Chuck Lever <chuck.lever@oracle.com>
Introduce a new API to register RPC services on IPv6 interfaces to allow
the NFS server and lockd to advertise on IPv6 networks.
Unlike rpcb_register(), the new rpcb_v4_register() function uses rpcbind
protocol version 4 to contact the local rpcbind daemon. The version 4
SET/UNSET procedures allow services to register address families besides
AF_INET, register at specific network interfaces, and register transport
protocols besides UDP and TCP. All of this functionality is exposed via
the new rpcb_v4_register() kernel API.
A user-space rpcbind daemon implementation that supports version 4 of the
rpcbind protocol is required in order to make use of this new API.
Note that rpcbind version 3 is sufficient to support the new rpcbind
facilities listed above, but most extant implementations use version 4.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---
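Note on the address format: rpcbind versions 3 and 4 carry the address to
register as a "universal address" string, which is the printable IP address
followed by the high and low bytes of the port, each in decimal. The
user-space sketch below mirrors what the snprintf() calls in this patch
build. The helper names uaddr_from_sin() and uaddr_from_sin6() are
hypothetical and exist only for this illustration; the kernel code uses
NIPQUAD_FMT and NIP6_FMT instead.

```c
#include <stddef.h>
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical user-space helper, not part of the patch: format an
 * AF_INET socket address as "a.b.c.d.hi.lo", where hi and lo are the
 * high and low bytes of the port number.
 */
int uaddr_from_sin(const struct sockaddr_in *sin, char *buf, size_t len)
{
	/* s_addr is in network byte order, so the bytes are already a.b.c.d */
	const unsigned char *a = (const unsigned char *)&sin->sin_addr.s_addr;
	unsigned int port = ntohs(sin->sin_port);

	return snprintf(buf, len, "%d.%d.%d.%d.%u.%u",
			a[0], a[1], a[2], a[3], port >> 8, port & 0xff);
}

/*
 * Same idea for AF_INET6: eight uncompressed 16-bit groups in hex,
 * followed by the two port bytes in decimal.
 */
int uaddr_from_sin6(const struct sockaddr_in6 *sin6, char *buf, size_t len)
{
	const unsigned char *a = sin6->sin6_addr.s6_addr;
	unsigned int port = ntohs(sin6->sin6_port);

	return snprintf(buf, len,
			"%02x%02x:%02x%02x:%02x%02x:%02x%02x:"
			"%02x%02x:%02x%02x:%02x%02x:%02x%02x.%u.%u",
			a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7],
			a[8], a[9], a[10], a[11], a[12], a[13], a[14], a[15],
			port >> 8, port & 0xff);
}
```

For instance, port 2049 is 8 * 256 + 1, so the IPv4 loopback address with
that port becomes "127.0.0.1.8.1". The longest IPv4 result is 23 characters
and the longest IPv6 result is 47, so the 32-byte and 64-byte buffers used
in the patch have room for both plus the terminating NUL.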
include/linux/sunrpc/clnt.h | 3 +
net/sunrpc/rpcb_clnt.c | 178 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 177 insertions(+), 4 deletions(-)
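Note on netids: rpcb_v4_register() takes the netid as a string, and the
supported values pair an address family with a transport ("udp" and "tcp"
for IPv4, "udp6" and "tcp6" for IPv6). A caller that already knows the
family and protocol could derive the netid along the following lines. The
helper rpcb_netid_for() is hypothetical and is not introduced by this
patch; it is only a sketch of the mapping described in the kerneldoc.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper, not part of the patch: map an address family and
 * an IP protocol number to the matching rpcbind netid string.
 * Returns NULL for combinations that have no netid listed in the patch.
 */
const char *rpcb_netid_for(int family, int protocol)
{
	switch (family) {
	case AF_INET:
		if (protocol == IPPROTO_TCP)
			return "tcp";
		if (protocol == IPPROTO_UDP)
			return "udp";
		break;
	case AF_INET6:
		if (protocol == IPPROTO_TCP)
			return "tcp6";
		if (protocol == IPPROTO_UDP)
			return "udp6";
		break;
	}

	return NULL;
}
```

Keeping the family in the netid is what lets a service registered on
INADDR_ANY stay IPv4-only while the same service on IN6ADDR_ANY is
advertised under the "6" netids.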
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 764fd4c..e5bfe01 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -125,6 +125,9 @@ void rpc_shutdown_client(struct rpc_clnt *);
void rpc_release_client(struct rpc_clnt *);
int rpcb_register(u32, u32, int, unsigned short, int *);
+int rpcb_v4_register(const u32 program, const u32 version,
+ const struct sockaddr *address,
+ const char *netid, int *result);
int rpcb_getport_sync(struct sockaddr_in *, u32, u32, int);
void rpcb_getport_async(struct rpc_task *);
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 8b75c30..24db2b4 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -87,6 +87,7 @@ struct rpcbind_args {
static struct rpc_procinfo rpcb_procedures2[];
static struct rpc_procinfo rpcb_procedures3[];
+static struct rpc_procinfo rpcb_procedures4[];
struct rpcb_info {
u32 rpc_vers;
@@ -122,6 +123,12 @@ static const struct sockaddr_in rpcb_inaddr_loopback = {
.sin_port = htons(RPCBIND_PORT),
};
+static const struct sockaddr_in6 rpcb_in6addr_loopback = {
+ .sin6_family = AF_INET6,
+ .sin6_addr = IN6ADDR_LOOPBACK_INIT,
+ .sin6_port = htons(RPCBIND_PORT),
+};
+
static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr,
size_t addrlen, u32 version)
{
@@ -196,13 +203,38 @@ static int rpcb_register_call(struct sockaddr *addr, size_t addrlen,
* rpcb_register - set or unset a port registration with the local rpcbind svc
* @prog: RPC program number to bind
* @vers: RPC version number to bind
- * @prot: transport protocol to use to make this request
+ * @prot: transport protocol to register
* @port: port value to register
- * @okay: result code
+ * @okay: OUT: result code
+ *
+ * RPC services invoke this function to advertise their contact
+ * information via the system's rpcbind daemon. RPC services
+ * invoke this function once for each [program, version, transport]
+ * tuple they wish to advertise.
+ *
+ * Callers may also unregister RPC services that are no longer
+ * available by setting the passed-in port to zero. This removes
+ * all registered transports for [program, version] from the local
+ * rpcbind database.
+ *
+ * Returns zero if the registration request was dispatched
+ * successfully and a reply was received. The rpcbind daemon's
+ * boolean result code is stored in *okay.
*
- * port == 0 means unregister, port != 0 means register.
+ * Returns an errno value and sets *okay to zero if there was
+ * some problem that prevented the rpcbind request from being
+ * dispatched, or if the rpcbind daemon did not respond within
+ * the timeout.
*
- * This routine supports only rpcbind version 2.
+ * This function uses rpcbind protocol version 2 to contact the
+ * local rpcbind daemon.
+ *
+ * Registration works over both AF_INET and AF_INET6, and services
+ * registered via this function are advertised as available for any
+ * address. If the local rpcbind daemon is listening on AF_INET6,
+ * services registered via this function will be advertised on
+ * IN6ADDR_ANY (i.e., available for all AF_INET and AF_INET6
+ * addresses).
*/
int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
{
@@ -230,6 +262,144 @@ int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
RPCBVERS_2, &msg, okay);
}
+/*
+ * Fill in AF_INET family-specific arguments to register
+ */
+static int rpcb_register_netid4(struct sockaddr_in *address_to_register,
+ struct rpc_message *msg)
+{
+ struct rpcbind_args *map = msg->rpc_argp;
+ unsigned short port = ntohs(address_to_register->sin_port);
+ char buf[32];
+
+ /* Construct AF_INET universal address */
+ snprintf(buf, sizeof(buf),
+ NIPQUAD_FMT".%u.%u",
+ NIPQUAD(address_to_register->sin_addr.s_addr),
+ port >> 8, port & 0xff);
+ map->r_addr = buf;
+
+ dprintk("RPC: %sregistering [%u, %u, %s, '%s'] with "
+ "local rpcbind\n", (port ? "" : "un"),
+ map->r_prog, map->r_vers,
+ map->r_addr, map->r_netid);
+
+ msg->rpc_proc = &rpcb_procedures4[RPCBPROC_UNSET];
+ if (port)
+ msg->rpc_proc = &rpcb_procedures4[RPCBPROC_SET];
+
+ return rpcb_register_call((struct sockaddr *)&rpcb_inaddr_loopback,
+ sizeof(rpcb_inaddr_loopback),
+ RPCBVERS_4, msg, msg->rpc_resp);
+}
+
+/*
+ * Fill in AF_INET6 family-specific arguments to register
+ */
+static int rpcb_register_netid6(struct sockaddr_in6 *address_to_register,
+ struct rpc_message *msg)
+{
+ struct rpcbind_args *map = msg->rpc_argp;
+ unsigned short port = ntohs(address_to_register->sin6_port);
+ char buf[64];
+
+ /* Construct AF_INET6 universal address */
+ snprintf(buf, sizeof(buf),
+ NIP6_FMT".%u.%u",
+ NIP6(address_to_register->sin6_addr),
+ port >> 8, port & 0xff);
+ map->r_addr = buf;
+
+ dprintk("RPC: %sregistering [%u, %u, %s, '%s'] with "
+ "local rpcbind\n", (port ? "" : "un"),
+ map->r_prog, map->r_vers,
+ map->r_addr, map->r_netid);
+
+ msg->rpc_proc = &rpcb_procedures4[RPCBPROC_UNSET];
+ if (port)
+ msg->rpc_proc = &rpcb_procedures4[RPCBPROC_SET];
+
+ return rpcb_register_call((struct sockaddr *)&rpcb_in6addr_loopback,
+ sizeof(rpcb_in6addr_loopback),
+ RPCBVERS_4, msg, msg->rpc_resp);
+}
+
+/**
+ * rpcb_v4_register - set or unset a port registration with the local rpcbind
+ * @program: RPC program number of service to (un)register
+ * @version: RPC version number of service to (un)register
+ * @address: address family, IP address, and port to (un)register
+ * @netid: netid of transport protocol to (un)register
+ * @result: result code from rpcbind RPC call
+ *
+ * RPC services invoke this function to advertise their contact
+ * information via the system's rpcbind daemon. RPC services
+ * invoke this function once for each [program, version, address,
+ * netid] tuple they wish to advertise.
+ *
+ * Callers may also unregister RPC services that are no longer
+ * available by setting the port number in the passed-in address
+ * to zero. Callers pass a netid of "" to unregister all
+ * transport netids associated with [program, version, address].
+ *
+ * Returns zero if the registration request was dispatched
+ * successfully and a reply was received. The rpcbind daemon's
+ * result code is stored in *result.
+ *
+ * Returns an errno value and sets *result to zero if there was
+ * some problem that prevented the rpcbind request from being
+ * dispatched, or if the rpcbind daemon did not respond within
+ * the timeout.
+ *
+ * This function uses rpcbind protocol version 4 to contact the
+ * local rpcbind daemon. The local rpcbind daemon must support
+ * version 4 of the rpcbind protocol in order for these functions
+ * to register a service successfully.
+ *
+ * Supported netids include "udp" and "tcp" for UDP and TCP over
+ * IPv4, and "udp6" and "tcp6" for UDP and TCP over IPv6,
+ * respectively.
+ *
+ * The contents of @address determine the address family and the
+ * port to be registered. The usual practice is to pass INADDR_ANY
+ * as the raw address, but specifying a non-zero address is also
+ * supported by this API if the caller wishes to advertise an RPC
+ * service on a specific network interface.
+ *
+ * Note that passing in INADDR_ANY does not create the same service
+ * registration as IN6ADDR_ANY. The former advertises an RPC
+ * service on any IPv4 address, but not on IPv6. The latter
+ * advertises the service on all IPv4 and IPv6 addresses.
+ */
+int rpcb_v4_register(const u32 program, const u32 version,
+ const struct sockaddr *address, const char *netid,
+ int *result)
+{
+ struct rpcbind_args map = {
+ .r_prog = program,
+ .r_vers = version,
+ .r_netid = netid,
+ .r_owner = RPCB_OWNER_STRING,
+ };
+ struct rpc_message msg = {
+ .rpc_argp = &map,
+ .rpc_resp = result,
+ };
+
+ *result = 0;
+
+ switch (address->sa_family) {
+ case AF_INET:
+ return rpcb_register_netid4((struct sockaddr_in *)address,
+ &msg);
+ case AF_INET6:
+ return rpcb_register_netid6((struct sockaddr_in6 *)address,
+ &msg);
+ }
+
+ return -EAFNOSUPPORT;
+}
+
/**
* rpcb_getport_sync - obtain the port for an RPC service on a given host
* @sin: address of remote peer
--
1.5.5.rc1
* Fwd: [PATCH 0/7] Remaining rpcbind patches for 2.6.27
[not found] ` <76bd70e30807141430o783ef431pb61eae97b42e00b4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-07-14 21:31 ` Chuck Lever
0 siblings, 0 replies; 28+ messages in thread
From: Chuck Lever @ 2008-07-14 21:31 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Linux NFS Mailing List
On Mon, Jul 14, 2008 at 3:56 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Fri, Jul 11, 2008 at 03:11:29PM -0400, Chuck Lever wrote:
>> On Fri, Jul 11, 2008 at 2:40 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> > On Thu, Jul 10, 2008 at 01:27:47PM -0400, Chuck Lever wrote:
>> >>
>> >> On Jul 7, 2008, at 4:51 PM, Trond Myklebust wrote:
>> >>
>> >>> On Mon, 2008-07-07 at 15:44 -0400, Chuck Lever wrote:
>> >>>
>> >>>> If you would like connected UDP, I won't object to you implementing
>> >>>> it. However, I never tested whether a connected UDP socket will give
>> >>>> the desired semantics without extra code in the UDP transport (for
>> >>>> example, an ->sk_error callback). I don't think it's worth the
>> >>>> hassle
>> >>>> if we have to add code to UDP that only this tiny use case would
>> >>>> need.
>> >>>>
>> >>>
>> >>> OK. I'll set these patches aside until I have time to look into adding
>> >>> connected UDP support.
>> >>
>> >> That's not completely necessary... the one-shot + TCP changes just make
>> >> it nicer when the local rpcbind is not listening. Without these, the
>> >> cases where the rpcbind daemon isn't running, or doesn't support rpcbind
>> >> v3/v4 and the kernel was built with CONFIG_SUNRPC_REGISTER_V4, will cause
>> >> some delays before failing, but otherwise shouldn't be a problem.
>> >>
>> >> I think you can drop the patch to change rpcb registration to go over
>> >> TCP for now unless you already have a CUDP implementation you are happy
>> >> with.
>> >
>> > So actually in your original series of 7 I think that'd mean dropping
>> > numbers 5 and 6 and keeping the rest?
>>
>> So, 5/7 adds "one shot" support to the RPC client. I think that might
>> be interesting for other kernel services, like making rpcbind queries
>> over TCP, or NFSv4 callback. I'd like to advocate for keeping that
>> one so others can build on it (with whatever name for the create flag
>> we can agree on), but it's not really necessary for subsequent
>> patches.
>>
>> 6/7 changes the rpcb_register logic to use "one shot" + TCP -- that's
>> the one that is controversial and can be dropped.
>
> May as well at least apply the other 5? Trond is carrying other
> net/sunrpc/rpcb_clnt.c patches, so they probably need to go in his tree.
>
> I guess I'll go ahead and send along versions based on latest
> trond/devel.
Yep, it's reasonable to get these circulating to ensure they don't
cause any unwanted side effects for standard rpcbindv2/IPv4 usage.
We just need to take care these don't get too badly re-ordered when
merging all this back together when 2.6.27 opens.
--
Chuck Lever
Thread overview: 28+ messages
2008-06-30 22:38 [PATCH 0/7] Remaining rpcbind patches for 2.6.27 Chuck Lever
[not found] ` <20080630223646.24534.74654.stgit-ewv44WTpT0t9HhUboXbp9zCvJB+x5qRC@public.gmane.org>
2008-06-30 22:38 ` [PATCH 1/7] SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET Chuck Lever
2008-06-30 22:38 ` [PATCH 2/7] SUNRPC: Introduce a specific rpcb_create for contacting localhost Chuck Lever
2008-06-30 22:38 ` [PATCH 3/7] SUNRPC: None of rpcb_create's callers wants a privileged source port Chuck Lever
2008-06-30 22:39 ` [PATCH 4/7] SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier Chuck Lever
2008-06-30 22:39 ` [PATCH 5/7] SUNRPC: introduce new rpc_task flag that fails requests on xprt disconnect Chuck Lever
2008-06-30 22:39 ` [PATCH 6/7] SUNRPC: Quickly detect missing portmapper during RPC service registration Chuck Lever
2008-06-30 22:39 ` [PATCH 7/7] SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon Chuck Lever
2008-07-03 20:45 ` [PATCH 0/7] Remaining rpcbind patches for 2.6.27 J. Bruce Fields
2008-07-07 18:20 ` Trond Myklebust
2008-07-07 18:43 ` Chuck Lever
2008-07-07 18:51 ` Trond Myklebust
2008-07-07 19:44 ` Chuck Lever
[not found] ` <76bd70e30807071244v4db1c366uc7599d2dd806bf1b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-07-07 20:51 ` Trond Myklebust
2008-07-07 21:19 ` J. Bruce Fields
2008-07-07 22:13 ` Trond Myklebust
2008-07-07 22:56 ` J. Bruce Fields
2008-07-08 1:56 ` Chuck Lever
2008-07-10 17:27 ` Chuck Lever
2008-07-11 18:40 ` J. Bruce Fields
2008-07-11 19:11 ` Chuck Lever
[not found] ` <76bd70e30807111211m567e9f8cv38a975bbc9df5758-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-07-14 19:56 ` J. Bruce Fields
[not found] ` <76bd70e30807141430o783ef431pb61eae97b42e00b4@mail.gmail.com>
[not found] ` <76bd70e30807141430o783ef431pb61eae97b42e00b4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-07-14 21:31 ` Fwd: " Chuck Lever
2008-07-14 20:03 ` [PATCH 1/5] SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET J. Bruce Fields
2008-07-14 20:03 ` [PATCH 2/5] SUNRPC: Introduce a specific rpcb_create for contacting localhost J. Bruce Fields
2008-07-14 20:03 ` [PATCH 3/5] SUNRPC: None of rpcb_create's callers wants a privileged source port J. Bruce Fields
2008-07-14 20:03 ` [PATCH 4/5] SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier J. Bruce Fields
2008-07-14 20:03 ` [PATCH 5/5] SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon J. Bruce Fields