* [nfsd PATCH 0/3] address issues with shutdown while portmap is dead
@ 2009-05-28 6:33 NeilBrown
[not found] ` <20090528062730.15937.70579.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2009-05-28 6:33 UTC (permalink / raw)
To: linux-nfs
Hi all.
Currently if you shut down nfsd while portmap is dead, it takes about
3.5 minutes. This isn't ideal.
The following patch series addresses some of the issues surrounding that.
The first two are minor fixes to nfsd which get it to behave a little
more consistently. They don't directly affect the issue.
The third changes the timeouts used when rpcb_client talks to
portmap or rpcbind to unregister a service. It simply reduces the
timeout to 0.5 seconds with no retries.
As there should be no packet loss, this should only fail if
portmap/rpcbind isn't working. And in that case unregistering
won't serve any purpose anyway.
And as unregistering is merely for cleanliness, not for correctness,
there wouldn't be a problem even if something did go wrong.
With this patch, the 3.5 minutes drops to 3 seconds.
Comments welcome.
Thanks,
NeilBrown
---
NeilBrown (3):
sunrpc: reduce timeout when unregistering rpcbind registrations.
nfsd: optimise the starting of zero threads when none are running.
nfsd: don't take nfsd_mutex twice when setting number of threads.
fs/nfsd/nfsctl.c | 13 +++++++++----
fs/nfsd/nfssvc.c | 8 ++++++--
net/sunrpc/rpcb_clnt.c | 17 +++++++++++++++--
3 files changed, 30 insertions(+), 8 deletions(-)
--
Signature
^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20090528062730.15937.70579.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* [PATCH 1/3] nfsd: don't take nfsd_mutex twice when setting number of threads. [not found] ` <20090528062730.15937.70579.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2009-05-28 6:33 ` NeilBrown [not found] ` <20090528063303.15937.55202.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2009-05-28 6:33 ` [PATCH 2/3] nfsd: optimise the starting of zero threads when none are running NeilBrown 2009-05-28 6:33 ` [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations NeilBrown 2 siblings, 1 reply; 25+ messages in thread From: NeilBrown @ 2009-05-28 6:33 UTC (permalink / raw) To: linux-nfs; +Cc: NeilBrown
Currently when we write a number to 'threads' in nfsdfs, we take the nfsd_mutex, update the number of threads, then take the mutex again to read the number of threads. Mostly this isn't a big deal. However if we write '0', and portmap happens to be dead, then we can get unpredictable behaviour. If the nfsd threads all got killed quickly and the last thread is waiting for portmap to respond, then the second time we take the mutex we will block waiting for the last thread. However if the nfsd threads didn't die quite that fast, then there will be no contention when we try to take the mutex again. Unpredictability isn't fun, and waiting for the last thread to exit is pointless, so avoid taking the lock twice. To achieve this, get nfsd_svc to return a non-negative number of active threads when not returning a negative error.
Signed-off-by: NeilBrown <neilb@suse.de> --- fs/nfsd/nfsctl.c | 13 +++++++++---- fs/nfsd/nfssvc.c | 2 ++ 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index af16849..1ddd4f9 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -207,10 +207,14 @@ static struct file_operations pool_stats_operations = { static ssize_t write_svc(struct file *file, char *buf, size_t size) { struct nfsctl_svc *data; + int err; if (size < sizeof(*data)) return -EINVAL; data = (struct nfsctl_svc*) buf; - return nfsd_svc(data->svc_port, data->svc_nthreads); + err = nfsd_svc(data->svc_port, data->svc_nthreads); + if (err < 0) + return err; + return 0; } /** @@ -692,10 +696,11 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size) if (newthreads < 0) return -EINVAL; rv = nfsd_svc(NFS_PORT, newthreads); - if (rv) + if (rv < 0) return rv; - } - sprintf(buf, "%d\n", nfsd_nrthreads()); + } else + rv = nfsd_nrthreads(); + sprintf(buf, "%d\n", rv); return strlen(buf); } diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index cbba4a9..df6c59e 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -413,6 +413,8 @@ nfsd_svc(unsigned short port, int nrservs) goto failure; error = svc_set_num_threads(nfsd_serv, NULL, nrservs); + if (error == 0) + error = nfsd_serv->sv_nrthreads - 1; failure: svc_destroy(nfsd_serv); /* Release server */ out: ^ permalink raw reply related [flat|nested] 25+ messages in thread
[parent not found: <20090528063303.15937.55202.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH 1/3] nfsd: don't take nfsd_mutex twice when setting number of threads. [not found] ` <20090528063303.15937.55202.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2009-06-11 17:52 ` Jeff Layton [not found] ` <20090611135255.0fa2f728-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Jeff Layton @ 2009-06-11 17:52 UTC (permalink / raw) To: NeilBrown; +Cc: linux-nfs On Thu, 28 May 2009 16:33:03 +1000 NeilBrown <neilb@suse.de> wrote: > Currently when we right a number to 'threads' in nfsdfs, > we take the nfsd_mutex, update the number of threads, then take the > mutex again to read the number of threads. > > Mostly this isn't a big deal. However if we are write '0', and > portmap happens to be dead, then we can get unpredictable behaviour. > If the nfsd threads all got killed quickly and the last thread is > waiting for portmap to respond, then the second time we take the mutex > we will block waiting for the last thread. > However if the nfsd threads didn't die quite that fast, then there > will be no contention when we try to take the mutex again. > > Unpredictability isn't fun, and waiting for the last thread to exit is > pointless, so avoid taking the lock twice. > To achieve this, get nfsd_svc return a non-negative number of active > threads when not returning a negative error. 
> > Signed-off-by: NeilBrown <neilb@suse.de> > --- > > fs/nfsd/nfsctl.c | 13 +++++++++---- > fs/nfsd/nfssvc.c | 2 ++ > 2 files changed, 11 insertions(+), 4 deletions(-) > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > index af16849..1ddd4f9 100644 > --- a/fs/nfsd/nfsctl.c > +++ b/fs/nfsd/nfsctl.c > @@ -207,10 +207,14 @@ static struct file_operations pool_stats_operations = { > static ssize_t write_svc(struct file *file, char *buf, size_t size) > { > struct nfsctl_svc *data; > + int err; > if (size < sizeof(*data)) > return -EINVAL; > data = (struct nfsctl_svc*) buf; > - return nfsd_svc(data->svc_port, data->svc_nthreads); > + err = nfsd_svc(data->svc_port, data->svc_nthreads); > + if (err < 0) > + return err; > + return 0; > } > > /** > @@ -692,10 +696,11 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size) > if (newthreads < 0) > return -EINVAL; > rv = nfsd_svc(NFS_PORT, newthreads); > - if (rv) > + if (rv < 0) > return rv; > - } > - sprintf(buf, "%d\n", nfsd_nrthreads()); > + } else > + rv = nfsd_nrthreads(); > + sprintf(buf, "%d\n", rv); > return strlen(buf); > } > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > index cbba4a9..df6c59e 100644 > --- a/fs/nfsd/nfssvc.c > +++ b/fs/nfsd/nfssvc.c > @@ -413,6 +413,8 @@ nfsd_svc(unsigned short port, int nrservs) > goto failure; > > error = svc_set_num_threads(nfsd_serv, NULL, nrservs); > + if (error == 0) > + error = nfsd_serv->sv_nrthreads - 1; ^^^^ Why -1 here? I thought you wanted nfsd_svc to return the number of threads? Also, looks like Bruce's tree has changed a bit recently and I don't think this patch will apply cleanly now. You may need to respin... 
> failure: > svc_destroy(nfsd_serv); /* Release server */ > out: > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20090611135255.0fa2f728-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>]
* Re: [PATCH 1/3] nfsd: don't take nfsd_mutex twice when setting number of threads. [not found] ` <20090611135255.0fa2f728-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org> @ 2009-06-11 18:01 ` Jeff Layton 0 siblings, 0 replies; 25+ messages in thread From: Jeff Layton @ 2009-06-11 18:01 UTC (permalink / raw) To: NeilBrown; +Cc: linux-nfs On Thu, 11 Jun 2009 13:52:55 -0400 Jeff Layton <jlayton@redhat.com> wrote: > On Thu, 28 May 2009 16:33:03 +1000 > NeilBrown <neilb@suse.de> wrote: > > > Currently when we right a number to 'threads' in nfsdfs, > > we take the nfsd_mutex, update the number of threads, then take the > > mutex again to read the number of threads. > > > > Mostly this isn't a big deal. However if we are write '0', and > > portmap happens to be dead, then we can get unpredictable behaviour. > > If the nfsd threads all got killed quickly and the last thread is > > waiting for portmap to respond, then the second time we take the mutex > > we will block waiting for the last thread. > > However if the nfsd threads didn't die quite that fast, then there > > will be no contention when we try to take the mutex again. > > > > Unpredictability isn't fun, and waiting for the last thread to exit is > > pointless, so avoid taking the lock twice. > > To achieve this, get nfsd_svc return a non-negative number of active > > threads when not returning a negative error. 
> > > > Signed-off-by: NeilBrown <neilb@suse.de> > > --- > > > > fs/nfsd/nfsctl.c | 13 +++++++++---- > > fs/nfsd/nfssvc.c | 2 ++ > > 2 files changed, 11 insertions(+), 4 deletions(-) > > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > > index af16849..1ddd4f9 100644 > > --- a/fs/nfsd/nfsctl.c > > +++ b/fs/nfsd/nfsctl.c > > @@ -207,10 +207,14 @@ static struct file_operations pool_stats_operations = { > > static ssize_t write_svc(struct file *file, char *buf, size_t size) > > { > > struct nfsctl_svc *data; > > + int err; > > if (size < sizeof(*data)) > > return -EINVAL; > > data = (struct nfsctl_svc*) buf; > > - return nfsd_svc(data->svc_port, data->svc_nthreads); > > + err = nfsd_svc(data->svc_port, data->svc_nthreads); > > + if (err < 0) > > + return err; > > + return 0; > > } > > > > /** > > @@ -692,10 +696,11 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size) > > if (newthreads < 0) > > return -EINVAL; > > rv = nfsd_svc(NFS_PORT, newthreads); > > - if (rv) > > + if (rv < 0) > > return rv; > > - } > > - sprintf(buf, "%d\n", nfsd_nrthreads()); > > + } else > > + rv = nfsd_nrthreads(); > > + sprintf(buf, "%d\n", rv); > > return strlen(buf); > > } > > > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > > index cbba4a9..df6c59e 100644 > > --- a/fs/nfsd/nfssvc.c > > +++ b/fs/nfsd/nfssvc.c > > @@ -413,6 +413,8 @@ nfsd_svc(unsigned short port, int nrservs) > > goto failure; > > > > error = svc_set_num_threads(nfsd_serv, NULL, nrservs); > > + if (error == 0) > > + error = nfsd_serv->sv_nrthreads - 1; > ^^^^ > Why -1 here? I thought you wanted nfsd_svc to return the number of > threads? > Ahh nm...I see why. sv_nrthreads is artificially high due to the earlier svc_get. Hate the refcounting with this stuff -- blech... > Also, looks like Bruce's tree has changed a bit recently and I don't > think this patch will apply cleanly now. You may need to respin... 
> > > failure: > > svc_destroy(nfsd_serv); /* Release server */ > > out: > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > -- > Jeff Layton <jlayton@redhat.com> > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 2/3] nfsd: optimise the starting of zero threads when none are running. [not found] ` <20090528062730.15937.70579.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2009-05-28 6:33 ` [PATCH 1/3] nfsd: don't take nfsd_mutex twice when setting number of threads NeilBrown @ 2009-05-28 6:33 ` NeilBrown [not found] ` <20090528063303.15937.57966.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2009-05-28 6:33 ` [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations NeilBrown 2 siblings, 1 reply; 25+ messages in thread From: NeilBrown @ 2009-05-28 6:33 UTC (permalink / raw) To: linux-nfs; +Cc: NeilBrown
Currently, if we ask to set the number of nfsd threads to zero when there are none running, we set up all the sockets and register the service, and then tear it all down again. This is pointless. So detect that case and exit promptly. (Also remove an assignment to 'error' which was never used.) Signed-off-by: NeilBrown <neilb@suse.de> --- fs/nfsd/nfssvc.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index df6c59e..83b7d41 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -390,12 +390,14 @@ nfsd_svc(unsigned short port, int nrservs) mutex_lock(&nfsd_mutex); dprintk("nfsd: creating service\n"); - error = -EINVAL; if (nrservs <= 0) nrservs = 0; if (nrservs > NFSD_MAXSERVS) nrservs = NFSD_MAXSERVS; - + error = 0; + if (nrservs == 0 && nfsd_serv == NULL) + goto out; + /* Readahead param cache - will no-op if it already exists */ error = nfsd_racache_init(2*nrservs); if (error<0) ^ permalink raw reply related [flat|nested] 25+ messages in thread
[parent not found: <20090528063303.15937.57966.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH 2/3] nfsd: optimise the starting of zero threads when none are running. [not found] ` <20090528063303.15937.57966.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2009-06-11 17:26 ` Jeff Layton 0 siblings, 0 replies; 25+ messages in thread From: Jeff Layton @ 2009-06-11 17:26 UTC (permalink / raw) To: NeilBrown; +Cc: linux-nfs, NeilBrown On Thu, 28 May 2009 16:33:03 +1000 NeilBrown <neilb@suse.de> wrote: > Currently, if we ask to set then number of nfsd threads to zero when > there are none running, we set up all the sockets and register the > service, and then tear it all down again. > This is pointless. > > So detect that case and exit promptly. > (also remove an assignment to 'error' which was never used. > > Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com> FWIW, the patchset I did to add IPv6 support makes some similar optimizations to the userspace rpc.nfsd program too. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. [not found] ` <20090528062730.15937.70579.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2009-05-28 6:33 ` [PATCH 1/3] nfsd: don't take nfsd_mutex twice when setting number of threads NeilBrown 2009-05-28 6:33 ` [PATCH 2/3] nfsd: optimise the starting of zero threads when none are running NeilBrown @ 2009-05-28 6:33 ` NeilBrown [not found] ` <20090528063303.15937.62423.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2 siblings, 1 reply; 25+ messages in thread From: NeilBrown @ 2009-05-28 6:33 UTC (permalink / raw) To: linux-nfs; +Cc: NeilBrown
Unregistering an RPC service is not essential - but it is tidy. So it is unpleasant to wait a long time for it to complete. As unregistering is always directed to localhost, the most likely reason for any delay is that portmap (or rpcbind) is not running. In that case, waiting is totally pointless. In any case, we should expect a quick response, and zero packet loss. So reduce the timeouts to a total of half a second. This means that if we try to stop nfsd while portmap is not running, it takes 3 seconds instead of 3.5 minutes. Note that stopping portmap before the last nfsd thread has died is not simply a configuration error. When we kill nfsd threads they could take a while to die, and it is possible that portmap could then be killed before the last nfsd thread has had a chance to finish up. [An alternative might be to make the sunrpc code always "connect" udp sockets so that "port not reachable" errors would get reported back.
This requires a more intrusive change though and might have other consequences] Signed-off-by: NeilBrown <neilb@suse.de> --- net/sunrpc/rpcb_clnt.c | 17 +++++++++++++++-- 1 files changed, 15 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c index beee6da..31f7e2e 100644 --- a/net/sunrpc/rpcb_clnt.c +++ b/net/sunrpc/rpcb_clnt.c @@ -132,7 +132,8 @@ static const struct sockaddr_in rpcb_inaddr_loopback = { }; static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr, - size_t addrlen, u32 version) + size_t addrlen, u32 version, + int unregister) { struct rpc_create_args args = { .protocol = XPRT_TRANSPORT_UDP, @@ -144,6 +145,16 @@ static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr, .authflavor = RPC_AUTH_UNIX, .flags = RPC_CLNT_CREATE_NOPING, }; + const struct rpc_timeout unreg_timeout = { + .to_initval = HZ/2, + .to_maxval = HZ/2, + .to_increment = 0, + .to_retries = 0, + .to_exponential = 0, + }; + + if (unregister) + args.timeout = &unreg_timeout; return rpc_create(&args); } @@ -183,10 +194,12 @@ static int rpcb_register_call(const u32 version, struct rpc_message *msg) size_t addrlen = sizeof(rpcb_inaddr_loopback); struct rpc_clnt *rpcb_clnt; int result, error = 0; + int unregister; msg->rpc_resp = &result; - rpcb_clnt = rpcb_create_local(addr, addrlen, version); + unregister = (msg->rpc_proc->p_proc == RPCBPROC_UNSET); + rpcb_clnt = rpcb_create_local(addr, addrlen, version, unregister); if (!IS_ERR(rpcb_clnt)) { error = rpc_call_sync(rpcb_clnt, msg, 0); rpc_shutdown_client(rpcb_clnt); ^ permalink raw reply related [flat|nested] 25+ messages in thread
[parent not found: <20090528063303.15937.62423.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. [not found] ` <20090528063303.15937.62423.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2009-05-28 13:07 ` Tom Talpey 2009-06-11 4:49 ` Neil Brown 2009-05-28 13:43 ` Chuck Lever 1 sibling, 1 reply; 25+ messages in thread From: Tom Talpey @ 2009-05-28 13:07 UTC (permalink / raw) To: NeilBrown; +Cc: linux-nfs At 02:33 AM 5/28/2009, NeilBrown wrote: >Unregistering an RPC service is not essential - but it is tidy. >So it is unpleasant to wait a long time for it to complete. > >As unregistering is always directed to localhost, the most likely >reason for any delay is the portmap (or rpcbind) is not running. >In that case, waiting it totally pointless. In any case, we should >expect a quick response, and zero packet loss. > >So reduce the timeouts to a total of half a second. Why wait for the response at all in this case? With zero retries and nothing to do with the result, it seems pointless to even wait for half a second. Tom. > >This means that if we try to stop nfsd while portmap is not running, >it takes 3 seconds instead of 3.5 minutes. > >Note that stopping portmap before the last nfsd thread has died is >not simply a configuration error. When we kill nfsd threads they >could take a while to die, and it is possible that portmap could >then be killed before the last nfsd thread has had a change to finish >up. > >[An alternate might be to make the sunrpc code always "connect" >udp sockets so that "port not reachable" errors would get reported >back. 
This requires a more intrusive change though and might have >other consequences] > >Signed-off-by: NeilBrown <neilb@suse.de> >--- > > net/sunrpc/rpcb_clnt.c | 17 +++++++++++++++-- > 1 files changed, 15 insertions(+), 2 deletions(-) > >diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c >index beee6da..31f7e2e 100644 >--- a/net/sunrpc/rpcb_clnt.c >+++ b/net/sunrpc/rpcb_clnt.c >@@ -132,7 +132,8 @@ static const struct sockaddr_in rpcb_inaddr_loopback = { > }; > > static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr, >- size_t addrlen, u32 version) >+ size_t addrlen, u32 version, >+ int unregister) > { > struct rpc_create_args args = { > .protocol = XPRT_TRANSPORT_UDP, >@@ -144,6 +145,16 @@ static struct rpc_clnt *rpcb_create_local(struct >sockaddr *addr, > .authflavor = RPC_AUTH_UNIX, > .flags = RPC_CLNT_CREATE_NOPING, > }; >+ const struct rpc_timeout unreg_timeout = { >+ .to_initval = HZ/2, >+ .to_maxval = HZ/2, >+ .to_increment = 0, >+ .to_retries = 0, >+ .to_exponential = 0, >+ }; >+ >+ if (unregister) >+ args.timeout = &unreg_timeout; > > return rpc_create(&args); > } >@@ -183,10 +194,12 @@ static int rpcb_register_call(const u32 version, >struct rpc_message *msg) > size_t addrlen = sizeof(rpcb_inaddr_loopback); > struct rpc_clnt *rpcb_clnt; > int result, error = 0; >+ int unregister; > > msg->rpc_resp = &result; > >- rpcb_clnt = rpcb_create_local(addr, addrlen, version); >+ unregister = (msg->rpc_proc->p_proc == RPCBPROC_UNSET); >+ rpcb_clnt = rpcb_create_local(addr, addrlen, version, unregister); > if (!IS_ERR(rpcb_clnt)) { > error = rpc_call_sync(rpcb_clnt, msg, 0); > rpc_shutdown_client(rpcb_clnt); > > >-- >To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. 2009-05-28 13:07 ` Tom Talpey @ 2009-06-11 4:49 ` Neil Brown [not found] ` <18992.36038.267957.467326-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Neil Brown @ 2009-06-11 4:49 UTC (permalink / raw) To: Tom Talpey; +Cc: linux-nfs On Thursday May 28, tmtalpey@gmail.com wrote: > At 02:33 AM 5/28/2009, NeilBrown wrote: > >Unregistering an RPC service is not essential - but it is tidy. > >So it is unpleasant to wait a long time for it to complete. > > > >As unregistering is always directed to localhost, the most likely > >reason for any delay is the portmap (or rpcbind) is not running. > >In that case, waiting it totally pointless. In any case, we should > >expect a quick response, and zero packet loss. > > > >So reduce the timeouts to a total of half a second. > > Why wait for the response at all in this case? With zero retries > and nothing to do with the result, it seems pointless to even wait > for half a second. That's a fair point, thanks. I'll keep that in mind if I revisit these patches. Thanks, NeilBrown ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <18992.36038.267957.467326-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. [not found] ` <18992.36038.267957.467326-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2009-06-11 15:02 ` Chuck Lever 0 siblings, 0 replies; 25+ messages in thread From: Chuck Lever @ 2009-06-11 15:02 UTC (permalink / raw) To: Neil Brown, Tom Talpey; +Cc: Linux NFS Mailing list On Jun 11, 2009, at 12:49 AM, Neil Brown wrote: > On Thursday May 28, tmtalpey@gmail.com wrote: >> At 02:33 AM 5/28/2009, NeilBrown wrote: >>> Unregistering an RPC service is not essential - but it is tidy. >>> So it is unpleasant to wait a long time for it to complete. >>> >>> As unregistering is always directed to localhost, the most likely >>> reason for any delay is the portmap (or rpcbind) is not running. >>> In that case, waiting it totally pointless. In any case, we should >>> expect a quick response, and zero packet loss. >>> >>> So reduce the timeouts to a total of half a second. >> >> Why wait for the response at all in this case? With zero retries >> and nothing to do with the result, it seems pointless to even wait >> for half a second. > > That's a fair point, thanks. I'll keep that in mind if I revisit > these patches. While not waiting is probably OK in the "shutdown" case, it could be a problem during normal operation. The kernel unregisters RPC services before it attempts to register them (to clear any old registrations). Not waiting for the unregister reply might be racy. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. [not found] ` <20090528063303.15937.62423.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2009-05-28 13:07 ` Tom Talpey @ 2009-05-28 13:43 ` Chuck Lever 2009-06-11 4:48 ` Neil Brown 1 sibling, 1 reply; 25+ messages in thread From: Chuck Lever @ 2009-05-28 13:43 UTC (permalink / raw) To: NeilBrown; +Cc: linux-nfs On May 28, 2009, at 2:33 AM, NeilBrown wrote: > Unregistering an RPC service is not essential - but it is tidy. > So it is unpleasant to wait a long time for it to complete. > > As unregistering is always directed to localhost, the most likely > reason for any delay is the portmap (or rpcbind) is not running. > In that case, waiting it totally pointless. In any case, we should > expect a quick response, and zero packet loss. > > So reduce the timeouts to a total of half a second. A much better solution here would be to use a connected UDP socket. This would make start-up in the absence of rpcbind/portmap fail much quicker, we could keep the longer, safer timeout value, and the kernel could warn about exactly what the problem is. It would also have similar benefits for the kernel's mount and NSM clients, and when starting up services. It would additionally make UDP RPC ping checking fail very quickly. > This means that if we try to stop nfsd while portmap is not running, > it takes 3 seconds instead of 3.5 minutes. > > Note that stopping portmap before the last nfsd thread has died is > not simply a configuration error. When we kill nfsd threads they > could take a while to die, and it is possible that portmap could > then be killed before the last nfsd thread has had a change to finish > up. > > [An alternate might be to make the sunrpc code always "connect" > udp sockets so that "port not reachable" errors would get reported > back.
This requires a more intrusive change though and might have > other consequences] We had discussed this about a year ago when I started adding IPv6 support. I had suggested switching the local rpc client to use TCP instead of UDP to solve exactly this time-out problem during start-up. There was some resistance to the idea because TCP would leave privileged ports in TIMEWAIT (at shutdown, this is probably not a significant concern). Trond had intended to introduce connected UDP socket support to the RPC client, although we were also interested in someday having a single UDP socket for all RPC traffic... the design never moved on from there. My feeling at this point is that having a connected UDP socket transport would be simpler and have broader benefits than waiting for an eventual design that can accommodate multiple transport instances sharing a single socket. > Signed-off-by: NeilBrown <neilb@suse.de> > --- > > net/sunrpc/rpcb_clnt.c | 17 +++++++++++++++-- > 1 files changed, 15 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c > index beee6da..31f7e2e 100644 > --- a/net/sunrpc/rpcb_clnt.c > +++ b/net/sunrpc/rpcb_clnt.c > @@ -132,7 +132,8 @@ static const struct sockaddr_in > rpcb_inaddr_loopback = { > }; > > static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr, > - size_t addrlen, u32 version) > + size_t addrlen, u32 version, > + int unregister) > { It might be cleaner to define the timeout structures as global statics, and pass in a pointer to the preferred structure instead of checking a passed-in integer. One conditional branch instead of two.
> struct rpc_create_args args = { > .protocol = XPRT_TRANSPORT_UDP, > @@ -144,6 +145,16 @@ static struct rpc_clnt > *rpcb_create_local(struct sockaddr *addr, > .authflavor = RPC_AUTH_UNIX, > .flags = RPC_CLNT_CREATE_NOPING, > }; > + const struct rpc_timeout unreg_timeout = { > + .to_initval = HZ/2, > + .to_maxval = HZ/2, > + .to_increment = 0, > + .to_retries = 0, > + .to_exponential = 0, > + }; > + > + if (unregister) > + args.timeout = &unreg_timeout; > > return rpc_create(&args); > } > @@ -183,10 +194,12 @@ static int rpcb_register_call(const u32 > version, struct rpc_message *msg) > size_t addrlen = sizeof(rpcb_inaddr_loopback); > struct rpc_clnt *rpcb_clnt; > int result, error = 0; > + int unregister; > > msg->rpc_resp = &result; > > - rpcb_clnt = rpcb_create_local(addr, addrlen, version); > + unregister = (msg->rpc_proc->p_proc == RPCBPROC_UNSET); > + rpcb_clnt = rpcb_create_local(addr, addrlen, version, unregister); > if (!IS_ERR(rpcb_clnt)) { > error = rpc_call_sync(rpcb_clnt, msg, 0); > rpc_shutdown_client(rpcb_clnt); -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. 2009-05-28 13:43 ` Chuck Lever @ 2009-06-11 4:48 ` Neil Brown [not found] ` <18992.35996.986951.556723-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Neil Brown @ 2009-06-11 4:48 UTC (permalink / raw) To: Chuck Lever; +Cc: linux-nfs On Thursday May 28, chuck.lever@oracle.com wrote: > On May 28, 2009, at 2:33 AM, NeilBrown wrote: > > > > [An alternate might be to make the sunrpc code always "connect" > > udp sockets so that "port not reachable" errors would get reported > > back. This requires a more intrusive change though and might have > > other consequences] > > We had discussed this about a year ago when I started adding IPv6 > support. I had suggested switching the local rpc client to use TCP > instead of UDP to solve exactly this time-out problem during start- > up. There was some resistance to the idea because TCP would leave > privileged ports in TIMEWAIT (at shutdown, this is probably not a > significant concern). > > Trond had intended to introduce connected UDP socket support to the > RPC client, although we were also interested in someday having a > single UDP socket for all RPC traffic... the design never moved on > from there. > > My feeling at this point is that having a connected UDP socket > transport would be simpler and have broader benefits than waiting for > an eventual design that can accommodate multiple transport instances > sharing a single socket. The use of connected UDP would have to be limited to known-safe cases such as contacting the local portmap. I believe there are still NFS servers out there that - if multihomed - can reply from a different address to the one the request was sent to. And we would need to check that rpcbind does the right thing. 
I recently discovered that rpcbind is buggy and will sometimes respond from the wrong interface - I suspect localhost addresses are safe, but we would need to check, or fix it (I fixed that bug in portmap (glibc actually) 6 years ago and now it appears again in rpcbind - groan!). How hard would it be to add (optional) connected UDP support? Would we just make the code more like the TCP version, or are there any gotchas that you know of that we would need to be careful of? Thanks, NeilBrown ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <18992.35996.986951.556723-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. [not found] ` <18992.35996.986951.556723-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2009-06-11 15:44 ` Chuck Lever 2009-07-02 20:04 ` Chuck Lever 0 siblings, 1 reply; 25+ messages in thread From: Chuck Lever @ 2009-06-11 15:44 UTC (permalink / raw) To: Neil Brown; +Cc: linux-nfs On Jun 11, 2009, at 12:48 AM, Neil Brown wrote: > On Thursday May 28, chuck.lever@oracle.com wrote: >> On May 28, 2009, at 2:33 AM, NeilBrown wrote: >>> >>> [An alternate might be to make the sunrpc code always "connect" >>> udp sockets so that "port not reachable" errors would get reported >>> back. This requires a more intrusive change though and might have >>> other consequences] >> >> We had discussed this about a year ago when I started adding IPv6 >> support. I had suggested switching the local rpc client to use TCP >> instead of UDP to solve exactly this time-out problem during start- >> up. There was some resistance to the idea because TCP would leave >> privileged ports in TIMEWAIT (at shutdown, this is probably not a >> significant concern). >> >> Trond had intended to introduce connected UDP socket support to the >> RPC client, although we were also interested in someday having a >> single UDP socket for all RPC traffic... the design never moved on >> from there. >> >> My feeling at this point is that having a connected UDP socket >> transport would be simpler and have broader benefits than waiting for >> an eventual design that can accommodate multiple transport instances >> sharing a single socket. > > The use of connected UDP would have to be limited to known-safe cases > such as contacting the local portmap. I believe there are still NFS > servers out there that - if multihomed - can reply from a different > address to the one the request was sent to. I think I advocated for adding an entirely new transport capability called CUDP at the time. 
But this is definitely something to remember as we test. If a new transport capability is added, at this point we would likely need some additional logic in the NFS mount parsing logic to expose such a transport to user space. So, leaving that parsing logic alone should insulate the NFS client from the new transport until we have more confidence. > And we would need to check that rpcbind does the right thing. I > recently discovered that rpcbind is buggy and will sometimes respond > from the wrong interface - I suspect localhost addresses are safe, but > we would need to check, or fix it (I fixed that bug in portmap (glibc > actually) 6 years ago and now it appears again in rpcbind - groan!). Details welcome. We will probably need to fix libtirpc. > How hard would it be to add (optional) connected UDP support? Would > we just make the code more like the TCP version, or are there any > gotchas that you know of that we would need to be careful of? The code in net/sunrpc/xprtsock.c is a bunch of transport methods, many of which are shared between the UDP and TCP transport capabilities. You could probably do this easily by creating a new xprt_class structure and a new ops vector, then reuse as many UDP methods as possible. The TCP connect method could be usable as is, but it would be simple to copy-n-paste a new one if some variation is required. Then, define a new XPRT_ value, and use that in rpcb_create_local(). -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 25+ messages in thread
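[Editorial note: the behavioral difference discussed above can be shown with plain userspace sockets. The sketch below is illustrative only, not the sunrpc kernel code: connect() on a UDP socket is what lets the kernel route the ICMP "port unreachable" error back to the caller as ECONNREFUSED, whereas an unconnected socket simply waits out its timeout. It assumes nothing is listening on UDP port 9 of the loopback interface.]

```c
/* Userspace illustration only -- not the sunrpc kernel code. Shows why a
 * *connected* UDP socket learns that the peer port is dead: connect()
 * attaches the ICMP error path to the socket, so a later send() fails
 * with ECONNREFUSED instead of silently timing out. */
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in dst;
	int fd;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9);	/* assumption: no local UDP listener here */
	inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0 || connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
		return 1;

	send(fd, "ping", 4, 0);		/* elicits ICMP port unreachable */
	usleep(100 * 1000);		/* let the error reach the socket */

	/* On a connected socket the queued ICMP error surfaces here. */
	if (send(fd, "ping", 4, 0) < 0 && errno == ECONNREFUSED)
		printf("connected UDP: ECONNREFUSED\n");
	else
		printf("no error reported\n");
	close(fd);
	return 0;
}
```

With an unconnected socket, the same two sendto() calls would both succeed and the caller would only discover the dead port when its RPC retransmit timer expired, which is exactly the 3.5-minute shutdown stall this series addresses.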
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. 2009-06-11 15:44 ` Chuck Lever @ 2009-07-02 20:04 ` Chuck Lever 2009-07-06 12:42 ` Suresh Jayaraman 0 siblings, 1 reply; 25+ messages in thread From: Chuck Lever @ 2009-07-02 20:04 UTC (permalink / raw) To: Neil Brown; +Cc: Linux NFS mailing list On Jun 11, 2009, at 11:44 AM, Chuck Lever wrote: > On Jun 11, 2009, at 12:48 AM, Neil Brown wrote: >> On Thursday May 28, chuck.lever@oracle.com wrote: >>> On May 28, 2009, at 2:33 AM, NeilBrown wrote: >>>> >>>> [An alternate might be to make the sunrpc code always "connect" >>>> udp sockets so that "port not reachable" errors would get reported >>>> back. This requires a more intrusive change though and might have >>>> other consequences] >>> >>> We had discussed this about a year ago when I started adding IPv6 >>> support. I had suggested switching the local rpc client to use TCP >>> instead of UDP to solve exactly this time-out problem during start-up. There was some resistance to the idea because TCP would leave >>> privileged ports in TIMEWAIT (at shutdown, this is probably not a >>> significant concern). >>> >>> Trond had intended to introduce connected UDP socket support to the >>> RPC client, although we were also interested in someday having a >>> single UDP socket for all RPC traffic... the design never moved on >>> from there. >>> >>> My feeling at this point is that having a connected UDP socket >>> transport would be simpler and have broader benefits than waiting >>> for >>> an eventual design that can accommodate multiple transport instances >>> sharing a single socket. >> >> The use of connected UDP would have to be limited to known-safe cases >> such as contacting the local portmap. I believe there are still NFS >> servers out there that - if multihomed - can reply from a different >> address to the one the request was sent to. 
> > I think I advocated for adding an entirely new transport capability > called CUDP at the time. But this is definitely something to > remember as we test. > > If a new transport capability is added, at this point we would > likely need some additional logic in the NFS mount parsing logic to > expose such a transport to user space. So, leaving that parsing > logic alone should insulate the NFS client from the new transport > until we have more confidence. > >> And we would need to check that rpcbind does the right thing. I >> recently discovered that rpcbind is buggy and will sometimes respond >> from the wrong interface - I suspect localhost addresses are safe, >> but >> we would need to check, or fix it (I fixed that bug in portmap (glibc >> actually) 6 years ago and now it appears again in rpcbind - groan!). > > Details welcome. We will probably need to fix libtirpc. > >> How hard would it be to add (optional) connected UDP support? Would >> we just make the code more like the TCP version, or are there any >> gotchas that you know of that we would need to be careful of? > > The code in net/sunrpc/xprtsock.c is a bunch of transport methods, > many of which are shared between the UDP and TCP transport > capabilities. You could probably do this easily by creating a new > xprt_class structure and a new ops vector, then reuse as many UDP > methods as possible. The TCP connect method could be usable as is, > but it would be simple to copy-n-paste a new one if some variation > is required. Then, define a new XPRT_ value, and use that in > rpcb_create_local(). I've thought about this some more... It seems to me that you might be better off using the existing UDP transport code, but adding a new RPC_CLNT_CREATE_ flag to enable connected UDP semantics. The two transports are otherwise exactly the same. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. 2009-07-02 20:04 ` Chuck Lever @ 2009-07-06 12:42 ` Suresh Jayaraman 2009-07-06 14:30 ` Chuck Lever 0 siblings, 1 reply; 25+ messages in thread From: Suresh Jayaraman @ 2009-07-06 12:42 UTC (permalink / raw) To: Chuck Lever; +Cc: Neil Brown, Linux NFS mailing list Chuck Lever wrote: > On Jun 11, 2009, at 11:44 AM, Chuck Lever wrote: >> On Jun 11, 2009, at 12:48 AM, Neil Brown wrote: >> >>> How hard would it be to add (optional) connected UDP support? Would >>> we just make the code more like the TCP version, or are there any >>> gotchas that you know of that we would need to be careful of? >> >> The code in net/sunrpc/xprtsock.c is a bunch of transport methods, >> many of which are shared between the UDP and TCP transport >> capabilities. You could probably do this easily by creating a new >> xprt_class structure and a new ops vector, then reuse as many UDP >> methods as possible. The TCP connect method could be usable as is, >> but it would be simple to copy-n-paste a new one if some variation is >> required. Then, define a new XPRT_ value, and use that in >> rpcb_create_local(). I attempted a patch based on your suggestions; the socket seems to be getting the -ECONNREFUSED error, but it isn't propagating all the way up (I have yet to debug why). include/linux/sunrpc/xprtsock.h | 6 ++ net/sunrpc/rpcb_clnt.c | 2 +- net/sunrpc/xprt.c | 5 + net/sunrpc/xprtsock.c | 186 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 198 insertions(+), 1 deletions(-) diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h index c2a46c4..3f05a45 100644 --- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -22,6 +22,12 @@ void cleanup_socket_xprt(void); */ #define XPRT_TRANSPORT_UDP IPPROTO_UDP #define XPRT_TRANSPORT_TCP IPPROTO_TCP +/* + * Connected UDP doesn't seem to be a well-defined protocol, picking a number + * other than 17 and 6 should be OK. 
+ * FIXME: If the above assumption is wrong. + */ +#define XPRT_TRANSPORT_CUDP 18 /* * RPC slot table sizes for UDP, TCP transports diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c index beee6da..73f8048 100644 --- a/net/sunrpc/rpcb_clnt.c +++ b/net/sunrpc/rpcb_clnt.c @@ -135,7 +135,7 @@ static struct rpc_clnt *rpcb_create_local(struct sockaddr *addr, size_t addrlen, u32 version) { struct rpc_create_args args = { - .protocol = XPRT_TRANSPORT_UDP, + .protocol = XPRT_TRANSPORT_CUDP, .address = addr, .addrsize = addrlen, .servername = "localhost", diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index f412a85..7e605f1 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -898,6 +898,11 @@ void xprt_transmit(struct rpc_task *task) req->rq_connect_cookie = xprt->connect_cookie; req->rq_xtime = jiffies; status = xprt->ops->send_request(task); + if (status == -ECONNREFUSED) { + dprintk("RPC: send_request returned (%d) in xprt_transmit\n", + status); + rpc_exit(task, status); + } if (status != 0) { task->tk_status = status; return; diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 83c73c4..6cc24c3 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -193,6 +193,8 @@ static ctl_table sunrpc_table[] = { */ #define XS_IDLE_DISC_TO (5U * 60 * HZ) +#define IPPROTO_CUDP XPRT_TRANSPORT_CUDP + #ifdef RPC_DEBUG # undef RPC_DEBUG_DATA # define RPCDBG_FACILITY RPCDBG_TRANS @@ -1832,6 +1834,100 @@ out: xprt_wake_pending_tasks(xprt, status); } +/** + * xs_cudp_connect_worker4 - set up a connected UDP socket + * @work: RPC transport to connect + * + * Invoked by a work queue tasklet. 
+ */ +static void xs_cudp_connect_worker4(struct work_struct *work) +{ + struct sock_xprt *transport = + container_of(work, struct sock_xprt, connect_worker.work); + struct rpc_xprt *xprt = &transport->xprt; + struct socket *sock = transport->sock; + int err, status = -EIO; + + if (xprt->shutdown) + goto out; + + /* Start by resetting any existing state */ + xs_reset_transport(transport); + + err = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock); + if (err < 0) { + dprintk("RPC: can't create UDP transport socket (%d).\n", -err); + goto out; + } + xs_reclassify_socket4(sock); + + if (xs_bind4(transport, sock)) { + sock_release(sock); + goto out; + } + + dprintk("RPC: worker connecting xprt %p to address: %s\n", + xprt, xprt->address_strings[RPC_DISPLAY_ALL]); + + xs_udp_finish_connecting(xprt, sock); + err = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK); + if (err < 0) { + dprintk("RPC: connect on UDP socket.. failed (%d).\n", -err); + goto out; + } + status = 0; +out: + xprt_clear_connecting(xprt); + xprt_wake_pending_tasks(xprt, status); +} + +/** + * xs_cudp_connect_worker6 - set up a Connected UDP socket + * @work: RPC transport to connect + * + * Invoked by a work queue tasklet. 
+ */ +static void xs_cudp_connect_worker6(struct work_struct *work) +{ + struct sock_xprt *transport = + container_of(work, struct sock_xprt, connect_worker.work); + struct rpc_xprt *xprt = &transport->xprt; + struct socket *sock = transport->sock; + int err, status = -EIO; + + if (xprt->shutdown) + goto out; + + /* Start by resetting any existing state */ + xs_reset_transport(transport); + + err = sock_create_kern(PF_INET6, SOCK_DGRAM, IPPROTO_UDP, &sock); + if (err < 0) { + dprintk("RPC: can't create UDP transport socket (%d).\n", -err); + goto out; + } + xs_reclassify_socket6(sock); + + if (xs_bind6(transport, sock) < 0) { + sock_release(sock); + goto out; + } + + dprintk("RPC: worker connecting xprt %p to address: %s\n", + xprt, xprt->address_strings[RPC_DISPLAY_ALL]); + + xs_udp_finish_connecting(xprt, sock); + err = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK); + if (err < 0) { + dprintk("RPC: connect on UDP socket... failed (%d).\n", -err); + goto out; + } + status = 0; +out: + xprt_clear_connecting(xprt); + xprt_wake_pending_tasks(xprt, status); +} + /* * We need to preserve the port number so the reply cache on the server can * find our cached RPC replies when we get around to reconnecting. 
@@ -2192,6 +2288,24 @@ static struct rpc_xprt_ops xs_tcp_ops = { .print_stats = xs_tcp_print_stats, }; +static struct rpc_xprt_ops xs_cudp_ops = { + .set_buffer_size = xs_udp_set_buffer_size, + .reserve_xprt = xprt_reserve_xprt_cong, + .release_xprt = xprt_release_xprt_cong, + .rpcbind = rpcb_getport_async, + .set_port = xs_set_port, + .connect = xs_tcp_connect, + .buf_alloc = rpc_malloc, + .buf_free = rpc_free, + .send_request = xs_udp_send_request, + .set_retrans_timeout = xprt_set_retrans_timeout_rtt, + .timer = xs_udp_timer, + .release_request = xprt_release_rqst_cong, + .close = xs_close, + .destroy = xs_destroy, + .print_stats = xs_udp_print_stats, +}; + static struct rpc_xprt *xs_setup_xprt(struct xprt_create *args, unsigned int slot_table_size) { @@ -2298,6 +2412,68 @@ static struct rpc_xprt *xs_setup_udp(struct xprt_create *args) return ERR_PTR(-EINVAL); } +/** + * xs_setup_cudp - Set up transport to use a connected UDP socket + * @args: rpc transport creation arguments + * + */ +static struct rpc_xprt *xs_setup_cudp(struct xprt_create *args) +{ + struct sockaddr *addr = args->dstaddr; + struct rpc_xprt *xprt; + struct sock_xprt *transport; + + xprt = xs_setup_xprt(args, xprt_udp_slot_table_entries); + if (IS_ERR(xprt)) + return xprt; + transport = container_of(xprt, struct sock_xprt, xprt); + + xprt->prot = IPPROTO_CUDP; + xprt->tsh_size = 0; + /* XXX: header size can vary due to auth type, IPv6, etc. 
*/ + xprt->max_payload = (1U << 16) - (MAX_HEADER << 3); + + xprt->bind_timeout = XS_BIND_TO; + xprt->connect_timeout = XS_UDP_CONN_TO; + xprt->reestablish_timeout = XS_UDP_REEST_TO; + xprt->idle_timeout = XS_IDLE_DISC_TO; + + xprt->ops = &xs_cudp_ops; + xprt->timeout = &xs_udp_default_timeout; + + switch (addr->sa_family) { + case AF_INET: + if (((struct sockaddr_in *)addr)->sin_port != htons(0)) + xprt_set_bound(xprt); + + INIT_DELAYED_WORK(&transport->connect_worker, + xs_cudp_connect_worker4); + xs_format_ipv4_peer_addresses(xprt, "cudp", RPCBIND_NETID_UDP); + break; + case AF_INET6: + if (((struct sockaddr_in6 *)addr)->sin6_port != htons(0)) + xprt_set_bound(xprt); + + INIT_DELAYED_WORK(&transport->connect_worker, + xs_cudp_connect_worker6); + xs_format_ipv6_peer_addresses(xprt, "cudp", RPCBIND_NETID_UDP); + break; + default: + kfree(xprt); + return ERR_PTR(-EAFNOSUPPORT); + } + + dprintk("RPC: set up transport to address %s\n", + xprt->address_strings[RPC_DISPLAY_ALL]); + + if (try_module_get(THIS_MODULE)) + return xprt; + + kfree(xprt->slot); + kfree(xprt); + return ERR_PTR(-EINVAL); +} + static const struct rpc_timeout xs_tcp_default_timeout = { .to_initval = 60 * HZ, .to_maxval = 60 * HZ, @@ -2379,6 +2555,14 @@ static struct xprt_class xs_tcp_transport = { .setup = xs_setup_tcp, }; +static struct xprt_class xs_cudp_transport = { + .list = LIST_HEAD_INIT(xs_cudp_transport.list), + .name = "cudp", + .owner = THIS_MODULE, + .ident = IPPROTO_CUDP, + .setup = xs_setup_cudp, +}; + /** * init_socket_xprt - set up xprtsock's sysctls, register with RPC client * @@ -2392,6 +2576,7 @@ int init_socket_xprt(void) xprt_register_transport(&xs_udp_transport); xprt_register_transport(&xs_tcp_transport); + xprt_register_transport(&xs_cudp_transport); return 0; } @@ -2411,4 +2596,5 @@ void cleanup_socket_xprt(void) xprt_unregister_transport(&xs_udp_transport); xprt_unregister_transport(&xs_tcp_transport); + xprt_unregister_transport(&xs_cudp_transport); } > I've thought 
about this some more... > > It seems to me that you might be better off using the existing UDP > transport code, but adding a new RPC_CLNT_CREATE_ flag to enable > connected UDP semantics. The two transports are otherwise exactly the > same. > It doesn't seem that there is a clean way of doing this. The function xs_setup_udp() sets up the corresponding connect_worker function which actually sets up the UDP socket. There doesn't seem to be a way to check whether this flag (or a new rpc_clnt->cl_ flag) is set or not in either of the functions. OTOH, why not use AF_LOCAL sockets since it's for local communication only? Thanks, -- Suresh Jayaraman ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. 2009-07-06 12:42 ` Suresh Jayaraman @ 2009-07-06 14:30 ` Chuck Lever 2009-07-06 16:08 ` Suresh Jayaraman 0 siblings, 1 reply; 25+ messages in thread From: Chuck Lever @ 2009-07-06 14:30 UTC (permalink / raw) To: Suresh Jayaraman; +Cc: Neil Brown, Trond Myklebust, Linux NFS mailing list On Jul 6, 2009, at 8:42 AM, Suresh Jayaraman wrote: > Chuck Lever wrote: >> On Jun 11, 2009, at 11:44 AM, Chuck Lever wrote: >>> On Jun 11, 2009, at 12:48 AM, Neil Brown wrote: > >>> >>>> How hard would it be to add (optional) connected UDP support? >>>> Would >>>> we just make the code more like the TCP version, or are there any >>>> gotchas that you know of that we would need to be careful of? >>> >>> The code in net/sunrpc/xprtsock.c is a bunch of transport methods, >>> many of which are shared between the UDP and TCP transport >>> capabilities. You could probably do this easily by creating a new >>> xprt_class structure and a new ops vector, then reuse as many UDP >>> methods as possible. The TCP connect method could be usable as is, >>> but it would be simple to copy-n-paste a new one if some variation >>> is >>> required. Then, define a new XPRT_ value, and use that in >>> rpcb_create_local(). > > I attempted a patch based on your suggestions, while the socket seems > to be getting the -ECONNREFUSED error, but it isn't propagating all > the > way up (yet to debug, why) I suspect it's because a while ago Trond changed the connect logic to retry everything, including ECONNREFUSED. I've hit this problem recently as well. The kernel's NFS mount client needs rpcbind to recognize when a port is not active so it can stop retrying that port and switch to a different transport, just as mount.nfs does. We will need to add a mechanism to allow ECONNREFUSED to be propagated up the stack again. Trond suggested adding a per-RPC flag (on rpc_call_sync()) to do this. 
Connection semantics seem to me to be an attribute of the transport, not of a single RPC, though. And, the protocol where we really need this is rpcbind, which usually creates a transport to send just a single RPC. > include/linux/sunrpc/xprtsock.h | 6 ++ > net/sunrpc/rpcb_clnt.c | 2 +- > net/sunrpc/xprt.c | 5 + > net/sunrpc/xprtsock.c | 186 ++++++++++++++++++++++++++++++ > +++++++++ > 4 files changed, 198 insertions(+), 1 deletions(-) > > diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/ > xprtsock.h > index c2a46c4..3f05a45 100644 > --- a/include/linux/sunrpc/xprtsock.h > +++ b/include/linux/sunrpc/xprtsock.h > @@ -22,6 +22,12 @@ void cleanup_socket_xprt(void); > */ > #define XPRT_TRANSPORT_UDP IPPROTO_UDP > #define XPRT_TRANSPORT_TCP IPPROTO_TCP > +/* > + * Connected UDP doesn't seem to be a well-defined protocol, > picking a number > + * other than 17 and 6 should be OK. > + * FIXME: If the above assumption is wrong. > + */ > +#define XPRT_TRANSPORT_CUDP 18 > > /* > * RPC slot table sizes for UDP, TCP transports > diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c > index beee6da..73f8048 100644 > --- a/net/sunrpc/rpcb_clnt.c > +++ b/net/sunrpc/rpcb_clnt.c > @@ -135,7 +135,7 @@ static struct rpc_clnt *rpcb_create_local(struct > sockaddr *addr, > size_t addrlen, u32 version) > { > struct rpc_create_args args = { > - .protocol = XPRT_TRANSPORT_UDP, > + .protocol = XPRT_TRANSPORT_CUDP, > .address = addr, > .addrsize = addrlen, > .servername = "localhost", > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index f412a85..7e605f1 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -898,6 +898,11 @@ void xprt_transmit(struct rpc_task *task) > req->rq_connect_cookie = xprt->connect_cookie; > req->rq_xtime = jiffies; > status = xprt->ops->send_request(task); > + if (status == -ECONNREFUSED) { > + dprintk("RPC: send_request returned (%d) in xprt_transmit\n", > + status); > + rpc_exit(task, status); > + } > if (status != 0) { 
> task->tk_status = status; > return; > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index 83c73c4..6cc24c3 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -193,6 +193,8 @@ static ctl_table sunrpc_table[] = { > */ > #define XS_IDLE_DISC_TO (5U * 60 * HZ) > > +#define IPPROTO_CUDP XPRT_TRANSPORT_CUDP > + > #ifdef RPC_DEBUG > # undef RPC_DEBUG_DATA > # define RPCDBG_FACILITY RPCDBG_TRANS > @@ -1832,6 +1834,100 @@ out: > xprt_wake_pending_tasks(xprt, status); > } > > +/** > + * xs_cudp_connect_worker4 - set up a connected UDP socket > + * @work: RPC transport to connect > + * > + * Invoked by a work queue tasklet. > + */ > +static void xs_cudp_connect_worker4(struct work_struct *work) > +{ > + struct sock_xprt *transport = > + container_of(work, struct sock_xprt, connect_worker.work); > + struct rpc_xprt *xprt = &transport->xprt; > + struct socket *sock = transport->sock; > + int err, status = -EIO; > + > + if (xprt->shutdown) > + goto out; > + > + /* Start by resetting any existing state */ > + xs_reset_transport(transport); > + > + err = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock); > + if (err < 0) { > + dprintk("RPC: can't create UDP transport socket (%d).\n", - > err); > + goto out; > + } > + xs_reclassify_socket4(sock); > + > + if (xs_bind4(transport, sock)) { > + sock_release(sock); > + goto out; > + } > + > + dprintk("RPC: worker connecting xprt %p to address: %s\n", > + xprt, xprt->address_strings[RPC_DISPLAY_ALL]); > + > + xs_udp_finish_connecting(xprt, sock); > + err = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, > O_NONBLOCK); > + if (err < 0) { > + dprintk("RPC: connect on UDP socket.. failed (%d).\n", -err); > + goto out; > + } > + status = 0; > +out: > + xprt_clear_connecting(xprt); > + xprt_wake_pending_tasks(xprt, status); > +} > + > +/** > + * xs_cudp_connect_worker6 - set up a Connected UDP socket > + * @work: RPC transport to connect > + * > + * Invoked by a work queue tasklet. 
> + */ > +static void xs_cudp_connect_worker6(struct work_struct *work) > +{ > + struct sock_xprt *transport = > + container_of(work, struct sock_xprt, connect_worker.work); > + struct rpc_xprt *xprt = &transport->xprt; > + struct socket *sock = transport->sock; > + int err, status = -EIO; > + > + if (xprt->shutdown) > + goto out; > + > + /* Start by resetting any existing state */ > + xs_reset_transport(transport); > + > + err = sock_create_kern(PF_INET6, SOCK_DGRAM, IPPROTO_UDP, &sock); > + if (err < 0) { > + dprintk("RPC: can't create UDP transport socket (%d).\n", - > err); > + goto out; > + } > + xs_reclassify_socket6(sock); > + > + if (xs_bind6(transport, sock) < 0) { > + sock_release(sock); > + goto out; > + } > + > + dprintk("RPC: worker connecting xprt %p to address: %s\n", > + xprt, xprt->address_strings[RPC_DISPLAY_ALL]); > + > + xs_udp_finish_connecting(xprt, sock); > + err = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, > O_NONBLOCK); > + if (err < 0) { > + dprintk("RPC: connect on UDP socket... failed (%d).\n", -err); > + goto out; > + } > + status = 0; > +out: > + xprt_clear_connecting(xprt); > + xprt_wake_pending_tasks(xprt, status); > +} > + > /* > * We need to preserve the port number so the reply cache on the > server can > * find our cached RPC replies when we get around to reconnecting. 
> @@ -2192,6 +2288,24 @@ static struct rpc_xprt_ops xs_tcp_ops = { > .print_stats = xs_tcp_print_stats, > }; > > +static struct rpc_xprt_ops xs_cudp_ops = { > + .set_buffer_size = xs_udp_set_buffer_size, > + .reserve_xprt = xprt_reserve_xprt_cong, > + .release_xprt = xprt_release_xprt_cong, > + .rpcbind = rpcb_getport_async, > + .set_port = xs_set_port, > + .connect = xs_tcp_connect, > + .buf_alloc = rpc_malloc, > + .buf_free = rpc_free, > + .send_request = xs_udp_send_request, > + .set_retrans_timeout = xprt_set_retrans_timeout_rtt, > + .timer = xs_udp_timer, > + .release_request = xprt_release_rqst_cong, > + .close = xs_close, > + .destroy = xs_destroy, > + .print_stats = xs_udp_print_stats, > +}; > + > static struct rpc_xprt *xs_setup_xprt(struct xprt_create *args, > unsigned int slot_table_size) > { > @@ -2298,6 +2412,68 @@ static struct rpc_xprt *xs_setup_udp(struct > xprt_create *args) > return ERR_PTR(-EINVAL); > } > > +/** > + * xs_setup_cudp - Set up transport to use a connected UDP socket > + * @args: rpc transport creation arguments > + * > + */ > +static struct rpc_xprt *xs_setup_cudp(struct xprt_create *args) > +{ > + struct sockaddr *addr = args->dstaddr; > + struct rpc_xprt *xprt; > + struct sock_xprt *transport; > + > + xprt = xs_setup_xprt(args, xprt_udp_slot_table_entries); > + if (IS_ERR(xprt)) > + return xprt; > + transport = container_of(xprt, struct sock_xprt, xprt); > + > + xprt->prot = IPPROTO_CUDP; > + xprt->tsh_size = 0; > + /* XXX: header size can vary due to auth type, IPv6, etc. 
*/ > + xprt->max_payload = (1U << 16) - (MAX_HEADER << 3); > + > + xprt->bind_timeout = XS_BIND_TO; > + xprt->connect_timeout = XS_UDP_CONN_TO; > + xprt->reestablish_timeout = XS_UDP_REEST_TO; > + xprt->idle_timeout = XS_IDLE_DISC_TO; > + > + xprt->ops = &xs_cudp_ops; > + xprt->timeout = &xs_udp_default_timeout; > + > + switch (addr->sa_family) { > + case AF_INET: > + if (((struct sockaddr_in *)addr)->sin_port != htons(0)) > + xprt_set_bound(xprt); > + > + INIT_DELAYED_WORK(&transport->connect_worker, > + xs_cudp_connect_worker4); > + xs_format_ipv4_peer_addresses(xprt, "cudp", RPCBIND_NETID_UDP); > + break; > + case AF_INET6: > + if (((struct sockaddr_in6 *)addr)->sin6_port != htons(0)) > + xprt_set_bound(xprt); > + > + INIT_DELAYED_WORK(&transport->connect_worker, > + xs_cudp_connect_worker6); > + xs_format_ipv6_peer_addresses(xprt, "cudp", RPCBIND_NETID_UDP); > + break; > + default: > + kfree(xprt); > + return ERR_PTR(-EAFNOSUPPORT); > + } > + > + dprintk("RPC: set up transport to address %s\n", > + xprt->address_strings[RPC_DISPLAY_ALL]); > + > + if (try_module_get(THIS_MODULE)) > + return xprt; > + > + kfree(xprt->slot); > + kfree(xprt); > + return ERR_PTR(-EINVAL); > +} > + > static const struct rpc_timeout xs_tcp_default_timeout = { > .to_initval = 60 * HZ, > .to_maxval = 60 * HZ, > @@ -2379,6 +2555,14 @@ static struct xprt_class xs_tcp_transport = { > .setup = xs_setup_tcp, > }; > > +static struct xprt_class xs_cudp_transport = { > + .list = LIST_HEAD_INIT(xs_cudp_transport.list), > + .name = "cudp", > + .owner = THIS_MODULE, > + .ident = IPPROTO_CUDP, > + .setup = xs_setup_cudp, > +}; > + > /** > * init_socket_xprt - set up xprtsock's sysctls, register with RPC > client > * > @@ -2392,6 +2576,7 @@ int init_socket_xprt(void) > > xprt_register_transport(&xs_udp_transport); > xprt_register_transport(&xs_tcp_transport); > + xprt_register_transport(&xs_cudp_transport); > > return 0; > } > @@ -2411,4 +2596,5 @@ void cleanup_socket_xprt(void) > > 
xprt_unregister_transport(&xs_udp_transport); > xprt_unregister_transport(&xs_tcp_transport); > + xprt_unregister_transport(&xs_cudp_transport); > } > >> I've thought about this some more... >> >> It seems to me that you might be better off using the existing UDP >> transport code, but adding a new RPC_CLNT_CREATE_ flag to enable >> connected UDP semantics. The two transports are otherwise exactly >> the >> same. >> > > It doesn't seem that there is a clean way of doing this. The function > xs_setup_udp() sets up the corresponding connect_worker function which > actually sets up the UDP socket. There doesn't seem to be a way to > check > whether this flag (or a new rpc_clnt->cl_ flag) is set or not in > either of > the functions. > > OTOH, why not use AF_LOCAL sockets since it's for local > communication only? > > > Thanks, > > > -- > Suresh Jayaraman -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. 2009-07-06 14:30 ` Chuck Lever @ 2009-07-06 16:08 ` Suresh Jayaraman 2009-07-06 16:22 ` Trond Myklebust 2009-07-06 16:31 ` Chuck Lever 0 siblings, 2 replies; 25+ messages in thread From: Suresh Jayaraman @ 2009-07-06 16:08 UTC (permalink / raw) To: Chuck Lever; +Cc: Neil Brown, Trond Myklebust, Linux NFS mailing list Chuck Lever wrote: > On Jul 6, 2009, at 8:42 AM, Suresh Jayaraman wrote: >> Chuck Lever wrote: >>> On Jun 11, 2009, at 11:44 AM, Chuck Lever wrote: >>>> On Jun 11, 2009, at 12:48 AM, Neil Brown wrote: >> >>>> >>>>> How hard would it be to add (optional) connected UDP support? Would >>>>> we just make the code more like the TCP version, or are there any >>>>> gotchas that you know of that we would need to be careful of? >>>> >>>> The code in net/sunrpc/xprtsock.c is a bunch of transport methods, >>>> many of which are shared between the UDP and TCP transport >>>> capabilities. You could probably do this easily by creating a new >>>> xprt_class structure and a new ops vector, then reuse as many UDP >>>> methods as possible. The TCP connect method could be usable as is, >>>> but it would be simple to copy-n-paste a new one if some variation is >>>> required. Then, define a new XPRT_ value, and use that in >>>> rpcb_create_local(). >> >> I attempted a patch based on your suggestions, while the socket seems >> to be getting the -ECONNREFUSED error, but it isn't propagating all the >> way up (yet to debug, why) > > I suspect it's because a while ago Trond changed the connect logic to > retry everything, including ECONNREFUSED. > I've hit this problem recently as well. The kernel's NFS mount client > needs rpcbind to recognize when a port is not active so it can stop > retrying that port and switch to a different transport, just as > mount.nfs does. > > We will need to add a mechanism to allow ECONNREFUSED to be propagated > up the stack again. 
Trond suggested adding a per-RPC flag (on > rpc_call_sync()) to do this. Connection semantics seem to me to be an > attribute of the transport, not of a single RPC, though. And, the > protocol where we really need this is rpcbind, which usually creates a > transport to send just a single RPC. > Ah ok, good to know this. BTW, it seems my questions on using RPC_CLNT_CREATE_ flag and using AF_LOCAL sockets got overshadowed (seen below) by the patch. Would making rpcbind using AF_LOCAL sockets a good idea or connected UDP still seems a better solution? >> >>> I've thought about this some more... >>> >>> It seems to me that you might be better off using the existing UDP >>> transport code, but adding a new RPC_CLNT_CREATE_ flag to enable >>> connected UDP semantics. The two transports are otherwise exactly the >>> same. >>> >> >> It doesn't seem that there is a clean way of doing this. The function >> xs_setup_udp() sets up the corresponding connect_worker function which >> actually sets up the UDP socket. There doesn't seem to be a way to check >> whether this flag (or a new rpc_clnt->cl_ flag) is set or not in >> either of >> the functions. >> >> OTOH, why not use AF_LOCAL sockets since it's for local communication >> only? >> Thanks, -- Suresh Jayaraman ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations. 2009-07-06 16:08 ` Suresh Jayaraman @ 2009-07-06 16:22 ` Trond Myklebust 2009-07-06 16:31 ` Chuck Lever 1 sibling, 0 replies; 25+ messages in thread From: Trond Myklebust @ 2009-07-06 16:22 UTC (permalink / raw) To: Suresh Jayaraman; +Cc: Chuck Lever, Neil Brown, Linux NFS mailing list On Mon, 2009-07-06 at 21:38 +0530, Suresh Jayaraman wrote: > BTW, it seems my questions on using RPC_CLNT_CREATE_ flag and using > AF_LOCAL sockets got overshadowed (seen below) by the patch. Would > making rpcbind using AF_LOCAL sockets a good idea or connected UDP still > seems a better solution? The kernel RPC client doesn't support AF_LOCAL sockets. Why would we want to add them when we already have other mechanisms to communicate with userland that provide the same functionality? Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations.
  2009-07-06 16:08           ` Suresh Jayaraman
  2009-07-06 16:22             ` Trond Myklebust
@ 2009-07-06 16:31             ` Chuck Lever
  2009-07-06 16:40               ` Trond Myklebust
  1 sibling, 1 reply; 25+ messages in thread
From: Chuck Lever @ 2009-07-06 16:31 UTC (permalink / raw)
To: Suresh Jayaraman; +Cc: Neil Brown, Trond Myklebust, Linux NFS mailing list

On Jul 6, 2009, at 12:08 PM, Suresh Jayaraman wrote:
> Chuck Lever wrote:
>> On Jul 6, 2009, at 8:42 AM, Suresh Jayaraman wrote:
>>> Chuck Lever wrote:
>>>> On Jun 11, 2009, at 11:44 AM, Chuck Lever wrote:
>>>>> On Jun 11, 2009, at 12:48 AM, Neil Brown wrote:
>>>>>
>>>>>> How hard would it be to add (optional) connected UDP support?
>>>>>> Would we just make the code more like the TCP version, or are
>>>>>> there any gotchas that you know of that we would need to be
>>>>>> careful of?
>>>>>
>>>>> The code in net/sunrpc/xprtsock.c is a bunch of transport
>>>>> methods, many of which are shared between the UDP and TCP
>>>>> transport capabilities.  You could probably do this easily by
>>>>> creating a new xprt_class structure and a new ops vector, then
>>>>> reuse as many UDP methods as possible.  The TCP connect method
>>>>> could be usable as is, but it would be simple to copy-n-paste a
>>>>> new one if some variation is required.  Then, define a new XPRT_
>>>>> value, and use that in rpcb_create_local().
>>>
>>> I attempted a patch based on your suggestions.  While the socket
>>> seems to be getting the -ECONNREFUSED error, it isn't propagating
>>> all the way up (yet to debug why).
>>
>> I suspect it's because a while ago Trond changed the connect logic
>> to retry everything, including ECONNREFUSED.
>>
>> I've hit this problem recently as well.  The kernel's NFS mount
>> client needs rpcbind to recognize when a port is not active so it
>> can stop retrying that port and switch to a different transport,
>> just as mount.nfs does.
>>
>> We will need to add a mechanism to allow ECONNREFUSED to be
>> propagated up the stack again.  Trond suggested adding a per-RPC
>> flag (on rpc_call_sync()) to do this.  Connection semantics seem to
>> me to be an attribute of the transport, not of a single RPC,
>> though.  And the protocol where we really need this is rpcbind,
>> which usually creates a transport to send just a single RPC.
>
> Ah, OK, good to know this.
>
> BTW, it seems my questions on using an RPC_CLNT_CREATE_ flag and
> using AF_LOCAL sockets got overshadowed (see below) by the patch.
> Would making rpcbind use AF_LOCAL sockets be a good idea, or does
> connected UDP still seem a better solution?

See below.

>>>> I've thought about this some more...
>>>>
>>>> It seems to me that you might be better off using the existing
>>>> UDP transport code, but adding a new RPC_CLNT_CREATE_ flag to
>>>> enable connected UDP semantics.  The two transports are otherwise
>>>> exactly the same.
>>>
>>> It doesn't seem that there is a clean way of doing this.  The
>>> function xs_setup_udp() sets up the corresponding connect_worker
>>> function, which actually sets up the UDP socket.  There doesn't
>>> seem to be a way to check whether this flag (or a new
>>> rpc_clnt->cl_ flag) is set in either of these functions.

See xs_tcp_connect().  On connect, you get an rpc_task structure which
contains an rpc_clnt pointer, and can use that to switch connection
semantics.

>>> OTOH, why not use AF_LOCAL sockets, since it's for local
>>> communication only?

I have considered that.  AF_LOCAL in fact could replace all of our
upcall mechanisms.  However, portmapper, which doesn't support
AF_LOCAL, is still used in some distributions.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 25+ messages in thread
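[The connected-UDP behaviour discussed above can be demonstrated entirely in user space: on a connected UDP socket, the ICMP port-unreachable elicited by a closed port comes back as ECONNREFUSED on a later call on that socket, which is exactly the signal the thread says the kernel client currently discards. A minimal sketch; the port number is an arbitrary value assumed to be closed on loopback:]

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
	struct sockaddr_in sin;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return 1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(45111);	/* assumed-closed UDP port */
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	/* connect() on a datagram socket just records the peer;
	 * no packets are exchanged */
	if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		return 1;

	/* the first send elicits an ICMP port-unreachable */
	send(fd, "hi", 2, 0);
	usleep(100000);		/* let the error reach the socket */

	/* on a *connected* socket, that error is reported on the
	 * next system call; an unconnected socket would silently
	 * drop it and keep retrying */
	if (send(fd, "hi", 2, 0) < 0 && errno == ECONNREFUSED)
		printf("ECONNREFUSED\n");
	else
		printf("no error reported\n");
	close(fd);
	return 0;
}
```

[On an unconnected UDP socket the same two sends both appear to succeed, which is why an RPC client using one can only discover a dead portmapper by timing out.]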
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations.
  2009-07-06 16:31             ` Chuck Lever
@ 2009-07-06 16:40               ` Trond Myklebust
  [not found]                     ` <1246898450.11267.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2009-07-06 16:40 UTC (permalink / raw)
To: Chuck Lever; +Cc: Suresh Jayaraman, Neil Brown, Linux NFS mailing list

On Mon, 2009-07-06 at 12:31 -0400, Chuck Lever wrote:
> I have considered that.  AF_LOCAL in fact could replace all of our
> upcall mechanisms.  However, portmapper, which doesn't support
> AF_LOCAL, is still used in some distributions.

As could AF_NETLINK, fork(), pipes, fifos, etc...  Again: why would we
want to saddle ourselves with RPC over AF_LOCAL?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <1246898450.11267.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations.
  [not found]                     ` <1246898450.11267.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-07-06 16:57                       ` Chuck Lever
  2009-07-06 17:14                         ` Trond Myklebust
  0 siblings, 1 reply; 25+ messages in thread
From: Chuck Lever @ 2009-07-06 16:57 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Suresh Jayaraman, Neil Brown, Linux NFS mailing list

On Jul 6, 2009, at 12:40 PM, Trond Myklebust wrote:
> On Mon, 2009-07-06 at 12:31 -0400, Chuck Lever wrote:
>> I have considered that.  AF_LOCAL in fact could replace all of our
>> upcall mechanisms.  However, portmapper, which doesn't support
>> AF_LOCAL, is still used in some distributions.
>
> As could AF_NETLINK, fork(), pipes, fifos, etc...  Again: why would we
> want to saddle ourselves with RPC over AF_LOCAL?

TI-RPC supports AF_LOCAL RPC transports.

[cel@matisse notify-one]$ rpcinfo
   program version netid     address                service    owner
    100000    4    tcp6      ::.0.111               portmapper superuser
    100000    3    tcp6      ::.0.111               portmapper superuser
    100000    4    udp6      ::.0.111               portmapper superuser
    100000    3    udp6      ::.0.111               portmapper superuser
    100000    4    tcp       0.0.0.0.0.111          portmapper superuser
    100000    3    tcp       0.0.0.0.0.111          portmapper superuser
    100000    2    tcp       0.0.0.0.0.111          portmapper superuser
    100000    4    udp       0.0.0.0.0.111          portmapper superuser
    100000    3    udp       0.0.0.0.0.111          portmapper superuser
    100000    2    udp       0.0.0.0.0.111          portmapper superuser
    100000    4    local     /var/run/rpcbind.sock  portmapper superuser
    100000    3    local     /var/run/rpcbind.sock  portmapper superuser
    100024    1    udp       0.0.0.0.206.127        status     29
    100024    1    tcp       0.0.0.0.166.105        status     29
    100024    1    udp6      ::.141.238             status     29
    100024    1    tcp6      ::.192.160             status     29
[cel@matisse notify-one]$

The listing for '/var/run/rpcbind.sock' is rpcbind's AF_LOCAL
listener.  TI-RPC's rpcb_foo() calls use this method of accessing the
rpcbind database rather than going over loopback.

rpcbind scrapes the caller's effective UID off the transport socket
and uses that for authentication.  Note the "owner" column... that
comes from the socket's UID, not from the r_owner field.  When a
service is registered over the network, the owner column says
"unknown" and basically anyone can unset it.

If the kernel used AF_LOCAL to register its services, it would mean we
would never use a network port for local rpcbind calls between the
kernel and rpcbind, and rpcbind could automatically prevent the
kernel's RPC services from getting unset by malicious users.  If
/var/run/rpcbind.sock isn't there, the kernel would know immediately
that rpcbind wasn't running.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations.
  2009-07-06 16:57                       ` Chuck Lever
@ 2009-07-06 17:14                         ` Trond Myklebust
  [not found]                                 ` <1246900456.11267.34.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2009-07-06 17:14 UTC (permalink / raw)
To: Chuck Lever; +Cc: Suresh Jayaraman, Neil Brown, Linux NFS mailing list

On Mon, 2009-07-06 at 12:57 -0400, Chuck Lever wrote:
> On Jul 6, 2009, at 12:40 PM, Trond Myklebust wrote:
>> On Mon, 2009-07-06 at 12:31 -0400, Chuck Lever wrote:
>>> I have considered that.  AF_LOCAL in fact could replace all of our
>>> upcall mechanisms.  However, portmapper, which doesn't support
>>> AF_LOCAL, is still used in some distributions.
>>
>> As could AF_NETLINK, fork(), pipes, fifos, etc...  Again: why would
>> we want to saddle ourselves with RPC over AF_LOCAL?
>
> TI-RPC supports AF_LOCAL RPC transports.
>
> [cel@matisse notify-one]$ rpcinfo
>    program version netid     address                service    owner
>     100000    4    tcp6      ::.0.111               portmapper superuser
>     100000    3    tcp6      ::.0.111               portmapper superuser
>     100000    4    udp6      ::.0.111               portmapper superuser
>     100000    3    udp6      ::.0.111               portmapper superuser
>     100000    4    tcp       0.0.0.0.0.111          portmapper superuser
>     100000    3    tcp       0.0.0.0.0.111          portmapper superuser
>     100000    2    tcp       0.0.0.0.0.111          portmapper superuser
>     100000    4    udp       0.0.0.0.0.111          portmapper superuser
>     100000    3    udp       0.0.0.0.0.111          portmapper superuser
>     100000    2    udp       0.0.0.0.0.111          portmapper superuser
>     100000    4    local     /var/run/rpcbind.sock  portmapper superuser
>     100000    3    local     /var/run/rpcbind.sock  portmapper superuser
>     100024    1    udp       0.0.0.0.206.127        status     29
>     100024    1    tcp       0.0.0.0.166.105        status     29
>     100024    1    udp6      ::.141.238             status     29
>     100024    1    tcp6      ::.192.160             status     29
> [cel@matisse notify-one]$
>
> The listing for '/var/run/rpcbind.sock' is rpcbind's AF_LOCAL
> listener.  TI-RPC's rpcb_foo() calls use this method of accessing the
> rpcbind database rather than going over loopback.
>
> rpcbind scrapes the caller's effective UID off the transport socket
> and uses that for authentication.  Note the "owner" column... that
> comes from the socket's UID, not from the r_owner field.  When a
> service is registered over the network, the owner column says
> "unknown" and basically anyone can unset it.
>
> If the kernel used AF_LOCAL to register its services, it would mean
> we would never use a network port for local rpcbind calls between
> the kernel and rpcbind, and rpcbind could automatically prevent the
> kernel's RPC services from getting unset by malicious users.  If
> /var/run/rpcbind.sock isn't there, the kernel would know immediately
> that rpcbind wasn't running.

So what? You can achieve the same with any number of communication
channels (including the network).  Just add a timeout to the current
'connect()' function, and set it to a low value when doing rpcbind
upcalls.

What's so special about libtirpc or rpcbind that we have to keep
redesigning the kernel to work around their limitations instead of the
other way round?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <1246900456.11267.34.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations.
  [not found]                                 ` <1246900456.11267.34.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-07-06 17:51                                   ` Chuck Lever
  2009-07-06 17:58                                     ` Trond Myklebust
  0 siblings, 1 reply; 25+ messages in thread
From: Chuck Lever @ 2009-07-06 17:51 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Suresh Jayaraman, Neil Brown, Linux NFS mailing list

On Jul 6, 2009, at 1:14 PM, Trond Myklebust wrote:
> On Mon, 2009-07-06 at 12:57 -0400, Chuck Lever wrote:
>> On Jul 6, 2009, at 12:40 PM, Trond Myklebust wrote:
>>> On Mon, 2009-07-06 at 12:31 -0400, Chuck Lever wrote:
>>>> I have considered that.  AF_LOCAL in fact could replace all of our
>>>> upcall mechanisms.  However, portmapper, which doesn't support
>>>> AF_LOCAL, is still used in some distributions.
>>>
>>> As could AF_NETLINK, fork(), pipes, fifos, etc...  Again: why would
>>> we want to saddle ourselves with RPC over AF_LOCAL?
>>
>> TI-RPC supports AF_LOCAL RPC transports.
>>
>> [cel@matisse notify-one]$ rpcinfo
>>    program version netid     address                service    owner
>>     100000    4    tcp6      ::.0.111               portmapper superuser
>>     100000    3    tcp6      ::.0.111               portmapper superuser
>>     100000    4    udp6      ::.0.111               portmapper superuser
>>     100000    3    udp6      ::.0.111               portmapper superuser
>>     100000    4    tcp       0.0.0.0.0.111          portmapper superuser
>>     100000    3    tcp       0.0.0.0.0.111          portmapper superuser
>>     100000    2    tcp       0.0.0.0.0.111          portmapper superuser
>>     100000    4    udp       0.0.0.0.0.111          portmapper superuser
>>     100000    3    udp       0.0.0.0.0.111          portmapper superuser
>>     100000    2    udp       0.0.0.0.0.111          portmapper superuser
>>     100000    4    local     /var/run/rpcbind.sock  portmapper superuser
>>     100000    3    local     /var/run/rpcbind.sock  portmapper superuser
>>     100024    1    udp       0.0.0.0.206.127        status     29
>>     100024    1    tcp       0.0.0.0.166.105        status     29
>>     100024    1    udp6      ::.141.238             status     29
>>     100024    1    tcp6      ::.192.160             status     29
>> [cel@matisse notify-one]$
>>
>> The listing for '/var/run/rpcbind.sock' is rpcbind's AF_LOCAL
>> listener.  TI-RPC's rpcb_foo() calls use this method of accessing
>> the rpcbind database rather than going over loopback.
>>
>> rpcbind scrapes the caller's effective UID off the transport socket
>> and uses that for authentication.  Note the "owner" column... that
>> comes from the socket's UID, not from the r_owner field.  When a
>> service is registered over the network, the owner column says
>> "unknown" and basically anyone can unset it.
>>
>> If the kernel used AF_LOCAL to register its services, it would mean
>> we would never use a network port for local rpcbind calls between
>> the kernel and rpcbind, and rpcbind could automatically prevent the
>> kernel's RPC services from getting unset by malicious users.  If
>> /var/run/rpcbind.sock isn't there, the kernel would know immediately
>> that rpcbind wasn't running.
>
> So what? You can achieve the same with any number of communication
> channels (including the network).  Just add a timeout to the current
> 'connect()' function, and set it to a low value when doing rpcbind
> upcalls.

I suggested such a scheme last year when we first discussed connected
UDP, and it was decided that especially short timeouts for local
rpcbind calls were not appropriate.

In general, however, the network layer does tell us immediately when
the service is not running (ICMP port unreachable or RST).  The
kernel's RPC client is basically ignoring that information.

> What's so special about libtirpc or rpcbind that we have to keep
> redesigning the kernel to work around their limitations instead of
> the other way round?

I'm not sure what you're referring to, specifically.  However, since
rpcbind is a standard network protocol, the kernel really does have to
talk the protocol correctly if we want to interoperate with non-Linux
implementations.  For local-only cases, we need to ensure that the
kernel is backwards compatible with portmapper.

In this case, Suresh and Neil are dealing with a problem that occurs
whether rpcbind or portmapper is running -- basically during shutdown,
if user space has killed those processes, the kernel waits for a bit
instead of deciding immediately that it should exit.  Nothing to do
with TI-RPC, though TI-RPC does offer a potential solution (AF_LOCAL).

In the mount.nfs case, user space uses RST/port unreachable
specifically for determining when the server does not support a
particular transport (see nfs_probe_port).  That code is actually
baked into the mount command; it's not part of the library.  If we
want to see version/transport negotiation in the kernel, then the
kernel rpcbind client has to have the ability to detect quickly when
the remote does not support the requested transport.  Again, nothing
to do with TI-RPC.

In both cases, it turns out that the library implementations in user
space already fail quickly.  RPC_CANTRECV is returned if an attempt is
made to send an rpcbind query to an inactive UDP port.
RPC_SYSTEMERROR/ECONNREFUSED is returned if an attempt is made to send
an rpcbind query to an inactive TCP port.  In my view, the kernel is
lacking here, and should be made to emulate user space more closely.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations.
  2009-07-06 17:51                                   ` Chuck Lever
@ 2009-07-06 17:58                                     ` Trond Myklebust
  [not found]                                             ` <1246903105.23966.4.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2009-07-06 17:58 UTC (permalink / raw)
To: Chuck Lever; +Cc: Suresh Jayaraman, Neil Brown, Linux NFS mailing list

On Mon, 2009-07-06 at 13:51 -0400, Chuck Lever wrote:
> In both cases, it turns out that the library implementations in user
> space already fail quickly.  RPC_CANTRECV is returned if an attempt
> is made to send an rpcbind query to an inactive UDP port.
> RPC_SYSTEMERROR/ECONNREFUSED is returned if an attempt is made to
> send an rpcbind query to an inactive TCP port.  In my view, the
> kernel is lacking here, and should be made to emulate user space more
> closely.

I fully agree that we can fix the above by more judicious
interpretation of the returned networking errors.  What I don't see
eye to eye with is the assertion that has been floating about that the
current behaviour should be a good enough reason to add kernel
AF_LOCAL support.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <1246903105.23966.4.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]
* Re: [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations.
  [not found]                                             ` <1246903105.23966.4.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2009-07-06 18:32                                               ` Chuck Lever
  0 siblings, 0 replies; 25+ messages in thread
From: Chuck Lever @ 2009-07-06 18:32 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Suresh Jayaraman, Neil Brown, Linux NFS mailing list

On Jul 6, 2009, at 1:58 PM, Trond Myklebust wrote:
> On Mon, 2009-07-06 at 13:51 -0400, Chuck Lever wrote:
>> In both cases, it turns out that the library implementations in user
>> space already fail quickly.  RPC_CANTRECV is returned if an attempt
>> is made to send an rpcbind query to an inactive UDP port.
>> RPC_SYSTEMERROR/ECONNREFUSED is returned if an attempt is made to
>> send an rpcbind query to an inactive TCP port.  In my view, the
>> kernel is lacking here, and should be made to emulate user space
>> more closely.
>
> I fully agree that we can fix the above by more judicious
> interpretation of the returned networking errors.  What I don't see
> eye to eye with is the assertion that has been floating about that
> the current behaviour should be a good enough reason to add kernel
> AF_LOCAL support.

Since portmapper doesn't support it, and the kernel (for now) needs
backwards compatibility with portmapper, AF_LOCAL for rpcbind is
rather a non-starter.  In my opinion there are other reasons to
consider kernel-level AF_LOCAL support at some later point.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2009-07-06 18:32 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-28 6:33 [nfsd PATCH 0/3] address issues with shutdown while portmap is dead NeilBrown
[not found] ` <20090528062730.15937.70579.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2009-05-28 6:33 ` [PATCH 1/3] nfsd: don't take nfsd_mutex twice when setting number of threads NeilBrown
[not found] ` <20090528063303.15937.55202.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2009-06-11 17:52 ` Jeff Layton
[not found] ` <20090611135255.0fa2f728-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
2009-06-11 18:01 ` Jeff Layton
2009-05-28 6:33 ` [PATCH 2/3] nfsd: optimise the starting of zero threads when none are running NeilBrown
[not found] ` <20090528063303.15937.57966.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2009-06-11 17:26 ` Jeff Layton
2009-05-28 6:33 ` [PATCH 3/3] sunrpc: reduce timeout when unregistering rpcbind registrations NeilBrown
[not found] ` <20090528063303.15937.62423.stgit-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2009-05-28 13:07 ` Tom Talpey
2009-06-11 4:49 ` Neil Brown
[not found] ` <18992.36038.267957.467326-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2009-06-11 15:02 ` Chuck Lever
2009-05-28 13:43 ` Chuck Lever
2009-06-11 4:48 ` Neil Brown
[not found] ` <18992.35996.986951.556723-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2009-06-11 15:44 ` Chuck Lever
2009-07-02 20:04 ` Chuck Lever
2009-07-06 12:42 ` Suresh Jayaraman
2009-07-06 14:30 ` Chuck Lever
2009-07-06 16:08 ` Suresh Jayaraman
2009-07-06 16:22 ` Trond Myklebust
2009-07-06 16:31 ` Chuck Lever
2009-07-06 16:40 ` Trond Myklebust
[not found] ` <1246898450.11267.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-07-06 16:57 ` Chuck Lever
2009-07-06 17:14 ` Trond Myklebust
[not found] ` <1246900456.11267.34.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-07-06 17:51 ` Chuck Lever
2009-07-06 17:58 ` Trond Myklebust
[not found] ` <1246903105.23966.4.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2009-07-06 18:32 ` Chuck Lever
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox