Message-ID: <1320695343.7987.7.camel@lade.trondhjem.org>
Subject: Re: [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error happens.
From: Trond Myklebust
To: NeilBrown
Cc: Chuck Lever, Jeff Layton, NFS
Date: Mon, 07 Nov 2011 14:49:03 -0500
In-Reply-To: <20111107150654.1c045aad@notabene.brown>
References: <20111107150654.1c045aad@notabene.brown>

On Sun, 2011-11-06 at 20:06 -0800, NeilBrown wrote:
> hi all,
> It being over a year since I last raised this I thought it might be time to
> try again.
>
> The problem is that an NFSv4 mount request (the default) to an unrouteable
> server results in a 3 minute timeout instead of an instant failure.
>
> This is easy to test by simply removing your default route then trying to
> mount something outside your local network.
>
> This patch causes any SOFTCONN task to be woken up as soon as a connection
> error occurs so that it can fail promptly. The failure reason gets passed
> back and as it is not ETIMEDOUT it causes immediate failure.
>
> Is this a reasonable approach?
>
> Thanks,
> NeilBrown
>
>
>
>
> From a1aea8fc3977ffa9951c3d7f27dbb1905e5f560f Mon Sep 17 00:00:00 2001
> From: NeilBrown
> Date: Mon, 7 Nov 2011 15:00:17 +1100
> Subject: [PATCH] sunrpc: wake up SOFTCONN tasks when a connection error
>  happens.
>
> A 'SOFTCONN' task should fail if there is an error or a major timeout
> during connection.
>
> However errors are currently converted into a timeout (60 seconds for
> TCP) which is treated as a minor timeout and 3 of these are required
> before failure.
>
> The result of this is that if you try to mount an NFSv4 filesystem
> (which doesn't require rpcbind and the failure modes that provides)
> from a server which you do not have a route to (and so get
> ENETUNREACH), you have an unnecessary 3 minute timeout.
>
> So when ENETUNREACH is reported for a connection - or other errors
> which are fatal, wake up any SOFTCONN tasks with that error - rather
> than letting them wait 60 seconds and then generate ETIMEDOUT.
>
> This causes the above mentioned mount attempt to fail instantly.
>
> Signed-off-by: NeilBrown
> ---
>  include/linux/sunrpc/sched.h |    1 +
>  net/sunrpc/sched.c           |   29 +++++++++++++++++++++++++++++
>  net/sunrpc/xprtsock.c        |    6 +++++-
>  3 files changed, 35 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index e775689..b85451b 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -236,6 +236,7 @@ void rpc_wake_up_queued_task(struct rpc_wait_queue *,
>  void rpc_wake_up(struct rpc_wait_queue *);
>  struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
>  void rpc_wake_up_status(struct rpc_wait_queue *, int);
> +void rpc_wake_up_softconn_status(struct rpc_wait_queue *, int);
>  int rpc_queue_empty(struct rpc_wait_queue *);
>  void rpc_delay(struct rpc_task *, unsigned long);
>  void * rpc_malloc(struct rpc_task *, size_t);
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index d12ffa5..d92000a 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -543,6 +543,35 @@ void rpc_wake_up_status(struct rpc_wait_queue *queue, int status)
>  }
>  EXPORT_SYMBOL_GPL(rpc_wake_up_status);
>
> +/**
> + * rpc_wake_up_softconn_status - wake up all SOFTCONN rpc_tasks and set their
> + * status value.
> + * @queue: rpc_wait_queue on which the tasks are sleeping
> + * @status: status value to set
> + *
> + * Grabs queue->lock
> + */
> +void rpc_wake_up_softconn_status(struct rpc_wait_queue *queue, int status)
> +{
> +	struct rpc_task *task, *next;
> +	struct list_head *head;
> +
> +	spin_lock_bh(&queue->lock);
> +	head = &queue->tasks[queue->maxpriority];
> +	for (;;) {
> +		list_for_each_entry_safe(task, next, head, u.tk_wait.list)
> +			if (RPC_IS_SOFTCONN(task)) {
> +				task->tk_status = status;
> +				rpc_wake_up_task_queue_locked(queue, task);
> +			}

This is basically rpc_wake_up_status() with an extra conditional test
(which again is just rpc_wake_up() with an extra status argument).
Should we consider merging these functions?

> +		if (head == &queue->tasks[0])
> +			break;
> +		head--;
> +	}
> +	spin_unlock_bh(&queue->lock);
> +}
> +EXPORT_SYMBOL_GPL(rpc_wake_up_softconn_status);

Why do we want to export this?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
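
For illustration only, the merge question raised in the review could be
sketched as one filtered walk with three thin wrappers. This is a userspace
model, not the kernel code: struct task, struct wait_queue, TASK_SOFTCONN,
wake_up_filtered() and the wrapper names are all hypothetical stand-ins, the
queue is a fixed array rather than a list_head per priority, there is no
locking, and woken tasks are only marked awake instead of being dequeued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PRIORITIES  3
#define MAX_PER_LEVEL  8
#define TASK_SOFTCONN  0x1u  /* hypothetical stand-in for the SOFTCONN task flag */

/* Hypothetical, simplified model of a sleeping RPC task. */
struct task {
	unsigned int flags;
	int status;
	bool asleep;
};

/* Hypothetical model of a wait queue with one slot array per priority. */
struct wait_queue {
	struct task *tasks[NR_PRIORITIES][MAX_PER_LEVEL];
	size_t count[NR_PRIORITIES];
};

typedef bool (*task_filter)(const struct task *);

bool task_is_softconn(const struct task *t)
{
	return (t->flags & TASK_SOFTCONN) != 0;
}

/*
 * Single walk shared by all wake-up variants: visit priorities from
 * highest to lowest, wake each sleeping task accepted by @filter
 * (NULL accepts every task), and store @status only when @set_status
 * is true. Returns the number of tasks woken.
 */
size_t wake_up_filtered(struct wait_queue *q, task_filter filter,
			bool set_status, int status)
{
	size_t woken = 0;

	assert(q != NULL);
	for (size_t prio = NR_PRIORITIES; prio-- > 0; ) {
		for (size_t i = 0; i < q->count[prio]; i++) {
			struct task *t = q->tasks[prio][i];

			if (!t->asleep)
				continue;
			if (filter && !filter(t))
				continue;
			if (set_status)
				t->status = status;
			t->asleep = false;
			woken++;
		}
	}
	return woken;
}

/* Models a plain wake-up: everyone, status untouched. */
size_t wake_up_all(struct wait_queue *q)
{
	return wake_up_filtered(q, NULL, false, 0);
}

/* Models a wake-up with status: everyone, status set. */
size_t wake_up_status(struct wait_queue *q, int status)
{
	return wake_up_filtered(q, NULL, true, status);
}

/* Models the proposed variant: SOFTCONN tasks only, status set. */
size_t wake_up_softconn_status(struct wait_queue *q, int status)
{
	return wake_up_filtered(q, task_is_softconn, true, status);
}
```

In this model a connection error would call wake_up_softconn_status() with
the negative errno, leaving every non-SOFTCONN task asleep until its normal
timeout. Whether a predicate argument is the right shape for the real
net/sunrpc/sched.c helpers is a separate design question.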