From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Jeff Garzik <jeff@garzik.org>
Cc: Marc Dietrich <Marc.Dietrich@ap.physik.uni-giessen.de>,
kernel list <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: NFSv4 poops itself
Date: Fri, 27 Jul 2007 10:45:50 -0400 [thread overview]
Message-ID: <1185547550.6586.24.camel@localhost> (raw)
In-Reply-To: <46A9F5D7.4050501@garzik.org>
[-- Attachment #1: Type: text/plain, Size: 2616 bytes --]
On Fri, 2007-07-27 at 09:40 -0400, Jeff Garzik wrote:
> (please don't drop CC's when you reply to email; you are cutting
> relevant people out of the loop)
>
>
> Marc Dietrich wrote:
> > me too, my server has 2.6.18-? (openSUSE 10.2). On the client
> > (2.6.23-rc1-mm1), I also see (shortly before the hang)
> >
> > Jul 26 13:09:19 fb07-iapwap2 kernel: =================================
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [ INFO: inconsistent lock state ]
> > Jul 26 13:09:19 fb07-iapwap2 kernel: 2.6.23-rc1-mm1 #1
> > Jul 26 13:09:19 fb07-iapwap2 kernel: ---------------------------------
> > Jul 26 13:09:19 fb07-iapwap2 kernel: inconsistent {softirq-on-W} ->
> > {in-softirq-W} usage.
> > Jul 26 13:09:19 fb07-iapwap2 kernel: hald/3873 [HC0[0]:SC1[1]:HE1:SE0] takes:
> > Jul 26 13:09:19 fb07-iapwap2 kernel: (rpc_credcache_lock){-+..}, at:
> > [<c01dc166>] _atomic_dec_and_lock+0x16/0x60
> > Jul 26 13:09:19 fb07-iapwap2 kernel: {softirq-on-W} state was registered at:
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c013d4f7>] mark_lock+0x77/0x630
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c013c094>] add_lock_to_list+0x44/0xc0
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c013e8af>]
> > __lock_acquire+0x65f/0x1020
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c013db0e>] mark_held_locks+0x5e/0x80
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c012448d>] local_bh_enable+0x7d/0x130
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c013f2cf>] lock_acquire+0x5f/0x80
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c01dc166>]
> > _atomic_dec_and_lock+0x16/0x60
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c02d156a>] _spin_lock+0x2a/0x40
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c01dc166>]
> > _atomic_dec_and_lock+0x16/0x60
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c01dc166>]
> > _atomic_dec_and_lock+0x16/0x60
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<c02d156a>] _spin_lock+0x2a/0x40
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<dcecf770>] put_rpccred+0x60/0x110
> > [sunrpc]
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<dcecf840>]
> > rpcauth_unbindcred+0x20/0x60 [sunrpc]
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<dcece1f4>] rpc_put_task+0x44/0xb0
> > [sunrpc]
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<dcec8ffd>] rpc_call_sync+0x2d/0x40
> > [sunrpc]
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<dced680d>] rpcb_register+0x10d/0x1c0
> > [sunrpc]
> > Jul 26 13:09:19 fb07-iapwap2 kernel: [<dced06ef>] svc_register+0x8f/0x160
> > [sunrpc]
> [continues]
That particular hang in rpciod_down we do have a fix for, but it is not
related to the issue you were seeing Jeff.
Trond
[-- Attachment #2: linux-2.6.23-001-fix_rpciod_down_race.dif --]
[-- Type: message/rfc822, Size: 3070 bytes --]
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: Thu, 19 Jul 2007 16:32:20 -0400
Message-ID: <1185547550.6586.25.camel@localhost>
The commit 4ada539ed77c7a2bbcb75cafbbd7bd8d2b9bef7b lead to the unpleasant
possibility of an asynchronous rpc_task being required to call
rpciod_down() when it is complete. This again means that the rpciod
workqueue may get to call destroy_workqueue on itself -> hang...
Change rpciod_up/rpciod_down to just get/put the module, and then
create/destroy the workqueues on module load/unload.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
net/sunrpc/sched.c | 57 +++++++++++++++++++++-------------------------------
1 files changed, 23 insertions(+), 34 deletions(-)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index b5723c2..954d7ec 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -50,8 +50,6 @@ static RPC_WAITQ(delay_queue, "delayq");
/*
* rpciod-related stuff
*/
-static DEFINE_MUTEX(rpciod_mutex);
-static atomic_t rpciod_users = ATOMIC_INIT(0);
struct workqueue_struct *rpciod_workqueue;
/*
@@ -961,60 +959,49 @@ void rpc_killall_tasks(struct rpc_clnt *clnt)
spin_unlock(&clnt->cl_lock);
}
+int rpciod_up(void)
+{
+ return try_module_get(THIS_MODULE) ? 0 : -EINVAL;
+}
+
+void rpciod_down(void)
+{
+ module_put(THIS_MODULE);
+}
+
/*
- * Start up the rpciod process if it's not already running.
+ * Start up the rpciod workqueue.
*/
-int
-rpciod_up(void)
+static int rpciod_start(void)
{
struct workqueue_struct *wq;
- int error = 0;
-
- if (atomic_inc_not_zero(&rpciod_users))
- return 0;
-
- mutex_lock(&rpciod_mutex);
- /* Guard against races with rpciod_down() */
- if (rpciod_workqueue != NULL)
- goto out_ok;
/*
* Create the rpciod thread and wait for it to start.
*/
dprintk("RPC: creating workqueue rpciod\n");
- error = -ENOMEM;
wq = create_workqueue("rpciod");
- if (wq == NULL)
- goto out;
-
rpciod_workqueue = wq;
- error = 0;
-out_ok:
- atomic_inc(&rpciod_users);
-out:
- mutex_unlock(&rpciod_mutex);
- return error;
+ return rpciod_workqueue != NULL;
}
-void
-rpciod_down(void)
+static void rpciod_stop(void)
{
- if (!atomic_dec_and_test(&rpciod_users))
- return;
+ struct workqueue_struct *wq = NULL;
- mutex_lock(&rpciod_mutex);
+ if (rpciod_workqueue == NULL)
+ return;
dprintk("RPC: destroying workqueue rpciod\n");
- if (atomic_read(&rpciod_users) == 0 && rpciod_workqueue != NULL) {
- destroy_workqueue(rpciod_workqueue);
- rpciod_workqueue = NULL;
- }
- mutex_unlock(&rpciod_mutex);
+ wq = rpciod_workqueue;
+ rpciod_workqueue = NULL;
+ destroy_workqueue(wq);
}
void
rpc_destroy_mempool(void)
{
+ rpciod_stop();
if (rpc_buffer_mempool)
mempool_destroy(rpc_buffer_mempool);
if (rpc_task_mempool)
@@ -1048,6 +1035,8 @@ rpc_init_mempool(void)
rpc_buffer_slabp);
if (!rpc_buffer_mempool)
goto err_nomem;
+ if (!rpciod_start())
+ goto err_nomem;
return 0;
err_nomem:
rpc_destroy_mempool();
next prev parent reply other threads:[~2007-07-27 14:46 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-27 12:53 NFSv4 poops itself Jeff Garzik
2007-07-27 13:33 ` Trond Myklebust
2007-07-27 17:09 ` Trond Myklebust
2007-07-27 17:16 ` Jeff Garzik
2007-07-27 17:29 ` Trond Myklebust
2007-08-03 19:13 ` J. Bruce Fields
2007-08-06 13:40 ` J. Bruce Fields
2007-08-06 18:40 ` Trond Myklebust
2007-07-27 18:29 ` Marc Dietrich
2007-07-27 13:36 ` Marc Dietrich
2007-07-27 13:40 ` Jeff Garzik
2007-07-27 14:45 ` Trond Myklebust [this message]
2007-07-27 14:58 ` Satyam Sharma
2007-07-27 17:58 ` Marc Dietrich
2007-07-27 18:39 ` Trond Myklebust
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1185547550.6586.24.camel@localhost \
--to=trond.myklebust@netapp.com \
--cc=Marc.Dietrich@ap.physik.uni-giessen.de \
--cc=akpm@linux-foundation.org \
--cc=jeff@garzik.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox