From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.183])
	by ozlabs.org (Postfix) with ESMTP id F19CDDDDF6
	for ; Mon, 19 Nov 2007 05:44:20 +1100 (EST)
Received: by py-out-1112.google.com with SMTP id a29so6460771pyi
	for ; Sun, 18 Nov 2007 10:44:19 -0800 (PST)
Message-ID: <64bb37e0711181044s75fd1081sdf44dac2e060d49a@mail.gmail.com>
Date: Sun, 18 Nov 2007 19:44:19 +0100
From: "Torsten Kaiser"
To: "Peter Zijlstra"
Subject: Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
In-Reply-To: <20071117230508.GB25905@dyad>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
References: <473DA608.1020804@linux.vnet.ibm.com>
	<64bb37e0711170953p67d1be49lf4eaa190d662e2b4@mail.gmail.com>
	<20071117180946.GA14055@elte.hu>
	<20071117101957.7562639d.akpm@linux-foundation.org>
	<64bb37e0711171140w5f1451e0qea081a4fbc7a45f7@mail.gmail.com>
	<20071117230508.GB25905@dyad>
Cc: Trond Myklebust , steved@redhat.com, LKML , Kamalesh Babulal ,
	linuxppc-dev@ozlabs.org, nfs@lists.sourceforge.net, Andrew Morton ,
	Jan Blunck , Ingo Molnar , Balbir Singh
List-Id: Linux on PowerPC Developers Mail List

On Nov 18, 2007 12:05 AM, Peter Zijlstra wrote:
> I've been staring at this NFS code for a while an can't make any sense
> out of it. It seems to correctly initialize the waitqueue. So this would
> indicate corruption of some sort.

No, it does not "correctly" initialize the waitqueue. For NFSv4 it
doesn't even try to initialize it.

I have now found the guilty patch and what is wrong with it.
nfs-stop-sillyname-renames-and-unmounts-from-racing.patch adds:

@@ -110,8 +112,22 @@ struct nfs_server {
 						   filesystem */
 #endif
 	void (*destroy)(struct nfs_server *);
+
+	atomic_t active; /* Keep trace of any activity to this server */
+	wait_queue_head_t active_wq; /* Wait for any activity to stop */

and tries to initialize it:

@@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
 	server->namelen = data->namlen;
 	/* Create a client RPC handle for the NFSv3 ACL management interface */
 	nfs_init_server_aclclient(server);
+
+	init_waitqueue_head(&server->active_wq);
+	atomic_set(&server->active, 0);
+

and then uses it via nfs_sb_active and nfs_sb_deactive:

@@ -29,6 +29,7 @@ struct nfs_unlinkdata {

 static void nfs_free_unlinkdata(struct nfs_unlinkdata *data)
 {
+	nfs_sb_deactive(NFS_SERVER(data->dir));
 	iput(data->dir);
 	put_rpccred(data->cred);
 	kfree(data->args.name.name);
@@ -151,6 +152,7 @@ static int nfs_do_call_unlink(struct dentry *parent, struct inode *dir, struct n
 		nfs_dec_sillycount(dir);
 		return 0;
 	}
+	nfs_sb_active(NFS_SERVER(dir));
 	data->args.fh = NFS_FH(dir);
 	nfs_fattr_init(&data->res.dir_attr);

But the patch overlooks this:

struct dentry_operations nfs_dentry_operations = {
	.d_revalidate = nfs_lookup_revalidate,
	.d_delete = nfs_dentry_delete,
	.d_iput = nfs_dentry_iput,
};

struct dentry_operations nfs4_dentry_operations = {
	.d_revalidate = nfs_open_revalidate,
	.d_delete = nfs_dentry_delete,
	.d_iput = nfs_dentry_iput,
};

NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
unlink and sillyrename logic. But they do not share nfs_init_server()!
So on an NFSv4 mount, nfs_sb_active / nfs_sb_deactive operate on a
waitqueue and counter that were never initialized.

I wonder why this doesn't blow up more violently, but only hangs...

Since I don't know whether the correct fix is to add the waitqueue
initialization to nfs4_init_server() or to remove the nfs_sb_active /
nfs_sb_deactive calls for the NFSv4 case, I can't offer a patch to fix
this.

Torsten
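
P.S. To make the failure mode concrete, here is a minimal userspace sketch of the pattern described above: two setup paths, one shared use path, and only one of the setup paths initializes the new fields. It is not kernel code. Every name in it (model_server, model_init_v3, model_init_v4, model_unlink_cycle) is hypothetical, and the wq_ready flag merely stands in for "init_waitqueue_head() has been called on active_wq".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fields the patch adds to struct nfs_server. */
struct model_server {
	int  active;	/* models atomic_t active */
	bool wq_ready;	/* models "active_wq has been initialized" */
};

/* Models nfs_init_server() after the patch: both new fields are set up. */
static void model_init_v3(struct model_server *s)
{
	s->wq_ready = true;	/* init_waitqueue_head(&server->active_wq) */
	s->active = 0;		/* atomic_set(&server->active, 0) */
}

/* Models the NFSv4 setup path as described in this mail: the new
 * fields are never touched. */
static void model_init_v4(struct model_server *s)
{
	(void)s;
}

/* Models the shared unlink path: an nfs_sb_active() / nfs_sb_deactive()
 * pair. Returns true only if the wait queue was initialized, i.e. only
 * if waiting on it or waking it would be safe. */
static bool model_unlink_cycle(struct model_server *s)
{
	s->active++;		/* nfs_sb_active() */
	s->active--;		/* nfs_sb_deactive() */
	return s->wq_ready;
}
```

Running model_unlink_cycle() on a server set up by model_init_v3() reports a usable wait queue, while the same call on one set up by model_init_v4() does not, although both go through identical unlink code.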