All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Krafft <krafft@de.ibm.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
	linux-kernel@vger.kernel.org
Subject: Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
Date: Thu, 26 Jul 2007 14:44:33 +0200	[thread overview]
Message-ID: <20070726144433.080126f7@localhost> (raw)
In-Reply-To: <200707261323.38310.arnd@arndb.de>

[-- Attachment #1: Type: text/plain, Size: 6167 bytes --]

On Thu, 26 Jul 2007 13:23:37 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday 25 July 2007, Trond Myklebust wrote:
> > 
> > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote:
> > 
> > > Obviously the locking code in nfs_free_open_context is wrong.
> > > Checking the list for entries and removing the entry should be an atomic operation.
> > 
> > Wrong. It is quite safe to test the structure member ctx->list for
> > emptiness outside the spinlock because we have an explicit guarantee
> > that nobody else has a reference to this structure, plus the
> > atomic_dec_and_test() in kref_put() has acted as a memory barrier for
> > us.
> 
> Well, the real question then is how the ctx can still be present in the
> nfsi->open_files list. Since we are in nfs_free_open_context(), there
> must not be any pointer to the ctx anywhere, but still we have this other
> thread calling get_nfs_open_context() on it.
> 
> 	Arnd <><

Thanks for the pointer Arnd,

Trond, does the patch below look better to you ?

Note:
For now the list_del_init is needed, so that the BUG_ON(!list_empty) works correctly.
(I have no clue why list_empty should not be working, when the list has been poisoned by a list_del.)
I am running the stress tests now. It seems to work so far.
---


Subject: nfs: fix locking in nfs/inode.c in nfs_free_open_context

From: Christian Krafft <krafft@de.ibm.com>

While running some tests on a server using NFS root I found some
locking problems.

The scripts I am using can be found under at

http://localhorst.gotdns.org/parabelboi/nfs/

Please note, I am using rpm to generate read load. I found no better way
to generate that much stress on the read path ;-)

It's a two way machine btw, with mount options tcp,nolock the problem
doesn't occur (probably due to a different timing).
When running write_back.sh six times concurrent with one read.sh,
the processes will be locked within a few minutes.

The problem occurs at least on 2.6.22 and 2.6.23-rc1.
It shows up like this:

Badness at lib/kref.c:33
NIP: c00000000018b734 LR: c00000000014c37c CTR: c0000000001566f0
REGS: c00000007aebf160 TRAP: 0700   Not tainted  (2.6.23-rc1)
MSR: 9000000000029032 <EE,ME,IR,DR>  CR: 84000422  XER: 00000000
TASK = c00000007df7d530[30769] 'sync' THREAD: c00000007aebc000 CPU: 3
NIP [c00000000018b734] .kref_get+0xc/0x28
LR [c00000000014c37c] .get_nfs_open_context+0x1c/0x38
Call Trace:
[c00000007aebf3e0] [c00000000033d9f4] ._spin_lock+0x10/0x24 (unreliable)
[c00000007aebf460] [c00000000014cf40] .nfs_find_open_context+0x68/0xcc
[c00000007aebf500] [c000000000156654] .nfs_writepage_locked+0x164/0x200
[c00000007aebf600] [c00000000015670c] .nfs_writepage+0x1c/0x48
[c00000007aebf690] [c00000000008e53c] .__writepage+0x3c/0x84
[c00000007aebf710] [c00000000008ed84] .write_cache_pages+0x238/0x3f4
[c00000007aebf880] [c000000000155840] .nfs_writepages+0x7c/0xc0
[c00000007aebf970] [c00000000008efe0] .do_writepages+0x68/0xa4
[c00000007aebf9f0] [c0000000000e0290] .__writeback_single_inode+0x1bc/0x3b8
[c00000007aebfae0] [c0000000000e0984] .sync_sb_inodes+0x20c/0x308
[c00000007aebfb90] [c0000000000e0b50] .sync_inodes_sb+0xd0/0x100
[c00000007aebfc80] [c0000000000e0c14] .__sync_inodes+0x94/0x120
nstruction dump:
4e800421 e8410028 38000001 38210080 7c030378 e8010010 ebc1fff0 7c0803a6 
4e800020 80030000 7c000034 5400d97e <0b000000> 7c001828 30000001 7c00192d 
Unable to handle kernel paging request for data at address 0x00200200
Faulting instruction address: 0xc000000000193474
cpu 0x0: Vector: 300 (Data Access) at [c00000000ff4b5b0]
    pc: c000000000193474: .list_del+0x20/0xa4
    lr: c00000000014c2e8: .nfs_free_open_context+0x4c/0xc4
    sp: c00000000ff4b830
   msr: 9000000000009032
   dar: 200200
 dsisr: 40000000
  current = 0xc00000000ff0a230
  paca    = 0xc00000000047e900
    pid   = 506, comm = rpciod/0
enter ? for help
[c00000000ff4b8b0] c00000000014c2e8 .nfs_free_open_context+0x4c/0xc4
[c00000000ff4b940] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4b9c0] c00000000014c1dc .put_nfs_open_context+0x1c/0x34
[c00000000ff4ba40] c000000000150ec0 .nfs_free_request+0x2c/0x5c
[c00000000ff4bad0] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4bb50] c000000000150e2c .nfs_release_request+0x20/0x38
[c00000000ff4bbd0] c00000000015750c .nfs_writeback_done_partial+0x270/0x2c8
[c00000000ff4bc80] c0000000003274fc .rpc_exit_task+0x5c/0xa0

Obviously the locking code in nfs_free_open_context is wrong.
Checking the list for entries and removing the entry should be an atomic operation.
Also list_del_init should be used, because list_del will leave the empty list in
an undefined condition.

After applying the patch the scripts run a bit longer, but another error occurs :-/

Signed-off-by: Christian Krafft <krafft@de.ibm.com>

Index: linux-2.6.23-rc1/fs/nfs/inode.c
===================================================================
--- linux-2.6.23-rc1.orig/fs/nfs/inode.c
+++ linux-2.6.23-rc1/fs/nfs/inode.c
@@ -485,12 +485,8 @@ static void nfs_free_open_context(struct
 	struct nfs_open_context *ctx = container_of(kref,
 			struct nfs_open_context, kref);
 
-	if (!list_empty(&ctx->list)) {
-		struct inode *inode = ctx->path.dentry->d_inode;
-		spin_lock(&inode->i_lock);
-		list_del(&ctx->list);
-		spin_unlock(&inode->i_lock);
-	}
+	BUG_ON(!list_empty(&ctx->list));
+
 	if (ctx->state != NULL)
 		nfs4_close_state(&ctx->path, ctx->state, ctx->mode);
 	if (ctx->cred != NULL)
@@ -549,7 +545,7 @@ static void nfs_file_clear_open_context(
 	if (ctx) {
 		filp->private_data = NULL;
 		spin_lock(&inode->i_lock);
-		list_move_tail(&ctx->list, &NFS_I(inode)->open_files);
+		list_del_init(&ctx->list);
 		spin_unlock(&inode->i_lock);
 		put_nfs_open_context(ctx);
 	}

-- 
Mit freundlichen Gruessen,
kind regards,

Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist


Vorsitzender des Aufsichtsrats:	Martin Jetter
Geschaeftsfuehrung:		Herbert Kircher
Sitz der Gesellschaft:		Boeblingen
Registriergericht:		Amtsgericht Stuttgart, HRB 243294


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

  parent reply	other threads:[~2007-07-26 12:45 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-25 15:08 [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context Christian Krafft
2007-07-25 17:28 ` Trond Myklebust
2007-07-26 11:23   ` Arnd Bergmann
2007-07-26 12:37     ` Trond Myklebust
2007-07-26 12:44     ` Christian Krafft [this message]
2007-07-26 13:23       ` Trond Myklebust
2007-07-26 15:13         ` Arnd Bergmann
2007-07-26 16:08           ` Trond Myklebust
2007-07-26 18:00             ` Christian Krafft
2007-07-27 10:02               ` Christian Krafft
2007-07-27  9:44             ` Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070726144433.080126f7@localhost \
    --to=krafft@de.ibm.com \
    --cc=arnd@arndb.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=trond.myklebust@fys.uio.no \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.