* 2.5.70-bk16: nfs crash
@ 2003-06-12 12:56 John M Flinchbaugh
2003-06-12 13:52 ` Dipankar Sarma
0 siblings, 1 reply; 22+ messages in thread
From: John M Flinchbaugh @ 2003-06-12 12:56 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 5740 bytes --]
running 2.5.70-bk16, i got this error and hang. sysrq worked for
reboot, etc.
it apparently crashed when it mounted an nfs export from a 2.4.18 box,
and tried to mv a file. i doubt it matters, but the nic is an
orinoco_cs wireless card. thanks.
Jun 12 02:00:04 density kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jun 12 02:00:04 density kernel: printing eip:
Jun 12 02:00:04 density kernel: c0169ef1
Jun 12 02:00:04 density kernel: *pde = 00000000
Jun 12 02:00:04 density kernel: Oops: 0002 [#1]
Jun 12 02:00:04 density kernel: CPU: 0
Jun 12 02:00:04 density kernel: EIP: 0060:[<c0169ef1>] Not tainted
Jun 12 02:00:04 density kernel: EFLAGS: 00010246
Jun 12 02:00:04 density kernel: EIP is at d_move+0x51/0x250
Jun 12 02:00:04 density kernel: eax: 00000000 ebx: cd5e6960 ecx: cd5e69d0 edx: 00000000
Jun 12 02:00:04 density kernel: esi: cd2b4440 edi: cfdb4af4 ebp: cd087e4c esp: cd087e20
Jun 12 02:00:04 density kernel: ds: 007b es: 007b ss: 0068
Jun 12 02:00:04 density kernel: Process mv (pid: 4960, threadinfo=cd086000 task=d41681a0)
Jun 12 02:00:04 density kernel: Stack: cdf4a508 cd5e69dc 0000000e cdf4a508 cd087e70 00000014 d8bf416d cdf4a6a0
Jun 12 02:00:04 density kernel: cd087e70 cd2b4440 cdf4a614 cd087ebc d8bf1ce7 cd5e6960 cd2b4440 cdf4a614
Jun 12 02:00:04 density kernel: cd087ea0 00000001 00000108 d78f0c98 73666e2e 30303030 31313030 30303030
Jun 12 02:00:04 density kernel: Call Trace:
Jun 12 02:00:04 density kernel: [<d8bf416d>] nfs_zap_caches+0x5d/0xd0 [nfs]
Jun 12 02:00:04 density kernel: [<d8bf1ce7>] nfs_sillyrename+0x197/0x220 [nfs]
Jun 12 02:00:04 density kernel: [<d8bf259c>] nfs_rename+0x20c/0x2b0 [nfs]
Jun 12 02:00:04 density kernel: [<c0161f50>] vfs_rename_other+0xc0/0x120
Jun 12 02:00:04 density kernel: [<d8bf2390>] nfs_rename+0x0/0x2b0 [nfs]
Jun 12 02:00:04 density kernel: [<c016213d>] vfs_rename+0x18d/0x3a0
Jun 12 02:00:04 density kernel: [<c01624f3>] sys_rename+0x1a3/0x1d0
Jun 12 02:00:04 density kernel: [<c01093cb>] syscall_call+0x7/0xb
Jun 12 02:00:04 density kernel:
Jun 12 02:00:04 density kernel: Code: 89 02 74 03 89 50 04 c7 43 70 00 01 10 00 c7 41 04 00 02 20
Jun 12 02:00:04 density kernel: <6>note: mv[4960] exited with preempt_count 5
Jun 12 02:00:04 density kernel: bad: scheduling while atomic!
Jun 12 02:00:04 density kernel: Call Trace:
Jun 12 02:00:04 density kernel: [<c0116d32>] schedule+0x3c2/0x3d0
Jun 12 02:00:04 density kernel: [<c01d5576>] __copy_from_user_ll+0x66/0x70
Jun 12 02:00:04 density kernel: [<c0135a06>] unlock_page+0x16/0x50
Jun 12 02:00:04 density kernel: [<c0137e95>] generic_file_aio_write_nolock+0x5f5/0xba0
Jun 12 02:00:04 density kernel: [<c01384b8>] generic_file_write_nolock+0x78/0x90
Jun 12 02:00:04 density kernel: [<c0225658>] vt_console_print+0x1e8/0x2f0
Jun 12 02:00:04 density kernel: [<c011a8de>] __call_console_drivers+0x5e/0x60
Jun 12 02:00:04 density kernel: [<c011a9b5>] call_console_drivers+0x65/0x120
Jun 12 02:00:04 density kernel: [<c01385b5>] generic_file_write+0x55/0x70
Jun 12 02:00:04 density kernel: [<c01af270>] reiserfs_file_write+0x50/0x652
Jun 12 02:00:04 density kernel: [<c02465f2>] vgacon_scroll+0x192/0x250
Jun 12 02:00:04 density kernel: [<c0245243>] vgacon_cursor+0xe3/0x1e0
Jun 12 02:00:04 density kernel: [<c0121ae1>] mod_timer+0x131/0x190
Jun 12 02:00:04 density kernel: [<c0132ea0>] check_free_space+0x100/0x190
Jun 12 02:00:04 density kernel: [<c022638b>] poke_blanked_console+0x6b/0x80
Jun 12 02:00:04 density kernel: [<c022567d>] vt_console_print+0x20d/0x2f0
Jun 12 02:00:04 density kernel: [<c01162e9>] try_to_wake_up+0xa9/0x150
Jun 12 02:00:04 density kernel: [<c0116dba>] default_wake_function+0x2a/0x30
Jun 12 02:00:04 density kernel: [<c013346a>] do_acct_process+0x27a/0x290
Jun 12 02:00:04 density kernel: [<c0114820>] do_page_fault+0x0/0x4cd
Jun 12 02:00:04 density kernel: [<c01334b9>] acct_process+0x39/0x61
Jun 12 02:00:04 density kernel: [<c011c81d>] do_exit+0x8d/0x460
Jun 12 02:00:04 density kernel: [<c0114820>] do_page_fault+0x0/0x4cd
Jun 12 02:00:04 density kernel: [<c0109c01>] die+0xe1/0xf0
Jun 12 02:00:04 density kernel: [<c011494f>] do_page_fault+0x12f/0x4cd
Jun 12 02:00:04 density kernel: [<d8c12988>] nfs_procedures+0x108/0x1b0 [nfs]
Jun 12 02:00:04 density kernel: [<d8babc70>] rpc_run_timer+0x0/0x80 [sunrpc]
Jun 12 02:00:04 density kernel: [<c0114820>] do_page_fault+0x0/0x4cd
Jun 12 02:00:04 density kernel: [<c0109575>] error_code+0x2d/0x38
Jun 12 02:00:04 density kernel: [<c0169ef1>] d_move+0x51/0x250
Jun 12 02:00:04 density kernel: [<d8bf416d>] nfs_zap_caches+0x5d/0xd0 [nfs]
Jun 12 02:00:04 density kernel: [<d8bf1ce7>] nfs_sillyrename+0x197/0x220 [nfs]
Jun 12 02:00:04 density kernel: [<d8bf259c>] nfs_rename+0x20c/0x2b0 [nfs]
Jun 12 02:00:04 density kernel: [<c0161f50>] vfs_rename_other+0xc0/0x120
Jun 12 02:00:04 density kernel: [<d8bf2390>] nfs_rename+0x0/0x2b0 [nfs]
Jun 12 02:00:04 density kernel: [<c016213d>] vfs_rename+0x18d/0x3a0
Jun 12 02:00:04 density kernel: [<c01624f3>] sys_rename+0x1a3/0x1d0
Jun 12 02:00:04 density kernel: [<c01093cb>] syscall_call+0x7/0xb
Jun 12 02:00:04 density kernel:
some info about the build environment:
Gnu C 3.3
Gnu make 3.80
util-linux 2.11z
mount 2.11z
Linux C Library 2.3.1
Dynamic linker (ldd) 2.3.1
Procps 3.1.9
Net-tools 1.60
Console-tools 0.2.3
Sh-utils 5.0
--
____________________}John Flinchbaugh{______________________
| glynis@hjsoft.com http://www.hjsoft.com/~glynis/ |
~~Powered by Linux: Reboots are for hardware upgrades only~~
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: 2.5.70-bk16: nfs crash 2003-06-12 12:56 2.5.70-bk16: nfs crash John M Flinchbaugh @ 2003-06-12 13:52 ` Dipankar Sarma 2003-06-12 15:33 ` Dipankar Sarma ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Dipankar Sarma @ 2003-06-12 13:52 UTC (permalink / raw) To: John M Flinchbaugh Cc: linux-kernel, Trond Myklebust, Linus Torvalds, Maneesh Soni On Thu, Jun 12, 2003 at 12:57:26PM +0000, John M Flinchbaugh wrote: > running 2.5.70-bk16, i got this error and hang. sysrq worked for > reboot, etc. > > it apparently crashed when it mounted an nfs export from a 2.4.18 box, > and tried to mv a file. i doubt it matters, but the nic is an > orinoco_cs wireless card. thanks. > > Jun 12 02:00:04 density kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 > Jun 12 02:00:04 density kernel: printing eip: > Jun 12 02:00:04 density kernel: c0169ef1 > Jun 12 02:00:04 density kernel: *pde = 00000000 > Jun 12 02:00:04 density kernel: Oops: 0002 [#1] > Jun 12 02:00:04 density kernel: CPU: 0 > Jun 12 02:00:04 density kernel: EIP: 0060:[<c0169ef1>] Not tainted > Jun 12 02:00:04 density kernel: EFLAGS: 00010246 > Jun 12 02:00:04 density kernel: EIP is at d_move+0x51/0x250 > Jun 12 02:00:04 density kernel: eax: 00000000 ebx: cd5e6960 ecx: cd5e69d0 edx: 00000000 I am not supprised at all by this, I can see two csets in Linus' tree that will definitely break dcache - 1. http://linux.bkbits.net:8080/linux-2.5/cset@1.1215.104.2?nav=index.html|ChangeSet@-2d __d_drop() *must not* initialize d_hash fields. Lockfree lookup depends on that. If __d_drop() needs to be allowed on an unhashed dentry, the right thing to do would be to check for DCACHE_UNHASHED before unhashing. I will submit a patch a little later to do this. 2. http://linux.bkbits.net:8080/linux-2.5/cset@1.1215.104.1?nav=index.html|ChangeSet@-2d hlist poison patch is broken. list_del_rcu() and hlist_del_rcu() *must not* re-initialize the pointers. Maneesh submitted a patch earlier today that corrects this - http://marc.theaimsgroup.com/?l=linux-kernel&m=105541206017154&w=2 Thanks Dipankar ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 13:52 ` Dipankar Sarma @ 2003-06-12 15:33 ` Dipankar Sarma 2003-06-12 15:35 ` Trond Myklebust 2003-06-12 15:49 ` Linus Torvalds 2 siblings, 0 replies; 22+ messages in thread From: Dipankar Sarma @ 2003-06-12 15:33 UTC (permalink / raw) To: John M Flinchbaugh Cc: linux-kernel, Trond Myklebust, Linus Torvalds, Maneesh Soni On Thu, Jun 12, 2003 at 07:22:54PM +0530, Dipankar Sarma wrote: > I am not supprised at all by this, I can see two csets in Linus' tree > that will definitely break dcache - > > 1. http://linux.bkbits.net:8080/linux-2.5/cset@1.1215.104.2?nav=index.html|ChangeSet@-2d > > __d_drop() *must not* initialize d_hash fields. Lockfree lookup depends on > that. If __d_drop() needs to be allowed on an unhashed dentry, the right > thing to do would be to check for DCACHE_UNHASHED before unhashing. I will > submit a patch a little later to do this. > > > 2. http://linux.bkbits.net:8080/linux-2.5/cset@1.1215.104.1?nav=index.html|ChangeSet@-2d > > hlist poison patch is broken. list_del_rcu() and hlist_del_rcu() > *must not* re-initialize the pointers. Maneesh submitted a patch > earlier today that corrects this - > > http://marc.theaimsgroup.com/?l=linux-kernel&m=105541206017154&w=2 John, Can you try the patch below along with Maneesh's patch mentioned above to see if they fix your problem ? Thanks Dipankar Fix __d_drop() to be allowed on unhashed dentries correctly. Don't re-initialize the pointers, instead check for DCACHE_UNHASHED before really unhashing it. include/linux/dcache.h | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff -puN include/linux/dcache.h~dentry-hash-init-fix include/linux/dcache.h --- linux-2.5.70-bk16/include/linux/dcache.h~dentry-hash-init-fix 2003-06-12 20:43:32.000000000 +0530 +++ linux-2.5.70-bk16-dipankar/include/linux/dcache.h 2003-06-12 20:47:10.000000000 +0530 @@ -174,8 +174,10 @@ extern spinlock_t dcache_lock; static inline void __d_drop(struct dentry *dentry) { - dentry->d_vfs_flags |= DCACHE_UNHASHED; - hlist_del_rcu_init(&dentry->d_hash); + if (!(dentry->d_vfs_flags & DCACHE_UNHASHED)) { + dentry->d_vfs_flags |= DCACHE_UNHASHED; + hlist_del_rcu(&dentry->d_hash); + } } static inline void d_drop(struct dentry *dentry) _ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 13:52 ` Dipankar Sarma 2003-06-12 15:33 ` Dipankar Sarma @ 2003-06-12 15:35 ` Trond Myklebust 2003-06-12 15:53 ` Dipankar Sarma 2003-06-12 15:49 ` Linus Torvalds 2 siblings, 1 reply; 22+ messages in thread From: Trond Myklebust @ 2003-06-12 15:35 UTC (permalink / raw) To: dipankar; +Cc: John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni >>>>> " " == Dipankar Sarma <dipankar@in.ibm.com> writes: > I am not supprised at all by this, I can see two csets in > Linus' tree that will definitely break dcache - > 1. http://linux.bkbits.net:8080/linux-2.5/cset@1.1215.104.2?nav=index.html|ChangeSet@-2d > __d_drop() *must not* initialize d_hash fields. Lockfree lookup > depends on that. If __d_drop() needs to be allowed on an > unhashed dentry, the right thing to do would be to check for > DCACHE_UNHASHED before unhashing. I will submit a patch a > little later to do this. Can you please remind us exactly what the benefits of all this are? Why can't d_free() immediately free the memory instead of relying on the RCU mechanism? Cheers, Trond ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 15:35 ` Trond Myklebust @ 2003-06-12 15:53 ` Dipankar Sarma 2003-06-12 16:26 ` Trond Myklebust 2003-06-12 16:30 ` viro 0 siblings, 2 replies; 22+ messages in thread From: Dipankar Sarma @ 2003-06-12 15:53 UTC (permalink / raw) To: Trond Myklebust Cc: John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni On Thu, Jun 12, 2003 at 08:35:14AM -0700, Trond Myklebust wrote: > >>>>> " " == Dipankar Sarma <dipankar@in.ibm.com> writes: > > __d_drop() *must not* initialize d_hash fields. Lockfree lookup > > depends on that. If __d_drop() needs to be allowed on an > > unhashed dentry, the right thing to do would be to check for > > DCACHE_UNHASHED before unhashing. I will submit a patch a > > little later to do this. > > Can you please remind us exactly what the benefits of all this are? > Why can't d_free() immediately free the memory instead of relying on > the RCU mechanism? Because we no longer hold the dcache_lock while doing a d_lookup(). With the dentry still around (RCU wouldn't happen until all CPUs do a context switch or execute user-level code), lookup can continue to traverse the hash list while another CPU deletes the currrent dentry. Once RCU happens, it is guranteed that no other CPU could be in that dentry during hash list traversal. That is why we have _rcu versions of the list deletion macros. Lockfree d_lookup() gives us significant benefits in larger SMP machines. Does my patch meet the requirements that you had for __d_drop() ? Thanks Dipankar ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 15:53 ` Dipankar Sarma @ 2003-06-12 16:26 ` Trond Myklebust 2003-06-12 16:49 ` Linus Torvalds 2003-06-12 19:53 ` Dipankar Sarma 2003-06-12 16:30 ` viro 1 sibling, 2 replies; 22+ messages in thread From: Trond Myklebust @ 2003-06-12 16:26 UTC (permalink / raw) To: dipankar; +Cc: John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni >>>>> " " == Dipankar Sarma <dipankar@in.ibm.com> writes: > Does my patch meet the requirements that you had for __d_drop() > ? I still need a real fix for d_move(). In addition, I'm getting worried about the changes in functionality that you've introduced here. It seems to me that your lockless scheme opens for a *lot* of races: Look at all those functions that take dcache_lock, and then test dentry->d_count. Unless I'm missing something here, your d_lookup() clearly has them all screwed, no? Cheers, Trond ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 16:26 ` Trond Myklebust @ 2003-06-12 16:49 ` Linus Torvalds 2003-06-12 16:55 ` Linus Torvalds 2003-06-12 19:53 ` Dipankar Sarma 1 sibling, 1 reply; 22+ messages in thread From: Linus Torvalds @ 2003-06-12 16:49 UTC (permalink / raw) To: Trond Myklebust; +Cc: dipankar, John M Flinchbaugh, linux-kernel, Maneesh Soni On Thu, 12 Jun 2003, Trond Myklebust wrote: > > I still need a real fix for d_move(). How about something like this.. It still breaks if the _target_ is unhashed, but quite frankly, if you have unhashed the target, I think something is seriously wrong. Linus --- ===== fs/dcache.c 1.57 vs edited ===== --- 1.57/fs/dcache.c Sat Jun 7 10:17:01 2003 +++ edited/fs/dcache.c Thu Jun 12 09:42:48 2003 @@ -1221,10 +1221,14 @@ } /* Move the dentry to the target hash queue, if on different bucket */ + if (dentry->d_vfs_flags & DCACHE_UNHASHED) + goto already_unhashed; if (dentry->d_bucket != target->d_bucket) { - dentry->d_bucket = target->d_bucket; hlist_del_rcu(&dentry->d_hash); +already_unhashed: + dentry->d_bucket = target->d_bucket; hlist_add_head_rcu(&dentry->d_hash, target->d_bucket); + dentry->d_vfs_flags &= ~DCACHE_UNHASHED; } /* Unhash the target: dput() will then get rid of it */ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 16:49 ` Linus Torvalds @ 2003-06-12 16:55 ` Linus Torvalds 0 siblings, 0 replies; 22+ messages in thread From: Linus Torvalds @ 2003-06-12 16:55 UTC (permalink / raw) To: Trond Myklebust; +Cc: dipankar, John M Flinchbaugh, linux-kernel, Maneesh Soni On Thu, 12 Jun 2003, Linus Torvalds wrote: > > How about something like this.. It still breaks if the _target_ is > unhashed Actually, it doesn't _break_, it just does something surprising (but possibly quite correct): it will hash the dentry to the same hash chain the target _used_ to be on before being unhashed. Which might actually be the right thing, but it still sounds to me like a bad idea to have a unhashed target. Linus ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 16:26 ` Trond Myklebust 2003-06-12 16:49 ` Linus Torvalds @ 2003-06-12 19:53 ` Dipankar Sarma 2003-06-13 5:24 ` Trond Myklebust 1 sibling, 1 reply; 22+ messages in thread From: Dipankar Sarma @ 2003-06-12 19:53 UTC (permalink / raw) To: Trond Myklebust Cc: John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni On Thu, Jun 12, 2003 at 09:26:29AM -0700, Trond Myklebust wrote: > I still need a real fix for d_move(). In addition, I'm getting worried Linus' fix allowing unhashed src dentries seems ok, if that is what you are looking for. > about the changes in functionality that you've introduced here. It > seems to me that your lockless scheme opens for a *lot* of races: > > Look at all those functions that take dcache_lock, and then test > dentry->d_count. Unless I'm missing something here, your d_lookup() > clearly has them all screwed, no? Not necessarily. One example is the fact that d_lookup() can only increase d_count. Besides, dput() decrements d_count without dcache_lock, so I am not sure holding dcache_lock during d_count test buys you much. We will do some audit tomorrow and see if there are issues here. Thanks Dipankar ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 19:53 ` Dipankar Sarma @ 2003-06-13 5:24 ` Trond Myklebust 2003-06-13 5:50 ` Dipankar Sarma 2003-06-13 6:06 ` Dipankar Sarma 0 siblings, 2 replies; 22+ messages in thread From: Trond Myklebust @ 2003-06-13 5:24 UTC (permalink / raw) To: dipankar; +Cc: John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni >>>>> " " == Dipankar Sarma <dipankar@in.ibm.com> writes: >> Look at all those functions that take dcache_lock, and then >> test >> dentry-> d_count. Unless I'm missing something here, your >> d_lookup() clearly has them all screwed, no? > Not necessarily. One example is the fact that d_lookup() can > only increase d_count. Besides, dput() decrements d_count > without dcache_lock, so I am not sure holding dcache_lock > during d_count test buys you much. Wrong. Look at the VFS code. In all cases the test is of the form. spin_lock(&dcache_lock); /* Are we the sole users of this dentry */ if (atomic_read(&dentry->d_count) == 1) { /* Yes - do some operation */ } Knowing that d_lookup() can *increase* d_count is not a plus here. The whole idea is to have a test for sole use. In most cases, the "do some operation" above is d_drop(dentry); in order (for instance) to ensure that nobody else can look up this dentry while we're working on it (e.g. rename or unlink,...). Your d_lookup() screws the above example of code which you can find in any number of VFS functions. dput(), d_delete(), d_invalidate(), d_prune_aliases(), prune_dcache(), shrink_dcache_sb() are but a few functions that rely on the above code snippet working to keep d_lookup() from intruding. Cheers, Trond ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-13 5:24 ` Trond Myklebust @ 2003-06-13 5:50 ` Dipankar Sarma 2003-06-13 6:13 ` Trond Myklebust 2003-06-13 6:06 ` Dipankar Sarma 1 sibling, 1 reply; 22+ messages in thread From: Dipankar Sarma @ 2003-06-13 5:50 UTC (permalink / raw) To: Trond Myklebust Cc: John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni On Thu, Jun 12, 2003 at 10:24:16PM -0700, Trond Myklebust wrote: > Wrong. Look at the VFS code. In all cases the test is of the form. > > spin_lock(&dcache_lock); > /* Are we the sole users of this dentry */ > if (atomic_read(&dentry->d_count) == 1) { > /* Yes - do some operation */ > } > > > Knowing that d_lookup() can *increase* d_count is not a plus here. The > whole idea is to have a test for sole use. Well, d_lookup() isn't the only place that does a dget() without holding dcache_lock. There are *many* places where dget() is done without holding dcache_lock. That didn't seem to be a requirement in the pre-RCU dcache model. > > In most cases, the "do some operation" above is > > d_drop(dentry); > I don't think that would work in pre or post-RCU dcache. > in order (for instance) to ensure that nobody else can look up this > dentry while we're working on it (e.g. rename or unlink,...). rename, unlink etc. hold the per-dentry lock, so they are protected against lockfree d_lookup(). > > Your d_lookup() screws the above example of code which you can find in > any number of VFS functions. dput(), d_delete(), d_invalidate(), > d_prune_aliases(), prune_dcache(), shrink_dcache_sb() are but a few > functions that rely on the above code snippet working to keep > d_lookup() from intruding. Those routines hold the per-dentry lock as required and that protects them from intruding lockfree d_lookup(). Thanks Dipankar ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-13 5:50 ` Dipankar Sarma @ 2003-06-13 6:13 ` Trond Myklebust 2003-06-13 6:54 ` Dipankar Sarma 0 siblings, 1 reply; 22+ messages in thread From: Trond Myklebust @ 2003-06-13 6:13 UTC (permalink / raw) To: dipankar Cc: Trond Myklebust, John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni >>>>> " " == Dipankar Sarma <dipankar@in.ibm.com> writes: > Well, d_lookup() isn't the only place that does a dget() > without holding dcache_lock. There are *many* places where > dget() is done without holding dcache_lock. That didn't seem to > be a requirement in the pre-RCU dcache model. d_lookup() is the only place where someone can pick up an existing dentry for which they do not already hold a reference. >> d_invalidate(), d_prune_aliases(), prune_dcache(), >> shrink_dcache_sb() are but a few functions that rely on the >> above code snippet working to keep d_lookup() from intruding. > Those routines hold the per-dentry lock as required and that > protects them from intruding lockfree d_lookup(). d_invalidate() does not. d_prune_aliases() does not. d_unhash() does not. Down in the per-filesystem code, I know of several locations in the NFS code that do not. There's one in procfs. I'm sure you can find more examples in the other filesystems too... Your argument only holds water if you demand that all callers of d_drop() should also take the per-dentry lock. AFAICS this is not being enforced. Cheers, Trond ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-13 6:13 ` Trond Myklebust @ 2003-06-13 6:54 ` Dipankar Sarma 0 siblings, 0 replies; 22+ messages in thread From: Dipankar Sarma @ 2003-06-13 6:54 UTC (permalink / raw) To: Trond Myklebust Cc: John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni On Thu, Jun 12, 2003 at 11:13:31PM -0700, Trond Myklebust wrote: > >>>>> " " == Dipankar Sarma <dipankar@in.ibm.com> writes: > d_lookup() is the only place where someone can pick up an existing > dentry for which they do not already hold a reference. AFAICS, the way to do a perfectly safe "sole ownership" check is to grab the per-dentry lock (instead of dcache_lock) and do the check. > >> d_invalidate(), d_prune_aliases(), prune_dcache(), > >> shrink_dcache_sb() are but a few functions that rely on the > >> above code snippet working to keep d_lookup() from intruding. > > > Those routines hold the per-dentry lock as required and that > > protects them from intruding lockfree d_lookup(). > > d_invalidate() does not. d_prune_aliases() does not. d_unhash() does > not. > > Down in the per-filesystem code, I know of several locations in the > NFS code that do not. There's one in procfs. I'm sure you can find > more examples in the other filesystems too... Yes and we need an audit. Besides I see places where d_count check is being done without holding *any* lock. That is ok only if the dentry is guranteed to be unhashed and we are doing a "sole ownership" check. Otherwise it may not work. > > Your argument only holds water if you demand that all callers of > d_drop() should also take the per-dentry lock. AFAICS this is not > being enforced. There are some rules that we need to work out irrespective of RCU and document clearly - 1. d_unhashed() checks that are being done with and without dcache_lock - if (!d_unhashed(new_dentry)) { d_drop(new_dentry); rehash = new_dentry; } This seems to require that either we hold the dcache_lock or the operations that we do on the dentry allow unhashed dentries. 2. Checking for "sole ownership" of dentries. Depending on whether the dentry is hashed and what operation we are going to do, we will need to hold the per-dentry lock. I am glad that you are reviewing this. Is there any other dcache operations in filesystems that you would like to add to the list above ? Thanks Dipankar ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-13 5:24 ` Trond Myklebust 2003-06-13 5:50 ` Dipankar Sarma @ 2003-06-13 6:06 ` Dipankar Sarma 1 sibling, 0 replies; 22+ messages in thread From: Dipankar Sarma @ 2003-06-13 6:06 UTC (permalink / raw) To: Trond Myklebust Cc: John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni On Thu, Jun 12, 2003 at 10:24:16PM -0700, Trond Myklebust wrote: > Wrong. Look at the VFS code. In all cases the test is of the form. > > spin_lock(&dcache_lock); > /* Are we the sole users of this dentry */ > if (atomic_read(&dentry->d_count) == 1) { > /* Yes - do some operation */ > } > > > Knowing that d_lookup() can *increase* d_count is not a plus here. The > whole idea is to have a test for sole use. I missed this part. If you want to do this, I would suggest taking the per-dentry lock instead. Most dcache routines have been fixed to do this. We will look around and see anything violates this rule. Thanks Dipankar ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 15:53 ` Dipankar Sarma 2003-06-12 16:26 ` Trond Myklebust @ 2003-06-12 16:30 ` viro 2003-06-12 16:55 ` Dipankar Sarma 1 sibling, 1 reply; 22+ messages in thread From: viro @ 2003-06-12 16:30 UTC (permalink / raw) To: Dipankar Sarma Cc: Trond Myklebust, John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni On Thu, Jun 12, 2003 at 09:23:45PM +0530, Dipankar Sarma wrote: > Lockfree d_lookup() gives us significant benefits in larger > SMP machines. I wonder if they outweight debugging time wasted after any change... Note that for vfsmounts proposed RCU patch had been utterly useless - practically all improvements had been from separate lock for vfsmounts (see akpm tree). ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 16:30 ` viro @ 2003-06-12 16:55 ` Dipankar Sarma 0 siblings, 0 replies; 22+ messages in thread From: Dipankar Sarma @ 2003-06-12 16:55 UTC (permalink / raw) To: viro Cc: Trond Myklebust, John M Flinchbaugh, linux-kernel, Linus Torvalds, Maneesh Soni On Thu, Jun 12, 2003 at 05:30:45PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote: > On Thu, Jun 12, 2003 at 09:23:45PM +0530, Dipankar Sarma wrote: > > > Lockfree d_lookup() gives us significant benefits in larger > > SMP machines. > > I wonder if they outweight debugging time wasted after any change... Several sets of numbers have been published in lkml on this. I will work on sending out my updates to the vfs locking document you wrote ASAP. AFAICS, most dcache APIs work as is despite lockfree lookup. As long as we follow those rules, we should be ok. > > Note that for vfsmounts proposed RCU patch had been utterly useless - > practically all improvements had been from separate lock for vfsmounts > (see akpm tree). Yes and that is why Maneesh's patch had two parts. In that case benefit came from reducing acquisition of dcache_lock by the first part itself - using a separate lock for vfsmounts. It does not seem possible to split up dcache_lock any further without very significant changes in vfs. The acquisition of vfsmount_lock by itself was not significant enough to really warrant lockfree lookup. Thanks Dipankar ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 13:52 ` Dipankar Sarma 2003-06-12 15:33 ` Dipankar Sarma 2003-06-12 15:35 ` Trond Myklebust @ 2003-06-12 15:49 ` Linus Torvalds 2003-06-12 16:05 ` Dipankar Sarma 2003-06-12 16:18 ` Linus Torvalds 2 siblings, 2 replies; 22+ messages in thread From: Linus Torvalds @ 2003-06-12 15:49 UTC (permalink / raw) To: Dipankar Sarma Cc: John M Flinchbaugh, linux-kernel, Trond Myklebust, Maneesh Soni On Thu, 12 Jun 2003, Dipankar Sarma wrote: > > hlist poison patch is broken. list_del_rcu() and hlist_del_rcu() > *must not* re-initialize the pointers. Maneesh submitted a patch > earlier today that corrects this - Sorry, but you're wrong. If you depend on not re-initializing the pointers, you should not use the "xxx_del()" function, and you should document it. This is that the __xxx_del() functions are there for - they won't do the poisoning. The regular delete functions have historically always poisoned the pointers - it was only removed a few months ago by Andrew. Linus ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 15:49 ` Linus Torvalds @ 2003-06-12 16:05 ` Dipankar Sarma 2003-06-12 16:18 ` Linus Torvalds 1 sibling, 0 replies; 22+ messages in thread From: Dipankar Sarma @ 2003-06-12 16:05 UTC (permalink / raw) To: Linus Torvalds Cc: John M Flinchbaugh, linux-kernel, Trond Myklebust, Maneesh Soni On Thu, Jun 12, 2003 at 08:49:53AM -0700, Linus Torvalds wrote: > On Thu, 12 Jun 2003, Dipankar Sarma wrote: > > > > hlist poison patch is broken. list_del_rcu() and hlist_del_rcu() > > *must not* re-initialize the pointers. Maneesh submitted a patch > > earlier today that corrects this - > > Sorry, but you're wrong. > > If you depend on not re-initializing the pointers, you should not use the > "xxx_del()" function, and you should document it. Right. That is why they are list_del_rcu() and hlist_del_rcu(). The comments for list_del_rcu() clearly say that pointers are not re-initialized. > > This is that the __xxx_del() functions are there for - they won't do the > poisoning. The regular delete functions have historically always poisoned > the pointers - it was only removed a few months ago by Andrew. Poisoning xxx_del() functions is ok. That should be done. Thanks Dipankar ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 15:49 ` Linus Torvalds 2003-06-12 16:05 ` Dipankar Sarma @ 2003-06-12 16:18 ` Linus Torvalds 2003-06-12 16:35 ` Dipankar Sarma 2003-06-13 12:48 ` Maneesh Soni 1 sibling, 2 replies; 22+ messages in thread From: Linus Torvalds @ 2003-06-12 16:18 UTC (permalink / raw) To: Dipankar Sarma Cc: John M Flinchbaugh, linux-kernel, Trond Myklebust, Maneesh Soni, Andrew Morton On Thu, 12 Jun 2003, Linus Torvalds wrote: > > If you depend on not re-initializing the pointers, you should not use the > "xxx_del()" function, and you should document it. Besides, the code doesn't actually depend on not re-initializing the pointers, it depends on the _forward_ pointers still being walkable in case some other CPU is traversing the list just as we remove the entry. Which means that I think the proper patch is to (a) document this and also (b) poison the back pointer. A patch like the attached, in short. Linus --- ===== include/linux/dcache.h 1.32 vs edited ===== --- 1.32/include/linux/dcache.h Tue Jun 10 14:56:43 2003 +++ edited/include/linux/dcache.h Thu Jun 12 09:12:27 2003 @@ -174,8 +174,10 @@ static inline void __d_drop(struct dentry *dentry) { - dentry->d_vfs_flags |= DCACHE_UNHASHED; - hlist_del_rcu_init(&dentry->d_hash); + if (!(dentry->d_vfs_flags & DCACHE_UNHASHED)) { + dentry->d_vfs_flags |= DCACHE_UNHASHED; + hlist_del_rcu(&dentry->d_hash); + } } static inline void d_drop(struct dentry *dentry) ===== include/linux/list.h 1.32 vs edited ===== --- 1.32/include/linux/list.h Tue Jun 10 15:46:31 2003 +++ edited/include/linux/list.h Thu Jun 12 08:59:31 2003 @@ -152,14 +152,17 @@ /** * list_del_rcu - deletes entry from list without re-initialization * @entry: the element to delete from the list. + * * Note: list_empty on entry does not return true after this, * the entry is in an undefined state. It is useful for RCU based * lockfree traversal. + * + * In particular, it means that we can not poison the forward + * pointers that may still be used for path walking. */ static inline void list_del_rcu(struct list_head *entry) { __list_del(entry->prev, entry->next); - entry->next = LIST_POISON1; entry->prev = LIST_POISON2; } @@ -431,7 +434,22 @@ n->pprev = LIST_POISON2; } -#define hlist_del_rcu hlist_del /* list_del_rcu is identical too? */ +/** + * hlist_del_rcu - deletes entry from hash list without re-initialization + * @entry: the element to delete from the list. + * + * Note: list_empty on entry does not return true after this, + * the entry is in an undefined state. It is useful for RCU based + * lockfree traversal. + * + * In particular, it means that we can not poison the forward + * pointers that may still be used for path walking. + */ +static inline void hlist_del_rcu(struct hlist_node *n) +{ + __hlist_del(n); + n->pprev = LIST_POISON2; +} static __inline__ void hlist_del_init(struct hlist_node *n) { ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 16:18 ` Linus Torvalds @ 2003-06-12 16:35 ` Dipankar Sarma 2003-06-12 16:47 ` Linus Torvalds 2003-06-13 12:48 ` Maneesh Soni 1 sibling, 1 reply; 22+ messages in thread From: Dipankar Sarma @ 2003-06-12 16:35 UTC (permalink / raw) To: Linus Torvalds Cc: John M Flinchbaugh, linux-kernel, Trond Myklebust, Maneesh Soni, Andrew Morton On Thu, Jun 12, 2003 at 09:18:11AM -0700, Linus Torvalds wrote: > > On Thu, 12 Jun 2003, Linus Torvalds wrote: > > > > If you depend on not re-initializing the pointers, you should not use the > > "xxx_del()" function, and you should document it. > > Besides, the code doesn't actually depend on not re-initializing the > pointers, it depends on the _forward_ pointers still being walkable in > case some other CPU is traversing the list just as we remove the entry. > > Which means that I think the proper patch is to (a) document this and also > (b) poison the back pointer. That should work. However, I do have once concern. At the generic list macro level, we don't know if the lockfree traversal is being done in forward or backward direction. So, I am not sure if list_del_rcu() should poison the backward pointer or atleast document the fact that RCU-based traversal would not work on backward pointers. hlist_del_rcu() can indeed poison pprev. > > A patch like the attached, in short. Since we are at it, I can submit a patch documenting the rest of hlist functions, if you like. Thanks Dipankar ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 16:35 ` Dipankar Sarma @ 2003-06-12 16:47 ` Linus Torvalds 0 siblings, 0 replies; 22+ messages in thread From: Linus Torvalds @ 2003-06-12 16:47 UTC (permalink / raw) To: Dipankar Sarma Cc: John M Flinchbaugh, linux-kernel, Trond Myklebust, Maneesh Soni, Andrew Morton On Thu, 12 Jun 2003, Dipankar Sarma wrote: > > That should work. However, I do have once concern. At the generic > list macro level, we don't know if the lockfree traversal is being > done in forward or backward direction. Sure we do. We do have backwards list traversal, but it's already not available for the RCU case (ie we have "list_for_each_entry_rcu()", but we don't have "list_for_each_entry_reverse_rcu()"). Of course, somebody may be using the non-RCU versions of the list traversal macros on a RCU list, but that would be a bug anyway. Linus ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 2.5.70-bk16: nfs crash 2003-06-12 16:18 ` Linus Torvalds 2003-06-12 16:35 ` Dipankar Sarma @ 2003-06-13 12:48 ` Maneesh Soni 1 sibling, 0 replies; 22+ messages in thread From: Maneesh Soni @ 2003-06-13 12:48 UTC (permalink / raw) To: Linus Torvalds Cc: Dipankar Sarma, John M Flinchbaugh, linux-kernel, Trond Myklebust, Andrew Morton On Thu, Jun 12, 2003 at 09:18:11AM -0700, Linus Torvalds wrote: > > [..] > --- > ===== include/linux/dcache.h 1.32 vs edited ===== > --- 1.32/include/linux/dcache.h Tue Jun 10 14:56:43 2003 > +++ edited/include/linux/dcache.h Thu Jun 12 09:12:27 2003 > @@ -174,8 +174,10 @@ > > static inline void __d_drop(struct dentry *dentry) > { > - dentry->d_vfs_flags |= DCACHE_UNHASHED; > - hlist_del_rcu_init(&dentry->d_hash); > + if (!(dentry->d_vfs_flags & DCACHE_UNHASHED)) { > + dentry->d_vfs_flags |= DCACHE_UNHASHED; > + hlist_del_rcu(&dentry->d_hash); > + } > } Looks like there is some problem in this. With this conditional d_drop, umounting an NFS mount goes in a loop and oopses in dput() This is on 2.5.70-bk17. Unable to handle kernel paging request at virtual address 00100104 printing eip: c016f9b1 *pde = 00000000 Oops: 0002 [#1] CPU: 3 EIP: 0060:[<c016f9b1>] Not tainted EFLAGS: 00010246 EIP is at dput+0x1f1/0x350 eax: 00100100 ebx: f72a8c80 ecx: f72a8c9c edx: 00200200 esi: f64d2000 edi: f72a8c88 ebp: 00000000 esp: f64d3e8c ds: 007b es: 007b ss: 0068 Process umount (pid: 926, threadinfo=f64d2000 task=f79d9900) Stack: 00000010 c016007b 0000007b ffffffef f72a8c80 f6448380 f64481e0 c02fccca f72a8c80 f6448380 f72a8cf8 f72a8820 f72a8820 f6448860 c035f8e0 c02fd1c0 f72a8820 f78775e0 f7fde560 f74971d9 00000005 10ee271a 00000010 00000000 Call Trace: [<c016007b>] bd_claim+0x2b/0xa0 [<c02fccca>] rpc_depopulate+0x16a/0x190 [<c02fd1c0>] rpc_rmdir+0x60/0xa0 [<c02f2a4d>] rpcauth_destroy+0xd/0x60 [<c02ec45d>] rpc_destroy_client+0x4d/0x70 [<c01b6427>] nfs_put_super+0x17/0x40 [<c015e03f>] generic_shutdown_super+0xbf/0x200 [<c015eec0>] kill_anon_super+0x10/0x80 [<c01b8811>] nfs_kill_super+0x11/0x20 [<c015dcf9>] deactivate_super+0xa9/0x140 [<c0175d08>] __mntput+0x18/0x30 [<c0165e79>] path_release+0x29/0x30 [<c0176641>] sys_umount+0x81/0x90 [<c015710f>] sys_close+0x9f/0x100 [<c017665c>] sys_oldumount+0xc/0x10 [<c0109477>] syscall_call+0x7/0xb Removing the DCACHE_UNHASHED check makes it work again. Needs more investigation. -- Maneesh Soni IBM Linux Technology Center, IBM India Software Lab, Bangalore. Phone: +91-80-5044999 email: maneesh@in.ibm.com http://lse.sourceforge.net/ ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2003-06-13 12:32 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-06-12 12:56 2.5.70-bk16: nfs crash John M Flinchbaugh 2003-06-12 13:52 ` Dipankar Sarma 2003-06-12 15:33 ` Dipankar Sarma 2003-06-12 15:35 ` Trond Myklebust 2003-06-12 15:53 ` Dipankar Sarma 2003-06-12 16:26 ` Trond Myklebust 2003-06-12 16:49 ` Linus Torvalds 2003-06-12 16:55 ` Linus Torvalds 2003-06-12 19:53 ` Dipankar Sarma 2003-06-13 5:24 ` Trond Myklebust 2003-06-13 5:50 ` Dipankar Sarma 2003-06-13 6:13 ` Trond Myklebust 2003-06-13 6:54 ` Dipankar Sarma 2003-06-13 6:06 ` Dipankar Sarma 2003-06-12 16:30 ` viro 2003-06-12 16:55 ` Dipankar Sarma 2003-06-12 15:49 ` Linus Torvalds 2003-06-12 16:05 ` Dipankar Sarma 2003-06-12 16:18 ` Linus Torvalds 2003-06-12 16:35 ` Dipankar Sarma 2003-06-12 16:47 ` Linus Torvalds 2003-06-13 12:48 ` Maneesh Soni
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.