public inbox for linux-kernel@vger.kernel.org
* [BUG?] possible recursive locking detected
@ 2006-07-26 16:05 Rolf Eike Beer
  2006-07-27  5:53 ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Rolf Eike Beer @ 2006-07-26 16:05 UTC (permalink / raw)
  To: linux-kernel


Hi,

I ran a memory stress test from userspace (allocating and mlock()ing a huge
number of pages). At the very beginning of the run I got the report below, long
before the system became unresponsive and the OOM killer kicked in.

Eike

=============================================
[ INFO: possible recursive locking detected ]
kded/5304 is trying to acquire lock:
 (&inode->i_mutex){--..}, at: [<c11f476e>] mutex_lock+0x21/0x24

but task is already holding lock:
 (&inode->i_mutex){--..}, at: [<c11f476e>] mutex_lock+0x21/0x24

other info that might help us debug this:
3 locks held by kded/5304:
 #0:  (&inode->i_mutex){--..}, at: [<c11f476e>] mutex_lock+0x21/0x24
 #1:  (shrinker_rwsem){----}, at: [<c1046312>] shrink_slab+0x25/0x136
 #2:  (&type->s_umount_key#14){----}, at: [<c106be2e>] prune_dcache+0xf6/0x144

stack backtrace:
 [<c1003aa9>] show_trace_log_lvl+0x54/0xfd
 [<c1004915>] show_trace+0xd/0x10
 [<c100492f>] dump_stack+0x17/0x1c
 [<c102e0e1>] __lock_acquire+0x753/0x99c
 [<c102e5ac>] lock_acquire+0x4a/0x6a
 [<c11f4609>] __mutex_lock_slowpath+0xb0/0x1f4
 [<c11f476e>] mutex_lock+0x21/0x24
 [<f0854fc4>] ntfs_put_inode+0x3b/0x74 [ntfs]
 [<c106cf3f>] iput+0x33/0x6a
 [<c106b707>] dentry_iput+0x5b/0x73
 [<c106bd15>] prune_one_dentry+0x56/0x79
 [<c106be42>] prune_dcache+0x10a/0x144
 [<c106be95>] shrink_dcache_memory+0x19/0x31
 [<c10463bd>] shrink_slab+0xd0/0x136
 [<c1047494>] try_to_free_pages+0x129/0x1d5
 [<c1043d91>] __alloc_pages+0x18e/0x284
 [<c104044b>] read_cache_page+0x59/0x131
 [<c109e96f>] ext2_get_page+0x1c/0x1ff
 [<c109ebc4>] ext2_find_entry+0x72/0x139
 [<c109ec99>] ext2_inode_by_name+0xe/0x2e
 [<c10a1cad>] ext2_lookup+0x1f/0x65
 [<c1064661>] do_lookup+0xa0/0x134
 [<c1064e9a>] __link_path_walk+0x7a5/0xbe4
 [<c1065329>] link_path_walk+0x50/0xca
 [<c106586d>] do_path_lookup+0x212/0x25a
 [<c1065da9>] __user_walk_fd+0x2d/0x41
 [<c10600bd>] vfs_stat_fd+0x19/0x40
 [<c10600f5>] vfs_stat+0x11/0x13
 [<c1060826>] sys_stat64+0x14/0x2a
 [<c1002845>] sysenter_past_esp+0x56/0x8d


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [BUG?] possible recursive locking detected
  2006-07-26 16:05 [BUG?] possible recursive locking detected Rolf Eike Beer
@ 2006-07-27  5:53 ` Andrew Morton
  2006-07-27  6:51   ` Nick Piggin
  2006-07-27  7:29   ` Arjan van de Ven
  0 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2006-07-27  5:53 UTC (permalink / raw)
  To: Rolf Eike Beer; +Cc: linux-kernel, Anton Altaparmakov

On Wed, 26 Jul 2006 18:05:21 +0200
Rolf Eike Beer <eike-kernel@sf-tec.de> wrote:

> Hi,
> 
> I ran a memory stress test from userspace (allocating and mlock()ing a huge
> number of pages). At the very beginning of the run I got the report below, long
> before the system became unresponsive and the OOM killer kicked in.
> 
> Eike
> 
> =============================================
> [ INFO: possible recursive locking detected ]
> kded/5304 is trying to acquire lock:
>  (&inode->i_mutex){--..}, at: [<c11f476e>] mutex_lock+0x21/0x24
> 
> but task is already holding lock:
>  (&inode->i_mutex){--..}, at: [<c11f476e>] mutex_lock+0x21/0x24
> 
> other info that might help us debug this:
> 3 locks held by kded/5304:
>  #0:  (&inode->i_mutex){--..}, at: [<c11f476e>] mutex_lock+0x21/0x24
>  #1:  (shrinker_rwsem){----}, at: [<c1046312>] shrink_slab+0x25/0x136
>  #2:  (&type->s_umount_key#14){----}, at: [<c106be2e>] prune_dcache+0xf6/0x144
> 
> stack backtrace:
>  [<c1003aa9>] show_trace_log_lvl+0x54/0xfd
>  [<c1004915>] show_trace+0xd/0x10
>  [<c100492f>] dump_stack+0x17/0x1c
>  [<c102e0e1>] __lock_acquire+0x753/0x99c
>  [<c102e5ac>] lock_acquire+0x4a/0x6a
>  [<c11f4609>] __mutex_lock_slowpath+0xb0/0x1f4
>  [<c11f476e>] mutex_lock+0x21/0x24
>  [<f0854fc4>] ntfs_put_inode+0x3b/0x74 [ntfs]
>  [<c106cf3f>] iput+0x33/0x6a
>  [<c106b707>] dentry_iput+0x5b/0x73
>  [<c106bd15>] prune_one_dentry+0x56/0x79
>  [<c106be42>] prune_dcache+0x10a/0x144
>  [<c106be95>] shrink_dcache_memory+0x19/0x31
>  [<c10463bd>] shrink_slab+0xd0/0x136
>  [<c1047494>] try_to_free_pages+0x129/0x1d5
>  [<c1043d91>] __alloc_pages+0x18e/0x284
>  [<c104044b>] read_cache_page+0x59/0x131
>  [<c109e96f>] ext2_get_page+0x1c/0x1ff
>  [<c109ebc4>] ext2_find_entry+0x72/0x139
>  [<c109ec99>] ext2_inode_by_name+0xe/0x2e
>  [<c10a1cad>] ext2_lookup+0x1f/0x65
>  [<c1064661>] do_lookup+0xa0/0x134
>  [<c1064e9a>] __link_path_walk+0x7a5/0xbe4
>  [<c1065329>] link_path_walk+0x50/0xca
>  [<c106586d>] do_path_lookup+0x212/0x25a
>  [<c1065da9>] __user_walk_fd+0x2d/0x41
>  [<c10600bd>] vfs_stat_fd+0x19/0x40
>  [<c10600f5>] vfs_stat+0x11/0x13
>  [<c1060826>] sys_stat64+0x14/0x2a
>  [<c1002845>] sysenter_past_esp+0x56/0x8d

We hold the ext2 directory mutex, and ntfs_put_inode is trying to take an
ntfs i_mutex.  Not a deadlock as such, but it could become one in ntfs if
ntfs ever does a __GFP_WAIT allocation inside i_mutex, which it surely
does.
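
A minimal userspace sketch of why the report fires even though two different
inodes are involved, assuming only that the check is keyed on the lock class
rather than on the individual lock (both mutexes print as &inode->i_mutex in
the report). The names held_stack and model_acquire and the class ids are
hypothetical and are not kernel API.

```c
#include <stddef.h>

/* Hypothetical lock classes. In the report both mutexes print as
 * &inode->i_mutex, i.e. the ext2 and ntfs inode mutexes share one class. */
enum lock_class { CLASS_I_MUTEX, CLASS_SHRINKER_RWSEM, CLASS_S_UMOUNT };

#define MAX_HELD 8

struct held_stack {
    enum lock_class cls[MAX_HELD];  /* classes of the locks currently held */
    size_t depth;
};

/* Returns 1 if a lock of the same class is already held (the condition the
 * report calls "possible recursive locking"), 0 if not, and -1 if the stack
 * is full. The lock is recorded as held unless the stack is full. */
static int model_acquire(struct held_stack *h, enum lock_class c)
{
    int same_class_held = 0;
    size_t i;

    if (h->depth == MAX_HELD)
        return -1;
    for (i = 0; i < h->depth; i++)
        if (h->cls[i] == c)
            same_class_held = 1;
    h->cls[h->depth++] = c;
    return same_class_held;
}
```

Replaying the four acquisitions from the report (i_mutex, shrinker_rwsem,
s_umount, then i_mutex again) makes the last call return 1, although the
first and last locks belong to different inodes.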



* Re: [BUG?] possible recursive locking detected
  2006-07-27  5:53 ` Andrew Morton
@ 2006-07-27  6:51   ` Nick Piggin
  2006-07-27  7:15     ` Anton Altaparmakov
  2006-07-27  7:24     ` Andrew Morton
  2006-07-27  7:29   ` Arjan van de Ven
  1 sibling, 2 replies; 18+ messages in thread
From: Nick Piggin @ 2006-07-27  6:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rolf Eike Beer, linux-kernel, Anton Altaparmakov

Andrew Morton wrote:
> We hold the ext2 directory mutex, and ntfs_put_inode is trying to take an
> ntfs i_mutex.  Not a deadlock as such, but it could become one in ntfs if
> ntfs ever does a __GFP_WAIT allocation inside i_mutex, which it surely
> does.

Though it should be using GFP_NOFS, right? So the dcache shrinker would
not reenter the fs in that case.

I'm surprised ext2 is allocating with __GFP_FS set, though. Would that
cause any problem?
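
The point about GFP_NOFS can be sketched in userspace as below. This is a
model under stated assumptions, not the kernel's code: the bit values are
made up for illustration, and model_shrink_dcache is a hypothetical stand-in
for a dcache-style shrinker that refuses to run when the caller's mask does
not allow entering the filesystem.

```c
/* Illustrative bit values only; the kernel's real GFP bits are different. */
#define MODEL_GFP_WAIT 0x1u  /* allocation may sleep */
#define MODEL_GFP_IO   0x2u  /* reclaim may start disk I/O */
#define MODEL_GFP_FS   0x4u  /* reclaim may call into filesystem code */

#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/* Model of a dcache-style shrinker: returns the number of objects it would
 * prune, or -1 when the caller's mask forbids re-entering the filesystem. */
static int model_shrink_dcache(int nr, unsigned int gfp_mask)
{
    if (!(gfp_mask & MODEL_GFP_FS))
        return -1;   /* NOFS caller: do not touch dentries or inodes */
    return nr;       /* FS allowed: pruning (and iput) may happen */
}
```

With a NOFS-style mask the model shrinker bails out before any inode is
released, so the put_inode path seen in the backtrace would not be reached.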

-- 
SUSE Labs, Novell Inc.


* Re: [BUG?] possible recursive locking detected
  2006-07-27  6:51   ` Nick Piggin
@ 2006-07-27  7:15     ` Anton Altaparmakov
  2006-07-27  7:38       ` Andrew Morton
  2006-07-27  7:24     ` Andrew Morton
  1 sibling, 1 reply; 18+ messages in thread
From: Anton Altaparmakov @ 2006-07-27  7:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Rolf Eike Beer, linux-kernel, Anton Altaparmakov

On Thu, 2006-07-27 at 16:51 +1000, Nick Piggin wrote:
> Andrew Morton wrote:
> > We hold the ext2 directory mutex, and ntfs_put_inode is trying to take an
> > ntfs i_mutex.  Not a deadlock as such, but it could become one in ntfs if
> > ntfs ever does a __GFP_WAIT allocation inside i_mutex, which it surely
> > does.

Yes we do use __GFP_WAIT but the only alternative is to panic() and I
certainly prefer a __GFP_WAIT allocation that can deadlock in some cases
compared to a 100% certain panic() to kill the system...

> Though it should be using GFP_NOFS, right? So the dcache shrinker would
> not reenter the fs in that case.

NTFS _always_ sets at least GFP_NOFS.

> I'm surprised ext2 is allocating with __GFP_FS set, though. Would that
> cause any problem?

That is an ext2 bug IMO.  A file system should always use GFP_NOFS
otherwise it is asking for trouble.

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://www.linux-ntfs.org/ & http://www-stu.christs.cam.ac.uk/~aia21/



* Re: [BUG?] possible recursive locking detected
  2006-07-27  6:51   ` Nick Piggin
  2006-07-27  7:15     ` Anton Altaparmakov
@ 2006-07-27  7:24     ` Andrew Morton
  1 sibling, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2006-07-27  7:24 UTC (permalink / raw)
  To: Nick Piggin; +Cc: eike-kernel, linux-kernel, aia21

On Thu, 27 Jul 2006 16:51:29 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > We hold the ext2 directory mutex, and ntfs_put_inode is trying to take an
> > ntfs i_mutex.  Not a deadlock as such, but it could become one in ntfs if
> > ntfs ever does a __GFP_WAIT allocation inside i_mutex, which it surely
> > does.
> 
> Though it should be using GFP_NOFS, right? So the dcache shrinker would
> not reenter the fs in that case.

Sort-of, arguably.  Many years ago, holding i_mutex (i_sem) was considered
to be "in the fs" and one should use GFP_NOFS.

(This code dates from the ext2 directory-in-pagecache conversion - it's
2.4 stuff.)

It's better, of course, to use GFP_HIGHUSER for pagecache so we should aim
to get this working.  And that means don't-take-i_mutex-on-the-reclaim-path.

We quite possibly are doing that in other places, too.

> I'm surprised ext2 is allocating with __GFP_FS set, though. Would that
> cause any problem?

It might, if ext2 takes i_mutex on the reclaim path.  But it doesn't.



* Re: [BUG?] possible recursive locking detected
  2006-07-27  5:53 ` Andrew Morton
  2006-07-27  6:51   ` Nick Piggin
@ 2006-07-27  7:29   ` Arjan van de Ven
  1 sibling, 0 replies; 18+ messages in thread
From: Arjan van de Ven @ 2006-07-27  7:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rolf Eike Beer, linux-kernel, Anton Altaparmakov

On Wed, 2006-07-26 at 22:53 -0700, Andrew Morton wrote:
> We hold the ext2 directory mutex, and ntfs_put_inode is trying to take an
> ntfs i_mutex.  Not a deadlock as such, but it could become one in ntfs if
> ntfs ever does a __GFP_WAIT allocation inside i_mutex, which it surely
> does.

I talked with Al about this one briefly yesterday; he does not consider it
a bug, but personally I'd be a lot happier if ext2 used GFP_NOFS for
this allocation. Well, not so much ext2 as read_cache_page(), which is
generic code....


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com



* Re: [BUG?] possible recursive locking detected
  2006-07-27  7:15     ` Anton Altaparmakov
@ 2006-07-27  7:38       ` Andrew Morton
  2006-07-27  8:19         ` Anton Altaparmakov
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2006-07-27  7:38 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: nickpiggin, eike-kernel, linux-kernel, aia21

On Thu, 27 Jul 2006 08:15:27 +0100
Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> > I'm surprised ext2 is allocating with __GFP_FS set, though. Would that
> > cause any problem?
> 
> That is an ext2 bug IMO.

There is no bug.

What there is is an ill-defined set of rules.  If we want to tighten these
rules we have a choice between

a) Never enter page reclaim while holding i_mutex or

b) never take i_mutex on the page reclaim path.


Implementing a) would be a disaster.  It means that our main write()
implementation in mm/filemap.c (which holds i_mutex) wouldn't be able to
reclaim pages to satisfy the write.  And generally, we do want to use the
strongest memory allocation mode at all times.

So we'll have a better kernel if we implement b).


* Re: [BUG?] possible recursive locking detected
  2006-07-27  7:38       ` Andrew Morton
@ 2006-07-27  8:19         ` Anton Altaparmakov
  2006-07-27  8:53           ` Andrew Morton
  2006-07-27  9:18           ` Nick Piggin
  0 siblings, 2 replies; 18+ messages in thread
From: Anton Altaparmakov @ 2006-07-27  8:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, eike-kernel, linux-kernel, aia21

On Thu, 2006-07-27 at 00:38 -0700, Andrew Morton wrote:
> On Thu, 27 Jul 2006 08:15:27 +0100
> Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> 
> > > I'm surprised ext2 is allocating with __GFP_FS set, though. Would that
> > > cause any problem?
> > 
> > That is an ext2 bug IMO.
> 
> There is no bug.
> 
> What there is is an ill-defined set of rules.  If we want to tighten these
> rules we have a choice between

I beg to differ.  It is a bug.  You cannot reenter the file system when
the file system is trying to allocate memory.  Otherwise you can never
allocate memory with any locks held or you are bound to introduce an
A->B B->A deadlock somewhere.
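
The A->B B->A pattern can be illustrated with a small userspace sketch. It
only records which lock was held while another was taken and flags the
moment the opposite order shows up; order_graph and model_record_order are
hypothetical names, and this is not how any kernel checker is implemented.

```c
#define NLOCKS 4

struct order_graph {
    /* before[a][b] != 0 means: lock a was held while lock b was taken */
    unsigned char before[NLOCKS][NLOCKS];
};

/* Record that 'held' was held while 'next' was taken. Returns 1 if the
 * opposite order (next held while taking held) was recorded earlier,
 * which is the A->B B->A inversion, and 0 otherwise. */
static int model_record_order(struct order_graph *g, int held, int next)
{
    int inverted = g->before[next][held];

    g->before[held][next] = 1;
    return inverted;
}
```

Taking A then B any number of times is fine; the first B then A sequence
afterwards is reported, even if the two sequences never actually raced.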

> a) Never enter page reclaim while holding i_mutex or
> 
> b) never take i_mutex on the page reclaim path.
> 
> 
> Implementing a) would be a disaster.  It means that our main write()
> implementation in mm/filemap.c (which holds i_mutex) wouldn't be able to
> reclaim pages to satisfy the write.  And generally, we do want to use the
> strongest memory allocation mode at all times.
> 
> So we'll have a better kernel if we implement b).

b) is impossible for ntfs.  The only potential partial solution would be
to make ntfs use trylock everywhere and if that fails abort whatever the
VFS is trying to do with an EAGAIN and it is up to the VFS to deal with
it.  That would require quite a big rewrite of the VFS given a lot of FS
methods invoked by the VFS do not have a return value at the moment...
For example put_inode and clear_inode would need to accept EAGAIN and
would have to undo what was done before them so they could be retried
later.  To steal your own words, "we would have a better kernel" if the
VFS did error handling and recovery properly...  (-;

Several random examples:

- NTFS holds metadata in page cache pages so if you want to reclaim a
page ntfs has to flush the metadata to disk.  It cannot do that without
holding locks.

- When an inode has to be flushed out due to clear_inode, ntfs has to
take i_mutex in order to write the inode to disk properly.  This
involves locking all parent directories of the inode in turn and
updating the directory entry for that inode.  It has to be done
because the directory entry in ntfs contains not only the filename but
also the stat information such as file size, allocation size on disk,
a/m/c/C time etc., and chkdsk in Windows complains if those are not
up to date.
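
The trylock idea above can be sketched in userspace as follows. This is a
model only: model_inode and model_put_inode_trylock are hypothetical, a C11
atomic_flag stands in for i_mutex, and the real put_inode and clear_inode
methods return void, which is exactly the limitation described above.

```c
#include <errno.h>
#include <stdatomic.h>

struct model_inode {
    atomic_flag i_mutex;  /* stands in for the inode mutex */
    int dirty;            /* 1 while the inode still needs writing out */
};

/* Hypothetical put_inode variant that is allowed to refuse: it tries the
 * lock once and returns -EAGAIN instead of blocking, leaving the inode
 * untouched so the caller can retry later. */
static int model_put_inode_trylock(struct model_inode *inode)
{
    if (atomic_flag_test_and_set(&inode->i_mutex))
        return -EAGAIN;              /* lock busy: nothing was changed */
    inode->dirty = 0;                /* "write the inode to disk" */
    atomic_flag_clear(&inode->i_mutex);
    return 0;
}
```

The caller has to be prepared to see -EAGAIN and to undo or postpone its own
work, which is where the VFS changes mentioned above would come in.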

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://www.linux-ntfs.org/ & http://www-stu.christs.cam.ac.uk/~aia21/



* Re: [BUG?] possible recursive locking detected
  2006-07-27  8:19         ` Anton Altaparmakov
@ 2006-07-27  8:53           ` Andrew Morton
  2006-07-27  9:28             ` Anton Altaparmakov
  2006-07-27  9:18           ` Nick Piggin
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2006-07-27  8:53 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: nickpiggin, eike-kernel, linux-kernel, aia21

On Thu, 27 Jul 2006 09:19:58 +0100
Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> b) is impossible for ntfs.

ntfs write() is already doing GFP_HIGHUSER allocations inside i_mutex.

Presumably there's some reason why it isn't deadlocking at present.  Could
be that we'll end up deciding to make lockdep shut up about cross-fs
i_mutex-takings, but that's a bit lame because if some other fs starts
taking i_mutex in the reclaim path we're exposed to ab/ba deadlocks, and
they won't be reported.

But sorry, we just cannot go and require that write()'s pagecache
allocations not be able to write dirty data, not be able to strip buffers
from clean pages and not be able to reclaim slab.


* Re: [BUG?] possible recursive locking detected
  2006-07-27  8:19         ` Anton Altaparmakov
  2006-07-27  8:53           ` Andrew Morton
@ 2006-07-27  9:18           ` Nick Piggin
  2006-07-27  9:35             ` Anton Altaparmakov
  1 sibling, 1 reply; 18+ messages in thread
From: Nick Piggin @ 2006-07-27  9:18 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Andrew Morton, eike-kernel, linux-kernel, aia21

Anton Altaparmakov wrote:
> On Thu, 2006-07-27 at 00:38 -0700, Andrew Morton wrote:
> 
>>On Thu, 27 Jul 2006 08:15:27 +0100
>>Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>>
>>
>>>>I'm surprised ext2 is allocating with __GFP_FS set, though. Would that
>>>>cause any problem?
>>>
>>>That is an ext2 bug IMO.
>>
>>There is no bug.
>>
>>What there is is an ill-defined set of rules.  If we want to tighten these
>>rules we have a choice between
> 
> 
> I beg to differ.  It is a bug.  You cannot reenter the file system when
> the file system is trying to allocate memory.  Otherwise you can never
> allocate memory with any locks held or you are bound to introduce an
> A->B B->A deadlock somewhere.

I don't think it is a bug in general. It really depends on the allocation:

- If it is a path that might be required in order to writeout a page, then
yes GFP_NOFS is going to help prevent deadlocks.

- If it is a path where you'll take the same locks as page reclaim requires,
then again GFP_NOFS is required.

For NTFS case, it seems like holding i_mutex on the write path falls foul
of the second problem. But I agree with Andrew that this is a critical case
where we do have to enter the fs. GFP_NOFS is too big a hammer to use.

I guess you'd have to change NTFS to do something sane privately, or come
up with a nice general solution that doesn't harm the common filesystems
that apparently don't have a problem here... can you just add GFP_NOFS to
NTFS's mapping_gfp_mask to start with?
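
One reading of that last suggestion, sketched in userspace: keep a
per-mapping allocation mask and clear only the "may enter the fs" bit for
the one filesystem that needs it. The bit values and the names model_mapping
and model_mapping_set_nofs are assumptions made for illustration, not the
kernel's definitions.

```c
/* Illustrative bit values only, not the kernel's definitions. */
#define MODEL_GFP_WAIT    0x1u
#define MODEL_GFP_IO      0x2u
#define MODEL_GFP_FS      0x4u
#define MODEL_GFP_HIGHMEM 0x8u

#define MODEL_GFP_HIGHUSER \
    (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS | MODEL_GFP_HIGHMEM)

struct model_mapping {
    unsigned int gfp_mask;  /* mask used for this mapping's pagecache pages */
};

/* Restrict one mapping to NOFS-style allocations by clearing the FS bit.
 * Every other bit, including highmem, is left alone. */
static void model_mapping_set_nofs(struct model_mapping *m)
{
    m->gfp_mask &= ~MODEL_GFP_FS;
}
```

Only mappings that opt in lose the FS bit, so other filesystems would keep
allocating with the stronger mask.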

-- 
SUSE Labs, Novell Inc.


* Re: [BUG?] possible recursive locking detected
  2006-07-27  8:53           ` Andrew Morton
@ 2006-07-27  9:28             ` Anton Altaparmakov
  2006-07-27  9:46               ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Anton Altaparmakov @ 2006-07-27  9:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, eike-kernel, linux-kernel, aia21

On Thu, 2006-07-27 at 01:53 -0700, Andrew Morton wrote:
> On Thu, 27 Jul 2006 09:19:58 +0100
> Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> 
> > b) is impossible for ntfs.
> 
> ntfs write() is already doing GFP_HIGHUSER allocations inside i_mutex.

Yes.

> Presumably there's some reason why it isn't deadlocking at present.

NTFS always also supplies GFP_NOFS so it should never deadlock on itself
(except in cases where it has no control of memory allocations because
VFS / kernel functions do the allocation).

I would assume the likelihood of a cross-fs ab/ba deadlock is so small
that it effectively never happens...

> Could
> be that we'll end up deciding to make lockdep shut up about cross-fs
> i_mutex-takings, but that's a bit lame because if some other fs starts
> taking i_mutex in the reclaim path we're exposed to ab/ba deadlocks, and
> they won't be reported.

True, but I think that will be the only solution to get rid of the
messages.  It is not the only place in the kernel where there are
theoretical problems in the code, but it is considered acceptable because
it happens very rarely.  I feel this is such a case.

An example is the potential deadlock in generic buffered file write
where we fault in a page via fault_in_pages_readable() but there is
nothing to guarantee that page will not go away between us doing this
and us using the page.

This deadlock is hit a lot when using (unmodified) reiserfs, and even ext3
is affected on servers that run very high i/o loads.  ("A lot" being
between once a day and once a week; it requires a reboot to sort out
in the unmodified reiserfs case, but we have a local patch that improves
matters dramatically.)

And still the code is there and no-one complains as the real solutions
are either too horrible to contemplate or need a lot of intricate
changes to the kernel's page handling (e.g. like a "page is pinned" flag
that then needs to be honoured in all the right places)...

> But sorry, we just cannot go and require that write()'s pagecache
> allocations not be able to write dirty data, not be able to strip buffers
> from clean pages and not be able to reclaim slab.

As I said, perhaps all those code paths need to be able to take a "no"
for an answer from the FS, and the FS needs to try-lock and abort if it
fails.  The problem at the moment is that the FS must succeed or
deadlock trying (or panic, or silently cause errors on the file system;
take your pick of what you consider the least bad).

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://www.linux-ntfs.org/ & http://www-stu.christs.cam.ac.uk/~aia21/



* Re: [BUG?] possible recursive locking detected
  2006-07-27  9:18           ` Nick Piggin
@ 2006-07-27  9:35             ` Anton Altaparmakov
  2006-07-27 10:02               ` Nick Piggin
  0 siblings, 1 reply; 18+ messages in thread
From: Anton Altaparmakov @ 2006-07-27  9:35 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, eike-kernel, linux-kernel, aia21

On Thu, 2006-07-27 at 19:18 +1000, Nick Piggin wrote:
> Anton Altaparmakov wrote:
> > On Thu, 2006-07-27 at 00:38 -0700, Andrew Morton wrote:
> > 
> >>On Thu, 27 Jul 2006 08:15:27 +0100
> >>Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> >>
> >>
> >>>>I'm surprised ext2 is allocating with __GFP_FS set, though. Would that
> >>>>cause any problem?
> >>>
> >>>That is an ext2 bug IMO.
> >>
> >>There is no bug.
> >>
> >>What there is is an ill-defined set of rules.  If we want to tighten these
> >>rules we have a choice between
> > 
> > 
> > I beg to differ.  It is a bug.  You cannot reenter the file system when
> > the file system is trying to allocate memory.  Otherwise you can never
> > allocate memory with any locks held or you are bound to introduce an
> > A->B B->A deadlock somewhere.
> 
> I don't think it is a bug in general. It really depends on the allocation:
> 
> - If it is a path that might be required in order to writeout a page, then
> yes GFP_NOFS is going to help prevent deadlocks.
> 
> - If it is a path where you'll take the same locks as page reclaim requires,
> then again GFP_NOFS is required.
> 
> For NTFS case, it seems like holding i_mutex on the write path falls foul
> of the second problem. But I agree with Andrew that this is a critical case
> where we do have to enter the fs. GFP_NOFS is too big a hammer to use.
> 
> I guess you'd have to change NTFS to do something sane privately, or come
> up with a nice general solution that doesn't harm the common filesystems
> that apparently don't have a problem here... can you just add GFP_NOFS to
> NTFS's mapping_gfp_mask to start with?

I don't think NTFS has a problem either.  It is a theoretical problem
with an extremely small chance of being hit.  I am happy to have such a
problem for now.  There are more pressing problems to solve.  The only
thing that needs to happen is for the messages to stop so people stop
complaining / getting worried about them...

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://www.linux-ntfs.org/ & http://www-stu.christs.cam.ac.uk/~aia21/



* Re: [BUG?] possible recursive locking detected
  2006-07-27  9:28             ` Anton Altaparmakov
@ 2006-07-27  9:46               ` Ingo Molnar
  2006-07-27 14:31                 ` Anton Altaparmakov
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2006-07-27  9:46 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Andrew Morton, nickpiggin, eike-kernel, linux-kernel, aia21,
	Arjan van de Ven


* Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> An example is the potential deadlock in generic buffered file write 
> where we fault in a page via fault_in_pages_readable() but there is 
> nothing to guarantee that page will not go away between us doing this 
> and us using the page.

isn't this solved by:

 commit 6527c2bdf1f833cc18e8f42bd97973d583e4aa83
 Author: Vladimir V. Saveliev <vs@namesys.com>
 Date:   Tue Jun 27 02:53:57 2006 -0700

     [PATCH] generic_file_buffered_write(): deadlock on vectored write

?

if not, do you have any description of the problem or a link to previous
discussion[s] outlining the problem? To me it appears this is a kernel
bug that we simply dropped the ball on fixing. I personally don't find
it acceptable to have deadlocks in the kernel, where all that is needed
to trigger them is "high i/o loads", no matter how hard it is to fix the
deadlock.

	Ingo


* Re: [BUG?] possible recursive locking detected
  2006-07-27  9:35             ` Anton Altaparmakov
@ 2006-07-27 10:02               ` Nick Piggin
  2006-07-27 12:30                 ` Anton Altaparmakov
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Piggin @ 2006-07-27 10:02 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Andrew Morton, eike-kernel, linux-kernel, aia21

Anton Altaparmakov wrote:
> On Thu, 2006-07-27 at 19:18 +1000, Nick Piggin wrote:
> 
>>Anton Altaparmakov wrote:

>>>I beg to differ.  It is a bug.  You cannot reenter the file system when
>>>the file system is trying to allocate memory.  Otherwise you can never
>>>allocate memory with any locks held or you are bound to introduce an
>>>A->B B->A deadlock somewhere.
>>
>>I don't think it is a bug in general. It really depends on the allocation:
>>
>>- If it is a path that might be required in order to writeout a page, then
>>yes GFP_NOFS is going to help prevent deadlocks.
>>
>>- If it is a path where you'll take the same locks as page reclaim requires,
>>then again GFP_NOFS is required.
>>
>>For NTFS case, it seems like holding i_mutex on the write path falls foul
>>of the second problem. But I agree with Andrew that this is a critical case
>>where we do have to enter the fs. GFP_NOFS is too big a hammer to use.
>>
>>I guess you'd have to change NTFS to do something sane privately, or come
>>up with a nice general solution that doesn't harm the common filesystems
>>that apparently don't have a problem here... can you just add GFP_NOFS to
>>NTFS's mapping_gfp_mask to start with?
> 
> 
> I don't think NTFS has a problem either.  It is a theoretical problem

No, I mean: *really* doesn't have a problem. If Andrew says ext2 doesn't
need i_mutex in reclaim, then I tend to believe him.

> with an extremely small chance of being hit.  I am happy to have such a
> problem for now.  There are more pressing problems to solve.  The only
> thing that needs to happen is for the messages to stop so people stop
> complaining / getting worried about them...

I guess the memory deadlock issue is probably mostly theoretical, although
it is still nice to get them fixed. I'd imagine a deadlock condition -- if
one really exists -- could be hit without much problem though. Page reclaim
will readily get kicked from the write(2) path, and will potentially free
*lots* of stuff from there.

If it isn't a problem for you, I'd suspect it might be due to some other
conditions which happen to mean it is avoided. For example, the inode whose
i_mutex you are holding will have a raised refcount AFAIK, so it will not
get reclaimed and so may get around your problem.

This would be a valid solution IMO. It probably could do with documentation
to outline the issues, though.
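
The refcount argument above can be sketched as a small userspace model, assuming (as the paragraph says, with an AFAIK) that reclaim only tears down inodes nobody holds a reference to. The `model_inode` structure and `model_prune()` are hypothetical names for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an in-core inode. */
struct model_inode {
	int refcount;	/* > 0 while in use, e.g. by a writer holding i_mutex */
	int reclaimed;	/* set once the model shrinker has torn it down */
};

/*
 * Walk the inodes and reclaim only the unpinned ones. A pinned inode is
 * skipped, so the teardown path (which is where its i_mutex would be
 * taken again) never runs for it. Returns the number reclaimed.
 */
static size_t model_prune(struct model_inode *inodes, size_t n)
{
	size_t freed = 0;

	assert(inodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (inodes[i].refcount > 0)
			continue;	/* still referenced: leave it alone */
		if (inodes[i].reclaimed)
			continue;	/* already gone */
		inodes[i].reclaimed = 1;
		freed++;
	}
	return freed;
}
```

In this model the inode being written to is never a reclaim candidate, which is why the writer cannot end up waiting on its own i_mutex. It says nothing about other inodes that share the same lock class, which is what the lock validator reports on.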

-- 
SUSE Labs, Novell Inc.


* Re: [BUG?] possible recursive locking detected
  2006-07-27 10:02               ` Nick Piggin
@ 2006-07-27 12:30                 ` Anton Altaparmakov
  0 siblings, 0 replies; 18+ messages in thread
From: Anton Altaparmakov @ 2006-07-27 12:30 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, eike-kernel, linux-kernel, aia21

On Thu, 2006-07-27 at 20:02 +1000, Nick Piggin wrote:
> Anton Altaparmakov wrote:
> > On Thu, 2006-07-27 at 19:18 +1000, Nick Piggin wrote:
> > 
> >>Anton Altaparmakov wrote:
> 
> >>>I beg to differ.  It is a bug.  You cannot reenter the file system when
> >>>the file system is trying to allocate memory.  Otherwise you can never
> >>>allocate memory with any locks held or you are bound to introduce an
> >>>A->B B->A deadlock somewhere.
> >>
> >>I don't think it is a bug in general. It really depends on the allocation:
> >>
> >>- If it is a path that might be required in order to writeout a page, then
> >>yes GFP_NOFS is going to help prevent deadlocks.
> >>
> >>- If it is a path where you'll take the same locks as page reclaim requires,
> >>then again GFP_NOFS is required.
> >>
> >>For NTFS case, it seems like holding i_mutex on the write path falls foul
> >>of the second problem. But I agree with Andrew that this is a critical case
> >>where we do have to enter the fs. GFP_NOFS is too big a hammer to use.
> >>
> >>I guess you'd have to change NTFS to do something sane privately, or come
> >>up with a nice general solution that doesn't harm the common filesystems
> >>that apparently don't have a problem here... can you just add GFP_NOFS to
> >>NTFS's mapping_gfp_mask to start with?
> > 
> > 
> > I don't think NTFS has a problem either.  It is a theoretical problem
> 
> No, I mean: *really* doesn't have a problem. If Andrew says ext2 doesn't
> need i_mutex in reclaim, then I tend to believe him.
> 
> > with an extremely small chance of being hit.  I am happy to have such a
> > problem for now.  There are more pressing problems to solve.  The only
> > thing that needs to happen is for the messages to stop so people stop
> > complaining / getting worried about them...
> 
> I guess the memory deadlock issue is probably mostly theoretical, although
> it is still nice to get them fixed. I'd imagine a deadlock condition -- if
> one really exists -- could be hit without much problem though. Page reclaim
> will readily get kicked from the write(2) path, and will potentially free
> *lots* of stuff from there.
> 
> If it isn't a problem for you, I'd suspect it might be due to some other
> conditions which happen to mean it is avoided. For example, the inode whose
> i_mutex you are holding will have a raised refcount AFAIK, so it will not
> get reclaimed and so may get around your problem.

That is true, yes.  So at least in that respect it should be safe.

> This would be a valid solution IMO. It probably could do with documentation
> to outline the issues, though.

That is true.

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://www.linux-ntfs.org/ & http://www-stu.christs.cam.ac.uk/~aia21/



* Re: [BUG?] possible recursive locking detected
  2006-07-27  9:46               ` Ingo Molnar
@ 2006-07-27 14:31                 ` Anton Altaparmakov
  2006-07-27 14:45                   ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Anton Altaparmakov @ 2006-07-27 14:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, nickpiggin, eike-kernel, linux-kernel, aia21,
	Arjan van de Ven

On Thu, 2006-07-27 at 11:46 +0200, Ingo Molnar wrote:
> * Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> 
> > An example is the potential deadlock in generic buffered file write 
> > where we fault in a page via fault_in_pages_readable() but there is 
> > nothing to guarantee that page will not go away between us doing this 
> > and us using the page.
> 
> isn't this solved by:
> 
>  commit 6527c2bdf1f833cc18e8f42bd97973d583e4aa83
>  Author: Vladimir V. Saveliev <vs@namesys.com>
>  Date:   Tue Jun 27 02:53:57 2006 -0700
> 
>      [PATCH] generic_file_buffered_write(): deadlock on vectored write
> 
> ?
> 
> if not, do you have any description of the problem or a link to previous 
> discussion[s] outlining the problem? To me it appears this is a kernel 
> bug where we simply dropped the ball on fixing it. I personally don't find 
> it acceptable to have deadlocks in the kernel, where all that is needed 
> to trigger them is "high i/o loads", no matter how hard it is to fix the 
> deadlock.

For reiserfs?  Certainly not given it doesn't use
generic_file_buffered_write() and instead does the most useless and
stupid thing known to mankind by causing the deadlock even more
effectively (it grabs and locks all the pages and _then_ calls
fault_in_pages_readable() afterwards!)...  In fact the way we stabilize
reiserfs is to make it use generic_file_write() which doesn't do such
stupidities...

Note that even the above patch is not a 100% solution.  What guarantees
are there that the page faulted in will still be around when it is read
a few lines down the line in the code?  Given sufficient parallel memory
pressure/io pressure it can still cause the page to be evicted again
immediately after it is faulted in...

All the above patch does is to _dramatically_ reduce the race window for
this happening but it does not eliminate it in theory (AFAICS).

So if your stance is that deadlocks are completely unacceptable it still
is not fixed.  If your stance is that _really_ unlikely deadlocks are
acceptable then it is fixed.

The heavily loaded servers here certainly do not suffer from the
deadlock any more (well, they haven't for a while anyway; no
guarantees it won't happen tomorrow).

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://www.linux-ntfs.org/ & http://www-stu.christs.cam.ac.uk/~aia21/



* Re: [BUG?] possible recursive locking detected
  2006-07-27 14:31                 ` Anton Altaparmakov
@ 2006-07-27 14:45                   ` Ingo Molnar
  2006-07-27 18:04                     ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2006-07-27 14:45 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Andrew Morton, nickpiggin, eike-kernel, linux-kernel, aia21,
	Arjan van de Ven


* Anton Altaparmakov <aia21@cam.ac.uk> wrote:

> Note that even the above patch is not a 100% solution.  What 
> guarantees are there that the page faulted in will still be around 
> when it is read a few lines down the line in the code?  Given 
> sufficient parallel memory pressure/io pressure it can still cause the 
> page to be evicted again immediately after it is faulted in...
>
> All the above patch does is to _dramatically_ reduce the race window 
> for this happening but it does not eliminate it in theory (AFAICS).
> 
> So if your stance is that deadlocks are completely unacceptable it 
> still is not fixed.  If your stance is that _really_ unlikely 
> deadlocks are acceptable then it is fixed.

my 'stance' is pretty common-sense: exploitable deadlocks (it's possible 
to force eviction of a page), or even hard-to-trigger but possible 
deadlocks (which are not associated with hopeless resource exhaustion) 
must be fixed.

couldn't we exclude the case of 'write writing to the same page it is 
reading from' abuse, to avoid the deadlock problem?

	Ingo


* Re: [BUG?] possible recursive locking detected
  2006-07-27 14:45                   ` Ingo Molnar
@ 2006-07-27 18:04                     ` Andrew Morton
  0 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2006-07-27 18:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: aia21, nickpiggin, eike-kernel, linux-kernel, aia21, arjan

On Thu, 27 Jul 2006 16:45:43 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> 
> > Note that even the above patch is not a 100% solution.  What 
> > guarantees are there that the page faulted in will still be around 
> > when it is read a few lines down the line in the code?  Given 
> > sufficient parallel memory pressure/io pressure it can still cause the 
> > page to be evicted again immediately after it is faulted in...
> >
> > All the above patch does is to _dramatically_ reduce the race window 
> > for this happening but it does not eliminate it in theory (AFAICS).
> > 
> > So if your stance is that deadlocks are completely unacceptable it 
> > still is not fixed.  If your stance is that _really_ unlikely 
> > deadlocks are acceptable then it is fixed.
> 
> my 'stance' is pretty common-sense: exploitable deadlocks (it's possible 
> to force eviction of a page), or even hard-to-trigger but possible 
> deadlocks (which are not associated with hopeless resource exhaustion) 
> must be fixed.

Yeah.  It's super-hard to hit though - I spent some time trying to do so
back in 2.5.<late> and was unable to do so.

And nobody is likely to hit it in production because nobody will go and
write() into a pagecache page from a mmapped copy of the same page
(surely?).  So it's the deliberately-triggered deadlocks we need to be
concerned about here.

That's for ext2/3.  I didn't know about the reiserfs problem.

> couldn't we exclude the case of 'write writing to the same page it is 
> reading from' abuse, to avoid the deadlock problem?

That would involve doing a follow_page() to get at the other pageframe.  If
we were to do that, we could just pin the page.  But we've always been
reluctant to add the cost of that.

I guess we could fix it by making the copy_to/from_user be atomic and if it
faults, drop the page lock, loop around and try again.
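
The retry idea in the paragraph above can be sketched in userspace as follows. This is only a model under stated assumptions: the `model_` types and functions are hypothetical, "resident" stands in for "the source page is mapped and will not fault", and the atomic copy is assumed to fail instead of sleeping when the source is not resident.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the user source buffer and the pagecache page. */
struct model_user_buf {
	const char *data;
	size_t len;
	bool resident;	/* false: touching it would take a page fault */
};

struct model_page {
	char data[64];
	bool locked;
};

/* Never sleeps: copies only if the source is resident. Returns bytes NOT copied. */
static size_t model_copy_from_user_atomic(struct model_page *pg,
					   const struct model_user_buf *src)
{
	if (!src->resident)
		return src->len;
	memcpy(pg->data, src->data, src->len);
	return 0;
}

/* Faulting the source in may sleep, so it must run without the page lock. */
static void model_fault_in(const struct model_page *pg,
			   struct model_user_buf *src)
{
	assert(!pg->locked);
	src->resident = true;
}

/* Lock, try the atomic copy, and on a fault unlock, fault in, and loop. */
static int model_write(struct model_page *pg, struct model_user_buf *src,
		       int *retries)
{
	if (src->len > sizeof(pg->data))
		return -1;
	*retries = 0;
	for (;;) {
		size_t left;

		pg->locked = true;
		left = model_copy_from_user_atomic(pg, src);
		pg->locked = false;
		if (left == 0)
			return 0;
		model_fault_in(pg, src);
		(*retries)++;
	}
}
```

The property the sketch is meant to show is that a fault is never serviced while the destination page is locked, so a source page that gets evicted between the pre-fault and the copy costs a retry rather than a deadlock.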

There's a more serious deadlock in there: an ab/ba deadlock between
journal_start() and lock_page().  It's hard to fix.


Thread overview: 18+ messages
2006-07-26 16:05 [BUG?] possible recursive locking detected Rolf Eike Beer
2006-07-27  5:53 ` Andrew Morton
2006-07-27  6:51   ` Nick Piggin
2006-07-27  7:15     ` Anton Altaparmakov
2006-07-27  7:38       ` Andrew Morton
2006-07-27  8:19         ` Anton Altaparmakov
2006-07-27  8:53           ` Andrew Morton
2006-07-27  9:28             ` Anton Altaparmakov
2006-07-27  9:46               ` Ingo Molnar
2006-07-27 14:31                 ` Anton Altaparmakov
2006-07-27 14:45                   ` Ingo Molnar
2006-07-27 18:04                     ` Andrew Morton
2006-07-27  9:18           ` Nick Piggin
2006-07-27  9:35             ` Anton Altaparmakov
2006-07-27 10:02               ` Nick Piggin
2006-07-27 12:30                 ` Anton Altaparmakov
2006-07-27  7:24     ` Andrew Morton
2006-07-27  7:29   ` Arjan van de Ven
