public inbox for linux-xfs@vger.kernel.org
* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
       [not found]   ` <20080622181011.GC625@csn.ul.ie>
@ 2008-06-22 18:14     ` Mel Gorman
  2008-06-22 18:54       ` Daniel J Blueman
       [not found]     ` <20080622112100.794b1ae1@infradead.org>
  1 sibling, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2008-06-22 18:14 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Christoph Lameter, Linus Torvalds, Alexander Beregalov,
	Linux Kernel, david, xfs

(Sorry for the resend; the wrong email address was used for Dave Chinner)

On (22/06/08 10:58), Daniel J Blueman didst pronounce:
> I'm seeing a similar issue [2] to what was recently reported [1] by
> Alexander, but with another workload involving XFS and memory
> pressure.
> 

Is NFS involved or is this XFS only? It looks like XFS-only but no harm in
being sure.

I'm beginning to wonder if this is a problem where a lot of dirty inodes are
being written back in this path and we stall while that happens. I'm still
not getting why we are triggering this now and did not before 2.6.26-rc1
or why it bisects to the zonelist modifications. Diffing the reclaim and
allocation paths between 2.6.25 and 2.6.26-rc1 has not yielded any candidates
for me yet that would explain this.

> SLUB allocator is in use and config is at http://quora.org/config-client-debug .
> 
> Let me know if you'd like more details/vmlinux objdump etc.
> 
> Thanks,
>  Daniel
> 
> --- [1]
> 
> http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/e673c9173d45a735/db9213ef39e4e11c
> 
> --- [2]
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.26-rc7-210c #2
> -------------------------------------------------------
> AutopanoPro/4470 is trying to acquire lock:
>  (iprune_mutex){--..}, at: [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
> 
> but task is already holding lock:
>  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&mm->mmap_sem){----}:
>       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>       [<ffffffff805df5ab>] down_read+0x3b/0x70
>       [<ffffffff805e3e3c>] do_page_fault+0x27c/0x890
>       [<ffffffff805e16cd>] error_exit+0x0/0xa9
>       [<ffffffffffffffff>] 0xffffffffffffffff
> 
> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>       [<ffffffff8026d746>] down_write_nested+0x46/0x80
>       [<ffffffff8039df29>] xfs_ilock+0x99/0xa0
>       [<ffffffff8039e0cf>] xfs_ireclaim+0x3f/0x90
>       [<ffffffff803ba889>] xfs_finish_reclaim+0x59/0x1a0
>       [<ffffffff803bc199>] xfs_reclaim+0x109/0x110
>       [<ffffffff803c9541>] xfs_fs_clear_inode+0xe1/0x110
>       [<ffffffff802d906d>] clear_inode+0x7d/0x110
>       [<ffffffff802d93aa>] dispose_list+0x2a/0x100
>       [<ffffffff802d96af>] shrink_icache_memory+0x22f/0x290
>       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>       [<ffffffff8029e0b6>] kswapd+0x3b6/0x560
>       [<ffffffff8026921d>] kthread+0x4d/0x80
>       [<ffffffff80227428>] child_rip+0xa/0x12
>       [<ffffffffffffffff>] 0xffffffffffffffff
> 
> -> #0 (iprune_mutex){--..}:
>       [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>       [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>       [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>       [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>       [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>       [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>       [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>       [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>       [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>       [<ffffffff805e16cd>] error_exit+0x0/0xa9
>       [<ffffffffffffffff>] 0xffffffffffffffff
> 
> other info that might help us debug this:
> 
> 2 locks held by AutopanoPro/4470:
>  #0:  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
>  #1:  (shrinker_rwsem){----}, at: [<ffffffff8029d732>] shrink_slab+0x32/0x1d0
> 
> stack backtrace:
> Pid: 4470, comm: AutopanoPro Not tainted 2.6.26-rc7-210c #2
> 
> Call Trace:
>  [<ffffffff80276823>] print_circular_bug_tail+0x83/0x90
>  [<ffffffff80275e09>] ? print_circular_bug_entry+0x49/0x60
>  [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>  [<ffffffff802793f5>] lock_acquire+0x65/0x90
>  [<ffffffff802d94fd>] ? shrink_icache_memory+0x7d/0x290
>  [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>  [<ffffffff802d94fd>] ? shrink_icache_memory+0x7d/0x290
>  [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>  [<ffffffff8029d732>] ? shrink_slab+0x32/0x1d0
>  [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>  [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>  [<ffffffff8029c240>] ? isolate_pages_global+0x0/0x40
>  [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>  [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>  [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>  [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>  [<ffffffff80277e2f>] ? trace_hardirqs_on+0xbf/0x150
>  [<ffffffff805e3e15>] ? do_page_fault+0x255/0x890
>  [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>  [<ffffffff805e16cd>] error_exit+0x0/0xa9
> -- 
> Daniel J Blueman
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-22 18:14     ` [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec) Mel Gorman
@ 2008-06-22 18:54       ` Daniel J Blueman
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel J Blueman @ 2008-06-22 18:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Lameter, Linus Torvalds, Alexander Beregalov,
	Linux Kernel, david, xfs

On Sun, Jun 22, 2008 at 7:14 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> On (22/06/08 10:58), Daniel J Blueman didst pronounce:
>> I'm seeing a similar issue [2] to what was recently reported [1] by
>> Alexander, but with another workload involving XFS and memory
>> pressure.
>>
>
> Is NFS involved or is this XFS only? It looks like XFS-only but no harm in
> being sure.

The application reads a ~10MB file from NFS every few seconds and
writes back ~30MB every few seconds to local XFS; another thread in
the same application consumes that data and writes out ~2MB to NFS
after roughly every 10 input files, so NFS isn't dominant but is
involved.

> I'm beginning to wonder if this is a problem where a lot of dirty inodes are
> being written back in this path and we stall while that happens. I'm still
> not getting why we are triggering this now and did not before 2.6.26-rc1
> or why it bisects to the zonelist modifications. Diffing the reclaim and
> allocation paths between 2.6.25 and 2.6.26-rc1 has not yielded any candidates
> for me yet that would explain this.
>
>> SLUB allocator is in use and config is at http://quora.org/config-client-debug .
>>
>> Let me know if you'd like more details/vmlinux objdump etc.
>>
>> Thanks,
>>  Daniel
>>
>> --- [1]
>>
>> http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/e673c9173d45a735/db9213ef39e4e11c
>>
>> --- [2]
>>
>> =======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 2.6.26-rc7-210c #2
>> -------------------------------------------------------
>> AutopanoPro/4470 is trying to acquire lock:
>>  (iprune_mutex){--..}, at: [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>>
>> but task is already holding lock:
>>  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (&mm->mmap_sem){----}:
>>       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>>       [<ffffffff805df5ab>] down_read+0x3b/0x70
>>       [<ffffffff805e3e3c>] do_page_fault+0x27c/0x890
>>       [<ffffffff805e16cd>] error_exit+0x0/0xa9
>>       [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>>       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>>       [<ffffffff8026d746>] down_write_nested+0x46/0x80
>>       [<ffffffff8039df29>] xfs_ilock+0x99/0xa0
>>       [<ffffffff8039e0cf>] xfs_ireclaim+0x3f/0x90
>>       [<ffffffff803ba889>] xfs_finish_reclaim+0x59/0x1a0
>>       [<ffffffff803bc199>] xfs_reclaim+0x109/0x110
>>       [<ffffffff803c9541>] xfs_fs_clear_inode+0xe1/0x110
>>       [<ffffffff802d906d>] clear_inode+0x7d/0x110
>>       [<ffffffff802d93aa>] dispose_list+0x2a/0x100
>>       [<ffffffff802d96af>] shrink_icache_memory+0x22f/0x290
>>       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>>       [<ffffffff8029e0b6>] kswapd+0x3b6/0x560
>>       [<ffffffff8026921d>] kthread+0x4d/0x80
>>       [<ffffffff80227428>] child_rip+0xa/0x12
>>       [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> -> #0 (iprune_mutex){--..}:
>>       [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>>       [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>>       [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>>       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>>       [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>>       [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>>       [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>>       [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>>       [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>>       [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>>       [<ffffffff805e16cd>] error_exit+0x0/0xa9
>>       [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> other info that might help us debug this:
>>
>> 2 locks held by AutopanoPro/4470:
>>  #0:  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
>>  #1:  (shrinker_rwsem){----}, at: [<ffffffff8029d732>] shrink_slab+0x32/0x1d0
>>
>> stack backtrace:
>> Pid: 4470, comm: AutopanoPro Not tainted 2.6.26-rc7-210c #2
>>
>> Call Trace:
>>  [<ffffffff80276823>] print_circular_bug_tail+0x83/0x90
>>  [<ffffffff80275e09>] ? print_circular_bug_entry+0x49/0x60
>>  [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>>  [<ffffffff802793f5>] lock_acquire+0x65/0x90
>>  [<ffffffff802d94fd>] ? shrink_icache_memory+0x7d/0x290
>>  [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>>  [<ffffffff802d94fd>] ? shrink_icache_memory+0x7d/0x290
>>  [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>>  [<ffffffff8029d732>] ? shrink_slab+0x32/0x1d0
>>  [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>>  [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>>  [<ffffffff8029c240>] ? isolate_pages_global+0x0/0x40
>>  [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>>  [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>>  [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>>  [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>>  [<ffffffff80277e2f>] ? trace_hardirqs_on+0xbf/0x150
>>  [<ffffffff805e3e15>] ? do_page_fault+0x255/0x890
>>  [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>>  [<ffffffff805e16cd>] error_exit+0x0/0xa9
>> --
>> Daniel J Blueman
>>
>
> --
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab
-- 
Daniel J Blueman

* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
       [not found] ` <6278d2220806220258p28de00c1x615ad7b2f708e3f8@mail.gmail.com>
       [not found]   ` <20080622181011.GC625@csn.ul.ie>
@ 2008-06-22 22:19   ` Dave Chinner
  2008-06-23  0:24     ` Mel Gorman
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2008-06-22 22:19 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Christoph Lameter, Mel Gorman, Linus Torvalds,
	Alexander Beregalov, Linux Kernel, xfs

[added xfs@oss.sgi.com to cc]

On Sun, Jun 22, 2008 at 10:58:56AM +0100, Daniel J Blueman wrote:
> I'm seeing a similar issue [2] to what was recently reported [1] by
> Alexander, but with another workload involving XFS and memory
> pressure.
> 
> SLUB allocator is in use and config is at http://quora.org/config-client-debug .
> 
> Let me know if you'd like more details/vmlinux objdump etc.
> 
> Thanks,
>  Daniel
> 
> --- [1]
> 
> http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/e673c9173d45a735/db9213ef39e4e11c
> 
> --- [2]
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.26-rc7-210c #2
> -------------------------------------------------------
> AutopanoPro/4470 is trying to acquire lock:
>  (iprune_mutex){--..}, at: [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
> 
> but task is already holding lock:
>  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&mm->mmap_sem){----}:
>       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>       [<ffffffff805df5ab>] down_read+0x3b/0x70
>       [<ffffffff805e3e3c>] do_page_fault+0x27c/0x890
>       [<ffffffff805e16cd>] error_exit+0x0/0xa9
>       [<ffffffffffffffff>] 0xffffffffffffffff
> 
> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>       [<ffffffff8026d746>] down_write_nested+0x46/0x80
>       [<ffffffff8039df29>] xfs_ilock+0x99/0xa0
>       [<ffffffff8039e0cf>] xfs_ireclaim+0x3f/0x90
>       [<ffffffff803ba889>] xfs_finish_reclaim+0x59/0x1a0
>       [<ffffffff803bc199>] xfs_reclaim+0x109/0x110
>       [<ffffffff803c9541>] xfs_fs_clear_inode+0xe1/0x110
>       [<ffffffff802d906d>] clear_inode+0x7d/0x110
>       [<ffffffff802d93aa>] dispose_list+0x2a/0x100
>       [<ffffffff802d96af>] shrink_icache_memory+0x22f/0x290
>       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>       [<ffffffff8029e0b6>] kswapd+0x3b6/0x560
>       [<ffffffff8026921d>] kthread+0x4d/0x80
>       [<ffffffff80227428>] child_rip+0xa/0x12
>       [<ffffffffffffffff>] 0xffffffffffffffff

You may as well ignore anything involving this path in XFS until
lockdep gets fixed. The kswapd reclaim path is inverted relative to
the synchronous reclaim path, which is xfs_ilock -> run out of memory
-> prune_icache and then potentially another -> xfs_ilock.

In this case, XFS can *never* deadlock because the second xfs_ilock
is on a different, unreferenced, unlocked inode, but without turning
off lockdep there is nothing in XFS that can be done to prevent
this warning.

There is a similar bug in the VM w.r.t. the mmap_sem, in that the
mmap_sem is held across a call to put_filp(), which can result in
inversions between the xfs_ilock and mmap_sem.

Both of these cases cannot be solved by changing XFS - lockdep
needs to be made aware of paths that can invert normal locking
order (like prune_icache) so it doesn't give false positives
like this.
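
As a concrete illustration, here is a minimal userspace analogue of
the report (the names and structure are illustrative only - this is
not kernel code). lockdep tracks lock *classes*, so "ilock, then
iprune" in one thread and "iprune, then a *different* inode's ilock"
in another look like an ABBA inversion even though the two ilocks can
never be the same lock:

/* build: gcc -pthread -o inversion inversion.c */
#include <pthread.h>
#include <stdio.h>

struct inode { pthread_mutex_t ilock; };

static pthread_mutex_t iprune_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct inode in_use = { PTHREAD_MUTEX_INITIALIZER };
static struct inode unused = { PTHREAD_MUTEX_INITIALIZER };

/* Synchronous reclaim: holds an in-use inode's lock, runs out of
 * memory, then takes iprune_mutex (ilock -> iprune_mutex). */
static void *sync_reclaim(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&in_use.ilock);
	pthread_mutex_lock(&iprune_mutex);
	pthread_mutex_unlock(&iprune_mutex);
	pthread_mutex_unlock(&in_use.ilock);
	return NULL;
}

/* kswapd-style reclaim: takes iprune_mutex, then locks a different,
 * unreferenced inode to tear it down (iprune_mutex -> ilock). */
static void *kswapd_reclaim(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&iprune_mutex);
	pthread_mutex_lock(&unused.ilock);
	pthread_mutex_unlock(&unused.ilock);
	pthread_mutex_unlock(&iprune_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, sync_reclaim, NULL);
	pthread_create(&b, NULL, kswapd_reclaim, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	/* No deadlock: the pruned inode is never the held one, but a
	 * class-based checker sees ilock -> iprune -> ilock. */
	puts("done, no deadlock");
	return 0;
}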

> -> #0 (iprune_mutex){--..}:
>       [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>       [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>       [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>       [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>       [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>       [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>       [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>       [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>       [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>       [<ffffffff805e16cd>] error_exit+0x0/0xa9
>       [<ffffffffffffffff>] 0xffffffffffffffff

This case is different in that it is complaining about mmap_sem vs
iprune_mutex, so I think that we can pretty much ignore the XFS side
of things here - the problem is higher level code....

>  [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>  [<ffffffff8029c240>] ? isolate_pages_global+0x0/0x40
>  [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>  [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>  [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>  [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0

FWIW, should page allocation in a page fault be allowed to recurse
into the filesystem? If I follow the spaghetti of inline and
compiler inlined functions correctly, this is a GFP_HIGHUSER_MOVABLE
allocation, right? Should we be allowing shrink_icache_memory()
to be called at all in the page fault path?
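
For reference, the gating as I read it in 2.6.26 (abbreviated from
include/linux/gfp.h and fs/inode.c; quoted from memory, so check the
tree): the fault path allocates with __GFP_FS set, and the __GFP_FS
check below is the only thing that stops the inode shrinker from
recursing into a filesystem:

#define GFP_HIGHUSER_MOVABLE	(__GFP_WAIT | __GFP_IO | __GFP_FS | \
				 __GFP_HARDWALL | __GFP_HIGHMEM | \
				 __GFP_MOVABLE)

static int shrink_icache_memory(int nr, gfp_t gfp_mask)
{
	if (nr) {
		/*
		 * Nasty deadlock avoidance.  We may hold various FS locks,
		 * and we don't want to recurse into the FS that called us
		 * in clear_inode() and friends..
		 */
		if (!(gfp_mask & __GFP_FS))
			return -1;
		prune_icache(nr);
	}
	return (inodes_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
}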

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
       [not found]       ` <6278d2220806221356o4c611e43n305ec9653d6d5359@mail.gmail.com>
@ 2008-06-22 22:29         ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2008-06-22 22:29 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Arjan van de Ven, Mel Gorman, Christoph Lameter, Linus Torvalds,
	Alexander Beregalov, Linux Kernel, xfs

On Sun, Jun 22, 2008 at 09:56:17PM +0100, Daniel J Blueman wrote:
> On Sun, Jun 22, 2008 at 7:21 PM, Arjan van de Ven <arjan@infradead.org> wrote:
> > this sort of thing can easily be exposed with the latencytop tool...
> > it will at least tell you WHAT the system is blocking on.
> > (not so much the why, the tool isn't smart enough to automatically spit
> > out kernel patches yet)
> 
> Good plan. I reproduced this without NFS mounted, so it's pure XFS. I
> wasn't able to capture the same process's (i.e. 5480) latencytop
> trace, but it may have contributed to the list.
> 
> A fair amount of debugging was turned on, hurting the latency some
> (despite it being a 3.6GHz Penryn).
> 
> Daniel
> 
> --- [1]
> 
> $ vmstat 1
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> <snip>
>  1  1    156  14424     12 2329672    0    0     0 110755  177 3820 57  6 36  0
>  2  1    156  14736     12 2348152    0    0    24 25172  204 26018 35 21 43  1
>  5  0    156  24252     12 2363800    0    0 59656    31  545 25292 35 14 28 23
>  4  0    156  14696     12 2317784    0    0  3824     0   38 23083 95  6  0  0
>  4  0    156  14440     12 2319304    0    0  4672     0   72 3372 93  3  3  2
>  2  0    156  14428     12 2318484    0    0     0     4   27  731 52  0 49  0
>  2  0    156  14480     12 2308512    0    0     0    12   32 36629 39 13 49  0
>  2  0    156  14572     12 2301220    0    0  3904 12316  117 10760 58  7 26 11
> 
> --- [2]
> 
> Cause                                                Maximum     Percentage
> down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags 271.1 msec          0.4 %

Waiting on I/O to complete. Your disk is busy.

> down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_f206.1 msec          1.3 %

Waiting on I/O to complete. Your disk is busy.

> down xfs_buf_lock xfs_getsb xfs_trans_getsb xfs_tr160.4 msec          0.5 %

Waiting on a superblock I/O or a transaction to complete. Your
disk is busy.  (Note, this one can be avoided with lazy
superblock counters).

[snip rest of "busy disk trace"]

But really, all latencytop is telling you here is that it takes
time to wait for I/O to complete. It's mostly useless for tracking
down locking issues when you've got I/O in progress...

> 2.6.26-rc7-211c #2
> -------------------------------------------------------
> AutopanoPro/5480 is trying to acquire lock:
>  (iprune_mutex){--..}, at: [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
> 
> but task is already holding lock:
>  (&(&ip->i_iolock)->mr_lock){----}, at: [<ffffffff803a459f>]
> xfs_ilock+0xbf/0x110
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>        [<ffffffff802769ad>] __lock_acquire+0xbdd/0x1020
>        [<ffffffff80276e55>] lock_acquire+0x65/0x90
>        [<ffffffff8026b41b>] down_write_nested+0x4b/0x90
>        [<ffffffff803a45df>] xfs_ilock+0xff/0x110
>        [<ffffffff803a47af>] xfs_ireclaim+0x3f/0x90
>        [<ffffffff803c9169>] xfs_finish_reclaim+0x59/0x220
>        [<ffffffff803cc3b5>] xfs_reclaim+0x185/0x190
>        [<ffffffff803d9211>] xfs_fs_clear_inode+0xe1/0x130
>        [<ffffffff802d25c7>] clear_inode+0x87/0x120
>        [<ffffffff802d290a>] dispose_list+0x2a/0x100
>        [<ffffffff802d2c06>] shrink_icache_memory+0x226/0x280
>        [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
>        [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
>        [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
>        [<ffffffff802981db>] __alloc_pages+0xb/0x10
>        [<ffffffff802a312a>] handle_mm_fault+0x46a/0x6d0
>        [<ffffffff8060510a>] do_page_fault+0x3ca/0x830
>        [<ffffffff80602add>] error_exit+0x0/0xa9
>        [<ffffffffffffffff>] 0xffffffffffffffff

mmap_sem -> iprune_mutex -> xfs_ilock

> -> #0 (iprune_mutex){--..}:
>        [<ffffffff80276817>] __lock_acquire+0xa47/0x1020
>        [<ffffffff80276e55>] lock_acquire+0x65/0x90
>        [<ffffffff8060059a>] mutex_lock_nested+0xba/0x2b0
>        [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
>        [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
>        [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
>        [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
>        [<ffffffff802981db>] __alloc_pages+0xb/0x10
>        [<ffffffff8029a6b6>] __do_page_cache_readahead+0x136/0x230
>        [<ffffffff8029aa08>] ondemand_readahead+0x128/0x1f0
>        [<ffffffff8029ab45>] page_cache_async_readahead+0x75/0xa0
>        [<ffffffff80293a8a>] generic_file_aio_read+0x28a/0x610
>        [<ffffffff803d78c4>] xfs_read+0x124/0x270
>        [<ffffffff803d4416>] __xfs_file_read+0x46/0x50
>        [<ffffffff803d4451>] xfs_file_aio_read+0x11/0x20
>        [<ffffffff802bc1b1>] do_sync_read+0xf1/0x130
>        [<ffffffff802bca74>] vfs_read+0xc4/0x160
>        [<ffffffff802bcf10>] sys_read+0x50/0x90
>        [<ffffffff8022639b>] system_call_after_swapgs+0x7b/0x80
>        [<ffffffffffffffff>] 0xffffffffffffffff

xfs_ilock -> iprune_mutex

This is exactly the situation I mentioned in the previous email.
There is no potential deadlock here between the xfs_ilock and
iprune_mutex as the xfs_ilock that is held before and/or after
iprune_mutex is guaranteed to be different (the first is in use
so will never be found by shrink_icache_memory())...
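
The guarantee comes from the victim selection in prune_icache(). A
simplified sketch of 2.6.26 fs/inode.c (writeback, page-cache checks
and accounting omitted): only unreferenced inodes on the unused list
are ever chosen, so an inode whose xfs_ilock the allocating task
holds - and therefore still references - is never eligible:

static void prune_icache(int nr_to_scan)
{
	LIST_HEAD(freeable);
	int nr_scanned;

	mutex_lock(&iprune_mutex);
	spin_lock(&inode_lock);
	for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) {
		struct inode *inode;

		if (list_empty(&inode_unused))
			break;
		inode = list_entry(inode_unused.prev, struct inode, i_list);
		/* Anything referenced or busy is skipped. */
		if (inode->i_state || atomic_read(&inode->i_count)) {
			list_move(&inode->i_list, &inode_unused);
			continue;
		}
		list_move(&inode->i_list, &freeable);
		inode->i_state |= I_FREEING;
	}
	spin_unlock(&inode_lock);
	/* dispose_list() -> clear_inode() -> xfs_fs_clear_inode() is
	 * where the second, harmless xfs_ilock in the trace is taken. */
	dispose_list(&freeable);
	mutex_unlock(&iprune_mutex);
}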

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-22 22:19   ` Dave Chinner
@ 2008-06-23  0:24     ` Mel Gorman
  2008-06-23  0:53       ` Dave Chinner
  2008-06-23  7:22       ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Mel Gorman @ 2008-06-23  0:24 UTC (permalink / raw)
  To: Daniel J Blueman, Christoph Lameter, Linus Torvalds,
	Alexander Beregalov, Linux Kernel, xfs

On (23/06/08 08:19), Dave Chinner didst pronounce:
> [added xfs@oss.sgi.com to cc]
> 
> On Sun, Jun 22, 2008 at 10:58:56AM +0100, Daniel J Blueman wrote:
> > I'm seeing a similar issue [2] to what was recently reported [1] by
> > Alexander, but with another workload involving XFS and memory
> > pressure.
> > 
> > SLUB allocator is in use and config is at http://quora.org/config-client-debug .
> > 
> > Let me know if you'd like more details/vmlinux objdump etc.
> > 
> > Thanks,
> >  Daniel
> > 
> > --- [1]
> > 
> > http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/e673c9173d45a735/db9213ef39e4e11c
> > 
> > --- [2]
> > 
> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.26-rc7-210c #2
> > -------------------------------------------------------
> > AutopanoPro/4470 is trying to acquire lock:
> >  (iprune_mutex){--..}, at: [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
> > 
> > but task is already holding lock:
> >  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
> > 
> > which lock already depends on the new lock.
> > 
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #2 (&mm->mmap_sem){----}:
> >       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
> >       [<ffffffff802793f5>] lock_acquire+0x65/0x90
> >       [<ffffffff805df5ab>] down_read+0x3b/0x70
> >       [<ffffffff805e3e3c>] do_page_fault+0x27c/0x890
> >       [<ffffffff805e16cd>] error_exit+0x0/0xa9
> >       [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > -> #1 (&(&ip->i_iolock)->mr_lock){----}:
> >       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
> >       [<ffffffff802793f5>] lock_acquire+0x65/0x90
> >       [<ffffffff8026d746>] down_write_nested+0x46/0x80
> >       [<ffffffff8039df29>] xfs_ilock+0x99/0xa0
> >       [<ffffffff8039e0cf>] xfs_ireclaim+0x3f/0x90
> >       [<ffffffff803ba889>] xfs_finish_reclaim+0x59/0x1a0
> >       [<ffffffff803bc199>] xfs_reclaim+0x109/0x110
> >       [<ffffffff803c9541>] xfs_fs_clear_inode+0xe1/0x110
> >       [<ffffffff802d906d>] clear_inode+0x7d/0x110
> >       [<ffffffff802d93aa>] dispose_list+0x2a/0x100
> >       [<ffffffff802d96af>] shrink_icache_memory+0x22f/0x290
> >       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
> >       [<ffffffff8029e0b6>] kswapd+0x3b6/0x560
> >       [<ffffffff8026921d>] kthread+0x4d/0x80
> >       [<ffffffff80227428>] child_rip+0xa/0x12
> >       [<ffffffffffffffff>] 0xffffffffffffffff
> 
> You may as well ignore anything involving this path in XFS until
> lockdep gets fixed. The kswapd reclaim path is inverted relative to
> the synchronous reclaim path, which is xfs_ilock -> run out of memory
> -> prune_icache and then potentially another -> xfs_ilock.
> 

In that case, have you any theory as to why this circular dependency is
being reported now but wasn't before 2.6.26-rc1? I'm beginning to wonder
if the bisection fingering the zonelist modification is just a
coincidence.

Also, do you think the stalls were happening before but just not being noticed?

> In this case, XFS can *never* deadlock because the second xfs_ilock
> is on a different, unreferenced, unlocked inode, but without turning
> off lockdep there is nothing in XFS that can be done to prevent
> this warning.
> 
> There is a similar bug in the VM w.r.t. the mmap_sem, in that the
> mmap_sem is held across a call to put_filp(), which can result in
> inversions between the xfs_ilock and mmap_sem.
> 
> Both of these cases cannot be solved by changing XFS - lockdep
> needs to be made aware of paths that can invert normal locking
> order (like prune_icache) so it doesn't give false positives
> like this.
> 
> > -> #0 (iprune_mutex){--..}:
> >       [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
> >       [<ffffffff802793f5>] lock_acquire+0x65/0x90
> >       [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
> >       [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
> >       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
> >       [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
> >       [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
> >       [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
> >       [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
> >       [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
> >       [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
> >       [<ffffffff805e16cd>] error_exit+0x0/0xa9
> >       [<ffffffffffffffff>] 0xffffffffffffffff
> 
> This case is different in that it is complaining about mmap_sem vs
> iprune_mutex, so I think that we can pretty much ignore the XFS side
> of things here - the problem is higher level code....
> 
> >  [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
> >  [<ffffffff8029c240>] ? isolate_pages_global+0x0/0x40
> >  [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
> >  [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
> >  [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
> >  [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
> 
> FWIW, should page allocation in a page fault be allowed to recurse
> into the filesystem? If I follow the spaghetti of inline and
> compiler inlined functions correctly, this is a GFP_HIGHUSER_MOVABLE
> allocation, right? Should we be allowing shrink_icache_memory()
> to be called at all in the page fault path?
> 

Well, the page fault path is able to go to sleep and can enter direct
reclaim under low-memory situations. Right now, I'm failing to see why a
page fault should not be allowed to reclaim pages in use by a
filesystem. It was allowed before, so the question is still why the
circular lock warning appears now but didn't before.

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-23  0:24     ` Mel Gorman
@ 2008-06-23  0:53       ` Dave Chinner
  2008-06-23  7:22       ` Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2008-06-23  0:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Daniel J Blueman, Christoph Lameter, Linus Torvalds,
	Alexander Beregalov, Linux Kernel, xfs

On Mon, Jun 23, 2008 at 01:24:15AM +0100, Mel Gorman wrote:
> On (23/06/08 08:19), Dave Chinner didst pronounce:
> > [added xfs@oss.sgi.com to cc]
> > 
> > On Sun, Jun 22, 2008 at 10:58:56AM +0100, Daniel J Blueman wrote:
> > > I'm seeing a similar issue [2] to what was recently reported [1] by
> > > Alexander, but with another workload involving XFS and memory
> > > pressure.

[....]

> > You may as well ignore anything involving this path in XFS until
> > lockdep gets fixed. The kswapd reclaim path is inverted relative to
> > the synchronous reclaim path, which is xfs_ilock -> run out of memory
> > -> prune_icache and then potentially another -> xfs_ilock.
> > 
> 
> In that case, have you any theory as to why this circular dependency is
> being reported now but wasn't before 2.6.26-rc1? I'm beginning to wonder
> if the bisection fingering the zonelist modification is just a
> coincidence.

Probably coincidence. Perhaps it's simply changed the way reclaim
is behaving and we are more likely to be trimming slab caches
instead of getting free pages from the page lists?

> Also, do you think the stalls were happening before but just not
> being noticed?

Entirely possible, I think, but I know of no evidence one way or
another.

[....]

> > FWIW, should page allocation in a page fault be allowed to recurse
> > into the filesystem? If I follow the spaghetti of inline and
> > compiler inlined functions correctly, this is a GFP_HIGHUSER_MOVABLE
> > allocation, right? Should we be allowing shrink_icache_memory()
> > to be called at all in the page fault path?
> 
> Well, the page fault path is able to go to sleep and can enter direct
> reclaim under low-memory situations. Right now, I'm failing to see why a
> page fault should not be allowed to reclaim pages in use by a
> filesystem. It was allowed before, so the question is still why the
> circular lock warning appears now but didn't before.

Yeah, it's the fact that this is the first time that this lockdep
warning has come up that prompted me to ask the question. I know
that we are not allowed to lock an inode in the fault path as that
can lead to deadlocks in the read and write paths, so what I was
really wondering is if we can deadlock in a more convoluted manner
by taking locks on *other inodes* in the page fault path....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-23  0:24     ` Mel Gorman
  2008-06-23  0:53       ` Dave Chinner
@ 2008-06-23  7:22       ` Christoph Hellwig
  2008-06-23 18:38         ` Mel Gorman
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2008-06-23  7:22 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Daniel J Blueman, Christoph Lameter, Linus Torvalds,
	Alexander Beregalov, Linux Kernel, xfs

On Mon, Jun 23, 2008 at 01:24:15AM +0100, Mel Gorman wrote:
> In that case, have you any theory as to why this circular dependency is
> being reported now but wasn't before 2.6.26-rc1? I'm beginning to wonder
> if the bisection fingering the zonelist modification is just a
> coincidence.

I've seen these traces when running xfsqa ever since lockdep was added.

* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-23  7:22       ` Christoph Hellwig
@ 2008-06-23 18:38         ` Mel Gorman
  0 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2008-06-23 18:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Daniel J Blueman, Christoph Lameter, Linus Torvalds,
	Alexander Beregalov, Linux Kernel, xfs, david

On (23/06/08 03:22), Christoph Hellwig didst pronounce:
> On Mon, Jun 23, 2008 at 01:24:15AM +0100, Mel Gorman wrote:
> > In that case, have you any theory as to why this circular dependency is
> > being reported now but wasn't before 2.6.26-rc1? I'm beginning to wonder
> > if the bisection fingering the zonelist modification is just a
> > coincidence.
> 
> I've seen these traces when running xfsqa ever since lockdep was added.
> 

Oh right, so this isn't even 2.6.26-rc1 as such. It's an older problem
that seems to be happening in more cases now, right? At this point, I
believe the bisection fingering the zonelist modification was a
coincidence, as reclaim behaviour at least is equivalent, although
catching the memory leak early was a lucky positive outcome.

It's still not clear why the circular warning is happening more
regularly now, but it's "something else". Considering the number of
changes made to NFS, XFS, reclaim and other areas since, I'm not sure
how to go about finding the real underlying problem or whether it can
be dealt with in a trivial manner.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
