* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
       [not found] ` <20080622181011.GC625@csn.ul.ie>
@ 2008-06-22 18:14 ` Mel Gorman
  2008-06-22 18:54   ` Daniel J Blueman
       [not found]     ` <20080622112100.794b1ae1@infradead.org>
  1 sibling, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2008-06-22 18:14 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Christoph Lameter, Linus Torvalds, Alexander Beregalov,
      Linux Kernel, david, xfs

(Sorry for the resend; the wrong email address was used for Dave Chinner)

On (22/06/08 10:58), Daniel J Blueman didst pronounce:
> I'm seeing a similar issue [2] to what was recently reported [1] by
> Alexander, but with another workload involving XFS and memory
> pressure.
>

Is NFS involved or is this XFS only? It looks like XFS-only but no harm
in being sure.

I'm beginning to wonder if this is a problem where a lot of dirty inodes
are being written back in this path and we stall while that happens. I'm
still not getting why we are triggering this now and did not before
2.6.26-rc1, or why it bisects to the zonelist modifications. Diffing the
reclaim and allocation paths between 2.6.25 and 2.6.26-rc1 has not
yielded any candidates for me yet that would explain this.

> SLUB allocator is in use and config is at http://quora.org/config-client-debug .
>
> Let me know if you'd like more details/vmlinux objdump etc.
>
> Thanks,
>  Daniel
>
> --- [1]
>
> http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/e673c9173d45a735/db9213ef39e4e11c
>
> --- [2]
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.26-rc7-210c #2
> -------------------------------------------------------
> AutopanoPro/4470 is trying to acquire lock:
>  (iprune_mutex){--..}, at: [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>
> but task is already holding lock:
>  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&mm->mmap_sem){----}:
>        [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>        [<ffffffff802793f5>] lock_acquire+0x65/0x90
>        [<ffffffff805df5ab>] down_read+0x3b/0x70
>        [<ffffffff805e3e3c>] do_page_fault+0x27c/0x890
>        [<ffffffff805e16cd>] error_exit+0x0/0xa9
>        [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>        [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>        [<ffffffff802793f5>] lock_acquire+0x65/0x90
>        [<ffffffff8026d746>] down_write_nested+0x46/0x80
>        [<ffffffff8039df29>] xfs_ilock+0x99/0xa0
>        [<ffffffff8039e0cf>] xfs_ireclaim+0x3f/0x90
>        [<ffffffff803ba889>] xfs_finish_reclaim+0x59/0x1a0
>        [<ffffffff803bc199>] xfs_reclaim+0x109/0x110
>        [<ffffffff803c9541>] xfs_fs_clear_inode+0xe1/0x110
>        [<ffffffff802d906d>] clear_inode+0x7d/0x110
>        [<ffffffff802d93aa>] dispose_list+0x2a/0x100
>        [<ffffffff802d96af>] shrink_icache_memory+0x22f/0x290
>        [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>        [<ffffffff8029e0b6>] kswapd+0x3b6/0x560
>        [<ffffffff8026921d>] kthread+0x4d/0x80
>        [<ffffffff80227428>] child_rip+0xa/0x12
>        [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #0 (iprune_mutex){--..}:
>        [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>        [<ffffffff802793f5>] lock_acquire+0x65/0x90
>        [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>        [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>        [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>        [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>        [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>        [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>        [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>        [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>        [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>        [<ffffffff805e16cd>] error_exit+0x0/0xa9
>        [<ffffffffffffffff>] 0xffffffffffffffff
>
> other info that might help us debug this:
>
> 2 locks held by AutopanoPro/4470:
>  #0:  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
>  #1:  (shrinker_rwsem){----}, at: [<ffffffff8029d732>] shrink_slab+0x32/0x1d0
>
> stack backtrace:
> Pid: 4470, comm: AutopanoPro Not tainted 2.6.26-rc7-210c #2
>
> Call Trace:
>  [<ffffffff80276823>] print_circular_bug_tail+0x83/0x90
>  [<ffffffff80275e09>] ? print_circular_bug_entry+0x49/0x60
>  [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>  [<ffffffff802793f5>] lock_acquire+0x65/0x90
>  [<ffffffff802d94fd>] ? shrink_icache_memory+0x7d/0x290
>  [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>  [<ffffffff802d94fd>] ? shrink_icache_memory+0x7d/0x290
>  [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>  [<ffffffff8029d732>] ? shrink_slab+0x32/0x1d0
>  [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>  [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>  [<ffffffff8029c240>] ? isolate_pages_global+0x0/0x40
>  [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>  [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>  [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>  [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>  [<ffffffff80277e2f>] ? trace_hardirqs_on+0xbf/0x150
>  [<ffffffff805e3e15>] ? do_page_fault+0x255/0x890
>  [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>  [<ffffffff805e16cd>] error_exit+0x0/0xa9
> --
> Daniel J Blueman

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-22 18:14 ` [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec) Mel Gorman
@ 2008-06-22 18:54   ` Daniel J Blueman
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel J Blueman @ 2008-06-22 18:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Lameter, Linus Torvalds, Alexander Beregalov,
      Linux Kernel, david, xfs

On Sun, Jun 22, 2008 at 7:14 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> On (22/06/08 10:58), Daniel J Blueman didst pronounce:
>> I'm seeing a similar issue [2] to what was recently reported [1] by
>> Alexander, but with another workload involving XFS and memory
>> pressure.
>>
>
> Is NFS involved or is this XFS only? It looks like XFS-only but no harm in
> being sure.

The application is reading a 10MB file from NFS around every few
seconds, and writing back ~30MB every few seconds to local XFS; another
thread in the same application is consuming that data and writing out
~2MB to NFS after circa 10 input files, so NFS isn't dominant but is
involved.

> I'm beginning to wonder if this is a problem where a lot of dirty inodes are
> being written back in this path and we stall while that happens. I'm still
> not getting why we are triggering this now and did not before 2.6.26-rc1
> or why it bisects to the zonelist modifications. Diffing the reclaim and
> allocation paths between 2.6.25 and 2.6.26-rc1 has not yielded any candidates
> for me yet that would explain this.
>
[snip quoted config pointer, lockdep report and signatures, unchanged
from the parent message]

--
Daniel J Blueman
* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
       [not found] ` <6278d2220806221356o4c611e43n305ec9653d6d5359@mail.gmail.com>
@ 2008-06-22 22:29 ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2008-06-22 22:29 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Arjan van de Ven, Mel Gorman, Christoph Lameter, Linus Torvalds,
      Alexander Beregalov, Linux Kernel, xfs

On Sun, Jun 22, 2008 at 09:56:17PM +0100, Daniel J Blueman wrote:
> On Sun, Jun 22, 2008 at 7:21 PM, Arjan van de Ven <arjan@infradead.org> wrote:
> > this sort of thing can easily be exposed with the latencytop tool...
> > it will at least tell you WHAT the system is blocking on.
> > (not so much the why, the tool isn't smart enough to automatically spit
> > out kernel patches yet)
>
> Good plan. I reproduced this without NFS mounted, so pure XFS. I
> wasn't able to capture the same process's (ie 5480) latencytop trace,
> but it may have contributed to the list.
>
> A fair amount of debugging was turned on, hurting the latency some
> (despite it being a 3.6GHz Penryn).
>
> Daniel
>
> --- [1]
>
> $ vmstat 1
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> <snip>
>  1  1    156  14424     12 2329672   0    0     0 110755  177  3820 57  6 36  0
>  2  1    156  14736     12 2348152   0    0    24  25172  204 26018 35 21 43  1
>  5  0    156  24252     12 2363800   0    0 59656     31  545 25292 35 14 28 23
>  4  0    156  14696     12 2317784   0    0  3824      0   38 23083 95  6  0  0
>  4  0    156  14440     12 2319304   0    0  4672      0   72  3372 93  3  3  2
>  2  0    156  14428     12 2318484   0    0     0      4   27   731 52  0 49  0
>  2  0    156  14480     12 2308512   0    0     0     12   32 36629 39 13 49  0
>  2  0    156  14572     12 2301220   0    0  3904  12316  117 10760 58  7 26 11
>
> --- [2]
>
> Cause                                              Maximum     Percentage
> down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags  271.1 msec  0.4 %

Waiting on I/O to complete. Your disk is busy.

> down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_f206.1 msec  1.3 %

Waiting on I/O to complete. Your disk is busy.

> down xfs_buf_lock xfs_getsb xfs_trans_getsb xfs_tr160.4 msec  0.5 %

Waiting on a superblock I/O or a transaction to complete. Your disk is
busy. (Note, this one can be avoided with lazy superblock counters).

[snip rest of "busy disk trace"]

But really, all latencytop is telling you here is that it takes time to
wait for I/O to complete. It's mostly useless for tracking down locking
issues when you've got I/O in progress...

> 2.6.26-rc7-211c #2
> -------------------------------------------------------
> AutopanoPro/5480 is trying to acquire lock:
>  (iprune_mutex){--..}, at: [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
>
> but task is already holding lock:
>  (&(&ip->i_iolock)->mr_lock){----}, at: [<ffffffff803a459f>]
> xfs_ilock+0xbf/0x110
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>        [<ffffffff802769ad>] __lock_acquire+0xbdd/0x1020
>        [<ffffffff80276e55>] lock_acquire+0x65/0x90
>        [<ffffffff8026b41b>] down_write_nested+0x4b/0x90
>        [<ffffffff803a45df>] xfs_ilock+0xff/0x110
>        [<ffffffff803a47af>] xfs_ireclaim+0x3f/0x90
>        [<ffffffff803c9169>] xfs_finish_reclaim+0x59/0x220
>        [<ffffffff803cc3b5>] xfs_reclaim+0x185/0x190
>        [<ffffffff803d9211>] xfs_fs_clear_inode+0xe1/0x130
>        [<ffffffff802d25c7>] clear_inode+0x87/0x120
>        [<ffffffff802d290a>] dispose_list+0x2a/0x100
>        [<ffffffff802d2c06>] shrink_icache_memory+0x226/0x280
>        [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
>        [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
>        [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
>        [<ffffffff802981db>] __alloc_pages+0xb/0x10
>        [<ffffffff802a312a>] handle_mm_fault+0x46a/0x6d0
>        [<ffffffff8060510a>] do_page_fault+0x3ca/0x830
>        [<ffffffff80602add>] error_exit+0x0/0xa9
>        [<ffffffffffffffff>] 0xffffffffffffffff

mmap_sem -> iprune_mutex -> xfs_ilock

> -> #0 (iprune_mutex){--..}:
>        [<ffffffff80276817>] __lock_acquire+0xa47/0x1020
>        [<ffffffff80276e55>] lock_acquire+0x65/0x90
>        [<ffffffff8060059a>] mutex_lock_nested+0xba/0x2b0
>        [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
>        [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
>        [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
>        [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
>        [<ffffffff802981db>] __alloc_pages+0xb/0x10
>        [<ffffffff8029a6b6>] __do_page_cache_readahead+0x136/0x230
>        [<ffffffff8029aa08>] ondemand_readahead+0x128/0x1f0
>        [<ffffffff8029ab45>] page_cache_async_readahead+0x75/0xa0
>        [<ffffffff80293a8a>] generic_file_aio_read+0x28a/0x610
>        [<ffffffff803d78c4>] xfs_read+0x124/0x270
>        [<ffffffff803d4416>] __xfs_file_read+0x46/0x50
>        [<ffffffff803d4451>] xfs_file_aio_read+0x11/0x20
>        [<ffffffff802bc1b1>] do_sync_read+0xf1/0x130
>        [<ffffffff802bca74>] vfs_read+0xc4/0x160
>        [<ffffffff802bcf10>] sys_read+0x50/0x90
>        [<ffffffff8022639b>] system_call_after_swapgs+0x7b/0x80
>        [<ffffffffffffffff>] 0xffffffffffffffff

xfs_ilock -> iprune_mutex

This is exactly the situation I mentioned in the previous email.
There is no potential deadlock here between the xfs_ilock and
iprune_mutex as the xfs_ilock that is held before and/or after
iprune_mutex is guaranteed to be different (the first is in use so
will never be found by shrink_icache_memory())...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
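[Aside: the skip Dave refers to -- an in-use inode never being selected
for reclaim -- lives in prune_icache(), which shrink_icache_memory()
calls under iprune_mutex. A simplified sketch of the 2.6.26-era loop in
fs/inode.c, paraphrased rather than quoted verbatim; the real function
also handles inodes with cached pages and updates reclaim statistics:

	static void prune_icache(int nr_to_scan)
	{
		LIST_HEAD(freeable);

		mutex_lock(&iprune_mutex);	/* the mutex in these reports */
		spin_lock(&inode_lock);
		for (; nr_to_scan; nr_to_scan--) {
			struct inode *inode;

			if (list_empty(&inode_unused))
				break;
			inode = list_entry(inode_unused.prev, struct inode, i_list);

			/*
			 * A referenced (i_count != 0) or busy (i_state != 0)
			 * inode is skipped, so an inode whose xfs_ilock is
			 * already held on entry cannot be the one torn down
			 * below.
			 */
			if (inode->i_state || atomic_read(&inode->i_count)) {
				list_move(&inode->i_list, &inode_unused);
				continue;
			}
			list_move(&inode->i_list, &freeable);
			inode->i_state |= I_FREEING;
		}
		spin_unlock(&inode_lock);
		dispose_list(&freeable);  /* -> clear_inode -> xfs_fs_clear_inode */
		mutex_unlock(&iprune_mutex);
	}
]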
* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
       [not found] ` <6278d2220806220258p28de00c1x615ad7b2f708e3f8@mail.gmail.com>
       [not found]   ` <20080622181011.GC625@csn.ul.ie>
@ 2008-06-22 22:19 ` Dave Chinner
  2008-06-23  0:24   ` Mel Gorman
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2008-06-22 22:19 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Christoph Lameter, Mel Gorman, Linus Torvalds,
      Alexander Beregalov, Linux Kernel, xfs

[added xfs@oss.sgi.com to cc]

On Sun, Jun 22, 2008 at 10:58:56AM +0100, Daniel J Blueman wrote:
> I'm seeing a similar issue [2] to what was recently reported [1] by
> Alexander, but with another workload involving XFS and memory
> pressure.
>
> SLUB allocator is in use and config is at http://quora.org/config-client-debug .
>
> Let me know if you'd like more details/vmlinux objdump etc.
>
> Thanks,
>  Daniel
>
> --- [1]
>
> http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/e673c9173d45a735/db9213ef39e4e11c
>
> --- [2]
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.26-rc7-210c #2
> -------------------------------------------------------
> AutopanoPro/4470 is trying to acquire lock:
>  (iprune_mutex){--..}, at: [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>
> but task is already holding lock:
>  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&mm->mmap_sem){----}:
>        [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>        [<ffffffff802793f5>] lock_acquire+0x65/0x90
>        [<ffffffff805df5ab>] down_read+0x3b/0x70
>        [<ffffffff805e3e3c>] do_page_fault+0x27c/0x890
>        [<ffffffff805e16cd>] error_exit+0x0/0xa9
>        [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>        [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>        [<ffffffff802793f5>] lock_acquire+0x65/0x90
>        [<ffffffff8026d746>] down_write_nested+0x46/0x80
>        [<ffffffff8039df29>] xfs_ilock+0x99/0xa0
>        [<ffffffff8039e0cf>] xfs_ireclaim+0x3f/0x90
>        [<ffffffff803ba889>] xfs_finish_reclaim+0x59/0x1a0
>        [<ffffffff803bc199>] xfs_reclaim+0x109/0x110
>        [<ffffffff803c9541>] xfs_fs_clear_inode+0xe1/0x110
>        [<ffffffff802d906d>] clear_inode+0x7d/0x110
>        [<ffffffff802d93aa>] dispose_list+0x2a/0x100
>        [<ffffffff802d96af>] shrink_icache_memory+0x22f/0x290
>        [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>        [<ffffffff8029e0b6>] kswapd+0x3b6/0x560
>        [<ffffffff8026921d>] kthread+0x4d/0x80
>        [<ffffffff80227428>] child_rip+0xa/0x12
>        [<ffffffffffffffff>] 0xffffffffffffffff

You may as well ignore anything involving this path in XFS until
lockdep gets fixed. The kswapd reclaim path is inverted over the
synchronous reclaim path that is xfs_ilock -> run out of memory ->
prune_icache and then potentially another -> xfs_ilock.

In this case, XFS can *never* deadlock because the second xfs_ilock
is on a different, unreferenced, unlocked inode, but without turning
off lockdep there is nothing in XFS that can be done to prevent
this warning.

There is a similar bug in the VM w.r.t. the mmap_sem in that the
mmap_sem is held across a call to put_filp() which can result in
inversions between the xfs_ilock and mmap_sem.

Both of these cases cannot be solved by changing XFS - lockdep
needs to be made aware of paths that can invert normal locking
order (like prune_icache) so it doesn't give false positives
like this.

> -> #0 (iprune_mutex){--..}:
>        [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>        [<ffffffff802793f5>] lock_acquire+0x65/0x90
>        [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>        [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>        [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>        [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>        [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>        [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>        [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>        [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>        [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>        [<ffffffff805e16cd>] error_exit+0x0/0xa9
>        [<ffffffffffffffff>] 0xffffffffffffffff

This case is different in that it is complaining about mmap_sem vs
iprune_mutex, so I think that we can pretty much ignore the XFS side
of things here - the problem is higher level code....

>  [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>  [<ffffffff8029c240>] ? isolate_pages_global+0x0/0x40
>  [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>  [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>  [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>  [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0

FWIW, should page allocation in a page fault be allowed to recurse
into the filesystem? If I follow the spaghetti of inline and
compiler inlined functions correctly, this is a GFP_HIGHUSER_MOVABLE
allocation, right? Should we be allowing shrink_icache_memory()
to be called at all in the page fault path?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
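[Aside: the recursion Dave is asking about is gated by __GFP_FS. The
inode shrinker refuses to prune anything for an allocation that may not
re-enter the filesystem; a sketch of the 2.6.26-era check in fs/inode.c,
quoted from memory, so details may differ slightly:

	static int shrink_icache_memory(int nr, gfp_t gfp_mask)
	{
		if (nr) {
			/*
			 * Deadlock avoidance: pruning ends up in filesystem
			 * code such as xfs_fs_clear_inode(), so allocations
			 * without __GFP_FS must not prune inodes.
			 */
			if (!(gfp_mask & __GFP_FS))
				return -1;
			prune_icache(nr);
		}
		return (inodes_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
	}

Since the fault-path allocation carries __GFP_FS, it passes this check
and may legitimately end up in XFS inode teardown.]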
* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-22 22:19 ` Dave Chinner
@ 2008-06-23  0:24   ` Mel Gorman
  2008-06-23  0:53     ` Dave Chinner
  2008-06-23  7:22     ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Mel Gorman @ 2008-06-23  0:24 UTC (permalink / raw)
  To: Daniel J Blueman, Christoph Lameter, Linus Torvalds,
      Alexander Beregalov, Linux Kernel, xfs

On (23/06/08 08:19), Dave Chinner didst pronounce:
> [added xfs@oss.sgi.com to cc]
>
> On Sun, Jun 22, 2008 at 10:58:56AM +0100, Daniel J Blueman wrote:
> > I'm seeing a similar issue [2] to what was recently reported [1] by
> > Alexander, but with another workload involving XFS and memory
> > pressure.
> >
> > SLUB allocator is in use and config is at http://quora.org/config-client-debug .
> >
> > Let me know if you'd like more details/vmlinux objdump etc.
> >
[snip quoted lockdep report down to the end of the -> #1 chain]
>
> You may as well ignore anything involving this path in XFS until
> lockdep gets fixed. The kswapd reclaim path is inverted over the
> synchronous reclaim path that is xfs_ilock -> run out of memory ->
> prune_icache and then potentially another -> xfs_ilock.
>

In that case, have you any theory as to why this circular dependency is
being reported now but wasn't before 2.6.26-rc1? I'm beginning to wonder
if the bisection fingering the zonelist modification is just a
co-incidence.

Also, do you think the stalls were happening before but just not being
noticed?

> In this case, XFS can *never* deadlock because the second xfs_ilock
> is on a different, unreferenced, unlocked inode, but without turning
> off lockdep there is nothing in XFS that can be done to prevent
> this warning.
>
> There is a similar bug in the VM w.r.t. the mmap_sem in that the
> mmap_sem is held across a call to put_filp() which can result in
> inversions between the xfs_ilock and mmap_sem.
>
> Both of these cases cannot be solved by changing XFS - lockdep
> needs to be made aware of paths that can invert normal locking
> order (like prune_icache) so it doesn't give false positives
> like this.
>
[snip quoted -> #0 (iprune_mutex) chain]
>
> This case is different in that it is complaining about mmap_sem vs
> iprune_mutex, so I think that we can pretty much ignore the XFS side
> of things here - the problem is higher level code....
>
[snip quoted allocation call trace]
>
> FWIW, should page allocation in a page fault be allowed to recurse
> into the filesystem? If I follow the spaghetti of inline and
> compiler inlined functions correctly, this is a GFP_HIGHUSER_MOVABLE
> allocation, right? Should we be allowing shrink_icache_memory()
> to be called at all in the page fault path?
>

Well, the page fault path is able to go to sleep and can enter direct
reclaim under low memory situations. Right now, I'm failing to see why a
page fault should not be allowed to reclaim pages in use by a
filesystem. It was allowed before, so the question still is why the
circular lock warning appears now but didn't before.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
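[Aside: the "allowed to reclaim" behaviour Mel describes follows directly
from the gfp flags. The fault-path allocation Dave traced is
GFP_HIGHUSER_MOVABLE, which carries __GFP_FS; the definitions, roughly as
they stood in include/linux/gfp.h around 2.6.26:

	#define GFP_HIGHUSER		(__GFP_WAIT | __GFP_IO | __GFP_FS | \
					 __GFP_HARDWALL | __GFP_HIGHMEM)
	#define GFP_HIGHUSER_MOVABLE	(__GFP_WAIT | __GFP_IO | __GFP_FS | \
					 __GFP_HARDWALL | __GFP_HIGHMEM | \
					 __GFP_MOVABLE)

So a page fault is, by design, permitted to enter direct reclaim and
prune filesystem caches; only clearing __GFP_FS would keep it out of
shrink_icache_memory().]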
* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-23  0:24   ` Mel Gorman
@ 2008-06-23  0:53     ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2008-06-23  0:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Daniel J Blueman, Christoph Lameter, Linus Torvalds,
      Alexander Beregalov, Linux Kernel, xfs

On Mon, Jun 23, 2008 at 01:24:15AM +0100, Mel Gorman wrote:
> On (23/06/08 08:19), Dave Chinner didst pronounce:
> > [added xfs@oss.sgi.com to cc]
> >
> > On Sun, Jun 22, 2008 at 10:58:56AM +0100, Daniel J Blueman wrote:
> > > I'm seeing a similar issue [2] to what was recently reported [1] by
> > > Alexander, but with another workload involving XFS and memory
> > > pressure.

[....]

> > You may as well ignore anything involving this path in XFS until
> > lockdep gets fixed. The kswapd reclaim path is inverted over the
> > synchronous reclaim path that is xfs_ilock -> run out of memory ->
> > prune_icache and then potentially another -> xfs_ilock.
>
> In that case, have you any theory as to why this circular dependency is
> being reported now but wasn't before 2.6.26-rc1? I'm beginning to wonder
> if the bisection fingering the zonelist modification is just a
> co-incidence.

Probably co-incidence. Perhaps it's simply changed the way reclaim
is behaving and we are more likely to be trimming slab caches
instead of getting free pages from the page lists?

> Also, do you think the stalls were happening before but just not
> being noticed?

Entirely possible, I think, but I know of no evidence one way or
another.

[....]

> > FWIW, should page allocation in a page fault be allowed to recurse
> > into the filesystem? If I follow the spaghetti of inline and
> > compiler inlined functions correctly, this is a GFP_HIGHUSER_MOVABLE
> > allocation, right? Should we be allowing shrink_icache_memory()
> > to be called at all in the page fault path?
>
> Well, the page fault path is able to go to sleep and can enter direct
> reclaim under low memory situations. Right now, I'm failing to see why a
> page fault should not be allowed to reclaim pages in use by a
> filesystem. It was allowed before, so the question still is why the
> circular lock warning appears now but didn't before.

Yeah, it's the fact that this is the first time that this lockdep
warning has come up that prompted me to ask the question. I know
that we are not allowed to lock an inode in the fault path as that
can lead to deadlocks in the read and write paths, so what I was
really wondering is if we can deadlock in a more convoluted manner
by taking locks on *other inodes* in the page fault path....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
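[Aside: the down_write_nested() frames visible in the traces are the
existing mechanism for telling lockdep about a second acquisition of the
same lock class on a provably different inode. A hedged illustration of
that mechanism using hypothetical structures, not actual XFS code:

	#include <linux/rwsem.h>
	#include <linux/lockdep.h>

	struct foo_inode {
		struct rw_semaphore iolock;
	};

	/*
	 * Reclaim locks a second, unreferenced inode while another
	 * inode's iolock may already be held; the subclass annotation
	 * tells lockdep this is not recursion on a single lock.
	 */
	static void foo_reclaim_inode(struct foo_inode *victim)
	{
		down_write_nested(&victim->iolock, SINGLE_DEPTH_NESTING);
		/* tear the victim inode down */
		up_write(&victim->iolock);
	}

Subclasses only rank acquisitions within one lock class, though; they
cannot express "prune_icache may run with arbitrary filesystem locks
held", which is why the iprune_mutex vs. xfs_ilock report still fires
and why Dave argues the fix belongs in lockdep rather than in XFS.]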
* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-23  0:24   ` Mel Gorman
  2008-06-23  0:53     ` Dave Chinner
@ 2008-06-23  7:22     ` Christoph Hellwig
  2008-06-23 18:38       ` Mel Gorman
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2008-06-23  7:22 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Daniel J Blueman, Christoph Lameter, Linus Torvalds,
      Alexander Beregalov, Linux Kernel, xfs

On Mon, Jun 23, 2008 at 01:24:15AM +0100, Mel Gorman wrote:
> In that case, have you any theory as to why this circular dependency is
> being reported now but wasn't before 2.6.26-rc1? I'm beginning to wonder
> if the bisection fingering the zonelist modification is just a
> co-incidence.

I've seen these traces since lockdep was added when running xfsqa.
* Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
  2008-06-23  7:22     ` Christoph Hellwig
@ 2008-06-23 18:38       ` Mel Gorman
  0 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2008-06-23 18:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Daniel J Blueman, Christoph Lameter, Linus Torvalds,
      Alexander Beregalov, Linux Kernel, xfs, david

On (23/06/08 03:22), Christoph Hellwig didst pronounce:
> On Mon, Jun 23, 2008 at 01:24:15AM +0100, Mel Gorman wrote:
> > In that case, have you any theory as to why this circular dependency is
> > being reported now but wasn't before 2.6.26-rc1? I'm beginning to wonder
> > if the bisection fingering the zonelist modification is just a
> > co-incidence.
>
> I've seen these traces since lockdep was added when running xfsqa.
>

Oh right, so this isn't even 2.6.26-rc1 as such. It's an older problem
that seems to be happening in more cases now, right?

At this point, I believe the bisection fingering the zonelist
modification was a co-incidence, as reclaim behaviour at least is
equivalent, although catching the memory leak early was a lucky positive
outcome. It's still not clear why the circular warning is happening more
regularly now, but it's "something else". Considering the number of
changes made to NFS, XFS, reclaim and other areas since, I'm not sure
how to go about finding the real underlying problem or if it can be
dealt with in a trivial manner.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab