Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Daniel J Blueman <daniel.blueman@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>,
	Mel Gorman <mel@csn.ul.ie>, Christoph Lameter <clameter@sgi.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Alexander Beregalov <a.beregalov@gmail.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	xfs@oss.sgi.com
Subject: Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
Date: Mon, 23 Jun 2008 08:29:52 +1000	[thread overview]
Message-ID: <20080622222952.GB11558@disturbed> (raw)
In-Reply-To: <6278d2220806221356o4c611e43n305ec9653d6d5359@mail.gmail.com>

On Sun, Jun 22, 2008 at 09:56:17PM +0100, Daniel J Blueman wrote:
> On Sun, Jun 22, 2008 at 7:21 PM, Arjan van de Ven <arjan@infradead.org> wrote:
> > this sort of thing can easily be exposed with the latencytop tool...
> > it will at least tell you WHAT the system is blocking on.
> > (not so much the why, the tool isn't smart enough to automatically spit
> > out kernel patches yet)
> 
> Good plan. I reproduced this without NFS mounted, so pure XFS. I
> wasn't able to capture the same process's (ie 5480) latencytop trace,
> but it may have contributed to the list.
> 
> A fair amount of debugging was turned on, hurting the latency some
> (despite it being a 3.6GHz Penryn).
> 
> Daniel
> 
> --- [1]
> 
> $ vmstat 1
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> <snip>
>  1  1    156  14424     12 2329672    0    0     0 110755  177 3820 57  6 36  0
>  2  1    156  14736     12 2348152    0    0    24 25172  204 26018 35 21 43  1
>  5  0    156  24252     12 2363800    0    0 59656    31  545 25292 35 14 28 23
>  4  0    156  14696     12 2317784    0    0  3824     0   38 23083 95  6  0  0
>  4  0    156  14440     12 2319304    0    0  4672     0   72 3372 93  3  3  2
>  2  0    156  14428     12 2318484    0    0     0     4   27  731 52  0 49  0
>  2  0    156  14480     12 2308512    0    0     0    12   32 36629 39 13 49  0
>  2  0    156  14572     12 2301220    0    0  3904 12316  117 10760 58  7 26 11
> 
> --- [2]
> 
> Cause                                                Maximum     Percentage
> down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags 271.1 msec          0.4 %

Waiting on I/O to complete. Your disk is busy.

> down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_f206.1 msec          1.3 %

Waiting on I/O to complete. Your disk is busy.

> down xfs_buf_lock xfs_getsb xfs_trans_getsb xfs_tr160.4 msec          0.5 %

Waiting on a superblock I/O or a transaction to complete. Your
disk is busy.  (Note, this one can be avoided with lazy
superblock counters).

[snip reast of "busy disk trace"]

But really, all latencytop is telling you here is that it takes
time to wait for I/O to complete. It's mostly useless for tracking
down locking issues when you've got I/O in progress...

> 2.6.26-rc7-211c #2
> -------------------------------------------------------
> AutopanoPro/5480 is trying to acquire lock:
>  (iprune_mutex){--..}, at: [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
> 
> but task is already holding lock:
>  (&(&ip->i_iolock)->mr_lock){----}, at: [<ffffffff803a459f>]
> xfs_ilock+0xbf/0x110
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>        [<ffffffff802769ad>] __lock_acquire+0xbdd/0x1020
>        [<ffffffff80276e55>] lock_acquire+0x65/0x90
>        [<ffffffff8026b41b>] down_write_nested+0x4b/0x90
>        [<ffffffff803a45df>] xfs_ilock+0xff/0x110
>        [<ffffffff803a47af>] xfs_ireclaim+0x3f/0x90
>        [<ffffffff803c9169>] xfs_finish_reclaim+0x59/0x220
>        [<ffffffff803cc3b5>] xfs_reclaim+0x185/0x190
>        [<ffffffff803d9211>] xfs_fs_clear_inode+0xe1/0x130
>        [<ffffffff802d25c7>] clear_inode+0x87/0x120
>        [<ffffffff802d290a>] dispose_list+0x2a/0x100
>        [<ffffffff802d2c06>] shrink_icache_memory+0x226/0x280
>        [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
>        [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
>        [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
>        [<ffffffff802981db>] __alloc_pages+0xb/0x10
>        [<ffffffff802a312a>] handle_mm_fault+0x46a/0x6d0
>        [<ffffffff8060510a>] do_page_fault+0x3ca/0x830
>        [<ffffffff80602add>] error_exit+0x0/0xa9
>        [<ffffffffffffffff>] 0xffffffffffffffff

mmap_sem -> iprune_mutex -> xfs_ilock

> -> #0 (iprune_mutex){--..}:
>        [<ffffffff80276817>] __lock_acquire+0xa47/0x1020
>        [<ffffffff80276e55>] lock_acquire+0x65/0x90
>        [<ffffffff8060059a>] mutex_lock_nested+0xba/0x2b0
>        [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
>        [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
>        [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
>        [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
>        [<ffffffff802981db>] __alloc_pages+0xb/0x10
>        [<ffffffff8029a6b6>] __do_page_cache_readahead+0x136/0x230
>        [<ffffffff8029aa08>] ondemand_readahead+0x128/0x1f0
>        [<ffffffff8029ab45>] page_cache_async_readahead+0x75/0xa0
>        [<ffffffff80293a8a>] generic_file_aio_read+0x28a/0x610
>        [<ffffffff803d78c4>] xfs_read+0x124/0x270
>        [<ffffffff803d4416>] __xfs_file_read+0x46/0x50
>        [<ffffffff803d4451>] xfs_file_aio_read+0x11/0x20
>        [<ffffffff802bc1b1>] do_sync_read+0xf1/0x130
>        [<ffffffff802bca74>] vfs_read+0xc4/0x160
>        [<ffffffff802bcf10>] sys_read+0x50/0x90
>        [<ffffffff8022639b>] system_call_after_swapgs+0x7b/0x80
>        [<ffffffffffffffff>] 0xffffffffffffffff

xfs_ilock -> iprune_mutex

This is exactly the situation I mentioned in the previous email.
There is no potential deadlock here between the xfs_ilock and
iprune_mutex as the xfs_ilock that is held before and/or after
iprune_mutex is guaranteed to be different (the first is in use
so will never be found by shrink_icache_memory())...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2008-06-22 22:29 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <6278d2220806220256g674304ectb945c14e7e09fede@mail.gmail.com>
2008-06-22  9:58 ` [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec) Daniel J Blueman
2008-06-22 18:10   ` Mel Gorman
2008-06-22 18:14     ` Mel Gorman
2008-06-22 18:54       ` Daniel J Blueman
2008-06-22 18:21     ` Arjan van de Ven
2008-06-22 20:56       ` Daniel J Blueman
2008-06-22 22:29         ` Dave Chinner [this message]
2008-06-22 22:19   ` Dave Chinner
2008-06-23  0:24     ` Mel Gorman
2008-06-23  0:53       ` Dave Chinner
2008-06-23  7:22       ` Christoph Hellwig
2008-06-23 18:38         ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080622222952.GB11558@disturbed \
    --to=david@fromorbit.com \
    --cc=a.beregalov@gmail.com \
    --cc=arjan@infradead.org \
    --cc=clameter@sgi.com \
    --cc=daniel.blueman@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mel@csn.ul.ie \
    --cc=torvalds@linux-foundation.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.