From: Jerome Marchand <jmarchan@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>,
Anna Schumaker <anna.schumaker@netapp.com>,
Christoph Hellwig <hch@infradead.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Mel Gorman <mgorman@suse.de>
Subject: Re: [RFC PATCH] nfs: avoid swap-over-NFS deadlock
Date: Mon, 27 Jul 2015 13:25:47 +0200
Message-ID: <55B6153B.1070604@redhat.com>
In-Reply-To: <20150727105216.GD2660@techsingularity.net>
On 07/27/2015 12:52 PM, Mel Gorman wrote:
> On Wed, Jul 22, 2015 at 03:46:16PM +0200, Jerome Marchand wrote:
>> On 07/22/2015 02:23 PM, Trond Myklebust wrote:
>>> On Wed, Jul 22, 2015 at 4:10 AM, Jerome Marchand <jmarchan@redhat.com> wrote:
>>>>
>>>> Lockdep warns about an inconsistent {RECLAIM_FS-ON-W} ->
>>>> {IN-RECLAIM_FS-W} usage. The culprit is the inode->i_mutex taken in
>>>> nfs_file_direct_write(). This code was introduced by commit a9ab5e840669
>>>> ("nfs: page cache invalidation for dio").
>>>> This naive test patch avoids taking the mutex on a swapfile and makes
>>>> lockdep happy again. However, I don't know much about the NFS code and I
>>>> assume it's probably not the proper solution. Any thoughts?
>>>>
>>>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
>>>
>>> NFS is not the only O_DIRECT implementation to take the inode->i_mutex.
>>> Why can't this be fixed in the generic swap code instead of adding
>>> yet-another-exception-for-IS_SWAPFILE?
>>
>> I meant to cc Mel. Just added him.
>>
>
> Can the full lockdep warning be included? It would make it easier to see
> whether the generic swap code can somehow special-case this. Currently,
> generic swapping does not need to care about how the filesystem does its
> locking. For most filesystems, it's writing directly to the blocks on disk
> and bypassing the FS. In the NFS case it'd be surprising to find that there
> are also dirty pages in the page cache that belong to the swap file, as
> that is going to cause corruption. If there is any special casing, it would
> be to only attempt the invalidation in the !swap case and warn if
> mapping->nrpages. It would still look a bit weird, but it is safer than
> just not acquiring the mutex and then potentially attempting an invalidation.
>
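
For illustration only, here is a minimal user-space sketch of the special
casing described above, as I read it: attempt the page cache invalidation
only in the !swap case, and warn if the swap file's mapping has cached pages.
This is not NFS or kernel code. struct mapping_model, its fields, and both
helper functions are hypothetical stand-ins (is_swapfile stands in for the
IS_SWAPFILE() test, nrpages for mapping->nrpages), and the i_mutex handling
is deliberately left out.

```c
/*
 * User-space model of the suggested special casing; NOT kernel code.
 * Every type and helper below is a hypothetical stand-in.
 */
#include <stdbool.h>
#include <stdio.h>

struct mapping_model {
	unsigned long nrpages;	/* stand-in for mapping->nrpages */
	bool is_swapfile;	/* stand-in for IS_SWAPFILE(inode) */
	int invalidations;	/* number of invalidation attempts */
	int warnings;		/* number of warnings emitted */
};

/* Stand-in for invalidating the cached pages of the written range. */
static void invalidate_range_model(struct mapping_model *m)
{
	m->invalidations++;
	m->nrpages = 0;
}

/* Step run after a direct write: invalidate only in the !swap case. */
static void direct_write_post_model(struct mapping_model *m)
{
	if (m->is_swapfile) {
		/* A swap file should never have pages in the page cache. */
		if (m->nrpages) {
			m->warnings++;
			fprintf(stderr, "warning: swapfile has %lu cached pages\n",
				m->nrpages);
		}
		return;	/* never attempt an invalidation for swap */
	}

	if (m->nrpages)
		invalidate_range_model(m);
}
```

In this model a swap file with cached pages only triggers the warning and is
left untouched, while a regular file has its cached pages invalidated as
before.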
[ 6819.501009] =================================
[ 6819.501009] [ INFO: inconsistent lock state ]
[ 6819.501009] 4.2.0-rc1-shmacct-babka-v2-next-20150709+ #255 Not tainted
[ 6819.501009] ---------------------------------
[ 6819.501009] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 6819.501009] kswapd0/38 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 6819.501009] (&sb->s_type->i_mutex_key#17){+.+.?.}, at: [<ffffffffa03772a5>] nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009] {RECLAIM_FS-ON-W} state was registered at:
[ 6819.501009] [<ffffffff81107f51>] mark_held_locks+0x71/0x90
[ 6819.501009] [<ffffffff8110b775>] lockdep_trace_alloc+0x75/0xe0
[ 6819.501009] [<ffffffff81245529>] kmem_cache_alloc_node_trace+0x39/0x440
[ 6819.501009] [<ffffffff81225b8f>] __get_vm_area_node+0x7f/0x160
[ 6819.501009] [<ffffffff81226eb2>] __vmalloc_node_range+0x72/0x2c0
[ 6819.501009] [<ffffffff81227424>] vzalloc+0x54/0x60
[ 6819.501009] [<ffffffff8122c7c8>] SyS_swapon+0x628/0xfc0
[ 6819.501009] [<ffffffff81867772>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 6819.501009] irq event stamp: 163459
[ 6819.501009] hardirqs last enabled at (163459): [<ffffffff81866c66>] _raw_spin_unlock_irqrestore+0x36/0x60
[ 6819.501009] hardirqs last disabled at (163458): [<ffffffff8186747b>] _raw_spin_lock_irqsave+0x2b/0x90
[ 6819.501009] softirqs last enabled at (162966): [<ffffffff810b13d3>] __do_softirq+0x363/0x630
[ 6819.501009] softirqs last disabled at (162961): [<ffffffff810b1a03>] irq_exit+0xf3/0x100
[ 6819.501009]
other info that might help us debug this:
[ 6819.501009] Possible unsafe locking scenario:
[ 6819.501009] CPU0
[ 6819.501009] ----
[ 6819.501009] lock(&sb->s_type->i_mutex_key#17);
[ 6819.501009] <Interrupt>
[ 6819.501009] lock(&sb->s_type->i_mutex_key#17);
[ 6819.501009]
*** DEADLOCK ***
[ 6819.501009] no locks held by kswapd0/38.
[ 6819.501009]
stack backtrace:
[ 6819.501009] CPU: 1 PID: 38 Comm: kswapd0 Not tainted 4.2.0-rc1-shmacct-babka-v2-next-20150709+ #255
[ 6819.501009] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 6819.501009] 0000000000000000 00000000cca71737 ffff880033f374d8 ffffffff8185ce5b
[ 6819.501009] 0000000000000000 ffff880033f30000 ffff880033f37538 ffffffff8185732d
[ 6819.501009] 0000000000000000 ffff880000000001 ffff880000000001 ffffffff8102f49f
[ 6819.501009] Call Trace:
[ 6819.501009] [<ffffffff8185ce5b>] dump_stack+0x4c/0x65
[ 6819.501009] [<ffffffff8185732d>] print_usage_bug+0x1f2/0x203
[ 6819.501009] [<ffffffff8102f49f>] ? save_stack_trace+0x2f/0x50
[ 6819.501009] [<ffffffff81107430>] ? check_usage_backwards+0x150/0x150
[ 6819.501009] [<ffffffff81107e52>] mark_lock+0x212/0x2a0
[ 6819.501009] [<ffffffff81108d73>] __lock_acquire+0x8d3/0x1f40
[ 6819.501009] [<ffffffff8110953e>] ? __lock_acquire+0x109e/0x1f40
[ 6819.501009] [<ffffffff8110ac92>] lock_acquire+0xc2/0x280
[ 6819.501009] [<ffffffffa03772a5>] ? nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009] [<ffffffff818641bf>] mutex_lock_nested+0x7f/0x3f0
[ 6819.501009] [<ffffffffa03772a5>] ? nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009] [<ffffffff81105328>] ? __lock_is_held+0x58/0x80
[ 6819.501009] [<ffffffffa03772a5>] ? nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009] [<ffffffff8122a500>] ? get_swap_bio+0x90/0x90
[ 6819.501009] [<ffffffffa03772a5>] nfs_file_direct_write+0x85/0x3f0 [nfs]
[ 6819.501009] [<ffffffff8122a500>] ? get_swap_bio+0x90/0x90
[ 6819.501009] [<ffffffffa0377640>] nfs_direct_IO+0x30/0x50 [nfs]
[ 6819.501009] [<ffffffff8122a9b5>] __swap_writepage+0x105/0x270
[ 6819.501009] [<ffffffff8122ab59>] swap_writepage+0x39/0x70
[ 6819.501009] [<ffffffff811fbef2>] shmem_writepage+0x1f2/0x330
[ 6819.501009] [<ffffffff811f3319>] pageout.isra.48+0x189/0x4a0
[ 6819.501009] [<ffffffff811f5497>] shrink_page_list+0x9b7/0xc80
[ 6819.501009] [<ffffffff811f60a8>] shrink_inactive_list+0x3a8/0x800
[ 6819.501009] [<ffffffff810e72f5>] ? local_clock+0x15/0x30
[ 6819.501009] [<ffffffff811f6f10>] shrink_lruvec+0x610/0x800
[ 6819.501009] [<ffffffff811f71e7>] shrink_zone+0xe7/0x2d0
[ 6819.501009] [<ffffffff811f8ddd>] kswapd+0x55d/0xd30
[ 6819.501009] [<ffffffff811f8880>] ? mem_cgroup_shrink_node_zone+0x490/0x490
[ 6819.501009] [<ffffffff810d1a74>] kthread+0x104/0x120
[ 6819.501009] [<ffffffff810d1970>] ? kthread_create_on_node+0x250/0x250
[ 6819.501009] [<ffffffff81867aef>] ret_from_fork+0x3f/0x70
[ 6819.501009] [<ffffffff810d1970>] ? kthread_create_on_node+0x250/0x250