linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: alexander.levin@verizon.com
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, Hugh Dickins <hughd@google.com>,
	Michel Lespinasse <walken@google.com>, Jan Kara <jack@suse.cz>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [patch 1/3] mm: protect set_page_dirty() from ongoing truncation
Date: Mon, 10 Apr 2017 14:06:38 +0200
Message-ID: <20170410120638.GD3224@quack2.suse.cz>
In-Reply-To: <20170410022230.xe5sukvflvoh4ula@sasha-lappy>

On Mon 10-04-17 02:22:33, alexander.levin@verizon.com wrote:
> On Fri, Dec 05, 2014 at 09:52:44AM -0500, Johannes Weiner wrote:
> > Tejun, while reviewing the code, spotted the following race condition
> > between the dirtying and truncation of a page:
> > 
> > __set_page_dirty_nobuffers()       __delete_from_page_cache()
> >   if (TestSetPageDirty(page))
> >                                      page->mapping = NULL
> >                                      if (PageDirty())
> >                                        dec_zone_page_state(page, NR_FILE_DIRTY);
> >                                        dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
> >     if (page->mapping)
> >       account_page_dirtied(page)
> >         __inc_zone_page_state(page, NR_FILE_DIRTY);
> >         __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
> > 
> > which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.
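
An illustrative aside (not from the thread): the interleaving in the diagram above can be replayed in a small user-space model. The structures and helper names below are hypothetical stand-ins, not kernel code, and the model assumes a literal reading of the diagram in which the dirtier re-checks page->mapping only after truncation has cleared it. Under that assumption the decrement has no matching increment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct mapping { long nr_file_dirty; };
struct page { bool dirty; struct mapping *mapping; };

/* Sets the dirty bit and returns its previous value, like TestSetPageDirty(). */
static bool test_set_dirty(struct page *p)
{
    bool old = p->dirty;
    p->dirty = true;
    return old;
}

/*
 * Replays the diagram's interleaving on a clean, mapped page and
 * returns the resulting dirty counter. A balanced run would return 0.
 */
static long replay_race(void)
{
    struct mapping m = { .nr_file_dirty = 0 };
    struct page p = { .dirty = false, .mapping = &m };

    /* Dirtier: set the dirty bit first. */
    bool was_dirty = test_set_dirty(&p);

    /* Truncation: detach the page, then un-account the dirty bit it sees. */
    p.mapping = NULL;
    if (p.dirty)
        m.nr_file_dirty--;

    /* Dirtier: account only if the page is still attached (it is not). */
    if (!was_dirty && p.mapping)
        p.mapping->nr_file_dirty++;

    return m.nr_file_dirty;
}
```

In this model replay_race() returns -1 rather than 0, which is the kind of counter imbalance the commit message describes.
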
> > 
> > Dirtiers usually lock out truncation, either by holding the page lock
> > directly, or in case of zap_pte_range(), by pinning the mapcount with
> > the page table lock held.  The notable exception to this rule, though,
> > is do_wp_page(), for which this race exists.  However, do_wp_page()
> > already waits for a locked page to unlock before setting the dirty
> > bit, in order to prevent a race where clear_page_dirty() misses the
> > page bit in the presence of dirty ptes.  Upgrade that wait to a fully
> > locked set_page_dirty() to also cover the situation explained above.
> > 
> > Afterwards, the code in set_page_dirty() dealing with a truncation
> > race is no longer needed.  Remove it.
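
An illustrative aside (not from the thread): a minimal sketch of why holding the page lock across the whole dirtying step closes the window, assuming truncation also needs the page lock. The lock is modeled as a plain flag and every name here is hypothetical; this is not the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; "locked" models the page lock. */
struct mapping { long nr_file_dirty; };
struct page { bool locked; bool dirty; struct mapping *mapping; };

static bool trylock_page_model(struct page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

static void unlock_page_model(struct page *p)
{
    p->locked = false;
}

/* Truncation side: proceeds only if it can take the page lock. */
static bool truncate_model(struct page *p)
{
    struct mapping *m = p->mapping;

    if (!trylock_page_model(p))
        return false;           /* dirtier holds the lock; caller retries */
    if (m) {
        p->mapping = NULL;
        if (p->dirty)
            m->nr_file_dirty--;
    }
    unlock_page_model(p);
    return true;
}

/* Dirtier holds the lock from the dirty-bit update through the accounting. */
static long replay_locked(bool *truncate_was_blocked)
{
    struct mapping m = { .nr_file_dirty = 0 };
    struct page p = { .locked = false, .dirty = false, .mapping = &m };

    trylock_page_model(&p);
    bool was_dirty = p.dirty;
    p.dirty = true;

    /* Truncation arrives at the racy point but is locked out. */
    *truncate_was_blocked = !truncate_model(&p);

    if (!was_dirty && p.mapping)
        p.mapping->nr_file_dirty++;
    unlock_page_model(&p);

    /* Truncation retries and now sees a consistently accounted page. */
    truncate_model(&p);
    return m.nr_file_dirty;
}
```

Here the increment and the later decrement pair up, so replay_locked() returns 0 and reports that the mid-update truncation attempt was blocked.
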
> > 
> > Reported-by: Tejun Heo <tj@kernel.org>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: <stable@vger.kernel.org>
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> Hi Johannes,
> 
> I'm seeing the following while fuzzing with trinity on linux-next (I've changed
> the WARN to a VM_BUG_ON_PAGE for some extra page info).

But this looks more like a bug in 9p which allows v9fs_write_end() to dirty
a !Uptodate page?
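
An illustrative aside (not from the thread): the page dump quoted below reports flags 0x1fffc0000000011, decoded there as locked|dirty, and the check that fired is !PagePrivate(page) && !PageUptodate(page). The sketch below evaluates that expression on the raw flags word. The bit positions are assumptions about this kernel version's page-flags layout, the private bit in particular, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; PG_PRIVATE_BIT especially may differ by kernel version. */
enum {
    PG_LOCKED_BIT   = 0,
    PG_UPTODATE_BIT = 3,
    PG_DIRTY_BIT    = 4,
    PG_PRIVATE_BIT  = 12
};

static bool flag_set(uint64_t flags, int bit)
{
    return (flags >> bit) & 1;
}

/* Mirrors the expression printed in the dump. */
static bool dirty_check_fires(uint64_t flags)
{
    return !flag_set(flags, PG_PRIVATE_BIT) &&
           !flag_set(flags, PG_UPTODATE_BIT);
}
```

With the dumped value, dirty_check_fires(0x1fffc0000000011ULL) is true: neither the uptodate nor the private bit is set, which matches the page being dirtied from v9fs_write_end() while still !Uptodate.
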

								Honza

> 
> [   18.991007] page:ffffea000307c8c0 count:3 mapcount:0 mapping:ffff88010444cbf8 index:0x1
> [   18.993051] flags: 0x1fffc0000000011(locked|dirty)
> [   18.993621] raw: 01fffc0000000011 ffff88010444cbf8 0000000000000001 00000003ffffffff
> [   18.994522] raw: dead000000000100 dead000000000200 0000000000000000 ffff880109c38008
> [   18.995418] page dumped because: VM_BUG_ON_PAGE(!PagePrivate(page) && !PageUptodate(page))
> [   18.996381] page->mem_cgroup:ffff880109c38008
> [   18.996935] ------------[ cut here ]------------
> [   18.997483] kernel BUG at mm/page-writeback.c:2486!
> [   18.998063] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
> [   18.998756] Modules linked in:
> [   18.999129] CPU: 5 PID: 1388 Comm: trinity-c34 Not tainted 4.11.0-rc5-next-20170407-dirty #12
> [   19.000117] task: ffff880106ee5d40 task.stack: ffff8800c0f40000
> [   19.000828] RIP: 0010:__set_page_dirty_nobuffers (??:?)
> [   19.001491] RSP: 0018:ffff8800c0f47318 EFLAGS: 00010006
> [   19.002103] RAX: 0000000000000000 RBX: 1ffff100181e8e67 RCX: 0000000000000000
> [   19.002929] RDX: 0000000000000021 RSI: 1ffff100181e8da7 RDI: ffffed00181e8e58
> [   19.004806] RBP: ffff8800c0f47440 R08: 3830303833633930 R09: 3130383866666666
> [   19.005626] R10: dffffc0000000000 R11: 0000000000001491 R12: ffff8800c0f47418
> [   19.006452] R13: ffffea000307c8c0 R14: ffff88010444cc10 R15: ffff88010444cbf8
> [   19.007277] FS:  00007ff6a26fb700(0000) GS:ffff88010a340000(0000) knlGS:0000000000000000
> [   19.008424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   19.009092] CR2: 00007ff6a155267c CR3: 00000000cb301000 CR4: 00000000000406a0
> [   19.009919] Call Trace:
> [   19.012266] set_page_dirty (mm/page-writeback.c:2579)
> [   19.020028] v9fs_write_end (fs/9p/vfs_addr.c:325)
> [   19.022473] generic_perform_write (mm/filemap.c:2842)
> [   19.024857] __generic_file_write_iter (mm/filemap.c:2957)
> [   19.025830] generic_file_write_iter (./include/linux/fs.h:702 mm/filemap.c:2985)
> [   19.028549] __do_readv_writev (./include/linux/fs.h:1734 fs/read_write.c:696 fs/read_write.c:862)
> [   19.029924] do_readv_writev (fs/read_write.c:895)
> [   19.034044] vfs_writev (fs/read_write.c:921)
> [   19.035223] do_writev (fs/read_write.c:955)
> [   19.036925] SyS_writev (fs/read_write.c:1024)
> [   19.037297] do_syscall_64 (arch/x86/entry/common.c:284)
> [   19.042085] entry_SYSCALL64_slow_path (arch/x86/entry/entry_64.S:249)
> [   19.042608] RIP: 0033:0x7ff6a200a8e9
> [   19.043015] RSP: 002b:00007fff78079608 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> [   19.044253] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007ff6a200a8e9
> [   19.045045] RDX: 0000000000000001 RSI: 0000000002337d60 RDI: 000000000000018b
> [   19.045835] RBP: 00007ff6a2601000 R08: 000000482a1a83cf R09: fffdffffffffffff
> [   19.046627] R10: 0012536735f82cf7 R11: 0000000000000246 R12: 0000000000000002
> [   19.047413] R13: 00007ff6a2601048 R14: 00007ff6a26fb698 R15: 00007ff6a2601000
> [   19.048212] Code: 89 85 f0 fe ff ff e8 39 1b 20 00 8b 85 f0 fe ff ff eb 1a e8 2c bd 12 00 31 c0 eb 11 48 c7 c6 e0 c4 47 83 4c 89 ef e8 39 44 07 00 <0f> 0b 48 ba 00 00 00 00 00 fc ff df 48 c7 04 13 00 00 00 00 48
> All code
> ========
>    0:   89 85 f0 fe ff ff       mov    %eax,-0x110(%rbp)
>    6:   e8 39 1b 20 00          callq  0x201b44 
>    b:   8b 85 f0 fe ff ff       mov    -0x110(%rbp),%eax 
>   11:   eb 1a                   jmp    0x2d 
>   13:   e8 2c bd 12 00          callq  0x12bd44 
>   18:   31 c0                   xor    %eax,%eax 
>   1a:   eb 11                   jmp    0x2d 
>   1c:   48 c7 c6 e0 c4 47 83    mov    $0xffffffff8347c4e0,%rsi
>   23:   4c 89 ef                mov    %r13,%rdi
>   26:   e8 39 44 07 00          callq  0x74464
>   2b:*  0f 0b                   ud2             <-- trapping instruction
>   2d:   48 ba 00 00 00 00 00    movabs $0xdffffc0000000000,%rdx
>   34:   fc ff df
>   37:   48 c7 04 13 00 00 00    movq   $0x0,(%rbx,%rdx,1)
>   3e:   00
>   3f:   48                      rex.W
>         ...
> 
> Code starting with the faulting instruction
> ===========================================
>    0:   0f 0b                   ud2
>    2:   48 ba 00 00 00 00 00    movabs $0xdffffc0000000000,%rdx
>    9:   fc ff df
>    c:   48 c7 04 13 00 00 00    movq   $0x0,(%rbx,%rdx,1)
>   13:   00
>   14:   48                      rex.W
>         ...
> [   19.050311] RIP: __set_page_dirty_nobuffers+0x407/0x450 RSP: ffff8800c0f47318 (??:?)
> 
> -- 
> 
> Thanks,
> Sasha
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



