Re: [PATCH v2] ext4: fix race condition between buffer write and page_mkwrite

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Theodore Ts'o" <tytso@mit.edu>
To: Baokun Li <libaokun1@huawei.com>
Cc: linux-ext4@vger.kernel.org, adilger.kernel@dilger.ca,
	jack@suse.cz, ritesh.list@gmail.com,
	linux-kernel@vger.kernel.org, jun.nie@linaro.org,
	ebiggers@kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com,
	yukuai3@huawei.com,
	syzbot+a158d886ca08a3fecca4@syzkaller.appspotmail.com,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] ext4: fix race condition between buffer write and page_mkwrite
Date: Sat, 3 Jun 2023 23:04:45 -0400	[thread overview]
Message-ID: <20230604030445.GF1128744@mit.edu> (raw)
In-Reply-To: <20230530134405.322194-1-libaokun1@huawei.com>

I tried testing to see if this fixed [1], and it appears to be
triggering a lockdep warning[2] at this line in the patch:

[1] https://syzkaller.appspot.com/bug?extid=f4582777a19ec422b517
[2] https://syzkaller.appspot.com/x/report.txt?x=17260843280000

> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index d101b3b0c7da..9df82d72eb90 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -808,6 +809,27 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
>  	if (!daxdev_mapping_supported(vma, dax_dev))
>  		return -EOPNOTSUPP;
>  
> +	/*
> +	 * Writing via mmap has no logic to handle inline data, so we
> +	 * need to call ext4_convert_inline_data() to convert the inode
> +	 * to normal format before doing so, otherwise a BUG_ON will be
> +	 * triggered in ext4_writepages() due to the
> +	 * EXT4_STATE_MAY_INLINE_DATA flag. Moreover, we need to grab
> +	 * i_rwsem during conversion, since clearing and setting the
> +	 * inline data flag may race with ext4_buffered_write_iter()
> +	 * to trigger a BUG_ON.
> +	 */
> +	if (ext4_has_feature_inline_data(sb) &&
> +	    vma->vm_flags & VM_SHARED && vma->vm_flags & VM_MAYWRITE) {
> +		int err;
> +
> +		inode_lock(inode); <=================== LOCKDEP warning
> +		err = ext4_convert_inline_data(inode);
> +		inode_unlock(inode);
> +		if (err)
> +			return err;
> +	}


The details of the lockdep warning from [2], which appears to be a
mmap(2) racing with a buffered write(2) are below.  Could you take a
look?

Thanks!

					- Ted

======================================================
WARNING: possible circular locking dependency detected
6.4.0-rc4-syzkaller-geb1f822c76be-dirty #0 Not tainted
------------------------------------------------------
syz-executor.4/5589 is trying to acquire lock:
ffff888024228168 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock include/linux/mmap_lock.h:142 [inline]
ffff888024228168 (&mm->mmap_lock){++++}-{3:3}, at: do_user_addr_fault+0xb3d/0x1210 arch/x86/mm/fault.c:1391

but task is already holding lock:
ffff88806a066800 (&sb->s_type->i_mutex_key#8){++++}-{3:3}, at: inode_lock include/linux/fs.h:775 [inline]
ffff88806a066800 (&sb->s_type->i_mutex_key#8){++++}-{3:3}, at: ext4_buffered_write_iter+0xb0/0x460 fs/ext4/file.c:283

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&sb->s_type->i_mutex_key#8){++++}-{3:3}:
       down_write+0x92/0x200 kernel/locking/rwsem.c:1573
       inode_lock include/linux/fs.h:775 [inline]
       ext4_file_mmap+0x62e/0x800 fs/ext4/file.c:826
       call_mmap include/linux/fs.h:1873 [inline]
       mmap_region+0x694/0x28d0 mm/mmap.c:2652
       do_mmap+0x831/0xf60 mm/mmap.c:1394
       vm_mmap_pgoff+0x1a2/0x3b0 mm/util.c:543
       ksys_mmap_pgoff+0x41f/0x5a0 mm/mmap.c:1440
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #0 (&mm->mmap_lock){++++}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:3113 [inline]
       check_prevs_add kernel/locking/lockdep.c:3232 [inline]
       validate_chain kernel/locking/lockdep.c:3847 [inline]
       __lock_acquire+0x2fcd/0x5f30 kernel/locking/lockdep.c:5088
       lock_acquire kernel/locking/lockdep.c:5705 [inline]
       lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5670
       down_read+0x9c/0x480 kernel/locking/rwsem.c:1520
       mmap_read_lock include/linux/mmap_lock.h:142 [inline]
       do_user_addr_fault+0xb3d/0x1210 arch/x86/mm/fault.c:1391
       handle_page_fault arch/x86/mm/fault.c:1534 [inline]
       exc_page_fault+0x98/0x170 arch/x86/mm/fault.c:1590
       asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
       fault_in_readable+0x1a5/0x210 mm/gup.c:1856
       fault_in_iov_iter_readable+0x252/0x2c0 lib/iov_iter.c:362
       generic_perform_write+0x1ae/0x570 mm/filemap.c:3913
       ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:289
       ext4_file_write_iter+0xbe0/0x1740 fs/ext4/file.c:710
       call_write_iter include/linux/fs.h:1868 [inline]
       new_sync_write fs/read_write.c:491 [inline]
       vfs_write+0x945/0xd50 fs/read_write.c:584
       ksys_write+0x12b/0x250 fs/read_write.c:637
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key#8);
                               lock(&mm->mmap_lock);
                               lock(&sb->s_type->i_mutex_key#8);
  rlock(&mm->mmap_lock);

 *** DEADLOCK ***

3 locks held by syz-executor.4/5589:
 #0: ffff88802a7fe0e8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe7/0x100 fs/file.c:1047
 #1: ffff888021fe0460 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x12b/0x250 fs/read_write.c:637
 #2: ffff88806a066800 (&sb->s_type->i_mutex_key#8){++++}-{3:3}, at: inode_lock include/linux/fs.h:775 [inline]
 #2: ffff88806a066800 (&sb->s_type->i_mutex_key#8){++++}-{3:3}, at: ext4_buffered_write_iter+0xb0/0x460 fs/ext4/file.c:283

stack backtrace:
CPU: 0 PID: 5589 Comm: syz-executor.4 Not tainted 6.4.0-rc4-syzkaller-geb1f822c76be-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2188
 check_prev_add kernel/locking/lockdep.c:3113 [inline]
 check_prevs_add kernel/locking/lockdep.c:3232 [inline]
 validate_chain kernel/locking/lockdep.c:3847 [inline]
 __lock_acquire+0x2fcd/0x5f30 kernel/locking/lockdep.c:5088
 lock_acquire kernel/locking/lockdep.c:5705 [inline]
 lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5670
 down_read+0x9c/0x480 kernel/locking/rwsem.c:1520
 mmap_read_lock include/linux/mmap_lock.h:142 [inline]
 do_user_addr_fault+0xb3d/0x1210 arch/x86/mm/fault.c:1391
 handle_page_fault arch/x86/mm/fault.c:1534 [inline]
 exc_page_fault+0x98/0x170 arch/x86/mm/fault.c:1590
 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
RIP: 0010:fault_in_readable+0x1a5/0x210 mm/gup.c:1856
Code: fc ff df 48 c7 04 02 00 00 00 00 48 83 c4 48 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 45 31 e4 eb ce e8 ae 51 c4 ff 45 31 f6 <41> 8a 45 00 31 ff 44 89 f6 88 44 24 28 e8 b9 4d c4 ff 45 85 f6 75
RSP: 0018:ffffc90006187a38 EFLAGS: 00050246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888026905940 RSI: ffffffff81bff672 RDI: 0000000000000007
RBP: 00000000200002cc R08: 0000000000000007 R09: 0000000000000000
R10: 00000000000002c0 R11: 1ffffffff219cbe3 R12: 000000000000000c
R13: 00000000200002c0 R14: 0000000000000000 R15: 1ffff92000c30f48
 fault_in_iov_iter_readable+0x252/0x2c0 lib/iov_iter.c:362
 generic_perform_write+0x1ae/0x570 mm/filemap.c:3913
 ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:289
 ext4_file_write_iter+0xbe0/0x1740 fs/ext4/file.c:710
 call_write_iter include/linux/fs.h:1868 [inline]
 new_sync_write fs/read_write.c:491 [inline]
 vfs_write+0x945/0xd50 fs/read_write.c:584
 ksys_write+0x12b/0x250 fs/read_write.c:637
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f2359a8c169
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f235a721168 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f2359bac050 RCX: 00007f2359a8c169
RDX: 000000000000000c RSI: 00000000200002c0 RDI: 0000000000000004
RBP: 00007f2359ae7ca1 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffea89659df R14: 00007f235a721300 R15: 0000000000022000
 </TASK>
----------------
Code disassembly (best guess), 2 bytes skipped:
   0:	df 48 c7             	fisttps -0x39(%rax)
   3:	04 02                	add    $0x2,%al
   5:	00 00                	add    %al,(%rax)
   7:	00 00                	add    %al,(%rax)
   9:	48 83 c4 48          	add    $0x48,%rsp
   d:	4c 89 e0             	mov    %r12,%rax
  10:	5b                   	pop    %rbx
  11:	5d                   	pop    %rbp
  12:	41 5c                	pop    %r12
  14:	41 5d                	pop    %r13
  16:	41 5e                	pop    %r14
  18:	41 5f                	pop    %r15
  1a:	c3                   	retq
  1b:	45 31 e4             	xor    %r12d,%r12d
  1e:	eb ce                	jmp    0xffffffee
  20:	e8 ae 51 c4 ff       	callq  0xffc451d3
  25:	45 31 f6             	xor    %r14d,%r14d
* 28:	41 8a 45 00          	mov    0x0(%r13),%al <-- trapping instruction
  2c:	31 ff                	xor    %edi,%edi
  2e:	44 89 f6             	mov    %r14d,%esi
  31:	88 44 24 28          	mov    %al,0x28(%rsp)
  35:	e8 b9 4d c4 ff       	callq  0xffc44df3
  3a:	45 85 f6             	test   %r14d,%r14d
  3d:	75                   	.byte 0x75

next prev parent reply	other threads:[~2023-06-04  3:05 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-30 13:44 [PATCH v2] ext4: fix race condition between buffer write and page_mkwrite Baokun Li
2023-05-31 13:15 ` Jan Kara
2023-06-04  3:04 ` Theodore Ts'o [this message]
2023-06-04 21:08   ` Theodore Ts'o
2023-06-05  1:58     ` Matthew Wilcox
2023-06-05  9:16       ` Jan Kara
2023-06-05 12:21         ` Jan Kara
2023-06-05 14:55           ` Matthew Wilcox
2023-06-05 15:08             ` Jan Kara
2023-06-06 12:19               ` Baokun Li
2024-04-15  4:28               ` Baokun Li
2024-04-15 12:34                 ` Jan Kara
2024-04-15 14:07                   ` Baokun Li
2023-06-05  2:17   ` Baokun Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230604030445.GF1128744@mit.edu \
    --to=tytso@mit.edu \
    --cc=adilger.kernel@dilger.ca \
    --cc=ebiggers@kernel.org \
    --cc=jack@suse.cz \
    --cc=jun.nie@linaro.org \
    --cc=libaokun1@huawei.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ritesh.list@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+a158d886ca08a3fecca4@syzkaller.appspotmail.com \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.