From: harshad shirwadkar <harshadshirwadkar@gmail.com>
To: Murphy Zhou <jencce.kernel@gmail.com>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
"Theodore Ts'o" <tytso@mit.edu>
Subject: Re: [fsdax] kernel BUG at fs/ext4/ext4_extents.h:199!
Date: Wed, 11 Nov 2020 10:08:22 -0800 [thread overview]
Message-ID: <CAD+ocbwkPKnnaKS25eziNp6Pz0yLSwWyitzSMFWhxjFgi_37aQ@mail.gmail.com> (raw)
In-Reply-To: <20201111031039.do6syeiam4pgftvu@xzhoux.usersys.redhat.com>
Thanks for the report Murphy. I just realized we have a collision of
mount flags:
#define EXT4_MOUNT2_DAX_NEVER 0x00000008 /* Do not allow
Direct Access */
#define EXT4_MOUNT2_DAX_INODE 0x00000010 /* For printing
options only */
#define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM 0x00000008 /* User explicitly
specified journal checksum */
#define EXT4_MOUNT2_JOURNAL_FAST_COMMIT 0x00000010 /* Journal fast commit */
So whenever dax=inode is turned on, fast commit also gets turned on.
I'll send out a fix for this soon.
Thanks,
Harshad
On Tue, Nov 10, 2020 at 7:10 PM Murphy Zhou <jencce.kernel@gmail.com> wrote:
>
> Hi,
>
> A corner case panic starts since the fast-commit merge.
>
> ------
> kernel BUG at fs/ext4/ext4_extents.h:199!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 5 PID: 1184 Comm: t_mmap_dio Tainted: G E 5.10.0-rc3-master-407ab579637c+ #37
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> RIP: 0010:ext4_fc_write_inode_data+0x1d0/0x1e0 [ext4]
> Code: 7f 00 00 74 25 66 81 ca 00 80 66 89 54 24 30 e9 62 ff ff ff 4c 89 ff e8 9e 63 4d c1 31 c0 eb 84 b8 83 ff ff ff e9 7a ff ff ff <0f> 0b e8 79 c8 4c c1 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41
> RSP: 0018:ffff9bc200d53d88 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000008000
> RDX: 0000000000008000 RSI: 0000000000060002 RDI: 000032ee98c0aa2c
> RBP: 0000000000008000 R08: ffffffffc0559158 R09: ffffffffc054cdc0
> R10: 0000000000008000 R11: 0000000000000000 R12: 0000000000007fff
> R13: ffff9bc200d53e6c R14: ffff88d2e0cf3420 R15: ffff88d2e0cf33a8
> FS: 00007f20006cab80(0000) GS:ffff88d367140000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2000937000 CR3: 00000003db940002 CR4: 0000000000020ee0
> Call Trace:
> ext4_fc_commit+0x874/0x900 [ext4]
> ? file_check_and_advance_wb_err+0x2e/0xc0
> ext4_sync_file+0xd4/0x350 [ext4]
> __x64_sys_fsync+0x34/0x60
> do_syscall_64+0x33/0x40
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f20008d1ef7
> ------
>
> Bisect result shows the first bad commit:
> [aa75f4d3daaeb1389b9cce9d6b84401eaf228d4e] ext4: main fast-commit commit path
>
> It's hit by running xfstests generic/605 on pmem ramdisk, fsdax enabled.
> I've narrowed down to a simplified reproducer, which requires a pmem ramdisk
> been setup first, eg /dev/pmem10. And /root/xfstests-dev is built.
>
> --------------------- Reproducer start -----------
> #! /bin/bash
> SCRATCH_DEV=/dev/pmem10
> SCRATCH_MNT=/daxmnt
> SRC_FILE=$SCRATCH_MNT/src
> DST_FILE=$SCRATCH_MNT/dst
> XFS_IO_PROG=/usr/sbin/xfs_io
> tsize=$((128 * 1024 * 1024))
> wipefs -af $SCRATCH_DEV > out.full 2>&1
> mkfs.ext4 -b 4096 $SCRATCH_DEV >> out.full 2>&1 || exit
> mkdir -p $SCRATCH_MNT
> mount -t ext4 -o dax=inode,context=system_u:object_r:nfs_t:s0 $SCRATCH_DEV $SCRATCH_MNT || exit
> prep_files()
> {
> rm -f $SRC_FILE $DST_FILE
> $XFS_IO_PROG -f -c "falloc 0 $tsize" $SRC_FILE $DST_FILE >> out.full 2>&1 || exit
> }
> prep_files
> # with O_DIRECT first
> /root/xfstests-dev/src/t_mmap_dio $SRC_FILE $DST_FILE 1024 "dio both dax" && echo pass dio
> prep_files
> # again with buffered IO
> /root/xfstests-dev/src/t_mmap_dio -b $SRC_FILE $DST_FILE 1024 "buffered both dax" && echo pass bio
> umount /daxmnt
> --------------------- Reproducer end -----------
>
> The "context=xx" and "dax=inode" mount option are *necessary* to reproduce.
>
> NO panic reproduced if mount with "-o dax" or "dax=always" or "dax=never".
>
> The context value seems irrelevant. Other value can also trigger this panic.
> The inode size value seems irrelevant. Tested with 128 and 256, both panic.
>
> Some gdb info and fs info paste below for you ref.
>
> Thanks,
> Murphy
>
> ------------ gdb info -----------
> (gdb) l *(ext4_fc_write_inode_data+0x1d0/0x1e0)
> 0x64fd0 is in ext4_fc_write_inode_data (fs/ext4/fast_commit.c:787).
> 782 /*
> 783 * Writes updated data ranges for the inode in question. Updates CRC.
> 784 * Returns 0 on success, error otherwise.
> 785 */
> 786 static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
> 787 {
> 788 ext4_lblk_t old_blk_size, cur_lblk_off, new_blk_size;
> 789 struct ext4_inode_info *ei = EXT4_I(inode);
> 790 struct ext4_map_blocks map;
> 791 struct ext4_fc_add_range fc_ext;
> (gdb) l *(ext4_fc_write_inode_data+0x1d0)
> 0x651a0 is in ext4_fc_write_inode_data (fs/ext4/ext4_extents.h:199).
> 194 }
> 195
> 196 static inline void ext4_ext_mark_unwritten(struct ext4_extent *ext)
> 197 {
> 198 /* We can not have an unwritten extent of zero length! */
> 199 BUG_ON((le16_to_cpu(ext->ee_len) & ~EXT_INIT_MAX_LEN) == 0);
> 200 ext->ee_len |= cpu_to_le16(EXT_INIT_MAX_LEN);
> 201 }
> 202
> 203 static inline int ext4_ext_is_unwritten(struct ext4_extent *ext)
> (gdb) l *(ext4_fc_commit+0x874/0x900)
> 0x67460 is in ext4_fc_commit (fs/ext4/fast_commit.c:1072).
> 1067 * commit_tid if needed. If it's not possible to perform a fast commit
> 1068 * due to various reasons, we fall back to full commit. Returns 0
> 1069 * on success, error otherwise.
> 1070 */
> 1071 int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
> 1072 {
> 1073 struct super_block *sb = (struct super_block *)(journal->j_private);
> 1074 struct ext4_sb_info *sbi = EXT4_SB(sb);
> 1075 int nblks = 0, ret, bsize = journal->j_blocksize;
> 1076 int subtid = atomic_read(&sbi->s_fc_subtid);
> (gdb) l *(ext4_fc_commit+0x874)
> 0x67cd4 is in ext4_fc_commit (fs/ext4/fast_commit.c:972).
> 967 */
> 968 ret = ext4_fc_write_inode(inode, crc);
> 969 if (ret)
> 970 goto lock_and_exit;
> 971
> 972 ret = ext4_fc_write_inode_data(inode, crc);
> 973 if (ret)
> 974 goto lock_and_exit;
> 975
> 976 if (!ext4_fc_add_dentry_tlv(
> (gdb)
> -------------gdb info end ------------
>
> --------------- tune2fs -l outout --------
> tune2fs 1.45.5 (07-Jan-2020)
> Filesystem volume name: <none>
> Last mounted on: <not available>
> Filesystem UUID: 281dc870-a1ac-4f0e-85ec-e5cecd3d0f88
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
> Filesystem flags: signed_directory_hash
> Default mount options: user_xattr acl
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 655360
> Block count: 2621440
> Reserved block count: 131072
> Free blocks: 2554687
> Free inodes: 655349
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Group descriptor size: 64
> Reserved GDT blocks: 1024
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 8192
> Inode blocks per group: 512
> Flex block group size: 16
> Filesystem created: Wed Nov 11 10:20:53 2020
> Last mount time: Wed Nov 11 10:20:53 2020
> Last write time: Wed Nov 11 10:20:53 2020
> Mount count: 1
> Maximum mount count: -1
> Last checked: Wed Nov 11 10:20:53 2020
> Check interval: 0 (<none>)
> Lifetime writes: 4137 kB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 32
> Desired extra isize: 32
> Journal inode: 8
> Default directory hash: half_md4
> Directory Hash Seed: b7a0f1fd-a3fa-4669-ab23-3d63df2b16ef
> Journal backup: inode blocks
> Checksum type: crc32c
> Checksum: 0x555540b3
> --------------- tune2fs -l output end --------
>
> --------------- mount info ----------
> /dev/pmem10 on /daxmnt type ext4 (rw,relatime,context=system_u:object_r:nfs_t:s0,dax=inode,fast_commit)
> --------------- mount info end ----------
> --
> Murphy
prev parent reply other threads:[~2020-11-11 18:08 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-11 3:10 [fsdax] kernel BUG at fs/ext4/ext4_extents.h:199! Murphy Zhou
2020-11-11 18:08 ` harshad shirwadkar [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAD+ocbwkPKnnaKS25eziNp6Pz0yLSwWyitzSMFWhxjFgi_37aQ@mail.gmail.com \
--to=harshadshirwadkar@gmail.com \
--cc=jencce.kernel@gmail.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).