From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: Sachin Sant <sachinp@linux.vnet.ibm.com>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org, yi.zhang@huawei.com, jack@suse.cz
Subject: Re: [powerpc][5.13.0-next-20210701] Kernel crash while running ltp(chdir01) tests
Date: Fri, 2 Jul 2021 17:38:10 +0800 [thread overview]
Message-ID: <bf1c5b38-92f1-65db-e210-a97a199718ba@linux.dev> (raw)
In-Reply-To: <26ACA75D-E13D-405B-9BFC-691B5FB64243@linux.vnet.ibm.com>
On 7/2/21 4:51 PM, Sachin Sant wrote:
> While running LTP tests (chdir01) against 5.13.0-next20210701 booted on a Power server,
> following crash is encountered.
>
> [ 3051.182992] ext2 filesystem being mounted at /var/tmp/avocado_oau90dri/ltp-W0cFB5HtCy/lKhal5/mntpoint supports timestamps until 2038 (0x7fffffff)
> [ 3051.621341] EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
> [ 3051.624645] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
> [ 3051.624682] ext3 filesystem being mounted at /var/tmp/avocado_oau90dri/ltp-W0cFB5HtCy/lKhal5/mntpoint supports timestamps until 2038 (0x7fffffff)
> [ 3051.629026] Kernel attempted to read user page (13fda70000) - exploit attempt? (uid: 0)
> [ 3051.629074] BUG: Unable to handle kernel data access on read at 0x13fda70000
> [ 3051.629103] Faulting instruction address: 0xc0000000006fa5cc
> [ 3051.629118] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 3051.629130] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [ 3051.629148] Modules linked in: vfat fat btrfs blake2b_generic xor zstd_compress raid6_pq xfs loop sctp ip6_udp_tunnel udp_tunnel libcrc32c rpadlpar_io rpaphp dm_mod bonding rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse [last unloaded: test_cpuidle_latency]
> [ 3051.629270] CPU: 10 PID: 274044 Comm: chdir01 Tainted: G W OE 5.13.0-next-20210701 #1
> [ 3051.629289] NIP: c0000000006fa5cc LR: c008000006949bc4 CTR: c0000000006fa5a0
> [ 3051.629300] REGS: c000000f74de3660 TRAP: 0300 Tainted: G W OE (5.13.0-next-20210701)
> [ 3051.629314] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24000288 XER: 20040000
> [ 3051.629342] CFAR: c008000006957564 DAR: 00000013fda70000 DSISR: 40000000 IRQMASK: 0
> [ 3051.629342] GPR00: c008000006949bc4 c000000f74de3900 c0000000029bc800 c000000f88f0ab80
> [ 3051.629342] GPR04: ffffffffffffffff 0000000000000020 0000000024000282 0000000000000000
> [ 3051.629342] GPR08: c00000110628c828 0000000000000000 00000013fda70000 c008000006957550
> [ 3051.629342] GPR12: c0000000006fa5a0 c0000013ffffbe80 0000000000000000 0000000000000000
> [ 3051.629342] GPR16: 0000000000000000 0000000000000000 00000000100555f8 0000000010050d40
> [ 3051.629342] GPR20: 0000000000000000 0000000010026188 0000000010026160 c000000f88f0ac08
> [ 3051.629342] GPR24: 0000000000000000 c000000f88f0a920 0000000000000000 0000000000000002
> [ 3051.629342] GPR28: c000000f88f0ac50 c000000f88f0a800 c000000fc5577d00 c000000f88f0ab80
> [ 3051.629468] NIP [c0000000006fa5cc] percpu_counter_add_batch+0x2c/0xf0
> [ 3051.629493] LR [c008000006949bc4] __jbd2_journal_remove_checkpoint+0x9c/0x280 [jbd2]
> [ 3051.629526] Call Trace:
> [ 3051.629532] [c000000f74de3900] [c000000f88f0a84c] 0xc000000f88f0a84c (unreliable)
> [ 3051.629547] [c000000f74de3940] [c008000006949bc4] __jbd2_journal_remove_checkpoint+0x9c/0x280 [jbd2]
> [ 3051.629577] [c000000f74de3980] [c008000006949eb4] jbd2_log_do_checkpoint+0x10c/0x630 [jbd2]
> [ 3051.629605] [c000000f74de3a40] [c0080000069547dc] jbd2_journal_destroy+0x1b4/0x4e0 [jbd2]
> [ 3051.629636] [c000000f74de3ad0] [c00800000735d72c] ext4_put_super+0xb4/0x560 [ext4]
> [ 3051.629703] [c000000f74de3b60] [c000000000484d64] generic_shutdown_super+0xc4/0x1d0
> [ 3051.629720] [c000000f74de3bd0] [c000000000484f48] kill_block_super+0x38/0x90
> [ 3051.629736] [c000000f74de3c00] [c000000000485120] deactivate_locked_super+0x80/0x100
> [ 3051.629752] [c000000f74de3c30] [c0000000004bec1c] cleanup_mnt+0x10c/0x1d0
> [ 3051.629767] [c000000f74de3c80] [c000000000188b08] task_work_run+0xf8/0x170
> [ 3051.629783] [c000000f74de3cd0] [c000000000021a24] do_notify_resume+0x434/0x480
> [ 3051.629800] [c000000f74de3d80] [c000000000032910] interrupt_exit_user_prepare_main+0x1a0/0x260
> [ 3051.629816] [c000000f74de3de0] [c000000000032d08] syscall_exit_prepare+0x68/0x150
> [ 3051.629830] [c000000f74de3e10] [c00000000000c770] system_call_common+0x100/0x258
> [ 3051.629846] --- interrupt: c00 at 0x7fffa2b92ffc
> [ 3051.629855] NIP: 00007fffa2b92ffc LR: 00007fffa2b92fcc CTR: 0000000000000000
> [ 3051.629867] REGS: c000000f74de3e80 TRAP: 0c00 Tainted: G W OE (5.13.0-next-20210701)
> [ 3051.629880] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 24000474 XER: 00000000
> [ 3051.629908] IRQMASK: 0
> [ 3051.629908] GPR00: 0000000000000034 00007fffc0242e20 00007fffa2c77100 0000000000000000
> [ 3051.629908] GPR04: 0000000000000000 0000000000000078 0000000000000000 0000000000000020
> [ 3051.629908] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 3051.629908] GPR12: 0000000000000000 00007fffa2d1a310 0000000000000000 0000000000000000
> [ 3051.629908] GPR16: 0000000000000000 0000000000000000 00000000100555f8 0000000010050d40
> [ 3051.629908] GPR20: 0000000000000000 0000000010026188 0000000010026160 00000000100288f0
> [ 3051.629908] GPR24: 00007fffa2d13320 00000000000186a0 0000000010025dd8 0000000010055688
> [ 3051.629908] GPR28: 0000000010024bb8 0000000000000001 0000000000000001 0000000000000000
> [ 3051.630022] NIP [00007fffa2b92ffc] 0x7fffa2b92ffc
> [ 3051.630032] LR [00007fffa2b92fcc] 0x7fffa2b92fcc
> [ 3051.630041] --- interrupt: c00
> [ 3051.630048] Instruction dump:
> [ 3051.630057] 60000000 3c4c022c 38422260 7c0802a6 fbe1fff8 fba1ffe8 7c7f1b78 fbc1fff0
> [ 3051.630078] f8010010 f821ffc1 e94d0030 e9230020 <7fca4aaa> 7fbe2214 7fa9fe76 7d2aea78
> [ 3051.630102] ---[ end trace 83afe3a19212c333 ]---
> [ 3051.633656]
> [ 3052.633681] Kernel panic - not syncing: Fatal exception
>
> 5.13.0-next-20210630 was good. Bisect points to following patch:
>
> commit 4ba3fcdde7e3
> jbd2,ext4: add a shrinker to release checkpointed buffers
>
> Reverting this patch allows the test to run successfully.
I guess the problem is that j_jh_shrink_count is destroyed by
ext4_put_super -> jbd2_journal_unregister_shrinker, which runs before the
path ext4_put_super -> jbd2_journal_destroy -> jbd2_log_do_checkpoint
calls percpu_counter_dec(&journal->j_jh_shrink_count).

Since jbd2_journal_unregister_shrinker is already called inside
jbd2_journal_destroy, would it make sense to drop the extra call from
ext4_put_super, like this?
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1176,7 +1176,6 @@ static void ext4_put_super(struct super_block *sb)
 	ext4_unregister_sysfs(sb);

 	if (sbi->s_journal) {
-		jbd2_journal_unregister_shrinker(sbi->s_journal);
 		aborted = is_journal_aborted(sbi->s_journal);
 		err = jbd2_journal_destroy(sbi->s_journal);
 		sbi->s_journal = NULL;
Thanks,
Guoqing
Thread overview: 35+ messages
2021-07-02 8:51 [powerpc][5.13.0-next-20210701] Kernel crash while running ltp(chdir01) tests Sachin Sant
2021-07-02 9:38 ` Guoqing Jiang [this message]
2021-07-02 13:13 ` Theodore Ts'o
2021-07-02 13:23 ` Zhang Yi
2021-07-02 13:52 ` Zhang Yi
2021-07-02 16:11 ` Theodore Ts'o
2021-07-02 22:11 ` Theodore Ts'o
2021-07-03 3:37 ` Zhang Yi
2021-07-03 3:52 ` Theodore Ts'o
2021-07-03 3:05 ` Zhang Yi
2021-07-03 3:35 ` Theodore Ts'o
2021-07-03 4:55 ` Zhang Yi
2021-07-04 14:04 ` Theodore Ts'o
2021-07-05 2:17 ` Zhang Yi
2021-07-05 14:50 ` [PATCH -v2] ext4: inline jbd2_journal_[un]register_shrinker() Theodore Ts'o
2021-07-05 18:29 ` Jon Hunter
2021-07-06 1:38 ` Zhang Yi
2021-07-05 9:58 ` [powerpc][5.13.0-next-20210701] Kernel crash while running ltp(chdir01) tests Jan Kara
2021-07-05 11:27 ` Sachin Sant