All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Theodore Ts'o <tytso@mit.edu>, Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Subject: Re: BUG: 6.10: ext4 mpage_process_page_bufs() BUG_ON triggers
Date: Wed, 4 Sep 2024 20:30:43 +0100	[thread overview]
Message-ID: <Zti1Y5fthhgiL5Xb@shell.armlinux.org.uk> (raw)
In-Reply-To: <ZtirReiX7J+MDhuh@shell.armlinux.org.uk>

On Wed, Sep 04, 2024 at 07:47:33PM +0100, Russell King (Oracle) wrote:
> With a 6.10 based kernel, no changes to filesystem/MM code, I'm
> seeing a reliable BUG_ON() within minutes of booting on one of my
> VMs. I don't have a complete oops dump, but this is what I do
> have, cobbled together from what was logged by journald, and
> what syslogd was able to splat on the terminals before the VM
> died.
> 
> Sep 04 15:51:46 lists kernel: kernel BUG at fs/ext4/inode.c:1967!
> 
> [ 1346.494848] Call trace:
> [ 1346.495409] [<c04b4f90>] (mpage_process_page_bufs) from [<c04b938c>] (mpage_prepare_extent_to_map+0x410/0x51c)
> [ 1346.499202] [<c04b938c>] (mpage_prepare_extent_to_map) from [<c04bbc40>] (ext4_do_writepages+0x320/0xb94)
> [ 1346.502113] [<c04bbc40>] (ext4_do_writepages) from [<c04bc5dc>] (ext4_writepages+0xc0/0x1b4)
> [ 1346.504662] [<c04bc5dc>] (ext4_writepages) from [<c0361154>] (do_writepages+0x68/0x220)
> [ 1346.506974] [<c0361154>] (do_writepages) from [<c0354868>] (filemap_fdatawrite_wbc+0x64/0x84)
> [ 1346.509165] [<c0354868>] (filemap_fdatawrite_wbc) from [<c035706c>] (__filemap_fdatawrite_range+0x50/0x58)
> [ 1346.511414] [<c035706c>] (__filemap_fdatawrite_range) from [<c035709c>] (filemap_flush+0x28/0x30)
> [ 1346.513518] [<c035709c>] (filemap_flush) from [<c04a8834>] (ext4_release_file+0x70/0xac)
> [ 1346.515312] [<c04a8834>] (ext4_release_file) from [<c03f8088>] (__fput+0xd4/0x2cc)
> [ 1346.517219] [<c03f8088>] (__fput) from [<c03f3e64>] (sys_close+0x28/0x5c)
> [ 1346.518720] [<c03f3e64>] (sys_close) from [<c0200060>] (ret_fast_syscall+0x0/0x5c)
> 
> From a quick look, I don't see any patches that touch fs/ext4/inode.c
> that might address this.
> 
> I'm not able to do any debugging, and from Friday, I suspect I won't
> even be able to use a computer (due to operations on my eyes.)

After rebooting the VM, the next oops was:

Sep 04 19:33:41 lists kernel: Unable to handle kernel paging request at virtual address 5ed304f3 when read
Sep 04 19:33:42 lists kernel: [5ed304f3] *pgd=80000040005003, *pmd=00000000
Sep 04 19:33:42 lists kernel: Internal error: Oops: 206 [#1] PREEMPT SMP ARM

 kernel:[  205.583038] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
 kernel:[  205.630530] Process kworker/u4:2 (pid: 33, stack limit = 0xc68f8000)
...
 kernel:[  205.661017] Call trace:
 kernel:[  205.661997] [<c04d9060>] (ext4_finish_bio) from [<c04d931c>] (ext4_release_io_end+0x48/0xfc)
 kernel:[  205.664523] [<c04d931c>] (ext4_release_io_end) from [<c04d94d8>] (ext4_end_io_rsv_work+0x88/0x188)
 kernel:[  205.666628] [<c04d94d8>] (ext4_end_io_rsv_work) from [<c023f310>] (process_one_work+0x178/0x30c)
 kernel:[  205.669924] [<c023f310>] (process_one_work) from [<c023fe48>] (worker_thread+0x25c/0x438)
 kernel:[  205.671679] [<c023fe48>] (worker_thread) from [<c02480b0>] (kthread+0xfc/0x12c)
 kernel:[  205.673607] [<c02480b0>] (kthread) from [<c020015c>] (ret_from_fork+0x14/0x38)
 kernel:[  205.682719] Code: e1540005 0a00000d e5941008 e594201c (e5913000)

This corresponds with:

c04d9050:       e1540005        cmp     r4, r5
c04d9054:       0a00000d        beq     c04d9090 <ext4_finish_bio+0x208>
c04d9058:       e5941008        ldr     r1, [r4, #8]
c04d905c:       e594201c        ldr     r2, [r4, #28]
c04d9060:       e5913000        ldr     r3, [r1] ;<<<==== faulting instruction

This code is:

                /*
                 * We check all buffers in the folio under b_uptodate_lock
                 * to avoid races with other end io clearing async_write flags
                 */
                spin_lock_irqsave(&head->b_uptodate_lock, flags);
                do {
                        if (bh_offset(bh) < bio_start ||
                            bh_offset(bh) + bh->b_size > bio_end) {

where r4 is "bh", r4+8 is the location of the bh->b_page pointer.

static inline unsigned long bh_offset(const struct buffer_head *bh)
{
        return (unsigned long)(bh)->b_data & (page_size(bh->b_page) - 1);
}

static inline unsigned long compound_nr(struct page *page)
{
        struct folio *folio = (struct folio *)page;

        if (!test_bit(PG_head, &folio->flags))
                return 1;

where PG_head is bit 6. Thus, bh->b_page was corrupt.

I've booted back into 6.7 on the offending VM, and it's stable, so it
appears to be a regression between 6.7 and 6.10.

-- 
*** please note that I probably will only be occasionally responsive
*** for an unknown period of time due to recent eye surgery making
*** reading quite difficult.

RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

  reply	other threads:[~2024-09-04 19:30 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-04 18:47 BUG: 6.10: ext4 mpage_process_page_bufs() BUG_ON triggers Russell King (Oracle)
2024-09-04 19:30 ` Russell King (Oracle) [this message]
2024-09-04 19:50   ` Russell King (Oracle)
2024-09-05 13:15     ` Theodore Ts'o
2024-10-11 17:11       ` Russell King (Oracle)
2024-10-18  0:44         ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zti1Y5fthhgiL5Xb@shell.armlinux.org.uk \
    --to=linux@armlinux.org.uk \
    --cc=adilger.kernel@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.