From: Roman Gushchin <guro@fb.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Boyang Xue <bxue@redhat.com>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
<linux-fsdevel@vger.kernel.org>
Subject: Re: Patch 'writeback, cgroup: release dying cgwbs by switching attached inodes' leads to kernel crash
Date: Fri, 16 Jul 2021 13:03:59 -0700
Message-ID: <YPHmLwF09QCPB7tw@carbon.dhcp.thefacebook.com>
In-Reply-To: <20210716162340.GY22357@magnolia>

On Fri, Jul 16, 2021 at 09:23:40AM -0700, Darrick J. Wong wrote:
> On Thu, Jul 15, 2021 at 03:28:12PM -0700, Darrick J. Wong wrote:
> > On Thu, Jul 15, 2021 at 01:08:15PM -0700, Roman Gushchin wrote:
> > > On Thu, Jul 15, 2021 at 10:10:50AM -0700, Darrick J. Wong wrote:
> > > > On Thu, Jul 15, 2021 at 11:51:50AM +0800, Boyang Xue wrote:
> > > > > On Thu, Jul 15, 2021 at 10:36 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > >
> > > > > > On Thu, Jul 15, 2021 at 12:22:28AM +0800, Boyang Xue wrote:
> > > > > > > > It's unclear to me where to find the required address in the
> > > > > > > addr2line command line, i.e.
> > > > > > >
> > > > > > > addr2line -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux
> > > > > > > <what address here?>
> > > > > >
> > > > > > ./scripts/faddr2line /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux cleanup_offline_cgwbs_workfn+0x320/0x394
> > > > > >
> > > > >
> > > > > > Thanks! The result is the same as that of
> > > > > >
> > > > > > addr2line -i -e \
> > > > > >     /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux \
> > > > > >     FFFF8000102D6DD0
> > > > >
> > > > > But this script is very handy.
> > > > >
> > > > > > # /usr/src/kernels/5.14.0-0.rc1.15.bx.el9.aarch64/scripts/faddr2line \
> > > > > >     /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux \
> > > > > >     cleanup_offline_cgwbs_workfn+0x320/0x394
> > > > > cleanup_offline_cgwbs_workfn+0x320/0x394:
> > > > > arch_atomic64_fetch_add_unless at
> > > > > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2265
> > > > > (inlined by) arch_atomic64_add_unless at
> > > > > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2290
> > > > > (inlined by) atomic64_add_unless at
> > > > > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-instrumented.h:1149
> > > > > (inlined by) atomic_long_add_unless at
> > > > > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-long.h:491
> > > > > (inlined by) percpu_ref_tryget_many at
> > > > > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:247
> > > > > (inlined by) percpu_ref_tryget at
> > > > > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:266
> > > > > (inlined by) wb_tryget at
> > > > > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:227
> > > > > (inlined by) wb_tryget at
> > > > > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:224
> > > > > (inlined by) cleanup_offline_cgwbs_workfn at
> > > > > /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c:679
> > > > >
> > > > > # vi /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c
> > > > > ```
> > > > > > static void cleanup_offline_cgwbs_workfn(struct work_struct *work)
> > > > > > {
> > > > > > 	struct bdi_writeback *wb;
> > > > > > 	LIST_HEAD(processed);
> > > > > >
> > > > > > 	spin_lock_irq(&cgwb_lock);
> > > > > >
> > > > > > 	while (!list_empty(&offline_cgwbs)) {
> > > > > > 		wb = list_first_entry(&offline_cgwbs, struct bdi_writeback,
> > > > > > 				      offline_node);
> > > > > > 		list_move(&wb->offline_node, &processed);
> > > > > >
> > > > > > 		/*
> > > > > > 		 * If wb is dirty, cleaning up the writeback by switching
> > > > > > 		 * attached inodes will result in an effective removal of any
> > > > > > 		 * bandwidth restrictions, which isn't the goal. Instead,
> > > > > > 		 * it can be postponed until the next time, when all io
> > > > > > 		 * will be likely completed. If in the meantime some inodes
> > > > > > 		 * will get re-dirtied, they should be eventually switched to
> > > > > > 		 * a new cgwb.
> > > > > > 		 */
> > > > > > 		if (wb_has_dirty_io(wb))
> > > > > > 			continue;
> > > > > >
> > > > > > 		if (!wb_tryget(wb)) <=== line#679
> > > > > > 			continue;
> > > > > >
> > > > > > 		spin_unlock_irq(&cgwb_lock);
> > > > > > 		while (cleanup_offline_cgwb(wb))
> > > > > > 			cond_resched();
> > > > > > 		spin_lock_irq(&cgwb_lock);
> > > > > >
> > > > > > 		wb_put(wb);
> > > > > > 	}
> > > > > >
> > > > > > 	if (!list_empty(&processed))
> > > > > > 		list_splice_tail(&processed, &offline_cgwbs);
> > > > > >
> > > > > > 	spin_unlock_irq(&cgwb_lock);
> > > > > > }
> > > > > ```
> > > > >
> > > > > > BTW, this bug can only be reproduced on a non-debug production kernel
> > > > > > build (a.k.a. the kernel rpm package); it's not reproducible on a debug
> > > > > > build with various debug configuration options enabled (a.k.a. the
> > > > > > kernel-debug rpm package).
> > > >
> > > > FWIW I've also seen this regularly on x86_64 kernels on ext4 with all
> > > > default mkfs settings when running generic/256.
> > >
> > > Oh, that's useful information, thank you!
> > >
> > > Btw, would you mind testing the patch from an earlier message in the
> > > thread? I'd highly appreciate it.
> > >
> > > Thanks!
> >
> > Will do.
>
> fstests passed here, so
>
> Tested-by: Darrick J. Wong <djwong@kernel.org>

Great, thank you!
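
For anyone reading the quoted faddr2line output: the chain at backing-dev.c:679 goes wb_tryget() -> percpu_ref_tryget() -> atomic_long_add_unless(), i.e. "take a reference unless the count has already dropped to zero". Below is a minimal userspace sketch of that semantic using C11 atomics. It is only a model, not the kernel code: struct ref, add_unless(), ref_tryget() and ref_put() are hypothetical names made up for illustration, and the per-cpu fast path of percpu_ref is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a refcount; NOT the kernel's percpu_ref. */
struct ref {
	atomic_long count;
};

/*
 * Add 'a' to *v unless *v == u. Returns true if the add happened.
 * Models the semantics of the kernel's atomic_long_add_unless().
 */
static bool add_unless(atomic_long *v, long a, long u)
{
	long old = atomic_load(v);

	while (old != u) {
		/* On failure 'old' is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Take a reference only if the object is still live (count != 0). */
static bool ref_tryget(struct ref *r)
{
	return add_unless(&r->count, 1, 0);
}

/* Drop a reference; returns true when the last reference went away. */
static bool ref_put(struct ref *r)
{
	long old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);	/* dropping a reference never taken is a bug */
	return old == 1;
}
```

With a counter initialised to 1, ref_tryget() succeeds and bumps it to 2, and once two ref_put() calls bring it back to 0 any later ref_tryget() fails. Note that a tryget of this kind only helps while the memory holding the counter is itself still valid; it cannot protect against dereferencing an object that has already been freed.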