public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Daniel J Blueman <daniel@quora.org>
Cc: Coly Li <colyli@suse.de>,
	linux-bcachefs@vger.kernel.org,
	Kent Overstreet <kent.overstreet@gmail.com>
Subject: Re: trans path overflow during metadata replication with lockdep
Date: Fri, 13 Oct 2023 08:44:38 -0400	[thread overview]
Message-ID: <ZSk7tq/4Ah8EE2an@bfoster> (raw)
In-Reply-To: <CAMVG2svBFVkjnEJRdC5H=uXxL6T_mYH8XdQQcBrUP7mqrsh61Q@mail.gmail.com>

On Fri, Oct 13, 2023 at 07:52:54PM +0800, Daniel J Blueman wrote:
> On Fri, 13 Oct 2023 at 19:06, Brian Foster <bfoster@redhat.com> wrote:
> > On Tue, Oct 10, 2023 at 11:20:59PM +0800, Coly Li wrote:
> > > Forward it to linux-bcachefs@vger.kenrel.org <mailto:linux-bcachefs@vger.kenrel.org>
> > >
> > > > 2023年10月10日 07:43,Daniel J Blueman <daniel@quora.org> 写道:
> > > >
> > > > Firstly, bcachefs introduces a new era of in-tree filesystems with
> > > > some monumental features (sorry, ZFS); hats off to Kent for landing
> > > > this!
> > > >
> > > > My testing finds it is in great shape; far better than BTRFS was when
> > > > it landed. Testing on linux next-20231005 with additional debug checks
> > > > atop the Ubuntu 23.04 kernel generic config [1], I was able to provoke
> > > > a btree trans path overflow cornercase [2].
> >
> > Thanks for testing and for the bug report. I'm curious what kind of
> > testing you've been doing in general and perhaps if you have any more
> > interesting results to share? ;)
> 
> Thanks for asking Brian! My validation seeks to exercise edge cases by
> triggering multiple conditions in parallel, then working backwards to
> identify a minimal and reliable reproducer.
> 
> I'll prepare a post with additional early findings in the coming days.
> 
> > > > The minimal reproducer is:
> > > > # modprobe brd rd_nr=2 rd_size=1048576
> > > > # bcachefs format --metadata_replicas=2 --label=tier1.1 /dev/ram0
> > > > --label=tier1.2 /dev/ram1
> > > > # mount -t bcachefs /dev/ram0:/dev/ram1 /mnt
> > > > # dd if=/dev/zero of=/mnt/test bs=128M
> > > >
> > > > The issue doesn't reproduce with metadata_replicas=1 or a single block device.
> >
> > My naive assumption would be that the higher replica count increases the
> > size of some of these transactions due to having to update multiple
> > devices, but this is not an area I've dug into in depth tbh. I've not
> > seen this yet myself, but I think most of the multi device testing I've
> > done so far has still been with replicas=1.
> 
> With the stock Ubuntu kernel config (no CONFIG_LOCKDEP, likely similar
> to your codebase), my testing eventually provoked a crash; I'll see if
> I can reproduce this later in case it's the same path as this report
> with CONFIG_LOCKDEP.
> 

Ah, Ok. It would be interesting to know whether LOCKDEP is more of an
overhead or timing contributing factor (as opposed to functional).

> > > > At debug entry, I couldn't determine why BTREE_ITER_MAX must be 64
> > > > rather than 32 when CONFIG_LOCKDEP is set, however the panic doesn't
> > > > occur without CONFIG_LOCKDEP, so it appears related; keeping it at
> > > > value 32 with CONFIG_LOCKDEP doesn't prevent the panic also.
> > > >
> > > > @Kent/anyone?
> >
> > Hmm, that is interesting. I'm not sure why lockdep would be a factor
> > here. Where did you come up with the notion that ITER_MAX could be 32
> > (and the idea to test it) with CONFIG_LOCKDEP=y? It looks like it's
> > hardcoded to 64, but perhaps I'm missing something.
> 
> See https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/fs/bcachefs/btree_types.h#n390
> 
> #ifndef CONFIG_LOCKDEP
> #define BTREE_ITER_MAX 64
> #else
> #define BTREE_ITER_MAX 32
> #endif
> 

Oh, I was looking at the master branch. I didn't realize this was
different in for-next. That looks like a relatively old commit as well.
Kent might have to chime in on what the deal is with that.

So my guess would be that this is a bit of a whack-a-mole between
accommodating lockdep and workloads that stress the transaction data
structure in this way. For the purposes of your testing, I think it's
probably more useful to either set this to 64 (and accept the more
graceful lockdep overflow situation) or just disable lockdep and confirm
whether things work correctly.

For the purposes of bcachefs, I think this means that either this
setting is not suitable for !BCACHEFS_DEBUG kernels (regardless of
lockdep state) or alternatively that handling of this limit condition
needs to be improved to be more graceful one way or another, because we
probably don't want to trade the lockdep overflow behavior (which IIRC
just splats and disables lockdep) for a kernel panic in most cases.

Brian

> > We'll see if Kent can make more sense of the crash signature whenever
> > he's back around. Otherwise I'll make a note to try and reproduce it
> > once I work through some other things...
> 
> It would be good to get some independent confirmation on this.
> 
> Thanks again!
>   Dan
> -- 
> Daniel J Blueman
> 


  reply	other threads:[~2023-10-13 12:45 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAMVG2sun2sqFXt=H-0cVWnATGMMFpe-0ksRWy3uhUZXbA5m1qA@mail.gmail.com>
2023-10-10 15:20 ` trans path overflow during metadata replication with lockdep Coly Li
2023-10-13 11:06   ` Brian Foster
2023-10-13 11:52     ` Daniel J Blueman
2023-10-13 12:44       ` Brian Foster [this message]
2023-10-19  1:16   ` Kent Overstreet
2023-10-20  7:20     ` Daniel J Blueman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZSk7tq/4Ah8EE2an@bfoster \
    --to=bfoster@redhat.com \
    --cc=colyli@suse.de \
    --cc=daniel@quora.org \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-bcachefs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox