All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Cc: adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	syzkaller-bugs@googlegroups.com,
	syzbot <syzbot+4acc7d910e617b360859@syzkaller.appspotmail.com>
Subject: Re: [syzbot] [ext4?] BUG: sleeping function called from invalid context in ext4_update_super
Date: Sun, 11 Jun 2023 09:18:29 -0400	[thread overview]
Message-ID: <20230611131829.GA1584772@mit.edu> (raw)
In-Reply-To: <2511036.4XsnlVU6TS@suse> <2113211.OBFZWjSADL@suse>

On Sun, Jun 11, 2023 at 09:05:31AM +0200, Fabio M. De Francesco wrote:
> 
> Are you okay with me submitting a patch with your "Suggested by:" tag? Or 
> maybe you prefer to take care of it yourself? For now I await your kind reply.

Sure, feel free to create such a patch.

I would strongly recommend that you use gce-xfstests or kvm-xfstests
before submitting ext4 patches.  In this particular case, it's a
relatively simple patch, but it's a good habit to get into.  See [1]
for more details.

[1] https://thunk.org/gce-xfstests

At the bare minimum, it would be useful for you to run
"{kvm,gce}-xfstests smoke" or more preferably, "{kvm,gce}-xfstests -c
ext4/4k -g auto".  If it's a particular complex patch series, running
"gce-xfstests -c ext4/all -g auto" is nice, since it can save me a lot
of time when y ou've introduced a subtle corner case bug that doesn't
show up with lighter testing, and I have to track it down myself.  The
latter only costs about $2 --- you could do "kvm-xfstests -c ext4/all
-g auto", but it will take a day or so, and it's not much fun unless
you have dedicated test hardware.  So if you can afford the $2, I
strongly recommend using gce-xfstests for the full set of tests.  (The
smoke test using gce-xfstests costs a penny or two, last time I priced
it out.  But it's not too bad to run it using kvm-xfstests.)


> Can we "reliably" test !in_atomic() and act accordingly? I remember that the 
> in_atomic() helper cannot always detect atomic contexts.

No; we can do something like BUG_ON(in_atomic(), but it will only work
if LOCKDEP is enabled, and that's generally is not enabled on
production systems.


On Sun, Jun 11, 2023 at 11:38:07AM +0200, Fabio M. De Francesco wrote:
> In the meantime, I have had time to think of a different solution that allows 
> the workqueue the chance to run even if the file system is configured to 
> immediately panic on error (sorry, no code - I'm in a hurry)...
> 
> This can let you leave that call to ext4_error() that commit 5354b2af3406 
> ("ext4: allow ext4_get_group_info()") had introduced (if it works - please 
> keep on reading).
> 
> 1) Init a global variable ("glob") and set it to 0.
> 2) Modify the code of the error handling workqueue to set "glob" to 1, soon 
> before the task is done.
> 3) Change the fragment that panics the system to call mdelay() in a loop  (it 
> doesn't sleep - correct?) for an adequate amount of time and periodically 
> check READ_ONCE(global) == 1. If true, break and then panic, otherwise 
> reiterate the loop.

Well, it's more than a bit ugly, and it's not necessasrily guaranteed
to work.  After all we're doing this to allow ext4_error() to be
called in critical sections.  But that means that while we're doing
this mdelay loop, we're holding on to a spinlock.  While lockdep isn't
smart enough to catch this particular deadlock, it's still a real
potential problem, which means such a solution is also fragile.

It's better off simply prohibiting ext4_error() to be called while
holding a lock, and in most places this isn't all that hard.  Most of
the time, you don't want to hold spinlocks across a long period of
time, because this can become a scalability bottleneck.  This means
very often the pattern is something like this:

      spin_lock(..);
         ...
      ret = do_stuff();
      spin_unlock(..);

So it's enough to check the error return after you've unlocked the
spinlock.  And you can also just _not_ call ext4_error() inside
do_stuff(), but have do_stuff return an error code, possibly with a
call to ext4_warning() if you want to get some context for the problem
into the kernel log, and then have the top-level function call
ext4_error().

Cheers,

						- Ted

  reply	other threads:[~2023-06-11 13:18 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-10 13:52 [syzbot] [ext4?] BUG: sleeping function called from invalid context in ext4_update_super syzbot
2023-06-10 20:41 ` Fabio M. De Francesco
2023-06-10 20:49   ` Fabio M. De Francesco
2023-06-11  3:20   ` Theodore Ts'o
2023-06-11  7:05     ` Fabio M. De Francesco
2023-06-11 13:18       ` Theodore Ts'o [this message]
2023-06-11 19:15         ` Fabio M. De Francesco
2023-06-12  0:19           ` Theodore Ts'o
2023-06-14 19:38             ` Fabio M. De Francesco
2023-06-11  9:38     ` Fabio M. De Francesco

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230611131829.GA1584772@mit.edu \
    --to=tytso@mit.edu \
    --cc=adilger.kernel@dilger.ca \
    --cc=fmdefrancesco@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=syzbot+4acc7d910e617b360859@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.