All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jeff Layton <jlayton@kernel.org>,
	linux-kernel@vger.kernel.org,
	"J. Bruce Fields" <bfields@fieldses.org>,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>,
	linux-kernel-mentees@lists.linuxfoundation.org,
	syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com
Subject: Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
Date: Wed, 7 Jul 2021 10:58:01 -0700	[thread overview]
Message-ID: <YOXrKXJi3GaAxizw@gmail.com> (raw)
In-Reply-To: <YOXVb5z3W+T9El+g@casper.infradead.org>

On Wed, Jul 07, 2021 at 05:25:19PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 07, 2021 at 12:18:45PM -0400, Jeff Layton wrote:
> > On Wed, 2021-07-07 at 17:46 +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 11:34:17AM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jul 07, 2021 at 05:31:06PM +0200, Greg KH wrote:
> > > > > On Wed, Jul 07, 2021 at 11:19:36AM -0400, J. Bruce Fields wrote:
> > > > > > On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> > > > > > > On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > > > > > > > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > > > > > > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > > > > > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > > > > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > > > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > > > > > > > 
> > > > > > > > > > > > If this triggers, you just rebooted the box :(
> > > > > > > > > > > > 
> > > > > > > > > > > > Please never do this, either properly handle the problem and return an
> > > > > > > > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > > > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > > > > > > > 
> > > > > > > > > > > > thanks,
> > > > > > > > > > > > 
> > > > > > > > > > > > greg k-h
> > > > > > > > > > > 
> > > > > > > > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > > > > > > > WARN_ON in that case crash the box?
> > > > > > > > > > 
> > > > > > > > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > > > > > > > days.
> > > > > > > > > 
> > > > > > > > > Ok, that makes some sense.
> > > > > > > > 
> > > > > > > > Wait, I don't get it.
> > > > > > > > 
> > > > > > > > How are we supposed to decide when to use WARN, when to use BUG, and
> > > > > > > > when to panic?  Do we really want to treat them all as equivalent?  And
> > > > > > > > who exactly is turning on panic-on-warn?
> > > > > > > 
> > > > > > > You never use WARN or BUG, unless the system is so messed up that you
> > > > > > > can not possibly recover from the issue.
> > > > > > 
> > > > > > I've heard similar advice for BUG before, but this is the first I've
> > > > > > heard it for WARN.  Do we have any guidelines for how to choose between
> > > > > > WARN and BUG?
> > > > > 
> > > > > Never use either :)
> > > > 
> > > > I can't tell if you're kidding.
> > > 
> > > I am not.
> > > 
> > > > Is there some plan to remove them?
> > > 
> > > Over time, yes.  And any WARN that userspace can ever hit should be
> > > removed today.
> > > 
> > > > There are definitely cases where I've been able to resolve a problem
> > > > more quickly because I got a backtrace from a WARN.
> > > 
> > > If you want a backtrace, ask for that, recover from the error, and move
> > > on.  Do not allow userspace to reboot a machine for no good reason as
> > > again, panic-on-warn is a common setting that people use now.
> > > 
> > > This is what all of the syzbot work has been doing, it triggers things
> > > that cause WARN() to be hit and so we have to fix them.
> > > 
> > 
> > This seems really draconian. Clearly we do want to fix things that show
> > a WARN (otherwise we wouldn't bother warning about it), but I don't
> > think that's a reason to completely avoid them. My understanding has
> > always been:
> > 
> > BUG: for when you reach some condition where the kernel (probably) can't
> > carry on
> > 
> > WARN: for when you reach some condition that is problematic but where
> > the machine can probably soldier on. 
> > 
> > Over the last several years, I've changed a lot of BUGs into WARNs to
> > avoid crashing the box unnecessarily. If someone is setting
> > panic_on_warn, then aren't they just getting what they asked for?
> > 
> > While I don't feel that strongly about this particular WARN in this
> > patch, it seems like a reasonable thing to do. If someone calls these
> > functions with IRQs disabled, then they might end up with some subtle
> > problems that could be hard to detect otherwise.
> 
> Don't we already have a debugging option that would catch this?
> 
> config DEBUG_IRQFLAGS
>         bool "Debug IRQ flag manipulation"
>         help
>           Enables checks for potentially unsafe enabling or disabling of
>           interrupts, such as calling raw_local_irq_restore() when interrupts
>           are enabled.
> 
> so I think this particular warn is unnecessary.
> 
> But I also disagree with Greg.  Normal users aren't setting panic-on-warn.
> Various build bots are setting panic-on-warn -- and they should -- because
> we shouldn't be able to trigger these kinds of warnings from userspace.
> Those are bugs that should be fixed.  But there's no reason to shy away
> from using a WARN when it's the right thing to do.

Yes, WARN is the right choice for signaling a kernel bug that is "recoverable",
e.g. by returning an error or simply ignoring it.  WARN is the wrong choice only
when the condition is user-triggerable, e.g. via invalid inputs passed to system
calls.  I don't understand why Greg is advocating against all use of WARN; that
would make it harder for kernel bugs to be found and reported.  Users of
panic_on_warn (which are usually test systems) *want* the kernel to panic if an
assertion fails -- that's the whole point of it.  I'm not sure why we are still
having this discussion, as the differences between and valid uses for WARN and
BUG were documented in include/asm-generic/bug.h a long time ago...

- Eric
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

WARNING: multiple messages have this Message-ID (diff)
From: Eric Biggers <ebiggers@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jeff Layton <jlayton@kernel.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, skhan@linuxfoundation.org,
	linux-kernel-mentees@lists.linuxfoundation.org,
	syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com
Subject: Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
Date: Wed, 7 Jul 2021 10:58:01 -0700	[thread overview]
Message-ID: <YOXrKXJi3GaAxizw@gmail.com> (raw)
In-Reply-To: <YOXVb5z3W+T9El+g@casper.infradead.org>

On Wed, Jul 07, 2021 at 05:25:19PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 07, 2021 at 12:18:45PM -0400, Jeff Layton wrote:
> > On Wed, 2021-07-07 at 17:46 +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 11:34:17AM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jul 07, 2021 at 05:31:06PM +0200, Greg KH wrote:
> > > > > On Wed, Jul 07, 2021 at 11:19:36AM -0400, J. Bruce Fields wrote:
> > > > > > On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> > > > > > > On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > > > > > > > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > > > > > > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > > > > > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > > > > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > > > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > > > > > > > 
> > > > > > > > > > > > If this triggers, you just rebooted the box :(
> > > > > > > > > > > > 
> > > > > > > > > > > > Please never do this, either properly handle the problem and return an
> > > > > > > > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > > > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > > > > > > > 
> > > > > > > > > > > > thanks,
> > > > > > > > > > > > 
> > > > > > > > > > > > greg k-h
> > > > > > > > > > > 
> > > > > > > > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > > > > > > > WARN_ON in that case crash the box?
> > > > > > > > > > 
> > > > > > > > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > > > > > > > days.
> > > > > > > > > 
> > > > > > > > > Ok, that makes some sense.
> > > > > > > > 
> > > > > > > > Wait, I don't get it.
> > > > > > > > 
> > > > > > > > How are we supposed to decide when to use WARN, when to use BUG, and
> > > > > > > > when to panic?  Do we really want to treat them all as equivalent?  And
> > > > > > > > who exactly is turning on panic-on-warn?
> > > > > > > 
> > > > > > > You never use WARN or BUG, unless the system is so messed up that you
> > > > > > > can not possibly recover from the issue.
> > > > > > 
> > > > > > I've heard similar advice for BUG before, but this is the first I've
> > > > > > heard it for WARN.  Do we have any guidelines for how to choose between
> > > > > > WARN and BUG?
> > > > > 
> > > > > Never use either :)
> > > > 
> > > > I can't tell if you're kidding.
> > > 
> > > I am not.
> > > 
> > > > Is there some plan to remove them?
> > > 
> > > Over time, yes.  And any WARN that userspace can ever hit should be
> > > removed today.
> > > 
> > > > There are definitely cases where I've been able to resolve a problem
> > > > more quickly because I got a backtrace from a WARN.
> > > 
> > > If you want a backtrace, ask for that, recover from the error, and move
> > > on.  Do not allow userspace to reboot a machine for no good reason as
> > > again, panic-on-warn is a common setting that people use now.
> > > 
> > > This is what all of the syzbot work has been doing, it triggers things
> > > that cause WARN() to be hit and so we have to fix them.
> > > 
> > 
> > This seems really draconian. Clearly we do want to fix things that show
> > a WARN (otherwise we wouldn't bother warning about it), but I don't
> > think that's a reason to completely avoid them. My understanding has
> > always been:
> > 
> > BUG: for when you reach some condition where the kernel (probably) can't
> > carry on
> > 
> > WARN: for when you reach some condition that is problematic but where
> > the machine can probably soldier on. 
> > 
> > Over the last several years, I've changed a lot of BUGs into WARNs to
> > avoid crashing the box unnecessarily. If someone is setting
> > panic_on_warn, then aren't they just getting what they asked for?
> > 
> > While I don't feel that strongly about this particular WARN in this
> > patch, it seems like a reasonable thing to do. If someone calls these
> > functions with IRQs disabled, then they might end up with some subtle
> > problems that could be hard to detect otherwise.
> 
> Don't we already have a debugging option that would catch this?
> 
> config DEBUG_IRQFLAGS
>         bool "Debug IRQ flag manipulation"
>         help
>           Enables checks for potentially unsafe enabling or disabling of
>           interrupts, such as calling raw_local_irq_restore() when interrupts
>           are enabled.
> 
> so I think this particular warn is unnecessary.
> 
> But I also disagree with Greg.  Normal users aren't setting panic-on-warn.
> Various build bots are setting panic-on-warn -- and they should -- because
> we shouldn't be able to trigger these kinds of warnings from userspace.
> Those are bugs that should be fixed.  But there's no reason to shy away
> from using a WARN when it's the right thing to do.

Yes, WARN is the right choice for signaling a kernel bug that is "recoverable",
e.g. by returning an error or simply ignoring it.  WARN is the wrong choice only
when the condition is user-triggerable, e.g. via invalid inputs passed to system
calls.  I don't understand why Greg is advocating against all use of WARN; that
would make it harder for kernel bugs to be found and reported.  Users of
panic_on_warn (which are usually test systems) *want* the kernel to panic if an
assertion fails -- that's the whole point of it.  I'm not sure why we are still
having this discussion, as the differences between and valid uses for WARN and
BUG were documented in include/asm-generic/bug.h a long time ago...

- Eric

  parent reply	other threads:[~2021-07-07 17:58 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-07  2:35 [PATCH v2 0/2] fcntl: fix potential deadlocks Desmond Cheong Zhi Xi
2021-07-07  2:35 ` Desmond Cheong Zhi Xi
2021-07-07  2:35 ` [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock Desmond Cheong Zhi Xi
2021-07-07  2:35   ` Desmond Cheong Zhi Xi
2021-07-07  6:05   ` Greg KH
2021-07-07  6:05     ` Greg KH
2021-07-07  6:54     ` Desmond Cheong Zhi Xi
2021-07-07  6:54       ` Desmond Cheong Zhi Xi
2021-07-07 10:44     ` Jeff Layton
2021-07-07 10:44       ` Jeff Layton
2021-07-07 10:51       ` Greg KH
2021-07-07 10:51         ` Greg KH
2021-07-07 11:40         ` Jeff Layton
2021-07-07 11:40           ` Jeff Layton
2021-07-07 13:51           ` J. Bruce Fields
2021-07-07 13:51             ` J. Bruce Fields
2021-07-07 15:06             ` Greg KH
2021-07-07 15:06               ` Greg KH
2021-07-07 15:19               ` J. Bruce Fields
2021-07-07 15:19                 ` J. Bruce Fields
2021-07-07 15:31                 ` Greg KH
2021-07-07 15:31                   ` Greg KH
2021-07-07 15:34                   ` J. Bruce Fields
2021-07-07 15:34                     ` J. Bruce Fields
2021-07-07 15:46                     ` Greg KH
2021-07-07 15:46                       ` Greg KH
2021-07-07 16:18                       ` Jeff Layton
2021-07-07 16:18                         ` Jeff Layton
2021-07-07 16:25                         ` Matthew Wilcox
2021-07-07 16:25                           ` Matthew Wilcox
2021-07-07 17:52                           ` Jeff Layton
2021-07-07 17:52                             ` Jeff Layton
2021-07-07 17:58                           ` Eric Biggers [this message]
2021-07-07 17:58                             ` Eric Biggers
2021-07-07  2:35 ` [PATCH v2 2/2] fcntl: fix potential deadlock for &fasync_struct.fa_lock Desmond Cheong Zhi Xi
2021-07-07  2:35   ` Desmond Cheong Zhi Xi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YOXrKXJi3GaAxizw@gmail.com \
    --to=ebiggers@kernel.org \
    --cc=bfields@fieldses.org \
    --cc=desmondcheongzx@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel-mentees@lists.linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.