From: Kalle Valo <kvalo@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>,
David Laight <David.Laight@ACULAB.COM>,
Jonathan Corbet <corbet@lwn.net>,
Andy Whitcroft <apw@canonical.com>, Joe Perches <joe@perches.com>,
Dwaipayan Ray <dwaipayanray1@gmail.com>,
Lukas Bulwahn <lukas.bulwahn@gmail.com>,
Baoquan He <bhe@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
Dave Young <dyoung@redhat.com>,
Jani Nikula <jani.nikula@linux.intel.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
Date: Wed, 21 Sep 2022 07:40:00 +0300 [thread overview]
Message-ID: <87pmfp8hnj.fsf@kernel.org> (raw)
In-Reply-To: <20220920122302.99195-2-david@redhat.com> (David Hildenbrand's message of "Tue, 20 Sep 2022 14:23:00 +0200")
David Hildenbrand <david@redhat.com> writes:
> Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
> is just as bad as BUG_ON(), because it will crash the kernel on
> distributions that enable CONFIG_DEBUG_VM (like Fedora):
>
> VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
> no different, the only difference is "we can make the code smaller
> because these are less important". [2]
>
> This resulted in a more generic discussion about usage of BUG() and
> friends. While there might be corner cases that still deserve a BUG_ON(),
> most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
> recovery path if reasonable:
>
> The only possible case where BUG_ON can validly be used is "I have
> some fundamental data corruption and cannot possibly return an
> error". [2]
>
> As a very good approximation is the general rule:
>
> "absolutely no new BUG_ON() calls _ever_" [2]
>
> ... not even if something really shouldn't ever happen and is merely for
> documenting that an invariant always has to hold. However, there are sill
> exceptions where BUG_ON() may be used:
>
> If you have a "this is major internal corruption, there's no way we can
> continue", then BUG_ON() is appropriate. [3]
>
> There is only one good BUG_ON():
>
> Now, that said, there is one very valid sub-form of BUG_ON():
> BUILD_BUG_ON() is absolutely 100% fine. [2]
>
> While WARN will also crash the machine with panic_on_warn set, that's
> exactly to be expected:
>
> So we have two very different cases: the "virtual machine with good
> logging where a dead machine is fine" - use 'panic_on_warn'. And
> the actual real hardware with real drivers, running real loads by
> users. [4]
>
> The basic idea is that warnings will similarly get reported by users
> and be found during testing. However, in contrast to a BUG(), there is a
> way to actually influence the expected behavior (e.g., panic_on_warn)
> and to eventually keep the machine alive to extract some debug info.
>
> Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
> expect this code to trigger in any case, recovery code is not really
> helpful.
>
> I'd prefer to keep all these warnings 'simple' - i.e. no attempted
> recovery & control flow, unless we ever expect these to trigger.
> [5]
>
> There have been different rules floating around that were never properly
> documented. Let's try to clarify.
>
> [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
> [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
> [2] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
> [4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com
> [5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@gmail.com
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
[...]
> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
> +**************************************************
> +
> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
> +is common for a given warning condition, if it occurs at all, to occur
> +multiple times. This can fill up and wrap the kernel log, and can even slow
> +the system enough that the excessive logging turns into its own, additional
> +problem.
FWIW I have had cases where WARN() messages caused a reboot, maybe
mention that here? In my case the logging was so excessive that the
watchdog wasn't updated and in the end the device was forcefully
rebooted.
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
WARNING: multiple messages have this Message-ID (diff)
From: Kalle Valo <kvalo@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>,
Baoquan He <bhe@redhat.com>,
linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
Dave Young <dyoung@redhat.com>, Jonathan Corbet <corbet@lwn.net>,
Nicholas Piggin <npiggin@gmail.com>,
linux-kernel@vger.kernel.org,
Jani Nikula <jani.nikula@linux.intel.com>,
linux-mm@kvack.org, David Laight <David.Laight@ACULAB.COM>,
Dwaipayan Ray <dwaipayanray1@gmail.com>,
Andy Whitcroft <apw@canonical.com>, Joe Perches <joe@perches.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>, Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
Date: Wed, 21 Sep 2022 07:40:00 +0300 [thread overview]
Message-ID: <87pmfp8hnj.fsf@kernel.org> (raw)
In-Reply-To: <20220920122302.99195-2-david@redhat.com> (David Hildenbrand's message of "Tue, 20 Sep 2022 14:23:00 +0200")
David Hildenbrand <david@redhat.com> writes:
> Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
> is just as bad as BUG_ON(), because it will crash the kernel on
> distributions that enable CONFIG_DEBUG_VM (like Fedora):
>
> VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
> no different, the only difference is "we can make the code smaller
> because these are less important". [2]
>
> This resulted in a more generic discussion about usage of BUG() and
> friends. While there might be corner cases that still deserve a BUG_ON(),
> most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
> recovery path if reasonable:
>
> The only possible case where BUG_ON can validly be used is "I have
> some fundamental data corruption and cannot possibly return an
> error". [2]
>
> As a very good approximation is the general rule:
>
> "absolutely no new BUG_ON() calls _ever_" [2]
>
> ... not even if something really shouldn't ever happen and is merely for
> documenting that an invariant always has to hold. However, there are sill
> exceptions where BUG_ON() may be used:
>
> If you have a "this is major internal corruption, there's no way we can
> continue", then BUG_ON() is appropriate. [3]
>
> There is only one good BUG_ON():
>
> Now, that said, there is one very valid sub-form of BUG_ON():
> BUILD_BUG_ON() is absolutely 100% fine. [2]
>
> While WARN will also crash the machine with panic_on_warn set, that's
> exactly to be expected:
>
> So we have two very different cases: the "virtual machine with good
> logging where a dead machine is fine" - use 'panic_on_warn'. And
> the actual real hardware with real drivers, running real loads by
> users. [4]
>
> The basic idea is that warnings will similarly get reported by users
> and be found during testing. However, in contrast to a BUG(), there is a
> way to actually influence the expected behavior (e.g., panic_on_warn)
> and to eventually keep the machine alive to extract some debug info.
>
> Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
> expect this code to trigger in any case, recovery code is not really
> helpful.
>
> I'd prefer to keep all these warnings 'simple' - i.e. no attempted
> recovery & control flow, unless we ever expect these to trigger.
> [5]
>
> There have been different rules floating around that were never properly
> documented. Let's try to clarify.
>
> [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
> [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
> [2] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@mail.gmail.com
> [4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com
> [5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@gmail.com
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
[...]
> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
> +**************************************************
> +
> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
> +is common for a given warning condition, if it occurs at all, to occur
> +multiple times. This can fill up and wrap the kernel log, and can even slow
> +the system enough that the excessive logging turns into its own, additional
> +problem.
FWIW I have had cases where WARN() messages caused a reboot, maybe
mention that here? In my case the logging was so excessive that the
watchdog wasn't updated and in the end the device was forcefully
rebooted.
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
next prev parent reply other threads:[~2022-09-21 4:40 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-20 12:22 [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules David Hildenbrand
2022-09-20 12:22 ` David Hildenbrand
2022-09-20 12:23 ` [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
2022-09-20 12:23 ` David Hildenbrand
2022-09-21 4:40 ` Kalle Valo [this message]
2022-09-21 4:40 ` Kalle Valo
2022-09-22 14:12 ` David Hildenbrand
2022-09-22 14:12 ` David Hildenbrand
2022-09-26 7:44 ` Kalle Valo
2022-09-26 7:44 ` Kalle Valo
2022-10-04 12:32 ` David Hildenbrand
2022-10-04 12:32 ` David Hildenbrand
2022-09-22 13:43 ` Akira Yokosawa
2022-09-22 13:43 ` Akira Yokosawa
2022-09-22 14:41 ` David Hildenbrand
2022-09-22 14:41 ` David Hildenbrand
2022-09-23 2:26 ` John Hubbard
2022-09-23 2:26 ` John Hubbard
2022-09-23 2:37 ` John Hubbard
2022-09-23 2:37 ` John Hubbard
2022-09-23 10:55 ` David Hildenbrand
2022-09-23 10:55 ` David Hildenbrand
2022-09-20 12:23 ` [PATCH v1 2/3] powerpc/prom_init: drop PROM_BUG() David Hildenbrand
2022-09-20 12:23 ` David Hildenbrand
2022-09-21 13:02 ` Michael Ellerman
2022-09-21 13:02 ` Michael Ellerman
2022-09-21 13:03 ` David Hildenbrand
2022-09-21 13:03 ` David Hildenbrand
2022-09-20 12:23 ` [PATCH v1 3/3] checkpatch: warn on usage of VM_BUG_ON() and other BUG variants David Hildenbrand
2022-09-20 12:23 ` David Hildenbrand
2022-09-23 2:05 ` John Hubbard
2022-09-23 2:05 ` John Hubbard
2022-09-23 2:11 ` Joe Perches
2022-09-23 2:11 ` Joe Perches
2022-09-23 2:20 ` John Hubbard
2022-09-23 2:20 ` John Hubbard
2022-09-23 10:58 ` David Hildenbrand
2022-09-23 10:58 ` David Hildenbrand
2022-10-04 13:24 ` (subset) [PATCH v1 0/3] coding-style.rst: document BUG() and WARN() rules Michael Ellerman
2022-10-04 13:24 ` Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87pmfp8hnj.fsf@kernel.org \
--to=kvalo@kernel.org \
--cc=David.Laight@ACULAB.COM \
--cc=akpm@linux-foundation.org \
--cc=apw@canonical.com \
--cc=bhe@redhat.com \
--cc=christophe.leroy@csgroup.eu \
--cc=corbet@lwn.net \
--cc=david@redhat.com \
--cc=dwaipayanray1@gmail.com \
--cc=dyoung@redhat.com \
--cc=jani.nikula@linux.intel.com \
--cc=joe@perches.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lukas.bulwahn@gmail.com \
--cc=mingo@kernel.org \
--cc=mpe@ellerman.id.au \
--cc=npiggin@gmail.com \
--cc=torvalds@linux-foundation.org \
--cc=vgoyal@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.