All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
	kexec@lists.infradead.org, David Hildenbrand <david@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	David Laight <David.Laight@ACULAB.COM>,
	Jonathan Corbet <corbet@lwn.net>,
	Andy Whitcroft <apw@canonical.com>, Joe Perches <joe@perches.com>,
	Dwaipayan Ray <dwaipayanray1@gmail.com>,
	Lukas Bulwahn <lukas.bulwahn@gmail.com>,
	Baoquan He <bhe@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
	Dave Young <dyoung@redhat.com>
Subject: [PATCH RFC 1/2] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
Date: Wed, 24 Aug 2022 18:30:59 +0200	[thread overview]
Message-ID: <20220824163100.224449-2-david@redhat.com> (raw)
In-Reply-To: <20220824163100.224449-1-david@redhat.com>

Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
is just as bad as BUG_ON(), because it will crash the kernel on
distributions that enable CONFIG_DEBUG_VM (like Fedora):

    VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
    no different, the only difference is "we can make the code smaller
    because these are less important". [2]

This resulted in a more generic discussion about usage of BUG() and
friends. While there might be corner cases that still deserve a BUG_ON(),
most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
recovery path if reasonable:

    The only possible case where BUG_ON can validly be used is "I have
    some fundamental data corruption and cannot possibly return an
    error". [2]

As a very good approximation is the general rule:

    "absolutely no new BUG_ON() calls _ever_" [2]

... not even if something really shouldn't ever happen and is merely for
documenting that an invariant always has to hold.

There is only one good BUG_ON():

    Now, that said, there is one very valid sub-form of BUG_ON():
    BUILD_BUG_ON() is absolutely 100% fine. [2]

While WARN will also crash the machine with panic_on_warn set, that's
exactly to be expected:

    So we have two very different cases: the "virtual machine with good
    logging where a dead machine is fine" - use 'panic_on_warn'. And
    the actual real hardware with real drivers, running real loads by
    users. [3]

The basic idea is that warnings will similarly get reported by users
and be found during testing. However, in contrast to a BUG(), there is a
way to actually influence the expected behavior (e.g., panic_on_warn)
and to eventually keep the machine alive to extract some debug info.

Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
expect this code to trigger in any case, recovery code is not really
helpful.

    I'd prefer to keep all these warnings 'simple' - i.e. no attempted
    recovery & control flow, unless we ever expect these to trigger.
    [4]

There have been different rules floating around that were never properly
documented. Let's try to clarify.

[1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
[2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
[3] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 Documentation/process/coding-style.rst | 27 ++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index 03eb53fd029a..a6d81ff578fe 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -1186,6 +1186,33 @@ expression used.  For instance:
 	#endif /* CONFIG_SOMETHING */
 
 
+22) Do not crash the kernel
+---------------------------
+
+Do not add new code that uses BUG(), BUG_ON(), VM_BUG_ON(), ... to crash
+the kernel if certain conditions are not met. Instead, use WARN_ON_ONCE()
+with recovery code (if reasonable) instead. Unavoidable data corruption /
+security issues might be a very rare exception to this rule and need good
+justification.
+
+There is no need for WARN_ON_ONCE() recovery code if we never expect it to
+possibly trigger unless something goes seriously wrong or some other code
+is changed to break invariants. Note that VM_WARN_ON_ONCE() cannot be used
+in conditionals.
+
+Usage of WARN() and friends is fine for something that is not expected to
+be triggered easily. ``panic_on_warn`` users are not an excuse to not use
+WARN(): whoever enables ``panic_on_warn`` explicitly asked the kernel to
+crash in this case.
+
+However, WARN() and friends should not be used for something that is
+expected to trigger easily, for example, by user space. pr_warn_once()
+might be a reasonable replacement to notify the user.
+
+Note that BUILD_BUG_ON() is perfectly fine because it will make compilation
+fail instead of crashing the kernel.
+
+
 Appendix I) References
 ----------------------
 
-- 
2.37.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

WARNING: multiple messages have this Message-ID (diff)
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
	kexec@lists.infradead.org, David Hildenbrand <david@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	David Laight <David.Laight@ACULAB.COM>,
	Jonathan Corbet <corbet@lwn.net>,
	Andy Whitcroft <apw@canonical.com>, Joe Perches <joe@perches.com>,
	Dwaipayan Ray <dwaipayanray1@gmail.com>,
	Lukas Bulwahn <lukas.bulwahn@gmail.com>,
	Baoquan He <bhe@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
	Dave Young <dyoung@redhat.com>
Subject: [PATCH RFC 1/2] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
Date: Wed, 24 Aug 2022 18:30:59 +0200	[thread overview]
Message-ID: <20220824163100.224449-2-david@redhat.com> (raw)
In-Reply-To: <20220824163100.224449-1-david@redhat.com>

Linus notes [1] that the introduction of new code that uses VM_BUG_ON()
is just as bad as BUG_ON(), because it will crash the kernel on
distributions that enable CONFIG_DEBUG_VM (like Fedora):

    VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally
    no different, the only difference is "we can make the code smaller
    because these are less important". [2]

This resulted in a more generic discussion about usage of BUG() and
friends. While there might be corner cases that still deserve a BUG_ON(),
most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a
recovery path if reasonable:

    The only possible case where BUG_ON can validly be used is "I have
    some fundamental data corruption and cannot possibly return an
    error". [2]

As a very good approximation is the general rule:

    "absolutely no new BUG_ON() calls _ever_" [2]

... not even if something really shouldn't ever happen and is merely for
documenting that an invariant always has to hold.

There is only one good BUG_ON():

    Now, that said, there is one very valid sub-form of BUG_ON():
    BUILD_BUG_ON() is absolutely 100% fine. [2]

While WARN will also crash the machine with panic_on_warn set, that's
exactly to be expected:

    So we have two very different cases: the "virtual machine with good
    logging where a dead machine is fine" - use 'panic_on_warn'. And
    the actual real hardware with real drivers, running real loads by
    users. [3]

The basic idea is that warnings will similarly get reported by users
and be found during testing. However, in contrast to a BUG(), there is a
way to actually influence the expected behavior (e.g., panic_on_warn)
and to eventually keep the machine alive to extract some debug info.

Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever
expect this code to trigger in any case, recovery code is not really
helpful.

    I'd prefer to keep all these warnings 'simple' - i.e. no attempted
    recovery & control flow, unless we ever expect these to trigger.
    [4]

There have been different rules floating around that were never properly
documented. Let's try to clarify.

[1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com
[2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@mail.gmail.com
[3] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@mail.gmail.com

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 Documentation/process/coding-style.rst | 27 ++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index 03eb53fd029a..a6d81ff578fe 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -1186,6 +1186,33 @@ expression used.  For instance:
 	#endif /* CONFIG_SOMETHING */
 
 
+22) Do not crash the kernel
+---------------------------
+
+Do not add new code that uses BUG(), BUG_ON(), VM_BUG_ON(), ... to crash
+the kernel if certain conditions are not met. Instead, use WARN_ON_ONCE()
+with recovery code (if reasonable) instead. Unavoidable data corruption /
+security issues might be a very rare exception to this rule and need good
+justification.
+
+There is no need for WARN_ON_ONCE() recovery code if we never expect it to
+possibly trigger unless something goes seriously wrong or some other code
+is changed to break invariants. Note that VM_WARN_ON_ONCE() cannot be used
+in conditionals.
+
+Usage of WARN() and friends is fine for something that is not expected to
+be triggered easily. ``panic_on_warn`` users are not an excuse to not use
+WARN(): whoever enables ``panic_on_warn`` explicitly asked the kernel to
+crash in this case.
+
+However, WARN() and friends should not be used for something that is
+expected to trigger easily, for example, by user space. pr_warn_once()
+might be a reasonable replacement to notify the user.
+
+Note that BUILD_BUG_ON() is perfectly fine because it will make compilation
+fail instead of crashing the kernel.
+
+
 Appendix I) References
 ----------------------
 
-- 
2.37.1


  reply	other threads:[~2022-08-24 16:31 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-24 16:30 [PATCH RFC 0/2] coding-style.rst: document BUG() and WARN() rules David Hildenbrand
2022-08-24 16:30 ` David Hildenbrand
2022-08-24 16:30 ` David Hildenbrand [this message]
2022-08-24 16:30   ` [PATCH RFC 1/2] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel") David Hildenbrand
2022-08-24 21:59   ` John Hubbard
2022-08-24 21:59     ` John Hubbard
2022-08-25 12:12     ` David Hildenbrand
2022-08-25 12:12       ` David Hildenbrand
2022-08-26  1:43       ` Dave Young
2022-08-26  1:43         ` Dave Young
2022-08-26 17:02         ` David Hildenbrand
2022-08-26 17:02           ` David Hildenbrand
2022-08-29  1:55           ` Dave Young
2022-08-29  1:55             ` Dave Young
2022-08-29  3:07             ` Linus Torvalds
2022-08-29  3:07               ` Linus Torvalds
2022-08-29  4:49               ` John Hubbard
2022-08-29  4:49                 ` John Hubbard
2022-08-29 17:19                 ` Linus Torvalds
2022-08-29 17:19                   ` Linus Torvalds
2022-08-29  8:44               ` David Hildenbrand
2022-08-29  8:44                 ` David Hildenbrand
2022-08-29  9:25               ` Jani Nikula
2022-08-29  9:25                 ` Jani Nikula
2022-08-24 16:31 ` [PATCH RFC 2/2] checkpatch: warn on usage of VM_BUG_ON() and friends David Hildenbrand
2022-08-24 16:31   ` David Hildenbrand
2022-08-24 16:52   ` Joe Perches
2022-08-24 16:52     ` Joe Perches
2022-08-24 19:00     ` David Hildenbrand
2022-08-24 19:00       ` David Hildenbrand
2022-08-25  9:58     ` David Hildenbrand
2022-08-25  9:58       ` David Hildenbrand
2022-08-25 11:43       ` Jani Nikula
2022-08-25 11:43         ` Jani Nikula
2022-08-25 11:51         ` David Hildenbrand
2022-08-25 11:51           ` David Hildenbrand
2022-08-25  2:30 ` [PATCH RFC 0/2] coding-style.rst: document BUG() and WARN() rules John Hubbard
2022-08-25  2:30   ` John Hubbard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220824163100.224449-2-david@redhat.com \
    --to=david@redhat.com \
    --cc=David.Laight@ACULAB.COM \
    --cc=akpm@linux-foundation.org \
    --cc=apw@canonical.com \
    --cc=bhe@redhat.com \
    --cc=corbet@lwn.net \
    --cc=dwaipayanray1@gmail.com \
    --cc=dyoung@redhat.com \
    --cc=joe@perches.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lukas.bulwahn@gmail.com \
    --cc=mingo@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.