Linux Btrfs filesystem development
From: Hugo Mills <hugo@carfax.org.uk>
To: Marc MERLIN <marc@merlins.org>
Cc: Liu Bo <bo.li.liu@oracle.com>, Chris Mason <clm@fb.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: 4.11.0: kernel BUG at fs/btrfs/ctree.h:1779!
Date: Sat, 20 May 2017 01:48:07 +0000
Message-ID: <20170520014807.GT9701@carfax.org.uk>
In-Reply-To: <20170520012522.GC29894@merlins.org>

On Fri, May 19, 2017 at 06:25:22PM -0700, Marc MERLIN wrote:
> On Sat, May 20, 2017 at 12:57:09AM +0000, Hugo Mills wrote:
> >    I think from the POV of removing these BUG_ONs, it doesn't matter
> > which FS causes them. "All" you need to know is where the error
> > happened. From there, you can (in theory) work out what was wrong and
> > handle it more elegantly than simply stopping.
>  
> Sorry, "you" being the code author, or the user?

   Author.

> If code author, I'd rather this be worked out without the extra steps of
> having to guess or spend more time to see which FS.

   As I understand it, it doesn't really matter which FS it comes
from. The question is: The kernel has hit this BUG_ON. What do you
actually want to do when this happens? You don't want to bring the
kernel to a grinding halt (BUG_ON), so how do you handle this more
elegantly?

   The state of the specific FS that caused this particular problem
doesn't actually matter. The fact is, someone decided to check
on the FS's state, and punted the problem of handling the check's
failure to someone later (the BUG_ON). You(*)'ve got to pick up that punt
and deal with it more cleanly.

(*) You == some kernel developer.

> My FS takes up to a day to scrub and btrfs check, clearly making me do this
> over 3 of them is not a good use of time and a loss of up to 3 days of wall
> clock time.
> Not counting that during that time, I have loss of service on all my
> filesystems because I don't want to mount them read-write.
> 
> >    Obviously it would be nice, from the POV of the sysadmin, to know
> > which FS was complaining, but as an FS developer it's secondary to
> > identifying a BUG_ON which happens in real life, which offers an
> > opportunity to make the error path more elegant.
> 
> If the FS is remounted R/O, further damage is averted and it's obvious to
> the admin which FS has a problem.
> 
> Is there a reason why all errors that are serious enough, do not cause the
> FS to remount R/O instead of having any BUG/BUG_ON at all?

   Simply that it's easier to write a BUG_ON than to write the code to
bubble up a failure to the point that the FS can be made RO. This is a
clean-up kind of process: BUG_ONs should mostly be changed into a
proper error-handling path leading to remount-RO (in the worst
cases). As I understand it, it's not massively difficult, but it's
probably non-trivial effort to get right in each case.
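
   As a rough illustration of that clean-up, here is a userspace
sketch, not btrfs code. Every name in it (toy_fs, toy_check_item,
toy_read_item, toy_fs_operation, the 4096 limit) is hypothetical and
chosen only for the example. The low-level check returns an error
instead of halting, each caller passes the error up, and the top
level, which knows which FS it is working on, flips that FS to
read-only and names it in the log.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-filesystem state; not a btrfs structure. */
struct toy_fs {
	const char *name;
	bool read_only;
};

/*
 * Low-level check. The BUG_ON-style version would halt here; this
 * version reports the failure to its caller instead.
 */
static int toy_check_item(unsigned int item_size, unsigned int max_size)
{
	if (item_size > max_size)
		return -EIO;
	return 0;
}

/* Middle layer: has no idea how to recover, so it bubbles the error up. */
static int toy_read_item(unsigned int item_size)
{
	int ret = toy_check_item(item_size, 4096); /* 4096 is an arbitrary limit */

	if (ret)
		return ret;
	/* ... normal processing would go here ... */
	return 0;
}

/*
 * Top level: knows which FS is involved, so it can name it in the
 * message and force it read-only to avoid further damage.
 */
static int toy_fs_operation(struct toy_fs *fs, unsigned int item_size)
{
	int ret = toy_read_item(item_size);

	if (ret) {
		fprintf(stderr, "%s: error %d, forcing read-only\n",
			fs->name, ret);
		fs->read_only = true;
	}
	return ret;
}
```

   The non-trivial part in real code is the middle layer: every
function between the check and the point where the FS can be made RO
has to grow a return value and release whatever it holds on the way
out.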

> WARN_ON is also fine obviously if the error is not serious enough, or doing
> a WARN_ON + a remount R/O

   Sure, but everything should really be turned into either a proper
error-handling path (most likely remount RO), or explicitly defined as
BUG_ON (i.e. "this must never happen -- if it does, then the hardware
is fucked up, and we're not responsible for the consequences"). It's
that latter definition that's part of the hard decision-making process
for the kernel dev.

   Hugo.

-- 
Hugo Mills             | Great oxymorons of the world, no. 7:
hugo@... carfax.org.uk | The Simple Truth
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

