A note on spotting "bugs" [Was: ENOSPC after conversion]

From: Robert White <rwhite@pobox.com>
To: Patrik Lundquist <patrik.lundquist@gmail.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: A note on spotting "bugs" [Was: ENOSPC after conversion]
Date: Thu, 11 Dec 2014 14:00:23 -0800
Message-ID: <548A13F7.30904@pobox.com>
In-Reply-To: <CAA7pwKNhYxeQjfTyd4WQrsQ7MuapKgRfjwF3kHY+VWDnVk+cTA@mail.gmail.com>

On 12/11/2014 12:18 AM, Patrik Lundquist wrote:
> * Full balance, that ended with "98 enospc errors during balance."

Assuming that quote is an actual quote from the output of the balance...

We can strongly infer that this sort of occurrence is expected since 
there is code to keep track of it and report the total times it happened.

"Bugs" are unexpected things that cause failures and/or damage.

Expected but non-optimal things that print summaries of their 
occurrences tend to be "expected unpleasantness that has been explored 
by the programmer, causes no harm, and is not worth fixing", which is a 
different thing than a bug. It's a "No Useful Options" event.

Can't Fix and Won't Fix events lie somewhere above that on the 
programmer's scale that goes from perfect execution to absolute 
train-wreck bug.

Were I the programmer I might have written this as "98 extents skipped 
due to space constraints (ENOSPC)".

I won't be offering a patch to that effect, however, as there may be 
other kinds of expected ENOSPC events contributing to that counter, so 
re-writing the summary text could be making untrue statements.
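
To illustrate the "counted and reported" pattern, here is a minimal 
Python sketch. It is not the btrfs kernel code; relocate_all and 
fake_relocate are hypothetical names, and the rule that "anything larger 
than 4 units never fits" is an assumption made up for the demonstration. 
It only shows how an expected ENOSPC gets tallied while any other error 
is still treated as a failure.

```python
import errno


def relocate_all(extents, relocate):
    """Try to relocate every extent; count ENOSPC skips instead of aborting."""
    enospc_skips = 0
    for extent in extents:
        try:
            relocate(extent)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise  # anything else is unexpected: a real failure
            enospc_skips += 1  # expected condition: skip it and keep going
    return enospc_skips


def fake_relocate(extent_size):
    # Hypothetical stand-in: pretend anything larger than 4 units never fits.
    if extent_size > 4:
        raise OSError(errno.ENOSPC, "No space left on device")


skipped = relocate_all([1, 8, 3, 16, 2], fake_relocate)
summary = f"{skipped} enospc errors during balance"
print(summary)
```

The point of the design is that the counter exists at all: someone 
anticipated the condition and chose to skip and report rather than stop.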

I've been chasing this with you because of your statement that 
"-dusage=99 works, but not -dusage=100". But the message above tells me 
that characterizing it as "not working" is somewhat overstating things. 
It _worked_ with -dusage=100 in that it didn't abort, crash, trash data, 
or hang. It just had to skip some elements due to conditions that are 
well understood (by the implementor) and fully reported.

So let's explore what the system "could have done" instead of just 
skipping those extents...

It could have tried to break the extent into smaller pieces. But to do 
that it would have to dissect the contents of the extent and go looking 
for ways to repack them into two or more smaller extents. Those 
candidate extents would have to be allocated based on guesses before the 
attempt because other writers might steal the space if you don't 
preallocate. This could involve repeated retries and result in taking 
one big extent and exploding it into any number of tiny extents. 
Performing this task could take unbounded time. In computer science it's 
a form of bin packing, an NP-hard problem sometimes called "the floppy 
problem" (a name that is impossible to google usefully, it seems, 
because the word floppy is search poison 8-) ).

The Floppy Problem :: so called because one of the original formulations 
was "how many floppy disks do I need to optimally pack these files 
without having to cut up the files themselves?" Indeed multi-floppy 
"Zip" programs were invented to skip that whole painful mess so people 
could just ship their software. 8-)
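
As a rough illustration of the floppy formulation, here is a small 
Python sketch of first-fit decreasing, a common greedy heuristic for bin 
packing. It is a heuristic, not an optimal solver, and the file sizes 
and the 1440 KB disk capacity are made-up example values.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack sizes into bins of the given capacity, largest items first.

    Greedy heuristic: the result is a valid packing, but it is not
    guaranteed to use the minimum possible number of bins.
    """
    bins = []  # each bin is a list of the item sizes placed in it
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} cannot fit on any disk")
        for contents in bins:
            if sum(contents) + size <= capacity:
                contents.append(size)  # first disk with enough room wins
                break
        else:
            bins.append([size])  # no existing disk had room: start a new one
    return bins


files_kb = [700, 800, 400, 300, 500, 200]  # example sizes, assumed
disks = first_fit_decreasing(files_kb, capacity=1440)
for number, contents in enumerate(disks, start=1):
    print(f"disk {number}: {contents} ({sum(contents)} KB used)")
```

Even this easy version never splits a file. Finding the provably 
smallest number of disks is where the cost blows up.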

If you start reading here 
http://en.wikipedia.org/wiki/Cutting_stock_problem and work your way 
back through the knapsack problem you'll get a glimpse of how ugly this 
sort of corner case can get.

In our case the "roll" being "cut" is the donor extent, the possible 
widths/sizes are the discernible gaps in the raw extent map, and the 
constraint is that we can't cut any of the internally allocated regions 
within the extent. We can only relocate them, not break them up, because 
breaking them up could require allocating more metadata space in the 
extent tree, which could invalidate our planned cuts, and so on till the 
end of time.

So it is a problem that _can_ be solved programmatically, but it's not a 
problem that is worth the time to solve, either in programmer hours or 
in disk write hours.

So yeah... It's big, it's valid, and you've got no single place to copy 
it to that is equally big, so it gets skipped.

Not a bug.
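
The skip decision described above can be sketched in a few lines of 
Python. This is a toy model under stated assumptions, not how btrfs 
tracks free space: find_destination is a hypothetical helper, and the 
gap and extent sizes are invented. It shows how plenty of total free 
space can still leave no single gap large enough for an unsplittable 
extent.

```python
def find_destination(extent_size, free_gaps):
    """Return the index of the first gap that holds the whole extent, else None."""
    for index, gap in enumerate(free_gaps):
        if gap >= extent_size:
            return index
    return None  # no single gap is big enough: the caller skips this extent


free_gaps = [300, 250, 400]  # invented sizes of contiguous free regions
total_free = sum(free_gaps)

big = find_destination(512, free_gaps)    # more than any one gap can hold
small = find_destination(256, free_gaps)  # fits in the first gap

print(f"total free: {total_free}")
print(f"512-unit extent -> {'skipped' if big is None else f'gap {big}'}")
print(f"256-unit extent -> {'skipped' if small is None else f'gap {small}'}")
```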
