Re: why am I getting "No space left on device" here?

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: why am  I getting "No space left on device" here?
Date: Wed, 15 Jan 2014 19:05:41 +0000 (UTC)
Message-ID: <pan$6d4dc$e3f7651a$b8f89b4a$ecb5f643@cox.net>
In-Reply-To: 20140115115543.790adafb@wpkg.org

Tomasz Chmielewski posted on Wed, 15 Jan 2014 11:55:43 +0100 as excerpted:

> I'm no longer able to write to this btrfs filesystem:
> 
> # df -h /home
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb4       5.2T  3.6T  1.6T  71% /home

FWIW, standard df doesn't really know how to work with btrfs' advanced 
layout yet, so its output is, let's say "less than ideal", on btrfs, 
particularly on the various btrfs multi-device configurations.

btrfs fi show and btrfs fi df, combined, form the usable replacement on 
btrfs.  Fortunately you listed their output as well...

> # btrfs fi show /home
> Label: crawler-btrfs  uuid: 60f1759c-45f6-4484-9f60-66a4e9bbf2b6
>         Total devices 2 FS bytes used 1.80TiB
>	  devid    3 size 2.56TiB used 1.80TiB path /dev/sdb4
>	  devid    4 size 2.56TiB used 1.80TiB path /dev/sda4
> 
> Btrfs v3.12

Looks pretty reasonable and well balanced, as a raid1 should be. =:^)

Only 1.80 TiB of 2.56 TiB on each device allocated, so there's plenty
of room left to allocate additional chunks as needed.  =:^)
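
FWIW, the arithmetic behind that is simple enough to sketch in Python.
This is only an illustration, assuming the devid lines look exactly like
the ones quoted above (sizes in TiB); the names unallocated_tib and
sample are made up for the example, not part of any btrfs tool.

```python
import re

# Matches the "devid ... size ... used ... path ..." lines of `btrfs fi show`.
# Assumption: both figures are printed in TiB, as in the quoted output.
DEVID_LINE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)TiB\s+used\s+([\d.]+)TiB\s+path\s+(\S+)"
)

def unallocated_tib(show_output):
    """Return {device path: size - used} in TiB for each devid line."""
    return {
        m.group(4): float(m.group(2)) - float(m.group(3))
        for m in DEVID_LINE.finditer(show_output)
    }

sample = """\
  devid    3 size 2.56TiB used 1.80TiB path /dev/sdb4
  devid    4 size 2.56TiB used 1.80TiB path /dev/sda4
"""

for path, free in unallocated_tib(sample).items():
    # How many whole 1 GiB chunks would still fit on this device.
    print(f"{path}: {free:.2f} TiB unallocated, room for {int(free * 1024)} 1-GiB chunks")
```

In raid1 each new chunk needs room on both devices, so the smaller of
the two figures is the one that matters.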

> # btrfs filesystem df /home
> Data, RAID1: total=1.75TiB, used=1.75TiB
> System, RAID1: total=32.00MiB, used=268.00KiB
> Metadata, RAID1: total=53.00GiB, used=51.71GiB

Data chunks are full, 1.75 TiB of 1.75 TiB, so it'll need to allocate a 
new data chunk pretty quickly when you start copying.  (raid1 mode, so 
it'd allocate chunks in pairs on the two devices).  FWIW, data chunks are 
1 GiB each.

Metadata chunks, 51.71 GiB used of 53.00 GiB, so about 1.29 GiB free.  
Metadata chunks are a quarter GiB (256 MiB) each, so that's several 
chunks' worth free.
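
The same reading of the btrfs fi df output can be done mechanically.
The following is a rough Python sketch, assuming the line format quoted
above; free_in_allocated and the other names are hypothetical helpers
for illustration only.

```python
import re

# Binary unit multipliers as they appear in the quoted output.
UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Matches lines such as "Data, RAID1: total=1.75TiB, used=1.75TiB".
DF_LINE = re.compile(r"(\w+), (\w+): total=([\d.]+)(\w+), used=([\d.]+)(\w+)")

def free_in_allocated(df_output):
    """Return {chunk type: total - used} in bytes for each line."""
    free = {}
    for m in DF_LINE.finditer(df_output):
        kind, _profile, total, total_unit, used, used_unit = m.groups()
        free[kind] = float(total) * UNITS[total_unit] - float(used) * UNITS[used_unit]
    return free

sample = """\
Data, RAID1: total=1.75TiB, used=1.75TiB
System, RAID1: total=32.00MiB, used=268.00KiB
Metadata, RAID1: total=53.00GiB, used=51.71GiB
"""

free = free_in_allocated(sample)
for kind, nbytes in free.items():
    print(f"{kind}: {nbytes / 2**30:.2f} GiB free inside allocated chunks")

# Free metadata space expressed in 256 MiB metadata chunks.
print(f"Metadata: about {free['Metadata'] / (256 * 2**20):.1f} chunks' worth free")
```

Note that the figures are rounded to two decimals by btrfs itself, so
"0.00 free" for data only means less than about 0.01 TiB is left.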

> However:
> 
> # dd if=/dev/urandom of=bigfile
> dd: writing to `bigfile': No space left on device
> 186+0 records in
> 185+0 records out
> 94720 bytes (95 kB) copied, 0.0144045 s, 6.6 MB/s
> 
> 
> I don't understand why - can anyone explain?

Well, there are two levels of explanation here, but unfortunately they 
don't fully cover it.  Still, here's what's available:

At the first level, as hinted above in the df comments, btrfs' space 
calculation is MUCH more complex than that of a normal filesystem.

First, unlike a normal filesystem, btrfs data and metadata are treated 
separately, and they're very unlikely to run out together, so one or the 
other will be out while the other has room left.  Then there's the fact 
that on a single device metadata is dup by default, while data is 
single, so metadata by default takes up twice the space it normally 
would.  (Plus of course 
btrfs has checksums and even small partial-block file-tail data in its 
metadata, in addition to it all being dupped, so there's a lot MORE 
metadata to deal with on btrfs, than on a normal filesystem.)

In order to deal with that, btrfs sets up the empty filesystem as a big 
reserve pool of potential chunks that can be allocated to data or 
metadata as needed, so there's the whole already allocated vs. still 
unallocated and free to allocate thing, as displayed by btrfs fi show, 
that other filesystems don't normally deal with.  Meanwhile, btrfs fi df 
displays, separately for each of data, metadata and system chunks, how 
much of the already allocated space is actually used.  You can see my 
comments on your output above.
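
To make the two-level idea concrete, here is a toy model in Python.  It
is a deliberately simplified sketch, NOT how the btrfs allocator
actually works: ToyFilesystem and its fields are invented for the
example, sizes are in GiB, and raid profiles are ignored.

```python
import errno
from dataclasses import dataclass, field

@dataclass
class ToyFilesystem:
    """Toy two-level space model: an unallocated pool plus per-type chunks."""
    unallocated: float  # GiB not yet assigned to any chunk type
    chunk_size: dict = field(default_factory=lambda: {"data": 1.0, "metadata": 0.25})
    total: dict = field(default_factory=lambda: {"data": 0.0, "metadata": 0.0})
    used: dict = field(default_factory=lambda: {"data": 0.0, "metadata": 0.0})

    def write(self, kind, amount):
        # Allocate new chunks from the pool until the write fits.
        while self.used[kind] + amount > self.total[kind]:
            size = self.chunk_size[kind]
            if self.unallocated < size:
                raise OSError(errno.ENOSPC, f"no room for a new {kind} chunk")
            self.unallocated -= size
            self.total[kind] += size
        self.used[kind] += amount

fs = ToyFilesystem(unallocated=2.0)
fs.write("data", 1.5)          # takes two 1 GiB data chunks from the pool
try:
    fs.write("metadata", 0.1)  # pool is empty, so no metadata chunk can be added
    metadata_error = None
except OSError as exc:
    metadata_error = exc.errno

print("allocated:", fs.total, "unallocated:", fs.unallocated)
print("metadata write hit ENOSPC:", metadata_error == errno.ENOSPC)
```

In this model the metadata write fails even though half a GiB of data
space is still free, which is the "one runs out while the other has
room" situation described above.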

Then there's the whole multi-device thing and the various raid modes that 
btrfs has, that simply don't apply to normal filesystems.  Both data/
metadata as raid1 with two devices is actually rather simple, since one 
copy goes to each device.  Actually, that's even simpler than the default 
single-device case, since a single device defaults to dup metadata, 
single data, which is harder to figure out than a two-device raid1's 
simple one copy to each device rule.  But a two-device raid1 is the 
simple case!

Then there's the fact that eventually, the plan is to allow different 
subvolumes to be configurable with different raid levels, so it could 
well be that you'd have raid1, raid10, raid6, raid5, and single, all on 
the same filesystem!

No *WONDER* df doesn't know how to report all this!  Actually, they're 
already working on making df better for the simple all-one-type cases at 
least, but I doubt it'll ever be "good" at reporting for btrfs in the 
complex cases, since it's simply too simple a tool for that job.

This is actually covered on the btrfs wiki in the FAQ as well, altho I 
think I covered it more thoroughly above.  But they'll give you some 
hints for dealing with the problem as well, and I'd definitely recommend 
spending some time reading the wiki in any case, since there's certainly 
more there that you're likely to find very useful as an admin running 
btrfs on your systems.

FAQ  (space-related, see 1.3 and 1.4, and 4.4 thru 4.10)
https://btrfs.wiki.kernel.org/index.php/FAQ

General btrfs wiki link (bookmark it! =:^)
https://btrfs.wiki.kernel.org

The space-related FAQ entries should cover the theory, and give you some 
hints for fixing the problem as well, but there appears to be more going 
on in your case, as you have _PLENTY_ of unallocated space remaining so 
allocating more shouldn't be a problem.  And I had a similar issue 
recently as well -- plenty of space left (tho in my case it was on a 
small mixed-mode filesystem).

That's the second level which I alluded to, where the FAQ and the answers 
above don't really cover things.

In my case, I was copying over a bunch of files at once. Actually, I had 
just done a fresh mkfs.btrfs on the /boot on one of my two ssds (with the 
other one still bootable in case something went wrong while I was setting 
up the new /boot, of course), and was trying to install grub2's modules 
and config files to it once again.  As I said, that's a small (sub 1-gig 
so mixed-mode instead of separate data/metadata) filesystem, and the 
files in question were pretty small, too.

But what I found here, was that while some files copied just fine, others 
failed.  HOWEVER, I was using mc (aka midnight commander), and I used its 
directory diff feature to figure out what had copied and what hadn't, 
which left all the uncopied files in the source selected, so I could try 
copying them once again.

And the weird thing was, while the original copy errored out with a 
no-space error, when I tried again to copy the files that hadn't copied, 
more of them copied over without error!  By doing this a couple times, I 
was able to get everything copied over.

What happened was this.  When the error occurred, while I had unallocated 
space left as shown in btrfs fi show, btrfs fi df showed nearly full 
usage.  (Again, with a sub-gig filesystem, btrfs uses mixed-mode by 
default, so data/metadata combine, so it was just the single mixed-type 
chunk that was about full, not separate data/metadata.  And a balance as 
suggested in the FAQ... didn't help, and I mount with 
compress=lzo,autodefrag already, and it was a fresh filesystem, so...)

But by trying the file copies again for just the files that had been 
missed the first time, the order was different, and something, somehow, 
triggered a new chunk allocation, that for whatever reason, btrfs had 
failed to allocate when it should have, the first time around.

Which brings us back to your case.  While I was dealing with a small sub-
gig filesystem and thus mixed-mode, you're dealing with a large 
filesystem and separate data/metadata chunks.

But just as my already allocated mixed-mode chunks were just about full 
and I needed another one allocated to complete the job, so your data 
chunks are full or very close, according to btrfs fi df, and you need a 
new one allocated (and if the file is greater than a gig in size, likely 
more than one) to finish the job.

And in both our cases, there's plenty of unallocated space in the pool, 
but for whatever reason, btrfs isn't allocating that new chunk when it 
should!  Why, I can't say, but as I mentioned, I was able to work around 
the problem here by trying the remaining files in a different order, and 
at some point, btrfs figured out it needed that new chunk allocated, and 
everything went fine after that.

So... why btrfs is failing to allocate a new chunk when it needs to I 
can't say, but I *CAN* say you're not the only one to have run into the 
problem recently; I did too.

And just as I did, here, with a bit of monkeying around, you can 
/probably/ get btrfs to allocate that new data chunk and get on with 
things.  But the trouble is, since I don't know what the exact problem is 
or what exactly I did to persuade btrfs to do that new chunk allocation, 
I can't tell you exactly what to do to get it to happen; all I can do is 
suggest you try copying smaller files or files of different sizes around 
a bit, hoping to trigger that allocation.

Once that chunk allocation happens, you should be good for at least a 
gig, since that's the data-chunk size, but if your file is over a gig in 
size, you may run into the problem again.  In that case...  Well, you 
could try copying several gigs of smaller files, then once it allocates 
what you need, delete them, leaving the data chunks allocated but with 
enough unused space to copy the original multi-gig file over.
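
If you'd rather script that than do it by hand, something like the
following Python sketch could serve as a starting point.  It is only an
assumption of how one might automate the "write filler, then delete it"
idea; write_filler_files and remove_files are made-up names, and there
is no guarantee this triggers the chunk allocation.  The demo at the
bottom runs in a temporary directory with tiny files; on the real
filesystem you'd pass the target directory and much larger sizes.

```python
import errno
import os
import tempfile

def write_filler_files(directory, count, size_bytes, block=1 << 20):
    """Write up to `count` files of random bytes; return the paths written.

    A file that hits ENOSPC is removed and skipped, and the next one is
    still attempted, since a later attempt may succeed where an earlier
    one failed.
    """
    written = []
    for i in range(count):
        path = os.path.join(directory, f"filler-{i:04d}.bin")
        try:
            with open(path, "wb") as fh:
                remaining = size_bytes
                while remaining > 0:
                    n = min(block, remaining)
                    fh.write(os.urandom(n))  # random data, like the dd test
                    remaining -= n
                fh.flush()
                os.fsync(fh.fileno())        # force the data out to disk
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            if os.path.exists(path):
                os.remove(path)              # drop the partial file
            continue
        written.append(path)
    return written

def remove_files(paths):
    """Delete the filler files, leaving any allocated chunks behind."""
    for path in paths:
        os.remove(path)

with tempfile.TemporaryDirectory() as demo_dir:
    paths = write_filler_files(demo_dir, count=3, size_bytes=4096)
    created = len(paths)
    remove_files(paths)
    leftover = os.listdir(demo_dir)

print(f"wrote and removed {created} filler files")
```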

But there's certainly some sort of chunk allocation bug involved here, 
since both of us clearly had unallocated space available to allocate 
new chunks, and from the btrfs fi df, we can see 
that the existing chunks are full and a new chunk SHOULD be allocated, 
but isn't being allocated, thus the bug.  It'll probably be fixed in 
time, but meanwhile, try monkeying around a bit with other file sizes to 
hopefully work around the issue.

HTH =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
