From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: fdmanana@gmail.com
Cc: dsterba@suse.cz, Jakob Unterwurzacher <jakobunt@gmail.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: fallocate does not prevent ENOSPC on write
Date: Wed, 24 Apr 2019 07:49:02 +0800
Message-ID: <8a3b5e64-1df7-447d-3b07-e276b8d65b40@gmx.com>
In-Reply-To: <CAL3q7H7dzVQFRcTzBNNhCU69i5Yi4xStT14XjRw_vivH6QWnRw@mail.gmail.com>
On 2019/4/23 10:50 PM, Filipe Manana wrote:
> On Tue, Apr 23, 2019 at 1:14 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2019/4/23 7:33 PM, David Sterba wrote:
>>> On Tue, Apr 23, 2019 at 10:16:32AM +0800, Qu Wenruo wrote:
>>>> On 2019/4/23 5:09 AM, Jakob Unterwurzacher wrote:
>>>>> I have a user who is reporting ENOSPC errors when running gocryptfs on
>>>>> top of btrfs (ticket: https://github.com/rfjakob/gocryptfs/issues/395 ).
>>>>>
>>>>> What is interesting is that the error gets thrown at write time. This
>>>>> is not supposed to happen, because gocryptfs does
>>>>>
>>>>> fallocate(..., FALLOC_FL_KEEP_SIZE, ...)
>>>>>
>>>>> before writing.
>>>>>
>>>>> I wrote a minimal reproducer in C: https://github.com/rfjakob/fallocate_write
>>>>> This is what it looks like on ext4:
>>>>>
>>>>> $ ../fallocate_write/fallocate_write
>>>>> reading from /dev/urandom
>>>>> writing to ./blob.379Q8P
>>>>> writing blocks of 132096 bytes each
>>>>> [...]
>>>>> fallocate failed: No space left on device
>>>>>
>>>>> On btrfs, it will instead look like this:
>>>>>
>>>>> [...]
>>>>> pwrite failed: No space left on device
>>>>>
>>>>> Is this a bug in btrfs' fallocate implementation or am I reading the
>>>>> guarantees that fallocate gives me wrong?
>>>>
>>>> Since v4.7, this commit has changed how btrfs does the nodatacow check:
>>>> c6887cd11149 ("Btrfs: don't do nocow check unless we have to").
>>>>
>>>> Before that commit, btrfs always checked whether it needed to reserve
>>>> space for COW; after that patch, btrfs only checks when we have no space.
>>>>
>>>> However, this screws up other nodatacow space checks.
>>>> Due to its age and the depth of the change, it's pretty hard to fix.
>>>> I have tried several times, but each attempt only caused more problems.
>>>
>>> What if the commit is reverted, if the problem is otherwise hard to fix?
>>
>> I tried reverting it, but then all sorts of other problems came up.
>
> I haven't seen an explanation on why that patch causes ENOSPC or what
> nodatacow space check screw ups it causes.
>
> It seems fine to me, and what we currently do:
>
> 1) For any buffered write, check if there's enough free data space;
> 2) If not, try to allocate a new data chunk;
> 3) If that fails, check if the file has the "have prealloc extents"
> flag or has the nodatacow flag set;
> 4) If any of those conditions is true, check if we can write to the
> existing extent - if it's not shared or no checksums exist in its
> range, meaning it's an unwritten (prealloc) extent, return success to
> userspace.
>
> So what's wrong with it? And how does it cause the ENOSPC?
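To keep the discussion concrete, the four steps quoted above can be
sketched as a small model in C. This is not kernel code: struct write_ctx,
its fields and check_buffered_write() are hypothetical names made up for
illustration, and step 4 is collapsed into one boolean because the exact
in-place write condition is part of what is being debated here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the quoted steps; none of these names exist in btrfs. */
struct write_ctx {
    bool enough_free_data_space; /* step 1: free data space covers the write */
    bool can_alloc_data_chunk;   /* step 2: a new data chunk can be allocated */
    bool has_prealloc_flag;      /* step 3: inode has prealloc extents */
    bool has_nodatacow_flag;     /* step 3: inode is nodatacow */
    bool can_write_in_place;     /* step 4: existing extent is writable without COW */
};

/* Returns 0 if the write may proceed, -ENOSPC otherwise. */
static int check_buffered_write(const struct write_ctx *c)
{
    assert(c != NULL);

    if (c->enough_free_data_space)      /* step 1 */
        return 0;
    if (c->can_alloc_data_chunk)        /* step 2 */
        return 0;
    if (!c->has_prealloc_flag && !c->has_nodatacow_flag)
        return -ENOSPC;                 /* step 3: no fallback available */
    return c->can_write_in_place ? 0 : -ENOSPC; /* step 4 */
}
```

In this model a write into a preallocated range only reaches the in-place
check after the first two steps have failed, which is the ordering
described in the quoted list.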
E.g.
We have a 128M preallocated file extent.
Assume the fs only had 128M of free data space, so the preallocation
leaves 0 free data space.
Then we try a buffered write into that range. The buffered write will
just fail, as it needs to reserve data space.
The idea is the same for fallocate/pwrite; only the point at which the
ENOSPC happens differs.
btrfs/153 has been failing for a long time for the same reason. There the
limit comes from quota, but the cause is completely the same.
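For reference, the userspace sequence in question (preallocate with
FALLOC_FL_KEEP_SIZE, then write into the same range) looks roughly like
the sketch below. It is not taken from the reproducer; the function
prealloc_then_write() and the enum failed_step are hypothetical names,
and the only purpose is to report which of the two calls returned the
error.

```c
#define _GNU_SOURCE /* exposes fallocate() and FALLOC_FL_KEEP_SIZE in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: which call reported the error. */
enum failed_step { STEP_NONE, STEP_FALLOCATE, STEP_PWRITE };

/*
 * Preallocate [off, off + len) without changing the file size, then write
 * buf into that range. Returns 0 on success or a negative errno value;
 * *step records which call failed.
 */
static int prealloc_then_write(int fd, const void *buf, size_t len, off_t off,
                               enum failed_step *step)
{
    assert(buf != NULL && step != NULL);
    *step = STEP_NONE;

    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, off, (off_t)len) != 0) {
        *step = STEP_FALLOCATE;
        return -errno;
    }

    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, (const char *)buf + done, len - done,
                           off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the same range */
            *step = STEP_PWRITE;
            return -errno;
        }
        if (n == 0) {               /* no progress; avoid looping forever */
            *step = STEP_PWRITE;
            return -EIO;
        }
        done += (size_t)n;
    }
    return 0;
}
```

On a full filesystem the expectation discussed in this thread is that
-ENOSPC comes back with STEP_FALLOCATE; the report is about seeing it with
STEP_PWRITE instead.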
Thanks,
Qu
>
> Running the reproducer, at least on a 5.0 kernel, never fails on a
> pwrite for me, but always on fallocate:
>
> $ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) /dev/sdi
> $ mount /dev/sdi /mnt/sdi
> $ cd /mnt/sdi
> $ /path/to/reproducer
> reading from /dev/urandom
> writing to ./blob.IIa6tH
> writing blocks of 132096 bytes each
> total 125 MiB, 65.52 MiB/s
> total 251 MiB, 44.59 MiB/s
> total 377 MiB, 55.23 MiB/s
> total 503 MiB, 66.21 MiB/s
> total 629 MiB, 59.97 MiB/s
> total 755 MiB, 3.70 MiB/s
> total 881 MiB, 50.24 MiB/s
> total 1007 MiB, 64.51 MiB/s
> total 1133 MiB, 50.70 MiB/s
> total 1259 MiB, 49.29 MiB/s
> total 1385 MiB, 47.93 MiB/s
> total 1511 MiB, 4.00 MiB/s
> total 1637 MiB, 49.85 MiB/s
> total 1763 MiB, 48.11 MiB/s
> total 1889 MiB, 66.62 MiB/s
> total 2015 MiB, 5.60 MiB/s
> total 2141 MiB, 19.58 MiB/s
> total 2267 MiB, 64.80 MiB/s
> total 2393 MiB, 13.23 MiB/s
> total 2519 MiB, 14.95 MiB/s
> fallocate failed: No space left on device
>
> So either that was tested on a rather old kernel or:
>
> 1) we had snapshotting happening between a fallocate and a pwrite (or
> at the same time as the pwrite)
> 2) before the pwrite (or during) the unwritten/prealloc extent was
> reflinked (cp --reflink, clone or dedupe ioctls)
>
> What did I miss here?
>
> Thanks.
>
>>
>> E.g. reserved space underflow.
>>
>> I'll find the old thread and retry again.
>>
>> Thanks,
>> Qu
>>
>>> This seems to break the semantics of fallocate, so performance should
>>> not be the main concern here.
>>>
>>
>
>