public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: Greg KH <greg@kroah.com>
Cc: stable@vger.kernel.org, linux-btrfs@vger.kernel.org,
	Filipe Manana <fdmanana@suse.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>
Subject: Re: [PATCH stable-5.15.y] btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range
Date: Sun, 6 Mar 2022 21:48:11 +0800	[thread overview]
Message-ID: <48a4a70a-909b-ea81-0912-fa16974b813d@oracle.com> (raw)
In-Reply-To: <YiNnK4HMpZSg41Gc@kroah.com>



On 05/03/2022 21:35, Greg KH wrote:
> On Thu, Mar 03, 2022 at 09:30:31PM +0800, Anand Jain wrote:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> Commit f0bfa76a11e93d0fe2c896fcb566568c5e8b5d3f upstream.
>>
>> When doing a direct IO write against a file range that either has
>> preallocated extents in that range or has regular extents and the file
>> has the NOCOW attribute set, the write fails with -ENOSPC when all of
>> the following conditions are met:
>>
>> 1) There are no data blocks groups with enough free space matching
>>     the size of the write;
>>
>> 2) There's not enough unallocated space for allocating a new data block
>>     group;
>>
>> 3) The extents in the target file range are not shared, neither through
>>     snapshots nor through reflinks.
>>
>> This is wrong because a NOCOW write can be done in such case, and in fact
>> it's possible to do it using a buffered IO write, since when failing to
>> allocate data space, the buffered IO path checks if a NOCOW write is
>> possible.
>>
>> The failure in direct IO write path comes from the fact that early on,
>> at btrfs_dio_iomap_begin(), we try to allocate data space for the write
>> and if it that fails we return the error and stop - we never check if we
>> can do NOCOW. But later, at btrfs_get_blocks_direct_write(), we check
>> if we can do a NOCOW write into the range, or a subset of the range, and
>> then release the previously reserved data space.
>>
>> Fix this by doing the data reservation only if needed, when we must COW,
>> at btrfs_get_blocks_direct_write() instead of doing it at
>> btrfs_dio_iomap_begin(). This also simplifies a bit the logic and removes
>> the inneficiency of doing unnecessary data reservations.
>>
>> The following example test script reproduces the problem:
>>
>>    $ cat dio-nocow-enospc.sh
>>    #!/bin/bash
>>
>>    DEV=/dev/sdj
>>    MNT=/mnt/sdj
>>
>>    # Use a small fixed size (1G) filesystem so that it's quick to fill
>>    # it up.
>>    # Make sure the mixed block groups feature is not enabled because we
>>    # later want to not have more space available for allocating data
>>    # extents but still have enough metadata space free for the file writes.
>>    mkfs.btrfs -f -b $((1024 * 1024 * 1024)) -O ^mixed-bg $DEV
>>    mount $DEV $MNT
>>
>>    # Create our test file with the NOCOW attribute set.
>>    touch $MNT/foobar
>>    chattr +C $MNT/foobar
>>
>>    # Now fill in all unallocated space with data for our test file.
>>    # This will allocate a data block group that will be full and leave
>>    # no (or a very small amount of) unallocated space in the device, so
>>    # that it will not be possible to allocate a new block group later.
>>    echo
>>    echo "Creating test file with initial data..."
>>    xfs_io -c "pwrite -S 0xab -b 1M 0 900M" $MNT/foobar
>>
>>    # Now try a direct IO write against file range [0, 10M[.
>>    # This should succeed since this is a NOCOW file and an extent for the
>>    # range was previously allocated.
>>    echo
>>    echo "Trying direct IO write over allocated space..."
>>    xfs_io -d -c "pwrite -S 0xcd -b 10M 0 10M" $MNT/foobar
>>
>>    umount $MNT
>>
>> When running the test:
>>
>>    $ ./dio-nocow-enospc.sh
>>    (...)
>>
>>    Creating test file with initial data...
>>    wrote 943718400/943718400 bytes at offset 0
>>    900 MiB, 900 ops; 0:00:01.43 (625.526 MiB/sec and 625.5265 ops/sec)
>>
>>    Trying direct IO write over allocated space...
>>    pwrite: No space left on device
>>
>> A test case for fstests will follow, testing both this direct IO write
>> scenario as well as the buffered IO write scenario to make it less likely
>> to get future regressions on the buffered IO case.
>>
>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/inode.c | 142 ++++++++++++++++++++++++++---------------------
>>   1 file changed, 78 insertions(+), 64 deletions(-)
> 
> A normal "cherry pick" worked here, right?

  Yes. A normal cherry pick worked.

> Also this is needed for 5.16.

  Right. It applies to 5.16 as well.

Thanks, Anand

> 
> thanks,
> 
> greg k-h

      reply	other threads:[~2022-03-06 13:48 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-03 13:30 [PATCH stable-5.15.y] btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range Anand Jain
2022-03-05 13:35 ` Greg KH
2022-03-06 13:48   ` Anand Jain [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48a4a70a-909b-ea81-0912-fa16974b813d@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=greg@kroah.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox