Linux CIFS filesystem development
From: Piyush Sachdeva <s.piyush1024@gmail.com>
To: Jeff Layton <jlayton@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-cifs@vger.kernel.org,
	linux-nfs@vger.kernel.org, netfs@lists.linux.dev
Cc: sprasad@microsoft.com, linux-kernel@vger.kernel.org, sfrench@samba.org
Subject: Re: [DISCUSSION] Preventing ENOSPC/EDQUOT writeback errors on network filesystems
Date: Wed, 13 May 2026 18:41:10 +0530	[thread overview]
Message-ID: <m2zf23e9ox.fsf@gmail.com> (raw)
In-Reply-To: <9e48229614786e0c2e92bb6a2dd3269868f160d0.camel@kernel.org>

Jeff Layton <jlayton@kernel.org> writes:

> On Tue, 2026-05-05 at 11:41 +0530, Piyush Sachdeva wrote:
>> Hi,
>> There have been plenty of discussions on how to handle writeback errors for
>> network filesystems, but most have focused on error reporting after the fact.
>> I'd like to start a discussion around preventing writeback errors specifically
>> ENOSPC and EDQUOT, before they cause silent data loss.
>> 
>> The problem:
>> With buffered writes on network filesystems (cifs, nfs, etc.), the write()
>> syscall copies data into the page cache and returns success immediately. The
>> actual upload to the server happens later during writeback. If the server is
>> out of space at that point, the write fails with ENOSPC. The netfs/writeback
>> layer records this error via mapping_set_error(), but critically the folio's
>> writeback flag is cleared and the page is now clean. Under memory pressure, the
>> VM can reclaim these clean pages, permanently losing data that the application
>> believes was successfully written. Meanwhile, i_size has already been updated
>> to reflect the new file size. So stat() shows a file size inclusive of the data
>> that was never persisted. Another inconsistency is that total free space
>> on the server hasn't been reduced to account for the cached data, leading
>> to incorrect values in statfs() output from the client's point of view
>> (assuming statfs() calls go to the server).
>> To illustrate with real-world scenarios:
>> 
>> - A user or application can keep issuing writes to an fd well beyond the
>>   available space, since buffered writes return success as soon as data is
>>   copied to the page cache. A significant amount of data, exceeding the
>>   available quota, can accumulate before fsync() is called, at which point
>>   critical data loss is nearly certain.
>> 
>> - A malicious user can exploit this to keep resources pinned and memory
>>   oversubscribed, impacting other applications.
>> 
>> The error is technically observable: fsync() will return it, and close()
>> surfaces it through the flush callback. But in practice, many applications
>> check neither, and the POSIX "just call fsync()" answer isn't satisfying for
>> users who lose data silently.
>> 
>
> Yet, it is the only real answer we have.
>
> This is just a fundamental issue with buffered writes and delayed
> writeback. Either you flush the data to stable storage now, or you have
> to do it later. If you do it later, then it can still fail for all
> sorts of reasons.
>
>> Local filesystems largely avoid this because they can check available space
>> synchronously in write_begin() and fail the write() syscall directly. Network
>> filesystems can't do this cheaply — a round-trip per write to check server
>> space would negate the benefits of buffered I/O.
>> 
>> Through recent development, netfs is becoming a central layer for network
>> filesystem I/O. It already has retry logic for transient failures (EAGAIN,
>> ECONNABORTED), but ENOSPC/EDQUOT remain hard failures. This affects every
>> network filesystem using buffered writes.
>> 
>> I am curious whether NFS already has a solution to this, and what the NFS
>> community's approach to this specific problem has been.
>> 
>> This problem is worth solving for all network filesystems. I have a few
>> thoughts on approaches, combining cached statfs() output with
>> fallocate()-style pre-allocation on the write path:
>> 
>> 1. Pre-allocate space on the server before writing to the page cache,
>>    analogous to fallocate() on the write path. This guarantees server-side
>>    space for page cache data.
>> 
>> 2. Since per-write fallocate() calls require a server round-trip, effectively
>>    negating the benefit of buffered I/O, use cached statfs() output to gate
>>    when pre-allocation is triggered. For example, once free space drops below
>>    20% of total space, enable fallocate() on the write path; otherwise, let
>>    writes proceed as normal.
>> 
>> 3. Handle refresh and synchronization of the cached statfs() data separately
>>    to avoid staleness.
>> 
>> I'd appreciate feedback from the community on viable approaches.
>
> NFSv4.2 does have an ALLOCATE operation:
>
>     https://datatracker.ietf.org/doc/html/rfc7862#section-15.1
>
> ...and such an operation could (in principle) precede WRITE in a
> compound, but that doesn't really help you. By the time we're issuing
> RPCs to the server, the client application has already finished its
> writes and moved on.
>
> For applications that want to avoid ENOSPC/EDQUOT, the best thing they
> could do is call fallocate() themselves to ensure that the space
> exists. With a sufficiently recent NFS client and server, that should
> DTRT.

Hey Jeff,
Thanks for your email and for sharing the NFS spec. I noticed that the
ALLOCATE operation still ends up checking for space at writeback time,
so the initial concern about losing data remains. On the other hand,
issuing the operation before writing to the page cache would be a
performance problem.

I will try a few experiments and then post my findings here. 

--
Regards,
Piyush


Thread overview: 3+ messages
2026-05-05  6:11 [DISCUSSION] Preventing ENOSPC/EDQUOT writeback errors on network filesystems Piyush Sachdeva
2026-05-06  6:11 ` Jeff Layton
2026-05-13 13:11   ` Piyush Sachdeva [this message]
