From: Piyush Sachdeva <s.piyush1024@gmail.com>
To: Jeff Layton <jlayton@kernel.org>,
linux-fsdevel@vger.kernel.org, linux-cifs@vger.kernel.org,
linux-nfs@vger.kernel.org, netfs@lists.linux.dev
Cc: sprasad@microsoft.com, linux-kernel@vger.kernel.org, sfrench@samba.org
Subject: Re: [DISCUSSION] Preventing ENOSPC/EDQUOT writeback errors on network filesystems
Date: Wed, 13 May 2026 18:41:10 +0530
Message-ID: <m2zf23e9ox.fsf@gmail.com>
In-Reply-To: <9e48229614786e0c2e92bb6a2dd3269868f160d0.camel@kernel.org>
Jeff Layton <jlayton@kernel.org> writes:
> On Tue, 2026-05-05 at 11:41 +0530, Piyush Sachdeva wrote:
>> Hi,
>> There have been plenty of discussions on how to handle writeback errors for
>> network filesystems, but most have focused on error reporting after the fact.
>> I'd like to start a discussion around preventing writeback errors, specifically
>> ENOSPC and EDQUOT, before they cause silent data loss.
>>
>> The problem:
>> With buffered writes on network filesystems (cifs, nfs, etc.), the write()
>> syscall copies data into the page cache and returns success immediately. The
>> actual upload to the server happens later during writeback. If the server is
>> out of space at that point, the write fails with ENOSPC. The netfs/writeback
>> layer records this error via mapping_set_error(), but critically the folio's
>> writeback flag is cleared and the page is now clean. Under memory pressure, the
>> VM can reclaim these clean pages, permanently losing data that the application
>> believes was successfully written. Meanwhile, i_size has already been updated
>> to reflect the new file size. So stat() shows a file size inclusive of the data
>> that was never persisted. Another inconsistency here is that total free space
>> hasn't been modified for the file system on the server, leading to incorrect
>> values in statfs() output from the client's pov (assuming statfs() calls go
>> to the server).
>> To illustrate with real-world scenarios:
>>
>> - A user or application can keep issuing writes to an fd well beyond the
>> available space, since buffered writes return success as soon as data is
>> copied to the page cache. A significant amount of data, exceeding the
>> available quota, can accumulate before fsync() is called, at which point
>> critical data loss is nearly certain.
>>
>> - A malicious user can exploit this to keep resources pinned and memory
>> oversubscribed, impacting other applications.
>>
>> The error is technically observable: fsync() will return it, and close()
>> surfaces it through the flush callback. But in practice, many applications
>> check neither, and the POSIX "just call fsync()" answer isn't satisfying for
>> users who lose data silently.
>>
>
> Yet, it is the only real answer we have.
>
> This is just a fundamental issue with buffered writes and delayed
> writeback. Either you flush the data to stable storage now, or you have
> to do it later. If you do it later, then it can still fail for all
> sorts of reasons.
>
>> Local filesystems largely avoid this because they can check available space
>> synchronously in write_begin() and fail the write() syscall directly. Network
>> filesystems can't do this cheaply — a round-trip per write to check server
>> space would negate the benefits of buffered I/O.
>>
>> Through recent development, netfs is becoming a central layer for network
>> filesystem I/O. It already has retry logic for transient failures (EAGAIN,
>> ECONNABORTED), but ENOSPC/EDQUOT remain hard failures. This affects every
>> network filesystem using buffered writes.
>>
>> I am curious whether NFS has a solution to this, and what approach the
>> NFS community takes to this specific problem.
>>
>> This problem is worth solving for all network filesystems. I have a few
>> thoughts on approaches, combining cached statfs() output with
>> fallocate()-style pre-allocation on the write path:
>>
>> 1. Pre-allocate space on the server before writing to the page cache,
>> analogous to fallocate() on the write path. This guarantees server-side
>> space for page cache data.
>>
>> 2. Per-write fallocate() calls require a server round-trip, effectively
>> negating the benefit of buffered I/O, so use cached statfs() output to gate
>> when pre-allocation is triggered. For example, once free space drops below
>> 20% of total space, enable fallocate() on the write path. Otherwise, let
>> writes proceed as normal.
>>
>> 3. Handle refresh and synchronization of the cached statfs() data separately
>> to avoid staleness.
>>
>> I'd appreciate feedback from the community on viable approaches.
>
> NFSv4.2 does have an ALLOCATE operation:
>
> https://datatracker.ietf.org/doc/html/rfc7862#section-15.1
>
> ...and such an operation could (in principle) precede WRITE in a
> compound, but that doesn't really help you. By the time we're issuing
> RPCs to the server, the client application has already finished its
> writes and moved on.
>
> For applications that want to avoid ENOSPC/EDQUOT, the best thing they
> could do is call fallocate() themselves to ensure that the space
> exists. With a sufficiently recent NFS client and server, that should
> DTRT.
Hey Jeff,
Thanks for your email and for sharing the NFS spec. I noticed that the
ALLOCATE operation ends up checking for space during writeback as well,
so the initial concern of losing data still remains. But if we do the
operation before writing to the page cache, it would add a server
round-trip to the write path and become a performance issue.
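For readers following the thread, the application-side approach Jeff
suggests can be sketched in userspace C roughly as below. This is only an
illustration under assumptions: write_reserved() is a made-up helper name,
the file is written from offset 0, and posix_fallocate() is used as the
portable wrapper (glibc may emulate it by writing to the file if the
filesystem lacks fallocate support). Whether the reservation actually holds
on the server depends on the client and server in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any kernel or library source):
 * reserve `len` bytes at offset 0 of `path`, write `buf`, then force
 * writeback and check every step. `len` must be greater than zero.
 * Returns 0 on success or a negative errno value.
 */
int write_reserved(const char *path, const void *buf, size_t len)
{
    int err;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -errno;

    /* posix_fallocate() returns the error number; it does not set errno. */
    err = posix_fallocate(fd, 0, (off_t)len);
    if (err) {
        close(fd);
        return -err;    /* e.g. -ENOSPC or -EDQUOT before any data is cached */
    }

    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            err = errno;
            close(fd);
            return -err;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Writeback errors recorded against the file surface here. */
    if (fsync(fd) < 0) {
        err = errno;
        close(fd);
        return -err;
    }

    /* close() can also report an error, so its result is checked too. */
    if (close(fd) < 0)
        return -errno;

    return 0;
}
```

A caller would treat -ENOSPC or -EDQUOT from the reservation step as a
signal to stop before any data reaches the page cache.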
I will try a few experiments and then post my findings here.
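As a starting point for such experiments, the gating idea from item 2 of
the original proposal can be prototyped in userspace. The sketch below is
hypothetical: below_free_threshold() and should_preallocate() are
illustrative names, statvfs() stands in for whatever cached statfs() data
the kernel would keep, and the caching and refresh logic from item 3 is
left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/statvfs.h>

/*
 * Hypothetical policy check: true when free space is strictly below
 * `pct_threshold` percent of total space. Compares free * 100 against
 * total * pct to avoid floating point; overflow is ignored, which is an
 * assumption that block counts stay far below 2^57.
 */
bool below_free_threshold(uint64_t free_blocks, uint64_t total_blocks,
                          unsigned pct_threshold)
{
    if (total_blocks == 0)
        return true;    /* unknown capacity: take the conservative path */

    return free_blocks * 100 < total_blocks * (uint64_t)pct_threshold;
}

/*
 * Hypothetical gate for the write path.
 * Returns 1 if pre-allocation should be enabled, 0 if writes can proceed
 * as normal, and -1 if the filesystem statistics could not be read.
 */
int should_preallocate(const char *mount_path, unsigned pct_threshold)
{
    struct statvfs sv;

    if (statvfs(mount_path, &sv) < 0)
        return -1;

    /* f_bavail: blocks available to unprivileged users; f_blocks: total. */
    return below_free_threshold(sv.f_bavail, sv.f_blocks, pct_threshold) ? 1 : 0;
}
```

With the 20% figure from the proposal, should_preallocate(path, 20) would
switch the expensive path on only once the mount is nearly full.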
--
Regards,
Piyush
Thread overview: 3+ messages
2026-05-05 6:11 [DISCUSSION] Preventing ENOSPC/EDQUOT writeback errors on network filesystems Piyush Sachdeva
2026-05-06 6:11 ` Jeff Layton
2026-05-13 13:11 ` Piyush Sachdeva [this message]