From: David Howells <dhowells@redhat.com>
To: Steve French <smfrench@gmail.com>,
Namjae Jeon <linkinjeon@kernel.org>,
jra@samba.org
Cc: dhowells@redhat.com, ronniesahlberg@gmail.com,
Tom Talpey <tom@talpey.com>, Stefan Metzmacher <metze@samba.org>,
jlayton@kernel.org, linux-cifs@vger.kernel.org,
samba-technical@lists.samba.org
Subject: Can fallocate() ops be emulated better using SMB request compounding?
Date: Thu, 07 Dec 2023 15:58:46 +0000
Message-ID: <700923.1701964726@warthog.procyon.org.uk>
Hi Steve, Namjae, Jeremy,
At the moment, certain fallocate() operations aren't very well implemented in
the cifs filesystem on Linux, either because the protocol doesn't fully
support them or because the ops being used don't also set the EOF marker at
the same time, so a separate RPC must be made to do that.
For instance:
- FALLOC_FL_ZERO_RANGE does some zeroing and then sets the EOF as two
distinctly separate operations. The code prevents you from doing this op
under some circumstances as it doesn't have an oplock and doesn't want to
race with a third party (note that smb3_punch_hole() doesn't have this
check).
- FALLOC_FL_COLLAPSE_RANGE uses COPYCHUNK to move the file down and then sets
the EOF as two separate operations as there is no protocol op for this.
However, the copy will likely fail if the ranges overlap and it's
non-atomic with respect to a third party.
- FALLOC_FL_INSERT_RANGE has the same issues as FALLOC_FL_COLLAPSE_RANGE.
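To make the ranges involved concrete, here is a small sketch of how a collapse or an insert maps onto a copy followed by a set-EOF. The struct and function names are hypothetical and are not taken from the cifs client; the arithmetic only follows the fallocate() semantics described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the copy + set-EOF pair used to emulate
 * a collapse or insert; not a structure from the cifs client. */
struct range_plan {
	uint64_t src_off;   /* where the copy reads from */
	uint64_t dst_off;   /* where the copy writes to */
	uint64_t copy_len;  /* bytes moved: everything from src_off to old EOF */
	uint64_t new_eof;   /* EOF to set once the copy is done */
	bool     overlaps;  /* source and destination ranges share bytes */
};

/* Collapse removes [offset, offset + len): the tail moves down by len. */
struct range_plan plan_collapse(uint64_t offset, uint64_t len, uint64_t eof)
{
	struct range_plan p;

	assert(len > 0 && offset < eof && len < eof - offset);
	p.src_off  = offset + len;
	p.dst_off  = offset;
	p.copy_len = eof - p.src_off;
	p.new_eof  = eof - len;
	/* The two ranges start len bytes apart, so they overlap as soon
	 * as more than len bytes have to move. */
	p.overlaps = p.copy_len > len;
	return p;
}

/* Insert opens a gap of len bytes at offset: the tail moves up by len. */
struct range_plan plan_insert(uint64_t offset, uint64_t len, uint64_t eof)
{
	struct range_plan p;

	assert(len > 0 && offset < eof && len <= UINT64_MAX - eof);
	p.src_off  = offset;
	p.dst_off  = offset + len;
	p.copy_len = eof - offset;
	p.new_eof  = eof + len;
	p.overlaps = p.copy_len > len;
	return p;
}
```

For a 16 KiB file, collapsing 4 KiB at offset 4 KiB means moving 8 KiB down by 4 KiB, so the source and destination ranges overlap, which is the case noted above as likely to fail.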
Question: Would it be possible to do all of these better by using compounding
with SMB2_FLAGS_RELATED_OPERATIONS? In particular, if two components of a
compound are marked related, does the second get skipped if the first fails?
Further, are the two ops then essentially done atomically?
If this is the case, then for FALLOC_FL_ZERO_RANGE, just compounding the
SET_ZERO_DATA with the SET-EOF will reduce or eliminate the race window.
For FALLOC_FL_COLLAPSE/INSERT_RANGE, we could compound the COPYCHUNK and
SET-EOF. As long as the SET-EOF won't happen if the COPYCHUNK fails, this
will reduce the race.
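The behaviour hoped for here can be written down as a toy model. This is only a sketch of the semantics being asked about, namely that a failed component causes the later related ones to be skipped; it does not claim to describe what any SMB server actually does. All names (toy_file, run_related and so on) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum op_status { OP_OK, OP_FAILED, OP_SKIPPED };

/* Toy stand-in for a server-side file; not an SMB structure. */
struct toy_file {
	uint64_t eof;
	int      fail_zero;  /* force the zeroing step to fail */
};

typedef int (*toy_op)(struct toy_file *f, uint64_t arg);  /* 0 on success */

/* Pretend to zero data up to 'end'; only success or failure is modelled. */
int toy_zero_data(struct toy_file *f, uint64_t end)
{
	(void)end;
	return f->fail_zero ? -1 : 0;
}

/* Pretend SET-EOF: just record the new size. */
int toy_set_eof(struct toy_file *f, uint64_t new_eof)
{
	f->eof = new_eof;
	return 0;
}

/* Assumed semantics: run the ops in order and, once one fails,
 * mark every later op as skipped without executing it. */
void run_related(struct toy_file *f, const toy_op *ops,
		 const uint64_t *args, size_t n, enum op_status *status)
{
	int failed = 0;

	assert(f && ops && args && status);
	for (size_t i = 0; i < n; i++) {
		if (failed) {
			status[i] = OP_SKIPPED;
		} else if (ops[i](f, args[i]) == 0) {
			status[i] = OP_OK;
		} else {
			status[i] = OP_FAILED;
			failed = 1;
		}
	}
}
```

Under this model a failed zeroing step leaves the EOF untouched, which is the property the compounded ZERO_RANGE and COLLAPSE/INSERT emulations would rely on.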
However, for COLLAPSE/INSERT, we can go further: recognise the { COPYCHUNK,
SET-EOF } compound on the server and see if the file positions, chunk length,
EOF and future EOF are consistent with a collapse/insert request and, if so,
convert the pair of them to a single fallocate() call and try that; if that
fails, fall back to copy_file_range() and ftruncate().
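The consistency check might look roughly like the following. This is a sketch under assumptions: the function and type names are hypothetical, the inputs are taken to be the plain byte offsets and lengths from the two requests, and it does not check the filesystem block-size alignment that fallocate() also requires for collapse and insert.

```c
#include <assert.h>
#include <stdint.h>

enum range_kind { RANGE_NONE, RANGE_COLLAPSE, RANGE_INSERT };

/* Hypothetical result type: the fallocate() call the pair maps to. */
struct range_match {
	enum range_kind kind;
	uint64_t offset;  /* offset argument for fallocate() */
	uint64_t len;     /* len argument for fallocate() */
};

struct range_match match_copy_seteof(uint64_t src_off, uint64_t dst_off,
				     uint64_t chunk_len, uint64_t cur_eof,
				     uint64_t new_eof)
{
	struct range_match m = { RANGE_NONE, 0, 0 };

	/* The copy must move a non-empty tail ending exactly at the
	 * current EOF. */
	if (chunk_len == 0 || chunk_len > cur_eof ||
	    src_off != cur_eof - chunk_len)
		return m;

	/* The new EOF must sit exactly at the end of the moved tail. */
	if (dst_off > UINT64_MAX - chunk_len ||
	    new_eof != dst_off + chunk_len)
		return m;

	if (dst_off < src_off) {
		/* Tail moved down: bytes [dst_off, src_off) are removed. */
		m.kind   = RANGE_COLLAPSE;
		m.offset = dst_off;
		m.len    = src_off - dst_off;
	} else if (dst_off > src_off) {
		/* Tail moved up: a gap opens at [src_off, dst_off). */
		m.kind   = RANGE_INSERT;
		m.offset = src_off;
		m.len    = dst_off - src_off;
	}
	assert(m.kind == RANGE_NONE || m.len > 0);
	return m;
}
```

Anything that comes back as RANGE_NONE would simply be processed as an ordinary copy followed by a set-EOF.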
As an alternative, at least for removing the 3rd-party races, is it possible
to make sure we have an appropriate oplock around the two components in each
case? It would mean potentially more trips to the server, but would remove
the window, I think.
David