public inbox for linux-ext4@vger.kernel.org
From: "Theodore Ts'o" <tytso@mit.edu>
To: Christoph Hellwig <hch@infradead.org>
Cc: Zhang Yi <yi.zhang@huaweicloud.com>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, viro@zeniv.linux.org.uk,
	brauner@kernel.org, jack@suse.cz, djwong@kernel.org,
	adilger.kernel@dilger.ca, yi.zhang@huawei.com,
	chengzhihao1@huawei.com, yukuai3@huawei.com,
	yangerkun@huawei.com, Sai Chaitanya Mitta <mittachaitu@gmail.com>,
	linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 1/2] fs: introduce FALLOC_FL_FORCE_ZERO to fallocate
Date: Mon, 6 Jan 2025 11:17:32 -0500
Message-ID: <20250106161732.GG1284777@mit.edu>
In-Reply-To: <Z3u-OCX86j-q7JXo@infradead.org>

On Mon, Jan 06, 2025 at 03:27:52AM -0800, Christoph Hellwig wrote:
> There's a feature request for something similar on the xfs list, so
> I guess people are asking for it.

Yeah, I have folks asking for this on the ext4 side as well.

The one caution that I've given to them is that there is no guarantee
what the performance will be for WRITE SAME or equivalent operations,
since the standards documents state that performance is out of scope
for the document.  So in some cases, WRITE SAME might be fast (if for
example it is just adjusting FTL metadata on an SSD, or some similar
thing on cloud-emulated block devices such as Google's Persistent Disk
or Amazon's Elastic Block Store --- what Darrick has called "software
defined storage" for the cloud), but in other hardware deployments,
WRITE SAME might be as slow as writing zeros to an HDD.

This is technically not the kernel's problem, since we can also use
the same mealy-mouthed "performance is out of scope and not the
kernel's concern" line, but that just transfers the problem to the
application programmers.  I could imagine some kind of tunable with
which we can make the block device pretend that it really doesn't
support WRITE SAME if the performance characteristics are such that
it's a Bad Idea to use it, so that there's a single knob that the
system administrator can reach for, as opposed to PostgreSQL, MySQL,
Oracle Enterprise Database, etc. each having its own way of
configuring whether or not to disable WRITE SAME, but that's not
something we need to decide right away.

> That being said this really should not be a modifier but a separate
> operation, as the logic is very different from FALLOC_FL_ZERO_RANGE,
> similar to how plain prealloc, hole punch and zero range are different
> operations despite all of them resulting in reads of zeroes from the
> range.

Yes.  And we might decide that it should be done using some kind of
ioctl, such as BLKDISCARD, as opposed to a new fallocate operation,
since it really isn't a filesystem metadata operation, just as
BLKDISCARD isn't.  The other side of the argument is that ioctls are
ugly, and maybe all new such operations should be plumbed through via
fallocate as opposed to adding a new ioctl.  I don't have strong
feelings on this, although I *do* believe that whatever interface we
use, whether it be fallocate or ioctl, it should be supported for both
block devices and files in a file system, to make life easier for
those databases that want to support running on a raw block device
(for full-page advertisements on the back cover of the Businessweek
magazine) or on files (which is how 99.9% of all real-world users
actually run enterprise databases).  :-)
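
To make the "one interface for both block devices and files" point a
bit more concrete, here is a rough sketch of what a database has to do
today using only interfaces that already exist.  It is not the
operation proposed in this thread: FALLOC_FL_ZERO_RANGE may simply
convert the range to unwritten extents, and BLKZEROOUT may or may not
be offloaded to the device.  The helper names zero_range() and
write_zeros() are made up for illustration, and the fallback to
writing explicit zeros is an assumption about what an application
would want, not something the kernel does for you.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/falloc.h>	/* FALLOC_FL_ZERO_RANGE */
#include <linux/fs.h>		/* BLKZEROOUT */

/* Hypothetical slow path: write explicit zeros, like zeroing an HDD. */
static int write_zeros(int fd, off_t off, off_t len)
{
	static const char zeros[65536];

	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(zeros) ?
			       (size_t)len : sizeof(zeros);
		ssize_t n = pwrite(fd, zeros, chunk, off);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		off += n;
		len -= n;
	}
	return 0;
}

/*
 * Hypothetical helper: make [off, off + len) read back as zeros on
 * either a block device or a regular file.  Returns 0 on success,
 * -1 with errno set on failure.
 */
static int zero_range(int fd, off_t off, off_t len)
{
	struct stat st;

	assert(off >= 0 && len > 0);
	if (fstat(fd, &st) < 0)
		return -1;

	if (S_ISBLK(st.st_mode)) {
		/* Block device: the ioctl takes { start, length } in bytes. */
		uint64_t range[2] = { (uint64_t)off, (uint64_t)len };

		return ioctl(fd, BLKZEROOUT, range);
	}

	/* Regular file: ask the file system first. */
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len) == 0)
		return 0;
	if (errno != EOPNOTSUPP)
		return -1;

	/* File system can't do it (e.g. tmpfs): fall back to writes. */
	return write_zeros(fd, off, len);
}
```

The caller just does zero_range(fd, offset, length) and does not care
what kind of fd it has.  The two branches inside are exactly the split
that a common interface would let applications stop carrying around.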

						- Ted

Thread overview: 13+ messages
2024-12-28  1:45 [RFC PATCH 0/2] fallocate: introduce FALLOC_FL_FORCE_ZERO flag Zhang Yi
2024-12-28  1:45 ` [RFC PATCH 1/2] fs: introduce FALLOC_FL_FORCE_ZERO to fallocate Zhang Yi
2025-01-06 11:27   ` Christoph Hellwig
2025-01-06 16:17     ` Theodore Ts'o [this message]
2025-01-06 16:27       ` Christoph Hellwig
2025-01-06 17:31         ` Darrick J. Wong
2025-01-06 18:06           ` Christoph Hellwig
2025-01-07 14:05           ` Zhang Yi
2025-01-07 16:42             ` Christoph Hellwig
2025-01-08  1:20               ` Zhang Yi
2025-01-07 11:22       ` Zhang Yi
2025-01-07 12:38     ` Zhang Yi
2024-12-28  1:45 ` [RFC PATCH 2/2] ext4: add FALLOC_FL_FORCE_ZERO support Zhang Yi
