From: Matthew Wilcox <willy@infradead.org>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-xfs@vger.kernel.org
Subject: Re: Strange SEEK_HOLE / SEEK_DATA behavior
Date: Mon, 26 Oct 2020 15:14:04 +0000
Message-ID: <20201026151404.GR20115@casper.infradead.org>
In-Reply-To: <20201026145710.GF28769@quack2.suse.cz>

On Mon, Oct 26, 2020 at 03:57:10PM +0100, Jan Kara wrote:
> Hello!
>
> When reviewing Matthew's THP patches I've noticed one odd behavior which
> got copied from current iomap seek hole/data helpers. Currently we have:
>
> # fallocate -l 4096 testfile
> # xfs_io -x -c "seek -h 0" testfile
> Whence Result
> HOLE 0
> # dd if=testfile bs=4096 count=1 of=/dev/null
> # xfs_io -x -c "seek -h 0" testfile
> Whence Result
> HOLE 4096
>
> So once we read from an unwritten extent, the areas with cached pages
> suddenly become treated as data. Later when pages get evicted, they become
> treated as holes again. Strictly speaking I wouldn't say this is a bug
> since nobody promises we won't treat holes as data but it looks weird.
> Shouldn't we treat clean pages over unwritten extents still as holes and
> only once the page becomes dirty treat it as data? What do other people
> think?
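The shell transcript quoted above (fallocate, seek -h, dd, seek -h) can be reproduced from C with plain lseek(SEEK_HOLE). The following is a minimal sketch; the helper name probe_hole_after_read() is hypothetical. It assumes Linux with glibc, where fallocate() and SEEK_HOLE need _GNU_SOURCE. The offsets it reports depend on the filesystem and kernel version. Jan's transcript shows 0 before the read and 4096 after it, but the sketch makes no claim about what any particular system returns.

```c
#define _GNU_SOURCE            /* for fallocate() and SEEK_HOLE */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the fallocate / seek / dd / seek sequence.
 * Creates (or truncates) 'path', preallocates 4096 bytes, and reports the
 * first SEEK_HOLE offset from 0 before and after a buffered read.
 * Returns 0 on success, -1 on any error.
 */
static int probe_hole_after_read(const char *path, off_t *before, off_t *after)
{
    char buf[4096];
    int fd;

    assert(path != NULL && before != NULL && after != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Like `fallocate -l 4096`: mode 0 allocates blocks and extends i_size.
     * On XFS and ext4 the new extent is unwritten. */
    if (fallocate(fd, 0, 0, sizeof(buf)) < 0)
        goto fail;

    *before = lseek(fd, 0, SEEK_HOLE);
    if (*before < 0)
        goto fail;

    /* Like the dd step: a buffered read brings clean, zero-filled pages
     * into the page cache over the unwritten extent. */
    if (pread(fd, buf, sizeof(buf), 0) < 0)
        goto fail;

    *after = lseek(fd, 0, SEEK_HOLE);
    if (*after < 0)
        goto fail;

    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

Because end-of-file counts as an implicit hole, a successful SEEK_HOLE from offset 0 on this 4096-byte file returns either 0 or 4096. Comparing *before and *after shows whether the cached pages changed the answer.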

I think we actually discussed this recently. Unless I misunderstood
one or both messages:
https://lore.kernel.org/linux-fsdevel/20201014223743.GD7391@dread.disaster.area/

I agree it's not great, but I'm not sure it's worth getting it "right"
by tracking whether a page contains only zeroes.
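Deciding that a cached page "contains only zeroes" means either scanning it or tracking every write into it. The per-page check is shown below as a userspace sketch, not kernel code. The name page_is_zero() is hypothetical, and in-kernel code would more likely use an existing helper such as memchr_inv(). The trick is that if the first byte is zero and each byte equals its successor, then every byte is zero. The cost is still a full O(len) read of the page each time it is checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if all 'len' bytes of 'buf' are zero.
 * Compares the buffer against itself shifted by one byte; memcmp() only
 * reads, so the overlapping ranges are fine.
 */
static bool page_is_zero(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return true;
    return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}
```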

I have been vaguely thinking about optimising for read-mostly workloads
on sparse files by storing a magic entry that means "use the zero
page" in the page cache instead of a page, like DAX does (only better).
It hasn't risen to the top of my list yet. Does anyone have a workload
that would benefit from it?

(I don't mean "can anybody construct one"; that's trivially possible.
I mean, do any customers care about the performance of that workload?)
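The "magic entry that means use the zero page" idea can be illustrated with a toy model. This is not how the page cache (an XArray) or DAX actually implements it. Everything here is hypothetical: the fixed slot array, the TOY_ZERO_ENTRY sentinel (a tagged non-pointer with the low bit set, loosely modelled on XArray value entries), and the helper names. A read of an uncached index installs the sentinel and returns one shared zero page without allocating anything. A write replaces the sentinel with a private page first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NR_SLOTS  16

/* Hypothetical sentinel: a tagged non-pointer (low bit set). It never
 * points at real memory. */
#define TOY_ZERO_ENTRY ((void *)(uintptr_t)1)

/* One shared, read-only page of zeroes backs every sentinel slot. */
static const unsigned char toy_zero_page[TOY_PAGE_SIZE] = { 0 };

/* Toy stand-in for one file's page cache: slot i caches page index i.
 * NULL means nothing is cached yet. */
struct toy_cache {
    void *slots[TOY_NR_SLOTS];
};

/* Read path: assume uncached indices lie over holes or unwritten extents,
 * so install the sentinel instead of allocating and zeroing a page. */
static const unsigned char *toy_read(struct toy_cache *c, size_t index)
{
    assert(index < TOY_NR_SLOTS);
    if (c->slots[index] == NULL)
        c->slots[index] = TOY_ZERO_ENTRY;
    if (c->slots[index] == TOY_ZERO_ENTRY)
        return toy_zero_page;
    return c->slots[index];
}

/* Write path: an empty or sentinel slot is replaced by a private,
 * zero-filled page before the caller modifies it. */
static unsigned char *toy_write_begin(struct toy_cache *c, size_t index)
{
    assert(index < TOY_NR_SLOTS);
    if (c->slots[index] == NULL || c->slots[index] == TOY_ZERO_ENTRY) {
        unsigned char *page = calloc(1, TOY_PAGE_SIZE);
        if (page == NULL)
            return NULL;
        c->slots[index] = page;
    }
    return c->slots[index];
}
```

In this model, a read-mostly workload over a sparse file costs one slot entry per page rather than one zeroed 4 KiB page per page.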

Thread overview: 3+ messages
2020-10-26 14:57 Strange SEEK_HOLE / SEEK_DATA behavior Jan Kara
2020-10-26 15:14 ` Matthew Wilcox [this message]
2020-10-26 16:48 ` Jan Kara