From: Jan Kara <jack@suse.cz>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: Re: Strange SEEK_HOLE / SEEK_DATA behavior
Date: Mon, 26 Oct 2020 17:48:10 +0100
Message-ID: <20201026164810.GI28769@quack2.suse.cz>
In-Reply-To: <20201026151404.GR20115@casper.infradead.org>

On Mon 26-10-20 15:14:04, Matthew Wilcox wrote:
> On Mon, Oct 26, 2020 at 03:57:10PM +0100, Jan Kara wrote:
> > Hello!
> > 
> > When reviewing Matthew's THP patches I've noticed one odd behavior which
> > got copied from current iomap seek hole/data helpers. Currently we have:
> > 
> > # fallocate -l 4096 testfile
> > # xfs_io -x -c "seek -h 0" testfile
> > Whence	Result
> > HOLE	0
> > # dd if=testfile bs=4096 count=1 of=/dev/null
> > # xfs_io -x -c "seek -h 0" testfile
> > Whence	Result
> > HOLE	4096
> > 
> > So once we read from an unwritten extent, the areas with cached pages
> > suddenly become treated as data. Later when pages get evicted, they become
> > treated as holes again. Strictly speaking I wouldn't say this is a bug
> > since nobody promises we won't treat holes as data but it looks weird.
> > Shouldn't we treat clean pages over unwritten extents still as holes and
> > only once the page becomes dirty treat it as data? What do other people
> > think?
> 
> I think we actually discussed this recently.  Unless I misunderstood
> one or both messages:
> 
> https://lore.kernel.org/linux-fsdevel/20201014223743.GD7391@dread.disaster.area/

Thanks for the link. That indeed explains it: the concern is that if we
checked PageDirty as I suggested, the check would be racy (the page could
have been written out just before we found it but after we received the
block mapping from the filesystem). So using PageUptodate is less racy
(although still somewhat racy, because the page could also be reclaimed).
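
For anyone who wants to poke at this from userspace without xfs_io, here is
a minimal sketch of the same experiment in Python. It assumes Linux (for
os.SEEK_HOLE) and a filesystem that supports preallocation; the names
first_hole() and probe() are made up for illustration. What it reports
depends on the filesystem and kernel, so treat it as a probe rather than a
statement of what the result should be.

```python
import os
import tempfile

SIZE = 4096  # same length as "fallocate -l 4096" in the report


def first_hole(fd):
    """Return the offset of the first hole at or after offset 0."""
    return os.lseek(fd, 0, os.SEEK_HOLE)


def probe(directory=None):
    """Report SEEK_HOLE from offset 0 before and after a buffered read.

    'directory' selects where (and so on which filesystem) the
    temporary test file is created.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Preallocate space without writing data. On filesystems with
        # unwritten extents this mirrors the fallocate step; elsewhere
        # the C library may fall back to writing zeroes.
        os.posix_fallocate(fd, 0, SIZE)
        before = first_hole(fd)
        # A buffered read may leave clean, uptodate pages in the page cache.
        os.pread(fd, SIZE, 0)
        after = first_hole(fd)
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    before, after = probe()
    print("HOLE before read:", before)
    print("HOLE after read: ", after)
```

On a filesystem that shows the behavior described above, I would expect the
two offsets to differ (0 before the read, 4096 after); on others they may
well be the same.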

> I agree it's not great, but I'm not sure it's worth getting it "right"
> by tracking whether a page contains only zeroes.

Yeah, I don't think it's worth it just for this.

> I have been vaguely thinking about optimising for read-mostly workloads
> on sparse files by storing a magic entry that means "use the zero
> page" in the page cache instead of a page, like DAX does (only better).
> It hasn't risen to the top of my list yet.  Does anyone have a workload
> that would benefit from it?
> 
> (I don't mean "can anybody construct one"; that's trivially possible.
> I mean, do any customers care about the performance of that workload?)

No workload comes to my mind now.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
