From: Dave Chinner <david@fromorbit.com>
To: Avi Kivity <avi@scylladb.com>
Cc: linux-xfs@vger.kernel.org, Glauber Costa <glauber@scylladb.com>,
Raphael Carvalho <raphaelsc@scylladb.com>
Subject: Re: Intermittent zeroed pages with AIO+DIO+XFS
Date: Fri, 4 Aug 2017 13:14:10 +1000
Message-ID: <20170804031410.GC21024@dastard>
In-Reply-To: <9a63f206-d026-eae4-9556-957ef94855b4@scylladb.com>
On Fri, Aug 04, 2017 at 05:40:07AM +0300, Avi Kivity wrote:
> On 08/04/2017 01:09 AM, Dave Chinner wrote:
> >On Thu, Aug 03, 2017 at 05:52:45PM +0300, Avi Kivity wrote:
> >>Hello,
> >>
> >Hi Avi,
> >
> >>I have an application that uses AIO+DIO to write data to a file on
> >>XFS. The writes use 128k buffers. Very rarely, I see aligned 4k
> >>blocks within the file that are zeroed. The blocks are not aligned
> >>to 128k boundary, just 4k. The buffers are allocated in anonymous
> >>memory, which is usually using transparent hugepages. The files are
> >>fully allocated, not sparse (checked post-mortem).
> >Did you check that the extents are written? i.e. there aren't
> >sporadic 4k unwritten extents in the file? (xfs_bmap -vvp output)
>
> Raphael did that, and the result was that the file was NOT sparse.
Sure, but a file with unwritten extents is not sparse. It's just got
extents that will always read as zeros. The extra "-vvp" output
tells you the unwritten flag state and does not merge contiguous
extents that differ only in state.
i.e:
$ sudo xfs_io -fd -c "falloc 0 1M" -c "pwrite 900k 200k" /mnt/scratch/foo
wrote 204800/204800 bytes at offset 921600
200 KiB, 50 ops; 0.0000 sec (13.838 MiB/sec and 3542.5818 ops/sec)
$ sudo xfs_bmap /mnt/scratch/foo
/mnt/scratch/foo:
0: [0..2199]: 160..2359
Looks fully allocated. However:
$ sudo xfs_bmap -vvp /mnt/scratch/foo
/mnt/scratch/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..1799]:       160..1959         0 (160..1959)       1800 010000
   1: [1800..2199]:    1960..2359        0 (1960..2359)       400 000000
 FLAG Values:
    0100000 Shared extent
    0010000 Unwritten preallocated extent
    0001000 Doesn't begin on stripe unit
    0000100 Doesn't end on stripe unit
    0000010 Doesn't begin on stripe width
    0000001 Doesn't end on stripe width
$
The first 900k of the file is an unwritten extent, which returns
zeros...
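
For large files it is easier to let a script pick out the unwritten
ranges than to read the flags by eye. A minimal sketch in Python,
assuming the "-vvp" output layout shown above and that FILE-OFFSET is
reported in 512-byte units; the function name unwritten_ranges is made
up for illustration and is not part of xfsprogs:

```python
import re

UNWRITTEN = 0o0010000  # "Unwritten preallocated extent" in the FLAG Values legend
SECTOR = 512           # assumption: FILE-OFFSET is in 512-byte units

# Matches the FILE-OFFSET field of an extent row, e.g. "[0..1799]:"
OFFSETS = re.compile(r"\[(\d+)\.\.(\d+)\]:")


def unwritten_ranges(bmap_output):
    """Return (start_byte, end_byte_exclusive) for each unwritten extent."""
    ranges = []
    for line in bmap_output.splitlines():
        fields = line.split()
        # Extent rows have 7 fields: EXT, FILE-OFFSET, BLOCK-RANGE, AG,
        # AG-OFFSET, TOTAL, FLAGS. Header, legend and hole rows are skipped.
        m = OFFSETS.fullmatch(fields[1]) if len(fields) == 7 else None
        if not m:
            continue
        first, last = int(m.group(1)), int(m.group(2))
        flags = int(fields[6], 8)  # FLAGS is printed as octal digits
        if flags & UNWRITTEN:
            ranges.append((first * SECTOR, (last + 1) * SECTOR))
    return ranges
```

Fed the output above, this reports one unwritten range covering bytes
0 to 921600, i.e. the first 900k of the file. Parsing FLAGS as octal
means "010000" in the extent row and "0010000" in the legend are
treated as the same bit.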
> btw, we also run with the extent size hint set to 32MB.
Which means that space is definitely being allocated as unwritten
extents, then overwritten and converted on IO completion. Hence if
the overwrite is not complete, or there's a bug in the unwritten
extent conversion, it may leave unwritten extents where it
shouldn't....
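
One way to tell those cases apart after the fact is to check whether
each zeroed 4k block sits inside an extent that is still marked
unwritten. A rough sketch in Python, assuming the unwritten byte ranges
have already been extracted from "xfs_bmap -vvp" output as (start, end)
pairs; zeroed_blocks and classify are hypothetical helper names, not
existing tools:

```python
BLOCK = 4096  # size of the zeroed blocks reported in this thread


def zeroed_blocks(path, block=BLOCK):
    """Yield byte offsets of aligned blocks that read back as all zeros."""
    zero = bytes(block)
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            # A short final chunk is compared against a zero run of equal length.
            if chunk == zero[:len(chunk)]:
                yield offset
            offset += len(chunk)


def classify(zero_offsets, unwritten, block=BLOCK):
    """Split zeroed block offsets by whether an unwritten extent covers them.

    `unwritten` is a list of (start_byte, end_byte_exclusive) pairs.
    """
    inside, outside = [], []
    for off in zero_offsets:
        covered = any(start <= off and off + block <= end
                      for start, end in unwritten)
        (inside if covered else outside).append(off)
    return inside, outside
```

Zeroed blocks that land inside an unwritten extent would suggest the
overwrite or the conversion never completed for that range. Zeroed
blocks in extents that are marked written would suggest zeros really
were written, or returned by something below the filesystem. Blocks
that legitimately contain zeros show up too, so the result needs to be
compared against what the application expected to write.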
> >What kernel version is this seen on? We've changed the XFS DIO
> >IO path implementation substantially in recent times....
>
> CentOS 7.2's kernel. Glauber, do you know the precise version string?
Can you reproduce on an upstream kernel? Problems with highly
patched distro kernels really need to be directed to the distro...
> >>Does this trigger anything in anyone's mind?
> >Nope - do you have a reproducer you can share?
> >
>
> Run a certain NoSQL database for months on a cluster with lots of
> activity, and you _may_ see it a few times. It's very rare, but it's
> there.
Needle in a haystack, then - the problem could be anywhere in the
storage stack, including hardware. You're going to need to
isolate the problem to the filesystem for us, which means a
reproducer script of some kind...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com