From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:48574 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754325AbdERJr4 (ORCPT ); Thu, 18 May 2017 05:47:56 -0400 Date: Thu, 18 May 2017 17:47:53 +0800 From: Eryu Guan Subject: Re: [PATCH 0/3 v2] xfs: Fix SEEK_HOLE implementation Message-ID: <20170518094753.GL7250@eguan.usersys.redhat.com> References: <20170517121046.2632-1-jack@suse.cz> <20170517123115.GG7250@eguan.usersys.redhat.com> <20170517145746.GC5632@quack2.suse.cz> <20170518090346.GC9084@quack2.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170518090346.GC9084@quack2.suse.cz> Sender: linux-xfs-owner@vger.kernel.org List-ID: List-Id: xfs To: Jan Kara Cc: "Darrick J . Wong" , linux-xfs@vger.kernel.org, Brian Foster On Thu, May 18, 2017 at 11:03:46AM +0200, Jan Kara wrote: > On Wed 17-05-17 16:57:46, Jan Kara wrote: > > On Wed 17-05-17 20:31:15, Eryu Guan wrote: > > > Hi Jan, > > > > > > On Wed, May 17, 2017 at 02:10:43PM +0200, Jan Kara wrote: > > > > Hello, > > > > > > > > this is the second revision of the patches to fix bugs in XFS's SEEK_HOLE > > > > implementation and cleanup the code a bit. > > > > > > > > Changes since v1: > > > > * Fixed some more buggy cases > > > > * Simplified code a bit as suggested by Darrick > > > > * Fixed range check as spotted by Brian > > > > > > I applied this patchset on top of 4.12-rc1 kernel to test your v4 test > > > case, your new test passed all my tests, but I found generic/285 > > > regressed with sub-page block size XFS, 285.full showed that failure was > > > from subtest 7 > > > > > > 07. Test file with unwritten extents, only have dirty pages > > > 07.01 SEEK_HOLE expected 0 or 11264, got 0. succ > > > 07.02 SEEK_HOLE expected 1 or 11264, got 1. succ > > > 07.03 SEEK_DATA expected 10240 or 10240, got -1. FAIL > > > 07.04 SEEK_DATA expected 10240 or 10240, got -1. FAIL > > > > > > And manual test showed subtest 8 failed too > > > > > > # ./src/seek_sanity_test -s 8 -e 8 /mnt/xfs/testfile > > > File system magic#: 0x58465342 > > > Allocation size: 4096 > > > > > > 08. Test file with unwritten extents, only have unwritten pages > > > 08.01 SEEK_HOLE expected 0 or 5632, got 0. succ > > > 08.02 SEEK_HOLE expected 1 or 5632, got 1. succ > > > 08.03 SEEK_DATA expected 5120 or 5120, got -1. FAIL > > > 08.04 SEEK_DATA expected 5120 or 5120, got -1. FAIL > > > > > > Other subtests all passed with sub-page block size XFS. > > > > Strange. It doesn't fail for me this way even with 1k blocksize. I'll > > investigate more tomorrow. > > So I've been trying quite hard to reproduce the failure but I failed. Since > you are apparently getting some error out of lseek can you find out which > error it is (likely ENXIO but I'd like to confirm) and where it gets > generated? I don't see how it could possibly happen that SEEK_DATA would > miss that single page generated by this test and how any of my patches > would influence this particular situation. Thanks! Yes, it's ENXIO, strace log: ... open("/mnt/xfs/testfile07", O_RDWR|O_CREAT|O_TRUNC, 0644) = 3 write(1, "07. Test file with unwritten ext"..., 6007. Test file with unwritten extents, only have dirty pages ) = 60 fallocate(3, 0, 0, 11264) = 0 pwrite(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 1024, 10240) = 1024 lseek(3, 0, SEEK_HOLE) = 0 write(1, "07.01 SEEK_HOLE expected 0 or 11"..., 7107.01 SEEK_HOLE expected 0 or 11264, got 0. succ ) = 71 lseek(3, 1, SEEK_HOLE) = 1 write(1, "07.02 SEEK_HOLE expected 1 or 11"..., 7107.02 SEEK_HOLE expected 1 or 11264, got 1. succ ) = 71 lseek(3, 0, SEEK_DATA) = -1 ENXIO (No such device or address) write(1, "07.03 SEEK_DATA expected 10240 o"..., 7107.03 SEEK_DATA expected 10240 or 10240, got -1. FAIL ) = 71 lseek(3, 1, SEEK_DATA) = -1 ENXIO (No such device or address) write(1, "07.04 SEEK_DATA expected 10240 o"..., 7107.04 SEEK_DATA expected 10240 or 10240, got -1. FAIL ... It's the second patch "xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()" introduced the issue for me. ENXIO was returned from the if (nmap == 1) block in __xfs_seek_hole_data() /* * We only received one extent out of the two requested. This * means we've hit EOF and didn't find what we are looking for. */ if (nmap == 1) { .... /* * If we were looking for data, it's nowhere to be found */ ASSERT(whence == SEEK_DATA); error = -ENXIO; goto out_error; } Seems that's because the do {} while() loop in xfs_find_get_desired_pgoff() was broken out earlier due to patch 2. /* Searching done if the page index is out of range. */ if (page->index >= end) { goto out; } In my case, it returned earlier because page->index == end == 2. I was testing with 1k block size xfs. xfs_info output: meta-data=/dev/sdc2 isize=512 agcount=4, agsize=5242880 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1 spinodes=0 rmapbt=0 = reflink=0 data = bsize=1024 blocks=20971520, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=1 log =internal bsize=1024 blocks=10240, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 Hope this helps, if you need more info please let me know. Thanks, Eryu