* fsx failure on 3.10.0-rc1+ (xfstests 263) -- Mapped Read: non-zero data past EOF

From: Brian Foster @ 2013-06-10 13:17 UTC
To: xfs

Hi guys,

I wanted to get this onto the list... I suspect this could be
similar/related to the issue reported here:

http://oss.sgi.com/archives/xfs/2013-06/msg00066.html

While running xfstests, the only apparent regression I hit from 3.9.0 was
generic/263. This test fails due to the following command (and resulting
output):

# fsx -N 10000 -o 128000 -l 500000 -r 4096 -t 512 -w 512 -Z /mnt/junk
truncating to largest ever: 0x12a00
truncating to largest ever: 0x75400
fallocating to largest ever: 0x7a120
Mapped Read: non-zero data past EOF (0x79dff)
page offset 0xe00 is 0xe927
offset = 0x78000, size = 0x1220
LOG DUMP (7966 total operations):
...
7959( 23 mod 256): TRUNCATE DOWN from 0x2d200 to 0x1e200
7960( 24 mod 256): MAPWRITE 0x54800 thru 0x655e7 (0x10de8 bytes)
7961( 25 mod 256): FALLOC   0x448b4 thru 0x5835d (0x13aa9 bytes) INTERIOR
7962( 26 mod 256): WRITE    0x8200 thru 0xb7ff (0x3600 bytes)
7963( 27 mod 256): READ     0x61000 thru 0x64fff (0x4000 bytes)
7964( 28 mod 256): MAPREAD  0x6000 thru 0xe5fe (0x85ff bytes)
7965( 29 mod 256): WRITE    0x6ca00 thru 0x79dff (0xd400 bytes) HOLE
7966( 30 mod 256): MAPREAD  0x78000 thru 0x7921f (0x1220 bytes)
Correct content saved for comparison
(maybe hexdump "/mnt/junk" vs "/mnt/junk.fsxgood")

So if I'm following that correctly, we truncate the file down to 0x1e200,
extend it with an mmap write to 0x655e7, do a couple of internal
reads/writes, extend to 0x79dff with a direct write, and hit stale data on
an mmap read at EOF.
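As a sanity check on the report (a sketch, not fsx code; 4k pages assumed, matching the -r 4096 read size above): the EOF established by op 7965 lands at offset 0xe00 within its page, which appears to match the "page offset 0xe00" that fsx complains about -- the nonzero byte is the first byte past EOF in the mapped page.

```python
# Reproducing the page arithmetic behind the fsx failure report above
# (hypothetical helper names; 4k page size assumed).
PAGE_SIZE = 0x1000

eof = 0x79dff + 1                   # file size after op 7965 (WRITE thru 0x79dff)
page_base = eof & ~(PAGE_SIZE - 1)  # the page the failing MAPREAD touches
offset_in_page = eof - page_base

print(hex(page_base))               # 0x79000
print(hex(offset_in_page))          # 0xe00 -- matches "page offset 0xe00"
print(hex(0x7921f - 0x78000 + 1))   # 0x1220 -- matches "offset = 0x78000, size = 0x1220"
```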
Post-mortem on the file:

# stat /mnt/junk
  File: `/mnt/junk'
  Size: 499200    Blocks: 704    IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 131    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-06-05 11:04:04.968000000 -0400
Modify: 2013-06-05 11:04:04.967000000 -0400
Change: 2013-06-05 11:04:04.967000000 -0400
 Birth: -

# xfs_bmap -v /mnt/junk
junk:
 EXT: FILE-OFFSET    BLOCK-RANGE    AG AG-OFFSET      TOTAL FLAGS
   0: [0..31]:       96..127         0 (96..127)         32
   1: [32..127]:     256..351        0 (256..351)        96
   2: [128..223]:    896..991        0 (896..991)        96
   3: [224..543]:    hole                                320
   4: [544..1023]:   1312..1791      0 (1312..1791)     480

I ran a bisect between tot and 3.9 and narrowed it down to:

e114b5fc xfs: increase prealloc size to double that of the previous extent

... though IIRC, this was an additive fix on top of the recent sparse
speculative prealloc updates, so it might not be much more than a data
point:

a1e16c26 xfs: limit speculative prealloc size on sparse files

This is interesting from a release perspective in that the latter change
is included in 3.9 and the former fix is not. Therefore, I went back to
3.8 and found I can reproduce it there as well. FWIW, I can also reproduce
this on tot with allocsize=131072 (though it appears to be intermittent)
and on 3.8 under similar circumstances (w/ speculative prealloc or
allocsize >= 131072).

Given all that, my speculation at this point is that the more recent
prealloc changes probably don't introduce the core issue, but rather alter
the behavior enough to determine whether this test case triggers it.

Brian

P.S., I also came across the following thread which, if related, suggests
this might be known/understood to a degree:

http://oss.sgi.com/archives/xfs/2012-04/msg00703.html

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
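For reference, xfs_bmap -v reports ranges in 512-byte basic blocks. Converting the extent list above to byte ranges (a sketch) locates the hole and shows that allocation runs well past the 499200-byte (0x79e00) EOF:

```python
# Convert the xfs_bmap extent list above from 512-byte basic blocks to
# byte ranges (sketch; extent 3 is the hole).
BBSHIFT = 9  # xfs_bmap -v units are 512-byte basic blocks

extents = [(0, 31), (32, 127), (128, 223), (224, 543), (544, 1023)]
for lo, hi in extents:
    print(f"[{lo}..{hi}] -> bytes {lo << BBSHIFT:#x}..{((hi + 1) << BBSHIFT) - 1:#x}")

# The hole at [224..543] spans bytes 0x1c000..0x43fff, and the last extent
# runs out to byte 0x7ffff -- i.e. blocks exist beyond the 0x79e00 EOF,
# consistent with speculative preallocation left past end of file.
```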
* Re: fsx failure on 3.10.0-rc1+ (xfstests 263) -- Mapped Read: non-zero data past EOF

From: Dave Chinner @ 2013-06-10 21:31 UTC
To: Brian Foster; +Cc: xfs

On Mon, Jun 10, 2013 at 09:17:31AM -0400, Brian Foster wrote:
> Hi guys,
>
> I wanted to get this onto the list... I suspect this could be
> similar/related to the issue reported here:
>
> http://oss.sgi.com/archives/xfs/2013-06/msg00066.html

Unlikely - generic/263 tests mmap IO vs direct IO, and Sage's problem has
neither...

> While running xfstests, the only apparent regression I hit from 3.9.0
> was generic/263. This test fails due to the following command (and
> resulting output):

Not a regression - 263 has been failing ever since it was introduced in
2011 by:

commit 0d69e10ed15b01397e8c6fd7833fa3c2970ec024
Author: Christoph Hellwig <hch@infradead.org>
Date:   Mon Oct 10 18:22:16 2011 +0000

    split mapped writes vs direct I/O tests from 091

    This effectively reverts "xfstests: add mapped write fsx operations
    to 091" and adds a new test case for it. It tests something slightly
    different, and regressions in existing tests due to new features are
    pretty nasty in a test suite.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Alex Elder <aelder@sgi.com>

It is testing mmap() writes vs direct IO, something that is known to be
fundamentally broken (i.e. racy), as the mmap() page fault path does not
hold the XFS_IOLOCK or i_mutex in any way. The direct IO path tries to
work around this by flushing and invalidating cached pages before IO
submission, but the lack of locking in the page fault path means we can't
avoid the race entirely.
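The race described here can be sketched with a deliberately simplified toy model (illustrative Python only, not kernel code; pages are collapsed to single bytes and the interleaving is scripted rather than threaded): the direct-IO path invalidates the page cache before touching disk, but an unlocked page fault can repopulate the cache in between, so a later mapped read sees the stale byte.

```python
# Toy model of the DIO-vs-page-fault race (illustrative only): a "page
# cache" dict layered over a "disk" bytearray.
class ToyFile:
    def __init__(self, disk):
        self.disk = bytearray(disk)
        self.cache = {}                 # offset -> cached byte

    def fault_read(self, off):
        # page-fault path: fills the cache from disk; it takes no IO lock,
        # so nothing orders it against a concurrent direct IO
        if off not in self.cache:
            self.cache[off] = self.disk[off]
        return self.cache[off]

f = ToyFile(b"\x00" * 8)
f.disk[3] = 0xe9                        # stale on-disk garbage (think: past EOF)

# Direct-IO write of zeroes, with a page fault racing in between its
# invalidate step and its disk write:
f.cache.pop(3, None)                    # DIO step 1: invalidate cached page
f.fault_read(3)                         # racing fault caches the stale 0xe9
f.disk[3] = 0x00                        # DIO step 2: new data reaches disk

print(hex(f.fault_read(3)))             # 0xe9 -- the mapped read sees stale data
```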
> P.S., I also came across the following thread which, if related,
> suggests this might be known/understood to a degree:
>
> http://oss.sgi.com/archives/xfs/2012-04/msg00703.html

Yup, that's potentially one aspect of it. However, have you run the test
code on ext3/4? It works just fine - it's only XFS that has problems with
this case, so it's not clear that this is a DIO problem. I was never able
to work out where ext3/ext4 were zeroing the part of the page beyond EOF,
and I couldn't ever make the DIO code reliably do the right thing. It's
one of the reasons that led to this discussion at LSFMM:

http://lwn.net/Articles/548351/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: fsx failure on 3.10.0-rc1+ (xfstests 263) -- Mapped Read: non-zero data past EOF

From: Brian Foster @ 2013-06-10 23:17 UTC
To: Dave Chinner; +Cc: xfs

On 06/10/2013 05:31 PM, Dave Chinner wrote:
> On Mon, Jun 10, 2013 at 09:17:31AM -0400, Brian Foster wrote:
>> Hi guys,
>>
>> I wanted to get this onto the list... I suspect this could be
>> similar/related to the issue reported here:
>>
>> http://oss.sgi.com/archives/xfs/2013-06/msg00066.html
>
> Unlikely - generic/263 tests mmap IO vs direct IO, and Sage's
> problem has neither...

Oh, OK. I didn't look at that one closely enough then.

>> While running xfstests, the only apparent regression I hit from 3.9.0
>> was generic/263. This test fails due to the following command (and
>> resulting output):
>
> Not a regression - 263 has been failing ever since it was introduced
> in 2011 by:
>
> commit 0d69e10ed15b01397e8c6fd7833fa3c2970ec024
...
>
> It is testing mmap() writes vs direct IO, something that is known to
> be fundamentally broken (i.e. racy), as the mmap() page fault path
> does not hold the XFS_IOLOCK or i_mutex in any way. The direct IO
> path tries to work around this by flushing and invalidating cached
> pages before IO submission, but the lack of locking in the page fault
> path means we can't avoid the race entirely.

Thanks for the explanation.

>> P.S., I also came across the following thread which, if related,
>> suggests this might be known/understood to a degree:
>>
>> http://oss.sgi.com/archives/xfs/2012-04/msg00703.html
>
> Yup, that's potentially one aspect of it. However, have you run the
> test code on ext3/4? It works just fine - it's only XFS that has
> problems with this case, so it's not clear that this is a DIO
> problem.
> I was never able to work out where ext3/ext4 were zeroing
> the part of the page beyond EOF, and I couldn't ever make the DIO
> code reliably do the right thing. It's one of the reasons that led
> to this discussion at LSFMM:
>
> http://lwn.net/Articles/548351/

Interesting, thanks again. I did happen to run the script and the fsx test
on the ext4 rootfs of my VM and observed the expected behavior.

Note that I mentioned this was harder to reproduce with fixed alloc sizes
less than 128k or so. I don't believe ext4 does any kind of speculative
preallocation in the manner that XFS does. Perhaps that is a factor?

Brian

> Cheers,
>
> Dave.
* Re: fsx failure on 3.10.0-rc1+ (xfstests 263) -- Mapped Read: non-zero data past EOF

From: Dave Chinner @ 2013-06-10 23:42 UTC
To: Brian Foster; +Cc: xfs

On Mon, Jun 10, 2013 at 07:17:22PM -0400, Brian Foster wrote:
> On 06/10/2013 05:31 PM, Dave Chinner wrote:
> > On Mon, Jun 10, 2013 at 09:17:31AM -0400, Brian Foster wrote:
> >> Hi guys,
> >>
> >> I wanted to get this onto the list... I suspect this could be
> >> similar/related to the issue reported here:
> >>
> >> http://oss.sgi.com/archives/xfs/2013-06/msg00066.html
> >
> > Unlikely - generic/263 tests mmap IO vs direct IO, and Sage's
> > problem has neither...
>
> Oh, OK. I didn't look at that one closely enough then.
>
> >> While running xfstests, the only apparent regression I hit from 3.9.0
> >> was generic/263. This test fails due to the following command (and
> >> resulting output):
> >
> > Not a regression - 263 has been failing ever since it was introduced
> > in 2011 by:
> >
> > commit 0d69e10ed15b01397e8c6fd7833fa3c2970ec024
> ...
> >
> > It is testing mmap() writes vs direct IO, something that is known to
> > be fundamentally broken (i.e. racy), as the mmap() page fault path
> > does not hold the XFS_IOLOCK or i_mutex in any way. The direct IO
> > path tries to work around this by flushing and invalidating cached
> > pages before IO submission, but the lack of locking in the page
> > fault path means we can't avoid the race entirely.
>
> Thanks for the explanation.
>
> >> P.S., I also came across the following thread which, if related,
> >> suggests this might be known/understood to a degree:
> >>
> >> http://oss.sgi.com/archives/xfs/2012-04/msg00703.html
> >
> > Yup, that's potentially one aspect of it. However, have you run the
> > test code on ext3/4?
> > It works just fine - it's only XFS that has
> > problems with this case, so it's not clear that this is a DIO
> > problem. I was never able to work out where ext3/ext4 were zeroing
> > the part of the page beyond EOF, and I couldn't ever make the DIO
> > code reliably do the right thing. It's one of the reasons that led
> > to this discussion at LSFMM:
> >
> > http://lwn.net/Articles/548351/
>
> Interesting, thanks again. I did happen to run the script and the fsx
> test on the ext4 rootfs of my VM and observed the expected behavior.
>
> Note that I mentioned this was harder to reproduce with fixed alloc
> sizes less than 128k or so. I don't believe ext4 does any kind of
> speculative preallocation in the manner that XFS does. Perhaps that is
> a factor?

Oh, it most likely is, but XFS has done speculative prealloc since, well,
forever, so this isn't a regression as such. FWIW, the old default for
speculative prealloc was XFS_WRITEIO_LOG_LARGE (16 filesystem blocks), so
this test would have failed before any of the dynamic speculative alloc
changes were made....

Indeed, if you mount with -o allocsize=4k, you'll find the test case no
longer fails - it requires allocsize=32k (or larger) to fail here. That's
not surprising, given that the test is writing across a 16k-beyond-EOF
boundary when it triggers the problem, and so needs a prealloc size
of >16k to trigger...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
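Dave's arithmetic above can be restated as a small check (a sketch; 4k filesystem blocks are assumed, and `could_trigger` is a hypothetical name, not an XFS function):

```python
# Sketch of the allocsize threshold reasoning above (hypothetical helper;
# 4k filesystem blocks assumed).
FSB = 4096
OLD_DEFAULT_PREALLOC = 16 * FSB   # "16 filesystem blocks" = 64k with 4k blocks

def could_trigger(prealloc_bytes):
    # the failing write crosses a boundary ~16k beyond EOF, so only a
    # speculative prealloc larger than 16k can leave stale blocks out there
    return prealloc_bytes > 16 * 1024

print(could_trigger(OLD_DEFAULT_PREALLOC))  # True:  old 64k default failed too
print(could_trigger(32 * 1024))             # True:  allocsize=32k fails
print(could_trigger(4 * 1024))              # False: allocsize=4k passes
```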