From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id mBG5elP9008712 for ; Mon, 15 Dec 2008 23:40:48 -0600 Received: from sandeen.net (localhost [127.0.0.1]) by cuda.sgi.com (Spam Firewall) with ESMTP id 2DDFF1739D88 for ; Mon, 15 Dec 2008 21:40:45 -0800 (PST) Received: from sandeen.net (sandeen.net [209.173.210.139]) by cuda.sgi.com with ESMTP id hF8NEHhdFTTeO0AH for ; Mon, 15 Dec 2008 21:40:45 -0800 (PST) Message-ID: <49473F5C.3070308@sandeen.net> Date: Mon, 15 Dec 2008 23:40:44 -0600 From: Eric Sandeen MIME-Version: 1.0 Subject: Re: [PATCH] fix corruption case for block size < page size References: <49435F35.40109@sandeen.net> <4943FCD7.2010509@sandeen.net> <494735D9.8020809@sgi.com> In-Reply-To: <494735D9.8020809@sgi.com> List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: xfs-bounces@oss.sgi.com Errors-To: xfs-bounces@oss.sgi.com To: lachlan@sgi.com Cc: xfs-oss Lachlan McIlroy wrote: > I'm still working through this Eric so I don't fully understand what's > going on. > > It looks to me like the region was never zeroed at all. In > xfs_zero_last_block() we only zero up to the end of the last block > (hence the name) but if the last page extends beyond that last > block we wont zero that extra space in the page. If that remaining > space in the page sits over a hole then xfs_zero_eof() wont zero it > either. So the testcase steps are (8 blocks per page here). 1: |1111|1111|1111|1111|1111|1111|1111|1111| write 1's 2: |RRRR|1111|1111|1111|1111|1111|1111|1111| mapread first block |<--------trunc----------------------- 3: |1100| trunc down to 1/2 block trunc-->| 4: |1100| trunc up to block+1byte 5: | | | | |2222| write 2's (extending) And we get: # |1100|0000|1111|1111|2222|----|----|----| But we should get: # |1100|HHHH|HHHH|HHHH|2222|----|----|----| (where "H" means hole) So no, the data for blocks 1,2,3 is never zeroed, nor should it be, I think - we never partially write those blocks. And the stale 1's coming through in blocks 2 & 3 are not the result of non-zero buffer heads getting written to disk in the last stage; they are simply stale disk blocks from the first step mapped back into the file. (They were originally written as 1's in the first step.) > In your example above the last write extends the file size from 513 > bytes to 2048 bytes. In xfs_zero_last_block() we'll only zero from > 513 up to 1024 bytes (ie up to the end of the last block) but leave > the rest of the page untouched. Actually; after the truncate down step (3) we should have: |<--------trunc----------------------- 3: |11??| trunc down to 1/2 block ^ | EOF Hm, but does the end of this block get zeroed now or only when we subsequently extend the size? The latter I think...? So I think in the next step: trunc-->| 4: |1100| trunc up to block+1byte ^^ now || this part of the block gets zeroed, right, by xfs_zero_eof? > Because of the truncate to 256 bytes > only the first block is allocated and everything beyond 512 bytes is > a hole. Yep, up until the last write anyway. > More specifically there is a hole under the remainder of the > page so xfs_zero_eof() will skip that region and not zero anything. Well, the last write (step 5) is still completely within the page... Right, that's what it *should* be doing; but in page_state_convert (and I'll admit to not having this 100% nailed down) we write block 1 and map blocks 2 & 3 back into the file, and get: # |1100|0000|1111|1111|2222|----|----|----| ^^^^ ^^^^ where these |||| |||| blocks are stale data, and block 1 is written (but at least zeroed). How block 1 got zeroed I guess I'm not quite certain yet. But it does not appear that blocks 2 and 3 get *written* any time other than step 1; blktrace seems to confirm this. block 1 does get written, and 0s are written. (But I don't think this block ever should get written either; EOF landed there but only via truncate, not a write). Crap, now you've got me slightly confused again, and I'll need to look a bit more to be sure I'm 100% clear on what's getting zeroed and when vs. what's getting mapped and why. :) -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs