All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@sandeen.net>
To: lachlan@sgi.com
Cc: xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] fix corruption case for block size < page size
Date: Mon, 15 Dec 2008 23:40:44 -0600	[thread overview]
Message-ID: <49473F5C.3070308@sandeen.net> (raw)
In-Reply-To: <494735D9.8020809@sgi.com>

Lachlan McIlroy wrote:
> I'm still working through this Eric so I don't fully understand what's 
> going on.
> 
> It looks to me like the region was never zeroed at all.  In
> xfs_zero_last_block() we only zero up to the end of the last block
> (hence the name) but if the last page extends beyond that last
> block we wont zero that extra space in the page.  If that remaining
> space in the page sits over a hole then xfs_zero_eof() wont zero it
> either.

So the testcase steps are (8 blocks per page here).

1: |1111|1111|1111|1111|1111|1111|1111|1111|     write 1's

2: |RRRR|1111|1111|1111|1111|1111|1111|1111|     mapread first block

     |<--------trunc-----------------------
3: |1100|                                        trunc down to 1/2 block

 trunc-->|
4: |1100|                                        trunc up to block+1byte

5: |    |    |    |    |2222|                    write 2's (extending)

And we get:

# |1100|0000|1111|1111|2222|----|----|----|

But we should get:

# |1100|HHHH|HHHH|HHHH|2222|----|----|----|

(where "H" means hole)

So no, the data for blocks 1,2,3 is never zeroed, nor should it be, I
think - we never partially write those blocks.  And the stale 1's coming
through in blocks 2 & 3 are not the result of non-zero buffer heads
getting written to disk in the last stage; they are simply stale disk
blocks from the first step mapped back into the file.  (They were
originally written as 1's in the first step.)

> In your example above the last write extends the file size from 513
> bytes to 2048 bytes.  In xfs_zero_last_block() we'll only zero from
> 513 up to 1024 bytes (ie up to the end of the last block) but leave
> the rest of the page untouched.  

Actually; after the truncate down step (3) we should have:

     |<--------trunc-----------------------
3: |11??|                                       trunc down to 1/2 block
     ^
     |
    EOF

Hm, but does the end of this block get zeroed now or only when we
subsequently extend the size?  The latter I think...?

So I think in the next step:

 trunc-->|
4: |1100|                                        trunc up to block+1byte
      ^^
  now || this part of the block gets zeroed, right, by xfs_zero_eof?

> Because of the truncate to 256 bytes
> only the first block is allocated and everything beyond 512 bytes is
> a hole.  

Yep, up until the last write anyway.

> More specifically there is a hole under the remainder of the
> page so xfs_zero_eof() will skip that region and not zero anything.

Well, the last write (step 5) is still completely within the page...

Right, that's what it *should* be doing; but in page_state_convert (and
I'll admit to not having this 100% nailed down) we write block 1 and map
blocks 2 & 3 back into the file, and get:

# |1100|0000|1111|1111|2222|----|----|----|
             ^^^^ ^^^^
where these  |||| |||| blocks are stale data, and block 1 is written
(but at least zeroed).  How block 1 got zeroed I guess I'm not quite
certain yet.  But it does not appear that blocks 2 and 3 get *written*
any time other than step 1; blktrace seems to confirm this.  block 1
does get written, and 0s are written.  (But I don't think this block
ever should get written either; EOF landed there but only via truncate,
not a write).

Crap, now you've got me slightly confused again, and I'll need to look a
bit more to be sure I'm 100% clear on what's getting zeroed and when vs.
what's getting mapped and why.  :)

-Eric


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2008-12-16  5:40 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-13  7:07 [PATCH] fix corruption case for block size < page size Eric Sandeen
2008-12-13 17:48 ` Eric Sandeen
2008-12-13 18:20 ` Eric Sandeen
2008-12-16  5:00   ` Lachlan McIlroy
2008-12-16  5:40     ` Eric Sandeen [this message]
2008-12-16  6:05       ` Lachlan McIlroy
2008-12-16  6:10         ` Eric Sandeen
2008-12-16  6:21           ` Eric Sandeen
2008-12-16  6:51             ` Eric Sandeen
2009-01-07  5:23               ` Lachlan McIlroy
2009-01-07  5:53                 ` Eric Sandeen
2009-01-07  6:32                   ` Lachlan McIlroy
2009-01-07 21:42                     ` Dave Chinner
2009-01-09  0:18                       ` Lachlan McIlroy
2008-12-16  7:54           ` Lachlan McIlroy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49473F5C.3070308@sandeen.net \
    --to=sandeen@sandeen.net \
    --cc=lachlan@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.