From: Dave Chinner <david@fromorbit.com>
To: Sage Weil <sage@inktank.com>
Cc: Eric Sandeen <sandeen@sandeen.net>, xfs@oss.sgi.com
Subject: Re: garbage block(s) after powercycle/reboot + sparse writes
Date: Wed, 19 Jun 2013 11:46:46 +1000
Message-ID: <20130619014646.GF29338@dastard>
In-Reply-To: <alpine.DEB.2.00.1306120955460.15386@cobra.newdream.net>
On Wed, Jun 12, 2013 at 10:02:52AM -0700, Sage Weil wrote:
> Hi guys,
>
> I reproduced this on two more boxes and have more data. The full set of
> notes/logs is at
>
> http://newdream.net/~sage/bug-4976/notes.txt
case c:
/var/lib/ceph/osd/ceph-0/current/3.0_head/DIR_9/plana8021941-457__head_834C52B9__3:
0: [0..7]: 244206232..244206239
1: [8..1351]: hole
2: [1352..2431]: 244252824..244253903
3: [2432..3255]: hole
4: [3256..4855]: 244254728..244256327
bad data starts at offset 1179648:
Which lies within the middle of an allocated extent (offset 2304bb).
IIUC, then there was a write at offset 700466 for 445465 bytes,
(i.e. start @ 1368bb, end @ 2239bb), but given the block count of
the file didn't change, this must have been an overwrite of existing
data. It's well within EOF, too, so it's not clear what has happened
here - the bad data was not written by the ceph journal replay, and
the extent was already allocated on disk...
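As a rough cross-check of the case c numbers, the byte-to-basic-block
arithmetic can be sketched as below. This is only an illustration: it
assumes 512-byte basic blocks (the "bb" unit in the extent map above),
and the names extents_c and find_extent are made up for this example,
not taken from any ceph or XFS code.

```python
BB_SIZE = 512  # assumed bytes per basic block ("bb" in the extent map)

# Case c extent map as quoted above: (first_bb, last_bb, is_allocated)
extents_c = [
    (0, 7, True),
    (8, 1351, False),      # hole
    (1352, 2431, True),
    (2432, 3255, False),   # hole
    (3256, 4855, True),
]

def find_extent(extents, bb):
    """Return the (first_bb, last_bb, is_allocated) entry containing bb."""
    for first, last, allocated in extents:
        if first <= bb <= last:
            return (first, last, allocated)
    return None

bad_bb = 1179648 // BB_SIZE        # where the bad data starts
write_start_bb = 700466 // BB_SIZE # where the replayed write starts

bad_extent = find_extent(extents_c, bad_bb)
write_extent = find_extent(extents_c, write_start_bb)

print("bad data at bb", bad_bb, "in extent", bad_extent)
print("write starts at bb", write_start_bb, "in extent", write_extent)
```

Both offsets land in the same allocated extent, which is what makes
this look like a plain overwrite.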
case d:
0: [0..7]: 732632192..732632199
1: [8..783]: hole
2: [784..2943]: 733513808..733515967
3: [2944..3511]: hole
4: [3512..4703]: 733516536..733517727
INFO:teuthology.task.rados.rados.0.out:incorrect buffer at pos 1179648 (((same offset as previous case!!!)))
So, similar layout, we have a write at offset 404703 for 773518
bytes (start @ 790bb, end @ 2302bb). OK, that makes a little more sense
to have corruption starting @ 2304. The pages in the page cache that
cover this write would cover offsets 784bb to 2303bb, having zeroed
the head and tail sections.
The file size and number of blocks didn't change again and it's
well within the file size, so this is also an overwrite without
allocation. IOWs the overwrite ended at 2303bb, with corruption
starting at 2304bb.
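The page cache coverage for the case d write can be checked with a
small sketch. It assumes 4096-byte pages and 512-byte basic blocks;
neither value is stated explicitly in the thread, and the variable
names are purely illustrative.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
BB_SIZE = 512     # assumed bytes per basic block

offset, length = 404703, 773518   # the case d write

first_page = offset // PAGE_SIZE
last_page = (offset + length - 1) // PAGE_SIZE

# Range of basic blocks covered by those pages (inclusive)
first_bb = first_page * PAGE_SIZE // BB_SIZE
last_bb = (last_page + 1) * PAGE_SIZE // BB_SIZE - 1

# First byte past the last page touched by the write
next_byte = (last_page + 1) * PAGE_SIZE

print("pages", first_page, "to", last_page)
print("basic blocks", first_bb, "to", last_bb)
print("first byte after the cached range:", next_byte)
```

Under these assumptions the first byte after the cached range is the
same offset the test reports as the start of the incorrect buffer.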
Let's play a "what-if" game. There are 3 writes to the file:
-@ 0 for 57 bytes (1FSB)
-@ 404703 for 773518 bytes (
-@ 1801584 for 603119 bytes
These translate to the following regions in BB:
- [0..1]
- [790..2302]
- [3518..4697]
If we round these to filesystem blocks, we have:
- [0..7]
- [784..2303]
- [3512..4703]
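The two conversions above can be reproduced with a short sketch. It
assumes 512-byte basic blocks and 4096-byte filesystem blocks (8 bb
per block), and it takes the end of each BB region as the byte end
rounded up, which is the convention the figures above appear to use.
The function names are hypothetical.

```python
BB_SIZE = 512    # assumed bytes per basic block
BB_PER_FSB = 8   # assumed basic blocks per 4096-byte filesystem block

# (byte offset, byte length) of the three writes listed above
writes = [(0, 57), (404703, 773518), (1801584, 603119)]

def bb_region(offset, length):
    """Start bb rounded down, end bb rounded up (ceiling division)."""
    start = offset // BB_SIZE
    end = -(-(offset + length) // BB_SIZE)
    return (start, end)

def fsb_aligned_region(offset, length):
    """Inclusive bb range of the filesystem blocks the write touches."""
    first_bb = offset // BB_SIZE
    last_bb = (offset + length - 1) // BB_SIZE
    start = (first_bb // BB_PER_FSB) * BB_PER_FSB
    end = (last_bb // BB_PER_FSB) * BB_PER_FSB + BB_PER_FSB - 1
    return (start, end)

bb_regions = [bb_region(o, n) for o, n in writes]
fsb_regions = [fsb_aligned_region(o, n) for o, n in writes]

for raw, rounded in zip(bb_regions, fsb_regions):
    print("[%d..%d] -> [%d..%d]" % (raw + rounded))
```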
And in filesystem blocks:
	Should have		Actually have
	-----------		-------------
	- [0..1]		- [0..1]
	- [98..287]		- [98..367]
	- [439..587]		- [439..587]
So there's an extra 80 filesystem blocks in the middle extent. Does
that tally with the block count of 3360bb = 420fsb? 1 + 190 + 149 =
340fsb. It doesn't, so there's another 60fsb beyond EOF.
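The block tally can be sketched as follows, again assuming 8 basic
blocks per filesystem block. The extent lists are the block-rounded
write regions and the case d extent map quoted earlier; the helper
name fsb_count is made up for the example.

```python
BB_PER_FSB = 8  # assumed basic blocks per filesystem block

# Inclusive bb ranges: what the three writes should have allocated
should_have = [(0, 7), (784, 2303), (3512, 4703)]
# Inclusive bb ranges: allocated extents in the case d extent map
actually_have = [(0, 7), (784, 2943), (3512, 4703)]

def fsb_count(extents):
    """Total filesystem blocks covered by inclusive bb ranges."""
    return sum((last - first + 1) // BB_PER_FSB for first, last in extents)

expected_fsb = fsb_count(should_have)
reported_fsb = 3360 // BB_PER_FSB   # block count of 3360bb from the notes

# Extra blocks in the middle extent relative to the write that covers it
extra_middle_fsb = fsb_count([actually_have[1]]) - fsb_count([should_have[1]])

print("expected:", expected_fsb, "fsb")
print("reported:", reported_fsb, "fsb")
print("extra in middle extent:", extra_middle_fsb, "fsb")
```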
And that's pretty telling. There's non-zero'd speculative
preallocation there. The *big* question is this: why was the file
size extended (i.e. user data IO completed) and updated and the
file size update logged to the journal *before* the zeroed regions
of the file had been written to zero?
And why only on 3.8/3.9?
> Any ideas?
I'd love to see the code that is writing the files in the first
place - it's not playing silly buggers with sync_file_range(), is
it?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com