* dm-thin discard issue
From: Jim Minter @ 2013-03-06 13:12 UTC
To: dm-devel
Hello,
I think I've uncovered a problem when issuing the BLKDISCARD ioctl to a thin volume. If I create a thin volume, fill it with data, snapshot it, then call BLKDISCARD on the thin volume, it looks like the kernel doesn't take into account the fact that the underlying blocks are shared with the snapshot, and just goes ahead and discards them. This appears to then leave the metadata in an inconsistent state.
Here's a reproducer (works on rawhide as of today, 3.9.0-0.rc1.git0.1.fc19.x86_64):
(assumes volumes 253:2 for metadata and 253:3 for pool; uses blkdiscard from upstream util-linux to issue the BLKDISCARD ioctl)
=== 8< ===
echo +++ Creating pool and thin...
dmsetup create pool --table '0 2097152 thin-pool 253:2 253:3 128 0 1 no_discard_passdown'
dmsetup message /dev/mapper/pool 0 "create_thin 0"
dmsetup create thin --table "0 1048576 thin /dev/mapper/pool 0"
echo +++ Filling thin...
dd if=/dev/zero of=/dev/mapper/thin bs=1M &>/dev/null
echo +++ Creating snap...
dmsetup suspend /dev/mapper/thin
dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
dmsetup resume /dev/mapper/thin
dmsetup create snap --table "0 1048576 thin /dev/mapper/pool 1"
dmsetup status
echo +++ Discarding thin...
blkdiscard /dev/mapper/thin
dmsetup status
=== 8< ===
Output is:
=== 8< ===
+++ Creating pool and thin...
+++ Filling thin...
+++ Creating snap...
thin: 0 1048576 thin 1048576 1048575
vg-pool: 0 2097152 linear
fedora-swap: 0 8257536 linear
fedora-root: 0 32653312 linear
vg-meta: 0 262144 linear
snap: 0 1048576 thin 1048576 1048575
pool: 0 2097152 thin-pool 0 72/32768 8192/16384 - rw no_discard_passdown
+++ Discarding thin...
thin: 0 1048576 thin 0 -
vg-pool: 0 2097152 linear
fedora-swap: 0 8257536 linear
fedora-root: 0 32653312 linear
vg-meta: 0 262144 linear
snap: 0 1048576 thin 1048576 1048575
pool: 0 2097152 thin-pool 0 73/32768 0/16384 - rw no_discard_passdown
=== 8< ===
-------------------------------------^
At this point, the pool's used data block count looks wrong: I'd expect it to still be 8192, not 0.
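For reference, the expected figure follows from the table parameters above: the thin device is 1048576 sectors long and the pool's data block size is 128 sectors, so a fully written thin maps 1048576 / 128 = 8192 blocks. Below is a minimal Ruby sketch of pulling the used data block count out of a thin-pool status line and comparing it with that figure. The helper names are made up for illustration, and the field layout is assumed from the output above, including the "pool:" name prefix that dmsetup status prints when listing all devices.

```ruby
# Hypothetical helpers, not part of any existing tool.

# Parse the "<used>/<total>" data block counts from a thin-pool status
# line as printed by `dmsetup status` (with the leading "name:" prefix).
def pool_data_usage(status_line)
  fields = status_line.split
  # fields: name:, start, length, "thin-pool", transaction id,
  #         used/total metadata blocks, used/total data blocks, ...
  raise ArgumentError, "not a thin-pool line" unless fields[3] == "thin-pool"
  used, total = fields[6].split("/").map { |n| Integer(n) }
  { used: used, total: total }
end

# Number of pool blocks a fully provisioned thin device occupies.
def expected_blocks(thin_sectors, block_size_sectors)
  thin_sectors / block_size_sectors
end

before = "pool: 0 2097152 thin-pool 0 72/32768 8192/16384 - rw no_discard_passdown"
after  = "pool: 0 2097152 thin-pool 0 73/32768 0/16384 - rw no_discard_passdown"

puts expected_blocks(1048576, 128)    # blocks the snapshot still references
puts pool_data_usage(before)[:used]   # used data blocks before the discard
puts pool_data_usage(after)[:used]    # used data blocks after the discard
```

Comparing the last value against expected_blocks is the same check the reproducer makes by eye.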
After 'dmsetup remove'ing snap, thin and pool, the output of thin_check /dev/mapper/vg-meta is:
=== 8< ===
Errors in metadata
Errors in data block reference counts
0: was 0, expected 1
1: was 0, expected 1
...
8190: was 0, expected 1
8191: was 0, expected 1
=== 8< ===
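The reference count errors can be read as follows: each of the 8192 data blocks is still mapped by snap, so thin_check expects a count of 1, but the discard left it at 0. Below is a toy Ruby model of the expected behaviour, where a discard drops only the discarding device's mappings and a block stays in use while anything else references it. The class and method names are invented for illustration; this is not how dm-thin organises its metadata, nor is it the fix itself.

```ruby
# Toy model only: invented names, not dm-thin's data structures.
class ToyPool
  attr_reader :refcounts

  def initialize
    @refcounts = Hash.new(0)   # data block => number of devices mapping it
    @mappings  = {}            # device id => { virtual block => data block }
    @next_free = 0
  end

  def create_thin(dev)
    @mappings[dev] = {}
  end

  # Provision a fresh data block for every virtual block in 0...nr_blocks.
  def fill(dev, nr_blocks)
    nr_blocks.times do |vblock|
      @mappings[dev][vblock] = @next_free
      @refcounts[@next_free] += 1
      @next_free += 1
    end
  end

  # A snapshot shares every data block of its origin.
  def create_snap(snap, origin)
    @mappings[snap] = @mappings[origin].dup
    @mappings[snap].each_value { |dblock| @refcounts[dblock] += 1 }
  end

  # Discard removes this device's mappings; a block only becomes free
  # once nothing references it any more.
  def discard(dev)
    @mappings[dev].each_value { |dblock| @refcounts[dblock] -= 1 }
    @mappings[dev].clear
  end

  def used_blocks
    @refcounts.count { |_, n| n > 0 }
  end
end

pool = ToyPool.new
pool.create_thin(0)
pool.fill(0, 8192)
pool.create_snap(1, 0)
pool.discard(0)
puts pool.used_blocks   # blocks still held by the snapshot
```

In this model the pool only drops to zero used blocks once the snapshot's mappings are discarded as well.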
I'm also having trouble recovering from this situation using the user-space tools, but I'll follow up on that in a separate thread.
Cheers,
Jim
* Re: dm-thin discard issue
From: thornber @ 2013-03-06 14:26 UTC
To: device-mapper development
On Wed, Mar 06, 2013 at 08:12:39AM -0500, Jim Minter wrote:
> Hello,
>
> I think I've uncovered a problem when issuing the BLKDISCARD ioctl to a thin volume. If I create a thin volume, fill it with data, snapshot it, then call BLKDISCARD on the thin volume, it looks like the kernel doesn't take into account the fact that the underlying blocks are shared with the snapshot, and just goes ahead and discards them. This appears to then leave the metadata in an inconsistent state.
>
> Here's a reproducer (works on rawhide as of today, 3.9.0-0.rc1.git0.1.fc19.x86_64):
> (assumes volumes 253:2 for metadata and 253:3 for pool; uses blkdiscard from upstream util-linux to issue the BLKDISCARD ioctl)
Alarming, to say the least. Give me a couple of hours to look at this ...
- Joe
* Re: dm-thin discard issue
From: thornber @ 2013-03-06 16:06 UTC
To: device-mapper development
Hi Jim,
On Wed, Mar 06, 2013 at 02:26:22PM +0000, thornber@redhat.com wrote:
> On Wed, Mar 06, 2013 at 08:12:39AM -0500, Jim Minter wrote:
> > Hello,
> >
> > I think I've uncovered a problem when issuing the BLKDISCARD ioctl to a thin volume. If I create a thin volume, fill it with data, snapshot it, then call BLKDISCARD on the thin volume, it looks like the kernel doesn't take into account the fact that the underlying blocks are shared with the snapshot, and just goes ahead and discards them. This appears to then leave the metadata in an inconsistent state.
> >
> > Here's a reproducer (works on rawhide as of today, 3.9.0-0.rc1.git0.1.fc19.x86_64):
> > (assumes volumes 253:2 for metadata and 253:3 for pool; uses blkdiscard from upstream util-linux to issue the BLKDISCARD ioctl)
>
> Alarming, to say the least. Give me a couple of hours to look at this ...
I managed to reproduce with this test:
  def test_discard_origin_does_not_effect_snap
    with_standard_pool(@size) do |pool|
      with_new_thin(pool, @volume_size, 0) do |thin|
        wipe_device(thin)
        assert_used_blocks(pool, @blocks_per_dev)

        with_new_snap(pool, @volume_size, 1, 0, thin) do |snap|
          assert_used_blocks(pool, @blocks_per_dev)
        end

        thin.discard(0, @volume_size)
        assert_used_blocks(pool, @blocks_per_dev)
      end

      assert_used_blocks(pool, @blocks_per_dev)
    end
  end
This commit fixes the issue:
https://github.com/jthornber/linux-2.6/commit/a42dfef751cb666d3274346c07dff655cb40cc5a
I'm really sorry about this; we should have covered it better in the
tests.
- Joe