* dm-thinp BUG at drivers/md/persistent-data/dm-btree-remove.c:188
@ 2013-02-15 2:07 Eric Wheeler
2013-02-15 2:34 ` Alasdair G Kergon
2013-02-15 10:52 ` thornber
0 siblings, 2 replies; 7+ messages in thread
From: Eric Wheeler @ 2013-02-15 2:07 UTC (permalink / raw)
To: dm-devel
Hello all,
I've been experimenting with dm-thinp for the past few months and all
had been well until today.
The server is running vanilla 3.7.1 and just started issuing the BUG dump
below. After the bug, the kernel hangs and I can't even ping the server.
This is running as a KVM virtual machine running dm-thinp backed with
a single virtio-blk device.
Has anyone seen this? Is this known to be fixed in a newer version?
Does this indicate a corrupt volume or metadata volume?
Let me know what other data I can collect, if any. The VM seems to hang
every few hours or so but I'm not sure what triggers it yet.
-Eric
kernel BUG at drivers/md/persistent-data/dm-btree-remove.c:188!
invalid opcode: 0000 [#1] SMP
Modules linked in: ebtable_nat ebtables ipt_REJECT bridge fcoe libfcoe
libfc 8021q scsi_transport_fc garp stp scsi_tgt llc sunrpc xt_limit
xt_conntrack iptable_filter xt_mark iptable_mangle ipt_MASQUERADE
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_tables
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables ipv6 ext3 jbd dm_thin_pool dm_bio_prison
dm_persistent_data dm_bufio libcrc32c vhost_net tun crc32c_intel microcode
pcspkr i2c_piix4 i2c_core pata_acpi ata_generic ata_piix floppy dm_mirror
dm_region_hash dm_log dm_mod
CPU 2
Pid: 3084, comm: kworker/u:0 Not tainted 3.7.1 #2 Red Hat KVM
RIP: 0010:[<ffffffffa009ad01>] [<ffffffffa009ad01>] shift+0x3b/0x91
[dm_persistent_data]
RSP: 0018:ffff8802160e7b58 EFLAGS: 00010202
RAX: 00000000000000fc RBX: ffff880040411000 RCX: 00000000000000fb
RDX: 00000000ffffffff RSI: ffff880040411000 RDI: ffff880040410000
RBP: ffff8802160e7b88 R08: 00000000000000fc R09: 000000000008bfc6
R10: ffff8802160e7bf0 R11: ffff8802160e7ac8 R12: 00000000ffffffff
R13: ffff880040410000 R14: 00000000000000fc R15: 00000000000000fd
FS: 0000000000000000(0000) GS:ffff88021fd00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f7749c89000 CR3: 00000002141ca000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:0 (pid: 3084, threadinfo ffff8802160e6000, task
ffff880214038e20)
Stack:
ffff8802160e7b78 ffff8802153eec40 ffff88001e1d7000 ffff880040411000
ffff880040410000 00000000000000fc ffff8802160e7c78 ffffffffa009b471
ffff880200000000 ffff88021fc92680 ffff8802160e7bd8 ffffffff81092a3b
Call Trace:
[<ffffffffa009b471>] remove_raw+0x517/0x624 [dm_persistent_data]
[<ffffffff81092a3b>] ? ttwu_do_wakeup+0x4d/0xdb
[<ffffffff81098ce8>] ? try_to_wake_up+0x19c/0x1ae
[<ffffffffa009b5ff>] dm_btree_remove+0x81/0x12e [dm_persistent_data]
[<ffffffffa00ae684>] dm_thin_remove_block+0x5f/0x8a [dm_thin_pool]
[<ffffffffa00ab1bf>] process_prepared_discard+0x22/0x40 [dm_thin_pool]
[<ffffffffa00aa875>] process_prepared+0x77/0x8f [dm_thin_pool]
[<ffffffffa00ac106>] do_worker+0x53/0x22f [dm_thin_pool]
[<ffffffff810846db>] process_one_work+0x1ea/0x2ec
[<ffffffffa00ac0b3>] ? pool_dtr+0x6b/0x6b [dm_thin_pool]
[<ffffffff81086a7c>] worker_thread+0x168/0x268
[<ffffffff81086914>] ? manage_workers+0x280/0x280
[<ffffffff8108a73d>] kthread+0xb5/0xbd
[<ffffffff8108a688>] ? kthread_freezable_should_stop+0x65/0x65
[<ffffffff81496eac>] ret_from_fork+0x7c/0xb0
[<ffffffff8108a688>] ? kthread_freezable_should_stop+0x65/0x65
Code: 08 66 66 66 66 90 8b 47 14 49 89 fd 48 89 f3 41 89 d4 44 8b 7f 10 44
8b 76 10 3b 46 14 74 04 0f 0b eb fe 41 29 d7 41 39 c7 76 04 <0f> 0b eb fe
47 8d 34 34 41 39 c6 76 04 0f 0b eb fe 83 fa 00 74
RIP [<ffffffffa009ad01>] shift+0x3b/0x91 [dm_persistent_data]
RSP <ffff8802160e7b58>
---[ end trace 524d6bc36c283730 ]---
BUG: unable to handle kernel paging request at ffffffffffffffd8
IP: [<ffffffff8108a1d3>] kthread_data+0x10/0x16
PGD 1673067 PUD 1674067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in: ebtable_nat ebtables ipt_REJECT bridge fcoe libfcoe
libfc 8021q scsi_transport_fc garp stp scsi_tgt llc sunrpc xt_limit
xt_conntrack iptable_filter xt_mark iptable_mangle ipt_MASQUERADE
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_tables
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables ipv6 ext3 jbd dm_thin_pool dm_bio_prison
dm_persistent_data dm_bufio libcrc32c vhost_net tun crc32c_intel microcode
pcspkr i2c_piix4 i2c_core pata_acpi ata_generic ata_piix floppy dm_mirror
dm_region_hash dm_log dm_mod
CPU 2
Pid: 3084, comm: kworker/u:0 Tainted: G D 3.7.1 #2 Red Hat KVM
RIP: 0010:[<ffffffff8108a1d3>] [<ffffffff8108a1d3>]
kthread_data+0x10/0x16
RSP: 0018:ffff8802160e77e8 EFLAGS: 00010092
RAX: 0000000000000000 RBX: ffff88021fd12680 RCX: 0000000000000002
RDX: ffffffff818a8760 RSI: 0000000000000002 RDI: ffff880214038e20
RBP: ffff8802160e77e8 R08: ffff88021fd12680 R09: ffff880214038e68
R10: ffff8801c7c1adf0 R11: 0000000000000010 R12: ffff880214039100
R13: 0000000000000002 R14: 0000000000000002 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88021fd00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffffffffd8 CR3: 00000002141ca000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:0 (pid: 3084, threadinfo ffff8802160e6000, task
ffff880214038e20)
Stack:
ffff8802160e7818 ffffffff810863e4 ffff8802160e7818 ffff88021fd12680
ffff880214039100 ffff8802160e78e8 ffff8802160e78a8 ffffffff8148ebab
ffff8802160e6010 0000000000012680 ffff880214038e20 0000000000012680
Call Trace:
[<ffffffff810863e4>] wq_worker_sleeping+0x1a/0x78
[<ffffffff8148ebab>] __schedule+0x150/0x503
[<ffffffff8148f24f>] schedule+0x64/0x66
[<ffffffff81072e23>] do_exit+0x81b/0x834
[<ffffffff81490ca0>] oops_end+0xbf/0xc7
[<ffffffff8103cb97>] die+0x5a/0x63
[<ffffffff8149081f>] do_trap+0x70/0x137
[<ffffffff8103b02c>] do_invalid_op+0x9c/0xa5
[<ffffffffa009ad01>] ? shift+0x3b/0x91 [dm_persistent_data]
[<ffffffffa0099672>] ? insert_shadow+0x39/0x8c [dm_persistent_data]
[<ffffffff81142110>] ? kmem_cache_alloc_trace+0xc1/0xd3
[<ffffffff81497f5e>] invalid_op+0x1e/0x30
[<ffffffffa009ad01>] ? shift+0x3b/0x91 [dm_persistent_data]
[<ffffffffa009b471>] remove_raw+0x517/0x624 [dm_persistent_data]
[<ffffffff81092a3b>] ? ttwu_do_wakeup+0x4d/0xdb
[<ffffffff81098ce8>] ? try_to_wake_up+0x19c/0x1ae
[<ffffffffa009b5ff>] dm_btree_remove+0x81/0x12e [dm_persistent_data]
[<ffffffffa00ae684>] dm_thin_remove_block+0x5f/0x8a [dm_thin_pool]
[<ffffffffa00ab1bf>] process_prepared_discard+0x22/0x40 [dm_thin_pool]
[<ffffffffa00aa875>] process_prepared+0x77/0x8f [dm_thin_pool]
[<ffffffffa00ac106>] do_worker+0x53/0x22f [dm_thin_pool]
[<ffffffff810846db>] process_one_work+0x1ea/0x2ec
[<ffffffffa00ac0b3>] ? pool_dtr+0x6b/0x6b [dm_thin_pool]
[<ffffffff81086a7c>] worker_thread+0x168/0x268
[<ffffffff81086914>] ? manage_workers+0x280/0x280
[<ffffffff8108a73d>] kthread+0xb5/0xbd
[<ffffffff8108a688>] ? kthread_freezable_should_stop+0x65/0x65
[<ffffffff81496eac>] ret_from_fork+0x7c/0xb0
[<ffffffff8108a688>] ? kthread_freezable_should_stop+0x65/0x65
Code: 8b 04 25 80 b9 00 00 48 8b 80 88 02 00 00 48 8b 40 c8 c9 48 c1 e8 02
83 e0 01 c3 55 48 89 e5 66 66 66 66 90 48 8b 87 88 02 00 00 <48> 8b 40 d8
c9 c3 55 48 89 e5 66 66 66 66 90 48 3b 3d b7 e4 81
RIP [<ffffffff8108a1d3>] kthread_data+0x10/0x16
RSP <ffff8802160e77e8>
CR2: ffffffffffffffd8
---[ end trace 524d6bc36c283731 ]---
Fixing recursive fault but reboot is needed!
--
Eric Wheeler
www.globallinuxsecurity.pro
* Re: dm-thinp BUG at drivers/md/persistent-data/dm-btree-remove.c:188
2013-02-15 2:07 dm-thinp BUG at drivers/md/persistent-data/dm-btree-remove.c:188 Eric Wheeler
@ 2013-02-15 2:34 ` Alasdair G Kergon
2013-02-15 3:48 ` Mike Snitzer
2013-02-15 3:49 ` Eric Wheeler
2013-02-15 10:52 ` thornber
1 sibling, 2 replies; 7+ messages in thread
From: Alasdair G Kergon @ 2013-02-15 2:34 UTC (permalink / raw)
To: Eric Wheeler; +Cc: dm-devel
On Thu, Feb 14, 2013 at 06:07:56PM -0800, Eric Wheeler wrote:
> Does this indicate a corrupt volume or metadata volume?
It could be a software bug.
It is always worth trying the newest code, but I don't spot any
obvious change that could be related to this.
> Let me know what other data I can collect, if any. The VM seems to hang
> every few hours or so but I'm not sure what triggers it yet.
Please provide the basic parameters of the device e.g.
dmsetup info -c
dmsetup table
dmsetup status
and then I think we'll probably want to see your metadata (or
the relevant part of it) at the point where it hangs.
The trace shows it's applying a 'discard' to the metadata and releasing
blocks when the problem occurs.
Alasdair
* Re: dm-thinp BUG at drivers/md/persistent-data/dm-btree-remove.c:188
2013-02-15 2:34 ` Alasdair G Kergon
@ 2013-02-15 3:48 ` Mike Snitzer
2013-02-15 5:12 ` Eric Wheeler
2013-02-15 3:49 ` Eric Wheeler
1 sibling, 1 reply; 7+ messages in thread
From: Mike Snitzer @ 2013-02-15 3:48 UTC (permalink / raw)
To: Eric Wheeler, dm-devel
On Thu, Feb 14 2013 at 9:34pm -0500,
Alasdair G Kergon <agk@redhat.com> wrote:
> On Thu, Feb 14, 2013 at 06:07:56PM -0800, Eric Wheeler wrote:
> > Does this indicate a corrupt volume or metadata volume?
>
> It could be a software bug.
> It is always worth trying the newest code, but I don't spot any
> obvious change that could be related to this.
...
> The trace shows it's applying a 'discard' to the metadata and releasing
> blocks when the problem occurs.
Right, given that the BUG trace shows active discards it could be that
v3.8 commit e808807 ("dm thin: fix race between simultaneous io and
discards to same block") may help.
(though I never saw that race manifest with the BUG in question)
* Re: dm-thinp BUG at drivers/md/persistent-data/dm-btree-remove.c:188
2013-02-15 3:48 ` Mike Snitzer
@ 2013-02-15 5:12 ` Eric Wheeler
0 siblings, 0 replies; 7+ messages in thread
From: Eric Wheeler @ 2013-02-15 5:12 UTC (permalink / raw)
To: device-mapper development
>> The trace shows it's applying a 'discard' to the metadata and releasing
>> blocks when the problem occurs.
>
> Right, given that the BUG trace shows active discards it could be that
> v3.8 commit e808807 ("dm thin: fix race between simultaneous io and
> discards to same block") may help.
>
> (though I never saw that race manifest with the BUG in question)
I am mounting ext4 with the discard option on my dm-thinp volumes. I've
turned discard off for now hoping for stable operation.
-Eric
--
Eric Wheeler
www.globallinuxsecurity.pro
* Re: dm-thinp BUG at drivers/md/persistent-data/dm-btree-remove.c:188
2013-02-15 2:34 ` Alasdair G Kergon
2013-02-15 3:48 ` Mike Snitzer
@ 2013-02-15 3:49 ` Eric Wheeler
1 sibling, 0 replies; 7+ messages in thread
From: Eric Wheeler @ 2013-02-15 3:49 UTC (permalink / raw)
To: Alasdair G Kergon; +Cc: dm-devel
On Fri, 15 Feb 2013, Alasdair G Kergon wrote:
> On Thu, Feb 14, 2013 at 06:07:56PM -0800, Eric Wheeler wrote:
>> Does this indicate a corrupt volume or metadata volume?
>
> It could be a software bug.
> It is always worth trying the newest code, but I don't spot any
> obvious change that could be related to this.
>
>> Let me know what other data I can collect, if any. The VM seems to hang
>> every few hours or so but I'm not sure what triggers it yet.
>
> Please provide the basic parameters of the device e.g.
The volume names have been sanitized in the output so I can publish them.
old_pool is no longer in use. It is mostly broken because I ran out of
metadata space on old_pool and started over with "pool" which has a
16GB tmeta volume.
> dmsetup info -c
Name Maj Min Stat Open Targ Event UUID
old_pool-aac 252 19 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZPDUdZs0WdzwBrW4u7IbKt4TDmjHlkqY1
old_pool-aad 252 27 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZrMmwdOg0acMW47R2FtVN8IpC9mC7fPBu
old_pool-aae 252 18 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZL4oAew8Xj9qIUAlirz1mUDO6uC1YbeL2
old_pool-aaf 252 24 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12Z17AKABzVOfxVoiBOZxf7AFHVcfaNtBv4
old_pool-aaj 252 25 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZAx5amNe2T4yjsJ9M1yUC0Com8nHNOvoS
old_pool-aak 252 23 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZlUyYiVcr10r9jZ2BjjcPp8XVwdnRrtxR
old_pool-aal 252 26 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZrULTcdi88MO327hUEPpXUyGkfaUCUMdj
old_pool-aan 252 20 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZNwC8lugqdj0mesi4xCWeuSqxm5ZOYe5B
old_pool-aap 252 22 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZFj8gsjzrQ9yrMfpZrRiij70wylJeY4Na
old_pool-aar 252 21 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZKeqf7ouRzMEYTMKDQ8wn9dYZyIDTo844
old_pool-aas 252 28 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12Ze9dwbC2jPEaZKfbB8bwH4ZKWJtXyic2h
old_pool-pool 252 17 L--w 0 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12Z7GcAvP20FSMxLI3Y8gyG1el4eCejvMdr
old_pool-pool_tdata 252 15 L--w 1 2 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12ZzEM36WwdDz6TqSu4RAZcQPEIPWA6KaE5
old_pool-pool_tmeta 252 14 L--w 1 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12Zitg1ZQFu0lywXtXtATOGazFE8IUB6YXv
old_pool-pool-tpool 252 16 L--w 12 1 0 LVM-WxsniE1DiB1bDGpIIVIIOmutTcY2O12Z7GcAvP20FSMxLI3Y8gyG1el4eCejvMdr-tpool
pool-aaa 252 7 L--w 0 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4ayHUOZJPgmAV9CMxty7w93RbpRHPq29HK
pool-aab 252 10 L--w 0 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4aUMJFnLqsGdoVJ6HzhqFLvdFUX06Fu4rN
pool-aag 252 12 L--w 0 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4aHTPViI0Hb56ico35QgtQF3H9818AVw37
pool-aah 252 6 L--w 0 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4actTmWGAa3LiXwLu6txzj6BqrmardzapF
pool-aai 252 11 L--w 0 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4anpU9NKpY0O4UmQn0QXIrei5qRlK58JTV
pool-aam 252 8 L--w 1 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4aALhd9AEoF3OfcTBP8HaS0Mrekh1K2sa8
pool-aao 252 9 L--w 0 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4amxFrrZasqAPxj5SmHeIG8PYRc2fGWsgb
pool-aaq 252 13 L--w 0 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4aB3scAqJPBoUn33F4Fl20c0PRYCNWPPzA
pool-pool 252 5 L--w 0 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4aX6UHd24HwxSfe4Ek9DKe7Gn3EzHqReP0
pool-pool_tdata 252 3 L--w 1 2 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4aBxn6jyhL5FW1sQJ1HVKAfLak929VgkOw
pool-pool_tmeta 252 2 L--w 1 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4aN82ZJRYEgmqfufReLDshPnoNxubeFX60
pool-pool-tpool 252 4 L--w 9 1 0 LVM-eEgPyKYcZaM3BYBtSTHwLKuoodqbRH4aX6UHd24HwxSfe4Ek9DKe7Gn3EzHqReP0-tpool
VolGroup-lv_root 252 0 L--w 1 2 0 LVM-Jx5Plzd03Rc9hfDqdz5wYGDbwXbi0hzg2HsYRt9T7mKFtmxFKGl13mab0fnyB0T0
VolGroup-lv_swap 252 1 L--w 2 1 0 LVM-Jx5Plzd03Rc9hfDqdz5wYGDbwXbi0hzgq0MV3ZgxgmkesoXuxzJlHRQNGFe8KzRj
> dmsetup table
old_pool-aac: 0 41943040 thin 252:16 2
old_pool-aad: 0 41943040 thin 252:16 18
old_pool-aae: 0 629145600 thin 252:16 1
old_pool-aaf: 0 31457280 thin 252:16 7
old_pool-aaj: 0 41943040 thin 252:16 16
old_pool-aak: 0 31457280 thin 252:16 6
old_pool-aal_swap: 0 16777216 thin 252:16 17
old_pool-aan: 0 62496768 thin 252:16 3
old_pool-aap: 0 62914560 thin 252:16 5
old_pool-aar: 0 41943040 thin 252:16 4
old_pool-aas: 0 629145600 thin 252:16 19
old_pool-pool: 0 1027604480 linear 252:16 0
old_pool-pool_tdata: 0 419430400 linear 253:16 2048
old_pool-pool_tdata: 419430400 608174080 linear 253:16 419637248
old_pool-pool_tmeta: 0 204800 linear 253:16 419432448
old_pool-pool-tpool: 0 1027604480 thin-pool 252:14 252:15 256 0 0
pool-aaa: 0 629145600 thin 252:4 3
pool-aab: 0 79691776 thin 252:4 6
pool-aag: 0 79273984 thin 252:4 8
pool-aah: 0 31457280 thin 252:4 1
pool-aai: 0 62914560 thin 252:4 7
pool-aam: 0 41943040 thin 252:4 4
pool-aao: 0 52428800 thin 252:4 5
pool-aaq: 0 629145600 thin 252:4 9
pool-pool: 0 1006632960 linear 252:4 0
pool-pool_tdata: 0 838860800 linear 253:32 2048
pool-pool_tdata: 838860800 167772160 linear 253:32 872417280
pool-pool_tmeta: 0 33161216 linear 253:32 838862848
pool-pool-tpool: 0 1006632960 thin-pool 252:2 252:3 128 0 0
VolGroup-lv_root: 0 13713408 linear 253:2 2048
VolGroup-lv_root: 13713408 14680064 linear 253:2 15747072
VolGroup-lv_swap: 0 2031616 linear 253:2 13715456
> dmsetup status
old_pool-aac: 0 41943040 thin 37725952 41943039
old_pool-aad: 0 41943040 thin 30514176 41943039
old_pool-aae: 0 629145600 thin 326287616 625017087
old_pool-aaf: 0 31457280 thin 21150464 31457279
old_pool-aaj: 0 41943040 thin 36858880 41943039
old_pool-aak: 0 31457280 thin 20298496 31375871
old_pool-aal_swap: 0 16777216 thin 279808 280831
old_pool-aan: 0 62496768 thin 60272128 62476543
old_pool-aap: 0 62914560 thin 59126016 62901503
old_pool-aar: 0 41943040 thin 31097344 41943039
old_pool-aas: 0 629145600 thin 299941376 625017087
old_pool-pool: 0 1027604480 linear
old_pool-pool_tdata: 0 419430400 linear
old_pool-pool_tdata: 419430400 608174080 linear
old_pool-pool_tmeta: 0 204800 linear
old_pool-pool-tpool: 0 1027604480 thin-pool 197 25586/25600 3229287/4014080 - rw no_discard_passdown
pool-aaa: 0 629145600 thin 207272576 624951423
pool-aab: 0 79691776 thin 42822656 79508223
pool-aag: 0 79273984 thin 62524416 79273983
pool-aah: 0 31457280 thin 19696256 31375871
pool-aai: 0 62914560 thin 47576320 62901503
pool-aam: 0 41943040 thin 12874880 39770495
pool-aao: 0 52428800 thin 32009600 52283647
pool-aaq: 0 629145600 thin 170327040 624951423
pool-pool: 0 1006632960 linear
pool-pool_tdata: 0 838860800 linear
pool-pool_tdata: 838860800 167772160 linear
pool-pool_tmeta: 0 33161216 linear
pool-pool-tpool: 0 1006632960 thin-pool 42 4890/4145152 4553196/7864320 - rw no_discard_passdown
VolGroup-lv_root: 0 13713408 linear
VolGroup-lv_root: 13713408 14680064 linear
VolGroup-lv_swap: 0 2031616 linear
> and then I think we'll probably want to see your metadata (or
> the relevant part of it) at the point where it hangs.
Here's the whole dump:
http://www.globallinuxsecurity.pro/out/pool_tmeta.dump.gz
> The trace shows it's applying a 'discard' to the metadata and releasing
> blocks when the problem occurs.
Is this the same 'discard' as in the ATA discard sense, or just deleting
something from its tree?
-Eric
>
> Alasdair
>
>
--
Eric Wheeler
www.globallinuxsecurity.pro
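
[Editorial note for readers of the dmsetup status output above.] In the kernel's thin-provisioning documentation, the fields after "thin-pool" on a status line are the transaction id, used/total metadata blocks, and used/total data blocks. Below is a minimal Python sketch that pulls those fields out of the two pool lines posted above and computes utilisation. The names ThinPoolStatus and parse_thin_pool_status are illustrative only and not part of any dm tool.

```python
from typing import NamedTuple


class ThinPoolStatus(NamedTuple):
    name: str
    transaction_id: int
    meta_used: int
    meta_total: int
    data_used: int
    data_total: int

    @property
    def meta_pct(self) -> float:
        """Percentage of metadata blocks in use."""
        return 100.0 * self.meta_used / self.meta_total

    @property
    def data_pct(self) -> float:
        """Percentage of data blocks in use."""
        return 100.0 * self.data_used / self.data_total


def parse_thin_pool_status(line: str) -> ThinPoolStatus:
    """Parse one 'dmsetup status' line whose target type is thin-pool."""
    name, rest = line.split(":", 1)
    fields = rest.split()
    # fields: start, length, target, transaction id, meta used/total, data used/total, ...
    if fields[2] != "thin-pool":
        raise ValueError("not a thin-pool status line")
    meta_used, meta_total = (int(x) for x in fields[4].split("/"))
    data_used, data_total = (int(x) for x in fields[5].split("/"))
    return ThinPoolStatus(name.strip(), int(fields[3]),
                          meta_used, meta_total, data_used, data_total)


# The two thin-pool lines copied verbatim from the status output above.
old_pool = parse_thin_pool_status(
    "old_pool-pool-tpool: 0 1027604480 thin-pool 197 25586/25600 "
    "3229287/4014080 - rw no_discard_passdown")
pool = parse_thin_pool_status(
    "pool-pool-tpool: 0 1006632960 thin-pool 42 4890/4145152 "
    "4553196/7864320 - rw no_discard_passdown")

for p in (old_pool, pool):
    print(f"{p.name}: metadata {p.meta_pct:.2f}% used, data {p.data_pct:.2f}% used")
```

Read this way, old_pool has 25586 of its 25600 metadata blocks in use, which matches the remark above that it ran out of metadata space. The new pool, where the BUG was reported, has used only a small fraction of its metadata.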
* Re: dm-thinp BUG at drivers/md/persistent-data/dm-btree-remove.c:188
2013-02-15 2:07 dm-thinp BUG at drivers/md/persistent-data/dm-btree-remove.c:188 Eric Wheeler
2013-02-15 2:34 ` Alasdair G Kergon
@ 2013-02-15 10:52 ` thornber
2013-02-15 23:21 ` Eric Wheeler
1 sibling, 1 reply; 7+ messages in thread
From: thornber @ 2013-02-15 10:52 UTC (permalink / raw)
To: device-mapper development
On Thu, Feb 14, 2013 at 06:07:56PM -0800, Eric Wheeler wrote:
>
> Hello all,
>
> I've been experimenting with dm-thinp recently and for the past few
> months and all has been well---until today.
Hi Eric,
Thanks for testing thinp and giving such a good bug report. This bug
is one I haven't seen before (nothing to do with the race fix that
Mike mentioned). It's almost certainly in the btree remove code, very
close to where the BUG() triggered.
Discard support is the main user of the btree_remove, so turning off
discard support in the pool should be a good temporary fix.
Could you let me know what sort of work load you were applying to the
ext4 fs to cause this to happen? If I can reproduce this I should be
able to get a fix to you v. quickly.
- Joe
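
[Editorial note for readers of the first oops.] This is one possible reading of the register dump, inferred from the Code: bytes and not confirmed anywhere in this thread. The trapping ud2 (<0f> 0b) is immediately preceded by bytes that decode as "sub %edx,%r15d; cmp %eax,%r15d; jbe". That would fit an unsigned check of the form nr_left - count > max_entries, with RAX holding max_entries, RDX holding count, and R15 holding nr_left - count. The Python sketch below replays that 32-bit arithmetic on the posted register values. The variable names and the shape of the check are assumptions.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


# Values copied from the first register dump in the original report.
rax = 0x000000FC   # assumed: max_entries
rdx = 0xFFFFFFFF   # assumed: count (32-bit)
r15 = 0x000000FD   # assumed: nr_left - count, i.e. after the 'sub'
r14 = 0x000000FC   # assumed: nr_right

count = to_signed32(rdx)
# Undo the subtraction: nr_left = (nr_left - count) + count, modulo 2**32.
nr_left = (r15 + rdx) & MASK32
nr_right = r14
max_entries = rax

# The assumed unsigned comparison that leads to the trap.
check_trips = ((nr_left - count) & MASK32) > max_entries

print(f"count={count} nr_left={nr_left} nr_right={nr_right} "
      f"max_entries={max_entries} check_trips={check_trips}")
```

Under these assumptions both nodes already hold max_entries (252) entries and count is -1, so the assertion would trip because one of them is being asked to hold 253. That would be consistent with Joe's remark that the problem lies in the btree remove code close to the BUG(), but it says nothing about how the tree got into that state.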
* Re: dm-thinp BUG at drivers/md/persistent-data/dm-btree-remove.c:188
2013-02-15 10:52 ` thornber
@ 2013-02-15 23:21 ` Eric Wheeler
0 siblings, 0 replies; 7+ messages in thread
From: Eric Wheeler @ 2013-02-15 23:21 UTC (permalink / raw)
To: device-mapper development; +Cc: thornber
On Fri, 15 Feb 2013, thornber@redhat.com wrote:
> On Thu, Feb 14, 2013 at 06:07:56PM -0800, Eric Wheeler wrote:
> Discard support is the main user of the btree_remove, so turning off
> discard support in the pool should be a good temporary fix.
Yep, I did just that and it seems to be stable at the moment.
> Could you let me know what sort of work load you were applying to the
> ext4 fs to cause this to happen? If I can reproduce this I should be
> able to get a fix to you v. quickly.
I'm sure it was very light load. Backups happen at night using
rdiff-backup, but the failure was happening during the day outside of a
backup period.
I do have a MySQL replication writing to a volume, so that would generate
a few writes, but it's an extremely-low-write database, so that seems
unlikely.
The best I can think of is that something like mlocate was scanning the
disk, so maybe just a filesystem traversal was doing it, but mlocate would
have been read-only.
I wouldn't think so, but is it possible for a read to cause a discard
under dm-thinp?
-Eric
--
Eric Wheeler
www.globallinuxsecurity.pro