* [linux-lvm] deadlock during lvm snapshot
@ 2016-11-28 11:58 Tomaz Beltram
2016-11-29 8:38 ` Zdenek Kabelac
2016-11-29 10:28 ` Tomaz Beltram
0 siblings, 2 replies; 4+ messages in thread
From: Tomaz Beltram @ 2016-11-28 11:58 UTC (permalink / raw)
To: linux-lvm
[-- Attachment #1.1: Type: text/plain, Size: 2758 bytes --]
Hi,
I'm doing backup of a running mongodb using LVM snapshot. Sometimes I
run into a deadlock situation and kernel reports blocked tasks for jbd2,
mongod, dmeventd and my tar doing backup.
This happens very rarely (one in a thousand) but the effect is rather
severe as mongodb stops working. I also can't remove and unmount the
snapshot. I have attached syslog of two occurrences. The stack traces of
the blocked tasks are very similar so I suspect a potential race
condition between the affected tasks.
Sep 15 17:06:53 dss2 kernel: [719277.567904] INFO: task jbd2/dm-2-8:9048
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568130] INFO: task mongod:23239
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568267] INFO: task mongod:23242
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568350] INFO: task mongod:23243
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568397] INFO: task dmeventd:12427
blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568523] INFO: task
kworker/u16:2:31890 blocked for more than 120 seconds.
Sep 15 17:06:53 dss2 kernel: [719277.568713] INFO: task tar:12446
blocked for more than 120 seconds.
Sep 15 17:08:53 dss2 kernel: [719397.567614] INFO: task jbd2/dm-2-8:9048
blocked for more than 120 seconds.
Sep 15 17:08:53 dss2 kernel: [719397.567731] INFO: task mongod:23239
blocked for more than 120 seconds.
Sep 15 17:08:53 dss2 kernel: [719397.567870] INFO: task mongod:23240
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.865020] INFO: task jbd2/dm-4-8:878
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.865624] INFO: task mongod:1652
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.866279] INFO: task mongod:1655
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.866771] INFO: task mongod:1656
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.867294] INFO: task dmeventd:3504
blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.867783] INFO: task
kworker/u16:2:8016 blocked for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.868351] INFO: task tar:3560 blocked
for more than 120 seconds.
Nov 25 17:10:56 dss2 kernel: [282360.868865] INFO: task
kworker/u16:1:5561 blocked for more than 120 seconds.
Nov 25 17:12:56 dss2 kernel: [282480.868656] INFO: task jbd2/dm-4-8:878
blocked for more than 120 seconds.
Nov 25 17:12:56 dss2 kernel: [282480.869015] INFO: task mongod:1652
blocked for more than 120 seconds.
Full syslogs of two occurrences are attached.
I have Ubuntu 16.04.1 (lvm2 2.02.133-1ubuntu10) with mongod 3.2.9 on a
64bit system.
wbr Tomaz
[-- Attachment #1.2: Type: text/html, Size: 3112 bytes --]
[-- Attachment #2: syslog.2.gz --]
[-- Type: application/gzip, Size: 7067 bytes --]
[-- Attachment #3: syslog.4.gz --]
[-- Type: application/gzip, Size: 8955 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [linux-lvm] deadlock during lvm snapshot
2016-11-28 11:58 [linux-lvm] deadlock during lvm snapshot Tomaz Beltram
@ 2016-11-29 8:38 ` Zdenek Kabelac
2016-11-29 10:28 ` Tomaz Beltram
1 sibling, 0 replies; 4+ messages in thread
From: Zdenek Kabelac @ 2016-11-29 8:38 UTC (permalink / raw)
To: LVM general discussion and development
Dne 28.11.2016 v 12:58 Tomaz Beltram napsal(a):
> Hi,
>
> I'm doing backup of a running mongodb using LVM snapshot. Sometimes I
> run into a deadlock situation and kernel reports blocked tasks for jbd2,
> mongod, dmeventd and my tar doing backup.
>
> This happens very rarely (one in a thousand) but the effect is rather
> severe as mongodb stops working. I also can't remove and unmount the
> snapshot. I have attached syslog of two occurrences. The stack traces of
> the blocked tasks are very similar so I suspect a potential race
> condition between the affected tasks.
>
> Sep 15 17:06:53 dss2 kernel: [719277.567904] INFO: task jbd2/dm-2-8:9048
> blocked for more than 120 seconds.
> Sep 15 17:06:53 dss2 kernel: [719277.568130] INFO: task mongod:23239
> blocked for more than 120 seconds.
> Sep 15 17:06:53 dss2 kernel: [719277.568267] INFO: task mongod:23242
> blocked for more than 120 seconds.
> Sep 15 17:06:53 dss2 kernel: [719277.568350] INFO: task mongod:23243
> blocked for more than 120 seconds.
> Sep 15 17:06:53 dss2 kernel: [719277.568397] INFO: task dmeventd:12427
> blocked for more than 120 seconds.
> Sep 15 17:06:53 dss2 kernel: [719277.568523] INFO: task
> kworker/u16:2:31890 blocked for more than 120 seconds.
> Sep 15 17:06:53 dss2 kernel: [719277.568713] INFO: task tar:12446
> blocked for more than 120 seconds.
> Sep 15 17:08:53 dss2 kernel: [719397.567614] INFO: task jbd2/dm-2-8:9048
> blocked for more than 120 seconds.
> Sep 15 17:08:53 dss2 kernel: [719397.567731] INFO: task mongod:23239
> blocked for more than 120 seconds.
> Sep 15 17:08:53 dss2 kernel: [719397.567870] INFO: task mongod:23240
> blocked for more than 120 seconds.
>
> Nov 25 17:10:56 dss2 kernel: [282360.865020] INFO: task jbd2/dm-4-8:878
> blocked for more than 120 seconds.
> Nov 25 17:10:56 dss2 kernel: [282360.865624] INFO: task mongod:1652
> blocked for more than 120 seconds.
> Nov 25 17:10:56 dss2 kernel: [282360.866279] INFO: task mongod:1655
> blocked for more than 120 seconds.
> Nov 25 17:10:56 dss2 kernel: [282360.866771] INFO: task mongod:1656
> blocked for more than 120 seconds.
> Nov 25 17:10:56 dss2 kernel: [282360.867294] INFO: task dmeventd:3504
> blocked for more than 120 seconds.
> Nov 25 17:10:56 dss2 kernel: [282360.867783] INFO: task
> kworker/u16:2:8016 blocked for more than 120 seconds.
> Nov 25 17:10:56 dss2 kernel: [282360.868351] INFO: task tar:3560 blocked
> for more than 120 seconds.
> Nov 25 17:10:56 dss2 kernel: [282360.868865] INFO: task
> kworker/u16:1:5561 blocked for more than 120 seconds.
> Nov 25 17:12:56 dss2 kernel: [282480.868656] INFO: task jbd2/dm-4-8:878
> blocked for more than 120 seconds.
> Nov 25 17:12:56 dss2 kernel: [282480.869015] INFO: task mongod:1652
> blocked for more than 120 seconds.
>
> Full syslogs of two occurrences are attached.
>
> I have Ubuntu 16.04.1 (lvm2 2.02.133-1ubuntu10) with mongod 3.2.9 on a
> 64bit system.
>
Please switch to newer version of lvm2.
Sequence with snapshot activation had been reworked to minimize possibility
to hit this kernel race - race is still there even with the latest kernel,
but in the real world you should not have a much chance to hit it.
If you still do - please report again - we will take more closer look and your
workflow.
Regards
Zdenek
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [linux-lvm] deadlock during lvm snapshot
2016-11-28 11:58 [linux-lvm] deadlock during lvm snapshot Tomaz Beltram
2016-11-29 8:38 ` Zdenek Kabelac
@ 2016-11-29 10:28 ` Tomaz Beltram
2016-11-29 15:59 ` [linux-lvm] deadlock during lvm snapshot target Zdenek Kabelac
1 sibling, 1 reply; 4+ messages in thread
From: Tomaz Beltram @ 2016-11-29 10:28 UTC (permalink / raw)
To: linux-lvm
[-- Attachment #1: Type: text/plain, Size: 464 bytes --]
On 29. 11. 2016 09:38, Zdenek Kabelac wrote:
>
> Please switch to newer version of lvm2.
>
> Sequence with snapshot activation had been reworked to minimize
> possibility to hit this kernel race - race is still there even with
> the latest kernel, but in the real world you should not have a much chance to hit it.
Thanks for the explanation. I reported this to Ubuntu launchpad. Please
add your votes.
https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1645636
[-- Attachment #2: Type: text/html, Size: 1164 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [linux-lvm] deadlock during lvm snapshot target
2016-11-29 10:28 ` Tomaz Beltram
@ 2016-11-29 15:59 ` Zdenek Kabelac
0 siblings, 0 replies; 4+ messages in thread
From: Zdenek Kabelac @ 2016-11-29 15:59 UTC (permalink / raw)
To: LVM general discussion and development
Dne 29.11.2016 v 11:28 Tomaz Beltram napsal(a):
> On 29. 11. 2016 09:38, Zdenek Kabelac wrote:
>>
>> Please switch to newer version of lvm2.
>>
>> Sequence with snapshot activation had been reworked to minimize possibility
>> to hit this kernel race - race is still there even with the latest kernel, but in the real world you should not have a much chance to hit it.
> Thanks for the explanation. I reported this to Ubuntu launchpad. Please add
> your votes.
>
> https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1645636
>
Yep just a minor comment - it's not 'fixed' - but the chance to hit the race
have been considerably decreased - but we still have an example where it could
be demonstrated (and it's not just 'dm' - it's more generic - see Mikulas's post)
Regards
Zdenek
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2016-11-29 15:59 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-28 11:58 [linux-lvm] deadlock during lvm snapshot Tomaz Beltram
2016-11-29 8:38 ` Zdenek Kabelac
2016-11-29 10:28 ` Tomaz Beltram
2016-11-29 15:59 ` [linux-lvm] deadlock during lvm snapshot target Zdenek Kabelac
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).