cluster-devel.redhat.com archive mirror
* [Cluster-devel] Bug on dlm
@ 2007-09-28  7:46 Jordi Prats
  2007-09-28  7:59 ` Patrick Caulfield
  0 siblings, 1 reply; 4+ messages in thread
From: Jordi Prats @ 2007-09-28  7:46 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,
I found this while starting my server. It's an F7 with the latest version available.

Hope this helps :)

Jordi

Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: recover 1
Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: add member 2
Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: total members 1 error 0
Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: dlm_recover_directory
Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: dlm_recover_directory 0 entries
Jul 26 23:52:51 inf18 kernel:
Jul 26 23:52:51 inf18 kernel: =====================================
Jul 26 23:52:51 inf18 kernel: [ BUG: bad unlock balance detected! ]
Jul 26 23:52:51 inf18 kernel: -------------------------------------
Jul 26 23:52:51 inf18 kernel: dlm_recoverd/2963 is trying to release lock (&ls->ls_in_recovery) at:
Jul 26 23:52:51 inf18 kernel: [<ee67b874>] dlm_recoverd+0x265/0x433 [dlm]
Jul 26 23:52:51 inf18 kernel: but there are no more locks to release!
Jul 26 23:52:51 inf18 kernel:
Jul 26 23:52:51 inf18 kernel: other info that might help us debug this:
Jul 26 23:52:51 inf18 kernel: 2 locks held by dlm_recoverd/2963:
Jul 26 23:52:51 inf18 kernel:  #0:  (&ls->ls_recoverd_active){....}, at: [<c11f9e7d>] mutex_lock+0x21/0x24
Jul 26 23:52:51 inf18 kernel:  #1:  (&ls->ls_recover_lock){....}, at: [<ee67b84d>] dlm_recoverd+0x23e/0x433 [dlm]
Jul 26 23:52:51 inf18 kernel:
Jul 26 23:52:51 inf18 kernel: stack backtrace:
Jul 26 23:52:51 inf18 kernel:  [<c1005e4a>] show_trace_log_lvl+0x1a/0x2f
Jul 26 23:52:51 inf18 kernel:  [<c10063fc>] show_trace+0x12/0x14
Jul 26 23:52:51 inf18 kernel:  [<c1006480>] dump_stack+0x16/0x18
Jul 26 23:52:51 inf18 kernel:  [<c1037321>] print_unlock_inbalance_bug+0xec/0xf9
Jul 26 23:52:51 inf18 kernel:  [<c10381e6>] lock_release_non_nested+0x95/0x150
Jul 26 23:52:51 inf18 kernel:  [<c10383ed>] lock_release+0x14c/0x189
Jul 26 23:52:51 inf18 kernel:  [<c1033bfc>] up_write+0x16/0x2b
Jul 26 23:52:51 inf18 kernel:  [<ee67b874>] dlm_recoverd+0x265/0x433 [dlm]
Jul 26 23:52:52 inf18 kernel:  [<c1030e47>] kthread+0xb3/0xdc
Jul 26 23:52:52 inf18 kernel:  [<c1005967>] kernel_thread_helper+0x7/0x10
Jul 26 23:52:52 inf18 kernel:  =======================
Jul 26 23:52:52 inf18 kernel: dlm: rgmanager: recover 1 done: 4 ms
Jul 26 23:53:00 inf18 clurgmgrd[2949]: <notice> Resource Group Manager Starting

-- 
......................................................................
         __
        / /          Jordi Prats
  C E / S / C A      Dept. de Sistemes
        /_/            Centre de Supercomputació de Catalunya

  Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
  T. 93 205 6464 · F. 93 205 6979 · jprats at cesca.es
...................................................................... 




* [Cluster-devel] Bug on dlm
  2007-09-28  7:46 [Cluster-devel] Bug on dlm Jordi Prats
@ 2007-09-28  7:59 ` Patrick Caulfield
  2007-09-28 13:37   ` Jordi Prats
  0 siblings, 1 reply; 4+ messages in thread
From: Patrick Caulfield @ 2007-09-28  7:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Jordi Prats wrote:
> Hi,
> I found this while starting my server. It's an F7 with the latest
> version available.
>
> Hope this helps :)
>
> Jordi
>
> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: recover 1
> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: add member 2
> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: total members 1 error 0
> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: dlm_recover_directory
> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: dlm_recover_directory 0
> entries
> Jul 26 23:52:51 inf18 kernel:
> Jul 26 23:52:51 inf18 kernel: =====================================
> Jul 26 23:52:51 inf18 kernel: [ BUG: bad unlock balance detected! ]
> Jul 26 23:52:51 inf18 kernel: -------------------------------------
> Jul 26 23:52:51 inf18 kernel: dlm_recoverd/2963 is trying to release
> lock (&ls->ls_in_recovery) at:
> Jul 26 23:52:51 inf18 kernel: [<ee67b874>] dlm_recoverd+0x265/0x433 [dlm]
> Jul 26 23:52:51 inf18 kernel: but there are no more locks to release!
> Jul 26 23:52:51 inf18 kernel:

Yeah, we know about it. It's not actually a bug, just the lockdep checking code
being a little over-enthusiastic. Unfortunately there aren't any annotations
available to make it quiet either.
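
The pattern it trips over looks roughly like the sketch below (a minimal illustration, not the actual dlm code): the ls_in_recovery rw_semaphore gets write-locked in one task and released from dlm_recoverd, and lockdep's per-task owner tracking has no way to express that handoff -- which is exactly why the report shows dlm_recoverd releasing a lock it never recorded acquiring.

#include <linux/rwsem.h>

/*
 * Minimal sketch of a cross-task rwsem handoff -- illustrative only,
 * not the dlm source.  Lockdep records the acquiring task as the
 * owner, so when a different task releases the semaphore it appears
 * to hold nothing and "bad unlock balance" fires, even though the
 * handoff is intentional.
 */
static DECLARE_RWSEM(in_recovery);

/* Task A: suspend normal lock activity while recovery runs. */
static void suspend_locking(void)
{
	down_write(&in_recovery);	/* lockdep: task A owns the rwsem */
}

/* Task B (think dlm_recoverd): re-enable locking once recovery is done. */
static void resume_locking(void)
{
	up_write(&in_recovery);		/* task B holds nothing -> warning */
}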

The trick is to live with it, or to use kernels that have a little less
debugging compiled in, which you would want to do for production anyway :)


Patrick




* [Cluster-devel] Bug on dlm
  2007-09-28  7:59 ` Patrick Caulfield
@ 2007-09-28 13:37   ` Jordi Prats
  2007-09-28 13:51     ` Patrick Caulfield
  0 siblings, 1 reply; 4+ messages in thread
From: Jordi Prats @ 2007-09-28 13:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,
Could this bug be causing this?


[root at inf17 ~]# clustat
Member Status: Inquorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  inf17                                 1 Online, Local
  inf18                                 2 Offline
  inf19                                 3 Offline


[root at inf18 ~]# clustat
Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  inf17                                 1 Online
  inf18                                 2 Online, Local
  inf19                                 3 Offline


[root at inf17 ~]# group_tool
type             level name       id       state
fence            0     default    00010001 JOIN_START_WAIT
[1]
dlm              1     rgmanager  00020001 JOIN_ALL_STOPPED
[1]

[root at inf18 ~]# group_tool
type             level name       id       state
fence            0     default    00000000 JOIN_STOP_WAIT
[1 2]
dlm              1     rgmanager  00010002 JOIN_START_WAIT
[2]

[root at inf17 ~]# cman_tool status
Version: 6.0.1
Config Version: 4
Cluster Name: boumort
Cluster Id: 13356
Cluster Member: Yes
Cluster Generation: 3824
Membership state: Cluster-Member
Nodes: 1
Expected votes: 2
Total votes: 1
Quorum: 2 Activity blocked
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: inf17
Node ID: 1
Multicast addresses: 239.192.52.96
Node addresses: 192.168.22.17


[root at inf18 ~]# cman_tool status
Version: 6.0.1
Config Version: 4
Cluster Name: boumort
Cluster Id: 13356
Cluster Member: Yes
Cluster Generation: 3820
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0 177
Node name: inf18
Node ID: 2
Multicast addresses: 239.192.52.96
Node addresses: 192.168.22.18
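
If I understand the quorum arithmetic right (quorum = expected_votes/2 + 1, integer division, so 2 votes needed here), inf17 only counts its own vote, which is why it reports Inquorate and blocks activity, while inf18 still sees both nodes. Note also the two nodes report different cluster generations (3824 vs 3820), so they seem to be in separate partitions. A quick standalone sketch of that rule (my reading of it, not cman's source):

#include <stdio.h>

/* Sketch of the usual cman quorum rule -- illustrative only. */
static int quorum(int expected_votes)
{
	return expected_votes / 2 + 1;	/* 2/2 + 1 = 2, matching "Quorum: 2" */
}

int main(void)
{
	int inf17_votes = 1;	/* inf17 only counts itself (Total votes: 1) */
	int inf18_votes = 2;	/* inf18 sees inf17 and itself (Total votes: 2) */

	printf("quorum needed: %d\n", quorum(2));
	printf("inf17: %s\n", inf17_votes >= quorum(2)
	       ? "quorate" : "inquorate, activity blocked");
	printf("inf18: %s\n", inf18_votes >= quorum(2)
	       ? "quorate" : "inquorate");
	return 0;
}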


Patrick Caulfield wrote:
> Jordi Prats wrote:
>> Hi,
>> I found this while starting my server. It's an F7 with the latest
>> version available.
>>
>> Hope this helps :)
>>
>> Jordi
>>
>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: recover 1
>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: add member 2
>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: total members 1 error 0
>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: dlm_recover_directory
>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: dlm_recover_directory 0
>> entries
>> Jul 26 23:52:51 inf18 kernel:
>> Jul 26 23:52:51 inf18 kernel: =====================================
>> Jul 26 23:52:51 inf18 kernel: [ BUG: bad unlock balance detected! ]
>> Jul 26 23:52:51 inf18 kernel: -------------------------------------
>> Jul 26 23:52:51 inf18 kernel: dlm_recoverd/2963 is trying to release
>> lock (&ls->ls_in_recovery) at:
>> Jul 26 23:52:51 inf18 kernel: [<ee67b874>] dlm_recoverd+0x265/0x433 [dlm]
>> Jul 26 23:52:51 inf18 kernel: but there are no more locks to release!
>> Jul 26 23:52:51 inf18 kernel:
> 
> Yeah, we know about it. It's not actually a bug, just the lockdep checking code
> being a little over-enthusiastic. Unfortunately there aren't any annotations
> available to make it quiet either.
> 
> The trick is to live with it, or to use kernels that have a little less
> debugging compiled in, which you would want to do for production anyway :)
> 
> 
> Patrick
> 
> 




* [Cluster-devel] Bug on dlm
  2007-09-28 13:37   ` Jordi Prats
@ 2007-09-28 13:51     ` Patrick Caulfield
  0 siblings, 0 replies; 4+ messages in thread
From: Patrick Caulfield @ 2007-09-28 13:51 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Jordi Prats wrote:
> Hi,
> Could this bug be causing this?
> 
> 
> [root at inf17 ~]# clustat
> Member Status: Inquorate
> 
>  Member Name                        ID   Status
>  ------ ----                        ---- ------
>  inf17                                 1 Online, Local
>  inf18                                 2 Offline
>  inf19                                 3 Offline
> 
> 
> [root at inf18 ~]# clustat
> Member Status: Quorate
> 
>  Member Name                        ID   Status
>  ------ ----                        ---- ------
>  inf17                                 1 Online
>  inf18                                 2 Online, Local
>  inf19                                 3 Offline
> 
> 
> [root at inf17 ~]# group_tool
> type             level name       id       state
> fence            0     default    00010001 JOIN_START_WAIT
> [1]
> dlm              1     rgmanager  00020001 JOIN_ALL_STOPPED
> [1]
> 
> [root at inf18 ~]# group_tool
> type             level name       id       state
> fence            0     default    00000000 JOIN_STOP_WAIT
> [1 2]
> dlm              1     rgmanager  00010002 JOIN_START_WAIT
> [2]


No, that's misconfigured fencing.
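
For reference, every node needs a usable <fence> section in cluster.conf, something along these lines (a sketch only -- the device name, agent, address and credentials below are placeholders, not taken from your cluster):

<?xml version="1.0"?>
<cluster name="boumort" config_version="4">
  <clusternodes>
    <clusternode name="inf17" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <!-- placeholder device/port; substitute your real fence hardware -->
          <device name="apc1" port="17"/>
        </method>
      </fence>
    </clusternode>
    <!-- inf18 and inf19 need equivalent <fence> sections -->
  </clusternodes>
  <fencedevices>
    <!-- fence_apc is only an example agent; pick the one for your hardware -->
    <fencedevice name="apc1" agent="fence_apc"
                 ipaddr="192.168.22.250" login="apc" passwd="apc"/>
  </fencedevices>
</cluster>

Check fenced(8) and the man page for your fence agent for the parameters your devices actually take.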

> [root at inf17 ~]# cman_tool status
> Version: 6.0.1
> Config Version: 4
> Cluster Name: boumort
> Cluster Id: 13356
> Cluster Member: Yes
> Cluster Generation: 3824
> Membership state: Cluster-Member
> Nodes: 1
> Expected votes: 2
> Total votes: 1
> Quorum: 2 Activity blocked
> Active subsystems: 7
> Flags:
> Ports Bound: 0
> Node name: inf17
> Node ID: 1
> Multicast addresses: 239.192.52.96
> Node addresses: 192.168.22.17
> 
> 
> [root at inf18 ~]# cman_tool status
> Version: 6.0.1
> Config Version: 4
> Cluster Name: boumort
> Cluster Id: 13356
> Cluster Member: Yes
> Cluster Generation: 3820
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 2
> Total votes: 2
> Quorum: 2
> Active subsystems: 7
> Flags:
> Ports Bound: 0 177
> Node name: inf18
> Node ID: 2
> Multicast addresses: 239.192.52.96
> Node addresses: 192.168.22.18
> 
> 
> Patrick Caulfield wrote:
>> Jordi Prats wrote:
>>> Hi,
>>> I found this while starting my server. It's an F7 with the latest
>>> version available.
>>>
>>> Hope this helps :)
>>>
>>> Jordi
>>>
>>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: recover 1
>>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: add member 2
>>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: total members 1 error 0
>>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: dlm_recover_directory
>>> Jul 26 23:52:51 inf18 kernel: dlm: rgmanager: dlm_recover_directory 0
>>> entries
>>> Jul 26 23:52:51 inf18 kernel:
>>> Jul 26 23:52:51 inf18 kernel: =====================================
>>> Jul 26 23:52:51 inf18 kernel: [ BUG: bad unlock balance detected! ]
>>> Jul 26 23:52:51 inf18 kernel: -------------------------------------
>>> Jul 26 23:52:51 inf18 kernel: dlm_recoverd/2963 is trying to release
>>> lock (&ls->ls_in_recovery) at:
>>> Jul 26 23:52:51 inf18 kernel: [<ee67b874>] dlm_recoverd+0x265/0x433
>>> [dlm]
>>> Jul 26 23:52:51 inf18 kernel: but there are no more locks to release!
>>> Jul 26 23:52:51 inf18 kernel:
>>
>> Yeah, we know about it. It's not actually a bug, just the lockdep
>> checking code
>> being a little over-enthusiastic. Unfortunately there aren't any
>> annotations
>> available to make it quiet either.
>>
>> The trick is to live with it, or to use kernels that have a little less
>> debugging compiled in, which you would want to do for production
>> anyway :)
>>
>>
>> Patrick
>>
>>
> 


-- 
Patrick

Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street,
Windsor, Berkshire, SL4 ITE, UK.
Registered in England and Wales under Company Registration No. 3798903



