Hi,

I was the one who initially stumbled onto this problem, and when I 
realized that it was an ABBA deadlock, I approached Chandra and Mike.
Chandra came up with the initial fix, and it worked fine.
 
Later Chandra pointed me to this patch, and when I tried it, I ran 
into a system hang.

Please note that I am running it on an -rt kernel based on 2.6.24. I could 
not apply the patch directly, so I ported it to my kernel.
I am attaching the ported version (new_dm_patch). Chandra has already 
reviewed this ported patch.

I ran an IO stress test with this patch while one of the paths was 
constantly bounced. I bounced the same path every time, with 20 minutes 
between bounces.
The system hung a few hours into the test, and I forced a dump. I am still 
analyzing the dump.
If you want the dump, please let me know where I can upload it. 
The core is around 8G.

Here are a few things from the dump that might be interesting:

crash> ps |grep udev | wc -l
425     <<< 425 udevd threads at the time of the hang.
crash> ps | wc -l
782
crash> foreach bt| grep rt_mutex_slowlock | wc -l
416     <<< 416 of the 782 total threads are waiting for a lock.
crash>
crash> struct rt_mutex ffff81024ec6cca0
struct rt_mutex {
  wait_lock = {
    raw_lock = {
      slock = 49858
    },
    break_lock = 0
  },
  wait_list = {
    prio_list = {
      next = 0xffff8101007f5ae0,
      prev = 0xffff8101007f5ae0
    },
    node_list = {
      next = 0xffff8101007f5af0,
      prev = 0xffff81007cb51af0
    }
  },
  owner = 0xffff8102400c2b22
}
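The following task holds the lock that many of the other udevd threads are 
waiting for (the owner field above, 0xffff8102400c2b22, is its task pointer 
with the low flag bits set). Note that the holder is itself sleeping in 
io_schedule(), waiting for a buffer read in ext3_find_entry().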
PID: 21896  TASK: ffff8102400c2b20  CPU: 6   COMMAND: "udevd" <<< holding the lock
 #0 [ffff810100667848] schedule at ffffffff8128531c
 #1 [ffff810100667900] io_schedule at ffffffff81285859
 #2 [ffff810100667920] sync_buffer at ffffffff810d1fc1
 #3 [ffff810100667930] __wait_on_bit at ffffffff81285ad1
 #4 [ffff810100667970] out_of_line_wait_on_bit at ffffffff81285b71
 #5 [ffff8101006679e0] __wait_on_buffer at ffffffff810d1f41
 #6 [ffff8101006679f0] ext3_find_entry at ffffffff8803c1d2
 #7 [ffff810100667b60] ext3_lookup at ffffffff8803dbae
 #8 [ffff810100667ba0] do_lookup at ffffffff810b78cf
 #9 [ffff810100667bf0] __link_path_walk at ffffffff810b94ef
#10 [ffff810100667c90] link_path_walk at ffffffff810b9f99
#11 [ffff810100667d60] path_walk at ffffffff810ba04b
#12 [ffff810100667d70] do_path_lookup at ffffffff810ba352
#13 [ffff810100667dc0] __path_lookup_intent_open at ffffffff810bae88
#14 [ffff810100667e10] path_lookup_open at ffffffff810baf38
#15 [ffff810100667e20] open_exec at ffffffff810b41e3
#16 [ffff810100667ed0] do_execve at ffffffff810b53a2
#17 [ffff810100667f20] sys_execve at ffffffff8100ac30
#18 [ffff810100667f50] stub_execve at ffffffff8100c5c7

PID: 21946  TASK: ffff81007f090b20  CPU: 4   COMMAND: "udevd" <<< one of the udevd threads waiting for the lock
 #0 [ffff81007f0e59f8] schedule at ffffffff8128531c
 #1 [ffff81007f0e5ab0] rt_mutex_slowlock at ffffffff81286a95
 #2 [ffff81007f0e5b80] rt_mutex_lock at ffffffff81285f84
 #3 [ffff81007f0e5b90] _mutex_lock at ffffffff812873f9
 #4 [ffff81007f0e5ba0] do_lookup at ffffffff810b788b
 #5 [ffff81007f0e5bf0] __link_path_walk at ffffffff810b94ef
 #6 [ffff81007f0e5c90] link_path_walk at ffffffff810b9f99
 #7 [ffff81007f0e5d60] path_walk at ffffffff810ba04b
 #8 [ffff81007f0e5d70] do_path_lookup at ffffffff810ba352
 #9 [ffff81007f0e5dc0] __path_lookup_intent_open at ffffffff810bae88
#10 [ffff81007f0e5e10] path_lookup_open at ffffffff810baf38
#11 [ffff81007f0e5e20] open_exec at ffffffff810b41e3
#12 [ffff81007f0e5ed0] do_execve at ffffffff810b53a2
#13 [ffff81007f0e5f20] sys_execve at ffffffff8100ac30
#14 [ffff81007f0e5f50] stub_execve at ffffffff8100c5c7
   

Thanks,
Venkateswararao Jujjuri (JV)
Realtime Team, LTC,
Beaverton, OR 97006
>
> ------------------------------------------------------------------------
>
> Subject:
> [PATCHES] new solution for dm_any_congested crash
> From:
> Mikulas Patocka <mpatocka@redhat.com>
> Date:
> Thu, 13 Nov 2008 20:55:27 -0500 (EST)
> To:
> Alasdair G Kergon <agk@redhat.com>, Chandra Seetharaman 
> <sekharan@us.ibm.com>
> CC:
> dm-devel@redhat.com, Milan Broz <mbroz@redhat.com>
>
>
> Hi
>
> Chandra's patch was correct, but the problem is more serious (the same 
> crash could happen in dm_merge_bvec, dm_unplug_all or in other dm 
> places), so I had to rework the reference counting.
>
> There are three patches:
> 1. reverts Chandra's changes
> 2. just a little swap of two calls, to prepare for the third
> 3. the reference counting rework
>
> Chandra, please test the patches on your system (without your original 
> patch) and verify that they avoid the crashes as well as your patch does.
>
> Mikulas