From: Paul Nowoczynski <pauln@psc.edu>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Deadlock in usocklnd
Date: Tue, 17 Mar 2009 15:50:52 -0400
Message-ID: <49BFFF1C.8080203@psc.edu>

Hi,
I've got some code which uses ptlrpc and lnet, and a few weeks ago we
integrated usocklnd into our environment. This LND has been working
quite well and seems very efficient. The only issue so far is that we're
hitting a clear case of deadlock when trying to handle a failed
connection. I'm not sure whether this code is actually in use by any Lustre
component yet, but I thought it best to bring this to your attention.
paul


(gdb) bt
#0 0x00002b3872336888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1 0x00002b38723328a5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2 0x00002b3872332333 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000049c9f5 in usocklnd_find_or_create_conn (peer=0x2695270, type=0, connp=0x42803d30, tx=0x2aaad46f5d10, zc_ack=0x0, send_immediately=0x42803d2c) at ..//../..//lnet-lite/ulnds/socklnd/conn.c:841
#4 0x00000000004a2701 in usocklnd_send (ni=0x25e1810, private=0x0, lntmsg=0x2aaacca1ace0) at ..//../..//lnet-lite/ulnds/socklnd/usocklnd_cb.c:124
#5 0x000000000048d886 in lnet_ni_send (ni=0x25e1810, msg=0x2aaacca1ace0) at ..//../..//lnet-lite/lnet/lib-move.c:863
#6 0x000000000048dd20 in lnet_post_send_locked (msg=0x2aaacca1ace0, do_send=1) at ..//../..//lnet-lite/lnet/lib-move.c:951
#7 0x000000000048dfb4 in lnet_return_credits_locked (msg=0x2aaacca1ae40) at ..//../..//lnet-lite/lnet/lib-move.c:1108
#8 0x0000000000494230 in lnet_complete_msg_locked (msg=0x2aaacca1ae40) at ..//../..//lnet-lite/lnet/lib-msg.c:143
#9 0x00000000004944f5 in lnet_finalize (ni=0x25e1810, msg=0x2aaacca1ae40, status=-5) at ..//../..//lnet-lite/lnet/lib-msg.c:242
#10 0x000000000049bff8 in usocklnd_destroy_tx (ni=0x25e1810, tx=0x26950c0) at ..//../..//lnet-lite/ulnds/socklnd/conn.c:562
#11 0x000000000049c02d in usocklnd_destroy_txlist (ni=0x25e1810, txlist=0x2693650) at ..//../..//lnet-lite/ulnds/socklnd/conn.c:574
#12 0x000000000049b09c in usocklnd_tear_peer_conn (conn=0x2692550) at ..//../..//lnet-lite/ulnds/socklnd/conn.c:148
#13 0x000000000049f5c8 in usocklnd_process_stale_list (pt_data=0x25e1c98) at ..//../..//lnet-lite/ulnds/socklnd/poll.c:112
#14 0x000000000049f7b2 in usocklnd_poll_thread (arg=0x25e1c98) at ..//../..//lnet-lite/ulnds/socklnd/poll.c:169
#15 0x0000000000450871 in psc_usklndthr_begin (arg=0x25e4370) at ..//../..//psc_fsutil_libs/psc_util/usklndthr.c:16
#16 0x000000000044f000 in _pscthr_begin (arg=0x7fff389980b0) at ..//../..//psc_fsutil_libs/psc_util/thread.c:237
#17 0x00002b38723302f7 in start_thread () from /lib64/libpthread.so.0
#18 0x00002b387281fe3d in clone () from /lib64/libc.so.6
(gdb) up
#4 0x00000000004a2701 in usocklnd_send (ni=0x25e1810, private=0x0, lntmsg=0x2aaacca1ace0) at ..//../..//lnet-lite/ulnds/socklnd/usocklnd_cb.c:124
124 rc = usocklnd_find_or_create_conn(peer, type, &conn, tx, NULL,
(gdb) print *peer
$4 = {up_list = {next = 0x6e1b08, prev = 0x6e1b08}, up_peerid = {nid = 562954416842067, pid = 2147520945}, up_conns = {0x2692550, 0x0, 0x0},
up_ni = 0x25e1810, up_incarnation = 0, up_incrn_is_set = 0, up_refcount = {counter = 3}, up_lock = {__data = {__lock = 2, __count = 0, __owner = 23283,
__nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\000\000\000\000?Z\000\000\001", '\0' <repeats 26 times>, __align = 2}, up_errored = 0, up_last_alive = 0}
(gdb)

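## We're waiting for a lock which we already hold! up_lock's __owner (23283)
## is the LWP of the very thread that is blocked in pthread_mutex_lock():
* 6 Thread 1115703616 (LWP 23283) 0x00002b3872336888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0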
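
For anyone who wants to see the pattern in isolation, here is a minimal
sketch of a thread re-acquiring a mutex it already owns. It is not the
usocklnd code: struct demo_peer, demo_peer_init(), demo_send() and
demo_tear_down() are hypothetical names invented for illustration, and the
trace above does not show which frame took up_lock first, so the sketch
simply assumes the teardown path holds it. The mutex is created with
PTHREAD_MUTEX_ERRORCHECK so that the second lock attempt returns EDEADLK
rather than hanging the way the default mutex does in the backtrace.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a peer object guarded by its own mutex. */
struct demo_peer {
    pthread_mutex_t lock;
};

/* Create the mutex as error-checking: a relock by the owning thread
 * then fails with EDEADLK instead of blocking forever. */
static int demo_peer_init(struct demo_peer *p)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&p->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Plays the role of a send path that needs the peer lock. */
static int demo_send(struct demo_peer *p)
{
    int rc = pthread_mutex_lock(&p->lock);

    if (rc == 0)
        pthread_mutex_unlock(&p->lock);
    return rc; /* 0 on success, EDEADLK if this thread already owns it */
}

/* Plays the role of a teardown path (assumed to hold the lock) whose
 * completion callback ends up back in the send path on the same thread. */
static int demo_tear_down(struct demo_peer *p)
{
    int rc = pthread_mutex_lock(&p->lock); /* first acquisition */

    assert(rc == 0);
    rc = demo_send(p);                     /* second acquisition attempt */
    pthread_mutex_unlock(&p->lock);
    return rc;
}
```

Calling demo_tear_down() on an initialised demo_peer should return EDEADLK,
while calling demo_send() on its own should return 0. Switching a suspect
lock to the error-checking type in a debug build is one cheap way to turn
a silent hang like the one above into an explicit error at the offending
call site.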

