From: Jim Rees <rees@umich.edu>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Peng Tao <bergwolf@gmail.com>,
William Andros Adamson <andros@netapp.com>,
Christoph Hellwig <hch@infradead.org>,
linux-nfs@vger.kernel.org, peter honeyman <honey@citi.umich.edu>
Subject: Re: [PATCH v4 00/27] add block layout driver to pnfs client
Date: Mon, 1 Aug 2011 22:21:44 -0400
Message-ID: <20110802022144.GA18157@merit.edu>
In-Reply-To: <1312238117.23392.19.camel@lade.trondhjem.org>

Trond Myklebust wrote:

  On Mon, 2011-08-01 at 17:10 -0400, Trond Myklebust wrote:
  > Looking at the callback code, I see that if tbl->highest_used_slotid !=
  > 0, then we BUG() while holding the backchannel's tbl->slot_tbl_lock
  > spinlock. That seems a likely candidate for the above hang.
  >
  > Andy, how are we guaranteed that tbl->highest_used_slotid won't take
  > values other than 0, and why do we commit suicide when it does? As far
  > as I can see, there is no guarantee that we call nfs4_cb_take_slot() in
  > nfs4_callback_compound(), however we appear to unconditionally call
  > nfs4_cb_free_slot() provided there is a session.
  >
  > The other strangeness would be the fact that there is nothing enforcing
  > the NFS4_SESSION_DRAINING flag. If the session is draining, then the
  > back-channel simply ignores that and goes ahead with processing the
  > callback. Is this to avoid deadlocks with the server returning
  > NFS4ERR_BACK_CHAN_BUSY when the client does a DESTROY_SESSION?

  How about something like the following?

I applied this patch, along with Andy's htonl correction. It now fails in a
different way, with a deadlock. The test runs several processes in
parallel.

INFO: task t_mtab:1767 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
t_mtab D 0000000000000000 0 1767 1634 0x00000080
ffff8800376afd48 0000000000000086 ffff8800376afcd8 ffffffff00000000
ffff8800376ae010 ffff880037ef4500 0000000000012c80 ffff8800376affd8
ffff8800376affd8 0000000000012c80 ffffffff81a0c020 ffff880037ef4500
Call Trace:
[<ffffffff8145411a>] __mutex_lock_common+0x110/0x171
[<ffffffff81454191>] __mutex_lock_slowpath+0x16/0x18
[<ffffffff81454257>] mutex_lock+0x1e/0x32
[<ffffffff811169a2>] kern_path_create+0x75/0x11e
[<ffffffff810fe836>] ? kmem_cache_alloc+0x5f/0xf1
[<ffffffff812127d9>] ? strncpy_from_user+0x43/0x72
[<ffffffff81114077>] ? getname_flags+0x158/0x1d2
[<ffffffff81116a86>] user_path_create+0x3b/0x52
[<ffffffff81117466>] sys_linkat+0x9a/0x120
[<ffffffff8109932e>] ? audit_syscall_entry+0x119/0x145
[<ffffffff81117505>] sys_link+0x19/0x1c
[<ffffffff8145b612>] system_call_fastpath+0x16/0x1b
INFO: task t_mtab:1768 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
t_mtab D 0000000000000000 0 1768 1634 0x00000080
ffff880037ccbc18 0000000000000082 ffff880037ccbbe8 ffffffff00000000
ffff880037cca010 ffff880037ef2e00 0000000000012c80 ffff880037ccbfd8
ffff880037ccbfd8 0000000000012c80 ffffffff81a0c020 ffff880037ef2e00
Call Trace:
[<ffffffff8145411a>] __mutex_lock_common+0x110/0x171
[<ffffffff81454191>] __mutex_lock_slowpath+0x16/0x18
[<ffffffff81454257>] mutex_lock+0x1e/0x32
[<ffffffff8111565d>] ? walk_component+0x362/0x38f
[<ffffffff811e7b9a>] ima_file_check+0x53/0x111
[<ffffffff81115ae0>] do_last+0x456/0x566
[<ffffffff81114467>] ? path_init+0x179/0x2b8
[<ffffffff81116148>] path_openat+0xca/0x30e
[<ffffffff8111647b>] do_filp_open+0x38/0x84
[<ffffffff812127d9>] ? strncpy_from_user+0x43/0x72
[<ffffffff81120014>] ? alloc_fd+0x76/0x11f
[<ffffffff81109696>] do_sys_open+0x6e/0x100
[<ffffffff81109751>] sys_open+0x1b/0x1d
[<ffffffff8145b612>] system_call_fastpath+0x16/0x1b
INFO: task t_mtab:1767 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
t_mtab D 0000000000000000 0 1767 1634 0x00000080
ffff8800376afc18 0000000000000086 ffff8800376afbe8 ffffffff00000000
ffff8800376ae010 ffff880037ef4500 0000000000012c80 ffff8800376affd8
ffff8800376affd8 0000000000012c80 ffffffff81a0c020 ffff880037ef4500
Call Trace:
[<ffffffff8145411a>] __mutex_lock_common+0x110/0x171
[<ffffffff81454191>] __mutex_lock_slowpath+0x16/0x18
[<ffffffff81454257>] mutex_lock+0x1e/0x32
[<ffffffff8111565d>] ? walk_component+0x362/0x38f
[<ffffffff811e7b9a>] ima_file_check+0x53/0x111
[<ffffffff81115ae0>] do_last+0x456/0x566
[<ffffffff81114467>] ? path_init+0x179/0x2b8
[<ffffffff81116148>] path_openat+0xca/0x30e
[<ffffffff8111647b>] do_filp_open+0x38/0x84
[<ffffffff812127d9>] ? strncpy_from_user+0x43/0x72
[<ffffffff81120014>] ? alloc_fd+0x76/0x11f
[<ffffffff81109696>] do_sys_open+0x6e/0x100
[<ffffffff81109751>] sys_open+0x1b/0x1d
[<ffffffff8145b612>] system_call_fastpath+0x16/0x1b
INFO: task t_mtab:1768 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
t_mtab D 0000000000000000 0 1768 1634 0x00000080
ffff880037ccbc68 0000000000000082 0000000000000000 0000000000000000
ffff880037cca010 ffff880037ef2e00 0000000000012c80 ffff880037ccbfd8
ffff880037ccbfd8 0000000000012c80 ffffffff81a0c020 ffff880037ef2e00
Call Trace:
[<ffffffff8145411a>] __mutex_lock_common+0x110/0x171
[<ffffffffa0267ca5>] ? nfs_permission+0xd7/0x168 [nfs]
[<ffffffff81454191>] __mutex_lock_slowpath+0x16/0x18
[<ffffffff81454257>] mutex_lock+0x1e/0x32
[<ffffffff8111581e>] do_last+0x194/0x566
[<ffffffff81114467>] ? path_init+0x179/0x2b8
[<ffffffff81116148>] path_openat+0xca/0x30e
[<ffffffffa028d8fd>] ? __nfs4_close+0xf4/0x101 [nfs]
[<ffffffff8111647b>] do_filp_open+0x38/0x84
[<ffffffff812127d9>] ? strncpy_from_user+0x43/0x72
[<ffffffff81120014>] ? alloc_fd+0x76/0x11f
[<ffffffff81109696>] do_sys_open+0x6e/0x100
[<ffffffff81109751>] sys_open+0x1b/0x1d
[<ffffffff8145b612>] system_call_fastpath+0x16/0x1b
INFO: task t_mtab:1767 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
t_mtab D 0000000000000000 0 1767 1634 0x00000080
ffff8800376afd48 0000000000000086 ffff8800376afcd8 ffffffff00000000
ffff8800376ae010 ffff880037ef4500 0000000000012c80 ffff8800376affd8
ffff8800376affd8 0000000000012c80 ffffffff81a0c020 ffff880037ef4500
Call Trace:
[<ffffffff8145411a>] __mutex_lock_common+0x110/0x171
[<ffffffff81454191>] __mutex_lock_slowpath+0x16/0x18
[<ffffffff81454257>] mutex_lock+0x1e/0x32
[<ffffffff811169a2>] kern_path_create+0x75/0x11e
[<ffffffff810fe836>] ? kmem_cache_alloc+0x5f/0xf1
[<ffffffff812127d9>] ? strncpy_from_user+0x43/0x72
[<ffffffff81114077>] ? getname_flags+0x158/0x1d2
[<ffffffff81116a86>] user_path_create+0x3b/0x52
[<ffffffff81117466>] sys_linkat+0x9a/0x120
[<ffffffff8109932e>] ? audit_syscall_entry+0x119/0x145
[<ffffffff81117505>] sys_link+0x19/0x1c
[<ffffffff8145b612>] system_call_fastpath+0x16/0x1b
INFO: task t_mtab:1769 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
t_mtab D 0000000000000000 0 1769 1634 0x00000080
ffff88006c2d1c18 0000000000000082 ffff88006c2d1be8 ffffffff00000000
ffff88006c2d0010 ffff880037ef0000 0000000000012c80 ffff88006c2d1fd8
ffff88006c2d1fd8 0000000000012c80 ffffffff81a0c020 ffff880037ef0000
Call Trace:
[<ffffffff8145411a>] __mutex_lock_common+0x110/0x171
[<ffffffff81454191>] __mutex_lock_slowpath+0x16/0x18
[<ffffffff81454257>] mutex_lock+0x1e/0x32
[<ffffffff8111565d>] ? walk_component+0x362/0x38f
[<ffffffff811e7b9a>] ima_file_check+0x53/0x111
[<ffffffff81115ae0>] do_last+0x456/0x566
[<ffffffff81114467>] ? path_init+0x179/0x2b8
[<ffffffff81116148>] path_openat+0xca/0x30e
[<ffffffff8111647b>] do_filp_open+0x38/0x84
[<ffffffff812127d9>] ? strncpy_from_user+0x43/0x72
[<ffffffff81120014>] ? alloc_fd+0x76/0x11f
[<ffffffff81109696>] do_sys_open+0x6e/0x100
[<ffffffff81109751>] sys_open+0x1b/0x1d
[<ffffffff8145b612>] system_call_fastpath+0x16/0x1b
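
For anyone following the slot question in the quoted text, here is a minimal
userspace sketch of the pairing being discussed: release the slot only when it
was actually taken, and report an unexpected state after dropping the lock
instead of aborting while holding it. This is not the kernel code or the patch
under test. The struct, the function names, the -1 "unused" sentinel, and the
use of a pthread mutex in place of the spinlock are all assumptions made for
illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a one-slot backchannel table (not the kernel struct). */
struct cb_slot_table {
	pthread_mutex_t lock;     /* stands in for tbl->slot_tbl_lock */
	int highest_used_slotid;  /* assumed sentinel: -1 means no slot in use */
};

#define CB_SLOT_TABLE_INIT { PTHREAD_MUTEX_INITIALIZER, -1 }

/* Claim slot 0. Returns false if the slot is already in use. */
static bool cb_take_slot(struct cb_slot_table *tbl)
{
	bool ok = false;

	pthread_mutex_lock(&tbl->lock);
	if (tbl->highest_used_slotid == -1) {
		tbl->highest_used_slotid = 0;
		ok = true;
	}
	pthread_mutex_unlock(&tbl->lock);
	return ok;
}

/*
 * Release slot 0. An unexpected state is reported to the caller after the
 * lock has been dropped, rather than aborting with the lock still held.
 */
static bool cb_free_slot(struct cb_slot_table *tbl)
{
	bool ok;

	pthread_mutex_lock(&tbl->lock);
	ok = (tbl->highest_used_slotid == 0);
	if (ok)
		tbl->highest_used_slotid = -1;
	pthread_mutex_unlock(&tbl->lock);
	return ok;
}

/*
 * Process one callback. The slot is freed only if this call took it, so a
 * compound that never takes the slot never tries to free it.
 * Returns 0 on success, -1 if the slot was busy or in an unexpected state.
 */
static int cb_process(struct cb_slot_table *tbl, bool takes_slot)
{
	if (takes_slot && !cb_take_slot(tbl))
		return -1;

	/* ... the callback operations would run here ... */

	if (takes_slot && !cb_free_slot(tbl))
		return -1;
	return 0;
}
```

In this model, calling cb_process() with takes_slot set to false leaves the
table untouched, and a stray cb_free_slot() on an unused table returns false
instead of taking the whole process down with the lock held.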