* return layout on error, BUG/deadlock
@ 2012-08-09 13:03 Idan Kedar
2012-08-09 14:05 ` Myklebust, Trond
0 siblings, 1 reply; 12+ messages in thread
From: Idan Kedar @ 2012-08-09 13:03 UTC (permalink / raw)
To: Boaz Harrosh, NFS list; +Cc: Benny Halevy
Hi,
As a result of some experiments, I wanted to see what happens when I
inject an error (hard coded) to the object layout driver. the patch is
at the bottom of this mail. the reason I did this is because when I
inject errors in my modified version of the object layout driver, I
get the same BUG Tigran reported about yesterday:
nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs));
In my modified version (based on kernel 3.3), the bug seems to be that
pnfs_ld_write_done calls pnfs_return_layout in the error path, even if
there is in-flight I/O.
I wanted to see if this is a result of my modifications, so I injected
errors to the ORE (on kernel 3.5) and ran Connectathon's basic tests,
and got a deadlock AND that BUG:
[ 112.659066] =================================
[ 112.659066] [ INFO: inconsistent lock state ]
[ 112.659066] 3.5.0-nfsobj+ #35 Not tainted
[ 112.659066] ---------------------------------
[ 112.659066] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[ 112.659066] kworker/0:2/456 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 112.659066] (&(&objlay->lock)->rlock){+.?...}, at:
[<ffffffffa02558fc>] objlayout_encode_layoutreturn+0x6c/0x420
[objlayoutdriver]
[ 112.659066] {IN-SOFTIRQ-W} state was registered at:
[ 112.659066] [<ffffffff810b2bd8>] __lock_acquire+0x5e8/0x1bb0
[ 112.659066] [<ffffffff810b47c2>] lock_acquire+0x92/0x120
[ 112.659066] [<ffffffff81713e16>] _raw_spin_lock+0x36/0x70
[ 112.659066] [<ffffffffa025622d>]
objlayout_iodone.part.1+0x25/0x56 [objlayoutdriver]
[ 112.659066] [<ffffffffa025573b>] objlayout_write_done+0xcb/0xd0
[objlayoutdriver]
[ 112.659066] [<ffffffffa02541c3>] _write_done+0x43/0x60 [objlayoutdriver]
[ 112.659066] [<ffffffffa023a013>] _last_io+0x13/0x20 [libore]
[ 112.659066] [<ffffffffa023a868>] _done_io+0x28/0x30 [libore]
[ 112.659066] [<ffffffffa0229f94>] osd_request_async_done+0xd4/0xf0 [libosd]
[ 112.659066] [<ffffffff812a4a38>] blk_finish_request+0xa8/0x2c0
[ 112.659066] [<ffffffff812a4c9f>] blk_end_bidi_request+0x4f/0x80
[ 112.659066] [<ffffffff812a4d10>] blk_end_request+0x10/0x20
[ 112.659066] [<ffffffff8138c36f>] scsi_io_completion+0xaf/0x670
[ 112.659066] [<ffffffff81382861>] scsi_finish_command+0xd1/0x130
[ 112.659066] [<ffffffff8138c1bf>] scsi_softirq_done+0x13f/0x160
[ 112.659066] [<ffffffff812abb3c>] blk_done_softirq+0x8c/0xa0
[ 112.659066] [<ffffffff81068be8>] __do_softirq+0xc8/0x250
[ 112.659066] [<ffffffff8171696c>] call_softirq+0x1c/0x30
[ 112.659066] [<ffffffff81015785>] do_softirq+0xa5/0xe0
[ 112.659066] [<ffffffff8106893b>] local_bh_enable_ip+0xeb/0x100
[ 112.659066] [<ffffffff81714244>] _raw_spin_unlock_bh+0x34/0x40
[ 112.659066] [<ffffffff815c8abc>] release_sock+0x14c/0x190
[ 112.659066] [<ffffffff81613005>] tcp_sendpage+0xc5/0x6e0
[ 112.659066] [<ffffffff81638703>] inet_sendpage+0xd3/0x120
[ 112.659066] [<ffffffffa0221586>]
iscsi_sw_tcp_pdu_xmit+0x116/0x290 [iscsi_tcp]
[ 112.659066] [<ffffffff814ba930>] iscsi_tcp_task_xmit+0xb0/0x2d0
[ 112.659066] [<ffffffff814354ae>] iscsi_xmit_task+0x5e/0xb0
[ 112.659066] [<ffffffff81437457>] iscsi_xmitworker+0x1f7/0x330
[ 112.659066] [<ffffffff8107d8af>] process_one_work+0x19f/0x530
[ 112.659066] [<ffffffff8107f5d9>] worker_thread+0x159/0x340
[ 112.659066] [<ffffffff810847a7>] kthread+0xb7/0xc0
[ 112.659066] [<ffffffff81716874>] kernel_thread_helper+0x4/0x10
[ 112.659066] irq event stamp: 56183
[ 112.659066] hardirqs last enabled at (56183): [<ffffffff810688e7>]
local_bh_enable_ip+0x97/0x100
[ 112.659066] hardirqs last disabled at (56181): [<ffffffff81068894>]
local_bh_enable_ip+0x44/0x100
[ 112.659066] softirqs last enabled at (56182): [<ffffffffa00344d0>]
xprt_prepare_transmit+0x70/0x90 [sunrpc]
[ 112.659066] softirqs last disabled at (56180): [<ffffffff81713ee8>]
_raw_spin_lock_bh+0x18/0x70
[ 112.659066]
[ 112.659066] other info that might help us debug this:
[ 112.659066] Possible unsafe locking scenario:
[ 112.659066]
[ 112.659066] CPU0
[ 112.659066] ----
[ 112.659066] lock(&(&objlay->lock)->rlock);
[ 112.659066] <Interrupt>
[ 112.659066] lock(&(&objlay->lock)->rlock);
[ 112.659066]
[ 112.659066] *** DEADLOCK ***
[ 112.659066]
[ 112.659066] 2 locks held by kworker/0:2/456:
[ 112.659066] #0: (events){.+.+.+}, at: [<ffffffff8107d84c>]
process_one_work+0x13c/0x530
[ 112.659066] #1: ((&wdata->task.u.tk_work)){+.+.+.}, at:
[<ffffffff8107d84c>] process_one_work+0x13c/0x530
[ 112.659066]
[ 112.659066] stack backtrace:
[ 112.659066] Pid: 456, comm: kworker/0:2 Not tainted 3.5.0-nfsobj+ #35
[ 112.659066] Call Trace:
[ 112.659066] [<ffffffff81703f08>] print_usage_bug+0x1f5/0x206
[ 112.659066] [<ffffffff8102242f>] ? save_stack_trace+0x2f/0x50
[ 112.659066] [<ffffffff810b2595>] mark_lock+0x295/0x2f0
[ 112.659066] [<ffffffff810b1af0>] ? check_usage_forwards+0x140/0x140
[ 112.659066] [<ffffffff810b2c3a>] __lock_acquire+0x64a/0x1bb0
[ 112.659066] [<ffffffff810aef9d>] ? trace_hardirqs_off+0xd/0x10
[ 112.659066] [<ffffffff8109852f>] ? local_clock+0x6f/0x80
[ 112.659066] [<ffffffff8101baf3>] ? native_sched_clock+0x13/0x80
[ 112.659066] [<ffffffff8101bb69>] ? sched_clock+0x9/0x10
[ 112.659066] [<ffffffff810982c5>] ? sched_clock_local+0x25/0x90
[ 112.659066] [<ffffffff81098458>] ? sched_clock_cpu+0xa8/0x110
[ 112.659066] [<ffffffffa02558fc>] ?
objlayout_encode_layoutreturn+0x6c/0x420 [objlayoutdriver]
[ 112.659066] [<ffffffff810b47c2>] lock_acquire+0x92/0x120
[ 112.659066] [<ffffffffa02558fc>] ?
objlayout_encode_layoutreturn+0x6c/0x420 [objlayoutdriver]
[ 112.659066] [<ffffffff8101bb69>] ? sched_clock+0x9/0x10
[ 112.659066] [<ffffffffa02551a0>] ?
__copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver]
[ 112.659066] [<ffffffff81713e16>] _raw_spin_lock+0x36/0x70
[ 112.659066] [<ffffffffa02558fc>] ?
objlayout_encode_layoutreturn+0x6c/0x420 [objlayoutdriver]
[ 112.659066] [<ffffffff8101bb69>] ? sched_clock+0x9/0x10
[ 112.659066] [<ffffffffa02558fc>]
objlayout_encode_layoutreturn+0x6c/0x420 [objlayoutdriver]
[ 112.659066] [<ffffffff81098458>] ? sched_clock_cpu+0xa8/0x110
[ 112.659066] [<ffffffff810aef9d>] ? trace_hardirqs_off+0xd/0x10
[ 112.659066] [<ffffffff8109852f>] ? local_clock+0x6f/0x80
[ 112.659066] [<ffffffffa016541a>] ?
nfs4_xdr_enc_layoutreturn+0x11a/0x170 [nfs]
[ 112.659066] [<ffffffffa0045797>] ?
xdr_encode_opaque_fixed+0x47/0x80 [sunrpc]
[ 112.659066] [<ffffffffa02551a0>] ?
__copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver]
[ 112.659066] [<ffffffffa0165449>] nfs4_xdr_enc_layoutreturn+0x149/0x170 [nfs]
[ 112.659066] [<ffffffff810688e7>] ? local_bh_enable_ip+0x97/0x100
[ 112.659066] [<ffffffffa00344d0>] ? xprt_prepare_transmit+0x70/0x90 [sunrpc]
[ 112.659066] [<ffffffffa0165300>] ? nfs4_xdr_enc_commit+0xe0/0xe0 [nfs]
[ 112.659066] [<ffffffffa003bc05>] rpcauth_wrap_req+0x65/0x70 [sunrpc]
[ 112.659066] [<ffffffffa0030dae>] call_transmit+0x18e/0x260 [sunrpc]
[ 112.659066] [<ffffffffa003a360>] __rpc_execute+0x70/0x280 [sunrpc]
[ 112.659066] [<ffffffffa0030c20>] ? call_connect+0x40/0x40 [sunrpc]
[ 112.659066] [<ffffffffa0030c20>] ? call_connect+0x40/0x40 [sunrpc]
[ 112.659066] [<ffffffff81084e9e>] ? wake_up_bit+0x2e/0x40
[ 112.659066] [<ffffffffa02551a0>] ?
__copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver]
[ 112.659066] [<ffffffffa003ab4f>] rpc_execute+0x4f/0xb0 [sunrpc]
[ 112.659066] [<ffffffffa02551a0>] ?
__copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver]
[ 112.659066] [<ffffffffa00327b5>] rpc_run_task+0x75/0x90 [sunrpc]
[ 112.659066] [<ffffffffa0161c96>] nfs4_proc_layoutreturn+0x86/0xb0 [nfs]
[ 112.659066] [<ffffffffa0174594>] _pnfs_return_layout+0x114/0x180 [nfs]
[ 112.659066] [<ffffffffa0174737>] pnfs_ld_write_done+0x67/0xe0 [nfs]
[ 112.659066] [<ffffffffa02551b5>] _rpc_write_complete+0x15/0x20
[objlayoutdriver]
[ 112.659066] [<ffffffff8107d8af>] process_one_work+0x19f/0x530
[ 112.659066] [<ffffffff8107d84c>] ? process_one_work+0x13c/0x530
[ 112.659066] [<ffffffff8107f5d9>] worker_thread+0x159/0x340
[ 112.659066] [<ffffffff8107f480>] ? manage_workers+0x230/0x230
[ 112.659066] [<ffffffff810847a7>] kthread+0xb7/0xc0
[ 112.659066] [<ffffffff810b5295>] ? trace_hardirqs_on_caller+0x105/0x190
[ 112.659066] [<ffffffff81716874>] kernel_thread_helper+0x4/0x10
[ 112.659066] [<ffffffff81714bb0>] ? retint_restore_args+0x13/0x13
[ 112.659066] [<ffffffff810846f0>] ? __init_kthread_worker+0x70/0x70
[ 112.659066] [<ffffffff81716870>] ? gs_change+0x13/0x13
[ 112.787051] ------------[ cut here ]------------
[ 112.787799] kernel BUG at fs/nfs/nfs4proc.c:6252!
[ 112.787799] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 112.787799] CPU 0
[ 112.787799] Modules linked in:[ 112.787799]
nfs_layout_nfsv41_files objlayoutdriver exofs libore osd libosd
iscsi_tcp netconsole nfs nfsd fscache lockd auth_rpcgss nfs_acl sunrpc
e1000 rtc_cmos serio_raw [last unloaded: scsi_wait_scan]
[ 112.787799] Pid: 456, comm: kworker/0:2 Not tainted 3.5.0-nfsobj+
#35 innotek GmbH VirtualBox
[ 112.787799] RIP: 0010:[<ffffffffa015a245>] [<ffffffffa015a245>]
nfs4_layoutreturn_done+0xd5/0xe0 [nfs]
[ 112.787799] RSP: 0018:ffff88003c7bdb50 EFLAGS: 00010206
[ 112.787799] RAX: ffff880034480b90 RBX: ffff88003cbb5860 RCX: ffff88003add4ca8
[ 112.787799] RDX: 0000000000000000 RSI: ffff8800354b1288 RDI: 0000000000000246
[ 112.787799] RBP: ffff88003c7bdb70 R08: 0000000000000002 R09: 0000000000000000
[ 112.787799] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a782240
[ 112.787799] R13: ffff880034480b68 R14: ffff88003a787138 R15: ffffffffa02551a0
[ 112.787799] FS: 0000000000000000(0000) GS:ffff88003e200000(0000)
knlGS:0000000000000000
[ 112.787799] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 112.787799] CR2: 0000003f33c9b640 CR3: 0000000034070000 CR4: 00000000000006f0
[ 112.787799] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 112.787799] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 112.787799] Process kworker/0:2 (pid: 456, threadinfo
ffff88003c7bc000, task ffff88003add4620)
[ 112.787799] Stack:
[ 112.787799] ffff88003a787138 ffff88003a782240 ffff8800347c4858
ffff88003c7bdcf8
[ 112.787799] ffff88003c7bdb90 ffffffffa00391fc ffff88003a787138
ffff88003a782240
[ 112.787799] ffff88003c7bdc00 ffffffffa003a360 000000003c7bdbc0
ffff88003a7822b0
[ 112.787799] Call Trace:
[ 112.787799] [<ffffffffa00391fc>] rpc_exit_task+0x2c/0x90 [sunrpc]
[ 112.787799] [<ffffffffa003a360>] __rpc_execute+0x70/0x280 [sunrpc]
[ 112.787799] [<ffffffffa00391d0>] ? rpc_async_release+0x20/0x20 [sunrpc]
[ 112.787799] [<ffffffffa00391d0>] ? rpc_async_release+0x20/0x20 [sunrpc]
[ 112.787799] [<ffffffff81084e9e>] ? wake_up_bit+0x2e/0x40
[ 112.787799] [<ffffffffa02551a0>] ?
__copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver]
[ 112.787799] [<ffffffffa003ab4f>] rpc_execute+0x4f/0xb0 [sunrpc]
[ 112.787799] [<ffffffffa02551a0>] ?
__copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver]
[ 112.787799] [<ffffffffa00327b5>] rpc_run_task+0x75/0x90 [sunrpc]
[ 112.787799] [<ffffffffa0161c96>] nfs4_proc_layoutreturn+0x86/0xb0 [nfs]
[ 112.787799] [<ffffffffa0174594>] _pnfs_return_layout+0x114/0x180 [nfs]
[ 112.787799] [<ffffffffa0174737>] pnfs_ld_write_done+0x67/0xe0 [nfs]
[ 112.787799] [<ffffffffa02551b5>] _rpc_write_complete+0x15/0x20
[objlayoutdriver]
[ 112.787799] [<ffffffff8107d8af>] process_one_work+0x19f/0x530
[ 112.787799] [<ffffffff8107d84c>] ? process_one_work+0x13c/0x530
[ 112.787799] [<ffffffff8107f5d9>] worker_thread+0x159/0x340
[ 112.787799] [<ffffffff8107f480>] ? manage_workers+0x230/0x230
[ 112.787799] [<ffffffff810847a7>] kthread+0xb7/0xc0
[ 112.787799] [<ffffffff810b5295>] ? trace_hardirqs_on_caller+0x105/0x190
[ 112.787799] [<ffffffff81716874>] kernel_thread_helper+0x4/0x10
[ 112.787799] [<ffffffff81714bb0>] ? retint_restore_args+0x13/0x13
[ 112.787799] [<ffffffff810846f0>] ? __init_kthread_worker+0x70/0x70
[ 112.787799] [<ffffffff81716870>] ? gs_change+0x13/0x13
[ 112.787799] Code: c3 0f 1f 44 00 00 48 8d 73 64 ba 01 00 00 00 4c
89 ef e8 ff ab 01 00 eb c8 0f 1f 44 00 00 4c 89 e7 e8 e0 66 ed ff e9
5a ff ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec
08 66
[ 112.787799] RIP [<ffffffffa015a245>] nfs4_layoutreturn_done+0xd5/0xe0 [nfs]
[ 112.787799] RSP <ffff88003c7bdb50>
[ 112.872492] ---[ end trace 114822acc0612b2a ]---
My setup:
Data server is osd-osc emulator
MDS is nfsd-exofs on kernel 3.3 (Benny's kernel)
Is there any fix for this? Or is it not an issue and my injection is
just wrong and stupid?
I've seen some discussions on this in the mailing list, and I saw the
patch that removes the BUG, but I'm not sure if this is the right
thing to do for object layout.
---
fs/exofs/ore.c | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index 24a49d4..e9d7b45 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -426,9 +426,14 @@ int ore_check_io(struct ore_io_state *ios,
ore_on_dev_error on_dev_error)
if (unlikely(!or))
continue;
- ret = osd_req_decode_sense(or, &osi);
- if (likely(!ret))
- continue;
+ if (jiffies % 6 == 0) {
+ osi.osd_err_pri = OSD_ERR_PRI_EIO;
+ ret = -EIO;
+ } else {
+ ret = osd_req_decode_sense(or, &osi);
+ if (likely(!ret))
+ continue;
+ }
if (OSD_ERR_PRI_CLEAR_PAGES == osi.osd_err_pri) {
/* start read offset passed endof file */
--
1.7.6.5
--
idank
^ permalink raw reply related [flat|nested] 12+ messages in thread* RE: return layout on error, BUG/deadlock 2012-08-09 13:03 return layout on error, BUG/deadlock Idan Kedar @ 2012-08-09 14:05 ` Myklebust, Trond 2012-08-09 15:49 ` Idan Kedar 0 siblings, 1 reply; 12+ messages in thread From: Myklebust, Trond @ 2012-08-09 14:05 UTC (permalink / raw) To: Idan Kedar, Boaz Harrosh, NFS list; +Cc: Benny Halevy > -----Original Message----- > From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- > owner@vger.kernel.org] On Behalf Of Idan Kedar > Sent: Thursday, August 09, 2012 9:03 AM > To: Boaz Harrosh; NFS list > Cc: Benny Halevy > Subject: return layout on error, BUG/deadlock > > Hi, > > As a result of some experiments, I wanted to see what happens when I > inject an error (hard coded) to the object layout driver. the patch is at the > bottom of this mail. the reason I did this is because when I inject errors in my > modified version of the object layout driver, I get the same BUG Tigran > reported about yesterday: > nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); > > In my modified version (based on kernel 3.3), the bug seems to be that > pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there > is in-flight I/O. That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. See the changelog in the patch that I sent to the list yesterday. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 14:05 ` Myklebust, Trond @ 2012-08-09 15:49 ` Idan Kedar 2012-08-09 16:06 ` Myklebust, Trond 0 siblings, 1 reply; 12+ messages in thread From: Idan Kedar @ 2012-08-09 15:49 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Boaz Harrosh, NFS list, Benny Halevy, Peng Tao On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >> -----Original Message----- >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> Sent: Thursday, August 09, 2012 9:03 AM >> To: Boaz Harrosh; NFS list >> Cc: Benny Halevy >> Subject: return layout on error, BUG/deadlock >> >> Hi, >> >> As a result of some experiments, I wanted to see what happens when I >> inject an error (hard coded) to the object layout driver. the patch is at the >> bottom of this mail. the reason I did this is because when I inject errors in my >> modified version of the object layout driver, I get the same BUG Tigran >> reported about yesterday: >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> is in-flight I/O. > > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. to what change are you referring? > > See the changelog in the patch that I sent to the list yesterday. > I saw that, and if I'm not mistaken these races apply to object layout as well, and in any case they apply in my case. However, it is not easy to mess around with LAYOUTRETURN in object layout, and there have been several discussions on the issue. In one of these discussions Benny clarified that the object layout client must wait for all in-flight I/O to end. So for file layout it probably makes sense, but object layout (and if I understand correctly, block layout as well) something else needs to be done. I thought about sync wait when returning the layout on error, but according to Boaz it will cause deadlocks (Boaz - can you please elaborate?). And come to think of it, nfs4_proc_setattr also returns the layout when there may be I/O in-flight (correct me if i'm wrong). So I guess pnfs_return_layout should somehow solve this by itself by saying "if this is fencing (a flag which will be set for file layout only), go ahead, otherwise make the layout as 'needs to be returned' and when the lseg lists gets empty return the layout". Comments? > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > > -- idank ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 15:49 ` Idan Kedar @ 2012-08-09 16:06 ` Myklebust, Trond 2012-08-09 16:34 ` Peng Tao 2012-08-09 16:54 ` Idan Kedar 0 siblings, 2 replies; 12+ messages in thread From: Myklebust, Trond @ 2012-08-09 16:06 UTC (permalink / raw) To: Idan Kedar; +Cc: Boaz Harrosh, NFS list, Benny Halevy, Peng Tao T24gVGh1LCAyMDEyLTA4LTA5IGF0IDE4OjQ5ICswMzAwLCBJZGFuIEtlZGFyIHdyb3RlOg0KPiBP biBUaHUsIEF1ZyA5LCAyMDEyIGF0IDU6MDUgUE0sIE15a2xlYnVzdCwgVHJvbmQNCj4gPFRyb25k Lk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gPj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdl LS0tLS0NCj4gPj4gRnJvbTogbGludXgtbmZzLW93bmVyQHZnZXIua2VybmVsLm9yZyBbbWFpbHRv OmxpbnV4LW5mcy0NCj4gPj4gb3duZXJAdmdlci5rZXJuZWwub3JnXSBPbiBCZWhhbGYgT2YgSWRh biBLZWRhcg0KPiA+PiBTZW50OiBUaHVyc2RheSwgQXVndXN0IDA5LCAyMDEyIDk6MDMgQU0NCj4g Pj4gVG86IEJvYXogSGFycm9zaDsgTkZTIGxpc3QNCj4gPj4gQ2M6IEJlbm55IEhhbGV2eQ0KPiA+ PiBTdWJqZWN0OiByZXR1cm4gbGF5b3V0IG9uIGVycm9yLCBCVUcvZGVhZGxvY2sNCj4gPj4NCj4g Pj4gSGksDQo+ID4+DQo+ID4+IEFzIGEgcmVzdWx0IG9mIHNvbWUgZXhwZXJpbWVudHMsIEkgd2Fu dGVkIHRvIHNlZSB3aGF0IGhhcHBlbnMgd2hlbiBJDQo+ID4+IGluamVjdCBhbiBlcnJvciAoaGFy ZCBjb2RlZCkgdG8gdGhlIG9iamVjdCBsYXlvdXQgZHJpdmVyLiB0aGUgcGF0Y2ggaXMgYXQgdGhl DQo+ID4+IGJvdHRvbSBvZiB0aGlzIG1haWwuIHRoZSByZWFzb24gSSBkaWQgdGhpcyBpcyBiZWNh dXNlIHdoZW4gSSBpbmplY3QgZXJyb3JzIGluIG15DQo+ID4+IG1vZGlmaWVkIHZlcnNpb24gb2Yg dGhlIG9iamVjdCBsYXlvdXQgZHJpdmVyLCBJIGdldCB0aGUgc2FtZSBCVUcgVGlncmFuDQo+ID4+ IHJlcG9ydGVkIGFib3V0IHllc3RlcmRheToNCj4gPj4gbmZzNHByb2MuYzo2MjUyIDogICBCVUdf T04oIWxpc3RfZW1wdHkoJmxvLT5wbGhfc2VncykpOw0KPiA+Pg0KPiA+PiBJbiBteSBtb2RpZmll ZCB2ZXJzaW9uIChiYXNlZCBvbiBrZXJuZWwgMy4zKSwgdGhlIGJ1ZyBzZWVtcyB0byBiZSB0aGF0 DQo+ID4+IHBuZnNfbGRfd3JpdGVfZG9uZSBjYWxscyBwbmZzX3JldHVybl9sYXlvdXQgaW4gdGhl IGVycm9yIHBhdGgsIGV2ZW4gaWYgdGhlcmUNCj4gPj4gaXMgaW4tZmxpZ2h0IEkvTy4NCj4gPg0K PiA+IFRoYXQgaXMgbm90IGEgYnVnLiBJdCBpcyBhbiBpbnRlbnRpb25hbCBjaGFuZ2UgaW4gb3Jk ZXIgdG8gYWxsb3cgdGhlIE1EUyB0byBmZW5jZSBvZmYgdGhlIG91dHN0YW5kaW5nIHdyaXRlcyAo aWYgaXQgY2FuIGRvIHNvKSBiZWZvcmUgd2UgcmV0cmFuc21pdCB0aGVtIGFzIHdyaXRlLXRocm91 Z2gtTURTLiBPdGhlcndpc2UsIHlvdSByaXNrIHJhY2VzIGJldHdlZW4gdGhlIG91dHN0YW5kaW5n IHdyaXRlcy10by1EUyBhbmQgdGhlIG5ldyB3cml0ZXMtdGhyb3VnaC1NRFMuDQo+IA0KPiB0byB3 aGF0IGNoYW5nZSBhcmUgeW91IHJlZmVycmluZz8NCg0KQXMgSSBzdGF0ZWQgaW4gdGhlIGNoYW5n ZWxvZyBvZiB0aGUgcGF0Y2ggdGhhdCBJIHNlbnQgdG8gdGhlIGxpc3QNCnllc3RlcmRheSwgdGhl IGJlaGF2aW91ciBpcyBkdWUgdG8gY29tbWl0IDBhNTdjZGFjM2YuDQoNCj4gPg0KPiA+IFNlZSB0 aGUgY2hhbmdlbG9nIGluIHRoZSBwYXRjaCB0aGF0IEkgc2VudCB0byB0aGUgbGlzdCB5ZXN0ZXJk YXkuDQo+ID4NCj4gDQo+IEkgc2F3IHRoYXQsIGFuZCBpZiBJJ20gbm90IG1pc3Rha2VuIHRoZXNl IHJhY2VzIGFwcGx5IHRvIG9iamVjdCBsYXlvdXQNCj4gYXMgd2VsbCwgYW5kIGluIGFueSBjYXNl IHRoZXkgYXBwbHkgaW4gbXkgY2FzZS4gSG93ZXZlciwgaXQgaXMgbm90DQo+IGVhc3kgdG8gbWVz cyBhcm91bmQgd2l0aCBMQVlPVVRSRVRVUk4gaW4gb2JqZWN0IGxheW91dCwgYW5kIHRoZXJlIGhh dmUNCj4gYmVlbiBzZXZlcmFsIGRpc2N1c3Npb25zIG9uIHRoZSBpc3N1ZS4gSW4gb25lIG9mIHRo ZXNlIGRpc2N1c3Npb25zDQo+IEJlbm55IGNsYXJpZmllZCB0aGF0IHRoZSBvYmplY3QgbGF5b3V0 IGNsaWVudCBtdXN0IHdhaXQgZm9yIGFsbA0KPiBpbi1mbGlnaHQgSS9PIHRvIGVuZC4NCg0KSWYg dGhlIHByb2JsZW0gaXMgdGhhdCB0aGUgRFMgaXMgZmFpbGluZyB0byByZXNwb25kLCBob3cgZG9l cyB0aGUgY2xpZW50DQprbm93IHRoYXQgdGhlIGluLWZsaWdodCBJL08gaGFzIGVuZGVkPw0KDQo+ IFNvIGZvciBmaWxlIGxheW91dCBpdCBwcm9iYWJseSBtYWtlcyBzZW5zZSwgYnV0IG9iamVjdCBs YXlvdXQgKGFuZCBpZg0KPiBJIHVuZGVyc3RhbmQgY29ycmVjdGx5LCBibG9jayBsYXlvdXQgYXMg d2VsbCkgc29tZXRoaW5nIGVsc2UgbmVlZHMgdG8NCj4gYmUgZG9uZS4gSSB0aG91Z2h0IGFib3V0 IHN5bmMgd2FpdCB3aGVuIHJldHVybmluZyB0aGUgbGF5b3V0IG9uIGVycm9yLA0KPiBidXQgYWNj b3JkaW5nIHRvIEJvYXogaXQgd2lsbCBjYXVzZSBkZWFkbG9ja3MgKEJvYXogLSBjYW4geW91IHBs ZWFzZQ0KPiBlbGFib3JhdGU/KS4NCg0KVGhlIG9iamVjdCBsYXlvdXRyZXR1cm4gaGFzIHRoZSBh YmlsaXR5IHRvIHBhc3MgYSB0aW1lb3V0IGVycm9yIHZhbHVlIHRvDQp0aGUgTURTIHByZWNpc2Vs eSBpbiBvcmRlciB0byBhbGxvdyB0aGUgbGF0dGVyIHRvIGRlYWwgd2l0aCB0aGlzIGtpbmQgb2YN Cmlzc3VlLiBTZWUgdGhlIGRlc2NyaXB0aW9uIG9mIHN0cnVjdCBwbmZzX29zZF9pb2VycjQgaW4g cmZjNTY2NC4NCg0KVGhlIGJsb2NrIGxheW91dCBpcyBhZGRpbmcgdGhlIHNhbWUgYWJpbGl0eSB0 byBsYXlvdXRyZXR1cm4gaW4gTkZTdjQuMg0KKHNlZSBkcmFmdC1pZXRmLW5mc3Y0LW1pbm9ydmVy c2lvbjItMTMudHh0KSB2aWEgdGhlIHN0cnVjdA0KbGF5b3V0cmV0dXJuX2RldmljZV9lcnJvcjQs IHNvIHByZXN1bWFibHkgdGhleSB0b28gaGF2ZSBhIHBsYW4gZm9yDQpkZWFsaW5nIHdpdGggdGhp cyBraW5kIG9mIGlzc3VlLg0KDQo+IEFuZCBjb21lIHRvIHRoaW5rIG9mIGl0LCBuZnM0X3Byb2Nf c2V0YXR0ciBhbHNvIHJldHVybnMgdGhlIGxheW91dA0KPiB3aGVuIHRoZXJlIG1heSBiZSBJL08g aW4tZmxpZ2h0IChjb3JyZWN0IG1lIGlmIGknbSB3cm9uZykuIFNvIEkgZ3Vlc3MNCj4gcG5mc19y ZXR1cm5fbGF5b3V0IHNob3VsZCBzb21laG93IHNvbHZlIHRoaXMgYnkgaXRzZWxmIGJ5IHNheWlu ZyAiaWYNCj4gdGhpcyBpcyBmZW5jaW5nIChhIGZsYWcgd2hpY2ggd2lsbCBiZSBzZXQgZm9yIGZp bGUgbGF5b3V0IG9ubHkpLCBnbw0KPiBhaGVhZCwgb3RoZXJ3aXNlIG1ha2UgdGhlIGxheW91dCBh cyAnbmVlZHMgdG8gYmUgcmV0dXJuZWQnIGFuZCB3aGVuDQo+IHRoZSBsc2VnIGxpc3RzIGdldHMg ZW1wdHkgcmV0dXJuIHRoZSBsYXlvdXQiLg0KDQpUaGUgb25seSBsYXlvdXQgdHlwZSB0aGF0IHNl dHMgdGhlIFBORlNfTEFZT1VUUkVUX09OX1NFVEFUVFIgZmxhZyBpcw0Kb2JqZWN0cywgc28gdGhh dCBxdWVzdGlvbiBuZWVkcyB0byBiZSBkaXJlY3RlZCB0byBCb2F6Lg0KDQotLSANClRyb25kIE15 a2xlYnVzdA0KTGludXggTkZTIGNsaWVudCBtYWludGFpbmVyDQoNCk5ldEFwcA0KVHJvbmQuTXlr bGVidXN0QG5ldGFwcC5jb20NCnd3dy5uZXRhcHAuY29tDQoNCg== ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 16:06 ` Myklebust, Trond @ 2012-08-09 16:34 ` Peng Tao 2012-08-09 16:37 ` Myklebust, Trond 2012-08-09 16:54 ` Idan Kedar 1 sibling, 1 reply; 12+ messages in thread From: Peng Tao @ 2012-08-09 16:34 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Idan Kedar, Boaz Harrosh, NFS list, Benny Halevy On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> >> -----Original Message----- >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> To: Boaz Harrosh; NFS list >> >> Cc: Benny Halevy >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> Hi, >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> reported about yesterday: >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> is in-flight I/O. >> > >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> to what change are you referring? > > As I stated in the changelog of the patch that I sent to the list > yesterday, the behaviour is due to commit 0a57cdac3f. > >> > >> > See the changelog in the patch that I sent to the list yesterday. >> > >> >> I saw that, and if I'm not mistaken these races apply to object layout >> as well, and in any case they apply in my case. However, it is not >> easy to mess around with LAYOUTRETURN in object layout, and there have >> been several discussions on the issue. In one of these discussions >> Benny clarified that the object layout client must wait for all >> in-flight I/O to end. > > If the problem is that the DS is failing to respond, how does the client > know that the in-flight I/O has ended? > >> So for file layout it probably makes sense, but object layout (and if >> I understand correctly, block layout as well) something else needs to >> be done. I thought about sync wait when returning the layout on error, >> but according to Boaz it will cause deadlocks (Boaz - can you please >> elaborate?). > > The object layoutreturn has the ability to pass a timeout error value to > the MDS precisely in order to allow the latter to deal with this kind of > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. > > The block layout is adding the same ability to layoutreturn in NFSv4.2 > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct > layoutreturn_device_error4, so presumably they too have a plan for > dealing with this kind of issue. It is one thing to tell MDS that there is DS access error by sending layoutreturn, and it is another thing to return a layout even if there is overlapping in-flight DS IO... I certainly agree that client is entitled to return layout to inform MDS about DS errors and also avoid possible cb_layoutrecall. But it is just an optimization and should only be done when there is no in-flight IO (at least for block layout) IMHO. Thanks, Tao > >> And come to think of it, nfs4_proc_setattr also returns the layout >> when there may be I/O in-flight (correct me if i'm wrong). So I guess >> pnfs_return_layout should somehow solve this by itself by saying "if >> this is fencing (a flag which will be set for file layout only), go >> ahead, otherwise make the layout as 'needs to be returned' and when >> the lseg lists gets empty return the layout". > > The only layout type that sets the PNFS_LAYOUTRET_ON_SETATTR flag is > objects, so that question needs to be directed to Boaz. > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 16:34 ` Peng Tao @ 2012-08-09 16:37 ` Myklebust, Trond 2012-08-09 16:48 ` Peng Tao 0 siblings, 1 reply; 12+ messages in thread From: Myklebust, Trond @ 2012-08-09 16:37 UTC (permalink / raw) To: Peng Tao; +Cc: Idan Kedar, Boaz Harrosh, NFS list, Benny Halevy T24gRnJpLCAyMDEyLTA4LTEwIGF0IDAwOjM0ICswODAwLCBQZW5nIFRhbyB3cm90ZToNCj4gT24g RnJpLCBBdWcgMTAsIDIwMTIgYXQgMTI6MDYgQU0sIE15a2xlYnVzdCwgVHJvbmQNCj4gPFRyb25k Lk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gPiBPbiBUaHUsIDIwMTItMDgtMDkgYXQg MTg6NDkgKzAzMDAsIElkYW4gS2VkYXIgd3JvdGU6DQo+ID4+IE9uIFRodSwgQXVnIDksIDIwMTIg YXQgNTowNSBQTSwgTXlrbGVidXN0LCBUcm9uZA0KPiA+PiA8VHJvbmQuTXlrbGVidXN0QG5ldGFw cC5jb20+IHdyb3RlOg0KPiA+PiA+PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+PiA+ PiBGcm9tOiBsaW51eC1uZnMtb3duZXJAdmdlci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtbmZz LQ0KPiA+PiA+PiBvd25lckB2Z2VyLmtlcm5lbC5vcmddIE9uIEJlaGFsZiBPZiBJZGFuIEtlZGFy DQo+ID4+ID4+IFNlbnQ6IFRodXJzZGF5LCBBdWd1c3QgMDksIDIwMTIgOTowMyBBTQ0KPiA+PiA+ PiBUbzogQm9heiBIYXJyb3NoOyBORlMgbGlzdA0KPiA+PiA+PiBDYzogQmVubnkgSGFsZXZ5DQo+ ID4+ID4+IFN1YmplY3Q6IHJldHVybiBsYXlvdXQgb24gZXJyb3IsIEJVRy9kZWFkbG9jaw0KPiA+ PiA+Pg0KPiA+PiA+PiBIaSwNCj4gPj4gPj4NCj4gPj4gPj4gQXMgYSByZXN1bHQgb2Ygc29tZSBl eHBlcmltZW50cywgSSB3YW50ZWQgdG8gc2VlIHdoYXQgaGFwcGVucyB3aGVuIEkNCj4gPj4gPj4g aW5qZWN0IGFuIGVycm9yIChoYXJkIGNvZGVkKSB0byB0aGUgb2JqZWN0IGxheW91dCBkcml2ZXIu IHRoZSBwYXRjaCBpcyBhdCB0aGUNCj4gPj4gPj4gYm90dG9tIG9mIHRoaXMgbWFpbC4gdGhlIHJl YXNvbiBJIGRpZCB0aGlzIGlzIGJlY2F1c2Ugd2hlbiBJIGluamVjdCBlcnJvcnMgaW4gbXkNCj4g Pj4gPj4gbW9kaWZpZWQgdmVyc2lvbiBvZiB0aGUgb2JqZWN0IGxheW91dCBkcml2ZXIsIEkgZ2V0 IHRoZSBzYW1lIEJVRyBUaWdyYW4NCj4gPj4gPj4gcmVwb3J0ZWQgYWJvdXQgeWVzdGVyZGF5Og0K PiA+PiA+PiBuZnM0cHJvYy5jOjYyNTIgOiAgIEJVR19PTighbGlzdF9lbXB0eSgmbG8tPnBsaF9z ZWdzKSk7DQo+ID4+ID4+DQo+ID4+ID4+IEluIG15IG1vZGlmaWVkIHZlcnNpb24gKGJhc2VkIG9u IGtlcm5lbCAzLjMpLCB0aGUgYnVnIHNlZW1zIHRvIGJlIHRoYXQNCj4gPj4gPj4gcG5mc19sZF93 cml0ZV9kb25lIGNhbGxzIHBuZnNfcmV0dXJuX2xheW91dCBpbiB0aGUgZXJyb3IgcGF0aCwgZXZl biBpZiB0aGVyZQ0KPiA+PiA+PiBpcyBpbi1mbGlnaHQgSS9PLg0KPiA+PiA+DQo+ID4+ID4gVGhh dCBpcyBub3QgYSBidWcuIEl0IGlzIGFuIGludGVudGlvbmFsIGNoYW5nZSBpbiBvcmRlciB0byBh bGxvdyB0aGUgTURTIHRvIGZlbmNlIG9mZiB0aGUgb3V0c3RhbmRpbmcgd3JpdGVzIChpZiBpdCBj YW4gZG8gc28pIGJlZm9yZSB3ZSByZXRyYW5zbWl0IHRoZW0gYXMgd3JpdGUtdGhyb3VnaC1NRFMu IE90aGVyd2lzZSwgeW91IHJpc2sgcmFjZXMgYmV0d2VlbiB0aGUgb3V0c3RhbmRpbmcgd3JpdGVz LXRvLURTIGFuZCB0aGUgbmV3IHdyaXRlcy10aHJvdWdoLU1EUy4NCj4gPj4NCj4gPj4gdG8gd2hh dCBjaGFuZ2UgYXJlIHlvdSByZWZlcnJpbmc/DQo+ID4NCj4gPiBBcyBJIHN0YXRlZCBpbiB0aGUg Y2hhbmdlbG9nIG9mIHRoZSBwYXRjaCB0aGF0IEkgc2VudCB0byB0aGUgbGlzdA0KPiA+IHllc3Rl cmRheSwgdGhlIGJlaGF2aW91ciBpcyBkdWUgdG8gY29tbWl0IDBhNTdjZGFjM2YuDQo+ID4NCj4g Pj4gPg0KPiA+PiA+IFNlZSB0aGUgY2hhbmdlbG9nIGluIHRoZSBwYXRjaCB0aGF0IEkgc2VudCB0 byB0aGUgbGlzdCB5ZXN0ZXJkYXkuDQo+ID4+ID4NCj4gPj4NCj4gPj4gSSBzYXcgdGhhdCwgYW5k IGlmIEknbSBub3QgbWlzdGFrZW4gdGhlc2UgcmFjZXMgYXBwbHkgdG8gb2JqZWN0IGxheW91dA0K PiA+PiBhcyB3ZWxsLCBhbmQgaW4gYW55IGNhc2UgdGhleSBhcHBseSBpbiBteSBjYXNlLiBIb3dl dmVyLCBpdCBpcyBub3QNCj4gPj4gZWFzeSB0byBtZXNzIGFyb3VuZCB3aXRoIExBWU9VVFJFVFVS TiBpbiBvYmplY3QgbGF5b3V0LCBhbmQgdGhlcmUgaGF2ZQ0KPiA+PiBiZWVuIHNldmVyYWwgZGlz Y3Vzc2lvbnMgb24gdGhlIGlzc3VlLiBJbiBvbmUgb2YgdGhlc2UgZGlzY3Vzc2lvbnMNCj4gPj4g QmVubnkgY2xhcmlmaWVkIHRoYXQgdGhlIG9iamVjdCBsYXlvdXQgY2xpZW50IG11c3Qgd2FpdCBm b3IgYWxsDQo+ID4+IGluLWZsaWdodCBJL08gdG8gZW5kLg0KPiA+DQo+ID4gSWYgdGhlIHByb2Js ZW0gaXMgdGhhdCB0aGUgRFMgaXMgZmFpbGluZyB0byByZXNwb25kLCBob3cgZG9lcyB0aGUgY2xp ZW50DQo+ID4ga25vdyB0aGF0IHRoZSBpbi1mbGlnaHQgSS9PIGhhcyBlbmRlZD8NCj4gPg0KPiA+ PiBTbyBmb3IgZmlsZSBsYXlvdXQgaXQgcHJvYmFibHkgbWFrZXMgc2Vuc2UsIGJ1dCBvYmplY3Qg bGF5b3V0IChhbmQgaWYNCj4gPj4gSSB1bmRlcnN0YW5kIGNvcnJlY3RseSwgYmxvY2sgbGF5b3V0 IGFzIHdlbGwpIHNvbWV0aGluZyBlbHNlIG5lZWRzIHRvDQo+ID4+IGJlIGRvbmUuIEkgdGhvdWdo dCBhYm91dCBzeW5jIHdhaXQgd2hlbiByZXR1cm5pbmcgdGhlIGxheW91dCBvbiBlcnJvciwNCj4g Pj4gYnV0IGFjY29yZGluZyB0byBCb2F6IGl0IHdpbGwgY2F1c2UgZGVhZGxvY2tzIChCb2F6IC0g Y2FuIHlvdSBwbGVhc2UNCj4gPj4gZWxhYm9yYXRlPykuDQo+ID4NCj4gPiBUaGUgb2JqZWN0IGxh eW91dHJldHVybiBoYXMgdGhlIGFiaWxpdHkgdG8gcGFzcyBhIHRpbWVvdXQgZXJyb3IgdmFsdWUg dG8NCj4gPiB0aGUgTURTIHByZWNpc2VseSBpbiBvcmRlciB0byBhbGxvdyB0aGUgbGF0dGVyIHRv IGRlYWwgd2l0aCB0aGlzIGtpbmQgb2YNCj4gPiBpc3N1ZS4gU2VlIHRoZSBkZXNjcmlwdGlvbiBv ZiBzdHJ1Y3QgcG5mc19vc2RfaW9lcnI0IGluIHJmYzU2NjQuDQo+ID4NCj4gPiBUaGUgYmxvY2sg bGF5b3V0IGlzIGFkZGluZyB0aGUgc2FtZSBhYmlsaXR5IHRvIGxheW91dHJldHVybiBpbiBORlN2 NC4yDQo+ID4gKHNlZSBkcmFmdC1pZXRmLW5mc3Y0LW1pbm9ydmVyc2lvbjItMTMudHh0KSB2aWEg dGhlIHN0cnVjdA0KPiA+IGxheW91dHJldHVybl9kZXZpY2VfZXJyb3I0LCBzbyBwcmVzdW1hYmx5 IHRoZXkgdG9vIGhhdmUgYSBwbGFuIGZvcg0KPiA+IGRlYWxpbmcgd2l0aCB0aGlzIGtpbmQgb2Yg aXNzdWUuDQo+IEl0IGlzIG9uZSB0aGluZyB0byB0ZWxsIE1EUyB0aGF0IHRoZXJlIGlzIERTIGFj Y2VzcyBlcnJvciBieSBzZW5kaW5nDQo+IGxheW91dHJldHVybiwgYW5kIGl0IGlzIGFub3RoZXIg dGhpbmcgdG8gcmV0dXJuIGEgbGF5b3V0IGV2ZW4gaWYgdGhlcmUNCj4gaXMgb3ZlcmxhcHBpbmcg aW4tZmxpZ2h0IERTIElPLi4uDQo+IA0KPiBJIGNlcnRhaW5seSBhZ3JlZSB0aGF0IGNsaWVudCBp cyBlbnRpdGxlZCB0byByZXR1cm4gbGF5b3V0IHRvIGluZm9ybQ0KPiBNRFMgYWJvdXQgRFMgZXJy b3JzIGFuZCBhbHNvIGF2b2lkIHBvc3NpYmxlIGNiX2xheW91dHJlY2FsbC4gQnV0IGl0IGlzDQo+ IGp1c3QgYW4gb3B0aW1pemF0aW9uIGFuZCBzaG91bGQgb25seSBiZSBkb25lIHdoZW4gdGhlcmUg aXMgbm8NCj4gaW4tZmxpZ2h0IElPIChhdCBsZWFzdCBmb3IgYmxvY2sgbGF5b3V0KSBJTUhPLg0K DQpIT1cgRE8gWU9VIEdVQVJBTlRFRSBOTyBJTi1GTElHSFQgSU8/DQoNClJlcGVhdGluZyB0aGUg c2FtZSBtYW50cmEgYWJvdXQgJ25vIGluLWZsaWdodCBJTycgdGhhdCBkb2Vzbid0IGFwcGx5IHRv DQp0aW1lb3V0IHNpdHVhdGlvbnMgaXNuJ3QgaGVscGZ1bC4NCg0KQSBUSU1FT1VUIG1lYW5zIHRo YXQgeW91IGhhdmUgTk8gSURFQSBpZiB0aGUgZGF0YSBpcyBzdGlsbCBpbiBmbGlnaHQgb3INCm5v dC4gVGhhdCdzIHdoZW4geW91IG5lZWQgZmVuY2luZywgYW5kIHRoZSBvbmx5IHRoaW5nIHRoYXQg Y2FuIHN1cHBseQ0KZmVuY2luZyBpbiB0aGF0IHNpdHVhdGlvbiBpcyB0aGUgTURTLg0KDQotLSAN ClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNsaWVudCBtYWludGFpbmVyDQoNCk5ldEFwcA0K VHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20NCnd3dy5uZXRhcHAuY29tDQoNCg== ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 16:37 ` Myklebust, Trond @ 2012-08-09 16:48 ` Peng Tao 2012-08-09 19:31 ` Myklebust, Trond 0 siblings, 1 reply; 12+ messages in thread From: Peng Tao @ 2012-08-09 16:48 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Idan Kedar, Boaz Harrosh, NFS list, Benny Halevy On Fri, Aug 10, 2012 at 12:37 AM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Fri, 2012-08-10 at 00:34 +0800, Peng Tao wrote: >> On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> >> <Trond.Myklebust@netapp.com> wrote: >> >> >> -----Original Message----- >> >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> >> To: Boaz Harrosh; NFS list >> >> >> Cc: Benny Halevy >> >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> >> >> Hi, >> >> >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> >> reported about yesterday: >> >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> >> is in-flight I/O. >> >> > >> >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> >> >> to what change are you referring? >> > >> > As I stated in the changelog of the patch that I sent to the list >> > yesterday, the behaviour is due to commit 0a57cdac3f. >> > >> >> > >> >> > See the changelog in the patch that I sent to the list yesterday. >> >> > >> >> >> >> I saw that, and if I'm not mistaken these races apply to object layout >> >> as well, and in any case they apply in my case. However, it is not >> >> easy to mess around with LAYOUTRETURN in object layout, and there have >> >> been several discussions on the issue. In one of these discussions >> >> Benny clarified that the object layout client must wait for all >> >> in-flight I/O to end. >> > >> > If the problem is that the DS is failing to respond, how does the client >> > know that the in-flight I/O has ended? >> > >> >> So for file layout it probably makes sense, but object layout (and if >> >> I understand correctly, block layout as well) something else needs to >> >> be done. I thought about sync wait when returning the layout on error, >> >> but according to Boaz it will cause deadlocks (Boaz - can you please >> >> elaborate?). >> > >> > The object layoutreturn has the ability to pass a timeout error value to >> > the MDS precisely in order to allow the latter to deal with this kind of >> > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. >> > >> > The block layout is adding the same ability to layoutreturn in NFSv4.2 >> > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct >> > layoutreturn_device_error4, so presumably they too have a plan for >> > dealing with this kind of issue. >> It is one thing to tell MDS that there is DS access error by sending >> layoutreturn, and it is another thing to return a layout even if there >> is overlapping in-flight DS IO... >> >> I certainly agree that client is entitled to return layout to inform >> MDS about DS errors and also avoid possible cb_layoutrecall. But it is >> just an optimization and should only be done when there is no >> in-flight IO (at least for block layout) IMHO. > > HOW DO YOU GUARANTEE NO IN-FLIGHT IO? > I don't. That's why I don't return layout in pnfs_ld_write_done(). And for layoutreturn upon cb_layoutreturn, block layout client needs to do timed-lease IO fencing per rfc5663, but it is not implemented in Linux client. > Repeating the same mantra about 'no in-flight IO' that doesn't apply to > timeout situations isn't helpful. > > A TIMEOUT means that you have NO IDEA if the data is still in flight or > not. That's when you need fencing, and the only thing that can supply > fencing in that situation is the MDS. > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > -- Thanks, Tao ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 16:48 ` Peng Tao @ 2012-08-09 19:31 ` Myklebust, Trond 2012-08-10 2:46 ` Peng Tao 0 siblings, 1 reply; 12+ messages in thread From: Myklebust, Trond @ 2012-08-09 19:31 UTC (permalink / raw) To: Peng Tao; +Cc: Idan Kedar, Boaz Harrosh, NFS list, Benny Halevy T24gRnJpLCAyMDEyLTA4LTEwIGF0IDAwOjQ4ICswODAwLCBQZW5nIFRhbyB3cm90ZToNCj4gT24g RnJpLCBBdWcgMTAsIDIwMTIgYXQgMTI6MzcgQU0sIE15a2xlYnVzdCwgVHJvbmQNCj4gPFRyb25k Lk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gPiBPbiBGcmksIDIwMTItMDgtMTAgYXQg MDA6MzQgKzA4MDAsIFBlbmcgVGFvIHdyb3RlOg0KPiA+PiBPbiBGcmksIEF1ZyAxMCwgMjAxMiBh dCAxMjowNiBBTSwgTXlrbGVidXN0LCBUcm9uZA0KPiA+PiA8VHJvbmQuTXlrbGVidXN0QG5ldGFw cC5jb20+IHdyb3RlOg0KPiA+PiA+IE9uIFRodSwgMjAxMi0wOC0wOSBhdCAxODo0OSArMDMwMCwg SWRhbiBLZWRhciB3cm90ZToNCj4gPj4gPj4gT24gVGh1LCBBdWcgOSwgMjAxMiBhdCA1OjA1IFBN LCBNeWtsZWJ1c3QsIFRyb25kDQo+ID4+ID4+IDxUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbT4g d3JvdGU6DQo+ID4+ID4+ID4+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+ID4+ID4+ID4+ IEZyb206IGxpbnV4LW5mcy1vd25lckB2Z2VyLmtlcm5lbC5vcmcgW21haWx0bzpsaW51eC1uZnMt DQo+ID4+ID4+ID4+IG93bmVyQHZnZXIua2VybmVsLm9yZ10gT24gQmVoYWxmIE9mIElkYW4gS2Vk YXINCj4gPj4gPj4gPj4gU2VudDogVGh1cnNkYXksIEF1Z3VzdCAwOSwgMjAxMiA5OjAzIEFNDQo+ ID4+ID4+ID4+IFRvOiBCb2F6IEhhcnJvc2g7IE5GUyBsaXN0DQo+ID4+ID4+ID4+IENjOiBCZW5u eSBIYWxldnkNCj4gPj4gPj4gPj4gU3ViamVjdDogcmV0dXJuIGxheW91dCBvbiBlcnJvciwgQlVH L2RlYWRsb2NrDQo+ID4+ID4+ID4+DQo+ID4+ID4+ID4+IEhpLA0KPiA+PiA+PiA+Pg0KPiA+PiA+ PiA+PiBBcyBhIHJlc3VsdCBvZiBzb21lIGV4cGVyaW1lbnRzLCBJIHdhbnRlZCB0byBzZWUgd2hh dCBoYXBwZW5zIHdoZW4gSQ0KPiA+PiA+PiA+PiBpbmplY3QgYW4gZXJyb3IgKGhhcmQgY29kZWQp IHRvIHRoZSBvYmplY3QgbGF5b3V0IGRyaXZlci4gdGhlIHBhdGNoIGlzIGF0IHRoZQ0KPiA+PiA+ PiA+PiBib3R0b20gb2YgdGhpcyBtYWlsLiB0aGUgcmVhc29uIEkgZGlkIHRoaXMgaXMgYmVjYXVz ZSB3aGVuIEkgaW5qZWN0IGVycm9ycyBpbiBteQ0KPiA+PiA+PiA+PiBtb2RpZmllZCB2ZXJzaW9u IG9mIHRoZSBvYmplY3QgbGF5b3V0IGRyaXZlciwgSSBnZXQgdGhlIHNhbWUgQlVHIFRpZ3Jhbg0K PiA+PiA+PiA+PiByZXBvcnRlZCBhYm91dCB5ZXN0ZXJkYXk6DQo+ID4+ID4+ID4+IG5mczRwcm9j LmM6NjI1MiA6ICAgQlVHX09OKCFsaXN0X2VtcHR5KCZsby0+cGxoX3NlZ3MpKTsNCj4gPj4gPj4g Pj4NCj4gPj4gPj4gPj4gSW4gbXkgbW9kaWZpZWQgdmVyc2lvbiAoYmFzZWQgb24ga2VybmVsIDMu MyksIHRoZSBidWcgc2VlbXMgdG8gYmUgdGhhdA0KPiA+PiA+PiA+PiBwbmZzX2xkX3dyaXRlX2Rv bmUgY2FsbHMgcG5mc19yZXR1cm5fbGF5b3V0IGluIHRoZSBlcnJvciBwYXRoLCBldmVuIGlmIHRo ZXJlDQo+ID4+ID4+ID4+IGlzIGluLWZsaWdodCBJL08uDQo+ID4+ID4+ID4NCj4gPj4gPj4gPiBU aGF0IGlzIG5vdCBhIGJ1Zy4gSXQgaXMgYW4gaW50ZW50aW9uYWwgY2hhbmdlIGluIG9yZGVyIHRv IGFsbG93IHRoZSBNRFMgdG8gZmVuY2Ugb2ZmIHRoZSBvdXRzdGFuZGluZyB3cml0ZXMgKGlmIGl0 IGNhbiBkbyBzbykgYmVmb3JlIHdlIHJldHJhbnNtaXQgdGhlbSBhcyB3cml0ZS10aHJvdWdoLU1E Uy4gT3RoZXJ3aXNlLCB5b3UgcmlzayByYWNlcyBiZXR3ZWVuIHRoZSBvdXRzdGFuZGluZyB3cml0 ZXMtdG8tRFMgYW5kIHRoZSBuZXcgd3JpdGVzLXRocm91Z2gtTURTLg0KPiA+PiA+Pg0KPiA+PiA+ PiB0byB3aGF0IGNoYW5nZSBhcmUgeW91IHJlZmVycmluZz8NCj4gPj4gPg0KPiA+PiA+IEFzIEkg c3RhdGVkIGluIHRoZSBjaGFuZ2Vsb2cgb2YgdGhlIHBhdGNoIHRoYXQgSSBzZW50IHRvIHRoZSBs aXN0DQo+ID4+ID4geWVzdGVyZGF5LCB0aGUgYmVoYXZpb3VyIGlzIGR1ZSB0byBjb21taXQgMGE1 N2NkYWMzZi4NCj4gPj4gPg0KPiA+PiA+PiA+DQo+ID4+ID4+ID4gU2VlIHRoZSBjaGFuZ2Vsb2cg aW4gdGhlIHBhdGNoIHRoYXQgSSBzZW50IHRvIHRoZSBsaXN0IHllc3RlcmRheS4NCj4gPj4gPj4g Pg0KPiA+PiA+Pg0KPiA+PiA+PiBJIHNhdyB0aGF0LCBhbmQgaWYgSSdtIG5vdCBtaXN0YWtlbiB0 aGVzZSByYWNlcyBhcHBseSB0byBvYmplY3QgbGF5b3V0DQo+ID4+ID4+IGFzIHdlbGwsIGFuZCBp biBhbnkgY2FzZSB0aGV5IGFwcGx5IGluIG15IGNhc2UuIEhvd2V2ZXIsIGl0IGlzIG5vdA0KPiA+ PiA+PiBlYXN5IHRvIG1lc3MgYXJvdW5kIHdpdGggTEFZT1VUUkVUVVJOIGluIG9iamVjdCBsYXlv dXQsIGFuZCB0aGVyZSBoYXZlDQo+ID4+ID4+IGJlZW4gc2V2ZXJhbCBkaXNjdXNzaW9ucyBvbiB0 aGUgaXNzdWUuIEluIG9uZSBvZiB0aGVzZSBkaXNjdXNzaW9ucw0KPiA+PiA+PiBCZW5ueSBjbGFy aWZpZWQgdGhhdCB0aGUgb2JqZWN0IGxheW91dCBjbGllbnQgbXVzdCB3YWl0IGZvciBhbGwNCj4g Pj4gPj4gaW4tZmxpZ2h0IEkvTyB0byBlbmQuDQo+ID4+ID4NCj4gPj4gPiBJZiB0aGUgcHJvYmxl bSBpcyB0aGF0IHRoZSBEUyBpcyBmYWlsaW5nIHRvIHJlc3BvbmQsIGhvdyBkb2VzIHRoZSBjbGll bnQNCj4gPj4gPiBrbm93IHRoYXQgdGhlIGluLWZsaWdodCBJL08gaGFzIGVuZGVkPw0KPiA+PiA+ DQo+ID4+ID4+IFNvIGZvciBmaWxlIGxheW91dCBpdCBwcm9iYWJseSBtYWtlcyBzZW5zZSwgYnV0 IG9iamVjdCBsYXlvdXQgKGFuZCBpZg0KPiA+PiA+PiBJIHVuZGVyc3RhbmQgY29ycmVjdGx5LCBi bG9jayBsYXlvdXQgYXMgd2VsbCkgc29tZXRoaW5nIGVsc2UgbmVlZHMgdG8NCj4gPj4gPj4gYmUg ZG9uZS4gSSB0aG91Z2h0IGFib3V0IHN5bmMgd2FpdCB3aGVuIHJldHVybmluZyB0aGUgbGF5b3V0 IG9uIGVycm9yLA0KPiA+PiA+PiBidXQgYWNjb3JkaW5nIHRvIEJvYXogaXQgd2lsbCBjYXVzZSBk ZWFkbG9ja3MgKEJvYXogLSBjYW4geW91IHBsZWFzZQ0KPiA+PiA+PiBlbGFib3JhdGU/KS4NCj4g Pj4gPg0KPiA+PiA+IFRoZSBvYmplY3QgbGF5b3V0cmV0dXJuIGhhcyB0aGUgYWJpbGl0eSB0byBw YXNzIGEgdGltZW91dCBlcnJvciB2YWx1ZSB0bw0KPiA+PiA+IHRoZSBNRFMgcHJlY2lzZWx5IGlu IG9yZGVyIHRvIGFsbG93IHRoZSBsYXR0ZXIgdG8gZGVhbCB3aXRoIHRoaXMga2luZCBvZg0KPiA+ PiA+IGlzc3VlLiBTZWUgdGhlIGRlc2NyaXB0aW9uIG9mIHN0cnVjdCBwbmZzX29zZF9pb2VycjQg aW4gcmZjNTY2NC4NCj4gPj4gPg0KPiA+PiA+IFRoZSBibG9jayBsYXlvdXQgaXMgYWRkaW5nIHRo ZSBzYW1lIGFiaWxpdHkgdG8gbGF5b3V0cmV0dXJuIGluIE5GU3Y0LjINCj4gPj4gPiAoc2VlIGRy YWZ0LWlldGYtbmZzdjQtbWlub3J2ZXJzaW9uMi0xMy50eHQpIHZpYSB0aGUgc3RydWN0DQo+ID4+ ID4gbGF5b3V0cmV0dXJuX2RldmljZV9lcnJvcjQsIHNvIHByZXN1bWFibHkgdGhleSB0b28gaGF2 ZSBhIHBsYW4gZm9yDQo+ID4+ID4gZGVhbGluZyB3aXRoIHRoaXMga2luZCBvZiBpc3N1ZS4NCj4g Pj4gSXQgaXMgb25lIHRoaW5nIHRvIHRlbGwgTURTIHRoYXQgdGhlcmUgaXMgRFMgYWNjZXNzIGVy cm9yIGJ5IHNlbmRpbmcNCj4gPj4gbGF5b3V0cmV0dXJuLCBhbmQgaXQgaXMgYW5vdGhlciB0aGlu ZyB0byByZXR1cm4gYSBsYXlvdXQgZXZlbiBpZiB0aGVyZQ0KPiA+PiBpcyBvdmVybGFwcGluZyBp bi1mbGlnaHQgRFMgSU8uLi4NCj4gPj4NCj4gPj4gSSBjZXJ0YWlubHkgYWdyZWUgdGhhdCBjbGll bnQgaXMgZW50aXRsZWQgdG8gcmV0dXJuIGxheW91dCB0byBpbmZvcm0NCj4gPj4gTURTIGFib3V0 IERTIGVycm9ycyBhbmQgYWxzbyBhdm9pZCBwb3NzaWJsZSBjYl9sYXlvdXRyZWNhbGwuIEJ1dCBp dCBpcw0KPiA+PiBqdXN0IGFuIG9wdGltaXphdGlvbiBhbmQgc2hvdWxkIG9ubHkgYmUgZG9uZSB3 aGVuIHRoZXJlIGlzIG5vDQo+ID4+IGluLWZsaWdodCBJTyAoYXQgbGVhc3QgZm9yIGJsb2NrIGxh eW91dCkgSU1ITy4NCj4gPg0KPiA+IEhPVyBETyBZT1UgR1VBUkFOVEVFIE5PIElOLUZMSUdIVCBJ Tz8NCj4gPg0KPiBJIGRvbid0LiBUaGF0J3Mgd2h5IEkgZG9uJ3QgcmV0dXJuIGxheW91dCBpbiBw bmZzX2xkX3dyaXRlX2RvbmUoKS4gQW5kDQo+IGZvciBsYXlvdXRyZXR1cm4gdXBvbiBjYl9sYXlv dXRyZXR1cm4sIGJsb2NrIGxheW91dCBjbGllbnQgbmVlZHMgdG8gZG8NCj4gdGltZWQtbGVhc2Ug SU8gZmVuY2luZyBwZXIgcmZjNTY2MywgYnV0IGl0IGlzIG5vdCBpbXBsZW1lbnRlZCBpbiBMaW51 eA0KPiBjbGllbnQuDQoNClRoZSB0aW1lZC1sZWFzZSBJTyBmZW5jaW5nIGRlc2NyaWJlZCBpbiBy ZmM1NjYzIGlzIGFib3V0IGluZm9ybWluZyB0aGUNCnNlcnZlciBhYm91dCBob3cgbG9uZyB0aGUg Y2xpZW50IGV4cGVjdHMgYSBjb21tYW5kIHRvIHN1Y2NlZWQgb3IgZmFpbC4NCkl0IGRvZXNuJ3Qg b2ZmZXIgYW55IGFkdmljZSBmb3IgaG93IHRoZSBjbGllbnQgaXMgdG8gZGVhbCB3aXRoIGFuDQp1 bnJlc3BvbnNpdmUgRFMuDQoNCldoYXQgeW91IG5lZWQgaGVyZSBpcyBoZWxwIGZyb20gdGhlIHVu ZGVybHlpbmcgdHJhbnNwb3J0IHByb3RvY29sLiBBcyBJDQpzYWlkIGluIHRoZSBlbWFpbCB0byBJ ZGFuLCB3aGVuIHJlc2VhcmNoaW5nIGlTQ1NJIGFuZCBpRkNQLCBJIGZvdW5kIHdoYXQNCmFwcGVh cnMgdG8gYmUgbWVjaGFuaXNtcyBmb3IgcmVsaWFibHkgdGltaW5nIG91dC4NCg0KLS0gDQpUcm9u ZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25k Lk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0KDQo= ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 19:31 ` Myklebust, Trond @ 2012-08-10 2:46 ` Peng Tao 0 siblings, 0 replies; 12+ messages in thread From: Peng Tao @ 2012-08-10 2:46 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Idan Kedar, Boaz Harrosh, NFS list, Benny Halevy On Fri, Aug 10, 2012 at 3:31 AM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Fri, 2012-08-10 at 00:48 +0800, Peng Tao wrote: >> On Fri, Aug 10, 2012 at 12:37 AM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> > On Fri, 2012-08-10 at 00:34 +0800, Peng Tao wrote: >> >> On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond >> >> <Trond.Myklebust@netapp.com> wrote: >> >> > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> >> >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> >> >> <Trond.Myklebust@netapp.com> wrote: >> >> >> >> -----Original Message----- >> >> >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> >> >> To: Boaz Harrosh; NFS list >> >> >> >> Cc: Benny Halevy >> >> >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> >> >> >> >> Hi, >> >> >> >> >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> >> >> reported about yesterday: >> >> >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> >> >> is in-flight I/O. >> >> >> > >> >> >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> >> >> >> >> to what change are you referring? >> >> > >> >> > As I stated in the changelog of the patch that I sent to the list >> >> > yesterday, the behaviour is due to commit 0a57cdac3f. >> >> > >> >> >> > >> >> >> > See the changelog in the patch that I sent to the list yesterday. >> >> >> > >> >> >> >> >> >> I saw that, and if I'm not mistaken these races apply to object layout >> >> >> as well, and in any case they apply in my case. However, it is not >> >> >> easy to mess around with LAYOUTRETURN in object layout, and there have >> >> >> been several discussions on the issue. In one of these discussions >> >> >> Benny clarified that the object layout client must wait for all >> >> >> in-flight I/O to end. >> >> > >> >> > If the problem is that the DS is failing to respond, how does the client >> >> > know that the in-flight I/O has ended? >> >> > >> >> >> So for file layout it probably makes sense, but object layout (and if >> >> >> I understand correctly, block layout as well) something else needs to >> >> >> be done. I thought about sync wait when returning the layout on error, >> >> >> but according to Boaz it will cause deadlocks (Boaz - can you please >> >> >> elaborate?). >> >> > >> >> > The object layoutreturn has the ability to pass a timeout error value to >> >> > the MDS precisely in order to allow the latter to deal with this kind of >> >> > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. >> >> > >> >> > The block layout is adding the same ability to layoutreturn in NFSv4.2 >> >> > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct >> >> > layoutreturn_device_error4, so presumably they too have a plan for >> >> > dealing with this kind of issue. >> >> It is one thing to tell MDS that there is DS access error by sending >> >> layoutreturn, and it is another thing to return a layout even if there >> >> is overlapping in-flight DS IO... >> >> >> >> I certainly agree that client is entitled to return layout to inform >> >> MDS about DS errors and also avoid possible cb_layoutrecall. But it is >> >> just an optimization and should only be done when there is no >> >> in-flight IO (at least for block layout) IMHO. >> > >> > HOW DO YOU GUARANTEE NO IN-FLIGHT IO? >> > >> I don't. That's why I don't return layout in pnfs_ld_write_done(). And >> for layoutreturn upon cb_layoutreturn, block layout client needs to do >> timed-lease IO fencing per rfc5663, but it is not implemented in Linux >> client. > > The timed-lease IO fencing described in rfc5663 is about informing the > server about how long the client expects a command to succeed or fail. > It doesn't offer any advice for how the client is to deal with an > unresponsive DS. > > What you need here is help from the underlying transport protocol. As I > said in the email to Idan, when researching iSCSI and iFCP, I found what > appears to be mechanisms for reliably timing out. Just checked and found that the layoutreturn-on-error behavior only affects object and file layout. So block layout stays out and safe. That's all I would ask for. Thanks for your explanation. Best, Tao ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 16:06 ` Myklebust, Trond 2012-08-09 16:34 ` Peng Tao @ 2012-08-09 16:54 ` Idan Kedar 2012-08-09 19:12 ` Myklebust, Trond 1 sibling, 1 reply; 12+ messages in thread From: Idan Kedar @ 2012-08-09 16:54 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Boaz Harrosh, NFS list, Benny Halevy, Peng Tao On Thu, Aug 9, 2012 at 7:06 PM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> >> -----Original Message----- >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> To: Boaz Harrosh; NFS list >> >> Cc: Benny Halevy >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> Hi, >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> reported about yesterday: >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> is in-flight I/O. >> > >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> to what change are you referring? > > As I stated in the changelog of the patch that I sent to the list > yesterday, the behaviour is due to commit 0a57cdac3f. > I'm not sure I'm following. I was talking about the state of the code v3.5, not to any specific patch/change. since the client yelled "BUG!!!" and crashed, there is some bug involved, either because the BUG() is correct or because the presence of the BUG() is the bug itself. >> > >> > See the changelog in the patch that I sent to the list yesterday. >> > >> >> I saw that, and if I'm not mistaken these races apply to object layout >> as well, and in any case they apply in my case. However, it is not >> easy to mess around with LAYOUTRETURN in object layout, and there have >> been several discussions on the issue. In one of these discussions >> Benny clarified that the object layout client must wait for all >> in-flight I/O to end. > > If the problem is that the DS is failing to respond, how does the client > know that the in-flight I/O has ended? > after the last rpc_call_done is called there is no I/O from the client's perspective an the layout can be safely returned, after which I/O can be done through the MDS. Is there I/O pending from the DS perspective? maybe, but all you can do is send I/O via MDS and hope it will not race. fencing doesn't really save you, it just improves your odds where applicable, i.e. file layout. >> So for file layout it probably makes sense, but object layout (and if >> I understand correctly, block layout as well) something else needs to >> be done. I thought about sync wait when returning the layout on error, >> but according to Boaz it will cause deadlocks (Boaz - can you please >> elaborate?). > > The object layoutreturn has the ability to pass a timeout error value to > the MDS precisely in order to allow the latter to deal with this kind of > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. > > The block layout is adding the same ability to layoutreturn in NFSv4.2 > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct > layoutreturn_device_error4, so presumably they too have a plan for > dealing with this kind of issue. > I'll wait for Boaz (or someone else) to explain what exactly is "this kind of issue". I still don't know why sync wait would deadlock. >> And come to think of it, nfs4_proc_setattr also returns the layout >> when there may be I/O in-flight (correct me if i'm wrong). So I guess >> pnfs_return_layout should somehow solve this by itself by saying "if >> this is fencing (a flag which will be set for file layout only), go >> ahead, otherwise make the layout as 'needs to be returned' and when >> the lseg lists gets empty return the layout". > > The only layout type that sets the PNFS_LAYOUTRET_ON_SETATTR flag is > objects, so that question needs to be directed to Boaz. > Yes, this entire thread is mainly directed to Boaz... And IIRC he did want to implement something of that sort, and this thread was basically for asking him "did you?". > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > -- idank ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 16:54 ` Idan Kedar @ 2012-08-09 19:12 ` Myklebust, Trond 2012-08-09 23:04 ` Idan Kedar 0 siblings, 1 reply; 12+ messages in thread From: Myklebust, Trond @ 2012-08-09 19:12 UTC (permalink / raw) To: Idan Kedar; +Cc: Boaz Harrosh, NFS list, Benny Halevy, Peng Tao T24gVGh1LCAyMDEyLTA4LTA5IGF0IDE5OjU0ICswMzAwLCBJZGFuIEtlZGFyIHdyb3RlOg0KPiBP biBUaHUsIEF1ZyA5LCAyMDEyIGF0IDc6MDYgUE0sIE15a2xlYnVzdCwgVHJvbmQNCj4gPFRyb25k Lk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gPiBPbiBUaHUsIDIwMTItMDgtMDkgYXQg MTg6NDkgKzAzMDAsIElkYW4gS2VkYXIgd3JvdGU6DQo+ID4+IE9uIFRodSwgQXVnIDksIDIwMTIg YXQgNTowNSBQTSwgTXlrbGVidXN0LCBUcm9uZA0KPiA+PiA8VHJvbmQuTXlrbGVidXN0QG5ldGFw cC5jb20+IHdyb3RlOg0KPiA+PiA+PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+PiA+ PiBGcm9tOiBsaW51eC1uZnMtb3duZXJAdmdlci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtbmZz LQ0KPiA+PiA+PiBvd25lckB2Z2VyLmtlcm5lbC5vcmddIE9uIEJlaGFsZiBPZiBJZGFuIEtlZGFy DQo+ID4+ID4+IFNlbnQ6IFRodXJzZGF5LCBBdWd1c3QgMDksIDIwMTIgOTowMyBBTQ0KPiA+PiA+ PiBUbzogQm9heiBIYXJyb3NoOyBORlMgbGlzdA0KPiA+PiA+PiBDYzogQmVubnkgSGFsZXZ5DQo+ ID4+ID4+IFN1YmplY3Q6IHJldHVybiBsYXlvdXQgb24gZXJyb3IsIEJVRy9kZWFkbG9jaw0KPiA+ PiA+Pg0KPiA+PiA+PiBIaSwNCj4gPj4gPj4NCj4gPj4gPj4gQXMgYSByZXN1bHQgb2Ygc29tZSBl eHBlcmltZW50cywgSSB3YW50ZWQgdG8gc2VlIHdoYXQgaGFwcGVucyB3aGVuIEkNCj4gPj4gPj4g aW5qZWN0IGFuIGVycm9yIChoYXJkIGNvZGVkKSB0byB0aGUgb2JqZWN0IGxheW91dCBkcml2ZXIu IHRoZSBwYXRjaCBpcyBhdCB0aGUNCj4gPj4gPj4gYm90dG9tIG9mIHRoaXMgbWFpbC4gdGhlIHJl YXNvbiBJIGRpZCB0aGlzIGlzIGJlY2F1c2Ugd2hlbiBJIGluamVjdCBlcnJvcnMgaW4gbXkNCj4g Pj4gPj4gbW9kaWZpZWQgdmVyc2lvbiBvZiB0aGUgb2JqZWN0IGxheW91dCBkcml2ZXIsIEkgZ2V0 IHRoZSBzYW1lIEJVRyBUaWdyYW4NCj4gPj4gPj4gcmVwb3J0ZWQgYWJvdXQgeWVzdGVyZGF5Og0K PiA+PiA+PiBuZnM0cHJvYy5jOjYyNTIgOiAgIEJVR19PTighbGlzdF9lbXB0eSgmbG8tPnBsaF9z ZWdzKSk7DQo+ID4+ID4+DQo+ID4+ID4+IEluIG15IG1vZGlmaWVkIHZlcnNpb24gKGJhc2VkIG9u IGtlcm5lbCAzLjMpLCB0aGUgYnVnIHNlZW1zIHRvIGJlIHRoYXQNCj4gPj4gPj4gcG5mc19sZF93 cml0ZV9kb25lIGNhbGxzIHBuZnNfcmV0dXJuX2xheW91dCBpbiB0aGUgZXJyb3IgcGF0aCwgZXZl biBpZiB0aGVyZQ0KPiA+PiA+PiBpcyBpbi1mbGlnaHQgSS9PLg0KPiA+PiA+DQo+ID4+ID4gVGhh dCBpcyBub3QgYSBidWcuIEl0IGlzIGFuIGludGVudGlvbmFsIGNoYW5nZSBpbiBvcmRlciB0byBh bGxvdyB0aGUgTURTIHRvIGZlbmNlIG9mZiB0aGUgb3V0c3RhbmRpbmcgd3JpdGVzIChpZiBpdCBj YW4gZG8gc28pIGJlZm9yZSB3ZSByZXRyYW5zbWl0IHRoZW0gYXMgd3JpdGUtdGhyb3VnaC1NRFMu IE90aGVyd2lzZSwgeW91IHJpc2sgcmFjZXMgYmV0d2VlbiB0aGUgb3V0c3RhbmRpbmcgd3JpdGVz LXRvLURTIGFuZCB0aGUgbmV3IHdyaXRlcy10aHJvdWdoLU1EUy4NCj4gPj4NCj4gPj4gdG8gd2hh dCBjaGFuZ2UgYXJlIHlvdSByZWZlcnJpbmc/DQo+ID4NCj4gPiBBcyBJIHN0YXRlZCBpbiB0aGUg Y2hhbmdlbG9nIG9mIHRoZSBwYXRjaCB0aGF0IEkgc2VudCB0byB0aGUgbGlzdA0KPiA+IHllc3Rl cmRheSwgdGhlIGJlaGF2aW91ciBpcyBkdWUgdG8gY29tbWl0IDBhNTdjZGFjM2YuDQo+ID4NCj4g DQo+IEknbSBub3Qgc3VyZSBJJ20gZm9sbG93aW5nLiBJIHdhcyB0YWxraW5nIGFib3V0IHRoZSBz dGF0ZSBvZiB0aGUgY29kZQ0KPiB2My41LCBub3QgdG8gYW55IHNwZWNpZmljIHBhdGNoL2NoYW5n ZS4gc2luY2UgdGhlIGNsaWVudCB5ZWxsZWQNCj4gIkJVRyEhISIgYW5kIGNyYXNoZWQsIHRoZXJl IGlzIHNvbWUgYnVnIGludm9sdmVkLCBlaXRoZXIgYmVjYXVzZSB0aGUNCj4gQlVHKCkgaXMgY29y cmVjdCBvciBiZWNhdXNlIHRoZSBwcmVzZW5jZSBvZiB0aGUgQlVHKCkgaXMgdGhlIGJ1Zw0KPiBp dHNlbGYuDQo+IA0KPiA+PiA+DQo+ID4+ID4gU2VlIHRoZSBjaGFuZ2Vsb2cgaW4gdGhlIHBhdGNo IHRoYXQgSSBzZW50IHRvIHRoZSBsaXN0IHllc3RlcmRheS4NCj4gPj4gPg0KPiA+Pg0KPiA+PiBJ IHNhdyB0aGF0LCBhbmQgaWYgSSdtIG5vdCBtaXN0YWtlbiB0aGVzZSByYWNlcyBhcHBseSB0byBv YmplY3QgbGF5b3V0DQo+ID4+IGFzIHdlbGwsIGFuZCBpbiBhbnkgY2FzZSB0aGV5IGFwcGx5IGlu IG15IGNhc2UuIEhvd2V2ZXIsIGl0IGlzIG5vdA0KPiA+PiBlYXN5IHRvIG1lc3MgYXJvdW5kIHdp dGggTEFZT1VUUkVUVVJOIGluIG9iamVjdCBsYXlvdXQsIGFuZCB0aGVyZSBoYXZlDQo+ID4+IGJl ZW4gc2V2ZXJhbCBkaXNjdXNzaW9ucyBvbiB0aGUgaXNzdWUuIEluIG9uZSBvZiB0aGVzZSBkaXNj dXNzaW9ucw0KPiA+PiBCZW5ueSBjbGFyaWZpZWQgdGhhdCB0aGUgb2JqZWN0IGxheW91dCBjbGll bnQgbXVzdCB3YWl0IGZvciBhbGwNCj4gPj4gaW4tZmxpZ2h0IEkvTyB0byBlbmQuDQo+ID4NCj4g PiBJZiB0aGUgcHJvYmxlbSBpcyB0aGF0IHRoZSBEUyBpcyBmYWlsaW5nIHRvIHJlc3BvbmQsIGhv dyBkb2VzIHRoZSBjbGllbnQNCj4gPiBrbm93IHRoYXQgdGhlIGluLWZsaWdodCBJL08gaGFzIGVu ZGVkPw0KPiA+DQo+IA0KPiBhZnRlciB0aGUgbGFzdCBycGNfY2FsbF9kb25lIGlzIGNhbGxlZCB0 aGVyZSBpcyBubyBJL08gZnJvbSB0aGUNCj4gY2xpZW50J3MgcGVyc3BlY3RpdmUgYW4gdGhlIGxh eW91dCBjYW4gYmUgc2FmZWx5IHJldHVybmVkLCBhZnRlciB3aGljaA0KPiBJL08gY2FuIGJlIGRv bmUgdGhyb3VnaCB0aGUgTURTLg0KDQpOby4gWW91IGFyZSBjb25mdXNlZC4NCg0KQmVpbmcgaW4g cnBjX2NhbGxfZG9uZSBqdXN0IHRlbGxzIHlvdSB0aGF0IHRoZSBjbGllbnQgdGhpbmtzIGl0IGlz IGRvbmUuDQpJdCB0ZWxscyB5b3Ugbm90aGluZyBhYm91dCB3aGF0IHRoZSBEUyBpcyBkb2luZy4g SXQgbWF5IGp1c3Qgc2xvb29vb3dseQ0KYmUgcHJvY2Vzc2luZyB0aGUgV1JJVEUgY2FsbHMgZm9y IGFsbCB5b3Uga25vdy4NCg0KVW5sZXNzIHlvdSBnZXQgYSByZXNwb25zZSBmcm9tIHRoZSBEUyBz YXlpbmcgIkknbSBkb25lIiwgb3IgeW91IGhhdmUNCnNvbWUgb3RoZXIgbWVjaGFuaXNtIHRvIGd1 YXJhbnRlZSB0aGF0IHRoZSBEUyBpcyBkb25lIHdoZW4geW91IHRpbWUgb3V0LA0KdGhlbiB5b3Ug aGF2ZSBubyBndWFyYW50ZWUgdGhhdCBpdCBpcyBzYWZlIHRvIHNlbmQgdG8gTURTLg0KDQo+IElz IHRoZXJlIEkvTyBwZW5kaW5nIGZyb20gdGhlIERTIHBlcnNwZWN0aXZlPyBtYXliZSwgYnV0IGFs bCB5b3UgY2FuDQo+IGRvIGlzIHNlbmQgSS9PIHZpYSBNRFMgYW5kIGhvcGUgaXQgd2lsbCBub3Qg cmFjZS4gZmVuY2luZyBkb2Vzbid0DQo+IHJlYWxseSBzYXZlIHlvdSwgaXQganVzdCBpbXByb3Zl cyB5b3VyIG9kZHMgd2hlcmUgYXBwbGljYWJsZSwgaS5lLg0KPiBmaWxlIGxheW91dC4NCg0KQWN0 dWFsbHksIHdoZW4gcmVhZGluZyB0aGUgaVNDU0kgc3BlYywgaXQgaW1wbGllcyB0aGF0IHRoZXJl IGlzIGEga25vd24NCnNlc3Npb24gdGltZW91dCB0aGF0IGlzIG5lZ290aWF0ZWQgYmV0d2VlbiB0 aGUgaW5pdGlhdG9yIGFuZCB0YXJnZXQuDQpBZnRlciB0aGUgc2Vzc2lvbiBleHBpcmVzLCB5b3Ug YXJlIGd1YXJhbnRlZWQgdGhhdCBubyBmdXJ0aGVyIGNvbW1hbmRzDQphcmUgcnVubmluZyBvbiB0 aGUgRFMuDQoNClRoZSBpRkNQIHByb3RvY29sIGF0dGFjaGVzIGEgdGltZW91dCB2YWx1ZSB0byBl YWNoIGZyYW1lLCBhbmQgaXMNCnN1cHBvc2VkIHRvIGVuZm9yY2UgdGhhdCB0aW1lb3V0Lg0KDQpT byBhdCBsZWFzdCB0aG9zZSBwcm90b2NvbHMgc2hvdWxkIGhhdmUgYSBkZXRlcm1pbmlzdGljIGJl aGF2aW91ciB0aGF0DQpjYW4gYmUgdXNlZCBieSB0aGUgY2xpZW50Lg0KDQo+ID4+IFNvIGZvciBm aWxlIGxheW91dCBpdCBwcm9iYWJseSBtYWtlcyBzZW5zZSwgYnV0IG9iamVjdCBsYXlvdXQgKGFu ZCBpZg0KPiA+PiBJIHVuZGVyc3RhbmQgY29ycmVjdGx5LCBibG9jayBsYXlvdXQgYXMgd2VsbCkg c29tZXRoaW5nIGVsc2UgbmVlZHMgdG8NCj4gPj4gYmUgZG9uZS4gSSB0aG91Z2h0IGFib3V0IHN5 bmMgd2FpdCB3aGVuIHJldHVybmluZyB0aGUgbGF5b3V0IG9uIGVycm9yLA0KPiA+PiBidXQgYWNj b3JkaW5nIHRvIEJvYXogaXQgd2lsbCBjYXVzZSBkZWFkbG9ja3MgKEJvYXogLSBjYW4geW91IHBs ZWFzZQ0KPiA+PiBlbGFib3JhdGU/KS4NCj4gPg0KPiA+IFRoZSBvYmplY3QgbGF5b3V0cmV0dXJu IGhhcyB0aGUgYWJpbGl0eSB0byBwYXNzIGEgdGltZW91dCBlcnJvciB2YWx1ZSB0bw0KPiA+IHRo ZSBNRFMgcHJlY2lzZWx5IGluIG9yZGVyIHRvIGFsbG93IHRoZSBsYXR0ZXIgdG8gZGVhbCB3aXRo IHRoaXMga2luZCBvZg0KPiA+IGlzc3VlLiBTZWUgdGhlIGRlc2NyaXB0aW9uIG9mIHN0cnVjdCBw bmZzX29zZF9pb2VycjQgaW4gcmZjNTY2NC4NCj4gPg0KPiA+IFRoZSBibG9jayBsYXlvdXQgaXMg YWRkaW5nIHRoZSBzYW1lIGFiaWxpdHkgdG8gbGF5b3V0cmV0dXJuIGluIE5GU3Y0LjINCj4gPiAo c2VlIGRyYWZ0LWlldGYtbmZzdjQtbWlub3J2ZXJzaW9uMi0xMy50eHQpIHZpYSB0aGUgc3RydWN0 DQo+ID4gbGF5b3V0cmV0dXJuX2RldmljZV9lcnJvcjQsIHNvIHByZXN1bWFibHkgdGhleSB0b28g aGF2ZSBhIHBsYW4gZm9yDQo+ID4gZGVhbGluZyB3aXRoIHRoaXMga2luZCBvZiBpc3N1ZS4NCj4g Pg0KPiANCj4gSSdsbCB3YWl0IGZvciBCb2F6IChvciBzb21lb25lIGVsc2UpIHRvIGV4cGxhaW4g d2hhdCBleGFjdGx5IGlzICJ0aGlzDQo+IGtpbmQgb2YgaXNzdWUiLiBJIHN0aWxsIGRvbid0IGtu b3cgd2h5IHN5bmMgd2FpdCB3b3VsZCBkZWFkbG9jay4NCj4gDQo+ID4+IEFuZCBjb21lIHRvIHRo aW5rIG9mIGl0LCBuZnM0X3Byb2Nfc2V0YXR0ciBhbHNvIHJldHVybnMgdGhlIGxheW91dA0KPiA+ PiB3aGVuIHRoZXJlIG1heSBiZSBJL08gaW4tZmxpZ2h0IChjb3JyZWN0IG1lIGlmIGknbSB3cm9u ZykuIFNvIEkgZ3Vlc3MNCj4gPj4gcG5mc19yZXR1cm5fbGF5b3V0IHNob3VsZCBzb21laG93IHNv bHZlIHRoaXMgYnkgaXRzZWxmIGJ5IHNheWluZyAiaWYNCj4gPj4gdGhpcyBpcyBmZW5jaW5nIChh IGZsYWcgd2hpY2ggd2lsbCBiZSBzZXQgZm9yIGZpbGUgbGF5b3V0IG9ubHkpLCBnbw0KPiA+PiBh aGVhZCwgb3RoZXJ3aXNlIG1ha2UgdGhlIGxheW91dCBhcyAnbmVlZHMgdG8gYmUgcmV0dXJuZWQn IGFuZCB3aGVuDQo+ID4+IHRoZSBsc2VnIGxpc3RzIGdldHMgZW1wdHkgcmV0dXJuIHRoZSBsYXlv dXQiLg0KPiA+DQo+ID4gVGhlIG9ubHkgbGF5b3V0IHR5cGUgdGhhdCBzZXRzIHRoZSBQTkZTX0xB WU9VVFJFVF9PTl9TRVRBVFRSIGZsYWcgaXMNCj4gPiBvYmplY3RzLCBzbyB0aGF0IHF1ZXN0aW9u IG5lZWRzIHRvIGJlIGRpcmVjdGVkIHRvIEJvYXouDQo+ID4NCj4gDQo+IFllcywgdGhpcyBlbnRp cmUgdGhyZWFkIGlzIG1haW5seSBkaXJlY3RlZCB0byBCb2F6Li4uIEFuZCBJSVJDIGhlIGRpZA0K PiB3YW50IHRvIGltcGxlbWVudCBzb21ldGhpbmcgb2YgdGhhdCBzb3J0LCBhbmQgdGhpcyB0aHJl YWQgd2FzDQo+IGJhc2ljYWxseSBmb3IgYXNraW5nIGhpbSAiZGlkIHlvdT8iLg0KPiANCj4gPiAt LQ0KPiA+IFRyb25kIE15a2xlYnVzdA0KPiA+IExpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0K PiA+DQo+ID4gTmV0QXBwDQo+ID4gVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20NCj4gPiB3d3cu bmV0YXBwLmNvbQ0KPiA+DQo+IA0KPiANCj4gDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51 eCBORlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBw LmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg0K ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: return layout on error, BUG/deadlock 2012-08-09 19:12 ` Myklebust, Trond @ 2012-08-09 23:04 ` Idan Kedar 0 siblings, 0 replies; 12+ messages in thread From: Idan Kedar @ 2012-08-09 23:04 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Boaz Harrosh, NFS list, Benny Halevy, Peng Tao On Thu, Aug 9, 2012 at 10:12 PM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Thu, 2012-08-09 at 19:54 +0300, Idan Kedar wrote: >> On Thu, Aug 9, 2012 at 7:06 PM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> >> <Trond.Myklebust@netapp.com> wrote: >> >> >> -----Original Message----- >> >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> >> To: Boaz Harrosh; NFS list >> >> >> Cc: Benny Halevy >> >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> >> >> Hi, >> >> >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> >> reported about yesterday: >> >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> >> is in-flight I/O. >> >> > >> >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> >> >> to what change are you referring? >> > >> > As I stated in the changelog of the patch that I sent to the list >> > yesterday, the behaviour is due to commit 0a57cdac3f. >> > >> >> I'm not sure I'm following. I was talking about the state of the code >> v3.5, not to any specific patch/change. since the client yelled >> "BUG!!!" and crashed, there is some bug involved, either because the >> BUG() is correct or because the presence of the BUG() is the bug >> itself. >> >> >> > >> >> > See the changelog in the patch that I sent to the list yesterday. >> >> > >> >> >> >> I saw that, and if I'm not mistaken these races apply to object layout >> >> as well, and in any case they apply in my case. However, it is not >> >> easy to mess around with LAYOUTRETURN in object layout, and there have >> >> been several discussions on the issue. In one of these discussions >> >> Benny clarified that the object layout client must wait for all >> >> in-flight I/O to end. >> > >> > If the problem is that the DS is failing to respond, how does the client >> > know that the in-flight I/O has ended? >> > >> >> after the last rpc_call_done is called there is no I/O from the >> client's perspective an the layout can be safely returned, after which >> I/O can be done through the MDS. > > No. You are confused. > > Being in rpc_call_done just tells you that the client thinks it is done. > It tells you nothing about what the DS is doing. It may just slooooowly > be processing the WRITE calls for all you know. > That's what I meant by "from the client's perspective". At this point, the client can say "I'm not *using* this layout anymore" because even if some I/O on its way to the DS originates in the client, the client can't do anything about it. The fate of this I/O is meaningless to the client at that point. This is the best the client can do to help the MDS preserve integrity - keep the layout as long as it has information about its I/O. > Unless you get a response from the DS saying "I'm done", or you have > some other mechanism to guarantee that the DS is done when you time out, > then you have no guarantee that it is safe to send to MDS. > Exactly. What if I don't have such a mechanism? My options are: * Ask the MDS to somehow fence in-flight I/O and resend through the MDS. * Return an error to the user. What if the server does not support fencing? In that case we shouldn't send through MDS while there is in-flight I/O. Do we want to assume fencing is implemented? And if fencing isn't implemented and we return an error to the user - how do we know when it is safe to perform any I/O again? And even if fencing is implemented, I don't think the client should rely on it, because the whole idea beind pNFS is a smart client that offloads what it can from the servers. >> Is there I/O pending from the DS perspective? maybe, but all you can >> do is send I/O via MDS and hope it will not race. fencing doesn't >> really save you, it just improves your odds where applicable, i.e. >> file layout. One correction to what I said: Fencing does save you, if it's implemented. LAYOUTRETURN doesn't save you (when fencing is not implemented). > > Actually, when reading the iSCSI spec, it implies that there is a known > session timeout that is negotiated between the initiator and target. > After the session expires, you are guaranteed that no further commands > are running on the DS. > > The iFCP protocol attaches a timeout value to each frame, and is > supposed to enforce that timeout. > > So at least those protocols should have a deterministic behaviour that > can be used by the client. > >> >> So for file layout it probably makes sense, but object layout (and if >> >> I understand correctly, block layout as well) something else needs to >> >> be done. I thought about sync wait when returning the layout on error, >> >> but according to Boaz it will cause deadlocks (Boaz - can you please >> >> elaborate?). >> > >> > The object layoutreturn has the ability to pass a timeout error value to >> > the MDS precisely in order to allow the latter to deal with this kind of >> > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. >> > >> > The block layout is adding the same ability to layoutreturn in NFSv4.2 >> > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct >> > layoutreturn_device_error4, so presumably they too have a plan for >> > dealing with this kind of issue. >> > Then on layoutreturn we could call a new layout driver method, end_io or something returns only when there is no danger of race (either by waiting for in-flight I/O to complete/timeout/error out or by fencing) and only then return the layout. This works for me. I don't know if it works for Boaz though. >> >> I'll wait for Boaz (or someone else) to explain what exactly is "this >> kind of issue". I still don't know why sync wait would deadlock. >> >> >> And come to think of it, nfs4_proc_setattr also returns the layout >> >> when there may be I/O in-flight (correct me if i'm wrong). So I guess >> >> pnfs_return_layout should somehow solve this by itself by saying "if >> >> this is fencing (a flag which will be set for file layout only), go >> >> ahead, otherwise make the layout as 'needs to be returned' and when >> >> the lseg lists gets empty return the layout". >> > >> > The only layout type that sets the PNFS_LAYOUTRET_ON_SETATTR flag is >> > objects, so that question needs to be directed to Boaz. >> > >> >> Yes, this entire thread is mainly directed to Boaz... And IIRC he did >> want to implement something of that sort, and this thread was >> basically for asking him "did you?". >> >> > -- >> > Trond Myklebust >> > Linux NFS client maintainer >> > >> > NetApp >> > Trond.Myklebust@netapp.com >> > www.netapp.com >> > >> >> >> > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > -- idank ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-08-10 2:47 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-08-09 13:03 return layout on error, BUG/deadlock Idan Kedar 2012-08-09 14:05 ` Myklebust, Trond 2012-08-09 15:49 ` Idan Kedar 2012-08-09 16:06 ` Myklebust, Trond 2012-08-09 16:34 ` Peng Tao 2012-08-09 16:37 ` Myklebust, Trond 2012-08-09 16:48 ` Peng Tao 2012-08-09 19:31 ` Myklebust, Trond 2012-08-10 2:46 ` Peng Tao 2012-08-09 16:54 ` Idan Kedar 2012-08-09 19:12 ` Myklebust, Trond 2012-08-09 23:04 ` Idan Kedar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).