* SQUASHME: missing from FIXME: async layout return @ 2010-05-13 8:40 Boaz Harrosh 2010-05-13 14:07 ` William A. (Andy) Adamson 2010-05-13 14:19 ` William A. (Andy) Adamson 0 siblings, 2 replies; 12+ messages in thread From: Boaz Harrosh @ 2010-05-13 8:40 UTC (permalink / raw) To: Benny Halevy, NFS list I've tested the patch: FIXME: async layout return And there is a missing small hunk I have tested with this patch and it is a very good patch that should also go into 2.6.33. It is necessary in the rare case when one inode have more then one open_context. (For some reason I see that happening much more in 2.6.34 I don't understand why) Boaz --- git diff --stat -p -M fs/nfs/nfs4state.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 15c8bc8..6dbe893 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm struct nfs_inode *nfsi = NFS_I(state->inode); if (nfsi->layoutcommit_ctx) - pnfs_layoutcommit_inode(state->inode, 0); + pnfs_layoutcommit_inode(state->inode, wait); if (has_layout(nfsi) && nfsi->layout.roc_iomode) { struct nfs4_pnfs_layout_segment range; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: SQUASHME: missing from FIXME: async layout return 2010-05-13 8:40 SQUASHME: missing from FIXME: async layout return Boaz Harrosh @ 2010-05-13 14:07 ` William A. (Andy) Adamson [not found] ` <AANLkTilE4ao7UadJD58EzbExdfwE4zU5DF-6Nit29Yjq-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2010-05-13 14:19 ` William A. (Andy) Adamson 1 sibling, 1 reply; 12+ messages in thread From: William A. (Andy) Adamson @ 2010-05-13 14:07 UTC (permalink / raw) To: Boaz Harrosh; +Cc: Benny Halevy, NFS list Hi Boaz I've been chasing this bug. Where is the rest of the FIXME: async layout return patch? I can't find it. Thanks -->Andy On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.com> wr= ote: > I've tested the patch: > =A0 =A0 =A0 =A0FIXME: async layout return > > And there is a missing small hunk > > I have tested with this patch and it is a very good patch > that should also go into 2.6.33. It is necessary in the rare > case when one inode have more then one open_context. > > (For some reason I see that happening much more in 2.6.34 > =A0I don't understand why) > > Boaz > --- > git diff --stat -p -M > =A0fs/nfs/nfs4state.c | =A0 =A02 +- > =A01 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > index 15c8bc8..6dbe893 100644 > --- a/fs/nfs/nfs4state.c > +++ b/fs/nfs/nfs4state.c > @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struc= t nfs4_state *state, fmode_t fm > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I(state= ->inode); > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_inode= (state->inode, 0); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_inode= (state->inode, wait); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->layout.r= oc_iomode) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs_layou= t_segment range; > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" = in > the body of a message to majordomo@vger.kernel.org > More majordomo info at =A0http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <AANLkTilE4ao7UadJD58EzbExdfwE4zU5DF-6Nit29Yjq-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: SQUASHME: missing from FIXME: async layout return [not found] ` <AANLkTilE4ao7UadJD58EzbExdfwE4zU5DF-6Nit29Yjq-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2010-05-13 14:39 ` Benny Halevy 0 siblings, 0 replies; 12+ messages in thread From: Benny Halevy @ 2010-05-13 14:39 UTC (permalink / raw) To: William A. (Andy) Adamson; +Cc: Boaz Harrosh, NFS list On May. 13, 2010, 17:07 +0300, "William A. (Andy) Adamson" <androsadamson@gmail.com> wrote: > Hi Boaz > > I've been chasing this bug. Where is the rest of the FIXME: async > layout return patch? > I can't find it. Sorry, it's not released yet. I wanted to do more testing with it to make sure it does the right thing. Here it is: >From 8052205dab7c8493bbd5837997ba79d95c8151cc Mon Sep 17 00:00:00 2001 From: Benny Halevy <bhalevy@panasas.com> Date: Wed, 14 Apr 2010 19:05:13 +0300 Subject: [PATCH] FIXME: async layout return FIXME: currently there's only one path that requires async layout return to avoid blocking of the rpciod like the stack below. We really neee the pnfs state machine instead. rpciod/0 D 0000003d8c2d83b0 0 1029 2 0x00000000 778ccdd0 60264f00 77a6a000 778ccb20 77a6ba20 6001375b 77a6ba20 77a6a000 77a6a000 778cc8a0 77a6ba70 601b192f 77a6ba50 601b35a6 6ed537f0 77a6a000 77a6bb00 63506fd8 7b1870c8 77a6bb10 77a6ba90 7b187100 63506fd8 77a6a000 Call Trace: 77a6b9f8: [<6001375b>] _switch_to+0x5e/0xae 77a6ba28: [<601b192f>] schedule+0x1dd/0x21b 77a6ba38: [<601b35a6>] _raw_spin_unlock_irqrestore+0x18/0x1c 77a6ba60: [<7b1870c8>] rpc_wait_bit_killable+0x0/0x3c [sunrpc] 77a6ba78: [<7b187100>] rpc_wait_bit_killable+0x38/0x3c [sunrpc] 77a6ba98: [<601b1d9f>] __wait_on_bit+0x43/0x76 77a6bae8: [<601b1e43>] out_of_line_wait_on_bit+0x71/0x7c 77a6baf8: [<7b1870c8>] rpc_wait_bit_killable+0x0/0x3c [sunrpc] 77a6bb20: [<600430d6>] wake_bit_function+0x0/0x2e 77a6bb68: [<7b18709d>] __rpc_wait_for_completion_task+0x34/0x36 [sunrpc] 77a6bb78: [<7bf36048>] nfs4_wait_for_completion_rpc_task+0xb/0xd [nfs] 77a6bb88: [<7bf369ac>] pnfs4_proc_layoutreturn+0xae/0xe6 [nfs] 77a6bbc8: [<6007c8b1>] kmem_cache_alloc+0xaf/0xbe 77a6bc08: [<7bf4dc39>] _pnfs_return_layout+0x450/0x4d4 [nfs] 77a6bc38: [<7bf3cb79>] decode_attr_time+0x17/0x4d [nfs] 77a6bc58: [<7bf429d7>] decode_getfattr+0xaae/0xc02 [nfs] 77a6bcb8: [<7bf4601b>] __nfs4_close+0x15d/0x17a [nfs] 77a6bd28: [<7bf46053>] nfs4_close_state+0xb/0xd [nfs] 77a6bd38: [<7bf34183>] nfs4_close_context+0x26/0x28 [nfs] 77a6bd48: [<7bf2315c>] __put_nfs_open_context+0x79/0xa1 [nfs] 77a6bd78: [<7bf23227>] put_nfs_open_context+0xb/0xd [nfs] 77a6bd88: [<7bf4b763>] pnfs_layoutcommit_done+0xa5/0xad [nfs] 77a6bdb8: [<7bf3b12e>] pnfs_layoutcommit_rpc_done+0x39/0x56 [nfs] 77a6bde8: [<7b18712b>] rpc_exit_task+0x27/0x52 [sunrpc] 77a6be08: [<7b187817>] __rpc_execute+0x88/0x23a [sunrpc] 77a6be30: [<7b1879ee>] rpc_async_schedule+0x0/0x12 [sunrpc] 77a6be48: [<7b1879fe>] rpc_async_schedule+0x10/0x12 [sunrpc] 77a6be58: [<6003fc45>] worker_thread+0x114/0x1a5 77a6be80: [<600430a2>] autoremove_wake_function+0x0/0x34 77a6beb8: [<6003fb31>] worker_thread+0x0/0x1a5 77a6bed8: [<60042cf7>] kthread+0x8e/0x96 77a6bf48: [<60021429>] run_kernel_thread+0x41/0x4a 77a6bf58: [<60042c69>] kthread+0x0/0x96 77a6bf98: [<60021410>] run_kernel_thread+0x28/0x4a 77a6bfc8: [<600136d3>] new_thread_handler+0x71/0x9b Signed-off-by: Benny Halevy <bhalevy@panasas.com> --- fs/nfs/callback_proc.c | 7 ++++--- fs/nfs/inode.c | 2 +- fs/nfs/nfs4proc.c | 17 ++++++++++------- fs/nfs/nfs4state.c | 2 +- fs/nfs/pnfs.c | 21 +++++++++++++-------- fs/nfs/pnfs.h | 9 +++++---- 6 files changed, 34 insertions(+), 24 deletions(-) diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c index ebf86df..33ef5c0 100644 --- a/fs/nfs/callback_proc.c +++ b/fs/nfs/callback_proc.c @@ -214,7 +214,7 @@ static int pnfs_recall_layout(void *data) if (rl.cbl_recall_type == RETURN_FILE) { status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid, - RETURN_FILE); + RETURN_FILE, true); if (status) dprintk("%s RETURN_FILE error: %d\n", __func__, status); goto out; @@ -226,12 +226,13 @@ static int pnfs_recall_layout(void *data) /* FIXME: This loop is inefficient, running in O(|s_inodes|^2) */ while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) { /* XXX need to check status on pnfs_return_layout */ - pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE); + pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE, true); iput(ino); } /* send final layoutreturn */ - status = pnfs_return_layout(inode, &rl.cbl_seg, NULL, rl.cbl_recall_type); + status = pnfs_return_layout(inode, &rl.cbl_seg, NULL, + rl.cbl_recall_type, true); if (status) printk(KERN_INFO "%s: ignoring pnfs_return_layout status=%d\n", __func__, status); diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index f0b8676..42fac75 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -1320,7 +1320,7 @@ void nfs4_clear_inode(struct inode *inode) /* First call standard NFS clear_inode() code */ nfs_clear_inode(inode); #ifdef CONFIG_NFS_V4_1 - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE); + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, true); #endif /* CONFIG_NFS_V4_1 */ } #endif diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 82cd2ea..c01ecd7 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -1081,7 +1081,8 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state) /* FIXME: send gratuitous layout commits and return with the reclaim * flag during grace period */ - pnfs_return_layout(state->inode, NULL, &state->open_stateid, RETURN_FILE); + pnfs_return_layout(state->inode, NULL, &state->open_stateid, + RETURN_FILE, true); pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid); #endif /* CONFIG_NFS_V4_1 */ } @@ -2375,7 +2376,7 @@ pnfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr, if (pnfs_enabled_sb(server) && has_layout(nfsi) && pnfs_ld_layoutret_on_setattr(server->pnfs_curr_ld)) - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE); + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, true); return nfs4_proc_setattr(dentry, fattr, sattr); } #endif /* CONFIG_NFS_V4_1 */ @@ -5736,7 +5737,7 @@ static const struct rpc_call_ops nfs4_pnfs_layoutreturn_call_ops = { .rpc_release = nfs4_pnfs_layoutreturn_release, }; -int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp) +int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait) { struct inode *ino = lrp->args.inode; struct nfs_server *server = NFS_SERVER(ino); @@ -5753,7 +5754,7 @@ int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp) .callback_data = lrp, .flags = RPC_TASK_ASYNC, }; - int status; + int status = 0; dprintk("--> %s\n", __func__); lrp->res.seq_res.sr_slotid = NFS4_MAX_SLOT_TABLE; @@ -5762,9 +5763,11 @@ int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp) status = PTR_ERR(task); goto out; } - status = nfs4_wait_for_completion_rpc_task(task); - if (status == 0) - status = task->tk_status; + if (wait) { + status = nfs4_wait_for_completion_rpc_task(task); + if (status == 0) + status = task->tk_status; + } rpc_put_task(task); out: dprintk("<-- %s\n", __func__); diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 15c8bc8..cfaa1be 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -598,7 +598,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm range.offset = 0; range.length = NFS4_MAX_UINT64; pnfs_return_layout(state->inode, &range, NULL, - RETURN_FILE); + RETURN_FILE, wait); } #endif /* CONFIG_NFS_V4_1 */ nfs4_do_close(path, state, wait); diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 3739c38..06f20b9 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -675,7 +675,8 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi, static int return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range, const nfs4_stateid *stateid, /* optional */ - enum pnfs_layoutreturn_type type, struct pnfs_layout_type *lo) + enum pnfs_layoutreturn_type type, struct pnfs_layout_type *lo, + bool wait) { struct nfs4_pnfs_layoutreturn *lrp; struct nfs_server *server = NFS_SERVER(ino); @@ -700,7 +701,7 @@ return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range, else if (lo) pnfs_get_layout_stateid(&lrp->args.stateid, lo); - status = pnfs4_proc_layoutreturn(lrp); + status = pnfs4_proc_layoutreturn(lrp, wait); out: dprintk("<-- %s status: %d\n", __func__, status); return status; @@ -709,7 +710,8 @@ out: int _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range, const nfs4_stateid *stateid, /* optional */ - enum pnfs_layoutreturn_type type) + enum pnfs_layoutreturn_type type, + bool wait) { struct pnfs_layout_type *lo = NULL; struct nfs_inode *nfsi = NFS_I(ino); @@ -727,7 +729,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range, } if (type == RETURN_FILE) { if (nfsi->layoutcommit_ctx) { - status = pnfs_layoutcommit_inode(ino, 1); + status = pnfs_layoutcommit_inode(ino, wait); if (status) { dprintk("%s: layoutcommit failed, status=%d. " "Returning layout anyway\n", @@ -765,7 +767,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range, } } send_return: - status = return_layout(ino, &arg, stateid, type, lo); + status = return_layout(ino, &arg, stateid, type, lo, wait); out: dprintk("<-- %s status: %d\n", __func__, status); return status; @@ -1680,7 +1682,8 @@ pnfs_writeback_done(struct nfs_write_data *data) .length = data->args.count, }; dprintk("%s: retrying\n", __func__); - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE); + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, + true); pnfs_initiate_write(data, NFS_CLIENT(data->inode), pdata->call_ops, pdata->how); } @@ -1811,7 +1814,8 @@ pnfs_read_done(struct nfs_read_data *data) .length = data->args.count, }; dprintk("%s: retrying\n", __func__); - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE); + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, + true); pnfs_initiate_read(data, NFS_CLIENT(data->inode), pdata->call_ops); } @@ -2037,7 +2041,8 @@ pnfs_commit_done(struct nfs_write_data *data) .length = data->args.count, }; dprintk("%s: retrying\n", __func__); - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE); + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, + true); pnfs_initiate_commit(data, NFS_CLIENT(data->inode), pdata->call_ops, pdata->how); } diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h index 47160c5..524a7cd 100644 --- a/fs/nfs/pnfs.h +++ b/fs/nfs/pnfs.h @@ -29,7 +29,7 @@ extern int nfs4_pnfs_getdeviceinfo(struct nfs_server *server, struct pnfs_device *dev); extern int pnfs4_proc_layoutget(struct nfs4_pnfs_layoutget *lgp); extern int pnfs4_proc_layoutcommit(struct pnfs_layoutcommit_data *data); -extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp); +extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait); /* pnfs.c */ extern const nfs4_stateid zero_stateid; @@ -40,7 +40,7 @@ int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx, int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *, const nfs4_stateid *stateid, /* optional */ - enum pnfs_layoutreturn_type); + enum pnfs_layoutreturn_type, bool wait); void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *mntfh, u32 id); void unmount_pnfs_layoutdriver(struct nfs_server *); int pnfs_use_read(struct inode *inode, ssize_t count); @@ -247,14 +247,15 @@ static inline void pnfs_modify_new_request(struct nfs_page *req, static inline int pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *lseg, const nfs4_stateid *stateid, /* optional */ - enum pnfs_layoutreturn_type type) + enum pnfs_layoutreturn_type type, + bool wait) { struct nfs_inode *nfsi = NFS_I(ino); struct nfs_server *nfss = NFS_SERVER(ino); if (pnfs_enabled_sb(nfss) && (type != RETURN_FILE || has_layout(nfsi))) - return _pnfs_return_layout(ino, lseg, stateid, type); + return _pnfs_return_layout(ino, lseg, stateid, type, wait); return 0; } -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: SQUASHME: missing from FIXME: async layout return 2010-05-13 8:40 SQUASHME: missing from FIXME: async layout return Boaz Harrosh 2010-05-13 14:07 ` William A. (Andy) Adamson @ 2010-05-13 14:19 ` William A. (Andy) Adamson [not found] ` <AANLkTind3eXmuPw5HrKqB6bHJSSXFL3g2cDSEyyYc1n8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 12+ messages in thread From: William A. (Andy) Adamson @ 2010-05-13 14:19 UTC (permalink / raw) To: Boaz Harrosh; +Cc: Benny Halevy, NFS list On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.com> wr= ote: > I've tested the patch: > =A0 =A0 =A0 =A0FIXME: async layout return > > And there is a missing small hunk > > I have tested with this patch and it is a very good patch > that should also go into 2.6.33. It is necessary in the rare > case when one inode have more then one open_context. Do you mean more than one open context per open owner? -->Andy > > (For some reason I see that happening much more in 2.6.34 > =A0I don't understand why) > > Boaz > --- > git diff --stat -p -M > =A0fs/nfs/nfs4state.c | =A0 =A02 +- > =A01 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > index 15c8bc8..6dbe893 100644 > --- a/fs/nfs/nfs4state.c > +++ b/fs/nfs/nfs4state.c > @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struc= t nfs4_state *state, fmode_t fm > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I(state= ->inode); > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx) > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_inode= (state->inode, 0); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_inode= (state->inode, wait); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->layout.r= oc_iomode) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs_layou= t_segment range; > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" = in > the body of a message to majordomo@vger.kernel.org > More majordomo info at =A0http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <AANLkTind3eXmuPw5HrKqB6bHJSSXFL3g2cDSEyyYc1n8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: SQUASHME: missing from FIXME: async layout return [not found] ` <AANLkTind3eXmuPw5HrKqB6bHJSSXFL3g2cDSEyyYc1n8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2010-05-13 14:40 ` Benny Halevy 2010-05-13 15:17 ` William A. (Andy) Adamson 0 siblings, 1 reply; 12+ messages in thread From: Benny Halevy @ 2010-05-13 14:40 UTC (permalink / raw) To: William A. (Andy) Adamson; +Cc: Boaz Harrosh, NFS list On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <androsadamson@gmail.com> wrote: > On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: >> I've tested the patch: >> FIXME: async layout return >> >> And there is a missing small hunk >> >> I have tested with this patch and it is a very good patch >> that should also go into 2.6.33. It is necessary in the rare >> case when one inode have more then one open_context. > > Do you mean more than one open context per open owner? What we see is one "regular" open context and one which is the layout_commit_ctx Benny > > -->Andy > >> >> (For some reason I see that happening much more in 2.6.34 >> I don't understand why) >> >> Boaz >> --- >> git diff --stat -p -M >> fs/nfs/nfs4state.c | 2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c >> index 15c8bc8..6dbe893 100644 >> --- a/fs/nfs/nfs4state.c >> +++ b/fs/nfs/nfs4state.c >> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm >> struct nfs_inode *nfsi = NFS_I(state->inode); >> >> if (nfsi->layoutcommit_ctx) >> - pnfs_layoutcommit_inode(state->inode, 0); >> + pnfs_layoutcommit_inode(state->inode, wait); >> if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >> struct nfs4_pnfs_layout_segment range; >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SQUASHME: missing from FIXME: async layout return 2010-05-13 14:40 ` Benny Halevy @ 2010-05-13 15:17 ` William A. (Andy) Adamson [not found] ` <AANLkTilhbkde-kAPGSBHT1b8vIZq4cM1SPpI-Rd0iOb7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 12+ messages in thread From: William A. (Andy) Adamson @ 2010-05-13 15:17 UTC (permalink / raw) To: Benny Halevy; +Cc: Boaz Harrosh, NFS list On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy@panasas.com> wr= ote: > On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <androsada= mson@gmail.com> wrote: >> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.com>= wrote: >>> I've tested the patch: >>> =A0 =A0 =A0 =A0FIXME: async layout return >>> >>> And there is a missing small hunk >>> >>> I have tested with this patch and it is a very good patch >>> that should also go into 2.6.33. It is necessary in the rare >>> case when one inode have more then one open_context. >> >> Do you mean more than one open context per open owner? > > What we see is one "regular" open context and one which is the layout= _commit_ctx Isn't that a BUG? Here is what we're seeing: nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_o= pen_context-> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_la= youtcommit_inode, This is the same code that Boaz's 'missing small hunk' adds the wait to the pnfs_layout_commit_inode to. This is good, because when __nfs4_close is called with sync, every thing must be sent/returned prior to the nfs_do_close call. But there is still a problem. Here is the __nfs4_close call (with out the wait that Boaz added) if (nfsi->layoutcommit_ctx) pnfs_layoutcommit_inode(state->inode, 0); if (has_layout(nfsi) && nfsi->layout.roc_iomode) { struct nfs4_pnfs_layout_segment range; range.iomode =3D nfsi->layout.roc_iomode; range.offset =3D 0; range.length =3D NFS4_MAX_UINT64; pnfs_return_layout(state->inode, &range, NULL, RETURN_FILE); Note that a pnfs_return_layout which is always 'sync' (async with wait for completion). So the (currently) async LAYOUTCOMMIT call returns, and a LAYOUTRETURN is put on the wire Then the LAYOUTCOMMIT rpc_call_done routine calls pnfs_layoutcommit_done which calls put_nfs_open_context. If the open context is different from the open context that was put by nfs_file_release, then pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close and the return on close LAYOUTRETURN is sent again! Of couse this second LAYOUTRETURN either gets a zero stateid, or uses the same stateid as the first LAYOUTRETURN, and the reply to the second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID ..... This will occur whether the LAYOUTCOMMIT is async or sync as they both call pnfs_layoutcommit_done. I need to understand why there are two open contexts. On the face of it, it seems wrong. We also add a pointer to the open context in the nfs_write_data, and in the pnfs_layoutcommit_data. Do we take a reference on the open data in theses cases? . -->Andy > > Benny > >> >> -->Andy >> >>> >>> (For some reason I see that happening much more in 2.6.34 >>> =A0I don't understand why) >>> >>> Boaz >>> --- >>> git diff --stat -p -M >>> =A0fs/nfs/nfs4state.c | =A0 =A02 +- >>> =A01 files changed, 1 insertions(+), 1 deletions(-) >>> >>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c >>> index 15c8bc8..6dbe893 100644 >>> --- a/fs/nfs/nfs4state.c >>> +++ b/fs/nfs/nfs4state.c >>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, str= uct nfs4_state *state, fmode_t fm >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I(sta= te->inode); >>> >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx) >>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_ino= de(state->inode, 0); >>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_ino= de(state->inode, wait); >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->layout= =2Eroc_iomode) { >>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs_lay= out_segment range; >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs= " in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.htm= l >>> > ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <AANLkTilhbkde-kAPGSBHT1b8vIZq4cM1SPpI-Rd0iOb7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: SQUASHME: missing from FIXME: async layout return [not found] ` <AANLkTilhbkde-kAPGSBHT1b8vIZq4cM1SPpI-Rd0iOb7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2010-05-13 15:31 ` Boaz Harrosh 2010-05-13 16:04 ` William A. (Andy) Adamson 0 siblings, 1 reply; 12+ messages in thread From: Boaz Harrosh @ 2010-05-13 15:31 UTC (permalink / raw) To: William A. (Andy) Adamson; +Cc: Benny Halevy, NFS list On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote: > On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy@panasas.com> wrote: >> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <androsadamson@gmail.com> wrote: >>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: >>>> I've tested the patch: >>>> FIXME: async layout return >>>> >>>> And there is a missing small hunk >>>> >>>> I have tested with this patch and it is a very good patch >>>> that should also go into 2.6.33. It is necessary in the rare >>>> case when one inode have more then one open_context. >>> >>> Do you mean more than one open context per open owner? >> >> What we see is one "regular" open context and one which is the layout_commit_ctx > > Isn't that a BUG? > > Here is what we're seeing: > > nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context-> > NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode, > > This is the same code that Boaz's 'missing small hunk' adds the wait > to the pnfs_layout_commit_inode to. This is good, because when > __nfs4_close is called with sync, every thing must be sent/returned > prior to the nfs_do_close call. > > But there is still a problem. Here is the __nfs4_close call (with out > the wait that Boaz added) > > if (nfsi->layoutcommit_ctx) > pnfs_layoutcommit_inode(state->inode, 0); > if (has_layout(nfsi) && nfsi->layout.roc_iomode) { > struct nfs4_pnfs_layout_segment range; > > range.iomode = nfsi->layout.roc_iomode; > range.offset = 0; > range.length = NFS4_MAX_UINT64; > pnfs_return_layout(state->inode, &range, NULL, > RETURN_FILE); > > Note that a pnfs_return_layout which is always 'sync' (async with wait > for completion). So the (currently) async LAYOUTCOMMIT call returns, > and a LAYOUTRETURN is put on the wire > > Then the LAYOUTCOMMIT rpc_call_done routine calls > pnfs_layoutcommit_done which calls put_nfs_open_context. If the open > context is different from the open context that was put by > nfs_file_release, then > > pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close > and the return on close LAYOUTRETURN is sent again! > > Of couse this second LAYOUTRETURN either gets a zero stateid, or uses > the same stateid as the first LAYOUTRETURN, and the reply to the > second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID ..... > > This will occur whether the LAYOUTCOMMIT is async or sync as they both > call pnfs_layoutcommit_done. > > I need to understand why there are two open contexts. On the face of > it, it seems wrong. > > We also add a pointer to the open context in the nfs_write_data, and > in the pnfs_layoutcommit_data. Do we take a reference on the open data > in theses cases? . > > -->Andy > > Yes on all acounts. This is what Benny's patch is fixing please try it out it is most important (Add my small fix) And about the multiple contexts I don't understand as well but the fact of it is that an nfs_inode has an list_head of open_contexts and we must deal properly with when that happens. (BTW with 2.6.34 Kernel it happens regularly as in prev kernels it almost never trigger.) Benny's patch takes care of that and I've done heavy testing of it I can see cases of more then one contexts and it works correctly. But I too would like to understand when is it that an inode can have more then one open_context Boaz > >> >> Benny >> >>> >>> -->Andy >>> >>>> >>>> (For some reason I see that happening much more in 2.6.34 >>>> I don't understand why) >>>> >>>> Boaz >>>> --- >>>> git diff --stat -p -M >>>> fs/nfs/nfs4state.c | 2 +- >>>> 1 files changed, 1 insertions(+), 1 deletions(-) >>>> >>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c >>>> index 15c8bc8..6dbe893 100644 >>>> --- a/fs/nfs/nfs4state.c >>>> +++ b/fs/nfs/nfs4state.c >>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm >>>> struct nfs_inode *nfsi = NFS_I(state->inode); >>>> >>>> if (nfsi->layoutcommit_ctx) >>>> - pnfs_layoutcommit_inode(state->inode, 0); >>>> + pnfs_layoutcommit_inode(state->inode, wait); >>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >>>> struct nfs4_pnfs_layout_segment range; >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SQUASHME: missing from FIXME: async layout return 2010-05-13 15:31 ` Boaz Harrosh @ 2010-05-13 16:04 ` William A. (Andy) Adamson 2010-05-13 16:20 ` Benny Halevy 2010-05-13 16:23 ` Boaz Harrosh 0 siblings, 2 replies; 12+ messages in thread From: William A. (Andy) Adamson @ 2010-05-13 16:04 UTC (permalink / raw) To: Boaz Harrosh; +Cc: Benny Halevy, NFS list On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: > On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote: >> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy@panasas.com> wrote: >>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <androsadamson@gmail.com> wrote: >>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: >>>>> I've tested the patch: >>>>> FIXME: async layout return >>>>> >>>>> And there is a missing small hunk >>>>> >>>>> I have tested with this patch and it is a very good patch >>>>> that should also go into 2.6.33. It is necessary in the rare >>>>> case when one inode have more then one open_context. >>>> >>>> Do you mean more than one open context per open owner? >>> >>> What we see is one "regular" open context and one which is the layout_commit_ctx >> >> Isn't that a BUG? >> >> Here is what we're seeing: >> >> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context-> >> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode, >> >> This is the same code that Boaz's 'missing small hunk' adds the wait >> to the pnfs_layout_commit_inode to. This is good, because when >> __nfs4_close is called with sync, every thing must be sent/returned >> prior to the nfs_do_close call. >> >> But there is still a problem. Here is the __nfs4_close call (with out >> the wait that Boaz added) >> >> if (nfsi->layoutcommit_ctx) >> pnfs_layoutcommit_inode(state->inode, 0); >> if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >> struct nfs4_pnfs_layout_segment range; >> >> range.iomode = nfsi->layout.roc_iomode; >> range.offset = 0; >> range.length = NFS4_MAX_UINT64; >> pnfs_return_layout(state->inode, &range, NULL, >> RETURN_FILE); >> >> Note that a pnfs_return_layout which is always 'sync' (async with wait >> for completion). So the (currently) async LAYOUTCOMMIT call returns, >> and a LAYOUTRETURN is put on the wire >> >> Then the LAYOUTCOMMIT rpc_call_done routine calls >> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open >> context is different from the open context that was put by >> nfs_file_release, then >> >> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close >> and the return on close LAYOUTRETURN is sent again! >> >> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses >> the same stateid as the first LAYOUTRETURN, and the reply to the >> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID ..... >> >> This will occur whether the LAYOUTCOMMIT is async or sync as they both >> call pnfs_layoutcommit_done. >> >> I need to understand why there are two open contexts. On the face of >> it, it seems wrong. >> >> We also add a pointer to the open context in the nfs_write_data, and >> in the pnfs_layoutcommit_data. Do we take a reference on the open data >> in theses cases? . >> >> -->Andy >> >> > > Yes on all acounts. This is what Benny's patch is fixing please try it out > it is most important (Add my small fix) Is this what you mean? With Benny's patch, with return-on-close set and the layoutcommit_ctx different from the open context that caused the __nfs4_close: The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release __nfs4_close() will complete prior to calling the return-on-close LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called, put_nfs_open_context will be call on the layoutconmit_ctx (saved in pnfs_layoutcommit_data->ctx) and __nfs4_close will be called. In _this_ instance of calling __nfs4_close (from pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null so the pnfs_layoutcommit_inode call is skipped, and pnfs_return_layout is called. Since it is also a sync call, pnfs_layoutcommit_done does not return until it is complete. So the layout is returned, the layout is freed (all layout segments 'cause the range covers the whole file) and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from nfs_file_release) Now since return-on-close is set, pnfs_layout_return is called (in the nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go on the wire because the first pnfs_layout_return (from the pnfs_layoutcommit_done __nfs4_close) has removed the layout segment, and all is well. Sheese. --->Andy > > And about the multiple contexts I don't understand as well but the fact of it > is that an nfs_inode has an list_head of open_contexts and we must deal properly > with when that happens. (BTW with 2.6.34 Kernel it happens regularly as in > prev kernels it almost never trigger.) > > Benny's patch takes care of that and I've done heavy testing of it I can see cases > of more then one contexts and it works correctly. > > But I too would like to understand when is it that an inode can have more then > one open_context > > Boaz > >> >>> >>> Benny >>> >>>> >>>> -->Andy >>>> >>>>> >>>>> (For some reason I see that happening much more in 2.6.34 >>>>> I don't understand why) >>>>> >>>>> Boaz >>>>> --- >>>>> git diff --stat -p -M >>>>> fs/nfs/nfs4state.c | 2 +- >>>>> 1 files changed, 1 insertions(+), 1 deletions(-) >>>>> >>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c >>>>> index 15c8bc8..6dbe893 100644 >>>>> --- a/fs/nfs/nfs4state.c >>>>> +++ b/fs/nfs/nfs4state.c >>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm >>>>> struct nfs_inode *nfsi = NFS_I(state->inode); >>>>> >>>>> if (nfsi->layoutcommit_ctx) >>>>> - pnfs_layoutcommit_inode(state->inode, 0); >>>>> + pnfs_layoutcommit_inode(state->inode, wait); >>>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >>>>> struct nfs4_pnfs_layout_segment range; >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SQUASHME: missing from FIXME: async layout return 2010-05-13 16:04 ` William A. (Andy) Adamson @ 2010-05-13 16:20 ` Benny Halevy 2010-05-14 3:28 ` Zhang Jingwang 2010-05-13 16:23 ` Boaz Harrosh 1 sibling, 1 reply; 12+ messages in thread From: Benny Halevy @ 2010-05-13 16:20 UTC (permalink / raw) To: William A. (Andy) Adamson; +Cc: Boaz Harrosh, NFS list On May. 13, 2010, 19:04 +0300, "William A. (Andy) Adamson" <androsadamson@gmail.com> wrote: > On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: >> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote: >>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy@panasas.com> wrote: >>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <androsadamson@gmail.com> wrote: >>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: >>>>>> I've tested the patch: >>>>>> FIXME: async layout return >>>>>> >>>>>> And there is a missing small hunk >>>>>> >>>>>> I have tested with this patch and it is a very good patch >>>>>> that should also go into 2.6.33. It is necessary in the rare >>>>>> case when one inode have more then one open_context. >>>>> >>>>> Do you mean more than one open context per open owner? >>>> >>>> What we see is one "regular" open context and one which is the layout_commit_ctx >>> >>> Isn't that a BUG? >>> >>> Here is what we're seeing: >>> >>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context-> >>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode, >>> >>> This is the same code that Boaz's 'missing small hunk' adds the wait >>> to the pnfs_layout_commit_inode to. This is good, because when >>> __nfs4_close is called with sync, every thing must be sent/returned >>> prior to the nfs_do_close call. >>> >>> But there is still a problem. Here is the __nfs4_close call (with out >>> the wait that Boaz added) >>> >>> if (nfsi->layoutcommit_ctx) >>> pnfs_layoutcommit_inode(state->inode, 0); >>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >>> struct nfs4_pnfs_layout_segment range; >>> >>> range.iomode = nfsi->layout.roc_iomode; >>> range.offset = 0; >>> range.length = NFS4_MAX_UINT64; >>> pnfs_return_layout(state->inode, &range, NULL, >>> RETURN_FILE); >>> >>> Note that a pnfs_return_layout which is always 'sync' (async with wait >>> for completion). So the (currently) async LAYOUTCOMMIT call returns, >>> and a LAYOUTRETURN is put on the wire >>> >>> Then the LAYOUTCOMMIT rpc_call_done routine calls >>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open >>> context is different from the open context that was put by >>> nfs_file_release, then >>> >>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close >>> and the return on close LAYOUTRETURN is sent again! >>> >>> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses >>> the same stateid as the first LAYOUTRETURN, and the reply to the >>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID ..... >>> >>> This will occur whether the LAYOUTCOMMIT is async or sync as they both >>> call pnfs_layoutcommit_done. >>> >>> I need to understand why there are two open contexts. On the face of >>> it, it seems wrong. >>> >>> We also add a pointer to the open context in the nfs_write_data, and >>> in the pnfs_layoutcommit_data. Do we take a reference on the open data >>> in theses cases? . >>> >>> -->Andy >>> >>> >> >> Yes on all acounts. This is what Benny's patch is fixing please try it out >> it is most important (Add my small fix) > > Is this what you mean? > > With Benny's patch, with return-on-close set and the layoutcommit_ctx > different from the open context that caused the __nfs4_close: > > The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release > __nfs4_close() will complete prior to calling the return-on-close > LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called, > put_nfs_open_context will be call on the layoutconmit_ctx (saved in > pnfs_layoutcommit_data->ctx) and __nfs4_close will be called. > > In _this_ instance of calling __nfs4_close (from > pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null so > the pnfs_layoutcommit_inode call is skipped, and pnfs_return_layout > is called. Since it is also a sync call, pnfs_layoutcommit_done does > not return until it is complete. So the layout is returned, the layout > is freed (all layout segments 'cause the range covers the whole file) > and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from > nfs_file_release) > > Now since return-on-close is set, pnfs_layout_return is called (in the > nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go > on the wire because the first pnfs_layout_return (from the > pnfs_layoutcommit_done __nfs4_close) has removed the layout segment, > and all is well. > > Sheese. Yeah :-/ I'm feeling increasingly uncomfortable with the current implementation and we're planning to come up with the state-machine based model for the Bakeathon and test it there. Ideally, this will be stable enough that we can get the gist of it into pnfs-submit. Benny > > --->Andy > > >> >> And about the multiple contexts I don't understand as well but the fact of it >> is that an nfs_inode has an list_head of open_contexts and we must deal properly >> with when that happens. (BTW with 2.6.34 Kernel it happens regularly as in >> prev kernels it almost never trigger.) >> >> Benny's patch takes care of that and I've done heavy testing of it I can see cases >> of more then one contexts and it works correctly. >> >> But I too would like to understand when is it that an inode can have more then >> one open_context >> >> Boaz >> >>> >>>> >>>> Benny >>>> >>>>> >>>>> -->Andy >>>>> >>>>>> >>>>>> (For some reason I see that happening much more in 2.6.34 >>>>>> I don't understand why) >>>>>> >>>>>> Boaz >>>>>> --- >>>>>> git diff --stat -p -M >>>>>> fs/nfs/nfs4state.c | 2 +- >>>>>> 1 files changed, 1 insertions(+), 1 deletions(-) >>>>>> >>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c >>>>>> index 15c8bc8..6dbe893 100644 >>>>>> --- a/fs/nfs/nfs4state.c >>>>>> +++ b/fs/nfs/nfs4state.c >>>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm >>>>>> struct nfs_inode *nfsi = NFS_I(state->inode); >>>>>> >>>>>> if (nfsi->layoutcommit_ctx) >>>>>> - pnfs_layoutcommit_inode(state->inode, 0); >>>>>> + pnfs_layoutcommit_inode(state->inode, wait); >>>>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >>>>>> struct nfs4_pnfs_layout_segment range; >>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SQUASHME: missing from FIXME: async layout return 2010-05-13 16:20 ` Benny Halevy @ 2010-05-14 3:28 ` Zhang Jingwang 2010-05-17 6:02 ` Benny Halevy 0 siblings, 1 reply; 12+ messages in thread From: Zhang Jingwang @ 2010-05-14 3:28 UTC (permalink / raw) To: Benny Halevy; +Cc: NFS list 2010/5/14 Benny Halevy <bhalevy@panasas.com>: > On May. 13, 2010, 19:04 +0300, "William A. (Andy) Adamson" <androsada= mson@gmail.com> wrote: >> On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <bharrosh@panasas.com= > wrote: >>> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote: >>>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy-C4P08NqkoRmIwRZHo2/mJg@public.gmane.org= m> wrote: >>>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <andro= sadamson@gmail.com> wrote: >>>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.= com> wrote: >>>>>>> I've tested the patch: >>>>>>> =A0 =A0 =A0 =A0FIXME: async layout return >>>>>>> >>>>>>> And there is a missing small hunk >>>>>>> >>>>>>> I have tested with this patch and it is a very good patch >>>>>>> that should also go into 2.6.33. It is necessary in the rare >>>>>>> case when one inode have more then one open_context. >>>>>> >>>>>> Do you mean more than one open context per open owner? >>>>> >>>>> What we see is one "regular" open context and one which is the la= yout_commit_ctx >>>> >>>> Isn't that a BUG? >>>> >>>> Here is what we're seeing: >>>> >>>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_= nfs_open_context-> >>>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pn= fs_layoutcommit_inode, >>>> >>>> This is the same code that Boaz's 'missing small hunk' adds the wa= it >>>> to the pnfs_layout_commit_inode to. This is good, because when >>>> __nfs4_close is called with sync, every thing must be sent/returne= d >>>> prior to the nfs_do_close call. >>>> >>>> But there is still a problem. Here is the __nfs4_close call (with = out >>>> the wait that Boaz added) >>>> >>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (nfsi->layoutcommit_ctx) >>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit_= inode(state->inode, 0); >>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (has_layout(nfsi) && nfsi->layo= ut.roc_iomode) { >>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct nfs4_pnfs_l= ayout_segment range; >>>> >>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.iomode =3D n= fsi->layout.roc_iomode; >>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.offset =3D 0= ; >>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 range.length =3D N= =46S4_MAX_UINT64; >>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_return_layout= (state->inode, &range, NULL, >>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0RETURN_FILE); >>>> >>>> Note that a pnfs_return_layout which is always 'sync' (async with = wait >>>> for completion). So the (currently) async LAYOUTCOMMIT call return= s, >>>> and a LAYOUTRETURN is put on the wire >>>> >>>> Then the LAYOUTCOMMIT rpc_call_done routine calls >>>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the op= en >>>> context is different from the open context that was put by >>>> nfs_file_release, then >>>> >>>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_clos= e >>>> and the return on close LAYOUTRETURN is sent again! >>>> >>>> Of couse this second LAYOUTRETURN either gets a zero stateid, or u= ses >>>> the same stateid as the first LAYOUTRETURN, and the reply to the >>>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID ..... >>>> >>>> This will occur whether the LAYOUTCOMMIT is async or sync as they = both >>>> call pnfs_layoutcommit_done. >>>> >>>> I need to understand why there are two open contexts. On the face = of >>>> it, it seems wrong. >>>> >>>> We also add a pointer to the open context in the nfs_write_data, a= nd >>>> in the pnfs_layoutcommit_data. Do we take a reference on the open = data >>>> in theses cases? . >>>> >>>> -->Andy >>>> >>>> >>> >>> Yes on all acounts. This is what Benny's patch is fixing please try= it out >>> it is most important (Add my small fix) >> >> Is this what you mean? >> >> With Benny's patch, with return-on-close set and the layoutcommit_ct= x >> different from the open context that caused the __nfs4_close: >> >> The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release >> __nfs4_close() will complete prior to calling the return-on-close >> LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called, >> put_nfs_open_context will be call on the layoutconmit_ctx (saved in >> pnfs_layoutcommit_data->ctx) and __nfs4_close will be called. >> >> In _this_ instance of calling __nfs4_close (from >> pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null = so >> the pnfs_layoutcommit_inode call is skipped, and =A0pnfs_return_layo= ut >> is called. Since it is also a sync call, pnfs_layoutcommit_done does >> not return until it is complete. So the layout is returned, the layo= ut >> is freed (all layout segments 'cause the range covers the whole file= ) >> and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from >> nfs_file_release) >> >> Now since return-on-close is set, pnfs_layout_return is called (in t= he >> nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go >> on the wire because the first pnfs_layout_return (from the >> pnfs_layoutcommit_done __nfs4_close) has removed the layout segment, >> and all is well. >> >> Sheese. > > Yeah :-/ > I'm feeling increasingly uncomfortable with the current implementatio= n > and we're planning to come up with the state-machine based model > for the Bakeathon and test it there. =A0Ideally, this will be stable = enough > that we can get the gist of it into pnfs-submit. > > Benny > Is there any document about the state-machine based model? >> >> --->Andy >> >> >>> >>> And about the multiple contexts I don't understand as well but the = fact of it >>> is that an nfs_inode has an list_head of open_contexts and we must = deal properly >>> with when that happens. (BTW with 2.6.34 Kernel it happens regularl= y as in >>> prev kernels it almost never trigger.) >>> >>> Benny's patch takes care of that and I've done heavy testing of it = I can see cases >>> of more then one contexts and it works correctly. >>> >>> But I too would like to understand when is it that an inode can hav= e more then >>> one open_context >>> >>> Boaz >>> >>>> >>>>> >>>>> Benny >>>>> >>>>>> >>>>>> -->Andy >>>>>> >>>>>>> >>>>>>> (For some reason I see that happening much more in 2.6.34 >>>>>>> =A0I don't understand why) >>>>>>> >>>>>>> Boaz >>>>>>> --- >>>>>>> git diff --stat -p -M >>>>>>> =A0fs/nfs/nfs4state.c | =A0 =A02 +- >>>>>>> =A01 files changed, 1 insertions(+), 1 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c >>>>>>> index 15c8bc8..6dbe893 100644 >>>>>>> --- a/fs/nfs/nfs4state.c >>>>>>> +++ b/fs/nfs/nfs4state.c >>>>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path,= struct nfs4_state *state, fmode_t fm >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D NFS_I= (state->inode); >>>>>>> >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (nfsi->layoutcommit_ctx) >>>>>>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit= _inode(state->inode, 0); >>>>>>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 pnfs_layoutcommit= _inode(state->inode, wait); >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (has_layout(nfsi) && nfsi->la= yout.roc_iomode) { >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs4_pnfs= _layout_segment range; >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux= -nfs" in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info= =2Ehtml >>>>>>> >>>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-nf= s" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at =A0http://vger.kernel.org/majordomo-info.ht= ml >>>> >>> >>> > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" = in > the body of a message to majordomo@vger.kernel.org > More majordomo info at =A0http://vger.kernel.org/majordomo-info.html > --=20 Zhang Jingwang National Research Centre for High Performance Computers Institute of Computing Technology, Chinese Academy of Sciences No. 6, South Kexueyuan Road, Haidian District Beijing, China ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SQUASHME: missing from FIXME: async layout return 2010-05-14 3:28 ` Zhang Jingwang @ 2010-05-17 6:02 ` Benny Halevy 0 siblings, 0 replies; 12+ messages in thread From: Benny Halevy @ 2010-05-17 6:02 UTC (permalink / raw) To: Zhang Jingwang; +Cc: NFS list On May. 14, 2010, 6:28 +0300, Zhang Jingwang <yyalone@gmail.com> wrote: > 2010/5/14 Benny Halevy <bhalevy@panasas.com>: >> On May. 13, 2010, 19:04 +0300, "William A. (Andy) Adamson" <androsadamson@gmail.com> wrote: >>> On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: >>>> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote: >>>>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy@panasas.com> wrote: >>>>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <androsadamson@gmail.com> wrote: >>>>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: >>>>>>>> I've tested the patch: >>>>>>>> FIXME: async layout return >>>>>>>> >>>>>>>> And there is a missing small hunk >>>>>>>> >>>>>>>> I have tested with this patch and it is a very good patch >>>>>>>> that should also go into 2.6.33. It is necessary in the rare >>>>>>>> case when one inode have more then one open_context. >>>>>>> >>>>>>> Do you mean more than one open context per open owner? >>>>>> >>>>>> What we see is one "regular" open context and one which is the layout_commit_ctx >>>>> >>>>> Isn't that a BUG? >>>>> >>>>> Here is what we're seeing: >>>>> >>>>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context-> >>>>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode, >>>>> >>>>> This is the same code that Boaz's 'missing small hunk' adds the wait >>>>> to the pnfs_layout_commit_inode to. This is good, because when >>>>> __nfs4_close is called with sync, every thing must be sent/returned >>>>> prior to the nfs_do_close call. >>>>> >>>>> But there is still a problem. Here is the __nfs4_close call (with out >>>>> the wait that Boaz added) >>>>> >>>>> if (nfsi->layoutcommit_ctx) >>>>> pnfs_layoutcommit_inode(state->inode, 0); >>>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >>>>> struct nfs4_pnfs_layout_segment range; >>>>> >>>>> range.iomode = nfsi->layout.roc_iomode; >>>>> range.offset = 0; >>>>> range.length = NFS4_MAX_UINT64; >>>>> pnfs_return_layout(state->inode, &range, NULL, >>>>> RETURN_FILE); >>>>> >>>>> Note that a pnfs_return_layout which is always 'sync' (async with wait >>>>> for completion). So the (currently) async LAYOUTCOMMIT call returns, >>>>> and a LAYOUTRETURN is put on the wire >>>>> >>>>> Then the LAYOUTCOMMIT rpc_call_done routine calls >>>>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open >>>>> context is different from the open context that was put by >>>>> nfs_file_release, then >>>>> >>>>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close >>>>> and the return on close LAYOUTRETURN is sent again! >>>>> >>>>> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses >>>>> the same stateid as the first LAYOUTRETURN, and the reply to the >>>>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID ..... >>>>> >>>>> This will occur whether the LAYOUTCOMMIT is async or sync as they both >>>>> call pnfs_layoutcommit_done. >>>>> >>>>> I need to understand why there are two open contexts. On the face of >>>>> it, it seems wrong. >>>>> >>>>> We also add a pointer to the open context in the nfs_write_data, and >>>>> in the pnfs_layoutcommit_data. Do we take a reference on the open data >>>>> in theses cases? . >>>>> >>>>> -->Andy >>>>> >>>>> >>>> >>>> Yes on all acounts. This is what Benny's patch is fixing please try it out >>>> it is most important (Add my small fix) >>> >>> Is this what you mean? >>> >>> With Benny's patch, with return-on-close set and the layoutcommit_ctx >>> different from the open context that caused the __nfs4_close: >>> >>> The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release >>> __nfs4_close() will complete prior to calling the return-on-close >>> LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called, >>> put_nfs_open_context will be call on the layoutconmit_ctx (saved in >>> pnfs_layoutcommit_data->ctx) and __nfs4_close will be called. >>> >>> In _this_ instance of calling __nfs4_close (from >>> pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null so >>> the pnfs_layoutcommit_inode call is skipped, and pnfs_return_layout >>> is called. Since it is also a sync call, pnfs_layoutcommit_done does >>> not return until it is complete. So the layout is returned, the layout >>> is freed (all layout segments 'cause the range covers the whole file) >>> and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from >>> nfs_file_release) >>> >>> Now since return-on-close is set, pnfs_layout_return is called (in the >>> nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go >>> on the wire because the first pnfs_layout_return (from the >>> pnfs_layoutcommit_done __nfs4_close) has removed the layout segment, >>> and all is well. >>> >>> Sheese. >> >> Yeah :-/ >> I'm feeling increasingly uncomfortable with the current implementation >> and we're planning to come up with the state-machine based model >> for the Bakeathon and test it there. Ideally, this will be stable enough >> that we can get the gist of it into pnfs-submit. >> >> Benny >> > > Is there any document about the state-machine based model? Here: http://linux-nfs.org/pipermail/pnfs/attachments/20100304/f90bede4/attachment.txt >>> >>> --->Andy >>> >>> >>>> >>>> And about the multiple contexts I don't understand as well but the fact of it >>>> is that an nfs_inode has an list_head of open_contexts and we must deal properly >>>> with when that happens. (BTW with 2.6.34 Kernel it happens regularly as in >>>> prev kernels it almost never trigger.) >>>> >>>> Benny's patch takes care of that and I've done heavy testing of it I can see cases >>>> of more then one contexts and it works correctly. >>>> >>>> But I too would like to understand when is it that an inode can have more then >>>> one open_context >>>> >>>> Boaz >>>> >>>>> >>>>>> >>>>>> Benny >>>>>> >>>>>>> >>>>>>> -->Andy >>>>>>> >>>>>>>> >>>>>>>> (For some reason I see that happening much more in 2.6.34 >>>>>>>> I don't understand why) >>>>>>>> >>>>>>>> Boaz >>>>>>>> --- >>>>>>>> git diff --stat -p -M >>>>>>>> fs/nfs/nfs4state.c | 2 +- >>>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-) >>>>>>>> >>>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c >>>>>>>> index 15c8bc8..6dbe893 100644 >>>>>>>> --- a/fs/nfs/nfs4state.c >>>>>>>> +++ b/fs/nfs/nfs4state.c >>>>>>>> @@ -590,7 +590,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm >>>>>>>> struct nfs_inode *nfsi = NFS_I(state->inode); >>>>>>>> >>>>>>>> if (nfsi->layoutcommit_ctx) >>>>>>>> - pnfs_layoutcommit_inode(state->inode, 0); >>>>>>>> + pnfs_layoutcommit_inode(state->inode, wait); >>>>>>>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >>>>>>>> struct nfs4_pnfs_layout_segment range; >>>>>>>> >>>>>>>> -- >>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>> >>>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>> >>>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SQUASHME: missing from FIXME: async layout return 2010-05-13 16:04 ` William A. (Andy) Adamson 2010-05-13 16:20 ` Benny Halevy @ 2010-05-13 16:23 ` Boaz Harrosh 1 sibling, 0 replies; 12+ messages in thread From: Boaz Harrosh @ 2010-05-13 16:23 UTC (permalink / raw) To: William A. (Andy) Adamson; +Cc: Benny Halevy, NFS list On 05/13/2010 07:04 PM, William A. (Andy) Adamson wrote: > On Thu, May 13, 2010 at 11:31 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: >> On 05/13/2010 06:17 PM, William A. (Andy) Adamson wrote: >>> On Thu, May 13, 2010 at 10:40 AM, Benny Halevy <bhalevy@panasas.com> wrote: >>>> On May. 13, 2010, 17:19 +0300, "William A. (Andy) Adamson" <androsadamson@gmail.com> wrote: >>>>> On Thu, May 13, 2010 at 4:40 AM, Boaz Harrosh <bharrosh@panasas.com> wrote: >>>>>> I've tested the patch: >>>>>> FIXME: async layout return >>>>>> >>>>>> And there is a missing small hunk >>>>>> >>>>>> I have tested with this patch and it is a very good patch >>>>>> that should also go into 2.6.33. It is necessary in the rare >>>>>> case when one inode have more then one open_context. >>>>> >>>>> Do you mean more than one open context per open owner? >>>> >>>> What we see is one "regular" open context and one which is the layout_commit_ctx >>> >>> Isn't that a BUG? >>> >>> Here is what we're seeing: >>> >>> nfs_file_release->nfs_release->nfs_file_clear_open_context->__put_nfs_open_context-> >>> NFS_PROTO(inode)->close_context->nfs4_close_sync->__nfs4_close->pnfs_layoutcommit_inode, >>> >>> This is the same code that Boaz's 'missing small hunk' adds the wait >>> to the pnfs_layout_commit_inode to. This is good, because when >>> __nfs4_close is called with sync, every thing must be sent/returned >>> prior to the nfs_do_close call. >>> >>> But there is still a problem. Here is the __nfs4_close call (with out >>> the wait that Boaz added) >>> >>> if (nfsi->layoutcommit_ctx) >>> pnfs_layoutcommit_inode(state->inode, 0); >>> if (has_layout(nfsi) && nfsi->layout.roc_iomode) { >>> struct nfs4_pnfs_layout_segment range; >>> >>> range.iomode = nfsi->layout.roc_iomode; >>> range.offset = 0; >>> range.length = NFS4_MAX_UINT64; >>> pnfs_return_layout(state->inode, &range, NULL, >>> RETURN_FILE); >>> >>> Note that a pnfs_return_layout which is always 'sync' (async with wait >>> for completion). So the (currently) async LAYOUTCOMMIT call returns, >>> and a LAYOUTRETURN is put on the wire >>> >>> Then the LAYOUTCOMMIT rpc_call_done routine calls >>> pnfs_layoutcommit_done which calls put_nfs_open_context. If the open >>> context is different from the open context that was put by >>> nfs_file_release, then >>> >>> pnfs_layoutcommit_done->put_nfs_open_context-> ..... ->__nfs4_close >>> and the return on close LAYOUTRETURN is sent again! >>> >>> Of couse this second LAYOUTRETURN either gets a zero stateid, or uses >>> the same stateid as the first LAYOUTRETURN, and the reply to the >>> second LAYOUTRETURN will result in a NFS4ERR_BAD_STATEID ..... >>> >>> This will occur whether the LAYOUTCOMMIT is async or sync as they both >>> call pnfs_layoutcommit_done. >>> >>> I need to understand why there are two open contexts. On the face of >>> it, it seems wrong. >>> >>> We also add a pointer to the open context in the nfs_write_data, and >>> in the pnfs_layoutcommit_data. Do we take a reference on the open data >>> in theses cases? . >>> >>> -->Andy >>> >>> >> >> Yes on all acounts. This is what Benny's patch is fixing please try it out >> it is most important (Add my small fix) > > Is this what you mean? > > With Benny's patch, with return-on-close set and the layoutcommit_ctx > different from the open context that caused the __nfs4_close: > > The (now) sync LAYOUTCOMMIT triggered from the nfs_file_release > __nfs4_close() will complete prior to calling the return-on-close > LAYOUTRETURN call. Since pnfs_layoutcommit_done is still called, > put_nfs_open_context will be call on the layoutconmit_ctx (saved in > pnfs_layoutcommit_data->ctx) and __nfs4_close will be called. > > In _this_ instance of calling __nfs4_close (from > pnfs_layoutcommit_done), the nfsi->layoutcommit_ctx pointer is null so > the pnfs_layoutcommit_inode call is skipped, and pnfs_return_layout > is called. Since it is also a sync call, pnfs_layoutcommit_done does > not return until it is complete. So the layout is returned, the layout > is freed (all layout segments 'cause the range covers the whole file) > and pnfs_layoutcommit_inode returns to the FIRST __nfs4_close (from > nfs_file_release) > > Now since return-on-close is set, pnfs_layout_return is called (in the > nfs_file_release __nfs4_close). This LAYOUTRETURN call will never go > on the wire because the first pnfs_layout_return (from the > pnfs_layoutcommit_done __nfs4_close) has removed the layout segment, > and all is well. > > Sheese. > > --->Andy > Yes I have seen this as well. I went even farther then Benny's patch and added the following hunk which works very well BTW, but I now understand the it is not necessary, exactly as you described above. After all this will settle, I have a pending patch that cleans this code up by moving the hunk from __nfs4_close() into a pnfs_close_context() function in pnfs.c (And empty if not 4.1) this way lots of duplicate code gets thrown away for example the double lo_commit codes --- diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 06f20b9..1740648 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -739,6 +754,11 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range, } lo = get_lock_current_layout(nfsi); + if (wait && atomic_read(&nfsi->layout.lretcount)) { + printk(KERN_ERR "%s(%lx): wait lretcount=%d\n", __func__, ino->i_ino, + atomic_read(&nfsi->layout.lretcount)); + drain_layoutreturns(&nfsi->layout); + } if (lo && !has_layout_to_return(lo, &arg)) { put_unlock_current_layout(lo); lo = NULL; ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2010-05-17 6:03 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-13 8:40 SQUASHME: missing from FIXME: async layout return Boaz Harrosh
2010-05-13 14:07 ` William A. (Andy) Adamson
[not found] ` <AANLkTilE4ao7UadJD58EzbExdfwE4zU5DF-6Nit29Yjq-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-13 14:39 ` Benny Halevy
2010-05-13 14:19 ` William A. (Andy) Adamson
[not found] ` <AANLkTind3eXmuPw5HrKqB6bHJSSXFL3g2cDSEyyYc1n8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-13 14:40 ` Benny Halevy
2010-05-13 15:17 ` William A. (Andy) Adamson
[not found] ` <AANLkTilhbkde-kAPGSBHT1b8vIZq4cM1SPpI-Rd0iOb7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-13 15:31 ` Boaz Harrosh
2010-05-13 16:04 ` William A. (Andy) Adamson
2010-05-13 16:20 ` Benny Halevy
2010-05-14 3:28 ` Zhang Jingwang
2010-05-17 6:02 ` Benny Halevy
2010-05-13 16:23 ` Boaz Harrosh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).