* [PATCH 0/3] NFSD EOS deferral @ 2008-10-15 21:00 andros 2008-10-15 21:00 ` [PATCH 1/3] SUNRPC add deferral processing callbacks andros 2008-10-17 17:44 ` [PATCH 0/3] NFSD EOS deferral J. Bruce Fields 0 siblings, 2 replies; 18+ messages in thread From: andros @ 2008-10-15 21:00 UTC (permalink / raw) To: linux-nfs; +Cc: bfields Here's a patch set for review - it compiles and seems to work, but I haven't done stress testing, nor testing of all of the combinations of deferral cases. A deferral occurs when NFSD needs information from an rpc cache, and an upcall is required. Instead of NFSD waiting for the cache to be filled by the upcall, the RPC request is inserted back into the receive stream for processing at a later time. Exactly once semantics require that NFSD compound RPC deferral processing restart at the operation that caused the deferral, instead of reprocessing the full compound RPC from the start possibly repeating operation processing. These patches add three callbacks, a data pointer, and page pointer storage to the sunrpc svc deferral architecture that NFSD uses to accomplish this goal. Deferrals that do not define the callbacks act as before. Care has been taken to ensure that combinations of deferrals - those from the NFSv4 server with the callbacks defined, and those from the RPC layer without the callbacks defined work together correctly. Thoughts, comments and suggestions are really appreciated... -->Andy Adamson <andros@netapp.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] SUNRPC add deferral processing callbacks 2008-10-15 21:00 [PATCH 0/3] NFSD EOS deferral andros @ 2008-10-15 21:00 ` andros 2008-10-15 21:00 ` [PATCH 2/3] NFSD save, restore, and release deferred result pages andros ` (2 more replies) 2008-10-17 17:44 ` [PATCH 0/3] NFSD EOS deferral J. Bruce Fields 1 sibling, 3 replies; 18+ messages in thread From: andros @ 2008-10-15 21:00 UTC (permalink / raw) To: linux-nfs; +Cc: bfields, Andy Adamson, Andy Adamson From: Andy Adamson <andros@netapp.com> For EOS, NFSD compound RPC deferred processing should restart operation which caused the deferral. Add a callback and a defer_data pointer in svc_rqst to enable svc_defer to save partial result state in the deferral request. Add page pointer storage to svc_deferred_req to cache the pages holding the partially processed request. Add callbacks and a defer_data pointer in svc_deferred_request to enable svc_deferred_recv to restore and release the partial result state. Signed-off-by: Andy Adamson<andros-HgOvQuBEEgQAvxtiuMwx3w@public.gmane.org> --- include/linux/sunrpc/svc.h | 10 ++++++++++ net/sunrpc/svc_xprt.c | 20 +++++++++++++++----- 2 files changed, 25 insertions(+), 5 deletions(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index dc69068..46a4097 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -215,6 +215,9 @@ struct svc_rqst { struct svc_cred rq_cred; /* auth info */ void * rq_xprt_ctxt; /* transport specific context ptr */ struct svc_deferred_req*rq_deferred; /* deferred request we are replaying */ + /* callback to save deferred request state */ + void (*rq_save_state)(struct svc_rqst *, struct svc_deferred_req *); + void *rq_defer_data; /* defer state data to save */ size_t rq_xprt_hlen; /* xprt header len */ struct xdr_buf rq_arg; @@ -323,6 +326,13 @@ struct svc_deferred_req { union svc_addr_u daddr; /* where reply must come from */ struct cache_deferred_req handle; size_t xprt_hlen; + /* callbacks to restore and release deferred request state + * set in rq_save_state */ + void (*restore_state)(struct svc_rqst *, struct svc_deferred_req *); + void (*release_state)(struct svc_deferred_req *); + void *defer_data; /* defer state data */ + struct page *respages[RPCSVC_MAXPAGES]; + short resused; int argslen; __be32 args[0]; }; diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index e46c825..531b325 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -876,6 +876,8 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many) struct svc_xprt *xprt = dr->xprt; if (too_many) { + if (dr->release_state) + dr->release_state(dr); svc_xprt_put(xprt); kfree(dr); return; @@ -902,10 +904,10 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many) static struct cache_deferred_req *svc_defer(struct cache_req *req) { struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle); - struct svc_deferred_req *dr; + struct svc_deferred_req *dr = NULL; if (rqstp->rq_arg.page_len) - return NULL; /* if more than a page, give up FIXME */ + goto out; /* if more than a page, give up FIXME */ if (rqstp->rq_deferred) { dr = rqstp->rq_deferred; rqstp->rq_deferred = NULL; @@ -914,9 +916,9 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) size_t size; /* FIXME maybe discard if size too large */ size = sizeof(struct svc_deferred_req) + rqstp->rq_arg.len; - dr = kmalloc(size, GFP_KERNEL); + dr = kzalloc(size, GFP_KERNEL); if (dr == NULL) - return NULL; + goto out; dr->handle.owner = rqstp->rq_server; dr->prot = rqstp->rq_prot; @@ -935,7 +937,13 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) dr->xprt = rqstp->rq_xprt; dr->handle.revisit = svc_revisit; - return &dr->handle; + if (rqstp->rq_save_state) + rqstp->rq_save_state(rqstp, dr); +out: + rqstp->rq_defer_data = NULL; + rqstp->rq_save_state = NULL; + + return dr ? &dr->handle: NULL; } /* @@ -959,6 +967,8 @@ static int svc_deferred_recv(struct svc_rqst *rqstp) rqstp->rq_xprt_hlen = dr->xprt_hlen; rqstp->rq_daddr = dr->daddr; rqstp->rq_respages = rqstp->rq_pages; + if (dr->restore_state) + dr->restore_state(rqstp, dr); return (dr->argslen<<2) - dr->xprt_hlen; } -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/3] NFSD save, restore, and release deferred result pages 2008-10-15 21:00 ` [PATCH 1/3] SUNRPC add deferral processing callbacks andros @ 2008-10-15 21:00 ` andros 2008-10-15 21:00 ` [PATCH 3/3] NFSD deferral processing andros 2008-10-17 17:48 ` [PATCH 1/3] SUNRPC add deferral processing callbacks J. Bruce Fields 2008-10-17 20:35 ` Talpey, Thomas 2 siblings, 1 reply; 18+ messages in thread From: andros @ 2008-10-15 21:00 UTC (permalink / raw) To: linux-nfs; +Cc: bfields, Andy Adamson From: Andy Adamson <andros@netapp.com> Implement the rq_save_state, resume_state, and release_state RPC deferral callbacks. Save the reply pages in struct svc_deferred_req. Clear the svc_deferred_req respages in the save_state callback to setup for another NFSD operation deferral. Signed-off-by: Andy Adamson<andros@netapp.com> --- fs/nfsd/nfs4proc.c | 40 ++++++++++++++++++++++++++++++++++++++++ fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/nfsd/xdr4.h | 5 +++++ 3 files changed, 90 insertions(+), 0 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index e5b51ff..1c19ff9 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -836,6 +836,46 @@ static struct nfsd4_compound_state *cstate_alloc(void) return cstate; } +/* + * RPC deferral callbacks + */ +static void +nfsd4_release_deferred_state(struct svc_deferred_req *dreq) +{ + nfsd4_clear_respages(dreq->respages, dreq->resused); + cstate_free(dreq->defer_data); +} + +static void +nfsd4_restore_deferred_state(struct svc_rqst *rqstp, + struct svc_deferred_req *dreq) +{ + nfsd4_restore_rqst_pages(rqstp, dreq->respages, dreq->resused); + /* Reset defer_data for a NFSD deferral revisit interrupted + * by a non-NFSD deferral */ + rqstp->rq_defer_data = dreq->defer_data; +} + +static void +nfsd4_save_deferred_state(struct svc_rqst *rqstp, + struct svc_deferred_req *dreq) + { + struct nfsd4_compound_state *cstate = + (struct nfsd4_compound_state *)rqstp->rq_defer_data; + + fh_put(&cstate->current_fh); + fh_put(&cstate->save_fh); + + /* In case of an NFSD deferral on a previous operation */ + nfsd4_clear_respages(dreq->respages, dreq->resused); + + nfsd4_cache_rqst_pages(rqstp, dreq->respages, &dreq->resused); + + dreq->defer_data = rqstp->rq_defer_data; + dreq->restore_state = nfsd4_restore_deferred_state; + dreq->release_state = nfsd4_release_deferred_state; +} + typedef __be32(*nfsd4op_func)(struct svc_rqst *, struct nfsd4_compound_state *, void *); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 1578d7a..f266e45 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -684,6 +684,51 @@ out_err: return; } +void +nfsd4_move_pages(struct page **topages, struct page **frompages, short count) +{ + int i; + + for (i = 0; i < count; i++) { + topages[i] = frompages[i]; + if (!topages[i]) + continue; + get_page(topages[i]); + } +} + +void +nfsd4_cache_rqst_pages(struct svc_rqst *rqstp, struct page **respages, + short *resused) +{ + *resused = rqstp->rq_resused; + nfsd4_move_pages(respages, rqstp->rq_respages, rqstp->rq_resused); +} + +void +nfsd4_restore_rqst_pages(struct svc_rqst *rqstp, struct page **respages, + short resused) +{ + /* release allocated result pages to be replaced from the cache */ + svc_free_res_pages(rqstp); + + rqstp->rq_resused = resused; + nfsd4_move_pages(rqstp->rq_respages, respages, resused); +} + +void +nfsd4_clear_respages(struct page **respages, short resused) +{ + int i; + + for (i = 0; i < resused; i++) { + if (!respages[i]) + continue; + put_page(respages[i]); + respages[i] = NULL; + } +} + __be32 nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_setclientid *setclid) diff --git a/include/linux/nfsd/xdr4.h b/include/linux/nfsd/xdr4.h index 27bd3e3..f3495ae 100644 --- a/include/linux/nfsd/xdr4.h +++ b/include/linux/nfsd/xdr4.h @@ -442,6 +442,11 @@ void nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op); __be32 nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, __be32 *buffer, int *countp, u32 *bmval, struct svc_rqst *, int ignore_crossmnt); +extern void nfsd4_clear_respages(struct page **respages, short resused); +extern void nfsd4_cache_rqst_pages(struct svc_rqst *rqstp, + struct page **respages, short *resused); +extern void nfsd4_restore_rqst_pages(struct svc_rqst *rqstp, + struct page **respages, short resused); extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *, struct nfsd4_setclientid *setclid); -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/3] NFSD deferral processing 2008-10-15 21:00 ` [PATCH 2/3] NFSD save, restore, and release deferred result pages andros @ 2008-10-15 21:00 ` andros 2008-10-17 20:46 ` Talpey, Thomas 0 siblings, 1 reply; 18+ messages in thread From: andros @ 2008-10-15 21:00 UTC (permalink / raw) To: linux-nfs; +Cc: bfields, Andy Adamson From: Andy Adamson <andros@netapp.com> Use a slab cache for nfsd4_compound_state allocation Save the struct nfsd4_compound_state and set the save_state callback for each request for potential deferral handling. If an NFSv4 operation causes a deferral, the save_state callback is called by svc_defer which saves the defer_data with the deferral, and sets the restore_state deferral callback. fh_put is called so that the deferral is not referencing the file handles, allowing umount of the file system. Signed-off-by: Andy Adamson<andros@netapp.com> --- fs/nfsd/nfs4proc.c | 47 +++++++++++++++++++------------------------- fs/nfsd/nfs4state.c | 37 +++++++++++++++++++++++++++++++++++ include/linux/nfsd/xdr4.h | 5 ++++ 3 files changed, 62 insertions(+), 27 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 1c19ff9..e60b359 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -813,37 +813,13 @@ static inline void nfsd4_increment_op_stats(u32 opnum) nfsdstats.nfs4_opcount[opnum]++; } -static void cstate_free(struct nfsd4_compound_state *cstate) -{ - if (cstate == NULL) - return; - fh_put(&cstate->current_fh); - fh_put(&cstate->save_fh); - BUG_ON(cstate->replay_owner); - kfree(cstate); -} - -static struct nfsd4_compound_state *cstate_alloc(void) -{ - struct nfsd4_compound_state *cstate; - - cstate = kmalloc(sizeof(struct nfsd4_compound_state), GFP_KERNEL); - if (cstate == NULL) - return NULL; - fh_init(&cstate->current_fh, NFS4_FHSIZE); - fh_init(&cstate->save_fh, NFS4_FHSIZE); - cstate->replay_owner = NULL; - return cstate; -} - /* * RPC deferral callbacks */ static void nfsd4_release_deferred_state(struct svc_deferred_req *dreq) { - nfsd4_clear_respages(dreq->respages, dreq->resused); - cstate_free(dreq->defer_data); + nfsd4_cstate_free(dreq->defer_data, dreq); } static void @@ -926,12 +902,22 @@ nfsd4_proc_compound(struct svc_rqst *rqstp, goto out; status = nfserr_resource; - cstate = cstate_alloc(); + cstate = nfsd4_cstate_alloc(rqstp); if (cstate == NULL) goto out; + if (rqstp->rq_deferred && rqstp->rq_deferred->defer_data) { + resp->opcnt = cstate->last_op_cnt; + resp->p = cstate->last_op_p; + fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_NOP); + fh_verify(rqstp, &cstate->save_fh, 0, NFSD_MAY_NOP); + } + rqstp->rq_defer_data = cstate; + rqstp->rq_save_state = nfsd4_save_deferred_state; + status = nfs_ok; while (!status && resp->opcnt < args->opcnt) { + cstate->last_op_p = resp->p; op = &args->ops[resp->opcnt++]; dprintk("nfsv4 compound op #%d/%d: %d (%s)\n", @@ -996,8 +982,15 @@ encode_op: nfsd4_increment_op_stats(op->opnum); } + rqstp->rq_defer_data = NULL; + rqstp->rq_save_state = NULL; + + if (status == nfserr_dropit) { + cstate->last_op_cnt = resp->opcnt - 1; + return status; + } - cstate_free(cstate); + nfsd4_cstate_free(cstate, rqstp->rq_deferred); out: nfsd4_release_compoundargs(args); dprintk("nfsv4 compound returned %d\n", ntohl(status)); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f266e45..002a57c 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -91,6 +91,7 @@ static struct kmem_cache *stateowner_slab = NULL; static struct kmem_cache *file_slab = NULL; static struct kmem_cache *stateid_slab = NULL; static struct kmem_cache *deleg_slab = NULL; +static struct kmem_cache *cstate_slab; void nfs4_lock_state(void) @@ -442,6 +443,37 @@ static struct nfs4_client *create_client(struct xdr_netobj name, char *recdir) return clp; } +void nfsd4_cstate_free(struct nfsd4_compound_state *cstate, + struct svc_deferred_req *dreq) +{ + if (dreq && dreq->release_state) + nfsd4_clear_respages(dreq->respages, dreq->resused); + if (cstate == NULL) + return; + fh_put(&cstate->current_fh); + fh_put(&cstate->save_fh); + BUG_ON(cstate->replay_owner); + kmem_cache_free(cstate_slab, cstate); +} + +struct nfsd4_compound_state *nfsd4_cstate_alloc(struct svc_rqst *rqstp) +{ + struct nfsd4_compound_state *cstate; + + if (rqstp->rq_deferred && rqstp->rq_deferred->defer_data) { + cstate = rqstp->rq_deferred->defer_data; + goto out; + } + cstate = kmem_cache_alloc(cstate_slab, GFP_KERNEL); + if (cstate == NULL) + return NULL; + fh_init(&cstate->current_fh, NFS4_FHSIZE); + fh_init(&cstate->save_fh, NFS4_FHSIZE); + cstate->replay_owner = NULL; +out: + return cstate; +} + static void copy_verf(struct nfs4_client *target, nfs4_verifier *source) { memcpy(target->cl_verifier.data, source->data, @@ -986,6 +1018,7 @@ nfsd4_free_slabs(void) nfsd4_free_slab(&file_slab); nfsd4_free_slab(&stateid_slab); nfsd4_free_slab(&deleg_slab); + nfsd4_free_slab(&cstate_slab); } static int @@ -1007,6 +1040,10 @@ nfsd4_init_slabs(void) sizeof(struct nfs4_delegation), 0, 0, NULL); if (deleg_slab == NULL) goto out_nomem; + cstate_slab = kmem_cache_create("nfsd4_compound_states", + sizeof(struct nfsd4_compound_state), 0, 0, NULL); + if (cstate_slab == NULL) + goto out_nomem; return 0; out_nomem: nfsd4_free_slabs(); diff --git a/include/linux/nfsd/xdr4.h b/include/linux/nfsd/xdr4.h index f3495ae..3407f4b 100644 --- a/include/linux/nfsd/xdr4.h +++ b/include/linux/nfsd/xdr4.h @@ -48,6 +48,8 @@ struct nfsd4_compound_state { struct svc_fh current_fh; struct svc_fh save_fh; struct nfs4_stateowner *replay_owner; + __be32 *last_op_p; + u32 last_op_cnt; }; struct nfsd4_change_info { @@ -447,6 +449,9 @@ extern void nfsd4_cache_rqst_pages(struct svc_rqst *rqstp, struct page **respages, short *resused); extern void nfsd4_restore_rqst_pages(struct svc_rqst *rqstp, struct page **respages, short resused); +extern struct nfsd4_compound_state *nfsd4_cstate_alloc(struct svc_rqst *rqstp); +extern void nfsd4_cstate_free(struct nfsd4_compound_state *cstate, + struct svc_deferred_req *dreq); extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *, struct nfsd4_setclientid *setclid); -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] NFSD deferral processing 2008-10-15 21:00 ` [PATCH 3/3] NFSD deferral processing andros @ 2008-10-17 20:46 ` Talpey, Thomas [not found] ` <RTPCLUEXC1-PRDNY30k000001cc-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Talpey, Thomas @ 2008-10-17 20:46 UTC (permalink / raw) To: andros; +Cc: linux-nfs, bfields At 05:00 PM 10/15/2008, andros@netapp.com wrote: >From: Andy Adamson <andros@netapp.com> > >Use a slab cache for nfsd4_compound_state allocation > >Save the struct nfsd4_compound_state and set the save_state callback for >each request for potential deferral handling. > >If an NFSv4 operation causes a deferral, the save_state callback is called >by svc_defer which saves the defer_data with the deferral, and sets the >restore_state deferral callback. > >fh_put is called so that the deferral is not referencing the file handles, >allowing umount of the file system. Can you explain the safety of this? If the COMPOUND starts off with an operation that uses them, what's the implication of one going stale partway through? Is fh_verify() when the deferral continues enough to ensure the fh is protected until the end of the op? There's a comment at the top of fh_verify() that it is "only called at the start of an nfsproc call", with some assumptions, for instance. Tom > >Signed-off-by: Andy Adamson<andros@netapp.com> >--- > fs/nfsd/nfs4proc.c | 47 +++++++++++++++++++------------------------- > fs/nfsd/nfs4state.c | 37 +++++++++++++++++++++++++++++++++++ > include/linux/nfsd/xdr4.h | 5 ++++ > 3 files changed, 62 insertions(+), 27 deletions(-) > >diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c >index 1c19ff9..e60b359 100644 >--- a/fs/nfsd/nfs4proc.c >+++ b/fs/nfsd/nfs4proc.c >@@ -813,37 +813,13 @@ static inline void nfsd4_increment_op_stats(u32 opnum) > nfsdstats.nfs4_opcount[opnum]++; > } > >-static void cstate_free(struct nfsd4_compound_state *cstate) >-{ >- if (cstate == NULL) >- return; >- fh_put(&cstate->current_fh); >- fh_put(&cstate->save_fh); >- BUG_ON(cstate->replay_owner); >- kfree(cstate); >-} >- >-static struct nfsd4_compound_state *cstate_alloc(void) >-{ >- struct nfsd4_compound_state *cstate; >- >- cstate = kmalloc(sizeof(struct nfsd4_compound_state), GFP_KERNEL); >- if (cstate == NULL) >- return NULL; >- fh_init(&cstate->current_fh, NFS4_FHSIZE); >- fh_init(&cstate->save_fh, NFS4_FHSIZE); >- cstate->replay_owner = NULL; >- return cstate; >-} >- > /* > * RPC deferral callbacks > */ > static void > nfsd4_release_deferred_state(struct svc_deferred_req *dreq) > { >- nfsd4_clear_respages(dreq->respages, dreq->resused); >- cstate_free(dreq->defer_data); >+ nfsd4_cstate_free(dreq->defer_data, dreq); > } > > static void >@@ -926,12 +902,22 @@ nfsd4_proc_compound(struct svc_rqst *rqstp, > goto out; > > status = nfserr_resource; >- cstate = cstate_alloc(); >+ cstate = nfsd4_cstate_alloc(rqstp); > if (cstate == NULL) > goto out; > >+ if (rqstp->rq_deferred && rqstp->rq_deferred->defer_data) { >+ resp->opcnt = cstate->last_op_cnt; >+ resp->p = cstate->last_op_p; >+ fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_NOP); >+ fh_verify(rqstp, &cstate->save_fh, 0, NFSD_MAY_NOP); >+ } >+ rqstp->rq_defer_data = cstate; >+ rqstp->rq_save_state = nfsd4_save_deferred_state; >+ > status = nfs_ok; > while (!status && resp->opcnt < args->opcnt) { >+ cstate->last_op_p = resp->p; > op = &args->ops[resp->opcnt++]; > > dprintk("nfsv4 compound op #%d/%d: %d (%s)\n", >@@ -996,8 +982,15 @@ encode_op: > > nfsd4_increment_op_stats(op->opnum); > } >+ rqstp->rq_defer_data = NULL; >+ rqstp->rq_save_state = NULL; >+ >+ if (status == nfserr_dropit) { >+ cstate->last_op_cnt = resp->opcnt - 1; >+ return status; >+ } > >- cstate_free(cstate); >+ nfsd4_cstate_free(cstate, rqstp->rq_deferred); > out: > nfsd4_release_compoundargs(args); > dprintk("nfsv4 compound returned %d\n", ntohl(status)); >diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >index f266e45..002a57c 100644 >--- a/fs/nfsd/nfs4state.c >+++ b/fs/nfsd/nfs4state.c >@@ -91,6 +91,7 @@ static struct kmem_cache *stateowner_slab = NULL; > static struct kmem_cache *file_slab = NULL; > static struct kmem_cache *stateid_slab = NULL; > static struct kmem_cache *deleg_slab = NULL; >+static struct kmem_cache *cstate_slab; > > void > nfs4_lock_state(void) >@@ -442,6 +443,37 @@ static struct nfs4_client *create_client(struct >xdr_netobj name, char *recdir) > return clp; > } > >+void nfsd4_cstate_free(struct nfsd4_compound_state *cstate, >+ struct svc_deferred_req *dreq) >+{ >+ if (dreq && dreq->release_state) >+ nfsd4_clear_respages(dreq->respages, dreq->resused); >+ if (cstate == NULL) >+ return; >+ fh_put(&cstate->current_fh); >+ fh_put(&cstate->save_fh); >+ BUG_ON(cstate->replay_owner); >+ kmem_cache_free(cstate_slab, cstate); >+} >+ >+struct nfsd4_compound_state *nfsd4_cstate_alloc(struct svc_rqst *rqstp) >+{ >+ struct nfsd4_compound_state *cstate; >+ >+ if (rqstp->rq_deferred && rqstp->rq_deferred->defer_data) { >+ cstate = rqstp->rq_deferred->defer_data; >+ goto out; >+ } >+ cstate = kmem_cache_alloc(cstate_slab, GFP_KERNEL); >+ if (cstate == NULL) >+ return NULL; >+ fh_init(&cstate->current_fh, NFS4_FHSIZE); >+ fh_init(&cstate->save_fh, NFS4_FHSIZE); >+ cstate->replay_owner = NULL; >+out: >+ return cstate; >+} >+ > static void copy_verf(struct nfs4_client *target, nfs4_verifier *source) > { > memcpy(target->cl_verifier.data, source->data, >@@ -986,6 +1018,7 @@ nfsd4_free_slabs(void) > nfsd4_free_slab(&file_slab); > nfsd4_free_slab(&stateid_slab); > nfsd4_free_slab(&deleg_slab); >+ nfsd4_free_slab(&cstate_slab); > } > > static int >@@ -1007,6 +1040,10 @@ nfsd4_init_slabs(void) > sizeof(struct nfs4_delegation), 0, 0, NULL); > if (deleg_slab == NULL) > goto out_nomem; >+ cstate_slab = kmem_cache_create("nfsd4_compound_states", >+ sizeof(struct nfsd4_compound_state), 0, 0, NULL); >+ if (cstate_slab == NULL) >+ goto out_nomem; > return 0; > out_nomem: > nfsd4_free_slabs(); >diff --git a/include/linux/nfsd/xdr4.h b/include/linux/nfsd/xdr4.h >index f3495ae..3407f4b 100644 >--- a/include/linux/nfsd/xdr4.h >+++ b/include/linux/nfsd/xdr4.h >@@ -48,6 +48,8 @@ struct nfsd4_compound_state { > struct svc_fh current_fh; > struct svc_fh save_fh; > struct nfs4_stateowner *replay_owner; >+ __be32 *last_op_p; >+ u32 last_op_cnt; > }; > > struct nfsd4_change_info { >@@ -447,6 +449,9 @@ extern void nfsd4_cache_rqst_pages(struct svc_rqst *rqstp, > struct page **respages, short *resused); > extern void nfsd4_restore_rqst_pages(struct svc_rqst *rqstp, > struct page **respages, short resused); >+extern struct nfsd4_compound_state *nfsd4_cstate_alloc(struct >svc_rqst *rqstp); >+extern void nfsd4_cstate_free(struct nfsd4_compound_state *cstate, >+ struct svc_deferred_req *dreq); > extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp, > struct nfsd4_compound_state *, > struct nfsd4_setclientid *setclid); >-- >1.5.4.3 > >-- >To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <RTPCLUEXC1-PRDNY30k000001cc-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>]
* Re: [PATCH 3/3] NFSD deferral processing [not found] ` <RTPCLUEXC1-PRDNY30k000001cc-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org> @ 2008-10-20 17:58 ` J. Bruce Fields 0 siblings, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-10-20 17:58 UTC (permalink / raw) To: Talpey, Thomas; +Cc: andros, linux-nfs On Fri, Oct 17, 2008 at 04:46:50PM -0400, Talpey, Thomas wrote: > At 05:00 PM 10/15/2008, andros@netapp.com wrote: > >From: Andy Adamson <andros@netapp.com> > > > >Use a slab cache for nfsd4_compound_state allocation > > > >Save the struct nfsd4_compound_state and set the save_state callback for > >each request for potential deferral handling. > > > >If an NFSv4 operation causes a deferral, the save_state callback is called > >by svc_defer which saves the defer_data with the deferral, and sets the > >restore_state deferral callback. > > > >fh_put is called so that the deferral is not referencing the file handles, > >allowing umount of the file system. > > Can you explain the safety of this? If the COMPOUND starts off with an > operation that uses them, what's the implication of one going stale > partway through? I suppose it means the compound could get an unexpected STALE error partway through. > Is fh_verify() when the deferral continues enough to > ensure the fh is protected until the end of the op? I'm not sure what you mean. > There's a comment at the top of fh_verify() that it is "only called at > the start of an nfsproc call", with some assumptions, for instance. That comment could be improved. --b. commit 438d5d3ec385e6385856929719322d569de3e148 Author: J. Bruce Fields <bfields@citi.umich.edu> Date: Mon Oct 20 13:01:59 2008 -0400 nfsd: update fh_verify description diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index cd25d91..8cabcfa 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -253,14 +253,32 @@ out: return error; } -/* - * Perform sanity checks on the dentry in a client's file handle. +/** + * fh_verify - filehandle lookup and access checking + * @rqstp: pointer to current rpc request + * @fhp: filehandle to be verified + * @type: expected type of object pointed to by filehandle + * @access: type of access needed to object + * + * Look up a dentry from the on-the-wire filehandle, check the client's + * access to the export, and set the current task's credentials. + * + * Regardless of success or failure of fh_verify(), fh_put() should be + * called on @fhp when the caller is finished with the filehandle. + * + * fh_verify() may be called multiple times on a given filehandle, for + * example, when processing an NFSv4 compound. The first call will look + * up a dentry using the on-the-wire filehandle. Subsequent calls will + * skip the lookup and just perform the other checks and possibly change + * the current task's credentials. * - * Note that the file handle dentry may need to be freed even after - * an error return. + * @type specifies the type of object expected using one of the S_IF* + * constants defined in include/linux/stat.h. The caller may use zero + * to indicate that it doesn't care, or a negative integer to indicate + * that it expects something not of the given type. * - * This is only called at the start of an nfsproc call, so fhp points to - * a svc_fh which is all 0 except for the over-the-wire file handle. + * @access is formed from the NFSD_MAY_* constants defined in + * include/linux/nfsd/nfsd.h. */ __be32 fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, int access) ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] SUNRPC add deferral processing callbacks 2008-10-15 21:00 ` [PATCH 1/3] SUNRPC add deferral processing callbacks andros 2008-10-15 21:00 ` [PATCH 2/3] NFSD save, restore, and release deferred result pages andros @ 2008-10-17 17:48 ` J. Bruce Fields 2008-10-17 18:42 ` Andy Adamson 2008-10-17 20:35 ` Talpey, Thomas 2 siblings, 1 reply; 18+ messages in thread From: J. Bruce Fields @ 2008-10-17 17:48 UTC (permalink / raw) To: andros; +Cc: linux-nfs, Andy Adamson On Wed, Oct 15, 2008 at 05:00:24PM -0400, andros@netapp.com wrote: > From: Andy Adamson <andros@netapp.com> > > For EOS, NFSD compound RPC deferred processing should restart operation which > caused the deferral. > > Add a callback and a defer_data pointer in svc_rqst to enable svc_defer to > save partial result state in the deferral request. > > Add page pointer storage to svc_deferred_req to cache the pages holding the > partially processed request. > > Add callbacks and a defer_data pointer in svc_deferred_request to enable > svc_deferred_recv to restore and release the partial result state. > > Signed-off-by: Andy Adamson<andros-HgOvQuBEEgQAvxtiuMwx3w@public.gmane.org> Could you fix the missing space there? > --- > include/linux/sunrpc/svc.h | 10 ++++++++++ > net/sunrpc/svc_xprt.c | 20 +++++++++++++++----- > 2 files changed, 25 insertions(+), 5 deletions(-) > > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > index dc69068..46a4097 100644 > --- a/include/linux/sunrpc/svc.h > +++ b/include/linux/sunrpc/svc.h > @@ -215,6 +215,9 @@ struct svc_rqst { > struct svc_cred rq_cred; /* auth info */ > void * rq_xprt_ctxt; /* transport specific context ptr */ > struct svc_deferred_req*rq_deferred; /* deferred request we are replaying */ > + /* callback to save deferred request state */ > + void (*rq_save_state)(struct svc_rqst *, struct svc_deferred_req *); > + void *rq_defer_data; /* defer state data to save */ > > size_t rq_xprt_hlen; /* xprt header len */ > struct xdr_buf rq_arg; > @@ -323,6 +326,13 @@ struct svc_deferred_req { > union svc_addr_u daddr; /* where reply must come from */ > struct cache_deferred_req handle; > size_t xprt_hlen; > + /* callbacks to restore and release deferred request state > + * set in rq_save_state */ > + void (*restore_state)(struct svc_rqst *, struct svc_deferred_req *); > + void (*release_state)(struct svc_deferred_req *); > + void *defer_data; /* defer state data */ > + struct page *respages[RPCSVC_MAXPAGES]; > + short resused; > int argslen; > __be32 args[0]; > }; > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > index e46c825..531b325 100644 > --- a/net/sunrpc/svc_xprt.c > +++ b/net/sunrpc/svc_xprt.c > @@ -876,6 +876,8 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many) > struct svc_xprt *xprt = dr->xprt; > > if (too_many) { > + if (dr->release_state) > + dr->release_state(dr); > svc_xprt_put(xprt); > kfree(dr); > return; > @@ -902,10 +904,10 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many) > static struct cache_deferred_req *svc_defer(struct cache_req *req) > { > struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle); > - struct svc_deferred_req *dr; > + struct svc_deferred_req *dr = NULL; > > if (rqstp->rq_arg.page_len) > - return NULL; /* if more than a page, give up FIXME */ > + goto out; /* if more than a page, give up FIXME */ > if (rqstp->rq_deferred) { > dr = rqstp->rq_deferred; > rqstp->rq_deferred = NULL; > @@ -914,9 +916,9 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) > size_t size; > /* FIXME maybe discard if size too large */ > size = sizeof(struct svc_deferred_req) + rqstp->rq_arg.len; > - dr = kmalloc(size, GFP_KERNEL); > + dr = kzalloc(size, GFP_KERNEL); > if (dr == NULL) > - return NULL; > + goto out; > > dr->handle.owner = rqstp->rq_server; > dr->prot = rqstp->rq_prot; > @@ -935,7 +937,13 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) > dr->xprt = rqstp->rq_xprt; > > dr->handle.revisit = svc_revisit; > - return &dr->handle; > + if (rqstp->rq_save_state) > + rqstp->rq_save_state(rqstp, dr); > +out: > + rqstp->rq_defer_data = NULL; > + rqstp->rq_save_state = NULL; > + > + return dr ? &dr->handle: NULL; I may not be thinking this through clearly, but: I think the correct place to initialize rq_defer_data and rq_save_state is not here but at the very start of request processing (in svc_recv or at the top of svc_process). We shouldn't depend on svc_defer()--it may never get called. --b. > } > > /* > @@ -959,6 +967,8 @@ static int svc_deferred_recv(struct svc_rqst *rqstp) > rqstp->rq_xprt_hlen = dr->xprt_hlen; > rqstp->rq_daddr = dr->daddr; > rqstp->rq_respages = rqstp->rq_pages; > + if (dr->restore_state) > + dr->restore_state(rqstp, dr); > return (dr->argslen<<2) - dr->xprt_hlen; > } > > -- > 1.5.4.3 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] SUNRPC add deferral processing callbacks 2008-10-17 17:48 ` [PATCH 1/3] SUNRPC add deferral processing callbacks J. Bruce Fields @ 2008-10-17 18:42 ` Andy Adamson 0 siblings, 0 replies; 18+ messages in thread From: Andy Adamson @ 2008-10-17 18:42 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, Andy Adamson On Fri, 2008-10-17 at 13:48 -0400, J. Bruce Fields wrote: > On Wed, Oct 15, 2008 at 05:00:24PM -0400, andros@netapp.com wrote: > > From: Andy Adamson <andros@netapp.com> > > > > For EOS, NFSD compound RPC deferred processing should restart operation which > > caused the deferral. > > > > Add a callback and a defer_data pointer in svc_rqst to enable svc_defer to > > save partial result state in the deferral request. > > > > Add page pointer storage to svc_deferred_req to cache the pages holding the > > partially processed request. > > > > Add callbacks and a defer_data pointer in svc_deferred_request to enable > > svc_deferred_recv to restore and release the partial result state. > > > > Signed-off-by: Andy Adamson<andros-HgOvQuBEEgQAvxtiuMwx3w@public.gmane.org> > > Could you fix the missing space there? Oops..:) > > > --- > > include/linux/sunrpc/svc.h | 10 ++++++++++ > > net/sunrpc/svc_xprt.c | 20 +++++++++++++++----- > > 2 files changed, 25 insertions(+), 5 deletions(-) > > > > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > > index dc69068..46a4097 100644 > > --- a/include/linux/sunrpc/svc.h > > +++ b/include/linux/sunrpc/svc.h > > @@ -215,6 +215,9 @@ struct svc_rqst { > > struct svc_cred rq_cred; /* auth info */ > > void * rq_xprt_ctxt; /* transport specific context ptr */ > > struct svc_deferred_req*rq_deferred; /* deferred request we are replaying */ > > + /* callback to save deferred request state */ > > + void (*rq_save_state)(struct svc_rqst *, struct svc_deferred_req *); > > + void *rq_defer_data; /* defer state data to save */ > > > > size_t rq_xprt_hlen; /* xprt header len */ > > struct xdr_buf rq_arg; > > @@ -323,6 +326,13 @@ struct svc_deferred_req { > > union svc_addr_u daddr; /* where reply must come from */ > > struct cache_deferred_req handle; > > size_t xprt_hlen; > > + /* callbacks to restore and release deferred request state > > + * set in rq_save_state */ > > + void (*restore_state)(struct svc_rqst *, struct svc_deferred_req *); > > + void (*release_state)(struct svc_deferred_req *); > > + void *defer_data; /* defer state data */ > > + struct page *respages[RPCSVC_MAXPAGES]; > > + short resused; > > int argslen; > > __be32 args[0]; > > }; > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > > index e46c825..531b325 100644 > > --- a/net/sunrpc/svc_xprt.c > > +++ b/net/sunrpc/svc_xprt.c > > @@ -876,6 +876,8 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many) > > struct svc_xprt *xprt = dr->xprt; > > > > if (too_many) { > > + if (dr->release_state) > > + dr->release_state(dr); > > svc_xprt_put(xprt); > > kfree(dr); > > return; > > @@ -902,10 +904,10 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many) > > static struct cache_deferred_req *svc_defer(struct cache_req *req) > > { > > struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle); > > - struct svc_deferred_req *dr; > > + struct svc_deferred_req *dr = NULL; > > > > if (rqstp->rq_arg.page_len) > > - return NULL; /* if more than a page, give up FIXME */ > > + goto out; /* if more than a page, give up FIXME */ > > if (rqstp->rq_deferred) { > > dr = rqstp->rq_deferred; > > rqstp->rq_deferred = NULL; > > @@ -914,9 +916,9 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) > > size_t size; > > /* FIXME maybe discard if size too large */ > > size = sizeof(struct svc_deferred_req) + rqstp->rq_arg.len; > > - dr = kmalloc(size, GFP_KERNEL); > > + dr = kzalloc(size, GFP_KERNEL); > > if (dr == NULL) > > - return NULL; > > + goto out; > > > > dr->handle.owner = rqstp->rq_server; > > dr->prot = rqstp->rq_prot; > > @@ -935,7 +937,13 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) > > dr->xprt = rqstp->rq_xprt; > > > > dr->handle.revisit = svc_revisit; > > - return &dr->handle; > > + if (rqstp->rq_save_state) > > + rqstp->rq_save_state(rqstp, dr); > > +out: > > + rqstp->rq_defer_data = NULL; > > + rqstp->rq_save_state = NULL; > > + > > + return dr ? &dr->handle: NULL; > > I may not be thinking this through clearly, but: I think the correct > place to initialize rq_defer_data and rq_save_state is not here but at > the very start of request processing (in svc_recv or at the top of > svc_process). We shouldn't depend on svc_defer()--it may never get > called. In patch 3/3, nfsd4_proc_compound sets the rq_defer_data and rq_defer_state prior to operation processing, and then sets them back to NULL after operation processing, so the initialization to NULL in svc_defer is not needed. We could indeed initialize these fields at the top of svc_process in which case the setting to NULL at the end of operation processing in nfsd4_proc_compound would not be needed. Your call. -->Andy > --b. > > > } > > > > /* > > @@ -959,6 +967,8 @@ static int svc_deferred_recv(struct svc_rqst *rqstp) > > rqstp->rq_xprt_hlen = dr->xprt_hlen; > > rqstp->rq_daddr = dr->daddr; > > rqstp->rq_respages = rqstp->rq_pages; > > + if (dr->restore_state) > > + dr->restore_state(rqstp, dr); > > return (dr->argslen<<2) - dr->xprt_hlen; > > } > > > > -- > > 1.5.4.3 > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] SUNRPC add deferral processing callbacks 2008-10-15 21:00 ` [PATCH 1/3] SUNRPC add deferral processing callbacks andros 2008-10-15 21:00 ` [PATCH 2/3] NFSD save, restore, and release deferred result pages andros 2008-10-17 17:48 ` [PATCH 1/3] SUNRPC add deferral processing callbacks J. Bruce Fields @ 2008-10-17 20:35 ` Talpey, Thomas [not found] ` <RTPCLUEXC1-PRDf3sll000001cb-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org> 2 siblings, 1 reply; 18+ messages in thread From: Talpey, Thomas @ 2008-10-17 20:35 UTC (permalink / raw) To: andros; +Cc: linux-nfs, bfields At 05:00 PM 10/15/2008, andros@netapp.com wrote: >From: Andy Adamson <andros@netapp.com> > >For EOS, NFSD compound RPC deferred processing should restart operation which >caused the deferral. > >Add a callback and a defer_data pointer in svc_rqst to enable svc_defer to >save partial result state in the deferral request. > >Add page pointer storage to svc_deferred_req to cache the pages holding the >partially processed request. > >Add callbacks and a defer_data pointer in svc_deferred_request to enable >svc_deferred_recv to restore and release the partial result state. > >Signed-off-by: Andy Adamson<andros-HgOvQuBEEgQAvxtiuMwx3w@public.gmane.org> >--- > include/linux/sunrpc/svc.h | 10 ++++++++++ > net/sunrpc/svc_xprt.c | 20 +++++++++++++++----- > 2 files changed, 25 insertions(+), 5 deletions(-) > >diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h >index dc69068..46a4097 100644 >--- a/include/linux/sunrpc/svc.h >+++ b/include/linux/sunrpc/svc.h >@@ -215,6 +215,9 @@ struct svc_rqst { > struct svc_cred rq_cred; /* auth info */ > void * rq_xprt_ctxt; /* transport specific context ptr */ > struct svc_deferred_req*rq_deferred; /* deferred request we are replaying */ >+ /* callback to save deferred request state */ >+ void (*rq_save_state)(struct svc_rqst *, struct svc_deferred_req *); >+ void *rq_defer_data; /* defer state data to save */ > > size_t rq_xprt_hlen; /* xprt header len */ > struct xdr_buf rq_arg; >@@ -323,6 +326,13 @@ struct svc_deferred_req { > union svc_addr_u daddr; /* where reply must come from */ > struct cache_deferred_req handle; > size_t xprt_hlen; >+ /* callbacks to restore and release deferred request state >+ * set in rq_save_state */ >+ void (*restore_state)(struct svc_rqst *, struct svc_deferred_req *); >+ void (*release_state)(struct svc_deferred_req *); >+ void *defer_data; /* defer state data */ >+ struct page *respages[RPCSVC_MAXPAGES]; Is it worthwhile to make this a dynamically allocated pointer? If the max pages size is large (1MB means 256), it's quite a large array for something that may rarely be so full. >+ short resused; I find "resused" to be a little bit confusing. It looks a lot like "reused" and in any case seems to be a pagecount. Maybe, respagecount? > int argslen; > __be32 args[0]; > }; >diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c >index e46c825..531b325 100644 >--- a/net/sunrpc/svc_xprt.c >+++ b/net/sunrpc/svc_xprt.c >@@ -876,6 +876,8 @@ static void svc_revisit(struct cache_deferred_req >*dreq, int too_many) > struct svc_xprt *xprt = dr->xprt; > > if (too_many) { >+ if (dr->release_state) >+ dr->release_state(dr); > svc_xprt_put(xprt); > kfree(dr); > return; >@@ -902,10 +904,10 @@ static void svc_revisit(struct >cache_deferred_req *dreq, int too_many) > static struct cache_deferred_req *svc_defer(struct cache_req *req) > { > struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle); >- struct svc_deferred_req *dr; >+ struct svc_deferred_req *dr = NULL; > > if (rqstp->rq_arg.page_len) >- return NULL; /* if more than a page, give up FIXME */ >+ goto out; /* if more than a page, give up FIXME */ > if (rqstp->rq_deferred) { > dr = rqstp->rq_deferred; > rqstp->rq_deferred = NULL; >@@ -914,9 +916,9 @@ static struct cache_deferred_req *svc_defer(struct >cache_req *req) > size_t size; > /* FIXME maybe discard if size too large */ > size = sizeof(struct svc_deferred_req) + rqstp->rq_arg.len; >- dr = kmalloc(size, GFP_KERNEL); >+ dr = kzalloc(size, GFP_KERNEL); Does the entire buffer need to be zeroed? Again, if the rq_arg.len is large, this may be a needless overhead. Tom. > if (dr == NULL) >- return NULL; >+ goto out; > > dr->handle.owner = rqstp->rq_server; > dr->prot = rqstp->rq_prot; >@@ -935,7 +937,13 @@ static struct cache_deferred_req >*svc_defer(struct cache_req *req) > dr->xprt = rqstp->rq_xprt; > > dr->handle.revisit = svc_revisit; >- return &dr->handle; >+ if (rqstp->rq_save_state) >+ rqstp->rq_save_state(rqstp, dr); >+out: >+ rqstp->rq_defer_data = NULL; >+ rqstp->rq_save_state = NULL; >+ >+ return dr ? &dr->handle: NULL; > } > > /* >@@ -959,6 +967,8 @@ static int svc_deferred_recv(struct svc_rqst *rqstp) > rqstp->rq_xprt_hlen = dr->xprt_hlen; > rqstp->rq_daddr = dr->daddr; > rqstp->rq_respages = rqstp->rq_pages; >+ if (dr->restore_state) >+ dr->restore_state(rqstp, dr); > return (dr->argslen<<2) - dr->xprt_hlen; > } > >-- >1.5.4.3 > >-- >To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <RTPCLUEXC1-PRDf3sll000001cb-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>]
* Re: [PATCH 1/3] SUNRPC add deferral processing callbacks [not found] ` <RTPCLUEXC1-PRDf3sll000001cb-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org> @ 2008-10-20 18:42 ` J. Bruce Fields 0 siblings, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-10-20 18:42 UTC (permalink / raw) To: Talpey, Thomas; +Cc: andros, linux-nfs On Fri, Oct 17, 2008 at 04:35:28PM -0400, Talpey, Thomas wrote: > >diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > >index dc69068..46a4097 100644 > >--- a/include/linux/sunrpc/svc.h > >+++ b/include/linux/sunrpc/svc.h > >@@ -215,6 +215,9 @@ struct svc_rqst { > > struct svc_cred rq_cred; /* auth info */ > > void * rq_xprt_ctxt; /* transport specific context ptr */ > > struct svc_deferred_req*rq_deferred; /* deferred request we are replaying */ > >+ /* callback to save deferred request state */ > >+ void (*rq_save_state)(struct svc_rqst *, struct svc_deferred_req *); > >+ void *rq_defer_data; /* defer state data to save */ > > > > size_t rq_xprt_hlen; /* xprt header len */ > > struct xdr_buf rq_arg; > >@@ -323,6 +326,13 @@ struct svc_deferred_req { > > union svc_addr_u daddr; /* where reply must come from */ > > struct cache_deferred_req handle; > > size_t xprt_hlen; > >+ /* callbacks to restore and release deferred request state > >+ * set in rq_save_state */ > >+ void (*restore_state)(struct svc_rqst *, struct svc_deferred_req *); > >+ void (*release_state)(struct svc_deferred_req *); > >+ void *defer_data; /* defer state data */ > >+ struct page *respages[RPCSVC_MAXPAGES]; > > Is it worthwhile to make this a dynamically allocated pointer? If the max > pages size is large (1MB means 256), So that's a 1k array with 32 bit pointers, a 2k array with 64. We currently allow a maximum of 300 deferred requests (DFR_MAX in net/sunrpc/cache.c). It'd be nice to not turn that into two allocations if the typical case only requires one page, so we'd probably want to allow either a single page pointer or a dynamically-allocated array. I don't know how much to care. --b. > it's quite a large array for something > that may rarely be so full. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] NFSD EOS deferral 2008-10-15 21:00 [PATCH 0/3] NFSD EOS deferral andros 2008-10-15 21:00 ` [PATCH 1/3] SUNRPC add deferral processing callbacks andros @ 2008-10-17 17:44 ` J. Bruce Fields 2008-10-17 18:44 ` Andy Adamson 2008-10-17 18:59 ` Marc Eshel 1 sibling, 2 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-10-17 17:44 UTC (permalink / raw) To: andros; +Cc: linux-nfs On Wed, Oct 15, 2008 at 05:00:23PM -0400, andros@netapp.com wrote: > Here's a patch set for review - it compiles and seems to work, but I haven't > done stress testing, nor testing of all of the combinations of deferral cases. > > A deferral occurs when NFSD needs information from an rpc cache, and an upcall > is required. Instead of NFSD waiting for the cache to be filled by the upcall, > the RPC request is inserted back into the receive stream for processing at a > later time. > > Exactly once semantics require that NFSD compound RPC deferral processing > restart at the operation that caused the deferral, instead of reprocessing the > full compound RPC from the start possibly repeating operation processing. > These patches add three callbacks, a data pointer, and page pointer storage > to the sunrpc svc deferral architecture that NFSD uses to accomplish this goal. > > Deferrals that do not define the callbacks act as before. Care has been taken > to ensure that combinations of deferrals - those from the NFSv4 server with > the callbacks defined, and those from the RPC layer without the callbacks > defined work together correctly. > > Thoughts, comments and suggestions are really appreciated... Requests longer than a page are still not deferred, so large writes that trigger upcalls still get an ERR_DELAY. OK, probably no big deal. I don't think we can apply this until we have some way to track the number and size of deferred requests outstanding and fall back on ERR_DELAY if it's too much. I do sometimes wonder whether continuing with the current deferred-request approach is best, though: - If we're saving out large parts of the request anyway (the response pages), then maybe we should just keep rqstp's on the deferred request queue instead of copying to a separate deferred_request structure. - Then as long as we're saving all that request data, is there really significant savings from not keeping a thread around too? So I wonder if it'd be better just to let threads sleep (and be more aggressive about starting up new threads if appropriate, and add some other heuristics to avoid a situation where the whole server stalls on a temporarily wedged userspace daemon). --b. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] NFSD EOS deferral 2008-10-17 17:44 ` [PATCH 0/3] NFSD EOS deferral J. Bruce Fields @ 2008-10-17 18:44 ` Andy Adamson 2008-10-17 18:59 ` Marc Eshel 1 sibling, 0 replies; 18+ messages in thread From: Andy Adamson @ 2008-10-17 18:44 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs On Fri, 2008-10-17 at 13:44 -0400, J. Bruce Fields wrote: > On Wed, Oct 15, 2008 at 05:00:23PM -0400, andros@netapp.com wrote: > > Here's a patch set for review - it compiles and seems to work, but I haven't > > done stress testing, nor testing of all of the combinations of deferral cases. > > > > A deferral occurs when NFSD needs information from an rpc cache, and an upcall > > is required. Instead of NFSD waiting for the cache to be filled by the upcall, > > the RPC request is inserted back into the receive stream for processing at a > > later time. > > > > Exactly once semantics require that NFSD compound RPC deferral processing > > restart at the operation that caused the deferral, instead of reprocessing the > > full compound RPC from the start possibly repeating operation processing. > > These patches add three callbacks, a data pointer, and page pointer storage > > to the sunrpc svc deferral architecture that NFSD uses to accomplish this goal. > > > > Deferrals that do not define the callbacks act as before. Care has been taken > > to ensure that combinations of deferrals - those from the NFSv4 server with > > the callbacks defined, and those from the RPC layer without the callbacks > > defined work together correctly. > > > > Thoughts, comments and suggestions are really appreciated... > > Requests longer than a page are still not deferred, so large writes that > trigger upcalls still get an ERR_DELAY. OK, probably no big deal. > > I don't think we can apply this until we have some way to track the > number and size of deferred requests outstanding and fall back on > ERR_DELAY if it's too much. > > I do sometimes wonder whether continuing with the current > deferred-request approach is best, though: > > - If we're saving out large parts of the request anyway (the > response pages), then maybe we should just keep rqstp's > on the deferred request queue instead of copying to a separate > deferred_request structure. > - Then as long as we're saving all that request data, is there > really significant savings from not keeping a thread around > too? True, especially if we also save large arg data, as in large writes that trigger upcalls. > > So I wonder if it'd be better just to let threads sleep (and be more > aggressive about starting up new threads if appropriate, and add some > other heuristics to avoid a situation where the whole server stalls on a > temporarily wedged userspace daemon). > > --b. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] NFSD EOS deferral 2008-10-17 17:44 ` [PATCH 0/3] NFSD EOS deferral J. Bruce Fields 2008-10-17 18:44 ` Andy Adamson @ 2008-10-17 18:59 ` Marc Eshel [not found] ` <OF9E4C4BA6.37418EC7-ON882574E5.0067FB2B-882574E5.0068487F@ us.ibm.com> 1 sibling, 1 reply; 18+ messages in thread From: Marc Eshel @ 2008-10-17 18:59 UTC (permalink / raw) To: J. Bruce Fields; +Cc: andros, linux-nfs linux-nfs-owner@vger.kernel.org wrote on 10/17/2008 10:44:54 AM: > "J. Bruce Fields" <bfields@fieldses.org> > > On Wed, Oct 15, 2008 at 05:00:23PM -0400, andros@netapp.com wrote: > > Here's a patch set for review - it compiles and seems to work, butI haven't > > done stress testing, nor testing of all of the combinations of > deferral cases. > > > > A deferral occurs when NFSD needs information from an rpc cache, > and an upcall > > is required. Instead of NFSD waiting for the cache to be filled by > the upcall, > > the RPC request is inserted back into the receive stream for processing at a > > later time. > > > > Exactly once semantics require that NFSD compound RPC deferral processing > > restart at the operation that caused the deferral, instead of > reprocessing the > > full compound RPC from the start possibly repeating operation processing. > > These patches add three callbacks, a data pointer, and page pointer storage > > to the sunrpc svc deferral architecture that NFSD uses to > accomplish this goal. > > > > Deferrals that do not define the callbacks act as before. Care hasbeen taken > > to ensure that combinations of deferrals - those from the NFSv4 server with > > the callbacks defined, and those from the RPC layer without the callbacks > > defined work together correctly. > > > > Thoughts, comments and suggestions are really appreciated... > > Requests longer than a page are still not deferred, so large writes that > trigger upcalls still get an ERR_DELAY. OK, probably no big deal. > > I don't think we can apply this until we have some way to track the > number and size of deferred requests outstanding and fall back on > ERR_DELAY if it's too much. But I thought that the problem here is that the Linux NFS client doesn't handle this return code properly. Marc. > > I do sometimes wonder whether continuing with the current > deferred-request approach is best, though: > > - If we're saving out large parts of the request anyway (the > response pages), then maybe we should just keep rqstp's > on the deferred request queue instead of copying to a separate > deferred_request structure. > - Then as long as we're saving all that request data, is there > really significant savings from not keeping a thread around > too? > > So I wonder if it'd be better just to let threads sleep (and be more > aggressive about starting up new threads if appropriate, and add some > other heuristics to avoid a situation where the whole server stalls on a > temporarily wedged userspace daemon). > > --b. > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <OF9E4C4BA6.37418EC7-ON882574E5.0067FB2B-882574E5.0068487F@ us.ibm.com>]
* Re: [PATCH 0/3] NFSD EOS deferral [not found] ` <OF9E4C4BA6.37418EC7-ON882574E5.0067FB2B-882574E5.0068487F@ us.ibm.com> @ 2008-10-17 20:26 ` Talpey, Thomas [not found] ` <RTPCLUEXC1-PRDidcDj000001ca-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Talpey, Thomas @ 2008-10-17 20:26 UTC (permalink / raw) To: Marc Eshel, J. Bruce Fields, andros; +Cc: linux-nfs At 02:59 PM 10/17/2008, Marc Eshel wrote: >linux-nfs-owner@vger.kernel.org wrote on 10/17/2008 10:44:54 AM: > >> "J. Bruce Fields" <bfields@fieldses.org> >> Requests longer than a page are still not deferred, so large writes that >> trigger upcalls still get an ERR_DELAY. OK, probably no big deal. >> >> I don't think we can apply this until we have some way to track the >> number and size of deferred requests outstanding and fall back on >> ERR_DELAY if it's too much. > >But I thought that the problem here is that the Linux NFS client doesn't >handle this return code properly. Definitely this is an issue. Early clients do one of two things, they either pass the error back to the application, or they enter a buzz loop resending the operation with no delay. Later clients back off, but for a constant five seconds. Either way, the server is generally better off gritting its teeth and completing the operation. Blocking server threads is drastic, but in effect it will stall the client queues and "push back". The issue on Linux is the small number of nfsd contexts involved. It could lead to significant issues possibly including DOS attack. Dropping connections (judiciously) could be used instead of blocking the last few threads, though even that will have consequences. The easy way to test all this is decorate /etc/exports with lots of names, then break the nameservice and start sending requests from many new clients. It's very hard to get it all right. Tom. ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <RTPCLUEXC1-PRDidcDj000001ca-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>]
* Re: [PATCH 0/3] NFSD EOS deferral [not found] ` <RTPCLUEXC1-PRDidcDj000001ca-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org> @ 2008-10-17 20:36 ` J. Bruce Fields 2008-10-17 20:51 ` Talpey, Thomas 0 siblings, 1 reply; 18+ messages in thread From: J. Bruce Fields @ 2008-10-17 20:36 UTC (permalink / raw) To: Talpey, Thomas; +Cc: Marc Eshel, andros, linux-nfs On Fri, Oct 17, 2008 at 04:26:18PM -0400, Talpey, Thomas wrote: > At 02:59 PM 10/17/2008, Marc Eshel wrote: > >linux-nfs-owner@vger.kernel.org wrote on 10/17/2008 10:44:54 AM: > > > >> "J. Bruce Fields" <bfields@fieldses.org> > >> Requests longer than a page are still not deferred, so large writes that > >> trigger upcalls still get an ERR_DELAY. OK, probably no big deal. > >> > >> I don't think we can apply this until we have some way to track the > >> number and size of deferred requests outstanding and fall back on > >> ERR_DELAY if it's too much. > > > >But I thought that the problem here is that the Linux NFS client doesn't > >handle this return code properly. > > Definitely this is an issue. Early clients do one of two things, they either > pass the error back to the application, or they enter a buzz loop resending > the operation with no delay. Later clients back off, but for a constant > five seconds. I haven't tested it, but from fs/nfs/nfs4proc.c:nfs4_delay() it appears to start at a tenth of a second and then do exponential backoff (up to 15 seconds). Looks to me like the code's been that way since at least 2.6.19. --b. > Either way, the server is generally better off gritting its > teeth and completing the operation. > > Blocking server threads is drastic, but in effect it will stall the client > queues and "push back". The issue on Linux is the small number of > nfsd contexts involved. It could lead to significant issues possibly > including DOS attack. Dropping connections (judiciously) could be > used instead of blocking the last few threads, though even that will > have consequences. > > The easy way to test all this is decorate /etc/exports with lots of > names, then break the nameservice and start sending requests from > many new clients. It's very hard to get it all right. > > Tom. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] NFSD EOS deferral 2008-10-17 20:36 ` J. Bruce Fields @ 2008-10-17 20:51 ` Talpey, Thomas [not found] ` <RTPCLUEXC1-PRDarcR4000001cd-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Talpey, Thomas @ 2008-10-17 20:51 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Marc Eshel, andros, linux-nfs At 04:36 PM 10/17/2008, J. Bruce Fields wrote: >On Fri, Oct 17, 2008 at 04:26:18PM -0400, Talpey, Thomas wrote: >> At 02:59 PM 10/17/2008, Marc Eshel wrote: >> >linux-nfs-owner@vger.kernel.org wrote on 10/17/2008 10:44:54 AM: >> > >> >> "J. Bruce Fields" <bfields@fieldses.org> >> >> Requests longer than a page are still not deferred, so large writes that >> >> trigger upcalls still get an ERR_DELAY. OK, probably no big deal. >> >> >> >> I don't think we can apply this until we have some way to track the >> >> number and size of deferred requests outstanding and fall back on >> >> ERR_DELAY if it's too much. >> > >> >But I thought that the problem here is that the Linux NFS client doesn't >> >handle this return code properly. >> >> Definitely this is an issue. Early clients do one of two things, they either >> pass the error back to the application, or they enter a buzz loop resending >> the operation with no delay. Later clients back off, but for a constant >> five seconds. > >I haven't tested it, but from fs/nfs/nfs4proc.c:nfs4_delay() it appears >to start at a tenth of a second and then do exponential backoff (up to >15 seconds). Looks to me like the code's been that way since at least >2.6.19. I was referring to NFSv3, actually - also impacted by this codepath. But I'll take the opportunity to point out that we'll get 5 retries from an NFSv4 client before 2 seconds go by, and only one from NFSv3 in twice that. In either case, it's a heck of a bad trade to return "I'm busy" only to have your bell rung repeatedly in response. Sorry, I have always hated EJUKEBOX. Tom. > >--b. > >> Either way, the server is generally better off gritting its >> teeth and completing the operation. >> >> Blocking server threads is drastic, but in effect it will stall the client >> queues and "push back". The issue on Linux is the small number of >> nfsd contexts involved. It could lead to significant issues possibly >> including DOS attack. Dropping connections (judiciously) could be >> used instead of blocking the last few threads, though even that will >> have consequences. >> >> The easy way to test all this is decorate /etc/exports with lots of >> names, then break the nameservice and start sending requests from >> many new clients. It's very hard to get it all right. >> >> Tom. >> ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <RTPCLUEXC1-PRDarcR4000001cd-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>]
* Re: [PATCH 0/3] NFSD EOS deferral [not found] ` <RTPCLUEXC1-PRDarcR4000001cd-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org> @ 2008-10-20 18:06 ` J. Bruce Fields 0 siblings, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-10-20 18:06 UTC (permalink / raw) To: Talpey, Thomas; +Cc: Marc Eshel, andros, linux-nfs On Fri, Oct 17, 2008 at 04:51:00PM -0400, Talpey, Thomas wrote: > At 04:36 PM 10/17/2008, J. Bruce Fields wrote: > >On Fri, Oct 17, 2008 at 04:26:18PM -0400, Talpey, Thomas wrote: > >> At 02:59 PM 10/17/2008, Marc Eshel wrote: > >> >linux-nfs-owner@vger.kernel.org wrote on 10/17/2008 10:44:54 AM: > >> > > >> >> "J. Bruce Fields" <bfields@fieldses.org> > >> >> Requests longer than a page are still not deferred, so large writes that > >> >> trigger upcalls still get an ERR_DELAY. OK, probably no big deal. > >> >> > >> >> I don't think we can apply this until we have some way to track the > >> >> number and size of deferred requests outstanding and fall back on > >> >> ERR_DELAY if it's too much. > >> > > >> >But I thought that the problem here is that the Linux NFS client doesn't > >> >handle this return code properly. > >> > >> Definitely this is an issue. Early clients do one of two things, they either > >> pass the error back to the application, or they enter a buzz loop resending > >> the operation with no delay. Later clients back off, but for a constant > >> five seconds. > > > >I haven't tested it, but from fs/nfs/nfs4proc.c:nfs4_delay() it appears > >to start at a tenth of a second and then do exponential backoff (up to > >15 seconds). Looks to me like the code's been that way since at least > >2.6.19. > > I was referring to NFSv3, actually - also impacted by this codepath. OK, sorry. Hm, taking another look: nfs3_rpc_wrapper() uses a constant 5 second delay, and nfs4_async_handle_error() uses a constant 15 second delay, so it's only nfs4_handle_exception that does the exponential backoff. Maybe we could fix that. --b. > But I'll take the opportunity to point out that we'll get 5 retries from > an NFSv4 client before 2 seconds go by, and only one from NFSv3 > in twice that. In either case, it's a heck of a bad trade to return "I'm > busy" only to have your bell rung repeatedly in response. > > Sorry, I have always hated EJUKEBOX. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/3] NFSD EOS deferral @ 2008-10-22 18:12 andros 2008-10-22 18:12 ` [PATCH 1/3] SUNRPC add deferral processing callbacks andros 0 siblings, 1 reply; 18+ messages in thread From: andros @ 2008-10-22 18:12 UTC (permalink / raw) To: linux-nfs; +Cc: androsadamson Here's a resend of the deferral patch set for review , responding to comments on the last patch set. A deferral occurs when NFSD needs information from an rpc cache, and an upcall is required. Instead of NFSD waiting for the cache to be filled by the upcall, the RPC request is inserted back into the receive stream for processing at a later time. Exactly once semantics require that NFSD compound RPC deferral processing restart at the operation that caused the deferral, instead of reprocessing the full compound RPC from the start possibly repeating operation processing. These patches add three callbacks, a data pointer, and dynamic page pointer storage to the sunrpc svc deferral architecture that NFSD uses to accomplish this goal. Deferrals that do not define the callbacks act as before. Care has been taken to ensure that combinations of deferrals - those from the NFSv4 server with the callbacks defined, and those from the RPC layer without the callbacks defined work together correctly. NEW: I've limited the number of pages held by all deferrals to the number of pages in the rpc maximum payload - the same as adding another nfsd thread. Most deferrals (most requests) fit in a page or two. We want to service the rare deferral where a large readdir or read reply is followed by a deferral, this limit allows the servicing of one such request at time. I've changed the deferral page pointer storage from static to dynamic. The new svc_rqst fields are initialized in the beginning of svc_process along with the rest of the fields. As always, thoughts, comments and suggestions welcome. -->Andy ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] SUNRPC add deferral processing callbacks 2008-10-22 18:12 andros @ 2008-10-22 18:12 ` andros 2008-10-22 18:12 ` [PATCH 2/3] NFSD save, restore, and release deferred result pages andros 0 siblings, 1 reply; 18+ messages in thread From: andros @ 2008-10-22 18:12 UTC (permalink / raw) To: linux-nfs; +Cc: androsadamson, Andy Adamson From: Andy Adamson <andros@netapp.com> For EOS, NFSD compound RPC deferred processing should restart operation which caused the deferral. Add a callback and a defer_data pointer in svc_rqst to enable svc_defer to save partial result state in the deferral request. Dynamically allocate page pointers in the save state callback. A failure to save state will free the deferred request and return NULL which signals the cache to return -ETIMEDOUT (NFSERR_DELAY). Add page pointer storage to svc_deferred_req to cache the pages holding the partially processed request. Add callbacks and a defer_data pointer in svc_deferred_request to enable svc_deferred_recv to restore and release the partial result state. Signed-off-by: Andy Adamson<andros@netapp.com> --- include/linux/sunrpc/svc.h | 10 ++++++++++ net/sunrpc/svc.c | 4 ++++ net/sunrpc/svc_xprt.c | 10 +++++++++- 3 files changed, 23 insertions(+), 1 deletions(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 3afe7fb..8cc8a74 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -216,6 +216,9 @@ struct svc_rqst { struct svc_cred rq_cred; /* auth info */ void * rq_xprt_ctxt; /* transport specific context ptr */ struct svc_deferred_req*rq_deferred; /* deferred request we are replaying */ + /* callback to save deferred request state */ + int (*rq_save_state)(struct svc_rqst *, struct svc_deferred_req *); + void *rq_defer_data; /* defer state data to save */ size_t rq_xprt_hlen; /* xprt header len */ struct xdr_buf rq_arg; @@ -324,6 +327,13 @@ struct svc_deferred_req { union svc_addr_u daddr; /* where reply must come from */ struct cache_deferred_req handle; size_t xprt_hlen; + /* callbacks to restore and release deferred request state + * set in rq_save_state */ + void (*restore_state)(struct svc_rqst *, struct svc_deferred_req *); + void (*release_state)(struct svc_deferred_req *); + void *defer_data; /* defer state data */ + struct page **respages; + int respages_used; int argslen; __be32 args[0]; }; diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 54c98d8..8a6c69c 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1024,6 +1024,10 @@ svc_process(struct svc_rqst *rqstp) /* Will be turned off only in gss privacy case: */ rqstp->rq_splice_ok = 1; + /* Reset deferred processing */ + rqstp->rq_defer_data = NULL; + rqstp->rq_save_state = NULL; + /* Setup reply header */ rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index bf5b5cd..0a8d6ab 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -903,6 +903,8 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many) struct svc_xprt *xprt = dr->xprt; if (too_many) { + if (dr->release_state) + dr->release_state(dr); svc_xprt_put(xprt); kfree(dr); return; @@ -941,7 +943,7 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) size_t size; /* FIXME maybe discard if size too large */ size = sizeof(struct svc_deferred_req) + rqstp->rq_arg.len; - dr = kmalloc(size, GFP_KERNEL); + dr = kzalloc(size, GFP_KERNEL); if (dr == NULL) return NULL; @@ -958,6 +960,10 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) memcpy(dr->args, rqstp->rq_arg.head[0].iov_base - skip, dr->argslen << 2); } + if (rqstp->rq_save_state && !rqstp->rq_save_state(rqstp, dr)) { + kfree(dr); + return NULL; + } svc_xprt_get(rqstp->rq_xprt); dr->xprt = rqstp->rq_xprt; @@ -986,6 +992,8 @@ static int svc_deferred_recv(struct svc_rqst *rqstp) rqstp->rq_xprt_hlen = dr->xprt_hlen; rqstp->rq_daddr = dr->daddr; rqstp->rq_respages = rqstp->rq_pages; + if (dr->restore_state) + dr->restore_state(rqstp, dr); return (dr->argslen<<2) - dr->xprt_hlen; } -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/3] NFSD save, restore, and release deferred result pages 2008-10-22 18:12 ` [PATCH 1/3] SUNRPC add deferral processing callbacks andros @ 2008-10-22 18:12 ` andros 2008-10-22 18:12 ` [PATCH 3/3] NFSD deferral processing andros 0 siblings, 1 reply; 18+ messages in thread From: andros @ 2008-10-22 18:12 UTC (permalink / raw) To: linux-nfs; +Cc: androsadamson, Andy Adamson From: Andy Adamson <andros@netapp.com> Dynamically allocate the deferred request page pointer array only if there are enough deferral pages available. Limit the number of available deferral pages to the number of pages in the maximum rpc payload. This allows for one deferral at at time of a request that requires the maximum payload. Most deferrals require a single page. Implement the rq_save_state, resume_state, and release_state RPC deferral callbacks. Save the reply pages in struct svc_deferred_req. Clear the svc_deferred_req respages in the save_state callback to setup for another NFSD operation deferral. Signed-off-by: Andy Adamson<andros@netapp.com> --- fs/nfsd/nfs4proc.c | 129 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/sunrpc/svc.h | 1 + 2 files changed, 130 insertions(+), 0 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 669461e..97f2d25 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -836,6 +836,135 @@ static struct nfsd4_compound_state *cstate_alloc(void) return cstate; } +/* + * RPC deferral callbacks + */ + +void +nfsd4_move_pages(struct page **topages, struct page **frompages, int count) +{ + int i; + + for (i = 0; i < count; i++) { + topages[i] = frompages[i]; + if (!topages[i]) + continue; + get_page(topages[i]); + } +} + +void +nfsd4_cache_rqst_pages(struct svc_rqst *rqstp, struct page **respages, + int *resused) +{ + *resused = rqstp->rq_resused; + nfsd4_move_pages(respages, rqstp->rq_respages, rqstp->rq_resused); +} + +void +nfsd4_restore_rqst_pages(struct svc_rqst *rqstp, struct page **respages, + int resused) +{ + /* release allocated result pages to be replaced from the cache */ + svc_free_res_pages(rqstp); + + rqstp->rq_resused = resused; + nfsd4_move_pages(rqstp->rq_respages, respages, resused); +} + +static void +nfsd4_clear_respages(struct page **respages, int resused) +{ + int i; + + for (i = 0; i < resused; i++) { + if (!respages[i]) + continue; + put_page(respages[i]); + respages[i] = NULL; + } +} + +/* + * Limit the number of pages held by any deferral to the + * number of pages in the maximum rpc_payload. + */ +static struct page** +nfsd4_alloc_deferred_respages(struct svc_rqst *rqstp) +{ + struct page **new = NULL; + u32 maxpages = svc_max_payload(rqstp) >> PAGE_SHIFT; + + new = kcalloc(rqstp->rq_resused, sizeof(struct page *), GFP_KERNEL); + if (!new) + return new; + spin_lock(&nfsd_serv->sv_lock); + if (nfsd_serv->sv_defer_pages_used + rqstp->rq_resused <= maxpages) { + nfsd_serv->sv_defer_pages_used += rqstp->rq_resused; + spin_unlock(&nfsd_serv->sv_lock); + } else { + spin_unlock(&nfsd_serv->sv_lock); + kfree(new); + new = NULL; + } + return new; +} + +void +nfsd4_return_deferred_respages(struct svc_deferred_req *dreq) +{ + nfsd4_clear_respages(dreq->respages, dreq->respages_used); + spin_lock(&nfsd_serv->sv_lock); + nfsd_serv->sv_defer_pages_used -= dreq->respages_used; + spin_unlock(&nfsd_serv->sv_lock); + kfree(dreq->respages); + dreq->respages = NULL; + dreq->respages_used = 0; +} + +static void +nfsd4_release_deferred_state(struct svc_deferred_req *dreq) +{ + nfsd4_return_deferred_respages(dreq); + cstate_free(dreq->defer_data); +} + +static void +nfsd4_restore_deferred_state(struct svc_rqst *rqstp, + struct svc_deferred_req *dreq) +{ + nfsd4_restore_rqst_pages(rqstp, dreq->respages, dreq->respages_used); + /* Reset defer_data for a NFSD deferral revisit interrupted + * by a non-NFSD deferral */ + rqstp->rq_defer_data = dreq->defer_data; +} + +static int +nfsd4_save_deferred_state(struct svc_rqst *rqstp, + struct svc_deferred_req *dreq) + { + struct nfsd4_compound_state *cstate = + (struct nfsd4_compound_state *)rqstp->rq_defer_data; + + /* From NFSD deferral on a previous operation */ + if (dreq->respages) + nfsd4_return_deferred_respages(dreq); + dreq->respages = nfsd4_alloc_deferred_respages(rqstp); + if (!dreq->respages) + return 0; + dreq->respages_used = rqstp->rq_resused; + + fh_put(&cstate->current_fh); + fh_put(&cstate->save_fh); + + nfsd4_cache_rqst_pages(rqstp, dreq->respages, &dreq->respages_used); + + dreq->defer_data = rqstp->rq_defer_data; + dreq->restore_state = nfsd4_restore_deferred_state; + dreq->release_state = nfsd4_release_deferred_state; + return 1; +} + typedef __be32(*nfsd4op_func)(struct svc_rqst *, struct nfsd4_compound_state *, void *); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 8cc8a74..bf943b7 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -60,6 +60,7 @@ struct svc_serv { unsigned int sv_nrthreads; /* # of server threads */ unsigned int sv_max_payload; /* datagram payload size */ unsigned int sv_max_mesg; /* max_payload + 1 page for overheads */ + unsigned int sv_defer_pages_used; /* deferred pages held */ unsigned int sv_xdrsize; /* XDR buffer size */ struct list_head sv_permsocks; /* all permanent sockets */ -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/3] NFSD deferral processing 2008-10-22 18:12 ` [PATCH 2/3] NFSD save, restore, and release deferred result pages andros @ 2008-10-22 18:12 ` andros 0 siblings, 0 replies; 18+ messages in thread From: andros @ 2008-10-22 18:12 UTC (permalink / raw) To: linux-nfs; +Cc: androsadamson, Andy Adamson From: Andy Adamson <andros@netapp.com> Use a slab cache for nfsd4_compound_state allocation Save the struct nfsd4_compound_state and set the save_state callback for each request for potential deferral handling. If an NFSv4 operation causes a deferral, the save_state callback is called by svc_defer which saves the defer_data with the deferral, and sets the restore_state deferral callback. fh_put is called so that the deferral is not referencing the file handles, allowing umount of the file system. Signed-off-by: Andy Adamson<andros@netapp.com> --- fs/nfsd/nfs4proc.c | 45 ++++++++++++++++++--------------------------- fs/nfsd/nfs4state.c | 37 +++++++++++++++++++++++++++++++++++++ include/linux/nfsd/xdr4.h | 6 ++++++ 3 files changed, 61 insertions(+), 27 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 97f2d25..c556e77 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -813,29 +813,6 @@ static inline void nfsd4_increment_op_stats(u32 opnum) nfsdstats.nfs4_opcount[opnum]++; } -static void cstate_free(struct nfsd4_compound_state *cstate) -{ - if (cstate == NULL) - return; - fh_put(&cstate->current_fh); - fh_put(&cstate->save_fh); - BUG_ON(cstate->replay_owner); - kfree(cstate); -} - -static struct nfsd4_compound_state *cstate_alloc(void) -{ - struct nfsd4_compound_state *cstate; - - cstate = kmalloc(sizeof(struct nfsd4_compound_state), GFP_KERNEL); - if (cstate == NULL) - return NULL; - fh_init(&cstate->current_fh, NFS4_FHSIZE); - fh_init(&cstate->save_fh, NFS4_FHSIZE); - cstate->replay_owner = NULL; - return cstate; -} - /* * RPC deferral callbacks */ @@ -925,8 +902,7 @@ nfsd4_return_deferred_respages(struct svc_deferred_req *dreq) static void nfsd4_release_deferred_state(struct svc_deferred_req *dreq) { - nfsd4_return_deferred_respages(dreq); - cstate_free(dreq->defer_data); + nfsd4_cstate_free(dreq->defer_data, dreq); } static void @@ -1015,12 +991,23 @@ nfsd4_proc_compound(struct svc_rqst *rqstp, goto out; status = nfserr_resource; - cstate = cstate_alloc(); + cstate = nfsd4_cstate_alloc(rqstp); if (cstate == NULL) goto out; + if (rqstp->rq_deferred && rqstp->rq_deferred->defer_data) { + resp->opcnt = cstate->last_op_cnt; + resp->p = cstate->last_op_p; + fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_NOP); + fh_verify(rqstp, &cstate->save_fh, 0, NFSD_MAY_NOP); + } + /* Reset to NULL in svc_process */ + rqstp->rq_defer_data = cstate; + rqstp->rq_save_state = nfsd4_save_deferred_state; + status = nfs_ok; while (!status && resp->opcnt < args->opcnt) { + cstate->last_op_p = resp->p; op = &args->ops[resp->opcnt++]; dprintk("nfsv4 compound op #%d/%d: %d (%s)\n", @@ -1085,8 +1072,12 @@ encode_op: nfsd4_increment_op_stats(op->opnum); } + if (status == nfserr_dropit) { + cstate->last_op_cnt = resp->opcnt - 1; + return status; + } - cstate_free(cstate); + nfsd4_cstate_free(cstate, rqstp->rq_deferred); out: nfsd4_release_compoundargs(args); dprintk("nfsv4 compound returned %d\n", ntohl(status)); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 0cc7ff5..6ab67fc 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -90,6 +90,7 @@ static struct kmem_cache *stateowner_slab = NULL; static struct kmem_cache *file_slab = NULL; static struct kmem_cache *stateid_slab = NULL; static struct kmem_cache *deleg_slab = NULL; +static struct kmem_cache *cstate_slab; void nfs4_lock_state(void) @@ -441,6 +442,37 @@ static struct nfs4_client *create_client(struct xdr_netobj name, char *recdir) return clp; } +void nfsd4_cstate_free(struct nfsd4_compound_state *cstate, + struct svc_deferred_req *dreq) +{ + if (dreq && dreq->release_state) + nfsd4_return_deferred_respages(dreq); + if (cstate == NULL) + return; + fh_put(&cstate->current_fh); + fh_put(&cstate->save_fh); + BUG_ON(cstate->replay_owner); + kmem_cache_free(cstate_slab, cstate); +} + +struct nfsd4_compound_state *nfsd4_cstate_alloc(struct svc_rqst *rqstp) +{ + struct nfsd4_compound_state *cstate; + + if (rqstp->rq_deferred && rqstp->rq_deferred->defer_data) { + cstate = rqstp->rq_deferred->defer_data; + goto out; + } + cstate = kmem_cache_alloc(cstate_slab, GFP_KERNEL); + if (cstate == NULL) + return NULL; + fh_init(&cstate->current_fh, NFS4_FHSIZE); + fh_init(&cstate->save_fh, NFS4_FHSIZE); + cstate->replay_owner = NULL; +out: + return cstate; +} + static void copy_verf(struct nfs4_client *target, nfs4_verifier *source) { memcpy(target->cl_verifier.data, source->data, @@ -940,6 +972,7 @@ nfsd4_free_slabs(void) nfsd4_free_slab(&file_slab); nfsd4_free_slab(&stateid_slab); nfsd4_free_slab(&deleg_slab); + nfsd4_free_slab(&cstate_slab); } static int @@ -961,6 +994,10 @@ nfsd4_init_slabs(void) sizeof(struct nfs4_delegation), 0, 0, NULL); if (deleg_slab == NULL) goto out_nomem; + cstate_slab = kmem_cache_create("nfsd4_compound_states", + sizeof(struct nfsd4_compound_state), 0, 0, NULL); + if (cstate_slab == NULL) + goto out_nomem; return 0; out_nomem: nfsd4_free_slabs(); diff --git a/include/linux/nfsd/xdr4.h b/include/linux/nfsd/xdr4.h index 27bd3e3..ced602c 100644 --- a/include/linux/nfsd/xdr4.h +++ b/include/linux/nfsd/xdr4.h @@ -48,6 +48,8 @@ struct nfsd4_compound_state { struct svc_fh current_fh; struct svc_fh save_fh; struct nfs4_stateowner *replay_owner; + __be32 *last_op_p; + u32 last_op_cnt; }; struct nfsd4_change_info { @@ -442,6 +444,10 @@ void nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op); __be32 nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, __be32 *buffer, int *countp, u32 *bmval, struct svc_rqst *, int ignore_crossmnt); +extern void nfsd4_return_deferred_respages(struct svc_deferred_req *dreq); +extern struct nfsd4_compound_state *nfsd4_cstate_alloc(struct svc_rqst *rqstp); +extern void nfsd4_cstate_free(struct nfsd4_compound_state *cstate, + struct svc_deferred_req *dreq); extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *, struct nfsd4_setclientid *setclid); -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
end of thread, other threads:[~2008-10-22 18:12 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-15 21:00 [PATCH 0/3] NFSD EOS deferral andros
2008-10-15 21:00 ` [PATCH 1/3] SUNRPC add deferral processing callbacks andros
2008-10-15 21:00 ` [PATCH 2/3] NFSD save, restore, and release deferred result pages andros
2008-10-15 21:00 ` [PATCH 3/3] NFSD deferral processing andros
2008-10-17 20:46 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRDNY30k000001cc-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-10-20 17:58 ` J. Bruce Fields
2008-10-17 17:48 ` [PATCH 1/3] SUNRPC add deferral processing callbacks J. Bruce Fields
2008-10-17 18:42 ` Andy Adamson
2008-10-17 20:35 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRDf3sll000001cb-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-10-20 18:42 ` J. Bruce Fields
2008-10-17 17:44 ` [PATCH 0/3] NFSD EOS deferral J. Bruce Fields
2008-10-17 18:44 ` Andy Adamson
2008-10-17 18:59 ` Marc Eshel
[not found] ` <OF9E4C4BA6.37418EC7-ON882574E5.0067FB2B-882574E5.0068487F@ us.ibm.com>
2008-10-17 20:26 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRDidcDj000001ca-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-10-17 20:36 ` J. Bruce Fields
2008-10-17 20:51 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRDarcR4000001cd-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-10-20 18:06 ` J. Bruce Fields
-- strict thread matches above, loose matches on Subject: below --
2008-10-22 18:12 andros
2008-10-22 18:12 ` [PATCH 1/3] SUNRPC add deferral processing callbacks andros
2008-10-22 18:12 ` [PATCH 2/3] NFSD save, restore, and release deferred result pages andros
2008-10-22 18:12 ` [PATCH 3/3] NFSD deferral processing andros
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.