From: Calum Mackay <calum.mackay@oracle.com>
To: dai.ngo@oracle.com, "J. Bruce Fields" <bfields@fieldses.org>
Cc: Calum Mackay <calum.mackay@oracle.com>,
chuck.lever@oracle.com, linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC 1/1] nfsd: Initial implementation of NFSv4 Courteous Server
Date: Wed, 16 Jun 2021 20:19:54 +0100 [thread overview]
Message-ID: <93b4ea72-786f-dc7f-d000-e40426ffff02@oracle.com> (raw)
In-Reply-To: <63c6d3e0-87d5-f617-cd82-b91df0d2c4fe@oracle.com>
[-- Attachment #1.1: Type: text/plain, Size: 15707 bytes --]
On 16/06/2021 8:17 pm, dai.ngo@oracle.com wrote:
>
> On 6/16/21 9:02 AM, J. Bruce Fields wrote:
>> On Thu, Jun 03, 2021 at 02:14:38PM -0400, Dai Ngo wrote:
>>> Currently an NFSv4 client must maintain its lease by using the at least
>>> one of the state tokens or if nothing else, by issuing a RENEW (4.0), or
>>> a singleton SEQUENCE (4.1) at least once during each lease period. If
>>> the
>>> client fails to renew the lease, for any reason, the Linux server
>>> expunges
>>> the state tokens immediately upon detection of the "failure to renew the
>>> lease" condition and begins returning NFS4ERR_EXPIRED if the client
>>> should
>>> reconnect and attempt to use the (now) expired state.
>>>
>>> The default lease period for the Linux server is 90 seconds. The
>>> typical
>>> client cuts that in half and will issue a lease renewing operation every
>>> 45 seconds. The 90 second lease period is very short considering the
>>> potential for moderately long term network partitions. A network
>>> partition
>>> refers to any loss of network connectivity between the NFS client and
>>> the
>>> NFS server, regardless of its root cause. This includes NIC
>>> failures, NIC
>>> driver bugs, network misconfigurations & administrative errors,
>>> routers &
>>> switches crashing and/or having software updates applied, even down to
>>> cables being physically pulled. In most cases, these network
>>> failures are
>>> transient, although the duration is unknown.
>>>
>>> A server which does not immediately expunge the state on lease
>>> expiration
>>> is known as a Courteous Server. A Courteous Server continues to
>>> recognize
>>> previously generated state tokens as valid until conflict arises between
>>> the expired state and the requests from another client, or the server
>>> reboots.
>>>
>>> The initial implementation of the Courteous Server will do the
>>> following:
>>>
>>> . when the laundromat thread detects an expired client and if that
>>> client
>>> still has established states on the Linux server then marks the
>>> client as a
>>> COURTESY_CLIENT and skips destroying the client and all its states,
>>> otherwise destroy the client as usual.
>>>
>>> . detects conflict of OPEN request with a COURTESY_CLIENT, destroys the
>>> expired client and all its states, skips the delegation recall then
>>> allows
>>> the conflicting request to succeed.
>>>
>>> . detects conflict of LOCK/LOCKT request with a COURTESY_CLIENT,
>>> destroys
>>> the expired client and all its states then allows the conflicting
>>> request
>>> to succeed.
>>>
>>> To be done:
>>>
>>> . fix a problem with 4.1 where the Linux server keeps returning
>>> SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after the
>>> expired
>>> client re-connects, causing the client to keep sending BCTS requests
>>> to server.
>> Hm, any progress working out what's causing that?
>
> I will fix this in v2 patch.
>
>>
>>> . handle OPEN conflict with share reservations.
>> Sounds doable.
>
> Yes. But I don't know a way to force the Linux client to specify the
> deny mode on OPEN.
We could do this with pynfs, perhaps, for testing?
cheers,
c.
I may have to test this with SMB client: expired
> NFS client should not prevent SMB client from open the file with
> deny mode.
>
>>
>>> . instead of destroy the client anf all its state on conflict, only
>>> destroy
>>> the state that is conflicted with the current request.
>> The other todos I think have to be done before we merge, but this one I
>> think can wait.
>>
>>> . destroy the COURTESY_CLIENT either after a fixed period of time to
>>> release
>>> resources or as reacting to memory pressure.
>> I think we need something here, but it can be pretty simple.
>
> ok.
>
>>
>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>> ---
>>> fs/nfsd/nfs4state.c | 94
>>> +++++++++++++++++++++++++++++++++++++++++++---
>>> fs/nfsd/state.h | 1 +
>>> include/linux/sunrpc/svc.h | 1 +
>>> 3 files changed, 91 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index b517a8794400..969995872752 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -171,6 +171,7 @@ renew_client_locked(struct nfs4_client *clp)
>>> list_move_tail(&clp->cl_lru, &nn->client_lru);
>>> clp->cl_time = ktime_get_boottime_seconds();
>>> + clear_bit(NFSD4_CLIENT_COURTESY, &clp->cl_flags);
>>> }
>>> static void put_client_renew_locked(struct nfs4_client *clp)
>>> @@ -4610,6 +4611,38 @@ static void nfsd_break_one_deleg(struct
>>> nfs4_delegation *dp)
>>> nfsd4_run_cb(&dp->dl_recall);
>>> }
>>> +/*
>>> + * Set rq_conflict_client if the conflict client have expired
>>> + * and return true, otherwise return false.
>>> + */
>>> +static bool
>>> +nfsd_set_conflict_client(struct nfs4_delegation *dp)
>>> +{
>>> + struct svc_rqst *rqst;
>>> + struct nfs4_client *clp;
>>> + struct nfsd_net *nn;
>>> + bool ret;
>>> +
>>> + if (!i_am_nfsd())
>>> + return false;
>>> + rqst = kthread_data(current);
>>> + if (rqst->rq_prog != NFS_PROGRAM || rqst->rq_vers < 4)
>>> + return false;
>> This says we do nothing if the lock request is not coming from nfsd and
>> v4. That doesn't sound right.
>>
>> For example, if the conflicting lock is coming from a local process (not
>> from an NFS client at all), we still need to revoke the courtesy state
>> so that process can make progress.
>
> The reason it checks for NFS request is because it uses rq_conflict_client
> to pass the expired client back to the upper layer which then does the
> expire_client. If this is not from a NFS request then kthread_data() might
> not be the svc_rqst. In that case the code will try to recall the
> delegation
> from the expired client and the recall will eventually timed out.
>
> I will rework this code.
>
> Thanks,
> -Dai
>
>>> + rqst->rq_conflict_client = NULL;
>>> + clp = dp->dl_recall.cb_clp;
>>> + nn = net_generic(clp->net, nfsd_net_id);
>>> + spin_lock(&nn->client_lock);
>>> +
>>> + if (test_bit(NFSD4_CLIENT_COURTESY, &clp->cl_flags) &&
>>> + !mark_client_expired_locked(clp)) {
>>> + rqst->rq_conflict_client = clp;
>>> + ret = true;
>>> + } else
>>> + ret = false;
>>> + spin_unlock(&nn->client_lock);
>>> + return ret;
>>> +}
>>> +
>>> /* Called from break_lease() with i_lock held. */
>>> static bool
>>> nfsd_break_deleg_cb(struct file_lock *fl)
>>> @@ -4618,6 +4651,8 @@ nfsd_break_deleg_cb(struct file_lock *fl)
>>> struct nfs4_delegation *dp = (struct nfs4_delegation
>>> *)fl->fl_owner;
>>> struct nfs4_file *fp = dp->dl_stid.sc_file;
>>> + if (nfsd_set_conflict_client(dp))
>>> + return false;
>>> trace_nfsd_deleg_break(&dp->dl_stid.sc_stateid);
>>> /*
>>> @@ -5237,6 +5272,22 @@ static void nfsd4_deleg_xgrade_none_ext(struct
>>> nfsd4_open *open,
>>> */
>>> }
>>> +static bool
>>> +nfs4_destroy_courtesy_client(struct nfs4_client *clp)
>>> +{
>>> + struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
>>> +
>>> + spin_lock(&nn->client_lock);
>>> + if (!test_bit(NFSD4_CLIENT_COURTESY, &clp->cl_flags) ||
>>> + mark_client_expired_locked(clp)) {
>>> + spin_unlock(&nn->client_lock);
>>> + return false;
>>> + }
>>> + spin_unlock(&nn->client_lock);
>>> + expire_client(clp);
>>> + return true;
>>> +}
>>> +
>>> __be32
>>> nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh
>>> *current_fh, struct nfsd4_open *open)
>>> {
>>> @@ -5286,7 +5337,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp,
>>> struct svc_fh *current_fh, struct nf
>>> goto out;
>>> }
>>> } else {
>>> + rqstp->rq_conflict_client = NULL;
>>> status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
>>> + if (status == nfserr_jukebox && rqstp->rq_conflict_client) {
>>> + if
>>> (nfs4_destroy_courtesy_client(rqstp->rq_conflict_client))
>>> + status = nfs4_get_vfs_file(rqstp, fp, current_fh,
>>> stp, open);
>>> + }
>>> +
>>> if (status) {
>>> stp->st_stid.sc_type = NFS4_CLOSED_STID;
>>> release_open_stateid(stp);
>>> @@ -5472,6 +5529,8 @@ nfs4_laundromat(struct nfsd_net *nn)
>>> };
>>> struct nfs4_cpntf_state *cps;
>>> copy_stateid_t *cps_t;
>>> + struct nfs4_stid *stid;
>>> + int id = 0;
>>> int i;
>>> if (clients_still_reclaiming(nn)) {
>>> @@ -5495,6 +5554,15 @@ nfs4_laundromat(struct nfsd_net *nn)
>>> clp = list_entry(pos, struct nfs4_client, cl_lru);
>>> if (!state_expired(<, clp->cl_time))
>>> break;
>>> +
>>> + spin_lock(&clp->cl_lock);
>>> + stid = idr_get_next(&clp->cl_stateids, &id);
>>> + spin_unlock(&clp->cl_lock);
>>> + if (stid) {
>>> + /* client still has states */
>>> + set_bit(NFSD4_CLIENT_COURTESY, &clp->cl_flags);
>>> + continue;
>>> + }
>>> if (mark_client_expired_locked(clp)) {
>>> trace_nfsd_clid_expired(&clp->cl_clientid);
>>> continue;
>>> @@ -6440,7 +6508,7 @@ static const struct lock_manager_operations
>>> nfsd_posix_mng_ops = {
>>> .lm_put_owner = nfsd4_fl_put_owner,
>>> };
>>> -static inline void
>>> +static inline int
>>> nfs4_set_lock_denied(struct file_lock *fl, struct nfsd4_lock_denied
>>> *deny)
>>> {
>>> struct nfs4_lockowner *lo;
>>> @@ -6453,6 +6521,8 @@ nfs4_set_lock_denied(struct file_lock *fl,
>>> struct nfsd4_lock_denied *deny)
>>> /* We just don't care that much */
>>> goto nevermind;
>>> deny->ld_clientid = lo->lo_owner.so_client->cl_clientid;
>>> + if (nfs4_destroy_courtesy_client(lo->lo_owner.so_client))
>>> + return -EAGAIN;
>>> } else {
>>> nevermind:
>>> deny->ld_owner.len = 0;
>>> @@ -6467,6 +6537,7 @@ nfs4_set_lock_denied(struct file_lock *fl,
>>> struct nfsd4_lock_denied *deny)
>>> deny->ld_type = NFS4_READ_LT;
>>> if (fl->fl_type != F_RDLCK)
>>> deny->ld_type = NFS4_WRITE_LT;
>>> + return 0;
>>> }
>>> static struct nfs4_lockowner *
>>> @@ -6734,6 +6805,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct
>>> nfsd4_compound_state *cstate,
>>> unsigned int fl_flags = FL_POSIX;
>>> struct net *net = SVC_NET(rqstp);
>>> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>> + bool retried = false;
>>> dprintk("NFSD: nfsd4_lock: start=%Ld length=%Ld\n",
>>> (long long) lock->lk_offset,
>>> @@ -6860,7 +6932,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct
>>> nfsd4_compound_state *cstate,
>>> list_add_tail(&nbl->nbl_lru, &nn->blocked_locks_lru);
>>> spin_unlock(&nn->blocked_locks_lock);
>>> }
>>> -
>>> +again:
>>> err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, conflock);
>>> switch (err) {
>>> case 0: /* success! */
>>> @@ -6875,7 +6947,12 @@ nfsd4_lock(struct svc_rqst *rqstp, struct
>>> nfsd4_compound_state *cstate,
>>> case -EAGAIN: /* conflock holds conflicting lock */
>>> status = nfserr_denied;
>>> dprintk("NFSD: nfsd4_lock: conflicting lock found!\n");
>>> - nfs4_set_lock_denied(conflock, &lock->lk_denied);
>>> +
>>> + /* try again if conflict with courtesy client */
>>> + if (nfs4_set_lock_denied(conflock, &lock->lk_denied) ==
>>> -EAGAIN && !retried) {
>>> + retried = true;
>>> + goto again;
>>> + }
>>> break;
>>> case -EDEADLK:
>>> status = nfserr_deadlock;
>>> @@ -6962,6 +7039,8 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct
>>> nfsd4_compound_state *cstate,
>>> struct nfs4_lockowner *lo = NULL;
>>> __be32 status;
>>> struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
>>> + bool retried = false;
>>> + int ret;
>>> if (locks_in_grace(SVC_NET(rqstp)))
>>> return nfserr_grace;
>>> @@ -7010,14 +7089,19 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct
>>> nfsd4_compound_state *cstate,
>>> file_lock->fl_end = last_byte_offset(lockt->lt_offset,
>>> lockt->lt_length);
>>> nfs4_transform_lock_offset(file_lock);
>>> -
>>> +again:
>>> status = nfsd_test_lock(rqstp, &cstate->current_fh, file_lock);
>>> if (status)
>>> goto out;
>>> if (file_lock->fl_type != F_UNLCK) {
>>> status = nfserr_denied;
>>> - nfs4_set_lock_denied(file_lock, &lockt->lt_denied);
>>> + ret = nfs4_set_lock_denied(file_lock, &lockt->lt_denied);
>>> + if (ret == -EAGAIN && !retried) {
>>> + retried = true;
>>> + fh_clear_wcc(&cstate->current_fh);
>>> + goto again;
>>> + }
>>> }
>>> out:
>>> if (lo)
>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>>> index e73bdbb1634a..bdc3605e3722 100644
>>> --- a/fs/nfsd/state.h
>>> +++ b/fs/nfsd/state.h
>>> @@ -345,6 +345,7 @@ struct nfs4_client {
>>> #define NFSD4_CLIENT_UPCALL_LOCK (5) /* upcall serialization */
>>> #define NFSD4_CLIENT_CB_FLAG_MASK (1 << NFSD4_CLIENT_CB_UPDATE | \
>>> 1 << NFSD4_CLIENT_CB_KILL)
>>> +#define NFSD4_CLIENT_COURTESY (6) /* be nice to expired
>>> client */
>>> unsigned long cl_flags;
>>> const struct cred *cl_cb_cred;
>>> struct rpc_clnt *cl_cb_client;
>>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
>>> index e91d51ea028b..2f0382f9d0ff 100644
>>> --- a/include/linux/sunrpc/svc.h
>>> +++ b/include/linux/sunrpc/svc.h
>>> @@ -304,6 +304,7 @@ struct svc_rqst {
>>> * net namespace
>>> */
>>> void ** rq_lease_breaker; /* The v4 client breaking
>>> a lease */
>>> + void *rq_conflict_client;
>>> };
>>> #define SVC_NET(rqst) (rqst->rq_xprt ? rqst->rq_xprt->xpt_net :
>>> rqst->rq_bc_net)
>>> --
>>> 2.9.5
--
Calum Mackay
Linux Kernel Engineering
Oracle Linux and Virtualisation
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
next prev parent reply other threads:[~2021-06-16 19:20 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-03 18:14 [PATCH RFC 1/1] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
2021-06-11 8:42 ` dai.ngo
2021-06-16 16:02 ` J. Bruce Fields
2021-06-16 16:32 ` Chuck Lever III
2021-06-16 19:25 ` dai.ngo
2021-06-16 19:29 ` Chuck Lever III
2021-06-16 20:30 ` Bruce Fields
2021-06-16 19:17 ` dai.ngo
2021-06-16 19:19 ` Calum Mackay [this message]
2021-06-16 19:27 ` dai.ngo
2021-06-24 14:02 ` J. Bruce Fields
2021-06-24 19:50 ` dai.ngo
2021-06-24 20:36 ` J. Bruce Fields
2021-06-28 20:23 ` J. Bruce Fields
2021-06-28 23:39 ` dai.ngo
2021-06-29 4:40 ` dai.ngo
2021-06-30 1:35 ` J. Bruce Fields
2021-06-30 8:41 ` dai.ngo
2021-06-30 14:52 ` J. Bruce Fields
2021-06-30 17:51 ` dai.ngo
2021-06-30 18:05 ` J. Bruce Fields
2021-06-30 18:49 ` dai.ngo
2021-06-30 18:55 ` J. Bruce Fields
2021-06-30 19:13 ` dai.ngo
2021-06-30 19:24 ` J. Bruce Fields
2021-06-30 23:48 ` dai.ngo
2021-07-01 1:16 ` J. Bruce Fields
2021-06-30 15:13 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=93b4ea72-786f-dc7f-d000-e40426ffff02@oracle.com \
--to=calum.mackay@oracle.com \
--cc=bfields@fieldses.org \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox