Linux NFS development
 help / color / mirror / Atom feed
From: Dai Ngo <dai.ngo@oracle.com>
To: Jeff Layton <jlayton@kernel.org>,
	chuck.lever@oracle.com, neilb@suse.de, okorniev@redhat.com,
	tom@talpey.com
Cc: linux-nfs@vger.kernel.org, sagi@grimberg.me
Subject: Re: [PATCH V2 2/2] NFSD: allow client to use write delegation stateid for READ
Date: Tue, 25 Feb 2025 16:31:46 -0800	[thread overview]
Message-ID: <bee17735-d7c6-4a0e-8da9-e8daea9fcd6c@oracle.com> (raw)
In-Reply-To: <2b65654e1e06125ba0b02b0beca3771896d4ab5f.camel@kernel.org>


On 2/25/25 4:31 AM, Jeff Layton wrote:
> On Mon, 2025-02-24 at 17:10 -0800, Dai Ngo wrote:
>> On 2/24/25 7:48 AM, Jeff Layton wrote:
>>> On Fri, 2025-02-21 at 15:42 -0800, Dai Ngo wrote:
>>>> Allow READ using write delegation stateid granted on OPENs with
>>>> OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
>>>> implementation may unavoidably do (e.g., due to buffer cache
>>>> constraints).
>>>>
>>>> When the server offers a write delegation for an OPEN with
>>>> OPEN4_SHARE_ACCESS_WRITE, the file access mode, the nfs4_file
>>>> and nfs4_ol_stateid are upgraded as if the OPEN was sent with
>>>> OPEN4_SHARE_ACCESS_BOTH.
>>>>
>>>> When this delegation is returned or revoked, the corresponding open
>>>> stateid is looked up and if it's found then the file access mode,
>>>> the nfs4_file and nfs4_ol_stateid are downgraded to remove the read
>>>> access.
>>>>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> ---
>>>>    fs/nfsd/nfs4state.c | 62 +++++++++++++++++++++++++++++++++++++++++++++
>>>>    fs/nfsd/state.h     |  2 ++
>>>>    2 files changed, 64 insertions(+)
>>>>
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index b533225e57cf..0c14f902c54c 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -6126,6 +6126,51 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>>>    	return rc == 0;
>>>>    }
>>>>    
>>>> +/*
>>>> + * Upgrade file access mode to include FMODE_READ. This is called only when
>>>> + * a write delegation is granted for an OPEN with OPEN4_SHARE_ACCESS_WRITE.
>>>> + */
>>>> +static void
>>>> +nfs4_upgrade_rdwr_file_access(struct nfs4_ol_stateid *stp)
>>>> +{
>>>> +	struct nfs4_file *fp = stp->st_stid.sc_file;
>>>> +	struct nfsd_file *nflp;
>>>> +	struct file *file;
>>>> +
>>>> +	spin_lock(&fp->fi_lock);
>>>> +	nflp = fp->fi_fds[O_WRONLY];
>>>> +	file = nflp->nf_file;
>>>> +	file->f_mode |= FMODE_READ;
>>> You can't just do this. Open upgrade/downgrade doesn't exist at the VFS
>>> layer currently. It might work with most local filesystems, but more
>>> complex filesystems have significant context attached to a file. Just
>>> because you've changed it here, doesn't mean that you will _actually_
>>> be able to do reads using it.
>> I think allowing read using a write delegation from a OPEN with
>> WRONLY is an optional feature and is not a requirement from the
>> spec. So if there are filesystems that do not allow this feature
>> to work then it is ok; we did not introduce any new problems with
>> this feature.
>>
> This patch is swapping the O_RDWR fd in the nfsd4_file with an O_WRONLY
> one that has had FMODE_READ added. That will break on filesystems that
> don't actually allow you to do reads on a filp that was opened
> O_WRONLY.

So the current patch might not work for all filesystems, it is not
a perfect solution but it doesn't introduce any bugs or undesired
behavior. I wonder how many filesystems out there fall in to this
category. So far I've only tested with xfs and ext4.

>
>>> This might even be actively unsafe, as you're bypassing permissions
>>> checks here. You never checked whether the file is readable. What if
>>> it's write only? Same clients will do an ACCESS check before allowing
>>> it, but a hostile actor might be able to exploit this.
>> Apparently the NFS server relies on the NFS client to do permission
>> check at time the file is opened. Once the permission check passes and
>> the OPEN is successful, then there is no permission check on READ/WRITE.
>>
>> I wrote this pynfs test to verify:
>>
>> def testReadOnWrOnlyFile(t, env):
>>       """Test read using open stateid with OPEN4_SHARE_ACCESS_WRITE
>>          on file with permission 0222
>>
>>       FLAGS: writedelegations deleg
>>       CODE: DELEG28
>>       """
>>
>>       # create a file with write-only mode (0222)
>>       sess = env.c1.new_client_session(b"%s_2" % env.testname(t))
>>       filename = env.testname(t)
>>       res = open_create_file(sess, filename, open_create=OPEN4_CREATE,
>>               attrs={FATTR4_MODE: 0o222}, access=OPEN4_SHARE_ACCESS_WRITE, want_deleg=False)
>>       check(res)
>>
>>       # write file content
>>       fh = res.resarray[-1].object
>>       stateid = res.resarray[-2].stateid
>>       data = b"write test data"
>>       res = write_file(sess, fh, data, 0, stateid)
>>       check(res)
>>
>>       # close the file
>>       res = close_file(sess, fh, stateid)
>>       check(res)
>>
>>       # OPEN file with OPEN4_SHARE_ACCESS_WRITE
>>       access = OPEN4_SHARE_ACCESS_WRITE|OPEN4_SHARE_ACCESS_WANT_NO_DELEG
>>       res = open_file(sess, filename, access = access, want_deleg = True)
> nit: shouldn't want_deleg be False there?

Thanks! Fixed, result is the same.

>
>>       check(res)
>>
>>       # READ file using open stateid
>>       # Linux NFS server allows READ using open stateid from OPEN4_SHARE_ACCESS_WRITE.
>>       # However, the file permission mode 0222 then the READ should fail!
>>       stateid = res.resarray[-2].stateid
>>       res = sess.compound([op.putfh(fh), op.read(stateid, 0, 10)])
>>       check(res, NFS4ERR_ACCESS)
>>       res = close_file(sess, fh, stateid)
>>       check(res)
>>
>> and the test failed with:
>>        "OP_READ should return NFS4ERR_ACCESS, instead got NFS4_OK"
>>
>> Am i missing something?
>>
> nfsd actually does check permissions on every READ operation via:
>
>    nfs4_preprocess_stateid_op
>       nfs4_check_file
>          nfsd_permission

Ah I missed this check.

>
> You may be bypassing it via the NFSD_MAY_OWNER_OVERRIDE flag.

Yes, the NFSD_MAY_OWNER_OVERRIDE flag bypasses the permission check
if the file owner is the same as the user accessing the file.

>   Does the
> above test also fail if the file is owned by a different user than the
> one running the test?

I modified the test so that the file is created by different user from
the one that got the write delegation (on an open with O_WRONLY) and tries
to do the read using the deleg stateid. The read fails due to the check
in nfsd_permission which is the correct behavior.

>
>>> I think you need to acquire a R/W open from the get-go instead when you
>>> intend to grant a delegation, and just fall back to doing a O_WRONLY
>>> open if that fails (and don't grant one).

I think so far this patch works as expected, except that it might not
work for some filesystems due to the strict permission checking that
you mentioned above. However I'll explore your suggestion to acquire
a R/W open from the get-go to see how it goes.

Thanks,
-Dai

>>>
>>>> +	swap(fp->fi_fds[O_RDWR], fp->fi_fds[O_WRONLY]);
>>>> +	clear_access(NFS4_SHARE_ACCESS_WRITE, stp);
>>>> +	set_access(NFS4_SHARE_ACCESS_BOTH, stp);
>>>> +	__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);	/* incr fi_access[O_RDONLY] */
>>>> +	spin_unlock(&fp->fi_lock);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Downgrade file access mode to remove FMODE_READ. This is called when
>>>> + * a write delegation, granted for an OPEN with OPEN4_SHARE_ACCESS_WRITE,
>>>> + * is returned.
>>>> + */
>>>> +static void
>>>> +nfs4_downgrade_wronly_file_access(struct nfs4_ol_stateid *stp)
>>>> +{
>>>> +	struct nfs4_file *fp = stp->st_stid.sc_file;
>>>> +	struct nfsd_file *nflp;
>>>> +	struct file *file;
>>>> +
>>>> +	spin_lock(&fp->fi_lock);
>>>> +	nflp = fp->fi_fds[O_RDWR];
>>>> +	file = nflp->nf_file;
>>>> +	file->f_mode &= ~FMODE_READ;
>>>> +	swap(fp->fi_fds[O_WRONLY], fp->fi_fds[O_RDWR]);
>>>> +	clear_access(NFS4_SHARE_ACCESS_BOTH, stp);
>>>> +	set_access(NFS4_SHARE_ACCESS_WRITE, stp);
>>>> +	spin_unlock(&fp->fi_lock);
>>>> +	nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);	/* decr. fi_access[O_RDONLY] */
>>>> +}
>>>> +
>>>>    /*
>>>>     * The Linux NFS server does not offer write delegations to NFSv4.0
>>>>     * clients in order to avoid conflicts between write delegations and
>>>> @@ -6207,6 +6252,11 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>>>    		dp->dl_cb_fattr.ncf_cur_fsize = stat.size;
>>>>    		dp->dl_cb_fattr.ncf_initial_cinfo = nfsd4_change_attribute(&stat);
>>>>    		trace_nfsd_deleg_write(&dp->dl_stid.sc_stateid);
>>>> +
>>>> +		if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == NFS4_SHARE_ACCESS_WRITE) {
>>>> +			dp->dl_stateid = stp->st_stid.sc_stateid;
>>>> +			nfs4_upgrade_rdwr_file_access(stp);
>>>> +		}
>>>>    	} else {
>>>>    		open->op_delegate_type = deleg_ts ? OPEN_DELEGATE_READ_ATTRS_DELEG :
>>>>    						    OPEN_DELEGATE_READ;
>>>> @@ -7710,6 +7760,8 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>    	struct nfs4_stid *s;
>>>>    	__be32 status;
>>>>    	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
>>>> +	struct nfs4_ol_stateid *stp;
>>>> +	struct nfs4_stid *stid;
>>>>    
>>>>    	if ((status = fh_verify(rqstp, &cstate->current_fh, S_IFREG, 0)))
>>>>    		return status;
>>>> @@ -7724,6 +7776,16 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>    
>>>>    	trace_nfsd_deleg_return(stateid);
>>>>    	destroy_delegation(dp);
>>>> +
>>>> +	if (dp->dl_stateid.si_generation && dp->dl_stateid.si_opaque.so_id) {
>>>> +		if (!nfsd4_lookup_stateid(cstate, &dp->dl_stateid,
>>>> +				SC_TYPE_OPEN, 0, &stid, nn)) {
>>>> +			stp = openlockstateid(stid);
>>>> +			nfs4_downgrade_wronly_file_access(stp);
>>>> +			nfs4_put_stid(stid);
>>>> +		}
>>>> +	}
>>>> +
>>>>    	smp_mb__after_atomic();
>>>>    	wake_up_var(d_inode(cstate->current_fh.fh_dentry));
>>>>    put_stateid:
>>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>>>> index 74d2d7b42676..3f2f1b92db66 100644
>>>> --- a/fs/nfsd/state.h
>>>> +++ b/fs/nfsd/state.h
>>>> @@ -207,6 +207,8 @@ struct nfs4_delegation {
>>>>    
>>>>    	/* for CB_GETATTR */
>>>>    	struct nfs4_cb_fattr    dl_cb_fattr;
>>>> +
>>>> +	stateid_t		dl_stateid;  /* open stateid */
>>>>    };
>>>>    
>>>>    static inline bool deleg_is_read(u32 dl_type)

      reply	other threads:[~2025-02-26  0:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-21 23:42 [PATCH V2 0/2] NFSD: offer write delegation for OPEN with OPEN4_SHARE_ACCESS only Dai Ngo
2025-02-21 23:42 ` [PATCH V2 1/2] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE Dai Ngo
2025-02-21 23:42 ` [PATCH V2 2/2] NFSD: allow client to use write delegation stateid for READ Dai Ngo
2025-02-24 14:48   ` Chuck Lever
2025-02-24 21:11     ` Dai Ngo
2025-02-24 21:38       ` Chuck Lever
2025-02-24 15:48   ` Jeff Layton
2025-02-25  1:10     ` Dai Ngo
2025-02-25 12:31       ` Jeff Layton
2025-02-26  0:31         ` Dai Ngo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bee17735-d7c6-4a0e-8da9-e8daea9fcd6c@oracle.com \
    --to=dai.ngo@oracle.com \
    --cc=chuck.lever@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=okorniev@redhat.com \
    --cc=sagi@grimberg.me \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox