From: NeilBrown <neilb@suse.de>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
Kinglong Mee <kinglongmee@gmail.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH RFC] NFSD: fix cannot umounting mount points under pseudo root
Date: Wed, 6 May 2015 08:27:02 +1000
Message-ID: <20150506082702.0a867be4@notabene.brown>
In-Reply-To: <20150504214822.GA16827@fieldses.org>
On Mon, 4 May 2015 17:48:22 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:
> On Sun, May 03, 2015 at 09:16:53AM +1000, NeilBrown wrote:
> > On Fri, 1 May 2015 09:29:53 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > wrote:
> >
> > > On Fri, May 01, 2015 at 01:08:26PM +1000, NeilBrown wrote:
> > > > On Fri, 1 May 2015 03:29:40 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > > >
> > > > > On Fri, May 01, 2015 at 12:23:33PM +1000, NeilBrown wrote:
> > > > > > > What kind of consistency warranties do callers expect, BTW? You do realize
> > > > > > > that between iterate_dir() and callbacks an entry might have been removed
> > > > > > > and/or replaced?
> > > > > >
> > > > > > For READDIR_PLUS, lookup_one_len is called on each name and it requires
> > > > > > i_mutex, so the code currently holds i_mutex over the whole sequence.
> > > > > > This is triggering a deadlock.
> > > > >
> > > > > Yes, I've seen the context. However, you are _not_ holding it between
> > > > > actual iterate_dir() and those callbacks, which opens a window when
> > > > > directory might have been changed.
> > > > >
> > > > > Again, what kind of consistency is expected by callers? Are they ready to
> > > > > cope with "there's no such entry anymore" or "inumber is nothing like
> > > > > > what we'd put in ->ino, since it's not the same object" or "->d_type is
> > > > > completely unrelated to what we'd found, since the damn thing had been
> > > > > removed and created from scratch"?
> > > >
> > > > Ah, sorry.
> > > >
> > > > Yes, the callers are prepared for "there's no such entry anymore".
> > > > They don't use d_type, so don't care if it might be meaningless.
> > > > NFSv4 doesn't use ino either, but NFSv3 does and isn't properly cautious
> > > > about ino changing.
> > > >
> > > > In nfs3xdr, we should probably pass 'ino' to encode_entryplus_baggage() and
> > > > thence to compose_entry_fh() and it should report failure if
> > > > dchild->d_inode->i_ino doesn't match.
> > >
> > > Just to make sure I understand the concern..... So it shouldn't really
> > > be a problem if readdir and lookup find different objects for the same
> > > name, the problem is just when we mix attributes from the two objects,
> > > right? Looks like the v3 code could return an inode number derived from
> > > the readdir and a filehandle from the lookup, which is a problem. The
> > > v4 code will get everything from the result of the lookup, which should
> > > be OK.
> >
> > That agrees with my understanding, yes.
> >
> > I did wonder for a little while about the possibility of a directory
> > containing both 'a' and 'b', and NFSv4 doing the readdir and the stat of 'a',
> > and then a "mv a b" happening before the stat of 'b'.
> >
> > Then the readdir response will show both 'a' and 'b' referring to the same
> > object with a link count of 1.
> >
> > I can't quite decide if that is a problem or not.
> >
> >
> > >
> > > > Simply not returning the extra attributes is perfectly acceptable in NFSv3.
> > >
> > > Right, so no big deal anyway.
> > >
> > > --b.
> >
> > Not a big deal, but we should really add a patch like the following ("like"
> > as in "actually compile tested and documented" which this one isn't).
>
> Doesn't seem to break anything. Any second thoughts, or can I add a
> signed-off-by?

No second thoughts.

Signed-off-by: NeilBrown <neilb@suse.de>

Thanks.
NeilBrown

>
> --b.
>
> commit e11f8acace69
> Author: NeilBrown <neilb@suse.de>
> Date: Sun May 3 09:16:53 2015 +1000
>
> nfsd: stop READDIRPLUS returning inconsistent attributes
>
> The NFSv3 READDIRPLUS gets some of the returned attributes from the
> readdir, and some from an inode returned from a new lookup. The two
> objects could be different thanks to intervening renames.
>
> The attributes in READDIRPLUS are optional, so let's just skip them if
> we notice this case.
>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
>
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index e4b2b4322553..f6e7cbabac5a 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -805,7 +805,7 @@ encode_entry_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name,
>
>  static __be32
>  compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
> -		 const char *name, int namlen)
> +		 const char *name, int namlen, u64 ino)
>  {
>  	struct svc_export *exp;
>  	struct dentry *dparent, *dchild;
> @@ -830,19 +830,21 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
>  		goto out;
>  	if (d_really_is_negative(dchild))
>  		goto out;
> +	if (dchild->d_inode->i_ino != ino)
> +		goto out;
>  	rv = fh_compose(fhp, exp, dchild, &cd->fh);
>  out:
>  	dput(dchild);
>  	return rv;
>  }
>
> -static __be32 *encode_entryplus_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name, int namlen)
> +static __be32 *encode_entryplus_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name, int namlen, u64 ino)
>  {
>  	struct svc_fh *fh = &cd->scratch;
>  	__be32 err;
>
>  	fh_init(fh, NFS3_FHSIZE);
> -	err = compose_entry_fh(cd, fh, name, namlen);
> +	err = compose_entry_fh(cd, fh, name, namlen, ino);
>  	if (err) {
>  		*p++ = 0;
>  		*p++ = 0;
> @@ -927,7 +929,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
>  		p = encode_entry_baggage(cd, p, name, namlen, ino);
>
>  		if (plus)
> -			p = encode_entryplus_baggage(cd, p, name, namlen);
> +			p = encode_entryplus_baggage(cd, p, name, namlen, ino);
>  		num_entry_words = p - cd->buffer;
>  	} else if (*(page+1) != NULL) {
>  		/* temporarily encode entry into next page, then move back to
> @@ -941,7 +943,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
>  		p1 = encode_entry_baggage(cd, p1, name, namlen, ino);
>
>  		if (plus)
> -			p1 = encode_entryplus_baggage(cd, p1, name, namlen);
> +			p1 = encode_entryplus_baggage(cd, p1, name, namlen, ino);
>
>  		/* determine entry word length and lengths to go in pages */
>  		num_entry_words = p1 - tmp;
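
For anyone following along, the check the patch adds can be modelled in
userspace. This is only a rough sketch under stated assumptions, not kernel
code: struct model_dirent, model_lookup() and model_attrs_ok() are
hypothetical names made up for illustration. They stand in, respectively,
for the dentry found by the fresh lookup, for lookup_one_len(), and for the
new comparison in compose_entry_fh().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one name in the directory as it exists *now*,
 * i.e. what a fresh lookup would find (not a kernel structure). */
struct model_dirent {
	const char *name;
	uint64_t ino;
};

/* Hypothetical stand-in for the lookup step: find 'name' in the current
 * directory state, or return NULL if the entry has gone away. */
static const struct model_dirent *
model_lookup(const struct model_dirent *dir, size_t n, const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(dir[i].name, name) == 0)
			return &dir[i];
	return NULL;
}

/* Return 1 if the optional attributes may be sent for this entry, or 0 if
 * they should be skipped.  'readdir_ino' is the inode number the earlier
 * directory iteration reported for 'name'. */
static int
model_attrs_ok(const struct model_dirent *dir, size_t n,
	       const char *name, uint64_t readdir_ino)
{
	const struct model_dirent *child = model_lookup(dir, n, name);

	if (child == NULL)
		return 0;	/* "there's no such entry anymore" */
	if (child->ino != readdir_ino)
		return 0;	/* name now refers to a different object */
	return 1;
}
```

With a table holding "a" at inode 10, model_attrs_ok(dir, n, "a", 10)
returns 1. A stale inode number or a vanished name returns 0, which in the
model corresponds to sending the entry without the optional attributes
rather than mixing data from two different objects.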