linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* incorrect EXDEV error occasionally returned for rename over NFS
@ 2010-11-15 21:46 Joe Habermann
  2010-11-16 18:55 ` J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Habermann @ 2010-11-15 21:46 UTC (permalink / raw)
  To: linux-nfs

When running SVN checkouts over NFSv3, they occasionally fail with an
EXDEV error:

(etc.)
U    snfs/build/Makefile.hpux
svn: In directory 'snfs/build'
svn: Can't move 'snfs/build/tempfile.tmp' to
'snfs/build/Makefile.sol': Invalid cross-device link

I've noticed that other people have reported this problem but it remains
unresolved.  Through some debugging, I believe I have found root cause.
nfsd_rename() contains the following code:

1707 /*
1708  * Rename a file
1709  * N.B. After this call _both_ ffhp and tfhp need an fh_put 1710  */
1711 __be32
1712 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char
*fname, int flen,
1713                             struct svc_fh *tfhp, char *tname, int tlen)
1714 {
1715         struct dentry   *fdentry, *tdentry, *odentry, *ndentry, *trap;
1716         struct inode    *fdir, *tdir;
1717         __be32          err;
1718         int             host_err;
1719
1720         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
1721         if (err)
1722                 goto out;
1723         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
1724         if (err)
1725                 goto out;
1726
1727         fdentry = ffhp->fh_dentry;
1728         fdir = fdentry->d_inode;
1729
1730         tdentry = tfhp->fh_dentry;
1731         tdir = tdentry->d_inode;
1732
1733         err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
1734         if (ffhp->fh_export != tfhp->fh_export)
1735                 goto out;

Line 1734 compares the fh_export pointers in the file handles for
the "from" and "to" directories.  However, this is a bad check since
it's possible these pointers differ yet refer to the same underlying
file system.  Based on code inspection, encached export pointers are
only valid for 30 minutes before they expire and are reinserted through
calls made between the kernel and rpc.mountd.  So a race exists where the
ffhp->fh_export is populated on line 1720, the export cache is updated,
and tfhp->fh_export is set on line 1723 to a different value.

This window can be cranked wide open by putting an "msleep(10);"
between lines 1722 and 1723 above, recompiling and reinstalling nfsd.ko
etc., and then running SVN checkouts in a loop from an NFS client.
Within 30 minutes, the problem is seen.

I'm still very new to the NFS code, but I'm wonder whether the check
needs to be something like this instead:

--- vfs.c.orig  2010-11-15 14:35:18.000000000 -0600
+++ vfs.c       2010-11-15 14:43:52.000000000 -0600
@@ -1731,7 +1731,7 @@ nfsd_rename(struct svc_rqst *rqstp, stru
   tdir = tdentry->d_inode;

   err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
-       if (ffhp->fh_export != tfhp->fh_export)
+       if (fdir->i_sb != tdir->i_sb)
       goto out;

   err = nfserr_perm;

Thoughts?  I think it's either that or some locking needs to be put in
to guarantee that the file handles for the "to" and "from" directory
always wind up with the same export pointer when they belong to the
same file system.  I guess the other question is why there is an
EXDEV check at the top of nfsd_rename() instead of putting it in
vfs_rename().

For full disclosure, I'm actually running the SLES11 SP1 kernel,
linux-2.6.32.12-0.7, but I've looked at the latest upstream kernels and
they appear to have the same code and I assume the same issue.

Thanks, -Joe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: incorrect EXDEV error occasionally returned for rename over NFS
  2010-11-15 21:46 incorrect EXDEV error occasionally returned for rename over NFS Joe Habermann
@ 2010-11-16 18:55 ` J. Bruce Fields
  2010-11-16 23:49   ` Joe Habermann
  2010-11-17 23:39   ` Joe Habermann
  0 siblings, 2 replies; 7+ messages in thread
From: J. Bruce Fields @ 2010-11-16 18:55 UTC (permalink / raw)
  To: Joe Habermann; +Cc: linux-nfs

On Mon, Nov 15, 2010 at 03:46:44PM -0600, Joe Habermann wrote:
> When running SVN checkouts over NFSv3, they occasionally fail with an
> EXDEV error:
> 
> (etc.)
> U    snfs/build/Makefile.hpux
> svn: In directory 'snfs/build'
> svn: Can't move 'snfs/build/tempfile.tmp' to
> 'snfs/build/Makefile.sol': Invalid cross-device link
> 
> I've noticed that other people have reported this problem but it remains
> unresolved.  Through some debugging, I believe I have found root cause.
> nfsd_rename() contains the following code:
> 
> 1707 /*
> 1708  * Rename a file
> 1709  * N.B. After this call _both_ ffhp and tfhp need an fh_put 1710  */
> 1711 __be32
> 1712 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char
> *fname, int flen,
> 1713                             struct svc_fh *tfhp, char *tname, int tlen)
> 1714 {
> 1715         struct dentry   *fdentry, *tdentry, *odentry, *ndentry, *trap;
> 1716         struct inode    *fdir, *tdir;
> 1717         __be32          err;
> 1718         int             host_err;
> 1719
> 1720         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
> 1721         if (err)
> 1722                 goto out;
> 1723         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
> 1724         if (err)
> 1725                 goto out;
> 1726
> 1727         fdentry = ffhp->fh_dentry;
> 1728         fdir = fdentry->d_inode;
> 1729
> 1730         tdentry = tfhp->fh_dentry;
> 1731         tdir = tdentry->d_inode;
> 1732
> 1733         err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
> 1734         if (ffhp->fh_export != tfhp->fh_export)
> 1735                 goto out;
> 
> Line 1734 compares the fh_export pointers in the file handles for
> the "from" and "to" directories.  However, this is a bad check since
> it's possible these pointers differ yet refer to the same underlying
> file system.  Based on code inspection, encached export pointers are
> only valid for 30 minutes before they expire and are reinserted through
> calls made between the kernel and rpc.mountd.  So a race exists where the
> ffhp->fh_export is populated on line 1720, the export cache is updated,
> and tfhp->fh_export is set on line 1723 to a different value.
> 
> This window can be cranked wide open by putting an "msleep(10);"
> between lines 1722 and 1723 above, recompiling and reinstalling nfsd.ko
> etc., and then running SVN checkouts in a loop from an NFS client.
> Within 30 minutes, the problem is seen.

Good detective work, thanks.  Have you confirmed that applying your
change below eliminates the problem?

> I'm still very new to the NFS code, but I'm wonder whether the check
> needs to be something like this instead:
> 
> --- vfs.c.orig  2010-11-15 14:35:18.000000000 -0600
> +++ vfs.c       2010-11-15 14:43:52.000000000 -0600
> @@ -1731,7 +1731,7 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>    tdir = tdentry->d_inode;
> 
>    err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
> -       if (ffhp->fh_export != tfhp->fh_export)
> +       if (fdir->i_sb != tdir->i_sb)
>        goto out;
> 
>    err = nfserr_perm;
> 
> Thoughts?  I think it's either that or some locking needs to be put in
> to guarantee that the file handles for the "to" and "from" directory
> always wind up with the same export pointer when they belong to the
> same file system.  I guess the other question is why there is an
> EXDEV check at the top of nfsd_rename() instead of putting it in
> vfs_rename().

The fh_verify call is also responsible for setting the identity of the
user, and that may be affected by export options like root_squash.  In
the above case if ffhp was on a root_squash export, and tfhp on a
no_root_squash export, and we get a request from uid 0, we'd end up
with an incoming uid-0 user running as root (since the fh_verify for
tfhp was performed last), and might allow renaming from a directory that
only root had write permission to.

It's a minor hole (especially since it's already easy for a client to
circumvent protections that vary from export to export in the default
no_subtree_check case).  But we should be able to do better.

I'm not sure what to suggest.

--b.

> 
> For full disclosure, I'm actually running the SLES11 SP1 kernel,
> linux-2.6.32.12-0.7, but I've looked at the latest upstream kernels and
> they appear to have the same code and I assume the same issue.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: incorrect EXDEV error occasionally returned for rename over NFS
  2010-11-16 18:55 ` J. Bruce Fields
@ 2010-11-16 23:49   ` Joe Habermann
  2010-11-17 23:39   ` Joe Habermann
  1 sibling, 0 replies; 7+ messages in thread
From: Joe Habermann @ 2010-11-16 23:49 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

I did test out the fix and it did avoid the EXDEV problem.  But I see your
point about it opening up a small security hole.

The other thought I had was to use checking similar to what's in
svc_export_match() to see if the two exports were really for the same
namespace/client domain:

--- vfs.c.orig  2011-02-10 18:39:30.000000000 -0600
+++ vfs.c       2011-02-10 18:39:46.000000000 -0600
@@ -1744,8 +1744,12 @@ nfsd_rename(struct svc_rqst *rqstp, stru
        tdir = tdentry->d_inode;

        err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
-       if (ffhp->fh_export != tfhp->fh_export)
+       if (ffhp->fh_export != tfhp->fh_export) {
+           if (ffhp->fh_export->ex_client      != tfhp->fh_export->ex_client ||
+                ffhp->fh_export->ex_path.dentry !=
tfhp->fh_export->ex_path.dentry ||
+                ffhp->fh_export->ex_path.mnt    !=
tfhp->fh_export->ex_path.mnt)
                goto out;
+       }

        err = nfserr_perm;
        if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))

I don't think this is perfect because in the race window, the old export
could be removed and a new one could enter the cache with a different
ex_client pointer.

More noodling.

Thanks, -Joe

On Tue, Nov 16, 2010 at 12:55 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Mon, Nov 15, 2010 at 03:46:44PM -0600, Joe Habermann wrote:
>> When running SVN checkouts over NFSv3, they occasionally fail with an
>> EXDEV error:
>>
>> (etc.)
>> U    snfs/build/Makefile.hpux
>> svn: In directory 'snfs/build'
>> svn: Can't move 'snfs/build/tempfile.tmp' to
>> 'snfs/build/Makefile.sol': Invalid cross-device link
>>
>> I've noticed that other people have reported this problem but it remains
>> unresolved.  Through some debugging, I believe I have found root cause.
>> nfsd_rename() contains the following code:
>>
>> 1707 /*
>> 1708  * Rename a file
>> 1709  * N.B. After this call _both_ ffhp and tfhp need an fh_put 1710  */
>> 1711 __be32
>> 1712 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char
>> *fname, int flen,
>> 1713                             struct svc_fh *tfhp, char *tname, int tlen)
>> 1714 {
>> 1715         struct dentry   *fdentry, *tdentry, *odentry, *ndentry, *trap;
>> 1716         struct inode    *fdir, *tdir;
>> 1717         __be32          err;
>> 1718         int             host_err;
>> 1719
>> 1720         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
>> 1721         if (err)
>> 1722                 goto out;
>> 1723         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
>> 1724         if (err)
>> 1725                 goto out;
>> 1726
>> 1727         fdentry = ffhp->fh_dentry;
>> 1728         fdir = fdentry->d_inode;
>> 1729
>> 1730         tdentry = tfhp->fh_dentry;
>> 1731         tdir = tdentry->d_inode;
>> 1732
>> 1733         err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
>> 1734         if (ffhp->fh_export != tfhp->fh_export)
>> 1735                 goto out;
>>
>> Line 1734 compares the fh_export pointers in the file handles for
>> the "from" and "to" directories.  However, this is a bad check since
>> it's possible these pointers differ yet refer to the same underlying
>> file system.  Based on code inspection, encached export pointers are
>> only valid for 30 minutes before they expire and are reinserted through
>> calls made between the kernel and rpc.mountd.  So a race exists where the
>> ffhp->fh_export is populated on line 1720, the export cache is updated,
>> and tfhp->fh_export is set on line 1723 to a different value.
>>
>> This window can be cranked wide open by putting an "msleep(10);"
>> between lines 1722 and 1723 above, recompiling and reinstalling nfsd.ko
>> etc., and then running SVN checkouts in a loop from an NFS client.
>> Within 30 minutes, the problem is seen.
>
> Good detective work, thanks.  Have you confirmed that applying your
> change below eliminates the problem?
>
>> I'm still very new to the NFS code, but I'm wonder whether the check
>> needs to be something like this instead:
>>
>> --- vfs.c.orig  2010-11-15 14:35:18.000000000 -0600
>> +++ vfs.c       2010-11-15 14:43:52.000000000 -0600
>> @@ -1731,7 +1731,7 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>>    tdir = tdentry->d_inode;
>>
>>    err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
>> -       if (ffhp->fh_export != tfhp->fh_export)
>> +       if (fdir->i_sb != tdir->i_sb)
>>        goto out;
>>
>>    err = nfserr_perm;
>>
>> Thoughts?  I think it's either that or some locking needs to be put in
>> to guarantee that the file handles for the "to" and "from" directory
>> always wind up with the same export pointer when they belong to the
>> same file system.  I guess the other question is why there is an
>> EXDEV check at the top of nfsd_rename() instead of putting it in
>> vfs_rename().
>
> The fh_verify call is also responsible for setting the identity of the
> user, and that may be affected by export options like root_squash.  In
> the above case if ffhp was on a root_squash export, and tfhp on a
> no_root_squash export, and we get a request from uid 0, we'd end up
> with an incoming uid-0 user running as root (since the fh_verify for
> tfhp was performed last), and might allow renaming from a directory that
> only root had write permission to.
>
> It's a minor hole (especially since it's already easy for a client to
> circumvent protections that vary from export to export in the default
> no_subtree_check case).  But we should be able to do better.
>
> I'm not sure what to suggest.
>
> --b.
>
>>
>> For full disclosure, I'm actually running the SLES11 SP1 kernel,
>> linux-2.6.32.12-0.7, but I've looked at the latest upstream kernels and
>> they appear to have the same code and I assume the same issue.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: incorrect EXDEV error occasionally returned for rename over NFS
  2010-11-16 18:55 ` J. Bruce Fields
  2010-11-16 23:49   ` Joe Habermann
@ 2010-11-17 23:39   ` Joe Habermann
  2010-11-19 23:57     ` J. Bruce Fields
  1 sibling, 1 reply; 7+ messages in thread
From: Joe Habermann @ 2010-11-17 23:39 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

OK, so I think I have an approach that may work and not require major
refactoring.  The basic idea is to maintain a sequence number for the
svc_export_cache.  We snap the sequence at the top of nfsd_rename.
And we snap it again after we've called fh_verify for the "to" directory.
If the sequence number changed, then the cache changed and trying
to compare the export pointers can lead to false positives or false negatives.
In this case, throw away export and the dentry information and start over.

Other changes:
1) got rid of "inuse" in cache_detail.  It really wasn't being used.
2) Remove check for ex_path.mnt near the bottom of nfsd_rename.
Can't we assert that ffhp->fh_export == tfhp->fh_export?
3) Threw in a stat to see how often we have to retry ("rename_retries")

I'm testing with this now and so far so good but
comments/corrections/suggestions
welcome since I'm very new to this code.

Thanks, -Joe

--- ../linux-2.6.32.12-0.7/./include/linux/sunrpc/cache.h
2010-05-20 05:06:54.000000000 -0500
+++ ./include/linux/sunrpc/cache.h      2011-02-11 16:04:19.000000000 -0600
@@ -74,7 +74,8 @@ struct cache_detail {
        struct cache_head **    hash_table;
        rwlock_t                hash_lock;

-       atomic_t                inuse; /* active user-space update or lookup */
+       atomic_t                seqno; /* bumped when cache changes
+                                       * not used by all caches */

        char                    *name;
        void                    (*cache_put)(struct kref *);
--- ../linux-2.6.32.12-0.7/./net/sunrpc/cache.c 2010-05-20
05:06:54.000000000 -0500
+++ ./net/sunrpc/cache.c        2011-02-11 20:03:55.000000000 -0600
@@ -96,6 +96,7 @@ struct cache_head *sunrpc_cache_lookup(s
        *head = new;
        detail->entries++;
        cache_get(new);
+       atomic_inc(&detail->seqno);
        write_unlock(&detail->hash_lock);

        return new;
@@ -141,6 +142,7 @@ struct cache_head *sunrpc_cache_update(s
                        cache_fresh_locked(old, new->expiry_time);
                        write_unlock(&detail->hash_lock);
                        cache_fresh_unlocked(old, detail);
+                       atomic_inc(&detail->seqno);
                        return old;
                }
                write_unlock(&detail->hash_lock);
@@ -170,6 +172,7 @@ struct cache_head *sunrpc_cache_update(s
        cache_fresh_unlocked(tmp, detail);
        cache_fresh_unlocked(old, detail);
        cache_put(old, detail);
+       atomic_inc(&detail->seqno);
        return tmp;
 }
 EXPORT_SYMBOL_GPL(sunrpc_cache_update);
@@ -241,11 +244,13 @@ int cache_check(struct cache_detail *det
                                        cache_fresh_unlocked(h, detail);
                                        rv = -ENOENT;
                                }
+                               atomic_inc(&detail->seqno);
                                break;

                        case -EAGAIN:
                                clear_bit(CACHE_PENDING, &h->flags);
                                cache_revisit_request(h);
+                               atomic_inc(&detail->seqno); /* not needed? */
                                break;
                        }
                }
@@ -327,7 +332,7 @@ static void sunrpc_destroy_cache_detail(
        cache_purge(cd);
        spin_lock(&cache_list_lock);
        write_lock(&cd->hash_lock);
-       if (cd->entries || atomic_read(&cd->inuse)) {
+       if (cd->entries) {
                write_unlock(&cd->hash_lock);
                spin_unlock(&cache_list_lock);
                goto out;
--- ../linux-2.6.32.12-0.7/./fs/nfsd/vfs.c      2010-05-20
05:06:54.000000000 -0500
+++ ./fs/nfsd/vfs.c     2011-02-11 16:03:24.000000000 -0600
@@ -1717,6 +1717,8 @@ out_nfserr:
        goto out_unlock;
 }

+static atomic_t rename_retries = ATOMIC_INIT(0);
+
 /*
  * Rename a file
  * N.B. After this call _both_ ffhp and tfhp need an fh_put
@@ -1729,14 +1731,33 @@ nfsd_rename(struct svc_rqst *rqstp, stru
        struct inode    *fdir, *tdir;
        __be32          err;
        int             host_err;
+       int             seqsav;

+retry:
+       seqsav = atomic_read(&svc_export_cache.seqno);
        err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
        if (err)
                goto out;
        err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
        if (err)
                goto out;

+       /* if cache changed, the fh_export pointer comparison
+        * below is not valid, in general */
+       if (unlikely(seqsav != atomic_read(&svc_export_cache.seqno))) {
+               cache_put(&ffhp->fh_export->h, &svc_export_cache);
+               ffhp->fh_export = NULL;
+               cache_put(&tfhp->fh_export->h, &svc_export_cache);
+               tfhp->fh_export = NULL;
+               dput(ffhp->fh_dentry);
+               ffhp->fh_dentry = NULL;
+               dput(tfhp->fh_dentry);
+               tfhp->fh_dentry = NULL;
+               atomic_inc(&rename_retries);
+               goto retry;
+       }
+
        fdentry = ffhp->fh_dentry;
        fdir = fdentry->d_inode;

@@ -1785,9 +1806,6 @@ nfsd_rename(struct svc_rqst *rqstp, stru
                        goto out_dput_new;
        }

-       host_err = -EXDEV;
-       if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
-               goto out_dput_new;
        host_err = mnt_want_write(ffhp->fh_export->ex_path.mnt);
        if (host_err)
                goto out_dput_new;


On Tue, Nov 16, 2010 at 12:55 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Mon, Nov 15, 2010 at 03:46:44PM -0600, Joe Habermann wrote:
>> When running SVN checkouts over NFSv3, they occasionally fail with an
>> EXDEV error:
>>
>> (etc.)
>> U    snfs/build/Makefile.hpux
>> svn: In directory 'snfs/build'
>> svn: Can't move 'snfs/build/tempfile.tmp' to
>> 'snfs/build/Makefile.sol': Invalid cross-device link
>>
>> I've noticed that other people have reported this problem but it remains
>> unresolved.  Through some debugging, I believe I have found root cause.
>> nfsd_rename() contains the following code:
>>
>> 1707 /*
>> 1708  * Rename a file
>> 1709  * N.B. After this call _both_ ffhp and tfhp need an fh_put 1710  */
>> 1711 __be32
>> 1712 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char
>> *fname, int flen,
>> 1713                             struct svc_fh *tfhp, char *tname, int tlen)
>> 1714 {
>> 1715         struct dentry   *fdentry, *tdentry, *odentry, *ndentry, *trap;
>> 1716         struct inode    *fdir, *tdir;
>> 1717         __be32          err;
>> 1718         int             host_err;
>> 1719
>> 1720         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
>> 1721         if (err)
>> 1722                 goto out;
>> 1723         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
>> 1724         if (err)
>> 1725                 goto out;
>> 1726
>> 1727         fdentry = ffhp->fh_dentry;
>> 1728         fdir = fdentry->d_inode;
>> 1729
>> 1730         tdentry = tfhp->fh_dentry;
>> 1731         tdir = tdentry->d_inode;
>> 1732
>> 1733         err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
>> 1734         if (ffhp->fh_export != tfhp->fh_export)
>> 1735                 goto out;
>>
>> Line 1734 compares the fh_export pointers in the file handles for
>> the "from" and "to" directories.  However, this is a bad check since
>> it's possible these pointers differ yet refer to the same underlying
>> file system.  Based on code inspection, encached export pointers are
>> only valid for 30 minutes before they expire and are reinserted through
>> calls made between the kernel and rpc.mountd.  So a race exists where the
>> ffhp->fh_export is populated on line 1720, the export cache is updated,
>> and tfhp->fh_export is set on line 1723 to a different value.
>>
>> This window can be cranked wide open by putting an "msleep(10);"
>> between lines 1722 and 1723 above, recompiling and reinstalling nfsd.ko
>> etc., and then running SVN checkouts in a loop from an NFS client.
>> Within 30 minutes, the problem is seen.
>
> Good detective work, thanks.  Have you confirmed that applying your
> change below eliminates the problem?
>
>> I'm still very new to the NFS code, but I'm wonder whether the check
>> needs to be something like this instead:
>>
>> --- vfs.c.orig  2010-11-15 14:35:18.000000000 -0600
>> +++ vfs.c       2010-11-15 14:43:52.000000000 -0600
>> @@ -1731,7 +1731,7 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>>    tdir = tdentry->d_inode;
>>
>>    err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
>> -       if (ffhp->fh_export != tfhp->fh_export)
>> +       if (fdir->i_sb != tdir->i_sb)
>>        goto out;
>>
>>    err = nfserr_perm;
>>
>> Thoughts?  I think it's either that or some locking needs to be put in
>> to guarantee that the file handles for the "to" and "from" directory
>> always wind up with the same export pointer when they belong to the
>> same file system.  I guess the other question is why there is an
>> EXDEV check at the top of nfsd_rename() instead of putting it in
>> vfs_rename().
>
> The fh_verify call is also responsible for setting the identity of the
> user, and that may be affected by export options like root_squash.  In
> the above case if ffhp was on a root_squash export, and tfhp on a
> no_root_squash export, and we get a request from uid 0, we'd end up
> with an incoming uid-0 user running as root (since the fh_verify for
> tfhp was performed last), and might allow renaming from a directory that
> only root had write permission to.
>
> It's a minor hole (especially since it's already easy for a client to
> circumvent protections that vary from export to export in the default
> no_subtree_check case).  But we should be able to do better.
>
> I'm not sure what to suggest.
>
> --b.
>
>>
>> For full disclosure, I'm actually running the SLES11 SP1 kernel,
>> linux-2.6.32.12-0.7, but I've looked at the latest upstream kernels and
>> they appear to have the same code and I assume the same issue.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: incorrect EXDEV error occasionally returned for rename over NFS
  2010-11-17 23:39   ` Joe Habermann
@ 2010-11-19 23:57     ` J. Bruce Fields
  2010-11-23 20:55       ` Joe Habermann
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2010-11-19 23:57 UTC (permalink / raw)
  To: Joe Habermann; +Cc: linux-nfs

On Wed, Nov 17, 2010 at 05:39:01PM -0600, Joe Habermann wrote:
> OK, so I think I have an approach that may work and not require major
> refactoring.  The basic idea is to maintain a sequence number for the
> svc_export_cache.  We snap the sequence at the top of nfsd_rename.
> And we snap it again after we've called fh_verify for the "to" directory.
> If the sequence number changed, then the cache changed and trying
> to compare the export pointers can lead to false positives or false negatives.
> In this case, throw away export and the dentry information and start over.
> 
> Other changes:
> 1) got rid of "inuse" in cache_detail.  It really wasn't being used.
> 2) Remove check for ex_path.mnt near the bottom of nfsd_rename.
> Can't we assert that ffhp->fh_export == tfhp->fh_export?
> 3) Threw in a stat to see how often we have to retry ("rename_retries")
> 
> I'm testing with this now and so far so good but
> comments/corrections/suggestions
> welcome since I'm very new to this code.

Hm, thanks, that looks like an approach that should work.

It sure would be simpler, though if we could just solve the problem by
replacing the comparison of export pointers by a better comparison....

I understand your concern about svc_export_match--I'd need to reread
some code to figure out if we can have duplicate auth_domains in the
same way we can with exports--but if so we can always just do a strcmp()
of the ex_client->name's instead.

How about trying that?  Just define a small helper function that does
those checks, and call it from rename in place of the export
comparisons?

--b.

> 
> Thanks, -Joe
> 
> --- ../linux-2.6.32.12-0.7/./include/linux/sunrpc/cache.h
> 2010-05-20 05:06:54.000000000 -0500
> +++ ./include/linux/sunrpc/cache.h      2011-02-11 16:04:19.000000000 -0600
> @@ -74,7 +74,8 @@ struct cache_detail {
>         struct cache_head **    hash_table;
>         rwlock_t                hash_lock;
> 
> -       atomic_t                inuse; /* active user-space update or lookup */
> +       atomic_t                seqno; /* bumped when cache changes
> +                                       * not used by all caches */
> 
>         char                    *name;
>         void                    (*cache_put)(struct kref *);
> --- ../linux-2.6.32.12-0.7/./net/sunrpc/cache.c 2010-05-20
> 05:06:54.000000000 -0500
> +++ ./net/sunrpc/cache.c        2011-02-11 20:03:55.000000000 -0600
> @@ -96,6 +96,7 @@ struct cache_head *sunrpc_cache_lookup(s
>         *head = new;
>         detail->entries++;
>         cache_get(new);
> +       atomic_inc(&detail->seqno);
>         write_unlock(&detail->hash_lock);
> 
>         return new;
> @@ -141,6 +142,7 @@ struct cache_head *sunrpc_cache_update(s
>                         cache_fresh_locked(old, new->expiry_time);
>                         write_unlock(&detail->hash_lock);
>                         cache_fresh_unlocked(old, detail);
> +                       atomic_inc(&detail->seqno);
>                         return old;
>                 }
>                 write_unlock(&detail->hash_lock);
> @@ -170,6 +172,7 @@ struct cache_head *sunrpc_cache_update(s
>         cache_fresh_unlocked(tmp, detail);
>         cache_fresh_unlocked(old, detail);
>         cache_put(old, detail);
> +       atomic_inc(&detail->seqno);
>         return tmp;
>  }
>  EXPORT_SYMBOL_GPL(sunrpc_cache_update);
> @@ -241,11 +244,13 @@ int cache_check(struct cache_detail *det
>                                         cache_fresh_unlocked(h, detail);
>                                         rv = -ENOENT;
>                                 }
> +                               atomic_inc(&detail->seqno);
>                                 break;
> 
>                         case -EAGAIN:
>                                 clear_bit(CACHE_PENDING, &h->flags);
>                                 cache_revisit_request(h);
> +                               atomic_inc(&detail->seqno); /* not needed? */
>                                 break;
>                         }
>                 }
> @@ -327,7 +332,7 @@ static void sunrpc_destroy_cache_detail(
>         cache_purge(cd);
>         spin_lock(&cache_list_lock);
>         write_lock(&cd->hash_lock);
> -       if (cd->entries || atomic_read(&cd->inuse)) {
> +       if (cd->entries) {
>                 write_unlock(&cd->hash_lock);
>                 spin_unlock(&cache_list_lock);
>                 goto out;
> --- ../linux-2.6.32.12-0.7/./fs/nfsd/vfs.c      2010-05-20
> 05:06:54.000000000 -0500
> +++ ./fs/nfsd/vfs.c     2011-02-11 16:03:24.000000000 -0600
> @@ -1717,6 +1717,8 @@ out_nfserr:
>         goto out_unlock;
>  }
> 
> +static atomic_t rename_retries = ATOMIC_INIT(0);
> +
>  /*
>   * Rename a file
>   * N.B. After this call _both_ ffhp and tfhp need an fh_put
> @@ -1729,14 +1731,33 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>         struct inode    *fdir, *tdir;
>         __be32          err;
>         int             host_err;
> +       int             seqsav;
> 
> +retry:
> +       seqsav = atomic_read(&svc_export_cache.seqno);
>         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
>         if (err)
>                 goto out;
>         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
>         if (err)
>                 goto out;
> 
> +       /* if cache changed, the fh_export pointer comparison
> +        * below is not valid, in general */
> +       if (unlikely(seqsav != atomic_read(&svc_export_cache.seqno))) {
> +               cache_put(&ffhp->fh_export->h, &svc_export_cache);
> +               ffhp->fh_export = NULL;
> +               cache_put(&tfhp->fh_export->h, &svc_export_cache);
> +               tfhp->fh_export = NULL;
> +               dput(ffhp->fh_dentry);
> +               ffhp->fh_dentry = NULL;
> +               dput(tfhp->fh_dentry);
> +               tfhp->fh_dentry = NULL;
> +               atomic_inc(&rename_retries);
> +               goto retry;
> +       }
> +
>         fdentry = ffhp->fh_dentry;
>         fdir = fdentry->d_inode;
> 
> @@ -1785,9 +1806,6 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>                         goto out_dput_new;
>         }
> 
> -       host_err = -EXDEV;
> -       if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
> -               goto out_dput_new;
>         host_err = mnt_want_write(ffhp->fh_export->ex_path.mnt);
>         if (host_err)
>                 goto out_dput_new;
> 
> 
> On Tue, Nov 16, 2010 at 12:55 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Mon, Nov 15, 2010 at 03:46:44PM -0600, Joe Habermann wrote:
> >> When running SVN checkouts over NFSv3, they occasionally fail with an
> >> EXDEV error:
> >>
> >> (etc.)
> >> U    snfs/build/Makefile.hpux
> >> svn: In directory 'snfs/build'
> >> svn: Can't move 'snfs/build/tempfile.tmp' to
> >> 'snfs/build/Makefile.sol': Invalid cross-device link
> >>
> >> I've noticed that other people have reported this problem but it remains
> >> unresolved.  Through some debugging, I believe I have found root cause.
> >> nfsd_rename() contains the following code:
> >>
> >> 1707 /*
> >> 1708  * Rename a file
> >> 1709  * N.B. After this call _both_ ffhp and tfhp need an fh_put 1710  */
> >> 1711 __be32
> >> 1712 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char
> >> *fname, int flen,
> >> 1713                             struct svc_fh *tfhp, char *tname, int tlen)
> >> 1714 {
> >> 1715         struct dentry   *fdentry, *tdentry, *odentry, *ndentry, *trap;
> >> 1716         struct inode    *fdir, *tdir;
> >> 1717         __be32          err;
> >> 1718         int             host_err;
> >> 1719
> >> 1720         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
> >> 1721         if (err)
> >> 1722                 goto out;
> >> 1723         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
> >> 1724         if (err)
> >> 1725                 goto out;
> >> 1726
> >> 1727         fdentry = ffhp->fh_dentry;
> >> 1728         fdir = fdentry->d_inode;
> >> 1729
> >> 1730         tdentry = tfhp->fh_dentry;
> >> 1731         tdir = tdentry->d_inode;
> >> 1732
> >> 1733         err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
> >> 1734         if (ffhp->fh_export != tfhp->fh_export)
> >> 1735                 goto out;
> >>
> >> Line 1734 compares the fh_export pointers in the file handles for
> >> the "from" and "to" directories.  However, this is a bad check since
> >> it's possible these pointers differ yet refer to the same underlying
> >> file system.  Based on code inspection, encached export pointers are
> >> only valid for 30 minutes before they expire and are reinserted through
> >> calls made between the kernel and rpc.mountd.  So a race exists where the
> >> ffhp->fh_export is populated on line 1720, the export cache is updated,
> >> and tfhp->fh_export is set on line 1723 to a different value.
> >>
> >> This window can be cranked wide open by putting an "msleep(10);"
> >> between lines 1722 and 1723 above, recompiling and reinstalling nfsd.ko
> >> etc., and then running SVN checkouts in a loop from an NFS client.
> >> Within 30 minutes, the problem is seen.
> >
> > Good detective work, thanks.  Have you confirmed that applying your
> > change below eliminates the problem?
> >
> >> I'm still very new to the NFS code, but I'm wonder whether the check
> >> needs to be something like this instead:
> >>
> >> --- vfs.c.orig  2010-11-15 14:35:18.000000000 -0600
> >> +++ vfs.c       2010-11-15 14:43:52.000000000 -0600
> >> @@ -1731,7 +1731,7 @@ nfsd_rename(struct svc_rqst *rqstp, stru
> >>    tdir = tdentry->d_inode;
> >>
> >>    err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
> >> -       if (ffhp->fh_export != tfhp->fh_export)
> >> +       if (fdir->i_sb != tdir->i_sb)
> >>        goto out;
> >>
> >>    err = nfserr_perm;
> >>
> >> Thoughts?  I think it's either that or some locking needs to be put in
> >> to guarantee that the file handles for the "to" and "from" directory
> >> always wind up with the same export pointer when they belong to the
> >> same file system.  I guess the other question is why there is an
> >> EXDEV check at the top of nfsd_rename() instead of putting it in
> >> vfs_rename().
> >
> > The fh_verify call is also responsible for setting the identity of the
> > user, and that may be affected by export options like root_squash.  In
> > the above case if ffhp was on a root_squash export, and tfhp on a
> > no_root_squash export, and we get a request from uid 0, we'd end up
> > with an incoming uid-0 user running as root (since the fh_verify for
> > tfhp was performed last), and might allow renaming from a directory that
> > only root had write permission to.
> >
> > It's a minor hole (especially since it's already easy for a client to
> > circumvent protections that vary from export to export in the default
> > no_subtree_check case).  But we should be able to do better.
> >
> > I'm not sure what to suggest.
> >
> > --b.
> >
> >>
> >> For full disclosure, I'm actually running the SLES11 SP1 kernel,
> >> linux-2.6.32.12-0.7, but I've looked at the latest upstream kernels and
> >> they appear to have the same code and I assume the same issue.
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: incorrect EXDEV error occasionally returned for rename over NFS
  2010-11-19 23:57     ` J. Bruce Fields
@ 2010-11-23 20:55       ` Joe Habermann
  2010-11-24 15:07         ` J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Habermann @ 2010-11-23 20:55 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

Sure, I'll take another look.  I definitely would prefer a more
elegant solution.

With regard to the strcmp on ex_client->name(), I'm not sure that's 100% tight.
Suppose we have an export such as:
    *(rw,sync,no_subtree_check)
And during the race window, that export is replaced with:
    foo(rw,sync,no_subtree_check)
And it's "foo" that's performing the rename.
In this case, the export pointers will (generally) be different, as
will the auth_domain pointers, as will the string pointed to by
ex_client->name ("*" vs. "foo"), incorrectly leading to a failing
check.

-Joe

On Fri, Nov 19, 2010 at 5:57 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Wed, Nov 17, 2010 at 05:39:01PM -0600, Joe Habermann wrote:
>> OK, so I think I have an approach that may work and not require major
>> refactoring.  The basic idea is to maintain a sequence number for the
>> svc_export_cache.  We snap the sequence at the top of nfsd_rename.
>> And we snap it again after we've called fh_verify for the "to" directory.
>> If the sequence number changed, then the cache changed and trying
>> to compare the export pointers can lead to false positives or false negatives.
>> In this case, throw away export and the dentry information and start over.
>>
>> Other changes:
>> 1) got rid of "inuse" in cache_detail.  It really wasn't being used.
>> 2) Remove check for ex_path.mnt near the bottom of nfsd_rename.
>> Can't we assert that ffhp->fh_export == tfhp->fh_export?
>> 3) Threw in a stat to see how often we have to retry ("rename_retries")
>>
>> I'm testing with this now and so far so good but
>> comments/corrections/suggestions
>> welcome since I'm very new to this code.
>
> Hm, thanks, that looks like an approach that should work.
>
> It sure would be simpler, though if we could just solve the problem by
> replacing the comparison of export pointers by a better comparison....
>
> I understand your concern about svc_export_match--I'd need to reread
> some code to figure out if we can have duplicate auth_domains in the
> same way we can with exports--but if so we can always just do a strcmp()
> of the ex_client->name's instead.
>
> How about trying that?  Just define a small helper function that does
> those checks, and call it from rename in place of the export
> comparisons?
>
> --b.
>
>>
>> Thanks, -Joe
>>
>> --- ../linux-2.6.32.12-0.7/./include/linux/sunrpc/cache.h
>> 2010-05-20 05:06:54.000000000 -0500
>> +++ ./include/linux/sunrpc/cache.h      2011-02-11 16:04:19.000000000 -0600
>> @@ -74,7 +74,8 @@ struct cache_detail {
>>         struct cache_head **    hash_table;
>>         rwlock_t                hash_lock;
>>
>> -       atomic_t                inuse; /* active user-space update or lookup */
>> +       atomic_t                seqno; /* bumped when cache changes
>> +                                       * not used by all caches */
>>
>>         char                    *name;
>>         void                    (*cache_put)(struct kref *);
>> --- ../linux-2.6.32.12-0.7/./net/sunrpc/cache.c 2010-05-20
>> 05:06:54.000000000 -0500
>> +++ ./net/sunrpc/cache.c        2011-02-11 20:03:55.000000000 -0600
>> @@ -96,6 +96,7 @@ struct cache_head *sunrpc_cache_lookup(s
>>         *head = new;
>>         detail->entries++;
>>         cache_get(new);
>> +       atomic_inc(&detail->seqno);
>>         write_unlock(&detail->hash_lock);
>>
>>         return new;
>> @@ -141,6 +142,7 @@ struct cache_head *sunrpc_cache_update(s
>>                         cache_fresh_locked(old, new->expiry_time);
>>                         write_unlock(&detail->hash_lock);
>>                         cache_fresh_unlocked(old, detail);
>> +                       atomic_inc(&detail->seqno);
>>                         return old;
>>                 }
>>                 write_unlock(&detail->hash_lock);
>> @@ -170,6 +172,7 @@ struct cache_head *sunrpc_cache_update(s
>>         cache_fresh_unlocked(tmp, detail);
>>         cache_fresh_unlocked(old, detail);
>>         cache_put(old, detail);
>> +       atomic_inc(&detail->seqno);
>>         return tmp;
>>  }
>>  EXPORT_SYMBOL_GPL(sunrpc_cache_update);
>> @@ -241,11 +244,13 @@ int cache_check(struct cache_detail *det
>>                                         cache_fresh_unlocked(h, detail);
>>                                         rv = -ENOENT;
>>                                 }
>> +                               atomic_inc(&detail->seqno);
>>                                 break;
>>
>>                         case -EAGAIN:
>>                                 clear_bit(CACHE_PENDING, &h->flags);
>>                                 cache_revisit_request(h);
>> +                               atomic_inc(&detail->seqno); /* not needed? */
>>                                 break;
>>                         }
>>                 }
>> @@ -327,7 +332,7 @@ static void sunrpc_destroy_cache_detail(
>>         cache_purge(cd);
>>         spin_lock(&cache_list_lock);
>>         write_lock(&cd->hash_lock);
>> -       if (cd->entries || atomic_read(&cd->inuse)) {
>> +       if (cd->entries) {
>>                 write_unlock(&cd->hash_lock);
>>                 spin_unlock(&cache_list_lock);
>>                 goto out;
>> --- ../linux-2.6.32.12-0.7/./fs/nfsd/vfs.c      2010-05-20
>> 05:06:54.000000000 -0500
>> +++ ./fs/nfsd/vfs.c     2011-02-11 16:03:24.000000000 -0600
>> @@ -1717,6 +1717,8 @@ out_nfserr:
>>         goto out_unlock;
>>  }
>>
>> +static atomic_t rename_retries = ATOMIC_INIT(0);
>> +
>>  /*
>>   * Rename a file
>>   * N.B. After this call _both_ ffhp and tfhp need an fh_put
>> @@ -1729,14 +1731,33 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>>         struct inode    *fdir, *tdir;
>>         __be32          err;
>>         int             host_err;
>> +       int             seqsav;
>>
>> +retry:
>> +       seqsav = atomic_read(&svc_export_cache.seqno);
>>         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
>>         if (err)
>>                 goto out;
>>         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
>>         if (err)
>>                 goto out;
>>
>> +       /* if cache changed, the fh_export pointer comparison
>> +        * below is not valid, in general */
>> +       if (unlikely(seqsav != atomic_read(&svc_export_cache.seqno))) {
>> +               cache_put(&ffhp->fh_export->h, &svc_export_cache);
>> +               ffhp->fh_export = NULL;
>> +               cache_put(&tfhp->fh_export->h, &svc_export_cache);
>> +               tfhp->fh_export = NULL;
>> +               dput(ffhp->fh_dentry);
>> +               ffhp->fh_dentry = NULL;
>> +               dput(tfhp->fh_dentry);
>> +               tfhp->fh_dentry = NULL;
>> +               atomic_inc(&rename_retries);
>> +               goto retry;
>> +       }
>> +
>>         fdentry = ffhp->fh_dentry;
>>         fdir = fdentry->d_inode;
>>
>> @@ -1785,9 +1806,6 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>>                         goto out_dput_new;
>>         }
>>
>> -       host_err = -EXDEV;
>> -       if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
>> -               goto out_dput_new;
>>         host_err = mnt_want_write(ffhp->fh_export->ex_path.mnt);
>>         if (host_err)
>>                 goto out_dput_new;
>>
>>
>> On Tue, Nov 16, 2010 at 12:55 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> > On Mon, Nov 15, 2010 at 03:46:44PM -0600, Joe Habermann wrote:
>> >> When running SVN checkouts over NFSv3, they occasionally fail with an
>> >> EXDEV error:
>> >>
>> >> (etc.)
>> >> U    snfs/build/Makefile.hpux
>> >> svn: In directory 'snfs/build'
>> >> svn: Can't move 'snfs/build/tempfile.tmp' to
>> >> 'snfs/build/Makefile.sol': Invalid cross-device link
>> >>
>> >> I've noticed that other people have reported this problem but it remains
>> >> unresolved.  Through some debugging, I believe I have found root cause.
>> >> nfsd_rename() contains the following code:
>> >>
>> >> 1707 /*
>> >> 1708  * Rename a file
>> >> 1709  * N.B. After this call _both_ ffhp and tfhp need an fh_put 1710  */
>> >> 1711 __be32
>> >> 1712 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char
>> >> *fname, int flen,
>> >> 1713                             struct svc_fh *tfhp, char *tname, int tlen)
>> >> 1714 {
>> >> 1715         struct dentry   *fdentry, *tdentry, *odentry, *ndentry, *trap;
>> >> 1716         struct inode    *fdir, *tdir;
>> >> 1717         __be32          err;
>> >> 1718         int             host_err;
>> >> 1719
>> >> 1720         err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
>> >> 1721         if (err)
>> >> 1722                 goto out;
>> >> 1723         err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
>> >> 1724         if (err)
>> >> 1725                 goto out;
>> >> 1726
>> >> 1727         fdentry = ffhp->fh_dentry;
>> >> 1728         fdir = fdentry->d_inode;
>> >> 1729
>> >> 1730         tdentry = tfhp->fh_dentry;
>> >> 1731         tdir = tdentry->d_inode;
>> >> 1732
>> >> 1733         err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
>> >> 1734         if (ffhp->fh_export != tfhp->fh_export)
>> >> 1735                 goto out;
>> >>
>> >> Line 1734 compares the fh_export pointers in the file handles for
>> >> the "from" and "to" directories.  However, this is a bad check since
>> >> it's possible these pointers differ yet refer to the same underlying
>> >> file system.  Based on code inspection, encached export pointers are
>> >> only valid for 30 minutes before they expire and are reinserted through
>> >> calls made between the kernel and rpc.mountd.  So a race exists where the
>> >> ffhp->fh_export is populated on line 1720, the export cache is updated,
>> >> and tfhp->fh_export is set on line 1723 to a different value.
>> >>
>> >> This window can be cranked wide open by putting an "msleep(10);"
>> >> between lines 1722 and 1723 above, recompiling and reinstalling nfsd.ko
>> >> etc., and then running SVN checkouts in a loop from an NFS client.
>> >> Within 30 minutes, the problem is seen.
>> >
>> > Good detective work, thanks.  Have you confirmed that applying your
>> > change below eliminates the problem?
>> >
>> >> I'm still very new to the NFS code, but I'm wonder whether the check
>> >> needs to be something like this instead:
>> >>
>> >> --- vfs.c.orig  2010-11-15 14:35:18.000000000 -0600
>> >> +++ vfs.c       2010-11-15 14:43:52.000000000 -0600
>> >> @@ -1731,7 +1731,7 @@ nfsd_rename(struct svc_rqst *rqstp, stru
>> >>    tdir = tdentry->d_inode;
>> >>
>> >>    err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
>> >> -       if (ffhp->fh_export != tfhp->fh_export)
>> >> +       if (fdir->i_sb != tdir->i_sb)
>> >>        goto out;
>> >>
>> >>    err = nfserr_perm;
>> >>
>> >> Thoughts?  I think it's either that or some locking needs to be put in
>> >> to guarantee that the file handles for the "to" and "from" directory
>> >> always wind up with the same export pointer when they belong to the
>> >> same file system.  I guess the other question is why there is an
>> >> EXDEV check at the top of nfsd_rename() instead of putting it in
>> >> vfs_rename().
>> >
>> > The fh_verify call is also responsible for setting the identity of the
>> > user, and that may be affected by export options like root_squash.  In
>> > the above case if ffhp was on a root_squash export, and tfhp on a
>> > no_root_squash export, and we get a request from uid 0, we'd end up
>> > with an incoming uid-0 user running as root (since the fh_verify for
>> > tfhp was performed last), and might allow renaming from a directory that
>> > only root had write permission to.
>> >
>> > It's a minor hole (especially since it's already easy for a client to
>> > circumvent protections that vary from export to export in the default
>> > no_subtree_check case).  But we should be able to do better.
>> >
>> > I'm not sure what to suggest.
>> >
>> > --b.
>> >
>> >>
>> >> For full disclosure, I'm actually running the SLES11 SP1 kernel,
>> >> linux-2.6.32.12-0.7, but I've looked at the latest upstream kernels and
>> >> they appear to have the same code and I assume the same issue.
>> >
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: incorrect EXDEV error occasionally returned for rename over NFS
  2010-11-23 20:55       ` Joe Habermann
@ 2010-11-24 15:07         ` J. Bruce Fields
  0 siblings, 0 replies; 7+ messages in thread
From: J. Bruce Fields @ 2010-11-24 15:07 UTC (permalink / raw)
  To: Joe Habermann; +Cc: linux-nfs

On Tue, Nov 23, 2010 at 02:55:24PM -0600, Joe Habermann wrote:
> Sure, I'll take another look.  I definitely would prefer a more
> elegant solution.
> 
> With regard to the strcmp on ex_client->name(), I'm not sure that's 100% tight.
> Suppose we have an export such as:
>     *(rw,sync,no_subtree_check)
> And during the race window, that export is replaced with:
>     foo(rw,sync,no_subtree_check)
> And it's "foo" that's performing the rename.
> In this case, the export pointers will (generally) be different, as
> will the auth_domain pointers, as will the string pointed to by
> ex_client->name ("*" vs. "foo"), incorrectly leading to a failing
> check.

You're devious!  We need you reading more of the NFS code....

But I think we're both being stupid here: both exports are looked up
using the same auth_domain (rq_client), and it's really the filesystems
(or filesystem subtrees) that we want to compare anyway.

So just ditch the ex_client comparison and compare the two
(vfsmount,dentry) pairs.

(There may still be an odd race or two; e.g. if someone does

	mount --bind /export/path /export/path
	exportfs -f

between the two export lookups, could we end up with two different
vfsmounts that really could be treated as the same?  But I don't think
we need to support that.

Other alternatives would be to compare something like the fs-identifying
part of the filehandle, or the path. (Probably svc_export_match() should
be strcmp()'ing paths anyway--paths are what userspace deals in anyway,
so if we're ever in a situation where it makes a difference, and there
are two (mnt,dentry) pairs that alias to the same path, then we end up
with a cache request that userspace can never respond to.))

--b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-11-24 15:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-15 21:46 incorrect EXDEV error occasionally returned for rename over NFS Joe Habermann
2010-11-16 18:55 ` J. Bruce Fields
2010-11-16 23:49   ` Joe Habermann
2010-11-17 23:39   ` Joe Habermann
2010-11-19 23:57     ` J. Bruce Fields
2010-11-23 20:55       ` Joe Habermann
2010-11-24 15:07         ` J. Bruce Fields

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).