From: Jim Vanns <james.vanns@framestore.com>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Jim Vanns <james.vanns@framestore.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: Your comments, guidance, advice please :)
Date: Tue, 14 Aug 2012 17:32:39 +0100 [thread overview]
Message-ID: <1344961959.8801.3.camel@sys367.ldn.framestore.com> (raw)
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA939D69C@SACEXCMBX04-PRD.hq.netapp.com>
[-- Attachment #1: Type: text/plain, Size: 2554 bytes --]
On Mon, 2012-08-13 at 16:40 +0000, Myklebust, Trond wrote:
> On Mon, 2012-08-13 at 15:55 +0100, Jim Vanns wrote:
> > Hello NFS hackers. First off, fear not - the attached patch is not
> > something I wish to submit to the mainline kernel! However, it is
> > important for me that you pass judgement or comment on it. It is small.
> >
> > Basically, I've written the patch solely to workaround a Bluearc bug
> > where it duplicates fileids within an fsid and therefore we're not able
> > to rely on the fsid+fileid to identify distinct files in an NFS
> > filesystem. Some of our storage indexing and reporting software relies
> > on this and works happily with other, more RFC-adherent
> > implementations ;)
> >
> > The functional change is one that modified the received fileid to a hash
> > of the file handle as that, thankfully, is still unique. As with a
> > fileid I need this hash to remain consistent for the lifetime of a file.
> > It is used as a unique identifier in a database.
> >
> > I'd really appreciate it if you could let me know of any problems you
> > see with it - whether it'll break some client-side code, hash table use
> > or worse still send back bad data to the server.
> >
> > I've modified what I can see as the least amount of code possible - and
> > my test VM is working happily as a client with this patch. It is
> > intended that the patch modifies only client-side code once the Bluearc
> > RPCs are pulled off the wire. I never want to send back these modified
> > fileids to the server.
>
> READDIR and READDIRPLUS will continue to return the fileid from the
> server, so the getdents() and readdir() syscalls will be broken. Since
> READDIRPLUS does return the filehandle, you might be able to fix that
> up, but plain READDIR would appear to be unfixable.
To this end then, I've modified my patch so that within
nfs_refresh_inode() itself I do the following:
fattr->fileid = nfs_fh_hash(NFS_FH(inode));
Before the spin lock is taken. Full patch attached again for context.
Thanks again,
Jim
> Otherwise, your strategy should in principle be OK, but with the caveat
> that a hash does not suffice to completely prevent collisions even if it
> is well chosen.
> IOW: All you are doing is tweaking the probability of a collision.
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
>
> NrybXǧv^){.n+{"^nrz\x1ah&\x1eGh\x03(階ݢj"\x1a^[mzޖfh~m
--
Jim Vanns
Systems Programmer
Framestore
[-- Attachment #2: fscfc-nfs-fileid-hash-2.patch --]
[-- Type: text/x-patch, Size: 3452 bytes --]
--- linux-2.6.32/fs/nfs/inode.c.orig 2012-08-13 11:24:41.902452726 +0100
+++ linux-2.6.32/fs/nfs/inode.c 2012-08-14 14:04:37.905712353 +0100
@@ -60,6 +60,52 @@ static int nfs_update_inode(struct inode
static struct kmem_cache * nfs_inode_cachep;
+/*
+ * FSCFC:
+ *
+ * Implementations of FNV-1 hash function:
+ * http://isthe.com/chongo/tech/comp/fnv
+ * http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash#Notes
+ */
+static inline uint32_t
+nfs_fh_hash32(struct nfs_fh *fh)
+{
+ static const uint32_t seed = 2166136261U,
+ prime = 16777619U;
+ uint32_t hash = seed, i = 0;
+
+ while (i < fh->size) {
+ hash = (hash * prime) ^ fh->data[i];
+ ++i;
+ }
+
+ return hash;
+}
+
+static inline uint64_t
+nfs_fh_hash64(struct nfs_fh *fh)
+{
+ static const uint64_t seed = 14695981039346656037U,
+ prime = 1099511628211U;
+ uint64_t hash = seed, i = 0;
+
+ while (i < fh->size) {
+ hash = (hash * prime) ^ fh->data[i];
+ ++i;
+ }
+
+ return hash;
+}
+
+static inline unsigned long
+nfs_fh_hash(struct nfs_fh *fh)
+{
+ if (sizeof(unsigned long) == 4)
+ return nfs_fh_hash32(fh);
+
+ return nfs_fh_hash64(fh);
+}
+
static inline unsigned long
nfs_fattr_to_ino_t(struct nfs_fattr *fattr)
{
@@ -268,7 +314,28 @@ nfs_fhget(struct super_block *sb, struct
if ((fattr->valid & NFS_ATTR_FATTR_TYPE) == 0)
goto out_no_inode;
- hash = nfs_fattr_to_ino_t(fattr);
+ /*
+ * FSCFC:
+ *
+ * This patch exists solely to work around the Bluearc duplicate inode/NFS
+ * fileid bug. On Bluearc filesystems a distinct, non-hardlinked file or
+ * directory appears to share the same fsid + fileid with other completely
+ * unrelated files elsewhere in the hierarchy. This becomes a problem for
+ * any system dependent on the commonly accpepted notion of the NFSv3 fsid
+ * and fileid uniquely identifying a single file/directory within an NFS
+ * filesystem. Thankfully, the NFS file handle for any duplications are
+ * still unique :)
+ *
+ * We must update *both* the hash value (was the i_ino/fileid as returned
+ * by nfs_fattr_to_ino_t() above) and fileid here as it is used as the key
+ * in the inode cache maintained within iget5_locked() below.
+ *
+ * We set fattr->fileid to 'hash' to because NFS_FILEID and set_nfs_fileid()
+ * just copy/return 'fileid' from this structure which the server has
+ * already sent as the inode on it's filesystem as you'd expect. This is
+ * what we overwrite - client side only.
+ */
+ fattr->fileid = hash = nfs_fh_hash(fh); // JV alternate code
inode = iget5_locked(sb, hash, nfs_find_actor, nfs_init_locked, &desc);
if (inode == NULL) {
@@ -907,7 +974,6 @@ static int nfs_check_inode_attributes(st
loff_t cur_size, new_isize;
unsigned long invalid = 0;
-
/* Has the inode gone and changed behind our back? */
if ((fattr->valid & NFS_ATTR_FATTR_FILEID) && nfsi->fileid != fattr->fileid)
return -EIO;
@@ -1036,6 +1102,12 @@ int nfs_refresh_inode(struct inode *inod
if ((fattr->valid & NFS_ATTR_FATTR) == 0)
return 0;
+
+ /*
+ * FSCFC: Compare against the hashed file handle, not the fileid
+ */
+ fattr->fileid = nfs_fh_hash(NFS_FH(inode)); // JV alternate code
+
spin_lock(&inode->i_lock);
status = nfs_refresh_inode_locked(inode, fattr);
spin_unlock(&inode->i_lock);
prev parent reply other threads:[~2012-08-14 16:32 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-13 14:55 Your comments, guidance, advice please :) Jim Vanns
2012-08-13 16:40 ` Myklebust, Trond
2012-08-13 16:51 ` Jim Vanns
2012-08-13 17:28 ` J. Bruce Fields
2012-08-14 16:32 ` Jim Vanns [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1344961959.8801.3.camel@sys367.ldn.framestore.com \
--to=james.vanns@framestore.com \
--cc=Trond.Myklebust@netapp.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).