* [PATCH 3 1/4] Add new FMODE flags: FMODE_32bithash and FMODE_64bithash
2011-08-16 11:54 [PATCH 0/2] 32/64 bit llseek hashes (v3) Bernd Schubert
@ 2011-08-16 11:54 ` Bernd Schubert
2011-08-16 11:54 ` [PATCH 3 2/4] Return 32/64-bit dir name hash according to usage type Bernd Schubert
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2011-08-16 11:54 UTC (permalink / raw)
To: linux-nfs, linux-ext4
Cc: bfields, tytso, bernd.schubert, hch, adilger, yong.fan,
linux-fsdevel
Those flags are supposed to be set by NFS readdir() to tell ext3/ext4
to use 32-bit (NFSv2) or 64-bit hash values (offsets) in seekdir().
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
include/linux/fs.h | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 178cdb4..18d40ae 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -91,6 +91,11 @@ struct inodes_stat_t {
/* File is opened using open(.., 3, ..) and is writeable only for ioctls
(specialy hack for floppy.c) */
#define FMODE_WRITE_IOCTL ((__force fmode_t)0x100)
+/* 32bit hashes as llseek() offset (for directories) */
+#define FMODE_32BITHASH ((__force fmode_t)0x200)
+/* 64bit hashes as llseek() offset (for directories) */
+#define FMODE_64BITHASH ((__force fmode_t)0x400)
+
/*
* Don't update ctime and mtime.
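A minimal user-space sketch of how a filesystem might act on these two flags. The flag values are taken from the patch above; treating fmode_t as a plain unsigned int, the helper name wants_32bit_hash() and its caller_is_32bit parameter are assumptions made only for illustration.

```c
#include <stdbool.h>

/* Stand-in for the kernel's fmode_t; an assumption for this sketch. */
typedef unsigned int fmode_t;

/* Flag values as defined in the patch above. */
#define FMODE_32BITHASH ((fmode_t)0x200)
#define FMODE_64BITHASH ((fmode_t)0x400)

/*
 * Hypothetical helper: decide whether directory offsets should be
 * limited to 32 bits. An explicit 32-bit flag wins, then an explicit
 * 64-bit flag; with neither set, fall back to what the caller's ABI
 * can handle.
 */
bool wants_32bit_hash(fmode_t mode, bool caller_is_32bit)
{
	if (mode & FMODE_32BITHASH)
		return true;
	if (mode & FMODE_64BITHASH)
		return false;
	return caller_is_32bit;
}
```

With neither flag set (a plain local open), the result depends only on the caller, which is the case the ext4 patch later in this series handles via is_32bit_api().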
* [PATCH 3 2/4] Return 32/64-bit dir name hash according to usage type
2011-08-16 11:54 [PATCH 0/2] 32/64 bit llseek hashes (v3) Bernd Schubert
2011-08-16 11:54 ` [PATCH 3 1/4] Add new FMODE flags: FMODE_32bithash and FMODE_64bithash Bernd Schubert
@ 2011-08-16 11:54 ` Bernd Schubert
2011-08-19 22:29 ` Ted Ts'o
2011-08-16 11:54 ` [PATCH 3 3/4] nfsd_open(): rename 'int access' to 'int may_flags' in nfsd_open() Bernd Schubert
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Bernd Schubert @ 2011-08-16 11:54 UTC (permalink / raw)
To: linux-nfs, linux-ext4
Cc: bfields, tytso, bernd.schubert, hch, adilger, yong.fan,
linux-fsdevel
From: Fan Yong <yong.fan@whamcloud.com>
Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir(). However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.
Allow ext4 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions. This still needs
integration on the NFS side.
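To illustrate the idea, here is a stand-alone sketch of packing the major and minor hash into one offset and unpacking it again. It follows the shape of the hash2pos()/pos2maj_hash()/pos2min_hash() helpers in the diff, but takes a plain use_32bit boolean instead of a struct file, which is an assumption of this sketch and not how the kernel code is called.

```c
#include <stdint.h>
#include <stdbool.h>

/* Pack a major/minor hash pair into a seek offset (readdir cookie). */
int64_t hash2pos(bool use_32bit, uint32_t major, uint32_t minor)
{
	if (use_32bit)
		return major >> 1;	/* minor hash is dropped */
	return (int64_t)(((uint64_t)(major >> 1) << 32) | minor);
}

/* Recover the major hash; its lowest bit is always returned as 0. */
uint32_t pos2maj_hash(bool use_32bit, int64_t pos)
{
	if (use_32bit)
		return (uint32_t)(((uint64_t)pos << 1) & 0xffffffff);
	return (uint32_t)((((uint64_t)pos >> 32) << 1) & 0xffffffff);
}

/* Recover the minor hash; not available in 32-bit mode. */
uint32_t pos2min_hash(bool use_32bit, int64_t pos)
{
	if (use_32bit)
		return 0;
	return (uint32_t)((uint64_t)pos & 0xffffffff);
}
```

The right shift by one keeps the offset non-negative when stored in a signed loff_t. It loses nothing for major hashes produced by ext4fs_dirhash(), since that function clears the lowest bit (hash & ~1). In 32-bit mode two names with the same major hash get the same offset, which is the collision case described above; in 64-bit mode the minor hash distinguishes them.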
Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
(blame me if something is not correct)
Signed-off-by: Fan Yong <yong.fan@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
fs/ext4/dir.c | 185 ++++++++++++++++++++++++++++++++++++++++++++------------
fs/ext4/ext4.h | 6 ++
fs/ext4/hash.c | 4 +
3 files changed, 154 insertions(+), 41 deletions(-)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 164c560..cc47087 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -32,24 +32,8 @@ static unsigned char ext4_filetype_table[] = {
DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
};
-static int ext4_readdir(struct file *, void *, filldir_t);
static int ext4_dx_readdir(struct file *filp,
void *dirent, filldir_t filldir);
-static int ext4_release_dir(struct inode *inode,
- struct file *filp);
-
-const struct file_operations ext4_dir_operations = {
- .llseek = ext4_llseek,
- .read = generic_read_dir,
- .readdir = ext4_readdir, /* we take BKL. needed?*/
- .unlocked_ioctl = ext4_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = ext4_compat_ioctl,
-#endif
- .fsync = ext4_sync_file,
- .release = ext4_release_dir,
-};
-
static unsigned char get_dtype(struct super_block *sb, int filetype)
{
@@ -254,22 +238,134 @@ out:
return ret;
}
+static inline int is_32bit_api(void)
+{
+#ifdef HAVE_IS_COMPAT_TASK
+ return is_compat_task();
+#else
+ return (BITS_PER_LONG == 32);
+#endif
+}
+
/*
* These functions convert from the major/minor hash to an f_pos
- * value.
+ * value for dx directories
+ *
+ * Upper layer (for example NFS) should specify FMODE_32BITHASH or
+ * FMODE_64BITHASH explicitly. However, ext4 can also be mounted directly
+ * on both 32-bit and 64-bit nodes, in which case neither
+ * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
+ */
+static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return major >> 1;
+ else
+ return ((__u64)(major >> 1) << 32) | (__u64)minor;
+}
+
+static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return (pos << 1) & 0xffffffff;
+ else
+ return ((pos >> 32) << 1) & 0xffffffff;
+}
+
+static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return 0;
+ else
+ return pos & 0xffffffff;
+}
+
+/*
+ * Return 32- or 64-bit end-of-file for dx directories
+ */
+static inline loff_t ext4_get_htree_eof(struct file *filp)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return EXT4_HTREE_EOF_32BIT;
+ else
+ return EXT4_HTREE_EOF_64BIT;
+}
+
+
+/*
+ * ext4_dir_llseek() based on generic_file_llseek() to handle both
+ * non-htree and htree directories, where the "offset" is in terms
+ * of the filename hash value instead of the byte offset.
*
- * Currently we only use major hash numer. This is unfortunate, but
- * on 32-bit machines, the same VFS interface is used for lseek and
- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
- * lseek/telldir/seekdir will blow out spectacularly, and from within
- * the ext2 low-level routine, we don't know if we're being called by
- * a 64-bit version of the system call or the 32-bit version of the
- * system call. Worse yet, NFSv2 only allows for a 32-bit readdir
- * cookie. Sigh.
+ * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
+ * become invalid once the directory has been converted into a dx directory
*/
-#define hash2pos(major, minor) (major >> 1)
-#define pos2maj_hash(pos) ((pos << 1) & 0xffffffff)
-#define pos2min_hash(pos) (0)
+loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+ struct inode *inode = file->f_mapping->host;
+ loff_t ret = -EINVAL;
+ int is_dx_dir = ext4_test_inode_flag(inode, EXT4_INODE_INDEX);
+
+ mutex_lock(&inode->i_mutex);
+
+ /* NOTE: relative offsets with dx directories might not work
+ * as expected, as it is difficult to figure out the
+ * correct offset between dx hashes */
+
+ switch (origin) {
+ case SEEK_END:
+ if (unlikely(offset > 0))
+ goto out_err; /* not supported for directories */
+
+ /* so only negative offsets are left, does that have a
+ * meaning for directories at all? */
+ if (is_dx_dir)
+ offset += ext4_get_htree_eof(file);
+ else
+ offset += inode->i_size;
+ break;
+ case SEEK_CUR:
+ /*
+ * Here we special-case the lseek(fd, 0, SEEK_CUR)
+ * position-querying operation. Avoid rewriting the "same"
+ * f_pos value back to the file because a concurrent read(),
+ * write() or lseek() might have altered it
+ */
+ if (offset == 0) {
+ offset = file->f_pos;
+ goto out_ok;
+ }
+
+ offset += file->f_pos;
+ break;
+ }
+
+ if (unlikely(offset < 0))
+ goto out_err;
+
+ if (!is_dx_dir) {
+ if (offset > inode->i_sb->s_maxbytes)
+ goto out_err;
+ } else if (offset > ext4_get_htree_eof(file))
+ goto out_err;
+
+ /* Special lock needed here? */
+ if (offset != file->f_pos) {
+ file->f_pos = offset;
+ file->f_version = 0;
+ }
+
+out_ok:
+ ret = offset;
+out_err:
+ mutex_unlock(&inode->i_mutex);
+
+ return ret;
+}
/*
* This structure holds the nodes of the red-black tree used to store
@@ -330,15 +426,16 @@ static void free_rb_tree_fname(struct rb_root *root)
}
-static struct dir_private_info *ext4_htree_create_dir_info(loff_t pos)
+static struct dir_private_info *ext4_htree_create_dir_info(struct file *filp,
+ loff_t pos)
{
struct dir_private_info *p;
p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
if (!p)
return NULL;
- p->curr_hash = pos2maj_hash(pos);
- p->curr_minor_hash = pos2min_hash(pos);
+ p->curr_hash = pos2maj_hash(filp, pos);
+ p->curr_minor_hash = pos2min_hash(filp, pos);
return p;
}
@@ -429,7 +526,7 @@ static int call_filldir(struct file *filp, void *dirent,
"null fname?!?\n");
return 0;
}
- curr_pos = hash2pos(fname->hash, fname->minor_hash);
+ curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
while (fname) {
error = filldir(dirent, fname->name,
fname->name_len, curr_pos,
@@ -454,13 +551,13 @@ static int ext4_dx_readdir(struct file *filp,
int ret;
if (!info) {
- info = ext4_htree_create_dir_info(filp->f_pos);
+ info = ext4_htree_create_dir_info(filp, filp->f_pos);
if (!info)
return -ENOMEM;
filp->private_data = info;
}
- if (filp->f_pos == EXT4_HTREE_EOF)
+ if (filp->f_pos == ext4_get_htree_eof(filp))
return 0; /* EOF */
/* Some one has messed with f_pos; reset the world */
@@ -468,8 +565,8 @@ static int ext4_dx_readdir(struct file *filp,
free_rb_tree_fname(&info->root);
info->curr_node = NULL;
info->extra_fname = NULL;
- info->curr_hash = pos2maj_hash(filp->f_pos);
- info->curr_minor_hash = pos2min_hash(filp->f_pos);
+ info->curr_hash = pos2maj_hash(filp, filp->f_pos);
+ info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
}
/*
@@ -501,7 +598,7 @@ static int ext4_dx_readdir(struct file *filp,
if (ret < 0)
return ret;
if (ret == 0) {
- filp->f_pos = EXT4_HTREE_EOF;
+ filp->f_pos = ext4_get_htree_eof(filp);
break;
}
info->curr_node = rb_first(&info->root);
@@ -521,7 +618,7 @@ static int ext4_dx_readdir(struct file *filp,
info->curr_minor_hash = fname->minor_hash;
} else {
if (info->next_hash == ~0) {
- filp->f_pos = EXT4_HTREE_EOF;
+ filp->f_pos = ext4_get_htree_eof(filp);
break;
}
info->curr_hash = info->next_hash;
@@ -540,3 +637,15 @@ static int ext4_release_dir(struct inode *inode, struct file *filp)
return 0;
}
+
+const struct file_operations ext4_dir_operations = {
+ .llseek = ext4_dir_llseek,
+ .read = generic_read_dir,
+ .readdir = ext4_readdir,
+ .unlocked_ioctl = ext4_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ext4_compat_ioctl,
+#endif
+ .fsync = ext4_sync_file,
+ .release = ext4_release_dir,
+};
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e717dfd..31d9ba0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1560,7 +1560,11 @@ struct dx_hash_info
u32 *seed;
};
-#define EXT4_HTREE_EOF 0x7fffffff
+
+/* 32 and 64 bit signed EOF for dx directories */
+#define EXT4_HTREE_EOF_32BIT ((1UL << (32 - 1)) - 1)
+#define EXT4_HTREE_EOF_64BIT ((1ULL << (64 - 1)) - 1)
+
/*
* Control parameters used by ext4_htree_next_block
diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
index ac8f168..fa8e491 100644
--- a/fs/ext4/hash.c
+++ b/fs/ext4/hash.c
@@ -200,8 +200,8 @@ int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
return -1;
}
hash = hash & ~1;
- if (hash == (EXT4_HTREE_EOF << 1))
- hash = (EXT4_HTREE_EOF-1) << 1;
+ if (hash == (EXT4_HTREE_EOF_32BIT << 1))
+ hash = (EXT4_HTREE_EOF_32BIT - 1) << 1;
hinfo->hash = hash;
hinfo->minor_hash = minor_hash;
return 0;
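The interaction between the two EOF markers, the range check in ext4_dir_llseek() and the clamp in ext4fs_dirhash() can be sketched in user space as follows. The macro definitions are copied from the ext4.h hunk above; the function names htree_eof(), dx_seek_set() and clamp_major_hash(), the use_32bit parameter and the -1 error return are hypothetical simplifications, not the kernel interfaces.

```c
#include <stdint.h>
#include <stdbool.h>

/* EOF markers as defined in the fs/ext4/ext4.h hunk above. */
#define EXT4_HTREE_EOF_32BIT ((1UL << (32 - 1)) - 1)
#define EXT4_HTREE_EOF_64BIT ((1ULL << (64 - 1)) - 1)

/* Largest valid offset for an htree (dx) directory. */
int64_t htree_eof(bool use_32bit)
{
	return use_32bit ? (int64_t)EXT4_HTREE_EOF_32BIT
			 : (int64_t)EXT4_HTREE_EOF_64BIT;
}

/*
 * Simplified SEEK_SET range check for a dx directory: returns the
 * new offset, or -1 where the kernel code would return -EINVAL.
 */
int64_t dx_seek_set(bool use_32bit, int64_t offset)
{
	if (offset < 0 || offset > htree_eof(use_32bit))
		return -1;
	return offset;
}

/*
 * Mirror of the clamp in ext4fs_dirhash(): clear the lowest bit and
 * make sure (hash >> 1) can never equal the 32-bit EOF marker.
 */
uint32_t clamp_major_hash(uint32_t hash)
{
	hash &= ~1u;
	if (hash == (uint32_t)(EXT4_HTREE_EOF_32BIT << 1))
		hash = (uint32_t)((EXT4_HTREE_EOF_32BIT - 1) << 1);
	return hash;
}
```

Only the 32-bit marker needs the clamp: after it, the shifted major hash is at most 0x7ffffffe, so a packed 64-bit offset always stays below EXT4_HTREE_EOF_64BIT.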
* Re: [PATCH 3 2/4] Return 32/64-bit dir name hash according to usage type
2011-08-16 11:54 ` [PATCH 3 2/4] Return 32/64-bit dir name hash according to usage type Bernd Schubert
@ 2011-08-19 22:29 ` Ted Ts'o
2011-08-20 6:23 ` Andreas Dilger
0 siblings, 1 reply; 10+ messages in thread
From: Ted Ts'o @ 2011-08-19 22:29 UTC (permalink / raw)
To: Bernd Schubert
Cc: linux-nfs, linux-ext4, bfields, bernd.schubert, hch, adilger,
yong.fan, linux-fsdevel
On Tue, Aug 16, 2011 at 01:54:14PM +0200, Bernd Schubert wrote:
> +static inline int is_32bit_api(void)
> +{
> +#ifdef HAVE_IS_COMPAT_TASK
> + return is_compat_task();
> +#else
> + return (BITS_PER_LONG == 32);
> +#endif
I assume is_compat_task() is coming from another patch? What is the
status of that change?
In the case where is_compat_task() is not defined, we can't just test
based on BITS_PER_LONG == 32, since even on an x86_64 machine, it's
possible we're running a 32-bit binary in compat mode....
- Ted
* Re: [PATCH 3 2/4] Return 32/64-bit dir name hash according to usage type
2011-08-19 22:29 ` Ted Ts'o
@ 2011-08-20 6:23 ` Andreas Dilger
2011-08-30 22:07 ` Bernd Schubert
0 siblings, 1 reply; 10+ messages in thread
From: Andreas Dilger @ 2011-08-20 6:23 UTC (permalink / raw)
To: Ted Ts'o
Cc: Bernd Schubert, linux-nfs, linux-ext4, bfields, bernd.schubert,
hch, yong.fan, linux-fsdevel
On 2011-08-19, at 4:29 PM, Ted Ts'o wrote:
> On Tue, Aug 16, 2011 at 01:54:14PM +0200, Bernd Schubert wrote:
>> +static inline int is_32bit_api(void)
>> +{
>> +#ifdef HAVE_IS_COMPAT_TASK
>> + return is_compat_task();
>> +#else
>> + return (BITS_PER_LONG == 32);
>> +#endif
>
> I assume is_compat_task() is coming from another patch? What is the
> status of that change?
No, is_compat_task() is upstream for most (all?) of the architectures
that support hybrid 32-/64-bit operation. It is set at 32-bit syscall
entry when running on 64-bit architectures.
The only minor error in this patch (fixed with a new version from Bernd)
is that this should be under CONFIG_COMPAT instead of HAVE_IS_COMPAT_TASK.
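For illustration, a user-space mock of what the helper would look like with the CONFIG_COMPAT guard. Here is_compat_task() is replaced by a stub driven by a test variable, CONFIG_COMPAT is defined by hand, and BITS_PER_LONG is approximated with sizeof(long); all three are assumptions of this sketch, and the real fix is whatever patch series v4 contains.

```c
#include <stdbool.h>
#include <limits.h>

/* Assumption: pretend this is a 64-bit kernel built with compat support. */
#define CONFIG_COMPAT 1

/* Mock state: true while a 32-bit task is "inside a syscall". */
bool mock_task_is_compat;

/* Stand-in for the kernel's is_compat_task(). */
bool is_compat_task(void)
{
	return mock_task_is_compat;
}

bool is_32bit_api(void)
{
#ifdef CONFIG_COMPAT
	/* Hybrid 32-/64-bit kernel: ask about the current task. */
	return is_compat_task();
#else
	/* No compat layer: the kernel word size decides. */
	return sizeof(long) * CHAR_BIT == 32;
#endif
}
```

With the guard on CONFIG_COMPAT, the BITS_PER_LONG fallback is only reached on kernels that cannot run 32-bit binaries in compat mode at all.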
> In the case where is_compat_task() is not defined, we can't just test
> based on BITS_PER_LONG == 32, since even on an x86_64 machine, it's
> possible we're running a 32-bit binary in compat mode....
It is definitely available on x86_64.
Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
* Re: [PATCH 3 2/4] Return 32/64-bit dir name hash according to usage type
2011-08-20 6:23 ` Andreas Dilger
@ 2011-08-30 22:07 ` Bernd Schubert
0 siblings, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2011-08-30 22:07 UTC (permalink / raw)
To: Andreas Dilger
Cc: Ted Ts'o, Bernd Schubert, linux-nfs, linux-ext4, bfields, hch,
yong.fan, linux-fsdevel
On 08/20/2011 08:23 AM, Andreas Dilger wrote:
> On 2011-08-19, at 4:29 PM, Ted Ts'o wrote:
>> On Tue, Aug 16, 2011 at 01:54:14PM +0200, Bernd Schubert wrote:
>>> +static inline int is_32bit_api(void)
>>> +{
>>> +#ifdef HAVE_IS_COMPAT_TASK
>>> + return is_compat_task();
>>> +#else
>>> + return (BITS_PER_LONG == 32);
>>> +#endif
>>
>> I assume is_compat_task() is coming from another patch? What is the
>> status of that change?
>
> No, is_compat_task() is upstream for most (all?) of the architectures
> that support hybrid 32-/64-bit operation. It is set at 32-bit syscall
> entry when running on 64-bit architectures.
>
> The only minor error in this patch (fixed with a new version from Bernd)
> is that this should be under CONFIG_COMPAT instead of HAVE_IS_COMPAT_TASK.
Yes, sorry again about this. Could you please take a look at patch series v4?
>
>> In the case where is_compat_task() is not defined, we can't just test
>> based on BITS_PER_LONG == 32, since even on an x86_64 machine, it's
>> possible we're running a 32-bit binary in compat mode....
>
> It is definitely available on x86_64.
Yep, otherwise it wouldn't even compile, at least not with patch series v4.
Thanks,
Bernd
* [PATCH 3 3/4] nfsd_open(): rename 'int access' to 'int may_flags' in nfsd_open()
2011-08-16 11:54 [PATCH 0/2] 32/64 bit llseek hashes (v3) Bernd Schubert
2011-08-16 11:54 ` [PATCH 3 1/4] Add new FMODE flags: FMODE_32bithash and FMODE_64bithash Bernd Schubert
2011-08-16 11:54 ` [PATCH 3 2/4] Return 32/64-bit dir name hash according to usage type Bernd Schubert
@ 2011-08-16 11:54 ` Bernd Schubert
2011-08-16 11:54 ` [PATCH 3 4/4] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes) Bernd Schubert
2011-08-23 21:56 ` [PATCH 0/2] 32/64 bit llseek hashes (v3) J. Bruce Fields
4 siblings, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2011-08-16 11:54 UTC (permalink / raw)
To: linux-nfs, linux-ext4
Cc: bfields, tytso, bernd.schubert, hch, adilger, yong.fan,
linux-fsdevel
Just rename this variable: the next patch will add a flag, so 'access'
would no longer be an accurate variable name.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
fs/nfsd/vfs.c | 18 ++++++++++--------
1 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index fd0acca..ca692b4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -708,12 +708,13 @@ static int nfsd_open_break_lease(struct inode *inode, int access)
/*
* Open an existing file or directory.
- * The access argument indicates the type of open (read/write/lock)
+ * The may_flags argument indicates the type of open (read/write/lock)
+ * and additional flags.
* N.B. After this call fhp needs an fh_put
*/
__be32
nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
- int access, struct file **filp)
+ int may_flags, struct file **filp)
{
struct dentry *dentry;
struct inode *inode;
@@ -728,7 +729,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
* and (hopefully) checked permission - so allow OWNER_OVERRIDE
* in case a chmod has now revoked permission.
*/
- err = fh_verify(rqstp, fhp, type, access | NFSD_MAY_OWNER_OVERRIDE);
+ err = fh_verify(rqstp, fhp, type, may_flags | NFSD_MAY_OWNER_OVERRIDE);
if (err)
goto out;
@@ -739,7 +740,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
* or any access when mandatory locking enabled
*/
err = nfserr_perm;
- if (IS_APPEND(inode) && (access & NFSD_MAY_WRITE))
+ if (IS_APPEND(inode) && (may_flags & NFSD_MAY_WRITE))
goto out;
/*
* We must ignore files (but only files) which might have mandatory
@@ -752,12 +753,12 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
if (!inode->i_fop)
goto out;
- host_err = nfsd_open_break_lease(inode, access);
+ host_err = nfsd_open_break_lease(inode, may_flags);
if (host_err) /* NOMEM or WOULDBLOCK */
goto out_nfserr;
- if (access & NFSD_MAY_WRITE) {
- if (access & NFSD_MAY_READ)
+ if (may_flags & NFSD_MAY_WRITE) {
+ if (may_flags & NFSD_MAY_READ)
flags = O_RDWR|O_LARGEFILE;
else
flags = O_WRONLY|O_LARGEFILE;
@@ -767,7 +768,8 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
if (IS_ERR(*filp))
host_err = PTR_ERR(*filp);
else
- host_err = ima_file_check(*filp, access);
+ host_err = ima_file_check(*filp, may_flags);
+
out_nfserr:
err = nfserrno(host_err);
out:
* [PATCH 3 4/4] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
2011-08-16 11:54 [PATCH 0/2] 32/64 bit llseek hashes (v3) Bernd Schubert
` (2 preceding siblings ...)
2011-08-16 11:54 ` [PATCH 3 3/4] nfsd_open(): rename 'int access' to 'int may_flags' in nfsd_open() Bernd Schubert
@ 2011-08-16 11:54 ` Bernd Schubert
2011-08-23 21:56 ` [PATCH 0/2] 32/64 bit llseek hashes (v3) J. Bruce Fields
4 siblings, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2011-08-16 11:54 UTC (permalink / raw)
To: linux-nfs, linux-ext4
Cc: bfields, tytso, bernd.schubert, hch, adilger, yong.fan,
linux-fsdevel
Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
the NFS version. NFSv2 gets 32-bit hashes only.
NOTE: This patch got rather complex as Christoph asked to set the
filp->f_mode flag in the open call or immediately after dentry_open()
in nfsd_open() to avoid races.
Personally I still do not see a reason for that and in my opinion the
FMODE_32BITHASH/FMODE_64BITHASH flags could be set in nfsd_readdir(), as it
follows directly after nfsd_open() without a chance of races.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
fs/nfsd/vfs.c | 15 +++++++++++++--
fs/nfsd/vfs.h | 2 ++
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ca692b4..97a99f1 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -767,9 +767,15 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
flags, current_cred());
if (IS_ERR(*filp))
host_err = PTR_ERR(*filp);
- else
+ else {
host_err = ima_file_check(*filp, may_flags);
+ if (may_flags & NFSD_MAY_64BIT_COOKIE)
+ (*filp)->f_mode |= FMODE_64BITHASH;
+ else
+ (*filp)->f_mode |= FMODE_32BITHASH;
+ }
+
out_nfserr:
err = nfserrno(host_err);
out:
@@ -1991,8 +1997,13 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
__be32 err;
struct file *file;
loff_t offset = *offsetp;
+ int may_flags = NFSD_MAY_READ;
+
+ /* NFSv2 only supports 32 bit cookies */
+ if (rqstp->rq_vers > 2)
+ may_flags |= NFSD_MAY_64BIT_COOKIE;
- err = nfsd_open(rqstp, fhp, S_IFDIR, NFSD_MAY_READ, &file);
+ err = nfsd_open(rqstp, fhp, S_IFDIR, may_flags, &file);
if (err)
goto out;
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index e0bbac0..ecd00e1 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -26,6 +26,8 @@
#define NFSD_MAY_NOT_BREAK_LEASE 512
#define NFSD_MAY_BYPASS_GSS 1024
+#define NFSD_MAY_64BIT_COOKIE 2048 /* 64 bit readdir cookies for >= NFSv3 */
+
#define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
#define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
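The two-step flow in this patch (NFS version to may_flags in nfsd_readdir(), then may_flags to f_mode in nfsd_open()) can be sketched outside the kernel as below. NFSD_MAY_64BIT_COOKIE and the FMODE values come from this series; the value used for NFSD_MAY_READ, the fmode_t typedef and the function names readdir_may_flags() and hash_mode_for() are assumptions for illustration.

```c
/* Stand-in for the kernel's fmode_t; an assumption for this sketch. */
typedef unsigned int fmode_t;

#define FMODE_32BITHASH ((fmode_t)0x200)
#define FMODE_64BITHASH ((fmode_t)0x400)

/* NFSD_MAY_64BIT_COOKIE is from the patch; NFSD_MAY_READ's value is assumed. */
#define NFSD_MAY_READ         4
#define NFSD_MAY_64BIT_COOKIE 2048

/* Step 1 (as in nfsd_readdir): choose may_flags from the NFS version. */
int readdir_may_flags(unsigned int nfs_version)
{
	int may_flags = NFSD_MAY_READ;

	/* NFSv2 only supports 32 bit cookies */
	if (nfs_version > 2)
		may_flags |= NFSD_MAY_64BIT_COOKIE;
	return may_flags;
}

/* Step 2 (as in nfsd_open): translate may_flags into the f_mode flag. */
fmode_t hash_mode_for(int may_flags)
{
	return (may_flags & NFSD_MAY_64BIT_COOKIE) ? FMODE_64BITHASH
						   : FMODE_32BITHASH;
}
```

Because exactly one of the two flags is always set on files opened through this path, ext4 never has to guess based on the calling task when the request comes from nfsd.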
* Re: [PATCH 0/2] 32/64 bit llseek hashes (v3)
2011-08-16 11:54 [PATCH 0/2] 32/64 bit llseek hashes (v3) Bernd Schubert
` (3 preceding siblings ...)
2011-08-16 11:54 ` [PATCH 3 4/4] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes) Bernd Schubert
@ 2011-08-23 21:56 ` J. Bruce Fields
2011-08-30 21:59 ` Bernd Schubert
4 siblings, 1 reply; 10+ messages in thread
From: J. Bruce Fields @ 2011-08-23 21:56 UTC (permalink / raw)
To: Bernd Schubert
Cc: linux-nfs, linux-ext4, tytso, bernd.schubert, hch, adilger,
yong.fan, linux-fsdevel
On Tue, Aug 16, 2011 at 01:54:04PM +0200, Bernd Schubert wrote:
> With the ext3/ext4 directory index implementation hashes are used to specify
> offsets for llseek(). For compatibility with NFSv2 and 32-bit user space
> on 64-bit systems (kernel space) ext3/ext4 currently only return 32-bit
> hashes and therefore the probability of hash collisions for larger directories
> is rather high. As recently reported on the NFS mailing list that theoretical
> problem also happens on real systems:
> http://comments.gmane.org/gmane.linux.nfs/40863
>
> The following series adds two new f_mode flags to tell ext4
> to use 32-bit or 64-bit hash values for llseek() calls.
> These flags can then be used by network file systems, such as NFS, to
> request 32-bit or 64-bit offsets (hashes).
>
> Version 3:
> - remove patch "RFC: Remove check for a 32-bit cookie in nfsd4_readdir()",
> I think Bruce wanted to take it separately as a bug fix. It should be applied
> before applying the remaining NFS patches, as without it NFSv4 will always
> fail with the new 64-bit ext4 seek hashes.
Yes, applied to my for-3.2 branch at
git://linux-nfs.org/~bfields/linux.git.
For the NFS patches:
Acked-by: J. Bruce Fields <bfields@redhat.com>
OK by me if they go in through ext4 tree, or however's most convenient.
--b.
> - split "nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)" into
> two separate patches as suggested by Bruce, one patch to rename
> 'access' to 'may_flags'. And the remainder of the original patch to set
> FMODE_32BITHASH/FMODE_64BITHASH flags and to introduce the new
> NFSD_MAY_64BIT_COOKIE flag
>
> Version 2:
> - use f_mode instead of O_* flags and also in a separate patch
> - introduce EXT4_HTREE_EOF_32BIT and EXT4_HTREE_EOF_64BIT
> - fix SEEK_END in ext4_dir_llseek()
> - set f_mode flags in NFS code as early as possible and introduce a new
> NFSD_MAY_64BIT_COOKIE flag for that
>
> --
> Bernd Schubert
> Fraunhofer ITWM
>
* Re: [PATCH 0/2] 32/64 bit llseek hashes (v3)
2011-08-23 21:56 ` [PATCH 0/2] 32/64 bit llseek hashes (v3) J. Bruce Fields
@ 2011-08-30 21:59 ` Bernd Schubert
0 siblings, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2011-08-30 21:59 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Bernd Schubert, linux-nfs, linux-ext4, tytso, hch, adilger,
yong.fan, linux-fsdevel
On 08/23/2011 11:56 PM, J. Bruce Fields wrote:
> On Tue, Aug 16, 2011 at 01:54:04PM +0200, Bernd Schubert wrote:
>> With the ext3/ext4 directory index implementation hashes are used to specify
>> offsets for llseek(). For compatibility with NFSv2 and 32-bit user space
>> on 64-bit systems (kernel space) ext3/ext4 currently only return 32-bit
>> hashes and therefore the probability of hash collisions for larger directories
>> is rather high. As recently reported on the NFS mailing list that theoretical
>> problem also happens on real systems:
>> http://comments.gmane.org/gmane.linux.nfs/40863
>>
>> The following series adds two new f_mode flags to tell ext4
>> to use 32-bit or 64-bit hash values for llseek() calls.
>> These flags can then be used by network file systems, such as NFS, to
>> request 32-bit or 64-bit offsets (hashes).
>>
>> Version 3:
>> - remove patch "RFC: Remove check for a 32-bit cookie in nfsd4_readdir()",
>> I think Bruce wanted to take it separately as a bug fix. It should be applied
>> before applying the remaining NFS patches, as without it NFSv4 will always
>> fail with the new 64-bit ext4 seek hashes.
>
> Yes, applied to my for-3.2 branch at
> git://linux-nfs.org/~bfields/linux.git.
>
> For the NFS patches:
>
> Acked-by: J. Bruce Fields <bfields@redhat.com>
>
> OK by me if they go in through ext4 tree, or however's most convenient.
Great, thanks!
Cheers,
Bernd