* [PATCH] vfs: get_next_ino(), never inum=0
From: hooanon05g @ 2014-04-29 15:45 UTC
To: hch, dchinner, viro; +Cc: linux-fsdevel, J. R. Okajima

From: "J. R. Okajima" <hooanon05g@gmail.com>

It is very rare for get_next_ino() to return zero as a new inode number
since its type is unsigned int, but it can surely happen eventually.
Interestingly, ls(1) and find(1) don't show a file whose inum is zero, so
people won't be able to find it.

This issue may be harmful especially for tmpfs. On a very long-lived and
busy system, users may frequently create files on tmpfs, and if one of
them unluckily gets inum=0, its owner cannot see its filename. If he
remembers the name, he may still be able to use or unlink the file by
name, since it surely exists; otherwise the file remains on tmpfs
silently and no one can touch it. This behaviour looks like a resource
leak. As a worse case, if a dir gets inum=0 and a user creates several
files under it, the leaked memory keeps growing, since the user cannot
see the names of any files under the dir whose inum=0, regardless of the
inums of the children.

There is another unpleasant effect when get_next_ino() wraps around: when
there is a file whose inum=100 on tmpfs, a new file may also get
inum=100. I am not sure what will happen when duplicated inums exist on
tmpfs. Anyway this is not an issue in get_next_ino(); it should be fixed
in mm/shmem.c if it is really necessary.

Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 fs/inode.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index f96d2a6..a3e274a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -848,7 +848,11 @@ unsigned int get_next_ino(void)
 	}
 #endif
 
-	*p = ++res;
+	res++;
+	/* never zero */
+	if (unlikely(!res))
+		res++;
+	*p = res;
 	put_cpu_var(last_ino);
 	return res;
 }
-- 
1.7.10.4
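For a sense of scale, the unsigned int counter wraps only after 2^32
allocations: at a sustained 1,000 creations per second that is about
2^32 / 1000 ≈ 4.3 million seconds, roughly 50 days, so a long-lived busy
box can plausibly get there. The stand-alone user-space sketch below is
not part of the patch; it only assumes the getdents64(2) entry layout
from its man page. It walks a directory with the raw syscall and prints
any entry whose d_ino is 0, which is one way to spot the "invisible"
files that readdir(3)-based tools are said to hide.

/*
 * Sketch: list directory entries whose inode number is 0.
 * Uses raw getdents64(2) so nothing in between can silently
 * skip the entry.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

struct linux_dirent64 {			/* layout returned by getdents64(2) */
	uint64_t	d_ino;
	int64_t		d_off;
	unsigned short	d_reclen;
	unsigned char	d_type;
	char		d_name[];
};

int main(int argc, char **argv)
{
	static uint64_t buf[4096];	/* 8-byte aligned scratch buffer */
	int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
	long nread;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	while ((nread = syscall(SYS_getdents64, fd, buf, sizeof(buf))) > 0) {
		for (long off = 0; off < nread; ) {
			struct linux_dirent64 *d =
				(struct linux_dirent64 *)((char *)buf + off);

			if (d->d_ino == 0)
				printf("invisible entry: %s\n", d->d_name);
			off += d->d_reclen;
		}
	}
	close(fd);
	return nread < 0;
}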
* Re: [PATCH] vfs: get_next_ino(), never inum=0
From: J. R. Okajima @ 2014-04-29 17:42 UTC
To: hch, dchinner, viro, linux-fsdevel

> There is another unpleasant effect when get_next_ino() wraps
> around. When there is a file whose inum=100 on tmpfs, a new file may get
> inum=100. I am not sure what will happen when the duplicated inums exist
> on tmpfs. ...

Non-deterministic behaviour when exporting via NFS?

J. R. Okajima
* Re: [PATCH] vfs: get_next_ino(), never inum=0
From: Christoph Hellwig @ 2014-04-29 17:53 UTC
To: J. R. Okajima; +Cc: dchinner, viro, linux-fsdevel

On Wed, Apr 30, 2014 at 02:42:02AM +0900, J. R. Okajima wrote:
> > There is another unpleasant effect when get_next_ino() wraps
> > around. When there is a file whose inum=100 on tmpfs, a new file may get
> > inum=100. I am not sure what will happen when the duplicated inums exist
> > on tmpfs. ...
>
> Non-deterministic behaviour when exporting via NFS?

If you care about really unique inode numbers you shouldn't use
get_next_ino() but something like an idr allocator. The default i_ino
assigned in new_inode(), from which get_next_ino() was factored out, was
mostly intended for small synthetic filesystems with few enough inodes
that it wouldn't wrap around.

And yes, file-handle-based lookups are screwed up by duplicated inode
numbers, as are tools trying to do file-level de-duplication, mostly in
the backup or archival space.
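To illustrate the idr approach Christoph mentions, here is a minimal
sketch of per-superblock inode-number allocation for a hypothetical
filesystem ("myfs"); it is not code from this thread, only an example of
the technique, using the idr_preload()/idr_alloc() API of that era.
Starting the range at 2 guarantees that inum 0 (and the conventionally
reserved 1) is never handed out, and freed numbers are reused.

#include <linux/fs.h>
#include <linux/idr.h>
#include <linux/spinlock.h>

/* hypothetical per-superblock info for "myfs" */
struct myfs_sb_info {
	spinlock_t	ino_lock;	/* spin_lock_init() at mount time */
	struct idr	ino_idr;	/* idr_init() at mount time */
};

static int myfs_alloc_ino(struct myfs_sb_info *sbi, struct inode *inode)
{
	int ino;

	idr_preload(GFP_KERNEL);
	spin_lock(&sbi->ino_lock);
	/* allocate the lowest free number in [2, INT_MAX); never 0 */
	ino = idr_alloc(&sbi->ino_idr, inode, 2, INT_MAX, GFP_NOWAIT);
	spin_unlock(&sbi->ino_lock);
	idr_preload_end();

	if (ino < 0)
		return ino;		/* -ENOMEM or -ENOSPC */
	inode->i_ino = ino;
	return 0;
}

static void myfs_free_ino(struct myfs_sb_info *sbi, unsigned long ino)
{
	spin_lock(&sbi->ino_lock);
	idr_remove(&sbi->ino_idr, ino);
	spin_unlock(&sbi->ino_lock);
}

Compared with the global counter, this keeps numbers unique and stable
per superblock, at the cost of a per-mount lock and the idr's memory.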
* Re: [PATCH] vfs: get_next_ino(), never inum=0
From: J. R. Okajima @ 2014-04-30 4:08 UTC
To: Christoph Hellwig; +Cc: dchinner, viro, linux-fsdevel

Christoph Hellwig:
> If you care about really unique inode numbers you shouldn't use get_next_ino
> but something like an idr allocator. The default i_ino assigned in
> new_inode() from which get_next_ino was factored out was mostly intended
> for small synthetic filesystems with few enough inodes that it wouldn't
> wrap around.

Grepping for get_next_ino, I got 30 calls in mainline. For how many of
them is get_next_ino() inappropriate? I don't know. But at least for
tmpfs it is better to manage the inums itself, since tmpfs must be one of
the biggest consumers of inums and it is NFS-exportable.

Do you think we need a common function in VFS to manage inums per sb, or
is it totally up to the filesystem and the common function is
unnecessary?

Instead of idr, I was thinking about a simple bitmap in tmpfs, such as
the patch below. It introduces a new mount option "ino" which forces
tmpfs to assign the lowest unused number to a new inode within the
mounted tmpfs. Without "ino", or with "noino", the behaviour is unchanged
(use vfs:get_next_ino()). But it may not scale well due to taking the
single spinlock every time.

J. R. Okajima


commit 214d38e8c34fb341fd0f37cc92614b5e93e0803b
Author: J. R. Okajima <hooanon05@yahoo.co.jp>
Date:   Mon Sep 2 10:45:42 2013 +0900

    shmem: management for inum

    Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 4d1771c..39762e1 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -29,6 +29,8 @@ struct shmem_sb_info {
 	unsigned long max_inodes;	/* How many inodes are allowed */
 	unsigned long free_inodes;	/* How many are left for allocation */
 	spinlock_t stat_lock;		/* Serialize shmem_sb_info changes */
+	spinlock_t ino_lock;
+	unsigned long *ino_bitmap;
 	kuid_t uid;			/* Mount uid for root directory */
 	kgid_t gid;			/* Mount gid for root directory */
 	umode_t mode;			/* Mount mode for root directory */
diff --git a/mm/shmem.c b/mm/shmem.c
index 9f70e02..bc2c5e4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -197,14 +197,61 @@ static int shmem_reserve_inode(struct super_block *sb)
 	return 0;
 }
 
-static void shmem_free_inode(struct super_block *sb)
+static void shmem_free_inode(struct inode *inode)
 {
+	struct super_block *sb = inode->i_sb;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
+
 	if (sbinfo->max_inodes) {
 		spin_lock(&sbinfo->stat_lock);
 		sbinfo->free_inodes++;
 		spin_unlock(&sbinfo->stat_lock);
 	}
+
+	if (!inode->i_nlink) {
+		spin_lock(&sbinfo->ino_lock);
+		if (sbinfo->ino_bitmap)
+			clear_bit(inode->i_ino - 2, sbinfo->ino_bitmap);
+		spin_unlock(&sbinfo->ino_lock);
+	}
+}
+
+/*
+ * This is unsigned int instead of unsigned long.
+ * For details, see fs/inode.c:get_next_ino().
+ */
+unsigned int shmem_next_ino(struct super_block *sb)
+{
+	unsigned long ino;
+	struct shmem_sb_info *sbinfo;
+
+	ino = 0;
+	sbinfo = SHMEM_SB(sb);
+	if (sbinfo->ino_bitmap) {
+		spin_lock(&sbinfo->ino_lock);
+		/*
+		 * someone else may remount,
+		 * and ino_bitmap might be reset.
+		 */
+		if (sbinfo->ino_bitmap
+		    && !bitmap_full(sbinfo->ino_bitmap, sbinfo->max_inodes)) {
+			ino = find_first_zero_bit(sbinfo->ino_bitmap,
+						  sbinfo->max_inodes);
+			set_bit(ino, sbinfo->ino_bitmap);
+			ino += 2; /* ino 0 and 1 are reserved */
+		}
+		spin_unlock(&sbinfo->ino_lock);
+	}
+
+	/*
+	 * someone else did remount,
+	 * or ino_bitmap is unused originally,
+	 * or ino_bitmap is full.
+	 */
+	if (!ino)
+		ino = get_next_ino();
+
+	return ino;
 }
 
 /**
@@ -578,7 +625,7 @@ static void shmem_evict_inode(struct inode *inode)
 
 	simple_xattrs_free(&info->xattrs);
 	WARN_ON(inode->i_blocks);
-	shmem_free_inode(inode->i_sb);
+	shmem_free_inode(inode);
 	clear_inode(inode);
 }
 
@@ -1306,7 +1353,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 
 	inode = new_inode(sb);
 	if (inode) {
-		inode->i_ino = get_next_ino();
+		inode->i_ino = shmem_next_ino(sb);
 		inode_init_owner(inode, dir, mode);
 		inode->i_blocks = 0;
 		inode->i_mapping->backing_dev_info = &shmem_backing_dev_info;
@@ -1348,7 +1395,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 			break;
 		}
 	} else
-		shmem_free_inode(sb);
+		shmem_free_inode(inode);
 	return inode;
 }
 
@@ -1945,7 +1992,7 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 	struct inode *inode = dentry->d_inode;
 
 	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
-		shmem_free_inode(inode->i_sb);
+		shmem_free_inode(inode);
 
 	dir->i_size -= BOGO_DIRENT_SIZE;
 	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
@@ -2315,6 +2362,54 @@ static const struct export_operations shmem_export_ops = {
 	.fh_to_dentry = shmem_fh_to_dentry,
 };
 
+static void shmem_ino_bitmap(struct shmem_sb_info *sbinfo,
+			     unsigned long prev_max)
+{
+	unsigned long *p;
+	unsigned long n, d;
+	int do_msg;
+
+	n = sbinfo->max_inodes / BITS_PER_BYTE;
+	if (sbinfo->max_inodes % BITS_PER_BYTE)
+		n++;
+
+	do_msg = 0;
+	if (sbinfo->ino_bitmap) {
+		/*
+		 * by shrinking the bitmap, the large inode number in use
+		 * may be left. but it is harmless.
+		 */
+		d = 0;
+		if (sbinfo->max_inodes > prev_max) {
+			d = sbinfo->max_inodes - prev_max;
+			d /= BITS_PER_BYTE;
+		}
+		spin_lock(&sbinfo->ino_lock);
+		p = krealloc(sbinfo->ino_bitmap, n, GFP_NOWAIT);
+		if (p) {
+			memset(p + n - d, 0, d);
+			sbinfo->ino_bitmap = p;
+			spin_unlock(&sbinfo->ino_lock);
+		} else {
+			p = sbinfo->ino_bitmap;
+			sbinfo->ino_bitmap = NULL;
+			spin_unlock(&sbinfo->ino_lock);
+			kfree(p);
+			do_msg = 1;
+		}
+	} else {
+		p = kzalloc(n, GFP_NOFS);
+		spin_lock(&sbinfo->ino_lock);
+		sbinfo->ino_bitmap = p;
+		spin_unlock(&sbinfo->ino_lock);
+		do_msg = !p;
+	}
+
+	if (unlikely(do_msg))
+		pr_err("%s: ino failed (%lu bytes). Ignored.\n",
+		       __func__, n);
+}
+
 static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 			       bool remount)
 {
@@ -2322,7 +2417,10 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 	struct mempolicy *mpol = NULL;
 	uid_t uid;
 	gid_t gid;
+	bool do_ino;
+	unsigned long old_val = sbinfo->max_inodes;
 
+	do_ino = 0;
 	while (options != NULL) {
 		this_char = options;
 		for (;;) {
@@ -2342,6 +2440,14 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 		}
 		if (!*this_char)
 			continue;
+		if (!strcmp(this_char, "ino")) {
+			do_ino = 1;
+			continue;
+		} else if (!strcmp(this_char, "noino")) {
+			do_ino = 0;
+			continue;
+		}
+
 		if ((value = strchr(this_char,'=')) != NULL) {
 			*value++ = 0;
 		} else {
@@ -2370,7 +2476,7 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 				goto bad_val;
 		} else if (!strcmp(this_char,"nr_inodes")) {
 			sbinfo->max_inodes = memparse(value, &rest);
-			if (*rest)
+			if (*rest || !sbinfo->max_inodes)
 				goto bad_val;
 		} else if (!strcmp(this_char,"mode")) {
 			if (remount)
@@ -2408,6 +2514,16 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 		}
 	}
 	sbinfo->mpol = mpol;
+
+	if (do_ino)
+		shmem_ino_bitmap(sbinfo, old_val);
+	else if (sbinfo->ino_bitmap) {
+		void *p = sbinfo->ino_bitmap;
+		spin_lock(&sbinfo->ino_lock);
+		sbinfo->ino_bitmap = NULL;
+		spin_unlock(&sbinfo->ino_lock);
+		kfree(p);
+	}
 	return 0;
 
 bad_val:
@@ -2472,6 +2588,8 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 			sbinfo->max_blocks << (PAGE_CACHE_SHIFT - 10));
 	if (sbinfo->max_inodes != shmem_default_max_inodes())
 		seq_printf(seq, ",nr_inodes=%lu", sbinfo->max_inodes);
+	if (sbinfo->ino_bitmap)
+		seq_printf(seq, ",ino");
 	if (sbinfo->mode != (S_IRWXUGO | S_ISVTX))
 		seq_printf(seq, ",mode=%03ho", sbinfo->mode);
 	if (!uid_eq(sbinfo->uid, GLOBAL_ROOT_UID))
@@ -2491,6 +2609,7 @@ static void shmem_put_super(struct super_block *sb)
 
 	percpu_counter_destroy(&sbinfo->used_blocks);
 	mpol_put(sbinfo->mpol);
+	kfree(sbinfo->ino_bitmap);
 	kfree(sbinfo);
 	sb->s_fs_info = NULL;
 }
@@ -2510,6 +2629,7 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
 	sbinfo->mode = S_IRWXUGO | S_ISVTX;
 	sbinfo->uid = current_fsuid();
 	sbinfo->gid = current_fsgid();
+	spin_lock_init(&sbinfo->ino_lock);
 	sb->s_fs_info = sbinfo;
 
 #ifdef CONFIG_TMPFS
* Re: [PATCH] vfs: get_next_ino(), never inum=0
From: Andreas Dilger @ 2014-04-30 22:56 UTC
To: J. R. Okajima; +Cc: Christoph Hellwig, dchinner, viro, linux-fsdevel

On Apr 29, 2014, at 10:08 PM, J. R. Okajima <hooanon05g@gmail.com> wrote:
> Christoph Hellwig wrote:
>> If you care about really unique inode numbers you shouldn't use get_next_ino
>> but something like an idr allocator. The default i_ino assigned in
>> new_inode() from which get_next_ino was factored out was mostly intended
>> for small synthetic filesystems with few enough inodes that it wouldn't
>> wrap around.
>
> Grepping for get_next_ino, I got 30 calls in mainline. For how many of
> them is get_next_ino() inappropriate? I don't know. But at least for
> tmpfs it is better to manage the inums itself, since tmpfs must be one
> of the biggest consumers of inums and it is NFS-exportable.
>
> Do you think we need a common function in VFS to manage inums per sb, or
> is it totally up to the filesystem and the common function is unnecessary?
>
> Instead of idr, I was thinking about a simple bitmap in tmpfs, such as
> the patch below. It introduces a new mount option "ino" which forces
> tmpfs to assign the lowest unused number to a new inode within the
> mounted tmpfs. Without "ino", or with "noino", the behaviour is
> unchanged (use vfs:get_next_ino()). But it may not scale well due to
> taking the single spinlock every time.

The simplest solution is to just change get_next_ino() to return an
unsigned long to match i_ino, instead of an int. That avoids any
overhead in the most common cases (i.e. 64-bit systems, where I highly
doubt there will ever be a counter wrap).

We've also been discussing changing i_ino to be u64 so that this works
properly on 32-bit systems accessing 64-bit filesystems, but I don't
know where that stands today.

For 32-bit systems it would be possible to use get_next_ino() for the
common case of inode numbers < 2^32, and only fall back to doing a
lookup for an already-used inode in tmpfs if the counter wraps to 1.
That would avoid overhead for 99% of users, since they are unlikely to
create more than 2^32 inodes in tmpfs over the lifetime of their system.
Even in the check-if-inum-in-use case after the 2^32 wrap, it is very
unlikely that many inodes would still be in use, so the hash lookup
should go relatively quickly. It could use something like an optimized
find_inode() that just determines quickly whether the hash entry is in
use.

That would avoid the constant spinlock contention in the most common
cases, and only impose it in rare cases. That said, I expect this
overhead is more than just going to u64 for 32-bit systems.

Cheers, Andreas
> [J. R. Okajima's shmem inum-bitmap patch quoted in full -- snipped]
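A rough sketch of Andreas' first suggestion, widening get_next_ino() so
the counter practically never wraps on 64-bit. It mirrors the structure
of fs/inode.c, but the atomic64_t shared counter and the unsigned long
per-cpu variable are assumptions of this sketch, not a patch anyone
posted in the thread.

#include <linux/atomic.h>
#include <linux/percpu.h>

#define LAST_INO_BATCH 1024
static DEFINE_PER_CPU(unsigned long, last_ino);

unsigned long get_next_ino(void)
{
	unsigned long *p = &get_cpu_var(last_ino);
	unsigned long res = *p;

#ifdef CONFIG_SMP
	if (unlikely((res & (LAST_INO_BATCH - 1)) == 0)) {
		static atomic64_t shared_last_ino;
		u64 next = atomic64_add_return(LAST_INO_BATCH,
					       &shared_last_ino);

		res = next - LAST_INO_BATCH;
	}
#endif
	res++;
	/*
	 * A 64-bit counter will not realistically wrap, but keep 0
	 * reserved anyway for builds where unsigned long is 32 bits.
	 */
	if (unlikely(!res))
		res++;
	*p = res;
	put_cpu_var(last_ino);
	return res;
}

On 32-bit kernels unsigned long is still 32 bits wide, so this alone does
not remove the wrap there; it only makes 64-bit systems safe, which is
why the second half of Andreas' mail discusses a fallback for 32-bit.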
* Re: [PATCH] vfs: get_next_ino(), never inum=0
From: J. R. Okajima @ 2014-05-10 3:18 UTC
To: Andreas Dilger; +Cc: Christoph Hellwig, dchinner, viro, linux-fsdevel

Andreas Dilger:
> The simplest solution is to just change get_next_ino() to return an
> unsigned long to match i_ino, instead of an int. That avoids any
> overhead in the most common cases (i.e. 64-bit systems where I highly
> doubt there will ever be a counter wrap).
>
> We've also been discussing changing i_ino to be u64 so that this works
> properly on 32-bit systems accessing 64-bit filesystems, but I don't
> know where that stands today.

I agree that such a wrap-around won't happen easily, although it can
happen technically. At the same time, I am not sure changing to u64 is
safe for 32-bit systems. If nothing goes wrong, I agree get_next_ino()
should return u64. Otherwise, "if (unlikely(!inum)) inum++" is necessary.

> For 32-bit systems it would be possible to use get_next_ino() for the
> common case of inode numbers < 2^32, and only fall back to doing a
> lookup for an already-used inode in tmpfs if the counter wraps to 1.

How can tmpfs detect the wrap-around? By storing the last largest inum
locally?

> That would avoid overhead for 99% of users since they are unlikely
> to create more than 2^32 inodes in tmpfs over the lifetime of their
> system. Even in the check-if-inum-in-use case after the 2^32 wrap,
> it is very unlikely that many inodes would still be in use so the
> hash lookup should go relatively quickly.

I agree that so many inodes won't live that long. By the way, the reason
I took the bitmap approach is to keep the inums small. That is my local
requirement and I know it won't be necessary for generic use.

J. R. Okajima
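One possible shape of the "store the last largest inum locally" idea, as
a hedged sketch only: the fields last_seen_ino and wrapped and the helper
shmem_ino_in_use() are hypothetical (none of them exist in mainline
shmem), and the detection is only approximate because the global counter
is shared with every other get_next_ino() user and handed out in per-cpu
batches.

/* sketch, as if inside mm/shmem.c; hypothetical fields and helper */
static unsigned long shmem_next_ino_sketch(struct super_block *sb)
{
	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
	unsigned int ino;

	spin_lock(&sbinfo->stat_lock);
	ino = get_next_ino();
	/* counter went backwards: roughly 2^32 inums have been handed out */
	if (ino <= sbinfo->last_seen_ino)
		sbinfo->wrapped = true;
	sbinfo->last_seen_ino = ino;

	if (unlikely(sbinfo->wrapped)) {
		/* after a wrap, skip numbers still in use on this sb */
		while (!ino || shmem_ino_in_use(sb, ino))
			ino = get_next_ino();
		sbinfo->last_seen_ino = ino;
	}
	spin_unlock(&sbinfo->stat_lock);

	return ino;
}

shmem_ino_in_use() would be the "optimized find_inode()" Andreas
mentions: a hash lookup that only answers whether the number is taken,
without grabbing a reference.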
* Re: [PATCH v2] vfs: get_next_ino(), never inum=0
From: Carlos Maiolino @ 2014-08-18 18:21 UTC
To: linux-fsdevel

This V2 looks very reasonable, and fixes the problem with files with
inode=0 on tmpfs, which I tested here, so consider it

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

Cheers
--
Carlos
* Re: [PATCH v2] vfs: get_next_ino(), never inum=0
From: J. R. Okajima @ 2014-08-19 0:58 UTC
To: Carlos Maiolino; +Cc: linux-fsdevel

Carlos Maiolino:
> This V2 looks very reasonable, and fixes the problem with files with
> inode=0 on tmpfs, which I tested here, so consider it
>
> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

Just out of curiosity, how did you notice the inode=0 problem? I think it
is hard for anyone to hit.

After posting the patch, some people reported a bug to me related to SysV
shm. This extra patch supports SysV shm, but I don't like it since it
introduces an additional condition into the very normal path.

J. R. Okajima


diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index ca658a8..fda816e 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -25,6 +25,7 @@ struct shmem_inode_info {
 
 struct shmem_sb_info {
 	struct mutex idr_lock;
+	bool idr_nouse;
 	struct idr idr;			/* manages inode-number */
 	unsigned long max_blocks;	/* How many blocks are allowed */
 	struct percpu_counter used_blocks;  /* How many are allocated */
diff --git a/mm/shmem.c b/mm/shmem.c
index 0aa3b85..5eb75e9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -648,7 +648,7 @@ static void shmem_evict_inode(struct inode *inode)
 
 	simple_xattrs_free(&info->xattrs);
 	WARN_ON(inode->i_blocks);
-	if (inode->i_ino) {
+	if (!sbinfo->idr_nouse && inode->i_ino) {
 		mutex_lock(&sbinfo->idr_lock);
 		idr_remove(&sbinfo->idr, inode->i_ino);
 		mutex_unlock(&sbinfo->idr_lock);
@@ -1423,19 +1423,24 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 			break;
 		}
 
-		/* inum 0 and 1 are unused */
-		mutex_lock(&sbinfo->idr_lock);
-		ino = idr_alloc(&sbinfo->idr, inode, 2, INT_MAX, GFP_NOFS);
-		if (ino > 0) {
-			inode->i_ino = ino;
-			mutex_unlock(&sbinfo->idr_lock);
-			__insert_inode_hash(inode, inode->i_ino);
-		} else {
-			inode->i_ino = 0;
-			mutex_unlock(&sbinfo->idr_lock);
-			iput(inode); /* shmem_free_inode() will be called */
-			inode = NULL;
-		}
+		if (!sbinfo->idr_nouse) {
+			/* inum 0 and 1 are unused */
+			mutex_lock(&sbinfo->idr_lock);
+			ino = idr_alloc(&sbinfo->idr, inode, 2, INT_MAX,
+					GFP_NOFS);
+			if (ino > 0) {
+				inode->i_ino = ino;
+				mutex_unlock(&sbinfo->idr_lock);
+				__insert_inode_hash(inode, inode->i_ino);
+			} else {
+				inode->i_ino = 0;
+				mutex_unlock(&sbinfo->idr_lock);
+				iput(inode);
+				/* shmem_free_inode() will be called */
+				inode = NULL;
+			}
+		} else
+			inode->i_ino = get_next_ino();
 	} else
 		shmem_free_inode(sb);
 	return inode;
@@ -2560,7 +2565,8 @@ static void shmem_put_super(struct super_block *sb)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 
-	idr_destroy(&sbinfo->idr);
+	if (!sbinfo->idr_nouse)
+		idr_destroy(&sbinfo->idr);
 	percpu_counter_destroy(&sbinfo->used_blocks);
 	mpol_put(sbinfo->mpol);
 	kfree(sbinfo);
@@ -2682,6 +2688,15 @@ static void shmem_destroy_inodecache(void)
 	kmem_cache_destroy(shmem_inode_cachep);
 }
 
+static __init void shmem_no_idr(struct super_block *sb)
+{
+	struct shmem_sb_info *sbinfo;
+
+	sbinfo = SHMEM_SB(sb);
+	sbinfo->idr_nouse = true;
+	idr_destroy(&sbinfo->idr);
+}
+
 static const struct address_space_operations shmem_aops = {
 	.writepage	= shmem_writepage,
 	.set_page_dirty	= __set_page_dirty_no_writeback,
@@ -2814,6 +2829,7 @@ int __init shmem_init(void)
 		printk(KERN_ERR "Could not kern_mount tmpfs\n");
 		goto out1;
 	}
+	shmem_no_idr(shm_mnt->mnt_sb);
 	return 0;
 
 out1:
* [PATCH v2] vfs: get_next_ino(), never inum=0
From: J. R. Okajima @ 2014-05-28 14:06 UTC
To: linux-fsdevel, dchinner, viro, Eric Dumazet, Hugh Dickins, Christoph Hellwig, Andreas Dilger, Jan Kara

It is very rare for get_next_ino() to return zero as a new inode number
since its type is unsigned int, but it can surely happen eventually.
Interestingly, ls(1) and find(1) (actually readdir(3)) don't show a file
whose inum is zero, so people won't be able to find it.

This issue may be harmful especially for tmpfs. On a very long-lived and
busy system, users may frequently create files on tmpfs, and if one of
them unluckily gets inum=0, its owner cannot see its filename. If he
remembers the name, he may still be able to use or unlink the file by
name, since it surely exists; otherwise the file remains on tmpfs
silently and no one can touch it. This behaviour looks like a resource
leak. As a worse case, if a dir gets inum=0 and a user creates several
files under it, the leaked memory keeps growing, since the user cannot
see the names of any files under the dir whose inum=0, regardless of the
inums of the children.

There is another unpleasant effect when get_next_ino() wraps around: when
there is a file whose inum=100 on tmpfs, a new file may also get
inum=100, i.e. duplicated inums. I am not sure what will happen when
duplicated inums exist on tmpfs; if it happens, I am afraid some tools,
such as backup tools, won't work correctly. Anyway this is not an issue
in get_next_ino(); it should be fixed in mm/shmem.c separately if it is
really necessary.

There are many other get_next_ino() callers besides tmpfs, such as
several drivers, anon_inode, autofs4, freevxfs, procfs, pipe, hugetlbfs,
configfs, ramfs, fuse, ocfs2, debugfs, securityfs, cgroup, socket and
ipc. Some of them don't care about inums, so this issue is harmless for
them, but the others may suffer from inum=0. For example, if procfs gets
inum=0 for a task dir (or for one of its children), then several
utilities won't work correctly, including ps(1), lsof(8), etc.

(Essentially the patch is re-written by Eric Dumazet.)

Cc: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: J. R. Okajima <hooanon05g@gmail.com>
---
 fs/inode.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index 567296b..58e7c56 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -840,6 +840,8 @@ unsigned int get_next_ino(void)
 	unsigned int *p = &get_cpu_var(last_ino);
 	unsigned int res = *p;
 
+start:
+
 #ifdef CONFIG_SMP
 	if (unlikely((res & (LAST_INO_BATCH-1)) == 0)) {
 		static atomic_t shared_last_ino;
@@ -849,7 +851,9 @@ unsigned int get_next_ino(void)
 	}
 #endif
 
-	*p = ++res;
+	if (unlikely(!++res))
+		goto start;	/* never zero */
+	*p = res;
 	put_cpu_var(last_ino);
 	WARN(!res, "static inum wrapped around");
 	return res;
-- 
1.7.10.4
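For context, the callers this protects typically assign the result
straight into a freshly allocated inode. A sketch of the common pattern,
modelled on small synthetic filesystems such as ramfs; the examplefs name
is hypothetical and this is not a quote of any particular file:

struct inode *examplefs_get_inode(struct super_block *sb, umode_t mode)
{
	struct inode *inode = new_inode(sb);

	if (!inode)
		return NULL;
	/* with the patch above this can no longer be zero */
	inode->i_ino = get_next_ino();
	inode->i_mode = mode;
	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
	return inode;
}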