* [PATCH V3] fs: allow for more than 2^31 files [not found] ` <20100927.153639.212415479.davem@davemloft.net> @ 2010-09-28 3:46 ` Eric Dumazet 2010-09-28 4:10 ` David Miller 2010-09-30 20:26 ` Robin Holt 0 siblings, 2 replies; 9+ messages in thread From: Eric Dumazet @ 2010-09-28 3:46 UTC (permalink / raw) To: David Miller Cc: dipankar, holt, viro, bcrl, den, mingo, mszeredi, cmm, npiggin, xemul, linux-kernel, netdev Le lundi 27 septembre 2010 à 15:36 -0700, David Miller a écrit : > Is someone following up on integrating this upstream so this thing > gets fixed? > Thanks for the heads-up. I am not sure V2 of my patch was reviewed, maybe it did not reach the list. Here is V3 of it. I removed the ATOMIC_INIT(0) I left in V2. It should be an ATOMIC_LONG_INIT(0), but then, it can be avoided. CC netdev Thanks [PATCH V3] fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- fs/file_table.c | 15 ++++++--------- include/linux/fs.h | 8 ++++---- kernel/sysctl.c | 8 ++++---- net/unix/af_unix.c | 14 +++++++------- 4 files changed, 21 insertions(+), 24 deletions(-) diff --git a/fs/file_table.c b/fs/file_table.c index a04bdd8..46457ba 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -60,7 +60,7 @@ static inline void file_free(struct file *f) /* * Return the total number of open files in the system */ -static int get_nr_files(void) +static long get_nr_files(void) { return percpu_counter_read_positive(&nr_files); } @@ -68,7 +68,7 @@ static int get_nr_files(void) /* * Return the maximum number of open files in the system */ -int get_max_files(void) +unsigned long get_max_files(void) { return files_stat.max_files; } @@ -105,7 +105,7 @@ int proc_nr_files(ctl_table *table, int write, struct file *get_empty_filp(void) { const struct cred *cred = current_cred(); - static int old_max; + static long old_max; struct file * f; /* @@ -140,8 +140,7 @@ struct file *get_empty_filp(void) over: /* Ran out of filps - report that */ if (get_nr_files() > old_max) { - printk(KERN_INFO "VFS: file-max limit %d reached\n", - get_max_files()); + pr_info("VFS: file-max limit %lu reached\n", get_max_files()); old_max = get_nr_files(); } goto fail; @@ -487,7 +486,7 @@ retry: void __init files_init(unsigned long mempages) { - int n; + unsigned long n; filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL); @@ -498,9 +497,7 @@ void __init files_init(unsigned long mempages) */ n = (mempages * (PAGE_SIZE / 1024)) / 10; - files_stat.max_files = n; - if (files_stat.max_files < NR_FILE) - files_stat.max_files = NR_FILE; + files_stat.max_files = max_t(unsigned long, n, NR_FILE); files_defer_init(); lg_lock_init(files_lglock); percpu_counter_init(&nr_files, 0); diff --git a/include/linux/fs.h b/include/linux/fs.h index 63d069b..8c06590 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -34,9 +34,9 @@ /* And dynamically-tunable limits and defaults: */ struct files_stat_struct { - int nr_files; /* read only */ - int nr_free_files; /* read only */ - int max_files; /* tunable */ + unsigned long nr_files; /* read only */ + unsigned long nr_free_files; /* read only */ + unsigned long max_files; /* tunable */ }; struct inodes_stat_t { @@ -404,7 +404,7 @@ extern void __init inode_init_early(void); extern void __init files_init(unsigned long); extern struct files_stat_struct files_stat; -extern int get_max_files(void); +extern unsigned long get_max_files(void); extern int sysctl_nr_open; extern struct inodes_stat_t inodes_stat; extern int leases_enable, lease_break_time; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index f88552c..fc667bf 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = { { .procname = "file-nr", .data = &files_stat, - .maxlen = 3*sizeof(int), + .maxlen = sizeof(files_stat), .mode = 0444, - .proc_handler = proc_nr_files, + .proc_handler = proc_doulongvec_minmax, }, { .procname = "file-max", .data = &files_stat.max_files, - .maxlen = sizeof(int), + .maxlen = sizeof(files_stat.max_files), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_doulongvec_minmax, }, { .procname = "nr_open", diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0b39b24..3e1d7d1 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -117,7 +117,7 @@ static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1]; static DEFINE_SPINLOCK(unix_table_lock); -static atomic_t unix_nr_socks = ATOMIC_INIT(0); +static atomic_long_t unix_nr_socks; #define unix_sockets_unbound (&unix_socket_table[UNIX_HASH_SIZE]) @@ -360,13 +360,13 @@ static void unix_sock_destructor(struct sock *sk) if (u->addr) unix_release_addr(u->addr); - atomic_dec(&unix_nr_socks); + atomic_long_dec(&unix_nr_socks); local_bh_disable(); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); local_bh_enable(); #ifdef UNIX_REFCNT_DEBUG - printk(KERN_DEBUG "UNIX %p is destroyed, %d are still alive.\n", sk, - atomic_read(&unix_nr_socks)); + printk(KERN_DEBUG "UNIX %p is destroyed, %ld are still alive.\n", sk, + atomic_long_read(&unix_nr_socks)); #endif } @@ -606,8 +606,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) struct sock *sk = NULL; struct unix_sock *u; - atomic_inc(&unix_nr_socks); - if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) + atomic_long_inc(&unix_nr_socks); + if (atomic_long_read(&unix_nr_socks) > 2 * get_max_files()) goto out; sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_proto); @@ -632,7 +632,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) unix_insert_socket(unix_sockets_unbound, sk); out: if (sk == NULL) - atomic_dec(&unix_nr_socks); + atomic_long_dec(&unix_nr_socks); else { local_bh_disable(); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH V3] fs: allow for more than 2^31 files 2010-09-28 3:46 ` [PATCH V3] fs: allow for more than 2^31 files Eric Dumazet @ 2010-09-28 4:10 ` David Miller 2010-09-30 20:26 ` Robin Holt 1 sibling, 0 replies; 9+ messages in thread From: David Miller @ 2010-09-28 4:10 UTC (permalink / raw) To: eric.dumazet Cc: dipankar, holt, viro, bcrl, den, mingo, mszeredi, cmm, npiggin, xemul, linux-kernel, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 28 Sep 2010 05:46:51 +0200 > [PATCH V3] fs: allow for more than 2^31 files ... > Reported-by: Robin Holt <holt@sgi.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V3] fs: allow for more than 2^31 files 2010-09-28 3:46 ` [PATCH V3] fs: allow for more than 2^31 files Eric Dumazet 2010-09-28 4:10 ` David Miller @ 2010-09-30 20:26 ` Robin Holt 2010-09-30 20:45 ` Eric Dumazet 1 sibling, 1 reply; 9+ messages in thread From: Robin Holt @ 2010-09-30 20:26 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, dipankar, holt, viro, bcrl, den, mingo, mszeredi, cmm, npiggin, xemul, linux-kernel, netdev On Tue, Sep 28, 2010 at 05:46:51AM +0200, Eric Dumazet wrote: > Le lundi 27 septembre 2010 à 15:36 -0700, David Miller a écrit : ... > Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long > integers, and change af_unix to use an atomic_long_t instead of > atomic_t. > > get_max_files() is changed to return an unsigned long. I _THINK_ we actually want get_max_files to return a long and have the files_stat_struct definitions be longs. If we do not have it that way, we could theoretically open enough files on a single cpu to make get_nr_files return a negative without overflowing max_files. That, of course, would require an insane amount of memory, but I think it is technically more correct. > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = { > { > .procname = "file-nr", > .data = &files_stat, > - .maxlen = 3*sizeof(int), > + .maxlen = sizeof(files_stat), > .mode = 0444, > - .proc_handler = proc_nr_files, > + .proc_handler = proc_doulongvec_minmax, With this change, don't we lose the current nr_files value? I think you need proc_nr_files to stay as it was. If you disagree, we should probably eliminate the definitions for proc_nr_files as I don't believe they are used anywhere else. Thanks, Robin ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V3] fs: allow for more than 2^31 files 2010-09-30 20:26 ` Robin Holt @ 2010-09-30 20:45 ` Eric Dumazet 2010-10-01 4:34 ` Robin Holt 0 siblings, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2010-09-30 20:45 UTC (permalink / raw) To: Robin Holt Cc: David Miller, dipankar, viro, bcrl, den, mingo, mszeredi, cmm, npiggin, xemul, linux-kernel, netdev Le jeudi 30 septembre 2010 à 15:26 -0500, Robin Holt a écrit : > On Tue, Sep 28, 2010 at 05:46:51AM +0200, Eric Dumazet wrote: > > Le lundi 27 septembre 2010 à 15:36 -0700, David Miller a écrit : > ... > > > Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long > > integers, and change af_unix to use an atomic_long_t instead of > > atomic_t. > > > > get_max_files() is changed to return an unsigned long. > > I _THINK_ we actually want get_max_files to return a long and have > the files_stat_struct definitions be longs. If we do not have it that > way, we could theoretically open enough files on a single cpu to make > get_nr_files return a negative without overflowing max_files. That, > of course, would require an insane amount of memory, but I think it is > technically more correct. > Number of opened file is technically a positive (or null) value, I have no idea why you want it being signed. > > > --- a/kernel/sysctl.c > > +++ b/kernel/sysctl.c > > @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = { > > { > > .procname = "file-nr", > > .data = &files_stat, > > - .maxlen = 3*sizeof(int), > > + .maxlen = sizeof(files_stat), > > .mode = 0444, > > - .proc_handler = proc_nr_files, > > + .proc_handler = proc_doulongvec_minmax, > > With this change, don't we lose the current nr_files value? I think > you need proc_nr_files to stay as it was. If you disagree, we should > probably eliminate the definitions for proc_nr_files as I don't believe > they are used anywhere else. > I have no idea why you think I changed something. I only made the value use 64bit on 64bit arches, so that we are not anymore limited to 2^31 files. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V3] fs: allow for more than 2^31 files 2010-09-30 20:45 ` Eric Dumazet @ 2010-10-01 4:34 ` Robin Holt 2010-10-01 5:03 ` Eric Dumazet 0 siblings, 1 reply; 9+ messages in thread From: Robin Holt @ 2010-10-01 4:34 UTC (permalink / raw) To: Eric Dumazet Cc: Robin Holt, David Miller, dipankar, viro, bcrl, den, mingo, mszeredi, cmm, npiggin, xemul, linux-kernel, netdev On Thu, Sep 30, 2010 at 10:45:45PM +0200, Eric Dumazet wrote: > Le jeudi 30 septembre 2010 à 15:26 -0500, Robin Holt a écrit : > > On Tue, Sep 28, 2010 at 05:46:51AM +0200, Eric Dumazet wrote: > > > Le lundi 27 septembre 2010 à 15:36 -0700, David Miller a écrit : > > ... > > > > > Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long > > > integers, and change af_unix to use an atomic_long_t instead of > > > atomic_t. > > > > > > get_max_files() is changed to return an unsigned long. > > > > I _THINK_ we actually want get_max_files to return a long and have > > the files_stat_struct definitions be longs. If we do not have it that > > way, we could theoretically open enough files on a single cpu to make > > get_nr_files return a negative without overflowing max_files. That, > > of course, would require an insane amount of memory, but I think it is > > technically more correct. > > > > Number of opened file is technically a positive (or null) value, I have > no idea why you want it being signed. > > > > > > --- a/kernel/sysctl.c > > > +++ b/kernel/sysctl.c > > > @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = { > > > { > > > .procname = "file-nr", > > > .data = &files_stat, > > > - .maxlen = 3*sizeof(int), > > > + .maxlen = sizeof(files_stat), > > > .mode = 0444, > > > - .proc_handler = proc_nr_files, > > > + .proc_handler = proc_doulongvec_minmax, > > > > With this change, don't we lose the current nr_files value? I think > > you need proc_nr_files to stay as it was. If you disagree, we should > > probably eliminate the definitions for proc_nr_files as I don't believe > > they are used anywhere else. > > > > I have no idea why you think I changed something. I only made the value > use 64bit on 64bit arches, so that we are not anymore limited to 2^31 > files. The proc_handler used to be proc_nr_files() which would call get_nr_files() and deposit the result in files_stat.nr_files then cascade to proc_dointvec() which would dump the 3 values. Now it will dump the three values, but not update the middle (nr_files) value first. Robin ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V3] fs: allow for more than 2^31 files 2010-10-01 4:34 ` Robin Holt @ 2010-10-01 5:03 ` Eric Dumazet 2010-10-01 5:29 ` [PATCH V4] " Eric Dumazet 0 siblings, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2010-10-01 5:03 UTC (permalink / raw) To: Robin Holt Cc: David Miller, dipankar, viro, bcrl, den, mingo, mszeredi, cmm, npiggin, xemul, linux-kernel, netdev Le jeudi 30 septembre 2010 à 23:34 -0500, Robin Holt a écrit : > The proc_handler used to be proc_nr_files() which would call > get_nr_files() and deposit the result in files_stat.nr_files then cascade > to proc_dointvec() which would dump the 3 values. Now it will dump the > three values, but not update the middle (nr_files) value first. > Ah I get it now, thanks ! I'll send a V4 shortly. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH V4] fs: allow for more than 2^31 files 2010-10-01 5:03 ` Eric Dumazet @ 2010-10-01 5:29 ` Eric Dumazet 2010-10-01 13:38 ` Robin Holt 2010-10-05 7:32 ` Eric Dumazet 0 siblings, 2 replies; 9+ messages in thread From: Eric Dumazet @ 2010-10-01 5:29 UTC (permalink / raw) To: Robin Holt Cc: David Miller, dipankar, viro, bcrl, den, mingo, mszeredi, cmm, npiggin, xemul, linux-kernel, netdev Le vendredi 01 octobre 2010 à 07:03 +0200, Eric Dumazet a écrit : > Le jeudi 30 septembre 2010 à 23:34 -0500, Robin Holt a écrit : > > > The proc_handler used to be proc_nr_files() which would call > > get_nr_files() and deposit the result in files_stat.nr_files then cascade > > to proc_dointvec() which would dump the 3 values. Now it will dump the > > three values, but not update the middle (nr_files) value first. > > > > Ah I get it now, thanks ! > > I'll send a V4 shortly. > > In this v4, I call proc_nr_files() again, and proc_nr_files() calls proc_doulongvec_minmax() instead of proc_dointvec() Added the "cat /proc/sys/fs/file-nr" in Changelog Thanks again Robin [PATCH V3] fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- fs/file_table.c | 17 +++++++---------- include/linux/fs.h | 8 ++++---- kernel/sysctl.c | 6 +++--- net/unix/af_unix.c | 14 +++++++------- 4 files changed, 21 insertions(+), 24 deletions(-) diff --git a/fs/file_table.c b/fs/file_table.c index a04bdd8..c3dee38 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -60,7 +60,7 @@ static inline void file_free(struct file *f) /* * Return the total number of open files in the system */ -static int get_nr_files(void) +static long get_nr_files(void) { return percpu_counter_read_positive(&nr_files); } @@ -68,7 +68,7 @@ static int get_nr_files(void) /* * Return the maximum number of open files in the system */ -int get_max_files(void) +unsigned long get_max_files(void) { return files_stat.max_files; } @@ -82,7 +82,7 @@ int proc_nr_files(ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { files_stat.nr_files = get_nr_files(); - return proc_dointvec(table, write, buffer, lenp, ppos); + return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); } #else int proc_nr_files(ctl_table *table, int write, @@ -105,7 +105,7 @@ int proc_nr_files(ctl_table *table, int write, struct file *get_empty_filp(void) { const struct cred *cred = current_cred(); - static int old_max; + static long old_max; struct file * f; /* @@ -140,8 +140,7 @@ struct file *get_empty_filp(void) over: /* Ran out of filps - report that */ if (get_nr_files() > old_max) { - printk(KERN_INFO "VFS: file-max limit %d reached\n", - get_max_files()); + pr_info("VFS: file-max limit %lu reached\n", get_max_files()); old_max = get_nr_files(); } goto fail; @@ -487,7 +486,7 @@ retry: void __init files_init(unsigned long mempages) { - int n; + unsigned long n; filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL); @@ -498,9 +497,7 @@ void __init files_init(unsigned long mempages) */ n = (mempages * (PAGE_SIZE / 1024)) / 10; - files_stat.max_files = n; - if (files_stat.max_files < NR_FILE) - files_stat.max_files = NR_FILE; + files_stat.max_files = max_t(unsigned long, n, NR_FILE); files_defer_init(); lg_lock_init(files_lglock); percpu_counter_init(&nr_files, 0); diff --git a/include/linux/fs.h b/include/linux/fs.h index 63d069b..8c06590 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -34,9 +34,9 @@ /* And dynamically-tunable limits and defaults: */ struct files_stat_struct { - int nr_files; /* read only */ - int nr_free_files; /* read only */ - int max_files; /* tunable */ + unsigned long nr_files; /* read only */ + unsigned long nr_free_files; /* read only */ + unsigned long max_files; /* tunable */ }; struct inodes_stat_t { @@ -404,7 +404,7 @@ extern void __init inode_init_early(void); extern void __init files_init(unsigned long); extern struct files_stat_struct files_stat; -extern int get_max_files(void); +extern unsigned long get_max_files(void); extern int sysctl_nr_open; extern struct inodes_stat_t inodes_stat; extern int leases_enable, lease_break_time; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index f88552c..f789a0a 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = { { .procname = "file-nr", .data = &files_stat, - .maxlen = 3*sizeof(int), + .maxlen = sizeof(files_stat), .mode = 0444, .proc_handler = proc_nr_files, }, { .procname = "file-max", .data = &files_stat.max_files, - .maxlen = sizeof(int), + .maxlen = sizeof(files_stat.max_files), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_doulongvec_minmax, }, { .procname = "nr_open", diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0b39b24..3e1d7d1 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -117,7 +117,7 @@ static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1]; static DEFINE_SPINLOCK(unix_table_lock); -static atomic_t unix_nr_socks = ATOMIC_INIT(0); +static atomic_long_t unix_nr_socks; #define unix_sockets_unbound (&unix_socket_table[UNIX_HASH_SIZE]) @@ -360,13 +360,13 @@ static void unix_sock_destructor(struct sock *sk) if (u->addr) unix_release_addr(u->addr); - atomic_dec(&unix_nr_socks); + atomic_long_dec(&unix_nr_socks); local_bh_disable(); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); local_bh_enable(); #ifdef UNIX_REFCNT_DEBUG - printk(KERN_DEBUG "UNIX %p is destroyed, %d are still alive.\n", sk, - atomic_read(&unix_nr_socks)); + printk(KERN_DEBUG "UNIX %p is destroyed, %ld are still alive.\n", sk, + atomic_long_read(&unix_nr_socks)); #endif } @@ -606,8 +606,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) struct sock *sk = NULL; struct unix_sock *u; - atomic_inc(&unix_nr_socks); - if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) + atomic_long_inc(&unix_nr_socks); + if (atomic_long_read(&unix_nr_socks) > 2 * get_max_files()) goto out; sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_proto); @@ -632,7 +632,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) unix_insert_socket(unix_sockets_unbound, sk); out: if (sk == NULL) - atomic_dec(&unix_nr_socks); + atomic_long_dec(&unix_nr_socks); else { local_bh_disable(); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH V4] fs: allow for more than 2^31 files 2010-10-01 5:29 ` [PATCH V4] " Eric Dumazet @ 2010-10-01 13:38 ` Robin Holt 2010-10-05 7:32 ` Eric Dumazet 1 sibling, 0 replies; 9+ messages in thread From: Robin Holt @ 2010-10-01 13:38 UTC (permalink / raw) To: Eric Dumazet Cc: Robin Holt, David Miller, dipankar, viro, bcrl, den, mingo, mszeredi, cmm, npiggin, xemul, linux-kernel, netdev Looks good. Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> I don't mean to flood this with my name, merely that I do find this patch acceptable, worthy, and have tested it. Feel free to lop off any of these lines that are offensive. Robin On Fri, Oct 01, 2010 at 07:29:18AM +0200, Eric Dumazet wrote: > Le vendredi 01 octobre 2010 à 07:03 +0200, Eric Dumazet a écrit : > > Le jeudi 30 septembre 2010 à 23:34 -0500, Robin Holt a écrit : > > > > > The proc_handler used to be proc_nr_files() which would call > > > get_nr_files() and deposit the result in files_stat.nr_files then cascade > > > to proc_dointvec() which would dump the 3 values. Now it will dump the > > > three values, but not update the middle (nr_files) value first. > > > > > > > Ah I get it now, thanks ! > > > > I'll send a V4 shortly. > > > > > > In this v4, I call proc_nr_files() again, and proc_nr_files() calls > proc_doulongvec_minmax() instead of proc_dointvec() > > Added the "cat /proc/sys/fs/file-nr" in Changelog > > Thanks again Robin > > [PATCH V3] fs: allow for more than 2^31 files > > Robin Holt tried to boot a 16TB system and found af_unix was overflowing > a 32bit value : > > <quote> > > We were seeing a failure which prevented boot. The kernel was incapable > of creating either a named pipe or unix domain socket. This comes down > to a common kernel function called unix_create1() which does: > > atomic_inc(&unix_nr_socks); > if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) > goto out; > > The function get_max_files() is a simple return of files_stat.max_files. > files_stat.max_files is a signed integer and is computed in > fs/file_table.c's files_init(). > > n = (mempages * (PAGE_SIZE / 1024)) / 10; > files_stat.max_files = n; > > In our case, mempages (total_ram_pages) is approx 3,758,096,384 > (0xe0000000). That leaves max_files at approximately 1,503,238,553. > This causes 2 * get_max_files() to integer overflow. > > </quote> > > Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long > integers, and change af_unix to use an atomic_long_t instead of > atomic_t. > > get_max_files() is changed to return an unsigned long. > get_nr_files() is changed to return a long. > > unix_nr_socks is changed from atomic_t to atomic_long_t, while not > strictly needed to address Robin problem. > > Before patch (on a 64bit kernel) : > # echo 2147483648 >/proc/sys/fs/file-max > # cat /proc/sys/fs/file-max > -18446744071562067968 > > After patch: > # echo 2147483648 >/proc/sys/fs/file-max > # cat /proc/sys/fs/file-max > 2147483648 > # cat /proc/sys/fs/file-nr > 704 0 2147483648 > > > Reported-by: Robin Holt <holt@sgi.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > fs/file_table.c | 17 +++++++---------- > include/linux/fs.h | 8 ++++---- > kernel/sysctl.c | 6 +++--- > net/unix/af_unix.c | 14 +++++++------- > 4 files changed, 21 insertions(+), 24 deletions(-) > > diff --git a/fs/file_table.c b/fs/file_table.c > index a04bdd8..c3dee38 100644 > --- a/fs/file_table.c > +++ b/fs/file_table.c > @@ -60,7 +60,7 @@ static inline void file_free(struct file *f) > /* > * Return the total number of open files in the system > */ > -static int get_nr_files(void) > +static long get_nr_files(void) > { > return percpu_counter_read_positive(&nr_files); > } > @@ -68,7 +68,7 @@ static int get_nr_files(void) > /* > * Return the maximum number of open files in the system > */ > -int get_max_files(void) > +unsigned long get_max_files(void) > { > return files_stat.max_files; > } > @@ -82,7 +82,7 @@ int proc_nr_files(ctl_table *table, int write, > void __user *buffer, size_t *lenp, loff_t *ppos) > { > files_stat.nr_files = get_nr_files(); > - return proc_dointvec(table, write, buffer, lenp, ppos); > + return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); > } > #else > int proc_nr_files(ctl_table *table, int write, > @@ -105,7 +105,7 @@ int proc_nr_files(ctl_table *table, int write, > struct file *get_empty_filp(void) > { > const struct cred *cred = current_cred(); > - static int old_max; > + static long old_max; > struct file * f; > > /* > @@ -140,8 +140,7 @@ struct file *get_empty_filp(void) > over: > /* Ran out of filps - report that */ > if (get_nr_files() > old_max) { > - printk(KERN_INFO "VFS: file-max limit %d reached\n", > - get_max_files()); > + pr_info("VFS: file-max limit %lu reached\n", get_max_files()); > old_max = get_nr_files(); > } > goto fail; > @@ -487,7 +486,7 @@ retry: > > void __init files_init(unsigned long mempages) > { > - int n; > + unsigned long n; > > filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0, > SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL); > @@ -498,9 +497,7 @@ void __init files_init(unsigned long mempages) > */ > > n = (mempages * (PAGE_SIZE / 1024)) / 10; > - files_stat.max_files = n; > - if (files_stat.max_files < NR_FILE) > - files_stat.max_files = NR_FILE; > + files_stat.max_files = max_t(unsigned long, n, NR_FILE); > files_defer_init(); > lg_lock_init(files_lglock); > percpu_counter_init(&nr_files, 0); > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 63d069b..8c06590 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -34,9 +34,9 @@ > > /* And dynamically-tunable limits and defaults: */ > struct files_stat_struct { > - int nr_files; /* read only */ > - int nr_free_files; /* read only */ > - int max_files; /* tunable */ > + unsigned long nr_files; /* read only */ > + unsigned long nr_free_files; /* read only */ > + unsigned long max_files; /* tunable */ > }; > > struct inodes_stat_t { > @@ -404,7 +404,7 @@ extern void __init inode_init_early(void); > extern void __init files_init(unsigned long); > > extern struct files_stat_struct files_stat; > -extern int get_max_files(void); > +extern unsigned long get_max_files(void); > extern int sysctl_nr_open; > extern struct inodes_stat_t inodes_stat; > extern int leases_enable, lease_break_time; > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index f88552c..f789a0a 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = { > { > .procname = "file-nr", > .data = &files_stat, > - .maxlen = 3*sizeof(int), > + .maxlen = sizeof(files_stat), > .mode = 0444, > .proc_handler = proc_nr_files, > }, > { > .procname = "file-max", > .data = &files_stat.max_files, > - .maxlen = sizeof(int), > + .maxlen = sizeof(files_stat.max_files), > .mode = 0644, > - .proc_handler = proc_dointvec, > + .proc_handler = proc_doulongvec_minmax, > }, > { > .procname = "nr_open", > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 0b39b24..3e1d7d1 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -117,7 +117,7 @@ > > static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1]; > static DEFINE_SPINLOCK(unix_table_lock); > -static atomic_t unix_nr_socks = ATOMIC_INIT(0); > +static atomic_long_t unix_nr_socks; > > #define unix_sockets_unbound (&unix_socket_table[UNIX_HASH_SIZE]) > > @@ -360,13 +360,13 @@ static void unix_sock_destructor(struct sock *sk) > if (u->addr) > unix_release_addr(u->addr); > > - atomic_dec(&unix_nr_socks); > + atomic_long_dec(&unix_nr_socks); > local_bh_disable(); > sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); > local_bh_enable(); > #ifdef UNIX_REFCNT_DEBUG > - printk(KERN_DEBUG "UNIX %p is destroyed, %d are still alive.\n", sk, > - atomic_read(&unix_nr_socks)); > + printk(KERN_DEBUG "UNIX %p is destroyed, %ld are still alive.\n", sk, > + atomic_long_read(&unix_nr_socks)); > #endif > } > > @@ -606,8 +606,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) > struct sock *sk = NULL; > struct unix_sock *u; > > - atomic_inc(&unix_nr_socks); > - if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) > + atomic_long_inc(&unix_nr_socks); > + if (atomic_long_read(&unix_nr_socks) > 2 * get_max_files()) > goto out; > > sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_proto); > @@ -632,7 +632,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) > unix_insert_socket(unix_sockets_unbound, sk); > out: > if (sk == NULL) > - atomic_dec(&unix_nr_socks); > + atomic_long_dec(&unix_nr_socks); > else { > local_bh_disable(); > sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH V4] fs: allow for more than 2^31 files 2010-10-01 5:29 ` [PATCH V4] " Eric Dumazet 2010-10-01 13:38 ` Robin Holt @ 2010-10-05 7:32 ` Eric Dumazet 1 sibling, 0 replies; 9+ messages in thread From: Eric Dumazet @ 2010-10-05 7:32 UTC (permalink / raw) To: Andrew Morton Cc: David Miller, Robin Holt, dipankar, viro, bcrl, den, mingo, mszeredi, cmm, npiggin, xemul, linux-kernel, netdev Andrew, Could you please review this patch, you probably are the right guy to take it, because it crosses fs and net trees. Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt depend on previous patch (sysctl: fix min/max handling in __do_proc_doulongvec_minmax()) Thanks ! [PATCH V4] fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> --- fs/file_table.c | 17 +++++++---------- include/linux/fs.h | 8 ++++---- kernel/sysctl.c | 6 +++--- net/unix/af_unix.c | 14 +++++++------- 4 files changed, 21 insertions(+), 24 deletions(-) diff --git a/fs/file_table.c b/fs/file_table.c index a04bdd8..c3dee38 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -60,7 +60,7 @@ static inline void file_free(struct file *f) /* * Return the total number of open files in the system */ -static int get_nr_files(void) +static long get_nr_files(void) { return percpu_counter_read_positive(&nr_files); } @@ -68,7 +68,7 @@ static int get_nr_files(void) /* * Return the maximum number of open files in the system */ -int get_max_files(void) +unsigned long get_max_files(void) { return files_stat.max_files; } @@ -82,7 +82,7 @@ int proc_nr_files(ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { files_stat.nr_files = get_nr_files(); - return proc_dointvec(table, write, buffer, lenp, ppos); + return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); } #else int proc_nr_files(ctl_table *table, int write, @@ -105,7 +105,7 @@ int proc_nr_files(ctl_table *table, int write, struct file *get_empty_filp(void) { const struct cred *cred = current_cred(); - static int old_max; + static long old_max; struct file * f; /* @@ -140,8 +140,7 @@ struct file *get_empty_filp(void) over: /* Ran out of filps - report that */ if (get_nr_files() > old_max) { - printk(KERN_INFO "VFS: file-max limit %d reached\n", - get_max_files()); + pr_info("VFS: file-max limit %lu reached\n", get_max_files()); old_max = get_nr_files(); } goto fail; @@ -487,7 +486,7 @@ retry: void __init files_init(unsigned long mempages) { - int n; + unsigned long n; filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL); @@ -498,9 +497,7 @@ void __init files_init(unsigned long mempages) */ n = (mempages * (PAGE_SIZE / 1024)) / 10; - files_stat.max_files = n; - if (files_stat.max_files < NR_FILE) - files_stat.max_files = NR_FILE; + files_stat.max_files = max_t(unsigned long, n, NR_FILE); files_defer_init(); lg_lock_init(files_lglock); percpu_counter_init(&nr_files, 0); diff --git a/include/linux/fs.h b/include/linux/fs.h index 63d069b..8c06590 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -34,9 +34,9 @@ /* And dynamically-tunable limits and defaults: */ struct files_stat_struct { - int nr_files; /* read only */ - int nr_free_files; /* read only */ - int max_files; /* tunable */ + unsigned long nr_files; /* read only */ + unsigned long nr_free_files; /* read only */ + unsigned long max_files; /* tunable */ }; struct inodes_stat_t { @@ -404,7 +404,7 @@ extern void __init inode_init_early(void); extern void __init files_init(unsigned long); extern struct files_stat_struct files_stat; -extern int get_max_files(void); +extern unsigned long get_max_files(void); extern int sysctl_nr_open; extern struct inodes_stat_t inodes_stat; extern int leases_enable, lease_break_time; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index f88552c..f789a0a 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = { { .procname = "file-nr", .data = &files_stat, - .maxlen = 3*sizeof(int), + .maxlen = sizeof(files_stat), .mode = 0444, .proc_handler = proc_nr_files, }, { .procname = "file-max", .data = &files_stat.max_files, - .maxlen = sizeof(int), + .maxlen = sizeof(files_stat.max_files), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_doulongvec_minmax, }, { .procname = "nr_open", diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0b39b24..3e1d7d1 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -117,7 +117,7 @@ static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1]; static DEFINE_SPINLOCK(unix_table_lock); -static atomic_t unix_nr_socks = ATOMIC_INIT(0); +static atomic_long_t unix_nr_socks; #define unix_sockets_unbound (&unix_socket_table[UNIX_HASH_SIZE]) @@ -360,13 +360,13 @@ static void unix_sock_destructor(struct sock *sk) if (u->addr) unix_release_addr(u->addr); - atomic_dec(&unix_nr_socks); + atomic_long_dec(&unix_nr_socks); local_bh_disable(); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); local_bh_enable(); #ifdef UNIX_REFCNT_DEBUG - printk(KERN_DEBUG "UNIX %p is destroyed, %d are still alive.\n", sk, - atomic_read(&unix_nr_socks)); + printk(KERN_DEBUG "UNIX %p is destroyed, %ld are still alive.\n", sk, + atomic_long_read(&unix_nr_socks)); #endif } @@ -606,8 +606,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) struct sock *sk = NULL; struct unix_sock *u; - atomic_inc(&unix_nr_socks); - if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) + atomic_long_inc(&unix_nr_socks); + if (atomic_long_read(&unix_nr_socks) > 2 * get_max_files()) goto out; sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_proto); @@ -632,7 +632,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock) unix_insert_socket(unix_sockets_unbound, sk); out: if (sk == NULL) - atomic_dec(&unix_nr_socks); + atomic_long_dec(&unix_nr_socks); else { local_bh_disable(); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-10-05 7:33 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20100923121704.GR14064@sgi.com>
[not found] ` <1285246384.362.3.camel@edumazet-laptop>
[not found] ` <20100923141037.GA3811@in.ibm.com>
[not found] ` <20100927.153639.212415479.davem@davemloft.net>
2010-09-28 3:46 ` [PATCH V3] fs: allow for more than 2^31 files Eric Dumazet
2010-09-28 4:10 ` David Miller
2010-09-30 20:26 ` Robin Holt
2010-09-30 20:45 ` Eric Dumazet
2010-10-01 4:34 ` Robin Holt
2010-10-01 5:03 ` Eric Dumazet
2010-10-01 5:29 ` [PATCH V4] " Eric Dumazet
2010-10-01 13:38 ` Robin Holt
2010-10-05 7:32 ` Eric Dumazet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).