* suppress page allocation failure warnings from sys_listxattr
@ 2012-03-13 18:22 Dave Jones
  2012-03-13 21:33 ` Colin Walters
  2012-03-27 22:51 ` Andrew Morton
  0 siblings, 2 replies; 20+ messages in thread
From: Dave Jones @ 2012-03-13 18:22 UTC (permalink / raw)
  To: viro; +Cc: Linux Kernel

This size is user controllable, and so it's trivial for someone to trigger a
stream of order:4 page allocation errors.

Signed-off-by: Dave Jones <davej@redhat.com>

---
There's also a similar problem in setxattr, but I'm not sure how we want
to pass NOWARN down to memdup_user. Thoughts ?

diff --git a/fs/xattr.c b/fs/xattr.c
index 82f4337..544df90 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
-		klist = kmalloc(size, GFP_KERNEL);
+		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
 		if (!klist)
 			return -ENOMEM;
 	}

^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-13 18:22 suppress page allocation failure warnings from sys_listxattr Dave Jones
@ 2012-03-13 21:33 ` Colin Walters
  2012-03-27 22:51 ` Andrew Morton
  1 sibling, 0 replies; 20+ messages in thread
From: Colin Walters @ 2012-03-13 21:33 UTC (permalink / raw)
  To: Dave Jones; +Cc: viro, Linux Kernel

On Tue, 2012-03-13 at 14:22 -0400, Dave Jones wrote:
> This size is user controllable, and so it's trivial for someone to trigger a
> stream of order:4 page allocation errors.

I spent some time today struggling with an order:4 allocation failure
(my application uses CLONE_NEWNET to make an empty network stack for
software builds, and apparently one of the netfilter caches requires
this).

But is that the general principle, that we just add __GFP_NOWARN if the
allocation size is trivially user controllable? I guess examples like
fs/pipe.c:pipe_set_size() agree with you, but it feels kind of like it's
papering over the problem...

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-13 18:22 suppress page allocation failure warnings from sys_listxattr Dave Jones 2012-03-13 21:33 ` Colin Walters @ 2012-03-27 22:51 ` Andrew Morton 2012-03-28 0:15 ` Dave Jones 1 sibling, 1 reply; 20+ messages in thread From: Andrew Morton @ 2012-03-27 22:51 UTC (permalink / raw) To: Dave Jones; +Cc: viro, Linux Kernel On Tue, 13 Mar 2012 14:22:20 -0400 Dave Jones <davej@redhat.com> wrote: > This size is user controllable, and so it's trivial for someone to trigger a > stream of order:4 page allocation errors. > > Signed-off-by: Dave Jones <davej@redhat.com> > > --- > There's also a similar problem in setxattr, but I'm not sure how we want > to pass NOWARN down to memdup_user. Thoughts ? > > diff --git a/fs/xattr.c b/fs/xattr.c > index 82f4337..544df90 100644 > --- a/fs/xattr.c > +++ b/fs/xattr.c > @@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size) > if (size) { > if (size > XATTR_LIST_MAX) > size = XATTR_LIST_MAX; > - klist = kmalloc(size, GFP_KERNEL); > + klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL); > if (!klist) > return -ENOMEM; > } hm. The patch is good, but one would hope that it isn't "trivial" to trigger a page allocation failure for a kmalloc(65536, GFP_KERNEL) - the VM is supposed to be able to handle that. Is it really *that* easy, or is Something Unusual happening with that machine? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-27 22:51 ` Andrew Morton @ 2012-03-28 0:15 ` Dave Jones 2012-03-28 0:26 ` Andrew Morton 2012-03-28 4:39 ` Dave Chinner 0 siblings, 2 replies; 20+ messages in thread From: Dave Jones @ 2012-03-28 0:15 UTC (permalink / raw) To: Andrew Morton; +Cc: viro, Linux Kernel On Tue, Mar 27, 2012 at 03:51:49PM -0700, Andrew Morton wrote: > On Tue, 13 Mar 2012 14:22:20 -0400 > Dave Jones <davej@redhat.com> wrote: > > > This size is user controllable, and so it's trivial for someone to trigger a > > stream of order:4 page allocation errors. > > > > Signed-off-by: Dave Jones <davej@redhat.com> > > > > --- > > There's also a similar problem in setxattr, but I'm not sure how we want > > to pass NOWARN down to memdup_user. Thoughts ? > > > > diff --git a/fs/xattr.c b/fs/xattr.c > > index 82f4337..544df90 100644 > > --- a/fs/xattr.c > > +++ b/fs/xattr.c > > @@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size) > > if (size) { > > if (size > XATTR_LIST_MAX) > > size = XATTR_LIST_MAX; > > - klist = kmalloc(size, GFP_KERNEL); > > + klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL); > > if (!klist) > > return -ENOMEM; > > } > > hm. The patch is good, but one would hope that it isn't "trivial" to > trigger a page allocation failure for a kmalloc(65536, GFP_KERNEL) - > the VM is supposed to be able to handle that. > > Is it really *that* easy, or is Something Unusual happening with that > machine? Well, the unusual thing was that I was fuzzing system calls for a few hours. My fuzzing tool was able to trigger these very easily after an hour or two of uptime and memory had fragmented a little, so yeah, quite trivial. Dave ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-28 0:15 ` Dave Jones @ 2012-03-28 0:26 ` Andrew Morton 2012-03-28 7:13 ` David Rientjes 2012-03-28 4:39 ` Dave Chinner 1 sibling, 1 reply; 20+ messages in thread From: Andrew Morton @ 2012-03-28 0:26 UTC (permalink / raw) To: Dave Jones; +Cc: viro, Linux Kernel On Tue, 27 Mar 2012 20:15:50 -0400 Dave Jones <davej@redhat.com> wrote: > On Tue, Mar 27, 2012 at 03:51:49PM -0700, Andrew Morton wrote: > > On Tue, 13 Mar 2012 14:22:20 -0400 > > Dave Jones <davej@redhat.com> wrote: > > > > > This size is user controllable, and so it's trivial for someone to trigger a > > > stream of order:4 page allocation errors. > > > > > > Signed-off-by: Dave Jones <davej@redhat.com> > > > > > > --- > > > There's also a similar problem in setxattr, but I'm not sure how we want > > > to pass NOWARN down to memdup_user. Thoughts ? > > > > > > diff --git a/fs/xattr.c b/fs/xattr.c > > > index 82f4337..544df90 100644 > > > --- a/fs/xattr.c > > > +++ b/fs/xattr.c > > > @@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size) > > > if (size) { > > > if (size > XATTR_LIST_MAX) > > > size = XATTR_LIST_MAX; > > > - klist = kmalloc(size, GFP_KERNEL); > > > + klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL); > > > if (!klist) > > > return -ENOMEM; > > > } > > > > hm. The patch is good, but one would hope that it isn't "trivial" to > > trigger a page allocation failure for a kmalloc(65536, GFP_KERNEL) - > > the VM is supposed to be able to handle that. > > > > Is it really *that* easy, or is Something Unusual happening with that > > machine? > > Well, the unusual thing was that I was fuzzing system calls for a few hours. > > My fuzzing tool was able to trigger these very easily after an hour or two > of uptime and memory had fragmented a little, so yeah, quite trivial. > /* * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed * costly to service. 
That is between allocation orders which should
 * coalesce naturally under reasonable reclaim pressure and those which
 * will not.
 */
#define PAGE_ALLOC_COSTLY_ORDER 3

Death to magic numbers :(

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-28 0:26 ` Andrew Morton
@ 2012-03-28 7:13 ` David Rientjes
  0 siblings, 0 replies; 20+ messages in thread
From: David Rientjes @ 2012-03-28 7:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Jones, viro, Linux Kernel

On Tue, 27 Mar 2012, Andrew Morton wrote:

> /*
>  * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
>  * costly to service. That is between allocation orders which should
>  * coalesce naturally under reasonable reclaim pressure and those which
>  * will not.
>  */
> #define PAGE_ALLOC_COSTLY_ORDER 3
>
>
> Death to magic numbers :(

This isn't as dire as it sounds: memory compaction is specifically
targeted to run when the order is greater than this, see
compaction_ready(). If direct reclaim and compaction both fail then
there's nothing the VM can do other than oom kill to free memory, and we
avoid doing that because there's no guarantee of freeing enough memory
that the high-order allocation will be successful. Not even __GFP_REPEAT
is going to be helpful since we can't oom kill anything; the only
alternative would be to use __GFP_NOFAIL and that would just be deadly
for such an allocation request.

This error is recoverable, so

Acked-by: David Rientjes <rientjes@google.com>

to the patch.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-28 0:15 ` Dave Jones 2012-03-28 0:26 ` Andrew Morton @ 2012-03-28 4:39 ` Dave Chinner 2012-03-28 23:47 ` Andrew Morton 1 sibling, 1 reply; 20+ messages in thread From: Dave Chinner @ 2012-03-28 4:39 UTC (permalink / raw) To: Dave Jones, Andrew Morton, viro, Linux Kernel On Tue, Mar 27, 2012 at 08:15:50PM -0400, Dave Jones wrote: > On Tue, Mar 27, 2012 at 03:51:49PM -0700, Andrew Morton wrote: > > On Tue, 13 Mar 2012 14:22:20 -0400 > > Dave Jones <davej@redhat.com> wrote: > > > > > This size is user controllable, and so it's trivial for someone to trigger a > > > stream of order:4 page allocation errors. > > > > > > Signed-off-by: Dave Jones <davej@redhat.com> > > > > > > --- > > > There's also a similar problem in setxattr, but I'm not sure how we want > > > to pass NOWARN down to memdup_user. Thoughts ? > > > > > > diff --git a/fs/xattr.c b/fs/xattr.c > > > index 82f4337..544df90 100644 > > > --- a/fs/xattr.c > > > +++ b/fs/xattr.c > > > @@ -496,7 +496,7 @@ listxattr(struct dentry *d, char __user *list, size_t size) > > > if (size) { > > > if (size > XATTR_LIST_MAX) > > > size = XATTR_LIST_MAX; > > > - klist = kmalloc(size, GFP_KERNEL); > > > + klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL); > > > if (!klist) > > > return -ENOMEM; > > > } > > > > hm. The patch is good, but one would hope that it isn't "trivial" to > > trigger a page allocation failure for a kmalloc(65536, GFP_KERNEL) - > > the VM is supposed to be able to handle that. > > > > Is it really *that* easy, or is Something Unusual happening with that > > machine? > > Well, the unusual thing was that I was fuzzing system calls for a few hours. > > My fuzzing tool was able to trigger these very easily after an hour or two > of uptime and memory had fragmented a little, so yeah, quite trivial. 
We've recently been seeing reports of xfsdump triggering similar
allocation failures in the XFS attr code when we are doing hundreds
of thousands of attribute lookups to back them up.

ad650f5 xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get

I think that falling back to vmalloc here is a much better solution
than failing to retrieve the attribute - it will work no matter how
fragmented memory gets. That means we don't get incomplete
backups occurring after days or months of uptime and successful
backups...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-28 4:39 ` Dave Chinner
@ 2012-03-28 23:47 ` Andrew Morton
  2012-03-29 0:54 ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2012-03-28 23:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Dave Jones, viro, Linux Kernel, David Rientjes

On Wed, 28 Mar 2012 15:39:51 +1100
Dave Chinner <david@fromorbit.com> wrote:

> > Well, the unusual thing was that I was fuzzing system calls for a few hours.
> >
> > My fuzzing tool was able to trigger these very easily after an hour or two
> > of uptime and memory had fragmented a little, so yeah, quite trivial.
>
> We've recently been seeing reports of xfsdump triggering similar
> allocation failures in the XFS attr code when we are doing hundreds
> of thousands of attribute lookups to back them up.
>
> ad650f5 xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
>
> I think that falling back to vmalloc here is a much better solution
> than failing to retrieve the attribute - it will work no matter how
> fragmented memory gets. That means we don't get incomplete
> backups occurring after days or months of uptime and successful
> backups...

Yup. How does the below look?

This patch needs more Davids.


From: Andrew Morton <akpm@linux-foundation.org>
Subject: fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed

This allocation can be as large as 64k. As David points out, "falling
back to vmalloc here is a much better solution than failing to retrieve
the attribute - it will work no matter how fragmented memory gets. That
means we don't get incomplete backups occurring after days or months of
uptime and successful backups".
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/xattr.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff -puN fs/xattr.c~fs-xattrc-listxattr-fall-back-to-vmalloc-if-kmalloc-failed fs/xattr.c
--- a/fs/xattr.c~fs-xattrc-listxattr-fall-back-to-vmalloc-if-kmalloc-failed
+++ a/fs/xattr.c
@@ -492,13 +492,18 @@ listxattr(struct dentry *d, char __user
 {
 	ssize_t error;
 	char *klist = NULL;
+	char *vlist = NULL;	/* If non-NULL, we used vmalloc() */
 
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
 		klist = kmalloc(size, __GFP_NOWARN | GFP_KERNEL);
-		if (!klist)
-			return -ENOMEM;
+		if (!klist) {
+			vlist = vmalloc(size);
+			if (!vlist)
+				return -ENOMEM;
+			klist = vlist;
+		}
 	}
 
 	error = vfs_listxattr(d, klist, size);
@@ -510,7 +515,10 @@ listxattr(struct dentry *d, char __user
 		   than XATTR_LIST_MAX bytes. Not possible. */
 		error = -E2BIG;
 	}
-	kfree(klist);
+	if (vlist)
+		vfree(vlist);
+	else
+		kfree(klist);
 	return error;
 }
 
_

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-28 23:47 ` Andrew Morton
@ 2012-03-29 0:54 ` Dave Jones
  2012-03-29 1:10 ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jones @ 2012-03-29 0:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, Mar 28, 2012 at 04:47:20PM -0700, Andrew Morton wrote:
 > On Wed, 28 Mar 2012 15:39:51 +1100
 > Dave Chinner <david@fromorbit.com> wrote:
 >
 > > > Well, the unusual thing was that I was fuzzing system calls for a few hours.
 > > >
 > > > My fuzzing tool was able to trigger these very easily after an hour or two
 > > > of uptime and memory had fragmented a little, so yeah, quite trivial.
 > >
 > > We've recently been seeing reports of xfsdump triggering similar
 > > allocation failures in the XFS attr code when we are doing hundreds
 > > of thousands of attribute lookups to back them up.
 > >
 > > ad650f5 xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
 > >
 > > I think that falling back to vmalloc here is a much better solution
 > > than failing to retrieve the attribute - it will work no matter how
 > > fragmented memory gets. That means we don't get incomplete
 > > backups occurring after days or months of uptime and successful
 > > backups...
 >
 > Yup. How does the below look?

Don't see anything immediately wrong with it.
Any thoughts on what to do about the similar problem in setxattr ? (memdup_user)

	Dave

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29 0:54 ` Dave Jones
@ 2012-03-29 1:10 ` Andrew Morton
  2012-03-29 1:28 ` Joe Perches
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2012-03-29 1:10 UTC (permalink / raw)
  To: Dave Jones; +Cc: Dave Chinner, viro, Linux Kernel, David Rientjes

On Wed, 28 Mar 2012 20:54:42 -0400 Dave Jones <davej@redhat.com> wrote:

> On Wed, Mar 28, 2012 at 04:47:20PM -0700, Andrew Morton wrote:
> > On Wed, 28 Mar 2012 15:39:51 +1100
> > Dave Chinner <david@fromorbit.com> wrote:
> >
> > > > Well, the unusual thing was that I was fuzzing system calls for a few hours.
> > > >
> > > > My fuzzing tool was able to trigger these very easily after an hour or two
> > > > of uptime and memory had fragmented a little, so yeah, quite trivial.
> > >
> > > We've recently been seeing reports of xfsdump triggering similar
> > > allocation failures in the XFS attr code when we are doing hundreds
> > > of thousands of attribute lookups to back them up.
> > >
> > > ad650f5 xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get
> > >
> > > I think that falling back to vmalloc here is a much better solution
> > > than failing to retrieve the attribute - it will work no matter how
> > > fragmented memory gets. That means we don't get incomplete
> > > backups occurring after days or months of uptime and successful
> > > backups...
> >
> > Yup. How does the below look?
>
> Don't see anything immediately wrong with it.
> Any thoughts on what to do about the similar problem in setxattr ? (memdup_user)
>

I can't think of anything clever. The dumb approach:


From: Andrew Morton <akpm@linux-foundation.org>
Subject: fs/xattr.c:setxattr(): improve handling of allocation failures

This allocation can be as large as 64k.
- Add __GFP_NOWARN so that a failed kmalloc() is silent

- Fall back to vmalloc() if the kmalloc() failed

Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/xattr.c |   21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff -puN fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures fs/xattr.c
--- a/fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures
+++ a/fs/xattr.c
@@ -320,6 +320,7 @@ setxattr(struct dentry *d, const char __
 {
 	int error;
 	void *kvalue = NULL;
+	void *vvalue = NULL;	/* If non-NULL, we used vmalloc() */
 	char kname[XATTR_NAME_MAX + 1];
 
 	if (flags & ~(XATTR_CREATE|XATTR_REPLACE))
@@ -334,13 +335,25 @@ setxattr(struct dentry *d, const char __
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
 			return -E2BIG;
-		kvalue = memdup_user(value, size);
-		if (IS_ERR(kvalue))
-			return PTR_ERR(kvalue);
+		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
+		if (!kvalue) {
+			vvalue = vmalloc(size);
+			if (!vvalue)
+				return -ENOMEM;
+			kvalue = vvalue;
+		}
+		if (copy_from_user(kvalue, value, size)) {
+			error = -EFAULT;
+			goto out;
+		}
 	}
 
 	error = vfs_setxattr(d, kname, kvalue, size, flags);
-	kfree(kvalue);
+out:
+	if (vvalue)
+		vfree(vvalue);
+	else
+		kfree(kvalue);
 	return error;
 }
 
_

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-29 1:10 ` Andrew Morton @ 2012-03-29 1:28 ` Joe Perches 2012-03-29 1:46 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Joe Perches @ 2012-03-29 1:28 UTC (permalink / raw) To: Andrew Morton Cc: Dave Jones, Dave Chinner, viro, Linux Kernel, David Rientjes On Wed, 2012-03-28 at 18:10 -0700, Andrew Morton wrote: > On Wed, 28 Mar 2012 20:54:42 -0400 Dave Jones <davej@redhat.com> wrote: > > > Yup. How does the below look? > > Don't see anything immediately wrong with it. > > Any thoughts on what to do about the similar problem in setxattr ? (memdup_user) [] > diff -puN fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures fs/xattr.c [] > @@ -334,13 +335,25 @@ setxattr(struct dentry *d, const char __ [] > + kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN); > + if (!kvalue) { > + vvalue = vmalloc(size); [] > + if (vvalue) > + vfree(vvalue); > + else > + kfree(kvalue); > return error; These patterns are pretty common, maybe create a standard helper? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-29 1:28 ` Joe Perches @ 2012-03-29 1:46 ` Andrew Morton 2012-03-29 1:50 ` Dave Jones 2012-03-29 5:35 ` Dave Chinner 0 siblings, 2 replies; 20+ messages in thread From: Andrew Morton @ 2012-03-29 1:46 UTC (permalink / raw) To: Joe Perches; +Cc: Dave Jones, Dave Chinner, viro, Linux Kernel, David Rientjes On Wed, 28 Mar 2012 18:28:43 -0700 Joe Perches <joe@perches.com> wrote: > On Wed, 2012-03-28 at 18:10 -0700, Andrew Morton wrote: > > On Wed, 28 Mar 2012 20:54:42 -0400 Dave Jones <davej@redhat.com> wrote: > > > > Yup. How does the below look? > > > Don't see anything immediately wrong with it. > > > Any thoughts on what to do about the similar problem in setxattr ? (memdup_user) > [] > > diff -puN fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures fs/xattr.c > [] > > @@ -334,13 +335,25 @@ setxattr(struct dentry *d, const char __ > [] > > + kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN); > > + if (!kvalue) { > > + vvalue = vmalloc(size); > [] > > + if (vvalue) > > + vfree(vvalue); > > + else > > + kfree(kvalue); > > return error; > > These patterns are pretty common, maybe create a standard helper? Could. There was some discussion last year and implementations were tossed around. I'm a bit apprehensive - kernel code is supposed to be robust, and large allocations are not robust and vmalloc() is crappy. Formalising these things in an API probably won't make anything worse, but will deprive us of opportunities for ritualistic humiliation and knuckle-rapping. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-29 1:46 ` Andrew Morton @ 2012-03-29 1:50 ` Dave Jones 2012-03-29 2:02 ` Andrew Morton 2012-03-29 5:35 ` Dave Chinner 1 sibling, 1 reply; 20+ messages in thread From: Dave Jones @ 2012-03-29 1:50 UTC (permalink / raw) To: Andrew Morton Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes On Wed, Mar 28, 2012 at 06:46:02PM -0700, Andrew Morton wrote: > Could. There was some discussion last year and implementations were > tossed around. > > I'm a bit apprehensive - kernel code is supposed to be robust, and > large allocations are not robust and vmalloc() is crappy. Can you expand on crappy ? Also, what happens if something allocates and sits on a bunch of vmalloc'd memory ? would we start seeing oom kills ? (thinking of the context of my fuzzing tool where a bunch of instances could feasibly call these syscalls and not sit on huge amounts per thread, but collectively... I'm wondering if it could be provoked into killing processes I don't own) Dave ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-29 1:50 ` Dave Jones @ 2012-03-29 2:02 ` Andrew Morton 2012-03-29 2:08 ` Dave Jones 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2012-03-29 2:02 UTC (permalink / raw) To: Dave Jones; +Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes On Wed, 28 Mar 2012 21:50:59 -0400 Dave Jones <davej@redhat.com> wrote: > On Wed, Mar 28, 2012 at 06:46:02PM -0700, Andrew Morton wrote: > > > Could. There was some discussion last year and implementations were > > tossed around. > > > > I'm a bit apprehensive - kernel code is supposed to be robust, and > > large allocations are not robust and vmalloc() is crappy. > > Can you expand on crappy ? It's expensive on a per-call basis and can end up failing due to internal fragmentation of vmalloc()'s virtually-addressed arena. I don't think I've ever seen a report of anyone getting a vmalloc() failure due to the fragmentation issue, so it's largely theoretical. But of course, the more we use it (especially for long-lived allocations), the greater the risk becomes. Mainly to 32-bit machines, I assume. > Also, what happens if something allocates > and sits on a bunch of vmalloc'd memory ? would we start seeing oom kills ? vmalloc() would fail. > (thinking of the context of my fuzzing tool where a bunch of instances could > feasibly call these syscalls and not sit on huge amounts per thread, but > collectively... I'm wondering if it could be provoked into killing > processes I don't own) umm, if you wanted to deliberately trigger a vmalloc() failure then I guess a good approach would be to locate a vmalloc() site which can persist beyond the syscall (modprobe is a good one!) then exercise it in a way so that there are no N-byte holes left in the arena, then trigger an N-byte vmalloc(). ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-29 2:02 ` Andrew Morton @ 2012-03-29 2:08 ` Dave Jones 2012-03-29 2:28 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Dave Jones @ 2012-03-29 2:08 UTC (permalink / raw) To: Andrew Morton Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes On Wed, Mar 28, 2012 at 07:02:11PM -0700, Andrew Morton wrote: > > Also, what happens if something allocates > > and sits on a bunch of vmalloc'd memory ? would we start seeing oom kills ? > > vmalloc() would fail. Ok, that's a pretty boring failure mode, so not a big deal probably. > > (thinking of the context of my fuzzing tool where a bunch of instances could > > feasibly call these syscalls and not sit on huge amounts per thread, but > > collectively... I'm wondering if it could be provoked into killing > > processes I don't own) > > umm, if you wanted to deliberately trigger a vmalloc() failure then I > guess a good approach would be to locate a vmalloc() site which can > persist beyond the syscall (modprobe is a good one!) then exercise it > in a way so that there are no N-byte holes left in the arena, then > trigger an N-byte vmalloc(). Well modprobe is root-only, so that's not so bad. But it looks like key_add (see other thread from this evening) and probably others can be called as a user and gobble up vmalloc space. omnomnom. Dave ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-29 2:08 ` Dave Jones @ 2012-03-29 2:28 ` Andrew Morton 2012-03-29 3:00 ` Dave Jones 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2012-03-29 2:28 UTC (permalink / raw) To: Dave Jones; +Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes On Wed, 28 Mar 2012 22:08:20 -0400 Dave Jones <davej@redhat.com> wrote: > On Wed, Mar 28, 2012 at 07:02:11PM -0700, Andrew Morton wrote: > > > > Also, what happens if something allocates > > > and sits on a bunch of vmalloc'd memory ? would we start seeing oom kills ? > > > > vmalloc() would fail. > > Ok, that's a pretty boring failure mode, so not a big deal probably. > > > > (thinking of the context of my fuzzing tool where a bunch of instances could > > > feasibly call these syscalls and not sit on huge amounts per thread, but > > > collectively... I'm wondering if it could be provoked into killing > > > processes I don't own) > > > > umm, if you wanted to deliberately trigger a vmalloc() failure then I > > guess a good approach would be to locate a vmalloc() site which can > > persist beyond the syscall (modprobe is a good one!) then exercise it > > in a way so that there are no N-byte holes left in the arena, then > > trigger an N-byte vmalloc(). > > Well modprobe is root-only, so that's not so bad. Even if it's root-only, we can still end up with a toasted machine. Accidentally toasted, not deliberately. > But it looks like > key_add (see other thread from this evening) and probably others can be > called as a user and gobble up vmalloc space. omnomnom. hm, the keys code appears to prevent the user from reserving more than 20000 bytes of memory total (key_payload_reserve()), so it doesn't look very useful for screwing up vmalloc(). ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-29 2:28 ` Andrew Morton @ 2012-03-29 3:00 ` Dave Jones 2012-03-29 21:09 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Dave Jones @ 2012-03-29 3:00 UTC (permalink / raw) To: Andrew Morton Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes On Wed, Mar 28, 2012 at 07:28:04PM -0700, Andrew Morton wrote: > > But it looks like > > key_add (see other thread from this evening) and probably others can be > > called as a user and gobble up vmalloc space. omnomnom. > > hm, the keys code appears to prevent the user from reserving more than > 20000 bytes of memory total (key_payload_reserve()), so it doesn't look > very useful for screwing up vmalloc(). Then how did I trick it into trying an order 8 allocation ? trinity: page allocation failure: order:8, mode:0x40d0 Pid: 27119, comm: trinity Not tainted 3.3.0+ #31 Call Trace: [<ffffffff8115dd66>] warn_alloc_failed+0xf6/0x160 [<ffffffff816ad436>] ? __alloc_pages_direct_compact+0x1d0/0x1e2 [<ffffffff81162492>] __alloc_pages_nodemask+0x8b2/0xb10 [<ffffffff8119dae6>] alloc_pages_current+0xb6/0x120 [<ffffffff8115d3b4>] __get_free_pages+0x14/0x50 [<ffffffff811ac64f>] kmalloc_order_trace+0x3f/0x1a0 [<ffffffff811aca0a>] __kmalloc+0x25a/0x280 [<ffffffff812c034a>] sys_add_key+0x9a/0x210 [<ffffffff813386be>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff816c04e9>] system_call_fastpath+0x16/0x1b ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr 2012-03-29 3:00 ` Dave Jones @ 2012-03-29 21:09 ` Andrew Morton 2012-03-29 21:13 ` Dave Jones 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2012-03-29 21:09 UTC (permalink / raw) To: Dave Jones; +Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes On Wed, 28 Mar 2012 23:00:00 -0400 Dave Jones <davej@redhat.com> wrote: > On Wed, Mar 28, 2012 at 07:28:04PM -0700, Andrew Morton wrote: > > > But it looks like > > > key_add (see other thread from this evening) and probably others can be > > > called as a user and gobble up vmalloc space. omnomnom. > > > > hm, the keys code appears to prevent the user from reserving more than > > 20000 bytes of memory total (key_payload_reserve()), so it doesn't look > > very useful for screwing up vmalloc(). > > Then how did I trick it into trying an order 8 allocation ? > > trinity: page allocation failure: order:8, mode:0x40d0 > Pid: 27119, comm: trinity Not tainted 3.3.0+ #31 > Call Trace: > [<ffffffff8115dd66>] warn_alloc_failed+0xf6/0x160 > [<ffffffff816ad436>] ? __alloc_pages_direct_compact+0x1d0/0x1e2 > [<ffffffff81162492>] __alloc_pages_nodemask+0x8b2/0xb10 > [<ffffffff8119dae6>] alloc_pages_current+0xb6/0x120 > [<ffffffff8115d3b4>] __get_free_pages+0x14/0x50 > [<ffffffff811ac64f>] kmalloc_order_trace+0x3f/0x1a0 > [<ffffffff811aca0a>] __kmalloc+0x25a/0x280 > [<ffffffff812c034a>] sys_add_key+0x9a/0x210 > [<ffffffff813386be>] ? trace_hardirqs_on_thunk+0x3a/0x3f > [<ffffffff816c04e9>] system_call_fastpath+0x16/0x1b Ah, that's different. The memory at *payload doesn't live beyond the syscall so it can't be used to cause vmalloc fragmentation. We should squish the warning: From: Andrew Morton <akpm@linux-foundation.org> Subject: security/keys/keyctl.c: suppress memory allocation failure warning This allocation may be large. The code is probing to see if it will succeed and if not, it falls back to vmalloc(). 
We should suppress any page-allocation failure messages when the
fallback happens.

Reported-by: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 security/keys/keyctl.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN security/keys/keyctl.c~security-keys-keyctlc-suppress-memory-allocation-failure-warning security/keys/keyctl.c
--- a/security/keys/keyctl.c~security-keys-keyctlc-suppress-memory-allocation-failure-warning
+++ a/security/keys/keyctl.c
@@ -84,7 +84,7 @@ SYSCALL_DEFINE5(add_key, const char __us
 	vm = false;
 	if (_payload) {
 		ret = -ENOMEM;
-		payload = kmalloc(plen, GFP_KERNEL);
+		payload = kmalloc(plen, GFP_KERNEL | __GFP_NOWARN);
 		if (!payload) {
 			if (plen <= PAGE_SIZE)
 				goto error2;
_

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29 21:09 ` Andrew Morton
@ 2012-03-29 21:13 ` Dave Jones
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jones @ 2012-03-29 21:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joe Perches, Dave Chinner, viro, Linux Kernel, David Rientjes

On Thu, Mar 29, 2012 at 02:09:34PM -0700, Andrew Morton wrote:
> On Wed, 28 Mar 2012 23:00:00 -0400
> Dave Jones <davej@redhat.com> wrote:
>
> > On Wed, Mar 28, 2012 at 07:28:04PM -0700, Andrew Morton wrote:
> > > > But it looks like
> > > > key_add (see other thread from this evening) and probably others can be
> > > > called as a user and gobble up vmalloc space. omnomnom.
> > >
> > > hm, the keys code appears to prevent the user from reserving more than
> > > 20000 bytes of memory total (key_payload_reserve()), so it doesn't look
> > > very useful for screwing up vmalloc().
> >
> > Then how did I trick it into trying an order 8 allocation ?
> >
> > trinity: page allocation failure: order:8, mode:0x40d0
> > Pid: 27119, comm: trinity Not tainted 3.3.0+ #31
> > Call Trace:
> > [<ffffffff8115dd66>] warn_alloc_failed+0xf6/0x160
> > [<ffffffff816ad436>] ? __alloc_pages_direct_compact+0x1d0/0x1e2
> > [<ffffffff81162492>] __alloc_pages_nodemask+0x8b2/0xb10
> > [<ffffffff8119dae6>] alloc_pages_current+0xb6/0x120
> > [<ffffffff8115d3b4>] __get_free_pages+0x14/0x50
> > [<ffffffff811ac64f>] kmalloc_order_trace+0x3f/0x1a0
> > [<ffffffff811aca0a>] __kmalloc+0x25a/0x280
> > [<ffffffff812c034a>] sys_add_key+0x9a/0x210
> > [<ffffffff813386be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [<ffffffff816c04e9>] system_call_fastpath+0x16/0x1b
>
> Ah, that's different. The memory at *payload doesn't live beyond the
> syscall so it can't be used to cause vmalloc fragmentation.
>
> We should squish the warning:

That's the same patch I sent in the other thread, so ack ;-)

	Dave

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: suppress page allocation failure warnings from sys_listxattr
  2012-03-29 1:46 ` Andrew Morton
  2012-03-29 1:50 ` Dave Jones
@ 2012-03-29 5:35 ` Dave Chinner
  1 sibling, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2012-03-29 5:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Joe Perches, Dave Jones, viro, Linux Kernel, David Rientjes

On Wed, Mar 28, 2012 at 06:46:02PM -0700, Andrew Morton wrote:
> On Wed, 28 Mar 2012 18:28:43 -0700 Joe Perches <joe@perches.com> wrote:
>
> > On Wed, 2012-03-28 at 18:10 -0700, Andrew Morton wrote:
> > > On Wed, 28 Mar 2012 20:54:42 -0400 Dave Jones <davej@redhat.com> wrote:
> > > > > Yup. How does the below look?
> > > > Don't see anything immediately wrong with it.
> > > > Any thoughts on what to do about the similar problem in setxattr ? (memdup_user)
> > []
> > > diff -puN fs/xattr.c~fs-xattrc-setxattr-improve-handling-of-allocation-failures fs/xattr.c
> > []
> > > @@ -334,13 +335,25 @@ setxattr(struct dentry *d, const char __
> > []
> > > +		kvalue = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > > +		if (!kvalue) {
> > > +			vvalue = vmalloc(size);
> > []
> > > +	if (vvalue)
> > > +		vfree(vvalue);
> > > +	else
> > > +		kfree(kvalue);
> > > 	return error;
> >
> > These patterns are pretty common, maybe create a standard helper?
>
> Could. There was some discussion last year and implementations were
> tossed around.
>
> I'm a bit apprehensive - kernel code is supposed to be robust, and
> large allocations are not robust and vmalloc() is crappy. Formalising
> these things in an API probably won't make anything worse, but will
> deprive us of opportunities for ritualistic humiliation and
> knuckle-rapping.

I did a sweep of this recently, considering helpers for exactly such
an allocation and replacing the existing per-filesystem wrappers for
it.

IIRC, there are wrapper functions in ext4, gfs2, and ntfs; XFS now
open codes it in a couple of places; there's alloc_fdmem(), cgroup
pidlists, and the network code does it in several places, etc. Even
some drivers are doing this. It's a widespread pattern.

The easiest way to find the trivial wrappers is to grep for
is_vmalloc_addr, because all the wrapper functions use this check to
decide how to free the memory:

	if (is_vmalloc_addr(p))
		vfree(p);
	else
		kfree(p);

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 20+ messages in thread
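The allocate-then-fall-back pattern and the address-based free described above can be modelled in userspace. This is a sketch under stated assumptions, not kernel code: contig_alloc() stands in for kmalloc(size, GFP_KERNEL | __GFP_NOWARN), arena_alloc() for vmalloc(), in_arena() for is_vmalloc_addr(), and big_alloc()/big_free() are hypothetical helper names. The 4096-byte failure threshold and the toy bump arena are invented purely so the fallback path can be exercised.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace model only; every name and limit here is an assumption. */
#define CONTIG_LIMIT 4096u       /* pretend larger contiguous requests fail */
#define ARENA_SIZE   (1u << 20)  /* toy stand-in for the vmalloc area */

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;
static unsigned int arena_live;  /* number of outstanding arena buffers */

/* Stand-in for kmalloc(..., __GFP_NOWARN): fails quietly above the limit. */
static void *contig_alloc(size_t size)
{
	return size <= CONTIG_LIMIT ? malloc(size) : NULL;
}

/* Stand-in for vmalloc(): bump allocation from the static arena. */
static void *arena_alloc(size_t size)
{
	const size_t align = _Alignof(max_align_t);
	void *p;

	if (size > ARENA_SIZE - arena_used)
		return NULL;
	p = arena + arena_used;
	arena_used += (size + align - 1) / align * align;
	arena_live++;
	return p;
}

/* Stand-in for vfree(): the arena is recycled once nothing is live. */
static void arena_release(void *p)
{
	(void)p;
	if (--arena_live == 0)
		arena_used = 0;
}

/* Stand-in for is_vmalloc_addr(): classify purely by address range. */
static int in_arena(const void *p)
{
	uintptr_t a = (uintptr_t)p, base = (uintptr_t)arena;

	return a >= base && a < base + ARENA_SIZE;
}

/* Probe the contiguous allocator first, fall back if it fails. */
static void *big_alloc(size_t size)
{
	void *p = contig_alloc(size);

	return p ? p : arena_alloc(size);
}

/* One free routine for both cases, dispatching on the address itself. */
static void big_free(void *p)
{
	if (in_arena(p))
		arena_release(p);
	else
		free(p);
}
```

The point of the address check is that the caller does not have to remember which allocator succeeded, unlike the quoted setxattr() hunk, which tracks kvalue and vvalue separately.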
end of thread, other threads:[~2012-03-29 21:14 UTC | newest]

Thread overview: 20+ messages
2012-03-13 18:22 suppress page allocation failure warnings from sys_listxattr Dave Jones
2012-03-13 21:33 ` Colin Walters
2012-03-27 22:51 ` Andrew Morton
2012-03-28 0:15 ` Dave Jones
2012-03-28 0:26 ` Andrew Morton
2012-03-28 7:13 ` David Rientjes
2012-03-28 4:39 ` Dave Chinner
2012-03-28 23:47 ` Andrew Morton
2012-03-29 0:54 ` Dave Jones
2012-03-29 1:10 ` Andrew Morton
2012-03-29 1:28 ` Joe Perches
2012-03-29 1:46 ` Andrew Morton
2012-03-29 1:50 ` Dave Jones
2012-03-29 2:02 ` Andrew Morton
2012-03-29 2:08 ` Dave Jones
2012-03-29 2:28 ` Andrew Morton
2012-03-29 3:00 ` Dave Jones
2012-03-29 21:09 ` Andrew Morton
2012-03-29 21:13 ` Dave Jones
2012-03-29 5:35 ` Dave Chinner