[PATCH] kexec: add restriction on kexec

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] kexec: add restriction on kexec_load() segment sizes
@ 2016-07-26  3:03 zhongjiang
  2016-07-26 19:55 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: zhongjiang @ 2016-07-26  3:03 UTC (permalink / raw)
  To: akpm, ebiederm; +Cc: linux-mm, mm-commits

From: zhong jiang <zhongjiang@huawei.com>

I hit the following issue when run trinity in my system.  The kernel is
3.4 version, but mainline has the same issue.

The root cause is that the segment size is too large so the kerenl spends
too long trying to allocate a page.  Other cases will block until the test
case quits.  Also, OOM conditions will occur.

Call Trace:
 [<ffffffff81106eac>] __alloc_pages_nodemask+0x14c/0x8f0
 [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
 [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
 [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
 [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
 [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
 [<ffffffff8113e5ef>] alloc_pages_current+0xaf/0x120
 [<ffffffff810a0da0>] kimage_alloc_pages+0x10/0x60
 [<ffffffff810a15ad>] kimage_alloc_control_pages+0x5d/0x270
 [<ffffffff81027e85>] machine_kexec_prepare+0xe5/0x6c0
 [<ffffffff810a0d52>] ? kimage_free_page_list+0x52/0x70
 [<ffffffff810a1921>] sys_kexec_load+0x141/0x600
 [<ffffffff8115e6b0>] ? vfs_write+0x100/0x180
 [<ffffffff8145fbd9>] system_call_fastpath+0x16/0x1b

The patch changes sanity_check_segment_list() to verify that no segment is
larger than half of memory.

Suggested-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 kernel/kexec_core.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 56b3ed0..536550f 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -140,6 +140,7 @@ int kexec_should_crash(struct task_struct *p)
  * allocating pages whose destination address we do not care about.
  */
 #define KIMAGE_NO_DEST	(-1UL)
+#define PAGE_COUNT(x)	(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
 
 static struct page *kimage_alloc_page(struct kimage *image,
 				       gfp_t gfp_mask,
@@ -149,6 +150,7 @@ int sanity_check_segment_list(struct kimage *image)
 {
 	int result, i;
 	unsigned long nr_segments = image->nr_segments;
+	unsigned long total_segments = 0;
 
 	/*
 	 * Verify we have good destination addresses.  The caller is
@@ -210,6 +212,23 @@ int sanity_check_segment_list(struct kimage *image)
 	}

+	/*
+	 * Verify that no segment is larger than half of memory.
+	 * If a segment from userspace is too large, a large amount
+	 * of time will be wasted allocating pages, which can cause
+	 * a soft lockup.
+	 */
+	for (i = 0; i < nr_segments; i++) {
+		if (PAGE_COUNT(image->segment[i].memsz) > totalram_pages / 2
+				|| PAGE_COUNT(total_segments) > totalram_pages / 2)
+			return result;
+
+		total_segments += image->segment[i].memsz;
+	}
+
+	if (PAGE_COUNT(total_segments) > totalram_pages / 2)
+		return result;
+
	/*
 	 * Verify we have good destination addresses.  Normally
 	 * the caller is responsible for making certain we don't
 	 * attempt to load the new image into invalid or reserved
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] kexec: add restriction on kexec_load() segment sizes
  2016-07-26  3:03 [PATCH] kexec: add restriction on kexec_load() segment sizes zhongjiang
@ 2016-07-26 19:55 ` Andrew Morton
  2016-07-27  4:10   ` zhong jiang
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2016-07-26 19:55 UTC (permalink / raw)
  To: zhongjiang; +Cc: ebiederm, linux-mm, mm-commits

On Tue, 26 Jul 2016 11:03:39 +0800 zhongjiang <zhongjiang@huawei.com> wrote:

> From: zhong jiang <zhongjiang@huawei.com>
> 
> I hit the following issue when run trinity in my system.  The kernel is
> 3.4 version, but mainline has the same issue.
> 
> The root cause is that the segment size is too large so the kerenl spends
> too long trying to allocate a page.  Other cases will block until the test
> case quits.  Also, OOM conditions will occur.
> 
> Call Trace:
>  [<ffffffff81106eac>] __alloc_pages_nodemask+0x14c/0x8f0
>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>  [<ffffffff8113e5ef>] alloc_pages_current+0xaf/0x120
>  [<ffffffff810a0da0>] kimage_alloc_pages+0x10/0x60
>  [<ffffffff810a15ad>] kimage_alloc_control_pages+0x5d/0x270
>  [<ffffffff81027e85>] machine_kexec_prepare+0xe5/0x6c0
>  [<ffffffff810a0d52>] ? kimage_free_page_list+0x52/0x70
>  [<ffffffff810a1921>] sys_kexec_load+0x141/0x600
>  [<ffffffff8115e6b0>] ? vfs_write+0x100/0x180
>  [<ffffffff8145fbd9>] system_call_fastpath+0x16/0x1b
> 
> The patch changes sanity_check_segment_list() to verify that no segment is
> larger than half of memory.

"to verify that the usage by all segmetns does not exceed half of memory"

> Suggested-off-by: Eric W. Biederman <ebiederm@xmission.com>

"Suggested-by:"

> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -140,6 +140,7 @@ int kexec_should_crash(struct task_struct *p)
>   * allocating pages whose destination address we do not care about.
>   */
>  #define KIMAGE_NO_DEST	(-1UL)
> +#define PAGE_COUNT(x)	(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
>  
>  static struct page *kimage_alloc_page(struct kimage *image,
>  				       gfp_t gfp_mask,
> @@ -149,6 +150,7 @@ int sanity_check_segment_list(struct kimage *image)
>  {
>  	int result, i;
>  	unsigned long nr_segments = image->nr_segments;
> +	unsigned long total_segments = 0;

"total_segments" implies "total number of segments".  ie, nr_segments. 
I'd call this "total_pages" instead.

>  	/*
>  	 * Verify we have good destination addresses.  The caller is
> @@ -210,6 +212,23 @@ int sanity_check_segment_list(struct kimage *image)
>  	}
> 
> +	/*
> +	 * Verify that no segment is larger than half of memory.
> +	 * If a segment from userspace is too large, a large amount
> +	 * of time will be wasted allocating pages, which can cause
> +	 * a soft lockup.
> +	 */

	/*
	 * Verify that the memory usage required for all segments does not
	 * exceed half of all memory.  If the memory usage requested by
	 * userspace is excessive, a large amount of time will be wasted
	 * allocating pages, which can cause a soft lockup.
	 */
	
> +	for (i = 0; i < nr_segments; i++) {
> +		if (PAGE_COUNT(image->segment[i].memsz) > totalram_pages / 2
> +				|| PAGE_COUNT(total_segments) > totalram_pages / 2)
> +			return result;

And I don't think we need this?  Unless we're worried about the sum of
all segments overflowing an unsigned long, which I guess is possible. 
But if we care about that we should handle it in the next statement:

> +		total_segments += image->segment[i].memsz;

Should this be 

		total_pages += PAGE_COUNT(image->segment[i].memsz);

?  I think "yes", if the segments are allocated separately and "no" if
they are all allocated in a big blob.

And it is after this statement that we should check for arithmetic
overflow.

> +	}
> +
> +	if (PAGE_COUNT(total_segments) > totalram_pages / 2)
> +		return result;
> +

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] kexec: add restriction on kexec_load() segment sizes
  2016-07-26 19:55 ` Andrew Morton
@ 2016-07-27  4:10   ` zhong jiang
  2016-07-27 20:03     ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: zhong jiang @ 2016-07-27  4:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ebiederm, linux-mm, mm-commits

On 2016/7/27 3:55, Andrew Morton wrote:
> On Tue, 26 Jul 2016 11:03:39 +0800 zhongjiang <zhongjiang@huawei.com> wrote:
>
>> From: zhong jiang <zhongjiang@huawei.com>
>>
>> I hit the following issue when run trinity in my system.  The kernel is
>> 3.4 version, but mainline has the same issue.
>>
>> The root cause is that the segment size is too large so the kerenl spends
>> too long trying to allocate a page.  Other cases will block until the test
>> case quits.  Also, OOM conditions will occur.
>>
>> Call Trace:
>>  [<ffffffff81106eac>] __alloc_pages_nodemask+0x14c/0x8f0
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8113e5ef>] alloc_pages_current+0xaf/0x120
>>  [<ffffffff810a0da0>] kimage_alloc_pages+0x10/0x60
>>  [<ffffffff810a15ad>] kimage_alloc_control_pages+0x5d/0x270
>>  [<ffffffff81027e85>] machine_kexec_prepare+0xe5/0x6c0
>>  [<ffffffff810a0d52>] ? kimage_free_page_list+0x52/0x70
>>  [<ffffffff810a1921>] sys_kexec_load+0x141/0x600
>>  [<ffffffff8115e6b0>] ? vfs_write+0x100/0x180
>>  [<ffffffff8145fbd9>] system_call_fastpath+0x16/0x1b
>>
>> The patch changes sanity_check_segment_list() to verify that no segment is
>> larger than half of memory.
> "to verify that the usage by all segmetns does not exceed half of memory"
  yes
>> Suggested-off-by: Eric W. Biederman <ebiederm@xmission.com>
> "Suggested-by:"
  yes
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -140,6 +140,7 @@ int kexec_should_crash(struct task_struct *p)
>>   * allocating pages whose destination address we do not care about.
>>   */
>>  #define KIMAGE_NO_DEST	(-1UL)
>> +#define PAGE_COUNT(x)	(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
>>  
>>  static struct page *kimage_alloc_page(struct kimage *image,
>>  				       gfp_t gfp_mask,
>> @@ -149,6 +150,7 @@ int sanity_check_segment_list(struct kimage *image)
>>  {
>>  	int result, i;
>>  	unsigned long nr_segments = image->nr_segments;
>> +	unsigned long total_segments = 0;
> "total_segments" implies "total number of segments".  ie, nr_segments. 
> I'd call this "total_pages" instead.
  yes,  it is better.
>>  	/*
>>  	 * Verify we have good destination addresses.  The caller is
>> @@ -210,6 +212,23 @@ int sanity_check_segment_list(struct kimage *image)
>>  	}
>>
>> +	/*
>> +	 * Verify that no segment is larger than half of memory.
>> +	 * If a segment from userspace is too large, a large amount
>> +	 * of time will be wasted allocating pages, which can cause
>> +	 * a soft lockup.
>> +	 */
> 	/*
> 	 * Verify that the memory usage required for all segments does not
> 	 * exceed half of all memory.  If the memory usage requested by
> 	 * userspace is excessive, a large amount of time will be wasted
> 	 * allocating pages, which can cause a soft lockup.
> 	 */
> 	
>> +	for (i = 0; i < nr_segments; i++) {
>> +		if (PAGE_COUNT(image->segment[i].memsz) > totalram_pages / 2
>> +				|| PAGE_COUNT(total_segments) > totalram_pages / 2)
>> +			return result;
> And I don't think we need this?  Unless we're worried about the sum of
> all segments overflowing an unsigned long, which I guess is possible. 
> But if we care about that we should handle it in the next statement:
>
>> +		total_segments += image->segment[i].memsz;
> Should this be 
>
> 		total_pages += PAGE_COUNT(image->segment[i].memsz);
  ok
> ?  I think "yes", if the segments are allocated separately and "no" if
> they are all allocated in a big blob.
   There is a possible that  most of segments size will exceed half of  the real memory.

  if (PAGE_COUNT(image->segment[i].memsz) > totalram_pages / 2
	|| total_pages > totalram_pages / 2)
  I guess that it is ok , we should bail out timely when it happens to the condition.
  
  is right ?

 your mean that above condition is no need. In the end, we check the overflow just one time.
  or I misunderstand.

> And it is after this statement that we should check for arithmetic
> overflow.
>
>> +	}
>> +
>> +	if (PAGE_COUNT(total_segments) > totalram_pages / 2)
>> +		return result;
>> +
>
> .
>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] kexec: add restriction on kexec_load() segment sizes
  2016-07-27  4:10   ` zhong jiang
@ 2016-07-27 20:03     ` Andrew Morton
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2016-07-27 20:03 UTC (permalink / raw)
  To: zhong jiang; +Cc: ebiederm, linux-mm, mm-commits

On Wed, 27 Jul 2016 12:10:13 +0800 zhong jiang <zhongjiang@huawei.com> wrote:

> >> +	for (i = 0; i < nr_segments; i++) {
> >> +		if (PAGE_COUNT(image->segment[i].memsz) > totalram_pages / 2
> >> +				|| PAGE_COUNT(total_segments) > totalram_pages / 2)
> >> +			return result;
> > And I don't think we need this?  Unless we're worried about the sum of
> > all segments overflowing an unsigned long, which I guess is possible. 
> > But if we care about that we should handle it in the next statement:
> >
> >> +		total_segments += image->segment[i].memsz;
> > Should this be 
> >
> > 		total_pages += PAGE_COUNT(image->segment[i].memsz);
>   ok
> > ?  I think "yes", if the segments are allocated separately and "no" if
> > they are all allocated in a big blob.
>    There is a possible that  most of segments size will exceed half of  the real memory.
> 
>   if (PAGE_COUNT(image->segment[i].memsz) > totalram_pages / 2
> 	|| total_pages > totalram_pages / 2)
>   I guess that it is ok , we should bail out timely when it happens to the condition.
>   
>   is right ?
> 
>  your mean that above condition is no need. In the end, we check the overflow just one time.
>   or I misunderstand.

It doesn't matter much.  Actually I misread the code a bit.  How about

	for (i = 0; i < nr_segments; i++) {
		unsigned long seg_pages = PAGE_COUNT(image->segment[i].memsz);

		if (seg_pages > totalram_pages / 2))
			return -EINVAL;

		total_pages += seg_pages;

		if (total_pages > totalram_pages / 2)
			return -EINVAL;
	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-07-27 20:03 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-26  3:03 [PATCH] kexec: add restriction on kexec_load() segment sizes zhongjiang
2016-07-26 19:55 ` Andrew Morton
2016-07-27  4:10   ` zhong jiang
2016-07-27 20:03     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).