From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id F397C6B025F for ; Tue, 26 Jul 2016 15:55:02 -0400 (EDT) Received: by mail-pf0-f198.google.com with SMTP id c5so20506564pfc.2 for ; Tue, 26 Jul 2016 12:55:02 -0700 (PDT) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. [140.211.169.12]) by mx.google.com with ESMTPS id kb6si2025488pab.202.2016.07.26.12.55.02 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 26 Jul 2016 12:55:02 -0700 (PDT) Date: Tue, 26 Jul 2016 12:55:01 -0700 From: Andrew Morton Subject: Re: [PATCH] kexec: add restriction on kexec_load() segment sizes Message-Id: <20160726125501.69c8186ab9c3b1cef89899d4@linux-foundation.org> In-Reply-To: <1469502219-24140-1-git-send-email-zhongjiang@huawei.com> References: <1469502219-24140-1-git-send-email-zhongjiang@huawei.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: zhongjiang Cc: ebiederm@xmission.com, linux-mm@kvack.org, mm-commits@vger.kernel.org On Tue, 26 Jul 2016 11:03:39 +0800 zhongjiang wrote: > From: zhong jiang > > I hit the following issue when run trinity in my system. The kernel is > 3.4 version, but mainline has the same issue. > > The root cause is that the segment size is too large so the kerenl spends > too long trying to allocate a page. Other cases will block until the test > case quits. Also, OOM conditions will occur. > > Call Trace: > [] __alloc_pages_nodemask+0x14c/0x8f0 > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] ? trace_hardirqs_on_thunk+0x3a/0x3c > [] alloc_pages_current+0xaf/0x120 > [] kimage_alloc_pages+0x10/0x60 > [] kimage_alloc_control_pages+0x5d/0x270 > [] machine_kexec_prepare+0xe5/0x6c0 > [] ? kimage_free_page_list+0x52/0x70 > [] sys_kexec_load+0x141/0x600 > [] ? vfs_write+0x100/0x180 > [] system_call_fastpath+0x16/0x1b > > The patch changes sanity_check_segment_list() to verify that no segment is > larger than half of memory. "to verify that the usage by all segmetns does not exceed half of memory" > Suggested-off-by: Eric W. Biederman "Suggested-by:" > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -140,6 +140,7 @@ int kexec_should_crash(struct task_struct *p) > * allocating pages whose destination address we do not care about. > */ > #define KIMAGE_NO_DEST (-1UL) > +#define PAGE_COUNT(x) (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT) > > static struct page *kimage_alloc_page(struct kimage *image, > gfp_t gfp_mask, > @@ -149,6 +150,7 @@ int sanity_check_segment_list(struct kimage *image) > { > int result, i; > unsigned long nr_segments = image->nr_segments; > + unsigned long total_segments = 0; "total_segments" implies "total number of segments". ie, nr_segments. I'd call this "total_pages" instead. > /* > * Verify we have good destination addresses. The caller is > @@ -210,6 +212,23 @@ int sanity_check_segment_list(struct kimage *image) > } > > + /* > + * Verify that no segment is larger than half of memory. > + * If a segment from userspace is too large, a large amount > + * of time will be wasted allocating pages, which can cause > + * a soft lockup. > + */ /* * Verify that the memory usage required for all segments does not * exceed half of all memory. If the memory usage requested by * userspace is excessive, a large amount of time will be wasted * allocating pages, which can cause a soft lockup. */ > + for (i = 0; i < nr_segments; i++) { > + if (PAGE_COUNT(image->segment[i].memsz) > totalram_pages / 2 > + || PAGE_COUNT(total_segments) > totalram_pages / 2) > + return result; And I don't think we need this? Unless we're worried about the sum of all segments overflowing an unsigned long, which I guess is possible. But if we care about that we should handle it in the next statement: > + total_segments += image->segment[i].memsz; Should this be total_pages += PAGE_COUNT(image->segment[i].memsz); ? I think "yes", if the segments are allocated separately and "no" if they are all allocated in a big blob. And it is after this statement that we should check for arithmetic overflow. > + } > + > + if (PAGE_COUNT(total_segments) > totalram_pages / 2) > + return result; > + -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org