From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-by2nam03on0098.outbound.protection.outlook.com ([104.47.42.98]:51775 "EHLO NAM03-BY2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1032067AbeCAPa7 (ORCPT ); Thu, 1 Mar 2018 10:30:59 -0500 From: Sasha Levin To: "stable@vger.kernel.org" , "stable-commits@vger.kernel.org" CC: Vlastimil Babka , Andrew Morton , Linus Torvalds , Sasha Levin Subject: [added to the 4.1 stable tree] fs/select: add vmalloc fallback for select(2) Date: Thu, 1 Mar 2018 15:24:40 +0000 Message-ID: <20180301152116.1486-211-alexander.levin@microsoft.com> References: <20180301152116.1486-1-alexander.levin@microsoft.com> In-Reply-To: <20180301152116.1486-1-alexander.levin@microsoft.com> Content-Language: en-US Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org List-ID: From: Vlastimil Babka This patch has been added to the 4.1 stable tree. If you have any objections, please let us know. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D [ Upstream commit 2d19309cf86883f634a4f8ec55a54bda87db19bf ] The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows with the number of fds passed. We had a customer report page allocation failures of order-4 for this allocation. This is a costly order, so it migh= t easily fail, as the VM expects such allocation to have a lower-order fallba= ck. Such trivial fallback is vmalloc(), as the memory doesn't have to be physic= ally contiguous and the allocation is temporary for the duration of the syscall only. There were some concerns, whether this would have negative impact on = the system by exposing vmalloc() to userspace. Although an excessive use of vma= lloc can cause some system wide performance issues - TLB flushes etc. - a large order allocation is not for free either and an excessive reclaim/compaction= can have a similar effect. Also note that the size is effectively limited by RLIMIT_NOFILE which defaults to 1024 on the systems I checked. That means t= he bitmaps will fit well within single page and thus the vmalloc() fallback co= uld be only excercised for processes where root allows a higher limit. Note that the poll(2) syscall seems to use a linked list of order-0 pages, = so it doesn't need this kind of fallback. [eric.dumazet@gmail.com: fix failure path logic] [akpm@linux-foundation.org: use proper type for size] Link: http://lkml.kernel.org/r/20160927084536.5923-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka Acked-by: Michal Hocko Cc: Alexander Viro Cc: Eric Dumazet Cc: David Laight Cc: Hillf Danton Cc: Nicholas Piggin Cc: Jason Baron Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- fs/select.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/fs/select.c b/fs/select.c index f684c750e08a..f7e6fc7bf83c 100644 --- a/fs/select.c +++ b/fs/select.c @@ -29,6 +29,7 @@ #include #include #include +#include =20 #include =20 @@ -550,7 +551,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set _= _user *outp, fd_set_bits fds; void *bits; int ret, max_fds; - unsigned int size; + size_t size, alloc_size; struct fdtable *fdt; /* Allocate small arguments on the stack to save memory and be faster */ long stack_fds[SELECT_STACK_ALLOC/sizeof(long)]; @@ -577,7 +578,14 @@ int core_sys_select(int n, fd_set __user *inp, fd_set = __user *outp, if (size > sizeof(stack_fds) / 6) { /* Not enough space in on-stack array; must use kmalloc */ ret =3D -ENOMEM; - bits =3D kmalloc(6 * size, GFP_KERNEL); + if (size > (SIZE_MAX / 6)) + goto out_nofds; + + alloc_size =3D 6 * size; + bits =3D kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN); + if (!bits && alloc_size > PAGE_SIZE) + bits =3D vmalloc(alloc_size); + if (!bits) goto out_nofds; } @@ -614,7 +622,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set _= _user *outp, =20 out: if (bits !=3D stack_fds) - kfree(bits); + kvfree(bits); out_nofds: return ret; } --=20 2.14.1