Subject: Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
From: Topi Miettinen
To: Andy Lutomirski
Cc: linux-hardening@vger.kernel.org, Andrew Morton, Linux-MM, LKML,
 Jann Horn, Kees Cook, Matthew Wilcox, Mike Rapoport, Linux API
Date: Mon, 30 Nov 2020 22:27:08 +0200
Message-ID: <4f0b9735-cf55-30cf-f78c-4affc5f8ca3c@gmail.com>
References: <20201129211517.2208-1-toiwoton@gmail.com>

On 30.11.2020 19.57, Andy Lutomirski wrote:
> On Sun, Nov 29, 2020 at 1:20 PM Topi Miettinen wrote:
>>
>> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
>> enables full randomization of memory mappings created with mmap(NULL,
>> ...). With 2, the base of the VMA used for such mappings is random,
>> but the mappings are created in predictable places within the VMA and
>> in sequential order. With 3, new VMAs are created to fully randomize
>> the mappings.
>>
>> Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if not
>> necessary and the location of stack and vdso are also randomized.
>>
>> The method is to randomize the new address without considering
>> VMAs.
>> If the address fails checks because of overlap with the stack
>> area (or in case of mremap(), overlap with the old mapping), the
>> operation is retried a few times before falling back to old method.
>>
>> On 32 bit systems this may cause problems due to increased VM
>> fragmentation if the address space gets crowded.
>>
>> On all systems, it will reduce performance and increase memory usage
>> due to less efficient use of page tables and inability to merge
>> adjacent VMAs with compatible attributes. In the worst case,
>> additional page table entries of up to 4 pages are created for each
>> mapping, so with small mappings there's considerable penalty.
>>
>> In this example with sysctl.kernel.randomize_va_space = 2, dynamic
>> loader, libc, anonymous memory reserved with mmap() and locale-archive
>> are located close to each other:
>>
>> $ cat /proc/self/maps (only first line for each object shown for brevity)
>> 5acea452d000-5acea452f000 r--p 00000000 fe:0c 1868624 /usr/bin/cat
>> 74f438f90000-74f4394f2000 r--p 00000000 fe:0c 2473999 /usr/lib/locale/locale-archive
>> 74f4394f2000-74f4395f2000 rw-p 00000000 00:00 0
>> 74f4395f2000-74f439617000 r--p 00000000 fe:0c 2402332 /usr/lib/x86_64-linux-gnu/libc-2.31.so
>> 74f4397b3000-74f4397b9000 rw-p 00000000 00:00 0
>> 74f4397e5000-74f4397e6000 r--p 00000000 fe:0c 2400754 /usr/lib/x86_64-linux-gnu/ld-2.31.so
>> 74f439811000-74f439812000 rw-p 00000000 00:00 0
>> 7fffdca0d000-7fffdca2e000 rw-p 00000000 00:00 0 [stack]
>> 7fffdcb49000-7fffdcb4d000 r--p 00000000 00:00 0 [vvar]
>> 7fffdcb4d000-7fffdcb4f000 r-xp 00000000 00:00 0 [vdso]
>>
>> With sysctl.kernel.randomize_va_space = 3, they are located at
>> unrelated addresses and the order is random:
>>
>> $ echo 3 > /proc/sys/kernel/randomize_va_space
>> $ cat /proc/self/maps (only first line for each object shown for brevity)
>> 3850520000-3850620000 rw-p 00000000 00:00 0
>> 28cfb4c8000-28cfb4cc000 r--p 00000000 00:00 0 [vvar]
>> 28cfb4cc000-28cfb4ce000 r-xp 00000000 00:00 0 [vdso]
>> 9e74c385000-9e74c387000 rw-p 00000000 00:00 0
>> a42e0233000-a42e0234000 r--p 00000000 fe:0c 2400754 /usr/lib/x86_64-linux-gnu/ld-2.31.so
>> a42e025f000-a42e0260000 rw-p 00000000 00:00 0
>> bea40427000-bea4044c000 r--p 00000000 fe:0c 2402332 /usr/lib/x86_64-linux-gnu/libc-2.31.so
>> bea405e8000-bea405ec000 rw-p 00000000 00:00 0
>> f6d446fa000-f6d44c5c000 r--p 00000000 fe:0c 2473999 /usr/lib/locale/locale-archive
>> fcfbf684000-fcfbf6a5000 rw-p 00000000 00:00 0 [stack]
>> 619aba62d000-619aba62f000 r--p 00000000 fe:0c 1868624 /usr/bin/cat
>>
>> CC: Andrew Morton
>> CC: Jann Horn
>> CC: Kees Cook
>> CC: Matthew Wilcox
>> CC: Mike Rapoport
>> CC: Linux API
>> Signed-off-by: Topi Miettinen
>> ---
>> v2: also randomize mremap(..., MREMAP_MAYMOVE)
>> v3: avoid stack area and retry in case of bad random address (Jann
>> Horn), improve description in kernel.rst (Matthew Wilcox)
>> v4:
>> - use /proc/$pid/maps in the example (Mike Rapoport)
>> - CCs (Andrew Morton)
>> - only check randomize_va_space == 3
>> v5: randomize also vdso and stack
>> ---
>>  Documentation/admin-guide/hw-vuln/spectre.rst |  6 ++--
>>  Documentation/admin-guide/sysctl/kernel.rst   | 20 +++++++++++++
>>  arch/x86/entry/vdso/vma.c                     | 26 +++++++++++++++-
>>  include/linux/mm.h                            |  8 +++++
>>  init/Kconfig                                  |  2 +-
>>  mm/mmap.c                                     | 30 +++++++++++++------
>>  mm/mremap.c                                   | 27 +++++++++++++++++
>>  mm/util.c                                     |  6 ++++
>>  8 files changed, 111 insertions(+), 14 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
>> index e05e581af5cf..9ea250522077 100644
>> --- a/Documentation/admin-guide/hw-vuln/spectre.rst
>> +++ b/Documentation/admin-guide/hw-vuln/spectre.rst
>> @@ -254,7 +254,7 @@ Spectre variant 2
>>     left by the previous process will also be cleared.
>>
>>     User programs should use address space randomization to make attacks
>> -   more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
>> +   more difficult (Set /proc/sys/kernel/randomize_va_space = 1, 2 or 3).
>>
>>  3. A virtualized guest attacking the host
>>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> @@ -499,8 +499,8 @@ Spectre variant 2
>>     more overhead and run slower.
>>
>>     User programs should use address space randomization
>> -   (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
>> -   difficult.
>> +   (/proc/sys/kernel/randomize_va_space = 1, 2 or 3) to make attacks
>> +   more difficult.
>>
>>  3. VM mitigation
>>  ^^^^^^^^^^^^^^^^
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>> index d4b32cc32bb7..806e3b29d2b5 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -1060,6 +1060,26 @@ that support this feature.
>>     Systems with ancient and/or broken binaries should be configured
>>     with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
>>     address space randomization.
>> +
>> +3  Additionally enable full randomization of memory mappings created
>> +   with mmap(NULL, ...). With 2, the base of the VMA used for such
>> +   mappings is random, but the mappings are created in predictable
>> +   places within the VMA and in sequential order. With 3, new VMAs
>> +   are created to fully randomize the mappings.
>> +
>> +   Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if
>> +   not necessary and the location of stack and vdso are also
>> +   randomized.
>> +
>> +   On 32 bit systems this may cause problems due to increased VM
>> +   fragmentation if the address space gets crowded.
>> +
>> +   On all systems, it will reduce performance and increase memory
>> +   usage due to less efficient use of page tables and inability to
>> +   merge adjacent VMAs with compatible attributes. In the worst case,
>> +   additional page table entries of up to 4 pages are created for
>> +   each mapping, so with small mappings there's considerable penalty.
>> +
>>  == ===========================================================================
>>
>>
>> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
>> index 9185cb1d13b9..03ea884822e3 100644
>> --- a/arch/x86/entry/vdso/vma.c
>> +++ b/arch/x86/entry/vdso/vma.c
>> @@ -12,6 +12,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>  #include
>>  #include
>> @@ -32,6 +33,8 @@
>>  	const size_t name ## _offset = offset;
>>  #include
>>
>> +#define MAX_RANDOM_VDSO_RETRIES 5
>> +
>>  struct vdso_data *arch_get_vdso_data(void *vvar_page)
>>  {
>>  	return (struct vdso_data *)(vvar_page + _vdso_data_offset);
>> @@ -361,7 +364,28 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
>>
>>  static int map_vdso_randomized(const struct vdso_image *image)
>>  {
>> -	unsigned long addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start);
>> +	unsigned long addr;
>> +
>> +	if (randomize_va_space == 3) {
>> +		/*
>> +		 * Randomize vdso address.
>> +		 */
>> +		int i = MAX_RANDOM_VDSO_RETRIES;
>> +
>> +		do {
>> +			int ret;
>> +
>> +			/* Try a few times to find a free area */
>> +			addr = arch_mmap_rnd();
>> +
>> +			ret = map_vdso(image, addr);
>> +			if (!IS_ERR_VALUE(ret))
>> +				return ret;
>> +		} while (--i >= 0);
>> +
>> +		/* Give up and try the less random way */
>> +	}
>> +	addr = vdso_addr(current->mm->start_stack, image->size-image->sym_vvar_start);

> This is IMO rather ugly. You're picking random numbers and throwing
> them at map_vdso(), which throws them at get_unmapped_area(), which
> will validate them. And you duplicate the same ugly loop later on.

I agree it's not very pretty, but I'd expect that the first number would
already have a high probability of getting accepted, and the probability
of all five attempts failing should be very low.
For example, on a system with 16GB of RAM, a maximum VM size of 32GB
(35 bits) and 47 bits of available VM space (since the kernel takes one
bit), the chance that a single random address collides is
1 / 2^(47 - 35), so only one out of 4096 first attempts should be
expected to fail. The chance of all five attempts failing should then
be only 1 / 2^60.

> How about instead pushing this logic into get_unmapped_area()?

The real work seems to be done in unmapped_area() and its counterpart
unmapped_area_topdown(), which traverse an RB tree when checking the
address. Maybe a more clever algorithm could walk the tree starting
from a random address, and when the address on a branch is found to be
invalid (again, not very likely), the search could mutate some bits of
the address and continue randomly either sideways or backing up,
instead of restarting from the top. I'm not sure how the randomness
properties would be affected by this, or how to guarantee that the
random walk always stops eventually. Neither is a problem with the
proposed simple approach.

-Topi

> --Andy
>