From: Jann Horn
Date: Mon, 31 Jan 2022 18:13:42 +0100
Subject: Re: [PATCH] binfmt_elf: Take the mmap lock when walking the VMA list
To: Matthew Wilcox
Cc: "Eric W. Biederman", linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Alexander Viro, Denys Vlasenko,
    Kees Cook, Vlastimil Babka, "Liam R. Howlett"

On Mon, Jan 31, 2022 at 5:35 PM Matthew Wilcox wrote:
>
> On Mon, Jan 31, 2022 at 10:26:11AM -0600, Eric W. Biederman wrote:
> > Matthew Wilcox writes:
> >
> > > On Mon, Jan 31, 2022 at 10:03:31AM -0600, Eric W. Biederman wrote:
> > >> "Matthew Wilcox (Oracle)" writes:
> > >>
> > >> > I'm not sure if the VMA list can change under us, but
> > >> > dump_vma_snapshot() is very careful to take the mmap_lock in
> > >> > write mode. We only need to take it in read mode here, as we do
> > >> > not care if the size of the stack VMA changes underneath us.
> > >> >
> > >> > If it can be changed underneath us, this is a potential
> > >> > use-after-free for a multithreaded process which is dumping core.
> > >>
> > >> The problem is not multi-threaded processes so much as processes
> > >> that share their mm.
> > >
> > > I don't understand the difference. I appreciate that another
> > > process can get read access to an mm through, e.g., /proc, but how
> > > can another process (that isn't a thread of this process) modify
> > > the VMAs?
> >
> > There are a couple of ways.
> >
> > A classic one is that a multi-threaded process can call vfork(), and
> > the mm_struct is shared with the child until exec is called.
>
> While true, I thought the semantics of vfork() were that the parent
> was suspended. Given that, it can't core dump until the child execs
> ... right?
>
> > A process can do this more deliberately by forking a child using
> > clone(CLONE_VM) without CLONE_THREAD. Supporting this case is a
> > holdover from before CLONE_THREAD was supported in the kernel, when
> > such processes were used to simulate threads.

The syscall clone() is kind of the generalized version of fork() and
vfork(): it lets you create fun combinations of these things (some of
which might be useful, others of which make little sense). For
example, vfork() is basically just clone() with CLONE_VM (share the
address space) plus CLONE_VFORK (block until the child exits or
execs) plus SIGCHLD (the child should send SIGCHLD to its parent when
it terminates). (Some combinations are disallowed, but IIRC you can
make things like threads with separate FD tables, or processes that
share virtual memory and signal handler tables but aren't actual
threads.)
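As a minimal userspace sketch of that CLONE_VM-without-CLONE_THREAD
case (illustrative only, with error handling trimmed; child_fn and
the stack size are made up for the example):

	/* A child that is a separate process (no CLONE_THREAD) but
	 * shares its parent's mm (CLONE_VM): any mmap()/munmap() it
	 * does is visible to the parent, and vice versa. */
	#define _GNU_SOURCE
	#include <sched.h>
	#include <signal.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/wait.h>

	static int child_fn(void *arg)
	{
		*(int *)arg = 42;	/* same address space as parent */
		return 0;
	}

	int main(void)
	{
		int shared = 0;
		size_t stksz = 1024 * 1024;
		char *stack = malloc(stksz);

		/* SIGCHLD so the child can be reaped with waitpid(),
		 * like a fork() child; the stack grows down, so pass
		 * the top of the allocation. */
		pid_t pid = clone(child_fn, stack + stksz,
				  CLONE_VM | SIGCHLD, &shared);
		waitpid(pid, NULL, 0);
		printf("shared = %d\n", shared);	/* prints 42 */
		return 0;
	}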
Note that until commit 0258b5fd7c71 ("coredump: Limit coredumps to a
single thread group", first in 5.16), coredumps would always tear
down every process that shares the MM before dumping, and the
coredumping code was trying to rely on that to protect against
concurrency. (Which at some point didn't actually work anymore, see
below...)

> That is a multithreaded process then! Maybe not in the strict POSIX
> compliance sense, but the intent is to be a multithreaded process,
> i.e. multiple threads of execution sharing an address space.

current_is_single_threaded() agrees with you. :P

> > It also happens that there are subsystems in the kernel that do
> > things like kthread_use_mm() that can also be modifying the mm
> > during a coredump.
>
> Yikes. That's terrifying. It's really legitimate for a kthread to
> attach to a process and start tearing down VMAs?

I don't know anything about kthreads doing this, but a fun way to
remotely mess with VMAs is userfaultfd. See
https://bugs.chromium.org/p/project-zero/issues/detail?id=1790
("Issue 1790: Linux: missing locking between ELF coredump code and
userfaultfd VMA modification") for the long version - but the gist is
that userfaultfd can set and clear flags on virtual address ranges
(implemented as flags on VMAs), and that may involve splitting VMAs
(when the boundaries of the specified range don't correspond to
existing VMA boundaries) or merging VMAs (when the flags of adjacent
VMAs become equal). And userfaultfd can by design be used remotely on
another process (as long as that process first created the
userfaultfd and then handed it over). That's why I added that locking
in the coredumping code.
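For concreteness, the flag-setting side looks roughly like this from
userspace (a sketch, not a test case; registering a subrange of an
existing VMA is exactly what forces a split):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		long page = sysconf(_SC_PAGESIZE);
		int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
		struct uffdio_api api = { .api = UFFD_API };

		ioctl(uffd, UFFDIO_API, &api);	/* API handshake */

		char *area = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		/* Register only the middle two pages: the kernel has to
		 * split the 4-page VMA into three to set the userfaultfd
		 * flag on just this subrange. Hand uffd to another
		 * process over a unix socket and it can do the same
		 * thing "remotely". */
		struct uffdio_register reg = {
			.range = { .start = (unsigned long)(area + page),
				   .len = 2 * page },
			.mode = UFFDIO_REGISTER_MODE_MISSING,
		};
		return ioctl(uffd, UFFDIO_REGISTER, &reg) ? 1 : 0;
	}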
> > > Uhh .. that seems like it needs a lot more understanding of
> > > binfmt_elf than I currently possess. I'd rather spend my time
> > > working on folios than learning much more about binfmt_elf. I was
> > > just trying to fix an assertion failure with the maple tree
> > > patches (we now assert that you're holding a lock when walking
> > > the list of VMAs).
> >
> > Fair enough. I will put it on my list of things to address.
>
> Thanks. Now that I've disclosed it's a UAF, I hope you're able to
> get to it soon. Otherwise we should put this band-aid in for now
> and you can address it properly in the fullness of time.
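For reference, the shape of the band-aid is just to bracket the walk
with the lock in read mode (a hypothetical simplification, not the
actual diff; mm->mmap/vm_next is the pre-maple-tree linked list):

	mmap_read_lock(mm);
	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		/* Read mode suffices: we tolerate the stack VMA
		 * changing size, we just must not walk a list that
		 * is being concurrently mutated. */
	}
	mmap_read_unlock(mm);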