From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F3FFC433DB for ; Mon, 22 Feb 2021 17:42:39 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0FAD364E77 for ; Mon, 22 Feb 2021 17:42:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0FAD364E77 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=darnok.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID: Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=8SiewRHHGE3sT7aAAduqfzL1fxFFeAS2zwhwxr9oHSU=; b=qitYwIPAwCa6GVwl6e2RpyAJz qIzLFxB8EFotUf7KFfiIjvDMLQPtNB7u1T+gskLPoSdFB8O7J0AVnys7siuDmDezAi6USLTYE66N7 btkPHsAyejbcZfPYHH1pcwOqSxsiZtFbV0rzEDIGHSMU85QpOF2Hsf7uAaKraaJN1Vy+VqBXzHxPw 43jnpB3BeOx50cmwB39uzNdw969YlUrF5xdWP1WCsFtPpIB7RSdJtWRgnLPKfOjsuUd5iQ0R3R1zA U07TJD+rwNj11e9Hqa4iqcaYX4k64nGpNDthg1gZ2higEUYnwZ7slq8LdVoG6JIqBS0oIFIWvvms8 TYIoyxiwQ==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1lEFCg-0005gi-Ng; Mon, 22 Feb 2021 17:41:02 +0000 Received: from mail-qk1-x729.google.com ([2607:f8b0:4864:20::729]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1lEFCP-0005aF-Cl for linux-arm-kernel@lists.infradead.org; Mon, 22 Feb 2021 17:40:52 +0000 Received: by mail-qk1-x729.google.com with SMTP id z190so13390266qka.9 for ; Mon, 22 Feb 2021 09:40:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=bUxy2tCwU4OV+6MXdFIf9MDJ8sFuj2MZRqKqAUO9jLI=; b=EhROeVtMD+5mOBvlGdL251imICTnvTHc/7iFBoxK4NCfgjyAEdI1dZ4UCgTerlBEsY Gimc8bmokQgllWRtU2/3qqEtTMZiOVVNH2kXc9XpJhSEVpgI7AxUBuWyKlKgJckEvKIN mWns8Kou5I13nLozDmnwQvdRm2Oxc2HDwUma0bHWFLo2e6DYlFTLB2oNOXATjUaDZyZr LbRNU9sGzpnUJMbVcOtQ+n3bCkBmGW2oVP3zAC6vJxO3pj6iYP9dzlrws4Ttu6S6yPoo Cm1OwQoxqnWPVWYAr3UKVRCXN6PUxUCp7+Prc8/wkga0mmRVtFTJbGC7boOJXCe+v/0N Rz4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition :content-transfer-encoding:in-reply-to; bh=bUxy2tCwU4OV+6MXdFIf9MDJ8sFuj2MZRqKqAUO9jLI=; b=FH2th76gtFpqm8qWCQPoysyQrKFJSTWlhYpdY42vYgLs5O6P+wUlCOWAZwMuKcjh41 LIk8UJgOKFSpGcNH3uzgrT3FZ1R4SLl8+jhAMsVQBVEZGvGjPmUEyfDB7vtuf0ZL6dxu 9R1IUfSpOaiNpxzYd7zVcNjjg5QHYthsx6aKt/zrmD8mJ0biSBHbYJzkHABSYYHqusg6 oQ3ly0VbkrK3rGhO5AAF+1jAib0nej0EIMnciIVXaHaQccK0lCLx47EYWPMq8cW02hZm mXU1Hs0C6EDo1bDCFH7Nokd3bbPnlkP7zJ6dQQfrf+CX94GS2W0uYmPsG5LDfTciXJ8F 8jXg== X-Gm-Message-State: AOAM532tLGAq+S1/zLAaiPS0p6B8cpjDMFioV7vzwc0boiWaJE305VbG iiNQOSD9uGarDY8KSYVD4E4= X-Google-Smtp-Source: ABdhPJxwZUGZvt981ZlKW9EI5SrpMc65t4DYb+3hPcfGNn8aV8TLvgI1943y1GB89Co9efRx9HNRSA== X-Received: by 2002:a37:aa94:: with SMTP id t142mr23039952qke.40.1614015639559; Mon, 22 Feb 2021 09:40:39 -0800 (PST) Received: from fedora (209-6-208-110.s8556.c3-0.smr-cbr2.sbo-smr.ma.cable.rcncustomer.com. [209.6.208.110]) by smtp.gmail.com with ESMTPSA id m190sm12716464qkc.66.2021.02.22.09.40.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Feb 2021 09:40:38 -0800 (PST) Date: Mon, 22 Feb 2021 12:40:36 -0500 From: Konrad Rzeszutek Wilk To: David Hildenbrand Subject: Re: [PATCH] mm, kasan: don't poison boot memory Message-ID: <20210222174036.GA399355@fedora> References: <487751e1ccec8fcd32e25a06ce000617e96d7ae1.1613595269.git.andreyknvl@google.com> <797fae72-e3ea-c0b0-036a-9283fa7f2317@oracle.com> <1ac78f02-d0af-c3ff-cc5e-72d6b074fc43@redhat.com> <56c97056-6d8b-db0e-e303-421ee625abe3@redhat.com> <4c7351e2-e97c-e740-5800-ada5504588aa@redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <4c7351e2-e97c-e740-5800-ada5504588aa@redhat.com> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210222_124045_578216_5AA7CC46 X-CRM114-Status: GOOD ( 58.70 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Linux ARM , Marco Elver , Dhaval Giani , Mike Rapoport , Linux Memory Management List , Catalin Marinas , Kevin Brodsky , Will Deacon , Branislav Rankov , kasan-dev , LKML , Christoph Hellwig , George Kennedy , Alexander Potapenko , Evgenii Stepanov , Andrey Konovalov , Andrey Ryabinin , Andrew Morton , Vincenzo Frascino , Peter Collingbourne , Dmitry Vyukov Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Mon, Feb 22, 2021 at 05:39:29PM +0100, David Hildenbrand wrote: > On 22.02.21 17:13, David Hildenbrand wrote: > > On 22.02.21 16:13, George Kennedy wrote: > > > = > > > = > > > On 2/22/2021 4:52 AM, David Hildenbrand wrote: > > > > On 20.02.21 00:04, George Kennedy wrote: > > > > > = > > > > > = > > > > > On 2/19/2021 11:45 AM, George Kennedy wrote: > > > > > > = > > > > > > = > > > > > > On 2/18/2021 7:09 PM, Andrey Konovalov wrote: > > > > > > > On Fri, Feb 19, 2021 at 1:06 AM George Kennedy > > > > > > > wrote: > > > > > > > > = > > > > > > > > = > > > > > > > > On 2/18/2021 3:55 AM, David Hildenbrand wrote: > > > > > > > > > On 17.02.21 21:56, Andrey Konovalov wrote: > > > > > > > > > > During boot, all non-reserved memblock memory is expose= d to the > > > > > > > > > > buddy > > > > > > > > > > allocator. Poisoning all that memory with KASAN lengthe= ns boot > > > > > > > > > > time, > > > > > > > > > > especially on systems with large amount of RAM. This pa= tch makes > > > > > > > > > > page_alloc to not call kasan_free_pages() on all new me= mory. > > > > > > > > > > = > > > > > > > > > > __free_pages_core() is used when exposing fresh memory = during > > > > > > > > > > system > > > > > > > > > > boot and when onlining memory during hotplug. This patc= h adds a new > > > > > > > > > > FPI_SKIP_KASAN_POISON flag and passes it to __free_page= s_ok() > > > > > > > > > > through > > > > > > > > > > free_pages_prepare() from __free_pages_core(). > > > > > > > > > > = > > > > > > > > > > This has little impact on KASAN memory tracking. > > > > > > > > > > = > > > > > > > > > > Assuming that there are no references to newly exposed = pages > > > > > > > > > > before they > > > > > > > > > > are ever allocated, there won't be any intended (but bu= ggy) > > > > > > > > > > accesses to > > > > > > > > > > that memory that KASAN would normally detect. > > > > > > > > > > = > > > > > > > > > > However, with this patch, KASAN stops detecting wild an= d large > > > > > > > > > > out-of-bounds accesses that happen to land on a fresh m= emory page > > > > > > > > > > that > > > > > > > > > > was never allocated. This is taken as an acceptable tra= de-off. > > > > > > > > > > = > > > > > > > > > > All memory allocated normally when the boot is over kee= ps getting > > > > > > > > > > poisoned as usual. > > > > > > > > > > = > > > > > > > > > > Signed-off-by: Andrey Konovalov > > > > > > > > > > Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d > > > > > > > > > Not sure this is the right thing to do, see > > > > > > > > > = > > > > > > > > > https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c5= 29860@oracle.com > > > > > > > > > = > > > > > > > > > = > > > > > > > > > = > > > > > > > > > Reversing the order in which memory gets allocated + used= during > > > > > > > > > boot > > > > > > > > > (in a patch by me) might have revealed an invalid memory = access > > > > > > > > > during > > > > > > > > > boot. > > > > > > > > > = > > > > > > > > > I suspect that that issue would no longer get detected wi= th your > > > > > > > > > patch, as the invalid memory access would simply not get = detected. > > > > > > > > > Now, I cannot prove that :) > > > > > > > > Since David's patch we're having trouble with the iBFT ACPI= table, > > > > > > > > which > > > > > > > > is mapped in via kmap() - see acpi_map() in "drivers/acpi/o= sl.c". > > > > > > > > KASAN > > > > > > > > detects that it is being used after free when ibft_init() a= ccesses > > > > > > > > the > > > > > > > > iBFT table, but as of yet we can't find where it get's free= d (we've > > > > > > > > instrumented calls to kunmap()). > > > > > > > Maybe it doesn't get freed, but what you see is a wild or a l= arge > > > > > > > out-of-bounds access. Since KASAN marks all memory as freed d= uring the > > > > > > > memblock->page_alloc transition, such bugs can manifest as > > > > > > > use-after-frees. > > > > > > = > > > > > > It gets freed and re-used. By the time the iBFT table is access= ed by > > > > > > ibft_init() the page has been over-written. > > > > > > = > > > > > > Setting page flags like the following before the call to kmap() > > > > > > prevents the iBFT table page from being freed: > > > > > = > > > > > Cleaned up version: > > > > > = > > > > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c > > > > > index 0418feb..8f0a8e7 100644 > > > > > --- a/drivers/acpi/osl.c > > > > > +++ b/drivers/acpi/osl.c > > > > > @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_= address > > > > > pg_off, unsigned long pg_sz) > > > > > = > > > > > =A0 =A0=A0=A0=A0 pfn =3D pg_off >> PAGE_SHIFT; > > > > > =A0 =A0=A0=A0=A0 if (should_use_kmap(pfn)) { > > > > > +=A0=A0=A0 =A0=A0=A0 struct page *page =3D pfn_to_page(pfn); > > > > > + > > > > > =A0 =A0=A0=A0=A0 =A0=A0=A0 if (pg_sz > PAGE_SIZE) > > > > > =A0 =A0=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 return NULL; > > > > > -=A0=A0=A0 =A0=A0=A0 return (void __iomem __force *)kmap(pfn_to_p= age(pfn)); > > > > > +=A0=A0=A0 =A0=A0=A0 SetPageReserved(page); > > > > > +=A0=A0=A0 =A0=A0=A0 return (void __iomem __force *)kmap(page); > > > > > =A0 =A0=A0=A0=A0 } else > > > > > =A0 =A0=A0=A0=A0 =A0=A0=A0 return acpi_os_ioremap(pg_off, pg_sz= ); > > > > > =A0 =A0} > > > > > @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address > > > > > pg_off, void __iomem *vaddr) > > > > > =A0 =A0=A0=A0=A0 unsigned long pfn; > > > > > = > > > > > =A0 =A0=A0=A0=A0 pfn =3D pg_off >> PAGE_SHIFT; > > > > > -=A0=A0=A0 if (should_use_kmap(pfn)) > > > > > -=A0=A0=A0 =A0=A0=A0 kunmap(pfn_to_page(pfn)); > > > > > -=A0=A0=A0 else > > > > > +=A0=A0=A0 if (should_use_kmap(pfn)) { > > > > > +=A0=A0=A0 =A0=A0=A0 struct page *page =3D pfn_to_page(pfn); > > > > > + > > > > > +=A0=A0=A0 =A0=A0=A0 ClearPageReserved(page); > > > > > +=A0=A0=A0 =A0=A0=A0 kunmap(page); > > > > > +=A0=A0=A0 } else > > > > > =A0 =A0=A0=A0=A0 =A0=A0=A0 iounmap(vaddr); > > > > > =A0 =A0} > > > > > = > > > > > David, the above works, but wondering why it is now necessary. ku= nmap() > > > > > is not hit. What other ways could a page mapped via kmap() be unm= apped? > > > > > = > > > > = > > > > Let me look into the code ... I have little experience with ACPI > > > > details, so bear with me. > > > > = > > > > I assume that acpi_map()/acpi_unmap() map some firmware blob that is > > > > provided via firmware/bios/... to us. > > > > = > > > > should_use_kmap() tells us whether > > > > a) we have a "struct page" and should kmap() that one > > > > b) we don't have a "struct page" and should ioremap. > > > > = > > > > As it is a blob, the firmware should always reserve that memory reg= ion > > > > via memblock (e.g., memblock_reserve()), such that we either > > > > 1) don't create a memmap ("struct page") at all (-> case b) ) > > > > 2) if we have to create e memmap, we mark the page PG_reserved and > > > > =A0=A0 *never* expose it to the buddy (-> case a) ) > > > > = > > > > = > > > > Are you telling me that in this case we might have a memmap for the= HW > > > > blob that is *not* PG_reserved? In that case it most probably got > > > > exposed to the buddy where it can happily get allocated/freed. > > > > = > > > > The latent BUG would be that that blob gets exposed to the system l= ike > > > > ordinary RAM, and not reserved via memblock early during boot. > > > > Assuming that blob has a low physical address, with my patch it will > > > > get allocated/used a lot earlier - which would mean we trigger this > > > > latent BUG now more easily. > > > > = > > > > There have been similar latent BUGs on ARM boards that my patch > > > > discovered where special RAM regions did not get marked as reserved > > > > via the device tree properly. > > > > = > > > > Now, this is just a wild guess :) Can you dump the page when mapping > > > > (before PageReserved()) and when unmapping, to see what the state of > > > > that memmap is? > > > = > > > Thank you David for the explanation and your help on this, > > > = > > > dump_page() before PageReserved and before kmap() in the above patch: > > > = > > > [=A0=A0=A0 1.116480] ACPI: Core revision 20201113 > > > [=A0=A0=A0 1.117628] XXX acpi_map: about to call kmap()... > > > [=A0=A0=A0 1.118561] page:ffffea0002f914c0 refcount:0 mapcount:0 > > > mapping:0000000000000000 index:0x0 pfn:0xbe453 > > > [=A0=A0=A0 1.120381] flags: 0xfffffc0000000() > > > [=A0=A0=A0 1.121116] raw: 000fffffc0000000 ffffea0002f914c8 ffffea000= 2f914c8 > > > 0000000000000000 > > > [=A0=A0=A0 1.122638] raw: 0000000000000000 0000000000000000 00000000f= fffffff > > > 0000000000000000 > > > [=A0=A0=A0 1.124146] page dumped because: acpi_map pre SetPageReserved > > > = > > > I also added dump_page() before unmapping, but it is not hit. The > > > following for the same pfn now shows up I believe as a result of sett= ing > > > PageReserved: > > > = > > > [=A0=A0 28.098208] BUG:Bad page state in process mo dprobe=A0 pfn:be4= 53 > > > [=A0=A0 28.098394] page:ffffea0002f914c0 refcount:0 mapcount:0 > > > mapping:0000000000000000 index:0x1 pfn:0xbe453 > > > [=A0=A0 28.098394] flags: 0xfffffc0001000(reserved) > > > [=A0=A0 28.098394] raw: 000fffffc0001000 dead000000000100 dead0000000= 00122 > > > 0000000000000000 > > > [=A0=A0 28.098394] raw: 0000000000000001 0000000000000000 00000000fff= fffff > > > 0000000000000000 > > > [=A0=A0 28.098394] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag= (s) set > > > [=A0=A0 28.098394] page_owner info is not present (never set?) > > > [=A0=A0 28.098394] Modules linked in: > > > [=A0=A0 28.098394] CPU: 2 PID: 204 Comm: modprobe Not tainted 5.11.0-= 3dbd5e3 #66 > > > [=A0=A0 28.098394] Hardware name: QEMU Standard PC (i440FX + PIIX, 19= 96), > > > BIOS 0.0.0 02/06/2015 > > > [=A0=A0 28.098394] Call Trace: > > > [=A0=A0 28.098394]=A0 dump_stack+0xdb/0x120 > > > [=A0=A0 28.098394]=A0 bad_page.cold.108+0xc6/0xcb > > > [=A0=A0 28.098394]=A0 check_new_page_bad+0x47/0xa0 > > > [=A0=A0 28.098394]=A0 get_page_from_freelist+0x30cd/0x5730 > > > [=A0=A0 28.098394]=A0 ? __isolate_free_page+0x4f0/0x4f0 > > > [=A0=A0 28.098394]=A0 ? init_object+0x7e/0x90 > > > [=A0=A0 28.098394]=A0 __alloc_pages_nodemask+0x2d8/0x650 > > > [=A0=A0 28.098394]=A0 ? write_comp_data+0x2f/0x90 > > > [=A0=A0 28.098394]=A0 ? __alloc_pages_slowpath.constprop.103+0x2110/0= x2110 > > > [=A0=A0 28.098394]=A0 ? __sanitizer_cov_trace_pc+0x21/0x50 > > > [=A0=A0 28.098394]=A0 alloc_pages_vma+0xe2/0x560 > > > [=A0=A0 28.098394]=A0 do_fault+0x194/0x12c0 > > > [=A0=A0 28.098394]=A0 ? write_comp_data+0x2f/0x90 > > > [=A0=A0 28.098394]=A0 __handle_mm_fault+0x1650/0x26c0 > > > [=A0=A0 28.098394]=A0 ? copy_page_range+0x1350/0x1350 > > > [=A0=A0 28.098394]=A0 ? write_comp_data+0x2f/0x90 > > > [=A0=A0 28.098394]=A0 ? write_comp_data+0x2f/0x90 > > > [=A0=A0 28.098394]=A0 handle_mm_fault+0x1f9/0x810 > > > [=A0=A0 28.098394]=A0 ? write_comp_data+0x2f/0x90 > > > [=A0=A0 28.098394]=A0 do_user_addr_fault+0x6f7/0xca0 > > > [=A0=A0 28.098394]=A0 exc_page_fault+0xaf/0x1a0 > > > [=A0=A0 28.098394]=A0 asm_exc_page_fault+0x1e/0x30 > > > [=A0=A0 28.098394] RIP: 0010:__clear_user+0x30/0x60 > > = > > I think the PAGE_FLAGS_CHECK_AT_PREP check in this instance means that > > someone is trying to allocate that page with the PG_reserved bit set. > > This means that the page actually was exposed to the buddy. > > = > > However, when you SetPageReserved(), I don't think that PG_buddy is set > > and the refcount is 0. That could indicate that the page is on the buddy > > PCP list. Could be that it is getting reused a couple of times. > > = > > The PFN 0xbe453 looks a little strange, though. Do we expect ACPI tables > > close to 3 GiB ? No idea. Could it be that you are trying to map a wrong > > table? Just a guess. Nah, ACPI MADT enumerates the table and that is the proper location of it. > = > ... but I assume ibft_check_device() would bail out on an invalid checksu= m. > So the question is, why is this page not properly marked as reserved > already. The ibft_check_device ends up being called as module way way after the kernel has cleaned the memory. The funny thing about iBFT is that (it is also mentioned in the spec) that the table can resize in memory .. or in the ACPI regions (which have no E820_RAM and are considered "MMIO" regions). Either place is fine, so it can be in either RAM or MMIO :-( > = > -- = > Thanks, > = > David / dhildenb > = _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel