From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DEC32C54FCE for ; Mon, 23 Mar 2020 16:09:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 88AD62072E for ; Mon, 23 Mar 2020 16:09:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ziepe.ca header.i=@ziepe.ca header.b="eNuL1zAg" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 88AD62072E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ziepe.ca Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A6FB16B0005; Mon, 23 Mar 2020 12:09:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A2DE46B0006; Mon, 23 Mar 2020 12:09:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 90E156B0007; Mon, 23 Mar 2020 12:09:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0047.hostedemail.com [216.40.44.47]) by kanga.kvack.org (Postfix) with ESMTP id 766E56B0005 for ; Mon, 23 Mar 2020 12:09:58 -0400 (EDT) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 2001D180ACF8F for ; Mon, 23 Mar 2020 16:09:58 +0000 (UTC) X-FDA: 76627113276.13.value92_7419080867123 X-HE-Tag: value92_7419080867123 X-Filterd-Recvd-Size: 5899 Received: from mail-qk1-f193.google.com (mail-qk1-f193.google.com [209.85.222.193]) by imf22.hostedemail.com (Postfix) with ESMTP for ; Mon, 23 Mar 2020 16:09:57 +0000 (UTC) Received: by mail-qk1-f193.google.com with SMTP id b62so6462857qkf.6 for ; Mon, 23 Mar 2020 09:09:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=kAhcxP8/eNJxKmPp4P/1zEx2VH3e7bOqUzz83MhM5no=; b=eNuL1zAguHPlqQybPhPa1MXls3PQYb5vdLS5AZjgIq8RkZBCaxVVIIJpgVPW7QrGSY KA7ZAYszAWHP6UbEIF7t3zDnHlCOL9PW6k4i9mtF7oVXKTMo0W6DhnV6FOCLEaxS3BZL Nl+nfv/a30GWZzhldB3ExREp3xIPvsWdTuISOXGXdlKFmE6QIipUAkCffJK9vyX8B5L+ bVwjJ+7ef9TJV7QA9dy/U3o8SdwKDb5X7jK6dCtQUslWMIalmrWiQxEGYimgiQYMsFYf di9FRqrqUb+Wgu/STcRhGoXM0jqV7OuNMKpOEUF2xp2NMaMyKKc9aLgUCcsVQJUjR88e okYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=kAhcxP8/eNJxKmPp4P/1zEx2VH3e7bOqUzz83MhM5no=; b=nbwv3BT3PR+xEvj9slMGC3fDNEedoCvDcAmffQodp8A1gDcfEX+niygrB+5Ux7PIAa uf/6l67OHIh1JMXQKCAhe0JUMKeSlCASPhviEl/NJvDJuBCYZxSwu9/DIae5Q0Rff6/E ngAkuAJsS4s+KsUUhZO78YU389RJWMY0bQcj69UvTpqEvHOZ0uA78/htQEtKRvUG4AMk OG+EmlwKt+JJ62/TT245PPF6oK9Jy+3RSyR4C5OHj0FngFcBmtVSYgmwJ4jABIrAwZEV QzSfvY1Ek1m6vdDtJYGXNLSRNtW1M5ytammfJDYlL8xTmFVKIr2ssnGIcaOuX0a4e9JH /DXA== X-Gm-Message-State: ANhLgQ0BHnvcZOBdX8ESbxkYRLxjbV4fQ+ZZrk7SLj4infMKN4d/ttg5 dbJQ/SE3uDJ2w8YX7IKgDNsn/w== X-Google-Smtp-Source: ADFU+vsws2eqOZlxDQQkmlsIvJQ1ivBDR/Au9oUOqDUQqVvEuamycFbO8yG/pIZC6ehrGORsRcL+Yg== X-Received: by 2002:a05:620a:22ef:: with SMTP id p15mr19433497qki.495.1584979796923; Mon, 23 Mar 2020 09:09:56 -0700 (PDT) Received: from ziepe.ca (hlfxns017vw-142-68-57-212.dhcp-dynamic.fibreop.ns.bellaliant.net. [142.68.57.212]) by smtp.gmail.com with ESMTPSA id a11sm1005364qto.57.2020.03.23.09.09.55 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 23 Mar 2020 09:09:56 -0700 (PDT) Received: from jgg by mlx.ziepe.ca with local (Exim 4.90_1) (envelope-from ) id 1jGPeF-00053p-1f; Mon, 23 Mar 2020 13:09:55 -0300 Date: Mon, 23 Mar 2020 13:09:55 -0300 From: Jason Gunthorpe To: Mike Kravetz Cc: "Longpeng (Mike)" , akpm@linux-foundation.org, kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org, arei.gonglei@huawei.com, weidong.huang@huawei.com, weifuqiang@huawei.com, kvm@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Sean Christopherson , stable@vger.kernel.org Subject: Re: [PATCH v2] mm/hugetlb: fix a addressing exception caused by huge_pte_offset() Message-ID: <20200323160955.GY20941@ziepe.ca> References: <1582342427-230392-1-git-send-email-longpeng2@huawei.com> <51a25d55-de49-4c0a-c994-bf1a8cfc8638@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51a25d55-de49-4c0a-c994-bf1a8cfc8638@oracle.com> User-Agent: Mutt/1.9.4 (2018-02-28) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sat, Mar 21, 2020 at 04:38:19PM -0700, Mike Kravetz wrote: > Andrew dropped this patch from his tree which caused me to go back and > look at the status of this patch/issue. > > It is pretty obvious that code in the current huge_pte_offset routine > is racy. I checked out the assembly code produced by my compiler and > verified that the line, > > if (pud_huge(*pud) || !pud_present(*pud)) > > does actually dereference *pud twice. So, the value could change between > those two dereferences. Longpeng (Mike) could easlily recreate the issue > if he put a delay between the two dereferences. I believe the only > reservations/concerns about the patch below was the use of READ_ONCE(). > Is that correct? I'm looking at a similar situation in pagewalk.c right now with PUD, and it is very confusing to see that locks are being held, memory accessed without READ_ONCE, but actually it has concurrent writes. I think it is valuable to annotate with READ_ONCE when the author knows this is an unlocked data access, regardless of what the compiler does. pagewalk probably has the same racy bug you show here, I'm going to send a very similar looking patch to pagewalk hopefully soon. Also, the remark about pmd_offset() seems accurate. The get_user_fast_pages() pattern seems like the correct one to emulate: pud = READ_ONCE(*pudp); if (pud_none(pud)) .. if (!pud_'is a pmd pointer') .. pmdp = pmd_offset(&pud, address); pmd = READ_ONCE(*pmd); [...] Passing &pud in avoids another de-reference of the pudp. Honestly all these APIs that take in page table pointers and internally de-reference them seem very hard to use correctly when the page table access isn't fully locked against write. This also relies on 'some kind of locking' to prevent the pmdp from becoming freed concurrently while this is running. .. also this only works if READ_ONCE() is atomic, ie the pud can't be 64 bit on a 32 bit platform. At least pmd has this problem, I haven't figured out if pud does?? > Are there any objections to the patch if the READ_ONCE() calls are removed? I think if we know there is no concurrent data access then it makes sense to keep the READ_ONCE. It looks like at least the p4d read from the pgd is also unlocked here as handle_mm_fault() writes to it?? Jason