Date: Mon, 28 Dec 2020 20:35:06 -0800 (PST)
From: Hugh Dickins
To: "Kirill A. Shutemov"
Cc: Android Kernel Team, Jan Kara, Minchan Kim, Linus Torvalds, Hugh Dickins, Linux Kernel Mailing List, Matthew Wilcox, Linux-MM, Vinayak Menon, Linux ARM, Catalin Marinas, Andrew Morton, Will Deacon, "Kirill A. Shutemov"
Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting

Got it at last, sorry it's taken so long.

On Tue, 29 Dec 2020, Kirill A. Shutemov wrote:
> On Tue, Dec 29, 2020 at 01:05:48AM +0300, Kirill A. Shutemov wrote:
> > On Mon, Dec 28, 2020 at 10:47:36AM -0800, Linus Torvalds wrote:
> > > On Mon, Dec 28, 2020 at 4:53 AM Kirill A. Shutemov wrote:
> > > >
> > > > So far I only found one more pin leak and always-true check. I don't see
> > > > how can it lead to crash or corruption. Keep looking.

Those mods look good in themselves, but, as you expected,
made no difference to the corruption I was seeing.

> > >
> > > Well, I noticed that the nommu.c version of filemap_map_pages() needs
> > > fixing, but that's obviously not the case Hugh sees.
> > >
> > > No, I think the problem is the
> > >
> > >         pte_unmap_unlock(vmf->pte, vmf->ptl);
> > >
> > > at the end of filemap_map_pages().
> > >
> > > Why?
> > >
> > > Because we've been updating vmf->pte as we go along:
> > >
> > >                 vmf->pte += xas.xa_index - last_pgoff;
> > >
> > > and I think that by the time we get to that "pte_unmap_unlock()",
> > > vmf->pte potentially points to past the edge of the page directory.
> >
> > Well, if it's true we have a bigger problem: we set up a pte entry without
> > relevant PTL.
> >
> > But I *think* we should be fine here: do_fault_around() limits start_pgoff
> > and end_pgoff to stay within the page table.

Yes, Linus's patch had made no difference,
the map_pages loop is safe in that respect.

> >
> > It made me look at the code around pte_unmap_unlock() and I think that
> > the bug is that we have to reset vmf->address and NULLify vmf->pte once we
> > are done with faultaround:
> >
> > diff --git a/mm/memory.c b/mm/memory.c
>
> Ugh.. Wrong place. Need to sleep.
>
> I'll look into your idea tomorrow.
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 87671284de62..e4daab80ed81 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2987,6 +2987,8 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address,
>  	} while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL);
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	rcu_read_unlock();
> +	vmf->address = address;
> +	vmf->pte = NULL;
>  	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
>
>  	return ret;
> --

And that made no (noticeable) difference either.
But at last I realized, it's absolutely on the right track, but missing
the couple of early returns at the head of filemap_map_pages(): add

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3025,14 +3025,12 @@ vm_fault_t filemap_map_pages(struct vm_f
 
 	rcu_read_lock();
 	head = first_map_page(vmf, &xas, end_pgoff);
-	if (!head) {
-		rcu_read_unlock();
-		return 0;
-	}
+	if (!head)
+		goto out;
 
 	if (filemap_map_pmd(vmf, head)) {
-		rcu_read_unlock();
-		return VM_FAULT_NOPAGE;
+		ret = VM_FAULT_NOPAGE;
+		goto out;
 	}
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
@@ -3066,9 +3064,9 @@ unlock:
 		put_page(head);
 	} while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
+out:
 	rcu_read_unlock();
 	vmf->address = address;
-	vmf->pte = NULL;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 
 	return ret;
--

and then the corruption is fixed. It seems miraculous that the
machines even booted with that bad vmf->address going to __do_fault():
maybe that tells us what a good job map_pages does most of the time.

You'll see I've tried removing the "vmf->pte = NULL;" there. I did
criticize earlier that vmf->pte was being left set, but was either
thinking back to some earlier era of mm/memory.c, or else confusing
it with vmf->prealloc_pte, which is NULLed when consumed: I could not
find anywhere in mm/memory.c which now needs vmf->pte to be cleared,
and I seem to run fine without it (even on i386 HIGHPTE).

So, the mystery is solved; but I don't think any of these patches
should be applied. Without thinking through Linus's suggestions
re do_set_pte() in particular, I do think this map_pages interface
is too ugly, and has given us lots of trouble: please take your time
to go over it all again, and come up with a cleaner patch.

I've grown rather jaded, and am questioning the value of the rework:
I don't think I want to look at or test another for a week or so.
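
For anyone skimming the thread, the shape of the bug can be reduced to a
small userspace sketch. This is not the kernel code: struct fault_ctx,
map_pages_buggy() and map_pages_fixed() are made-up names, and it assumes
that the caller has already pointed the address field at the start of the
fault-around window and relies on the callee to put the original faulting
address back before returning.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one struct vm_fault field that matters here. */
struct fault_ctx {
	unsigned long address;	/* assumed: caller moved this to the window start */
};

enum { RET_NONE = 0, RET_NOPAGE = 1 };	/* illustrative return codes only */

/*
 * Buggy shape: the restore sits only at the bottom of the function,
 * so the early return leaves ctx->address at the window start.
 */
static int map_pages_buggy(struct fault_ctx *ctx, unsigned long address,
			   bool have_page)
{
	if (!have_page)
		return RET_NONE;	/* ctx->address is NOT restored */

	/* ... the mapping loop would run here ... */

	ctx->address = address;
	return RET_NONE;
}

/*
 * Fixed shape: every exit goes through "out", so the restore
 * happens whether or not any page was mapped.
 */
static int map_pages_fixed(struct fault_ctx *ctx, unsigned long address,
			   bool have_page, bool mapped_huge)
{
	int ret = RET_NONE;

	if (!have_page)
		goto out;

	if (mapped_huge) {
		ret = RET_NOPAGE;
		goto out;
	}

	/* ... the mapping loop would run here ... */
out:
	ctx->address = address;
	return ret;
}
```

Calling map_pages_buggy() with have_page false leaves the context holding
the window start rather than the faulting address, which is the stale value
that then reaches the fallback path; map_pages_fixed() restores it on all
three exits.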

Hugh