From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1309EC54E67 for ; Wed, 27 Mar 2024 09:51:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6AB666B0087; Wed, 27 Mar 2024 05:51:12 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 65BFC6B0089; Wed, 27 Mar 2024 05:51:12 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 522FA6B008A; Wed, 27 Mar 2024 05:51:12 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 35A076B0087 for ; Wed, 27 Mar 2024 05:51:12 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id EEBD91A0A34 for ; Wed, 27 Mar 2024 09:51:11 +0000 (UTC) X-FDA: 81942350742.18.F1EF05A Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf21.hostedemail.com (Postfix) with ESMTP id 168891C0004 for ; Wed, 27 Mar 2024 09:51:09 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=none; spf=pass (imf21.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1711533070; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SKnm/61Gy7Shl0uPuxzr9+9bnHvnObNAVegRhiHhAAg=; b=28L2a+IprOjcTN+ugmngNGQtYA6jpMyy8h7Pwv90iRff5xS2nRnHt5KrEmIGLlELWL14Qk NIWDUwqRvGwZOHZDYCKhgDRw+hzy3eII7A7NzS0Q8Kx95k77sajRgBi5lTIXc5HAYArdhr JBXKEe/BMPS4jm3ePAvWEg0kufEevfk= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1711533070; a=rsa-sha256; cv=none; b=nUdIrC8J1ti8LU0er5OOaCvhM+LZVxHqFBDg0FGB7o8hT/wE14MFP9ZWgn4CtCp9Txq7Yl ypSUbaFDaBxe6+HpuRchVu1rm0/XE/FAU3VGx1+V/GxIK2hAsLezH41aGjlDGcGAlYNDDW eTHo5bOkRBQIm5c3gtLT9k03Yp1tgOQ= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=none; spf=pass (imf21.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1ED372F4; Wed, 27 Mar 2024 02:51:43 -0700 (PDT) Received: from [10.57.72.121] (unknown [10.57.72.121]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 191333F694; Wed, 27 Mar 2024 02:51:06 -0700 (PDT) Message-ID: Date: Wed, 27 Mar 2024 09:51:05 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte Content-Language: en-GB To: David Hildenbrand , Mark Rutland , Catalin Marinas , Will Deacon , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Andrew Morton , Muchun Song Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20240215121756.2734131-1-ryan.roberts@arm.com> <20240215121756.2734131-4-ryan.roberts@arm.com> <6aaff470-c510-469b-8f4f-2e4c5cf07d56@redhat.com> From: Ryan Roberts In-Reply-To: <6aaff470-c510-469b-8f4f-2e4c5cf07d56@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 168891C0004 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: grjqiksfabe97ywjeqri8fc7f3e3f451 X-HE-Tag: 1711533069-925834 X-HE-Meta: U2FsdGVkX19urzBA6JDndYJcm51OP/ZkbpRGqn29aB7rOr4KbYcoPdOqfnJeLlmCIrMsAqwr9U8J8jzKtkXD7O5C5qmvxlIHJ5OQHhXwPbuK6BQmD1X49cuEf0xexWtJJCDvnhh6NiFxTzTxupZ39Fcu74fjTpeDt54p6W7XpnjzJuCYtXBG7jSsJagmH2JeKPqka+4Zy0kHqiqQg1CYhI3ooUO7NyKL4t3js5Jn/LH4SWh718FkT67E6IS0Zn9MhHMItc6Olc3yQP4qx7XFEjBetzREo3x8UJyYNWeGSvXZB1Vh+tRIxGBzk86JqL5y7JpJZyGYvCJzfh/SwGKTYWh5y0s6is+X6mpFL+tO+8bIbgLX6Lff4656oI5SMAdLASHgS61OhHmMN2vzEoOCDF9OyHC8ywZB7x8DWvTiE45rllgBxmoxnW1T4e2fdLZKPe87gVFFZnQezWBak1E7yIT65fXuCZgBOPFdYUE0Ze/bOiITQ8kynAXdmcojDYU2j35PDeD+bPHsBswCVcNYbdKGS7lD2iMP5JZeoB1g4OcN48NYkHGvRGeHA3icanZmWN6Yn+o7A75Libn29HaP6DKW/WAV8dVjIZh8qjdGodbPGftvg4dJ0rLbLqdZCsSxT/z/9Cqp+k82/OEXJnabVnY4m/0Ms8bQvc4f9XOjO8vEvF0vcuUIt58mq5D7Un/jSOOYoFzO2CqefuhGabF47Gq/oK+Yaz9qnQkw5IAlfSljQQMweXD7YZ9wawDUKAdoiAzISfQdGgCge59TV+Ylo8K62SM2aEzE6Gzum2rvXvVSWh5xaFhYxShkkNacOGt0Z1M2QbrKJEBHgEvb+a4EKHcYnRWZ2Unigy9fM4J1tM9TMqZc8MLDV4nyXIiv5W2bVjwctoTPccgTJZ6dCZ9TrkG5tMpUzEWcggpounVsi2ypr+iRmexOAzNR8+YQHFu1NW4KUd2zgpHf9728M9W GKhbyaop sEfl9+lcUk8/1eBd/UmXT76o/1c5Zs7P+qOyMELIZdrUG5s8cYHdJekpud5ol7vS0x/bxKK6Ei1SiXYQtZYxC4q706gHBSXZr1xnQrM4yoHkt+yZyCv4VTiP0C2IUPSJ08g/iR6CtDcObGXzThoeqO7cCi0E8kePB7oWea9Tk9CHOVEk7pZk47K/my9Zl2yiIy46r6AZFMQtwe3t2OJ/FGoFqQr0SN7DY7nv3 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 26/03/2024 17:58, David Hildenbrand wrote: >>>>> >>>>> vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte) >>>>> /* not dirty */ >>>>> >>>>> /* Now, thread 2 ends up setting the PTE dirty under PT lock. */ >> >> Ahh, this comment about thread 2 is not referring to the code immediately below >> it. It all makes much more sense now. :) > > Sorry :) > >> >>>>> >>>>> spin_lock(vmf->ptl); >>>>> entry = vmf->orig_pte; >>>>> if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) { >>>>>       ... >>>>> } >>>>> ... >>>>> entry = pte_mkyoung(entry); >>>> >>>> Do you mean pte_mkdirty() here? You're talking about dirty everywhere else. >>> >>> No, that is just thread 1 seeing "oh, nothing to do" and then goes ahead and >>> unconditionally does that in handle_pte_fault(). >>> >>>> >>>>> if (ptep_set_access_flags(vmf->vma, ...) >>>>> ... >>>>> pte_unmap_unlock(vmf->pte, vmf->ptl); >>>>> >>>>> >>>>> Generic ptep_set_access_flags() will do another pte_same() check and realize >>>>> "hey, there was a change!" let's update the PTE! >>>>> >>>>> set_pte_at(vma->vm_mm, address, ptep, entry); >>>> >>>> This is called from the generic ptep_set_access_flags() in your example, right? >>>> >>> >>> Yes. >>> >>>>> >>>>> would overwrite the dirty bit set by thread 2. >>>> >>>> I'm not really sure what you are getting at... Is your concern that there is a >>>> race where the page could become dirty in the meantime and it now gets lost? I >>>> think that's why arm64 overrides ptep_set_access_flags(); since the hw can >>>> update access/dirty we have to deal with the races. >>> >>> My concern is that your patch can in subtle ways lead to use losing PTE dirty >>> bits on architectures that don't have the HW-managed dirty bit. They do exist ;) >> >> But I think the example you give can already happen today? Thread 1 reads >> orig_pte = ptep_get_lockless(). So that's already racy, if thread 2 is going to >> set dirty just after the get, then thread 1 is going to set the PTE back to (a >> modified version of) orig_pte. Isn't it already broken? > > No, because the pte_same() check under PTL would have detected it, and we would > have backed out. And I think the problem comes to live when we convert > pte_same()->pte_same_norecency(), because we fail to protect PTE access/dirty > changes that happend under PTL from another thread. Ahh yep. Got it. I absolutely knew that you would be correct, but I still walked right into it! I think one could argue that the generic ptep_set_access_flags() is not implementing its own spec: " ... Only sets the access flags (dirty, accessed), as well as write permission. Furthermore, we know it always gets set to a "more permissive" setting ... " Surely it should be folding the access and dirty bits from *ptep into entry if they are set? Regardless, I think this example proves that its fragile and subtle. I'm not really sure how to fix it more generally/robustly. Any thoughts? If not perhaps we are better off keeping ptep_get_lockless() around and only using ptep_get_lockless_norecency() for the really obviously correct cases? > > But could be I am missing something :) > >>> Arm64 should be fine in that regard. >>> >> >> There is plenty of arm64 HW that doesn't do HW access/dirty update. But our >> ptep_set_access_flags() can always deal with a racing update, even if that >> update originates from SW. >> >> Why do I have the feeling you're about to explain (very patiently) exactly why >> I'm wrong?... :) > > heh ... or you'll tell me (vary patiently) why I am wrong :) >