Date: Thu, 8 Dec 2022 17:21:00 -0500
From: Peter Xu
To: John Hubbard
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jann Horn,
    Andrea Arcangeli, James Houghton, Rik van Riel, Miaohe Lin, Nadav Amit,
    Mike Kravetz, David Hildenbrand, Andrew Morton, Muchun Song
Subject: Re: [PATCH v2 10/10] mm/hugetlb: Document why page_vma_mapped_walk() is safe to walk
References: <20221207203034.650899-1-peterx@redhat.com>
    <20221207203158.651092-1-peterx@redhat.com>
    <6a970de6-e3da-f57d-14fd-55f65ddcb27d@nvidia.com>
    <61751d01-2ba4-efc0-9cb8-eeeb3d70908d@nvidia.com>
In-Reply-To: <61751d01-2ba4-efc0-9cb8-eeeb3d70908d@nvidia.com>

On Thu, Dec 08, 2022 at 01:54:27PM -0800, John Hubbard wrote:
> On 12/8/22 13:05, Peter Xu wrote:
> > > > +	/*
> > > > +	 * NOTE: we don't need explicit lock here to walk the
> > > > +	 * hugetlb pgtable because either (1) potential callers of
> > > > +	 * hugetlb pvmw currently holds i_mmap_rwsem, or (2) the
> > > > +	 * caller will not walk a hugetlb vma (e.g. ksm or uprobe).
> > > > +	 * When one day this rule breaks, one will get a warning
> > > > +	 * in hugetlb_walk(), and then we'll figure out what to do.
> > > > +	 */
> > >
> > > Confused.
> > > Is this documentation actually intended to refer to hugetlb_walk()
> > > itself, or just this call site? If the former, then let's move it over
> > > to be right before hugetlb_walk().
> >
> > It is for this specific code path not hugetlb_walk().
> >
> > The "holds i_mmap_rwsem" here is a true statement (not requirement) because
> > PVMW rmap walkers always have that. That satisfies with hugetlb_walk()
> > requirements already even without holding the vma lock.
> >
>
> It's really hard to understand. Do you have a few extra words to explain it?
> I can help with actual comment wording perhaps, but I am still a bit in
> the dark as to the actual meaning. :)

Firstly, this patch (to be squashed into the previous one) is trying to
document why page_vma_mapped_walk() does not need to take any further lock
before calling hugetlb_walk().

To call hugetlb_walk() we need to hold either of the locks listed below (in
either read or write mode), according to the rules we set up for it in
patch 3:

  (1) hugetlb vma lock
  (2) i_mmap_rwsem lock

page_vma_mapped_walk() is called at the following sites across the kernel:

  __replace_page[179]                   if (!page_vma_mapped_walk(&pvmw))
  __damon_pa_mkold[24]                  while (page_vma_mapped_walk(&pvmw)) {
  __damon_pa_young[97]                  while (page_vma_mapped_walk(&pvmw)) {
  write_protect_page[1065]              if (!page_vma_mapped_walk(&pvmw))
  remove_migration_pte[179]             while (page_vma_mapped_walk(&pvmw)) {
  page_idle_clear_pte_refs_one[56]      while (page_vma_mapped_walk(&pvmw)) {
  page_mapped_in_vma[318]               if (!page_vma_mapped_walk(&pvmw))
  folio_referenced_one[813]             while (page_vma_mapped_walk(&pvmw)) {
  page_vma_mkclean_one[958]             while (page_vma_mapped_walk(pvmw)) {
  try_to_unmap_one[1506]                while (page_vma_mapped_walk(&pvmw)) {
  try_to_migrate_one[1881]              while (page_vma_mapped_walk(&pvmw)) {
  page_make_device_exclusive_one[2205]  while (page_vma_mapped_walk(&pvmw)) {

If we group them, we can see that most of them happen during a rmap walk
(i.e., they come from a higher rmap_walk() stack). They are:

  __damon_pa_mkold[24]                  while (page_vma_mapped_walk(&pvmw)) {
  __damon_pa_young[97]                  while (page_vma_mapped_walk(&pvmw)) {
  remove_migration_pte[179]             while (page_vma_mapped_walk(&pvmw)) {
  page_idle_clear_pte_refs_one[56]      while (page_vma_mapped_walk(&pvmw)) {
  page_mapped_in_vma[318]               if (!page_vma_mapped_walk(&pvmw))
  folio_referenced_one[813]             while (page_vma_mapped_walk(&pvmw)) {
  page_vma_mkclean_one[958]             while (page_vma_mapped_walk(pvmw)) {
  try_to_unmap_one[1506]                while (page_vma_mapped_walk(&pvmw)) {
  try_to_migrate_one[1881]              while (page_vma_mapped_walk(&pvmw)) {
  page_make_device_exclusive_one[2205]  while (page_vma_mapped_walk(&pvmw)) {

Let's call it case (A).

We have another two special cases that are not during a rmap walk. They are:

  write_protect_page[1065]              if (!page_vma_mapped_walk(&pvmw))
  __replace_page[179]                   if (!page_vma_mapped_walk(&pvmw))

Let's call it case (B).

Case (A) is always safe because it always takes the i_mmap_rwsem lock in
read mode. It's done in rmap_walk_file(), where:

	if (!locked) {
		if (i_mmap_trylock_read(mapping))
			goto lookup;

		if (rwc->try_lock) {
			rwc->contended = true;
			return;
		}

		i_mmap_lock_read(mapping);
	}

If locked==true it means the caller already holds the lock, so there is no
need to take it. This justifies that all callers coming from rmap_walk() on
a hugetlb vma are already safe to call hugetlb_walk() according to the rule
of hugetlb_walk().

Case (B) contains two cases, either in the KSM path or the uprobe path, and
none of the paths (afaict) can get a hugetlb vma involved. IOW, the whole
path of

	if (unlikely(is_vm_hugetlb_page(vma))) {

in page_vma_mapped_walk() should just never trigger.

Summarizing the above into a shorter paragraph gives the comment.

Hope it explains. Thanks.

-- 
Peter Xu
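
P.S. In case it helps to see the rule as something executable, below is a
small userspace sketch of the reasoning above. It is only a toy model under
stated assumptions and not kernel code: every toy_* name is hypothetical,
the locks are modelled as plain booleans, and the "warning" is just a
counter standing in for the one hugetlb_walk() would emit.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a vma plus the two locks named in the rule. */
struct toy_vma {
	bool is_hugetlb;        /* models is_vm_hugetlb_page(vma) */
	bool vma_lock_held;     /* models the hugetlb vma lock */
	bool i_mmap_rwsem_held; /* models i_mmap_rwsem (read or write) */
};

/* Counts rule violations; stands in for a kernel warning. */
static int toy_warnings;

/* Models the rule: walking needs at least one of the two locks. */
static bool toy_hugetlb_walk(const struct toy_vma *vma)
{
	if (!vma->vma_lock_held && !vma->i_mmap_rwsem_held) {
		toy_warnings++;
		fprintf(stderr, "toy warning: hugetlb walk without a lock\n");
		return false;
	}
	return true;
}

/* Models page_vma_mapped_walk(): only the hugetlb branch needs the rule. */
static bool toy_pvmw(const struct toy_vma *vma)
{
	if (vma->is_hugetlb)
		return toy_hugetlb_walk(vma);
	return true; /* non-hugetlb vma: the rule does not apply */
}

/*
 * Models case (A): the rmap walk takes i_mmap_rwsem itself unless the
 * caller says it already holds it (locked == true).
 */
static bool toy_rmap_walk(struct toy_vma *vma, bool locked)
{
	bool took_lock = false;
	bool ok;

	if (!locked) {
		vma->i_mmap_rwsem_held = true;
		took_lock = true;
	}
	ok = toy_pvmw(vma);
	if (took_lock)
		vma->i_mmap_rwsem_held = false;
	return ok;
}
```

In this model, toy_rmap_walk() on a hugetlb vma never bumps toy_warnings
(case A), toy_pvmw() called directly on a non-hugetlb vma never reaches
toy_hugetlb_walk() (case B), and only a direct toy_pvmw() call on an
unlocked hugetlb vma trips the warning, which is the "when one day this
rule breaks" situation from the comment.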