From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 6 Jan 2025 21:35:05 -0700
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email
2.47.1.613.gc27f4b7a9f-goog
Message-ID: <20250107043505.351925-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v1] mm/hugetlb_vmemmap: fix memory loads ordering
From: Yu Zhao
To: Andrew Morton
Cc: David Hildenbrand, Mateusz Guzik, "Matthew Wilcox (Oracle)", Muchun Song,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Will Deacon
Content-Type: text/plain; charset="UTF-8"

Using x86_64 as an example, for a 32KB struct page[] area describing a
2MB hugeTLB page, HVO reduces the area to 4KB by the following steps:

1. Split the (r/w vmemmap) PMD mapping the area into 512 (r/w) PTEs;
2. For the 8 PTEs mapping the area, remap PTE 1-7 to the page mapped by
   PTE 0, and at the same time change the permission from r/w to r/o;
3. Free the pages PTE 1-7 used to map, hence the reduction from 32KB
   to 4KB.

However, the following race can happen due to improper memory load
ordering:

CPU 1 (HVO)                     CPU 2 (speculative PFN walker)

page_ref_freeze()
synchronize_rcu()
                                rcu_read_lock()
                                page_is_fake_head() is false
vmemmap_remap_pte()
XXX: struct page[] becomes r/o
page_ref_unfreeze()
                                page_ref_count() is not zero
                                atomic_add_unless(&page->_refcount)
                                XXX: try to modify r/o struct page[]

Specifically, page_is_fake_head() must be ordered after page_ref_count()
on CPU 2 so that it can only return true for this case, to avoid the
later attempt to modify the r/o struct page[].

This patch adds the missing memory barrier and performs the tests on
page_is_fake_head() and page_ref_count() in the proper order.
Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
Reported-by: Will Deacon
Closes: https://lore.kernel.org/20241128142028.GA3506@willie-the-truck/
Signed-off-by: Yu Zhao
---
 include/linux/page-flags.h | 2 +-
 include/linux/page_ref.h   | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 691506bdf2c5..6b8ecf86f1b6 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -212,7 +212,7 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
 	 * cold cacheline in some cases.
 	 */
 	if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
-	    test_bit(PG_head, &page->flags)) {
+	    test_bit_acquire(PG_head, &page->flags)) {
 		/*
 		 * We can safely access the field of the @page[1] with PG_head
 		 * because the @page is a compound page composed with at least
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 8c236c651d1d..5becea98bd79 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -233,8 +233,12 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
 	bool ret = false;
 
 	rcu_read_lock();
-	/* avoid writing to the vmemmap area being remapped */
-	if (!page_is_fake_head(page) && page_ref_count(page) != u)
+	/*
+	 * To avoid writing to the vmemmap area remapped into r/o in parallel,
+	 * the page_ref_count() test must precede the page_is_fake_head() test
+	 * so that test_bit_acquire() in the latter is ordered after the former.
+	 */
+	if (page_ref_count(page) != u && !page_is_fake_head(page))
 		ret = atomic_add_unless(&page->_refcount, nr, u);
 	rcu_read_unlock();
-- 
2.47.1.613.gc27f4b7a9f-goog