linux-coco.lists.linux.dev archive mirror
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joerg Roedel <jroedel@suse.de>, Andi Kleen <ak@linux.intel.com>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	David Rientjes <rientjes@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Varad Gautam <varad.gautam@suse.com>,
	Dario Faggioli <dfaggioli@suse.com>,
	x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH 1/5] mm: Add support for unaccepted memory
Date: Fri, 13 Aug 2021 00:08:46 +0300
Message-ID: <20210812210846.bfalflrvn4bfpyyh@box.shutemov.name>
In-Reply-To: <f7667988-4d6c-461e-901d-a6c3612b2f0f@intel.com>

On Tue, Aug 10, 2021 at 01:50:57PM -0700, Dave Hansen wrote:
> On 8/10/21 11:13 AM, Dave Hansen wrote:
> >> @@ -1001,6 +1004,9 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
> >>  	if (page_reported(page))
> >>  		__ClearPageReported(page);
> >>  
> >> +	if (PageOffline(page))
> >> +		clear_page_offline(page, order);
> >> +
> >>  	list_del(&page->lru);
> >>  	__ClearPageBuddy(page);
> >>  	set_page_private(page, 0);
> > So, this is right in the fast path of the page allocator.  It's a
> > one-time thing per 2M page, so it's not permanent.
> > 
> > *But* there's both a global spinlock and a firmware call hidden in
> > clear_page_offline().  That's *GOT* to hurt if you were, for instance,
> > running a benchmark while this code path is being tickled.  Not just to
> > 
> > That could be just downright catastrophic for scalability, albeit
> > temporarily.
> 
> One more thing...
> 
> How long are these calls?  You have to make at least 512 calls into the
> SEAM module.  Assuming they're syscall-ish, so ~1,000 cycles each,
> that's ~500,000 cycles, even if we ignore the actual time it takes to
> zero that 2MB worth of memory and all other overhead within the SEAM module.

I hope to get away with 2 calls per 2M: one MapGPA and one TDACCEPTPAGE
(or 3 for MAXORDER -- 4M -- pages). I don't have any numbers yet.
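
To make the arithmetic in this exchange concrete, here is a minimal sketch in C. The ~1,000 cycles per call is the rough guess from the quoted message, not a measurement, and the helper names are hypothetical. It only counts call overhead and ignores zeroing and any work done inside the SEAM module.

```c
#include <assert.h>

/*
 * Number of accept calls needed to cover chunk_bytes when every call
 * accepts accept_bytes. Hypothetical helper, for illustration only.
 */
static unsigned long sketch_nr_calls(unsigned long chunk_bytes,
				     unsigned long accept_bytes)
{
	assert(accept_bytes != 0 && chunk_bytes % accept_bytes == 0);
	return chunk_bytes / accept_bytes;
}

/* Pure call overhead: calls times the assumed per-call cost in cycles. */
static unsigned long sketch_call_cycles(unsigned long calls,
					unsigned long cycles_per_call)
{
	return calls * cycles_per_call;
}
```

With 4k granularity, `sketch_nr_calls(2UL << 20, 4096)` gives 512 calls per 2M, which at the assumed 1,000 cycles is 512,000 cycles of call overhead. Two calls per 2M at the same assumed cost would be 2,000 cycles.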

> So, we're sitting on one CPU with interrupts off, blocking all the other
> CPUs from doing page allocation in this zone. 

I agree that's not good. Let's see if it's going to be okay with accepting
in 2M chunks.

> Then, we're holding a global lock which prevents any other NUMA nodes
> from accepting pages.

Looking at this again, the global lock is avoidable: the caller owns the
pfn range, so nobody else can touch these bits in the bitmap. We can replace
bitmap_clear() with a loop of atomic clear_bit() calls and drop the lock
completely.
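
A minimal userspace sketch of that idea, assuming C11 atomics as a stand-in for the kernel's clear_bit(). All names here are hypothetical and what one bit covers is left open; this is not the patch code. Each bit is cleared with its own atomic read-modify-write, so callers working on disjoint ranges cannot corrupt each other's bits even when the ranges share a word.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for the kernel's atomic clear_bit(). */
static void sketch_clear_bit(size_t nr, atomic_ulong *map)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_WORD);

	atomic_fetch_and(&map[nr / SKETCH_BITS_PER_WORD], ~mask);
}

/* Lockless read of a single bit. */
static bool sketch_test_bit(size_t nr, atomic_ulong *map)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_WORD);

	return (atomic_load(&map[nr / SKETCH_BITS_PER_WORD]) & mask) != 0;
}

/*
 * Mark bits [start, start + len) as accepted without taking a lock.
 * Assumes the caller exclusively owns that range, as argued above.
 */
static void sketch_mark_accepted(atomic_ulong *map, size_t start, size_t len)
{
	assert(map != NULL);

	for (size_t i = start; i < start + len; i++)
		sketch_clear_bit(i, map);
}
```

The cost of this design is one atomic operation per bit instead of a word-at-a-time clear, in exchange for removing the global lock.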

> If the other node happens to *try* to do an
> accept, it will sit with its zone lock held waiting for this one.

> Maybe nobody will ever notice.  But, it seems like an awfully big risk
> to me.  I'd at least *try* do these calls outside of the zone lock.
> Then the collateral damage will at least be limited to things doing
> accepts rather than all zone->lock users.
> 
> Couldn't we delay the acceptance to, say the place where we've dropped
> the zone->lock and do the __GFP_ZERO memset() like at prep_new_page()?
> Or is there some concern that the page has been split at that point?

It *will* be split by that point. If you ask for an order-0 page and there
are none left, the page allocator will try higher orders until it finds
something. At order-9 it would hit an unaccepted page. At that point the
page is going to be split and put on the free lists accordingly. That all
happens under the zone lock.

  __rmqueue_smallest ->
    del_page_from_free_list()
    expand()
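
The splitting step can be sketched in userspace C as follows. This models only the halving loop that expand() performs, with a hypothetical per-order record in place of the real free lists; the names and the number of orders are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_ORDERS 11	/* orders 0..10, assumed for illustration */

/* Hypothetical stand-in for a per-order free list. */
struct sketch_free_area {
	unsigned long first_pfn;	/* pfn of the last block queued here */
	unsigned int nr_free;		/* number of blocks queued here */
};

/*
 * Split a block of order 'high' starting at 'pfn' down to order 'low'.
 * On every step the upper half goes back on the free list of the next
 * lower order; the lower half is split further or handed to the caller.
 */
static void sketch_expand(struct sketch_free_area *area, unsigned long pfn,
			  unsigned int low, unsigned int high)
{
	unsigned long size = 1UL << high;

	assert(area != NULL && low <= high && high < SKETCH_NR_ORDERS);

	while (high > low) {
		high--;
		size >>= 1;
		area[high].first_pfn = pfn + size;
		area[high].nr_free++;
	}
}
```

Calling `sketch_expand(area, 0, 0, 9)` queues one block on each of orders 8 down to 0, covering 511 base pages in total, and leaves the single page at pfn 0 for the caller. By the time the zone lock is dropped, the original order-9 block no longer exists as one unit.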

> I guess that makes it more complicated because you might have a 4k page
> but you need to go accept a 2M page.  You might end up having to check
> the bitmap 511 more times because you might see 511 more PageOffline()
> pages come through.
> 
> You shouldn't even need the bitmap lock to read since it's a one-way
> trip from unaccepted->accepted.

Yeah. Unless we want to flip it back when making the range shared.
I think we do. Otherwise it will cause problems for kexec.

-- 
 Kirill A. Shutemov
