* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-01-13 14:59 ` Eric W. Biederman
@ 2025-01-14 3:26 ` Baoquan He
2025-01-14 7:04 ` Yan Zhao
` (2 subsequent siblings)
3 siblings, 0 replies; 24+ messages in thread
From: Baoquan He @ 2025-01-14 3:26 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Kirill A. Shutemov, akpm, kexec, Yan Zhao, linux-kernel,
linux-coco, x86, rick.p.edgecombe, kirill.shutemov, security
Hi Eric,
On 01/13/25 at 08:59am, Eric W. Biederman wrote:
> Baoquan He <bhe@redhat.com> writes:
>
> > On 01/13/25 at 12:01pm, Kirill A. Shutemov wrote:
> >> On Fri, Dec 13, 2024 at 05:49:30PM +0800, Yan Zhao wrote:
> >> > Hi Eric,
> >> >
> >> > This is a repost of the patch "kexec_core: Accept unaccepted kexec
> >> > destination addresses" [1], rebased to v6.13-rc2.
> >>
> >> Can we get this patch applied?
> >
> > This looks good to me. In v1, we analyzed all other possible
> > solutions; the change in this patch seems the simplest and most
> > acceptable one.
>
> Truly? I will go back and look and see what I missed, but I haven't seen
> anything that addressed my original objections.
>
> To repeat my objection. The problem I saw was that the performance of
> the accepted memory paradigm was so terrible that they had to resort to
> lazily ``accepting'' memory, which leads to hacks in kexec. I would not
> like to include hacks in kexec just so that other people can avoid
> fixing their bugs.
>
> I did see a coherent explanation of the bad performance that pointed the
> finger squarely at the fact that everything is happening a page at a
> time. AKA that the design of the ACPI interface has a flaw that needs
> to be fixed.
>
> I really don't think we should be making complicated work-arounds for
> someone else's bad software decision just because someone immortalized
> their bad decision in a standard. Just accepting all of memory and
> letting the folks who made the bad decision deal with the consequences
> seems much more reasonable to me.
Ah, I didn't realize you object to the accept_memory feature itself, sorry
about that. I personally dislike accept_memory too, since the
DEFERRED_STRUCT_PAGE_INIT feature already exists to improve boot-time
memory initialization. That said, when it comes to passively providing RAM
to the guest system only when it is actually demanded, lazy acceptance can
help save RAM on a cloud provider's host system. That is where I think the
value of accept_memory lies, even though the Intel engineers avoid
declaring it formally.
Anyway, I would like to ack it on the basis that the accept_memory feature
has already been merged into the mainline kernel. If the feature itself is
objected to, the top priority is to discuss whether it should be removed
from the kernel, or how its use should be limited; either way, whether to
support it in kexec is a separate story.
Thanks a lot for sharing your thoughts and for the elaborate explanation.
Thanks
Baoquan
>
> > If Eric has no objection, maybe Andrew can help pick this into his
> > tree.
>
> I have a new objection. I believe ``unaccepted memory'' and especially
> lazily initialized ``unaccepted memory'' is an information leak that
> could defeat the purpose of encrypted memory. For that reason I have
> Cc'd the security list. I don't know who to CC to get expertise on this
> issue, and the security list folks should.
>
> Unless I am misunderstanding things the big idea with encrypted
> memory is that the hypervisor won't be able to figure out what you
> are doing, because it can't read your memory.
>
> My concern is that by making the ``acceptance'' of memory lazy, that
> there is a fairly strong indication of the function of different parts
> of memory. I expect that signal is strong enough to defeat whatever
> elements of memory address randomization that we implement in the
> kernel.
>
> So not only does it appear to me that implementation of ``accepting''
> memory has a stupidly slow implementation, somewhat enshrined by a bad
> page at a time ACPI standard, but it appears to me that lazily
> ``accepting'' that memory probably defeats the purpose of having
> encrypted memory.
>
> I think the actual solution is to remove all code except for the
> "accept_memory=eager" code paths. AKA delete the "accept_memory=lazy"
> code. At that point there are no more changes that need to be made to
> kexec.
>
> Eric
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-01-13 14:59 ` Eric W. Biederman
2025-01-14 3:26 ` Baoquan He
@ 2025-01-14 7:04 ` Yan Zhao
2025-01-14 10:08 ` Kirill A. Shutemov
2025-02-13 15:55 ` Dave Hansen
3 siblings, 0 replies; 24+ messages in thread
From: Yan Zhao @ 2025-01-14 7:04 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Baoquan He, Kirill A. Shutemov, akpm, kexec, linux-kernel,
linux-coco, x86, rick.p.edgecombe, kirill.shutemov, security
On Mon, Jan 13, 2025 at 08:59:29AM -0600, Eric W. Biederman wrote:
> Baoquan He <bhe@redhat.com> writes:
>
> > On 01/13/25 at 12:01pm, Kirill A. Shutemov wrote:
> >> On Fri, Dec 13, 2024 at 05:49:30PM +0800, Yan Zhao wrote:
> >> > Hi Eric,
> >> >
> >> > This is a repost of the patch "kexec_core: Accept unaccepted kexec
> >> > destination addresses" [1], rebased to v6.13-rc2.
> >>
> >> Can we get this patch applied?
> >
> > This looks good to me. In v1, we analyzed all other possible
> > solutions; the change in this patch seems the simplest and most
> > acceptable one.
>
> Truly? I will go back and look and see what I missed, but I haven't seen
> anything that addressed my original objections.
>
> To repeat my objection. The problem I saw was that the performance of
> the accepted memory paradigm was so terrible that they had to resort to
> lazily ``accepting'' memory, which leads to hacks in kexec. I would not
> like to include hacks in kexec just so that other people can avoid
> fixing their bugs.
Hi Eric,
Your previous concerns in v1 [1] include:
1. "an unfiltered accept_memory may result in memory that has already been
``accepted'' being accepted again."
2. "target kernel won't know about accepting memory, or might not perform
the work early enough and try to use memory without accepting it first."
3. "this has the potential to conflict with the accounting in
try_to_accept_memory"
For 1/2, as we explained in [2], accept_memory() is not unfiltered. A bitmap
in the virtual firmware maintains the accepted/unaccepted status, and the
bitmap is passed across the kernels.
For 3, sorry that I didn't explain it clearly enough in v1, so I explained
it in detail in the v2 cover letter (please check bullet 6 in [3]).
The accounting in try_to_accept_memory_one() includes
zone->unaccepted_pages,
zone_page_state(zone, NR_FREE_PAGES),
zone_page_state(zone, NR_UNACCEPTED),
which are updated in try_to_accept_memory_one()-->__accept_page().
However, this accounting is not affected by invoking accept_memory()
in kexec, since accept_memory() does not modify those counters, and that
is correct given how "accept_memory=lazy" works:
(1) When releasing free pages to the buddy allocator in
memblock_free_all() during kernel boot, "accept_memory=lazy" withholds
some pages from the buddy allocator by recording them in the
zone->unaccepted_pages list. NR_FREE_PAGES and NR_UNACCEPTED are
increased accordingly. NR_UNACCEPTED just counts pages that are
potentially available but currently not on the buddy's freelists; it
does not mean all of those pages must be in unaccepted status
(see __free_pages_core() and __free_unaccepted()).
(2) When the kernel runs low on memory, indicated by insufficient
NR_UNACCEPTED, it invokes
cond_accept_memory()-->try_to_accept_memory_one()-->__accept_page() to
move pages from zone->unaccepted_pages to the buddy's freelists, and
further calls accept_memory() to accept those pages.
Before (2), although accept_memory() can also accept a page, the page is
not available to the buddy allocator and hence not available to other
kernel components. When accept_memory() is invoked in (2), the page will
not be re-accepted.
The reason this series has kexec invoke accept_memory() on the kexec
segments' destination addresses is that those addresses are not necessarily
allocated by the first kernel's buddy allocator. So, before kexec accesses
those pages (which can happen earlier than the second kernel does), we
invoke accept_memory() to trigger the physical page allocation in the host,
the GFN->PFN mapping, and the physical page initialization and encryption.
After that, kexec can copy source pages into the destination pages and
start the transition to the second kernel.
With that, do you still think this patch is a hack?
[1] https://lore.kernel.org/all/87frop8r0y.fsf@email.froward.int.ebiederm.org/
[2] https://lore.kernel.org/all/tpbcun3d4wrnbtsvx3b3hjpdl47f2zuxvx6zqsjoelazdt3eyv@kgqnedtcejta/
[3] https://lore.kernel.org/all/20241213094930.748-1-yan.y.zhao@intel.com
>
> I did see a coherent explanation of the bad performance that pointed the
> finger squarely at the fact that everything is happening a page at a
> time. AKA that the design of the ACPI interface has a flaw that needs
> to be fixed.
By flaw, do you mean accepting memory page by page?
accept_memory() only takes effect in a guest, and it demands physical page
allocation in the host OS, which is slow by nature. It is also true that
once a page has been accepted, it cannot be swapped out on the host.
> I really don't think we should be making complicated work-arounds for
> someone else's bad software decision just because someone immortalized
> their bad decision in a standard. Just accepting all of memory and
> letting the folks who made the bad decision deal with the consequences
> seems much more reasonable to me.
>
> > If Eric has no objection, maybe Andrew can help pick this into his
> > tree.
>
> I have a new objection. I believe ``unaccepted memory'' and especially
> lazily initialized ``unaccepted memory'' is an information leak that
> could defeat the purpose of encrypted memory. For that reason I have
Do you mean that before the lazy acceptance, the host can access the page?
Or are you referring to the fact that the host learns the GFN of a page
when it responds to the page allocation request?
For the former, the page is regarded as private by the guest only after it
is accepted in the guest, so no data is leaked before the guest completes
accept_memory(), which initializes the memory to 0 and encrypts it.
For the latter, the lazy memory acceptance still happens in bulk, i.e. not
in response to the guest accessing a specific piece of memory.
So, I can't see what information is leaked.
> Cc'd the security list. I don't know who to CC to get expertise on this
> issue, and the security list folks should.
>
> Unless I am misunderstanding things the big idea with encrypted
> memory is that the hypervisor won't be able to figure out what you
> are doing, because it can't read your memory.
There might be some misunderstanding.
A page will only be regarded as private by the guest after the guest's explicit
acceptance of the memory.
>
> My concern is that by making the ``acceptance'' of memory lazy, that
> there is a fairly strong indication of the function of different parts
> of memory. I expect that signal is strong enough to defeat whatever
> elements of memory address randomization that we implement in the
> kernel.
Memory address randomization also invokes accept_memory(), in
extract_kernel().
> So not only does it appear to me that implementation of ``accepting''
> memory has a stupidly slow implementation, somewhat enshrined by a bad
> page at a time ACPI standard, but it appears to me that lazily
> ``accepting'' that memory probably defeats the purpose of having
> encrypted memory.
Lazy acceptance is not only for performance, but also for memory
over-commitment.
I hope the above explanations have addressed your concerns.
Please let me know if anything still doesn't sound correct to you.
Thanks
Yan
> I think the actual solution is to remove all code except for the
> "accept_memory=eager" code paths. AKA delete the "accept_memory=lazy"
> code. At that point there are no more changes that need to be made to
> kexec.
>
> Eric
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-01-13 14:59 ` Eric W. Biederman
2025-01-14 3:26 ` Baoquan He
2025-01-14 7:04 ` Yan Zhao
@ 2025-01-14 10:08 ` Kirill A. Shutemov
2025-02-13 15:55 ` Dave Hansen
3 siblings, 0 replies; 24+ messages in thread
From: Kirill A. Shutemov @ 2025-01-14 10:08 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Baoquan He, akpm, kexec, Yan Zhao, linux-kernel, linux-coco, x86,
rick.p.edgecombe, kirill.shutemov, security
On Mon, Jan 13, 2025 at 08:59:29AM -0600, Eric W. Biederman wrote:
> Baoquan He <bhe@redhat.com> writes:
>
> > On 01/13/25 at 12:01pm, Kirill A. Shutemov wrote:
> >> On Fri, Dec 13, 2024 at 05:49:30PM +0800, Yan Zhao wrote:
> >> > Hi Eric,
> >> >
> >> > This is a repost of the patch "kexec_core: Accept unaccepted kexec
> >> > destination addresses" [1], rebased to v6.13-rc2.
> >>
> >> Can we get this patch applied?
> >
> > This looks good to me. In v1, we analyzed all other possible
> > solutions; the change in this patch seems the simplest and most
> > acceptable one.
>
> Truly? I will go back and look and see what I missed, but I haven't seen
> anything that addressed my original objections.
>
> To repeat my objection. The problem I saw was that the performance of
> the accepted memory paradigm was so terrible that they had to resort to
> lazily ``accepting'' memory, which leads to hacks in kexec. I would not
> like to include hacks in kexec just so that other people can avoid
> fixing their bugs.
>
> I did see a coherent explanation of the bad performance that pointed the
> finger squarely at the fact that everything is happening a page at a
> time. AKA that the design of the ACPI interface has a flaw that needs
> to be fixed.
The interface of accepting memory is platform specific. EFI (not ACPI)
only provides a way to enumerate which memory is unaccepted.
> I really don't think we should be making complicated work-arounds for
> someone else's bad software decision just because someone immortalized
> their bad decision in a standard. Just accepting all of memory and
> letting the folks who made the bad decision deal with the consequences
> seems much more reasonable to me.
Note that these work-arounds are needed only because kexec allocates
memory in a hackish way, bypassing the page allocator.
I don't like that unaccepted-memory details leak into the kexec code
either. But it happens because kexec is special and requires special
handling.
> > If Eric has no objection, maybe Andrew can help pick this into his
> > tree.
>
> I have a new objection. I believe ``unaccepted memory'' and especially
> lazily initialized ``unaccepted memory'' is an information leak that
> could defeat the purpose of encrypted memory. For that reason I have
> Cc'd the security list. I don't know who to CC to get expertise on this
> issue, and the security list folks should.
>
> Unless I am misunderstanding things the big idea with encrypted
> memory is that the hypervisor won't be able to figure out what you
> are doing, because it can't read your memory.
>
> My concern is that by making the ``acceptance'' of memory lazy, that
> there is a fairly strong indication of the function of different parts
> of memory. I expect that signal is strong enough to defeat whatever
> elements of memory address randomization that we implement in the
> kernel.
>
> So not only does it appear to me that implementation of ``accepting''
> memory has a stupidly slow implementation, somewhat enshrined by a bad
> page at a time ACPI standard, but it appears to me that lazily
> ``accepting'' that memory probably defeats the purpose of having
> encrypted memory.
>
> I think the actual solution is to remove all code except for the
> "accept_memory=eager" code paths. AKA delete the "accept_memory=lazy"
> code. At that point there are no more changes that need to be made to
> kexec.
It is outside of the TDX (and, I believe, SEV) threat model. In the TDX
case, the VMM can block access to an arbitrary guest memory range, which
causes a TD-exit if the guest touches it. The blocking is required for
some memory maintenance operations, like promoting 4k pages to 2M or
relocating a guest page to a different host physical address.
Lazy memory acceptance doesn't change anything from a security PoV here.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-01-13 14:59 ` Eric W. Biederman
` (2 preceding siblings ...)
2025-01-14 10:08 ` Kirill A. Shutemov
@ 2025-02-13 15:55 ` Dave Hansen
2025-02-14 13:46 ` Kirill A. Shutemov
3 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2025-02-13 15:55 UTC (permalink / raw)
To: Eric W. Biederman, Baoquan He
Cc: Kirill A. Shutemov, akpm, kexec, Yan Zhao, linux-kernel,
linux-coco, x86, rick.p.edgecombe, kirill.shutemov, security
On 1/13/25 06:59, Eric W. Biederman wrote:
...
> I have a new objection. I believe ``unaccepted memory'' and especially
> lazily initialized ``unaccepted memory'' is an information leak that
> could defeat the purpose of encrypted memory. For that reason I have
> Cc'd the security list. I don't know who to CC to get expertise on this
> issue, and the security list folks should.
>
> Unless I am misunderstanding things the big idea with encrypted
> memory is that the hypervisor won't be able to figure out what you
> are doing, because it can't read your memory.
At a super high level, you are right. Accepting memory tells the
hypervisor that the guest is _allocating_ memory. It even tells the host
what the guest physical address of the memory is. But that's far below
the standard we've usually exercised in the kernel for rejecting on
security concerns.
Did anyone on the security list raise any issues here? I've asked them
about a few things in the past and usually I've thought that no news is
good news.
> My concern is that by making the ``acceptance'' of memory lazy, that
> there is a fairly strong indication of the function of different parts
> of memory. I expect that signal is strong enough to defeat whatever
> elements of memory address randomization that we implement in the
> kernel.
In the end, the information that the hypervisor gets is that the guest
allocated _some_ page within a 4MB physical region, and when it did so. It
gets that signal once per boot for each region. It will mostly see a
pattern of acceptance going top-down from high to low physical addresses.
The hypervisor never learns anything about KASLR. The fact that the
physical allocation patterns are predictable (with or without memory
acceptance) is one of the reasons KASLR is in place.
I don't think memory acceptance has any real impact on "memory address
randomization". This is especially true because it's a once-per-boot
signal, not a continuous thing that can be leveraged. 4MB is also
awfully coarse.
> So not only does it appear to me that implementation of ``accepting''
> memory has a stupidly slow implementation, somewhat enshrined by a bad
> page at a time ACPI standard, but it appears to me that lazily
> ``accepting'' that memory probably defeats the purpose of having
> encrypted memory.
Memory acceptance is pitifully slow. But it's slow because it
fundamentally requires getting guest memory into a known state before
guest use. You either have slow memory acceptance as a thing or you have
slow guest boot.
Are there any other CoCo systems that don't have to zero memory like TDX
does? On the x86 side, we have SGX and the various flavors of SEV. They
all, as far as I know, require some kind of slow "conversion" process when
pages change security domains.
> I think the actual solution is to remove all code except for the
> "accept_memory=eager" code paths. AKA delete the "accept_memory=lazy"
> code. At that point there are no more changes that need to be made to
> kexec.
That was my first instinct too: lazy acceptance is too complicated to
live and must die.
It sounds like you're advocating for the "slow guest boot" option.
Kirill, can you remind us how fast a guest boots to the shell for
modestly-sized (say 256GB) memory with "accept_memory=eager" versus
"accept_memory=lazy"? IIRC, it was a pretty remarkable difference.
Eric, I wasn't planning on ripping the lazy acceptance code out of
arch/x86. I haven't heard any rumblings from the mm folks that it's
causing problems over there either. This seems like something we want to
fix and I _think_ the core kexec code is the right place to fix this issue.
There are definitely ways to work around this in arch code, but they
seem rather distasteful and I'd rather not go there.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-02-13 15:55 ` Dave Hansen
@ 2025-02-14 13:46 ` Kirill A. Shutemov
2025-02-14 16:20 ` Dave Hansen
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Kirill A. Shutemov @ 2025-02-14 13:46 UTC (permalink / raw)
To: Dave Hansen
Cc: Eric W. Biederman, Baoquan He, Kirill A. Shutemov, akpm, kexec,
Yan Zhao, linux-kernel, linux-coco, x86, rick.p.edgecombe,
security
On Thu, Feb 13, 2025 at 07:55:15AM -0800, Dave Hansen wrote:
> On 1/13/25 06:59, Eric W. Biederman wrote:
> ...
> > I have a new objection. I believe ``unaccepted memory'' and especially
> > lazily initialized ``unaccepted memory'' is an information leak that
> > could defeat the purpose of encrypted memory. For that reason I have
> > Cc'd the security list. I don't know who to CC to get expertise on this
> > issue, and the security list folks should.
> >
> > Unless I am misunderstanding things the big idea with encrypted
> > memory is that the hypervisor won't be able to figure out what you
> > are doing, because it can't read your memory.
>
> At a super high level, you are right. Accepting memory tells the
> hypervisor that the guest is _allocating_ memory. It even tells the host
> what the guest physical address of the memory is. But that's far below
> the standard we've usually exercised in the kernel for rejecting on
> security concerns.
>
> Did anyone on the security list raise any issues here? I've asked them
> about a few things in the past and usually I've thought that no news is
> good news.
>
> > My concern is that by making the ``acceptance'' of memory lazy, that
> > there is a fairly strong indication of the function of different parts
> > of memory. I expect that signal is strong enough to defeat whatever
> > elements of memory address randomization that we implement in the
> > kernel.
>
> In the end, the information that the hypervisor gets is that the guest
> allocated _some_ page within a 4MB physical region and the time. It gets
> that signal once per boot for each region. It will mostly see a pattern
> of acceptance going top-down from high to low physical addresses.
>
> The hypervisor never learns anything about KASLR. The fact that the
> physical allocation patterns are predictable (with or without memory
> acceptance) is one of the reasons KASLR is in place.
>
> I don't think memory acceptance has any real impact on "memory address
> randomization". This is especially true because it's a once-per-boot
> signal, not a continuous thing that can be leveraged. 4MB is also
> awfully coarse.
>
> > So not only does it appear to me that implementation of ``accepting''
> > memory has a stupidly slow implementation, somewhat enshrined by a bad
> > page at a time ACPI standard, but it appears to me that lazily
> > ``accepting'' that memory probably defeats the purpose of having
> > encrypted memory.
>
> Memory acceptance is pitifully slow. But it's slow because it
> fundamentally requires getting guest memory into a known state before
> guest use. You either have slow memory acceptance as a thing or you have
> slow guest boot.
>
> Are there any other CoCo systems that don't have to zero memory like TDX
> does? On the x86 side, we have SGX the various flavors of SEV. They all,
> as far as I know, require some kind of slow "conversion" process when
> pages change security domains.
>
> > I think the actual solution is to remove all code except for the
> > "accept_memory=eager" code paths. AKA delete the "accept_memory=lazy"
> > code. At that point there are no more changes that need to be made to
> > kexec.
>
> That was my first instinct too: lazy acceptance is too complicated to
> live and must die.
>
> It sounds like you're advocating for the "slow guest boot" option.
> Kirill, can you remind us how fast a guest boots to the shell for
> modestly-sized (say 256GB) memory with "accept_memory=eager" versus
> "accept_memory=lazy"? IIRC, it was a pretty remarkable difference.
I only have a 128GB machine readily available, and I posted some numbers
in another thread[1]:
On a single vCPU it takes about a minute to accept 90GiB of memory.
It improves a bit with the number of vCPUs. It is 40 seconds with 4 vCPUs,
but it doesn't scale past that in my setup.
I've mentioned it before in another thread:
[1] https://lore.kernel.org/all/ihzvi5pwn5hrn4ky2ehjqztjxoixaiaby4igmeihqfehy2vrii@tsg6j5qvmyrm
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-02-14 13:46 ` Kirill A. Shutemov
@ 2025-02-14 16:20 ` Dave Hansen
2025-03-04 8:41 ` Kirill A. Shutemov
2025-02-19 23:03 ` Jianxiong Gao
2025-02-20 2:27 ` Ashish Kalra
2 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2025-02-14 16:20 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Eric W. Biederman, Baoquan He, Kirill A. Shutemov, akpm, kexec,
Yan Zhao, linux-kernel, linux-coco, x86, rick.p.edgecombe,
security
On 2/14/25 05:46, Kirill A. Shutemov wrote:
>> It sounds like you're advocating for the "slow guest boot" option.
>> Kirill, can you remind us how fast a guest boots to the shell for
>> modestly-sized (say 256GB) memory with "accept_memory=eager" versus
>> "accept_memory=lazy"? IIRC, it was a pretty remarkable difference.
> I only have 128GB machine readily available and posted some number on
> other thread[1]:
>
> On single vCPU it takes about a minute to accept 90GiB of memory.
>
> It improves a bit with number of vCPUs. It is 40 seconds with 4 vCPU, but
> it doesn't scale past that in my setup.
>
> I've mentioned it before in other thread:
>
> [1] https://lore.kernel.org/all/ihzvi5pwn5hrn4ky2ehjqztjxoixaiaby4igmeihqfehy2vrii@tsg6j5qvmyrm
Oh, wow, from that other thread, you've been trying to get this crash
fix accepted since November?
From the looks of it, Eric stopped responding to that thread. I _think_
you gave a reasonable explanation of why memory acceptance is slow. He
then popped back up last month raising security concerns. But I don't
see anyone that shares those concerns.
The unaccepted memory stuff is also _already_ touching the page
allocator. If it's a dumb idea, then we should be gleefully ripping it
out of the page allocator, not rejecting a 2-line kexec patch.
Baoquan has also said this looks good to him.
I'm happy to give Eric another week to respond in case he's on vacation
or something, but I'm honestly not seeing a good reason to hold this bug
fix up.
Andrew, is this the kind of thing you can stick into mm and hold on to
for a bit while we give Eric time to respond?
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-02-14 16:20 ` Dave Hansen
@ 2025-03-04 8:41 ` Kirill A. Shutemov
2025-03-04 18:49 ` Eric W. Biederman
0 siblings, 1 reply; 24+ messages in thread
From: Kirill A. Shutemov @ 2025-03-04 8:41 UTC (permalink / raw)
To: akpm, Eric W. Biederman
Cc: Dave Hansen, Baoquan He, Kirill A. Shutemov, kexec, Yan Zhao,
linux-kernel, linux-coco, x86, rick.p.edgecombe, security
On Fri, Feb 14, 2025 at 08:20:07AM -0800, Dave Hansen wrote:
> On 2/14/25 05:46, Kirill A. Shutemov wrote:
> >> It sounds like you're advocating for the "slow guest boot" option.
> >> Kirill, can you remind us how fast a guest boots to the shell for
> >> modestly-sized (say 256GB) memory with "accept_memory=eager" versus
> >> "accept_memory=lazy"? IIRC, it was a pretty remarkable difference.
> > I only have 128GB machine readily available and posted some number on
> > other thread[1]:
> >
> > On single vCPU it takes about a minute to accept 90GiB of memory.
> >
> > It improves a bit with number of vCPUs. It is 40 seconds with 4 vCPU, but
> > it doesn't scale past that in my setup.
> >
> > I've mentioned it before in other thread:
> >
> > [1] https://lore.kernel.org/all/ihzvi5pwn5hrn4ky2ehjqztjxoixaiaby4igmeihqfehy2vrii@tsg6j5qvmyrm
>
> Oh, wow, from that other thread, you've been trying to get this crash
> fix accepted since November?
>
> From the looks of it, Eric stopped responding to that thread. I _think_
> you gave a reasonable explanation of why memory acceptance is slow. He
> then popped back up last month raising security concerns. But I don't
> see anyone that shares those concerns.
>
> The unaccepted memory stuff is also _already_ touching the page
> allocator. If it's a dumb idea, then we should be gleefully ripping it
> out of the page allocator, not rejecting a 2-line kexec patch.
>
> Baoquan has also said this looks good to him.
>
> I'm happy to give Eric another week to respond in case he's on vacation
> or something, but I'm honestly not seeing a good reason to hold this bug
> fix up.
>
> Andrew, is this the kind of thing you can stick into mm and hold on to
> for a bit while we give Eric time to respond?
Andrew, Eric, can we get this patch in?
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-03-04 8:41 ` Kirill A. Shutemov
@ 2025-03-04 18:49 ` Eric W. Biederman
2025-03-04 19:16 ` Dave Hansen
0 siblings, 1 reply; 24+ messages in thread
From: Eric W. Biederman @ 2025-03-04 18:49 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: akpm, Dave Hansen, Baoquan He, Kirill A. Shutemov, kexec,
Yan Zhao, linux-kernel, linux-coco, x86, rick.p.edgecombe,
security
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> On Fri, Feb 14, 2025 at 08:20:07AM -0800, Dave Hansen wrote:
>> On 2/14/25 05:46, Kirill A. Shutemov wrote:
>> >> It sounds like you're advocating for the "slow guest boot" option.
>> >> Kirill, can you remind us how fast a guest boots to the shell for
>> >> modestly-sized (say 256GB) memory with "accept_memory=eager" versus
>> >> "accept_memory=lazy"? IIRC, it was a pretty remarkable difference.
>> > I only have 128GB machine readily available and posted some number on
>> > other thread[1]:
>> >
>> > On single vCPU it takes about a minute to accept 90GiB of memory.
>> >
>> > It improves a bit with number of vCPUs. It is 40 seconds with 4 vCPU, but
>> > it doesn't scale past that in my setup.
>> >
>> > I've mentioned it before in other thread:
>> >
>> > [1] https://lore.kernel.org/all/ihzvi5pwn5hrn4ky2ehjqztjxoixaiaby4igmeihqfehy2vrii@tsg6j5qvmyrm
>>
>> Oh, wow, from that other thread, you've been trying to get this crash
>> fix accepted since November?
>>
>> From the looks of it, Eric stopped responding to that thread. I _think_
>> you gave a reasonable explanation of why memory acceptance is slow. He
>> then popped back up last month raising security concerns. But I don't
>> see anyone that shares those concerns.
>>
>> The unaccepted memory stuff is also _already_ touching the page
>> allocator. If it's a dumb idea, then we should be gleefully ripping it
>> out of the page allocator, not rejecting a 2-line kexec patch.
>>
>> Baoquan has also said this looks good to him.
>>
>> I'm happy to give Eric another week to respond in case he's on vacation
>> or something, but I'm honestly not seeing a good reason to hold this bug
>> fix up.
>>
>> Andrew, is this the kind of thing you can stick into mm and hold on to
>> for a bit while we give Eric time to respond?
>
> Andrew, Eric, can we get this patch in?
How goes the work to fix this horrifically slow firmware interface?
Eric
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-03-04 18:49 ` Eric W. Biederman
@ 2025-03-04 19:16 ` Dave Hansen
2025-03-12 20:33 ` Dave Hansen
0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2025-03-04 19:16 UTC (permalink / raw)
To: Eric W. Biederman, Kirill A. Shutemov
Cc: akpm, Baoquan He, Kirill A. Shutemov, kexec, Yan Zhao,
linux-kernel, linux-coco, x86, rick.p.edgecombe, security
On 3/4/25 10:49, Eric W. Biederman wrote:
> How goes the work to fix this horrifically slow firmware interface?
The firmware interface isn't actually all that slow.
The fundamental requirement is that confidential computing environments
need to be handed memory in a known-benign state. For AMD SEV,
historically, that's meant doing things like flushing the caches so that
stale cache evictions don't overwrite the newly converted data. For SGX,
it's meant having the CPU zero pages (in microcode) before adding them to an enclave.
For TDX, it's meant ensuring that TDX protections are in place, like the
memory integrity metadata and the "TD bit". Those can't actually be set until
the page has been assigned to a TD, since the integrity data depends
on the per-TD encryption key. And the memory integrity and TD bit are
stored waaaaaaaay out in DRAM because they're pretty large structures
and aren't practical to store inside the CPU.
Even when the firmware isn't in play, it's still expensive to "convert"
pages back and forth to protected or not. See __prep_encrypted_page in
the MKTME series[1], for example. It was quite slow, requiring memset()s
and cache flushing, even though there was no firmware in sight. That's
exactly what the firmware is doing when you ask it to accept memory.
In other words, the process of ensuring that memory is sanitized before
going into a confidential computing environment is slow, not the
firmware interface.
I think what you're effectively asking for is either making DRAM faster,
or some other architecture that doesn't rely on going all the way out to
DRAM to sanitize a page.
1.
https://lore.kernel.org/lkml/20190731150813.26289-8-kirill.shutemov@linux.intel.com/t/
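[Editorial aside: the mechanism discussed above can be sketched as a userspace toy model. All names and numbers here are made up for illustration; this is not the kernel's real API.]

```python
# Toy model of lazy memory acceptance. UNIT mirrors the ~4MB acceptance
# granularity mentioned elsewhere in this thread; nothing here is real
# kernel code.

UNIT = 4 << 20  # 4 MiB acceptance granularity

class Guest:
    def __init__(self):
        self.accepted = set()    # indices of units already accepted
        self.sanitize_calls = 0  # counts the expensive "conversions"

    def accept_memory(self, start, size):
        """Accept every unit overlapping [start, start + size)."""
        first = start // UNIT
        last = (start + size - 1) // UNIT
        for unit in range(first, last + 1):
            if unit not in self.accepted:
                # This is the slow part on real hardware: cache flushes,
                # zeroing, and integrity metadata updates out in DRAM.
                self.sanitize_calls += 1
                self.accepted.add(unit)

    def alloc_page(self, addr):
        # Lazy path: the allocator accepts a unit the first time a page
        # inside it is handed out -- this is the sense in which
        # unaccepted memory already touches the page allocator.
        self.accept_memory(addr, 4096)

    def load_kexec_segment(self, dest, size):
        # The small fix under discussion: accept the destination range
        # before copying the segment into it, so the copy never lands in
        # still-unaccepted memory.
        self.accept_memory(dest, size)

g = Guest()
g.load_kexec_segment(dest=16 << 20, size=10 << 20)  # spans 3 units
```

Acceptance is idempotent per unit, so re-accepting an already-accepted range costs nothing; the expense is paid once per boot per unit, which is also why the information the host learns is a once-per-boot signal.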
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-03-04 19:16 ` Dave Hansen
@ 2025-03-12 20:33 ` Dave Hansen
0 siblings, 0 replies; 24+ messages in thread
From: Dave Hansen @ 2025-03-12 20:33 UTC (permalink / raw)
To: Eric W. Biederman, Kirill A. Shutemov
Cc: akpm, Baoquan He, Kirill A. Shutemov, kexec, Yan Zhao,
linux-kernel, linux-coco, x86, rick.p.edgecombe, security
On 3/4/25 11:16, Dave Hansen wrote:
> On 3/4/25 10:49, Eric W. Biederman wrote:
>> How goes the work to fix this horrifically slow firmware interface?
> The firmware interface isn't actually all that slow.
Hey Eric,
I've noticed a trend on this series. It seems like every time there's
some forward progress on a fix, you pop up and ask a question. Someone
answers the question. Then, a couple of months later, you seem to pop up
again and ask another form of the same question. It kinda seems to me
like you may not be thoroughly reading the answers from the previous
round of discussion. Or, maybe you're like me and have a hard time
recalling any discussion from more than a week ago. ;)
Either way, I hope you're finally convinced that the hardware design
here is reasonable.
If not, I'd really like to continue the conversation now when this is
all fresh in our heads instead of having to poke at cold brain cells in
another month.
Any more questions, or can we finally put this issue to bed?
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-02-14 13:46 ` Kirill A. Shutemov
2025-02-14 16:20 ` Dave Hansen
@ 2025-02-19 23:03 ` Jianxiong Gao
2025-02-20 2:27 ` Ashish Kalra
2 siblings, 0 replies; 24+ messages in thread
From: Jianxiong Gao @ 2025-02-19 23:03 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Dave Hansen, Eric W. Biederman, Baoquan He, Kirill A. Shutemov,
akpm, kexec, Yan Zhao, linux-kernel, linux-coco, x86,
rick.p.edgecombe, security
> > It sounds like you're advocating for the "slow guest boot" option.
> > Kirill, can you remind us how fast a guest boots to the shell for
> > modestly-sized (say 256GB) memory with "accept_memory=eager" versus
> > "accept_memory=lazy"? IIRC, it was a pretty remarkable difference.
>
> I only have a 128GB machine readily available and posted some numbers in
> another thread[1]:
>
> On a single vCPU it takes about a minute to accept 90GiB of memory.
>
> It improves a bit with the number of vCPUs. It is 40 seconds with 4 vCPUs, but
> it doesn't scale past that in my setup.
>
We have seen similar boot performance improvements on our larger VM
shapes. Both lazy accept and kexec with TDX are important features for us.
--
Jianxiong Gao
* Re: [PATCH v2 0/1] Accept unaccepted kexec segments' destination addresses
2025-02-14 13:46 ` Kirill A. Shutemov
2025-02-14 16:20 ` Dave Hansen
2025-02-19 23:03 ` Jianxiong Gao
@ 2025-02-20 2:27 ` Ashish Kalra
2 siblings, 0 replies; 24+ messages in thread
From: Ashish Kalra @ 2025-02-20 2:27 UTC (permalink / raw)
To: kirill.shutemov
Cc: akpm, bhe, dave.hansen, ebiederm, kexec, kirill, linux-coco,
linux-kernel, rick.p.edgecombe, security, x86, yan.y.zhao,
thomas.lendacky, michael.roth
> On Thu, Feb 13, 2025 at 07:55:15AM -0800, Dave Hansen wrote:
>> On 1/13/25 06:59, Eric W. Biederman wrote:
>> ...
>> > I have a new objection. I believe ``unaccepted memory'' and especially
>> > lazily initialized ``unaccepted memory'' is an information leak that
>> > could defeat the purpose of encrypted memory. For that reason I have
>> > Cc'd the security list. I don't know who to CC to get expertise on this
>> > issue, and the security list folks should.
>> >
>> > Unless I am misunderstanding things the big idea with encrypted
>> > memory is that the hypervisor won't be able to figure out what you
>> > are doing, because it can't read your memory.
>>
>> At a super high level, you are right. Accepting memory tells the
>> hypervisor that the guest is _allocating_ memory. It even tells the host
>> what the guest physical address of the memory is. But that's far below
>> the standard we've usually exercised in the kernel for rejecting on
>> security concerns.
>>
>> Did anyone on the security list raise any issues here? I've asked them
>> about a few things in the past and usually I've thought that no news is
>> good news.
>>
>> > My concern is that by making the ``acceptance'' of memory lazy, that
>> > there is a fairly strong indication of the function of different parts
>> > of memory. I expect that signal is strong enough to defeat whatever
>> > elements of memory address randomization that we implement in the
>> > kernel.
>>
>> In the end, the information that the hypervisor gets is that the guest
>> allocated _some_ page within a 4MB physical region and the time. It gets
>> that signal once per boot for each region. It will mostly see a pattern
>> of acceptance going top-down from high to low physical addresses.
>>
>> The hypervisor never learns anything about KASLR. The fact that the
>> physical allocation patterns are predictable (with or without memory
>> acceptance) is one of the reasons KASLR is in place.
>>
>> I don't think memory acceptance has any real impact on "memory address
>> randomization". This is especially true because it's a once-per-boot
>> signal, not a continuous thing that can be leveraged. 4MB is also
>> awfully coarse.
>>
>> > So not only does it appear to me that implementation of ``accepting''
>> > memory has a stupidly slow implementation, somewhat enshrined by a bad
>> > page at a time ACPI standard, but it appears to me that lazily
>> > ``accepting'' that memory probably defeats the purpose of having
>> > encrypted memory.
>>
>> Memory acceptance is pitifully slow. But it's slow because it
>> fundamentally requires getting guest memory into a known state before
>> guest use. You either have slow memory acceptance as a thing or you have
>> slow guest boot.
>>
>> Are there any other CoCo systems that don't have to zero memory like TDX
>> does? On the x86 side, we have SGX and the various flavors of SEV. They all,
>> as far as I know, require some kind of slow "conversion" process when
>> pages change security domains.
>>
>> > I think the actual solution is to remove all code except for the
>> > "accept_memory=eager" code paths. AKA delete the "accept_memory=lazy"
>> > code. At that point there are no more changes that need to be made to
>> > kexec.
>>
>> That was my first instinct too: lazy acceptance is too complicated to
>> live and must die.
>>
>> It sounds like you're advocating for the "slow guest boot" option.
>> Kirill, can you remind us how fast a guest boots to the shell for
>> modestly-sized (say 256GB) memory with "accept_memory=eager" versus
>> "accept_memory=lazy"? IIRC, it was a pretty remarkable difference.
> I only have a 128GB machine readily available and posted some numbers in
> another thread[1]:
> On a single vCPU it takes about a minute to accept 90GiB of memory.
> It improves a bit with the number of vCPUs. It is 40 seconds with 4 vCPUs, but
> it doesn't scale past that in my setup.
> I've mentioned it before in another thread:
> [1] https://lore.kernel.org/all/ihzvi5pwn5hrn4ky2ehjqztjxoixaiaby4igmeihqfehy2vrii@tsg6j5qvmyrm
We essentially rely on lazy acceptance support to reduce SNP guest boot time.
Here are some performance numbers for SNP guests, which I gathered after discussing with
Michael Roth (who is also CCed here):
I just did a quick boot of a 128GB SNP guest with the accept_memory=lazy guest kernel
parameter and it took 22s to boot; with accept_memory=eager it takes 3 minutes and 47s,
so it is a remarkable difference.
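[Editorial aside: as simple arithmetic on the figures quoted in this thread, nothing beyond the numbers above:]

```python
# Kirill's TDX figure: ~90 GiB accepted in ~60 s on a single vCPU.
tdx_rate_gib_per_s = 90 / 60   # 1.5 GiB/s

# The SNP boot times above: 22 s lazy vs. 3 min 47 s eager for 128GB.
eager_s = 3 * 60 + 47          # 227 s
lazy_s = 22
speedup = eager_s / lazy_s     # roughly 10x faster time-to-shell
```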
Thanks,
Ashish
> --
> Kiryl Shutsemau / Kirill A. Shutemov