* Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
[not found] <20201221163024.GA22532@open-light-1.localdomain>
@ 2021-01-04 19:19 ` Dave Hansen
2021-01-04 19:27 ` Matthew Wilcox
0 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2021-01-04 19:19 UTC (permalink / raw)
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
Dan Williams, Michael S. Tsirkin, David Hildenbrand, Jason Wang,
Michal Hocko, Liang Li, linux-mm, linux-kernel, virtualization
On 12/21/20 8:30 AM, Liang Li wrote:
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -137,6 +137,9 @@ enum pageflags {
> #endif
> #ifdef CONFIG_64BIT
> PG_arch_2,
> +#endif
> +#ifdef CONFIG_PREZERO_PAGE
> + PG_zero,
> #endif
> __NR_PAGEFLAGS,
I don't think this is worth a generic page->flags bit.
There's a ton of space in 'struct page' for pages that are in the
allocator. Can't we use some of that space?
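
[Editorial sketch, not part of the original message.] The suggestion above can be illustrated with a small userspace model. It assumes a simplified, hypothetical `struct model_page` (not the kernel's real `struct page` layout) in which fields that only matter while a page is in use share storage with fields that only matter while the page sits in the allocator, so a "pre-zeroed" marker needs no extra generic flag bit. All `model_*` names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for 'struct page'; NOT the kernel layout. */
struct model_page {
    unsigned long flags;            /* scarce bits shared by every user */
    union {
        struct {                    /* valid only while the page is in use */
            void *mapping;
            unsigned long index;
        } in_use;
        struct {                    /* valid only while the page is free */
            unsigned int order;
            bool prezeroed;         /* lives in otherwise idle space */
        } free_info;
    };
};

#define MODEL_PG_FREE (1UL << 0)    /* hypothetical "page is free" bit */

/* Put a page into the (modelled) allocator and record its zero state. */
static void model_mark_free(struct model_page *p, unsigned int order,
                            bool zeroed)
{
    p->flags |= MODEL_PG_FREE;
    p->free_info.order = order;
    p->free_info.prezeroed = zeroed;
}

/* Only meaningful for free pages, hence the assertion. */
static bool model_free_page_is_zeroed(const struct model_page *p)
{
    assert(p->flags & MODEL_PG_FREE);
    return p->free_info.prezeroed;
}

/* Hand the page to a user; the free-only fields are overwritten. */
static void model_alloc(struct model_page *p, void *mapping,
                        unsigned long index)
{
    p->flags &= ~MODEL_PG_FREE;
    p->in_use.mapping = mapping;
    p->in_use.index = index;
}
```

The point of the union is that the marker costs nothing once the page is allocated: the same bytes are reused for the in-use fields, so the shared `flags` word is left alone.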
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
2021-01-04 19:19 ` [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO Dave Hansen
@ 2021-01-04 19:27 ` Matthew Wilcox
2021-01-04 19:44 ` Dan Williams
2021-01-04 19:51 ` Dave Hansen
0 siblings, 2 replies; 8+ messages in thread
From: Matthew Wilcox @ 2021-01-04 19:27 UTC (permalink / raw)
To: Dave Hansen
Cc: Andrea Arcangeli, Michal Hocko, Michael S. Tsirkin, Dan Williams,
Liang Li, linux-kernel, linux-mm, Alexander Duyck, virtualization,
Mel Gorman, Andrew Morton
On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
> On 12/21/20 8:30 AM, Liang Li wrote:
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -137,6 +137,9 @@ enum pageflags {
> > #endif
> > #ifdef CONFIG_64BIT
> > PG_arch_2,
> > +#endif
> > +#ifdef CONFIG_PREZERO_PAGE
> > + PG_zero,
> > #endif
> > __NR_PAGEFLAGS,
>
> I don't think this is worth a generic page->flags bit.
>
> There's a ton of space in 'struct page' for pages that are in the
> allocator. Can't we use some of that space?
I was going to object to that too, but I think the entire approach is
flawed and needs to be thrown out. It just nukes the caches in extremely
subtle and hard to measure ways, lowering overall system performance.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
2021-01-04 19:27 ` Matthew Wilcox
@ 2021-01-04 19:44 ` Dan Williams
2021-01-04 19:51 ` Dave Hansen
1 sibling, 0 replies; 8+ messages in thread
From: Dan Williams @ 2021-01-04 19:44 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrea Arcangeli, Michal Hocko, Michael S. Tsirkin, Liang Li,
Linux Kernel Mailing List, Linux MM, Dave Hansen, Alexander Duyck,
virtualization, Mel Gorman, Andrew Morton
On Mon, Jan 4, 2021 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
> > On 12/21/20 8:30 AM, Liang Li wrote:
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -137,6 +137,9 @@ enum pageflags {
> > > #endif
> > > #ifdef CONFIG_64BIT
> > > PG_arch_2,
> > > +#endif
> > > +#ifdef CONFIG_PREZERO_PAGE
> > > + PG_zero,
> > > #endif
> > > __NR_PAGEFLAGS,
> >
> > I don't think this is worth a generic page->flags bit.
> >
> > There's a ton of space in 'struct page' for pages that are in the
> > allocator. Can't we use some of that space?
>
> I was going to object to that too, but I think the entire approach is
> flawed and needs to be thrown out. It just nukes the caches in extremely
> subtle and hard to measure ways, lowering overall system performance.
At a minimum the performance analysis should at least try to quantify
that externalized cost. Certainly that overhead went somewhere. Maybe
if this overhead was limited to run when the CPU would otherwise go
idle, but that might mean it never runs in practice...
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
2021-01-04 19:27 ` Matthew Wilcox
2021-01-04 19:44 ` Dan Williams
@ 2021-01-04 19:51 ` Dave Hansen
2021-01-04 20:11 ` David Hildenbrand
1 sibling, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2021-01-04 19:51 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrea Arcangeli, Michal Hocko, Michael S. Tsirkin, Dan Williams,
Liang Li, linux-kernel, linux-mm, Alexander Duyck, virtualization,
Mel Gorman, Andrew Morton
On 1/4/21 11:27 AM, Matthew Wilcox wrote:
> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
>> On 12/21/20 8:30 AM, Liang Li wrote:
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -137,6 +137,9 @@ enum pageflags {
>>> #endif
>>> #ifdef CONFIG_64BIT
>>> PG_arch_2,
>>> +#endif
>>> +#ifdef CONFIG_PREZERO_PAGE
>>> + PG_zero,
>>> #endif
>>> __NR_PAGEFLAGS,
>>
>> I don't think this is worth a generic page->flags bit.
>>
>> There's a ton of space in 'struct page' for pages that are in the
>> allocator. Can't we use some of that space?
>
> I was going to object to that too, but I think the entire approach is
> flawed and needs to be thrown out. It just nukes the caches in extremely
> subtle and hard to measure ways, lowering overall system performance.
Yeah, it certainly can't be the default, but it *is* useful for things
where we know that there are no cache benefits to zeroing close to where
the memory is allocated.
The trick is opting into it somehow, either in a process or a VMA.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
2021-01-04 19:51 ` Dave Hansen
@ 2021-01-04 20:11 ` David Hildenbrand
2021-01-04 22:29 ` Dan Williams
2021-01-04 23:00 ` Dave Hansen
0 siblings, 2 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-01-04 20:11 UTC (permalink / raw)
To: Dave Hansen
Cc: Andrea Arcangeli, Michal Hocko, Andrew Morton, Michael S. Tsirkin,
Liang Li, Matthew Wilcox, linux-kernel, linux-mm, Dan Williams,
virtualization, Mel Gorman, Alexander Duyck
> On 04.01.2021 at 20:52, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/4/21 11:27 AM, Matthew Wilcox wrote:
>>> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
>>> On 12/21/20 8:30 AM, Liang Li wrote:
>>>> --- a/include/linux/page-flags.h
>>>> +++ b/include/linux/page-flags.h
>>>> @@ -137,6 +137,9 @@ enum pageflags {
>>>> #endif
>>>> #ifdef CONFIG_64BIT
>>>> PG_arch_2,
>>>> +#endif
>>>> +#ifdef CONFIG_PREZERO_PAGE
>>>> + PG_zero,
>>>> #endif
>>>> __NR_PAGEFLAGS,
>>>
>>> I don't think this is worth a generic page->flags bit.
>>>
>>> There's a ton of space in 'struct page' for pages that are in the
>>> allocator. Can't we use some of that space?
>>
>> I was going to object to that too, but I think the entire approach is
>> flawed and needs to be thrown out. It just nukes the caches in extremely
>> subtle and hard to measure ways, lowering overall system performance.
>
> Yeah, it certainly can't be the default, but it *is* useful for things
> where we know that there are no cache benefits to zeroing close to where
> the memory is allocated.
>
> The trick is opting into it somehow, either in a process or a VMA.
>
The patch set is mostly trying to optimize starting a new process. So
process/vma doesn't really work.
I still wonder whether tmpfs/shmem could somehow be used to cover the
original use case of starting a new VM fast (or rebooting an existing
one, which involves restarting the process).
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
2021-01-04 20:11 ` David Hildenbrand
@ 2021-01-04 22:29 ` Dan Williams
2021-01-04 23:00 ` Dave Hansen
1 sibling, 0 replies; 8+ messages in thread
From: Dan Williams @ 2021-01-04 22:29 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrea Arcangeli, Michal Hocko, Michael S. Tsirkin, Liang Li,
Matthew Wilcox, Linux Kernel Mailing List, Linux MM, Dave Hansen,
Alexander Duyck, virtualization, Mel Gorman, Andrew Morton
On Mon, Jan 4, 2021 at 12:11 PM David Hildenbrand <david@redhat.com> wrote:
>
>
> > On 04.01.2021 at 20:52, Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 1/4/21 11:27 AM, Matthew Wilcox wrote:
> >>> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
> >>> On 12/21/20 8:30 AM, Liang Li wrote:
> >>>> --- a/include/linux/page-flags.h
> >>>> +++ b/include/linux/page-flags.h
> >>>> @@ -137,6 +137,9 @@ enum pageflags {
> >>>> #endif
> >>>> #ifdef CONFIG_64BIT
> >>>> PG_arch_2,
> >>>> +#endif
> >>>> +#ifdef CONFIG_PREZERO_PAGE
> >>>> + PG_zero,
> >>>> #endif
> >>>> __NR_PAGEFLAGS,
> >>>
> >>> I don't think this is worth a generic page->flags bit.
> >>>
> >>> There's a ton of space in 'struct page' for pages that are in the
> >>> allocator. Can't we use some of that space?
> >>
> >> I was going to object to that too, but I think the entire approach is
> >> flawed and needs to be thrown out. It just nukes the caches in extremely
> >> subtle and hard to measure ways, lowering overall system performance.
> >
> > Yeah, it certainly can't be the default, but it *is* useful for thing
> > where we know that there are no cache benefits to zeroing close to where
> > the memory is allocated.
> >
> > The trick is opting into it somehow, either in a process or a VMA.
> >
>
> The patch set is mostly trying to optimize starting a new process. So process/vma doesn't really work.
>
> I still wonder if using tmpfs/shmem cannot somehow be used to cover the original use case of starting a new vm fast (or rebooting an existing one involving restarting the process).
If it's rebooting a VM then file-backed should be able to skip the
zeroing because the stale data exposure is only to itself. If the
memory is being repurposed to a new VM then the file needs to be
reallocated / re-zeroed just like the anonymous case.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
2021-01-04 20:11 ` David Hildenbrand
2021-01-04 22:29 ` Dan Williams
@ 2021-01-04 23:00 ` Dave Hansen
[not found] ` <20210105092037.GY13207@dhcp22.suse.cz>
1 sibling, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2021-01-04 23:00 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrea Arcangeli, Michal Hocko, Andrew Morton, Michael S. Tsirkin,
Liang Li, Matthew Wilcox, linux-kernel, linux-mm, Dan Williams,
virtualization, Mel Gorman, Alexander Duyck
On 1/4/21 12:11 PM, David Hildenbrand wrote:
>> Yeah, it certainly can't be the default, but it *is* useful for
>> thing where we know that there are no cache benefits to zeroing
>> close to where the memory is allocated.
>>
>> The trick is opting into it somehow, either in a process or a VMA.
>>
> The patch set is mostly trying to optimize starting a new process. So
> process/vma doesn't really work.
Let's say you have a system-wide tunable that says: pre-zero pages and
keep 10GB of them around. Then, you opt-in a process to being allowed
to dip into that pool with a process-wide flag or an madvise() call.
You could even have the flag be inherited across execve() if you wanted
to have helper apps be able to set the policy and access the pool like
how numactl works.
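
[Editorial sketch, not part of the original message.] The tunable-plus-opt-in idea can be modelled in userspace as follows. This is only a toy under stated assumptions: `struct zero_pool`, `MODEL_PAGE_SIZE`, and every `zero_pool_*` function are hypothetical names, `malloc()` stands in for the page allocator, and the `opted_in` argument stands in for whatever process-wide flag or madvise() hint would carry the policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096        /* assumed page size for the model */

/* Hypothetical pool of pages that were zeroed ahead of time. */
struct zero_pool {
    void **pages;
    size_t count;                   /* pre-zeroed pages currently held */
    size_t target;                  /* the "system-wide tunable" */
};

static bool zero_pool_init(struct zero_pool *pool, size_t target)
{
    assert(target > 0);
    pool->pages = malloc(target * sizeof(*pool->pages));
    pool->count = 0;
    pool->target = target;
    return pool->pages != NULL;
}

/* Background work: zero pages in advance until the target is reached. */
static void zero_pool_refill(struct zero_pool *pool)
{
    while (pool->count < pool->target) {
        void *page = malloc(MODEL_PAGE_SIZE);
        if (!page)
            return;
        memset(page, 0, MODEL_PAGE_SIZE);
        pool->pages[pool->count++] = page;
    }
}

/*
 * Always returns a zeroed page (or NULL). Opted-in callers take one from
 * the pool when available; everyone else pays for zeroing at allocation
 * time. *from_pool reports which path was taken.
 */
static void *zero_pool_alloc(struct zero_pool *pool, bool opted_in,
                             bool *from_pool)
{
    if (opted_in && pool->count > 0) {
        *from_pool = true;
        return pool->pages[--pool->count];
    }
    *from_pool = false;
    return calloc(1, MODEL_PAGE_SIZE);
}

static void zero_pool_destroy(struct zero_pool *pool)
{
    while (pool->count > 0)
        free(pool->pages[--pool->count]);
    free(pool->pages);
    pool->pages = NULL;
}
```

The model deliberately leaves out what makes the real problem hard: deciding when the refill runs and what it does to the caches, and how a scarce pool is shared fairly among the processes that opt in.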
Dan makes a very good point about using filesystems for this, though.
It wouldn't be rocket science to set up a special tmpfs mount just for
VM memory and pre-zero it from userspace. For qemu, you'd need to teach
the management layer to hand out zeroed files via mem-path=. Heck, if
you taught MADV_FREE how to handle tmpfs, you could even pre-zero *and*
get the memory back quickly if those files ended up over-sized somehow.
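
[Editorial sketch, not part of the original message.] Pre-zeroing a backing file from userspace could look roughly like the helper below. It uses only standard POSIX calls; the function name `prezero_file` is hypothetical, and the caller is assumed to pass a path on a tmpfs mount set aside for VM memory. Writing to every byte forces the backing pages to be allocated now, so the cost is paid before the VM starts rather than at first touch.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) 'path', size it to 'len' bytes, and write zeroes
 * to every byte through a shared mapping so that the pages are populated.
 * Returns 0 on success, -1 on failure (errno is left as set by the
 * failing call, unless close() changes it).
 */
static int prezero_file(const char *path, size_t len)
{
    assert(len > 0);                /* mmap() rejects zero-length maps */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(map, 0, len);            /* touch every page */

    munmap(map, len);
    close(fd);
    return 0;
}
```

A management layer would call this for each file before handing its path to the VM process; how the path is then passed on (for example through the mem-path= option mentioned above) is outside this sketch.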
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
[not found] ` <20210105092037.GY13207@dhcp22.suse.cz>
@ 2021-01-05 9:29 ` David Hildenbrand
0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-01-05 9:29 UTC (permalink / raw)
To: Michal Hocko, Dave Hansen
Cc: Andrea Arcangeli, Andrew Morton, Michael S. Tsirkin, Liang Li,
Matthew Wilcox, linux-kernel, linux-mm, Dan Williams,
virtualization, Mel Gorman, Alexander Duyck
On 05.01.21 10:20, Michal Hocko wrote:
> On Mon 04-01-21 15:00:31, Dave Hansen wrote:
>> On 1/4/21 12:11 PM, David Hildenbrand wrote:
>>>> Yeah, it certainly can't be the default, but it *is* useful for
>>>> thing where we know that there are no cache benefits to zeroing
>>>> close to where the memory is allocated.
>>>>
>>>> The trick is opting into it somehow, either in a process or a VMA.
>>>>
>>> The patch set is mostly trying to optimize starting a new process. So
>>> process/vma doesn't really work.
>>
>> Let's say you have a system-wide tunable that says: pre-zero pages and
>> keep 10GB of them around. Then, you opt-in a process to being allowed
>> to dip into that pool with a process-wide flag or an madvise() call.
>> You could even have the flag be inherited across execve() if you wanted
>> to have helper apps be able to set the policy and access the pool like
>> how numactl works.
>
> While possible, it sounds quite heavy weight to me. Page allocator would
> have to somehow maintain those pre-zeroed pages. This pool will also
> become a very scarce resource very soon because everybody just want to
> run faster. So this would open many more interesting questions.
Agreed.
>
> A global knob with all or nothing sounds like an easier to use and
> maintain solution to me.
I mean, that brings me back to my original suggestion: just use
hugetlbfs and implement some sort of pre-zeroing there (worker thread,
whatsoever). Most vfio users should already be better off using hugepages.
It's a "pool of pages" already. Selected users use it. I really don't
see a need to extend the buddy with something like that.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-01-05 9:30 UTC | newest]
Thread overview: 8+ messages
[not found] <20201221163024.GA22532@open-light-1.localdomain>
2021-01-04 19:19 ` [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO Dave Hansen
2021-01-04 19:27 ` Matthew Wilcox
2021-01-04 19:44 ` Dan Williams
2021-01-04 19:51 ` Dave Hansen
2021-01-04 20:11 ` David Hildenbrand
2021-01-04 22:29 ` Dan Williams
2021-01-04 23:00 ` Dave Hansen
[not found] ` <20210105092037.GY13207@dhcp22.suse.cz>
2021-01-05 9:29 ` David Hildenbrand