- * [PATCH v1 1/4] powerpc/mm: factor out creating/removing linear mapping
  2020-10-29 16:27 [PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
@ 2020-10-29 16:27 ` David Hildenbrand
  2020-10-29 16:27 ` [PATCH v1 2/4] powerpc/mm: print warning in arch_remove_linear_mapping() David Hildenbrand
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2020-10-29 16:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Oscar Salvador,
	Wei Yang
We want to stop abusing memory hotplug infrastructure in memtrace code
to perform allocations and remove the linear mapping. Instead we will use
alloc_contig_pages() and remove the identity mapping manually.
Let's factor out creating/removing the linear mapping into
arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the
future, we might be able to have whole arch_add_memory() /
arch_remove_memory() be implemented in common code.
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/mm/mem.c          | 41 +++++++++++++++++++++++-----------
 include/linux/memory_hotplug.h |  3 +++
 2 files changed, 31 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 01ec2a252f09..8a86d81f8df0 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -120,34 +120,26 @@ static void flush_dcache_range_chunked(unsigned long start, unsigned long stop,
 	}
 }
 
-int __ref arch_add_memory(int nid, u64 start, u64 size,
-			  struct mhp_params *params)
+int __ref arch_create_linear_mapping(int nid, u64 start, u64 size,
+				     struct mhp_params *params)
 {
-	unsigned long start_pfn = start >> PAGE_SHIFT;
-	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int rc;
 
 	start = (unsigned long)__va(start);
 	rc = create_section_mapping(start, start + size, nid,
 				    params->pgprot);
 	if (rc) {
-		pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
+		pr_warn("Unable to create linear mapping for 0x%llx..0x%llx: %d\n",
 			start, start + size, rc);
 		return -EFAULT;
 	}
-
-	return __add_pages(nid, start_pfn, nr_pages, params);
+	return 0;
 }
 
-void __ref arch_remove_memory(int nid, u64 start, u64 size,
-			     struct vmem_altmap *altmap)
+void __ref arch_remove_linear_mapping(u64 start, u64 size)
 {
-	unsigned long start_pfn = start >> PAGE_SHIFT;
-	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
-
 	/* Remove htab bolted mappings for this section of memory */
 	start = (unsigned long)__va(start);
 	flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
@@ -160,6 +152,29 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 	 */
 	vm_unmap_aliases();
 }
+
+int __ref arch_add_memory(int nid, u64 start, u64 size,
+			  struct mhp_params *params)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	int rc;
+
+	rc = arch_create_linear_mapping(nid, start, size, params);
+	if (rc)
+		return rc;
+	return __add_pages(nid, start_pfn, nr_pages, params);
+}
+
+void __ref arch_remove_memory(int nid, u64 start, u64 size,
+			      struct vmem_altmap *altmap)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+
+	__remove_pages(start_pfn, nr_pages, altmap);
+	arch_remove_linear_mapping(start, size);
+}
 #endif
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d65c6fdc5cfc..00b9e9bd3850 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -375,6 +375,9 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 		unsigned long nr_pages);
+extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
+				      struct mhp_params *params);
+void arch_remove_linear_mapping(u64 start, u64 size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
-- 
2.26.2
^ permalink raw reply related	[flat|nested] 15+ messages in thread
- * [PATCH v1 2/4] powerpc/mm: print warning in arch_remove_linear_mapping()
  2020-10-29 16:27 [PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
  2020-10-29 16:27 ` [PATCH v1 1/4] powerpc/mm: factor out creating/removing linear mapping David Hildenbrand
@ 2020-10-29 16:27 ` David Hildenbrand
  2020-11-04  9:42   ` osalvador
  2020-10-29 16:27 ` [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() David Hildenbrand
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-10-29 16:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Oscar Salvador,
	Wei Yang
Let's print a warning similar to in arch_add_linear_mapping() instead of
WARN_ON_ONCE() and eventually crashing the kernel.
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/mm/mem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8a86d81f8df0..685028451dd2 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -145,7 +145,9 @@ void __ref arch_remove_linear_mapping(u64 start, u64 size)
 	flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
 
 	ret = remove_section_mapping(start, start + size);
-	WARN_ON_ONCE(ret);
+	if (ret)
+		pr_warn("Unable to remove linear mapping for 0x%llx..0x%llx: %d\n",
+			start, start + size, ret);
 
 	/* Ensure all vmalloc mappings are flushed in case they also
 	 * hit that section of memory
-- 
2.26.2
^ permalink raw reply related	[flat|nested] 15+ messages in thread
- * Re: [PATCH v1 2/4] powerpc/mm: print warning in arch_remove_linear_mapping()
  2020-10-29 16:27 ` [PATCH v1 2/4] powerpc/mm: print warning in arch_remove_linear_mapping() David Hildenbrand
@ 2020-11-04  9:42   ` osalvador
  2020-11-11 12:10     ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: osalvador @ 2020-11-04  9:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang
On Thu, Oct 29, 2020 at 05:27:16PM +0100, David Hildenbrand wrote:
> Let's print a warning similar to in arch_add_linear_mapping() instead of
> WARN_ON_ONCE() and eventually crashing the kernel.
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/mm/mem.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 8a86d81f8df0..685028451dd2 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -145,7 +145,9 @@ void __ref arch_remove_linear_mapping(u64 start, u64 size)
>  	flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
>  
>  	ret = remove_section_mapping(start, start + size);
> -	WARN_ON_ONCE(ret);
> +	if (ret)
> +		pr_warn("Unable to remove linear mapping for 0x%llx..0x%llx: %d\n",
> +			start, start + size, ret);
I guess the fear is to panic on systems that do have panic_on_warn (not
sure how many productions systems have this out there).
But anyway, being coherent with that, I think you should remove the WARN_ON
in hash__remove_section_mapping as well.
Besides that:
Reviewed-by: Oscar Salvador <osalvador@suse.
Not sure if the functions below that also have any sort of WARN_ON.
native_hpte_removebolted has a VM_WARN_ON, but that is on
CONFIG_DEBUG_VM so does not really matter.
-- 
Oscar Salvador
SUSE L3
^ permalink raw reply	[flat|nested] 15+ messages in thread
- * Re: [PATCH v1 2/4] powerpc/mm: print warning in arch_remove_linear_mapping()
  2020-11-04  9:42   ` osalvador
@ 2020-11-11 12:10     ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2020-11-11 12:10 UTC (permalink / raw)
  To: osalvador
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang
On 04.11.20 10:42, osalvador wrote:
> On Thu, Oct 29, 2020 at 05:27:16PM +0100, David Hildenbrand wrote:
>> Let's print a warning similar to in arch_add_linear_mapping() instead of
>> WARN_ON_ONCE() and eventually crashing the kernel.
>>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Rashmica Gupta <rashmica.g@gmail.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mike Rapoport <rppt@kernel.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   arch/powerpc/mm/mem.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>> index 8a86d81f8df0..685028451dd2 100644
>> --- a/arch/powerpc/mm/mem.c
>> +++ b/arch/powerpc/mm/mem.c
>> @@ -145,7 +145,9 @@ void __ref arch_remove_linear_mapping(u64 start, u64 size)
>>   	flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
>>   
>>   	ret = remove_section_mapping(start, start + size);
>> -	WARN_ON_ONCE(ret);
>> +	if (ret)
>> +		pr_warn("Unable to remove linear mapping for 0x%llx..0x%llx: %d\n",
>> +			start, start + size, ret);
> 
> I guess the fear is to panic on systems that do have panic_on_warn (not
> sure how many productions systems have this out there).
Exactly.
> But anyway, being coherent with that, I think you should remove the WARN_ON
> in hash__remove_section_mapping as well.
Thanks, I'll add a patch doing that.
> 
> Besides that:
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.
> 
> Not sure if the functions below that also have any sort of WARN_ON.
> native_hpte_removebolted has a VM_WARN_ON, but that is on
> CONFIG_DEBUG_VM so does not really matter.
Right. Thanks!
-- 
Thanks,
David / dhildenb
^ permalink raw reply	[flat|nested] 15+ messages in thread
 
 
- * [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-10-29 16:27 [PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
  2020-10-29 16:27 ` [PATCH v1 1/4] powerpc/mm: factor out creating/removing linear mapping David Hildenbrand
  2020-10-29 16:27 ` [PATCH v1 2/4] powerpc/mm: print warning in arch_remove_linear_mapping() David Hildenbrand
@ 2020-10-29 16:27 ` David Hildenbrand
  2020-11-04  9:50   ` osalvador
  2020-11-04 12:11   ` Oscar Salvador
  2020-10-29 16:27 ` [PATCH v1 4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
  2020-11-25 11:57 ` [PATCH v1 0/4] " Michael Ellerman
  4 siblings, 2 replies; 15+ messages in thread
From: David Hildenbrand @ 2020-10-29 16:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Oscar Salvador,
	Wei Yang
Let's revert what we did in case seomthing goes wrong and we return an
error.
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/mm/mem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 685028451dd2..69b3e8072261 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -165,7 +165,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 	rc = arch_create_linear_mapping(nid, start, size, params);
 	if (rc)
 		return rc;
-	return __add_pages(nid, start_pfn, nr_pages, params);
+	rc = __add_pages(nid, start_pfn, nr_pages, params);
+	if (rc)
+		arch_remove_linear_mapping(start, size);
+	return rc;
 }
 
 void __ref arch_remove_memory(int nid, u64 start, u64 size,
-- 
2.26.2
^ permalink raw reply related	[flat|nested] 15+ messages in thread 
- * Re: [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-10-29 16:27 ` [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() David Hildenbrand
@ 2020-11-04  9:50   ` osalvador
  2020-11-04 12:06     ` Mike Rapoport
  2020-11-04 12:11   ` Oscar Salvador
  1 sibling, 1 reply; 15+ messages in thread
From: osalvador @ 2020-11-04  9:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang
On Thu, Oct 29, 2020 at 05:27:17PM +0100, David Hildenbrand wrote:
> Let's revert what we did in case seomthing goes wrong and we return an
> error.
Dumb question, but should not we do this for other arches as well?
-- 
Oscar Salvador
SUSE L3
^ permalink raw reply	[flat|nested] 15+ messages in thread 
- * Re: [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-11-04  9:50   ` osalvador
@ 2020-11-04 12:06     ` Mike Rapoport
  2020-11-04 12:11       ` Oscar Salvador
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Rapoport @ 2020-11-04 12:06 UTC (permalink / raw)
  To: osalvador
  Cc: David Hildenbrand, linux-kernel, linux-mm, linuxppc-dev,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rashmica Gupta, Andrew Morton, Michal Hocko, Wei Yang
On Wed, Nov 04, 2020 at 10:50:07AM +0100, osalvador wrote:
> On Thu, Oct 29, 2020 at 05:27:17PM +0100, David Hildenbrand wrote:
> > Let's revert what we did in case seomthing goes wrong and we return an
> > error.
> 
> Dumb question, but should not we do this for other arches as well?
It seems arm64 and s390 already do that. 
x86 could have its arch_add_memory() improved though :)
> -- 
> Oscar Salvador
> SUSE L3
-- 
Sincerely yours,
Mike.
^ permalink raw reply	[flat|nested] 15+ messages in thread 
- * Re: [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-11-04 12:06     ` Mike Rapoport
@ 2020-11-04 12:11       ` Oscar Salvador
  2020-11-11 12:07         ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Oscar Salvador @ 2020-11-04 12:11 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: David Hildenbrand, linux-kernel, linux-mm, linuxppc-dev,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rashmica Gupta, Andrew Morton, Michal Hocko, Wei Yang
On Wed, Nov 04, 2020 at 02:06:51PM +0200, Mike Rapoport wrote:
> On Wed, Nov 04, 2020 at 10:50:07AM +0100, osalvador wrote:
> > On Thu, Oct 29, 2020 at 05:27:17PM +0100, David Hildenbrand wrote:
> > > Let's revert what we did in case seomthing goes wrong and we return an
> > > error.
> > 
> > Dumb question, but should not we do this for other arches as well?
> 
> It seems arm64 and s390 already do that. 
> x86 could have its arch_add_memory() improved though :)
Right, I only stared at x86 and see it did not have it.
I guess we want to have all arches aligned with this.
Thanks
-- 
Oscar Salvador
SUSE L3
^ permalink raw reply	[flat|nested] 15+ messages in thread 
- * Re: [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-11-04 12:11       ` Oscar Salvador
@ 2020-11-11 12:07         ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2020-11-11 12:07 UTC (permalink / raw)
  To: Oscar Salvador, Mike Rapoport
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Michal Hocko, Wei Yang
On 04.11.20 13:11, Oscar Salvador wrote:
> On Wed, Nov 04, 2020 at 02:06:51PM +0200, Mike Rapoport wrote:
>> On Wed, Nov 04, 2020 at 10:50:07AM +0100, osalvador wrote:
>>> On Thu, Oct 29, 2020 at 05:27:17PM +0100, David Hildenbrand wrote:
>>>> Let's revert what we did in case seomthing goes wrong and we return an
>>>> error.
>>>
>>> Dumb question, but should not we do this for other arches as well?
>>
>> It seems arm64 and s390 already do that.
>> x86 could have its arch_add_memory() improved though :)
> 
> Right, I only stared at x86 and see it did not have it.
> I guess we want to have all arches aligned with this.
The ultimate goal would be to get rid of arch-specific arch_add_memory() 
implementations completely, providing arch_create_linear_mapping() / 
arch_remove_linear_mapping() instead (as indicated in patch #1).
The x86 variant certainly needs love, but I'll keep this patch set 
powerpc specific, so it can go via the powerpc tree in one piece. I'll 
add unifying these implementations onto my todo list.
Thanks!
-- 
Thanks,
David / dhildenb
^ permalink raw reply	[flat|nested] 15+ messages in thread 
 
 
 
- * Re: [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-10-29 16:27 ` [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() David Hildenbrand
  2020-11-04  9:50   ` osalvador
@ 2020-11-04 12:11   ` Oscar Salvador
  1 sibling, 0 replies; 15+ messages in thread
From: Oscar Salvador @ 2020-11-04 12:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang
On Thu, Oct 29, 2020 at 05:27:17PM +0100, David Hildenbrand wrote:
> Let's revert what we did in case seomthing goes wrong and we return an
> error.
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
> ---
>  arch/powerpc/mm/mem.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 685028451dd2..69b3e8072261 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -165,7 +165,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
>  	rc = arch_create_linear_mapping(nid, start, size, params);
>  	if (rc)
>  		return rc;
> -	return __add_pages(nid, start_pfn, nr_pages, params);
> +	rc = __add_pages(nid, start_pfn, nr_pages, params);
> +	if (rc)
> +		arch_remove_linear_mapping(start, size);
> +	return rc;
>  }
>  
>  void __ref arch_remove_memory(int nid, u64 start, u64 size,
> -- 
> 2.26.2
> 
-- 
Oscar Salvador
SUSE L3
^ permalink raw reply	[flat|nested] 15+ messages in thread 
 
- * [PATCH v1 4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
  2020-10-29 16:27 [PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-10-29 16:27 ` [PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() David Hildenbrand
@ 2020-10-29 16:27 ` David Hildenbrand
  2020-11-03  9:23   ` Michal Hocko
  2020-11-25 11:57 ` [PATCH v1 0/4] " Michael Ellerman
  4 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-10-29 16:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Michal Hocko,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rashmica Gupta, Andrew Morton, Mike Rapoport, Michal Hocko,
	Oscar Salvador, Wei Yang
Let's use alloc_contig_pages() for allocating memory and remove the
linear mapping manually via arch_remove_linear_mapping(). Mark all pages
PG_offline, such that they will definitely not get touched - e.g.,
when hibernating. When freeing memory, try to revert what we did.
The original idea was discussed in:
 https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com
This is similar to CONFIG_DEBUG_PAGEALLOC handling on other
architectures, whereby only single pages are unmapped from the linear
mapping. Let's mimic what memory hot(un)plug would do with the linear
mapping.
We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies.
Simple test under QEMU TCG (10GB RAM, single NUMA node):
sh-5.0# mount -t debugfs none /sys/kernel/debug/
sh-5.0# cat /sys/devices/system/memory/block_size_bytes
40000000
sh-5.0# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[   71.052836][  T356] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
sh-5.0# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable
[   75.424302][  T356] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
[   75.430549][  T356] memtrace: Freed trace memory back on node 0
[   75.604520][  T356] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
sh-5.0# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable
[   80.418835][  T356] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages
[   80.430493][  T356] memtrace: Freed trace memory back on node 0
[   80.433882][  T356] memtrace: Failed to allocate trace memory on node 0
sh-5.0# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
[   91.920158][  T356] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
Note 1: We currently won't be allocating from ZONE_MOVABLE - because our
	pages are not movable. However, as we don't run with any memory
	hot(un)plug mechanism around, we could make an exception to
	increase the chance of allocations succeeding.
Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used
	along PG_reserved in hibernation code will always return "true"
	on powerpc, resulting in the pages getting touched. It's too
	generic - e.g., indicates boot allocations.
Note 3: For now, we keep using memory_block_size_bytes() as minimum
	granularity. I'm not able to come up with a better guess (most
	probably, doing it on a section basis could be possible).
Suggested-by: Michal Hocko <mhocko@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/Kconfig    |   8 +-
 arch/powerpc/platforms/powernv/memtrace.c | 134 ++++++++--------------
 2 files changed, 49 insertions(+), 93 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 938803eab0ad..619b093a0657 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -27,11 +27,11 @@ config OPAL_PRD
 	  recovery diagnostics on OpenPower machines
 
 config PPC_MEMTRACE
-	bool "Enable removal of RAM from kernel mappings for tracing"
-	depends on PPC_POWERNV && MEMORY_HOTREMOVE
+	bool "Enable runtime allocation of RAM for tracing"
+	depends on PPC_POWERNV && MEMORY_HOTPLUG && CONTIG_ALLOC
 	help
-	  Enabling this option allows for the removal of memory (RAM)
-	  from the kernel mappings to be used for hardware tracing.
+	  Enabling this option allows for runtime allocation of memory (RAM)
+	  for hardware tracing.
 
 config PPC_VAS
 	bool "IBM Virtual Accelerator Switchboard (VAS)"
diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 6828108486f8..8f47797a78c2 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -50,83 +50,50 @@ static const struct file_operations memtrace_fops = {
 	.open	= simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
-{
-	if (mem->state != MEM_ONLINE)
-		return -1;
-
-	return 0;
-}
-
-static int change_memblock_state(struct memory_block *mem, void *arg)
-{
-	unsigned long state = (unsigned long)arg;
-
-	mem->state = state;
-
-	return 0;
-}
-
-/* called with device_hotplug_lock held */
-static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
+static u64 memtrace_alloc_node(u32 nid, u64 size)
 {
-	const unsigned long start = PFN_PHYS(start_pfn);
-	const unsigned long size = PFN_PHYS(nr_pages);
+	const unsigned long nr_pages = PHYS_PFN(size);
+	unsigned long pfn, start_pfn;
+	struct page *page;
 
-	if (walk_memory_blocks(start, size, NULL, check_memblock_online))
-		return false;
-
-	walk_memory_blocks(start, size, (void *)MEM_GOING_OFFLINE,
-			   change_memblock_state);
-
-	if (offline_pages(start_pfn, nr_pages)) {
-		walk_memory_blocks(start, size, (void *)MEM_ONLINE,
-				   change_memblock_state);
-		return false;
-	}
+	/*
+	 * Trace memory needs to be aligned to the size, which is guaranteed
+	 * by alloc_contig_pages().
+	 */
+	page = alloc_contig_pages(nr_pages, __GFP_THISNODE | __GFP_NOWARN,
+				  nid, NULL);
+	if (!page)
+		return 0;
 
-	walk_memory_blocks(start, size, (void *)MEM_OFFLINE,
-			   change_memblock_state);
+	/*
+	 * Set pages PageOffline(), to indicate that nobody (e.g., hibernation,
+	 * dumping, ...) should be touching these pages.
+	 */
+	start_pfn = page_to_pfn(page);
+	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++)
+		__SetPageOffline(pfn_to_page(pfn));
 
+	arch_remove_linear_mapping(PFN_PHYS(start_pfn), size);
 
-	return true;
+	return PFN_PHYS(start_pfn);
 }
 
-static u64 memtrace_alloc_node(u32 nid, u64 size)
+static int memtrace_free(int nid, u64 start, u64 size)
 {
-	u64 start_pfn, end_pfn, nr_pages, pfn;
-	u64 base_pfn;
-	u64 bytes = memory_block_size_bytes();
+	struct mhp_params params = { .pgprot = PAGE_KERNEL };
+	const unsigned long nr_pages = PHYS_PFN(size);
+	const unsigned long start_pfn = PHYS_PFN(start);
+	unsigned long pfn;
+	int ret;
 
-	if (!node_spanned_pages(nid))
-		return 0;
+	ret = arch_create_linear_mapping(nid, start, size, ¶ms);
+	if (ret)
+		return ret;
 
-	start_pfn = node_start_pfn(nid);
-	end_pfn = node_end_pfn(nid);
-	nr_pages = size >> PAGE_SHIFT;
-
-	/* Trace memory needs to be aligned to the size */
-	end_pfn = round_down(end_pfn - nr_pages, nr_pages);
-
-	lock_device_hotplug();
-	for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) {
-		if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true) {
-			/*
-			 * Remove memory in memory block size chunks so that
-			 * iomem resources are always split to the same size and
-			 * we never try to remove memory that spans two iomem
-			 * resources.
-			 */
-			end_pfn = base_pfn + nr_pages;
-			for (pfn = base_pfn; pfn < end_pfn; pfn += bytes>> PAGE_SHIFT) {
-				__remove_memory(nid, pfn << PAGE_SHIFT, bytes);
-			}
-			unlock_device_hotplug();
-			return base_pfn << PAGE_SHIFT;
-		}
-	}
-	unlock_device_hotplug();
+	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++)
+		__ClearPageOffline(pfn_to_page(pfn));
 
+	free_contig_range(start_pfn, nr_pages);
 	return 0;
 }
 
@@ -197,16 +164,11 @@ static int memtrace_init_debugfs(void)
 	return ret;
 }
 
-static int online_mem_block(struct memory_block *mem, void *arg)
-{
-	return device_online(&mem->dev);
-}
-
 /*
- * Iterate through the chunks of memory we have removed from the kernel
- * and attempt to add them back to the kernel.
+ * Iterate through the chunks of memory we allocated and attempt to expose
+ * them back to the kernel.
  */
-static int memtrace_online(void)
+static int memtrace_free_regions(void)
 {
 	int i, ret = 0;
 	struct memtrace_entry *ent;
@@ -214,7 +176,7 @@ static int memtrace_online(void)
 	for (i = memtrace_array_nr - 1; i >= 0; i--) {
 		ent = &memtrace_array[i];
 
-		/* We have onlined this chunk previously */
+		/* We have freed this chunk previously */
 		if (ent->nid == NUMA_NO_NODE)
 			continue;
 
@@ -224,30 +186,25 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size, MHP_NONE)) {
-			pr_err("Failed to add trace memory to node %d\n",
+		if (memtrace_free(ent->nid, ent->start, ent->size)) {
+			pr_err("Failed to free trace memory on node %d\n",
 				ent->nid);
 			ret += 1;
 			continue;
 		}
 
-		lock_device_hotplug();
-		walk_memory_blocks(ent->start, ent->size, NULL,
-				   online_mem_block);
-		unlock_device_hotplug();
-
 		/*
-		 * Memory was added successfully so clean up references to it
-		 * so on reentry we can tell that this chunk was added.
+		 * Memory was freed successfully so clean up references to it
+		 * so on reentry we can tell that this chunk was freed.
 		 */
 		debugfs_remove_recursive(ent->dir);
-		pr_info("Added trace memory back to node %d\n", ent->nid);
+		pr_info("Freed trace memory back on node %d\n", ent->nid);
 		ent->size = ent->start = ent->nid = NUMA_NO_NODE;
 	}
 	if (ret)
 		return ret;
 
-	/* If all chunks of memory were added successfully, reset globals */
+	/* If all chunks of memory were freed successfully, reset globals */
 	kfree(memtrace_array);
 	memtrace_array = NULL;
 	memtrace_size = 0;
@@ -269,16 +226,15 @@ static int memtrace_enable_set(void *data, u64 val)
 		return -EINVAL;
 	}
 
-	/* Re-add/online previously removed/offlined memory */
+	/* Free all previously allocated memory. */
 	if (memtrace_size) {
-		if (memtrace_online())
+		if (memtrace_free_regions())
 			return -EAGAIN;
 	}
 
 	if (!val)
 		return 0;
 
-	/* Offline and remove memory */
 	if (memtrace_init_regions_runtime(val))
 		return -EINVAL;
 
-- 
2.26.2
^ permalink raw reply related	[flat|nested] 15+ messages in thread
- * Re: [PATCH v1 4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
  2020-10-29 16:27 ` [PATCH v1 4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
@ 2020-11-03  9:23   ` Michal Hocko
  2020-11-03  9:29     ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2020-11-03  9:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Oscar Salvador, Wei Yang
On Thu 29-10-20 17:27:18, David Hildenbrand wrote:
> Let's use alloc_contig_pages() for allocating memory and remove the
> linear mapping manually via arch_remove_linear_mapping(). Mark all pages
> PG_offline, such that they will definitely not get touched - e.g.,
> when hibernating. When freeing memory, try to revert what we did.
> 
> The original idea was discussed in:
>  https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com
> 
> This is similar to CONFIG_DEBUG_PAGEALLOC handling on other
> architectures, whereby only single pages are unmapped from the linear
> mapping. Let's mimic what memory hot(un)plug would do with the linear
> mapping.
> 
> We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies.
> 
> Simple test under QEMU TCG (10GB RAM, single NUMA node):
> 
> sh-5.0# mount -t debugfs none /sys/kernel/debug/
> sh-5.0# cat /sys/devices/system/memory/block_size_bytes
> 40000000
> sh-5.0# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
> [   71.052836][  T356] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
> sh-5.0# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable
> [   75.424302][  T356] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
> [   75.430549][  T356] memtrace: Freed trace memory back on node 0
> [   75.604520][  T356] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
> sh-5.0# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable
> [   80.418835][  T356] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages
> [   80.430493][  T356] memtrace: Freed trace memory back on node 0
> [   80.433882][  T356] memtrace: Failed to allocate trace memory on node 0
> sh-5.0# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
> [   91.920158][  T356] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
> 
> Note 1: We currently won't be allocating from ZONE_MOVABLE - because our
> 	pages are not movable. However, as we don't run with any memory
> 	hot(un)plug mechanism around, we could make an exception to
> 	increase the chance of allocations succeeding.
> 
> Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used
> 	along PG_reserved in hibernation code will always return "true"
> 	on powerpc, resulting in the pages getting touched. It's too
> 	generic - e.g., indicates boot allocations.
> 
> Note 3: For now, we keep using memory_block_size_bytes() as minimum
> 	granularity. I'm not able to come up with a better guess (most
> 	probably, doing it on a section basis could be possible).
> 
> Suggested-by: Michal Hocko <mhocko@kernel.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Thanks! This looks like a move into the right direction. I cannot really
judge implementation details because I am not familiar with the code.
I have only one tiny concern:
[...]
> -/* called with device_hotplug_lock held */
> -static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
> +static u64 memtrace_alloc_node(u32 nid, u64 size)
>  {
> -	const unsigned long start = PFN_PHYS(start_pfn);
> -	const unsigned long size = PFN_PHYS(nr_pages);
> +	const unsigned long nr_pages = PHYS_PFN(size);
> +	unsigned long pfn, start_pfn;
> +	struct page *page;
>  
> -	if (walk_memory_blocks(start, size, NULL, check_memblock_online))
> -		return false;
> -
> -	walk_memory_blocks(start, size, (void *)MEM_GOING_OFFLINE,
> -			   change_memblock_state);
> -
> -	if (offline_pages(start_pfn, nr_pages)) {
> -		walk_memory_blocks(start, size, (void *)MEM_ONLINE,
> -				   change_memblock_state);
> -		return false;
> -	}
> +	/*
> +	 * Trace memory needs to be aligned to the size, which is guaranteed
> +	 * by alloc_contig_pages().
> +	 */
> +	page = alloc_contig_pages(nr_pages, __GFP_THISNODE | __GFP_NOWARN,
> +				  nid, NULL);
__GFP_THISNODE without other modifiers looks suspicious. I suspect you
want to enfore node locality and exclude movable zones by this. While
this works it is an antipattern. I would rather use GFP_KERNEL |
__GFP_THISNODE | __GFP_NOWARN to be more in line with other gfp usage.
If for no other reasons we want to be able to work inside a normal
compaction context (comparing to effectively GFP_NOIO which the above
implies). Also this looks like a sleepable context.
-- 
Michal Hocko
SUSE Labs
^ permalink raw reply	[flat|nested] 15+ messages in thread
- * Re: [PATCH v1 4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
  2020-11-03  9:23   ` Michal Hocko
@ 2020-11-03  9:29     ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2020-11-03  9:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Oscar Salvador, Wei Yang
On 03.11.20 10:23, Michal Hocko wrote:
> On Thu 29-10-20 17:27:18, David Hildenbrand wrote:
>> Let's use alloc_contig_pages() for allocating memory and remove the
>> linear mapping manually via arch_remove_linear_mapping(). Mark all pages
>> PG_offline, such that they will definitely not get touched - e.g.,
>> when hibernating. When freeing memory, try to revert what we did.
>>
>> The original idea was discussed in:
>>   https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com
>>
>> This is similar to CONFIG_DEBUG_PAGEALLOC handling on other
>> architectures, whereby only single pages are unmapped from the linear
>> mapping. Let's mimic what memory hot(un)plug would do with the linear
>> mapping.
>>
>> We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies.
>>
>> Simple test under QEMU TCG (10GB RAM, single NUMA node):
>>
>> sh-5.0# mount -t debugfs none /sys/kernel/debug/
>> sh-5.0# cat /sys/devices/system/memory/block_size_bytes
>> 40000000
>> sh-5.0# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
>> [   71.052836][  T356] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
>> sh-5.0# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable
>> [   75.424302][  T356] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
>> [   75.430549][  T356] memtrace: Freed trace memory back on node 0
>> [   75.604520][  T356] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
>> sh-5.0# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable
>> [   80.418835][  T356] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages
>> [   80.430493][  T356] memtrace: Freed trace memory back on node 0
>> [   80.433882][  T356] memtrace: Failed to allocate trace memory on node 0
>> sh-5.0# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
>> [   91.920158][  T356] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
>>
>> Note 1: We currently won't be allocating from ZONE_MOVABLE - because our
>> 	pages are not movable. However, as we don't run with any memory
>> 	hot(un)plug mechanism around, we could make an exception to
>> 	increase the chance of allocations succeeding.
>>
>> Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used
>> 	along PG_reserved in hibernation code will always return "true"
>> 	on powerpc, resulting in the pages getting touched. It's too
>> 	generic - e.g., indicates boot allocations.
>>
>> Note 3: For now, we keep using memory_block_size_bytes() as minimum
>> 	granularity. I'm not able to come up with a better guess (most
>> 	probably, doing it on a section basis could be possible).
>>
>> Suggested-by: Michal Hocko <mhocko@kernel.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Rashmica Gupta <rashmica.g@gmail.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mike Rapoport <rppt@kernel.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Thanks! This looks like a move into the right direction. I cannot really
> judge implementation details because I am not familiar with the code.
> I have only one tiny concern:
> [...]
>> -/* called with device_hotplug_lock held */
>> -static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
>> +static u64 memtrace_alloc_node(u32 nid, u64 size)
>>   {
>> -	const unsigned long start = PFN_PHYS(start_pfn);
>> -	const unsigned long size = PFN_PHYS(nr_pages);
>> +	const unsigned long nr_pages = PHYS_PFN(size);
>> +	unsigned long pfn, start_pfn;
>> +	struct page *page;
>>   
>> -	if (walk_memory_blocks(start, size, NULL, check_memblock_online))
>> -		return false;
>> -
>> -	walk_memory_blocks(start, size, (void *)MEM_GOING_OFFLINE,
>> -			   change_memblock_state);
>> -
>> -	if (offline_pages(start_pfn, nr_pages)) {
>> -		walk_memory_blocks(start, size, (void *)MEM_ONLINE,
>> -				   change_memblock_state);
>> -		return false;
>> -	}
>> +	/*
>> +	 * Trace memory needs to be aligned to the size, which is guaranteed
>> +	 * by alloc_contig_pages().
>> +	 */
>> +	page = alloc_contig_pages(nr_pages, __GFP_THISNODE | __GFP_NOWARN,
>> +				  nid, NULL);
> 
> __GFP_THISNODE without other modifiers looks suspicious. I suspect you
> want to enfore node locality and exclude movable zones by this. While
> this works it is an antipattern. I would rather use GFP_KERNEL |
> __GFP_THISNODE | __GFP_NOWARN to be more in line with other gfp usage.
Agreed GFP_KERNEL should be the right thing to do here.
> 
> If for no other reasons we want to be able to work inside a normal
> compaction context (comparing to effectively GFP_NOIO which the above
> implies). Also this looks like a sleepable context.
> 
Yes it is. Thanks!
-- 
Thanks,
David / dhildenb
^ permalink raw reply	[flat|nested] 15+ messages in thread
 
 
- * Re: [PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
  2020-10-29 16:27 [PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-10-29 16:27 ` [PATCH v1 4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
@ 2020-11-25 11:57 ` Michael Ellerman
  4 siblings, 0 replies; 15+ messages in thread
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: linux-kernel, David Hildenbrand
  Cc: Rashmica Gupta, Michael Ellerman, Andrew Morton, Paul Mackerras,
	Michal Hocko, Mike Rapoport, Michal Hocko, Wei Yang,
	Oscar Salvador, linux-mm, Benjamin Herrenschmidt, linuxppc-dev
On Thu, 29 Oct 2020 17:27:14 +0100, David Hildenbrand wrote:
> powernv/memtrace is the only in-kernel user that rips out random memory
> it never added (doesn't own) in order to allocate memory without a
> linear mapping. Let's stop abusing memory hot(un)plug infrastructure for
> that - use alloc_contig_pages() for allocating memory and remove the
> linear mapping manually.
> 
> The original idea was discussed in:
>  https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com
> 
> [...]
Applied to powerpc/next.
[1/4] powerpc/mm: factor out creating/removing linear mapping
      https://git.kernel.org/powerpc/c/4abb1e5b63ac3281275315fc6b0cde0b9c2e2e42
[2/4] powerpc/mm: print warning in arch_remove_linear_mapping()
      https://git.kernel.org/powerpc/c/1f73ad3e8d755dbec52fcec98618a7ce4de12af2
[3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
      https://git.kernel.org/powerpc/c/ca2c36cae9d48b180ea51259e35ab3d95d327df2
[4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
      https://git.kernel.org/powerpc/c/0bd4b96d99108b7ea9bac0573957483be7781d70
cheers
^ permalink raw reply	[flat|nested] 15+ messages in thread