LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: David Hildenbrand @ 2020-05-01 21:10 UTC
  To: Dan Williams
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <CAPcyv4iXyOUDZgqhWH1KCObvATL=gP55xEr64rsRfUuJg5B+eQ@mail.gmail.com>

On 01.05.20 22:12, Dan Williams wrote:
> On Fri, May 1, 2020 at 12:18 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 01.05.20 20:43, Dan Williams wrote:
>>> On Fri, May 1, 2020 at 11:14 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 01.05.20 20:03, Dan Williams wrote:
>>>>> On Fri, May 1, 2020 at 10:51 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 01.05.20 19:45, David Hildenbrand wrote:
>>>>>>> On 01.05.20 19:39, Dan Williams wrote:
>>>>>>>> On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 01.05.20 18:56, Dan Williams wrote:
>>>>>>>>>> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 01.05.20 00:24, Andrew Morton wrote:
>>>>>>>>>>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why does the firmware map support hotplug entries?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I assume:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
>>>>>>>>>>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
>>>>>>>>>>>>> get hotplugged on real HW, they get added to e820. Same applies to
>>>>>>>>>>>>> memory added via HyperV balloon (unless memory is unplugged via
>>>>>>>>>>>>> ballooning and you reboot ... the the e820 is changed as well). I assume
>>>>>>>>>>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> But I assume only Andrew can enlighten us.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @Andrew, any guidance here? Should we really add all memory to the
>>>>>>>>>>>>> firmware memmap, even if this contradicts with the existing
>>>>>>>>>>>>> documentation? (especially, if the actual firmware memmap will *not*
>>>>>>>>>>>>> contain that memory after a reboot)
>>>>>>>>>>>>
>>>>>>>>>>>> For some reason that patch is misattributed - it was authored by
>>>>>>>>>>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
>>>>>>>>>>>> a decade.  I looked through the email discussion from that time and I'm
>>>>>>>>>>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
>>>>>>>>>>>> review comments.
>>>>>>>>>>>
>>>>>>>>>>> Okay, thanks for checking. I think the documentation from 2008 is pretty
>>>>>>>>>>> clear what has to be done here. I will add some of these details to the
>>>>>>>>>>> patch description.
>>>>>>>>>>>
>>>>>>>>>>> Also, now that I know that esp. kexec-tools already don't consider
>>>>>>>>>>> dax/kmem memory properly (memory will not get dumped via kdump) and
>>>>>>>>>>> won't really suffer from a name change in /proc/iomem, I will go back to
>>>>>>>>>>> the MHP_DRIVER_MANAGED approach and
>>>>>>>>>>> 1. Don't create firmware memmap entries
>>>>>>>>>>> 2. Name the resource "System RAM (driver managed)"
>>>>>>>>>>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
>>>>>>>>>>>
>>>>>>>>>>> This way, kernel users and user space can figure out that this memory
>>>>>>>>>>> has different semantics and handle it accordingly - I think that was
>>>>>>>>>>> what Eric was asking for.
>>>>>>>>>>>
>>>>>>>>>>> Of course, open for suggestions.
>>>>>>>>>>
>>>>>>>>>> I'm still more of a fan of this being communicated by "System RAM"
>>>>>>>>>
>>>>>>>>> I was mentioning somewhere in this thread that "System RAM" inside a
>>>>>>>>> hierarchy (like dax/kmem) will already be basically ignored by
>>>>>>>>> kexec-tools. So, placing it inside a hierarchy already makes it look
>>>>>>>>> special.
>>>>>>>>>
>>>>>>>>> But after all, as we have to change kexec-tools either way, we can
>>>>>>>>> directly go ahead and flag it properly as special (in case there will
>>>>>>>>> ever be other cases where we could no longer distinguish it).
>>>>>>>>>
>>>>>>>>>> being parented especially because that tells you something about how
>>>>>>>>>> the memory is driver-managed and which mechanism might be in play.
>>>>>>>>>
>>>>>>>>> This could be communicated to some degree via the resource hierarchy.
>>>>>>>>>
>>>>>>>>> E.g.,
>>>>>>>>>
>>>>>>>>>             [root@localhost ~]# cat /proc/iomem
>>>>>>>>>             ...
>>>>>>>>>             140000000-33fffffff : Persistent Memory
>>>>>>>>>               140000000-1481fffff : namespace0.0
>>>>>>>>>               150000000-33fffffff : dax0.0
>>>>>>>>>                 150000000-33fffffff : System RAM (driver managed)
>>>>>>>>>
>>>>>>>>> vs.
>>>>>>>>>
>>>>>>>>>            :/# cat /proc/iomem
>>>>>>>>>             [...]
>>>>>>>>>             140000000-333ffffff : virtio-mem (virtio0)
>>>>>>>>>               140000000-147ffffff : System RAM (driver managed)
>>>>>>>>>               148000000-14fffffff : System RAM (driver managed)
>>>>>>>>>               150000000-157ffffff : System RAM (driver managed)
>>>>>>>>>
>>>>>>>>> Good enough for my taste.
>>>>>>>>>
>>>>>>>>>> What about adding an optional /sys/firmware/memmap/X/parent attribute.
>>>>>>>>>
>>>>>>>>> I really don't want any firmware memmap entries for something that is
>>>>>>>>> not part of the firmware provided memmap. In addition,
>>>>>>>>> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
>>>>>>>>> and two arm configs enable it at all.
>>>>>>>>>
>>>>>>>>> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.
>>>>>>>>
>>>>>>>> I think that's a policy decision and policy decisions do not belong in
>>>>>>>> the kernel. Give the tooling the opportunity to decide whether System
>>>>>>>> RAM stays that way over a kexec. The parenthetical reference otherwise
>>>>>>>> looks out of place to me in the /proc/iomem output. What makes it
>>>>>>>> "driver managed" is how the kernel handles it, not how the kernel
>>>>>>>> names it.
>>>>>>>
>>>>>>> At least, virtio-mem is different. It really *has to be handled* by the
>>>>>>> driver. This is not a policy. It's how it works.
>>>>>
>>>>> ...but that's not necessarily how dax/kmem works.
>>>>>
>>>>
>>>> Yes, and user space could still take that memory and add it to the
>>>> firmware memmap if it really wants to. It knows that it is special. It
>>>> can figure out that it belongs to a dax device using /proc/iomem.
>>>>
>>>>>>>
>>>>>>
>>>>>> Oh, and I don't see why "System RAM (driver managed)" would hinder any
>>>>>> policy in user space to still do what it thinks is the right thing to do
>>>>>> (e.g., for dax).
>>>>>>
>>>>>> "System RAM (driver managed)" would mean: Memory is not part of the raw
>>>>>> firmware memmap. It was detected and added by a driver. Handle with
>>>>>> care, this is special.
>>>>>
>>>>> Oh, no, I was more reacting to your, "don't update
>>>>> /sys/firmware/memmap for the (driver managed) range" choice as being a
>>>>> policy decision. It otherwise feels to me "System RAM (driver
>>>>> managed)" adds confusion for casual users of /proc/iomem and for clued
>>>>> in tools they have the parent association to decide policy.
>>>>
>>>> Not sure if I understand correctly, so bear with me :).
>>>>
>>>> Adding or not adding stuff to /sys/firmware/memmap is not a policy
>>>> decision. If it's not part of the raw firmware-provided memmap, it has
>>>> nothing to do in /sys/firmware/memmap. That's what the documentation
>>>> from 2008 tells us.
>>>
>>> It just occurs to me that there are valid cases for both wanting to
>>> start over with driver managed memory with a kexec and keeping it in
>>> the map.
>>
>> Yes, there might be valid cases. My gut feeling is that in the general
>> case, you want to let the kexec kernel implement a policy/ let the user
>> in the new system decide.
>>
>> But as I said, you can implement in kexec-tools whatever policy you
>> want. It has access to all information.
> 
> Right, so why is a new type needed if all the information is there by
> other means?

You mean "System RAM (driver managed)" in /proc/iomem? See below for more.

> 
>>> Consider the case of EFI Special Purpose (SP) Memory that is
>>> marked EFI Conventional Memory with the SP attribute. In that case the
>>> firmware memory map marked it as conventional RAM, but the kernel
>>> optionally marks it as System RAM vs Soft Reserved. The 2008 patch
>>> simply does not consider that case. I'm not sure strict textualism
>>> works for coding decisions.
>>
>> I am no expert on that matter (esp EFI). But looking at the users of
>> firmware_map_add_early(), the single user is in arch/x86/kernel/e820.c
>> . So the single source of /sys/firmware/memmap is (besides hotplug) e820.
>>
>> "'e820_table_firmware': the original firmware version passed to us by
>> the bootloader - not modified by the kernel. ... inform the user about
>> the firmware's notion of memory layout via /sys/firmware/memmap"
>> (arch/x86/kernel/e820.c)
>>
>> How is the EFI Special Purpose (SP) Memory represented in e820?
>> /sys/firmware/memmap is really simple: just dump in e820. No policies IIUC.
> 
> e820 now has a Soft Reserved translation for this which means "try to
> reserve, but treat as System RAM is ok too". It seems generically
> useful to me that the toggle for determining whether Soft Reserved or
> System RAM shows up in /sys/firmware/memmap is a determination that
> policy can make. The kernel need not preemptively block it.

So, I think I have to clarify something here. There are two ways to kexec:

1. kexec_load(): User space (kexec-tools) crafts the memmap (e.g., using
/sys/firmware/memmap on x86-64) and selects memory where to place the
kexec images (e.g., using /proc/iomem)

2. kexec_file_load(): The kernel reuses the (basically) raw firmware
memmap and selects memory where to place kexec images.

We are talking about changing 1 to behave like 2 with regard to
dax/kmem. Currently, 2 does not add any hotplugged memory to the
fixed-up e820; that should be fixed for hotplugged DIMMs, which
would appear in e820 after a reboot.

Now, all these policy discussions are nice and fun, but I don't really
see a good reason to (ab)use /sys/firmware/memmap for that (e.g., parent
properties). If you want to make this configurable, then, e.g.,
add a way to configure it in the kernel (for example, along with
kmem) so that 1. and 2. behave the same way. Otherwise, you can really
only change 1.


Now, let's clarify what I want regarding virtio-mem:

1. kexec should not add virtio-mem memory to the initial firmware
   memmap. The driver has to be in charge as discussed.
2. kexec should not place kexec images onto virtio-mem memory. That
   would end badly.
3. kexec should still dump virtio-mem memory via kdump.

This has to work with both kexec_load() and kexec_file_load(). In
theory, it also has to work on other architectures (especially those
without /sys/firmware/memmap). kexec-tools needs access to this
information to figure out what to do.

Regarding 1:
- kexec_file_load(): works out of the box currently.
- kexec_load(): Don't create entries in /sys/firmware/memmap (for
  reasons discussed)
Regarding 2:
- kexec_file_load(): tag the resources as IORESOURCE_MEM_DRIVER_MANAGED
  (inspired by Eric)
- kexec_load(): indicate the memory as "System RAM (driver managed)"
Regarding 3:
- Same as 2. kexec-tools needs to be taught to properly consider the
  memory during kdump.
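Read together, the three requirements and how they are met for kexec_load() amount to a per-range decision. A minimal sketch, assuming a hypothetical user-space helper that only sees the resource name from /proc/iomem; `struct kexec_policy` and `policy_for()` are illustrative names, not existing kexec-tools code:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: what a kexec_load()-style tool may do with one RAM range. */
struct kexec_policy {
	bool add_to_memmap;   /* requirement 1: hand the range to the new kernel */
	bool place_images;    /* requirement 2: kexec images may be loaded here */
	bool dump_via_kdump;  /* requirement 3: include the range in a crash dump */
};

/* Map a /proc/iomem resource name to the rules discussed above. */
static struct kexec_policy policy_for(const char *name)
{
	struct kexec_policy p = { false, false, false };

	if (strcmp(name, "System RAM (driver managed)") == 0) {
		/* The driver stays in charge: not in the memmap, never a load
		 * target, but its contents are still worth dumping. */
		p.dump_via_kdump = true;
	} else if (strcmp(name, "System RAM") == 0) {
		p.add_to_memmap = true;
		p.place_images = true;
		p.dump_via_kdump = true;
	}
	/* Anything else (reserved, device memory, ...) is left alone. */
	return p;
}
```

The only difference between the two RAM flavours is that driver-managed memory is excluded from the memmap and from image placement while still being dumped.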

Now, you are asking: why "System RAM (driver managed)"? I don't think
it's strictly needed right now, but it feels cleaner. E.g., for
virtio-mem the current plan is to have /proc/iomem look like

            :/# cat /proc/iomem
            [...]
            140000000-333ffffff : virtio-mem (virtio0)
              140000000-147ffffff : System RAM (driver managed)
              148000000-14fffffff : System RAM (driver managed)
              150000000-157ffffff : System RAM (driver managed)

From the hierarchy alone, one could tell that this memory is
special. kexec-tools currently skips it in either form.
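As a sketch of how a tool might recover that hierarchy, assuming /proc/iomem's usual two-space indentation per nesting level; `struct iomem_entry`, `parse_iomem_line()` and `looks_driver_managed()` are hypothetical helpers, and the nesting test is only a heuristic:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem line. */
struct iomem_entry {
	unsigned long long start, end;
	int depth;            /* nesting level: two leading spaces per level */
	char name[64];
};

/* Parse e.g. "  140000000-147ffffff : System RAM (driver managed)".
 * Returns true if the line had the expected "start-end : name" shape. */
static bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
	int spaces = 0;

	while (line[spaces] == ' ')
		spaces++;
	e->depth = spaces / 2;
	return sscanf(line + spaces, "%llx-%llx : %63[^\n]",
		      &e->start, &e->end, e->name) == 3;
}

/* Heuristic: "System RAM" nested below another resource (such as
 * "dax0.0" or "virtio-mem (virtio0)"), or explicitly tagged, was most
 * likely added by a driver rather than described by firmware. */
static bool looks_driver_managed(const struct iomem_entry *e)
{
	if (strncmp(e->name, "System RAM", 10) != 0)
		return false;
	return e->depth > 0 ||
	       strcmp(e->name, "System RAM (driver managed)") == 0;
}
```

Fed the virtio-mem listing above, each nested `System RAM (driver managed)` line is flagged while the `virtio-mem (virtio0)` parent is not; on a real system the input would come from reading /proc/iomem line by line.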

If we all agree here, that we can drop it, then let's drop it,
especially if it would allow dax/kmem to use the same mechanism I am
proposing here for virtio-mem.


Now, it would be fairly simple to add a config option for dax/kmem,
making it configurable in the kernel whether to add memory via
MHP_DRIVER_MANAGED or just as we do now. It would contradict the
"raw firmware/prov..." description of /sys/firmware/memmap, but hey,
somebody explicitly configured it, so it can't be wrong.

-- 
Thanks,

David / dhildenb



* Re: [RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm
From: Christopher M. Riedl @ 2020-05-01 20:46 UTC
  To: Christophe Leroy, linuxppc-dev, kernel-hardening
In-Reply-To: <d481ec66-8e14-614f-8e33-d381ce606bc5@c-s.fr>

On Wed Apr 29, 2020 at 7:48 AM, Christophe Leroy wrote:
> On 29/04/2020 at 04:05, Christopher M. Riedl wrote:
> > x86 supports the notion of a temporary mm which restricts access to
> > temporary PTEs to a single CPU. A temporary mm is useful for situations
> > where a CPU needs to perform sensitive operations (such as patching a
> > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> > said mappings to other CPUs. A side benefit is that other CPU TLBs do
> > not need to be flushed when the temporary mm is torn down.
> > 
> > Mappings in the temporary mm can be set in the userspace portion of the
> > address-space.
> > 
> > Interrupts must be disabled while the temporary mm is in use. HW
> > breakpoints, which may have been set by userspace as watchpoints on
> > addresses now within the temporary mm, are saved and disabled when
> > loading the temporary mm. The HW breakpoints are restored when unloading
> > the temporary mm. All HW breakpoints are indiscriminately disabled while
> > the temporary mm is in use.
>
> Why do we need to use a temporary mm all the time ?
>

I'm not sure I understand: the temporary mm is only used for kernel
patching in this series. There may be other uses in the future where
it's beneficial to keep mappings local to one CPU.

> 
> Doesn't each CPU have its own mm already ? Only the upper address space
> is shared between all mm's but each mm has its own lower address space,
> at least when it is running a user process. Why not just use that mm ?
> As we are mapping then unmapping with interrupts disabled, there is no
> risk at all that the user starts running while the patch page is mapped,
> so I'm not sure why switching to a temporary mm is needed.
>
> 

I suppose that's an option, but then we have to save and restore the
mapping which we temporarily "steal" from userspace. I admit I didn't
consider that as an option when I started this series based on the x86
patches. I think it's cleaner to switch mm, but that's a rather weak
argument. Are you concerned about performance with the temporary mm?
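To make the trade-off concrete, here is a toy user-space model of the alternative Christophe describes: borrow one slot of the current mm's page table, install the patch mapping, write, and put the user's entry back. `struct toy_mm` and `patch_via_stolen_slot()` are hypothetical, and a real version would also need a local TLB flush for the borrowed address:

```c
#include <stdint.h>

#define TOY_NR_SLOTS 8

struct toy_mm {
	uint64_t pte[TOY_NR_SLOTS];   /* toy "page table": one entry per user page */
};

/* Interrupts are assumed disabled by the caller, so userspace cannot run
 * while the borrowed slot points at the patch page. */
static void patch_via_stolen_slot(struct toy_mm *cur, int slot, uint64_t patch_pte,
				  uint32_t *target, uint32_t instr)
{
	uint64_t saved = cur->pte[slot];   /* remember what userspace had mapped */

	cur->pte[slot] = patch_pte;        /* borrow the slot for the patch page */
	*target = instr;                   /* stands in for the write via that mapping */
	cur->pte[slot] = saved;            /* give the slot back to userspace */
}
```

The extra bookkeeping is the `saved` entry. Note also that the temporary-mm patch tracks `is_kernel_thread` because a kernel thread has no user mm of its own to borrow from, a case this toy ignores.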

>
> > 
> > Based on x86 implementation:
> > 
> > commit cefa929c034e
> > ("x86/mm: Introduce temporary mm structs")
> > 
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
>
> Christophe



* 5.7-rc interrupt_return Unrecoverable exception 380
From: Hugh Dickins @ 2020-05-01 20:38 UTC
  To: Nick Piggin; +Cc: Michal Suchanek, Hugh Dickins, linuxppc-dev

Hi Nick,

I've been getting an "Unrecoverable exception 380" after a few hours
of load on the G5 (yes, that G5!) with 5.7-rc: when interrupt_return
checks lazy_irq_pending, it crashes at check_preemption_disabled+0x24
with CONFIG_DEBUG_PREEMPT=y.

check_preemption_disabled():
lib/smp_processor_id.c:13
   0:	7c 08 02 a6 	mflr    r0
   4:	fb e1 ff f8 	std     r31,-8(r1)
   8:	fb 61 ff d8 	std     r27,-40(r1)
   c:	fb 81 ff e0 	std     r28,-32(r1)
  10:	fb a1 ff e8 	std     r29,-24(r1)
  14:	fb c1 ff f0 	std     r30,-16(r1)
get_current():
arch/powerpc/include/asm/current.h:20
  18:	eb ed 01 88 	ld      r31,392(r13)
check_preemption_disabled():
lib/smp_processor_id.c:13
  1c:	f8 01 00 10 	std     r0,16(r1)
  20:	f8 21 ff 61 	stdu    r1,-160(r1)
__read_once_size():
include/linux/compiler.h:199
  24:	81 3f 00 00 	lwz     r9,0(r31)
check_preemption_disabled():
lib/smp_processor_id.c:14
  28:	a3 cd 00 02 	lhz     r30,2(r13)

I don't read ppc assembly, and have not jotted down the registers,
but hope you can make sense of it. I get around it with the patch
below (just avoiding the debug), but have no idea whether it's a
necessary fix or a hacky workaround.

Hugh

--- 5.7-rc3/arch/powerpc/include/asm/hw_irq.h	2020-04-12 16:24:29.802769727 -0700
+++ linux/arch/powerpc/include/asm/hw_irq.h	2020-04-27 11:31:10.000000000 -0700
@@ -252,7 +252,7 @@ static inline bool arch_irqs_disabled(vo
 
 static inline bool lazy_irq_pending(void)
 {
-	return !!(get_paca()->irq_happened & ~PACA_IRQ_HARD_DIS);
+	return !!(local_paca->irq_happened & ~PACA_IRQ_HARD_DIS);
 }
 
 /*
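For context: with CONFIG_DEBUG_PREEMPT=y, powerpc's get_paca() is defined as `((void) debug_smp_processor_id(), local_paca)`, so every call first runs the preemption check (the check_preemption_disabled() frame in the disassembly above, which loads `current` and dereferences it) before yielding the r13-based `local_paca`. The patch above simply skips that check. A user-space toy with the same macro shape; every name here is a hypothetical stand-in, not the real powerpc definition:

```c
/* Toy stand-ins; none of these are the real powerpc definitions. */
struct toy_paca {
	unsigned char irq_happened;
};

static struct toy_paca the_paca;
static int debug_checks;            /* counts how often the debug hook ran */

static int toy_debug_smp_processor_id(void)
{
	debug_checks++;                 /* the real hook inspects current's preempt count */
	return 0;
}

#define TOY_PACA_IRQ_HARD_DIS 0x01
#define toy_local_paca (&the_paca)
/* Same shape as get_paca() under CONFIG_DEBUG_PREEMPT: run the check,
 * discard its result, then yield local_paca. */
#define toy_get_paca() ((void)toy_debug_smp_processor_id(), toy_local_paca)

static int lazy_irq_pending_debug(void)     /* before the patch */
{
	return !!(toy_get_paca()->irq_happened & ~TOY_PACA_IRQ_HARD_DIS);
}

static int lazy_irq_pending_nodebug(void)   /* after the patch */
{
	return !!(toy_local_paca->irq_happened & ~TOY_PACA_IRQ_HARD_DIS);
}
```

Only the get_paca()-style variant bumps `debug_checks`; both return the same answer. Whether skipping the check fixes the problem or just hides it is exactly the open question above.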


* Re: [RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm
From: Christopher M. Riedl @ 2020-05-01 20:30 UTC
  To: Christophe Leroy, linuxppc-dev, kernel-hardening
In-Reply-To: <df3d65fe-0c13-10dc-8508-b59b6daa3fdc@c-s.fr>

On Wed Apr 29, 2020 at 7:39 AM, Christophe Leroy wrote:
> On 29/04/2020 at 04:05, Christopher M. Riedl wrote:
> > x86 supports the notion of a temporary mm which restricts access to
> > temporary PTEs to a single CPU. A temporary mm is useful for situations
> > where a CPU needs to perform sensitive operations (such as patching a
> > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> > said mappings to other CPUs. A side benefit is that other CPU TLBs do
> > not need to be flushed when the temporary mm is torn down.
> > 
> > Mappings in the temporary mm can be set in the userspace portion of the
> > address-space.
> > 
> > Interrupts must be disabled while the temporary mm is in use. HW
> > breakpoints, which may have been set by userspace as watchpoints on
> > addresses now within the temporary mm, are saved and disabled when
> > loading the temporary mm. The HW breakpoints are restored when unloading
> > the temporary mm. All HW breakpoints are indiscriminately disabled while
> > the temporary mm is in use.
> > 
> > Based on x86 implementation:
> > 
> > commit cefa929c034e
> > ("x86/mm: Introduce temporary mm structs")
> > 
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> > ---
> >   arch/powerpc/include/asm/debug.h       |  1 +
> >   arch/powerpc/include/asm/mmu_context.h | 54 ++++++++++++++++++++++++++
> >   arch/powerpc/kernel/process.c          |  5 +++
> >   3 files changed, 60 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
> > index 7756026b95ca..b945bc16c932 100644
> > --- a/arch/powerpc/include/asm/debug.h
> > +++ b/arch/powerpc/include/asm/debug.h
> > @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) { return 0; }
> >   static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
> >   #endif
> >   
> > +void __get_breakpoint(struct arch_hw_breakpoint *brk);
> >   void __set_breakpoint(struct arch_hw_breakpoint *brk);
> >   bool ppc_breakpoint_available(void);
> >   #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> > diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> > index 360367c579de..57a8695fe63f 100644
> > --- a/arch/powerpc/include/asm/mmu_context.h
> > +++ b/arch/powerpc/include/asm/mmu_context.h
> > @@ -10,6 +10,7 @@
> >   #include <asm/mmu.h>	
> >   #include <asm/cputable.h>
> >   #include <asm/cputhreads.h>
> > +#include <asm/debug.h>
> >   
> >   /*
> >    * Most if the context management is out of line
> > @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
> >   	return 0;
> >   }
> >   
> > +struct temp_mm {
> > +	struct mm_struct *temp;
> > +	struct mm_struct *prev;
> > +	bool is_kernel_thread;
> > +	struct arch_hw_breakpoint brk;
> > +};
> > +
> > +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> > +{
> > +	temp_mm->temp = mm;
> > +	temp_mm->prev = NULL;
> > +	temp_mm->is_kernel_thread = false;
> > +	memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> > +}
> > +
> > +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> > +{
> > +	lockdep_assert_irqs_disabled();
> > +
> > +	temp_mm->is_kernel_thread = current->mm == NULL;
> > +	if (temp_mm->is_kernel_thread)
> > +		temp_mm->prev = current->active_mm;
> > +	else
> > +		temp_mm->prev = current->mm;
> > +
> > +	/*
> > +	 * Hash requires a non-NULL current->mm to allocate a userspace address
> > +	 * when handling a page fault. Does not appear to hurt in Radix either.
> > +	 */
> > +	current->mm = temp_mm->temp;
> > +	switch_mm_irqs_off(NULL, temp_mm->temp, current);
> > +
> > +	if (ppc_breakpoint_available()) {
> > +		__get_breakpoint(&temp_mm->brk);
> > +		if (temp_mm->brk.type != 0)
> > +			hw_breakpoint_disable();
> > +	}
> > +}
> > +
> > +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
>
>
> Not sure "unuse" is the best name, although I don't have a better
> suggestion at the moment. If we are no longer using temporary_mm, what are
> we using now?
>

I'm not too fond of 'unuse' either, but it's what x86 uses and I
couldn't come up with anything better on the spot. Maybe 'undo' is
better since we're switching back to whatever mm was in use before?

> > +{
> > +	lockdep_assert_irqs_disabled();
> > +
> > +	if (temp_mm->is_kernel_thread)
> > +		current->mm = NULL;
> > +	else
> > +		current->mm = temp_mm->prev;
> > +	switch_mm_irqs_off(NULL, temp_mm->prev, current);
> > +
> > +	if (ppc_breakpoint_available() && temp_mm->brk.type != 0)
> > +		__set_breakpoint(&temp_mm->brk);
> > +}
> > +
> >   #endif /* __KERNEL__ */
> >   #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 9c21288f8645..ec4cf890d92c 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -800,6 +800,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
> >   	return 0;
> >   }
> >   
> > +void __get_breakpoint(struct arch_hw_breakpoint *brk)
> > +{
> > +	memcpy(brk, this_cpu_ptr(&current_brk), sizeof(*brk));
> > +}
> > +
> >   void __set_breakpoint(struct arch_hw_breakpoint *brk)
> >   {
> >   	memcpy(this_cpu_ptr(&current_brk), brk, sizeof(*brk));
> > 
>
> Christophe

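The save/restore logic of use_temporary_mm()/unuse_temporary_mm() can be exercised outside the kernel with toy types. This is a sketch under loud assumptions: `struct task`, `cur_task`, `cpu_brk`, `toy_switch_mm()` and the `irqs_disabled` flag are hypothetical stand-ins for `current`, the per-CPU `current_brk`, switch_mm_irqs_off() and lockdep, and the `ppc_breakpoint_available()` guard is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the kernel types; not the real definitions. */
struct mm_struct { int id; };
struct arch_hw_breakpoint { unsigned long address; int type; };

struct task {
	struct mm_struct *mm;         /* NULL for kernel threads */
	struct mm_struct *active_mm;  /* mm currently loaded on this CPU */
};

static struct task cur_task;                /* models current */
static struct arch_hw_breakpoint cpu_brk;   /* models per-CPU current_brk */
static bool irqs_disabled = true;           /* callers must disable IRQs */

static void toy_switch_mm(struct mm_struct *next) { cur_task.active_mm = next; }

struct temp_mm {
	struct mm_struct *temp, *prev;
	bool is_kernel_thread;
	struct arch_hw_breakpoint brk;
};

static void use_temporary_mm(struct temp_mm *t)
{
	assert(irqs_disabled);                  /* models lockdep_assert_irqs_disabled() */
	t->is_kernel_thread = cur_task.mm == NULL;
	t->prev = t->is_kernel_thread ? cur_task.active_mm : cur_task.mm;
	cur_task.mm = t->temp;
	toy_switch_mm(t->temp);
	t->brk = cpu_brk;                       /* models __get_breakpoint() */
	if (t->brk.type != 0)
		memset(&cpu_brk, 0, sizeof(cpu_brk)); /* models hw_breakpoint_disable() */
}

static void unuse_temporary_mm(struct temp_mm *t)
{
	assert(irqs_disabled);
	cur_task.mm = t->is_kernel_thread ? NULL : t->prev;
	toy_switch_mm(t->prev);
	if (t->brk.type != 0)
		cpu_brk = t->brk;                   /* models __set_breakpoint() */
}
```

Whatever the final name ("unuse" or "undo"), the second call's job is visible here: restore `prev`, including `mm == NULL` for a kernel thread, and re-arm any breakpoint that was saved.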


* Re: [RFC PATCH v2 3/5] powerpc/lib: Use a temporary mm for code patching
From: Christopher M. Riedl @ 2020-05-01 20:28 UTC
  To: Christophe Leroy, linuxppc-dev, kernel-hardening
In-Reply-To: <ce7d8643-d7bc-5d1a-6098-2352550e3793@c-s.fr>

On Wed Apr 29, 2020 at 7:52 AM, Christophe Leroy wrote:
> On 29/04/2020 at 04:05, Christopher M. Riedl wrote:
> > Currently, code patching a STRICT_KERNEL_RWX kernel exposes the temporary
> > mappings to other CPUs. These mappings should be kept local to the CPU
> > doing the patching. Use the pre-initialized temporary mm and patching
> > address for this purpose. Also add a check after patching to ensure the
> > patch succeeded.
> > 
> > Use the KUAP functions on non-BOOK3S_64 platforms since the temporary
> > mapping for patching uses a userspace address (to keep the mapping
> > local). On BOOK3S_64 platforms hash does not implement KUAP and on radix
> > the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR
> > (KUAP) protection is ignored (see PowerISA v3.0b, Fig. 35).
> > 
> > Based on x86 implementation:
> > 
> > commit b3fd8e83ada0
> > ("x86/alternatives: Use temporary mm for text poking")
> > 
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> > ---
> >   arch/powerpc/lib/code-patching.c | 149 ++++++++++++-------------------
> >   1 file changed, 55 insertions(+), 94 deletions(-)
> > 
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 259c19480a85..26f06cdb5d7e 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -19,6 +19,7 @@
> >   #include <asm/page.h>
> >   #include <asm/code-patching.h>
> >   #include <asm/setup.h>
> > +#include <asm/mmu_context.h>
> >   
> >   static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
> >   			       unsigned int *patch_addr)
> > @@ -72,101 +73,58 @@ void __init poking_init(void)
> >   	pte_unmap_unlock(ptep, ptl);
> >   }
> >   
> > -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> > -
> > -static int text_area_cpu_up(unsigned int cpu)
> > -{
> > -	struct vm_struct *area;
> > -
> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > -	if (!area) {
> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > -			cpu);
> > -		return -1;
> > -	}
> > -	this_cpu_write(text_poke_area, area);
> > -
> > -	return 0;
> > -}
> > -
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > -	free_vm_area(this_cpu_read(text_poke_area));
> > -	return 0;
> > -}
> > -
> > -/*
> > - * Run as a late init call. This allows all the boot time patching to be done
> > - * simply by patching the code, and then we're called here prior to
> > - * mark_rodata_ro(), which happens after all init calls are run. Although
> > - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
> > - * it as being preferable to a kernel that will crash later when someone tries
> > - * to use patch_instruction().
> > - */
> > -static int __init setup_text_poke_area(void)
> > -{
> > -	BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> > -		"powerpc/text_poke:online", text_area_cpu_up,
> > -		text_area_cpu_down));
> > -
> > -	return 0;
> > -}
> > -late_initcall(setup_text_poke_area);
> > +struct patch_mapping {
> > +	spinlock_t *ptl; /* for protecting pte table */
> > +	pte_t *ptep;
> > +	struct temp_mm temp_mm;
> > +};
> >   
> >   /*
> >    * This can be called for kernel text or a module.
> >    */
> > -static int map_patch_area(void *addr, unsigned long text_poke_addr)
> > +static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
> >   {
> > -	unsigned long pfn;
> > -	int err;
> > +	struct page *page;
> > +	pte_t pte;
> > +	pgprot_t pgprot;
> >   
> >   	if (is_vmalloc_addr(addr))
> > -		pfn = vmalloc_to_pfn(addr);
> > +		page = vmalloc_to_page(addr);
> >   	else
> > -		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> > +		page = virt_to_page(addr);
> >   
> > -	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
> > +	if (radix_enabled())
> > +		pgprot = PAGE_KERNEL;
> > +	else
> > +		pgprot = PAGE_SHARED;
> >   
> > -	pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
> > -	if (err)
> > +	patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
> > +					     &patch_mapping->ptl);
> > +	if (unlikely(!patch_mapping->ptep)) {
> > +		pr_warn("map patch: failed to allocate pte for patching\n");
> >   		return -1;
> > +	}
> > +
> > +	pte = mk_pte(page, pgprot);
> > +	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
> > +		pte = pte_mkdirty(pte);
>
> Why only when CONFIG_PPC_BOOK3S_64 is not set?
>
> PAGE_KERNEL should already be dirty, so making it dirty all the time
> shouldn't hurt.

OK, I'll remove this check to simplify.

> > +	set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
> > +
> > +	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > +	use_temporary_mm(&patch_mapping->temp_mm);
> >   
> >   	return 0;
> >   }
> >   
> > -static inline int unmap_patch_area(unsigned long addr)
> > +static void unmap_patch(struct patch_mapping *patch_mapping)
> >   {
> > -	pte_t *ptep;
> > -	pmd_t *pmdp;
> > -	pud_t *pudp;
> > -	pgd_t *pgdp;
> > -
> > -	pgdp = pgd_offset_k(addr);
> > -	if (unlikely(!pgdp))
> > -		return -EINVAL;
> > -
> > -	pudp = pud_offset(pgdp, addr);
> > -	if (unlikely(!pudp))
> > -		return -EINVAL;
> > -
> > -	pmdp = pmd_offset(pudp, addr);
> > -	if (unlikely(!pmdp))
> > -		return -EINVAL;
> > -
> > -	ptep = pte_offset_kernel(pmdp, addr);
> > -	if (unlikely(!ptep))
> > -		return -EINVAL;
> > +	/* In hash, pte_clear flushes the tlb */
> > +	pte_clear(patching_mm, patching_addr, patch_mapping->ptep);
> > +	unuse_temporary_mm(&patch_mapping->temp_mm);
> >   
> > -	pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr);
> > -
> > -	/*
> > -	 * In hash, pte_clear flushes the tlb, in radix, we have to
> > -	 */
> > -	pte_clear(&init_mm, addr, ptep);
> > -	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > -
> > -	return 0;
> > +	/* In radix, we have to explicitly flush the tlb (no-op in hash) */
> > +	local_flush_tlb_mm(patching_mm);
> > +	pte_unmap_unlock(patch_mapping->ptep, patch_mapping->ptl);
> >   }
> >   
> >   static int do_patch_instruction(unsigned int *addr, unsigned int instr)
> > @@ -174,33 +132,36 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
> >   	int err;
> >   	unsigned int *patch_addr = NULL;
> >   	unsigned long flags;
> > -	unsigned long text_poke_addr;
> > -	unsigned long kaddr = (unsigned long)addr;
> > +	struct patch_mapping patch_mapping;
> >   
> >   	/*
> > -	 * During early early boot patch_instruction is called
> > -	 * when text_poke_area is not ready, but we still need
> > -	 * to allow patching. We just do the plain old patching
> > +	 * The patching_mm is initialized before calling mark_rodata_ro. Prior
> > +	 * to this, patch_instruction is called when we don't have (and don't
> > +	 * need) the patching_mm so just do plain old patching.
> >   	 */
> > -	if (!this_cpu_read(text_poke_area))
> > +	if (!patching_mm)
> >   		return raw_patch_instruction(addr, instr);
> >   
> >   	local_irq_save(flags);
> >   
> > -	text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
> > -	if (map_patch_area(addr, text_poke_addr)) {
> > -		err = -1;
> > +	err = map_patch(addr, &patch_mapping);
> > +	if (err)
> >   		goto out;
> > -	}
> >   
> > -	patch_addr = (unsigned int *)(text_poke_addr) +
> > -			((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
> > +	patch_addr = (unsigned int *)(patching_addr | offset_in_page(addr));
> >   
> > -	__patch_instruction(addr, instr, patch_addr);
> > +	if (!radix_enabled())
> > +		allow_write_to_user(patch_addr, sizeof(instr));
> > +	err = __patch_instruction(addr, instr, patch_addr);
> > +	if (!radix_enabled())
> > +		prevent_write_to_user(patch_addr, sizeof(instr));
> >   
> > -	err = unmap_patch_area(text_poke_addr);
> > -	if (err)
> > -		pr_warn("failed to unmap %lx\n", text_poke_addr);
> > +	unmap_patch(&patch_mapping);
> > +	/*
> > +	 * Something is wrong if what we just wrote doesn't match what we
> > +	 * think we just wrote.
> > +	 */
> > +	WARN_ON(*addr != instr);
> >   
> >   out:
> >   	local_irq_restore(flags);
> > 
>
> Christophe


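The new `patch_addr` computation can be checked against the old one with plain integer arithmetic. A sketch, assuming 4K pages and a page-aligned patching address (the OR only equals an addition when the low bits of the base are zero); the `TOY_`, `old_` and `new_` names are hypothetical:

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL                   /* assumption: 4K pages */
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))   /* same meaning as PAGE_MASK */

/* Old scheme: (unsigned int *)text_poke_addr + word index, spelled out in bytes. */
static uintptr_t old_patch_addr(uintptr_t text_poke_addr, uintptr_t kaddr)
{
	uintptr_t word_index = (kaddr & ~TOY_PAGE_MASK) / sizeof(unsigned int);

	return text_poke_addr + word_index * sizeof(unsigned int);
}

/* New scheme: patching_addr | offset_in_page(addr). */
static uintptr_t new_patch_addr(uintptr_t patching_addr, uintptr_t kaddr)
{
	return patching_addr | (kaddr & (TOY_PAGE_SIZE - 1));
}
```

Both agree for word-aligned instruction addresses (e.g. 0x123458 maps to 0x10000458 from a base of 0x10000000); the old form silently rounds a misaligned address down to a 4-byte boundary, while the new one keeps the byte offset.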

* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: Dan Williams @ 2020-05-01 20:12 UTC
  To: David Hildenbrand
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <04242d48-5fa9-6da4-3e4a-991e401eb580@redhat.com>

On Fri, May 1, 2020 at 12:18 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 01.05.20 20:43, Dan Williams wrote:
> > On Fri, May 1, 2020 at 11:14 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 01.05.20 20:03, Dan Williams wrote:
> >>> On Fri, May 1, 2020 at 10:51 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 01.05.20 19:45, David Hildenbrand wrote:
> >>>>> On 01.05.20 19:39, Dan Williams wrote:
> >>>>>> On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>>>>
> >>>>>>> On 01.05.20 18:56, Dan Williams wrote:
> >>>>>>>> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 01.05.20 00:24, Andrew Morton wrote:
> >>>>>>>>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Why does the firmware map support hotplug entries?
> >>>>>>>>>>>
> >>>>>>>>>>> I assume:
> >>>>>>>>>>>
> >>>>>>>>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
> >>>>>>>>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
> >>>>>>>>>>> get hotplugged on real HW, they get added to e820. Same applies to
> >>>>>>>>>>> memory added via HyperV balloon (unless memory is unplugged via
> >>>>>>>>>>> ballooning and you reboot ... the e820 is changed as well). I assume
> >>>>>>>>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
> >>>>>>>>>>>
> >>>>>>>>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> But I assume only Andrew can enlighten us.
> >>>>>>>>>>>
> >>>>>>>>>>> @Andrew, any guidance here? Should we really add all memory to the
> >>>>>>>>>>> firmware memmap, even if this contradicts with the existing
> >>>>>>>>>>> documentation? (especially, if the actual firmware memmap will *not*
> >>>>>>>>>>> contain that memory after a reboot)
> >>>>>>>>>>
> >>>>>>>>>> For some reason that patch is misattributed - it was authored by
> >>>>>>>>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
> >>>>>>>>>> a decade.  I looked through the email discussion from that time and I'm
> >>>>>>>>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
> >>>>>>>>>> review comments.
> >>>>>>>>>
> >>>>>>>>> Okay, thanks for checking. I think the documentation from 2008 is pretty
> >>>>>>>>> clear what has to be done here. I will add some of these details to the
> >>>>>>>>> patch description.
> >>>>>>>>>
> >>>>>>>>> Also, now that I know that esp. kexec-tools already don't consider
> >>>>>>>>> dax/kmem memory properly (memory will not get dumped via kdump) and
> >>>>>>>>> won't really suffer from a name change in /proc/iomem, I will go back to
> >>>>>>>>> the MHP_DRIVER_MANAGED approach and
> >>>>>>>>> 1. Don't create firmware memmap entries
> >>>>>>>>> 2. Name the resource "System RAM (driver managed)"
> >>>>>>>>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
> >>>>>>>>>
> >>>>>>>>> This way, kernel users and user space can figure out that this memory
> >>>>>>>>> has different semantics and handle it accordingly - I think that was
> >>>>>>>>> what Eric was asking for.
> >>>>>>>>>
> >>>>>>>>> Of course, open for suggestions.
> >>>>>>>>
> >>>>>>>> I'm still more of a fan of this being communicated by "System RAM"
> >>>>>>>
> >>>>>>> I was mentioning somewhere in this thread that "System RAM" inside a
> >>>>>>> hierarchy (like dax/kmem) will already be basically ignored by
> >>>>>>> kexec-tools. So, placing it inside a hierarchy already makes it look
> >>>>>>> special.
> >>>>>>>
> >>>>>>> But after all, as we have to change kexec-tools either way, we can
> >>>>>>> directly go ahead and flag it properly as special (in case there will
> >>>>>>> ever be other cases where we could no longer distinguish it).
> >>>>>>>
> >>>>>>>> being parented especially because that tells you something about how
> >>>>>>>> the memory is driver-managed and which mechanism might be in play.
> >>>>>>>
> >>>>>>> This could be communicated to some degree via the resource hierarchy.
> >>>>>>>
> >>>>>>> E.g.,
> >>>>>>>
> >>>>>>>             [root@localhost ~]# cat /proc/iomem
> >>>>>>>             ...
> >>>>>>>             140000000-33fffffff : Persistent Memory
> >>>>>>>               140000000-1481fffff : namespace0.0
> >>>>>>>               150000000-33fffffff : dax0.0
> >>>>>>>                 150000000-33fffffff : System RAM (driver managed)
> >>>>>>>
> >>>>>>> vs.
> >>>>>>>
> >>>>>>>            :/# cat /proc/iomem
> >>>>>>>             [...]
> >>>>>>>             140000000-333ffffff : virtio-mem (virtio0)
> >>>>>>>               140000000-147ffffff : System RAM (driver managed)
> >>>>>>>               148000000-14fffffff : System RAM (driver managed)
> >>>>>>>               150000000-157ffffff : System RAM (driver managed)
> >>>>>>>
> >>>>>>> Good enough for my taste.
> >>>>>>>
> >>>>>>>> What about adding an optional /sys/firmware/memmap/X/parent attribute.
> >>>>>>>
> >>>>>>> I really don't want any firmware memmap entries for something that is
> >>>>>>> not part of the firmware provided memmap. In addition,
> >>>>>>> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
> >>>>>>> and two arm configs enable it at all.
> >>>>>>>
> >>>>>>> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.
> >>>>>>
> >>>>>> I think that's a policy decision and policy decisions do not belong in
> >>>>>> the kernel. Give the tooling the opportunity to decide whether System
> >>>>>> RAM stays that way over a kexec. The parenthetical reference otherwise
> >>>>>> looks out of place to me in the /proc/iomem output. What makes it
> >>>>>> "driver managed" is how the kernel handles it, not how the kernel
> >>>>>> names it.
> >>>>>
> >>>>> At least, virtio-mem is different. It really *has to be handled* by the
> >>>>> driver. This is not a policy. It's how it works.
> >>>
> >>> ...but that's not necessarily how dax/kmem works.
> >>>
> >>
> >> Yes, and user space could still take that memory and add it to the
> >> firmware memmap if it really wants to. It knows that it is special. It
> >> can figure out that it belongs to a dax device using /proc/iomem.
> >>
> >>>>>
> >>>>
> >>>> Oh, and I don't see why "System RAM (driver managed)" would hinder any
> >>>> policy in user space to still do what it thinks is the right thing to do
> >>>> (e.g., for dax).
> >>>>
> >>>> "System RAM (driver managed)" would mean: Memory is not part of the raw
> >>>> firmware memmap. It was detected and added by a driver. Handle with
> >>>> care, this is special.
> >>>
> >>> Oh, no, I was more reacting to your, "don't update
> >>> /sys/firmware/memmap for the (driver managed) range" choice as being a
> >>> policy decision. It otherwise feels to me "System RAM (driver
> >>> managed)" adds confusion for casual users of /proc/iomem, and clued-in
> >>> tools have the parent association to decide policy.
> >>
> >> Not sure if I understand correctly, so bear with me :).
> >>
> >> Adding or not adding stuff to /sys/firmware/memmap is not a policy
> >> decision. If it's not part of the raw firmware-provided memmap, it has
> >> nothing to do in /sys/firmware/memmap. That's what the documentation
> >> from 2008 tells us.
> >
> > It just occurs to me that there are valid cases for both wanting to
> > start over with driver managed memory with a kexec and keeping it in
> > the map.
>
> Yes, there might be valid cases. My gut feeling is that in the general
> case, you want to let the kexec kernel implement a policy or let the user
> in the new system decide.
>
> But as I said, you can implement in kexec-tools whatever policy you
> want. It has access to all information.

Right, so why is a new type needed if all the information is there by
other means?

> > Consider the case of EFI Special Purpose (SP) Memory that is
> > marked EFI Conventional Memory with the SP attribute. In that case the
> > firmware memory map marked it as conventional RAM, but the kernel
> > optionally marks it as System RAM vs Soft Reserved. The 2008 patch
> > simply does not consider that case. I'm not sure strict textualism
> > works for coding decisions.
>
> I am no expert on that matter (esp. EFI). But looking at the users of
> firmware_map_add_early(), the single user is in arch/x86/kernel/e820.c.
> So the single source of /sys/firmware/memmap is (besides hotplug) e820.
>
> "'e820_table_firmware': the original firmware version passed to us by
> the bootloader - not modified by the kernel. ... inform the user about
> the firmware's notion of memory layout via /sys/firmware/memmap"
> (arch/x86/kernel/e820.c)
>
> How is the EFI Special Purpose (SP) Memory represented in e820?
> /sys/firmware/memmap is really simple: just dump in e820. No policies IIUC.

e820 now has a Soft Reserved translation for this which means "try to
reserve, but treat as System RAM is ok too". It seems generically
useful to me that the toggle for determining whether Soft Reserved or
System RAM shows up in /sys/firmware/memmap is a determination that
policy can make. The kernel need not preemptively block it.


* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: David Hildenbrand @ 2020-05-01 19:17 UTC
  To: Dan Williams
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <CAPcyv4jjrxQ27rsfmz6wYPgmedevU=KG+wZ0GOm=qiE6tqa+VA@mail.gmail.com>

On 01.05.20 20:43, Dan Williams wrote:
> On Fri, May 1, 2020 at 11:14 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 01.05.20 20:03, Dan Williams wrote:
>>> On Fri, May 1, 2020 at 10:51 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 01.05.20 19:45, David Hildenbrand wrote:
>>>>> On 01.05.20 19:39, Dan Williams wrote:
>>>>>> On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>>
>>>>>>> On 01.05.20 18:56, Dan Williams wrote:
>>>>>>>> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 01.05.20 00:24, Andrew Morton wrote:
>>>>>>>>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Why does the firmware map support hotplug entries?
>>>>>>>>>>>
>>>>>>>>>>> I assume:
>>>>>>>>>>>
>>>>>>>>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
>>>>>>>>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
>>>>>>>>>>> get hotplugged on real HW, they get added to e820. Same applies to
>>>>>>>>>>> memory added via HyperV balloon (unless memory is unplugged via
>>>>>>>>>>> ballooning and you reboot ... the e820 is changed as well). I assume
>>>>>>>>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
>>>>>>>>>>>
>>>>>>>>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> But I assume only Andrew can enlighten us.
>>>>>>>>>>>
>>>>>>>>>>> @Andrew, any guidance here? Should we really add all memory to the
>>>>>>>>>>> firmware memmap, even if this contradicts with the existing
>>>>>>>>>>> documentation? (especially, if the actual firmware memmap will *not*
>>>>>>>>>>> contain that memory after a reboot)
>>>>>>>>>>
>>>>>>>>>> For some reason that patch is misattributed - it was authored by
>>>>>>>>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
>>>>>>>>>> a decade.  I looked through the email discussion from that time and I'm
>>>>>>>>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
>>>>>>>>>> review comments.
>>>>>>>>>
>>>>>>>>> Okay, thanks for checking. I think the documentation from 2008 is pretty
>>>>>>>>> clear what has to be done here. I will add some of these details to the
>>>>>>>>> patch description.
>>>>>>>>>
>>>>>>>>> Also, now that I know that esp. kexec-tools already don't consider
>>>>>>>>> dax/kmem memory properly (memory will not get dumped via kdump) and
>>>>>>>>> won't really suffer from a name change in /proc/iomem, I will go back to
>>>>>>>>> the MHP_DRIVER_MANAGED approach and
>>>>>>>>> 1. Don't create firmware memmap entries
>>>>>>>>> 2. Name the resource "System RAM (driver managed)"
>>>>>>>>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
>>>>>>>>>
>>>>>>>>> This way, kernel users and user space can figure out that this memory
>>>>>>>>> has different semantics and handle it accordingly - I think that was
>>>>>>>>> what Eric was asking for.
>>>>>>>>>
>>>>>>>>> Of course, open for suggestions.
>>>>>>>>
>>>>>>>> I'm still more of a fan of this being communicated by "System RAM"
>>>>>>>
>>>>>>> I was mentioning somewhere in this thread that "System RAM" inside a
>>>>>>> hierarchy (like dax/kmem) will already be basically ignored by
>>>>>>> kexec-tools. So, placing it inside a hierarchy already makes it look
>>>>>>> special.
>>>>>>>
>>>>>>> But after all, as we have to change kexec-tools either way, we can
>>>>>>> directly go ahead and flag it properly as special (in case there will
>>>>>>> ever be other cases where we could no longer distinguish it).
>>>>>>>
>>>>>>>> being parented especially because that tells you something about how
>>>>>>>> the memory is driver-managed and which mechanism might be in play.
>>>>>>>
>>>>>>> This could be communicated to some degree via the resource hierarchy.
>>>>>>>
>>>>>>> E.g.,
>>>>>>>
>>>>>>>             [root@localhost ~]# cat /proc/iomem
>>>>>>>             ...
>>>>>>>             140000000-33fffffff : Persistent Memory
>>>>>>>               140000000-1481fffff : namespace0.0
>>>>>>>               150000000-33fffffff : dax0.0
>>>>>>>                 150000000-33fffffff : System RAM (driver managed)
>>>>>>>
>>>>>>> vs.
>>>>>>>
>>>>>>>            :/# cat /proc/iomem
>>>>>>>             [...]
>>>>>>>             140000000-333ffffff : virtio-mem (virtio0)
>>>>>>>               140000000-147ffffff : System RAM (driver managed)
>>>>>>>               148000000-14fffffff : System RAM (driver managed)
>>>>>>>               150000000-157ffffff : System RAM (driver managed)
>>>>>>>
>>>>>>> Good enough for my taste.
>>>>>>>
>>>>>>>> What about adding an optional /sys/firmware/memmap/X/parent attribute.
>>>>>>>
>>>>>>> I really don't want any firmware memmap entries for something that is
>>>>>>> not part of the firmware provided memmap. In addition,
>>>>>>> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
>>>>>>> and two arm configs enable it at all.
>>>>>>>
>>>>>>> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.
>>>>>>
>>>>>> I think that's a policy decision and policy decisions do not belong in
>>>>>> the kernel. Give the tooling the opportunity to decide whether System
>>>>>> RAM stays that way over a kexec. The parenthetical reference otherwise
>>>>>> looks out of place to me in the /proc/iomem output. What makes it
>>>>>> "driver managed" is how the kernel handles it, not how the kernel
>>>>>> names it.
>>>>>
>>>>> At least, virtio-mem is different. It really *has to be handled* by the
>>>>> driver. This is not a policy. It's how it works.
>>>
>>> ...but that's not necessarily how dax/kmem works.
>>>
>>
>> Yes, and user space could still take that memory and add it to the
>> firmware memmap if it really wants to. It knows that it is special. It
>> can figure out that it belongs to a dax device using /proc/iomem.
>>
>>>>>
>>>>
>>>> Oh, and I don't see why "System RAM (driver managed)" would hinder any
>>>> policy in user space to still do what it thinks is the right thing to do
>>>> (e.g., for dax).
>>>>
>>>> "System RAM (driver managed)" would mean: Memory is not part of the raw
>>>> firmware memmap. It was detected and added by a driver. Handle with
>>>> care, this is special.
>>>
>>> Oh, no, I was more reacting to your, "don't update
>>> /sys/firmware/memmap for the (driver managed) range" choice as being a
>>> policy decision. It otherwise feels to me "System RAM (driver
>>> managed)" adds confusion for casual users of /proc/iomem, and clued-in
>>> tools have the parent association to decide policy.
>>
>> Not sure if I understand correctly, so bear with me :).
>>
>> Adding or not adding stuff to /sys/firmware/memmap is not a policy
>> decision. If it's not part of the raw firmware-provided memmap, it has
>> nothing to do in /sys/firmware/memmap. That's what the documentation
>> from 2008 tells us.
> 
> It just occurs to me that there are valid cases for both wanting to
> start over with driver managed memory with a kexec and keeping it in
> the map.

Yes, there might be valid cases. My gut feeling is that in the general
case, you want to let the kexec kernel implement a policy or let the user
in the new system decide.

But as I said, you can implement in kexec-tools whatever policy you
want. It has access to all information.

> Consider the case of EFI Special Purpose (SP) Memory that is
> marked EFI Conventional Memory with the SP attribute. In that case the
> firmware memory map marked it as conventional RAM, but the kernel
> optionally marks it as System RAM vs Soft Reserved. The 2008 patch
> simply does not consider that case. I'm not sure strict textualism
> works for coding decisions.

I am no expert on that matter (esp. EFI). But looking at the users of
firmware_map_add_early(), the single user is in arch/x86/kernel/e820.c.
So the single source of /sys/firmware/memmap is (besides hotplug) e820.

"'e820_table_firmware': the original firmware version passed to us by
the bootloader - not modified by the kernel. ... inform the user about
the firmware's notion of memory layout via /sys/firmware/memmap"
(arch/x86/kernel/e820.c)

How is the EFI Special Purpose (SP) Memory represented in e820?

/sys/firmware/memmap is really simple: just dump in e820. No policies IIUC.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: Dan Williams @ 2020-05-01 18:43 UTC
  To: David Hildenbrand
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <9f3a813e-dc1d-b675-6e69-85beed3057a4@redhat.com>

On Fri, May 1, 2020 at 11:14 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 01.05.20 20:03, Dan Williams wrote:
> > On Fri, May 1, 2020 at 10:51 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 01.05.20 19:45, David Hildenbrand wrote:
> >>> On 01.05.20 19:39, Dan Williams wrote:
> >>>> On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>>
> >>>>> On 01.05.20 18:56, Dan Williams wrote:
> >>>>>> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>>>>
> >>>>>>> On 01.05.20 00:24, Andrew Morton wrote:
> >>>>>>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Why does the firmware map support hotplug entries?
> >>>>>>>>>
> >>>>>>>>> I assume:
> >>>>>>>>>
> >>>>>>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
> >>>>>>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
> >>>>>>>>> get hotplugged on real HW, they get added to e820. Same applies to
> >>>>>>>>> memory added via HyperV balloon (unless memory is unplugged via
> >>>>>>>>> ballooning and you reboot ... the e820 is changed as well). I assume
> >>>>>>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
> >>>>>>>>>
> >>>>>>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> But I assume only Andrew can enlighten us.
> >>>>>>>>>
> >>>>>>>>> @Andrew, any guidance here? Should we really add all memory to the
> >>>>>>>>> firmware memmap, even if this contradicts with the existing
> >>>>>>>>> documentation? (especially, if the actual firmware memmap will *not*
> >>>>>>>>> contain that memory after a reboot)
> >>>>>>>>
> >>>>>>>> For some reason that patch is misattributed - it was authored by
> >>>>>>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
> >>>>>>>> a decade.  I looked through the email discussion from that time and I'm
> >>>>>>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
> >>>>>>>> review comments.
> >>>>>>>
> >>>>>>> Okay, thanks for checking. I think the documentation from 2008 is pretty
> >>>>>>> clear what has to be done here. I will add some of these details to the
> >>>>>>> patch description.
> >>>>>>>
> >>>>>>> Also, now that I know that esp. kexec-tools already don't consider
> >>>>>>> dax/kmem memory properly (memory will not get dumped via kdump) and
> >>>>>>> won't really suffer from a name change in /proc/iomem, I will go back to
> >>>>>>> the MHP_DRIVER_MANAGED approach and
> >>>>>>> 1. Don't create firmware memmap entries
> >>>>>>> 2. Name the resource "System RAM (driver managed)"
> >>>>>>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
> >>>>>>>
> >>>>>>> This way, kernel users and user space can figure out that this memory
> >>>>>>> has different semantics and handle it accordingly - I think that was
> >>>>>>> what Eric was asking for.
> >>>>>>>
> >>>>>>> Of course, open for suggestions.
> >>>>>>
> >>>>>> I'm still more of a fan of this being communicated by "System RAM"
> >>>>>
> >>>>> I was mentioning somewhere in this thread that "System RAM" inside a
> >>>>> hierarchy (like dax/kmem) will already be basically ignored by
> >>>>> kexec-tools. So, placing it inside a hierarchy already makes it look
> >>>>> special.
> >>>>>
> >>>>> But after all, as we have to change kexec-tools either way, we can
> >>>>> directly go ahead and flag it properly as special (in case there will
> >>>>> ever be other cases where we could no longer distinguish it).
> >>>>>
> >>>>>> being parented especially because that tells you something about how
> >>>>>> the memory is driver-managed and which mechanism might be in play.
> >>>>>
> >>>>> This could be communicated to some degree via the resource hierarchy.
> >>>>>
> >>>>> E.g.,
> >>>>>
> >>>>>             [root@localhost ~]# cat /proc/iomem
> >>>>>             ...
> >>>>>             140000000-33fffffff : Persistent Memory
> >>>>>               140000000-1481fffff : namespace0.0
> >>>>>               150000000-33fffffff : dax0.0
> >>>>>                 150000000-33fffffff : System RAM (driver managed)
> >>>>>
> >>>>> vs.
> >>>>>
> >>>>>            :/# cat /proc/iomem
> >>>>>             [...]
> >>>>>             140000000-333ffffff : virtio-mem (virtio0)
> >>>>>               140000000-147ffffff : System RAM (driver managed)
> >>>>>               148000000-14fffffff : System RAM (driver managed)
> >>>>>               150000000-157ffffff : System RAM (driver managed)
> >>>>>
> >>>>> Good enough for my taste.
> >>>>>
> >>>>>> What about adding an optional /sys/firmware/memmap/X/parent attribute.
> >>>>>
> >>>>> I really don't want any firmware memmap entries for something that is
> >>>>> not part of the firmware provided memmap. In addition,
> >>>>> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
> >>>>> and two arm configs enable it at all.
> >>>>>
> >>>>> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.
> >>>>
> >>>> I think that's a policy decision and policy decisions do not belong in
> >>>> the kernel. Give the tooling the opportunity to decide whether System
> >>>> RAM stays that way over a kexec. The parenthetical reference otherwise
> >>>> looks out of place to me in the /proc/iomem output. What makes it
> >>>> "driver managed" is how the kernel handles it, not how the kernel
> >>>> names it.
> >>>
> >>> At least, virtio-mem is different. It really *has to be handled* by the
> >>> driver. This is not a policy. It's how it works.
> >
> > ...but that's not necessarily how dax/kmem works.
> >
>
> Yes, and user space could still take that memory and add it to the
> firmware memmap if it really wants to. It knows that it is special. It
> can figure out that it belongs to a dax device using /proc/iomem.
>
> >>>
> >>
> >> Oh, and I don't see why "System RAM (driver managed)" would hinder any
> >> policy in user space to still do what it thinks is the right thing to do
> >> (e.g., for dax).
> >>
> >> "System RAM (driver managed)" would mean: Memory is not part of the raw
> >> firmware memmap. It was detected and added by a driver. Handle with
> >> care, this is special.
> >
> > Oh, no, I was more reacting to your, "don't update
> > /sys/firmware/memmap for the (driver managed) range" choice as being a
> > policy decision. It otherwise feels to me "System RAM (driver
> > managed)" adds confusion for casual users of /proc/iomem, and clued-in
> > tools have the parent association to decide policy.
>
> Not sure if I understand correctly, so bear with me :).
>
> Adding or not adding stuff to /sys/firmware/memmap is not a policy
> decision. If it's not part of the raw firmware-provided memmap, it has
> nothing to do in /sys/firmware/memmap. That's what the documentation
> from 2008 tells us.

It just occurs to me that there are valid cases for both wanting to
start over with driver managed memory with a kexec and keeping it in
the map. Consider the case of EFI Special Purpose (SP) Memory that is
marked EFI Conventional Memory with the SP attribute. In that case the
firmware memory map marked it as conventional RAM, but the kernel
optionally marks it as System RAM vs Soft Reserved. The 2008 patch
simply does not consider that case. I'm not sure strict textualism
works for coding decisions.


* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: David Hildenbrand @ 2020-05-01 18:14 UTC
  To: Dan Williams
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <CAPcyv4jGnR_fPtpKBC1rD2KRcT88bTkhqnTMmuwuc+f9Dwrz1g@mail.gmail.com>

On 01.05.20 20:03, Dan Williams wrote:
> On Fri, May 1, 2020 at 10:51 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 01.05.20 19:45, David Hildenbrand wrote:
>>> On 01.05.20 19:39, Dan Williams wrote:
>>>> On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 01.05.20 18:56, Dan Williams wrote:
>>>>>> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>>
>>>>>>> On 01.05.20 00:24, Andrew Morton wrote:
>>>>>>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Why does the firmware map support hotplug entries?
>>>>>>>>>
>>>>>>>>> I assume:
>>>>>>>>>
>>>>>>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
>>>>>>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
>>>>>>>>> get hotplugged on real HW, they get added to e820. Same applies to
>>>>>>>>> memory added via HyperV balloon (unless memory is unplugged via
>>>>>>>>> ballooning and you reboot ... the e820 is changed as well). I assume
>>>>>>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
>>>>>>>>>
>>>>>>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But I assume only Andrew can enlighten us.
>>>>>>>>>
>>>>>>>>> @Andrew, any guidance here? Should we really add all memory to the
>>>>>>>>> firmware memmap, even if this contradicts with the existing
>>>>>>>>> documentation? (especially, if the actual firmware memmap will *not*
>>>>>>>>> contain that memory after a reboot)
>>>>>>>>
>>>>>>>> For some reason that patch is misattributed - it was authored by
>>>>>>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
>>>>>>>> a decade.  I looked through the email discussion from that time and I'm
>>>>>>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
>>>>>>>> review comments.
>>>>>>>
>>>>>>> Okay, thanks for checking. I think the documentation from 2008 is pretty
>>>>>>> clear what has to be done here. I will add some of these details to the
>>>>>>> patch description.
>>>>>>>
>>>>>>> Also, now that I know that esp. kexec-tools already don't consider
>>>>>>> dax/kmem memory properly (memory will not get dumped via kdump) and
>>>>>>> won't really suffer from a name change in /proc/iomem, I will go back to
>>>>>>> the MHP_DRIVER_MANAGED approach and
>>>>>>> 1. Don't create firmware memmap entries
>>>>>>> 2. Name the resource "System RAM (driver managed)"
>>>>>>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
>>>>>>>
>>>>>>> This way, kernel users and user space can figure out that this memory
>>>>>>> has different semantics and handle it accordingly - I think that was
>>>>>>> what Eric was asking for.
>>>>>>>
>>>>>>> Of course, open for suggestions.
>>>>>>
>>>>>> I'm still more of a fan of this being communicated by "System RAM"
>>>>>
>>>>> I was mentioning somewhere in this thread that "System RAM" inside a
>>>>> hierarchy (like dax/kmem) will already be basically ignored by
>>>>> kexec-tools. So, placing it inside a hierarchy already makes it look
>>>>> special.
>>>>>
>>>>> But after all, as we have to change kexec-tools either way, we can
>>>>> directly go ahead and flag it properly as special (in case there will
>>>>> ever be other cases where we could no longer distinguish it).
>>>>>
>>>>>> being parented especially because that tells you something about how
>>>>>> the memory is driver-managed and which mechanism might be in play.
>>>>>
>>>>> This could be communicated to some degree via the resource hierarchy.
>>>>>
>>>>> E.g.,
>>>>>
>>>>>             [root@localhost ~]# cat /proc/iomem
>>>>>             ...
>>>>>             140000000-33fffffff : Persistent Memory
>>>>>               140000000-1481fffff : namespace0.0
>>>>>               150000000-33fffffff : dax0.0
>>>>>                 150000000-33fffffff : System RAM (driver managed)
>>>>>
>>>>> vs.
>>>>>
>>>>>            :/# cat /proc/iomem
>>>>>             [...]
>>>>>             140000000-333ffffff : virtio-mem (virtio0)
>>>>>               140000000-147ffffff : System RAM (driver managed)
>>>>>               148000000-14fffffff : System RAM (driver managed)
>>>>>               150000000-157ffffff : System RAM (driver managed)
>>>>>
>>>>> Good enough for my taste.
>>>>>
>>>>>> What about adding an optional /sys/firmware/memmap/X/parent attribute.
>>>>>
>>>>> I really don't want any firmware memmap entries for something that is
>>>>> not part of the firmware provided memmap. In addition,
>>>>> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
>>>>> and two arm configs enable it at all.
>>>>>
>>>>> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.
>>>>
>>>> I think that's a policy decision and policy decisions do not belong in
>>>> the kernel. Give the tooling the opportunity to decide whether System
>>>> RAM stays that way over a kexec. The parenthetical reference otherwise
>>>> looks out of place to me in the /proc/iomem output. What makes it
>>>> "driver managed" is how the kernel handles it, not how the kernel
>>>> names it.
>>>
>>> At least, virtio-mem is different. It really *has to be handled* by the
>>> driver. This is not a policy. It's how it works.
> 
> ...but that's not necessarily how dax/kmem works.
> 

Yes, and user space could still take that memory and add it to the
firmware memmap if it really wants to. It knows that it is special. It
can figure out that it belongs to a dax device using /proc/iomem.

>>>
>>
>> Oh, and I don't see why "System RAM (driver managed)" would hinder any
>> policy in user space to still do what it thinks is the right thing to do
>> (e.g., for dax).
>>
>> "System RAM (driver managed)" would mean: Memory is not part of the raw
>> firmware memmap. It was detected and added by a driver. Handle with
>> care, this is special.
> 
> Oh, no, I was more reacting to your, "don't update
> /sys/firmware/memmap for the (driver managed) range" choice as being a
> policy decision. It otherwise feels to me "System RAM (driver
> managed)" adds confusion for casual users of /proc/iomem and for clued
> in tools they have the parent association to decide policy.

Not sure if I understand correctly, so bear with me :).

Adding or not adding stuff to /sys/firmware/memmap is not a policy
decision. If it's not part of the raw firmware-provided memmap, it has
nothing to do in /sys/firmware/memmap. That's what the documentation
from 2008 tells us.

Again, my point is that we don't create /sys/firmware/memmap entries for
dax/kmem and virtio-mem memory - because it's not part of the raw
firmware-provided memmap. I was not suggesting to add something like
"System RAM (driver managed)" there instead, maybe that part was confusing.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: Dan Williams @ 2020-05-01 18:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <0169e822-a6cc-1543-88ed-2a85d95ffb93@redhat.com>

On Fri, May 1, 2020 at 10:51 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 01.05.20 19:45, David Hildenbrand wrote:
> > On 01.05.20 19:39, Dan Williams wrote:
> >> On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> On 01.05.20 18:56, Dan Williams wrote:
> >>>> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>>
> >>>>> On 01.05.20 00:24, Andrew Morton wrote:
> >>>>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
> >>>>>>
> >>>>>>>>
> >>>>>>>> Why does the firmware map support hotplug entries?
> >>>>>>>
> >>>>>>> I assume:
> >>>>>>>
> >>>>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
> >>>>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
> >>>>>>> get hotplugged on real HW, they get added to e820. Same applies to
> >>>>>>> memory added via HyperV balloon (unless memory is unplugged via
> >>>>>>> ballooning and you reboot ... then the e820 is changed as well). I assume
> >>>>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
> >>>>>>>
> >>>>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
> >>>>>>>
> >>>>>>>
> >>>>>>> But I assume only Andrew can enlighten us.
> >>>>>>>
> >>>>>>> @Andrew, any guidance here? Should we really add all memory to the
> >>>>>>> firmware memmap, even if this contradicts with the existing
> >>>>>>> documentation? (especially, if the actual firmware memmap will *not*
> >>>>>>> contain that memory after a reboot)
> >>>>>>
> >>>>>> For some reason that patch is misattributed - it was authored by
> >>>>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
> >>>>>> a decade.  I looked through the email discussion from that time and I'm
> >>>>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
> >>>>>> review comments.
> >>>>>
> >>>>> Okay, thanks for checking. I think the documentation from 2008 is pretty
> >>>>> clear what has to be done here. I will add some of these details to the
> >>>>> patch description.
> >>>>>
> >>>>> Also, now that I know that esp. kexec-tools already don't consider
> >>>>> dax/kmem memory properly (memory will not get dumped via kdump) and
> >>>>> won't really suffer from a name change in /proc/iomem, I will go back to
> >>>>> the MHP_DRIVER_MANAGED approach and
> >>>>> 1. Don't create firmware memmap entries
> >>>>> 2. Name the resource "System RAM (driver managed)"
> >>>>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
> >>>>>
> >>>>> This way, kernel users and user space can figure out that this memory
> >>>>> has different semantics and handle it accordingly - I think that was
> >>>>> what Eric was asking for.
> >>>>>
> >>>>> Of course, open for suggestions.
> >>>>
> >>>> I'm still more of a fan of this being communicated by "System RAM"
> >>>
> >>> I was mentioning somewhere in this thread that "System RAM" inside a
> >>> hierarchy (like dax/kmem) will already be basically ignored by
> >>> kexec-tools. So, placing it inside a hierarchy already makes it look
> >>> special.
> >>>
> >>> But after all, as we have to change kexec-tools either way, we can
> >>> directly go ahead and flag it properly as special (in case there will
> >>> ever be other cases where we could no longer distinguish it).
> >>>
> >>>> being parented especially because that tells you something about how
> >>>> the memory is driver-managed and which mechanism might be in play.
> >>>
> >>> This could be communicated to some degree via the resource hierarchy.
> >>>
> >>> E.g.,
> >>>
> >>>             [root@localhost ~]# cat /proc/iomem
> >>>             ...
> >>>             140000000-33fffffff : Persistent Memory
> >>>               140000000-1481fffff : namespace0.0
> >>>               150000000-33fffffff : dax0.0
> >>>                 150000000-33fffffff : System RAM (driver managed)
> >>>
> >>> vs.
> >>>
> >>>            :/# cat /proc/iomem
> >>>             [...]
> >>>             140000000-333ffffff : virtio-mem (virtio0)
> >>>               140000000-147ffffff : System RAM (driver managed)
> >>>               148000000-14fffffff : System RAM (driver managed)
> >>>               150000000-157ffffff : System RAM (driver managed)
> >>>
> >>> Good enough for my taste.
> >>>
> >>>> What about adding an optional /sys/firmware/memmap/X/parent attribute.
> >>>
> >>> I really don't want any firmware memmap entries for something that is
> >>> not part of the firmware provided memmap. In addition,
> >>> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
> >>> and two arm configs enable it at all.
> >>>
> >>> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.
> >>
> >> I think that's a policy decision and policy decisions do not belong in
> >> the kernel. Give the tooling the opportunity to decide whether System
> >> RAM stays that way over a kexec. The parenthetical reference otherwise
> >> looks out of place to me in the /proc/iomem output. What makes it
> >> "driver managed" is how the kernel handles it, not how the kernel
> >> names it.
> >
> > At least, virtio-mem is different. It really *has to be handled* by the
> > driver. This is not a policy. It's how it works.

...but that's not necessarily how dax/kmem works.

> >
>
> Oh, and I don't see why "System RAM (driver managed)" would hinder any
> policy in user space to still do what it thinks is the right thing to do
> (e.g., for dax).
>
> "System RAM (driver managed)" would mean: Memory is not part of the raw
> firmware memmap. It was detected and added by a driver. Handle with
> care, this is special.

Oh, no, I was more reacting to your, "don't update
/sys/firmware/memmap for the (driver managed) range" choice as being a
policy decision. It otherwise feels to me "System RAM (driver
managed)" adds confusion for casual users of /proc/iomem and for clued
in tools they have the parent association to decide policy.


* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: David Hildenbrand @ 2020-05-01 17:51 UTC (permalink / raw)
  To: Dan Williams
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <62dd4ce2-86cc-5b85-734f-ec8766528a1b@redhat.com>

On 01.05.20 19:45, David Hildenbrand wrote:
> On 01.05.20 19:39, Dan Williams wrote:
>> On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 01.05.20 18:56, Dan Williams wrote:
>>>> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 01.05.20 00:24, Andrew Morton wrote:
>>>>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>>>>
>>>>>>>> Why does the firmware map support hotplug entries?
>>>>>>>
>>>>>>> I assume:
>>>>>>>
>>>>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
>>>>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
>>>>>>> get hotplugged on real HW, they get added to e820. Same applies to
>>>>>>> memory added via HyperV balloon (unless memory is unplugged via
>>>>>>> ballooning and you reboot ... then the e820 is changed as well). I assume
>>>>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
>>>>>>>
>>>>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
>>>>>>>
>>>>>>>
>>>>>>> But I assume only Andrew can enlighten us.
>>>>>>>
>>>>>>> @Andrew, any guidance here? Should we really add all memory to the
>>>>>>> firmware memmap, even if this contradicts with the existing
>>>>>>> documentation? (especially, if the actual firmware memmap will *not*
>>>>>>> contain that memory after a reboot)
>>>>>>
>>>>>> For some reason that patch is misattributed - it was authored by
>>>>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
>>>>>> a decade.  I looked through the email discussion from that time and I'm
>>>>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
>>>>>> review comments.
>>>>>
>>>>> Okay, thanks for checking. I think the documentation from 2008 is pretty
>>>>> clear what has to be done here. I will add some of these details to the
>>>>> patch description.
>>>>>
>>>>> Also, now that I know that esp. kexec-tools already don't consider
>>>>> dax/kmem memory properly (memory will not get dumped via kdump) and
>>>>> won't really suffer from a name change in /proc/iomem, I will go back to
>>>>> the MHP_DRIVER_MANAGED approach and
>>>>> 1. Don't create firmware memmap entries
>>>>> 2. Name the resource "System RAM (driver managed)"
>>>>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
>>>>>
>>>>> This way, kernel users and user space can figure out that this memory
>>>>> has different semantics and handle it accordingly - I think that was
>>>>> what Eric was asking for.
>>>>>
>>>>> Of course, open for suggestions.
>>>>
>>>> I'm still more of a fan of this being communicated by "System RAM"
>>>
>>> I was mentioning somewhere in this thread that "System RAM" inside a
>>> hierarchy (like dax/kmem) will already be basically ignored by
>>> kexec-tools. So, placing it inside a hierarchy already makes it look
>>> special.
>>>
>>> But after all, as we have to change kexec-tools either way, we can
>>> directly go ahead and flag it properly as special (in case there will
>>> ever be other cases where we could no longer distinguish it).
>>>
>>>> being parented especially because that tells you something about how
>>>> the memory is driver-managed and which mechanism might be in play.
>>>
>>> This could be communicated to some degree via the resource hierarchy.
>>>
>>> E.g.,
>>>
>>>             [root@localhost ~]# cat /proc/iomem
>>>             ...
>>>             140000000-33fffffff : Persistent Memory
>>>               140000000-1481fffff : namespace0.0
>>>               150000000-33fffffff : dax0.0
>>>                 150000000-33fffffff : System RAM (driver managed)
>>>
>>> vs.
>>>
>>>            :/# cat /proc/iomem
>>>             [...]
>>>             140000000-333ffffff : virtio-mem (virtio0)
>>>               140000000-147ffffff : System RAM (driver managed)
>>>               148000000-14fffffff : System RAM (driver managed)
>>>               150000000-157ffffff : System RAM (driver managed)
>>>
>>> Good enough for my taste.
>>>
>>>> What about adding an optional /sys/firmware/memmap/X/parent attribute.
>>>
>>> I really don't want any firmware memmap entries for something that is
>>> not part of the firmware provided memmap. In addition,
>>> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
>>> and two arm configs enable it at all.
>>>
>>> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.
>>
>> I think that's a policy decision and policy decisions do not belong in
>> the kernel. Give the tooling the opportunity to decide whether System
>> RAM stays that way over a kexec. The parenthetical reference otherwise
>> looks out of place to me in the /proc/iomem output. What makes it
>> "driver managed" is how the kernel handles it, not how the kernel
>> names it.
> 
> At least, virtio-mem is different. It really *has to be handled* by the
> driver. This is not a policy. It's how it works.
> 

Oh, and I don't see why "System RAM (driver managed)" would hinder any
policy in user space to still do what it thinks is the right thing to do
(e.g., for dax).

"System RAM (driver managed)" would mean: Memory is not part of the raw
firmware memmap. It was detected and added by a driver. Handle with
care, this is special.
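A minimal sketch of the three-step MHP_DRIVER_MANAGED plan quoted above, written as plain user-space C so it compiles and runs on its own. Everything in it is hypothetical. The SKETCH_* flags stand in for the proposed MHP_DRIVER_MANAGED and IORESOURCE_MEM_DRIVER_MANAGED, which the thread only calls "something like". struct sketch_resource is not the kernel's struct resource. The sketch only illustrates the semantics described above: no firmware memmap entry, a distinct /proc/iomem name, and a flag that consumers can test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the thread only proposes "something like" these. */
#define SKETCH_MHP_DRIVER_MANAGED 0x1u /* flag passed by the adding driver */
#define SKETCH_RES_DRIVER_MANAGED 0x1u /* flag stored on the resource */

struct sketch_resource {
	unsigned long long start, end;  /* inclusive range, as in /proc/iomem */
	const char *name;               /* name shown in /proc/iomem */
	unsigned int flags;
	bool has_firmware_memmap_entry; /* listed under /sys/firmware/memmap? */
};

/* Models the three-step plan for a memory add with or without the flag. */
static void sketch_add_memory(struct sketch_resource *res,
			      unsigned long long start,
			      unsigned long long size, unsigned int mhp_flags)
{
	const bool driver_managed = (mhp_flags & SKETCH_MHP_DRIVER_MANAGED) != 0;

	assert(size > 0); /* avoid underflow when computing the inclusive end */
	res->start = start;
	res->end = start + size - 1;
	/* 1. No firmware memmap entry: the range is not in the raw firmware map. */
	res->has_firmware_memmap_entry = !driver_managed;
	/* 2. A name that stands out in /proc/iomem. */
	res->name = driver_managed ? "System RAM (driver managed)" : "System RAM";
	/* 3. A flag kernel users can test without comparing strings. */
	res->flags = driver_managed ? SKETCH_RES_DRIVER_MANAGED : 0;
}

/*
 * One possible policy a kexec-like tool might apply (an assumption, not
 * kexec-tools behavior): hand only memory that is not driver managed to the
 * next kernel, on the assumption that the driver there adds its memory again.
 */
static bool sketch_pass_to_next_kernel(const struct sketch_resource *res)
{
	return (res->flags & SKETCH_RES_DRIVER_MANAGED) == 0;
}
```

The last helper is deliberately kept outside the "add" path: in this sketch, the kernel-side steps only describe the memory. Deciding what to do with it across a kexec is left to tooling.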

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: David Hildenbrand @ 2020-05-01 17:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <CAPcyv4iOqS0Wbfa2KPfE1axQFGXoRB4mmPRP__Lmqpw6Qpr_ig@mail.gmail.com>

On 01.05.20 19:39, Dan Williams wrote:
> On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 01.05.20 18:56, Dan Williams wrote:
>>> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 01.05.20 00:24, Andrew Morton wrote:
>>>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>>>>
>>>>>>> Why does the firmware map support hotplug entries?
>>>>>>
>>>>>> I assume:
>>>>>>
>>>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
>>>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
>>>>>> get hotplugged on real HW, they get added to e820. Same applies to
>>>>>> memory added via HyperV balloon (unless memory is unplugged via
>>>>>> ballooning and you reboot ... then the e820 is changed as well). I assume
>>>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
>>>>>>
>>>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
>>>>>>
>>>>>>
>>>>>> But I assume only Andrew can enlighten us.
>>>>>>
>>>>>> @Andrew, any guidance here? Should we really add all memory to the
>>>>>> firmware memmap, even if this contradicts with the existing
>>>>>> documentation? (especially, if the actual firmware memmap will *not*
>>>>>> contain that memory after a reboot)
>>>>>
>>>>> For some reason that patch is misattributed - it was authored by
>>>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
>>>>> a decade.  I looked through the email discussion from that time and I'm
>>>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
>>>>> review comments.
>>>>
>>>> Okay, thanks for checking. I think the documentation from 2008 is pretty
>>>> clear what has to be done here. I will add some of these details to the
>>>> patch description.
>>>>
>>>> Also, now that I know that esp. kexec-tools already don't consider
>>>> dax/kmem memory properly (memory will not get dumped via kdump) and
>>>> won't really suffer from a name change in /proc/iomem, I will go back to
>>>> the MHP_DRIVER_MANAGED approach and
>>>> 1. Don't create firmware memmap entries
>>>> 2. Name the resource "System RAM (driver managed)"
>>>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
>>>>
>>>> This way, kernel users and user space can figure out that this memory
>>>> has different semantics and handle it accordingly - I think that was
>>>> what Eric was asking for.
>>>>
>>>> Of course, open for suggestions.
>>>
>>> I'm still more of a fan of this being communicated by "System RAM"
>>
>> I was mentioning somewhere in this thread that "System RAM" inside a
>> hierarchy (like dax/kmem) will already be basically ignored by
>> kexec-tools. So, placing it inside a hierarchy already makes it look
>> special.
>>
>> But after all, as we have to change kexec-tools either way, we can
>> directly go ahead and flag it properly as special (in case there will
>> ever be other cases where we could no longer distinguish it).
>>
>>> being parented especially because that tells you something about how
>>> the memory is driver-managed and which mechanism might be in play.
>>
>> This could be communicated to some degree via the resource hierarchy.
>>
>> E.g.,
>>
>>             [root@localhost ~]# cat /proc/iomem
>>             ...
>>             140000000-33fffffff : Persistent Memory
>>               140000000-1481fffff : namespace0.0
>>               150000000-33fffffff : dax0.0
>>                 150000000-33fffffff : System RAM (driver managed)
>>
>> vs.
>>
>>            :/# cat /proc/iomem
>>             [...]
>>             140000000-333ffffff : virtio-mem (virtio0)
>>               140000000-147ffffff : System RAM (driver managed)
>>               148000000-14fffffff : System RAM (driver managed)
>>               150000000-157ffffff : System RAM (driver managed)
>>
>> Good enough for my taste.
>>
>>> What about adding an optional /sys/firmware/memmap/X/parent attribute.
>>
>> I really don't want any firmware memmap entries for something that is
>> not part of the firmware provided memmap. In addition,
>> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
>> and two arm configs enable it at all.
>>
>> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.
> 
> I think that's a policy decision and policy decisions do not belong in
> the kernel. Give the tooling the opportunity to decide whether System
> RAM stays that way over a kexec. The parenthetical reference otherwise
> looks out of place to me in the /proc/iomem output. What makes it
> "driver managed" is how the kernel handles it, not how the kernel
> names it.

At least, virtio-mem is different. It really *has to be handled* by the
driver. This is not a policy. It's how it works.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: Dan Williams @ 2020-05-01 17:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <2d019c11-a478-9d70-abd5-4fd2ebf4bc1d@redhat.com>

On Fri, May 1, 2020 at 10:21 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 01.05.20 18:56, Dan Williams wrote:
> > On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 01.05.20 00:24, Andrew Morton wrote:
> >>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>>>>
> >>>>> Why does the firmware map support hotplug entries?
> >>>>
> >>>> I assume:
> >>>>
> >>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
> >>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
> >>>> get hotplugged on real HW, they get added to e820. Same applies to
> >>>> memory added via HyperV balloon (unless memory is unplugged via
> >>>> ballooning and you reboot ... then the e820 is changed as well). I assume
> >>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
> >>>>
> >>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
> >>>>
> >>>>
> >>>> But I assume only Andrew can enlighten us.
> >>>>
> >>>> @Andrew, any guidance here? Should we really add all memory to the
> >>>> firmware memmap, even if this contradicts with the existing
> >>>> documentation? (especially, if the actual firmware memmap will *not*
> >>>> contain that memory after a reboot)
> >>>
> >>> For some reason that patch is misattributed - it was authored by
> >>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
> >>> a decade.  I looked through the email discussion from that time and I'm
> >>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
> >>> review comments.
> >>
> >> Okay, thanks for checking. I think the documentation from 2008 is pretty
> >> clear what has to be done here. I will add some of these details to the
> >> patch description.
> >>
> >> Also, now that I know that esp. kexec-tools already don't consider
> >> dax/kmem memory properly (memory will not get dumped via kdump) and
> >> won't really suffer from a name change in /proc/iomem, I will go back to
> >> the MHP_DRIVER_MANAGED approach and
> >> 1. Don't create firmware memmap entries
> >> 2. Name the resource "System RAM (driver managed)"
> >> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
> >>
> >> This way, kernel users and user space can figure out that this memory
> >> has different semantics and handle it accordingly - I think that was
> >> what Eric was asking for.
> >>
> >> Of course, open for suggestions.
> >
> > I'm still more of a fan of this being communicated by "System RAM"
>
> I was mentioning somewhere in this thread that "System RAM" inside a
> hierarchy (like dax/kmem) will already be basically ignored by
> kexec-tools. So, placing it inside a hierarchy already makes it look
> special.
>
> But after all, as we have to change kexec-tools either way, we can
> directly go ahead and flag it properly as special (in case there will
> ever be other cases where we could no longer distinguish it).
>
> > being parented especially because that tells you something about how
> > the memory is driver-managed and which mechanism might be in play.
>
> This could be communicated to some degree via the resource hierarchy.
>
> E.g.,
>
>             [root@localhost ~]# cat /proc/iomem
>             ...
>             140000000-33fffffff : Persistent Memory
>               140000000-1481fffff : namespace0.0
>               150000000-33fffffff : dax0.0
>                 150000000-33fffffff : System RAM (driver managed)
>
> vs.
>
>            :/# cat /proc/iomem
>             [...]
>             140000000-333ffffff : virtio-mem (virtio0)
>               140000000-147ffffff : System RAM (driver managed)
>               148000000-14fffffff : System RAM (driver managed)
>               150000000-157ffffff : System RAM (driver managed)
>
> Good enough for my taste.
>
> > What about adding an optional /sys/firmware/memmap/X/parent attribute.
>
> I really don't want any firmware memmap entries for something that is
> not part of the firmware provided memmap. In addition,
> /sys/firmware/memmap/ is still a fairly x86_64 specific thing. Only mips
> and two arm configs enable it at all.
>
> So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.

I think that's a policy decision and policy decisions do not belong in
the kernel. Give the tooling the opportunity to decide whether System
RAM stays that way over a kexec. The parenthetical reference otherwise
looks out of place to me in the /proc/iomem output. What makes it
"driver managed" is how the kernel handles it, not how the kernel
names it.


* Re: [PATCH v3 0/2] PCI/ERR: Allow Native AER/DPC using _OSC
From: Derrick, Jonathan @ 2020-05-01 17:35 UTC (permalink / raw)
  To: helgaas@kernel.org
  Cc: sathyanarayanan.kuppuswamy@linux.intel.com, Patel, Mayurkumar,
	fred@fredlawl.com, sbobroff@linux.ibm.com,
	linuxppc-dev@lists.ozlabs.org, Wysocki,  Rafael J,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	andriy.shevchenko@linux.intel.com, bhelgaas@google.com,
	alex.williamson@redhat.com, oohall@gmail.com, olof@lixom.net,
	rajatja@google.com, mika.westerberg@linux.intel.com
In-Reply-To: <20200501171649.GA116404@bjorn-Precision-5520>

On Fri, 2020-05-01 at 12:16 -0500, Bjorn Helgaas wrote:
> On Thu, Apr 30, 2020 at 12:46:07PM -0600, Jon Derrick wrote:
> > Hi Bjorn & Kuppuswamy,
> > 
> > I see a problem in the DPC ECN [1] to _OSC in that it doesn't give us a way to
> > determine if firmware supports _OSC DPC negotiation, and therefore how to handle
> > DPC.
> > 
> > Here is the wording of the ECN that implies that Firmware without _OSC DPC
> > negotiation support should have the OSPM rely on _OSC AER negotiation when
> > determining DPC control:
> > 
> >   PCIe Base Specification suggests that Downstream Port Containment may be
> >   controlled either by the Firmware or the Operating System. It also suggests
> >   that the Firmware retain ownership of Downstream Port Containment if it also
> >   owns AER. When the Firmware owns Downstream Port Containment, it is expected
> >   to use the new "Error Disconnect Recover" notification to alert OSPM of a
> >   Downstream Port Containment event.
> > 
> > In legacy platforms, as bits in _OSC are reserved prior to implementation, ACPI
> > Root Bus enumeration will mark these Host Bridges as without Native DPC
> > support, even though the specification implies it's expected that AER _OSC
> > negotiation determines DPC control for these platforms. There seems to be a
> > need for a way to determine if the DPC control bit in _OSC is supported and
> > fallback on AER otherwise.
> > 
> > 
> > Currently portdrv assumes DPC control if the port has Native AER services:
> > 
> > static int get_port_device_capability(struct pci_dev *dev)
> > ...
> > 	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
> > 	    pci_aer_available() &&
> > 	    (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
> > 		services |= PCIE_PORT_SERVICE_DPC;
> > 
> > Newer firmware may not grant OSPM DPC control, if for instance, it expects to
> > use Error Disconnect Recovery. However it looks like ACPI will use DPC services
> > via the EDR driver, without binding the full DPC port service driver.
> > 
> > 
> > If we change portdrv to probe based on host->native_dpc and not AER, then we
> > break instances with legacy firmware where OSPM will clear host->native_dpc
> > solely due to _OSC bits being reserved:
> > 
> > struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
> > ...
> > 	if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
> > 		host_bridge->native_dpc = 0;
> > 
> > 
> > 
> > So my assumption instead is that host->native_dpc can be 0 and expect Native
> > DPC services if AER is used. In other words, if and only if DPC probe is
> > invoked from portdrv, then it needs to rely on the AER dependency. Otherwise it
> > should be assumed that ACPI set up DPC via EDR. This covers legacy firmware.
> > 
> > However it seems like that could be trouble with newer firmware that might give
> > OSPM control of AER but not DPC, and would result in both Native DPC and EDR
> > being in effect.
> > 
> > 
> > Anyways here are two patches that give control of AER and DPC on the results of
> > _OSC. They don't mess with the HEST parser as I expect those to be removed at
> > some point. I need these for VMD support which doesn't even rely on _OSC, but I
> > suspect this won't be the last effort as we detangle Firmware First.
> > 
> > [1] https://members.pcisig.com/wg/PCI-SIG/document/12888
> 
> Hi Jon, I think we need to sort out the _OSC/FIRMWARE_FIRST patches
> from Alex and Sathy first, then see what needs to be done on top of
> those, so I'm going to push these off for a few days and they'll
> probably need a refresh.
> 
> Bjorn


Agreed, no need to merge now. Just wanted to bring up the DPC
ambiguity, which I think was first addressed by dpc-native:

commit 35a0b2378c199d4f26e458b2ca38ea56aaf2d9b8
Author: Olof Johansson <olof@lixom.net>
Date:   Wed Oct 23 12:22:05 2019 -0700

    PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control
    
    Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"),
    Linux handled DPC events regardless of whether firmware had granted it
    ownership of AER or DPC, e.g., via _OSC.
    
    PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to
    control of AER, so after eed85ff4c0da7, Linux handles DPC events only if it
    has control of AER.
    
    On platforms that do not grant OS control of AER via _OSC, Linux DPC
    handling worked before eed85ff4c0da7 but not after.
    
    To make Linux DPC handling work on those platforms the same way they did
    before, add a "pcie_ports=dpc-native" kernel parameter that makes Linux
    handle DPC events regardless of whether it has control of AER.
    
    [bhelgaas: commit log, move pcie_ports_dpc_native to drivers/pci/]
    Link: https://lore.kernel.org/r/20191023192205.97024-1-olof@lixom.net
    Signed-off-by: Olof Johansson <olof@lixom.net>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
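A rough model of the ambiguity, offered as a sketch only. struct dpc_view and both functions are hypothetical, and only the boolean conditions come from the get_port_device_capability() and acpi_pci_root_create() snippets quoted earlier. It compares the current portdrv rule with a hypothetical rule keyed off host_bridge->native_dpc. Legacy firmware that leaves the DPC _OSC bit reserved produces the same inputs as newer firmware that grants AER but keeps DPC for EDR. So no rule built only from these inputs can treat the two cases differently.

```c
#include <assert.h>
#include <stdbool.h>

/* What the OS can observe; field names are illustrative, not kernel types. */
struct dpc_view {
	bool has_dpc_cap;      /* pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) */
	bool aer_available;    /* pci_aer_available() */
	bool dpc_native_param; /* booted with pcie_ports=dpc-native */
	bool os_owns_aer;      /* services & PCIE_PORT_SERVICE_AER */
	bool osc_dpc_granted;  /* OSC_PCI_EXPRESS_DPC_CONTROL in osc_control_set */
};

/* Mirrors the quoted get_port_device_capability() condition. */
static bool portdrv_binds_dpc_current(const struct dpc_view *v)
{
	return v->has_dpc_cap && v->aer_available &&
	       (v->dpc_native_param || v->os_owns_aer);
}

/*
 * Hypothetical alternative: key DPC off host_bridge->native_dpc, which
 * acpi_pci_root_create() clears whenever the granted _OSC control lacks
 * the DPC bit (including when firmware treats that bit as reserved).
 */
static bool portdrv_binds_dpc_native_dpc(const struct dpc_view *v)
{
	const bool native_dpc = v->osc_dpc_granted;

	return v->has_dpc_cap && v->aer_available &&
	       (v->dpc_native_param || native_dpc);
}
```

With os_owns_aer set and osc_dpc_granted clear, the current rule binds the DPC service and the alternative does not. That is correct for legacy firmware under one reading and wrong for EDR-capable firmware under the other.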


Thanks,
Jon


* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: David Hildenbrand @ 2020-05-01 17:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <CAPcyv4j=YKnr1HW4OhAmpzbuKjtfP7FdAn4-V7uA=b-Tcpfu+A@mail.gmail.com>

On 01.05.20 18:56, Dan Williams wrote:
> On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 01.05.20 00:24, Andrew Morton wrote:
>>> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
>>>
>>>>>
>>>>> Why does the firmware map support hotplug entries?
>>>>
>>>> I assume:
>>>>
>>>> The firmware memmap was added primarily for x86-64 kexec (and still, is
>>>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
>>>> get hotplugged on real HW, they get added to e820. Same applies to
>>>> memory added via HyperV balloon (unless memory is unplugged via
>>>> ballooning and you reboot ... then the e820 is changed as well). I assume
>>>> we wanted to be able to reflect that, to make kexec look like a real reboot.
>>>>
>>>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
>>>>
>>>>
>>>> But I assume only Andrew can enlighten us.
>>>>
>>>> @Andrew, any guidance here? Should we really add all memory to the
>>>> firmware memmap, even if this contradicts with the existing
>>>> documentation? (especially, if the actual firmware memmap will *not*
>>>> contain that memory after a reboot)
>>>
>>> For some reason that patch is misattributed - it was authored by
>>> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
>>> a decade.  I looked through the email discussion from that time and I'm
>>> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
>>> review comments.
>>
>> Okay, thanks for checking. I think the documentation from 2008 is pretty
>> clear what has to be done here. I will add some of these details to the
>> patch description.
>>
>> Also, now that I know that esp. kexec-tools already don't consider
>> dax/kmem memory properly (memory will not get dumped via kdump) and
>> won't really suffer from a name change in /proc/iomem, I will go back to
>> the MHP_DRIVER_MANAGED approach and
>> 1. Don't create firmware memmap entries
>> 2. Name the resource "System RAM (driver managed)"
>> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
>>
>> This way, kernel users and user space can figure out that this memory
>> has different semantics and handle it accordingly - I think that was
>> what Eric was asking for.
>>
>> Of course, open for suggestions.
> 
> I'm still more of a fan of this being communicated by "System RAM"

I mentioned somewhere in this thread that "System RAM" inside a
hierarchy (like dax/kmem) will already be basically ignored by
kexec-tools. So placing it inside a hierarchy already makes it look
special.

But after all, as we have to change kexec-tools either way, we can
directly go ahead and flag it properly as special (in case there are
ever other cases where we could no longer distinguish it).

> being parented especially because that tells you something about how
> the memory is driver-managed and which mechanism might be in play.

That could be communicated to some degree via the resource hierarchy.

E.g.,

            [root@localhost ~]# cat /proc/iomem
            ...
            140000000-33fffffff : Persistent Memory
              140000000-1481fffff : namespace0.0
              150000000-33fffffff : dax0.0
                150000000-33fffffff : System RAM (driver managed)

vs.

           :/# cat /proc/iomem
            [...]
            140000000-333ffffff : virtio-mem (virtio0)
              140000000-147ffffff : System RAM (driver managed)
              148000000-14fffffff : System RAM (driver managed)
              150000000-157ffffff : System RAM (driver managed)

Good enough for my taste.
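To make the hierarchy argument concrete, here is a minimal user-space sketch of how a tool could classify such entries from /proc/iomem alone. It derives the nesting depth from the two-space-per-level indentation the kernel uses when printing nested resources, flags entries named "System RAM (driver managed)" (the name proposed in this thread, not an existing interface), and walks back to the enclosing resource (dax0.0 vs. virtio-mem (virtio0)). The struct and function names are hypothetical, and real addresses are typically only shown to privileged readers.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem line: "<indent><start>-<end> : <name>". */
struct iomem_entry {
	int depth;			/* nesting level, two spaces per level */
	unsigned long long start;
	unsigned long long end;
	char name[64];
};

/* Parse a single line; returns false if it does not look like an entry. */
static bool iomem_parse_line(const char *line, struct iomem_entry *e)
{
	int spaces = 0, consumed = 0;
	const char *name;
	size_t len;

	while (line[spaces] == ' ')
		spaces++;
	e->depth = spaces / 2;

	/* %n records where the name starts; it is not counted in the return value. */
	if (sscanf(line + spaces, "%llx-%llx : %n",
		   &e->start, &e->end, &consumed) != 2 || consumed == 0)
		return false;

	name = line + spaces + consumed;
	len = strcspn(name, "\n");
	if (len >= sizeof(e->name))
		len = sizeof(e->name) - 1;
	memcpy(e->name, name, len);
	e->name[len] = '\0';
	return true;
}

/* Matches the resource name proposed in this thread for driver-managed RAM. */
static bool iomem_is_driver_managed(const struct iomem_entry *e)
{
	return strcmp(e->name, "System RAM (driver managed)") == 0;
}

/* Index of the enclosing (parent) resource of entries[idx], or -1 if top level. */
static int iomem_parent(const struct iomem_entry *entries, int idx)
{
	for (int i = idx - 1; i >= 0; i--)
		if (entries[i].depth < entries[idx].depth)
			return i;
	return -1;
}
```

Feeding it the four lines of the dax example above yields depth 2 for the driver-managed entry and dax0.0 as its parent, which is the "which mechanism is in play" information without any firmware memmap involvement.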

> What about adding an optional /sys/firmware/memmap/X/parent attribute?

I really don't want any firmware memmap entries for something that is
not part of the firmware-provided memmap. In addition,
/sys/firmware/memmap/ is still a fairly x86_64-specific thing. Only mips
and two arm configs enable it at all.

So, IMHO, /sys/firmware/memmap/ is definitely not the way to go.

-- 
Thanks,

David / dhildenb


* Re: [PATCH V1 00/10] Remove duplicated kmap code
From: Ira Weiny @ 2020-05-01 17:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Huang Rui, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Ingo Molnar, linux-snps-arc, linux-xtensa,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, David S. Miller, Andrew Morton, linuxppc-dev,
	Christian Koenig
In-Reply-To: <20200501085456.GL27858@infradead.org>

On Fri, May 01, 2020 at 01:54:56AM -0700, Christoph Hellwig wrote:
> In addition to the work already in the series, it seems like
> LAST_PKMAP_MASK, PKMAP_ADDR and PKMAP_NR can also be consolidated
> to common code.

Agreed, I mentioned in the cover letter there are similarities...

> 
> Also kmap_atomic_high_prot / kmap_atomic_pfn could move into common
> code, maybe keyed off a symbol selected by the actual users that
> need it.  It also seems like it doesn't actually ever need to be
> exported.

...  but these are not as readily obvious, at least to me.  I do see a pattern
but the differences seemed subtle enough that it would take a while to ensure
correctness.  So I'd like to see this series go in and build on it.

> 
> This in turn would lead to being able to allow io_mapping_map_atomic_wc
> on all architectures, which might make nouveau and qxl happy, but maybe
> that can be left for another series.

I agree that this should be follow-on patches.  I still need to fix the
bisect-ability and I don't want to bog down 0-day with a longer series.

Thanks for the review!
Ira


* Re: [PATCH v3 0/2] PCI/ERR: Allow Native AER/DPC using _OSC
From: Bjorn Helgaas @ 2020-05-01 17:16 UTC (permalink / raw)
  To: Jon Derrick
  Cc: Kuppuswamy Sathyanarayanan, Rajat Jain, Frederick Lawler,
	Sam Bobroff, linux-pci, Rafael J. Wysocki, linuxppc-dev,
	linux-kernel, Olof Johansson, Alex Williamson, Patel, Mayurkumar,
	Oliver O'Halloran, Bjorn Helgaas, Andy Shevchenko,
	Mika Westerberg
In-Reply-To: <1588272369-2145-1-git-send-email-jonathan.derrick@intel.com>

On Thu, Apr 30, 2020 at 12:46:07PM -0600, Jon Derrick wrote:
> Hi Bjorn & Kuppuswamy,
> 
> I see a problem in the DPC ECN [1] to _OSC in that it doesn't give us a way to
> determine if firmware supports _OSC DPC negotiation, and therefore how to handle
> DPC.
> 
> Here is the wording of the ECN that implies that Firmware without _OSC DPC
> negotiation support should have the OSPM rely on _OSC AER negotiation when
> determining DPC control:
> 
>   PCIe Base Specification suggests that Downstream Port Containment may be
>   controlled either by the Firmware or the Operating System. It also suggests
>   that the Firmware retain ownership of Downstream Port Containment if it also
>   owns AER. When the Firmware owns Downstream Port Containment, it is expected
>   to use the new "Error Disconnect Recover" notification to alert OSPM of a
>   Downstream Port Containment event.
> 
> In legacy platforms, as bits in _OSC are reserved prior to implementation, ACPI
> Root Bus enumeration will mark these Host Bridges as without Native DPC
> support, even though the specification implies it's expected that AER _OSC
> negotiation determines DPC control for these platforms. There seems to be a
> need for a way to determine if the DPC control bit in _OSC is supported and
> to fall back on AER otherwise.
> 
> 
> Currently portdrv assumes DPC control if the port has Native AER services:
> 
> static int get_port_device_capability(struct pci_dev *dev)
> ...
> 	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
> 	    pci_aer_available() &&
> 	    (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
> 		services |= PCIE_PORT_SERVICE_DPC;
> 
> Newer firmware may not grant OSPM DPC control if, for instance, it expects to
> use Error Disconnect Recover. However, it looks like ACPI will use DPC services
> via the EDR driver, without binding the full DPC port service driver.
> 
> 
> If we change portdrv to probe based on host->native_dpc and not AER, then we
> break instances with legacy firmware where OSPM will clear host->native_dpc
> solely due to _OSC bits being reserved:
> 
> struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
> ...
> 	if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
> 		host_bridge->native_dpc = 0;
> 
> 
> 
> So my assumption instead is that host->native_dpc can be 0 while we still
> expect Native DPC services if AER is used. In other words, if and only if DPC
> probe is invoked from portdrv, then it needs to rely on the AER dependency.
> Otherwise it should be assumed that ACPI set up DPC via EDR. This covers
> legacy firmware.
> 
> However, that could cause trouble with newer firmware that might give
> OSPM control of AER but not DPC, and would result in both Native DPC and EDR
> being in effect.
> 
> 
> Anyway, here are two patches that base control of AER and DPC on the results
> of _OSC. They don't mess with the HEST parser as I expect those to be removed at
> some point. I need these for VMD support which doesn't even rely on _OSC, but I
> suspect this won't be the last effort as we detangle Firmware First.
> 
> [1] https://members.pcisig.com/wg/PCI-SIG/document/12888
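To make the two options explicit, here is a minimal sketch modelling the quoted portdrv condition next to the alternative keyed off host->native_dpc. struct port_sketch, both function names and the bit values are hypothetical stand-ins; the booleans model pci_find_ext_capability(), pci_aer_available(), pcie_ports_dpc_native and host_bridge->native_dpc. Keeping the pci_aer_available() check in the alternative is an assumption.

```c
#include <stdbool.h>

/* Stand-in service bits, for illustration only. */
#define PCIE_PORT_SERVICE_AER (1 << 1)
#define PCIE_PORT_SERVICE_DPC (1 << 3)

struct port_sketch {
	bool has_dpc_cap;	/* pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) */
	bool aer_available;	/* pci_aer_available() */
	bool dpc_native_param;	/* pcie_ports_dpc_native */
	bool host_native_dpc;	/* host_bridge->native_dpc after _OSC */
};

/* Current rule quoted above: DPC follows AER ownership. */
static int dpc_service_current(const struct port_sketch *p, int services)
{
	if (p->has_dpc_cap && p->aer_available &&
	    (p->dpc_native_param || (services & PCIE_PORT_SERVICE_AER)))
		services |= PCIE_PORT_SERVICE_DPC;
	return services;
}

/* Hypothetical alternative: key off the _OSC DPC result instead of AER. */
static int dpc_service_osc(const struct port_sketch *p, int services)
{
	if (p->has_dpc_cap && p->aer_available &&
	    (p->dpc_native_param || p->host_native_dpc))
		services |= PCIE_PORT_SERVICE_DPC;
	return services;
}
```

With legacy firmware (DPC bit reserved, so native_dpc is 0, but AER granted) the current rule enables DPC while the alternative does not. Newer firmware that grants AER but not DPC produces exactly the same inputs, which is the ambiguity described above.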

Hi Jon, I think we need to sort out the _OSC/FIRMWARE_FIRST patches
from Alex and Sathy first, then see what needs to be done on top of
those, so I'm going to push these off for a few days and they'll
probably need a refresh.

Bjorn

* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: Dan Williams @ 2020-05-01 16:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, Linux ACPI,
	Wei Yang, linux-s390, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, Linux MM, Michael S . Tsirkin, Eric W. Biederman,
	Pankaj Gupta, xen-devel, Andrew Morton, Michal Hocko,
	linuxppc-dev
In-Reply-To: <5c908ec3-9495-531e-9291-cbab24f292d6@redhat.com>

On Fri, May 1, 2020 at 2:34 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 01.05.20 00:24, Andrew Morton wrote:
> > On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
> >
> >>>
> >>> Why does the firmware map support hotplug entries?
> >>
> >> I assume:
> >>
> >> The firmware memmap was added primarily for x86-64 kexec (and still, is
> >> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
> >> get hotplugged on real HW, they get added to e820. Same applies to
> >> memory added via HyperV balloon (unless memory is unplugged via
> >> ballooning and you reboot ... the e820 is changed as well). I assume
> >> we wanted to be able to reflect that, to make kexec look like a real reboot.
> >>
> >> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
> >>
> >>
> >> But I assume only Andrew can enlighten us.
> >>
> >> @Andrew, any guidance here? Should we really add all memory to the
> >> firmware memmap, even if this contradicts with the existing
> >> documentation? (especially, if the actual firmware memmap will *not*
> >> contain that memory after a reboot)
> >
> > For some reason that patch is misattributed - it was authored by
> > Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
> > a decade.  I looked through the email discussion from that time and I'm
> > not seeing anything useful.  But I wasn't able to locate Dave Hansen's
> > review comments.
>
> Okay, thanks for checking. I think the documentation from 2008 is pretty
> clear what has to be done here. I will add some of these details to the
> patch description.
>
> Also, now that I know that esp. kexec-tools already don't consider
> dax/kmem memory properly (memory will not get dumped via kdump) and
> won't really suffer from a name change in /proc/iomem, I will go back to
> the MHP_DRIVER_MANAGED approach and
> 1. Don't create firmware memmap entries
> 2. Name the resource "System RAM (driver managed)"
> 3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
>
> This way, kernel users and user space can figure out that this memory
> has different semantics and handle it accordingly - I think that was
> what Eric was asking for.
>
> Of course, open for suggestions.

I'm still more of a fan of this being communicated by "System RAM"
being parented especially because that tells you something about how
the memory is driver-managed and which mechanism might be in play.
What about adding an optional /sys/firmware/memmap/X/parent attribute?
This would let tooling check that interface if it cares, and look up
the related infrastructure to interact with if it needs to do
something different for virtio-mem vs. dax/kmem.

* Re: sparc-related comment, to Re: [PATCH V1 07/10] arch/kmap: Ensure kmap_prot visibility
From: Ira Weiny @ 2020-05-01 15:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Huang Rui, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Ingo Molnar, linux-snps-arc, linux-xtensa,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, David S. Miller, Andrew Morton, linuxppc-dev,
	Christian Koenig
In-Reply-To: <20200501084446.GG27858@infradead.org>

On Fri, May 01, 2020 at 01:44:46AM -0700, Christoph Hellwig wrote:
> > --- a/arch/sparc/mm/highmem.c
> > +++ b/arch/sparc/mm/highmem.c
> > @@ -33,6 +33,7 @@
> >  #include <asm/vaddrs.h>
> >  
> >  pgprot_t kmap_prot;
> > +EXPORT_SYMBOL(kmap_prot);
> 
> Btw, I don't see why sparc needs this as a variable, as there is just
> a single assignment to it.

Because sparc uses non-standard defines which I'm not familiar with.

        kmap_prot = __pgprot(SRMMU_ET_PTE | SRMMU_PRIV | SRMMU_CACHE);

SRMMU_ET_PTE and friends are defined in 

arch/sparc/include/asm/pgtsrmmu.h

Since I can't readily test sparc, this was easier to put out than letting
0-day crank on the entire series to check whether including that header
in the common header chain would be an issue.

> 
> If sparc is sorted out we can always make it a define, and use a define
> for kmap_prot that defaults to PAGE_KERNEL, avoiding a little
> more duplication.
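A minimal user-space sketch of the suggested default: an architecture that needs special bits (such as sparc32's SRMMU_* combination) would define kmap_prot itself, and everyone else falls back to PAGE_KERNEL. The pgprot_t stand-ins, the PAGE_KERNEL value and kmap_prot_bits() are hypothetical, for illustration only.

```c
#include <stdint.h>

/* Minimal user-space stand-ins for the kernel's pgprot_t helpers. */
typedef struct { uint64_t pgprot; } pgprot_t;
#define __pgprot(x)   ((pgprot_t){ (x) })
#define pgprot_val(x) ((x).pgprot)

/* Hypothetical protection bits, for illustration only. */
#define PAGE_KERNEL __pgprot(0x163)

/*
 * An architecture with special requirements would define kmap_prot in
 * its own highmem header before this point; everyone else gets the
 * generic default and no per-arch variable or EXPORT_SYMBOL is needed.
 */
#ifndef kmap_prot
#define kmap_prot PAGE_KERNEL
#endif

/* Hypothetical helper showing which protection bits kmap would use. */
static uint64_t kmap_prot_bits(void)
{
	return pgprot_val(kmap_prot);
}
```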

Agreed.  But it seems easier as a follow-up (for me with 0-day).  Perhaps
someone from sparc can weigh in on the specifics of those defines and why they
are different from the normal ones?  Or even provide a follow-on patch?

Ira


* [PATCH v2] powerpc/ima: fix secure boot rules in ima arch policy
From: Nayna Jain @ 2020-05-01 14:16 UTC (permalink / raw)
  To: linux-integrity, linuxppc-dev; +Cc: Nayna Jain, linux-kernel, Mimi Zohar

To prevent verifying the kernel module appended signature twice
(finit_module), once by module_sig_check() and again by IMA, powerpc
secure boot rules define an IMA architecture specific policy rule
only if CONFIG_MODULE_SIG_FORCE is not enabled. This, unfortunately, does
not take into account the ability to enable "sig_enforce" on the boot
command line (module.sig_enforce=1).

Including the IMA module appraise rule causes the finit_module syscall
to fail unless the module signing public key is loaded onto the IMA
keyring.

This patch fixes secure boot policy rules to be based on CONFIG_MODULE_SIG
instead.
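A minimal user-space sketch of the effect: with CONFIG_MODULE_SIG defined, the MODULE_CHECK appraise rule is compiled out of the policy, so module signatures are left to module_sig_check() whether enforcement comes from CONFIG_MODULE_SIG_FORCE or module.sig_enforce=1. The CONFIG_MODULE_SIG define stands in for Kconfig, and has_module_appraise_rule() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a kernel built with CONFIG_MODULE_SIG=y. */
#define CONFIG_MODULE_SIG 1

static const char *const secure_rules[] = {
	"appraise func=KEXEC_KERNEL_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig",
#ifndef CONFIG_MODULE_SIG
	"appraise func=MODULE_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig",
#endif
	NULL
};

/* Return 1 if any rule in the NULL-terminated list appraises modules. */
static int has_module_appraise_rule(const char *const *rules)
{
	for (; *rules; rules++)
		if (strstr(*rules, "func=MODULE_CHECK"))
			return 1;
	return 0;
}
```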

Fixes: 4238fad366a6 ("powerpc/ima: Add support to initialize ima policy rules")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
---
v2:
* Fixed the patch description to state the problem more clearly, as
requested by Michael Ellerman.

 arch/powerpc/kernel/ima_arch.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/ima_arch.c b/arch/powerpc/kernel/ima_arch.c
index e34116255ced..957abd592075 100644
--- a/arch/powerpc/kernel/ima_arch.c
+++ b/arch/powerpc/kernel/ima_arch.c
@@ -19,12 +19,12 @@ bool arch_ima_get_secureboot(void)
  * to be stored as an xattr or as an appended signature.
  *
  * To avoid duplicate signature verification as much as possible, the IMA
- * policy rule for module appraisal is added only if CONFIG_MODULE_SIG_FORCE
+ * policy rule for module appraisal is added only if CONFIG_MODULE_SIG
  * is not enabled.
  */
 static const char *const secure_rules[] = {
 	"appraise func=KEXEC_KERNEL_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig",
-#ifndef CONFIG_MODULE_SIG_FORCE
+#ifndef CONFIG_MODULE_SIG
 	"appraise func=MODULE_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig",
 #endif
 	NULL
@@ -50,7 +50,7 @@ static const char *const secure_and_trusted_rules[] = {
 	"measure func=KEXEC_KERNEL_CHECK template=ima-modsig",
 	"measure func=MODULE_CHECK template=ima-modsig",
 	"appraise func=KEXEC_KERNEL_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig",
-#ifndef CONFIG_MODULE_SIG_FORCE
+#ifndef CONFIG_MODULE_SIG
 	"appraise func=MODULE_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig",
 #endif
 	NULL
-- 
2.18.1


* [powerpc:topic/uaccess] BUILD REGRESSION 5bb3b9d986426296507d3ef58d1e5fe4625de01f
From: kbuild test robot @ 2020-05-01 10:55 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  topic/uaccess
branch HEAD: 5bb3b9d986426296507d3ef58d1e5fe4625de01f  uaccess: Rename user_access_begin/end() to user_full_access_begin/end()

Error/Warning in current branch:

arch/x86/kernel/signal.c:312:7: error: implicit declaration of function 'user_access_begin'; did you mean 'user_access_end'? [-Werror=implicit-function-declaration]
arch/x86/kernel/signal.c:452:7: error: implicit declaration of function 'user_access_begin'; did you mean 'user_access_end'? [-Werror=implicit-function-declaration]
arch/x86/kernel/vm86_32.c:116:7: error: implicit declaration of function 'user_access_begin'; did you mean 'user_access_end'? [-Werror=implicit-function-declaration]

Error/Warning ids grouped by kconfigs:

recent_errors
|-- i386-allyesconfig
|   |-- arch-x86-kernel-signal.c:error:implicit-declaration-of-function-user_access_begin
|   `-- arch-x86-kernel-vm86_32.c:error:implicit-declaration-of-function-user_access_begin
|-- i386-randconfig-f002-20200430
|   |-- arch-x86-kernel-signal.c:error:implicit-declaration-of-function-user_access_begin
|   `-- arch-x86-kernel-vm86_32.c:error:implicit-declaration-of-function-user_access_begin
|-- x86_64-allmodconfig
|   `-- arch-x86-kernel-signal.c:error:implicit-declaration-of-function-user_access_begin
`-- x86_64-allyesconfig
    `-- arch-x86-kernel-signal.c:error:implicit-declaration-of-function-user_access_begin

elapsed time: 486m

configs tested: 221
configs skipped: 0

arm64                            allyesconfig
arm                              allyesconfig
arm64                            allmodconfig
arm                              allmodconfig
arm64                             allnoconfig
arm                               allnoconfig
arm                           efm32_defconfig
arm                         at91_dt_defconfig
arm                        shmobile_defconfig
arm64                               defconfig
arm                          exynos_defconfig
arm                        multi_v5_defconfig
arm                           sunxi_defconfig
arm                        multi_v7_defconfig
sparc                            allyesconfig
ia64                        generic_defconfig
powerpc                             defconfig
ia64                                defconfig
mips                malta_kvm_guest_defconfig
i386                              allnoconfig
i386                             allyesconfig
i386                             alldefconfig
i386                                defconfig
i386                              debian-10.3
ia64                             allmodconfig
ia64                              allnoconfig
ia64                          tiger_defconfig
ia64                         bigsur_defconfig
ia64                             allyesconfig
ia64                             alldefconfig
m68k                       m5475evb_defconfig
m68k                             allmodconfig
m68k                       bvme6000_defconfig
m68k                           sun3_defconfig
m68k                          multi_defconfig
nios2                         3c120_defconfig
nios2                         10m50_defconfig
c6x                        evmc6678_defconfig
c6x                              allyesconfig
openrisc                 simple_smp_defconfig
openrisc                    or1ksim_defconfig
nds32                               defconfig
nds32                             allnoconfig
csky                                defconfig
alpha                               defconfig
h8300                       h8s-sim_defconfig
h8300                     edosk2674_defconfig
xtensa                          iss_defconfig
h8300                    h8300h-sim_defconfig
xtensa                       common_defconfig
arc                                 defconfig
arc                              allyesconfig
microblaze                      mmu_defconfig
microblaze                    nommu_defconfig
mips                         tb0287_defconfig
mips                       capcella_defconfig
mips                           ip32_defconfig
mips                  decstation_64_defconfig
mips                      loongson3_defconfig
mips                          ath79_defconfig
mips                        bcm63xx_defconfig
mips                      fuloong2e_defconfig
mips                      malta_kvm_defconfig
mips                            ar7_defconfig
mips                             allyesconfig
mips                         64r6el_defconfig
mips                              allnoconfig
mips                           32r2_defconfig
mips                             allmodconfig
parisc                            allnoconfig
parisc                generic-64bit_defconfig
parisc                generic-32bit_defconfig
parisc                           allyesconfig
parisc                           allmodconfig
powerpc                      chrp32_defconfig
powerpc                       holly_defconfig
powerpc                       ppc64_defconfig
powerpc                          rhel-kconfig
powerpc                           allnoconfig
powerpc                  mpc866_ads_defconfig
powerpc                    amigaone_defconfig
powerpc                    adder875_defconfig
powerpc                     ep8248e_defconfig
powerpc                          g5_defconfig
powerpc                     mpc512x_defconfig
m68k                 randconfig-a001-20200501
mips                 randconfig-a001-20200501
nds32                randconfig-a001-20200501
alpha                randconfig-a001-20200501
parisc               randconfig-a001-20200501
riscv                randconfig-a001-20200501
microblaze           randconfig-a001-20200430
nios2                randconfig-a001-20200430
h8300                randconfig-a001-20200430
c6x                  randconfig-a001-20200430
sparc64              randconfig-a001-20200430
s390                 randconfig-a001-20200430
xtensa               randconfig-a001-20200430
csky                 randconfig-a001-20200430
openrisc             randconfig-a001-20200430
sh                   randconfig-a001-20200430
s390                 randconfig-a001-20200501
xtensa               randconfig-a001-20200501
sh                   randconfig-a001-20200501
openrisc             randconfig-a001-20200501
csky                 randconfig-a001-20200501
i386                 randconfig-b001-20200430
i386                 randconfig-b002-20200430
x86_64               randconfig-b001-20200430
i386                 randconfig-b003-20200430
x86_64               randconfig-b002-20200430
x86_64               randconfig-b003-20200430
i386                 randconfig-b003-20200501
x86_64               randconfig-b002-20200501
i386                 randconfig-b001-20200501
x86_64               randconfig-b003-20200501
x86_64               randconfig-b001-20200501
i386                 randconfig-b002-20200501
x86_64               randconfig-c001-20200501
x86_64               randconfig-c002-20200501
i386                 randconfig-c002-20200501
x86_64               randconfig-c003-20200501
i386                 randconfig-c001-20200501
i386                 randconfig-c003-20200501
x86_64               randconfig-c001-20200430
i386                 randconfig-c001-20200430
i386                 randconfig-c002-20200430
x86_64               randconfig-c002-20200430
x86_64               randconfig-c003-20200430
i386                 randconfig-c003-20200430
x86_64               randconfig-d001-20200501
i386                 randconfig-d003-20200501
x86_64               randconfig-d003-20200501
i386                 randconfig-d001-20200501
x86_64               randconfig-d002-20200501
i386                 randconfig-d002-20200501
x86_64               randconfig-d002-20200430
x86_64               randconfig-d001-20200430
i386                 randconfig-d001-20200430
i386                 randconfig-d003-20200430
i386                 randconfig-d002-20200430
x86_64               randconfig-d003-20200430
x86_64               randconfig-e002-20200430
i386                 randconfig-e003-20200430
x86_64               randconfig-e003-20200430
i386                 randconfig-e002-20200430
x86_64               randconfig-e001-20200430
i386                 randconfig-e001-20200430
x86_64               randconfig-e002-20200501
x86_64               randconfig-e003-20200501
i386                 randconfig-e003-20200501
x86_64               randconfig-e001-20200501
i386                 randconfig-e002-20200501
i386                 randconfig-e001-20200501
x86_64               randconfig-f001-20200430
i386                 randconfig-f002-20200430
i386                 randconfig-f003-20200430
i386                 randconfig-f001-20200430
x86_64               randconfig-f003-20200430
i386                 randconfig-f003-20200501
x86_64               randconfig-f001-20200501
x86_64               randconfig-f003-20200501
i386                 randconfig-f001-20200501
i386                 randconfig-f002-20200501
i386                 randconfig-g003-20200501
i386                 randconfig-g002-20200501
x86_64               randconfig-g002-20200501
i386                 randconfig-g001-20200501
x86_64               randconfig-a003-20200501
x86_64               randconfig-a001-20200501
i386                 randconfig-a003-20200501
i386                 randconfig-a002-20200501
i386                 randconfig-a001-20200501
i386                 randconfig-h001-20200501
i386                 randconfig-h002-20200501
i386                 randconfig-h003-20200501
x86_64               randconfig-h001-20200501
x86_64               randconfig-h003-20200501
i386                 randconfig-h002-20200430
i386                 randconfig-h003-20200430
x86_64               randconfig-h001-20200430
x86_64               randconfig-h003-20200430
i386                 randconfig-h001-20200430
sparc                randconfig-a001-20200430
arc                  randconfig-a001-20200430
ia64                 randconfig-a001-20200430
powerpc              randconfig-a001-20200430
arm                  randconfig-a001-20200430
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
s390                       zfcpdump_defconfig
s390                          debug_defconfig
s390                             allyesconfig
s390                              allnoconfig
s390                             allmodconfig
s390                             alldefconfig
s390                                defconfig
sh                          rsk7269_defconfig
sh                               allmodconfig
sh                            titan_defconfig
sh                  sh7785lcr_32bit_defconfig
sh                                allnoconfig
sparc                               defconfig
sparc64                             defconfig
sparc64                           allnoconfig
sparc64                          allyesconfig
sparc64                          allmodconfig
um                           x86_64_defconfig
um                             i386_defconfig
um                                  defconfig
x86_64                                   rhel
x86_64                               rhel-7.6
x86_64                    rhel-7.6-kselftests
x86_64                         rhel-7.2-clear
x86_64                                    lkp
x86_64                              fedora-25
x86_64                                  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

* Re: [PATCH 2/3] ASoC: fsl_esai: Add support for imx8qm
From: Mark Brown @ 2020-05-01 10:21 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: devicetree, alsa-devel, timur, Xiubo.Lee, lgirdwood, linuxppc-dev,
	tiwai, perex, nicoleotsuka, robh+dt, festevam, linux-kernel
In-Reply-To: <a933bafd2d6a60a69f840d9d4b613337efcf2816.1588320656.git.shengjiu.wang@nxp.com>

On Fri, May 01, 2020 at 04:12:05PM +0800, Shengjiu Wang wrote:
> The difference for esai on imx8qm is that the DMA device is EDMA.
> 
> EDMA requires the period size to be a multiple of maxburst. Otherwise
> the remaining bytes are not transferred and thus noise is produced.

If this constraint comes from the DMA controller then normally you'd
expect the DMA controller integration to be enforcing this - is there no
information in the DMA API that lets us know that this constraint is
there?
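The arithmetic behind the noise is simple: with a burst-only engine, a period that is not a multiple of maxburst leaves period % maxburst units behind. A minimal sketch, assuming both values are in the same unit and maxburst is non-zero; the helper names are hypothetical. Inside ALSA, such a rule would typically be expressed as a step constraint (for example snd_pcm_hw_constraint_step() on SNDRV_PCM_HW_PARAM_PERIOD_SIZE), wherever the information ends up coming from.

```c
/* Largest period a burst-only engine can transfer completely. */
static unsigned int period_round_down(unsigned int period, unsigned int maxburst)
{
	return period - period % maxburst;
}

/* Trailing units that would be left untransferred for a given period. */
static unsigned int period_leftover(unsigned int period, unsigned int maxburst)
{
	return period % maxburst;
}
```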

* Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
From: David Hildenbrand @ 2020-05-01  9:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: virtio-dev, linux-hyperv, Michal Hocko, Baoquan He, linux-acpi,
	Wei Yang, linux-s390, linux-nvdimm, linux-kernel, virtualization,
	linux-mm, Michael S . Tsirkin, Eric W. Biederman, Pankaj Gupta,
	xen-devel, Michal Hocko, linuxppc-dev
In-Reply-To: <20200430152403.e0d6da5eb1cad06411ac6d46@linux-foundation.org>

On 01.05.20 00:24, Andrew Morton wrote:
> On Thu, 30 Apr 2020 20:43:39 +0200 David Hildenbrand <david@redhat.com> wrote:
> 
>>>
>>> Why does the firmware map support hotplug entries?
>>
>> I assume:
>>
>> The firmware memmap was added primarily for x86-64 kexec (and still, is
>> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
>> get hotplugged on real HW, they get added to e820. Same applies to
>> memory added via HyperV balloon (unless memory is unplugged via
>> ballooning and you reboot ... the e820 is changed as well). I assume
>> we wanted to be able to reflect that, to make kexec look like a real reboot.
>>
>> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
>>
>>
>> But I assume only Andrew can enlighten us.
>>
>> @Andrew, any guidance here? Should we really add all memory to the
>> firmware memmap, even if this contradicts with the existing
>> documentation? (especially, if the actual firmware memmap will *not*
>> contain that memory after a reboot)
> 
> For some reason that patch is misattributed - it was authored by
> Shaohui Zheng <shaohui.zheng@intel.com>, who hasn't been heard from in
> a decade.  I looked through the email discussion from that time and I'm
> not seeing anything useful.  But I wasn't able to locate Dave Hansen's
> review comments.

Okay, thanks for checking. I think the documentation from 2008 is pretty
clear what has to be done here. I will add some of these details to the
patch description.

Also, now that I know that esp. kexec-tools already don't consider
dax/kmem memory properly (memory will not get dumped via kdump) and
won't really suffer from a name change in /proc/iomem, I will go back to
the MHP_DRIVER_MANAGED approach and
1. Don't create firmware memmap entries
2. Name the resource "System RAM (driver managed)"
3. Flag the resource via something like IORESOURCE_MEM_DRIVER_MANAGED.
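A minimal sketch of how the three steps could fit together when a driver adds memory. MHP_DRIVER_MANAGED and IORESOURCE_MEM_DRIVER_MANAGED are the names floated in this mail; their bit values, struct sketch_resource and sketch_add_memory_resource() are hypothetical stand-ins, not the kernel's add_memory() code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only. */
#define MHP_DRIVER_MANAGED            (1u << 0)
#define IORESOURCE_MEM_DRIVER_MANAGED (1u << 0)

/* Pared-down stand-in for the kernel's struct resource. */
struct sketch_resource {
	unsigned long long start, end;
	const char *name;
	unsigned int desc_flags;
};

/*
 * Fill in the resource for a hotplugged range and report whether a
 * firmware memmap entry should be created for it.
 */
static bool sketch_add_memory_resource(struct sketch_resource *res,
				       unsigned long long start,
				       unsigned long long size,
				       unsigned int mhp_flags)
{
	bool driver_managed = mhp_flags & MHP_DRIVER_MANAGED;

	res->start = start;
	res->end = start + size - 1;
	/* 2. Give driver-managed memory a distinguishable name. */
	res->name = driver_managed ? "System RAM (driver managed)" : "System RAM";
	/* 3. Flag it so in-kernel users need not compare strings. */
	res->desc_flags = driver_managed ? IORESOURCE_MEM_DRIVER_MANAGED : 0;
	/* 1. Only memory the firmware will report again gets a memmap entry. */
	return !driver_managed;
}
```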

This way, kernel users and user space can figure out that this memory
has different semantics and handle it accordingly - I think that was
what Eric was asking for.

Of course, open for suggestions.

-- 
Thanks,

David / dhildenb


* Re: [PATCH V1 00/10] Remove duplicated kmap code
From: Christoph Hellwig @ 2020-05-01  8:54 UTC (permalink / raw)
  To: ira.weiny
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Huang Rui, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Ingo Molnar, linux-snps-arc, linux-xtensa,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, David S. Miller, Andrew Morton, linuxppc-dev,
	Christian Koenig
In-Reply-To: <20200430203845.582900-1-ira.weiny@intel.com>

In addition to the work already in the series, it seems like
LAST_PKMAP_MASK, PKMAP_ADDR and PKMAP_NR can also be consolidated
to common code.
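For reference, a sketch of these helpers roughly as several architectures define them today (x86's asm/highmem.h, for example), which is what makes a common definition plausible. PAGE_SHIFT, LAST_PKMAP and PKMAP_BASE below are hypothetical 32-bit values chosen only so the sketch compiles; next_pkmap_slot() is a hypothetical helper illustrating the wrap-around that LAST_PKMAP_MASK provides.

```c
/* Hypothetical 32-bit layout, for illustration only. */
#define PAGE_SHIFT      12
#define LAST_PKMAP      1024
#define PKMAP_BASE      0xff800000UL

/* The per-arch definitions that could move to common code. */
#define LAST_PKMAP_MASK (LAST_PKMAP - 1)
#define PKMAP_NR(virt)  (((virt) - PKMAP_BASE) >> PAGE_SHIFT)
#define PKMAP_ADDR(nr)  (PKMAP_BASE + ((nr) << PAGE_SHIFT))

/* Next pkmap slot after 'nr', wrapping to 0 after LAST_PKMAP - 1. */
static unsigned int next_pkmap_slot(unsigned int nr)
{
	return (nr + 1) & LAST_PKMAP_MASK;
}
```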

Also kmap_atomic_high_prot / kmap_atomic_pfn could move into common
code, maybe keyed off a symbol selected by the actual users that
need it.  It also seems like it doesn't actually ever need to be
exported.

This in turn would lead to being able to allow io_mapping_map_atomic_wc
on all architectures, which might make nouveau and qxl happy, but maybe
that can be left for another series.

* Re: [PATCH V1 10/10] drm: Remove drm specific kmap_atomic code
From: Christoph Hellwig @ 2020-05-01  8:49 UTC (permalink / raw)
  To: ira.weiny
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Huang Rui, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Ingo Molnar, linux-snps-arc, linux-xtensa,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, David S. Miller, Andrew Morton, linuxppc-dev,
	Christian Koenig
In-Reply-To: <20200430203845.582900-11-ira.weiny@intel.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
