* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use [not found] ` <20200331034612.GB83248@dhcp-128-65.nay.redhat.com> @ 2020-04-14 17:31 ` James Morse 0 siblings, 0 replies; 61+ messages in thread From: James Morse @ 2020-04-14 17:31 UTC (permalink / raw) To: Dave Young Cc: Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel Hi Dave, On 31/03/2020 04:46, Dave Young wrote: > I agreed that file load is still not widely used, but in the long run > we should not maintain both of them all the future time. Especially > when some kernel-userspace interfaces need to be introduced, file load > will have the natural advantage. We may keep the kexec_load for other > misc usecases, but we can use file load for the major modern > linux-to-linux loading. I'm not saying we can do it immediately, just > thought we should reduce the duplicate effort and try to avoid hacking if > possible. Sure. My aim here is to never debug this problem again. > Anyway about this particular issue, I wonder if we can just reload with > a udev rule as replied in another mail. What if it doesn't? I can't find such a rule on my debian machine. I don't think user-space can be relied on for something like this. The best we could hope for here is a dying gasp from the old kernel: | kexec: memory layout changed since kexec load, this may not work. | Bye! ... assuming anyone sees such a message. Thanks, James _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use [not found] <20200326180730.4754-1-james.morse@arm.com> [not found] ` <20200330135522.GE6352@MiWiFi-R3L-srv> @ 2020-04-15 20:29 ` Eric W. Biederman 2020-04-22 12:14 ` James Morse [not found] ` <20200326180730.4754-2-james.morse@arm.com> ` (2 subsequent siblings) 4 siblings, 1 reply; 61+ messages in thread From: Eric W. Biederman @ 2020-04-15 20:29 UTC (permalink / raw) To: James Morse Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Andrew Morton, Will Deacon, linux-arm-kernel James Morse <james.morse@arm.com> writes: > Hello! > > arm64 recently queued support for memory hotremove, which led to some > new corner cases for kexec. > > If the kexec segments are loaded for a removable region, that region may > be removed before kexec actually occurs. This causes the first kernel to > lockup when applying the relocations. (I've triggered this on x86 too). > > The first patch adds a memory notifier for kexec so that it can refuse > to allow in-use regions to be taken offline. > > > This doesn't solve the problem for arm64, where the new kernel must > initially rely on the data structures from the first boot to describe > memory. These don't describe hotpluggable memory. > If kexec places the kernel in one of these regions, it must also provide > a DT that describes the region in which the kernel was mapped as memory. > (and somehow ensure its always present in the future...) > > To prevent this from happening accidentally with unaware user-space, > patches two and three allow arm64 to give these regions a different > name. > > This is a change in behaviour for arm64 as memory hotadd and hotremove > were added separately. > > > I haven't tried kdump. > Unaware kdump from user-space probably won't describe the hotplug > regions if the name is different, which saves us from problems if > the memory is no longer present at kdump time, but means the vmcore > is incomplete. 
> > > These patches are based on arm64's for-next/core branch, but can all > be merged independently. So I just looked through these quickly and I think there are real problems here we can fix, and that are worth fixing. However I am not thrilled with the fixes you propose. Eric _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use 2020-04-15 20:29 ` Eric W. Biederman @ 2020-04-22 12:14 ` James Morse 2020-04-22 13:04 ` Eric W. Biederman 0 siblings, 1 reply; 61+ messages in thread From: James Morse @ 2020-04-22 12:14 UTC (permalink / raw) To: Eric W. Biederman Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Andrew Morton, Will Deacon, linux-arm-kernel Hi Eric, On 15/04/2020 21:29, Eric W. Biederman wrote: > James Morse <james.morse@arm.com> writes: > >> Hello! >> >> arm64 recently queued support for memory hotremove, which led to some >> new corner cases for kexec. >> >> If the kexec segments are loaded for a removable region, that region may >> be removed before kexec actually occurs. This causes the first kernel to >> lockup when applying the relocations. (I've triggered this on x86 too). >> >> The first patch adds a memory notifier for kexec so that it can refuse >> to allow in-use regions to be taken offline. >> >> >> This doesn't solve the problem for arm64, where the new kernel must >> initially rely on the data structures from the first boot to describe >> memory. These don't describe hotpluggable memory. >> If kexec places the kernel in one of these regions, it must also provide >> a DT that describes the region in which the kernel was mapped as memory. >> (and somehow ensure its always present in the future...) >> >> To prevent this from happening accidentally with unaware user-space, >> patches two and three allow arm64 to give these regions a different >> name. >> >> This is a change in behaviour for arm64 as memory hotadd and hotremove >> were added separately. >> >> >> I haven't tried kdump. >> Unaware kdump from user-space probably won't describe the hotplug >> regions if the name is different, which saves us from problems if >> the memory is no longer present at kdump time, but means the vmcore >> is incomplete. 
>> >> >> These patches are based on arm64's for-next/core branch, but can all >> be merged independently. > > So I just looked through these quickly and I think there are real > problems here we can fix, and that are worth fixing. > > However I am not thrilled with the fixes you propose. Sure. Unfortunately /proc/iomem is the only trick arm64 has to keep the existing kexec-tools working. (We've had 'unthrilling' patches like this before to prevent user-space from loading the kernel over the top of the in-memory firmware tables.) arm64 expects the description of memory to come from firmware, be that UEFI for memory present at boot, or the ACPI AML methods for memory that was added later. On arm64 there is no standard location for memory. The kernel has to be handed a pointer to the firmware tables that describe it. The kernel expects to boot from memory that was present at boot. Modifying the firmware tables at runtime doesn't solve the problem as we may need to move the firmware-reserved memory region that describes memory. User-space may still load and kexec either side of that update. Even if we could modify the structures at runtime, we can't update a loaded kexec image. We have no idea which blob from userspace is the DT. It may not even be linux that has been loaded. We can't emulate parts of UEFI's handover because kexec's purgatory isn't an EFI program. I can't see a path through all this. If we have to modify existing user-space, I'd rather leave it broken. We can detect the problem in the arch code and print a warning at load time. James _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use 2020-04-22 12:14 ` James Morse @ 2020-04-22 13:04 ` Eric W. Biederman 2020-04-22 15:40 ` James Morse 0 siblings, 1 reply; 61+ messages in thread From: Eric W. Biederman @ 2020-04-22 13:04 UTC (permalink / raw) To: James Morse Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Andrew Morton, Will Deacon, linux-arm-kernel James Morse <james.morse@arm.com> writes: > Hi Eric, > > On 15/04/2020 21:29, Eric W. Biederman wrote: >> James Morse <james.morse@arm.com> writes: >> >>> Hello! >>> >>> arm64 recently queued support for memory hotremove, which led to some >>> new corner cases for kexec. >>> >>> If the kexec segments are loaded for a removable region, that region may >>> be removed before kexec actually occurs. This causes the first kernel to >>> lockup when applying the relocations. (I've triggered this on x86 too). >>> >>> The first patch adds a memory notifier for kexec so that it can refuse >>> to allow in-use regions to be taken offline. >>> >>> >>> This doesn't solve the problem for arm64, where the new kernel must >>> initially rely on the data structures from the first boot to describe >>> memory. These don't describe hotpluggable memory. >>> If kexec places the kernel in one of these regions, it must also provide >>> a DT that describes the region in which the kernel was mapped as memory. >>> (and somehow ensure its always present in the future...) >>> >>> To prevent this from happening accidentally with unaware user-space, >>> patches two and three allow arm64 to give these regions a different >>> name. >>> >>> This is a change in behaviour for arm64 as memory hotadd and hotremove >>> were added separately. >>> >>> >>> I haven't tried kdump. 
>>> Unaware kdump from user-space probably won't describe the hotplug >>> regions if the name is different, which saves us from problems if >>> the memory is no longer present at kdump time, but means the vmcore >>> is incomplete. >>> >>> >>> These patches are based on arm64's for-next/core branch, but can all >>> be merged independently. >> >> So I just looked through these quickly and I think there are real >> problems here we can fix, and that are worth fixing. >> >> However I am not thrilled with the fixes you propose. > > Sure. Unfortunately /proc/iomem is the only trick arm64 has to keep the existing > kexec-tools working. > (We've had 'unthrilling' patches like this before to prevent user-space from loading the > kernel over the top of the in-memory firmware tables.) > > arm64 expects the description of memory to come from firmware, be that UEFI for memory > present at boot, or the ACPI AML methods for memory that was added > later. > > On arm64 there is no standard location for memory. The kernel has to be handed a pointer > to the firmware tables that describe it. The kernel expects to boot from memory that was > present at boot. What do you do when the firmware is wrong? Does arm64 support the mem=xxx@yyy kernel command line options? If you want to handle the general case of memory hotplug having a limitation that you have to boot from memory that was present at boot is a bug, because the memory might not be there. > Modifying the firmware tables at runtime doesn't solve the problem as we may need to move > the firmware-reserved memory region that describes memory. User-space may still load and > kexec either side of that update. > > Even if we could modify the structures at runtime, we can't update a loaded kexec image. > We have no idea which blob from userspace is the DT. It may not even be linux that has > been loaded. What can be done and very reasonably so is on memory hotplug: - Unloaded any loaded kexec image. 
- Block loading any new image until the hotplug operation completes.

That is simple and generic, and can be done for all architectures.

This doesn't apply to the kexec-on-panic kernel because it fundamentally
needs to figure out how to limp along (or reliably stop) when it has the
wrong memory map.

> We can't emulate parts of UEFI's handover because kexec's purgatory
> isn't an EFI program.

Plus much of EFI is unusable after ExitBootServices is called.

> I can't see a path through all this. If we have to modify existing user-space, I'd rather
> leave it broken. We can detect the problem in the arch code and print a warning at load time.

The weirdest thing to me in all of this is that you have been wanting to
handle memory hotplug, but you don't want to change or deal with the
memory map changing when hotplug occurs. The memory map changing is
fundamentally what memory hotplug does.

So I think it is fundamental to figure out how to pass the updated
memory map, either through mem=xxx@yyy command line options or through
another option.

If you really want to keep the limitation that you have to have the
kernel in the initial memory map, you can compare that map to the
EFI tables when selecting the load address.

Expecting userspace to reload the loaded kernel after memory hotplug is
completely reasonable.

Unless I am mistaken, memory hotplug is expected to be a rare event, not
something that happens every day, and certainly not something that
happens every minute.

Eric
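The generic policy suggested in the message above (drop any loaded image when a hotplug operation starts, and refuse new loads until it completes) can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code: the names `KexecLoader`, `LoadBlocked`, `hotplug_begin` and `hotplug_end` are hypothetical and do not correspond to any existing kernel or kexec-tools interface.

```python
class LoadBlocked(Exception):
    """Raised when a load is attempted while hotplug is in progress."""


class KexecLoader:
    """Toy model: hotplug invalidates a loaded image and blocks new loads."""

    def __init__(self):
        self.image = None        # currently loaded kexec image, if any
        self.hotplug_depth = 0   # number of hotplug operations in flight

    def load(self, image):
        # Refuse to load while the memory map may still be changing.
        if self.hotplug_depth > 0:
            raise LoadBlocked("memory hotplug in progress, retry later")
        self.image = image

    def hotplug_begin(self):
        # The image was laid out against the old memory map, so drop it.
        self.hotplug_depth += 1
        self.image = None

    def hotplug_end(self):
        if self.hotplug_depth == 0:
            raise RuntimeError("hotplug_end without matching hotplug_begin")
        self.hotplug_depth -= 1
```

In this model user-space is expected to call `load()` again once the hotplug operation has finished, which matches the "reload after hotplug" expectation discussed in the thread.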
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use 2020-04-22 13:04 ` Eric W. Biederman @ 2020-04-22 15:40 ` James Morse 0 siblings, 0 replies; 61+ messages in thread From: James Morse @ 2020-04-22 15:40 UTC (permalink / raw) To: Eric W. Biederman Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Andrew Morton, Will Deacon, linux-arm-kernel Hi Eric, On 22/04/2020 14:04, Eric W. Biederman wrote: > James Morse <james.morse@arm.com> writes: >> On 15/04/2020 21:29, Eric W. Biederman wrote: >>> James Morse <james.morse@arm.com> writes: >>>> arm64 recently queued support for memory hotremove, which led to some >>>> new corner cases for kexec. >>>> >>>> If the kexec segments are loaded for a removable region, that region may >>>> be removed before kexec actually occurs. This causes the first kernel to >>>> lockup when applying the relocations. (I've triggered this on x86 too). >>>> >>>> The first patch adds a memory notifier for kexec so that it can refuse >>>> to allow in-use regions to be taken offline. >>>> >>>> >>>> This doesn't solve the problem for arm64, where the new kernel must >>>> initially rely on the data structures from the first boot to describe >>>> memory. These don't describe hotpluggable memory. >>>> If kexec places the kernel in one of these regions, it must also provide >>>> a DT that describes the region in which the kernel was mapped as memory. >>>> (and somehow ensure its always present in the future...) >>>> >>>> To prevent this from happening accidentally with unaware user-space, >>>> patches two and three allow arm64 to give these regions a different >>>> name. >>>> >>>> This is a change in behaviour for arm64 as memory hotadd and hotremove >>>> were added separately. >>>> >>>> >>>> I haven't tried kdump. 
>>>> Unaware kdump from user-space probably won't describe the hotplug >>>> regions if the name is different, which saves us from problems if >>>> the memory is no longer present at kdump time, but means the vmcore >>>> is incomplete. >>>> >>>> >>>> These patches are based on arm64's for-next/core branch, but can all >>>> be merged independently. >>> >>> So I just looked through these quickly and I think there are real >>> problems here we can fix, and that are worth fixing. >>> >>> However I am not thrilled with the fixes you propose. >> >> Sure. Unfortunately /proc/iomem is the only trick arm64 has to keep the existing >> kexec-tools working. >> (We've had 'unthrilling' patches like this before to prevent user-space from loading the >> kernel over the top of the in-memory firmware tables.) >> >> arm64 expects the description of memory to come from firmware, be that UEFI for memory >> present at boot, or the ACPI AML methods for memory that was added >> later. >> >> On arm64 there is no standard location for memory. The kernel has to be handed a pointer >> to the firmware tables that describe it. The kernel expects to boot from memory that was >> present at boot. > What do you do when the firmware is wrong? The firmware gets fixed. Its the only source of facts about the platform. > Does arm64 support the > mem=xxx@yyy kernel command line options? Only the debug option to reduce the available memory. > If you want to handle the general case of memory hotplug having a > limitation that you have to boot from memory that was present at boot is > a bug, because the memory might not be there. arm64's arch code prevents the memory described by the UEFI memory map from being taken offline/removed. Memory present at boot may have firmware reservations, that are being used by some other agent in the system. firmware-first RAS errors are one, the interrupt controllers' property and pending tables are another. 
The UEFI memory map's description of memory may have been incomplete, as there may have been regions carved-out, not described at all instead of described as reserved. The UEFI runtime services will live in memory described by the UEFI memory map. >> Modifying the firmware tables at runtime doesn't solve the problem as we may need to move >> the firmware-reserved memory region that describes memory. User-space may still load and >> kexec either side of that update. >> >> Even if we could modify the structures at runtime, we can't update a loaded kexec image. >> We have no idea which blob from userspace is the DT. It may not even be linux that has >> been loaded. > > What can be done and very reasonably so is on memory hotplug: > - Unloaded any loaded kexec image. > - Block loading any new image until the hotplug operation completes. > > That is simple and generic, and can be done for all architectures. Yes, certainly. > This doesn't apply to kexec on panic kernel because it fundamentally > needs to figure out how to limp along (or reliably stop) when it has the > wrong memory map. > >> We can't emulate parts of UEFI's handover because kexec's purgatory >> isn't an EFI program. > > Plus much of EFI is unusable after ExitBootServices is called. Of course, we even overwrite its code when allocating memory for the kernel. I bring it up because it is our only way of handing over the memory map of the system. >> I can't see a path through all this. If we have to modify existing user-space, I'd rather >> leave it broken. We can detect the problem in the arch code and print a warning at load time. > The weirdest thing to me in all of this is that you have been wanting to > handle memory hotplug. But you don't want to change or deal with the > memory map changing when hotplug occurs. The memory map changing is > fundamentally memory hotplug does. arm64 doesn't have a 'the memory map', just what came from firmware. 
The memory map Linux uses is built from these firmware descriptions.
Memory is discovered from:
 early: The DT memory node.
 early: The UEFI memory map.
 later: ACPI hotplug memory.

Later kexec()d or kdump'd kernels rebuild the memory map from the
firmware description. This means kexec is totally invisible.
Not changing these descriptions is important to ensure we don't
accidentally corrupt them, or make up some property that isn't true.

Your request to 'change' the memory map involves creating a new UEFI
memory map that describes the memory we found via ACPI hotplug.
arm64 doesn't do this because we expect the next kernel to re-discover
this memory via ACPI hotplug.

Generally, arm64 expects a kexec'd kernel to learn and discover things
in exactly the same way that it would have done if it were the first
kernel to have been booted.

> So I think it is fundamental to figure out how to pass the updated
> memory map. Either through command line mem=xxx@yyy command line
> options or through another option.

We re-discover it from firmware.

Booting from memory that is not described as memory early enough is the
second problem addressed by this series.

> If you really want to keep the limitation that you have to have the
> kernel in the initial memory map you can compare that map to the
> efi tables when selecting the load address.

Great. How can user-space know the contents of that map?
It only reads /proc/iomem today.

On a system that doesn't support ACPI memory hotplug, /proc/iomem
describes the memory present at boot. These things have never been
different before.

> Expecting userspace to reload the loaded kernel after memory hotplug is
> completely reasonable.

I'm sold on this; it implicitly solves the 'kexec image wants to be
copied into removed memory' problem.

> Unless I am mistaken memory hotplug is expected to be a rare event not
> something that happens every day, certainly not something that happens
> every minute.
One of the motivations for supporting memory hotplug is VMs. Container
projects like to create VMs in advance, then reconfigure them just
before they are used. This saves the time taken by the hypervisor to do
its work.

Hitting the 'not booted from boot memory' problem now only takes using
kexec in a VM deployed like this.

Thanks,

James
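Because existing user-space only reads /proc/iomem, the resource name is what decides whether a region is considered when placing a kexec image. The sketch below shows that selection on a made-up /proc/iomem excerpt. It is illustrative only and is not kexec-tools code: the function name `ram_regions` is hypothetical, and the `System RAM (hotplug)` label in the sample is an assumption, since the thread only says that hotplugged regions would be given "a different name".

```python
def ram_regions(iomem_text, name="System RAM"):
    """Return (start, end) pairs for top-level resources named exactly `name`."""
    regions = []
    for line in iomem_text.splitlines():
        if not line or line[0].isspace():
            continue  # indented lines are child resources, skip them
        span, sep, label = line.partition(" : ")
        if not sep or label.strip() != name:
            continue
        start, _, end = span.partition("-")
        regions.append((int(start, 16), int(end, 16)))
    return regions


# Made-up excerpt; addresses and the hotplug label are for illustration.
SAMPLE_IOMEM = """\
40000000-7fffffff : System RAM
  40080000-410dffff : Kernel code
80000000-bfffffff : System RAM (hotplug)
c0000000-c0000fff : reserved
"""

boot_memory = ram_regions(SAMPLE_IOMEM)
hotplug_memory = ram_regions(SAMPLE_IOMEM, name="System RAM (hotplug)")
```

With an exact-name match, a tool that is unaware of the renamed regions would simply not see them as candidates, which is the behaviour the rename relies on.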
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: Andrew Morton @ 2020-04-10 19:10 UTC
To: David Hildenbrand
Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Eric Biederman, Will Deacon, linux-arm-kernel

It's unclear (to me) what the status of this patchset is. But it does
appear that a new version can be expected?
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: Baoquan He @ 2020-04-11 3:44 UTC
To: Andrew Morton
Cc: David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, linux-mm, James Morse, Eric Biederman, Will Deacon, linux-arm-kernel

On 04/10/20 at 12:10pm, Andrew Morton wrote:
> It's unclear (to me) what is the status of this patchset. But it does appear that
> a new version can be expected?

As we discussed in the replies to the cover letter, the idea of this
patchset is not good, because we tend to use kexec_file_load more, will
improve/enhance it in the future, and will gradually obsolete the old
kexec_load interface, which is what this patchset is trying to fix.
The issue James spotted is very much a corner case, and we have
suggested another, easier way to avoid it: adding a systemd service that
loads kexec and monitors the memory add/remove uevents, just as we have
done for kdump loading. Bhupesh is working on adding such a service in
Fedora and testing it, and will put it into RHEL too if nobody objects.

Thanks
Baoquan
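The reload-on-uevent approach described in the message above can be sketched as a filter over kernel uevents. The example only shows the decision logic: it parses a NUL-separated uevent payload and reports whether a memory block went online or offline, at which point a service would re-run its kexec load command. The function names and the sample payload are illustrative assumptions; this is not the Fedora service mentioned in the thread, and receiving the events (for example via udev) is left out.

```python
def parse_uevent(payload: bytes) -> dict:
    """Split a NUL-separated uevent payload into KEY=VALUE fields."""
    fields = {}
    for part in payload.split(b"\0"):
        key, sep, value = part.decode("utf-8", "replace").partition("=")
        if sep:  # the leading "action@devpath" summary has no '=' and is skipped
            fields[key] = value
    return fields


def needs_kexec_reload(fields: dict) -> bool:
    """True when a memory block changed state, so a loaded image may be stale."""
    return (fields.get("SUBSYSTEM") == "memory"
            and fields.get("ACTION") in ("online", "offline"))


# Hypothetical payload for a memory block going offline.
SAMPLE_UEVENT = (b"offline@/devices/system/memory/memory42\0"
                 b"ACTION=offline\0"
                 b"DEVPATH=/devices/system/memory/memory42\0"
                 b"SUBSYSTEM=memory\0"
                 b"SEQNUM=1234\0")

reload_required = needs_kexec_reload(parse_uevent(SAMPLE_UEVENT))
```

Note that such a service only narrows the window: the image is still stale between the hotplug event and the reload, which is the concern raised elsewhere in the thread about relying on user-space.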
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-11 3:44 ` Baoquan He @ 2020-04-11 9:30 ` Russell King - ARM Linux admin 2020-04-11 9:58 ` David Hildenbrand 2020-04-12 5:35 ` Baoquan He 0 siblings, 2 replies; 61+ messages in thread From: Russell King - ARM Linux admin @ 2020-04-11 9:30 UTC (permalink / raw) To: Baoquan He Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse, Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On Sat, Apr 11, 2020 at 11:44:14AM +0800, Baoquan He wrote: > Because We tend to use kexec_file_load more and improve/enhance it in the > future, and gradually obsolete the old kexec_load interface which this > patchset is trying to fix on. That's not going to happen; 32-bit ARM kexec uses the kexec_load interface rather than the kexec_file_load version, and I see no one with any interest in changing that - and there's users of the former. I don't see how it's possible to convert 32-bit ARM kexec to the kexec_file_load interface - this assumes that all you have are the kernel, initrd, and commandline, but on 32-bit ARM kexec, we have kernel, initrd and the dtb blob which the user can specify. So, if we wanted to obsolete the kexec_load interface, _first_ there needs to be a way to provide users with the existing functionality they have already in place on 32-bit ARM - otherwise we're looking at a userspace regression. Especially as kexec_file_load takes precedence on some distro patched versions of the kexec tool, irrespective of which interface the user requests of the tool. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-11 9:30 ` Russell King - ARM Linux admin @ 2020-04-11 9:58 ` David Hildenbrand 2020-04-12 5:35 ` Baoquan He 1 sibling, 0 replies; 61+ messages in thread From: David Hildenbrand @ 2020-04-11 9:58 UTC (permalink / raw) To: Russell King - ARM Linux admin Cc: Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse, Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel > Am 11.04.2020 um 11:40 schrieb Russell King - ARM Linux admin <linux@armlinux.org.uk>: > > On Sat, Apr 11, 2020 at 11:44:14AM +0800, Baoquan He wrote: >> Because We tend to use kexec_file_load more and improve/enhance it in the >> future, and gradually obsolete the old kexec_load interface which this >> patchset is trying to fix on. > > That's not going to happen; 32-bit ARM kexec uses the kexec_load > interface rather than the kexec_file_load version, and I see no one > with any interest in changing that - and there's users of the former. > > I don't see how it's possible to convert 32-bit ARM kexec to the > kexec_file_load interface - this assumes that all you have are the > kernel, initrd, and commandline, but on 32-bit ARM kexec, we have > kernel, initrd and the dtb blob which the user can specify. > > So, if we wanted to obsolete the kexec_load interface, _first_ there > needs to be a way to provide users with the existing functionality > they have already in place on 32-bit ARM - otherwise we're looking > at a userspace regression. Especially as kexec_file_load takes > precedence on some distro patched versions of the kexec tool, > irrespective of which interface the user requests of the tool. > On 32bit architectures we usually don‘t really care about memory hotplug. So we could deprecate it only for 64bit architectures AFAIKS. 
> -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up > _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: Baoquan He @ 2020-04-12 5:35 UTC
To: Russell King - ARM Linux admin
Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse, Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel

On 04/11/20 at 10:30am, Russell King - ARM Linux admin wrote:
> On Sat, Apr 11, 2020 at 11:44:14AM +0800, Baoquan He wrote:
> > Because We tend to use kexec_file_load more and improve/enhance it in the
> > future, and gradually obsolete the old kexec_load interface which this
> > patchset is trying to fix on.
>
> That's not going to happen; 32-bit ARM kexec uses the kexec_load
> interface rather than the kexec_file_load version, and I see no one
> with any interest in changing that - and there's users of the former.
>
> I don't see how it's possible to convert 32-bit ARM kexec to the
> kexec_file_load interface - this assumes that all you have are the
> kernel, initrd, and commandline, but on 32-bit ARM kexec, we have
> kernel, initrd and the dtb blob which the user can specify.

Well, I understand what you said about 32-bit ARM only supporting
kexec_load. That's why I said we tend to obsolete it 'GRADUALLY'. It is
the existing users of kexec_load, and the ARCHes which only have
kexec_load, that make us move to kexec_file_load gradually.

Compared with kexec_load, kexec_file_load has only one disadvantage:
some ARCHes only have kexec_load. Otherwise, kexec_file_load benefits
kexec/kdump development and maintenance very much. The loading job of
kexec_file_load is mostly done in the kernel, so we can very
conveniently get whatever kernel information we want and do anything
needed.
For the kexec_load interface, the loading job is mostly done in
userspace: we have to export kernel information to procfs, sysfs, etc.,
then parse it in kexec-tools, and finally pass it to the kernel part of
kexec loading.

Gradual obsoleting means we may only add features/improvements/
enhancements to kexec_file_load. And if a bug fix is needed for both
kexec_load and kexec_file_load, and the fix is very complicated, we may
only fix it in kexec_file_load too. ARCHes that don't have the
kexec_file_load interface are encouraged to add it, by porting the
user-space part into the kernel as x86/s390/arm64 have done.

Of course, it doesn't mean we won't fix critical/blocker bugs in
kexec_load loading. We will still try to, we are just not so eager.

In existing product environments where kexec_load is used, just keep
using it. Do we bother to change it to kexec_file_load, e.g. in our
RHEL7 distros? Certainly not. But in our new products, we will change to
the kexec_file_load interface. I guess this is similar for arm64. The
advantages and benefits were described in the 2nd paragraph.

As for 32-bit ARM, is it like the old products, where we have many
in-use systems deployed in customers' laboratories? I wonder whether ARM
continues designing new 32-bit ARM CPUs, and whether some companies
continue producing tons of 32-bit ARM CPUs. If yes, I think we need to
continue taking care of kexec_load if 32-bit ARM can't convert to
kexec_file_load. If not, it may not be a barrier when we consider
converting kexec_load to kexec_file_load in other ARCHes. We just need
to keep using it, and try to fix critical/blocker bugs in the kexec_load
interface if encountered.

Finally, coming back to this patchset itself, the issue James spotted is
not so critical, I would say. When I do a kexec jump, I load first, then
trigger the jump. I can think of the case where people load a kexec-ed
kernel, then do something else, and later trigger the kexec jump. These
are not necessary steps.
As Dave and I replied to James in the cover-letter thread, adding a systemd service of kexec loading, monitor hotplug uevent, reload it if any hot remove happened. This is quite easy to do, I don't see any problem with it, and why we don't do like this. My personal opinion, please tell if I miss anything. > > So, if we wanted to obsolete the kexec_load interface, _first_ there > needs to be a way to provide users with the existing functionality > they have already in place on 32-bit ARM - otherwise we're looking > at a userspace regression. Especially as kexec_file_load takes > precedence on some distro patched versions of the kexec tool, > irrespective of which interface the user requests of the tool. > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up > _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
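The reload-on-hot-remove idea suggested above can be sketched roughly as follows. This is only a minimal illustration, not an existing udev rule or systemd unit: it assumes the monitoring service receives kernel uevents as NUL-separated `KEY=value` payloads, and the helper names `parse_uevent` and `needs_kexec_reload` are hypothetical. Which actions should trigger a reload is also an assumption; here only memory going away is treated as invalidating the loaded image.

```python
def parse_uevent(payload: bytes) -> dict:
    """Split a kernel uevent payload into a dict of its KEY=value properties.

    The payload is a sequence of NUL-separated fields; the first field is the
    "action@devpath" summary and is skipped here.
    """
    props = {}
    for field in payload.split(b"\x00")[1:]:
        if b"=" in field:
            key, _, value = field.partition(b"=")
            props[key.decode()] = value.decode()
    return props


# Assumption for this sketch: only memory going away requires a reload.
RELOAD_ACTIONS = {"offline", "remove"}


def needs_kexec_reload(props: dict) -> bool:
    """Return True if this uevent reports memory being offlined or removed."""
    return props.get("SUBSYSTEM") == "memory" and props.get("ACTION") in RELOAD_ACTIONS


if __name__ == "__main__":
    event = (b"offline@/devices/system/memory/memory42\x00"
             b"ACTION=offline\x00"
             b"DEVPATH=/devices/system/memory/memory42\x00"
             b"SUBSYSTEM=memory\x00")
    print(needs_kexec_reload(parse_uevent(event)))
```

A real service would run its distribution's reload command when `needs_kexec_reload` returns True. As noted elsewhere in the thread, this relies on user space actually having such a rule installed.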
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-12 5:35 ` Baoquan He @ 2020-04-12 8:08 ` Russell King - ARM Linux admin 2020-04-12 19:52 ` Eric W. Biederman 0 siblings, 1 reply; 61+ messages in thread From: Russell King - ARM Linux admin @ 2020-04-12 8:08 UTC (permalink / raw) To: Baoquan He Cc: David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, linux-mm, James Morse, Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On Sun, Apr 12, 2020 at 01:35:07PM +0800, Baoquan He wrote: > On 04/11/20 at 10:30am, Russell King - ARM Linux admin wrote: > > On Sat, Apr 11, 2020 at 11:44:14AM +0800, Baoquan He wrote: > > > Because We tend to use kexec_file_load more and improve/enhance it in the > > > future, and gradually obsolete the old kexec_load interface which this > > > patchset is trying to fix on. > > > > That's not going to happen; 32-bit ARM kexec uses the kexec_load > > interface rather than the kexec_file_load version, and I see no one > > with any interest in changing that - and there's users of the former. > > > > I don't see how it's possible to convert 32-bit ARM kexec to the > > kexec_file_load interface - this assumes that all you have are the > > kernel, initrd, and commandline, but on 32-bit ARM kexec, we have > > kernel, initrd and the dtb blob which the user can specify. > > Well, I understand what you said about 32-bit ARM support with only > kexec_old support thing. That's why I said we tend to obsolete it > 'GRADUALLY'. It's the existing users who are using kexec_load, and the > ARCHes which only has kexec_load, make us have to transfer to > kexec_file_load gradually. > > Comparing with kexec_load, kexec_file_load has only one disadvantage, > that is some ARCHes only have kexec_load. Otherwise, kexec_file_load > benefits kexec/kdump developping/maintaining very much. 
The loading job > of kexec_file_load is mostly done in kernel, we can get whatever we > want about kernel information very conveniently to do anything needed. > For the kexec_load interface, the loading job is mostly done in > userspace, we have to export kernel information to procfs, sysfs, etc, > then parse them in kexec_tools, finally passed it to kernel part of > kexec loading. > > The gradual obsoleting means we may only add > feature/improvement/enhancement to kexec_file_load. And if a bug fix is > needed for both kexec_load and kexec_file_load, and the fix is very > complicated, we may only fix it in kexec_file_load too. Kexec_file_load > interface is suggested to add if does't have, just port user space part > to kernel as x86/s390/arm64 have done. > > Surely, it doesn't mean we don't fix the critical/blocker bug with > kexec_load loading. We still try to do, just are not so eager. In the > existing product environment, the kexec_load is used, just keep using > it. Do we bother to change it to kexec_file_load, e.g in our RHEL7 > distros? Certainly not. But in our new product, we will change to use > kexec_file_load interface. I guess this is similar with arm64. The > advantage and benefit have been told in the 2nd paragraph. > > > As for 32-bit ARM, is it like the old product, we have many in-use systems > deployed in customers' laboratory? Wondering if ARM continues designing > new 32-bit ARM cpu, and some companies continue producing tons of 32-bit ARM > cpus. If yes, I think we need continue taking care of kexec_load if > 32-bit ARM can't convert to kexec_file_load. If not, it may be not a > barrier when we consider converting kexec_load to kexec_file_load in > other ARCHes. We just need keep using it, try to fix those critical/blocker > bug in kexec_load interface if encountered. > > Finally, comning back to this patchset itself, the issue James spotted > is not so ciritical, I would say. 
When I do kexec jumping, I will do > loading firstly, then trigge jumping. I can think of the case that > people may load kexec-ed kernel, then do something else, later she/he > triggers the kexec jumping. These are not necessary steps. As Dave and I > replied to James in the cover-letter thread, adding a systemd service of > kexec loading, monitor hotplug uevent, reload it if any hot remove > happened. This is quite easy to do, I don't see any problem with it, and > why we don't do like this. > > My personal opinion, please tell if I miss anything. All that opinion and hand waving about the benefits of the new interface is totally irrelevent for 32-bit ARM for the reasons I stated in my email to which you replied. Gradual obsolecence or not, the file interface can't be supported on 32-bit ARM as-is - it is totally inadequate and inferior as an API compared to the functionality we have with plain kexec_load. Without that point addressed, kexec_file_load is meaningless for 32-bit ARM. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
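The interface gap described above can be modelled in a few lines. This is a sketch under stated assumptions, not kernel or kexec-tools code: `Segment` mirrors the fields of `struct kexec_segment` (plus an illustrative `name` label that the real struct does not have), and `FILE_LOAD_INPUTS` lists the only inputs the `kexec_file_load(kernel_fd, initrd_fd, cmdline_len, cmdline, flags)` system call accepts from its caller. The helper `unsupported_by_file_load` is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Model of one kexec_load segment (buf/bufsz/mem/memsz in the real struct)."""
    name: str    # illustrative label only; not part of struct kexec_segment
    bufsz: int   # size of the user-space buffer
    mem: int     # physical destination address
    memsz: int   # size of the destination region


# kexec_file_load takes a kernel fd, an initrd fd and a command line; nothing else.
FILE_LOAD_INPUTS = {"kernel", "initrd", "cmdline"}


def unsupported_by_file_load(payloads: Iterable[str]) -> List[str]:
    """Return the user-supplied payloads that kexec_file_load has no argument for."""
    return sorted(set(payloads) - FILE_LOAD_INPUTS)


if __name__ == "__main__":
    # kexec_load accepts an arbitrary list of segments, so a user-chosen dtb is
    # simply one more entry next to the kernel and initrd.
    segments = [
        Segment("kernel", 0x800000, 0x60008000, 0x800000),
        Segment("initrd", 0x400000, 0x62000000, 0x400000),
        Segment("dtb", 0x10000, 0x63000000, 0x10000),
    ]
    print(unsupported_by_file_load(s.name for s in segments))
```

The addresses and sizes are made-up values for illustration. The point is only that a user-specified dtb fits naturally as a third segment for kexec_load, while the file-based call has no argument to carry it.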
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-12 8:08 ` Russell King - ARM Linux admin @ 2020-04-12 19:52 ` Eric W. Biederman 2020-04-12 20:37 ` Bhupesh SHARMA 2020-04-13 2:37 ` Baoquan He 0 siblings, 2 replies; 61+ messages in thread From: Eric W. Biederman @ 2020-04-12 19:52 UTC (permalink / raw) To: Russell King - ARM Linux admin Cc: Baoquan He, David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel

The only benefit of kexec_file_load is that it is simple enough from a kernel perspective that signatures can be checked.

kexec_load in every other respect is the more capable and functional interface. It makes no sense to get rid of it.

It does make sense to reload with a loaded kernel on memory hotplug. That is simple and easy. If we are going to handle something in the kernel it should simply be an automated unloading of the kernel on memory hotplug.

I think it would be irresponsible to deprecate kexec_load on any platform.

I also suspect that kexec_file_load could be taught to copy the dtb on arm32 if someone wants to deal with signatures.

We definitely can not even think of deprecating kexec_load until every architecture that supports it also supports kexec_file_load and everyone is happy with that interface. That is Linus's no regression rule.

Eric
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-12 19:52 ` Eric W. Biederman @ 2020-04-12 20:37 ` Bhupesh SHARMA 2020-04-13 2:37 ` Baoquan He 1 sibling, 0 replies; 61+ messages in thread From: Bhupesh SHARMA @ 2020-04-12 20:37 UTC (permalink / raw) To: Eric W. Biederman Cc: Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On Mon, Apr 13, 2020 at 1:26 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > The only benefit of kexec_file_load is that it is simple enough from a > kernel perspective that signatures can be checked. > > kexec_load in every other respect is the more capable and functional > interface. It makes no sense to get rid of it. > > It does make sense to reload with a loaded kernel on memory hotplug. > That is simple and easy. If we are going to handle something in the > kernel it should simple an automated unloading of the kernel on memory > hotplug. > > > I think it would be irresponsible to deprecate kexec_load on any > platform. > > I also suspect that kexec_file_load could be taught to copy the dtb > on arm32 if someone wants to deal with signatures. > > We definitely can not even think of deprecating kexec_load until > architecture that supports it also supports kexec_file_load and everyone > is happy with that interface. That is Linus's no regression rule. TBH, I have seen several active users of kexec_load on arm32 environments and we have been trying to help them with kexec issues on arm32 in recent past as well. So, I agree with Eric's view that probably deprecating this in favour of kexec_file_load will break these existing environment. I tried to do some work at the start of this year to add kexec_file_load support for arm32 in my spare cycles, but I gave up as the arm32 hardware had a broken firmware and couldn't boot latest upstream kernel. 
Maybe I will try to find some spare cycles in the coming days to do it. But I think since kexec_load is an important interface on these arm32 boards for supporting existing kexec-based bootloaders, we should continue supporting it until kexec_file_load is supported and mature enough for arm32.

Thanks,
Bhupesh
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-12 19:52 ` Eric W. Biederman 2020-04-12 20:37 ` Bhupesh SHARMA @ 2020-04-13 2:37 ` Baoquan He 2020-04-13 13:15 ` Eric W. Biederman 1 sibling, 1 reply; 61+ messages in thread From: Baoquan He @ 2020-04-13 2:37 UTC (permalink / raw) To: Eric W. Biederman Cc: David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > > The only benefit of kexec_file_load is that it is simple enough from a > kernel perspective that signatures can be checked. We don't have this restriction any more with below commit: commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE") With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both secure boot or legacy system for kexec/kdump. Being simple enough is enough to astract and convince us to use it instead. And kexec_file_load has been in use for several years on systems with secure boot, since added in 2014, on x86_64. > > kexec_load in every other respect is the more capable and functional > interface. It makes no sense to get rid of it. > > It does make sense to reload with a loaded kernel on memory hotplug. > That is simple and easy. If we are going to handle something in the > kernel it should simple an automated unloading of the kernel on memory > hotplug. > > > I think it would be irresponsible to deprecate kexec_load on any > platform. > > I also suspect that kexec_file_load could be taught to copy the dtb > on arm32 if someone wants to deal with signatures. > > We definitely can not even think of deprecating kexec_load until > architecture that supports it also supports kexec_file_load and everyone > is happy with that interface. That is Linus's no regression rule. 
I should have picked a milder word than 'obsolete' to express our tendency and describe our plan. Even though I added 'gradually', it seems that doesn't help much. I didn't mean to say 'deprecate' at all when I replied.

The situation and trend as I understand them for kexec_load and kexec_file_load are:

1) ARCHes which don't have kexec_file_load yet are suggested to add support for it, just as x86_64, arm64 and s390 have done;

2) kexec_file_load is the suggested interface to use, and takes precedence over kexec_load in the future, if both are supported in one ARCH.

3) kexec_load is kept in use by ARCHes w/o kexec_file_load support, and for backward compatibility by ARCHes w/ kexec_file_load support.

For 1) and 2), I think the reason is obvious: as Eric said, kexec_file_load is simple enough. And currently, whenever we get a bug report, we may need to fix it twice, for kexec_load and for kexec_file_load. If kexec_file_load is made the default, e.g. on x86_64, we will fix it in kernel space only, for kexec_file_load. This is what I meant by 'obsolete gradually'. I think arm64 and s390 will do this too. The exception is a critical/blocker bug in kexec_load that breaks the old kexec_load interface in old products.

For 3), people can still use kexec_load and develop/fix for it, if kexec_file_load is not supported. But 32-bit arm should be a different case, more like i386: we will leave it as is, and fix anything which could break it. But do people really expect improvements or new features for it? E.g. in this patchset, for the mem hotplug issue James raised, I assume James is focusing on arm64 and x86_64, but not 32-bit arm. As DavidH commented in another reply, people don't even agree to continue supporting memory hotplug on 32-bit systems. We once made an effort to fix a memory hotplug bug on i386 with a patch, but people would rather mark it as BROKEN.
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-13 2:37 ` Baoquan He @ 2020-04-13 13:15 ` Eric W. Biederman 2020-04-13 23:01 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Eric W. Biederman @ 2020-04-13 13:15 UTC (permalink / raw) To: Baoquan He Cc: David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel Baoquan He <bhe@redhat.com> writes: > On 04/12/20 at 02:52pm, Eric W. Biederman wrote: >> >> The only benefit of kexec_file_load is that it is simple enough from a >> kernel perspective that signatures can be checked. > > We don't have this restriction any more with below commit: > > commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG > and KEXEC_SIG_FORCE") > > With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both > secure boot or legacy system for kexec/kdump. Being simple enough is > enough to astract and convince us to use it instead. And kexec_file_load > has been in use for several years on systems with secure boot, since > added in 2014, on x86_64. No. Actaully kexec_file_load is the less capable interface, and less flexible interface. Which is why it is appropriate for signature verification. >> kexec_load in every other respect is the more capable and functional >> interface. It makes no sense to get rid of it. >> >> It does make sense to reload with a loaded kernel on memory hotplug. >> That is simple and easy. If we are going to handle something in the >> kernel it should simple an automated unloading of the kernel on memory >> hotplug. >> >> >> I think it would be irresponsible to deprecate kexec_load on any >> platform. >> >> I also suspect that kexec_file_load could be taught to copy the dtb >> on arm32 if someone wants to deal with signatures. 
>> >> We definitely can not even think of deprecating kexec_load until >> architecture that supports it also supports kexec_file_load and everyone >> is happy with that interface. That is Linus's no regression rule. > > I should pick a milder word to express our tendency and tell our plan > then 'obsolete'. Even though I added 'gradually', seems it doesn't help > much. I didn't mean to say 'deprecate' at all when replied. > > The situation and trend I understand about kexec_load and kexec_file_load > are: > > 1) Supporting kexec_file_load is suggested to add in ARCHes which don't > have yet, just as x86_64, arm64 and s390 have done; > > 2) kexec_file_load is suggested to use, and take precedence over > kexec_load in the future, if both are supported in one ARCH. The deep problem is that kexec_file_load is distinctly less expressive than kexec_load. > 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, > and by ARCHes for back compatibility w/ kexec_file_load support. > > For 1) and 2), I think the reason is obvious as Eric said, > kexec_file_load is simple enough. And currently, whenever we got a bug > report, we may need fix them twice, for kexec_load and kexec_file_load. > If kexec_file_load is made by default, e.g on x86_64, we will change it > in kernel space only, for kexec_file_load. This is what I meant about > 'obsolete gradually'. I think for arm64, s390, they will do these too. > Unless there's some critical/blocker bug in kexec_load, to corrupt the > old kexec_load interface in old product. Maybe. The code that kexec_file_load sucked into the kernel is quite stable and rarely needs changes except during a port of kexec to another architecture. Last I looked the real maintenance effor of kexec and kexec on panic was in the drivers. So I don't think we can use maintenance to do anything. > For 3), people can still use kexec_load and develop/fix for it, if no > kexec_file_load supported. 
But 32-bit arm should be a different one, > more like i386, we will leave it as is, and fix anything which could > break it. But people really expects to improve or add feature to it? E.g > in this patchset, the mem hotplug issue James raised, I assume James is > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > another reply, people even don't agree to continue supporting memory > hotplug on 32-bit system. We ever took effort to fix a memory hotplug > bug on i386 with a patch, but people would rather set it as BROKEN. For memory hotplug just reload. Userspace already gets good events. We should not expect anything except a panic kernel to be loaded over a memory hotplug event. The kexec on panic code should actually be loaded in a location that we don't reliquish if asked for it. Quite frankly at this point I would love to see the signature fad die, which would allow us to remove kexec_file_load. I still have not seen the signature code used anywhere except by people anticipating trouble. Given that Microsoft has already directly signed a malicous bootloader. (Not in the Linux ecosystem). I don't even know if any of the reasons for having kexec_file_load are legtimate. If someone wants to do the work and ensure everything that is possible to load with kexec_load is possible to load with kexec_file_load. Kernels supporting the multi-boot protocol etc. Then we can consider deprecating kexec_load. I think it took me about 15 years to remove the sysctl system call and it only ever had about 10 users. If you want to go through that kind of work to make certain there are no more users and that everything they could do with the old interface is doable with the new interface then please be my guest. Until then we need to fully support kexec_load. Eric _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-13 13:15 ` Eric W. Biederman @ 2020-04-13 23:01 ` Andrew Morton 2020-04-14 6:13 ` Eric W. Biederman 2020-04-14 6:40 ` Baoquan He 2020-04-14 9:16 ` Dave Young 2 siblings, 1 reply; 61+ messages in thread From: Andrew Morton @ 2020-04-13 23:01 UTC (permalink / raw) To: Eric W. Biederman Cc: Baoquan He, David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Will Deacon, linux-arm-kernel On Mon, 13 Apr 2020 08:15:23 -0500 ebiederm@xmission.com (Eric W. Biederman) wrote: > > For 3), people can still use kexec_load and develop/fix for it, if no > > kexec_file_load supported. But 32-bit arm should be a different one, > > more like i386, we will leave it as is, and fix anything which could > > break it. But people really expects to improve or add feature to it? E.g > > in this patchset, the mem hotplug issue James raised, I assume James is > > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > > another reply, people even don't agree to continue supporting memory > > hotplug on 32-bit system. We ever took effort to fix a memory hotplug > > bug on i386 with a patch, but people would rather set it as BROKEN. > > For memory hotplug just reload. Userspace already gets good events. > > We should not expect anything except a panic kernel to be loaded over a > memory hotplug event. The kexec on panic code should actually be loaded > in a location that we don't reliquish if asked for it. Is that a nack for James's patchset? _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-13 23:01 ` Andrew Morton @ 2020-04-14 6:13 ` Eric W. Biederman 0 siblings, 0 replies; 61+ messages in thread From: Eric W. Biederman @ 2020-04-14 6:13 UTC (permalink / raw) To: Andrew Morton Cc: Baoquan He, David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Will Deacon, linux-arm-kernel Andrew Morton <akpm@linux-foundation.org> writes: > On Mon, 13 Apr 2020 08:15:23 -0500 ebiederm@xmission.com (Eric W. Biederman) wrote: > >> > For 3), people can still use kexec_load and develop/fix for it, if no >> > kexec_file_load supported. But 32-bit arm should be a different one, >> > more like i386, we will leave it as is, and fix anything which could >> > break it. But people really expects to improve or add feature to it? E.g >> > in this patchset, the mem hotplug issue James raised, I assume James is >> > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in >> > another reply, people even don't agree to continue supporting memory >> > hotplug on 32-bit system. We ever took effort to fix a memory hotplug >> > bug on i386 with a patch, but people would rather set it as BROKEN. >> >> For memory hotplug just reload. Userspace already gets good events. >> >> We should not expect anything except a panic kernel to be loaded over a >> memory hotplug event. The kexec on panic code should actually be loaded >> in a location that we don't reliquish if asked for it. > > Is that a nack for James's patchset? I have just read the end of the thread and I have the sense that the patchset had already been rejected. I will see if I can go back and read the beginning. I was mostly reacting to the idea that you could stop maintaining an interface that people are actively using because there is a newer interface. 
Eric
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-13 13:15 ` Eric W. Biederman 2020-04-13 23:01 ` Andrew Morton @ 2020-04-14 6:40 ` Baoquan He 2020-04-14 6:51 ` Baoquan He 2020-04-14 8:00 ` David Hildenbrand 2020-04-14 9:16 ` Dave Young 2 siblings, 2 replies; 61+ messages in thread From: Baoquan He @ 2020-04-14 6:40 UTC (permalink / raw) To: Eric W. Biederman Cc: David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On 04/13/20 at 08:15am, Eric W. Biederman wrote: > Baoquan He <bhe@redhat.com> writes: > > > On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > >> > >> The only benefit of kexec_file_load is that it is simple enough from a > >> kernel perspective that signatures can be checked. > > > > We don't have this restriction any more with below commit: > > > > commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG > > and KEXEC_SIG_FORCE") > > > > With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both > > secure boot or legacy system for kexec/kdump. Being simple enough is > > enough to astract and convince us to use it instead. And kexec_file_load > > has been in use for several years on systems with secure boot, since > > added in 2014, on x86_64. > > No. Actaully kexec_file_load is the less capable interface, and less > flexible interface. Which is why it is appropriate for signature > verification. Well, everyone has a stance and the corresponding view. You could have wider view from long time maintenance and in upstrem position, and think kexec_file_load is horrible. But I can only see from our work as a front line engineer to maintain/develop kexec/kdump in RHEL, and think kexec_file_load is easier to maintain. Surely except of multiple kernel image format support. No matter it is kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. 
This is produced from kerel building by default. We have no way to support it in our distros and add it into kexec_file_load. [RFC PATCH] x86/boot: make ELF kernel multiboot-able https://lkml.org/lkml/2017/2/15/654 > > >> kexec_load in every other respect is the more capable and functional > >> interface. It makes no sense to get rid of it. > >> > >> It does make sense to reload with a loaded kernel on memory hotplug. > >> That is simple and easy. If we are going to handle something in the > >> kernel it should simple an automated unloading of the kernel on memory > >> hotplug. > >> > >> > >> I think it would be irresponsible to deprecate kexec_load on any > >> platform. > >> > >> I also suspect that kexec_file_load could be taught to copy the dtb > >> on arm32 if someone wants to deal with signatures. > >> > >> We definitely can not even think of deprecating kexec_load until > >> architecture that supports it also supports kexec_file_load and everyone > >> is happy with that interface. That is Linus's no regression rule. > > > > I should pick a milder word to express our tendency and tell our plan > > then 'obsolete'. Even though I added 'gradually', seems it doesn't help > > much. I didn't mean to say 'deprecate' at all when replied. > > > > The situation and trend I understand about kexec_load and kexec_file_load > > are: > > > > 1) Supporting kexec_file_load is suggested to add in ARCHes which don't > > have yet, just as x86_64, arm64 and s390 have done; > > > > 2) kexec_file_load is suggested to use, and take precedence over > > kexec_load in the future, if both are supported in one ARCH. > > The deep problem is that kexec_file_load is distinctly less expressive > than kexec_load. > > > 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, > > and by ARCHes for back compatibility w/ kexec_file_load support. > > > > For 1) and 2), I think the reason is obvious as Eric said, > > kexec_file_load is simple enough. 
And currently, whenever we got a bug > > report, we may need fix them twice, for kexec_load and kexec_file_load. > > If kexec_file_load is made by default, e.g on x86_64, we will change it > > in kernel space only, for kexec_file_load. This is what I meant about > > 'obsolete gradually'. I think for arm64, s390, they will do these too. > > Unless there's some critical/blocker bug in kexec_load, to corrupt the > > old kexec_load interface in old product. > > Maybe. The code that kexec_file_load sucked into the kernel is quite > stable and rarely needs changes except during a port of kexec to > another architecture. > > Last I looked the real maintenance effor of kexec and kexec on panic was > in the drivers. So I don't think we can use maintenance to do anything. Not sure if I got it. But if check Lianbo's patches, a lot of effort has been taken to make SEV work well on kexec_file_load. And we have switched to use kexec_file_load in the newly published Fedora release on x86_64 by default. Before this, Lianbo has investigated and done many experiments to make sure the switching is safe. We finally made this decision. Next we will do the switch in Enterprise distros. Once these are proved safe, we will suggest customers to use kexec_file_load for kexec rebooting too. In the future, we will only care about kexec_file_load if everying is going well. But as I have explained repeatedly, only caring about kexec_file_load means we will leave kexec_load as is, we will not add new feature or improvement patches for it. commit 6a20bd54473e11011bf2b47efb52d0759d412854 Author: Lianbo Jiang <lijiang@redhat.com> Date: Thu Jan 16 13:47:35 2020 +0800 kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default > > > For 3), people can still use kexec_load and develop/fix for it, if no > > kexec_file_load supported. But 32-bit arm should be a different one, > > more like i386, we will leave it as is, and fix anything which could > > break it. 
But people really expects to improve or add feature to it? E.g > > in this patchset, the mem hotplug issue James raised, I assume James is > > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > > another reply, people even don't agree to continue supporting memory > > hotplug on 32-bit system. We ever took effort to fix a memory hotplug > > bug on i386 with a patch, but people would rather set it as BROKEN. > > For memory hotplug just reload. Userspace already gets good events. Kexec_file_load is easy to maintain. This is an example. Lock the hotplug area where kexed-ed kernel is targeted in this patchset, it's obviously not right. We can't disable memory hotplug just because kexec-ed kernel is loaded ahead of time. Reloading is also not a good fix. Kexec-ed kernel is targeted at a movable area, reloading can avoid kexec rebooting corruption if that area is hot removed. But if that area is not removed, locating kernel into the hotpluggable area will change the area into ummovable zone. Unless we decide to not support memory hotplug in kexec-ed kernel, I guess it's very hard. Now in our distros kexec rebooting has been supported, the big cloud providers are deploying linux in guest, bugs on kexec reboot failure has been reported. They need the memory hotplug to increase/decrease memory. The root cause is kexec-ed kernel is targeted at hotpluggable memory region. Just avoiding the movable area can fix it. In kexec_file_load(), just checking or picking those unmovable region to put kernel/initrd in function locate_mem_hole_callback() can fix it. The page or pageblock's zone is movable or not, it's easy to know. This fix doesn't need to bother other component. > > We should not expect anything except a panic kernel to be loaded over a > memory hotplug event. The kexec on panic code should actually be loaded > in a location that we don't reliquish if asked for it. 
> > Quite frankly at this point I would love to see the signature fad die, > which would allow us to remove kexec_file_load. I still have not seen > the signature code used anywhere except by people anticipating trouble. > > Given that Microsoft has already directly signed a malicous bootloader. > (Not in the Linux ecosystem). I don't even know if any of the reasons > for having kexec_file_load are legtimate. > > > If someone wants to do the work and ensure everything that is possible > to load with kexec_load is possible to load with kexec_file_load. > Kernels supporting the multi-boot protocol etc. Then we can consider > deprecating kexec_load. > > > I think it took me about 15 years to remove the sysctl system call and > it only ever had about 10 users. If you want to go through that kind of > work to make certain there are no more users and that everything they > could do with the old interface is doable with the new interface then > please be my guest. Until then we need to fully support kexec_load. I want to clarify again, we have no plan to deprecate kexec_load. We just plan to use kexec_file_load more in our distros, for both legacy system or system with secure boot. Eric, I am glad to see you told your opinion about kexec_file_load. Without the discussion in this thread, we may not know it. So I have one question, seems kexec_file_load will continue existing, the ARCHes our distros is supporting, x86_64, s390, ppc, arm64, all have kexec_file_load, do you object us to continue using kexec_file_load, for signature verification and normal kexec/kdump booting? Or you plan to deprecate kexec_file_load? _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
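The placement idea proposed in this message, putting the kernel/initrd only in memory that is not in a movable zone, can be sketched as a simple first-fit search. This is a user-space model under stated assumptions, not the kernel's `locate_mem_hole_callback()`: the `Region` type, the `find_unmovable_hole` helper and the 2 MiB default alignment are all hypothetical, and the sketch ignores ranges already occupied by other segments.

```python
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    start: int      # first byte of the region
    end: int        # one past the last byte of the region
    movable: bool   # True if the region sits in a movable (hot-removable) zone


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def find_unmovable_hole(regions: Sequence[Region], size: int,
                        alignment: int = 0x200000) -> Optional[int]:
    """Return the lowest aligned base in a non-movable region that fits size bytes."""
    for region in sorted(regions):
        if region.movable:
            continue  # never place the image where memory can be hot-removed
        base = align_up(region.start, alignment)
        if base + size <= region.end:
            return base
    return None


if __name__ == "__main__":
    layout = [
        Region(0x0, 0x100000, False),              # too small for the image
        Region(0x100000000, 0x180000000, True),    # hot-removable, skipped
        Region(0x40000000, 0x80000000, False),     # large enough and not movable
    ]
    print(hex(find_unmovable_hole(layout, 0x4000000)))
```

If every sufficiently large region is movable, the search returns `None`, which corresponds to refusing the load rather than placing the image in hot-removable memory.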
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 6:40 ` Baoquan He @ 2020-04-14 6:51 ` Baoquan He 2020-04-14 8:00 ` David Hildenbrand 1 sibling, 0 replies; 61+ messages in thread From: Baoquan He @ 2020-04-14 6:51 UTC (permalink / raw) To: Eric W. Biederman Cc: David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On 04/14/20 at 02:40pm, Baoquan He wrote: > On 04/13/20 at 08:15am, Eric W. Biederman wrote: > > Baoquan He <bhe@redhat.com> writes: > > > > > On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > > >> > > >> The only benefit of kexec_file_load is that it is simple enough from a > > >> kernel perspective that signatures can be checked. > > > > > > We don't have this restriction any more with below commit: > > > > > > commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG > > > and KEXEC_SIG_FORCE") > > > > > > With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both > > > secure boot or legacy system for kexec/kdump. Being simple enough is > > > enough to astract and convince us to use it instead. And kexec_file_load > > > has been in use for several years on systems with secure boot, since > > > added in 2014, on x86_64. > > > > No. Actaully kexec_file_load is the less capable interface, and less > > flexible interface. Which is why it is appropriate for signature > > verification. > > Well, everyone has a stance and the corresponding view. You could have > wider view from long time maintenance and in upstrem position, and think > kexec_file_load is horrible. But I can only see from our work as a front > line engineer to maintain/develop kexec/kdump in RHEL, and think > kexec_file_load is easier to maintain. > > Surely except of multiple kernel image format support. No matter it is > kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. 
> This is produced from kerel building by default. We have no way to > support it in our distros and add it into kexec_file_load. > > [RFC PATCH] x86/boot: make ELF kernel multiboot-able > https://lkml.org/lkml/2017/2/15/654 > > > > > >> kexec_load in every other respect is the more capable and functional > > >> interface. It makes no sense to get rid of it. > > >> > > >> It does make sense to reload with a loaded kernel on memory hotplug. > > >> That is simple and easy. If we are going to handle something in the > > >> kernel it should simple an automated unloading of the kernel on memory > > >> hotplug. > > >> > > >> > > >> I think it would be irresponsible to deprecate kexec_load on any > > >> platform. > > >> > > >> I also suspect that kexec_file_load could be taught to copy the dtb > > >> on arm32 if someone wants to deal with signatures. > > >> > > >> We definitely can not even think of deprecating kexec_load until > > >> architecture that supports it also supports kexec_file_load and everyone > > >> is happy with that interface. That is Linus's no regression rule. > > > > > > I should pick a milder word to express our tendency and tell our plan > > > then 'obsolete'. Even though I added 'gradually', seems it doesn't help > > > much. I didn't mean to say 'deprecate' at all when replied. > > > > > > The situation and trend I understand about kexec_load and kexec_file_load > > > are: > > > > > > 1) Supporting kexec_file_load is suggested to add in ARCHes which don't > > > have yet, just as x86_64, arm64 and s390 have done; > > > > > > 2) kexec_file_load is suggested to use, and take precedence over > > > kexec_load in the future, if both are supported in one ARCH. > > > > The deep problem is that kexec_file_load is distinctly less expressive > > than kexec_load. > > > > > 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, > > > and by ARCHes for back compatibility w/ kexec_file_load support. 
> > > > > > For 1) and 2), I think the reason is obvious as Eric said, > > > kexec_file_load is simple enough. And currently, whenever we got a bug > > > report, we may need fix them twice, for kexec_load and kexec_file_load. > > > If kexec_file_load is made by default, e.g on x86_64, we will change it > > > in kernel space only, for kexec_file_load. This is what I meant about > > > 'obsolete gradually'. I think for arm64, s390, they will do these too. > > > Unless there's some critical/blocker bug in kexec_load, to corrupt the > > > old kexec_load interface in old product. > > > > Maybe. The code that kexec_file_load sucked into the kernel is quite > > stable and rarely needs changes except during a port of kexec to > > another architecture. > > > > Last I looked the real maintenance effor of kexec and kexec on panic was > > in the drivers. So I don't think we can use maintenance to do anything. > > Not sure if I got it. But if check Lianbo's patches, a lot of effort has > been taken to make SEV work well on kexec_file_load. And we have > switched to use kexec_file_load in the newly published Fedora release > on x86_64 by default. Before this, Lianbo has investigated and done many > experiments to make sure the switching is safe. We finally made this > decision. Next we will do the switch in Enterprise distros. Once these > are proved safe, we will suggest customers to use kexec_file_load for > kexec rebooting too. In the future, we will only care about > kexec_file_load if everying is going well. But as I have explained > repeatedly, only caring about kexec_file_load means we will leave > kexec_load as is, we will not add new feature or improvement patches > for it. 
> > commit 6a20bd54473e11011bf2b47efb52d0759d412854 > Author: Lianbo Jiang <lijiang@redhat.com> > Date: Thu Jan 16 13:47:35 2020 +0800 > > kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default > > > > > > For 3), people can still use kexec_load and develop/fix for it, if no > > > kexec_file_load supported. But 32-bit arm should be a different one, > > > more like i386, we will leave it as is, and fix anything which could > > > break it. But people really expects to improve or add feature to it? E.g > > > in this patchset, the mem hotplug issue James raised, I assume James is > > > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > > > another reply, people even don't agree to continue supporting memory > > > hotplug on 32-bit system. We ever took effort to fix a memory hotplug > > > bug on i386 with a patch, but people would rather set it as BROKEN. > > > > For memory hotplug just reload. Userspace already gets good events. > > Kexec_file_load is easy to maintain. This is an example. > > Lock the hotplug area where kexed-ed kernel is targeted in this patchset, > it's obviously not right. We can't disable memory hotplug just because > kexec-ed kernel is loaded ahead of time. > > Reloading is also not a good fix. Kexec-ed kernel is targeted at a > movable area, reloading can avoid kexec rebooting corruption if that > area is hot removed. But if that area is not removed, locating kernel > into the hotpluggable area will change the area into ummovable zone.

Here I mean that if the kexec kernel is targeted at a hotpluggable memory region, then after kexec rebooting that region will become unmovable. People can't hot remove it in the kexec-ed kernel.

> Unless we decide to not support memory hotplug in kexec-ed kernel, I > guess it's very hard. Now in our distros kexec rebooting has been > supported, the big cloud providers are deploying linux in guest, bugs on > kexec reboot failure has been reported.
They need the memory hotplug to > increase/decrease memory. > > The root cause is kexec-ed kernel is targeted at hotpluggable memory > region. Just avoiding the movable area can fix it. In kexec_file_load(), > just checking or picking those unmovable region to put kernel/initrd in > function locate_mem_hole_callback() can fix it. The page or pageblock's > zone is movable or not, it's easy to know. This fix doesn't need to > bother other component. > > > > > We should not expect anything except a panic kernel to be loaded over a > > memory hotplug event. The kexec on panic code should actually be loaded > > in a location that we don't reliquish if asked for it. > > > > Quite frankly at this point I would love to see the signature fad die, > > which would allow us to remove kexec_file_load. I still have not seen > > the signature code used anywhere except by people anticipating trouble. > > > > Given that Microsoft has already directly signed a malicous bootloader. > > (Not in the Linux ecosystem). I don't even know if any of the reasons > > for having kexec_file_load are legtimate. > > > > > > If someone wants to do the work and ensure everything that is possible > > to load with kexec_load is possible to load with kexec_file_load. > > Kernels supporting the multi-boot protocol etc. Then we can consider > > deprecating kexec_load. > > > > > > I think it took me about 15 years to remove the sysctl system call and > > it only ever had about 10 users. If you want to go through that kind of > > work to make certain there are no more users and that everything they > > could do with the old interface is doable with the new interface then > > please be my guest. Until then we need to fully support kexec_load. > > I want to clarify again, we have no plan to deprecate kexec_load. > We just plan to use kexec_file_load more in our distros, for both legacy > system or system with secure boot. > > Eric, I am glad to see you told your opinion about kexec_file_load. 
> Without the discussion in this thread, we may not know it. So I have one > question, seems kexec_file_load will continue existing, the ARCHes our > distros is supporting, x86_64, s390, ppc, arm64, all have kexec_file_load, > do you object us to continue using kexec_file_load, for signature > verification and normal kexec/kdump booting? Or you plan to deprecate > kexec_file_load?
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 6:40 ` Baoquan He 2020-04-14 6:51 ` Baoquan He @ 2020-04-14 8:00 ` David Hildenbrand 2020-04-14 9:22 ` Baoquan He 1 sibling, 1 reply; 61+ messages in thread From: David Hildenbrand @ 2020-04-14 8:00 UTC (permalink / raw) To: Baoquan He, Eric W. Biederman Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On 14.04.20 08:40, Baoquan He wrote: > On 04/13/20 at 08:15am, Eric W. Biederman wrote: >> Baoquan He <bhe@redhat.com> writes: >> >>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: >>>> >>>> The only benefit of kexec_file_load is that it is simple enough from a >>>> kernel perspective that signatures can be checked. >>> >>> We don't have this restriction any more with below commit: >>> >>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG >>> and KEXEC_SIG_FORCE") >>> >>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both >>> secure boot or legacy system for kexec/kdump. Being simple enough is >>> enough to astract and convince us to use it instead. And kexec_file_load >>> has been in use for several years on systems with secure boot, since >>> added in 2014, on x86_64. >> >> No. Actaully kexec_file_load is the less capable interface, and less >> flexible interface. Which is why it is appropriate for signature >> verification. > > Well, everyone has a stance and the corresponding view. You could have > wider view from long time maintenance and in upstrem position, and think > kexec_file_load is horrible. But I can only see from our work as a front > line engineer to maintain/develop kexec/kdump in RHEL, and think > kexec_file_load is easier to maintain. > > Surely except of multiple kernel image format support. No matter it is > kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. 
> This is produced from kerel building by default. We have no way to > support it in our distros and add it into kexec_file_load. > > [RFC PATCH] x86/boot: make ELF kernel multiboot-able > https://lkml.org/lkml/2017/2/15/654 > >> >>>> kexec_load in every other respect is the more capable and functional >>>> interface. It makes no sense to get rid of it. >>>> >>>> It does make sense to reload with a loaded kernel on memory hotplug. >>>> That is simple and easy. If we are going to handle something in the >>>> kernel it should simple an automated unloading of the kernel on memory >>>> hotplug. >>>> >>>> >>>> I think it would be irresponsible to deprecate kexec_load on any >>>> platform. >>>> >>>> I also suspect that kexec_file_load could be taught to copy the dtb >>>> on arm32 if someone wants to deal with signatures. >>>> >>>> We definitely can not even think of deprecating kexec_load until >>>> architecture that supports it also supports kexec_file_load and everyone >>>> is happy with that interface. That is Linus's no regression rule. >>> >>> I should pick a milder word to express our tendency and tell our plan >>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help >>> much. I didn't mean to say 'deprecate' at all when replied. >>> >>> The situation and trend I understand about kexec_load and kexec_file_load >>> are: >>> >>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't >>> have yet, just as x86_64, arm64 and s390 have done; >>> >>> 2) kexec_file_load is suggested to use, and take precedence over >>> kexec_load in the future, if both are supported in one ARCH. >> >> The deep problem is that kexec_file_load is distinctly less expressive >> than kexec_load. >> >>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, >>> and by ARCHes for back compatibility w/ kexec_file_load support. >>> >>> For 1) and 2), I think the reason is obvious as Eric said, >>> kexec_file_load is simple enough. 
And currently, whenever we got a bug >>> report, we may need fix them twice, for kexec_load and kexec_file_load. >>> If kexec_file_load is made by default, e.g on x86_64, we will change it >>> in kernel space only, for kexec_file_load. This is what I meant about >>> 'obsolete gradually'. I think for arm64, s390, they will do these too. >>> Unless there's some critical/blocker bug in kexec_load, to corrupt the >>> old kexec_load interface in old product. >> >> Maybe. The code that kexec_file_load sucked into the kernel is quite >> stable and rarely needs changes except during a port of kexec to >> another architecture. >> >> Last I looked the real maintenance effor of kexec and kexec on panic was >> in the drivers. So I don't think we can use maintenance to do anything. > > Not sure if I got it. But if check Lianbo's patches, a lot of effort has > been taken to make SEV work well on kexec_file_load. And we have > switched to use kexec_file_load in the newly published Fedora release > on x86_64 by default. Before this, Lianbo has investigated and done many > experiments to make sure the switching is safe. We finally made this > decision. Next we will do the switch in Enterprise distros. Once these > are proved safe, we will suggest customers to use kexec_file_load for > kexec rebooting too. In the future, we will only care about > kexec_file_load if everying is going well. But as I have explained > repeatedly, only caring about kexec_file_load means we will leave > kexec_load as is, we will not add new feature or improvement patches > for it. > > commit 6a20bd54473e11011bf2b47efb52d0759d412854 > Author: Lianbo Jiang <lijiang@redhat.com> > Date: Thu Jan 16 13:47:35 2020 +0800 > > kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default > >> >>> For 3), people can still use kexec_load and develop/fix for it, if no >>> kexec_file_load supported. 
But 32-bit arm should be a different one, >>> more like i386, we will leave it as is, and fix anything which could >>> break it. But people really expects to improve or add feature to it? E.g >>> in this patchset, the mem hotplug issue James raised, I assume James is >>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in >>> another reply, people even don't agree to continue supporting memory >>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug >>> bug on i386 with a patch, but people would rather set it as BROKEN. >> >> For memory hotplug just reload. Userspace already gets good events. > > Kexec_file_load is easy to maintain. This is an example. > > Lock the hotplug area where kexed-ed kernel is targeted in this patchset, > it's obviously not right. We can't disable memory hotplug just because > kexec-ed kernel is loaded ahead of time. > > Reloading is also not a good fix. Kexec-ed kernel is targeted at a > movable area, reloading can avoid kexec rebooting corruption if that > area is hot removed. But if that area is not removed, locating kernel > into the hotpluggable area will change the area into ummovable zone. > Unless we decide to not support memory hotplug in kexec-ed kernel, I > guess it's very hard. Now in our distros kexec rebooting has been > supported, the big cloud providers are deploying linux in guest, bugs on > kexec reboot failure has been reported. They need the memory hotplug to > increase/decrease memory. > > The root cause is kexec-ed kernel is targeted at hotpluggable memory > region. Just avoiding the movable area can fix it. In kexec_file_load(), > just checking or picking those unmovable region to put kernel/initrd in > function locate_mem_hole_callback() can fix it. The page or pageblock's > zone is movable or not, it's easy to know. This fix doesn't need to > bother other component. I don't fully agree. 
E.g., just because memory is onlined to ZONE_NORMAL does not imply that it cannot get offlined and removed; this is heavily used on ppc64, with 16MB sections.

--
Thanks,

David / dhildenb
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 8:00 ` David Hildenbrand @ 2020-04-14 9:22 ` Baoquan He 2020-04-14 9:37 ` David Hildenbrand 0 siblings, 1 reply; 61+ messages in thread From: Baoquan He @ 2020-04-14 9:22 UTC (permalink / raw) To: David Hildenbrand Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 04/14/20 at 10:00am, David Hildenbrand wrote: > On 14.04.20 08:40, Baoquan He wrote: > > On 04/13/20 at 08:15am, Eric W. Biederman wrote: > >> Baoquan He <bhe@redhat.com> writes: > >> > >>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > >>>> > >>>> The only benefit of kexec_file_load is that it is simple enough from a > >>>> kernel perspective that signatures can be checked. > >>> > >>> We don't have this restriction any more with below commit: > >>> > >>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG > >>> and KEXEC_SIG_FORCE") > >>> > >>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both > >>> secure boot or legacy system for kexec/kdump. Being simple enough is > >>> enough to astract and convince us to use it instead. And kexec_file_load > >>> has been in use for several years on systems with secure boot, since > >>> added in 2014, on x86_64. > >> > >> No. Actaully kexec_file_load is the less capable interface, and less > >> flexible interface. Which is why it is appropriate for signature > >> verification. > > > > Well, everyone has a stance and the corresponding view. You could have > > wider view from long time maintenance and in upstrem position, and think > > kexec_file_load is horrible. But I can only see from our work as a front > > line engineer to maintain/develop kexec/kdump in RHEL, and think > > kexec_file_load is easier to maintain. > > > > Surely except of multiple kernel image format support. 
No matter it is > > kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. > > This is produced from kerel building by default. We have no way to > > support it in our distros and add it into kexec_file_load. > > > > [RFC PATCH] x86/boot: make ELF kernel multiboot-able > > https://lkml.org/lkml/2017/2/15/654 > > > >> > >>>> kexec_load in every other respect is the more capable and functional > >>>> interface. It makes no sense to get rid of it. > >>>> > >>>> It does make sense to reload with a loaded kernel on memory hotplug. > >>>> That is simple and easy. If we are going to handle something in the > >>>> kernel it should simple an automated unloading of the kernel on memory > >>>> hotplug. > >>>> > >>>> > >>>> I think it would be irresponsible to deprecate kexec_load on any > >>>> platform. > >>>> > >>>> I also suspect that kexec_file_load could be taught to copy the dtb > >>>> on arm32 if someone wants to deal with signatures. > >>>> > >>>> We definitely can not even think of deprecating kexec_load until > >>>> architecture that supports it also supports kexec_file_load and everyone > >>>> is happy with that interface. That is Linus's no regression rule. > >>> > >>> I should pick a milder word to express our tendency and tell our plan > >>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help > >>> much. I didn't mean to say 'deprecate' at all when replied. > >>> > >>> The situation and trend I understand about kexec_load and kexec_file_load > >>> are: > >>> > >>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't > >>> have yet, just as x86_64, arm64 and s390 have done; > >>> > >>> 2) kexec_file_load is suggested to use, and take precedence over > >>> kexec_load in the future, if both are supported in one ARCH. > >> > >> The deep problem is that kexec_file_load is distinctly less expressive > >> than kexec_load. 
> >> > >>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, > >>> and by ARCHes for back compatibility w/ kexec_file_load support. > >>> > >>> For 1) and 2), I think the reason is obvious as Eric said, > >>> kexec_file_load is simple enough. And currently, whenever we got a bug > >>> report, we may need fix them twice, for kexec_load and kexec_file_load. > >>> If kexec_file_load is made by default, e.g on x86_64, we will change it > >>> in kernel space only, for kexec_file_load. This is what I meant about > >>> 'obsolete gradually'. I think for arm64, s390, they will do these too. > >>> Unless there's some critical/blocker bug in kexec_load, to corrupt the > >>> old kexec_load interface in old product. > >> > >> Maybe. The code that kexec_file_load sucked into the kernel is quite > >> stable and rarely needs changes except during a port of kexec to > >> another architecture. > >> > >> Last I looked the real maintenance effor of kexec and kexec on panic was > >> in the drivers. So I don't think we can use maintenance to do anything. > > > > Not sure if I got it. But if check Lianbo's patches, a lot of effort has > > been taken to make SEV work well on kexec_file_load. And we have > > switched to use kexec_file_load in the newly published Fedora release > > on x86_64 by default. Before this, Lianbo has investigated and done many > > experiments to make sure the switching is safe. We finally made this > > decision. Next we will do the switch in Enterprise distros. Once these > > are proved safe, we will suggest customers to use kexec_file_load for > > kexec rebooting too. In the future, we will only care about > > kexec_file_load if everying is going well. But as I have explained > > repeatedly, only caring about kexec_file_load means we will leave > > kexec_load as is, we will not add new feature or improvement patches > > for it. 
> > > > commit 6a20bd54473e11011bf2b47efb52d0759d412854 > > Author: Lianbo Jiang <lijiang@redhat.com> > > Date: Thu Jan 16 13:47:35 2020 +0800 > > > > kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default > > > >> > >>> For 3), people can still use kexec_load and develop/fix for it, if no > >>> kexec_file_load supported. But 32-bit arm should be a different one, > >>> more like i386, we will leave it as is, and fix anything which could > >>> break it. But people really expects to improve or add feature to it? E.g > >>> in this patchset, the mem hotplug issue James raised, I assume James is > >>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > >>> another reply, people even don't agree to continue supporting memory > >>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug > >>> bug on i386 with a patch, but people would rather set it as BROKEN. > >> > >> For memory hotplug just reload. Userspace already gets good events. > > > > Kexec_file_load is easy to maintain. This is an example. > > > > Lock the hotplug area where kexed-ed kernel is targeted in this patchset, > > it's obviously not right. We can't disable memory hotplug just because > > kexec-ed kernel is loaded ahead of time. > > > > Reloading is also not a good fix. Kexec-ed kernel is targeted at a > > movable area, reloading can avoid kexec rebooting corruption if that > > area is hot removed. But if that area is not removed, locating kernel > > into the hotpluggable area will change the area into ummovable zone. > > Unless we decide to not support memory hotplug in kexec-ed kernel, I > > guess it's very hard. Now in our distros kexec rebooting has been > > supported, the big cloud providers are deploying linux in guest, bugs on > > kexec reboot failure has been reported. They need the memory hotplug to > > increase/decrease memory. > > > > The root cause is kexec-ed kernel is targeted at hotpluggable memory > > region. 
Just avoiding the movable area can fix it. In kexec_file_load(), > > just checking or picking those unmovable region to put kernel/initrd in > > function locate_mem_hole_callback() can fix it. The page or pageblock's > > zone is movable or not, it's easy to know. This fix doesn't need to > > bother other component. > > I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > does not imply that it cannot get offlined and removed e.g., this is > heavily used on ppc64, with 16MB sections.

Really? I only know there are two kinds of memory hotplug on ppc, but I don't know the details. So in this case, is there any flag or other way to know that those memory blocks are hotpluggable? I am curious how kernel data is kept out of this area. Or does ppc just freely use it for kernel data or user space data, then try to migrate it on hot remove?
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 9:22 ` Baoquan He @ 2020-04-14 9:37 ` David Hildenbrand 2020-04-14 14:39 ` Baoquan He 0 siblings, 1 reply; 61+ messages in thread From: David Hildenbrand @ 2020-04-14 9:37 UTC (permalink / raw) To: Baoquan He Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 14.04.20 11:22, Baoquan He wrote: > On 04/14/20 at 10:00am, David Hildenbrand wrote: >> On 14.04.20 08:40, Baoquan He wrote: >>> On 04/13/20 at 08:15am, Eric W. Biederman wrote: >>>> Baoquan He <bhe@redhat.com> writes: >>>> >>>>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: >>>>>> >>>>>> The only benefit of kexec_file_load is that it is simple enough from a >>>>>> kernel perspective that signatures can be checked. >>>>> >>>>> We don't have this restriction any more with below commit: >>>>> >>>>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG >>>>> and KEXEC_SIG_FORCE") >>>>> >>>>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both >>>>> secure boot or legacy system for kexec/kdump. Being simple enough is >>>>> enough to astract and convince us to use it instead. And kexec_file_load >>>>> has been in use for several years on systems with secure boot, since >>>>> added in 2014, on x86_64. >>>> >>>> No. Actaully kexec_file_load is the less capable interface, and less >>>> flexible interface. Which is why it is appropriate for signature >>>> verification. >>> >>> Well, everyone has a stance and the corresponding view. You could have >>> wider view from long time maintenance and in upstrem position, and think >>> kexec_file_load is horrible. But I can only see from our work as a front >>> line engineer to maintain/develop kexec/kdump in RHEL, and think >>> kexec_file_load is easier to maintain. 
>>> >>> Surely except of multiple kernel image format support. No matter it is >>> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. >>> This is produced from kerel building by default. We have no way to >>> support it in our distros and add it into kexec_file_load. >>> >>> [RFC PATCH] x86/boot: make ELF kernel multiboot-able >>> https://lkml.org/lkml/2017/2/15/654 >>> >>>> >>>>>> kexec_load in every other respect is the more capable and functional >>>>>> interface. It makes no sense to get rid of it. >>>>>> >>>>>> It does make sense to reload with a loaded kernel on memory hotplug. >>>>>> That is simple and easy. If we are going to handle something in the >>>>>> kernel it should simple an automated unloading of the kernel on memory >>>>>> hotplug. >>>>>> >>>>>> >>>>>> I think it would be irresponsible to deprecate kexec_load on any >>>>>> platform. >>>>>> >>>>>> I also suspect that kexec_file_load could be taught to copy the dtb >>>>>> on arm32 if someone wants to deal with signatures. >>>>>> >>>>>> We definitely can not even think of deprecating kexec_load until >>>>>> architecture that supports it also supports kexec_file_load and everyone >>>>>> is happy with that interface. That is Linus's no regression rule. >>>>> >>>>> I should pick a milder word to express our tendency and tell our plan >>>>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help >>>>> much. I didn't mean to say 'deprecate' at all when replied. >>>>> >>>>> The situation and trend I understand about kexec_load and kexec_file_load >>>>> are: >>>>> >>>>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't >>>>> have yet, just as x86_64, arm64 and s390 have done; >>>>> >>>>> 2) kexec_file_load is suggested to use, and take precedence over >>>>> kexec_load in the future, if both are supported in one ARCH. >>>> >>>> The deep problem is that kexec_file_load is distinctly less expressive >>>> than kexec_load. 
>>>> >>>>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, >>>>> and by ARCHes for back compatibility w/ kexec_file_load support. >>>>> >>>>> For 1) and 2), I think the reason is obvious as Eric said, >>>>> kexec_file_load is simple enough. And currently, whenever we got a bug >>>>> report, we may need fix them twice, for kexec_load and kexec_file_load. >>>>> If kexec_file_load is made by default, e.g on x86_64, we will change it >>>>> in kernel space only, for kexec_file_load. This is what I meant about >>>>> 'obsolete gradually'. I think for arm64, s390, they will do these too. >>>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the >>>>> old kexec_load interface in old product. >>>> >>>> Maybe. The code that kexec_file_load sucked into the kernel is quite >>>> stable and rarely needs changes except during a port of kexec to >>>> another architecture. >>>> >>>> Last I looked the real maintenance effor of kexec and kexec on panic was >>>> in the drivers. So I don't think we can use maintenance to do anything. >>> >>> Not sure if I got it. But if check Lianbo's patches, a lot of effort has >>> been taken to make SEV work well on kexec_file_load. And we have >>> switched to use kexec_file_load in the newly published Fedora release >>> on x86_64 by default. Before this, Lianbo has investigated and done many >>> experiments to make sure the switching is safe. We finally made this >>> decision. Next we will do the switch in Enterprise distros. Once these >>> are proved safe, we will suggest customers to use kexec_file_load for >>> kexec rebooting too. In the future, we will only care about >>> kexec_file_load if everying is going well. But as I have explained >>> repeatedly, only caring about kexec_file_load means we will leave >>> kexec_load as is, we will not add new feature or improvement patches >>> for it. 
>>> >>> commit 6a20bd54473e11011bf2b47efb52d0759d412854 >>> Author: Lianbo Jiang <lijiang@redhat.com> >>> Date: Thu Jan 16 13:47:35 2020 +0800 >>> >>> kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default >>> >>>> >>>>> For 3), people can still use kexec_load and develop/fix for it, if no >>>>> kexec_file_load supported. But 32-bit arm should be a different one, >>>>> more like i386, we will leave it as is, and fix anything which could >>>>> break it. But people really expects to improve or add feature to it? E.g >>>>> in this patchset, the mem hotplug issue James raised, I assume James is >>>>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in >>>>> another reply, people even don't agree to continue supporting memory >>>>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug >>>>> bug on i386 with a patch, but people would rather set it as BROKEN. >>>> >>>> For memory hotplug just reload. Userspace already gets good events. >>> >>> Kexec_file_load is easy to maintain. This is an example. >>> >>> Lock the hotplug area where kexed-ed kernel is targeted in this patchset, >>> it's obviously not right. We can't disable memory hotplug just because >>> kexec-ed kernel is loaded ahead of time. >>> >>> Reloading is also not a good fix. Kexec-ed kernel is targeted at a >>> movable area, reloading can avoid kexec rebooting corruption if that >>> area is hot removed. But if that area is not removed, locating kernel >>> into the hotpluggable area will change the area into ummovable zone. >>> Unless we decide to not support memory hotplug in kexec-ed kernel, I >>> guess it's very hard. Now in our distros kexec rebooting has been >>> supported, the big cloud providers are deploying linux in guest, bugs on >>> kexec reboot failure has been reported. They need the memory hotplug to >>> increase/decrease memory. >>> >>> The root cause is kexec-ed kernel is targeted at hotpluggable memory >>> region. 
Just avoiding the movable area can fix it. In kexec_file_load(),
>>> just checking or picking those unmovable region to put kernel/initrd in
>>> function locate_mem_hole_callback() can fix it. The page or pageblock's
>>> zone is movable or not, it's easy to know. This fix doesn't need to
>>> bother other component.
>>
>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL
>> does not imply that it cannot get offlined and removed e.g., this is
>> heavily used on ppc64, with 16MB sections.
>
> Really? I just know there are two kinds of mem hotplug in ppc, but don't
> know the details. So in this case, is there any flag or a way to know
> those memory block are hotpluggable? I am curious how those kernel data
> is avoided to be put in this area. Or ppc just freely uses it for kernel
> data or user space data, then try to migrate when hot remove?

See
arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count()

Under DLPAR, memory can be removed at LMB granularity, which is usually
16MB (== single section on ppc64). DLPAR will directly online all
hotplugged memory (LMBs) from the kernel using device_online(), and that
memory goes to ZONE_NORMAL.

When trying to remove memory, it simply scans for offlineable 16MB
memory blocks (== section == LMB), offlines and removes them. No need for
the movable zone and all the involved issues.

Now, the interesting question is, can we have LMBs added during boot
(not via add_memory()) that will later be removed via remove_memory()?
IIRC, we had BUGs related to that, so I think yes. If a section contains
no unmovable allocations (after boot), it can get removed.

-- 
Thanks,

David / dhildenb
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-14 9:37 ` David Hildenbrand
@ 2020-04-14 14:39 ` Baoquan He
  2020-04-14 14:49 ` David Hildenbrand
  0 siblings, 1 reply; 61+ messages in thread
From: Baoquan He @ 2020-04-14 14:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm,
	James Morse, Eric W. Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 04/14/20 at 11:37am, David Hildenbrand wrote:
> See
> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count()
>
> Under DLPAR, it can remove memory in LMB granularity, which is usually
> 16MB (== single section on ppc64). DLPAR will directly online all
> hotplugged memory (LMBs) from the kernel using device_online(), which
> will go to ZONE_NORMAL.
>
> When trying to remove memory, it simply scans for offlineable 16MB
> memory blocks (==section == LMB), offlines and removes them. No need for
> the movable zone and all the involved issues.

Yes, this is a different one, thanks for pointing it out. It sounds like
the balloon driver on virt platforms, doesn't it?

Avoiding putting the kexec kernel into the movable zone can't solve this
DLPAR case, as you said.

>
> Now, the interesting question is, can we have LMBs added during boot
> (not via add_memory()), that will later be removed via remove_memory().
> IIRC, we had BUGs related to that, so I think yes. If a section contains
> no unmovable allocations (after boot), it can get removed.

I do want to ask this question. If we can add LMBs into system RAM, then
reloading kexec can solve it.

A better way is to add a common function that filters out the movable
zone when searching for a position for the kexec kernel, and to use an
arch-specific function to filter out DLPAR memory blocks on ppc only.
There, we can simply use for_each_drmem_lmb() to do that.

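The filtering idea above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: struct lmb, struct
ram_range, overlaps_removable_lmb() and locate_hole() are hypothetical
names, not kernel interfaces, and real code would presumably walk the
LMBs with for_each_drmem_lmb() instead of an array. The ranges are
assumed to be sorted by ascending address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DLPAR LMB; not the kernel's drmem type. */
struct lmb {
	uint64_t base;
	uint64_t size;
	bool removable;
};

/* Hypothetical candidate RAM range [start, end). */
struct ram_range {
	uint64_t start;
	uint64_t end;
	bool movable_zone;	/* range is onlined to the movable zone */
};

/* True if [start, end) intersects any LMB marked removable. */
static bool overlaps_removable_lmb(const struct lmb *lmbs, size_t n,
				   uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++) {
		const struct lmb *l = &lmbs[i];

		if (l->removable && start < l->base + l->size && l->base < end)
			return true;
	}
	return false;
}

/*
 * Search the ranges from the highest one downwards for an aligned hole
 * of 'size' bytes that is neither in a movable range nor on a removable
 * LMB. Returns the hole's start address, or UINT64_MAX if nothing fits.
 */
static uint64_t locate_hole(const struct ram_range *ranges, size_t nr,
			    const struct lmb *lmbs, size_t nl,
			    uint64_t size, uint64_t align)
{
	/* 'align' must be a non-zero power of two. */
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	for (size_t i = nr; i-- > 0; ) {
		const struct ram_range *r = &ranges[i];

		if (r->movable_zone || r->end - r->start < size)
			continue;

		/* Highest aligned position at which the hole still fits. */
		uint64_t pos = (r->end - size) & ~(align - 1);

		while (pos >= r->start) {
			if (!overlaps_removable_lmb(lmbs, nl, pos, pos + size))
				return pos;
			if (pos < align)	/* avoid unsigned underflow */
				break;
			pos -= align;
		}
	}
	return UINT64_MAX;
}
```

A caller would pass the RAM ranges it would otherwise consider and treat
UINT64_MAX as "no safe place found". Stepping in 'align' increments is
the simplest thing that works for a sketch; it is not tuned for speed.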
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-14 14:39 ` Baoquan He
@ 2020-04-14 14:49 ` David Hildenbrand
  2020-04-15 2:35 ` Baoquan He
  0 siblings, 1 reply; 61+ messages in thread
From: David Hildenbrand @ 2020-04-14 14:49 UTC (permalink / raw)
  To: Baoquan He
  Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm,
	James Morse, Eric W. Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 14.04.20 16:39, Baoquan He wrote:
> On 04/14/20 at 11:37am, David Hildenbrand wrote:
>> See
>> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count()
>>
>> Under DLPAR, it can remove memory in LMB granularity, which is usually
>> 16MB (== single section on ppc64). DLPAR will directly online all
>> hotplugged memory (LMBs) from the kernel using device_online(), which
>> will go to ZONE_NORMAL.
>>
>> When trying to remove memory, it simply scans for offlineable 16MB
>> memory blocks (==section == LMB), offlines and removes them. No need for
>> the movable zone and all the involved issues.
>
> Yes, this is a different one, thanks for pointing it out. It sounds like
> balloon driver in virt platform, doesn't it?

With DLPAR there is a hypervisor involved (which manages the actual HW
DIMMs), so yes.

>
> Avoiding to put kexec kernel into movable zone can't solve this DLPAR
> case as you said.
>
>>
>> Now, the interesting question is, can we have LMBs added during boot
>> (not via add_memory()), that will later be removed via remove_memory().
>> IIRC, we had BUGs related to that, so I think yes. If a section contains
>> no unmovable allocations (after boot), it can get removed.
>
> I do want to ask this question. If we can add LMB into system RAM, then
> reload kexec can solve it.
>
> Another better way is adding a common function to filter out the
> movable zone when search position for kexec kernel, use a arch specific
> function to filter out DLPAR memory blocks for ppc only. Over there,
> we can simply use for_each_drmem_lmb() to do that.

I was thinking about something similar. Maybe something like a notifier
that can be used to test if selected memory can be used for kexec
images. It would apply to

- arm64 and filter out all hotadded memory (IIRC, only boot memory can
  be used).
- powerpc to filter out all LMBs that can be removed (assuming not all
  memory corresponds to LMBs that can be removed, otherwise we're in
  trouble ... :) )
- virtio-mem to filter out all memory it added.
- hyper-v to filter out partially backed memory blocks (esp. the last
  memory block it added and only partially backed it by memory).

This would make it work for kexec_file_load(), however, I do wonder how
we would want to approach that from userspace kexec-tools when handling
it from kexec_load().

-- 
Thanks,

David / dhildenb

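The "notifier that can be used to test if selected memory can be used"
idea can be sketched as a simple veto chain in userspace C. Everything
named here is hypothetical (kexec_mem_check_fn, struct
kexec_mem_checker, register_kexec_mem_checker(), kexec_range_usable()
and the two example checkers); it does not use the kernel's existing
notifier chains (struct notifier_block) and has no locking. It only
shows the shape: each memory provider registers a callback, and a
candidate range is accepted only if no callback vetoes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: return false to veto [start, end) for kexec use. */
typedef bool (*kexec_mem_check_fn)(uint64_t start, uint64_t end, void *data);

struct kexec_mem_checker {
	kexec_mem_check_fn check;
	void *data;
	struct kexec_mem_checker *next;
};

static struct kexec_mem_checker *checker_list;

/* Add a checker to the front of the list (no locking in this sketch). */
static void register_kexec_mem_checker(struct kexec_mem_checker *c)
{
	assert(c && c->check);
	c->next = checker_list;
	checker_list = c;
}

/* A range is usable only if no registered checker vetoes it. */
static bool kexec_range_usable(uint64_t start, uint64_t end)
{
	assert(start < end);
	for (struct kexec_mem_checker *c = checker_list; c; c = c->next)
		if (!c->check(start, end, c->data))
			return false;
	return true;
}

/* Example checker: only memory below a boot-time limit may be used. */
static bool boot_memory_only(uint64_t start, uint64_t end, void *data)
{
	uint64_t boot_end = *(uint64_t *)data;

	(void)start;
	return end <= boot_end;
}

/* Example checker: veto anything touching one driver-owned region. */
struct owned_region {
	uint64_t start;
	uint64_t end;
};

static bool avoid_owned_region(uint64_t start, uint64_t end, void *data)
{
	const struct owned_region *r = data;

	return end <= r->start || start >= r->end;
}
```

In this sketch the first checker plays the role of the arm64 case (boot
memory only) and the second the role of a driver such as virtio-mem that
owns a region it may later unplug. How userspace kexec-tools would learn
the same answers for kexec_load() is left open here, as it is in the
mail.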
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-14 14:49 ` David Hildenbrand
@ 2020-04-15 2:35 ` Baoquan He
  2020-04-16 13:31 ` David Hildenbrand
  0 siblings, 1 reply; 61+ messages in thread
From: Baoquan He @ 2020-04-15 2:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm,
	James Morse, Eric W. Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 04/14/20 at 04:49pm, David Hildenbrand wrote:
> >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory
> >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(),
> >>>>> just checking or picking those unmovable region to put kernel/initrd in
> >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's
> >>>>> zone is movable or not, it's easy to know. This fix doesn't need to
> >>>>> bother other component.
> >>>>
> >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL
> >>>> does not imply that it cannot get offlined and removed e.g., this is
> >>>> heavily used on ppc64, with 16MB sections.
> >>>
> >>> Really? I just know there are two kinds of mem hotplug in ppc, but don't
> >>> know the details. So in this case, is there any flag or a way to know
> >>> those memory block are hotpluggable? I am curious how those kernel data
> >>> is avoided to be put in this area. Or ppc just freely uses it for kernel
> >>> data or user space data, then try to migrate when hot remove?
> >>
> >> See
> >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count()
> >>
> >> Under DLPAR, it can remove memory in LMB granularity, which is usually
> >> 16MB (== single section on ppc64). DLPAR will directly online all
> >> hotplugged memory (LMBs) from the kernel using device_online(), which
> >> will go to ZONE_NORMAL.
> >>
> >> When trying to remove memory, it simply scans for offlineable 16MB
> >> memory blocks (==section == LMB), offlines and removes them. No need for
> >> the movable zone and all the involved issues.
> >
> > Yes, this is a different one, thanks for pointing it out. It sounds like
> > balloon driver in virt platform, doesn't it?
>
> With DLPAR there is a hypervisor involved (which manages the actual HW
> DIMMs), so yes.
>
> >
> > Avoiding to put kexec kernel into movable zone can't solve this DLPAR
> > case as you said.
> >
> >>
> >> Now, the interesting question is, can we have LMBs added during boot
> >> (not via add_memory()), that will later be removed via remove_memory().
> >> IIRC, we had BUGs related to that, so I think yes. If a section contains
> >> no unmovable allocations (after boot), it can get removed.
> >
> > I do want to ask this question. If we can add LMB into system RAM, then
> > reload kexec can solve it.
> >
> > Another better way is adding a common function to filter out the
> > movable zone when search position for kexec kernel, use a arch specific
> > function to filter out DLPAR memory blocks for ppc only. Over there,
> > we can simply use for_each_drmem_lmb() to do that.
>
> I was thinking about something similar. Maybe something like a notifier
> that can be used to test if selected memory can be used for kexec

Not sure if I get the notifier idea clearly. If you mean

1) Add a common function to pick memory in the unmovable zone;
2) Let DLPAR and balloon register with the notifier;
3) In the common function, ask the notified parts to check whether the
   picked unmovable memory is available for placing the kexec kernel;

Sounds doable to me, and not complicated.

> images. It would apply to
>
> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
>   be used).

Do you mean that memory hot added after boot can't be recognized and
added into system RAM on arm64?

> - powerpc to filter out all LMBs that can be removed (assuming not all
>   memory corresponds to LMBs that can be removed, otherwise we're in
>   trouble ... :) )
> - virtio-mem to filter out all memory it added.
> - hyper-v to filter out partially backed memory blocks (esp. the last
>   memory block it added and only partially backed it by memory).
>
> This would make it work for kexec_file_load(), however, I do wonder how
> we would want to approach that from userspace kexec-tools when handling
> it from kexec_load().

Let's make kexec_file_load work first, since this is only the first step
towards keeping the kexec-ed kernel from breaking memory hotplug. After
kexec rebooting, KASLR may place the kernel in a hotpluggable area too.

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-15 2:35 ` Baoquan He
@ 2020-04-16 13:31 ` David Hildenbrand
  2020-04-16 14:02 ` Baoquan He
  0 siblings, 1 reply; 61+ messages in thread
From: David Hildenbrand @ 2020-04-16 13:31 UTC (permalink / raw)
  To: Baoquan He
  Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm,
	James Morse, Eric W. Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

> Not sure if I get the notifier idea clearly. If you mean
>
> 1) Add a common function to pick memory in unmovable zone;

Not strictly required IMHO. But, minor detail.

> 2) Let DLPAR, balloon register with notifier;

Yeah, or virtio-mem, or any other technology that adds/removes memory
dynamically.

> 3) In the common function, ask notified part to check if the picked
> unmovable memory is available for locating kexec kernel;

Yeah.

>
> Sounds doable to me, and not complicated.
>
>> images. It would apply to
>>
>> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
>>   be used).
>
> Do you mean hot added memory after boot can't be recognized and added
> into system RAM on arm64?

See patch #3 of this patch set, which wants to avoid placing kexec
binaries on hotplugged memory. But I have no idea what the current plan
regarding arm64 is (this thread exploded :) ).

I would assume that we don't want to place kexec images on any
hotplugged (or rather: hot(un)pluggable) memory - on any architecture.

>
>
>> - powerpc to filter out all LMBs that can be removed (assuming not all
>>   memory corresponds to LMBs that can be removed, otherwise we're in
>>   trouble ... :) )
>> - virtio-mem to filter out all memory it added.
>> - hyper-v to filter out partially backed memory blocks (esp. the last
>>   memory block it added and only partially backed it by memory).
>>
>> This would make it work for kexec_file_load(), however, I do wonder how
>> we would want to approach that from userspace kexec-tools when handling
>> it from kexec_load().
>
> Let's make kexec_file_load work firstly. Since this work is only first
> step to make kexec-ed kernel not break memory hotplug. After kexec
> rebooting, the KASLR may locate kernel into hotpluggable area too.

Can you elaborate how that would work?

-- 
Thanks,

David / dhildenb

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-16 13:31 ` David Hildenbrand @ 2020-04-16 14:02 ` Baoquan He 2020-04-16 14:09 ` David Hildenbrand 0 siblings, 1 reply; 61+ messages in thread From: Baoquan He @ 2020-04-16 14:02 UTC (permalink / raw) To: David Hildenbrand Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 04/16/20 at 03:31pm, David Hildenbrand wrote: > > Not sure if I get the notifier idea clearly. If you mean > > > > 1) Add a common function to pick memory in unmovable zone; > > Not strictly required IMHO. But, minor detail. > > > 2) Let DLPAR, balloon register with notifier; > > Yeah, or virtio-mem, or any other technology that adds/removes memory > dynamically. > > > 3) In the common function, ask notified part to check if the picked > > unmovable memory is available for locating kexec kernel; > > Yeah. These may not be needed, please see below comment. > > > > > Sounds doable to me, and not complicated. > > > >> images. It would apply to > >> > >> - arm64 and filter out all hotadded memory (IIRC, only boot memory can > >> be used). > > > > Do you mean hot added memory after boot can't be recognized and added > > into system RAM on arm64? > > See patch #3 of this patch set, which wants to avoid placing kexec > binaries on hotplugged memory. But I have no idea what the current plan > regarding arm64 is (this thread exploded :) ). > > I would assume that we don't want to place kexec images on any > hotplugged (or rather: hot(un)pluggable) memory - on any architecture. Yes, noticed that and James replied to DaveY. Later, when I was considering to make a draft patch to do the picking of memory from normal zone, and add a notifier, as we discussed at above, I suddenly realized that kexec_file_load doesn't have this issue. 
It traverses system RAM bottom up to find an available region for the kernel/initrd/boot_param, etc. I can't think of a system where low memory could be unavailable. > > > > > > >> - powerpc to filter out all LMBs that can be removed (assuming not all > >> memory corresponds to LMBs that can be removed, otherwise we're in > >> trouble ... :) ) > >> - virtio-mem to filter out all memory it added. > >> - hyper-v to filter out partially backed memory blocks (esp. the last > >> memory block it added and only partially backed it by memory). > >> > >> This would make it work for kexec_file_load(), however, I do wonder how > >> we would want to approach that from userspace kexec-tools when handling > >> it from kexec_load(). > > > > Let's make kexec_file_load work firstly. Since this work is only first > > step to make kexec-ed kernel not break memory hotplug. After kexec > > rebooting, the KASLR may locate kernel into hotpluggable area too. > > Can you elaborate how that would work? Well, boot memory may or may not be hot-removable after boot, and such regions are marked in the UEFI tables. The current kexec doesn't save them and pass them to the 2nd kernel. When the kexec kernel boots, it needs to read them and avoid randomizing the kernel into those regions.
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-16 14:02 ` Baoquan He @ 2020-04-16 14:09 ` David Hildenbrand 2020-04-16 14:36 ` Baoquan He 0 siblings, 1 reply; 61+ messages in thread From: David Hildenbrand @ 2020-04-16 14:09 UTC (permalink / raw) To: Baoquan He Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel >>> Sounds doable to me, and not complicated. >>> >>>> images. It would apply to >>>> >>>> - arm64 and filter out all hotadded memory (IIRC, only boot memory can >>>> be used). >>> >>> Do you mean hot added memory after boot can't be recognized and added >>> into system RAM on arm64? >> >> See patch #3 of this patch set, which wants to avoid placing kexec >> binaries on hotplugged memory. But I have no idea what the current plan >> regarding arm64 is (this thread exploded :) ). >> >> I would assume that we don't want to place kexec images on any >> hotplugged (or rather: hot(un)pluggable) memory - on any architecture. > > Yes, noticed that and James replied to DaveY. > > Later, when I was considering to make a draft patch to do the picking of > memory from normal zone, and add a notifier, as we discussed at above, I > suddenly realized that kexec_file_load doesn't have this issue. It > traverse system RAM bottom up to get an available region to put > kernel/initrd/boot_param, etc. I can't think of a system where its > low memory could be unavailable. kexec_walk_memblock() has the option for "kbuf->top_down". Only kexec_walk_resources() seems to ignore it. So I think in case of memblocks (e.g., arm64), this still applies? >> >>> >>> >>>> - powerpc to filter out all LMBs that can be removed (assuming not all >>>> memory corresponds to LMBs that can be removed, otherwise we're in >>>> trouble ... :) ) >>>> - virtio-mem to filter out all memory it added. 
>>>> - hyper-v to filter out partially backed memory blocks (esp. the last >>>> memory block it added and only partially backed it by memory). >>>> >>>> This would make it work for kexec_file_load(), however, I do wonder how >>>> we would want to approach that from userspace kexec-tools when handling >>>> it from kexec_load(). >>> >>> Let's make kexec_file_load work firstly. Since this work is only first >>> step to make kexec-ed kernel not break memory hotplug. After kexec >>> rebooting, the KASLR may locate kernel into hotpluggable area too. >> >> Can you elaborate how that would work? > > Well, boot memory can be hotplugged or not after boot, they are marked > in uefi tables, the current kexec doesn't save and pass them into 2nd > kenrel, when kexec kernel bootup, it need read them and avoid them to > randomize kernel into. What about e.g., memory hotplugged by ACPI? I would assume, that the kexec kernel will not make use of that (IOW detected that) until the ACPI driver comes up and re-detects + adds that memory. Or how would that machinery work in case we have a DIMM hotplugged via ACPI? -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-16 14:09 ` David Hildenbrand @ 2020-04-16 14:36 ` Baoquan He 2020-04-16 14:47 ` David Hildenbrand 0 siblings, 1 reply; 61+ messages in thread From: Baoquan He @ 2020-04-16 14:36 UTC (permalink / raw) To: David Hildenbrand, Andrew Morton Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Will Deacon, linux-arm-kernel On 04/16/20 at 04:09pm, David Hildenbrand wrote: > >>> Sounds doable to me, and not complicated. > >>> > >>>> images. It would apply to > >>>> > >>>> - arm64 and filter out all hotadded memory (IIRC, only boot memory can > >>>> be used). > >>> > >>> Do you mean hot added memory after boot can't be recognized and added > >>> into system RAM on arm64? > >> > >> See patch #3 of this patch set, which wants to avoid placing kexec > >> binaries on hotplugged memory. But I have no idea what the current plan > >> regarding arm64 is (this thread exploded :) ). > >> > >> I would assume that we don't want to place kexec images on any > >> hotplugged (or rather: hot(un)pluggable) memory - on any architecture. > > > > Yes, noticed that and James replied to DaveY. > > > > Later, when I was considering to make a draft patch to do the picking of > > memory from normal zone, and add a notifier, as we discussed at above, I > > suddenly realized that kexec_file_load doesn't have this issue. It > > traverse system RAM bottom up to get an available region to put > > kernel/initrd/boot_param, etc. I can't think of a system where its > > low memory could be unavailable. > > kexec_walk_memblock() has the option for "kbuf->top_down". Only > kexec_walk_resources() seems to ignore it. Yeah, that top down searching is done in a found low mem area. Means firstly search an available region bottom up, then put kernel top down in that region. 
The reason is that our iomem resources are linked in a singly linked list, so we can only search bottom up efficiently. kexec_load does the real top-down search, so the kernel is put at the top of system RAM. I once tried, with patches, to make kexec_file_load support top-down searching too, since QE and customers are often confused by this difference when debugging. Andrew may remember this; he suggested that I change the singly linked list to a doubly linked list for the iomem resources, then do the top-down search for kexec_file_load. I put some effort into it, but the change touched too much code, and I finally gave up. http://archive.lwn.net:8080/devicetree/20180718024944.577-1-bhe@redhat.com/ I can see that top-down searching for kexec avoids the heavily used low memory region, especially under 4G, which is needed for DMA, various firmware reservations, etc. And customers/QE of kexec have got used to it. I can change kexec_file_load to top down too in a simple way if people really complain about it. But for now, bottom up doesn't seem bad either. > > So I think in case of memblocks (e.g., arm64), this still applies? Yeah, aren't you trying to remove it? I haven't read your patches carefully, maybe I got it wrong. And arm64 can't even support recording hot-added memory in firmware, so it doesn't seem ready; won't they change that design in the future? > > >> > >>> > >>> > >>>> - powerpc to filter out all LMBs that can be removed (assuming not all > >>>> memory corresponds to LMBs that can be removed, otherwise we're in > >>>> trouble ... :) ) > >>>> - virtio-mem to filter out all memory it added. > >>>> - hyper-v to filter out partially backed memory blocks (esp. the last > >>>> memory block it added and only partially backed it by memory). > >>>> > >>>> This would make it work for kexec_file_load(), however, I do wonder how > >>>> we would want to approach that from userspace kexec-tools when handling > >>>> it from kexec_load().
> >>> > >>> Let's make kexec_file_load work firstly. Since this work is only first > >>> step to make kexec-ed kernel not break memory hotplug. After kexec > >>> rebooting, the KASLR may locate kernel into hotpluggable area too. > >> > >> Can you elaborate how that would work? > > > > Well, boot memory can be hotplugged or not after boot, they are marked > > in uefi tables, the current kexec doesn't save and pass them into 2nd > > kenrel, when kexec kernel bootup, it need read them and avoid them to > > randomize kernel into. > > What about e.g., memory hotplugged by ACPI? I would assume, that the > kexec kernel will not make use of that (IOW detected that) until the > ACPI driver comes up and re-detects + adds that memory. > > Or how would that machinery work in case we have a DIMM hotplugged via ACPI? The ACPI SRAT is embedded in EFI; we need to read out the RSDP pointer. If we don't pass the EFI info, the kexec kernel won't get the SRAT table correctly, if I remember correctly. Yeah, I remember that a kvm guest can get memory hotplugged with ACPI only, though this won't happen on bare metal. I need to check carefully. I have been using kvm guests with UEFI firmware recently.
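The placement difference described in this message (bottom-up region search with top-down placement inside the found region for kexec_file_load, versus a true top-down search for kexec_load) can be sketched as a toy model. This is only an illustration under stated assumptions, not the kernel or kexec-tools code: the function names, the `RAM` layout, and the exclusive-end range convention are all made up for the example.

```python
def align_down(addr, align):
    """Round addr down to a multiple of align (align must be a power of two)."""
    return addr & ~(align - 1)


def place_file_load_style(regions, size, align):
    """Walk regions bottom up; in the first one that fits, place at its top.

    Toy model of the behaviour described for kexec_file_load.
    Regions are (start, end) pairs with an exclusive end.
    """
    for start, end in sorted(regions):
        if end - start < size:
            continue
        base = align_down(end - size, align)
        if base >= start:
            return base
    return None


def place_load_style(regions, size, align):
    """Walk regions top down; place at the top of the highest one that fits.

    Toy model of the behaviour described for kexec_load from user space.
    """
    for start, end in sorted(regions, reverse=True):
        if end - start < size:
            continue
        base = align_down(end - size, align)
        if base >= start:
            return base
    return None


# Hypothetical RAM layout: a small low region, RAM below 2G, RAM above 4G.
RAM = [(0x1000, 0x9F000), (0x100000, 0x80000000), (0x100000000, 0x180000000)]

if __name__ == "__main__":
    two_mb = 0x200000
    print(hex(place_file_load_style(RAM, two_mb, two_mb)))
    print(hex(place_load_style(RAM, two_mb, two_mb)))
```

In this model the first strategy never looks past the lowest region that fits, which is why it tends to stay away from memory added later at higher addresses.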
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-16 14:36 ` Baoquan He @ 2020-04-16 14:47 ` David Hildenbrand 2020-04-21 13:29 ` David Hildenbrand 0 siblings, 1 reply; 61+ messages in thread From: David Hildenbrand @ 2020-04-16 14:47 UTC (permalink / raw) To: Baoquan He, Andrew Morton Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Will Deacon, linux-arm-kernel >> kexec_walk_memblock() has the option for "kbuf->top_down". Only >> kexec_walk_resources() seems to ignore it. > > Yeah, that top down searching is done in a found low mem area. Means > firstly search an available region bottom up, then put kernel top down > in that region. The reason is our iomem res is linked with singly linked > list. So we can only search bottom up efficiently. > > kexec_load is doing the real top down searching, so kernel will be put > at the top of system ram. I ever tried to change it to support top down > searching for kexec_file_load too with patches, since QE and customers > are often confused with this difference when debugging. > > Andrew may remeber this, he suggested me to change the singly linked list > to doubly linked list for iomem res, then do the top down searching for > kexec_file_load. I tried with some effort, the change introduced too much > code change, I just gave up finally. Well, at least right now this seems to be the right approach (hotplug), lol :) > > http://archive.lwn.net:8080/devicetree/20180718024944.577-1-bhe@redhat.com/ > > I can see that top down searching for kexec can avoid the highly used > low memory region, esp under 4G, for dma, kinds of firmware reserving, > etc. And customers/QE of kexec get used to it. I can change kexec_file_load > to top down too with a simple way if people really complain it. But now, > seems bottom up is not bad too. Ah, I understand the problem. 
Maybe a simple "optimization" would be to start searching bottom-up from e.g., 2GB/4GB first. If nothing was found, search bottom-up from 0-2GB/4GB etc. > >> >> So I think in case of memblocks (e.g., arm64), this still applies? > > Yeah, aren't you trying to remove it? I haven't read your patches > carefully, maybe I got it wrong. And arm64 even can't support the hot added For arm64 we're still creating memblocks for hotplugged memory, but I guess it's not too hard to stop doing that. > memory being able to recorded into firmware, seems it's not so ready, > won't they change that design in the future? It seems to be incomplete, yes. No idea if it's fixable, I'm no arm64 expert ... >>>>>> - powerpc to filter out all LMBs that can be removed (assuming not all >>>>>> memory corresponds to LMBs that can be removed, otherwise we're in >>>>>> trouble ... :) ) >>>>>> - virtio-mem to filter out all memory it added. >>>>>> - hyper-v to filter out partially backed memory blocks (esp. the last >>>>>> memory block it added and only partially backed it by memory). >>>>>> >>>>>> This would make it work for kexec_file_load(), however, I do wonder how >>>>>> we would want to approach that from userspace kexec-tools when handling >>>>>> it from kexec_load(). >>>>> >>>>> Let's make kexec_file_load work firstly. Since this work is only first >>>>> step to make kexec-ed kernel not break memory hotplug. After kexec >>>>> rebooting, the KASLR may locate kernel into hotpluggable area too. >>>> >>>> Can you elaborate how that would work? >>> >>> Well, boot memory can be hotplugged or not after boot, they are marked >>> in uefi tables, the current kexec doesn't save and pass them into 2nd >>> kenrel, when kexec kernel bootup, it need read them and avoid them to >>> randomize kernel into. >> >> What about e.g., memory hotplugged by ACPI? I would assume, that the >> kexec kernel will not make use of that (IOW detected that) until the >> ACPI driver comes up and re-detects + adds that memory.
>> >> Or how would that machinery work in case we have a DIMM hotplugged via ACPI? > > ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't > pass the efi, it won't get the SRAT table correctly, if I remember > correctly. Yeah, I remeber kvm guest can get memory hotplugged with > ACPI only, this won't happen on bare metal though. Need check carefully. > I have been using kvm guest with uefi firmwire recently. Yeah, I can imagine that bare metal is different. kvm only uses ACPI. I'm also asking because of virtio-mem. Memory added via virtio-mem is not part of any efi tables or whatsoever. So I assume the kexec kernel will not detect it automatically (good!), instead load the virtio-mem driver and let it add memory back to the system. I should probably play with kexec and virtio-mem once I have some spare cycles ... to find out what's broken and needs to be addressed :) -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
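The "optimization" suggested in this message (search bottom-up starting at 2GB/4GB first, then fall back to a bottom-up search below that boundary) could look roughly like the following sketch. It is a model for discussion only: `find_bottom_up`, `find_preferring_high`, the `split` parameter, and the exclusive-end range convention are assumptions of the example, not existing kernel interfaces.

```python
MAX_ADDR = 1 << 64  # assumed upper bound of the physical address space


def align_up(addr, align):
    """Round addr up to a multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def find_bottom_up(regions, size, align, lo, hi):
    """Return the lowest aligned base in [lo, hi) where size bytes fit, or None."""
    for start, end in sorted(regions):
        base = align_up(max(start, lo), align)
        if base + size <= min(end, hi):
            return base
    return None


def find_preferring_high(regions, size, align, split=4 << 30):
    """Two-pass search: first at or above split, then below it as a fallback."""
    base = find_bottom_up(regions, size, align, split, MAX_ADDR)
    if base is None:
        base = find_bottom_up(regions, size, align, 0, split)
    return base


if __name__ == "__main__":
    ram = [(0x100000, 0x80000000), (0x100000000, 0x180000000)]
    print(hex(find_preferring_high(ram, 0x200000, 0x200000)))
```

Both passes stay bottom-up, so a singly linked resource list is enough; only the starting point changes, which keeps the heavily used low region free when higher memory exists.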
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-16 14:47 ` David Hildenbrand @ 2020-04-21 13:29 ` David Hildenbrand 2020-04-21 13:57 ` David Hildenbrand ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: David Hildenbrand @ 2020-04-21 13:29 UTC (permalink / raw) To: Baoquan He, Andrew Morton Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Will Deacon, linux-arm-kernel >> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't >> pass the efi, it won't get the SRAT table correctly, if I remember >> correctly. Yeah, I remeber kvm guest can get memory hotplugged with >> ACPI only, this won't happen on bare metal though. Need check carefully. >> I have been using kvm guest with uefi firmwire recently. > > Yeah, I can imagine that bare metal is different. kvm only uses ACPI. > > I'm also asking because of virtio-mem. Memory added via virtio-mem is > not part of any efi tables or whatsoever. So I assume the kexec kernel > will not detect it automatically (good!), instead load the virtio-mem > driver and let it add memory back to the system. > > I should probably play with kexec and virtio-mem once I have some spare > cycles ... to find out what's broken and needs to be addressed :) FWIW, I just gave virtio-mem and kexec/kdump a try. a) kdump seems to work. Memory added by virtio-mem is getting dumped. The kexec kernel only uses memory in the crash region. The virtio-mem driver properly bails out due to is_kdump_kernel(). b) "kexec -s -l" seems to work fine. For now, the kernel does not seem to get placed on virtio-mem memory (pure luck due to the left-to-right search). Memory added by virtio-mem is not getting added to the e820 map. Once the virtio-mem driver comes back up in the kexec kernel, the right memory is readded. c) "kexec -c -l" does not work properly. 
All memory added by virtio-mem is added to the e820 map, which is wrong. Memory that should not be touched will be touched by the kexec kernel. I assume kexec-tools just goes ahead and adds anything it can find in /proc/iomem (or /sys/firmware/memmap/) to the e820 map of the new kernel. Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is similarly added to the e820 map and, therefore, won't be able to be onlined MOVABLE easily. At least for virtio-mem, I would either have to a) Not support "kexec -c -l". A viable option if we would be planning on not supporting it either way in the long term. I could block this in-kernel somehow eventually. b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by indicating it in /proc/iomem in a special way ("System RAM (hotplugged)"/"System RAM (virtio-mem)"). Baoquan, any opinion on that? -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
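Option b) above can be illustrated with a small sketch of how a user-space tool might skip specially named ranges when building the memory map for the new kernel. This is not kexec-tools code: the "System RAM (virtio-mem)" resource name is only the naming proposed in this message, the sample text is invented, and the helper names are hypothetical. The sketch parses sample text in the /proc/iomem layout instead of reading the real file.

```python
SAMPLE_IOMEM = """\
00001000-0009fbff : System RAM
00100000-7ffdffff : System RAM
  01000000-01c031d0 : Kernel code
100000000-13fffffff : System RAM
140000000-147ffffff : System RAM (virtio-mem)
"""


def parse_iomem(text):
    """Parse /proc/iomem-style text into (depth, start, end, name) tuples.

    Nested resources are indented by two spaces per level; end is inclusive.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        addr_range, _, name = line.strip().partition(" : ")
        start_s, _, end_s = addr_range.partition("-")
        entries.append((indent // 2, int(start_s, 16), int(end_s, 16), name))
    return entries


def usable_ram(text):
    """Keep only top-level ranges named exactly "System RAM".

    Ranges carrying a suffix such as "(virtio-mem)" are left alone.
    """
    return [
        (start, end)
        for depth, start, end, name in parse_iomem(text)
        if depth == 0 and name == "System RAM"
    ]


if __name__ == "__main__":
    for start, end in usable_ram(SAMPLE_IOMEM):
        print(f"{start:#x}-{end:#x}")
```

An exact name match means a tool that is unaware of the new naming would also skip these ranges by default, which is the behaviour asked for here.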
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-21 13:29 ` David Hildenbrand @ 2020-04-21 13:57 ` David Hildenbrand 2020-04-21 13:59 ` Eric W. Biederman 2020-04-22 9:17 ` Baoquan He 2 siblings, 0 replies; 61+ messages in thread From: David Hildenbrand @ 2020-04-21 13:57 UTC (permalink / raw) To: Baoquan He, Andrew Morton Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Will Deacon, linux-arm-kernel On 21.04.20 15:29, David Hildenbrand wrote: >>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't >>> pass the efi, it won't get the SRAT table correctly, if I remember >>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with >>> ACPI only, this won't happen on bare metal though. Need check carefully. >>> I have been using kvm guest with uefi firmwire recently. >> >> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. >> >> I'm also asking because of virtio-mem. Memory added via virtio-mem is >> not part of any efi tables or whatsoever. So I assume the kexec kernel >> will not detect it automatically (good!), instead load the virtio-mem >> driver and let it add memory back to the system. >> >> I should probably play with kexec and virtio-mem once I have some spare >> cycles ... to find out what's broken and needs to be addressed :) > > FWIW, I just gave virtio-mem and kexec/kdump a try. > > a) kdump seems to work. Memory added by virtio-mem is getting dumped. > The kexec kernel only uses memory in the crash region. The virtio-mem > driver properly bails out due to is_kdump_kernel(). > > b) "kexec -s -l" seems to work fine. For now, the kernel does not seem > to get placed on virtio-mem memory (pure luck due to the left-to-right > search). Memory added by virtio-mem is not getting added to the e820 > map. 
Once the virtio-mem driver comes back up in the kexec kernel, the > right memory is readded. > > c) "kexec -c -l" does not work properly. All memory added by virtio-mem > is added to the e820 map, which is wrong. Memory that should not be > touched will be touched by the kexec kernel. I assume kexec-tools just > goes ahead and adds anything it can find in /proc/iomem (or > /sys/firmware/memmap/) to the e820 map of the new kernel. > > Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is > similarly added to the e820 map and, therefore, won't be able to be > onlined MOVABLE easily. > > > At least for virtio-mem, I would either have to > a) Not support "kexec -c -l". A viable option if we would be planning on > not supporting it either way in the long term. I could block this > in-kernel somehow eventually. > > b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by > indicating it in /proc/iomem in a special way ("System RAM > (hotplugged)"/"System RAM (virtio-mem)"). I just realized, that *not* creating /sys/firmware/memmap/ entries for virtio-mem memory seems to be the right thing to do. -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-21 13:29 ` David Hildenbrand 2020-04-21 13:57 ` David Hildenbrand @ 2020-04-21 13:59 ` Eric W. Biederman 2020-04-21 14:30 ` David Hildenbrand 2020-04-22 9:17 ` Baoquan He 2 siblings, 1 reply; 61+ messages in thread From: Eric W. Biederman @ 2020-04-21 13:59 UTC (permalink / raw) To: David Hildenbrand Cc: piliu, Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel David Hildenbrand <david@redhat.com> writes: >>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't >>> pass the efi, it won't get the SRAT table correctly, if I remember >>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with >>> ACPI only, this won't happen on bare metal though. Need check carefully. >>> I have been using kvm guest with uefi firmwire recently. >> >> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. >> >> I'm also asking because of virtio-mem. Memory added via virtio-mem is >> not part of any efi tables or whatsoever. So I assume the kexec kernel >> will not detect it automatically (good!), instead load the virtio-mem >> driver and let it add memory back to the system. >> >> I should probably play with kexec and virtio-mem once I have some spare >> cycles ... to find out what's broken and needs to be addressed :) > > FWIW, I just gave virtio-mem and kexec/kdump a try. > > a) kdump seems to work. Memory added by virtio-mem is getting dumped. > The kexec kernel only uses memory in the crash region. The virtio-mem > driver properly bails out due to is_kdump_kernel(). > > b) "kexec -s -l" seems to work fine. For now, the kernel does not seem > to get placed on virtio-mem memory (pure luck due to the left-to-right > search). Memory added by virtio-mem is not getting added to the e820 > map. 
Once the virtio-mem driver comes back up in the kexec kernel, the > right memory is readded. This sounds like a bug. > c) "kexec -c -l" does not work properly. All memory added by virtio-mem > is added to the e820 map, which is wrong. Memory that should not be > touched will be touched by the kexec kernel. I assume kexec-tools just > goes ahead and adds anything it can find in /proc/iomem (or > /sys/firmware/memmap/) to the e820 map of the new kernel. > > Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is > similarly added to the e820 map and, therefore, won't be able to be > onlined MOVABLE easily. This sounds like correct behavior to me. If you add memory to the system, it is treated as memory of the system. If we need to make it a special kind of memory with special rules, we can have some kind of special marking for the memory. But being hotplugged is not in itself a sufficient criterion to say don't use this as normal memory. If I take a huge server and plug in an extra DIMM, it is just memory. For a similarly huge server I might want the memory that the system booted with to be unpluggable, in case hardware error reporting notices a DIMM generating a lot of memory errors. Now perhaps virtualization needs a special tier of memory that should only be used for cases where the memory is easily movable. I am not familiar with virtio-mem, but from my skim of the initial design, virtio-mem was not designed to be such a special tier of memory. Perhaps something has changed? https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html > At least for virtio-mem, I would either have to > a) Not support "kexec -c -l". A viable option if we would be planning on > not supporting it either way in the long term. I could block this > in-kernel somehow eventually. No. > b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by > indicating it in /proc/iomem in a special way ("System RAM > (hotplugged)"/"System RAM (virtio-mem)").
How does the kernel memory allocator treat this memory? The logic is simple. If the kernel memory allocator treats that memory as ordinary memory available for all uses, it should be presented as ordinary memory available for all uses. If the kernel memory allocator treats that memory as special memory, only available for uses that we can easily free later and give back to the system (AKA it is special and not ordinary memory), we should mark it as such. Eric p.s. Please excuse me for jumping in; I may be missing some important context, but what I read when I saw this message in my inbox just seemed very wrong.
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-21 13:59 ` Eric W. Biederman @ 2020-04-21 14:30 ` David Hildenbrand 0 siblings, 0 replies; 61+ messages in thread From: David Hildenbrand @ 2020-04-21 14:30 UTC (permalink / raw) To: Eric W. Biederman Cc: piliu, Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel >> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem >> to get placed on virtio-mem memory (pure luck due to the left-to-right >> search). Memory added by virtio-mem is not getting added to the e820 >> map. Once the virtio-mem driver comes back up in the kexec kernel, the >> right memory is readded. > > This sounds like a bug. This is how virtio-mem wants its memory to get handled. > >> c) "kexec -c -l" does not work properly. All memory added by virtio-mem >> is added to the e820 map, which is wrong. Memory that should not be >> touched will be touched by the kexec kernel. I assume kexec-tools just >> goes ahead and adds anything it can find in /proc/iomem (or >> /sys/firmware/memmap/) to the e820 map of the new kernel. >> >> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is >> similarly added to the e820 map and, therefore, won't be able to be >> onlined MOVABLE easily. > > This sounds like correct behavior to me. If you add memory to the > system it is treated as memory to the system. Yeah, I would agree if we are talking about DIMMs, but this memory is special. It's added via a paravirtualized interface and will contain holes, especially after unplug. While memory in these holes can usually be read, it should not be written. More on that below. > > If we need to make it a special kind of memory with special rules we can > have some kind of special marking for the memory. 
But hotplugged is not > in itself a sufficient criteria to say don't use this as normal memory. Agreed. It is special, though. > > If take a huge server and I plug in an extra dimm it is just memory. Agreed. [...] > > Now perhaps virtualization needs a special tier of memory that should > only be used for cases where the memory is easily movable. > > I am not familiar with virtio-mem but my skim of the initial design > is that virtio-mem was not designed to be such a special tier of memory. > Perhaps something has changed? > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html Yes, a lot changed. See https://lkml.kernel.org/r/20200311171422.10484-1-david@redhat.com for the latest-greatest design overview. > >> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by >> indicating it in /proc/iomem in a special way ("System RAM >> (hotplugged)"/"System RAM (virtio-mem)"). > > How does the kernel memory allocator treat this memory? So what virtio-mem does is add memory sections on demand and populate within these sections the requested amount of memory. E.g., if 64MB are requested, it will add a 128MB section/resource but only make the first 64MB accessible (via the hypervisor) and only give the first 64MB to the buddy. This way of adding memory is similar to what the Xen and Hyper-V balloon drivers do when hotplugging memory. When requested to plug more memory, it might go ahead and make (parts of) the remaining 64MB accessible and give them to the buddy. In case it cannot "fill any holes", it will add a new section. When requested to unplug memory, it will try to take memory it added (here 64MB) back from the buddy and tell the hypervisor about it. So, it has some similarity to ballooning in virtual environments; however, it manages its own device memory only and can therefore give better guarantees and detect malicious guests.
Right now, I think the right approach would be to not create /sys/firmware/memmap entries for memory added by virtio-mem. [...] > > p.s. Please excuse me for jumping in I may be missing some important > context, but what I read when I saw this message in my inbox just seemed > very wrong. Yeah, still, thanks for having a look. Please let me know if you need more information. -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
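The section behaviour David describes in the message above (a 128MB section is added, but only the requested part is made accessible and handed to the buddy) can be illustrated with a small toy model. This is only a sketch: the class and method names are hypothetical, it is not the virtio-mem driver, and it assumes that an emptied section keeps its resource, which the thread does not state.

```python
SECTION_MB = 128  # section size used in the example from the thread


class ToyVirtioMem:
    """Toy model (hypothetical): tracks plugged MB per added section."""

    def __init__(self):
        self.sections = []  # plugged MB for each section that was added

    def plug(self, mb):
        # First try to fill holes in sections that were already added.
        for i, used in enumerate(self.sections):
            take = min(SECTION_MB - used, mb)
            self.sections[i] += take
            mb -= take
        # Only add new sections for whatever did not fit.
        while mb > 0:
            take = min(SECTION_MB, mb)
            self.sections.append(take)
            mb -= take

    def unplug(self, mb):
        # Give memory back, last section first. Assumption: the section
        # (and its resource) stays around, now containing a hole.
        for i in reversed(range(len(self.sections))):
            take = min(self.sections[i], mb)
            self.sections[i] -= take
            mb -= take

    def resource_mb(self):
        # What a resource listing would cover: whole sections.
        return len(self.sections) * SECTION_MB

    def plugged_mb(self):
        # What is actually accessible and given to the buddy.
        return sum(self.sections)


vm = ToyVirtioMem()
vm.plug(64)
print(vm.resource_mb(), vm.plugged_mb())
```

The gap between `resource_mb()` and `plugged_mb()` is the point relevant to kexec: a tool that treats every listed resource as usable RAM would also cover the holes.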
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-21 13:29 ` David Hildenbrand 2020-04-21 13:57 ` David Hildenbrand 2020-04-21 13:59 ` Eric W. Biederman @ 2020-04-22 9:17 ` Baoquan He 2020-04-22 9:24 ` David Hildenbrand 2 siblings, 1 reply; 61+ messages in thread From: Baoquan He @ 2020-04-22 9:17 UTC (permalink / raw) To: David Hildenbrand Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 04/21/20 at 03:29pm, David Hildenbrand wrote: > >> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't > >> pass the efi, it won't get the SRAT table correctly, if I remember > >> correctly. Yeah, I remeber kvm guest can get memory hotplugged with > >> ACPI only, this won't happen on bare metal though. Need check carefully. > >> I have been using kvm guest with uefi firmwire recently. > > > > Yeah, I can imagine that bare metal is different. kvm only uses ACPI. > > > > I'm also asking because of virtio-mem. Memory added via virtio-mem is > > not part of any efi tables or whatsoever. So I assume the kexec kernel > > will not detect it automatically (good!), instead load the virtio-mem > > driver and let it add memory back to the system. > > > > I should probably play with kexec and virtio-mem once I have some spare > > cycles ... to find out what's broken and needs to be addressed :) > > FWIW, I just gave virtio-mem and kexec/kdump a try. > > a) kdump seems to work. Memory added by virtio-mem is getting dumped. > The kexec kernel only uses memory in the crash region. The virtio-mem > driver properly bails out due to is_kdump_kernel(). Right, kdump is not impacted later added memory. > > b) "kexec -s -l" seems to work fine. For now, the kernel does not seem > to get placed on virtio-mem memory (pure luck due to the left-to-right > search). 
Memory added by virtio-mem is not getting added to the e820 > map. Once the virtio-mem driver comes back up in the kexec kernel, the > right memory is readded. kexec_file_load just behaves as you tested. It doesn't collect later added memory to e820 because it uses e820_table_kexec directly to pass e820 to kexec-ed kernel. However, this e820_table_kexec is only updated during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel doesn't have it in e820 during bootup, but it's recoginized and added when ACPI scanning. I think we should update e820_table_kexec when hot add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, balloon will need be added into e820_table_kexec too, and if this is expected behaviour. But whatever we do, it won't impact the kexec file_loading, because of the searching strategy bottom up. Just adding them into e820_table_kexec will make it consistent with cold reboot which get recognizes and get them into e820 during bootup. > > c) "kexec -c -l" does not work properly. All memory added by virtio-mem > is added to the e820 map, which is wrong. Memory that should not be > touched will be touched by the kexec kernel. I assume kexec-tools just > goes ahead and adds anything it can find in /proc/iomem (or > /sys/firmware/memmap/) to the e820 map of the new kernel. > > Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is > similarly added to the e820 map and, therefore, won't be able to be > onlined MOVABLE easily. Yes, kexec_load will read memory regions from /sys/firmware/memmap/ or /proc/iomem. Making it right seems a little harder, we can export them to /proc/iomem or /sys/firmware/memmap/ with mark them with 'hotplug', but the attribute that which zone they belongs to is not easy to tell. We are proactive on widely testing kexec_file_load on x86_64, s390, arm64 by adding test cases into CKI. > > > At least for virtio-mem, I would either have to > a) Not support "kexec -c -l". 
A viable option if we would be planning on > not supporting it either way in the long term. I could block this > in-kernel somehow eventually. > > b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by > indicating it in /proc/iomem in a special way ("System RAM > (hotplugged)"/"System RAM (virtio-mem)"). > > Baoquan, any opinion on that? > > -- > Thanks, > > David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-22 9:17 ` Baoquan He @ 2020-04-22 9:24 ` David Hildenbrand 2020-04-22 9:57 ` Baoquan He 0 siblings, 1 reply; 61+ messages in thread From: David Hildenbrand @ 2020-04-22 9:24 UTC (permalink / raw) To: Baoquan He Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 22.04.20 11:17, Baoquan He wrote: > On 04/21/20 at 03:29pm, David Hildenbrand wrote: >>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't >>>> pass the efi, it won't get the SRAT table correctly, if I remember >>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with >>>> ACPI only, this won't happen on bare metal though. Need check carefully. >>>> I have been using kvm guest with uefi firmwire recently. >>> >>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. >>> >>> I'm also asking because of virtio-mem. Memory added via virtio-mem is >>> not part of any efi tables or whatsoever. So I assume the kexec kernel >>> will not detect it automatically (good!), instead load the virtio-mem >>> driver and let it add memory back to the system. >>> >>> I should probably play with kexec and virtio-mem once I have some spare >>> cycles ... to find out what's broken and needs to be addressed :) >> >> FWIW, I just gave virtio-mem and kexec/kdump a try. >> >> a) kdump seems to work. Memory added by virtio-mem is getting dumped. >> The kexec kernel only uses memory in the crash region. The virtio-mem >> driver properly bails out due to is_kdump_kernel(). > > Right, kdump is not impacted later added memory. > >> >> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem >> to get placed on virtio-mem memory (pure luck due to the left-to-right >> search). 
Memory added by virtio-mem is not getting added to the e820 >> map. Once the virtio-mem driver comes back up in the kexec kernel, the >> right memory is readded. > > kexec_file_load just behaves as you tested. It doesn't collect later > added memory to e820 because it uses e820_table_kexec directly to pass > e820 to kexec-ed kernel. However, this e820_table_kexec is only updated > during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel > doesn't have it in e820 during bootup, but it's recoginized and added > when ACPI scanning. I think we should update e820_table_kexec when hot > add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, > balloon will need be added into e820_table_kexec too, and if this is > expected behaviour. > > But whatever we do, it won't impact the kexec file_loading, because of > the searching strategy bottom up. Just adding them into e820_table_kexec > will make it consistent with cold reboot which get recognizes and get > them into e820 during bootup. Yeah, I think whatever a cold-booted kernel will see is what kexec-ed kernel should see. Not more, not less. Regarding virtio-mem: Not in e820 on cold-boot. Regarding DIMMs: DIMMs under KVM will never show up in the e820 map IIRC. I think on real HW it can be different. -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-22 9:24 ` David Hildenbrand @ 2020-04-22 9:57 ` Baoquan He 2020-04-22 10:05 ` David Hildenbrand 0 siblings, 1 reply; 61+ messages in thread From: Baoquan He @ 2020-04-22 9:57 UTC (permalink / raw) To: David Hildenbrand Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 04/22/20 at 11:24am, David Hildenbrand wrote: > On 22.04.20 11:17, Baoquan He wrote: > > On 04/21/20 at 03:29pm, David Hildenbrand wrote: > >>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't > >>>> pass the efi, it won't get the SRAT table correctly, if I remember > >>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with > >>>> ACPI only, this won't happen on bare metal though. Need check carefully. > >>>> I have been using kvm guest with uefi firmwire recently. > >>> > >>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. > >>> > >>> I'm also asking because of virtio-mem. Memory added via virtio-mem is > >>> not part of any efi tables or whatsoever. So I assume the kexec kernel > >>> will not detect it automatically (good!), instead load the virtio-mem > >>> driver and let it add memory back to the system. > >>> > >>> I should probably play with kexec and virtio-mem once I have some spare > >>> cycles ... to find out what's broken and needs to be addressed :) > >> > >> FWIW, I just gave virtio-mem and kexec/kdump a try. > >> > >> a) kdump seems to work. Memory added by virtio-mem is getting dumped. > >> The kexec kernel only uses memory in the crash region. The virtio-mem > >> driver properly bails out due to is_kdump_kernel(). > > > > Right, kdump is not impacted later added memory. > > > >> > >> b) "kexec -s -l" seems to work fine. 
For now, the kernel does not seem > >> to get placed on virtio-mem memory (pure luck due to the left-to-right > >> search). Memory added by virtio-mem is not getting added to the e820 > >> map. Once the virtio-mem driver comes back up in the kexec kernel, the > >> right memory is readded. > > > > kexec_file_load just behaves as you tested. It doesn't collect later > > added memory to e820 because it uses e820_table_kexec directly to pass > > e820 to kexec-ed kernel. However, this e820_table_kexec is only updated > > during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel > > doesn't have it in e820 during bootup, but it's recoginized and added > > when ACPI scanning. I think we should update e820_table_kexec when hot > > add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, > > balloon will need be added into e820_table_kexec too, and if this is > > expected behaviour. > > > > But whatever we do, it won't impact the kexec file_loading, because of > > the searching strategy bottom up. Just adding them into e820_table_kexec > > will make it consistent with cold reboot which get recognizes and get > > them into e820 during bootup. > > Yeah, I think whatever a cold-booted kernel will see is what kexec-ed > kernel should see. Not more, not less. > > Regarding virtio-mem: Not in e820 on cold-boot. > Regarding DIMMs: DIMMs under KVM will never show up in the e820 map > IIRC. I think on real HW it can be different. Yeah, DIMMs under KVM won't show up in e820 map. While this is not feature of QEMU/KVM, but a defect of it. I ever asked Igor who is developer of QEMU/KVM guest in this area, why we don't make kvm guest recognize hotpluggable DIMM and add it into e820 map, he said he had tried to make it, but this will corrupt guest on HyperV. So he had to revert the commit on qemu. So I think we can leave it for now for both real HW and kvm, or update the e820_table_kexec to include added DIMM for both real HW and KVM. 
I hope one day KVM dev will find a way to conquer the defect on HyperV and make the e820map consistent with bare metal. After all, kvm guest is trying to imitate real HW for the most part. Anyway, I will think about the e820_table_kexec updating. See if we can do something about it. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-22 9:57 ` Baoquan He @ 2020-04-22 10:05 ` David Hildenbrand 2020-04-22 10:36 ` Baoquan He 0 siblings, 1 reply; 61+ messages in thread From: David Hildenbrand @ 2020-04-22 10:05 UTC (permalink / raw) To: Baoquan He Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 22.04.20 11:57, Baoquan He wrote: > On 04/22/20 at 11:24am, David Hildenbrand wrote: >> On 22.04.20 11:17, Baoquan He wrote: >>> On 04/21/20 at 03:29pm, David Hildenbrand wrote: >>>>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't >>>>>> pass the efi, it won't get the SRAT table correctly, if I remember >>>>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with >>>>>> ACPI only, this won't happen on bare metal though. Need check carefully. >>>>>> I have been using kvm guest with uefi firmwire recently. >>>>> >>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. >>>>> >>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is >>>>> not part of any efi tables or whatsoever. So I assume the kexec kernel >>>>> will not detect it automatically (good!), instead load the virtio-mem >>>>> driver and let it add memory back to the system. >>>>> >>>>> I should probably play with kexec and virtio-mem once I have some spare >>>>> cycles ... to find out what's broken and needs to be addressed :) >>>> >>>> FWIW, I just gave virtio-mem and kexec/kdump a try. >>>> >>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped. >>>> The kexec kernel only uses memory in the crash region. The virtio-mem >>>> driver properly bails out due to is_kdump_kernel(). >>> >>> Right, kdump is not impacted later added memory. >>> >>>> >>>> b) "kexec -s -l" seems to work fine. 
For now, the kernel does not seem >>>> to get placed on virtio-mem memory (pure luck due to the left-to-right >>>> search). Memory added by virtio-mem is not getting added to the e820 >>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the >>>> right memory is readded. >>> >>> kexec_file_load just behaves as you tested. It doesn't collect later >>> added memory to e820 because it uses e820_table_kexec directly to pass >>> e820 to kexec-ed kernel. However, this e820_table_kexec is only updated >>> during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel >>> doesn't have it in e820 during bootup, but it's recoginized and added >>> when ACPI scanning. I think we should update e820_table_kexec when hot >>> add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, >>> balloon will need be added into e820_table_kexec too, and if this is >>> expected behaviour. >>> >>> But whatever we do, it won't impact the kexec file_loading, because of >>> the searching strategy bottom up. Just adding them into e820_table_kexec >>> will make it consistent with cold reboot which get recognizes and get >>> them into e820 during bootup. >> >> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed >> kernel should see. Not more, not less. >> >> Regarding virtio-mem: Not in e820 on cold-boot. >> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map >> IIRC. I think on real HW it can be different. > > Yeah, DIMMs under KVM won't show up in e820 map. While this is not feature > of QEMU/KVM, but a defect of it. I ever asked Igor who is developer of > QEMU/KVM guest in this area, why we don't make kvm guest recognize > hotpluggable DIMM and add it into e820 map, he said he had tried to make > it, but this will corrupt guest on HyperV. So he had to revert the Yeah, I remember that this had to be reverted due to something breaking. 
But OTOH, it allows us to online coldplugged DIMMs online_movable easily, so I'd say it's even a feature (although it does not behave like the real HW we have). I use this extensively when testing memory hot(un)plug via coldplugged DIMMs. I do wonder if there is real HW where this is also the case. > commit on qemu. So I think we can leave it for now for both real HW and > kvm, or update the e820_table_kexec to include added DIMM for both real > HW and KVM. I hope one day KVM dev will find a way to conquer the defect > on HyperV and make the e820map consistent with bare metal. After all, > kvm guest is trying to imitate real HW for the most part. > > Anyway, I will think about the e820_table_kexec updating. See if we can > do something about it. Yeah, for DIMMs on real HW it might definitely make sense. We might be able to hook into updates of /sys/firmware/memmap on memory add/remove. -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-22 10:05 ` David Hildenbrand @ 2020-04-22 10:36 ` Baoquan He 0 siblings, 0 replies; 61+ messages in thread From: Baoquan He @ 2020-04-22 10:36 UTC (permalink / raw) To: David Hildenbrand Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 04/22/20 at 12:05pm, David Hildenbrand wrote: > On 22.04.20 11:57, Baoquan He wrote: > > On 04/22/20 at 11:24am, David Hildenbrand wrote: > >> On 22.04.20 11:17, Baoquan He wrote: > >>> On 04/21/20 at 03:29pm, David Hildenbrand wrote: > >>>>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't > >>>>>> pass the efi, it won't get the SRAT table correctly, if I remember > >>>>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with > >>>>>> ACPI only, this won't happen on bare metal though. Need check carefully. > >>>>>> I have been using kvm guest with uefi firmwire recently. > >>>>> > >>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. > >>>>> > >>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is > >>>>> not part of any efi tables or whatsoever. So I assume the kexec kernel > >>>>> will not detect it automatically (good!), instead load the virtio-mem > >>>>> driver and let it add memory back to the system. > >>>>> > >>>>> I should probably play with kexec and virtio-mem once I have some spare > >>>>> cycles ... to find out what's broken and needs to be addressed :) > >>>> > >>>> FWIW, I just gave virtio-mem and kexec/kdump a try. > >>>> > >>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped. > >>>> The kexec kernel only uses memory in the crash region. The virtio-mem > >>>> driver properly bails out due to is_kdump_kernel(). > >>> > >>> Right, kdump is not impacted later added memory. 
> >>> > >>>> > >>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem > >>>> to get placed on virtio-mem memory (pure luck due to the left-to-right > >>>> search). Memory added by virtio-mem is not getting added to the e820 > >>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the > >>>> right memory is readded. > >>> > >>> kexec_file_load just behaves as you tested. It doesn't collect later > >>> added memory to e820 because it uses e820_table_kexec directly to pass > >>> e820 to kexec-ed kernel. However, this e820_table_kexec is only updated > >>> during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel > >>> doesn't have it in e820 during bootup, but it's recoginized and added > >>> when ACPI scanning. I think we should update e820_table_kexec when hot > >>> add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, > >>> balloon will need be added into e820_table_kexec too, and if this is > >>> expected behaviour. > >>> > >>> But whatever we do, it won't impact the kexec file_loading, because of > >>> the searching strategy bottom up. Just adding them into e820_table_kexec > >>> will make it consistent with cold reboot which get recognizes and get > >>> them into e820 during bootup. > >> > >> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed > >> kernel should see. Not more, not less. > >> > >> Regarding virtio-mem: Not in e820 on cold-boot. > >> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map > >> IIRC. I think on real HW it can be different. > > > > Yeah, DIMMs under KVM won't show up in e820 map. While this is not feature > > of QEMU/KVM, but a defect of it. I ever asked Igor who is developer of > > QEMU/KVM guest in this area, why we don't make kvm guest recognize > > hotpluggable DIMM and add it into e820 map, he said he had tried to make > > it, but this will corrupt guest on HyperV. 
So he had to revert the > > Yeah, I remember that this had to be reverted due to something breaking. > But OTOH, it allows us to online coldplugged DIMMs online_movable > easily, so I'd say it's even a feature (although, does not behave like > real HW we have). > > I use this extensively when testing memory hot(un)plug via coldplugged > DIMMs. > > I do wonder if there is real HW, where this is also the case. None that I know of. Hotplug on real HW includes two parts; boot memory being hotpluggable is the more flexible one. It allows people to replace a bad DIMM. You can see that the boot-stage code has been adjusted a lot for this purpose; at that time, people hadn't thought about KVM guests. > > > commit on qemu. So I think we can leave it for now for both real HW and > > kvm, or update the e820_table_kexec to include added DIMM for both real > > HW and KVM. I hope one day KVM dev will find a way to conquer the defect > > on HyperV and make the e820map consistent with bare metal. After all, > > kvm guest is trying to imitate real HW for the most part. > > > > Anyway, I will think about the e820_table_kexec updating. See if we can > > do something about it. > > Yeah, for DIMMs on real HW it might definitely make sense. We might be > able to hook into updates of /sys/firmware/memmap on memory add/remove. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-13 13:15 ` Eric W. Biederman 2020-04-13 23:01 ` Andrew Morton 2020-04-14 6:40 ` Baoquan He @ 2020-04-14 9:16 ` Dave Young 2020-04-14 9:38 ` Dave Young 2 siblings, 1 reply; 61+ messages in thread From: Dave Young @ 2020-04-14 9:16 UTC (permalink / raw) To: Eric W. Biederman Cc: Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On 04/13/20 at 08:15am, Eric W. Biederman wrote: > Baoquan He <bhe@redhat.com> writes: > > > On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > >> > >> The only benefit of kexec_file_load is that it is simple enough from a > >> kernel perspective that signatures can be checked. > > > > We don't have this restriction any more with below commit: > > > > commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG > > and KEXEC_SIG_FORCE") > > > > With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both > > secure boot or legacy system for kexec/kdump. Being simple enough is > > enough to astract and convince us to use it instead. And kexec_file_load > > has been in use for several years on systems with secure boot, since > > added in 2014, on x86_64. > > No. Actaully kexec_file_load is the less capable interface, and less > flexible interface. Which is why it is appropriate for signature > verification. I agreed that the user space design is more flexible, but as for the common use case of loading bzImage (say x86 as an example) the kexec_file_load is good enough. We could have other potential improvement based on kexec_file_load. For example we could use it to do some early kdump loading, eg. try to load an attached kdump kernel immediately once the crashkernel memory get reserved. > > >> kexec_load in every other respect is the more capable and functional > >> interface. 
It makes no sense to get rid of it. We do not remove kexec_load at all, it is indeed helpful in many cases as all agreed. But if we have a bug reported for both we may fix kexec_file_load first because it is usually easier, also do not need to worry about too much about old kernel and new kernel compatibility. For example the recent breakage we found in efi path, kexec_file_load just work after the efi cleanup, but kexec_load depends on the ABI we added, so we must fix it as below: https://lore.kernel.org/linux-efi/20200410135644.GB6772@dhcp-128-65.nay.redhat.com/ > >> > >> It does make sense to reload with a loaded kernel on memory hotplug. > >> That is simple and easy. If we are going to handle something in the > >> kernel it should simple an automated unloading of the kernel on memory > >> hotplug. > >> > >> > >> I think it would be irresponsible to deprecate kexec_load on any > >> platform. > >> > >> I also suspect that kexec_file_load could be taught to copy the dtb > >> on arm32 if someone wants to deal with signatures. > >> > >> We definitely can not even think of deprecating kexec_load until > >> architecture that supports it also supports kexec_file_load and everyone > >> is happy with that interface. That is Linus's no regression rule. > > > > I should pick a milder word to express our tendency and tell our plan > > then 'obsolete'. Even though I added 'gradually', seems it doesn't help > > much. I didn't mean to say 'deprecate' at all when replied. > > > > The situation and trend I understand about kexec_load and kexec_file_load > > are: > > > > 1) Supporting kexec_file_load is suggested to add in ARCHes which don't > > have yet, just as x86_64, arm64 and s390 have done; > > > > 2) kexec_file_load is suggested to use, and take precedence over > > kexec_load in the future, if both are supported in one ARCH. > > The deep problem is that kexec_file_load is distinctly less expressive > than kexec_load. 
> > > 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, > > and by ARCHes for back compatibility w/ kexec_file_load support. > > > > For 1) and 2), I think the reason is obvious as Eric said, > > kexec_file_load is simple enough. And currently, whenever we got a bug > > report, we may need fix them twice, for kexec_load and kexec_file_load. > > If kexec_file_load is made by default, e.g on x86_64, we will change it > > in kernel space only, for kexec_file_load. This is what I meant about > > 'obsolete gradually'. I think for arm64, s390, they will do these too. > > Unless there's some critical/blocker bug in kexec_load, to corrupt the > > old kexec_load interface in old product. > > Maybe. The code that kexec_file_load sucked into the kernel is quite > stable and rarely needs changes except during a port of kexec to > another architecture. > > Last I looked the real maintenance effor of kexec and kexec on panic was > in the drivers. So I don't think we can use maintenance to do anything. > > > For 3), people can still use kexec_load and develop/fix for it, if no > > kexec_file_load supported. But 32-bit arm should be a different one, > > more like i386, we will leave it as is, and fix anything which could > > break it. But people really expects to improve or add feature to it? E.g > > in this patchset, the mem hotplug issue James raised, I assume James is > > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > > another reply, people even don't agree to continue supporting memory > > hotplug on 32-bit system. We ever took effort to fix a memory hotplug > > bug on i386 with a patch, but people would rather set it as BROKEN. > > For memory hotplug just reload. Userspace already gets good events. > > We should not expect anything except a panic kernel to be loaded over a > memory hotplug event. The kexec on panic code should actually be loaded > in a location that we don't reliquish if asked for it. 
> > Quite frankly at this point I would love to see the signature fad die, > which would allow us to remove kexec_file_load. I still have not seen > the signature code used anywhere except by people anticipating trouble. Same here: I also hate Secure Boot, and I do not like the trouble added by signature verification either. But still, we found that beyond the Secure Boot use cases it is also useful in other common cases. And since the kernel now supports lockdown, we have to live with it. Thanks Dave _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 9:16 ` Dave Young @ 2020-04-14 9:38 ` Dave Young 0 siblings, 0 replies; 61+ messages in thread From: Dave Young @ 2020-04-14 9:38 UTC (permalink / raw) To: Eric W. Biederman Cc: Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel > We do not remove kexec_load at all, it is indeed helpful in many cases > as all agreed. But if we have a bug reported for both we may fix > kexec_file_load first because it is usually easier, also do not need to > worry about too much about old kernel and new kernel compatibility. > > For example the recent breakage we found in efi path, kexec_file_load > just work after the efi cleanup, but kexec_load depends on the ABI we > added, so we must fix it as below: > https://lore.kernel.org/linux-efi/20200410135644.GB6772@dhcp-128-65.nay.redhat.com/ Also, we have some specific sysfs files exported for kexec-tools use: /sys/firmware/efi/runtime-map/* and a few other table addresses (fw_vendor, runtime and config_table) under /sys/firmware/efi. These are only used by userspace kexec-tools for kexec_load. The runtime field is now useless because of Ard's cleanup in the efi code, but we have to keep it there because older kexec-tools will need it. In this case kexec_file_load does not need those hacks at all. So in the future, if we have to invent some kernel/userspace ABI only for kexec_load, we should be careful and maybe reject it if there is no strong reason. Thanks Dave _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-10 19:10 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Andrew Morton 2020-04-11 3:44 ` Baoquan He @ 2020-04-14 7:05 ` David Hildenbrand 2020-04-14 16:55 ` James Morse 1 sibling, 1 reply; 61+ messages in thread From: David Hildenbrand @ 2020-04-14 7:05 UTC (permalink / raw) To: Andrew Morton Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Eric Biederman, Will Deacon, linux-arm-kernel On 10.04.20 21:10, Andrew Morton wrote: > It's unclear (to me) what is the status of this patchset. But it does appear that > an new version can be expected? > I'd suggest to unqueue the patches until we have a consensus. While there are a couple of ideas floating around here, my current suggestion would be either 1. Indicate all hotplugged memory as "System RAM (hotplugged)" in /proc/iomem and the firmware memmap (on all architectures). This will require kexec changes, but I would have assumed that kexec has to be updated in lock-step with the kernel just like e.g., makedumpfile. Modify kexec() to not place the kexec kernel on these areas (easy) but still consider them as crash regions to dump. When loading a kexec kernel, validate in the kernel that the memory is appropriate. 2. Make kexec() reload the kernel whenever we e.g., get a udev event for removal of memory in /sys/devices/system/memory/. On every remove_memory(), invalidate the loaded kernel in the kernel. As I mentioned somewhere, 1. will be interesting for virtio-mem, where we don't want any kexec kernel to be placed on virtio-mem-added memory. -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
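Suggestion 1 in the message above could look roughly like this on the user-space side. This is a minimal sketch, not kexec-tools code: the "System RAM (hotplugged)" name is the proposal from the thread rather than an established interface, the function names are hypothetical, and the sample text is made up in the usual /proc/iomem layout.

```python
HOTPLUGGED = "System RAM (hotplugged)"  # name proposed in the thread


def parse_iomem(text):
    """Return (start, end, name) for each top-level entry."""
    entries = []
    for line in text.splitlines():
        if not line or line[0].isspace():
            continue  # skip blank lines and indented child resources
        span, name = line.split(" : ", 1)
        start, end = (int(x, 16) for x in span.split("-"))
        entries.append((start, end, name.strip()))
    return entries


def classify(entries):
    """Split RAM into placement candidates and dump-only regions."""
    # Plain "System RAM" may hold the kexec kernel.
    placement = [(s, e) for s, e, n in entries if n == "System RAM"]
    # Hotplugged RAM is never used for placement, but is still dumped.
    dump_only = [(s, e) for s, e, n in entries if n == HOTPLUGGED]
    return placement, dump_only


SAMPLE = """\
00001000-0009ffff : System RAM
00100000-7fffffff : System RAM
  01000000-01ffffff : Kernel code
100000000-13fffffff : System RAM (hotplugged)
"""

placement, dump_only = classify(parse_iomem(SAMPLE))
print([hex(s) for s, _ in placement], [hex(s) for s, _ in dump_only])
```

Matching on the exact string "System RAM" is deliberate here: an older tool that does the same would simply not see the renamed regions, which is the trade-off discussed in the thread (no accidental placement, but also an incomplete vmcore for unaware kdump).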
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 7:05 ` David Hildenbrand @ 2020-04-14 16:55 ` James Morse 2020-04-14 17:41 ` David Hildenbrand 0 siblings, 1 reply; 61+ messages in thread From: James Morse @ 2020-04-14 16:55 UTC (permalink / raw) To: David Hildenbrand, Andrew Morton Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Eric Biederman, Will Deacon, linux-arm-kernel Hi guys, On 14/04/2020 08:05, David Hildenbrand wrote: > On 10.04.20 21:10, Andrew Morton wrote: >> It's unclear (to me) what is the status of this patchset. But it does appear that >> an new version can be expected? > I'd suggest to unqueue the patches until we have a consensus. Certainly! > While there are a couple of ideas floating around here, my current > suggestion would be either > > 1. Indicate all hotplugged memory as "System RAM (hotplugged)" in > /proc/iomem and the firmware memmap (on all architectures). This will > require kexec changes, > but I would have assume that kexec has to be > updated in lock-step with the kernel News to me: I was using the version I first built when arm64's support was new. I've only had to update it once when we had to change user-space. I don't think debian updates kexec-tools when it updates the kernel. Making changes to /proc/iomem means updating user-space again, (for kdump). I'd like to avoid that if its at all possible. > just like e.g., makedumpfile. > Modify kexec() to not place the kexec kernel on these areas (easy) but > still consider them as crash regions to dump. When loading a kexec > kernel, validate in the kernel that the memory is appropriate. > 2. Make kexec() reload the the kernel whenever we e.g., get a udev event > for removal of memory in /sys/devices/system/memory/. I don't think we can rely on user-space to do something, > On every remove_memory(), invalidate the loaded kernel in the kernel. This is an option, ... but its a change of behaviour. 
If user-space asks for two impossible things, the second request should fail. Having the first-one disappear is a bit spooky... Fortunately user-space checks the 'kexec -l' bit happened before it calls reboot() behind 'kexec -e'. So this works, but is not intuitive. ("Did I load it? What changed and when? oh, half a mile up in dmesg is a message saying the kernel discarded the kexec kernel last wednesday.") > As I mentioned somewhere, 1. will be interesting for virtio-mem, where > we don't want any kexec kernel to be placed on virtio-mem-added memory. Do these virtio-mem-added regions need to be accessible by kdump? (do we already need a user-space change for that?) A third option, along the line of what I posted: Split the 'offline' and 'removed' ideas, which David mentioned somewhere. We'd end up with (yet) another notifier chain, that prevents the memory being removed, but you can still mark it as offline in /sys/. (...I'm not quite sure why you would do that...) This would need hooking up for ACPI (which covers x86 and arm64), and other architectures mechanisms for doing this... arm64 can then switch is arch hook that prevents 'bootmem' being removed to this new notifier chain, as the kernel can only boot from that was present at boot. My preference is 3, then 2. I think 1 is slightly less desirable than a message at kexec time that the memory layout has changed since load, and this might not work... Thanks, James _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 16:55 ` James Morse @ 2020-04-14 17:41 ` David Hildenbrand 0 siblings, 0 replies; 61+ messages in thread From: David Hildenbrand @ 2020-04-14 17:41 UTC (permalink / raw) To: James Morse, Andrew Morton Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Eric Biederman, Will Deacon, linux-arm-kernel >> While there are a couple of ideas floating around here, my current >> suggestion would be either >> >> 1. Indicate all hotplugged memory as "System RAM (hotplugged)" in >> /proc/iomem and the firmware memmap (on all architectures). This will >> require kexec changes, > >> but I would have assume that kexec has to be >> updated in lock-step with the kernel > > News to me: I was using the version I first built when arm64's support was new. I've only > had to update it once when we had to change user-space. > > I don't think debian updates kexec-tools when it updates the kernel. I would assume they are also not pushing the latest-greatest kernel in their current release, after settling on a kexec version, no? I think you can assume new kernels to require new kexec-tools versions to provide all features. > Making changes to /proc/iomem means updating user-space again, (for kdump). I'd like to > avoid that if its at all possible. Yes, it's not desirable, but if all that's not working is a "not all memory will be dumped out of the box", at least I think this is tolerable. It's not like we're completely breaking kexec. Your current arm64 patches require the same change AFAIKS - and I think we already have arm64 hotplug support in Linux distros. As I said, similarly, makedumpfile has to be upgraded with every kernel release to make kdump work as expected. And that is no big news I hope :) >> just like e.g., makedumpfile. >> Modify kexec() to not place the kexec kernel on these areas (easy) but >> still consider them as crash regions to dump. 
When loading a kexec >> kernel, validate in the kernel that the memory is appropriate. > > >> 2. Make kexec() reload the the kernel whenever we e.g., get a udev event >> for removal of memory in /sys/devices/system/memory/. > > I don't think we can rely on user-space to do something, > > >> On every remove_memory(), invalidate the loaded kernel in the kernel. > > This is an option, ... but its a change of behaviour. If user-space asks for two > impossible things, the second request should fail. Having the first-one disappear is a bit > spooky... We are talking about corner cases that are already broken, no? > > Fortunately user-space checks the 'kexec -l' bit happened before it calls reboot() behind > 'kexec -e'. So this works, but is not intuitive. > > ("Did I load it? What changed and when? oh, half a mile up in dmesg is a message saying > the kernel discarded the kexec kernel last wednesday.") > > >> As I mentioned somewhere, 1. will be interesting for virtio-mem, where >> we don't want any kexec kernel to be placed on virtio-mem-added memory. > > Do these virtio-mem-added regions need to be accessible by kdump? > (do we already need a user-space change for that?) Yes, they have to be accessible by kdump. Currently, they are also exported as "System RAM" via /proc/iomem - which is why dumping works e.g., on x86-64 (we'll have to increase the #of memory resources that can be considered in the future, but that's a different story and only applies when adding more than 100GB of memory via virtio-mem or so) But as virtio-mem is fairly new (IOW, about to get queued for integration soonish), I could still change the memory resources to show up differently ("System RAM (hotplugged)", "System RAM (virtio-mem)", etc.) and teach kexec about them. But learning that we are having similar problems on arm64 (and theoretically on Hyper-V), I think it makes sense to discuss a solution that will solve the other issues as well. 
> > > A third option, along the line of what I posted: > > Split the 'offline' and 'removed' ideas, which David mentioned somewhere. We'd end up with > (yet) another notifier chain, that prevents the memory being removed, but you can still I dislike limiting memory unplug - and especially making remove_memory() fail - just because somebody once thought it would be a good place to load - in the future - some kexec binary onto it. > mark it as offline in /sys/. (...I'm not quite sure why you would do that...) > > This would need hooking up for ACPI (which covers x86 and arm64), and other architectures > mechanisms for doing this... > arm64 can then switch is arch hook that prevents 'bootmem' being removed to this new > notifier chain, as the kernel can only boot from that was present at boot. We have two different problems here, right? 1. Don't place kexec binaries on specific memory areas (e.g., arm64, virtio-mem, hyper-v, ...) 2. Figure out what to do when unplugging memory that was selected as a target for kexec binaries. For 1, I have a feeling that /proc/iomem could be the right solution, eventually requiring kexec changes to handle kdump properly (IOW, dump all memory). Indicating all hotplugged memory as "System RAM (hotplugged)" would be the way to go here. For 2, I think we should unload all kexec images in case they overlap with memory to be removed (e.g., remove_memory() notifier, which cannot stop removal, it's only an indication), and make userspace reload kexec via udev events. Also, we have to think about kexec_file_load() to deal with 1. -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  [not found] ` <20200326180730.4754-2-james.morse@arm.com>
  [not found] ` <321e6bf7-e898-7701-dd60-6c25237ff9cd@redhat.com>
@ 2020-04-15 20:33 ` Eric W. Biederman
  2020-04-22 12:28 ` James Morse
  1 sibling, 1 reply; 61+ messages in thread
From: Eric W. Biederman @ 2020-04-15 20:33 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
      Andrew Morton, Will Deacon, linux-arm-kernel

James Morse <james.morse@arm.com> writes:

> An image loaded for kexec is not stored in place, instead its segments
> are scattered through memory, and are re-assembled when needed. In the
> meantime, the target memory may have been removed.
>
> Because mm is not aware that this memory is still in use, it allows it
> to be removed.
>
> Add a memory notifier to prevent the removal of memory regions that
> overlap with a loaded kexec image segment. e.g., when triggered from the
> Qemu console:
> | kexec_core: memory region in use
> | memory memory32: Offline failed.
>
> Signed-off-by: James Morse <james.morse@arm.com>

Given that we are talking about the destination pages for kexec,
not where the loaded kernel is currently stored, the description is
confusing.

Beyond that I think it would be better to simply unload the loaded
kernel at memory hotunplug time. Usually somewhere in the loaded image
is a copy of the memory map at the time the kexec kernel was loaded.
That will invalidate the memory map as well.

All of this should be for a very brief window of a few seconds, as
the loaded kexec image is quite short.

So instead of failing in the notifier, if you could simply unload the
loaded image in the notifier I think that would be simpler and more
robust, while still preventing the loaded image from falling over when
it starts executing.

Eric
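[Editorial sketch] Whether the notifier refuses the offline (as in the patch) or unloads the image (as suggested above), the variant that only reacts to aliasing ranges needs the same test: does the range going away overlap any destination segment of the loaded image? A minimal, self-contained sketch of that test is below. `struct dest_segment` is a hypothetical stand-in whose `mem`/`memsz` fields merely mirror the idea of a segment's destination address and size; it is not the kernel's data structure, and the code assumes `start + size` does not wrap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment's destination range. */
struct dest_segment {
	uint64_t mem;   /* destination physical start address */
	uint64_t memsz; /* destination size in bytes */
};

/* Half-open ranges [start, start + size); empty ranges never overlap.
 * Assumes start + size does not wrap around. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_size,
			   uint64_t b_start, uint64_t b_size)
{
	return a_size && b_size &&
	       a_start < b_start + b_size &&
	       b_start < a_start + a_size;
}

/* True if removing [rm_start, rm_start + rm_size) would take away
 * memory that a loaded image is scheduled to be copied into. */
static bool image_hit_by_removal(const struct dest_segment *segs, size_t nr,
				 uint64_t rm_start, uint64_t rm_size)
{
	for (size_t i = 0; i < nr; i++)
		if (ranges_overlap(segs[i].mem, segs[i].memsz,
				   rm_start, rm_size))
			return true;
	return false;
}
```

Unloading unconditionally, as discussed later in the thread, would skip this check altogether; the sketch only covers the "if it aliases the removed region" variant.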
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-15 20:33 ` Eric W. Biederman
@ 2020-04-22 12:28 ` James Morse
  2020-04-22 15:25 ` Eric W. Biederman
  0 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2020-04-22 12:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
      Andrew Morton, Will Deacon, linux-arm-kernel

Hi Eric,

On 15/04/2020 21:33, Eric W. Biederman wrote:
> James Morse <james.morse@arm.com> writes:
>
>> An image loaded for kexec is not stored in place, instead its segments
>> are scattered through memory, and are re-assembled when needed. In the
>> meantime, the target memory may have been removed.
>>
>> Because mm is not aware that this memory is still in use, it allows it
>> to be removed.
>>
>> Add a memory notifier to prevent the removal of memory regions that
>> overlap with a loaded kexec image segment. e.g., when triggered from the
>> Qemu console:
>> | kexec_core: memory region in use
>> | memory memory32: Offline failed.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>
> Given that we are talking about the destination pages for kexec
> not where the loaded kernel is currently stored the description is
> confusing.

I think David has some better wording to cover this. I thought I had it
with 'scattered and re-assembled'.

> Beyond that I think it would be better to simply unload the loaded
> kernel at memory hotunplug time.

Unconditionally, or if it aliases the removed region?

I don't particularly like it. User-space has asked for two impossible
things, and we are changing the answer to the first when we see the
second. It's a bit spooky. (maybe no one will notice)

> Usually somewhere in the loaded image
> is a copy of the memory map at the time the kexec kernel was loaded.
> That will invalidate the memory map as well.

Ah, unconditionally. Sure, x86 needs this.
(arm64 re-discovers the memory map from firmware tables after kexec)

If that's an acceptable change in behaviour, sure, let's do that.

> All of this should be for a very brief window of a few seconds, as
> the loaded kexec image is quite short.

It seems I'm the outlier anticipating anything could happen between
those syscalls.

Thanks,

James
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-22 12:28 ` James Morse @ 2020-04-22 15:25 ` Eric W. Biederman 2020-04-22 16:40 ` David Hildenbrand 0 siblings, 1 reply; 61+ messages in thread From: Eric W. Biederman @ 2020-04-22 15:25 UTC (permalink / raw) To: James Morse Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Andrew Morton, Will Deacon, linux-arm-kernel James Morse <james.morse@arm.com> writes: > Hi Eric, > > On 15/04/2020 21:33, Eric W. Biederman wrote: >> James Morse <james.morse@arm.com> writes: >> >>> An image loaded for kexec is not stored in place, instead its segments >>> are scattered through memory, and are re-assembled when needed. In the >>> meantime, the target memory may have been removed. >>> >>> Because mm is not aware that this memory is still in use, it allows it >>> to be removed. >>> >>> Add a memory notifier to prevent the removal of memory regions that >>> overlap with a loaded kexec image segment. e.g., when triggered from the >>> Qemu console: >>> | kexec_core: memory region in use >>> | memory memory32: Offline failed. >>> >>> Signed-off-by: James Morse <james.morse@arm.com> >> >> Given that we are talking about the destination pages for kexec >> not where the loaded kernel is currently stored the description is >> confusing. > > I think David has some better wording to cover this. I thought I had it with 'scattered > and re-assembled'. The confusing part was talking about memory being still in use, that is actually scheduled for use in the future. >> Usually somewhere in the loaded image >> is a copy of the memory map at the time the kexec kernel was loaded. >> That will invalidate the memory map as well. > > Ah, unconditionally. Sure, x86 needs this. > (arm64 re-discovers the memory map from firmware tables after kexec) > > If that's an acceptable change in behaviour, sure, lets do that. Yes. 
>> All of this should be for a very brief window of a few seconds, as >> the loaded kexec image is quite short. > > It seems I'm the outlier anticipating anything could happen between > those syscalls. The design is: sys_kexec_load() shutdown scripts sys_reboot(LINUX_REBOOT_CMD_KEXEC); There are two system call simply so that the shutdown scripts can run. Now maybe someone somewhere does something different but that is not expected. Only the kexec on panic kernel is expected to persist somewhat indefinitely. But that should be in memory that is reserved from boot time, and so the memory hotplug should have enough visibility to not allow that memory to be given up. Eric _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
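[Editorial sketch] If the kernel starts discarding a loaded image on hotplug, the gap between the two system calls in the design above is where user-space would want to re-check that an image is still loaded before calling reboot(). A minimal sketch of such a check follows, assuming it is done by reading /sys/kernel/kexec_loaded (which reports "1" when an image is loaded). The helper names are hypothetical, and this is not a copy of what kexec-tools does.

```c
#include <stdbool.h>
#include <stdio.h>

/* Interpret the contents of /sys/kernel/kexec_loaded: "1\n" means an
 * image is loaded, anything else is treated as "not loaded". */
static bool kexec_loaded_from_text(const char *text)
{
	return text && text[0] == '1';
}

/* Returns 1 if an image is loaded, 0 if not, -1 if the file could not
 * be read. The path is a parameter so the helper can be tested. */
static int kexec_image_loaded(const char *path)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return kexec_loaded_from_text(buf) ? 1 : 0;
}
```

A caller would run `kexec_image_loaded("/sys/kernel/kexec_loaded")` after the shutdown scripts and only issue the LINUX_REBOOT_CMD_KEXEC reboot when it returns 1, falling back to a reload or a normal reboot otherwise.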
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-22 15:25 ` Eric W. Biederman @ 2020-04-22 16:40 ` David Hildenbrand 2020-04-23 16:29 ` Eric W. Biederman 2020-05-01 16:55 ` James Morse 0 siblings, 2 replies; 61+ messages in thread From: David Hildenbrand @ 2020-04-22 16:40 UTC (permalink / raw) To: Eric W. Biederman, James Morse Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Andrew Morton, Will Deacon, linux-arm-kernel > The confusing part was talking about memory being still in use, > that is actually scheduled for use in the future. +1 > >>> Usually somewhere in the loaded image >>> is a copy of the memory map at the time the kexec kernel was loaded. >>> That will invalidate the memory map as well. >> >> Ah, unconditionally. Sure, x86 needs this. >> (arm64 re-discovers the memory map from firmware tables after kexec) Does this include hotplugged DIMMs e.g., under KVM? [...] >>> All of this should be for a very brief window of a few seconds, as >>> the loaded kexec image is quite short. >> >> It seems I'm the outlier anticipating anything could happen between >> those syscalls. > > The design is: > sys_kexec_load() > shutdown scripts > sys_reboot(LINUX_REBOOT_CMD_KEXEC); > > There are two system call simply so that the shutdown scripts can run. > Now maybe someone somewhere does something different but that is not > expected. > > Only the kexec on panic kernel is expected to persist somewhat > indefinitely. But that should be in memory that is reserved from boot > time, and so the memory hotplug should have enough visibility to not > allow that memory to be given up. Yes, and AFAIK, memory blocks which hold the reserved crashkernel area can usually not get offlined and, therefore, the memory cannot get removed. 
Interestingly, s390x even has a hotplug notifier for that arch/s390/kernel/setup.c:kdump_mem_notifier() (offlining of memory on s390x can result in memory getting depopulated in the hypervisor, so after it would have been offlined, it would no longer be accessible. I somewhat doubt that this notifier is really needed - all pages in the crashkernel area should look like ordinary allocated pages when the area is reserved early during boot via the memblock allocator, and therefore offlining cannot succeed. But that's a different story - and I suspect this is a leftover from pre-memblock times.) -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-22 16:40 ` David Hildenbrand @ 2020-04-23 16:29 ` Eric W. Biederman 2020-04-24 7:39 ` David Hildenbrand 2020-05-01 16:55 ` James Morse 1 sibling, 1 reply; 61+ messages in thread From: Eric W. Biederman @ 2020-04-23 16:29 UTC (permalink / raw) To: David Hildenbrand Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel David Hildenbrand <david@redhat.com> writes: >> The confusing part was talking about memory being still in use, >> that is actually scheduled for use in the future. > > +1 > >> >>>> Usually somewhere in the loaded image >>>> is a copy of the memory map at the time the kexec kernel was loaded. >>>> That will invalidate the memory map as well. >>> >>> Ah, unconditionally. Sure, x86 needs this. >>> (arm64 re-discovers the memory map from firmware tables after kexec) > > Does this include hotplugged DIMMs e.g., under KVM? > [...] As far as I know. If the memory map changes we need to drop the loaded image. Having thought about it a little more I suspect it would be the other way and just block all hotplug actions after a kexec_load. As all we expect to happen is running shutdown scripts. If blocking the hotplug action uses printk to print a nice message saying something like: "Hotplug blocked because of a loaded kexec image", then people will be able to figure out what is going on and call kexec -u if they haven't started the shutdown scripts yet. Either way it is something simple and unconditional that will make things work. >>>> All of this should be for a very brief window of a few seconds, as >>>> the loaded kexec image is quite short. >>> >>> It seems I'm the outlier anticipating anything could happen between >>> those syscalls. 
>> >> The design is: >> sys_kexec_load() >> shutdown scripts >> sys_reboot(LINUX_REBOOT_CMD_KEXEC); >> >> There are two system call simply so that the shutdown scripts can run. >> Now maybe someone somewhere does something different but that is not >> expected. >> >> Only the kexec on panic kernel is expected to persist somewhat >> indefinitely. But that should be in memory that is reserved from boot >> time, and so the memory hotplug should have enough visibility to not >> allow that memory to be given up. > > Yes, and AFAIK, memory blocks which hold the reserved crashkernel area > can usually not get offlined and, therefore, the memory cannot get removed. > > Interestingly, s390x even has a hotplug notifier for that > > arch/s390/kernel/setup.c:kdump_mem_notifier() > > (offlining of memory on s390x can result in memory getting depopulated > in the hypervisor, so after it would have been offlined, it would no > longer be accessible. I somewhat doubt that this notifier is really > needed - all pages in the crashkernel area should look like ordinary > allocated pages when the area is reserved early during boot via the > memblock allocator, and therefore offlining cannot succeed. But that's a > different story - and I suspect this is a leftover from pre-memblock times.) It might be worth seeing if that is true, or if we need to generalize the s390x code. Eric _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-23 16:29 ` Eric W. Biederman @ 2020-04-24 7:39 ` David Hildenbrand 2020-04-24 7:41 ` David Hildenbrand 0 siblings, 1 reply; 61+ messages in thread From: David Hildenbrand @ 2020-04-24 7:39 UTC (permalink / raw) To: Eric W. Biederman Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On 23.04.20 18:29, Eric W. Biederman wrote: > David Hildenbrand <david@redhat.com> writes: > >>> The confusing part was talking about memory being still in use, >>> that is actually scheduled for use in the future. >> >> +1 >> >>> >>>>> Usually somewhere in the loaded image >>>>> is a copy of the memory map at the time the kexec kernel was loaded. >>>>> That will invalidate the memory map as well. >>>> >>>> Ah, unconditionally. Sure, x86 needs this. >>>> (arm64 re-discovers the memory map from firmware tables after kexec) >> >> Does this include hotplugged DIMMs e.g., under KVM? >> [...] > > As far as I know. If the memory map changes we need to drop the loaded > image. > > > Having thought about it a little more I suspect it would be the > other way and just block all hotplug actions after a kexec_load. > As all we expect to happen is running shutdown scripts. > > If blocking the hotplug action uses printk to print a nice message > saying something like: "Hotplug blocked because of a loaded kexec image", > then people will be able to figure out what is going on and > call kexec -u if they haven't started the shutdown scripts yet. > > > Either way it is something simple and unconditional that will make > things work. > Personally, I consider memory hotplug more important than keeping loaded kexec data alive (just because somebody once decided to do a "kexec -l" and never did a "kexec -e" we should not block any memory hot(un)plug - especially in virtualized environments - for all eternity). 
So IMHO we would invalidate loaded kexec data (not the crashkernel, of course) on memory hot(un)plug and print a warning. In addition, we can let kexec-tools try to reload whatever they loaded after getting notified that something changed. The "something changed" is visible to user space e.g., via udev events for /sys/devices/memory/memoryX/ >>>>> All of this should be for a very brief window of a few seconds, as >>>>> the loaded kexec image is quite short. >>>> >>>> It seems I'm the outlier anticipating anything could happen between >>>> those syscalls. >>> >>> The design is: >>> sys_kexec_load() >>> shutdown scripts >>> sys_reboot(LINUX_REBOOT_CMD_KEXEC); >>> >>> There are two system call simply so that the shutdown scripts can run. >>> Now maybe someone somewhere does something different but that is not >>> expected. >>> >>> Only the kexec on panic kernel is expected to persist somewhat >>> indefinitely. But that should be in memory that is reserved from boot >>> time, and so the memory hotplug should have enough visibility to not >>> allow that memory to be given up. >> >> Yes, and AFAIK, memory blocks which hold the reserved crashkernel area >> can usually not get offlined and, therefore, the memory cannot get removed. >> >> Interestingly, s390x even has a hotplug notifier for that >> >> arch/s390/kernel/setup.c:kdump_mem_notifier() >> >> (offlining of memory on s390x can result in memory getting depopulated >> in the hypervisor, so after it would have been offlined, it would no >> longer be accessible. I somewhat doubt that this notifier is really >> needed - all pages in the crashkernel area should look like ordinary >> allocated pages when the area is reserved early during boot via the >> memblock allocator, and therefore offlining cannot succeed. But that's a >> different story - and I suspect this is a leftover from pre-memblock times.) > > It might be worth seeing if that is true, or if we need to generalize the > s390x code. 
I'll try to find some time to test if the s390x handler is still relevant.

-- 
Thanks,

David / dhildenb
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-24 7:39 ` David Hildenbrand @ 2020-04-24 7:41 ` David Hildenbrand 0 siblings, 0 replies; 61+ messages in thread From: David Hildenbrand @ 2020-04-24 7:41 UTC (permalink / raw) To: Eric W. Biederman Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On 24.04.20 09:39, David Hildenbrand wrote: > On 23.04.20 18:29, Eric W. Biederman wrote: >> David Hildenbrand <david@redhat.com> writes: >> >>>> The confusing part was talking about memory being still in use, >>>> that is actually scheduled for use in the future. >>> >>> +1 >>> >>>> >>>>>> Usually somewhere in the loaded image >>>>>> is a copy of the memory map at the time the kexec kernel was loaded. >>>>>> That will invalidate the memory map as well. >>>>> >>>>> Ah, unconditionally. Sure, x86 needs this. >>>>> (arm64 re-discovers the memory map from firmware tables after kexec) >>> >>> Does this include hotplugged DIMMs e.g., under KVM? >>> [...] >> >> As far as I know. If the memory map changes we need to drop the loaded >> image. >> >> >> Having thought about it a little more I suspect it would be the >> other way and just block all hotplug actions after a kexec_load. >> As all we expect to happen is running shutdown scripts. >> >> If blocking the hotplug action uses printk to print a nice message >> saying something like: "Hotplug blocked because of a loaded kexec image", >> then people will be able to figure out what is going on and >> call kexec -u if they haven't started the shutdown scripts yet. >> >> >> Either way it is something simple and unconditional that will make >> things work. 
>> > > Personally, I consider memory hotplug more important than keeping loaded > kexec data alive (just because somebody once decided to do a "kexec -l" > and never did a "kexec -e" we should not block any memory hot(un)plug - > especially in virtualized environments - for all eternity). > > So IMHO we would invalidate loaded kexec data (not the crashkernel, of > course) on memory hot(un)plug and print a warning. In addition, we can > let kexec-tools try to reload whatever they loaded after getting > notified that something changed. > > The "something changed" is visible to user space e.g., via udev events > for /sys/devices/memory/memoryX/ /sys/devices/system/memory/memoryX/ ... -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 61+ messages in thread
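[Editorial sketch] For the "reload after something changed" idea, a user-space listener first has to recognise which uevents concern memory blocks. The sketch below matches a uevent's DEVPATH against the corrected path from the mail above (as it would appear without the /sys prefix) and decides whether a reload is worth attempting. The function names and the choice of actions that trigger a reload are assumptions made for illustration, not an existing kexec-tools or udev interface.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return the block number for a DEVPATH such as
 * "/devices/system/memory/memory32", or -1 if it is not a memory block. */
static long memory_block_from_devpath(const char *devpath)
{
	static const char prefix[] = "/devices/system/memory/memory";
	size_t plen = sizeof(prefix) - 1;
	const char *p;
	char *end;
	long id;

	if (!devpath || strncmp(devpath, prefix, plen) != 0)
		return -1;

	p = devpath + plen;
	if (!isdigit((unsigned char)*p))
		return -1;

	errno = 0;
	id = strtol(p, &end, 10);
	if (errno || *end != '\0')
		return -1;
	return id;
}

/* Assumed policy: any add/remove/online/offline of a memory block may
 * have changed the memory map, so the loaded image should be redone. */
static bool should_reload_kexec(const char *action, const char *devpath)
{
	if (!action || memory_block_from_devpath(devpath) < 0)
		return false;

	return !strcmp(action, "add") || !strcmp(action, "remove") ||
	       !strcmp(action, "online") || !strcmp(action, "offline");
}
```

As James points out earlier in the thread, nothing guarantees such a listener is installed, so this can only complement whatever the kernel does on its own.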
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-22 16:40 ` David Hildenbrand 2020-04-23 16:29 ` Eric W. Biederman @ 2020-05-01 16:55 ` James Morse 1 sibling, 0 replies; 61+ messages in thread From: James Morse @ 2020-05-01 16:55 UTC (permalink / raw) To: David Hildenbrand, Eric W. Biederman Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Andrew Morton, Will Deacon, linux-arm-kernel Hi guys, On 22/04/2020 17:40, David Hildenbrand wrote: >>>> Usually somewhere in the loaded image >>>> is a copy of the memory map at the time the kexec kernel was loaded. >>>> That will invalidate the memory map as well. >>> >>> Ah, unconditionally. Sure, x86 needs this. >>> (arm64 re-discovers the memory map from firmware tables after kexec) > Does this include hotplugged DIMMs e.g., under KVM? If you advertise hotplugged memory to the guest using ACPI, yes. We don't have a practical mechanism to pass 'fact's about the platform between kernels, instead we rely on those facts being discoverable, or described by firmware. >>>> All of this should be for a very brief window of a few seconds, as >>>> the loaded kexec image is quite short. >>> >>> It seems I'm the outlier anticipating anything could happen between >>> those syscalls. >> >> The design is: >> sys_kexec_load() >> shutdown scripts >> sys_reboot(LINUX_REBOOT_CMD_KEXEC); >> >> There are two system call simply so that the shutdown scripts can run. >> Now maybe someone somewhere does something different but that is not >> expected. [...] > Yes, and AFAIK, memory blocks which hold the reserved crashkernel area > can usually not get offlined and, therefore, the memory cannot get removed. The crashkernel area on arm64 will always land in un-removable memory. We set PG_Reserved on it too. 
Thanks,

James
[parent not found: <20200326180730.4754-3-james.morse@arm.com>]
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  [not found] ` <20200326180730.4754-3-james.morse@arm.com>
@ 2020-04-02 5:49 ` Dave Young
  2020-04-02 6:12 ` piliu
  2020-04-15 20:36 ` Eric W. Biederman
  2020-05-09 0:45 ` Andrew Morton
  2 siblings, 1 reply; 61+ messages in thread
From: Dave Young @ 2020-04-02 5:49 UTC (permalink / raw)
  To: James Morse
  Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
      linux-mm, Eric Biederman, Hari Bathini, Andrew Morton, Will Deacon,
      linux-arm-kernel

On 03/26/20 at 06:07pm, James Morse wrote:
> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
>
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
>
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.

Could arm64 use a similar way to update the DT, or a cooked UEFI map?

Pingfan said ppc64 updates the DT after a memory remove, so it would be
good to just redo a kexec load.

Added Pingfan and Hari in Cc for comments and corrections.

>
> Allow an architecture to specify a different name for these hotplug
> regions.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  mm/memory_hotplug.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a54ffac8c68..69b03dd7fc74 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -42,6 +42,10 @@
>  #include "internal.h"
>  #include "shuffle.h"
>
> +#ifndef MEMORY_HOTPLUG_RES_NAME
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
> +#endif
> +
>  /*
>   * online_page_callback contains pointer to current page onlining function.
>   * Initially it is generic_online_page(). If it is required it could be
> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  {
>  	struct resource *res;
>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> -	char *resource_name = "System RAM";
> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>
>  	if (start + size > max_mem_size)
>  		return ERR_PTR(-E2BIG);
> --
> 2.25.1
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-04-02  5:49 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names Dave Young
@ 2020-04-02  6:12 ` piliu
  2020-04-14 17:21   ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: piliu @ 2020-04-02  6:12 UTC
  To: Dave Young, James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
	Eric Biederman, Hari Bathini, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 04/02/2020 01:49 PM, Dave Young wrote:
> On 03/26/20 at 06:07pm, James Morse wrote:
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>
> Could arm64 use similar way to update DT, or a cooked UEFI maps?
>
> Add pingfan in cc, he said ppc64 update the DT after a memremove thus it
> would be good to just redo a kexec load.
>
Yes, the memory changes will be observed through device-node under
/proc/device-tree/ (which is for powerpc).

Later if running kexec -l/-p , it can build new dtb with the latest info
from /proc/device-tree

> Added Pingfan and Hari for comments and corrections.
>
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>>  mm/memory_hotplug.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 0a54ffac8c68..69b03dd7fc74 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -42,6 +42,10 @@
>>  #include "internal.h"
>>  #include "shuffle.h"
>>
>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>> +#endif
>> +
>>  /*
>>   * online_page_callback contains pointer to current page onlining function.
>>   * Initially it is generic_online_page(). If it is required it could be
>> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>>  {
>>  	struct resource *res;
>>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> -	char *resource_name = "System RAM";
>> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>>
>>  	if (start + size > max_mem_size)
>>  		return ERR_PTR(-E2BIG);
>> --
>> 2.25.1
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-04-02  6:12 ` piliu
@ 2020-04-14 17:21 ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2020-04-14 17:21 UTC
  To: piliu
  Cc: Will Deacon, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	kexec, linux-mm, Eric Biederman, Hari Bathini, Andrew Morton,
	Dave Young, linux-arm-kernel

Hi Dave, Pingfan,

On 02/04/2020 07:12, piliu wrote:
> On 04/02/2020 01:49 PM, Dave Young wrote:
>> On 03/26/20 at 06:07pm, James Morse wrote:
>>> Memory added to the system by hotplug has a 'System RAM' resource created
>>> for it. This is exposed to user-space via /proc/iomem.
>>>
>>> This poses problems for kexec on arm64. If kexec decides to place the
>>> kernel in one of these newly onlined regions, the new kernel will find
>>> itself booting from a region not described as memory in the firmware
>>> tables.
>>>
>>> Arm64 doesn't have a structure like the e820 memory map that can be
>>> re-written when memory is brought online. Instead arm64 uses the UEFI
>>> memory map, or the memory node from the DT, sometimes both. We never
>>> rewrite these.
>>
>> Could arm64 use similar way to update DT, or a cooked UEFI maps?

>> Add pingfan in cc, he said ppc64 update the DT after a memremove thus it
>> would be good to just redo a kexec load.

> Yes, the memory changes will be observed through device-node under
> /proc/device-tree/ (which is for powerpc).
>
> Later if running kexec -l/-p , it can build new dtb with the latest info
> from /proc/device-tree

For arm64, the device-tree is set in stone. We don't have the runtime parts of
open-firmware that powerpc does. (my knowledge in this area is extremely sparse)

arm64 platforms where stuff like this changes tend to use ACPI instead, and
these all have to boot with UEFI, which means it's the UEFI memory map that has
authority.

We don't cook a fake UEFI memory map when things change because we treat it
like the set-in-stone DT. This means we only have discrepancies in firmware to
work around, instead of any we introduce ourselves.

One of the UEFI configuration tables describes addresses Linux programmed into
hardware that can't be reset. Newer versions of Linux know how to pick these up
on kexec... but older versions don't know how to parse/rewrite/move that table.
Cooking up new versions of these tables would prevent us doing stuff like this,
which we need to work around hardware that didn't get the 'kexec exists' memo.

Thanks,

James
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
       [not found] ` <20200326180730.4754-3-james.morse@arm.com>
  2020-04-02  5:49 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names Dave Young
@ 2020-04-15 20:36 ` Eric W. Biederman
  2020-04-22 12:14   ` James Morse
  2020-05-09  0:45 ` Andrew Morton
  2 siblings, 1 reply; 61+ messages in thread
From: Eric W. Biederman @ 2020-04-15 20:36 UTC
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
	Andrew Morton, Will Deacon, linux-arm-kernel

James Morse <james.morse@arm.com> writes:

> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
>
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
>
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.
>
> Allow an architecture to specify a different name for these hotplug
> regions.

Gah. No.

Please find a way to pass the current memory map to the loaded kexec'd
kernel.

Starting a kernel with no way for it to know what the current memory map
is just plain scary.

Eric
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-04-15 20:36 ` Eric W. Biederman
@ 2020-04-22 12:14 ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2020-04-22 12:14 UTC
  To: Eric W. Biederman
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
	Andrew Morton, Will Deacon, linux-arm-kernel

Hi Eric,

On 15/04/2020 21:36, Eric W. Biederman wrote:
> James Morse <james.morse@arm.com> writes:
>
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.
>
> Gah. No.
>
> Please find a way to pass the current memory map to the loaded kexec'd
> kernel.

> Starting a kernel with no way for it to know what the current memory map
> is just plain scary.

We have one. Firmware tables are the source of all this information. We don't
tamper with them.

Firmware describes memory present at boot in the UEFI memory map or DT.
On systems with ACPI, regions that were added after booting are discovered by
running AML methods. (for which we need to allocate memory, so you can't
describe boot memory like this)

This doesn't work if you kexec from a hot-added region. You've booted from
memory that wasn't present at boot. I don't think this is fixable with the set
of constraints.

Thanks,

James
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
       [not found] ` <20200326180730.4754-3-james.morse@arm.com>
  2020-04-02  5:49 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names Dave Young
  2020-04-15 20:36 ` Eric W. Biederman
@ 2020-05-09  0:45 ` Andrew Morton
  2020-05-11  8:35   ` David Hildenbrand
  2 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2020-05-09  0:45 UTC
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
	Eric Biederman, Will Deacon, linux-arm-kernel

On Thu, 26 Mar 2020 18:07:29 +0000 James Morse <james.morse@arm.com> wrote:

> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
>
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
>
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.
>
> Allow an architecture to specify a different name for these hotplug
> regions.
>
> ...
>
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -42,6 +42,10 @@
>  #include "internal.h"
>  #include "shuffle.h"
>
> +#ifndef MEMORY_HOTPLUG_RES_NAME
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
> +#endif
> +
>  /*
>   * online_page_callback contains pointer to current page onlining function.
>   * Initially it is generic_online_page(). If it is required it could be
> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  {
>  	struct resource *res;
>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> -	char *resource_name = "System RAM";
> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>
>  	if (start + size > max_mem_size)
>  		return ERR_PTR(-E2BIG);

I suppose we should do this as well:

--- a/mm/memory_hotplug.c~mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix
+++ a/mm/memory_hotplug.c
@@ -129,7 +129,8 @@ static struct resource *register_memory_
 				   resource_name, flags);

 	if (!res) {
-		pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n",
+		pr_debug("Unable to reserve " MEMORY_HOTPLUG_RES_NAME
+			 " region: %016llx->%016llx\n",
 				start, start + size);
 		return ERR_PTR(-EEXIST);
 	}

It assumes that MEMORY_HOTPLUG_RES_NAME will be a literal string, which
is the case in [3/3].
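
The follow-up fix above depends on C joining adjacent string literals at compile time, which is why it only works while MEMORY_HOTPLUG_RES_NAME expands to a literal. A minimal user-space sketch of that behaviour follows. It is not kernel code: snprintf() stands in for pr_debug(), and the helper name format_reserve_error() is hypothetical, introduced only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Same fallback as the patch: an architecture may define this earlier. */
#ifndef MEMORY_HOTPLUG_RES_NAME
#define MEMORY_HOTPLUG_RES_NAME "System RAM"
#endif

/*
 * Hypothetical stand-in for the pr_debug() call in the fix.
 * The three adjacent string literals are concatenated by the compiler
 * into a single format string. If MEMORY_HOTPLUG_RES_NAME were a
 * variable or a function call, this would not compile.
 */
static int format_reserve_error(char *buf, size_t len,
				unsigned long long start,
				unsigned long long size)
{
	return snprintf(buf, len,
			"Unable to reserve " MEMORY_HOTPLUG_RES_NAME
			" region: %016llx->%016llx\n",
			start, start + size);
}
```

If the name ever had to be chosen at run time, the message would instead need a "%s" conversion with the name passed as an argument.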
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-05-09  0:45 ` Andrew Morton
@ 2020-05-11  8:35 ` David Hildenbrand
  0 siblings, 0 replies; 61+ messages in thread
From: David Hildenbrand @ 2020-05-11  8:35 UTC
  To: Andrew Morton, James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
	Eric Biederman, Will Deacon, linux-arm-kernel

On 09.05.20 02:45, Andrew Morton wrote:
> On Thu, 26 Mar 2020 18:07:29 +0000 James Morse <james.morse@arm.com> wrote:
>
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.
>>
>> ...
>>
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -42,6 +42,10 @@
>>  #include "internal.h"
>>  #include "shuffle.h"
>>
>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>> +#endif
>> +
>>  /*
>>   * online_page_callback contains pointer to current page onlining function.
>>   * Initially it is generic_online_page(). If it is required it could be
>> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>>  {
>>  	struct resource *res;
>>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> -	char *resource_name = "System RAM";
>> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>>
>>  	if (start + size > max_mem_size)
>>  		return ERR_PTR(-E2BIG);
>
> I suppose we should do this as well:
>
> --- a/mm/memory_hotplug.c~mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix
> +++ a/mm/memory_hotplug.c
> @@ -129,7 +129,8 @@ static struct resource *register_memory_
>  				   resource_name, flags);
>
>  	if (!res) {
> -		pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n",
> +		pr_debug("Unable to reserve " MEMORY_HOTPLUG_RES_NAME
> +			 " region: %016llx->%016llx\n",
>  				start, start + size);
>  		return ERR_PTR(-EEXIST);
>  	}
>
> It assumes that MEMORY_HOTPLUG_RES_NAME will be a literal string, which
> is the case in [3/3].

@Andrew, as discussed in this thread already [1], I suggest to drop this
series from -mm tree for now.

[1] https://lkml.kernel.org/r/2e3419b2-d00c-51c3-9b45-9de114608cdf@arm.com

--
Thanks,

David / dhildenb
[parent not found: <20200326180730.4754-4-james.morse@arm.com>]
* Re: [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name
       [not found] ` <20200326180730.4754-4-james.morse@arm.com>
@ 2020-04-15 20:37 ` Eric W. Biederman
  2020-04-22 12:14   ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Eric W. Biederman @ 2020-04-15 20:37 UTC
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
	Andrew Morton, Will Deacon, linux-arm-kernel

James Morse <james.morse@arm.com> writes:

> If kexec chooses to place the kernel in a memory region that was
> added after boot, we fail to boot as the kernel is running from a
> location that is not described as memory by the UEFI memory map or
> the original DT.
>
> To prevent unaware user-space kexec from doing this accidentally,
> give these regions a different name.

Please fix the problem and don't hack around it.

Eric
* Re: [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name
  2020-04-15 20:37 ` [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name Eric W. Biederman
@ 2020-04-22 12:14 ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2020-04-22 12:14 UTC
  To: Eric W. Biederman
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
	Andrew Morton, Will Deacon, linux-arm-kernel

Hi Eric,

On 15/04/2020 21:37, Eric W. Biederman wrote:
> James Morse <james.morse@arm.com> writes:
>
>> If kexec chooses to place the kernel in a memory region that was
>> added after boot, we fail to boot as the kernel is running from a
>> location that is not described as memory by the UEFI memory map or
>> the original DT.
>>
>> To prevent unaware user-space kexec from doing this accidentally,
>> give these regions a different name.
>
> Please fix the problem and don't hack around it.

The problem is firmware didn't describe memory that wasn't present at boot.
arm64 relies on the firmware description of memory well before it can go
poking around in ACPI to find out where extra memory was added to the system.

We already need kexec to not overwrite in-memory structures left by firmware.
(like, the memory map). We do this by naming them reserved in /proc/iomem.

Doing the same for hotadded memory means existing kexec user-space can't do
this accidentally. The shape of /proc/iomem is the only trick in the book for
arm64's kexec userspace, as it's the only thing it looks at.

Thanks,

James
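
To make the /proc/iomem argument above concrete, here is a minimal sketch of the kind of filter an unaware user-space loader might apply. It assumes the loader accepts only top-level entries named exactly "System RAM". The function name parse_system_ram() and the region name "Hotplug RAM (example)" used when exercising it are hypothetical; this is not the kexec-tools implementation, and the name actually chosen by patch 3/3 is not shown in this part of the thread.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 and fills *start and *end if 'line' looks like a top-level
 * /proc/iomem entry ("<start>-<end> : <name>") whose name is exactly
 * "System RAM". Returns 0 for anything else, including indented child
 * resources and regions that carry a different name.
 */
static int parse_system_ram(const char *line,
			    unsigned long long *start,
			    unsigned long long *end)
{
	static const char wanted[] = "System RAM";
	const char *name;
	size_t name_len;
	int consumed = 0;

	/* Child resources are indented in /proc/iomem; skip them. */
	if (line[0] == ' ' || line[0] == '\t')
		return 0;

	/* %n records where the name begins; it stays 0 if " : " is absent. */
	if (sscanf(line, "%llx-%llx : %n", start, end, &consumed) != 2)
		return 0;
	if (consumed == 0)
		return 0;

	name = line + consumed;
	name_len = strcspn(name, "\n");	/* ignore a trailing newline */

	return name_len == strlen(wanted) &&
	       strncmp(name, wanted, name_len) == 0;
}
```

With a filter of this shape, a line such as "40000000-7fffffff : System RAM" is accepted as a placement candidate, while a hot-added region published under any other name is never considered, which is the effect the renaming relies on.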
end of thread, other threads:[~2020-05-11 8:35 UTC | newest]
Thread overview: 61+ messages
[not found] <20200326180730.4754-1-james.morse@arm.com>
[not found] ` <20200330135522.GE6352@MiWiFi-R3L-srv>
[not found] ` <2bdfbb1c-49da-d476-4a38-f91937105ae3@arm.com>
[not found] ` <20200331034612.GB83248@dhcp-128-65.nay.redhat.com>
2020-04-14 17:31 ` [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
2020-04-15 20:29 ` Eric W. Biederman
2020-04-22 12:14 ` James Morse
2020-04-22 13:04 ` Eric W. Biederman
2020-04-22 15:40 ` James Morse
[not found] ` <20200326180730.4754-2-james.morse@arm.com>
[not found] ` <321e6bf7-e898-7701-dd60-6c25237ff9cd@redhat.com>
[not found] ` <a21d90ea-2566-a2bc-ad2f-6464a416c97f@arm.com>
[not found] ` <9cb4ea0d-34c3-de42-4b3f-ee25a59c4835@redhat.com>
[not found] ` <b0443908-e36f-9bc4-4a8a-4206cb782d4b@arm.com>
[not found] ` <72672e2c-a57a-8df9-0cff-8035cbce7740@redhat.com>
[not found] ` <34274b02-60ba-eb78-eacd-6dc1146ed3cd@arm.com>
[not found] ` <80e4d1d7-f493-3f66-f700-86f18002d692@redhat.com>
[not found] ` <dfacf85f-d79d-8742-7a13-1ac0a67bad04@arm.com>
[not found] ` <ba481c82-c69e-043c-4b66-2d2c7732cf07@redhat.com>
2020-04-10 19:10 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Andrew Morton
2020-04-11 3:44 ` Baoquan He
2020-04-11 9:30 ` Russell King - ARM Linux admin
2020-04-11 9:58 ` David Hildenbrand
2020-04-12 5:35 ` Baoquan He
2020-04-12 8:08 ` Russell King - ARM Linux admin
2020-04-12 19:52 ` Eric W. Biederman
2020-04-12 20:37 ` Bhupesh SHARMA
2020-04-13 2:37 ` Baoquan He
2020-04-13 13:15 ` Eric W. Biederman
2020-04-13 23:01 ` Andrew Morton
2020-04-14 6:13 ` Eric W. Biederman
2020-04-14 6:40 ` Baoquan He
2020-04-14 6:51 ` Baoquan He
2020-04-14 8:00 ` David Hildenbrand
2020-04-14 9:22 ` Baoquan He
2020-04-14 9:37 ` David Hildenbrand
2020-04-14 14:39 ` Baoquan He
2020-04-14 14:49 ` David Hildenbrand
2020-04-15 2:35 ` Baoquan He
2020-04-16 13:31 ` David Hildenbrand
2020-04-16 14:02 ` Baoquan He
2020-04-16 14:09 ` David Hildenbrand
2020-04-16 14:36 ` Baoquan He
2020-04-16 14:47 ` David Hildenbrand
2020-04-21 13:29 ` David Hildenbrand
2020-04-21 13:57 ` David Hildenbrand
2020-04-21 13:59 ` Eric W. Biederman
2020-04-21 14:30 ` David Hildenbrand
2020-04-22 9:17 ` Baoquan He
2020-04-22 9:24 ` David Hildenbrand
2020-04-22 9:57 ` Baoquan He
2020-04-22 10:05 ` David Hildenbrand
2020-04-22 10:36 ` Baoquan He
2020-04-14 9:16 ` Dave Young
2020-04-14 9:38 ` Dave Young
2020-04-14 7:05 ` David Hildenbrand
2020-04-14 16:55 ` James Morse
2020-04-14 17:41 ` David Hildenbrand
2020-04-15 20:33 ` Eric W. Biederman
2020-04-22 12:28 ` James Morse
2020-04-22 15:25 ` Eric W. Biederman
2020-04-22 16:40 ` David Hildenbrand
2020-04-23 16:29 ` Eric W. Biederman
2020-04-24 7:39 ` David Hildenbrand
2020-04-24 7:41 ` David Hildenbrand
2020-05-01 16:55 ` James Morse
[not found] ` <20200326180730.4754-3-james.morse@arm.com>
2020-04-02 5:49 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names Dave Young
2020-04-02 6:12 ` piliu
2020-04-14 17:21 ` James Morse
2020-04-15 20:36 ` Eric W. Biederman
2020-04-22 12:14 ` James Morse
2020-05-09 0:45 ` Andrew Morton
2020-05-11 8:35 ` David Hildenbrand
[not found] ` <20200326180730.4754-4-james.morse@arm.com>
2020-04-15 20:37 ` [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name Eric W. Biederman
2020-04-22 12:14 ` James Morse