From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev,
Wei Yang <richard.weiyang@linux.alibaba.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
Michal Hocko <mhocko@kernel.org>,
Oscar Salvador <osalvador@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Ma Wupeng <mawupeng1@huawei.com>
Subject: [PATCH 5.10 82/89] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
Date: Mon, 19 Jun 2023 12:31:10 +0200 [thread overview]
Message-ID: <20230619102142.020097694@linuxfoundation.org> (raw)
In-Reply-To: <20230619102138.279161276@linuxfoundation.org>
From: David Hildenbrand <david@redhat.com>
commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.
virtio-mem soon wants to use offline_and_remove_memory() memory that
exceeds a single Linux memory block (memory_block_size_bytes()). Let's
remove that restriction.
Let's remember the old state and try to restore that if anything goes
wrong. While re-onlining can, in general, fail, it's highly unlikely to
happen (usually only when a notifier fails to allocate memory, and these
are rather rare).
This will be used by virtio-mem to offline+remove memory ranges that are
bigger than a single memory block - for example, with a device block
size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
block size of 128MB.
While we could compress the state into 2 bit, using 8 bit is much
easier.
This handling is similar, but different to acpi_scan_try_to_offline():
a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
optimization is still relevant - it should only apply to ZONE_NORMAL
(where we have no guarantees). If relevant, we can always add it.
b) acpi_scan_try_to_offline() simply onlines all memory in case
something goes wrong. It doesn't restore previous online type. Let's do
that, so we won't overwrite what e.g., user space configured.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/memory_hotplug.c | 105 ++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 89 insertions(+), 16 deletions(-)
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1788,39 +1788,112 @@ int remove_memory(int nid, u64 start, u6
}
EXPORT_SYMBOL_GPL(remove_memory);
+static int try_offline_memory_block(struct memory_block *mem, void *arg)
+{
+ uint8_t online_type = MMOP_ONLINE_KERNEL;
+ uint8_t **online_types = arg;
+ struct page *page;
+ int rc;
+
+ /*
+ * Sense the online_type via the zone of the memory block. Offlining
+ * with multiple zones within one memory block will be rejected
+ * by offlining code ... so we don't care about that.
+ */
+ page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
+ if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
+ online_type = MMOP_ONLINE_MOVABLE;
+
+ rc = device_offline(&mem->dev);
+ /*
+ * Default is MMOP_OFFLINE - change it only if offlining succeeded,
+ * so try_reonline_memory_block() can do the right thing.
+ */
+ if (!rc)
+ **online_types = online_type;
+
+ (*online_types)++;
+ /* Ignore if already offline. */
+ return rc < 0 ? rc : 0;
+}
+
+static int try_reonline_memory_block(struct memory_block *mem, void *arg)
+{
+ uint8_t **online_types = arg;
+ int rc;
+
+ if (**online_types != MMOP_OFFLINE) {
+ mem->online_type = **online_types;
+ rc = device_online(&mem->dev);
+ if (rc < 0)
+ pr_warn("%s: Failed to re-online memory: %d",
+ __func__, rc);
+ }
+
+ /* Continue processing all remaining memory blocks. */
+ (*online_types)++;
+ return 0;
+}
+
/*
- * Try to offline and remove a memory block. Might take a long time to
- * finish in case memory is still in use. Primarily useful for memory devices
- * that logically unplugged all memory (so it's no longer in use) and want to
- * offline + remove the memory block.
+ * Try to offline and remove memory. Might take a long time to finish in case
+ * memory is still in use. Primarily useful for memory devices that logically
+ * unplugged all memory (so it's no longer in use) and want to offline + remove
+ * that memory.
*/
int offline_and_remove_memory(int nid, u64 start, u64 size)
{
- struct memory_block *mem;
- int rc = -EINVAL;
+ const unsigned long mb_count = size / memory_block_size_bytes();
+ uint8_t *online_types, *tmp;
+ int rc;
if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
- size != memory_block_size_bytes())
- return rc;
+ !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
+ return -EINVAL;
+
+ /*
+ * We'll remember the old online type of each memory block, so we can
+ * try to revert whatever we did when offlining one memory block fails
+ * after offlining some others succeeded.
+ */
+ online_types = kmalloc_array(mb_count, sizeof(*online_types),
+ GFP_KERNEL);
+ if (!online_types)
+ return -ENOMEM;
+ /*
+ * Initialize all states to MMOP_OFFLINE, so when we abort processing in
+ * try_offline_memory_block(), we'll skip all unprocessed blocks in
+ * try_reonline_memory_block().
+ */
+ memset(online_types, MMOP_OFFLINE, mb_count);
lock_device_hotplug();
- mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
- if (mem)
- rc = device_offline(&mem->dev);
- /* Ignore if the device is already offline. */
- if (rc > 0)
- rc = 0;
+
+ tmp = online_types;
+ rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
/*
- * In case we succeeded to offline the memory block, remove it.
+ * In case we succeeded to offline all memory, remove it.
* This cannot fail as it cannot get onlined in the meantime.
*/
if (!rc) {
rc = try_remove_memory(nid, start, size);
- WARN_ON_ONCE(rc);
+ if (rc)
+ pr_err("%s: Failed to remove memory: %d", __func__, rc);
+ }
+
+ /*
+ * Rollback what we did. While memory onlining might theoretically fail
+ * (nacked by a notifier), it barely ever happens.
+ */
+ if (rc) {
+ tmp = online_types;
+ walk_memory_blocks(start, size, &tmp,
+ try_reonline_memory_block);
}
unlock_device_hotplug();
+ kfree(online_types);
return rc;
}
EXPORT_SYMBOL_GPL(offline_and_remove_memory);
next prev parent reply other threads:[~2023-06-19 10:57 UTC|newest]
Thread overview: 98+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-19 10:29 [PATCH 5.10 00/89] 5.10.185-rc1 review Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 01/89] lib: cleanup kstrto*() usage Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 02/89] kernel.h: split out kstrtox() and simple_strtox() to a separate header Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 03/89] test_firmware: Use kstrtobool() instead of strtobool() Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 04/89] test_firmware: prevent race conditions by a correct implementation of locking Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 05/89] test_firmware: fix a memory leak with reqs buffer Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 06/89] power: supply: ab8500: Fix external_power_changed race Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 07/89] power: supply: sc27xx: " Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 08/89] power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule() Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 09/89] ARM: dts: vexpress: add missing cache properties Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 10/89] tools: gpio: fix debounce_period_us output of lsgpio Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 11/89] power: supply: Ratelimit no data debug output Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 12/89] platform/x86: asus-wmi: Ignore WMI events with codes 0x7B, 0xC0 Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 13/89] regulator: Fix error checking for debugfs_create_dir Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 14/89] irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 15/89] power: supply: Fix logic checking if system is running from battery Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 16/89] btrfs: scrub: try harder to mark RAID56 block groups read-only Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 17/89] btrfs: handle memory allocation failure in btrfs_csum_one_bio Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 18/89] ASoC: soc-pcm: test if a BE can be prepared Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 19/89] parisc: Improve cache flushing for PCXL in arch_sync_dma_for_cpu() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 20/89] parisc: Flush gatt writes and adjust gatt mask in parisc_agp_mask_memory() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 21/89] MIPS: Alchemy: fix dbdma2 Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 22/89] mips: Move initrd_start check after initrd address sanitisation Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 23/89] ASoC: dwc: move DMA init to snd_soc_dai_driver probe() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 24/89] xen/blkfront: Only check REQ_FUA for writes Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 25/89] drm:amd:amdgpu: Fix missing buffer object unlock in failure path Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 26/89] irqchip/gic: Correctly validate OF quirk descriptors Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 27/89] io_uring: hold uring mutex around poll removal Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 28/89] epoll: ep_autoremove_wake_function should use list_del_init_careful Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 29/89] ocfs2: fix use-after-free when unmounting read-only filesystem Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 30/89] ocfs2: check new file size on fallocate call Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 31/89] nios2: dts: Fix tse_mac "max-frame-size" property Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 32/89] nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 33/89] nilfs2: fix possible out-of-bounds segment allocation in resize ioctl Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 34/89] kexec: support purgatories with .text.hot sections Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 35/89] x86/purgatory: remove PGO flags Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 36/89] powerpc/purgatory: " Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 37/89] nouveau: fix client work fence deletion race Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 38/89] RDMA/uverbs: Restrict usage of privileged QKEYs Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 39/89] net: usb: qmi_wwan: add support for Compal RXM-G1 Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 40/89] ALSA: hda/realtek: Add a quirk for Compaq N14JP6 Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 41/89] Remove DECnet support from kernel Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 42/89] USB: serial: option: add Quectel EM061KGL series Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 43/89] serial: lantiq: add missing interrupt ack Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 44/89] usb: dwc3: gadget: Reset num TRBs before giving back the request Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 45/89] RDMA/rtrs: Fix the last iu->buf leak in err path Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 46/89] spi: fsl-dspi: avoid SCK glitches with continuous transfers Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 47/89] netfilter: nfnetlink: skip error delivery on batch in case of ENOMEM Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 48/89] net: enetc: correct the indexes of highest and 2nd highest TCs Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 49/89] ping6: Fix send to link-local addresses with VRF Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 50/89] net/sched: cls_u32: Fix reference counter leak leading to overflow Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 51/89] RDMA/rxe: Remove the unused variable obj Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 52/89] RDMA/rxe: Removed unused name from rxe_task struct Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 53/89] RDMA/rxe: Fix the use-before-initialization error of resp_pkts Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 54/89] iavf: remove mask from iavf_irq_enable_queues() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 55/89] octeontx2-af: fixed resource availability check Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 56/89] RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 57/89] RDMA/cma: Always set static rate to 0 for RoCE Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 58/89] IB/uverbs: Fix to consider event queue closing also upon non-blocking mode Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 59/89] IB/isert: Fix dead lock in ib_isert Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 60/89] IB/isert: Fix possible list corruption in CMA handler Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 61/89] IB/isert: Fix incorrect release of isert connection Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 62/89] ipvlan: fix bound dev checking for IPv6 l3s mode Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 63/89] sctp: fix an error code in sctp_sf_eat_auth() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 64/89] igb: fix nvm.ops.read() error handling Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 65/89] drm/nouveau: dont detect DSM for non-NVIDIA device Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 66/89] drm/nouveau/dp: check for NULL nv_connector->native_mode Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 67/89] drm/nouveau: add nv_encoder pointer check for NULL Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 68/89] ext4: drop the call to ext4_error() from ext4_get_group_info() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 69/89] net/sched: cls_api: Fix lockup on flushing explicitly created chain Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 70/89] net: lapbether: only support ethernet devices Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 71/89] net: tipc: resize nlattr array to correct size Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 72/89] selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 73/89] afs: Fix vlserver probe RTT handling Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 74/89] cgroup: always put cset in cgroup_css_set_put_fork Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 75/89] rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 76/89] neighbour: Remove unused inline function neigh_key_eq16() Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 77/89] net: Remove unused inline function dst_hold_and_use() Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 78/89] net: Remove DECnet leftovers from flow.h Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 79/89] neighbour: delete neigh_lookup_nodev as not used Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 80/89] batman-adv: Switch to kstrtox.h for kstrtou64 Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 81/89] mmc: block: ensure error propagation for non-blk Greg Kroah-Hartman
2023-06-19 10:31 ` Greg Kroah-Hartman [this message]
2023-06-19 10:31 ` [PATCH 5.10 83/89] nilfs2: reject devices with insufficient block count Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 84/89] media: dvbdev: Fix memleak in dvb_register_device Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 85/89] media: dvbdev: fix error logic at dvb_register_device() Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 86/89] media: dvb-core: Fix use-after-free due to race " Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 87/89] drm/i915/dg1: Wait for pcode/uncore handshake at startup Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 88/89] drm/i915/gen11+: Only load DRAM information from pcode Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 89/89] um: Fix build w/o CONFIG_PM_SLEEP Greg Kroah-Hartman
2023-06-19 13:20 ` [PATCH 5.10 00/89] 5.10.185-rc1 review Florian Fainelli
2023-06-20 9:18 ` Chris Paterson
2023-06-20 10:21 ` Jon Hunter
2023-06-20 11:04 ` Sudip Mukherjee (Codethink)
2023-06-20 14:37 ` Naresh Kamboju
2023-06-20 17:08 ` Allen Pais
2023-06-20 21:04 ` Shuah Khan
2023-06-21 0:38 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230619102142.020097694@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=jasowang@redhat.com \
--cc=mawupeng1@huawei.com \
--cc=mhocko@kernel.org \
--cc=mst@redhat.com \
--cc=osalvador@suse.de \
--cc=pankaj.gupta.linux@gmail.com \
--cc=patches@lists.linux.dev \
--cc=richard.weiyang@linux.alibaba.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.