From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev,
Wei Yang <richard.weiyang@linux.alibaba.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
Michal Hocko <mhocko@kernel.org>,
Oscar Salvador <osalvador@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Ma Wupeng <mawupeng1@huawei.com>
Subject: [PATCH 5.10 82/89] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
Date: Mon, 19 Jun 2023 12:31:10 +0200 [thread overview]
Message-ID: <20230619102142.020097694@linuxfoundation.org> (raw)
In-Reply-To: <20230619102138.279161276@linuxfoundation.org>
From: David Hildenbrand <david@redhat.com>
commit 8dc4bb58a146655eb057247d7c9d19e73928715b upstream.
virtio-mem soon wants to use offline_and_remove_memory() memory that
exceeds a single Linux memory block (memory_block_size_bytes()). Let's
remove that restriction.
Let's remember the old state and try to restore that if anything goes
wrong. While re-onlining can, in general, fail, it's highly unlikely to
happen (usually only when a notifier fails to allocate memory, and these
are rather rare).
This will be used by virtio-mem to offline+remove memory ranges that are
bigger than a single memory block - for example, with a device block
size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
block size of 128MB.
While we could compress the state into 2 bit, using 8 bit is much
easier.
This handling is similar, but different to acpi_scan_try_to_offline():
a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
optimization is still relevant - it should only apply to ZONE_NORMAL
(where we have no guarantees). If relevant, we can always add it.
b) acpi_scan_try_to_offline() simply onlines all memory in case
something goes wrong. It doesn't restore previous online type. Let's do
that, so we won't overwrite what e.g., user space configured.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/memory_hotplug.c | 105 ++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 89 insertions(+), 16 deletions(-)
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1788,39 +1788,112 @@ int remove_memory(int nid, u64 start, u6
}
EXPORT_SYMBOL_GPL(remove_memory);
+static int try_offline_memory_block(struct memory_block *mem, void *arg)
+{
+ uint8_t online_type = MMOP_ONLINE_KERNEL;
+ uint8_t **online_types = arg;
+ struct page *page;
+ int rc;
+
+ /*
+ * Sense the online_type via the zone of the memory block. Offlining
+ * with multiple zones within one memory block will be rejected
+ * by offlining code ... so we don't care about that.
+ */
+ page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
+ if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
+ online_type = MMOP_ONLINE_MOVABLE;
+
+ rc = device_offline(&mem->dev);
+ /*
+ * Default is MMOP_OFFLINE - change it only if offlining succeeded,
+ * so try_reonline_memory_block() can do the right thing.
+ */
+ if (!rc)
+ **online_types = online_type;
+
+ (*online_types)++;
+ /* Ignore if already offline. */
+ return rc < 0 ? rc : 0;
+}
+
+static int try_reonline_memory_block(struct memory_block *mem, void *arg)
+{
+ uint8_t **online_types = arg;
+ int rc;
+
+ if (**online_types != MMOP_OFFLINE) {
+ mem->online_type = **online_types;
+ rc = device_online(&mem->dev);
+ if (rc < 0)
+ pr_warn("%s: Failed to re-online memory: %d",
+ __func__, rc);
+ }
+
+ /* Continue processing all remaining memory blocks. */
+ (*online_types)++;
+ return 0;
+}
+
/*
- * Try to offline and remove a memory block. Might take a long time to
- * finish in case memory is still in use. Primarily useful for memory devices
- * that logically unplugged all memory (so it's no longer in use) and want to
- * offline + remove the memory block.
+ * Try to offline and remove memory. Might take a long time to finish in case
+ * memory is still in use. Primarily useful for memory devices that logically
+ * unplugged all memory (so it's no longer in use) and want to offline + remove
+ * that memory.
*/
int offline_and_remove_memory(int nid, u64 start, u64 size)
{
- struct memory_block *mem;
- int rc = -EINVAL;
+ const unsigned long mb_count = size / memory_block_size_bytes();
+ uint8_t *online_types, *tmp;
+ int rc;
if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
- size != memory_block_size_bytes())
- return rc;
+ !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
+ return -EINVAL;
+
+ /*
+ * We'll remember the old online type of each memory block, so we can
+ * try to revert whatever we did when offlining one memory block fails
+ * after offlining some others succeeded.
+ */
+ online_types = kmalloc_array(mb_count, sizeof(*online_types),
+ GFP_KERNEL);
+ if (!online_types)
+ return -ENOMEM;
+ /*
+ * Initialize all states to MMOP_OFFLINE, so when we abort processing in
+ * try_offline_memory_block(), we'll skip all unprocessed blocks in
+ * try_reonline_memory_block().
+ */
+ memset(online_types, MMOP_OFFLINE, mb_count);
lock_device_hotplug();
- mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
- if (mem)
- rc = device_offline(&mem->dev);
- /* Ignore if the device is already offline. */
- if (rc > 0)
- rc = 0;
+
+ tmp = online_types;
+ rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
/*
- * In case we succeeded to offline the memory block, remove it.
+ * In case we succeeded to offline all memory, remove it.
* This cannot fail as it cannot get onlined in the meantime.
*/
if (!rc) {
rc = try_remove_memory(nid, start, size);
- WARN_ON_ONCE(rc);
+ if (rc)
+ pr_err("%s: Failed to remove memory: %d", __func__, rc);
+ }
+
+ /*
+ * Rollback what we did. While memory onlining might theoretically fail
+ * (nacked by a notifier), it barely ever happens.
+ */
+ if (rc) {
+ tmp = online_types;
+ walk_memory_blocks(start, size, &tmp,
+ try_reonline_memory_block);
}
unlock_device_hotplug();
+ kfree(online_types);
return rc;
}
EXPORT_SYMBOL_GPL(offline_and_remove_memory);
next prev parent reply other threads:[~2023-06-19 10:59 UTC|newest]
Thread overview: 98+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-19 10:29 [PATCH 5.10 00/89] 5.10.185-rc1 review Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 01/89] lib: cleanup kstrto*() usage Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 02/89] kernel.h: split out kstrtox() and simple_strtox() to a separate header Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 03/89] test_firmware: Use kstrtobool() instead of strtobool() Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 04/89] test_firmware: prevent race conditions by a correct implementation of locking Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 05/89] test_firmware: fix a memory leak with reqs buffer Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 06/89] power: supply: ab8500: Fix external_power_changed race Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 07/89] power: supply: sc27xx: " Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 08/89] power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule() Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 09/89] ARM: dts: vexpress: add missing cache properties Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 10/89] tools: gpio: fix debounce_period_us output of lsgpio Greg Kroah-Hartman
2023-06-19 10:29 ` [PATCH 5.10 11/89] power: supply: Ratelimit no data debug output Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 12/89] platform/x86: asus-wmi: Ignore WMI events with codes 0x7B, 0xC0 Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 13/89] regulator: Fix error checking for debugfs_create_dir Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 14/89] irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 15/89] power: supply: Fix logic checking if system is running from battery Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 16/89] btrfs: scrub: try harder to mark RAID56 block groups read-only Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 17/89] btrfs: handle memory allocation failure in btrfs_csum_one_bio Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 18/89] ASoC: soc-pcm: test if a BE can be prepared Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 19/89] parisc: Improve cache flushing for PCXL in arch_sync_dma_for_cpu() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 20/89] parisc: Flush gatt writes and adjust gatt mask in parisc_agp_mask_memory() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 21/89] MIPS: Alchemy: fix dbdma2 Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 22/89] mips: Move initrd_start check after initrd address sanitisation Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 23/89] ASoC: dwc: move DMA init to snd_soc_dai_driver probe() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 24/89] xen/blkfront: Only check REQ_FUA for writes Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 25/89] drm:amd:amdgpu: Fix missing buffer object unlock in failure path Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 26/89] irqchip/gic: Correctly validate OF quirk descriptors Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 27/89] io_uring: hold uring mutex around poll removal Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 28/89] epoll: ep_autoremove_wake_function should use list_del_init_careful Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 29/89] ocfs2: fix use-after-free when unmounting read-only filesystem Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 30/89] ocfs2: check new file size on fallocate call Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 31/89] nios2: dts: Fix tse_mac "max-frame-size" property Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 32/89] nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 33/89] nilfs2: fix possible out-of-bounds segment allocation in resize ioctl Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 34/89] kexec: support purgatories with .text.hot sections Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 35/89] x86/purgatory: remove PGO flags Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 36/89] powerpc/purgatory: " Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 37/89] nouveau: fix client work fence deletion race Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 38/89] RDMA/uverbs: Restrict usage of privileged QKEYs Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 39/89] net: usb: qmi_wwan: add support for Compal RXM-G1 Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 40/89] ALSA: hda/realtek: Add a quirk for Compaq N14JP6 Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 41/89] Remove DECnet support from kernel Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 42/89] USB: serial: option: add Quectel EM061KGL series Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 43/89] serial: lantiq: add missing interrupt ack Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 44/89] usb: dwc3: gadget: Reset num TRBs before giving back the request Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 45/89] RDMA/rtrs: Fix the last iu->buf leak in err path Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 46/89] spi: fsl-dspi: avoid SCK glitches with continuous transfers Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 47/89] netfilter: nfnetlink: skip error delivery on batch in case of ENOMEM Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 48/89] net: enetc: correct the indexes of highest and 2nd highest TCs Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 49/89] ping6: Fix send to link-local addresses with VRF Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 50/89] net/sched: cls_u32: Fix reference counter leak leading to overflow Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 51/89] RDMA/rxe: Remove the unused variable obj Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 52/89] RDMA/rxe: Removed unused name from rxe_task struct Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 53/89] RDMA/rxe: Fix the use-before-initialization error of resp_pkts Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 54/89] iavf: remove mask from iavf_irq_enable_queues() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 55/89] octeontx2-af: fixed resource availability check Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 56/89] RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 57/89] RDMA/cma: Always set static rate to 0 for RoCE Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 58/89] IB/uverbs: Fix to consider event queue closing also upon non-blocking mode Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 59/89] IB/isert: Fix dead lock in ib_isert Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 60/89] IB/isert: Fix possible list corruption in CMA handler Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 61/89] IB/isert: Fix incorrect release of isert connection Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 62/89] ipvlan: fix bound dev checking for IPv6 l3s mode Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 63/89] sctp: fix an error code in sctp_sf_eat_auth() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 64/89] igb: fix nvm.ops.read() error handling Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 65/89] drm/nouveau: dont detect DSM for non-NVIDIA device Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 66/89] drm/nouveau/dp: check for NULL nv_connector->native_mode Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 67/89] drm/nouveau: add nv_encoder pointer check for NULL Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 68/89] ext4: drop the call to ext4_error() from ext4_get_group_info() Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 69/89] net/sched: cls_api: Fix lockup on flushing explicitly created chain Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 70/89] net: lapbether: only support ethernet devices Greg Kroah-Hartman
2023-06-19 10:30 ` [PATCH 5.10 71/89] net: tipc: resize nlattr array to correct size Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 72/89] selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 73/89] afs: Fix vlserver probe RTT handling Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 74/89] cgroup: always put cset in cgroup_css_set_put_fork Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 75/89] rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 76/89] neighbour: Remove unused inline function neigh_key_eq16() Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 77/89] net: Remove unused inline function dst_hold_and_use() Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 78/89] net: Remove DECnet leftovers from flow.h Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 79/89] neighbour: delete neigh_lookup_nodev as not used Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 80/89] batman-adv: Switch to kstrtox.h for kstrtou64 Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 81/89] mmc: block: ensure error propagation for non-blk Greg Kroah-Hartman
2023-06-19 10:31 ` Greg Kroah-Hartman [this message]
2023-06-19 10:31 ` [PATCH 5.10 83/89] nilfs2: reject devices with insufficient block count Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 84/89] media: dvbdev: Fix memleak in dvb_register_device Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 85/89] media: dvbdev: fix error logic at dvb_register_device() Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 86/89] media: dvb-core: Fix use-after-free due to race " Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 87/89] drm/i915/dg1: Wait for pcode/uncore handshake at startup Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 88/89] drm/i915/gen11+: Only load DRAM information from pcode Greg Kroah-Hartman
2023-06-19 10:31 ` [PATCH 5.10 89/89] um: Fix build w/o CONFIG_PM_SLEEP Greg Kroah-Hartman
2023-06-19 13:20 ` [PATCH 5.10 00/89] 5.10.185-rc1 review Florian Fainelli
2023-06-20 9:18 ` Chris Paterson
2023-06-20 10:21 ` Jon Hunter
2023-06-20 11:04 ` Sudip Mukherjee (Codethink)
2023-06-20 14:37 ` Naresh Kamboju
2023-06-20 17:08 ` Allen Pais
2023-06-20 21:04 ` Shuah Khan
2023-06-21 0:38 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230619102142.020097694@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=jasowang@redhat.com \
--cc=mawupeng1@huawei.com \
--cc=mhocko@kernel.org \
--cc=mst@redhat.com \
--cc=osalvador@suse.de \
--cc=pankaj.gupta.linux@gmail.com \
--cc=patches@lists.linux.dev \
--cc=richard.weiyang@linux.alibaba.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).