* [PATCH v1 1/5] mm/memory_hotplug: check for fatal signals only in offline_pages()
2023-06-27 11:22 [PATCH v1 0/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals David Hildenbrand
@ 2023-06-27 11:22 ` David Hildenbrand
[not found] ` <ZJrXcaGE6gCrmLqg@dhcp22.suse.cz>
2023-06-27 11:22 ` [PATCH v1 2/5] virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY David Hildenbrand
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2023-06-27 11:22 UTC (permalink / raw)
To: linux-kernel
Cc: Xuan Zhuo, Michal Hocko, John Hubbard, Michael S. Tsirkin,
virtualization, linux-mm, Andrew Morton, Oscar Salvador
Let's check for fatal signals only. That looks cleaner and still keeps
the documented use case for manual user-space triggered memory offlining
working. From Documentation/admin-guide/mm/memory-hotplug.rst:
% timeout $TIMEOUT offline_block | failure_handling
In fact, we even document there: "the offlining context can be terminated
by sending a fatal signal".
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/memory_hotplug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8e0fa209d533..0d2151df4ee1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1879,7 +1879,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
do {
pfn = start_pfn;
do {
- if (signal_pending(current)) {
+ if (fatal_signal_pending(current)) {
ret = -EINTR;
reason = "signal backoff";
goto failed_removal_isolated;
--
2.40.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH v1 2/5] virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
2023-06-27 11:22 [PATCH v1 0/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals David Hildenbrand
2023-06-27 11:22 ` [PATCH v1 1/5] mm/memory_hotplug: check for fatal signals only in offline_pages() David Hildenbrand
@ 2023-06-27 11:22 ` David Hildenbrand
2023-06-27 11:22 ` [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals David Hildenbrand
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2023-06-27 11:22 UTC (permalink / raw)
To: linux-kernel
Cc: Xuan Zhuo, Michal Hocko, John Hubbard, Michael S. Tsirkin,
virtualization, linux-mm, Andrew Morton, Oscar Salvador
Let's prepare for offline_and_remove_memory() to return other error
codes that effectively translate to -EBUSY, such as -ETIMEDOUT.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/virtio_mem.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 835f6cc2fb66..cb8bc6f6aa90 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -750,7 +750,15 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm,
dev_dbg(&vm->vdev->dev,
"offlining and removing memory failed: %d\n", rc);
}
- return rc;
+
+ switch (rc) {
+ case 0:
+ case -ENOMEM:
+ case -EINVAL:
+ return rc;
+ default:
+ return -EBUSY;
+ }
}
/*
--
2.40.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
2023-06-27 11:22 [PATCH v1 0/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals David Hildenbrand
2023-06-27 11:22 ` [PATCH v1 1/5] mm/memory_hotplug: check for fatal signals only in offline_pages() David Hildenbrand
2023-06-27 11:22 ` [PATCH v1 2/5] virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY David Hildenbrand
@ 2023-06-27 11:22 ` David Hildenbrand
[not found] ` <ZJrYv0JIcrNyf2py@dhcp22.suse.cz>
2023-06-28 2:00 ` kernel test robot
2023-06-27 11:22 ` [PATCH v1 4/5] virtio-mem: set the timeout for offline_and_remove_memory() to 10 seconds David Hildenbrand
2023-06-27 11:22 ` [PATCH v1 5/5] virtio-mem: check if the config changed before (fake) offlining memory David Hildenbrand
4 siblings, 2 replies; 11+ messages in thread
From: David Hildenbrand @ 2023-06-27 11:22 UTC (permalink / raw)
To: linux-kernel
Cc: Xuan Zhuo, Michal Hocko, John Hubbard, Michael S. Tsirkin,
virtualization, linux-mm, Andrew Morton, Oscar Salvador
John Hubbard writes [1]:
Some device drivers add memory to the system via memory hotplug.
When the driver is unloaded, that memory is hot-unplugged.
However, memory hot unplug can fail. And these days, it fails a
little too easily, with respect to the above case. Specifically, if
a signal is pending on the process, hot unplug fails.
[...]
So in this case, other things (unmovable pages, un-splittable huge
pages) can also cause the above problem. However, those are
demonstrably less common than simply having a pending signal. I've
got bug reports from users who can trivially reproduce this by
killing their process with a "kill -9", for example.
Especially with ZONE_MOVABLE, offlining is supposed to work in most
cases when offlining actually hotplugged (not boot) memory, and only fail
in rare corner cases (e.g., some driver holds a reference to a page in
ZONE_MOVABLE, turning it unmovable).
In these corner cases we really don't want to be stuck forever in
offline_and_remove_memory(). But in the general cases, we really want to
do our best to make memory offlining succeed -- in a reasonable
timeframe.
Reliably failing in the described case when there is a fatal signal pending
is sub-optimal. The pending signal check is mostly only relevant when user
space explicitly triggers offlining of memory using sysfs device attributes
("state" or "online" attribute), but not when coming via
offline_and_remove_memory().
So let's use a timer instead and ignore fatal signals, because they are
not really expressive for offline_and_remove_memory() users. Let's default
to 30 seconds if no timeout was specified, and limit the timeout to 120
seconds.
This change is also valuable for virtio-mem in BBM (Big Block Mode) with
"bbm_safe_unplug=off", to avoid endless loops when stuck forever in
offline_and_remove_memory().
While at it, drop the "extern" from offline_and_remove_memory() to make
it fit into a single line.
[1] https://lkml.kernel.org/r/20230620011719.155379-1-jhubbard@nvidia.com
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/virtio_mem.c | 2 +-
include/linux/memory_hotplug.h | 2 +-
mm/memory_hotplug.c | 50 ++++++++++++++++++++++++++++++++--
3 files changed, 50 insertions(+), 4 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index cb8bc6f6aa90..f8792223f1db 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -738,7 +738,7 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm,
"offlining and removing memory: 0x%llx - 0x%llx\n", addr,
addr + size - 1);
- rc = offline_and_remove_memory(addr, size);
+ rc = offline_and_remove_memory(addr, size, 0);
if (!rc) {
atomic64_sub(size, &vm->offline_size);
/*
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 9fcbf5706595..d5f9e8b5a4a4 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -307,7 +307,7 @@ extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group);
extern int remove_memory(u64 start, u64 size);
extern void __remove_memory(u64 start, u64 size);
-extern int offline_and_remove_memory(u64 start, u64 size);
+int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms);
#else
static inline void try_offline_node(int nid) {}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0d2151df4ee1..ca635121644a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -152,6 +152,22 @@ void put_online_mems(void)
bool movable_node_enabled = false;
+/*
+ * Protected by the device hotplug lock: offline_and_remove_memory()
+ * will activate a timer such that offlining cannot be stuck forever.
+ *
+ * With an active timer, fatal signals will be ignored, because they can be
+ * counter-productive when dying user space triggers device unplug/driver
+ * unloading that ends up offlining+removing device memory.
+ */
+static bool mhp_offlining_timer_active;
+static atomic_t mhp_offlining_timer_expired;
+
+static void mhp_offline_timer_fn(struct timer_list *unused)
+{
+ atomic_set(&mhp_offlining_timer_expired, 1);
+}
+
#ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
int mhp_default_online_type = MMOP_OFFLINE;
#else
@@ -1879,7 +1895,18 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
do {
pfn = start_pfn;
do {
- if (fatal_signal_pending(current)) {
+ /*
+ * If a timer is active, we're coming via
+ * offline_and_remove_memory() and want to ignore even
+ * fatal signals.
+ */
+ if (mhp_offlining_timer_active) {
+ if (atomic_read(&mhp_offlining_timer_expired)) {
+ ret = -ETIMEDOUT;
+ reason = "timeout";
+ goto failed_removal_isolated;
+ }
+ } else if (fatal_signal_pending(current)) {
ret = -EINTR;
reason = "signal backoff";
goto failed_removal_isolated;
@@ -2232,11 +2259,17 @@ static int try_reonline_memory_block(struct memory_block *mem, void *arg)
* memory is still in use. Primarily useful for memory devices that logically
* unplugged all memory (so it's no longer in use) and want to offline + remove
* that memory.
+ *
+ * offline_and_remove_memory() will not fail on fatal signals. Instead, it will
+ * fail once the timeout has been reached and offlining was not completed. If
+ * no timeout was specified, it will timeout after 30 seconds. The timeout is
+ * limited to 120 seconds.
*/
-int offline_and_remove_memory(u64 start, u64 size)
+int offline_and_remove_memory(u64 start, u64 size, unsigned int timeout_ms)
{
const unsigned long mb_count = size / memory_block_size_bytes();
uint8_t *online_types, *tmp;
+ struct timer_list timer;
int rc;
if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
@@ -2261,9 +2294,22 @@ int offline_and_remove_memory(u64 start, u64 size)
lock_device_hotplug();
+ if (!timeout_ms)
+ timeout_ms = 30 * MSEC_PER_SEC;
+ timeout_ms = min_t(unsigned int, timeout_ms, 120 * MSEC_PER_SEC);
+
+ timer_setup_on_stack(&timer, mhp_offline_timer_fn, 0);
+ mod_timer(&timer, jiffies + msecs_to_jiffies(timeout_ms));
+ mhp_offlining_timer_active = true;
+
tmp = online_types;
rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
+ timer_delete_sync(&timer);
+ atomic_set(&mhp_offlining_timer_expired, 0);
+ mhp_offlining_timer_active = false;
+ destroy_timer_on_stack(&timer);
+
/*
* In case we succeeded to offline all memory, remove it.
* This cannot fail as it cannot get onlined in the meantime.
--
2.40.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 11+ messages in thread[parent not found: <ZJrYv0JIcrNyf2py@dhcp22.suse.cz>]
* Re: [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
[not found] ` <ZJrYv0JIcrNyf2py@dhcp22.suse.cz>
@ 2023-06-27 13:14 ` David Hildenbrand
[not found] ` <ZJrvhACxmaQmmwYP@dhcp22.suse.cz>
0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2023-06-27 13:14 UTC (permalink / raw)
To: Michal Hocko
Cc: Xuan Zhuo, Michael S. Tsirkin, John Hubbard, linux-kernel,
virtualization, linux-mm, Andrew Morton, Oscar Salvador
On 27.06.23 14:40, Michal Hocko wrote:
> On Tue 27-06-23 13:22:18, David Hildenbrand wrote:
>> John Hubbard writes [1]:
>>
>> Some device drivers add memory to the system via memory hotplug.
>> When the driver is unloaded, that memory is hot-unplugged.
>>
>> However, memory hot unplug can fail. And these days, it fails a
>> little too easily, with respect to the above case. Specifically, if
>> a signal is pending on the process, hot unplug fails.
>>
>> [...]
>>
>> So in this case, other things (unmovable pages, un-splittable huge
>> pages) can also cause the above problem. However, those are
>> demonstrably less common than simply having a pending signal. I've
>> got bug reports from users who can trivially reproduce this by
>> killing their process with a "kill -9", for example.
>
> This looks like a bug of the said driver no? If the tear down process is
> killed it could very well happen right before offlining so you end up in
> the very same state. Or what am I missing?
IIUC (John can correct me if I am wrong):
1) The process holds the device node open
2) The process gets killed or quits
3) As the process gets torn down, it closes the device node
4) Closing the device node results in the driver removing the device and
calling offline_and_remove_memory()
So it's not a "tear down process" that triggers that offlining_removal
somehow explicitly, it's just a side-product of it letting go of the
device node as the process gets torn down.
>
>> Especially with ZONE_MOVABLE, offlining is supposed to work in most
>> cases when offlining actually hotplugged (not boot) memory, and only fail
>> in rare corner cases (e.g., some driver holds a reference to a page in
>> ZONE_MOVABLE, turning it unmovable).
>>
>> In these corner cases we really don't want to be stuck forever in
>> offline_and_remove_memory(). But in the general cases, we really want to
>> do our best to make memory offlining succeed -- in a reasonable
>> timeframe.
>>
>> Reliably failing in the described case when there is a fatal signal pending
>> is sub-optimal. The pending signal check is mostly only relevant when user
>> space explicitly triggers offlining of memory using sysfs device attributes
>> ("state" or "online" attribute), but not when coming via
>> offline_and_remove_memory().
>>
>> So let's use a timer instead and ignore fatal signals, because they are
>> not really expressive for offline_and_remove_memory() users. Let's default
>> to 30 seconds if no timeout was specified, and limit the timeout to 120
>> seconds.
>
> I really hate having timeouts back. They just proven to be hard to get
> right and it is essentially a policy implemented in the kernel. They
> simply do not belong to the kernel space IMHO.
As much as I agree with you in terms of offlining triggered from user
space (e.g., write "state" or "online" attribute) where user-space is
actually in charge and can do something reasonable (timeout, retry,
whatever), in these the offline_and_remove_memory() case it's the driver
that wants a best-effort memory offlining+removal.
If it times out, virtio-mem will simply try another block or retry
later. Right now, it could get stuck forever in
offline_and_remove_memory(), which is obviously "not great".
Fortunately, for virtio-mem it's configurable and we use the
alloc_contig_range()-method for now as default.
If it would time out for John's driver, we most certainly don't want to
be stuck in offline_and_remove_memory(), blocking device/driver
unloading (and even a reboot IIRC) possibly forever.
I much rather have offline_and_remove_memory() indicate "timeout" to a
in-kernel user a bit earlier than getting stuck in there forever. The
timeout parameter allows for giving the in-kernel users a bit of
flexibility, which I showcases for virtio-mem that unplugs smaller
blocks and rather wants to fail fast and retry later.
Sure, we could make the timeout configurable to optimize for some corner
cases, but that's not really what offline_and_remove_memory() users want
and I doubt anybody would fine-tune that: they want a best-effort
attempt. And that's IMHO not really a policy, it's an implementation
detail of these drivers.
For example, the driver from John could simply never call
offline_and_remove_memory() and always require a reboot when wanting to
reuse a device. But that's definitely what users want.
virtio-mem could simply never call offline_and_remove_memory() and
indicate "I don't support unplug of these online memory blocks". But
that's *definitely* not what users want.
I'm very open for alternatives regarding offline_and_remove_memory(), so
far this was the only reasonable thing I could come up with that
actually achieves what we want for these users: not get stuck in there
forever but rather fail earlier than later.
And most importantly, not somehow try to involve user space that isn't
even in charge of the offlining operation.
--
Cheers,
David / dhildenb
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
2023-06-27 11:22 ` [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals David Hildenbrand
[not found] ` <ZJrYv0JIcrNyf2py@dhcp22.suse.cz>
@ 2023-06-28 2:00 ` kernel test robot
1 sibling, 0 replies; 11+ messages in thread
From: kernel test robot @ 2023-06-28 2:00 UTC (permalink / raw)
To: David Hildenbrand, linux-kernel
Cc: Xuan Zhuo, Michal Hocko, John Hubbard, llvm, virtualization,
linux-mm, Michael S. Tsirkin, oe-kbuild-all, Andrew Morton,
Oscar Salvador
Hi David,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 6995e2de6891c724bfeb2db33d7b87775f913ad1]
url: https://github.com/intel-lab-lkp/linux/commits/David-Hildenbrand/mm-memory_hotplug-check-for-fatal-signals-only-in-offline_pages/20230627-192444
base: 6995e2de6891c724bfeb2db33d7b87775f913ad1
patch link: https://lore.kernel.org/r/20230627112220.229240-4-david%40redhat.com
patch subject: [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals
config: x86_64-randconfig-x006-20230627 (https://download.01.org/0day-ci/archive/20230628/202306280935.dKTWlHFD-lkp@intel.com/config)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce: (https://download.01.org/0day-ci/archive/20230628/202306280935.dKTWlHFD-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306280935.dKTWlHFD-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/memory_hotplug.c:163:13: warning: unused variable 'mhp_offlining_timer_active' [-Wunused-variable]
static bool mhp_offlining_timer_active;
^
mm/memory_hotplug.c:166:13: warning: unused function 'mhp_offline_timer_fn' [-Wunused-function]
static void mhp_offline_timer_fn(struct timer_list *unused)
^
2 warnings generated.
vim +/mhp_offlining_timer_active +163 mm/memory_hotplug.c
154
155 /*
156 * Protected by the device hotplug lock: offline_and_remove_memory()
157 * will activate a timer such that offlining cannot be stuck forever.
158 *
159 * With an active timer, fatal signals will be ignored, because they can be
160 * counter-productive when dying user space triggers device unplug/driver
161 * unloading that ends up offlining+removing device memory.
162 */
> 163 static bool mhp_offlining_timer_active;
164 static atomic_t mhp_offlining_timer_expired;
165
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v1 4/5] virtio-mem: set the timeout for offline_and_remove_memory() to 10 seconds
2023-06-27 11:22 [PATCH v1 0/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals David Hildenbrand
` (2 preceding siblings ...)
2023-06-27 11:22 ` [PATCH v1 3/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals David Hildenbrand
@ 2023-06-27 11:22 ` David Hildenbrand
2023-06-27 11:22 ` [PATCH v1 5/5] virtio-mem: check if the config changed before (fake) offlining memory David Hildenbrand
4 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2023-06-27 11:22 UTC (permalink / raw)
To: linux-kernel
Cc: Xuan Zhuo, Michal Hocko, John Hubbard, Michael S. Tsirkin,
virtualization, linux-mm, Andrew Morton, Oscar Salvador
Currently we use the default (30 seconds), let's reduce it to 10
seconds. In BBM, we barely deal with blocks larger than 1/2 GiB, and
after 10 seconds it's most probably best to give up on that memory block
and try another one (or retry this one later).
In the common fake-offline case where we effectively fake-offline memory
using alloc_contig_range() first (SBM or BBM with bbm_safe_unplug=on),
we expect offline_and_remove_memory() to be blazingly fast and never take
anywhere close to 10seconds -- so this should only affect BBM with
bbm_safe_unplug=off.
While at it, update the parameter description and the relationship to
unmovable pages.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/virtio_mem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f8792223f1db..7468b4a907e3 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -41,7 +41,7 @@ MODULE_PARM_DESC(bbm_block_size,
static bool bbm_safe_unplug = true;
module_param(bbm_safe_unplug, bool, 0444);
MODULE_PARM_DESC(bbm_safe_unplug,
- "Use a safe unplug mechanism in BBM, avoiding long/endless loops");
+ "Use a safe/fast unplug mechanism in BBM, failing faster on unmovable pages");
/*
* virtio-mem currently supports the following modes of operation:
@@ -738,7 +738,7 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm,
"offlining and removing memory: 0x%llx - 0x%llx\n", addr,
addr + size - 1);
- rc = offline_and_remove_memory(addr, size, 0);
+ rc = offline_and_remove_memory(addr, size, 10 * MSEC_PER_SEC);
if (!rc) {
atomic64_sub(size, &vm->offline_size);
/*
--
2.40.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH v1 5/5] virtio-mem: check if the config changed before (fake) offlining memory
2023-06-27 11:22 [PATCH v1 0/5] mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals David Hildenbrand
` (3 preceding siblings ...)
2023-06-27 11:22 ` [PATCH v1 4/5] virtio-mem: set the timeout for offline_and_remove_memory() to 10 seconds David Hildenbrand
@ 2023-06-27 11:22 ` David Hildenbrand
4 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2023-06-27 11:22 UTC (permalink / raw)
To: linux-kernel
Cc: Xuan Zhuo, Michal Hocko, John Hubbard, Michael S. Tsirkin,
virtualization, linux-mm, Andrew Morton, Oscar Salvador
If we repeatedly fail to (fake) offline memory, we won't be sending
any unplug requests to the device. However, we only check if the config
changed when sending such (un)plug requests.
So we could end up trying for a long time to offline memory, even though
the config changed already and we're not supposed to unplug memory
anymore.
Let's optimize for that case, identified while testing the
offline_and_remove() memory timeout and simulating it repeatedly running
into the timeout.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
drivers/virtio/virtio_mem.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 7468b4a907e3..247fb3e0ce61 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1922,6 +1922,10 @@ static int virtio_mem_sbm_unplug_sb_online(struct virtio_mem *vm,
unsigned long start_pfn;
int rc;
+ /* Stop fake offlining attempts if the config changed. */
+ if (atomic_read(&vm->config_changed))
+ return -EAGAIN;
+
start_pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) +
sb_id * vm->sbm.sb_size);
@@ -2233,6 +2237,10 @@ static int virtio_mem_bbm_unplug_request(struct virtio_mem *vm, uint64_t diff)
virtio_mem_bbm_for_each_bb_rev(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED) {
cond_resched();
+ /* Stop (fake) offlining attempts if the config changed. */
+ if (atomic_read(&vm->config_changed))
+ return -EAGAIN;
+
/*
* As we're holding no locks, these checks are racy,
* but we don't care.
--
2.40.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 11+ messages in thread