* [PATCH 0/2] Drivers: hv: hv_balloon: avoid OOM killer on the ballooning path
From: Vitaly Kuznetsov @ 2015-02-19 16:27 UTC (permalink / raw)
To: K. Y. Srinivasan, devel; +Cc: Haiyang Zhang, linux-kernel, Dexuan Cui
In some cases the host asks us to overballoon, and this triggers the OOM killer,
which eventually kills every process in the guest. The easiest way to get into
such a situation is to leave memory-hotplugged blocks offline. Address the issue
in two ways:
- Report offline pages to the host as used, so it won't ask us to overballoon;
- Refuse to balloon below the 'floor'.
Vitaly Kuznetsov (2):
Drivers: hv: hv_balloon: report offline pages as being used
Drivers: hv: hv_balloon: refuse to balloon below the floor
drivers/hv/hv_balloon.c | 44 +++++++++++++++++++++++++++++++++++---------
1 file changed, 35 insertions(+), 9 deletions(-)
--
1.9.3
From: Vitaly Kuznetsov @ 2015-02-19 16:27 UTC (permalink / raw)
To: K. Y. Srinivasan, devel; +Cc: Haiyang Zhang, linux-kernel, Dexuan Cui
When hot-added memory pages are not brought online, or when some memory blocks
are taken offline, the subsequent ballooning process gets the guest killed by
the OOM killer. This happens because we report these pages as neither used nor
free, and the host's algorithm apparently considers them unused. Keep track of
all online/offline operations and report all currently offline pages as used,
so the host won't try to balloon them out.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
drivers/hv/hv_balloon.c | 33 ++++++++++++++++++++++++---------
1 file changed, 24 insertions(+), 9 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index a095b70..e4b4454 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -503,6 +503,8 @@ struct hv_dynmem_device {
* Number of pages we have currently ballooned out.
*/
unsigned int num_pages_ballooned;
+ unsigned int num_pages_onlined;
+ unsigned int num_pages_added;
/*
* State to manage the ballooning (up) operation.
@@ -556,12 +558,15 @@ static void post_status(struct hv_dynmem_device *dm);
static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
void *v)
{
+ struct memory_notify *mem = (struct memory_notify *)v;
+
switch (val) {
case MEM_GOING_ONLINE:
mutex_lock(&dm_device.ha_region_mutex);
break;
case MEM_ONLINE:
+ dm_device.num_pages_onlined += mem->nr_pages;
case MEM_CANCEL_ONLINE:
mutex_unlock(&dm_device.ha_region_mutex);
if (dm_device.ha_waiting) {
@@ -570,8 +575,12 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
}
break;
- case MEM_GOING_OFFLINE:
case MEM_OFFLINE:
+ mutex_lock(&dm_device.ha_region_mutex);
+ dm_device.num_pages_onlined -= mem->nr_pages;
+ mutex_unlock(&dm_device.ha_region_mutex);
+ break;
+ case MEM_GOING_OFFLINE:
case MEM_CANCEL_OFFLINE:
break;
}
@@ -896,6 +905,8 @@ static void hot_add_req(struct work_struct *dummy)
if (do_hot_add)
resp.page_count = process_hot_add(pg_start, pfn_cnt,
rg_start, rg_sz);
+
+ dm->num_pages_added += resp.page_count;
mutex_unlock(&dm_device.ha_region_mutex);
#endif
/*
@@ -1009,17 +1020,21 @@ static void post_status(struct hv_dynmem_device *dm)
status.hdr.trans_id = atomic_inc_return(&trans_id);
/*
- * The host expects the guest to report free memory.
- * Further, the host expects the pressure information to
- * include the ballooned out pages.
- * For a given amount of memory that we are managing, we
- * need to compute a floor below which we should not balloon.
- * Compute this and add it to the pressure report.
+ * The host expects the guest to report free and committed memory.
+ * Furthermore, the host expects the pressure information to include
+ * the ballooned out pages. For a given amount of memory that we are
+ * managing we need to compute a floor below which we should not
+ * balloon. Compute this and add it to the pressure report.
+ * We also need to report all offline pages (num_pages_added -
+ * num_pages_onlined) as committed to the host, otherwise it can try
+ * asking us to balloon them out.
*/
status.num_avail = val.freeram;
status.num_committed = vm_memory_committed() +
- dm->num_pages_ballooned +
- compute_balloon_floor();
+ dm->num_pages_ballooned +
+ (dm->num_pages_added > dm->num_pages_onlined ?
+ dm->num_pages_added - dm->num_pages_onlined : 0) +
+ compute_balloon_floor();
/*
* If our transaction ID is no longer current, just don't
--
1.9.3
From: Vitaly Kuznetsov @ 2015-02-19 16:27 UTC (permalink / raw)
To: K. Y. Srinivasan, devel; +Cc: Haiyang Zhang, linux-kernel, Dexuan Cui
When the host asks us to balloon up, we need to make sure we're not committing
suicide by overballooning. Use the existing 'floor' metric as the lowest
acceptable value for free RAM.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
drivers/hv/hv_balloon.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index e4b4454..3292e97 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1138,6 +1138,8 @@ static void balloon_up(struct work_struct *dummy)
bool alloc_error;
bool done = false;
int i;
+ struct sysinfo val;
+ unsigned long floor;
/* The host balloons pages in 2M granularity. */
WARN_ON_ONCE(num_pages % PAGES_IN_2M != 0);
@@ -1148,6 +1150,15 @@ static void balloon_up(struct work_struct *dummy)
*/
alloc_unit = 512;
+ si_meminfo(&val);
+ floor = compute_balloon_floor();
+
+ /* Refuse to balloon below the floor, keep the 2M granularity. */
+ if (val.freeram < num_pages || val.freeram - num_pages < floor) {
+ num_pages = val.freeram > floor ? (val.freeram - floor) : 0;
+ num_pages -= num_pages % PAGES_IN_2M;
+ }
+
while (!done) {
bl_resp = (struct dm_balloon_response *)send_buffer;
memset(send_buffer, 0, PAGE_SIZE);
--
1.9.3
From: KY Srinivasan @ 2015-02-25 14:32 UTC (permalink / raw)
To: Vitaly Kuznetsov, devel@linuxdriverproject.org
Cc: Haiyang Zhang, linux-kernel@vger.kernel.org, Dexuan Cui
> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Thursday, February 19, 2015 8:27 AM
> To: KY Srinivasan; devel@linuxdriverproject.org
> Cc: Haiyang Zhang; linux-kernel@vger.kernel.org; Dexuan Cui
> Subject: [PATCH 1/2] Drivers: hv: hv_balloon: report offline pages as being used
>
> When hot-added memory pages are not brought online, or when some memory blocks
> are taken offline, the subsequent ballooning process gets the guest killed by
> the OOM killer. This happens because we report these pages as neither used nor
> free, and the host's algorithm apparently considers them unused. Keep track of
> all online/offline operations and report all currently offline pages as used,
> so the host won't try to balloon them out.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> drivers/hv/hv_balloon.c | 33 ++++++++++++++++++++++++---------
> 1 file changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index a095b70..e4b4454 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -503,6 +503,8 @@ struct hv_dynmem_device {
> * Number of pages we have currently ballooned out.
> */
> unsigned int num_pages_ballooned;
> + unsigned int num_pages_onlined;
> + unsigned int num_pages_added;
>
> /*
> * State to manage the ballooning (up) operation.
> @@ -556,12 +558,15 @@ static void post_status(struct hv_dynmem_device *dm);
> static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
> void *v)
> {
> + struct memory_notify *mem = (struct memory_notify *)v;
> +
> switch (val) {
> case MEM_GOING_ONLINE:
> mutex_lock(&dm_device.ha_region_mutex);
> break;
>
> case MEM_ONLINE:
> + dm_device.num_pages_onlined += mem->nr_pages;
> case MEM_CANCEL_ONLINE:
Why are we not adjusting num_pages_onlined when we cancel the online
operation?
K. Y
From: Vitaly Kuznetsov @ 2015-02-25 16:55 UTC (permalink / raw)
To: KY Srinivasan
Cc: devel@linuxdriverproject.org, Haiyang Zhang,
linux-kernel@vger.kernel.org, Dexuan Cui
KY Srinivasan <kys@microsoft.com> writes:
>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Thursday, February 19, 2015 8:27 AM
>> To: KY Srinivasan; devel@linuxdriverproject.org
>> Cc: Haiyang Zhang; linux-kernel@vger.kernel.org; Dexuan Cui
>> Subject: [PATCH 1/2] Drivers: hv: hv_balloon: report offline pages as being used
>>
>> When hot-added memory pages are not brought online, or when some memory blocks
>> are taken offline, the subsequent ballooning process gets the guest killed by
>> the OOM killer. This happens because we report these pages as neither used nor
>> free, and the host's algorithm apparently considers them unused. Keep track of
>> all online/offline operations and report all currently offline pages as used,
>> so the host won't try to balloon them out.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> drivers/hv/hv_balloon.c | 33 ++++++++++++++++++++++++---------
>> 1 file changed, 24 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
>> index a095b70..e4b4454 100644
>> --- a/drivers/hv/hv_balloon.c
>> +++ b/drivers/hv/hv_balloon.c
>> @@ -503,6 +503,8 @@ struct hv_dynmem_device {
>> * Number of pages we have currently ballooned out.
>> */
>> unsigned int num_pages_ballooned;
>> + unsigned int num_pages_onlined;
>> + unsigned int num_pages_added;
>>
>> /*
>> * State to manage the ballooning (up) operation.
>> @@ -556,12 +558,15 @@ static void post_status(struct hv_dynmem_device *dm);
>> static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
>> void *v)
>> {
>> + struct memory_notify *mem = (struct memory_notify *)v;
>> +
>> switch (val) {
>> case MEM_GOING_ONLINE:
>> mutex_lock(&dm_device.ha_region_mutex);
>> break;
>>
>> case MEM_ONLINE:
>> + dm_device.num_pages_onlined += mem->nr_pages;
>> case MEM_CANCEL_ONLINE:
>
> Why are we not adjusting num_pages_onlined when we cancel the online
> operation?
Because we didn't increase the number yet.
To my understanding, events come in the following order:
1) MEM_GOING_ONLINE - we just take the lock
2) MEM_ONLINE - we add nr_pages to num_pages_onlined and drop the lock
or
MEM_CANCEL_ONLINE - we just drop the lock (the memory never went online,
so num_pages_onlined wasn't increased)
3) MEM_GOING_OFFLINE - we do nothing
4) MEM_OFFLINE - we subtract nr_pages from num_pages_onlined
or
MEM_CANCEL_OFFLINE - we do nothing (the memory is still online, so there
is no need to adjust num_pages_onlined)
>
> K. Y
--
Vitaly
From: KY Srinivasan @ 2015-02-25 22:29 UTC (permalink / raw)
To: Vitaly Kuznetsov
Cc: devel@linuxdriverproject.org, Haiyang Zhang,
linux-kernel@vger.kernel.org, Dexuan Cui
> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Wednesday, February 25, 2015 8:56 AM
> To: KY Srinivasan
> Cc: devel@linuxdriverproject.org; Haiyang Zhang; linux-kernel@vger.kernel.org; Dexuan Cui
> Subject: Re: [PATCH 1/2] Drivers: hv: hv_balloon: report offline pages as being used
>
> KY Srinivasan <kys@microsoft.com> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: Thursday, February 19, 2015 8:27 AM
> >> To: KY Srinivasan; devel@linuxdriverproject.org
> >> Cc: Haiyang Zhang; linux-kernel@vger.kernel.org; Dexuan Cui
> >> Subject: [PATCH 1/2] Drivers: hv: hv_balloon: report offline pages as being used
> >>
> >> When hot-added memory pages are not brought online, or when some
> >> memory blocks are taken offline, the subsequent ballooning process
> >> gets the guest killed by the OOM killer. This happens because we
> >> report these pages as neither used nor free, and the host's algorithm
> >> apparently considers them unused. Keep track of all online/offline
> >> operations and report all currently offline pages as used, so the
> >> host won't try to balloon them out.
> >>
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >> drivers/hv/hv_balloon.c | 33 ++++++++++++++++++++++++---------
> >> 1 file changed, 24 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> >> index a095b70..e4b4454 100644
> >> --- a/drivers/hv/hv_balloon.c
> >> +++ b/drivers/hv/hv_balloon.c
> >> @@ -503,6 +503,8 @@ struct hv_dynmem_device {
> >> * Number of pages we have currently ballooned out.
> >> */
> >> unsigned int num_pages_ballooned;
> >> + unsigned int num_pages_onlined;
> >> + unsigned int num_pages_added;
> >>
> >> /*
> >> * State to manage the ballooning (up) operation.
> >> @@ -556,12 +558,15 @@ static void post_status(struct hv_dynmem_device *dm);
> >> static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
> >> void *v)
> >> {
> >> + struct memory_notify *mem = (struct memory_notify *)v;
> >> +
> >> switch (val) {
> >> case MEM_GOING_ONLINE:
> >> mutex_lock(&dm_device.ha_region_mutex);
> >> break;
> >>
> >> case MEM_ONLINE:
> >> + dm_device.num_pages_onlined += mem->nr_pages;
> >> case MEM_CANCEL_ONLINE:
> >
> > Why are we not adjusting num_pages_onlined when we cancel the online
> > operation?
>
> Because we didn't increase the number yet.
Thanks; my mistake.
K. Y