public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "K. Y. Srinivasan" <kys@microsoft.com>
To: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
	devel@linuxdriverproject.org, olaf@aepfle.de, apw@canonical.com,
	vkuznets@redhat.com
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Subject: [PATCH 3/9] Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural
Date: Tue, 17 Mar 2015 17:41:55 -0700	[thread overview]
Message-ID: <1426639321-25458-3-git-send-email-kys@microsoft.com> (raw)
In-Reply-To: <1426639321-25458-1-git-send-email-kys@microsoft.com>

From: Vitaly Kuznetsov <vkuznets@redhat.com>

Memory blocks can be onlined in random order. When this order is not natural
some memory pages are not onlined because of the redundant check in
hv_online_page().

Here is a real world scenario:
1) Host tries to hot-add the following (process_hot_add):
  pg_start=rg_start=0x48000, pfn_cnt=111616, rg_size=262144

2) This results in adding 4 memory blocks:
[  109.057866] init_memory_mapping: [mem 0x48000000-0x4fffffff]
[  114.102698] init_memory_mapping: [mem 0x50000000-0x57ffffff]
[  119.168039] init_memory_mapping: [mem 0x58000000-0x5fffffff]
[  124.233053] init_memory_mapping: [mem 0x60000000-0x67ffffff]
The last one is incomplete but we have special has->covered_end_pfn counter to
avoid onlining non-backed frames and hv_bring_pgs_online() function to bring
them online later on.

3) Now we have 4 offline memory blocks: /sys/devices/system/memory/memory9-12
$ for f in /sys/devices/system/memory/memory*/state; do echo $f `cat $f`; done | grep -v onlin
/sys/devices/system/memory/memory10/state offline
/sys/devices/system/memory/memory11/state offline
/sys/devices/system/memory/memory12/state offline
/sys/devices/system/memory/memory9/state offline

4) We bring them online in non-natural order:
$grep MemTotal /proc/meminfo
MemTotal:         966348 kB
$echo online > /sys/devices/system/memory/memory12/state && grep MemTotal /proc/meminfo
MemTotal:        1019596 kB
$echo online > /sys/devices/system/memory/memory11/state && grep MemTotal /proc/meminfo
MemTotal:        1150668 kB
$echo online > /sys/devices/system/memory/memory9/state && grep MemTotal /proc/meminfo
MemTotal:        1150668 kB

As you can see memory9 block gives us zero additional memory. We can also
observe a huge discrepancy between host- and guest-reported memory sizes.

The root cause of the issue is the redundant pg >= covered_start_pfn check (and
covered_start_pfn advancing) in hv_online_page(). When upper memory block in
being onlined before the lower one (memory12 and memory11 in the above case) we
advance the covered_start_pfn pointer and all memory9 pages do not pass the
check. If the assumption that host always gives us requests in sequential order
and pg_start always equals rg_start when the first request for the new HA
region is received (that's the case in my testing) is correct than we can get
rid of covered_start_pfn and pg >= start_pfn check in hv_online_page() is
sufficient.

The current char-next branch is broken and this patch fixes
the bug.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_balloon.c |   14 ++++----------
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index f1f17c5..014256a 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -428,14 +428,13 @@ struct dm_info_msg {
  * currently hot added. We hot add in multiples of 128M
  * chunks; it is possible that we may not be able to bring
  * online all the pages in the region. The range
- * covered_start_pfn : covered_end_pfn defines the pages that can
+ * covered_end_pfn defines the pages that can
  * be brough online.
  */
 
 struct hv_hotadd_state {
 	struct list_head list;
 	unsigned long start_pfn;
-	unsigned long covered_start_pfn;
 	unsigned long covered_end_pfn;
 	unsigned long ha_end_pfn;
 	unsigned long end_pfn;
@@ -679,8 +678,7 @@ static void hv_online_page(struct page *pg)
 
 	list_for_each(cur, &dm_device.ha_region_list) {
 		has = list_entry(cur, struct hv_hotadd_state, list);
-		cur_start_pgp = (unsigned long)
-				pfn_to_page(has->covered_start_pfn);
+		cur_start_pgp = (unsigned long)pfn_to_page(has->start_pfn);
 		cur_end_pgp = (unsigned long)pfn_to_page(has->covered_end_pfn);
 
 		if (((unsigned long)pg >= cur_start_pgp) &&
@@ -692,7 +690,6 @@ static void hv_online_page(struct page *pg)
 			__online_page_set_limits(pg);
 			__online_page_increment_counters(pg);
 			__online_page_free(pg);
-			has->covered_start_pfn++;
 		}
 	}
 }
@@ -736,10 +733,9 @@ static bool pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
 		 * is, update it.
 		 */
 
-		if (has->covered_end_pfn != start_pfn) {
+		if (has->covered_end_pfn != start_pfn)
 			has->covered_end_pfn = start_pfn;
-			has->covered_start_pfn = start_pfn;
-		}
+
 		return true;
 
 	}
@@ -784,7 +780,6 @@ static unsigned long handle_pg_range(unsigned long pg_start,
 				pgs_ol = pfn_cnt;
 			hv_bring_pgs_online(start_pfn, pgs_ol);
 			has->covered_end_pfn +=  pgs_ol;
-			has->covered_start_pfn +=  pgs_ol;
 			pfn_cnt -= pgs_ol;
 		}
 
@@ -845,7 +840,6 @@ static unsigned long process_hot_add(unsigned long pg_start,
 		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
 		ha_region->start_pfn = rg_start;
 		ha_region->ha_end_pfn = rg_start;
-		ha_region->covered_start_pfn = pg_start;
 		ha_region->covered_end_pfn = pg_start;
 		ha_region->end_pfn = rg_start + rg_size;
 	}
-- 
1.7.4.1


  parent reply	other threads:[~2015-03-17 23:25 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-18  0:41 [PATCH 0/9] Drivers: hv: vmbus: Some miscellaneous fixes K. Y. Srinivasan
2015-03-18  0:41 ` [PATCH 1/9] Drivers: hv: vmbus: Perform device register in the per-channel work element K. Y. Srinivasan
2015-03-18  0:41   ` [PATCH 2/9] Drivers: hv: hv_balloon: keep locks balanced on add_memory() failure K. Y. Srinivasan
2015-03-18  0:41   ` K. Y. Srinivasan [this message]
2015-03-18  0:41   ` [PATCH 4/9] Correcting truncation error for constant HV_CRASH_CTL_CRASH_NOTIFY K. Y. Srinivasan
2015-03-18  7:26     ` Greg KH
2015-03-18  7:27     ` Greg KH
2015-03-18 14:23       ` KY Srinivasan
2015-03-18 14:26         ` Greg KH
2015-03-18  0:41   ` [PATCH 5/9] hv: vmbus: missing curly braces in vmbus_process_offer() K. Y. Srinivasan
2015-03-18  0:41   ` [PATCH 6/9] tools: hv: fcopy_daemon: support >2GB files for x86_32 guest K. Y. Srinivasan
2015-03-18  0:41   ` [PATCH 7/9] Drivers: hv: vmbus: Fix a bug in rescind processing in vmbus_close_internal() K. Y. Srinivasan
2015-03-18  0:42   ` [PATCH 8/9] hv: hypervvssd: call endmntent before call setmntent again K. Y. Srinivasan
2015-03-18  0:42   ` [PATCH 9/9] Drivers: hv: vmbus: Export the vmbus_sendpacket_pagebuffer_ctl() K. Y. Srinivasan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1426639321-25458-3-git-send-email-kys@microsoft.com \
    --to=kys@microsoft.com \
    --cc=apw@canonical.com \
    --cc=devel@linuxdriverproject.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=olaf@aepfle.de \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox