From: Vitaly Kuznetsov
To: "Alex Ng (LIS)"
Cc: devel@linuxdriverproject.org, linux-kernel@vger.kernel.org, "Haiyang Zhang", KY Srinivasan
Subject: Re: [PATCH 2/4] Drivers: hv: balloon: account for gaps in hot add regions
References: <1470394147-21268-1-git-send-email-vkuznets@redhat.com> <1470394147-21268-3-git-send-email-vkuznets@redhat.com>
Date: Mon, 08 Aug 2016 11:38:11 +0200
In-Reply-To: (Alex Ng's message of "Sat, 6 Aug 2016 00:07:24 +0000")
Message-ID: <87wpjrob24.fsf@vitty.brq.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

"Alex Ng (LIS)" writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Friday, August 5, 2016 3:49 AM
>> To: devel@linuxdriverproject.org
>> Cc: linux-kernel@vger.kernel.org; Haiyang Zhang; KY Srinivasan;
>> Alex Ng (LIS)
>> Subject: [PATCH 2/4] Drivers: hv: balloon: account for gaps in hot add regions
>>
>> I'm observing the following hot add requests from the WS2012 host:
>>
>> hot_add_req: start_pfn = 0x108200 count = 330752
>> hot_add_req: start_pfn = 0x158e00 count = 193536
>> hot_add_req: start_pfn = 0x188400 count = 239616
>>
>> As the host doesn't specify hot add regions we're trying to create a
>> 128Mb-aligned region covering the first request, we create the
>> 0x108000 - 0x160000 region and we add 0x108000 - 0x158e00 memory. The
>> second request passes the pfn_covered() check, we enlarge the region
>> to 0x108000 - 0x190000 and add 0x158e00 - 0x188200 memory. The problem
>> emerges with the third request as it starts at 0x188400 so there is a
>> 0x200 gap which is not covered. As the end of our region is 0x190000
>> now it again passes the pfn_covered() check where we just adjust the
>> covered_end_pfn and make it 0x188400 instead of 0x188200 which means
>> that we'll try to online 0x188200-0x188400 pages but these pages were
>> never assigned to us and we crash.
>
> The fact that the host sent a request that's non-contiguous with the
> previous request is unexpected. Could we check to see the number of
> pages we returned in our response, after each request?
>
> I'm wondering if we may have given a wrong response to cause the host
> to follow-up with a gapped request.

It seems that is not the case. Here is the recorded session (address
format is hex, count is decimal):

[   66.851401] DM: hot_add_req: 108200 303104 0 0
-> we were asked to add 303104 pages ...
[   66.854420] DM: handle_pg_range: 108200 303104
[   84.489291] DM: handle_pg_range: return 303104
[   84.492498] DM: hot_add_req: ret 303104
-> and we returned '303104'

[  131.934542] DM: hot_add_req: 152200 221184 0 0
-> we were asked to add 221184 pages ...
[  131.937495] DM: handle_pg_range: 152200 221184
[  132.720390] DM: handle_pg_range: return 221184
[  132.722953] DM: hot_add_req: ret 221184
-> and we returned '221184'

[  132.958045] DM: hot_add_req: 188400 409088 0 0
-> and here we were asked to add pages with a gap
   (0x108200 + 303104 + 221184 = 0x188200 but as you can see the new
   range starts at 0x188400)
[  132.961409] DM: handle_pg_range: 188400 409088
[  134.012555] DM: handle_pg_range: return 409088
[  134.013862] DM: hot_add_req: ret 409088

So I don't see a flaw on the Linux side ...

-- 
  Vitaly
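
For anyone who wants to re-check the contiguity arithmetic from the session above, here is a rough sketch in Python. It is not driver code: find_gaps() and aligned_region() are hypothetical helper names made up for this illustration, and the 0x8000 pages-per-128Mb constant assumes 4 KiB pages.

```python
# Illustrative only: these helpers are not part of the hv_balloon driver.
# Assumption: 4 KiB pages, so a 128 MB alignment unit is 0x8000 PFNs.
PAGES_PER_128MB = 0x8000


def find_gaps(requests):
    """Return (expected_pfn, actual_pfn) for every request that does not
    start exactly where the previous request ended."""
    gaps = []
    expected = None
    for start_pfn, count in requests:
        if expected is not None and start_pfn != expected:
            gaps.append((expected, start_pfn))
        expected = start_pfn + count
    return gaps


def aligned_region(start_pfn, count, align=PAGES_PER_128MB):
    """Return the smallest align-aligned [start, end) PFN range that
    covers the request."""
    start = start_pfn - start_pfn % align
    end_pfn = start_pfn + count
    end = -(-end_pfn // align) * align  # round up to a multiple of align
    return start, end


# (start_pfn, page count) pairs taken from the recorded session.
session = [(0x108200, 303104), (0x152200, 221184), (0x188400, 409088)]

for expected, actual in find_gaps(session):
    print(f"gap: expected {expected:#x}, got {actual:#x} "
          f"({actual - expected:#x} pages missing)")
```

Only the third request is reported, with 0x188200 expected and 0x188400 received, which matches the 0x200 gap described in the patch. aligned_region(0x108200, 330752) gives 0x108000 - 0x160000, the initial region from the patch description.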