From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Chen, Tiejun" Subject: Re: [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps Date: Tue, 28 Oct 2014 15:47:05 +0800 Message-ID: <544F49F9.3070208@intel.com> References: <1414136077-18599-1-git-send-email-tiejun.chen@intel.com> <1414136077-18599-7-git-send-email-tiejun.chen@intel.com> <544A84B10200007800042016@mail.emea.novell.com> <544DFDB2.2010508@intel.com> <544E29C70200007800042595@mail.emea.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <544E29C70200007800042595@mail.emea.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: yang.z.zhang@intel.com, kevin.tian@intel.com, tim@xen.org, xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org On 2014/10/27 18:17, Jan Beulich wrote: >>>> On 27.10.14 at 09:09, wrote: >> On 2014/10/24 22:56, Jan Beulich wrote: >>>>>> On 24.10.14 at 09:34, wrote: >>>> We need to check to reserve all reserved device memory maps in e820 >>>> to avoid any potential guest memory conflict. >>>> >>>> Currently, if we can't insert RDM entries directly, we may need to handle >>>> several ranges as follows: >>>> a. Fixed Ranges --> BUG() >>>> lowmem_reserved_base-0xA0000: reserved by BIOS implementation, >>>> BIOS region, >>>> RESERVED_MEMBASE ~ 0x100000000, >>> >>> This seems conceptually wrong to me, and I said so before: >>> Depending on host characteristics this approach may mean you're >>> going to be unable to build any HVM guests. Minimally there needs >>> to be a way to avoid these checks (resulting in devices associated >>> with RMRRs not being assignable to such a guest). I'm therefore >> >> I just use 'err' to indicate if these fixed range overlaps RMRR, >> >> + /* These overlap may issue guest can't work well. */ >> + if ( err ) >> + { >> + printf("Guest can't work with some reserved device memory overlap!\n"); >> + BUG(); >> + } >> >> As I understand, these fixed ranges don't like RAM that we can move >> safely out any RMRR overlap. And actually its rare to overlap with those >> fixed ranges. > > Again - one of my systems has RMRRs in the Ex000 range, which > certainly risks overlapping with the BIOS image should that one be > larger than 64k. Plus with RMRRs being in that region, I can > certainly see (physical) systems with small enough BIOS images > to place RMRRs even in the low Fx000 range, which then quite > certainly would overlap with the (virtual) BIOS range. > >> But I can remove BUG if you insist on this point. > > Whether removing the BUG() here is correct and/or sufficient to > address my concern I can't immediately tell. What I insist on is that Okay. > _no matter_ what RMRRs a physical host has, it should not prevent > the creation of guests (the worst that may result is that passing > through certain devices doesn't work anymore, and even then the > operator needs to be given a way of circumventing this if (s)he > knows that the device won't access the range post-boot, or if it's > being deemed acceptable for it to do so). As we know just legacy USB and GFX need these RMRR ranges. Especially, I believe just USB need << 1M space, so it may be possible to be placed below 1M. But I think we can ask BIOS to reallocate them upwards like my real platform, RMRR region: base_addr ab80a000 end_address ab81dfff I don't know what platform you're using, maybe its a legacy machine? But anyway it should be feasible to update BIOS. And even we can ask BIOS do this as a normal rule in the future. For GFX, oftentimes it need dozens of MB, RMRR region: base_addr ad000000 end_address af7fffff So it shouldn't be overlapped with <1M. > >>>> + /* If we're going last RAM:Hole range */ >>>> + else if ( end < next_start && >>>> + rdm_start > start && >>>> + rdm_end < next_start && >>>> + type == E820_RAM ) >>>> + { >>>> + if ( do_insert ) >>>> + { >>>> + memmove(&e820[j+1], &e820[j], >>>> + (sum_nr - j) * sizeof(struct e820entry)); >>>> + >>>> + e820[j].size = rdm_start - e820[j].addr; >>>> + e820[j].type = E820_RAM; >>>> + >>>> + e820[j+1].addr = rdm_start; >>>> + e820[j+1].size = rdm_end - rdm_start; >>>> + e820[j+1].type = E820_RESERVED; >>>> + next_e820_entry_index++; >>>> + } >>>> + insert++; >>>> + } >>> >>> This if-else-if series looks horrible - is there really no way to consolidate >>> it? Also, other than punching holes in the E820 map you don't seem to >> >> I know this is ugly but as you know there's no any rule we can make good >> use of this case. RMRR can start anywhere so We have to assume any >> scenarios, >> >> 1. Just amid those remaining e820 entries. >> 2. Already at the end. >> 3. If coincide with one RAM range. >> 4. If we're just aligned with start of one RAM range. >> 5. If we're just aligned with end of one RAM range. >> 6. If we're just in of one RAM range. >> 7. If we're going last RAM:Hole range. >> >> So if you think we're handling correctly, maybe we can continue >> optimizing this way once we have a better idea. > > I understand that there are various cases to be considered, but > that's no different elsewhere. For example, look at > xen/arch/x86/e820.c:e820_change_range_type() which gets I don't think this circumstance is same as our requirement. Here we are trying to insert different multiple entries that they have different range. Anyway, I can take a further look at if we can improve this. > away with quite a bit shorter an if/else-if sequence. > >>> be doing anything here. And the earlier tools side patches didn't do >>> anything about this either. Consequently, at the time where it may >>> become necessary to establish the 1:1 mapping in the P2M, there'll >>> be the RAM mapping still there, causing the device assignment to fail. >> >> But I already set these range as p2m_access_n, and as you see I also >> reserved these range in e820 table. So although the RAM mapping still is >> still there but no any actual access. > > That's being done in patch 8, but we're talking about patch 6 here. > Also - what size are the RMRRs in your case? The USB ones I know RMRR region: base_addr ab80a000 end_address ab81dfff RMRR region: base_addr ad000000 end_address af7fffff > of are typical single or very few page ones, so having the guest > lose that amount of memory may be tolerable. But if the ranges can > get any larger than a couple of pages, or if there can reasonably be > a larger amount of them (like could be the case on e.g. multi-node > systems), simply hiding that memory may not be well received by > our users. Customers may accept dozens of MB but I'm not sure. > >> RMRR range: >> >> root@tchen0-Shark-Bay-Client-platform:/home/tchen0/workspace# xl dmesg | >> grep RMRR >> (XEN) [VT-D]dmar.c:834: found ACPI_DMAR_RMRR: >> (XEN) [VT-D]dmar.c:679: RMRR region: base_addr ab80a000 end_address ab81dfff >> (XEN) [VT-D]dmar.c:834: found ACPI_DMAR_RMRR: >> (XEN) [VT-D]dmar.c:679: RMRR region: base_addr ad000000 end_address af7fffff >> root@tchen0-Shark-Bay-Client-platform:/home/tchen0/workspace# >> >> Without my patch: >> >> (d4) E820 table: >> (d4) [00]: 00000000:00000000 - 00000000:0009e000: RAM >> (d4) [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED >> (d4) HOLE: 00000000:000a0000 - 00000000:000e0000 >> (d4) [02]: 00000000:000e0000 - 00000000:00100000: RESERVED >> (d4) [03]: 00000000:00100000 - 00000000:ab80a000: RAM >> (d4) [04]: 00000000:ab80a000 - 00000000:ab81e000: RESERVED >> (d4) [05]: 00000000:ab81e000 - 00000000:ad000000: RAM >> (d4) [06]: 00000000:ad000000 - 00000000:af800000: RESERVED > > Where would this reserved range come from when you patches > aren't in place? > >> (d4) HOLE: 00000000:af800000 - 00000000:fc000000 >> (d4) [07]: 00000000:fc000000 - 00000001:00000000: RESERVED >> >> >> With my patch: >> >> (d2) f0000-fffff: Main BIOS >> (d2) E820 table: >> (d2) [00]: 00000000:00000000 - 00000000:0009e000: RAM >> (d2) [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED >> (d2) HOLE: 00000000:000a0000 - 00000000:000e0000 >> (d2) [02]: 00000000:000e0000 - 00000000:00100000: RESERVED >> (d2) [03]: 00000000:00100000 - 00000000:ab80a000: RAM >> (d2) [04]: 00000000:ab80a000 - 00000000:ab81e000: RESERVED >> (d2) [05]: 00000000:ab81e000 - 00000000:ad000000: RAM >> (d2) [06]: 00000000:ad000000 - 00000000:af800000: RESERVED > > And this already answers what I asked above: You shouldn't be blindly > hiding 40Mb from the guest. If we don't reserve these RMRR ranges, so guest may create 1:1 mapping. Then it will affect a device usage in other VM, or a device usage may corrupt these ranges in other VM. Yes, we really need a policy to do this. So please tell me what you expect. Thanks Tiejun