From: "Chen, Tiejun" <tiejun.chen@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: yang.z.zhang@intel.com, kevin.tian@intel.com, tim@xen.org,
xen-devel@lists.xen.org
Subject: Re: [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps
Date: Mon, 27 Oct 2014 16:09:22 +0800 [thread overview]
Message-ID: <544DFDB2.2010508@intel.com> (raw)
In-Reply-To: <544A84B10200007800042016@mail.emea.novell.com>
On 2014/10/24 22:56, Jan Beulich wrote:
>>>> On 24.10.14 at 09:34, <tiejun.chen@intel.com> wrote:
>> We need to check to reserve all reserved device memory maps in e820
>> to avoid any potential guest memory conflict.
>>
>> Currently, if we can't insert RDM entries directly, we may need to handle
>> several ranges as follows:
>> a. Fixed Ranges --> BUG()
>> lowmem_reserved_base-0xA0000: reserved by BIOS implementation,
>> BIOS region,
>> RESERVED_MEMBASE ~ 0x100000000,
>
> This seems conceptually wrong to me, and I said so before:
> Depending on host characteristics this approach may mean you're
> going to be unable to build any HVM guests. Minimally there needs
> to be a way to avoid these checks (resulting in devices associated
> with RMRRs not being assignable to such a guest). I'm therefore
I just use 'err' to indicate if these fixed range overlaps RMRR,
+ /* These overlap may issue guest can't work well. */
+ if ( err )
+ {
+ printf("Guest can't work with some reserved device memory
overlap!\n");
+ BUG();
+ }
As I understand, these fixed ranges don't like RAM that we can move
safely out any RMRR overlap. And actually its rare to overlap with those
fixed ranges.
But I can remove BUG if you insist on this point.
> only going to briefly look at the rest of this patch.
>
>> +static unsigned int construct_rdm_e820_maps(unsigned int next_e820_entry_index,
>> + uint32_t nr_map,
>> + struct
>> xen_mem_reserved_device_memory *map,
>> + struct e820entry *e820,
>> + unsigned int lowmem_reserved_base,
>> + unsigned int bios_image_base)
>> +{
>> + unsigned int i, j, sum_nr = next_e820_entry_index + nr_map;
>> + uint64_t start, end, next_start, rdm_start, rdm_end;
>> + uint32_t type;
>> + unsigned int insert = 0, do_insert = 0;
>> + int err = 0;
>> +
>> + do_real_construct:
>> + for ( i = 0; i < nr_map; i++ )
>> + {
>> + rdm_start = map[i].start_pfn << PAGE_SHIFT;
>> + rdm_end = rdm_start + (map[i].nr_pages << PAGE_SHIFT);
>> +
>> + for ( j = 0; j < next_e820_entry_index - 1; j++ )
>> + {
>> + start = e820[j].addr;
>> + end = e820[j].addr + e820[j].size;
>> + type = e820[j].type;
>> + next_start = e820[j+1].addr;
>> +
>> + /* lowmem_reserved_base-0xA0000: reserved by BIOS implementation. */
>> + if ( lowmem_reserved_base < 0xA0000 &&
>> + start == lowmem_reserved_base )
>> + {
>> + if ( rdm_start >= start && rdm_start <= end )
>> + {
>> + err = -1;
>> + break;
>> + }
>> + }
>> +
>> + /*
>> + * BIOS region.
>> + */
>> + if ( start == bios_image_base )
>> + {
>> + if ( rdm_start >= start && rdm_start <= end )
>> + {
>> + err = -1;
>> + break;
>> + }
>> + }
>> +
>> + /* The default memory map always occupy one fixed reserved
>> + * range: RESERVED_MEMBASE ~ 0x100000000
>> + */
>> + if ( rdm_start >= RESERVED_MEMBASE &&
>> + rdm_start <= ((uint64_t)1 << 32) )
>> + {
>> + err = -1;
>> + break;
>> + }
>> +
>> + /* Just amid those remaining e820 entries. */
>> + if ( (rdm_start > end) && (rdm_end < next_start) )
>> + {
>> + if ( do_insert )
>> + {
>> + memmove(&e820[j+2], &e820[j+1],
>> + (sum_nr - j - 1) * sizeof(struct e820entry));
>> +
>> + /* Then fill RMRR into that entry. */
>> + e820[j+1].addr = rdm_start;
>> + e820[j+1].size = rdm_end - rdm_start;
>> + e820[j+1].type = E820_RESERVED;
>> + next_e820_entry_index++;
>> + }
>> + insert++;
>> + }
>> + /* Already at the end. */
>> + else if ( (rdm_start > end) && !next_start )
>> + {
>> + if ( do_insert )
>> + {
>> + e820[next_e820_entry_index].addr = rdm_start;
>> + e820[next_e820_entry_index].size = rdm_end - rdm_start;
>> + e820[next_e820_entry_index].type = E820_RESERVED;
>> + next_e820_entry_index++;
>> + }
>> + insert++;
>> + }
>> + /* If completely overlap with one RAM range. */
>> + else if ( rdm_start == start && rdm_end == end && type == E820_RAM )
>
> Comment and expression disagree.
What about this?
/* If coincide with one RAM range. */
>
>> + {
>> + if ( do_insert )
>> + e820[j].type = E820_RESERVED;
>> + insert++;
>> + }
>> + /* If we're just alligned with start of one RAM range. */
>> + else if ( rdm_start == start && rdm_end < end && type == E820_RAM )
>> + {
>> + if ( do_insert )
>> + {
>> + memmove(&e820[j+1], &e820[j],
>> + (sum_nr - j) * sizeof(struct e820entry));
>> +
>> + e820[j+1].addr = rdm_end;
>> + e820[j+1].size = e820[j].addr + e820[j].size - rdm_end;
>> + e820[j+1].type = E820_RAM;
>> + next_e820_entry_index++;
>> +
>> + e820[j].addr = rdm_start;
>> + e820[j].size = rdm_end - rdm_start;
>> + e820[j].type = E820_RESERVED;
>> + }
>> + insert++;
>> + }
>> + /* If we're just alligned with end of one RAM range. */
>> + else if ( rdm_start > start && rdm_end == end && type == E820_RAM )
>> + {
>> + if ( do_insert )
>> + {
>> + memmove(&e820[j+1], &e820[j],
>> + (sum_nr - j) * sizeof(struct e820entry));
>> +
>> + e820[j].size = rdm_start - e820[j].addr;
>> + e820[j].type = E820_RAM;
>> +
>> + e820[j+1].addr = rdm_start;
>> + e820[j+1].size = rdm_end - rdm_start;
>> + e820[j+1].type = E820_RESERVED;
>> + next_e820_entry_index++;
>> + }
>> + insert++;
>> + }
>> + /* If we're just in of one RAM range */
>> + else if ( rdm_start > start && rdm_end < end && type == E820_RAM )
>> + {
>> + if ( do_insert )
>> + {
>> + memmove(&e820[j+2], &e820[j],
>> + (sum_nr - j) * sizeof(struct e820entry));
>> +
>> + e820[j+2].addr = rdm_end;
>> + e820[j+2].size = e820[j].addr + e820[j].size - rdm_end;
>> + e820[j+2].type = E820_RAM;
>> + next_e820_entry_index++;
>> +
>> + e820[j+1].addr = rdm_start;
>> + e820[j+1].size = rdm_end - rdm_start;
>> + e820[j+1].type = E820_RESERVED;
>> + next_e820_entry_index++;
>> +
>> + e820[j].size = rdm_start - e820[j].addr;
>> + e820[j].type = E820_RAM;
>> + }
>> + insert++;
>> + }
>> + /* If we're going last RAM:Hole range */
>> + else if ( end < next_start &&
>> + rdm_start > start &&
>> + rdm_end < next_start &&
>> + type == E820_RAM )
>> + {
>> + if ( do_insert )
>> + {
>> + memmove(&e820[j+1], &e820[j],
>> + (sum_nr - j) * sizeof(struct e820entry));
>> +
>> + e820[j].size = rdm_start - e820[j].addr;
>> + e820[j].type = E820_RAM;
>> +
>> + e820[j+1].addr = rdm_start;
>> + e820[j+1].size = rdm_end - rdm_start;
>> + e820[j+1].type = E820_RESERVED;
>> + next_e820_entry_index++;
>> + }
>> + insert++;
>> + }
>
> This if-else-if series looks horrible - is there really no way to consolidate
> it? Also, other than punching holes in the E820 map you don't seem to
I know this is ugly but as you know there's no any rule we can make good
use of this case. RMRR can start anywhere so We have to assume any
scenarios,
1. Just amid those remaining e820 entries.
2. Already at the end.
3. If coincide with one RAM range.
4. If we're just aligned with start of one RAM range.
5. If we're just aligned with end of one RAM range.
6. If we're just in of one RAM range.
7. If we're going last RAM:Hole range.
So if you think we're handling correctly, maybe we can continue
optimizing this way once we have a better idea.
> be doing anything here. And the earlier tools side patches didn't do
> anything about this either. Consequently, at the time where it may
> become necessary to establish the 1:1 mapping in the P2M, there'll
> be the RAM mapping still there, causing the device assignment to fail.
But I already set these range as p2m_access_n, and as you see I also
reserved these range in e820 table. So although the RAM mapping still is
still there but no any actual access.
Then when we assign device, we will override these p2m entry as 1:1
mapping if p2m_access_n is set.
> Again - did you _test_ this scenario with a big enough guest and your
> gfx card passed through?
Yes. I validate these patches as following configuration:
memory = 2816
gfx_passthru=1
pci=["00:02.0"]
And I also perform 'xl dmesg' to check e820 table. Here I can provide an
instance with overlap,
RMRR range:
root@tchen0-Shark-Bay-Client-platform:/home/tchen0/workspace# xl dmesg |
grep RMRR
(XEN) [VT-D]dmar.c:834: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:679: RMRR region: base_addr ab80a000 end_address
ab81dfff
(XEN) [VT-D]dmar.c:834: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:679: RMRR region: base_addr ad000000 end_address
af7fffff
root@tchen0-Shark-Bay-Client-platform:/home/tchen0/workspace#
Without my patch:
(d4) E820 table:
(d4) [00]: 00000000:00000000 - 00000000:0009e000: RAM
(d4) [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
(d4) HOLE: 00000000:000a0000 - 00000000:000e0000
(d4) [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
(d4) [03]: 00000000:00100000 - 00000000:ab80a000: RAM
(d4) [04]: 00000000:ab80a000 - 00000000:ab81e000: RESERVED
(d4) [05]: 00000000:ab81e000 - 00000000:ad000000: RAM
(d4) [06]: 00000000:ad000000 - 00000000:af800000: RESERVED
(d4) HOLE: 00000000:af800000 - 00000000:fc000000
(d4) [07]: 00000000:fc000000 - 00000001:00000000: RESERVED
With my patch:
(d2) f0000-fffff: Main BIOS
(d2) E820 table:
(d2) [00]: 00000000:00000000 - 00000000:0009e000: RAM
(d2) [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
(d2) HOLE: 00000000:000a0000 - 00000000:000e0000
(d2) [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
(d2) [03]: 00000000:00100000 - 00000000:ab80a000: RAM
(d2) [04]: 00000000:ab80a000 - 00000000:ab81e000: RESERVED
(d2) [05]: 00000000:ab81e000 - 00000000:ad000000: RAM
(d2) [06]: 00000000:ad000000 - 00000000:af800000: RESERVED
(d2) HOLE: 00000000:af800000 - 00000000:fc000000
(d2) [07]: 00000000:fc000000 - 00000000:fdffc000: RESERVED
(d2) [08]: 00000000:fdffc000 - 00000000:fdfff000: NVS
(d2) [09]: 00000000:fdfff000 - 00000001:00000000: RESERVED
(d2) Invoking ROMBIOS ...
Thanks
Tiejun
next prev parent reply other threads:[~2014-10-27 8:09 UTC|newest]
Thread overview: 180+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-24 7:34 [v7][RFC][PATCH 01/13] xen: RMRR fix Tiejun Chen
2014-10-24 7:34 ` [v7][RFC][PATCH 01/13] introduce XENMEM_reserved_device_memory_map Tiejun Chen
2014-10-24 14:11 ` Jan Beulich
2014-10-27 2:11 ` Chen, Tiejun
2014-10-27 2:18 ` Chen, Tiejun
2014-10-27 9:42 ` Jan Beulich
2014-10-28 2:22 ` Chen, Tiejun
2014-10-27 13:35 ` Julien Grall
2014-10-28 2:35 ` Chen, Tiejun
2014-10-28 10:36 ` Jan Beulich
2014-10-29 0:40 ` Chen, Tiejun
2014-10-29 8:53 ` Jan Beulich
2014-10-30 2:53 ` Chen, Tiejun
2014-10-30 9:10 ` Jan Beulich
2014-10-31 1:03 ` Chen, Tiejun
2014-10-24 7:34 ` [v7][RFC][PATCH 02/13] tools/libxc: introduce hypercall for xc_reserved_device_memory_map Tiejun Chen
2014-10-24 7:34 ` [v7][RFC][PATCH 03/13] tools/libxc: check if modules space is overlapping with reserved device memory Tiejun Chen
2014-10-24 7:34 ` [v7][RFC][PATCH 04/13] hvmloader/util: get reserved device memory maps Tiejun Chen
2014-10-24 14:22 ` Jan Beulich
2014-10-27 3:12 ` Chen, Tiejun
2014-10-27 9:45 ` Jan Beulich
2014-10-28 5:21 ` Chen, Tiejun
2014-10-28 9:48 ` Jan Beulich
2014-10-29 6:54 ` Chen, Tiejun
2014-10-29 9:05 ` Jan Beulich
2014-10-30 5:55 ` Chen, Tiejun
2014-10-30 9:13 ` Jan Beulich
2014-10-31 2:20 ` Chen, Tiejun
2014-10-31 8:14 ` Jan Beulich
2014-11-03 2:22 ` Chen, Tiejun
2014-11-03 8:53 ` Jan Beulich
2014-11-03 9:32 ` Chen, Tiejun
2014-11-03 9:45 ` Jan Beulich
2014-11-03 9:55 ` Chen, Tiejun
2014-11-03 10:02 ` Jan Beulich
2014-11-21 6:26 ` Chen, Tiejun
2014-11-21 7:43 ` Tian, Kevin
2014-11-21 7:54 ` Jan Beulich
2014-11-21 8:01 ` Tian, Kevin
2014-11-21 8:54 ` Chen, Tiejun
2014-11-21 9:33 ` Jan Beulich
2014-10-24 14:27 ` Jan Beulich
2014-10-27 5:07 ` Chen, Tiejun
2014-10-24 7:34 ` [v7][RFC][PATCH 05/13] hvmloader/mmio: reconcile guest mmio with reserved device memory Tiejun Chen
2014-10-24 14:42 ` Jan Beulich
2014-10-27 7:12 ` Chen, Tiejun
2014-10-27 9:56 ` Jan Beulich
2014-10-28 7:11 ` Chen, Tiejun
2014-10-28 9:56 ` Jan Beulich
2014-10-29 7:03 ` Chen, Tiejun
2014-10-29 9:08 ` Jan Beulich
2014-10-30 3:18 ` Chen, Tiejun
2014-10-24 7:34 ` [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps Tiejun Chen
2014-10-24 14:56 ` Jan Beulich
2014-10-27 8:09 ` Chen, Tiejun [this message]
2014-10-27 10:17 ` Jan Beulich
2014-10-28 7:47 ` Chen, Tiejun
2014-10-28 10:06 ` Jan Beulich
2014-10-29 7:43 ` Chen, Tiejun
2014-10-29 9:15 ` Jan Beulich
2014-10-30 3:11 ` Chen, Tiejun
2014-10-30 9:20 ` Jan Beulich
2014-10-31 5:41 ` Chen, Tiejun
2014-10-31 6:21 ` Tian, Kevin
2014-10-31 7:02 ` Chen, Tiejun
2014-10-31 8:20 ` Jan Beulich
2014-11-03 5:49 ` Chen, Tiejun
2014-11-03 8:56 ` Jan Beulich
2014-11-03 9:40 ` Chen, Tiejun
2014-11-03 9:51 ` Jan Beulich
2014-11-03 11:32 ` Chen, Tiejun
2014-11-03 11:43 ` Jan Beulich
2014-11-03 11:58 ` Chen, Tiejun
2014-11-03 12:34 ` Jan Beulich
2014-11-04 5:05 ` Chen, Tiejun
2014-11-04 7:54 ` Jan Beulich
2014-11-05 2:59 ` Chen, Tiejun
2014-11-05 17:00 ` Jan Beulich
2014-11-06 9:28 ` Chen, Tiejun
2014-11-06 10:06 ` Jan Beulich
2014-11-07 10:27 ` Chen, Tiejun
2014-11-07 11:08 ` Jan Beulich
2014-11-11 6:32 ` Chen, Tiejun
2014-11-11 7:49 ` Chen, Tiejun
2014-11-11 9:03 ` Jan Beulich
2014-11-11 9:06 ` Jan Beulich
2014-11-11 9:42 ` Chen, Tiejun
2014-11-11 10:07 ` Jan Beulich
2014-11-12 1:36 ` Chen, Tiejun
2014-11-12 8:37 ` Jan Beulich
2014-11-12 8:45 ` Chen, Tiejun
2014-11-12 9:02 ` Jan Beulich
2014-11-12 9:13 ` Chen, Tiejun
2014-11-12 9:56 ` Jan Beulich
2014-11-12 10:18 ` Chen, Tiejun
2014-11-19 8:17 ` Tian, Kevin
2014-11-20 7:45 ` Tian, Kevin
2014-11-20 8:04 ` Jan Beulich
2014-11-20 8:51 ` Tian, Kevin
2014-11-20 14:40 ` Tian, Kevin
2014-11-20 14:46 ` Jan Beulich
2014-11-20 20:11 ` Konrad Rzeszutek Wilk
2014-11-21 0:32 ` Tian, Kevin
2014-11-12 3:05 ` Chen, Tiejun
2014-11-12 8:55 ` Jan Beulich
2014-11-12 10:18 ` Chen, Tiejun
2014-11-12 10:24 ` Jan Beulich
2014-11-12 10:32 ` Chen, Tiejun
2014-11-13 3:09 ` Chen, Tiejun
2014-11-14 2:21 ` Chen, Tiejun
2014-11-14 8:21 ` Jan Beulich
2014-11-17 7:31 ` Chen, Tiejun
2014-11-17 7:57 ` Chen, Tiejun
2014-11-17 10:05 ` Jan Beulich
2014-11-17 11:08 ` Chen, Tiejun
2014-11-17 11:17 ` Jan Beulich
2014-11-17 11:32 ` Chen, Tiejun
2014-11-17 11:51 ` Jan Beulich
2014-11-18 3:08 ` Chen, Tiejun
2014-11-18 8:01 ` Jan Beulich
2014-11-18 8:16 ` Chen, Tiejun
2014-11-18 9:33 ` Jan Beulich
2014-11-19 1:26 ` Chen, Tiejun
2014-11-20 7:31 ` Jan Beulich
2014-11-20 8:12 ` Chen, Tiejun
2014-11-20 8:59 ` Jan Beulich
2014-11-20 10:28 ` Chen, Tiejun
2014-11-11 8:59 ` Jan Beulich
2014-11-11 9:35 ` Chen, Tiejun
2014-11-11 9:42 ` Jan Beulich
2014-11-11 9:51 ` Chen, Tiejun
2014-10-24 7:34 ` [v7][RFC][PATCH 07/13] xen/x86/p2m: introduce p2m_check_reserved_device_memory Tiejun Chen
2014-10-24 15:02 ` Jan Beulich
2014-10-27 8:50 ` Chen, Tiejun
2014-10-24 7:34 ` [v7][RFC][PATCH 08/13] xen/x86/p2m: set p2m_access_n for reserved device memory mapping Tiejun Chen
2014-10-24 15:11 ` Jan Beulich
2014-10-27 9:05 ` Chen, Tiejun
2014-10-27 10:33 ` Jan Beulich
2014-10-28 8:26 ` Chen, Tiejun
2014-10-28 10:12 ` Jan Beulich
2014-10-29 8:20 ` Chen, Tiejun
2014-10-29 9:20 ` Jan Beulich
2014-10-30 7:39 ` Chen, Tiejun
2014-10-30 9:24 ` Jan Beulich
2014-10-31 2:50 ` Chen, Tiejun
2014-10-31 8:25 ` Jan Beulich
2014-11-03 6:20 ` Chen, Tiejun
2014-11-03 9:00 ` Jan Beulich
2014-11-03 9:51 ` Chen, Tiejun
2014-11-03 10:03 ` Jan Beulich
2014-11-03 11:48 ` Chen, Tiejun
2014-11-03 11:53 ` Jan Beulich
2014-11-04 1:35 ` Chen, Tiejun
2014-11-04 8:02 ` Jan Beulich
2014-11-04 10:41 ` Chen, Tiejun
2014-11-04 11:41 ` Jan Beulich
2014-11-04 11:51 ` Chen, Tiejun
2014-10-24 7:34 ` [v7][RFC][PATCH 09/13] xen/x86/ept: handle reserved device memory in ept_handle_violation Tiejun Chen
2014-10-24 7:34 ` [v7][RFC][PATCH 10/13] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
2014-10-24 7:34 ` [v7][RFC][PATCH 11/13] xen:vtd: create RMRR mapping Tiejun Chen
2014-10-24 7:34 ` [v7][RFC][PATCH 12/13] xen/vtd: re-enable USB device assignment Tiejun Chen
2014-10-24 7:34 ` [v7][RFC][PATCH 13/13] xen/vtd: group assigned device with RMRR Tiejun Chen
2014-10-24 10:52 ` [v7][RFC][PATCH 01/13] xen: RMRR fix Jan Beulich
2014-10-27 2:00 ` Chen, Tiejun
2014-10-27 9:41 ` Jan Beulich
2014-10-28 8:36 ` Chen, Tiejun
2014-10-28 9:34 ` Jan Beulich
2014-10-28 9:39 ` Razvan Cojocaru
2014-10-29 0:51 ` Chen, Tiejun
2014-10-29 0:48 ` Chen, Tiejun
2014-10-29 2:51 ` Chen, Tiejun
2014-10-29 8:45 ` Jan Beulich
2014-10-30 8:21 ` Chen, Tiejun
2014-10-30 9:07 ` Jan Beulich
2014-10-31 3:11 ` Chen, Tiejun
2014-10-29 8:44 ` Jan Beulich
2014-10-30 2:51 ` Chen, Tiejun
2014-10-30 22:15 ` Tim Deegan
2014-10-31 2:53 ` Chen, Tiejun
2014-10-31 9:10 ` Tim Deegan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=544DFDB2.2010508@intel.com \
--to=tiejun.chen@intel.com \
--cc=JBeulich@suse.com \
--cc=kevin.tian@intel.com \
--cc=tim@xen.org \
--cc=xen-devel@lists.xen.org \
--cc=yang.z.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).