public inbox for kvm@vger.kernel.org
* [0/3] -reserved-ram for PCI passthrough without VT-d and without paravirt
@ 2008-03-31 16:55 Andrea Arcangeli
  2008-03-31 17:02 ` [1/3] " Andrea Arcangeli
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2008-03-31 16:55 UTC (permalink / raw)
  To: kvm-devel

Hello,

These three patches (one against the host kernel, one against kvm.git,
one against kvm-userland.git) force KVM to map all RAM listed in the
virtualized e820 map provided to the guest with gfn = hfn, i.e. with
the guest frame number equal to the host frame number. In turn it's
now possible to give the guest direct hardware access: all DMA will
work fine on the virtualized guest ram.

The bios has to be updated to alter the end of the first ram slot in
the virtualized e820 map. There is no way around this, as the address
hardcoded in the current bios is higher than what's marked as ram in
my hardware e820 map.
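
Patch [2/3] passes that boundary to the bios through CMOS registers
0x15/0x16 as a size in KiB instead of hardcoding 0x9fc00. A minimal
sketch of the round trip follows; the cmos[] array and both helper
names are stand-ins invented for illustration, not the qemu or rombios
API.

```c
#include <stdint.h>

/* Hypothetical stand-in for the RTC/CMOS register file. */
static uint8_t cmos[128];

/* qemu side: store base memory in KiB, low byte at 0x15, high at 0x16. */
static void cmos_store_base_mem(uint32_t end_bytes)
{
    uint32_t kib = end_bytes / 1024;

    cmos[0x15] = kib & 0xff;
    cmos[0x16] = (kib >> 8) & 0xff;
}

/* bios side: rebuild the end of the first e820 ram slot, in bytes. */
static uint32_t cmos_load_below_640_end(void)
{
    uint32_t end = cmos[0x16];

    end <<= 8;
    end |= cmos[0x15];
    return end * 1024;
}
```

The encoding has 1 KiB granularity, so any boundary that is not a
multiple of 1024 bytes would be rounded down by this sketch.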

The only exception where gfn != hfn for ranges included in the
virtualized e820 map is the magic bios page at host physical address
zero (bytes 0 to 4095). No linux version will ever attempt to dma to
such a page. If all OSes behave like linux there will be no problem
and pci passthrough will work regardless of the guest OS, without
requiring any paravirtualization or VT-d.

This only implements the memory management side; the logic to map mmio
regions into the kvm address space will require further changes. The
limit of the reserved ram is around 1G and it has to be set at compile
time, so the guest will run with no more than 1G of ram (it's fairly
easy to extend it to 2G though). You can't run more than one guest
with -reserved-ram at once or they'll overwrite each other's memory.
You need access to /dev/mem on the userland side, and CAP_SYS_ADMIN on
the kernel side, to run this.

I chose the approach that requires the minimal number of changes,
given that this is a short-term solution made necessary by the lack of
hardware features in lots of cpus out there.

This is what the memory layout looks like while a live guest runs
(physical start set to 512M and kvm -m 300).

7f108d429000-7f108d42b000 rw-p 7f108d429000 00:00 0
7f108d42b000-7f108d4ba000 rw-s 00000000 00:0e 275                        /dev/mem
7f108d4ba000-7f108d52a000 rw-p 7f108d4ba000 00:00 0
7f108d52a000-7f10a002a000 rw-s 00100000 00:0e 275                        /dev/mem
7f10a002a000-7f10a142e000 rw-p 7f10a002a000 00:00 0

  PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 5522 ?        SLl    4:06      1  1568 427067 33144  7.0 bin/x86_64/kvm/bin/qemu-system-x86_64 -hda tmp/vir

RSS doesn't include the reserved ram, of course.
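
The two /dev/mem mappings in the listing can be cross-checked with a
little arithmetic. The sketch below assumes the low reserved range is
0x1000-0x90000 and the high one starts at 1M, as in the /proc/iomem
output posted with [3/3]; the struct and function names are made up
for this example.

```c
#include <stdint.h>

struct vma { uint64_t start, end; };

/* The two /dev/mem lines from the maps listing. */
static const struct vma low_vma  = { 0x7f108d42b000ULL, 0x7f108d4ba000ULL };
static const struct vma high_vma = { 0x7f108d52a000ULL, 0x7f10a002a000ULL };

/* Assumed values: kvm -m 300 and the reserved ranges from /proc/iomem. */
static const uint64_t guest_ram  = 300ULL << 20;
static const uint64_t low_start  = 0x1000;   /* page zero stays anonymous */
static const uint64_t low_end    = 0x90000;
static const uint64_t high_start = 0x100000;

static uint64_t vma_len(struct vma v)
{
    return v.end - v.start;
}

/* Userland address that corresponds to guest physical address 0. */
static uint64_t area_base(void)
{
    return low_vma.start - low_start;
}

/* Nonzero when both mappings have the sizes and placement expected. */
static int layout_is_consistent(void)
{
    return vma_len(low_vma) == low_end - low_start &&
           vma_len(high_vma) == guest_ram - high_start &&
           area_base() + high_start == high_vma.start;
}
```

Under those assumptions the low mapping is 0x8f000 bytes and the high
one is 0x12b00000 bytes (300M minus the first megabyte).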

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace


* [1/3] -reserved-ram for PCI passthrough without VT-d and without paravirt
  2008-03-31 16:55 [0/3] -reserved-ram for PCI passthrough without VT-d and without paravirt Andrea Arcangeli
@ 2008-03-31 17:02 ` Andrea Arcangeli
  2008-03-31 17:07 ` [2/3] " Andrea Arcangeli
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2008-03-31 17:02 UTC (permalink / raw)
  To: kvm-devel

This is the kvm.git patch to enable -reserved-ram (without this, kvm
will simply fail gracefully when it tries to emulate the illegal
instruction inside the bad_page). This trick avoids altering the ioctl
api with libkvm: in short, if get_user_pages fails on a host kernel
with the reserved ram config option enabled, it checks whether a
remap_pfn_range mapping is backing the memslot. In that case it checks
whether the ram was reserved with page_count 0, and if so it disables
reference counting, as those pages are invisible to linux. As long as
pfn_valid returns true, pfn_to_page should be safe, so should the
memmap array be allocated with holes corresponding to the holes
generated in the e820 map, bad_page will simply be returned
gracefully, with no more risk than if this patch weren't applied to
kvm.git.
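
The reference counting rule can be modelled in userspace as follows.
This is only a toy sketch: struct toy_page and the toy_* helpers are
invented here and merely mimic the page_is_reserved() checks the patch
wraps around get_page() and put_page().

```c
#include <stddef.h>

/* Toy stand-in for struct page: only the reference count matters. */
struct toy_page {
    int count;
};

/* Reserved ram was never handed to the allocator, so its count is 0. */
static int toy_page_is_reserved(const struct toy_page *page)
{
    return page->count == 0;
}

/* Take a reference only on pages that linux actually manages. */
static void toy_get(struct toy_page *page)
{
    if (!toy_page_is_reserved(page))
        page->count++;
}

/* Drop a reference only on pages that linux actually manages. */
static void toy_put(struct toy_page *page)
{
    if (!toy_page_is_reserved(page))
        page->count--;
}
```

A count of zero is a usable marker only because the page is reached
through a VM_PFNMAP mapping in the first place; a normal page found by
get_user_pages always holds at least one reference.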

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index e9ae5db..e7a9c82 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -263,7 +263,8 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 	npage = vcpu->arch.update_pte.page;
 	if (!npage)
 		return;
-	get_page(npage);
+	if (!page_is_reserved(npage))
+		get_page(npage);
 	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
 		     gpte & PT_DIRTY_MASK, NULL, largepage, gpte_to_gfn(gpte),
 		     npage, true);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4e1436..7f087ac 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -281,6 +281,18 @@ static inline void kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 	set_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests);
 }
 
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+static inline int page_is_reserved(struct page * page)
+{
+	return !page_count(page);
+}
+#else /* CONFIG_RESERVE_PHYSICAL_START */
+static inline int page_is_reserved(struct page * page)
+{
+	return 0;
+}
+#endif /* CONFIG_RESERVE_PHYSICAL_START */
+
 enum kvm_stat_kind {
 	KVM_STAT_VM,
 	KVM_STAT_VCPU,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 30bf832..50a7b3e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -498,6 +524,65 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 	return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE);
 }
 
+
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+static struct page *direct_page(struct mm_struct *mm,
+				unsigned long address)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *ptep, pte;
+	spinlock_t *ptl;
+	struct page *page;
+	struct vm_area_struct *vma;
+	unsigned long pfn;
+
+	page = NULL;
+	if (!capable(CAP_SYS_ADMIN)) /* go safe */
+		goto out;
+
+	vma = find_vma(current->mm, address);
+	if (!vma || vma->vm_start > address ||
+	    !(vma->vm_flags & VM_PFNMAP))
+		goto out;
+
+	pgd = pgd_offset(mm, address);
+	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
+		goto out;
+
+	pud = pud_offset(pgd, address);
+	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
+		goto out;
+	
+	pmd = pmd_offset(pud, address);
+	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
+		goto out;
+
+	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
+	if (!ptep)
+		goto out;
+
+	pte = *ptep;
+	if (!pte_present(pte))
+		goto unlock;
+
+	pfn = pte_pfn(pte);
+	if (!pfn_valid(pfn))
+		goto unlock;
+
+	page = pfn_to_page(pfn);
+	if (!page_is_reserved(page)) {
+		page = NULL;
+		goto unlock;
+	}
+unlock:
+	pte_unmap_unlock(ptep, ptl);
+out:
+	return page;
+}
+#endif /* CONFIG_RESERVE_PHYSICAL_START */
+
 /*
  * Requires current->mm->mmap_sem to be held
  */
@@ -519,6 +604,11 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 				NULL);
 
 	if (npages != 1) {
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+		page[0] = direct_page(current->mm, addr);
+		if (page[0])
+			return page[0];
+#endif
 		get_page(bad_page);
 		return bad_page;
 	}
@@ -530,15 +620,18 @@ EXPORT_SYMBOL_GPL(gfn_to_page);
 
 void kvm_release_page_clean(struct page *page)
 {
-	put_page(page);
+	if (!page_is_reserved(page))
+		put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_page_dirty(struct page *page)
 {
-	if (!PageReserved(page))
-		SetPageDirty(page);
-	put_page(page);
+	if (!page_is_reserved(page)) {
+		if (!PageReserved(page))
+			SetPageDirty(page);
+		put_page(page);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
 


* [2/3] -reserved-ram for PCI passthrough without VT-d and without paravirt
  2008-03-31 16:55 [0/3] -reserved-ram for PCI passthrough without VT-d and without paravirt Andrea Arcangeli
  2008-03-31 17:02 ` [1/3] " Andrea Arcangeli
@ 2008-03-31 17:07 ` Andrea Arcangeli
  2008-03-31 17:20 ` [3/3] -reserved-ram for PCI passthrough without iommu " Andrea Arcangeli
  2008-04-11 12:13 ` [0/3] -reserved-ram for PCI passthrough without VT-d " Amit Shah
  3 siblings, 0 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2008-03-31 17:07 UTC (permalink / raw)
  To: kvm-devel

This is the kvm-userland.git patch that overwrites the ranges in the
virtualized e820 map with /dev/mem mappings. Everything is validated
through /proc/iomem, so should the hardware e820 map be weird there
will be zero risk of corruption: the guest will simply fail to start
up with a verbose error.

The bios has to be rebuilt so that it is passed the variable address
near 640k at which to stop the first virtualized e820 slot. That
address depends on the ram available in the host and on the
early-reserve for things like the smp trampoline page, which we don't
want to pass to the guest as available ram. Only the page at address
zero is magic: it's mapped as ram in the guest but it's allocated
through regular anonymous memory, as you can see from the first
/dev/mem mapping starting at area+reserved[0]. To rebuild the bios,
"make bios" before "make install" should do the trick. If you don't
rebuild the bios everything will work fine as long as you don't use
pci passthrough, but with pci passthrough it will randomly corrupt
host memory.
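
The /proc/iomem validation can be sketched like this. It is not the
code from the patch: find_range() is a hypothetical helper that parses
each line with sscanf rather than stepping back a fixed 20 characters
from the label, and it works on a text buffer so it can be tried
without a patched host kernel.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan /proc/iomem-style text for the first range labelled exactly
 * "reserved RAM" that starts at or above `above`, ends at or below
 * `below` and spans at least `min_size` bytes (0 disables a limit).
 * On success *start is inclusive and *end is exclusive.
 */
static int find_range(const char *text, uint64_t above, uint64_t below,
                      uint64_t min_size, uint64_t *start, uint64_t *end)
{
    const char *line = text;

    while (line && *line) {
        unsigned long long s, e;
        char name[64];

        /* "%63[^\n]" captures the label up to the end of the line. */
        if (sscanf(line, " %llx-%llx : %63[^\n]", &s, &e, name) == 3 &&
            strcmp(name, "reserved RAM") == 0 &&
            (!above || s >= above) && (!below || e <= below) &&
            (!min_size || e - s >= min_size)) {
            *start = s;
            *end = e + 1;   /* /proc/iomem range ends are inclusive */
            return 1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}
```

Matching the whole label keeps "reserved RAM failed" ranges out, which
is what the trailing newline in the patch's search string achieves.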

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>

diff --git a/bios/rombios.c b/bios/rombios.c
index 318de57..f93a6c6 100644
--- a/bios/rombios.c
+++ b/bios/rombios.c
@@ -4251,6 +4251,7 @@ int15_function32(regs, ES, DS, FLAGS)
   Bit32u  extra_lowbits_memory_size=0;
   Bit16u  CX,DX;
   Bit8u   extra_highbits_memory_size=0;
+  Bit32u  below_640_end;
 
 BX_DEBUG_INT15("int15 AX=%04x\n",regs.u.r16.ax);
 
@@ -4305,6 +4306,11 @@ ASM_END
          case 0x20: // coded by osmaker aka K.J.
             if(regs.u.r32.edx == 0x534D4150)
             {
+                below_640_end = inb_cmos(0x16);
+                below_640_end <<= 8;
+                below_640_end |= inb_cmos(0x15);
+                below_640_end *= 1024;
+
                 extended_memory_size = inb_cmos(0x35);
                 extended_memory_size <<= 8;
                 extended_memory_size |= inb_cmos(0x34);
@@ -4334,7 +4340,7 @@ ASM_END
                 {
                     case 0:
                         set_e820_range(ES, regs.u.r16.di,
-                                       0x0000000L, 0x0009fc00L, 0, 0, 1);
+                                       0x0000000L, below_640_end, 0, 0, 1);
                         regs.u.r32.ebx = 1;
                         regs.u.r32.eax = 0x534D4150;
                         regs.u.r32.ecx = 0x14;
@@ -4343,7 +4349,7 @@ ASM_END
                         break;
                     case 1:
                         set_e820_range(ES, regs.u.r16.di,
-                                       0x0009fc00L, 0x000a0000L, 0, 0, 2);
+                                       below_640_end, 0x000a0000L, 0, 0, 2);
                         regs.u.r32.ebx = 2;
                         regs.u.r32.eax = 0x534D4150;
                         regs.u.r32.ecx = 0x14;
diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index 0d2e6c3..a6b28c8 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -198,6 +198,8 @@ static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
 
     /* memory size */
     val = 640; /* base memory in K */
+    if (reserved_ram)
+	    val = reserved[1] / 1024;
     rtc_set_memory(s, 0x15, val);
     rtc_set_memory(s, 0x16, val >> 8);
 
diff --git a/qemu/pc-bios/bios.bin b/qemu/pc-bios/bios.bin
index 2e7d3e0..90d626d 100644
Binary files a/qemu/pc-bios/bios.bin and b/qemu/pc-bios/bios.bin differ
diff --git a/qemu/sysemu.h b/qemu/sysemu.h
index c728605..db0dda4 100644
--- a/qemu/sysemu.h
+++ b/qemu/sysemu.h
@@ -103,6 +103,8 @@ extern int autostart;
 extern int old_param;
 extern int hpagesize;
 extern const char *bootp_filename;
+extern int reserved_ram;
+extern int64_t reserved[4];
 
 
 #ifdef USE_KQEMU
diff --git a/qemu/vl.c b/qemu/vl.c
index 3570388..31adc90 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -240,6 +240,8 @@ int time_drift_fix = 0;
 unsigned int kvm_shadow_memory = 0;
 const char *mem_path = NULL;
 int hpagesize = 0;
+int reserved_ram = 0;
+int64_t reserved[4];
 const char *cpu_vendor_string;
 #ifdef TARGET_ARM
 int old_param = 0;
@@ -8313,6 +8315,7 @@ enum {
     QEMU_OPTION_tdf,
     QEMU_OPTION_kvm_shadow_memory,
     QEMU_OPTION_mempath,
+    QEMU_OPTION_reserved_ram,
 };
 
 typedef struct QEMUOption {
@@ -8439,6 +8442,7 @@ const QEMUOption qemu_options[] = {
     { "clock", HAS_ARG, QEMU_OPTION_clock },
     { "startdate", HAS_ARG, QEMU_OPTION_startdate },
     { "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+    { "reserved-ram", 0, QEMU_OPTION_reserved_ram },
     { NULL },
 };
 
@@ -8724,6 +8728,80 @@ static int gethugepagesize(void)
     return hugepagesize;
 }
 
+static int find_reserved_ram(int64_t *_start, int64_t *_end,
+			     unsigned long below, unsigned long above,
+			     unsigned long min_size)
+{
+    int ret, fd;
+    char buf[4096];
+    char *needle = "reserved RAM\n";
+//    char *needle = "System RAM\n";
+    char *size, *curr;
+    int64_t start, end;
+
+    fd = open("/proc/iomem", O_RDONLY);
+    if (fd < 0) {
+	perror("open");
+	exit(1);
+    }
+
+    ret = read(fd, buf, sizeof(buf)-1);
+    if (ret < 0) {
+	perror("read");
+	exit(1);
+    }
+    buf[ret] = 0;
+
+    size = buf;
+    while (1) {
+	    size = strstr(size, needle);
+	    if (!size)
+		    return 0;
+	    size += strlen(needle);
+	    curr = size - strlen(needle) - 20;
+	    start = strtoll(curr, &curr, 16);
+	    end = strtoll(curr+1, NULL, 16);
+	    if ((!above || start >= above) && (!below || end <= below) &&
+		(!min_size || end-start >= min_size)) {
+		    *_start = start;
+		    *_end = end+1;
+		    return 1;
+	    }
+    }
+}
+
+static void init_reserved_ram(void)
+{
+	if (find_reserved_ram(&reserved[0], &reserved[1],
+			      640*1024, 0, 500*1024) &&
+	    find_reserved_ram(&reserved[2], &reserved[3],
+			      0, 1024*1024, 1024*1024)) {
+		reserved_ram = 1;
+		if (reserved[0] != 4096) {
+			fprintf(stderr,
+				"strange host ram layout\n");
+			exit(1);
+		}
+		if (reserved[2] != 1024*1024) {
+			fprintf(stderr,
+				"strange host ram layout\n");
+			exit(1);
+		}
+		if (reserved[3] < ram_size) {
+			fprintf(stderr,
+				"not enough host reserved ram, decrease -m\n");
+			exit(1);
+		}
+		reserved[1] &= TARGET_PAGE_MASK;
+		//printf("reserved RAM %lx-%lx %lx-%lx\n",
+		//       reserved[0], reserved[1], reserved[2], reserved[3]);
+	} else {
+		fprintf(stderr,
+			"host reserved ram not found\n");
+		exit(1);
+	}
+}
+
 void *alloc_mem_area(unsigned long memory, const char *path)
 {
     char *filename;
@@ -8768,10 +8846,43 @@ void *qemu_alloc_physram(unsigned long memory)
 {
     void *area = NULL;
 
-    if (mem_path)
+    if (!area && mem_path)
 	area = alloc_mem_area(memory, mem_path);
-    if (!area)
+    if (!area) {
 	area = qemu_vmalloc(memory);
+	if (reserved_ram) {
+	    int fd;
+	    if (memory < reserved[2]) {
+		printf("memory < reserved[2]\n");
+		return NULL;
+	    }
+	    fd = open("/dev/mem", O_RDWR);
+	    if (fd < 0) {
+		perror("reserved_ram requires access to /dev/mem");
+		return NULL;
+	    }
+	    if (mmap((char *)area+reserved[0],
+		reserved[1]-reserved[0],
+		     PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED,
+		     fd, 0) == MAP_FAILED) {
+		    perror("reserved_ram mmap failed on /dev/mem");
+		    return NULL;
+	    }
+	    bzero((char *)area+reserved[0], reserved[1]-reserved[0]);
+	    if (mmap((char *)area+reserved[2],
+		     ram_size-reserved[2],
+		     PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED,
+		     fd, reserved[2]) == MAP_FAILED) {
+		    perror("reserved_ram mmap failed on /dev/mem");
+		    return NULL;
+	    }
+	    bzero((char *)area+reserved[2], ram_size-reserved[2]);
+	    if (close(fd) < 0) {
+		    perror("/dev/mem");
+		    return NULL;
+	    }
+	}
+    }
 
     return area;
 }
@@ -9389,6 +9500,9 @@ int main(int argc, char **argv)
             case QEMU_OPTION_mempath:
 		mem_path = optarg;
 		break;
+            case QEMU_OPTION_reserved_ram:
+		init_reserved_ram();
+		break;
             case QEMU_OPTION_name:
                 qemu_name = optarg;
                 break;



* [3/3] -reserved-ram for PCI passthrough without iommu and without paravirt
  2008-03-31 16:55 [0/3] -reserved-ram for PCI passthrough without VT-d and without paravirt Andrea Arcangeli
  2008-03-31 17:02 ` [1/3] " Andrea Arcangeli
  2008-03-31 17:07 ` [2/3] " Andrea Arcangeli
@ 2008-03-31 17:20 ` Andrea Arcangeli
  2008-04-11 12:13 ` [0/3] -reserved-ram for PCI passthrough without VT-d " Amit Shah
  3 siblings, 0 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2008-03-31 17:20 UTC (permalink / raw)
  To: kvm-devel; +Cc: Andrew Morton, linux-mm, linux-kernel

Hello,

The "reserved RAM" can be mapped by virtualization software with
/dev/mem to create a 1:1 mapping between guest physical (bus) address
and host physical (bus) address. Please let me know if something like
this can be merged in -mm (this is the minimal possible change to
achieve the feature). The part at the end is more of a fix, but it's
only required with this applied (unless you want to have a kexec above
~40M).
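
The last hunks of the patch shift KERNEL_IMAGE_SIZE and MODULES_VADDR
by __PHYSICAL_OFFSET. Here is a small worked sketch of that
arithmetic; the function names are made up, and the 512M physical
start is simply the example value used in the cover letter.

```c
#include <stdint.h>

/* Distance between the chosen physical start and the 2M default. */
static uint64_t physical_offset(uint64_t physical_start)
{
    return physical_start - 0x200000ULL;
}

/* 128 MB kernel mapping, grown by the offset as in the patch. */
static uint64_t kernel_image_size(uint64_t physical_start)
{
    return 128ULL * 1024 * 1024 + physical_offset(physical_start);
}

/* Module area start, pushed up by the same offset. */
static uint64_t modules_vaddr(uint64_t physical_start)
{
    return 0xffffffff88000000ULL + physical_offset(physical_start);
}
```

With the default 0x200000 start the offset is zero and both values are
unchanged; with a 512M start the kernel mapping grows to 0x27e00000
bytes and the module area begins at 0xffffffffa7e00000.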

Here is the complete patchset:

http://marc.info/?l=kvm-devel&m=120698256716369&w=2
http://marc.info/?l=kvm-devel&m=120698299317253&w=2
http://marc.info/?l=kvm-devel&m=120698328617835&w=2

Note that current mainline is buggy, so the patch below should also be
applied with -R for the host kernel to boot. Hopefully the regression
will be solved sooner rather than later (no reply yet though).

http://marc.info/?l=kvm-devel&m=120673375913890&w=2

andrea@svm ~ $ cat /proc/iomem | head
00000000-00000fff : reserved RAM failed
00001000-0008ffff : reserved RAM
00090000-00091fff : reserved RAM failed
00092000-0009efff : reserved RAM
0009f000-0009ffff : reserved
000cd600-000cffff : pnp 00:0d
000f0000-000fffff : reserved
00100000-1fffffff : reserved RAM
20000000-3dedffff : System RAM
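
The split at 0x20000000 in this listing is what the copy_e820_map()
hunk below produces. A userspace sketch of that rule follows, under
the assumption that the bios reported 0x100000-0x3dee0000 as one ram
range; the enum, struct and function names are invented for the
example.

```c
#include <stdint.h>

enum region_type { RESERVED_RAM, SYSTEM_RAM };

struct region {
    uint64_t start, size;
    enum region_type type;
};

/*
 * Split one bios ram range at physical_start. Everything below the
 * boundary becomes reserved ram, the rest stays system ram.
 * Returns the number of regions written to out (1 or 2).
 */
static int split_ram(uint64_t start, uint64_t size,
                     uint64_t physical_start, struct region out[2])
{
    uint64_t end = start + size;
    int n = 0;

    if (end <= physical_start) {
        out[n++] = (struct region){ start, size, RESERVED_RAM };
        return n;
    }
    if (start < physical_start) {
        out[n++] = (struct region){ start, physical_start - start,
                                    RESERVED_RAM };
        size -= physical_start - start;
        start = physical_start;
    }
    out[n++] = (struct region){ start, size, SYSTEM_RAM };
    return n;
}
```

Ranges that the early reservations still occupy below the boundary are
handled separately in the patch and show up as "reserved RAM failed".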

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1107,8 +1107,36 @@ config CRASH_DUMP
 	  (CONFIG_RELOCATABLE=y).
 	  For more details see Documentation/kdump/kdump.txt
 
+config RESERVE_PHYSICAL_START
+	bool "Reserve all RAM below PHYSICAL_START (EXPERIMENTAL)"
+	depends on !RELOCATABLE && X86_64
+	help
+	  This makes the kernel use only RAM above __PHYSICAL_START.
+	  All memory below __PHYSICAL_START will be left unused and
+	  marked as "reserved RAM" in /proc/iomem. The few special
+	  pages that can't be relocated at addresses above
+	  __PHYSICAL_START and that can't be guaranteed to be unused
+	  by the running kernel will be marked "reserved RAM failed"
+	  in /proc/iomem. Those may or may not be used by the kernel
+	  (for example SMP trampoline pages would only be used if
+	  CPU hotplug is enabled).
+
+	  The "reserved RAM" can be mapped by virtualization software
+	  with /dev/mem to create a 1:1 mapping between guest physical
+	  (bus) address and host physical (bus) address. This will
+	  allow PCI passthrough with DMA for the guest using the RAM
+	  with the 1:1 mapping. The only detail to take care of is the
+	  RAM marked "reserved RAM failed". The virtualization
+	  software must create for the guest an e820 map that only
+	  includes the "reserved RAM" regions but if the guest touches
+	  memory with guest physical address in the "reserved RAM
+	  failed" ranges (Linux guest will do that even if the RAM
+	  isn't present in the e820 map), it should provide that as
+	  RAM and map it with a non-linear mapping. This should allow
+	  any Linux kernel to run fine and hopefully any other OS too.
+
 config PHYSICAL_START
-	hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
+	hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP || RESERVE_PHYSICAL_START)
 	default "0x1000000" if X86_NUMAQ
 	default "0x200000" if X86_64
 	default "0x100000"
diff --git a/arch/x86/kernel/e820_64.c b/arch/x86/kernel/e820_64.c
--- a/arch/x86/kernel/e820_64.c
+++ b/arch/x86/kernel/e820_64.c
@@ -91,6 +91,11 @@ void __init early_res_to_bootmem(void)
 		printk(KERN_INFO "early res: %d [%lx-%lx] %s\n", i,
 			r->start, r->end - 1, r->name);
 		reserve_bootmem_generic(r->start, r->end - r->start);
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+		if (r->start < __PHYSICAL_START)
+			add_memory_region(r->start, r->end - r->start,
+					  E820_RESERVED_RAM_FAILED);
+#endif			
 	}
 }
 
@@ -231,6 +236,10 @@ void __init e820_reserve_resources(struc
 		struct resource *data_resource, struct resource *bss_resource)
 {
 	int i;
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+	/* solve E820_RESERVED_RAM vs E820_RESERVED_RAM_FAILED conflicts */
+	update_e820();
+#endif
 	for (i = 0; i < e820.nr_map; i++) {
 		struct resource *res;
 		res = alloc_bootmem_low(sizeof(struct resource));
@@ -238,6 +247,16 @@ void __init e820_reserve_resources(struc
 		case E820_RAM:	res->name = "System RAM"; break;
 		case E820_ACPI:	res->name = "ACPI Tables"; break;
 		case E820_NVS:	res->name = "ACPI Non-volatile Storage"; break;
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+		case E820_RESERVED_RAM_FAILED:
+			res->name = "reserved RAM failed";
+			break;
+		case E820_RESERVED_RAM:
+			memset(__va(e820.map[i].addr),
+			       POISON_FREE_INITMEM, e820.map[i].size);
+			res->name = "reserved RAM";
+			break;
+#endif
 		default:	res->name = "reserved";
 		}
 		res->start = e820.map[i].addr;
@@ -410,6 +429,14 @@ static void __init e820_print_map(char *
 		case E820_NVS:
 			printk(KERN_CONT "(ACPI NVS)\n");
 			break;
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+		case E820_RESERVED_RAM:
+			printk(KERN_CONT "(reserved RAM)\n");
+			break;
+		case E820_RESERVED_RAM_FAILED:
+			printk(KERN_CONT "(reserved RAM failed)\n");
+			break;
+#endif
 		default:
 			printk(KERN_CONT "type %u\n", e820.map[i].type);
 			break;
@@ -639,9 +666,31 @@ static int __init copy_e820_map(struct e
 		unsigned long end = start + size;
 		unsigned long type = biosmap->type;
 
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+		/* make space for two more low-prio types */
+		type += 2;
+#endif
+
 		/* Overflow in 64 bits? Ignore the memory map. */
 		if (start > end)
 			return -1;
+
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+		if (type == E820_RAM) {
+			if (end <= __PHYSICAL_START) {
+				add_memory_region(start, size,
+						  E820_RESERVED_RAM);
+				continue;
+			}
+			if (start < __PHYSICAL_START) {
+				add_memory_region(start,
+						  __PHYSICAL_START-start,
+						  E820_RESERVED_RAM);
+				size -= __PHYSICAL_START-start;
+				start = __PHYSICAL_START;
+			}
+		}
+#endif
 
 		add_memory_region(start, size, type);
 	} while (biosmap++, --nr_map);
diff --git a/include/asm-x86/e820.h b/include/asm-x86/e820.h
--- a/include/asm-x86/e820.h
+++ b/include/asm-x86/e820.h
@@ -4,10 +4,19 @@
 #define E820MAX	128		/* number of entries in E820MAP */
 #define E820NR	0x1e8		/* # entries in E820MAP */
 
+#ifdef CONFIG_RESERVE_PHYSICAL_START
+#define E820_RESERVED_RAM 1
+#define E820_RESERVED_RAM_FAILED 2
+#define E820_RAM	3
+#define E820_RESERVED	4
+#define E820_ACPI	5
+#define E820_NVS	6
+#else
 #define E820_RAM	1
 #define E820_RESERVED	2
 #define E820_ACPI	3
 #define E820_NVS	4
+#endif
 
 #ifndef __ASSEMBLY__
 struct e820entry {
diff --git a/include/asm-x86/page_64.h b/include/asm-x86/page_64.h
--- a/include/asm-x86/page_64.h
+++ b/include/asm-x86/page_64.h
@@ -29,6 +29,7 @@
 #define __PAGE_OFFSET           _AC(0xffff810000000000, UL)
 
 #define __PHYSICAL_START	CONFIG_PHYSICAL_START
+#define __PHYSICAL_OFFSET	(__PHYSICAL_START-0x200000)
 #define __KERNEL_ALIGN		0x200000
 
 /*
@@ -51,7 +52,7 @@
  * Kernel image size is limited to 128 MB (see level2_kernel_pgt in
  * arch/x86/kernel/head_64.S), and it is mapped here:
  */
-#define KERNEL_IMAGE_SIZE	(128*1024*1024)
+#define KERNEL_IMAGE_SIZE	(128*1024*1024+__PHYSICAL_OFFSET)
 #define KERNEL_IMAGE_START	_AC(0xffffffff80000000, UL)
 
 #ifndef __ASSEMBLY__
diff --git a/include/asm-x86/pgtable_64.h b/include/asm-x86/pgtable_64.h
--- a/include/asm-x86/pgtable_64.h
+++ b/include/asm-x86/pgtable_64.h
@@ -140,7 +140,7 @@ static inline void native_pgd_clear(pgd_
 #define VMALLOC_START    _AC(0xffffc20000000000, UL)
 #define VMALLOC_END      _AC(0xffffe1ffffffffff, UL)
 #define VMEMMAP_START	 _AC(0xffffe20000000000, UL)
-#define MODULES_VADDR    _AC(0xffffffff88000000, UL)
+#define MODULES_VADDR    (0xffffffff88000000UL+__PHYSICAL_OFFSET)
 #define MODULES_END      _AC(0xfffffffffff00000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
diff --git a/include/asm-x86/smp_64.h b/include/asm-x86/smp_64.h
--- a/include/asm-x86/smp_64.h
+++ b/include/asm-x86/smp_64.h
@@ -47,7 +47,11 @@ static inline int cpu_present_to_apicid(
 
 #ifdef CONFIG_SMP
 
+#ifndef CONFIG_RESERVE_PHYSICAL_START
 #define SMP_TRAMPOLINE_BASE 0x6000
+#else
+#define SMP_TRAMPOLINE_BASE 0x90000 /* move it next to 640k */
+#endif
 
 extern int __cpu_disable(void);
 extern void __cpu_die(unsigned int cpu);


* Re: [0/3] -reserved-ram for PCI passthrough without VT-d and without paravirt
  2008-03-31 16:55 [0/3] -reserved-ram for PCI passthrough without VT-d and without paravirt Andrea Arcangeli
                   ` (2 preceding siblings ...)
  2008-03-31 17:20 ` [3/3] -reserved-ram for PCI passthrough without iommu " Andrea Arcangeli
@ 2008-04-11 12:13 ` Amit Shah
  2008-04-11 18:36   ` [0/3] -reserved-ram for PCI passthrough without VT-d and without?paravirt Andrea Arcangeli
  3 siblings, 1 reply; 8+ messages in thread
From: Amit Shah @ 2008-04-11 12:13 UTC (permalink / raw)
  To: kvm-devel

Hi Andrea,

Did you have to recompile the bios? How did you do that (or did you ask Avi to 
generate it?) Do you have a binary of the bios that I can use to test 
reserved ram?

Thanks,
Amit

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone


* Re: [0/3] -reserved-ram for PCI passthrough without VT-d and without?paravirt
  2008-04-11 12:13 ` [0/3] -reserved-ram for PCI passthrough without VT-d " Amit Shah
@ 2008-04-11 18:36   ` Andrea Arcangeli
  2008-04-12  7:41     ` Amit Shah
  0 siblings, 1 reply; 8+ messages in thread
From: Andrea Arcangeli @ 2008-04-11 18:36 UTC (permalink / raw)
  To: Amit Shah; +Cc: kvm-devel

On Fri, Apr 11, 2008 at 05:43:03PM +0530, Amit Shah wrote:
> Hi Andrea,
> 
> Did you have to recompile the bios? How did you do that (or did you ask Avi to 

Yes.

> generate it?) Do you have a binary of the bios that I can use to test 
> reserved ram?

make bios; make install should do the trick, the new bios should run
after that.

thanks,
Andrea


* Re: [0/3] -reserved-ram for PCI passthrough without VT-d and without?paravirt
  2008-04-11 18:36   ` [0/3] -reserved-ram for PCI passthrough without VT-d and without?paravirt Andrea Arcangeli
@ 2008-04-12  7:41     ` Amit Shah
  2008-04-12 12:22       ` Andrea Arcangeli
  0 siblings, 1 reply; 8+ messages in thread
From: Amit Shah @ 2008-04-12  7:41 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: kvm-devel, Avi Kivity

* On Saturday 12 Apr 2008 00:06:32 Andrea Arcangeli wrote:
> On Fri, Apr 11, 2008 at 05:43:03PM +0530, Amit Shah wrote:
> > Hi Andrea,
> >
> > Did you have to recompile the bios? How did you do that (or did you ask
> > Avi to
>
> Yes.
>
> > generate it?) Do you have a binary of the bios that I can use to test
> > reserved ram?
>
> make bios; make install should do the trick, the new bios should run
> after that.

Well, bcc and iasl on my machine have some problems.

Avi: do we have a machine where we have this setup for me to compile the bios?

Amit


* Re: [0/3] -reserved-ram for PCI passthrough without VT-d and without?paravirt
  2008-04-12  7:41     ` Amit Shah
@ 2008-04-12 12:22       ` Andrea Arcangeli
  0 siblings, 0 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2008-04-12 12:22 UTC (permalink / raw)
  To: Amit Shah; +Cc: kvm-devel, Avi Kivity

On Sat, Apr 12, 2008 at 01:11:27PM +0530, Amit Shah wrote:
> Well, bcc and iasl on my machine have some problems.

The versions I'm using are:

andrea@duo ~ $ esearch iasl dev86
[ Results for search key : iasl ]
[ Applications found : 1 ]

*  sys-power/iasl
      Latest version available: 20060912
      Latest version installed: 20060912
      Size of downloaded files: [no/bad digest]
      Homepage:    http://www.intel.com/technology/iapc/acpi/
      Description: Intel ACPI Source Language (ASL) compiler
      License:     iASL


[ Results for search key : dev86 ]
[ Applications found : 1 ]

*  sys-devel/dev86
      Latest version available: 0.16.17-r5
      Latest version installed: 0.16.17-r5
      Size of downloaded files: [no/bad digest]
      Homepage:    http://www.cix.co.uk/~mayday
      Description: Bruce's C compiler - Simple C compiler to generate
      8086 code
      License:     GPL-2



They worked fine.

> Avi: do we have a machine where we have this setup for me to compile the bios?

I will send the patched bios to you by email in the meantime.
