* [Qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface
@ 2008-01-31 22:36 Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 1/6] Use correct types to enable > 2G support Anthony Liguori
From: Anthony Liguori @ 2008-01-31 22:36 UTC
To: qemu-devel; +Cc: kvm-devel, Paul Brook
KVM is a Linux kernel interface that exposes hardware-accelerated
virtualization to userspace. It has been included in the mainline kernel since
2.6.20 and supports Intel VT and AMD-V. Ports are under way for ia64, embedded
PowerPC, and s390.
This set of patches provides basic support for KVM in QEMU. It does not include
all of the changes in the KVM QEMU branch (such as virtio, live migration,
extboot, etc.). However, if we can get these first portions merged, I will
follow up with the remaining changes, and I believe we can be fully merged in
the very near future.
The first 5 patches of this series are not KVM specific but are critical fixes
for KVM to be functional. The 6th patch provides KVM support. The goal in
providing KVM support is to make sure that when KVM support is not compiled in,
the code paths aren't changed at all. I hope this makes it very easy to merge.
KVM moves very quickly, so I'd appreciate it if these patches could be reviewed
as soon as possible; it's going to be tough to keep them in sync with the main
KVM tree while they're out of tree.
To enable KVM support, you must have libkvm installed. You should also
explicitly specify the location of your kernel tree (with KVM headers) with the
--kernel-path option. We will improve libkvm so that this isn't required in
future versions.
KVM also has an enhanced Bochs BIOS. I've tested these patches without it, and
it's not strictly necessary for basic functionality. I would still recommend
pulling in a copy of it, though, as it has useful fixes even in the absence of
KVM.
A very large number of people have contributed to these patches, with Avi
Kivity being the main developer of this support. For a full list of
contributors, please consult the KVM ChangeLog[1].
[1] http://kvm.qumranet.com/kvmwiki/ChangeLog
* [Qemu-devel] [PATCH 1/6] Use correct types to enable > 2G support
2008-01-31 22:36 [Qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface Anthony Liguori
@ 2008-01-31 22:36 ` Anthony Liguori
2008-01-31 23:54 ` [Qemu-devel] " Paul Brook
2008-01-31 22:36 ` [Qemu-devel] [PATCH 2/6] SCI fixes Anthony Liguori
From: Anthony Liguori @ 2008-01-31 22:36 UTC
To: qemu-devel; +Cc: kvm-devel, Paul Brook
KVM supports more than 2GB of memory for x86_64 hosts. The following patch
fixes a number of type-related issues where int was used for memory sizes and
offsets that can exceed 2GB. It also introduces CMOS support (registers
0x5b-0x5d) so the BIOS can build the appropriate e820 tables for memory above
4GB.
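A minimal C sketch of the sizing arithmetic in this patch. The helper names are hypothetical; the 0xe0000000 split point and the >> 16 / >> 24 / >> 32 CMOS encoding are taken from the pc.c hunks below. It shows why the megabyte count has to be widened before multiplying, and how RAM past 3.5GB is relocated above the 4GB boundary and reported to the BIOS in 64KB units.

```c
#include <stdint.h>

/* Split point used by pc_init1(): RAM at or beyond 3.5GB is moved above 4GB. */
#define BELOW_4G_LIMIT 0xe0000000ULL

/* Convert a -m argument (in MB) to bytes. Widening to 64 bits before the
 * multiply avoids the 32-bit overflow that a plain int expression hits. */
static int64_t ram_mb_to_bytes(int mb)
{
    return (int64_t)mb * 1024 * 1024;
}

/* Split total guest RAM into the part mapped below 4GB and the part
 * mapped starting at 0x100000000 (hypothetical helper). */
static void split_ram(uint64_t ram_size, uint64_t *below, uint64_t *above)
{
    if (ram_size >= BELOW_4G_LIMIT) {
        *below = BELOW_4G_LIMIT;
        *above = ram_size - BELOW_4G_LIMIT;
    } else {
        *below = ram_size;
        *above = 0;
    }
}

/* Encode the above-4GB size for CMOS 0x5b..0x5d as a 24-bit count of
 * 64KB units, low byte first, mirroring the shifts in cmos_init(). */
static void encode_above_4g(uint64_t above, uint8_t cmos[3])
{
    cmos[0] = (uint8_t)(above >> 16);   /* register 0x5b */
    cmos[1] = (uint8_t)(above >> 24);   /* register 0x5c */
    cmos[2] = (uint8_t)(above >> 32);   /* register 0x5d */
}
```

For a 4096MB guest, split_ram() leaves 0xe0000000 bytes below 4GB and 512MB (0x20000000) above it, which encodes as the bytes 0x00, 0x20, 0x00.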
Index: qemu/cpu-all.h
===================================================================
--- qemu.orig/cpu-all.h 2008-01-30 13:47:00.000000000 -0600
+++ qemu/cpu-all.h 2008-01-30 13:47:31.000000000 -0600
@@ -695,7 +695,7 @@
/* page related stuff */
-#define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
+#define TARGET_PAGE_SIZE (1ul << TARGET_PAGE_BITS)
#define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1)
#define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)
@@ -816,7 +816,7 @@
/* memory API */
-extern int phys_ram_size;
+extern ram_addr_t phys_ram_size;
extern int phys_ram_fd;
extern uint8_t *phys_ram_base;
extern uint8_t *phys_ram_dirty;
@@ -844,7 +844,7 @@
unsigned long size,
unsigned long phys_offset);
uint32_t cpu_get_physical_page_desc(target_phys_addr_t addr);
-ram_addr_t qemu_ram_alloc(unsigned int size);
+ram_addr_t qemu_ram_alloc(unsigned long size);
void qemu_ram_free(ram_addr_t addr);
int cpu_register_io_memory(int io_index,
CPUReadMemoryFunc **mem_read,
Index: qemu/exec.c
===================================================================
--- qemu.orig/exec.c 2008-01-30 13:47:00.000000000 -0600
+++ qemu/exec.c 2008-01-30 13:47:31.000000000 -0600
@@ -73,9 +73,11 @@
#define TARGET_VIRT_ADDR_SPACE_BITS 42
#elif defined(TARGET_PPC64)
#define TARGET_PHYS_ADDR_SPACE_BITS 42
-#else
+#elif USE_KQEMU
/* Note: for compatibility with kqemu, we use 32 bits for x86_64 */
#define TARGET_PHYS_ADDR_SPACE_BITS 32
+#else
+#define TARGET_PHYS_ADDR_SPACE_BITS 42
#endif
TranslationBlock tbs[CODE_GEN_MAX_BLOCKS];
@@ -87,7 +89,7 @@
uint8_t code_gen_buffer[CODE_GEN_BUFFER_SIZE] __attribute__((aligned (32)));
uint8_t *code_gen_ptr;
-int phys_ram_size;
+ram_addr_t phys_ram_size;
int phys_ram_fd;
uint8_t *phys_ram_base;
uint8_t *phys_ram_dirty;
@@ -112,7 +114,7 @@
typedef struct PhysPageDesc {
/* offset in host memory of the page + io_index in the low 12 bits */
- uint32_t phys_offset;
+ ram_addr_t phys_offset;
} PhysPageDesc;
#define L2_BITS 10
@@ -2083,11 +2085,11 @@
}
/* XXX: better than nothing */
-ram_addr_t qemu_ram_alloc(unsigned int size)
+ram_addr_t qemu_ram_alloc(unsigned long size)
{
ram_addr_t addr;
if ((phys_ram_alloc_offset + size) >= phys_ram_size) {
- fprintf(stderr, "Not enough memory (requested_size = %u, max memory = %d)\n",
+ fprintf(stderr, "Not enough memory (requested_size = %lu, max memory = %d)\n",
size, phys_ram_size);
abort();
}
Index: qemu/hw/boards.h
===================================================================
--- qemu.orig/hw/boards.h 2008-01-30 13:47:00.000000000 -0600
+++ qemu/hw/boards.h 2008-01-30 13:47:31.000000000 -0600
@@ -3,7 +3,7 @@
#ifndef HW_BOARDS_H
#define HW_BOARDS_H
-typedef void QEMUMachineInitFunc(int ram_size, int vga_ram_size,
+typedef void QEMUMachineInitFunc(ram_addr_t ram_size, int vga_ram_size,
const char *boot_device, DisplayState *ds,
const char *kernel_filename,
const char *kernel_cmdline,
Index: qemu/hw/pc.c
===================================================================
--- qemu.orig/hw/pc.c 2008-01-30 13:47:00.000000000 -0600
+++ qemu/hw/pc.c 2008-01-30 13:47:31.000000000 -0600
@@ -181,7 +181,8 @@
}
/* hd_table must contain 4 block drivers */
-static void cmos_init(int ram_size, const char *boot_device, BlockDriverState **hd_table)
+static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
+ const char *boot_device, BlockDriverState **hd_table)
{
RTCState *s = rtc_state;
int nbds, bds[3] = { 0, };
@@ -204,6 +205,12 @@
rtc_set_memory(s, 0x30, val);
rtc_set_memory(s, 0x31, val >> 8);
+ if (above_4g_mem_size) {
+ rtc_set_memory(s, 0x5b, (unsigned int)above_4g_mem_size >> 16);
+ rtc_set_memory(s, 0x5c, (unsigned int)above_4g_mem_size >> 24);
+ rtc_set_memory(s, 0x5d, above_4g_mem_size >> 32);
+ }
+
if (ram_size > (16 * 1024 * 1024))
val = (ram_size / 65536) - ((16 * 1024 * 1024) / 65536);
else
@@ -697,7 +704,7 @@
}
/* PC hardware initialisation */
-static void pc_init1(int ram_size, int vga_ram_size,
+static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
const char *boot_device, DisplayState *ds,
const char *kernel_filename, const char *kernel_cmdline,
const char *initrd_filename,
@@ -706,6 +713,7 @@
char buf[1024];
int ret, linux_boot, i;
ram_addr_t ram_addr, vga_ram_addr, bios_offset, vga_bios_offset;
+ ram_addr_t above_4g_mem_size = 0;
int bios_size, isa_bios_size, vga_bios_size;
PCIBus *pci_bus;
int piix3_devfn = -1;
@@ -717,6 +725,11 @@
BlockDriverState *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
BlockDriverState *fd[MAX_FD];
+ if (ram_size >= 0xe0000000 ) {
+ above_4g_mem_size = ram_size - 0xe0000000;
+ ram_size = 0xe0000000;
+ }
+
linux_boot = (kernel_filename != NULL);
/* init CPUs */
@@ -790,6 +803,12 @@
exit(1);
}
+ /* above 4giga memory allocation */
+ if (above_4g_mem_size > 0) {
+ ram_addr = qemu_ram_alloc(above_4g_mem_size);
+ cpu_register_physical_memory(0x100000000, above_4g_mem_size, ram_addr);
+ }
+
/* setup basic memory access */
cpu_register_physical_memory(0xc0000, 0x10000,
vga_bios_offset | IO_MEM_ROM);
@@ -970,7 +989,7 @@
}
floppy_controller = fdctrl_init(i8259[6], 2, 0, 0x3f0, fd);
- cmos_init(ram_size, boot_device, hd);
+ cmos_init(ram_size, above_4g_mem_size, boot_device, hd);
if (pci_enabled && usb_enabled) {
usb_uhci_piix3_init(pci_bus, piix3_devfn + 2);
@@ -1010,7 +1029,7 @@
}
}
-static void pc_init_pci(int ram_size, int vga_ram_size,
+static void pc_init_pci(ram_addr_t ram_size, int vga_ram_size,
const char *boot_device, DisplayState *ds,
const char *kernel_filename,
const char *kernel_cmdline,
@@ -1022,7 +1041,7 @@
initrd_filename, 1, cpu_model);
}
-static void pc_init_isa(int ram_size, int vga_ram_size,
+static void pc_init_isa(ram_addr_t ram_size, int vga_ram_size,
const char *boot_device, DisplayState *ds,
const char *kernel_filename,
const char *kernel_cmdline,
Index: qemu/osdep.c
===================================================================
--- qemu.orig/osdep.c 2008-01-30 13:47:00.000000000 -0600
+++ qemu/osdep.c 2008-01-30 13:47:31.000000000 -0600
@@ -113,7 +113,7 @@
int64_t free_space;
int ram_mb;
- extern int ram_size;
+ extern int64_t ram_size;
free_space = (int64_t)stfs.f_bavail * stfs.f_bsize;
if ((ram_size + 8192 * 1024) >= free_space) {
ram_mb = (ram_size / (1024 * 1024));
@@ -202,7 +202,7 @@
#ifdef _BSD
return valloc(size);
#else
- return memalign(4096, size);
+ return memalign(TARGET_PAGE_SIZE, size);
#endif
}
Index: qemu/sysemu.h
===================================================================
--- qemu.orig/sysemu.h 2008-01-30 13:47:00.000000000 -0600
+++ qemu/sysemu.h 2008-01-30 13:47:31.000000000 -0600
@@ -69,7 +69,7 @@
/* SLIRP */
void do_info_slirp(void);
-extern int ram_size;
+extern int64_t ram_size;
extern int bios_size;
extern int rtc_utc;
extern int rtc_start_date;
Index: qemu/vl.c
===================================================================
--- qemu.orig/vl.c 2008-01-30 13:47:00.000000000 -0600
+++ qemu/vl.c 2008-01-30 13:47:31.000000000 -0600
@@ -142,7 +142,11 @@
//#define DEBUG_UNUSED_IOPORT
//#define DEBUG_IOPORT
+#if HOST_LONG_BITS < 64
#define PHYS_RAM_MAX_SIZE (2047 * 1024 * 1024)
+#else
+#define PHYS_RAM_MAX_SIZE (2047 * 1024 * 1024 * 1024ULL)
+#endif
#ifdef TARGET_PPC
#define DEFAULT_RAM_SIZE 144
@@ -174,7 +178,7 @@
int nographic;
const char* keyboard_layout = NULL;
int64_t ticks_per_sec;
-int ram_size;
+int64_t ram_size;
int pit_min_timer_count = 0;
int nb_nics;
NICInfo nd_table[MAX_NICS];
@@ -8460,7 +8464,7 @@
help(0);
break;
case QEMU_OPTION_m:
- ram_size = atoi(optarg) * 1024 * 1024;
+ ram_size = (int64_t)atoi(optarg) * 1024 * 1024;
if (ram_size <= 0)
help(1);
if (ram_size > PHYS_RAM_MAX_SIZE) {
* [Qemu-devel] [PATCH 2/6] SCI fixes
2008-01-31 22:36 [Qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 1/6] Use correct types to enable > 2G support Anthony Liguori
@ 2008-01-31 22:36 ` Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 3/6] Fix daemonize options Anthony Liguori
From: Anthony Liguori @ 2008-01-31 22:36 UTC
To: qemu-devel; +Cc: kvm-devel, Paul Brook
KVM supports using ACPI to shut down guests. Enabling this requires some fixes
so that the SCI interrupt can be generated, along with the appropriate
plumbing.
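A minimal model of the SCI logic touched by this patch: the interrupt line is level-triggered and asserted while any enabled PM1 event is pending, and a power-button press is only latched if the guest has enabled that event. RTC_EN matches acpi.c; the other bit positions are assumed from the ACPI PM1 register layout, and the cut-down struct is a hypothetical stand-in for PIIX4PMState.

```c
#include <stdbool.h>
#include <stdint.h>

/* PM1 event bits. RTC_EN is from acpi.c; the rest are assumed from the
 * ACPI PM1 status/enable layout. */
#define TMROF_EN   (1 << 0)
#define GBL_EN     (1 << 5)
#define PWRBTN_EN  (1 << 8)
#define RTC_EN     (1 << 10)

/* Hypothetical stand-in for PIIX4PMState: only the two registers used. */
struct pm_regs {
    uint16_t pmsts;  /* status: bits set when events occur */
    uint16_t pmen;   /* enable: bits the guest OS has armed */
};

/* SCI level: high while any armed event is pending. */
static bool sci_level(const struct pm_regs *s)
{
    return ((s->pmsts & s->pmen) &
            (RTC_EN | PWRBTN_EN | GBL_EN | TMROF_EN)) != 0;
}

/* Power-button press, modeled on the qemu_system_powerdown() added here:
 * latch the status bit only if the guest enabled it, then recompute SCI. */
static bool press_power_button(struct pm_regs *s)
{
    if (s->pmen & PWRBTN_EN)
        s->pmsts |= PWRBTN_EN;
    return sci_level(s);
}
```

A guest that never enabled PWRBTN_EN sees no interrupt and no status change, which is why the powerdown request is a no-op until the guest's ACPI driver is running.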
Index: qemu/hw/acpi.c
===================================================================
--- qemu.orig/hw/acpi.c 2008-01-30 13:47:00.000000000 -0600
+++ qemu/hw/acpi.c 2008-01-30 13:47:37.000000000 -0600
@@ -49,6 +49,7 @@
uint8_t smb_data1;
uint8_t smb_data[32];
uint8_t smb_index;
+ qemu_irq irq;
} PIIX4PMState;
#define RTC_EN (1 << 10)
@@ -71,6 +72,8 @@
#define SMBHSTDAT1 0x06
#define SMBBLKDAT 0x07
+PIIX4PMState *pm_state;
+
static uint32_t get_pmtmr(PIIX4PMState *s)
{
uint32_t d;
@@ -97,11 +100,12 @@
pmsts = get_pmsts(s);
sci_level = (((pmsts & s->pmen) &
(RTC_EN | PWRBTN_EN | GBL_EN | TMROF_EN)) != 0);
- qemu_set_irq(s->dev.irq[0], sci_level);
+ qemu_set_irq(s->irq, sci_level);
/* schedule a timer interruption if needed */
if ((s->pmen & TMROF_EN) && !(pmsts & TMROF_EN)) {
expire_time = muldiv64(s->tmr_overflow_time, ticks_per_sec, PM_FREQ);
qemu_mod_timer(s->tmr_timer, expire_time);
+ s->tmr_overflow_time += 0x800000;
} else {
qemu_del_timer(s->tmr_timer);
}
@@ -467,7 +471,8 @@
return 0;
}
-i2c_bus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base)
+i2c_bus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
+ qemu_irq sci_irq)
{
PIIX4PMState *s;
uint8_t *pci_conf;
@@ -475,6 +480,7 @@
s = (PIIX4PMState *)pci_register_device(bus,
"PM", sizeof(PIIX4PMState),
devfn, NULL, pm_write_config);
+ pm_state = s;
pci_conf = s->dev.config;
pci_conf[0x00] = 0x86;
pci_conf[0x01] = 0x80;
@@ -514,5 +520,16 @@
register_savevm("piix4_pm", 0, 1, pm_save, pm_load, s);
s->smbus = i2c_init_bus();
+ s->irq = sci_irq;
return s->smbus;
}
+
+#if defined(TARGET_I386)
+void qemu_system_powerdown(void)
+{
+ if(pm_state->pmen & PWRBTN_EN) {
+ pm_state->pmsts |= PWRBTN_EN;
+ pm_update_sci(pm_state);
+ }
+}
+#endif
Index: qemu/hw/mips_malta.c
===================================================================
--- qemu.orig/hw/mips_malta.c 2008-01-30 13:47:00.000000000 -0600
+++ qemu/hw/mips_malta.c 2008-01-30 13:47:37.000000000 -0600
@@ -905,7 +905,7 @@
piix4_devfn = piix4_init(pci_bus, 80);
pci_piix4_ide_init(pci_bus, hd, piix4_devfn + 1, i8259);
usb_uhci_piix4_init(pci_bus, piix4_devfn + 2);
- smbus = piix4_pm_init(pci_bus, piix4_devfn + 3, 0x1100);
+ smbus = piix4_pm_init(pci_bus, piix4_devfn + 3, 0x1100, i8259[9]);
eeprom_buf = qemu_mallocz(8 * 256); /* XXX: make this persistent */
for (i = 0; i < 8; i++) {
/* TODO: Populate SPD eeprom data. */
Index: qemu/hw/pc.c
===================================================================
--- qemu.orig/hw/pc.c 2008-01-30 13:47:31.000000000 -0600
+++ qemu/hw/pc.c 2008-01-30 13:47:37.000000000 -0600
@@ -1000,7 +1000,7 @@
i2c_bus *smbus;
/* TODO: Populate SPD eeprom data. */
- smbus = piix4_pm_init(pci_bus, piix3_devfn + 3, 0xb100);
+ smbus = piix4_pm_init(pci_bus, piix3_devfn + 3, 0xb100, i8259[9]);
for (i = 0; i < 8; i++) {
smbus_eeprom_device_init(smbus, 0x50 + i, eeprom_buf + (i * 256));
}
Index: qemu/hw/pc.h
===================================================================
--- qemu.orig/hw/pc.h 2008-01-30 13:47:00.000000000 -0600
+++ qemu/hw/pc.h 2008-01-30 13:47:37.000000000 -0600
@@ -88,7 +88,8 @@
/* acpi.c */
extern int acpi_enabled;
-i2c_bus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base);
+i2c_bus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
+ qemu_irq sci_irq);
void piix4_smbus_register_device(SMBusDevice *dev, uint8_t addr);
void acpi_bios_init(void);
Index: qemu/hw/piix_pci.c
===================================================================
--- qemu.orig/hw/piix_pci.c 2008-01-30 13:47:00.000000000 -0600
+++ qemu/hw/piix_pci.c 2008-01-30 13:47:37.000000000 -0600
@@ -220,7 +220,6 @@
{
int i, pic_irq, pic_level;
- piix3_dev->config[0x60 + irq_num] &= ~0x80; // enable bit
pci_irq_levels[irq_num] = level;
/* now we change the pic irq level according to the piix irq mappings */
Index: qemu/sysemu.h
===================================================================
--- qemu.orig/sysemu.h 2008-01-30 13:47:31.000000000 -0600
+++ qemu/sysemu.h 2008-01-30 13:47:37.000000000 -0600
@@ -30,12 +30,16 @@
void qemu_system_reset_request(void);
void qemu_system_shutdown_request(void);
void qemu_system_powerdown_request(void);
-#if !defined(TARGET_SPARC)
+int qemu_shutdown_requested(void);
+int qemu_reset_requested(void);
+int qemu_powerdown_requested(void);
+#if !defined(TARGET_SPARC) && !defined(TARGET_I386)
// Please implement a power failure function to signal the OS
#define qemu_system_powerdown() do{}while(0)
#else
void qemu_system_powerdown(void);
#endif
+void qemu_system_reset(void);
void cpu_save(QEMUFile *f, void *opaque);
int cpu_load(QEMUFile *f, void *opaque, int version_id);
Index: qemu/vl.c
===================================================================
--- qemu.orig/vl.c 2008-01-30 13:47:31.000000000 -0600
+++ qemu/vl.c 2008-01-30 13:47:37.000000000 -0600
@@ -7267,6 +7267,27 @@
static int shutdown_requested;
static int powerdown_requested;
+int qemu_shutdown_requested(void)
+{
+ int r = shutdown_requested;
+ shutdown_requested = 0;
+ return r;
+}
+
+int qemu_reset_requested(void)
+{
+ int r = reset_requested;
+ reset_requested = 0;
+ return r;
+}
+
+int qemu_powerdown_requested(void)
+{
+ int r = powerdown_requested;
+ powerdown_requested = 0;
+ return r;
+}
+
void qemu_register_reset(QEMUResetHandler *func, void *opaque)
{
QEMUResetEntry **pre, *re;
@@ -7281,7 +7302,7 @@
*pre = re;
}
-static void qemu_system_reset(void)
+void qemu_system_reset(void)
{
QEMUResetEntry *re;
* [Qemu-devel] [PATCH 3/6] Fix daemonize options
2008-01-31 22:36 [Qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 1/6] Use correct types to enable > 2G support Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 2/6] SCI fixes Anthony Liguori
@ 2008-01-31 22:36 ` Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 4/6] Tell BIOS about the number of CPUs Anthony Liguori
From: Anthony Liguori @ 2008-01-31 22:36 UTC
To: qemu-devel; +Cc: kvm-devel, Paul Brook
The -daemonize option is too restrictive: it only daemonizes when -nographic or
-vnc is given, which rules out SDL. It also switches the working directory to /
too early, which causes block devices with a relative path to fail to open.
The -daemonize option is needed for my regression testing, so I've included
this patch in the series.
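A small, hypothetical C reproduction of the relative-path failure: a path such as disk.img resolves against the current directory, so it opens before chdir("/") but not after. The scratch directory and file names are invented for the demo; the patch itself simply moves chdir("/") to the end of startup, just before stdin/stdout are redirected to /dev/null.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkdtemp() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns a bitmask: bit 0 if "disk.img" opened before chdir("/"),
 * bit 1 if the same relative path opened after it; -1 on setup failure. */
static int relative_open_demo(void)
{
    char dir[] = "/tmp/daemonize-demoXXXXXX";
    char path[sizeof(dir) + 16];
    char oldcwd[4096];
    int result = 0, fd;

    if (!getcwd(oldcwd, sizeof(oldcwd)) || !mkdtemp(dir) || chdir(dir) != 0)
        return -1;

    /* Relative path, as in "-hda disk.img": resolves to <dir>/disk.img. */
    fd = open("disk.img", O_CREAT | O_RDWR, 0600);
    if (fd >= 0) { result |= 1; close(fd); }

    /* What -daemonize used to do before the drives were opened. */
    if (chdir("/") == 0) {
        fd = open("disk.img", O_RDWR);   /* now means /disk.img */
        if (fd >= 0) { result |= 2; close(fd); }
    }

    /* Clean up with an absolute path and restore the original cwd. */
    snprintf(path, sizeof(path), "%s/disk.img", dir);
    unlink(path);
    rmdir(dir);
    if (chdir(oldcwd) != 0)
        return -1;
    return result;
}
```

On a typical system this returns 1: the open succeeds before the directory change and fails afterwards.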
Index: qemu/vl.c
===================================================================
--- qemu.orig/vl.c 2008-01-30 13:47:37.000000000 -0600
+++ qemu/vl.c 2008-01-30 13:47:39.000000000 -0600
@@ -8766,11 +8766,6 @@
}
#ifndef _WIN32
- if (daemonize && !nographic && vnc_display == NULL) {
- fprintf(stderr, "Can only daemonize if using -nographic or -vnc\n");
- daemonize = 0;
- }
-
if (daemonize) {
pid_t pid;
@@ -8808,7 +8803,6 @@
exit(1);
umask(027);
- chdir("/");
signal(SIGTSTP, SIG_IGN);
signal(SIGTTOU, SIG_IGN);
@@ -9067,6 +9061,7 @@
if (len != 1)
exit(1);
+ chdir("/");
TFR(fd = open("/dev/null", O_RDWR));
if (fd == -1)
exit(1);
* [Qemu-devel] [PATCH 4/6] Tell BIOS about the number of CPUs
2008-01-31 22:36 [Qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 3/6] Fix daemonize options Anthony Liguori
@ 2008-01-31 22:36 ` Anthony Liguori
2008-02-01 0:14 ` [Qemu-devel] " Paul Brook
2008-01-31 22:36 ` [Qemu-devel] [PATCH 5/6] Refactor option ROM loading Anthony Liguori
From: Anthony Liguori @ 2008-01-31 22:36 UTC
To: qemu-devel; +Cc: kvm-devel, Paul Brook
Previously, the BIOS would probe the CPUs for SMP guests. This tends to be
very unreliable because of startup timing issues. By passing the number of
CPUs in CMOS (offset 0x5f, stored as the count minus one), the BIOS can detect
the number of CPUs much more reliably.
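A sketch of both sides of the handshake, using a plain array in place of the RTC's CMOS bank. The store mirrors the rtc_set_memory(s, 0x5f, smp_cpus - 1) call below; the reader is a hypothetical BIOS-side helper, not code from the Bochs BIOS.

```c
#include <stdint.h>

#define CMOS_SMP_COUNT 0x5f   /* offset written by cmos_init() */

/* QEMU side: store the CPU count minus one. */
static void cmos_store_cpus(uint8_t cmos[128], int smp_cpus)
{
    cmos[CMOS_SMP_COUNT] = (uint8_t)(smp_cpus - 1);
}

/* Hypothetical BIOS side: add the one back instead of probing. */
static int bios_read_cpus(const uint8_t cmos[128])
{
    return cmos[CMOS_SMP_COUNT] + 1;
}
```

With this encoding a zeroed CMOS byte reads back as a single CPU.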
Index: qemu/hw/pc.c
===================================================================
--- qemu.orig/hw/pc.c 2008-01-30 13:47:37.000000000 -0600
+++ qemu/hw/pc.c 2008-01-30 13:47:40.000000000 -0600
@@ -182,7 +182,8 @@
/* hd_table must contain 4 block drivers */
static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
- const char *boot_device, BlockDriverState **hd_table)
+ const char *boot_device, BlockDriverState **hd_table,
+ int smp_cpus)
{
RTCState *s = rtc_state;
int nbds, bds[3] = { 0, };
@@ -210,6 +211,7 @@
rtc_set_memory(s, 0x5c, (unsigned int)above_4g_mem_size >> 24);
rtc_set_memory(s, 0x5d, above_4g_mem_size >> 32);
}
+ rtc_set_memory(s, 0x5f, smp_cpus - 1);
if (ram_size > (16 * 1024 * 1024))
val = (ram_size / 65536) - ((16 * 1024 * 1024) / 65536);
@@ -989,7 +991,7 @@
}
floppy_controller = fdctrl_init(i8259[6], 2, 0, 0x3f0, fd);
- cmos_init(ram_size, above_4g_mem_size, boot_device, hd);
+ cmos_init(ram_size, above_4g_mem_size, boot_device, hd, smp_cpus);
if (pci_enabled && usb_enabled) {
usb_uhci_piix3_init(pci_bus, piix3_devfn + 2);
* [Qemu-devel] [PATCH 5/6] Refactor option ROM loading
2008-01-31 22:36 [Qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 4/6] Tell BIOS about the number of CPUs Anthony Liguori
@ 2008-01-31 22:36 ` Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 6/6] QEMU support for the Kernel Virtual Machine interface Anthony Liguori
2008-01-31 22:53 ` [qemu-devel] [PATCH 0/6] Support " Anthony Liguori
From: Anthony Liguori @ 2008-01-31 22:36 UTC
To: qemu-devel; +Cc: kvm-devel, Paul Brook
KVM requires that any ROM memory be registered through a second interface. This
patch refactors the option ROM loading to simplify adding KVM support (which
will follow in the next patch).
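The placement rules that load_option_rom() keeps can be modeled without touching guest memory: each ROM must fit in what is left of the 64KB window at 0xd0000, and the offset then advances by the ROM's size rounded up to a 4KB page. The function below is a hypothetical sketch that only computes addresses.

```c
#define OPTION_ROM_BASE   0xd0000   /* window start used by load_option_rom() */
#define OPTION_ROM_WINDOW 0x10000   /* 64KB shared by all option ROMs */

/* Fill addrs[] with the address each ROM would be mapped at.
 * Returns n on success, or -1 if a ROM does not fit ("Too many option ROMS"). */
static int layout_option_roms(const int *sizes, int n, unsigned long *addrs)
{
    int offset = 0;

    for (int i = 0; i < n; i++) {
        if (sizes[i] > OPTION_ROM_WINDOW - offset)
            return -1;
        addrs[i] = OPTION_ROM_BASE + offset;
        offset += (sizes[i] + 4095) & ~4095;   /* round up to a 4KB page */
    }
    return n;
}
```

For example, a 0x1200-byte ROM followed by a 0x800-byte ROM lands at 0xd0000 and 0xd2000, because the first one occupies two full pages.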
Index: qemu/hw/pc.c
===================================================================
--- qemu.orig/hw/pc.c 2008-01-30 13:47:40.000000000 -0600
+++ qemu/hw/pc.c 2008-01-30 13:47:41.000000000 -0600
@@ -704,6 +704,31 @@
isa_ne2000_init(ne2000_io[nb_ne2k], pic[ne2000_irq[nb_ne2k]], nd);
nb_ne2k++;
}
+
+static int load_option_rom(const char *filename, int offset)
+{
+ ram_addr_t option_rom_offset;
+ int size, ret;
+
+ size = get_image_size(filename);
+ if (size < 0) {
+ fprintf(stderr, "Could not load option rom '%s'\n", filename);
+ exit(1);
+ }
+ if (size > (0x10000 - offset))
+ goto option_rom_error;
+ option_rom_offset = qemu_ram_alloc(size);
+ ret = load_image(filename, phys_ram_base + option_rom_offset);
+ if (ret != size) {
+ option_rom_error:
+ fprintf(stderr, "Too many option ROMS\n");
+ exit(1);
+ }
+ size = (size + 4095) & ~4095;
+ cpu_register_physical_memory(0xd0000 + offset,
+ size, option_rom_offset | IO_MEM_ROM);
+ return size;
+}
/* PC hardware initialisation */
static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
@@ -716,7 +741,7 @@
int ret, linux_boot, i;
ram_addr_t ram_addr, vga_ram_addr, bios_offset, vga_bios_offset;
ram_addr_t above_4g_mem_size = 0;
- int bios_size, isa_bios_size, vga_bios_size;
+ int bios_size, isa_bios_size, vga_bios_size, opt_rom_offset;
PCIBus *pci_bus;
int piix3_devfn = -1;
CPUState *env;
@@ -825,33 +850,9 @@
isa_bios_size,
(bios_offset + bios_size - isa_bios_size) | IO_MEM_ROM);
- {
- ram_addr_t option_rom_offset;
- int size, offset;
-
- offset = 0;
- for (i = 0; i < nb_option_roms; i++) {
- size = get_image_size(option_rom[i]);
- if (size < 0) {
- fprintf(stderr, "Could not load option rom '%s'\n",
- option_rom[i]);
- exit(1);
- }
- if (size > (0x10000 - offset))
- goto option_rom_error;
- option_rom_offset = qemu_ram_alloc(size);
- ret = load_image(option_rom[i], phys_ram_base + option_rom_offset);
- if (ret != size) {
- option_rom_error:
- fprintf(stderr, "Too many option ROMS\n");
- exit(1);
- }
- size = (size + 4095) & ~4095;
- cpu_register_physical_memory(0xd0000 + offset,
- size, option_rom_offset | IO_MEM_ROM);
- offset += size;
- }
- }
+ opt_rom_offset = 0;
+ for (i = 0; i < nb_option_roms; i++)
+ opt_rom_offset += load_option_rom(option_rom[i], opt_rom_offset);
/* map all the bios at the top of memory */
cpu_register_physical_memory((uint32_t)(-bios_size),
* [Qemu-devel] [PATCH 6/6] QEMU support for the Kernel Virtual Machine interface
2008-01-31 22:36 [Qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 5/6] Refactor option ROM loading Anthony Liguori
@ 2008-01-31 22:36 ` Anthony Liguori
2008-02-01 9:49 ` [Qemu-devel] " Fabrice Bellard
2008-01-31 22:53 ` [qemu-devel] [PATCH 0/6] Support " Anthony Liguori
From: Anthony Liguori @ 2008-01-31 22:36 UTC
To: qemu-devel; +Cc: kvm-devel, Paul Brook
This patch actually enables KVM support for QEMU. I apologize that it is so
large, but this was the only sane way to preserve bisectability.
The goal of this patch is to add KVM support without impacting users when KVM
isn't being used. It achieves this with a kvm_enabled() macro that evaluates to
(0) when KVM support is not compiled in. An if (kvm_enabled()) is just as good
as an #ifdef, since GCC will eliminate the dead code.
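A minimal sketch of the kvm_enabled() idiom. USE_KVM is the symbol the configure changes below define; the kvm_allowed flag and the stub hook are hypothetical stand-ins. Unlike an #ifdef, the guarded call must still be declared and compile; only the generated code disappears.

```c
#ifdef USE_KVM
extern int kvm_allowed;              /* hypothetical runtime switch */
#define kvm_enabled() (kvm_allowed)
#else
#define kvm_enabled() (0)            /* constant: the branch below is dead */
#endif

static int breakpoint_hook_calls;

/* Hypothetical stand-in for a KVM hook such as kvm_update_debugger(). */
static void kvm_update_debugger_stub(void)
{
    breakpoint_hook_calls++;
}

static void insert_breakpoint(void)
{
    /* ... normal QEMU bookkeeping ... */
    if (kvm_enabled())
        kvm_update_debugger_stub();  /* compiled away without USE_KVM */
}
```

Built without USE_KVM, insert_breakpoint() never reaches the hook, and the optimizer can drop the call entirely.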
This patch touches a lot of areas. For performance reasons, the guest CPU
state is not kept in sync with CPUState, so an explicit synchronization is
required whenever CPUState is needed. KVM also uses its own main loop, as it
runs each VCPU in its own thread.
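To make the synchronization requirement concrete, here is a hypothetical model (all names invented for illustration) in which the authoritative registers live on the kernel side and the QEMU-side copy is refreshed lazily, only when something needs CPUState, and invalidated whenever the VCPU runs again.

```c
#include <stdbool.h>
#include <stdint.h>

struct regs { uint64_t rip, rax; };

struct vcpu {
    struct regs kernel;   /* stands in for state owned by KVM */
    struct regs cached;   /* stands in for the CPUState copy */
    bool cached_valid;
    int syncs;            /* counts simulated "get registers" calls */
};

/* Call before any code reads CPUState (monitor, gdbstub, savevm, ...). */
static void sync_cpu_state(struct vcpu *v)
{
    if (!v->cached_valid) {
        v->cached = v->kernel;   /* real code would ask the kernel here */
        v->cached_valid = true;
        v->syncs++;
    }
}

/* Running the guest makes the cached copy stale. */
static void vcpu_run(struct vcpu *v)
{
    v->kernel.rip += 2;          /* pretend one instruction executed */
    v->cached_valid = false;
}
```

Repeated reads between runs cost a single synchronization, which is the point of not mirroring every register change into CPUState.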
Trapping VGA updates via MMIO is far too slow when running KVM, so there is
additional logic to allow VGA memory to be accessed as RAM. We use KVM's
shadow page tables to keep track of which portions of RAM have been dirtied.
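A small sketch of how a display refresh might consume that dirty tracking, assuming a bitmap with one bit per 4KB guest page (the layout of the log returned by KVM's KVM_GET_DIRTY_LOG ioctl). The helper is hypothetical and only scans the bitmap; fetching it from the kernel is out of scope. For the 128KB legacy VGA window at 0xa0000 the bitmap would be 32 bits long.

```c
#include <stddef.h>
#include <stdint.h>

/* Count dirty pages and report the first one (-1 if none). */
static int count_dirty_pages(const uint64_t *bitmap, size_t npages, long *first)
{
    int count = 0;

    *first = -1;
    for (size_t i = 0; i < npages; i++) {
        if (bitmap[i / 64] & (1ULL << (i % 64))) {
            if (*first < 0)
                *first = (long)i;
            count++;
        }
    }
    return count;
}
```

A refresh loop would redraw only the scanlines backed by those pages instead of taking an MMIO exit on every guest write.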
KVM also supports an in-kernel APIC implementation as a performance
enhancement. Finally, KVM supports APIC TPR patching. This allows TPR
accesses (which are very frequent for Windows) to be patched into CALL
instructions to the BIOS (for 32-bit guests). This results in a very
significant performance improvement for Windows guests.
While this patch is very large, the new files are only built when KVM
support is compiled in. Every change to QEMU is wrapped in an
if (kvm_enabled()), so the code disappears when KVM support is not compiled in.
This is done to ensure no regressions are introduced to normal QEMU.
Index: qemu/Makefile.target
===================================================================
--- qemu.orig/Makefile.target 2008-01-31 05:32:10.000000000 -0600
+++ qemu/Makefile.target 2008-01-31 15:41:47.000000000 -0600
@@ -174,6 +174,10 @@
# cpu emulator library
LIBOBJS=exec.o kqemu.o translate-op.o translate-all.o cpu-exec.o\
translate.o op.o host-utils.o
+ifeq ($(USE_KVM), 1)
+LIBOBJS+=qemu-kvm.o
+endif
+
ifdef CONFIG_SOFTFLOAT
LIBOBJS+=fpu/softfloat.o
else
@@ -183,10 +187,18 @@
ifeq ($(TARGET_ARCH), i386)
LIBOBJS+=helper.o helper2.o
+ifeq ($(USE_KVM), 1)
+LIBOBJS+=qemu-kvm-x86.o kvm-tpr-opt.o
+LIBOBJS+=qemu-kvm-helper.o
+endif
endif
ifeq ($(TARGET_ARCH), x86_64)
LIBOBJS+=helper.o helper2.o
+ifeq ($(USE_KVM), 1)
+LIBOBJS+=qemu-kvm-x86.o kvm-tpr-opt.o
+LIBOBJS+=qemu-kvm-helper.o
+endif
endif
ifeq ($(TARGET_BASE_ARCH), ppc)
@@ -289,6 +301,8 @@
# HELPER_CFLAGS is used for all the code compiled with static register
# variables
ifeq ($(TARGET_BASE_ARCH), i386)
+qemu-kvm-x86.o: qemu-kvm-x86.c qemu-kvm.h
+
# XXX: rename helper.c to op_helper.c
helper.o: helper.c
$(CC) $(HELPER_CFLAGS) $(CPPFLAGS) $(CFLAGS) -c -o $@ $<
@@ -414,6 +428,13 @@
OBJS+= libqemu.a
+qemu-kvm.o: qemu-kvm.c qemu-kvm.h
+ifeq ($(TARGET_BASE_ARCH), i386)
+qemu-kvm-helper.o: qemu-kvm-helper.c
+endif
+
+ $(CC) $(HELPER_CFLAGS) $(CPPFLAGS) $(BASE_CFLAGS) -c -o $@ $<
+
# Note: this is a workaround. The real fix is to avoid compiling
# cpu_signal_handler() in cpu-exec.c.
signal.o: signal.c
@@ -496,6 +517,11 @@
SOUND_HW += gus.o gusemu_hal.o gusemu_mixer.o
endif
+ifdef CONFIG_KVM_KERNEL_INC
+CFLAGS += -I $(CONFIG_KVM_KERNEL_INC)
+LIBS += -lkvm
+endif
+
ifdef CONFIG_VNC_TLS
CPPFLAGS += $(CONFIG_VNC_TLS_CFLAGS)
LIBS += $(CONFIG_VNC_TLS_LIBS)
Index: qemu/block-raw-posix.c
===================================================================
--- qemu.orig/block-raw-posix.c 2008-01-06 12:53:07.000000000 -0600
+++ qemu/block-raw-posix.c 2008-01-31 15:41:47.000000000 -0600
@@ -23,6 +23,7 @@
*/
#include "qemu-common.h"
#ifndef QEMU_IMG
+#include "qemu-kvm.h"
#include "qemu-timer.h"
#include "exec-all.h"
#endif
@@ -345,6 +346,12 @@
if (!aio_initialized)
qemu_aio_init();
+#ifndef QEMU_IMG
+ if (kvm_enabled()) {
+ qemu_kvm_aio_wait_start();
+ return;
+ }
+#endif
sigemptyset(&set);
sigaddset(&set, aio_sig_num);
sigprocmask(SIG_BLOCK, &set, &wait_oset);
@@ -358,6 +365,11 @@
#ifndef QEMU_IMG
if (qemu_bh_poll())
return;
+ if (kvm_enabled()) {
+ qemu_kvm_aio_wait();
+ qemu_aio_poll();
+ return;
+ }
#endif
sigemptyset(&set);
sigaddset(&set, aio_sig_num);
@@ -367,6 +379,12 @@
void qemu_aio_wait_end(void)
{
+#ifndef QEMU_IMG
+ if (kvm_enabled()) {
+ qemu_kvm_aio_wait_end();
+ return;
+ }
+#endif
sigprocmask(SIG_SETMASK, &wait_oset, NULL);
}
Index: qemu/configure
===================================================================
--- qemu.orig/configure 2008-01-31 05:32:10.000000000 -0600
+++ qemu/configure 2008-01-31 15:41:47.000000000 -0600
@@ -99,7 +99,9 @@
bsd="no"
linux="no"
kqemu="no"
+kvm="no"
profiler="no"
+kernel_path=""
cocoa="no"
check_gfx="yes"
check_gcc="yes"
@@ -136,6 +138,7 @@
oss="yes"
if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
kqemu="yes"
+ kvm="yes"
fi
;;
NetBSD)
@@ -193,6 +196,7 @@
linux_user="yes"
if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
kqemu="yes"
+ kvm="yes"
fi
;;
esac
@@ -287,8 +291,12 @@
;;
--disable-kqemu) kqemu="no"
;;
+ --disable-kvm) kvm="no"
+ ;;
--enable-profiler) profiler="yes"
;;
+ --kernel-path=*) kernel_path="$optarg"
+ ;;
--enable-cocoa) cocoa="yes" ; coreaudio="yes" ; sdl="no"
;;
--disable-gfx-check) check_gfx="no"
@@ -325,7 +333,7 @@
;;
--disable-werror) werror="no"
;;
- *) echo "ERROR: unknown option $opt"; show_help="yes"
+ *) echo "ERROR: unknown option $opt"; exit 1
;;
esac
done
@@ -394,6 +402,8 @@
echo ""
echo "kqemu kernel acceleration support:"
echo " --disable-kqemu disable kqemu support"
+echo " --kernel-path=PATH set the kernel path (configure probes it)"
+echo " --disable-kvm disable kernel virtual machine support"
echo ""
echo "Advanced options (experts only):"
echo " --source-path=PATH path of source code [$source_path]"
@@ -671,6 +681,24 @@
fi
fi
+# Check for libkvm
+if [ "$kvm" = "yes" ] ; then
+ cat > $TMPC <<EOF
+#include <libkvm.h>
+int main(void) {}
+EOF
+ if [ "$kernel_path" != "" ] ; then
+ flags="-I$kernel_path/include"
+ fi
+ have_libkvm="no"
+ if $cc -c -o $TMPO $TMPC "${flags}" 2> /dev/null ; then
+ have_libkvm="yes"
+ fi
+ if [ "$have_libkvm" = "no" ] ; then
+ kvm="no"
+ fi
+fi
+
# Check if tools are available to build documentation.
if [ -x "`which texi2html 2>/dev/null`" ] && \
[ -x "`which pod2man 2>/dev/null`" ]; then
@@ -752,6 +780,7 @@
echo "Target Sparc Arch $sparc_cpu"
fi
echo "kqemu support $kqemu"
+echo "kvm support $kvm"
echo "Documentation $build_docs"
[ ! -z "$uname_release" ] && \
echo "uname -r $uname_release"
@@ -1074,6 +1103,15 @@
interp_prefix1=`echo "$interp_prefix" | sed "s/%M/$target_cpu/g"`
echo "#define CONFIG_QEMU_PREFIX \"$interp_prefix1\"" >> $config_h
+configure_kvm() {
+ if test $kvm = "yes" -a "$target_softmmu" = "yes" -a "$have_libkvm" = "yes" \
+ -a \( "$cpu" = "i386" -o "$cpu" = "x86_64" \); then
+ echo "#define USE_KVM 1" >> $config_h
+ echo "USE_KVM=1" >> $config_mak
+ echo "CONFIG_KVM_KERNEL_INC=$kernel_path/include" >> $config_mak
+ fi
+}
+
if test "$target_cpu" = "i386" ; then
echo "TARGET_ARCH=i386" >> $config_mak
echo "#define TARGET_ARCH \"i386\"" >> $config_h
@@ -1081,6 +1119,7 @@
if test $kqemu = "yes" -a "$target_softmmu" = "yes" -a $cpu = "i386" ; then
echo "#define USE_KQEMU 1" >> $config_h
fi
+ configure_kvm
elif test "$target_cpu" = "arm" -o "$target_cpu" = "armeb" ; then
echo "TARGET_ARCH=arm" >> $config_mak
echo "#define TARGET_ARCH \"arm\"" >> $config_h
@@ -1136,6 +1175,7 @@
if test $kqemu = "yes" -a "$target_softmmu" = "yes" -a $cpu = "x86_64" ; then
echo "#define USE_KQEMU 1" >> $config_h
fi
+ configure_kvm
elif test "$target_cpu" = "mips" -o "$target_cpu" = "mipsel" ; then
echo "TARGET_ARCH=mips" >> $config_mak
echo "#define TARGET_ARCH \"mips\"" >> $config_h
Index: qemu/cpu-all.h
===================================================================
--- qemu.orig/cpu-all.h 2008-01-31 15:41:46.000000000 -0600
+++ qemu/cpu-all.h 2008-01-31 15:41:47.000000000 -0600
@@ -820,6 +820,7 @@
extern int phys_ram_fd;
extern uint8_t *phys_ram_base;
extern uint8_t *phys_ram_dirty;
+extern uint8_t *bios_mem;
/* physical memory access */
#define TLB_INVALID_MASK (1 << 3)
Index: qemu/cpu-exec.c
===================================================================
--- qemu.orig/cpu-exec.c 2008-01-23 13:01:11.000000000 -0600
+++ qemu/cpu-exec.c 2008-01-31 15:41:47.000000000 -0600
@@ -36,6 +36,8 @@
#include <sys/ucontext.h>
#endif
+#include "qemu-kvm.h"
+
int tb_invalidated_flag;
//#define DEBUG_EXEC
@@ -475,6 +477,10 @@
}
#endif
+ if (kvm_enabled()) {
+ kvm_cpu_exec(env);
+ longjmp(env->jmp_env, 1);
+ }
T0 = 0; /* force lookup of first TB */
for(;;) {
SAVE_GLOBALS();
Index: qemu/exec.c
===================================================================
--- qemu.orig/exec.c 2008-01-31 15:41:46.000000000 -0600
+++ qemu/exec.c 2008-01-31 15:41:47.000000000 -0600
@@ -35,6 +35,8 @@
#include "cpu.h"
#include "exec-all.h"
+#include "dyngen.h"
+#include "qemu-kvm.h"
#if defined(CONFIG_USER_ONLY)
#include <qemu.h>
#endif
@@ -73,11 +75,11 @@
#define TARGET_VIRT_ADDR_SPACE_BITS 42
#elif defined(TARGET_PPC64)
#define TARGET_PHYS_ADDR_SPACE_BITS 42
-#elif USE_KQEMU
+#elif defined(TARGET_X86_64) && !defined(USE_KQEMU)
+#define TARGET_PHYS_ADDR_SPACE_BITS 42
+#else
/* Note: for compatibility with kqemu, we use 32 bits for x86_64 */
#define TARGET_PHYS_ADDR_SPACE_BITS 32
-#else
-#define TARGET_PHYS_ADDR_SPACE_BITS 42
#endif
TranslationBlock tbs[CODE_GEN_MAX_BLOCKS];
@@ -93,6 +95,7 @@
int phys_ram_fd;
uint8_t *phys_ram_base;
uint8_t *phys_ram_dirty;
+uint8_t *bios_mem;
static ram_addr_t phys_ram_alloc_offset = 0;
CPUState *first_cpu;
@@ -1132,6 +1135,9 @@
return -1;
env->breakpoints[env->nb_breakpoints++] = pc;
+ if (kvm_enabled())
+ kvm_update_debugger(env);
+
breakpoint_invalidate(env, pc);
return 0;
#else
@@ -1154,6 +1160,9 @@
if (i < env->nb_breakpoints)
env->breakpoints[i] = env->breakpoints[env->nb_breakpoints];
+ if (kvm_enabled())
+ kvm_update_debugger(env);
+
breakpoint_invalidate(env, pc);
return 0;
#else
@@ -1172,6 +1181,8 @@
/* XXX: only flush what is necessary */
tb_flush(env);
}
+ if (kvm_enabled())
+ kvm_update_debugger(env);
#endif
}
@@ -1219,6 +1230,9 @@
static int interrupt_lock;
env->interrupt_request |= mask;
+ if (kvm_enabled() && !qemu_kvm_irqchip_in_kernel())
+ kvm_update_interrupt_request(env);
+
/* if the cpu is currently executing code, we must unlink it and
all the potentially executing TB */
tb = env->current_tb;
@@ -2613,6 +2627,11 @@
phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
(0xff & ~CODE_DIRTY_FLAG);
}
+ /* qemu doesn't execute guest code directly, but kvm does,
+ therefore flush instruction caches */
+ if (kvm_enabled())
+ flush_icache_range((unsigned long)ptr,
+ ((unsigned long)ptr)+l);
}
} else {
if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
Index: qemu/gdbstub.c
===================================================================
--- qemu.orig/gdbstub.c 2007-12-15 21:16:05.000000000 -0600
+++ qemu/gdbstub.c 2008-01-31 15:41:47.000000000 -0600
@@ -35,6 +35,8 @@
#include "gdbstub.h"
#endif
+#include "qemu-kvm.h"
+
#include "qemu_socket.h"
#ifdef _WIN32
/* XXX: these constants may be independent of the host ones even for Unix */
@@ -893,6 +895,8 @@
addr = strtoull(p, (char **)&p, 16);
#if defined(TARGET_I386)
env->eip = addr;
+ if (kvm_enabled())
+ kvm_load_registers(env);
#elif defined (TARGET_PPC)
env->nip = addr;
#elif defined (TARGET_SPARC)
@@ -919,6 +923,8 @@
addr = strtoull(p, (char **)&p, 16);
#if defined(TARGET_I386)
env->eip = addr;
+ if (kvm_enabled())
+ kvm_load_registers(env);
#elif defined (TARGET_PPC)
env->nip = addr;
#elif defined (TARGET_SPARC)
@@ -970,6 +976,8 @@
}
break;
case 'g':
+ if (kvm_enabled())
+ kvm_save_registers(env);
reg_size = cpu_gdb_read_registers(env, mem_buf);
memtohex(buf, mem_buf, reg_size);
put_packet(s, buf);
@@ -979,6 +987,8 @@
len = strlen(p) / 2;
hextomem((uint8_t *)registers, p, len);
cpu_gdb_write_registers(env, mem_buf, len);
+ if (kvm_enabled())
+ kvm_load_registers(env);
put_packet(s, "OK");
break;
case 'm':
Index: qemu/hw/apic.c
===================================================================
--- qemu.orig/hw/apic.c 2007-12-16 17:41:11.000000000 -0600
+++ qemu/hw/apic.c 2008-01-31 15:41:47.000000000 -0600
@@ -21,6 +21,8 @@
#include "pc.h"
#include "qemu-timer.h"
+#include "qemu-kvm.h"
+
//#define DEBUG_APIC
//#define DEBUG_IOAPIC
@@ -56,6 +58,7 @@
#define APIC_INPUT_POLARITY (1<<13)
#define APIC_SEND_PENDING (1<<12)
+/* FIXME: currently hard-coded to equal KVM_IOAPIC_NUM_PINS */
#define IOAPIC_NUM_PINS 0x18
#define ESR_ILLEGAL_ADDRESS (1 << 7)
@@ -400,6 +403,10 @@
s->initial_count = 0;
s->initial_count_load_time = 0;
s->next_time = 0;
+
+ if (kvm_enabled() && !qemu_kvm_irqchip_in_kernel())
+ if (s->cpu_env)
+ kvm_apic_init(s->cpu_env);
}
/* send a SIPI message to the CPU to start it */
@@ -412,6 +419,8 @@
cpu_x86_load_seg_cache(env, R_CS, vector_num << 8, vector_num << 12,
0xffff, 0);
env->hflags &= ~HF_HALTED_MASK;
+ if (kvm_enabled() && !qemu_kvm_irqchip_in_kernel())
+ kvm_update_after_sipi(env);
}
static void apic_deliver(APICState *s, uint8_t dest, uint8_t dest_mode,
@@ -737,11 +746,94 @@
}
}
+#ifdef KVM_CAP_IRQCHIP
+
+static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id)
+{
+ return *((uint32_t *) (kapic->regs + (reg_id << 4)));
+}
+
+static inline void kapic_set_reg(struct kvm_lapic_state *kapic,
+ int reg_id, uint32_t val)
+{
+ *((uint32_t *) (kapic->regs + (reg_id << 4))) = val;
+}
+
+static void kvm_kernel_lapic_save_to_user(APICState *s)
+{
+ struct kvm_lapic_state apic;
+ struct kvm_lapic_state *kapic = &apic;
+ int i, v;
+
+ kvm_get_lapic(kvm_context, s->cpu_env->cpu_index, kapic);
+
+ s->id = kapic_reg(kapic, 0x2);
+ s->tpr = kapic_reg(kapic, 0x8);
+ s->arb_id = kapic_reg(kapic, 0x9);
+ s->log_dest = kapic_reg(kapic, 0xd) >> 24;
+ s->dest_mode = kapic_reg(kapic, 0xe) >> 28;
+ s->spurious_vec = kapic_reg(kapic, 0xf);
+ for (i = 0; i < 8; i++) {
+ s->isr[i] = kapic_reg(kapic, 0x10 + i);
+ s->tmr[i] = kapic_reg(kapic, 0x18 + i);
+ s->irr[i] = kapic_reg(kapic, 0x20 + i);
+ }
+ s->esr = kapic_reg(kapic, 0x28);
+ s->icr[0] = kapic_reg(kapic, 0x30);
+ s->icr[1] = kapic_reg(kapic, 0x31);
+ for (i = 0; i < APIC_LVT_NB; i++)
+ s->lvt[i] = kapic_reg(kapic, 0x32 + i);
+ s->initial_count = kapic_reg(kapic, 0x38);
+ s->divide_conf = kapic_reg(kapic, 0x3e);
+
+ v = (s->divide_conf & 3) | ((s->divide_conf >> 1) & 4);
+ s->count_shift = (v + 1) & 7;
+
+ s->initial_count_load_time = qemu_get_clock(vm_clock);
+ apic_timer_update(s, s->initial_count_load_time);
+}
+
+static void kvm_kernel_lapic_load_from_user(APICState *s)
+{
+ struct kvm_lapic_state apic;
+ struct kvm_lapic_state *klapic = &apic;
+ int i;
+
+ memset(klapic, 0, sizeof apic);
+ kapic_set_reg(klapic, 0x2, s->id);
+ kapic_set_reg(klapic, 0x8, s->tpr);
+ kapic_set_reg(klapic, 0xd, s->log_dest << 24);
+ kapic_set_reg(klapic, 0xe, s->dest_mode << 28 | 0x0fffffff);
+ kapic_set_reg(klapic, 0xf, s->spurious_vec);
+ for (i = 0; i < 8; i++) {
+ kapic_set_reg(klapic, 0x10 + i, s->isr[i]);
+ kapic_set_reg(klapic, 0x18 + i, s->tmr[i]);
+ kapic_set_reg(klapic, 0x20 + i, s->irr[i]);
+ }
+ kapic_set_reg(klapic, 0x28, s->esr);
+ kapic_set_reg(klapic, 0x30, s->icr[0]);
+ kapic_set_reg(klapic, 0x31, s->icr[1]);
+ for (i = 0; i < APIC_LVT_NB; i++)
+ kapic_set_reg(klapic, 0x32 + i, s->lvt[i]);
+ kapic_set_reg(klapic, 0x38, s->initial_count);
+ kapic_set_reg(klapic, 0x3e, s->divide_conf);
+
+ kvm_set_lapic(kvm_context, s->cpu_env->cpu_index, klapic);
+}
+
+#endif
+
static void apic_save(QEMUFile *f, void *opaque)
{
APICState *s = opaque;
int i;
+#ifdef KVM_CAP_IRQCHIP
+ if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
+ kvm_kernel_lapic_save_to_user(s);
+ }
+#endif
+
qemu_put_be32s(f, &s->apicbase);
qemu_put_8s(f, &s->id);
qemu_put_8s(f, &s->arb_id);
@@ -804,6 +896,13 @@
if (version_id >= 2)
qemu_get_timer(f, s->timer);
+
+#ifdef KVM_CAP_IRQCHIP
+ if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
+ kvm_kernel_lapic_load_from_user(s);
+ }
+#endif
+
return 0;
}
@@ -818,6 +917,11 @@
* processor when local APIC is enabled.
*/
s->lvt[APIC_LVT_LINT0] = 0x700;
+#ifdef KVM_CAP_IRQCHIP
+ if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
+ kvm_kernel_lapic_load_from_user(s);
+ }
+#endif
}
static CPUReadMemoryFunc *apic_mem_read[3] = {
@@ -1010,11 +1114,54 @@
}
}
+static void kvm_kernel_ioapic_save_to_user(IOAPICState *s)
+{
+#if defined(KVM_CAP_IRQCHIP) && defined(TARGET_I386)
+ struct kvm_irqchip chip;
+ struct kvm_ioapic_state *kioapic;
+ int i;
+
+ chip.chip_id = KVM_IRQCHIP_IOAPIC;
+ kvm_get_irqchip(kvm_context, &chip);
+ kioapic = &chip.chip.ioapic;
+
+ s->id = kioapic->id;
+ s->ioregsel = kioapic->ioregsel;
+ for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+ s->ioredtbl[i] = kioapic->redirtbl[i].bits;
+ }
+#endif
+}
+
+static void kvm_kernel_ioapic_load_from_user(IOAPICState *s)
+{
+#if defined(KVM_CAP_IRQCHIP) && defined(TARGET_I386)
+ struct kvm_irqchip chip;
+ struct kvm_ioapic_state *kioapic;
+ int i;
+
+ chip.chip_id = KVM_IRQCHIP_IOAPIC;
+ kioapic = &chip.chip.ioapic;
+
+ kioapic->id = s->id;
+ kioapic->ioregsel = s->ioregsel;
+ for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+ kioapic->redirtbl[i].bits = s->ioredtbl[i];
+ }
+
+ kvm_set_irqchip(kvm_context, &chip);
+#endif
+}
+
static void ioapic_save(QEMUFile *f, void *opaque)
{
IOAPICState *s = opaque;
int i;
+ if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
+ kvm_kernel_ioapic_save_to_user(s);
+ }
+
qemu_put_8s(f, &s->id);
qemu_put_8s(f, &s->ioregsel);
for (i = 0; i < IOAPIC_NUM_PINS; i++) {
@@ -1035,6 +1182,11 @@
for (i = 0; i < IOAPIC_NUM_PINS; i++) {
qemu_get_be64s(f, &s->ioredtbl[i]);
}
+
+ if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
+ kvm_kernel_ioapic_load_from_user(s);
+ }
+
return 0;
}
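kvm_kernel_lapic_save_to_user() and kvm_kernel_lapic_load_from_user() above rely on two encodings. Register N of the in-kernel local APIC sits at byte offset N << 4 of kvm_lapic_state.regs: the APIC spaces its 32-bit registers 16 bytes apart, so the TPR (register 0x8) is at offset 0x80. The divide configuration register is then folded into QEMU's count_shift. The sketch below isolates both steps so they can be checked without the KVM headers. struct fake_lapic_state, its 1 KiB size and the helper names are assumptions made for illustration, not the real kvm_lapic_state definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kvm_lapic_state: one 1 KiB APIC page. */
struct fake_lapic_state {
    char regs[0x400];
};

/* Register reg_id occupies the first 32 bits of the 16-byte slot at reg_id << 4. */
static uint32_t lapic_reg(const struct fake_lapic_state *s, int reg_id)
{
    uint32_t val;

    assert(reg_id >= 0 && (reg_id << 4) + 4 <= (int)sizeof s->regs);
    memcpy(&val, s->regs + (reg_id << 4), sizeof val);
    return val;
}

static void lapic_set_reg(struct fake_lapic_state *s, int reg_id, uint32_t val)
{
    assert(reg_id >= 0 && (reg_id << 4) + 4 <= (int)sizeof s->regs);
    memcpy(s->regs + (reg_id << 4), &val, sizeof val);
}

/* Same decoding as the patch: bits 0, 1 and 3 of the divide configuration
 * form a 3-bit value v, and the timer divisor is 2^((v + 1) & 7). */
static int divide_conf_to_shift(uint32_t divide_conf)
{
    int v = (divide_conf & 3) | ((divide_conf >> 1) & 4);

    return (v + 1) & 7;
}
```

Using memcpy() instead of the patch's uint32_t pointer cast sidesteps alignment and strict-aliasing concerns. With the decoding shown, divide values 0x0, 0x3 and 0xb give shifts of 1, 4 and 0 (divide by 2, 16 and 1), which agrees with the local APIC divisor table.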
Index: qemu/hw/cirrus_vga.c
===================================================================
--- qemu.orig/hw/cirrus_vga.c 2007-12-16 17:41:11.000000000 -0600
+++ qemu/hw/cirrus_vga.c 2008-01-31 15:56:10.000000000 -0600
@@ -31,6 +31,10 @@
#include "pci.h"
#include "console.h"
#include "vga_int.h"
+#ifndef _WIN32
+#include <sys/mman.h>
+#endif
+#include "qemu-kvm.h"
/*
* TODO:
@@ -234,6 +238,11 @@
int cirrus_linear_io_addr;
int cirrus_linear_bitblt_io_addr;
int cirrus_mmio_io_addr;
+ unsigned long cirrus_lfb_addr;
+ unsigned long cirrus_lfb_end;
+ int aliases_enabled;
+ uint32_t aliased_bank_base[2];
+ uint32_t aliased_bank_limit[2];
uint32_t cirrus_addr_mask;
uint32_t linear_mmio_mask;
uint8_t cirrus_shadow_gr0;
@@ -1354,6 +1363,8 @@
printf("cirrus: handled outport sr_index %02x, sr_value %02x\n",
reg_index, reg_value);
#endif
+ if (reg_index == 0x07)
+ cirrus_update_memory_access(s);
break;
case 0x17: // Configuration Readback and Extended Control
s->sr[reg_index] = (s->sr[reg_index] & 0x38) | (reg_value & 0xc7);
@@ -1500,6 +1511,7 @@
s->gr[reg_index] = reg_value;
cirrus_update_bank_ptr(s, 0);
cirrus_update_bank_ptr(s, 1);
+ cirrus_update_memory_access(s);
break;
case 0x0B:
s->gr[reg_index] = reg_value;
@@ -2588,10 +2600,86 @@
cirrus_linear_bitblt_writel,
};
+void *set_vram_mapping(unsigned long begin, unsigned long end)
+{
+ void *vram_pointer = NULL;
+
+ /* align begin and end address */
+ begin = begin & TARGET_PAGE_MASK;
+ end = begin + VGA_RAM_SIZE;
+ end = (end + TARGET_PAGE_SIZE -1 ) & TARGET_PAGE_MASK;
+
+ if (kvm_enabled())
+ vram_pointer = kvm_cpu_create_phys_mem(begin, end - begin, 1, 1);
+
+ if (vram_pointer == NULL) {
+ printf("set_vram_mapping: cannot allocate memory: %m\n");
+ return NULL;
+ }
+
+ memset(vram_pointer, 0, end - begin);
+
+ return vram_pointer;
+}
+
+int unset_vram_mapping(unsigned long begin, unsigned long end)
+{
+ /* align begin and end address */
+ begin = begin & TARGET_PAGE_MASK;
+ end = begin + VGA_RAM_SIZE;
+ end = (end + TARGET_PAGE_SIZE -1 ) & TARGET_PAGE_MASK;
+
+ if (kvm_enabled())
+ kvm_cpu_destroy_phys_mem(begin, end - begin);
+
+ return 0;
+}
+
+#if defined(TARGET_I386)
+static void kvm_update_vga_alias(CirrusVGAState *s, int ok, int bank,
+ unsigned long phys_addr)
+{
+ unsigned limit, base;
+
+ if (!ok && !s->aliases_enabled)
+ return;
+ limit = s->cirrus_bank_limit[bank];
+ if (limit > 0x8000)
+ limit = 0x8000;
+ base = s->cirrus_lfb_addr + s->cirrus_bank_base[bank];
+ if (ok) {
+ if (!s->aliases_enabled
+ || base != s->aliased_bank_base[bank]
+ || limit != s->aliased_bank_limit[bank]) {
+ if (kvm_enabled())
+ qemu_kvm_create_memory_alias(phys_addr,
+ 0xa0000 + bank * 0x8000,
+ limit, base);
+ s->aliased_bank_base[bank] = base;
+ s->aliased_bank_limit[bank] = limit;
+ }
+ } else if (kvm_enabled()) {
+ qemu_kvm_destroy_memory_alias(phys_addr);
+ }
+}
+
+static void kvm_update_vga_aliases(CirrusVGAState *s, int ok)
+{
+ if (kvm_enabled()) {
+ kvm_update_vga_alias(s, ok, 0, 0xc0000);
+ kvm_update_vga_alias(s, ok, 1, s->map_addr);
+ }
+ s->aliases_enabled = ok;
+}
+#endif
+
/* Compute the memory access functions */
static void cirrus_update_memory_access(CirrusVGAState *s)
{
unsigned mode;
+#if defined(TARGET_I386)
+ int want_vga_alias = 0;
+#endif
if ((s->sr[0x17] & 0x44) == 0x44) {
goto generic_io;
@@ -2606,16 +2694,58 @@
mode = s->gr[0x05] & 0x7;
if (mode < 4 || mode > 5 || ((s->gr[0x0B] & 0x4) == 0)) {
+ if (kvm_enabled() && s->cirrus_lfb_addr && s->cirrus_lfb_end &&
+ !s->map_addr) {
+ void *vram_pointer, *old_vram;
+
+ vram_pointer = set_vram_mapping(s->cirrus_lfb_addr,
+ s->cirrus_lfb_end);
+ if (!vram_pointer)
+ fprintf(stderr, "NULL vram_pointer\n");
+ else {
+ old_vram = vga_update_vram((VGAState *)s, vram_pointer,
+ VGA_RAM_SIZE);
+ qemu_free(old_vram);
+ }
+ s->map_addr = s->cirrus_lfb_addr;
+ s->map_end = s->cirrus_lfb_end;
+ }
+#if defined(TARGET_I386)
+ if (kvm_enabled()
+ && !(s->cirrus_srcptr != s->cirrus_srcptr_end)
+ && !((s->sr[0x07] & 0x01) == 0)
+ && !((s->gr[0x0B] & 0x14) == 0x14)
+ && !(s->gr[0x0B] & 0x02))
+ want_vga_alias = 1;
+#endif
s->cirrus_linear_write[0] = cirrus_linear_mem_writeb;
s->cirrus_linear_write[1] = cirrus_linear_mem_writew;
s->cirrus_linear_write[2] = cirrus_linear_mem_writel;
} else {
generic_io:
+ if (kvm_enabled() && s->cirrus_lfb_addr && s->cirrus_lfb_end &&
+ s->map_addr) {
+ int error;
+ void *old_vram = NULL;
+
+ error = unset_vram_mapping(s->cirrus_lfb_addr,
+ s->cirrus_lfb_end);
+ if (!error)
+ old_vram = vga_update_vram((VGAState *)s, NULL,
+ VGA_RAM_SIZE);
+ if (old_vram)
+ munmap(old_vram, s->map_end - s->map_addr);
+ s->map_addr = s->map_end = 0;
+ }
s->cirrus_linear_write[0] = cirrus_linear_writeb;
s->cirrus_linear_write[1] = cirrus_linear_writew;
s->cirrus_linear_write[2] = cirrus_linear_writel;
}
}
+#if defined(TARGET_I386)
+ kvm_update_vga_aliases(s, want_vga_alias);
+#endif
+
}
@@ -3009,6 +3139,11 @@
qemu_put_be32s(f, &s->hw_cursor_y);
/* XXX: we do not save the bitblt state - we assume we do not save
the state when the blitter is active */
+
+ if (kvm_enabled()) { /* XXX: KVM images ought to be loadable in QEMU */
+ qemu_put_be32s(f, &s->real_vram_size);
+ qemu_put_buffer(f, s->vram_ptr, s->real_vram_size);
+ }
}
static int cirrus_vga_load(QEMUFile *f, void *opaque, int version_id)
@@ -3059,6 +3194,20 @@
qemu_get_be32s(f, &s->hw_cursor_x);
qemu_get_be32s(f, &s->hw_cursor_y);
+ if (kvm_enabled()) {
+ int real_vram_size;
+ qemu_get_be32s(f, &real_vram_size);
+ if (real_vram_size != s->real_vram_size) {
+ printf("%s: REAL_VRAM_SIZE MISMATCH !!!!!! SAVED=%d CURRENT=%d\n",
+ __FUNCTION__, real_vram_size, s->real_vram_size);
+ if (real_vram_size > s->real_vram_size)
+ real_vram_size = s->real_vram_size;
+ }
+ qemu_get_buffer(f, s->vram_ptr, real_vram_size);
+ cirrus_update_memory_access(s);
+ }
+
+
/* force refresh */
s->graphic_mode = -1;
cirrus_update_bank_ptr(s, 0);
@@ -3214,6 +3363,15 @@
/* XXX: add byte swapping apertures */
cpu_register_physical_memory(addr, s->vram_size,
s->cirrus_linear_io_addr);
+ if (kvm_enabled()) {
+ s->cirrus_lfb_addr = addr;
+ s->cirrus_lfb_end = addr + VGA_RAM_SIZE;
+
+ if (s->map_addr && (s->cirrus_lfb_addr != s->map_addr) &&
+ (s->cirrus_lfb_end != s->map_end))
+ printf("cirrus vga map change while on lfb mode\n");
+ }
+
cpu_register_physical_memory(addr + 0x1000000, 0x400000,
s->cirrus_linear_bitblt_io_addr);
}
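The address arithmetic in set_vram_mapping() and kvm_update_vga_alias() above can be illustrated in isolation. set_vram_mapping() rounds the start of the linear framebuffer down to a page, recomputes the end from VGA_RAM_SIZE (the caller's end argument is not used), and rounds that end up to a page. Each of the two Cirrus banks is aliased into a 32 KiB window of the legacy VGA area (bank 0 at 0xa0000, bank 1 at 0xa8000), with the bank limit clamped to 0x8000. In this sketch the 4 KiB page size, the 8 MiB VRAM size used in the check, and all names (align_vram_range(), vga_bank_alias() and the two structs) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL                     /* assumed TARGET_PAGE_SIZE */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

struct vram_range {
    unsigned long begin;
    unsigned long end;                              /* exclusive */
};

/* Mirrors set_vram_mapping(): align the start down, size the range from the
 * aligned start, then round the end up to a page boundary. */
static struct vram_range align_vram_range(unsigned long lfb_addr,
                                          unsigned long vram_size)
{
    struct vram_range r;

    r.begin = lfb_addr & SKETCH_PAGE_MASK;
    r.end = (r.begin + vram_size + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
    return r;
}

struct bank_alias {
    uint32_t guest_addr;  /* start of the 32 KiB window in the VGA area */
    uint32_t size;        /* bank limit, clamped to the window size */
    uint32_t target;      /* linear framebuffer address being aliased */
};

/* Mirrors the values kvm_update_vga_alias() passes when creating an alias. */
static struct bank_alias vga_bank_alias(uint32_t lfb_addr, int bank,
                                        uint32_t bank_base, uint32_t bank_limit)
{
    struct bank_alias a;

    assert(bank == 0 || bank == 1);
    a.guest_addr = 0xa0000 + bank * 0x8000;
    a.size = bank_limit > 0x8000 ? 0x8000 : bank_limit;
    a.target = lfb_addr + bank_base;
    return a;
}
```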
Index: qemu/hw/i8259.c
===================================================================
--- qemu.orig/hw/i8259.c 2007-11-17 19:44:36.000000000 -0600
+++ qemu/hw/i8259.c 2008-01-31 15:41:47.000000000 -0600
@@ -26,6 +26,8 @@
#include "isa.h"
#include "console.h"
+#include "qemu-kvm.h"
+
/* debug PIC */
//#define DEBUG_PIC
@@ -181,7 +183,11 @@
static void i8259_set_irq(void *opaque, int irq, int level)
{
PicState2 *s = opaque;
-
+#ifdef KVM_CAP_IRQCHIP
+ if (kvm_enabled())
+ if (kvm_set_irq(irq, level))
+ return;
+#endif
#if defined(DEBUG_PIC) || defined(DEBUG_IRQ_COUNT)
if (level != irq_level[irq]) {
#if defined(DEBUG_PIC)
@@ -448,10 +454,77 @@
return s->elcr;
}
+static void kvm_kernel_pic_save_to_user(PicState *s)
+{
+#if defined(KVM_CAP_IRQCHIP) && defined(TARGET_I386)
+ struct kvm_irqchip chip;
+ struct kvm_pic_state *kpic;
+
+ chip.chip_id = (&s->pics_state->pics[0] == s) ?
+ KVM_IRQCHIP_PIC_MASTER :
+ KVM_IRQCHIP_PIC_SLAVE;
+ kvm_get_irqchip(kvm_context, &chip);
+ kpic = &chip.chip.pic;
+
+ s->last_irr = kpic->last_irr;
+ s->irr = kpic->irr;
+ s->imr = kpic->imr;
+ s->isr = kpic->isr;
+ s->priority_add = kpic->priority_add;
+ s->irq_base = kpic->irq_base;
+ s->read_reg_select = kpic->read_reg_select;
+ s->poll = kpic->poll;
+ s->special_mask = kpic->special_mask;
+ s->init_state = kpic->init_state;
+ s->auto_eoi = kpic->auto_eoi;
+ s->rotate_on_auto_eoi = kpic->rotate_on_auto_eoi;
+ s->special_fully_nested_mode = kpic->special_fully_nested_mode;
+ s->init4 = kpic->init4;
+ s->elcr = kpic->elcr;
+ s->elcr_mask = kpic->elcr_mask;
+#endif
+}
+
+static void kvm_kernel_pic_load_from_user(PicState *s)
+{
+#if defined(KVM_CAP_IRQCHIP) && defined(TARGET_I386)
+ struct kvm_irqchip chip;
+ struct kvm_pic_state *kpic;
+
+ chip.chip_id = (&s->pics_state->pics[0] == s) ?
+ KVM_IRQCHIP_PIC_MASTER :
+ KVM_IRQCHIP_PIC_SLAVE;
+ kpic = &chip.chip.pic;
+
+ kpic->last_irr = s->last_irr;
+ kpic->irr = s->irr;
+ kpic->imr = s->imr;
+ kpic->isr = s->isr;
+ kpic->priority_add = s->priority_add;
+ kpic->irq_base = s->irq_base;
+ kpic->read_reg_select = s->read_reg_select;
+ kpic->poll = s->poll;
+ kpic->special_mask = s->special_mask;
+ kpic->init_state = s->init_state;
+ kpic->auto_eoi = s->auto_eoi;
+ kpic->rotate_on_auto_eoi = s->rotate_on_auto_eoi;
+ kpic->special_fully_nested_mode = s->special_fully_nested_mode;
+ kpic->init4 = s->init4;
+ kpic->elcr = s->elcr;
+ kpic->elcr_mask = s->elcr_mask;
+
+ kvm_set_irqchip(kvm_context, &chip);
+#endif
+}
+
static void pic_save(QEMUFile *f, void *opaque)
{
PicState *s = opaque;
+ if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
+ kvm_kernel_pic_save_to_user(s);
+ }
+
qemu_put_8s(f, &s->last_irr);
qemu_put_8s(f, &s->irr);
qemu_put_8s(f, &s->imr);
@@ -493,6 +566,11 @@
qemu_get_8s(f, &s->init4);
qemu_get_8s(f, &s->single_mode);
qemu_get_8s(f, &s->elcr);
+
+ if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
+ kvm_kernel_pic_load_from_user(s);
+ }
+
return 0;
}
Index: qemu/hw/pc.c
===================================================================
--- qemu.orig/hw/pc.c 2008-01-31 15:41:47.000000000 -0600
+++ qemu/hw/pc.c 2008-01-31 15:41:47.000000000 -0600
@@ -32,6 +32,8 @@
#include "smbus.h"
#include "boards.h"
+#include "qemu-kvm.h"
+
/* output Bochs bios info messages */
//#define DEBUG_BIOS
@@ -727,6 +729,10 @@
size = (size + 4095) & ~4095;
cpu_register_physical_memory(0xd0000 + offset,
size, option_rom_offset | IO_MEM_ROM);
+ if (kvm_enabled())
+ kvm_cpu_register_physical_memory(0xd0000 + offset,
+ size, option_rom_offset |
+ IO_MEM_ROM);
return size;
}
@@ -789,9 +795,23 @@
}
/* allocate RAM */
- ram_addr = qemu_ram_alloc(ram_size);
- cpu_register_physical_memory(0, ram_size, ram_addr);
-
+#ifdef KVM_CAP_USER_MEMORY
+ if (kvm_enabled() && kvm_qemu_check_extension(KVM_CAP_USER_MEMORY)) {
+ ram_addr = qemu_ram_alloc(0xa0000);
+ cpu_register_physical_memory(0, 0xa0000, ram_addr);
+ kvm_cpu_register_physical_memory(0, 0xa0000, ram_addr);
+
+ ram_addr = qemu_ram_alloc(0x100000 - 0xa0000); // hole
+ ram_addr = qemu_ram_alloc(ram_size - 0x100000);
+ cpu_register_physical_memory(0x100000, ram_size - 0x100000, ram_addr);
+ kvm_cpu_register_physical_memory(0x100000, ram_size - 0x100000,
+ ram_addr);
+ } else
+#endif
+ {
+ ram_addr = qemu_ram_alloc(ram_size);
+ cpu_register_physical_memory(0, ram_size, ram_addr);
+ }
/* allocate VGA RAM */
vga_ram_addr = qemu_ram_alloc(vga_ram_size);
@@ -834,11 +854,19 @@
if (above_4g_mem_size > 0) {
ram_addr = qemu_ram_alloc(above_4g_mem_size);
cpu_register_physical_memory(0x100000000, above_4g_mem_size, ram_addr);
+
+ if (kvm_enabled())
+ kvm_cpu_register_physical_memory(0x100000000,
+ above_4g_mem_size,
+ ram_addr);
}
/* setup basic memory access */
cpu_register_physical_memory(0xc0000, 0x10000,
vga_bios_offset | IO_MEM_ROM);
+ if (kvm_enabled())
+ kvm_cpu_register_physical_memory(0xc0000, 0x10000,
+ vga_bios_offset | IO_MEM_ROM);
/* map the last 128KB of the BIOS in ISA space */
isa_bios_size = bios_size;
@@ -846,9 +874,14 @@
isa_bios_size = 128 * 1024;
cpu_register_physical_memory(0xd0000, (192 * 1024) - isa_bios_size,
IO_MEM_UNASSIGNED);
+ /* kvm tpr optimization needs the bios accessible for write, at least to qemu itself */
cpu_register_physical_memory(0x100000 - isa_bios_size,
isa_bios_size,
- (bios_offset + bios_size - isa_bios_size) | IO_MEM_ROM);
+ (bios_offset + bios_size - isa_bios_size));
+ if (kvm_enabled())
+ kvm_cpu_register_physical_memory(0x100000 - isa_bios_size,
+ isa_bios_size,
+ (bios_offset + bios_size - isa_bios_size) | IO_MEM_ROM);
opt_rom_offset = 0;
for (i = 0; i < nb_option_roms; i++)
@@ -857,6 +890,23 @@
/* map all the bios at the top of memory */
cpu_register_physical_memory((uint32_t)(-bios_size),
bios_size, bios_offset | IO_MEM_ROM);
+ if (kvm_enabled()) {
+ int r;
+#ifdef KVM_CAP_USER_MEMORY
+ r = kvm_qemu_check_extension(KVM_CAP_USER_MEMORY);
+ if (r)
+ kvm_cpu_register_physical_memory((uint32_t)(-bios_size),
+ bios_size, bios_offset | IO_MEM_ROM);
+ else
+#endif
+ {
+ bios_mem = kvm_cpu_create_phys_mem((uint32_t)(-bios_size),
+ bios_size, 0, 1);
+ if (!bios_mem)
+ exit(1);
+ memcpy(bios_mem, phys_ram_base + bios_offset, bios_size);
+ }
+ }
bochs_bios_init();
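When the kernel supports KVM_CAP_USER_MEMORY, the pc.c hunk above no longer registers RAM as one block. It registers [0, 0xa0000) and [0x100000, ram_size) separately, and only allocates, without registering, the 0xa0000-0xfffff hole used by VGA memory and ROMs. Option ROM sizes are also rounded up to whole 4 KiB pages. A minimal sketch of that layout arithmetic follows; struct ram_region and the function names are hypothetical, and the sketch only computes the ranges rather than calling qemu_ram_alloc() or the registration functions.

```c
#include <assert.h>
#include <stdint.h>

struct ram_region {
    uint64_t guest_phys;  /* guest physical start address */
    uint64_t size;        /* length in bytes */
};

/* Low RAM below the 640 KiB hole, then RAM from 1 MiB up to ram_size.
 * The 0xa0000-0xfffff hole is deliberately left out. */
static void pc_low_ram_layout(uint64_t ram_size, struct ram_region out[2])
{
    assert(ram_size > 0x100000);  /* the split assumes more than 1 MiB */
    out[0].guest_phys = 0;
    out[0].size = 0xa0000;
    out[1].guest_phys = 0x100000;
    out[1].size = ram_size - 0x100000;
}

/* Same rounding as (size + 4095) & ~4095 for option ROMs. */
static uint64_t round_up_4k(uint64_t size)
{
    return (size + 4095) & ~(uint64_t)4095;
}
```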
Index: qemu/hw/vga.c
===================================================================
--- qemu.orig/hw/vga.c 2007-12-16 17:41:11.000000000 -0600
+++ qemu/hw/vga.c 2008-01-31 15:41:47.000000000 -0600
@@ -27,6 +27,10 @@
#include "pci.h"
#include "vga_int.h"
#include "pixel_ops.h"
+#include "qemu-kvm.h"
+#ifndef _WIN32
+#include <sys/mman.h>
+#endif
//#define DEBUG_VGA
//#define DEBUG_VGA_MEM
@@ -1412,17 +1416,37 @@
}
}
+static int bitmap_get_dirty(unsigned long *bitmap, unsigned nr)
+{
+ unsigned word = nr / ((sizeof bitmap[0]) * 8);
+ unsigned bit = nr % ((sizeof bitmap[0]) * 8);
+
+ //printf("%x -> %ld\n", nr, (bitmap[word] >> bit) & 1);
+ return (bitmap[word] >> bit) & 1;
+}
+
+
/*
* graphic modes
*/
static void vga_draw_graphic(VGAState *s, int full_update)
{
- int y1, y, update, page_min, page_max, linesize, y_start, double_scan, mask;
- int width, height, shift_control, line_offset, page0, page1, bwidth;
+ int y1, y, update, linesize, y_start, double_scan, mask;
+ int width, height, shift_control, line_offset, bwidth;
int disp_width, multi_scan, multi_run;
uint8_t *d;
uint32_t v, addr1, addr;
+ long page0, page1, page_min, page_max;
vga_draw_line_func *vga_draw_line;
+ /* HACK ALERT */
+#define VGA_BITMAP_SIZE ((8*1024*1024) / 4096 / 8 / sizeof(long))
+ unsigned long bitmap[VGA_BITMAP_SIZE];
+ int r;
+ if (kvm_enabled()) {
+ r = qemu_kvm_get_dirty_pages(s->map_addr, &bitmap);
+ if (r < 0)
+ fprintf(stderr, "kvm: get_dirty_pages returned %d\n", r);
+ }
full_update |= update_basic_params(s);
@@ -1530,10 +1554,17 @@
update = full_update |
cpu_physical_memory_get_dirty(page0, VGA_DIRTY_FLAG) |
cpu_physical_memory_get_dirty(page1, VGA_DIRTY_FLAG);
+ if (kvm_enabled()) {
+ update |= bitmap_get_dirty(bitmap, (page0 - s->vram_offset) >> TARGET_PAGE_BITS);
+ update |= bitmap_get_dirty(bitmap, (page1 - s->vram_offset) >> TARGET_PAGE_BITS);
+ }
+
if ((page1 - page0) > TARGET_PAGE_SIZE) {
/* if wide line, can use another page */
update |= cpu_physical_memory_get_dirty(page0 + TARGET_PAGE_SIZE,
VGA_DIRTY_FLAG);
+ if (kvm_enabled())
+ update |= bitmap_get_dirty(bitmap, (page0 + TARGET_PAGE_SIZE - s->vram_offset) >> TARGET_PAGE_BITS);
}
/* explicit invalidation for the hardware cursor */
update |= (s->invalidated_y_table[y >> 5] >> (y & 0x1f)) & 1;
@@ -1787,9 +1818,41 @@
cpu_register_physical_memory(addr, s->bios_size, s->bios_offset);
} else {
cpu_register_physical_memory(addr, s->vram_size, s->vram_offset);
+ if (kvm_enabled()) {
+ unsigned long vga_ram_begin, vga_ram_end;
+ void *vram_pointer, *old_vram;
+
+ vga_ram_begin = addr;
+ vga_ram_end = addr + VGA_RAM_SIZE;
+
+ if (vga_ram_begin == s->map_addr &&
+ vga_ram_end == s->map_end) {
+ return;
+ }
+
+ if (s->map_addr && s->map_end)
+ unset_vram_mapping(s->map_addr, s->map_end);
+
+ vram_pointer = set_vram_mapping(vga_ram_begin, vga_ram_end);
+ if (!vram_pointer) {
+ fprintf(stderr, "set_vram_mapping failed\n");
+ s->map_addr = s->map_end = 0;
+ }
+ else {
+ old_vram = vga_update_vram((VGAState *)s, vram_pointer,
+ VGA_RAM_SIZE);
+ if (s->map_addr && s->map_end)
+ munmap(old_vram, s->map_end - s->map_addr);
+ else
+ qemu_free(old_vram);
+ s->map_addr = vga_ram_begin;
+ s->map_end = vga_ram_end;
+ }
+ }
}
}
+/* in a xen/kvm environment, vga_ram_base is not used */
void vga_common_init(VGAState *s, DisplayState *ds, uint8_t *vga_ram_base,
unsigned long vga_ram_offset, int vga_ram_size)
{
@@ -1820,7 +1883,10 @@
vga_reset(s);
- s->vram_ptr = vga_ram_base;
+ if (kvm_enabled())
+ s->vram_ptr = qemu_malloc(vga_ram_size);
+ else
+ s->vram_ptr = vga_ram_base;
s->vram_offset = vga_ram_offset;
s->vram_size = vga_ram_size;
s->ds = ds;
@@ -2053,6 +2119,31 @@
return 0;
}
+void *vga_update_vram(VGAState *s, void *vga_ram_base, int vga_ram_size)
+{
+ uint8_t *old_pointer;
+
+ if (s->vram_size != vga_ram_size) {
+ fprintf(stderr, "No support to change vga_ram_size\n");
+ return NULL;
+ }
+
+ if (!vga_ram_base) {
+ vga_ram_base = qemu_malloc(vga_ram_size);
+ if (!vga_ram_base) {
+ fprintf(stderr, "reallocate error\n");
+ return NULL;
+ }
+ }
+
+ /* XXX lock needed? */
+ memcpy(vga_ram_base, s->vram_ptr, vga_ram_size);
+ old_pointer = s->vram_ptr;
+ s->vram_ptr = vga_ram_base;
+
+ return old_pointer;
+}
+
/********************************************************/
/* vga screen dump */
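In vga_draw_graphic() above, qemu_kvm_get_dirty_pages() fills a bitmap with one bit per page of the KVM-backed VRAM mapping, and bitmap_get_dirty() selects the word and then the bit within it. The sketch below reproduces that lookup and the VGA_BITMAP_SIZE sizing (8 MiB / 4 KiB = 2048 pages, packed into unsigned longs). The 8 MiB and 4 KiB constants are taken from the hard-coded macro; mark_page_dirty() and vram_page_index() are hypothetical helpers added only to exercise the lookup.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_VRAM_BYTES (8 * 1024 * 1024)   /* as hard-coded in VGA_BITMAP_SIZE */
#define SKETCH_PAGE_BYTES 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
/* One bit per page: 2048 bits, stored in unsigned long words. */
#define SKETCH_BITMAP_WORDS (SKETCH_VRAM_BYTES / SKETCH_PAGE_BYTES / BITS_PER_WORD)

/* Same test as bitmap_get_dirty(): word index, then bit index in that word. */
static int page_is_dirty(const unsigned long *bitmap, unsigned nr)
{
    return (bitmap[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1;
}

static void mark_page_dirty(unsigned long *bitmap, unsigned nr)
{
    bitmap[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Page index within VRAM, as in (page0 - s->vram_offset) >> TARGET_PAGE_BITS. */
static unsigned vram_page_index(unsigned long vram_offset, unsigned long addr)
{
    assert(addr >= vram_offset);
    return (unsigned)((addr - vram_offset) / SKETCH_PAGE_BYTES);
}
```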
Index: qemu/hw/vga_int.h
===================================================================
--- qemu.orig/hw/vga_int.h 2007-09-17 03:09:49.000000000 -0500
+++ qemu/hw/vga_int.h 2008-01-31 15:41:47.000000000 -0600
@@ -145,11 +145,20 @@
void (*cursor_draw_line)(struct VGAState *s, uint8_t *d, int y); \
/* tell for each page if it has been updated since the last time */ \
uint32_t last_palette[256]; \
- uint32_t last_ch_attr[CH_ATTR_SIZE]; /* XXX: make it dynamic */
+ uint32_t last_ch_attr[CH_ATTR_SIZE]; /* XXX: make it dynamic */ \
+ unsigned long map_addr; \
+ unsigned long map_end;
typedef struct VGAState {
VGA_STATE_COMMON
+
+ int32_t aliases_enabled;
+ int32_t pad1;
+ uint32_t aliased_bank_base[2];
+ uint32_t aliased_bank_limit[2];
+
+
} VGAState;
static inline int c6_to_8(int v)
@@ -182,5 +191,10 @@
unsigned int color0, unsigned int color1,
unsigned int color_xor);
+/* let kvm create vga memory */
+void *set_vram_mapping(unsigned long begin, unsigned long end);
+int unset_vram_mapping(unsigned long begin, unsigned long end);
+
+void *vga_update_vram(VGAState *s, void *vga_ram_base, int vga_ram_size);
extern const uint8_t sr_mask[8];
extern const uint8_t gr_mask[16];
Index: qemu/hw/vmport.c
===================================================================
--- qemu.orig/hw/vmport.c 2007-11-17 11:14:50.000000000 -0600
+++ qemu/hw/vmport.c 2008-01-31 15:41:47.000000000 -0600
@@ -21,10 +21,12 @@
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
+
#include "hw.h"
#include "isa.h"
#include "pc.h"
#include "sysemu.h"
+#include "qemu-kvm.h"
#define VMPORT_CMD_GETVERSION 0x0a
#define VMPORT_CMD_GETRAMSIZE 0x14
@@ -55,6 +57,10 @@
VMPortState *s = opaque;
unsigned char command;
uint32_t eax;
+ uint32_t ret;
+
+ if (kvm_enabled())
+ kvm_save_registers(s->env);
eax = s->env->regs[R_EAX];
if (eax != VMPORT_MAGIC)
@@ -69,7 +75,12 @@
return eax;
}
- return s->func[command](s->opaque[command], addr);
+ ret = s->func[command](s->opaque[command], addr);
+
+ if (kvm_enabled())
+ kvm_load_registers(s->env);
+
+ return ret;
}
static uint32_t vmport_cmd_get_version(void *opaque, uint32_t addr)
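kvm-tpr-opt.c, added next in this patch, only rewrites guest instructions that access the TPR through a 32-bit absolute address. instruction_is_ok() accepts 0x89 and 0x8b (and 0xc7 when the ModRM reg field is 0) only when the ModRM byte has mod=00 and rm=101, which in 32-bit code means a bare disp32 operand, plus the 0xa1/0xa3 moffs32 forms. The decoded address must then end in 0x80, the TPR offset within the APIC page. The standalone sketch below applies the same checks to a byte buffer. decode_tpr_access() and its buffer interface are hypothetical; the patch reads guest memory byte by byte with read_byte_virt() and also restricts the instruction address to certain ranges, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores the absolute address in *addr_out if insn[] begins with
 * one of the accepted mov forms and the address is at page offset 0x80. */
static int decode_tpr_access(const uint8_t *insn, size_t len, uint32_t *addr_out)
{
    size_t addr_offset;
    uint32_t addr;

    assert(insn != NULL && addr_out != NULL);
    if (len < 2)
        return 0;
    switch (insn[0]) {
    case 0xc7:                         /* store imm32 to memory: only /0 is a mov */
        if (((insn[1] >> 3) & 7) != 0)
            return 0;
        /* fall through */
    case 0x89:                         /* store r32 to memory */
    case 0x8b:                         /* load r32 from memory */
        if ((insn[1] & 0xc7) != 0x05)  /* mod=00, rm=101: bare disp32 */
            return 0;
        addr_offset = 2;
        break;
    case 0xa1:                         /* load eax from absolute address */
    case 0xa3:                         /* store eax to absolute address */
        addr_offset = 1;
        break;
    default:
        return 0;
    }
    if (len < addr_offset + 4)
        return 0;
    /* Little-endian 32-bit absolute address. */
    addr = (uint32_t)insn[addr_offset]
         | (uint32_t)insn[addr_offset + 1] << 8
         | (uint32_t)insn[addr_offset + 2] << 16
         | (uint32_t)insn[addr_offset + 3] << 24;
    if ((addr & 0xfff) != 0x80)
        return 0;
    *addr_out = addr;
    return 1;
}
```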
Index: qemu/kvm-tpr-opt.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ qemu/kvm-tpr-opt.c 2008-01-31 15:41:47.000000000 -0600
@@ -0,0 +1,288 @@
+
+#include "config.h"
+#include "config-host.h"
+
+#include <string.h>
+
+#include "hw/hw.h"
+#include "sysemu.h"
+#include "qemu-kvm.h"
+#include "cpu.h"
+
+#include <stdio.h>
+
+extern kvm_context_t kvm_context;
+
+static uint64_t map_addr(struct kvm_sregs *sregs, target_ulong virt, unsigned *perms)
+{
+ uint64_t mask = ((1ull << 48) - 1) & ~4095ull;
+ uint64_t p, pp = 7;
+
+ p = sregs->cr3;
+ if (sregs->cr4 & 0x20) {
+ p &= ~31ull;
+ p = ldq_phys(p + 8 * (virt >> 30));
+ if (!(p & 1))
+ return -1ull;
+ p &= mask;
+ p = ldq_phys(p + 8 * ((virt >> 21) & 511));
+ if (!(p & 1))
+ return -1ull;
+ pp &= p;
+ if (p & 128) {
+ p += ((virt >> 12) & 511) << 12;
+ } else {
+ p &= mask;
+ p = ldq_phys(p + 8 * ((virt >> 12) & 511));
+ if (!(p & 1))
+ return -1ull;
+ pp &= p;
+ }
+ } else {
+ p &= mask;
+ p = ldl_phys(p + 4 * ((virt >> 22) & 1023));
+ if (!(p & 1))
+ return -1ull;
+ pp &= p;
+ if (p & 128) {
+ p += ((virt >> 12) & 1023) << 12;
+ } else {
+ p &= mask;
+ p = ldl_phys(p + 4 * ((virt >> 12) & 1023));
+ pp &= p;
+ if (!(p & 1))
+ return -1ull;
+ }
+ }
+ if (perms)
+ *perms = pp >> 1;
+ p &= mask;
+ return p + (virt & 4095);
+}
+
+static uint8_t read_byte_virt(CPUState *env, target_ulong virt)
+{
+ struct kvm_sregs sregs;
+
+ kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+ return ldub_phys(map_addr(&sregs, virt, NULL));
+}
+
+static void write_byte_virt(CPUState *env, target_ulong virt, uint8_t b)
+{
+ struct kvm_sregs sregs;
+
+ kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+ stb_phys(map_addr(&sregs, virt, NULL), b);
+}
+
+static uint32_t get_bios_map(CPUState *env, unsigned *perms)
+{
+ uint32_t v;
+ struct kvm_sregs sregs;
+
+ kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+
+ for (v = -4096u; v != 0; v -= 4096)
+ if (map_addr(&sregs, v, perms) == 0xe0000)
+ return v;
+ return -1u;
+}
+
+struct vapic_bios {
+ char signature[8];
+ uint32_t virt_base;
+ uint32_t fixup_start;
+ uint32_t fixup_end;
+ uint32_t vapic;
+ uint32_t vapic_size;
+ uint32_t vcpu_shift;
+ uint32_t real_tpr;
+ uint32_t set_tpr;
+ uint32_t set_tpr_eax;
+ uint32_t get_tpr[8];
+};
+
+static struct vapic_bios vapic_bios;
+
+static uint32_t real_tpr;
+static uint32_t bios_addr;
+static uint32_t vapic_phys;
+static int bios_enabled;
+static uint32_t vbios_desc_phys;
+
+void update_vbios_real_tpr(void)
+{
+ cpu_physical_memory_rw(vbios_desc_phys, (void *)&vapic_bios, sizeof vapic_bios, 0);
+ vapic_bios.real_tpr = real_tpr;
+ vapic_bios.vcpu_shift = 7;
+ cpu_physical_memory_rw(vbios_desc_phys, (void *)&vapic_bios, sizeof vapic_bios, 1);
+}
+
+static unsigned modrm_reg(uint8_t modrm)
+{
+ return (modrm >> 3) & 7;
+}
+
+static int is_abs_modrm(uint8_t modrm)
+{
+ return (modrm & 0xc7) == 0x05;
+}
+
+static int instruction_is_ok(CPUState *env, uint64_t rip, int is_write)
+{
+ uint8_t b1, b2;
+ unsigned addr_offset;
+ uint32_t addr;
+ uint64_t p;
+
+ if ((rip & 0xf0000000) != 0x80000000 && (rip & 0xf0000000) != 0xe0000000)
+ return 0;
+ b1 = read_byte_virt(env, rip);
+ b2 = read_byte_virt(env, rip + 1);
+ switch (b1) {
+ case 0xc7: /* mov imm32, r/m32 (c7/0) */
+ if (modrm_reg(b2) != 0)
+ return 0;
+ /* fall through */
+ case 0x89: /* mov r32 to r/m32 */
+ case 0x8b: /* mov r/m32 to r32 */
+ if (!is_abs_modrm(b2))
+ return 0;
+ addr_offset = 2;
+ break;
+ case 0xa1: /* mov abs to eax */
+ case 0xa3: /* mov eax to abs */
+ addr_offset = 1;
+ break;
+ default:
+ return 0;
+ }
+ p = rip + addr_offset;
+ addr = read_byte_virt(env, p++);
+ addr |= (uint32_t)read_byte_virt(env, p++) << 8;
+ addr |= (uint32_t)read_byte_virt(env, p++) << 16;
+ addr |= (uint32_t)read_byte_virt(env, p++) << 24;
+ if ((addr & 0xfff) != 0x80)
+ return 0;
+ real_tpr = addr;
+ update_vbios_real_tpr();
+ return 1;
+}
+
+static int bios_is_mapped(CPUState *env, uint64_t rip)
+{
+ uint32_t probe;
+ uint64_t phys;
+ struct kvm_sregs sregs;
+ unsigned perms;
+ uint32_t i;
+ uint32_t offset, fixup;
+
+ if (bios_enabled)
+ return 1;
+
+ kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+
+ probe = (rip & 0xf0000000) + 0xe0000;
+ phys = map_addr(&sregs, probe, &perms);
+ if (phys != 0xe0000)
+ return 0;
+ bios_addr = probe;
+ for (i = 0; i < 64; ++i) {
+ cpu_physical_memory_read(phys, (void *)&vapic_bios, sizeof(vapic_bios));
+ if (memcmp(vapic_bios.signature, "kvm aPiC", 8) == 0)
+ break;
+ phys += 1024;
+ bios_addr += 1024;
+ }
+ if (i == 64)
+ return 0;
+ if (bios_addr == vapic_bios.virt_base)
+ return 1;
+ vbios_desc_phys = phys;
+ for (i = vapic_bios.fixup_start; i < vapic_bios.fixup_end; i += 4) {
+ offset = ldl_phys(phys + i - vapic_bios.virt_base);
+ fixup = phys + offset;
+ stl_phys(fixup, ldl_phys(fixup) + bios_addr - vapic_bios.virt_base);
+ }
+ vapic_phys = vapic_bios.vapic - vapic_bios.virt_base + phys;
+ return 1;
+}
+
+static int enable_vapic(CPUState *env)
+{
+ struct kvm_sregs sregs;
+
+ kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+ sregs.tr.selector = 0xdb + (env->cpu_index << 8);
+ kvm_set_sregs(kvm_context, env->cpu_index, &sregs);
+
+ kvm_enable_vapic(kvm_context, env->cpu_index,
+ vapic_phys + (env->cpu_index << 7));
+ return 1;
+}
+
+static void patch_call(CPUState *env, uint64_t rip, uint32_t target)
+{
+ uint32_t offset;
+
+ offset = target - vapic_bios.virt_base + bios_addr - rip - 5;
+ write_byte_virt(env, rip, 0xe8); /* call near */
+ write_byte_virt(env, rip + 1, offset);
+ write_byte_virt(env, rip + 2, offset >> 8);
+ write_byte_virt(env, rip + 3, offset >> 16);
+ write_byte_virt(env, rip + 4, offset >> 24);
+}
+
+static void patch_instruction(CPUState *env, uint64_t rip)
+{
+ uint8_t b1, b2;
+
+ b1 = read_byte_virt(env, rip);
+ b2 = read_byte_virt(env, rip + 1);
+ switch (b1) {
+ case 0x89: /* mov r32 to r/m32 */
+ write_byte_virt(env, rip, 0x50 + modrm_reg(b2)); /* push reg */
+ patch_call(env, rip + 1, vapic_bios.set_tpr);
+ break;
+ case 0x8b: /* mov r/m32 to r32 */
+ write_byte_virt(env, rip, 0x90);
+ patch_call(env, rip + 1, vapic_bios.get_tpr[modrm_reg(b2)]);
+ break;
+ case 0xa1: /* mov abs to eax */
+ patch_call(env, rip, vapic_bios.get_tpr[0]);
+ break;
+ case 0xa3: /* mov eax to abs */
+ patch_call(env, rip, vapic_bios.set_tpr_eax);
+ break;
+ case 0xc7: /* mov imm32, r/m32 (c7/0) */
+ write_byte_virt(env, rip, 0x68); /* push imm32 */
+ write_byte_virt(env, rip + 1, read_byte_virt(env, rip+6));
+ write_byte_virt(env, rip + 2, read_byte_virt(env, rip+7));
+ write_byte_virt(env, rip + 3, read_byte_virt(env, rip+8));
+ write_byte_virt(env, rip + 4, read_byte_virt(env, rip+9));
+ patch_call(env, rip + 5, vapic_bios.set_tpr);
+ break;
+ default:
+ printf("funny insn %02x %02x\n", b1, b2);
+ }
+}
+
+void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write)
+{
+ if (!instruction_is_ok(env, rip, is_write))
+ return;
+ if (!bios_is_mapped(env, rip))
+ return;
+ if (!enable_vapic(env))
+ return;
+ patch_instruction(env, rip);
+}
+
+void kvm_tpr_opt_setup(CPUState *env)
+{
+ if (smp_cpus > 1)
+ return;
+ kvm_enable_tpr_access_reporting(kvm_context, env->cpu_index);
+}
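patch_instruction() replaces each TPR access with a sequence of the same length that ends in a 5-byte near call. "89 05 disp32" and "8b 05 disp32" (6 bytes) become a one-byte "push reg" or "nop" plus the call. "a1"/"a3 disp32" (5 bytes) become the call alone. "c7 05 disp32 imm32" (10 bytes) becomes "push imm32" (5 bytes) plus the call. The call is opcode 0xe8 followed by a little-endian rel32 measured from the end of the call instruction, which is why patch_call() subtracts 5. The standalone sketch below shows that encoding. encode_call_rel32() and decode_call_target() are hypothetical helpers, not part of the patch, and the sketch leaves out the BIOS relocation term (bios_addr - vapic_bios.virt_base) that patch_call() adds to the target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode "call rel32" located at guest address rip. */
static void encode_call_rel32(uint8_t buf[5], uint32_t rip, uint32_t target)
{
    /* rel32 is relative to the next instruction (rip + 5); unsigned
     * arithmetic wraps modulo 2^32, like a 32-bit EIP does. */
    uint32_t offset = target - rip - 5;

    buf[0] = 0xe8;                       /* call near, rel32 */
    buf[1] = (uint8_t)offset;            /* little-endian displacement */
    buf[2] = (uint8_t)(offset >> 8);
    buf[3] = (uint8_t)(offset >> 16);
    buf[4] = (uint8_t)(offset >> 24);
}

/* Hypothetical helper: recover the absolute target of an encoded call. */
static uint32_t decode_call_target(const uint8_t buf[5], uint32_t rip)
{
    uint32_t offset;

    assert(buf[0] == 0xe8);
    offset = (uint32_t)buf[1] | (uint32_t)buf[2] << 8 |
             (uint32_t)buf[3] << 16 | (uint32_t)buf[4] << 24;
    return rip + 5 + offset;
}
```

A backward call (target below rip) simply produces a negative displacement such as 0xffffeffb; decoding it wraps back to the original target.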
Index: qemu/monitor.c
===================================================================
--- qemu.orig/monitor.c 2007-12-16 21:15:51.000000000 -0600
+++ qemu/monitor.c 2008-01-31 15:41:47.000000000 -0600
@@ -36,6 +36,7 @@
#include "disas.h"
#include <dirent.h>
+#include "qemu-kvm.h"
#ifdef CONFIG_PROFILER
#include "qemu-timer.h" /* for ticks_per_sec */
#endif
@@ -283,6 +284,10 @@
if (!mon_cpu) {
mon_set_cpu(0);
}
+
+ if (kvm_enabled())
+ kvm_save_registers(mon_cpu);
+
return mon_cpu;
}
Index: qemu/qemu-img.c
===================================================================
--- qemu.orig/qemu-img.c 2008-01-06 11:21:48.000000000 -0600
+++ qemu/qemu-img.c 2008-01-31 15:41:47.000000000 -0600
@@ -55,6 +55,33 @@
return ptr;
}
+#ifdef _WIN32
+
+void *qemu_memalign(size_t alignment, size_t size)
+{
+ return VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE);
+}
+
+#else
+
+void *qemu_memalign(size_t alignment, size_t size)
+{
+#if defined(_POSIX_C_SOURCE)
+ int ret;
+ void *ptr;
+ ret = posix_memalign(&ptr, alignment, size);
+ if (ret != 0)
+ return NULL;
+ return ptr;
+#elif defined(_BSD)
+ return valloc(size);
+#else
+ return memalign(alignment, size);
+#endif
+}
+
+#endif
+
char *qemu_strdup(const char *str)
{
char *ptr;
Index: qemu/qemu-kvm-helper.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ qemu/qemu-kvm-helper.c 2008-01-31 15:41:47.000000000 -0600
@@ -0,0 +1,40 @@
+
+#include "config.h"
+#include "config-host.h"
+
+#include "exec.h"
+
+#include "qemu-kvm.h"
+
+void qemu_kvm_call_with_env(void (*func)(void *), void *data, CPUState *newenv)
+{
+ CPUState *oldenv;
+#define DECLARE_HOST_REGS
+#include "hostregs_helper.h"
+
+ oldenv = env;
+
+#define SAVE_HOST_REGS
+#include "hostregs_helper.h"
+
+ env = newenv;
+
+ env_to_regs();
+ func(data);
+ regs_to_env();
+
+ env = oldenv;
+
+#include "hostregs_helper.h"
+}
+
+static void call_helper_cpuid(void *junk)
+{
+ helper_cpuid();
+}
+
+void qemu_kvm_cpuid_on_env(CPUState *env)
+{
+ qemu_kvm_call_with_env(call_helper_cpuid, NULL, env);
+}
+
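qemu_kvm_call_with_env() temporarily points QEMU's global env at another CPU, runs a helper, and then restores the previous CPU. In the real code env lives in a host register, so hostregs_helper.h saves and restores the host registers QEMU reserves for its globals, and env_to_regs()/regs_to_env() move cached state between env and those registers. The sketch below keeps only the save/swap/restore step, using hypothetical stand-ins (struct cpu_state, call_with_env, record_cpu_index) for the QEMU types. The point it illustrates is that the saved value must be the caller's env, not newenv.

```c
/* Hypothetical stand-ins for QEMU's CPUState and its global env pointer. */
struct cpu_state {
    int cpu_index;
};

static struct cpu_state *env;

/* Run func(data) with env temporarily pointing at newenv. The caller's
 * value is captured before the swap; saving newenv instead would leave
 * env pointing at newenv after the call returns. */
static void call_with_env(void (*func)(void *), void *data,
                          struct cpu_state *newenv)
{
    struct cpu_state *oldenv = env;

    env = newenv;
    func(data);
    env = oldenv;
}

/* Example callback: record which CPU env pointed at during the call. */
static void record_cpu_index(void *out)
{
    *(int *)out = env ? env->cpu_index : -1;
}
```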
Index: qemu/qemu-kvm-x86.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ qemu/qemu-kvm-x86.c 2008-01-31 15:41:47.000000000 -0600
@@ -0,0 +1,628 @@
+
+#include "config.h"
+#include "config-host.h"
+
+#include <string.h>
+#include "hw/hw.h"
+
+#include "qemu-kvm.h"
+#include <libkvm.h>
+#include <pthread.h>
+#include <sys/utsname.h>
+
+#define MSR_IA32_TSC 0x10
+
+static struct kvm_msr_list *kvm_msr_list;
+extern unsigned int kvm_shadow_memory;
+extern kvm_context_t kvm_context;
+static int kvm_has_msr_star;
+
+static int lm_capable_kernel;
+
+int kvm_arch_qemu_create_context(void)
+{
+ int i;
+ if (kvm_shadow_memory)
+ kvm_set_shadow_pages(kvm_context, kvm_shadow_memory);
+
+ kvm_msr_list = kvm_get_msr_list(kvm_context);
+ if (!kvm_msr_list)
+ return -1;
+ for (i = 0; i < kvm_msr_list->nmsrs; ++i)
+ if (kvm_msr_list->indices[i] == MSR_STAR)
+ kvm_has_msr_star = 1;
+ return 0;
+}
+
+static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
+ uint64_t data)
+{
+ entry->index = index;
+ entry->data = data;
+}
+
+/* returns 0 on success, non-0 on failure */
+static int get_msr_entry(struct kvm_msr_entry *entry, CPUState *env)
+{
+ switch (entry->index) {
+ case MSR_IA32_SYSENTER_CS:
+ env->sysenter_cs = entry->data;
+ break;
+ case MSR_IA32_SYSENTER_ESP:
+ env->sysenter_esp = entry->data;
+ break;
+ case MSR_IA32_SYSENTER_EIP:
+ env->sysenter_eip = entry->data;
+ break;
+ case MSR_STAR:
+ env->star = entry->data;
+ break;
+#ifdef TARGET_X86_64
+ case MSR_CSTAR:
+ env->cstar = entry->data;
+ break;
+ case MSR_KERNELGSBASE:
+ env->kernelgsbase = entry->data;
+ break;
+ case MSR_FMASK:
+ env->fmask = entry->data;
+ break;
+ case MSR_LSTAR:
+ env->lstar = entry->data;
+ break;
+#endif
+ case MSR_IA32_TSC:
+ env->tsc = entry->data;
+ break;
+ default:
+ printf("Warning: unknown MSR index 0x%x\n", entry->index);
+ return 1;
+ }
+ return 0;
+}
+
+#ifdef TARGET_X86_64
+#define MSR_COUNT 9
+#else
+#define MSR_COUNT 5
+#endif
+
+static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
+{
+ lhs->selector = rhs->selector;
+ lhs->base = rhs->base;
+ lhs->limit = rhs->limit;
+ lhs->type = 3;
+ lhs->present = 1;
+ lhs->dpl = 3;
+ lhs->db = 0;
+ lhs->s = 1;
+ lhs->l = 0;
+ lhs->g = 0;
+ lhs->avl = 0;
+ lhs->unusable = 0;
+}
+
+static void set_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
+{
+ unsigned flags = rhs->flags;
+ lhs->selector = rhs->selector;
+ lhs->base = rhs->base;
+ lhs->limit = rhs->limit;
+ lhs->type = (flags >> DESC_TYPE_SHIFT) & 15;
+ lhs->present = (flags & DESC_P_MASK) != 0;
+ lhs->dpl = rhs->selector & 3;
+ lhs->db = (flags >> DESC_B_SHIFT) & 1;
+ lhs->s = (flags & DESC_S_MASK) != 0;
+ lhs->l = (flags >> DESC_L_SHIFT) & 1;
+ lhs->g = (flags & DESC_G_MASK) != 0;
+ lhs->avl = (flags & DESC_AVL_MASK) != 0;
+ lhs->unusable = 0;
+}
+
+static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs)
+{
+ lhs->selector = rhs->selector;
+ lhs->base = rhs->base;
+ lhs->limit = rhs->limit;
+ lhs->flags =
+ (rhs->type << DESC_TYPE_SHIFT)
+ | (rhs->present * DESC_P_MASK)
+ | (rhs->dpl << DESC_DPL_SHIFT)
+ | (rhs->db << DESC_B_SHIFT)
+ | (rhs->s * DESC_S_MASK)
+ | (rhs->l << DESC_L_SHIFT)
+ | (rhs->g * DESC_G_MASK)
+ | (rhs->avl * DESC_AVL_MASK);
+}
+
+/* QEMU's reset values are not compatible with SVM;
+ * this function fixes up the segment descriptor values. */
+static void fix_realmode_dataseg(struct kvm_segment *seg)
+{
+ seg->type = 0x02;
+ seg->present = 1;
+ seg->s = 1;
+}
+
+void kvm_arch_load_regs(CPUState *env)
+{
+ struct kvm_regs regs;
+ struct kvm_fpu fpu;
+ struct kvm_sregs sregs;
+ struct kvm_msr_entry msrs[MSR_COUNT];
+ int rc, n, i;
+
+ regs.rax = env->regs[R_EAX];
+ regs.rbx = env->regs[R_EBX];
+ regs.rcx = env->regs[R_ECX];
+ regs.rdx = env->regs[R_EDX];
+ regs.rsi = env->regs[R_ESI];
+ regs.rdi = env->regs[R_EDI];
+ regs.rsp = env->regs[R_ESP];
+ regs.rbp = env->regs[R_EBP];
+#ifdef TARGET_X86_64
+ regs.r8 = env->regs[8];
+ regs.r9 = env->regs[9];
+ regs.r10 = env->regs[10];
+ regs.r11 = env->regs[11];
+ regs.r12 = env->regs[12];
+ regs.r13 = env->regs[13];
+ regs.r14 = env->regs[14];
+ regs.r15 = env->regs[15];
+#endif
+
+ regs.rflags = env->eflags;
+ regs.rip = env->eip;
+
+ kvm_set_regs(kvm_context, env->cpu_index, &regs);
+
+ memset(&fpu, 0, sizeof fpu);
+ fpu.fsw = env->fpus & ~(7 << 11);
+ fpu.fsw |= (env->fpstt & 7) << 11;
+ fpu.fcw = env->fpuc;
+ for (i = 0; i < 8; ++i)
+ fpu.ftwx |= (!env->fptags[i]) << i;
+ memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
+ memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
+ fpu.mxcsr = env->mxcsr;
+ kvm_set_fpu(kvm_context, env->cpu_index, &fpu);
+
+ memcpy(sregs.interrupt_bitmap, env->kvm_interrupt_bitmap, sizeof(sregs.interrupt_bitmap));
+
+ if ((env->eflags & VM_MASK)) {
+ set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
+ set_v8086_seg(&sregs.ds, &env->segs[R_DS]);
+ set_v8086_seg(&sregs.es, &env->segs[R_ES]);
+ set_v8086_seg(&sregs.fs, &env->segs[R_FS]);
+ set_v8086_seg(&sregs.gs, &env->segs[R_GS]);
+ set_v8086_seg(&sregs.ss, &env->segs[R_SS]);
+ } else {
+ set_seg(&sregs.cs, &env->segs[R_CS]);
+ set_seg(&sregs.ds, &env->segs[R_DS]);
+ set_seg(&sregs.es, &env->segs[R_ES]);
+ set_seg(&sregs.fs, &env->segs[R_FS]);
+ set_seg(&sregs.gs, &env->segs[R_GS]);
+ set_seg(&sregs.ss, &env->segs[R_SS]);
+
+ if (env->cr[0] & CR0_PE_MASK) {
+ /* force ss cpl to cs cpl */
+ sregs.ss.selector = (sregs.ss.selector & ~3) |
+ (sregs.cs.selector & 3);
+ sregs.ss.dpl = sregs.ss.selector & 3;
+ }
+
+ if (!(env->cr[0] & CR0_PG_MASK)) {
+ fix_realmode_dataseg(&sregs.cs);
+ fix_realmode_dataseg(&sregs.ds);
+ fix_realmode_dataseg(&sregs.es);
+ fix_realmode_dataseg(&sregs.fs);
+ fix_realmode_dataseg(&sregs.gs);
+ fix_realmode_dataseg(&sregs.ss);
+ }
+ }
+
+ set_seg(&sregs.tr, &env->tr);
+ set_seg(&sregs.ldt, &env->ldt);
+
+ sregs.idt.limit = env->idt.limit;
+ sregs.idt.base = env->idt.base;
+ sregs.gdt.limit = env->gdt.limit;
+ sregs.gdt.base = env->gdt.base;
+
+ sregs.cr0 = env->cr[0];
+ sregs.cr2 = env->cr[2];
+ sregs.cr3 = env->cr[3];
+ sregs.cr4 = env->cr[4];
+
+ sregs.apic_base = cpu_get_apic_base(env);
+ sregs.efer = env->efer;
+ sregs.cr8 = cpu_get_apic_tpr(env);
+
+ kvm_set_sregs(kvm_context, env->cpu_index, &sregs);
+
+ /* msrs */
+ n = 0;
+ set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
+ set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
+ set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
+ if (kvm_has_msr_star)
+ set_msr_entry(&msrs[n++], MSR_STAR, env->star);
+ set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+#ifdef TARGET_X86_64
+ if (lm_capable_kernel) {
+ set_msr_entry(&msrs[n++], MSR_CSTAR, env->cstar);
+ set_msr_entry(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
+ set_msr_entry(&msrs[n++], MSR_FMASK, env->fmask);
+ set_msr_entry(&msrs[n++], MSR_LSTAR, env->lstar);
+ }
+#endif
+
+ rc = kvm_set_msrs(kvm_context, env->cpu_index, msrs, n);
+ if (rc == -1)
+ perror("kvm_set_msrs FAILED");
+}
+
+
+void kvm_arch_save_regs(CPUState *env)
+{
+ struct kvm_regs regs;
+ struct kvm_fpu fpu;
+ struct kvm_sregs sregs;
+ struct kvm_msr_entry msrs[MSR_COUNT];
+ uint32_t hflags;
+ uint32_t i, n, rc;
+
+ kvm_get_regs(kvm_context, env->cpu_index, &regs);
+
+ env->regs[R_EAX] = regs.rax;
+ env->regs[R_EBX] = regs.rbx;
+ env->regs[R_ECX] = regs.rcx;
+ env->regs[R_EDX] = regs.rdx;
+ env->regs[R_ESI] = regs.rsi;
+ env->regs[R_EDI] = regs.rdi;
+ env->regs[R_ESP] = regs.rsp;
+ env->regs[R_EBP] = regs.rbp;
+#ifdef TARGET_X86_64
+ env->regs[8] = regs.r8;
+ env->regs[9] = regs.r9;
+ env->regs[10] = regs.r10;
+ env->regs[11] = regs.r11;
+ env->regs[12] = regs.r12;
+ env->regs[13] = regs.r13;
+ env->regs[14] = regs.r14;
+ env->regs[15] = regs.r15;
+#endif
+
+ env->eflags = regs.rflags;
+ env->eip = regs.rip;
+
+ kvm_get_fpu(kvm_context, env->cpu_index, &fpu);
+ env->fpstt = (fpu.fsw >> 11) & 7;
+ env->fpus = fpu.fsw;
+ env->fpuc = fpu.fcw;
+ for (i = 0; i < 8; ++i)
+ env->fptags[i] = !((fpu.ftwx >> i) & 1);
+ memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
+ memcpy(env->xmm_regs, fpu.xmm, sizeof env->xmm_regs);
+ env->mxcsr = fpu.mxcsr;
+
+ kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+
+ memcpy(env->kvm_interrupt_bitmap, sregs.interrupt_bitmap, sizeof(env->kvm_interrupt_bitmap));
+
+ get_seg(&env->segs[R_CS], &sregs.cs);
+ get_seg(&env->segs[R_DS], &sregs.ds);
+ get_seg(&env->segs[R_ES], &sregs.es);
+ get_seg(&env->segs[R_FS], &sregs.fs);
+ get_seg(&env->segs[R_GS], &sregs.gs);
+ get_seg(&env->segs[R_SS], &sregs.ss);
+
+ get_seg(&env->tr, &sregs.tr);
+ get_seg(&env->ldt, &sregs.ldt);
+
+ env->idt.limit = sregs.idt.limit;
+ env->idt.base = sregs.idt.base;
+ env->gdt.limit = sregs.gdt.limit;
+ env->gdt.base = sregs.gdt.base;
+
+ env->cr[0] = sregs.cr0;
+ env->cr[2] = sregs.cr2;
+ env->cr[3] = sregs.cr3;
+ env->cr[4] = sregs.cr4;
+
+ cpu_set_apic_base(env, sregs.apic_base);
+
+ env->efer = sregs.efer;
+ //cpu_set_apic_tpr(env, sregs.cr8);
+
+#define HFLAG_COPY_MASK ~( \
+ HF_CPL_MASK | HF_PE_MASK | HF_MP_MASK | HF_EM_MASK | \
+ HF_TS_MASK | HF_TF_MASK | HF_VM_MASK | HF_IOPL_MASK | \
+ HF_OSFXSR_MASK | HF_LMA_MASK | HF_CS32_MASK | \
+ HF_SS32_MASK | HF_CS64_MASK | HF_ADDSEG_MASK)
+
+
+
+ hflags = (env->segs[R_CS].flags >> DESC_DPL_SHIFT) & HF_CPL_MASK;
+ hflags |= (env->cr[0] & CR0_PE_MASK) << (HF_PE_SHIFT - CR0_PE_SHIFT);
+ hflags |= (env->cr[0] << (HF_MP_SHIFT - CR0_MP_SHIFT)) &
+ (HF_MP_MASK | HF_EM_MASK | HF_TS_MASK);
+ hflags |= (env->eflags & (HF_TF_MASK | HF_VM_MASK | HF_IOPL_MASK));
+ hflags |= (env->cr[4] & CR4_OSFXSR_MASK) <<
+ (HF_OSFXSR_SHIFT - CR4_OSFXSR_SHIFT);
+
+ if (env->efer & MSR_EFER_LMA) {
+ hflags |= HF_LMA_MASK;
+ }
+
+ if ((hflags & HF_LMA_MASK) && (env->segs[R_CS].flags & DESC_L_MASK)) {
+ hflags |= HF_CS32_MASK | HF_SS32_MASK | HF_CS64_MASK;
+ } else {
+ hflags |= (env->segs[R_CS].flags & DESC_B_MASK) >>
+ (DESC_B_SHIFT - HF_CS32_SHIFT);
+ hflags |= (env->segs[R_SS].flags & DESC_B_MASK) >>
+ (DESC_B_SHIFT - HF_SS32_SHIFT);
+ if (!(env->cr[0] & CR0_PE_MASK) ||
+ (env->eflags & VM_MASK) ||
+ !(hflags & HF_CS32_MASK)) {
+ hflags |= HF_ADDSEG_MASK;
+ } else {
+ hflags |= ((env->segs[R_DS].base |
+ env->segs[R_ES].base |
+ env->segs[R_SS].base) != 0) <<
+ HF_ADDSEG_SHIFT;
+ }
+ }
+ env->hflags = (env->hflags & HFLAG_COPY_MASK) | hflags;
+ env->cc_src = env->eflags & (CC_O | CC_S | CC_Z | CC_A | CC_P | CC_C);
+ env->df = 1 - (2 * ((env->eflags >> 10) & 1));
+ env->cc_op = CC_OP_EFLAGS;
+ env->eflags &= ~(DF_MASK | CC_O | CC_S | CC_Z | CC_A | CC_P | CC_C);
+
+ /* msrs */
+ n = 0;
+ msrs[n++].index = MSR_IA32_SYSENTER_CS;
+ msrs[n++].index = MSR_IA32_SYSENTER_ESP;
+ msrs[n++].index = MSR_IA32_SYSENTER_EIP;
+ if (kvm_has_msr_star)
+ msrs[n++].index = MSR_STAR;
+ msrs[n++].index = MSR_IA32_TSC;
+#ifdef TARGET_X86_64
+ if (lm_capable_kernel) {
+ msrs[n++].index = MSR_CSTAR;
+ msrs[n++].index = MSR_KERNELGSBASE;
+ msrs[n++].index = MSR_FMASK;
+ msrs[n++].index = MSR_LSTAR;
+ }
+#endif
+ rc = kvm_get_msrs(kvm_context, env->cpu_index, msrs, n);
+ if (rc == -1) {
+ perror("kvm_get_msrs FAILED");
+ }
+ else {
+ n = rc; /* actual number of MSRs */
+ for (i=0 ; i<n; i++) {
+ if (get_msr_entry(&msrs[i], env))
+ return;
+ }
+ }
+}
+
+static void host_cpuid(uint32_t function, uint32_t *eax, uint32_t *ebx,
+ uint32_t *ecx, uint32_t *edx)
+{
+ uint32_t vec[4];
+
+ vec[0] = function;
+ asm volatile (
+#ifdef __x86_64__
+ "sub $128, %%rsp \n\t" /* skip red zone */
+ "push %0; push %%rsi \n\t"
+ "push %%rax; push %%rbx; push %%rcx; push %%rdx \n\t"
+ "mov 8*5(%%rsp), %%rsi \n\t"
+ "mov (%%rsi), %%eax \n\t"
+ "cpuid \n\t"
+ "mov %%eax, (%%rsi) \n\t"
+ "mov %%ebx, 4(%%rsi) \n\t"
+ "mov %%ecx, 8(%%rsi) \n\t"
+ "mov %%edx, 12(%%rsi) \n\t"
+ "pop %%rdx; pop %%rcx; pop %%rbx; pop %%rax \n\t"
+ "pop %%rsi; pop %0 \n\t"
+ "add $128, %%rsp"
+#else
+ "push %0; push %%esi \n\t"
+ "push %%eax; push %%ebx; push %%ecx; push %%edx \n\t"
+ "mov 4*5(%%esp), %%esi \n\t"
+ "mov (%%esi), %%eax \n\t"
+ "cpuid \n\t"
+ "mov %%eax, (%%esi) \n\t"
+ "mov %%ebx, 4(%%esi) \n\t"
+ "mov %%ecx, 8(%%esi) \n\t"
+ "mov %%edx, 12(%%esi) \n\t"
+ "pop %%edx; pop %%ecx; pop %%ebx; pop %%eax \n\t"
+ "pop %%esi; pop %0 \n\t"
+#endif
+ : : "rm"(vec) : "memory");
+ if (eax)
+ *eax = vec[0];
+ if (ebx)
+ *ebx = vec[1];
+ if (ecx)
+ *ecx = vec[2];
+ if (edx)
+ *edx = vec[3];
+}
+
+
+static void do_cpuid_ent(struct kvm_cpuid_entry *e, uint32_t function,
+ CPUState *env)
+{
+ env->regs[R_EAX] = function;
+ qemu_kvm_cpuid_on_env(env);
+ e->function = function;
+ e->eax = env->regs[R_EAX];
+ e->ebx = env->regs[R_EBX];
+ e->ecx = env->regs[R_ECX];
+ e->edx = env->regs[R_EDX];
+ if (function == 0x80000001) {
+ uint32_t h_eax, h_edx;
+ struct utsname utsname;
+
+ host_cpuid(function, &h_eax, NULL, NULL, &h_edx);
+ uname(&utsname);
+ lm_capable_kernel = strcmp(utsname.machine, "x86_64") == 0;
+
+ // long mode
+ if ((h_edx & 0x20000000) == 0 || !lm_capable_kernel)
+ e->edx &= ~0x20000000u;
+ // syscall
+ if ((h_edx & 0x00000800) == 0)
+ e->edx &= ~0x00000800u;
+ // nx
+ if ((h_edx & 0x00100000) == 0)
+ e->edx &= ~0x00100000u;
+ // svm
+ if (e->ecx & 4)
+ e->ecx &= ~4u;
+ }
+ // sysenter isn't supported in compatibility mode on AMD, and syscall
+ // isn't supported in compatibility mode on Intel, so advertise the
+ // actual CPU vendor, and say goodbye to migration between different
+ // vendors if you use compatibility mode.
+ if (function == 0) {
+ uint32_t bcd[3];
+
+ host_cpuid(0, NULL, &bcd[0], &bcd[1], &bcd[2]);
+ e->ebx = bcd[0];
+ e->ecx = bcd[1];
+ e->edx = bcd[2];
+ }
+}
+
+int kvm_arch_qemu_init_env(CPUState *cenv)
+{
+ struct kvm_cpuid_entry cpuid_ent[100];
+#ifdef KVM_CPUID_SIGNATURE
+ struct kvm_cpuid_entry *pv_ent;
+ uint32_t signature[3];
+#endif
+ int cpuid_nent = 0;
+ CPUState copy;
+ uint32_t i, limit;
+
+ copy = *cenv;
+
+#ifdef KVM_CPUID_SIGNATURE
+ /* Paravirtualization CPUIDs */
+ memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+ pv_ent = &cpuid_ent[cpuid_nent++];
+ memset(pv_ent, 0, sizeof(*pv_ent));
+ pv_ent->function = KVM_CPUID_SIGNATURE;
+ pv_ent->eax = 0;
+ pv_ent->ebx = signature[0];
+ pv_ent->ecx = signature[1];
+ pv_ent->edx = signature[2];
+
+ pv_ent = &cpuid_ent[cpuid_nent++];
+ memset(pv_ent, 0, sizeof(*pv_ent));
+ pv_ent->function = KVM_CPUID_FEATURES;
+ pv_ent->eax = 0;
+#endif
+
+ copy.regs[R_EAX] = 0;
+ qemu_kvm_cpuid_on_env(&copy);
+ limit = copy.regs[R_EAX];
+
+ for (i = 0; i <= limit; ++i)
+ do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, &copy);
+
+ copy.regs[R_EAX] = 0x80000000;
+ qemu_kvm_cpuid_on_env(&copy);
+ limit = copy.regs[R_EAX];
+
+ for (i = 0x80000000; i <= limit; ++i)
+ do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, &copy);
+
+ kvm_setup_cpuid(kvm_context, cenv->cpu_index, cpuid_nent, cpuid_ent);
+ return 0;
+}
+
+int kvm_arch_halt(void *opaque, int vcpu)
+{
+ CPUState *env = cpu_single_env;
+
+ if (!((env->interrupt_request & CPU_INTERRUPT_HARD) &&
+ (env->eflags & IF_MASK))) {
+ env->hflags |= HF_HALTED_MASK;
+ env->exception_index = EXCP_HLT;
+ }
+ return 1;
+}
+
+void kvm_arch_pre_kvm_run(void *opaque, int vcpu)
+{
+ CPUState *env = cpu_single_env;
+
+ if (!kvm_irqchip_in_kernel(kvm_context))
+ kvm_set_cr8(kvm_context, vcpu, cpu_get_apic_tpr(env));
+}
+
+void kvm_arch_post_kvm_run(void *opaque, int vcpu)
+{
+ CPUState *env = qemu_kvm_cpu_env(vcpu);
+ cpu_single_env = env;
+
+ env->eflags = kvm_get_interrupt_flag(kvm_context, vcpu)
+ ? env->eflags | IF_MASK : env->eflags & ~IF_MASK;
+ env->ready_for_interrupt_injection
+ = kvm_is_ready_for_interrupt_injection(kvm_context, vcpu);
+
+ cpu_set_apic_tpr(env, kvm_get_cr8(kvm_context, vcpu));
+ cpu_set_apic_base(env, kvm_get_apic_base(kvm_context, vcpu));
+}
+
+int kvm_arch_has_work(CPUState *env)
+{
+ if ((env->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_EXIT)) &&
+ (env->eflags & IF_MASK))
+ return 1;
+ return 0;
+}
+
+int kvm_arch_try_push_interrupts(void *opaque)
+{
+ CPUState *env = cpu_single_env;
+ int r, irq;
+
+ if (env->ready_for_interrupt_injection &&
+ (env->interrupt_request & CPU_INTERRUPT_HARD) &&
+ (env->eflags & IF_MASK)) {
+ env->interrupt_request &= ~CPU_INTERRUPT_HARD;
+ irq = cpu_get_pic_interrupt(env);
+ if (irq >= 0) {
+ r = kvm_inject_irq(kvm_context, env->cpu_index, irq);
+ if (r < 0)
+ printf("cpu %d: failed to inject irq %x\n", env->cpu_index, irq);
+ }
+ }
+
+ return (env->interrupt_request & CPU_INTERRUPT_HARD) != 0;
+}
+
+void kvm_arch_update_regs_for_sipi(CPUState *env)
+{
+ SegmentCache cs = env->segs[R_CS];
+
+ kvm_arch_save_regs(env);
+ env->segs[R_CS] = cs;
+ env->eip = 0;
+ kvm_arch_load_regs(env);
+}
+
+int handle_tpr_access(void *opaque, int vcpu,
+ uint64_t rip, int is_write)
+{
+ kvm_tpr_access_report(cpu_single_env, rip, is_write);
+ return 0;
+}
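kvm_arch_qemu_init_env() reports the paravirtual signature by splitting a 12-byte string across EBX, ECX and EDX, so the buffer it copies from must really hold 12 bytes. The literal "KVMKVMKVM" occupies only 10 bytes including its terminator; a 12-byte copy needs explicit trailing NULs, as in "KVMKVMKVM\0\0\0". A minimal sketch of the packing and of how a guest would reassemble the string; pack_signature() and unpack_signature() are hypothetical names, and the byte order is written out explicitly (little-endian, as on x86) rather than relying on memcpy into host integers.

```c
#include <stdint.h>

/* Hypothetical helper: split a 12-byte signature across the three CPUID
 * output registers in the order EBX, ECX, EDX. Each register takes four
 * bytes in little-endian order, which is the layout an x86 guest recovers
 * when it stores EBX, ECX and EDX back to memory one after another. */
static void pack_signature(const char sig[12], uint32_t regs[3])
{
    const unsigned char *p = (const unsigned char *)sig;

    for (int r = 0; r < 3; ++r, p += 4)
        regs[r] = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                  (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: the guest-side inverse of pack_signature(). out must
 * hold 13 bytes so that the result is always NUL-terminated. */
static void unpack_signature(const uint32_t regs[3], char out[13])
{
    for (int r = 0; r < 3; ++r)
        for (int b = 0; b < 4; ++b)
            out[4 * r + b] = (char)(regs[r] >> (8 * b));
    out[12] = '\0';
}
```

With the padded literal, EDX carries just "M" followed by three NUL bytes, so the guest sees a properly terminated "KVMKVMKVM".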
Index: qemu/qemu-kvm.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ qemu/qemu-kvm.c 2008-01-31 15:54:36.000000000 -0600
@@ -0,0 +1,790 @@
+
+#include "config.h"
+#include "config-host.h"
+
+int kvm_allowed = 1;
+int kvm_irqchip = 1;
+
+#include <string.h>
+#include "hw/hw.h"
+#include "sysemu.h"
+
+#include "qemu-kvm.h"
+#include <libkvm.h>
+#include <pthread.h>
+#include <sys/utsname.h>
+
+extern void perror(const char *s);
+
+kvm_context_t kvm_context;
+
+extern int smp_cpus;
+
+pthread_mutex_t qemu_mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t qemu_aio_cond = PTHREAD_COND_INITIALIZER;
+__thread struct vcpu_info *vcpu;
+
+struct qemu_kvm_signal_table {
+ sigset_t sigset;
+ sigset_t negsigset;
+};
+
+static struct qemu_kvm_signal_table io_signal_table;
+
+#define SIG_IPI (SIGRTMIN+4)
+
+struct vcpu_info {
+ CPUState *env;
+ int sipi_needed;
+ int init;
+ pthread_t thread;
+ int signalled;
+ int stop;
+ int stopped;
+} vcpu_info[4];
+
+CPUState *qemu_kvm_cpu_env(int index)
+{
+ return vcpu_info[index].env;
+}
+
+static void sig_ipi_handler(int n)
+{
+}
+
+void kvm_update_interrupt_request(CPUState *env)
+{
+ if (env && vcpu && env != vcpu->env) {
+ if (vcpu_info[env->cpu_index].signalled)
+ return;
+ vcpu_info[env->cpu_index].signalled = 1;
+ if (vcpu_info[env->cpu_index].thread)
+ pthread_kill(vcpu_info[env->cpu_index].thread, SIG_IPI);
+ }
+}
+
+void kvm_update_after_sipi(CPUState *env)
+{
+ vcpu_info[env->cpu_index].sipi_needed = 1;
+ kvm_update_interrupt_request(env);
+}
+
+void kvm_apic_init(CPUState *env)
+{
+ if (env->cpu_index != 0)
+ vcpu_info[env->cpu_index].init = 1;
+ kvm_update_interrupt_request(env);
+}
+
+#include <signal.h>
+
+static int try_push_interrupts(void *opaque)
+{
+ return kvm_arch_try_push_interrupts(opaque);
+}
+
+static void post_kvm_run(void *opaque, int vcpu)
+{
+
+ pthread_mutex_lock(&qemu_mutex);
+ kvm_arch_post_kvm_run(opaque, vcpu);
+}
+
+static int pre_kvm_run(void *opaque, int vcpu)
+{
+ CPUState *env = cpu_single_env;
+
+ kvm_arch_pre_kvm_run(opaque, vcpu);
+
+ if (env->interrupt_request & CPU_INTERRUPT_EXIT)
+ return 1;
+ pthread_mutex_unlock(&qemu_mutex);
+ return 0;
+}
+
+void kvm_load_registers(CPUState *env)
+{
+ if (kvm_enabled())
+ kvm_arch_load_regs(env);
+}
+
+void kvm_save_registers(CPUState *env)
+{
+ if (kvm_enabled())
+ kvm_arch_save_regs(env);
+}
+
+int kvm_cpu_exec(CPUState *env)
+{
+ int r;
+
+ r = kvm_run(kvm_context, env->cpu_index);
+ if (r < 0) {
+ printf("kvm_run returned %d\n", r);
+ exit(1);
+ }
+
+ return 0;
+}
+
+extern int vm_running;
+
+static int has_work(CPUState *env)
+{
+ if (!vm_running)
+ return 0;
+ if (!(env->hflags & HF_HALTED_MASK))
+ return 1;
+ return kvm_arch_has_work(env);
+}
+
+static int kvm_eat_signal(CPUState *env, int timeout)
+{
+ struct timespec ts;
+ int r, e, ret = 0;
+ siginfo_t siginfo;
+ struct sigaction sa;
+
+ ts.tv_sec = timeout / 1000;
+ ts.tv_nsec = (timeout % 1000) * 1000000;
+ r = sigtimedwait(&io_signal_table.sigset, &siginfo, &ts);
+ if (r == -1 && (errno == EAGAIN || errno == EINTR) && !timeout)
+ return 0;
+ e = errno;
+ pthread_mutex_lock(&qemu_mutex);
+ if (vcpu)
+ cpu_single_env = vcpu->env;
+ if (r == -1 && !(e == EAGAIN || e == EINTR)) {
+ printf("sigtimedwait: %s\n", strerror(e));
+ exit(1);
+ }
+ if (r != -1) {
+ sigaction(siginfo.si_signo, NULL, &sa);
+ sa.sa_handler(siginfo.si_signo);
+ if (siginfo.si_signo == SIGUSR2)
+ pthread_cond_signal(&qemu_aio_cond);
+ ret = 1;
+ }
+ pthread_mutex_unlock(&qemu_mutex);
+
+ return ret;
+}
+
+
+static void kvm_eat_signals(CPUState *env, int timeout)
+{
+ int r = 0;
+
+ while (kvm_eat_signal(env, 0))
+ r = 1;
+ if (!r && timeout) {
+ r = kvm_eat_signal(env, timeout);
+ if (r)
+ while (kvm_eat_signal(env, 0))
+ ;
+ }
+ /*
+ * we call select() even if no signal was received, to account for
+ * file descriptors for which no signal handler is installed.
+ */
+ pthread_mutex_lock(&qemu_mutex);
+ cpu_single_env = vcpu->env;
+ main_loop_wait(0);
+ pthread_mutex_unlock(&qemu_mutex);
+}
+
+static void kvm_main_loop_wait(CPUState *env, int timeout)
+{
+ pthread_mutex_unlock(&qemu_mutex);
+ if (env->cpu_index == 0)
+ kvm_eat_signals(env, timeout);
+ else {
+ if (!kvm_irqchip_in_kernel(kvm_context) &&
+ (timeout || vcpu_info[env->cpu_index].stopped)) {
+ sigset_t set;
+ int n;
+
+ paused:
+ sigemptyset(&set);
+ sigaddset(&set, SIG_IPI);
+ sigwait(&set, &n);
+ } else {
+ struct timespec ts;
+ siginfo_t siginfo;
+ sigset_t set;
+
+ ts.tv_sec = 0;
+ ts.tv_nsec = 0;
+ sigemptyset(&set);
+ sigaddset(&set, SIG_IPI);
+ sigtimedwait(&set, &siginfo, &ts);
+ }
+ if (vcpu_info[env->cpu_index].stop) {
+ vcpu_info[env->cpu_index].stop = 0;
+ vcpu_info[env->cpu_index].stopped = 1;
+ pthread_kill(vcpu_info[0].thread, SIG_IPI);
+ goto paused;
+ }
+ }
+ pthread_mutex_lock(&qemu_mutex);
+ cpu_single_env = env;
+ vcpu_info[env->cpu_index].signalled = 0;
+}
+
+static int all_threads_paused(void)
+{
+ int i;
+
+ for (i = 1; i < smp_cpus; ++i)
+ if (!vcpu_info[i].stopped)
+ return 0;
+ return 1;
+}
+
+static void pause_other_threads(void)
+{
+ int i;
+
+ for (i = 1; i < smp_cpus; ++i) {
+ vcpu_info[i].stop = 1;
+ pthread_kill(vcpu_info[i].thread, SIG_IPI);
+ }
+ while (!all_threads_paused())
+ kvm_eat_signals(vcpu->env, 0);
+}
+
+static void resume_other_threads(void)
+{
+ int i;
+
+ for (i = 1; i < smp_cpus; ++i) {
+ vcpu_info[i].stop = 0;
+ vcpu_info[i].stopped = 0;
+ pthread_kill(vcpu_info[i].thread, SIG_IPI);
+ }
+}
+
+static void kvm_vm_state_change_handler(void *context, int running)
+{
+ if (running)
+ resume_other_threads();
+ else
+ pause_other_threads();
+}
+
+static void update_regs_for_sipi(CPUState *env)
+{
+ kvm_arch_update_regs_for_sipi(env);
+ vcpu_info[env->cpu_index].sipi_needed = 0;
+ vcpu_info[env->cpu_index].init = 0;
+}
+
+static void update_regs_for_init(CPUState *env)
+{
+ cpu_reset(env);
+ kvm_arch_load_regs(env);
+}
+
+static void setup_kernel_sigmask(CPUState *env)
+{
+ sigset_t set;
+
+ sigprocmask(SIG_BLOCK, NULL, &set);
+ sigdelset(&set, SIG_IPI);
+ if (env->cpu_index == 0)
+ sigandset(&set, &set, &io_signal_table.negsigset);
+
+ kvm_set_signal_mask(kvm_context, env->cpu_index, &set);
+}
+
+static int kvm_main_loop_cpu(CPUState *env)
+{
+ struct vcpu_info *info = &vcpu_info[env->cpu_index];
+
+ setup_kernel_sigmask(env);
+ pthread_mutex_lock(&qemu_mutex);
+
+ kvm_qemu_init_env(env);
+ env->ready_for_interrupt_injection = 1;
+
+ cpu_single_env = env;
+#ifdef TARGET_I386
+ kvm_tpr_opt_setup(env);
+#endif
+ while (1) {
+ while (!has_work(env))
+ kvm_main_loop_wait(env, 10);
+ if (env->interrupt_request & CPU_INTERRUPT_HARD)
+ env->hflags &= ~HF_HALTED_MASK;
+ if (!kvm_irqchip_in_kernel(kvm_context) && info->sipi_needed)
+ update_regs_for_sipi(env);
+ if (!kvm_irqchip_in_kernel(kvm_context) && info->init)
+ update_regs_for_init(env);
+ if (!(env->hflags & HF_HALTED_MASK) && !info->init)
+ kvm_cpu_exec(env);
+ env->interrupt_request &= ~CPU_INTERRUPT_EXIT;
+ kvm_main_loop_wait(env, 0);
+ if (qemu_shutdown_requested())
+ break;
+ else if (qemu_powerdown_requested())
+ qemu_system_powerdown();
+ else if (qemu_reset_requested()) {
+ env->interrupt_request = 0;
+ qemu_system_reset();
+ kvm_arch_load_regs(env);
+ }
+ }
+ pthread_mutex_unlock(&qemu_mutex);
+ return 0;
+}
+
+static void *ap_main_loop(void *_env)
+{
+ CPUState *env = _env;
+ sigset_t signals;
+
+ vcpu = &vcpu_info[env->cpu_index];
+ vcpu->env = env;
+ sigfillset(&signals);
+ //sigdelset(&signals, SIG_IPI);
+ sigprocmask(SIG_BLOCK, &signals, NULL);
+ kvm_create_vcpu(kvm_context, env->cpu_index);
+ kvm_qemu_init_env(env);
+ if (kvm_irqchip_in_kernel(kvm_context))
+ env->hflags &= ~HF_HALTED_MASK;
+ kvm_main_loop_cpu(env);
+ return NULL;
+}
+
+static void qemu_kvm_init_signal_table(struct qemu_kvm_signal_table *sigtab)
+{
+ sigemptyset(&sigtab->sigset);
+ sigfillset(&sigtab->negsigset);
+}
+
+static void kvm_add_signal(struct qemu_kvm_signal_table *sigtab, int signum)
+{
+ sigaddset(&sigtab->sigset, signum);
+ sigdelset(&sigtab->negsigset, signum);
+}
+
+int kvm_init_ap(void)
+{
+ CPUState *env = first_cpu->next_cpu;
+ int i;
+
+ qemu_add_vm_change_state_handler(kvm_vm_state_change_handler, NULL);
+ qemu_kvm_init_signal_table(&io_signal_table);
+ kvm_add_signal(&io_signal_table, SIGIO);
+ kvm_add_signal(&io_signal_table, SIGALRM);
+ kvm_add_signal(&io_signal_table, SIGUSR2);
+ kvm_add_signal(&io_signal_table, SIG_IPI);
+ sigprocmask(SIG_BLOCK, &io_signal_table.sigset, NULL);
+
+ vcpu = &vcpu_info[0];
+ vcpu->env = first_cpu;
+ signal(SIG_IPI, sig_ipi_handler);
+ for (i = 1; i < smp_cpus; ++i) {
+ pthread_create(&vcpu_info[i].thread, NULL, ap_main_loop, env);
+ env = env->next_cpu;
+ }
+ return 0;
+}
+
+int kvm_main_loop(void)
+{
+ vcpu_info[0].thread = pthread_self();
+ pthread_mutex_unlock(&qemu_mutex);
+ return kvm_main_loop_cpu(first_cpu);
+}
+
+static int kvm_debug(void *opaque, int vcpu)
+{
+ CPUState *env = cpu_single_env;
+
+ env->exception_index = EXCP_DEBUG;
+ return 1;
+}
+
+static int kvm_inb(void *opaque, uint16_t addr, uint8_t *data)
+{
+ *data = cpu_inb(0, addr);
+ return 0;
+}
+
+static int kvm_inw(void *opaque, uint16_t addr, uint16_t *data)
+{
+ *data = cpu_inw(0, addr);
+ return 0;
+}
+
+static int kvm_inl(void *opaque, uint16_t addr, uint32_t *data)
+{
+ *data = cpu_inl(0, addr);
+ return 0;
+}
+
+#define PM_IO_BASE 0xb000
+
+static int kvm_outb(void *opaque, uint16_t addr, uint8_t data)
+{
+ if (addr == 0xb2) {
+ switch (data) {
+ case 0: {
+ cpu_outb(0, 0xb3, 0);
+ break;
+ }
+ case 0xf0: {
+ unsigned x;
+
+ /* disable acpi */
+ x = cpu_inw(0, PM_IO_BASE + 4);
+ x &= ~1;
+ cpu_outw(0, PM_IO_BASE + 4, x);
+ break;
+ }
+ case 0xf1: {
+ unsigned x;
+
+ /* enable acpi */
+ x = cpu_inw(0, PM_IO_BASE + 4);
+ x |= 1;
+ cpu_outw(0, PM_IO_BASE + 4, x);
+ break;
+ }
+ default:
+ break;
+ }
+ return 0;
+ }
+ cpu_outb(0, addr, data);
+ return 0;
+}
+
+static int kvm_outw(void *opaque, uint16_t addr, uint16_t data)
+{
+ cpu_outw(0, addr, data);
+ return 0;
+}
+
+static int kvm_outl(void *opaque, uint16_t addr, uint32_t data)
+{
+ cpu_outl(0, addr, data);
+ return 0;
+}
+
+static int kvm_mmio_read(void *opaque, uint64_t addr, uint8_t *data, int len)
+{
+ cpu_physical_memory_rw(addr, data, len, 0);
+ return 0;
+}
+
+static int kvm_mmio_write(void *opaque, uint64_t addr, uint8_t *data, int len)
+{
+ cpu_physical_memory_rw(addr, data, len, 1);
+ return 0;
+}
+
+static int kvm_io_window(void *opaque)
+{
+ return 1;
+}
+
+
+static int kvm_halt(void *opaque, int vcpu)
+{
+ return kvm_arch_halt(opaque, vcpu);
+}
+
+static int kvm_shutdown(void *opaque, int vcpu)
+{
+ qemu_system_reset_request();
+ return 1;
+}
+
+static struct kvm_callbacks qemu_kvm_ops = {
+ .debug = kvm_debug,
+ .inb = kvm_inb,
+ .inw = kvm_inw,
+ .inl = kvm_inl,
+ .outb = kvm_outb,
+ .outw = kvm_outw,
+ .outl = kvm_outl,
+ .mmio_read = kvm_mmio_read,
+ .mmio_write = kvm_mmio_write,
+ .halt = kvm_halt,
+ .shutdown = kvm_shutdown,
+ .io_window = kvm_io_window,
+ .try_push_interrupts = try_push_interrupts,
+ .post_kvm_run = post_kvm_run,
+ .pre_kvm_run = pre_kvm_run,
+#ifdef TARGET_I386
+ .tpr_access = handle_tpr_access,
+#endif
+#ifdef TARGET_PPC
+ .powerpc_dcr_read = handle_powerpc_dcr_read,
+ .powerpc_dcr_write = handle_powerpc_dcr_write,
+#endif
+};
+
+int kvm_qemu_init()
+{
+ /* Try to initialize kvm */
+ kvm_context = kvm_init(&qemu_kvm_ops, cpu_single_env);
+ if (!kvm_context) {
+ return -1;
+ }
+ pthread_mutex_lock(&qemu_mutex);
+
+ return 0;
+}
+
+int kvm_qemu_create_context(void)
+{
+ int r;
+ if (!kvm_irqchip) {
+ kvm_disable_irqchip_creation(kvm_context);
+ }
+ if (kvm_create(kvm_context, phys_ram_size, (void**)&phys_ram_base) < 0) {
+ kvm_qemu_destroy();
+ return -1;
+ }
+ r = kvm_arch_qemu_create_context();
+ if (r < 0)
+ kvm_qemu_destroy();
+ return r < 0 ? -1 : 0;
+}
+
+void kvm_qemu_destroy(void)
+{
+ kvm_finalize(kvm_context);
+}
+
+void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
+ unsigned long size,
+ unsigned long phys_offset)
+{
+#ifdef KVM_CAP_USER_MEMORY
+ int r = 0;
+
+ r = kvm_check_extension(kvm_context, KVM_CAP_USER_MEMORY);
+ if (r) {
+ if (!(phys_offset & ~TARGET_PAGE_MASK)) {
+ r = kvm_is_allocated_mem(kvm_context, start_addr, size);
+ if (r)
+ return;
+ r = kvm_is_intersecting_mem(kvm_context, start_addr);
+ if (r)
+ kvm_create_mem_hole(kvm_context, start_addr, size);
+ r = kvm_register_userspace_phys_mem(kvm_context, start_addr,
+ phys_ram_base + phys_offset,
+ size, 0);
+ }
+ if (phys_offset & IO_MEM_ROM) {
+ phys_offset &= ~IO_MEM_ROM;
+ r = kvm_is_intersecting_mem(kvm_context, start_addr);
+ if (r)
+ kvm_create_mem_hole(kvm_context, start_addr, size);
+ r = kvm_register_userspace_phys_mem(kvm_context, start_addr,
+ phys_ram_base + phys_offset,
+ size, 0);
+ }
+ if (r < 0) {
+ printf("kvm_cpu_register_physical_memory: failed\n");
+ exit(1);
+ }
+ return;
+ }
+#endif
+ if (phys_offset & IO_MEM_ROM) {
+ phys_offset &= ~IO_MEM_ROM;
+ memcpy(phys_ram_base + start_addr, phys_ram_base + phys_offset, size);
+ }
+}
+
+int kvm_qemu_check_extension(int ext)
+{
+ return kvm_check_extension(kvm_context, ext);
+}
+
+int kvm_qemu_init_env(CPUState *cenv)
+{
+ return kvm_arch_qemu_init_env(cenv);
+}
+
+int kvm_update_debugger(CPUState *env)
+{
+ struct kvm_debug_guest dbg;
+ int i;
+
+ dbg.enabled = 0;
+ if (env->nb_breakpoints || env->singlestep_enabled) {
+ dbg.enabled = 1;
+ for (i = 0; i < 4 && i < env->nb_breakpoints; ++i) {
+ dbg.breakpoints[i].enabled = 1;
+ dbg.breakpoints[i].address = env->breakpoints[i];
+ }
+ dbg.singlestep = env->singlestep_enabled;
+ }
+ return kvm_guest_debug(kvm_context, env->cpu_index, &dbg);
+}
+
+
+/*
+ * dirty pages logging
+ */
+/* FIXME: use unsigned long pointer instead of unsigned char */
+unsigned char *kvm_dirty_bitmap = NULL;
+int kvm_physical_memory_set_dirty_tracking(int enable)
+{
+ int r = 0;
+
+ if (!kvm_enabled())
+ return 0;
+
+ if (enable) {
+ if (!kvm_dirty_bitmap) {
+ unsigned bitmap_size = BITMAP_SIZE(phys_ram_size);
+ kvm_dirty_bitmap = qemu_malloc(bitmap_size);
+ if (kvm_dirty_bitmap == NULL) {
+ perror("Failed to allocate dirty pages bitmap");
+ r=-1;
+ }
+ else {
+ r = kvm_dirty_pages_log_enable_all(kvm_context);
+ }
+ }
+ }
+ else {
+ if (kvm_dirty_bitmap) {
+ r = kvm_dirty_pages_log_reset(kvm_context);
+ qemu_free(kvm_dirty_bitmap);
+ kvm_dirty_bitmap = NULL;
+ }
+ }
+ return r;
+}
+
+/* get kvm's dirty pages bitmap and update qemu's */
+int kvm_get_dirty_pages_log_range(unsigned long start_addr,
+ unsigned char *bitmap,
+ unsigned int offset,
+ unsigned long mem_size)
+{
+ unsigned int i, j, n=0;
+ unsigned char c;
+ unsigned page_number, addr, addr1;
+ unsigned int len = ((mem_size/TARGET_PAGE_SIZE) + 7) / 8;
+
+ /*
+ * bitmap-traveling is faster than memory-traveling (for addr...)
+ * especially when most of the memory is not dirty.
+ */
+ for (i=0; i<len; i++) {
+ c = bitmap[i];
+ while (c>0) {
+ j = ffsl(c) - 1;
+ c &= ~(1u<<j);
+ page_number = i * 8 + j;
+ addr1 = page_number * TARGET_PAGE_SIZE;
+ addr = offset + addr1;
+ cpu_physical_memory_set_dirty(addr);
+ n++;
+ }
+ }
+ return 0;
+}
+int kvm_get_dirty_bitmap_cb(unsigned long start, unsigned long len,
+ void *bitmap, void *opaque)
+{
+ return kvm_get_dirty_pages_log_range(start, bitmap, start, len);
+}
+
+/*
+ * get kvm's dirty pages bitmap and update qemu's
+ * we only care about physical ram, which resides in slots 0 and 3
+ */
+int kvm_update_dirty_pages_log(void)
+{
+ int r = 0;
+
+
+ r = kvm_get_dirty_pages_range(kvm_context, 0, phys_ram_size,
+ kvm_dirty_bitmap, NULL,
+ kvm_get_dirty_bitmap_cb);
+ return r;
+}
+
+int kvm_get_phys_ram_page_bitmap(unsigned char *bitmap)
+{
+ unsigned int bsize = BITMAP_SIZE(phys_ram_size);
+ unsigned int brsize = BITMAP_SIZE(ram_size);
+ unsigned int extra_pages = (phys_ram_size - ram_size) / TARGET_PAGE_SIZE;
+ unsigned int extra_bytes = (extra_pages +7)/8;
+ unsigned int hole_start = BITMAP_SIZE(0xa0000);
+ unsigned int hole_end = BITMAP_SIZE(0xc0000);
+
+ memset(bitmap, 0xFF, brsize + extra_bytes);
+ memset(bitmap + hole_start, 0, hole_end - hole_start);
+ memset(bitmap + brsize + extra_bytes, 0, bsize - brsize - extra_bytes);
+
+ return 0;
+}
+
+#ifdef KVM_CAP_IRQCHIP
+
+int kvm_set_irq(int irq, int level)
+{
+ return kvm_set_irq_level(kvm_context, irq, level);
+}
+
+#endif
+
+void qemu_kvm_aio_wait_start(void)
+{
+}
+
+void qemu_kvm_aio_wait(void)
+{
+ if (!cpu_single_env || cpu_single_env->cpu_index == 0) {
+ pthread_mutex_unlock(&qemu_mutex);
+ kvm_eat_signal(cpu_single_env, 1000);
+ pthread_mutex_lock(&qemu_mutex);
+ } else {
+ pthread_cond_wait(&qemu_aio_cond, &qemu_mutex);
+ }
+}
+
+void qemu_kvm_aio_wait_end(void)
+{
+}
+
+int qemu_kvm_get_dirty_pages(unsigned long phys_addr, void *buf)
+{
+ return kvm_get_dirty_pages(kvm_context, phys_addr, buf);
+}
+
+void *kvm_cpu_create_phys_mem(target_phys_addr_t start_addr,
+ unsigned long size, int log, int writable)
+{
+ return kvm_create_phys_mem(kvm_context, start_addr, size, log, writable);
+}
+
+void kvm_cpu_destroy_phys_mem(target_phys_addr_t start_addr,
+ unsigned long size)
+{
+ kvm_destroy_phys_mem(kvm_context, start_addr, size);
+}
+
+int qemu_kvm_create_memory_alias(uint64_t phys_addr, uint64_t phys_start,
+ uint64_t len, uint64_t target_phys)
+{
+ return kvm_create_memory_alias(kvm_context, phys_addr, phys_start,
+ len, target_phys);
+}
+
+int qemu_kvm_destroy_memory_alias(uint64_t phys_addr)
+{
+ return kvm_destroy_memory_alias(kvm_context, phys_addr);
+}
+
Index: qemu/qemu-kvm.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ qemu/qemu-kvm.h 2008-01-31 15:53:18.000000000 -0600
@@ -0,0 +1,87 @@
+#ifndef QEMU_KVM_H
+#define QEMU_KVM_H
+
+#include "cpu.h"
+
+int kvm_main_loop(void);
+int kvm_qemu_init(void);
+int kvm_qemu_create_context(void);
+int kvm_init_ap(void);
+void kvm_qemu_destroy(void);
+void kvm_load_registers(CPUState *env);
+void kvm_save_registers(CPUState *env);
+int kvm_cpu_exec(CPUState *env);
+int kvm_update_debugger(CPUState *env);
+int kvm_qemu_init_env(CPUState *env);
+int kvm_qemu_check_extension(int ext);
+void kvm_apic_init(CPUState *env);
+int kvm_set_irq(int irq, int level);
+
+int kvm_physical_memory_set_dirty_tracking(int enable);
+int kvm_update_dirty_pages_log(void);
+int kvm_get_phys_ram_page_bitmap(unsigned char *bitmap);
+
+void qemu_kvm_call_with_env(void (*func)(void *), void *data, CPUState *env);
+void qemu_kvm_cpuid_on_env(CPUState *env);
+void kvm_update_after_sipi(CPUState *env);
+void kvm_update_interrupt_request(CPUState *env);
+void kvm_cpu_register_physical_memory(target_phys_addr_t start_addr,
+ unsigned long size,
+ unsigned long phys_offset);
+void *kvm_cpu_create_phys_mem(target_phys_addr_t start_addr,
+ unsigned long size, int log, int writable);
+
+void kvm_cpu_destroy_phys_mem(target_phys_addr_t start_addr,
+ unsigned long size);
+
+int qemu_kvm_create_memory_alias(uint64_t phys_addr, uint64_t phys_start,
+ uint64_t len, uint64_t target_phys);
+int qemu_kvm_destroy_memory_alias(uint64_t phys_addr);
+
+int kvm_arch_qemu_create_context(void);
+
+void kvm_arch_save_regs(CPUState *env);
+void kvm_arch_load_regs(CPUState *env);
+int kvm_arch_qemu_init_env(CPUState *cenv);
+int kvm_arch_halt(void *opaque, int vcpu);
+void kvm_arch_pre_kvm_run(void *opaque, int vcpu);
+void kvm_arch_post_kvm_run(void *opaque, int vcpu);
+int kvm_arch_has_work(CPUState *env);
+int kvm_arch_try_push_interrupts(void *opaque);
+void kvm_arch_update_regs_for_sipi(CPUState *env);
+
+CPUState *qemu_kvm_cpu_env(int index);
+
+void qemu_kvm_aio_wait_start(void);
+void qemu_kvm_aio_wait(void);
+void qemu_kvm_aio_wait_end(void);
+
+void kvm_tpr_opt_setup(CPUState *env);
+void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
+int handle_tpr_access(void *opaque, int vcpu,
+ uint64_t rip, int is_write);
+
+int qemu_kvm_get_dirty_pages(unsigned long phys_addr, void *buf);
+
+#ifdef TARGET_PPC
+int handle_powerpc_dcr_read(int vcpu, uint32_t dcrn, uint32_t *data);
+int handle_powerpc_dcr_write(int vcpu,uint32_t dcrn, uint32_t data);
+#endif
+
+#define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
+#define BITMAP_SIZE(m) (ALIGN(((m)>>TARGET_PAGE_BITS), HOST_LONG_BITS) / 8)
+
+#ifdef USE_KVM
+#include "libkvm.h"
+
+extern int kvm_allowed;
+extern kvm_context_t kvm_context;
+
+#define kvm_enabled() (kvm_allowed)
+#define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
+#else
+#define kvm_enabled() (0)
+#define qemu_kvm_irqchip_in_kernel() (0)
+#endif
+
+#endif
Index: qemu/target-i386/cpu.h
===================================================================
--- qemu.orig/target-i386/cpu.h 2007-11-14 12:08:56.000000000 -0600
+++ qemu/target-i386/cpu.h 2008-01-31 15:41:47.000000000 -0600
@@ -160,14 +160,19 @@
#define HF_MP_MASK (1 << HF_MP_SHIFT)
#define HF_EM_MASK (1 << HF_EM_SHIFT)
#define HF_TS_MASK (1 << HF_TS_SHIFT)
+#define HF_IOPL_MASK (3 << HF_IOPL_SHIFT)
#define HF_LMA_MASK (1 << HF_LMA_SHIFT)
#define HF_CS64_MASK (1 << HF_CS64_SHIFT)
#define HF_OSFXSR_MASK (1 << HF_OSFXSR_SHIFT)
+#define HF_VM_MASK (1 << HF_VM_SHIFT)
#define HF_HALTED_MASK (1 << HF_HALTED_SHIFT)
#define HF_SMM_MASK (1 << HF_SMM_SHIFT)
#define HF_GIF_MASK (1 << HF_GIF_SHIFT)
#define HF_HIF_MASK (1 << HF_HIF_SHIFT)
+#define CR0_PE_SHIFT 0
+#define CR0_MP_SHIFT 1
+
#define CR0_PE_MASK (1 << 0)
#define CR0_MP_MASK (1 << 1)
#define CR0_EM_MASK (1 << 2)
@@ -186,7 +191,8 @@
#define CR4_PAE_MASK (1 << 5)
#define CR4_PGE_MASK (1 << 7)
#define CR4_PCE_MASK (1 << 8)
-#define CR4_OSFXSR_MASK (1 << 9)
+#define CR4_OSFXSR_SHIFT 9
+#define CR4_OSFXSR_MASK (1 << CR4_OSFXSR_SHIFT)
#define CR4_OSXMMEXCPT_MASK (1 << 10)
#define PG_PRESENT_BIT 0
@@ -549,6 +555,8 @@
target_ulong kernelgsbase;
#endif
+ uint64_t tsc; /* time stamp counter */
+ uint8_t ready_for_interrupt_injection;
uint64_t pat;
/* exception/interrupt handling */
@@ -583,6 +591,11 @@
int kqemu_enabled;
int last_io_time;
#endif
+
+#define BITS_PER_LONG (8 * sizeof (uint32_t))
+#define NR_IRQ_WORDS (256/ BITS_PER_LONG)
+ uint32_t kvm_interrupt_bitmap[NR_IRQ_WORDS];
+
/* in order to simplify APIC support, we leave this pointer to the
user */
struct APICState *apic_state;
Index: qemu/vl.c
===================================================================
--- qemu.orig/vl.c 2008-01-31 15:41:46.000000000 -0600
+++ qemu/vl.c 2008-01-31 15:41:47.000000000 -0600
@@ -37,6 +37,7 @@
#include "qemu-char.h"
#include "block.h"
#include "audio/audio.h"
+#include "qemu-kvm.h"
#include <unistd.h>
#include <fcntl.h>
@@ -225,6 +226,7 @@
int nb_option_roms;
int semihosting_enabled = 0;
int autostart = 1;
+unsigned int kvm_shadow_memory = 0;
#ifdef TARGET_ARM
int old_param = 0;
#endif
@@ -6283,6 +6285,9 @@
uint32_t hflags;
int i;
+ if (kvm_enabled())
+ kvm_save_registers(env);
+
for(i = 0; i < CPU_NB_REGS; i++)
qemu_put_betls(f, &env->regs[i]);
qemu_put_betls(f, &env->eip);
@@ -6367,6 +6372,13 @@
qemu_put_be64s(f, &env->kernelgsbase);
#endif
qemu_put_be32s(f, &env->smbase);
+
+ if (kvm_enabled()) {
+ for (i = 0; i < NR_IRQ_WORDS ; i++) {
+ qemu_put_be32s(f, &env->kvm_interrupt_bitmap[i]);
+ }
+ qemu_put_be64s(f, &env->tsc);
+ }
}
#ifdef USE_X86LDOUBLE
@@ -6509,6 +6521,16 @@
/* XXX: compute hflags from scratch, except for CPL and IIF */
env->hflags = hflags;
tlb_flush(env, 1);
+ if (kvm_enabled()) {
+ /* when in-kernel irqchip is used, HF_HALTED_MASK causes deadlock
+ because no userspace IRQs will ever clear this flag */
+ env->hflags &= ~HF_HALTED_MASK;
+ for (i = 0; i < NR_IRQ_WORDS ; i++) {
+ qemu_get_be32s(f, &env->kvm_interrupt_bitmap[i]);
+ }
+ qemu_get_be64s(f, &env->tsc);
+ kvm_load_registers(env);
+ }
return 0;
}
@@ -6836,6 +6858,8 @@
if (qemu_get_be32(f) != phys_ram_size)
return -EINVAL;
for(i = 0; i < phys_ram_size; i+= TARGET_PAGE_SIZE) {
+ if (kvm_enabled() && (i>=0xa0000) && (i<0xc0000)) /* do not access video-addresses */
+ continue;
ret = ram_get_page(f, phys_ram_base + i, TARGET_PAGE_SIZE);
if (ret)
return ret;
@@ -6975,6 +6999,8 @@
if (ram_compress_open(s, f) < 0)
return;
for(i = 0; i < phys_ram_size; i+= BDRV_HASH_BLOCK_SIZE) {
+ if (kvm_enabled() && (i>=0xa0000) && (i<0xc0000)) /* do not access video-addresses */
+ continue;
#if 0
if (tight_savevm_enabled) {
int64_t sector_num;
@@ -7485,6 +7511,13 @@
#endif
CPUState *env;
+
+ if (kvm_enabled()) {
+ kvm_main_loop();
+ cpu_disable_ticks();
+ return 0;
+ }
+
cur_cpu = first_cpu;
next_cpu = cur_cpu->next_cpu ?: first_cpu;
for(;;) {
@@ -7526,6 +7559,8 @@
if (reset_requested) {
reset_requested = 0;
qemu_system_reset();
+ if (kvm_enabled())
+ kvm_load_registers(env);
ret = EXCP_INTERRUPT;
}
if (powerdown_requested) {
@@ -7671,6 +7706,10 @@
"-kernel-kqemu enable KQEMU full virtualization (default is user mode only)\n"
"-no-kqemu disable KQEMU kernel module usage\n"
#endif
+#ifdef USE_KVM
+ "-no-kvm disable KVM hardware virtualization\n"
+ "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n"
+#endif
#ifdef TARGET_I386
"-std-vga simulate a standard VGA card with VESA Bochs Extensions\n"
" (default is CL-GD5446 PCI VGA)\n"
@@ -7682,6 +7721,7 @@
#ifndef _WIN32
"-daemonize daemonize QEMU after initializing\n"
#endif
+ "-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -7783,6 +7823,8 @@
QEMU_OPTION_smp,
QEMU_OPTION_vnc,
QEMU_OPTION_no_acpi,
+ QEMU_OPTION_no_kvm,
+ QEMU_OPTION_no_kvm_irqchip,
QEMU_OPTION_no_reboot,
QEMU_OPTION_show_cursor,
QEMU_OPTION_daemonize,
@@ -7794,6 +7836,7 @@
QEMU_OPTION_clock,
QEMU_OPTION_startdate,
QEMU_OPTION_translation,
+ QEMU_OPTION_kvm_shadow_memory,
};
typedef struct QEMUOption {
@@ -7859,6 +7902,10 @@
{ "no-kqemu", 0, QEMU_OPTION_no_kqemu },
{ "kernel-kqemu", 0, QEMU_OPTION_kernel_kqemu },
#endif
+#ifdef USE_KVM
+ { "no-kvm", 0, QEMU_OPTION_no_kvm },
+ { "no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip },
+#endif
#if defined(TARGET_PPC) || defined(TARGET_SPARC)
{ "g", 1, QEMU_OPTION_g },
#endif
@@ -7893,6 +7940,7 @@
#if defined(TARGET_ARM) || defined(TARGET_M68K)
{ "semihosting", 0, QEMU_OPTION_semihosting },
#endif
+ { "kvm-shadow-memory", HAS_ARG, QEMU_OPTION_kvm_shadow_memory },
{ "name", HAS_ARG, QEMU_OPTION_name },
#if defined(TARGET_SPARC)
{ "prom-env", HAS_ARG, QEMU_OPTION_prom_env },
@@ -8641,6 +8689,16 @@
kqemu_allowed = 2;
break;
#endif
+#ifdef USE_KVM
+ case QEMU_OPTION_no_kvm:
+ kvm_allowed = 0;
+ break;
+ case QEMU_OPTION_no_kvm_irqchip: {
+ extern int kvm_irqchip;
+ kvm_irqchip = 0;
+ break;
+ }
+#endif
case QEMU_OPTION_usb:
usb_enabled = 1;
break;
@@ -8688,6 +8746,9 @@
case QEMU_OPTION_semihosting:
semihosting_enabled = 1;
break;
+ case QEMU_OPTION_kvm_shadow_memory:
+ kvm_shadow_memory = (int64_t)atoi(optarg) * 1024 * 1024 / 4096;
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
@@ -8810,6 +8871,16 @@
}
#endif
+#ifdef USE_KVM
+ if (kvm_enabled()) {
+ if (kvm_qemu_init() < 0) {
+ extern int kvm_allowed;
+ fprintf(stderr, "Could not initialize KVM, will disable KVM support\n");
+ kvm_allowed = 0;
+ }
+ }
+#endif
+
if (pid_file && qemu_create_pidfile(pid_file) != 0) {
if (daemonize) {
uint8_t status = 1;
@@ -8904,10 +8975,38 @@
/* init the memory */
phys_ram_size = ram_size + vga_ram_size + MAX_BIOS_SIZE;
- phys_ram_base = qemu_vmalloc(phys_ram_size);
- if (!phys_ram_base) {
- fprintf(stderr, "Could not allocate physical memory\n");
- exit(1);
+ /* Initialize kvm */
+#if defined(TARGET_I386) || defined(TARGET_X86_64)
+#define KVM_EXTRA_PAGES 3
+#else
+#define KVM_EXTRA_PAGES 0
+#endif
+ if (kvm_enabled()) {
+ phys_ram_size += KVM_EXTRA_PAGES * TARGET_PAGE_SIZE;
+ if (kvm_qemu_create_context() < 0) {
+ fprintf(stderr, "Could not create KVM context\n");
+ exit(1);
+ }
+#ifdef KVM_CAP_USER_MEMORY
+{
+ int ret;
+
+ ret = kvm_qemu_check_extension(KVM_CAP_USER_MEMORY);
+ if (ret) {
+ phys_ram_base = qemu_vmalloc(phys_ram_size);
+ if (!phys_ram_base) {
+ fprintf(stderr, "Could not allocate physical memory\n");
+ exit(1);
+ }
+ }
+}
+#endif
+ } else {
+ phys_ram_base = qemu_vmalloc(phys_ram_size);
+ if (!phys_ram_base) {
+ fprintf(stderr, "Could not allocate physical memory\n");
+ exit(1);
+ }
}
bdrv_init();
@@ -9025,6 +9124,9 @@
qemu_mod_timer(display_state.gui_timer, qemu_get_clock(rt_clock));
}
+ if (kvm_enabled())
+ kvm_init_ap();
+
#ifdef CONFIG_GDBSTUB
if (use_gdbstub) {
/* XXX: use standard host:port notation and modify options
* Re: [qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface
2008-01-31 22:36 [Qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface Anthony Liguori
` (5 preceding siblings ...)
2008-01-31 22:36 ` [Qemu-devel] [PATCH 6/6] QEMU support for the Kernel Virtual Machine interface Anthony Liguori
@ 2008-01-31 22:53 ` Anthony Liguori
6 siblings, 0 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-01-31 22:53 UTC
To: qemu-devel; +Cc: kvm-devel, Paul Brook
FYI, for the new files introduced, Avi should be following up with a
patch to add Copyrights to the files. They will be licensed under the GPL.
Regards,
Anthony Liguori
Anthony Liguori wrote:
> KVM is a Linux interface for providing userspace interfaces for accelerated
> virtualization. It has been included since 2.6.20 and supports Intel VT and
> AMD-V. Ports are under way for ia64, embedded PowerPC, and s390.
>
> This set of patches provides basic support for KVM in QEMU. It does not include
> all of the changes in the KVM QEMU branch (such as virtio, live migration,
> extboot, etc). However, if we can get these first portions merged, I will
> follow up with the remainder of the changes and I believe we can be fully
> merged in the very near future.
>
> The first 5 patches of this series are not KVM specific but are critical fixes
> for KVM to be functional. The 6th patch provides KVM support. The goal in
> providing KVM support is to make sure that when KVM support is not compiled in,
> the code paths aren't changed at all. I hope this makes it very easy to merge.
>
> KVM moves very quickly, so I'd appreciate it if these patches could be reviewed as
> soon as possible as it's going to be tough to keep them in sync with the main
> KVM tree while they're out of tree.
>
> To enable KVM support, you need to have libkvm installed. You should also
> explicitly specify the location of your kernel tree (with KVM headers) with the
> --kernel-path option. We will improve libkvm such that this isn't required in
> future versions.
>
> KVM also has an enhanced Bochs BIOS. I've tested these patches without it and
> it's not strictly necessary for basic functionality. I would recommend pulling
> in a copy of it though as it has useful fixes even in the absence of KVM.
>
> A very large number of people have contributed to these patches with Avi Kivity
> being the main developer of this support. For a full listing of contributors,
> please consult the KVM ChangeLog[1].
>
> [1] http://kvm.qumranet.com/kvmwiki/ChangeLog
>
> _______________________________________________
> kvm-devel mailing list
> kvm-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/kvm-devel
>
* [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-01-31 22:36 ` [Qemu-devel] [PATCH 1/6] Use correct types to enable > 2G support Anthony Liguori
@ 2008-01-31 23:54 ` Paul Brook
2008-02-01 0:25 ` Anthony Liguori
2008-02-01 10:26 ` Fabrice Bellard
2008-02-03 8:58 ` Izik Eidus
2 siblings, 1 reply; 35+ messages in thread
From: Paul Brook @ 2008-01-31 23:54 UTC
To: Anthony Liguori; +Cc: kvm-devel, qemu-devel
On Thursday 31 January 2008, Anthony Liguori wrote:
> KVM supports more than 2GB of memory for x86_64 hosts. The following patch
> fixes a number of type-related issues where ints were being used when they
> shouldn't have been. It also introduces CMOS support so the BIOS can build
> the appropriate e820 tables.
You've still got a fairly random mix of unsigned long, ram_addr_t and
uint64_t.
> -typedef void QEMUMachineInitFunc(int ram_size, int vga_ram_size,
> +typedef void QEMUMachineInitFunc(ram_addr_t ram_size, int vga_ram_size,
This breaks every target except x86.
> + if (above_4g_mem_size) {
> + rtc_set_memory(s, 0x5b, (unsigned int)above_4g_mem_size >> 16);
> + rtc_set_memory(s, 0x5c, (unsigned int)above_4g_mem_size >> 24);
> + rtc_set_memory(s, 0x5d, above_4g_mem_size >> 32);
This will cause warnings on 32-bit hosts.
> + if (ram_size >= 0xe0000000 ) {
> + above_4g_mem_size = ram_size - 0xe0000000;
> + ram_size = 0xe0000000;
> + }
I'm fairly sure this will break the VMware VGA adapter:
> pci_vmsvga_init(pci_bus, ds, phys_ram_base + ram_size,
> ram_size, vga_ram_size);
> +#define PHYS_RAM_MAX_SIZE (2047 * 1024 * 1024 * 1024ULL)
This seems fairly arbitrary. Why? Any limit is certainly target specific.
Paul
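Paul's two remarks on the RAM split and the CMOS writes can be illustrated with a small standalone sketch. It assumes, from the quoted hunk, that guest RAM at or above 0xe0000000 is reported to the BIOS as memory above 4 GiB, and that CMOS bytes 0x5b, 0x5c and 0x5d hold bits 16..39 of that size (i.e. the size in 64 KiB units). `split_ram()` and `encode_above_4g()` are hypothetical helpers, not QEMU functions; doing the shifts on a `uint64_t`, as Anthony proposes in his reply, keeps the `>> 32` well defined on 32-bit hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the quoted hunk: RAM at or above this address is
 * reported to the BIOS as memory above 4 GiB. */
#define LOW_RAM_LIMIT 0xe0000000ULL

/* Hypothetical helper: cap low RAM at LOW_RAM_LIMIT and return the
 * remainder, which the BIOS is told lives above 4 GiB. */
static uint64_t split_ram(uint64_t total, uint64_t *below_4g)
{
    if (total >= LOW_RAM_LIMIT) {
        *below_4g = LOW_RAM_LIMIT;
        return total - LOW_RAM_LIMIT;
    }
    *below_4g = total;
    return 0;
}

/* Hypothetical helper: CMOS bytes 0x5b, 0x5c and 0x5d carry bits 16..39
 * of the above-4G size (the size in 64 KiB units). Shifting a uint64_t
 * keeps ">> 32" well defined even where unsigned long is 32 bits. */
static void encode_above_4g(uint64_t above_4g, uint8_t cmos[3])
{
    cmos[0] = (uint8_t)(above_4g >> 16);   /* CMOS 0x5b */
    cmos[1] = (uint8_t)(above_4g >> 24);   /* CMOS 0x5c */
    cmos[2] = (uint8_t)(above_4g >> 32);   /* CMOS 0x5d */
}
```

For a 6 GiB guest, `split_ram()` leaves 0xe0000000 bytes (3.5 GiB) below 4 GiB and returns 0xa0000000 bytes (2.5 GiB), which encodes as the CMOS bytes 00 a0 00.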
* [Qemu-devel] Re: [PATCH 4/6] Tell BIOS about the number of CPUs
2008-01-31 22:36 ` [Qemu-devel] [PATCH 4/6] Tell BIOS about the number of CPUs Anthony Liguori
@ 2008-02-01 0:14 ` Paul Brook
2008-02-01 0:28 ` Anthony Liguori
0 siblings, 1 reply; 35+ messages in thread
From: Paul Brook @ 2008-02-01 0:14 UTC
To: Anthony Liguori; +Cc: kvm-devel, qemu-devel
> - cmos_init(ram_size, above_4g_mem_size, boot_device, hd);
> + cmos_init(ram_size, above_4g_mem_size, boot_device, hd, smp_cpus);
smp_cpus is a global variable. Why bother passing it around?
Are the CMOS contents documented anywhere?
Paul
* [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-01-31 23:54 ` [Qemu-devel] " Paul Brook
@ 2008-02-01 0:25 ` Anthony Liguori
2008-02-01 0:37 ` Paul Brook
0 siblings, 1 reply; 35+ messages in thread
From: Anthony Liguori @ 2008-02-01 0:25 UTC
To: Paul Brook; +Cc: kvm-devel, Izik Eidus, qemu-devel
Paul Brook wrote:
> On Thursday 31 January 2008, Anthony Liguori wrote:
>
>> KVM supports more than 2GB of memory for x86_64 hosts. The following patch
>> fixes a number of type-related issues where ints were being used when they
>> shouldn't have been. It also introduces CMOS support so the BIOS can build
>> the appropriate e820 tables.
>>
>
> You've still got a fairly random mix of unsigned long, ram_addr_t and
> uint64_t.
>
I wasn't the one that did this work, but we've tested KVM with very
large amounts of memory (~15GB I believe). I suspect the changes were
driven by trial and error. Perhaps Izik can shed more light on how
things were changed?
>> -typedef void QEMUMachineInitFunc(int ram_size, int vga_ram_size,
>> +typedef void QEMUMachineInitFunc(ram_addr_t ram_size, int vga_ram_size,
>>
>
> This breaks every target except x86.
>
>
Indeed. I missed this because it's only a warning since it's just a
pointer cast. I'll fix the patch for all the remaining targets. Thanks!
>> + if (above_4g_mem_size) {
>> + rtc_set_memory(s, 0x5b, (unsigned int)above_4g_mem_size >> 16);
>> + rtc_set_memory(s, 0x5c, (unsigned int)above_4g_mem_size >> 24);
>> + rtc_set_memory(s, 0x5d, above_4g_mem_size >> 32);
>>
>
> This will cause warnings on 32-bit hosts.
>
Yeah, it needs a (uint64_t), I'll update.
>> +#define PHYS_RAM_MAX_SIZE (2047 * 1024 * 1024 * 1024ULL)
>>
>
> This seems fairly arbitrary. Why? Any limit is certainly target specific.
>
On a 32-bit host, a 2GB limit is pretty reasonable since you're limited
in virtual address space. On a 64-bit host, there isn't this
fundamental limit. A target may have its own limit, but there is
definitely a host-imposed limit.
2047GBs is a somewhat arbitrary limit though for 64-bit hosts. If you
have a more logical suggestion, I'll happily change it.
Regards,
Anthony Liguori
> Paul
>
* [Qemu-devel] Re: [PATCH 4/6] Tell BIOS about the number of CPUs
2008-02-01 0:14 ` [Qemu-devel] " Paul Brook
@ 2008-02-01 0:28 ` Anthony Liguori
2008-02-01 0:40 ` Paul Brook
0 siblings, 1 reply; 35+ messages in thread
From: Anthony Liguori @ 2008-02-01 0:28 UTC
To: Paul Brook; +Cc: kvm-devel, qemu-devel
Paul Brook wrote:
>> - cmos_init(ram_size, above_4g_mem_size, boot_device, hd);
>> + cmos_init(ram_size, above_4g_mem_size, boot_device, hd, smp_cpus);
>>
>
> smp_cpus is a global variable. Why bother passing it around?
>
True, I'll update the patch.
> Are the CMOS contents documented anywhere?
>
No, but if you have a suggestion of where to document them, I'll add
documentation.
Regards,
Anthony Liguori
> Paul
>
* [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 0:25 ` Anthony Liguori
@ 2008-02-01 0:37 ` Paul Brook
2008-02-01 0:40 ` Anthony Liguori
0 siblings, 1 reply; 35+ messages in thread
From: Paul Brook @ 2008-02-01 0:37 UTC
To: Anthony Liguori; +Cc: kvm-devel, Izik Eidus, qemu-devel
> >> +#define PHYS_RAM_MAX_SIZE (2047 * 1024 * 1024 * 1024ULL)
> >
> > This seems fairly arbitrary. Why? Any limit is certainly target specific.
>
> On a 32-bit host, a 2GB limit is pretty reasonable since you're limited
> in virtual address space. On a 64-bit host, there isn't this
> fundamental limit. A target may have its own limit, but there is
> definitely a host-imposed limit.
>
> 2047GBs is a somewhat arbitrary limit though for 64-bit hosts. If you
> have a more logical suggestion, I'll happily change it.
Don't have a limit at all.
The reason we have the current 31-bit limit is because qemu is/was known to
use a signed int to hold the size. With your code, 64-bit hosts should be able
to handle anything atoi can parse.
As mentioned on IRC, I also noticed that ram_save hasn't been updated.
Paul
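Paul's explanation of the 31-bit limit is easy to check: 2048 MiB is 2^31 bytes, one more than `INT_MAX` on hosts with a 32-bit `int`, so a byte count held in a signed `int` overflows there. The sketch below is a hypothetical `-m` parser, not QEMU's; it assumes the argument is a decimal count of MiB and swaps `atoi()` for `strtoull()` so that neither the parse nor the multiplication is capped by `int`.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* The old constraint: a byte count held in a signed int must not exceed
 * INT_MAX, i.e. it has to stay just under 2 GiB. */
static int fits_in_signed_int(uint64_t bytes)
{
    return bytes <= (uint64_t)INT_MAX;
}

/* Hypothetical "-m" parser: converts a decimal MiB count to bytes in
 * 64-bit arithmetic. Returns 0 for malformed or unrepresentable input. */
static uint64_t parse_ram_mb(const char *arg)
{
    char *end;
    unsigned long long mb;

    errno = 0;
    mb = strtoull(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return 0;                        /* not a clean decimal number */
    if (mb > UINT64_MAX / (1024 * 1024))
        return 0;                        /* size would not fit in 64 bits */
    return (uint64_t)mb * 1024 * 1024;
}
```

`parse_ram_mb("2048")` returns 2147483648, which `fits_in_signed_int()` rejects, while `"2047"` (2146435072 bytes) still fits. With the size kept in 64 bits end to end, no separate `PHYS_RAM_MAX_SIZE` is needed to protect the arithmetic itself.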
* [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 0:37 ` Paul Brook
@ 2008-02-01 0:40 ` Anthony Liguori
0 siblings, 0 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-02-01 0:40 UTC
To: Paul Brook; +Cc: kvm-devel, Izik Eidus, qemu-devel
Paul Brook wrote:
>>>> +#define PHYS_RAM_MAX_SIZE (2047 * 1024 * 1024 * 1024ULL)
>>>>
>>> This seems fairly arbitrary. Why? Any limit is certainly target specific.
>>>
>> On a 32-bit host, a 2GB limit is pretty reasonable since you're limited
>> in virtual address space. On a 64-bit host, there isn't this
>> fundamental limit. A target may have its own limit, but there is
>> definitely a host-imposed limit.
>>
>> 2047GBs is a somewhat arbitrary limit though for 64-bit hosts. If you
>> have a more logical suggestion, I'll happily change it.
>>
>
> Don't have a limit at all.
>
> The reason we have the current 31-bit limit is because qemu is/was known to
> use a signed int to hold the size. With your code, 64-bit hosts should be able
> to handle anything atoi can parse.
>
> As mentioned on IRC, I also noticed that ram_save hasn't been updated.
>
Okay, I'll update both of these.
Regards,
Anthony Liguori
> Paul
>
* [Qemu-devel] Re: [PATCH 4/6] Tell BIOS about the number of CPUs
2008-02-01 0:28 ` Anthony Liguori
@ 2008-02-01 0:40 ` Paul Brook
0 siblings, 0 replies; 35+ messages in thread
From: Paul Brook @ 2008-02-01 0:40 UTC
To: Anthony Liguori; +Cc: kvm-devel, qemu-devel
> > Are the CMOS contents documented anywhere?
>
> No, but if you have a suggestion of where to document them, I'll add
> documentation.
I suggest documenting them in or with the BIOS sources.
As we're using a common BIOS, it seems a good idea to make sure this kind of
thing is coordinated.
Paul
* [Qemu-devel] Re: [PATCH 6/6] QEMU support for the Kernel Virtual Machine interface
2008-01-31 22:36 ` [Qemu-devel] [PATCH 6/6] QEMU support for the Kernel Virtual Machine interface Anthony Liguori
@ 2008-02-01 9:49 ` Fabrice Bellard
2008-02-01 14:18 ` Anthony Liguori
0 siblings, 1 reply; 35+ messages in thread
From: Fabrice Bellard @ 2008-02-01 9:49 UTC
To: qemu-devel; +Cc: kvm-devel, Anthony Liguori, Paul Brook
Anthony Liguori wrote:
> This patch actually enables KVM support for QEMU. I apologize that it is so
> large but this was the only sane way to preserve bisectability.
>
> The goal of this patch is to add KVM support, but not to impact users when
> KVM isn't being used. It achieves this by using a kvm_enabled() macro that
> evaluates to (0) when KVM support is not enabled. An if (kvm_enabled()) is
> just as good as using an #ifdef since GCC will eliminate the dead code.
>
> This patch touches a lot of areas. For performance reasons, the guest CPU
> state is not kept in sync with CPUState. This requires an explicit
> synchronization whenever CPUState is required. KVM also uses its own main
> loop as it runs each VCPU in its own thread.
>
> Trapping VGA updates via MMIO is far too slow when running KVM so there is
> additional logic to allow VGA memory to be accessed as RAM. We use KVM's
> shadow page tables to keep track of which portions of RAM have been dirtied.
>
> KVM also supports an in-kernel APIC implementation as a performance
> enhancement. Finally, KVM supports APIC TPR patching. This allows TPR
> accesses (which are very frequent for Windows) to be patched into CALL
> instructions to the BIOS (for 32-bit guests). This results in a very
> significant performance improvement for Windows guests.
>
> While this patch is very large, the new files are only included when KVM
> support is compiled in. Every change to QEMU is wrapped in an
> if (kvm_enabled()) so the code disappears when KVM support is not compiled in.
> This is done to ensure no regressions are introduced to normal QEMU.
Some questions:
- QEMU already maintains modified page status for VGA memory (and kqemu
for example fully supports that), so I don't see why KVM needs a new method.
- Why is kvm_cpu_register_physical_memory() needed ? kqemu can work
without it because there is a remapping between physical memory and RAM
address. I suggest to add that feature in KVM or to modify
cpu_register_physical_memory() to hide it.
- If KVM implements its own CPU loop, why are there patches in libqemu.a
(CPU core) ?
Regards,
Fabrice.
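The kvm_enabled() idea in the quoted patch description can be sketched as follows. When KVM support is not compiled in, the macro is the constant 0, so the body of every "if (kvm_enabled())" is dead code the compiler drops. The names USE_KVM and kvm_allowed are assumptions for this illustration, and cpu_backend() is a hypothetical caller.

```c
/* When USE_KVM is not defined, kvm_enabled() is a compile-time 0 and
 * the compiler eliminates any code guarded by it. */
#ifdef USE_KVM
extern int kvm_allowed;             /* assumed runtime switch, e.g. set from the command line */
#define kvm_enabled() (kvm_allowed)
#else
#define kvm_enabled() (0)
#endif

/* Hypothetical caller: the KVM branch disappears without USE_KVM. */
static const char *cpu_backend(void)
{
    if (kvm_enabled())
        return "kvm";
    return "emulator";
}
```

Unlike an #ifdef, the guarded code is still parsed and type-checked in every build, which keeps the non-KVM build from bit-rotting.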
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-01-31 22:36 ` [Qemu-devel] [PATCH 1/6] Use correct types to enable > 2G support Anthony Liguori
2008-01-31 23:54 ` [Qemu-devel] " Paul Brook
@ 2008-02-01 10:26 ` Fabrice Bellard
2008-02-01 14:35 ` Anthony Liguori
2008-02-03 8:58 ` Izik Eidus
2 siblings, 1 reply; 35+ messages in thread
From: Fabrice Bellard @ 2008-02-01 10:26 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-devel, qemu-devel, Paul Brook
Anthony Liguori wrote:
> KVM supports more than 2GB of memory for x86_64 hosts. The following patch
> fixes a number of type related issues where int's were being used when they
> shouldn't have been. It also introduces CMOS support so the BIOS can build
> the appropriate e820 tables.
> [...]
> + /* above 4giga memory allocation */
> + if (above_4g_mem_size > 0) {
> + ram_addr = qemu_ram_alloc(above_4g_mem_size);
> + cpu_register_physical_memory(0x100000000, above_4g_mem_size, ram_addr);
> + }
> +
Why do you need this ? All the RAM can be registered with a single call.
I fear you need to do that because of KVM RAM handling limitations.
> Index: qemu/osdep.c
> ===================================================================
> --- qemu.orig/osdep.c 2008-01-30 13:47:00.000000000 -0600
> +++ qemu/osdep.c 2008-01-30 13:47:31.000000000 -0600
> @@ -113,7 +113,7 @@
> int64_t free_space;
> int ram_mb;
>
> - extern int ram_size;
> + extern int64_t ram_size;
> free_space = (int64_t)stfs.f_bavail * stfs.f_bsize;
> if ((ram_size + 8192 * 1024) >= free_space) {
> ram_mb = (ram_size / (1024 * 1024));
> @@ -202,7 +202,7 @@
> #ifdef _BSD
> return valloc(size);
> #else
> - return memalign(4096, size);
> + return memalign(TARGET_PAGE_SIZE, size);
> #endif
> }
Not fully correct because it is intended to be the host page size.
> +extern int64_t ram_size;
I agree with the fact that ram_size should be 64 bit. Maybe each machine
could test the value and emit an error message if it is too big. Maybe
an uint64_t would be better though.
Fabrice.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 15:13 ` Avi Kivity
@ 2008-02-01 11:56 ` Robert William Fuller
2008-02-01 16:09 ` M. Warner Losh
2008-02-01 17:35 ` Jamie Lokier
2008-02-01 15:33 ` [Qemu-devel] Re: [kvm-devel] " Anthony Liguori
1 sibling, 2 replies; 35+ messages in thread
From: Robert William Fuller @ 2008-02-01 11:56 UTC (permalink / raw)
To: qemu-devel
Avi Kivity wrote:
> Anthony Liguori wrote:
>> Fabrice Bellard wrote:
>>> Anthony Liguori wrote:
>>>> + /* above 4giga memory allocation */
>>>> + if (above_4g_mem_size > 0) {
>>>> + ram_addr = qemu_ram_alloc(above_4g_mem_size);
>>>> + cpu_register_physical_memory(0x100000000,
>>>> above_4g_mem_size, ram_addr);
>>>> + }
>>>> +
>>>
>>> Why do you need this ? All the RAM can be registered with a single
>>> call. I fear you need to do that because of KVM RAM handling
>>> limitations.
>>
>> On the x86, there is a rather large hole at the top of memory.
>> Currently, we do separate allocations around this hole. You can't
>> get away from doing multiple cpu_register_physical_memory calls here.
>> We've discussed just allocating a single chunk with qemu_ram_alloc
>> since so many places in QEMU assume that you can do phys_ram_base + PA.
>>
>> I think I'll change this too into a single qemu_ram_alloc. That will
>> fix the bug with KVM when using -kernel and large memory anyway :-)
>
> Won't that cause all of the memory in the hole to be wasted?
>
> You could munmap() it, but it's hardly elegant.
>
Linux doesn't commit mapped memory until it's faulted. As for other
platforms, who knows?
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Qemu-devel] Re: [PATCH 6/6] QEMU support for the Kernel Virtual Machine interface
2008-02-01 9:49 ` [Qemu-devel] " Fabrice Bellard
@ 2008-02-01 14:18 ` Anthony Liguori
0 siblings, 0 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-02-01 14:18 UTC (permalink / raw)
To: Fabrice Bellard; +Cc: kvm-devel, qemu-devel, Paul Brook
Fabrice Bellard wrote:
> Some questions:
>
> - QEMU already maintains modified page status for VGA memory (and
> kqemu for example fully supports that), so I don't see why KVM needs a
> new method.
KQEMU passes the dirty bitmap directly to the kernel. KVM does
aggressive shadow page table caching though so maintaining the bitmap
requires removing write protection from the shadow page table entries
explicitly whenever you want to reset it. This is not something you
would want to do every time you go back and forth between
userspace/kernelspace.
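To illustrate the cost Anthony describes, here is a minimal sketch of a dirty-page bitmap with fetch-and-clear semantics, one bit per guest page. The VGA size and all names are assumptions. In KVM the kernel sets the bits, and resetting them is the expensive step because the pages must be write-protected again in the shadow page tables. Fetching and clearing the whole log at once, for example once per display refresh, pays that cost rarely instead of on every exit to userspace.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define VGA_PAGES  64   /* assumed: 256 KiB of VGA RAM in 4 KiB pages */

/* Hypothetical dirty log: bit n set means page n was written. */
static uint8_t dirty_log[VGA_PAGES / 8];

static void mark_dirty(uint64_t offset)
{
    uint64_t page = offset >> PAGE_SHIFT;
    dirty_log[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Copy the log into 'out', reset it, and return the number of dirty pages. */
static int fetch_and_clear_dirty_log(uint8_t out[VGA_PAGES / 8])
{
    int i, bit, count = 0;

    memcpy(out, dirty_log, sizeof(dirty_log));
    memset(dirty_log, 0, sizeof(dirty_log));
    for (i = 0; i < VGA_PAGES / 8; i++)
        for (bit = 0; bit < 8; bit++)
            count += (out[i] >> bit) & 1;
    return count;
}
```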
KVM also doesn't pass the phys_map to the kernel like KQEMU does.
Instead, it divides memory into a set of slots. Slots are contiguous
areas of RAM memory. An IO access that does not fall into a slot is treated
as MMIO and is then sent to userspace. We then use the phys_map in
userspace to dispatch the MMIO operation.
There are only a handful of slots and they happen to be arranged in
order of most frequent access (I believe) such that you can very quickly
determine whether memory is MMIO or not.
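The slot scheme can be modelled in a few lines. This is a hypothetical sketch: the slot table and addresses are made up, and the layout is not KVM's actual data structure. Each slot covers a contiguous guest-physical range backed by RAM, and any address that hits no slot is MMIO, to be dispatched in userspace through phys_map.

```c
#include <stddef.h>
#include <stdint.h>

struct mem_slot {
    uint64_t guest_phys;   /* first guest physical address */
    uint64_t size;         /* length in bytes */
};

/* Made-up layout for illustration; scanned in table order. */
static const struct mem_slot slots[] = {
    { 0x100000000ULL, 0x40000000ULL },  /* 1 GiB above 4 GiB */
    { 0x00100000ULL,  0xdff00000ULL },  /* 1 MiB up to 3.5 GiB */
    { 0x00000000ULL,  0x000a0000ULL },  /* low 640 KiB */
};

/* Return the slot containing gpa, or NULL if the access is MMIO. */
static const struct mem_slot *find_slot(uint64_t gpa)
{
    size_t i;

    for (i = 0; i < sizeof(slots) / sizeof(slots[0]); i++)
        /* unsigned wrap-around makes gpa < guest_phys fail this test */
        if (gpa - slots[i].guest_phys < slots[i].size)
            return &slots[i];
    return NULL;
}
```

With only a handful of slots a linear scan is cheap, and ordering the table by access frequency shortens the common case.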
> - Why is kvm_cpu_register_physical_memory() needed ? kqemu can work
> without it because there is a remapping between physical memory and
> RAM address. I suggest to add that feature in KVM or to modify
> cpu_register_physical_memory() to hide it.
The only reason the second call exists is to simplify the backwards
compatibility code. I will fix it properly though because I do agree
with you that it shouldn't be necessary.
> - If KVM implements its own CPU loop, why are there patches in
> libqemu.a (CPU core) ?
Good question! I looked through the code and some of it was just dead
code from before we had our own main loop. The rest is as follows:
In exec.c, we need to bump the size of the phys_map to support larger
memory (since we use it to dispatch MMIO). We also need to ensure that
cpu_interrupt calls into KVM code. There are also hooks for debugging
support. We've added more flags to cpu.h that we use when synchronizing
KVM register state to CPUState. We also added some additional state to
CPUState that we need to use.
Other than that, I've removed everything else.
Regards,
Anthony Liguori
> Regards,
>
> Fabrice.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 10:26 ` Fabrice Bellard
@ 2008-02-01 14:35 ` Anthony Liguori
2008-02-01 15:13 ` Avi Kivity
2008-02-01 16:00 ` Paul Brook
0 siblings, 2 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-02-01 14:35 UTC (permalink / raw)
To: Fabrice Bellard; +Cc: kvm-devel, qemu-devel, Paul Brook
Fabrice Bellard wrote:
> Anthony Liguori wrote:
>> + /* above 4giga memory allocation */
>> + if (above_4g_mem_size > 0) {
>> + ram_addr = qemu_ram_alloc(above_4g_mem_size);
>> + cpu_register_physical_memory(0x100000000, above_4g_mem_size,
>> ram_addr);
>> + }
>> +
>
> Why do you need this ? All the RAM can be registered with a single
> call. I fear you need to do that because of KVM RAM handling
> limitations.
On the x86, there is a rather large hole at the top of memory.
Currently, we do separate allocations around this hole. You can't
get away from doing multiple cpu_register_physical_memory calls here.
We've discussed just allocating a single chunk with qemu_ram_alloc since
so many places in QEMU assume that you can do phys_ram_base + PA.
I think I'll change this too into a single qemu_ram_alloc. That will
fix the bug with KVM when using -kernel and large memory anyway :-)
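The split around the x86 hole can be sketched as follows. HOLE_START is an assumed boundary chosen for illustration; the real value is machine-specific. RAM that would overlap the hole is relocated above 4 GiB, which is why the above-4G part needs its own cpu_register_physical_memory() call at 0x100000000.

```c
#include <stdint.h>

#define HOLE_START 0xe0000000ULL   /* assumed start of the PCI hole */
#define FOUR_GIB   0x100000000ULL

struct ram_layout {
    uint64_t below_4g;   /* bytes mapped at [0, below_4g) */
    uint64_t above_4g;   /* bytes mapped at [FOUR_GIB, FOUR_GIB + above_4g) */
};

static struct ram_layout split_ram(uint64_t ram_size)
{
    struct ram_layout l;

    if (ram_size > HOLE_START) {
        l.below_4g = HOLE_START;
        l.above_4g = ram_size - HOLE_START;
    } else {
        l.below_4g = ram_size;
        l.above_4g = 0;
    }
    return l;
}
```

A single qemu_ram_alloc() that keeps phys_ram_base + PA valid would have to span FOUR_GIB + above_4g bytes, leaving the FOUR_GIB - HOLE_START bytes under the hole allocated but unused. That is the waste Avi raises later in the thread.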
>> Index: qemu/osdep.c
>> ===================================================================
>> --- qemu.orig/osdep.c 2008-01-30 13:47:00.000000000 -0600
>> +++ qemu/osdep.c 2008-01-30 13:47:31.000000000 -0600
>> @@ -113,7 +113,7 @@
>> int64_t free_space;
>> int ram_mb;
>>
>> - extern int ram_size;
>> + extern int64_t ram_size;
>> free_space = (int64_t)stfs.f_bavail * stfs.f_bsize;
>> if ((ram_size + 8192 * 1024) >= free_space) {
>> ram_mb = (ram_size / (1024 * 1024));
>> @@ -202,7 +202,7 @@
>> #ifdef _BSD
>> return valloc(size);
>> #else
>> - return memalign(4096, size);
>> + return memalign(TARGET_PAGE_SIZE, size);
>> #endif
>> }
>
> Not fully correct because it is intended to be the host page size.
Indeed. I'm dropping this. It was added for the ia64 port and since
that's not included in this patch set, I'll let them fix it properly
when they submit support for ia64.
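Fabrice's objection is that the alignment should follow the host page size, not TARGET_PAGE_SIZE. A minimal POSIX sketch, not the osdep.c code itself, queries the host page size at runtime. The 4096 fallback and the helper name are assumptions.

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdlib.h>
#include <unistd.h>

/* Allocate 'size' bytes aligned to the host page size. */
static void *host_page_aligned_alloc(size_t size)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    void *ptr;

    if (pagesize <= 0)
        pagesize = 4096;            /* assumed fallback if sysconf fails */
    if (posix_memalign(&ptr, (size_t)pagesize, size) != 0)
        return NULL;
    return ptr;
}
```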
>> +extern int64_t ram_size;
>
> I agree with the fact that ram_size should be 64 bit. Maybe each
> machine could test the value and emit an error message if it is too
> big. Maybe an uint64_t would be better though.
uint64_t is probably more reasonable. I wouldn't begin to know what the
appropriate amount of ram was for each machine though so I'll let the
appropriate people handle that :-)
Regards,
Anthony Liguori
> Fabrice.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 14:35 ` Anthony Liguori
@ 2008-02-01 15:13 ` Avi Kivity
2008-02-01 11:56 ` Robert William Fuller
2008-02-01 15:33 ` [Qemu-devel] Re: [kvm-devel] " Anthony Liguori
2008-02-01 16:00 ` Paul Brook
1 sibling, 2 replies; 35+ messages in thread
From: Avi Kivity @ 2008-02-01 15:13 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-devel, Paul Brook, qemu-devel
Anthony Liguori wrote:
> Fabrice Bellard wrote:
>> Anthony Liguori wrote:
>>> + /* above 4giga memory allocation */
>>> + if (above_4g_mem_size > 0) {
>>> + ram_addr = qemu_ram_alloc(above_4g_mem_size);
>>> + cpu_register_physical_memory(0x100000000,
>>> above_4g_mem_size, ram_addr);
>>> + }
>>> +
>>
>> Why do you need this ? All the RAM can be registered with a single
>> call. I fear you need to do that because of KVM RAM handling
>> limitations.
>
> On the x86, there is a rather large hole at the top of memory.
> Currently, we do separate allocations around this hole. You can't
> get away from doing multiple cpu_register_physical_memory calls here.
> We've discussed just allocating a single chunk with qemu_ram_alloc
> since so many places in QEMU assume that you can do phys_ram_base + PA.
>
> I think I'll change this too into a single qemu_ram_alloc. That will
> fix the bug with KVM when using -kernel and large memory anyway :-)
Won't that cause all of the memory in the hole to be wasted?
You could munmap() it, but it's hardly elegant.
--
Any sufficiently difficult bug is indistinguishable from a feature.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Qemu-devel] Re: [kvm-devel] [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 15:13 ` Avi Kivity
2008-02-01 11:56 ` Robert William Fuller
@ 2008-02-01 15:33 ` Anthony Liguori
2008-02-01 15:40 ` Ian Jackson
1 sibling, 1 reply; 35+ messages in thread
From: Anthony Liguori @ 2008-02-01 15:33 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel, qemu-devel, Paul Brook
Avi Kivity wrote:
> Anthony Liguori wrote:
>
>> I think I'll change this too into a single qemu_ram_alloc. That will
>> fix the bug with KVM when using -kernel and large memory anyway :-)
>>
>
> Won't that cause all of the memory in the hole to be wasted?
>
> You could munmap() it, but it's hardly elegant.
>
It only gets wasted if it gets faulted in. And it won't get faulted in,
so it won't increase the RSS size. We could madvise(MADV_DONTNEED) just
to ensure that it's not occupying swap space if you were really paranoid
about it. I don't think munmap()'ing malloc()'d memory is a very good
idea. glibc may freak out.
The alternative is to change all the places that assume phys_ram_base +
PA which I don't like very much.
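Because munmap() on malloc()'d memory can confuse the allocator, this sketch assumes guest RAM comes from a single anonymous mmap() region instead. Pages under the hole that the guest never touches are not faulted in, and madvise(MADV_DONTNEED) additionally releases any that were. The helper names and hole bounds are hypothetical.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* One anonymous mapping for all guest RAM, so phys_ram_base + PA works. */
static uint8_t *alloc_guest_ram(size_t total)
{
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* Drop any resident pages in [hole_start, hole_end); both must be
 * page aligned. The address range itself stays mapped. */
static int discard_hole(uint8_t *base, size_t hole_start, size_t hole_end)
{
    return madvise(base + hole_start, hole_end - hole_start, MADV_DONTNEED);
}
```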
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Qemu-devel] Re: [kvm-devel] [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 15:33 ` [Qemu-devel] Re: [kvm-devel] " Anthony Liguori
@ 2008-02-01 15:40 ` Ian Jackson
2008-02-01 17:53 ` [kvm-devel] [Qemu-devel] " Anthony Liguori
0 siblings, 1 reply; 35+ messages in thread
From: Ian Jackson @ 2008-02-01 15:40 UTC (permalink / raw)
To: qemu-devel; +Cc: kvm-devel, Paul Brook
Anthony Liguori writes ("[Qemu-devel] Re: [kvm-devel] [PATCH 1/6] Use correct types to enable > 2G support"):
> The alternative is to change all the places that assume phys_ram_base +
> PA which I don't like very much.
We would ideally like to do this for Xen, at least in the places we
care about. (Xen uses less of the qemu tree than KVM, I think.)
In Xen, the guest memory is not in general mapped into the host qemu's
address space.
Ian.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 14:35 ` Anthony Liguori
2008-02-01 15:13 ` Avi Kivity
@ 2008-02-01 16:00 ` Paul Brook
2008-02-01 16:21 ` Fabrice Bellard
2008-02-01 17:49 ` [Qemu-devel] Re: [kvm-devel] " Anthony Liguori
1 sibling, 2 replies; 35+ messages in thread
From: Paul Brook @ 2008-02-01 16:00 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-devel, qemu-devel
> > I agree with the fact that ram_size should be 64 bit. Maybe each
> > machine could test the value and emit an error message if it is too
> > big. Maybe an uint64_t would be better though.
>
> uint64_t is probably more reasonable. I wouldn't begin to know what the
> appropriate amount of ram was for each machine though so I'll let the
> appropriate people handle that :-)
I'd say ram_addr_t is an appropriate type.
Currently this is defined in cpu-defs.h. It should probably be moved elsewhere
because in the current implementation it's really a host type.
If we ever implement >2G ram on a 32-bit host this may need some rethinking.
We can deal with that if/when it happens though. Requiring a 64-bit host for
large quantities of ram seems an acceptable limitation (N.B. I'm only talking
about ram size, not target physical address size).
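Paul suggests making ram_addr_t a host type. As a sketch, assuming it is defined in terms of uintptr_t (QEMU's actual definition may differ), its width follows the host pointer size. On a 32-bit host this caps the RAM offset, but not the target physical address space. ram_size_fits_host() is a hypothetical check.

```c
#include <stdint.h>

/* Offset into QEMU's host-side RAM allocation, sized like a host pointer. */
typedef uintptr_t ram_addr_t;

/* Hypothetical check: can the host address a RAM block of this size? */
static int ram_size_fits_host(uint64_t ram_size)
{
    return ram_size <= (uint64_t)UINTPTR_MAX;
}
```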
Paul
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 11:56 ` Robert William Fuller
@ 2008-02-01 16:09 ` M. Warner Losh
2008-02-01 16:47 ` Philip Boulain
2008-02-01 17:35 ` Jamie Lokier
1 sibling, 1 reply; 35+ messages in thread
From: M. Warner Losh @ 2008-02-01 16:09 UTC (permalink / raw)
To: qemu-devel, hydrologiccycle
In message: <47A308DB.3040204@gmail.com>
Robert William Fuller <hydrologiccycle@gmail.com> writes:
: Avi Kivity wrote:
: > Anthony Liguori wrote:
: >> Fabrice Bellard wrote:
: >>> Anthony Liguori wrote:
: >>>> + /* above 4giga memory allocation */
: >>>> + if (above_4g_mem_size > 0) {
: >>>> + ram_addr = qemu_ram_alloc(above_4g_mem_size);
: >>>> + cpu_register_physical_memory(0x100000000,
: >>>> above_4g_mem_size, ram_addr);
: >>>> + }
: >>>> +
: >>>
: >>> Why do you need this ? All the RAM can be registered with a single
: >>> call. I fear you need to do that because of KVM RAM handling
: >>> limitations.
: >>
: >> On the x86, there is a rather large hole at the top of memory.
: >> Currently, we do separate allocations around this hole. You can't
: >> get away from doing multiple cpu_register_physical_memory calls here.
: >> We've discussed just allocating a single chunk with qemu_ram_alloc
: >> since so many places in QEMU assume that you can do phys_ram_base + PA.
: >>
: >> I think I'll change this too into a single qemu_ram_alloc. That will
: >> fix the bug with KVM when using -kernel and large memory anyway :-)
: >
: > Won't that cause all of the memory in the hole to be wasted?
: >
: > You could munmap() it, but it's hardly elegant.
: >
:
: Linux doesn't commit mapped memory until it's faulted. As for other
: platforms, who knows?
Most BSDs are also similarly overcommitted. 95% of the users think
this is a feature, but the other 5 argue 20 times harder sometimes :-(
Warner
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 16:00 ` Paul Brook
@ 2008-02-01 16:21 ` Fabrice Bellard
2008-02-05 11:34 ` Ian Jackson
2008-02-01 17:49 ` [Qemu-devel] Re: [kvm-devel] " Anthony Liguori
1 sibling, 1 reply; 35+ messages in thread
From: Fabrice Bellard @ 2008-02-01 16:21 UTC (permalink / raw)
To: Paul Brook; +Cc: kvm-devel, Anthony Liguori, qemu-devel
Paul Brook wrote:
>>> I agree with the fact that ram_size should be 64 bit. Maybe each
>>> machine could test the value and emit an error message if it is too
>>> big. Maybe an uint64_t would be better though.
>> uint64_t is probably more reasonable. I wouldn't begin to know what the
>> appropriate amount of ram was for each machine though so I'll let the
>> appropriate people handle that :-)
>
> I'd say ram_addr_t is an appropriate type.
> Currently this is defined in cpu-defs.h. It should probably be moved elsewhere
> because in the current implementation it's really a host type.
>
> If we ever implement >2G ram on a 32-bit host this may need some rethinking.
> We can deal with that if/when it happens though. Requiring a 64-bit host for
> large quantities of ram seems an acceptable limitation (N.B. I'm only talking
> about ram size, not target physical address size).
I agree.
Fabrice.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 16:09 ` M. Warner Losh
@ 2008-02-01 16:47 ` Philip Boulain
0 siblings, 0 replies; 35+ messages in thread
From: Philip Boulain @ 2008-02-01 16:47 UTC (permalink / raw)
To: qemu-devel
On 1 Feb 2008, at 16:09, M. Warner Losh wrote:
> In message: <47A308DB.3040204@gmail.com>
> Robert William Fuller <hydrologiccycle@gmail.com> writes:
> : Avi Kivity wrote:
> : > Anthony Liguori wrote:
> : >> I think I'll change this too into a single qemu_ram_alloc. That will
> : >> fix the bug with KVM when using -kernel and large memory anyway :-)
> : > Won't that cause all of the memory in the hole to be wasted?
> : Linux doesn't commit mapped memory until it's faulted. As for other
> : platforms, who knows?
It would appear that modern Windows also overcommits:
"This memory isn’t allocated until the application explicitly uses
it. Once the application uses the page, it becomes committed."
"When an application touches a virtual memory page (reads/write/
programmatically commits) the page becomes a committed page. It is
now backed by a physical memory page. This will usually be a
physical RAM page, but could eventually be a page in the page file on
the hard disk, or it could be a page in a memory mapped file on the
hard disk."
-- http://blogs.msdn.com/ntdebugging/archive/2007/10/10/the-memory-shell-game.aspx
So it looks like you could get away with this on the two big host
platforms.
> Most BSDs are also similarly overcommitted. 95% of the users think
> this is a feature, but the other 5 argue 20 times harder sometimes :-(
Some of us don't like the idea that our operating systems lie about
how many resources they have available, then have to club innocent
processes over the head when their lies catch up with them. ;)
Phil
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 11:56 ` Robert William Fuller
2008-02-01 16:09 ` M. Warner Losh
@ 2008-02-01 17:35 ` Jamie Lokier
1 sibling, 0 replies; 35+ messages in thread
From: Jamie Lokier @ 2008-02-01 17:35 UTC (permalink / raw)
To: qemu-devel
Robert William Fuller wrote:
> Linux doesn't commit mapped memory until it's faulted. As for other
> platforms, who knows?
Correction: most Linux installations don't commit mapped memory until
it's faulted. A few do, as a matter of policy (it depends on kernel
settings), so that applications won't randomly crash when too much
memory is faulted, but will return 0 from malloc() earlier.
For those few systems, use MAP_NORESERVE if you still need to allocate
the address space.
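Jamie's suggestion can be sketched as an anonymous mapping with MAP_NORESERVE, which asks the kernel not to reserve swap for the region. Whether the flag has any effect depends on the host's overcommit policy (see the vm.overcommit_memory notes in the Linux proc(5) page), so treat it as a hint rather than a guarantee. The helper name is hypothetical.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <sys/mman.h>

/* Map address space for guest RAM without asking for swap reservation. */
static void *reserve_guest_address_space(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The trade-off is the one Phil complains about: without a reservation, a later write can fail at fault time rather than at allocation time.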
-- Jamie
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Qemu-devel] Re: [kvm-devel] [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 16:00 ` Paul Brook
2008-02-01 16:21 ` Fabrice Bellard
@ 2008-02-01 17:49 ` Anthony Liguori
1 sibling, 0 replies; 35+ messages in thread
From: Anthony Liguori @ 2008-02-01 17:49 UTC (permalink / raw)
To: Paul Brook; +Cc: kvm-devel, qemu-devel
Paul Brook wrote:
>>> I agree with the fact that ram_size should be 64 bit. Maybe each
>>> machine could test the value and emit an error message if it is too
>>> big. Maybe an uint64_t would be better though.
>>>
>> uint64_t is probably more reasonable. I wouldn't begin to know what the
>> appropriate amount of ram was for each machine though so I'll let the
>> appropriate people handle that :-)
>>
>
> I'd say ram_addr_t is an appropriate type.
> Currently this is defined in cpu-defs.h. It should probably be moved elsewhere
> because in the current implementation it's really a host type.
>
Okay, it turns out that patch needed a lot of refactoring. I agree that
changing ram_addr_t to a host type is the right thing to do.
> If we ever implement >2G ram on a 32-bit host this may need some rethinking.
> We can deal with that if/when it happens though. Requiring a 64-bit host for
> large quantities of ram seems an acceptable limitation (N.B. I'm only talking
> about ram size, not target physical address size).
>
My current limitation is < 2GB if HOST_BITS==32 or defined(USE_KQEMU).
USE_KQEMU restricts the size of the phys_map which limits the maximum
physical address size. I guess technically USE_KQEMU could allow up to
around 3GB of ram but I preferred to simplify the logic.
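The limit Anthony describes amounts to a check like the one below. This is a sketch under stated assumptions: host width is inferred from the pointer size, and use_kqemu stands in for the USE_KQEMU build option. Neither is QEMU's actual code.

```c
#include <stdint.h>
#include <stdio.h>

/* Guest RAM must stay below 2 GB on 32-bit hosts or when kqemu support
 * is built in (kqemu bounds the phys_map size). */
static int ram_size_allowed(uint64_t ram_size, int use_kqemu)
{
    const uint64_t two_gb = 2ULL * 1024 * 1024 * 1024;
    const int host_bits = (int)(sizeof(void *) * 8);

    if ((host_bits == 32 || use_kqemu) && ram_size >= two_gb) {
        fprintf(stderr, "ram size too large for this host/build (limit < 2 GB)\n");
        return 0;
    }
    return 1;
}
```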
Regards,
Anthony Liguori
> Paul
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 15:40 ` Ian Jackson
@ 2008-02-01 17:53 ` Anthony Liguori
2008-02-01 17:57 ` Daniel P. Berrange
0 siblings, 1 reply; 35+ messages in thread
From: Anthony Liguori @ 2008-02-01 17:53 UTC (permalink / raw)
To: Ian Jackson; +Cc: kvm-devel, qemu-devel, Paul Brook
Ian Jackson wrote:
> Anthony Liguori writes ("[Qemu-devel] Re: [kvm-devel] [PATCH 1/6] Use correct types to enable > 2G support"):
>
>> The alternative is to change all the places that assume phys_ram_base +
>> PA which I don't like very much.
>>
>
> We would ideally like to do this for Xen, at least in the places we
> care about. (Xen uses less of the qemu tree than KVM, I think.)
>
Support for the map cache in the Xen tree is a rather big change that
I'm not going to attempt in this patch series.
I'd rather preserve the phys_ram_base + PA assumption because it allows
us to support > 1 page DMA operations for our virtual IO
drivers. If you break the assumption that physically contiguous memory
in the guest is virtually contiguous memory in the host, things get pretty
ugly.
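The phys_ram_base + PA assumption can be illustrated as follows. If guest RAM is one host allocation, a guest-physically contiguous buffer is also contiguous in the host, and a multi-page DMA is a single memcpy(). The variable names mirror QEMU's, but the helpers are hypothetical and ignore the PCI hole.

```c
#include <stdint.h>
#include <string.h>

static uint8_t *phys_ram_base;   /* start of the single host allocation */
static uint64_t ram_size;        /* bytes of guest RAM behind it */

static uint8_t *guest_to_host(uint64_t pa)
{
    return phys_ram_base + pa;
}

/* Copy len bytes into guest memory at pa; may span many pages. */
static int dma_write(uint64_t pa, const void *buf, size_t len)
{
    if (pa > ram_size || len > ram_size - pa)
        return -1;               /* outside guest RAM */
    memcpy(guest_to_host(pa), buf, len);
    return 0;
}
```

With a map cache, guest_to_host() is only valid within one mapped page, and the single memcpy() has to be split.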
Regards,
Anthony Liguori
> In Xen, the guest memory is not in general mapped into the host qemu's
> address space.
>
> Ian.
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 17:53 ` [kvm-devel] [Qemu-devel] " Anthony Liguori
@ 2008-02-01 17:57 ` Daniel P. Berrange
2008-02-01 20:31 ` Anthony Liguori
0 siblings, 1 reply; 35+ messages in thread
From: Daniel P. Berrange @ 2008-02-01 17:57 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-devel, Ian Jackson, qemu-devel, Paul Brook
On Fri, Feb 01, 2008 at 11:53:02AM -0600, Anthony Liguori wrote:
> Ian Jackson wrote:
> > Anthony Liguori writes ("[Qemu-devel] Re: [kvm-devel] [PATCH 1/6] Use correct types to enable > 2G support"):
> >
> >> The alternative is to change all the places that assume phys_ram_base +
> >> PA which I don't like very much.
> >>
> >
> > We would ideally like to do this for Xen, at least in the places we
> > care about. (Xen uses less of the qemu tree than KVM, I think.)
> >
>
> Support for the map cache in the Xen tree is a rather big change that
> I'm not going to attempt in this patch series.
>
> I'd rather preserve the phys_ram_base + PA assumption because it allows
> us to support > 1 page DMA operations for our virtual IO
> drivers. If you break the assumption that physically contiguous memory
> in the guest is virtually contiguous memory in the host, things get pretty
> ugly.
Well Xen i386 has no choice but to use the map cache, since PAE lets
i386 guests have as much as 100 GB of memory & there's no way you can
map that into QEMU's 32-bit userspace. So if virt IO has a dependency
on contiguous memory access in QEMU it's not going to play nice with
Xen.
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 17:57 ` Daniel P. Berrange
@ 2008-02-01 20:31 ` Anthony Liguori
2008-02-01 21:33 ` Paul Brook
0 siblings, 1 reply; 35+ messages in thread
From: Anthony Liguori @ 2008-02-01 20:31 UTC (permalink / raw)
To: Daniel P. Berrange; +Cc: kvm-devel, Ian Jackson, qemu-devel, Paul Brook
Daniel P. Berrange wrote:
> On Fri, Feb 01, 2008 at 11:53:02AM -0600, Anthony Liguori wrote:
>
>> Ian Jackson wrote:
>>
>>> Anthony Liguori writes ("[Qemu-devel] Re: [kvm-devel] [PATCH 1/6] Use correct types to enable > 2G support"):
>>>
>>>
>>>> The alternative is to change all the places that assume phys_ram_base +
>>>> PA which I don't like very much.
>>>>
>>>>
>>> We would ideally like to do this for Xen, at least in the places we
>>> care about. (Xen uses less of the qemu tree than KVM, I think.)
>>>
>>>
>> Support for the map cache in the Xen tree is a rather big change that
>> I'm not going to attempt in this patch series.
>>
>> I'd rather preserve the phys_ram_base + PA assumption because it allows
>> us to support > 1 page DMA operations for our virtual IO
>> drivers. If you break the assumption that physically contiguous memory
>> in the guest is virtually contiguous memory in the host, things get pretty
>> ugly.
>>
>
> Well Xen i386 has no choice but to use the map cache, since PAE lets
> i386 guests have as much as 100 GB of memory & there's no way you can
> map that into QEMU's 32-bit userspace. So if virt IO has a dependency
> on contiguous memory access in QEMU it's not going to play nice with
> Xen.
>
For KVM (and it sounds like QEMU), we're just making the statement that
32-bit hosts cannot support > 2GB guests. I know that's a regression
for Xen but in all fairness, I did raise this as an objection when the
map cache was first introduced :-)
virtio could still be made to work with map cache. You would just have
to change it to be able to map more than one page contiguously. As I
mentioned though, it just starts getting ugly.
Regards,
Anthony Liguori
> Dan.
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 20:31 ` Anthony Liguori
@ 2008-02-01 21:33 ` Paul Brook
0 siblings, 0 replies; 35+ messages in thread
From: Paul Brook @ 2008-02-01 21:33 UTC (permalink / raw)
To: qemu-devel; +Cc: kvm-devel, Ian Jackson
> virtio could still be made to work with map cache. You would just have
> to change it to be able to map more than one page contiguously. As I
> mentioned though, it just starts getting ugly.
That's why you should be using the cpu_physical_memory_rw routines :-)
Anything that assumes large linear accesses (currently only some of the
embedded LCD controllers) is going to break as soon as you start introducing
IOMMUs. There have been several threads on this list about having a sane DMA
infrastructure.
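Paul's point about page-at-a-time accessors can be sketched as follows. Each guest page is looked up separately, so the host pages behind it need not be contiguous, as with a map cache or an IOMMU. map_guest_page() is a hypothetical stand-in for that lookup, and it scatters pages in reverse order to show that the loop does not rely on host contiguity. The function only mimics the shape of cpu_physical_memory_rw(), not its implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_PAGE_SIZE 4096u
#define NPAGES 4

static uint8_t host_pages[NPAGES][GUEST_PAGE_SIZE];

/* Hypothetical lookup: guest page i lives in host_pages[NPAGES - 1 - i]. */
static uint8_t *map_guest_page(uint64_t page_index)
{
    if (page_index >= NPAGES)
        return NULL;
    return host_pages[NPAGES - 1 - page_index];
}

/* Read or write len bytes at guest address pa, one page at a time. */
static int guest_memory_rw(uint64_t pa, uint8_t *buf, size_t len, int is_write)
{
    while (len > 0) {
        size_t off = (size_t)(pa % GUEST_PAGE_SIZE);
        size_t chunk = GUEST_PAGE_SIZE - off;   /* bytes left in this page */
        uint8_t *page = map_guest_page(pa / GUEST_PAGE_SIZE);

        if (page == NULL)
            return -1;
        if (chunk > len)
            chunk = len;
        if (is_write)
            memcpy(page + off, buf, chunk);
        else
            memcpy(buf, page + off, chunk);
        pa += chunk;
        buf += chunk;
        len -= chunk;
    }
    return 0;
}
```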
Paul
^ permalink raw reply [flat|nested] 35+ messages in thread
* [Qemu-devel] Re: [kvm-devel] [PATCH 1/6] Use correct types to enable > 2G support
2008-01-31 22:36 ` [Qemu-devel] [PATCH 1/6] Use correct types to enable > 2G support Anthony Liguori
2008-01-31 23:54 ` [Qemu-devel] " Paul Brook
2008-02-01 10:26 ` Fabrice Bellard
@ 2008-02-03 8:58 ` Izik Eidus
2 siblings, 0 replies; 35+ messages in thread
From: Izik Eidus @ 2008-02-03 8:58 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-devel, Paul Brook, qemu-devel
On Thu, 2008-01-31 at 16:36 -0600, Anthony Liguori wrote:
> KVM supports more than 2GB of memory for x86_64 hosts. The following patch
> fixes a number of type related issues where int's were being used when they
> shouldn't have been. It also introduces CMOS support so the BIOS can build
> the appropriate e820 tables.
The CMOS addresses that we used to register the memory above 4 GB are
reserved, and therefore the QEMU BIOS does not know about them.
You have to patch the BIOS as well to make it work with memory above 4 GB.
I once wrote this patch for QEMU;
I hope it still applies.
commit 21ea5f8286fd9cd7124dfa0865a213613b51add5
Author: Izik Eidus <izike@qumranet.com>
Date: Mon Aug 20 17:46:04 2007 +0300
kvm: bios: add support to memory above the pci hole
the new memory region is mapped after address 0x100000000,
the bios takes the size of the memory after the 0x100000000 from
three new cmos bytes.
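Going by the int15 code in this patch, the above-4G size is stored in CMOS in 64 KiB units. Byte 0x5b holds bits 16-23, 0x5c holds bits 24-31, and 0x5d holds bits 32-39. The sketch below round-trips that encoding and lays out a 20-byte e820 entry (ecx = 0x14): a 64-bit base, a 64-bit length and a 32-bit type, all little-endian. The cmos[] array stands in for inb_cmos(). size_to_cmos() shows one possible writer and is an assumption; the QEMU side is not part of this thread.

```c
#include <stdint.h>

static uint64_t above_4g_from_cmos(const uint8_t cmos[128])
{
    return ((uint64_t)cmos[0x5d] << 32) |
           ((uint64_t)cmos[0x5c] << 24) |
           ((uint64_t)cmos[0x5b] << 16);
}

/* Assumed writer: drops the low 16 bits (64 KiB granularity). */
static void size_to_cmos(uint8_t cmos[128], uint64_t above_4g)
{
    cmos[0x5b] = (uint8_t)(above_4g >> 16);
    cmos[0x5c] = (uint8_t)(above_4g >> 24);
    cmos[0x5d] = (uint8_t)(above_4g >> 32);
}

/* 20-byte e820 entry: base, length, type; little-endian. */
static void put_e820(uint8_t out[20], uint64_t base, uint64_t len, uint32_t type)
{
    int i;

    for (i = 0; i < 8; i++) {
        out[i] = (uint8_t)(base >> (8 * i));
        out[8 + i] = (uint8_t)(len >> (8 * i));
    }
    for (i = 0; i < 4; i++)
        out[16 + i] = (uint8_t)(type >> (8 * i));
}
```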
diff --git a/bios/rombios.c b/bios/rombios.c
index 9ea2dbc..ac918ad 100644
--- a/bios/rombios.c
+++ b/bios/rombios.c
@@ -4078,22 +4078,25 @@ BX_DEBUG_INT15("case default:\n");
#endif
-void set_e820_range(ES, DI, start, end, type)
+void set_e820_range(ES, DI, start, end, extra_start, extra_end, type)
Bit16u ES;
Bit16u DI;
Bit32u start;
Bit32u end;
+ Bit8u extra_start;
+ Bit8u extra_end;
Bit16u type;
{
write_word(ES, DI, start);
write_word(ES, DI+2, start >> 16);
- write_word(ES, DI+4, 0x00);
+ write_word(ES, DI+4, extra_start);
write_word(ES, DI+6, 0x00);
end -= start;
+ extra_end -= extra_start;
write_word(ES, DI+8, end);
write_word(ES, DI+10, end >> 16);
- write_word(ES, DI+12, 0x0000);
+ write_word(ES, DI+12, extra_end);
write_word(ES, DI+14, 0x0000);
write_word(ES, DI+16, type);
@@ -4106,7 +4109,9 @@ int15_function32(regs, ES, DS, FLAGS)
Bit16u ES, DS, FLAGS;
{
Bit32u extended_memory_size=0; // 64bits long
+ Bit32u extra_lowbits_memory_size=0;
Bit16u CX,DX;
+ Bit8u extra_highbits_memory_size=0;
BX_DEBUG_INT15("int15 AX=%04x\n",regs.u.r16.ax);
@@ -4179,11 +4184,18 @@ ASM_END
extended_memory_size *= 1024;
}
+ extra_lowbits_memory_size = inb_cmos(0x5c);
+ extra_lowbits_memory_size <<= 8;
+ extra_lowbits_memory_size |= inb_cmos(0x5b);
+ extra_lowbits_memory_size *= 64;
+ extra_lowbits_memory_size *= 1024;
+ extra_highbits_memory_size = inb_cmos(0x5d);
+
switch(regs.u.r16.bx)
{
case 0:
set_e820_range(ES, regs.u.r16.di,
- 0x0000000L, 0x0009fc00L, 1);
+ 0x0000000L, 0x0009fc00L, 0, 0, 1);
regs.u.r32.ebx = 1;
regs.u.r32.eax = 0x534D4150;
regs.u.r32.ecx = 0x14;
@@ -4192,7 +4204,7 @@ ASM_END
break;
case 1:
set_e820_range(ES, regs.u.r16.di,
- 0x0009fc00L, 0x000a0000L, 2);
+ 0x0009fc00L, 0x000a0000L, 0, 0, 2);
regs.u.r32.ebx = 2;
regs.u.r32.eax = 0x534D4150;
regs.u.r32.ecx = 0x14;
@@ -4201,7 +4213,7 @@ ASM_END
break;
case 2:
set_e820_range(ES, regs.u.r16.di,
- 0x000e8000L, 0x00100000L, 2);
+ 0x000e8000L, 0x00100000L, 0, 0, 2);
regs.u.r32.ebx = 3;
regs.u.r32.eax = 0x534D4150;
regs.u.r32.ecx = 0x14;
@@ -4211,7 +4223,7 @@ ASM_END
case 3:
set_e820_range(ES, regs.u.r16.di,
0x00100000L,
- extended_memory_size - ACPI_DATA_SIZE, 1);
+ extended_memory_size - ACPI_DATA_SIZE, 0, 0, 1);
regs.u.r32.ebx = 4;
regs.u.r32.eax = 0x534D4150;
regs.u.r32.ecx = 0x14;
@@ -4221,7 +4233,7 @@ ASM_END
case 4:
set_e820_range(ES, regs.u.r16.di,
extended_memory_size - ACPI_DATA_SIZE,
- extended_memory_size, 3); // ACPI RAM
+ extended_memory_size, 0, 0, 3); // ACPI RAM
regs.u.r32.ebx = 5;
regs.u.r32.eax = 0x534D4150;
regs.u.r32.ecx = 0x14;
@@ -4231,7 +4243,20 @@ ASM_END
case 5:
/* 256KB BIOS area at the end of 4 GB */
set_e820_range(ES, regs.u.r16.di,
- 0xfffc0000L, 0x00000000L, 2);
+ 0xfffc0000L, 0x00000000L, 0, 0, 2);
+ if (extra_highbits_memory_size || extra_lowbits_memory_size)
+ regs.u.r32.ebx = 6;
+ else
+ regs.u.r32.ebx = 0;
+ regs.u.r32.eax = 0x534D4150;
+ regs.u.r32.ecx = 0x14;
+ CLEAR_CF();
+ return;
+ case 6:
+ /* Mapping of memory above 4 GB */
+ set_e820_range(ES, regs.u.r16.di, 0x00000000L,
+ extra_lowbits_memory_size, 1, extra_highbits_memory_size
+ + 1, 1);
regs.u.r32.ebx = 0;
regs.u.r32.eax = 0x534D4150;
regs.u.r32.ecx = 0x14;
* Re: [Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support
2008-02-01 16:21 ` Fabrice Bellard
@ 2008-02-05 11:34 ` Ian Jackson
0 siblings, 0 replies; 35+ messages in thread
From: Ian Jackson @ 2008-02-05 11:34 UTC (permalink / raw)
To: qemu-devel; +Cc: kvm-devel, Anthony Liguori, Paul Brook
Fabrice Bellard writes ("[Qemu-devel] Re: [PATCH 1/6] Use correct types to enable > 2G support"):
> Paul Brook wrote:
> > If we ever implement >2G ram on a 32-bit host this
> > may need some rethinking. We can deal with that if/when it
> > happens though. Requiring a 64-bit host for large quantities of
> > ram seems an acceptable limitation (N.B. I'm only talking about
> > ram size, not target physical address size).
>
> I agree.
This demonstrates quite nicely why we need to get rid of the
assumption that all guest memory is mapped by the host.
The configuration with a 32-bit host (dom0), 64-bit guest, and of
course 64-bit Xen, is very common. qemu runs (currently) in the
dom0.
Ian.
Thread overview: 35+ messages
2008-01-31 22:36 [Qemu-devel] [PATCH 0/6] Support for the Kernel Virtual Machine interface Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 1/6] Use correct types to enable > 2G support Anthony Liguori
2008-01-31 23:54 ` [Qemu-devel] " Paul Brook
2008-02-01 0:25 ` Anthony Liguori
2008-02-01 0:37 ` Paul Brook
2008-02-01 0:40 ` Anthony Liguori
2008-02-01 10:26 ` Fabrice Bellard
2008-02-01 14:35 ` Anthony Liguori
2008-02-01 15:13 ` Avi Kivity
2008-02-01 11:56 ` Robert William Fuller
2008-02-01 16:09 ` M. Warner Losh
2008-02-01 16:47 ` Philip Boulain
2008-02-01 17:35 ` Jamie Lokier
2008-02-01 15:33 ` [Qemu-devel] Re: [kvm-devel] " Anthony Liguori
2008-02-01 15:40 ` Ian Jackson
2008-02-01 17:53 ` [kvm-devel] [Qemu-devel] " Anthony Liguori
2008-02-01 17:57 ` Daniel P. Berrange
2008-02-01 20:31 ` Anthony Liguori
2008-02-01 21:33 ` Paul Brook
2008-02-01 16:00 ` Paul Brook
2008-02-01 16:21 ` Fabrice Bellard
2008-02-05 11:34 ` Ian Jackson
2008-02-01 17:49 ` [Qemu-devel] Re: [kvm-devel] " Anthony Liguori
2008-02-03 8:58 ` Izik Eidus
2008-01-31 22:36 ` [Qemu-devel] [PATCH 2/6] SCI fixes Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 3/6] Fix daemonize options Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 4/6] Tell BIOS about the number of CPUs Anthony Liguori
2008-02-01 0:14 ` [Qemu-devel] " Paul Brook
2008-02-01 0:28 ` Anthony Liguori
2008-02-01 0:40 ` Paul Brook
2008-01-31 22:36 ` [Qemu-devel] [PATCH 5/6] Refactor option ROM loading Anthony Liguori
2008-01-31 22:36 ` [Qemu-devel] [PATCH 6/6] QEMU support for the Kernel Virtual Machine interface Anthony Liguori
2008-02-01 9:49 ` [Qemu-devel] " Fabrice Bellard
2008-02-01 14:18 ` Anthony Liguori
2008-01-31 22:53 ` [qemu-devel] [PATCH 0/6] Support " Anthony Liguori