All of lore.kernel.org
 help / color / mirror / Atom feed
* Paravirtualised drivers for fully virtualised domains
@ 2006-07-18 12:51 Steven Smith
  2006-07-18 13:45 ` Ben Thomas
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Steven Smith @ 2006-07-18 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: sos22


[-- Attachment #1.1.1: Type: text/plain, Size: 3748 bytes --]

(The list appears to have eaten my previous attempt to send this.
Apologies if you receive multiple copies.)

The attached patches allow you to use paravirtualised network and
block interfaces from fully virtualised domains, based on Intel's
patches from a few months ago.  These are significantly faster than
the equivalent ioemu devices, sometimes by more than an order of
magnitude.

These drivers are explicitly not considered by XenSource to be an
alternative to improving the performance of the ioemu devices.
Rather, work on both will continue in parallel.

To build, apply the three patches to a clean checkout of xen-unstable
and then build Xen, dom0, and the tools in the usual way.  To build
the drivers themselves, you first need to build a native kernel for
the guest, and then go

cd xen-unstable.hg/unmodified-drivers/linux-2.6
./mkbuildtree
make -C /usr/src/linux-2.6.16 M=$PWD modules

where /usr/src/linux-2.6.16 is the path to the area where you built
the guest kernel.  This should be a native kernel, and not a xenolinux
one.  You should end up with four modules.  xen-evtchn.ko should be
loaded first, followed by xenbus.ko, and then whichever of xen-vnif.ko
and xen-vbd.ko you need.  None of the modules need any arguments.

The xm configuration syntax is exactly the same as it would be for
paravirtualised devices in a paravirtualised domain.  For a network
interface, you take your line

vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78' ]

(or whatever) and replace it with

vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78', 'bridge=xenbr0' ]

where bridge=xenbr0 should be some suitable netif configuration
string, as it would be in the PV-on-PV case.  Disk is likewise fairly
simple:

disk = [ 'file:/path/to/image,ioemu:hda,w' ]

becomes

disk = [ 'file:/path/to/image,ioemu:hda,w', 'file:/path/to/some/other/image,hde,w' ]

There is a slight complication in that the paravirtualised block
device can't share an IDE controller with an ioemu device, so if you
have an ioemu hda, the paravirtualised device must be hde or later.
This is to avoid confusing the Linux IDE driver.

Note that having a PV device doesn't imply having a corresponding
ioemu device, and vice versa.  Configuring a single backing store to
appear as both an IDE device and a paravirtualised block device is
likely to cause problems; don't do it.



The patches consist of a number of big parts:

-- A version of netback and netfront which can copy packets into
   domains rather than doing page flipping.  It's much easier to make
   this work well with qemu, since the P2M table doesn't need to
   change, and it can be faster for some workloads.

   The copying interface has been confirmed to work in paravirtualised
   domains, but is currently disabled there.

-- Reworking the device model and hypervisor support so that iorequest
   completion notifications no longer go to the HVM guest's event
   channel mask.  This avoids a whole slew of really quite nasty race
   conditions

-- Adding a new device to the qemu PCI bus which is used for
   bootstrapping the devices and getting an IRQ.

-- Support for hypercalls from HVM domains

-- Various shims and fixes to the frontends so that they work without
   the rest of the xenolinux infrastructure.

The patches still have a few rough edges, and they're not as easy to
understand as I'd like, but I think they should be mostly
comprehensible and reasonably stable.  The plan is to add them to
xen-unstable over the next few weeks, probably before 3.0.3, so any
testing which anyone can do would be helpful.

The Xen and tools changes are also available as a series of smaller
patches at http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/hvm_xen .  The
composition of these gives hvm_xen_unstable.diff.

Steven.

[-- Attachment #1.1.2: copy_netif.diff --]
[-- Type: text/plain, Size: 15145 bytes --]

# HG changeset patch
# User sos22@douglas.cl.cam.ac.uk
# Date 1153175686 -3600
# Node ID 7053592c928b488b0c653fb25ce6f73bc6deeb05
# Parent  4726fd416506a34da96888bac0e7c9772c5037e8
Copying netback.

diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/drivers/xen/netback/common.h
--- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h	Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h	Mon Jul 17 23:34:46 2006 +0100
@@ -59,6 +59,8 @@ typedef struct netif_st {
 	/* Unique identifier for this interface. */
 	domid_t          domid;
 	unsigned int     handle;
+	unsigned int     rx_flags;
+	unsigned int     copy_delivery_offset;
 
 	u8               fe_dev_addr[6];
 
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c	Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c	Mon Jul 17 23:34:46 2006 +0100
@@ -63,13 +63,17 @@ static struct timer_list net_timer;
 #define MAX_PENDING_REQS 256
 
 static struct sk_buff_head rx_queue;
-static multicall_entry_t rx_mcl[NET_RX_RING_SIZE+1];
+static multicall_entry_t rx_mcl[NET_RX_RING_SIZE+3];
 static mmu_update_t rx_mmu[NET_RX_RING_SIZE];
-static gnttab_transfer_t grant_rx_op[NET_RX_RING_SIZE];
+static gnttab_transfer_t grant_rx_trans_op[NET_RX_RING_SIZE];
+static gnttab_map_grant_ref_t grant_rx_map_op[NET_RX_RING_SIZE];
+static gnttab_unmap_grant_ref_t grant_rx_unmap_op[NET_RX_RING_SIZE];
 static unsigned char rx_notify[NR_IRQS];
 
 static unsigned long mmap_vstart;
 #define MMAP_VADDR(_req) (mmap_vstart + ((_req) * PAGE_SIZE))
+
+static void *rx_mmap_area;
 
 #define PKT_PROT_LEN 64
 
@@ -96,13 +100,12 @@ static struct list_head net_schedule_lis
 static struct list_head net_schedule_list;
 static spinlock_t net_schedule_list_lock;
 
+static unsigned long alloc_mfn(void)
+{
 #define MAX_MFN_ALLOC 64
-static unsigned long mfn_list[MAX_MFN_ALLOC];
-static unsigned int alloc_index = 0;
-static DEFINE_SPINLOCK(mfn_lock);
-
-static unsigned long alloc_mfn(void)
-{
+	static unsigned long mfn_list[MAX_MFN_ALLOC];
+	static unsigned int alloc_index = 0;
+	static DEFINE_SPINLOCK(mfn_lock);
 	unsigned long mfn = 0, flags;
 	struct xen_memory_reservation reservation = {
 		.nr_extents   = MAX_MFN_ALLOC,
@@ -218,73 +221,122 @@ static void net_rx_action(unsigned long 
 	u16 size, id, irq, flags;
 	multicall_entry_t *mcl;
 	mmu_update_t *mmu;
-	gnttab_transfer_t *gop;
+	gnttab_transfer_t *flip_gop;
+	gnttab_map_grant_ref_t *map_gop;
+	gnttab_unmap_grant_ref_t *unmap_gop;
 	unsigned long vdata, old_mfn, new_mfn;
-	struct sk_buff_head rxq;
+	struct sk_buff_head flip_rxq, copy_rxq;
 	struct sk_buff *skb;
 	u16 notify_list[NET_RX_RING_SIZE];
 	int notify_nr = 0;
 	int ret;
-
-	skb_queue_head_init(&rxq);
+	void *rx_mmap_ptr;
+	netif_rx_request_t *rx_req_p;
+	void *remote_data;
+
+	skb_queue_head_init(&flip_rxq);
+	skb_queue_head_init(&copy_rxq);
 
 	mcl = rx_mcl;
 	mmu = rx_mmu;
-	gop = grant_rx_op;
-
+	flip_gop = grant_rx_trans_op;
+	map_gop = grant_rx_map_op;
+	rx_mmap_ptr = rx_mmap_area;
+
+	/* Split the incoming skbs according to whether they need to
+	   be page flipped or copied, and build up the first set of
+	   hypercall arguments. */
 	while ((skb = skb_dequeue(&rx_queue)) != NULL) {
 		netif   = netdev_priv(skb->dev);
-		vdata   = (unsigned long)skb->data;
-		old_mfn = virt_to_mfn(vdata);
-
-		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-			/* Memory squeeze? Back off for an arbitrary while. */
-			if ((new_mfn = alloc_mfn()) == 0) {
-				if ( net_ratelimit() )
-					WPRINTK("Memory squeeze in netback "
-						"driver.\n");
-				mod_timer(&net_timer, jiffies + HZ);
-				skb_queue_head(&rx_queue, skb);
+		size    = skb->tail - skb->data;
+		rx_req_p = RING_GET_REQUEST(&netif->rx,
+					    netif->rx.req_cons);
+
+		if (netif->rx_flags &&
+		    (rx_req_p->flags & NETIF_RXRF_copy_packet)) {
+			if (map_gop - grant_rx_map_op ==
+			    ARRAY_SIZE(grant_rx_map_op))
 				break;
+			if (size > PAGE_SIZE - netif->copy_delivery_offset) {
+				if (net_ratelimit()) {
+					printk("Discarding jumbogram to copying interface\n");
+				}
+				netif_put(netif);
+				dev_kfree_skb(skb);
+				continue;
 			}
-			/*
-			 * Set the new P2M table entry before reassigning
-			 * the old data page. Heed the comment in
-			 * pgtable-2level.h:pte_page(). :-)
-			 */
-			set_phys_to_machine(
-				__pa(skb->data) >> PAGE_SHIFT,
-				new_mfn);
-
-			MULTI_update_va_mapping(mcl, vdata,
-						pfn_pte_ma(new_mfn,
-							   PAGE_KERNEL), 0);
-			mcl++;
-
-			mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) |
-				MMU_MACHPHYS_UPDATE;
-			mmu->val = __pa(vdata) >> PAGE_SHIFT;
-			mmu++;
-		}
-
-		gop->mfn = old_mfn;
-		gop->domid = netif->domid;
-		gop->ref = RING_GET_REQUEST(
-			&netif->rx, netif->rx.req_cons)->gref;
-		netif->rx.req_cons++;
-		gop++;
-
-		__skb_queue_tail(&rxq, skb);
-
-		/* Filled the batch queue? */
-		if ((gop - grant_rx_op) == ARRAY_SIZE(grant_rx_op))
-			break;
-	}
-
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		if (mcl == rx_mcl)
-			return;
-
+			map_gop->host_addr = (unsigned long)rx_mmap_ptr;
+			map_gop->dom       = netif->domid;
+			map_gop->ref       = rx_req_p->gref;
+			map_gop->flags     = GNTMAP_host_map;
+			map_gop++;
+			rx_mmap_ptr += PAGE_SIZE;
+
+			memcpy(skb->cb, rx_req_p, sizeof(*rx_req_p));
+
+			netif->rx.req_cons++;
+			__skb_queue_tail(&copy_rxq, skb);
+		} else {
+			/* Filled the batch queue? */
+			if ((flip_gop - grant_rx_trans_op) ==
+			    ARRAY_SIZE(grant_rx_trans_op))
+				break;
+
+			vdata   = (unsigned long)skb->data;
+			old_mfn = virt_to_mfn(vdata);
+
+			if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+				/* Memory squeeze? Back off for an
+				 * arbitrary while. */
+				if ((new_mfn = alloc_mfn()) == 0) {
+					if ( net_ratelimit() )
+						WPRINTK("Memory squeeze in netback "
+							"driver.\n");
+					mod_timer(&net_timer, jiffies + HZ);
+					skb_queue_head(&rx_queue, skb);
+					break;
+				}
+				/*
+				 * Set the new P2M table entry before
+				 * reassigning the old data page. Heed
+				 * the comment in
+				 * pgtable-2level.h:pte_page(). :-)
+				 */
+				set_phys_to_machine(
+					__pa(skb->data) >> PAGE_SHIFT,
+					new_mfn);
+
+				MULTI_update_va_mapping(mcl, vdata,
+							pfn_pte_ma(new_mfn,
+								   PAGE_KERNEL), 0);
+				mcl++;
+
+				mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) |
+					MMU_MACHPHYS_UPDATE;
+				mmu->val = __pa(vdata) >> PAGE_SHIFT;
+				mmu++;
+			}
+
+			flip_gop->mfn   = old_mfn;
+			flip_gop->domid = netif->domid;
+			flip_gop->ref   = rx_req_p->gref;
+			flip_gop++;
+
+			netif->rx.req_cons++;
+			__skb_queue_tail(&flip_rxq, skb);
+		}
+
+		netif->stats.tx_bytes += size;
+		netif->stats.tx_packets++;
+	}
+
+	if (flip_gop == grant_rx_trans_op && map_gop == grant_rx_map_op) {
+		/* Nothing to do */
+		return;
+	}
+
+	if (mcl != rx_mcl) {
+		/* Did some unmaps -> need a TLB flush */
 		mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL;
 
 		if (mmu - rx_mmu) {
@@ -296,26 +348,32 @@ static void net_rx_action(unsigned long 
 			mcl++;
 		}
 
-		ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl);
-		BUG_ON(ret != 0);
-	}
-
-	ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, 
-					gop - grant_rx_op);
+		BUG_ON(flip_gop == grant_rx_trans_op);
+		MULTI_grant_table_op(mcl, GNTTABOP_transfer,
+				     grant_rx_trans_op,
+				     flip_gop - grant_rx_trans_op);
+		mcl++;
+	}
+	if (map_gop != grant_rx_map_op) {
+		MULTI_grant_table_op(mcl, GNTTABOP_map_grant_ref,
+				     grant_rx_map_op,
+				     map_gop - grant_rx_map_op);
+		mcl++;
+	}
+
+	ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl);
 	BUG_ON(ret != 0);
 
+	/* Now do all of the page flips */
 	mcl = rx_mcl;
-	gop = grant_rx_op;
-	while ((skb = __skb_dequeue(&rxq)) != NULL) {
+	flip_gop = grant_rx_trans_op;
+	while ((skb = __skb_dequeue(&flip_rxq)) != NULL) {
 		netif   = netdev_priv(skb->dev);
 		size    = skb->tail - skb->data;
 
 		atomic_set(&(skb_shinfo(skb)->dataref), 1);
 		skb_shinfo(skb)->nr_frags = 0;
 		skb_shinfo(skb)->frag_list = NULL;
-
-		netif->stats.tx_bytes += size;
-		netif->stats.tx_packets++;
 
 		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
 			/* The update_va_mapping() must not fail. */
@@ -325,14 +383,14 @@ static void net_rx_action(unsigned long 
 
 		/* Check the reassignment error code. */
 		status = NETIF_RSP_OKAY;
-		if (gop->status != 0) { 
+		if (flip_gop->status != 0) { 
 			DPRINTK("Bad status %d from grant transfer to DOM%u\n",
-				gop->status, netif->domid);
+				flip_gop->status, netif->domid);
 			/*
 			 * Page no longer belongs to us unless GNTST_bad_page,
 			 * but that should be a fatal error anyway.
 			 */
-			BUG_ON(gop->status == GNTST_bad_page);
+			BUG_ON(flip_gop->status == GNTST_bad_page);
 			status = NETIF_RSP_ERROR; 
 		}
 		irq = netif->irq;
@@ -352,7 +410,72 @@ static void net_rx_action(unsigned long 
 
 		netif_put(netif);
 		dev_kfree_skb(skb);
-		gop++;
+		flip_gop++;
+	}
+
+	/* Now do all of the copies */
+	map_gop = grant_rx_map_op;
+	unmap_gop = grant_rx_unmap_op;
+	skb = ((struct sk_buff *)&copy_rxq)->next;
+	while (skb != (struct sk_buff *)&copy_rxq) {
+		netif = netdev_priv(skb->dev);
+		size  = skb->tail - skb->data;
+
+		rx_req_p = (netif_rx_request_t *)skb->cb;
+
+		if (map_gop->status == 0) {
+			remote_data =
+				(void *)(unsigned long)map_gop->host_addr;
+			memcpy(remote_data + 16,
+			       skb->data,
+			       size);
+			unmap_gop->host_addr    = map_gop->host_addr;
+			unmap_gop->dev_bus_addr = 0;
+			unmap_gop->handle       = map_gop->handle;
+			unmap_gop++;
+		}
+
+		map_gop++;
+		skb = skb->next;
+	}
+
+	/* Unmap the packets we just copied into */
+	if (unmap_gop != grant_rx_unmap_op) {
+		ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+						grant_rx_unmap_op,
+						unmap_gop - grant_rx_unmap_op);
+		BUG_ON(ret);
+		map_gop = grant_rx_map_op;
+		/* And notify the other side. */
+		while ((skb = __skb_dequeue(&copy_rxq)) != NULL) {
+			netif = netdev_priv(skb->dev);
+			rx_req_p = (netif_rx_request_t *)skb->cb;
+
+			flags = 0;
+			if (skb->ip_summed == CHECKSUM_HW)
+				flags |= (NETRXF_csum_blank |
+					  NETRXF_data_validated);
+			else if (skb->proto_data_valid)
+				flags |= NETRXF_data_validated;
+
+			if (map_gop->status)
+				status = NETIF_RSP_ERROR;
+			else
+				status = NETIF_RSP_OKAY;
+
+			irq = netif->irq;
+			if (make_rx_response(netif, rx_req_p->id, status,
+					     netif->copy_delivery_offset, size,
+					     flags) &&
+			    rx_notify[irq] == 0) {
+				rx_notify[irq] = 1;
+				notify_list[notify_nr++] = irq;
+			}
+
+			netif_put(netif);
+			dev_kfree_skb(skb);
+			map_gop++;
+		}
 	}
 
 	while (notify_nr != 0) {
@@ -966,6 +1089,12 @@ static void netif_page_release(struct pa
 	set_page_count(page, 1);
 
 	netif_idx_release(pending_idx);
+}
+
+static void netif_rx_page_release(struct page *page)
+{
+	/* Ready for next use. */
+	set_page_count(page, 1);
 }
 
 irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs)
@@ -1093,6 +1222,16 @@ static int __init netback_init(void)
 		SetPageForeign(page, netif_page_release);
 	}
 
+	page = balloon_alloc_empty_page_range(NET_RX_RING_SIZE);
+	BUG_ON(page == NULL);
+	rx_mmap_area = pfn_to_kaddr(page_to_pfn(page));
+
+	for (i = 0; i < NET_RX_RING_SIZE; i++) {
+		page = virt_to_page(rx_mmap_area + (i * PAGE_SIZE));
+		set_page_count(page, 1);
+		SetPageForeign(page, netif_rx_page_release);
+	}
+
 	pending_cons = 0;
 	pending_prod = MAX_PENDING_REQS;
 	for (i = 0; i < MAX_PENDING_REQS; i++)
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c	Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c	Mon Jul 17 23:34:46 2006 +0100
@@ -110,6 +110,18 @@ static int netback_probe(struct xenbus_d
 		}
 #endif
 
+		err = xenbus_printf(xbt, dev->nodename, "feature-rx-copy", "%d", 1);
+		if (err) {
+			message = "writing feature-copying";
+			goto abort_transaction;
+		}
+
+		err = xenbus_printf(xbt, dev->nodename, "feature-rx-flags", "%d", 1);
+		if (err) {
+			message = "writing feature-rx-flags";
+			goto abort_transaction;
+		}
+
 		err = xenbus_transaction_end(xbt, 0);
 	} while (err == -EAGAIN);
 
@@ -363,6 +375,30 @@ static int connect_rings(struct backend_
 	if (err) {
 		xenbus_dev_fatal(dev, err,
 				 "reading %s/ring-ref and event-channel",
+				 dev->otherend);
+		return err;
+	}
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend,
+			   "use-rx-flags", "%u",
+			   &be->netif->rx_flags);
+	if (err == -ENOENT) {
+		be->netif->rx_flags = 0;
+	} else if (err < 0) {
+		xenbus_dev_fatal(dev, err,
+				 "reading %s/use-rx-flags",
+				 dev->otherend);
+		return err;
+	}
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend,
+			   "copy-delivery-offset", "%u",
+			   &be->netif->copy_delivery_offset);
+	if (err == -ENOENT) {
+		be->netif->copy_delivery_offset = 0;
+	} else if (err < 0) {
+		xenbus_dev_fatal(dev, err,
+				 "reading %s/copy_delivery_offset",
 				 dev->otherend);
 		return err;
 	}
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypervisor.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypervisor.h	Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypervisor.h	Mon Jul 17 23:34:46 2006 +0100
@@ -200,6 +200,16 @@ MULTI_update_va_mapping(
 }
 
 static inline void
+MULTI_grant_table_op(multicall_entry_t *mcl, unsigned int cmd,
+		     void *uop, unsigned int count)
+{
+    mcl->op = __HYPERVISOR_grant_table_op;
+    mcl->args[0] = cmd;
+    mcl->args[1] = (unsigned long)uop;
+    mcl->args[2] = count;
+}
+
+static inline void
 MULTI_update_va_mapping_otherdomain(
     multicall_entry_t *mcl, unsigned long va,
     pte_t new_val, unsigned long flags, domid_t domid)
diff -r 4726fd416506 -r 7053592c928b xen/include/public/io/netif.h
--- a/xen/include/public/io/netif.h	Mon Jul 17 22:55:34 2006 +0100
+++ b/xen/include/public/io/netif.h	Mon Jul 17 23:34:46 2006 +0100
@@ -109,8 +109,12 @@ struct netif_tx_response {
 };
 typedef struct netif_tx_response netif_tx_response_t;
 
+#define _NETIF_RXRF_copy_packet (0)
+#define  NETIF_RXRF_copy_packet (1U<<_NETIF_RXRF_copy_packet)
+
 struct netif_rx_request {
     uint16_t    id;        /* Echoed in response message.        */
+    uint16_t    flags;     /* NETRXRF_* */
     grant_ref_t gref;      /* Reference to incoming granted frame */
 };
 typedef struct netif_rx_request netif_rx_request_t;

[-- Attachment #1.1.3: frontend_changes.diff --]
[-- Type: text/plain, Size: 66799 bytes --]

# HG changeset patch
# User sos22@douglas.cl.cam.ac.uk
# Date 1153175939 -3600
# Node ID aa3087ee5769d60d5ab1e368cc062233d364ec8b
# Parent  7053592c928b488b0c653fb25ce6f73bc6deeb05
Frontend parts of PV-on-HVM patches.

diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c	Mon Jul 17 23:38:59 2006 +0100
@@ -46,6 +46,7 @@
 #include <xen/interface/grant_table.h>
 #include <xen/gnttab.h>
 #include <asm/hypervisor.h>
+#include <asm/maddr.h>
 
 #define BLKIF_STATE_DISCONNECTED 0
 #define BLKIF_STATE_CONNECTED    1
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/core/gnttab.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/gnttab.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/core/gnttab.c	Mon Jul 17 23:38:59 2006 +0100
@@ -41,6 +41,13 @@
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
 #include <asm/synch_bitops.h>
+#include <asm/maddr.h>
+#include <xen/interface/memory.h>
+
+#ifndef CONFIG_XEN
+#include <asm/io.h>
+#include <evtchn-pci.h>
+#endif
 
 /* External tools reserve first few grant table entries. */
 #define NR_RESERVED_ENTRIES 8
@@ -350,6 +357,7 @@ void gnttab_cancel_free_callback(struct 
 }
 EXPORT_SYMBOL_GPL(gnttab_cancel_free_callback);
 
+#ifdef CONFIG_XEN
 #ifndef __ia64__
 static int map_pte_fn(pte_t *pte, struct page *pmd_page,
 		      unsigned long addr, void *data)
@@ -404,23 +412,49 @@ int gnttab_resume(void)
 	shared = __va(frames[0] << PAGE_SHIFT);
 	printk("grant table at %p\n", shared);
 #endif
-
-	return 0;
-}
+}
+#else /* !CONFIG_XEN */
+int
+gnttab_resume(void)
+{
+	unsigned long frames;
+	int x;
+	struct xen_add_to_physmap xatp;
+
+	frames = alloc_xen_mmio(PAGE_SIZE * NR_GRANT_FRAMES);
+	shared = ioremap(frames, PAGE_SIZE * NR_GRANT_FRAMES);
+	if(!shared){
+		printk("error to ioremap gnttab share frames\n");
+		return -1;
+	}
+	for (x = 0; x < NR_GRANT_FRAMES; x++) {
+		xatp.domid = DOMID_SELF;
+		xatp.idx = x;
+		xatp.space = XENMAPSPACE_grant_table;
+		xatp.gpfn = (frames >> PAGE_SHIFT) + x;
+		BUG_ON(HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp));
+	}
+	return 0;
+}
+#endif
 
 int gnttab_suspend(void)
 {
 
 #ifndef __ia64__
+#ifdef CONFIG_XEN
 	apply_to_page_range(&init_mm, (unsigned long)shared,
 			    PAGE_SIZE * NR_GRANT_FRAMES,
 			    unmap_pte_fn, NULL);
-#endif
-
-	return 0;
-}
-
-static int __init gnttab_init(void)
+#else
+	iounmap(shared);
+#endif
+#endif
+
+	return 0;
+}
+
+int __init gnttab_init(void)
 {
 	int i;
 
@@ -439,4 +473,6 @@ static int __init gnttab_init(void)
 	return 0;
 }
 
+#ifdef CONFIG_XEN
 core_initcall(gnttab_init);
+#endif
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/core/xen_proc.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/xen_proc.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/core/xen_proc.c	Mon Jul 17 23:38:59 2006 +0100
@@ -1,4 +1,5 @@
 
+#include <linux/module.h>
 #include <linux/config.h>
 #include <linux/proc_fs.h>
 #include <xen/xen_proc.h>
@@ -12,6 +13,7 @@ struct proc_dir_entry *create_xen_proc_e
 			panic("Couldn't create /proc/xen");
 	return create_proc_entry(name, mode, xen_base);
 }
+EXPORT_SYMBOL(create_xen_proc_entry);
 
 void remove_xen_proc_entry(const char *name)
 {
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c	Mon Jul 17 23:38:59 2006 +0100
@@ -61,6 +61,25 @@
 #include <asm/uaccess.h>
 #include <xen/interface/grant_table.h>
 #include <xen/gnttab.h>
+#include <asm/maddr.h>
+
+/* If we don't have GSO, fake things up so that we never try to use
+   it */
+#ifndef NETIF_F_GSO
+#define netif_needs_gso(dev, skb) 0
+#define NETIF_F_GSO_ROBUST 0
+#define NETIF_F_GSO_SHIFT 16
+#else
+#define HAVE_GSO
+#endif
+
+#ifdef CONFIG_XEN
+#define SKB_PROTO_DATA_VALID(skb) (skb)->proto_data_valid
+#define SET_SKB_PROTO_DATA_VALID(skb, v) do { (skb)->proto_data_valid = (v); } while (0)
+#else
+#define SKB_PROTO_DATA_VALID(skb) 0
+#define SET_SKB_PROTO_DATA_VALID(skb, v) do {} while (0)
+#endif
 
 #define GRANT_INVALID_REF	0
 
@@ -88,6 +107,7 @@ struct netfront_info {
 
 	unsigned int handle;
 	unsigned int evtchn, irq;
+	unsigned int copyall;
 
 	/* Receive-ring batched refills. */
 #define RX_MIN_TARGET 8
@@ -148,7 +168,7 @@ static inline unsigned short get_id_from
 
 static int talk_to_backend(struct xenbus_device *, struct netfront_info *);
 static int setup_device(struct xenbus_device *, struct netfront_info *);
-static struct net_device *create_netdev(int, struct xenbus_device *);
+static struct net_device *create_netdev(int, int, struct xenbus_device *);
 
 static void netfront_closing(struct xenbus_device *);
 
@@ -190,14 +210,41 @@ static int __devinit netfront_probe(stru
 	struct net_device *netdev;
 	struct netfront_info *info;
 	unsigned int handle;
+#ifndef CONFIG_XEN
+	unsigned feature_rx_flags;
+#endif
+	unsigned feature_rx_copy;
 
 	err = xenbus_scanf(XBT_NIL, dev->nodename, "handle", "%u", &handle);
 	if (err != 1) {
 		xenbus_dev_fatal(dev, err, "reading handle");
 		return err;
 	}
-
-	netdev = create_netdev(handle, dev);
+#ifndef CONFIG_XEN
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "feature-rx-flags", "%u",
+			   &feature_rx_flags);
+	if (err == 1) {
+		err = xenbus_scanf(XBT_NIL,
+				   dev->otherend,
+				   "feature-rx-copy",
+				   "%u",
+				   &feature_rx_copy);
+		if (err != 1) {
+			feature_rx_copy = 0;
+			err = EINVAL;
+		}
+	} else {
+		feature_rx_copy = feature_rx_flags = 0;
+	}
+	if (!feature_rx_copy) {
+		xenbus_dev_fatal(dev, err, "need a copy-capable backend");
+		return err;
+	}
+#else
+	feature_rx_copy = 0;
+#endif
+
+	netdev = create_netdev(handle, feature_rx_copy, dev);
 	if (IS_ERR(netdev)) {
 		err = PTR_ERR(netdev);
 		xenbus_dev_fatal(dev, err, "creating netdev");
@@ -300,6 +347,19 @@ again:
 			    "event-channel", "%u", info->evtchn);
 	if (err) {
 		message = "writing event-channel";
+		goto abort_transaction;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "use-rx-flags", "%u", 1);
+	if (err) {
+		message = "writing use-rx-flags";
+		goto abort_transaction;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "copy-delivery-offset", "%u",
+			    16);
+	if (err) {
+		message = "writing copy-delivery-offset";
 		goto abort_transaction;
 	}
 
@@ -550,6 +610,8 @@ static void network_alloc_rx_buffers(str
 	RING_IDX req_prod = np->rx.req_prod_pvt;
 	struct xen_memory_reservation reservation;
 	grant_ref_t ref;
+	netif_rx_request_t *req;
+	int nr_flips;
 
 	if (unlikely(!netif_carrier_ok(dev)))
 		return;
@@ -592,7 +654,7 @@ static void network_alloc_rx_buffers(str
 		np->rx_target = np->rx_max_target;
 
  refill:
-	for (i = 0; ; i++) {
+	for (nr_flips = i = 0; ; i++) {
 		if ((skb = __skb_dequeue(&np->rx_batch)) == NULL)
 			break;
 
@@ -602,17 +664,78 @@ static void network_alloc_rx_buffers(str
 
 		np->rx_skbs[id] = skb;
 
-		RING_GET_REQUEST(&np->rx, req_prod + i)->id = id;
 		ref = gnttab_claim_grant_reference(&np->gref_rx_head);
 		BUG_ON((signed short)ref < 0);
 		np->grant_rx_ref[id] = ref;
-		gnttab_grant_foreign_transfer_ref(ref,
-						  np->xbdev->otherend_id,
-						  __pa(skb->head)>>PAGE_SHIFT);
-		RING_GET_REQUEST(&np->rx, req_prod + i)->gref = ref;
-		np->rx_pfn_array[i] = virt_to_mfn(skb->head);
+
+		req = RING_GET_REQUEST(&np->rx, req_prod + i);
+		if ( !np->copyall ) {
+			gnttab_grant_foreign_transfer_ref(ref,
+							  np->xbdev->otherend_id,
+							  __pa(skb->head) >> PAGE_SHIFT);
+			np->rx_pfn_array[nr_flips] = virt_to_mfn(skb->head);
+
+			if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+				/* Remove this page from map before
+				 * passing back to Xen. */
+				set_phys_to_machine(__pa(skb->head) >>
+						    PAGE_SHIFT,
+						    INVALID_P2M_ENTRY);
+
+				MULTI_update_va_mapping(np->rx_mcl+nr_flips,
+						      (unsigned long)skb->head,
+							__pte(0), 0);
+			}
+			nr_flips++;
+			req->flags = 0;
+		} else {
+			gnttab_grant_foreign_access_ref(ref,
+							np->xbdev->otherend_id,
+							virt_to_mfn(skb->head),
+							0);
+			req->flags = NETIF_RXRF_copy_packet;
+		}
+		req->gref = ref;
+		req->id = id;
+	}
+
+	if ( nr_flips != 0 ) {
+		set_xen_guest_handle(reservation.extent_start,
+				     np->rx_pfn_array);
+		reservation.nr_extents   = nr_flips;
+		reservation.extent_order = 0;
+		reservation.address_bits = 0;
+		reservation.domid        = DOMID_SELF;
+
+		/* Tell the ballon driver what is going on. */
+		balloon_update_driver_allowance(nr_flips);
 
 		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			/* After all PTEs have been zapped, flush the
+			 * TLB. */
+			np->rx_mcl[nr_flips-1].args[MULTI_UVMFLAGS_INDEX] =
+				UVMF_TLB_FLUSH|UVMF_ALL;
+
+			/* Give away a batch of pages. */
+			np->rx_mcl[nr_flips].op = __HYPERVISOR_memory_op;
+			np->rx_mcl[nr_flips].args[0] =
+				XENMEM_decrease_reservation;
+			np->rx_mcl[nr_flips].args[1] =
+				(unsigned long)&reservation;
+
+			/* Zap PTEs and give away pages in one big
+			 * multicall. */
+			(void)HYPERVISOR_multicall(np->rx_mcl, nr_flips + 1);
+
+			/* Check return status of
+			 * HYPERVISOR_memory_op(). */
+			if (unlikely(np->rx_mcl[nr_flips].result != nr_flips))
+				panic("Unable to reduce memory reservation (%ld,%d)\n",
+				      np->rx_mcl[nr_flips].result, nr_flips);
+		} else {
+			if (HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+						 &reservation) != i)
+				panic("Unable to reduce memory reservation\n");
 			/* Remove this page before passing back to Xen. */
 			set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT,
 					    INVALID_P2M_ENTRY);
@@ -620,37 +743,9 @@ static void network_alloc_rx_buffers(str
 						(unsigned long)skb->head,
 						__pte(0), 0);
 		}
-	}
-
-	/* Tell the ballon driver what is going on. */
-	balloon_update_driver_allowance(i);
-
-	set_xen_guest_handle(reservation.extent_start, np->rx_pfn_array);
-	reservation.nr_extents   = i;
-	reservation.extent_order = 0;
-	reservation.address_bits = 0;
-	reservation.domid        = DOMID_SELF;
-
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		/* After all PTEs have been zapped, flush the TLB. */
-		np->rx_mcl[i-1].args[MULTI_UVMFLAGS_INDEX] =
-			UVMF_TLB_FLUSH|UVMF_ALL;
-
-		/* Give away a batch of pages. */
-		np->rx_mcl[i].op = __HYPERVISOR_memory_op;
-		np->rx_mcl[i].args[0] = XENMEM_decrease_reservation;
-		np->rx_mcl[i].args[1] = (unsigned long)&reservation;
-
-		/* Zap PTEs and give away pages in one big multicall. */
-		(void)HYPERVISOR_multicall(np->rx_mcl, i+1);
-
-		/* Check return status of HYPERVISOR_memory_op(). */
-		if (unlikely(np->rx_mcl[i].result != i))
-			panic("Unable to reduce memory reservation\n");
-	} else
-		if (HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-					 &reservation) != i)
-			panic("Unable to reduce memory reservation\n");
+	} else {
+		wmb();
+	}
 
 	/* Above is a suitable barrier to ensure backend will see requests. */
 	np->rx.req_prod_pvt = req_prod + i;
@@ -774,9 +869,10 @@ static int network_start_xmit(struct sk_
 
 	if (skb->ip_summed == CHECKSUM_HW) /* local packet? */
 		tx->flags |= NETTXF_csum_blank | NETTXF_data_validated;
-	if (skb->proto_data_valid) /* remote but checksummed? */
+	if (SKB_PROTO_DATA_VALID(skb)) /* remote but checksummed? */
 		tx->flags |= NETTXF_data_validated;
 
+#ifdef HAVE_GSO
 	if (skb_shinfo(skb)->gso_size) {
 		struct netif_extra_info *gso = (struct netif_extra_info *)
 			RING_GET_REQUEST(&np->tx, ++i);
@@ -793,6 +889,7 @@ static int network_start_xmit(struct sk_
 		gso->flags = 0;
 		extra = gso;
 	}
+#endif
 
 	np->tx.req_prod_pvt = i + 1;
 
@@ -852,6 +949,8 @@ static int netif_poll(struct net_device 
 	unsigned long flags;
 	unsigned long mfn;
 	grant_ref_t ref;
+	unsigned long ret;
+	netif_rx_request_t *req;
 
 	spin_lock(&np->rx_lock);
 
@@ -883,25 +982,50 @@ static int netif_poll(struct net_device 
 			continue;
 		}
 
-		/* Memory pressure, insufficient buffer headroom, ... */
-		if ((mfn = gnttab_end_foreign_transfer_ref(ref)) == 0) {
-			if (net_ratelimit())
-				WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n",
-					rx->id, rx->status);
-			RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id =
-				rx->id;
-			RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref =
-				ref;
-			np->rx.req_prod_pvt++;
-			RING_PUSH_REQUESTS(&np->rx);
-			work_done--;
-			continue;
+		skb = np->rx_skbs[rx->id];
+
+		if ( !np->copyall ) {
+			/* Memory pressure, insufficient buffer
+			 * headroom, ... */
+			if ((mfn = gnttab_end_foreign_transfer_ref(ref)) == 0)
+			{
+				if (net_ratelimit())
+					WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n",
+						rx->id, rx->status);
+				req = RING_GET_REQUEST(&np->rx,
+						       np->rx.req_prod_pvt);
+				req->id = rx->id;
+				req->gref = ref;
+				np->rx.req_prod_pvt++;
+				RING_PUSH_REQUESTS(&np->rx);
+				work_done--;
+				continue;
+			}
+			/* Remap the page. */
+			if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+				MULTI_update_va_mapping(mcl,
+						      (unsigned long)skb->head,
+							pfn_pte_ma(mfn,
+								  PAGE_KERNEL),
+							0);
+				mcl++;
+				mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT)
+					| MMU_MACHPHYS_UPDATE;
+				mmu->val = __pa(skb->head) >> PAGE_SHIFT;
+				mmu++;
+
+				set_phys_to_machine(__pa(skb->head)
+						    >> PAGE_SHIFT,
+						    mfn);
+			}
+		} else {
+			ret = gnttab_end_foreign_access_ref(ref, 0);
+			BUG_ON(!ret);
 		}
 
 		gnttab_release_grant_reference(&np->gref_rx_head, ref);
 		np->grant_rx_ref[rx->id] = GRANT_INVALID_REF;
 
-		skb = np->rx_skbs[rx->id];
 		add_id_to_freelist(np->rx_skbs, rx->id);
 
 		/* NB. We handle skb overflow later. */
@@ -915,30 +1039,16 @@ static int netif_poll(struct net_device 
 		 */
 		if (rx->flags & (NETRXF_data_validated|NETRXF_csum_blank)) {
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
-			skb->proto_data_valid = 1;
+			SET_SKB_PROTO_DATA_VALID(skb, 1);
 		} else {
 			skb->ip_summed = CHECKSUM_NONE;
-			skb->proto_data_valid = 0;
+			SET_SKB_PROTO_DATA_VALID(skb, 0);
 		}
+#ifdef CONFIG_XEN
 		skb->proto_csum_blank = !!(rx->flags & NETRXF_csum_blank);
-
+#endif
 		np->stats.rx_packets++;
 		np->stats.rx_bytes += rx->status;
-
-		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-			/* Remap the page. */
-			MULTI_update_va_mapping(mcl, (unsigned long)skb->head,
-						pfn_pte_ma(mfn, PAGE_KERNEL),
-						0);
-			mcl++;
-			mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT)
-				| MMU_MACHPHYS_UPDATE;
-			mmu->val = __pa(skb->head) >> PAGE_SHIFT;
-			mmu++;
-
-			set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT,
-					    mfn);
-		}
 
 		__skb_queue_tail(&rxq, skb);
 	}
@@ -996,8 +1106,11 @@ static int netif_poll(struct net_device 
 				/* Copy any other fields we already set up. */
 				nskb->dev = skb->dev;
 				nskb->ip_summed = skb->ip_summed;
-				nskb->proto_data_valid = skb->proto_data_valid;
+				SET_SKB_PROTO_DATA_VALID(nskb,
+						  SKB_PROTO_DATA_VALID(skb));
+#ifdef CONFIG_XEN
 				nskb->proto_csum_blank = skb->proto_csum_blank;
+#endif
 			}
 
 			/* Reinitialise and then destroy the old skbuff. */
@@ -1126,6 +1239,8 @@ static void network_connect(struct net_d
 	struct netfront_info *np = netdev_priv(dev);
 	int i, requeue_idx;
 	struct sk_buff *skb;
+	grant_ref_t gref;
+	netif_rx_request_t *req;
 
 	xennet_set_features(dev);
 
@@ -1159,13 +1274,21 @@ static void network_connect(struct net_d
 	for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) {
 		if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET)
 			continue;
-		gnttab_grant_foreign_transfer_ref(
-			np->grant_rx_ref[i], np->xbdev->otherend_id,
-			__pa(np->rx_skbs[i]->data) >> PAGE_SHIFT);
-		RING_GET_REQUEST(&np->rx, requeue_idx)->gref =
-			np->grant_rx_ref[i];
-		RING_GET_REQUEST(&np->rx, requeue_idx)->id = i;
-		requeue_idx++;
+		gref = np->grant_rx_ref[i];
+		skb = np->rx_skbs[i];
+		if ( !np->copyall ) {
+			gnttab_grant_foreign_transfer_ref(
+				gref, np->xbdev->otherend_id,
+				__pa(skb->data) >> PAGE_SHIFT);
+		} else {
+			gnttab_grant_foreign_access_ref(
+				gref, np->xbdev->otherend_id,
+				virt_to_mfn(skb->data), 0);
+		}
+		req = RING_GET_REQUEST(&np->rx, requeue_idx);
+		req->gref = gref;
+		req->id = i;
+		requeue_idx++; 
 	}
 
 	np->rx.req_prod_pvt = requeue_idx;
@@ -1348,10 +1471,13 @@ static void network_set_multicast_list(s
 
 /** Create a network device.
  * @param handle device handle
+ * @param copyall flag; 1 if every packet must be copied, 0 if every packet
+ * must be flipped.
  * @param val return parameter for created device
  * @return 0 on success, error code otherwise
  */
 static struct net_device * __devinit create_netdev(int handle,
+						   int copyall,
 						   struct xenbus_device *dev)
 {
 	int i, err = 0;
@@ -1368,6 +1494,7 @@ static struct net_device * __devinit cre
 	np                = netdev_priv(netdev);
 	np->handle        = handle;
 	np->xbdev         = dev;
+	np->copyall       = copyall;
 
 	netif_carrier_off(netdev);
 
@@ -1418,7 +1545,11 @@ static struct net_device * __devinit cre
 	netdev->uninit          = netif_uninit;
 	netdev->change_mtu	= xennet_change_mtu;
 	netdev->weight          = 64;
+#ifdef CONFIG_XEN
 	netdev->features        = NETIF_F_IP_CSUM;
+#else
+	netdev->features        = 0;
+#endif
 
 	SET_ETHTOOL_OPS(netdev, &network_ethtool_ops);
 	SET_MODULE_OWNER(netdev);
@@ -1581,8 +1712,10 @@ static int __init netif_init(void)
 	if (!is_running_on_xen())
 		return -ENODEV;
 
+#ifdef CONFIG_XEN
 	if (xen_start_info->flags & SIF_INITDOMAIN)
 		return 0;
+#endif
 
 	IPRINTK("Initialising virtual ethernet driver.\n");
 
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c	Mon Jul 17 23:38:59 2006 +0100
@@ -39,6 +39,8 @@
 #include <xen/xenbus.h>
 #include "xenbus_comms.h"
 
+void *shared_xenstore_buf;
+
 static int xenbus_irq;
 
 extern void xenbus_probe(void *);
@@ -49,7 +51,7 @@ DECLARE_WAIT_QUEUE_HEAD(xb_waitq);
 
 static inline struct xenstore_domain_interface *xenstore_domain_interface(void)
 {
-	return mfn_to_virt(xen_start_info->store_mfn);
+	return shared_xenstore_buf;
 }
 
 static irqreturn_t wake_waiting(int irq, void *unused, struct pt_regs *regs)
@@ -129,7 +131,7 @@ int xb_write(const void *data, unsigned 
 		intf->req_prod += avail;
 
 		/* This implies mb() before other side sees interrupt. */
-		notify_remote_via_evtchn(xen_start_info->store_evtchn);
+		notify_remote_via_evtchn(xen_store_evtchn);
 	}
 
 	return 0;
@@ -180,7 +182,7 @@ int xb_read(void *data, unsigned len)
 		pr_debug("Finished read of %i bytes (%i to go)\n", avail, len);
 
 		/* Implies mb(): they will see new header. */
-		notify_remote_via_evtchn(xen_start_info->store_evtchn);
+		notify_remote_via_evtchn(xen_store_evtchn);
 	}
 
 	return 0;
@@ -195,7 +197,7 @@ int xb_init_comms(void)
 		unbind_from_irqhandler(xenbus_irq, &xb_waitq);
 
 	err = bind_evtchn_to_irqhandler(
-		xen_start_info->store_evtchn, wake_waiting,
+		xen_store_evtchn, wake_waiting,
 		0, "xenbus", &xb_waitq);
 	if (err <= 0) {
 		printk(KERN_ERR "XENBUS request irq failed %i\n", err);
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h	Mon Jul 17 23:38:59 2006 +0100
@@ -39,5 +39,7 @@ int xb_read(void *data, unsigned len);
 int xb_read(void *data, unsigned len);
 int xs_input_avail(void);
 extern wait_queue_head_t xb_waitq;
+extern void *shared_xenstore_buf;
+extern int xen_store_evtchn;
 
 #endif /* _XENBUS_COMMS_H */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c	Mon Jul 17 23:38:59 2006 +0100
@@ -48,6 +48,7 @@
 #include <xen/xenbus.h>
 #include <xen/xen_proc.h>
 #include <asm/hypervisor.h>
+#include <asm/io.h>
 
 struct xenbus_dev_transaction {
 	struct list_head list;
@@ -181,7 +182,7 @@ static int xenbus_dev_open(struct inode 
 {
 	struct xenbus_dev_data *u;
 
-	if (xen_start_info->store_evtchn == 0)
+	if (xen_store_evtchn == 0)
 		return -ENOENT;
 
 	nonseekable_open(inode, filp);
@@ -232,7 +233,7 @@ static struct file_operations xenbus_dev
 	.poll = xenbus_dev_poll,
 };
 
-static int __init
+int __init
 xenbus_dev_init(void)
 {
 	xenbus_dev_intf = create_xen_proc_entry("xenbus", 0400);
@@ -242,4 +243,6 @@ xenbus_dev_init(void)
 	return 0;
 }
 
+#ifndef MODULE
 __initcall(xenbus_dev_init);
+#endif
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c	Mon Jul 17 23:38:59 2006 +0100
@@ -44,6 +44,7 @@
 #include <linux/kthread.h>
 
 #include <asm/io.h>
+#include <asm/maddr.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/hypervisor.h>
@@ -51,8 +52,12 @@
 #include <xen/xen_proc.h>
 #include <xen/evtchn.h>
 #include <xen/features.h>
+#include <xen/hvm.h>
 
 #include "xenbus_comms.h"
+
+int xen_store_evtchn;
+static unsigned long xen_store_mfn;
 
 extern struct mutex xenwatch_mutex;
 
@@ -915,8 +920,7 @@ static int xsd_kva_mmap(struct file *fil
 	if ((size > PAGE_SIZE) || (vma->vm_pgoff != 0))
 		return -EINVAL;
 
-	if (remap_pfn_range(vma, vma->vm_start,
-			    mfn_to_pfn(xen_start_info->store_mfn),
+	if (remap_pfn_range(vma, vma->vm_start, mfn_to_pfn(xen_store_mfn),
 			    size, vma->vm_page_prot))
 		return -EAGAIN;
 
@@ -928,7 +932,7 @@ static int xsd_kva_read(char *page, char
 {
 	int len;
 
-	len  = sprintf(page, "0x%p", mfn_to_virt(xen_start_info->store_mfn));
+	len  = sprintf(page, "0x%p", mfn_to_virt(xen_store_mfn));
 	*eof = 1;
 	return len;
 }
@@ -938,12 +942,11 @@ static int xsd_port_read(char *page, cha
 {
 	int len;
 
-	len  = sprintf(page, "%d", xen_start_info->store_evtchn);
+	len  = sprintf(page, "%d", xen_store_evtchn);
 	*eof = 1;
 	return len;
 }
 #endif
-
 
 static int __init xenbus_probe_init(void)
 {
@@ -962,7 +965,11 @@ static int __init xenbus_probe_init(void
 	/*
 	 * Domain0 doesn't have a store_evtchn or store_mfn yet.
 	 */
+#ifdef CONFIG_XEN
 	dom0 = (xen_start_info->store_evtchn == 0);
+#else
+	dom0 = 0;
+#endif
 
 	if (dom0) {
 		struct evtchn_alloc_unbound alloc_unbound;
@@ -972,7 +979,7 @@ static int __init xenbus_probe_init(void
 		if (!page)
 			return -ENOMEM;
 
-		xen_start_info->store_mfn =
+		xen_store_mfn =
 			pfn_to_mfn(virt_to_phys((void *)page) >>
 				   PAGE_SHIFT);
 
@@ -985,7 +992,7 @@ static int __init xenbus_probe_init(void
 		if (err == -ENOSYS)
 			goto err;
 		BUG_ON(err);
-		xen_start_info->store_evtchn = alloc_unbound.port;
+		xen_store_evtchn = alloc_unbound.port;
 
 #ifdef CONFIG_PROC_FS
 		/* And finally publish the above info in /proc/xen */
@@ -1001,8 +1008,21 @@ static int __init xenbus_probe_init(void
 		if (xsd_port_intf)
 			xsd_port_intf->read_proc = xsd_port_read;
 #endif
-	} else
+		shared_xenstore_buf = mfn_to_virt(xen_store_mfn);
+	} else {
 		xenstored_ready = 1;
+#ifdef CONFIG_XEN
+		xen_store_evtchn = xen_start_info->store_evtchn;
+		xen_store_mfn = xen_start_info->store_mfn;
+		shared_xenstore_buf = mfn_to_virt(xen_store_mfn);
+#else
+		xen_store_evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
+		xen_store_mfn = hvm_get_parameter(HVM_PARAM_STORE_PFN);
+		shared_xenstore_buf = ioremap(xen_store_mfn << PAGE_SHIFT,
+					      PAGE_SIZE);
+		xenbus_dev_init();
+#endif
+	}
 
 	/* Initialize the interface to xenstore. */
 	err = xs_init();
@@ -1035,8 +1055,10 @@ static int __init xenbus_probe_init(void
 }
 
 postcore_initcall(xenbus_probe_init);
-
-
+MODULE_LICENSE("Dual BSD/GPL");
+
+
+#ifndef MODULE
 static int is_disconnected_device(struct device *dev, void *data)
 {
 	struct xenbus_device *xendev = to_xenbus_device(dev);
@@ -1105,3 +1127,4 @@ static int __init wait_for_devices(void)
 }
 
 late_initcall(wait_for_devices);
+#endif
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h	Mon Jul 17 23:38:59 2006 +0100
@@ -42,6 +42,7 @@
 #define __STR(x) #x
 #define STR(x) __STR(x)
 
+#ifdef CONFIG_XEN
 #define _hypercall0(type, name)			\
 ({						\
 	long __res;				\
@@ -114,6 +115,92 @@
 		: "memory" );					\
 	(type)__res;						\
 })
+#else
+#define _hypercall0(type, name)			                \
+({						                \
+	long __res;				                \
+	asm volatile (				                \
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res)			                \
+		:				                \
+		: "memory" );			                \
+	(type)__res;				                \
+})
+
+#define _hypercall1(type, name, a1)				\
+({								\
+	long __res, __ign1;					\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1)			\
+		: "1" ((long)(a1))				\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall2(type, name, a1, a2)				\
+({								\
+	long __res, __ign1, __ign2;				\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2)	\
+		: "1" ((long)(a1)), "2" ((long)(a2))		\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall3(type, name, a1, a2, a3)			\
+({								\
+	long __res, __ign1, __ign2, __ign3;			\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2), 	\
+		"=d" (__ign3)					\
+		: "1" ((long)(a1)), "2" ((long)(a2)),		\
+		"3" ((long)(a3))				\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall4(type, name, a1, a2, a3, a4)			\
+({								\
+	long __res, __ign1, __ign2, __ign3, __ign4;		\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2),	\
+		"=d" (__ign3), "=S" (__ign4)			\
+		: "1" ((long)(a1)), "2" ((long)(a2)),		\
+		"3" ((long)(a3)), "4" ((long)(a4))		\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall5(type, name, a1, a2, a3, a4, a5)		\
+({								\
+	long __res, __ign1, __ign2, __ign3, __ign4, __ign5;	\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2),	\
+		"=d" (__ign3), "=S" (__ign4), "=D" (__ign5)	\
+		: "1" ((long)(a1)), "2" ((long)(a2)),		\
+		"3" ((long)(a3)), "4" ((long)(a4)),		\
+		"5" ((long)(a5))				\
+		: "memory" );					\
+	(type)__res;						\
+})
+#endif
 
 static inline int
 HYPERVISOR_set_trap_table(
@@ -354,6 +441,13 @@ HYPERVISOR_nmi_op(
 	return _hypercall2(int, nmi_op, op, arg);
 }
 
+static inline unsigned long
+HYPERVISOR_hvm_op(
+    int op, void *arg)
+{
+    return _hypercall2(unsigned long, hvm_op, op, arg);
+}
+
 static inline int
 HYPERVISOR_callback_op(
 	int cmd, void *arg)
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/page.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/page.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/page.h	Mon Jul 17 23:38:59 2006 +0100
@@ -20,6 +20,7 @@
 #include <xen/interface/xen.h>
 #include <xen/features.h>
 #include <xen/foreign_page.h>
+#include <asm/maddr.h>
 
 #define arch_free_page(_page,_order)			\
 ({	int foreign = PageForeign(_page);		\
@@ -59,123 +60,6 @@
 
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
-
-/**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
-#define INVALID_P2M_ENTRY	(~0UL)
-#define FOREIGN_FRAME_BIT	(1UL<<31)
-#define FOREIGN_FRAME(m)	((m) | FOREIGN_FRAME_BIT)
-
-extern unsigned long *phys_to_machine_mapping;
-
-#undef machine_to_phys_mapping
-extern unsigned long *machine_to_phys_mapping;
-extern unsigned int   machine_to_phys_order;
-
-static inline unsigned long pfn_to_mfn(unsigned long pfn)
-{
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		return pfn;
-	return phys_to_machine_mapping[(unsigned int)(pfn)] &
-		~FOREIGN_FRAME_BIT;
-}
-
-static inline int phys_to_machine_mapping_valid(unsigned long pfn)
-{
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		return 1;
-	return (phys_to_machine_mapping[pfn] != INVALID_P2M_ENTRY);
-}
-
-static inline unsigned long mfn_to_pfn(unsigned long mfn)
-{
-	extern unsigned long max_mapnr;
-	unsigned long pfn;
-
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		return mfn;
-
-	if (unlikely((mfn >> machine_to_phys_order) != 0))
-		return max_mapnr;
-
-	/* The array access can fail (e.g., device space beyond end of RAM). */
-	asm (
-		"1:	movl %1,%0\n"
-		"2:\n"
-		".section .fixup,\"ax\"\n"
-		"3:	movl %2,%0\n"
-		"	jmp  2b\n"
-		".previous\n"
-		".section __ex_table,\"a\"\n"
-		"	.align 4\n"
-		"	.long 1b,3b\n"
-		".previous"
-		: "=r" (pfn)
-		: "m" (machine_to_phys_mapping[mfn]), "m" (max_mapnr) );
-
-	return pfn;
-}
-
-/*
- * We detect special mappings in one of two ways:
- *  1. If the MFN is an I/O page then Xen will set the m2p entry
- *     to be outside our maximum possible pseudophys range.
- *  2. If the MFN belongs to a different domain then we will certainly
- *     not have MFN in our p2m table. Conversely, if the page is ours,
- *     then we'll have p2m(m2p(MFN))==MFN.
- * If we detect a special mapping then it doesn't have a 'struct page'.
- * We force !pfn_valid() by returning an out-of-range pointer.
- *
- * NB. These checks require that, for any MFN that is not in our reservation,
- * there is no PFN such that p2m(PFN) == MFN. Otherwise we can get confused if
- * we are foreign-mapping the MFN, and the other domain as m2p(MFN) == PFN.
- * Yikes! Various places must poke in INVALID_P2M_ENTRY for safety.
- *
- * NB2. When deliberately mapping foreign pages into the p2m table, you *must*
- *      use FOREIGN_FRAME(). This will cause pte_pfn() to choke on it, as we
- *      require. In all the cases we care about, the FOREIGN_FRAME bit is
- *      masked (e.g., pfn_to_mfn()) so behaviour there is correct.
- */
-static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
-{
-	extern unsigned long max_mapnr;
-	unsigned long pfn = mfn_to_pfn(mfn);
-	if ((pfn < max_mapnr)
-	    && !xen_feature(XENFEAT_auto_translated_physmap)
-	    && (phys_to_machine_mapping[pfn] != mfn))
-		return max_mapnr; /* force !pfn_valid() */
-	return pfn;
-}
-
-static inline void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
-{
-	if (xen_feature(XENFEAT_auto_translated_physmap)) {
-		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
-		return;
-	}
-	phys_to_machine_mapping[pfn] = mfn;
-}
-
-/* Definitions for machine and pseudophysical addresses. */
-#ifdef CONFIG_X86_PAE
-typedef unsigned long long paddr_t;
-typedef unsigned long long maddr_t;
-#else
-typedef unsigned long paddr_t;
-typedef unsigned long maddr_t;
-#endif
-
-static inline maddr_t phys_to_machine(paddr_t phys)
-{
-	maddr_t machine = pfn_to_mfn(phys >> PAGE_SHIFT);
-	machine = (machine << PAGE_SHIFT) | (phys & ~PAGE_MASK);
-	return machine;
-}
-static inline paddr_t machine_to_phys(maddr_t machine)
-{
-	paddr_t phys = mfn_to_pfn(machine >> PAGE_SHIFT);
-	phys = (phys << PAGE_SHIFT) | (machine & ~PAGE_MASK);
-	return phys;
-}
 
 /*
  * These are used to make use of C type-checking..
@@ -254,7 +138,6 @@ static inline unsigned long pgd_val(pgd_
 
 #define pgprot_val(x)	((x).pgprot)
 
-#define __pte_ma(x)	((pte_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
 #endif /* !__ASSEMBLY__ */
@@ -323,11 +206,6 @@ extern int page_is_ram(unsigned long pag
 	((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0 ) | \
 		 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
-/* VIRT <-> MACHINE conversion */
-#define virt_to_machine(v)	(phys_to_machine(__pa(v)))
-#define virt_to_mfn(v)		(pfn_to_mfn(__pa(v) >> PAGE_SHIFT))
-#define mfn_to_virt(m)		(__va(mfn_to_pfn(m) << PAGE_SHIFT))
-
 #define __HAVE_ARCH_GATE_AREA 1
 
 #endif /* __KERNEL__ */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-2level.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-2level.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-2level.h	Mon Jul 17 23:38:59 2006 +0100
@@ -45,7 +45,6 @@
 
 #define pte_none(x)		(!(x).pte_low)
 #define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
-#define pfn_pte_ma(pfn, prot)	__pte_ma(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
 #define pfn_pmd(pfn, prot)	__pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 /*
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-3level.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-3level.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-3level.h	Mon Jul 17 23:38:59 2006 +0100
@@ -151,18 +151,6 @@ static inline int pte_none(pte_t pte)
 
 extern unsigned long long __supported_pte_mask;
 
-static inline pte_t pfn_pte_ma(unsigned long page_nr, pgprot_t pgprot)
-{
-	pte_t pte;
-
-	pte.pte_high = (page_nr >> (32 - PAGE_SHIFT)) | \
-					(pgprot_val(pgprot) >> 32);
-	pte.pte_high &= (__supported_pte_mask >> 32);
-	pte.pte_low = ((page_nr << PAGE_SHIFT) | pgprot_val(pgprot)) & \
-							__supported_pte_mask;
-	return pte;
-}
-
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
 	return pfn_pte_ma(pfn_to_mfn(page_nr), pgprot);
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/xen/xenbus.h
--- a/linux-2.6-xen-sparse/include/xen/xenbus.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/xen/xenbus.h	Mon Jul 17 23:38:59 2006 +0100
@@ -295,5 +295,6 @@ void xenbus_dev_fatal(struct xenbus_devi
 void xenbus_dev_fatal(struct xenbus_device *dev, int err, const char *fmt,
 		      ...);
 
+int __init xenbus_dev_init(void);
 
 #endif /* _XEN_XENBUS_H */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/maddr.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/maddr.h	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,153 @@
+#ifndef _I386_MADDR_H
+#define _I386_MADDR_H
+
+#include <xen/features.h>
+#include <xen/interface/arch-x86_32.h>
+#include <xen/interface/xen.h>
+
+/**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
+#define INVALID_P2M_ENTRY	(~0UL)
+#define FOREIGN_FRAME_BIT	(1UL<<31)
+#define FOREIGN_FRAME(m)	((m) | FOREIGN_FRAME_BIT)
+
+extern unsigned long *phys_to_machine_mapping;
+
+#undef machine_to_phys_mapping
+extern unsigned long *machine_to_phys_mapping;
+extern unsigned int   machine_to_phys_order;
+
+static inline unsigned long pfn_to_mfn(unsigned long pfn)
+{
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return pfn;
+	return phys_to_machine_mapping[(unsigned int)(pfn)] &
+		~FOREIGN_FRAME_BIT;
+}
+
+static inline int phys_to_machine_mapping_valid(unsigned long pfn)
+{
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return 1;
+	return (phys_to_machine_mapping[pfn] != INVALID_P2M_ENTRY);
+}
+
+static inline unsigned long mfn_to_pfn(unsigned long mfn)
+{
+#ifdef CONFIG_XEN
+	extern unsigned long max_mapnr;
+	unsigned long pfn;
+#endif
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return mfn;
+
+#ifndef CONFIG_XEN
+	BUG();
+#else
+	if (unlikely((mfn >> machine_to_phys_order) != 0))
+		return max_mapnr;
+
+	/* The array access can fail (e.g., device space beyond end of RAM). */
+	asm (
+		"1:	movl %1,%0\n"
+		"2:\n"
+		".section .fixup,\"ax\"\n"
+		"3:	movl %2,%0\n"
+		"	jmp  2b\n"
+		".previous\n"
+		".section __ex_table,\"a\"\n"
+		"	.align 4\n"
+		"	.long 1b,3b\n"
+		".previous"
+		: "=r" (pfn)
+		: "m" (machine_to_phys_mapping[mfn]), "m" (max_mapnr) );
+
+	return pfn;
+#endif
+}
+
+/*
+ * We detect special mappings in one of two ways:
+ *  1. If the MFN is an I/O page then Xen will set the m2p entry
+ *     to be outside our maximum possible pseudophys range.
+ *  2. If the MFN belongs to a different domain then we will certainly
+ *     not have MFN in our p2m table. Conversely, if the page is ours,
+ *     then we'll have p2m(m2p(MFN))==MFN.
+ * If we detect a special mapping then it doesn't have a 'struct page'.
+ * We force !pfn_valid() by returning an out-of-range pointer.
+ *
+ * NB. These checks require that, for any MFN that is not in our reservation,
+ * there is no PFN such that p2m(PFN) == MFN. Otherwise we can get confused if
+ * we are foreign-mapping the MFN, and the other domain as m2p(MFN) == PFN.
+ * Yikes! Various places must poke in INVALID_P2M_ENTRY for safety.
+ *
+ * NB2. When deliberately mapping foreign pages into the p2m table, you *must*
+ *      use FOREIGN_FRAME(). This will cause pte_pfn() to choke on it, as we
+ *      require. In all the cases we care about, the FOREIGN_FRAME bit is
+ *      masked (e.g., pfn_to_mfn()) so behaviour there is correct.
+ */
+static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
+{
+	extern unsigned long max_mapnr;
+	unsigned long pfn = mfn_to_pfn(mfn);
+	if ((pfn < max_mapnr)
+	    && !xen_feature(XENFEAT_auto_translated_physmap)
+	    && (phys_to_machine_mapping[pfn] != mfn))
+		return max_mapnr; /* force !pfn_valid() */
+	return pfn;
+}
+
+static inline void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
+		return;
+	}
+	phys_to_machine_mapping[pfn] = mfn;
+}
+
+/* Definitions for machine and pseudophysical addresses. */
+#ifdef CONFIG_X86_PAE
+typedef unsigned long long paddr_t;
+typedef unsigned long long maddr_t;
+#else
+typedef unsigned long paddr_t;
+typedef unsigned long maddr_t;
+#endif
+
+static inline maddr_t phys_to_machine(paddr_t phys)
+{
+	maddr_t machine = pfn_to_mfn(phys >> PAGE_SHIFT);
+	machine = (machine << PAGE_SHIFT) | (phys & ~PAGE_MASK);
+	return machine;
+}
+static inline paddr_t machine_to_phys(maddr_t machine)
+{
+	paddr_t phys = mfn_to_pfn(machine >> PAGE_SHIFT);
+	phys = (phys << PAGE_SHIFT) | (machine & ~PAGE_MASK);
+	return phys;
+}
+
+/* VIRT <-> MACHINE conversion */
+#define virt_to_machine(v)	(phys_to_machine(__pa(v)))
+#define virt_to_mfn(v)		(pfn_to_mfn(__pa(v) >> PAGE_SHIFT))
+#define mfn_to_virt(m)		(__va(mfn_to_pfn(m) << PAGE_SHIFT))
+
+#ifdef CONFIG_X86_PAE
+static inline pte_t pfn_pte_ma(unsigned long page_nr, pgprot_t pgprot)
+{
+	pte_t pte;
+
+	pte.pte_high = (page_nr >> (32 - PAGE_SHIFT)) | \
+					(pgprot_val(pgprot) >> 32);
+	pte.pte_high &= (__supported_pte_mask >> 32);
+	pte.pte_low = ((page_nr << PAGE_SHIFT) | pgprot_val(pgprot)) & \
+							__supported_pte_mask;
+	return pte;
+}
+#else
+#define pfn_pte_ma(pfn, prot)	__pte_ma(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
+#endif
+
+#define __pte_ma(x)	((pte_t) { (x) } )
+
+#endif /* _I386_MADDR_H */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/xen/hvm.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/linux-2.6-xen-sparse/include/xen/hvm.h	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,17 @@
+/* Simple wrappers around HVM functions */
+#ifndef XEN_HVM_H__
+#define XEN_HVM_H__
+
+#include <xen/interface/hvm/params.h>
+#include <asm/hypercall.h>
+
+static inline unsigned long hvm_get_parameter(int idx)
+{
+	struct xen_hvm_param xhv;
+
+	xhv.domid = DOMID_SELF;
+	xhv.index = idx;
+	return HYPERVISOR_hvm_op(HVMOP_get_param, &xhv);
+}
+
+#endif /* XEN_HVM_H__ */
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/Makefile
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/Makefile	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,22 @@
+include $(M)/overrides.mk
+
+obj-$(CONFIG_XEN_EVTCHN_PCI)	+= evtchn-pci/
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	+= blkfront/
+obj-$(CONFIG_XEN_NETDEV_FRONTEND)	+= netfront/
+obj-m	+= xenbus/
+
+
+debug:
+	chmod +x compile.sh
+	chmod +x mkbuildtree
+	echo $(XEN_DRIVERS_ROOT)
+	echo $(EXTRA_CFLAGS)
+	./compile.sh
+
+clean:
+	find . -name "*.o" |xargs rm -f
+	find . -name "*.ko" |xargs rm -f
+	find . -name "*.mod.c" |xargs rm -f
+	find . -name ".*.cmd" |xargs rm -f
+	rm .tmp_versions -rf
+    
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/README
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/README	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,7 @@
+To build, run ./mkbuildtree and then
+
+make -C /path/to/kernel/source M=$PWD modules
+
+You get four modules, xen-evtchn-pci.ko, xenbus.ko, xen-vbd.ko, and
+xen-vnif.ko.  Load xen-evtchn-pci first, then xenbus, and then
+whichever of xen-vbd and xen-vnif you happen to need.
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/blkfront/Kbuild
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/blkfront/Kbuild	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,6 @@
+include $(M)/overrides.mk
+
+obj-m += xen-vbd.o
+
+xen-vbd-objs := blkfront.o vbd.o
+
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/Kbuild
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/Kbuild	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,8 @@
+include $(M)/overrides.mk
+
+obj-m := xen-evtchn-pci.o
+
+EXTRA_CFLAGS += -I$(M)/evtchn-pci
+
+xen-evtchn-pci-objs := evtchn.o evtchn-pci.o gnttab.o xen_proc.o xen_support.o\
+	features.o
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/debuginfo.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/debuginfo.h	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,56 @@
+#ifndef __DEBUG_INFO__
+#define __DEBUG_INFO__
+//#define INSERT_TEST
+//#define VMX_DEBUG_INFO
+//#define KERNEL_DEBUG_INFO
+//#define FREQ_PRINT
+
+#define infotime(seconds, x, a...) \
+{           \
+static unsigned long prevjiffy = 0; \
+        if(time_after(jiffies, prevjiffy + seconds*HZ)) { \
+            prevjiffy = jiffies; \
+            vmx_printk(x, ##a); \
+        } \
+}
+
+#ifdef KERNEL_DEBUG_INFO
+#define dprintk(x, a...) \
+	printk("<vbd> " x, ##a)
+#define dprintknl(x, a...) \
+	printk(x, ##a)
+#define dprintkentry(x, a...) \
+	printk("<vbd-entry> " x "\n", ##a)
+#define dprintkexit(x, a...) \
+	printk("<vbd-exit> " x "\n", ##a)
+#ifdef FREQ_PRINT
+#define dprintkfreq(x, a...) \
+	printk("<vbd-freq> " x, ##a)
+#else
+#define dprintkfreq(x, a...)
+#endif 
+#elif defined(VMX_DEBUG_INFO)
+#define dprintk(x, a...) \
+	vmx_printk("<vbd> " x, ##a)
+#define dprintknl(x, a...) \
+	vmx_printk(x, ##a)
+#define dprintkentry(x, a...) \
+	vmx_printk("<vbd-entry> " x "\n", ##a)
+#define dprintkexit(x, a...) \
+	vmx_printk("<vbd-exit> " x "\n", ##a)
+#ifdef FREQ_PRINT
+#define dprintkfreq(x, a...) \
+	vmx_printk("<vbd-freq> " x, ##a)
+#else
+#define dprintkfreq(x, a...)
+#endif 
+
+#else
+#define dprintk(x, a...)
+#define dprintkentry(x, a...)
+#define dprintkexit(x, a...)
+#define dprintkfreq(x, a...)
+#define dprintknl(x, a...)
+#endif
+int vmx_printk(const char *fmt, ...);
+#endif
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.c	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,299 @@
+/******************************************************************************
+ * evtchn-pci.c
+ * xen event channel fake PCI device driver
+ * Copyright (C) 2005, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/version.h>
+#include <linux/interrupt.h>
+#include <asm/system.h>
+#include <asm/io.h>
+#include <asm/irq.h>
+#include <asm/uaccess.h>
+#include <asm/hypervisor.h>
+#include <xen/interface/memory.h>
+
+#include "evtchn-pci.h"
+
+#define DRV_NAME    "xen-evtchn-pci"
+#define DRV_VERSION "0.10"
+#define DRV_RELDATE "03/03/2005"
+
+extern void *hypercall_page;
+
+static int callbackirq = 3;		/* legacy mode irq */
+static int nopci = 0;
+static char version[] __devinitdata =
+	KERN_INFO DRV_NAME ":version " DRV_VERSION " " DRV_RELDATE
+	" Xiaofeng. Ling\n";
+
+MODULE_AUTHOR("xiaofeng.ling@intel.com");
+MODULE_DESCRIPTION("Xen evtchn PCI device");
+MODULE_LICENSE("GPL");
+
+MODULE_PARM(nopci, "i");
+MODULE_PARM(callbackirq, "i");
+MODULE_PARM_DESC(callbackirq, "callback irq number for xen event channel");
+
+#define XEN_EVTCHN_VENDOR_ID 0xfffd
+#define XEN_EVTCHN_DEVICE_ID 0x0101
+
+static struct pci_device_id evtchn_pci_tbl[] __devinitdata = {
+	{XEN_EVTCHN_VENDOR_ID, XEN_EVTCHN_DEVICE_ID,
+	 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+	{0,}
+};
+
+MODULE_DEVICE_TABLE(pci, evtchn_pci_tbl);
+
+unsigned long *phys_to_machine_mapping;
+EXPORT_SYMBOL(phys_to_machine_mapping);
+
+static int __init init_xen_info(void)
+{
+	unsigned long shared_info_frame;
+	struct xen_add_to_physmap xatp;
+
+	setup_xen_features();
+
+	shared_info_frame = alloc_xen_mmio(PAGE_SIZE) >> PAGE_SHIFT;
+	xatp.domid = DOMID_SELF;
+	xatp.idx = 0;
+	xatp.space = XENMAPSPACE_shared_info;
+	xatp.gpfn = shared_info_frame;
+	BUG_ON(HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp));
+	HYPERVISOR_shared_info =
+		ioremap(shared_info_frame << PAGE_SHIFT, PAGE_SIZE);
+
+	if (!HYPERVISOR_shared_info)
+		panic("can't map shared info\n");
+
+	dprintk("ioremap shared_info successful\n");
+
+	phys_to_machine_mapping = NULL;
+
+	gnttab_init();
+	evtchn_init();
+
+	return 0;
+}
+
+static void __devexit evtchn_pci_remove(struct pci_dev *pdev)
+{
+	long ioaddr, iolen;
+
+	/*if there are io region, don't forget to release */
+	ioaddr = pci_resource_start(pdev, 0);
+	iolen = pci_resource_len(pdev, 0);
+	if (ioaddr != 0)
+	{
+		release_region(ioaddr, iolen);
+	}
+
+	pci_set_drvdata(pdev, NULL);
+	free_irq(pdev->irq, NULL);
+}
+
+extern irqreturn_t evtchn_interrupt(int irq, void *devid, struct pt_regs *regs);
+
+unsigned long evtchn_mmio = 0xc000000;
+unsigned long evtchn_mmio_alloc;
+unsigned long evtchn_mmiolen = 0x1000000;
+
+unsigned long alloc_xen_mmio(unsigned long len)
+{
+	unsigned long addr;
+
+	addr = 0;
+	if (evtchn_mmio_alloc + len <= evtchn_mmiolen)
+	{
+		addr = evtchn_mmio + evtchn_mmio_alloc;
+		evtchn_mmio_alloc += len;
+	} else {
+		panic("ran out of xen mmio space");
+	}
+	return addr;
+}
+
+static int __devinit evtchn_pci_init(struct pci_dev *pdev,
+				     const struct pci_device_id *ent)
+{
+	int i, ret, irq;
+	long ioaddr, iolen;
+	long mmio_addr, mmio_len;
+
+	printk(KERN_INFO DRV_NAME ":found evtchn pci device model, do init\n");
+
+#ifndef MODULE
+	static int printed_version;
+	if (!printed_version++)
+		printk(version);
+#endif
+
+	i = pci_enable_device(pdev);
+	if (i)
+		return i;
+
+	ioaddr = pci_resource_start(pdev, 0);
+	iolen = pci_resource_len(pdev, 0);
+
+	mmio_addr = pci_resource_start(pdev, 1);
+	mmio_len = pci_resource_len(pdev, 1);
+
+	if (mmio_addr != 0)
+	{
+		if (request_mem_region(mmio_addr, mmio_len, DRV_NAME) == NULL)
+		{
+			printk(KERN_ERR ":MEM I/O resource 0x%lx @ 0x%lx busy\n",
+				   mmio_addr, mmio_len);
+			return -EBUSY;
+		}
+		evtchn_mmio = mmio_addr;
+		evtchn_mmiolen = mmio_len;
+	}
+	else
+	{
+		printk(KERN_WARNING DRV_NAME ":no MMIO found!\n");
+	}
+
+	irq = pdev->irq;
+	callbackirq = irq;
+
+	/* 
+	 *  maybe some day we may use I/O port for checking status 
+	 *  when sharing interrupts 
+	 */
+	if (ioaddr != 0)
+	{
+		if (request_region(ioaddr, iolen, DRV_NAME) == NULL)
+		{
+			printk(KERN_ERR DRV_NAME ":I/O resource 0x%lx @ 0x%lx busy\n",
+				   iolen, ioaddr);
+			return -EBUSY;
+		}
+
+		hypercall_page = (void *)__get_free_page(GFP_KERNEL);
+		if (!hypercall_page)
+			panic("Cannot get hypercall page.\n");
+		memset(hypercall_page, 0xcc, PAGE_SIZE);
+		asm volatile("outl %%eax, %%dx\n"
+			     :
+			     : "a" (virt_to_phys(hypercall_page) >> PAGE_SHIFT),
+			       "d" (ioaddr)
+			     : "memory");
+	}
+	printk(KERN_INFO DRV_NAME ":use irq %d for event channel\n", irq);
+
+	if ((ret = request_irq(irq, evtchn_interrupt, SA_SHIRQ,
+			       "xen-evtchn-pci", evtchn_interrupt))) {
+		goto out;
+	}
+
+	if ((ret = init_xen_info()))
+		goto out;
+
+	if ((ret = set_callback_irq(irq)))
+		goto out;
+
+ out:
+	if (ret && hypercall_page)
+		free_page((unsigned long)hypercall_page);
+	return 0;
+}
+
+static struct pci_driver evtchn_driver = {
+  name:DRV_NAME,
+  probe:evtchn_pci_init,
+  remove:__devexit_p(evtchn_pci_remove),
+  id_table:evtchn_pci_tbl,
+};
+
+int __init setup_xen_callback(void)
+{
+	int rc = 0;
+	/* two ways for call back from hypervisor */
+
+	printk(KERN_INFO DRV_NAME ":legacy driver request irq :%d\n", callbackirq);
+	rc = request_irq(callbackirq, evtchn_interrupt, SA_SHIRQ,
+					 "xen-evtchn", evtchn_interrupt);
+	if (rc != 0)
+		printk(":request irq error:%d!", rc);
+	rc = set_callback_irq(callbackirq);
+	if (rc != 0)
+		printk(KERN_ERR DRV_NAME ":set call back irq error:%d!", rc);
+	return rc;
+}
+
+static int __init evtchn_pci_module_init(void)
+{
+	int rc;
+
+	printk(KERN_INFO DRV_NAME ":do xen module support init\n");
+
+/* when a module, this is printed whether or not devices are found in probe */
+#ifdef MODULE
+	printk(version);
+#endif
+
+	if (!nopci)
+	{
+		rc = pci_module_init(&evtchn_driver);
+		if (rc)
+			printk(KERN_INFO DRV_NAME ":No evtchn pci device model found,"
+				   "use legacy mode\n");
+	}
+	else
+	{
+		printk(KERN_INFO DRV_NAME ":disable evtchn pci device model"
+			   "by module arguments,use legacy mode\n");
+		rc = 1;
+	}
+
+	if (rc)
+	{
+		/*No Pci device, try legacy mode */
+		rc = init_xen_info();
+		if (rc)
+			return rc;
+		setup_xen_callback();
+		if (rc)
+			printk(KERN_ERR DRV_NAME ":setup xen legacy callback fail\n");
+	}
+
+	return rc;
+}
+
+static void __exit evtchn_pci_module_cleanup(void)
+{
+	printk(KERN_INFO DRV_NAME ":Do evtchn module cleanup\n");
+	/* disable hypervisor for callback irq */
+	set_callback_irq(0);
+
+	free_irq(callbackirq, NULL);
+
+	/*TODO: unmap hypercall param share page */
+
+	pci_unregister_driver(&evtchn_driver);
+}
+
+module_init(evtchn_pci_module_init);
+module_exit(evtchn_pci_module_cleanup);
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.h	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,58 @@
+/******************************************************************************
+ * evtchn-pci.h
+ * module driver support in unmodified Linux
+ * Copyright (C) 2004, Intel Corporation. <xiaofeng.ling@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#ifndef __XEN_SUPPORT_H
+#define __XEN_SUPPORT_H
+#include <linux/version.h>
+#include <asm/io.h>
+#include <xen/interface/hvm/params.h>
+
+#include "debuginfo.h"
+
+extern unsigned long *phys_to_machine_mapping;
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
+#else
+#define __user
+#endif
+
+static inline int set_callback_irq(int irq)
+{
+	struct xen_hvm_param a;
+
+	a.domid = DOMID_SELF;
+	a.index = HVM_PARAM_CALLBACK_IRQ;
+	a.value = irq;
+	return HYPERVISOR_hvm_op(HVMOP_set_param, &a);
+}
+
+#define L2_PAGETABLE_SHIFT 22
+unsigned long alloc_xen_mmio(unsigned long len);
+
+int gnttab_init(void);
+void evtchn_init(void);
+void ctrl_if_init(void);
+
+void xen_machphys_update(unsigned long mfn, unsigned long pfn);
+int xen_do_init(void);
+
+void setup_xen_features(void);
+
+#endif
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/evtchn.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/evtchn.c	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,200 @@
+/******************************************************************************
+ * evtchn.c
+ * 
+ * A simplified event channel for para-drivers in unmodified linux
+ * 
+ * Copyright (c) 2002-2005, K A Fraser
+ * Copyright (c) 2005, <xiaofeng.ling@intel.com>
+ * 
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <xen/evtchn.h>
+#include <xen/interface/hvm/ioreq.h>
+#include "evtchn-pci.h"
+
+void *hypercall_page;
+
+#define cpu_from_evtchn(port) (0)
+#define MAX_EVTCHN 256
+static struct
+{
+	irqreturn_t(*handler) (int, void *, struct pt_regs *);
+	void *dev_id;
+} evtchns[MAX_EVTCHN];
+
+void mask_evtchn(int port)
+{
+	shared_info_t *s = HYPERVISOR_shared_info;
+	synch_set_bit(port, &s->evtchn_mask[0]);
+}
+EXPORT_SYMBOL(mask_evtchn);
+
+void unmask_evtchn(int port)
+{
+	shared_info_t *s = HYPERVISOR_shared_info;
+	unsigned int cpu = smp_processor_id();
+	vcpu_info_t *vcpu_info = &s->vcpu_info[cpu];
+
+	/* Slow path (hypercall) if this is a non-local port. */
+	if (unlikely(cpu != cpu_from_evtchn(port))) {
+		evtchn_unmask_t op = { .port = port };
+		(void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask,
+						  &op);
+		return;
+	}
+
+	synch_clear_bit(port, &s->evtchn_mask[0]);
+
+	/*
+	 * The following is basically the equivalent of 'hw_resend_irq'. Just
+	 * like a real IO-APIC we 'lose the interrupt edge' if the channel is
+	 * masked.
+	 */
+	if (synch_test_bit(port, &s->evtchn_pending[0]) && 
+	    !synch_test_and_set_bit(port / BITS_PER_LONG,
+				    &vcpu_info->evtchn_pending_sel)) {
+		vcpu_info->evtchn_upcall_pending = 1;
+		if (!vcpu_info->evtchn_upcall_mask)
+			force_evtchn_callback();
+	}
+}
+EXPORT_SYMBOL(unmask_evtchn);
+
+unsigned int bind_virq_to_evtchn(int virq)
+{
+	evtchn_bind_virq_t op;
+
+	op.virq = virq;
+	op.vcpu = 0;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq, &op) != 0)
+		BUG();
+
+	return op.port;
+}
+
+int
+bind_evtchn_to_irqhandler(unsigned int evtchn,
+			  irqreturn_t(*handler) (int, void *,
+						 struct pt_regs *),
+			  unsigned long irqflags, const char *devname,
+			  void *dev_id)
+{
+	if (evtchn >= MAX_EVTCHN)
+		return -EINVAL;
+	evtchns[evtchn].handler = handler;
+	evtchns[evtchn].dev_id = dev_id;
+	unmask_evtchn(evtchn);
+	return evtchn;
+}
+
+EXPORT_SYMBOL(bind_evtchn_to_irqhandler);
+
+void unbind_from_irqhandler(unsigned int evtchn, void *dev_id)
+{
+	if (evtchn >= MAX_EVTCHN)
+		return;
+
+	mask_evtchn(evtchn);
+	evtchns[evtchn].handler = NULL;
+}
+
+EXPORT_SYMBOL(unbind_from_irqhandler);
+
+void notify_remote_via_irq(int irq)
+{
+	int evtchn = irq;
+	notify_remote_via_evtchn(evtchn);
+}
+
+EXPORT_SYMBOL(notify_remote_via_irq);
+
+void unbind_evtchn_from_irq(unsigned int evtchn)
+{
+	return;
+}
+
+EXPORT_SYMBOL(unbind_evtchn_from_irq);
+
+#define active_evtchns(cpu,sh,idx)		\
+	((sh)->evtchn_pending[idx] &		\
+	 ~(sh)->evtchn_mask[idx])
+
+irqreturn_t evtchn_interrupt(int irq, void *dev_id, struct pt_regs *regs)
+{
+	unsigned long l1, l2;
+	unsigned int l1i, l2i, port;
+	int cpu = smp_processor_id();
+	irqreturn_t(*handler) (int, void *, struct pt_regs *);
+	shared_info_t *s = HYPERVISOR_shared_info;
+	vcpu_info_t *vcpu_info = &s->vcpu_info[cpu];
+
+	vcpu_info->evtchn_upcall_pending = 0;
+
+	/* NB. No need for a barrier here -- XCHG is a barrier on x86. */
+	l1 = xchg(&vcpu_info->evtchn_pending_sel, 0);
+	while (l1 != 0)
+	{
+		l1i = __ffs(l1);
+		l1 &= ~(1 << l1i);
+
+		while ((l2 = active_evtchns(cpu, s, l1i)) != 0)
+		{
+			l2i = __ffs(l2);
+
+			port = (l1i * BITS_PER_LONG) + l2i;
+
+			if ((handler = evtchns[port].handler) != NULL)
+			{
+				clear_evtchn(port);
+				handler(port, evtchns[port].dev_id, regs);
+			}
+			else
+			{
+				evtchn_device_upcall(port);
+			}
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+void force_evtchn_callback(void)
+{
+	evtchn_interrupt(0, NULL, NULL);
+}
+
+EXPORT_SYMBOL(force_evtchn_callback);
+
+void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu)
+{
+}
+
+void __init evtchn_init(void)
+{
+
+}
+
+EXPORT_SYMBOL(hypercall_page);
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/xen_support.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/xen_support.c	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,53 @@
+/******************************************************************************
+ * support.c
+ * Xen module support functions.
+ * Copyright (C) 2004, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <xen/evtchn.h>
+#include <xen/interface/xen.h>
+#include <asm/hypervisor.h>
+#include "evtchn-pci.h"
+
+shared_info_t *HYPERVISOR_shared_info = NULL;
+EXPORT_SYMBOL(HYPERVISOR_shared_info); 
+
+EXPORT_SYMBOL(xen_machphys_update);
+void xen_machphys_update(unsigned long mfn, unsigned long pfn)
+{
+    mmu_update_t u;
+    u.ptr = (mfn << PAGE_SHIFT) | MMU_MACHPHYS_UPDATE;
+    u.val = pfn;
+    BUG_ON(HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0);
+}
+
+void balloon_update_driver_allowance(long delta)
+{
+}
+
+EXPORT_SYMBOL(balloon_update_driver_allowance);
+
+void evtchn_device_upcall(int port)
+{
+	printk("Error,no device upcall in guest domain (%d)!\n", port);
+	clear_evtchn(port);
+}
+
+EXPORT_SYMBOL (evtchn_device_upcall);
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/mkbuildtree
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/mkbuildtree	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,35 @@
+#! /bin/sh
+
+C=$PWD
+
+XEN=$C/../../xen
+XL=$C/../../linux-2.6-xen-sparse
+
+for d in $(find ${XL}/drivers/xen/ -type d -maxdepth 1 | sed -e 1d); do
+    if ! echo $d | egrep -q back; then
+        lndir $d $(basename $d) > /dev/null 2>&1
+    fi
+done
+
+ln -sf ${XL}/drivers/xen/net_driver_util.c netfront
+
+ln -sf ${XL}/drivers/xen/core/gnttab.c evtchn-pci
+ln -sf ${XL}/drivers/xen/core/features.c evtchn-pci
+ln -sf ${XL}/drivers/xen/core/xen_proc.c evtchn-pci
+
+mkdir -p include
+mkdir -p include/xen
+mkdir -p include/public
+mkdir -p include/asm
+
+lndir -silent ${XL}/include/xen include/xen
+ln -sf ${XEN}/include/public include/xen/interface
+
+# Need to be quite careful here: we don't want the files we link in to
+# risk overriding the native Linux ones (in particular, system.h must
+# be native and not xenolinux).
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/hypervisor.h include/asm
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/hypercall.h include/asm
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/synch_bitops.h include/asm
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/maddr.h include/asm
+
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/netfront/Kbuild
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/netfront/Kbuild	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,4 @@
+include $(M)/overrides.mk
+
+obj-m  = xen-vnif.o
+xen-vnif-objs	:= netfront.o
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/overrides.mk
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/overrides.mk	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,16 @@
+# Hack: we need to use the config which was used to build the kernel,
+# except that that won't have the right headers etc., so duplicate
+# some of the mach-xen infrastructure in here.
+#
+# (i.e. we need the native config for things like -mregparm, but
+# a Xen kernel to find the right headers)
+CONFIG_X86_XEN=y
+CONFIG_XEN_EVTCHN_PCI = m
+CONFIG_XEN_BLKDEV_FRONTEND	= m
+CONFIG_XEN_NETDEV_FRONTEND	= m
+EXTRA_CFLAGS += -DCONFIG_VMX -DCONFIG_VMX_GUEST -DCONFIG_X86_XEN
+EXTRA_CFLAGS += -DCONFIG_XEN_SHADOW_MODE -DCONFIG_XEN_SHADOW_TRANSLATE
+EXTRA_CFLAGS += -DCONFIG_XEN_BLKDEV_GRANT -DXEN_EVTCHN_MASK_OPS
+EXTRA_CFLAGS += -DCONFIG_XEN_NETDEV_GRANT_RX -DCONFIG_XEN_NETDEV_GRANT_TX
+EXTRA_CFLAGS += -D__XEN_INTERFACE_VERSION__=0x00030202
+EXTRA_CFLAGS += -I$(M)/include
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/xenbus/Kbuild
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/xenbus/Kbuild	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,9 @@
+include $(M)/overrides.mk
+
+obj-m	+= xenbus.o
+xenbus-objs =
+xenbus-objs += xenbus_comms.o
+xenbus-objs += xenbus_xs.o
+xenbus-objs += xenbus_probe.o 
+xenbus-objs += xenbus_dev.o 
+xenbus-objs += xenbus_client.o 

[-- Attachment #1.1.4: hvm_xen_unstable.diff --]
[-- Type: text/plain, Size: 76134 bytes --]

diff -r ecb8ff1fcf1f linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c
--- a/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c	Tue Jul 18 13:43:27 2006 +0100
@@ -270,6 +270,7 @@ static int __init privcmd_init(void)
 	set_bit(__HYPERVISOR_sched_op_compat,  hypercall_permission_map);
 	set_bit(__HYPERVISOR_event_channel_op_compat,
 		hypercall_permission_map);
+	set_bit(__HYPERVISOR_hvm_op,           hypercall_permission_map);
 
 	privcmd_intf = create_xen_proc_entry("privcmd", 0400);
 	if (privcmd_intf != NULL)
diff -r ecb8ff1fcf1f tools/firmware/hvmloader/hvmloader.c
--- a/tools/firmware/hvmloader/hvmloader.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/firmware/hvmloader/hvmloader.c	Tue Jul 18 13:43:27 2006 +0100
@@ -31,7 +31,7 @@
 #define	ROMBIOS_PHYSICAL_ADDRESS	0x000F0000
 
 /* invoke SVM's paged realmode support */
-#define SVM_VMMCALL_RESET_TO_REALMODE	0x00000001
+#define SVM_VMMCALL_RESET_TO_REALMODE	0x80000001
 
 /*
  * C runtime start off
@@ -133,15 +133,15 @@ cirrus_check(void)
 	return inb(0x3C5) == 0x12;
 }
 
-int 
-vmmcall(int edi, int esi, int edx, int ecx, int ebx)
+int
+vmmcall(int function, int edi, int esi, int edx, int ecx, int ebx)
 {
         int eax;
 
         __asm__ __volatile__(
 		".byte 0x0F,0x01,0xD9"
                 : "=a" (eax)
-		: "a"(0x58454E00), /* XEN\0 key */
+		: "a"(function),
 		  "b"(ebx), "c"(ecx), "d"(edx), "D"(edi), "S"(esi)
 	);
         return eax;
@@ -200,7 +200,7 @@ main(void)
 	if (check_amd()) {
 		/* AMD implies this is SVM */
                 puts("SVM go ...\n");
-                vmmcall(SVM_VMMCALL_RESET_TO_REALMODE, 0, 0, 0, 0);
+                vmmcall(SVM_VMMCALL_RESET_TO_REALMODE, 0, 0, 0, 0, 0);
 	} else {
 		puts("Loading VMXAssist ...\n");
 		memcpy((void *)VMXASSIST_PHYSICAL_ADDRESS,
diff -r ecb8ff1fcf1f tools/ioemu/Makefile.target
--- a/tools/ioemu/Makefile.target	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/ioemu/Makefile.target	Tue Jul 18 13:43:27 2006 +0100
@@ -336,6 +336,7 @@ VL_OBJS+= fdc.o mc146818rtc.o serial.o p
 VL_OBJS+= fdc.o mc146818rtc.o serial.o pc.o
 VL_OBJS+= cirrus_vga.o mixeng.o parallel.o
 VL_OBJS+= piix4acpi.o
+VL_OBJS+= xen_evtchn.o
 DEFINES += -DHAS_AUDIO
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
diff -r ecb8ff1fcf1f tools/ioemu/hw/pc.c
--- a/tools/ioemu/hw/pc.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/ioemu/hw/pc.c	Tue Jul 18 13:43:27 2006 +0100
@@ -819,6 +819,9 @@ static void pc_init1(uint64_t ram_size, 
     }
 #endif /* !CONFIG_DM */
 
+    if (pci_enabled)
+	pci_xen_evtchn_init(pci_bus);
+
     for(i = 0; i < MAX_SERIAL_PORTS; i++) {
         if (serial_hds[i]) {
             serial_init(&pic_set_irq_new, isa_pic,
diff -r ecb8ff1fcf1f tools/ioemu/target-i386-dm/helper2.c
--- a/tools/ioemu/target-i386-dm/helper2.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/ioemu/target-i386-dm/helper2.c	Tue Jul 18 13:43:27 2006 +0100
@@ -82,6 +82,10 @@ int xce_handle = -1;
 /* which vcpu we are serving */
 int send_vcpu = 0;
 
+//the evtchn port for polling the notification,
+#define NR_CPUS 32
+evtchn_port_t ioreq_local_port[NR_CPUS];
+
 CPUX86State *cpu_x86_init(void)
 {
     CPUX86State *env;
@@ -105,15 +109,14 @@ CPUX86State *cpu_x86_init(void)
             return NULL;
         }
 
-        /* FIXME: how about if we overflow the page here? */
         for (i = 0; i < vcpus; i++) {
-            rc = xc_evtchn_bind_interdomain(
-                xce_handle, domid, shared_page->vcpu_iodata[i].vp_eport);
+	    rc = xc_evtchn_bind_interdomain(xce_handle, DOMID_XEN,
+			     shared_page->vcpu_iodata[i].vp_xen_port);
             if (rc == -1) {
                 fprintf(logfile, "bind interdomain ioctl error %d\n", errno);
                 return NULL;
             }
-            shared_page->vcpu_iodata[i].dm_eport = rc;
+	    ioreq_local_port[i] = rc;
         }
     }
 
@@ -184,10 +187,9 @@ void sp_info()
 
     for (i = 0; i < vcpus; i++) {
         req = &(shared_page->vcpu_iodata[i].vp_ioreq);
-        term_printf("vcpu %d: event port %d\n", i,
-                    shared_page->vcpu_iodata[i].vp_eport);
+        term_printf("vcpu %d: event port %d\n", i, ioreq_local_port[i]);
         term_printf("  req state: %x, pvalid: %x, addr: %"PRIx64", "
-                    "data: %"PRIx64", count: %"PRIx64", size: %"PRIx64"\n",
+                    "data: %"PRIx64",  count: %"PRIx64", size: %"PRIx64"\n",
                     req->state, req->pdata_valid, req->addr,
                     req->u.data, req->count, req->size);
         term_printf("  IO totally occurred on this vcpu: %"PRIx64"\n",
@@ -201,17 +203,12 @@ static ioreq_t *__cpu_get_ioreq(int vcpu
     ioreq_t *req;
 
     req = &(shared_page->vcpu_iodata[vcpu].vp_ioreq);
-
     if (req->state == STATE_IOREQ_READY) {
-        req->state = STATE_IOREQ_INPROCESS;
-        return req;
-    }
-
-    fprintf(logfile, "False I/O request ... in-service already: "
-            "%x, pvalid: %x, port: %"PRIx64", "
-            "data: %"PRIx64", count: %"PRIx64", size: %"PRIx64"\n",
-            req->state, req->pdata_valid, req->addr,
-            req->u.data, req->count, req->size);
+	req->state = STATE_IOREQ_INPROCESS;
+	rmb();
+	return req;
+    }
+
     return NULL;
 }
 
@@ -226,7 +223,7 @@ static ioreq_t *cpu_get_ioreq(void)
     port = xc_evtchn_pending(xce_handle);
     if (port != -1) {
         for ( i = 0; i < vcpus; i++ )
-            if ( shared_page->vcpu_iodata[i].dm_eport == port )
+            if ( ioreq_local_port[i] == port )
                 break;
 
         if ( i == vcpus ) {
@@ -447,8 +444,10 @@ void cpu_handle_ioreq(void *opaque)
         }
 
         /* No state change if state = STATE_IORESP_HOOK */
-        if (req->state == STATE_IOREQ_INPROCESS)
+        if (req->state == STATE_IOREQ_INPROCESS) {
+	    mb();
             req->state = STATE_IORESP_READY;
+	}
         env->send_event = 1;
     }
 }
@@ -479,8 +478,7 @@ int main_loop(void)
 
         if (env->send_event) {
             env->send_event = 0;
-            xc_evtchn_notify(xce_handle,
-                             shared_page->vcpu_iodata[send_vcpu].dm_eport);
+            (void)xc_evtchn_notify(xce_handle, ioreq_local_port[send_vcpu]);
         }
     }
     destroy_hvm_domain();
diff -r ecb8ff1fcf1f tools/libxc/xc_hvm_build.c
--- a/tools/libxc/xc_hvm_build.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/libxc/xc_hvm_build.c	Tue Jul 18 13:43:27 2006 +0100
@@ -6,12 +6,14 @@
 #include <stddef.h>
 #include <inttypes.h>
 #include "xg_private.h"
+#include "xc_private.h"
 #include "xc_elf.h"
 #include <stdlib.h>
 #include <unistd.h>
 #include <zlib.h>
 #include <xen/hvm/hvm_info_table.h>
 #include <xen/hvm/ioreq.h>
+#include <xen/hvm/params.h>
 
 #define HVM_LOADER_ENTR_ADDR  0x00100000
 
@@ -52,6 +54,30 @@ loadelfimage(
     char *elfbase, int xch, uint32_t dom, unsigned long *parray,
     struct domain_setup_info *dsi);
 
+static void xc_set_hvm_param(int handle,
+                             domid_t dom, int param, unsigned long value)
+{
+    DECLARE_HYPERCALL;
+    xen_hvm_param_t arg;
+    int rc;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_set_param;
+    hypercall.arg[1] = (unsigned long)&arg;
+    arg.domid = dom;
+    arg.index = param;
+    arg.value = value;
+    if ( mlock(&arg, sizeof(arg)) != 0 )
+    {
+        PERROR("Could not lock memory for set parameter");
+        return;
+    }
+    rc = do_xen_hypercall(handle, &hypercall);
+    safe_munlock(&arg, sizeof(arg));
+    if (rc < 0)
+        PERROR("set HVM parameter failed (%d)", rc);
+}
+
 static unsigned char build_e820map(void *e820_page, unsigned long long mem_size)
 {
     struct e820entry *e820entry =
@@ -162,6 +188,8 @@ static int set_hvm_info(int xc_handle, u
     set_hvm_info_checksum(va_hvm);
 
     munmap(va_map, PAGE_SIZE);
+
+    xc_set_hvm_param(xc_handle, dom, HVM_PARAM_APIC_ENABLED, apic);
 
     return 0;
 }
@@ -275,27 +303,17 @@ static int setup_guest(int xc_handle,
         shared_info->vcpu_info[i].evtchn_upcall_mask = 1;
     munmap(shared_info, PAGE_SIZE);
 
-    /* Populate the event channel port in the shared page */
+    /* Paranoia */
     shared_page_frame = page_array[(v_end >> PAGE_SHIFT) - 1];
     if ( (sp = (shared_iopage_t *) xc_map_foreign_range(
               xc_handle, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               shared_page_frame)) == 0 )
         goto error_out;
     memset(sp, 0, PAGE_SIZE);
-
-    /* FIXME: how about if we overflow the page here? */
-    for ( i = 0; i < vcpus; i++ ) {
-        unsigned int vp_eport;
-
-        vp_eport = xc_evtchn_alloc_unbound(xc_handle, dom, 0);
-        if ( vp_eport < 0 ) {
-            PERROR("Couldn't get unbound port from VMX guest.\n");
-            goto error_out;
-        }
-        sp->vcpu_iodata[i].vp_eport = vp_eport;
-    }
-
     munmap(sp, PAGE_SIZE);
+
+    xc_set_hvm_param(xc_handle, dom, HVM_PARAM_STORE_PFN, (v_end >> PAGE_SHIFT) - 2);
+    xc_set_hvm_param(xc_handle, dom, HVM_PARAM_STORE_EVTCHN, store_evtchn);
 
     *store_mfn = page_array[(v_end >> PAGE_SHIFT) - 2];
     if ( xc_clear_domain_page(xc_handle, dom, *store_mfn) )
diff -r ecb8ff1fcf1f xen/arch/x86/dom0_ops.c
--- a/xen/arch/x86/dom0_ops.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/dom0_ops.c	Tue Jul 18 13:43:27 2006 +0100
@@ -429,7 +429,7 @@ long arch_do_dom0_op(struct dom0_op *op,
         ret = 0;
 
         hypercall_page = map_domain_page(mfn);
-        hypercall_page_initialise(hypercall_page);
+        hypercall_page_initialise(d, hypercall_page);
         unmap_domain_page(hypercall_page);
 
         put_page_and_type(mfn_to_page(mfn));
diff -r ecb8ff1fcf1f xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/domain.c	Tue Jul 18 13:43:27 2006 +0100
@@ -819,7 +819,7 @@ unsigned long hypercall_create_continuat
 #if defined(__i386__)
         regs->eax  = op;
 
-        if ( supervisor_mode_kernel )
+        if ( supervisor_mode_kernel || hvm_guest(current) )
             regs->eip &= ~31; /* re-execute entire hypercall entry stub */
         else
             regs->eip -= 2;   /* re-execute 'int 0x82' */
diff -r ecb8ff1fcf1f xen/arch/x86/domain_build.c
--- a/xen/arch/x86/domain_build.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/domain_build.c	Tue Jul 18 13:43:27 2006 +0100
@@ -704,7 +704,7 @@ int construct_dom0(struct domain *d,
             return -1;
         }
 
-        hypercall_page_initialise((void *)hypercall_page);
+        hypercall_page_initialise(d, (void *)hypercall_page);
     }
 
     /* Copy the initial ramdisk. */
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/hvm.c	Tue Jul 18 13:43:27 2006 +0100
@@ -45,6 +45,9 @@
 #include <public/sched.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/hvm_info_table.h>
+#include <xen/event.h>
+#include <xen/hypercall.h>
+#include <xen/guest_access.h>
 
 int hvm_enabled = 0;
 
@@ -58,6 +61,8 @@ static void hvm_zap_mmio_range(
 {
     unsigned long i, val = INVALID_MFN;
 
+    ASSERT(d == current->domain);
+
     for ( i = 0; i < nr_pfn; i++ )
     {
         if ( pfn + i >= 0xfffff )
@@ -67,18 +72,27 @@ static void hvm_zap_mmio_range(
     }
 }
 
-static void hvm_map_io_shared_page(struct domain *d)
+static void e820_zap_iommu_callback(struct domain *d,
+                                    struct e820entry *e,
+                                    void *ign)
+{
+    if ( e->type == E820_IO )
+        hvm_zap_mmio_range(d, e->addr >> PAGE_SHIFT, e->size >> PAGE_SHIFT);
+}
+
+static void e820_foreach(struct domain *d,
+                         void (*cb)(struct domain *d,
+                                    struct e820entry *e,
+                                    void *data),
+                         void *data)
 {
     int i;
     unsigned char e820_map_nr;
     struct e820entry *e820entry;
     unsigned char *p;
     unsigned long mfn;
-    unsigned long gpfn = 0;
-
-    local_flush_tlb_pge();
-
-    mfn = get_mfn_from_gpfn(E820_MAP_PAGE >> PAGE_SHIFT);
+
+    mfn = gmfn_to_mfn(d, E820_MAP_PAGE >> PAGE_SHIFT);
     if (mfn == INVALID_MFN) {
         printk("Can not find E820 memory map page for HVM domain.\n");
         domain_crash_synchronous();
@@ -95,26 +109,40 @@ static void hvm_map_io_shared_page(struc
 
     for ( i = 0; i < e820_map_nr; i++ )
     {
-        if ( e820entry[i].type == E820_SHARED_PAGE )
-            gpfn = (e820entry[i].addr >> PAGE_SHIFT);
-        if ( e820entry[i].type == E820_IO )
-            hvm_zap_mmio_range(
-                d, 
-                e820entry[i].addr >> PAGE_SHIFT,
-                e820entry[i].size >> PAGE_SHIFT);
-    }
-
-    if ( gpfn == 0 ) {
-        printk("Can not get io request shared page"
-               " from E820 memory map for HVM domain.\n");
-        unmap_domain_page(p);
-        domain_crash_synchronous();
-    }
+        cb(d, e820entry + i, data);
+    }
+
     unmap_domain_page(p);
-
-    /* Initialise shared page */
-    mfn = get_mfn_from_gpfn(gpfn);
-    if (mfn == INVALID_MFN) {
+}
+
+static void hvm_zap_iommu_pages(struct domain *d)
+{
+    e820_foreach(d, e820_zap_iommu_callback, NULL);
+}
+
+static void e820_map_io_shared_callback(struct domain *d,
+                                        struct e820entry *e,
+                                        void *data)
+{
+    unsigned long *mfn = data;
+    if ( e->type == E820_SHARED_PAGE ) {
+        ASSERT(*mfn == INVALID_MFN);
+        *mfn = gmfn_to_mfn(d, e->addr >> PAGE_SHIFT);
+    }
+}
+
+void hvm_map_io_shared_page(struct vcpu *v)
+{
+    unsigned long mfn = INVALID_MFN;
+    void *p;
+    struct domain *d = v->domain;
+
+    if ( d->arch.hvm_domain.shared_page_va )
+        return;
+
+    e820_foreach(d, e820_map_io_shared_callback, &mfn);
+
+    if ( mfn == INVALID_MFN ) {
         printk("Can not find io request shared page for HVM domain.\n");
         domain_crash_synchronous();
     }
@@ -127,59 +155,20 @@ static void hvm_map_io_shared_page(struc
     d->arch.hvm_domain.shared_page_va = (unsigned long)p;
 }
 
-static int validate_hvm_info(struct hvm_info_table *t)
-{
-    char signature[] = "HVM INFO";
-    uint8_t *ptr = (uint8_t *)t;
-    uint8_t sum = 0;
-    int i;
-
-    /* strncmp(t->signature, "HVM INFO", 8) */
-    for ( i = 0; i < 8; i++ ) {
-        if ( signature[i] != t->signature[i] ) {
-            printk("Bad hvm info signature\n");
-            return 0;
-        }
-    }
-
-    for ( i = 0; i < t->length; i++ )
-        sum += ptr[i];
-
-    return (sum == 0);
-}
-
-static void hvm_get_info(struct domain *d)
-{
-    unsigned char *p;
-    unsigned long mfn;
-    struct hvm_info_table *t;
-
-    mfn = get_mfn_from_gpfn(HVM_INFO_PFN);
-    if ( mfn == INVALID_MFN ) {
-        printk("Can not get info page mfn for HVM domain.\n");
-        domain_crash_synchronous();
-    }
-
-    p = map_domain_page(mfn);
-    if ( p == NULL ) {
-        printk("Can not map info page for HVM domain.\n");
-        domain_crash_synchronous();
-    }
-
-    t = (struct hvm_info_table *)(p + HVM_INFO_OFFSET);
-
-    if ( validate_hvm_info(t) ) {
-        d->arch.hvm_domain.nr_vcpus = t->nr_vcpus;
-        d->arch.hvm_domain.apic_enabled = t->apic_enabled;
-        d->arch.hvm_domain.pae_enabled = t->pae_enabled;
-    } else {
-        printk("Bad hvm info table\n");
-        d->arch.hvm_domain.nr_vcpus = 1;
-        d->arch.hvm_domain.apic_enabled = 0;
-        d->arch.hvm_domain.pae_enabled = 0;
-    }
-
-    unmap_domain_page(p);
+static void evtchn_callback_func(void *v)
+{
+    hvm_assist_complete(v);
+}
+
+void hvm_create_event_channels(struct vcpu *v)
+{
+    vcpu_iodata_t *p;
+    p = get_vio(v->domain, v->vcpu_id);
+    v->arch.hvm_vcpu.xen_port = p->vp_xen_port =
+        alloc_xen_event_channel(evtchn_callback_func,
+                                v,
+                                dom0);
+    DPRINTK("Allocated port %d for hvm.\n", v->arch.hvm_vcpu.xen_port);
 }
 
 void hvm_setup_platform(struct domain* d)
@@ -196,8 +185,7 @@ void hvm_setup_platform(struct domain* d
         domain_crash_synchronous();
     }
 
-    hvm_map_io_shared_page(d);
-    hvm_get_info(d);
+    hvm_zap_iommu_pages(d);
 
     platform = &d->arch.hvm_domain;
     pic_init(&platform->vpic, pic_irq_request, &platform->interrupt_request);
@@ -329,6 +317,59 @@ void hvm_print_line(struct vcpu *v, cons
 	pbuf[(*index)++] = c;
 }
 
+void hvm_release_assist_channel(struct vcpu *v)
+{
+    release_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+}
+
+#if defined(__i386__)
+typedef unsigned long hvm_hypercall_handler(unsigned long, unsigned long,
+                                            unsigned long, unsigned long,
+                                            unsigned long);
+#define HYPERCALL(x) [ __HYPERVISOR_ ## x ] = (hvm_hypercall_handler *) do_ ## x
+static hvm_hypercall_handler *hvm_hypercall_table[] = {
+    HYPERCALL(mmu_update),
+    HYPERCALL(memory_op),
+    HYPERCALL(multicall),
+    HYPERCALL(update_va_mapping),
+    HYPERCALL(event_channel_op_compat),
+    HYPERCALL(xen_version),
+    HYPERCALL(grant_table_op),
+    HYPERCALL(event_channel_op),
+    HYPERCALL(hvm_op)
+};
+#undef HYPERCALL
+
+void hvm_do_hypercall(struct cpu_user_regs *pregs)
+{
+    if (pregs->eax > ARRAY_SIZE(hvm_hypercall_table) ||
+        !hvm_hypercall_table[pregs->eax]) {
+        DPRINTK("HVM vcpu %d:%d did a bad hypercall %d.\n",
+                current->domain->domain_id, current->vcpu_id,
+                pregs->eax);
+        pregs->eax = -ENOSYS;
+    } else {
+        pregs->eax = hvm_hypercall_table[pregs->eax](pregs->ebx, pregs->ecx,
+                                                     pregs->edx, pregs->esi,
+                                                     pregs->edi);
+    }
+}
+#else
+void hvm_do_hypercall(struct cpu_user_regs *pregs)
+{
+    printk("not supported yet!\n");
+}
+#endif
+
+/* Initialise a hypercall transfer page for a VMX domain using
+   paravirtualised drivers. */
+void hvm_hypercall_page_initialise(struct domain *d,
+                                   void *hypercall_page)
+{
+    hvm_funcs.init_hypercall_page(d, hypercall_page);
+}
+
+
 /*
  * only called in HVM domain BSP context
  * when booting, vcpuid is always equal to apic_id
@@ -372,6 +413,57 @@ int hvm_bringup_ap(int vcpuid, int tramp
 
     xfree(ctxt);
 
+    return rc;
+}
+
+long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg)
+
+{
+    long rc = 0;
+
+    switch (op)
+    {
+    case HVMOP_set_param:
+    case HVMOP_get_param:
+    {
+        struct xen_hvm_param a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        if ( a.index < 0 || a.index > HVM_NR_PARAMS ) {
+            return -EINVAL;
+        }
+
+        if ( a.domid == DOMID_SELF ) {
+            get_knownalive_domain(current->domain);
+            d = current->domain;
+        } else if ( IS_PRIV(current->domain) ) {
+            d = find_domain_by_id(a.domid);
+            if ( !d ) {
+                return -ESRCH;
+            }
+        } else {
+            return -EPERM;
+        }
+
+        if ( op == HVMOP_set_param ) {
+            rc = 0;
+            d->arch.hvm_domain.params[a.index] = a.value;
+        } else {
+            rc = d->arch.hvm_domain.params[a.index];
+        }
+
+        put_domain(d);
+        return rc;
+    }
+    default:
+    {
+        DPRINTK("Bad HVM op %ld.\n", op);
+        rc = -EINVAL;
+    }
+    }
     return rc;
 }
 
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/intercept.c
--- a/xen/arch/x86/hvm/intercept.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/intercept.c	Tue Jul 18 13:43:27 2006 +0100
@@ -211,7 +211,7 @@ void hlt_timer_fn(void *data)
 {
     struct vcpu *v = data;
 
-    evtchn_set_pending(v, iopacket_port(v));
+    hvm_prod_vcpu(v);
 }
 
 static __inline__ void missed_ticks(struct periodic_time *pt)
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/io.c	Tue Jul 18 13:43:27 2006 +0100
@@ -687,85 +687,18 @@ void hvm_io_assist(struct vcpu *v)
 
     p = &vio->vp_ioreq;
 
-    /* clear IO wait HVM flag */
-    if ( test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) ) {
-        if ( p->state == STATE_IORESP_READY ) {
-            p->state = STATE_INVALID;
-            clear_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
-
-            if ( p->type == IOREQ_TYPE_PIO )
-                hvm_pio_assist(regs, p, io_opp);
-            else {
-                hvm_mmio_assist(regs, p, io_opp);
-                hvm_load_cpu_guest_regs(v, regs);
-            }
-
-            /* Copy register changes back into current guest state. */
-            memcpy(guest_cpu_user_regs(), regs, HVM_CONTEXT_STACK_BYTES);
-        }
-        /* else an interrupt send event raced us */
-    }
-}
-
-/*
- * On exit from hvm_wait_io, we're guaranteed not to be waiting on
- * I/O response from the device model.
- */
-void hvm_wait_io(void)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    int port = iopacket_port(v);
-
-    for ( ; ; )
-    {
-        /* Clear master flag, selector flag, event flag each in turn. */
-        v->vcpu_info->evtchn_upcall_pending = 0;
-        clear_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
-        smp_mb__after_clear_bit();
-        if ( test_and_clear_bit(port, &d->shared_info->evtchn_pending[0]) )
-            hvm_io_assist(v);
-
-        /* Need to wait for I/O responses? */
-        if ( !test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) )
-            break;
-
-        do_sched_op_compat(SCHEDOP_block, 0);
-    }
-
-    /*
-     * Re-set the selector and master flags in case any other notifications
-     * are pending.
-     */
-    if ( d->shared_info->evtchn_pending[port/BITS_PER_LONG] )
-        set_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
-    if ( v->vcpu_info->evtchn_pending_sel )
-        v->vcpu_info->evtchn_upcall_pending = 1;
-}
-
-void hvm_safe_block(void)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    int port = iopacket_port(v);
-
-    for ( ; ; )
-    {
-        /* Clear master flag & selector flag so we will wake from block. */
-        v->vcpu_info->evtchn_upcall_pending = 0;
-        clear_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
-        smp_mb__after_clear_bit();
-
-        /* Event pending already? */
-        if ( test_bit(port, &d->shared_info->evtchn_pending[0]) )
-            break;
-
-        do_sched_op_compat(SCHEDOP_block, 0);
-    }
-
-    /* Reflect pending event in selector and master flags. */
-    set_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
-    v->vcpu_info->evtchn_upcall_pending = 1;
+    if (p->state == STATE_IORESP_READY) {
+        p->state = STATE_INVALID;
+        if (p->type == IOREQ_TYPE_PIO)
+            hvm_pio_assist(regs, p, io_opp);
+        else {
+            hvm_mmio_assist(regs, p, io_opp);
+            hvm_load_cpu_guest_regs(v, regs);
+        }
+
+        /* Copy register changes back into current guest state. */
+        memcpy(guest_cpu_user_regs(), regs, HVM_CONTEXT_STACK_BYTES);
+    }
 }
 
 /*
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/platform.c
--- a/xen/arch/x86/hvm/platform.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/platform.c	Tue Jul 18 13:43:27 2006 +0100
@@ -669,6 +669,37 @@ int inst_copy_from_guest(unsigned char *
     return inst_len;
 }
 
+static void hvm_send_assist_req(struct vcpu *v)
+{
+    ioreq_t *p;
+
+    ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
+    spin_lock(&v->pause_lock);
+    if ( v->pause_count++ == 0 )
+        set_bit(_VCPUF_paused, &v->vcpu_flags);
+    spin_unlock(&v->pause_lock);
+    set_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
+    mb();
+    p = &get_vio(v->domain, v->vcpu_id)->vp_ioreq;
+    if (unlikely(p->state != STATE_INVALID)) {
+        /* This indicates a bug in the device model.  Crash the
+           domain. */
+        printf("Device model set bad IO state %d.\n", p->state);
+        domain_crash(v->domain);
+        return;
+    }
+    vcpu_sleep_nosync(v);
+    wmb();
+    p->state = STATE_IOREQ_READY;
+    notify_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+}
+
+/* Wake up a vcpu whihc is waiting for interrupts to come in */
+void hvm_prod_vcpu(struct vcpu *v)
+{
+    vcpu_unblock(v);
+}
+
 void send_pio_req(struct cpu_user_regs *regs, unsigned long port,
                   unsigned long count, int size, long value, int dir, int pvalid)
 {
@@ -682,13 +713,11 @@ void send_pio_req(struct cpu_user_regs *
         domain_crash_synchronous();
     }
 
-    if (test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags)) {
-        printf("HVM I/O has not yet completed\n");
-        domain_crash_synchronous();
-    }
-    set_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
-
     p = &vio->vp_ioreq;
+    if (p->state != STATE_INVALID) {
+        printf("WARNING: send pio with something already pending (%d)?\n",
+               p->state);
+    }
     p->dir = dir;
     p->pdata_valid = pvalid;
 
@@ -714,15 +743,11 @@ void send_pio_req(struct cpu_user_regs *
         return;
     }
 
-    p->state = STATE_IOREQ_READY;
-
-    evtchn_send(iopacket_port(v));
-    hvm_wait_io();
-}
-
-void send_mmio_req(
-    unsigned char type, unsigned long gpa,
-    unsigned long count, int size, long value, int dir, int pvalid)
+    hvm_send_assist_req(v);
+}
+
+static void send_mmio_req(unsigned char type, unsigned long gpa,
+                          unsigned long count, int size, long value, int dir, int pvalid)
 {
     struct vcpu *v = current;
     vcpu_iodata_t *vio;
@@ -739,12 +764,10 @@ void send_mmio_req(
 
     p = &vio->vp_ioreq;
 
-    if (test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags)) {
-        printf("HVM I/O has not yet completed\n");
-        domain_crash_synchronous();
-    }
-
-    set_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
+    if (p->state != STATE_INVALID) {
+        printf("WARNING: send pio with something already pending (%d)?\n",
+               p->state);
+    }
     p->dir = dir;
     p->pdata_valid = pvalid;
 
@@ -770,10 +793,7 @@ void send_mmio_req(
         return;
     }
 
-    p->state = STATE_IOREQ_READY;
-
-    evtchn_send(iopacket_port(v));
-    hvm_wait_io();
+    hvm_send_assist_req(v);
 }
 
 static void mmio_operands(int type, unsigned long gpa, struct instruction *inst,
@@ -1035,6 +1055,108 @@ void handle_mmio(unsigned long va, unsig
     }
 }
 
+void hvm_assist_complete(struct vcpu *v)
+{
+    ioreq_t *p;
+    /* The device model just sent an event channel message to us.  Either:
+
+    a) It just finished processing a request, or
+    b) it wants us to send an interrupt into the guest.
+
+    We only need to handle case (b) explicitly if there is no pending
+    IO request from us to the device model (since if there is, we'll
+    pick up the interrupt when the request completes). */
+    p = &get_vio(v->domain, v->vcpu_id)->vp_ioreq;
+    if (p->state == STATE_IORESP_READY) {
+        /* There's a race here, in that the device model could set
+           p->state while we're not looking, but we don't care, since
+           that would imply that *this* notification is not related to
+           that state transition, and so there'll be another one along
+           shortly. */
+        if (test_and_clear_bit(ARCH_HVM_IO_WAIT,
+                               &v->arch.hvm_vcpu.ioflags)) {
+            /* Just completed a wait-for-io, so we can unpause the
+               vcpu.  It'll pick up the response when it returns.  */
+            vcpu_unpause(v);
+            return;
+        } else {
+            /* Someone got in and processed the response before us.
+               Just to be on the safe side, treat this as an interrupt
+               delivery. */
+            /* (the other path implicitly does interrupt delivery as
+               the vcpu returns to the guest) */
+        }
+    }
+
+    /* Evtchn message must have been for interrupt delivery. */
+    hvm_prod_vcpu(v);
+    smp_send_event_check_cpu(v->processor);
+}
+
+#define MIN(x,y) ((x)<(y)?(x):(y))
+
+/* Note that copy_{to,from}_user_hvm don't set the A and D bits on
+   PTEs, and require the PTE to be writable even when they're only
+   trying to read from it.  The guest is expected to deal with
+   this. */
+unsigned long copy_to_user_hvm(void *to, const void *from, unsigned len)
+{
+    unsigned long mfn;
+    unsigned long va;
+    void *map;
+    unsigned long off_in_page;
+    unsigned long chunk_size;
+
+    ASSERT(hvm_guest(current));
+    va = (unsigned long)to;
+    off_in_page = va % PAGE_SIZE;
+    while (len != 0) {
+        mfn = gva_to_mfn(va);
+        if (!mfn)
+            break;
+        map = map_domain_page(mfn);
+        if (!map)
+            break;
+        chunk_size = MIN(len, PAGE_SIZE - off_in_page);
+        memcpy(map + off_in_page, from, chunk_size);
+        unmap_domain_page(map);
+        off_in_page = 0;
+        len -= chunk_size;
+        from += chunk_size;
+        va += chunk_size;
+    }
+    return len;
+}
+
+unsigned long copy_from_user_hvm(void *to, const void *from, unsigned len)
+{
+    unsigned long mfn;
+    unsigned long va;
+    void *map;
+    unsigned long off_in_page;
+    unsigned long chunk_size;
+
+    ASSERT(hvm_guest(current));
+    va = (unsigned long)from;
+    off_in_page = va % PAGE_SIZE;
+    while (len != 0) {
+        mfn = gva_to_mfn(va);
+        if (!mfn)
+            break;
+        map = map_domain_page(mfn);
+        if (!map)
+            break;
+        chunk_size = MIN(len, PAGE_SIZE - off_in_page);
+        memcpy(to, map + off_in_page, chunk_size);
+        unmap_domain_page(map);
+        off_in_page = 0;
+        len -= chunk_size;
+        to += chunk_size;
+        va += chunk_size;
+    }
+    return len;
+}
+
 /*
  * Local variables:
  * mode: C
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/svm/svm.c	Tue Jul 18 13:43:27 2006 +0100
@@ -25,6 +25,7 @@
 #include <xen/sched.h>
 #include <xen/irq.h>
 #include <xen/softirq.h>
+#include <xen/hypercall.h>
 #include <asm/current.h>
 #include <asm/io.h>
 #include <asm/shadow.h>
@@ -456,6 +457,28 @@ void svm_init_ap_context(struct vcpu_gue
     ctxt->flags = VGCF_HVM_GUEST;
 }
 
+static void svm_init_hypercall_page(struct domain *d, void *hypercall_page)
+{
+    char *p;
+    int i;
+
+    memset(hypercall_page, 0, PAGE_SIZE);
+
+    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+    {
+        p = (char *)(hypercall_page + (i * 32));
+        *(u8  *)(p + 0) = 0xb8; /* mov imm32, %eax */
+        *(u32 *)(p + 1) = i;
+        *(u8  *)(p + 5) = 0x0f; /* vmmcall */
+        *(u8  *)(p + 6) = 0x01;
+        *(u8  *)(p + 7) = 0xd9;
+        *(u8  *)(p + 8) = 0xc3; /* ret */
+    }
+
+    /* Don't support HYPERVISOR_iret at the moment */
+    *(u16 *)(hypercall_page + (__HYPERVISOR_iret * 32)) = 0x0b0f; /* ud2 */
+}
+
 int start_svm(void)
 {
     u32 eax, ecx, edx;
@@ -503,6 +526,8 @@ int start_svm(void)
     hvm_funcs.instruction_length = svm_instruction_length;
     hvm_funcs.get_guest_ctrl_reg = svm_get_ctrl_reg;
     hvm_funcs.init_ap_context = svm_init_ap_context;
+
+    hvm_funcs.init_hypercall_page = svm_init_hypercall_page;
 
     hvm_enabled = 1;    
 
@@ -2085,7 +2110,7 @@ static inline void svm_vmexit_do_hlt(str
         next_wakeup = next_pit;
     if ( next_wakeup != - 1 )
         set_timer(&current->arch.hvm_svm.hlt_timer, next_wakeup);
-    hvm_safe_block();
+    do_sched_op_compat(SCHEDOP_block, 0);
 }
 
 
@@ -2314,33 +2339,39 @@ static int svm_do_vmmcall(struct vcpu *v
     inst_len = __get_instruction_length(vmcb, INSTR_VMCALL, NULL);
     ASSERT(inst_len > 0);
 
-    /* VMMCALL sanity check */
-    if (vmcb->cpl > get_vmmcall_cpl(regs->edi))
-    {
-        printf("VMMCALL CPL check failed\n");
-        return -1;
-    }
-
-    /* handle the request */
-    switch (regs->edi) 
-    {
-    case VMMCALL_RESET_TO_REALMODE:
-        if (svm_do_vmmcall_reset_to_realmode(v, regs)) 
-        {
-            printf("svm_do_vmmcall_reset_to_realmode() failed\n");
+    if (regs->eax & 0x80000000) {
+        /* VMMCALL sanity check */
+        if (vmcb->cpl > get_vmmcall_cpl(regs->edi))
+        {
+            printf("VMMCALL CPL check failed\n");
             return -1;
         }
-    
-        /* since we just reset the VMCB, return without adjusting the eip */
-        return 0;
-    case VMMCALL_DEBUG:
-        printf("DEBUG features not implemented yet\n");
-        break;
-    default:
-    break;
-    }
-
-    hvm_print_line(v, regs->eax); /* provides the current domain */
+
+        /* handle the request */
+        switch (regs->eax)
+        {
+        case VMMCALL_RESET_TO_REALMODE:
+            if (svm_do_vmmcall_reset_to_realmode(v, regs))
+            {
+                printf("svm_do_vmmcall_reset_to_realmode() failed\n");
+                return -1;
+            }
+            /* since we just reset the VMCB, return without adjusting
+             * the eip */
+            return 0;
+
+        case VMMCALL_DEBUG:
+            printf("DEBUG features not implemented yet\n");
+            break;
+        default:
+            break;
+        }
+
+        hvm_print_line(v, regs->eax); /* provides the current domain */
+    } else {
+        /* It's a hypercall */
+        hvm_do_hypercall(regs);
+    }
 
     __update_guest_eip(vmcb, inst_len);
     return 0;
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/svm/vmcb.c
--- a/xen/arch/x86/hvm/svm/vmcb.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/svm/vmcb.c	Tue Jul 18 13:43:27 2006 +0100
@@ -370,18 +370,6 @@ void svm_do_launch(struct vcpu *v)
     if (v->vcpu_id == 0)
         hvm_setup_platform(v->domain);
 
-    if ( evtchn_bind_vcpu(iopacket_port(v), v->vcpu_id) < 0 )
-    {
-        printk("HVM domain bind port %d to vcpu %d failed!\n",
-               iopacket_port(v), v->vcpu_id);
-        domain_crash_synchronous();
-    }
-
-    HVM_DBG_LOG(DBG_LEVEL_1, "eport: %x", iopacket_port(v));
-
-    clear_bit(iopacket_port(v),
-              &v->domain->shared_info->evtchn_mask[0]);
-
     if (hvm_apic_support(v->domain))
         vlapic_init(v);
     init_timer(&v->arch.hvm_svm.hlt_timer,
@@ -455,9 +443,10 @@ void svm_do_resume(struct vcpu *v)
         pickup_deactive_ticks(pt);
     }
 
-    if ( test_bit(iopacket_port(v), &d->shared_info->evtchn_pending[0]) ||
-         test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) )
-        hvm_wait_io();
+    if (test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags)) {
+        hvm_io_assist(v);
+        ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
+    }
 
     /* We can't resume the guest if we're waiting on I/O */
     ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vlapic.c
--- a/xen/arch/x86/hvm/vlapic.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vlapic.c	Tue Jul 18 13:43:27 2006 +0100
@@ -33,6 +33,7 @@
 #include <xen/sched.h>
 #include <asm/current.h>
 #include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
 
 /* XXX remove this definition after GFW enabled */
 #define VLAPIC_NO_BIOS
@@ -63,7 +64,7 @@ int vlapic_find_highest_irr(struct vlapi
 
 int hvm_apic_support(struct domain *d)
 {
-    return d->arch.hvm_domain.apic_enabled;
+    return d->arch.hvm_domain.params[HVM_PARAM_APIC_ENABLED];
 }
 
 s_time_t get_apictime_scheduled(struct vcpu *v)
@@ -223,7 +224,7 @@ static int vlapic_accept_irq(struct vcpu
               "level trig mode for vector %d\n", vector);
             set_bit(vector, &vlapic->tmr[0]);
         }
-        evtchn_set_pending(v, iopacket_port(v));
+        hvm_prod_vcpu(v);
 
         result = 1;
         break;
@@ -367,7 +368,7 @@ int vlapic_check_vector(struct vlapic *v
     return 1;
 }
 
-void vlapic_ipi(struct vlapic *vlapic)
+static void vlapic_ipi(struct vlapic *vlapic)
 {
     unsigned int dest = (vlapic->icr_high >> 24) & 0xff;
     unsigned int short_hand = (vlapic->icr_low >> 18) & 3;
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vmx/io.c
--- a/xen/arch/x86/hvm/vmx/io.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vmx/io.c	Tue Jul 18 13:43:27 2006 +0100
@@ -142,6 +142,7 @@ asmlinkage void vmx_intr_assist(void)
     struct hvm_domain *plat=&v->domain->arch.hvm_domain;
     struct periodic_time *pt = &plat->pl_time.periodic_tm;
     struct hvm_virpic *pic= &plat->vpic;
+    int callback_irq;
     unsigned int idtv_info_field;
     unsigned long inst_len;
     int    has_ext_irq;
@@ -152,6 +153,15 @@ asmlinkage void vmx_intr_assist(void)
     if ( (v->vcpu_id == 0) && pt->enabled && pt->pending_intr_nr ) {
         pic_set_irq(pic, pt->irq, 0);
         pic_set_irq(pic, pt->irq, 1);
+    }
+
+    callback_irq = v->domain->arch.hvm_domain.params[HVM_PARAM_CALLBACK_IRQ];
+    if ( callback_irq != 0 &&
+         local_events_need_delivery() ) {
+        /*inject para-device call back irq*/
+        v->vcpu_info->evtchn_upcall_mask = 1;
+        pic_set_irq(pic, callback_irq, 0);
+        pic_set_irq(pic, callback_irq, 1);
     }
 
     has_ext_irq = cpu_has_pending_irq(v);
@@ -220,7 +230,7 @@ asmlinkage void vmx_intr_assist(void)
 
 void vmx_do_resume(struct vcpu *v)
 {
-    struct domain *d = v->domain;
+    ioreq_t *p;
     struct periodic_time *pt = &v->domain->arch.hvm_domain.pl_time.periodic_tm;
 
     vmx_stts();
@@ -234,9 +244,13 @@ void vmx_do_resume(struct vcpu *v)
         pickup_deactive_ticks(pt);
     }
 
-    if ( test_bit(iopacket_port(v), &d->shared_info->evtchn_pending[0]) ||
-         test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) )
-        hvm_wait_io();
+    p = &get_vio(v->domain, v->vcpu_id)->vp_ioreq;
+    if (p->state == STATE_IORESP_READY)
+        hvm_io_assist(v);
+    if (p->state != STATE_INVALID) {
+        printf("Weird HVM iorequest state %d.\n", p->state);
+        domain_crash(v->domain);
+    }
 
     /* We can't resume the guest if we're waiting on I/O */
     ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vmx/vmcs.c
--- a/xen/arch/x86/hvm/vmx/vmcs.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vmx/vmcs.c	Tue Jul 18 13:43:27 2006 +0100
@@ -245,18 +245,6 @@ static void vmx_do_launch(struct vcpu *v
     if (v->vcpu_id == 0)
         hvm_setup_platform(v->domain);
 
-    if ( evtchn_bind_vcpu(iopacket_port(v), v->vcpu_id) < 0 )
-    {
-        printk("VMX domain bind port %d to vcpu %d failed!\n",
-               iopacket_port(v), v->vcpu_id);
-        domain_crash_synchronous();
-    }
-
-    HVM_DBG_LOG(DBG_LEVEL_1, "eport: %x", iopacket_port(v));
-
-    clear_bit(iopacket_port(v),
-              &v->domain->shared_info->evtchn_mask[0]);
-
     __asm__ __volatile__ ("mov %%cr0,%0" : "=r" (cr0) : );
 
     error |= __vmwrite(GUEST_CR0, cr0);
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vmx/vmx.c	Tue Jul 18 13:43:27 2006 +0100
@@ -25,6 +25,7 @@
 #include <xen/irq.h>
 #include <xen/softirq.h>
 #include <xen/domain_page.h>
+#include <xen/hypercall.h>
 #include <asm/current.h>
 #include <asm/io.h>
 #include <asm/shadow.h>
@@ -139,6 +140,7 @@ static void vmx_relinquish_guest_resourc
             kill_timer(&VLAPIC(v)->vlapic_timer);
             xfree(VLAPIC(v));
         }
+	hvm_release_assist_channel(v);
     }
 
     kill_timer(&d->arch.hvm_domain.pl_time.periodic_tm.timer);
@@ -669,6 +671,28 @@ static int check_vmx_controls(u32 ctrls,
     return 1;
 }
 
+static void vmx_init_hypercall_page(struct domain *d, void *hypercall_page)
+{
+    char *p;
+    int i;
+
+    memset(hypercall_page, 0, PAGE_SIZE);
+
+    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+    {
+        p = (char *)(hypercall_page + (i * 32));
+        *(u8  *)(p + 0) = 0xb8; /* mov imm32, %eax */
+        *(u32 *)(p + 1) = i;
+        *(u8  *)(p + 5) = 0x0f; /* vmcall */
+        *(u8  *)(p + 6) = 0x01;
+        *(u8  *)(p + 7) = 0xc1;
+        *(u8  *)(p + 8) = 0xc3; /* ret */
+    }
+
+    /* Don't support HYPERVISOR_iret at the moment */
+    *(u16 *)(hypercall_page + (__HYPERVISOR_iret * 32)) = 0x0b0f; /* ud2 */
+}
+
 int start_vmx(void)
 {
     u32 eax, edx;
@@ -748,6 +772,8 @@ int start_vmx(void)
     hvm_funcs.get_guest_ctrl_reg = vmx_get_ctrl_reg;
 
     hvm_funcs.init_ap_context = vmx_init_ap_context;
+
+    hvm_funcs.init_hypercall_page = vmx_init_hypercall_page;
 
     hvm_enabled = 1;
 
@@ -1968,7 +1994,7 @@ void vmx_vmexit_do_hlt(void)
         next_wakeup = next_pit;
     if ( next_wakeup != - 1 ) 
         set_timer(&current->arch.hvm_vmx.hlt_timer, next_wakeup);
-    hvm_safe_block();
+    do_sched_op_compat(SCHEDOP_block, 0);
 }
 
 static inline void vmx_vmexit_do_extint(struct cpu_user_regs *regs)
@@ -2138,11 +2164,10 @@ asmlinkage void vmx_vmexit_handler(struc
          * (1) We can get an exception (e.g. #PG) in the guest, or
          * (2) NMI
          */
-        int error;
         unsigned int vector;
         unsigned long va;
 
-        if ((error = __vmread(VM_EXIT_INTR_INFO, &vector))
+        if (__vmread(VM_EXIT_INTR_INFO, &vector)
             || !(vector & INTR_INFO_VALID_MASK))
             __hvm_bug(&regs);
         vector &= INTR_INFO_VECTOR_MASK;
@@ -2215,7 +2240,7 @@ asmlinkage void vmx_vmexit_handler(struc
                         (unsigned long)regs.ecx, (unsigned long)regs.edx,
                         (unsigned long)regs.esi, (unsigned long)regs.edi);
 
-            if (!(error = vmx_do_page_fault(va, &regs))) {
+            if (!vmx_do_page_fault(va, &regs)) {
                 /*
                  * Inject #PG using Interruption-Information Fields
                  */
@@ -2273,16 +2298,16 @@ asmlinkage void vmx_vmexit_handler(struc
         __update_guest_eip(inst_len);
         break;
     }
-#if 0 /* keep this for debugging */
     case EXIT_REASON_VMCALL:
+    {
         __get_instruction_length(inst_len);
         __vmread(GUEST_RIP, &eip);
         __vmread(EXIT_QUALIFICATION, &exit_qualification);
 
-        hvm_print_line(v, regs.eax); /* provides the current domain */
+        hvm_do_hypercall(&regs);
         __update_guest_eip(inst_len);
         break;
-#endif
+    }
     case EXIT_REASON_CR_ACCESS:
     {
         __vmread(GUEST_RIP, &eip);
@@ -2323,7 +2348,6 @@ asmlinkage void vmx_vmexit_handler(struc
     case EXIT_REASON_MWAIT_INSTRUCTION:
         __hvm_bug(&regs);
         break;
-    case EXIT_REASON_VMCALL:
     case EXIT_REASON_VMCLEAR:
     case EXIT_REASON_VMLAUNCH:
     case EXIT_REASON_VMPTRLD:
diff -r ecb8ff1fcf1f xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/mm.c	Tue Jul 18 13:43:27 2006 +0100
@@ -2982,7 +2982,12 @@ long arch_memory_op(int op, XEN_GUEST_HA
         if ( copy_from_guest(&xatp, arg, 1) )
             return -EFAULT;
 
-        if ( (d = find_domain_by_id(xatp.domid)) == NULL )
+        if ( xatp.domid == DOMID_SELF ) {
+            d = current->domain;
+            get_knownalive_domain(d);
+        } else if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+        else if ( (d = find_domain_by_id(xatp.domid)) == NULL )
             return -ESRCH;
 
         switch ( xatp.space )
diff -r ecb8ff1fcf1f xen/arch/x86/x86_32/entry.S
--- a/xen/arch/x86/x86_32/entry.S	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/x86_32/entry.S	Tue Jul 18 13:43:27 2006 +0100
@@ -656,6 +656,7 @@ ENTRY(hypercall_table)
         .long do_xenoprof_op
         .long do_event_channel_op
         .long do_physdev_op
+        .long do_hvm_op             /* 34 */
         .rept NR_hypercalls-((.-hypercall_table)/4)
         .long do_ni_hypercall
         .endr
@@ -695,6 +696,7 @@ ENTRY(hypercall_args_table)
         .byte 2 /* do_xenoprof_op       */
         .byte 2 /* do_event_channel_op  */
         .byte 2 /* do_physdev_op        */
+        .byte 2 /* do_hvm_op            */  /* 34 */
         .rept NR_hypercalls-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff -r ecb8ff1fcf1f xen/arch/x86/x86_32/traps.c
--- a/xen/arch/x86/x86_32/traps.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/x86_32/traps.c	Tue Jul 18 13:43:27 2006 +0100
@@ -486,9 +486,11 @@ static void hypercall_page_initialise_ri
     *(u16 *)(p+ 6) = 0x82cd;  /* int  $0x82 */
 }
 
-void hypercall_page_initialise(void *hypercall_page)
-{
-    if ( supervisor_mode_kernel )
+void hypercall_page_initialise(struct domain *d, void *hypercall_page)
+{
+    if ( hvm_guest(d->vcpu[0]) )
+        hvm_hypercall_page_initialise(d, hypercall_page);
+    else if ( supervisor_mode_kernel )
         hypercall_page_initialise_ring0_kernel(hypercall_page);
     else
         hypercall_page_initialise_ring1_kernel(hypercall_page);
diff -r ecb8ff1fcf1f xen/common/event_channel.c
--- a/xen/common/event_channel.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/common/event_channel.c	Tue Jul 18 13:43:27 2006 +0100
@@ -46,6 +46,104 @@
         goto out;                                                   \
     } while ( 0 )
 
+#define NR_XEN_EVENT_CHANNELS 32
+#define XECS_FREE 0        /* Not in use at all */
+#define XECS_UNBOUND 1     /* Allocated but not bound to */
+#define XECS_BOUND 2       /* Bound to somewhere in domain-space */
+#define XECS_HBOUND 3      /* Half bound: Xen is trying to tear this
+                              down, but a domain is still attached */
+struct xen_evtchn {
+    int state;
+
+    void (*fire)(void *d); /* called when dom0 tries to send on this
+                              event channel. */
+    void *data;
+
+    struct domain *dom; /* Who is allowed to bind/currently bound */
+    int dom_port;
+};
+
+static struct xen_evtchn xen_event_channels[NR_XEN_EVENT_CHANNELS];
+/* Leaf lock protecting the xen_event_channels array. */
+static spinlock_t xen_event_channel_lock = SPIN_LOCK_UNLOCKED;
+
+int alloc_xen_event_channel(void (*f)(void *d),
+                            void *data,
+                            struct domain *d)
+{
+    int ind;
+
+    spin_lock(&xen_event_channel_lock);
+    for (ind = 0; ind < NR_XEN_EVENT_CHANNELS; ind++)
+        if ( xen_event_channels[ind].state == XECS_FREE )
+            break;
+    if ( ind == NR_XEN_EVENT_CHANNELS ) {
+        printf("Out of Xen event channels?\n");
+        ind = -1;
+        goto out;
+    }
+    xen_event_channels[ind].state = XECS_UNBOUND;
+    xen_event_channels[ind].fire = f;
+    xen_event_channels[ind].data = data;
+    xen_event_channels[ind].dom = d;
+ out:
+    spin_unlock(&xen_event_channel_lock);
+    return ind;
+}
+
+void release_xen_event_channel(int ind)
+{
+    spin_lock(&xen_event_channel_lock);
+    switch ( xen_event_channels[ind].state ) {
+    case XECS_UNBOUND:
+        xen_event_channels[ind].state = XECS_FREE;
+        break;
+    case XECS_BOUND:
+        xen_event_channels[ind].state = XECS_HBOUND;
+        break;
+    case XECS_HBOUND:
+        panic("Double free of Xen event channel.\n");
+    case XECS_FREE:
+        printf("Attempt to free non-allocated Xen event channel %d?\n",
+               ind);
+    default:
+        BUG();
+    }
+
+    spin_unlock(&xen_event_channel_lock);
+}
+
+void notify_xen_event_channel(int port)
+{
+    struct xen_evtchn *xchn = xen_event_channels + port;
+    struct domain *d = NULL;
+    struct evtchn *chn;
+
+    /* We rely on our caller to ensure that nobody's trying to tear
+       the channel down from inside Xen while it's being signalled on.
+       That means that the only transition the channel could make is
+       from BOUND to UNBOUND or vice-versa.  Neither of those change
+       the dom field, so we can read it without taking a lock.  This
+       simplifies the lock ordering a bit. */
+    d = xchn->dom;
+    ASSERT(d);
+    if ( !get_domain(d) )
+        return;
+    spin_lock(&d->evtchn_lock);
+    spin_lock(&xen_event_channel_lock);
+    if ( xchn->state != XECS_UNBOUND ) {
+        BUG_ON(xchn->state != XECS_BOUND);
+        BUG_ON(d != xchn->dom);
+        chn = evtchn_from_port(d, xchn->dom_port);
+        if ( chn->state == ECS_XEN )
+            evtchn_set_pending(d->vcpu[chn->notify_vcpu_id],
+                               xchn->dom_port);
+    } else
+        printf("Send on unbound Xen event channel?\n");
+
+    spin_unlock(&d->evtchn_lock);
+    spin_unlock(&xen_event_channel_lock);
+}
 
 static int virq_is_global(int virq)
 {
@@ -134,6 +232,44 @@ static long evtchn_alloc_unbound(evtchn_
 }
 
 
+static long evtchn_bind_xen(struct domain *ld, int xen_port)
+{
+    long rc = 0;
+    struct evtchn *lchn;
+    struct xen_evtchn *rchn;
+    int lport;
+
+    if ( xen_port < 0 || xen_port >= NR_XEN_EVENT_CHANNELS )
+        return -EINVAL;
+
+    spin_lock(&ld->evtchn_lock);
+    spin_lock(&xen_event_channel_lock);
+
+    rchn = xen_event_channels + xen_port;
+    if ( rchn->state != XECS_UNBOUND || rchn->dom != ld )
+        ERROR_EXIT(-EINVAL);
+
+    if ( (lport = get_free_port(ld)) < 0 )
+        ERROR_EXIT(lport);
+    lchn = evtchn_from_port(ld, lport);
+    lchn->state = ECS_XEN;
+    lchn->u.xen_port = xen_port;
+
+    rchn->state = XECS_BOUND;
+    rchn->dom_port = lport;
+
+    /* Somewhat ugly hack to avoid lost wakeups if we've tried to
+       notify this port before anyone got around to binding it. */
+    evtchn_set_pending(ld->vcpu[lchn->notify_vcpu_id], lport);
+    rc = lport;
+
+ out:
+    spin_unlock(&xen_event_channel_lock);
+    spin_unlock(&ld->evtchn_lock);
+
+    return rc;
+}
+
 static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
 {
     struct evtchn *lchn, *rchn;
@@ -147,6 +283,15 @@ static long evtchn_bind_interdomain(evtc
 
     if ( rdom == DOMID_SELF )
         rdom = current->domain->domain_id;
+
+    if ( rdom == DOMID_XEN ) {
+        rc = evtchn_bind_xen(ld, rport);
+        if ( rc >= 0 ) {
+            bind->local_port = rc;
+            rc = 0;
+        }
+        return rc;
+    }
 
     if ( (rd = find_domain_by_id(rdom)) == NULL )
         return -ESRCH;
@@ -317,11 +462,12 @@ static long evtchn_bind_pirq(evtchn_bind
 
 static long __evtchn_close(struct domain *d1, int port1)
 {
-    struct domain *d2 = NULL;
-    struct vcpu   *v;
-    struct evtchn *chn1, *chn2;
-    int            port2;
-    long           rc = 0;
+    struct domain     *d2 = NULL;
+    struct vcpu       *v;
+    struct evtchn     *chn1, *chn2;
+    int                port2;
+    long               rc = 0;
+    struct xen_evtchn *xchn;
 
  again:
     spin_lock(&d1->evtchn_lock);
@@ -409,6 +555,19 @@ static long __evtchn_close(struct domain
         chn2->u.unbound.remote_domid = d1->domain_id;
         break;
 
+    case ECS_XEN:
+        spin_lock(&xen_event_channel_lock);
+        xchn = xen_event_channels + chn1->u.xen_port;
+        BUG_ON(xchn->dom != d1);
+        if ( xchn->state == XECS_HBOUND )
+            xchn->state = XECS_FREE;
+        else if (xchn->state == XECS_BOUND)
+            xchn->state = XECS_UNBOUND;
+        else
+            BUG();
+        spin_unlock(&xen_event_channel_lock);
+        break;
+
     default:
         BUG();
     }
@@ -442,6 +601,7 @@ long evtchn_send(unsigned int lport)
     struct evtchn *lchn, *rchn;
     struct domain *ld = current->domain, *rd;
     int            rport, ret = 0;
+    struct xen_evtchn *xchn;
 
     spin_lock(&ld->evtchn_lock);
 
@@ -465,6 +625,16 @@ long evtchn_send(unsigned int lport)
         break;
     case ECS_UNBOUND:
         /* silently drop the notification */
+        break;
+    case ECS_XEN:
+        xchn = xen_event_channels + lchn->u.xen_port;
+        spin_lock(&xen_event_channel_lock);
+        if ( xchn->state != XECS_HBOUND )
+        {
+            BUG_ON(xchn->state != XECS_BOUND);
+            xchn->fire(xchn->data);
+        }
+        spin_unlock(&xen_event_channel_lock);
         break;
     default:
         ret = -EINVAL;
@@ -596,6 +766,11 @@ static long evtchn_status(evtchn_status_
             chn->u.interdomain.remote_dom->domain_id;
         status->u.interdomain.port = chn->u.interdomain.remote_port;
         break;
+    case ECS_XEN:
+        status->status = EVTCHNSTAT_interdomain;
+        status->u.interdomain.dom = DOMID_XEN;
+        status->u.interdomain.port = chn->u.xen_port;
+        break;
     case ECS_PIRQ:
         status->status = EVTCHNSTAT_pirq;
         status->u.pirq = chn->u.pirq;
@@ -649,6 +824,7 @@ long evtchn_bind_vcpu(unsigned int port,
     case ECS_UNBOUND:
     case ECS_INTERDOMAIN:
     case ECS_PIRQ:
+    case ECS_XEN:
         chn->notify_vcpu_id = vcpu_id;
         break;
     default:
diff -r ecb8ff1fcf1f xen/common/memory.c
--- a/xen/common/memory.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/common/memory.c	Tue Jul 18 13:43:27 2006 +0100
@@ -158,6 +158,9 @@ guest_remove_page(
     }
             
     page = mfn_to_page(mfn);
+    if ( IS_XEN_HEAP_FRAME(page) )
+        return 0;
+
     if ( unlikely(!get_page(page, d)) )
     {
         DPRINTK("Bad page free for domain %u\n", d->domain_id);
diff -r ecb8ff1fcf1f xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/domain.h	Tue Jul 18 13:43:27 2006 +0100
@@ -55,7 +55,7 @@ extern void toggle_guest_mode(struct vcp
  * Initialise a hypercall-transfer page. The given pointer must be mapped
  * in Xen virtual address space (accesses are not validated or checked).
  */
-extern void hypercall_page_initialise(void *);
+extern void hypercall_page_initialise(struct domain *d, void *);
 
 struct arch_domain
 {
diff -r ecb8ff1fcf1f xen/include/asm-x86/guest_access.h
--- a/xen/include/asm-x86/guest_access.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/guest_access.h	Tue Jul 18 13:43:27 2006 +0100
@@ -8,6 +8,8 @@
 #define __ASM_X86_GUEST_ACCESS_H__
 
 #include <asm/uaccess.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/guest_access.h>
 
 /* Is the guest handle a NULL reference? */
 #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
@@ -28,6 +30,8 @@
 #define copy_to_guest_offset(hnd, off, ptr, nr) ({      \
     const typeof(ptr) _x = (hnd).p;                     \
     const typeof(ptr) _y = (ptr);                       \
+    hvm_guest(current) ?                                \
+    copy_to_user_hvm(_x+(off), _y, sizeof(*_x)*(nr)) :  \
     copy_to_user(_x+(off), _y, sizeof(*_x)*(nr));       \
 })
 
@@ -38,6 +42,8 @@
 #define copy_from_guest_offset(ptr, hnd, off, nr) ({    \
     const typeof(ptr) _x = (hnd).p;                     \
     const typeof(ptr) _y = (ptr);                       \
+    hvm_guest(current) ?                                \
+    copy_from_user_hvm(_y, _x+(off), sizeof(*_x)*(nr)) :\
     copy_from_user(_y, _x+(off), sizeof(*_x)*(nr));     \
 })
 
@@ -45,6 +51,8 @@
 #define copy_field_to_guest(hnd, ptr, field) ({         \
     const typeof(&(ptr)->field) _x = &(hnd).p->field;   \
     const typeof(&(ptr)->field) _y = &(ptr)->field;     \
+    hvm_guest(current) ?                                \
+    copy_to_user_hvm(_x, _y, sizeof(*_x)) :             \
     copy_to_user(_x, _y, sizeof(*_x));                  \
 })
 
@@ -52,6 +60,8 @@
 #define copy_field_from_guest(ptr, hnd, field) ({       \
     const typeof(&(ptr)->field) _x = &(hnd).p->field;   \
     const typeof(&(ptr)->field) _y = &(ptr)->field;     \
+    hvm_guest(current) ?                                \
+    copy_from_user_hvm(_y, _x, sizeof(*_x)) :           \
     copy_from_user(_y, _x, sizeof(*_x));                \
 })
 
@@ -60,29 +70,37 @@
  * Allows use of faster __copy_* functions.
  */
 #define guest_handle_okay(hnd, nr)                      \
-    array_access_ok((hnd).p, (nr), sizeof(*(hnd).p))
+    (hvm_guest(current) || array_access_ok((hnd).p, (nr), sizeof(*(hnd).p)))
 
 #define __copy_to_guest_offset(hnd, off, ptr, nr) ({    \
     const typeof(ptr) _x = (hnd).p;                     \
     const typeof(ptr) _y = (ptr);                       \
+    hvm_guest(current) ?                                \
+    copy_to_user_hvm(_x+(off), _y, sizeof(*_x)*(nr)) :  \
     __copy_to_user(_x+(off), _y, sizeof(*_x)*(nr));     \
 })
 
 #define __copy_from_guest_offset(ptr, hnd, off, nr) ({  \
     const typeof(ptr) _x = (hnd).p;                     \
     const typeof(ptr) _y = (ptr);                       \
+    hvm_guest(current) ?                                \
+    copy_from_user_hvm(_y, _x+(off),sizeof(*_x)*(nr)) : \
     __copy_from_user(_y, _x+(off), sizeof(*_x)*(nr));   \
 })
 
 #define __copy_field_to_guest(hnd, ptr, field) ({       \
     const typeof(&(ptr)->field) _x = &(hnd).p->field;   \
     const typeof(&(ptr)->field) _y = &(ptr)->field;     \
+    hvm_guest(current) ?                                \
+    copy_to_user_hvm(_x, _y, sizeof(*_x)) :             \
     __copy_to_user(_x, _y, sizeof(*_x));                \
 })
 
 #define __copy_field_from_guest(ptr, hnd, field) ({     \
     const typeof(&(ptr)->field) _x = &(hnd).p->field;   \
     const typeof(&(ptr)->field) _y = &(ptr)->field;     \
+    hvm_guest(current) ?                                \
+    copy_from_user_hvm(_x, _y, sizeof(*_x)) :           \
     __copy_from_user(_y, _x, sizeof(*_x));              \
 })
 
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/domain.h
--- a/xen/include/asm-x86/hvm/domain.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/domain.h	Tue Jul 18 13:43:27 2006 +0100
@@ -27,17 +27,15 @@
 #include <asm/hvm/vpit.h>
 #include <asm/hvm/vlapic.h>
 #include <asm/hvm/vioapic.h>
+#include <public/hvm/params.h>
 
 #define HVM_PBUF_SIZE   80
 
 struct hvm_domain {
     unsigned long          shared_page_va;
-    unsigned int           nr_vcpus;
-    unsigned int           apic_enabled;
-    unsigned int           pae_enabled;
     s64                    tsc_frequency;
     struct pl_time         pl_time;
-    
+
     struct hvm_virpic      vpic;
     struct hvm_vioapic     vioapic;
     struct hvm_io_handler  io_handler;
@@ -48,6 +46,8 @@ struct hvm_domain {
 
     int                    pbuf_index;
     char                   pbuf[HVM_PBUF_SIZE];
+
+    unsigned long          params[HVM_NR_PARAMS];
 };
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/hvm.h
--- a/xen/include/asm-x86/hvm/hvm.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/hvm.h	Tue Jul 18 13:43:27 2006 +0100
@@ -61,6 +61,8 @@ struct hvm_function_table {
 
     void (*init_ap_context)(struct vcpu_guest_context *ctxt,
                             int vcpuid, int trampoline_vector);
+
+    void (*init_hypercall_page)(struct domain *d, void *hypercall_page);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -75,12 +77,20 @@ hvm_disable(void)
         hvm_funcs.disable();
 }
 
+void hvm_create_event_channels(struct vcpu *v);
+void hvm_map_io_shared_page(struct vcpu *v);
+
 static inline int
 hvm_initialize_guest_resources(struct vcpu *v)
 {
-    if ( hvm_funcs.initialize_guest_resources )
-        return hvm_funcs.initialize_guest_resources(v);
-    return 0;
+    int ret = 1;
+    if (hvm_funcs.initialize_guest_resources)
+	ret = hvm_funcs.initialize_guest_resources(v);
+    if (ret == 1) {
+	hvm_map_io_shared_page(v);
+	hvm_create_event_channels(v);
+    }
+    return ret;
 }
 
 static inline void
@@ -121,6 +131,9 @@ hvm_instruction_length(struct vcpu *v)
     return hvm_funcs.instruction_length(v);
 }
 
+void hvm_hypercall_page_initialise(struct domain *d,
+                                   void *hypercall_page);
+
 static inline unsigned long
 hvm_get_guest_ctrl_reg(struct vcpu *v, unsigned int num)
 {
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/io.h
--- a/xen/include/asm-x86/hvm/io.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/io.h	Tue Jul 18 13:43:27 2006 +0100
@@ -150,14 +150,14 @@ static inline int irq_masked(unsigned lo
 #endif
 
 extern void handle_mmio(unsigned long, unsigned long);
-extern void hvm_wait_io(void);
-extern void hvm_safe_block(void);
 extern void hvm_io_assist(struct vcpu *v);
 extern void pic_irq_request(void *data, int level);
 extern void hvm_pic_assist(struct vcpu *v);
 extern int cpu_get_interrupt(struct vcpu *v, int *type);
 extern int cpu_has_pending_irq(struct vcpu *v);
 
+void hvm_release_assist_channel(struct vcpu *v);
+
 // XXX - think about this, maybe use bit 30 of the mfn to signify an MMIO frame.
 #define mmio_space(gpa) (!VALID_MFN(get_mfn_from_gpfn((gpa) >> PAGE_SHIFT)))
 
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/support.h
--- a/xen/include/asm-x86/hvm/support.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/support.h	Tue Jul 18 13:43:27 2006 +0100
@@ -42,11 +42,6 @@ static inline vcpu_iodata_t *get_vio(str
 static inline vcpu_iodata_t *get_vio(struct domain *d, unsigned long cpu)
 {
     return &get_sp(d)->vcpu_iodata[cpu];
-}
-
-static inline int iopacket_port(struct vcpu *v)
-{
-    return get_vio(v->domain, v->vcpu_id)->vp_eport;
 }
 
 /* XXX these are really VMX specific */
@@ -148,4 +143,9 @@ extern void hvm_print_line(struct vcpu *
 extern void hvm_print_line(struct vcpu *v, const char c);
 extern void hlt_timer_fn(void *data);
 
+void hvm_prod_vcpu(struct vcpu *v);
+void hvm_assist_complete(struct vcpu *v);
+
+void hvm_do_hypercall(struct cpu_user_regs *pregs);
+
 #endif /* __ASM_X86_HVM_SUPPORT_H__ */
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/svm/vmmcall.h
--- a/xen/include/asm-x86/hvm/svm/vmmcall.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/svm/vmmcall.h	Tue Jul 18 13:43:27 2006 +0100
@@ -23,11 +23,11 @@
 #define __ASM_X86_HVM_SVM_VMMCALL_H__
 
 /* VMMCALL command fields */
-#define VMMCALL_CODE_CPL_MASK     0xC0000000
-#define VMMCALL_CODE_MBZ_MASK     0x3FFF0000
+#define VMMCALL_CODE_CPL_MASK     0x60000000
+#define VMMCALL_CODE_MBZ_MASK     0x1FFF0000
 #define VMMCALL_CODE_COMMAND_MASK 0x0000FFFF
 
-#define MAKE_VMMCALL_CODE(cpl,func) ((cpl << 30) | (func))
+#define MAKE_VMMCALL_CODE(cpl,func) ((cpl << 29) | (func) | 0x80000000)
 
 /* CPL=0 VMMCALL Requests */
 #define VMMCALL_RESET_TO_REALMODE   MAKE_VMMCALL_CODE(0,1)
@@ -38,7 +38,7 @@
 /* return the cpl required for the vmmcall cmd */
 static inline int get_vmmcall_cpl(int cmd)
 {
-    return (cmd & VMMCALL_CODE_CPL_MASK) >> 30;
+    return (cmd & VMMCALL_CODE_CPL_MASK) >> 29;
 }
 
 #endif /* __ASM_X86_HVM_SVM_VMMCALL_H__ */
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/vcpu.h
--- a/xen/include/asm-x86/hvm/vcpu.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/vcpu.h	Tue Jul 18 13:43:27 2006 +0100
@@ -38,6 +38,8 @@ struct hvm_vcpu {
     /* For AP startup */
     unsigned long       init_sipi_sipi_state;
 
+    int                 xen_port;
+
     /* Flags */
     int                 flag_dr_dirty;
 
diff -r ecb8ff1fcf1f xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/shadow.h	Tue Jul 18 13:43:27 2006 +0100
@@ -1733,6 +1733,32 @@ static inline unsigned long gva_to_gpa(u
 
     return l1e_get_paddr(gpte) + (gva & ~PAGE_MASK); 
 }
+
+static inline unsigned long gva_to_mfn(unsigned long gva)
+{
+    l1_pgentry_t l1e;
+
+    if (__copy_from_user(&l1e, &shadow_linear_pg_table[l1_linear_offset(gva)],
+                         sizeof(l1e)) ||
+        (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
+         (_PAGE_PRESENT | _PAGE_RW) ) {
+        struct cpu_user_regs cur;
+        /* Error code -> write */
+        cur.error_code = 3;
+        cur.cs = 0; /* Ring 0 -> hypervisor */
+        cur.eflags = 0;
+        shadow_fault(gva, &cur);
+        if (__copy_from_user(&l1e,
+                             &shadow_linear_pg_table[l1_linear_offset(gva)],
+                             sizeof(l1e)) ||
+            (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
+             (_PAGE_PRESENT | _PAGE_RW) ) {
+            return 0;
+        }
+    }
+    return l1e_get_pfn(l1e);
+}
+
 #endif
 /************************************************************************/
 
diff -r ecb8ff1fcf1f xen/include/public/hvm/ioreq.h
--- a/xen/include/public/hvm/ioreq.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/public/hvm/ioreq.h	Tue Jul 18 13:43:27 2006 +0100
@@ -27,7 +27,6 @@
 #define STATE_IOREQ_READY       1
 #define STATE_IOREQ_INPROCESS   2
 #define STATE_IORESP_READY      3
-#define STATE_IORESP_HOOK       4
 
 #define IOREQ_TYPE_PIO          0 /* pio */
 #define IOREQ_TYPE_COPY         1 /* mmio ops */
@@ -67,10 +66,8 @@ typedef struct global_iodata global_ioda
 typedef struct global_iodata global_iodata_t;
 
 struct vcpu_iodata {
-    struct ioreq         vp_ioreq;
-    /* Event channel port */
-    unsigned int    vp_eport;   /* VMX vcpu uses this to notify DM */
-    unsigned int    dm_eport;   /* DM uses this to notify VMX vcpu */
+    ioreq_t         vp_ioreq;
+    int             vp_xen_port;
 };
 typedef struct vcpu_iodata vcpu_iodata_t;
 
diff -r ecb8ff1fcf1f xen/include/public/xen.h
--- a/xen/include/public/xen.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/public/xen.h	Tue Jul 18 13:43:27 2006 +0100
@@ -66,6 +66,7 @@
 #define __HYPERVISOR_xenoprof_op          31
 #define __HYPERVISOR_event_channel_op     32
 #define __HYPERVISOR_physdev_op           33
+#define __HYPERVISOR_hvm_op               34
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff -r ecb8ff1fcf1f xen/include/xen/event.h
--- a/xen/include/xen/event.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/xen/event.h	Tue Jul 18 13:43:27 2006 +0100
@@ -44,4 +44,10 @@ extern long evtchn_send(unsigned int lpo
 /* Bind a local event-channel port to the specified VCPU. */
 extern long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id);
 
+int alloc_xen_event_channel(void (*f)(void *d),
+                            void *data,
+                            struct domain *d);
+void release_xen_event_channel(int ind);
+void notify_xen_event_channel(int port);
+
 #endif /* __XEN_EVENT_H__ */
diff -r ecb8ff1fcf1f xen/include/xen/hypercall.h
--- a/xen/include/xen/hypercall.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/xen/hypercall.h	Tue Jul 18 13:43:27 2006 +0100
@@ -87,4 +87,9 @@ do_nmi_op(
     unsigned int cmd,
     XEN_GUEST_HANDLE(void) arg);
 
+extern long
+do_hvm_op(
+    unsigned long op,
+    XEN_GUEST_HANDLE(void) arg);
+
 #endif /* __XEN_HYPERCALL_H__ */
diff -r ecb8ff1fcf1f xen/include/xen/sched.h
--- a/xen/include/xen/sched.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/xen/sched.h	Tue Jul 18 13:43:27 2006 +0100
@@ -36,6 +36,7 @@ struct evtchn
 #define ECS_PIRQ         4 /* Channel is bound to a physical IRQ line.       */
 #define ECS_VIRQ         5 /* Channel is bound to a virtual IRQ line.        */
 #define ECS_IPI          6 /* Channel is bound to a virtual IPI line.        */
+#define ECS_XEN          7 /* Channel ends in Xen                            */
     u16 state;             /* ECS_* */
     u16 notify_vcpu_id;    /* VCPU for local delivery notification */
     union {
@@ -48,6 +49,7 @@ struct evtchn
         } interdomain; /* state == ECS_INTERDOMAIN */
         u16 pirq;      /* state == ECS_PIRQ */
         u16 virq;      /* state == ECS_VIRQ */
+        int xen_port;  /* state == ECS_XEN */
     } u;
 };
 
diff -r ecb8ff1fcf1f tools/ioemu/hw/xen_evtchn.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/ioemu/hw/xen_evtchn.c	Tue Jul 18 13:43:27 2006 +0100
@@ -0,0 +1,160 @@
+/*
+ * XEN event channel fake pci devicel
+ * 
+ * Copyright (c) 2003-2004 Intel Corp.
+ * Copyright (c) 2006 XenSource
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "vl.h"
+
+#include <xenguest.h>
+#include <xc_private.h>
+
+extern FILE *logfile;
+
+extern int domid;
+extern int xc_handle;
+
+static unsigned ioport_base;
+
+static void evtchn_ioport_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    DECLARE_DOM0_OP;
+    int rc;
+
+    switch (addr - ioport_base) {
+    case 0:
+	fprintf(logfile, "Init hypercall page %x, addr %x.\n", val, addr);
+	op.u.hypercall_init.domain = domid;
+	op.u.hypercall_init.gmfn = val;
+	op.cmd = DOM0_HYPERCALL_INIT;
+	rc = xc_dom0_op(xc_handle, &op);
+	fprintf(logfile, "result -> %d.\n", rc);
+	break;
+    default:
+	fprintf(logfile, "Write to bad port %x (base %x) on evtchn device.\n",
+		addr, ioport_base);
+	break;
+    }
+}
+
+static uint32_t evtchn_ioport_read(void *opaque, uint32_t addr)
+{
+    return 0;
+}
+
+static void evtchn_map(PCIDevice *pci_dev, int region_num,
+                       uint32_t addr, uint32_t size, int type)
+{
+    ioport_base = addr;
+    register_ioport_write(addr, 16, 4, evtchn_ioport_write, NULL);
+    register_ioport_read(addr, 16, 1, evtchn_ioport_read, NULL);
+}
+
+static uint32_t xen_mmio_read(void *opaque, target_phys_addr_t addr)
+{
+    fprintf(logfile, "Warning: try read from evtchn mmio space\n");
+    return 0;
+}
+
+static void xen_mmio_write(void *opaque, target_phys_addr_t addr,
+			       uint32_t val)
+{
+    fprintf(logfile, "Warning: try write to evtchn mmio space\n");
+    return;
+}
+
+static CPUReadMemoryFunc *xen_evtchn_mmio_read[3] = {
+    xen_mmio_read,
+    xen_mmio_read,
+    xen_mmio_read,
+};
+
+static CPUWriteMemoryFunc *xen_evtchn_mmio_write[3] = {
+    xen_mmio_write,
+    xen_mmio_write,
+    xen_mmio_write,
+};
+
+static void xen_evtchn_pci_mmio_map(PCIDevice *d, int region_num,
+				uint32_t addr, uint32_t size, int type)
+{
+    int mmio_io_addr;
+
+    mmio_io_addr = cpu_register_io_memory(0,
+                        xen_evtchn_mmio_read,
+                        xen_evtchn_mmio_write, NULL);
+
+    cpu_register_physical_memory(addr, 0x1000000, mmio_io_addr);
+}
+
+struct pci_config_header {
+    unsigned short vendor_id;
+    unsigned short device_id;
+    unsigned short command;
+    unsigned short status;
+    unsigned char revision;
+    unsigned char api;
+    unsigned char subclass;
+    unsigned char class;
+    unsigned char cache_line_size; /* Units of 32 bit words */
+    unsigned char latency_timer; /* In units of bus cycles */
+    unsigned char header_type; /* Should be 0 */
+    unsigned char bist; /* Built in self test */
+    unsigned long base_address_regs[6];
+    unsigned long reserved1;
+    unsigned long reserved2;
+    unsigned long rom_addr;
+    unsigned long reserved3;
+    unsigned long reserved4;
+    unsigned char interrupt_line;
+    unsigned char interrupt_pin;
+    unsigned char min_gnt;
+    unsigned char max_lat;
+};
+
+void pci_xen_evtchn_init(PCIBus *bus)
+{
+    PCIDevice *d;
+    struct pci_config_header *pch;
+
+    printf("Register xen evtchn.\n");
+    d = pci_register_device(bus, "xen-evtchn", sizeof(PCIDevice), -1, NULL,
+			    NULL);
+    pch = (struct pci_config_header *)d->config;
+    pch->vendor_id = 0xfffd;
+    pch->device_id = 0x0101;
+    pch->command = 3; /* IO and memory access */
+    pch->revision = 0;
+    pch->api = 0;
+    pch->subclass = 0x80; /* Other */
+    pch->class = 0xff; /* Unclassified device class */
+    pch->header_type = 0;
+    pch->interrupt_pin = 1;
+
+    pci_register_io_region(d, 0, 0x100, PCI_ADDRESS_SPACE_IO, evtchn_map);
+
+    /* reserve 16MB mmio address for share memory*/
+    pci_register_io_region(d, 1, 0x1000000, PCI_ADDRESS_SPACE_MEM_PREFETCH,
+			   xen_evtchn_pci_mmio_map);
+
+    register_savevm("evtchn", 0, 1, generic_pci_save, generic_pci_load, d);
+    printf("Done register evtchn.\n");
+}
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/guest_access.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/include/asm-x86/hvm/guest_access.h	Tue Jul 18 13:43:27 2006 +0100
@@ -0,0 +1,7 @@
+#ifndef __ASM_X86_HVM_GUEST_ACCESS_H__
+#define __ASM_X86_HVM_GUEST_ACCESS_H__
+
+unsigned long copy_to_user_hvm(void *to, const void *from, unsigned len);
+unsigned long copy_from_user_hvm(void *to, const void *from, unsigned len);
+
+#endif /* __ASM_X86_HVM_GUEST_ACCESS_H__ */
diff -r ecb8ff1fcf1f xen/include/public/hvm/params.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/include/public/hvm/params.h	Tue Jul 18 13:43:27 2006 +0100
@@ -0,0 +1,22 @@
+#ifndef PARAMS_H__
+#define PARAMS_H__
+
+#define HVM_NR_PARAMS 4
+
+#define HVM_PARAM_CALLBACK_IRQ 0
+#define HVM_PARAM_STORE_PFN    1
+#define HVM_PARAM_STORE_EVTCHN 2
+#define HVM_PARAM_APIC_ENABLED 3
+
+#define HVMOP_set_param 0
+#define HVMOP_get_param 1
+
+struct xen_hvm_param {
+    domid_t domid;
+    unsigned index;
+    unsigned long value;
+};
+typedef struct xen_hvm_param xen_hvm_param_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_param_t);
+
+#endif /* PARAMS_H__ */

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-07-19  4:14 Ian Pratt
  0 siblings, 0 replies; 34+ messages in thread
From: Ian Pratt @ 2006-07-19  4:14 UTC (permalink / raw)
  To: Steve Ofsthun, Steven Smith; +Cc: xen-devel

> >>Have you built the guest environment on anything other than a 2.6.16
> >>version of Linux?  We ran into extra work supporting older linux
> versions.
> >
> > #ifdef soup will get you back to about 2.6.12-ish without too many
> > problems.  These patches don't include that, since it would
complicate
> > merging.
> 
> I was thinking about SLES9 (2.6.5), RHEL4 (2.6.9), RHEL3 (2.4.21).

Steven's patches should be easy to back port given that we already have
real PV drivers for all these kernels. 

Source for strictly unofficial (non vendor Supported) xen-ports of these
kernels are available at http://xenbits.xensource.com/kernels
2.6.5 sles9sp2; 2.6.9 rhel4u1; 2.4.21 rhel3u5 


Ian

^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-07-26 22:35 Nakajima, Jun
  0 siblings, 0 replies; 34+ messages in thread
From: Nakajima, Jun @ 2006-07-26 22:35 UTC (permalink / raw)
  To: Steven Smith, xen-devel; +Cc: sos22

Steven Smith wrote:
> I've just put an updated version of these patches up at
> http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2 .  There's also an
> equivalent single big patch at
> http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2.combined .  Thank you to
> everyone who gave feedback on the previous version.
> 
> The main changes since last time are:
> 
> -- Support for SMP guests
> -- Support for 64 bit guests on a 64 bit hypervisor
> -- Partial support for 32 bit guests on a 64 bit hypervisor: the
>    network interface works, but the block device doesn't.
> 
> The block device can be made to work by #define'ing ALIEN_INTERFACES
> in blkif.h, but drivers compiled in that way won't work with 32 on 32.
> The problem here is that blkif_request_t contains extra padding in 64
> bit builds, and so is a different size, and so the block ring layout
> is different.

When do you expect this be in the unstable tree? Or which issues must be
resolved befor that?

> 
> Other structures with similar problems are handled either by run time
> tests in the drivers (shared_info_t) or translation wrappers in the
> hypervisor (xen_feature_info_t, xen_add_to_physmap_t), but trying to
> do this for the block rings would require far more painful and
> extensive surgery.  I'm inclined to stick with multiply compiling the
> frontend drivers in the short term, although it'll obviously need
> doing in a slightly less grotty way.
> 
> Steven.

Jun
---
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-08-02  8:01 He, Qing
  2006-08-02  9:30 ` Steven Smith
  0 siblings, 1 reply; 34+ messages in thread
From: He, Qing @ 2006-08-02  8:01 UTC (permalink / raw)
  To: Steven Smith, xen-devel; +Cc: sos22

Hi Steven,
I found some issues regarding this patch.
When I'm trying to start windows as VMX guest (with no drivers, of course) under this patch, the guests fail. I ran with three images, windows 2000, XP and 2003.

For 2000 and XP, QEMU windows do not show, there are two lines in the serial output:
	(XEN) Create event channels for vcpu 0.
	(XEN) Send on unbound Xen event channel?

For 2003 guest, QEMU can start, but before the windows start screen shows, it crashes and restarts, complaining about unreasonable mmio opcodes. The serial output is:
	(XEN) (GUEST: 1) unsupported PCI BIOS function 0x0E
	(XEN) (GUEST: 1) int13_harddisk: function 15, unmapped device for ELDL=82
	(XEN) 0, This opcode isn't handled yet!
	(XEN) handle_mmio: failed to decode instruction
	(XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
	(XEN) domain_crash_sync called from platform.c:880
	(XEN) Domain 1 (vcpu#0) crashed on cpu#2:
	(XEN) ----[ Xen-3.0-unstable    Not tainted ]----
	(XEN) CPU:    2
	(XEN) EIP:    0008:[<8081d986>]
	(XEN) EFLAGS: 00010202   CONTEXT: hvm
	(XEN) eax: 00008008   ebx: 000003ce   ecx: 000003ce   edx: f821f600
	(XEN) esi: 8081d9fa   edi: f886ecd0   ebp: f886ecfc   esp: f886ecbc
	(XEN) cr0: 8001003b   cr3: 8f500000
	(XEN) ds: 0023   es: 0023   fs: 0030   gs: 0000   ss: 0010   cs: 0008
	(XEN) Create event channels for vcpu 0.
	(XEN) Send on unbound Xen event channel?
	(XEN) (GUEST: 2) HVM Loader
	(XEN) (GUEST: 2) Loading ROMBIOS ...
	(XEN) (GUEST: 2) Loading Cirrus VGABIOS ...
	(XEN) (GUEST: 2) Loading VMXAssist ...
	(XEN) (GUEST: 2) VMX go ...
	(XEN) (GUEST: 2) VMXAssist (Aug  2 2006)
	(XEN) (GUEST: 2) Memory size 512 MB
	(XEN) (GUEST: 2) E820 map:
	(XEN) (GUEST: 2) 0000000000000000 - 000000000009F800 (RAM)
	(XEN) (GUEST: 2) 000000000009F800 - 00000000000A0000 (Reserved)
	(XEN) (GUEST: 2) 00000000000A0000 - 00000000000C0000 (Type 16)
	(XEN) (GUEST: 2) 00000000000F0000 - 0000000000100000 (Reserved)
	(XEN) (GUEST: 2) 0000000000100000 - 000000001FFFE000 (RAM)
	(XEN) (GUEST: 2) 000000001FFFE000 - 000000001FFFF000 (Type 18)
	(XEN) (GUEST: 2) 000000001FFFF000 - 0000000020000000 (Type 17)
	(XEN) (GUEST: 2) 0000000020000000 - 0000000020003000 (ACPI NVS)
	(XEN) (GUEST: 2) 0000000020003000 - 000000002000D000 (ACPI Data)
	(XEN) (GUEST: 2) 00000000FEC00000 - 0000000100000000 (Type 16)
	(XEN) (GUEST: 2)
	(XEN) (GUEST: 2) Start BIOS ...
	(XEN) (GUEST: 2) Starting emulated 16-bit real-mode: ip=F000:FFF0
	(XEN) (GUEST: 2)  rombios.c,v 1.138 2005/05/07 15:55:26 vruppert Exp $
	(XEN) (GUEST: 2) Remapping master: ICW2 0x8 -> 0x20
	(XEN) (GUEST: 2) Remapping slave: ICW2 0x70 -> 0x28
	(XEN) (GUEST: 2) VGABios $Id: vgabios.c,v 1.61 2005/05/24 16:50:50 vruppert Exp $
	(XEN) (GUEST: 2) HVMAssist BIOS, 1 cpu, $Revision: 1.138 $ $Date: 2005/05/07 15:55:26 $
	(XEN) (GUEST: 2)
	(XEN) (GUEST: 2) ata0-0: PCHS=16383/16/63 translation=lba LCHS=1024/255/63
	(XEN) (GUEST: 2) ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (12289 MBytes)
	(XEN) (GUEST: 2) ata0-1: PCHS=3047/16/63 translation=lba LCHS=761/64/63
	(XEN) (GUEST: 2) ata0  slave: QEMU HARDDISK ATA-7 Hard-Disk (1500 MBytes)
	(XEN) (GUEST: 2) ata1 master: QEMU CD-ROM ATAPI-4 CD-Rom/DVD-Rom
	(XEN) (GUEST: 2) ata1  slave: Unknown device
	(XEN) (GUEST: 2)
	(XEN) (GUEST: 2) Booting from CD-Rom...
	(XEN) (GUEST: 2) unsupported PCI BIOS function 0x0E
	(XEN) (GUEST: 2) int13_harddisk: function 15, unmapped device for ELDL=82
	(XEN) 0, This opcode isn't handled yet!
	(XEN) handle_mmio: failed to decode instruction
	(XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
	(XEN) domain_crash_sync called from platform.c:880
	(XEN) Domain 2 (vcpu#0) crashed on cpu#2:
	(XEN) ----[ Xen-3.0-unstable    Not tainted ]----
	(XEN) CPU:    2
	(XEN) EIP:    0008:[<8081d986>]
	(XEN) EFLAGS: 00010202   CONTEXT: hvm
	(XEN) eax: 00008008   ebx: 000003ce   ecx: 000003ce   edx: f821f600
	(XEN) esi: 8081d9fa   edi: f886ecd0   ebp: f886ecfc   esp: f886ecbc
	(XEN) cr0: 8001003b   cr3: 2ded8000
	(XEN) ds: 0023   es: 0023   fs: 0030   gs: 0000   ss: 0010   cs: 0008

Meanwhile, I don't experience any problems for Linux guest. Do you have any ideas why this happens?

Best regards,
Qing He
>-----Original Message-----
>From: xen-devel-bounces@lists.xensource.com
>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Steven Smith
>Sent: 2006年7月26日 23:35
>To: xen-devel@lists.xensource.com
>Cc: sos22@srcf.ucam.org
>Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
>
>I've just put an updated version of these patches up at
>http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2 .  There's also an
>equivalent single big patch at
>http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2.combined .  Thank you to
>everyone who gave feedback on the previous version.
>
>The main changes since last time are:
>
>-- Support for SMP guests
>-- Support for 64 bit guests on a 64 bit hypervisor
>-- Partial support for 32 bit guests on a 64 bit hypervisor: the network
>   interface works, but the block device doesn't.
>
>The block device can be made to work by #define'ing ALIEN_INTERFACES
>in blkif.h, but drivers compiled in that way won't work with 32 on 32.
>The problem here is that blkif_request_t contains extra padding in 64
>bit builds, and so is a different size, and so the block ring layout
>is different.
>
>Other structures with similar problems are handled either by run time
>tests in the drivers (shared_info_t) or translation wrappers in the
>hypervisor (xen_feature_info_t, xen_add_to_physmap_t), but trying to
>do this for the block rings would require far more painful and
>extensive surgery.  I'm inclined to stick with multiply compiling the
>frontend drivers in the short term, although it'll obviously need
>doing in a slightly less grotty way.
>
>Steven.

^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-08-02  8:23 Zhao, Yunfeng
  2006-08-02  8:56 ` Steven Hand
  0 siblings, 1 reply; 34+ messages in thread
From: Zhao, Yunfeng @ 2006-08-02  8:23 UTC (permalink / raw)
  To: He, Qing, Steven Smith, xen-devel; +Cc: sos22

Qing
Your problem should be problem of credit scheduler.
If you use sedf or bvt, you would not meet the problem.

Thanks
Yunfeng


>-----Original Message-----
>From: xen-devel-bounces@lists.xensource.com
>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of He, Qing
>Sent: 2006年8月2日 16:02
>To: Steven Smith; xen-devel@lists.xensource.com
>Cc: sos22@srcf.ucam.org
>Subject: RE: [Xen-devel] Paravirtualised drivers for fully virtualised domains
>
>Hi Steven,
>I found some issues regarding this patch.
>When I'm trying to start windows as VMX guest (with no drivers, of course) under
>this patch, the guests fail. I ran with three images, windows 2000, XP and 2003.
>
>For 2000 and XP, QEMU windows do not show, there are two lines in the serial
>output:
>	(XEN) Create event channels for vcpu 0.
>	(XEN) Send on unbound Xen event channel?
>
>For 2003 guest, QEMU can start, but before the windows start screen shows, it
>crashes and restarts, complaining about unreasonable mmio opcodes. The serial
>output is:
>	(XEN) (GUEST: 1) unsupported PCI BIOS function 0x0E
>	(XEN) (GUEST: 1) int13_harddisk: function 15, unmapped device for ELDL=82
>	(XEN) 0, This opcode isn't handled yet!
>	(XEN) handle_mmio: failed to decode instruction
>	(XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
>	(XEN) domain_crash_sync called from platform.c:880
>	(XEN) Domain 1 (vcpu#0) crashed on cpu#2:
>	(XEN) ----[ Xen-3.0-unstable    Not tainted ]----
>	(XEN) CPU:    2
>	(XEN) EIP:    0008:[<8081d986>]
>	(XEN) EFLAGS: 00010202   CONTEXT: hvm
>	(XEN) eax: 00008008   ebx: 000003ce   ecx: 000003ce   edx: f821f600
>	(XEN) esi: 8081d9fa   edi: f886ecd0   ebp: f886ecfc   esp: f886ecbc
>	(XEN) cr0: 8001003b   cr3: 8f500000
>	(XEN) ds: 0023   es: 0023   fs: 0030   gs: 0000   ss: 0010   cs: 0008
>	(XEN) Create event channels for vcpu 0.
>	(XEN) Send on unbound Xen event channel?
>	(XEN) (GUEST: 2) HVM Loader
>	(XEN) (GUEST: 2) Loading ROMBIOS ...
>	(XEN) (GUEST: 2) Loading Cirrus VGABIOS ...
>	(XEN) (GUEST: 2) Loading VMXAssist ...
>	(XEN) (GUEST: 2) VMX go ...
>	(XEN) (GUEST: 2) VMXAssist (Aug  2 2006)
>	(XEN) (GUEST: 2) Memory size 512 MB
>	(XEN) (GUEST: 2) E820 map:
>	(XEN) (GUEST: 2) 0000000000000000 - 000000000009F800 (RAM)
>	(XEN) (GUEST: 2) 000000000009F800 - 00000000000A0000 (Reserved)
>	(XEN) (GUEST: 2) 00000000000A0000 - 00000000000C0000 (Type 16)
>	(XEN) (GUEST: 2) 00000000000F0000 - 0000000000100000 (Reserved)
>	(XEN) (GUEST: 2) 0000000000100000 - 000000001FFFE000 (RAM)
>	(XEN) (GUEST: 2) 000000001FFFE000 - 000000001FFFF000 (Type 18)
>	(XEN) (GUEST: 2) 000000001FFFF000 - 0000000020000000 (Type 17)
>	(XEN) (GUEST: 2) 0000000020000000 - 0000000020003000 (ACPI NVS)
>	(XEN) (GUEST: 2) 0000000020003000 - 000000002000D000 (ACPI Data)
>	(XEN) (GUEST: 2) 00000000FEC00000 - 0000000100000000 (Type 16)
>	(XEN) (GUEST: 2)
>	(XEN) (GUEST: 2) Start BIOS ...
>	(XEN) (GUEST: 2) Starting emulated 16-bit real-mode: ip=F000:FFF0
>	(XEN) (GUEST: 2)  rombios.c,v 1.138 2005/05/07 15:55:26 vruppert Exp $
>	(XEN) (GUEST: 2) Remapping master: ICW2 0x8 -> 0x20
>	(XEN) (GUEST: 2) Remapping slave: ICW2 0x70 -> 0x28
>	(XEN) (GUEST: 2) VGABios $Id: vgabios.c,v 1.61 2005/05/24 16:50:50
>vruppert Exp $
>	(XEN) (GUEST: 2) HVMAssist BIOS, 1 cpu, $Revision: 1.138 $ $Date:
>2005/05/07 15:55:26 $
>	(XEN) (GUEST: 2)
>	(XEN) (GUEST: 2) ata0-0: PCHS=16383/16/63 translation=lba
>LCHS=1024/255/63
>	(XEN) (GUEST: 2) ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (12289 MBytes)
>	(XEN) (GUEST: 2) ata0-1: PCHS=3047/16/63 translation=lba LCHS=761/64/63
>	(XEN) (GUEST: 2) ata0  slave: QEMU HARDDISK ATA-7 Hard-Disk (1500 MBytes)
>	(XEN) (GUEST: 2) ata1 master: QEMU CD-ROM ATAPI-4 CD-Rom/DVD-Rom
>	(XEN) (GUEST: 2) ata1  slave: Unknown device
>	(XEN) (GUEST: 2)
>	(XEN) (GUEST: 2) Booting from CD-Rom...
>	(XEN) (GUEST: 2) unsupported PCI BIOS function 0x0E
>	(XEN) (GUEST: 2) int13_harddisk: function 15, unmapped device for ELDL=82
>	(XEN) 0, This opcode isn't handled yet!
>	(XEN) handle_mmio: failed to decode instruction
>	(XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
>	(XEN) domain_crash_sync called from platform.c:880
>	(XEN) Domain 2 (vcpu#0) crashed on cpu#2:
>	(XEN) ----[ Xen-3.0-unstable    Not tainted ]----
>	(XEN) CPU:    2
>	(XEN) EIP:    0008:[<8081d986>]
>	(XEN) EFLAGS: 00010202   CONTEXT: hvm
>	(XEN) eax: 00008008   ebx: 000003ce   ecx: 000003ce   edx: f821f600
>	(XEN) esi: 8081d9fa   edi: f886ecd0   ebp: f886ecfc   esp: f886ecbc
>	(XEN) cr0: 8001003b   cr3: 2ded8000
>	(XEN) ds: 0023   es: 0023   fs: 0030   gs: 0000   ss: 0010   cs: 0008
>
>Meanwhile, I don't experience any problems for Linux guest. Do you have any ideas
>why this happens?
>
>Best regards,
>Qing He
>>-----Original Message-----
>>From: xen-devel-bounces@lists.xensource.com
>>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Steven Smith
>>Sent: 2006年7月26日 23:35
>>To: xen-devel@lists.xensource.com
>>Cc: sos22@srcf.ucam.org
>>Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
>>
>>I've just put an updated version of these patches up at
>>http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2 .  There's also an
>>equivalent single big patch at
>>http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2.combined .  Thank you to
>>everyone who gave feedback on the previous version.
>>
>>The main changes since last time are:
>>
>>-- Support for SMP guests
>>-- Support for 64 bit guests on a 64 bit hypervisor
>>-- Partial support for 32 bit guests on a 64 bit hypervisor: the network
>>   interface works, but the block device doesn't.
>>
>>The block device can be made to work by #define'ing ALIEN_INTERFACES
>>in blkif.h, but drivers compiled in that way won't work with 32 on 32.
>>The problem here is that blkif_request_t contains extra padding in 64
>>bit builds, and so is a different size, and so the block ring layout
>>is different.
>>
>>Other structures with similar problems are handled either by run time
>>tests in the drivers (shared_info_t) or translation wrappers in the
>>hypervisor (xen_feature_info_t, xen_add_to_physmap_t), but trying to
>>do this for the block rings would require far more painful and
>>extensive surgery.  I'm inclined to stick with multiply compiling the
>>frontend drivers in the short term, although it'll obviously need
>>doing in a slightly less grotty way.
>>
>>Steven.
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@lists.xensource.com
>http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-08-02  9:49 He, Qing
  0 siblings, 0 replies; 34+ messages in thread
From: He, Qing @ 2006-08-02  9:49 UTC (permalink / raw)
  To: Steven Smith, xen-devel; +Cc: sos22



>-----Original Message-----
>From: Steven Smith [mailto:sos22@hermes.cam.ac.uk] On Behalf Of Steven Smith
>Sent: 2006年8月2日 17:31
>To: He, Qing
>Cc: Steven Smith; xen-devel@lists.xensource.com; sos22@srcf.ucam.org
>Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
>
>> When I'm trying to start windows as VMX guest (with no drivers, of
>> course) under this patch, the guests fail. I ran with three images,
>> windows 2000, XP and 2003.
>
>> For 2000 and XP, QEMU windows do not show, there are two lines in the serial
>output:
>> 	(XEN) Create event channels for vcpu 0.
>> 	(XEN) Send on unbound Xen event channel?
>Is there anything interesting in /var/log/qemu-dm.* ?  Only the most
>recent log file is relevant (which isn't necessarily the one with the
>highest number, unfortunately).
>
>Also, it looks like this is crashing too soon for it to be related to
>what guest you're running.  Are all of the disk images the same type
>(file vs. block device) and size?
>
Sorry, these 2 cases are of some kind of configuration errors, some qemu parameters changed after qemu update, I can get them run using an early changeset. So when it cannot boot, I don't think of the possibility of configuration errors. 

After changed the configuration, they can boot now (doesn't test if they meet the same problem as below)

>> For 2003 guest, QEMU can start, but before the windows start screen
>>shows, it crashes and restarts, complaining about unreasonable mmio
>>opcodes. The serial output is:
>>
>> 	(XEN) (GUEST: 1) unsupported PCI BIOS function 0x0E
>> 	(XEN) (GUEST: 1) int13_harddisk: function 15, unmapped device for ELDL=82
>> 	(XEN) 0, This opcode isn't handled yet!
>> 	(XEN) handle_mmio: failed to decode instruction
>> 	(XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
>> 	(XEN) domain_crash_sync called from platform.c:880
>This looks like a problem with hvm_copy.  Is this a PAE hypervisor?
>
Your patch is based on Cset 10735, before applied the patch, I can start and run the image with no problems; but after the patch, this problem can be reproduced every time.
It's not a PAE hypervisor, and qemu log doesn't show much information:
	domid: 1
	qemu: the number of cpus is 1
	shared page at pfn:1ffff, mfn: 3e35f
	char device redirected to /dev/pts/2


>> Meanwhile, I don't experience any problems for Linux guest. Do you
>> have any ideas why this happens?
>Some kind of race would be my first guess.
>
>Steven.

Best regards,
Qing

^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-08-02 10:35 He, Qing
  2006-08-03  6:59 ` Himanshu Raj
  0 siblings, 1 reply; 34+ messages in thread
From: He, Qing @ 2006-08-02 10:35 UTC (permalink / raw)
  To: Steven Smith, xen-devel; +Cc: sos22

[-- Attachment #1: Type: text/plain, Size: 1149 bytes --]

Thanks Steven, with this patch, the problem's gone.

Qing

>-----Original Message-----
>From: Steven Smith [mailto:sos22@hermes.cam.ac.uk] On Behalf Of Steven Smith
>Sent: 2006年8月2日 18:16
>To: He, Qing
>Cc: Steven Smith; sos22@srcf.ucam.org; xen-devel@xensource.com
>Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
>
>> >> 	(XEN) 0, This opcode isn't handled yet!
>> >> 	(XEN) handle_mmio: failed to decode instruction
>> >> 	(XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
>> >> 	(XEN) domain_crash_sync called from platform.c:880
>> >This looks like a problem with hvm_copy.  Is this a PAE hypervisor?
>> >
>> Your patch is based on Cset 10735, before applied the patch, I can
>>start and run the image with no problems; but after the patch, this
>>problem can be reproduced every time.
>Sorry, I wasn't trying to shift blame here: the patch I posted
>includes some changes to hvm_copy in the non-PAE case, and I suspect
>that it's those which are causing these problems.  Does the attached
>patch help?
>
>(Apply it over the top of the ones I posted previously)
>
>Steven

[-- Attachment #2: unoptimise.hvm_copy.diff --]
[-- Type: application/octet-stream, Size: 1950 bytes --]

diff -r 8ca23cd6190f xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h	Wed Aug 02 11:09:15 2006 +0100
+++ b/xen/include/asm-x86/shadow.h	Wed Aug 02 11:09:37 2006 +0100
@@ -178,11 +178,6 @@ extern void shadow_l2_normal_pt_update(s
       ((s) >= (L2_PAGETABLE_FIRST_XEN_SLOT & (L2_PAGETABLE_ENTRIES - 1))) )
 
 extern unsigned long gva_to_gpa(unsigned long gva);
-static inline unsigned long gva_to_mfn(unsigned long gva)
-{
-    unsigned long gpa = gva_to_gpa(gva);
-    return get_mfn_from_gpfn(gpa >> PAGE_SHIFT);
-}
 
 extern void shadow_l3_normal_pt_update(struct domain *d,
                                        paddr_t pa, l3_pgentry_t l3e,
@@ -1740,32 +1735,14 @@ static inline unsigned long gva_to_gpa(u
     return l1e_get_paddr(gpte) + (gva & ~PAGE_MASK); 
 }
 
+#endif
+
 static inline unsigned long gva_to_mfn(unsigned long gva)
 {
-    l1_pgentry_t l1e;
-
-    if (__copy_from_user(&l1e, &shadow_linear_pg_table[l1_linear_offset(gva)],
-                         sizeof(l1e)) ||
-        (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
-         (_PAGE_PRESENT | _PAGE_RW) ) {
-        struct cpu_user_regs cur;
-        /* Error code -> write */
-        cur.error_code = 3;
-        cur.cs = 0; /* Ring 0 -> hypervisor */
-        cur.eflags = 0;
-        shadow_fault(gva, &cur);
-        if (__copy_from_user(&l1e,
-                             &shadow_linear_pg_table[l1_linear_offset(gva)],
-                             sizeof(l1e)) ||
-            (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
-             (_PAGE_PRESENT | _PAGE_RW) ) {
-            return 0;
-        }
-    }
-    return l1e_get_pfn(l1e);
-}
-
-#endif
+    unsigned long gpa = gva_to_gpa(gva);
+    return get_mfn_from_gpfn(gpa >> PAGE_SHIFT);
+}
+
 /************************************************************************/
 
 extern void __update_pagetables(struct vcpu *v);

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2006-08-16 13:36 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-18 12:51 Paravirtualised drivers for fully virtualised domains Steven Smith
2006-07-18 13:45 ` Ben Thomas
2006-07-18 16:00 ` Steve Ofsthun
2006-07-18 16:23   ` Mark Williamson
2006-07-18 20:34   ` Steven Smith
2006-07-18 23:24     ` Steve Ofsthun
2006-07-19  6:50       ` Gerd Hoffmann
2006-07-26 15:34 ` Steven Smith
2006-08-08  9:42   ` Steven Smith
2006-08-09 18:05     ` Steve Dobbelstein
2006-08-10 11:08     ` Paravirtualised drivers for fully virtualised domains, rev9 Steven Smith
2006-08-10 21:48       ` Steve Dobbelstein
2006-08-11 10:17         ` Steven Smith
2006-08-11 10:31           ` Harry Butterworth
2006-08-14  9:12             ` Steven Smith
2006-08-11 17:04           ` Steve Dobbelstein
2006-08-12  8:32             ` Steven Smith
2006-08-14 21:22               ` Steve Dobbelstein
2006-08-15  7:27                 ` Steven Smith
2006-08-15 22:05                   ` Steve Dobbelstein
2006-08-16 13:36         ` Steven Smith
2006-08-16 13:33       ` Paravirtualised drivers for fully virtualised domains, rev11 sos22-xen
  -- strict thread matches above, loose matches on Subject: below --
2006-07-19  4:14 Paravirtualised drivers for fully virtualised domains Ian Pratt
2006-07-26 22:35 Nakajima, Jun
2006-08-02  8:01 He, Qing
2006-08-02  9:30 ` Steven Smith
2006-08-02  8:23 Zhao, Yunfeng
2006-08-02  8:56 ` Steven Hand
2006-08-02  9:37   ` Steven Smith
2006-08-02  9:49 He, Qing
2006-08-02 10:35 He, Qing
2006-08-03  6:59 ` Himanshu Raj
2006-08-03  9:35   ` Steven Smith
2006-08-04  6:13     ` Himanshu Raj

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.