* Paravirtualised drivers for fully virtualised domains
From: Steven Smith @ 2006-07-18 12:51 UTC
  To: xen-devel; +Cc: sos22



(The list appears to have eaten my previous attempt to send this.
Apologies if you receive multiple copies.)

The attached patches allow you to use paravirtualised network and
block interfaces from fully virtualised domains, based on Intel's
patches from a few months ago.  These are significantly faster than
the equivalent ioemu devices, sometimes by more than an order of
magnitude.

These drivers are explicitly not considered by XenSource to be an
alternative to improving the performance of the ioemu devices.
Rather, work on both will continue in parallel.

To build, apply the three patches to a clean checkout of xen-unstable
and then build Xen, dom0, and the tools in the usual way.  To build
the drivers themselves, you first need to build a native kernel for
the guest, and then run

cd xen-unstable.hg/unmodified-drivers/linux-2.6
./mkbuildtree
make -C /usr/src/linux-2.6.16 M=$PWD modules

where /usr/src/linux-2.6.16 is the path to the area where you built
the guest kernel.  This should be a native kernel, and not a xenolinux
one.  You should end up with four modules.  xen-evtchn.ko should be
loaded first, followed by xenbus.ko, and then whichever of xen-vnif.ko
and xen-vbd.ko you need.  None of the modules need any arguments.
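
As a rough illustration of the load order just described (this is not
part of the patches, and module_load_order() is a hypothetical helper
name), the ordering could be written down like this:

```python
# Sketch only: module_load_order() is a hypothetical helper, not part of
# the patches.  It just encodes the ordering described in the text.
BASE_MODULES = ["xen-evtchn.ko", "xenbus.ko"]   # always first, in this order
DEVICE_MODULES = {"net": "xen-vnif.ko", "block": "xen-vbd.ko"}


def module_load_order(devices):
    """Return the .ko files to insmod, in order, for the wanted device kinds."""
    unknown = set(devices) - set(DEVICE_MODULES)
    if unknown:
        raise ValueError("unknown device kind(s): " + ", ".join(sorted(unknown)))
    # Fixed iteration order gives a stable result and drops duplicates.
    wanted = [DEVICE_MODULES[kind] for kind in ("net", "block") if kind in devices]
    return BASE_MODULES + wanted


# None of the modules take arguments, so a bare insmod per file is enough.
for ko in module_load_order(["block", "net"]):
    print("insmod " + ko)
```

The only real constraint is that xen-evtchn.ko and xenbus.ko come
first; the relative order of the two device modules is an arbitrary
choice in this sketch.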

The xm configuration syntax is exactly the same as it would be for
paravirtualised devices in a paravirtualised domain.  For a network
interface, you take your line

vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78' ]

(or whatever) and replace it with

vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78', 'bridge=xenbr0' ]

where bridge=xenbr0 should be some suitable netif configuration
string, as it would be in the PV-on-PV case.  Disk is likewise fairly
simple:

disk = [ 'file:/path/to/image,ioemu:hda,w' ]

becomes

disk = [ 'file:/path/to/image,ioemu:hda,w', 'file:/path/to/some/other/image,hde,w' ]

There is a slight complication in that the paravirtualised block
device can't share an IDE controller with an ioemu device, so if you
have an ioemu hda, the paravirtualised device must be hde or later.
This is to avoid confusing the Linux IDE driver.

Note that having a PV device doesn't imply having a corresponding
ioemu device, and vice versa.  Configuring a single backing store to
appear as both an IDE device and a paravirtualised block device is
likely to cause problems; don't do it.
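
The two disk rules above can be checked mechanically.  The following is
a minimal sketch, not anything shipped with the patches: parse_disk()
and check_disks() are hypothetical names, the parser only handles the
simple 'backend,device,mode' form used in the examples, and the "hde or
later" rule is applied literally whenever any ioemu hdX disk is present
(a simplification of the hda case described in the text).

```python
def parse_disk(spec):
    """Split 'backend,device,mode' as used in the disk = [...] list."""
    backend, device, mode = spec.split(",")
    emulated = device.startswith("ioemu:")
    name = device[len("ioemu:"):] if emulated else device
    return {"backend": backend, "name": name, "mode": mode, "emulated": emulated}


def check_disks(specs):
    """Return a list of human-readable problems (empty if none were found)."""
    disks = [parse_disk(s) for s in specs]
    problems = []
    has_ioemu_ide = any(d["emulated"] and d["name"].startswith("hd") for d in disks)
    seen_backends = set()
    for d in disks:
        # Assumption: with an ioemu IDE disk present, a PV hdX device
        # must be named hde or later.
        if (not d["emulated"] and has_ioemu_ide
                and d["name"].startswith("hd") and d["name"] < "hde"):
            problems.append("PV device %s must be hde or later" % d["name"])
        # The same backing store must not appear twice.
        if d["backend"] in seen_backends:
            problems.append("%s is configured more than once" % d["backend"])
        seen_backends.add(d["backend"])
    return problems


disk = ['file:/path/to/image,ioemu:hda,w',
        'file:/path/to/some/other/image,hde,w']
print(check_disks(disk))
```

A configuration that passes this check is not guaranteed to work; the
check only catches the two mistakes called out above.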



The patches consist of a number of big parts:

-- A version of netback and netfront which can copy packets into
   domains rather than doing page flipping.  It's much easier to make
   this work well with qemu, since the P2M table doesn't need to
   change, and it can be faster for some workloads.

   The copying interface has been confirmed to work in paravirtualised
   domains, but is currently disabled there.

-- Reworking the device model and hypervisor support so that iorequest
   completion notifications no longer go to the HVM guest's event
   channel mask.  This avoids a whole slew of really quite nasty race
   conditions.

-- Adding a new device to the qemu PCI bus which is used for
   bootstrapping the devices and getting an IRQ.

-- Support for hypercalls from HVM domains.

-- Various shims and fixes to the frontends so that they work without
   the rest of the xenolinux infrastructure.
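
To make the first item more concrete, here is a small model of how the
copying netback in the first patch decides what to do with each
received packet.  It is a sketch of the decision only, not of the grant
table operations; classify_rx() is a hypothetical name, and PAGE_SIZE
is assumed to be 4096 as on x86.

```python
PAGE_SIZE = 4096                  # assumption: 4 KiB pages, as on x86
NETIF_RXRF_copy_packet = 1 << 0   # bit 0 of the rx request flags (netif.h)


def classify_rx(rx_flags_enabled, req_flags, size, copy_delivery_offset):
    """Model the per-packet decision made in net_rx_action().

    rx_flags_enabled      -- frontend wrote use-rx-flags to xenstore
    req_flags             -- flags field of the rx request
    size                  -- packet length in bytes
    copy_delivery_offset  -- offset in the page at which data is delivered
    """
    if rx_flags_enabled and (req_flags & NETIF_RXRF_copy_packet):
        # The packet has to fit in a single page after the offset.
        if size > PAGE_SIZE - copy_delivery_offset:
            return "drop"
        return "copy"
    # Otherwise the page is transferred to the guest as before.
    return "flip"


# The frontend in the second patch advertises an offset of 16.
print(classify_rx(True, NETIF_RXRF_copy_packet, 1500, 16))
```

Frontends which never set the flag keep the existing page-flipping
behaviour.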

The patches still have a few rough edges, and they're not as easy to
understand as I'd like, but I think they should be mostly
comprehensible and reasonably stable.  The plan is to add them to
xen-unstable over the next few weeks, probably before 3.0.3, so any
testing which anyone can do would be helpful.

The Xen and tools changes are also available as a series of smaller
patches at http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/hvm_xen .  The
composition of these gives hvm_xen_unstable.diff.

Steven.

[-- Attachment #1.1.2: copy_netif.diff --]
[-- Type: text/plain, Size: 15145 bytes --]

# HG changeset patch
# User sos22@douglas.cl.cam.ac.uk
# Date 1153175686 -3600
# Node ID 7053592c928b488b0c653fb25ce6f73bc6deeb05
# Parent  4726fd416506a34da96888bac0e7c9772c5037e8
Copying netback.

diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/drivers/xen/netback/common.h
--- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h	Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h	Mon Jul 17 23:34:46 2006 +0100
@@ -59,6 +59,8 @@ typedef struct netif_st {
 	/* Unique identifier for this interface. */
 	domid_t          domid;
 	unsigned int     handle;
+	unsigned int     rx_flags;
+	unsigned int     copy_delivery_offset;
 
 	u8               fe_dev_addr[6];
 
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c	Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c	Mon Jul 17 23:34:46 2006 +0100
@@ -63,13 +63,17 @@ static struct timer_list net_timer;
 #define MAX_PENDING_REQS 256
 
 static struct sk_buff_head rx_queue;
-static multicall_entry_t rx_mcl[NET_RX_RING_SIZE+1];
+static multicall_entry_t rx_mcl[NET_RX_RING_SIZE+3];
 static mmu_update_t rx_mmu[NET_RX_RING_SIZE];
-static gnttab_transfer_t grant_rx_op[NET_RX_RING_SIZE];
+static gnttab_transfer_t grant_rx_trans_op[NET_RX_RING_SIZE];
+static gnttab_map_grant_ref_t grant_rx_map_op[NET_RX_RING_SIZE];
+static gnttab_unmap_grant_ref_t grant_rx_unmap_op[NET_RX_RING_SIZE];
 static unsigned char rx_notify[NR_IRQS];
 
 static unsigned long mmap_vstart;
 #define MMAP_VADDR(_req) (mmap_vstart + ((_req) * PAGE_SIZE))
+
+static void *rx_mmap_area;
 
 #define PKT_PROT_LEN 64
 
@@ -96,13 +100,12 @@ static struct list_head net_schedule_lis
 static struct list_head net_schedule_list;
 static spinlock_t net_schedule_list_lock;
 
+static unsigned long alloc_mfn(void)
+{
 #define MAX_MFN_ALLOC 64
-static unsigned long mfn_list[MAX_MFN_ALLOC];
-static unsigned int alloc_index = 0;
-static DEFINE_SPINLOCK(mfn_lock);
-
-static unsigned long alloc_mfn(void)
-{
+	static unsigned long mfn_list[MAX_MFN_ALLOC];
+	static unsigned int alloc_index = 0;
+	static DEFINE_SPINLOCK(mfn_lock);
 	unsigned long mfn = 0, flags;
 	struct xen_memory_reservation reservation = {
 		.nr_extents   = MAX_MFN_ALLOC,
@@ -218,73 +221,122 @@ static void net_rx_action(unsigned long 
 	u16 size, id, irq, flags;
 	multicall_entry_t *mcl;
 	mmu_update_t *mmu;
-	gnttab_transfer_t *gop;
+	gnttab_transfer_t *flip_gop;
+	gnttab_map_grant_ref_t *map_gop;
+	gnttab_unmap_grant_ref_t *unmap_gop;
 	unsigned long vdata, old_mfn, new_mfn;
-	struct sk_buff_head rxq;
+	struct sk_buff_head flip_rxq, copy_rxq;
 	struct sk_buff *skb;
 	u16 notify_list[NET_RX_RING_SIZE];
 	int notify_nr = 0;
 	int ret;
-
-	skb_queue_head_init(&rxq);
+	void *rx_mmap_ptr;
+	netif_rx_request_t *rx_req_p;
+	void *remote_data;
+
+	skb_queue_head_init(&flip_rxq);
+	skb_queue_head_init(&copy_rxq);
 
 	mcl = rx_mcl;
 	mmu = rx_mmu;
-	gop = grant_rx_op;
-
+	flip_gop = grant_rx_trans_op;
+	map_gop = grant_rx_map_op;
+	rx_mmap_ptr = rx_mmap_area;
+
+	/* Split the incoming skbs according to whether they need to
+	   be page flipped or copied, and build up the first set of
+	   hypercall arguments. */
 	while ((skb = skb_dequeue(&rx_queue)) != NULL) {
 		netif   = netdev_priv(skb->dev);
-		vdata   = (unsigned long)skb->data;
-		old_mfn = virt_to_mfn(vdata);
-
-		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-			/* Memory squeeze? Back off for an arbitrary while. */
-			if ((new_mfn = alloc_mfn()) == 0) {
-				if ( net_ratelimit() )
-					WPRINTK("Memory squeeze in netback "
-						"driver.\n");
-				mod_timer(&net_timer, jiffies + HZ);
-				skb_queue_head(&rx_queue, skb);
+		size    = skb->tail - skb->data;
+		rx_req_p = RING_GET_REQUEST(&netif->rx,
+					    netif->rx.req_cons);
+
+		if (netif->rx_flags &&
+		    (rx_req_p->flags & NETIF_RXRF_copy_packet)) {
+			if (map_gop - grant_rx_map_op ==
+			    ARRAY_SIZE(grant_rx_map_op))
 				break;
+			if (size > PAGE_SIZE - netif->copy_delivery_offset) {
+				if (net_ratelimit()) {
+					printk("Discarding jumbogram to copying interface\n");
+				}
+				netif_put(netif);
+				dev_kfree_skb(skb);
+				continue;
 			}
-			/*
-			 * Set the new P2M table entry before reassigning
-			 * the old data page. Heed the comment in
-			 * pgtable-2level.h:pte_page(). :-)
-			 */
-			set_phys_to_machine(
-				__pa(skb->data) >> PAGE_SHIFT,
-				new_mfn);
-
-			MULTI_update_va_mapping(mcl, vdata,
-						pfn_pte_ma(new_mfn,
-							   PAGE_KERNEL), 0);
-			mcl++;
-
-			mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) |
-				MMU_MACHPHYS_UPDATE;
-			mmu->val = __pa(vdata) >> PAGE_SHIFT;
-			mmu++;
-		}
-
-		gop->mfn = old_mfn;
-		gop->domid = netif->domid;
-		gop->ref = RING_GET_REQUEST(
-			&netif->rx, netif->rx.req_cons)->gref;
-		netif->rx.req_cons++;
-		gop++;
-
-		__skb_queue_tail(&rxq, skb);
-
-		/* Filled the batch queue? */
-		if ((gop - grant_rx_op) == ARRAY_SIZE(grant_rx_op))
-			break;
-	}
-
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		if (mcl == rx_mcl)
-			return;
-
+			map_gop->host_addr = (unsigned long)rx_mmap_ptr;
+			map_gop->dom       = netif->domid;
+			map_gop->ref       = rx_req_p->gref;
+			map_gop->flags     = GNTMAP_host_map;
+			map_gop++;
+			rx_mmap_ptr += PAGE_SIZE;
+
+			memcpy(skb->cb, rx_req_p, sizeof(*rx_req_p));
+
+			netif->rx.req_cons++;
+			__skb_queue_tail(&copy_rxq, skb);
+		} else {
+			/* Filled the batch queue? */
+			if ((flip_gop - grant_rx_trans_op) ==
+			    ARRAY_SIZE(grant_rx_trans_op))
+				break;
+
+			vdata   = (unsigned long)skb->data;
+			old_mfn = virt_to_mfn(vdata);
+
+			if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+				/* Memory squeeze? Back off for an
+				 * arbitrary while. */
+				if ((new_mfn = alloc_mfn()) == 0) {
+					if ( net_ratelimit() )
+						WPRINTK("Memory squeeze in netback "
+							"driver.\n");
+					mod_timer(&net_timer, jiffies + HZ);
+					skb_queue_head(&rx_queue, skb);
+					break;
+				}
+				/*
+				 * Set the new P2M table entry before
+				 * reassigning the old data page. Heed
+				 * the comment in
+				 * pgtable-2level.h:pte_page(). :-)
+				 */
+				set_phys_to_machine(
+					__pa(skb->data) >> PAGE_SHIFT,
+					new_mfn);
+
+				MULTI_update_va_mapping(mcl, vdata,
+							pfn_pte_ma(new_mfn,
+								   PAGE_KERNEL), 0);
+				mcl++;
+
+				mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) |
+					MMU_MACHPHYS_UPDATE;
+				mmu->val = __pa(vdata) >> PAGE_SHIFT;
+				mmu++;
+			}
+
+			flip_gop->mfn   = old_mfn;
+			flip_gop->domid = netif->domid;
+			flip_gop->ref   = rx_req_p->gref;
+			flip_gop++;
+
+			netif->rx.req_cons++;
+			__skb_queue_tail(&flip_rxq, skb);
+		}
+
+		netif->stats.tx_bytes += size;
+		netif->stats.tx_packets++;
+	}
+
+	if (flip_gop == grant_rx_trans_op && map_gop == grant_rx_map_op) {
+		/* Nothing to do */
+		return;
+	}
+
+	if (mcl != rx_mcl) {
+		/* Did some unmaps -> need a TLB flush */
 		mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL;
 
 		if (mmu - rx_mmu) {
@@ -296,26 +348,32 @@ static void net_rx_action(unsigned long 
 			mcl++;
 		}
 
-		ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl);
-		BUG_ON(ret != 0);
-	}
-
-	ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, 
-					gop - grant_rx_op);
+		BUG_ON(flip_gop == grant_rx_trans_op);
+		MULTI_grant_table_op(mcl, GNTTABOP_transfer,
+				     grant_rx_trans_op,
+				     flip_gop - grant_rx_trans_op);
+		mcl++;
+	}
+	if (map_gop != grant_rx_map_op) {
+		MULTI_grant_table_op(mcl, GNTTABOP_map_grant_ref,
+				     grant_rx_map_op,
+				     map_gop - grant_rx_map_op);
+		mcl++;
+	}
+
+	ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl);
 	BUG_ON(ret != 0);
 
+	/* Now do all of the page flips */
 	mcl = rx_mcl;
-	gop = grant_rx_op;
-	while ((skb = __skb_dequeue(&rxq)) != NULL) {
+	flip_gop = grant_rx_trans_op;
+	while ((skb = __skb_dequeue(&flip_rxq)) != NULL) {
 		netif   = netdev_priv(skb->dev);
 		size    = skb->tail - skb->data;
 
 		atomic_set(&(skb_shinfo(skb)->dataref), 1);
 		skb_shinfo(skb)->nr_frags = 0;
 		skb_shinfo(skb)->frag_list = NULL;
-
-		netif->stats.tx_bytes += size;
-		netif->stats.tx_packets++;
 
 		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
 			/* The update_va_mapping() must not fail. */
@@ -325,14 +383,14 @@ static void net_rx_action(unsigned long 
 
 		/* Check the reassignment error code. */
 		status = NETIF_RSP_OKAY;
-		if (gop->status != 0) { 
+		if (flip_gop->status != 0) { 
 			DPRINTK("Bad status %d from grant transfer to DOM%u\n",
-				gop->status, netif->domid);
+				flip_gop->status, netif->domid);
 			/*
 			 * Page no longer belongs to us unless GNTST_bad_page,
 			 * but that should be a fatal error anyway.
 			 */
-			BUG_ON(gop->status == GNTST_bad_page);
+			BUG_ON(flip_gop->status == GNTST_bad_page);
 			status = NETIF_RSP_ERROR; 
 		}
 		irq = netif->irq;
@@ -352,7 +410,72 @@ static void net_rx_action(unsigned long 
 
 		netif_put(netif);
 		dev_kfree_skb(skb);
-		gop++;
+		flip_gop++;
+	}
+
+	/* Now do all of the copies */
+	map_gop = grant_rx_map_op;
+	unmap_gop = grant_rx_unmap_op;
+	skb = ((struct sk_buff *)&copy_rxq)->next;
+	while (skb != (struct sk_buff *)&copy_rxq) {
+		netif = netdev_priv(skb->dev);
+		size  = skb->tail - skb->data;
+
+		rx_req_p = (netif_rx_request_t *)skb->cb;
+
+		if (map_gop->status == 0) {
+			remote_data =
+				(void *)(unsigned long)map_gop->host_addr;
+			memcpy(remote_data + 16,
+			       skb->data,
+			       size);
+			unmap_gop->host_addr    = map_gop->host_addr;
+			unmap_gop->dev_bus_addr = 0;
+			unmap_gop->handle       = map_gop->handle;
+			unmap_gop++;
+		}
+
+		map_gop++;
+		skb = skb->next;
+	}
+
+	/* Unmap the packets we just copied into */
+	if (unmap_gop != grant_rx_unmap_op) {
+		ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+						grant_rx_unmap_op,
+						unmap_gop - grant_rx_unmap_op);
+		BUG_ON(ret);
+		map_gop = grant_rx_map_op;
+		/* And notify the other side. */
+		while ((skb = __skb_dequeue(&copy_rxq)) != NULL) {
+			netif = netdev_priv(skb->dev);
+			rx_req_p = (netif_rx_request_t *)skb->cb;
+
+			flags = 0;
+			if (skb->ip_summed == CHECKSUM_HW)
+				flags |= (NETRXF_csum_blank |
+					  NETRXF_data_validated);
+			else if (skb->proto_data_valid)
+				flags |= NETRXF_data_validated;
+
+			if (map_gop->status)
+				status = NETIF_RSP_ERROR;
+			else
+				status = NETIF_RSP_OKAY;
+
+			irq = netif->irq;
+			if (make_rx_response(netif, rx_req_p->id, status,
+					     netif->copy_delivery_offset, size,
+					     flags) &&
+			    rx_notify[irq] == 0) {
+				rx_notify[irq] = 1;
+				notify_list[notify_nr++] = irq;
+			}
+
+			netif_put(netif);
+			dev_kfree_skb(skb);
+			map_gop++;
+		}
 	}
 
 	while (notify_nr != 0) {
@@ -966,6 +1089,12 @@ static void netif_page_release(struct pa
 	set_page_count(page, 1);
 
 	netif_idx_release(pending_idx);
+}
+
+static void netif_rx_page_release(struct page *page)
+{
+	/* Ready for next use. */
+	set_page_count(page, 1);
 }
 
 irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs)
@@ -1093,6 +1222,16 @@ static int __init netback_init(void)
 		SetPageForeign(page, netif_page_release);
 	}
 
+	page = balloon_alloc_empty_page_range(NET_RX_RING_SIZE);
+	BUG_ON(page == NULL);
+	rx_mmap_area = pfn_to_kaddr(page_to_pfn(page));
+
+	for (i = 0; i < NET_RX_RING_SIZE; i++) {
+		page = virt_to_page(rx_mmap_area + (i * PAGE_SIZE));
+		set_page_count(page, 1);
+		SetPageForeign(page, netif_rx_page_release);
+	}
+
 	pending_cons = 0;
 	pending_prod = MAX_PENDING_REQS;
 	for (i = 0; i < MAX_PENDING_REQS; i++)
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c	Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c	Mon Jul 17 23:34:46 2006 +0100
@@ -110,6 +110,18 @@ static int netback_probe(struct xenbus_d
 		}
 #endif
 
+		err = xenbus_printf(xbt, dev->nodename, "feature-rx-copy", "%d", 1);
+		if (err) {
+			message = "writing feature-rx-copy";
+			goto abort_transaction;
+		}
+
+		err = xenbus_printf(xbt, dev->nodename, "feature-rx-flags", "%d", 1);
+		if (err) {
+			message = "writing feature-rx-flags";
+			goto abort_transaction;
+		}
+
 		err = xenbus_transaction_end(xbt, 0);
 	} while (err == -EAGAIN);
 
@@ -363,6 +375,30 @@ static int connect_rings(struct backend_
 	if (err) {
 		xenbus_dev_fatal(dev, err,
 				 "reading %s/ring-ref and event-channel",
+				 dev->otherend);
+		return err;
+	}
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend,
+			   "use-rx-flags", "%u",
+			   &be->netif->rx_flags);
+	if (err == -ENOENT) {
+		be->netif->rx_flags = 0;
+	} else if (err < 0) {
+		xenbus_dev_fatal(dev, err,
+				 "reading %s/use-rx-flags",
+				 dev->otherend);
+		return err;
+	}
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend,
+			   "copy-delivery-offset", "%u",
+			   &be->netif->copy_delivery_offset);
+	if (err == -ENOENT) {
+		be->netif->copy_delivery_offset = 0;
+	} else if (err < 0) {
+		xenbus_dev_fatal(dev, err,
+				 "reading %s/copy-delivery-offset",
 				 dev->otherend);
 		return err;
 	}
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypervisor.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypervisor.h	Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypervisor.h	Mon Jul 17 23:34:46 2006 +0100
@@ -200,6 +200,16 @@ MULTI_update_va_mapping(
 }
 
 static inline void
+MULTI_grant_table_op(multicall_entry_t *mcl, unsigned int cmd,
+		     void *uop, unsigned int count)
+{
+    mcl->op = __HYPERVISOR_grant_table_op;
+    mcl->args[0] = cmd;
+    mcl->args[1] = (unsigned long)uop;
+    mcl->args[2] = count;
+}
+
+static inline void
 MULTI_update_va_mapping_otherdomain(
     multicall_entry_t *mcl, unsigned long va,
     pte_t new_val, unsigned long flags, domid_t domid)
diff -r 4726fd416506 -r 7053592c928b xen/include/public/io/netif.h
--- a/xen/include/public/io/netif.h	Mon Jul 17 22:55:34 2006 +0100
+++ b/xen/include/public/io/netif.h	Mon Jul 17 23:34:46 2006 +0100
@@ -109,8 +109,12 @@ struct netif_tx_response {
 };
 typedef struct netif_tx_response netif_tx_response_t;
 
+#define _NETIF_RXRF_copy_packet (0)
+#define  NETIF_RXRF_copy_packet (1U<<_NETIF_RXRF_copy_packet)
+
 struct netif_rx_request {
     uint16_t    id;        /* Echoed in response message.        */
+    uint16_t    flags;     /* NETIF_RXRF_* */
     grant_ref_t gref;      /* Reference to incoming granted frame */
 };
 typedef struct netif_rx_request netif_rx_request_t;

[-- Attachment #1.1.3: frontend_changes.diff --]
[-- Type: text/plain, Size: 66799 bytes --]

# HG changeset patch
# User sos22@douglas.cl.cam.ac.uk
# Date 1153175939 -3600
# Node ID aa3087ee5769d60d5ab1e368cc062233d364ec8b
# Parent  7053592c928b488b0c653fb25ce6f73bc6deeb05
Frontend parts of PV-on-HVM patches.

diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c	Mon Jul 17 23:38:59 2006 +0100
@@ -46,6 +46,7 @@
 #include <xen/interface/grant_table.h>
 #include <xen/gnttab.h>
 #include <asm/hypervisor.h>
+#include <asm/maddr.h>
 
 #define BLKIF_STATE_DISCONNECTED 0
 #define BLKIF_STATE_CONNECTED    1
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/core/gnttab.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/gnttab.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/core/gnttab.c	Mon Jul 17 23:38:59 2006 +0100
@@ -41,6 +41,13 @@
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
 #include <asm/synch_bitops.h>
+#include <asm/maddr.h>
+#include <xen/interface/memory.h>
+
+#ifndef CONFIG_XEN
+#include <asm/io.h>
+#include <evtchn-pci.h>
+#endif
 
 /* External tools reserve first few grant table entries. */
 #define NR_RESERVED_ENTRIES 8
@@ -350,6 +357,7 @@ void gnttab_cancel_free_callback(struct 
 }
 EXPORT_SYMBOL_GPL(gnttab_cancel_free_callback);
 
+#ifdef CONFIG_XEN
 #ifndef __ia64__
 static int map_pte_fn(pte_t *pte, struct page *pmd_page,
 		      unsigned long addr, void *data)
@@ -404,23 +412,49 @@ int gnttab_resume(void)
 	shared = __va(frames[0] << PAGE_SHIFT);
 	printk("grant table at %p\n", shared);
 #endif
-
-	return 0;
-}
+}
+#else /* !CONFIG_XEN */
+int
+gnttab_resume(void)
+{
+	unsigned long frames;
+	int x;
+	struct xen_add_to_physmap xatp;
+
+	frames = alloc_xen_mmio(PAGE_SIZE * NR_GRANT_FRAMES);
+	shared = ioremap(frames, PAGE_SIZE * NR_GRANT_FRAMES);
+	if(!shared){
+		printk("error to ioremap gnttab share frames\n");
+		return -1;
+	}
+	for (x = 0; x < NR_GRANT_FRAMES; x++) {
+		xatp.domid = DOMID_SELF;
+		xatp.idx = x;
+		xatp.space = XENMAPSPACE_grant_table;
+		xatp.gpfn = (frames >> PAGE_SHIFT) + x;
+		BUG_ON(HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp));
+	}
+	return 0;
+}
+#endif
 
 int gnttab_suspend(void)
 {
 
 #ifndef __ia64__
+#ifdef CONFIG_XEN
 	apply_to_page_range(&init_mm, (unsigned long)shared,
 			    PAGE_SIZE * NR_GRANT_FRAMES,
 			    unmap_pte_fn, NULL);
-#endif
-
-	return 0;
-}
-
-static int __init gnttab_init(void)
+#else
+	iounmap(shared);
+#endif
+#endif
+
+	return 0;
+}
+
+int __init gnttab_init(void)
 {
 	int i;
 
@@ -439,4 +473,6 @@ static int __init gnttab_init(void)
 	return 0;
 }
 
+#ifdef CONFIG_XEN
 core_initcall(gnttab_init);
+#endif
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/core/xen_proc.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/xen_proc.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/core/xen_proc.c	Mon Jul 17 23:38:59 2006 +0100
@@ -1,4 +1,5 @@
 
+#include <linux/module.h>
 #include <linux/config.h>
 #include <linux/proc_fs.h>
 #include <xen/xen_proc.h>
@@ -12,6 +13,7 @@ struct proc_dir_entry *create_xen_proc_e
 			panic("Couldn't create /proc/xen");
 	return create_proc_entry(name, mode, xen_base);
 }
+EXPORT_SYMBOL(create_xen_proc_entry);
 
 void remove_xen_proc_entry(const char *name)
 {
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c	Mon Jul 17 23:38:59 2006 +0100
@@ -61,6 +61,25 @@
 #include <asm/uaccess.h>
 #include <xen/interface/grant_table.h>
 #include <xen/gnttab.h>
+#include <asm/maddr.h>
+
+/* If we don't have GSO, fake things up so that we never try to use
+   it */
+#ifndef NETIF_F_GSO
+#define netif_needs_gso(dev, skb) 0
+#define NETIF_F_GSO_ROBUST 0
+#define NETIF_F_GSO_SHIFT 16
+#else
+#define HAVE_GSO
+#endif
+
+#ifdef CONFIG_XEN
+#define SKB_PROTO_DATA_VALID(skb) (skb)->proto_data_valid
+#define SET_SKB_PROTO_DATA_VALID(skb, v) do { (skb)->proto_data_valid = (v); } while (0)
+#else
+#define SKB_PROTO_DATA_VALID(skb) 0
+#define SET_SKB_PROTO_DATA_VALID(skb, v) do {} while (0)
+#endif
 
 #define GRANT_INVALID_REF	0
 
@@ -88,6 +107,7 @@ struct netfront_info {
 
 	unsigned int handle;
 	unsigned int evtchn, irq;
+	unsigned int copyall;
 
 	/* Receive-ring batched refills. */
 #define RX_MIN_TARGET 8
@@ -148,7 +168,7 @@ static inline unsigned short get_id_from
 
 static int talk_to_backend(struct xenbus_device *, struct netfront_info *);
 static int setup_device(struct xenbus_device *, struct netfront_info *);
-static struct net_device *create_netdev(int, struct xenbus_device *);
+static struct net_device *create_netdev(int, int, struct xenbus_device *);
 
 static void netfront_closing(struct xenbus_device *);
 
@@ -190,14 +210,41 @@ static int __devinit netfront_probe(stru
 	struct net_device *netdev;
 	struct netfront_info *info;
 	unsigned int handle;
+#ifndef CONFIG_XEN
+	unsigned feature_rx_flags;
+#endif
+	unsigned feature_rx_copy;
 
 	err = xenbus_scanf(XBT_NIL, dev->nodename, "handle", "%u", &handle);
 	if (err != 1) {
 		xenbus_dev_fatal(dev, err, "reading handle");
 		return err;
 	}
-
-	netdev = create_netdev(handle, dev);
+#ifndef CONFIG_XEN
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "feature-rx-flags", "%u",
+			   &feature_rx_flags);
+	if (err == 1) {
+		err = xenbus_scanf(XBT_NIL,
+				   dev->otherend,
+				   "feature-rx-copy",
+				   "%u",
+				   &feature_rx_copy);
+		if (err != 1) {
+			feature_rx_copy = 0;
+			err = -EINVAL;
+		}
+	} else {
+		feature_rx_copy = feature_rx_flags = 0;
+	}
+	if (!feature_rx_copy) {
+		xenbus_dev_fatal(dev, err, "need a copy-capable backend");
+		return err;
+	}
+#else
+	feature_rx_copy = 0;
+#endif
+
+	netdev = create_netdev(handle, feature_rx_copy, dev);
 	if (IS_ERR(netdev)) {
 		err = PTR_ERR(netdev);
 		xenbus_dev_fatal(dev, err, "creating netdev");
@@ -300,6 +347,19 @@ again:
 			    "event-channel", "%u", info->evtchn);
 	if (err) {
 		message = "writing event-channel";
+		goto abort_transaction;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "use-rx-flags", "%u", 1);
+	if (err) {
+		message = "writing use-rx-flags";
+		goto abort_transaction;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "copy-delivery-offset", "%u",
+			    16);
+	if (err) {
+		message = "writing copy-delivery-offset";
 		goto abort_transaction;
 	}
 
@@ -550,6 +610,8 @@ static void network_alloc_rx_buffers(str
 	RING_IDX req_prod = np->rx.req_prod_pvt;
 	struct xen_memory_reservation reservation;
 	grant_ref_t ref;
+	netif_rx_request_t *req;
+	int nr_flips;
 
 	if (unlikely(!netif_carrier_ok(dev)))
 		return;
@@ -592,7 +654,7 @@ static void network_alloc_rx_buffers(str
 		np->rx_target = np->rx_max_target;
 
  refill:
-	for (i = 0; ; i++) {
+	for (nr_flips = i = 0; ; i++) {
 		if ((skb = __skb_dequeue(&np->rx_batch)) == NULL)
 			break;
 
@@ -602,17 +664,78 @@ static void network_alloc_rx_buffers(str
 
 		np->rx_skbs[id] = skb;
 
-		RING_GET_REQUEST(&np->rx, req_prod + i)->id = id;
 		ref = gnttab_claim_grant_reference(&np->gref_rx_head);
 		BUG_ON((signed short)ref < 0);
 		np->grant_rx_ref[id] = ref;
-		gnttab_grant_foreign_transfer_ref(ref,
-						  np->xbdev->otherend_id,
-						  __pa(skb->head)>>PAGE_SHIFT);
-		RING_GET_REQUEST(&np->rx, req_prod + i)->gref = ref;
-		np->rx_pfn_array[i] = virt_to_mfn(skb->head);
+
+		req = RING_GET_REQUEST(&np->rx, req_prod + i);
+		if ( !np->copyall ) {
+			gnttab_grant_foreign_transfer_ref(ref,
+							  np->xbdev->otherend_id,
+							  __pa(skb->head) >> PAGE_SHIFT);
+			np->rx_pfn_array[nr_flips] = virt_to_mfn(skb->head);
+
+			if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+				/* Remove this page from map before
+				 * passing back to Xen. */
+				set_phys_to_machine(__pa(skb->head) >>
+						    PAGE_SHIFT,
+						    INVALID_P2M_ENTRY);
+
+				MULTI_update_va_mapping(np->rx_mcl+nr_flips,
+						      (unsigned long)skb->head,
+							__pte(0), 0);
+			}
+			nr_flips++;
+			req->flags = 0;
+		} else {
+			gnttab_grant_foreign_access_ref(ref,
+							np->xbdev->otherend_id,
+							virt_to_mfn(skb->head),
+							0);
+			req->flags = NETIF_RXRF_copy_packet;
+		}
+		req->gref = ref;
+		req->id = id;
+	}
+
+	if ( nr_flips != 0 ) {
+		set_xen_guest_handle(reservation.extent_start,
+				     np->rx_pfn_array);
+		reservation.nr_extents   = nr_flips;
+		reservation.extent_order = 0;
+		reservation.address_bits = 0;
+		reservation.domid        = DOMID_SELF;
+
+		/* Tell the balloon driver what is going on. */
+		balloon_update_driver_allowance(nr_flips);
 
 		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			/* After all PTEs have been zapped, flush the
+			 * TLB. */
+			np->rx_mcl[nr_flips-1].args[MULTI_UVMFLAGS_INDEX] =
+				UVMF_TLB_FLUSH|UVMF_ALL;
+
+			/* Give away a batch of pages. */
+			np->rx_mcl[nr_flips].op = __HYPERVISOR_memory_op;
+			np->rx_mcl[nr_flips].args[0] =
+				XENMEM_decrease_reservation;
+			np->rx_mcl[nr_flips].args[1] =
+				(unsigned long)&reservation;
+
+			/* Zap PTEs and give away pages in one big
+			 * multicall. */
+			(void)HYPERVISOR_multicall(np->rx_mcl, nr_flips + 1);
+
+			/* Check return status of
+			 * HYPERVISOR_memory_op(). */
+			if (unlikely(np->rx_mcl[nr_flips].result != nr_flips))
+				panic("Unable to reduce memory reservation (%ld,%d)\n",
+				      np->rx_mcl[nr_flips].result, nr_flips);
+		} else {
+			if (HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+						 &reservation) != i)
+				panic("Unable to reduce memory reservation\n");
 			/* Remove this page before passing back to Xen. */
 			set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT,
 					    INVALID_P2M_ENTRY);
@@ -620,37 +743,9 @@ static void network_alloc_rx_buffers(str
 						(unsigned long)skb->head,
 						__pte(0), 0);
 		}
-	}
-
-	/* Tell the ballon driver what is going on. */
-	balloon_update_driver_allowance(i);
-
-	set_xen_guest_handle(reservation.extent_start, np->rx_pfn_array);
-	reservation.nr_extents   = i;
-	reservation.extent_order = 0;
-	reservation.address_bits = 0;
-	reservation.domid        = DOMID_SELF;
-
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		/* After all PTEs have been zapped, flush the TLB. */
-		np->rx_mcl[i-1].args[MULTI_UVMFLAGS_INDEX] =
-			UVMF_TLB_FLUSH|UVMF_ALL;
-
-		/* Give away a batch of pages. */
-		np->rx_mcl[i].op = __HYPERVISOR_memory_op;
-		np->rx_mcl[i].args[0] = XENMEM_decrease_reservation;
-		np->rx_mcl[i].args[1] = (unsigned long)&reservation;
-
-		/* Zap PTEs and give away pages in one big multicall. */
-		(void)HYPERVISOR_multicall(np->rx_mcl, i+1);
-
-		/* Check return status of HYPERVISOR_memory_op(). */
-		if (unlikely(np->rx_mcl[i].result != i))
-			panic("Unable to reduce memory reservation\n");
-	} else
-		if (HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-					 &reservation) != i)
-			panic("Unable to reduce memory reservation\n");
+	} else {
+		wmb();
+	}
 
 	/* Above is a suitable barrier to ensure backend will see requests. */
 	np->rx.req_prod_pvt = req_prod + i;
@@ -774,9 +869,10 @@ static int network_start_xmit(struct sk_
 
 	if (skb->ip_summed == CHECKSUM_HW) /* local packet? */
 		tx->flags |= NETTXF_csum_blank | NETTXF_data_validated;
-	if (skb->proto_data_valid) /* remote but checksummed? */
+	if (SKB_PROTO_DATA_VALID(skb)) /* remote but checksummed? */
 		tx->flags |= NETTXF_data_validated;
 
+#ifdef HAVE_GSO
 	if (skb_shinfo(skb)->gso_size) {
 		struct netif_extra_info *gso = (struct netif_extra_info *)
 			RING_GET_REQUEST(&np->tx, ++i);
@@ -793,6 +889,7 @@ static int network_start_xmit(struct sk_
 		gso->flags = 0;
 		extra = gso;
 	}
+#endif
 
 	np->tx.req_prod_pvt = i + 1;
 
@@ -852,6 +949,8 @@ static int netif_poll(struct net_device 
 	unsigned long flags;
 	unsigned long mfn;
 	grant_ref_t ref;
+	unsigned long ret;
+	netif_rx_request_t *req;
 
 	spin_lock(&np->rx_lock);
 
@@ -883,25 +982,50 @@ static int netif_poll(struct net_device 
 			continue;
 		}
 
-		/* Memory pressure, insufficient buffer headroom, ... */
-		if ((mfn = gnttab_end_foreign_transfer_ref(ref)) == 0) {
-			if (net_ratelimit())
-				WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n",
-					rx->id, rx->status);
-			RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id =
-				rx->id;
-			RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref =
-				ref;
-			np->rx.req_prod_pvt++;
-			RING_PUSH_REQUESTS(&np->rx);
-			work_done--;
-			continue;
+		skb = np->rx_skbs[rx->id];
+
+		if ( !np->copyall ) {
+			/* Memory pressure, insufficient buffer
+			 * headroom, ... */
+			if ((mfn = gnttab_end_foreign_transfer_ref(ref)) == 0)
+			{
+				if (net_ratelimit())
+					WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n",
+						rx->id, rx->status);
+				req = RING_GET_REQUEST(&np->rx,
+						       np->rx.req_prod_pvt);
+				req->id = rx->id;
+				req->gref = ref;
+				np->rx.req_prod_pvt++;
+				RING_PUSH_REQUESTS(&np->rx);
+				work_done--;
+				continue;
+			}
+			/* Remap the page. */
+			if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+				MULTI_update_va_mapping(mcl,
+						      (unsigned long)skb->head,
+							pfn_pte_ma(mfn,
+								  PAGE_KERNEL),
+							0);
+				mcl++;
+				mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT)
+					| MMU_MACHPHYS_UPDATE;
+				mmu->val = __pa(skb->head) >> PAGE_SHIFT;
+				mmu++;
+
+				set_phys_to_machine(__pa(skb->head)
+						    >> PAGE_SHIFT,
+						    mfn);
+			}
+		} else {
+			ret = gnttab_end_foreign_access_ref(ref, 0);
+			BUG_ON(!ret);
 		}
 
 		gnttab_release_grant_reference(&np->gref_rx_head, ref);
 		np->grant_rx_ref[rx->id] = GRANT_INVALID_REF;
 
-		skb = np->rx_skbs[rx->id];
 		add_id_to_freelist(np->rx_skbs, rx->id);
 
 		/* NB. We handle skb overflow later. */
@@ -915,30 +1039,16 @@ static int netif_poll(struct net_device 
 		 */
 		if (rx->flags & (NETRXF_data_validated|NETRXF_csum_blank)) {
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
-			skb->proto_data_valid = 1;
+			SET_SKB_PROTO_DATA_VALID(skb, 1);
 		} else {
 			skb->ip_summed = CHECKSUM_NONE;
-			skb->proto_data_valid = 0;
+			SET_SKB_PROTO_DATA_VALID(skb, 0);
 		}
+#ifdef CONFIG_XEN
 		skb->proto_csum_blank = !!(rx->flags & NETRXF_csum_blank);
-
+#endif
 		np->stats.rx_packets++;
 		np->stats.rx_bytes += rx->status;
-
-		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-			/* Remap the page. */
-			MULTI_update_va_mapping(mcl, (unsigned long)skb->head,
-						pfn_pte_ma(mfn, PAGE_KERNEL),
-						0);
-			mcl++;
-			mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT)
-				| MMU_MACHPHYS_UPDATE;
-			mmu->val = __pa(skb->head) >> PAGE_SHIFT;
-			mmu++;
-
-			set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT,
-					    mfn);
-		}
 
 		__skb_queue_tail(&rxq, skb);
 	}
@@ -996,8 +1106,11 @@ static int netif_poll(struct net_device 
 				/* Copy any other fields we already set up. */
 				nskb->dev = skb->dev;
 				nskb->ip_summed = skb->ip_summed;
-				nskb->proto_data_valid = skb->proto_data_valid;
+				SET_SKB_PROTO_DATA_VALID(nskb,
+						  SKB_PROTO_DATA_VALID(skb));
+#ifdef CONFIG_XEN
 				nskb->proto_csum_blank = skb->proto_csum_blank;
+#endif
 			}
 
 			/* Reinitialise and then destroy the old skbuff. */
@@ -1126,6 +1239,8 @@ static void network_connect(struct net_d
 	struct netfront_info *np = netdev_priv(dev);
 	int i, requeue_idx;
 	struct sk_buff *skb;
+	grant_ref_t gref;
+	netif_rx_request_t *req;
 
 	xennet_set_features(dev);
 
@@ -1159,13 +1274,21 @@ static void network_connect(struct net_d
 	for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) {
 		if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET)
 			continue;
-		gnttab_grant_foreign_transfer_ref(
-			np->grant_rx_ref[i], np->xbdev->otherend_id,
-			__pa(np->rx_skbs[i]->data) >> PAGE_SHIFT);
-		RING_GET_REQUEST(&np->rx, requeue_idx)->gref =
-			np->grant_rx_ref[i];
-		RING_GET_REQUEST(&np->rx, requeue_idx)->id = i;
-		requeue_idx++;
+		gref = np->grant_rx_ref[i];
+		skb = np->rx_skbs[i];
+		if ( !np->copyall ) {
+			gnttab_grant_foreign_transfer_ref(
+				gref, np->xbdev->otherend_id,
+				__pa(skb->data) >> PAGE_SHIFT);
+		} else {
+			gnttab_grant_foreign_access_ref(
+				gref, np->xbdev->otherend_id,
+				virt_to_mfn(skb->data), 0);
+		}
+		req = RING_GET_REQUEST(&np->rx, requeue_idx);
+		req->gref = gref;
+		req->id = i;
+		requeue_idx++;
 	}
 
 	np->rx.req_prod_pvt = requeue_idx;
@@ -1348,10 +1471,13 @@ static void network_set_multicast_list(s
 
 /** Create a network device.
  * @param handle device handle
+ * @param copyall flag; 1 if every packet must be copied, 0 if every packet
+ * must be flipped.
  * @param val return parameter for created device
  * @return 0 on success, error code otherwise
  */
 static struct net_device * __devinit create_netdev(int handle,
+						   int copyall,
 						   struct xenbus_device *dev)
 {
 	int i, err = 0;
@@ -1368,6 +1494,7 @@ static struct net_device * __devinit cre
 	np                = netdev_priv(netdev);
 	np->handle        = handle;
 	np->xbdev         = dev;
+	np->copyall       = copyall;
 
 	netif_carrier_off(netdev);
 
@@ -1418,7 +1545,11 @@ static struct net_device * __devinit cre
 	netdev->uninit          = netif_uninit;
 	netdev->change_mtu	= xennet_change_mtu;
 	netdev->weight          = 64;
+#ifdef CONFIG_XEN
 	netdev->features        = NETIF_F_IP_CSUM;
+#else
+	netdev->features        = 0;
+#endif
 
 	SET_ETHTOOL_OPS(netdev, &network_ethtool_ops);
 	SET_MODULE_OWNER(netdev);
@@ -1581,8 +1712,10 @@ static int __init netif_init(void)
 	if (!is_running_on_xen())
 		return -ENODEV;
 
+#ifdef CONFIG_XEN
 	if (xen_start_info->flags & SIF_INITDOMAIN)
 		return 0;
+#endif
 
 	IPRINTK("Initialising virtual ethernet driver.\n");
 
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c	Mon Jul 17 23:38:59 2006 +0100
@@ -39,6 +39,8 @@
 #include <xen/xenbus.h>
 #include "xenbus_comms.h"
 
+void *shared_xenstore_buf;
+
 static int xenbus_irq;
 
 extern void xenbus_probe(void *);
@@ -49,7 +51,7 @@ DECLARE_WAIT_QUEUE_HEAD(xb_waitq);
 
 static inline struct xenstore_domain_interface *xenstore_domain_interface(void)
 {
-	return mfn_to_virt(xen_start_info->store_mfn);
+	return shared_xenstore_buf;
 }
 
 static irqreturn_t wake_waiting(int irq, void *unused, struct pt_regs *regs)
@@ -129,7 +131,7 @@ int xb_write(const void *data, unsigned 
 		intf->req_prod += avail;
 
 		/* This implies mb() before other side sees interrupt. */
-		notify_remote_via_evtchn(xen_start_info->store_evtchn);
+		notify_remote_via_evtchn(xen_store_evtchn);
 	}
 
 	return 0;
@@ -180,7 +182,7 @@ int xb_read(void *data, unsigned len)
 		pr_debug("Finished read of %i bytes (%i to go)\n", avail, len);
 
 		/* Implies mb(): they will see new header. */
-		notify_remote_via_evtchn(xen_start_info->store_evtchn);
+		notify_remote_via_evtchn(xen_store_evtchn);
 	}
 
 	return 0;
@@ -195,7 +197,7 @@ int xb_init_comms(void)
 		unbind_from_irqhandler(xenbus_irq, &xb_waitq);
 
 	err = bind_evtchn_to_irqhandler(
-		xen_start_info->store_evtchn, wake_waiting,
+		xen_store_evtchn, wake_waiting,
 		0, "xenbus", &xb_waitq);
 	if (err <= 0) {
 		printk(KERN_ERR "XENBUS request irq failed %i\n", err);
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h	Mon Jul 17 23:38:59 2006 +0100
@@ -39,5 +39,7 @@ int xb_read(void *data, unsigned len);
 int xb_read(void *data, unsigned len);
 int xs_input_avail(void);
 extern wait_queue_head_t xb_waitq;
+extern void *shared_xenstore_buf;
+extern int xen_store_evtchn;
 
 #endif /* _XENBUS_COMMS_H */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c	Mon Jul 17 23:38:59 2006 +0100
@@ -48,6 +48,7 @@
 #include <xen/xenbus.h>
 #include <xen/xen_proc.h>
 #include <asm/hypervisor.h>
+#include <asm/io.h>
 
 struct xenbus_dev_transaction {
 	struct list_head list;
@@ -181,7 +182,7 @@ static int xenbus_dev_open(struct inode 
 {
 	struct xenbus_dev_data *u;
 
-	if (xen_start_info->store_evtchn == 0)
+	if (xen_store_evtchn == 0)
 		return -ENOENT;
 
 	nonseekable_open(inode, filp);
@@ -232,7 +233,7 @@ static struct file_operations xenbus_dev
 	.poll = xenbus_dev_poll,
 };
 
-static int __init
+int __init
 xenbus_dev_init(void)
 {
 	xenbus_dev_intf = create_xen_proc_entry("xenbus", 0400);
@@ -242,4 +243,6 @@ xenbus_dev_init(void)
 	return 0;
 }
 
+#ifndef MODULE
 __initcall(xenbus_dev_init);
+#endif
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c	Mon Jul 17 23:38:59 2006 +0100
@@ -44,6 +44,7 @@
 #include <linux/kthread.h>
 
 #include <asm/io.h>
+#include <asm/maddr.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/hypervisor.h>
@@ -51,8 +52,12 @@
 #include <xen/xen_proc.h>
 #include <xen/evtchn.h>
 #include <xen/features.h>
+#include <xen/hvm.h>
 
 #include "xenbus_comms.h"
+
+int xen_store_evtchn;
+static unsigned long xen_store_mfn;
 
 extern struct mutex xenwatch_mutex;
 
@@ -915,8 +920,7 @@ static int xsd_kva_mmap(struct file *fil
 	if ((size > PAGE_SIZE) || (vma->vm_pgoff != 0))
 		return -EINVAL;
 
-	if (remap_pfn_range(vma, vma->vm_start,
-			    mfn_to_pfn(xen_start_info->store_mfn),
+	if (remap_pfn_range(vma, vma->vm_start, mfn_to_pfn(xen_store_mfn),
 			    size, vma->vm_page_prot))
 		return -EAGAIN;
 
@@ -928,7 +932,7 @@ static int xsd_kva_read(char *page, char
 {
 	int len;
 
-	len  = sprintf(page, "0x%p", mfn_to_virt(xen_start_info->store_mfn));
+	len  = sprintf(page, "0x%p", mfn_to_virt(xen_store_mfn));
 	*eof = 1;
 	return len;
 }
@@ -938,12 +942,11 @@ static int xsd_port_read(char *page, cha
 {
 	int len;
 
-	len  = sprintf(page, "%d", xen_start_info->store_evtchn);
+	len  = sprintf(page, "%d", xen_store_evtchn);
 	*eof = 1;
 	return len;
 }
 #endif
-
 
 static int __init xenbus_probe_init(void)
 {
@@ -962,7 +965,11 @@ static int __init xenbus_probe_init(void
 	/*
 	 * Domain0 doesn't have a store_evtchn or store_mfn yet.
 	 */
+#ifdef CONFIG_XEN
 	dom0 = (xen_start_info->store_evtchn == 0);
+#else
+	dom0 = 0;
+#endif
 
 	if (dom0) {
 		struct evtchn_alloc_unbound alloc_unbound;
@@ -972,7 +979,7 @@ static int __init xenbus_probe_init(void
 		if (!page)
 			return -ENOMEM;
 
-		xen_start_info->store_mfn =
+		xen_store_mfn =
 			pfn_to_mfn(virt_to_phys((void *)page) >>
 				   PAGE_SHIFT);
 
@@ -985,7 +992,7 @@ static int __init xenbus_probe_init(void
 		if (err == -ENOSYS)
 			goto err;
 		BUG_ON(err);
-		xen_start_info->store_evtchn = alloc_unbound.port;
+		xen_store_evtchn = alloc_unbound.port;
 
 #ifdef CONFIG_PROC_FS
 		/* And finally publish the above info in /proc/xen */
@@ -1001,8 +1008,21 @@ static int __init xenbus_probe_init(void
 		if (xsd_port_intf)
 			xsd_port_intf->read_proc = xsd_port_read;
 #endif
-	} else
+		shared_xenstore_buf = mfn_to_virt(xen_store_mfn);
+	} else {
 		xenstored_ready = 1;
+#ifdef CONFIG_XEN
+		xen_store_evtchn = xen_start_info->store_evtchn;
+		xen_store_mfn = xen_start_info->store_mfn;
+		shared_xenstore_buf = mfn_to_virt(xen_store_mfn);
+#else
+		xen_store_evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
+		xen_store_mfn = hvm_get_parameter(HVM_PARAM_STORE_PFN);
+		shared_xenstore_buf = ioremap(xen_store_mfn << PAGE_SHIFT,
+					      PAGE_SIZE);
+		xenbus_dev_init();
+#endif
+	}
 
 	/* Initialize the interface to xenstore. */
 	err = xs_init();
@@ -1035,8 +1055,10 @@ static int __init xenbus_probe_init(void
 }
 
 postcore_initcall(xenbus_probe_init);
-
-
+MODULE_LICENSE("Dual BSD/GPL");
+
+
+#ifndef MODULE
 static int is_disconnected_device(struct device *dev, void *data)
 {
 	struct xenbus_device *xendev = to_xenbus_device(dev);
@@ -1105,3 +1127,4 @@ static int __init wait_for_devices(void)
 }
 
 late_initcall(wait_for_devices);
+#endif
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h	Mon Jul 17 23:38:59 2006 +0100
@@ -42,6 +42,7 @@
 #define __STR(x) #x
 #define STR(x) __STR(x)
 
+#ifdef CONFIG_XEN
 #define _hypercall0(type, name)			\
 ({						\
 	long __res;				\
@@ -114,6 +115,92 @@
 		: "memory" );					\
 	(type)__res;						\
 })
+#else
+#define _hypercall0(type, name)			                \
+({						                \
+	long __res;				                \
+	asm volatile (				                \
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res)			                \
+		:				                \
+		: "memory" );			                \
+	(type)__res;				                \
+})
+
+#define _hypercall1(type, name, a1)				\
+({								\
+	long __res, __ign1;					\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1)			\
+		: "1" ((long)(a1))				\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall2(type, name, a1, a2)				\
+({								\
+	long __res, __ign1, __ign2;				\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2)	\
+		: "1" ((long)(a1)), "2" ((long)(a2))		\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall3(type, name, a1, a2, a3)			\
+({								\
+	long __res, __ign1, __ign2, __ign3;			\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2), 	\
+		"=d" (__ign3)					\
+		: "1" ((long)(a1)), "2" ((long)(a2)),		\
+		"3" ((long)(a3))				\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall4(type, name, a1, a2, a3, a4)			\
+({								\
+	long __res, __ign1, __ign2, __ign3, __ign4;		\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2),	\
+		"=d" (__ign3), "=S" (__ign4)			\
+		: "1" ((long)(a1)), "2" ((long)(a2)),		\
+		"3" ((long)(a3)), "4" ((long)(a4))		\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall5(type, name, a1, a2, a3, a4, a5)		\
+({								\
+	long __res, __ign1, __ign2, __ign3, __ign4, __ign5;	\
+	asm volatile (						\
+                "movl hypercall_page, %%eax\n"                  \
+                "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+		"call *%%eax"                                   \
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2),	\
+		"=d" (__ign3), "=S" (__ign4), "=D" (__ign5)	\
+		: "1" ((long)(a1)), "2" ((long)(a2)),		\
+		"3" ((long)(a3)), "4" ((long)(a4)),		\
+		"5" ((long)(a5))				\
+		: "memory" );					\
+	(type)__res;						\
+})
+#endif
 
 static inline int
 HYPERVISOR_set_trap_table(
@@ -354,6 +441,13 @@ HYPERVISOR_nmi_op(
 	return _hypercall2(int, nmi_op, op, arg);
 }
 
+static inline unsigned long
+HYPERVISOR_hvm_op(
+    int op, void *arg)
+{
+    return _hypercall2(unsigned long, hvm_op, op, arg);
+}
+
 static inline int
 HYPERVISOR_callback_op(
 	int cmd, void *arg)
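A note for readers of the non-CONFIG_XEN `_hypercallN` variants in this file: each one loads `hypercall_page`, adds `__HYPERVISOR_<name> * 32`, and calls the result, so the page is treated as an array of 32-byte stubs indexed by hypercall number. Below is a minimal user-space sketch of just that address arithmetic. The names `stub_offset`, `stub_address`, `STUB_SIZE` and `SKETCH_PAGE_SIZE` are hypothetical and exist only for illustration, the 4 KiB page size is an assumption, and no hypercall is actually issued.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: a 4 KiB page and 32-byte stubs
 * (the "* 32" in the macros). */
#define SKETCH_PAGE_SIZE 4096u
#define STUB_SIZE        32u

/* Byte offset of hypercall number nr's stub within the page,
 * or (size_t)-1 if the stub would not fit in one page. */
static size_t stub_offset(unsigned int nr)
{
	if (nr >= SKETCH_PAGE_SIZE / STUB_SIZE)
		return (size_t)-1;
	return (size_t)nr * STUB_SIZE;
}

/* Address the macros would "call": page base + nr * 32.
 * Returns 0 if nr is out of range. */
static uintptr_t stub_address(uintptr_t page_base, unsigned int nr)
{
	size_t off = stub_offset(nr);

	return off == (size_t)-1 ? 0 : page_base + off;
}
```

Under these assumptions one page holds 128 stubs, and `stub_address(base, 2)` is `base + 64`.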
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/page.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/page.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/page.h	Mon Jul 17 23:38:59 2006 +0100
@@ -20,6 +20,7 @@
 #include <xen/interface/xen.h>
 #include <xen/features.h>
 #include <xen/foreign_page.h>
+#include <asm/maddr.h>
 
 #define arch_free_page(_page,_order)			\
 ({	int foreign = PageForeign(_page);		\
@@ -59,123 +60,6 @@
 
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
-
-/**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
-#define INVALID_P2M_ENTRY	(~0UL)
-#define FOREIGN_FRAME_BIT	(1UL<<31)
-#define FOREIGN_FRAME(m)	((m) | FOREIGN_FRAME_BIT)
-
-extern unsigned long *phys_to_machine_mapping;
-
-#undef machine_to_phys_mapping
-extern unsigned long *machine_to_phys_mapping;
-extern unsigned int   machine_to_phys_order;
-
-static inline unsigned long pfn_to_mfn(unsigned long pfn)
-{
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		return pfn;
-	return phys_to_machine_mapping[(unsigned int)(pfn)] &
-		~FOREIGN_FRAME_BIT;
-}
-
-static inline int phys_to_machine_mapping_valid(unsigned long pfn)
-{
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		return 1;
-	return (phys_to_machine_mapping[pfn] != INVALID_P2M_ENTRY);
-}
-
-static inline unsigned long mfn_to_pfn(unsigned long mfn)
-{
-	extern unsigned long max_mapnr;
-	unsigned long pfn;
-
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		return mfn;
-
-	if (unlikely((mfn >> machine_to_phys_order) != 0))
-		return max_mapnr;
-
-	/* The array access can fail (e.g., device space beyond end of RAM). */
-	asm (
-		"1:	movl %1,%0\n"
-		"2:\n"
-		".section .fixup,\"ax\"\n"
-		"3:	movl %2,%0\n"
-		"	jmp  2b\n"
-		".previous\n"
-		".section __ex_table,\"a\"\n"
-		"	.align 4\n"
-		"	.long 1b,3b\n"
-		".previous"
-		: "=r" (pfn)
-		: "m" (machine_to_phys_mapping[mfn]), "m" (max_mapnr) );
-
-	return pfn;
-}
-
-/*
- * We detect special mappings in one of two ways:
- *  1. If the MFN is an I/O page then Xen will set the m2p entry
- *     to be outside our maximum possible pseudophys range.
- *  2. If the MFN belongs to a different domain then we will certainly
- *     not have MFN in our p2m table. Conversely, if the page is ours,
- *     then we'll have p2m(m2p(MFN))==MFN.
- * If we detect a special mapping then it doesn't have a 'struct page'.
- * We force !pfn_valid() by returning an out-of-range pointer.
- *
- * NB. These checks require that, for any MFN that is not in our reservation,
- * there is no PFN such that p2m(PFN) == MFN. Otherwise we can get confused if
- * we are foreign-mapping the MFN, and the other domain as m2p(MFN) == PFN.
- * Yikes! Various places must poke in INVALID_P2M_ENTRY for safety.
- *
- * NB2. When deliberately mapping foreign pages into the p2m table, you *must*
- *      use FOREIGN_FRAME(). This will cause pte_pfn() to choke on it, as we
- *      require. In all the cases we care about, the FOREIGN_FRAME bit is
- *      masked (e.g., pfn_to_mfn()) so behaviour there is correct.
- */
-static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
-{
-	extern unsigned long max_mapnr;
-	unsigned long pfn = mfn_to_pfn(mfn);
-	if ((pfn < max_mapnr)
-	    && !xen_feature(XENFEAT_auto_translated_physmap)
-	    && (phys_to_machine_mapping[pfn] != mfn))
-		return max_mapnr; /* force !pfn_valid() */
-	return pfn;
-}
-
-static inline void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
-{
-	if (xen_feature(XENFEAT_auto_translated_physmap)) {
-		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
-		return;
-	}
-	phys_to_machine_mapping[pfn] = mfn;
-}
-
-/* Definitions for machine and pseudophysical addresses. */
-#ifdef CONFIG_X86_PAE
-typedef unsigned long long paddr_t;
-typedef unsigned long long maddr_t;
-#else
-typedef unsigned long paddr_t;
-typedef unsigned long maddr_t;
-#endif
-
-static inline maddr_t phys_to_machine(paddr_t phys)
-{
-	maddr_t machine = pfn_to_mfn(phys >> PAGE_SHIFT);
-	machine = (machine << PAGE_SHIFT) | (phys & ~PAGE_MASK);
-	return machine;
-}
-static inline paddr_t machine_to_phys(maddr_t machine)
-{
-	paddr_t phys = mfn_to_pfn(machine >> PAGE_SHIFT);
-	phys = (phys << PAGE_SHIFT) | (machine & ~PAGE_MASK);
-	return phys;
-}
 
 /*
  * These are used to make use of C type-checking..
@@ -254,7 +138,6 @@ static inline unsigned long pgd_val(pgd_
 
 #define pgprot_val(x)	((x).pgprot)
 
-#define __pte_ma(x)	((pte_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
 #endif /* !__ASSEMBLY__ */
@@ -323,11 +206,6 @@ extern int page_is_ram(unsigned long pag
 	((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0 ) | \
 		 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
-/* VIRT <-> MACHINE conversion */
-#define virt_to_machine(v)	(phys_to_machine(__pa(v)))
-#define virt_to_mfn(v)		(pfn_to_mfn(__pa(v) >> PAGE_SHIFT))
-#define mfn_to_virt(m)		(__va(mfn_to_pfn(m) << PAGE_SHIFT))
-
 #define __HAVE_ARCH_GATE_AREA 1
 
 #endif /* __KERNEL__ */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-2level.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-2level.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-2level.h	Mon Jul 17 23:38:59 2006 +0100
@@ -45,7 +45,6 @@
 
 #define pte_none(x)		(!(x).pte_low)
 #define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
-#define pfn_pte_ma(pfn, prot)	__pte_ma(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
 #define pfn_pmd(pfn, prot)	__pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 /*
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-3level.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-3level.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-3level.h	Mon Jul 17 23:38:59 2006 +0100
@@ -151,18 +151,6 @@ static inline int pte_none(pte_t pte)
 
 extern unsigned long long __supported_pte_mask;
 
-static inline pte_t pfn_pte_ma(unsigned long page_nr, pgprot_t pgprot)
-{
-	pte_t pte;
-
-	pte.pte_high = (page_nr >> (32 - PAGE_SHIFT)) | \
-					(pgprot_val(pgprot) >> 32);
-	pte.pte_high &= (__supported_pte_mask >> 32);
-	pte.pte_low = ((page_nr << PAGE_SHIFT) | pgprot_val(pgprot)) & \
-							__supported_pte_mask;
-	return pte;
-}
-
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
 	return pfn_pte_ma(pfn_to_mfn(page_nr), pgprot);
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/xen/xenbus.h
--- a/linux-2.6-xen-sparse/include/xen/xenbus.h	Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/xen/xenbus.h	Mon Jul 17 23:38:59 2006 +0100
@@ -295,5 +295,6 @@ void xenbus_dev_fatal(struct xenbus_devi
 void xenbus_dev_fatal(struct xenbus_device *dev, int err, const char *fmt,
 		      ...);
 
+int __init xenbus_dev_init(void);
 
 #endif /* _XEN_XENBUS_H */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/maddr.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/maddr.h	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,153 @@
+#ifndef _I386_MADDR_H
+#define _I386_MADDR_H
+
+#include <xen/features.h>
+#include <xen/interface/arch-x86_32.h>
+#include <xen/interface/xen.h>
+
+/**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
+#define INVALID_P2M_ENTRY	(~0UL)
+#define FOREIGN_FRAME_BIT	(1UL<<31)
+#define FOREIGN_FRAME(m)	((m) | FOREIGN_FRAME_BIT)
+
+extern unsigned long *phys_to_machine_mapping;
+
+#undef machine_to_phys_mapping
+extern unsigned long *machine_to_phys_mapping;
+extern unsigned int   machine_to_phys_order;
+
+static inline unsigned long pfn_to_mfn(unsigned long pfn)
+{
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return pfn;
+	return phys_to_machine_mapping[(unsigned int)(pfn)] &
+		~FOREIGN_FRAME_BIT;
+}
+
+static inline int phys_to_machine_mapping_valid(unsigned long pfn)
+{
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return 1;
+	return (phys_to_machine_mapping[pfn] != INVALID_P2M_ENTRY);
+}
+
+static inline unsigned long mfn_to_pfn(unsigned long mfn)
+{
+#ifdef CONFIG_XEN
+	extern unsigned long max_mapnr;
+	unsigned long pfn;
+#endif
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return mfn;
+
+#ifndef CONFIG_XEN
+	BUG();
+#else
+	if (unlikely((mfn >> machine_to_phys_order) != 0))
+		return max_mapnr;
+
+	/* The array access can fail (e.g., device space beyond end of RAM). */
+	asm (
+		"1:	movl %1,%0\n"
+		"2:\n"
+		".section .fixup,\"ax\"\n"
+		"3:	movl %2,%0\n"
+		"	jmp  2b\n"
+		".previous\n"
+		".section __ex_table,\"a\"\n"
+		"	.align 4\n"
+		"	.long 1b,3b\n"
+		".previous"
+		: "=r" (pfn)
+		: "m" (machine_to_phys_mapping[mfn]), "m" (max_mapnr) );
+
+	return pfn;
+#endif
+}
+
+/*
+ * We detect special mappings in one of two ways:
+ *  1. If the MFN is an I/O page then Xen will set the m2p entry
+ *     to be outside our maximum possible pseudophys range.
+ *  2. If the MFN belongs to a different domain then we will certainly
+ *     not have MFN in our p2m table. Conversely, if the page is ours,
+ *     then we'll have p2m(m2p(MFN))==MFN.
+ * If we detect a special mapping then it doesn't have a 'struct page'.
+ * We force !pfn_valid() by returning an out-of-range pointer.
+ *
+ * NB. These checks require that, for any MFN that is not in our reservation,
+ * there is no PFN such that p2m(PFN) == MFN. Otherwise we can get confused if
+ * we are foreign-mapping the MFN, and the other domain has m2p(MFN) == PFN.
+ * Yikes! Various places must poke in INVALID_P2M_ENTRY for safety.
+ *
+ * NB2. When deliberately mapping foreign pages into the p2m table, you *must*
+ *      use FOREIGN_FRAME(). This will cause pte_pfn() to choke on it, as we
+ *      require. In all the cases we care about, the FOREIGN_FRAME bit is
+ *      masked (e.g., pfn_to_mfn()) so behaviour there is correct.
+ */
+static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
+{
+	extern unsigned long max_mapnr;
+	unsigned long pfn = mfn_to_pfn(mfn);
+	if ((pfn < max_mapnr)
+	    && !xen_feature(XENFEAT_auto_translated_physmap)
+	    && (phys_to_machine_mapping[pfn] != mfn))
+		return max_mapnr; /* force !pfn_valid() */
+	return pfn;
+}
+
+static inline void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
+		return;
+	}
+	phys_to_machine_mapping[pfn] = mfn;
+}
+
+/* Definitions for machine and pseudophysical addresses. */
+#ifdef CONFIG_X86_PAE
+typedef unsigned long long paddr_t;
+typedef unsigned long long maddr_t;
+#else
+typedef unsigned long paddr_t;
+typedef unsigned long maddr_t;
+#endif
+
+static inline maddr_t phys_to_machine(paddr_t phys)
+{
+	maddr_t machine = pfn_to_mfn(phys >> PAGE_SHIFT);
+	machine = (machine << PAGE_SHIFT) | (phys & ~PAGE_MASK);
+	return machine;
+}
+static inline paddr_t machine_to_phys(maddr_t machine)
+{
+	paddr_t phys = mfn_to_pfn(machine >> PAGE_SHIFT);
+	phys = (phys << PAGE_SHIFT) | (machine & ~PAGE_MASK);
+	return phys;
+}
+
+/* VIRT <-> MACHINE conversion */
+#define virt_to_machine(v)	(phys_to_machine(__pa(v)))
+#define virt_to_mfn(v)		(pfn_to_mfn(__pa(v) >> PAGE_SHIFT))
+#define mfn_to_virt(m)		(__va(mfn_to_pfn(m) << PAGE_SHIFT))
+
+#ifdef CONFIG_X86_PAE
+static inline pte_t pfn_pte_ma(unsigned long page_nr, pgprot_t pgprot)
+{
+	pte_t pte;
+
+	pte.pte_high = (page_nr >> (32 - PAGE_SHIFT)) | \
+					(pgprot_val(pgprot) >> 32);
+	pte.pte_high &= (__supported_pte_mask >> 32);
+	pte.pte_low = ((page_nr << PAGE_SHIFT) | pgprot_val(pgprot)) & \
+							__supported_pte_mask;
+	return pte;
+}
+#else
+#define pfn_pte_ma(pfn, prot)	__pte_ma(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
+#endif
+
+#define __pte_ma(x)	((pte_t) { (x) } )
+
+#endif /* _I386_MADDR_H */
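The comment above `mfn_to_local_pfn()` in maddr.h describes two ways a "special" mapping is detected. A small user-space model may make the rule easier to follow. This is only a sketch under stated assumptions: the `sk_` names, the table sizes and the table contents are invented for illustration, and the auto-translated-physmap case and the fault fixup on the m2p read are left out.

```c
#include <stddef.h>

#define SK_INVALID_P2M_ENTRY (~0UL)
#define SK_FOREIGN_FRAME_BIT (1UL << 31)
#define SK_FOREIGN_FRAME(m)  ((m) | SK_FOREIGN_FRAME_BIT)

#define SK_MAX_MAPNR 4UL   /* pretend the domain has 4 pseudo-physical frames */
#define SK_NR_MFNS   16UL  /* pretend the machine has 16 frames */

/* p2m: this domain's table.  PFN 2 holds INVALID_P2M_ENTRY, and
 * PFN 3 deliberately maps foreign MFN 9 via FOREIGN_FRAME(). */
static unsigned long sk_p2m[SK_MAX_MAPNR] = {
	5, 7, SK_INVALID_P2M_ENTRY, SK_FOREIGN_FRAME(9)
};

/* m2p: machine-wide table.  Unlisted entries default to 0. */
static unsigned long sk_m2p[SK_NR_MFNS] = {
	[5] = 0,     /* ours */
	[7] = 1,     /* ours */
	[9] = 2,     /* another domain's page; that domain's PFN 2 maps it */
	[12] = 1000, /* I/O page: entry is outside our pseudophys range */
};

/* The foreign-frame bit is masked off, as in pfn_to_mfn(). */
static unsigned long sk_pfn_to_mfn(unsigned long pfn)
{
	return sk_p2m[pfn] & ~SK_FOREIGN_FRAME_BIT;
}

static unsigned long sk_mfn_to_pfn(unsigned long mfn)
{
	if (mfn >= SK_NR_MFNS)
		return SK_MAX_MAPNR;
	return sk_m2p[mfn];
}

/* A frame is ours only if p2m(m2p(MFN)) == MFN; otherwise return an
 * out-of-range PFN so a pfn_valid()-style check would fail. */
static unsigned long sk_mfn_to_local_pfn(unsigned long mfn)
{
	unsigned long pfn = sk_mfn_to_pfn(mfn);

	if (pfn < SK_MAX_MAPNR && sk_p2m[pfn] != mfn)
		return SK_MAX_MAPNR;
	return pfn;
}
```

In this model MFNs 5 and 7 resolve to local PFNs 0 and 1. MFN 9 is rejected because our PFN 2 holds `INVALID_P2M_ENTRY`, which is the reason the comment asks for that value to be poked in. MFN 12 comes back as an out-of-range PFN straight from the m2p table.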
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/xen/hvm.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/linux-2.6-xen-sparse/include/xen/hvm.h	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,17 @@
+/* Simple wrappers around HVM functions */
+#ifndef XEN_HVM_H__
+#define XEN_HVM_H__
+
+#include <xen/interface/hvm/params.h>
+#include <asm/hypercall.h>
+
+static inline unsigned long hvm_get_parameter(int idx)
+{
+	struct xen_hvm_param xhv;
+
+	xhv.domid = DOMID_SELF;
+	xhv.index = idx;
+	return HYPERVISOR_hvm_op(HVMOP_get_param, &xhv);
+}
+
+#endif /* XEN_HVM_H__ */
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/Makefile
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/Makefile	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,22 @@
+include $(M)/overrides.mk
+
+obj-$(CONFIG_XEN_EVTCHN_PCI)	+= evtchn-pci/
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	+= blkfront/
+obj-$(CONFIG_XEN_NETDEV_FRONTEND)	+= netfront/
+obj-m	+= xenbus/
+
+
+debug:
+	chmod +x compile.sh
+	chmod +x mkbuildtree
+	echo $(XEN_DRIVERS_ROOT)
+	echo $(EXTRA_CFLAGS)
+	./compile.sh
+
+clean:
+	find . -name "*.o" |xargs rm -f
+	find . -name "*.ko" |xargs rm -f
+	find . -name "*.mod.c" |xargs rm -f
+	find . -name ".*.cmd" |xargs rm -f
+	rm .tmp_versions -rf
+
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/README
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/README	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,7 @@
+To build, run ./mkbuildtree and then
+
+make -C /path/to/kernel/source M=$PWD modules
+
+You get four modules, xen-evtchn-pci.ko, xenbus.ko, xen-vbd.ko, and
+xen-vnif.ko.  Load xen-evtchn-pci first, then xenbus, and then
+whichever of xen-vbd and xen-vnif you happen to need.
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/blkfront/Kbuild
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/blkfront/Kbuild	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,6 @@
+include $(M)/overrides.mk
+
+obj-m += xen-vbd.o
+
+xen-vbd-objs := blkfront.o vbd.o
+
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/Kbuild
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/Kbuild	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,8 @@
+include $(M)/overrides.mk
+
+obj-m := xen-evtchn-pci.o
+
+EXTRA_CFLAGS += -I$(M)/evtchn-pci
+
+xen-evtchn-pci-objs := evtchn.o evtchn-pci.o gnttab.o xen_proc.o xen_support.o\
+	features.o
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/debuginfo.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/debuginfo.h	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,56 @@
+#ifndef __DEBUG_INFO__
+#define __DEBUG_INFO__
+//#define INSERT_TEST
+//#define VMX_DEBUG_INFO
+//#define KERNEL_DEBUG_INFO
+//#define FREQ_PRINT
+
+#define infotime(seconds, x, a...) \
+{           \
+static unsigned long prevjiffy = 0; \
+        if(time_after(jiffies, prevjiffy + seconds*HZ)) { \
+            prevjiffy = jiffies; \
+            vmx_printk(x, ##a); \
+        } \
+}
+
+#ifdef KERNEL_DEBUG_INFO
+#define dprintk(x, a...) \
+	printk("<vbd> " x, ##a)
+#define dprintknl(x, a...) \
+	printk(x, ##a)
+#define dprintkentry(x, a...) \
+	printk("<vbd-entry> " x "\n", ##a)
+#define dprintkexit(x, a...) \
+	printk("<vbd-exit> " x "\n", ##a)
+#ifdef FREQ_PRINT
+#define dprintkfreq(x, a...) \
+	printk("<vbd-freq> " x, ##a)
+#else
+#define dprintkfreq(x, a...)
+#endif
+#elif defined(VMX_DEBUG_INFO)
+#define dprintk(x, a...) \
+	vmx_printk("<vbd> " x, ##a)
+#define dprintknl(x, a...) \
+	vmx_printk(x, ##a)
+#define dprintkentry(x, a...) \
+	vmx_printk("<vbd-entry> " x "\n", ##a)
+#define dprintkexit(x, a...) \
+	vmx_printk("<vbd-exit> " x "\n", ##a)
+#ifdef FREQ_PRINT
+#define dprintkfreq(x, a...) \
+	vmx_printk("<vbd-freq> " x, ##a)
+#else
+#define dprintkfreq(x, a...)
+#endif
+
+#else
+#define dprintk(x, a...)
+#define dprintkentry(x, a...)
+#define dprintkexit(x, a...)
+#define dprintkfreq(x, a...)
+#define dprintknl(x, a...)
+#endif
+int vmx_printk(const char *fmt, ...);
+#endif
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.c	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,299 @@
+/******************************************************************************
+ * evtchn-pci.c
+ * xen event channel fake PCI device driver
+ * Copyright (C) 2005, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/version.h>
+#include <linux/interrupt.h>
+#include <asm/system.h>
+#include <asm/io.h>
+#include <asm/irq.h>
+#include <asm/uaccess.h>
+#include <asm/hypervisor.h>
+#include <xen/interface/memory.h>
+
+#include "evtchn-pci.h"
+
+#define DRV_NAME    "xen-evtchn-pci"
+#define DRV_VERSION "0.10"
+#define DRV_RELDATE "03/03/2005"
+
+extern void *hypercall_page;
+
+static int callbackirq = 3;		/* legacy mode irq */
+static int nopci = 0;
+static char version[] __devinitdata =
+	KERN_INFO DRV_NAME ":version " DRV_VERSION " " DRV_RELDATE
+	" Xiaofeng. Ling\n";
+
+MODULE_AUTHOR("xiaofeng.ling@intel.com");
+MODULE_DESCRIPTION("Xen evtchn PCI device");
+MODULE_LICENSE("GPL");
+
+module_param(nopci, int, 0);
+module_param(callbackirq, int, 0);
+MODULE_PARM_DESC(callbackirq, "callback irq number for xen event channel");
+
+#define XEN_EVTCHN_VENDOR_ID 0xfffd
+#define XEN_EVTCHN_DEVICE_ID 0x0101
+
+static struct pci_device_id evtchn_pci_tbl[] __devinitdata = {
+	{XEN_EVTCHN_VENDOR_ID, XEN_EVTCHN_DEVICE_ID,
+	 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+	{0,}
+};
+
+MODULE_DEVICE_TABLE(pci, evtchn_pci_tbl);
+
+unsigned long *phys_to_machine_mapping;
+EXPORT_SYMBOL(phys_to_machine_mapping);
+
+static int __init init_xen_info(void)
+{
+	unsigned long shared_info_frame;
+	struct xen_add_to_physmap xatp;
+
+	setup_xen_features();
+
+	shared_info_frame = alloc_xen_mmio(PAGE_SIZE) >> PAGE_SHIFT;
+	xatp.domid = DOMID_SELF;
+	xatp.idx = 0;
+	xatp.space = XENMAPSPACE_shared_info;
+	xatp.gpfn = shared_info_frame;
+	BUG_ON(HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp));
+	HYPERVISOR_shared_info =
+		ioremap(shared_info_frame << PAGE_SHIFT, PAGE_SIZE);
+
+	if (!HYPERVISOR_shared_info)
+		panic("can't map shared info\n");
+
+	dprintk("ioremap shared_info successful\n");
+
+	phys_to_machine_mapping = NULL;
+
+	gnttab_init();
+	evtchn_init();
+
+	return 0;
+}
+
+static void __devexit evtchn_pci_remove(struct pci_dev *pdev)
+{
+	long ioaddr, iolen;
+
+	/* If there is an I/O region, don't forget to release it. */
+	ioaddr = pci_resource_start(pdev, 0);
+	iolen = pci_resource_len(pdev, 0);
+	if (ioaddr != 0)
+	{
+		release_region(ioaddr, iolen);
+	}
+
+	pci_set_drvdata(pdev, NULL);
+	free_irq(pdev->irq, evtchn_interrupt);
+}
+
+extern irqreturn_t evtchn_interrupt(int irq, void *devid, struct pt_regs *regs);
+
+unsigned long evtchn_mmio = 0xc000000;
+unsigned long evtchn_mmio_alloc;
+unsigned long evtchn_mmiolen = 0x1000000;
+
+unsigned long alloc_xen_mmio(unsigned long len)
+{
+	unsigned long addr;
+
+	addr = 0;
+	if (evtchn_mmio_alloc + len <= evtchn_mmiolen)
+	{
+		addr = evtchn_mmio + evtchn_mmio_alloc;
+		evtchn_mmio_alloc += len;
+	} else {
+		panic("ran out of xen mmio space");
+	}
+	return addr;
+}
+
+static int __devinit evtchn_pci_init(struct pci_dev *pdev,
+				     const struct pci_device_id *ent)
+{
+	int i, ret, irq;
+	long ioaddr, iolen;
+	long mmio_addr, mmio_len;
+
+#ifndef MODULE
+	static int printed_version;
+	if (!printed_version++)
+		printk(version);
+#endif
+
+	printk(KERN_INFO DRV_NAME ":found evtchn pci device model, do init\n");
+
+	i = pci_enable_device(pdev);
+	if (i)
+		return i;
+
+	ioaddr = pci_resource_start(pdev, 0);
+	iolen = pci_resource_len(pdev, 0);
+
+	mmio_addr = pci_resource_start(pdev, 1);
+	mmio_len = pci_resource_len(pdev, 1);
+
+	if (mmio_addr != 0)
+	{
+		if (request_mem_region(mmio_addr, mmio_len, DRV_NAME) == NULL)
+		{
+			printk(KERN_ERR DRV_NAME ":MEM I/O resource 0x%lx @ 0x%lx busy\n",
+				   mmio_len, mmio_addr);
+			return -EBUSY;
+		}
+		evtchn_mmio = mmio_addr;
+		evtchn_mmiolen = mmio_len;
+	}
+	else
+	{
+		printk(KERN_WARNING DRV_NAME ":no MMIO found!\n");
+	}
+
+	irq = pdev->irq;
+	callbackirq = irq;
+
+	/*
+	 * Some day we may use the I/O port to check the interrupt status
+	 * when sharing interrupts.
+	 */
+	if (ioaddr != 0)
+	{
+		if (request_region(ioaddr, iolen, DRV_NAME) == NULL)
+		{
+			printk(KERN_ERR DRV_NAME ":I/O resource 0x%lx @ 0x%lx busy\n",
+				   iolen, ioaddr);
+			return -EBUSY;
+		}
+
+		hypercall_page = (void *)__get_free_page(GFP_KERNEL);
+		if (!hypercall_page)
+			panic("Cannot get hypercall page.\n");
+		memset(hypercall_page, 0xcc, PAGE_SIZE);
+		asm volatile("outl %%eax, %%dx\n"
+			     :
+			     : "a" (virt_to_phys(hypercall_page) >> PAGE_SHIFT),
+			       "d" (ioaddr)
+			     : "memory");
+	}
+	printk(KERN_INFO DRV_NAME ":use irq %d for event channel\n", irq);
+
+	if ((ret = request_irq(irq, evtchn_interrupt, SA_SHIRQ,
+			       "xen-evtchn-pci", evtchn_interrupt))) {
+		goto out;
+	}
+
+	if ((ret = init_xen_info()))
+		goto out;
+
+	if ((ret = set_callback_irq(irq)))
+		goto out;
+
+ out:
+	if (ret && hypercall_page)
+		free_page((unsigned long)hypercall_page);
+	return ret;
+}
+
+static struct pci_driver evtchn_driver = {
+	.name     = DRV_NAME,
+	.probe    = evtchn_pci_init,
+	.remove   = __devexit_p(evtchn_pci_remove),
+	.id_table = evtchn_pci_tbl,
+};
+
+int __init setup_xen_callback(void)
+{
+	int rc = 0;
+	/* two ways for call back from hypervisor */
+
+	printk(KERN_INFO DRV_NAME ":legacy driver request irq :%d\n", callbackirq);
+	rc = request_irq(callbackirq, evtchn_interrupt, SA_SHIRQ,
+					 "xen-evtchn", evtchn_interrupt);
+	if (rc != 0) {
+		printk(KERN_ERR DRV_NAME ":request irq error:%d!\n", rc);
+	} else if ((rc = set_callback_irq(callbackirq)) != 0) {
+		printk(KERN_ERR DRV_NAME ":set call back irq error:%d!\n", rc);
+	}
+	return rc;
+}
+
+static int __init evtchn_pci_module_init(void)
+{
+	int rc;
+
+	printk(KERN_INFO DRV_NAME ":do xen module support init\n");
+
+/* when a module, this is printed whether or not devices are found in probe */
+#ifdef MODULE
+	printk(version);
+#endif
+
+	if (!nopci)
+	{
+		rc = pci_module_init(&evtchn_driver);
+		if (rc)
+			printk(KERN_INFO DRV_NAME ":No evtchn pci device model found,"
+				   "use legacy mode\n");
+	}
+	else
+	{
+		printk(KERN_INFO DRV_NAME ":disable evtchn pci device model "
+			   "by module arguments, use legacy mode\n");
+		rc = 1;
+	}
+
+	if (rc)
+	{
+		/* No PCI device, try legacy mode */
+		rc = init_xen_info();
+		if (rc)
+			return rc;
+		rc = setup_xen_callback();
+		if (rc)
+			printk(KERN_ERR DRV_NAME ":setup xen legacy callback fail\n");
+	}
+
+	return rc;
+}
+
+static void __exit evtchn_pci_module_cleanup(void)
+{
+	printk(KERN_INFO DRV_NAME ":Do evtchn module cleanup\n");
+	/* tell the hypervisor to stop delivering the callback irq */
+	set_callback_irq(0);
+
+	free_irq(callbackirq, evtchn_interrupt);
+
+	/*TODO: unmap hypercall param share page */
+
+	pci_unregister_driver(&evtchn_driver);
+}
+
+module_init(evtchn_pci_module_init);
+module_exit(evtchn_pci_module_cleanup);
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.h	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,58 @@
+/******************************************************************************
+ * evtchn-pci.h
+ * module driver support in unmodified Linux
+ * Copyright (C) 2004, Intel Corporation. <xiaofeng.ling@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#ifndef __XEN_SUPPORT_H
+#define __XEN_SUPPORT_H
+#include <linux/version.h>
+#include <asm/io.h>
+#include <xen/interface/hvm/params.h>
+
+#include "debuginfo.h"
+
+extern unsigned long *phys_to_machine_mapping;
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
+#else
+#define __user
+#endif
+
+static inline int set_callback_irq(int irq)
+{
+	struct xen_hvm_param a;
+
+	a.domid = DOMID_SELF;
+	a.index = HVM_PARAM_CALLBACK_IRQ;
+	a.value = irq;
+	return HYPERVISOR_hvm_op(HVMOP_set_param, &a);
+}
+
+#define L2_PAGETABLE_SHIFT 22
+unsigned long alloc_xen_mmio(unsigned long len);
+
+int gnttab_init(void);
+void evtchn_init(void);
+void ctrl_if_init(void);
+
+void xen_machphys_update(unsigned long mfn, unsigned long pfn);
+int xen_do_init(void);
+
+void setup_xen_features(void);
+
+#endif
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/evtchn.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/evtchn.c	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,200 @@
+/******************************************************************************
+ * evtchn.c
+ * 
+ * A simplified event channel for para-drivers in unmodified linux
+ * 
+ * Copyright (c) 2002-2005, K A Fraser
+ * Copyright (c) 2005, <xiaofeng.ling@intel.com>
+ * 
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <xen/evtchn.h>
+#include <xen/interface/hvm/ioreq.h>
+#include "evtchn-pci.h"
+
+void *hypercall_page;
+
+#define cpu_from_evtchn(port) (0)
+#define MAX_EVTCHN 256
+static struct
+{
+	irqreturn_t(*handler) (int, void *, struct pt_regs *);
+	void *dev_id;
+} evtchns[MAX_EVTCHN];
+
+void mask_evtchn(int port)
+{
+	shared_info_t *s = HYPERVISOR_shared_info;
+	synch_set_bit(port, &s->evtchn_mask[0]);
+}
+EXPORT_SYMBOL(mask_evtchn);
+
+void unmask_evtchn(int port)
+{
+	shared_info_t *s = HYPERVISOR_shared_info;
+	unsigned int cpu = smp_processor_id();
+	vcpu_info_t *vcpu_info = &s->vcpu_info[cpu];
+
+	/* Slow path (hypercall) if this is a non-local port. */
+	if (unlikely(cpu != cpu_from_evtchn(port))) {
+		evtchn_unmask_t op = { .port = port };
+		(void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask,
+						  &op);
+		return;
+	}
+
+	synch_clear_bit(port, &s->evtchn_mask[0]);
+
+	/*
+	 * The following is basically the equivalent of 'hw_resend_irq'. Just
+	 * like a real IO-APIC we 'lose the interrupt edge' if the channel is
+	 * masked.
+	 */
+	if (synch_test_bit(port, &s->evtchn_pending[0]) && 
+	    !synch_test_and_set_bit(port / BITS_PER_LONG,
+				    &vcpu_info->evtchn_pending_sel)) {
+		vcpu_info->evtchn_upcall_pending = 1;
+		if (!vcpu_info->evtchn_upcall_mask)
+			force_evtchn_callback();
+	}
+}
+EXPORT_SYMBOL(unmask_evtchn);
+
+unsigned int bind_virq_to_evtchn(int virq)
+{
+	evtchn_bind_virq_t op;
+
+	op.virq = virq;
+	op.vcpu = 0;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq, &op) != 0)
+		BUG();
+
+	return op.port;
+}
+
+int
+bind_evtchn_to_irqhandler(unsigned int evtchn,
+			  irqreturn_t(*handler) (int, void *,
+						 struct pt_regs *),
+			  unsigned long irqflags, const char *devname,
+			  void *dev_id)
+{
+	if (evtchn >= MAX_EVTCHN)
+		return -EINVAL;
+	evtchns[evtchn].handler = handler;
+	evtchns[evtchn].dev_id = dev_id;
+	unmask_evtchn(evtchn);
+	return evtchn;
+}
+
+EXPORT_SYMBOL(bind_evtchn_to_irqhandler);
+
+void unbind_from_irqhandler(unsigned int evtchn, void *dev_id)
+{
+	if (evtchn >= MAX_EVTCHN)
+		return;
+
+	mask_evtchn(evtchn);
+	evtchns[evtchn].handler = NULL;
+}
+
+EXPORT_SYMBOL(unbind_from_irqhandler);
+
+void notify_remote_via_irq(int irq)
+{
+	int evtchn = irq;
+	notify_remote_via_evtchn(evtchn);
+}
+
+EXPORT_SYMBOL(notify_remote_via_irq);
+
+void unbind_evtchn_from_irq(unsigned int evtchn)
+{
+	return;
+}
+
+EXPORT_SYMBOL(unbind_evtchn_from_irq);
+
+#define active_evtchns(cpu,sh,idx)		\
+	((sh)->evtchn_pending[idx] &		\
+	 ~(sh)->evtchn_mask[idx])
+
+irqreturn_t evtchn_interrupt(int irq, void *dev_id, struct pt_regs *regs)
+{
+	unsigned long l1, l2;
+	unsigned int l1i, l2i, port;
+	int cpu = smp_processor_id();
+	irqreturn_t(*handler) (int, void *, struct pt_regs *);
+	shared_info_t *s = HYPERVISOR_shared_info;
+	vcpu_info_t *vcpu_info = &s->vcpu_info[cpu];
+
+	vcpu_info->evtchn_upcall_pending = 0;
+
+	/* NB. No need for a barrier here -- XCHG is a barrier on x86. */
+	l1 = xchg(&vcpu_info->evtchn_pending_sel, 0);
+	while (l1 != 0)
+	{
+		l1i = __ffs(l1);
+		l1 &= ~(1UL << l1i);
+
+		while ((l2 = active_evtchns(cpu, s, l1i)) != 0)
+		{
+			l2i = __ffs(l2);
+
+			port = (l1i * BITS_PER_LONG) + l2i;
+
+			if (port < MAX_EVTCHN && (handler = evtchns[port].handler) != NULL)
+			{
+				clear_evtchn(port);
+				handler(port, evtchns[port].dev_id, regs);
+			}
+			else
+			{
+				evtchn_device_upcall(port);
+			}
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+void force_evtchn_callback(void)
+{
+	evtchn_interrupt(0, NULL, NULL);
+}
+
+EXPORT_SYMBOL(force_evtchn_callback);
+
+void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu)
+{
+}
+
+void __init evtchn_init(void)
+{
+
+}
+
+EXPORT_SYMBOL(hypercall_page);
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/xen_support.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/xen_support.c	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,53 @@
+/******************************************************************************
+ * xen_support.c
+ * Xen module support functions.
+ * Copyright (C) 2004, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <xen/evtchn.h>
+#include <xen/interface/xen.h>
+#include <asm/hypervisor.h>
+#include "evtchn-pci.h"
+
+shared_info_t *HYPERVISOR_shared_info = NULL;
+EXPORT_SYMBOL(HYPERVISOR_shared_info); 
+
+void xen_machphys_update(unsigned long mfn, unsigned long pfn)
+{
+	mmu_update_t u;
+	u.ptr = ((unsigned long long)mfn << PAGE_SHIFT) | MMU_MACHPHYS_UPDATE;
+	u.val = pfn;
+	BUG_ON(HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0);
+}
+EXPORT_SYMBOL(xen_machphys_update);
+
+void balloon_update_driver_allowance(long delta)
+{
+}
+
+EXPORT_SYMBOL(balloon_update_driver_allowance);
+
+void evtchn_device_upcall(int port)
+{
+	printk(KERN_ERR "Error, no device upcall in guest domain (%d)!\n", port);
+	clear_evtchn(port);
+}
+
+EXPORT_SYMBOL (evtchn_device_upcall);
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/mkbuildtree
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/mkbuildtree	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,35 @@
+#! /bin/sh
+
+C=$PWD
+
+XEN=$C/../../xen
+XL=$C/../../linux-2.6-xen-sparse
+
+for d in $(find ${XL}/drivers/xen/ -maxdepth 1 -type d | sed -e 1d); do
+    if ! echo $d | egrep -q back; then
+        lndir $d $(basename $d) > /dev/null 2>&1
+    fi
+done
+
+ln -sf ${XL}/drivers/xen/net_driver_util.c netfront
+
+ln -sf ${XL}/drivers/xen/core/gnttab.c evtchn-pci
+ln -sf ${XL}/drivers/xen/core/features.c evtchn-pci
+ln -sf ${XL}/drivers/xen/core/xen_proc.c evtchn-pci
+
+mkdir -p include
+mkdir -p include/xen
+mkdir -p include/public
+mkdir -p include/asm
+
+lndir -silent ${XL}/include/xen include/xen
+ln -sf ${XEN}/include/public include/xen/interface
+
+# Need to be quite careful here: we don't want the files we link in to
+# risk overriding the native Linux ones (in particular, system.h must
+# be native and not xenolinux).
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/hypervisor.h include/asm
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/hypercall.h include/asm
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/synch_bitops.h include/asm
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/maddr.h include/asm
+
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/netfront/Kbuild
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/netfront/Kbuild	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,4 @@
+include $(M)/overrides.mk
+
+obj-m  = xen-vnif.o
+xen-vnif-objs	:= netfront.o
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/overrides.mk
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/overrides.mk	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,16 @@
+# Hack: we need to use the config which was used to build the kernel,
+# except that that won't have the right headers etc., so duplicate
+# some of the mach-xen infrastructure in here.
+#
+# (i.e. we need the native config for things like -mregparm, but
+# a Xen kernel to find the right headers)
+CONFIG_X86_XEN=y
+CONFIG_XEN_EVTCHN_PCI = m
+CONFIG_XEN_BLKDEV_FRONTEND	= m
+CONFIG_XEN_NETDEV_FRONTEND	= m
+EXTRA_CFLAGS += -DCONFIG_VMX -DCONFIG_VMX_GUEST -DCONFIG_X86_XEN
+EXTRA_CFLAGS += -DCONFIG_XEN_SHADOW_MODE -DCONFIG_XEN_SHADOW_TRANSLATE
+EXTRA_CFLAGS += -DCONFIG_XEN_BLKDEV_GRANT -DXEN_EVTCHN_MASK_OPS
+EXTRA_CFLAGS += -DCONFIG_XEN_NETDEV_GRANT_RX -DCONFIG_XEN_NETDEV_GRANT_TX
+EXTRA_CFLAGS += -D__XEN_INTERFACE_VERSION__=0x00030202
+EXTRA_CFLAGS += -I$(M)/include
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/xenbus/Kbuild
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/xenbus/Kbuild	Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,9 @@
+include $(M)/overrides.mk
+
+obj-m	+= xenbus.o
+xenbus-objs =
+xenbus-objs += xenbus_comms.o
+xenbus-objs += xenbus_xs.o
+xenbus-objs += xenbus_probe.o 
+xenbus-objs += xenbus_dev.o 
+xenbus-objs += xenbus_client.o 

[-- Attachment #1.1.4: hvm_xen_unstable.diff --]
[-- Type: text/plain, Size: 76134 bytes --]

diff -r ecb8ff1fcf1f linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c
--- a/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c	Tue Jul 18 13:43:27 2006 +0100
@@ -270,6 +270,7 @@ static int __init privcmd_init(void)
 	set_bit(__HYPERVISOR_sched_op_compat,  hypercall_permission_map);
 	set_bit(__HYPERVISOR_event_channel_op_compat,
 		hypercall_permission_map);
+	set_bit(__HYPERVISOR_hvm_op,           hypercall_permission_map);
 
 	privcmd_intf = create_xen_proc_entry("privcmd", 0400);
 	if (privcmd_intf != NULL)
diff -r ecb8ff1fcf1f tools/firmware/hvmloader/hvmloader.c
--- a/tools/firmware/hvmloader/hvmloader.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/firmware/hvmloader/hvmloader.c	Tue Jul 18 13:43:27 2006 +0100
@@ -31,7 +31,7 @@
 #define	ROMBIOS_PHYSICAL_ADDRESS	0x000F0000
 
 /* invoke SVM's paged realmode support */
-#define SVM_VMMCALL_RESET_TO_REALMODE	0x00000001
+#define SVM_VMMCALL_RESET_TO_REALMODE	0x80000001
 
 /*
  * C runtime start off
@@ -133,15 +133,15 @@ cirrus_check(void)
 	return inb(0x3C5) == 0x12;
 }
 
-int 
-vmmcall(int edi, int esi, int edx, int ecx, int ebx)
+int
+vmmcall(int function, int edi, int esi, int edx, int ecx, int ebx)
 {
         int eax;
 
         __asm__ __volatile__(
 		".byte 0x0F,0x01,0xD9"
                 : "=a" (eax)
-		: "a"(0x58454E00), /* XEN\0 key */
+		: "a"(function),
 		  "b"(ebx), "c"(ecx), "d"(edx), "D"(edi), "S"(esi)
 	);
         return eax;
@@ -200,7 +200,7 @@ main(void)
 	if (check_amd()) {
 		/* AMD implies this is SVM */
                 puts("SVM go ...\n");
-                vmmcall(SVM_VMMCALL_RESET_TO_REALMODE, 0, 0, 0, 0);
+                vmmcall(SVM_VMMCALL_RESET_TO_REALMODE, 0, 0, 0, 0, 0);
 	} else {
 		puts("Loading VMXAssist ...\n");
 		memcpy((void *)VMXASSIST_PHYSICAL_ADDRESS,
diff -r ecb8ff1fcf1f tools/ioemu/Makefile.target
--- a/tools/ioemu/Makefile.target	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/ioemu/Makefile.target	Tue Jul 18 13:43:27 2006 +0100
@@ -336,6 +336,7 @@ VL_OBJS+= fdc.o mc146818rtc.o serial.o p
 VL_OBJS+= fdc.o mc146818rtc.o serial.o pc.o
 VL_OBJS+= cirrus_vga.o mixeng.o parallel.o
 VL_OBJS+= piix4acpi.o
+VL_OBJS+= xen_evtchn.o
 DEFINES += -DHAS_AUDIO
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
diff -r ecb8ff1fcf1f tools/ioemu/hw/pc.c
--- a/tools/ioemu/hw/pc.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/ioemu/hw/pc.c	Tue Jul 18 13:43:27 2006 +0100
@@ -819,6 +819,9 @@ static void pc_init1(uint64_t ram_size, 
     }
 #endif /* !CONFIG_DM */
 
+    if (pci_enabled)
+	pci_xen_evtchn_init(pci_bus);
+
     for(i = 0; i < MAX_SERIAL_PORTS; i++) {
         if (serial_hds[i]) {
             serial_init(&pic_set_irq_new, isa_pic,
diff -r ecb8ff1fcf1f tools/ioemu/target-i386-dm/helper2.c
--- a/tools/ioemu/target-i386-dm/helper2.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/ioemu/target-i386-dm/helper2.c	Tue Jul 18 13:43:27 2006 +0100
@@ -82,6 +82,10 @@ int xce_handle = -1;
 /* which vcpu we are serving */
 int send_vcpu = 0;
 
+/* Local event-channel ports polled for ioreq notifications, one per vcpu. */
+#define NR_CPUS 32
+evtchn_port_t ioreq_local_port[NR_CPUS];
+
 CPUX86State *cpu_x86_init(void)
 {
     CPUX86State *env;
@@ -105,15 +109,14 @@ CPUX86State *cpu_x86_init(void)
             return NULL;
         }
 
-        /* FIXME: how about if we overflow the page here? */
         for (i = 0; i < vcpus; i++) {
-            rc = xc_evtchn_bind_interdomain(
-                xce_handle, domid, shared_page->vcpu_iodata[i].vp_eport);
+	    rc = xc_evtchn_bind_interdomain(xce_handle, DOMID_XEN,
+			     shared_page->vcpu_iodata[i].vp_xen_port);
             if (rc == -1) {
                 fprintf(logfile, "bind interdomain ioctl error %d\n", errno);
                 return NULL;
             }
-            shared_page->vcpu_iodata[i].dm_eport = rc;
+	    ioreq_local_port[i] = rc;
         }
     }
 
@@ -184,10 +187,9 @@ void sp_info()
 
     for (i = 0; i < vcpus; i++) {
         req = &(shared_page->vcpu_iodata[i].vp_ioreq);
-        term_printf("vcpu %d: event port %d\n", i,
-                    shared_page->vcpu_iodata[i].vp_eport);
+        term_printf("vcpu %d: event port %d\n", i, ioreq_local_port[i]);
         term_printf("  req state: %x, pvalid: %x, addr: %"PRIx64", "
-                    "data: %"PRIx64", count: %"PRIx64", size: %"PRIx64"\n",
+                    "data: %"PRIx64",  count: %"PRIx64", size: %"PRIx64"\n",
                     req->state, req->pdata_valid, req->addr,
                     req->u.data, req->count, req->size);
         term_printf("  IO totally occurred on this vcpu: %"PRIx64"\n",
@@ -201,17 +203,12 @@ static ioreq_t *__cpu_get_ioreq(int vcpu
     ioreq_t *req;
 
     req = &(shared_page->vcpu_iodata[vcpu].vp_ioreq);
-
     if (req->state == STATE_IOREQ_READY) {
-        req->state = STATE_IOREQ_INPROCESS;
-        return req;
-    }
-
-    fprintf(logfile, "False I/O request ... in-service already: "
-            "%x, pvalid: %x, port: %"PRIx64", "
-            "data: %"PRIx64", count: %"PRIx64", size: %"PRIx64"\n",
-            req->state, req->pdata_valid, req->addr,
-            req->u.data, req->count, req->size);
+	req->state = STATE_IOREQ_INPROCESS;
+	rmb();
+	return req;
+    }
+
     return NULL;
 }
 
@@ -226,7 +223,7 @@ static ioreq_t *cpu_get_ioreq(void)
     port = xc_evtchn_pending(xce_handle);
     if (port != -1) {
         for ( i = 0; i < vcpus; i++ )
-            if ( shared_page->vcpu_iodata[i].dm_eport == port )
+            if ( ioreq_local_port[i] == port )
                 break;
 
         if ( i == vcpus ) {
@@ -447,8 +444,10 @@ void cpu_handle_ioreq(void *opaque)
         }
 
         /* No state change if state = STATE_IORESP_HOOK */
-        if (req->state == STATE_IOREQ_INPROCESS)
+        if (req->state == STATE_IOREQ_INPROCESS) {
+	    mb();
             req->state = STATE_IORESP_READY;
+	}
         env->send_event = 1;
     }
 }
@@ -479,8 +478,7 @@ int main_loop(void)
 
         if (env->send_event) {
             env->send_event = 0;
-            xc_evtchn_notify(xce_handle,
-                             shared_page->vcpu_iodata[send_vcpu].dm_eport);
+            (void)xc_evtchn_notify(xce_handle, ioreq_local_port[send_vcpu]);
         }
     }
     destroy_hvm_domain();
diff -r ecb8ff1fcf1f tools/libxc/xc_hvm_build.c
--- a/tools/libxc/xc_hvm_build.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/libxc/xc_hvm_build.c	Tue Jul 18 13:43:27 2006 +0100
@@ -6,12 +6,14 @@
 #include <stddef.h>
 #include <inttypes.h>
 #include "xg_private.h"
+#include "xc_private.h"
 #include "xc_elf.h"
 #include <stdlib.h>
 #include <unistd.h>
 #include <zlib.h>
 #include <xen/hvm/hvm_info_table.h>
 #include <xen/hvm/ioreq.h>
+#include <xen/hvm/params.h>
 
 #define HVM_LOADER_ENTR_ADDR  0x00100000
 
@@ -52,6 +54,30 @@ loadelfimage(
     char *elfbase, int xch, uint32_t dom, unsigned long *parray,
     struct domain_setup_info *dsi);
 
+static void xc_set_hvm_param(int handle,
+                             domid_t dom, int param, unsigned long value)
+{
+    DECLARE_HYPERCALL;
+    xen_hvm_param_t arg;
+    int rc;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_set_param;
+    hypercall.arg[1] = (unsigned long)&arg;
+    arg.domid = dom;
+    arg.index = param;
+    arg.value = value;
+    if ( mlock(&arg, sizeof(arg)) != 0 )
+    {
+        PERROR("Could not lock memory for set parameter");
+        return;
+    }
+    rc = do_xen_hypercall(handle, &hypercall);
+    safe_munlock(&arg, sizeof(arg));
+    if (rc < 0)
+        PERROR("set HVM parameter failed (%d)", rc);
+}
+
 static unsigned char build_e820map(void *e820_page, unsigned long long mem_size)
 {
     struct e820entry *e820entry =
@@ -162,6 +188,8 @@ static int set_hvm_info(int xc_handle, u
     set_hvm_info_checksum(va_hvm);
 
     munmap(va_map, PAGE_SIZE);
+
+    xc_set_hvm_param(xc_handle, dom, HVM_PARAM_APIC_ENABLED, apic);
 
     return 0;
 }
@@ -275,27 +303,17 @@ static int setup_guest(int xc_handle,
         shared_info->vcpu_info[i].evtchn_upcall_mask = 1;
     munmap(shared_info, PAGE_SIZE);
 
-    /* Populate the event channel port in the shared page */
+    /* Paranoia: zero the shared I/O request page before the guest sees it. */
     shared_page_frame = page_array[(v_end >> PAGE_SHIFT) - 1];
     if ( (sp = (shared_iopage_t *) xc_map_foreign_range(
               xc_handle, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               shared_page_frame)) == 0 )
         goto error_out;
     memset(sp, 0, PAGE_SIZE);
-
-    /* FIXME: how about if we overflow the page here? */
-    for ( i = 0; i < vcpus; i++ ) {
-        unsigned int vp_eport;
-
-        vp_eport = xc_evtchn_alloc_unbound(xc_handle, dom, 0);
-        if ( vp_eport < 0 ) {
-            PERROR("Couldn't get unbound port from VMX guest.\n");
-            goto error_out;
-        }
-        sp->vcpu_iodata[i].vp_eport = vp_eport;
-    }
-
     munmap(sp, PAGE_SIZE);
+
+    xc_set_hvm_param(xc_handle, dom, HVM_PARAM_STORE_PFN, (v_end >> PAGE_SHIFT) - 2);
+    xc_set_hvm_param(xc_handle, dom, HVM_PARAM_STORE_EVTCHN, store_evtchn);
 
     *store_mfn = page_array[(v_end >> PAGE_SHIFT) - 2];
     if ( xc_clear_domain_page(xc_handle, dom, *store_mfn) )
diff -r ecb8ff1fcf1f xen/arch/x86/dom0_ops.c
--- a/xen/arch/x86/dom0_ops.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/dom0_ops.c	Tue Jul 18 13:43:27 2006 +0100
@@ -429,7 +429,7 @@ long arch_do_dom0_op(struct dom0_op *op,
         ret = 0;
 
         hypercall_page = map_domain_page(mfn);
-        hypercall_page_initialise(hypercall_page);
+        hypercall_page_initialise(d, hypercall_page);
         unmap_domain_page(hypercall_page);
 
         put_page_and_type(mfn_to_page(mfn));
diff -r ecb8ff1fcf1f xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/domain.c	Tue Jul 18 13:43:27 2006 +0100
@@ -819,7 +819,7 @@ unsigned long hypercall_create_continuat
 #if defined(__i386__)
         regs->eax  = op;
 
-        if ( supervisor_mode_kernel )
+        if ( supervisor_mode_kernel || hvm_guest(current) )
             regs->eip &= ~31; /* re-execute entire hypercall entry stub */
         else
             regs->eip -= 2;   /* re-execute 'int 0x82' */
diff -r ecb8ff1fcf1f xen/arch/x86/domain_build.c
--- a/xen/arch/x86/domain_build.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/domain_build.c	Tue Jul 18 13:43:27 2006 +0100
@@ -704,7 +704,7 @@ int construct_dom0(struct domain *d,
             return -1;
         }
 
-        hypercall_page_initialise((void *)hypercall_page);
+        hypercall_page_initialise(d, (void *)hypercall_page);
     }
 
     /* Copy the initial ramdisk. */
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/hvm.c	Tue Jul 18 13:43:27 2006 +0100
@@ -45,6 +45,9 @@
 #include <public/sched.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/hvm_info_table.h>
+#include <xen/event.h>
+#include <xen/hypercall.h>
+#include <xen/guest_access.h>
 
 int hvm_enabled = 0;
 
@@ -58,6 +61,8 @@ static void hvm_zap_mmio_range(
 {
     unsigned long i, val = INVALID_MFN;
 
+    ASSERT(d == current->domain);
+
     for ( i = 0; i < nr_pfn; i++ )
     {
         if ( pfn + i >= 0xfffff )
@@ -67,18 +72,27 @@ static void hvm_zap_mmio_range(
     }
 }
 
-static void hvm_map_io_shared_page(struct domain *d)
+static void e820_zap_iommu_callback(struct domain *d,
+                                    struct e820entry *e,
+                                    void *ign)
+{
+    if ( e->type == E820_IO )
+        hvm_zap_mmio_range(d, e->addr >> PAGE_SHIFT, e->size >> PAGE_SHIFT);
+}
+
+static void e820_foreach(struct domain *d,
+                         void (*cb)(struct domain *d,
+                                    struct e820entry *e,
+                                    void *data),
+                         void *data)
 {
     int i;
     unsigned char e820_map_nr;
     struct e820entry *e820entry;
     unsigned char *p;
     unsigned long mfn;
-    unsigned long gpfn = 0;
-
-    local_flush_tlb_pge();
-
-    mfn = get_mfn_from_gpfn(E820_MAP_PAGE >> PAGE_SHIFT);
+
+    mfn = gmfn_to_mfn(d, E820_MAP_PAGE >> PAGE_SHIFT);
     if (mfn == INVALID_MFN) {
         printk("Can not find E820 memory map page for HVM domain.\n");
         domain_crash_synchronous();
@@ -95,26 +109,40 @@ static void hvm_map_io_shared_page(struc
 
     for ( i = 0; i < e820_map_nr; i++ )
     {
-        if ( e820entry[i].type == E820_SHARED_PAGE )
-            gpfn = (e820entry[i].addr >> PAGE_SHIFT);
-        if ( e820entry[i].type == E820_IO )
-            hvm_zap_mmio_range(
-                d, 
-                e820entry[i].addr >> PAGE_SHIFT,
-                e820entry[i].size >> PAGE_SHIFT);
-    }
-
-    if ( gpfn == 0 ) {
-        printk("Can not get io request shared page"
-               " from E820 memory map for HVM domain.\n");
-        unmap_domain_page(p);
-        domain_crash_synchronous();
-    }
+        cb(d, e820entry + i, data);
+    }
+
     unmap_domain_page(p);
-
-    /* Initialise shared page */
-    mfn = get_mfn_from_gpfn(gpfn);
-    if (mfn == INVALID_MFN) {
+}
+
+static void hvm_zap_iommu_pages(struct domain *d)
+{
+    e820_foreach(d, e820_zap_iommu_callback, NULL);
+}
+
+static void e820_map_io_shared_callback(struct domain *d,
+                                        struct e820entry *e,
+                                        void *data)
+{
+    unsigned long *mfn = data;
+    if ( e->type == E820_SHARED_PAGE ) {
+        ASSERT(*mfn == INVALID_MFN);
+        *mfn = gmfn_to_mfn(d, e->addr >> PAGE_SHIFT);
+    }
+}
+
+void hvm_map_io_shared_page(struct vcpu *v)
+{
+    unsigned long mfn = INVALID_MFN;
+    void *p;
+    struct domain *d = v->domain;
+
+    if ( d->arch.hvm_domain.shared_page_va )
+        return;
+
+    e820_foreach(d, e820_map_io_shared_callback, &mfn);
+
+    if ( mfn == INVALID_MFN ) {
         printk("Can not find io request shared page for HVM domain.\n");
         domain_crash_synchronous();
     }
@@ -127,59 +155,20 @@ static void hvm_map_io_shared_page(struc
     d->arch.hvm_domain.shared_page_va = (unsigned long)p;
 }
 
-static int validate_hvm_info(struct hvm_info_table *t)
-{
-    char signature[] = "HVM INFO";
-    uint8_t *ptr = (uint8_t *)t;
-    uint8_t sum = 0;
-    int i;
-
-    /* strncmp(t->signature, "HVM INFO", 8) */
-    for ( i = 0; i < 8; i++ ) {
-        if ( signature[i] != t->signature[i] ) {
-            printk("Bad hvm info signature\n");
-            return 0;
-        }
-    }
-
-    for ( i = 0; i < t->length; i++ )
-        sum += ptr[i];
-
-    return (sum == 0);
-}
-
-static void hvm_get_info(struct domain *d)
-{
-    unsigned char *p;
-    unsigned long mfn;
-    struct hvm_info_table *t;
-
-    mfn = get_mfn_from_gpfn(HVM_INFO_PFN);
-    if ( mfn == INVALID_MFN ) {
-        printk("Can not get info page mfn for HVM domain.\n");
-        domain_crash_synchronous();
-    }
-
-    p = map_domain_page(mfn);
-    if ( p == NULL ) {
-        printk("Can not map info page for HVM domain.\n");
-        domain_crash_synchronous();
-    }
-
-    t = (struct hvm_info_table *)(p + HVM_INFO_OFFSET);
-
-    if ( validate_hvm_info(t) ) {
-        d->arch.hvm_domain.nr_vcpus = t->nr_vcpus;
-        d->arch.hvm_domain.apic_enabled = t->apic_enabled;
-        d->arch.hvm_domain.pae_enabled = t->pae_enabled;
-    } else {
-        printk("Bad hvm info table\n");
-        d->arch.hvm_domain.nr_vcpus = 1;
-        d->arch.hvm_domain.apic_enabled = 0;
-        d->arch.hvm_domain.pae_enabled = 0;
-    }
-
-    unmap_domain_page(p);
+static void evtchn_callback_func(void *v)
+{
+    hvm_assist_complete(v);
+}
+
+void hvm_create_event_channels(struct vcpu *v)
+{
+    vcpu_iodata_t *p;
+    p = get_vio(v->domain, v->vcpu_id);
+    v->arch.hvm_vcpu.xen_port = p->vp_xen_port =
+        alloc_xen_event_channel(evtchn_callback_func,
+                                v,
+                                dom0);
+    DPRINTK("Allocated port %d for hvm.\n", v->arch.hvm_vcpu.xen_port);
 }
 
 void hvm_setup_platform(struct domain* d)
@@ -196,8 +185,7 @@ void hvm_setup_platform(struct domain* d
         domain_crash_synchronous();
     }
 
-    hvm_map_io_shared_page(d);
-    hvm_get_info(d);
+    hvm_zap_iommu_pages(d);
 
     platform = &d->arch.hvm_domain;
     pic_init(&platform->vpic, pic_irq_request, &platform->interrupt_request);
@@ -329,6 +317,59 @@ void hvm_print_line(struct vcpu *v, cons
 	pbuf[(*index)++] = c;
 }
 
+void hvm_release_assist_channel(struct vcpu *v)
+{
+    release_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+}
+
+#if defined(__i386__)
+typedef unsigned long hvm_hypercall_handler(unsigned long, unsigned long,
+                                            unsigned long, unsigned long,
+                                            unsigned long);
+#define HYPERCALL(x) [ __HYPERVISOR_ ## x ] = (hvm_hypercall_handler *) do_ ## x
+static hvm_hypercall_handler *hvm_hypercall_table[] = {
+    HYPERCALL(mmu_update),
+    HYPERCALL(memory_op),
+    HYPERCALL(multicall),
+    HYPERCALL(update_va_mapping),
+    HYPERCALL(event_channel_op_compat),
+    HYPERCALL(xen_version),
+    HYPERCALL(grant_table_op),
+    HYPERCALL(event_channel_op),
+    HYPERCALL(hvm_op)
+};
+#undef HYPERCALL
+
+void hvm_do_hypercall(struct cpu_user_regs *pregs)
+{
+    if (pregs->eax >= ARRAY_SIZE(hvm_hypercall_table) ||
+        !hvm_hypercall_table[pregs->eax]) {
+        DPRINTK("HVM vcpu %d:%d did a bad hypercall %d.\n",
+                current->domain->domain_id, current->vcpu_id,
+                pregs->eax);
+        pregs->eax = -ENOSYS;
+    } else {
+        pregs->eax = hvm_hypercall_table[pregs->eax](pregs->ebx, pregs->ecx,
+                                                     pregs->edx, pregs->esi,
+                                                     pregs->edi);
+    }
+}
+#else
+void hvm_do_hypercall(struct cpu_user_regs *pregs)
+{
+    printk("HVM hypercalls are not supported on this architecture yet.\n");
+}
+#endif
+
+/* Initialise a hypercall transfer page for an HVM domain using
+   paravirtualised drivers. */
+void hvm_hypercall_page_initialise(struct domain *d,
+                                   void *hypercall_page)
+{
+    hvm_funcs.init_hypercall_page(d, hypercall_page);
+}
+
+
 /*
  * only called in HVM domain BSP context
  * when booting, vcpuid is always equal to apic_id
@@ -372,6 +413,57 @@ int hvm_bringup_ap(int vcpuid, int tramp
 
     xfree(ctxt);
 
+    return rc;
+}
+
+long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg)
+
+{
+    long rc = 0;
+
+    switch (op)
+    {
+    case HVMOP_set_param:
+    case HVMOP_get_param:
+    {
+        struct xen_hvm_param a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        if ( a.index < 0 || a.index >= HVM_NR_PARAMS ) {
+            return -EINVAL;
+        }
+
+        if ( a.domid == DOMID_SELF ) {
+            get_knownalive_domain(current->domain);
+            d = current->domain;
+        } else if ( IS_PRIV(current->domain) ) {
+            d = find_domain_by_id(a.domid);
+            if ( !d ) {
+                return -ESRCH;
+            }
+        } else {
+            return -EPERM;
+        }
+
+        if ( op == HVMOP_set_param ) {
+            rc = 0;
+            d->arch.hvm_domain.params[a.index] = a.value;
+        } else {
+            rc = d->arch.hvm_domain.params[a.index];
+        }
+
+        put_domain(d);
+        return rc;
+    }
+    default:
+    {
+        DPRINTK("Bad HVM op %ld.\n", op);
+        rc = -EINVAL;
+    }
+    }
     return rc;
 }
 
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/intercept.c
--- a/xen/arch/x86/hvm/intercept.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/intercept.c	Tue Jul 18 13:43:27 2006 +0100
@@ -211,7 +211,7 @@ void hlt_timer_fn(void *data)
 {
     struct vcpu *v = data;
 
-    evtchn_set_pending(v, iopacket_port(v));
+    hvm_prod_vcpu(v);
 }
 
 static __inline__ void missed_ticks(struct periodic_time *pt)
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/io.c	Tue Jul 18 13:43:27 2006 +0100
@@ -687,85 +687,18 @@ void hvm_io_assist(struct vcpu *v)
 
     p = &vio->vp_ioreq;
 
-    /* clear IO wait HVM flag */
-    if ( test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) ) {
-        if ( p->state == STATE_IORESP_READY ) {
-            p->state = STATE_INVALID;
-            clear_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
-
-            if ( p->type == IOREQ_TYPE_PIO )
-                hvm_pio_assist(regs, p, io_opp);
-            else {
-                hvm_mmio_assist(regs, p, io_opp);
-                hvm_load_cpu_guest_regs(v, regs);
-            }
-
-            /* Copy register changes back into current guest state. */
-            memcpy(guest_cpu_user_regs(), regs, HVM_CONTEXT_STACK_BYTES);
-        }
-        /* else an interrupt send event raced us */
-    }
-}
-
-/*
- * On exit from hvm_wait_io, we're guaranteed not to be waiting on
- * I/O response from the device model.
- */
-void hvm_wait_io(void)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    int port = iopacket_port(v);
-
-    for ( ; ; )
-    {
-        /* Clear master flag, selector flag, event flag each in turn. */
-        v->vcpu_info->evtchn_upcall_pending = 0;
-        clear_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
-        smp_mb__after_clear_bit();
-        if ( test_and_clear_bit(port, &d->shared_info->evtchn_pending[0]) )
-            hvm_io_assist(v);
-
-        /* Need to wait for I/O responses? */
-        if ( !test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) )
-            break;
-
-        do_sched_op_compat(SCHEDOP_block, 0);
-    }
-
-    /*
-     * Re-set the selector and master flags in case any other notifications
-     * are pending.
-     */
-    if ( d->shared_info->evtchn_pending[port/BITS_PER_LONG] )
-        set_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
-    if ( v->vcpu_info->evtchn_pending_sel )
-        v->vcpu_info->evtchn_upcall_pending = 1;
-}
-
-void hvm_safe_block(void)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    int port = iopacket_port(v);
-
-    for ( ; ; )
-    {
-        /* Clear master flag & selector flag so we will wake from block. */
-        v->vcpu_info->evtchn_upcall_pending = 0;
-        clear_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
-        smp_mb__after_clear_bit();
-
-        /* Event pending already? */
-        if ( test_bit(port, &d->shared_info->evtchn_pending[0]) )
-            break;
-
-        do_sched_op_compat(SCHEDOP_block, 0);
-    }
-
-    /* Reflect pending event in selector and master flags. */
-    set_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
-    v->vcpu_info->evtchn_upcall_pending = 1;
+    if (p->state == STATE_IORESP_READY) {
+        p->state = STATE_INVALID;
+        if (p->type == IOREQ_TYPE_PIO)
+            hvm_pio_assist(regs, p, io_opp);
+        else {
+            hvm_mmio_assist(regs, p, io_opp);
+            hvm_load_cpu_guest_regs(v, regs);
+        }
+
+        /* Copy register changes back into current guest state. */
+        memcpy(guest_cpu_user_regs(), regs, HVM_CONTEXT_STACK_BYTES);
+    }
 }
 
 /*
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/platform.c
--- a/xen/arch/x86/hvm/platform.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/platform.c	Tue Jul 18 13:43:27 2006 +0100
@@ -669,6 +669,37 @@ int inst_copy_from_guest(unsigned char *
     return inst_len;
 }
 
+static void hvm_send_assist_req(struct vcpu *v)
+{
+    ioreq_t *p;
+
+    ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
+    spin_lock(&v->pause_lock);
+    if ( v->pause_count++ == 0 )
+        set_bit(_VCPUF_paused, &v->vcpu_flags);
+    spin_unlock(&v->pause_lock);
+    set_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
+    mb();
+    p = &get_vio(v->domain, v->vcpu_id)->vp_ioreq;
+    if (unlikely(p->state != STATE_INVALID)) {
+        /* This indicates a bug in the device model.  Crash the
+           domain. */
+        printf("Device model set bad IO state %d.\n", p->state);
+        domain_crash(v->domain);
+        return;
+    }
+    vcpu_sleep_nosync(v);
+    wmb();
+    p->state = STATE_IOREQ_READY;
+    notify_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+}
+
+/* Wake up a vcpu which is waiting for interrupts to come in. */
+void hvm_prod_vcpu(struct vcpu *v)
+{
+    vcpu_unblock(v);
+}
+
 void send_pio_req(struct cpu_user_regs *regs, unsigned long port,
                   unsigned long count, int size, long value, int dir, int pvalid)
 {
@@ -682,13 +713,11 @@ void send_pio_req(struct cpu_user_regs *
         domain_crash_synchronous();
     }
 
-    if (test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags)) {
-        printf("HVM I/O has not yet completed\n");
-        domain_crash_synchronous();
-    }
-    set_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
-
     p = &vio->vp_ioreq;
+    if (p->state != STATE_INVALID) {
+        printf("WARNING: send pio with something already pending (%d)?\n",
+               p->state);
+    }
     p->dir = dir;
     p->pdata_valid = pvalid;
 
@@ -714,15 +743,11 @@ void send_pio_req(struct cpu_user_regs *
         return;
     }
 
-    p->state = STATE_IOREQ_READY;
-
-    evtchn_send(iopacket_port(v));
-    hvm_wait_io();
-}
-
-void send_mmio_req(
-    unsigned char type, unsigned long gpa,
-    unsigned long count, int size, long value, int dir, int pvalid)
+    hvm_send_assist_req(v);
+}
+
+static void send_mmio_req(unsigned char type, unsigned long gpa,
+                          unsigned long count, int size, long value, int dir, int pvalid)
 {
     struct vcpu *v = current;
     vcpu_iodata_t *vio;
@@ -739,12 +764,10 @@ void send_mmio_req(
 
     p = &vio->vp_ioreq;
 
-    if (test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags)) {
-        printf("HVM I/O has not yet completed\n");
-        domain_crash_synchronous();
-    }
-
-    set_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
+    if (p->state != STATE_INVALID) {
+        printf("WARNING: send mmio with something already pending (%d)?\n",
+               p->state);
+    }
     p->dir = dir;
     p->pdata_valid = pvalid;
 
@@ -770,10 +793,7 @@ void send_mmio_req(
         return;
     }
 
-    p->state = STATE_IOREQ_READY;
-
-    evtchn_send(iopacket_port(v));
-    hvm_wait_io();
+    hvm_send_assist_req(v);
 }
 
 static void mmio_operands(int type, unsigned long gpa, struct instruction *inst,
@@ -1035,6 +1055,108 @@ void handle_mmio(unsigned long va, unsig
     }
 }
 
+void hvm_assist_complete(struct vcpu *v)
+{
+    ioreq_t *p;
+    /* The device model just sent an event channel message to us.  Either:
+
+    a) It just finished processing a request, or
+    b) it wants us to send an interrupt into the guest.
+
+    We only need to handle case (b) explicitly if there is no pending
+    IO request from us to the device model (since if there is, we'll
+    pick up the interrupt when the request completes). */
+    p = &get_vio(v->domain, v->vcpu_id)->vp_ioreq;
+    if (p->state == STATE_IORESP_READY) {
+        /* There's a race here, in that the device model could set
+           p->state while we're not looking, but we don't care, since
+           that would imply that *this* notification is not related to
+           that state transition, and so there'll be another one along
+           shortly. */
+        if (test_and_clear_bit(ARCH_HVM_IO_WAIT,
+                               &v->arch.hvm_vcpu.ioflags)) {
+            /* Just completed a wait-for-io, so we can unpause the
+               vcpu.  It'll pick up the response when it returns.  */
+            vcpu_unpause(v);
+            return;
+        } else {
+            /* Someone got in and processed the response before us.
+               Just to be on the safe side, treat this as an interrupt
+               delivery. */
+            /* (the other path implicitly does interrupt delivery as
+               the vcpu returns to the guest) */
+        }
+    }
+
+    /* Evtchn message must have been for interrupt delivery. */
+    hvm_prod_vcpu(v);
+    smp_send_event_check_cpu(v->processor);
+}
+
+#define MIN(x,y) ((x)<(y)?(x):(y))
+
+/* Note that copy_{to,from}_user_hvm don't set the A and D bits on
+   PTEs, and require the PTE to be writable even when they're only
+   trying to read from it.  The guest is expected to deal with
+   this. */
+unsigned long copy_to_user_hvm(void *to, const void *from, unsigned len)
+{
+    unsigned long mfn;
+    unsigned long va;
+    void *map;
+    unsigned long off_in_page;
+    unsigned long chunk_size;
+
+    ASSERT(hvm_guest(current));
+    va = (unsigned long)to;
+    off_in_page = va % PAGE_SIZE;
+    while (len != 0) {
+        mfn = gva_to_mfn(va);
+        if (!mfn)
+            break;
+        map = map_domain_page(mfn);
+        if (!map)
+            break;
+        chunk_size = MIN(len, PAGE_SIZE - off_in_page);
+        memcpy(map + off_in_page, from, chunk_size);
+        unmap_domain_page(map);
+        off_in_page = 0;
+        len -= chunk_size;
+        from += chunk_size;
+        va += chunk_size;
+    }
+    return len;
+}
+
+unsigned long copy_from_user_hvm(void *to, const void *from, unsigned len)
+{
+    unsigned long mfn;
+    unsigned long va;
+    void *map;
+    unsigned long off_in_page;
+    unsigned long chunk_size;
+
+    ASSERT(hvm_guest(current));
+    va = (unsigned long)from;
+    off_in_page = va % PAGE_SIZE;
+    while (len != 0) {
+        mfn = gva_to_mfn(va);
+        if (!mfn)
+            break;
+        map = map_domain_page(mfn);
+        if (!map)
+            break;
+        chunk_size = MIN(len, PAGE_SIZE - off_in_page);
+        memcpy(to, map + off_in_page, chunk_size);
+        unmap_domain_page(map);
+        off_in_page = 0;
+        len -= chunk_size;
+        to += chunk_size;
+        va += chunk_size;
+    }
+    return len;
+}
+
 /*
  * Local variables:
  * mode: C
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/svm/svm.c	Tue Jul 18 13:43:27 2006 +0100
@@ -25,6 +25,7 @@
 #include <xen/sched.h>
 #include <xen/irq.h>
 #include <xen/softirq.h>
+#include <xen/hypercall.h>
 #include <asm/current.h>
 #include <asm/io.h>
 #include <asm/shadow.h>
@@ -456,6 +457,28 @@ void svm_init_ap_context(struct vcpu_gue
     ctxt->flags = VGCF_HVM_GUEST;
 }
 
+static void svm_init_hypercall_page(struct domain *d, void *hypercall_page)
+{
+    char *p;
+    int i;
+
+    memset(hypercall_page, 0, PAGE_SIZE);
+
+    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+    {
+        p = (char *)(hypercall_page + (i * 32));
+        *(u8  *)(p + 0) = 0xb8; /* mov imm32, %eax */
+        *(u32 *)(p + 1) = i;
+        *(u8  *)(p + 5) = 0x0f; /* vmmcall */
+        *(u8  *)(p + 6) = 0x01;
+        *(u8  *)(p + 7) = 0xd9;
+        *(u8  *)(p + 8) = 0xc3; /* ret */
+    }
+
+    /* Don't support HYPERVISOR_iret at the moment */
+    *(u16 *)(hypercall_page + (__HYPERVISOR_iret * 32)) = 0x0b0f; /* ud2 */
+}
+
 int start_svm(void)
 {
     u32 eax, ecx, edx;
@@ -503,6 +526,8 @@ int start_svm(void)
     hvm_funcs.instruction_length = svm_instruction_length;
     hvm_funcs.get_guest_ctrl_reg = svm_get_ctrl_reg;
     hvm_funcs.init_ap_context = svm_init_ap_context;
+
+    hvm_funcs.init_hypercall_page = svm_init_hypercall_page;
 
     hvm_enabled = 1;    
 
@@ -2085,7 +2110,7 @@ static inline void svm_vmexit_do_hlt(str
         next_wakeup = next_pit;
     if ( next_wakeup != - 1 )
         set_timer(&current->arch.hvm_svm.hlt_timer, next_wakeup);
-    hvm_safe_block();
+    do_sched_op_compat(SCHEDOP_block, 0);
 }
 
 
@@ -2314,33 +2339,39 @@ static int svm_do_vmmcall(struct vcpu *v
     inst_len = __get_instruction_length(vmcb, INSTR_VMCALL, NULL);
     ASSERT(inst_len > 0);
 
-    /* VMMCALL sanity check */
-    if (vmcb->cpl > get_vmmcall_cpl(regs->edi))
-    {
-        printf("VMMCALL CPL check failed\n");
-        return -1;
-    }
-
-    /* handle the request */
-    switch (regs->edi) 
-    {
-    case VMMCALL_RESET_TO_REALMODE:
-        if (svm_do_vmmcall_reset_to_realmode(v, regs)) 
-        {
-            printf("svm_do_vmmcall_reset_to_realmode() failed\n");
+    if (regs->eax & 0x80000000) {
+        /* VMMCALL sanity check */
+        if (vmcb->cpl > get_vmmcall_cpl(regs->edi))
+        {
+            printf("VMMCALL CPL check failed\n");
             return -1;
         }
-    
-        /* since we just reset the VMCB, return without adjusting the eip */
-        return 0;
-    case VMMCALL_DEBUG:
-        printf("DEBUG features not implemented yet\n");
-        break;
-    default:
-    break;
-    }
-
-    hvm_print_line(v, regs->eax); /* provides the current domain */
+
+        /* handle the request */
+        switch (regs->eax)
+        {
+        case VMMCALL_RESET_TO_REALMODE:
+            if (svm_do_vmmcall_reset_to_realmode(v, regs))
+            {
+                printf("svm_do_vmmcall_reset_to_realmode() failed\n");
+                return -1;
+            }
+            /* since we just reset the VMCB, return without adjusting
+             * the eip */
+            return 0;
+
+        case VMMCALL_DEBUG:
+            printf("DEBUG features not implemented yet\n");
+            break;
+        default:
+            break;
+        }
+
+        hvm_print_line(v, regs->eax); /* provides the current domain */
+    } else {
+        /* It's a hypercall */
+        hvm_do_hypercall(regs);
+    }
 
     __update_guest_eip(vmcb, inst_len);
     return 0;
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/svm/vmcb.c
--- a/xen/arch/x86/hvm/svm/vmcb.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/svm/vmcb.c	Tue Jul 18 13:43:27 2006 +0100
@@ -370,18 +370,6 @@ void svm_do_launch(struct vcpu *v)
     if (v->vcpu_id == 0)
         hvm_setup_platform(v->domain);
 
-    if ( evtchn_bind_vcpu(iopacket_port(v), v->vcpu_id) < 0 )
-    {
-        printk("HVM domain bind port %d to vcpu %d failed!\n",
-               iopacket_port(v), v->vcpu_id);
-        domain_crash_synchronous();
-    }
-
-    HVM_DBG_LOG(DBG_LEVEL_1, "eport: %x", iopacket_port(v));
-
-    clear_bit(iopacket_port(v),
-              &v->domain->shared_info->evtchn_mask[0]);
-
     if (hvm_apic_support(v->domain))
         vlapic_init(v);
     init_timer(&v->arch.hvm_svm.hlt_timer,
@@ -455,9 +443,10 @@ void svm_do_resume(struct vcpu *v)
         pickup_deactive_ticks(pt);
     }
 
-    if ( test_bit(iopacket_port(v), &d->shared_info->evtchn_pending[0]) ||
-         test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) )
-        hvm_wait_io();
+    if (test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags)) {
+        hvm_io_assist(v);
+        ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
+    }
 
     /* We can't resume the guest if we're waiting on I/O */
     ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vlapic.c
--- a/xen/arch/x86/hvm/vlapic.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vlapic.c	Tue Jul 18 13:43:27 2006 +0100
@@ -33,6 +33,7 @@
 #include <xen/sched.h>
 #include <asm/current.h>
 #include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
 
 /* XXX remove this definition after GFW enabled */
 #define VLAPIC_NO_BIOS
@@ -63,7 +64,7 @@ int vlapic_find_highest_irr(struct vlapi
 
 int hvm_apic_support(struct domain *d)
 {
-    return d->arch.hvm_domain.apic_enabled;
+    return d->arch.hvm_domain.params[HVM_PARAM_APIC_ENABLED];
 }
 
 s_time_t get_apictime_scheduled(struct vcpu *v)
@@ -223,7 +224,7 @@ static int vlapic_accept_irq(struct vcpu
               "level trig mode for vector %d\n", vector);
             set_bit(vector, &vlapic->tmr[0]);
         }
-        evtchn_set_pending(v, iopacket_port(v));
+        hvm_prod_vcpu(v);
 
         result = 1;
         break;
@@ -367,7 +368,7 @@ int vlapic_check_vector(struct vlapic *v
     return 1;
 }
 
-void vlapic_ipi(struct vlapic *vlapic)
+static void vlapic_ipi(struct vlapic *vlapic)
 {
     unsigned int dest = (vlapic->icr_high >> 24) & 0xff;
     unsigned int short_hand = (vlapic->icr_low >> 18) & 3;
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vmx/io.c
--- a/xen/arch/x86/hvm/vmx/io.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vmx/io.c	Tue Jul 18 13:43:27 2006 +0100
@@ -142,6 +142,7 @@ asmlinkage void vmx_intr_assist(void)
     struct hvm_domain *plat=&v->domain->arch.hvm_domain;
     struct periodic_time *pt = &plat->pl_time.periodic_tm;
     struct hvm_virpic *pic= &plat->vpic;
+    int callback_irq;
     unsigned int idtv_info_field;
     unsigned long inst_len;
     int    has_ext_irq;
@@ -152,6 +153,15 @@ asmlinkage void vmx_intr_assist(void)
     if ( (v->vcpu_id == 0) && pt->enabled && pt->pending_intr_nr ) {
         pic_set_irq(pic, pt->irq, 0);
         pic_set_irq(pic, pt->irq, 1);
+    }
+
+    callback_irq = v->domain->arch.hvm_domain.params[HVM_PARAM_CALLBACK_IRQ];
+    if ( callback_irq != 0 &&
+         local_events_need_delivery() ) {
+        /* Inject the paravirtualised device callback IRQ. */
+        v->vcpu_info->evtchn_upcall_mask = 1;
+        pic_set_irq(pic, callback_irq, 0);
+        pic_set_irq(pic, callback_irq, 1);
     }
 
     has_ext_irq = cpu_has_pending_irq(v);
@@ -220,7 +230,7 @@ asmlinkage void vmx_intr_assist(void)
 
 void vmx_do_resume(struct vcpu *v)
 {
-    struct domain *d = v->domain;
+    ioreq_t *p;
     struct periodic_time *pt = &v->domain->arch.hvm_domain.pl_time.periodic_tm;
 
     vmx_stts();
@@ -234,9 +244,13 @@ void vmx_do_resume(struct vcpu *v)
         pickup_deactive_ticks(pt);
     }
 
-    if ( test_bit(iopacket_port(v), &d->shared_info->evtchn_pending[0]) ||
-         test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) )
-        hvm_wait_io();
+    p = &get_vio(v->domain, v->vcpu_id)->vp_ioreq;
+    if (p->state == STATE_IORESP_READY)
+        hvm_io_assist(v);
+    if (p->state != STATE_INVALID) {
+        printf("Weird HVM iorequest state %d.\n", p->state);
+        domain_crash(v->domain);
+    }
 
     /* We can't resume the guest if we're waiting on I/O */
     ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vmx/vmcs.c
--- a/xen/arch/x86/hvm/vmx/vmcs.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vmx/vmcs.c	Tue Jul 18 13:43:27 2006 +0100
@@ -245,18 +245,6 @@ static void vmx_do_launch(struct vcpu *v
     if (v->vcpu_id == 0)
         hvm_setup_platform(v->domain);
 
-    if ( evtchn_bind_vcpu(iopacket_port(v), v->vcpu_id) < 0 )
-    {
-        printk("VMX domain bind port %d to vcpu %d failed!\n",
-               iopacket_port(v), v->vcpu_id);
-        domain_crash_synchronous();
-    }
-
-    HVM_DBG_LOG(DBG_LEVEL_1, "eport: %x", iopacket_port(v));
-
-    clear_bit(iopacket_port(v),
-              &v->domain->shared_info->evtchn_mask[0]);
-
     __asm__ __volatile__ ("mov %%cr0,%0" : "=r" (cr0) : );
 
     error |= __vmwrite(GUEST_CR0, cr0);
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vmx/vmx.c	Tue Jul 18 13:43:27 2006 +0100
@@ -25,6 +25,7 @@
 #include <xen/irq.h>
 #include <xen/softirq.h>
 #include <xen/domain_page.h>
+#include <xen/hypercall.h>
 #include <asm/current.h>
 #include <asm/io.h>
 #include <asm/shadow.h>
@@ -139,6 +140,7 @@ static void vmx_relinquish_guest_resourc
             kill_timer(&VLAPIC(v)->vlapic_timer);
             xfree(VLAPIC(v));
         }
+        hvm_release_assist_channel(v);
     }
 
     kill_timer(&d->arch.hvm_domain.pl_time.periodic_tm.timer);
@@ -669,6 +671,28 @@ static int check_vmx_controls(u32 ctrls,
     return 1;
 }
 
+static void vmx_init_hypercall_page(struct domain *d, void *hypercall_page)
+{
+    char *p;
+    int i;
+
+    memset(hypercall_page, 0, PAGE_SIZE);
+
+    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+    {
+        p = (char *)(hypercall_page + (i * 32));
+        *(u8  *)(p + 0) = 0xb8; /* mov imm32, %eax */
+        *(u32 *)(p + 1) = i;
+        *(u8  *)(p + 5) = 0x0f; /* vmcall */
+        *(u8  *)(p + 6) = 0x01;
+        *(u8  *)(p + 7) = 0xc1;
+        *(u8  *)(p + 8) = 0xc3; /* ret */
+    }
+
+    /* Don't support HYPERVISOR_iret at the moment */
+    *(u16 *)(hypercall_page + (__HYPERVISOR_iret * 32)) = 0x0b0f; /* ud2 */
+}
+
 int start_vmx(void)
 {
     u32 eax, edx;
@@ -748,6 +772,8 @@ int start_vmx(void)
     hvm_funcs.get_guest_ctrl_reg = vmx_get_ctrl_reg;
 
     hvm_funcs.init_ap_context = vmx_init_ap_context;
+
+    hvm_funcs.init_hypercall_page = vmx_init_hypercall_page;
 
     hvm_enabled = 1;
 
@@ -1968,7 +1994,7 @@ void vmx_vmexit_do_hlt(void)
         next_wakeup = next_pit;
     if ( next_wakeup != - 1 ) 
         set_timer(&current->arch.hvm_vmx.hlt_timer, next_wakeup);
-    hvm_safe_block();
+    do_sched_op_compat(SCHEDOP_block, 0);
 }
 
 static inline void vmx_vmexit_do_extint(struct cpu_user_regs *regs)
@@ -2138,11 +2164,10 @@ asmlinkage void vmx_vmexit_handler(struc
          * (1) We can get an exception (e.g. #PG) in the guest, or
          * (2) NMI
          */
-        int error;
         unsigned int vector;
         unsigned long va;
 
-        if ((error = __vmread(VM_EXIT_INTR_INFO, &vector))
+        if (__vmread(VM_EXIT_INTR_INFO, &vector)
             || !(vector & INTR_INFO_VALID_MASK))
             __hvm_bug(&regs);
         vector &= INTR_INFO_VECTOR_MASK;
@@ -2215,7 +2240,7 @@ asmlinkage void vmx_vmexit_handler(struc
                         (unsigned long)regs.ecx, (unsigned long)regs.edx,
                         (unsigned long)regs.esi, (unsigned long)regs.edi);
 
-            if (!(error = vmx_do_page_fault(va, &regs))) {
+            if (!vmx_do_page_fault(va, &regs)) {
                 /*
                  * Inject #PG using Interruption-Information Fields
                  */
@@ -2273,16 +2298,16 @@ asmlinkage void vmx_vmexit_handler(struc
         __update_guest_eip(inst_len);
         break;
     }
-#if 0 /* keep this for debugging */
     case EXIT_REASON_VMCALL:
+    {
         __get_instruction_length(inst_len);
         __vmread(GUEST_RIP, &eip);
         __vmread(EXIT_QUALIFICATION, &exit_qualification);
 
-        hvm_print_line(v, regs.eax); /* provides the current domain */
+        hvm_do_hypercall(&regs);
         __update_guest_eip(inst_len);
         break;
-#endif
+    }
     case EXIT_REASON_CR_ACCESS:
     {
         __vmread(GUEST_RIP, &eip);
@@ -2323,7 +2348,6 @@ asmlinkage void vmx_vmexit_handler(struc
     case EXIT_REASON_MWAIT_INSTRUCTION:
         __hvm_bug(&regs);
         break;
-    case EXIT_REASON_VMCALL:
     case EXIT_REASON_VMCLEAR:
     case EXIT_REASON_VMLAUNCH:
     case EXIT_REASON_VMPTRLD:
diff -r ecb8ff1fcf1f xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/mm.c	Tue Jul 18 13:43:27 2006 +0100
@@ -2982,7 +2982,12 @@ long arch_memory_op(int op, XEN_GUEST_HA
         if ( copy_from_guest(&xatp, arg, 1) )
             return -EFAULT;
 
-        if ( (d = find_domain_by_id(xatp.domid)) == NULL )
+        if ( xatp.domid == DOMID_SELF ) {
+            d = current->domain;
+            get_knownalive_domain(d);
+        } else if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+        else if ( (d = find_domain_by_id(xatp.domid)) == NULL )
             return -ESRCH;
 
         switch ( xatp.space )
diff -r ecb8ff1fcf1f xen/arch/x86/x86_32/entry.S
--- a/xen/arch/x86/x86_32/entry.S	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/x86_32/entry.S	Tue Jul 18 13:43:27 2006 +0100
@@ -656,6 +656,7 @@ ENTRY(hypercall_table)
         .long do_xenoprof_op
         .long do_event_channel_op
         .long do_physdev_op
+        .long do_hvm_op             /* 34 */
         .rept NR_hypercalls-((.-hypercall_table)/4)
         .long do_ni_hypercall
         .endr
@@ -695,6 +696,7 @@ ENTRY(hypercall_args_table)
         .byte 2 /* do_xenoprof_op       */
         .byte 2 /* do_event_channel_op  */
         .byte 2 /* do_physdev_op        */
+        .byte 2 /* do_hvm_op            */  /* 34 */
         .rept NR_hypercalls-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff -r ecb8ff1fcf1f xen/arch/x86/x86_32/traps.c
--- a/xen/arch/x86/x86_32/traps.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/x86_32/traps.c	Tue Jul 18 13:43:27 2006 +0100
@@ -486,9 +486,11 @@ static void hypercall_page_initialise_ri
     *(u16 *)(p+ 6) = 0x82cd;  /* int  $0x82 */
 }
 
-void hypercall_page_initialise(void *hypercall_page)
-{
-    if ( supervisor_mode_kernel )
+void hypercall_page_initialise(struct domain *d, void *hypercall_page)
+{
+    if ( hvm_guest(d->vcpu[0]) )
+        hvm_hypercall_page_initialise(d, hypercall_page);
+    else if ( supervisor_mode_kernel )
         hypercall_page_initialise_ring0_kernel(hypercall_page);
     else
         hypercall_page_initialise_ring1_kernel(hypercall_page);
diff -r ecb8ff1fcf1f xen/common/event_channel.c
--- a/xen/common/event_channel.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/common/event_channel.c	Tue Jul 18 13:43:27 2006 +0100
@@ -46,6 +46,104 @@
         goto out;                                                   \
     } while ( 0 )
 
+#define NR_XEN_EVENT_CHANNELS 32
+#define XECS_FREE 0        /* Not in use at all */
+#define XECS_UNBOUND 1     /* Allocated but not bound to */
+#define XECS_BOUND 2       /* Bound to somewhere in domain-space */
+#define XECS_HBOUND 3      /* Half bound: Xen is trying to tear this
+                              down, but a domain is still attached */
+struct xen_evtchn {
+    int state;
+
+    void (*fire)(void *d); /* called when dom0 tries to send on this
+                              event channel. */
+    void *data;
+
+    struct domain *dom; /* Who is allowed to bind/currently bound */
+    int dom_port;
+};
+
+static struct xen_evtchn xen_event_channels[NR_XEN_EVENT_CHANNELS];
+/* Leaf lock protecting the xen_event_channels array. */
+static spinlock_t xen_event_channel_lock = SPIN_LOCK_UNLOCKED;
+
+int alloc_xen_event_channel(void (*f)(void *d),
+                            void *data,
+                            struct domain *d)
+{
+    int ind;
+
+    spin_lock(&xen_event_channel_lock);
+    for (ind = 0; ind < NR_XEN_EVENT_CHANNELS; ind++)
+        if ( xen_event_channels[ind].state == XECS_FREE )
+            break;
+    if ( ind == NR_XEN_EVENT_CHANNELS ) {
+        printf("Out of Xen event channels?\n");
+        ind = -1;
+        goto out;
+    }
+    xen_event_channels[ind].state = XECS_UNBOUND;
+    xen_event_channels[ind].fire = f;
+    xen_event_channels[ind].data = data;
+    xen_event_channels[ind].dom = d;
+ out:
+    spin_unlock(&xen_event_channel_lock);
+    return ind;
+}
+
+void release_xen_event_channel(int ind)
+{
+    spin_lock(&xen_event_channel_lock);
+    switch ( xen_event_channels[ind].state ) {
+    case XECS_UNBOUND:
+        xen_event_channels[ind].state = XECS_FREE;
+        break;
+    case XECS_BOUND:
+        xen_event_channels[ind].state = XECS_HBOUND;
+        break;
+    case XECS_HBOUND:
+        panic("Double free of Xen event channel.\n");
+    case XECS_FREE:
+        printf("Attempt to free non-allocated Xen event channel %d?\n",
+               ind);
+    default:
+        BUG();
+    }
+
+    spin_unlock(&xen_event_channel_lock);
+}
+
+void notify_xen_event_channel(int port)
+{
+    struct xen_evtchn *xchn = xen_event_channels + port;
+    struct domain *d = NULL;
+    struct evtchn *chn;
+
+    /* We rely on our caller to ensure that nobody's trying to tear
+       the channel down from inside Xen while it's being signalled on.
+       That means that the only transition the channel could make is
+       from BOUND to UNBOUND or vice-versa.  Neither of those change
+       the dom field, so we can read it without taking a lock.  This
+       simplifies the lock ordering a bit. */
+    d = xchn->dom;
+    ASSERT(d);
+    if ( !get_domain(d) )
+        return;
+    spin_lock(&d->evtchn_lock);
+    spin_lock(&xen_event_channel_lock);
+    if ( xchn->state != XECS_UNBOUND ) {
+        BUG_ON(xchn->state != XECS_BOUND);
+        BUG_ON(d != xchn->dom);
+        chn = evtchn_from_port(d, xchn->dom_port);
+        if ( chn->state == ECS_XEN )
+            evtchn_set_pending(d->vcpu[chn->notify_vcpu_id],
+                               xchn->dom_port);
+    } else
+        printf("Send on unbound Xen event channel?\n");
+
+    spin_unlock(&d->evtchn_lock);
+    spin_unlock(&xen_event_channel_lock);
+}
 
 static int virq_is_global(int virq)
 {
@@ -134,6 +232,44 @@ static long evtchn_alloc_unbound(evtchn_
 }
 
 
+static long evtchn_bind_xen(struct domain *ld, int xen_port)
+{
+    long rc = 0;
+    struct evtchn *lchn;
+    struct xen_evtchn *rchn;
+    int lport;
+
+    if ( xen_port < 0 || xen_port >= NR_XEN_EVENT_CHANNELS )
+        return -EINVAL;
+
+    spin_lock(&ld->evtchn_lock);
+    spin_lock(&xen_event_channel_lock);
+
+    rchn = xen_event_channels + xen_port;
+    if ( rchn->state != XECS_UNBOUND || rchn->dom != ld )
+        ERROR_EXIT(-EINVAL);
+
+    if ( (lport = get_free_port(ld)) < 0 )
+        ERROR_EXIT(lport);
+    lchn = evtchn_from_port(ld, lport);
+    lchn->state = ECS_XEN;
+    lchn->u.xen_port = xen_port;
+
+    rchn->state = XECS_BOUND;
+    rchn->dom_port = lport;
+
+    /* Somewhat ugly hack to avoid lost wakeups if we've tried to
+       notify this port before anyone got around to binding it. */
+    evtchn_set_pending(ld->vcpu[lchn->notify_vcpu_id], lport);
+    rc = lport;
+
+ out:
+    spin_unlock(&xen_event_channel_lock);
+    spin_unlock(&ld->evtchn_lock);
+
+    return rc;
+}
+
 static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
 {
     struct evtchn *lchn, *rchn;
@@ -147,6 +283,15 @@ static long evtchn_bind_interdomain(evtc
 
     if ( rdom == DOMID_SELF )
         rdom = current->domain->domain_id;
+
+    if ( rdom == DOMID_XEN ) {
+        rc = evtchn_bind_xen(ld, rport);
+        if ( rc >= 0 ) {
+            bind->local_port = rc;
+            rc = 0;
+        }
+        return rc;
+    }
 
     if ( (rd = find_domain_by_id(rdom)) == NULL )
         return -ESRCH;
@@ -317,11 +462,12 @@ static long evtchn_bind_pirq(evtchn_bind
 
 static long __evtchn_close(struct domain *d1, int port1)
 {
-    struct domain *d2 = NULL;
-    struct vcpu   *v;
-    struct evtchn *chn1, *chn2;
-    int            port2;
-    long           rc = 0;
+    struct domain     *d2 = NULL;
+    struct vcpu       *v;
+    struct evtchn     *chn1, *chn2;
+    int                port2;
+    long               rc = 0;
+    struct xen_evtchn *xchn;
 
  again:
     spin_lock(&d1->evtchn_lock);
@@ -409,6 +555,19 @@ static long __evtchn_close(struct domain
         chn2->u.unbound.remote_domid = d1->domain_id;
         break;
 
+    case ECS_XEN:
+        spin_lock(&xen_event_channel_lock);
+        xchn = xen_event_channels + chn1->u.xen_port;
+        BUG_ON(xchn->dom != d1);
+        if ( xchn->state == XECS_HBOUND )
+            xchn->state = XECS_FREE;
+        else if (xchn->state == XECS_BOUND)
+            xchn->state = XECS_UNBOUND;
+        else
+            BUG();
+        spin_unlock(&xen_event_channel_lock);
+        break;
+
     default:
         BUG();
     }
@@ -442,6 +601,7 @@ long evtchn_send(unsigned int lport)
     struct evtchn *lchn, *rchn;
     struct domain *ld = current->domain, *rd;
     int            rport, ret = 0;
+    struct xen_evtchn *xchn;
 
     spin_lock(&ld->evtchn_lock);
 
@@ -465,6 +625,16 @@ long evtchn_send(unsigned int lport)
         break;
     case ECS_UNBOUND:
         /* silently drop the notification */
+        break;
+    case ECS_XEN:
+        xchn = xen_event_channels + lchn->u.xen_port;
+        spin_lock(&xen_event_channel_lock);
+        if ( xchn->state != XECS_HBOUND )
+        {
+            BUG_ON(xchn->state != XECS_BOUND);
+            xchn->fire(xchn->data);
+        }
+        spin_unlock(&xen_event_channel_lock);
         break;
     default:
         ret = -EINVAL;
@@ -596,6 +766,11 @@ static long evtchn_status(evtchn_status_
             chn->u.interdomain.remote_dom->domain_id;
         status->u.interdomain.port = chn->u.interdomain.remote_port;
         break;
+    case ECS_XEN:
+        status->status = EVTCHNSTAT_interdomain;
+        status->u.interdomain.dom = DOMID_XEN;
+        status->u.interdomain.port = chn->u.xen_port;
+        break;
     case ECS_PIRQ:
         status->status = EVTCHNSTAT_pirq;
         status->u.pirq = chn->u.pirq;
@@ -649,6 +824,7 @@ long evtchn_bind_vcpu(unsigned int port,
     case ECS_UNBOUND:
     case ECS_INTERDOMAIN:
     case ECS_PIRQ:
+    case ECS_XEN:
         chn->notify_vcpu_id = vcpu_id;
         break;
     default:
diff -r ecb8ff1fcf1f xen/common/memory.c
--- a/xen/common/memory.c	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/common/memory.c	Tue Jul 18 13:43:27 2006 +0100
@@ -158,6 +158,9 @@ guest_remove_page(
     }
             
     page = mfn_to_page(mfn);
+    if ( IS_XEN_HEAP_FRAME(page) )
+        return 0;
+
     if ( unlikely(!get_page(page, d)) )
     {
         DPRINTK("Bad page free for domain %u\n", d->domain_id);
diff -r ecb8ff1fcf1f xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/domain.h	Tue Jul 18 13:43:27 2006 +0100
@@ -55,7 +55,7 @@ extern void toggle_guest_mode(struct vcp
  * Initialise a hypercall-transfer page. The given pointer must be mapped
  * in Xen virtual address space (accesses are not validated or checked).
  */
-extern void hypercall_page_initialise(void *);
+extern void hypercall_page_initialise(struct domain *d, void *);
 
 struct arch_domain
 {
diff -r ecb8ff1fcf1f xen/include/asm-x86/guest_access.h
--- a/xen/include/asm-x86/guest_access.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/guest_access.h	Tue Jul 18 13:43:27 2006 +0100
@@ -8,6 +8,8 @@
 #define __ASM_X86_GUEST_ACCESS_H__
 
 #include <asm/uaccess.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/guest_access.h>
 
 /* Is the guest handle a NULL reference? */
 #define guest_handle_is_null(hnd)        ((hnd).p == NULL)
@@ -28,6 +30,8 @@
 #define copy_to_guest_offset(hnd, off, ptr, nr) ({      \
     const typeof(ptr) _x = (hnd).p;                     \
     const typeof(ptr) _y = (ptr);                       \
+    hvm_guest(current) ?                                \
+    copy_to_user_hvm(_x+(off), _y, sizeof(*_x)*(nr)) :  \
     copy_to_user(_x+(off), _y, sizeof(*_x)*(nr));       \
 })
 
@@ -38,6 +42,8 @@
 #define copy_from_guest_offset(ptr, hnd, off, nr) ({    \
     const typeof(ptr) _x = (hnd).p;                     \
     const typeof(ptr) _y = (ptr);                       \
+    hvm_guest(current) ?                                \
+    copy_from_user_hvm(_y, _x+(off), sizeof(*_x)*(nr)) :\
     copy_from_user(_y, _x+(off), sizeof(*_x)*(nr));     \
 })
 
@@ -45,6 +51,8 @@
 #define copy_field_to_guest(hnd, ptr, field) ({         \
     const typeof(&(ptr)->field) _x = &(hnd).p->field;   \
     const typeof(&(ptr)->field) _y = &(ptr)->field;     \
+    hvm_guest(current) ?                                \
+    copy_to_user_hvm(_x, _y, sizeof(*_x)) :             \
     copy_to_user(_x, _y, sizeof(*_x));                  \
 })
 
@@ -52,6 +60,8 @@
 #define copy_field_from_guest(ptr, hnd, field) ({       \
     const typeof(&(ptr)->field) _x = &(hnd).p->field;   \
     const typeof(&(ptr)->field) _y = &(ptr)->field;     \
+    hvm_guest(current) ?                                \
+    copy_from_user_hvm(_y, _x, sizeof(*_x)) :           \
     copy_from_user(_y, _x, sizeof(*_x));                \
 })
 
@@ -60,29 +70,37 @@
  * Allows use of faster __copy_* functions.
  */
 #define guest_handle_okay(hnd, nr)                      \
-    array_access_ok((hnd).p, (nr), sizeof(*(hnd).p))
+    (hvm_guest(current) || array_access_ok((hnd).p, (nr), sizeof(*(hnd).p)))
 
 #define __copy_to_guest_offset(hnd, off, ptr, nr) ({    \
     const typeof(ptr) _x = (hnd).p;                     \
     const typeof(ptr) _y = (ptr);                       \
+    hvm_guest(current) ?                                \
+    copy_to_user_hvm(_x+(off), _y, sizeof(*_x)*(nr)) :  \
     __copy_to_user(_x+(off), _y, sizeof(*_x)*(nr));     \
 })
 
 #define __copy_from_guest_offset(ptr, hnd, off, nr) ({  \
     const typeof(ptr) _x = (hnd).p;                     \
     const typeof(ptr) _y = (ptr);                       \
+    hvm_guest(current) ?                                \
+    copy_from_user_hvm(_y, _x+(off),sizeof(*_x)*(nr)) : \
     __copy_from_user(_y, _x+(off), sizeof(*_x)*(nr));   \
 })
 
 #define __copy_field_to_guest(hnd, ptr, field) ({       \
     const typeof(&(ptr)->field) _x = &(hnd).p->field;   \
     const typeof(&(ptr)->field) _y = &(ptr)->field;     \
+    hvm_guest(current) ?                                \
+    copy_to_user_hvm(_x, _y, sizeof(*_x)) :             \
     __copy_to_user(_x, _y, sizeof(*_x));                \
 })
 
 #define __copy_field_from_guest(ptr, hnd, field) ({     \
     const typeof(&(ptr)->field) _x = &(hnd).p->field;   \
     const typeof(&(ptr)->field) _y = &(ptr)->field;     \
+    hvm_guest(current) ?                                \
+    copy_from_user_hvm(_y, _x, sizeof(*_x)) :           \
     __copy_from_user(_y, _x, sizeof(*_x));              \
 })
 
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/domain.h
--- a/xen/include/asm-x86/hvm/domain.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/domain.h	Tue Jul 18 13:43:27 2006 +0100
@@ -27,17 +27,15 @@
 #include <asm/hvm/vpit.h>
 #include <asm/hvm/vlapic.h>
 #include <asm/hvm/vioapic.h>
+#include <public/hvm/params.h>
 
 #define HVM_PBUF_SIZE   80
 
 struct hvm_domain {
     unsigned long          shared_page_va;
-    unsigned int           nr_vcpus;
-    unsigned int           apic_enabled;
-    unsigned int           pae_enabled;
     s64                    tsc_frequency;
     struct pl_time         pl_time;
-    
+
     struct hvm_virpic      vpic;
     struct hvm_vioapic     vioapic;
     struct hvm_io_handler  io_handler;
@@ -48,6 +46,8 @@ struct hvm_domain {
 
     int                    pbuf_index;
     char                   pbuf[HVM_PBUF_SIZE];
+
+    unsigned long          params[HVM_NR_PARAMS];
 };
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/hvm.h
--- a/xen/include/asm-x86/hvm/hvm.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/hvm.h	Tue Jul 18 13:43:27 2006 +0100
@@ -61,6 +61,8 @@ struct hvm_function_table {
 
     void (*init_ap_context)(struct vcpu_guest_context *ctxt,
                             int vcpuid, int trampoline_vector);
+
+    void (*init_hypercall_page)(struct domain *d, void *hypercall_page);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -75,12 +77,20 @@ hvm_disable(void)
         hvm_funcs.disable();
 }
 
+void hvm_create_event_channels(struct vcpu *v);
+void hvm_map_io_shared_page(struct vcpu *v);
+
 static inline int
 hvm_initialize_guest_resources(struct vcpu *v)
 {
-    if ( hvm_funcs.initialize_guest_resources )
-        return hvm_funcs.initialize_guest_resources(v);
-    return 0;
+    int ret = 1;
+    if (hvm_funcs.initialize_guest_resources)
+	ret = hvm_funcs.initialize_guest_resources(v);
+    if (ret == 1) {
+	hvm_map_io_shared_page(v);
+	hvm_create_event_channels(v);
+    }
+    return ret;
 }
 
 static inline void
@@ -121,6 +131,9 @@ hvm_instruction_length(struct vcpu *v)
     return hvm_funcs.instruction_length(v);
 }
 
+void hvm_hypercall_page_initialise(struct domain *d,
+                                   void *hypercall_page);
+
 static inline unsigned long
 hvm_get_guest_ctrl_reg(struct vcpu *v, unsigned int num)
 {
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/io.h
--- a/xen/include/asm-x86/hvm/io.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/io.h	Tue Jul 18 13:43:27 2006 +0100
@@ -150,14 +150,14 @@ static inline int irq_masked(unsigned lo
 #endif
 
 extern void handle_mmio(unsigned long, unsigned long);
-extern void hvm_wait_io(void);
-extern void hvm_safe_block(void);
 extern void hvm_io_assist(struct vcpu *v);
 extern void pic_irq_request(void *data, int level);
 extern void hvm_pic_assist(struct vcpu *v);
 extern int cpu_get_interrupt(struct vcpu *v, int *type);
 extern int cpu_has_pending_irq(struct vcpu *v);
 
+void hvm_release_assist_channel(struct vcpu *v);
+
 // XXX - think about this, maybe use bit 30 of the mfn to signify an MMIO frame.
 #define mmio_space(gpa) (!VALID_MFN(get_mfn_from_gpfn((gpa) >> PAGE_SHIFT)))
 
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/support.h
--- a/xen/include/asm-x86/hvm/support.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/support.h	Tue Jul 18 13:43:27 2006 +0100
@@ -42,11 +42,6 @@ static inline vcpu_iodata_t *get_vio(str
 static inline vcpu_iodata_t *get_vio(struct domain *d, unsigned long cpu)
 {
     return &get_sp(d)->vcpu_iodata[cpu];
-}
-
-static inline int iopacket_port(struct vcpu *v)
-{
-    return get_vio(v->domain, v->vcpu_id)->vp_eport;
 }
 
 /* XXX these are really VMX specific */
@@ -148,4 +143,9 @@ extern void hvm_print_line(struct vcpu *
 extern void hvm_print_line(struct vcpu *v, const char c);
 extern void hlt_timer_fn(void *data);
 
+void hvm_prod_vcpu(struct vcpu *v);
+void hvm_assist_complete(struct vcpu *v);
+
+void hvm_do_hypercall(struct cpu_user_regs *pregs);
+
 #endif /* __ASM_X86_HVM_SUPPORT_H__ */
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/svm/vmmcall.h
--- a/xen/include/asm-x86/hvm/svm/vmmcall.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/svm/vmmcall.h	Tue Jul 18 13:43:27 2006 +0100
@@ -23,11 +23,11 @@
 #define __ASM_X86_HVM_SVM_VMMCALL_H__
 
 /* VMMCALL command fields */
-#define VMMCALL_CODE_CPL_MASK     0xC0000000
-#define VMMCALL_CODE_MBZ_MASK     0x3FFF0000
+#define VMMCALL_CODE_CPL_MASK     0x60000000
+#define VMMCALL_CODE_MBZ_MASK     0x1FFF0000
 #define VMMCALL_CODE_COMMAND_MASK 0x0000FFFF
 
-#define MAKE_VMMCALL_CODE(cpl,func) ((cpl << 30) | (func))
+#define MAKE_VMMCALL_CODE(cpl,func) ((cpl << 29) | (func) | 0x80000000)
 
 /* CPL=0 VMMCALL Requests */
 #define VMMCALL_RESET_TO_REALMODE   MAKE_VMMCALL_CODE(0,1)
@@ -38,7 +38,7 @@
 /* return the cpl required for the vmmcall cmd */
 static inline int get_vmmcall_cpl(int cmd)
 {
-    return (cmd & VMMCALL_CODE_CPL_MASK) >> 30;
+    return (cmd & VMMCALL_CODE_CPL_MASK) >> 29;
 }
 
 #endif /* __ASM_X86_HVM_SVM_VMMCALL_H__ */
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/vcpu.h
--- a/xen/include/asm-x86/hvm/vcpu.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/vcpu.h	Tue Jul 18 13:43:27 2006 +0100
@@ -38,6 +38,8 @@ struct hvm_vcpu {
     /* For AP startup */
     unsigned long       init_sipi_sipi_state;
 
+    int                 xen_port;
+
     /* Flags */
     int                 flag_dr_dirty;
 
diff -r ecb8ff1fcf1f xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/shadow.h	Tue Jul 18 13:43:27 2006 +0100
@@ -1733,6 +1733,32 @@ static inline unsigned long gva_to_gpa(u
 
     return l1e_get_paddr(gpte) + (gva & ~PAGE_MASK); 
 }
+
+static inline unsigned long gva_to_mfn(unsigned long gva)
+{
+    l1_pgentry_t l1e;
+
+    if (__copy_from_user(&l1e, &shadow_linear_pg_table[l1_linear_offset(gva)],
+                         sizeof(l1e)) ||
+        (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
+         (_PAGE_PRESENT | _PAGE_RW) ) {
+        struct cpu_user_regs cur;
+        /* Error code -> write */
+        cur.error_code = 3;
+        cur.cs = 0; /* Ring 0 -> hypervisor */
+        cur.eflags = 0;
+        shadow_fault(gva, &cur);
+        if (__copy_from_user(&l1e,
+                             &shadow_linear_pg_table[l1_linear_offset(gva)],
+                             sizeof(l1e)) ||
+            (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
+             (_PAGE_PRESENT | _PAGE_RW) ) {
+            return 0;
+        }
+    }
+    return l1e_get_pfn(l1e);
+}
+
 #endif
 /************************************************************************/
 
diff -r ecb8ff1fcf1f xen/include/public/hvm/ioreq.h
--- a/xen/include/public/hvm/ioreq.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/public/hvm/ioreq.h	Tue Jul 18 13:43:27 2006 +0100
@@ -27,7 +27,6 @@
 #define STATE_IOREQ_READY       1
 #define STATE_IOREQ_INPROCESS   2
 #define STATE_IORESP_READY      3
-#define STATE_IORESP_HOOK       4
 
 #define IOREQ_TYPE_PIO          0 /* pio */
 #define IOREQ_TYPE_COPY         1 /* mmio ops */
@@ -67,10 +66,8 @@ typedef struct global_iodata global_ioda
 typedef struct global_iodata global_iodata_t;
 
 struct vcpu_iodata {
-    struct ioreq         vp_ioreq;
-    /* Event channel port */
-    unsigned int    vp_eport;   /* VMX vcpu uses this to notify DM */
-    unsigned int    dm_eport;   /* DM uses this to notify VMX vcpu */
+    ioreq_t         vp_ioreq;
+    int             vp_xen_port;
 };
 typedef struct vcpu_iodata vcpu_iodata_t;
 
diff -r ecb8ff1fcf1f xen/include/public/xen.h
--- a/xen/include/public/xen.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/public/xen.h	Tue Jul 18 13:43:27 2006 +0100
@@ -66,6 +66,7 @@
 #define __HYPERVISOR_xenoprof_op          31
 #define __HYPERVISOR_event_channel_op     32
 #define __HYPERVISOR_physdev_op           33
+#define __HYPERVISOR_hvm_op               34
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff -r ecb8ff1fcf1f xen/include/xen/event.h
--- a/xen/include/xen/event.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/xen/event.h	Tue Jul 18 13:43:27 2006 +0100
@@ -44,4 +44,10 @@ extern long evtchn_send(unsigned int lpo
 /* Bind a local event-channel port to the specified VCPU. */
 extern long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id);
 
+int alloc_xen_event_channel(void (*f)(void *d),
+                            void *data,
+                            struct domain *d);
+void release_xen_event_channel(int ind);
+void notify_xen_event_channel(int port);
+
 #endif /* __XEN_EVENT_H__ */
diff -r ecb8ff1fcf1f xen/include/xen/hypercall.h
--- a/xen/include/xen/hypercall.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/xen/hypercall.h	Tue Jul 18 13:43:27 2006 +0100
@@ -87,4 +87,9 @@ do_nmi_op(
     unsigned int cmd,
     XEN_GUEST_HANDLE(void) arg);
 
+extern long
+do_hvm_op(
+    unsigned long op,
+    XEN_GUEST_HANDLE(void) arg);
+
 #endif /* __XEN_HYPERCALL_H__ */
diff -r ecb8ff1fcf1f xen/include/xen/sched.h
--- a/xen/include/xen/sched.h	Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/xen/sched.h	Tue Jul 18 13:43:27 2006 +0100
@@ -36,6 +36,7 @@ struct evtchn
 #define ECS_PIRQ         4 /* Channel is bound to a physical IRQ line.       */
 #define ECS_VIRQ         5 /* Channel is bound to a virtual IRQ line.        */
 #define ECS_IPI          6 /* Channel is bound to a virtual IPI line.        */
+#define ECS_XEN          7 /* Channel ends in Xen                            */
     u16 state;             /* ECS_* */
     u16 notify_vcpu_id;    /* VCPU for local delivery notification */
     union {
@@ -48,6 +49,7 @@ struct evtchn
         } interdomain; /* state == ECS_INTERDOMAIN */
         u16 pirq;      /* state == ECS_PIRQ */
         u16 virq;      /* state == ECS_VIRQ */
+        int xen_port;  /* state == ECS_XEN */
     } u;
 };
 
diff -r ecb8ff1fcf1f tools/ioemu/hw/xen_evtchn.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/ioemu/hw/xen_evtchn.c	Tue Jul 18 13:43:27 2006 +0100
@@ -0,0 +1,160 @@
+/*
+ * XEN event channel fake pci device
+ * 
+ * Copyright (c) 2003-2004 Intel Corp.
+ * Copyright (c) 2006 XenSource
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "vl.h"
+
+#include <xenguest.h>
+#include <xc_private.h>
+
+extern FILE *logfile;
+
+extern int domid;
+extern int xc_handle;
+
+static unsigned ioport_base;
+
+static void evtchn_ioport_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    DECLARE_DOM0_OP;
+    int rc;
+
+    switch (addr - ioport_base) {
+    case 0:
+	fprintf(logfile, "Init hypercall page %x, addr %x.\n", val, addr);
+	op.u.hypercall_init.domain = domid;
+	op.u.hypercall_init.gmfn = val;
+	op.cmd = DOM0_HYPERCALL_INIT;
+	rc = xc_dom0_op(xc_handle, &op);
+	fprintf(logfile, "result -> %d.\n", rc);
+	break;
+    default:
+	fprintf(logfile, "Write to bad port %x (base %x) on evtchn device.\n",
+		addr, ioport_base);
+	break;
+    }
+}
+
+static uint32_t evtchn_ioport_read(void *opaque, uint32_t addr)
+{
+    return 0;
+}
+
+static void evtchn_map(PCIDevice *pci_dev, int region_num,
+                       uint32_t addr, uint32_t size, int type)
+{
+    ioport_base = addr;
+    register_ioport_write(addr, 16, 4, evtchn_ioport_write, NULL);
+    register_ioport_read(addr, 16, 1, evtchn_ioport_read, NULL);
+}
+
+static uint32_t xen_mmio_read(void *opaque, target_phys_addr_t addr)
+{
+    fprintf(logfile, "Warning: try read from evtchn mmio space\n");
+    return 0;
+}
+
+static void xen_mmio_write(void *opaque, target_phys_addr_t addr,
+			       uint32_t val)
+{
+    fprintf(logfile, "Warning: try write to evtchn mmio space\n");
+    return;
+}
+
+static CPUReadMemoryFunc *xen_evtchn_mmio_read[3] = {
+    xen_mmio_read,
+    xen_mmio_read,
+    xen_mmio_read,
+};
+
+static CPUWriteMemoryFunc *xen_evtchn_mmio_write[3] = {
+    xen_mmio_write,
+    xen_mmio_write,
+    xen_mmio_write,
+};
+
+static void xen_evtchn_pci_mmio_map(PCIDevice *d, int region_num,
+				uint32_t addr, uint32_t size, int type)
+{
+    int mmio_io_addr;
+
+    mmio_io_addr = cpu_register_io_memory(0,
+                        xen_evtchn_mmio_read,
+                        xen_evtchn_mmio_write, NULL);
+
+    cpu_register_physical_memory(addr, 0x1000000, mmio_io_addr);
+}
+
+struct pci_config_header {
+    unsigned short vendor_id;
+    unsigned short device_id;
+    unsigned short command;
+    unsigned short status;
+    unsigned char revision;
+    unsigned char api;
+    unsigned char subclass;
+    unsigned char class;
+    unsigned char cache_line_size; /* Units of 32 bit words */
+    unsigned char latency_timer; /* In units of bus cycles */
+    unsigned char header_type; /* Should be 0 */
+    unsigned char bist; /* Built in self test */
+    unsigned long base_address_regs[6];
+    unsigned long reserved1;
+    unsigned long reserved2;
+    unsigned long rom_addr;
+    unsigned long reserved3;
+    unsigned long reserved4;
+    unsigned char interrupt_line;
+    unsigned char interrupt_pin;
+    unsigned char min_gnt;
+    unsigned char max_lat;
+};
+
+void pci_xen_evtchn_init(PCIBus *bus)
+{
+    PCIDevice *d;
+    struct pci_config_header *pch;
+
+    printf("Register xen evtchn.\n");
+    d = pci_register_device(bus, "xen-evtchn", sizeof(PCIDevice), -1, NULL,
+			    NULL);
+    pch = (struct pci_config_header *)d->config;
+    pch->vendor_id = 0xfffd;
+    pch->device_id = 0x0101;
+    pch->command = 3; /* IO and memory access */
+    pch->revision = 0;
+    pch->api = 0;
+    pch->subclass = 0x80; /* Other */
+    pch->class = 0xff; /* Unclassified device class */
+    pch->header_type = 0;
+    pch->interrupt_pin = 1;
+
+    pci_register_io_region(d, 0, 0x100, PCI_ADDRESS_SPACE_IO, evtchn_map);
+
+    /* Reserve 16MB of MMIO address space for shared memory. */
+    pci_register_io_region(d, 1, 0x1000000, PCI_ADDRESS_SPACE_MEM_PREFETCH,
+			   xen_evtchn_pci_mmio_map);
+
+    register_savevm("evtchn", 0, 1, generic_pci_save, generic_pci_load, d);
+    printf("Done register evtchn.\n");
+}
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/guest_access.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/include/asm-x86/hvm/guest_access.h	Tue Jul 18 13:43:27 2006 +0100
@@ -0,0 +1,7 @@
+#ifndef __ASM_X86_HVM_GUEST_ACCESS_H__
+#define __ASM_X86_HVM_GUEST_ACCESS_H__
+
+unsigned long copy_to_user_hvm(void *to, const void *from, unsigned len);
+unsigned long copy_from_user_hvm(void *to, const void *from, unsigned len);
+
+#endif /* __ASM_X86_HVM_GUEST_ACCESS_H__ */
diff -r ecb8ff1fcf1f xen/include/public/hvm/params.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/include/public/hvm/params.h	Tue Jul 18 13:43:27 2006 +0100
@@ -0,0 +1,22 @@
+#ifndef PARAMS_H__
+#define PARAMS_H__
+
+#define HVM_NR_PARAMS 4
+
+#define HVM_PARAM_CALLBACK_IRQ 0
+#define HVM_PARAM_STORE_PFN    1
+#define HVM_PARAM_STORE_EVTCHN 2
+#define HVM_PARAM_APIC_ENABLED 3
+
+#define HVMOP_set_param 0
+#define HVMOP_get_param 1
+
+struct xen_hvm_param {
+    domid_t domid;
+    unsigned index;
+    unsigned long value;
+};
+typedef struct xen_hvm_param xen_hvm_param_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_param_t);
+
+#endif /* PARAMS_H__ */
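
The lifecycle of the Xen-terminated event channels added in
xen/common/event_channel.c above can be modelled in a few lines of
standalone C. This is only a rough sketch and not part of the patch: the
XECS_* state names are taken from the patch, but the model_* functions and
NR_MODEL_CHANNELS are hypothetical names made up for illustration, and
locking, the fire callback, the owning-domain check, and the guest-side
port bookkeeping are all left out. Where the patch would panic() or BUG(),
the model simply returns -1.

```c
#include <assert.h>

/* State names as in the patch; the rest of this file is a simplified model. */
enum xecs_state {
    XECS_FREE = 0,     /* Not in use at all */
    XECS_UNBOUND = 1,  /* Allocated by Xen, no guest port bound yet */
    XECS_BOUND = 2,    /* A guest port is bound to it */
    XECS_HBOUND = 3    /* Xen has released it, guest is still attached */
};

#define NR_MODEL_CHANNELS 32   /* hypothetical, mirrors NR_XEN_EVENT_CHANNELS */

/* Static storage is zero-initialised, so every slot starts as XECS_FREE. */
static enum xecs_state model_channels[NR_MODEL_CHANNELS];

static int model_valid(int ind)
{
    return ind >= 0 && ind < NR_MODEL_CHANNELS;
}

/* Xen side: claim the first free slot (cf. alloc_xen_event_channel). */
int model_alloc(void)
{
    int ind;

    for (ind = 0; ind < NR_MODEL_CHANNELS; ind++) {
        if (model_channels[ind] == XECS_FREE) {
            model_channels[ind] = XECS_UNBOUND;
            return ind;
        }
    }
    return -1; /* out of channels */
}

/* Guest side: bind a local port to the channel (cf. evtchn_bind_xen). */
int model_bind(int ind)
{
    if (!model_valid(ind) || model_channels[ind] != XECS_UNBOUND)
        return -1;
    model_channels[ind] = XECS_BOUND;
    return 0;
}

/* Guest side: close the local port (cf. the ECS_XEN case in __evtchn_close). */
int model_close(int ind)
{
    if (!model_valid(ind))
        return -1;
    switch (model_channels[ind]) {
    case XECS_BOUND:   /* Xen still owns the channel; it can be re-bound. */
        model_channels[ind] = XECS_UNBOUND;
        return 0;
    case XECS_HBOUND:  /* Xen already let go, so the slot becomes free. */
        model_channels[ind] = XECS_FREE;
        return 0;
    default:
        return -1;
    }
}

/* Xen side: give the channel up (cf. release_xen_event_channel). */
int model_release(int ind)
{
    if (!model_valid(ind))
        return -1;
    switch (model_channels[ind]) {
    case XECS_UNBOUND: /* Nobody attached: free immediately. */
        model_channels[ind] = XECS_FREE;
        return 0;
    case XECS_BOUND:   /* Guest still attached: wait for its close. */
        model_channels[ind] = XECS_HBOUND;
        return 0;
    default:           /* Double free or never allocated. */
        return -1;
    }
}

/* Inspect a slot; the index must be in range. */
enum xecs_state model_state(int ind)
{
    assert(model_valid(ind));
    return model_channels[ind];
}
```

In this model, whichever side lets go last is the one that returns the
slot to XECS_FREE: a release while the guest is attached parks the channel
in XECS_HBOUND until the guest closes its port, and a close while Xen
still owns the channel drops it back to XECS_UNBOUND.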

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: Paravirtualised drivers for fully virtualised domains
  2006-07-18 12:51 Paravirtualised drivers for fully virtualised domains Steven Smith
@ 2006-07-18 13:45 ` Ben Thomas
  2006-07-18 16:00 ` Steve Ofsthun
  2006-07-26 15:34 ` Steven Smith
  2 siblings, 0 replies; 22+ messages in thread
From: Ben Thomas @ 2006-07-18 13:45 UTC (permalink / raw)
  To: Steven Smith; +Cc: xen-devel, sos22


[-- Attachment #1.1: Type: text/plain, Size: 5131 bytes --]

Steven,

This is very interesting. Thanks for posting it. It appears to closely
parallel work that we've been doing here and have mentioned at times on this
list. I believe that Steve Ofsthun posted some suggested patches in this
area a little while back. I'm looking forward to seeing how your patches
address the issues raised in the feedback given to Steve.

I'm also interested to see how you resolve the 32/64-bit issues. I know that
there's some sensitivity here, as one piece of feedback on our recent XI
shadow posting was its current lack of 32-bit support. A 64-bit hypervisor
on an HVM-capable machine should be able to support 32-bit and 64-bit guest
domains concurrently.

Thanks for posting this. We're looking forward to seeing your final
submissions.

Thanks!
-b


On 7/18/06, Steven Smith <sos22-xen@srcf.ucam.org> wrote:
>
> (The list appears to have eaten my previous attempt to send this.
> Apologies if you receive multiple copies.)
>
> The attached patches allow you to use paravirtualised network and
> block interfaces from fully virtualised domains, based on Intel's
> patches from a few months ago.  These are significantly faster than
> the equivalent ioemu devices, sometimes by more than an order of
> magnitude.
>
> These drivers are explicitly not considered by XenSource to be an
> alternative to improving the performance of the ioemu devices.
> Rather, work on both will continue in parallel.
>
> To build, apply the three patches to a clean checkout of xen-unstable
> and then build Xen, dom0, and the tools in the usual way.  To build
> the drivers themselves, you first need to build a native kernel for
> the guest, and then go
>
> cd xen-unstable.hg/unmodified-drivers/linux-2.6
> ./mkbuildtree
> make -C /usr/src/linux-2.6.16 M=$PWD modules
>
> where /usr/src/linux-2.6.16 is the path to the area where you built
> the guest kernel.  This should be a native kernel, and not a xenolinux
> one.  You should end up with four modules.  xen-evtchn.ko should be
> loaded first, followed by xenbus.ko, and then whichever of xen-vnif.ko
> and xen-vbd.ko you need.  None of the modules need any arguments.
>
> The xm configuration syntax is exactly the same as it would be for
> paravirtualised devices in a paravirtualised domain.  For a network
> interface, you take your line
>
> vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78' ]
>
> (or whatever) and replace it with
>
> vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78', 'bridge=xenbr0' ]
>
> where bridge=xenbr0 should be some suitable netif configuration
> string, as it would be in the PV-on-PV case.  Disk is likewise fairly
> simple:
>
> disk = [ 'file:/path/to/image,ioemu:hda,w' ]
>
> becomes
>
> disk = [ 'file:/path/to/image,ioemu:hda,w',
> 'file:/path/to/some/other/image,hde,w' ]
>
> There is a slight complication in that the paravirtualised block
> device can't share an IDE controller with an ioemu device, so if you
> have an ioemu hda, the paravirtualised device must be hde or later.
> This is to avoid confusing the Linux IDE driver.
>
> Note that having a PV device doesn't imply having a corresponding
> ioemu device, and vice versa.  Configuring a single backing store to
> appear as both an IDE device and a paravirtualised block device is
> likely to cause problems; don't do it.
>
>
>
> The patches consist of a number of big parts:
>
> -- A version of netback and netfront which can copy packets into
>    domains rather than doing page flipping.  It's much easier to make
>    this work well with qemu, since the P2M table doesn't need to
>    change, and it can be faster for some workloads.
>
>    The copying interface has been confirmed to work in paravirtualised
>    domains, but is currently disabled there.
>
> -- Reworking the device model and hypervisor support so that iorequest
>    completion notifications no longer go to the HVM guest's event
>    channel mask.  This avoids a whole slew of really quite nasty race
>    conditions
>
> -- Adding a new device to the qemu PCI bus which is used for
>    bootstrapping the devices and getting an IRQ.
>
> -- Support for hypercalls from HVM domains
>
> -- Various shims and fixes to the frontends so that they work without
>    the rest of the xenolinux infrastructure.
>
> The patches still have a few rough edges, and they're not as easy to
> understand as I'd like, but I think they should be mostly
> comprehensible and reasonably stable.  The plan is to add them to
> xen-unstable over the next few weeks, probably before 3.0.3, so any
> testing which anyone can do would be helpful.
>
> The Xen and tools changes are also available as a series of smaller
> patches at http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/hvm_xen .  The
> composition of these gives hvm_xen_unstable.diff.
>
> Steven.

[-- Attachment #1.2: Type: text/html, Size: 6010 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains
  2006-07-18 12:51 Paravirtualised drivers for fully virtualised domains Steven Smith
  2006-07-18 13:45 ` Ben Thomas
@ 2006-07-18 16:00 ` Steve Ofsthun
  2006-07-18 16:23   ` Mark Williamson
  2006-07-18 20:34   ` Steven Smith
  2006-07-26 15:34 ` Steven Smith
  2 siblings, 2 replies; 22+ messages in thread
From: Steve Ofsthun @ 2006-07-18 16:00 UTC (permalink / raw)
  To: Steven Smith; +Cc: xen-devel, sos22

Steven Smith wrote:

> The attached patches allow you to use paravirtualised network and
> block interfaces from fully virtualised domains, based on Intel's
> patches from a few months ago.  These are significantly faster than
> the equivalent ioemu devices, sometimes by more than an order of
> magnitude.

Excellent work Steven!

I've been working on a similar set of patches and your effort seems
quite comprehensive.  I do have a few questions:

Can you comment on the testing matrix you used?  In particular, does
this patch address both 32-bit and 64-bit hypervisors?  Can 32-bit
guests make 64-bit hypercalls?

Have you built the guest environment on anything other than a 2.6.16
version of Linux?  We ran into extra work supporting older linux versions.

You did some work to make xenbus a loadable module in the guest domains.
Can this be used to make xenbus loadable in Domain 0?

> These drivers are explicitly not considered by XenSource to be an
> alternative to improving the performance of the ioemu devices.
> Rather, work on both will continue in parallel.

I agree.  Both activities are worth developing.

> There is a slight complication in that the paravirtualised block
> device can't share an IDE controller with an ioemu device, so if you
> have an ioemu hda, the paravirtualised device must be hde or later.
> This is to avoid confusing the Linux IDE driver.
> 
> Note that having a PV device doesn't imply having a corresponding
> ioemu device, and vice versa.  Configuring a single backing store to
> appear as both an IDE device and a paravirtualised block device is
> likely to cause problems; don't do it.

Several problems exist here:

Domain 0 buffer cache coherency issues can cause catastrophic file
system corruption.  This is due to the backend accessing the backing
device directly, and QEMU accessing the device through buffered reads
and writes.  We are working on a patch to convert QEMU to use O_DIRECT
whenever possible.  This solves the cache coherency issue.
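
A toy sketch of that failure mode, in Python, may make it more concrete.
It is only a model: BackingStore, DirectPath and BufferedPath are made-up
names, not anything in Xen or QEMU, and the dict simply stands in for the
domain 0 buffer cache that buffered reads go through.

```python
# Toy model only: none of these classes exist in Xen or QEMU.

class BackingStore:
    """Stands in for the raw backing device, addressed by sector."""
    def __init__(self, nr_sectors):
        self.sectors = [b"old"] * nr_sectors


class DirectPath:
    """Like the PV backend: every access goes straight to the device."""
    def __init__(self, store):
        self.store = store

    def read(self, sector):
        return self.store.sectors[sector]

    def write(self, sector, data):
        self.store.sectors[sector] = data


class BufferedPath:
    """Like buffered reads: served from a cache once a sector is loaded."""
    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, sector):
        if sector not in self.cache:
            self.cache[sector] = self.store.sectors[sector]
        return self.cache[sector]


def demo():
    store = BackingStore(8)
    pv = DirectPath(store)
    emulated = BufferedPath(store)

    emulated.read(0)           # cache now holds the old contents of sector 0
    pv.write(0, b"new")        # the direct path updates the device itself
    stale = emulated.read(0)   # served from the cache, so still the old data
    fresh = pv.read(0)
    return stale, fresh
```

In this model demo() hands back the cached copy and the on-device copy of
the same sector, and they differ.  If both paths bypassed the cache, as
O_DIRECT is meant to arrange, they would both behave like DirectPath.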

Actually presenting two copies of the same device to linux can cause
its own problems.  Mounting using LABEL= will complain about duplicate
labels.  However, using the device names directly seems to work.  With
this approach it is possible to decide in the guest whether to mount
a device as an emulated disk or a PV disk.

> The patches consist of a number of big parts:
> 
> -- A version of netback and netfront which can copy packets into
>    domains rather than doing page flipping.  It's much easier to make
>    this work well with qemu, since the P2M table doesn't need to
>    change, and it can be faster for some workloads.

Recent patches to change QEMU to dynamically map memory may make this
easier.  We still avoid it to prevent large guest pages from being
broken up (under the XI shadow code).
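
For anyone following along, the difference between the two delivery
methods can be sketched with a toy Python model.  The names here (p2m,
machine, qemu_view, and both functions) are invented for the example, and
it assumes a device model that set up its mappings of guest memory once,
when the guest started.

```python
# Toy model only.  p2m maps a guest pseudo-physical frame number (pfn) to a
# machine frame number (mfn); machine maps an mfn to the frame's contents.

def deliver_by_flipping(p2m, pfn, packet_mfn):
    """Hand the guest the machine frame that already holds the packet.

    The guest's P2M entry for pfn changes; the frame it used to point at
    is returned so the backend can reuse it.
    """
    old_mfn = p2m[pfn]
    p2m[pfn] = packet_mfn
    return old_mfn


def deliver_by_copying(p2m, machine, pfn, packet_mfn):
    """Copy the packet into the frame the guest already owns.

    The P2M table is not touched.
    """
    machine[p2m[pfn]] = machine[packet_mfn]


def demo():
    machine = {100: b"", 200: b"packet A", 201: b"packet B"}
    p2m = {0: 100}
    qemu_view = dict(p2m)          # mappings set up when the guest started

    deliver_by_copying(p2m, machine, 0, 200)
    copy_ok = qemu_view == p2m     # the old mapping still matches

    deliver_by_flipping(p2m, 0, 201)
    flip_ok = qemu_view == p2m     # the old mapping now points elsewhere
    return copy_ok, flip_ok
```

After the copy the static view is still valid; after the flip it is not,
which is the part that dynamic mapping would have to deal with.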

>    The copying interface has been confirmed to work in paravirtualised
>    domains, but is currently disabled there.
> 
> -- Reworking the device model and hypervisor support so that iorequest
>    completion notifications no longer go to the HVM guest's event
>    channel mask.  This avoids a whole slew of really quite nasty race
>    conditions

This is great news.  We were filtering iorequest bits out during guest
event notification delivery.  Your method is much cleaner.

> -- Adding a new device to the qemu PCI bus which is used for
>    bootstrapping the devices and getting an IRQ.

Have you thought about supporting more than one IRQ?  We are experimenting
with an IRQ per device class (BUS, NIC, VBD).

> -- Support for hypercalls from HVM domains
> 
> -- Various shims and fixes to the frontends so that they work without
>    the rest of the xenolinux infrastructure.
> 
> The patches still have a few rough edges, and they're not as easy to
> understand as I'd like, but I think they should be mostly
> comprehensible and reasonably stable.  The plan is to add them to
> xen-unstable over the next few weeks, probably before 3.0.3, so any
> testing which anyone can do would be helpful.

This is a very good start!

Steve
-- 
Steve Ofsthun - Virtual Iron Software, Inc.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains
  2006-07-18 16:00 ` Steve Ofsthun
@ 2006-07-18 16:23   ` Mark Williamson
  2006-07-18 20:34   ` Steven Smith
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Williamson @ 2006-07-18 16:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Steve Ofsthun, sos22

> > These drivers are explicitly not considered by XenSource to be an
> > alternative to improving the performance of the ioemu devices.
> > Rather, work on both will continue in parallel.
>
> I agree.  Both activities are worth developing.

There's lots of stuff still to be done to make the ioemu devices work better.
Even if some users wish to use PV drivers directly, others will still want the
simplicity of working "out of the box".

> Actually presenting two copies of the same device to linux can cause
> its own problems.  Mounting using LABEL= will complain about duplicate
> labels.  However, using the device names directly seems to work.  With
> this approach it is possible to decide in the guest whether to mount
> a device as an emulated disk or a PV disk.

We should *really* have interlocks in dom0 to prevent a guest from accessing 
both simultaneously :-)

Initially, we could just allow the user to configure a device as one model or
the other, not both (using a check in Xend, as we do for checking mounted
partitions, etc).  To support what you propose we'd probably have to add a
little control plane stuff, but I think it'd be worth it to avoid too many
people damaging stuff!
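
As a rough illustration of the kind of configuration check meant here,
the following Python sketch walks an xm-style disk list.  parse_disk and
check_disks are hypothetical helpers, not existing Xend functions, and
the "hde or later" rule is taken literally from Steven's description
rather than from the code.

```python
def parse_disk(entry):
    """Split an xm disk string 'backing,device,mode' into its parts."""
    backing, device, mode = entry.split(",")
    emulated = device.startswith("ioemu:")
    if emulated:
        device = device[len("ioemu:"):]
    return backing, device, emulated, mode


def check_disks(disk):
    """Return a list of human-readable problems found in a disk list."""
    problems = []
    models = {}            # backing store -> set of models it appears as
    have_ioemu_ide = False
    pv_devices = []

    for entry in disk:
        backing, device, emulated, _mode = parse_disk(entry)
        models.setdefault(backing, set()).add("ioemu" if emulated else "pv")
        if emulated and device.startswith("hd"):
            have_ioemu_ide = True
        if not emulated:
            pv_devices.append(device)

    # One backing store must not show up as both an emulated and a PV disk.
    for backing, seen in sorted(models.items()):
        if len(seen) > 1:
            problems.append("%s is configured as both ioemu and PV" % backing)

    # With an emulated IDE disk present, PV disks named hdX must be >= hde.
    if have_ioemu_ide:
        for device in pv_devices:
            if device.startswith("hd") and device[:3] < "hde":
                problems.append("PV device %s must be hde or later" % device)

    return problems
```

For example, the disk list from the original announcement (ioemu:hda plus
a PV hde on a different image) produces no complaints, while pointing an
ioemu hda and a PV hdb at the same image produces two.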

To mangle a quote I once saw online: duplicate device access can be used to 
hunt both foot and game, but only one will feed your family.

Cheers,
Mark

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains
  2006-07-18 16:00 ` Steve Ofsthun
  2006-07-18 16:23   ` Mark Williamson
@ 2006-07-18 20:34   ` Steven Smith
  2006-07-18 23:24     ` Steve Ofsthun
  1 sibling, 1 reply; 22+ messages in thread
From: Steven Smith @ 2006-07-18 20:34 UTC (permalink / raw)
  To: Steve Ofsthun; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 5367 bytes --]

> >The attached patches allow you to use paravirtualised network and
> >block interfaces from fully virtualised domains, based on Intel's
> >patches from a few months ago.  These are significantly faster than
> >the equivalent ioemu devices, sometimes by more than an order of
> >magnitude.
> I've been working on a similar set of patches and your effort seems
> quite comprehensive.
Yeah, we (XenSource and Virtual Iron) really need to do a better job
of coordinating who's working on what. :)

> I do have a few questions:
> 
> Can you comment on the testing matrix you used? In particular, does
> this patch address both 32-bit and 64-bit hypervisors?  Can 32-bit
> guests make 64-bit hypercalls?
This set of patches only deals with the 32 bit case.  Further, the PAE
case depends on Tim Deegan's new shadow mode posted last week.

Sorry, I should have said that in the initial post.

> Have you built the guest environment on anything other than a 2.6.16
> version of Linux?  We ran into extra work supporting older linux versions.
#ifdef soup will get you back to about 2.6.12-ish without too many
problems.  These patches don't include that, since it would complicate
merging.

> You did some work to make xenbus a loadable module in the guest domains.
> Can this be used to make xenbus loadable in Domain 0?
I can't see any immediate reason why not, but it's not clear to me why
that would be useful.

> >There is a slight complication in that the paravirtualised block
> >device can't share an IDE controller with an ioemu device, so if you
> >have an ioemu hda, the paravirtualised device must be hde or later.
> >This is to avoid confusing the Linux IDE driver.
> >
> >Note that having a PV device doesn't imply having a corresponding
> >ioemu device, and vice versa.  Configuring a single backing store to
> >appear as both an IDE device and a paravirtualised block device is
> >likely to cause problems; don't do it.
> Domain 0 buffer cache coherency issues can cause catastrophic file
> system corruption.  This is due to the backend accessing the backing
> device directly, and QEMU accessing the device through buffered
> reads and writes. We are working on a patch to convert QEMU to use
> O_DIRECT whenever possible.  This solves the cache coherency issue.
I wasn't aware of these issues.  I was much more worried about domU
trying to cache the devices twice, and those caches getting out of
sync.  It's pretty much the usual problem of configuring a device into
two domains and then having them trip over each other.  Do you have a
plan for dealing with this?

> Actually presenting two copies of the same device to linux can cause
> its own problems.  Mounting using LABEL= will complain about duplicate
> labels.  However, using the device names directly seems to work.  With
> this approach it is possible to decide in the guest whether to mount
> a device as an emulated disk or a PV disk.
My plan here was to just not support VMs which mix paravirtualised and
ioemulated devices, requiring the user to load the PV drivers from an
initrd.  Of course, you have to load the initrd somehow, but the
bootloader should only be reading the disk, which makes the coherency
issues much easier.  As a last resort, rombios could learn about the
PV devices, but I'd rather avoid that if possible.

Your way would be preferable, though, if it works.

> >The patches consist of a number of big parts:
> >
> >-- A version of netback and netfront which can copy packets into
> >   domains rather than doing page flipping.  It's much easier to make
> >   this work well with qemu, since the P2M table doesn't need to
> >   change, and it can be faster for some workloads.
> Recent patches to change QEMU to dynamically map memory may make this
> easier.
Yes, agreed.  It should be possible to add this in later in a
backwards-compatible fashion.

> >-- Reworking the device model and hypervisor support so that iorequest
> >   completion notifications no longer go to the HVM guest's event
> >   channel mask.  This avoids a whole slew of really quite nasty race
> >   conditions
> This is great news.  We were filtering iorequest bits out during guest
> event notification delivery.  Your method is much cleaner.
Thank you.

> >-- Adding a new device to the qemu PCI bus which is used for
> >   bootstrapping the devices and getting an IRQ.
> Have you thought about supporting more than one IRQ?  We are experimenting
> with an IRQ per device class (BUS, NIC, VBD).
I considered it, but it wasn't obvious that there would be much
benefit.  You can potentially scan a smaller part of the pending event
channel mask, but that's fairly quick already.
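
For illustration, the scan in question looks roughly like the following
Python sketch.  It assumes the usual two-level scheme (a selector word
whose bits point at words of the pending bitmap, filtered by the mask
bitmap); pending_ports, its arguments and BITS_PER_WORD are names made up
for this example, and the real code is C in the guest kernel.

```python
BITS_PER_WORD = 32   # assumption: a 32-bit guest


def pending_ports(selector, pending, mask):
    """Return the event channel ports that are pending and not masked.

    selector : int, bit i set means word i of `pending` needs a look
    pending  : list of ints, one bit per event channel port
    mask     : list of ints, a set bit means that port is masked
    """
    ports = []
    word_index = 0
    while selector:
        if selector & 1:
            # Only words flagged in the selector are examined at all.
            active = pending[word_index] & ~mask[word_index]
            bit = 0
            while active:
                if active & 1:
                    ports.append(word_index * BITS_PER_WORD + bit)
                active >>= 1
                bit += 1
        selector >>= 1
        word_index += 1
    return ports
```

Since the selector already skips words with nothing pending, splitting
the ports across several IRQs would only trim an already short loop.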

Steven.

> >-- Support for hypercalls from HVM domains
> >
> >-- Various shims and fixes to the frontends so that they work without
> >   the rest of the xenolinux infrastructure.
> >
> >The patches still have a few rough edges, and they're not as easy to
> >understand as I'd like, but I think they should be mostly
> >comprehensible and reasonably stable.  The plan is to add them to
> >xen-unstable over the next few weeks, probably before 3.0.3, so any
> >testing which anyone can do would be helpful.
> 
> This is a very good start!
> 
> Steve
> -- 
> Steve Ofsthun - Virtual Iron Software, Inc.

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains
  2006-07-18 20:34   ` Steven Smith
@ 2006-07-18 23:24     ` Steve Ofsthun
  2006-07-19  6:50       ` Gerd Hoffmann
  0 siblings, 1 reply; 22+ messages in thread
From: Steve Ofsthun @ 2006-07-18 23:24 UTC (permalink / raw)
  To: Steven Smith; +Cc: xen-devel

Steven Smith wrote:

>>Have you built the guest environment on anything other than a 2.6.16
>>version of Linux?  We ran into extra work supporting older linux versions.
> 
> #ifdef soup will get you back to about 2.6.12-ish without too many
> problems.  These patches don't include that, since it would complicate
> merging.

I was thinking about SLES9 (2.6.5), RHEL4 (2.6.9), RHEL3 (2.4.21).

>>You did some work to make xenbus a loadable module in the guest domains.
>>Can this be used to make xenbus loadable in Domain 0?
> 
> I can't see any immediate reason why not, but it's not clear to me why
> that would be useful.

It just makes it easier to insert alternate bus implementations.

>>Domain 0 buffer cache coherency issues can cause catastrophic file
>>system corruption.  This is due to the backend accessing the backing
>>device directly, and QEMU accessing the device through buffered
>>reads and writes. We are working on a patch to convert QEMU to use
>>O_DIRECT whenever possible.  This solves the cache coherency issue.
> 
> I wasn't aware of these issues.  I was much more worried about domU
> trying to cache the devices twice, and those caches getting out of
> sync.  It's pretty much the usual problem of configuring a device into
> two domains and then having them trip over each other.  Do you have a
> plan for dealing with this?

We eliminate any buffer cache use in domain 0 for backing store objects.
This prevents double caching and reduces domain 0's memory footprint.
We don't restrict multiple domain access to the same "raw" backing
object.  Real hardware allows this (at least for SCSI/FC).  This may be
necessary for shared storage clustering.

>>Actually presenting two copies of the same device to linux can cause
>>its own problems.  Mounting using LABEL= will complain about duplicate
>>labels.  However, using the device names directly seems to work.  With
>>this approach it is possible to decide in the guest whether to mount
>>a device as an emulated disk or a PV disk.
> 
> My plan here was to just not support VMs which mix paravirtualised and
> ioemulated devices, requiring the user to load the PV drivers from an
> initrd.  Of course, you have to load the initrd somehow, but the
> bootloader should only be reading the disk, which makes the coherency
> issues much easier.  As a last resort, rombios could learn about the
> PV devices, but I'd rather avoid that if possible.
> 
> Your way would be preferable, though, if it works.

We currently only allow this for the boot device (mainly to avoid the
rombios work you mention).  In addition, we make the qemu device only
visible to the rombios (and not the guest O/S) by controlling the IDE
probe logic in qemu.

>>>-- Adding a new device to the qemu PCI bus which is used for
>>>  bootstrapping the devices and getting an IRQ.
>>
>>Have you thought about supporting more than one IRQ?  We are experimenting
>>with an IRQ per device class (BUS, NIC, VBD).
> 
> I considered it, but it wasn't obvious that there would be much
> benefit.  You can potentially scan a smaller part of the pending event
> channel mask, but that's fairly quick already.

The main benefit we see is for legacy Linux variants that limit each
IRQ to one CPU.  Allowing additional IRQs increases the possible interrupt
processing concurrency.  In addition, one interrupt class can't starve
another (on SMP guests).

Steve
-- 
Steve Ofsthun - Virtual Iron Software, Inc.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains
  2006-07-18 23:24     ` Steve Ofsthun
@ 2006-07-19  6:50       ` Gerd Hoffmann
  0 siblings, 0 replies; 22+ messages in thread
From: Gerd Hoffmann @ 2006-07-19  6:50 UTC (permalink / raw)
  To: Steve Ofsthun; +Cc: xen-devel

Steve Ofsthun wrote:
> Steven Smith wrote:
> 
>>> Have you built the guest environment on anything other than a 2.6.16
>>> version of Linux?  We ran into extra work supporting older linux
>>> versions.
>>
>> #ifdef soup will get you back to about 2.6.12-ish without too many
>> problems.  These patches don't include that, since it would complicate
>> merging.
> 
> I was thinking about SLES9 (2.6.5), RHEL4 (2.6.9), RHEL3 (2.4.21).

SLES9 SP3 kernels available here:
http://forge.novell.com/modules/xfcontent/downloads.php/xenpreview/SUSE%20Linux%20Enterprise%20Server/9%20SP3/

I have a sles9 guest up and running on a sles10 host machine.

cheers,

  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains
  2006-07-18 12:51 Paravirtualised drivers for fully virtualised domains Steven Smith
  2006-07-18 13:45 ` Ben Thomas
  2006-07-18 16:00 ` Steve Ofsthun
@ 2006-07-26 15:34 ` Steven Smith
  2006-08-08  9:42   ` Steven Smith
  2 siblings, 1 reply; 22+ messages in thread
From: Steven Smith @ 2006-07-26 15:34 UTC (permalink / raw)
  To: xen-devel; +Cc: sos22


[-- Attachment #1.1: Type: text/plain, Size: 1263 bytes --]

I've just put an updated version of these patches up at
http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2 .  There's also an
equivalent single big patch at
http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2.combined .  Thank you to
everyone who gave feedback on the previous version.

The main changes since last time are:

-- Support for SMP guests
-- Support for 64 bit guests on a 64 bit hypervisor
-- Partial support for 32 bit guests on a 64 bit hypervisor: the network
   interface works, but the block device doesn't.

The block device can be made to work by #define'ing ALIEN_INTERFACES
in blkif.h, but drivers compiled in that way won't work with 32 on 32.
The problem here is that blkif_request_t contains extra padding in 64
bit builds, and so is a different size, and so the block ring layout
is different.
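
To make the size difference concrete, here is a small Python sketch that
applies C-style padding rules to a field list.  The field list is a
simplified assumption about blkif_request_t (check blkif.h for the real
definition), layout() is a helper written for this example, and the only
ABI fact relied on is that 64 bit integers are aligned to 4 bytes inside
structures on i386 and to 8 bytes on x86_64.

```python
# (name, size in bytes, natural alignment in bytes)
# Assumed, simplified field list for blkif_request_t; check blkif.h.
SEGMENT_SIZE = 8     # 4-byte grant ref + two 1-byte sector numbers, padded
MAX_SEGMENTS = 11

FIELDS = [
    ("operation",     1, 1),
    ("nr_segments",   1, 1),
    ("handle",        2, 2),
    ("id",            8, 8),
    ("sector_number", 8, 8),
    ("seg",           SEGMENT_SIZE * MAX_SEGMENTS, 4),
]


def layout(fields, max_align):
    """Return ({name: offset}, total size) using C-style padding rules,
    with every field's alignment capped at max_align."""
    offsets = {}
    offset = 0
    struct_align = 1
    for name, size, align in fields:
        align = min(align, max_align)
        offset = (offset + align - 1) // align * align   # pad up to alignment
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    total = (offset + struct_align - 1) // struct_align * struct_align
    return offsets, total


offsets32, size32 = layout(FIELDS, 4)   # i386-style: 64-bit fields align to 4
offsets64, size64 = layout(FIELDS, 8)   # x86_64-style: 64-bit fields align to 8
```

Under those assumptions the id field moves from offset 4 to offset 8 and
everything after it shifts by four bytes, so requests written by one
build land in the wrong place for the other.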

Other structures with similar problems are handled either by run time
tests in the drivers (shared_info_t) or translation wrappers in the
hypervisor (xen_feature_info_t, xen_add_to_physmap_t), but trying to
do this for the block rings would require far more painful and
extensive surgery.  I'm inclined to stick with multiply compiling the
frontend drivers in the short term, although it'll obviously need
doing in a slightly less grotty way.
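
For comparison, a translation wrapper for a single request header would
look something like the Python sketch below.  The two format strings
encode an assumed header layout (operation, nr_segments, handle, id,
sector_number) with and without the four bytes of padding, and
translate_32_to_64 is a name invented here.  It ignores the segment array
and the ring layout itself, which is where the real pain would be.

```python
import struct

# Request header only: operation, nr_segments, handle, id, sector_number.
HDR_32 = struct.Struct("<BBHQQ")      # no padding before the 64-bit id
HDR_64 = struct.Struct("<BBH4xQQ")    # 4 bytes of padding before the id


def translate_32_to_64(raw):
    """Re-pack a request header from the 32-bit layout into the 64-bit one."""
    return HDR_64.pack(*HDR_32.unpack(raw))
```

A header packed as HDR_32.pack(0, 1, 768, 42, 1024) is 20 bytes long and
comes out of translate_32_to_64() as 24 bytes with the same field values.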

Steven.

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains
  2006-07-26 15:34 ` Steven Smith
@ 2006-08-08  9:42   ` Steven Smith
  2006-08-09 18:05     ` Steve Dobbelstein
  2006-08-10 11:08     ` Paravirtualised drivers for fully virtualised domains, rev9 Steven Smith
  0 siblings, 2 replies; 22+ messages in thread
From: Steven Smith @ 2006-08-08  9:42 UTC (permalink / raw)
  To: xen-devel; +Cc: sos22


[-- Attachment #1.1: Type: text/plain, Size: 215 bytes --]

I just put a new version of the PV-on-HVM patches up at
http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev8 .  These are against
10968:51c227428166 and are otherwise largely unchanged from the
previous versions.

Steven.

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains
  2006-08-08  9:42   ` Steven Smith
@ 2006-08-09 18:05     ` Steve Dobbelstein
  2006-08-10 11:08     ` Paravirtualised drivers for fully virtualised domains, rev9 Steven Smith
  1 sibling, 0 replies; 22+ messages in thread
From: Steve Dobbelstein @ 2006-08-09 18:05 UTC (permalink / raw)
  To: Steven Smith; +Cc: xen-devel

Steven Smith <sos22-xen@srcf.ucam.org> wrote on 08/08/2006 04:42:15 AM:

> I just put a new version of the PV-on-HVM patches up at
> http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev8 .  These are against
> 10968:51c227428166 and are otherwise largely unchanged from the
> previous versions.
>
> Steven.

I have been running some informal performance tests on the rev8 patches.
Thought I'd share my findings thus far.

I am finding that disk performance (sequential/random read/write) with the
PV xen-vbd driver in an HVM domain is pretty much equal to that of a PV
domain.  Cool.  Not surprising, but cool nonetheless.

At the moment I'm having trouble running a network test (netperf) of the PV
xen-vnif driver within our testing framework.  I'll post those findings
when I get some reliable numbers.  Testing on the rev2 version of the
patches showed pretty much equal network performance between running on a
PV driver in an HVM domain and a PV domain.

I am noticing two odd behaviors with the rev8 patches, though.

1. When I try to create a PV domain, the domain hangs on bootup displaying
repeated messages to the console:
netfront: Bad rx response id 1.
netfront: Bad rx response id 0.
netfront: Bad rx response id 1.
netfront: Bad rx response id 0.
...

I had to reboot from an unpatched changeset 10968 build to get the
performance numbers for a PV domain.  (Hence, I am not comparing numbers
from the exact same code base, which is one reason why the tests are
"informal".)

I haven't dug into the cause of this problem yet.

2. When I destroy the HVM domain it stays in the zombie state.
dib:~ # xm list
Name                              ID Mem(MiB) VCPUs State  Time(s)
Domain-0                           0      768     1 r-----  2328.4
Zombie-hvm1                        1      768     1 -----d  1502.6

I'm not sure how to debug this one.  Any pointers would be helpful.

Steve D.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-08  9:42   ` Steven Smith
  2006-08-09 18:05     ` Steve Dobbelstein
@ 2006-08-10 11:08     ` Steven Smith
  2006-08-10 21:48       ` Steve Dobbelstein
  2006-08-16 13:33       ` Paravirtualised drivers for fully virtualised domains, rev11 sos22-xen
  1 sibling, 2 replies; 22+ messages in thread
From: Steven Smith @ 2006-08-10 11:08 UTC (permalink / raw)
  To: Steven Smith; +Cc: xen-devel, sos22


[-- Attachment #1.1: Type: text/plain, Size: 366 bytes --]

I just put a new version of the PV-on-HVM patches up at
http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev9 .  These are against
10968:51c227428166, as before.  Hopefully, the problems some people
have been having with network access from paravirtualised domains and
domains becoming zombies are now fixed.

Thanks to everyone who submitted bug reports on these.

Steven.

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-10 11:08     ` Paravirtualised drivers for fully virtualised domains, rev9 Steven Smith
@ 2006-08-10 21:48       ` Steve Dobbelstein
  2006-08-11 10:17         ` Steven Smith
  2006-08-16 13:36         ` Steven Smith
  2006-08-16 13:33       ` Paravirtualised drivers for fully virtualised domains, rev11 sos22-xen
  1 sibling, 2 replies; 22+ messages in thread
From: Steve Dobbelstein @ 2006-08-10 21:48 UTC (permalink / raw)
  To: Steven Smith; +Cc: xen-devel, sos22, xen-devel-bounces

Steven Smith <sos22-xen@srcf.ucam.org> wrote on 08/10/2006 06:08:38 AM:

> I just put a new version of the PV-on-HVM patches up at
> http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev9 .  These are against
> 10968:51c227428166, as before.  Hopefully, the problems some people
> have been having with network access from paravirtualised domains and
> domains becoming zombies are now fixed.
>
> Thanks to everyone who submitted bug reports on these.

Hi, Steve.

Thought I'd share my findings so far with rev9.

The good news is that I don't get zombies anymore.  The bad news is that
I'm still getting very poor network performance running netperf, worse than
a fully virtualized domain.  I thought it was something wrong with my test
setup when I was testing rev8, but the test setup looks good and the
results are repeatable.

Here is what I have found so far in trying to chase down the cause of the
slowdown.
The qemu-dm process is using 99.9% of the CPU on dom0.  I ran xenoprofile
to see what functions are chewing up the most time.  Here are the first
several lines of output from the xenoprofile report:

1316786  17.1956  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up system_call
1243487  16.2385  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up do_select
492967    6.4376  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up do_gettimeofday
467692    6.1075  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up sys_select
376844    4.9211  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up fget
330483    4.3157  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up sys_clock_gettime
291153    3.8021  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up ktime_get_ts
291098    3.8014  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up memset
249732    3.2612  xen-unstable-syms         xen-unstable-syms         write_cr3
195102    2.5478  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up fget_light
190663    2.4898  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up __kmalloc
183748    2.3995  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up tty_poll
152136    1.9867  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up copy_user_generic
129317    1.6887  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up tun_chr_poll
115066    1.5026  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up getnstimeofday
94228     1.2305  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up wait_for_completion_interruptible
85598     1.1178  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up copy_from_user
83495     1.0903  qemu-dm                   qemu-dm                   qemu_run_timers
82606     1.0787  xen-unstable-syms         xen-unstable-syms         syscall_enter
82507     1.0774  xen-unstable-syms         xen-unstable-syms         FLT2
76960     1.0050  qemu-dm                   qemu-dm                   main_loop_wait
71759     0.9371  xen-unstable-syms         xen-unstable-syms         toggle_guest_mode
47744     0.6235  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up sys_read
44890     0.5862  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up pipe_poll
40506     0.5290  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up pty_chars_in_buffer
40210     0.5251  librt-2.4.so              librt-2.4.so              clock_gettime
37866     0.4945  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up normal_poll
35160     0.4591  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up tty_paranoia_check
34715     0.4533  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up poll_initwait
34225     0.4469  xen-unstable-syms         xen-unstable-syms         test_guest_events
32643     0.4263  xen-unstable-syms         xen-unstable-syms         restore_all_guest
31101     0.4061  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up posix_ktime_get_ts
29352     0.3833  qemu-dm                   qemu-dm                   DMA_run
27741     0.3623  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up vfs_read
27443     0.3584  papps1-syms               papps1-syms               (no symbols)
26663     0.3482  qemu-dm                   qemu-dm                   main_loop
26283     0.3432  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up kfree
24446     0.3192  xen-unstable-syms         xen-unstable-syms         __copy_from_user_ll
23117     0.3019  xen-unstable-syms         xen-unstable-syms         do_iret
22559     0.2946  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up fput
20354     0.2658  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up __wake_up_common
19516     0.2549  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up tty_ldisc_deref
19290     0.2519  xen-unstable-syms         xen-unstable-syms         test_all_events
18499     0.2416  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up rw_verify_area
17759     0.2319  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up __wake_up
13282     0.1734  xen-unstable-syms         xen-unstable-syms         create_bounce_frame
11968     0.1563  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up hypercall_page
11211     0.1464  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up set_normalized_timespec
11127     0.1453  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up sysret_check
10494     0.1370  xen-unstable-syms         xen-unstable-syms         FLT131
10467     0.1367  pxen1-syms                pxen1-syms                vmx_asm_vmexit_handler
9478      0.1238  libpthread-2.4.so         libpthread-2.4.so         __pthread_disable_asynccancel
9260      0.1209  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up copy_to_user
9222      0.1204  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up sync_buffer
8616      0.1125  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up tty_ldisc_ref_wait
7985      0.1043  oprofiled                oprofiled
odb_insert
6862      0.0896  libpthread-2.4.so        libpthread-2.4.so
__read_nocancel
6806      0.0889  xen-unstable-syms        xen-unstable-syms        FLT3
6676      0.0872  qemu-dm                  qemu-dm
cpu_get_clock
6576      0.0859  pxen1-syms               pxen1-syms
vmx_load_cpu_guest_regs
6450      0.0842  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up
evtchn_poll
6349      0.0829  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up
pty_write_room
5978      0.0781  qemu-dm                  qemu-dm
qemu_get_clock
5906      0.0771  pxen1-syms               pxen1-syms
resync_all
5745      0.0750  oprofiled                oprofiled
opd_process_samples
5738      0.0749  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up
tty_ldisc_try
4943      0.0645  oprofiled                oprofiled
sfile_find
4803      0.0627  xen-unstable-syms        xen-unstable-syms
pit_read_counter
4194      0.0548  xen-unstable-syms        xen-unstable-syms
copy_from_user
4007      0.0523  pxen1-syms               pxen1-syms
vmx_store_cpu_guest_regs
3838      0.0501  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up
n_tty_chars_in_buffer
3507      0.0458  xen-unstable-syms        xen-unstable-syms        FLT6
3501      0.0457  xen-unstable-syms        xen-unstable-syms        FLT4
3474      0.0454  oprofiled                oprofiled
pop_buffer_value
3436      0.0449  xen-unstable-syms        xen-unstable-syms        FLT11
3283      0.0429  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up
poll_freewait
3260      0.0426  pxen1-syms               pxen1-syms
vmx_vmexit_handler

xen-unstable-syms is the Xen hypervisor running on behalf of dom0.
pxen1-syms is the Xen hypervisor running on behalf of the HVM domain.
vmlinux-2.6.16.13-xen0-up is the kernel running in dom0.

It appears that a lot of time is spent running timers and getting the
current time.  Not being familiar with the code, I am now crawling through
it to see how timers are handled and how the xen-vnif PV driver uses them.
I'm also looking for potential differences between rev2 and rev8 since the
network performance of rev2 was pretty equal to that of a PV domain.
Knowing the code, you may have a solution before I find the problem.
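
A rough way to check the timer impression against a listing like the one
above is to total the percentages per image and per symbol keyword.  The
sketch below is Python and rests on a few assumptions: the rows follow the
samples / percent / image / app / symbol layout seen above, a symbol that
wrapped onto its own line belongs to the preceding row, and the keyword
match ("time", "clock") is only a crude heuristic, not something oprofile
provides.  The function names are made up for illustration.

```python
from collections import defaultdict


def parse_opreport(text):
    """Parse rows of the form: samples percent image app [symbol].

    A symbol that wrapped onto the following line is re-attached to the
    row it belongs to.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].isdigit() and len(tokens) >= 4:
            rows.append({
                "samples": int(tokens[0]),
                "percent": float(tokens[1]),
                "image": tokens[2],
                "app": tokens[3],
                "symbol": " ".join(tokens[4:]),
            })
        elif rows:
            # Continuation line: the symbol name wrapped in the mail.
            prev = rows[-1]
            prev["symbol"] = (prev["symbol"] + " " + line.strip()).strip()
    return rows


def percent_by_image(rows):
    """Sum the percent column for each image name."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["image"]] += row["percent"]
    return dict(totals)


def percent_matching(rows, keywords):
    """Sum the percent column for symbols containing any keyword."""
    return sum(
        row["percent"]
        for row in rows
        if any(key in row["symbol"] for key in keywords)
    )


SAMPLE = """\
291153    3.8021  vmlinux-2.6.16.13-xen0-up vmlinux-2.6.16.13-xen0-up
ktime_get_ts
83495     1.0903  qemu-dm                  qemu-dm
qemu_run_timers
82507     1.0774  xen-unstable-syms        xen-unstable-syms        FLT2
27443     0.3584  papps1-syms              papps1-syms              (no
symbols)
"""

if __name__ == "__main__":
    rows = parse_opreport(SAMPLE)
    print(percent_by_image(rows))
    print(percent_matching(rows, ("time", "clock")))
```

Feeding it the full listing rather than the four sample rows gives the
per-image split between dom0 kernel, hypervisor and qemu-dm directly.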

Steve D.

P.S.  This just in from a test running while I typed the above.  I noticed
that qemu will start a "gui_timer" when VNC is not used.  I normally run
without graphics (nographic=1 in the domain config file).  I changed the
config file to use VNC. The qemu-dm CPU utilization in dom0 dropped to
below 10%.   The network performance improved from 0.19 Mb/s to 9.75 Mb/s
(still less than the 23.07 Mb/s for a fully virtualized domain).  It
appears there is some interaction between using the xen-vnif driver and the
qemu timer code.  I'm still exploring.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-10 21:48       ` Steve Dobbelstein
@ 2006-08-11 10:17         ` Steven Smith
  2006-08-11 10:31           ` Harry Butterworth
  2006-08-11 17:04           ` Steve Dobbelstein
  2006-08-16 13:36         ` Steven Smith
  1 sibling, 2 replies; 22+ messages in thread
From: Steven Smith @ 2006-08-11 10:17 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: xen-devel, sos22


[-- Attachment #1.1: Type: text/plain, Size: 2404 bytes --]

> Here is what I have found so far in trying to chase down the cause of the
> slowdown.
> The qemu-dm process is running 99.9% of the CPU on dom0.
That seems very wrong.  When I try this, the device model is almost
completely idle.  Could you see what strace says, please, or if there
are any strange messages in the /var/log/qemu-dm. file?

> It appears that a lot of time is spent running timers and getting the
> current time.  Not being familiar with the code, I am now crawling through
> it to see how timers are handled and how the xen-vnif PV driver uses them.
Timer handling isn't really changed by any of these patches.  Patch
02.ioemu_xen_evtchns.diff is in vaguely the same area, but I can't see
how it could cause the problems you're seeing, assuming your
hypervisor and libxc are up to date.

What changeset of xen-unstable did you apply the patches to?

> P.S.  This just in from a test running while I typed the above.  I noticed
> that qemu will start a "gui_timer" when VNC is not used.  I normally run
> without graphics (nographic=1 in the domain config file).  I changed the
> config file to use VNC. The qemu-dm CPU utilization in dom0 dropped to
> below 10%.   The network performance improved from 0.19 Mb/s to 9.75 Mb/s
> (still less than the 23.07 Mb/s for a fully virtualized domain).
When I try this, I see about 1600Mb/s between dom0 and a
paravirtualised domU, about 30Mb/s between dom0 and an ioemu domU, and
about 1200Mb/s between dom0 and an HVM domU running these drivers, all
collected using netpipe-tcp.  That is a regression, but much smaller
than you're seeing.

There are a couple of obvious things to check:

1) Do the statistics reported by ifconfig show any errors?
2) How often is the event channel interrupt firing according to
	/proc/interrupts?  I see about 50k-150k/second.
3) Is there any packet loss when you ping a domain?  Start your test
	and run a ping in parallel.
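
For check 2, the per-second rate can be read off by sampling
/proc/interrupts twice and dividing the difference by the interval.  The
Python sketch below assumes the usual layout (a CPU header line, then one
row per interrupt with a count per CPU); which row belongs to the event
channel depends on the guest, so the label is left to the caller.  The
function names are illustrative only.

```python
import time


def irq_counts(text):
    """Map each /proc/interrupts row label to its count summed over CPUs."""
    counts = {}
    for line in text.splitlines()[1:]:  # first line is the CPU header
        label, _, rest = line.partition(":")
        if not rest:
            continue
        total = 0
        for token in rest.split():
            if not token.isdigit():
                break  # reached the controller/device name columns
            total += int(token)
        counts[label.strip()] = total
    return counts


def irq_rates(before, after, seconds):
    """Interrupts per second for every label present in both samples."""
    return {
        label: (after[label] - before[label]) / seconds
        for label in after
        if label in before
    }


def sample_rates(path="/proc/interrupts", seconds=1.0):
    """Read the file twice, `seconds` apart, and return the rates."""
    with open(path) as handle:
        before = irq_counts(handle.read())
    time.sleep(seconds)
    with open(path) as handle:
        after = irq_counts(handle.read())
    return irq_rates(before, after, seconds)


if __name__ == "__main__":
    for label, rate in sorted(sample_rates().items()):
        if rate:
            print(f"{label:>6}: {rate:10.0f}/s")
```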

The other thing is that these drivers seem to be very sensitive to
kernel debugging options in the domU.  If you've got anything enabled
in the kernel hacking menu it might be worth trying again with that
switched off.

> It appears there is some interaction between using the xen-vnif
> driver and the qemu timer code.  I'm still exploring.
I'd be happier if I could reproduce this problem here.  Are you
running SMP?  PAE?  64 bit?  What kernel are you running in the domU?

Steven.

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-11 10:17         ` Steven Smith
@ 2006-08-11 10:31           ` Harry Butterworth
  2006-08-14  9:12             ` Steven Smith
  2006-08-11 17:04           ` Steve Dobbelstein
  1 sibling, 1 reply; 22+ messages in thread
From: Harry Butterworth @ 2006-08-11 10:31 UTC (permalink / raw)
  To: Steven Smith; +Cc: Steve Dobbelstein, xen-devel, sos22

On Fri, 2006-08-11 at 11:17 +0100, Steven Smith wrote:
> > Here is what I have found so far in trying to chase down the cause of the
> > slowdown.
> > The qemu-dm process is running 99.9% of the CPU on dom0.
> That seems very wrong.  When I try this, the device model is almost
> completely idle.  Could you see what strace says, please, or if there
> are any strange messages in the /var/log/qemu-dm. file?

I haven't tried the patches being discussed in this thread but I'm
seeing similar problems with qemu-dm anyway...

I've been looking into bugzilla 725 and I'm also seeing 100% cpu usage
by qemu-dm.  xm-test uses the nographic flag and I find that if this is
not set then the cpu usage drops to normal levels and the test passes.

> 
> > It appears that a lot of time is spent running timers and getting the
> > current time.

Yes, this is what I was seeing with the nographic flag set.

> > Not being familiar with the code, I am now crawling through
> > it to see how timers are handled and how the xen-vnif PV driver uses them.
> Timer handling isn't really changed by any of these patches.  Patch
> 02.ioemu_xen_evtchns.diff is in vaguely the same area, but I can't see
> how it could cause the problems you're seeing, assuming your
> hypervisor and libxc are up to date.
> 
> What changeset of xen-unstable did you apply the patches to?

I've been seeing the problem on recent unstable changesets without the
patches.  Changesets 10992, 10949 for example.

> 
> > P.S.  This just in from a test running while I typed the above.  I noticed
> > that qemu will start a "gui_timer" when VNC is not used.  I normally run
> > without graphics (nographic=1 in the domain config file).

> >   I changed the
> > config file to use VNC. The qemu-dm CPU utilization in dom0 dropped to
> > below 10%.

Yep, that's what I see without the patches.

> > The network performance improved from 0.19 Mb/s to 9.75 Mb/s
> > (still less than the 23.07 Mb/s for a fully virtualized domain).
> When I try this, I see about 1600Mb/s between dom0 and a
> paravirtualised domU, about 30Mb/s between dom0 and an ioemu domU, and
> about 1200Mb/s between dom0 and an HVM domU running these drivers, all
> collected using netpipe-tcp.  That is a regression, but much smaller
> than you're seeing.
> 
> There are a couple of obvious things to check:
> 
> 1) Do the statistics reported by ifconfig show any errors?
> 2) How often is the event channel interrupt firing according to
> 	/proc/interrupts?  I see about 50k-150k/second.
> 3) Is there any packet loss when you ping a domain?  Start your test
> 	and run a ping in parallel.
> 
> The other thing is that these drivers seem to be very sensitive to
> kernel debugging options in the domU.  If you've got anything enabled
> in the kernel hacking menu it might be worth trying again with that
> switched off.
> 
> > It appears there is some interaction between using the xen-vnif
> > driver and the qemu timer code.  I'm still exploring.
> I'd be happier if I could reproduce this problem here.  Are you
> running SMP?  PAE?  64 bit?  What kernel are you running in the domU?
> 
> Steven.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-11 10:17         ` Steven Smith
  2006-08-11 10:31           ` Harry Butterworth
@ 2006-08-11 17:04           ` Steve Dobbelstein
  2006-08-12  8:32             ` Steven Smith
  1 sibling, 1 reply; 22+ messages in thread
From: Steve Dobbelstein @ 2006-08-11 17:04 UTC (permalink / raw)
  To: Steven Smith; +Cc: xen-devel

Steven Smith <sos22@hermes.cam.ac.uk> wrote on 08/11/2006 05:17:04 AM:

> > Here is what I have found so far in trying to chase down the cause of the
> > slowdown.
> > The qemu-dm process is running 99.9% of the CPU on dom0.
> That seems very wrong.  When I try this, the device model is almost
> completely idle.  Could you see what strace says, please, or if there
> are any strange messages in the /var/log/qemu-dm. file?

Looks like I jumped the gun in relating the 99.9% CPU usage for qemu-dm and
the network.  I start up the HVM domain and without running any tests
qemu-dm is chewing up 99.9% of the CPU in dom0.  So it appears that the
100% CPU qemu usage is a problem by itself.  Looks like the same problem
Harry Butterworth is seeing.

> > It appears that a lot of time is spent running timers and getting the
> > current time.  Not being familiar with the code, I am now crawling through
> > it to see how timers are handled and how the xen-vnif PV driver uses them.
> Timer handling isn't really changed by any of these patches.  Patch
> 02.ioemu_xen_evtchns.diff is in vaguely the same area, but I can't see
> how it could cause the problems you're seeing, assuming your
> hypervisor and libxc are up to date.
>
> What changeset of xen-unstable did you apply the patches to?

10968

> > P.S.  This just in from a test running while I typed the above.  I noticed
> > that qemu will start a "gui_timer" when VNC is not used.  I normally run
> > without graphics (nographic=1 in the domain config file).  I changed the
> > config file to use VNC. The qemu-dm CPU utilization in dom0 dropped to
> > below 10%.   The network performance improved from 0.19 Mb/s to 9.75 Mb/s
> > (still less than the 23.07 Mb/s for a fully virtualized domain).
> When I try this, I see about 1600Mb/s between dom0 and a
> paravirtualised domU, about 30Mb/s between dom0 and an ioemu domU, and
> about 1200Mb/s between dom0 and an HVM domU running these drivers, all
> collected using netpipe-tcp.  That is a regression, but much smaller
> than you're seeing.
>
> There are a couple of obvious things to check:
>
> 1) Do the statistics reported by ifconfig show any errors?

No errors.

> 2) How often is the event channel interrupt firing according to
>    /proc/interrupts?  I see about 50k-150k/second.

I'm seeing ~ 500/s when netpipe-tcp reports decent throughput at smaller
buffer sizes and then ~50/s when the throughput drops at larger buffer
sizes.

> 3) Is there any packet loss when you ping a domain?  Start your test
>    and run a ping in parallel.

No packet loss.

> The other thing is that these drivers seem to be very sensitive to
> kernel debugging options in the domU.  If you've got anything enabled
> in the kernel hacking menu it might be worth trying again with that
> switched off.

Kernel debugging is on.  I also have Oprofile enabled.  I'll build a kernel
without those and see if it helps.

> > It appears there is some interaction between using the xen-vnif
> > driver and the qemu timer code.  I'm still exploring.
> I'd be happier if I could reproduce this problem here.  Are you
> running SMP?  PAE?  64 bit?  What kernel are you running in the domU?

UP kernels in both the domU and dom0 (although the scheduler likes to move
the 1 vcpu in dom0 around to different physical CPUs).  64-bit kernels on
both.

Steve D.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-11 17:04           ` Steve Dobbelstein
@ 2006-08-12  8:32             ` Steven Smith
  2006-08-14 21:22               ` Steve Dobbelstein
  0 siblings, 1 reply; 22+ messages in thread
From: Steven Smith @ 2006-08-12  8:32 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: xen-devel, sos22


[-- Attachment #1.1: Type: text/plain, Size: 2058 bytes --]

> > > Here is what I have found so far in trying to chase down the cause of the
> > > slowdown.
> > > The qemu-dm process is running 99.9% of the CPU on dom0.
> > That seems very wrong.  When I try this, the device model is almost
> > completely idle.  Could you see what strace says, please, or if there
> > are any strange messages in the /var/log/qemu-dm. file?
> Looks like I jumped the gun in relating the 99.9% CPU usage for qemu-dm and
> the network.  I start up the HVM domain and without running any tests
> qemu-dm is chewing up 99.9% of the CPU in dom0.  So it appears that the
> 100% CPU qemu usage is a problem by itself.  Looks like the same problem
> Harry Butterworth is seeing.
qemu-dm misbehaving could certainly lead to the netif going very
slowly.

> > 2) How often is the event channel interrupt firing according to
> >    /proc/interrupts?  I see about 50k-150k/second.
> I'm seeing ~ 500/s when netpipe-tcp reports decent throughput at smaller
> buffer sizes and then ~50/s when the throughput drops at larger buffer
> sizes.
How large do they have to be to cause problems?

> > The other thing is that these drivers seem to be very sensitive to
> > kernel debugging options in the domU.  If you've got anything enabled
> > in the kernel hacking menu it might be worth trying again with that
> > switched off.
> Kernel debugging is on.  I also have Oprofile enabled.  I'll build a kernel
> without those and see if it helps.
Worth a shot.  It shouldn't cause the problems with qemu, though.

> > > It appears there is some interaction between using the xen-vnif
> > > driver and the qemu timer code.  I'm still exploring.
> > I'd be happier if I could reproduce this problem here.  Are you
> > running SMP?  PAE?  64 bit?  What kernel are you running in the domU?
> UP kernels in both the domU and dom0 (although the scheduler likes to move
> the 1 vcpu in dom0 around to different physical CPUs).  64-bit kernels on
> both.
I've mostly been testing with 32 bit PAE.  I'll have a go with a 64
bit system on Monday.

Thanks,

Steven.

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-11 10:31           ` Harry Butterworth
@ 2006-08-14  9:12             ` Steven Smith
  0 siblings, 0 replies; 22+ messages in thread
From: Steven Smith @ 2006-08-14  9:12 UTC (permalink / raw)
  To: Harry Butterworth; +Cc: Steve Dobbelstein, xen-devel, sos22


[-- Attachment #1.1.1: Type: text/plain, Size: 624 bytes --]

> > > Here is what I have found so far in trying to chase down the cause of the
> > > slowdown.
> > > The qemu-dm process is running 99.9% of the CPU on dom0.
> > That seems very wrong.  When I try this, the device model is almost
> > completely idle.  Could you see what strace says, please, or if there
> > are any strange messages in the /var/log/qemu-dm. file?
> I've been looking into bugzilla 725 and I'm also seeing 100% cpu usage
> by qemu-dm.  xm-test uses the nographic flag and I find that if this is
> not set then the cpu usage drops to normal levels and the test passes.
Does the attached patch help?

Steven.

[-- Attachment #1.1.2: fix_qemu_blocking.diff --]
[-- Type: text/plain, Size: 601 bytes --]

diff -r fd10729d891f tools/ioemu/vl.c
--- a/tools/ioemu/vl.c	Fri Aug 11 17:39:33 2006 +0100
+++ b/tools/ioemu/vl.c	Mon Aug 14 11:06:01 2006 +0100
@@ -6036,7 +6036,7 @@ int main(int argc, char **argv)
                 }
                 break;
             case QEMU_OPTION_nographic:
-                pstrcpy(monitor_device, sizeof(monitor_device), "stdio");
+                pstrcpy(monitor_device, sizeof(monitor_device), "null");
                 if(!strcmp(serial_devices[0], "vc"))
                     pstrcpy(serial_devices[0], sizeof(serial_devices[0]),
                             "stdio");

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-12  8:32             ` Steven Smith
@ 2006-08-14 21:22               ` Steve Dobbelstein
  2006-08-15  7:27                 ` Steven Smith
  0 siblings, 1 reply; 22+ messages in thread
From: Steve Dobbelstein @ 2006-08-14 21:22 UTC (permalink / raw)
  To: Steven Smith; +Cc: xen-devel, sos22

Steven Smith <sos22@hermes.cam.ac.uk> wrote on 08/12/2006 03:32:23 AM:

> > > > Here is what I have found so far in trying to chase down the cause of the
> > > > slowdown.
> > > > The qemu-dm process is running 99.9% of the CPU on dom0.
> > > That seems very wrong.  When I try this, the device model is almost
> > > completely idle.  Could you see what strace says, please, or if there
> > > are any strange messages in the /var/log/qemu-dm. file?
> > Looks like I jumped the gun in relating the 99.9% CPU usage for qemu-dm and
> > the network.  I start up the HVM domain and without running any tests
> > qemu-dm is chewing up 99.9% of the CPU in dom0.  So it appears that the
> > 100% CPU qemu usage is a problem by itself.  Looks like the same problem
> > Harry Butterworth is seeing.
> qemu-dm misbehaving could certainly lead to the netif going very
> slowly.

Agreed.  I applied the patch you sent to Harry.  It appears to fix the 99.9%
CPU usage problem.

> > > 2) How often is the event channel interrupt firing according to
> > >    /proc/interrupts?  I see about 50k-150k/second.
> > I'm seeing ~ 500/s when netpipe-tcp reports decent throughput at smaller
> > buffer sizes and then ~50/s when the throughput drops at larger buffer
> > sizes.
> How large do they have to be to cause problems?

I'm noticing a drop off in throughput at a buffer size of 3069.  Here is a
snip from the output from netpipe-tcp.

 43:    1021 bytes    104 times -->     20.27 Mbps in     384.28 usec
 44:    1024 bytes    129 times -->     20.14 Mbps in     387.86 usec
 45:    1027 bytes    129 times -->     20.17 Mbps in     388.46 usec
 46:    1533 bytes    129 times -->     22.94 Mbps in     509.95 usec
 47:    1536 bytes    130 times -->     23.00 Mbps in     509.48 usec
 48:    1539 bytes    130 times -->     23.12 Mbps in     507.92 usec
 49:    2045 bytes     66 times -->     30.02 Mbps in     519.66 usec
 50:    2048 bytes     96 times -->     30.50 Mbps in     512.35 usec
 51:    2051 bytes     97 times -->     30.61 Mbps in     511.24 usec
 52:    3069 bytes     98 times -->      0.61 Mbps in   38672.52 usec
 53:    3072 bytes      3 times -->      0.48 Mbps in   48633.50 usec
 54:    3075 bytes      3 times -->      0.48 Mbps in   48542.50 usec
 55:    4093 bytes      3 times -->      0.64 Mbps in   48516.35 usec
 56:    4096 bytes      3 times -->      0.65 Mbps in   48449.48 usec
 57:    4099 bytes      3 times -->      0.64 Mbps in   48575.84 usec

The throughput remains low for the remainder of the buffer sizes, which go
to 49155 before the benchmark exits due to the requests taking more than a
second.
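
The point where the throughput falls off can be picked out of netpipe-tcp
output mechanically.  The Python sketch below parses rows in the format
shown above and returns the first row whose throughput is more than a
factor of ten below the previous one; the factor is an arbitrary
threshold chosen for illustration.  It also recomputes the Mbps column
from the size and time columns; the formula bytes * 8 / time / 2^20 is
inferred from the figures in the snip, not taken from netpipe
documentation.

```python
import re

ROW = re.compile(
    r"^\s*(\d+):\s+(\d+) bytes\s+(\d+) times -->"
    r"\s+([\d.]+) Mbps in\s+([\d.]+) usec"
)


def parse_netpipe(text):
    """Return one dict per netpipe-tcp result row found in `text`."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "index": int(match.group(1)),
                "bytes": int(match.group(2)),
                "times": int(match.group(3)),
                "mbps": float(match.group(4)),
                "usec": float(match.group(5)),
            })
    return rows


def recomputed_mbps(size_bytes, usec):
    """Throughput from size and time, using 2**20 bits per Mb (inferred)."""
    return size_bytes * 8 / (usec * 1e-6) / 2**20


def first_collapse(rows, factor=10.0):
    """First row whose Mbps is below 1/factor of the row before it."""
    for prev, cur in zip(rows, rows[1:]):
        if cur["mbps"] * factor < prev["mbps"]:
            return cur
    return None


SNIP = """\
 50:    2048 bytes     96 times -->     30.50 Mbps in     512.35 usec
 51:    2051 bytes     97 times -->     30.61 Mbps in     511.24 usec
 52:    3069 bytes     98 times -->      0.61 Mbps in   38672.52 usec
 53:    3072 bytes      3 times -->      0.48 Mbps in   48633.50 usec
"""

if __name__ == "__main__":
    rows = parse_netpipe(SNIP)
    drop = first_collapse(rows)
    print("throughput collapses at", drop["bytes"], "bytes")
    print("recomputed:", round(recomputed_mbps(drop["bytes"], drop["usec"]), 2))
```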

> > > The other thing is that these drivers seem to be very sensitive to
> > > kernel debugging options in the domU.  If you've got anything enabled
> > > in the kernel hacking menu it might be worth trying again with that
> > > switched off.
> > Kernel debugging is on.  I also have Oprofile enabled.  I'll build a kernel
> > without those and see if it helps.
> Worth a shot.  It shouldn't cause the problems with qemu, though.

I built a kernel without kernel debugging and without instrumentation.  The
results were very similar.

 43:    1021 bytes    104 times -->     20.27 Mbps in     384.28 usec
 44:    1024 bytes    129 times -->     20.30 Mbps in     384.91 usec
 45:    1027 bytes    130 times -->     20.19 Mbps in     388.02 usec
 46:    1533 bytes    129 times -->     22.97 Mbps in     509.25 usec
 47:    1536 bytes    130 times -->     23.02 Mbps in     509.12 usec
 48:    1539 bytes    131 times -->     23.04 Mbps in     509.65 usec
 49:    2045 bytes     65 times -->     30.41 Mbps in     513.07 usec
 50:    2048 bytes     97 times -->     30.49 Mbps in     512.49 usec
 51:    2051 bytes     97 times -->     30.45 Mbps in     513.85 usec
 52:    3069 bytes     97 times -->      0.75 Mbps in   31141.34 usec
 53:    3072 bytes      3 times -->      0.48 Mbps in   48596.50 usec
 54:    3075 bytes      3 times -->      0.48 Mbps in   48876.17 usec
 55:    4093 bytes      3 times -->      0.64 Mbps in   48489.33 usec
 56:    4096 bytes      3 times -->      0.64 Mbps in   48606.63 usec
 57:    4099 bytes      3 times -->      0.64 Mbps in   48568.33 usec

Again, the throughput remains low for the remainder of the buffer sizes,
which go to 49155.

The above tests were run against netpipe-tcp on another machine.  When I
run against netpipe-tcp in dom0, I get better throughput but also some
strange behavior.  Again, a snip from the output.

 43:    1021 bytes    606 times -->    140.14 Mbps in      55.58 usec
 44:    1024 bytes    898 times -->    141.16 Mbps in      55.35 usec
 45:    1027 bytes    905 times -->    138.93 Mbps in      56.40 usec
 46:    1533 bytes    890 times -->    133.74 Mbps in      87.45 usec
 47:    1536 bytes    762 times -->    132.82 Mbps in      88.23 usec
 48:    1539 bytes    756 times -->    132.01 Mbps in      88.95 usec
 49:    2045 bytes    376 times -->    172.36 Mbps in      90.52 usec
 50:    2048 bytes    552 times -->    177.41 Mbps in      88.07 usec
 51:    2051 bytes    568 times -->    176.12 Mbps in      88.85 usec
 52:    3069 bytes    564 times -->      0.44 Mbps in   53173.74 usec
 53:    3072 bytes      3 times -->      0.44 Mbps in   53249.32 usec
 54:    3075 bytes      3 times -->      0.50 Mbps in   46639.64 usec
 55:    4093 bytes      3 times -->    321.94 Mbps in      97.00 usec
 56:    4096 bytes    515 times -->    287.05 Mbps in     108.87 usec
 57:    4099 bytes    459 times -->      2.69 Mbps in   11615.94 usec
 58:    6141 bytes      4 times -->      0.63 Mbps in   74535.64 usec
 59:    6144 bytes      3 times -->      0.35 Mbps in  133242.01 usec
 60:    6147 bytes      3 times -->      0.35 Mbps in  133311.47 usec
 61:    8189 bytes      3 times -->      0.62 Mbps in  100391.51 usec
 62:    8192 bytes      3 times -->      1.05 Mbps in   59535.66 usec
 63:    8195 bytes      3 times -->      0.63 Mbps in   99598.69 usec
 64:   12285 bytes      3 times -->      0.47 Mbps in  199974.34 usec
 65:   12288 bytes      3 times -->      4.70 Mbps in   19933.34 usec
 66:   12291 bytes      3 times -->      4.70 Mbps in   19933.30 usec
 67:   16381 bytes      3 times -->      0.71 Mbps in  176984.35 usec
 68:   16384 bytes      3 times -->      0.93 Mbps in  134929.50 usec
 69:   16387 bytes      3 times -->      0.93 Mbps in  134930.33 usec

The throughput drops at a buffer size of 3069 as in the prior runs, but it
recovers at 4093 and 4096, and then drops off again for the remainder of the
test.

I don't know offhand why the throughput drops off.  I'll look into it.  Any
tips would be helpful.

For comparison, an FV domU running netpipe-tcp to another machine will ramp
up to about 20 Mbps at a buffer size of around 128 KB and then taper off to
17 Mbps.  A PV domU will ramp up to around 750 Mbps at a buffer size of
about 2 MB and maintain that throughput up to an 8 MB buffer, where the test
stopped.  In dom0, netpipe-tcp running to another machine ramps up to around
850 Mbps at buffer sizes from 3 MB to 8 MB, where the test stopped.

Steve D.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-14 21:22               ` Steve Dobbelstein
@ 2006-08-15  7:27                 ` Steven Smith
  2006-08-15 22:05                   ` Steve Dobbelstein
  0 siblings, 1 reply; 22+ messages in thread
From: Steven Smith @ 2006-08-15  7:27 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: xen-devel, sos22


[-- Attachment #1.1: Type: text/plain, Size: 1659 bytes --]

> > > Looks like I jumped the gun in relating the 99.9% CPU usage for qemu-dm and
> > > the network.  I start up the HVM domain and without running any tests
> > > qemu-dm is chewing up 99.9% of the CPU in dom0.  So it appears that the
> > > 100% CPU qemu usage is a problem by itself.  Looks like the same problem
> > > Harry Butterworth is seeing.
> > qemu-dm misbehaving could certainly lead to the netif going very
> > slowly.
> Agreed.  I applied the patch you sent to Harry.  It appears to fix the 99.9%
> CPU usage problem.
Great, thanks.

> > > > 2) How often is the event channel interrupt firing according to
> > > >    /proc/interrupts?  I see about 50k-150k/second.
> > > I'm seeing ~ 500/s when netpipe-tcp reports decent throughput at smaller
> > > buffer sizes and then ~50/s when the throughput drops at larger buffer
> > > sizes.
> > How large do they have to be to cause problems?
> I'm noticing a drop off in throughput at a buffer size of 3069.  Here is a
> snip from the output from netpipe-tcp.
What are the MTUs on the interfaces, according to ifconfig, in dom0
and domU?

> I don't know offhand why the throughput drops off.  I'll look into it.  Any
> tips would be helpful.
tcpdump in the domU and dom0 might be enlightening, just to see if any
packets are getting dropped or truncated.  The connection's probably
slow enough when it's misbehaving for tcpdump to keep up.

Are you running through the bridge?  It's unlikely to be that, but it
would be good to eliminate it as a variable by doing some domU<->dom0
tests without it involved.

What version of Linux are you running in the domU?  Does it have any
patches applied?

Steven.

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-15  7:27                 ` Steven Smith
@ 2006-08-15 22:05                   ` Steve Dobbelstein
  0 siblings, 0 replies; 22+ messages in thread
From: Steve Dobbelstein @ 2006-08-15 22:05 UTC (permalink / raw)
  To: Steven Smith; +Cc: xen-devel, sos22

[-- Attachment #1: Type: text/plain, Size: 2204 bytes --]

Steven Smith <sos22-xen@srcf.ucam.org> wrote on 08/15/2006 02:27:50 AM:

> > > > > 2) How often is the event channel interrupt firing according to
> > > > >    /proc/interrupts?  I see about 50k-150k/second.
> > > > I'm seeing ~ 500/s when netpipe-tcp reports decent throughput at smaller
> > > > buffer sizes and then ~50/s when the throughput drops at larger buffer
> > > > sizes.
> > > How large do they have to be to cause problems?
> > I'm noticing a drop off in throughput at a buffer size of 3069.  Here is a
> > snip from the output from netpipe-tcp.
> What are the MTUs on the interfaces, according to ifconfig, in dom0
> and domU?

MTUs on all the interfaces are 1500.

> > I don't know offhand why the throughput drops off.  I'll look into it.  Any
> > tips would be helpful.
> tcpdump in the domU and dom0 might be enlightening, just to see if any
> packets are getting dropped or truncated.  The connection's probably
> slow enough when it's misbehaving for tcpdump to keep up.

tcpdump on both dom0 and domU shows no packets dropped and none truncated.

I noticed lines such as:

16:28:18.596654 IP dib.ltc.austin.ibm.com > hvm1.ltc.austin.ibm.com: ICMP dib.ltc.austin.ibm.com unreachable - need to frag (mtu 1500), length 556

in the tcpdump output during the slow down.  (dib.ltc.austin.ibm.com is
dom0.)  Knowing very little about the TCP protocol, I'm not sure if that
indicates a problem.
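
For context, "unreachable - need to frag" is how tcpdump prints the ICMP
"fragmentation needed" message (type 3, code 4), which a host sends when a
packet with the Don't Fragment bit set is larger than the MTU of the next
hop.  Whether it explains the slowdown here is not established.  A minimal
sketch for counting these messages in a saved capture follows; it assumes
each tcpdump record sits on a single line, and the names NEED_FRAG and
count_need_frag are invented for this example.

```python
import re
from collections import Counter

# tcpdump's text rendering of ICMP "fragmentation needed" (type 3, code 4).
NEED_FRAG = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): ICMP \S+ unreachable - need to frag "
    r"\(mtu (?P<mtu>\d+)\)"
)

def count_need_frag(lines):
    """Count 'need to frag' messages per (sender, receiver, advertised MTU)."""
    counts = Counter()
    for line in lines:
        m = NEED_FRAG.search(line)
        if m:
            counts[(m["src"], m["dst"], int(m["mtu"]))] += 1
    return counts
```

Feeding it the lines of a capture from the slow phase and from the fast
phase would show whether the messages only appear once the buffer size
passes the point where throughput drops.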

> Are you running through the bridge?  It's unlikely to be that, but it
> would be good to eliminate it as a variable by doing some domU<->dom0
> tests without it involved.

I am running through the bridge, the default Xen setup.

I doubt the bridge is the problem since I also use the bridge for a PV domU
and an FV domU and those don't see a slowdown.

> What version of Linux are you running in the domU?  Does it have any
> patches applied?

SLES 10 beta 10.  (Yes, SLES 10 has been released.  We haven't updated
our automated testing framework yet.)  I'm running a 2.6.16.13 kernel.org
kernel, the current base kernel for xen-unstable.  No patches applied.

Here is the kernel config from /proc/config.gz in the HVM domU.
(See attached file: hvm_kernel_config)
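
To compare this config against another guest's without reading both by
eye, a minimal sketch along these lines may help.  It assumes the usual
.config layout ("CONFIG_X=value" for set options, "# CONFIG_X is not set"
for unset ones); parse_kconfig and load_running_config are names invented
for this example.

```python
import gzip

def parse_kconfig(text):
    """Map option name -> value; options marked 'is not set' map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Drop the leading "# " and the trailing " is not set".
            options[line[2:-len(" is not set")]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value.strip('"')
    return options

def load_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC=y)."""
    with gzip.open(path, "rt") as f:
        return parse_kconfig(f.read())
```

For the attached config, parse_kconfig would report CONFIG_HZ as "100" and
CONFIG_SMP as None, so diffing two such dictionaries highlights options
like the timer frequency that differ between guests.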

Thanks for your attention.

Steve D.

[-- Attachment #2: hvm_kernel_config --]
[-- Type: application/octet-stream, Size: 25911 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.16.13-baremetal-up
# Mon Aug 14 10:56:34 2006
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_UID16=y
CONFIG_VM86=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
CONFIG_SLAB=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
# CONFIG_LBD is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
# CONFIG_MK8 is not set
CONFIG_MPSC=y
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_MICROCODE=y
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
# CONFIG_MTRR is not set
# CONFIG_SMP is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_HPET_TIMER=y
CONFIG_GART_IOMMU=y
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
# CONFIG_X86_MCE_AMD is not set
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x100000
CONFIG_SECCOMP=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y

#
# Power management options
#
CONFIG_PM=y
# CONFIG_PM_LEGACY is not set
# CONFIG_PM_DEBUG is not set
# CONFIG_SOFTWARE_SUSPEND is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
# CONFIG_ACPI_SLEEP is not set
CONFIG_ACPI_AC=m
CONFIG_ACPI_BATTERY=m
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_VIDEO=m
# CONFIG_ACPI_HOTKEY is not set
CONFIG_ACPI_FAN=m
CONFIG_ACPI_PROCESSOR=m
CONFIG_ACPI_THERMAL=m
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
# CONFIG_ACPI_CONTAINER is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
# CONFIG_UNORDERED_IO is not set
CONFIG_PCIEPORTBUS=y
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY_PROC=y

#
# PCCARD (PCMCIA/CardBus) support
#
# CONFIG_PCCARD is not set

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_IA32_EMULATION=y
CONFIG_IA32_AOUT=y
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
# CONFIG_NETDEBUG is not set
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_TUNNEL is not set
# CONFIG_INET_DIAG is not set
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_BIC=y

#
# IP: Virtual Server Configuration
#
# CONFIG_IP_VS is not set
# CONFIG_IPV6 is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
# CONFIG_NETFILTER_NETLINK is not set
# CONFIG_NETFILTER_XTABLES is not set

#
# IP: Netfilter Configuration
#
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_CT_ACCT=y
# CONFIG_IP_NF_CONNTRACK_MARK is not set
# CONFIG_IP_NF_CONNTRACK_EVENTS is not set
# CONFIG_IP_NF_CT_PROTO_SCTP is not set
CONFIG_IP_NF_FTP=m
# CONFIG_IP_NF_IRC is not set
# CONFIG_IP_NF_NETBIOS_NS is not set
# CONFIG_IP_NF_TFTP is not set
# CONFIG_IP_NF_AMANDA is not set
# CONFIG_IP_NF_PPTP is not set
# CONFIG_IP_NF_QUEUE is not set

#
# Bridge: Netfilter Configuration
#
# CONFIG_BRIDGE_NF_EBTABLES is not set

#
# DCCP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_DCCP is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set

#
# TIPC Configuration (EXPERIMENTAL)
#
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_BRIDGE=y
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_IEEE80211 is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
# CONFIG_PREVENT_FIRMWARE_BUILD is not set
CONFIG_FW_LOADER=m

#
# Connector - unified userspace <-> kernelspace linker
#
# CONFIG_CONNECTOR is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play support
#
# CONFIG_PNP is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_CRYPTOLOOP=y
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_BLK_DEV_INITRD=y
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
# CONFIG_BLK_DEV_HD_IDE is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
# CONFIG_IDE_TASK_IOCTL is not set

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_CMD640 is not set
CONFIG_BLK_DEV_IDEPCI=y
# CONFIG_IDEPCI_SHARE_IRQ is not set
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=y
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_ATIIXP is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_ARM is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
# CONFIG_BLK_DEV_SR is not set
CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
# CONFIG_SCSI_CONSTANTS is not set
# CONFIG_SCSI_LOGGING is not set

#
# SCSI Transport Attributes
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=y

#
# SCSI low-level drivers
#
# CONFIG_ISCSI_TCP is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
CONFIG_SCSI_AIC79XX=m
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=15000
# CONFIG_AIC79XX_BUILD_FIRMWARE is not set
# CONFIG_AIC79XX_ENABLE_RD_STRM is not set
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_AIC79XX_REG_PRETTY_PRINT=y
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
CONFIG_SCSI_SATA=m
# CONFIG_SCSI_SATA_AHCI is not set
# CONFIG_SCSI_SATA_SVW is not set
CONFIG_SCSI_ATA_PIIX=m
# CONFIG_SCSI_SATA_MV is not set
# CONFIG_SCSI_SATA_NV is not set
# CONFIG_SCSI_PDC_ADMA is not set
# CONFIG_SCSI_SATA_QSTOR is not set
# CONFIG_SCSI_SATA_PROMISE is not set
# CONFIG_SCSI_SATA_SX4 is not set
# CONFIG_SCSI_SATA_SIL is not set
# CONFIG_SCSI_SATA_SIL24 is not set
# CONFIG_SCSI_SATA_SIS is not set
# CONFIG_SCSI_SATA_ULI is not set
# CONFIG_SCSI_SATA_VIA is not set
# CONFIG_SCSI_SATA_VITESSE is not set
CONFIG_SCSI_SATA_INTEL_COMBINED=y
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_FC is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
CONFIG_SCSI_LPFC=m
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_LINEAR=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
# CONFIG_MD_RAID10 is not set
# CONFIG_MD_RAID5 is not set
# CONFIG_MD_RAID6 is not set
CONFIG_MD_MULTIPATH=y
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=y
CONFIG_DM_CRYPT=y
CONFIG_DM_SNAPSHOT=y
CONFIG_DM_MIRROR=y
# CONFIG_DM_ZERO is not set
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_EMC=y

#
# Fusion MPT device support
#
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=m

#
# IEEE 1394 (FireWire) support
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
# CONFIG_I2O is not set

#
# Network device support
#
CONFIG_NETDEVICES=y
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=y

#
# ARCnet devices
#
# CONFIG_ARCNET is not set

#
# PHY device support
#
# CONFIG_PHYLIB is not set

#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
# CONFIG_TYPHOON is not set

#
# Tulip family network device support
#
CONFIG_NET_TULIP=y
# CONFIG_DE2104X is not set
CONFIG_TULIP=y
# CONFIG_TULIP_MWI is not set
# CONFIG_TULIP_MMIO is not set
# CONFIG_TULIP_NAPI is not set
# CONFIG_DE4X5 is not set
# CONFIG_WINBOND_840 is not set
# CONFIG_DM9102 is not set
# CONFIG_ULI526X is not set
# CONFIG_HP100 is not set
CONFIG_NET_PCI=y
CONFIG_PCNET32=y
# CONFIG_AMD8111_ETH is not set
# CONFIG_ADAPTEC_STARFIRE is not set
# CONFIG_B44 is not set
# CONFIG_FORCEDETH is not set
# CONFIG_DGRS is not set
# CONFIG_EEPRO100 is not set
CONFIG_E100=y
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
CONFIG_NE2K_PCI=y
# CONFIG_8139CP is not set
CONFIG_8139TOO=y
CONFIG_8139TOO_PIO=y
# CONFIG_8139TOO_TUNE_TWISTER is not set
# CONFIG_8139TOO_8129 is not set
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SUNDANCE is not set
CONFIG_VIA_RHINE=y
# CONFIG_VIA_RHINE_MMIO is not set

#
# Ethernet (1000 Mbit)
#
CONFIG_ACENIC=y
# CONFIG_ACENIC_OMIT_TIGON_I is not set
# CONFIG_DL2K is not set
CONFIG_E1000=y
# CONFIG_E1000_NAPI is not set
# CONFIG_E1000_DISABLE_PACKET_SPLIT is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
CONFIG_SK98LIN=y
# CONFIG_VIA_VELOCITY is not set
CONFIG_TIGON3=y
# CONFIG_BNX2 is not set

#
# Ethernet (10000 Mbit)
#
# CONFIG_CHELSIO_T1 is not set
# CONFIG_IXGB is not set
# CONFIG_S2IO is not set

#
# Token Ring devices
#
# CONFIG_TR is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
# CONFIG_SERIAL_8250_ACPI is not set
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_NVRAM is not set
# CONFIG_RTC is not set
# CONFIG_GEN_RTC is not set
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
# CONFIG_AGP_INTEL is not set
CONFIG_DRM=m
CONFIG_DRM_TDFX=m
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_MGA=m
# CONFIG_DRM_SIS is not set
# CONFIG_DRM_VIA is not set
# CONFIG_DRM_SAVAGE is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set
# CONFIG_HPET is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# TPM devices
#
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set

#
# I2C support
#
# CONFIG_I2C is not set

#
# SPI support
#
CONFIG_SPI=y
CONFIG_SPI_MASTER=y

#
# SPI Master Controller Drivers
#
CONFIG_SPI_BITBANG=y

#
# SPI Protocol Masters
#

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Hardware Monitoring support
#
# CONFIG_HWMON is not set
# CONFIG_HWMON_VID is not set

#
# Misc devices
#
# CONFIG_IBM_ASM is not set

#
# Multimedia Capabilities Port drivers
#

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set

#
# Graphics support
#
# CONFIG_FB is not set
# CONFIG_VIDEO_SELECT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y

#
# Sound
#
# CONFIG_SOUND is not set

#
# USB support
#
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set

#
# Miscellaneous USB options
#
# CONFIG_USB_DEVICEFS is not set
# CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_EHCI_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
CONFIG_USB_OHCI_HCD=y
# CONFIG_USB_OHCI_BIG_ENDIAN is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
#

#
# may also be needed; see USB_STORAGE Help for more information
#
# CONFIG_USB_STORAGE is not set
# CONFIG_USB_LIBUSUAL is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
CONFIG_USB_HIDINPUT=y
# CONFIG_USB_HIDINPUT_POWERBOOK is not set
# CONFIG_HID_FF is not set
# CONFIG_USB_HIDDEV is not set
# CONFIG_USB_AIPTEK is not set
# CONFIG_USB_WACOM is not set
# CONFIG_USB_ACECAD is not set
# CONFIG_USB_KBTAB is not set
# CONFIG_USB_POWERMATE is not set
# CONFIG_USB_MTOUCH is not set
# CONFIG_USB_ITMTOUCH is not set
# CONFIG_USB_EGALAX is not set
# CONFIG_USB_YEALINK is not set
# CONFIG_USB_XPAD is not set
# CONFIG_USB_ATI_REMOTE is not set
# CONFIG_USB_ATI_REMOTE2 is not set
# CONFIG_USB_KEYSPAN_REMOTE is not set
# CONFIG_USB_APPLETOUCH is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB Multimedia devices
#
# CONFIG_USB_DABUSB is not set

#
# Video4Linux support is needed for USB Multimedia device support
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
CONFIG_USB_MON=y

#
# USB port drivers
#

#
# USB Serial Converter support
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_AUERSWALD is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGETKIT is not set
# CONFIG_USB_PHIDGETSERVO is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_LD is not set

#
# USB DSL modem support
#

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# MMC/SD Card support
#
# CONFIG_MMC is not set

#
# InfiniBand support
#
CONFIG_INFINIBAND=y
# CONFIG_INFINIBAND_USER_MAD is not set
# CONFIG_INFINIBAND_USER_ACCESS is not set
CONFIG_INFINIBAND_MTHCA=y
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_IPOIB=y
CONFIG_INFINIBAND_IPOIB_DEBUG=y
CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=y
CONFIG_INFINIBAND_SRP=y

#
# EDAC - error detection and reporting (RAS) (EXPERIMENTAL)
#
# CONFIG_EDAC is not set

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set

#
# File systems
#
CONFIG_EXT2_FS=m
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
# CONFIG_EXT3_FS_POSIX_ACL is not set
# CONFIG_EXT3_FS_SECURITY is not set
CONFIG_JBD=m
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=m
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_REISERFS_FS_XATTR is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
# CONFIG_XFS_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_INOTIFY=y
# CONFIG_QUOTA is not set
CONFIG_DNOTIFY=y
CONFIG_AUTOFS_FS=y
CONFIG_AUTOFS4_FS=y
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=y
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y
# CONFIG_RELAYFS_FS is not set
# CONFIG_CONFIGFS_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=y
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFS_V4 is not set
# CONFIG_NFS_DIRECTIO is not set
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
# CONFIG_NFSD_V4 is not set
CONFIG_NFSD_TCP=y
# CONFIG_ROOT_NFS is not set
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set

#
# Instrumentation Support
#
# CONFIG_PROFILING is not set
# CONFIG_KPROBES is not set

#
# Kernel hacking
#
# CONFIG_PRINTK_TIME is not set
CONFIG_MAGIC_SYSRQ=y
# CONFIG_DEBUG_KERNEL is not set
CONFIG_LOG_BUF_SHIFT=14

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set

#
# Cryptographic options
#
CONFIG_CRYPTO=y
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_NULL is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_SHA1=m
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_TGR192 is not set
CONFIG_CRYPTO_DES=m
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_AES is not set
# CONFIG_CRYPTO_AES_X86_64 is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_MICHAEL_MIC is not set
CONFIG_CRYPTO_CRC32C=m
# CONFIG_CRYPTO_TEST is not set

#
# Hardware crypto devices
#

#
# Library routines
#
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC32=y
CONFIG_LIBCRC32C=m
CONFIG_ZLIB_INFLATE=y

* Paravirtualised drivers for fully virtualised domains, rev11
  2006-08-10 11:08     ` Paravirtualised drivers for fully virtualised domains, rev9 Steven Smith
  2006-08-10 21:48       ` Steve Dobbelstein
@ 2006-08-16 13:33       ` sos22-xen
  1 sibling, 0 replies; 22+ messages in thread
From: sos22-xen @ 2006-08-16 13:33 UTC (permalink / raw)
  To: xen-devel; +Cc: sos22


[-- Attachment #1.1: Type: text/plain, Size: 305 bytes --]

There's a new version of this patch up at
http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev11 .  The main change here
is that I now do a slightly more thorough job of disabling GSO when
compiled against kernels which don't support it.  These patches should
apply against changeset 11139:ff124973a28a.

Steven.

* Re: Paravirtualised drivers for fully virtualised domains, rev9
  2006-08-10 21:48       ` Steve Dobbelstein
  2006-08-11 10:17         ` Steven Smith
@ 2006-08-16 13:36         ` Steven Smith
  1 sibling, 0 replies; 22+ messages in thread
From: Steven Smith @ 2006-08-16 13:36 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: xen-devel, sos22, xen-devel-bounces


[-- Attachment #1.1: Type: text/plain, Size: 376 bytes --]

> The good news is that I don't get zombies anymore.  The bad news is that
> I'm still getting very poor network performance running netperf, worse than
> a fully virtualized domain.  I thought it was something wrong with my test
> setup when I was testing rev8, but the test setup looks good and the
> results are repeatable.
This should be fixed in rev11.

Thanks,

Steven.

end of thread, other threads:[~2006-08-16 13:36 UTC | newest]

Thread overview: 22+ messages
2006-07-18 12:51 Paravirtualised drivers for fully virtualised domains Steven Smith
2006-07-18 13:45 ` Ben Thomas
2006-07-18 16:00 ` Steve Ofsthun
2006-07-18 16:23   ` Mark Williamson
2006-07-18 20:34   ` Steven Smith
2006-07-18 23:24     ` Steve Ofsthun
2006-07-19  6:50       ` Gerd Hoffmann
2006-07-26 15:34 ` Steven Smith
2006-08-08  9:42   ` Steven Smith
2006-08-09 18:05     ` Steve Dobbelstein
2006-08-10 11:08     ` Paravirtualised drivers for fully virtualised domains, rev9 Steven Smith
2006-08-10 21:48       ` Steve Dobbelstein
2006-08-11 10:17         ` Steven Smith
2006-08-11 10:31           ` Harry Butterworth
2006-08-14  9:12             ` Steven Smith
2006-08-11 17:04           ` Steve Dobbelstein
2006-08-12  8:32             ` Steven Smith
2006-08-14 21:22               ` Steve Dobbelstein
2006-08-15  7:27                 ` Steven Smith
2006-08-15 22:05                   ` Steve Dobbelstein
2006-08-16 13:36         ` Steven Smith
2006-08-16 13:33       ` Paravirtualised drivers for fully virtualised domains, rev11 sos22-xen
