[PATCH 01/14] Add vmciContext.*
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciContext.c | 1763 +++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciContext.h | 77 ++
2 files changed, 1840 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciContext.c
create mode 100644 drivers/misc/vmw_vmci/vmciContext.h
diff --git a/drivers/misc/vmw_vmci/vmciContext.c b/drivers/misc/vmw_vmci/vmciContext.c
new file mode 100644
index 0000000..f252927
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciContext.c
@@ -0,0 +1,1763 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+
+#include "vmci_defs.h"
+#include "vmci_kernel_if.h"
+#include "vmci_infrastructure.h"
+#include "vmciCommonInt.h"
+#include "vmciContext.h"
+#include "vmciDatagram.h"
+#include "vmciDoorbell.h"
+#include "vmciDriver.h"
+#include "vmciEvent.h"
+#include "vmciKernelAPI.h"
+#include "vmciQueuePair.h"
+
+#define LGPFX "VMCIContext: "
+
+/* List of current VMCI contexts. */
+static struct {
+ struct list_head head;
+ spinlock_t lock;
+ spinlock_t firingLock;
+} contextList;
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContextSignalNotify --
+ *
+ * Sets the notify flag to true. Assumes that the context lock is
+ * held.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static inline void VMCIContextSignalNotify(struct vmci_context *context)
+{
+ if (context->notify)
+ *context->notify = true;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContextClearNotify --
+ *
+ * Sets the notify flag to false. Assumes that the context lock is
+ * held.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static inline void VMCIContextClearNotify(struct vmci_context *context)
+{
+ if (context->notify)
+ *context->notify = false;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContextClearNotifyAndCall --
+ *
+ * If nothing requires the attention of the guest, clears both the
+ * notify flag and the call.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static inline void VMCIContextClearNotifyAndCall(struct vmci_context *context)
+{
+ if (context->pendingDatagrams == 0 &&
+ VMCIHandleArray_GetSize(context->pendingDoorbellArray) == 0)
+ VMCIContextClearNotify(context);
+}
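+
+/*
+ * A minimal usage sketch (editor's illustration, not part of this
+ * patch): the notify helpers are paired with a queue-state update
+ * under the context lock, e.g. when a datagram arrives:
+ *
+ *   spin_lock(&context->lock);
+ *   context->pendingDatagrams++;
+ *   VMCIContextSignalNotify(context);
+ *   spin_unlock(&context->lock);
+ */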
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_CheckAndSignalNotify --
+ *
+ * Sets the context's notify flag iff datagrams are pending for this
+ * context. Called from VMCISetupNotify().
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+void VMCIContext_CheckAndSignalNotify(struct vmci_context *context)
+{
+ ASSERT(context);
+
+ spin_lock(&contextList.lock);
+ if (context->pendingDatagrams)
+ VMCIContextSignalNotify(context);
+ spin_unlock(&contextList.lock);
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_Init --
+ *
+ * Initializes the VMCI context module.
+ *
+ * Results:
+ * VMCI_SUCCESS.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_Init(void)
+{
+ INIT_LIST_HEAD(&contextList.head);
+
+ spin_lock_init(&contextList.lock);
+ spin_lock_init(&contextList.firingLock);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContextExists --
+ *
+ * Internal helper to check if a context with the specified context
+ * ID exists. Assumes the contextList.lock is held.
+ *
+ * Results:
+ * true if a context exists with the given cid.
+ * false otherwise
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static bool VMCIContextExists(uint32_t cid) // IN
+{
+ struct vmci_context *context;
+
+ list_for_each_entry(context, &contextList.head, listItem) {
+ if (context->cid == cid)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_InitContext --
+ *
+ * Allocates and initializes a VMCI context.
+ *
+ * Results:
+ * Returns 0 on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int
+VMCIContext_InitContext(uint32_t cid,
+ uint32_t privFlags,
+ uintptr_t eventHnd,
+ int userVersion,
+ uid_t * user, struct vmci_context **outContext)
+{
+ struct vmci_context *context;
+ int result;
+
+ if (privFlags & ~VMCI_PRIVILEGE_ALL_FLAGS) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Invalid flag (flags=0x%x) for VMCI context.\n",
+ privFlags));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (userVersion == 0)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ context = kmalloc(sizeof *context, GFP_KERNEL);
+ if (context == NULL) {
+ VMCI_WARNING((LGPFX
+ "Failed to allocate memory for VMCI context.\n"));
+ return VMCI_ERROR_NO_MEM;
+ }
+ memset(context, 0, sizeof *context);
+
+ INIT_LIST_HEAD(&context->listItem);
+ INIT_LIST_HEAD(&context->datagramQueue);
+
+ context->userVersion = userVersion;
+
+ context->queuePairArray = VMCIHandleArray_Create(0);
+ if (!context->queuePairArray) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ context->doorbellArray = VMCIHandleArray_Create(0);
+ if (!context->doorbellArray) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ context->pendingDoorbellArray = VMCIHandleArray_Create(0);
+ if (!context->pendingDoorbellArray) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ context->notifierArray = VMCIHandleArray_Create(0);
+ if (context->notifierArray == NULL) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ spin_lock_init(&context->lock);
+
+ atomic_set(&context->refCount, 1);
+
+ /* Initialize host-specific VMCI context. */
+ init_waitqueue_head(&context->hostContext.waitQueue);
+
+ context->privFlags = privFlags;
+
+ /*
+ * If we collide with an existing context we generate a new one and use
+ * it instead. The VMX will determine if regeneration is okay. Since there
+ * aren't 4B - 16 VMs running on a given host, the loop below will terminate.
+ */
+ spin_lock(&contextList.lock);
+ ASSERT(cid != VMCI_INVALID_ID);
+ while (VMCIContextExists(cid)) {
+
+ /*
+ * If the cid is below our limit and we collide, we are creating duplicate
+ * contexts internally, so we want to assert fail in that case.
+ */
+ ASSERT(cid >= VMCI_RESERVED_CID_LIMIT);
+
+ /* We reserve the lowest 16 ids for fixed contexts. */
+ cid = max(cid, VMCI_RESERVED_CID_LIMIT - 1) + 1;
+ if (cid == VMCI_INVALID_ID) {
+ cid = VMCI_RESERVED_CID_LIMIT;
+ }
+ }
+ ASSERT(!VMCIContextExists(cid));
+ context->cid = cid;
+ context->validUser = user != NULL;
+ if (context->validUser) {
+ context->user = *user;
+ }
+ list_add(&context->listItem, &contextList.head);
+ spin_unlock(&contextList.lock);
+
+ context->notify = NULL;
+ context->notifyPage = NULL;
+
+ *outContext = context;
+ return VMCI_SUCCESS;
+
+ error:
+ if (context->notifierArray) {
+ VMCIHandleArray_Destroy(context->notifierArray);
+ }
+ if (context->queuePairArray) {
+ VMCIHandleArray_Destroy(context->queuePairArray);
+ }
+ if (context->doorbellArray) {
+ VMCIHandleArray_Destroy(context->doorbellArray);
+ }
+ if (context->pendingDoorbellArray) {
+ VMCIHandleArray_Destroy(context->pendingDoorbellArray);
+ }
+ kfree(context);
+ return result;
+}
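+
+/*
+ * Illustrative caller sketch (not part of this patch; "cid" and
+ * "userVersion" are assumed to come from the ioctl layer):
+ *
+ *   struct vmci_context *ctx;
+ *   int rv = VMCIContext_InitContext(cid, VMCI_NO_PRIVILEGE_FLAGS,
+ *                                    0, userVersion, NULL, &ctx);
+ *   if (rv < VMCI_SUCCESS)
+ *       return rv;
+ *   ...
+ *   VMCIContext_ReleaseContext(ctx);  (drops the initial reference)
+ */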
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_ReleaseContext --
+ *
+ * Cleans up a VMCI context.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+void VMCIContext_ReleaseContext(struct vmci_context *context) // IN
+{
+ /* Dequeue VMCI context. */
+
+ spin_lock(&contextList.lock);
+ list_del(&context->listItem);
+ spin_unlock(&contextList.lock);
+
+ VMCIContext_Release(context);
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContextFireNotification --
+ *
+ * Fire notification for all contexts interested in given cid.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static int VMCIContextFireNotification(uint32_t contextID, // IN
+ uint32_t privFlags) // IN
+{
+ uint32_t i, arraySize;
+ struct vmci_context *subCtx;
+ struct vmci_handle_arr *subscriberArray;
+ struct vmci_handle contextHandle =
+ VMCI_MAKE_HANDLE(contextID, VMCI_EVENT_HANDLER);
+
+ /*
+ * We create an array to hold the subscribers we find when scanning through
+ * all contexts.
+ */
+ subscriberArray = VMCIHandleArray_Create(0);
+ if (subscriberArray == NULL) {
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ /*
+ * Scan all contexts to find who is interested in being notified about
+ * given contextID. We have a special firingLock that we use to synchronize
+ * across all notification operations. This avoids us having to take the
+ * context lock for each HasEntry call and it solves a lock ranking issue.
+ */
+ spin_lock(&contextList.firingLock);
+ spin_lock(&contextList.lock);
+ list_for_each_entry(subCtx, &contextList.head, listItem) {
+ /*
+ * We only deliver notifications of the removal of contexts, if
+ * the two contexts are allowed to interact.
+ */
+ if (VMCIHandleArray_HasEntry
+ (subCtx->notifierArray, contextHandle)
+ && !VMCIDenyInteraction(privFlags, subCtx->privFlags)) {
+ VMCIHandleArray_AppendEntry(&subscriberArray,
+ VMCI_MAKE_HANDLE
+ (subCtx->cid,
+ VMCI_EVENT_HANDLER));
+ }
+ }
+ spin_unlock(&contextList.lock);
+ spin_unlock(&contextList.firingLock);
+
+ /* Fire event to all subscribers. */
+ arraySize = VMCIHandleArray_GetSize(subscriberArray);
+ for (i = 0; i < arraySize; i++) {
+ int result;
+ struct vmci_event_msg *eMsg;
+ struct vmci_event_payld_ctx *evPayload;
+ char buf[sizeof *eMsg + sizeof *evPayload];
+
+ eMsg = (struct vmci_event_msg *)buf;
+
+ /* Clear out any garbage. */
+ memset(eMsg, 0, sizeof *eMsg + sizeof *evPayload);
+ eMsg->hdr.dst = VMCIHandleArray_GetEntry(subscriberArray, i);
+ eMsg->hdr.src =
+ VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_CONTEXT_RESOURCE_ID);
+ eMsg->hdr.payloadSize =
+ sizeof *eMsg + sizeof *evPayload - sizeof eMsg->hdr;
+ eMsg->eventData.event = VMCI_EVENT_CTX_REMOVED;
+ evPayload = VMCIEventMsgPayload(eMsg);
+ evPayload->contextID = contextID;
+
+ result = VMCIDatagram_Dispatch(VMCI_HYPERVISOR_CONTEXT_ID,
+ (struct vmci_datagram *)
+ eMsg, false);
+ if (result < VMCI_SUCCESS) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Failed to enqueue event datagram "
+ "(type=%d) for context (ID=0x%x).\n",
+ eMsg->eventData.event,
+ eMsg->hdr.dst.context));
+ /* We continue, to enqueue to the next subscriber. */
+ }
+ }
+ VMCIHandleArray_Destroy(subscriberArray);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIContextFreeContext --
+ *
+ * Deallocates all parts of a context data structure. This
+ * function doesn't lock the context, because it assumes that
+ * the caller holds the last reference to the context.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * Paged memory is freed.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void VMCIContextFreeContext(struct vmci_context *context) // IN
+{
+ struct list_head *curr;
+ struct list_head *next;
+ struct datagram_queue_entry *dqEntry;
+ struct vmci_handle tempHandle;
+
+ /* Fire event to all contexts interested in knowing this context is dying. */
+ VMCIContextFireNotification(context->cid, context->privFlags);
+
+ /*
+ * Cleanup all queue pair resources attached to context. If the VM dies
+ * without cleaning up, this code will make sure that no resources are
+ * leaked.
+ */
+
+ tempHandle = VMCIHandleArray_GetEntry(context->queuePairArray, 0);
+ while (!VMCI_HANDLE_EQUAL(tempHandle, VMCI_INVALID_HANDLE)) {
+ if (VMCIQPBroker_Detach(tempHandle, context) < VMCI_SUCCESS) {
+ /*
+ * When VMCIQPBroker_Detach() succeeds it removes the handle from the
+ * array. If detach fails, we must remove the handle ourselves.
+ */
+ VMCIHandleArray_RemoveEntry(context->queuePairArray,
+ tempHandle);
+ }
+ tempHandle =
+ VMCIHandleArray_GetEntry(context->queuePairArray, 0);
+ }
+
+ /*
+ * It is fine to destroy this without locking the callQueue, as
+ * this is the only thread having a reference to the context.
+ */
+
+ list_for_each_safe(curr, next, &context->datagramQueue) {
+ dqEntry =
+ list_entry(curr, struct datagram_queue_entry, listItem);
+ list_del(curr);
+ ASSERT(dqEntry && dqEntry->dg);
+ ASSERT(dqEntry->dgSize == VMCI_DG_SIZE(dqEntry->dg));
+ kfree(dqEntry->dg);
+ kfree(dqEntry);
+ }
+
+ VMCIHandleArray_Destroy(context->notifierArray);
+ VMCIHandleArray_Destroy(context->queuePairArray);
+ VMCIHandleArray_Destroy(context->doorbellArray);
+ VMCIHandleArray_Destroy(context->pendingDoorbellArray);
+ VMCIUnsetNotify(context);
+ kfree(context);
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_PendingDatagrams --
+ *
+ * Returns the current number of pending datagrams. The call may
+ * also serve as a synchronization point for the datagram queue,
+ * as no enqueue operations can occur concurrently.
+ *
+ * Results:
+ * Length of datagram queue for the given context.
+ *
+ * Side effects:
+ * Locks datagram queue.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_PendingDatagrams(uint32_t cid, // IN
+ uint32_t * pending) // OUT
+{
+ struct vmci_context *context;
+
+ context = VMCIContext_Get(cid);
+ if (context == NULL) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ spin_lock(&context->lock);
+ if (pending) {
+ *pending = context->pendingDatagrams;
+ }
+ spin_unlock(&context->lock);
+ VMCIContext_Release(context);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_EnqueueDatagram --
+ *
+ * Queues a VMCI datagram for the appropriate target VM
+ * context.
+ *
+ * Results:
+ * Size of enqueued data on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_EnqueueDatagram(uint32_t cid, // IN: Target VM
+ struct vmci_datagram *dg) // IN:
+{
+ struct datagram_queue_entry *dqEntry;
+ struct vmci_context *context;
+ struct vmci_handle dgSrc;
+ size_t vmciDgSize;
+
+ ASSERT(dg);
+ vmciDgSize = VMCI_DG_SIZE(dg);
+ ASSERT(vmciDgSize <= VMCI_MAX_DG_SIZE);
+
+ /* Get the target VM's VMCI context. */
+ context = VMCIContext_Get(cid);
+ if (context == NULL) {
+ VMCI_DEBUG_LOG(4, (LGPFX "Invalid context (ID=0x%x).\n", cid));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ /* Allocate guest call entry and add it to the target VM's queue. */
+ dqEntry = kmalloc(sizeof *dqEntry, GFP_KERNEL);
+ if (dqEntry == NULL) {
+ VMCI_WARNING((LGPFX
+ "Failed to allocate memory for datagram.\n"));
+ VMCIContext_Release(context);
+ return VMCI_ERROR_NO_MEM;
+ }
+ dqEntry->dg = dg;
+ dqEntry->dgSize = vmciDgSize;
+ dgSrc = dg->src;
+ INIT_LIST_HEAD(&dqEntry->listItem);
+
+ spin_lock(&context->lock);
+ /*
+ * We put a higher limit on datagrams from the hypervisor. If the pending
+ * datagram is not from hypervisor, then we check if enqueueing it would
+ * exceed the VMCI_MAX_DATAGRAM_QUEUE_SIZE limit on the destination. If the
+ * pending datagram is from hypervisor, we allow it to be queued at the
+ * destination side provided we don't reach the
+ * VMCI_MAX_DATAGRAM_AND_EVENT_QUEUE_SIZE limit.
+ */
+ if (context->datagramQueueSize + vmciDgSize >=
+ VMCI_MAX_DATAGRAM_QUEUE_SIZE &&
+ (!VMCI_HANDLE_EQUAL(dgSrc,
+ VMCI_MAKE_HANDLE
+ (VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_CONTEXT_RESOURCE_ID))
+ || context->datagramQueueSize + vmciDgSize >=
+ VMCI_MAX_DATAGRAM_AND_EVENT_QUEUE_SIZE)) {
+ spin_unlock(&context->lock);
+ VMCIContext_Release(context);
+ kfree(dqEntry);
+ VMCI_DEBUG_LOG(10,
+ (LGPFX
+ "Context (ID=0x%x) receive queue is full.\n",
+ cid));
+ return VMCI_ERROR_NO_RESOURCES;
+ }
+
+ list_add(&dqEntry->listItem, &context->datagramQueue);
+ context->pendingDatagrams++;
+ context->datagramQueueSize += vmciDgSize;
+ VMCIContextSignalNotify(context);
+ wake_up(&context->hostContext.waitQueue);
+ spin_unlock(&context->lock);
+ VMCIContext_Release(context);
+
+ return vmciDgSize;
+}
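+
+/*
+ * Ownership sketch (editor's note, based on the code above): on
+ * success the queue owns the datagram and frees it when it is
+ * dequeued or the context dies, so a sender only frees on failure:
+ *
+ *   int retval = VMCIContext_EnqueueDatagram(cid, dg);
+ *   if (retval < VMCI_SUCCESS)
+ *       kfree(dg);  (the queue did not take ownership)
+ */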
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_Exists --
+ *
+ * Verifies whether a context with the specified context ID exists.
+ *
+ * Results:
+ * true if a context exists with the given cid.
+ * false otherwise
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+bool VMCIContext_Exists(uint32_t cid) // IN
+{
+ bool rv;
+
+ spin_lock(&contextList.lock);
+ rv = VMCIContextExists(cid);
+ spin_unlock(&contextList.lock);
+ return rv;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_Get --
+ *
+ * Retrieves VMCI context corresponding to the given cid.
+ *
+ * Results:
+ * VMCI context on success, NULL otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+struct vmci_context *VMCIContext_Get(uint32_t cid) // IN
+{
+ struct vmci_context *c;
+ struct vmci_context *context = NULL;
+
+ if (cid == VMCI_INVALID_ID)
+ return NULL;
+
+ spin_lock(&contextList.lock);
+ list_for_each_entry(c, &contextList.head, listItem) {
+ if (c->cid == cid) {
+ /*
+ * At this point, we are sure that the reference count is
+ * already larger than zero. When starting the destruction of
+ * a context, we always remove it from the context list
+ * before decreasing the reference count. As we found the
+ * context here, it hasn't been destroyed yet. This means
+ * that we are not about to increase the reference count of
+ * something that is in the process of being destroyed.
+ */
+
+ atomic_inc(&c->refCount);
+ context = c;
+ break;
+ }
+ }
+ spin_unlock(&contextList.lock);
+
+ /*
+ * If the loop fell through, the cursor points at the list head, not
+ * at a valid entry, so we only return a context we explicitly found.
+ */
+ return context;
+}
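+
+/*
+ * Typical get/release pairing (illustrative sketch, not part of this
+ * patch):
+ *
+ *   struct vmci_context *c = VMCIContext_Get(cid);
+ *   if (c == NULL)
+ *       return VMCI_ERROR_NOT_FOUND;
+ *   ... use c, taking c->lock where required ...
+ *   VMCIContext_Release(c);
+ */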
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_Release --
+ *
+ * Releases the VMCI context. If this is the last reference to
+ * the context it will be deallocated. A context is created with
+ * a reference count of one, and on destroy, it is removed from
+ * the context list before its reference count is
+ * decremented. Thus, if we reach zero, we are sure that nobody
+ * else is about to increment it (they need the entry in the
+ * context list for that). This function mustn't be called with a
+ * lock held.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * Paged memory may be deallocated.
+ *
+ *----------------------------------------------------------------------
+ */
+
+void VMCIContext_Release(struct vmci_context *context) // IN
+{
+ ASSERT(context);
+ if (atomic_dec_and_test(&context->refCount))
+ VMCIContextFreeContext(context);
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_DequeueDatagram --
+ *
+ * Dequeues the next datagram and returns it to the caller.
+ * The caller passes in a pointer to the maximum size datagram
+ * it can handle, and the datagram is only dequeued if its
+ * size is less than maxSize. If it is larger, maxSize is set
+ * to the size of the datagram to give the caller a chance to
+ * set up a larger buffer for the guestcall.
+ *
+ * Results:
+ * On success: 0 if no more pending datagrams, otherwise the size of
+ * the next pending datagram.
+ * On failure: appropriate error code.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_DequeueDatagram(struct vmci_context *context, // IN
+ size_t * maxSize, // IN/OUT: max size of
+ // datagram caller can handle.
+ struct vmci_datagram **dg) // OUT:
+{
+ struct datagram_queue_entry *dqEntry;
+ struct list_head *listItem;
+ int rv;
+
+ ASSERT(context && dg);
+
+ /* Dequeue the next datagram entry. */
+ spin_lock(&context->lock);
+ if (context->pendingDatagrams == 0) {
+ VMCIContextClearNotifyAndCall(context);
+ spin_unlock(&context->lock);
+ VMCI_DEBUG_LOG(4, (LGPFX "No datagrams pending.\n"));
+ return VMCI_ERROR_NO_MORE_DATAGRAMS;
+ }
+
+ listItem = context->datagramQueue.next;
+ ASSERT(!list_empty(&context->datagramQueue));
+
+ dqEntry = list_entry(listItem, struct datagram_queue_entry, listItem);
+ ASSERT(dqEntry->dg);
+
+ /* Check size of caller's buffer. */
+ if (*maxSize < dqEntry->dgSize) {
+ *maxSize = dqEntry->dgSize;
+ spin_unlock(&context->lock);
+ VMCI_DEBUG_LOG(4,
+ (LGPFX "Caller's buffer should be at least "
+ "(size=%u bytes).\n", (uint32_t) * maxSize));
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ list_del(listItem);
+ context->pendingDatagrams--;
+ context->datagramQueueSize -= dqEntry->dgSize;
+ if (context->pendingDatagrams == 0) {
+ VMCIContextClearNotifyAndCall(context);
+ rv = VMCI_SUCCESS;
+ } else {
+ /*
+ * Return the size of the next datagram.
+ */
+ struct datagram_queue_entry *nextEntry;
+
+ listItem = context->datagramQueue.next;
+ ASSERT(!list_empty(&context->datagramQueue));
+ nextEntry =
+ list_entry(listItem, struct datagram_queue_entry, listItem);
+ ASSERT(nextEntry && nextEntry->dg);
+ /*
+ * The following size_t -> int truncation is fine as the maximum size of
+ * a (routable) datagram is 68KB.
+ */
+ rv = (int)nextEntry->dgSize;
+ }
+ spin_unlock(&context->lock);
+
+ /* Caller must free datagram. */
+ ASSERT(dqEntry->dgSize == VMCI_DG_SIZE(dqEntry->dg));
+ *dg = dqEntry->dg;
+ dqEntry->dg = NULL;
+ kfree(dqEntry);
+
+ return rv;
+}
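+
+/*
+ * Sketch of the two-step dequeue protocol described above (editor's
+ * illustration; "buf" is an assumed caller-side buffer):
+ *
+ *   size_t size = sizeof buf;
+ *   struct vmci_datagram *dg;
+ *   int rv = VMCIContext_DequeueDatagram(context, &size, &dg);
+ *   if (rv == VMCI_ERROR_NO_MEM) {
+ *       ... grow buf to "size" bytes and call again ...
+ *   } else if (rv >= VMCI_SUCCESS) {
+ *       ... consume dg, then kfree(dg), as the caller owns it ...
+ *   }
+ */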
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_GetId --
+ *
+ * Retrieves cid of given VMCI context.
+ *
+ * Results:
+ * The context's cid on success, VMCI_INVALID_ID otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+uint32_t VMCIContext_GetId(struct vmci_context * context) // IN:
+{
+ if (!context) {
+ return VMCI_INVALID_ID;
+ }
+ ASSERT(context->cid != VMCI_INVALID_ID);
+ return context->cid;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_GetPrivFlags --
+ *
+ * Retrieves the privilege flags of the given VMCI context ID.
+ *
+ * Results:
+ * Context's privilege flags.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+uint32_t VMCIContext_GetPrivFlags(uint32_t contextID) // IN
+{
+ if (VMCI_HostPersonalityActive()) {
+ uint32_t flags;
+ struct vmci_context *context;
+
+ context = VMCIContext_Get(contextID);
+ if (!context) {
+ return VMCI_LEAST_PRIVILEGE_FLAGS;
+ }
+ flags = context->privFlags;
+ VMCIContext_Release(context);
+ return flags;
+ }
+ return VMCI_NO_PRIVILEGE_FLAGS;
+}
+
+EXPORT_SYMBOL(VMCIContext_GetPrivFlags);
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_AddNotification --
+ *
+ * Add remoteCID to the list of contexts the current context wants
+ * notifications from/about.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * As in VMCIHandleArray_AppendEntry().
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_AddNotification(uint32_t contextID, // IN:
+ uint32_t remoteCID) // IN:
+{
+ int result = VMCI_ERROR_ALREADY_EXISTS;
+ struct vmci_handle notifierHandle;
+ struct vmci_context *context = VMCIContext_Get(contextID);
+ if (context == NULL) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ if (VMCI_CONTEXT_IS_VM(contextID) && VMCI_CONTEXT_IS_VM(remoteCID)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Context removed notifications for other VMs not "
+ "supported (src=0x%x, remote=0x%x).\n",
+ contextID, remoteCID));
+ result = VMCI_ERROR_DST_UNREACHABLE;
+ goto out;
+ }
+
+ if (context->privFlags & VMCI_PRIVILEGE_FLAG_RESTRICTED) {
+ result = VMCI_ERROR_NO_ACCESS;
+ goto out;
+ }
+
+ notifierHandle = VMCI_MAKE_HANDLE(remoteCID, VMCI_EVENT_HANDLER);
+ spin_lock(&contextList.firingLock);
+ spin_lock(&context->lock);
+ if (!VMCIHandleArray_HasEntry(context->notifierArray, notifierHandle)) {
+ VMCIHandleArray_AppendEntry(&context->notifierArray,
+ notifierHandle);
+ result = VMCI_SUCCESS;
+ }
+ spin_unlock(&context->lock);
+ spin_unlock(&contextList.firingLock);
+ out:
+ VMCIContext_Release(context);
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_RemoveNotification --
+ *
+ * Remove remoteCID from the current context's list of contexts it
+ * is interested in getting notifications from/about.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_RemoveNotification(uint32_t contextID, // IN:
+ uint32_t remoteCID) // IN:
+{
+ struct vmci_context *context = VMCIContext_Get(contextID);
+ struct vmci_handle tmpHandle;
+ if (context == NULL) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+ spin_lock(&contextList.firingLock);
+ spin_lock(&context->lock);
+ tmpHandle =
+ VMCIHandleArray_RemoveEntry(context->notifierArray,
+ VMCI_MAKE_HANDLE(remoteCID,
+ VMCI_EVENT_HANDLER));
+ spin_unlock(&context->lock);
+ spin_unlock(&contextList.firingLock);
+ VMCIContext_Release(context);
+
+ if (VMCI_HANDLE_EQUAL(tmpHandle, VMCI_INVALID_HANDLE)) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+ return VMCI_SUCCESS;
+}
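+
+/*
+ * Illustrative pairing of the two calls above (sketch only, not part
+ * of this patch): a context interested in VMCI_EVENT_CTX_REMOVED
+ * events about a peer subscribes and later unsubscribes:
+ *
+ *   VMCIContext_AddNotification(contextID, remoteCID);
+ *   ...
+ *   VMCIContext_RemoveNotification(contextID, remoteCID);
+ */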
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_GetCheckpointState --
+ *
+ * Get current context's checkpoint state of given type.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_GetCheckpointState(uint32_t contextID, // IN:
+ uint32_t cptType, // IN:
+ uint32_t * bufSize, // IN/OUT:
+ char **cptBufPtr) // OUT:
+{
+ int i, result;
+ uint32_t arraySize, cptDataSize;
+ struct vmci_handle_arr *array;
+ struct vmci_context *context;
+ char *cptBuf;
+ bool getContextID;
+
+ ASSERT(bufSize && cptBufPtr);
+
+ context = VMCIContext_Get(contextID);
+ if (context == NULL) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ spin_lock(&context->lock);
+ if (cptType == VMCI_NOTIFICATION_CPT_STATE) {
+ ASSERT(context->notifierArray);
+ array = context->notifierArray;
+ getContextID = true;
+ } else if (cptType == VMCI_WELLKNOWN_CPT_STATE) {
+ /*
+ * For compatibility with VMX'en with VM to VM communication, we
+ * always return zero wellknown handles.
+ */
+
+ *bufSize = 0;
+ *cptBufPtr = NULL;
+ result = VMCI_SUCCESS;
+ goto release;
+ } else if (cptType == VMCI_DOORBELL_CPT_STATE) {
+ ASSERT(context->doorbellArray);
+ array = context->doorbellArray;
+ getContextID = false;
+ } else {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX "Invalid cpt state (type=%d).\n",
+ cptType));
+ result = VMCI_ERROR_INVALID_ARGS;
+ goto release;
+ }
+
+ arraySize = VMCIHandleArray_GetSize(array);
+ if (arraySize > 0) {
+ if (cptType == VMCI_DOORBELL_CPT_STATE) {
+ cptDataSize =
+ arraySize * sizeof(struct dbell_cpt_state);
+ } else {
+ cptDataSize = arraySize * sizeof(uint32_t);
+ }
+ if (*bufSize < cptDataSize) {
+ *bufSize = cptDataSize;
+ result = VMCI_ERROR_MORE_DATA;
+ goto release;
+ }
+
+ cptBuf = kmalloc(cptDataSize, GFP_ATOMIC);
+
+ if (cptBuf == NULL) {
+ result = VMCI_ERROR_NO_MEM;
+ goto release;
+ }
+
+ for (i = 0; i < arraySize; i++) {
+ struct vmci_handle tmpHandle =
+ VMCIHandleArray_GetEntry(array, i);
+ if (cptType == VMCI_DOORBELL_CPT_STATE) {
+ ((struct dbell_cpt_state *)cptBuf)[i].handle =
+ tmpHandle;
+ } else {
+ ((uint32_t *) cptBuf)[i] =
+ getContextID ? tmpHandle.context :
+ tmpHandle.resource;
+ }
+ }
+ *bufSize = cptDataSize;
+ *cptBufPtr = cptBuf;
+ } else {
+ *bufSize = 0;
+ *cptBufPtr = NULL;
+ }
+ result = VMCI_SUCCESS;
+
+ release:
+ spin_unlock(&context->lock);
+ VMCIContext_Release(context);
+
+ return result;
+}
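+
+/*
+ * Sketch of the size negotiation above (editor's illustration): a
+ * first call with a zero-sized buffer learns the required size via
+ * VMCI_ERROR_MORE_DATA, and on success the caller owns the buffer:
+ *
+ *   uint32_t bufSize = 0;
+ *   char *buf = NULL;
+ *   int rv = VMCIContext_GetCheckpointState(contextID, cptType,
+ *                                           &bufSize, &buf);
+ *   if (rv == VMCI_ERROR_MORE_DATA)
+ *       ... retry with the updated bufSize ...
+ *   else if (rv == VMCI_SUCCESS && bufSize > 0)
+ *       ... copy out, then kfree(buf) ...
+ */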
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_SetCheckpointState --
+ *
+ * Set current context's checkpoint state of given type.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_SetCheckpointState(uint32_t contextID, // IN:
+ uint32_t cptType, // IN:
+ uint32_t bufSize, // IN:
+ char *cptBuf) // IN:
+{
+ uint32_t i;
+ uint32_t currentID;
+ int result = VMCI_SUCCESS;
+ uint32_t numIDs = bufSize / sizeof(uint32_t);
+ ASSERT(cptBuf);
+
+ if (cptType == VMCI_WELLKNOWN_CPT_STATE && numIDs > 0) {
+ /*
+ * We would end up here if VMX with VM to VM communication
+ * attempts to restore a checkpoint with wellknown handles.
+ */
+
+ VMCI_WARNING((LGPFX
+ "Attempt to restore checkpoint with obsolete "
+ "wellknown handles.\n"));
+ return VMCI_ERROR_OBSOLETE;
+ }
+
+ if (cptType != VMCI_NOTIFICATION_CPT_STATE) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX "Invalid cpt state (type=%d).\n",
+ cptType));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ for (i = 0; i < numIDs && result == VMCI_SUCCESS; i++) {
+ currentID = ((uint32_t *) cptBuf)[i];
+ result = VMCIContext_AddNotification(contextID, currentID);
+ if (result != VMCI_SUCCESS) {
+ break;
+ }
+ }
+ if (result != VMCI_SUCCESS) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Failed to set cpt state (type=%d) (error=%d).\n",
+ cptType, result));
+ }
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_ReceiveNotificationsGet --
+ *
+ * Retrieves the specified context's pending notifications in the
+ * form of a handle array. The handle arrays returned are the
+ * actual data - not a copy and should not be modified by the
+ * caller. They must be released using
+ * VMCIContext_ReceiveNotificationsRelease.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_ReceiveNotificationsGet(uint32_t contextID, // IN
+ struct vmci_handle_arr **dbHandleArray, // OUT
+ struct vmci_handle_arr **qpHandleArray) // OUT
+{
+ struct vmci_context *context;
+ int result = VMCI_SUCCESS;
+
+ ASSERT(dbHandleArray && qpHandleArray);
+
+ context = VMCIContext_Get(contextID);
+ if (context == NULL) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+ spin_lock(&context->lock);
+
+ *dbHandleArray = context->pendingDoorbellArray;
+ context->pendingDoorbellArray = VMCIHandleArray_Create(0);
+ if (!context->pendingDoorbellArray) {
+ context->pendingDoorbellArray = *dbHandleArray;
+ *dbHandleArray = NULL;
+ result = VMCI_ERROR_NO_MEM;
+ }
+ *qpHandleArray = NULL;
+
+ spin_unlock(&context->lock);
+ VMCIContext_Release(context);
+
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_ReceiveNotificationsRelease --
+ *
+ * Releases handle arrays with pending notifications previously
+ * retrieved using VMCIContext_ReceiveNotificationsGet. If the
+ * notifications were not successfully handed over to the guest,
+ * success must be false.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+void VMCIContext_ReceiveNotificationsRelease(uint32_t contextID, // IN
+ struct vmci_handle_arr *dbHandleArray, // IN
+ struct vmci_handle_arr *qpHandleArray, // IN
+ bool success) // IN
+{
+ struct vmci_context *context = VMCIContext_Get(contextID);
+
+ if (context) {
+ spin_lock(&context->lock);
+ if (!success) {
+ struct vmci_handle handle;
+
+ /*
+ * New notifications may have been added while we were not
+ * holding the context lock, so we transfer any new pending
+ * doorbell notifications to the old array, and reinstate the
+ * old array.
+ */
+
+ handle =
+ VMCIHandleArray_RemoveTail
+ (context->pendingDoorbellArray);
+ while (!VMCI_HANDLE_INVALID(handle)) {
+ ASSERT(VMCIHandleArray_HasEntry
+ (context->doorbellArray, handle));
+ if (!VMCIHandleArray_HasEntry
+ (dbHandleArray, handle)) {
+ VMCIHandleArray_AppendEntry
+ (&dbHandleArray, handle);
+ }
+ handle =
+ VMCIHandleArray_RemoveTail
+ (context->pendingDoorbellArray);
+ }
+ VMCIHandleArray_Destroy(context->pendingDoorbellArray);
+ context->pendingDoorbellArray = dbHandleArray;
+ dbHandleArray = NULL;
+ } else {
+ VMCIContextClearNotifyAndCall(context);
+ }
+ spin_unlock(&context->lock);
+ VMCIContext_Release(context);
+ } else {
+ /*
+ * The OS driver part is holding on to the context for the
+ * duration of the receive notification ioctl, so it should
+ * still be here.
+ */
+
+ ASSERT(false);
+ }
+
+ if (dbHandleArray) {
+ VMCIHandleArray_Destroy(dbHandleArray);
+ }
+ if (qpHandleArray) {
+ VMCIHandleArray_Destroy(qpHandleArray);
+ }
+}
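+
+/*
+ * Illustrative get/release pairing (sketch only, not part of this
+ * patch): the arrays returned by the Get call are live data and must
+ * always be handed back, with success == false if delivery to the
+ * guest failed:
+ *
+ *   struct vmci_handle_arr *db, *qp;
+ *   if (VMCIContext_ReceiveNotificationsGet(contextID, &db, &qp) ==
+ *       VMCI_SUCCESS) {
+ *       bool ok = ... hand contents of db over to the guest ...;
+ *       VMCIContext_ReceiveNotificationsRelease(contextID, db, qp, ok);
+ *   }
+ */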
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_DoorbellCreate --
+ *
+ * Registers that a new doorbell handle has been allocated by the
+ * context. Only doorbell handles registered can be notified.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_DoorbellCreate(uint32_t contextID, // IN
+ struct vmci_handle handle) // IN
+{
+ struct vmci_context *context;
+ int result;
+
+ if (contextID == VMCI_INVALID_ID || VMCI_HANDLE_INVALID(handle)) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ context = VMCIContext_Get(contextID);
+ if (context == NULL) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ spin_lock(&context->lock);
+ if (!VMCIHandleArray_HasEntry(context->doorbellArray, handle)) {
+ VMCIHandleArray_AppendEntry(&context->doorbellArray, handle);
+ result = VMCI_SUCCESS;
+ } else {
+ result = VMCI_ERROR_DUPLICATE_ENTRY;
+ }
+ spin_unlock(&context->lock);
+
+ VMCIContext_Release(context);
+
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_DoorbellDestroy --
+ *
+ * Unregisters a doorbell handle that was previously registered
+ * with VMCIContext_DoorbellCreate.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_DoorbellDestroy(uint32_t contextID, // IN
+ struct vmci_handle handle) // IN
+{
+ struct vmci_context *context;
+ struct vmci_handle removedHandle;
+
+ if (contextID == VMCI_INVALID_ID || VMCI_HANDLE_INVALID(handle)) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ context = VMCIContext_Get(contextID);
+ if (context == NULL) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ spin_lock(&context->lock);
+ removedHandle =
+ VMCIHandleArray_RemoveEntry(context->doorbellArray, handle);
+ VMCIHandleArray_RemoveEntry(context->pendingDoorbellArray, handle);
+ spin_unlock(&context->lock);
+
+ VMCIContext_Release(context);
+
+ if (VMCI_HANDLE_INVALID(removedHandle)) {
+ return VMCI_ERROR_NOT_FOUND;
+ } else {
+ return VMCI_SUCCESS;
+ }
+}
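+
+/*
+ * Doorbell registration lifecycle in brief (editor's sketch, not
+ * part of this patch):
+ *
+ *   VMCIContext_DoorbellCreate(contextID, handle);   (register)
+ *   ... handle may now be notified ...
+ *   VMCIContext_DoorbellDestroy(contextID, handle);  (unregister)
+ */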
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_DoorbellDestroyAll --
+ *
+ * Unregisters all doorbell handles that were previously
+ * registered with VMCIContext_DoorbellCreate.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_DoorbellDestroyAll(uint32_t contextID) // IN
+{
+ struct vmci_context *context;
+ struct vmci_handle removedHandle;
+
+ if (contextID == VMCI_INVALID_ID) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ context = VMCIContext_Get(contextID);
+ if (context == NULL) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ spin_lock(&context->lock);
+ do {
+ removedHandle =
+ VMCIHandleArray_RemoveTail(context->doorbellArray);
+ } while (!VMCI_HANDLE_INVALID(removedHandle));
+ do {
+ removedHandle =
+ VMCIHandleArray_RemoveTail(context->pendingDoorbellArray);
+ } while (!VMCI_HANDLE_INVALID(removedHandle));
+ spin_unlock(&context->lock);
+
+ VMCIContext_Release(context);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_NotifyDoorbell --
+ *
+ * Registers a notification of a doorbell handle initiated by the
+ * specified source context. The notification of doorbells are
+ * subject to the same isolation rules as datagram delivery. To
+ * allow host side senders of notifications a finer granularity
+ * of sender rights than those assigned to the sending context
+ * itself, the host context is required to specify a different
+ * set of privilege flags that will override the privileges of
+ * the source context.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_NotifyDoorbell(uint32_t srcCID, // IN
+ struct vmci_handle handle, // IN
+ uint32_t srcPrivFlags) // IN
+{
+ struct vmci_context *dstContext;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle)) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ /* Get the target VM's VMCI context. */
+ dstContext = VMCIContext_Get(handle.context);
+ if (dstContext == NULL) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX "Invalid context (ID=0x%x).\n",
+ handle.context));
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ if (srcCID != handle.context) {
+ uint32_t dstPrivFlags;
+
+ if (VMCI_CONTEXT_IS_VM(srcCID)
+ && VMCI_CONTEXT_IS_VM(handle.context)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Doorbell notification from VM to VM not "
+ "supported (src=0x%x, dst=0x%x).\n",
+ srcCID, handle.context));
+ result = VMCI_ERROR_DST_UNREACHABLE;
+ goto out;
+ }
+
+ result = VMCIDoorbellGetPrivFlags(handle, &dstPrivFlags);
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to get privilege flags for destination "
+ "(handle=0x%x:0x%x).\n",
+ handle.context, handle.resource));
+ goto out;
+ }
+
+ if (srcCID != VMCI_HOST_CONTEXT_ID ||
+ srcPrivFlags == VMCI_NO_PRIVILEGE_FLAGS) {
+ srcPrivFlags = VMCIContext_GetPrivFlags(srcCID);
+ }
+
+ if (VMCIDenyInteraction(srcPrivFlags, dstPrivFlags)) {
+ result = VMCI_ERROR_NO_ACCESS;
+ goto out;
+ }
+ }
+
+ if (handle.context == VMCI_HOST_CONTEXT_ID) {
+ result = VMCIDoorbellHostContextNotify(srcCID, handle);
+ } else {
+ spin_lock(&dstContext->lock);
+
+ if (!VMCIHandleArray_HasEntry
+ (dstContext->doorbellArray, handle)) {
+ result = VMCI_ERROR_NOT_FOUND;
+ } else {
+ if (!VMCIHandleArray_HasEntry
+ (dstContext->pendingDoorbellArray, handle)) {
+ VMCIHandleArray_AppendEntry
+ (&dstContext->pendingDoorbellArray, handle);
+
+ VMCIContextSignalNotify(dstContext);
+ wake_up(&dstContext->hostContext.waitQueue);
+
+ }
+ result = VMCI_SUCCESS;
+ }
+ spin_unlock(&dstContext->lock);
+ }
+
+ out:
+ VMCIContext_Release(dstContext);
+
+ return result;
+}
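+
+/*
+ * Illustrative host-side sketch (not part of this patch): a host
+ * sender that passes VMCI_NO_PRIVILEGE_FLAGS lets the code above
+ * substitute the source context's own privilege flags:
+ *
+ *   int rv = VMCIContext_NotifyDoorbell(VMCI_HOST_CONTEXT_ID, handle,
+ *                                       VMCI_NO_PRIVILEGE_FLAGS);
+ */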
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_ContextID2HostVmID --
+ *
+ * Maps a context ID to the host specific (process/world) ID
+ * of the VM/VMX.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCI_ContextID2HostVmID(uint32_t contextID, // IN
+ void *hostVmID, // OUT
+ size_t hostVmIDLen) // IN
+{
+ return VMCI_ERROR_UNAVAILABLE;
+}
+
+EXPORT_SYMBOL(VMCI_ContextID2HostVmID);
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_IsContextOwner --
+ *
+ * Determines whether a given host OS specific representation of
+ * user is the owner of the VM/VMX.
+ *
+ * Results:
+ * VMCI_SUCCESS if the hostUser is owner, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCI_IsContextOwner(uint32_t contextID, // IN
+ void *hostUser) // IN
+{
+ if (VMCI_HostPersonalityActive()) {
+ struct vmci_context *context;
+ uid_t *user = (uid_t *) hostUser;
+ int retval;
+
+ if (!hostUser) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ context = VMCIContext_Get(contextID);
+ if (!context) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ if (context->validUser) {
+ retval = VMCIHost_CompareUser(user, &context->user);
+ } else {
+ retval = VMCI_ERROR_UNAVAILABLE;
+ }
+ VMCIContext_Release(context);
+
+ return retval;
+ }
+ return VMCI_ERROR_UNAVAILABLE;
+}
+
+EXPORT_SYMBOL(VMCI_IsContextOwner);
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_SupportsHostQP --
+ *
+ * Can host QPs be connected to this user process? The answer is
+ * false unless a sufficient version number has previously been set
+ * by this caller.
+ *
+ * Results:
+ * true if context supports host queue pairs, false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+bool VMCIContext_SupportsHostQP(struct vmci_context * context) // IN: Context structure
+{
+ if (!context || context->userVersion < VMCI_VERSION_HOSTQP) {
+ return false;
+ }
+ return true;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_QueuePairCreate --
+ *
+ * Registers that a new queue pair handle has been allocated by
+ * the context.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_QueuePairCreate(struct vmci_context *context, // IN: Context structure
+ struct vmci_handle handle) // IN
+{
+ int result;
+
+ if (context == NULL || VMCI_HANDLE_INVALID(handle)) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ spin_lock(&context->lock);
+ if (!VMCIHandleArray_HasEntry(context->queuePairArray, handle)) {
+ VMCIHandleArray_AppendEntry(&context->queuePairArray, handle);
+ result = VMCI_SUCCESS;
+ } else {
+ result = VMCI_ERROR_DUPLICATE_ENTRY;
+ }
+ spin_unlock(&context->lock);
+
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_QueuePairDestroy --
+ *
+ * Unregisters a queue pair handle that was previously registered
+ * with VMCIContext_QueuePairCreate.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIContext_QueuePairDestroy(struct vmci_context *context, // IN: Context structure
+ struct vmci_handle handle) // IN
+{
+ struct vmci_handle removedHandle;
+
+ if (context == NULL || VMCI_HANDLE_INVALID(handle)) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ spin_lock(&context->lock);
+ removedHandle =
+ VMCIHandleArray_RemoveEntry(context->queuePairArray, handle);
+ spin_unlock(&context->lock);
+
+ if (VMCI_HANDLE_INVALID(removedHandle)) {
+ return VMCI_ERROR_NOT_FOUND;
+ } else {
+ return VMCI_SUCCESS;
+ }
+}
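+
+/*
+ * Queue pair bookkeeping in brief (editor's sketch, not part of this
+ * patch): the queue pair broker registers a handle at attach time
+ * and unregisters it at detach:
+ *
+ *   VMCIContext_QueuePairCreate(context, handle);
+ *   ...
+ *   VMCIContext_QueuePairDestroy(context, handle);
+ */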
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIContext_QueuePairExists --
+ *
+ * Determines whether a given queue pair handle is registered
+ * with the given context.
+ *
+ * Results:
+ * true, if queue pair is registered with context. false, otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+bool VMCIContext_QueuePairExists(struct vmci_context *context, // IN: Context structure
+ struct vmci_handle handle) // IN
+{
+ bool result;
+
+ if (context == NULL || VMCI_HANDLE_INVALID(handle)) {
+ /* This function returns bool; an error code here would read as true. */
+ return false;
+ }
+
+ spin_lock(&context->lock);
+ result = VMCIHandleArray_HasEntry(context->queuePairArray, handle);
+ spin_unlock(&context->lock);
+
+ return result;
+}
diff --git a/drivers/misc/vmw_vmci/vmciContext.h b/drivers/misc/vmw_vmci/vmciContext.h
new file mode 100644
index 0000000..d6f7388
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciContext.h
@@ -0,0 +1,77 @@
+/*
+ *
+ * VMware VMCI driver (vmciContext.h)
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_CONTEXT_H_
+#define _VMCI_CONTEXT_H_
+
+#include "vmci_defs.h"
+#include "vmci_handle_array.h"
+#include "vmci_infrastructure.h"
+#include "vmci_kernel_if.h"
+#include "vmciCommonInt.h"
+
+#define MAX_QUEUED_GUESTCALLS_PER_VM 100
+
+int VMCIContext_Init(void);
+int VMCIContext_InitContext(uint32_t cid, uint32_t flags,
+ uintptr_t eventHnd, int version,
+ uid_t * user, struct vmci_context **context);
+
+bool VMCIContext_SupportsHostQP(struct vmci_context *context);
+void VMCIContext_ReleaseContext(struct vmci_context *context);
+int VMCIContext_EnqueueDatagram(uint32_t cid, struct vmci_datagram *dg);
+int VMCIContext_DequeueDatagram(struct vmci_context *context,
+ size_t * maxSize, struct vmci_datagram **dg);
+int VMCIContext_PendingDatagrams(uint32_t cid, uint32_t * pending);
+struct vmci_context *VMCIContext_Get(uint32_t cid);
+void VMCIContext_Release(struct vmci_context *context);
+bool VMCIContext_Exists(uint32_t cid);
+
+uint32_t VMCIContext_GetId(struct vmci_context *context);
+int VMCIContext_AddNotification(uint32_t contextID, uint32_t remoteCID);
+int VMCIContext_RemoveNotification(uint32_t contextID, uint32_t remoteCID);
+int VMCIContext_GetCheckpointState(uint32_t contextID, uint32_t cptType,
+ uint32_t * numCIDs, char **cptBufPtr);
+int VMCIContext_SetCheckpointState(uint32_t contextID, uint32_t cptType,
+ uint32_t numCIDs, char *cptBuf);
+
+int VMCIContext_QueuePairCreate(struct vmci_context *context,
+ struct vmci_handle handle);
+int VMCIContext_QueuePairDestroy(struct vmci_context *context,
+ struct vmci_handle handle);
+bool VMCIContext_QueuePairExists(struct vmci_context *context,
+ struct vmci_handle handle);
+
+void VMCIContext_CheckAndSignalNotify(struct vmci_context *context);
+void VMCIUnsetNotify(struct vmci_context *context);
+
+int VMCIContext_DoorbellCreate(uint32_t contextID, struct vmci_handle handle);
+int VMCIContext_DoorbellDestroy(uint32_t contextID, struct vmci_handle handle);
+int VMCIContext_DoorbellDestroyAll(uint32_t contextID);
+int VMCIContext_NotifyDoorbell(uint32_t cid, struct vmci_handle handle,
+ uint32_t srcPrivFlags);
+
+int VMCIContext_ReceiveNotificationsGet(uint32_t contextID, struct vmci_handle_arr
+ **dbHandleArray, struct vmci_handle_arr
+ **qpHandleArray);
+void VMCIContext_ReceiveNotificationsRelease(uint32_t contextID, struct vmci_handle_arr
+ *dbHandleArray, struct vmci_handle_arr
+ *qpHandleArray, bool success);
+#endif // _VMCI_CONTEXT_H_
--
1.7.0.4
[PATCH 02/14] Add vmciDatagram.*
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciDatagram.c | 842 ++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciDatagram.h | 42 ++
2 files changed, 884 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciDatagram.c
create mode 100644 drivers/misc/vmw_vmci/vmciDatagram.h
diff --git a/drivers/misc/vmw_vmci/vmciDatagram.c b/drivers/misc/vmw_vmci/vmciDatagram.c
new file mode 100644
index 0000000..e8a95de
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciDatagram.c
@@ -0,0 +1,842 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+
+#include "vmci_defs.h"
+#include "vmci_infrastructure.h"
+#include "vmci_kernel_if.h"
+#include "vmciCommonInt.h"
+#include "vmciContext.h"
+#include "vmciDatagram.h"
+#include "vmciDriver.h"
+#include "vmciEvent.h"
+#include "vmciHashtable.h"
+#include "vmciKernelAPI.h"
+#include "vmciResource.h"
+#include "vmciRoute.h"
+
+#define LGPFX "VMCIDatagram: "
+
+/*
+ * struct datagram_entry describes the datagram entity. It is used for datagram
+ * entities created only on the host.
+ */
+struct datagram_entry {
+ struct vmci_resource resource;
+ uint32_t flags;
+ bool runDelayed;
+ VMCIDatagramRecvCB recvCB;
+ void *clientData;
+ wait_queue_head_t destroyEvent;
+ uint32_t privFlags;
+};
+
+struct delayed_datagram_info {
+ bool inDGHostQueue;
+ struct datagram_entry *entry;
+ struct vmci_datagram msg;
+};
+
+static atomic_t delayedDGHostQueueSize;
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * DatagramFreeCB --
+ * Callback to free the datagram structure when the resource is no longer
+ * used, i.e. the reference count has reached 0.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static void DatagramFreeCB(void *clientData)
+{
+ struct datagram_entry *entry = (struct datagram_entry *)clientData;
+ ASSERT(entry);
+ /* Entry is freed in VMCIDatagram_DestroyHnd, which waits for the signal */
+ wake_up(&entry->destroyEvent);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * DatagramReleaseCB --
+ *
+ * Callback to release the resource reference. It is called by the
+ * VMCI_WaitOnEvent function before it blocks.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int DatagramReleaseCB(void *clientData)
+{
+ struct datagram_entry *entry = (struct datagram_entry *)clientData;
+ ASSERT(entry);
+ VMCIResource_Release(&entry->resource);
+ return 0;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * DatagramCreateHnd --
+ *
+ * Internal function to create a datagram entry given a handle.
+ *
+ * Results:
+ * VMCI_SUCCESS if created, negative VMCI error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int DatagramCreateHnd(uint32_t resourceID, // IN:
+ uint32_t flags, // IN:
+ uint32_t privFlags, // IN:
+ VMCIDatagramRecvCB recvCB, // IN:
+ void *clientData, // IN:
+ struct vmci_handle *outHandle) // OUT:
+{
+ int result;
+ uint32_t contextID;
+ struct vmci_handle handle;
+ struct datagram_entry *entry;
+
+ ASSERT(recvCB != NULL);
+ ASSERT(outHandle != NULL);
+ ASSERT(!(privFlags & ~VMCI_PRIVILEGE_ALL_FLAGS));
+
+ if ((flags & VMCI_FLAG_WELLKNOWN_DG_HND) != 0) {
+ return VMCI_ERROR_INVALID_ARGS;
+ } else {
+ if ((flags & VMCI_FLAG_ANYCID_DG_HND) != 0) {
+ contextID = VMCI_INVALID_ID;
+ } else {
+ contextID = VMCI_GetContextID();
+ if (contextID == VMCI_INVALID_ID) {
+ return VMCI_ERROR_NO_RESOURCES;
+ }
+ }
+
+ if (resourceID == VMCI_INVALID_ID) {
+ resourceID = VMCIResource_GetID(contextID);
+ if (resourceID == VMCI_INVALID_ID) {
+ return VMCI_ERROR_NO_HANDLE;
+ }
+ }
+
+ handle = VMCI_MAKE_HANDLE(contextID, resourceID);
+ }
+
+ entry = kmalloc(sizeof *entry, GFP_KERNEL);
+ if (entry == NULL) {
+ VMCI_WARNING((LGPFX
+ "Failed allocating memory for datagram entry.\n"));
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ entry->runDelayed = (flags & VMCI_FLAG_DG_DELAYED_CB) ? true : false;
+ entry->flags = flags;
+ entry->recvCB = recvCB;
+ entry->clientData = clientData;
+ init_waitqueue_head(&entry->destroyEvent);
+ entry->privFlags = privFlags;
+
+ /* Make datagram resource live. */
+ result =
+ VMCIResource_Add(&entry->resource, VMCI_RESOURCE_TYPE_DATAGRAM,
+ handle, DatagramFreeCB, entry);
+ if (result != VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to add new resource (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource));
+ kfree(entry);
+ return result;
+ }
+ *outHandle = handle;
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagram_Init --
+ *
+ * Initialize the Datagram API, i.e. register the API functions with their
+ * corresponding vectors.
+ *
+ * Result:
+ * VMCI_SUCCESS.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDatagram_Init(void)
+{
+ atomic_set(&delayedDGHostQueueSize, 0);
+ return VMCI_SUCCESS;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagram_CreateHnd --
+ *
+ * Creates a host context datagram endpoint and returns a handle to it.
+ *
+ * Results:
+ * VMCI_SUCCESS if created, negative VMCI error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDatagram_CreateHnd(uint32_t resourceID, // IN: Optional, generated
+ // if VMCI_INVALID_ID
+ uint32_t flags, // IN:
+ VMCIDatagramRecvCB recvCB, // IN:
+ void *clientData, // IN:
+ struct vmci_handle *outHandle) // OUT: newly created handle
+{
+ if (outHandle == NULL)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (recvCB == NULL) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Client callback needed when creating datagram.\n"));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ return DatagramCreateHnd(resourceID, flags,
+ VMCI_DEFAULT_PROC_PRIVILEGE_FLAGS, recvCB,
+ clientData, outHandle);
+}
+
+EXPORT_SYMBOL(VMCIDatagram_CreateHnd);
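
A usage sketch (illustrative names, not part of the patch): a host-side
client pairs the create and destroy calls roughly as follows.

    static int MyRecvCB(void *clientData, struct vmci_datagram *dg)
    {
        /* Invoked for each datagram delivered to this endpoint. */
        return VMCI_SUCCESS;
    }

    struct vmci_handle handle;
    int rv;

    /* Let VMCI pick the resource ID; run the callback from a worker. */
    rv = VMCIDatagram_CreateHnd(VMCI_INVALID_ID, VMCI_FLAG_DG_DELAYED_CB,
                                MyRecvCB, NULL, &handle);
    if (rv == VMCI_SUCCESS) {
        /* ... handle is now the local endpoint address ... */
        VMCIDatagram_DestroyHnd(handle);
    }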
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagram_CreateHndPriv --
+ *
+ * Creates a host context datagram endpoint and returns a handle to it.
+ * Unlike VMCIDatagram_CreateHnd(), the caller also supplies the privilege
+ * flags to associate with the endpoint.
+ *
+ * Results:
+ * VMCI_SUCCESS if created, a negative VMCI error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDatagram_CreateHndPriv(uint32_t resourceID, // IN: Optional, generated
+ // if VMCI_INVALID_ID
+ uint32_t flags, // IN:
+ uint32_t privFlags, // IN:
+ VMCIDatagramRecvCB recvCB, // IN:
+ void *clientData, // IN:
+ struct vmci_handle *outHandle) // OUT: newly created handle
+{
+ if (outHandle == NULL) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (recvCB == NULL) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Client callback needed when creating datagram.\n"));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (privFlags & ~VMCI_PRIVILEGE_ALL_FLAGS) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ return DatagramCreateHnd(resourceID, flags, privFlags, recvCB,
+ clientData, outHandle);
+}
+
+EXPORT_SYMBOL(VMCIDatagram_CreateHndPriv);
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagram_DestroyHnd --
+ *
+ * Destroys a datagram endpoint previously created with
+ * VMCIDatagram_CreateHnd() or VMCIDatagram_CreateHndPriv().
+ *
+ * Results:
+ * VMCI_SUCCESS on success, VMCI_ERROR_NOT_FOUND if the handle is unknown.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDatagram_DestroyHnd(struct vmci_handle handle) // IN
+{
+ struct datagram_entry *entry;
+ struct vmci_resource *resource = VMCIResource_Get(handle,
+ VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (resource == NULL) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Failed to destroy datagram (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource));
+ return VMCI_ERROR_NOT_FOUND;
+ }
+ entry = RESOURCE_CONTAINER(resource, struct datagram_entry, resource);
+
+ VMCIResource_Remove(handle, VMCI_RESOURCE_TYPE_DATAGRAM);
+
+ /*
+ * We now wait on the destroyEvent and release the reference we got
+ * above.
+ */
+ VMCI_WaitOnEvent(&entry->destroyEvent, DatagramReleaseCB, entry);
+
+ /*
+ * We know that we are now the only reference to the above entry so
+ * can safely free it.
+ */
+ kfree(entry);
+
+ return VMCI_SUCCESS;
+}
+
+EXPORT_SYMBOL(VMCIDatagram_DestroyHnd);
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagramGetPrivFlagsInt --
+ *
+ * Internal utility function with the same purpose as
+ * VMCIDatagram_GetPrivFlags, but which also takes a contextID.
+ *
+ * Result:
+ * VMCI_SUCCESS on success, VMCI_ERROR_INVALID_ARGS if handle is invalid.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int VMCIDatagramGetPrivFlagsInt(uint32_t contextID, // IN
+ struct vmci_handle handle, // IN
+ uint32_t * privFlags) // OUT
+{
+ ASSERT(privFlags);
+ ASSERT(contextID != VMCI_INVALID_ID);
+
+ if (contextID == VMCI_HOST_CONTEXT_ID) {
+ struct datagram_entry *srcEntry;
+ struct vmci_resource *resource;
+
+ resource =
+ VMCIResource_Get(handle, VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (resource == NULL) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+ srcEntry =
+ RESOURCE_CONTAINER(resource, struct datagram_entry,
+ resource);
+ *privFlags = srcEntry->privFlags;
+ VMCIResource_Release(resource);
+ } else if (contextID == VMCI_HYPERVISOR_CONTEXT_ID) {
+ *privFlags = VMCI_MAX_PRIVILEGE_FLAGS;
+ } else {
+ *privFlags = VMCIContext_GetPrivFlags(contextID);
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagram_GetPrivFlags --
+ *
+ * Utility function that retrieves the privilege flags
+ * associated with a given datagram handle. For hypervisor and
+ * guest endpoints, the privileges are determined by the context
+ * ID, but for host endpoints privileges are associated with the
+ * complete handle.
+ *
+ * Result:
+ * VMCI_SUCCESS on success, VMCI_ERROR_INVALID_ARGS if handle is invalid.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDatagram_GetPrivFlags(struct vmci_handle handle, // IN
+ uint32_t * privFlags) // OUT
+{
+ if (privFlags == NULL || handle.context == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ return VMCIDatagramGetPrivFlagsInt(handle.context, handle, privFlags);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIDatagramDelayedDispatchCB --
+ *
+ * Calls the specified callback in a delayed context.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void VMCIDatagramDelayedDispatchCB(void *data) // IN
+{
+ bool inDGHostQueue;
+ struct delayed_datagram_info *dgInfo =
+ (struct delayed_datagram_info *)data;
+
+ ASSERT(data);
+
+ dgInfo->entry->recvCB(dgInfo->entry->clientData, &dgInfo->msg);
+
+ VMCIResource_Release(&dgInfo->entry->resource);
+
+ inDGHostQueue = dgInfo->inDGHostQueue;
+ kfree(dgInfo);
+
+ if (inDGHostQueue)
+ atomic_dec(&delayedDGHostQueueSize);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagramDispatchAsHost --
+ *
+ * Dispatch datagram as a host, to the host or another VM context. This
+ * function cannot dispatch to hypervisor context handlers; that case
+ * should have been handled before we get here, in VMCIDatagram_Dispatch.
+ *
+ * Result:
+ * Number of bytes sent on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int VMCIDatagramDispatchAsHost(uint32_t contextID, // IN:
+ struct vmci_datagram *dg) // IN:
+{
+ int retval;
+ size_t dgSize;
+ uint32_t srcPrivFlags;
+
+ ASSERT(dg);
+ ASSERT(VMCI_HostPersonalityActive());
+
+ dgSize = VMCI_DG_SIZE(dg);
+
+ if (contextID == VMCI_HOST_CONTEXT_ID &&
+ dg->dst.context == VMCI_HYPERVISOR_CONTEXT_ID) {
+ return VMCI_ERROR_DST_UNREACHABLE;
+ }
+
+ ASSERT(dg->dst.context != VMCI_HYPERVISOR_CONTEXT_ID);
+
+ /* Check that source handle matches sending context. */
+ if (dg->src.context != contextID) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Sender context (ID=0x%x) is not owner of src "
+ "datagram entry (handle=0x%x:0x%x).\n",
+ contextID, dg->src.context, dg->src.resource));
+ return VMCI_ERROR_NO_ACCESS;
+ }
+
+ /* Get hold of privileges of sending endpoint. */
+ retval = VMCIDatagramGetPrivFlagsInt(contextID, dg->src, &srcPrivFlags);
+ if (retval != VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Couldn't get privileges (handle=0x%x:0x%x).\n",
+ dg->src.context, dg->src.resource));
+ return retval;
+ }
+
+ /* Determine if we should route to host or guest destination. */
+ if (dg->dst.context == VMCI_HOST_CONTEXT_ID) {
+ /* Route to host datagram entry. */
+ struct datagram_entry *dstEntry;
+ struct vmci_resource *resource;
+
+ if (dg->src.context == VMCI_HYPERVISOR_CONTEXT_ID &&
+ dg->dst.resource == VMCI_EVENT_HANDLER) {
+ return VMCIEvent_Dispatch(dg);
+ }
+
+ resource =
+ VMCIResource_Get(dg->dst, VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (resource == NULL) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Sending to invalid destination "
+ "(handle=0x%x:0x%x).\n",
+ dg->dst.context, dg->dst.resource));
+ return VMCI_ERROR_INVALID_RESOURCE;
+ }
+ dstEntry =
+ RESOURCE_CONTAINER(resource, struct datagram_entry,
+ resource);
+ if (VMCIDenyInteraction(srcPrivFlags, dstEntry->privFlags)) {
+ VMCIResource_Release(resource);
+ return VMCI_ERROR_NO_ACCESS;
+ }
+ ASSERT(dstEntry->recvCB);
+
+ /*
+ * If a VMCI datagram destined for the host is also sent by the
+ * host, we always run it delayed. This ensures that no locks
+ * are held when the datagram callback runs.
+ */
+
+ if (dstEntry->runDelayed
+ || dg->src.context == VMCI_HOST_CONTEXT_ID) {
+ struct delayed_datagram_info *dgInfo;
+
+ if (atomic_add_return(1, &delayedDGHostQueueSize)
+ == VMCI_MAX_DELAYED_DG_HOST_QUEUE_SIZE) {
+ atomic_dec(&delayedDGHostQueueSize);
+ VMCIResource_Release(resource);
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ dgInfo =
+ kmalloc(sizeof *dgInfo +
+ (size_t) dg->payloadSize, GFP_ATOMIC);
+ if (NULL == dgInfo) {
+ atomic_dec(&delayedDGHostQueueSize);
+ VMCIResource_Release(resource);
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ dgInfo->inDGHostQueue = true;
+ dgInfo->entry = dstEntry;
+ memcpy(&dgInfo->msg, dg, dgSize);
+
+ retval =
+ VMCI_ScheduleDelayedWork
+ (VMCIDatagramDelayedDispatchCB, dgInfo);
+ if (retval < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to schedule delayed work for datagram "
+ "(result=%d).\n", retval));
+ kfree(dgInfo);
+ VMCIResource_Release(resource);
+ atomic_dec(&delayedDGHostQueueSize);
+ return retval;
+ }
+ } else {
+ retval = dstEntry->recvCB(dstEntry->clientData, dg);
+ VMCIResource_Release(resource);
+ if (retval < VMCI_SUCCESS) {
+ return retval;
+ }
+ }
+ } else {
+ /* Route to destination VM context. */
+ struct vmci_datagram *newDG;
+
+ if (contextID != dg->dst.context) {
+ if (VMCIDenyInteraction(srcPrivFlags,
+ VMCIContext_GetPrivFlags
+ (dg->dst.context))) {
+ return VMCI_ERROR_NO_ACCESS;
+ } else if (VMCI_CONTEXT_IS_VM(contextID)) {
+ /* If the sending context is a VM, it cannot reach another VM. */
+
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Datagram communication between VMs not"
+ "supported (src=0x%x, dst=0x%x).\n",
+ contextID, dg->dst.context));
+ return VMCI_ERROR_DST_UNREACHABLE;
+ }
+ }
+
+ /* We make a copy to enqueue. */
+ newDG = kmalloc(dgSize, GFP_KERNEL);
+ if (newDG == NULL) {
+ return VMCI_ERROR_NO_MEM;
+ }
+ memcpy(newDG, dg, dgSize);
+ retval = VMCIContext_EnqueueDatagram(dg->dst.context, newDG);
+ if (retval < VMCI_SUCCESS) {
+ kfree(newDG);
+ return retval;
+ }
+ }
+
+ /*
+ * We currently truncate the size to signed 32 bits. This doesn't
+ * matter for this handler as it only supports 4KB messages.
+ */
+ return (int)dgSize;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagramDispatchAsGuest --
+ *
+ * Dispatch datagram as a guest, down through the VMX and potentially to
+ * the host.
+ *
+ * Result:
+ * Number of bytes sent on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int VMCIDatagramDispatchAsGuest(struct vmci_datagram *dg)
+{
+ int retval;
+ struct vmci_resource *resource;
+
+ resource = VMCIResource_Get(dg->src, VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (NULL == resource) {
+ return VMCI_ERROR_NO_HANDLE;
+ }
+
+ retval = VMCI_SendDatagram(dg);
+ VMCIResource_Release(resource);
+ return retval;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagram_Dispatch --
+ *
+ * Dispatch datagram. This will determine the routing for the datagram
+ * and dispatch it accordingly.
+ *
+ * Result:
+ * Number of bytes sent on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int
+VMCIDatagram_Dispatch(uint32_t contextID,
+ struct vmci_datagram *dg, bool fromGuest)
+{
+ int retval;
+ enum vmci_route route;
+
+ ASSERT(dg);
+ ASSERT_ON_COMPILE(sizeof(struct vmci_datagram) == 24);
+
+ if (VMCI_DG_SIZE(dg) > VMCI_MAX_DG_SIZE) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX "Payload (size=%" FMT64
+ "u bytes) too big to " "send.\n",
+ dg->payloadSize));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ retval = VMCI_Route(&dg->src, &dg->dst, fromGuest, &route);
+ if (retval < VMCI_SUCCESS) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Failed to route datagram (src=0x%x, dst=0x%x, "
+ "err=%d)\n.", dg->src.context,
+ dg->dst.context, retval));
+ return retval;
+ }
+
+ if (VMCI_ROUTE_AS_HOST == route) {
+ if (VMCI_INVALID_ID == contextID)
+ contextID = VMCI_HOST_CONTEXT_ID;
+ return VMCIDatagramDispatchAsHost(contextID, dg);
+ }
+
+ if (VMCI_ROUTE_AS_GUEST == route)
+ return VMCIDatagramDispatchAsGuest(dg);
+
+ VMCI_WARNING((LGPFX "Unknown route (%d) for datagram.\n", route));
+ return VMCI_ERROR_DST_UNREACHABLE;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagram_InvokeGuestHandler --
+ *
+ * Invoke the handler for the given datagram. This is intended to be
+ * called only when acting as a guest and receiving a datagram from the
+ * virtual device.
+ *
+ * Result:
+ * VMCI_SUCCESS on success, other error values on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDatagram_InvokeGuestHandler(struct vmci_datagram *dg) // IN
+{
+ int retval;
+ struct vmci_resource *resource;
+ struct datagram_entry *dstEntry;
+
+ ASSERT(dg);
+
+ resource = VMCIResource_Get(dg->dst, VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (NULL == resource) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "destination (handle=0x%x:0x%x) doesn't exist.\n",
+ dg->dst.context, dg->dst.resource));
+ return VMCI_ERROR_NO_HANDLE;
+ }
+
+ dstEntry =
+ RESOURCE_CONTAINER(resource, struct datagram_entry, resource);
+ if (dstEntry->runDelayed) {
+ struct delayed_datagram_info *dgInfo;
+
+ dgInfo =
+ kmalloc(sizeof *dgInfo + (size_t) dg->payloadSize,
+ GFP_ATOMIC);
+ if (NULL == dgInfo) {
+ VMCIResource_Release(resource);
+ retval = VMCI_ERROR_NO_MEM;
+ goto exit;
+ }
+
+ dgInfo->inDGHostQueue = false;
+ dgInfo->entry = dstEntry;
+ memcpy(&dgInfo->msg, dg, VMCI_DG_SIZE(dg));
+
+ retval =
+ VMCI_ScheduleDelayedWork(VMCIDatagramDelayedDispatchCB,
+ dgInfo);
+ if (retval < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to schedule delayed work for datagram "
+ "(result=%d).\n", retval));
+ kfree(dgInfo);
+ VMCIResource_Release(resource);
+ goto exit;
+ }
+ } else {
+ dstEntry->recvCB(dstEntry->clientData, dg);
+ VMCIResource_Release(resource);
+ retval = VMCI_SUCCESS;
+ }
+
+ exit:
+ return retval;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDatagram_Send --
+ *
+ * Sends the payload to the destination datagram handle.
+ *
+ * Results:
+ * Returns the number of bytes sent on success, or an error code on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDatagram_Send(struct vmci_datagram *msg) // IN
+{
+ if (msg == NULL)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ return VMCIDatagram_Dispatch(VMCI_INVALID_ID, msg, false);
+}
+
+EXPORT_SYMBOL(VMCIDatagram_Send);
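
For illustration (names like remoteCID and remoteRID are assumptions, not
part of the patch), a caller composes the 24-byte header checked by the
ASSERT_ON_COMPILE in VMCIDatagram_Dispatch, followed by its payload:

    struct {
        struct vmci_datagram hdr;
        char payload[8];
    } msg;

    msg.hdr.dst = VMCI_MAKE_HANDLE(remoteCID, remoteRID);
    msg.hdr.src = handle;              /* from VMCIDatagram_CreateHnd() */
    msg.hdr.payloadSize = sizeof msg.payload;
    memcpy(msg.payload, "ping", sizeof "ping");

    rv = VMCIDatagram_Send(&msg.hdr);  /* rv >= 0 is bytes sent */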
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIDatagram_Sync --
+ *
+ * Use this as a synchronization point when setting globals, for example,
+ * during device shutdown.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIDatagram_Sync(void)
+{
+ VMCIResource_Sync();
+}
diff --git a/drivers/misc/vmw_vmci/vmciDatagram.h b/drivers/misc/vmw_vmci/vmciDatagram.h
new file mode 100644
index 0000000..9ae003e
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciDatagram.h
@@ -0,0 +1,42 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_DATAGRAM_H_
+#define _VMCI_DATAGRAM_H_
+
+#include "vmci_call_defs.h"
+#include "vmci_iocontrols.h"
+#include "vmciContext.h"
+
+#define VMCI_MAX_DELAYED_DG_HOST_QUEUE_SIZE 256
+
+/* Init functions. */
+int VMCIDatagram_Init(void);
+
+/* Datagram API for non-public use. */
+int VMCIDatagram_Dispatch(uint32_t contextID, struct vmci_datagram *dg,
+ bool fromGuest);
+int VMCIDatagram_InvokeGuestHandler(struct vmci_datagram *dg);
+int VMCIDatagram_GetPrivFlags(struct vmci_handle handle, uint32_t * privFlags);
+
+/* Misc. */
+void VMCIDatagram_Sync(void);
+
+#endif // _VMCI_DATAGRAM_H_
--
1.7.0.4
* [PATCH 03/14] Add vmciDoorbell.*
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 01/14] Add vmciContext.* Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 02/14] Add vmciDatagram.* Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 04/14] Add vmciDriver.* Andrew Stiegmann (stieg)
` (11 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciDoorbell.c | 1072 ++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciDoorbell.h | 37 ++
2 files changed, 1109 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciDoorbell.c
create mode 100644 drivers/misc/vmw_vmci/vmciDoorbell.h
diff --git a/drivers/misc/vmw_vmci/vmciDoorbell.c b/drivers/misc/vmw_vmci/vmciDoorbell.c
new file mode 100644
index 0000000..e14e7f8
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciDoorbell.c
@@ -0,0 +1,1072 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+
+#include "vmci_defs.h"
+#include "vmci_infrastructure.h"
+#include "vmci_kernel_if.h"
+#include "vmciCommonInt.h"
+#include "vmciDatagram.h"
+#include "vmciDoorbell.h"
+#include "vmciDriver.h"
+#include "vmciKernelAPI.h"
+#include "vmciResource.h"
+#include "vmciRoute.h"
+
+#define LGPFX "VMCIDoorbell: "
+
+#define VMCI_DOORBELL_INDEX_TABLE_SIZE 64
+#define VMCI_DOORBELL_HASH(_idx) \
+ VMCI_HashId((_idx), VMCI_DOORBELL_INDEX_TABLE_SIZE)
+
+/*
+ * struct doorbell_entry describes a doorbell notification handle allocated
+ * by the host.
+ */
+
+struct doorbell_entry {
+ struct vmci_resource resource;
+ uint32_t idx;
+ struct list_head idxListItem;
+ uint32_t privFlags;
+ bool isDoorbell;
+ bool runDelayed;
+ VMCICallback notifyCB;
+ void *clientData;
+ wait_queue_head_t destroyEvent;
+ atomic_t active; /* Only used by guest personality */
+};
+
+struct doorbell_index_table {
+ spinlock_t lock;
+ struct list_head entries[VMCI_DOORBELL_INDEX_TABLE_SIZE];
+};
+
+/* The VMCI index table keeps track of currently registered doorbells. */
+static struct doorbell_index_table vmciDoorbellIT;
+
+/*
+ * The maxNotifyIdx is one larger than the currently known bitmap index in
+ * use, and is used to determine how much of the bitmap needs to be scanned.
+ */
+static uint32_t maxNotifyIdx;
+
+/*
+ * The notifyIdxCount is used for determining whether there are free entries
+ * within the bitmap (if notifyIdxCount + 1 < maxNotifyIdx).
+ */
+static uint32_t notifyIdxCount;
+
+/*
+ * The lastNotifyIdxReserved is used to track the last index handed out - in
+ * the case where multiple handles share a notification index, we hand out
+ * indexes round robin based on lastNotifyIdxReserved.
+ */
+static uint32_t lastNotifyIdxReserved;
+
+/*
+ * This is a one-entry cache used by the index allocation. PAGE_SIZE is out
+ * of range for a notification index and doubles as the "empty" sentinel.
+ */
+static uint32_t lastNotifyIdxReleased = PAGE_SIZE;
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbell_Init --
+ *
+ * Initializes the doorbell index table.
+ *
+ * Result:
+ * VMCI_SUCCESS.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDoorbell_Init(void)
+{
+ uint32_t bucket;
+
+ for (bucket = 0; bucket < ARRAY_SIZE(vmciDoorbellIT.entries); ++bucket)
+ INIT_LIST_HEAD(&vmciDoorbellIT.entries[bucket]);
+
+ spin_lock_init(&vmciDoorbellIT.lock);
+ return VMCI_SUCCESS;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellFreeCB --
+ *
+ * Callback to free the doorbell entry structure when the resource is no
+ * longer used, i.e. the reference count reached 0. The entry is freed in
+ * VMCIDoorbell_Destroy(), which is waiting on the signal that gets fired
+ * here.
+ *
+ * Result:
+ * None.
+ *
+ * Side effects:
+ * Signals VMCI event.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static void VMCIDoorbellFreeCB(void *clientData) // IN
+{
+ struct doorbell_entry *entry = (struct doorbell_entry *)clientData;
+ ASSERT(entry);
+ wake_up(&entry->destroyEvent);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellReleaseCB --
+ *
+ * Callback to release the resource reference. It is called by the
+ * VMCI_WaitOnEvent function before it blocks.
+ *
+ * Result:
+ * Always 0.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int VMCIDoorbellReleaseCB(void *clientData) // IN: doorbell entry
+{
+ struct doorbell_entry *entry = (struct doorbell_entry *)clientData;
+ ASSERT(entry);
+ VMCIResource_Release(&entry->resource);
+ return 0;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellGetPrivFlags --
+ *
+ * Utility function that retrieves the privilege flags associated
+ * with a given doorbell handle. For guest endpoints, the
+ * privileges are determined by the context ID, but for host
+ * endpoints privileges are associated with the complete
+ * handle. Hypervisor endpoints are not yet supported.
+ *
+ * Result:
+ * VMCI_SUCCESS on success,
+ * VMCI_ERROR_NOT_FOUND if handle isn't found,
+ * VMCI_ERROR_INVALID_ARGS if handle is invalid.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDoorbellGetPrivFlags(struct vmci_handle handle, // IN
+ uint32_t * privFlags) // OUT
+{
+ if (privFlags == NULL || handle.context == VMCI_INVALID_ID) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (handle.context == VMCI_HOST_CONTEXT_ID) {
+ struct doorbell_entry *entry;
+ struct vmci_resource *resource;
+
+ resource =
+ VMCIResource_Get(handle, VMCI_RESOURCE_TYPE_DOORBELL);
+ if (resource == NULL)
+ return VMCI_ERROR_NOT_FOUND;
+
+ entry =
+ RESOURCE_CONTAINER(resource, struct doorbell_entry,
+ resource);
+ *privFlags = entry->privFlags;
+ VMCIResource_Release(resource);
+ } else if (handle.context == VMCI_HYPERVISOR_CONTEXT_ID) {
+ /* Hypervisor endpoints for notifications are not supported (yet). */
+ return VMCI_ERROR_INVALID_ARGS;
+ } else {
+ *privFlags = VMCIContext_GetPrivFlags(handle.context);
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIDoorbellIndexTableFind --
+ *
+ * Find doorbell entry by bitmap index.
+ *
+ * Results:
+ * Entry if found, NULL if not.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static struct doorbell_entry *VMCIDoorbellIndexTableFind(uint32_t idx) // IN
+{
+ uint32_t bucket = VMCI_DOORBELL_HASH(idx);
+ struct list_head *iter;
+
+ ASSERT(VMCI_GuestPersonalityActive());
+
+ list_for_each(iter, &vmciDoorbellIT.entries[bucket]) {
+ struct doorbell_entry *cur =
+ list_entry(iter, struct doorbell_entry, idxListItem);
+
+ ASSERT(cur);
+
+ if (idx == cur->idx)
+ return cur;
+ }
+
+ return NULL;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellIndexTableAdd --
+ *
+ * Add the given entry to the index table. This will hold() the entry's
+ * resource so that the entry is not deleted before it is removed from the
+ * table.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static void VMCIDoorbellIndexTableAdd(struct doorbell_entry *entry) // IN/OUT
+{
+ uint32_t bucket;
+ uint32_t newNotifyIdx;
+
+ ASSERT(entry);
+ ASSERT(VMCI_GuestPersonalityActive());
+
+ VMCIResource_Hold(&entry->resource);
+
+ spin_lock_bh(&vmciDoorbellIT.lock);
+
+ /*
+ * Below we try to allocate an index in the notification bitmap with "not
+ * too much" sharing between resources. If we use less that the full bitmap,
+ * we either add to the end if there are no unused flags within the
+ * currently used area, or we search for unused ones. If we use the full
+ * bitmap, we allocate the index round robin.
+ */
+
+ if (maxNotifyIdx < PAGE_SIZE || notifyIdxCount < PAGE_SIZE) {
+ if (lastNotifyIdxReleased < maxNotifyIdx &&
+ !VMCIDoorbellIndexTableFind(lastNotifyIdxReleased)) {
+ newNotifyIdx = lastNotifyIdxReleased;
+ lastNotifyIdxReleased = PAGE_SIZE;
+ } else {
+ bool reused = false;
+ newNotifyIdx = lastNotifyIdxReserved;
+ if (notifyIdxCount + 1 < maxNotifyIdx) {
+ do {
+ if (!VMCIDoorbellIndexTableFind
+ (newNotifyIdx)) {
+ reused = true;
+ break;
+ }
+ newNotifyIdx =
+ (newNotifyIdx + 1) % maxNotifyIdx;
+ } while (newNotifyIdx != lastNotifyIdxReleased);
+ }
+ if (!reused) {
+ newNotifyIdx = maxNotifyIdx;
+ maxNotifyIdx++;
+ }
+ }
+ } else {
+ newNotifyIdx = (lastNotifyIdxReserved + 1) % PAGE_SIZE;
+ }
+
+ lastNotifyIdxReserved = newNotifyIdx;
+ notifyIdxCount++;
+
+ entry->idx = newNotifyIdx;
+ bucket = VMCI_DOORBELL_HASH(entry->idx);
+ list_add(&entry->idxListItem, &vmciDoorbellIT.entries[bucket]);
+
+ spin_unlock_bh(&vmciDoorbellIT.lock);
+}
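
Stripped of locking and list manipulation, the allocation policy above
reduces to the sketch below (the full-page round-robin branch omitted;
this is a paraphrase for review, not part of the patch):

    static uint32_t AllocNotifyIdx(void)
    {
        uint32_t idx;
        bool reused = false;

        if (lastNotifyIdxReleased < maxNotifyIdx &&
            !VMCIDoorbellIndexTableFind(lastNotifyIdxReleased)) {
            /* 1) Reuse the cached, most recently released index. */
            idx = lastNotifyIdxReleased;
            lastNotifyIdxReleased = PAGE_SIZE;
        } else {
            idx = lastNotifyIdxReserved;
            if (notifyIdxCount + 1 < maxNotifyIdx) {
                /* 2) Free slots exist below the max: probe for one. */
                do {
                    if (!VMCIDoorbellIndexTableFind(idx)) {
                        reused = true;
                        break;
                    }
                    idx = (idx + 1) % maxNotifyIdx;
                } while (idx != lastNotifyIdxReleased);
            }
            if (!reused)
                idx = maxNotifyIdx++;  /* 3) Grow the used area. */
        }
        lastNotifyIdxReserved = idx;
        notifyIdxCount++;
        return idx;
    }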
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellIndexTableRemove --
+ *
+ * Remove the given entry from the index table. This will release() the
+ * entry's resource.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static void VMCIDoorbellIndexTableRemove(struct doorbell_entry *entry) // IN/OUT
+{
+ ASSERT(entry);
+ ASSERT(VMCI_GuestPersonalityActive());
+
+ spin_lock_bh(&vmciDoorbellIT.lock);
+
+ list_del(&entry->idxListItem);
+
+ notifyIdxCount--;
+ if (entry->idx == maxNotifyIdx - 1) {
+ /*
+ * If we delete an entry with the maximum known notification index, we
+ * take the opportunity to prune the current max. As there might be other
+ * unused indices immediately below, we lower the maximum until we hit an
+ * index in use.
+ */
+
+ while (maxNotifyIdx > 0 &&
+ !VMCIDoorbellIndexTableFind(maxNotifyIdx - 1)) {
+ maxNotifyIdx--;
+ }
+ }
+
+ lastNotifyIdxReleased = entry->idx;
+
+ spin_unlock_bh(&vmciDoorbellIT.lock);
+
+ VMCIResource_Release(&entry->resource);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellLink --
+ *
+ * Creates a link between the given doorbell handle and the given
+ * index in the bitmap in the device backend.
+ *
+ * Results:
+ * VMCI_SUCCESS if success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * Notification state is created in hypervisor.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int VMCIDoorbellLink(struct vmci_handle handle, // IN
+ bool isDoorbell, // IN
+ uint32_t notifyIdx) // IN
+{
+ uint32_t resourceID;
+ struct vmci_doorbell_link_msg linkMsg;
+
+ ASSERT(!VMCI_HANDLE_INVALID(handle));
+ ASSERT(VMCI_GuestPersonalityActive());
+
+ if (isDoorbell) {
+ resourceID = VMCI_DOORBELL_LINK;
+ } else {
+ /* XXX: Why would isDoorbell be false? */
+ ASSERT(false);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ linkMsg.hdr.dst =
+ VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID, resourceID);
+ linkMsg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ linkMsg.hdr.payloadSize = sizeof linkMsg - VMCI_DG_HEADERSIZE;
+ linkMsg.handle = handle;
+ linkMsg.notifyIdx = notifyIdx;
+
+ return VMCI_SendDatagram((struct vmci_datagram *)&linkMsg);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellUnlink --
+ *
+ * Unlinks the given doorbell handle from an index in the bitmap in
+ * the device backend.
+ *
+ * Results:
+ * VMCI_SUCCESS if success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * Notification state is destroyed in hypervisor.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int VMCIDoorbellUnlink(struct vmci_handle handle, // IN
+ bool isDoorbell) // IN
+{
+ uint32_t resourceID;
+ struct vmci_doorbell_unlink_msg unlinkMsg;
+
+ ASSERT(!VMCI_HANDLE_INVALID(handle));
+ ASSERT(VMCI_GuestPersonalityActive());
+
+ if (isDoorbell) {
+ resourceID = VMCI_DOORBELL_UNLINK;
+ } else {
+ /* XXX: Why would isDoorbell be false? */
+ ASSERT(false);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ unlinkMsg.hdr.dst =
+ VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID, resourceID);
+ unlinkMsg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ unlinkMsg.hdr.payloadSize = sizeof unlinkMsg - VMCI_DG_HEADERSIZE;
+ unlinkMsg.handle = handle;
+
+ return VMCI_SendDatagram((struct vmci_datagram *)&unlinkMsg);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbell_Create --
+ *
+ * Creates a doorbell with the given callback. If the handle is
+ * VMCI_INVALID_HANDLE, a free handle will be assigned, if
+ * possible. The callback can be run immediately (potentially with
+ * locks held - the default) or delayed (in a kernel thread) by
+ * specifying the flag VMCI_FLAG_DELAYED_CB. If delayed execution
+ * is selected, a given callback may not be run if the kernel is
+ * unable to allocate memory for the delayed execution (highly
+ * unlikely).
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDoorbell_Create(struct vmci_handle *handle, // IN/OUT
+ uint32_t flags, // IN
+ uint32_t privFlags, // IN
+ VMCICallback notifyCB, // IN
+ void *clientData) // IN
+{
+ struct doorbell_entry *entry;
+ struct vmci_handle newHandle;
+ int result;
+
+ if (!handle || !notifyCB || flags & ~VMCI_FLAG_DELAYED_CB ||
+ privFlags & ~VMCI_PRIVILEGE_ALL_FLAGS)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ entry = kmalloc(sizeof *entry, GFP_KERNEL);
+ if (entry == NULL) {
+ VMCI_WARNING((LGPFX
+ "Failed allocating memory for datagram entry.\n"));
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ if (VMCI_HANDLE_INVALID(*handle)) {
+ uint32_t contextID = VMCI_GetContextID();
+ uint32_t resourceID = VMCIResource_GetID(contextID);
+ if (resourceID == VMCI_INVALID_ID) {
+ result = VMCI_ERROR_NO_HANDLE;
+ goto freeMem;
+ }
+ newHandle = VMCI_MAKE_HANDLE(contextID, resourceID);
+ } else {
+ bool validContext = false;
+
+ /*
+ * Validate the handle. We must do both of the checks below
+ * because we can be acting as both a host and a guest at the
+ * same time. We always allow the host context ID, since the
+ * host functionality is in practice always there with the
+ * unified driver.
+ */
+
+ if (VMCI_HOST_CONTEXT_ID == handle->context ||
+ (VMCI_GuestPersonalityActive() &&
+ VMCI_GetContextID() == handle->context))
+ validContext = true;
+
+ if (!validContext || VMCI_INVALID_ID == handle->resource) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Invalid argument (handle=0x%x:0x%x).\n",
+ handle->context, handle->resource));
+ result = VMCI_ERROR_INVALID_ARGS;
+ goto freeMem;
+ }
+
+ newHandle = *handle;
+ }
+
+ entry->idx = 0;
+ INIT_LIST_HEAD(&entry->idxListItem);
+ entry->privFlags = privFlags;
+ entry->isDoorbell = true;
+ entry->runDelayed = (flags & VMCI_FLAG_DELAYED_CB) ? true : false;
+ entry->notifyCB = notifyCB;
+ entry->clientData = clientData;
+ atomic_set(&entry->active, 0);
+ init_waitqueue_head(&entry->destroyEvent);
+
+ result =
+ VMCIResource_Add(&entry->resource, VMCI_RESOURCE_TYPE_DOORBELL,
+ newHandle, VMCIDoorbellFreeCB, entry);
+ if (result != VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to add new resource (handle=0x%x:0x%x).\n",
+ newHandle.context, newHandle.resource));
+ if (result == VMCI_ERROR_DUPLICATE_ENTRY) {
+ result = VMCI_ERROR_ALREADY_EXISTS;
+ }
+ goto freeMem;
+ }
+
+ if (VMCI_GuestPersonalityActive()) {
+ VMCIDoorbellIndexTableAdd(entry);
+ result =
+ VMCIDoorbellLink(newHandle, entry->isDoorbell, entry->idx);
+ if (VMCI_SUCCESS != result)
+ goto destroyResource;
+
+ atomic_set(&entry->active, 1);
+ }
+
+ if (VMCI_HANDLE_INVALID(*handle)) {
+ *handle = newHandle;
+ }
+
+ return result;
+
+ destroyResource:
+ VMCIDoorbellIndexTableRemove(entry);
+ VMCIResource_Remove(newHandle, VMCI_RESOURCE_TYPE_DOORBELL);
+ freeMem:
+ kfree(entry);
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIDoorbell_Create);
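
A hedged usage sketch (callback and names are illustrative): registering
a doorbell with a delayed callback and letting VMCI pick the handle.

    static void MyDoorbellCB(void *clientData)
    {
        /* Runs from a worker thread on each notification. */
    }

    struct vmci_handle handle;
    int rv;

    handle = VMCI_INVALID_HANDLE;  /* ask for a generated handle */
    rv = VMCIDoorbell_Create(&handle, VMCI_FLAG_DELAYED_CB,
                             VMCI_DEFAULT_PROC_PRIVILEGE_FLAGS,
                             MyDoorbellCB, NULL);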
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbell_Destroy --
+ *
+ * Destroys a doorbell previously created with
+ * VMCIDoorbell_Create. This operation may block waiting for a
+ * callback to finish.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * May block.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDoorbell_Destroy(struct vmci_handle handle) // IN
+{
+ struct doorbell_entry *entry;
+ struct vmci_resource *resource;
+
+ if (VMCI_HANDLE_INVALID(handle))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ resource = VMCIResource_Get(handle, VMCI_RESOURCE_TYPE_DOORBELL);
+ if (resource == NULL) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Failed to destroy doorbell (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource));
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ entry = RESOURCE_CONTAINER(resource, struct doorbell_entry, resource);
+
+ if (VMCI_GuestPersonalityActive()) {
+ int result;
+
+ VMCIDoorbellIndexTableRemove(entry);
+
+ result = VMCIDoorbellUnlink(handle, entry->isDoorbell);
+ if (VMCI_SUCCESS != result) {
+
+ /*
+ * The only reason this should fail would be an inconsistency between
+ * guest and hypervisor state, where the guest believes it has an
+ * active registration whereas the hypervisor doesn't. One case where
+ * this may happen is if a doorbell is unregistered following a
+ * hibernation at a time where the doorbell state hasn't been restored
+ * on the hypervisor side yet. Since the handle has now been removed
+ * in the guest, we just print a warning and return success.
+ */
+
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Unlink of %s (handle=0x%x:0x%x) unknown by "
+ "hypervisor (error=%d).\n",
+ entry->isDoorbell ? "doorbell" :
+ "queuepair", handle.context,
+ handle.resource, result));
+ }
+ }
+
+ /*
+ * Now remove the resource from the table. It might still be in use
+ * after this, in a callback or still on the delayed work queue.
+ */
+ VMCIResource_Remove(handle, VMCI_RESOURCE_TYPE_DOORBELL);
+
+ /*
+ * We now wait on the destroyEvent and release the reference we got
+ * above.
+ */
+ VMCI_WaitOnEvent(&entry->destroyEvent, VMCIDoorbellReleaseCB, entry);
+
+ /*
+ * We know that we are now the only reference to the above entry so
+ * can safely free it.
+ */
+ kfree(entry);
+
+ return VMCI_SUCCESS;
+}
+
+EXPORT_SYMBOL(VMCIDoorbell_Destroy);
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellNotifyAsGuest --
+ *
+ * Notify another guest or the host. We send a datagram down to the
+ * host via the hypervisor with the notification info.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * May do a hypercall.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int VMCIDoorbellNotifyAsGuest(struct vmci_handle handle, // IN
+ uint32_t privFlags) // IN
+{
+ struct vmci_doorbell_ntfy_msg notifyMsg;
+
+ ASSERT(VMCI_GuestPersonalityActive());
+
+ notifyMsg.hdr.dst = VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_DOORBELL_NOTIFY);
+ notifyMsg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ notifyMsg.hdr.payloadSize = sizeof notifyMsg - VMCI_DG_HEADERSIZE;
+ notifyMsg.handle = handle;
+
+ return VMCI_SendDatagram((struct vmci_datagram *)¬ifyMsg);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbell_Notify --
+ *
+ * Generates a notification on the doorbell identified by the
+ * handle. For host side generation of notifications, the caller
+ * can specify what the privilege of the calling side is.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * May do a hypercall.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDoorbell_Notify(struct vmci_handle dst, // IN
+ uint32_t privFlags) // IN
+{
+ int retval;
+ enum vmci_route route;
+ struct vmci_handle src;
+
+ if (VMCI_HANDLE_INVALID(dst)
+ || (privFlags & ~VMCI_PRIVILEGE_ALL_FLAGS))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ src = VMCI_INVALID_HANDLE;
+ retval = VMCI_Route(&src, &dst, false, &route);
+ if (retval < VMCI_SUCCESS)
+ return retval;
+
+ if (VMCI_ROUTE_AS_HOST == route)
+ return VMCIContext_NotifyDoorbell(VMCI_HOST_CONTEXT_ID,
+ dst, privFlags);
+
+ if (VMCI_ROUTE_AS_GUEST == route)
+ return VMCIDoorbellNotifyAsGuest(dst, privFlags);
+
+ VMCI_WARNING((LGPFX "Unknown route (%d) for doorbell.\n", route));
+ return VMCI_ERROR_DST_UNREACHABLE;
+}
+
+EXPORT_SYMBOL(VMCIDoorbell_Notify);
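
The notifying side then needs only the handle; VMCI_Route decides whether
this becomes a hypercall or a host-context dispatch. Illustrative only:

    rv = VMCIDoorbell_Notify(handle, VMCI_DEFAULT_PROC_PRIVILEGE_FLAGS);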
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellDelayedDispatchCB --
+ *
+ * Calls the specified callback in a delayed context.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static void VMCIDoorbellDelayedDispatchCB(void *data) // IN
+{
+ struct doorbell_entry *entry = (struct doorbell_entry *)data;
+
+ ASSERT(data);
+
+ entry->notifyCB(entry->clientData);
+
+ VMCIResource_Release(&entry->resource);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbellHostContextNotify --
+ *
+ * Dispatches a doorbell notification to the host context.
+ *
+ * Results:
+ * VMCI_SUCCESS on success. Appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIDoorbellHostContextNotify(uint32_t srcCID, // IN
+ struct vmci_handle handle) // IN
+{
+ struct doorbell_entry *entry;
+ struct vmci_resource *resource;
+ int result;
+
+ ASSERT(VMCI_HostPersonalityActive());
+
+ if (VMCI_HANDLE_INVALID(handle)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Notifying an invalid doorbell (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ resource = VMCIResource_Get(handle, VMCI_RESOURCE_TYPE_DOORBELL);
+ if (resource == NULL) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Notifying an unknown doorbell (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource));
+ return VMCI_ERROR_NOT_FOUND;
+ }
+ entry = RESOURCE_CONTAINER(resource, struct doorbell_entry, resource);
+
+ if (entry->runDelayed) {
+ result =
+ VMCI_ScheduleDelayedWork(VMCIDoorbellDelayedDispatchCB,
+ entry);
+ if (result < VMCI_SUCCESS) {
+ /*
+ * If we failed to schedule the delayed work, we need to
+ * release the resource immediately. Otherwise, the resource
+ * will be released once the delayed callback has been
+ * completed.
+ */
+
+ VMCI_DEBUG_LOG(10,
+ (LGPFX
+ "Failed to schedule delayed doorbell "
+ "notification (result=%d).\n", result));
+ VMCIResource_Release(resource);
+ }
+ } else {
+ entry->notifyCB(entry->clientData);
+ VMCIResource_Release(resource);
+ result = VMCI_SUCCESS;
+ }
+ return result;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbell_Hibernate --
+ *
+ * When a guest leaves hibernation, the device driver state is out of sync
+ * with the device state, since the driver state has doorbells registered
+ * that aren't known to the device. This function takes care of
+ * reregistering any doorbells. Should an error occur during
+ * reregistration (highly unlikely, since (1) it succeeded the first time
+ * and (2) the device driver is the only source of doorbell registrations),
+ * we simply log the error. The doorbell can still be destroyed using
+ * VMCIDoorbell_Destroy.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+void VMCIDoorbell_Hibernate(bool enterHibernate)
+{
+ uint32_t bucket;
+ struct list_head *iter;
+
+ if (!VMCI_GuestPersonalityActive() || enterHibernate)
+ return;
+
+ spin_lock_bh(&vmciDoorbellIT.lock);
+
+ for (bucket = 0; bucket < ARRAY_SIZE(vmciDoorbellIT.entries); bucket++) {
+ list_for_each(iter, &vmciDoorbellIT.entries[bucket]) {
+ int result;
+ struct vmci_handle h;
+ struct doorbell_entry *cur;
+
+ cur =
+ list_entry(iter, struct doorbell_entry,
+ idxListItem);
+ h = VMCIResource_Handle(&cur->resource);
+ result = VMCIDoorbellLink(h, cur->isDoorbell, cur->idx);
+ if (result != VMCI_SUCCESS
+ && result != VMCI_ERROR_DUPLICATE_ENTRY) {
+ VMCI_WARNING((LGPFX
+ "Failed to reregister doorbell "
+ "(handle=0x%x:0x%x) of resource %s to index "
+ "(error=%d).\n", h.context,
+ h.resource,
+ cur->isDoorbell ? "doorbell" :
+ "queue pair", result));
+ }
+ }
+ }
+
+ spin_unlock_bh(&vmciDoorbellIT.lock);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDoorbell_Sync --
+ *
+ * Use this as a synchronization point when setting globals, for example,
+ * during device shutdown. Taking and immediately dropping the index table
+ * lock guarantees that no doorbell dispatch still running under that lock
+ * can observe the old values after this function returns.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+void VMCIDoorbell_Sync(void)
+{
+ spin_lock_bh(&vmciDoorbellIT.lock);
+ spin_unlock_bh(&vmciDoorbellIT.lock);
+ VMCIResource_Sync();
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCI_RegisterNotificationBitmap --
+ *
+ * Register the notification bitmap with the host.
+ *
+ * Results:
+ * true if the bitmap is registered successfully with the device, false
+ * otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+bool VMCI_RegisterNotificationBitmap(uint32_t bitmapPPN)
+{
+ int result;
+ struct vmci_ntfy_bm_set_msg bitmapSetMsg;
+
+ /*
+ * Do not ASSERT() on the guest device here. This function can get called
+ * during device initialization, so the ASSERT() will fail even though
+ * the device is (almost) up.
+ */
+
+ bitmapSetMsg.hdr.dst = VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_SET_NOTIFY_BITMAP);
+ bitmapSetMsg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ bitmapSetMsg.hdr.payloadSize = sizeof bitmapSetMsg - VMCI_DG_HEADERSIZE;
+ bitmapSetMsg.bitmapPPN = bitmapPPN;
+
+ result = VMCI_SendDatagram((struct vmci_datagram *)&bitmapSetMsg);
+ if (result != VMCI_SUCCESS) {
+ VMCI_DEBUG_LOG(4, (LGPFX "Failed to register (PPN=%u) as "
+ "notification bitmap (error=%d).\n",
+ bitmapPPN, result));
+ return false;
+ }
+ return true;
+}
+
+/*
+ *-------------------------------------------------------------------------
+ *
+ * VMCIDoorbellFireEntries --
+ *
+ * Executes or schedules the handlers for a given notify index.
+ *
+ * Result:
+ * None.
+ *
+ * Side effects:
+ * Whatever the side effects of the handlers are.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+static void VMCIDoorbellFireEntries(uint32_t notifyIdx) // IN
+{
+ uint32_t bucket = VMCI_DOORBELL_HASH(notifyIdx);
+ struct list_head *iter;
+
+ ASSERT(VMCI_GuestPersonalityActive());
+
+ spin_lock_bh(&vmciDoorbellIT.lock);
+
+ list_for_each(iter, &vmciDoorbellIT.entries[bucket]) {
+ struct doorbell_entry *cur =
+ list_entry(iter, struct doorbell_entry, idxListItem);
+
+ ASSERT(cur);
+
+ if (cur->idx == notifyIdx && atomic_read(&cur->active) == 1) {
+ ASSERT(cur->notifyCB);
+ if (cur->runDelayed) {
+ int err;
+
+ VMCIResource_Hold(&cur->resource);
+ err =
+ VMCI_ScheduleDelayedWork
+ (VMCIDoorbellDelayedDispatchCB, cur);
+ if (err != VMCI_SUCCESS) {
+ VMCIResource_Release(&cur->resource);
+ goto out;
+ }
+ } else {
+ cur->notifyCB(cur->clientData);
+ }
+ }
+ }
+
+ out:
+ spin_unlock_bh(&vmciDoorbellIT.lock);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCI_ScanNotificationBitmap --
+ *
+ * Scans the notification bitmap, collects pending notifications,
+ * resets the bitmap and invokes appropriate callbacks.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * May schedule tasks, allocate memory and run callbacks.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+void VMCI_ScanNotificationBitmap(uint8_t * bitmap)
+{
+ uint32_t idx;
+
+ ASSERT(bitmap);
+ ASSERT(VMCI_GuestPersonalityActive());
+
+ for (idx = 0; idx < maxNotifyIdx; idx++) {
+ if (bitmap[idx] & 0x1) {
+ bitmap[idx] &= ~1;
+ VMCIDoorbellFireEntries(idx);
+ }
+ }
+}
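
Note that despite the name, each notification index occupies a whole byte
of the shared page, and only bit 0 of that byte is used so far. A guest
delivery path would hand the shared page straight to the scanner; the
variable name here is hypothetical:

    /* From the device interrupt path, once notifications are signaled: */
    VMCI_ScanNotificationBitmap(notificationBitmapPage);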
diff --git a/drivers/misc/vmw_vmci/vmciDoorbell.h b/drivers/misc/vmw_vmci/vmciDoorbell.h
new file mode 100644
index 0000000..a039261
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciDoorbell.h
@@ -0,0 +1,37 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef VMCI_DOORBELL_H
+#define VMCI_DOORBELL_H
+
+#include "vmci_defs.h"
+#include "vmci_kernel_if.h"
+
+int VMCIDoorbell_Init(void);
+void VMCIDoorbell_Hibernate(bool enterHibernate);
+void VMCIDoorbell_Sync(void);
+
+int VMCIDoorbellHostContextNotify(uint32_t srcCID, struct vmci_handle handle);
+int VMCIDoorbellGetPrivFlags(struct vmci_handle handle, uint32_t * privFlags);
+
+bool VMCI_RegisterNotificationBitmap(uint32_t bitmapPPN);
+void VMCI_ScanNotificationBitmap(uint8_t * bitmap);
+
+#endif // VMCI_DOORBELL_H
--
1.7.0.4
* [PATCH 04/14] Add vmciDriver.*
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (2 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 03/14] Add vmciDoorbell.* Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 05/14] Add vmciEvent.* Andrew Stiegmann (stieg)
` (10 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciDriver.c | 663 ++++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciDriver.h | 57 +++
2 files changed, 720 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciDriver.c
create mode 100644 drivers/misc/vmw_vmci/vmciDriver.h
diff --git a/drivers/misc/vmw_vmci/vmciDriver.c b/drivers/misc/vmw_vmci/vmciDriver.c
new file mode 100644
index 0000000..e121d6d
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciDriver.c
@@ -0,0 +1,663 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include "vmci_defs.h"
+#include "vmci_infrastructure.h"
+#include "vmci_kernel_if.h"
+#include "vmciCommonInt.h"
+#include "vmciContext.h"
+#include "vmciDatagram.h"
+#include "vmciDoorbell.h"
+#include "vmciDriver.h"
+#include "vmciEvent.h"
+#include "vmciHashtable.h"
+#include "vmciKernelAPI.h"
+#include "vmciQueuePair.h"
+#include "vmciResource.h"
+
+#define LGPFX "VMCI: "
+#define VMCI_UTIL_NUM_RESOURCES 1
+
+static uint32_t ctxUpdateSubID = VMCI_INVALID_ID;
+static struct vmci_context *hostContext;
+static atomic_t vmContextID = { VMCI_INVALID_ID };
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_HostInit --
+ *
+ * Initializes the host driver specific components of VMCI.
+ *
+ * Results:
+ * VMCI_SUCCESS if successful, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCI_HostInit(void)
+{
+ int result;
+
+ /*
+ * In theory, it is unsafe to pass an eventHnd of -1 to platforms which use
+ * it (VMKernel/Windows/Mac OS at the time of this writing). In practice we
+ * are fine though, because the event is never used in the case of the host
+ * context.
+ */
+ result = VMCIContext_InitContext(VMCI_HOST_CONTEXT_ID,
+ VMCI_DEFAULT_PROC_PRIVILEGE_FLAGS,
+ -1, VMCI_VERSION, NULL, &hostContext);
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to initialize VMCIContext (result=%d).\n",
+ result));
+ goto errorExit;
+ }
+
+ result = VMCIQPBroker_Init();
+ if (result < VMCI_SUCCESS) {
+ goto hostContextExit;
+ }
+
+ VMCI_LOG((LGPFX "host components initialized.\n"));
+ return VMCI_SUCCESS;
+
+ hostContextExit:
+ VMCIContext_ReleaseContext(hostContext);
+ errorExit:
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_HostCleanup --
+ *
+ * Cleans up the host specific components of the VMCI module.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+void VMCI_HostCleanup(void)
+{
+ VMCIContext_ReleaseContext(hostContext);
+ VMCIQPBroker_Exit();
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_DeviceGet --
+ *
+ * Verifies that a valid VMCI device is present, and indicates
+ * the callers intention to use the device until it calls
+ * VMCI_DeviceRelease().
+ *
+ * Results:
+ * true if a valid VMCI device is present, false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+bool VMCI_DeviceGet(uint32_t * apiVersion, // IN/OUT
+ VMCI_DeviceShutdownFn * deviceShutdownCB, // UNUSED
+ void *userData, // UNUSED
+ void **deviceRegistration) // OUT
+{
+ if (NULL != deviceRegistration) {
+ *deviceRegistration = NULL;
+ }
+
+ if (*apiVersion > VMCI_KERNEL_API_VERSION) {
+ *apiVersion = VMCI_KERNEL_API_VERSION;
+ return false;
+ }
+
+ if (!VMCI_DeviceEnabled()) {
+ return false;
+ }
+
+ return true;
+}
+
+EXPORT_SYMBOL(VMCI_DeviceGet);
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_DeviceRelease --
+ *
+ * Indicates that the caller is done using the VMCI device.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ * XXX: Remove me? Used by vsock?
+ *----------------------------------------------------------------------
+ */
+
+void VMCI_DeviceRelease(void *deviceRegistration) // UNUSED
+{
+}
+
+EXPORT_SYMBOL(VMCI_DeviceRelease);
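
A sketch of client usage, for instance by a transport built on VMCI; the
NULL arguments are fine here since the shutdown callback, user data and
registration token are unused by this implementation:

    uint32_t apiVersion = VMCI_KERNEL_API_VERSION;

    if (VMCI_DeviceGet(&apiVersion, NULL, NULL, NULL)) {
        /* Device present and API version acceptable. */
        VMCI_DeviceRelease(NULL);
    }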
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIUtilCidUpdate --
+ *
+ * Gets called with the new context id if updated or resumed.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void VMCIUtilCidUpdate(uint32_t subID, // IN:
+ struct vmci_event_data *eventData, // IN:
+ void *clientData) // IN:
+{
+ struct vmci_event_payld_ctx *evPayload =
+ VMCIEventDataPayload(eventData);
+
+ if (subID != ctxUpdateSubID) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX "Invalid subscriber (ID=0x%x).\n",
+ subID));
+ return;
+ }
+
+ if (eventData == NULL || evPayload->contextID == VMCI_INVALID_ID) {
+ VMCI_DEBUG_LOG(4, (LGPFX "Invalid event data.\n"));
+ return;
+ }
+
+ VMCI_LOG((LGPFX
+ "Updating context from (ID=0x%x) to (ID=0x%x) on event "
+ "(type=%d).\n", atomic_read(&vmContextID),
+ evPayload->contextID, eventData->event));
+ atomic_set(&vmContextID, evPayload->contextID);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIUtil_Init --
+ *
+ * Subscribe to context id update event.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIUtil_Init(void)
+{
+ /*
+ * We subscribe to the VMCI_EVENT_CTX_ID_UPDATE here so we can update the
+ * internal context id when needed.
+ */
+ if (VMCIEvent_Subscribe
+ (VMCI_EVENT_CTX_ID_UPDATE, VMCI_FLAG_EVENT_NONE,
+ VMCIUtilCidUpdate, NULL, &ctxUpdateSubID) < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to subscribe to event (type=%d).\n",
+ VMCI_EVENT_CTX_ID_UPDATE));
+ }
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIUtil_Exit --
+ *
+ * Unsubscribes from the context id update event.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIUtil_Exit(void)
+{
+ if (VMCIEvent_Unsubscribe(ctxUpdateSubID) < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to unsubscribe to event (type=%d) with "
+ "subscriber (ID=0x%x).\n",
+ VMCI_EVENT_CTX_ID_UPDATE, ctxUpdateSubID));
+ }
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIUtilCheckHostCapabilities --
+ *
+ * Verify that the host supports the hypercalls we need. If it does not,
+ * try to find fallback hypercalls and use those instead.
+ *
+ * Results:
+ * true if required hypercalls (or fallback hypercalls) are
+ * supported by the host, false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static bool VMCIUtilCheckHostCapabilities(void)
+{
+ int result;
+ struct vmci_rscs_query_msg *msg;
+ uint32_t msgSize = sizeof(struct vmci_rsrc_query_hdr) +
+ VMCI_UTIL_NUM_RESOURCES * sizeof(uint32_t);
+ struct vmci_datagram *checkMsg = kmalloc(msgSize, GFP_KERNEL);
+
+ if (checkMsg == NULL) {
+ VMCI_WARNING((LGPFX "Check host: Insufficient memory.\n"));
+ return false;
+ }
+
+ checkMsg->dst = VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_RESOURCES_QUERY);
+ checkMsg->src = VMCI_ANON_SRC_HANDLE;
+ checkMsg->payloadSize = msgSize - VMCI_DG_HEADERSIZE;
+ msg = (struct vmci_rscs_query_msg *)VMCI_DG_PAYLOAD(checkMsg);
+
+ msg->numResources = VMCI_UTIL_NUM_RESOURCES;
+ msg->resources[0] = VMCI_GET_CONTEXT_ID;
+
+ result = VMCI_SendDatagram(checkMsg);
+ kfree(checkMsg);
+
+ /*
+ * We need the vector. There are no fallbacks. The reply is a bitmask
+ * with one bit per queried resource; since only VMCI_GET_CONTEXT_ID
+ * was queried, success is exactly 0x1.
+ */
+ return (result == 0x1);
+}
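+
+/*
+ * A sketch of how a wider query would be checked, assuming the reply is a
+ * bitmask with bit i set when resources[i] is supported (the driver
+ * currently queries only VMCI_GET_CONTEXT_ID; the second resource below
+ * is purely hypothetical):
+ *
+ *   msg->numResources = 2;
+ *   msg->resources[0] = VMCI_GET_CONTEXT_ID;
+ *   msg->resources[1] = SOME_OTHER_RESOURCE;
+ *   result = VMCI_SendDatagram(checkMsg);
+ *   haveGetContextID = (result & (1 << 0)) != 0;
+ *   haveOther = (result & (1 << 1)) != 0;
+ */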
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_CheckHostCapabilities --
+ *
+ * Tell host which guestcalls we support and let each API check
+ * that the host supports the hypercalls it needs. If a hypercall
+ * is not supported, the API can check for a fallback hypercall,
+ * or fail the check.
+ *
+ * Results:
+ * true if successful, false otherwise.
+ *
+ * Side effects:
+ * Fallback mechanisms may be enabled in the API and vmmon.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+bool VMCI_CheckHostCapabilities(void)
+{
+ bool result = VMCIUtilCheckHostCapabilities();
+
+ VMCI_LOG((LGPFX "Host capability check: %s.\n",
+ result ? "PASSED" : "FAILED"));
+
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_ReadDatagramsFromPort --
+ *
+ * Reads datagrams from the data in port and dispatches them. We
+ * always start reading datagrams into only the first page of the
+ * datagram buffer. If the datagrams don't fit into one page, we
+ * use the maximum datagram buffer size for the remainder of the
+ * invocation. This is a simple heuristic for not penalizing
+ * small datagrams.
+ *
+ * This function assumes that it has exclusive access to the data
+ * in port for the duration of the call.
+ *
+ * Results:
+ * No result.
+ *
+ * Side effects:
+ * Datagram handlers may be invoked.
+ *
+ *----------------------------------------------------------------------
+ */
+
+void VMCI_ReadDatagramsFromPort(int ioHandle, // IN
+ unsigned short int dgInPort, // IN
+ uint8_t *dgInBuffer, // IN
+ size_t dgInBufferSize) // IN
+{
+ struct vmci_datagram *dg;
+ size_t currentDgInBufferSize = PAGE_SIZE;
+ size_t remainingBytes;
+
+ ASSERT(dgInBufferSize >= PAGE_SIZE);
+
+ insb(dgInPort, dgInBuffer, currentDgInBufferSize);
+ dg = (struct vmci_datagram *)dgInBuffer;
+ remainingBytes = currentDgInBufferSize;
+
+ while (dg->dst.resource != VMCI_INVALID_ID
+ || remainingBytes > PAGE_SIZE) {
+ unsigned dgInSize;
+
+ /*
+ * When the input buffer spans multiple pages, a datagram can
+ * start on any page boundary in the buffer.
+ */
+
+ if (dg->dst.resource == VMCI_INVALID_ID) {
+ ASSERT(remainingBytes > PAGE_SIZE);
+ dg = (struct vmci_datagram *)roundup((uintptr_t)
+ dg + 1, PAGE_SIZE);
+ ASSERT((uint8_t *) dg <
+ dgInBuffer + currentDgInBufferSize);
+ remainingBytes =
+ (size_t) (dgInBuffer + currentDgInBufferSize -
+ (uint8_t *) dg);
+ continue;
+ }
+
+ dgInSize = VMCI_DG_SIZE_ALIGNED(dg);
+
+ if (dgInSize <= dgInBufferSize) {
+ int result;
+
+ /*
+ * If the remaining bytes in the datagram buffer don't
+ * contain the complete datagram, we first make sure we have
+ * enough room for it and then we read the remainder of the
+ * datagram and possibly any following datagrams.
+ */
+
+ if (dgInSize > remainingBytes) {
+ if (remainingBytes != currentDgInBufferSize) {
+
+ /*
+ * We move the partial datagram to the front and read
+ * the remainder of the datagram and possibly following
+ * datagrams into the subsequent bytes.
+ */
+
+ memmove(dgInBuffer, dgInBuffer +
+ currentDgInBufferSize -
+ remainingBytes, remainingBytes);
+ dg = (struct vmci_datagram *)
+ dgInBuffer;
+ }
+
+ if (currentDgInBufferSize != dgInBufferSize)
+ currentDgInBufferSize = dgInBufferSize;
+
+ insb(dgInPort, dgInBuffer + remainingBytes,
+ currentDgInBufferSize - remainingBytes);
+ }
+
+ /* We special case event datagrams from the hypervisor. */
+ if (dg->src.context == VMCI_HYPERVISOR_CONTEXT_ID
+ && dg->dst.resource == VMCI_EVENT_HANDLER) {
+ result = VMCIEvent_Dispatch(dg);
+ } else {
+ result = VMCIDatagram_InvokeGuestHandler(dg);
+ }
+ if (result < VMCI_SUCCESS) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Datagram with resource (ID=0x%x) failed "
+ "(err=%d).\n",
+ dg->dst.resource, result));
+ }
+
+ /* On to the next datagram. */
+ dg = (struct vmci_datagram *)((uint8_t *) dg +
+ dgInSize);
+ } else {
+ size_t bytesToSkip;
+
+ /* The datagram doesn't fit in a buffer of maximal size, so we drop it. */
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Failed to receive datagram (size=%u bytes).\n",
+ dgInSize));
+
+ bytesToSkip = dgInSize - remainingBytes;
+ if (currentDgInBufferSize != dgInBufferSize)
+ currentDgInBufferSize = dgInBufferSize;
+
+ for (;;) {
+ insb(dgInPort, dgInBuffer,
+ currentDgInBufferSize);
+ if (bytesToSkip <= currentDgInBufferSize) {
+ break;
+ }
+ bytesToSkip -= currentDgInBufferSize;
+ }
+ dg = (struct vmci_datagram *)(dgInBuffer + bytesToSkip);
+ }
+
+ remainingBytes =
+ (size_t) (dgInBuffer + currentDgInBufferSize -
+ (uint8_t *) dg);
+
+ if (remainingBytes < VMCI_DG_HEADERSIZE) {
+ /* Get the next batch of datagrams. */
+
+ insb(dgInPort, dgInBuffer, currentDgInBufferSize);
+ dg = (struct vmci_datagram *)dgInBuffer;
+ remainingBytes = currentDgInBufferSize;
+ }
+ }
+}
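+
+/*
+ * A minimal sketch of a caller, assuming a guest interrupt handler along
+ * these lines (the device structure and field names are illustrative, not
+ * part of this patch):
+ *
+ *   static irqreturn_t VMCIInterrupt(int irq, void *devId)
+ *   {
+ *      struct vmci_device *dev = devId;
+ *
+ *      VMCI_ReadDatagramsFromPort(0, dev->ioaddr + VMCI_DATA_IN_ADDR,
+ *                                 dev->dgInBuffer, dev->dgInBufferSize);
+ *      return IRQ_HANDLED;
+ *   }
+ */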
+
+/*
+ *----------------------------------------------------------------------------
+ *
+ * VMCI_GetContextID --
+ *
+ * Returns the current context ID. When the guest personality is active,
+ * the ID is fetched from the hypervisor on first use and cached; when
+ * only the host personality is active, the well-known host context ID
+ * is returned.
+ *
+ * Results:
+ * Context ID.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+uint32_t VMCI_GetContextID(void)
+{
+ if (VMCI_GuestPersonalityActive()) {
+ if (atomic_read(&vmContextID) == VMCI_INVALID_ID) {
+ uint32_t result;
+ struct vmci_datagram getCidMsg;
+ getCidMsg.dst =
+ VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_GET_CONTEXT_ID);
+ getCidMsg.src = VMCI_ANON_SRC_HANDLE;
+ getCidMsg.payloadSize = 0;
+ result = VMCI_SendDatagram(&getCidMsg);
+ atomic_set(&vmContextID, result);
+ }
+ return atomic_read(&vmContextID);
+ } else if (VMCI_HostPersonalityActive()) {
+ return VMCI_HOST_CONTEXT_ID;
+ }
+ return VMCI_INVALID_ID;
+}
+
+EXPORT_SYMBOL(VMCI_GetContextID);
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_Version --
+ *
+ * Returns the version of the VMCI driver.
+ *
+ * Results:
+ * Returns a version number.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+uint32_t VMCI_Version(void)
+{
+ return VMCI_VERSION;
+}
+
+EXPORT_SYMBOL(VMCI_Version);
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_SharedInit --
+ *
+ * Initializes VMCI components shared between guest and host
+ * driver. This registers core hypercalls.
+ *
+ * Results:
+ * VMCI_SUCCESS if successful, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCI_SharedInit(void)
+{
+ int result;
+
+ result = VMCIResource_Init();
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to initialize VMCIResource (result=%d).\n",
+ result));
+ goto errorExit;
+ }
+
+ result = VMCIContext_Init();
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to initialize VMCIContext (result=%d).\n",
+ result));
+ goto resourceExit;
+ }
+
+ result = VMCIDatagram_Init();
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to initialize VMCIDatagram (result=%d).\n",
+ result));
+ goto resourceExit;
+ }
+
+ result = VMCIEvent_Init();
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to initialize VMCIEvent (result=%d).\n",
+ result));
+ goto resourceExit;
+ }
+
+ result = VMCIDoorbell_Init();
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to initialize VMCIDoorbell (result=%d).\n",
+ result));
+ goto eventExit;
+ }
+
+ VMCI_LOG((LGPFX "shared components initialized.\n"));
+ return VMCI_SUCCESS;
+
+ eventExit:
+ VMCIEvent_Exit();
+ resourceExit:
+ VMCIResource_Exit();
+ errorExit:
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_SharedCleanup --
+ *
+ * Cleans up VMCI components shared between guest and host
+ * driver.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+void VMCI_SharedCleanup(void)
+{
+ VMCIEvent_Exit();
+ VMCIResource_Exit();
+}
diff --git a/drivers/misc/vmw_vmci/vmciDriver.h b/drivers/misc/vmw_vmci/vmciDriver.h
new file mode 100644
index 0000000..9f38bee
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciDriver.h
@@ -0,0 +1,57 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_DRIVER_H_
+#define _VMCI_DRIVER_H_
+
+#include "vmci_defs.h"
+#include "vmci_infrastructure.h"
+#include "vmciContext.h"
+
+/*
+ * A few macros to encapsulate logging in common code. The macros
+ * result in LOG/LOGThrottled on vmkernel and Log on hosted.
+ */
+
+#define VMCI_DEBUG_LEVEL 4
+#define VMCI_DEBUG_LOG(_level, _args) \
+ do { \
+ if (_level < VMCI_DEBUG_LEVEL) { \
+ Log _args ; \
+ } \
+ } while (false)
+#define VMCI_LOG(_args) Log _args
+#define VMCI_WARNING(_args) Warning _args
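+
+/*
+ * For example, with the default VMCI_DEBUG_LEVEL of 4,
+ *
+ *   VMCI_DEBUG_LOG(2, (LGPFX "Attached (ID=0x%x).\n", cid));
+ *
+ * is emitted (2 < 4), while a level-4 call compiles in but stays silent.
+ * Note the double parentheses: the inner pair packages the printf-style
+ * argument list that is passed straight through to Log.
+ */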
+
+int VMCI_SharedInit(void);
+void VMCI_SharedCleanup(void);
+int VMCI_HostInit(void);
+void VMCI_HostCleanup(void);
+uint32_t VMCI_GetContextID(void);
+int VMCI_SendDatagram(struct vmci_datagram *dg);
+
+void VMCIUtil_Init(void);
+void VMCIUtil_Exit(void);
+bool VMCI_CheckHostCapabilities(void);
+void VMCI_ReadDatagramsFromPort(int ioHandle, unsigned short int dgInPort,
+ uint8_t *dgInBuffer, size_t dgInBufferSize);
+bool VMCI_DeviceEnabled(void);
+
+#endif // _VMCI_DRIVER_H_
--
1.7.0.4
* [PATCH 05/14] Add vmciEvent.*
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (3 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 04/14] Add vmciDriver.* Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 06/14] Add vmciHashtable.* Andrew Stiegmann (stieg)
` (9 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciEvent.c | 648 +++++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciEvent.h | 32 ++
2 files changed, 680 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciEvent.c
create mode 100644 drivers/misc/vmw_vmci/vmciEvent.h
diff --git a/drivers/misc/vmw_vmci/vmciEvent.c b/drivers/misc/vmw_vmci/vmciEvent.c
new file mode 100644
index 0000000..50b1c23
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciEvent.c
@@ -0,0 +1,648 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/sched.h>
+
+#include "vmci_defs.h"
+#include "vmci_infrastructure.h"
+#include "vmciEvent.h"
+#include "vmciKernelAPI.h"
+#include "vmciDriver.h"
+
+#define LGPFX "VMCIEvent: "
+#define EVENT_MAGIC 0xEABE0000
+#define VMCI_EVENT_MAX_ATTEMPTS 10
+
+struct vmci_subscription {
+ uint32_t id;
+ int refCount;
+ bool runDelayed;
+ wait_queue_head_t destroyEvent;
+ uint32_t event;
+ VMCI_EventCB callback;
+ void *callbackData;
+ struct list_head subscriberListItem;
+};
+
+static struct list_head subscriberArray[VMCI_EVENT_MAX];
+static spinlock_t subscriberLock;
+
+struct delayed_event_info {
+ struct vmci_subscription *sub;
+ uint8_t eventPayload[sizeof(struct vmci_event_data_max)];
+};
+
+struct event_ref {
+ struct vmci_subscription *sub;
+ struct list_head listItem;
+};
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIEvent_Init --
+ *
+ * General init code.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIEvent_Init(void)
+{
+ int i;
+
+ for (i = 0; i < VMCI_EVENT_MAX; i++)
+ INIT_LIST_HEAD(&subscriberArray[i]);
+
+ spin_lock_init(&subscriberLock);
+ return VMCI_SUCCESS;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIEvent_Exit --
+ *
+ * General exit code.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+void VMCIEvent_Exit(void)
+{
+ uint32_t e;
+
+ /* We free all memory at exit. */
+ for (e = 0; e < VMCI_EVENT_MAX; e++) {
+ struct vmci_subscription *cur, *p2;
+ list_for_each_entry_safe(cur, p2, &subscriberArray[e],
+ subscriberListItem) {
+
+ /*
+ * We should never get here because all events should have been
+ * unregistered before we try to unload the driver module.
+ * Also, delayed callbacks could still be firing so this cleanup
+ * would not be safe.
+ * Still it is better to free the memory than not ... so we
+ * leave this code in just in case....
+ */
+ pr_warning("Unexpected free events occuring in %s",
+ __func__);
+ kfree(cur);
+ }
+ }
+
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIEvent_Sync --
+ *
+ * Use this as a synchronization point when setting globals, for example,
+ * during device shutdown.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIEvent_Sync(void)
+{
+ spin_lock_bh(&subscriberLock);
+ spin_unlock_bh(&subscriberLock);
+}
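+
+/*
+ * Taking and immediately dropping the subscriber lock acts as a barrier:
+ * once VMCIEvent_Sync returns, no path that acquired the lock before the
+ * caller's updates is still inside its critical section. A sketch of the
+ * intended use, with a hypothetical flag:
+ *
+ *   deviceEnabled = false;
+ *   VMCIEvent_Sync(); // no dispatch still traversing the subscriber lists
+ */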
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIEventGet --
+ *
+ * Gets a reference to the given VMCISubscription. Assumes that the
+ * subscriber lock is held.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void VMCIEventGet(struct vmci_subscription *entry) // IN
+{
+ ASSERT(entry);
+
+ entry->refCount++;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIEventRelease --
+ *
+ * Releases the given VMCISubscription. Assumes that the subscriber
+ * lock is held.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * Fires the destroy event if the reference count has gone to zero.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void VMCIEventRelease(struct vmci_subscription *entry) // IN
+{
+ ASSERT(entry);
+ ASSERT(entry->refCount > 0);
+
+ entry->refCount--;
+ if (entry->refCount == 0)
+ wake_up(&entry->destroyEvent);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * EventReleaseCB --
+ *
+ * Callback to release the event entry reference. It is called by the
+ * VMCI_WaitOnEvent function before it blocks.
+ *
+ * Result:
+ * Zero.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int EventReleaseCB(void *clientData) // IN
+{
+ struct vmci_subscription *sub = (struct vmci_subscription *)clientData;
+
+ ASSERT(sub);
+
+ spin_lock_bh(&subscriberLock);
+ VMCIEventRelease(sub);
+ spin_unlock_bh(&subscriberLock);
+
+ return 0;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIEventFind --
+ *
+ * Find entry. Assumes lock is held.
+ *
+ * Results:
+ * Entry if found, NULL if not.
+ *
+ * Side effects:
+ * Increments the VMCISubscription refcount if an entry is found.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static struct vmci_subscription *VMCIEventFind(uint32_t subID) // IN
+{
+ uint32_t e;
+
+ for (e = 0; e < VMCI_EVENT_MAX; e++) {
+ struct vmci_subscription *cur;
+ list_for_each_entry(cur, &subscriberArray[e],
+ subscriberListItem) {
+ if (cur->id == subID) {
+ VMCIEventGet(cur);
+ return cur;
+ }
+ }
+ }
+ return NULL;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIEventDelayedDispatchCB --
+ *
+ * Calls the specified callback in a delayed context.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static void VMCIEventDelayedDispatchCB(void *data) // IN
+{
+ struct delayed_event_info *eventInfo;
+ struct vmci_subscription *sub;
+ struct vmci_event_data *ed;
+
+ eventInfo = data;
+
+ ASSERT(eventInfo);
+ ASSERT(eventInfo->sub);
+
+ sub = eventInfo->sub;
+ ed = (struct vmci_event_data *)eventInfo->eventPayload;
+
+ sub->callback(sub->id, ed, sub->callbackData);
+
+ spin_lock_bh(&subscriberLock);
+ VMCIEventRelease(sub);
+ spin_unlock_bh(&subscriberLock);
+
+ kfree(eventInfo);
+}
+
+/*
+ *----------------------------------------------------------------------------
+ *
+ * VMCIEventDeliver --
+ *
+ * Actually delivers the events to the subscribers.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * The callback function for each subscriber is invoked.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+static int VMCIEventDeliver(struct vmci_event_msg *eventMsg) // IN
+{
+ int err = VMCI_SUCCESS;
+ struct vmci_subscription *cur;
+ struct list_head noDelayList;
+ struct vmci_event_data *ed;
+ struct event_ref *eventRef, *p2;
+
+ ASSERT(eventMsg);
+
+ INIT_LIST_HEAD(&noDelayList);
+
+ spin_lock_bh(&subscriberLock);
+ list_for_each_entry(cur, &subscriberArray[eventMsg->eventData.event],
+ subscriberListItem) {
+ ASSERT(cur && cur->event == eventMsg->eventData.event);
+
+ if (cur->runDelayed) {
+ struct delayed_event_info *eventInfo;
+ eventInfo = kcalloc(1, sizeof *eventInfo, GFP_ATOMIC);
+ if (!eventInfo) {
+ err = VMCI_ERROR_NO_MEM;
+ goto out;
+ }
+
+ VMCIEventGet(cur);
+
+ /* eventInfo is already zeroed by kcalloc. */
+ memcpy(eventInfo->eventPayload,
+ VMCI_DG_PAYLOAD(eventMsg),
+ (size_t) eventMsg->hdr.payloadSize);
+ eventInfo->sub = cur;
+ err = VMCI_ScheduleDelayedWork(VMCIEventDelayedDispatchCB,
+ eventInfo);
+ if (err != VMCI_SUCCESS) {
+ VMCIEventRelease(cur);
+ kfree(eventInfo);
+ goto out;
+ }
+
+ } else {
+ struct event_ref *eventRef;
+
+ /*
+ * To avoid a possible lock rank violation when holding
+ * subscriberLock, we construct a local list of
+ * subscribers and release subscriberLock before
+ * invoking the callbacks. This is similar to delayed
+ * callbacks, but here the callbacks are invoked right away.
+ */
+ if ((eventRef =
+ kcalloc(1, sizeof *eventRef, GFP_ATOMIC)) == NULL) {
+ err = VMCI_ERROR_NO_MEM;
+ goto out;
+ }
+
+ VMCIEventGet(cur);
+ eventRef->sub = cur;
+ INIT_LIST_HEAD(&eventRef->listItem);
+ list_add(&eventRef->listItem, &noDelayList);
+ }
+ }
+
+ out:
+ spin_unlock_bh(&subscriberLock);
+
+ list_for_each_entry_safe(eventRef, p2, &noDelayList, listItem) {
+ struct vmci_subscription *cur = eventRef->sub;
+ uint8_t eventPayload[sizeof(struct vmci_event_data_max)];
+
+ /* We set event data before each callback to ensure isolation. */
+ memset(eventPayload, 0, sizeof eventPayload);
+ memcpy(eventPayload, VMCI_DG_PAYLOAD(eventMsg),
+ (size_t) eventMsg->hdr.payloadSize);
+ ed = (struct vmci_event_data *)eventPayload;
+ cur->callback(cur->id, ed, cur->callbackData);
+
+ spin_lock_bh(&subscriberLock);
+ VMCIEventRelease(cur);
+ spin_unlock_bh(&subscriberLock);
+ kfree(eventRef);
+ }
+
+ return err;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIEvent_Dispatch --
+ *
+ * Dispatcher for the VMCI_EVENT_RECEIVE datagrams. Calls all
+ * subscribers for given event.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIEvent_Dispatch(struct vmci_datagram *msg) // IN
+{
+ struct vmci_event_msg *eventMsg = (struct vmci_event_msg *)msg;
+
+ ASSERT(msg &&
+ msg->src.context == VMCI_HYPERVISOR_CONTEXT_ID &&
+ msg->dst.resource == VMCI_EVENT_HANDLER);
+
+ if (msg->payloadSize < sizeof(uint32_t) ||
+ msg->payloadSize > sizeof(struct vmci_event_data_max)) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (!VMCI_EVENT_VALID(eventMsg->eventData.event)) {
+ return VMCI_ERROR_EVENT_UNKNOWN;
+ }
+
+ VMCIEventDeliver(eventMsg);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIEventRegisterSubscription --
+ *
+ * Initialize and add subscription to subscriber list.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static int VMCIEventRegisterSubscription(struct vmci_subscription *sub, // IN
+ uint32_t event, // IN
+ uint32_t flags, // IN
+ VMCI_EventCB callback, // IN
+ void *callbackData) // IN
+{
+ static uint32_t subscriptionID = 0;
+ uint32_t attempts = 0;
+ int result;
+ bool success;
+
+ ASSERT(sub);
+
+ if (!VMCI_EVENT_VALID(event) || callback == NULL) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Failed to subscribe to event (type=%d) "
+ "(callback=%p) (data=%p).\n", event,
+ callback, callbackData));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ sub->runDelayed = (flags & VMCI_FLAG_EVENT_DELAYED_CB) ? true : false;
+ sub->refCount = 1;
+ sub->event = event;
+ sub->callback = callback;
+ sub->callbackData = callbackData;
+ INIT_LIST_HEAD(&sub->subscriberListItem);
+
+ spin_lock_bh(&subscriberLock);
+
+ /* Creation of a new event is always allowed. */
+ for (success = false, attempts = 0;
+ success == false && attempts < VMCI_EVENT_MAX_ATTEMPTS;
+ attempts++) {
+ struct vmci_subscription *existingSub = NULL;
+
+ /*
+ * We try to get an id a couple of times before claiming we are out of
+ * resources.
+ */
+ sub->id = ++subscriptionID;
+
+ /* Test for duplicate id. */
+ existingSub = VMCIEventFind(sub->id);
+ if (existingSub == NULL) {
+ /* We succeeded if we didn't find a duplicate. */
+ success = true;
+ } else {
+ VMCIEventRelease(existingSub);
+ }
+ }
+
+ if (success) {
+ init_waitqueue_head(&sub->destroyEvent);
+ list_add(&sub->subscriberListItem, &subscriberArray[event]);
+ result = VMCI_SUCCESS;
+ } else {
+ result = VMCI_ERROR_NO_RESOURCES;
+ }
+
+ spin_unlock_bh(&subscriberLock);
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIEventUnregisterSubscription --
+ *
+ * Remove subscription from subscriber list.
+ *
+ * Results:
+ * VMCISubscription when found, NULL otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static struct vmci_subscription *VMCIEventUnregisterSubscription(uint32_t subID) // IN
+{
+ struct vmci_subscription *s;
+
+ spin_lock_bh(&subscriberLock);
+ s = VMCIEventFind(subID);
+ if (s != NULL) {
+ VMCIEventRelease(s);
+ list_del(&s->subscriberListItem);
+ }
+ spin_unlock_bh(&subscriberLock);
+
+ if (s != NULL) {
+ VMCI_WaitOnEvent(&s->destroyEvent, EventReleaseCB, s);
+ }
+
+ return s;
+}
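+
+/*
+ * The wait above pairs with VMCIEventRelease: unregistering drops the
+ * registration's reference under the lock, and VMCI_WaitOnEvent then
+ * blocks on destroyEvent until any in-flight deliveries have dropped
+ * theirs, i.e. until refCount reaches zero. Only then may the caller
+ * free the subscription.
+ */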
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIEvent_Subscribe --
+ *
+ * Subscribe to given event. The callback specified can be fired
+ * in different contexts depending on what flag is specified while
+ * registering. If flags contains VMCI_FLAG_EVENT_NONE then the
+ * callback is fired with the subscriber lock held (and BH context
+ * on the guest). If flags contains VMCI_FLAG_EVENT_DELAYED_CB then
+ * the callback is fired with no locks held in thread context.
+ * This is useful because other VMCIEvent functions can be called,
+ * but it also increases the chances that an event will be dropped.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIEvent_Subscribe(uint32_t event, // IN
+ uint32_t flags, // IN
+ VMCI_EventCB callback, // IN
+ void *callbackData, // IN
+ uint32_t *subscriptionID) // OUT
+{
+ int retval;
+ struct vmci_subscription *s = NULL;
+
+ if (subscriptionID == NULL) {
+ VMCI_DEBUG_LOG(4, (LGPFX "Invalid subscription (NULL).\n"));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ s = kmalloc(sizeof *s, GFP_KERNEL);
+ if (s == NULL)
+ return VMCI_ERROR_NO_MEM;
+
+ retval = VMCIEventRegisterSubscription(s, event, flags,
+ callback, callbackData);
+ if (retval < VMCI_SUCCESS) {
+ kfree(s);
+ return retval;
+ }
+
+ *subscriptionID = s->id;
+ return retval;
+}
+
+EXPORT_SYMBOL(VMCIEvent_Subscribe);
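+
+/*
+ * A minimal usage sketch; the callback and variable names are
+ * illustrative only:
+ *
+ *   static void PeerAttachCB(uint32_t subID, struct vmci_event_data *ed,
+ *                            void *clientData)
+ *   {
+ *      struct vmci_event_payld_qp *p = VMCIEventDataPayload(ed);
+ *      Log("Peer 0x%x attached.\n", p->peerId);
+ *   }
+ *
+ *   uint32_t subID;
+ *   if (VMCIEvent_Subscribe(VMCI_EVENT_QP_PEER_ATTACH,
+ *                           VMCI_FLAG_EVENT_DELAYED_CB,
+ *                           PeerAttachCB, NULL, &subID) < VMCI_SUCCESS)
+ *      return;
+ *   ...
+ *   VMCIEvent_Unsubscribe(subID);
+ */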
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIEvent_Unsubscribe --
+ *
+ * Unsubscribe from given event. Removes the subscription from the
+ * list and frees it.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+int VMCIEvent_Unsubscribe(uint32_t subID) // IN
+{
+ struct vmci_subscription *s;
+
+ /*
+ * Return subscription. At this point we know no one else is accessing
+ * the subscription so we can free it.
+ */
+ s = VMCIEventUnregisterSubscription(subID);
+ if (s == NULL)
+ return VMCI_ERROR_NOT_FOUND;
+
+ kfree(s);
+
+ return VMCI_SUCCESS;
+}
+
+EXPORT_SYMBOL(VMCIEvent_Unsubscribe);
diff --git a/drivers/misc/vmw_vmci/vmciEvent.h b/drivers/misc/vmw_vmci/vmciEvent.h
new file mode 100644
index 0000000..575bff0
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciEvent.h
@@ -0,0 +1,32 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef __VMCI_EVENT_H__
+#define __VMCI_EVENT_H__
+
+#include "vmci_defs.h"
+#include "vmci_call_defs.h"
+
+int VMCIEvent_Init(void);
+void VMCIEvent_Exit(void);
+void VMCIEvent_Sync(void);
+int VMCIEvent_Dispatch(struct vmci_datagram *msg);
+
+#endif //__VMCI_EVENT_H__
--
1.7.0.4
* [PATCH 06/14] Add vmciHashtable.*
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (4 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 05/14] Add vmciEvent.* Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 07/14] Add vmciQueuePair.* Andrew Stiegmann (stieg)
` (8 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciHashtable.c | 519 +++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciHashtable.h | 58 ++++
2 files changed, 577 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciHashtable.c
create mode 100644 drivers/misc/vmw_vmci/vmciHashtable.h
diff --git a/drivers/misc/vmw_vmci/vmciHashtable.c b/drivers/misc/vmw_vmci/vmciHashtable.c
new file mode 100644
index 0000000..dd5c4cd
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciHashtable.c
@@ -0,0 +1,519 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "vmci_defs.h"
+#include "vmci_infrastructure.h"
+#include "vmci_kernel_if.h"
+#include "vmciCommonInt.h"
+#include "vmciDriver.h"
+#include "vmciHashtable.h"
+
+#define LGPFX "VMCIHashTable: "
+
+#define VMCI_HASHTABLE_HASH(_h, _sz) \
+ VMCI_HashId(VMCI_HANDLE_TO_RESOURCE_ID(_h), (_sz))
+
+/* static int HashTableUnlinkEntry(struct vmci_hash_table *table, */
+/* struct vmci_hash_entry *entry); */
+/* static bool VMCIHashTableEntryExistsLocked(struct vmci_hash_table *table, */
+/* struct vmci_handle handle); */
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTable_Create --
+ * XXX: Factor out the hashtable code to be shared amongst host and guest.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+struct vmci_hash_table *VMCIHashTable_Create(int size)
+{
+
+ struct vmci_hash_table *table = kmalloc(sizeof *table, GFP_KERNEL);
+ if (table == NULL)
+ return NULL;
+
+ table->entries = kmalloc(sizeof *table->entries * size, GFP_KERNEL);
+ if (table->entries == NULL) {
+ kfree(table);
+ return NULL;
+ }
+ memset(table->entries, 0, sizeof *table->entries * size);
+ table->size = size;
+
+ spin_lock_init(&table->lock);
+
+ return table;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTable_Destroy --
+ * This function should be called at module exit time.
+ * We rely on the module ref count to ensure that no one is accessing any
+ * hash table entries at this point in time. Hence we should be able to just
+ * remove all entries from the hash table.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+void VMCIHashTable_Destroy(struct vmci_hash_table *table)
+{
+ ASSERT(table);
+
+ spin_lock_bh(&table->lock);
+ kfree(table->entries);
+ table->entries = NULL;
+ spin_unlock_bh(&table->lock);
+ kfree(table);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTable_InitEntry --
+ * Initializes a hash entry.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+void VMCIHashTable_InitEntry(struct vmci_hash_entry *entry, // IN
+ struct vmci_handle handle) // IN
+{
+ ASSERT(entry);
+ entry->handle = handle;
+ entry->refCount = 0;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTableEntryExistsLocked --
+ *
+ * Variant of VMCIHashTable_EntryExists that assumes the table lock
+ * is already held.
+ *
+ * Result:
+ * true if handle already in hashtable. false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static bool VMCIHashTableEntryExistsLocked(struct vmci_hash_table *table, // IN
+ struct vmci_handle handle) // IN
+{
+ struct vmci_hash_entry *entry;
+ int idx;
+
+ ASSERT(table);
+
+ idx = VMCI_HASHTABLE_HASH(handle, table->size);
+
+ for (entry = table->entries[idx]; entry; entry = entry->next) {
+ if (VMCI_HANDLE_TO_RESOURCE_ID(entry->handle) ==
+ VMCI_HANDLE_TO_RESOURCE_ID(handle) &&
+ ((VMCI_HANDLE_TO_CONTEXT_ID(entry->handle) ==
+ VMCI_HANDLE_TO_CONTEXT_ID(handle)) ||
+ (VMCI_INVALID_ID == VMCI_HANDLE_TO_CONTEXT_ID(handle))
+ || (VMCI_INVALID_ID ==
+ VMCI_HANDLE_TO_CONTEXT_ID(entry->handle)))) {
+ return true;
+ }
+ }
+
+ return false;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * HashTableUnlinkEntry --
+ * XXX Factor out the hashtable code to be shared amongst API and perhaps
+ * host and guest.
+ * Assumes caller holds table lock.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static int HashTableUnlinkEntry(struct vmci_hash_table *table, // IN
+ struct vmci_hash_entry *entry) // IN
+{
+ int result;
+ struct vmci_hash_entry *prev, *cur;
+ int idx;
+
+ idx = VMCI_HASHTABLE_HASH(entry->handle, table->size);
+
+ prev = NULL;
+ cur = table->entries[idx];
+ while (true) {
+ if (cur == NULL) {
+ result = VMCI_ERROR_NOT_FOUND;
+ break;
+ }
+ if (VMCI_HANDLE_EQUAL(cur->handle, entry->handle)) {
+ ASSERT(cur == entry);
+
+ /* Remove entry and break. */
+ if (prev) {
+ prev->next = cur->next;
+ } else {
+ table->entries[idx] = cur->next;
+ }
+ cur->next = NULL;
+ result = VMCI_SUCCESS;
+ break;
+ }
+ prev = cur;
+ cur = cur->next;
+ }
+ return result;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTable_AddEntry --
+ * XXX Factor out the hashtable code to be shared amongst host and guest.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIHashTable_AddEntry(struct vmci_hash_table *table, // IN
+ struct vmci_hash_entry *entry) // IN
+{
+ int idx;
+
+ ASSERT(entry);
+ ASSERT(table);
+
+ spin_lock_bh(&table->lock);
+
+ /* Creation of a new hashtable entry is always allowed. */
+ if (VMCIHashTableEntryExistsLocked(table, entry->handle)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Entry (handle=0x%x:0x%x) already exists.\n",
+ entry->handle.context, entry->handle.resource));
+ spin_unlock_bh(&table->lock);
+ return VMCI_ERROR_DUPLICATE_ENTRY;
+ }
+
+ idx = VMCI_HASHTABLE_HASH(entry->handle, table->size);
+ ASSERT(idx < table->size);
+
+ /* New entry is added to top/front of hash bucket. */
+ entry->refCount++;
+ entry->next = table->entries[idx];
+ table->entries[idx] = entry;
+ spin_unlock_bh(&table->lock);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTable_RemoveEntry --
+ * XXX Factor out the hashtable code to be shared amongst API and perhaps
+ * host and guest.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIHashTable_RemoveEntry(struct vmci_hash_table *table, // IN
+ struct vmci_hash_entry *entry) // IN
+{
+ int result;
+
+ ASSERT(table);
+ ASSERT(entry);
+
+ spin_lock_bh(&table->lock);
+
+ /* First unlink the entry. */
+ result = HashTableUnlinkEntry(table, entry);
+ if (result != VMCI_SUCCESS) {
+ /* We failed to find the entry. */
+ goto done;
+ }
+
+ /* Decrement refcount and check if this is last reference. */
+ entry->refCount--;
+ if (entry->refCount == 0) {
+ result = VMCI_SUCCESS_ENTRY_DEAD;
+ goto done;
+ }
+
+ done:
+ spin_unlock_bh(&table->lock);
+
+ return result;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTableGetEntryLocked --
+ *
+ * Looks up an entry in the hash table. Assumes the table lock is held.
+ *
+ * Result:
+ * If the element is found, a pointer to the element is returned.
+ * Otherwise NULL is returned.
+ *
+ * Side effects:
+ * The reference count of the returned element is increased.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static inline struct vmci_hash_entry *VMCIHashTableGetEntryLocked(struct vmci_hash_table *table, // IN
+ struct vmci_handle handle) // IN
+{
+ struct vmci_hash_entry *cur = NULL;
+ int idx;
+
+ ASSERT(!VMCI_HANDLE_EQUAL(handle, VMCI_INVALID_HANDLE));
+ ASSERT(table);
+
+ idx = VMCI_HASHTABLE_HASH(handle, table->size);
+
+ for (cur = table->entries[idx]; cur != NULL; cur = cur->next) {
+ if (VMCI_HANDLE_TO_RESOURCE_ID(cur->handle) ==
+ VMCI_HANDLE_TO_RESOURCE_ID(handle) &&
+ ((VMCI_HANDLE_TO_CONTEXT_ID(cur->handle) ==
+ VMCI_HANDLE_TO_CONTEXT_ID(handle)) ||
+ (VMCI_INVALID_ID ==
+ VMCI_HANDLE_TO_CONTEXT_ID(cur->handle)))) {
+ cur->refCount++;
+ break;
+ }
+ }
+
+ return cur;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTable_GetEntry --
+ * XXX Factor out the hashtable code to be shared amongst API and perhaps
+ * host and guest.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+struct vmci_hash_entry *VMCIHashTable_GetEntry(struct vmci_hash_table *table, // IN
+ struct vmci_handle handle) // IN
+{
+ struct vmci_hash_entry *entry;
+
+ if (VMCI_HANDLE_EQUAL(handle, VMCI_INVALID_HANDLE))
+ return NULL;
+
+ ASSERT(table);
+
+ spin_lock_bh(&table->lock);
+ entry = VMCIHashTableGetEntryLocked(table, handle);
+ spin_unlock_bh(&table->lock);
+
+ return entry;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTable_HoldEntry --
+ *
+ * Hold the given entry. This will increment the entry's reference count.
+ * This is like a GetEntry() but without having to lookup the entry by
+ * handle.
+ *
+ * Result:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+void VMCIHashTable_HoldEntry(struct vmci_hash_table *table, // IN
+ struct vmci_hash_entry *entry) // IN/OUT
+{
+ ASSERT(table);
+ ASSERT(entry);
+
+ spin_lock_bh(&table->lock);
+ entry->refCount++;
+ spin_unlock_bh(&table->lock);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTableReleaseEntryLocked --
+ *
+ * Releases an element previously obtained with
+ * VMCIHashTableGetEntryLocked.
+ *
+ * Result:
+ * If the entry is removed from the hash table, VMCI_SUCCESS_ENTRY_DEAD
+ * is returned. Otherwise, VMCI_SUCCESS is returned.
+ *
+ * Side effects:
+ * The reference count of the entry is decreased and the entry is removed
+ * from the hash table on 0.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static inline int VMCIHashTableReleaseEntryLocked(struct vmci_hash_table *table, // IN
+ struct vmci_hash_entry *entry) // IN
+{
+ int result = VMCI_SUCCESS;
+
+ ASSERT(table);
+ ASSERT(entry);
+
+ entry->refCount--;
+ /* Check if this is last reference and report if so. */
+ if (entry->refCount == 0) {
+
+ /*
+ * Remove entry from hash table if not already removed. This could have
+ * happened already because VMCIHashTable_RemoveEntry was called to unlink
+ * it. We ignore the case where it is not found. Datagram handles will often have
+ * RemoveEntry called, whereas SharedMemory regions rely on ReleaseEntry
+ * to unlink the entry, since the creator does not call RemoveEntry when
+ * it detaches.
+ */
+
+ HashTableUnlinkEntry(table, entry);
+ result = VMCI_SUCCESS_ENTRY_DEAD;
+ }
+
+ return result;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTable_ReleaseEntry --
+ * XXX Factor out the hashtable code to be shared amongst API and perhaps
+ * host and guest.
+ *
+ * Result:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIHashTable_ReleaseEntry(struct vmci_hash_table *table, // IN
+ struct vmci_hash_entry *entry) // IN
+{
+ int result;
+
+ ASSERT(table);
+ spin_lock_bh(&table->lock);
+ result = VMCIHashTableReleaseEntryLocked(table, entry);
+ spin_unlock_bh(&table->lock);
+
+ return result;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIHashTable_EntryExists --
+ * XXX Factor out the hashtable code to be shared amongst API and perhaps
+ * host and guest.
+ *
+ * Result:
+ * true if handle already in hashtable. false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+bool VMCIHashTable_EntryExists(struct vmci_hash_table *table, // IN
+ struct vmci_handle handle) // IN
+{
+ bool exists;
+
+ ASSERT(table);
+
+ spin_lock_bh(&table->lock);
+ exists = VMCIHashTableEntryExistsLocked(table, handle);
+ spin_unlock_bh(&table->lock);
+
+ return exists;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHashTable_Sync --
+ *
+ * Use this as a synchronization point when setting globals, for example,
+ * during device shutdown.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIHashTable_Sync(struct vmci_hash_table *table)
+{
+ ASSERT(table);
+ spin_lock_bh(&table->lock);
+ spin_unlock_bh(&table->lock);
+}
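+
+/*
+ * A sketch of the intended entry lifecycle, assuming an enclosing object
+ * "res" that embeds a vmci_hash_entry (names are illustrative):
+ *
+ *   VMCIHashTable_InitEntry(&res->entry, handle); // refCount = 0
+ *   VMCIHashTable_AddEntry(table, &res->entry);   // linked, refCount = 1
+ *   ...
+ *   e = VMCIHashTable_GetEntry(table, handle);    // +1 while in use
+ *   ...
+ *   if (e && VMCIHashTable_ReleaseEntry(table, e) == VMCI_SUCCESS_ENTRY_DEAD)
+ *      kfree(res); // last reference gone
+ *
+ * VMCIHashTable_RemoveEntry drops the Add reference without waiting;
+ * whichever release observes refCount == 0 reports VMCI_SUCCESS_ENTRY_DEAD
+ * so the owner knows when to free.
+ */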
diff --git a/drivers/misc/vmw_vmci/vmciHashtable.h b/drivers/misc/vmw_vmci/vmciHashtable.h
new file mode 100644
index 0000000..33d5503
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciHashtable.h
@@ -0,0 +1,58 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_HASHTABLE_H_
+#define _VMCI_HASHTABLE_H_
+
+#include "vmci_defs.h"
+#include "vmci_kernel_if.h"
+
+struct vmci_hash_entry {
+ struct vmci_handle handle;
+ int refCount;
+ struct vmci_hash_entry *next;
+};
+
+struct vmci_hash_table {
+ struct vmci_hash_entry **entries;
+ int size; /* Number of buckets in above array. */
+ spinlock_t lock;
+};
+
+struct vmci_hash_table *VMCIHashTable_Create(int size);
+void VMCIHashTable_Destroy(struct vmci_hash_table *table);
+void VMCIHashTable_InitEntry(struct vmci_hash_entry *entry,
+ struct vmci_handle handle);
+int VMCIHashTable_AddEntry(struct vmci_hash_table *table,
+ struct vmci_hash_entry *entry);
+int VMCIHashTable_RemoveEntry(struct vmci_hash_table *table,
+ struct vmci_hash_entry *entry);
+struct vmci_hash_entry *VMCIHashTable_GetEntry(struct vmci_hash_table
+ *table,
+ struct vmci_handle handle);
+void VMCIHashTable_HoldEntry(struct vmci_hash_table *table,
+ struct vmci_hash_entry *entry);
+int VMCIHashTable_ReleaseEntry(struct vmci_hash_table *table,
+ struct vmci_hash_entry *entry);
+bool VMCIHashTable_EntryExists(struct vmci_hash_table *table,
+ struct vmci_handle handle);
+void VMCIHashTable_Sync(struct vmci_hash_table *table);
+
+#endif // _VMCI_HASHTABLE_H_
--
1.7.0.4
* [PATCH 07/14] Add vmciQueuePair.*
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (5 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 06/14] Add vmciHashtable.* Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 08/14] Add vmciResource.* Andrew Stiegmann (stieg)
` (7 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciQueuePair.c | 2696 +++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciQueuePair.h | 95 ++
2 files changed, 2791 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciQueuePair.c
create mode 100644 drivers/misc/vmw_vmci/vmciQueuePair.h
diff --git a/drivers/misc/vmw_vmci/vmciQueuePair.c b/drivers/misc/vmw_vmci/vmciQueuePair.c
new file mode 100644
index 0000000..0745e09
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciQueuePair.c
@@ -0,0 +1,2696 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/semaphore.h>
+
+#include "vmci_defs.h"
+#include "vmci_handle_array.h"
+#include "vmci_infrastructure.h"
+#include "vmci_kernel_if.h"
+#include "vmciCommonInt.h"
+#include "vmciContext.h"
+#include "vmciDatagram.h"
+#include "vmciDriver.h"
+#include "vmciEvent.h"
+#include "vmciHashtable.h"
+#include "vmciKernelAPI.h"
+#include "vmciQueuePair.h"
+#include "vmciResource.h"
+#include "vmciRoute.h"
+
+#define LGPFX "VMCIQueuePair: "
+
+/*
+ * In the following, we will distinguish between two kinds of VMX processes -
+ * the ones with versions lower than VMCI_VERSION_NOVMVM that use specialized
+ * VMCI page files in the VMX to support VM to VM communication, and the
+ * newer ones that use the guest memory directly. In the following, we will
+ * refer to the older VMX versions as old-style VMX'en, and the newer ones
+ * as new-style VMX'en.
+ *
+ * The state transition diagram is as follows (the VMCIQPB_ prefix has been
+ * removed for readability) - see below for more details on the transitions:
+ *
+ * -------------- NEW -------------
+ * | |
+ * \_/ \_/
+ * CREATED_NO_MEM <-----------------> CREATED_MEM
+ * | | |
+ * | o-----------------------o |
+ * | | |
+ * \_/ \_/ \_/
+ * ATTACHED_NO_MEM <----------------> ATTACHED_MEM
+ * | | |
+ * | o----------------------o |
+ * | | |
+ * \_/ \_/ \_/
+ * SHUTDOWN_NO_MEM <----------------> SHUTDOWN_MEM
+ * | |
+ * | |
+ * -------------> gone <-------------
+ *
+ * In more detail. When a VMCI queue pair is first created, it will be in the
+ * VMCIQPB_NEW state. It will then move into one of the following states:
+ * - VMCIQPB_CREATED_NO_MEM: this state indicates that either:
+ * - the create was performed by a host endpoint, in which case there is no
+ * backing memory yet.
+ * - the create was initiated by an old-style VMX, that uses
+ * VMCIQPBroker_SetPageStore to specify the UVAs of the queue pair at a
+ * later point in time. This state can be distinguished from the one above
+ * by the context ID of the creator. A host side is not allowed to attach
+ * until the page store has been set.
+ * - VMCIQPB_CREATED_MEM: this state is the result when the queue pair is created
+ * by a VMX using the queue pair device backend that sets the UVAs of the
+ * queue pair immediately and stores the information for later attachers. At
+ * this point, it is ready for the host side to attach to it.
+ * Once the queue pair is in one of the created states (with the exception of the
+ * case mentioned for older VMX'en above), it is possible to attach to the queue
+ * pair. Again we have two new states possible:
+ * - VMCIQPB_ATTACHED_MEM: this state can be reached through the following paths:
+ * - from VMCIQPB_CREATED_NO_MEM when a new-style VMX allocates a queue pair,
+ * and attaches to a queue pair previously created by the host side.
+ * - from VMCIQPB_CREATED_MEM when the host side attaches to a queue pair
+ * already created by a guest.
+ * - from VMCIQPB_ATTACHED_NO_MEM, when an old-style VMX calls
+ * VMCIQPBroker_SetPageStore (see below).
+ * - VMCIQPB_ATTACHED_NO_MEM: If the queue pair already was in the
+ * VMCIQPB_CREATED_NO_MEM due to a host side create, an old-style VMX will
+ * bring the queue pair into this state. Once VMCIQPBroker_SetPageStore is
+ * called to register the user memory, the VMCIQPB_ATTACH_MEM state will be
+ * entered.
+ * From the attached queue pair, the queue pair can enter the shutdown states
+ * when either side of the queue pair detaches. If the guest side detaches first,
+ * the queue pair will enter the VMCIQPB_SHUTDOWN_NO_MEM state, where the content
+ * of the queue pair will no longer be available. If the host side detaches first,
+ * the queue pair will either enter the VMCIQPB_SHUTDOWN_MEM, if the guest memory
+ * is currently mapped, or VMCIQPB_SHUTDOWN_NO_MEM, if the guest memory is not
+ * mapped (e.g., the host detaches while a guest is stunned).
+ *
+ * New-style VMX'en will also unmap guest memory, if the guest is quiesced, e.g.,
+ * during a snapshot operation. In that case, the guest memory will no longer be
+ * available, and the queue pair will transition from *_MEM state to a *_NO_MEM
+ * state. The VMX may later map the memory once more, in which case the queue
+ * pair will transition from the *_NO_MEM state at that point back to the *_MEM
+ * state. Note that the *_NO_MEM state may have changed, since the peer may have
+ * either attached or detached in the meantime. The values are laid out such that
+ * ++ on a state will move from a *_NO_MEM to a *_MEM state, and vice versa.
+ */
+
+typedef enum {
+ VMCIQPB_NEW,
+ VMCIQPB_CREATED_NO_MEM,
+ VMCIQPB_CREATED_MEM,
+ VMCIQPB_ATTACHED_NO_MEM,
+ VMCIQPB_ATTACHED_MEM,
+ VMCIQPB_SHUTDOWN_NO_MEM,
+ VMCIQPB_SHUTDOWN_MEM,
+ VMCIQPB_GONE
+} QPBrokerState;
+
+#define QPBROKERSTATE_HAS_MEM(_qpb) (_qpb->state == VMCIQPB_CREATED_MEM || \
+ _qpb->state == VMCIQPB_ATTACHED_MEM || \
+ _qpb->state == VMCIQPB_SHUTDOWN_MEM)
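+
+/*
+ * The enum ordering above is significant: each *_NO_MEM value is
+ * immediately followed by its *_MEM counterpart, so (as noted in the
+ * comment above) mapping or unmapping guest memory can be expressed as a
+ * simple increment or decrement, e.g.:
+ *
+ *   if (!QPBROKERSTATE_HAS_MEM(entry))
+ *      entry->state++; // *_NO_MEM -> *_MEM once the VMX maps memory
+ */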
+
+/*
+ * In the queue pair broker, we always use the guest point of view for
+ * the produce and consume queue values and references, e.g., the
+ * produce queue size stored is the guest's produce queue size. The
+ * host endpoint will need to swap these around. The only exception is
+ * the local queue pairs on the host, in which case the host endpoint
+ * that creates the queue pair will have the right orientation, and
+ * the attaching host endpoint will need to swap.
+ */
+
+struct qp_entry {
+ struct list_head listItem;
+ struct vmci_handle handle;
+ uint32_t peer;
+ uint32_t flags;
+ uint64_t produceSize;
+ uint64_t consumeSize;
+ uint32_t refCount;
+};
+
+struct qp_broker_entry {
+ struct qp_entry qp;
+ uint32_t createId;
+ uint32_t attachId;
+ QPBrokerState state;
+ bool requireTrustedAttach;
+ bool createdByTrusted;
+ bool vmciPageFiles; // Created by VMX using VMCI page files
+ struct vmci_queue *produceQ;
+ struct vmci_queue *consumeQ;
+ struct vmci_queue_header savedProduceQ;
+ struct vmci_queue_header savedConsumeQ;
+ VMCIEventReleaseCB wakeupCB;
+ void *clientData;
+ void *localMem; // Kernel memory for local queue pair
+};
+
+struct qp_guest_endpoint {
+ struct qp_entry qp;
+ uint64_t numPPNs;
+ void *produceQ;
+ void *consumeQ;
+ bool hibernateFailure;
+ struct PPNSet ppnSet;
+};
+
+struct qp_list {
+ struct list_head head;
+ atomic_t hibernate;
+ struct semaphore mutex;
+};
+
+static struct qp_list qpBrokerList;
+
+#define QPE_NUM_PAGES(_QPE) ((uint32_t)(CEILING(_QPE.produceSize, PAGE_SIZE) + \
+ CEILING(_QPE.consumeSize, PAGE_SIZE) + 2))
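+
+/*
+ * For example, a queue pair with a 64KB produce queue and a 16KB consume
+ * queue on a 4KB-page system spans 16 + 4 + 2 = 22 pages, the final two
+ * being the produce and consume queue headers.
+ */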
+
+static struct qp_list qpGuestEndpoints;
+static struct vmci_handle_arr *hibernateFailedList;
+static spinlock_t hibernateFailedListLock;
+
+extern int VMCI_SendDatagram(struct vmci_datagram *);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QueuePairList_FindEntry --
+ *
+ * Finds the entry in the list corresponding to a given handle. Assumes
+ * that the list is locked.
+ *
+ * Results:
+ * Pointer to entry.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static struct qp_entry *QueuePairList_FindEntry(struct qp_list *qpList, // IN
+ struct vmci_handle handle) // IN
+{
+ struct list_head *next;
+
+ if (VMCI_HANDLE_INVALID(handle)) {
+ return NULL;
+ }
+
+ list_for_each(next, &qpList->head) {
+ struct qp_entry *entry =
+ list_entry(next, struct qp_entry, listItem);
+
+ if (VMCI_HANDLE_EQUAL(entry->handle, handle)) {
+ return entry;
+ }
+ }
+
+ return NULL;
+}
+
+/*
+ *----------------------------------------------------------------------------
+ *
+ * QueuePairNotifyPeerLocal --
+ *
+ * Dispatches a queue pair event message directly into the local event
+ * queue.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+static int QueuePairNotifyPeerLocal(bool attach, // IN: attach or detach?
+ struct vmci_handle handle) // IN: queue pair handle
+{
+ struct vmci_event_msg *eMsg;
+ struct vmci_event_payld_qp *ePayload;
+ /* buf is only 48 bytes. */
+ char buf[sizeof *eMsg + sizeof *ePayload];
+ uint32_t contextId;
+
+ contextId = VMCI_GetContextID();
+
+ eMsg = (struct vmci_event_msg *)buf;
+ ePayload = VMCIEventMsgPayload(eMsg);
+
+ eMsg->hdr.dst = VMCI_MAKE_HANDLE(contextId, VMCI_EVENT_HANDLER);
+ eMsg->hdr.src = VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_CONTEXT_RESOURCE_ID);
+ eMsg->hdr.payloadSize =
+ sizeof *eMsg + sizeof *ePayload - sizeof eMsg->hdr;
+ eMsg->eventData.event =
+ attach ? VMCI_EVENT_QP_PEER_ATTACH : VMCI_EVENT_QP_PEER_DETACH;
+ ePayload->peerId = contextId;
+ ePayload->handle = handle;
+
+ return VMCIEvent_Dispatch((struct vmci_datagram *)eMsg);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QPGuestEndpointCreate --
+ *
+ * Allocates and initializes a QPGuestEndpoint structure.
+ * Allocates a QueuePair rid (and handle) iff the given entry has
+ * an invalid handle. 0 through VMCI_RESERVED_RESOURCE_ID_MAX
+ * are reserved handles. Assumes that the QP list mutex is held
+ * by the caller.
+ *
+ * Results:
+ * Pointer to the initialized structure.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+struct qp_guest_endpoint *QPGuestEndpointCreate(struct vmci_handle handle, // IN
+ uint32_t peer, // IN
+ uint32_t flags, // IN
+ uint64_t produceSize, // IN
+ uint64_t consumeSize, // IN
+ void *produceQ, // IN
+ void *consumeQ) // IN
+{
+ static uint32_t queuePairRID = VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+ struct qp_guest_endpoint *entry;
+ /* One page each for the two queue headers. */
+ const uint64_t numPPNs = CEILING(produceSize, PAGE_SIZE) +
+ CEILING(consumeSize, PAGE_SIZE) + 2;
+
+ ASSERT((produceSize || consumeSize) && produceQ && consumeQ);
+
+ if (VMCI_HANDLE_INVALID(handle)) {
+ uint32_t contextID = VMCI_GetContextID();
+ uint32_t oldRID = queuePairRID;
+
+ /*
+ * Generate a unique QueuePair rid. Keep on trying until we wrap around
+ * in the RID space.
+ */
+ ASSERT(oldRID > VMCI_RESERVED_RESOURCE_ID_MAX);
+ do {
+ handle = VMCI_MAKE_HANDLE(contextID, queuePairRID);
+ entry = (struct qp_guest_endpoint *)
+ QueuePairList_FindEntry(&qpGuestEndpoints, handle);
+ queuePairRID++;
+ if (unlikely(!queuePairRID)) {
+ /*
+ * Skip the reserved rids.
+ */
+ queuePairRID =
+ VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+ }
+ } while (entry && queuePairRID != oldRID);
+
+ if (unlikely(entry != NULL)) {
+ ASSERT(queuePairRID == oldRID);
+ /*
+ * We wrapped around --- no rids were free.
+ */
+ return NULL;
+ }
+ }
+
+ ASSERT(!VMCI_HANDLE_INVALID(handle) &&
+ QueuePairList_FindEntry(&qpGuestEndpoints, handle) == NULL);
+ entry = kmalloc(sizeof *entry, GFP_KERNEL);
+ if (entry) {
+ entry->qp.handle = handle;
+ entry->qp.peer = peer;
+ entry->qp.flags = flags;
+ entry->qp.produceSize = produceSize;
+ entry->qp.consumeSize = consumeSize;
+ entry->qp.refCount = 0;
+ entry->numPPNs = numPPNs;
+ memset(&entry->ppnSet, 0, sizeof entry->ppnSet);
+ entry->produceQ = produceQ;
+ entry->consumeQ = consumeQ;
+ INIT_LIST_HEAD(&entry->qp.listItem);
+ }
+ return entry;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QPGuestEndpointDestroy --
+ *
+ * Frees a QPGuestEndpoint structure.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void QPGuestEndpointDestroy(struct qp_guest_endpoint *entry) // IN
+{
+ ASSERT(entry);
+ ASSERT(entry->qp.refCount == 0);
+
+ VMCI_FreePPNSet(&entry->ppnSet);
+ VMCI_CleanupQueueMutex(entry->produceQ, entry->consumeQ);
+ VMCI_FreeQueue(entry->produceQ, entry->qp.produceSize);
+ VMCI_FreeQueue(entry->consumeQ, entry->qp.consumeSize);
+ kfree(entry);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueuePairAllocHypercall --
+ *
+ * Helper to make a QueuePairAlloc hypercall when the driver is
+ * supporting a guest device.
+ *
+ * Results:
+ * Result of the hypercall.
+ *
+ * Side effects:
+ * Memory is allocated & freed.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQueuePairAllocHypercall(const struct qp_guest_endpoint *entry) // IN
+{
+ struct vmci_qp_alloc_msg *allocMsg;
+ size_t msgSize;
+ int result;
+
+ if (!entry || entry->numPPNs <= 2)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ ASSERT(!(entry->qp.flags & VMCI_QPFLAG_LOCAL));
+
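+ /*
+ * The PPN list travels inline, immediately after the fixed-size
+ * message; msgSize accounts for both.
+ */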
+ msgSize = sizeof *allocMsg + (size_t) entry->numPPNs * sizeof(uint32_t);
+ allocMsg = kmalloc(msgSize, GFP_KERNEL);
+ if (!allocMsg)
+ return VMCI_ERROR_NO_MEM;
+
+ allocMsg->hdr.dst = VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_QUEUEPAIR_ALLOC);
+ allocMsg->hdr.src = VMCI_ANON_SRC_HANDLE;
+ allocMsg->hdr.payloadSize = msgSize - VMCI_DG_HEADERSIZE;
+ allocMsg->handle = entry->qp.handle;
+ allocMsg->peer = entry->qp.peer;
+ allocMsg->flags = entry->qp.flags;
+ allocMsg->produceSize = entry->qp.produceSize;
+ allocMsg->consumeSize = entry->qp.consumeSize;
+ allocMsg->numPPNs = entry->numPPNs;
+
+ result =
+ VMCI_PopulatePPNList((uint8_t *) allocMsg + sizeof *allocMsg,
+ &entry->ppnSet);
+ if (result == VMCI_SUCCESS)
+ result = VMCI_SendDatagram((struct vmci_datagram *)allocMsg);
+
+ kfree(allocMsg);
+
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueuePairDetachHypercall --
+ *
+ * Helper to make a QueuePairDetach hypercall when the driver is
+ * supporting a guest device.
+ *
+ * Results:
+ * Result of the hypercall.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQueuePairDetachHypercall(struct vmci_handle handle) // IN
+{
+ struct vmci_qp_detach_msg detachMsg;
+
+ detachMsg.hdr.dst = VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_QUEUEPAIR_DETACH);
+ detachMsg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ detachMsg.hdr.payloadSize = sizeof handle;
+ detachMsg.handle = handle;
+
+ return VMCI_SendDatagram((struct vmci_datagram *)&detachMsg);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPUnmarkHibernateFailed --
+ *
+ * Helper function that removes a queue pair entry from the group
+ * of handles marked as having failed hibernation. Must be called
+ * with the queue pair list lock held.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void VMCIQPUnmarkHibernateFailed(struct qp_guest_endpoint *entry) // IN
+{
+ struct vmci_handle handle;
+
+ /*
+ * entry->qp.handle is located in paged memory, so it can't be
+ * accessed while holding a spinlock.
+ */
+
+ handle = entry->qp.handle;
+ entry->hibernateFailure = false;
+ spin_lock_bh(&hibernateFailedListLock);
+ VMCIHandleArray_RemoveEntry(hibernateFailedList, handle);
+ spin_unlock_bh(&hibernateFailedListLock);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QueuePairList_RemoveEntry --
+ *
+ * Removes the given entry from the list. Assumes that the list is locked.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void QueuePairList_RemoveEntry(struct qp_list *qpList, // IN
+ struct qp_entry *entry) // IN
+{
+ if (entry)
+ list_del(&entry->listItem);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueuePairDetachGuestWork --
+ *
+ * Helper for VMCI QueuePair detach interface. Frees the physical
+ * pages for the queue pair.
+ *
+ * Results:
+ * Success or failure.
+ *
+ * Side effects:
+ * Memory may be freed.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQueuePairDetachGuestWork(struct vmci_handle handle) // IN
+{
+ int result;
+ struct qp_guest_endpoint *entry;
+ uint32_t refCount = 0xffffffff; /* To avoid compiler warning below */
+
+ ASSERT(!VMCI_HANDLE_INVALID(handle));
+
+ down(&qpGuestEndpoints.mutex);
+
+ entry = (struct qp_guest_endpoint *)
+ QueuePairList_FindEntry(&qpGuestEndpoints, handle);
+ if (!entry) {
+ up(&qpGuestEndpoints.mutex);
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ ASSERT(entry->qp.refCount >= 1);
+
+ if (entry->qp.flags & VMCI_QPFLAG_LOCAL) {
+ result = VMCI_SUCCESS;
+
+ if (entry->qp.refCount > 1) {
+ result = QueuePairNotifyPeerLocal(false, handle);
+ /*
+ * We can fail to notify a local queuepair because we can't allocate
+ * memory. We still want to release the entry if that happens, so
+ * don't bail out yet.
+ */
+ }
+ } else {
+ result = VMCIQueuePairDetachHypercall(handle);
+ if (entry->hibernateFailure) {
+ if (result == VMCI_ERROR_NOT_FOUND) {
+ /*
+ * If a queue pair detach failed when entering
+ * hibernation, the guest driver and the device may
+ * disagree on its existence when coming out of
+ * hibernation. The guest driver will regard it as a
+ * non-local queue pair, but the device state is gone,
+ * since the device has been powered off. In this case, we
+ * treat the queue pair as a local queue pair with no
+ * peer.
+ */
+
+ ASSERT(entry->qp.refCount == 1);
+ result = VMCI_SUCCESS;
+ }
+
+ if (result == VMCI_SUCCESS)
+ VMCIQPUnmarkHibernateFailed(entry);
+ }
+ if (result < VMCI_SUCCESS) {
+ /*
+ * We failed to notify a non-local queuepair. That other queuepair
+ * might still be accessing the shared memory, so don't release the
+ * entry yet. It will get cleaned up by VMCIQPGuestEndpoints_Exit()
+ * if necessary (assuming we are going away anyway; otherwise, why
+ * did this fail?).
+ */
+
+ up(&qpGuestEndpoints.mutex);
+ return result;
+ }
+ }
+
+ /*
+ * If we get here then we either failed to notify a local queuepair, or
+ * we succeeded in all cases. Release the entry if required.
+ */
+
+ entry->qp.refCount--;
+ if (entry->qp.refCount == 0) {
+ QueuePairList_RemoveEntry(&qpGuestEndpoints, &entry->qp);
+ }
+
+ /*
+ * If the entry was not removed above, its refCount can change once
+ * we unlock, so read it while the mutex is still held.
+ */
+ if (entry)
+ refCount = entry->qp.refCount;
+
+ up(&qpGuestEndpoints.mutex);
+
+ if (refCount == 0) {
+ QPGuestEndpointDestroy(entry);
+ }
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QueuePairList_AddEntry --
+ *
+ * Adds the given entry to the list. Assumes that the list is locked.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void QueuePairList_AddEntry(struct qp_list *qpList, // IN
+ struct qp_entry *entry) // IN
+{
+ if (entry)
+ list_add(&entry->listItem, &qpList->head);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueuePairAllocGuestWork --
+ *
+ * This function handles the actual allocation of a VMCI queue
+ * pair guest endpoint. Allocates physical pages for the queue
+ * pair. It makes OS dependent calls through generic wrappers.
+ *
+ * Results:
+ * Success or failure.
+ *
+ * Side effects:
+ * Memory is allocated.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQueuePairAllocGuestWork(struct vmci_handle *handle, // IN/OUT
+ struct vmci_queue **produceQ, // OUT
+ uint64_t produceSize, // IN
+ struct vmci_queue **consumeQ, // OUT
+ uint64_t consumeSize, // IN
+ uint32_t peer, // IN
+ uint32_t flags, // IN
+ uint32_t privFlags) // IN
+{
+ const uint64_t numProducePages = CEILING(produceSize, PAGE_SIZE) + 1;
+ const uint64_t numConsumePages = CEILING(consumeSize, PAGE_SIZE) + 1;
+ void *myProduceQ = NULL;
+ void *myConsumeQ = NULL;
+ int result;
+ struct qp_guest_endpoint *queuePairEntry = NULL;
+
+ /*
+ * XXX Check for possible overflow of 'size' arguments when passed to
+ * compat_get_order (after some arithmetic ops).
+ */
+
+ ASSERT(handle && produceQ && consumeQ && (produceSize || consumeSize));
+
+ if (privFlags != VMCI_NO_PRIVILEGE_FLAGS)
+ return VMCI_ERROR_NO_ACCESS;
+
+ down(&qpGuestEndpoints.mutex);
+
+ /* Check whether creation/attachment of a queue pair is currently allowed. */
+ if ((atomic_read(&qpGuestEndpoints.hibernate) == 1) &&
+ !(flags & VMCI_QPFLAG_LOCAL)) {
+ /*
+ * While guest OS is in hibernate state, creating non-local
+ * queue pairs is not allowed after the point where the VMCI
+ * guest driver converted the existing queue pairs to local
+ * ones.
+ */
+
+ result = VMCI_ERROR_UNAVAILABLE;
+ goto error;
+ }
+
+ if ((queuePairEntry = (struct qp_guest_endpoint *)
+ QueuePairList_FindEntry(&qpGuestEndpoints, *handle))) {
+ if (queuePairEntry->qp.flags & VMCI_QPFLAG_LOCAL) {
+ /* Local attach case. */
+ if (queuePairEntry->qp.refCount > 1) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Error attempting to attach more than "
+ "once.\n"));
+ result = VMCI_ERROR_UNAVAILABLE;
+ goto errorKeepEntry;
+ }
+
+ if (queuePairEntry->qp.produceSize != consumeSize ||
+ queuePairEntry->qp.consumeSize != produceSize ||
+ queuePairEntry->qp.flags !=
+ (flags & ~VMCI_QPFLAG_ATTACH_ONLY)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Error mismatched queue pair in local "
+ "attach.\n"));
+ result = VMCI_ERROR_QUEUEPAIR_MISMATCH;
+ goto errorKeepEntry;
+ }
+
+ /*
+ * Do a local attach. We swap the consume and produce queues for the
+ * attacher and deliver an attach event.
+ */
+ result = QueuePairNotifyPeerLocal(true, *handle);
+ if (result < VMCI_SUCCESS)
+ goto errorKeepEntry;
+
+ myProduceQ = queuePairEntry->consumeQ;
+ myConsumeQ = queuePairEntry->produceQ;
+ goto out;
+ }
+ result = VMCI_ERROR_ALREADY_EXISTS;
+ goto errorKeepEntry;
+ }
+
+ myProduceQ = VMCI_AllocQueue(produceSize);
+ if (!myProduceQ) {
+ VMCI_WARNING((LGPFX
+ "Error allocating pages for produce queue.\n"));
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ myConsumeQ = VMCI_AllocQueue(consumeSize);
+ if (!myConsumeQ) {
+ VMCI_WARNING((LGPFX
+ "Error allocating pages for consume queue.\n"));
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ queuePairEntry = QPGuestEndpointCreate(*handle, peer, flags,
+ produceSize, consumeSize,
+ myProduceQ, myConsumeQ);
+ if (!queuePairEntry) {
+ VMCI_WARNING((LGPFX "Error allocating memory in %s.\n",
+ __func__));
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ result = VMCI_AllocPPNSet(myProduceQ, numProducePages, myConsumeQ,
+ numConsumePages, &queuePairEntry->ppnSet);
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX "VMCI_AllocPPNSet failed.\n"));
+ goto error;
+ }
+
+ /*
+ * It's only necessary to notify the host if this queue pair will be
+ * attached to from another context.
+ */
+ if (queuePairEntry->qp.flags & VMCI_QPFLAG_LOCAL) {
+ /* Local create case. */
+ uint32_t contextId = VMCI_GetContextID();
+
+ /*
+ * Enforce similar checks on local queue pairs as we do for regular ones.
+ * The handle's context must match the creator or attacher context id
+ * (here they are both the current context id) and the attach-only flag
+ * cannot exist during create. We also ensure specified peer is this
+ * context or an invalid one.
+ */
+ if (queuePairEntry->qp.handle.context != contextId ||
+ (queuePairEntry->qp.peer != VMCI_INVALID_ID &&
+ queuePairEntry->qp.peer != contextId)) {
+ result = VMCI_ERROR_NO_ACCESS;
+ goto error;
+ }
+
+ if (queuePairEntry->qp.flags & VMCI_QPFLAG_ATTACH_ONLY) {
+ result = VMCI_ERROR_NOT_FOUND;
+ goto error;
+ }
+ } else {
+ result = VMCIQueuePairAllocHypercall(queuePairEntry);
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "VMCIQueuePairAllocHypercall result = %d.\n",
+ result));
+ goto error;
+ }
+ }
+
+ VMCI_InitQueueMutex((struct vmci_queue *)myProduceQ,
+ (struct vmci_queue *)myConsumeQ);
+
+ QueuePairList_AddEntry(&qpGuestEndpoints, &queuePairEntry->qp);
+
+ out:
+ queuePairEntry->qp.refCount++;
+ *handle = queuePairEntry->qp.handle;
+ *produceQ = (struct vmci_queue *)myProduceQ;
+ *consumeQ = (struct vmci_queue *)myConsumeQ;
+
+ /*
+ * We should initialize the queue pair header pages on a local queue pair
+ * create. For non-local queue pairs, the hypervisor initializes the header
+ * pages in the create step.
+ */
+ if ((queuePairEntry->qp.flags & VMCI_QPFLAG_LOCAL) &&
+ queuePairEntry->qp.refCount == 1) {
+ VMCIQueueHeader_Init((*produceQ)->qHeader, *handle);
+ VMCIQueueHeader_Init((*consumeQ)->qHeader, *handle);
+ }
+
+ up(&qpGuestEndpoints.mutex);
+
+ return VMCI_SUCCESS;
+
+ error:
+ up(&qpGuestEndpoints.mutex);
+ if (queuePairEntry) {
+ /* The queues will be freed inside the destroy routine. */
+ QPGuestEndpointDestroy(queuePairEntry);
+ } else {
+ if (myProduceQ) {
+ VMCI_FreeQueue(myProduceQ, produceSize);
+ }
+ if (myConsumeQ) {
+ VMCI_FreeQueue(myConsumeQ, consumeSize);
+ }
+ }
+ return result;
+
+ errorKeepEntry:
+ /* This path should only be used when an existing entry was found. */
+ ASSERT(queuePairEntry->qp.refCount > 0);
+ up(&qpGuestEndpoints.mutex);
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBrokerCreate --
+ *
+ * The first endpoint issuing a queue pair allocation will create the state
+ * of the queue pair in the queue pair broker.
+ *
+ * If the creator is a guest, it will associate a VMX virtual address range
+ * with the queue pair as specified by the pageStore. For compatibility with
+ * older VMX'en, which used a separate step to set the VMX virtual
+ * address range, the range can also be registered later using
+ * VMCIQPBroker_SetPageStore. In that case, a pageStore of NULL should be
+ * used.
+ *
+ * If the creator is the host, a pageStore of NULL should be used as well,
+ * since the host is not able to supply a page store for the queue pair.
+ *
+ * For older VMX and host callers, the queue pair will be created in the
+ * VMCIQPB_CREATED_NO_MEM state, and for current VMX callers, it will be
+ * created in the VMCIQPB_CREATED_MEM state.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * Memory will be allocated, and pages may be pinned.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQPBrokerCreate(struct vmci_handle handle, // IN
+ uint32_t peer, // IN
+ uint32_t flags, // IN
+ uint32_t privFlags, // IN
+ uint64_t produceSize, // IN
+ uint64_t consumeSize, // IN
+ QueuePairPageStore *pageStore, // IN
+ struct vmci_context *context, // IN: Caller
+ VMCIEventReleaseCB wakeupCB, // IN
+ void *clientData, // IN
+ struct qp_broker_entry **ent) // OUT
+{
+ struct qp_broker_entry *entry = NULL;
+ const uint32_t contextId = VMCIContext_GetId(context);
+ bool isLocal = flags & VMCI_QPFLAG_LOCAL;
+ int result;
+ uint64_t guestProduceSize;
+ uint64_t guestConsumeSize;
+
+ /*
+ * Do not create if the caller asked not to.
+ */
+
+ if (flags & VMCI_QPFLAG_ATTACH_ONLY) {
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ /*
+ * Creator's context ID should match handle's context ID or the creator
+ * must allow the context in handle's context ID as the "peer".
+ */
+
+ if (handle.context != contextId && handle.context != peer) {
+ return VMCI_ERROR_NO_ACCESS;
+ }
+
+ if (VMCI_CONTEXT_IS_VM(contextId) && VMCI_CONTEXT_IS_VM(peer)) {
+ return VMCI_ERROR_DST_UNREACHABLE;
+ }
+
+ /*
+ * Creator's context ID for local queue pairs should match the
+ * peer, if a peer is specified.
+ */
+
+ if (isLocal && peer != VMCI_INVALID_ID && contextId != peer) {
+ return VMCI_ERROR_NO_ACCESS;
+ }
+
+ entry = kzalloc(sizeof *entry, GFP_ATOMIC);
+ if (!entry) {
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ if (VMCIContext_GetId(context) == VMCI_HOST_CONTEXT_ID && !isLocal) {
+ /*
+ * The queue pair broker entry stores values from the guest
+ * point of view, so a creating host side endpoint should swap
+ * produce and consume values -- unless it is a local queue
+ * pair, in which case no swapping is necessary, since the local
+ * attacher will swap queues.
+ */
+
+ guestProduceSize = consumeSize;
+ guestConsumeSize = produceSize;
+ } else {
+ guestProduceSize = produceSize;
+ guestConsumeSize = consumeSize;
+ }
+
+ entry->qp.handle = handle;
+ entry->qp.peer = peer;
+ entry->qp.flags = flags;
+ entry->qp.produceSize = guestProduceSize;
+ entry->qp.consumeSize = guestConsumeSize;
+ entry->qp.refCount = 1;
+ entry->createId = contextId;
+ entry->attachId = VMCI_INVALID_ID;
+ entry->state = VMCIQPB_NEW;
+ entry->requireTrustedAttach =
+ (context->privFlags & VMCI_PRIVILEGE_FLAG_RESTRICTED) ? true :
+ false;
+ entry->createdByTrusted =
+ (privFlags & VMCI_PRIVILEGE_FLAG_TRUSTED) ? true : false;
+ entry->vmciPageFiles = false;
+ entry->wakeupCB = wakeupCB;
+ entry->clientData = clientData;
+ entry->produceQ = VMCIHost_AllocQueue(guestProduceSize);
+ if (entry->produceQ == NULL) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+ entry->consumeQ = VMCIHost_AllocQueue(guestConsumeSize);
+ if (entry->consumeQ == NULL) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ VMCI_InitQueueMutex(entry->produceQ, entry->consumeQ);
+
+ INIT_LIST_HEAD(&entry->qp.listItem);
+
+ if (isLocal) {
+ ASSERT(pageStore == NULL);
+
+ entry->localMem =
+ kmalloc(QPE_NUM_PAGES(entry->qp) * PAGE_SIZE, GFP_KERNEL);
+ if (entry->localMem == NULL) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+ entry->state = VMCIQPB_CREATED_MEM;
+ entry->produceQ->qHeader = entry->localMem;
+ entry->consumeQ->qHeader =
+ (struct vmci_queue_header *)((uint8_t *)entry->localMem +
+ (CEILING(entry->qp.produceSize, PAGE_SIZE) + 1) * PAGE_SIZE);
+ VMCIQueueHeader_Init(entry->produceQ->qHeader, handle);
+ VMCIQueueHeader_Init(entry->consumeQ->qHeader, handle);
+ } else if (pageStore) {
+ ASSERT(entry->createId != VMCI_HOST_CONTEXT_ID || isLocal);
+
+ /*
+ * The VMX already initialized the queue pair headers, so no
+ * need for the kernel side to do that.
+ */
+
+ result = VMCIHost_RegisterUserMemory(pageStore,
+ entry->produceQ,
+ entry->consumeQ);
+ if (result < VMCI_SUCCESS) {
+ goto error;
+ }
+ entry->state = VMCIQPB_CREATED_MEM;
+ } else {
+ /*
+ * A create without a pageStore may be either a host side create (in which
+ * case we are waiting for the guest side to supply the memory) or an old
+ * style queue pair create (in which case we will expect a set page store
+ * call as the next step).
+ */
+
+ entry->state = VMCIQPB_CREATED_NO_MEM;
+ }
+
+ QueuePairList_AddEntry(&qpBrokerList, &entry->qp);
+ if (ent != NULL) {
+ *ent = entry;
+ }
+
+ VMCIContext_QueuePairCreate(context, handle);
+
+ return VMCI_SUCCESS;
+
+ error:
+ if (entry != NULL) {
+ if (entry->produceQ != NULL) {
+ VMCIHost_FreeQueue(entry->produceQ, guestProduceSize);
+ }
+ if (entry->consumeQ != NULL) {
+ VMCIHost_FreeQueue(entry->consumeQ, guestConsumeSize);
+ }
+ kfree(entry);
+ }
+ return result;
+}
+
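+/*
+ * For reference: the qp_broker_entry states used above and below are
+ * assumed to be laid out so that each NO_MEM value immediately
+ * precedes its MEM counterpart (a sketch only; the enum itself lives
+ * in a header outside this file):
+ *
+ * enum qp_broker_state {
+ * VMCIQPB_NEW,
+ * VMCIQPB_CREATED_NO_MEM,
+ * VMCIQPB_CREATED_MEM,
+ * VMCIQPB_ATTACHED_NO_MEM,
+ * VMCIQPB_ATTACHED_MEM,
+ * VMCIQPB_SHUTDOWN_NO_MEM,
+ * VMCIQPB_SHUTDOWN_MEM,
+ * };
+ *
+ * VMCIQPBroker_Map() and VMCIQPBroker_Unmap() below step entry->state
+ * between a NO_MEM state and its MEM counterpart by incrementing or
+ * decrementing it, which is only correct under this layout.
+ */
+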
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QueuePairNotifyPeer --
+ *
+ * Enqueues an event datagram to notify the peer VM attached to
+ * the given queue pair handle about attach/detach event by the
+ * given VM.
+ *
+ * Results:
+ * Payload size of datagram enqueued on success, error code otherwise.
+ *
+ * Side effects:
+ * Memory is allocated.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int QueuePairNotifyPeer(bool attach, // IN: attach or detach?
+ struct vmci_handle handle, // IN
+ uint32_t myId, // IN
+ uint32_t peerId) // IN: CID of VM to notify
+{
+ int rv;
+ struct vmci_event_msg *eMsg;
+ struct vmci_event_payld_qp *evPayload;
+ char buf[sizeof *eMsg + sizeof *evPayload];
+
+ if (VMCI_HANDLE_INVALID(handle) || myId == VMCI_INVALID_ID ||
+ peerId == VMCI_INVALID_ID) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ /*
+ * Notification message contains: queue pair handle and
+ * attaching/detaching VM's context id.
+ */
+
+ eMsg = (struct vmci_event_msg *)buf;
+
+ /*
+ * In VMCIContext_EnqueueDatagram() we enforce the upper limit on the
+ * number of pending events from the hypervisor to a given VM;
+ * otherwise a rogue VM could perform an arbitrary number of attach
+ * and detach operations, causing memory pressure in the host kernel.
+ */
+
+ /* Clear out any garbage. */
+ memset(eMsg, 0, sizeof buf);
+
+ eMsg->hdr.dst = VMCI_MAKE_HANDLE(peerId, VMCI_EVENT_HANDLER);
+ eMsg->hdr.src = VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_CONTEXT_RESOURCE_ID);
+ eMsg->hdr.payloadSize =
+ sizeof *eMsg + sizeof *evPayload - sizeof eMsg->hdr;
+ eMsg->eventData.event =
+ attach ? VMCI_EVENT_QP_PEER_ATTACH : VMCI_EVENT_QP_PEER_DETACH;
+ evPayload = VMCIEventMsgPayload(eMsg);
+ evPayload->handle = handle;
+ evPayload->peerId = myId;
+
+ rv = VMCIDatagram_Dispatch(VMCI_HYPERVISOR_CONTEXT_ID,
+ (struct vmci_datagram *)eMsg, false);
+ if (rv < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to enqueue QueuePair %s event datagram for "
+ "context (ID=0x%x).\n",
+ attach ? "ATTACH" : "DETACH", peerId));
+ }
+
+ return rv;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBrokerAttach --
+ *
+ * The second endpoint issuing a queue pair allocation will attach to the
+ * queue pair registered with the queue pair broker.
+ *
+ * If the attacher is a guest, it will associate a VMX virtual address range
+ * with the queue pair as specified by the pageStore. At this point, the
+ * already attached host endpoint may start using the queue pair, and an
+ * attach event is sent to it. For compatibility with older VMX'en, which
+ * used a separate step to set the VMX virtual address range, the
+ * range can also be registered later using VMCIQPBroker_SetPageStore. In
+ * that case, a pageStore of NULL should be used, and the attach event will
+ * be generated once the actual page store has been set.
+ *
+ * If the attacher is the host, a pageStore of NULL should be used as well,
+ * since the page store information is already set by the guest.
+ *
+ * For new VMX and host callers, the queue pair will be moved to the
+ * VMCIQPB_ATTACHED_MEM state, and for older VMX callers, it will be
+ * moved to the VMCIQPB_ATTACHED_NO_MEM state.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * Memory will be allocated, and pages may be pinned.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQPBrokerAttach(struct qp_broker_entry *entry, // IN
+ uint32_t peer, // IN
+ uint32_t flags, // IN
+ uint32_t privFlags, // IN
+ uint64_t produceSize, // IN
+ uint64_t consumeSize, // IN
+ QueuePairPageStore *pageStore, // IN/OUT
+ struct vmci_context *context, // IN: Caller
+ VMCIEventReleaseCB wakeupCB, // IN
+ void *clientData, // IN
+ struct qp_broker_entry **ent) // OUT
+{
+ const uint32_t contextId = VMCIContext_GetId(context);
+ bool isLocal = flags & VMCI_QPFLAG_LOCAL;
+ int result;
+
+ if (entry->state != VMCIQPB_CREATED_NO_MEM &&
+ entry->state != VMCIQPB_CREATED_MEM)
+ return VMCI_ERROR_UNAVAILABLE;
+
+ if (isLocal) {
+ if (!(entry->qp.flags & VMCI_QPFLAG_LOCAL) ||
+ contextId != entry->createId) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+ } else if (contextId == entry->createId || contextId == entry->attachId) {
+ return VMCI_ERROR_ALREADY_EXISTS;
+ }
+
+ ASSERT(entry->qp.refCount < 2);
+ ASSERT(entry->attachId == VMCI_INVALID_ID);
+
+ if (VMCI_CONTEXT_IS_VM(contextId)
+ && VMCI_CONTEXT_IS_VM(entry->createId))
+ return VMCI_ERROR_DST_UNREACHABLE;
+
+ /*
+ * If we are attaching from a restricted context then the queuepair
+ * must have been created by a trusted endpoint.
+ */
+
+ if ((context->privFlags & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
+ !entry->createdByTrusted)
+ return VMCI_ERROR_NO_ACCESS;
+
+ /*
+ * If we are attaching to a queuepair that was created by a restricted
+ * context then we must be trusted.
+ */
+
+ if (entry->requireTrustedAttach &&
+ (!(privFlags & VMCI_PRIVILEGE_FLAG_TRUSTED)))
+ return VMCI_ERROR_NO_ACCESS;
+
+ /*
+ * If the creator specifies VMCI_INVALID_ID in the "peer" field, the
+ * access control check is not performed.
+ */
+
+ if (entry->qp.peer != VMCI_INVALID_ID && entry->qp.peer != contextId)
+ return VMCI_ERROR_NO_ACCESS;
+
+ if (entry->createId == VMCI_HOST_CONTEXT_ID) {
+ /*
+ * Do not attach if the caller doesn't support Host Queue Pairs
+ * and a host created this queue pair.
+ */
+
+ if (!VMCIContext_SupportsHostQP(context)) {
+ return VMCI_ERROR_INVALID_RESOURCE;
+ }
+ } else if (contextId == VMCI_HOST_CONTEXT_ID) {
+ struct vmci_context *createContext;
+ bool supportsHostQP;
+
+ /*
+ * Do not attach a host to a user created queue pair if that
+ * user doesn't support host queue pair end points.
+ */
+
+ createContext = VMCIContext_Get(entry->createId);
+ supportsHostQP = VMCIContext_SupportsHostQP(createContext);
+ VMCIContext_Release(createContext);
+
+ if (!supportsHostQP) {
+ return VMCI_ERROR_INVALID_RESOURCE;
+ }
+ }
+
+ if (entry->qp.flags != (flags & ~VMCI_QPFLAG_ATTACH_ONLY))
+ return VMCI_ERROR_QUEUEPAIR_MISMATCH;
+
+ if (contextId != VMCI_HOST_CONTEXT_ID) {
+ /*
+ * The queue pair broker entry stores values from the guest
+ * point of view, so an attaching guest should match the values
+ * stored in the entry.
+ */
+
+ if (entry->qp.produceSize != produceSize ||
+ entry->qp.consumeSize != consumeSize) {
+ return VMCI_ERROR_QUEUEPAIR_MISMATCH;
+ }
+ } else if (entry->qp.produceSize != consumeSize ||
+ entry->qp.consumeSize != produceSize) {
+ return VMCI_ERROR_QUEUEPAIR_MISMATCH;
+ }
+
+ if (contextId != VMCI_HOST_CONTEXT_ID) {
+ /*
+ * A guest that attaches to a queue pair supplies the backing memory.
+ * If this is a pre-NOVMVM vmx, the backing memory will be supplied by
+ * calling VMCIQPBroker_SetPageStore() following the return of the
+ * VMCIQPBroker_Alloc() call. If it is a vmx of version NOVMVM or later,
+ * the page store must be supplied as part of the VMCIQPBroker_Alloc call.
+ * Under all circumstances, the initially created queue pair must not
+ * have any memory associated with it already.
+ */
+
+ if (entry->state != VMCIQPB_CREATED_NO_MEM) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (pageStore != NULL) {
+ /*
+ * Patch up host state to point to guest supplied memory. The VMX
+ * already initialized the queue pair headers, so no need for the
+ * kernel side to do that.
+ */
+
+ result = VMCIHost_RegisterUserMemory(pageStore,
+ entry->produceQ,
+ entry->consumeQ);
+ if (result < VMCI_SUCCESS) {
+ return result;
+ }
+ entry->state = VMCIQPB_ATTACHED_MEM;
+ } else {
+ entry->state = VMCIQPB_ATTACHED_NO_MEM;
+ }
+ } else if (entry->state == VMCIQPB_CREATED_NO_MEM) {
+ /*
+ * The host side is attempting to attach to a queue pair that doesn't have
+ * any memory associated with it. This must be a pre NOVMVM vmx that hasn't
+ * set the page store information yet, or a quiesced VM.
+ */
+
+ return VMCI_ERROR_UNAVAILABLE;
+ } else {
+ /*
+ * The host side has successfully attached to a queue pair.
+ */
+ entry->state = VMCIQPB_ATTACHED_MEM;
+ }
+
+ if (entry->state == VMCIQPB_ATTACHED_MEM) {
+ result =
+ QueuePairNotifyPeer(true, entry->qp.handle, contextId,
+ entry->createId);
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to notify peer (ID=0x%x) of attach to queue "
+ "pair (handle=0x%x:0x%x).\n",
+ entry->createId,
+ entry->qp.handle.context,
+ entry->qp.handle.resource));
+ }
+ }
+
+ entry->attachId = contextId;
+ entry->qp.refCount++;
+ if (wakeupCB) {
+ ASSERT(!entry->wakeupCB);
+ entry->wakeupCB = wakeupCB;
+ entry->clientData = clientData;
+ }
+
+ /*
+ * When attaching to local queue pairs, the context already has
+ * an entry tracking the queue pair, so don't add another one.
+ */
+
+ if (!isLocal) {
+ VMCIContext_QueuePairCreate(context, entry->qp.handle);
+ } else {
+ ASSERT(VMCIContext_QueuePairExists(context, entry->qp.handle));
+ }
+
+ if (ent != NULL)
+ *ent = entry;
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBrokerAllocInt --
+ *
+ * QueuePair_Alloc for use when setting up queue pair endpoints
+ * on the host. Like QueuePair_Alloc, but returns a pointer to
+ * the struct qp_broker_entry on success.
+ *
+ * Results:
+ * Success or failure.
+ *
+ * Side effects:
+ * Memory may be allocated.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQPBrokerAllocInt(struct vmci_handle handle, // IN
+ uint32_t peer, // IN
+ uint32_t flags, // IN
+ uint32_t privFlags, // IN
+ uint64_t produceSize, // IN
+ uint64_t consumeSize, // IN
+ QueuePairPageStore *pageStore, // IN/OUT
+ struct vmci_context *context, // IN: Caller
+ VMCIEventReleaseCB wakeupCB, // IN
+ void *clientData, // IN
+ struct qp_broker_entry **ent, // OUT
+ bool *swap) // OUT: swap queues?
+{
+ const uint32_t contextId = VMCIContext_GetId(context);
+ bool create;
+ struct qp_broker_entry *entry;
+ bool isLocal = flags & VMCI_QPFLAG_LOCAL;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle) ||
+ (flags & ~VMCI_QP_ALL_FLAGS) || isLocal ||
+ !(produceSize || consumeSize) ||
+ !context || contextId == VMCI_INVALID_ID ||
+ handle.context == VMCI_INVALID_ID) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (pageStore && !VMCI_QP_PAGESTORE_IS_WELLFORMED(pageStore)) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ /*
+ * In the initial argument check, we ensure that non-vmkernel hosts
+ * are not allowed to create local queue pairs.
+ */
+
+ ASSERT(!isLocal);
+
+ down(&qpBrokerList.mutex);
+
+ if (!isLocal && VMCIContext_QueuePairExists(context, handle)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Context (ID=0x%x) already attached to queue pair "
+ "(handle=0x%x:0x%x).\n", contextId,
+ handle.context, handle.resource));
+ up(&qpBrokerList.mutex);
+ return VMCI_ERROR_ALREADY_EXISTS;
+ }
+
+ entry = (struct qp_broker_entry *)
+ QueuePairList_FindEntry(&qpBrokerList, handle);
+ if (!entry) {
+ create = true;
+ result =
+ VMCIQPBrokerCreate(handle, peer, flags, privFlags,
+ produceSize, consumeSize, pageStore,
+ context, wakeupCB, clientData, ent);
+ } else {
+ create = false;
+ result =
+ VMCIQPBrokerAttach(entry, peer, flags, privFlags,
+ produceSize, consumeSize, pageStore,
+ context, wakeupCB, clientData, ent);
+ }
+
+ up(&qpBrokerList.mutex);
+
+ if (swap) {
+ *swap = (contextId == VMCI_HOST_CONTEXT_ID) &&
+ !(create && isLocal);
+ }
+
+ return result;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIQueuePairAllocHostWork --
+ *
+ * This function implements the kernel API for allocating a queue
+ * pair.
+ *
+ * Results:
+ * VMCI_SUCCESS on success and appropriate failure code otherwise.
+ *
+ * Side effects:
+ * May allocate memory.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static int VMCIQueuePairAllocHostWork(struct vmci_handle *handle, // IN/OUT
+ struct vmci_queue **produceQ, // OUT
+ uint64_t produceSize, // IN
+ struct vmci_queue **consumeQ, // OUT
+ uint64_t consumeSize, // IN
+ uint32_t peer, // IN
+ uint32_t flags, // IN
+ uint32_t privFlags, // IN
+ VMCIEventReleaseCB wakeupCB, // IN
+ void *clientData) // IN
+{
+ struct vmci_context *context;
+ struct qp_broker_entry *entry;
+ int result;
+ bool swap;
+
+ if (VMCI_HANDLE_INVALID(*handle)) {
+ uint32_t resourceID = VMCIResource_GetID(VMCI_HOST_CONTEXT_ID);
+ if (resourceID == VMCI_INVALID_ID) {
+ return VMCI_ERROR_NO_HANDLE;
+ }
+ *handle = VMCI_MAKE_HANDLE(VMCI_HOST_CONTEXT_ID, resourceID);
+ }
+
+ context = VMCIContext_Get(VMCI_HOST_CONTEXT_ID);
+ ASSERT(context);
+
+ entry = NULL;
+ result =
+ VMCIQPBrokerAllocInt(*handle, peer, flags, privFlags,
+ produceSize, consumeSize, NULL, context,
+ wakeupCB, clientData, &entry, &swap);
+ if (result == VMCI_SUCCESS) {
+ if (swap) {
+ /*
+ * If this is a local queue pair, the attacher will swap around produce
+ * and consume queues.
+ */
+
+ *produceQ = entry->consumeQ;
+ *consumeQ = entry->produceQ;
+ } else {
+ *produceQ = entry->produceQ;
+ *consumeQ = entry->consumeQ;
+ }
+ } else {
+ *handle = VMCI_INVALID_HANDLE;
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "queue pair broker failed to alloc (result=%d).\n",
+ result));
+ }
+ VMCIContext_Release(context);
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueuePair_Alloc --
+ *
+ * Allocates a VMCI QueuePair. Only checks validity of input
+ * arguments. The real work is done in the host or guest
+ * specific function.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQueuePair_Alloc(struct vmci_handle *handle, // IN/OUT
+ struct vmci_queue **produceQ, // OUT
+ uint64_t produceSize, // IN
+ struct vmci_queue **consumeQ, // OUT
+ uint64_t consumeSize, // IN
+ uint32_t peer, // IN
+ uint32_t flags, // IN
+ uint32_t privFlags, // IN
+ bool guestEndpoint, // IN
+ VMCIEventReleaseCB wakeupCB, // IN
+ void *clientData) // IN
+{
+ if (!handle || !produceQ || !consumeQ || (!produceSize && !consumeSize)
+ || (flags & ~VMCI_QP_ALL_FLAGS)) {
+ VMCI_DBG("Bad args");
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (guestEndpoint) {
+ return VMCIQueuePairAllocGuestWork(handle, produceQ,
+ produceSize, consumeQ,
+ consumeSize, peer,
+ flags, privFlags);
+ } else {
+ return VMCIQueuePairAllocHostWork(handle, produceQ,
+ produceSize, consumeQ,
+ consumeSize, peer, flags,
+ privFlags, wakeupCB,
+ clientData);
+ }
+}
+
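+/*
+ * Illustrative sketch (not part of the driver): a host-side kernel
+ * client could allocate a queue pair and later detach as follows.
+ * The sizes, peer, and flags below are hypothetical example values.
+ *
+ * struct vmci_handle handle = VMCI_INVALID_HANDLE;
+ * struct vmci_queue *produceQ;
+ * struct vmci_queue *consumeQ;
+ * int rv;
+ *
+ * rv = VMCIQueuePair_Alloc(&handle, &produceQ, PAGE_SIZE,
+ * &consumeQ, PAGE_SIZE, VMCI_INVALID_ID,
+ * 0, VMCI_NO_PRIVILEGE_FLAGS,
+ * false, NULL, NULL);
+ * if (rv < VMCI_SUCCESS)
+ * return rv;
+ *
+ * ... use the queues via the queue pair accessors ...
+ *
+ * VMCIQueuePair_Detach(handle, false);
+ */
+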
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIQueuePairDetachHostWork --
+ *
+ * This function implements the host kernel API for detaching from
+ * a queue pair.
+ *
+ * Results:
+ * VMCI_SUCCESS on success and appropriate failure code otherwise.
+ *
+ * Side effects:
+ * May deallocate memory.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static int VMCIQueuePairDetachHostWork(struct vmci_handle handle) // IN
+{
+ int result;
+ struct vmci_context *context;
+
+ context = VMCIContext_Get(VMCI_HOST_CONTEXT_ID);
+
+ result = VMCIQPBroker_Detach(handle, context);
+
+ VMCIContext_Release(context);
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueuePair_Detach --
+ *
+ * Detaches from a VMCI QueuePair. Only checks validity of input argument.
+ * Real work is done in the host or guest specific function.
+ *
+ * Results:
+ * Success or failure.
+ *
+ * Side effects:
+ * Memory is freed.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQueuePair_Detach(struct vmci_handle handle, // IN
+ bool guestEndpoint) // IN
+{
+ if (VMCI_HANDLE_INVALID(handle))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (guestEndpoint) {
+ return VMCIQueuePairDetachGuestWork(handle);
+ } else {
+ return VMCIQueuePairDetachHostWork(handle);
+ }
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QueuePairList_Init --
+ *
+ * Initializes the list of QueuePairs.
+ *
+ * Results:
+ * Success or failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline int QueuePairList_Init(struct qp_list *qpList) // IN
+{
+ INIT_LIST_HEAD(&qpList->head);
+ atomic_set(&qpList->hibernate, 0);
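+ /* The list mutex is a binary semaphore, taken with down()/up(). */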
+ sema_init(&qpList->mutex, 1);
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QueuePairList_Destroy --
+ *
+ * Destroy the list's mutex.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline void QueuePairList_Destroy(struct qp_list *qpList)
+{
+ /* VMCIMutex_Destroy(&qpList->mutex); NOOP. XXX: CHECK THIS */
+ INIT_LIST_HEAD(&qpList->head);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QueuePairList_GetHead --
+ *
+ * Returns the entry from the head of the list. Assumes that the list is
+ * locked.
+ *
+ * Results:
+ * Pointer to the head entry, or NULL if the list is empty.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static struct qp_entry *QueuePairList_GetHead(struct qp_list *qpList)
+{
+ if (!list_empty(&qpList->head)) {
+ struct qp_entry *entry =
+ list_first_entry(&qpList->head, struct qp_entry,
+ listItem);
+ return entry;
+ }
+
+ return NULL;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBroker_Init --
+ *
+ * Initializes the queue pair broker state.
+ *
+ * Results:
+ * Success or failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPBroker_Init(void)
+{
+ return QueuePairList_Init(&qpBrokerList);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBroker_Exit --
+ *
+ * Destroys the queue pair broker state.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIQPBroker_Exit(void)
+{
+ struct qp_broker_entry *entry;
+
+ down(&qpBrokerList.mutex);
+
+ while ((entry = (struct qp_broker_entry *)
+ QueuePairList_GetHead(&qpBrokerList))) {
+ QueuePairList_RemoveEntry(&qpBrokerList, &entry->qp);
+ kfree(entry);
+ }
+
+ up(&qpBrokerList.mutex);
+ QueuePairList_Destroy(&qpBrokerList);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBroker_Alloc --
+ *
+ * Requests that a queue pair be allocated with the VMCI queue
+ * pair broker. Allocates a queue pair entry if one does not
+ * exist. Attaches to one if it exists, and retrieves the page
+ * files backing that QueuePair. The queue pair broker lock is
+ * acquired internally.
+ *
+ * Results:
+ * Success or failure.
+ *
+ * Side effects:
+ * Memory may be allocated.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPBroker_Alloc(struct vmci_handle handle, // IN
+ uint32_t peer, // IN
+ uint32_t flags, // IN
+ uint32_t privFlags, // IN
+ uint64_t produceSize, // IN
+ uint64_t consumeSize, // IN
+ QueuePairPageStore *pageStore, // IN/OUT
+ struct vmci_context *context) // IN: Caller
+{
+ return VMCIQPBrokerAllocInt(handle, peer, flags, privFlags,
+ produceSize, consumeSize,
+ pageStore, context, NULL, NULL, NULL, NULL);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBroker_SetPageStore --
+ *
+ * VMX'en with versions lower than VMCI_VERSION_NOVMVM use a separate
+ * step to add the UVAs of the VMX mapping of the queue pair. This function
+ * provides backwards compatibility with such VMX'en, and takes care of
+ * registering the page store for a queue pair previously allocated by the
+ * VMX during create or attach. This function will move the queue pair state
+ * to either from VMCIQBP_CREATED_NO_MEM to VMCIQBP_CREATED_MEM or
+ * VMCIQBP_ATTACHED_NO_MEM to VMCIQBP_ATTACHED_MEM. If moving to the
+ * attached state with memory, the queue pair is ready to be used by the
+ * host peer, and an attached event will be generated.
+ *
+ * The queue pair broker lock is acquired internally.
+ *
+ * This function is only used by the hosted platform, since there is no
+ * issue with backwards compatibility for vmkernel.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * Pages may get pinned.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPBroker_SetPageStore(struct vmci_handle handle, // IN
+ uint64_t produceUVA, // IN
+ uint64_t consumeUVA, // IN
+ struct vmci_context *context) // IN: Caller
+{
+ struct qp_broker_entry *entry;
+ int result;
+ const uint32_t contextId = VMCIContext_GetId(context);
+
+ if (VMCI_HANDLE_INVALID(handle) || !context
+ || contextId == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /*
+ * We only support guest to host queue pairs, so the VMX must
+ * supply UVAs for the mapped page files.
+ */
+
+ if (produceUVA == 0 || consumeUVA == 0)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ down(&qpBrokerList.mutex);
+
+ if (!VMCIContext_QueuePairExists(context, handle)) {
+ VMCI_WARNING((LGPFX
+ "Context (ID=0x%x) not attached to queue pair "
+ "(handle=0x%x:0x%x).\n", contextId,
+ handle.context, handle.resource));
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ entry = (struct qp_broker_entry *)
+ QueuePairList_FindEntry(&qpBrokerList, handle);
+ if (!entry) {
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ /*
+ * If I'm the owner then I can set the page store.
+ *
+ * Or, if a host created the QueuePair and I'm the attached peer
+ * then I can set the page store.
+ */
+
+ if (entry->createId != contextId &&
+ (entry->createId != VMCI_HOST_CONTEXT_ID ||
+ entry->attachId != contextId)) {
+ /* XXX: Log? */
+ result = VMCI_ERROR_QUEUEPAIR_NOTOWNER;
+ goto out;
+ }
+
+ if (entry->state != VMCIQPB_CREATED_NO_MEM &&
+ entry->state != VMCIQPB_ATTACHED_NO_MEM) {
+ /* XXX: Log? */
+ result = VMCI_ERROR_UNAVAILABLE;
+ goto out;
+ }
+
+ result = VMCIHost_GetUserMemory(produceUVA, consumeUVA,
+ entry->produceQ, entry->consumeQ);
+ if (result < VMCI_SUCCESS)
+ goto out;
+
+ result = VMCIHost_MapQueueHeaders(entry->produceQ, entry->consumeQ);
+ if (result < VMCI_SUCCESS) {
+ VMCIHost_ReleaseUserMemory(entry->produceQ, entry->consumeQ);
+ goto out;
+ }
+
+ if (entry->state == VMCIQPB_CREATED_NO_MEM) {
+ entry->state = VMCIQPB_CREATED_MEM;
+ } else {
+ ASSERT(entry->state == VMCIQPB_ATTACHED_NO_MEM);
+ entry->state = VMCIQPB_ATTACHED_MEM;
+ }
+ entry->vmciPageFiles = true;
+
+ if (entry->state == VMCIQPB_ATTACHED_MEM) {
+ result =
+ QueuePairNotifyPeer(true, handle, contextId,
+ entry->createId);
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Failed to notify peer (ID=0x%x) of attach to queue "
+ "pair (handle=0x%x:0x%x).\n",
+ entry->createId,
+ entry->qp.handle.context,
+ entry->qp.handle.resource));
+ }
+ }
+
+ result = VMCI_SUCCESS;
+ out:
+ up(&qpBrokerList.mutex);
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QueuePairResetSavedHeaders --
+ *
+ * Resets saved queue headers for the given QP broker
+ * entry. Should be used when guest memory becomes available
+ * again, or the guest detaches.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void QueuePairResetSavedHeaders(struct qp_broker_entry *entry) // IN
+{
+ entry->produceQ->savedHeader = NULL;
+ entry->consumeQ->savedHeader = NULL;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBroker_Detach --
+ *
+ * The main entry point for detaching from a queue pair registered with the
+ * queue pair broker. If more than one endpoint is attached to the queue
+ * pair, the first endpoint will mainly decrement a reference count and
+ * generate a notification to its peer. The last endpoint will clean up
+ * the queue pair state registered with the broker.
+ *
+ * When a guest endpoint detaches, it will unmap and unregister the guest
+ * memory backing the queue pair. If the host is still attached, it will
+ * no longer be able to access the queue pair content.
+ *
+ * If the queue pair is already in a state where there is no memory
+ * registered for the queue pair (any *_NO_MEM state), it will transition to
+ * the VMCIQPB_SHUTDOWN_NO_MEM state. This will also happen, if a guest
+ * endpoint is the first of two endpoints to detach. If the host endpoint is
+ * the first out of two to detach, the queue pair will move to the
+ * VMCIQPB_SHUTDOWN_MEM state.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * Memory may be freed, and pages may be unpinned.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPBroker_Detach(struct vmci_handle handle, // IN
+ struct vmci_context *context) // IN
+{
+ struct qp_broker_entry *entry;
+ const uint32_t contextId = VMCIContext_GetId(context);
+ uint32_t peerId;
+ bool isLocal = false;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle) || !context
+ || contextId == VMCI_INVALID_ID) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ down(&qpBrokerList.mutex);
+
+ if (!VMCIContext_QueuePairExists(context, handle)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Context (ID=0x%x) not attached to queue pair "
+ "(handle=0x%x:0x%x).\n", contextId,
+ handle.context, handle.resource));
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ entry = (struct qp_broker_entry *)
+ QueuePairList_FindEntry(&qpBrokerList, handle);
+ if (!entry) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Context (ID=0x%x) reports being attached to queue pair "
+ "(handle=0x%x:0x%x) that isn't present in broker.\n",
+ contextId, handle.context, handle.resource));
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ if (contextId != entry->createId && contextId != entry->attachId) {
+ result = VMCI_ERROR_QUEUEPAIR_NOTATTACHED;
+ goto out;
+ }
+
+ if (contextId == entry->createId) {
+ peerId = entry->attachId;
+ entry->createId = VMCI_INVALID_ID;
+ } else {
+ peerId = entry->createId;
+ entry->attachId = VMCI_INVALID_ID;
+ }
+ entry->qp.refCount--;
+
+ isLocal = entry->qp.flags & VMCI_QPFLAG_LOCAL;
+
+ if (contextId != VMCI_HOST_CONTEXT_ID) {
+ int result;
+ bool headersMapped;
+
+ ASSERT(!isLocal);
+
+ /*
+ * Pre NOVMVM vmx'en may detach from a queue pair before setting the page
+ * store, and in that case there is no user memory to detach from. Also,
+ * more recent VMX'en may detach from a queue pair in the quiesced state.
+ */
+
+ VMCI_AcquireQueueMutex(entry->produceQ);
+ headersMapped = entry->produceQ->qHeader
+ || entry->consumeQ->qHeader;
+ if (QPBROKERSTATE_HAS_MEM(entry)) {
+ result = VMCIHost_UnmapQueueHeaders(INVALID_VMCI_GUEST_MEM_ID,
+ entry->produceQ,
+ entry->consumeQ);
+ if (result < VMCI_SUCCESS)
+ VMCI_WARNING((LGPFX
+ "Failed to unmap queue headers for queue pair "
+ "(handle=0x%x:0x%x,result=%d).\n",
+ handle.context,
+ handle.resource, result));
+
+ if (entry->vmciPageFiles) {
+ VMCIHost_ReleaseUserMemory(entry->produceQ,
+ entry->consumeQ);
+ } else {
+ VMCIHost_UnregisterUserMemory(entry->produceQ,
+ entry->consumeQ);
+ }
+ }
+
+ if (!headersMapped)
+ QueuePairResetSavedHeaders(entry);
+
+ VMCI_ReleaseQueueMutex(entry->produceQ);
+
+ if (!headersMapped && entry->wakeupCB)
+ entry->wakeupCB(entry->clientData);
+
+ } else {
+ if (entry->wakeupCB) {
+ entry->wakeupCB = NULL;
+ entry->clientData = NULL;
+ }
+ }
+
+ if (entry->qp.refCount == 0) {
+ QueuePairList_RemoveEntry(&qpBrokerList, &entry->qp);
+
+ if (isLocal) {
+ kfree(entry->localMem);
+ }
+ VMCI_CleanupQueueMutex(entry->produceQ, entry->consumeQ);
+ VMCIHost_FreeQueue(entry->produceQ, entry->qp.produceSize);
+ VMCIHost_FreeQueue(entry->consumeQ, entry->qp.consumeSize);
+ kfree(entry);
+
+ VMCIContext_QueuePairDestroy(context, handle);
+ } else {
+ ASSERT(peerId != VMCI_INVALID_ID);
+ QueuePairNotifyPeer(false, handle, contextId, peerId);
+ if (contextId == VMCI_HOST_CONTEXT_ID
+ && QPBROKERSTATE_HAS_MEM(entry)) {
+ entry->state = VMCIQPB_SHUTDOWN_MEM;
+ } else {
+ entry->state = VMCIQPB_SHUTDOWN_NO_MEM;
+ }
+
+ if (!isLocal)
+ VMCIContext_QueuePairDestroy(context, handle);
+
+ }
+ result = VMCI_SUCCESS;
+ out:
+ up(&qpBrokerList.mutex);
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBroker_Map --
+ *
+ * Establishes the necessary mappings for a queue pair given a
+ * reference to the queue pair guest memory. This is usually
+ * called when a guest is unquiesced and the VMX is allowed to
+ * map guest memory once again.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * Memory may be allocated, and pages may be pinned.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPBroker_Map(struct vmci_handle handle, // IN
+ struct vmci_context *context, // IN
+ uint64_t guestMem) // IN
+{
+ struct qp_broker_entry *entry;
+ const uint32_t contextId = VMCIContext_GetId(context);
+ bool isLocal = false;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle) || !context
+ || contextId == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ down(&qpBrokerList.mutex);
+
+ if (!VMCIContext_QueuePairExists(context, handle)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Context (ID=0x%x) not attached to queue pair "
+ "(handle=0x%x:0x%x).\n", contextId,
+ handle.context, handle.resource));
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ entry = (struct qp_broker_entry *)
+ QueuePairList_FindEntry(&qpBrokerList, handle);
+ if (!entry) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Context (ID=0x%x) reports being attached to queue pair "
+ "(handle=0x%x:0x%x) that isn't present in broker.\n",
+ contextId, handle.context, handle.resource));
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ if (contextId != entry->createId && contextId != entry->attachId) {
+ result = VMCI_ERROR_QUEUEPAIR_NOTATTACHED;
+ goto out;
+ }
+
+ isLocal = entry->qp.flags & VMCI_QPFLAG_LOCAL;
+ result = VMCI_SUCCESS;
+
+ if (contextId != VMCI_HOST_CONTEXT_ID) {
+ QueuePairPageStore pageStore;
+
+ ASSERT(entry->state == VMCIQPB_CREATED_NO_MEM ||
+ entry->state == VMCIQPB_SHUTDOWN_NO_MEM ||
+ entry->state == VMCIQPB_ATTACHED_NO_MEM);
+ ASSERT(!isLocal);
+
+ pageStore.pages = guestMem;
+ pageStore.len = QPE_NUM_PAGES(entry->qp);
+
+ VMCI_AcquireQueueMutex(entry->produceQ);
+ QueuePairResetSavedHeaders(entry);
+ result =
+ VMCIHost_RegisterUserMemory(&pageStore,
+ entry->produceQ,
+ entry->consumeQ);
+ VMCI_ReleaseQueueMutex(entry->produceQ);
+ if (result == VMCI_SUCCESS) {
+ /* Move state from *_NO_MEM to *_MEM */
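+ /* Relies on the NO_MEM/MEM enum layout sketched earlier. */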
+
+ entry->state++;
+
+ ASSERT(entry->state == VMCIQPB_CREATED_MEM ||
+ entry->state == VMCIQPB_SHUTDOWN_MEM ||
+ entry->state == VMCIQPB_ATTACHED_MEM);
+
+ if (entry->wakeupCB)
+ entry->wakeupCB(entry->clientData);
+ }
+ }
+
+ out:
+ up(&qpBrokerList.mutex);
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QueuePairSaveHeaders --
+ *
+ * Saves a snapshot of the queue headers for the given QP broker
+ * entry. Should be used when guest memory is unmapped.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code if guest memory
+ * can't be accessed.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int QueuePairSaveHeaders(struct qp_broker_entry *entry) // IN
+{
+ int result;
+
+ if (NULL == entry->produceQ->qHeader
+ || NULL == entry->consumeQ->qHeader) {
+ result =
+ VMCIHost_MapQueueHeaders(entry->produceQ, entry->consumeQ);
+ if (result < VMCI_SUCCESS)
+ return result;
+ }
+
+ memcpy(&entry->savedProduceQ, entry->produceQ->qHeader,
+ sizeof entry->savedProduceQ);
+ entry->produceQ->savedHeader = &entry->savedProduceQ;
+ memcpy(&entry->savedConsumeQ, entry->consumeQ->qHeader,
+ sizeof entry->savedConsumeQ);
+ entry->consumeQ->savedHeader = &entry->savedConsumeQ;
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPBroker_Unmap --
+ *
+ * Removes all references to the guest memory of a given queue pair, and
+ * will move the queue pair from state *_MEM to *_NO_MEM. It is usually
+ * called when a VM is being quiesced, where access to guest memory should
+ * be avoided.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * Memory may be freed, and pages may be unpinned.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPBroker_Unmap(struct vmci_handle handle, // IN
+ struct vmci_context *context, // IN
+ uint32_t gid) // IN
+{
+ struct qp_broker_entry *entry;
+ const uint32_t contextId = VMCIContext_GetId(context);
+ bool isLocal = false;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle) || !context
+ || contextId == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ down(&qpBrokerList.mutex);
+
+ if (!VMCIContext_QueuePairExists(context, handle)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Context (ID=0x%x) not attached to queue pair "
+ "(handle=0x%x:0x%x).\n", contextId,
+ handle.context, handle.resource));
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ entry = (struct qp_broker_entry *)
+ QueuePairList_FindEntry(&qpBrokerList, handle);
+ if (!entry) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Context (ID=0x%x) reports being attached to queue pair "
+ "(handle=0x%x:0x%x) that isn't present in broker.\n",
+ contextId, handle.context, handle.resource));
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ if (contextId != entry->createId && contextId != entry->attachId) {
+ result = VMCI_ERROR_QUEUEPAIR_NOTATTACHED;
+ goto out;
+ }
+
+ isLocal = entry->qp.flags & VMCI_QPFLAG_LOCAL;
+
+ if (contextId != VMCI_HOST_CONTEXT_ID) {
+ ASSERT(entry->state != VMCIQPB_CREATED_NO_MEM &&
+ entry->state != VMCIQPB_SHUTDOWN_NO_MEM &&
+ entry->state != VMCIQPB_ATTACHED_NO_MEM);
+ ASSERT(!isLocal);
+
+ VMCI_AcquireQueueMutex(entry->produceQ);
+ result = QueuePairSaveHeaders(entry);
+ if (result < VMCI_SUCCESS)
+ VMCI_WARNING((LGPFX
+ "Failed to save queue headers for queue pair "
+ "(handle=0x%x:0x%x,result=%d).\n",
+ handle.context, handle.resource, result));
+
+ VMCIHost_UnmapQueueHeaders(gid,
+ entry->produceQ, entry->consumeQ);
+
+ /*
+ * On hosted, when we unmap queue pairs, the VMX will also
+ * unmap the guest memory, so we invalidate the previously
+ * registered memory. If the queue pair is mapped again at a
+ * later point in time, we will need to reregister the user
+ * memory with a possibly new user VA.
+ */
+
+ VMCIHost_UnregisterUserMemory(entry->produceQ, entry->consumeQ);
+
+ /*
+ * Move state from *_MEM to *_NO_MEM.
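+ * Relies on the NO_MEM/MEM enum layout sketched earlier.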
+ */
+
+ entry->state--;
+
+ VMCI_ReleaseQueueMutex(entry->produceQ);
+ }
+
+ result = VMCI_SUCCESS;
+ out:
+ up(&qpBrokerList.mutex);
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPGuestEndpoints_Init --
+ *
+ * Initializes the data structures that keep track of queue pair
+ * guest endpoints.
+ *
+ * Results:
+ * VMCI_SUCCESS on success and appropriate failure code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPGuestEndpoints_Init(void)
+{
+ int err = QueuePairList_Init(&qpGuestEndpoints);
+
+ if (err < VMCI_SUCCESS)
+ return err;
+
+ hibernateFailedList = VMCIHandleArray_Create(0);
+ if (NULL == hibernateFailedList) {
+ QueuePairList_Destroy(&qpGuestEndpoints);
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ /*
+ * The lock rank must be lower than subscriberLock in vmciEvent,
+ * since we hold the hibernateFailedListLock while generating
+ * detach events.
+ */
+
+ spin_lock_init(&hibernateFailedListLock);
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPGuestEndpoints_Exit --
+ *
+ * Destroys all guest queue pair endpoints. If active guest queue
+ * pairs still exist, a detach hypercall is attempted for each of
+ * them; any failure to detach is silently ignored.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIQPGuestEndpoints_Exit(void)
+{
+ struct qp_guest_endpoint *entry;
+
+ down(&qpGuestEndpoints.mutex);
+
+ while ((entry = (struct qp_guest_endpoint *)
+ QueuePairList_GetHead(&qpGuestEndpoints))) {
+
+ /* Don't make a hypercall for local QueuePairs. */
+ if (!(entry->qp.flags & VMCI_QPFLAG_LOCAL))
+ VMCIQueuePairDetachHypercall(entry->qp.handle);
+
+ /* We cannot fail the exit, so let's reset refCount. */
+ entry->qp.refCount = 0;
+ QueuePairList_RemoveEntry(&qpGuestEndpoints, &entry->qp);
+ QPGuestEndpointDestroy(entry);
+ }
+
+ atomic_set(&qpGuestEndpoints.hibernate, 0);
+ up(&qpGuestEndpoints.mutex);
+ QueuePairList_Destroy(&qpGuestEndpoints);
+ VMCIHandleArray_Destroy(hibernateFailedList);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPGuestEndpoints_Sync --
+ *
+ * Use this as a synchronization point when setting globals, for example,
+ * during device shutdown.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIQPGuestEndpoints_Sync(void)
+{
+ down(&qpGuestEndpoints.mutex);
+ up(&qpGuestEndpoints.mutex);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPMarkHibernateFailed --
+ *
+ * Helper function that marks a queue pair entry as not being
+ * converted to a local version during hibernation. Must be
+ * called with the queue pair list mutex held.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * The queue pair handle is appended to the hibernate failed list.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void VMCIQPMarkHibernateFailed(struct qp_guest_endpoint *entry) // IN
+{
+ struct vmci_handle handle;
+
+ /*
+ * entry->qp.handle is located in paged memory, so it can't be
+ * accessed while holding a spinlock.
+ */
+
+ handle = entry->qp.handle;
+ entry->hibernateFailure = true;
+ spin_lock_bh(&hibernateFailedListLock);
+ VMCIHandleArray_AppendEntry(&hibernateFailedList, handle);
+ spin_unlock_bh(&hibernateFailedListLock);
+}
+
+/*
+ *----------------------------------------------------------------------------
+ *
+ * VMCIQPGuestEndpoints_Convert --
+ *
+ * Guest queue pair endpoints may be converted to local ones in
+ * two cases: when entering hibernation or when the device is
+ * powered off before entering a sleep mode. Below we first
+ * discuss the case of hibernation and then the case of entering
+ * sleep state.
+ *
+ * When the guest enters hibernation, any non-local queue pairs
+ * will disconnect no later than at the time the VMCI device
+ * powers off. To preserve the content of the non-local queue
+ * pairs for this guest, we make a local copy of the content and
+ * disconnect from the queue pairs. This will ensure that the
+ * peer doesn't continue to update the queue pair state while the
+ * guest OS is checkpointing the memory (otherwise we might end
+ * up with an inconsistent snapshot where the pointers of the
+ * consume queue are checkpointed later than the data pages they
+ * point to, possibly indicating that non-valid data is
+ * valid). While we are in hibernation mode, we block the
+ * allocation of new non-local queue pairs. Note that while we
+ * are doing the conversion to local queue pairs, we are holding
+ * the queue pair list lock, which will prevent concurrent
+ * creation of additional non-local queue pairs.
+ *
+ * The hibernation cannot fail, so if we are unable to either
+ * save the queue pair state or detach from a queue pair, we deal
+ * with it by keeping the queue pair around, and converting it to
+ * a local queue pair when going out of hibernation. Since
+ * failing a detach is highly unlikely (it would require a queue
+ * pair being actively used as part of a DMA operation), this is
+ * an acceptable fallback. Once we come back from hibernation,
+ * these queue pairs will no longer be external, so we simply
+ * mark them as local at that point.
+ *
+ * For the sleep state, the VMCI device will also be put into the
+ * D3 power state, which may make the device inaccessible to the
+ * guest driver (Windows unmaps the I/O space). When entering
+ * sleep state, the hypervisor is likely to suspend the guest as
+ * well, which will again convert all queue pairs to local ones.
+ * However, VMCI device clients, e.g., VMCI Sockets, may attempt
+ * to use queue pairs after the device has been put into the D3
+ * power state, so we convert the queue pairs to local ones in
+ * that case as well. When exiting the sleep states, the device
+ * has not been reset, so all device state is still in sync with
+ * the device driver, so no further processing is necessary at
+ * that point.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * Queue pairs are detached.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+void VMCIQPGuestEndpoints_Convert(bool toLocal, // IN
+ bool deviceReset) // IN
+{
+ if (toLocal) {
+ struct list_head *next;
+
+ down(&qpGuestEndpoints.mutex);
+
+ list_for_each(next, &qpGuestEndpoints.head) {
+ struct qp_guest_endpoint *entry = (struct qp_guest_endpoint *)
+     list_entry(next, struct qp_entry, listItem);
+
+ if (!(entry->qp.flags & VMCI_QPFLAG_LOCAL)) {
+ struct vmci_queue *prodQ;
+ int result;
+
+ /*
+ * Only the produce queue is needed here; the consume
+ * queue and old queue pointers that used to be declared
+ * here were only used on Win32 hosts.
+ */
+ prodQ = (struct vmci_queue *)entry->produceQ;
+
+ VMCI_AcquireQueueMutex(prodQ);
+
+ /*
+ * XXX: Cleanup needed. Creating a local copy of the
+ * consume queue is not implemented here, so this step
+ * always fails and the entry falls back to
+ * VMCIQPMarkHibernateFailed().
+ */
+ result = VMCI_ERROR_UNAVAILABLE;
+ if (result != VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Hibernate failed to create local consume "
+ "queue from handle %x:%x (error: %d)\n",
+ entry->qp.handle.context,
+ entry->qp.handle.resource,
+ result));
+ VMCI_ReleaseQueueMutex(prodQ);
+ VMCIQPMarkHibernateFailed(entry);
+ continue;
+ }
+ /*
+ * XXX: Cleanup needed. As above, creating a local copy of
+ * the produce queue is not implemented, so this block and
+ * the code below it are currently unreachable.
+ */
+ result = VMCI_ERROR_UNAVAILABLE;
+ if (result != VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Hibernate failed to create local produce "
+ "queue from handle %x:%x (error: %d)\n",
+ entry->qp.handle.context,
+ entry->qp.handle.resource,
+ result));
+ VMCI_ReleaseQueueMutex(prodQ);
+ VMCIQPMarkHibernateFailed(entry);
+ continue;
+ }
+
+ /*
+ * Now that the contents of the queue pair has been saved,
+ * we can detach from the non-local queue pair. This will
+ * discard the content of the non-local queues.
+ */
+
+ result = VMCIQueuePairDetachHypercall(entry->qp.handle);
+ if (result < VMCI_SUCCESS) {
+ VMCI_WARNING((LGPFX
+ "Hibernate failed to detach from handle "
+ "%x:%x\n",
+ entry->qp.handle.context,
+ entry->qp.handle.resource));
+ VMCI_ReleaseQueueMutex(prodQ);
+ VMCIQPMarkHibernateFailed(entry);
+ continue;
+ }
+
+ entry->qp.flags |= VMCI_QPFLAG_LOCAL;
+
+ VMCI_ReleaseQueueMutex(prodQ);
+
+ QueuePairNotifyPeerLocal(false, entry->qp.handle);
+ }
+ }
+ atomic_set(&qpGuestEndpoints.hibernate, 1);
+
+ up(&qpGuestEndpoints.mutex);
+ } else {
+ struct vmci_handle handle;
+
+ /*
+ * When a guest enters hibernation, there may be queue pairs
+ * around that couldn't be converted to local queue pairs.
+ * When coming out of hibernation, these queue pairs are
+ * restored as part of the guest's main memory by the OS
+ * hibernation code and can now be regarded as local versions.
+ * Since they are no longer connected, detach notifications
+ * are sent to the local endpoint.
+ */
+
+ spin_lock_bh(&hibernateFailedListLock);
+ while (VMCIHandleArray_GetSize(hibernateFailedList) > 0) {
+ handle =
+ VMCIHandleArray_RemoveTail(hibernateFailedList);
+ if (deviceReset) {
+ QueuePairNotifyPeerLocal(false, handle);
+ }
+ }
+ spin_unlock_bh(&hibernateFailedListLock);
+
+ atomic_set(&qpGuestEndpoints.hibernate, 0);
+ }
+}
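+
+/*
+ * Illustrative call pattern (a sketch, not part of this patch; the
+ * hook names are hypothetical): a guest driver would bracket power
+ * transitions roughly as follows.
+ *
+ *   static void vmci_guest_suspend(void)
+ *   {
+ *           VMCIQPGuestEndpoints_Convert(true, false);  // to local
+ *   }
+ *
+ *   static void vmci_guest_resume(bool deviceReset)
+ *   {
+ *           VMCIQPGuestEndpoints_Convert(false, deviceReset);
+ *   }
+ */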
diff --git a/drivers/misc/vmw_vmci/vmciQueuePair.h b/drivers/misc/vmw_vmci/vmciQueuePair.h
new file mode 100644
index 0000000..d4fb0bf
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciQueuePair.h
@@ -0,0 +1,95 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_QUEUE_PAIR_H_
+#define _VMCI_QUEUE_PAIR_H_
+
+#include "vmci_defs.h"
+#include "vmci_iocontrols.h"
+#include "vmci_kernel_if.h"
+#include "vmciContext.h"
+#include "vmciQueue.h"
+
+/*
+ * QueuePairPageStore describes how the memory of a given queue pair
+ * is backed. When the queue pair is between the host and a guest, the
+ * page store consists of references to the guest pages. On vmkernel,
+ * this is a list of PPNs, and on hosted, it is a user VA where the
+ * queue pair is mapped into the VMX address space.
+ */
+
+typedef struct QueuePairPageStore {
+ uint64_t pages; // Reference to pages backing the queue pair.
+ uint32_t len; // Length of pageList/virtual address range (in pages).
+} QueuePairPageStore;
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCI_QP_PAGESTORE_IS_WELLFORMED --
+ *
+ * Utility function that checks whether the fields of the page
+ * store contain valid values.
+ *
+ * Result:
+ * true if the page store is well-formed, false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static inline bool VMCI_QP_PAGESTORE_IS_WELLFORMED(QueuePairPageStore *pageStore) // IN
+{
+ return pageStore->len >= 2;
+}
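+
+/*
+ * The minimum of 2 reflects that a page store must cover at least the
+ * produce and consume queue header pages (one page each); this is an
+ * inference from the check above rather than a statement of the device
+ * specification.
+ */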
+
+int VMCIQPBroker_Init(void);
+void VMCIQPBroker_Exit(void);
+int VMCIQPBroker_Alloc(struct vmci_handle handle, uint32_t peer,
+ uint32_t flags, uint32_t privFlags,
+ uint64_t produceSize, uint64_t consumeSize,
+ QueuePairPageStore *pageStore,
+ struct vmci_context *context);
+int VMCIQPBroker_SetPageStore(struct vmci_handle handle,
+ uint64_t produceUVA, uint64_t consumeUVA,
+ struct vmci_context *context);
+int VMCIQPBroker_Detach(struct vmci_handle handle,
+ struct vmci_context *context);
+
+int VMCIQPGuestEndpoints_Init(void);
+void VMCIQPGuestEndpoints_Exit(void);
+void VMCIQPGuestEndpoints_Sync(void);
+void VMCIQPGuestEndpoints_Convert(bool toLocal, bool deviceReset);
+
+int VMCIQueuePair_Alloc(struct vmci_handle *handle,
+ struct vmci_queue **produceQ, uint64_t produceSize,
+ struct vmci_queue **consumeQ, uint64_t consumeSize,
+ uint32_t peer, uint32_t flags, uint32_t privFlags,
+ bool guestEndpoint, VMCIEventReleaseCB wakeupCB,
+ void *clientData);
+int VMCIQueuePair_Detach(struct vmci_handle handle, bool guestEndpoint);
+int VMCIQPBroker_Map(struct vmci_handle handle,
+ struct vmci_context *context, uint64_t guestMem);
+int VMCIQPBroker_Unmap(struct vmci_handle handle,
+ struct vmci_context *context, uint32_t gid);
+
+#endif /* !_VMCI_QUEUE_PAIR_H_ */
--
1.7.0.4
* [PATCH 08/14] Add vmciResource.*
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (6 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 07/14] Add vmciQueuePair.* Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 09/14] Add vmciRoute.* Andrew Stiegmann (stieg)
` (6 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciResource.c | 383 ++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciResource.h | 68 ++++++
2 files changed, 451 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciResource.c
create mode 100644 drivers/misc/vmw_vmci/vmciResource.h
diff --git a/drivers/misc/vmw_vmci/vmciResource.c b/drivers/misc/vmw_vmci/vmciResource.c
new file mode 100644
index 0000000..f4a3710
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciResource.c
@@ -0,0 +1,383 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "vmci_defs.h"
+#include "vmci_infrastructure.h"
+#include "vmci_kernel_if.h"
+#include "vmciCommonInt.h"
+#include "vmciHashtable.h"
+#include "vmciResource.h"
+#include "vmciDriver.h"
+
+#define LGPFX "VMCIResource: "
+
+/* 0 through VMCI_RESERVED_RESOURCE_ID_MAX are reserved. */
+static uint32_t resourceID = VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+static spinlock_t resourceIdLock;
+static struct vmci_hash_table *resourceTable = NULL;
+
+/* Public Resource Access Control API. */
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_Init --
+ *
+ * Initializes the VMCI Resource Access Control API by creating the
+ * hashtable that holds all resources.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIResource_Init(void)
+{
+ spin_lock_init(&resourceIdLock);
+
+ resourceTable = VMCIHashTable_Create(128);
+ if (resourceTable == NULL) {
+ VMCI_WARNING((LGPFX
+ "Failed creating a resource hash table for VMCI.\n"));
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_Exit --
+ *
+ * Cleans up resources.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+void VMCIResource_Exit(void)
+{
+ if (resourceTable)
+ VMCIHashTable_Destroy(resourceTable);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_GetID --
+ *
+ * Returns a new resource ID. IDs 0 through VMCI_RESERVED_RESOURCE_ID_MAX
+ * are reserved, so allocation starts at VMCI_RESERVED_RESOURCE_ID_MAX + 1.
+ *
+ * Result:
+ * VMCI resource id on success, VMCI_INVALID_ID on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+uint32_t VMCIResource_GetID(uint32_t contextID)
+{
+ uint32_t oldRID = resourceID;
+ uint32_t currentRID;
+ bool foundRID = false;
+
+ /*
+ * Generate a unique resource ID. Keep on trying until we wrap around
+ * in the RID space.
+ */
+ ASSERT(oldRID > VMCI_RESERVED_RESOURCE_ID_MAX);
+
+ do {
+ struct vmci_handle handle;
+
+ spin_lock(&resourceIdLock);
+ currentRID = resourceID;
+ handle = VMCI_MAKE_HANDLE(contextID, currentRID);
+ resourceID++;
+ if (unlikely(resourceID == VMCI_INVALID_ID)) {
+ /*
+ * Skip the reserved rids.
+ */
+
+ resourceID = VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+ }
+ spin_unlock(&resourceIdLock);
+ foundRID = !VMCIHashTable_EntryExists(resourceTable, handle);
+ } while (!foundRID && resourceID != oldRID);
+
+ if (unlikely(!foundRID))
+ return VMCI_INVALID_ID;
+
+ return currentRID;
+}
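+
+/*
+ * Illustrative pairing of VMCIResource_GetID() with VMCIResource_Add()
+ * (a sketch; "entry", "DatagramFreeCB" and the error code choice are
+ * hypothetical):
+ *
+ *   uint32_t rid = VMCIResource_GetID(contextID);
+ *   if (rid == VMCI_INVALID_ID)
+ *           return VMCI_ERROR_NO_RESOURCES;
+ *   handle = VMCI_MAKE_HANDLE(contextID, rid);
+ *   result = VMCIResource_Add(&entry->resource,
+ *                             VMCI_RESOURCE_TYPE_DATAGRAM, handle,
+ *                             DatagramFreeCB, entry);
+ */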
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_Add --
+ *
+ * Adds the given resource to the resource hash table under the given
+ * handle.
+ *
+ * Results:
+ * VMCI_SUCCESS if successful, error code if not.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIResource_Add(struct vmci_resource *resource, // IN
+ enum vmci_resource_type resourceType, // IN
+ struct vmci_handle resourceHandle, // IN
+ VMCIResourceFreeCB containerFreeCB, // IN
+ void *containerObject) // IN
+{
+ int result;
+
+ ASSERT(resource);
+
+ if (VMCI_HANDLE_EQUAL(resourceHandle, VMCI_INVALID_HANDLE)) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX
+ "Invalid argument resource (handle=0x%x:0x%x).\n",
+ resourceHandle.context,
+ resourceHandle.resource));
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ VMCIHashTable_InitEntry(&resource->hashEntry, resourceHandle);
+ resource->type = resourceType;
+ resource->containerFreeCB = containerFreeCB;
+ resource->containerObject = containerObject;
+
+ /* Add resource to hashtable. */
+ result = VMCIHashTable_AddEntry(resourceTable, &resource->hashEntry);
+ if (result != VMCI_SUCCESS) {
+ VMCI_DEBUG_LOG(4,
+ (LGPFX "Failed to add entry to hash table "
+ "(result=%d).\n", result));
+ return result;
+ }
+
+ return result;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_Remove --
+ *
+ * Looks up the resource by handle and removes it from the hash table,
+ * dropping the reference taken by the lookup.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * May free the resource when its last reference is released.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+void VMCIResource_Remove(struct vmci_handle resourceHandle, // IN:
+ enum vmci_resource_type resourceType) // IN:
+{
+ struct vmci_resource *resource =
+ VMCIResource_Get(resourceHandle, resourceType);
+ if (resource == NULL)
+ return;
+
+ /* Remove resource from hashtable. */
+ VMCIHashTable_RemoveEntry(resourceTable, &resource->hashEntry);
+
+ VMCIResource_Release(resource);
+ /* resource could be freed by now. */
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_Get --
+ *
+ * Results:
+ * The resource, with a reference held, if found; otherwise NULL.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+struct vmci_resource *VMCIResource_Get(struct vmci_handle resourceHandle, // IN
+ enum vmci_resource_type resourceType) // IN
+{
+ struct vmci_resource *resource;
+ struct vmci_hash_entry *entry =
+ VMCIHashTable_GetEntry(resourceTable, resourceHandle);
+ if (entry == NULL) {
+ return NULL;
+ }
+ resource = RESOURCE_CONTAINER(entry, struct vmci_resource, hashEntry);
+ if (resourceType == VMCI_RESOURCE_TYPE_ANY
+ || resource->type == resourceType) {
+ return resource;
+ }
+ VMCIHashTable_ReleaseEntry(resourceTable, entry);
+ return NULL;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_Hold --
+ *
+ * Hold the given resource. This will hold the hashtable entry. This
+ * is like doing a Get() but without having to look up the resource by
+ * handle.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+void VMCIResource_Hold(struct vmci_resource *resource)
+{
+ ASSERT(resource);
+ VMCIHashTable_HoldEntry(resourceTable, &resource->hashEntry);
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResourceDoRemove --
+ *
+ * Deallocates data structures associated with the given resource
+ * and invokes any callback registered for the resource.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * May deallocate memory and invoke a callback for the removed resource.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static inline void VMCIResourceDoRemove(struct vmci_resource *resource)
+{
+ ASSERT(resource);
+
+ if (resource->containerFreeCB) {
+ resource->containerFreeCB(resource->containerObject);
+ /* The resource has been freed; don't dereference it. */
+ }
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_Release --
+ *
+ * Results:
+ * The result of releasing the hash table entry; VMCI_SUCCESS_ENTRY_DEAD
+ * indicates that the entry (and the resource) was freed.
+ *
+ * Side effects:
+ * resource's containerFreeCB will get called if last reference.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCIResource_Release(struct vmci_resource *resource)
+{
+ int result;
+
+ ASSERT(resource);
+
+ result =
+ VMCIHashTable_ReleaseEntry(resourceTable, &resource->hashEntry);
+ if (result == VMCI_SUCCESS_ENTRY_DEAD)
+ VMCIResourceDoRemove(resource);
+
+ /*
+ * We propagate the information back to caller in case it wants to know
+ * whether entry was freed.
+ */
+ return result;
+}
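+
+/*
+ * Reference-count pairing (derived from the code above): every
+ * successful VMCIResource_Get() or VMCIResource_Hold() must be
+ * balanced by a VMCIResource_Release(); the release that drops the
+ * last reference on a removed entry returns VMCI_SUCCESS_ENTRY_DEAD
+ * and triggers containerFreeCB. For example:
+ *
+ *   struct vmci_resource *r = VMCIResource_Get(h, VMCI_RESOURCE_TYPE_ANY);
+ *   if (r) {
+ *           // ... use r ...
+ *           if (VMCIResource_Release(r) == VMCI_SUCCESS_ENTRY_DEAD) {
+ *                   // r has been freed; don't touch it again.
+ *           }
+ *   }
+ */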
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_Handle --
+ *
+ * Get the handle for the given resource.
+ *
+ * Results:
+ * The resource's associated handle.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+struct vmci_handle VMCIResource_Handle(struct vmci_resource *resource)
+{
+ ASSERT(resource);
+ return resource->hashEntry.handle;
+}
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIResource_Sync --
+ *
+ * Use this as a synchronization point when setting globals, for example,
+ * during device shutdown.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+void VMCIResource_Sync(void)
+{
+ VMCIHashTable_Sync(resourceTable);
+}
diff --git a/drivers/misc/vmw_vmci/vmciResource.h b/drivers/misc/vmw_vmci/vmciResource.h
new file mode 100644
index 0000000..1c5d5f6
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciResource.h
@@ -0,0 +1,68 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_RESOURCE_H_
+#define _VMCI_RESOURCE_H_
+
+#include "vmci_defs.h"
+#include "vmci_kernel_if.h"
+#include "vmciHashtable.h"
+#include "vmciContext.h"
+
+#define RESOURCE_CONTAINER(ptr, type, member) \
+ ((type *)((char *)(ptr) - offsetof(type, member)))
+
+typedef void (*VMCIResourceFreeCB) (void *resource);
+
+enum vmci_resource_type {
+ VMCI_RESOURCE_TYPE_ANY,
+ VMCI_RESOURCE_TYPE_API,
+ VMCI_RESOURCE_TYPE_GROUP,
+ VMCI_RESOURCE_TYPE_DATAGRAM,
+ VMCI_RESOURCE_TYPE_DOORBELL,
+};
+
+struct vmci_resource {
+ struct vmci_hash_entry hashEntry;
+ enum vmci_resource_type type;
+ VMCIResourceFreeCB containerFreeCB; // Callback to free container
+ /* object when refCount is 0. */
+ void *containerObject; // Container object reference.
+};
+
+int VMCIResource_Init(void);
+void VMCIResource_Exit(void);
+void VMCIResource_Sync(void);
+
+uint32_t VMCIResource_GetID(uint32_t contextID);
+
+int VMCIResource_Add(struct vmci_resource *resource,
+ enum vmci_resource_type resourceType,
+ struct vmci_handle resourceHandle,
+ VMCIResourceFreeCB containerFreeCB, void *containerObject);
+void VMCIResource_Remove(struct vmci_handle resourceHandle,
+ enum vmci_resource_type resourceType);
+struct vmci_resource *VMCIResource_Get(struct vmci_handle resourceHandle,
+ enum vmci_resource_type resourceType);
+void VMCIResource_Hold(struct vmci_resource *resource);
+int VMCIResource_Release(struct vmci_resource *resource);
+struct vmci_handle VMCIResource_Handle(struct vmci_resource *resource);
+
+#endif /* _VMCI_RESOURCE_H_ */
--
1.7.0.4
* [PATCH 09/14] Add vmciRoute.*
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (7 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 08/14] Add vmciResource.* Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 10/14] Add accessor methods for Queue Pairs in VMCI Andrew Stiegmann (stieg)
` (5 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciRoute.c | 249 +++++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciRoute.h | 36 ++++++
2 files changed, 285 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciRoute.c
create mode 100644 drivers/misc/vmw_vmci/vmciRoute.h
diff --git a/drivers/misc/vmw_vmci/vmciRoute.c b/drivers/misc/vmw_vmci/vmciRoute.c
new file mode 100644
index 0000000..f201b33
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciRoute.c
@@ -0,0 +1,249 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "vmci_defs.h"
+#include "vmci_infrastructure.h"
+#include "vmci_kernel_if.h"
+#include "vmciCommonInt.h"
+#include "vmciContext.h"
+#include "vmciDriver.h"
+#include "vmciKernelAPI.h"
+#include "vmciRoute.h"
+
+#define LGPFX "VMCIRoute: "
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCI_Route --
+ *
+ * Make a routing decision for the given source and destination handles.
+ * This will try to determine the route using the handles and the available
+ * devices.
+ *
+ * Result:
+ * A VMCIRoute value.
+ *
+ * Side effects:
+ * Sets the source context if it is invalid.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+int VMCI_Route(struct vmci_handle *src, // IN/OUT
+ const struct vmci_handle *dst, // IN
+ bool fromGuest, // IN
+ enum vmci_route *route) // OUT
+{
+ bool hasHostDevice;
+ bool hasGuestDevice;
+
+ ASSERT(src);
+ ASSERT(dst);
+ ASSERT(route);
+
+ *route = VMCI_ROUTE_NONE;
+
+ /*
+ * "fromGuest" is only ever set to true by IOCTL_VMCI_DATAGRAM_SEND (or by
+ * the vmkernel equivalent), which comes from the VMX, so we know it is
+ * coming from a guest.
+ */
+
+ /*
+ * To avoid inconsistencies, test these once. We will test them again
+ * when we do the actual send to ensure that we do not touch a non-existent
+ * device.
+ */
+
+ hasHostDevice = VMCI_HostPersonalityActive();
+ hasGuestDevice = VMCI_GuestPersonalityActive();
+
+ /* Must have a valid destination context. */
+ if (VMCI_INVALID_ID == dst->context)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /* Anywhere to hypervisor. */
+ if (VMCI_HYPERVISOR_CONTEXT_ID == dst->context) {
+ /*
+ * If this message already came from a guest then we cannot send it
+ * to the hypervisor. It must come from a local client.
+ */
+
+ if (fromGuest)
+ return VMCI_ERROR_DST_UNREACHABLE;
+
+ /* We must be acting as a guest in order to send to the hypervisor. */
+ if (!hasGuestDevice)
+ return VMCI_ERROR_DEVICE_NOT_FOUND;
+
+ /* And we cannot send if the source is the host context. */
+ if (VMCI_HOST_CONTEXT_ID == src->context)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /* Send from local client down to the hypervisor. */
+ *route = VMCI_ROUTE_AS_GUEST;
+ return VMCI_SUCCESS;
+ }
+
+ /* Anywhere to local client on host. */
+ if (VMCI_HOST_CONTEXT_ID == dst->context) {
+ /*
+ * If it is not from a guest but we are acting as a guest, then we need
+ * to send it down to the host. Note that if we are also acting as a
+ * host then this will prevent us from sending from local client to
+ * local client, but we accept that restriction as a way to remove
+ * any ambiguity from the host context.
+ */
+ if (src->context == VMCI_HYPERVISOR_CONTEXT_ID) {
+ /*
+ * If the hypervisor is the source, this is host local
+ * communication. The hypervisor may send vmci event
+ * datagrams to the host itself, but it will never send
+ * datagrams to an "outer host" through the guest device.
+ */
+
+ if (hasHostDevice) {
+ *route = VMCI_ROUTE_AS_HOST;
+ return VMCI_SUCCESS;
+ } else {
+ return VMCI_ERROR_DEVICE_NOT_FOUND;
+ }
+ }
+
+ if (!fromGuest && hasGuestDevice) {
+ /* If no source context then use the current. */
+ if (VMCI_INVALID_ID == src->context)
+ src->context = VMCI_GetContextID();
+
+ /* Send it from local client down to the host. */
+ *route = VMCI_ROUTE_AS_GUEST;
+ return VMCI_SUCCESS;
+ }
+
+ /*
+ * Otherwise we already received it from a guest and it is destined
+ * for a local client on this host, or it is from another local client
+ * on this host. We must be acting as a host to service it.
+ */
+ if (!hasHostDevice)
+ return VMCI_ERROR_DEVICE_NOT_FOUND;
+
+ if (VMCI_INVALID_ID == src->context) {
+ /*
+ * If it came from a guest then it must have a valid context.
+ * Otherwise we can use the host context.
+ */
+
+ if (fromGuest)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ src->context = VMCI_HOST_CONTEXT_ID;
+ }
+
+ /* Route to local client. */
+ *route = VMCI_ROUTE_AS_HOST;
+ return VMCI_SUCCESS;
+ }
+
+ /* If we are acting as a host then this might be destined for a guest. */
+ if (hasHostDevice) {
+ /* It will have a context if it is meant for a guest. */
+ if (VMCIContext_Exists(dst->context)) {
+ if (VMCI_INVALID_ID == src->context) {
+ /*
+ * If it came from a guest then it must have a valid context.
+ * Otherwise we can use the host context.
+ */
+
+ if (fromGuest) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+ src->context = VMCI_HOST_CONTEXT_ID;
+ } else if (VMCI_CONTEXT_IS_VM(src->context) &&
+ src->context != dst->context) {
+ /*
+ * VM to VM communication is not allowed. Since we catch
+ * all communication destined for the host above, this
+ * must be destined for a VM since there is a valid
+ * context.
+ */
+
+ ASSERT(VMCI_CONTEXT_IS_VM(dst->context));
+
+ return VMCI_ERROR_DST_UNREACHABLE;
+ }
+
+ /* Pass it up to the guest. */
+ *route = VMCI_ROUTE_AS_HOST;
+ return VMCI_SUCCESS;
+ }
+ }
+
+ /*
+ * We must be a guest trying to send to another guest, which means
+ * we need to send it down to the host. We do not filter out VM to
+ * VM communication here, since we want to be able to use the guest
+ * driver on older versions that do support VM to VM communication.
+ */
+ if (!hasGuestDevice)
+ return VMCI_ERROR_DEVICE_NOT_FOUND;
+
+ /* If no source context then use the current context. */
+ if (VMCI_INVALID_ID == src->context)
+ src->context = VMCI_GetContextID();
+
+ /*
+ * Send it from local client down to the host, which will route it to
+ * the other guest for us.
+ */
+ *route = VMCI_ROUTE_AS_GUEST;
+ return VMCI_SUCCESS;
+}
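+
+/*
+ * Summary of the routing decisions above (derived from the code; the
+ * host/guest device columns refer to which device personalities are
+ * active locally):
+ *
+ *   dst == hypervisor, local client, guest device     -> ROUTE_AS_GUEST
+ *   dst == host, src == hypervisor, host device       -> ROUTE_AS_HOST
+ *   dst == host, local client, guest device           -> ROUTE_AS_GUEST
+ *   dst == host, from guest or local, host device     -> ROUTE_AS_HOST
+ *   dst == registered guest context, host device      -> ROUTE_AS_HOST
+ *   otherwise (guest to guest), guest device          -> ROUTE_AS_GUEST
+ */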
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCI_RouteString --
+ *
+ * Get a string for the given route.
+ *
+ * Result:
+ * A string representing the route, if the route is valid, otherwise an
+ * empty string.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+const char *VMCI_RouteString(enum vmci_route route) // IN
+{
+ static const char *vmciRouteStrings[] = {
+ "none",
+ "as host",
+ "as guest",
+ };
+ if (route >= VMCI_ROUTE_NONE && route <= VMCI_ROUTE_AS_GUEST) {
+ return vmciRouteStrings[route];
+ }
+ return "";
+}
diff --git a/drivers/misc/vmw_vmci/vmciRoute.h b/drivers/misc/vmw_vmci/vmciRoute.h
new file mode 100644
index 0000000..b5627aa
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciRoute.h
@@ -0,0 +1,36 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_ROUTE_H_
+#define _VMCI_ROUTE_H_
+
+#include "vmci_defs.h"
+
+enum vmci_route {
+ VMCI_ROUTE_NONE,
+ VMCI_ROUTE_AS_HOST,
+ VMCI_ROUTE_AS_GUEST,
+};
+
+int VMCI_Route(struct vmci_handle *src, const struct vmci_handle *dst,
+ bool fromGuest, enum vmci_route *route);
+const char *VMCI_RouteString(enum vmci_route route);
+
+#endif // _VMCI_ROUTE_H_
--
1.7.0.4
* [PATCH 10/14] Add accessor methods for Queue Pairs in VMCI
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (8 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 09/14] Add vmciRoute.* Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 11/14] Add VMCI kernel API defs and the internal header file Andrew Stiegmann (stieg)
` (4 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciQPair.c | 1164 +++++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciQueue.h | 108 ++++
2 files changed, 1272 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciQPair.c
create mode 100644 drivers/misc/vmw_vmci/vmciQueue.h
diff --git a/drivers/misc/vmw_vmci/vmciQPair.c b/drivers/misc/vmw_vmci/vmciQPair.c
new file mode 100644
index 0000000..3cb5dbd
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciQPair.c
@@ -0,0 +1,1164 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ *
+ * This file implements Queue accessor methods.
+ *
+ * VMCIQPair is a new interface that hides the queue pair internals.
+ * Rather than access each queue in a pair directly, operations are now
+ * performed on the queue as a whole. This is simpler and less
+ * error-prone, and allows for future queue pair features to be added
+ * under the hood with no change to the client code.
+ *
+ * This also helps in a particular case on Windows hosts, where the memory
+ * allocated by the client (e.g., VMX) will disappear when the client does
+ * (e.g., abnormal termination). The kernel can't lock user memory into
+ * its address space indefinitely. By guarding access to the queue
+ * contents we can correctly handle the case where the client disappears.
+ *
+ * On code style:
+ *
+ * + This entire file started its life as a cut-and-paste of the
+ * static inline functions in bora/public/vmci_queue_pair.h.
+ * From there, new copies of the routines were made, named
+ * without the VMCI prefix and without the trailing underscore
+ * (the one that followed struct vmci_queue_). The no-underscore
+ * versions of the routines require that the mutexes are held.
+ *
+ * + The code -always- uses the xyzLocked() version of any given
+ * routine even when the wrapped function is a one-liner. The
+ * reason for this decision was to ensure that we didn't have
+ * copies of logic lying around that needed to be maintained.
+ *
+ * + Note that we still pass around 'const struct vmci_queue *'s.
+ *
+ * + Note that the mutex is a field within struct vmci_queue. We
+ * skirt the issue of passing around a const struct vmci_queue, even
+ * though the mutex field (__mutex, specifically) gets modified, by
+ * never referring to the mutex -itself- except during
+ * initialization. Beyond that, the code only passes around the
+ * pointer to the mutex, which is also a member of struct vmci_queue,
+ * obviously, and which doesn't change after initialization.
+ * This eliminates having to redefine all the functions that
+ * are currently taking const struct vmci_queue's so that these
+ * functions are compatible with those definitions.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+
+#include "vmci_defs.h"
+#include "vmci_kernel_if.h"
+#include "vmci_handle_array.h"
+#include "vmciKernelAPI.h"
+#include "vmciQueuePair.h"
+#include "vmciRoute.h"
+
+/*
+ * VMCIQPair
+ *
+ * This structure is opaque to the clients.
+ */
+
+struct VMCIQPair {
+ struct vmci_handle handle;
+ struct vmci_queue *produceQ;
+ struct vmci_queue *consumeQ;
+ uint64_t produceQSize;
+ uint64_t consumeQSize;
+ uint32_t peer;
+ uint32_t flags;
+ uint32_t privFlags;
+ bool guestEndpoint;
+ uint32_t blocked;
+ wait_queue_head_t event;
+};
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPairMapQueueHeaders --
+ *
+ * The queue headers may not be mapped at all times. If a queue is
+ * currently not mapped, it will be attempted to do so.
+ *
+ * Results:
+ * VMCI_SUCCESS if queues were validated, appropriate error code otherwise.
+ *
+ * Side effects:
+ * May attempt to map in guest memory.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQPairMapQueueHeaders(struct vmci_queue *produceQ, // IN
+ struct vmci_queue *consumeQ) // IN
+{
+ int result;
+
+ if (NULL == produceQ->qHeader || NULL == consumeQ->qHeader) {
+ result = VMCIHost_MapQueueHeaders(produceQ, consumeQ);
+ if (result < VMCI_SUCCESS) {
+ if (produceQ->savedHeader && consumeQ->savedHeader) {
+ return VMCI_ERROR_QUEUEPAIR_NOT_READY;
+ } else {
+ return VMCI_ERROR_QUEUEPAIR_NOTATTACHED;
+ }
+ }
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPairGetQueueHeaders --
+ *
+ * Helper routine that will retrieve the produce and consume
+ * headers of a given queue pair. If the guest memory of the
+ * queue pair is currently not available, the saved queue headers
+ * will be returned, if these are available.
+ *
+ * Results:
+ * VMCI_SUCCESS if either current or saved queue headers are found.
+ * Appropriate error code otherwise.
+ *
+ * Side effects:
+ * May block.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQPairGetQueueHeaders(const VMCIQPair * qpair, // IN
+ struct vmci_queue_header **produceQHeader, // OUT
+ struct vmci_queue_header **consumeQHeader) // OUT
+{
+ int result;
+
+ result = VMCIQPairMapQueueHeaders(qpair->produceQ, qpair->consumeQ);
+ if (result == VMCI_SUCCESS) {
+ *produceQHeader = qpair->produceQ->qHeader;
+ *consumeQHeader = qpair->consumeQ->qHeader;
+ } else if (qpair->produceQ->savedHeader && qpair->consumeQ->savedHeader) {
+ ASSERT(!qpair->guestEndpoint);
+ *produceQHeader = qpair->produceQ->savedHeader;
+ *consumeQHeader = qpair->consumeQ->savedHeader;
+ result = VMCI_SUCCESS;
+ }
+
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPairWakeupCB --
+ *
+ * Callback from VMCI queue pair broker indicating that a queue
+ * pair that was previously not ready, now either is ready or
+ * gone forever.
+ *
+ * Results:
+ * VMCI_SUCCESS always.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQPairWakeupCB(void *clientData)
+{
+ VMCIQPair *qpair = (VMCIQPair *) clientData;
+ ASSERT(qpair);
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+ while (qpair->blocked > 0) {
+ qpair->blocked--;
+ wake_up(&qpair->event);
+ }
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPairReleaseMutexCB --
+ *
+ * Callback from VMCI_WaitOnEvent releasing the queue pair mutex
+ * protecting the queue pair header state.
+ *
+ * Results:
+ * 0 always.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCIQPairReleaseMutexCB(void *clientData)
+{
+ VMCIQPair *qpair = (VMCIQPair *) clientData;
+ ASSERT(qpair);
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+ return 0;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPairWaitForReadyQueue --
+ *
+ * Makes the calling thread wait for the queue pair to become
+ * ready for host side access.
+ *
+ * Results:
+ * true when thread is woken up after queue pair state change.
+ * false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static bool VMCIQPairWaitForReadyQueue(VMCIQPair * qpair)
+{
+ if (unlikely(qpair->guestEndpoint))
+ ASSERT(false);
+
+ if (qpair->flags & VMCI_QPFLAG_NONBLOCK)
+ return false;
+
+ qpair->blocked++;
+ VMCI_WaitOnEvent(&qpair->event, VMCIQPairReleaseMutexCB, qpair);
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+ return true;
+}
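+
+/*
+ * Hand-off between the helpers above: the waiter increments
+ * qpair->blocked and sleeps in VMCI_WaitOnEvent(), which drops the
+ * queue mutex through VMCIQPairReleaseMutexCB(); when the broker sees
+ * the pair become ready (or go away), VMCIQPairWakeupCB() drains
+ * blocked and wakes the waiters, each of which then re-acquires the
+ * mutex before rechecking queue state.
+ */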
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_Alloc --
+ *
+ * This is the client interface for allocating the memory for a
+ * VMCIQPair structure and then attaching to the underlying
+ * queue. If an error occurs allocating the memory for the
+ * VMCIQPair structure, no attempt is made to attach. If an
+ * error occurs attaching, the VMCIQPair structure is freed.
+ *
+ * Results:
+ * An error code, if < 0; VMCI_SUCCESS otherwise.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPair_Alloc(VMCIQPair ** qpair, // OUT
+ struct vmci_handle *handle, // OUT
+ uint64_t produceQSize, // IN
+ uint64_t consumeQSize, // IN
+ uint32_t peer, // IN
+ uint32_t flags, // IN
+ uint32_t privFlags) // IN
+{
+ VMCIQPair *myQPair;
+ int retval;
+ struct vmci_handle src = VMCI_INVALID_HANDLE;
+ struct vmci_handle dst = VMCI_MAKE_HANDLE(peer, VMCI_INVALID_ID);
+ enum vmci_route route;
+ VMCIEventReleaseCB wakeupCB;
+ void *clientData;
+
+ /*
+ * Restrict the size of a queuepair. The device already enforces a limit
+ * on the total amount of memory that can be allocated to queuepairs for a
+ * guest. However, we try to allocate this memory before we make the
+ * queuepair allocation hypercall. On Windows and Mac OS, we request a
+ * single, contiguous block, and it will fail if the OS cannot satisfy the
+ * request. On Linux, we allocate each page separately, which means rather
+ * than fail, the guest will thrash while it tries to allocate, and will
+ * become increasingly unresponsive to the point where it appears to be hung.
+ * So we place a limit on the size of an individual queuepair here, and
+ * leave the device to enforce the restriction on total queuepair memory.
+ * (Note that this doesn't prevent all cases; a user with only this much
+ * physical memory could still get into trouble.) The error used by the
+ * device is NO_RESOURCES, so use that here too.
+ */
+
+ if (produceQSize + consumeQSize < max(produceQSize, consumeQSize)
+ || produceQSize + consumeQSize > VMCI_MAX_GUEST_QP_MEMORY)
+ return VMCI_ERROR_NO_RESOURCES;
+
+ myQPair = kzalloc(sizeof *myQPair, GFP_KERNEL);
+ if (!myQPair)
+ return VMCI_ERROR_NO_MEM;
+
+ myQPair->produceQSize = produceQSize;
+ myQPair->consumeQSize = consumeQSize;
+ myQPair->peer = peer;
+ myQPair->flags = flags;
+ myQPair->privFlags = privFlags;
+ retval = VMCI_Route(&src, &dst, false, &route);
+ if (retval < VMCI_SUCCESS) {
+ if (VMCI_GuestPersonalityActive()) {
+ route = VMCI_ROUTE_AS_GUEST;
+ } else {
+ route = VMCI_ROUTE_AS_HOST;
+ }
+ }
+
+ wakeupCB = clientData = NULL;
+ if (VMCI_ROUTE_AS_HOST == route) {
+ myQPair->guestEndpoint = false;
+ if (!(flags & VMCI_QPFLAG_LOCAL)) {
+ myQPair->blocked = 0;
+ init_waitqueue_head(&myQPair->event);
+ wakeupCB = VMCIQPairWakeupCB;
+ clientData = (void *)myQPair;
+ }
+ } else {
+ myQPair->guestEndpoint = true;
+ }
+
+ retval = VMCIQueuePair_Alloc(handle,
+ &myQPair->produceQ,
+ myQPair->produceQSize,
+ &myQPair->consumeQ,
+ myQPair->consumeQSize,
+ myQPair->peer,
+ myQPair->flags,
+ myQPair->privFlags,
+ myQPair->guestEndpoint,
+ wakeupCB, clientData);
+
+ if (retval < VMCI_SUCCESS) {
+ kfree(myQPair);
+ return retval;
+ }
+
+ *qpair = myQPair;
+ myQPair->handle = *handle;
+
+ return retval;
+}
+
+EXPORT_SYMBOL(VMCIQPair_Alloc);
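+
+/*
+ * Illustrative client usage (a sketch; error handling is abbreviated
+ * and the size/peer/flag values are assumptions, not requirements):
+ *
+ *   VMCIQPair *qpair;
+ *   struct vmci_handle handle = VMCI_INVALID_HANDLE;
+ *   int rv = VMCIQPair_Alloc(&qpair, &handle, 4096, 4096,
+ *                            VMCI_HOST_CONTEXT_ID, 0, 0);
+ *   if (rv < VMCI_SUCCESS)
+ *           return rv;
+ *   // ... produce/consume via the VMCIQPair_* accessors ...
+ *   VMCIQPair_Detach(&qpair);  // frees the structure and NULLs qpair
+ */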
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_Detach --
+ *
+ * This is the client interface for detaching from a VMCIQPair.
+ * Note that this routine will free the memory allocated for the
+ * VMCIQPair structure, too.
+ *
+ * Results:
+ * An error, if < 0.
+ *
+ * Side effects:
+ * Will clear the caller's pointer to the VMCIQPair structure.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPair_Detach(VMCIQPair ** qpair) // IN/OUT
+{
+ int result;
+ VMCIQPair *oldQPair;
+
+ if (!qpair || !(*qpair)) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ oldQPair = *qpair;
+ result =
+ VMCIQueuePair_Detach(oldQPair->handle, oldQPair->guestEndpoint);
+
+ /*
+ * The guest can fail to detach for a number of reasons, and if it does so,
+ * it will cleanup the entry (if there is one). The host can fail too, but
+ * it won't cleanup the entry immediately, it will do that later when the
+ * context is freed. Either way, we need to release the qpair struct here;
+ * there isn't much the caller can do, and we don't want to leak.
+ */
+
+ memset(oldQPair, 0, sizeof *oldQPair);
+ oldQPair->handle = VMCI_INVALID_HANDLE;
+ oldQPair->peer = VMCI_INVALID_ID;
+ kfree(oldQPair);
+ *qpair = NULL;
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_Detach);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_GetProduceIndexes --
+ *
+ * This is the client interface for getting the current indexes of the
+ * QPair from the point of view of the caller as the producer.
+ *
+ * Results:
+ * err, if < 0
+ * Success otherwise.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPair_GetProduceIndexes(const VMCIQPair * qpair, // IN
+ uint64_t * producerTail, // OUT
+ uint64_t * consumerHead) // OUT
+{
+ struct vmci_queue_header *produceQHeader;
+ struct vmci_queue_header *consumeQHeader;
+ int result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+ result =
+ VMCIQPairGetQueueHeaders(qpair, &produceQHeader, &consumeQHeader);
+ if (result == VMCI_SUCCESS)
+ VMCIQueueHeader_GetPointers(produceQHeader, consumeQHeader,
+ producerTail, consumerHead);
+
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ if (result == VMCI_SUCCESS &&
+ ((producerTail && *producerTail >= qpair->produceQSize) ||
+ (consumerHead && *consumerHead >= qpair->produceQSize)))
+ return VMCI_ERROR_INVALID_SIZE;
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_GetProduceIndexes);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_GetConsumeIndexes --
+ *
+ * This is the client interface for getting the current indexes of the
+ * QPair from the point of view of the caller as the consumer.
+ *
+ * Results:
+ * err, if < 0
+ * Success otherwise.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIQPair_GetConsumeIndexes(const VMCIQPair * qpair, // IN
+ uint64_t * consumerTail, // OUT
+ uint64_t * producerHead) // OUT
+{
+ struct vmci_queue_header *produceQHeader;
+ struct vmci_queue_header *consumeQHeader;
+ int result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+ result =
+ VMCIQPairGetQueueHeaders(qpair, &produceQHeader, &consumeQHeader);
+ if (result == VMCI_SUCCESS)
+ VMCIQueueHeader_GetPointers(consumeQHeader, produceQHeader,
+ consumerTail, producerHead);
+
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ if (result == VMCI_SUCCESS &&
+ ((consumerTail && *consumerTail >= qpair->consumeQSize) ||
+ (producerHead && *producerHead >= qpair->consumeQSize)))
+ return VMCI_ERROR_INVALID_SIZE;
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_GetConsumeIndexes);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_ProduceFreeSpace --
+ *
+ * This is the client interface for getting the amount of free
+ * space in the QPair from the point of view of the caller as
+ * the producer, which is the common case.
+ *
+ * Results:
+ * Err, if < 0.
+ * Full queue if = 0.
+ * Number of available bytes into which data can be enqueued if > 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int64_t VMCIQPair_ProduceFreeSpace(const VMCIQPair * qpair) // IN
+{
+ struct vmci_queue_header *produceQHeader;
+ struct vmci_queue_header *consumeQHeader;
+ int64_t result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+ result =
+ VMCIQPairGetQueueHeaders(qpair, &produceQHeader, &consumeQHeader);
+ if (result == VMCI_SUCCESS) {
+ result =
+ VMCIQueueHeader_FreeSpace(produceQHeader,
+ consumeQHeader,
+ qpair->produceQSize);
+ } else {
+ result = 0;
+ }
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_ProduceFreeSpace);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_ConsumeFreeSpace --
+ *
+ * This is the client interface for getting the amount of free
+ * space in the QPair from the point of view of the caller as
+ * the consumer, which is not the common case (see
+ * VMCIQPair_ProduceFreeSpace(), above).
+ *
+ * Results:
+ * Err, if < 0.
+ * Full queue if = 0.
+ * Number of available bytes into which data can be enqueued if > 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int64_t VMCIQPair_ConsumeFreeSpace(const VMCIQPair * qpair) // IN
+{
+ struct vmci_queue_header *produceQHeader;
+ struct vmci_queue_header *consumeQHeader;
+ int64_t result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+ result =
+ VMCIQPairGetQueueHeaders(qpair, &produceQHeader, &consumeQHeader);
+ if (result == VMCI_SUCCESS) {
+ result =
+ VMCIQueueHeader_FreeSpace(consumeQHeader,
+ produceQHeader,
+ qpair->consumeQSize);
+ } else {
+ result = 0;
+ }
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_ConsumeFreeSpace);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_ProduceBufReady --
+ *
+ * This is the client interface for getting the amount of
+ * enqueued data in the QPair from the point of view of the
+ * caller as the producer, which is not the common case (see
+ * VMCIQPair_ConsumeBufReady(), below).
+ *
+ * Results:
+ * Err, if < 0.
+ * Empty queue if = 0.
+ * Number of bytes ready to be dequeued if > 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int64_t VMCIQPair_ProduceBufReady(const VMCIQPair * qpair) // IN
+{
+ struct vmci_queue_header *produceQHeader;
+ struct vmci_queue_header *consumeQHeader;
+ int64_t result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+ result =
+ VMCIQPairGetQueueHeaders(qpair, &produceQHeader, &consumeQHeader);
+ if (result == VMCI_SUCCESS) {
+ result =
+ VMCIQueueHeader_BufReady(produceQHeader,
+ consumeQHeader,
+ qpair->produceQSize);
+ } else {
+ result = 0;
+ }
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_ProduceBufReady);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_ConsumeBufReady --
+ *
+ * This is the client interface for getting the amount of
+ * enqueued data in the QPair from the point of view of the
+ * caller as the consumer, which is the normal case.
+ *
+ * Results:
+ * Err, if < 0.
+ * Empty queue if = 0.
+ * Number of bytes ready to be dequeued if > 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int64_t VMCIQPair_ConsumeBufReady(const VMCIQPair * qpair) // IN
+{
+ struct vmci_queue_header *produceQHeader;
+ struct vmci_queue_header *consumeQHeader;
+ int64_t result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+ result =
+ VMCIQPairGetQueueHeaders(qpair, &produceQHeader, &consumeQHeader);
+ if (result == VMCI_SUCCESS) {
+ result =
+ VMCIQueueHeader_BufReady(consumeQHeader,
+ produceQHeader,
+ qpair->consumeQSize);
+ } else {
+ result = 0;
+ }
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_ConsumeBufReady);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * EnqueueLocked --
+ *
+ * Enqueues a given buffer to the produce queue using the provided
+ * function. As many bytes as possible (space available in the queue)
+ * are enqueued.
+ *
+ * Assumes the queue->mutex has been acquired.
+ *
+ * Results:
+ * VMCI_ERROR_QUEUEPAIR_NOSPACE if no space was available to enqueue data.
+ * VMCI_ERROR_INVALID_SIZE, if any queue pointer is outside the queue
+ * (as defined by the queue size).
+ * VMCI_ERROR_INVALID_ARGS, if an error occurred when accessing the buffer.
+ * VMCI_ERROR_QUEUEPAIR_NOTATTACHED, if the queue pair pages aren't
+ * available.
+ * Otherwise, the number of bytes written to the queue is returned.
+ *
+ * Side effects:
+ * Updates the tail pointer of the produce queue.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline ssize_t EnqueueLocked(struct vmci_queue *produceQ, // IN
+ struct vmci_queue *consumeQ, // IN
+ const uint64_t produceQSize, // IN
+ const void *buf, // IN
+ size_t bufSize, // IN
+ int bufType, // IN
+ VMCIMemcpyToQueueFunc memcpyToQueue) // IN
+{
+ int64_t freeSpace;
+ uint64_t tail;
+ size_t written;
+ ssize_t result;
+
+ result = VMCIQPairMapQueueHeaders(produceQ, consumeQ);
+ if (unlikely(result != VMCI_SUCCESS))
+ return result;
+
+ freeSpace = VMCIQueueHeader_FreeSpace(produceQ->qHeader,
+ consumeQ->qHeader, produceQSize);
+ if (freeSpace == 0)
+ return VMCI_ERROR_QUEUEPAIR_NOSPACE;
+
+ if (freeSpace < VMCI_SUCCESS)
+ return (ssize_t) freeSpace;
+
+ written = (size_t) (freeSpace > bufSize ? bufSize : freeSpace);
+ tail = VMCIQueueHeader_ProducerTail(produceQ->qHeader);
+ if (likely(tail + written < produceQSize)) {
+ result =
+ memcpyToQueue(produceQ, tail, buf, 0, written, bufType);
+ } else {
+ /* Tail pointer wraps around. */
+
+ const size_t tmp = (size_t) (produceQSize - tail);
+
+ result = memcpyToQueue(produceQ, tail, buf, 0, tmp, bufType);
+ if (result >= VMCI_SUCCESS) {
+ result =
+ memcpyToQueue(produceQ, 0, buf, tmp,
+ written - tmp, bufType);
+ }
+ }
+
+ if (result < VMCI_SUCCESS) {
+ return result;
+ }
+
+ VMCIQueueHeader_AddProducerTail(produceQ->qHeader, written,
+ produceQSize);
+ return written;
+}
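+
+/*
+ * Worked example of the wrap-around above (illustrative numbers):
+ * with produceQSize == 16, tail == 12 and written == 6, tail + written
+ * (18) is not < produceQSize, so tmp == 16 - 12 == 4 bytes are copied
+ * to offsets 12..15 and the remaining 2 bytes to offsets 0..1;
+ * VMCIQueueHeader_AddProducerTail() then leaves the tail at
+ * 12 + 6 - 16 == 2.
+ */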
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * DequeueLocked --
+ *
+ * Dequeues data (if available) from the given consume queue. Writes data
+ * to the user provided buffer using the provided function.
+ *
+ * Assumes the queue->mutex has been acquired.
+ *
+ * Results:
+ * VMCI_ERROR_QUEUEPAIR_NODATA if no data was available to dequeue.
+ * VMCI_ERROR_INVALID_SIZE, if any queue pointer is outside the queue
+ * (as defined by the queue size).
+ * VMCI_ERROR_INVALID_ARGS, if an error occurred when accessing the buffer.
+ * Otherwise the number of bytes dequeued is returned.
+ *
+ * Side effects:
+ * Updates the head pointer of the consume queue.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline ssize_t DequeueLocked(struct vmci_queue *produceQ, // IN
+ struct vmci_queue *consumeQ, // IN
+ const uint64_t consumeQSize, // IN
+ void *buf, // IN
+ size_t bufSize, // IN
+ int bufType, // IN
+ VMCIMemcpyFromQueueFunc memcpyFromQueue, // IN
+ bool updateConsumer) // IN
+{
+ int64_t bufReady;
+ uint64_t head;
+ size_t read;
+ ssize_t result;
+
+ result = VMCIQPairMapQueueHeaders(produceQ, consumeQ);
+ if (unlikely(result != VMCI_SUCCESS))
+ return result;
+
+ bufReady = VMCIQueueHeader_BufReady(consumeQ->qHeader,
+ produceQ->qHeader, consumeQSize);
+ if (bufReady == 0)
+ return VMCI_ERROR_QUEUEPAIR_NODATA;
+
+ if (bufReady < VMCI_SUCCESS)
+ return (ssize_t) bufReady;
+
+ read = (size_t) (bufReady > bufSize ? bufSize : bufReady);
+ head = VMCIQueueHeader_ConsumerHead(produceQ->qHeader);
+ if (likely(head + read < consumeQSize)) {
+ result = memcpyFromQueue(buf, 0, consumeQ, head, read, bufType);
+ } else {
+ /* Head pointer wraps around. */
+
+ const size_t tmp = (size_t) (consumeQSize - head);
+
+ result = memcpyFromQueue(buf, 0, consumeQ, head, tmp, bufType);
+ if (result >= VMCI_SUCCESS) {
+ result =
+ memcpyFromQueue(buf, tmp, consumeQ, 0,
+ read - tmp, bufType);
+ }
+ }
+
+ if (result < VMCI_SUCCESS)
+ return result;
+
+ if (updateConsumer) {
+ VMCIQueueHeader_AddConsumerHead(produceQ->qHeader,
+ read, consumeQSize);
+ }
+
+ return read;
+}
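+
+/*
+ * Note: updateConsumer == false is what gives VMCIQPair_Peek() below
+ * its semantics; the same bytes remain visible to a later call. An
+ * illustrative sequence (hdr and msg are hypothetical client buffers):
+ *
+ *   VMCIQPair_Peek(qpair, &hdr, sizeof hdr, 0);  // inspect, head unchanged
+ *   VMCIQPair_Dequeue(qpair, msg, msgLen, 0);    // now consume
+ */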
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_Enqueue --
+ *
+ * This is the client interface for enqueueing data into the queue.
+ *
+ * Results:
+ * Err, if < 0.
+ * Number of bytes enqueued if >= 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+ssize_t VMCIQPair_Enqueue(VMCIQPair * qpair, // IN
+ const void *buf, // IN
+ size_t bufSize, // IN
+ int bufType) // IN
+{
+ ssize_t result;
+
+ if (!qpair || !buf)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+
+ do {
+ result = EnqueueLocked(qpair->produceQ,
+ qpair->consumeQ,
+ qpair->produceQSize,
+ buf, bufSize, bufType,
+ qpair->flags & VMCI_QPFLAG_LOCAL ?
+ VMCIMemcpyToQueueLocal :
+ VMCIMemcpyToQueue);
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY) {
+ if (!VMCIQPairWaitForReadyQueue(qpair)) {
+ result = VMCI_ERROR_WOULD_BLOCK;
+ }
+ }
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_Enqueue);
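+
+/*
+ * Illustrative caller sketch (data and len are hypothetical): a
+ * producer that cannot make progress on a full queue typically backs
+ * off and retries:
+ *
+ *   ssize_t sent = VMCIQPair_Enqueue(qpair, data, len, 0);
+ *   if (sent == VMCI_ERROR_QUEUEPAIR_NOSPACE)
+ *           schedule();                         // back off, retry later
+ *   else if (sent < 0)
+ *           return (int)sent;                   // VMCI error code
+ */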
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_Dequeue --
+ *
+ * This is the client interface for dequeueing data from the queue.
+ *
+ * Results:
+ * Err, if < 0.
+ * Number of bytes dequeued if >= 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+ssize_t VMCIQPair_Dequeue(VMCIQPair * qpair, // IN
+ void *buf, // IN
+ size_t bufSize, // IN
+ int bufType) // IN
+{
+ ssize_t result;
+
+ if (!qpair || !buf)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+
+ do {
+ result = DequeueLocked(qpair->produceQ,
+ qpair->consumeQ,
+ qpair->consumeQSize,
+ buf, bufSize, bufType,
+ qpair->flags & VMCI_QPFLAG_LOCAL ?
+ VMCIMemcpyFromQueueLocal :
+ VMCIMemcpyFromQueue, true);
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY) {
+ if (!VMCIQPairWaitForReadyQueue(qpair)) {
+ result = VMCI_ERROR_WOULD_BLOCK;
+ }
+ }
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_Dequeue);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_Peek --
+ *
+ * This is the client interface for peeking into a queue. (I.e.,
+ * copy data from the queue without updating the head pointer.)
+ *
+ * Results:
+ * Err, if < 0.
+ * Number of bytes peeked, if >= 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+ssize_t VMCIQPair_Peek(VMCIQPair * qpair, // IN
+ void *buf, // IN
+ size_t bufSize, // IN
+ int bufType) // IN
+{
+ ssize_t result;
+
+ if (!qpair || !buf)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+
+ do {
+ result = DequeueLocked(qpair->produceQ,
+ qpair->consumeQ,
+ qpair->consumeQSize,
+ buf, bufSize, bufType,
+ qpair->flags & VMCI_QPFLAG_LOCAL ?
+ VMCIMemcpyFromQueueLocal :
+ VMCIMemcpyFromQueue, false);
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY) {
+ if (!VMCIQPairWaitForReadyQueue(qpair)) {
+ result = VMCI_ERROR_WOULD_BLOCK;
+ }
+ }
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_Peek);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_EnqueueV --
+ *
+ * This is the client interface for enqueueing data into the queue.
+ *
+ * Results:
+ * Err, if < 0.
+ * Number of bytes enqueued if >= 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+ssize_t VMCIQPair_EnqueueV(VMCIQPair * qpair, // IN
+ void *iov, // IN
+ size_t iovSize, // IN
+ int bufType) // IN
+{
+ ssize_t result;
+
+ if (!qpair || !iov)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+
+ do {
+ result = EnqueueLocked(qpair->produceQ,
+ qpair->consumeQ,
+ qpair->produceQSize,
+ iov, iovSize, bufType,
+ VMCIMemcpyToQueueV);
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY) {
+ if (!VMCIQPairWaitForReadyQueue(qpair)) {
+ result = VMCI_ERROR_WOULD_BLOCK;
+ }
+ }
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_EnqueueV);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_DequeueV --
+ *
+ * This is the client interface for dequeueing data from the queue.
+ *
+ * Results:
+ * Err, if < 0.
+ * Number of bytes dequeued if >= 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+ssize_t VMCIQPair_DequeueV(VMCIQPair * qpair, // IN
+ void *iov, // IN
+ size_t iovSize, // IN
+ int bufType) // IN
+{
+ ssize_t result;
+
+ if (!qpair || !iov)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+
+ do {
+ result = DequeueLocked(qpair->produceQ,
+ qpair->consumeQ,
+ qpair->consumeQSize,
+ iov, iovSize, bufType,
+ VMCIMemcpyFromQueueV, true);
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY) {
+ if (!VMCIQPairWaitForReadyQueue(qpair)) {
+ result = VMCI_ERROR_WOULD_BLOCK;
+ }
+ }
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_DequeueV);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQPair_PeekV --
+ *
+ * This is the client interface for peeking into a queue. (I.e.,
+ * copy data from the queue without updating the head pointer.)
+ *
+ * Results:
+ * Err, if < 0.
+ * Number of bytes peeked, if >= 0.
+ *
+ * Side effects:
+ * Windows blocking call.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+ssize_t VMCIQPair_PeekV(VMCIQPair * qpair, // IN
+ void *iov, // IN
+ size_t iovSize, // IN
+ int bufType) // IN
+{
+ ssize_t result;
+
+ if (!qpair || !iov)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ VMCI_AcquireQueueMutex(qpair->produceQ);
+
+ do {
+ result = DequeueLocked(qpair->produceQ,
+ qpair->consumeQ,
+ qpair->consumeQSize,
+ iov, iovSize, bufType,
+ VMCIMemcpyFromQueueV, false);
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY) {
+ if (!VMCIQPairWaitForReadyQueue(qpair)) {
+ result = VMCI_ERROR_WOULD_BLOCK;
+ }
+ }
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ VMCI_ReleaseQueueMutex(qpair->produceQ);
+
+ return result;
+}
+
+EXPORT_SYMBOL(VMCIQPair_PeekV);
diff --git a/drivers/misc/vmw_vmci/vmciQueue.h b/drivers/misc/vmw_vmci/vmciQueue.h
new file mode 100644
index 0000000..1d7c17c
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciQueue.h
@@ -0,0 +1,108 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_QUEUE_H_
+#define _VMCI_QUEUE_H_
+
+/*
+ * struct vmci_queue
+ *
+ * This data type contains the information about a queue.
+ *
+ * There are two queues (hence, queue pairs) per transaction model between a
+ * pair of end points, A & B. One queue is used by end point A to transmit
+ * commands and responses to B. The other queue is used by B to transmit
+ * commands and responses to A.
+ *
+ * struct vmci_queue_kern_if is a per-OS defined queue structure. It
+ * contains either a direct pointer to the linear address of the buffer
+ * contents or a pointer to structures which help the OS locate those
+ * data pages. See the per-platform vmciKernelIf.c for its definition.
+ */
+
+struct vmci_queue {
+ struct vmci_queue_header *qHeader;
+ struct vmci_queue_header *savedHeader;
+ struct vmci_queue_kern_if *kernelIf;
+};
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIMemcpy{To,From}QueueFunc() prototypes. Functions of these
+ * types are passed around to enqueue and dequeue routines. Note that
+ * often the functions passed are simply wrappers around memcpy
+ * itself.
+ *
+ * Note: In order for the memcpy typedefs to be compatible with the VMKernel,
+ * there's an unused last parameter for the hosted side. In
+ * ESX, that parameter holds a buffer type.
+ *
+ *-----------------------------------------------------------------------------
+ */
+typedef int VMCIMemcpyToQueueFunc(struct vmci_queue *queue,
+ uint64_t queueOffset, const void *src,
+ size_t srcOffset, size_t size, int bufType);
+typedef int VMCIMemcpyFromQueueFunc(void *dest, size_t destOffset,
+ const struct vmci_queue *queue,
+ uint64_t queueOffset, size_t size,
+ int bufType);
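+
+/*
+ * Illustrative sketch of the shape of such a function. This assumes a
+ * hypothetical queue whose data region is linearly mapped immediately
+ * after its header page; the real implementations live in the
+ * per-platform vmciKernelIf.c:
+ *
+ *   static int ExampleMemcpyToQueue(struct vmci_queue *queue,
+ *                                   uint64_t queueOffset, const void *src,
+ *                                   size_t srcOffset, size_t size,
+ *                                   int bufType)
+ *   {
+ *           uint8_t *base = (uint8_t *)queue->qHeader + PAGE_SIZE;
+ *
+ *           memcpy(base + queueOffset,
+ *                  (const uint8_t *)src + srcOffset, size);
+ *           return VMCI_SUCCESS;
+ *   }
+ */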
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIMemcpy{To,From}Queue[v]() prototypes
+ *
+ * Note that these routines are NOT SAFE to call on a host end-point
+ * until the guest end of the queue pair has attached -AND-
+ * SetPageStore(). The VMX crosstalk device will issue the
+ * SetPageStore() on behalf of the guest when the guest creates a
+ * QueuePair or attaches to one created by the host. So, if the guest
+ * notifies the host that it's attached then the queue is safe to use.
+ * Also, if the host registers notification of the connection of the
+ * guest, then it will only receive that notification when the guest
+ * has issued the SetPageStore() call and not before (when the guest
+ * had attached).
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIMemcpyToQueue(struct vmci_queue *queue, uint64_t queueOffset,
+ const void *src, size_t srcOffset, size_t size,
+ int bufType);
+int VMCIMemcpyFromQueue(void *dest, size_t destOffset,
+ const struct vmci_queue *queue,
+ uint64_t queueOffset, size_t size, int bufType);
+
+int VMCIMemcpyToQueueLocal(struct vmci_queue *queue, uint64_t queueOffset,
+ const void *src, size_t srcOffset, size_t size,
+ int bufType);
+int VMCIMemcpyFromQueueLocal(void *dest, size_t destOffset,
+ const struct vmci_queue *queue,
+ uint64_t queueOffset, size_t size, int bufType);
+
+int VMCIMemcpyToQueueV(struct vmci_queue *queue, uint64_t queueOffset,
+ const void *src, size_t srcOffset, size_t size,
+ int bufType);
+int VMCIMemcpyFromQueueV(void *dest, size_t destOffset,
+ const struct vmci_queue *queue,
+ uint64_t queueOffset, size_t size, int bufType);
+
+#endif /* !_VMCI_QUEUE_H_ */
--
1.7.0.4
* [PATCH 11/14] Add VMCI kernel API defs and the internal header file
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (9 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 10/14] Add accessor methods for Queue Pairs in VMCI Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 12/14] Add misc header files used by VMCI Andrew Stiegmann (stieg)
` (3 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmciCommonInt.h | 105 ++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciKernelAPI.h | 28 ++++++
drivers/misc/vmw_vmci/vmciKernelAPI1.h | 148 ++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciKernelAPI2.h | 48 ++++++++++
4 files changed, 329 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmciCommonInt.h
create mode 100644 drivers/misc/vmw_vmci/vmciKernelAPI.h
create mode 100644 drivers/misc/vmw_vmci/vmciKernelAPI1.h
create mode 100644 drivers/misc/vmw_vmci/vmciKernelAPI2.h
diff --git a/drivers/misc/vmw_vmci/vmciCommonInt.h b/drivers/misc/vmw_vmci/vmciCommonInt.h
new file mode 100644
index 0000000..936c7f1
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciCommonInt.h
@@ -0,0 +1,105 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_COMMONINT_H_
+#define _VMCI_COMMONINT_H_
+
+#include "vmci_defs.h"
+#include "vmci_call_defs.h"
+#include "vmci_infrastructure.h"
+#include "vmci_handle_array.h"
+#include "vmci_kernel_if.h"
+
+/*
+ * The struct datagram_queue_entry is a queue header for the in-kernel VMCI
+ * datagram queues. It is allocated in non-paged memory, as the
+ * content is accessed while holding a spinlock. The pending datagram
+ * itself may be allocated from paged memory. We shadow the size of
+ * the datagram in the non-paged queue entry as this size is used
+ * while holding the same spinlock as above.
+ */
+
+struct datagram_queue_entry {
+ struct list_head listItem; /* For queuing. */
+ size_t dgSize; /* Size of datagram. */
+ struct vmci_datagram *dg; /* Pending datagram. */
+};
+
+struct vmci_context {
+ struct list_head listItem; /* For global VMCI list. */
+ uint32_t cid;
+ atomic_t refCount;
+ struct list_head datagramQueue; /* Head of per VM queue. */
+ uint32_t pendingDatagrams;
+ size_t datagramQueueSize; /* Size of datagram queue in bytes. */
+ int userVersion; /*
+ * Version of the code that created
+ * this context; e.g., VMX.
+ */
+ spinlock_t lock; /* Locks callQueue and handleArrays. */
+ struct vmci_handle_arr *queuePairArray; /*
+ * QueuePairs attached to. The array of
+ * handles for queue pairs is accessed
+ * from the code for QP API, and there
+ * it is protected by the QP lock. It
+ * is also accessed from the context
+ * clean up path, which does not
+ * require a lock. VMCILock is not
+ * used to protect the QP array field.
+ */
+ struct vmci_handle_arr *doorbellArray; /* Doorbells created by context. */
+ struct vmci_handle_arr *pendingDoorbellArray; /* Doorbells pending for context. */
+ struct vmci_handle_arr *notifierArray; /* Contexts current context is subscribing to. */
+ struct vmci_host hostContext;
+ uint32_t privFlags;
+ uid_t user;
+ bool validUser;
+ bool *notify; /* Notify flag pointer - hosted only. */
+ struct page *notifyPage; /* Page backing the notify UVA. */
+};
+
+/*
+ *------------------------------------------------------------------------------
+ *
+ * VMCIDenyInteraction --
+ *
+ * Utility function that checks whether two entities are allowed
+ * to interact. If one of them is restricted, the other one must
+ * be trusted.
+ *
+ * Result:
+ * true if the two entities are not allowed to interact. false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *------------------------------------------------------------------------------
+ */
+
+static inline bool VMCIDenyInteraction(uint32_t partOne, // IN
+ uint32_t partTwo) // IN
+{
+ return (((partOne & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
+ !(partTwo & VMCI_PRIVILEGE_FLAG_TRUSTED)) ||
+ ((partTwo & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
+ !(partOne & VMCI_PRIVILEGE_FLAG_TRUSTED)));
+}
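+
+/*
+ * Example (illustrative):
+ * VMCIDenyInteraction(VMCI_PRIVILEGE_FLAG_RESTRICTED,
+ * VMCI_NO_PRIVILEGE_FLAGS) is true (deny), while
+ * VMCIDenyInteraction(VMCI_PRIVILEGE_FLAG_RESTRICTED,
+ * VMCI_PRIVILEGE_FLAG_TRUSTED) is false (allow), since a restricted
+ * entity may only interact with a trusted peer.
+ */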
+
+#endif /* _VMCI_COMMONINT_H_ */
diff --git a/drivers/misc/vmw_vmci/vmciKernelAPI.h b/drivers/misc/vmw_vmci/vmciKernelAPI.h
new file mode 100644
index 0000000..7a6a964
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciKernelAPI.h
@@ -0,0 +1,28 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef __VMCI_KERNELAPI_H__
+#define __VMCI_KERNELAPI_H__
+
+/* With this file you always get the latest version. */
+#include "vmciKernelAPI1.h"
+#include "vmciKernelAPI2.h"
+
+#endif /* !__VMCI_KERNELAPI_H__ */
diff --git a/drivers/misc/vmw_vmci/vmciKernelAPI1.h b/drivers/misc/vmw_vmci/vmciKernelAPI1.h
new file mode 100644
index 0000000..bf11a51
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciKernelAPI1.h
@@ -0,0 +1,148 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef __VMCI_KERNELAPI_1_H__
+#define __VMCI_KERNELAPI_1_H__
+
+#include "vmci_call_defs.h"
+#include "vmci_defs.h"
+
+/* VMCI module namespace on vmkernel. */
+#define MOD_VMCI_NAMESPACE "com.vmware.vmci"
+
+/* Define version 1. */
+#undef VMCI_KERNEL_API_VERSION
+#define VMCI_KERNEL_API_VERSION_1 1
+#define VMCI_KERNEL_API_VERSION VMCI_KERNEL_API_VERSION_1
+
+/* Macros to operate on the driver version number. */
+#define VMCI_MAJOR_VERSION(v) (((v) >> 16) & 0xffff)
+#define VMCI_MINOR_VERSION(v) ((v) & 0xffff)
+
+/* VMCI Device Usage API. */
+typedef void (VMCI_DeviceShutdownFn) (void *deviceRegistration, void *userData);
+
+bool VMCI_DeviceGet(uint32_t * apiVersion,
+ VMCI_DeviceShutdownFn * deviceShutdownCB,
+ void *userData, void **deviceRegistration);
+void VMCI_DeviceRelease(void *deviceRegistration);
+
+/* VMCI Datagram API. */
+int VMCIDatagram_CreateHnd(uint32_t resourceID, uint32_t flags,
+ VMCIDatagramRecvCB recvCB, void *clientData,
+ struct vmci_handle *outHandle);
+int VMCIDatagram_CreateHndPriv(uint32_t resourceID, uint32_t flags,
+ uint32_t privFlags,
+ VMCIDatagramRecvCB recvCB, void *clientData,
+ struct vmci_handle *outHandle);
+int VMCIDatagram_DestroyHnd(struct vmci_handle handle);
+int VMCIDatagram_Send(struct vmci_datagram *msg);
+
+/* VMCI Utility API. */
+uint32_t VMCI_GetContextID(void);
+uint32_t VMCI_Version(void);
+int VMCI_ContextID2HostVmID(uint32_t contextID, void *hostVmID,
+ size_t hostVmIDLen);
+int VMCI_IsContextOwner(uint32_t contextID, void *hostUser);
+
+/* VMCI Event API. */
+typedef void (*VMCI_EventCB) (uint32_t subID, struct vmci_event_data * ed,
+ void *clientData);
+
+int VMCIEvent_Subscribe(uint32_t event, uint32_t flags,
+ VMCI_EventCB callback, void *callbackData,
+ uint32_t * subID);
+int VMCIEvent_Unsubscribe(uint32_t subID);
+
+/* VMCI Context API */
+uint32_t VMCIContext_GetPrivFlags(uint32_t contextID);
+
+/* VMCI Queue Pair API. */
+typedef struct VMCIQPair VMCIQPair;
+
+int VMCIQPair_Alloc(VMCIQPair ** qpair,
+ struct vmci_handle *handle,
+ uint64_t produceQSize,
+ uint64_t consumeQSize,
+ uint32_t peer, uint32_t flags, uint32_t privFlags);
+
+int VMCIQPair_Detach(VMCIQPair ** qpair);
+
+int VMCIQPair_GetProduceIndexes(const VMCIQPair * qpair,
+ uint64_t * producerTail,
+ uint64_t * consumerHead);
+int VMCIQPair_GetConsumeIndexes(const VMCIQPair * qpair,
+ uint64_t * consumerTail,
+ uint64_t * producerHead);
+int64_t VMCIQPair_ProduceFreeSpace(const VMCIQPair * qpair);
+int64_t VMCIQPair_ProduceBufReady(const VMCIQPair * qpair);
+int64_t VMCIQPair_ConsumeFreeSpace(const VMCIQPair * qpair);
+int64_t VMCIQPair_ConsumeBufReady(const VMCIQPair * qpair);
+ssize_t VMCIQPair_Enqueue(VMCIQPair * qpair,
+ const void *buf, size_t bufSize, int mode);
+ssize_t VMCIQPair_Dequeue(VMCIQPair * qpair,
+ void *buf, size_t bufSize, int mode);
+ssize_t VMCIQPair_Peek(VMCIQPair * qpair, void *buf, size_t bufSize, int mode);
+
+/* Environments that support struct iovec */
+ssize_t VMCIQPair_EnqueueV(VMCIQPair * qpair,
+ void *iov, size_t iovSize, int mode);
+ssize_t VMCIQPair_DequeueV(VMCIQPair * qpair,
+ void *iov, size_t iovSize, int mode);
+ssize_t VMCIQPair_PeekV(VMCIQPair * qpair, void *iov, size_t iovSize, int mode);
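+
+/*
+ * Illustrative kernel-client sketch of the queue pair API; the queue
+ * sizes and the host peer are hypothetical choices:
+ *
+ *   VMCIQPair *qpair;
+ *   struct vmci_handle handle = VMCI_INVALID_HANDLE;
+ *   int err = VMCIQPair_Alloc(&qpair, &handle, 4096, 4096,
+ *                             VMCI_HOST_CONTEXT_ID, 0,
+ *                             VMCI_NO_PRIVILEGE_FLAGS);
+ *   if (err >= VMCI_SUCCESS) {
+ *           VMCIQPair_Enqueue(qpair, "ping", 4, 0);
+ *           VMCIQPair_Detach(&qpair);
+ *   }
+ */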
+
+/* Typedefs for all of the above, used by the IOCTLs and the kernel library. */
+typedef void (VMCI_DeviceReleaseFct) (void *);
+typedef int (VMCIDatagram_CreateHndFct) (uint32_t, uint32_t,
+ VMCIDatagramRecvCB, void *,
+ struct vmci_handle *);
+typedef int (VMCIDatagram_CreateHndPrivFct) (uint32_t, uint32_t, uint32_t,
+ VMCIDatagramRecvCB, void *,
+ struct vmci_handle *);
+typedef int (VMCIDatagram_DestroyHndFct) (struct vmci_handle);
+typedef int (VMCIDatagram_SendFct) (struct vmci_datagram *);
+typedef uint32_t(VMCI_GetContextIDFct) (void);
+typedef uint32_t(VMCI_VersionFct) (void);
+typedef int (VMCI_ContextID2HostVmIDFct) (uint32_t, void *, size_t);
+typedef int (VMCI_IsContextOwnerFct) (uint32_t, void *);
+typedef int (VMCIEvent_SubscribeFct) (uint32_t, uint32_t, VMCI_EventCB,
+ void *, uint32_t *);
+typedef int (VMCIEvent_UnsubscribeFct) (uint32_t);
+typedef uint32_t(VMCIContext_GetPrivFlagsFct) (uint32_t);
+typedef int (VMCIQPair_AllocFct) (VMCIQPair **, struct vmci_handle *,
+ uint64_t, uint64_t, uint32_t, uint32_t,
+ uint32_t);
+typedef int (VMCIQPair_DetachFct) (VMCIQPair **);
+typedef int (VMCIQPair_GetProduceIndexesFct) (const VMCIQPair *,
+ uint64_t *, uint64_t *);
+typedef int (VMCIQPair_GetConsumeIndexesFct) (const VMCIQPair *,
+ uint64_t *, uint64_t *);
+typedef int64_t(VMCIQPair_ProduceFreeSpaceFct) (const VMCIQPair *);
+typedef int64_t(VMCIQPair_ProduceBufReadyFct) (const VMCIQPair *);
+typedef int64_t(VMCIQPair_ConsumeFreeSpaceFct) (const VMCIQPair *);
+typedef int64_t(VMCIQPair_ConsumeBufReadyFct) (const VMCIQPair *);
+typedef ssize_t(VMCIQPair_EnqueueFct) (VMCIQPair *, const void *, size_t, int);
+typedef ssize_t(VMCIQPair_DequeueFct) (VMCIQPair *, void *, size_t, int);
+typedef ssize_t(VMCIQPair_PeekFct) (VMCIQPair *, void *, size_t, int);
+typedef ssize_t(VMCIQPair_EnqueueVFct) (VMCIQPair * qpair, void *, size_t, int);
+typedef ssize_t(VMCIQPair_DequeueVFct) (VMCIQPair * qpair, void *, size_t, int);
+typedef ssize_t(VMCIQPair_PeekVFct) (VMCIQPair * qpair, void *, size_t, int);
+
+#endif /* !__VMCI_KERNELAPI_1_H__ */
diff --git a/drivers/misc/vmw_vmci/vmciKernelAPI2.h b/drivers/misc/vmw_vmci/vmciKernelAPI2.h
new file mode 100644
index 0000000..bcd65cb
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciKernelAPI2.h
@@ -0,0 +1,48 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef __VMCI_KERNELAPI_2_H__
+#define __VMCI_KERNELAPI_2_H__
+
+#include "vmciKernelAPI1.h"
+
+/* Define version 2. */
+#undef VMCI_KERNEL_API_VERSION
+#define VMCI_KERNEL_API_VERSION_2 2
+#define VMCI_KERNEL_API_VERSION VMCI_KERNEL_API_VERSION_2
+
+/* VMCI Doorbell API. */
+#define VMCI_FLAG_DELAYED_CB 0x01
+
+typedef void (*VMCICallback) (void *clientData);
+
+int VMCIDoorbell_Create(struct vmci_handle *handle, uint32_t flags,
+ uint32_t privFlags,
+ VMCICallback notifyCB, void *clientData);
+int VMCIDoorbell_Destroy(struct vmci_handle handle);
+int VMCIDoorbell_Notify(struct vmci_handle handle, uint32_t privFlags);
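+
+/*
+ * Illustrative sketch of the doorbell API; MyNotifyCB and myData are
+ * hypothetical client names:
+ *
+ *   struct vmci_handle db = VMCI_INVALID_HANDLE;
+ *
+ *   if (VMCIDoorbell_Create(&db, VMCI_FLAG_DELAYED_CB,
+ *                           VMCI_NO_PRIVILEGE_FLAGS,
+ *                           MyNotifyCB, myData) == VMCI_SUCCESS) {
+ *           VMCIDoorbell_Notify(db, VMCI_NO_PRIVILEGE_FLAGS);
+ *           VMCIDoorbell_Destroy(db);
+ *   }
+ */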
+
+/* Typedefs for all of the above, used by the IOCTLs and the kernel library. */
+typedef int (VMCIDoorbell_CreateFct) (struct vmci_handle *, uint32_t,
+ uint32_t, VMCICallback, void *);
+typedef int (VMCIDoorbell_DestroyFct) (struct vmci_handle);
+typedef int (VMCIDoorbell_NotifyFct) (struct vmci_handle, uint32_t);
+
+#endif /* !__VMCI_KERNELAPI_2_H__ */
--
1.7.0.4
* [PATCH 12/14] Add misc header files used by VMCI
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (10 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 11/14] Add VMCI kernel API defs and the internal header file Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 13/14] Add main driver and kernel interface file Andrew Stiegmann (stieg)
` (2 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/vmci_call_defs.h | 264 +++++++++
drivers/misc/vmw_vmci/vmci_defs.h | 772 +++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_handle_array.h | 339 ++++++++++++
drivers/misc/vmw_vmci/vmci_infrastructure.h | 119 ++++
drivers/misc/vmw_vmci/vmci_iocontrols.h | 411 ++++++++++++++
drivers/misc/vmw_vmci/vmci_kernel_if.h | 111 ++++
6 files changed, 2016 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_call_defs.h
create mode 100644 drivers/misc/vmw_vmci/vmci_defs.h
create mode 100644 drivers/misc/vmw_vmci/vmci_handle_array.h
create mode 100644 drivers/misc/vmw_vmci/vmci_infrastructure.h
create mode 100644 drivers/misc/vmw_vmci/vmci_iocontrols.h
create mode 100644 drivers/misc/vmw_vmci/vmci_kernel_if.h
diff --git a/drivers/misc/vmw_vmci/vmci_call_defs.h b/drivers/misc/vmw_vmci/vmci_call_defs.h
new file mode 100644
index 0000000..480c0dc
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_call_defs.h
@@ -0,0 +1,264 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_CALL_DEFS_H_
+#define _VMCI_CALL_DEFS_H_
+
+#include "vmci_defs.h"
+
+/*
+ * All structs here are an integral size of their largest member, i.e. a struct
+ * with at least one 8-byte member will have a size that is a multiple of 8,
+ * and a struct whose largest member is 4 bytes will have a size that is a
+ * multiple of 4. This is because Windows CL enforces this rule; 32-bit gcc
+ * does not, e.g. 32-bit gcc can misalign an 8-byte member if it is preceded
+ * by a 4-byte member.
+ */
+
+/* Base struct for vmci datagrams. */
+struct vmci_datagram {
+ struct vmci_handle dst;
+ struct vmci_handle src;
+ uint64_t payloadSize;
+};
+
+/*
+ * VMCI_FLAG_WELLKNOWN_DG_HND is for creating a well-known handle instead of
+ * a per context handle. VMCI_FLAG_DG_DELAYED_CB is for deferring datagram
+ * delivery, so that the datagram callback is invoked in a delayed context
+ * (not interrupt context).
+ */
+#define VMCI_FLAG_DG_NONE 0
+#define VMCI_FLAG_WELLKNOWN_DG_HND 0x1
+#define VMCI_FLAG_ANYCID_DG_HND 0x2
+#define VMCI_FLAG_DG_DELAYED_CB 0x4
+
+/* Event callback should fire in a delayed context (not interrupt context.) */
+#define VMCI_FLAG_EVENT_NONE 0
+#define VMCI_FLAG_EVENT_DELAYED_CB 0x1
+
+/*
+ * Maximum supported size of a VMCI datagram for routable datagrams.
+ * Datagrams going to the hypervisor are allowed to be larger.
+ */
+#define VMCI_MAX_DG_SIZE (17 * 4096)
+#define VMCI_MAX_DG_PAYLOAD_SIZE (VMCI_MAX_DG_SIZE - sizeof(struct vmci_datagram))
+#define VMCI_DG_PAYLOAD(_dg) (void *)((char *)(_dg) + sizeof(struct vmci_datagram))
+#define VMCI_DG_HEADERSIZE sizeof(struct vmci_datagram)
+#define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payloadSize)
+#define VMCI_DG_SIZE_ALIGNED(_dg) ((VMCI_DG_SIZE(_dg) + 7) & (size_t)CONST64U(0xfffffffffffffff8))
+#define VMCI_MAX_DATAGRAM_QUEUE_SIZE (VMCI_MAX_DG_SIZE * 2)
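+
+/*
+ * Worked example (illustrative): a datagram with payloadSize == 9 has
+ * VMCI_DG_SIZE() == sizeof(struct vmci_datagram) + 9 == 24 + 9 == 33
+ * bytes, which VMCI_DG_SIZE_ALIGNED() rounds up to 40, the next
+ * multiple of 8.
+ */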
+
+/*
+ * We allow at least 1024 more event datagrams from the hypervisor past the
+ * normally allowed datagrams pending for a given context. We define this
+ * limit on event datagrams from the hypervisor to guard against DoS attack
+ * from a malicious VM which could repeatedly attach to and detach from a queue
+ * pair, causing events to be queued at the destination VM. However, the rate
+ * at which such events can be generated is small since it requires a VM exit
+ * and handling of queue pair attach/detach call at the hypervisor. Event
+ * datagrams may be queued up at the destination VM if it has interrupts
+ * disabled or if it is not draining events for some other reason. 1024
+ * datagrams is a grossly conservative estimate of the time for which
+ * interrupts may be disabled in the destination VM, but at the same time does
+ * not exacerbate the memory pressure problem on the host by much (size of each
+ * event datagram is small).
+ */
+#define VMCI_MAX_DATAGRAM_AND_EVENT_QUEUE_SIZE \
+ (VMCI_MAX_DATAGRAM_QUEUE_SIZE + \
+ 1024 * (sizeof(struct vmci_datagram) + sizeof(struct vmci_event_data_max)))
+
+/*
+ * Struct used for querying, via VMCI_RESOURCES_QUERY, the availability of
+ * hypervisor resources.
+ * Struct size is 16 bytes. All fields in struct are aligned to their natural
+ * alignment.
+ */
+struct vmci_rsrc_query_hdr {
+ struct vmci_datagram hdr;
+ uint32_t numResources;
+ uint32_t _padding;
+};
+
+/*
+ * Convenience struct for negotiating vectors. Must match layout of
+ * struct vmci_rsrc_query_hdr minus the struct vmci_datagram header.
+ */
+struct vmci_rscs_query_msg {
+ uint32_t numResources;
+ uint32_t _padding;
+ uint32_t resources[1];
+};
+
+/*
+ * The maximum number of resources that can be queried using
+ * VMCI_RESOURCE_QUERY is 31, as the result is encoded in the lower 31
+ * bits of a positive return value. Negative values are reserved for
+ * errors.
+ */
+#define VMCI_RESOURCE_QUERY_MAX_NUM 31
+
+/* Maximum size for the VMCI_RESOURCE_QUERY request. */
+#define VMCI_RESOURCE_QUERY_MAX_SIZE sizeof(struct vmci_rsrc_query_hdr) \
+ + VMCI_RESOURCE_QUERY_MAX_NUM * sizeof(uint32_t)
+
+/*
+ * Struct used for setting the notification bitmap. All fields in
+ * struct are aligned to their natural alignment.
+ */
+struct vmci_ntfy_bm_set_msg {
+ struct vmci_datagram hdr;
+ uint32_t bitmapPPN;
+ uint32_t _pad;
+};
+
+/*
+ * Struct used for linking a doorbell handle with an index in the
+ * notify bitmap. All fields in struct are aligned to their natural
+ * alignment.
+ */
+struct vmci_doorbell_link_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+ uint64_t notifyIdx;
+};
+
+/*
+ * Struct used for unlinking a doorbell handle from an index in the
+ * notify bitmap. All fields in struct are aligned to their natural
+ * alignment.
+ */
+struct vmci_doorbell_unlink_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+};
+
+/*
+ * Struct used for generating a notification on a doorbell handle. All
+ * fields in struct are aligned to their natural alignment.
+ */
+struct vmci_doorbell_ntfy_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+};
+
+/*
+ * This struct is used to contain data for events. Size of this struct is a
+ * multiple of 8 bytes, and all fields are aligned to their natural alignment.
+ */
+struct vmci_event_data {
+ uint32_t event; /* 4 bytes. */
+ uint32_t _pad;
+ /* Event payload is put here. */
+};
+
+/* Callback needed for correctly waiting on events. */
+typedef int
+ (*VMCIDatagramRecvCB) (void *clientData, // IN: client data for handler
+ struct vmci_datagram * msg); // IN:
+
+/*
+ * We use the following inline function to access the payload data associated
+ * with an event data.
+ */
+static inline void *VMCIEventDataPayload(struct vmci_event_data *evData) // IN:
+{
+ return (void *)((char *)evData + sizeof *evData);
+}
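+
+/*
+ * Example (illustrative): a handler for VMCI_EVENT_QP_PEER_ATTACH can
+ * recover its payload, whose type is defined further below, as
+ *
+ *   struct vmci_event_payld_qp *p = VMCIEventDataPayload(evData);
+ */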
+
+/*
+ * Define the different VMCI_EVENT payload data types here. All structs must
+ * be a multiple of 8 bytes, and fields must be aligned to their natural
+ * alignment.
+ */
+struct vmci_event_payld_ctx {
+ uint32_t contextID; /* 4 bytes. */
+ uint32_t _pad;
+};
+
+struct vmci_event_payld_qp {
+ struct vmci_handle handle; /* QueuePair handle. */
+ uint32_t peerId; /* Context id of attaching/detaching VM. */
+ uint32_t _pad;
+};
+
+/*
+ * We define the following struct to get the size of the maximum event data
+ * the hypervisor may send to the guest. If adding a new event payload type
+ * above, add it to the following struct too (inside the union).
+ */
+struct vmci_event_data_max {
+ struct vmci_event_data eventData;
+ union {
+ struct vmci_event_payld_ctx contextPayload;
+ struct vmci_event_payld_qp qpPayload;
+ } evDataPayload;
+};
+
+/*
+ * Struct used for VMCI_EVENT_SUBSCRIBE/UNSUBSCRIBE and VMCI_EVENT_HANDLER
+ * messages. Struct size is 32 bytes. All fields in struct are aligned to
+ * their natural alignment.
+ */
+struct vmci_event_msg {
+ struct vmci_datagram hdr;
+ struct vmci_event_data eventData; /* Has event type and payload. */
+ /* Payload gets put here. */
+};
+
+/*
+ * We use the following inline function to access the payload data associated
+ * with an event message.
+ *
+ * XXX: NUKE ME.
+ */
+static inline void *VMCIEventMsgPayload(struct vmci_event_msg *eMsg) // IN:
+{
+ return VMCIEventDataPayload(&eMsg->eventData);
+}
+
+/* Flags for VMCI QueuePair API. */
+#define VMCI_QPFLAG_ATTACH_ONLY 0x1 /* Fail alloc if QP not created by peer. */
+#define VMCI_QPFLAG_LOCAL 0x2 /* Only allow attaches from local context. */
+#define VMCI_QPFLAG_NONBLOCK 0x4 /* Host won't block when guest is quiesced. */
+/* Update the following (bitwise OR flags) while adding new flags. */
+#define VMCI_QP_ALL_FLAGS (VMCI_QPFLAG_ATTACH_ONLY | VMCI_QPFLAG_LOCAL | \
+ VMCI_QPFLAG_NONBLOCK)
+
+/*
+ * Structs used for QueuePair alloc and detach messages. We align fields of
+ * these structs to 64bit boundaries.
+ */
+struct vmci_qp_alloc_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+ uint32_t peer; /* 32bit field. */
+ uint32_t flags;
+ uint64_t produceSize;
+ uint64_t consumeSize;
+ uint64_t numPPNs;
+ /* List of PPNs placed here. */
+};
+
+struct vmci_qp_detach_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+};
+
+#endif
diff --git a/drivers/misc/vmw_vmci/vmci_defs.h b/drivers/misc/vmw_vmci/vmci_defs.h
new file mode 100644
index 0000000..bf1569b
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_defs.h
@@ -0,0 +1,772 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_DEF_H_
+#define _VMCI_DEF_H_
+
+#include <linux/atomic.h>
+#include <linux/printk.h>
+
+#define DEBUG
+
+#ifdef DEBUG
+#define VMCI_DBG(msg, args...) do { \
+ pr_devel("VMCI_DBG %s - %s - %d: " msg, __FILE__, \
+ __func__, __LINE__, ##args); \
+ } while (0)
+#else
+#define VMCI_DBG(msg, args...)
+#endif
+
+/* Register offsets. */
+#define VMCI_STATUS_ADDR 0x00
+#define VMCI_CONTROL_ADDR 0x04
+#define VMCI_ICR_ADDR 0x08
+#define VMCI_IMR_ADDR 0x0c
+#define VMCI_DATA_OUT_ADDR 0x10
+#define VMCI_DATA_IN_ADDR 0x14
+#define VMCI_CAPS_ADDR 0x18
+#define VMCI_RESULT_LOW_ADDR 0x1c
+#define VMCI_RESULT_HIGH_ADDR 0x20
+
+/* Max number of devices. */
+#define VMCI_MAX_DEVICES 1
+
+/* Status register bits. */
+#define VMCI_STATUS_INT_ON 0x1
+
+/* Control register bits. */
+#define VMCI_CONTROL_RESET 0x1
+#define VMCI_CONTROL_INT_ENABLE 0x2
+#define VMCI_CONTROL_INT_DISABLE 0x4
+
+/* Capabilities register bits. */
+#define VMCI_CAPS_HYPERCALL 0x1
+#define VMCI_CAPS_GUESTCALL 0x2
+#define VMCI_CAPS_DATAGRAM 0x4
+#define VMCI_CAPS_NOTIFICATIONS 0x8
+
+/* Interrupt Cause register bits. */
+#define VMCI_ICR_DATAGRAM 0x1
+#define VMCI_ICR_NOTIFICATION 0x2
+
+/* Interrupt Mask register bits. */
+#define VMCI_IMR_DATAGRAM 0x1
+#define VMCI_IMR_NOTIFICATION 0x2
+
+/* Interrupt type. */
+enum {
+ VMCI_INTR_TYPE_INTX = 0,
+ VMCI_INTR_TYPE_MSI = 1,
+ VMCI_INTR_TYPE_MSIX = 2
+};
+
+/* Maximum MSI/MSI-X interrupt vectors in the device. */
+#define VMCI_MAX_INTRS 2
+
+/*
+ * Supported interrupt vectors. There is one for each ICR value above,
+ * but here they indicate the position in the vector array/message ID.
+ */
+#define VMCI_INTR_DATAGRAM 0
+#define VMCI_INTR_NOTIFICATION 1
+
+/*
+ * A single VMCI device has an upper limit of 128MB on the amount of
+ * memory that can be used for queue pairs.
+ */
+#define VMCI_MAX_GUEST_QP_MEMORY (128 * 1024 * 1024)
+
+/*
+ * We have a fixed set of resource IDs available in the VMX.
+ * This allows us to have a very simple implementation since we statically
+ * know how many will create datagram handles. If a new caller arrives and
+ * we have run out of slots we can manually increment the maximum size of
+ * available resource IDs.
+ *
+ * VMCI reserved hypervisor datagram resource IDs.
+ */
+#define VMCI_RESOURCES_QUERY 0
+#define VMCI_GET_CONTEXT_ID 1
+#define VMCI_SET_NOTIFY_BITMAP 2
+#define VMCI_DOORBELL_LINK 3
+#define VMCI_DOORBELL_UNLINK 4
+#define VMCI_DOORBELL_NOTIFY 5
+/*
+ * VMCI_DATAGRAM_REQUEST_MAP and VMCI_DATAGRAM_REMOVE_MAP are
+ * obsoleted by the removal of VM to VM communication.
+ */
+#define VMCI_DATAGRAM_REQUEST_MAP 6
+#define VMCI_DATAGRAM_REMOVE_MAP 7
+#define VMCI_EVENT_SUBSCRIBE 8
+#define VMCI_EVENT_UNSUBSCRIBE 9
+#define VMCI_QUEUEPAIR_ALLOC 10
+#define VMCI_QUEUEPAIR_DETACH 11
+
+/*
+ * VMCI_VSOCK_VMX_LOOKUP was assigned to 12 for Fusion 3.0/3.1,
+ * WS 7.0/7.1 and ESX 4.1
+ */
+#define VMCI_HGFS_TRANSPORT 13
+#define VMCI_UNITY_PBRPC_REGISTER 14
+#define VMCI_RESOURCE_MAX 15
+
+#define Log(fmt, args...) printk(KERN_INFO fmt, ##args)
+#define Warning(fmt, args...) printk(KERN_WARNING fmt, ##args)
+#define PCI_VENDOR_ID_VMWARE 0x15AD
+#define PCI_DEVICE_ID_VMWARE_VMCI 0x0740
+#define VMCI_DRIVER_VERSION 9.3.14.0-k
+#define VMCI_DRIVER_VERSION_STRING "9.3.14.0-k"
+
+#define ASSERT(cond) ({if (unlikely(!(cond))) panic("ASSERT Failed at %s:%d\n", __FILE__, __LINE__);})
+#define QWORD(_hi, _lo) ((((uint64_t)(_hi)) << 32) | ((uint32_t)(_lo)))
+
+/*
+ * Compile-time assertions.
+ *
+ * The implementation uses both enum and typedef because the typedef alone is
+ * insufficient; gcc allows arrays to be declared with non-constant expressions
+ * (even in typedefs, where it makes no sense).
+ */
+#define ASSERT_ON_COMPILE(e) \
+ do { \
+ enum { AssertOnCompileMisused = ((e) ? 1 : -1) }; \
+ typedef char AssertOnCompileFailed[AssertOnCompileMisused]; \
+ } while (0)
+
+/* XXX: Replacement? */
+#define CEILING(x, y) (((x) + (y) - 1) / (y))
+
+#ifdef CONFIG_X86_64
+# define CONST64(c) c##L
+# define CONST64U(c) c##uL
+# define FMT64 "ll"
+#else
+# define CONST64(c) c##LL
+# define CONST64U(c) c##uLL
+# define FMT64 "L"
+#endif
+
+#define UNUSED_PARAM(_parm) _parm __attribute__((__unused__))
+
+/* VMCI Ids. */
+struct vmci_handle {
+ uint32_t context;
+ uint32_t resource;
+};
+
+static inline struct vmci_handle VMCI_MAKE_HANDLE(uint32_t cid, uint32_t rid)
+{
+ struct vmci_handle h;
+ h.context = cid;
+ h.resource = rid;
+ return h;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_HANDLE_TO_UINT64 --
+ *
+ * Helper for VMCI handle to uint64_t conversion.
+ *
+ * Results:
+ * The uint64_t value.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static inline uint64_t VMCI_HANDLE_TO_UINT64(struct vmci_handle handle) // IN:
+{
+ uint64_t handle64;
+
+ handle64 = handle.context;
+ handle64 <<= 32;
+ handle64 |= handle.resource;
+ return handle64;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCI_UINT64_TO_HANDLE --
+ *
+ * Helper for uint64_t to VMCI handle conversion.
+ *
+ * Results:
+ * The VMCI handle value.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static inline struct vmci_handle VMCI_UINT64_TO_HANDLE(uint64_t handle64) // IN:
+{
+ uint32_t context = (uint32_t) (handle64 >> 32);
+ uint32_t resource = (uint32_t) handle64;
+
+ return VMCI_MAKE_HANDLE(context, resource);
+}
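+
+/*
+ * Worked example (illustrative): a handle with context 0x2 and
+ * resource 0x10 packs to 0x0000000200000010 via
+ * VMCI_HANDLE_TO_UINT64(), and VMCI_UINT64_TO_HANDLE() recovers the
+ * original pair, so the two helpers are inverses.
+ */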
+
+#define VMCI_HANDLE_TO_CONTEXT_ID(_handle) ((_handle).context)
+#define VMCI_HANDLE_TO_RESOURCE_ID(_handle) ((_handle).resource)
+#define VMCI_HANDLE_EQUAL(_h1, _h2) ((_h1).context == (_h2).context && \
+ (_h1).resource == (_h2).resource)
+
+#define VMCI_INVALID_ID 0xFFFFFFFF
+static const struct vmci_handle VMCI_INVALID_HANDLE = { VMCI_INVALID_ID,
+ VMCI_INVALID_ID
+};
+
+#define VMCI_HANDLE_INVALID(_handle) \
+ VMCI_HANDLE_EQUAL((_handle), VMCI_INVALID_HANDLE)
+
+/*
+ * The below defines can be used to send anonymous requests.
+ * This also indicates that no response is expected.
+ */
+#define VMCI_ANON_SRC_CONTEXT_ID VMCI_INVALID_ID
+#define VMCI_ANON_SRC_RESOURCE_ID VMCI_INVALID_ID
+#define VMCI_ANON_SRC_HANDLE VMCI_MAKE_HANDLE(VMCI_ANON_SRC_CONTEXT_ID, \
+ VMCI_ANON_SRC_RESOURCE_ID)
+
+/* The lowest 16 context ids are reserved for internal use. */
+#define VMCI_RESERVED_CID_LIMIT (uint32_t) 16
+
+/*
+ * Hypervisor context id, used for calling into hypervisor
+ * supplied services from the VM.
+ */
+#define VMCI_HYPERVISOR_CONTEXT_ID 0
+
+/*
+ * Well-known context id, a logical context that contains a set of
+ * well-known services. This context ID is now obsolete.
+ */
+#define VMCI_WELL_KNOWN_CONTEXT_ID 1
+
+/*
+ * Context ID used by host endpoints.
+ */
+#define VMCI_HOST_CONTEXT_ID 2
+
+#define VMCI_CONTEXT_IS_VM(_cid) (VMCI_INVALID_ID != (_cid) && \
+ (_cid) > VMCI_HOST_CONTEXT_ID)
+
+/*
+ * The VMCI_CONTEXT_RESOURCE_ID is used together with VMCI_MAKE_HANDLE to make
+ * handles that refer to a specific context.
+ */
+#define VMCI_CONTEXT_RESOURCE_ID 0
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI error codes.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+#define VMCI_SUCCESS_QUEUEPAIR_ATTACH 5
+#define VMCI_SUCCESS_QUEUEPAIR_CREATE 4
+#define VMCI_SUCCESS_LAST_DETACH 3
+#define VMCI_SUCCESS_ACCESS_GRANTED 2
+#define VMCI_SUCCESS_ENTRY_DEAD 1
+#define VMCI_SUCCESS 0LL
+#define VMCI_ERROR_INVALID_RESOURCE (-1)
+#define VMCI_ERROR_INVALID_ARGS (-2)
+#define VMCI_ERROR_NO_MEM (-3)
+#define VMCI_ERROR_DATAGRAM_FAILED (-4)
+#define VMCI_ERROR_MORE_DATA (-5)
+#define VMCI_ERROR_NO_MORE_DATAGRAMS (-6)
+#define VMCI_ERROR_NO_ACCESS (-7)
+#define VMCI_ERROR_NO_HANDLE (-8)
+#define VMCI_ERROR_DUPLICATE_ENTRY (-9)
+#define VMCI_ERROR_DST_UNREACHABLE (-10)
+#define VMCI_ERROR_PAYLOAD_TOO_LARGE (-11)
+#define VMCI_ERROR_INVALID_PRIV (-12)
+#define VMCI_ERROR_GENERIC (-13)
+#define VMCI_ERROR_PAGE_ALREADY_SHARED (-14)
+#define VMCI_ERROR_CANNOT_SHARE_PAGE (-15)
+#define VMCI_ERROR_CANNOT_UNSHARE_PAGE (-16)
+#define VMCI_ERROR_NO_PROCESS (-17)
+#define VMCI_ERROR_NO_DATAGRAM (-18)
+#define VMCI_ERROR_NO_RESOURCES (-19)
+#define VMCI_ERROR_UNAVAILABLE (-20)
+#define VMCI_ERROR_NOT_FOUND (-21)
+#define VMCI_ERROR_ALREADY_EXISTS (-22)
+#define VMCI_ERROR_NOT_PAGE_ALIGNED (-23)
+#define VMCI_ERROR_INVALID_SIZE (-24)
+#define VMCI_ERROR_REGION_ALREADY_SHARED (-25)
+#define VMCI_ERROR_TIMEOUT (-26)
+#define VMCI_ERROR_DATAGRAM_INCOMPLETE (-27)
+#define VMCI_ERROR_INCORRECT_IRQL (-28)
+#define VMCI_ERROR_EVENT_UNKNOWN (-29)
+#define VMCI_ERROR_OBSOLETE (-30)
+#define VMCI_ERROR_QUEUEPAIR_MISMATCH (-31)
+#define VMCI_ERROR_QUEUEPAIR_NOTSET (-32)
+#define VMCI_ERROR_QUEUEPAIR_NOTOWNER (-33)
+#define VMCI_ERROR_QUEUEPAIR_NOTATTACHED (-34)
+#define VMCI_ERROR_QUEUEPAIR_NOSPACE (-35)
+#define VMCI_ERROR_QUEUEPAIR_NODATA (-36)
+#define VMCI_ERROR_BUSMEM_INVALIDATION (-37)
+#define VMCI_ERROR_MODULE_NOT_LOADED (-38)
+#define VMCI_ERROR_DEVICE_NOT_FOUND (-39)
+#define VMCI_ERROR_QUEUEPAIR_NOT_READY (-40)
+#define VMCI_ERROR_WOULD_BLOCK (-41)
+
+/* VMCI clients should return error code within this range */
+#define VMCI_ERROR_CLIENT_MIN (-500)
+#define VMCI_ERROR_CLIENT_MAX (-550)
+
+/* Internal error codes. */
+#define VMCI_SHAREDMEM_ERROR_BAD_CONTEXT (-1000)
+
+#define VMCI_PATH_MAX 256
+
+/* VMCI reserved events. */
+#define VMCI_EVENT_CTX_ID_UPDATE 0 // Only applicable to guest endpoints
+#define VMCI_EVENT_CTX_REMOVED 1 // Applicable to guest and host
+#define VMCI_EVENT_QP_RESUMED 2 // Only applicable to guest endpoints
+#define VMCI_EVENT_QP_PEER_ATTACH 3 // Applicable to guest and host
+#define VMCI_EVENT_QP_PEER_DETACH 4 // Applicable to guest and host
+#define VMCI_EVENT_MEM_ACCESS_ON 5 // Applicable to VMX and vmk. On vmk,
+ // this event has the Context payload type.
+#define VMCI_EVENT_MEM_ACCESS_OFF 6 // Applicable to VMX and vmk. Same as
+ // above for the payload type.
+#define VMCI_EVENT_MAX 7
+
+/*
+ * Of the above events, a few are reserved for use in the VMX, and
+ * other endpoints (guest and host kernel) should not use them. For
+ * the rest of the events, we allow both host and guest endpoints to
+ * subscribe to them, to maintain the same API for host and guest
+ * endpoints.
+ */
+
+#define VMCI_EVENT_VALID_VMX(_event) ((_event) == VMCI_EVENT_MEM_ACCESS_ON || \
+ (_event) == VMCI_EVENT_MEM_ACCESS_OFF)
+
+#define VMCI_EVENT_VALID(_event) ((_event) < VMCI_EVENT_MAX && \
+ !VMCI_EVENT_VALID_VMX(_event))
+
+/* Reserved guest datagram resource ids. */
+#define VMCI_EVENT_HANDLER 0
+
+/*
+ * VMCI coarse-grained privileges (per context or host
+ * process/endpoint). An entity with the restricted flag is only
+ * allowed to interact with the hypervisor and trusted entities.
+ */
+#define VMCI_PRIVILEGE_FLAG_RESTRICTED 0x01
+#define VMCI_PRIVILEGE_FLAG_TRUSTED 0x02
+#define VMCI_PRIVILEGE_ALL_FLAGS (VMCI_PRIVILEGE_FLAG_RESTRICTED | \
+ VMCI_PRIVILEGE_FLAG_TRUSTED)
+#define VMCI_NO_PRIVILEGE_FLAGS 0x00
+#define VMCI_DEFAULT_PROC_PRIVILEGE_FLAGS VMCI_NO_PRIVILEGE_FLAGS
+#define VMCI_LEAST_PRIVILEGE_FLAGS VMCI_PRIVILEGE_FLAG_RESTRICTED
+#define VMCI_MAX_PRIVILEGE_FLAGS VMCI_PRIVILEGE_FLAG_TRUSTED
+
+#define VMCI_PUBLIC_GROUP_NAME "vmci public group"
+/* 0 through VMCI_RESERVED_RESOURCE_ID_MAX are reserved. */
+#define VMCI_RESERVED_RESOURCE_ID_MAX 1023
+
+#define VMCI_DOMAIN_NAME_MAXLEN 32
+
+#define VMCI_LGPFX "VMCI: "
+
+/*
+ * struct vmci_queue_header
+ *
+ * A Queue cannot stand by itself as designed. Each Queue's header
+ * contains a pointer into itself (the producerTail) and into its peer
+ * (consumerHead). The reason for the separation is one of
+ * accessibility: Each end-point can modify two things: where the next
+ * location to enqueue is within its produceQ (producerTail); and
+ * where the next dequeue location is in its consumeQ (consumerHead).
+ *
+ * An end-point cannot modify the pointers of its peer (guest to
+ * guest; NOTE that in the host both queue headers are mapped r/w).
+ * But, each end-point needs read access to both Queue header
+ * structures in order to determine how much space is used (or left)
+ * in the Queue. This is because for an end-point to know how full
+ * its produceQ is, it needs to use the consumerHead that points into
+ * the produceQ but -that- consumerHead is in the Queue header for
+ * that end-point's consumeQ.
+ *
+ * Thoroughly confused? Sorry.
+ *
+ * producerTail: the point to enqueue new entrants. When you approach
+ * a line in a store, for example, you walk up to the tail.
+ *
+ * consumerHead: the point in the queue from which the next element is
+ * dequeued. In other words, the next one served in a line is whoever
+ * stands at the head of the line.
+ *
+ * Also, producerTail points to an empty byte in the Queue, whereas
+ * consumerHead points to a valid byte of data (unless producerTail ==
+ * consumerHead in which case consumerHead does not point to a valid
+ * byte of data).
+ *
+ * For a queue of buffer 'size' bytes, the tail and head pointers will be in
+ * the range [0, size-1].
+ *
+ * If produceQHeader->producerTail == consumeQHeader->consumerHead
+ * then the produceQ is empty.
+ */
+
+struct vmci_queue_header {
+ /* All fields are 64bit and aligned. */
+ struct vmci_handle handle; /* Identifier. */
+ atomic64_t producerTail; /* Offset in this queue. */
+ atomic64_t consumerHead; /* Offset in peer queue. */
+};
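+
+/*
+ * Worked example (illustrative): for a produceQ of size 8 with
+ * producerTail == 6 (stored in the produceQ header) and
+ * consumerHead == 2 (stored in the consumeQ header, pointing into the
+ * produceQ), bytes at offsets 2..5 are enqueued, so 4 bytes are ready
+ * to be consumed and 8 - 4 - 1 == 3 bytes are free; one byte stays
+ * unused so that a full queue (tail one behind head) can be told
+ * apart from an empty one (tail == head).
+ */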
+
+/*
+ * If one client of a QueuePair is a 32bit entity, we restrict the QueuePair
+ * size to be less than 4GB, and use 32bit atomic operations on the head and
+ * tail pointers. 64bit atomic read on a 32bit entity involves cmpxchg8b which
+ * is an atomic read-modify-write. This will cause traces to fire when a 32bit
+ * consumer tries to read the producer's tail pointer, for example, because the
+ * consumer has read-only access to the producer's tail pointer.
+ *
+ * We provide the following macros to invoke 32bit or 64bit atomic operations
+ * based on the architecture the code is being compiled on.
+ */
+
+/* Architecture independent maximum queue size. */
+#define QP_MAX_QUEUE_SIZE_ARCH_ANY CONST64U(0xffffffff)
+
+#ifdef __x86_64__
+# define QP_MAX_QUEUE_SIZE_ARCH CONST64U(0xffffffffffffffff)
+#else
+# define QP_MAX_QUEUE_SIZE_ARCH CONST64U(0xffffffff)
+#endif /* __x86_64__ */
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * QPAddPointer --
+ *
+ * Helper to add a given offset to a head or tail pointer. Wraps the value
+ * of the pointer around the max size of the queue.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * The pointer value stored in *var is updated.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline void QPAddPointer(atomic64_t * var, // IN:
+ size_t add, // IN:
+ uint64_t size) // IN:
+{
+ uint64_t newVal = atomic64_read(var);
+
+ if (newVal >= size - add) {
+ newVal -= size;
+ }
+ newVal += add;
+
+ atomic64_set(var, newVal);
+}
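+
+/*
+ * Worked example (illustrative): with size == 16, advancing a pointer
+ * at offset 12 by add == 6 takes the wrap branch (12 >= 16 - 6), so
+ * the stored value becomes 12 - 16 + 6 == 2 and the pointer stays
+ * within [0, size).
+ */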
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_ProducerTail() --
+ *
+ * Helper routine to get the Producer Tail from the supplied queue.
+ *
+ * Results:
+ * The contents of the queue's producer tail.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline uint64_t VMCIQueueHeader_ProducerTail(const struct vmci_queue_header *qHeader) // IN:
+{
+ struct vmci_queue_header *qh = (struct vmci_queue_header *)qHeader;
+ return atomic64_read(&qh->producerTail);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_ConsumerHead() --
+ *
+ * Helper routine to get the Consumer Head from the supplied queue.
+ *
+ * Results:
+ *      The contents of the queue's consumer head.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline uint64_t VMCIQueueHeader_ConsumerHead(const struct vmci_queue_header *qHeader) // IN:
+{
+ struct vmci_queue_header *qh = (struct vmci_queue_header *)qHeader;
+ return atomic64_read(&qh->consumerHead);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_AddProducerTail() --
+ *
+ * Helper routine to increment the Producer Tail. Fundamentally,
+ * QPAddPointer() is used to manipulate the tail itself.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline void VMCIQueueHeader_AddProducerTail(struct vmci_queue_header *qHeader, // IN/OUT:
+ size_t add, // IN:
+ uint64_t queueSize) // IN:
+{
+ QPAddPointer(&qHeader->producerTail, add, queueSize);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_AddConsumerHead() --
+ *
+ * Helper routine to increment the Consumer Head. Fundamentally,
+ * QPAddPointer() is used to manipulate the head itself.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline void VMCIQueueHeader_AddConsumerHead(struct vmci_queue_header *qHeader, // IN/OUT:
+ size_t add, // IN:
+ uint64_t queueSize) // IN:
+{
+ QPAddPointer(&qHeader->consumerHead, add, queueSize);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_CheckAlignment --
+ *
+ * Checks if the given queue is aligned to page boundary. Returns true if
+ * the alignment is good.
+ *
+ * Results:
+ * true or false.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline bool VMCIQueueHeader_CheckAlignment(const struct vmci_queue_header *qHeader) // IN:
+{
+ uintptr_t hdr, offset;
+
+ hdr = (uintptr_t) qHeader;
+ offset = hdr & (PAGE_SIZE - 1);
+
+ return offset == 0;
+}
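The power-of-two mask is the usual alignment test: with 4096 byte pages, an address is page aligned exactly when its low 12 bits are zero. A quick standalone check of the same expression:

#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096	/* assumed page size for this sketch */

int main(void)
{
	assert((0x2000 & (PAGE_SIZE - 1)) == 0);	/* page aligned */
	assert((0x2010 & (PAGE_SIZE - 1)) != 0);	/* not aligned */
	return 0;
}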
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_GetPointers --
+ *
+ *      Helper routine for getting the producer tail and the consumer head
+ *      for a queue. Both queue headers are needed, since each header
+ *      holds only one of the two pointers for a given queue.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline void VMCIQueueHeader_GetPointers(const struct vmci_queue_header *produceQHeader, // IN:
+ const struct vmci_queue_header *consumeQHeader, // IN:
+ uint64_t * producerTail, // OUT:
+ uint64_t * consumerHead) // OUT:
+{
+ if (producerTail)
+ *producerTail = VMCIQueueHeader_ProducerTail(produceQHeader);
+
+ if (consumerHead)
+ *consumerHead = VMCIQueueHeader_ConsumerHead(consumeQHeader);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_ResetPointers --
+ *
+ * Reset the tail pointer (of "this" queue) and the head pointer (of
+ * "peer" queue).
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline void VMCIQueueHeader_ResetPointers(struct vmci_queue_header *qHeader) // IN/OUT:
+{
+ atomic64_set(&qHeader->producerTail, CONST64U(0));
+ atomic64_set(&qHeader->consumerHead, CONST64U(0));
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_Init --
+ *
+ * Initializes a queue's state (head & tail pointers).
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline void VMCIQueueHeader_Init(struct vmci_queue_header *qHeader, // IN/OUT:
+ const struct vmci_handle handle) // IN:
+{
+ qHeader->handle = handle;
+ VMCIQueueHeader_ResetPointers(qHeader);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_FreeSpace --
+ *
+ * Finds available free space in a produce queue to enqueue more
+ * data or reports an error if queue pair corruption is detected.
+ *
+ * Results:
+ * Free space size in bytes or an error code.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline int64_t VMCIQueueHeader_FreeSpace(const struct vmci_queue_header *produceQHeader, // IN:
+ const struct vmci_queue_header *consumeQHeader, // IN:
+ const uint64_t produceQSize) // IN:
+{
+ uint64_t tail;
+ uint64_t head;
+ uint64_t freeSpace;
+
+ tail = VMCIQueueHeader_ProducerTail(produceQHeader);
+ head = VMCIQueueHeader_ConsumerHead(consumeQHeader);
+
+ if (tail >= produceQSize || head >= produceQSize)
+ return VMCI_ERROR_INVALID_SIZE;
+
+ /*
+ * Deduct 1 to avoid tail becoming equal to head which causes ambiguity. If
+ * head and tail are equal it means that the queue is empty.
+ */
+
+ if (tail >= head) {
+ freeSpace = produceQSize - (tail - head) - 1;
+ } else {
+ freeSpace = head - tail - 1;
+ }
+
+ return freeSpace;
+}
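A few concrete values may help; this userspace sketch (hypothetical helper, mirroring the arithmetic above) shows why one byte is always held in reserve:

#include <assert.h>
#include <stdint.h>

/*
 * Free space in a queue of 'size' bytes. One byte is kept in reserve
 * so that tail == head unambiguously means "empty".
 */
static int64_t qp_free_space(uint64_t tail, uint64_t head, uint64_t size)
{
	if (tail >= head)
		return size - (tail - head) - 1;
	return head - tail - 1;
}

int main(void)
{
	assert(qp_free_space(0, 0, 16) == 15);	/* empty: all but the reserve */
	assert(qp_free_space(5, 0, 16) == 10);	/* 5 bytes queued */
	assert(qp_free_space(3, 4, 16) == 0);	/* full: tail just behind head */
	return 0;
}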
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIQueueHeader_BufReady --
+ *
+ *      VMCIQueueHeader_FreeSpace() does all the heavy lifting of
+ *      determining the number of free bytes in a Queue. This routine
+ *      then subtracts that free space from the full size of the Queue,
+ *      so the caller knows how many bytes are ready to be dequeued.
+ *
+ * Results:
+ * On success, available data size in bytes (up to MAX_INT64).
+ * On failure, appropriate error code.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline int64_t VMCIQueueHeader_BufReady(const struct vmci_queue_header *consumeQHeader, // IN:
+ const struct vmci_queue_header *produceQHeader, // IN:
+ const uint64_t consumeQSize) // IN:
+{
+ int64_t freeSpace;
+
+ freeSpace = VMCIQueueHeader_FreeSpace(consumeQHeader,
+ produceQHeader, consumeQSize);
+ if (freeSpace < VMCI_SUCCESS) {
+ return freeSpace;
+ } else {
+ return consumeQSize - freeSpace - 1;
+ }
+}
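The two routines are complementary: for any valid pointer pair, free space plus ready bytes equals the queue size minus the reserved byte. A self-contained sketch of that invariant (same hypothetical helpers as above):

#include <assert.h>
#include <stdint.h>

static int64_t qp_free_space(uint64_t tail, uint64_t head, uint64_t size)
{
	return tail >= head ? size - (tail - head) - 1 : head - tail - 1;
}

static int64_t qp_buf_ready(uint64_t tail, uint64_t head, uint64_t size)
{
	return size - qp_free_space(tail, head, size) - 1;
}

int main(void)
{
	/* 5 bytes queued in a 16 byte queue: 10 free + 5 ready = 15. */
	assert(qp_free_space(5, 0, 16) == 10);
	assert(qp_buf_ready(5, 0, 16) == 5);
	return 0;
}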
+
+#endif /* _VMCI_DEF_H_ */
diff --git a/drivers/misc/vmw_vmci/vmci_handle_array.h b/drivers/misc/vmw_vmci/vmci_handle_array.h
new file mode 100644
index 0000000..eb237e3
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_handle_array.h
@@ -0,0 +1,339 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_HANDLE_ARRAY_H_
+#define _VMCI_HANDLE_ARRAY_H_
+
+#include <linux/slab.h>
+
+#include "vmci_defs.h"
+#include "vmci_kernel_if.h"
+
+#define VMCI_HANDLE_ARRAY_DEFAULT_SIZE 4
+#define VMCI_ARR_CAP_MULT 2 /* Array capacity multiplier */
+
+struct vmci_handle_arr {
+ uint32_t capacity;
+ uint32_t size;
+ struct vmci_handle entries[1];
+};
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_Create --
+ *
+ * Results:
+ * Array if successful, NULL if not.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline struct vmci_handle_arr *VMCIHandleArray_Create(uint32_t capacity)
+{
+ struct vmci_handle_arr *array;
+
+ if (capacity == 0)
+ capacity = VMCI_HANDLE_ARRAY_DEFAULT_SIZE;
+
+	array = kmalloc(sizeof array->capacity + sizeof array->size +
+			capacity * sizeof(struct vmci_handle), GFP_ATOMIC);
+ if (array == NULL)
+ return NULL;
+
+ array->capacity = capacity;
+ array->size = 0;
+
+ return array;
+}
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_Destroy --
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline void VMCIHandleArray_Destroy(struct vmci_handle_arr *array)
+{
+ kfree(array);
+}
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_AppendEntry --
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * Array may be reallocated.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline void
+VMCIHandleArray_AppendEntry(struct vmci_handle_arr **arrayPtr,
+ struct vmci_handle handle)
+{
+ struct vmci_handle_arr *array;
+
+ ASSERT(arrayPtr && *arrayPtr);
+ array = *arrayPtr;
+
+	if (unlikely(array->size >= array->capacity)) {
+		/* Reallocate with VMCI_ARR_CAP_MULT times the capacity. */
+		uint32_t oldSize =
+		    sizeof array->capacity + sizeof array->size +
+		    array->capacity * sizeof(struct vmci_handle);
+		uint32_t newSize =
+		    sizeof array->capacity + sizeof array->size +
+		    array->capacity * VMCI_ARR_CAP_MULT *
+		    sizeof(struct vmci_handle);
+		struct vmci_handle_arr *newArray =
+		    kmalloc(newSize, GFP_ATOMIC);
+
+		if (newArray == NULL)
+			return;
+
+		/*
+		 * Copy only the old contents; copying newSize bytes
+		 * would read past the end of the old allocation.
+		 */
+		memcpy(newArray, array, oldSize);
+		newArray->capacity *= VMCI_ARR_CAP_MULT;
+		kfree(array);
+		*arrayPtr = newArray;
+		array = newArray;
+	}
+ array->entries[array->size] = handle;
+ array->size++;
+}
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_RemoveEntry --
+ *
+ * Results:
+ * Handle that was removed, VMCI_INVALID_HANDLE if entry not found.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline struct vmci_handle
+VMCIHandleArray_RemoveEntry(struct vmci_handle_arr *array,
+ struct vmci_handle entryHandle)
+{
+ uint32_t i;
+ struct vmci_handle handle = VMCI_INVALID_HANDLE;
+
+ ASSERT(array);
+ for (i = 0; i < array->size; i++) {
+ if (VMCI_HANDLE_EQUAL(array->entries[i], entryHandle)) {
+ handle = array->entries[i];
+ array->size--;
+ array->entries[i] = array->entries[array->size];
+ array->entries[array->size] = VMCI_INVALID_HANDLE;
+ break;
+ }
+ }
+
+ return handle;
+}
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_RemoveTail --
+ *
+ * Results:
+ * Handle that was removed, VMCI_INVALID_HANDLE if array was empty.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline struct vmci_handle
+VMCIHandleArray_RemoveTail(struct vmci_handle_arr *array)
+{
+ struct vmci_handle handle = VMCI_INVALID_HANDLE;
+
+ if (array->size) {
+ array->size--;
+ handle = array->entries[array->size];
+ array->entries[array->size] = VMCI_INVALID_HANDLE;
+ }
+
+ return handle;
+}
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_GetEntry --
+ *
+ * Results:
+ * Handle at given index, VMCI_INVALID_HANDLE if invalid index.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline struct vmci_handle
+VMCIHandleArray_GetEntry(const struct vmci_handle_arr *array, uint32_t index)
+{
+ ASSERT(array);
+
+ if (unlikely(index >= array->size))
+ return VMCI_INVALID_HANDLE;
+
+ return array->entries[index];
+}
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_GetSize --
+ *
+ * Results:
+ * Number of entries in array.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline uint32_t
+VMCIHandleArray_GetSize(const struct vmci_handle_arr *array)
+{
+ ASSERT(array);
+ return array->size;
+}
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_HasEntry --
+ *
+ * Results:
+ *      true if entry exists in array, false if not.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline bool
+VMCIHandleArray_HasEntry(const struct vmci_handle_arr *array,
+ struct vmci_handle entryHandle)
+{
+ uint32_t i;
+
+ ASSERT(array);
+ for (i = 0; i < array->size; i++)
+ if (VMCI_HANDLE_EQUAL(array->entries[i], entryHandle))
+ return true;
+
+ return false;
+}
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_GetCopy --
+ *
+ * Results:
+ *      Returns a pointer to a copy of the array on success, or NULL if
+ *      memory allocation fails.
+ *
+ * Side effects:
+ * Allocates nonpaged memory.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline struct vmci_handle_arr *
+VMCIHandleArray_GetCopy(const struct vmci_handle_arr *array)
+{
+ struct vmci_handle_arr *arrayCopy;
+
+ ASSERT(array);
+
+	arrayCopy = kmalloc(sizeof array->capacity + sizeof array->size +
+			    array->size * sizeof(struct vmci_handle),
+			    GFP_ATOMIC);
+ if (arrayCopy != NULL) {
+ memcpy(&arrayCopy->size, &array->size,
+ sizeof array->size +
+ array->size * sizeof(struct vmci_handle));
+ arrayCopy->capacity = array->size;
+ }
+
+ return arrayCopy;
+}
+
+/*
+ *-----------------------------------------------------------------------------------
+ *
+ * VMCIHandleArray_GetHandles --
+ *
+ * Results:
+ * NULL if the array is empty. Otherwise, a pointer to the array
+ * of VMCI handles in the handle array.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------------
+ */
+
+static inline struct vmci_handle *VMCIHandleArray_GetHandles(struct vmci_handle_arr *array) // IN
+{
+ ASSERT(array);
+
+ if (array->size)
+ return array->entries;
+
+ return NULL;
+}
+
+#endif // _VMCI_HANDLE_ARRAY_H_
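Taken together, the header amounts to a small open-coded dynamic vector. A hypothetical kernel-side usage sketch (the handle values are made up, the error constant is assumed from vmci_defs.h, and the field names follow their usage elsewhere in this series):

static int example_track_handle(void)
{
	struct vmci_handle_arr *arr;
	struct vmci_handle h = { .context = 1, .resource = 7 };

	arr = VMCIHandleArray_Create(0);	/* 0 selects the default capacity */
	if (arr == NULL)
		return VMCI_ERROR_NO_MEM;

	/* Append may reallocate the array, hence the double pointer. */
	VMCIHandleArray_AppendEntry(&arr, h);

	if (VMCIHandleArray_HasEntry(arr, h))
		(void)VMCIHandleArray_RemoveEntry(arr, h);

	VMCIHandleArray_Destroy(arr);
	return VMCI_SUCCESS;
}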
diff --git a/drivers/misc/vmw_vmci/vmci_infrastructure.h b/drivers/misc/vmw_vmci/vmci_infrastructure.h
new file mode 100644
index 0000000..04a9ba6
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_infrastructure.h
@@ -0,0 +1,119 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_INFRASTRUCTURE_H_
+#define _VMCI_INFRASTRUCTURE_H_
+
+#include "vmci_defs.h"
+
+typedef enum {
+ VMCIOBJ_VMX_VM = 10,
+ VMCIOBJ_CONTEXT,
+ VMCIOBJ_SOCKET,
+ VMCIOBJ_NOT_SET,
+} VMCIObjType;
+
+/* For storing VMCI structures in file handles. */
+typedef struct VMCIObj {
+ void *ptr;
+ VMCIObjType type;
+} VMCIObj;
+
+/* Guestcalls currently support a maximum of 8 uint64_t arguments. */
+#define VMCI_GUESTCALL_MAX_ARGS_SIZE 64
+
+/*
+ * Structure used for checkpointing the doorbell mappings. It is
+ * written to the checkpoint as is, so changing this structure will
+ * break checkpoint compatibility.
+ */
+struct dbell_cpt_state {
+ struct vmci_handle handle;
+ uint64_t bitmapIdx;
+};
+
+/* Used to determine what checkpoint state to get and set. */
+#define VMCI_NOTIFICATION_CPT_STATE 0x1
+#define VMCI_WELLKNOWN_CPT_STATE 0x2
+#define VMCI_DG_OUT_STATE 0x3
+#define VMCI_DG_IN_STATE 0x4
+#define VMCI_DG_IN_SIZE_STATE 0x5
+#define VMCI_DOORBELL_CPT_STATE 0x6
+
+/*
+ *-------------------------------------------------------------------------
+ *
+ * VMCI_Hash --
+ *
+ * Hash function used by the Simple Datagram API. Based on the djb2
+ * hash function by Dan Bernstein.
+ *
+ * Result:
+ *      The hash bucket index for the given handle, in [0, size - 1];
+ *      size must be a power of two.
+ *
+ * Side effects:
+ * None.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+static inline int VMCI_Hash(struct vmci_handle handle, unsigned size)
+{
+ unsigned i;
+ int hash = 5381;
+ const uint64_t handleValue = QWORD(handle.resource, handle.context);
+
+ for (i = 0; i < sizeof handle; i++)
+ hash =
+ ((hash << 5) + hash) + (uint8_t) (handleValue >> (i * 8));
+
+ return hash & (size - 1);
+}
+
+/*
+ *-------------------------------------------------------------------------
+ *
+ * VMCI_HashId --
+ *
+ *    Hash function used by the Simple Datagram API. Hashes only a VMCI
+ *    id (not the full VMCI handle). Based on the djb2 hash function by
+ *    Dan Bernstein.
+ *
+ * Result:
+ *    The hash bucket index for the given id, in [0, size - 1]; size
+ *    must be a power of two.
+ *
+ * Side effects:
+ * None.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+static inline int VMCI_HashId(uint32_t id, unsigned size)
+{
+ unsigned i;
+ int hash = 5381;
+
+ for (i = 0; i < sizeof id; i++)
+ hash = ((hash << 5) + hash) + (uint8_t) (id >> (i * 8));
+
+ return hash & (size - 1);
+}
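Both routines mask with size - 1, so they only distribute correctly when size is a power of two. A standalone userspace sketch of the same djb2 loop over a 32-bit id:

#include <assert.h>
#include <stdint.h>

static int djb2_id(uint32_t id, unsigned size)
{
	int hash = 5381;
	unsigned i;

	for (i = 0; i < sizeof id; i++)
		hash = ((hash << 5) + hash) + (uint8_t)(id >> (i * 8));

	return hash & (size - 1);	/* size must be a power of two */
}

int main(void)
{
	/* The same id always lands in the same bucket. */
	assert(djb2_id(42, 64) == djb2_id(42, 64));
	/* The bucket index is always within range. */
	assert(djb2_id(0xdeadbeef, 64) < 64);
	return 0;
}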
+
+#endif // _VMCI_INFRASTRUCTURE_H_
diff --git a/drivers/misc/vmw_vmci/vmci_iocontrols.h b/drivers/misc/vmw_vmci/vmci_iocontrols.h
new file mode 100644
index 0000000..06f5776
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_iocontrols.h
@@ -0,0 +1,411 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _VMCI_IOCONTROLS_H_
+#define _VMCI_IOCONTROLS_H_
+
+#include "vmci_defs.h"
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIVA64ToPtr --
+ *
+ * Convert a VA64 to a pointer.
+ *
+ * Results:
+ * Virtual address.
+ *
+ * Side effects:
+ * None
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline void *VMCIVA64ToPtr(uint64_t va64) // IN
+{
+#ifdef CONFIG_X86_64
+ ASSERT_ON_COMPILE(sizeof(void *) == 8);
+#else
+ ASSERT_ON_COMPILE(sizeof(void *) == 4);
+ /* Check that nothing of value will be lost. */
+ ASSERT(!(va64 >> 32));
+#endif
+ return (void *)(uintptr_t) va64;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIPtrToVA64 --
+ *
+ * Convert a pointer to a uint64_t.
+ *
+ * Results:
+ * Virtual address.
+ *
+ * Side effects:
+ * None
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline uint64_t VMCIPtrToVA64(void const *ptr) // IN
+{
+ ASSERT_ON_COMPILE(sizeof ptr <= sizeof(uint64_t));
+ return (uint64_t) (uintptr_t) ptr;
+}
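A quick round-trip check of the two conversions (userspace sketch of the same casts):

#include <assert.h>
#include <stdint.h>

int main(void)
{
	int x = 42;
	uint64_t va64 = (uint64_t)(uintptr_t)&x;	/* as in VMCIPtrToVA64 */
	int *p = (int *)(uintptr_t)va64;		/* as in VMCIVA64ToPtr */

	assert(p == &x && *p == 42);
	return 0;
}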
+
+/*
+ * Driver version.
+ *
+ * Increment major version when you make an incompatible change.
+ * Compatibility goes both ways (old driver with new executable
+ * as well as new driver with old executable).
+ */
+#define VMCI_VERSION_SHIFT_WIDTH 16 /* Never change this. */
+#define VMCI_MAKE_VERSION(_major, _minor) ((_major) << \
+ VMCI_VERSION_SHIFT_WIDTH | \
+ (uint16_t) (_minor))
+#define VMCI_VERSION_MAJOR(v)  ((uint32_t) (v) >> VMCI_VERSION_SHIFT_WIDTH)
+#define VMCI_VERSION_MINOR(v) ((uint16_t) (v))
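As a standalone illustration (macros restated so the sketch compiles on its own, with the uint32_t spelling used by the rest of this file): VMCI_MAKE_VERSION(11, 0) packs to 0xb0000, and the extractors invert it:

#include <assert.h>
#include <stdint.h>

#define VMCI_VERSION_SHIFT_WIDTH 16
#define VMCI_MAKE_VERSION(_major, _minor) \
	((_major) << VMCI_VERSION_SHIFT_WIDTH | (uint16_t)(_minor))
#define VMCI_VERSION_MAJOR(v) ((uint32_t)(v) >> VMCI_VERSION_SHIFT_WIDTH)
#define VMCI_VERSION_MINOR(v) ((uint16_t)(v))

int main(void)
{
	uint32_t v = VMCI_MAKE_VERSION(11, 0);	/* 11 << 16 == 0xb0000 */

	assert(v == 0xb0000);
	assert(VMCI_VERSION_MAJOR(v) == 11);
	assert(VMCI_VERSION_MINOR(v) == 0);
	return 0;
}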
+
+/*
+ * VMCI_VERSION is always the current version. Subsequently listed
+ * versions are ways of detecting previous versions of the connecting
+ * application (i.e., VMX).
+ *
+ * VMCI_VERSION_NOVMVM: This version removed support for VM to VM
+ * communication.
+ *
+ * VMCI_VERSION_NOTIFY: This version introduced doorbell notification
+ * support.
+ *
+ * VMCI_VERSION_HOSTQP: This version introduced host end point support
+ * for hosted products.
+ *
+ * VMCI_VERSION_PREHOSTQP: This is the version prior to the adoption of
+ * support for host end-points.
+ *
+ * VMCI_VERSION_PREVERS2: This fictional version number is intended to
+ * represent the version of a VMX which doesn't call into the driver
+ * with ioctl VERSION2 and thus doesn't establish its version with the
+ * driver.
+ */
+
+#define VMCI_VERSION VMCI_VERSION_NOVMVM
+#define VMCI_VERSION_NOVMVM VMCI_MAKE_VERSION(11, 0)
+#define VMCI_VERSION_NOTIFY VMCI_MAKE_VERSION(10, 0)
+#define VMCI_VERSION_HOSTQP VMCI_MAKE_VERSION(9, 0)
+#define VMCI_VERSION_PREHOSTQP VMCI_MAKE_VERSION(8, 0)
+#define VMCI_VERSION_PREVERS2 VMCI_MAKE_VERSION(1, 0)
+
+/*
+ * Linux defines _IO* macros, but the core kernel code ignores the encoded
+ * ioctl value. It is up to individual drivers to decode the value (for
+ * example to look at the size of a structure to determine which version
+ * of a specific command should be used) or not (which is what we
+ * currently do, so right now the ioctl value for a given command is the
+ * command itself).
+ *
+ * Hence, we just define the IOCTL_VMCI_foo values directly, with no
+ * intermediate IOCTLCMD_ representation.
+ */
+# define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
+
+enum IOCTLCmd_VMCI {
+ /*
+ * We need to bracket the range of values used for ioctls, because x86_64
+ * Linux forces us to explicitly register ioctl handlers by value for
+ * handling 32 bit ioctl syscalls. Hence FIRST and LAST. Pick something
+ * for FIRST that doesn't collide with vmmon (2001+).
+ */
+ IOCTLCMD(FIRST) = 1951,
+ IOCTLCMD(VERSION) = IOCTLCMD(FIRST),
+
+ /* BEGIN VMCI */
+ IOCTLCMD(INIT_CONTEXT),
+
+ /*
+ * The following two were used for process and datagram process creation.
+ * They are not used anymore and reserved for future use.
+ * They will fail if issued.
+ */
+ IOCTLCMD(RESERVED1),
+ IOCTLCMD(RESERVED2),
+
+ /*
+ * The following used to be for shared memory. It is now unused and is
+ * reserved for future use. It will fail if issued.
+ */
+ IOCTLCMD(RESERVED3),
+
+ /*
+	 * The following three also used to be for shared memory. An
+ * old WS6 user-mode client might try to use them with the new
+ * driver, but since we ensure that only contexts created by VMX'en
+ * of the appropriate version (VMCI_VERSION_NOTIFY or
+ * VMCI_VERSION_NEWQP) or higher use these ioctl, everything is
+ * fine.
+ */
+ IOCTLCMD(QUEUEPAIR_SETVA),
+ IOCTLCMD(NOTIFY_RESOURCE),
+ IOCTLCMD(NOTIFICATIONS_RECEIVE),
+ IOCTLCMD(VERSION2),
+ IOCTLCMD(QUEUEPAIR_ALLOC),
+ IOCTLCMD(QUEUEPAIR_SETPAGEFILE),
+ IOCTLCMD(QUEUEPAIR_DETACH),
+ IOCTLCMD(DATAGRAM_SEND),
+ IOCTLCMD(DATAGRAM_RECEIVE),
+ IOCTLCMD(DATAGRAM_REQUEST_MAP),
+ IOCTLCMD(DATAGRAM_REMOVE_MAP),
+ IOCTLCMD(CTX_ADD_NOTIFICATION),
+ IOCTLCMD(CTX_REMOVE_NOTIFICATION),
+ IOCTLCMD(CTX_GET_CPT_STATE),
+ IOCTLCMD(CTX_SET_CPT_STATE),
+ IOCTLCMD(GET_CONTEXT_ID),
+ /* END VMCI */
+
+ /*
+ * BEGIN VMCI SOCKETS
+ * XXX: NEEDED?
+ *
+ * We mark the end of the vmci commands and the start of the vmci sockets
+ * commands since they are used in separate modules on Linux.
+	 */
+ IOCTLCMD(LAST),
+ IOCTLCMD(SOCKETS_FIRST) = IOCTLCMD(LAST),
+
+ /*
+ * This used to be for accept() on Windows and Mac OS, which is now
+ * redundant (since we now use real handles). It is used instead for
+ * getting the version. This value is now public, so it cannot change.
+ */
+ IOCTLCMD(SOCKETS_VERSION) = IOCTLCMD(SOCKETS_FIRST),
+ IOCTLCMD(SOCKETS_BIND),
+
+ /*
+ * This used to be for close() on Windows and Mac OS, but is no longer
+ * used for the same reason as accept() above. It is used instead for
+ * sending private symbols to the Mac OS driver.
+ */
+ IOCTLCMD(SOCKETS_SET_SYMBOLS),
+ IOCTLCMD(SOCKETS_CONNECT),
+
+ /*
+ * The next two values are public (vmci_sockets.h) and cannot be changed.
+ * That means the number of values above these cannot be changed either
+ * unless the base index (specified below) is updated accordingly.
+ */
+ IOCTLCMD(SOCKETS_GET_AF_VALUE),
+ IOCTLCMD(SOCKETS_GET_LOCAL_CID),
+ IOCTLCMD(SOCKETS_GET_SOCK_NAME),
+ IOCTLCMD(SOCKETS_GET_SOCK_OPT),
+ IOCTLCMD(SOCKETS_GET_VM_BY_NAME),
+ IOCTLCMD(SOCKETS_IOCTL),
+ IOCTLCMD(SOCKETS_LISTEN),
+ IOCTLCMD(SOCKETS_RECV),
+ IOCTLCMD(SOCKETS_RECV_FROM),
+ IOCTLCMD(SOCKETS_SELECT),
+ IOCTLCMD(SOCKETS_SEND),
+ IOCTLCMD(SOCKETS_SEND_TO),
+ IOCTLCMD(SOCKETS_SET_SOCK_OPT),
+ IOCTLCMD(SOCKETS_SHUTDOWN),
+ IOCTLCMD(SOCKETS_SOCKET), /* 1990 on Linux. */
+ /* END VMCI SOCKETS */
+
+ /*
+ * We reserve a range of 4 ioctls for VMCI Sockets to grow. We cannot
+ * reserve many ioctls here since we are close to overlapping with vmmon
+ * ioctls. Define a meta-ioctl if running out of this binary space.
+ *
+ * Must be last.
+ */
+ IOCTLCMD(SOCKETS_LAST) = IOCTLCMD(SOCKETS_SOCKET) + 4, /* 1994 on Linux. */
+
+ /*
+ * The VSockets ioctls occupy the block above. We define a new range of
+ * VMCI ioctls to maintain binary compatibility between the user land and
+ * the kernel driver. Careful, vmmon ioctls start from 2001, so this means
+ * we can add only 4 new VMCI ioctls. Define a meta-ioctl if running out of
+ * this binary space.
+ */
+
+ IOCTLCMD(FIRST2),
+ IOCTLCMD(SET_NOTIFY) = IOCTLCMD(FIRST2), /* 1995 on Linux. */
+ IOCTLCMD(LAST2),
+};
+
+/* Clean up helper macros */
+#undef IOCTLCMD
+
+/*
+ * VMCI driver initialization. This block can also be used to
+ * pass initial group membership etc.
+ */
+struct vmci_init_blk {
+ uint32_t cid;
+ uint32_t flags;
+};
+
+/* VMCIQueuePairAllocInfo_VMToVM */
+struct vmci_qp_ai_vmvm {
+ struct vmci_handle handle;
+ uint32_t peer;
+ uint32_t flags;
+ uint64_t produceSize;
+ uint64_t consumeSize;
+ uint64_t producePageFile; /* User VA. */
+ uint64_t consumePageFile; /* User VA. */
+ uint64_t producePageFileSize; /* Size of the file name array. */
+ uint64_t consumePageFileSize; /* Size of the file name array. */
+ int32_t result;
+ uint32_t _pad;
+};
+
+/* VMCIQueuePairAllocInfo */
+struct vmci_qp_alloc_info {
+ struct vmci_handle handle;
+ uint32_t peer;
+ uint32_t flags;
+ uint64_t produceSize;
+ uint64_t consumeSize;
+ uint64_t ppnVA; /* Start VA of queue pair PPNs. */
+ uint64_t numPPNs;
+ int32_t result;
+ uint32_t version;
+};
+
+/* VMCIQueuePairSetVAInfo */
+struct vmci_qp_set_va_info {
+ struct vmci_handle handle;
+ uint64_t va; /* Start VA of queue pair PPNs. */
+ uint64_t numPPNs;
+ uint32_t version;
+ int32_t result;
+};
+
+/*
+ * For backwards compatibility, here is a version of the
+ * VMCIQueuePairPageFileInfo before host support end-points was added.
+ * Note that the current version of that structure requires VMX to
+ * pass down the VA of the mapped file. Before host support was added
+ * there was nothing of the sort. So, when the driver sees the ioctl
+ * with a parameter that is the sizeof
+ * VMCIQueuePairPageFileInfo_NoHostQP then it can infer that the version
+ * of VMX running can't attach to host end points because it doesn't
+ * provide the VA of the mapped files.
+ *
+ * The Linux driver doesn't get an indication of the size of the
+ * structure passed down from user space. So, to fix a long-standing
+ * but unfiled bug, the _pad field has been renamed to version.
+ * Existing versions of VMX always initialize the PageFileInfo
+ * structure so that _pad, er, version is set to 0.
+ *
+ * A version value of 1 indicates that the size of the structure has
+ * been increased to include two UVA's: produceUVA and consumeUVA.
+ * These UVA's are of the mmap()'d queue contents backing files.
+ *
+ * In addition, if when VMX is sending down the
+ * VMCIQueuePairPageFileInfo structure it gets an error then it will
+ * try again with the _NoHostQP version of the file to see if an older
+ * VMCI kernel module is running.
+ */
+/* VMCIQueuePairPageFileInfo */
+struct vmci_qp_page_file_info {
+ struct vmci_handle handle;
+ uint64_t producePageFile; /* User VA. */
+ uint64_t consumePageFile; /* User VA. */
+ uint64_t producePageFileSize; /* Size of the file name array. */
+ uint64_t consumePageFileSize; /* Size of the file name array. */
+ int32_t result;
+ uint32_t version; /* Was _pad. */
+ uint64_t produceVA; /* User VA of the mapped file. */
+ uint64_t consumeVA; /* User VA of the mapped file. */
+};
+
+/* VMCIQueuePairDetachInfo */
+struct vmci_qp_dtch_info {
+ struct vmci_handle handle;
+ int32_t result;
+ uint32_t _pad;
+};
+
+/* VMCIDatagramSendRecvInfo */
+struct vmci_dg_snd_rcv_info {
+ uint64_t addr;
+ uint32_t len;
+ int32_t result;
+};
+
+/* VMCINotifyAddRemoveInfo: Used to add/remove remote context notifications. */
+struct vmci_notify_add_rm_info {
+ uint32_t remoteCID;
+ int result;
+};
+
+/* VMCICptBufInfo: Used to set/get current context's checkpoint state. */
+struct vmci_chkpt_buf_info {
+ uint64_t cptBuf;
+ uint32_t cptType;
+ uint32_t bufSize;
+ int32_t result;
+ uint32_t _pad;
+};
+
+/* VMCISetNotifyInfo: Used to pass notify flag's address to the host driver. */
+struct vmci_set_notify_info {
+ uint64_t notifyUVA;
+ int32_t result;
+ uint32_t _pad;
+};
+
+#define VMCI_NOTIFY_RESOURCE_QUEUE_PAIR 0
+#define VMCI_NOTIFY_RESOURCE_DOOR_BELL 1
+
+#define VMCI_NOTIFY_RESOURCE_ACTION_NOTIFY 0
+#define VMCI_NOTIFY_RESOURCE_ACTION_CREATE 1
+#define VMCI_NOTIFY_RESOURCE_ACTION_DESTROY 2
+
+/*
+ * VMCINotifyResourceInfo: Used to create and destroy doorbells, and
+ * generate a notification for a doorbell or queue pair.
+ */
+struct vmci_notify_rsrc_info {
+ struct vmci_handle handle;
+ uint16_t resource;
+ uint16_t action;
+ int32_t result;
+};
+
+/*
+ * VMCINotificationReceiveInfo: Used to receive pending notifications
+ * for doorbells and queue pairs.
+ */
+struct vmci_notify_recv_info {
+ uint64_t dbHandleBufUVA;
+ uint64_t dbHandleBufSize;
+ uint64_t qpHandleBufUVA;
+ uint64_t qpHandleBufSize;
+ int32_t result;
+ uint32_t _pad;
+};
+
+#endif // ifndef _VMCI_IOCONTROLS_H_
diff --git a/drivers/misc/vmw_vmci/vmci_kernel_if.h b/drivers/misc/vmw_vmci/vmci_kernel_if.h
new file mode 100644
index 0000000..9b4e114
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_kernel_if.h
@@ -0,0 +1,111 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/*
+ * This file defines helper functions for VMCI host _and_ guest
+ * kernel code.
+ */
+
+#ifndef _VMCI_KERNEL_IF_H_
+#define _VMCI_KERNEL_IF_H_
+
+#include <linux/kernel.h>
+#include <linux/wait.h>
+
+#include "vmci_defs.h"
+
+/* Callback needed for correctly waiting on events. */
+typedef int (*VMCIEventReleaseCB) (void *clientData);
+
+/* Host specific struct used for signalling */
+struct vmci_host {
+ wait_queue_head_t waitQueue;
+};
+
+/* Guest device port I/O. */
+bool VMCIHost_WaitForCallLocked(struct vmci_host *hostContext,
+ spinlock_t * lock,
+ unsigned long *flags, bool useBH);
+
+bool VMCIWellKnownID_AllowMap(uint32_t wellKnownID, uint32_t privFlags);
+
+int VMCIHost_CompareUser(uid_t * user1, uid_t * user2);
+
+void VMCI_WaitOnEvent(wait_queue_head_t * event,
+ VMCIEventReleaseCB releaseCB, void *clientData);
+
+bool VMCI_WaitOnEventInterruptible(wait_queue_head_t * event,
+ VMCIEventReleaseCB releaseCB,
+ void *clientData);
+
+typedef void (VMCIWorkFn) (void *data);
+int VMCI_ScheduleDelayedWork(VMCIWorkFn * workFn, void *data);
+
+void *VMCI_AllocQueue(uint64_t size);
+void VMCI_FreeQueue(void *q, uint64_t size);
+struct PPNSet {
+ uint64_t numProducePages;
+ uint64_t numConsumePages;
+ uint32_t *producePPNs;
+ uint32_t *consumePPNs;
+ bool initialized;
+};
+int VMCI_AllocPPNSet(void *produceQ, uint64_t numProducePages,
+ void *consumeQ, uint64_t numConsumePages,
+ struct PPNSet *ppnSet);
+void VMCI_FreePPNSet(struct PPNSet *ppnSet);
+int VMCI_PopulatePPNList(uint8_t * callBuf, const struct PPNSet *ppnSet);
+
+struct vmci_queue;
+
+struct PageStoreAttachInfo;
+struct vmci_queue *VMCIHost_AllocQueue(uint64_t queueSize);
+void VMCIHost_FreeQueue(struct vmci_queue *queue, uint64_t queueSize);
+
+#define INVALID_VMCI_GUEST_MEM_ID 0
+
+struct QueuePairPageStore;
+int VMCIHost_RegisterUserMemory(struct QueuePairPageStore *pageStore,
+ struct vmci_queue *produceQ,
+ struct vmci_queue *consumeQ);
+void VMCIHost_UnregisterUserMemory(struct vmci_queue *produceQ,
+ struct vmci_queue *consumeQ);
+int VMCIHost_MapQueueHeaders(struct vmci_queue *produceQ,
+ struct vmci_queue *consumeQ);
+int VMCIHost_UnmapQueueHeaders(uint32_t gid,
+ struct vmci_queue *produceQ,
+ struct vmci_queue *consumeQ);
+void VMCI_InitQueueMutex(struct vmci_queue *produceQ,
+ struct vmci_queue *consumeQ);
+void VMCI_CleanupQueueMutex(struct vmci_queue *produceQ,
+ struct vmci_queue *consumeQ);
+void VMCI_AcquireQueueMutex(struct vmci_queue *queue);
+void VMCI_ReleaseQueueMutex(struct vmci_queue *queue);
+
+int VMCIHost_GetUserMemory(uint64_t produceUVA, uint64_t consumeUVA,
+ struct vmci_queue *produceQ,
+ struct vmci_queue *consumeQ);
+void VMCIHost_ReleaseUserMemory(struct vmci_queue *produceQ,
+ struct vmci_queue *consumeQ);
+
+bool VMCI_GuestPersonalityActive(void);
+bool VMCI_HostPersonalityActive(void);
+
+#endif // _VMCI_KERNEL_IF_H_
--
1.7.0.4
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 13/14] Add main driver and kernel interface file
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (11 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 12/14] Add misc header files used by VMCI Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-15 1:05 ` [PATCH 14/14] Add Kconfig and Makefiles for VMCI Andrew Stiegmann (stieg)
2012-02-17 19:28 ` [PATCH 00/14] RFC: VMCI for Linux Pavel Machek
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/vmw_vmci/driver.c | 2352 ++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmciKernelIf.c | 1351 +++++++++++++++++++
2 files changed, 3703 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/driver.c
create mode 100644 drivers/misc/vmw_vmci/vmciKernelIf.c
diff --git a/drivers/misc/vmw_vmci/driver.c b/drivers/misc/vmw_vmci/driver.c
new file mode 100644
index 0000000..ea9dc90
--- /dev/null
+++ b/drivers/misc/vmw_vmci/driver.c
@@ -0,0 +1,2352 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <asm/atomic.h>
+#include <asm/io.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/highmem.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+#include <linux/version.h>
+
+#include "vmci_defs.h"
+#include "vmci_handle_array.h"
+#include "vmci_infrastructure.h"
+#include "vmci_iocontrols.h"
+#include "vmci_kernel_if.h"
+#include "vmciCommonInt.h"
+#include "vmciContext.h"
+#include "vmciDatagram.h"
+#include "vmciDoorbell.h"
+#include "vmciDriver.h"
+#include "vmciEvent.h"
+#include "vmciKernelAPI.h"
+#include "vmciQueuePair.h"
+#include "vmciResource.h"
+
+#define LGPFX "VMCI: "
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * PCI Device interface --
+ *
+ * Declarations of types and functions related to the VMCI PCI
+ * device personality.
+ *
+ *
+ *----------------------------------------------------------------------
+ */
+
+/*
+ * VMCI PCI driver state
+ */
+
+struct vmci_device {
+ struct mutex lock;
+
+ unsigned int ioaddr;
+ unsigned int ioaddr_size;
+ unsigned int irq;
+ unsigned int intr_type;
+ bool exclusive_vectors;
+ struct msix_entry msix_entries[VMCI_MAX_INTRS];
+
+ bool enabled;
+ spinlock_t dev_spinlock;
+ atomic_t datagrams_allowed;
+};
+
+static const struct pci_device_id vmci_ids[] = {
+ {PCI_DEVICE(PCI_VENDOR_ID_VMWARE, PCI_DEVICE_ID_VMWARE_VMCI),},
+ {0},
+};
+
+static int vmci_probe_device(struct pci_dev *pdev,
+ const struct pci_device_id *id);
+
+static void vmci_remove_device(struct pci_dev *pdev);
+
+static struct pci_driver vmci_driver = {
+ .name = "vmci",
+ .id_table = vmci_ids,
+ .probe = vmci_probe_device,
+ .remove = __devexit_p(vmci_remove_device),
+};
+
+static struct vmci_device vmci_dev;
+static int vmci_disable_host = 0;
+static int vmci_disable_guest = 0;
+static int vmci_disable_msi;
+static int vmci_disable_msix = 0;
+
+/*
+ * Allocate a buffer for incoming datagrams globally to avoid repeated
+ * allocation in the interrupt handler's atomic context.
+ */
+
+static uint8_t *data_buffer = NULL;
+static uint32_t data_buffer_size = VMCI_MAX_DG_SIZE;
+
+/*
+ * If the VMCI hardware supports the notification bitmap, we allocate
+ * and register a page with the device.
+ */
+
+static uint8_t *notification_bitmap = NULL;
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * Host device node interface --
+ *
+ * Implements VMCI by implementing open/close/ioctl functions
+ *
+ *
+ *----------------------------------------------------------------------
+ */
+
+/*
+ * Per-instance host state
+ */
+struct vmci_linux {
+ struct vmci_context *context;
+ int userVersion;
+ VMCIObjType ctType;
+ struct mutex lock;
+};
+
+/*
+ * Static driver state.
+ */
+struct vmci_linux_state {
+ int major;
+ int minor;
+ struct miscdevice misc;
+ char deviceName[32];
+ char buf[1024];
+ atomic_t activeContexts;
+};
+
+static struct vmci_linux_state linuxState;
+
+static int VMCISetupNotify(struct vmci_context *context, uintptr_t notifyUVA);
+
+static void VMCIUnsetNotifyInt(struct vmci_context *context, bool useLock);
+
+static int LinuxDriver_Open(struct inode *inode, struct file *filp);
+
+static long LinuxDriver_UnlockedIoctl(struct file *filp,
+ u_int iocmd, unsigned long ioarg);
+
+static int LinuxDriver_Close(struct inode *inode, struct file *filp);
+
+static unsigned int LinuxDriverPoll(struct file *file, poll_table * wait);
+
+static struct file_operations vmuser_fops;
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * Shared VMCI device definitions --
+ *
+ * Types and variables shared by both host and guest personality
+ *
+ *
+ *----------------------------------------------------------------------
+ */
+
+static bool guestDeviceInit;
+static atomic_t guestDeviceActive;
+static bool hostDeviceInit;
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * vmci_host_init --
+ *
+ * Initializes the VMCI host device driver.
+ *
+ * Results:
+ * 0 on success, other error codes on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int vmci_host_init(void)
+{
+ int retval;
+
+ if (VMCI_HostInit() < VMCI_SUCCESS) {
+ return -ENOMEM;
+ }
+
+ /*
+ * Initialize the file_operations structure. Because this code is always
+	 * compiled as a module, it is fine to do this here and not in a static
+ * initializer.
+ */
+
+ memset(&vmuser_fops, 0, sizeof vmuser_fops);
+ vmuser_fops.owner = THIS_MODULE;
+ vmuser_fops.poll = LinuxDriverPoll;
+ vmuser_fops.unlocked_ioctl = LinuxDriver_UnlockedIoctl;
+ vmuser_fops.compat_ioctl = LinuxDriver_UnlockedIoctl;
+ vmuser_fops.open = LinuxDriver_Open;
+ vmuser_fops.release = LinuxDriver_Close;
+
+ sprintf(linuxState.deviceName, "vmci");
+ linuxState.major = 10;
+ linuxState.misc.minor = MISC_DYNAMIC_MINOR;
+ linuxState.misc.name = linuxState.deviceName;
+ linuxState.misc.fops = &vmuser_fops;
+ atomic_set(&linuxState.activeContexts, 0);
+
+ retval = misc_register(&linuxState.misc);
+
+ if (retval) {
+		printk(KERN_WARNING LGPFX "Module registration error "
+		       "(name=%s,major=%d,minor=%d,err=%d).\n",
+		       linuxState.deviceName, linuxState.major,
+		       linuxState.minor, retval);
+ VMCI_HostCleanup();
+ } else {
+ linuxState.minor = linuxState.misc.minor;
+ printk(KERN_INFO LGPFX
+ "Module registered (name=%s,major=%d,"
+ "minor=%d).\n", linuxState.deviceName,
+ linuxState.major, linuxState.minor);
+ }
+
+ return retval;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * LinuxDriver_Open --
+ *
+ * Called on open of /dev/vmci.
+ *
+ * Side effects:
+ * Increment use count used to determine eventual deallocation of
+ * the module
+ *
+ *----------------------------------------------------------------------
+ */
+
+static int LinuxDriver_Open(struct inode *inode, // IN
+ struct file *filp) // IN
+{
+ struct vmci_linux *vmciLinux;
+
+ vmciLinux = kmalloc(sizeof(struct vmci_linux), GFP_KERNEL);
+ if (vmciLinux == NULL) {
+ return -ENOMEM;
+ }
+ memset(vmciLinux, 0, sizeof *vmciLinux); /* XXX: Necessary? */
+ vmciLinux->ctType = VMCIOBJ_NOT_SET;
+ vmciLinux->userVersion = 0; /* XXX: Not necessary w/ memset */
+ mutex_init(&vmciLinux->lock);
+ filp->private_data = vmciLinux;
+
+ return 0;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * LinuxDriver_Close --
+ *
+ * Called on close of /dev/vmci, most often when the process
+ * exits.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static int LinuxDriver_Close(struct inode *inode, // IN
+ struct file *filp) // IN
+{
+ struct vmci_linux *vmciLinux;
+
+ vmciLinux = (struct vmci_linux *)filp->private_data;
+ ASSERT(vmciLinux);
+
+ if (vmciLinux->ctType == VMCIOBJ_CONTEXT) {
+ ASSERT(vmciLinux->context);
+
+ VMCIContext_ReleaseContext(vmciLinux->context);
+ vmciLinux->context = NULL;
+
+ /*
+ * The number of active contexts is used to track whether any
+ * VMX'en are using the host personality. It is incremented when
+ * a context is created through the IOCTL_VMCI_INIT_CONTEXT
+ * ioctl.
+ */
+
+ atomic_dec(&linuxState.activeContexts);
+ }
+ vmciLinux->ctType = VMCIOBJ_NOT_SET;
+
+ kfree(vmciLinux);
+ filp->private_data = NULL;
+ return 0;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * LinuxDriverPoll --
+ *
+ * This is used to wake up the VMX when a VMCI call arrives, or
+ * to wake up select() or poll() at the next clock tick.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static unsigned int LinuxDriverPoll(struct file *filp, poll_table * wait)
+{
+ struct vmci_linux *vmciLinux = (struct vmci_linux *)filp->private_data;
+ unsigned int mask = 0;
+
+ if (vmciLinux->ctType == VMCIOBJ_CONTEXT) {
+ ASSERT(vmciLinux->context != NULL);
+ /*
+ * Check for VMCI calls to this VM context.
+ */
+
+ if (wait != NULL) {
+ poll_wait(filp,
+ &vmciLinux->context->hostContext.waitQueue,
+ wait);
+ }
+
+ spin_lock(&vmciLinux->context->lock);
+		if (vmciLinux->context->pendingDatagrams > 0 ||
+		    VMCIHandleArray_GetSize(
+			    vmciLinux->context->pendingDoorbellArray) > 0) {
+			mask = POLLIN;
+		}
+ spin_unlock(&vmciLinux->context->lock);
+ }
+ return mask;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCICopyHandleArrayToUser --
+ *
+ * Copies the handles of a handle array into a user buffer, and
+ * returns the new length in userBufferSize. If the copy to the
+ *      returns the new length in userBufSize. If the copy to the
+ *      user buffer fails, the function still returns VMCI_SUCCESS,
+ *
+ *----------------------------------------------------------------------
+ */
+
+static int VMCICopyHandleArrayToUser(void *userBufUVA, // IN
+ uint64_t * userBufSize, // IN/OUT
+ struct vmci_handle_arr *handleArray, // IN
+ int *retval) // IN
+{
+ uint32_t arraySize;
+ struct vmci_handle *handles;
+
+ if (handleArray) {
+ arraySize = VMCIHandleArray_GetSize(handleArray);
+ } else {
+ arraySize = 0;
+ }
+
+ if (arraySize * sizeof *handles > *userBufSize) {
+ return VMCI_ERROR_MORE_DATA;
+ }
+
+ *userBufSize = arraySize * sizeof *handles;
+ if (*userBufSize) {
+		*retval = copy_to_user(userBufUVA,
+				       VMCIHandleArray_GetHandles(handleArray),
+				       *userBufSize);
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIDoQPBrokerAlloc --
+ *
+ * Helper function for creating queue pair and copying the result
+ * to user memory.
+ *
+ * Results:
+ * 0 if result value was copied to user memory, -EFAULT otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int
+VMCIDoQPBrokerAlloc(struct vmci_handle handle,
+ uint32_t peer,
+ uint32_t flags,
+ uint64_t produceSize,
+ uint64_t consumeSize,
+ QueuePairPageStore * pageStore,
+ struct vmci_context *context, bool vmToVm, void *resultUVA)
+{
+ uint32_t cid;
+ int result;
+ int retval;
+
+ cid = VMCIContext_GetId(context);
+
+ result =
+ VMCIQPBroker_Alloc(handle, peer, flags,
+ VMCI_NO_PRIVILEGE_FLAGS, produceSize,
+ consumeSize, pageStore, context);
+ if (result == VMCI_SUCCESS && vmToVm) {
+ result = VMCI_SUCCESS_QUEUEPAIR_CREATE;
+ }
+ retval = copy_to_user(resultUVA, &result, sizeof result);
+ if (retval) {
+ retval = -EFAULT;
+ if (result >= VMCI_SUCCESS) {
+ result = VMCIQPBroker_Detach(handle, context);
+ ASSERT(result >= VMCI_SUCCESS);
+ }
+ }
+
+ return retval;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * LinuxDriver_UnlockedIoctl --
+ *
+ * Main path for UserRPC
+ *
+ * Results:
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static long
+LinuxDriver_UnlockedIoctl(struct file *filp, u_int iocmd, unsigned long ioarg)
+{
+ struct vmci_linux *vmciLinux = (struct vmci_linux *)filp->private_data;
+ int retval = 0;
+
+ switch (iocmd) {
+ case IOCTL_VMCI_VERSION2:{
+ int verFromUser;
+
+ if (copy_from_user
+ (&verFromUser, (void *)ioarg, sizeof verFromUser)) {
+ retval = -EFAULT;
+ break;
+ }
+
+ vmciLinux->userVersion = verFromUser;
+ }
+ /* Fall through. */
+ case IOCTL_VMCI_VERSION:
+ /*
+ * The basic logic here is:
+ *
+ * If the user sends in a version of 0 tell it our version.
+ * If the user didn't send in a version, tell it our version.
+ * If the user sent in an old version, tell it -its- version.
+		 * If the user sent in a newer version, tell it our version.
+ *
+ * The rationale behind telling the caller its version is that
+		 * Workstation 6.5 required that the VMX and the VMCI kernel
+		 * module be version-synced. All new VMX users will be programmed to
+ * handle the VMCI kernel module version.
+ */
+
+ if (vmciLinux->userVersion > 0 &&
+ vmciLinux->userVersion < VMCI_VERSION_HOSTQP) {
+ retval = vmciLinux->userVersion;
+ } else {
+ retval = VMCI_VERSION;
+ }
+ break;
+
+ case IOCTL_VMCI_INIT_CONTEXT:{
+ struct vmci_init_blk initBlock;
+ uid_t user;
+
+ retval =
+ copy_from_user(&initBlock, (void *)ioarg,
+ sizeof initBlock);
+ if (retval != 0) {
+ printk(KERN_INFO LGPFX
+ "Error reading init block.\n");
+ retval = -EFAULT;
+ break;
+ }
+
+ mutex_lock(&vmciLinux->lock);
+ if (vmciLinux->ctType != VMCIOBJ_NOT_SET) {
+ printk(KERN_INFO LGPFX
+ "Received VMCI init on initialized handle.\n");
+ retval = -EINVAL;
+ goto init_release;
+ }
+
+ if (initBlock.flags & ~VMCI_PRIVILEGE_FLAG_RESTRICTED) {
+ printk(KERN_INFO LGPFX
+ "Unsupported VMCI restriction flag.\n");
+ retval = -EINVAL;
+ goto init_release;
+ }
+
+ user = current_uid();
+ retval =
+ VMCIContext_InitContext(initBlock.cid,
+ initBlock.flags,
+ 0 /* Unused */ ,
+ vmciLinux->userVersion,
+ &user, &vmciLinux->context);
+ if (retval < VMCI_SUCCESS) {
+ printk(KERN_INFO LGPFX
+ "Error initializing context.\n");
+ retval =
+ retval ==
+ VMCI_ERROR_DUPLICATE_ENTRY ? -EEXIST :
+ -EINVAL;
+ goto init_release;
+ }
+
+ /*
+			 * Copy cid to userlevel; we do this to allow the VMX to enforce its
+ * policy on cid generation.
+ */
+ initBlock.cid = VMCIContext_GetId(vmciLinux->context);
+ retval =
+ copy_to_user((void *)ioarg, &initBlock,
+ sizeof initBlock);
+ if (retval != 0) {
+ VMCIContext_ReleaseContext(vmciLinux->context);
+ vmciLinux->context = NULL;
+ printk(KERN_INFO LGPFX
+ "Error writing init block.\n");
+ retval = -EFAULT;
+ goto init_release;
+ }
+ ASSERT(initBlock.cid != VMCI_INVALID_ID);
+
+ vmciLinux->ctType = VMCIOBJ_CONTEXT;
+
+ atomic_inc(&linuxState.activeContexts);
+
+ init_release:
+ mutex_unlock(&vmciLinux->lock);
+ break;
+ }
+
+ case IOCTL_VMCI_DATAGRAM_SEND:{
+ struct vmci_dg_snd_rcv_info sendInfo;
+ struct vmci_datagram *dg = NULL;
+ uint32_t cid;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ Warning(LGPFX
+ "Ioctl only valid for context handle (iocmd=%d).\n",
+ iocmd);
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&sendInfo, (void *)ioarg,
+ sizeof sendInfo);
+ if (retval) {
+ Warning(LGPFX "copy_from_user failed.\n");
+ retval = -EFAULT;
+ break;
+ }
+
+ if (sendInfo.len > VMCI_MAX_DG_SIZE) {
+ Warning(LGPFX
+ "Datagram too big (size=%d).\n",
+ sendInfo.len);
+ retval = -EINVAL;
+ break;
+ }
+
+ if (sendInfo.len < sizeof *dg) {
+ Warning(LGPFX
+ "Datagram too small (size=%d).\n",
+ sendInfo.len);
+ retval = -EINVAL;
+ break;
+ }
+
+ dg = kmalloc(sendInfo.len, GFP_KERNEL);
+ if (dg == NULL) {
+ printk(KERN_INFO LGPFX
+ "Cannot allocate memory to dispatch datagram.\n");
+ retval = -ENOMEM;
+ break;
+ }
+
+ retval =
+ copy_from_user(dg,
+ (char *)(uintptr_t) sendInfo.addr,
+ sendInfo.len);
+ if (retval != 0) {
+ printk(KERN_INFO LGPFX
+ "Error getting datagram (err=%d).\n",
+ retval);
+ kfree(dg);
+ retval = -EFAULT;
+ break;
+ }
+
+ VMCI_DEBUG_LOG(10,
+ (LGPFX
+ "Datagram dst (handle=0x%x:0x%x) src "
+ "(handle=0x%x:0x%x), payload (size=%"
+ FMT64 "u " "bytes).\n",
+ dg->dst.context, dg->dst.resource,
+ dg->src.context, dg->src.resource,
+ dg->payloadSize));
+
+ /* Get source context id. */
+ ASSERT(vmciLinux->context);
+ cid = VMCIContext_GetId(vmciLinux->context);
+ ASSERT(cid != VMCI_INVALID_ID);
+ sendInfo.result = VMCIDatagram_Dispatch(cid, dg, true);
+ kfree(dg);
+ retval =
+ copy_to_user((void *)ioarg, &sendInfo,
+ sizeof sendInfo);
+ break;
+ }
+
+ case IOCTL_VMCI_DATAGRAM_RECEIVE:{
+ struct vmci_dg_snd_rcv_info recvInfo;
+ struct vmci_datagram *dg = NULL;
+ size_t size;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ Warning(LGPFX
+ "Ioctl only valid for context handle (iocmd=%d).\n",
+ iocmd);
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&recvInfo, (void *)ioarg,
+ sizeof recvInfo);
+ if (retval) {
+ Warning(LGPFX "copy_from_user failed.\n");
+ retval = -EFAULT;
+ break;
+ }
+
+ ASSERT(vmciLinux->ctType == VMCIOBJ_CONTEXT);
+
+ size = recvInfo.len;
+ ASSERT(vmciLinux->context);
+ recvInfo.result =
+ VMCIContext_DequeueDatagram(vmciLinux->context,
+ &size, &dg);
+
+ if (recvInfo.result >= VMCI_SUCCESS) {
+ ASSERT(dg);
+ retval = copy_to_user((void *)((uintptr_t)
+ recvInfo.addr),
+ dg, VMCI_DG_SIZE(dg));
+ kfree(dg);
+ if (retval != 0) {
+ break;
+ }
+ }
+ retval =
+ copy_to_user((void *)ioarg, &recvInfo,
+ sizeof recvInfo);
+ break;
+ }
+
+ case IOCTL_VMCI_QUEUEPAIR_ALLOC:{
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_QUEUEPAIR_ALLOC only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ if (vmciLinux->userVersion < VMCI_VERSION_NOVMVM) {
+ struct vmci_qp_ai_vmvm queuePairAllocInfo;
+ struct vmci_qp_ai_vmvm *info =
+ (struct vmci_qp_ai_vmvm *)ioarg;
+
+ retval =
+ copy_from_user(&queuePairAllocInfo,
+ (void *)ioarg,
+ sizeof queuePairAllocInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+			/* VM to VM style create. */
+			retval = VMCIDoQPBrokerAlloc(
+				queuePairAllocInfo.handle,
+				queuePairAllocInfo.peer,
+				queuePairAllocInfo.flags,
+				queuePairAllocInfo.produceSize,
+				queuePairAllocInfo.consumeSize,
+				NULL, vmciLinux->context,
+				true, &info->result);
+ } else {
+			struct vmci_qp_alloc_info queuePairAllocInfo;
+ struct vmci_qp_alloc_info *info =
+ (struct vmci_qp_alloc_info *)ioarg;
+ QueuePairPageStore pageStore;
+
+ retval =
+ copy_from_user(&queuePairAllocInfo,
+ (void *)ioarg,
+ sizeof queuePairAllocInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ pageStore.pages = queuePairAllocInfo.ppnVA;
+ pageStore.len = queuePairAllocInfo.numPPNs;
+
+			/* Not VM to VM style create. */
+			retval = VMCIDoQPBrokerAlloc(
+				queuePairAllocInfo.handle,
+				queuePairAllocInfo.peer,
+				queuePairAllocInfo.flags,
+				queuePairAllocInfo.produceSize,
+				queuePairAllocInfo.consumeSize,
+				&pageStore, vmciLinux->context,
+				false, &info->result);
+ }
+ break;
+ }
+
+ case IOCTL_VMCI_QUEUEPAIR_SETVA:{
+ struct vmci_qp_set_va_info setVAInfo;
+ struct vmci_qp_set_va_info *info =
+ (struct vmci_qp_set_va_info *)ioarg;
+ int32_t result;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_QUEUEPAIR_SETVA only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ if (vmciLinux->userVersion < VMCI_VERSION_NOVMVM) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_QUEUEPAIR_SETVA not supported for this VMX version.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&setVAInfo, (void *)ioarg,
+ sizeof setVAInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ if (setVAInfo.va) {
+ /*
+ * VMX is passing down a new VA for the queue pair mapping.
+ */
+
+ result =
+ VMCIQPBroker_Map(setVAInfo.handle,
+ vmciLinux->context,
+ setVAInfo.va);
+ } else {
+ /*
+ * The queue pair is about to be unmapped by the VMX.
+ */
+
+ result =
+ VMCIQPBroker_Unmap(setVAInfo.handle,
+ vmciLinux->context, 0);
+ }
+
+ retval =
+ copy_to_user(&info->result, &result, sizeof result);
+ if (retval) {
+ retval = -EFAULT;
+ }
+
+ break;
+ }
+
+ case IOCTL_VMCI_QUEUEPAIR_SETPAGEFILE:{
+ struct vmci_qp_page_file_info pageFileInfo;
+ struct vmci_qp_page_file_info *info =
+ (struct vmci_qp_page_file_info *)ioarg;
+ int32_t result;
+
+ if (vmciLinux->userVersion < VMCI_VERSION_HOSTQP ||
+ vmciLinux->userVersion >= VMCI_VERSION_NOVMVM) {
+ printk(KERN_INFO LGPFX
+			       "IOCTL_VMCI_QUEUEPAIR_SETPAGEFILE not supported for this VMX "
+ "(version=%d).\n",
+ vmciLinux->userVersion);
+ retval = -EINVAL;
+ break;
+ }
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_QUEUEPAIR_SETPAGEFILE only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&pageFileInfo, (void *)ioarg,
+ sizeof *info);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ /*
+ * Communicate success pre-emptively to the caller. Note that
+ * the basic premise is that it is incumbent upon the caller not
+ * to look at the info.result field until after the ioctl()
+ * returns. And then, only if the ioctl() result indicates no
+ * error. We send up the SUCCESS status before calling
+ * SetPageStore() because failing to copy up the result
+ * code means unwinding the SetPageStore().
+ *
+ * It turns out the logic to unwind a SetPageStore() opens a can
+ * of worms. For example, if a host had created the QueuePair
+ * and a guest attaches and SetPageStore() is successful but
+ * writing success fails, then ... the host has to be stopped
+ * from writing (anymore) data into the QueuePair. That means
+ * an additional test in the VMCI_Enqueue() code path. Ugh.
+ */
+
+ result = VMCI_SUCCESS;
+ retval =
+ copy_to_user(&info->result, &result, sizeof result);
+ if (retval == 0) {
+ result =
+ VMCIQPBroker_SetPageStore
+ (pageFileInfo.handle,
+ pageFileInfo.produceVA,
+ pageFileInfo.consumeVA,
+ vmciLinux->context);
+ if (result < VMCI_SUCCESS) {
+
+ retval =
+ copy_to_user(&info->result,
+ &result,
+ sizeof result);
+ if (retval != 0) {
+ /*
+ * Note that in this case the SetPageStore() call
+ * failed but we were unable to communicate that to the
+ * caller (because the copy_to_user() call failed).
+ * So, if we simply return an error (in this case
+ * -EFAULT) then the caller will know that the
+ * SetPageStore failed even though we couldn't put the
+ * result code in the result field and indicate exactly
+ * why it failed.
+ *
+ * That says nothing about the issue where we were once
+ * able to write to the caller's info memory and now
+ * can't. Something more serious is probably going on
+ * than the fact that SetPageStore() didn't work.
+ */
+ retval = -EFAULT;
+ }
+ }
+
+ } else {
+ /*
+ * In this case, we can't write a result field of the
+ * caller's info block. So, we don't even try to
+ * SetPageStore().
+ */
+ retval = -EFAULT;
+ }
+
+ break;
+ }
+
+ case IOCTL_VMCI_QUEUEPAIR_DETACH:{
+ struct vmci_qp_dtch_info detachInfo;
+ struct vmci_qp_dtch_info *info =
+ (struct vmci_qp_dtch_info *)ioarg;
+ int32_t result;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_QUEUEPAIR_DETACH only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&detachInfo, (void *)ioarg,
+ sizeof detachInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ result =
+ VMCIQPBroker_Detach(detachInfo.handle,
+ vmciLinux->context);
+ if (result == VMCI_SUCCESS
+ && vmciLinux->userVersion < VMCI_VERSION_NOVMVM) {
+ result = VMCI_SUCCESS_LAST_DETACH;
+ }
+
+ retval =
+ copy_to_user(&info->result, &result, sizeof result);
+ if (retval) {
+ retval = -EFAULT;
+ }
+
+ break;
+ }
+
+ case IOCTL_VMCI_CTX_ADD_NOTIFICATION:{
+ struct vmci_notify_add_rm_info arInfo;
+ struct vmci_notify_add_rm_info *info =
+ (struct vmci_notify_add_rm_info *)ioarg;
+ int32_t result;
+ uint32_t cid;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_CTX_ADD_NOTIFICATION only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&arInfo, (void *)ioarg,
+ sizeof arInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ cid = VMCIContext_GetId(vmciLinux->context);
+ result =
+ VMCIContext_AddNotification(cid, arInfo.remoteCID);
+ retval =
+ copy_to_user(&info->result, &result, sizeof result);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+ break;
+ }
+
+ case IOCTL_VMCI_CTX_REMOVE_NOTIFICATION:{
+ struct vmci_notify_add_rm_info arInfo;
+ struct vmci_notify_add_rm_info *info =
+ (struct vmci_notify_add_rm_info *)ioarg;
+ int32_t result;
+ uint32_t cid;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_CTX_REMOVE_NOTIFICATION only valid for "
+ "contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&arInfo, (void *)ioarg,
+ sizeof arInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ cid = VMCIContext_GetId(vmciLinux->context);
+ result =
+ VMCIContext_RemoveNotification(cid,
+ arInfo.remoteCID);
+ retval =
+ copy_to_user(&info->result, &result, sizeof result);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+ break;
+ }
+
+ case IOCTL_VMCI_CTX_GET_CPT_STATE:{
+ struct vmci_chkpt_buf_info getInfo;
+ uint32_t cid;
+ char *cptBuf;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_CTX_GET_CPT_STATE only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&getInfo, (void *)ioarg,
+ sizeof getInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ cid = VMCIContext_GetId(vmciLinux->context);
+ getInfo.result =
+ VMCIContext_GetCheckpointState(cid,
+ getInfo.cptType,
+ &getInfo.bufSize,
+ &cptBuf);
+ if (getInfo.result == VMCI_SUCCESS && getInfo.bufSize) {
+ retval = copy_to_user((void *)(uintptr_t)
+ getInfo.cptBuf, cptBuf,
+ getInfo.bufSize);
+ kfree(cptBuf);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+ }
+ retval =
+ copy_to_user((void *)ioarg, &getInfo,
+ sizeof getInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+ break;
+ }
+
+ case IOCTL_VMCI_CTX_SET_CPT_STATE:{
+ struct vmci_chkpt_buf_info setInfo;
+ uint32_t cid;
+ char *cptBuf;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_CTX_SET_CPT_STATE only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&setInfo, (void *)ioarg,
+ sizeof setInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ cptBuf = kmalloc(setInfo.bufSize, GFP_KERNEL);
+ if (cptBuf == NULL) {
+ printk(KERN_INFO LGPFX
+ "Cannot allocate memory to set cpt state (type=%d).\n",
+ setInfo.cptType);
+ retval = -ENOMEM;
+ break;
+ }
+ retval =
+ copy_from_user(cptBuf,
+ (void *)(uintptr_t) setInfo.cptBuf,
+ setInfo.bufSize);
+ if (retval) {
+ kfree(cptBuf);
+ retval = -EFAULT;
+ break;
+ }
+
+ cid = VMCIContext_GetId(vmciLinux->context);
+ setInfo.result =
+ VMCIContext_SetCheckpointState(cid,
+ setInfo.cptType,
+ setInfo.bufSize,
+ cptBuf);
+ kfree(cptBuf);
+ retval =
+ copy_to_user((void *)ioarg, &setInfo,
+ sizeof setInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+ break;
+ }
+
+ case IOCTL_VMCI_GET_CONTEXT_ID:{
+ uint32_t cid = VMCI_HOST_CONTEXT_ID;
+
+ retval = copy_to_user((void *)ioarg, &cid, sizeof cid);
+ if (retval)
+ retval = -EFAULT;
+ break;
+ }
+
+ case IOCTL_VMCI_SET_NOTIFY:{
+ struct vmci_set_notify_info notifyInfo;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_SET_NOTIFY only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(¬ifyInfo, (void *)ioarg,
+ sizeof notifyInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ if ((uintptr_t) notifyInfo.notifyUVA !=
+ (uintptr_t) NULL) {
+ notifyInfo.result =
+ VMCISetupNotify(vmciLinux->context,
+ (uintptr_t)
+ notifyInfo.notifyUVA);
+ } else {
+ VMCIUnsetNotifyInt(vmciLinux->context, true);
+ notifyInfo.result = VMCI_SUCCESS;
+ }
+
+ retval =
+ copy_to_user((void *)ioarg, ¬ifyInfo,
+ sizeof notifyInfo);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ break;
+ }
+
+ case IOCTL_VMCI_NOTIFY_RESOURCE:{
+ struct vmci_notify_rsrc_info info;
+ uint32_t cid;
+
+ if (vmciLinux->userVersion < VMCI_VERSION_NOTIFY) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_NOTIFY_RESOURCE is invalid for current"
+ " VMX versions.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_NOTIFY_RESOURCE is only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&info, (void *)ioarg, sizeof info);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ cid = VMCIContext_GetId(vmciLinux->context);
+ switch (info.action) {
+ case VMCI_NOTIFY_RESOURCE_ACTION_NOTIFY:
+ if (info.resource ==
+ VMCI_NOTIFY_RESOURCE_DOOR_BELL) {
+ info.result =
+ VMCIContext_NotifyDoorbell(cid,
+ info.handle,
+ VMCI_NO_PRIVILEGE_FLAGS);
+ } else {
+ info.result = VMCI_ERROR_UNAVAILABLE;
+ }
+ break;
+ case VMCI_NOTIFY_RESOURCE_ACTION_CREATE:
+ info.result =
+ VMCIContext_DoorbellCreate(cid,
+ info.handle);
+ break;
+ case VMCI_NOTIFY_RESOURCE_ACTION_DESTROY:
+ info.result =
+ VMCIContext_DoorbellDestroy(cid,
+ info.handle);
+ break;
+ default:
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_NOTIFY_RESOURCE got unknown action (action=%d).\n",
+ info.action);
+ info.result = VMCI_ERROR_INVALID_ARGS;
+ }
+ retval = copy_to_user((void *)ioarg, &info,
+ sizeof info);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ break;
+ }
+
+ case IOCTL_VMCI_NOTIFICATIONS_RECEIVE:{
+ struct vmci_notify_recv_info info;
+ struct vmci_handle_arr *dbHandleArray;
+ struct vmci_handle_arr *qpHandleArray;
+ uint32_t cid;
+
+ if (vmciLinux->ctType != VMCIOBJ_CONTEXT) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_NOTIFICATIONS_RECEIVE is only valid for contexts.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ if (vmciLinux->userVersion < VMCI_VERSION_NOTIFY) {
+ printk(KERN_INFO LGPFX
+ "IOCTL_VMCI_NOTIFICATIONS_RECEIVE is not supported for the"
+ " current vmx version.\n");
+ retval = -EINVAL;
+ break;
+ }
+
+ retval =
+ copy_from_user(&info, (void *)ioarg, sizeof info);
+ if (retval) {
+ retval = -EFAULT;
+ break;
+ }
+
+ if ((info.dbHandleBufSize && !info.dbHandleBufUVA)
+ || (info.qpHandleBufSize && !info.qpHandleBufUVA)) {
+ retval = -EINVAL;
+ break;
+ }
+
+ cid = VMCIContext_GetId(vmciLinux->context);
+ info.result =
+ VMCIContext_ReceiveNotificationsGet(cid,
+ &dbHandleArray,
+ &qpHandleArray);
+ if (info.result == VMCI_SUCCESS) {
+ info.result = VMCICopyHandleArrayToUser((void *)
+ (uintptr_t)
+ info.dbHandleBufUVA,
+ &info.dbHandleBufSize,
+ dbHandleArray,
+ &retval);
+ if (info.result == VMCI_SUCCESS && !retval) {
+ info.result =
+ VMCICopyHandleArrayToUser((void *)
+ (uintptr_t)
+ info.qpHandleBufUVA,
+ &info.qpHandleBufSize,
+ qpHandleArray,
+ &retval);
+ }
+ if (!retval) {
+ retval =
+ copy_to_user((void *)ioarg,
+ &info, sizeof info);
+ }
+ VMCIContext_ReceiveNotificationsRelease
+ (cid, dbHandleArray, qpHandleArray,
+ info.result == VMCI_SUCCESS && !retval);
+ } else {
+ retval =
+ copy_to_user((void *)ioarg, &info,
+ sizeof info);
+ }
+ break;
+ }
+
+ default:
+ Warning(LGPFX "Unknown ioctl (iocmd=%d).\n", iocmd);
+ retval = -EINVAL;
+ }
+
+ return retval;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIUserVALockPage --
+ *
+ * Locks the physical page backing a given user VA. Copied from
+ * bora/modules/vmnet/linux/userif.c:UserIfLockPage(). TODO libify the
+ * common code.
+ *
+ * Results:
+ * Pointer to struct page on success, NULL otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline struct page *VMCIUserVALockPage(uintptr_t addr) // IN:
+{
+ struct page *page = NULL;
+ int retval;
+
+ down_read(¤t->mm->mmap_sem);
+ retval = get_user_pages(current, current->mm, addr,
+ 1, 1, 0, &page, NULL);
+ up_read(¤t->mm->mmap_sem);
+
+ if (retval != 1) {
+ return NULL;
+ }
+
+ return page;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIMapBoolPtr --
+ *
+ * Locks the physical page backing a given user VA and maps it into
+ * kernel address space. The range of the mapped memory must lie within
+ * a single page; otherwise an error is returned. Copied from
+ * bora/modules/vmnet/linux/userif.c:VNetUserIfMapUint32Ptr(). TODO
+ * libify the common code.
+ *
+ * Results:
+ * 0 on success, negative error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static inline int VMCIMapBoolPtr(uintptr_t notifyUVA, // IN:
+ struct page **p, // OUT:
+ bool ** notifyPtr) // OUT:
+{
+ if (!access_ok(VERIFY_WRITE, notifyUVA, sizeof **notifyPtr) ||
+ (((notifyUVA + sizeof **notifyPtr - 1) & ~(PAGE_SIZE - 1)) !=
+ (notifyUVA & ~(PAGE_SIZE - 1)))) {
+ return -EINVAL;
+ }
+
+ *p = VMCIUserVALockPage(notifyUVA);
+ if (*p == NULL) {
+ return -EAGAIN;
+ }
+
+ *notifyPtr =
+ (bool *) ((uint8_t *) kmap(*p) + (notifyUVA & (PAGE_SIZE - 1)));
+ return 0;
+}
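+
+/*
+ * A worked example of the page-boundary check above (illustrative
+ * numbers, assuming PAGE_SIZE is 4096): masking the addresses of the
+ * first and last byte with ~(PAGE_SIZE - 1) yields the page base of
+ * each end. For a 1-byte bool at notifyUVA 0x1fff both ends mask to
+ * 0x1000, so the mapping is accepted; a hypothetical 4-byte object at
+ * 0x1ffe would mask to 0x1000 and 0x2000 and be rejected with -EINVAL,
+ * since kmap() only maps a single page.
+ */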
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCISetupNotify --
+ *
+ * Sets up a given context for notify to work. Calls VMCIMapBoolPtr()
+ * which maps the notify boolean in user VA in kernel space.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int VMCISetupNotify(struct vmci_context *context, // IN:
+ uintptr_t notifyUVA) // IN:
+{
+ int retval;
+
+ if (context->notify) {
+ Warning(LGPFX "Notify mechanism is already set up.\n");
+ return VMCI_ERROR_DUPLICATE_ENTRY;
+ }
+
+ retval =
+ VMCIMapBoolPtr(notifyUVA, &context->notifyPage,
+ &context->notify) ==
+ 0 ? VMCI_SUCCESS : VMCI_ERROR_GENERIC;
+ if (retval == VMCI_SUCCESS) {
+ VMCIContext_CheckAndSignalNotify(context);
+ }
+
+ return retval;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIUnsetNotifyInt --
+ *
+ * Internal version of VMCIUnsetNotify, that allows for locking
+ * the context before unsetting the notify pointer. If useLock is
+ * true, the context lock is grabbed.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void VMCIUnsetNotifyInt(struct vmci_context *context, // IN
+ bool useLock) // IN
+{
+ if (useLock) {
+ spin_lock(&context->lock);
+ }
+
+ if (context->notifyPage) {
+ struct page *notifyPage = context->notifyPage;
+
+ context->notify = NULL;
+ context->notifyPage = NULL;
+
+ if (useLock) {
+ spin_unlock(&context->lock);
+ }
+
+ kunmap(notifyPage);
+ put_page(notifyPage);
+ } else {
+ if (useLock) {
+ spin_unlock(&context->lock);
+ }
+ }
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIUnsetNotify --
+ *
+ * Reverts actions set up by VMCISetupNotify(). Unmaps and unlocks the
+ * page mapped/locked by VMCISetupNotify().
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIUnsetNotify(struct vmci_context *context) // IN:
+{
+ VMCIUnsetNotifyInt(context, false);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * PCI device support --
+ *
+ * The following functions implement the support for the VMCI
+ * guest device. This includes initializing the device and
+ * interrupt handling.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * dispatch_datagrams --
+ *
+ * Reads and dispatches incoming datagrams.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * Reads data from the device.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void dispatch_datagrams(unsigned long data)
+{
+ struct vmci_device *dev = (struct vmci_device *)data;
+
+ if (dev == NULL) {
+ printk(KERN_DEBUG
+ "vmci: dispatch_datagrams(): no vmci device "
+ "present.\n");
+ return;
+ }
+
+ if (data_buffer == NULL) {
+ printk(KERN_DEBUG
+ "vmci: dispatch_datagrams(): no buffer present.\n");
+ return;
+ }
+
+ VMCI_ReadDatagramsFromPort((int)0,
+ dev->ioaddr + VMCI_DATA_IN_ADDR,
+ data_buffer, data_buffer_size);
+}
+DECLARE_TASKLET(vmci_dg_tasklet, dispatch_datagrams, (unsigned long)&vmci_dev);
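+
+/*
+ * Informational note: the interrupt handlers further down only
+ * acknowledge the device and schedule this tasklet (and
+ * vmci_bm_tasklet below); the actual datagram draining and bitmap
+ * scanning run later in softirq context, keeping the time spent in
+ * hard-irq context short.
+ */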
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * process_bitmap --
+ *
+ * Scans the notification bitmap for raised flags, clears them
+ * and handles the notifications.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void process_bitmap(unsigned long data)
+{
+ struct vmci_device *dev = (struct vmci_device *)data;
+
+ if (dev == NULL) {
+ printk(KERN_DEBUG "vmci: process_bitmaps(): no vmci device"
+ "present.\n");
+ return;
+ }
+
+ if (notification_bitmap == NULL) {
+ printk(KERN_DEBUG
+ "vmci: process_bitmap(): no bitmap present.\n");
+ return;
+ }
+
+ VMCI_ScanNotificationBitmap(notification_bitmap);
+}
+DECLARE_TASKLET(vmci_bm_tasklet, process_bitmap, (unsigned long)&vmci_dev);
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * vmci_guest_init --
+ *
+ * Initializes the VMCI PCI device. The initialization might fail
+ * if there is no VMCI PCI device.
+ *
+ * Results:
+ * 0 on success, other error codes on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int vmci_guest_init(void)
+{
+ int retval;
+
+ /* Initialize guest device data. */
+ mutex_init(&vmci_dev.lock);
+ vmci_dev.intr_type = VMCI_INTR_TYPE_INTX;
+ vmci_dev.exclusive_vectors = false;
+ spin_lock_init(&vmci_dev.dev_spinlock);
+ vmci_dev.enabled = false;
+ atomic_set(&vmci_dev.datagrams_allowed, 0);
+ atomic_set(&guestDeviceActive, 0);
+
+ data_buffer = vmalloc(data_buffer_size);
+ if (!data_buffer) {
+ return -ENOMEM;
+ }
+
+ /* This should be last to make sure we are done initializing. */
+ retval = pci_register_driver(&vmci_driver);
+ if (retval < 0) {
+ vfree(data_buffer);
+ data_buffer = NULL;
+ return retval;
+ }
+
+ return 0;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * vmci_enable_msix --
+ *
+ * Enable MSI-X. Try exclusive vectors first, then shared vectors.
+ *
+ * Results:
+ * 0 on success, other error codes on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int vmci_enable_msix(struct pci_dev *pdev) // IN
+{
+ int i;
+ int result;
+
+ for (i = 0; i < VMCI_MAX_INTRS; ++i) {
+ vmci_dev.msix_entries[i].entry = i;
+ vmci_dev.msix_entries[i].vector = i;
+ }
+
+ result = pci_enable_msix(pdev, vmci_dev.msix_entries, VMCI_MAX_INTRS);
+ if (!result) {
+ vmci_dev.exclusive_vectors = true;
+ } else if (result > 0) {
+ result = pci_enable_msix(pdev, vmci_dev.msix_entries, 1);
+ }
+ return result;
+}
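+
+/*
+ * Informational note: a positive return from pci_enable_msix() means
+ * fewer than the requested vectors are available, so we retry with a
+ * single shared vector; in that case exclusive_vectors stays false
+ * and datagram and notification interrupts are demultiplexed via the
+ * ICR in vmci_interrupt() instead of arriving on separate vectors.
+ */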
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * vmci_interrupt --
+ *
+ * Interrupt handler for legacy or MSI interrupt, or for first MSI-X
+ * interrupt (vector VMCI_INTR_DATAGRAM).
+ *
+ * Results:
+ * IRQ_HANDLED if the interrupt is handled, IRQ_NONE if the
+ * interrupt was not for this device.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static irqreturn_t vmci_interrupt(int irq, // IN
+ void *clientdata) // IN
+{
+ struct vmci_device *dev = clientdata;
+
+ if (dev == NULL) {
+ printk(KERN_DEBUG
+ "vmci_interrupt(): irq %d for unknown device.\n", irq);
+ return IRQ_NONE;
+ }
+
+ /*
+ * If we are using MSI-X with exclusive vectors then we simply schedule
+ * the datagram tasklet, since we know the interrupt was meant for us.
+ * Otherwise we must read the ICR to determine what to do.
+ */
+
+ if (dev->intr_type == VMCI_INTR_TYPE_MSIX && dev->exclusive_vectors) {
+ tasklet_schedule(&vmci_dg_tasklet);
+ } else {
+ unsigned int icr;
+
+ ASSERT(dev->intr_type == VMCI_INTR_TYPE_INTX ||
+ dev->intr_type == VMCI_INTR_TYPE_MSI);
+
+ /* Acknowledge interrupt and determine what needs doing. */
+ icr = inl(dev->ioaddr + VMCI_ICR_ADDR);
+ if (icr == 0 || icr == 0xffffffff) {
+ return IRQ_NONE;
+ }
+
+ if (icr & VMCI_ICR_DATAGRAM) {
+ tasklet_schedule(&vmci_dg_tasklet);
+ icr &= ~VMCI_ICR_DATAGRAM;
+ }
+ if (icr & VMCI_ICR_NOTIFICATION) {
+ tasklet_schedule(&vmci_bm_tasklet);
+ icr &= ~VMCI_ICR_NOTIFICATION;
+ }
+ if (icr != 0) {
+ printk(KERN_INFO LGPFX
+ "Ignoring unknown interrupt cause (0x%x).\n", icr);
+ }
+ }
+
+ return IRQ_HANDLED;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * vmci_interrupt_bm --
+ *
+ * Interrupt handler for MSI-X interrupt vector VMCI_INTR_NOTIFICATION,
+ * which is for the notification bitmap. Will only get called if we are
+ * using MSI-X with exclusive vectors.
+ *
+ * Results:
+ * IRQ_HANDLED if the interrupt is handled, IRQ_NONE if the
+ * interrupt was not for this device.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static irqreturn_t vmci_interrupt_bm(int irq, // IN
+ void *clientdata) // IN
+{
+ struct vmci_device *dev = clientdata;
+
+ if (dev == NULL) {
+ printk(KERN_DEBUG
+ "vmci_interrupt_bm(): irq %d for unknown device.\n",
+ irq);
+ return IRQ_NONE;
+ }
+
+ /* For MSI-X we can just assume it was meant for us. */
+ ASSERT(dev->intr_type == VMCI_INTR_TYPE_MSIX && dev->exclusive_vectors);
+ tasklet_schedule(&vmci_bm_tasklet);
+
+ return IRQ_HANDLED;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * vmci_probe_device --
+ *
+ * Most of the initialization at module load time is done here.
+ *
+ * Results:
+ * Returns 0 for success, an error otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static int __devinit vmci_probe_device(struct pci_dev *pdev, // IN: vmci PCI device
+ const struct pci_device_id *id) // IN: matching device ID
+{
+ unsigned int ioaddr;
+ unsigned int ioaddr_size;
+ unsigned int capabilities;
+ int result;
+
+ printk(KERN_INFO "Probing for vmci/PCI.\n");
+
+ result = pci_enable_device(pdev);
+ if (result) {
+ printk(KERN_ERR "Cannot VMCI device %s: error %d\n",
+ pci_name(pdev), result);
+ return result;
+ }
+ pci_set_master(pdev); /* To enable QueuePair functionality. */
+ ioaddr = pci_resource_start(pdev, 0);
+ ioaddr_size = pci_resource_len(pdev, 0);
+
+ /*
+ * Request the I/O region. The base address and size are kept so
+ * the region can be released again on failure or unload.
+ */
+
+ if (!request_region(ioaddr, ioaddr_size, "vmci")) {
+ printk(KERN_INFO "vmci: Another driver already loaded "
+ "for device in slot %s.\n", pci_name(pdev));
+ goto pci_disable;
+ }
+
+ printk(KERN_INFO "Found vmci/PCI at %#x, irq %u.\n", ioaddr, pdev->irq);
+
+ /*
+ * Verify that the VMCI Device supports the capabilities that
+ * we need. If the device is missing capabilities that we would
+ * like to use, check for fallback capabilities and use those
+ * instead (so we can run a new VM on old hosts). Fail the load if
+ * a required capability is missing and there is no fallback.
+ *
+ * Right now, we need datagrams. There are no fallbacks.
+ */
+ capabilities = inl(ioaddr + VMCI_CAPS_ADDR);
+
+ if ((capabilities & VMCI_CAPS_DATAGRAM) == 0) {
+ printk(KERN_ERR "VMCI device does not support datagrams.\n");
+ goto release;
+ }
+
+ /*
+ * If the hardware supports notifications, we will use that as
+ * well.
+ */
+ if (capabilities & VMCI_CAPS_NOTIFICATIONS) {
+ capabilities = VMCI_CAPS_DATAGRAM;
+ notification_bitmap = vmalloc(PAGE_SIZE);
+ if (notification_bitmap == NULL) {
+ printk(KERN_ERR
+ "VMCI device unable to allocate notification bitmap.\n");
+ } else {
+ memset(notification_bitmap, 0, PAGE_SIZE);
+ capabilities |= VMCI_CAPS_NOTIFICATIONS;
+ }
+ } else {
+ capabilities = VMCI_CAPS_DATAGRAM;
+ }
+ printk(KERN_INFO "VMCI: using capabilities 0x%x.\n", capabilities);
+
+ /* Let the host know which capabilities we intend to use. */
+ outl(capabilities, ioaddr + VMCI_CAPS_ADDR);
+
+ /* Device struct initialization. */
+ mutex_lock(&vmci_dev.lock);
+ if (vmci_dev.enabled) {
+ printk(KERN_ERR "VMCI device already enabled.\n");
+ goto unlock;
+ }
+
+ vmci_dev.ioaddr = ioaddr;
+ vmci_dev.ioaddr_size = ioaddr_size;
+ atomic_set(&vmci_dev.datagrams_allowed, 1);
+
+ /*
+ * Register notification bitmap with device if that capability is
+ * used
+ */
+ if (capabilities & VMCI_CAPS_NOTIFICATIONS) {
+ unsigned long bitmapPPN;
+ bitmapPPN = page_to_pfn(vmalloc_to_page(notification_bitmap));
+ if (!VMCI_RegisterNotificationBitmap(bitmapPPN)) {
+ printk(KERN_ERR
+ "VMCI device unable to register notification bitmap "
+ "with PPN 0x%x.\n", (uint32_t) bitmapPPN);
+ goto datagram_disallow;
+ }
+ }
+
+ /* Check host capabilities. */
+ if (!VMCI_CheckHostCapabilities()) {
+ goto remove_bitmap;
+ }
+
+ /* Enable device. */
+ vmci_dev.enabled = true;
+ pci_set_drvdata(pdev, &vmci_dev);
+
+ /*
+ * We do global initialization here because we need datagrams
+ * during VMCIUtil_Init, since it registers for VMCI events. If we
+ * ever support more than one VMCI device we will have to create
+ * separate LateInit/EarlyExit functions that can be used to do
+ * initialization/cleanup that depends on the device being
+ * accessible. We need to initialize VMCI components before
+ * requesting an irq - the VMCI interrupt handler uses these
+ * components, and it may be invoked once request_irq() has
+ * registered the handler (as the irq line may be shared).
+ */
+ VMCIUtil_Init();
+
+ if (VMCIQPGuestEndpoints_Init() < VMCI_SUCCESS) {
+ goto util_exit;
+ }
+
+ /*
+ * Enable interrupts. Try MSI-X first, then MSI, and then fallback on
+ * legacy interrupts.
+ */
+ if (!vmci_disable_msix && !vmci_enable_msix(pdev)) {
+ vmci_dev.intr_type = VMCI_INTR_TYPE_MSIX;
+ vmci_dev.irq = vmci_dev.msix_entries[0].vector;
+ } else if (!vmci_disable_msi && !pci_enable_msi(pdev)) {
+ vmci_dev.intr_type = VMCI_INTR_TYPE_MSI;
+ vmci_dev.irq = pdev->irq;
+ } else {
+ vmci_dev.intr_type = VMCI_INTR_TYPE_INTX;
+ vmci_dev.irq = pdev->irq;
+ }
+
+ /* Request IRQ for legacy or MSI interrupts, or for first MSI-X vector. */
+ result = request_irq(vmci_dev.irq, vmci_interrupt, IRQF_SHARED,
+ "vmci", &vmci_dev);
+ if (result) {
+ printk(KERN_ERR "vmci: irq %u in use: %d\n", vmci_dev.irq,
+ result);
+ goto components_exit;
+ }
+
+ /*
+ * For MSI-X with exclusive vectors we need to request an interrupt for each
+ * vector so that we get a separate interrupt handler routine. This allows
+ * us to distinguish between the vectors.
+ */
+
+ if (vmci_dev.exclusive_vectors) {
+ ASSERT(vmci_dev.intr_type == VMCI_INTR_TYPE_MSIX);
+ result = request_irq(vmci_dev.msix_entries[1].vector,
+ vmci_interrupt_bm, 0, "vmci", &vmci_dev);
+ if (result) {
+ printk(KERN_ERR "vmci: irq %u in use: %d\n",
+ vmci_dev.msix_entries[1].vector, result);
+ free_irq(vmci_dev.irq, &vmci_dev);
+ goto components_exit;
+ }
+ }
+
+ printk(KERN_INFO "Registered vmci device.\n");
+
+ atomic_inc(&guestDeviceActive);
+
+ mutex_unlock(&vmci_dev.lock);
+
+ /* Enable specific interrupt bits. */
+ if (capabilities & VMCI_CAPS_NOTIFICATIONS) {
+ outl(VMCI_IMR_DATAGRAM | VMCI_IMR_NOTIFICATION,
+ vmci_dev.ioaddr + VMCI_IMR_ADDR);
+ } else {
+ outl(VMCI_IMR_DATAGRAM, vmci_dev.ioaddr + VMCI_IMR_ADDR);
+ }
+
+ /* Enable interrupts. */
+ outl(VMCI_CONTROL_INT_ENABLE, vmci_dev.ioaddr + VMCI_CONTROL_ADDR);
+
+ return 0;
+
+ components_exit:
+ VMCIQPGuestEndpoints_Exit();
+ util_exit:
+ VMCIUtil_Exit();
+ vmci_dev.enabled = false;
+ if (vmci_dev.intr_type == VMCI_INTR_TYPE_MSIX) {
+ pci_disable_msix(pdev);
+ } else if (vmci_dev.intr_type == VMCI_INTR_TYPE_MSI) {
+ pci_disable_msi(pdev);
+ }
+ remove_bitmap:
+ if (notification_bitmap) {
+ outl(VMCI_CONTROL_RESET, vmci_dev.ioaddr + VMCI_CONTROL_ADDR);
+ }
+ datagram_disallow:
+ atomic_set(&vmci_dev.datagrams_allowed, 0);
+ unlock:
+ mutex_unlock(&vmci_dev.lock);
+ release:
+ if (notification_bitmap) {
+ vfree(notification_bitmap);
+ notification_bitmap = NULL;
+ }
+ release_region(ioaddr, ioaddr_size);
+ pci_disable:
+ pci_disable_device(pdev);
+ return -EBUSY;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * vmci_remove_device --
+ *
+ * Cleanup, called for each device on unload.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void __devexit vmci_remove_device(struct pci_dev *pdev)
+{
+ struct vmci_device *dev = pci_get_drvdata(pdev);
+
+ printk(KERN_INFO "Removing vmci device\n");
+
+ atomic_dec(&guestDeviceActive);
+
+ VMCIQPGuestEndpoints_Exit();
+ VMCIUtil_Exit();
+
+ mutex_lock(&dev->lock);
+
+ atomic_set(&vmci_dev.datagrams_allowed, 0);
+
+ printk(KERN_INFO "Resetting vmci device\n");
+ outl(VMCI_CONTROL_RESET, vmci_dev.ioaddr + VMCI_CONTROL_ADDR);
+
+ /*
+ * Free IRQ and then disable MSI/MSI-X as appropriate. For MSI-X, we might
+ * have multiple vectors, each with their own IRQ, which we must free too.
+ */
+
+ free_irq(dev->irq, dev);
+ if (dev->intr_type == VMCI_INTR_TYPE_MSIX) {
+ if (dev->exclusive_vectors) {
+ free_irq(dev->msix_entries[1].vector, dev);
+ }
+ pci_disable_msix(pdev);
+ } else if (dev->intr_type == VMCI_INTR_TYPE_MSI) {
+ pci_disable_msi(pdev);
+ }
+ dev->exclusive_vectors = false;
+ dev->intr_type = VMCI_INTR_TYPE_INTX;
+
+ release_region(dev->ioaddr, dev->ioaddr_size);
+ dev->enabled = false;
+ if (notification_bitmap) {
+ /*
+ * The device reset above cleared the bitmap state of the
+ * device, so we can safely free it here.
+ */
+
+ vfree(notification_bitmap);
+ notification_bitmap = NULL;
+ }
+
+ printk(KERN_INFO "Unregistered vmci device.\n");
+ mutex_unlock(&dev->lock);
+
+ pci_disable_device(pdev);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_DeviceEnabled --
+ *
+ * Checks whether the VMCI device is enabled.
+ *
+ * Results:
+ * true if device is enabled, false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+bool VMCI_DeviceEnabled(void)
+{
+ return VMCI_GuestPersonalityActive()
+ || VMCI_HostPersonalityActive();
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_SendDatagram --
+ *
+ * VM to hypervisor call mechanism. We use the standard VMware naming
+ * convention since shared code is calling this function as well.
+ *
+ * Results:
+ * The result of the hypercall.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCI_SendDatagram(struct vmci_datagram *dg)
+{
+ unsigned long flags;
+ int result;
+
+ /* Check args. */
+ if (dg == NULL) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (atomic_read(&vmci_dev.datagrams_allowed) == 0) {
+ return VMCI_ERROR_UNAVAILABLE;
+ }
+
+ /*
+ * Need to acquire spinlock on the device because
+ * the datagram data may be spread over multiple pages and the monitor may
+ * interleave device user rpc calls from multiple VCPUs. Acquiring the
+ * spinlock precludes that possibility. Disabling interrupts to avoid
+ * incoming datagrams during a "rep out" and possibly landing up in this
+ * function.
+ */
+ spin_lock_irqsave(&vmci_dev.dev_spinlock, flags);
+
+ /*
+ * Send the datagram and retrieve the return value from the result register.
+ */
+ __asm__ __volatile__("cld\n\t" "rep outsb\n\t": /* No output. */
+ :"d"(vmci_dev.ioaddr + VMCI_DATA_OUT_ADDR),
+ "c"(VMCI_DG_SIZE(dg)), "S"(dg)
+ );
+
+ /*
+ * XXX Should read result high port as well when updating handlers to
+ * return 64bit.
+ */
+ result = inl(vmci_dev.ioaddr + VMCI_RESULT_LOW_ADDR);
+ spin_unlock_irqrestore(&vmci_dev.dev_spinlock, flags);
+
+ return result;
+}
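+
+/*
+ * Usage sketch (illustrative only; resourceID and srcHandle are
+ * hypothetical and would come from the caller's own setup, and the
+ * field names follow vmci_defs.h): a header-only datagram to the
+ * hypervisor might be sent as
+ *
+ *   struct vmci_datagram dg;
+ *
+ *   dg.dst = VMCI_MAKE_HANDLE(VMCI_HYPERVISOR_CONTEXT_ID, resourceID);
+ *   dg.src = srcHandle;
+ *   dg.payloadSize = 0;
+ *   result = VMCI_SendDatagram(&dg);
+ *
+ * A negative result is the VMCI error code read back from the result
+ * register.
+ */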
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * Shared functions --
+ *
+ * Functions shared between host and guest personality.
+ *
+ *----------------------------------------------------------------------
+ */
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_GuestPersonalityActive --
+ *
+ * Determines whether the VMCI PCI device has been successfully
+ * initialized.
+ *
+ * Results:
+ * true, if VMCI guest device is operational, false otherwise.
+ *
+ * Side effects:
+ * Reads data from the device.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+bool VMCI_GuestPersonalityActive(void)
+{
+ return guestDeviceInit && atomic_read(&guestDeviceActive) > 0;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_HostPersonalityActive --
+ *
+ * Determines whether the VMCI host personality is
+ * available. Since the core functionality of the host driver is
+ * always present, all guests could possibly use the host
+ * personality. However, to minimize the deviation from the
+ * pre-unified driver state of affairs, we only consider the host
+ * device active, if there is no active guest device, or if there
+ * are VMX'en with active VMCI contexts using the host device.
+ *
+ * Results:
+ * true, if VMCI host driver is operational, false otherwise.
+ *
+ * Side effects:
+ * Reads data from the device.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+bool VMCI_HostPersonalityActive(void)
+{
+ return hostDeviceInit &&
+ (!VMCI_GuestPersonalityActive() ||
+ atomic_read(&linuxState.activeContexts) > 0);
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * Module definitions --
+ *
+ * Implements support for module load/unload.
+ *
+ *----------------------------------------------------------------------
+ */
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * vmci_init --
+ *
+ * Linux module entry point, called at module load time.
+ *
+ * Results:
+ * 0 on success, a negative error code otherwise. Initializes
+ * the shared components and the guest and/or host personalities.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static int __init vmci_init(void)
+{
+ int retval;
+
+ retval = VMCI_SharedInit();
+ if (retval != VMCI_SUCCESS) {
+ Warning(LGPFX
+ "Failed to initialize VMCI common components (err=%d).\n",
+ retval);
+ return -ENOMEM;
+ }
+
+ if (vmci_disable_guest) {
+ guestDeviceInit = 0;
+ } else {
+ retval = vmci_guest_init();
+ if (retval != 0) {
+ Warning(LGPFX
+ "VMCI PCI device not initialized (err=%d).\n",
+ retval);
+ }
+ guestDeviceInit = (retval == 0);
+ if (VMCI_GuestPersonalityActive()) {
+ printk(KERN_INFO LGPFX "Using guest personality\n");
+ }
+ }
+
+ if (vmci_disable_host) {
+ hostDeviceInit = 0;
+ } else {
+ retval = vmci_host_init();
+ if (retval != 0) {
+ Warning(LGPFX
+ "Unable to initialize host personality (err=%d).\n",
+ retval);
+ }
+ hostDeviceInit = (retval == 0);
+ if (hostDeviceInit) {
+ printk(KERN_INFO LGPFX "Using host personality\n");
+ }
+ }
+
+ if (!guestDeviceInit && !hostDeviceInit) {
+ VMCI_SharedCleanup();
+ return -ENODEV;
+ }
+
+ printk(KERN_INFO LGPFX "Module (name=%s) is initialized\n",
+ linuxState.deviceName);
+
+ return 0;
+}
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * vmci_exit --
+ *
+ * Linux module exit point, called at module unload time.
+ *
+ *----------------------------------------------------------------------
+ */
+
+static void __exit vmci_exit(void)
+{
+ int retval;
+
+ if (guestDeviceInit) {
+ pci_unregister_driver(&vmci_driver);
+ vfree(data_buffer);
+ guestDeviceInit = false;
+ }
+
+ if (hostDeviceInit) {
+ VMCI_HostCleanup();
+
+ retval = misc_deregister(&linuxState.misc);
+ if (retval) {
+ Warning(LGPFX "Module %s: error unregistering\n",
+ linuxState.deviceName);
+ } else {
+ printk(KERN_INFO LGPFX "Module %s: unloaded\n",
+ linuxState.deviceName);
+ }
+
+ hostDeviceInit = false;
+ }
+
+ VMCI_SharedCleanup();
+}
+
+module_init(vmci_init);
+module_exit(vmci_exit);
+MODULE_DEVICE_TABLE(pci, vmci_ids);
+
+module_param_named(disable_host, vmci_disable_host, bool, 0);
+MODULE_PARM_DESC(disable_host, "Disable driver host personality - (default=0)");
+
+module_param_named(disable_guest, vmci_disable_guest, bool, 0);
+MODULE_PARM_DESC(disable_guest,
+ "Disable driver guest personality - (default=0)");
+
+module_param_named(disable_msi, vmci_disable_msi, bool, 0);
+MODULE_PARM_DESC(disable_msi, "Disable MSI use in driver - (default=0)");
+
+module_param_named(disable_msix, vmci_disable_msix, bool, 0);
+MODULE_PARM_DESC(disable_msix, "Disable MSI-X use in driver - (default=0)");
+
+MODULE_AUTHOR("VMware, Inc.");
+MODULE_DESCRIPTION("VMware Virtual Machine Communication Interface (VMCI).");
+MODULE_VERSION(VMCI_DRIVER_VERSION_STRING);
+MODULE_LICENSE("GPL v2");
+
+/*
+ * Starting with SLE10sp2, Novell requires that IHVs sign a support agreement
+ * with them and mark their kernel modules as externally supported via a
+ * change to the module header. If this isn't done, the module will not load
+ * by default (i.e., neither mkinitrd nor modprobe will accept it).
+ */
+MODULE_INFO(supported, "external");
diff --git a/drivers/misc/vmw_vmci/vmciKernelIf.c b/drivers/misc/vmw_vmci/vmciKernelIf.c
new file mode 100644
index 0000000..7001149
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmciKernelIf.c
@@ -0,0 +1,1351 @@
+/*
+ *
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/mm.h> /* For vmalloc_to_page() and get_user_pages() */
+#include <linux/pagemap.h> /* For page_cache_release() */
+#include <linux/sched.h>
+#include <linux/semaphore.h>
+#include <linux/socket.h> /* For memcpy_{to,from}iovec(). */
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/version.h>
+#include <linux/vmalloc.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#include "vmci_iocontrols.h"
+#include "vmci_kernel_if.h"
+#include "vmciQueue.h"
+#include "vmciQueuePair.h"
+
+/* The Kernel specific component of the struct vmci_queue structure. */
+struct vmci_queue_kern_if {
+ struct page **page;
+ struct page **headerPage;
+ struct semaphore __mutex;
+ struct semaphore *mutex;
+ bool host;
+ size_t numPages;
+};
+
+struct vmci_dlyd_wrk_info {
+ struct work_struct work;
+ VMCIWorkFn *workFn;
+ void *data;
+};
+
+/*
+ *----------------------------------------------------------------------
+ *
+ * VMCIHost_WaitForCallLocked --
+ *
+ * Wait until a VMCI call is pending or the waiting thread is
+ * interrupted. It is assumed that a lock is held prior to
+ * calling this function. The lock will be released during the
+ * wait. The correctness of this function depends on the same
+ * lock being held when the call is signalled.
+ *
+ * Results:
+ * true on success
+ * false if the wait was interrupted.
+ *
+ * Side effects:
+ * The call may block.
+ *
+ *----------------------------------------------------------------------
+ */
+
+bool VMCIHost_WaitForCallLocked(struct vmci_host *hostContext, // IN
+ spinlock_t * lock, // IN
+ unsigned long *flags, // IN
+ bool useBH) // IN
+{
+ DECLARE_WAITQUEUE(wait, current);
+
+ /*
+ * The thread must be added to the wait queue and have its state
+ * changed while holding the lock - otherwise a signal may change
+ * the state in between and have it overwritten causing a loss of
+ * the event.
+ */
+
+ add_wait_queue(&hostContext->waitQueue, &wait);
+ current->state = TASK_INTERRUPTIBLE;
+
+ if (useBH) {
+ spin_unlock_bh(lock);
+ } else {
+ spin_unlock(lock);
+ }
+
+ schedule();
+
+ if (useBH) {
+ spin_lock_bh(lock);
+ } else {
+ spin_lock(lock);
+ }
+
+ current->state = TASK_RUNNING;
+
+ remove_wait_queue(&hostContext->waitQueue, &wait);
+
+ if (signal_pending(current))
+ return false;
+
+ return true;
+}
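+
+/*
+ * Typical caller pattern (a sketch; callPending() is a hypothetical
+ * predicate over state guarded by the lock): because the lock is
+ * re-acquired before returning, the condition can be re-checked in a
+ * loop:
+ *
+ *   spin_lock(&lock);
+ *   while (!callPending(hostContext)) {
+ *           if (!VMCIHost_WaitForCallLocked(hostContext, &lock,
+ *                                           &flags, false))
+ *                   break;
+ *   }
+ *   spin_unlock(&lock);
+ *
+ * where a false return means the wait was interrupted by a signal.
+ */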
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHost_CompareUser --
+ *
+ * Determines whether the two users are the same.
+ *
+ * Results:
+ * VMCI_SUCCESS if equal, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIHost_CompareUser(uid_t * user1, uid_t * user2)
+{
+ if (!user1 || !user2)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (*user1 == *user2)
+ return VMCI_SUCCESS;
+
+ return VMCI_ERROR_GENERIC;
+}
+
+/*
+ *----------------------------------------------------------------------------
+ *
+ * VMCIDelayedWorkCB
+ *
+ * Called in a worker thread context.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+static void VMCIDelayedWorkCB(struct work_struct *work) // IN
+{
+ struct vmci_dlyd_wrk_info *delayedWorkInfo;
+
+ delayedWorkInfo = container_of(work, struct vmci_dlyd_wrk_info, work);
+ ASSERT(delayedWorkInfo);
+ ASSERT(delayedWorkInfo->workFn);
+
+ delayedWorkInfo->workFn(delayedWorkInfo->data);
+
+ kfree(delayedWorkInfo);
+}
+
+/*
+ *----------------------------------------------------------------------------
+ *
+ * VMCI_ScheduleDelayedWork --
+ *
+ * Schedule the specified callback.
+ *
+ * Results:
+ * Zero on success, error code otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+int VMCI_ScheduleDelayedWork(VMCIWorkFn * workFn, // IN
+ void *data) // IN
+{
+ struct vmci_dlyd_wrk_info *delayedWorkInfo;
+
+ ASSERT(workFn);
+
+ delayedWorkInfo = kmalloc(sizeof *delayedWorkInfo, GFP_ATOMIC);
+ if (!delayedWorkInfo)
+ return VMCI_ERROR_NO_MEM;
+
+ delayedWorkInfo->workFn = workFn;
+ delayedWorkInfo->data = data;
+
+ INIT_WORK(&delayedWorkInfo->work, VMCIDelayedWorkCB);
+
+ schedule_work(&delayedWorkInfo->work);
+
+ return VMCI_SUCCESS;
+}
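+
+/*
+ * Usage sketch (MyDelayedFn and its data are hypothetical; VMCIWorkFn
+ * is the void-returning callback taking the data pointer): defer work
+ * out of atomic context, e.g. from an interrupt-driven path:
+ *
+ *   static void MyDelayedFn(void *data)
+ *   {
+ *           struct my_state *s = data;
+ *           ...
+ *   }
+ *
+ *   VMCI_ScheduleDelayedWork(MyDelayedFn, state);
+ *
+ * The bookkeeping struct is allocated with GFP_ATOMIC, so scheduling
+ * is safe from contexts that cannot sleep; the callback itself runs
+ * in a worker thread and may sleep.
+ */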
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_WaitOnEvent --
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCI_WaitOnEvent(wait_queue_head_t * event, // IN:
+ VMCIEventReleaseCB releaseCB, // IN:
+ void *clientData) // IN:
+{
+ /*
+ * XXX Should this be a TASK_UNINTERRUPTIBLE wait? I'm leaving it
+ * as it was for now.
+ */
+ VMCI_WaitOnEventInterruptible(event, releaseCB, clientData);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_WaitOnEventInterruptible --
+ *
+ * Results:
+ * True if the wait was interrupted by a signal, false otherwise.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+bool VMCI_WaitOnEventInterruptible(wait_queue_head_t * event, // IN:
+ VMCIEventReleaseCB releaseCB, // IN:
+ void *clientData) // IN:
+{
+ DECLARE_WAITQUEUE(wait, current);
+
+ if (event == NULL || releaseCB == NULL)
+ return false;
+
+ add_wait_queue(event, &wait);
+ current->state = TASK_INTERRUPTIBLE;
+
+ /*
+ * Release the lock or other primitive that makes it possible for us to
+ * put the current thread on the wait queue without missing the signal.
+ * I.e., on Linux we need to put ourselves on the wait queue and set
+ * our state to TASK_INTERRUPTIBLE without another thread signalling us.
+ * The releaseCB is used to synchronize this.
+ */
+ releaseCB(clientData);
+
+ schedule();
+ current->state = TASK_RUNNING;
+ remove_wait_queue(event, &wait);
+
+ return signal_pending(current);
+}
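+
+/*
+ * releaseCB pattern (a sketch; the spinlock-backed callback is
+ * hypothetical): the caller holds the primitive guarding the wakeup
+ * condition, and the callback drops it only once this thread is on
+ * the wait queue, so no wakeup can be lost in between:
+ *
+ *   static void ReleaseLockCB(void *clientData)
+ *   {
+ *           spin_unlock((spinlock_t *)clientData);
+ *   }
+ *
+ *   spin_lock(&lock);
+ *   if (!condition)
+ *           interrupted = VMCI_WaitOnEventInterruptible(&wq,
+ *                                                       ReleaseLockCB,
+ *                                                       &lock);
+ *
+ * Note that, unlike VMCIHost_WaitForCallLocked(), the lock is not
+ * re-acquired on return.
+ */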
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_AllocQueue --
+ *
+ * Allocates kernel VA space of specified size, plus space for the
+ * queue structure/kernel interface and the queue header. Allocates
+ * physical pages for the queue data pages.
+ *
+ * PAGE m: struct vmci_queue_header (struct vmci_queue->qHeader)
+ * PAGE m+1: struct vmci_queue
+ * PAGE m+1+q: struct vmci_queue_kern_if (struct vmci_queue->kernelIf)
+ * PAGE n-size: Data pages (struct vmci_queue->kernelIf->page[])
+ *
+ * Results:
+ * Pointer to the queue on success, NULL otherwise.
+ *
+ * Side effects:
+ * Memory is allocated.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void *VMCI_AllocQueue(uint64_t size) // IN: size of queue (not including header)
+{
+ uint64_t i;
+ struct vmci_queue *queue;
+ struct vmci_queue_header *qHeader;
+ const uint64_t numDataPages = CEILING(size, PAGE_SIZE);
+ const uint queueSize =
+ PAGE_SIZE +
+ sizeof *queue + sizeof *(queue->kernelIf) +
+ numDataPages * sizeof *(queue->kernelIf->page);
+
+ /*
+ * Size should be enforced by VMCIQPair_Alloc(), double-check here.
+ * Allocating too much on Linux can cause the system to become
+ * unresponsive, because we allocate page-by-page, and we allow the
+ * system to wait for pages rather than fail.
+ */
+ if (size > VMCI_MAX_GUEST_QP_MEMORY) {
+ ASSERT(false);
+ return NULL;
+ }
+
+ qHeader = (struct vmci_queue_header *)vmalloc(queueSize);
+ if (!qHeader)
+ return NULL;
+
+ queue = (struct vmci_queue *)((uint8_t *) qHeader + PAGE_SIZE);
+ queue->qHeader = qHeader;
+ queue->savedHeader = NULL;
+ queue->kernelIf =
+ (struct vmci_queue_kern_if *)((uint8_t *) queue + sizeof *queue);
+ queue->kernelIf->headerPage = NULL; // Unused in guest.
+ queue->kernelIf->page =
+ (struct page **)((uint8_t *) queue->kernelIf +
+ sizeof *(queue->kernelIf));
+ queue->kernelIf->host = false;
+
+ for (i = 0; i < numDataPages; i++) {
+ queue->kernelIf->page[i] = alloc_pages(GFP_KERNEL, 0);
+ if (!queue->kernelIf->page[i]) {
+ while (i) {
+ __free_page(queue->kernelIf->page[--i]);
+ }
+ vfree(qHeader);
+ return NULL;
+ }
+ }
+
+ return (void *)queue;
+}
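+
+/*
+ * Pairing note (illustrative): a queue allocated here must be freed
+ * with VMCI_FreeQueue() using the same size, since the number of data
+ * pages is recomputed from it:
+ *
+ *   void *q = VMCI_AllocQueue(qSize);
+ *   if (q) {
+ *           ...
+ *           VMCI_FreeQueue(q, qSize);
+ *   }
+ */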
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_FreeQueue --
+ *
+ * Frees kernel VA space for a given queue and its queue header, and
+ * frees physical data pages.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * Memory is freed.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCI_FreeQueue(void *q, // IN:
+ uint64_t size) // IN: size of queue (not including header)
+{
+ struct vmci_queue *queue = q;
+
+ if (queue) {
+ uint64_t i;
+ for (i = 0; i < CEILING(size, PAGE_SIZE); i++) {
+ __free_page(queue->kernelIf->page[i]);
+ }
+ vfree(queue->qHeader);
+ }
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_AllocPPNSet --
+ *
+ * Allocates two lists of PPNs --- one for the pages in the produce queue,
+ * and the other for the pages in the consume queue. Initializes the lists
+ * of PPNs with the page frame numbers of the KVA for the two queues (and
+ * the queue headers).
+ *
+ * Results:
+ * Success or failure.
+ *
+ * Side effects:
+ * Memory may be allocated.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCI_AllocPPNSet(void *prodQ, // IN:
+ uint64_t numProducePages, // IN: for queue plus header
+ void *consQ, // IN:
+ uint64_t numConsumePages, // IN: for queue plus header
+ struct PPNSet *ppnSet) // OUT:
+{
+ uint32_t *producePPNs;
+ uint32_t *consumePPNs;
+ struct vmci_queue *produceQ = prodQ;
+ struct vmci_queue *consumeQ = consQ;
+ uint64_t i;
+
+ if (!produceQ || !numProducePages || !consumeQ ||
+ !numConsumePages || !ppnSet)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (ppnSet->initialized)
+ return VMCI_ERROR_ALREADY_EXISTS;
+
+ producePPNs =
+ kmalloc(numProducePages * sizeof *producePPNs, GFP_KERNEL);
+ if (!producePPNs)
+ return VMCI_ERROR_NO_MEM;
+
+ consumePPNs =
+ kmalloc(numConsumePages * sizeof *consumePPNs, GFP_KERNEL);
+ if (!consumePPNs) {
+ kfree(producePPNs);
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ producePPNs[0] = page_to_pfn(vmalloc_to_page(produceQ->qHeader));
+ for (i = 1; i < numProducePages; i++) {
+ unsigned long pfn;
+
+ producePPNs[i] = pfn =
+ page_to_pfn(produceQ->kernelIf->page[i - 1]);
+
+ /* Fail allocation if PFN isn't supported by hypervisor. */
+ if (sizeof pfn > sizeof *producePPNs && pfn != producePPNs[i])
+ goto ppnError;
+ }
+ consumePPNs[0] = page_to_pfn(vmalloc_to_page(consumeQ->qHeader));
+ for (i = 1; i < numConsumePages; i++) {
+ unsigned long pfn;
+
+ consumePPNs[i] = pfn =
+ page_to_pfn(consumeQ->kernelIf->page[i - 1]);
+
+ /* Fail allocation if PFN isn't supported by hypervisor. */
+ if (sizeof pfn > sizeof *consumePPNs && pfn != consumePPNs[i])
+ goto ppnError;
+ }
+
+ ppnSet->numProducePages = numProducePages;
+ ppnSet->numConsumePages = numConsumePages;
+ ppnSet->producePPNs = producePPNs;
+ ppnSet->consumePPNs = consumePPNs;
+ ppnSet->initialized = true;
+ return VMCI_SUCCESS;
+
+ ppnError:
+ kfree(producePPNs);
+ kfree(consumePPNs);
+ return VMCI_ERROR_INVALID_ARGS;
+}
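+
+/*
+ * On the PFN-overflow checks above (worked example): on 64-bit
+ * kernels a pfn is an 8-byte unsigned long while the PPN entries are
+ * 4-byte uint32_t, so a page at frame number 0x100000000 would be
+ * truncated to 0 when stored. The stored value then differs from the
+ * original pfn and the allocation fails with VMCI_ERROR_INVALID_ARGS
+ * instead of handing the hypervisor a bogus page number.
+ */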
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_FreePPNSet --
+ *
+ * Frees the two lists of PPNs for a queue pair.
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCI_FreePPNSet(struct PPNSet *ppnSet) // IN:
+{
+ ASSERT(ppnSet);
+ if (ppnSet->initialized) {
+ /* Do not call these functions on NULL inputs. */
+ ASSERT(ppnSet->producePPNs && ppnSet->consumePPNs);
+ kfree(ppnSet->producePPNs);
+ kfree(ppnSet->consumePPNs);
+ }
+ memset(ppnSet, 0, sizeof *ppnSet);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_PopulatePPNList --
+ *
+ * Populates the list of PPNs in the hypercall structure with the PPNS
+ * of the produce queue and the consume queue.
+ *
+ * Results:
+ * VMCI_SUCCESS.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCI_PopulatePPNList(uint8_t * callBuf, // OUT:
+ const struct PPNSet *ppnSet) // IN:
+{
+ ASSERT(callBuf && ppnSet && ppnSet->initialized);
+ memcpy(callBuf, ppnSet->producePPNs,
+ ppnSet->numProducePages * sizeof *ppnSet->producePPNs);
+ memcpy(callBuf +
+ ppnSet->numProducePages * sizeof *ppnSet->producePPNs,
+ ppnSet->consumePPNs,
+ ppnSet->numConsumePages * sizeof *ppnSet->consumePPNs);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * __VMCIMemcpyToQueue --
+ *
+ * Copies from a given buffer or iovector to a VMCI Queue. Uses
+ * kmap()/kunmap() to dynamically map/unmap required portions of the queue
+ * by traversing the offset -> page translation structure for the queue.
+ * Assumes that offset + size does not wrap around in the queue.
+ *
+ * Results:
+ * Zero on success, negative error code on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int __VMCIMemcpyToQueue(struct vmci_queue *queue, // OUT:
+ uint64_t queueOffset, // IN:
+ const void *src, // IN:
+ size_t size, // IN:
+ bool isIovec) // IN: if src is a struct iovec *
+{
+ struct vmci_queue_kern_if *kernelIf = queue->kernelIf;
+ size_t bytesCopied = 0;
+
+ while (bytesCopied < size) {
+ uint64_t pageIndex = (queueOffset + bytesCopied) / PAGE_SIZE;
+ size_t pageOffset =
+ (queueOffset + bytesCopied) & (PAGE_SIZE - 1);
+ void *va = kmap(kernelIf->page[pageIndex]);
+ size_t toCopy;
+
+ ASSERT(va);
+ if (size - bytesCopied > PAGE_SIZE - pageOffset) {
+ /* Enough payload to fill up the rest of this page. */
+ toCopy = PAGE_SIZE - pageOffset;
+ } else {
+ toCopy = size - bytesCopied;
+ }
+
+ if (isIovec) {
+ struct iovec *iov = (struct iovec *)src;
+ int err;
+
+ /* The iovec will track bytesCopied internally. */
+ err =
+ memcpy_fromiovec((uint8_t *) va + pageOffset,
+ iov, toCopy);
+ if (err != 0) {
+ kunmap(kernelIf->page[pageIndex]);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+ } else {
+ memcpy((uint8_t *) va + pageOffset,
+ (uint8_t *) src + bytesCopied, toCopy);
+ }
+
+ bytesCopied += toCopy;
+ kunmap(kernelIf->page[pageIndex]);
+ }
+
+ return VMCI_SUCCESS;
+}
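+
+/*
+ * Index arithmetic (worked example, assuming PAGE_SIZE is 4096): a
+ * queueOffset of 5000 yields pageIndex 1 and pageOffset 904; a
+ * 4000-byte copy then takes 3192 bytes from page 1 and the remaining
+ * 808 bytes from the start of page 2 on the next loop iteration.
+ */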
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * __VMCIMemcpyFromQueue --
+ *
+ * Copies to a given buffer or iovector from a VMCI Queue. Uses
+ * kmap()/kunmap() to dynamically map/unmap required portions of the queue
+ * by traversing the offset -> page translation structure for the queue.
+ * Assumes that offset + size does not wrap around in the queue.
+ *
+ * Results:
+ * Zero on success, negative error code on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int __VMCIMemcpyFromQueue(void *dest, // OUT:
+ const struct vmci_queue *queue, // IN:
+ uint64_t queueOffset, // IN:
+ size_t size, // IN:
+ bool isIovec) // IN: if dest is a struct iovec *
+{
+ struct vmci_queue_kern_if *kernelIf = queue->kernelIf;
+ size_t bytesCopied = 0;
+
+ while (bytesCopied < size) {
+ uint64_t pageIndex = (queueOffset + bytesCopied) / PAGE_SIZE;
+ size_t pageOffset =
+ (queueOffset + bytesCopied) & (PAGE_SIZE - 1);
+ void *va = kmap(kernelIf->page[pageIndex]);
+ size_t toCopy;
+
+ ASSERT(va);
+ if (size - bytesCopied > PAGE_SIZE - pageOffset) {
+ /* Enough payload to fill up this page. */
+ toCopy = PAGE_SIZE - pageOffset;
+ } else {
+ toCopy = size - bytesCopied;
+ }
+
+ if (isIovec) {
+ struct iovec *iov = (struct iovec *)dest;
+ int err;
+
+ /* The iovec will track bytesCopied internally. */
+ err =
+ memcpy_toiovec(iov,
+ (uint8_t *) va + pageOffset, toCopy);
+ if (err != 0) {
+ kunmap(kernelIf->page[pageIndex]);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+ } else {
+ memcpy((uint8_t *) dest + bytesCopied,
+ (uint8_t *) va + pageOffset, toCopy);
+ }
+
+ bytesCopied += toCopy;
+ kunmap(kernelIf->page[pageIndex]);
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIMemcpyToQueue --
+ *
+ * Copies from a given buffer to a VMCI Queue.
+ *
+ * Results:
+ * Zero on success, negative error code on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ * XXX: REMOVE
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIMemcpyToQueue(struct vmci_queue *queue, // OUT:
+ uint64_t queueOffset, // IN:
+ const void *src, // IN:
+ size_t srcOffset, // IN:
+ size_t size, // IN:
+ int bufType) // IN: Unused
+{
+ return __VMCIMemcpyToQueue(queue, queueOffset,
+ (uint8_t *) src + srcOffset, size, false);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIMemcpyFromQueue --
+ *
+ * Copies to a given buffer from a VMCI Queue.
+ *
+ * Results:
+ * Zero on success, negative error code on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ * XXX: REMOVE
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIMemcpyFromQueue(void *dest, // OUT:
+ size_t destOffset, // IN:
+ const struct vmci_queue *queue, // IN:
+ uint64_t queueOffset, // IN:
+ size_t size, // IN:
+ int bufType) // IN: Unused
+{
+ return __VMCIMemcpyFromQueue((uint8_t *) dest + destOffset,
+ queue, queueOffset, size, false);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIMemcpyToQueueLocal --
+ *
+ * Copies from a given buffer to a local VMCI queue. On Linux, this is the
+ * same as a regular copy.
+ *
+ * Results:
+ * Zero on success, negative error code on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ * XXX: REMOVE
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIMemcpyToQueueLocal(struct vmci_queue *queue, // OUT
+ uint64_t queueOffset, // IN
+ const void *src, // IN
+ size_t srcOffset, // IN
+ size_t size, // IN
+ int bufType) // IN
+{
+ return __VMCIMemcpyToQueue(queue, queueOffset,
+ (uint8_t *) src + srcOffset, size, false);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIMemcpyFromQueueLocal --
+ *
+ * Copies to a given buffer from a local VMCI queue. On Linux, this is
+ * the same as a regular copy.
+ *
+ * Results:
+ * Zero on success, negative error code on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ * XXX: REMOVE
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIMemcpyFromQueueLocal(void *dest, // OUT:
+ size_t destOffset, // IN:
+ const struct vmci_queue *queue, // IN:
+ uint64_t queueOffset, // IN:
+ size_t size, // IN:
+ int bufType) // IN: Unused
+{
+ return __VMCIMemcpyFromQueue((uint8_t *) dest + destOffset,
+ queue, queueOffset, size, false);
+}
+
+/*
+ *----------------------------------------------------------------------------
+ *
+ * VMCIMemcpyToQueueV --
+ *
+ * Copies from a given iovec to a VMCI Queue.
+ *
+ * Results:
+ * Zero on success, negative error code on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ * XXX: REMOVE
+ *----------------------------------------------------------------------------
+ */
+
+int VMCIMemcpyToQueueV(struct vmci_queue *queue, // OUT:
+ uint64_t queueOffset, // IN:
+ const void *src, // IN: iovec
+ size_t srcOffset, // IN: ignored
+ size_t size, // IN:
+ int bufType) // IN: ignored
+{
+ /*
+ * We ignore srcOffset because src is really a struct iovec * and will
+ * maintain offset internally.
+ */
+ return __VMCIMemcpyToQueue(queue, queueOffset, src, size, true);
+}
+
+/*
+ *----------------------------------------------------------------------------
+ *
+ * VMCIMemcpyFromQueueV --
+ *
+ * Copies to a given iovec from a VMCI Queue.
+ *
+ * Results:
+ * Zero on success, negative error code on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ * XXX: REMOVE
+ *----------------------------------------------------------------------------
+ */
+
+int VMCIMemcpyFromQueueV(void *dest, // OUT: iovec
+ size_t destOffset, // IN: ignored
+ const struct vmci_queue *queue, // IN:
+ uint64_t queueOffset, // IN:
+ size_t size, // IN:
+ int bufType) // IN: ignored
+{
+ /*
+ * We ignore destOffset because dest is really a struct iovec * and will
+ * maintain offset internally.
+ */
+ return __VMCIMemcpyFromQueue(dest, queue, queueOffset, size, true);
+}
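+
+/*
+ * Hypothetical caller sketch: dest is really a struct iovec array that
+ * memcpy_toiovec() advances internally, so a receive path could scatter
+ * queue data into two buffers along these lines (hdrBuf, bodyBuf and
+ * readOffset are illustrative names, not part of this patch):
+ *
+ *   struct iovec iov[2];
+ *
+ *   iov[0].iov_base = hdrBuf;  iov[0].iov_len = hdrLen;
+ *   iov[1].iov_base = bodyBuf; iov[1].iov_len = bodyLen;
+ *   // destOffset is ignored for the iovec variants
+ *   result = VMCIMemcpyFromQueueV(iov, 0, queue, readOffset,
+ *                                 hdrLen + bodyLen, 0);
+ */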
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIWellKnownID_AllowMap --
+ *
+ * Checks whether the calling context is allowed to register for the given
+ * well known service ID. Currently returns false if the service ID is
+ * within the reserved range and VMCI_PRIVILEGE_FLAG_TRUSTED is not
+ * provided as the input privilege flags. Otherwise returns true.
+ * XXX TODO access control based on host configuration information; this
+ * will be platform specific implementation.
+ *
+ * Results:
+ * Boolean value indicating access granted or denied.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+bool VMCIWellKnownID_AllowMap(uint32_t wellKnownID, // IN:
+ uint32_t privFlags) // IN:
+{
+ return wellKnownID >= VMCI_RESERVED_RESOURCE_ID_MAX ||
+ (privFlags & VMCI_PRIVILEGE_FLAG_TRUSTED);
+}
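+
+/*
+ * Illustrative truth table for the check above: mapping is refused only
+ * when the ID is reserved and the caller lacks the trusted privilege.
+ *
+ *   wellKnownID < RESERVED_MAX   trusted   result
+ *   no                           either    true
+ *   yes                          yes       true
+ *   yes                          no        false
+ */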
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHost_AllocQueue --
+ *
+ * Allocates kernel VA space of specified size plus space for the queue
+ * and kernel interface. This is different from the guest queue allocator,
+ * because we do not allocate our own queue header/data pages here but
+ * share those of the guest.
+ *
+ * Results:
+ * A pointer to an allocated and initialized vmci_queue structure,
+ * or NULL on failure.
+ *
+ * Side effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+struct vmci_queue *VMCIHost_AllocQueue(uint64_t size) // IN:
+{
+ struct vmci_queue *queue;
+ const size_t numPages = CEILING(size, PAGE_SIZE) + 1;
+ const size_t queueSize = sizeof *queue + sizeof *(queue->kernelIf);
+ const size_t queuePageSize = numPages * sizeof *queue->kernelIf->page;
+
+ queue = kmalloc(queueSize + queuePageSize, GFP_KERNEL);
+ if (queue) {
+ queue->qHeader = NULL;
+ queue->savedHeader = NULL;
+ queue->kernelIf =
+ (struct vmci_queue_kern_if *)((uint8_t *) queue +
+ sizeof *queue);
+ queue->kernelIf->host = true;
+ queue->kernelIf->mutex = NULL;
+ queue->kernelIf->numPages = numPages;
+ queue->kernelIf->headerPage =
+ (struct page **)((uint8_t *) queue + queueSize);
+ queue->kernelIf->page = &queue->kernelIf->headerPage[1];
+ memset(queue->kernelIf->headerPage, 0,
+ sizeof *queue->kernelIf->headerPage *
+ queue->kernelIf->numPages);
+ }
+
+ return queue;
+}
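+
+/*
+ * Layout sketch of the single allocation made above; one kmalloc holds
+ * the queue, its kernel interface and the page pointer array:
+ *
+ *   +----------------------------+ <- queue
+ *   | struct vmci_queue          |
+ *   +----------------------------+ <- queue->kernelIf
+ *   | struct vmci_queue_kern_if  |
+ *   +----------------------------+ <- kernelIf->headerPage
+ *   | struct page *[numPages]    |    (entry 0 is the header page;
+ *   +----------------------------+     entries 1 and up are the data
+ *                                      pages, reached via kernelIf->page)
+ */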
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHost_FreeQueue --
+ *
+ * Frees kernel memory for a given queue (header plus translation
+ * structure).
+ *
+ * Results:
+ * None.
+ *
+ * Side effects:
+ * Memory is freed.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIHost_FreeQueue(struct vmci_queue *queue, // IN:
+ uint64_t queueSize) // IN:
+{
+ kfree(queue);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_InitQueueMutex()
+ *
+ * Initialize the mutex for the pair of queues. This mutex is used to
+ * protect the qHeader and the buffer from changing out from under any
+ * users of either queue. Of course, it only helps if the mutex is
+ * actually acquired. The queue structures must reside in non-paged
+ * memory, or access to the mutex cannot be guaranteed.
+ *
+ * Results:
+ * None.
+ *
+ * Side Effects:
+ * None.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+void VMCI_InitQueueMutex(struct vmci_queue *produceQ, // IN/OUT
+ struct vmci_queue *consumeQ) // IN/OUT
+{
+ ASSERT(produceQ);
+ ASSERT(consumeQ);
+ ASSERT(produceQ->kernelIf);
+ ASSERT(consumeQ->kernelIf);
+
+ /*
+ * Only the host queue has shared state - the guest queues do not
+ * need to synchronize access using a queue mutex.
+ */
+
+ if (produceQ->kernelIf->host) {
+ produceQ->kernelIf->mutex = &produceQ->kernelIf->__mutex;
+ consumeQ->kernelIf->mutex = &produceQ->kernelIf->__mutex;
+ sema_init(produceQ->kernelIf->mutex, 1);
+ }
+}
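+
+/*
+ * Usage sketch (hypothetical host-side caller): the produce and consume
+ * queues share a single mutex, so access to either queue is bracketed
+ * by one acquire/release pair on either queue:
+ *
+ *   VMCI_AcquireQueueMutex(produceQ);
+ *   ... read or update queue headers and queue data ...
+ *   VMCI_ReleaseQueueMutex(produceQ);
+ */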
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_CleanupQueueMutex()
+ *
+ * Cleans up the mutex for the pair of queues.
+ *
+ * Results:
+ * None.
+ *
+ * Side Effects:
+ * None.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+void VMCI_CleanupQueueMutex(struct vmci_queue *produceQ, // IN/OUT
+ struct vmci_queue *consumeQ) // IN/OUT
+{
+ ASSERT(produceQ);
+ ASSERT(consumeQ);
+ ASSERT(produceQ->kernelIf);
+ ASSERT(consumeQ->kernelIf);
+
+ if (produceQ->kernelIf->host) {
+ produceQ->kernelIf->mutex = NULL;
+ consumeQ->kernelIf->mutex = NULL;
+ }
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_AcquireQueueMutex()
+ *
+ * Acquire the mutex for the queue. Note that the produceQ and
+ * the consumeQ share a mutex. So, only one of the two needs to
+ * be passed in to this routine. Either will work just fine.
+ *
+ * Results:
+ * None.
+ *
+ * Side Effects:
+ * May block the caller.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+void VMCI_AcquireQueueMutex(struct vmci_queue *queue) // IN
+{
+ ASSERT(queue);
+ ASSERT(queue->kernelIf);
+
+ if (queue->kernelIf->host) {
+ ASSERT(queue->kernelIf->mutex);
+ down(queue->kernelIf->mutex);
+ }
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCI_ReleaseQueueMutex()
+ *
+ * Release the mutex for the queue. Note that the produceQ and
+ * the consumeQ share a mutex. So, only one of the two needs to
+ * be passed in to this routine. Either will work just fine.
+ *
+ * Results:
+ * None.
+ *
+ * Side Effects:
+ * May wake up a caller blocked on the mutex.
+ *
+ *----------------------------------------------------------------------------
+ */
+
+void VMCI_ReleaseQueueMutex(struct vmci_queue *queue) // IN
+{
+ ASSERT(queue);
+ ASSERT(queue->kernelIf);
+
+ if (queue->kernelIf->host) {
+ ASSERT(queue->kernelIf->mutex);
+ up(queue->kernelIf->mutex);
+ }
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIReleasePages --
+ *
+ * Helper function to release an array of pages previously obtained
+ * using get_user_pages.
+ *
+ * Results:
+ * None.
+ *
+ * Side Effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+static void VMCIReleasePages(struct page **pages, // IN
+ uint64_t numPages, // IN
+ bool dirty) // IN
+{
+ uint64_t i;
+
+ for (i = 0; i < numPages; i++) {
+ ASSERT(pages[i]);
+
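+ /*
+ * Pages the host side may have written to must be marked dirty
+ * before the reference is dropped, so their contents are not
+ * lost once they become reclaimable again.
+ */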
+ if (dirty)
+ set_page_dirty(pages[i]);
+
+ page_cache_release(pages[i]);
+ pages[i] = NULL;
+ }
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHost_RegisterUserMemory --
+ *
+ * Registers the specification of the user pages used for backing a queue
+ * pair. Enough information to map in pages is stored in the OS specific
+ * part of the struct vmci_queue structure.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, negative error code on failure.
+ *
+ * Side Effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIHost_RegisterUserMemory(QueuePairPageStore *pageStore, // IN
+ struct vmci_queue *produceQ, // OUT
+ struct vmci_queue *consumeQ) // OUT
+{
+ uint64_t produceUVA;
+ uint64_t consumeUVA;
+
+ ASSERT(produceQ->kernelIf->headerPage
+ && consumeQ->kernelIf->headerPage);
+
+ /*
+ * The new style and the old style mapping only differ in that we either
+ * get a single or two UVAs, so we split the single UVA range at the
+ * appropriate spot.
+ */
+
+ produceUVA = pageStore->pages;
+ consumeUVA =
+ pageStore->pages + produceQ->kernelIf->numPages * PAGE_SIZE;
+ return VMCIHost_GetUserMemory(produceUVA, consumeUVA, produceQ,
+ consumeQ);
+}
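+
+/*
+ * Worked example of the split above: for a 64KB produce queue
+ * (numPages == 17, including the header page) and PAGE_SIZE == 4096,
+ * the consume queue's UVA starts 17 * 4096 == 69632 bytes into the
+ * single user address range supplied by the page store.
+ */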
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHost_UnregisterUserMemory --
+ *
+ * Releases and removes the references to the user pages backing
+ * the queue pair.
+ *
+ * Results:
+ * None
+ *
+ * Side Effects:
+ * Pages are released from the page cache and may become
+ * swappable again.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIHost_UnregisterUserMemory(struct vmci_queue *produceQ, // IN/OUT
+ struct vmci_queue *consumeQ) // IN/OUT
+{
+ ASSERT(produceQ->kernelIf);
+ ASSERT(consumeQ->kernelIf);
+ ASSERT(!produceQ->qHeader && !consumeQ->qHeader);
+
+ VMCIReleasePages(produceQ->kernelIf->headerPage,
+ produceQ->kernelIf->numPages, true);
+ memset(produceQ->kernelIf->headerPage, 0,
+ sizeof *produceQ->kernelIf->headerPage *
+ produceQ->kernelIf->numPages);
+ VMCIReleasePages(consumeQ->kernelIf->headerPage,
+ consumeQ->kernelIf->numPages, true);
+ memset(consumeQ->kernelIf->headerPage, 0,
+ sizeof *consumeQ->kernelIf->headerPage *
+ consumeQ->kernelIf->numPages);
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHost_MapQueueHeaders --
+ *
+ * Once VMCIHost_RegisterUserMemory has been performed on a
+ * queue, the queue pair headers can be mapped into the
+ * kernel. Once mapped, they must be unmapped with
+ * VMCIHost_UnmapQueueHeaders prior to calling
+ * VMCIHost_UnregisterUserMemory.
+ *
+ * Results:
+ * VMCI_SUCCESS if pages are mapped, appropriate error code otherwise.
+ *
+ * Side Effects:
+ * Pages are mapped into the kernel address space.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIHost_MapQueueHeaders(struct vmci_queue *produceQ, // IN/OUT
+ struct vmci_queue *consumeQ) // IN/OUT
+{
+ int result;
+
+ if (!produceQ->qHeader || !consumeQ->qHeader) {
+ struct page *headers[2];
+
+ if (produceQ->qHeader != consumeQ->qHeader)
+ return VMCI_ERROR_QUEUEPAIR_MISMATCH;
+
+ if (produceQ->kernelIf->headerPage == NULL ||
+ *produceQ->kernelIf->headerPage == NULL)
+ return VMCI_ERROR_UNAVAILABLE;
+
+ ASSERT(*produceQ->kernelIf->headerPage
+ && *consumeQ->kernelIf->headerPage);
+
+ headers[0] = *produceQ->kernelIf->headerPage;
+ headers[1] = *consumeQ->kernelIf->headerPage;
+
+ produceQ->qHeader = vmap(headers, 2, VM_MAP, PAGE_KERNEL);
+ if (produceQ->qHeader != NULL) {
+ consumeQ->qHeader =
+ (struct vmci_queue_header *)((uint8_t *)
+ produceQ->qHeader +
+ PAGE_SIZE);
+ result = VMCI_SUCCESS;
+ } else {
+ Log("vmap failed\n");
+ result = VMCI_ERROR_NO_MEM;
+ }
+ } else {
+ result = VMCI_SUCCESS;
+ }
+
+ return result;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHost_UnmapQueueHeaders --
+ *
+ * Unmaps previously mapped queue pair headers from the kernel.
+ *
+ * Results:
+ * VMCI_SUCCESS always.
+ *
+ * Side Effects:
+ * Pages are unmapped from the kernel address space.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIHost_UnmapQueueHeaders(uint32_t gid, // IN: Unused
+ struct vmci_queue *produceQ, // IN/OUT
+ struct vmci_queue *consumeQ) // IN/OUT
+{
+ if (produceQ->qHeader) {
+ ASSERT(consumeQ->qHeader);
+
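+ /*
+ * The lower of the two header addresses is the base returned
+ * by vmap() in VMCIHost_MapQueueHeaders, so that is the one
+ * to hand back to vunmap().
+ */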
+ if (produceQ->qHeader < consumeQ->qHeader)
+ vunmap(produceQ->qHeader);
+ else
+ vunmap(consumeQ->qHeader);
+ produceQ->qHeader = NULL;
+ consumeQ->qHeader = NULL;
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHost_GetUserMemory --
+ *
+ * Lock the user pages backing the produce and consume queues into
+ * memory and populate the headerPage arrays of their kernel
+ * interface structures with them.
+ *
+ * Results:
+ * VMCI_SUCCESS on success, negative error code on failure.
+ *
+ * Side Effects:
+ * None.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+int VMCIHost_GetUserMemory(uint64_t produceUVA, // IN
+ uint64_t consumeUVA, // IN
+ struct vmci_queue *produceQ, // OUT
+ struct vmci_queue *consumeQ) // OUT
+{
+ int retval;
+ int err = VMCI_SUCCESS;
+
+ down_write(&current->mm->mmap_sem);
+ retval = get_user_pages(current,
+ current->mm,
+ (uintptr_t) produceUVA,
+ produceQ->kernelIf->numPages,
+ 1, 0, produceQ->kernelIf->headerPage, NULL);
+ if (retval < (int)produceQ->kernelIf->numPages) {
+ Log("get_user_pages(produce) failed (retval=%d)\n", retval);
+ if (retval > 0)
+ VMCIReleasePages(produceQ->kernelIf->headerPage,
+ retval, false);
+ err = VMCI_ERROR_NO_MEM;
+ goto out;
+ }
+
+ retval = get_user_pages(current,
+ current->mm,
+ (uintptr_t) consumeUVA,
+ consumeQ->kernelIf->numPages,
+ 1, 0, consumeQ->kernelIf->headerPage, NULL);
+ if (retval < (int)consumeQ->kernelIf->numPages) {
+ Log("get_user_pages(consume) failed (retval=%d)\n", retval);
+ if (retval > 0)
+ VMCIReleasePages(consumeQ->kernelIf->headerPage,
+ retval, false);
+ VMCIReleasePages(produceQ->kernelIf->headerPage,
+ produceQ->kernelIf->numPages, false);
+ err = VMCI_ERROR_NO_MEM;
+ }
+
+ out:
+ up_write(&current->mm->mmap_sem);
+
+ return err;
+}
+
+/*
+ *-----------------------------------------------------------------------------
+ *
+ * VMCIHost_ReleaseUserMemory --
+ *
+ * Releases the references to the user pages backing the queue
+ * pair.
+ *
+ * Results:
+ * None
+ *
+ * Side Effects:
+ * Pages are released from the page cache and may become
+ * swappable again.
+ *
+ *-----------------------------------------------------------------------------
+ */
+
+void VMCIHost_ReleaseUserMemory(struct vmci_queue *produceQ, // IN/OUT
+ struct vmci_queue *consumeQ) // IN/OUT
+{
+ ASSERT(produceQ->kernelIf->headerPage);
+
+ VMCIHost_UnregisterUserMemory(produceQ, consumeQ);
+}
--
1.7.0.4
* [PATCH 14/14] Add Kconfig and Makefiles for VMCI
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (12 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 13/14] Add main driver and kernel interface file Andrew Stiegmann (stieg)
@ 2012-02-15 1:05 ` Andrew Stiegmann (stieg)
2012-02-17 19:28 ` [PATCH 00/14] RFC: VMCI for Linux Pavel Machek
14 siblings, 0 replies; 16+ messages in thread
From: Andrew Stiegmann (stieg) @ 2012-02-15 1:05 UTC (permalink / raw)
To: linux-kernel; +Cc: vm-crosstalk, dtor, cschamp, Andrew Stiegmann (stieg)
---
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/vmw_vmci/Kconfig | 16 ++++++++++++++++
drivers/misc/vmw_vmci/Makefile | 36 ++++++++++++++++++++++++++++++++++++
4 files changed, 54 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/Kconfig
create mode 100644 drivers/misc/vmw_vmci/Makefile
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 5664696..b761cc5 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -508,5 +508,6 @@ source "drivers/misc/ti-st/Kconfig"
source "drivers/misc/lis3lv02d/Kconfig"
source "drivers/misc/carma/Kconfig"
source "drivers/misc/altera-stapl/Kconfig"
+source "drivers/misc/vmw_vmci/Kconfig"
endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index b26495a..5aba5b5 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -48,3 +48,4 @@ obj-y += lis3lv02d/
obj-y += carma/
obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o
obj-$(CONFIG_ALTERA_STAPL) +=altera-stapl/
+obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/
diff --git a/drivers/misc/vmw_vmci/Kconfig b/drivers/misc/vmw_vmci/Kconfig
new file mode 100644
index 0000000..55015e7
--- /dev/null
+++ b/drivers/misc/vmw_vmci/Kconfig
@@ -0,0 +1,16 @@
+#
+# VMware VMCI device
+#
+
+config VMWARE_VMCI
+ tristate "VMware VMCI Driver"
+ depends on X86
+ help
+ This is VMware's Virtual Machine Communication Interface. It enables
+ high-speed communication between host and guest in a virtual
+ environment via the VMCI virtual device.
+
+ To compile this driver as a module, choose M here: the
+ module will be called vmw_vmci.
+
+ If unsure, say N.
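+
+# Note: with CONFIG_VMWARE_VMCI=m the build produces vmw_vmci.ko, which
+# can then be loaded with "modprobe vmw_vmci".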
diff --git a/drivers/misc/vmw_vmci/Makefile b/drivers/misc/vmw_vmci/Makefile
new file mode 100644
index 0000000..899e565
--- /dev/null
+++ b/drivers/misc/vmw_vmci/Makefile
@@ -0,0 +1,36 @@
+################################################################################
+#
+# Linux driver for VMware's VMCI device.
+#
+# Copyright (C) 2007-2012, VMware, Inc. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the
+# Free Software Foundation; version 2 of the License and no later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+# NON INFRINGEMENT. See the GNU General Public License for more
+# details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+# The full GNU General Public License is included in this distribution in
+# the file called "COPYING".
+#
+# Maintained by: Andrew Stiegmann <pv-drivers@vmware.com>
+#
+################################################################################
+
+#
+# Makefile for the VMware VMCI
+#
+
+obj-$(CONFIG_VMWARE_VMCI) += vmci.o
+
+vmci-objs := driver.o vmciContext.o vmciDatagram.o vmciDriver.o vmciDoorbell.o
+vmci-objs += vmciEvent.o vmciHashtable.o vmciKernelIf.o vmciQPair.o
+vmci-objs += vmciQueuePair.o vmciResource.o vmciRoute.o
--
1.7.0.4
* Re: [PATCH 00/14] RFC: VMCI for Linux
2012-02-15 1:05 [PATCH 00/14] RFC: VMCI for Linux Andrew Stiegmann (stieg)
` (13 preceding siblings ...)
2012-02-15 1:05 ` [PATCH 14/14] Add Kconfig and Makefiles for VMCI Andrew Stiegmann (stieg)
@ 2012-02-17 19:28 ` Pavel Machek
14 siblings, 0 replies; 16+ messages in thread
From: Pavel Machek @ 2012-02-17 19:28 UTC (permalink / raw)
To: Andrew Stiegmann (stieg); +Cc: linux-kernel, vm-crosstalk, dtor, cschamp
> drivers/misc/vmw_vmci/Kconfig | 16 +
> drivers/misc/vmw_vmci/Makefile | 36 +
> drivers/misc/vmw_vmci/driver.c | 2352 +++++++++++++++++++++++
> drivers/misc/vmw_vmci/vmciCommonInt.h | 105 ++
> drivers/misc/vmw_vmci/vmciContext.c | 1763
CamelCase in both file names and file contents... please don't.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html