Linux-HyperV List

Linux-HyperV List
 help / color / mirror / Atom feed

* RE: [EXTERNAL] Re: [PATCH 1/2] x86/Kconfig: Allow CONFIG_X86_MPPARSE disable for OF platforms
From: Saurabh Singh Sengar @ 2023-06-02 12:22 UTC (permalink / raw)
  To: Dave Hansen, Saurabh Sengar, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	Dexuan Cui, Michael Kelley (LINUX), linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org
In-Reply-To: <13922610-b9bc-ab4b-8482-9aae238396c7@intel.com>



> -----Original Message-----
> From: Dave Hansen <dave.hansen@intel.com>
> Sent: Thursday, May 25, 2023 7:26 PM
> To: Saurabh Singh Sengar <ssengar@microsoft.com>; Saurabh Sengar
> <ssengar@linux.microsoft.com>; tglx@linutronix.de; mingo@redhat.com;
> bp@alien8.de; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; Michael Kelley (LINUX) <mikelley@microsoft.com>;
> linux-kernel@vger.kernel.org; linux-hyperv@vger.kernel.org
> Subject: Re: [EXTERNAL] Re: [PATCH 1/2] x86/Kconfig: Allow
> CONFIG_X86_MPPARSE disable for OF platforms
> 
> On 5/25/23 00:06, Saurabh Singh Sengar wrote:
> > When CONFIG_X86_MPPARSE is enabled, the kernel will scan low memory,
> > looking for MP tables. In Hyper-V VBS setup, lower memory is assigned to
> VTL0.
> > This lower memory may contain the actual MPPARSE table for VTL0, which
> > can confuse the VTLx kernel and cause issues. (x > 0)
> 
> This still just seems wrong.
> 
> If an action crashes the kernel because of changes, we don't just compile it
> out and move on.  We add enumeration for the new feature that's causing it
> and turn off the action at runtime.
> 

Thanks, I greatly appreciate your suggestion and wanted to let you know that
I am actively working on upstreaming the new fix for the VTL driver to bypass
the need for the MP table scan.

Furthermore, I would like to learn about the rationale behind disallowing the
disablement of CONFIG_X86_MPPARSE when MP tables are not in use. Considering
that we compile out the features we don't support, wouldn't it be acceptable to
allow users to customize their configurations in this manner? Allowing the
disablement of CONFIG_X86_MPPARSE would provide greater flexibility and
efficiency for those who do not utilize MP tables.


^ permalink raw reply

* Re: [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions
From: Peter Zijlstra @ 2023-06-02 11:54 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: bigeasy, mark.rutland, maz, catalin.marinas, will, chenhuacai,
	kernel, hca, gor, agordeev, borntraeger, svens, pbonzini,
	wanpengli, vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa,
	jgross, boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu,
	decui, rafael, longman, boqun.feng, pmladek, senozhatsky, rostedt,
	john.ogness, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, bristot, jstultz, sboyd, linux-kernel,
	loongarch, linux-s390, kvm, linux-hyperv, linux-pm
In-Reply-To: <xhsmho7m9ptrk.mognet@vschneid.remote.csb>

On Wed, May 24, 2023 at 05:40:47PM +0100, Valentin Schneider wrote:
> On 19/05/23 12:21, Peter Zijlstra wrote:
> > With the intent to provide local_clock_noinstr(), a variant of
> > local_clock() that's safe to be called from noinstr code (with the
> > assumption that any such code will already be non-preemptible),
> > prepare for things by providing a noinstr sched_clock_read() function.
> >
> > Specifically, preempt_enable_*() calls out to schedule(), which upsets
> > noinstr validation efforts.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/arm64/include/asm/arch_timer.h  |    8 ----
> >  drivers/clocksource/arm_arch_timer.c |   60 ++++++++++++++++++++++++++---------
> >  2 files changed, 47 insertions(+), 21 deletions(-)
> >
> > --- a/arch/arm64/include/asm/arch_timer.h
> > +++ b/arch/arm64/include/asm/arch_timer.h
> > @@ -88,13 +88,7 @@ static inline notrace u64 arch_timer_rea
> >
> >  #define arch_timer_reg_read_stable(reg)					\
> >       ({								\
> > -		u64 _val;						\
> > -									\
> > -		preempt_disable_notrace();				\
> > -		_val = erratum_handler(read_ ## reg)();			\
> > -		preempt_enable_notrace();				\
> > -									\
> > -		_val;							\
> > +		erratum_handler(read_ ## reg)();			\
> >       })
> >
> >  /*
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -191,22 +191,40 @@ u32 arch_timer_reg_read(int access, enum
> >       return val;
> >  }
> >
> > -static notrace u64 arch_counter_get_cntpct_stable(void)
> > +static noinstr u64 _arch_counter_get_cntpct_stable(void)
> >  {
> >       return __arch_counter_get_cntpct_stable();
> >  }
> >
> 
> I tripped over the underscores when reviewing this :( This must be
> getting close to the symbol length limit, but could we give this a helpful
> prefix or suffix? raw_* or *_noinstr?
> 
> > @@ -1074,6 +1092,13 @@ struct arch_timer_kvm_info *arch_timer_g
> >
> >  static void __init arch_counter_register(unsigned type)
> >  {
> > +	/*
> > +	 * Default to cp15 based access because arm64 uses this function for
> > +	 * sched_clock() before DT is probed and the cp15 method is guaranteed
> > +	 * to exist on arm64. arm doesn't use this before DT is probed so even
> > +	 * if we don't have the cp15 accessors we won't have a problem.
> > +	 */
> > +	u64 (*scr)(void) = arch_counter_get_cntvct;
> 
> So this bit sent me on a little spelunking session :-)
> 
> From a control flow perspective the initialization isn't required, but then
> I looked into the comment and found it comes from the
> arch_timer_read_counter() definition... Which itself doesn't get used by
> sched_clock() until the sched_clock_register() below!
> 
> So AFAICT that comment was true as of
> 
>   220069945b29 ("clocksource: arch_timer: Add support for memory mapped timers")
> 
> but not after a commit that came 2 months later:
> 
>   65cd4f6c99c1 ("arch_timer: Move to generic sched_clock framework")
> 
> which IIUC made arm/arm64 follow the default approach of using the
> jiffy-based sched_clock() before probing DT/ACPI and registering a "proper"
> sched_clock.
> 
> All of that to say: the comment about arch_timer_read_counter() vs early
> sched_clock() doesn't apply anymore, but I think we need to keep its
> initalization around for stuff like get_cycles(). This initialization here
> should be OK to put to the bin, though.

Something like the below folded in then?

---
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -191,7 +191,7 @@ u32 arch_timer_reg_read(int access, enum
 	return val;
 }
 
-static noinstr u64 _arch_counter_get_cntpct_stable(void)
+static noinstr u64 raw_counter_get_cntpct_stable(void)
 {
 	return __arch_counter_get_cntpct_stable();
 }
@@ -210,7 +210,7 @@ static noinstr u64 arch_counter_get_cntp
 	return __arch_counter_get_cntpct();
 }
 
-static noinstr u64 _arch_counter_get_cntvct_stable(void)
+static noinstr u64 raw_counter_get_cntvct_stable(void)
 {
 	return __arch_counter_get_cntvct_stable();
 }
@@ -1092,13 +1092,7 @@ struct arch_timer_kvm_info *arch_timer_g
 
 static void __init arch_counter_register(unsigned type)
 {
-	/*
-	 * Default to cp15 based access because arm64 uses this function for
-	 * sched_clock() before DT is probed and the cp15 method is guaranteed
-	 * to exist on arm64. arm doesn't use this before DT is probed so even
-	 * if we don't have the cp15 accessors we won't have a problem.
-	 */
-	u64 (*scr)(void) = arch_counter_get_cntvct;
+	u64 (*scr)(void);
 	u64 start_count;
 	int width;
 
@@ -1110,7 +1104,7 @@ static void __init arch_counter_register
 		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
 			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntvct_stable;
-				scr = _arch_counter_get_cntvct_stable;
+				scr = raw_counter_get_cntvct_stable;
 			} else {
 				rd = arch_counter_get_cntvct;
 				scr = arch_counter_get_cntvct;
@@ -1118,7 +1112,7 @@ static void __init arch_counter_register
 		} else {
 			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntpct_stable;
-				scr = _arch_counter_get_cntpct_stable;
+				scr = raw_counter_get_cntpct_stable;
 			} else {
 				rd = arch_counter_get_cntpct;
 				scr = arch_counter_get_cntpct;

^ permalink raw reply

* [PATCH 5/5] Drivers: hv: Remove fcopy driver
From: Saurabh Sengar @ 2023-06-02  7:57 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, linux-kernel,
	linux-hyperv
In-Reply-To: <1685692629-31351-1-git-send-email-ssengar@linux.microsoft.com>

As the new fcopy driver using uio is introduced, remove obsolete driver.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
 drivers/hv/Makefile   |   2 +-
 drivers/hv/hv_fcopy.c | 427 ------------------------------------------
 drivers/hv/hv_util.c  |  12 --
 3 files changed, 1 insertion(+), 440 deletions(-)
 delete mode 100644 drivers/hv/hv_fcopy.c

diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
index d76df5c8c2a9..b992c0ed182b 100644
--- a/drivers/hv/Makefile
+++ b/drivers/hv/Makefile
@@ -10,7 +10,7 @@ hv_vmbus-y := vmbus_drv.o \
 		 hv.o connection.o channel.o \
 		 channel_mgmt.o ring_buffer.o hv_trace.o
 hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
-hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_fcopy.o hv_utils_transport.o
+hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_utils_transport.o
 
 # Code that must be built-in
 obj-$(subst m,y,$(CONFIG_HYPERV)) += hv_common.o
diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
deleted file mode 100644
index 922d83eb7ddf..000000000000
--- a/drivers/hv/hv_fcopy.c
+++ /dev/null
@@ -1,427 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * An implementation of file copy service.
- *
- * Copyright (C) 2014, Microsoft, Inc.
- *
- * Author : K. Y. Srinivasan <ksrinivasan@novell.com>
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/nls.h>
-#include <linux/workqueue.h>
-#include <linux/hyperv.h>
-#include <linux/sched.h>
-#include <asm/hyperv-tlfs.h>
-
-#include "hyperv_vmbus.h"
-#include "hv_utils_transport.h"
-
-#define WIN8_SRV_MAJOR		1
-#define WIN8_SRV_MINOR		1
-#define WIN8_SRV_VERSION	(WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR)
-
-#define FCOPY_VER_COUNT 1
-static const int fcopy_versions[] = {
-	WIN8_SRV_VERSION
-};
-
-#define FW_VER_COUNT 1
-static const int fw_versions[] = {
-	UTIL_FW_VERSION
-};
-
-/*
- * Global state maintained for transaction that is being processed.
- * For a class of integration services, including the "file copy service",
- * the specified protocol is a "request/response" protocol which means that
- * there can only be single outstanding transaction from the host at any
- * given point in time. We use this to simplify memory management in this
- * driver - we cache and process only one message at a time.
- *
- * While the request/response protocol is guaranteed by the host, we further
- * ensure this by serializing packet processing in this driver - we do not
- * read additional packets from the VMBUs until the current packet is fully
- * handled.
- */
-
-static struct {
-	int state;   /* hvutil_device_state */
-	int recv_len; /* number of bytes received. */
-	struct hv_fcopy_hdr  *fcopy_msg; /* current message */
-	struct vmbus_channel *recv_channel; /* chn we got the request */
-	u64 recv_req_id; /* request ID. */
-} fcopy_transaction;
-
-static void fcopy_respond_to_host(int error);
-static void fcopy_send_data(struct work_struct *dummy);
-static void fcopy_timeout_func(struct work_struct *dummy);
-static DECLARE_DELAYED_WORK(fcopy_timeout_work, fcopy_timeout_func);
-static DECLARE_WORK(fcopy_send_work, fcopy_send_data);
-static const char fcopy_devname[] = "vmbus/hv_fcopy";
-static u8 *recv_buffer;
-static struct hvutil_transport *hvt;
-/*
- * This state maintains the version number registered by the daemon.
- */
-static int dm_reg_value;
-
-static void fcopy_poll_wrapper(void *channel)
-{
-	/* Transaction is finished, reset the state here to avoid races. */
-	fcopy_transaction.state = HVUTIL_READY;
-	tasklet_schedule(&((struct vmbus_channel *)channel)->callback_event);
-}
-
-static void fcopy_timeout_func(struct work_struct *dummy)
-{
-	/*
-	 * If the timer fires, the user-mode component has not responded;
-	 * process the pending transaction.
-	 */
-	fcopy_respond_to_host(HV_E_FAIL);
-	hv_poll_channel(fcopy_transaction.recv_channel, fcopy_poll_wrapper);
-}
-
-static void fcopy_register_done(void)
-{
-	pr_debug("FCP: userspace daemon registered\n");
-	hv_poll_channel(fcopy_transaction.recv_channel, fcopy_poll_wrapper);
-}
-
-static int fcopy_handle_handshake(u32 version)
-{
-	u32 our_ver = FCOPY_CURRENT_VERSION;
-
-	switch (version) {
-	case FCOPY_VERSION_0:
-		/* Daemon doesn't expect us to reply */
-		dm_reg_value = version;
-		break;
-	case FCOPY_VERSION_1:
-		/* Daemon expects us to reply with our own version */
-		if (hvutil_transport_send(hvt, &our_ver, sizeof(our_ver),
-		    fcopy_register_done))
-			return -EFAULT;
-		dm_reg_value = version;
-		break;
-	default:
-		/*
-		 * For now we will fail the registration.
-		 * If and when we have multiple versions to
-		 * deal with, we will be backward compatible.
-		 * We will add this code when needed.
-		 */
-		return -EINVAL;
-	}
-	pr_debug("FCP: userspace daemon ver. %d connected\n", version);
-	return 0;
-}
-
-static void fcopy_send_data(struct work_struct *dummy)
-{
-	struct hv_start_fcopy *smsg_out = NULL;
-	int operation = fcopy_transaction.fcopy_msg->operation;
-	struct hv_start_fcopy *smsg_in;
-	void *out_src;
-	int rc, out_len;
-
-	/*
-	 * The  strings sent from the host are encoded in
-	 * utf16; convert it to utf8 strings.
-	 * The host assures us that the utf16 strings will not exceed
-	 * the max lengths specified. We will however, reserve room
-	 * for the string terminating character - in the utf16s_utf8s()
-	 * function we limit the size of the buffer where the converted
-	 * string is placed to W_MAX_PATH -1 to guarantee
-	 * that the strings can be properly terminated!
-	 */
-
-	switch (operation) {
-	case START_FILE_COPY:
-		out_len = sizeof(struct hv_start_fcopy);
-		smsg_out = kzalloc(sizeof(*smsg_out), GFP_KERNEL);
-		if (!smsg_out)
-			return;
-
-		smsg_out->hdr.operation = operation;
-		smsg_in = (struct hv_start_fcopy *)fcopy_transaction.fcopy_msg;
-
-		utf16s_to_utf8s((wchar_t *)smsg_in->file_name, W_MAX_PATH,
-				UTF16_LITTLE_ENDIAN,
-				(__u8 *)&smsg_out->file_name, W_MAX_PATH - 1);
-
-		utf16s_to_utf8s((wchar_t *)smsg_in->path_name, W_MAX_PATH,
-				UTF16_LITTLE_ENDIAN,
-				(__u8 *)&smsg_out->path_name, W_MAX_PATH - 1);
-
-		smsg_out->copy_flags = smsg_in->copy_flags;
-		smsg_out->file_size = smsg_in->file_size;
-		out_src = smsg_out;
-		break;
-
-	case WRITE_TO_FILE:
-		out_src = fcopy_transaction.fcopy_msg;
-		out_len = sizeof(struct hv_do_fcopy);
-		break;
-	default:
-		out_src = fcopy_transaction.fcopy_msg;
-		out_len = fcopy_transaction.recv_len;
-		break;
-	}
-
-	fcopy_transaction.state = HVUTIL_USERSPACE_REQ;
-	rc = hvutil_transport_send(hvt, out_src, out_len, NULL);
-	if (rc) {
-		pr_debug("FCP: failed to communicate to the daemon: %d\n", rc);
-		if (cancel_delayed_work_sync(&fcopy_timeout_work)) {
-			fcopy_respond_to_host(HV_E_FAIL);
-			fcopy_transaction.state = HVUTIL_READY;
-		}
-	}
-	kfree(smsg_out);
-}
-
-/*
- * Send a response back to the host.
- */
-
-static void
-fcopy_respond_to_host(int error)
-{
-	struct icmsg_hdr *icmsghdr;
-	u32 buf_len;
-	struct vmbus_channel *channel;
-	u64 req_id;
-
-	/*
-	 * Copy the global state for completing the transaction. Note that
-	 * only one transaction can be active at a time. This is guaranteed
-	 * by the file copy protocol implemented by the host. Furthermore,
-	 * the "transaction active" state we maintain ensures that there can
-	 * only be one active transaction at a time.
-	 */
-
-	buf_len = fcopy_transaction.recv_len;
-	channel = fcopy_transaction.recv_channel;
-	req_id = fcopy_transaction.recv_req_id;
-
-	icmsghdr = (struct icmsg_hdr *)
-			&recv_buffer[sizeof(struct vmbuspipe_hdr)];
-
-	if (channel->onchannel_callback == NULL)
-		/*
-		 * We have raced with util driver being unloaded;
-		 * silently return.
-		 */
-		return;
-
-	icmsghdr->status = error;
-	icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
-	vmbus_sendpacket(channel, recv_buffer, buf_len, req_id,
-				VM_PKT_DATA_INBAND, 0);
-}
-
-void hv_fcopy_onchannelcallback(void *context)
-{
-	struct vmbus_channel *channel = context;
-	u32 recvlen;
-	u64 requestid;
-	struct hv_fcopy_hdr *fcopy_msg;
-	struct icmsg_hdr *icmsghdr;
-	int fcopy_srv_version;
-
-	if (fcopy_transaction.state > HVUTIL_READY)
-		return;
-
-	if (vmbus_recvpacket(channel, recv_buffer, HV_HYP_PAGE_SIZE * 2, &recvlen, &requestid)) {
-		pr_err_ratelimited("Fcopy request received. Could not read into recv buf\n");
-		return;
-	}
-
-	if (!recvlen)
-		return;
-
-	/* Ensure recvlen is big enough to read header data */
-	if (recvlen < ICMSG_HDR) {
-		pr_err_ratelimited("Fcopy request received. Packet length too small: %d\n",
-				   recvlen);
-		return;
-	}
-
-	icmsghdr = (struct icmsg_hdr *)&recv_buffer[
-			sizeof(struct vmbuspipe_hdr)];
-
-	if (icmsghdr->icmsgtype == ICMSGTYPE_NEGOTIATE) {
-		if (vmbus_prep_negotiate_resp(icmsghdr,
-				recv_buffer, recvlen,
-				fw_versions, FW_VER_COUNT,
-				fcopy_versions, FCOPY_VER_COUNT,
-				NULL, &fcopy_srv_version)) {
-
-			pr_info("FCopy IC version %d.%d\n",
-				fcopy_srv_version >> 16,
-				fcopy_srv_version & 0xFFFF);
-		}
-	} else if (icmsghdr->icmsgtype == ICMSGTYPE_FCOPY) {
-		/* Ensure recvlen is big enough to contain hv_fcopy_hdr */
-		if (recvlen < ICMSG_HDR + sizeof(struct hv_fcopy_hdr)) {
-			pr_err_ratelimited("Invalid Fcopy hdr. Packet length too small: %u\n",
-					   recvlen);
-			return;
-		}
-		fcopy_msg = (struct hv_fcopy_hdr *)&recv_buffer[ICMSG_HDR];
-
-		/*
-		 * Stash away this global state for completing the
-		 * transaction; note transactions are serialized.
-		 */
-
-		fcopy_transaction.recv_len = recvlen;
-		fcopy_transaction.recv_req_id = requestid;
-		fcopy_transaction.fcopy_msg = fcopy_msg;
-
-		if (fcopy_transaction.state < HVUTIL_READY) {
-			/* Userspace is not registered yet */
-			fcopy_respond_to_host(HV_E_FAIL);
-			return;
-		}
-		fcopy_transaction.state = HVUTIL_HOSTMSG_RECEIVED;
-
-		/*
-		 * Send the information to the user-level daemon.
-		 */
-		schedule_work(&fcopy_send_work);
-		schedule_delayed_work(&fcopy_timeout_work,
-				      HV_UTIL_TIMEOUT * HZ);
-		return;
-	} else {
-		pr_err_ratelimited("Fcopy request received. Invalid msg type: %d\n",
-				   icmsghdr->icmsgtype);
-		return;
-	}
-	icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
-	vmbus_sendpacket(channel, recv_buffer, recvlen, requestid,
-			VM_PKT_DATA_INBAND, 0);
-}
-
-/* Callback when data is received from userspace */
-static int fcopy_on_msg(void *msg, int len)
-{
-	int *val = (int *)msg;
-
-	if (len != sizeof(int))
-		return -EINVAL;
-
-	if (fcopy_transaction.state == HVUTIL_DEVICE_INIT)
-		return fcopy_handle_handshake(*val);
-
-	if (fcopy_transaction.state != HVUTIL_USERSPACE_REQ)
-		return -EINVAL;
-
-	/*
-	 * Complete the transaction by forwarding the result
-	 * to the host. But first, cancel the timeout.
-	 */
-	if (cancel_delayed_work_sync(&fcopy_timeout_work)) {
-		fcopy_transaction.state = HVUTIL_USERSPACE_RECV;
-		fcopy_respond_to_host(*val);
-		hv_poll_channel(fcopy_transaction.recv_channel,
-				fcopy_poll_wrapper);
-	}
-
-	return 0;
-}
-
-static void fcopy_on_reset(void)
-{
-	/*
-	 * The daemon has exited; reset the state.
-	 */
-	fcopy_transaction.state = HVUTIL_DEVICE_INIT;
-
-	if (cancel_delayed_work_sync(&fcopy_timeout_work))
-		fcopy_respond_to_host(HV_E_FAIL);
-}
-
-int hv_fcopy_init(struct hv_util_service *srv)
-{
-	recv_buffer = srv->recv_buffer;
-	fcopy_transaction.recv_channel = srv->channel;
-	fcopy_transaction.recv_channel->max_pkt_size = HV_HYP_PAGE_SIZE * 2;
-
-	/*
-	 * When this driver loads, the user level daemon that
-	 * processes the host requests may not yet be running.
-	 * Defer processing channel callbacks until the daemon
-	 * has registered.
-	 */
-	fcopy_transaction.state = HVUTIL_DEVICE_INIT;
-
-	hvt = hvutil_transport_init(fcopy_devname, 0, 0,
-				    fcopy_on_msg, fcopy_on_reset);
-	if (!hvt)
-		return -EFAULT;
-
-	return 0;
-}
-
-static void hv_fcopy_cancel_work(void)
-{
-	cancel_delayed_work_sync(&fcopy_timeout_work);
-	cancel_work_sync(&fcopy_send_work);
-}
-
-int hv_fcopy_pre_suspend(void)
-{
-	struct vmbus_channel *channel = fcopy_transaction.recv_channel;
-	struct hv_fcopy_hdr *fcopy_msg;
-
-	/*
-	 * Fake a CANCEL_FCOPY message for the user space daemon in case the
-	 * daemon is in the middle of copying some file. It doesn't matter if
-	 * there is already a message pending to be delivered to the user
-	 * space since we force fcopy_transaction.state to be HVUTIL_READY, so
-	 * the user space daemon's write() will fail with EINVAL (see
-	 * fcopy_on_msg()), and the daemon will reset the device by closing
-	 * and re-opening it.
-	 */
-	fcopy_msg = kzalloc(sizeof(*fcopy_msg), GFP_KERNEL);
-	if (!fcopy_msg)
-		return -ENOMEM;
-
-	tasklet_disable(&channel->callback_event);
-
-	fcopy_msg->operation = CANCEL_FCOPY;
-
-	hv_fcopy_cancel_work();
-
-	/* We don't care about the return value. */
-	hvutil_transport_send(hvt, fcopy_msg, sizeof(*fcopy_msg), NULL);
-
-	kfree(fcopy_msg);
-
-	fcopy_transaction.state = HVUTIL_READY;
-
-	/* tasklet_enable() will be called in hv_fcopy_pre_resume(). */
-	return 0;
-}
-
-int hv_fcopy_pre_resume(void)
-{
-	struct vmbus_channel *channel = fcopy_transaction.recv_channel;
-
-	tasklet_enable(&channel->callback_event);
-
-	return 0;
-}
-
-void hv_fcopy_deinit(void)
-{
-	fcopy_transaction.state = HVUTIL_DEVICE_DYING;
-
-	hv_fcopy_cancel_work();
-
-	hvutil_transport_destroy(hvt);
-}
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 42aec2c5606a..4bf16e58ee50 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -154,14 +154,6 @@ static struct hv_util_service util_vss = {
 	.util_deinit = hv_vss_deinit,
 };
 
-static struct hv_util_service util_fcopy = {
-	.util_cb = hv_fcopy_onchannelcallback,
-	.util_init = hv_fcopy_init,
-	.util_pre_suspend = hv_fcopy_pre_suspend,
-	.util_pre_resume = hv_fcopy_pre_resume,
-	.util_deinit = hv_fcopy_deinit,
-};
-
 static void perform_shutdown(struct work_struct *dummy)
 {
 	orderly_poweroff(true);
@@ -671,10 +663,6 @@ static const struct hv_vmbus_device_id id_table[] = {
 	{ HV_VSS_GUID,
 	  .driver_data = (unsigned long)&util_vss
 	},
-	/* File copy GUID */
-	{ HV_FCOPY_GUID,
-	  .driver_data = (unsigned long)&util_fcopy
-	},
 	{ },
 };
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH 2/5] tools: hv: Add vmbus_bufring
From: Saurabh Sengar @ 2023-06-02  7:57 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, linux-kernel,
	linux-hyperv
In-Reply-To: <1685692629-31351-1-git-send-email-ssengar@linux.microsoft.com>

Common userspace interface for read/write from VMBus ringbuffer.
This implementation is open for use by any userspace driver or
application seeking direct control over VMBus ring buffers.
A significant  part of this code is borrowed from DPDK.
Link: https://github.com/DPDK/dpdk/

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
 tools/hv/vmbus_bufring.c | 324 +++++++++++++++++++++++++++++++++++++++
 tools/hv/vmbus_bufring.h | 158 +++++++++++++++++++
 2 files changed, 482 insertions(+)
 create mode 100644 tools/hv/vmbus_bufring.c
 create mode 100644 tools/hv/vmbus_bufring.h

diff --git a/tools/hv/vmbus_bufring.c b/tools/hv/vmbus_bufring.c
new file mode 100644
index 000000000000..eb61042f7605
--- /dev/null
+++ b/tools/hv/vmbus_bufring.c
@@ -0,0 +1,324 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Copyright (c) 2009-2012,2016,2023 Microsoft Corp.
+ * Copyright (c) 2012 NetApp Inc.
+ * Copyright (c) 2012 Citrix Inc.
+ * All rights reserved.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <emmintrin.h>
+#include <linux/limits.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/uio.h>
+#include <unistd.h>
+#include "vmbus_bufring.h"
+
+#define	rte_compiler_barrier()	({ asm volatile ("" : : : "memory"); })
+
+#define	rte_smp_rwmb()		({ asm volatile ("" : : : "memory"); })
+
+#define VMBUS_RQST_ERROR	0xFFFFFFFFFFFFFFFF
+#define ALIGN(val, align)	((typeof(val))((val) & (~((typeof(val))((align) - 1)))))
+
+void *vmbus_uio_map(char *sys_chan_path, char *chan_no, int size)
+{
+	void *map;
+	int fd;
+	char dirname[PATH_MAX];
+
+	snprintf(dirname, sizeof(dirname), "%s/%s/ringbuffer", sys_chan_path, chan_no);
+	fd = open(dirname, O_RDWR);
+	if (fd < 0)
+		return NULL;
+
+	map = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	close(fd);
+	if (map == MAP_FAILED)
+		return NULL;
+
+	return map;
+}
+
+/* Increase bufring index by inc with wraparound */
+static inline uint32_t vmbus_br_idxinc(uint32_t idx, uint32_t inc, uint32_t sz)
+{
+	idx += inc;
+	if (idx >= sz)
+		idx -= sz;
+
+	return idx;
+}
+
+void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen)
+{
+	br->vbr = buf;
+	br->windex = br->vbr->windex;
+	br->dsize = blen - sizeof(struct vmbus_bufring);
+}
+
+static inline __always_inline void
+rte_smp_mb(void)
+{
+	asm volatile("lock addl $0, -128(%%rsp); " ::: "memory");
+}
+
+static inline int
+rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
+{
+	uint8_t res;
+
+	asm volatile("lock ; "
+		     "cmpxchgl %[src], %[dst];"
+		     "sete %[res];"
+		     : [res] "=a" (res),     /* output */
+		     [dst] "=m" (*dst)
+		     : [src] "r" (src),      /* input */
+		     "a" (exp),
+		     "m" (*dst)
+		     : "memory");            /* no-clobber list */
+	return res;
+}
+
+static inline uint32_t
+vmbus_txbr_copyto(const struct vmbus_br *tbr, uint32_t windex,
+		  const void *src0, uint32_t cplen)
+{
+	uint8_t *br_data = tbr->vbr->data;
+	uint32_t br_dsize = tbr->dsize;
+	const uint8_t *src = src0;
+
+	/* XXX use double mapping like Linux kernel? */
+	if (cplen > br_dsize - windex) {
+		uint32_t fraglen = br_dsize - windex;
+
+		/* Wrap-around detected */
+		memcpy(br_data + windex, src, fraglen);
+		memcpy(br_data, src + fraglen, cplen - fraglen);
+	} else {
+		memcpy(br_data + windex, src, cplen);
+	}
+
+	return vmbus_br_idxinc(windex, cplen, br_dsize);
+}
+
+/*
+ * Write scattered channel packet to TX bufring.
+ *
+ * The offset of this channel packet is written as a 64bits value
+ * immediately after this channel packet.
+ *
+ * The write goes through three stages:
+ *  1. Reserve space in ring buffer for the new data.
+ *     Writer atomically moves priv_write_index.
+ *  2. Copy the new data into the ring.
+ *  3. Update the tail of the ring (visible to host) that indicates
+ *     next read location. Writer updates write_index
+ */
+static int
+vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
+		 bool *need_sig)
+{
+	struct vmbus_bufring *vbr = tbr->vbr;
+	uint32_t ring_size = tbr->dsize;
+	uint32_t old_windex, next_windex, windex, total;
+	uint64_t save_windex;
+	int i;
+
+	total = 0;
+	for (i = 0; i < iovlen; i++)
+		total += iov[i].iov_len;
+	total += sizeof(save_windex);
+
+	/* Reserve space in ring */
+	do {
+		uint32_t avail;
+
+		/* Get current free location */
+		old_windex = tbr->windex;
+
+		/* Prevent compiler reordering this with calculation */
+		rte_compiler_barrier();
+
+		avail = vmbus_br_availwrite(tbr, old_windex);
+
+		/* If not enough space in ring, then tell caller. */
+		if (avail <= total)
+			return -EAGAIN;
+
+		next_windex = vmbus_br_idxinc(old_windex, total, ring_size);
+
+		/* Atomic update of next write_index for other threads */
+	} while (!rte_atomic32_cmpset(&tbr->windex, old_windex, next_windex));
+
+	/* Space from old..new is now reserved */
+	windex = old_windex;
+	for (i = 0; i < iovlen; i++)
+		windex = vmbus_txbr_copyto(tbr, windex, iov[i].iov_base, iov[i].iov_len);
+
+	/* Set the offset of the current channel packet. */
+	save_windex = ((uint64_t)old_windex) << 32;
+	windex = vmbus_txbr_copyto(tbr, windex, &save_windex,
+				   sizeof(save_windex));
+
+	/* The region reserved should match region used */
+	if (windex != next_windex)
+		return -EINVAL;
+
+	/* Ensure that data is available before updating host index */
+	rte_smp_rwmb();
+
+	/* Checkin for our reservation. wait for our turn to update host */
+	while (!rte_atomic32_cmpset(&vbr->windex, old_windex, next_windex))
+		_mm_pause();
+
+	return 0;
+}
+
+int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void *data,
+			uint32_t dlen, uint32_t flags)
+{
+	struct vmbus_chanpkt pkt;
+	unsigned int pktlen, pad_pktlen;
+	const uint32_t hlen = sizeof(pkt);
+	bool send_evt = false;
+	uint64_t pad = 0;
+	struct iovec iov[3];
+	int error;
+
+	pktlen = hlen + dlen;
+	pad_pktlen = ALIGN(pktlen, sizeof(uint64_t));
+
+	pkt.hdr.type = type;
+	pkt.hdr.flags = flags;
+	pkt.hdr.hlen = hlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+	pkt.hdr.tlen = pad_pktlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+	pkt.hdr.xactid = VMBUS_RQST_ERROR;
+
+	iov[0].iov_base = &pkt;
+	iov[0].iov_len = hlen;
+	iov[1].iov_base = data;
+	iov[1].iov_len = dlen;
+	iov[2].iov_base = &pad;
+	iov[2].iov_len = pad_pktlen - pktlen;
+
+	error = vmbus_txbr_write(txbr, iov, 3, &send_evt);
+
+	return error;
+}
+
+static inline uint32_t
+vmbus_rxbr_copyfrom(const struct vmbus_br *rbr, uint32_t rindex,
+		    void *dst0, size_t cplen)
+{
+	const uint8_t *br_data = rbr->vbr->data;
+	uint32_t br_dsize = rbr->dsize;
+	uint8_t *dst = dst0;
+
+	if (cplen > br_dsize - rindex) {
+		uint32_t fraglen = br_dsize - rindex;
+
+		/* Wrap-around detected. */
+		memcpy(dst, br_data + rindex, fraglen);
+		memcpy(dst + fraglen, br_data, cplen - fraglen);
+	} else {
+		memcpy(dst, br_data + rindex, cplen);
+	}
+
+	return vmbus_br_idxinc(rindex, cplen, br_dsize);
+}
+
+/* Copy data from receive ring but don't change index */
+static int
+vmbus_rxbr_peek(const struct vmbus_br *rbr, void *data, size_t dlen)
+{
+	uint32_t avail;
+
+	/*
+	 * The requested data and the 64bits channel packet
+	 * offset should be there at least.
+	 */
+	avail = vmbus_br_availread(rbr);
+	if (avail < dlen + sizeof(uint64_t))
+		return -EAGAIN;
+
+	vmbus_rxbr_copyfrom(rbr, rbr->vbr->rindex, data, dlen);
+	return 0;
+}
+
+/*
+ * Copy data from receive ring and change index
+ * NOTE:
+ * We assume (dlen + skip) == sizeof(channel packet).
+ */
+static int
+vmbus_rxbr_read(struct vmbus_br *rbr, void *data, size_t dlen, size_t skip)
+{
+	struct vmbus_bufring *vbr = rbr->vbr;
+	uint32_t br_dsize = rbr->dsize;
+	uint32_t rindex;
+
+	if (vmbus_br_availread(rbr) < dlen + skip + sizeof(uint64_t))
+		return -EAGAIN;
+
+	/* Record where host was when we started read (for debug) */
+	rbr->windex = rbr->vbr->windex;
+
+	/*
+	 * Copy channel packet from RX bufring.
+	 */
+	rindex = vmbus_br_idxinc(rbr->vbr->rindex, skip, br_dsize);
+	rindex = vmbus_rxbr_copyfrom(rbr, rindex, data, dlen);
+
+	/*
+	 * Discard this channel packet's 64bits offset, which is useless to us.
+	 */
+	rindex = vmbus_br_idxinc(rindex, sizeof(uint64_t), br_dsize);
+
+	/* Update the read index _after_ the channel packet is fetched.	 */
+	rte_compiler_barrier();
+
+	vbr->rindex = rindex;
+
+	return 0;
+}
+
+int rte_vmbus_chan_recv_raw(struct vmbus_br *rxbr,
+			    void *data, uint32_t *len)
+{
+	struct vmbus_chanpkt_hdr pkt;
+	uint32_t dlen, bufferlen = *len;
+	int error;
+
+	error = vmbus_rxbr_peek(rxbr, &pkt, sizeof(pkt));
+	if (error)
+		return error;
+
+	if (unlikely(pkt.hlen < VMBUS_CHANPKT_HLEN_MIN))
+		/* XXX this channel is dead actually. */
+		return -EIO;
+
+	if (unlikely(pkt.hlen > pkt.tlen))
+		return -EIO;
+
+	/* Length are in quad words */
+	dlen = pkt.tlen << VMBUS_CHANPKT_SIZE_SHIFT;
+	*len = dlen;
+
+	/* If caller buffer is not large enough */
+	if (unlikely(dlen > bufferlen))
+		return -ENOBUFS;
+
+	/* Read data and skip packet header */
+	error = vmbus_rxbr_read(rxbr, data, dlen, 0);
+	if (error)
+		return error;
+
+	/* Return the number of bytes read */
+	return dlen + sizeof(uint64_t);
+}
diff --git a/tools/hv/vmbus_bufring.h b/tools/hv/vmbus_bufring.h
new file mode 100644
index 000000000000..5ebe400132cc
--- /dev/null
+++ b/tools/hv/vmbus_bufring.h
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+
+#ifndef _VMBUS_BUF_H_
+#define _VMBUS_BUF_H_
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#define __packed   __attribute__((__packed__))
+#define unlikely(x)	__builtin_expect(!!(x), 0)
+
+#define ICMSGHDRFLAG_TRANSACTION	1
+#define ICMSGHDRFLAG_REQUEST		2
+#define ICMSGHDRFLAG_RESPONSE		4
+
+#define IC_VERSION_NEGOTIATION_MAX_VER_COUNT 100
+#define ICMSG_HDR (sizeof(struct vmbuspipe_hdr) + sizeof(struct icmsg_hdr))
+#define ICMSG_NEGOTIATE_PKT_SIZE(icframe_vercnt, icmsg_vercnt) \
+	(ICMSG_HDR + sizeof(struct icmsg_negotiate) + \
+	 (((icframe_vercnt) + (icmsg_vercnt)) * sizeof(struct ic_version)))
+
+/*
+ * Channel packets
+ */
+
+/* Channel packet flags */
+#define VMBUS_CHANPKT_TYPE_INBAND	0x0006
+#define VMBUS_CHANPKT_TYPE_RXBUF	0x0007
+#define VMBUS_CHANPKT_TYPE_GPA		0x0009
+#define VMBUS_CHANPKT_TYPE_COMP		0x000b
+
+#define VMBUS_CHANPKT_FLAG_NONE		0
+#define VMBUS_CHANPKT_FLAG_RC		0x0001  /* report completion */
+
+#define VMBUS_CHANPKT_SIZE_SHIFT	3
+#define VMBUS_CHANPKT_SIZE_ALIGN	BIT(VMBUS_CHANPKT_SIZE_SHIFT)
+#define VMBUS_CHANPKT_HLEN_MIN		\
+	(sizeof(struct vmbus_chanpkt_hdr) >> VMBUS_CHANPKT_SIZE_SHIFT)
+
+/*
+ * Buffer ring
+ */
+struct vmbus_bufring {
+	volatile uint32_t windex;
+	volatile uint32_t rindex;
+
+	/*
+	 * Interrupt mask {0,1}
+	 *
+	 * For TX bufring, host set this to 1, when it is processing
+	 * the TX bufring, so that we can safely skip the TX event
+	 * notification to host.
+	 *
+	 * For RX bufring, once this is set to 1 by us, host will not
+	 * further dispatch interrupts to us, even if there are data
+	 * pending on the RX bufring.  This effectively disables the
+	 * interrupt of the channel to which this RX bufring is attached.
+	 */
+	volatile uint32_t imask;
+
+	/*
+	 * Win8 uses some of the reserved bits to implement
+	 * interrupt driven flow management. On the send side
+	 * we can request that the receiver interrupt the sender
+	 * when the ring transitions from being full to being able
+	 * to handle a message of size "pending_send_sz".
+	 *
+	 * Add necessary state for this enhancement.
+	 */
+	volatile uint32_t pending_send;
+	uint32_t reserved1[12];
+
+	union {
+		struct {
+			uint32_t feat_pending_send_sz:1;
+		};
+		uint32_t value;
+	} feature_bits;
+
+	/* Pad it to rte_mem_page_size() so that data starts on page boundary */
+	uint8_t	reserved2[4028];
+
+	/*
+	 * Ring data starts here + RingDataStartOffset
+	 * !!! DO NOT place any fields below this !!!
+	 */
+	uint8_t data[];
+} __packed;
+
+struct vmbus_br {
+	struct vmbus_bufring *vbr;
+	uint32_t	dsize;
+	uint32_t	windex; /* next available location */
+};
+
+struct vmbus_chanpkt_hdr {
+	uint16_t	type;	/* VMBUS_CHANPKT_TYPE_ */
+	uint16_t	hlen;	/* header len, in 8 bytes */
+	uint16_t	tlen;	/* total len, in 8 bytes */
+	uint16_t	flags;	/* VMBUS_CHANPKT_FLAG_ */
+	uint64_t	xactid;
+} __packed;
+
+struct vmbus_chanpkt {
+	struct vmbus_chanpkt_hdr hdr;
+} __packed;
+
+struct vmbuspipe_hdr {
+	unsigned int flags;
+	unsigned int msgsize;
+} __packed;
+
+struct ic_version {
+	unsigned short major;
+	unsigned short minor;
+} __packed;
+
+struct icmsg_negotiate {
+	unsigned short icframe_vercnt;
+	unsigned short icmsg_vercnt;
+	unsigned int reserved;
+	struct ic_version icversion_data[]; /* any size array */
+} __packed;
+
+struct icmsg_hdr {
+	struct ic_version icverframe;
+	unsigned short icmsgtype;
+	struct ic_version icvermsg;
+	unsigned short icmsgsize;
+	unsigned int status;
+	unsigned char ictransaction_id;
+	unsigned char icflags;
+	unsigned char reserved[2];
+} __packed;
+
+int rte_vmbus_chan_recv_raw(struct vmbus_br *rxbr, void *data, uint32_t *len);
+int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void *data,
+			uint32_t dlen, uint32_t flags);
+void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen);
+void *vmbus_uio_map(char *sys_dev_path, char *chan_no, int size);
+
+/* Amount of space available for write */
+static inline uint32_t vmbus_br_availwrite(const struct vmbus_br *br, uint32_t windex)
+{
+	uint32_t rindex = br->vbr->rindex;
+
+	if (windex >= rindex)
+		return br->dsize - (windex - rindex);
+	else
+		return rindex - windex;
+}
+
+static inline uint32_t vmbus_br_availread(const struct vmbus_br *br)
+{
+	return br->dsize - vmbus_br_availwrite(br, br->vbr->windex);
+}
+
+#endif	/* !_VMBUS_BUF_H_ */
-- 
2.34.1


^ permalink raw reply related

* [PATCH 3/5] tools: hv: Add new fcopy application based on uio driver
From: Saurabh Sengar @ 2023-06-02  7:57 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, linux-kernel,
	linux-hyperv
In-Reply-To: <1685692629-31351-1-git-send-email-ssengar@linux.microsoft.com>

New fcopy application which utilizes uio_hv_vmbus_client driver

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
 tools/hv/Build                 |   3 +-
 tools/hv/Makefile              |  10 +-
 tools/hv/hv_fcopy_uio_daemon.c | 491 +++++++++++++++++++++++++++++++++
 3 files changed, 498 insertions(+), 6 deletions(-)
 create mode 100644 tools/hv/hv_fcopy_uio_daemon.c

diff --git a/tools/hv/Build b/tools/hv/Build
index 6cf51fa4b306..7d1f1698069b 100644
--- a/tools/hv/Build
+++ b/tools/hv/Build
@@ -1,3 +1,4 @@
 hv_kvp_daemon-y += hv_kvp_daemon.o
 hv_vss_daemon-y += hv_vss_daemon.o
-hv_fcopy_daemon-y += hv_fcopy_daemon.o
+hv_fcopy_uio_daemon-y += hv_fcopy_uio_daemon.o
+hv_fcopy_uio_daemon-y += vmbus_bufring.o
diff --git a/tools/hv/Makefile b/tools/hv/Makefile
index fe770e679ae8..944180cf916e 100644
--- a/tools/hv/Makefile
+++ b/tools/hv/Makefile
@@ -17,7 +17,7 @@ MAKEFLAGS += -r
 
 override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
 
-ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
+ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_uio_daemon
 ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
 
 ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
@@ -39,10 +39,10 @@ $(HV_VSS_DAEMON_IN): FORCE
 $(OUTPUT)hv_vss_daemon: $(HV_VSS_DAEMON_IN)
 	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
 
-HV_FCOPY_DAEMON_IN := $(OUTPUT)hv_fcopy_daemon-in.o
-$(HV_FCOPY_DAEMON_IN): FORCE
-	$(Q)$(MAKE) $(build)=hv_fcopy_daemon
-$(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
+HV_FCOPY_UIO_DAEMON_IN := $(OUTPUT)hv_fcopy_uio_daemon-in.o
+$(HV_FCOPY_UIO_DAEMON_IN): FORCE
+	$(Q)$(MAKE) $(build)=hv_fcopy_uio_daemon
+$(OUTPUT)hv_fcopy_uio_daemon: $(HV_FCOPY_UIO_DAEMON_IN)
 	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
 
 clean:
diff --git a/tools/hv/hv_fcopy_uio_daemon.c b/tools/hv/hv_fcopy_uio_daemon.c
new file mode 100644
index 000000000000..d20e8040b754
--- /dev/null
+++ b/tools/hv/hv_fcopy_uio_daemon.c
@@ -0,0 +1,491 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * An implementation of host to guest copy functionality for Linux.
+ *
+ * Copyright (C) 2023, Microsoft, Inc.
+ *
+ * Author : K. Y. Srinivasan <kys@microsoft.com>
+ * Author : Saurabh Sengar <ssengar@microsoft.com>
+ *
+ */
+
+#include <dirent.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <locale.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syslog.h>
+#include <unistd.h>
+#include <wchar.h>
+#include <sys/stat.h>
+#include <linux/hyperv.h>
+#include <linux/limits.h>
+#include "vmbus_bufring.h"
+
+#define ICMSGTYPE_NEGOTIATE	0
+#define ICMSGTYPE_FCOPY		7
+
+#define WIN8_SRV_MAJOR		1
+#define WIN8_SRV_MINOR		1
+#define WIN8_SRV_VERSION	(WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR)
+
+#define MAX_FOLDER_NAME		15
+#define MAX_PATH_LEN		15
+#define FCOPY_SYSFS_CHAN	"/sys/bus/vmbus/devices/eb765408-105f-49b6-b4aa-c123b64d17d4/channels"
+#define FCOPY_UIO		"/sys/bus/vmbus/devices/eb765408-105f-49b6-b4aa-c123b64d17d4/uio"
+
+#define FCOPY_VER_COUNT		1
+static const int fcopy_versions[] = {
+	WIN8_SRV_VERSION
+};
+
+#define FW_VER_COUNT		1
+static const int fw_versions[] = {
+	UTIL_FW_VERSION
+};
+
+#define HV_RING_SIZE		(4 * 4096)
+
+unsigned char desc[HV_RING_SIZE];
+
+static int target_fd;
+static char target_fname[PATH_MAX];
+static unsigned long long filesize;
+
+static int hv_fcopy_create_file(char *file_name, char *path_name, __u32 flags)
+{
+	int error = HV_E_FAIL;
+	char *q, *p;
+
+	filesize = 0;
+	p = (char *)path_name;
+	snprintf(target_fname, sizeof(target_fname), "%s/%s",
+		 (char *)path_name, (char *)file_name);
+
+	/*
+	 * Check to see if the path is already in place; if not,
+	 * create if required.
+	 */
+	while ((q = strchr(p, '/')) != NULL) {
+		if (q == p) {
+			p++;
+			continue;
+		}
+		*q = '\0';
+		if (access(path_name, F_OK)) {
+			if (flags & CREATE_PATH) {
+				if (mkdir(path_name, 0755)) {
+					syslog(LOG_ERR, "Failed to create %s",
+					       path_name);
+					goto done;
+				}
+			} else {
+				syslog(LOG_ERR, "Invalid path: %s", path_name);
+				goto done;
+			}
+		}
+		p = q + 1;
+		*q = '/';
+	}
+
+	if (!access(target_fname, F_OK)) {
+		syslog(LOG_INFO, "File: %s exists", target_fname);
+		if (!(flags & OVER_WRITE)) {
+			error = HV_ERROR_ALREADY_EXISTS;
+			goto done;
+		}
+	}
+
+	target_fd = open(target_fname,
+			 O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC, 0744);
+	if (target_fd == -1) {
+		syslog(LOG_INFO, "Open Failed: %s", strerror(errno));
+		goto done;
+	}
+
+	error = 0;
+done:
+	if (error)
+		target_fname[0] = '\0';
+	return error;
+}
+
+/* copy the data into the file */
+static int hv_copy_data(struct hv_do_fcopy *cpmsg)
+{
+	ssize_t len;
+	int ret = 0;
+
+	len = pwrite(target_fd, cpmsg->data, cpmsg->size, cpmsg->offset);
+
+	filesize += cpmsg->size;
+	if (len != cpmsg->size) {
+		switch (errno) {
+		case ENOSPC:
+			ret = HV_ERROR_DISK_FULL;
+			break;
+		default:
+			ret = HV_E_FAIL;
+			break;
+		}
+		syslog(LOG_ERR, "pwrite failed to write %llu bytes: %ld (%s)",
+		       filesize, (long)len, strerror(errno));
+	}
+
+	return ret;
+}
+
+static int hv_copy_finished(void)
+{
+	close(target_fd);
+	target_fname[0] = '\0';
+
+	return 0;
+}
+
+static void print_usage(char *argv[])
+{
+	fprintf(stderr, "Usage: %s [options]\n"
+		"Options are:\n"
+		"  -n, --no-daemon        stay in foreground, don't daemonize\n"
+		"  -h, --help             print this help\n", argv[0]);
+}
+
+static bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, unsigned char *buf,
+				      unsigned int buflen, const int *fw_version, int fw_vercnt,
+				const int *srv_version, int srv_vercnt,
+				int *nego_fw_version, int *nego_srv_version)
+{
+	int icframe_major, icframe_minor;
+	int icmsg_major, icmsg_minor;
+	int fw_major, fw_minor;
+	int srv_major, srv_minor;
+	int i, j;
+	bool found_match = false;
+	struct icmsg_negotiate *negop;
+
+	/* Check that there's enough space for icframe_vercnt, icmsg_vercnt */
+	if (buflen < ICMSG_HDR + offsetof(struct icmsg_negotiate, reserved)) {
+		syslog(LOG_ERR, "Invalid icmsg negotiate");
+		return false;
+	}
+
+	icmsghdrp->icmsgsize = 0x10;
+	negop = (struct icmsg_negotiate *)&buf[ICMSG_HDR];
+
+	icframe_major = negop->icframe_vercnt;
+	icframe_minor = 0;
+
+	icmsg_major = negop->icmsg_vercnt;
+	icmsg_minor = 0;
+
+	/* Validate negop packet */
+	if (icframe_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
+	    icmsg_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
+	    ICMSG_NEGOTIATE_PKT_SIZE(icframe_major, icmsg_major) > buflen) {
+		syslog(LOG_ERR, "Invalid icmsg negotiate - icframe_major: %u, icmsg_major: %u\n",
+		       icframe_major, icmsg_major);
+		goto fw_error;
+	}
+
+	/*
+	 * Select the framework version number we will
+	 * support.
+	 */
+
+	for (i = 0; i < fw_vercnt; i++) {
+		fw_major = (fw_version[i] >> 16);
+		fw_minor = (fw_version[i] & 0xFFFF);
+
+		for (j = 0; j < negop->icframe_vercnt; j++) {
+			if (negop->icversion_data[j].major == fw_major &&
+			    negop->icversion_data[j].minor == fw_minor) {
+				icframe_major = negop->icversion_data[j].major;
+				icframe_minor = negop->icversion_data[j].minor;
+				found_match = true;
+				break;
+			}
+		}
+
+		if (found_match)
+			break;
+	}
+
+	if (!found_match)
+		goto fw_error;
+
+	found_match = false;
+
+	for (i = 0; i < srv_vercnt; i++) {
+		srv_major = (srv_version[i] >> 16);
+		srv_minor = (srv_version[i] & 0xFFFF);
+
+		for (j = negop->icframe_vercnt;
+			(j < negop->icframe_vercnt + negop->icmsg_vercnt);
+			j++) {
+			if (negop->icversion_data[j].major == srv_major &&
+			    negop->icversion_data[j].minor == srv_minor) {
+				icmsg_major = negop->icversion_data[j].major;
+				icmsg_minor = negop->icversion_data[j].minor;
+				found_match = true;
+				break;
+			}
+		}
+
+		if (found_match)
+			break;
+	}
+
+	/*
+	 * Respond with the framework and service
+	 * version numbers we can support.
+	 */
+fw_error:
+	if (!found_match) {
+		negop->icframe_vercnt = 0;
+		negop->icmsg_vercnt = 0;
+	} else {
+		negop->icframe_vercnt = 1;
+		negop->icmsg_vercnt = 1;
+	}
+
+	if (nego_fw_version)
+		*nego_fw_version = (icframe_major << 16) | icframe_minor;
+
+	if (nego_srv_version)
+		*nego_srv_version = (icmsg_major << 16) | icmsg_minor;
+
+	negop->icversion_data[0].major = icframe_major;
+	negop->icversion_data[0].minor = icframe_minor;
+	negop->icversion_data[1].major = icmsg_major;
+	negop->icversion_data[1].minor = icmsg_minor;
+
+	return found_match;
+}
+
+static void wcstoutf8(char *dest, const __u16 *src, size_t dest_size)
+{
+	size_t len = 0;
+
+	while (len < dest_size) {
+		if (src[len] < 0x80)
+			dest[len++] = (char)(*src++);
+		else
+			dest[len++] = 'X';
+	}
+
+	dest[len] = '\0';
+}
+
+static int hv_fcopy_start(struct hv_start_fcopy *smsg_in)
+{
+	setlocale(LC_ALL, "en_US.utf8");
+	size_t file_size, path_size;
+	char *file_name, *path_name;
+	char *in_file_name = (char *)smsg_in->file_name;
+	char *in_path_name = (char *)smsg_in->path_name;
+
+	file_size = wcstombs(NULL, (const wchar_t *restrict)in_file_name, 0) + 1;
+	path_size = wcstombs(NULL, (const wchar_t *restrict)in_path_name, 0) + 1;
+
+	file_name = (char *)malloc(file_size * sizeof(char));
+	path_name = (char *)malloc(path_size * sizeof(char));
+
+	wcstoutf8(file_name, (__u16 *)in_file_name, file_size);
+	wcstoutf8(path_name, (__u16 *)in_path_name, path_size);
+
+	return hv_fcopy_create_file(file_name, path_name, smsg_in->copy_flags);
+}
+
+static int hv_fcopy_send_data(struct hv_fcopy_hdr *fcopy_msg, int recvlen)
+{
+	int operation = fcopy_msg->operation;
+
+	/*
+	 * The  strings sent from the host are encoded in
+	 * utf16; convert it to utf8 strings.
+	 * The host assures us that the utf16 strings will not exceed
+	 * the max lengths specified. We will however, reserve room
+	 * for the string terminating character - in the utf16s_utf8s()
+	 * function we limit the size of the buffer where the converted
+	 * string is placed to W_MAX_PATH -1 to guarantee
+	 * that the strings can be properly terminated!
+	 */
+
+	switch (operation) {
+	case START_FILE_COPY:
+		return hv_fcopy_start((struct hv_start_fcopy *)fcopy_msg);
+	case WRITE_TO_FILE:
+		return hv_copy_data((struct hv_do_fcopy *)fcopy_msg);
+	case COMPLETE_FCOPY:
+		return hv_copy_finished();
+	}
+
+	return HV_E_FAIL;
+}
+
+/* process the packet recv from host */
+static int fcopy_pkt_process(struct vmbus_br *txbr)
+{
+	int ret, offset, pktlen;
+	int fcopy_srv_version;
+	const struct vmbus_chanpkt_hdr *pkt;
+	struct hv_fcopy_hdr *fcopy_msg;
+	struct icmsg_hdr *icmsghdr;
+
+	pkt = (const struct vmbus_chanpkt_hdr *)desc;
+	offset = pkt->hlen << 3;
+	pktlen = (pkt->tlen << 3) - offset;
+	icmsghdr = (struct icmsg_hdr *)&desc[offset + sizeof(struct vmbuspipe_hdr)];
+	icmsghdr->status = HV_E_FAIL;
+
+	if (icmsghdr->icmsgtype == ICMSGTYPE_NEGOTIATE) {
+		if (vmbus_prep_negotiate_resp(icmsghdr, desc + offset, pktlen, fw_versions,
+					      FW_VER_COUNT, fcopy_versions, FCOPY_VER_COUNT,
+					      NULL, &fcopy_srv_version)) {
+			syslog(LOG_INFO, "FCopy IC version %d.%d",
+			       fcopy_srv_version >> 16, fcopy_srv_version & 0xFFFF);
+			icmsghdr->status = 0;
+		}
+	} else if (icmsghdr->icmsgtype == ICMSGTYPE_FCOPY) {
+		/* Ensure recvlen is big enough to contain hv_fcopy_hdr */
+		if (pktlen < ICMSG_HDR + sizeof(struct hv_fcopy_hdr)) {
+			syslog(LOG_ERR, "Invalid Fcopy hdr. Packet length too small: %u",
+			       pktlen);
+			return -ENOBUFS;
+		}
+
+		fcopy_msg = (struct hv_fcopy_hdr *)&desc[offset + ICMSG_HDR];
+		icmsghdr->status = hv_fcopy_send_data(fcopy_msg, pktlen);
+	}
+
+	icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
+	ret = rte_vmbus_chan_send(txbr, 0x6, desc + offset, pktlen, 0);
+	if (ret) {
+		syslog(LOG_ERR, "Write to ringbuffer failed err: %d", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void fcopy_get_first_folder(char *path, char *chan_no)
+{
+	DIR *dir = opendir(path);
+	struct dirent *entry;
+
+	if (!dir) {
+		syslog(LOG_ERR, "Failed to open directory (errno=%s).\n", strerror(errno));
+		return;
+	}
+
+	while ((entry = readdir(dir)) != NULL) {
+		if (entry->d_type == DT_DIR && strcmp(entry->d_name, ".") != 0 &&
+		    strcmp(entry->d_name, "..") != 0) {
+			strcpy(chan_no, entry->d_name);
+			break;
+		}
+	}
+
+	closedir(dir);
+}
+
+int main(int argc, char *argv[])
+{
+	int fcopy_fd = -1, tmp = 1;
+	int daemonize = 1, long_index = 0, opt, ret = -EINVAL;
+	struct vmbus_br txbr, rxbr;
+	void *ring;
+	uint32_t len = HV_RING_SIZE;
+	char chan_no[MAX_FOLDER_NAME] = {0};
+	char uio_name[MAX_FOLDER_NAME] = {0};
+	char uio_dev_path[MAX_PATH_LEN] = {0};
+
+	static struct option long_options[] = {
+		{"help",	no_argument,	   0,  'h' },
+		{"no-daemon",	no_argument,	   0,  'n' },
+		{0,		0,		   0,  0   }
+	};
+
+	while ((opt = getopt_long(argc, argv, "hn", long_options,
+				  &long_index)) != -1) {
+		switch (opt) {
+		case 'n':
+			daemonize = 0;
+			break;
+		case 'h':
+		default:
+			print_usage(argv);
+			goto exit;
+		}
+	}
+
+	if (daemonize && daemon(1, 0)) {
+		syslog(LOG_ERR, "daemon() failed; error: %s", strerror(errno));
+		goto exit;
+	}
+
+	openlog("HV_UIO_FCOPY", 0, LOG_USER);
+	syslog(LOG_INFO, "starting; pid is:%d", getpid());
+
+	fcopy_get_first_folder(FCOPY_UIO, uio_name);
+	snprintf(uio_dev_path, sizeof(uio_dev_path), "/dev/%s", uio_name);
+	fcopy_fd = open(uio_dev_path, O_RDWR);
+
+	if (fcopy_fd < 0) {
+		syslog(LOG_ERR, "open %s failed; error: %d %s",
+		       uio_dev_path, errno, strerror(errno));
+		ret = fcopy_fd;
+		goto exit;
+	}
+
+	fcopy_get_first_folder(FCOPY_SYSFS_CHAN, chan_no);
+	ring = vmbus_uio_map(FCOPY_SYSFS_CHAN, chan_no, HV_RING_SIZE);
+	if (!ring) {
+		ret = errno;
+		syslog(LOG_ERR, "mmap ringbuffer failed; error: %d %s", ret, strerror(ret));
+		goto close;
+	}
+	vmbus_br_setup(&txbr, ring, HV_RING_SIZE);
+	vmbus_br_setup(&rxbr, (char *)ring + HV_RING_SIZE, HV_RING_SIZE);
+
+	while (1) {
+		/*
+		 * In this loop we process fcopy messages after the
+		 * handshake is complete.
+		 */
+		ret = pread(fcopy_fd, &tmp, sizeof(int), 0);
+		if (ret < 0) {
+			syslog(LOG_ERR, "pread failed: %s", strerror(errno));
+			continue;
+		}
+
+		len = HV_RING_SIZE;
+		ret = rte_vmbus_chan_recv_raw(&rxbr, desc, &len);
+		if (unlikely(ret <= 0)) {
+			/* This indicates a failure to communicate (or worse) */
+			syslog(LOG_ERR, "VMBus channel recv error: %d", ret);
+		} else {
+			ret = fcopy_pkt_process(&txbr);
+			if (ret < 0)
+				goto close;
+
+			/* Signal host */
+			if ((write(fcopy_fd, &tmp, sizeof(int))) != sizeof(int)) {
+				ret = errno;
+				syslog(LOG_ERR, "Registration failed: %s\n", strerror(ret));
+				goto close;
+			}
+		}
+	}
+close:
+	close(fcopy_fd);
+exit:
+	return ret;
+}
-- 
2.34.1


^ permalink raw reply related

* [PATCH 4/5] tools: hv: Remove hv_fcopy_daemon
From: Saurabh Sengar @ 2023-06-02  7:57 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, linux-kernel,
	linux-hyperv
In-Reply-To: <1685692629-31351-1-git-send-email-ssengar@linux.microsoft.com>

As the new fcopy application using uio is introduced, remove
obsolete hv_fcopy_daemon.c

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
 tools/hv/hv_fcopy_daemon.c | 266 -------------------------------------
 1 file changed, 266 deletions(-)
 delete mode 100644 tools/hv/hv_fcopy_daemon.c

diff --git a/tools/hv/hv_fcopy_daemon.c b/tools/hv/hv_fcopy_daemon.c
deleted file mode 100644
index 16d629b22c25..000000000000
--- a/tools/hv/hv_fcopy_daemon.c
+++ /dev/null
@@ -1,266 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * An implementation of host to guest copy functionality for Linux.
- *
- * Copyright (C) 2014, Microsoft, Inc.
- *
- * Author : K. Y. Srinivasan <kys@microsoft.com>
- */
-
-
-#include <sys/types.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <errno.h>
-#include <linux/hyperv.h>
-#include <linux/limits.h>
-#include <syslog.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <getopt.h>
-
-static int target_fd;
-static char target_fname[PATH_MAX];
-static unsigned long long filesize;
-
-static int hv_start_fcopy(struct hv_start_fcopy *smsg)
-{
-	int error = HV_E_FAIL;
-	char *q, *p;
-
-	filesize = 0;
-	p = (char *)smsg->path_name;
-	snprintf(target_fname, sizeof(target_fname), "%s/%s",
-		 (char *)smsg->path_name, (char *)smsg->file_name);
-
-	syslog(LOG_INFO, "Target file name: %s", target_fname);
-	/*
-	 * Check to see if the path is already in place; if not,
-	 * create if required.
-	 */
-	while ((q = strchr(p, '/')) != NULL) {
-		if (q == p) {
-			p++;
-			continue;
-		}
-		*q = '\0';
-		if (access((char *)smsg->path_name, F_OK)) {
-			if (smsg->copy_flags & CREATE_PATH) {
-				if (mkdir((char *)smsg->path_name, 0755)) {
-					syslog(LOG_ERR, "Failed to create %s",
-						(char *)smsg->path_name);
-					goto done;
-				}
-			} else {
-				syslog(LOG_ERR, "Invalid path: %s",
-					(char *)smsg->path_name);
-				goto done;
-			}
-		}
-		p = q + 1;
-		*q = '/';
-	}
-
-	if (!access(target_fname, F_OK)) {
-		syslog(LOG_INFO, "File: %s exists", target_fname);
-		if (!(smsg->copy_flags & OVER_WRITE)) {
-			error = HV_ERROR_ALREADY_EXISTS;
-			goto done;
-		}
-	}
-
-	target_fd = open(target_fname,
-			 O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC, 0744);
-	if (target_fd == -1) {
-		syslog(LOG_INFO, "Open Failed: %s", strerror(errno));
-		goto done;
-	}
-
-	error = 0;
-done:
-	if (error)
-		target_fname[0] = '\0';
-	return error;
-}
-
-static int hv_copy_data(struct hv_do_fcopy *cpmsg)
-{
-	ssize_t bytes_written;
-	int ret = 0;
-
-	bytes_written = pwrite(target_fd, cpmsg->data, cpmsg->size,
-				cpmsg->offset);
-
-	filesize += cpmsg->size;
-	if (bytes_written != cpmsg->size) {
-		switch (errno) {
-		case ENOSPC:
-			ret = HV_ERROR_DISK_FULL;
-			break;
-		default:
-			ret = HV_E_FAIL;
-			break;
-		}
-		syslog(LOG_ERR, "pwrite failed to write %llu bytes: %ld (%s)",
-		       filesize, (long)bytes_written, strerror(errno));
-	}
-
-	return ret;
-}
-
-/*
- * Reset target_fname to "" in the two below functions for hibernation: if
- * the fcopy operation is aborted by hibernation, the daemon should remove the
- * partially-copied file; to achieve this, the hv_utils driver always fakes a
- * CANCEL_FCOPY message upon suspend, and later when the VM resumes back,
- * the daemon calls hv_copy_cancel() to remove the file; if a file is copied
- * successfully before suspend, hv_copy_finished() must reset target_fname to
- * avoid that the file can be incorrectly removed upon resume, since the faked
- * CANCEL_FCOPY message is spurious in this case.
- */
-static int hv_copy_finished(void)
-{
-	close(target_fd);
-	target_fname[0] = '\0';
-	return 0;
-}
-static int hv_copy_cancel(void)
-{
-	close(target_fd);
-	if (strlen(target_fname) > 0) {
-		unlink(target_fname);
-		target_fname[0] = '\0';
-	}
-	return 0;
-
-}
-
-void print_usage(char *argv[])
-{
-	fprintf(stderr, "Usage: %s [options]\n"
-		"Options are:\n"
-		"  -n, --no-daemon        stay in foreground, don't daemonize\n"
-		"  -h, --help             print this help\n", argv[0]);
-}
-
-int main(int argc, char *argv[])
-{
-	int fcopy_fd = -1;
-	int error;
-	int daemonize = 1, long_index = 0, opt;
-	int version = FCOPY_CURRENT_VERSION;
-	union {
-		struct hv_fcopy_hdr hdr;
-		struct hv_start_fcopy start;
-		struct hv_do_fcopy copy;
-		__u32 kernel_modver;
-	} buffer = { };
-	int in_handshake;
-
-	static struct option long_options[] = {
-		{"help",	no_argument,	   0,  'h' },
-		{"no-daemon",	no_argument,	   0,  'n' },
-		{0,		0,		   0,  0   }
-	};
-
-	while ((opt = getopt_long(argc, argv, "hn", long_options,
-				  &long_index)) != -1) {
-		switch (opt) {
-		case 'n':
-			daemonize = 0;
-			break;
-		case 'h':
-		default:
-			print_usage(argv);
-			exit(EXIT_FAILURE);
-		}
-	}
-
-	if (daemonize && daemon(1, 0)) {
-		syslog(LOG_ERR, "daemon() failed; error: %s", strerror(errno));
-		exit(EXIT_FAILURE);
-	}
-
-	openlog("HV_FCOPY", 0, LOG_USER);
-	syslog(LOG_INFO, "starting; pid is:%d", getpid());
-
-reopen_fcopy_fd:
-	if (fcopy_fd != -1)
-		close(fcopy_fd);
-	/* Remove any possible partially-copied file on error */
-	hv_copy_cancel();
-	in_handshake = 1;
-	fcopy_fd = open("/dev/vmbus/hv_fcopy", O_RDWR);
-
-	if (fcopy_fd < 0) {
-		syslog(LOG_ERR, "open /dev/vmbus/hv_fcopy failed; error: %d %s",
-			errno, strerror(errno));
-		exit(EXIT_FAILURE);
-	}
-
-	/*
-	 * Register with the kernel.
-	 */
-	if ((write(fcopy_fd, &version, sizeof(int))) != sizeof(int)) {
-		syslog(LOG_ERR, "Registration failed: %s", strerror(errno));
-		exit(EXIT_FAILURE);
-	}
-
-	while (1) {
-		/*
-		 * In this loop we process fcopy messages after the
-		 * handshake is complete.
-		 */
-		ssize_t len;
-
-		len = pread(fcopy_fd, &buffer, sizeof(buffer), 0);
-		if (len < 0) {
-			syslog(LOG_ERR, "pread failed: %s", strerror(errno));
-			goto reopen_fcopy_fd;
-		}
-
-		if (in_handshake) {
-			if (len != sizeof(buffer.kernel_modver)) {
-				syslog(LOG_ERR, "invalid version negotiation");
-				exit(EXIT_FAILURE);
-			}
-			in_handshake = 0;
-			syslog(LOG_INFO, "kernel module version: %u",
-			       buffer.kernel_modver);
-			continue;
-		}
-
-		switch (buffer.hdr.operation) {
-		case START_FILE_COPY:
-			error = hv_start_fcopy(&buffer.start);
-			break;
-		case WRITE_TO_FILE:
-			error = hv_copy_data(&buffer.copy);
-			break;
-		case COMPLETE_FCOPY:
-			error = hv_copy_finished();
-			break;
-		case CANCEL_FCOPY:
-			error = hv_copy_cancel();
-			break;
-
-		default:
-			error = HV_E_FAIL;
-			syslog(LOG_ERR, "Unknown operation: %d",
-				buffer.hdr.operation);
-
-		}
-
-		/*
-		 * pwrite() may return an error due to the faked CANCEL_FCOPY
-		 * message upon hibernation. Ignore the error by resetting the
-		 * dev file, i.e. closing and re-opening it.
-		 */
-		if (pwrite(fcopy_fd, &error, sizeof(int), 0) != sizeof(int)) {
-			syslog(LOG_ERR, "pwrite failed: %s", strerror(errno));
-			goto reopen_fcopy_fd;
-		}
-	}
-}
-- 
2.34.1


^ permalink raw reply related

* [PATCH 1/5] uio: Add hv_vmbus_client driver
From: Saurabh Sengar @ 2023-06-02  7:57 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, linux-kernel,
	linux-hyperv
In-Reply-To: <1685692629-31351-1-git-send-email-ssengar@linux.microsoft.com>

Generic VMBus driver that can be dynamically bound, to any simple
low speed Hyper-V VMBus device. This driver supports single
channel and uses hypercall instead of monitor bits to interrupt
host, which is suitable for low speed devices. Additionally, the
driver also provide the flexibility of custom ring buffer sizes and
avoid memory allocation for gpadl to offer memory optimization.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
 drivers/uio/Kconfig               |  12 ++
 drivers/uio/Makefile              |   1 +
 drivers/uio/uio_hv_vmbus_client.c | 232 ++++++++++++++++++++++++++++++
 3 files changed, 245 insertions(+)
 create mode 100644 drivers/uio/uio_hv_vmbus_client.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 2e16c5338e5b..dcc727e6fd3f 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -166,6 +166,18 @@ config UIO_HV_GENERIC
 
 	  If you compile this as a module, it will be called uio_hv_generic.
 
+config UIO_HV_SLOW_DEVICES
+	tristate "Generic driver for low speed VMBus devices"
+	depends on HYPERV
+	help
+	  Generic driver that you can bind, dynamically, to any simple
+	  Hyper-V VMBus device with single channel and these devices
+	  uses hypercall instead of monitor bits to interrupt host. This
+	  driver also provide the flexibility of custom ring buffer sizes.
+	  It is useful for slower VMbus devices.
+
+	  If you compile this as a module, it will be called uio_hv_vmbus_client.
+
 config UIO_DFL
 	tristate "Generic driver for DFL (Device Feature List) bus"
 	depends on FPGA_DFL
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index f2f416a14228..44be0f96da34 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -11,4 +11,5 @@ obj-$(CONFIG_UIO_PRUSS)         += uio_pruss.o
 obj-$(CONFIG_UIO_MF624)         += uio_mf624.o
 obj-$(CONFIG_UIO_FSL_ELBC_GPCM)	+= uio_fsl_elbc_gpcm.o
 obj-$(CONFIG_UIO_HV_GENERIC)	+= uio_hv_generic.o
+obj-$(CONFIG_UIO_HV_SLOW_DEVICES)	+= uio_hv_vmbus_client.o
 obj-$(CONFIG_UIO_DFL)	+= uio_dfl.o
diff --git a/drivers/uio/uio_hv_vmbus_client.c b/drivers/uio/uio_hv_vmbus_client.c
new file mode 100644
index 000000000000..a5a47caeffe1
--- /dev/null
+++ b/drivers/uio/uio_hv_vmbus_client.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * uio_hv_vmbus_client - UIO driver for low speed VMBus devices
+ *
+ * Copyright (c) 2023, Microsoft Corporation.
+ *
+ * Authors:
+ *   Saurabh Sengar <ssengar@microsoft.com>
+ *
+ * Since the driver does not declare any device ids, you must allocate
+ * id and bind the device to the driver yourself.  For example:
+ * driverctl -b vmbus set-override <dev uuid> uio_hv_vmbus_client
+ */
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/uio_driver.h>
+#include <linux/hyperv.h>
+#include <linux/vmalloc.h>
+#include <linux/sysfs.h>
+
+#define DRIVER_VERSION	"0.0.1"
+#define DRIVER_AUTHOR	"Saurabh Sengar <ssengar@microsoft.com>"
+#define DRIVER_DESC	"Generic UIO driver for low speed VMBus devices"
+
+#define DEFAULT_HV_RING_SIZE	VMBUS_RING_SIZE(3 * HV_HYP_PAGE_SIZE)
+static int ring_size = DEFAULT_HV_RING_SIZE;
+
+struct uio_hv_vmbus_dev {
+	struct uio_info info;
+	struct hv_device *device;
+	atomic_t refcnt;
+};
+
+/* Sysfs API to allow mmap of the ring buffers
+ * The ring buffer is allocated as contiguous memory by vmbus_open
+ */
+static int uio_hv_vmbus_mmap(struct file *filp, struct kobject *kobj,
+			     struct bin_attribute *attr, struct vm_area_struct *vma)
+{
+	struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
+	void *ring_buffer = page_address(channel->ringbuffer_page);
+
+	return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
+			       channel->ringbuffer_pagecount << PAGE_SHIFT);
+}
+
+static const struct bin_attribute ring_buffer_bin_attr = {
+	.attr = {
+		.name = "ringbuffer",
+		.mode = 0600,
+	},
+	.mmap = uio_hv_vmbus_mmap,
+};
+
+/*
+ * This is the irqcontrol callback to be registered to uio_info.
+ * It can be used to disable/enable interrupt from user space processes.
+ *
+ * @param info
+ *  pointer to uio_info.
+ * @param irq_state
+ *  state value. 1 to enable interrupt, 0 to disable interrupt.
+ */
+static int uio_hv_vmbus_irqcontrol(struct uio_info *info, s32 irq_state)
+{
+	struct uio_hv_vmbus_dev *pdata = info->priv;
+	struct hv_device *hv_dev = pdata->device;
+
+	/* Issue a full memory barrier before triggering the notification */
+	virt_mb();
+
+	vmbus_setevent(hv_dev->channel);
+	return 0;
+}
+
+/*
+ * Callback from vmbus_event when something is in inbound ring.
+ */
+static void uio_hv_vmbus_channel_cb(void *context)
+{
+	struct uio_hv_vmbus_dev *pdata = context;
+
+	/* Issue a full memory barrier before sending the event to userspace */
+	virt_mb();
+
+	uio_event_notify(&pdata->info);
+}
+
+static int uio_hv_vmbus_open(struct uio_info *info, struct inode *inode)
+{
+	struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+	struct hv_device *hv_dev = pdata->device;
+	struct vmbus_channel *channel = hv_dev->channel;
+	int ret = 0;
+
+	if (atomic_inc_return(&pdata->refcnt) != 1)
+		return 0;
+
+	ret = vmbus_open(channel, ring_size, ring_size, NULL, 0,
+			 uio_hv_vmbus_channel_cb, pdata);
+	if (ret) {
+		dev_err(&hv_dev->device, "error %d when opening the channel\n", ret);
+		goto done;
+	}
+	channel->inbound.ring_buffer->interrupt_mask = 0;
+	set_channel_read_mode(channel, HV_CALL_ISR);
+
+	ret = sysfs_create_bin_file(&channel->kobj, &ring_buffer_bin_attr);
+	if (ret)
+		dev_notice(&hv_dev->device, "sysfs create ring bin file failed; %d\n", ret);
+
+	return 0;
+
+done:
+	atomic_dec(&pdata->refcnt);
+	return ret;
+}
+
+/* VMbus primary channel is closed on last close */
+static int uio_hv_vmbus_release(struct uio_info *info, struct inode *inode)
+{
+	struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+	struct hv_device *hv_dev = pdata->device;
+	struct vmbus_channel *channel = hv_dev->channel;
+
+	sysfs_remove_bin_file(&channel->kobj, &ring_buffer_bin_attr);
+
+	if (atomic_dec_and_test(&pdata->refcnt))
+		vmbus_close(channel);
+
+	return 0;
+}
+
+static ssize_t ring_size_show(struct device *dev, struct device_attribute *attr,
+			      char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%d\n", ring_size);
+}
+
+static ssize_t ring_size_store(struct device *dev, struct device_attribute *attr,
+			       const char *buf, size_t count)
+{
+	unsigned int val;
+
+	if (kstrtouint(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	if (val < HV_HYP_PAGE_SIZE)
+		return -EINVAL;
+
+	ring_size = val;
+
+	return count;
+}
+static DEVICE_ATTR_RW(ring_size);
+
+static int uio_hv_vmbus_probe(struct hv_device *dev, const struct hv_vmbus_device_id *dev_id)
+{
+	struct uio_hv_vmbus_dev *pdata;
+	int ret = 0;
+	char *name = NULL;
+
+	pdata = devm_kzalloc(&dev->device, sizeof(*pdata), GFP_KERNEL);
+	if (!pdata)
+		return -ENOMEM;
+
+	name = kasprintf(GFP_KERNEL, "%pUl", &dev->dev_instance);
+
+	/* Fill general uio info */
+	pdata->info.name = name; /* /sys/class/uio/uioX/name */
+	pdata->info.version = DRIVER_VERSION;
+	pdata->info.irqcontrol = uio_hv_vmbus_irqcontrol;
+	pdata->info.open = uio_hv_vmbus_open;
+	pdata->info.release = uio_hv_vmbus_release;
+	pdata->info.irq = UIO_IRQ_CUSTOM;
+	pdata->info.priv = pdata;
+	pdata->device = dev;
+
+	ret = uio_register_device(&dev->device, &pdata->info);
+	if (ret) {
+		dev_err(&dev->device, "uio_hv_vmbus register failed\n");
+		goto fail;
+	}
+
+	ret = sysfs_create_file(&dev->device.kobj, &dev_attr_ring_size.attr);
+	if (ret)
+		dev_notice(&dev->device, "sysfs create ring size file failed; %d\n", ret);
+
+	hv_set_drvdata(dev, pdata);
+	return 0;
+
+fail:
+	kfree(pdata);
+	return ret;
+}
+
+static void
+uio_hv_vmbus_remove(struct hv_device *dev)
+{
+	struct uio_hv_vmbus_dev *pdata = hv_get_drvdata(dev);
+
+	sysfs_remove_file(&dev->device.kobj, &dev_attr_ring_size.attr);
+	if (pdata)
+		uio_unregister_device(&pdata->info);
+}
+
+static struct hv_driver uio_hv_vmbus_drv = {
+	.name = "uio_hv_vmbus_client",
+	.id_table = NULL, /* only dynamic id's */
+	.probe = uio_hv_vmbus_probe,
+	.remove = uio_hv_vmbus_remove,
+};
+
+static int __init uio_hv_vmbus_init(void)
+{
+	return vmbus_driver_register(&uio_hv_vmbus_drv);
+}
+
+static void __exit uio_hv_vmbus_exit(void)
+{
+	vmbus_driver_unregister(&uio_hv_vmbus_drv);
+}
+
+module_init(uio_hv_vmbus_init);
+module_exit(uio_hv_vmbus_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
-- 
2.34.1


^ permalink raw reply related

* [PATCH 0/5] UIO driver for low speed Hyper-V devices
From: Saurabh Sengar @ 2023-06-02  7:57 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, linux-kernel,
	linux-hyperv

Hyper-V is adding some low speed "specialty" synthetic devices.
This patch series propose the solution to support these devices.
Instead of writing new kernel-level VMBus drivers for all of
these devices, we propose a solution wherein these devices are
made accessible to user space through a dedicated UIO-based
hv_vmbus_client driver, allowing for efficient device handling
via user space drivers. This solution aims to optimize the
development process by eliminating the need to create
individual kernel-level VMBus drivers for each device and
provide flexibility to user space applications to control the
ring buffer independently.

Since all these new synthetic devices are low speed devices,
they don't support monitor bits and we must use vmbus_setevent()
to enable interrupts from the host. The new uio driver supports
all these requirements effectively. Additionally, this new driver
also provide the support for having smaller/cutom ringbuffer
size.

Furthermore, this patch series includes a revision of the fcopy
application to leverage the new interface seamlessly along with
removal of old driver and application. However, please note that
the development of other similar drivers is still a work in
progress, and will be shared as they become available.

Saurabh Sengar (5):
  uio: Add hv_vmbus_client driver
  tools: hv: Add vmbus_bufring
  tools: hv: Add new fcopy application based on uio driver
  tools: hv: Remove hv_fcopy_daemon
  Drivers: hv: Remove fcopy driver

 drivers/hv/Makefile               |   2 +-
 drivers/hv/hv_fcopy.c             | 427 --------------------------
 drivers/hv/hv_util.c              |  12 -
 drivers/uio/Kconfig               |  12 +
 drivers/uio/Makefile              |   1 +
 drivers/uio/uio_hv_vmbus_client.c | 232 ++++++++++++++
 tools/hv/Build                    |   3 +-
 tools/hv/Makefile                 |  10 +-
 tools/hv/hv_fcopy_daemon.c        | 266 ----------------
 tools/hv/hv_fcopy_uio_daemon.c    | 491 ++++++++++++++++++++++++++++++
 tools/hv/vmbus_bufring.c          | 324 ++++++++++++++++++++
 tools/hv/vmbus_bufring.h          | 158 ++++++++++
 12 files changed, 1226 insertions(+), 712 deletions(-)
 delete mode 100644 drivers/hv/hv_fcopy.c
 create mode 100644 drivers/uio/uio_hv_vmbus_client.c
 delete mode 100644 tools/hv/hv_fcopy_daemon.c
 create mode 100644 tools/hv/hv_fcopy_uio_daemon.c
 create mode 100644 tools/hv/vmbus_bufring.c
 create mode 100644 tools/hv/vmbus_bufring.h

-- 
2.34.1

^ permalink raw reply

* Re: [ANNOUNCE] KVM Microconference at LPC 2023
From: Sean Christopherson @ 2023-06-02  0:23 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Paolo Bonzini, James Morris, Marc Zyngier, Borislav Petkov,
	Dave Hansen, H . Peter Anvin, Ingo Molnar, Kees Cook,
	Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li, Alexander Graf,
	Forrest Yuan Yu, John Andersen, Liran Alon,
	Madhavan T . Venkataraman, Marian Rotariu, Mihai Donțu,
	Nicușor Cîțu, Rick Edgecombe, Thara Gopinath,
	Will Deacon, Zahra Tarkhani, Ștefan Șicleru, dev, kvm,
	linux-hardening, linux-hyperv, linux-kernel,
	linux-security-module, qemu-devel, virtualization, x86, xen-devel
In-Reply-To: <9a4edc66-a0a3-73e4-09c5-db68d4cfbb68@digikod.net>

On Thu, Jun 01, 2023, Mickaï¿½l Salaï¿½n wrote:
> Hi,
> 
> What is the status of this microconference proposal? We'd be happy to talk
> about Heki [1] and potentially other hypervisor supports.

Proposal submitted (deadline is/was today), now we wait :-)  IIUC, we should find
out rather quickly whether or not the KVM MC is a go.

^ permalink raw reply

* Re: [ANNOUNCE] KVM Microconference at LPC 2023
From: Mickaël Salaün @ 2023-06-01 21:52 UTC (permalink / raw)
  To: Paolo Bonzini, James Morris
  Cc: Sean Christopherson, Marc Zyngier, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Ingo Molnar, Kees Cook, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li, Alexander Graf, Forrest Yuan Yu,
	John Andersen, Liran Alon, Madhavan T . Venkataraman,
	Marian Rotariu, Mihai Donțu, Nicușor Cîțu,
	Rick Edgecombe, Thara Gopinath, Will Deacon, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening, linux-hyperv,
	linux-kernel, linux-security-module, qemu-devel, virtualization,
	x86, xen-devel
In-Reply-To: <4142c8dc-5385-fb1d-4f8b-2a98bb3f99af@digikod.net>

Hi,

What is the status of this microconference proposal? We'd be happy to 
talk about Heki [1] and potentially other hypervisor supports.

Regards,
  Mickaël


[1] https://lore.kernel.org/all/20230505152046.6575-1-mic@digikod.net/


On 26/05/2023 18:09, Mickaël Salaün wrote:
> See James Morris's proposal here:
> https://lore.kernel.org/all/17f62cb1-a5de-2020-2041-359b8e96b8c0@linux.microsoft.com/
> 
> On 26/05/2023 04:36, James Morris wrote:
>   > [Side topic]
>   >
>   > Would folks be interested in a Linux Plumbers Conference MC on this
>   > topic generally, across different hypervisors, VMMs, and architectures?
>   >
>   > If so, please let me know who the key folk would be and we can try
> writing
>   > up an MC proposal.
> 
> The fine-grain memory management proposal from James Gowans looks
> interesting, especially the "side-car" virtual machines:
> https://lore.kernel.org/all/88db2d9cb42e471692ff1feb0b9ca855906a9d95.camel@amazon.com/
> 
> 
> On 09/05/2023 11:55, Paolo Bonzini wrote:
>> Hi all!
>>
>> We are planning on submitting a CFP to host a KVM Microconference at
>> Linux Plumbers Conference 2023. To help justify the proposal, we would
>> like to gather a list of folks that would likely attend, and crowdsource
>> a list of topics to include in the proposal.
>>
>> For both this year and future years, the intent is that a KVM
>> Microconference will complement KVM Forum, *NOT* supplant it. As you
>> probably noticed, KVM Forum is going through a somewhat radical change in
>> how it's organized; the conference is now free and (with some help from
>> Red Hat) organized directly by the KVM and QEMU communities. Despite the
>> unexpected changes and some teething pains, community response to KVM
>> Forum continues to be overwhelmingly positive! KVM Forum will remain
>> the venue of choice for KVM/userspace collaboration, for educational
>> content covering both KVM and userspace, and to discuss new features in
>> QEMU and other userspace projects.
>>
>> At least on the x86 side, however, the success of KVM Forum led us
>> virtualization folks to operate in relative isolation. KVM depends on
>> and impacts multiple subsystems (MM, scheduler, perf) in profound ways,
>> and recently we’ve seen more and more ideas/features that require
>> non-trivial changes outside KVM and buy-in from stakeholders that
>> (typically) do not attend KVM Forum. Linux Plumbers Conference is a
>> natural place to establish such collaboration within the kernel.
>>
>> Therefore, the aim of the KVM Microconference will be:
>> * to provide a setting in which to discuss KVM and kernel internals
>> * to increase collaboration and reduce friction with other subsystems
>> * to discuss system virtualization issues that require coordination with
>> other subsystems (such as VFIO, or guest support in arch/)
>>
>> Below is a rough draft of the planned CFP submission.
>>
>> Thanks!
>>
>> Paolo Bonzini (KVM Maintainer)
>> Sean Christopherson (KVM x86 Co-Maintainer)
>> Marc Zyngier (KVM ARM Co-Maintainer)
>>
>>
>> ===================
>> KVM Microconference
>> ===================
>>
>> KVM (Kernel-based Virtual Machine) enables the use of hardware features
>> to improve the efficiency, performance, and security of virtual machines
>> created and managed by userspace.  KVM was originally developed to host
>> and accelerate "full" virtual machines running a traditional kernel and
>> operating system, but has long since expanded to cover a wide array of use
>> cases, e.g. hosting real time workloads, sandboxing untrusted workloads,
>> deprivileging third party code, reducing the trusted computed base of
>> security sensitive workloads, etc.  As KVM's use cases have grown, so too
>> have the requirements placed on KVM and the interactions between it and
>> other kernel subsystems.
>>
>> The KVM Microconference will focus on how to evolve KVM and adjacent
>> subsystems in order to satisfy new and upcoming requirements: serving
>> guest memory that cannot be accessed by host userspace[1], providing
>> accurate, feature-rich PMU/perf virtualization in cloud VMs[2], etc.
>>
>>
>> Potential Topics:
>>      - Serving inaccessible/unmappable memory for KVM guests (protected VMs)
>>      - Optimizing mmu_notifiers, e.g. reducing TLB flushes and spurious zapping
>>      - Supporting multiple KVM modules (for non-disruptive upgrades)
>>      - Improving and hardening KVM+perf interactions
>>      - Implementing arch-agnostic abstractions in KVM (e.g. MMU)
>>      - Defining KVM requirements for hardware vendors
>>      - Utilizing "fault" injection to increase test coverage of edge cases
>>      - KVM vs VFIO (e.g. memory types, a rather hot topic on the ARM side)
>>
>>
>> Key Attendees:
>>      - Paolo Bonzini <pbonzini@redhat.com> (KVM Maintainer)
>>      - Sean Christopherson <seanjc@google.com>  (KVM x86 Co-Maintainer)
>>      - Your name could be here!
>>
>> [1] https://lore.kernel.org/all/20221202061347.1070246-1-chao.p.peng@linux.intel.com
>> [2] https://lore.kernel.org/all/CALMp9eRBOmwz=mspp0m5Q093K3rMUeAsF3vEL39MGV5Br9wEQQ@mail.gmail.com
>>
>>

^ permalink raw reply

* Re: [PATCH v2] RDMA/mana_ib: Use v2 version of cfg_rx_steer_req to enable RX coalescing
From: Jason Gunthorpe @ 2023-06-01 16:09 UTC (permalink / raw)
  To: longli
  Cc: Leon Romanovsky, Ajay Sharma, Dexuan Cui, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-rdma, linux-hyperv, netdev,
	linux-kernel, Long Li
In-Reply-To: <1684045095-31228-1-git-send-email-longli@linuxonhyperv.com>

On Sat, May 13, 2023 at 11:18:15PM -0700, longli@linuxonhyperv.com wrote:
> From: Long Li <longli@microsoft.com>
> 
> With RX coalescing, one CQE entry can be used to indicate multiple packets
> on the receive queue. This saves processing time and PCI bandwidth over
> the CQ.
> 
> The MANA Ethernet driver also uses the v2 version of the protocol. It
> doesn't use RX coalescing and its behavior is not changed.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
> 
> Change log
> v2: remove the definition of v1 protocol
> 
>  drivers/infiniband/hw/mana/qp.c               | 5 ++++-
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 5 ++++-
>  include/net/mana/mana.h                       | 4 +++-
>  3 files changed, 11 insertions(+), 3 deletions(-)

Applied to for-next

Thanks,
Jason

^ permalink raw reply

* [PATCH 9/9] x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20230601151624.1757616-1-ltykernel@gmail.com>

From: Tianyu Lan <tiala@microsoft.com>

Add Hyperv-specific handling for faults caused by VMMCALL
instructions.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/kernel/cpu/mshyperv.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index fd37f47de134..eaa98100f354 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -32,6 +32,7 @@
 #include <asm/nmi.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/numa.h>
+#include <asm/svm.h>
 
 /* Is Linux running as the root partition? */
 bool hv_root_partition;
@@ -576,6 +577,20 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
 	return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
 }
 
+static void hv_sev_es_hcall_prepare(struct ghcb *ghcb, struct pt_regs *regs)
+{
+	/* RAX and CPL are already in the GHCB */
+	ghcb_set_rcx(ghcb, regs->cx);
+	ghcb_set_rdx(ghcb, regs->dx);
+	ghcb_set_r8(ghcb, regs->r8);
+}
+
+static bool hv_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
+{
+	/* No checking of the return state needed */
+	return true;
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
 	.name			= "Microsoft Hyper-V",
 	.detect			= ms_hyperv_platform,
@@ -583,4 +598,6 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
 	.init.x2apic_available	= ms_hyperv_x2apic_available,
 	.init.msi_ext_dest_id	= ms_hyperv_msi_ext_dest_id,
 	.init.init_platform	= ms_hyperv_init_platform,
+	.runtime.sev_es_hcall_prepare = hv_sev_es_hcall_prepare,
+	.runtime.sev_es_hcall_finish = hv_sev_es_hcall_finish,
 };
-- 
2.25.1


^ permalink raw reply related

* [PATCH 7/9] x86/hyperv: Initialize cpu and memory for SEV-SNP enlightened guest
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20230601151624.1757616-1-ltykernel@gmail.com>

From: Tianyu Lan <tiala@microsoft.com>

Hyper-V enlightened guest doesn't have boot loader support.
Boot Linux kernel directly from hypervisor with data(kernel
image, initrd and parameter page) and memory for boot up that
is initialized via AMD SEV PSP proctol LAUNCH_UPDATE_DATA
(Please refernce https://www.amd.com/system/files/TechDocs/
55766_SEV-KM_API_Specification.pdf 1.3.1 Launch).

Kernel needs to read processor and memory info from EN_SEV_
SNP_PROCESSOR/MEM_INFO_ADDR address which are populated by Hyper-V.
Initialize smp cpu related ops, validate system memory and add
it into e820 table.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/hyperv/ivm.c           | 93 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h | 17 ++++++
 arch/x86/kernel/cpu/mshyperv.c  |  3 ++
 3 files changed, 113 insertions(+)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 5d3ee3124e00..e735507d0f54 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -17,6 +17,11 @@
 #include <asm/mem_encrypt.h>
 #include <asm/mshyperv.h>
 #include <asm/hypervisor.h>
+#include <asm/coco.h>
+#include <asm/io_apic.h>
+#include <asm/sev.h>
+#include <asm/realmode.h>
+#include <asm/e820/api.h>
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
@@ -57,6 +62,8 @@ union hv_ghcb {
 
 static u16 hv_ghcb_version __ro_after_init;
 
+static u32 processor_count;
+
 u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
 {
 	union hv_ghcb *hv_ghcb;
@@ -356,6 +363,92 @@ static bool hv_is_private_mmio(u64 addr)
 	return false;
 }
 
+static __init void hv_snp_get_smp_config(unsigned int early)
+{
+	/*
+	 * The "early" is only to be true when there is AMD
+	 * numa support. Hyper-V AMD SEV-SNP guest may not
+	 * have numa support. To make sure smp config is
+	 * always initialized, do that when early is false.
+	 */
+	if (early)
+		return;
+
+	/*
+	 * There is no firmware and ACPI MADT table support in
+	 * in the Hyper-V SEV-SNP enlightened guest. Set smp
+	 * related config variable here.
+	 */
+	while (num_processors < processor_count) {
+		early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
+		early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
+		physid_set(num_processors, phys_cpu_present_map);
+		set_cpu_possible(num_processors, true);
+		set_cpu_present(num_processors, true);
+		num_processors++;
+	}
+}
+
+__init void hv_sev_init_mem_and_cpu(void)
+{
+	struct memory_map_entry *entry;
+	struct e820_entry *e820_entry;
+	u64 e820_end;
+	u64 ram_end;
+	u64 page;
+
+	/*
+	 * Hyper-V enlightened snp guest boots kernel
+	 * directly without bootloader. So roms, bios
+	 * regions and reserve resources are not available.
+	 * Set these callback to NULL.
+	 */
+	x86_platform.legacy.rtc			= 0;
+	x86_platform.legacy.reserve_bios_regions = 0;
+	x86_platform.set_wallclock		= set_rtc_noop;
+	x86_platform.get_wallclock		= get_rtc_noop;
+	x86_init.resources.probe_roms		= x86_init_noop;
+	x86_init.resources.reserve_resources	= x86_init_noop;
+	x86_init.mpparse.find_smp_config	= x86_init_noop;
+	x86_init.mpparse.get_smp_config		= hv_snp_get_smp_config;
+
+	/*
+	 * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
+	 * and legacy APIC page read/write. Switch to hv apic here.
+	 */
+	disable_ioapic_support();
+
+	/* Get processor and mem info. */
+	processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
+	entry = (struct memory_map_entry *)__va(EN_SEV_SNP_MEM_INFO_ADDR);
+
+	/*
+	 * There is no bootloader/EFI firmware in the SEV SNP guest.
+	 * E820 table in the memory just describes memory for kernel,
+	 * ACPI table, cmdline, boot params and ramdisk. The dynamic
+	 * data(e.g, vcpu number and the rest memory layout) needs to
+	 * be read from EN_SEV_SNP_PROCESSOR_INFO_ADDR.
+	 */
+	for (; entry->numpages != 0; entry++) {
+		e820_entry = &e820_table->entries[
+				e820_table->nr_entries - 1];
+		e820_end = e820_entry->addr + e820_entry->size;
+		ram_end = (entry->starting_gpn +
+			   entry->numpages) * PAGE_SIZE;
+
+		if (e820_end < entry->starting_gpn * PAGE_SIZE)
+			e820_end = entry->starting_gpn * PAGE_SIZE;
+
+		if (e820_end < ram_end) {
+			pr_info("Hyper-V: add e820 entry [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);
+			e820__range_add(e820_end, ram_end - e820_end,
+					E820_TYPE_RAM);
+			for (page = e820_end; page < ram_end; page += PAGE_SIZE)
+				pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
+		}
+	}
+}
+
 void __init hv_vtom_init(void)
 {
 	/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index d859d7c5f5e8..7a9a6cdc2ae9 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -50,6 +50,21 @@ extern bool hv_isolation_type_en_snp(void);
 
 extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
+/*
+ * Hyper-V puts processor and memory layout info
+ * to this address in SEV-SNP enlightened guest.
+ */
+#define EN_SEV_SNP_PROCESSOR_INFO_ADDR  0x802000
+#define EN_SEV_SNP_MEM_INFO_ADDR	0x802018
+
+struct memory_map_entry {
+	u64 starting_gpn;
+	u64 numpages;
+	u16 type;
+	u16 flags;
+	u32 reserved;
+};
+
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -255,12 +270,14 @@ void hv_ghcb_msr_read(u64 msr, u64 *value);
 bool hv_ghcb_negotiate_protocol(void);
 void __noreturn hv_ghcb_terminate(unsigned int set, unsigned int reason);
 void hv_vtom_init(void);
+void hv_sev_init_mem_and_cpu(void);
 #else
 static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
 static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
 static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
 static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
 static inline void hv_vtom_init(void) {}
+static inline void hv_sev_init_mem_and_cpu(void) {}
 #endif
 
 extern bool hv_isolation_type_snp(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 9186453251f7..48b9eab3daf6 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -528,6 +528,9 @@ static void __init ms_hyperv_init_platform(void)
 	if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
 		mark_tsc_unstable("running on Hyper-V");
 
+	if (hv_isolation_type_en_snp())
+		hv_sev_init_mem_and_cpu();
+
 	hardlockup_detector_disable();
 }
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH 8/9] x86/hyperv: Add smp support for SEV-SNP guest
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20230601151624.1757616-1-ltykernel@gmail.com>

From: Tianyu Lan <tiala@microsoft.com>

In the AMD SEV-SNP guest, AP needs to be started up via sev es
save area and Hyper-V requires to call HVCALL_START_VP hypercall
to pass the gpa of sev es save area with AP's vp index and VTL(Virtual
trust level) parameters. Override wakeup_secondary_cpu_64 callback
with hv_snp_boot_ap.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/hyperv/ivm.c             | 95 +++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h   |  9 +++
 arch/x86/kernel/cpu/mshyperv.c    | 13 ++++-
 include/asm-generic/hyperv-tlfs.h |  1 +
 4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index e735507d0f54..238d73752dd8 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -22,11 +22,15 @@
 #include <asm/sev.h>
 #include <asm/realmode.h>
 #include <asm/e820/api.h>
+#include <asm/desc.h>
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 #define GHCB_USAGE_HYPERV_CALL	1
 
+static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
+static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
+
 union hv_ghcb {
 	struct ghcb ghcb;
 	struct {
@@ -449,6 +453,97 @@ __init void hv_sev_init_mem_and_cpu(void)
 	}
 }
 
+#define hv_populate_vmcb_seg(seg, gdtr_base)			\
+do {								\
+	if (seg.selector) {					\
+		seg.base = 0;					\
+		seg.limit = HV_AP_SEGMENT_LIMIT;		\
+		seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5);	\
+		seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
+	}							\
+} while (0)							\
+
+int hv_snp_boot_ap(int cpu, unsigned long start_ip)
+{
+	struct sev_es_save_area *vmsa = (struct sev_es_save_area *)
+		__get_free_page(GFP_KERNEL | __GFP_ZERO);
+	struct desc_ptr gdtr;
+	u64 ret, rmp_adjust, retry = 5;
+	struct hv_enable_vp_vtl *start_vp_input;
+	unsigned long flags;
+
+	native_store_gdt(&gdtr);
+
+	vmsa->gdtr.base = gdtr.address;
+	vmsa->gdtr.limit = gdtr.size;
+
+	asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
+	hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
+
+	asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
+	hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
+
+	asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
+	hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
+
+	asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
+	hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
+
+	vmsa->efer = native_read_msr(MSR_EFER);
+
+	asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
+	asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
+	asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
+
+	vmsa->xcr0 = 1;
+	vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
+	vmsa->rip = (u64)secondary_startup_64_no_verify;
+	vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
+
+	/*
+	 * Set the SNP-specific fields for this VMSA:
+	 *   VMPL level
+	 *   SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
+	 */
+	vmsa->vmpl = 0;
+	vmsa->sev_features = sev_status >> 2;
+
+	/*
+	 * Running at VMPL0 allows the kernel to change the VMSA bit for a page
+	 * using the RMPADJUST instruction. However, for the instruction to
+	 * succeed it must target the permissions of a lesser privileged
+	 * (higher numbered) VMPL level, so use VMPL1 (refer to the RMPADJUST
+	 * instruction in the AMD64 APM Volume 3).
+	 */
+	rmp_adjust = RMPADJUST_VMSA_PAGE_BIT | 1;
+	ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
+			rmp_adjust);
+	if (ret != 0) {
+		pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
+		return ret;
+	}
+
+	local_irq_save(flags);
+	start_vp_input =
+		(struct hv_enable_vp_vtl *)ap_start_input_arg;
+	memset(start_vp_input, 0, sizeof(*start_vp_input));
+	start_vp_input->partition_id = -1;
+	start_vp_input->vp_index = cpu;
+	start_vp_input->target_vtl.target_vtl = ms_hyperv.vtl;
+	*(u64 *)&start_vp_input->vp_context = __pa(vmsa) | 1;
+
+	do {
+		ret = hv_do_hypercall(HVCALL_START_VP,
+				      start_vp_input, NULL);
+	} while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
+
+	local_irq_restore(flags);
+
+	if (!hv_result_success(ret))
+		pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
+	return ret;
+}
+
 void __init hv_vtom_init(void)
 {
 	/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 7a9a6cdc2ae9..804c67475054 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -65,6 +65,13 @@ struct memory_map_entry {
 	u32 reserved;
 };
 
+/*
+ * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
+ * to start AP in enlightened SEV guest.
+ */
+#define HV_AP_INIT_GPAT_DEFAULT		0x0007040600070406ULL
+#define HV_AP_SEGMENT_LIMIT		0xffffffff
+
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -271,6 +278,7 @@ bool hv_ghcb_negotiate_protocol(void);
 void __noreturn hv_ghcb_terminate(unsigned int set, unsigned int reason);
 void hv_vtom_init(void);
 void hv_sev_init_mem_and_cpu(void);
+int hv_snp_boot_ap(int cpu, unsigned long start_ip);
 #else
 static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
 static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
@@ -278,6 +286,7 @@ static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
 static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
 static inline void hv_vtom_init(void) {}
 static inline void hv_sev_init_mem_and_cpu(void) {}
+static int hv_snp_boot_ap(int cpu, unsigned long start_ip) {}
 #endif
 
 extern bool hv_isolation_type_snp(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 48b9eab3daf6..fd37f47de134 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -295,6 +295,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
 
 	native_smp_prepare_cpus(max_cpus);
 
+	/*
+	 *  Override wakeup_secondary_cpu_64 callback for SEV-SNP
+	 *  enlightened guest.
+	 */
+	if (hv_isolation_type_en_snp())
+		apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
+
+	if (!hv_root_partition)
+		return;
+
 #ifdef CONFIG_X86_64
 	for_each_present_cpu(i) {
 		if (i == 0)
@@ -501,8 +511,7 @@ static void __init ms_hyperv_init_platform(void)
 
 # ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
-	if (hv_root_partition)
-		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
+	smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
 # endif
 
 	/*
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index f4e4cc4f965f..fdac4a1714ec 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -223,6 +223,7 @@ enum HV_GENERIC_SET_FORMAT {
 #define HV_STATUS_INVALID_PORT_ID		17
 #define HV_STATUS_INVALID_CONNECTION_ID		18
 #define HV_STATUS_INSUFFICIENT_BUFFERS		19
+#define HV_STATUS_TIME_OUT                      120
 #define HV_STATUS_VTL_ALREADY_ENABLED		134
 
 /*
-- 
2.25.1


^ permalink raw reply related

* [PATCH 6/9] clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20230601151624.1757616-1-ltykernel@gmail.com>

From: Tianyu Lan <tiala@microsoft.com>

Hyper-V tsc page is shared with hypervisor and mark the page
unencrypted in sev-snp enlightened guest when it's used.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 drivers/clocksource/hyperv_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index bcd9042a0c9f..66e29a19770b 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -376,7 +376,7 @@ EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
 static union {
 	struct ms_hyperv_tsc_page page;
 	u8 reserved[PAGE_SIZE];
-} tsc_pg __aligned(PAGE_SIZE);
+} tsc_pg __bss_decrypted __aligned(PAGE_SIZE);
 
 static struct ms_hyperv_tsc_page *tsc_page = &tsc_pg.page;
 static unsigned long tsc_pfn;
-- 
2.25.1


^ permalink raw reply related

* [PATCH 5/9] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20230601151624.1757616-1-ltykernel@gmail.com>

From: Tianyu Lan <tiala@microsoft.com>

In sev-snp enlightened guest, Hyper-V hypercall needs
to use vmmcall to trigger vmexit and notify hypervisor
to handle hypercall request.

There is no x86 SEV SNP feature flag support so far and
hardware provides MSR_AMD64_SEV register to check SEV-SNP
capability with MSR_AMD64_SEV_ENABLED bit. ALTERNATIVE can't
work without SEV-SNP x86 feature flag. May add later when
the associated flag is introduced. 

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/include/asm/mshyperv.h | 44 ++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 31c476f4e656..d859d7c5f5e8 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -61,16 +61,25 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	if (!hv_hypercall_pg)
-		return U64_MAX;
+	if (hv_isolation_type_en_snp()) {
+		__asm__ __volatile__("mov %4, %%r8\n"
+				     "vmmcall"
+				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+				       "+c" (control), "+d" (input_address)
+				     :  "r" (output_address)
+				     : "cc", "memory", "r8", "r9", "r10", "r11");
+	} else {
+		if (!hv_hypercall_pg)
+			return U64_MAX;
 
-	__asm__ __volatile__("mov %4, %%r8\n"
-			     CALL_NOSPEC
-			     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
-			       "+c" (control), "+d" (input_address)
-			     :  "r" (output_address),
-				THUNK_TARGET(hv_hypercall_pg)
-			     : "cc", "memory", "r8", "r9", "r10", "r11");
+		__asm__ __volatile__("mov %4, %%r8\n"
+				     CALL_NOSPEC
+				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+				       "+c" (control), "+d" (input_address)
+				     :  "r" (output_address),
+					THUNK_TARGET(hv_hypercall_pg)
+				     : "cc", "memory", "r8", "r9", "r10", "r11");
+	}
 #else
 	u32 input_address_hi = upper_32_bits(input_address);
 	u32 input_address_lo = lower_32_bits(input_address);
@@ -104,7 +113,13 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	{
+	if (hv_isolation_type_en_snp()) {
+		__asm__ __volatile__(
+				"vmmcall"
+				: "=a" (hv_status), ASM_CALL_CONSTRAINT,
+				"+c" (control), "+d" (input1)
+				:: "cc", "r8", "r9", "r10", "r11");
+	} else {
 		__asm__ __volatile__(CALL_NOSPEC
 				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
 				       "+c" (control), "+d" (input1)
@@ -149,7 +164,14 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	{
+	if (hv_isolation_type_en_snp()) {
+		__asm__ __volatile__("mov %4, %%r8\n"
+				     "vmmcall"
+				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+				       "+c" (control), "+d" (input1)
+				     : "r" (input2)
+				     : "cc", "r8", "r9", "r10", "r11");
+	} else {
 		__asm__ __volatile__("mov %4, %%r8\n"
 				     CALL_NOSPEC
 				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
-- 
2.25.1


^ permalink raw reply related

* [PATCH 4/9] drivers: hv: Mark shared pages unencrypted in SEV-SNP enlightened guest
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20230601151624.1757616-1-ltykernel@gmail.com>

From: Tianyu Lan <tiala@microsoft.com>

Hypervisor needs to access iput arg, VMBus synic event and
message pages. Mask these pages unencrypted in the sev-snp
guest and free them only if they have been marked encrypted
successfully.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 drivers/hv/hv.c        | 57 +++++++++++++++++++++++++++++++++++++++---
 drivers/hv/hv_common.c | 24 +++++++++++++++++-
 2 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index de6708dbe0df..94406dbe0df0 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -20,6 +20,7 @@
 #include <linux/interrupt.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/mshyperv.h>
+#include <linux/set_memory.h>
 #include "hyperv_vmbus.h"
 
 /* The one and only */
@@ -78,7 +79,7 @@ int hv_post_message(union hv_connection_id connection_id,
 
 int hv_synic_alloc(void)
 {
-	int cpu;
+	int cpu, ret = -ENOMEM;
 	struct hv_per_cpu_context *hv_cpu;
 
 	/*
@@ -123,26 +124,76 @@ int hv_synic_alloc(void)
 				goto err;
 			}
 		}
+
+		if (hv_isolation_type_en_snp()) {
+			ret = set_memory_decrypted((unsigned long)
+				hv_cpu->synic_message_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt SYNIC msg page: %d\n", ret);
+				hv_cpu->synic_message_page = NULL;
+
+				/*
+				 * Free the event page here and not encrypt
+				 * the page in hv_synic_free().
+				 */
+				free_page((unsigned long)hv_cpu->synic_event_page);
+				hv_cpu->synic_event_page = NULL;
+				goto err;
+			}
+
+			ret = set_memory_decrypted((unsigned long)
+				hv_cpu->synic_event_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt SYNIC event page: %d\n", ret);
+				hv_cpu->synic_event_page = NULL;
+				goto err;
+			}
+
+			memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+			memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+		}
 	}
 
 	return 0;
+
 err:
 	/*
 	 * Any memory allocations that succeeded will be freed when
 	 * the caller cleans up by calling hv_synic_free()
 	 */
-	return -ENOMEM;
+	return ret;
 }
 
 
 void hv_synic_free(void)
 {
-	int cpu;
+	int cpu, ret;
 
 	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
+		/* It's better to leak the page if the encryption fails. */
+		if (hv_isolation_type_en_snp()) {
+			if (hv_cpu->synic_message_page) {
+				ret = set_memory_encrypted((unsigned long)
+					hv_cpu->synic_message_page, 1);
+				if (ret) {
+					pr_err("Failed to encrypt SYNIC msg page: %d\n", ret);
+					hv_cpu->synic_message_page = NULL;
+				}
+			}
+
+			if (hv_cpu->synic_event_page) {
+				ret = set_memory_encrypted((unsigned long)
+					hv_cpu->synic_event_page, 1);
+				if (ret) {
+					pr_err("Failed to encrypt SYNIC event page: %d\n", ret);
+					hv_cpu->synic_event_page = NULL;
+				}
+			}
+		}
+
 		free_page((unsigned long)hv_cpu->synic_event_page);
 		free_page((unsigned long)hv_cpu->synic_message_page);
 	}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 179bc5f5bf52..bed9aa6ac19a 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -24,6 +24,7 @@
 #include <linux/kmsg_dump.h>
 #include <linux/slab.h>
 #include <linux/dma-map-ops.h>
+#include <linux/set_memory.h>
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
 
@@ -359,6 +360,7 @@ int hv_common_cpu_init(unsigned int cpu)
 	u64 msr_vp_index;
 	gfp_t flags;
 	int pgcount = hv_root_partition ? 2 : 1;
+	int ret;
 
 	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
 	flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
@@ -368,6 +370,17 @@ int hv_common_cpu_init(unsigned int cpu)
 	if (!(*inputarg))
 		return -ENOMEM;
 
+	if (hv_isolation_type_en_snp()) {
+		ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
+		if (ret) {
+			kfree(*inputarg);
+			*inputarg = NULL;
+			return ret;
+		}
+
+		memset(*inputarg, 0x00, pgcount * PAGE_SIZE);
+	}
+
 	if (hv_root_partition) {
 		outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
 		*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
@@ -387,7 +400,9 @@ int hv_common_cpu_die(unsigned int cpu)
 {
 	unsigned long flags;
 	void **inputarg, **outputarg;
+	int pgcount = hv_root_partition ? 2 : 1;
 	void *mem;
+	int ret;
 
 	local_irq_save(flags);
 
@@ -402,7 +417,14 @@ int hv_common_cpu_die(unsigned int cpu)
 
 	local_irq_restore(flags);
 
-	kfree(mem);
+	if (hv_isolation_type_en_snp()) {
+		ret = set_memory_encrypted((unsigned long)mem, pgcount);
+		if (ret)
+			pr_warn("Hyper-V: Failed to encrypt input arg on cpu%d: %d\n",
+				cpu, ret);
+		/* It's unsafe to free 'mem'. */
+		return 0;
+	}
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related

* [PATCH 3/9] x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP enlightened guest
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20230601151624.1757616-1-ltykernel@gmail.com>

From: Tianyu Lan <tiala@microsoft.com>

hv vp assist page needs to be shared between SEV-SNP guest and Hyper-V.
So mark the page unencrypted in the SEV-SNP guest.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b4a2327c823b..331b855314b7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -18,6 +18,7 @@
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
 #include <asm/idtentry.h>
+#include <asm/set_memory.h>
 #include <linux/kexec.h>
 #include <linux/version.h>
 #include <linux/vmalloc.h>
@@ -113,6 +114,11 @@ static int hv_cpu_init(unsigned int cpu)
 
 	}
 	if (!WARN_ON(!(*hvp))) {
+		if (hv_isolation_type_en_snp()) {
+			WARN_ON_ONCE(set_memory_decrypted((unsigned long)(*hvp), 1));
+			memset(*hvp, 0, PAGE_SIZE);
+		}
+
 		msr.enable = 1;
 		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
 	}
-- 
2.25.1


^ permalink raw reply related

* [PATCH 2/9] x86/hyperv: Set Virtual Trust Level in VMBus init message
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20230601151624.1757616-1-ltykernel@gmail.com>

From: Tianyu Lan <tiala@microsoft.com>

SEV-SNP guest provides vtl(Virtual Trust Level) and
get it from Hyper-V hvcall via register hvcall HVCALL_
GET_VP_REGISTERS.

During initialization of VMBus, vtl needs to be set in the
VMBus init message.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/hyperv/hv_init.c          | 36 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  7 ++++++
 drivers/hv/connection.c            |  1 +
 include/asm-generic/mshyperv.h     |  1 +
 include/linux/hyperv.h             |  4 ++--
 5 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a5f9474f08e1..b4a2327c823b 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -378,6 +378,40 @@ static void __init hv_get_partition_id(void)
 	local_irq_restore(flags);
 }
 
+static u8 __init get_vtl(void)
+{
+	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
+	struct hv_get_vp_registers_input *input;
+	struct hv_get_vp_registers_output *output;
+	u64 vtl = 0;
+	u64 ret;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	output = (struct hv_get_vp_registers_output *)input;
+	if (!input) {
+		local_irq_restore(flags);
+		goto done;
+	}
+
+	memset(input, 0, struct_size(input, element, 1));
+	input->header.partitionid = HV_PARTITION_ID_SELF;
+	input->header.vpindex = HV_VP_INDEX_SELF;
+	input->header.inputvtl = 0;
+	input->element[0].name0 = HV_X64_REGISTER_VSM_VP_STATUS;
+
+	ret = hv_do_hypercall(control, input, output);
+	if (hv_result_success(ret))
+		vtl = output->as64.low & HV_X64_VTL_MASK;
+	else
+		pr_err("Hyper-V: failed to get VTL! %lld", ret);
+	local_irq_restore(flags);
+
+done:
+	return vtl;
+}
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -506,6 +540,8 @@ void __init hyperv_init(void)
 	/* Query the VMs extended capability once, so that it can be cached. */
 	hv_query_ext_cap(0);
 
+	/* Find the VTL */
+	ms_hyperv.vtl = get_vtl();
 	return;
 
 clean_guest_os_id:
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index cea95dcd27c2..4bf0b315b0ce 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -301,6 +301,13 @@ enum hv_isolation_type {
 #define HV_X64_MSR_TIME_REF_COUNT	HV_REGISTER_TIME_REF_COUNT
 #define HV_X64_MSR_REFERENCE_TSC	HV_REGISTER_REFERENCE_TSC
 
+/*
+ * Registers are only accessible via HVCALL_GET_VP_REGISTERS hvcall and
+ * there is not associated MSR address.
+ */
+#define	HV_X64_REGISTER_VSM_VP_STATUS	0x000D0003
+#define	HV_X64_VTL_MASK			GENMASK(3, 0)
+
 /* Hyper-V memory host visibility */
 enum hv_mem_host_visibility {
 	VMBUS_PAGE_NOT_VISIBLE		= 0,
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5978e9dbc286..02b54f85dc60 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -98,6 +98,7 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 	 */
 	if (version >= VERSION_WIN10_V5) {
 		msg->msg_sint = VMBUS_MESSAGE_SINT;
+		msg->msg_vtl = ms_hyperv.vtl;
 		vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID_4;
 	} else {
 		msg->interrupt_page = virt_to_phys(vmbus_connection.int_page);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index d444f831d633..c7a90f91c0d3 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -54,6 +54,7 @@ struct ms_hyperv_info {
 		};
 	};
 	u64 shared_gpa_boundary;
+	u8 vtl;
 };
 extern struct ms_hyperv_info ms_hyperv;
 extern bool hv_nested;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index bfbc37ce223b..1f2bfec4abde 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -665,8 +665,8 @@ struct vmbus_channel_initiate_contact {
 		u64 interrupt_page;
 		struct {
 			u8	msg_sint;
-			u8	padding1[3];
-			u32	padding2;
+			u8	msg_vtl;
+			u8	reserved[6];
 		};
 	};
 	u64 monitor_page1;
-- 
2.25.1


^ permalink raw reply related

* [PATCH 1/9] x86/hyperv: Add sev-snp enlightened guest static key
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20230601151624.1757616-1-ltykernel@gmail.com>

From: Tianyu Lan <tiala@microsoft.com>

Introduce static key isolation_type_en_snp for enlightened
sev-snp guest check.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/hyperv/ivm.c           | 11 +++++++++++
 arch/x86/include/asm/mshyperv.h |  3 +++
 arch/x86/kernel/cpu/mshyperv.c  |  8 ++++++--
 drivers/hv/hv_common.c          |  6 ++++++
 include/asm-generic/mshyperv.h  | 12 +++++++++---
 5 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index cc92388b7a99..5d3ee3124e00 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -409,3 +409,14 @@ bool hv_isolation_type_snp(void)
 {
 	return static_branch_unlikely(&isolation_type_snp);
 }
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_en_snp);
+/*
+ * hv_isolation_type_en_snp - Check system runs in the AMD SEV-SNP based
+ * isolation enlightened VM.
+ */
+bool hv_isolation_type_en_snp(void)
+{
+	return static_branch_unlikely(&isolation_type_en_snp);
+}
+
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 49bb4f2bd300..31c476f4e656 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -26,6 +26,7 @@
 union hv_ghcb;
 
 DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+DECLARE_STATIC_KEY_FALSE(isolation_type_en_snp);
 
 typedef int (*hyperv_fill_flush_list_func)(
 		struct hv_guest_mapping_flush_list *flush,
@@ -45,6 +46,8 @@ extern void *hv_hypercall_pg;
 
 extern u64 hv_current_partition_id;
 
+extern bool hv_isolation_type_en_snp(void);
+
 extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index c7969e806c64..9186453251f7 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -402,8 +402,12 @@ static void __init ms_hyperv_init_platform(void)
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
 
-		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+
+		if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
+			static_branch_enable(&isolation_type_en_snp);
+		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
 			static_branch_enable(&isolation_type_snp);
+		}
 	}
 
 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
@@ -473,7 +477,7 @@ static void __init ms_hyperv_init_platform(void)
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	if ((hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS) ||
-	    (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP))
+	    ms_hyperv.paravisor_present)
 		hv_vtom_init();
 	/*
 	 * Setup the hook to get control post apic initialization.
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 64f9ceca887b..179bc5f5bf52 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -502,6 +502,12 @@ bool __weak hv_isolation_type_snp(void)
 }
 EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
 
+bool __weak hv_isolation_type_en_snp(void)
+{
+	return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_en_snp);
+
 void __weak hv_setup_vmbus_handler(void (*handler)(void))
 {
 }
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 402a8c1c202d..d444f831d633 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -36,15 +36,21 @@ struct ms_hyperv_info {
 	u32 nested_features;
 	u32 max_vp_index;
 	u32 max_lp_index;
-	u32 isolation_config_a;
+	union {
+		u32 isolation_config_a;
+		struct {
+			u32 paravisor_present : 1;
+			u32 reserved1 : 31;
+		};
+	};
 	union {
 		u32 isolation_config_b;
 		struct {
 			u32 cvm_type : 4;
-			u32 reserved1 : 1;
+			u32 reserved2 : 1;
 			u32 shared_gpa_boundary_active : 1;
 			u32 shared_gpa_boundary_bits : 6;
-			u32 reserved2 : 20;
+			u32 reserved3 : 20;
 		};
 	};
 	u64 shared_gpa_boundary;
-- 
2.25.1


^ permalink raw reply related

* [PATCH 0/9] x86/hyperv: Add AMD sev-snp enlightened guest support on hyperv
From: Tianyu Lan @ 2023-06-01 15:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, x86,
	hpa, daniel.lezcano, arnd, michael.h.kelley
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, vkuznets

From: Tianyu Lan <tiala@microsoft.com>

Hyper-V provides two modes for running SEV-SNP VMs:

1) In vTOM mode with a paravisor (see Section 15.36.8 of [1])
2) In "fully enlightened" mode with normal "C" bit control
   over page encryption, and no paravisor

For #1, the paravisor runs in VMPL 0, while Linux runs in VMPL 2
(see Section 15.36.7 of [1]). The paravisor is typically provided
by Hyper-V and handles most of the SNP-related functionality. As
such, most of the SNP functionality in the Linux guest is bypassed.
The guest operates in vTOM mode, where encryption is enabled by default.
The guest must still request page transitions between private and shared,
but there is relatively less SNP machinery required in the guest. Support
for this mode of operation first went upstream in the 5.15 kernel.

For #2, this patch set provides the initial support. The existing
SEV-SNP machinery in the kernel is fully used, but Hyper-V specific
updates are required to properly share Hyper-V communication pages
between the guest and host and to start APs at boot time.

In either mode, Hyper-V requires that the guest implement the SEV-SNP
Restricted Interrupt Injection feature (see Section 15.36.16 of [1],
and Section 5 of [2]). Without this feature, the guest is subject to
attack by a compromised hypervisor that can inject any exception at
any time, such as injecting an interrupt while the guest has interrupts
disabled. In vTOM mode, Restricted Interrupt Injection is implemented
by the paravisor, so no Linux guest changes are required. But in fully
enlightened mode, the Linux guest must provide the implementation.

This patch set is derived from an earlier patch set that includes both
the Hyper-V specific changes and Restricted Interrupt Injection support.[3]
But it is now limited to only the Hyper-V specific changes. The Restricted
Interrupt Injection support will come later in a separate patch set.

[1] https://www.amd.com/system/files/TechDocs/24593.pdf
[2] https://www.amd.com/system/files/TechDocs/56421-guest-hypervisor-communication-block-standardization.pdf
[3] https://lore.kernel.org/lkml/20230515165917.1306922-1-ltykernel@gmail.com/

Tianyu Lan (9):
  x86/hyperv: Add sev-snp enlightened guest static key
  x86/hyperv: Set Virtual Trust Level in VMBus init message
  x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP
    enlightened guest
  drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP
    enlightened guest
  x86/hyperv: Use vmmcall to implement Hyper-V hypercall in  sev-snp
    enlightened guest
  clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp
    enlightened guest
  x86/hyperv: Initialize cpu and memory for SEV-SNP enlightened guest
  x86/hyperv: Add smp support for SEV-SNP guest
  x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES

 arch/x86/hyperv/hv_init.c          |  42 ++++++
 arch/x86/hyperv/ivm.c              | 199 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |   7 +
 arch/x86/include/asm/mshyperv.h    |  73 +++++++++--
 arch/x86/kernel/cpu/mshyperv.c     |  41 +++++-
 drivers/clocksource/hyperv_timer.c |   2 +-
 drivers/hv/connection.c            |   1 +
 drivers/hv/hv.c                    |  57 ++++++++-
 drivers/hv/hv_common.c             |  30 ++++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  13 +-
 include/linux/hyperv.h             |   4 +-
 12 files changed, 445 insertions(+), 25 deletions(-)

-- 
2.25.1

^ permalink raw reply

* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Juergen Gross @ 2023-06-01 14:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230601143310.GDZHisJlbH5myXGQ8d@fat_crate.local>


[-- Attachment #1.1.1: Type: text/plain, Size: 806 bytes --]

On 01.06.23 16:33, Borislav Petkov wrote:
> On Thu, Jun 01, 2023 at 03:22:33PM +0200, Borislav Petkov wrote:
>> Now lemme restart testing.
> 
> This is from another box, with the latest changes incorporated:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc1-mtrr
> 
> --- proc-mtrr.before    2011-03-04 01:03:35.243994733 +0100
> +++ proc-mtrr.after     2023-06-01 16:28:54.959999456 +0200
> @@ -1,3 +1,3 @@
>   reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
>   reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
> -reg02: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-back
> +reg02: base=0x0c0000000 ( 3072MB), size=  128MB, count=1: write-back
> 
> Want mtrr=debug output again?
> 

Yes, please


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply

* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Borislav Petkov @ 2023-06-01 14:33 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230601132233.GCZHibmemkxYxvTjSu@fat_crate.local>

On Thu, Jun 01, 2023 at 03:22:33PM +0200, Borislav Petkov wrote:
> Now lemme restart testing.

This is from another box, with the latest changes incorporated:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc1-mtrr

--- proc-mtrr.before    2011-03-04 01:03:35.243994733 +0100
+++ proc-mtrr.after     2023-06-01 16:28:54.959999456 +0200
@@ -1,3 +1,3 @@
 reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
 reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
-reg02: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-back
+reg02: base=0x0c0000000 ( 3072MB), size=  128MB, count=1: write-back

Want mtrr=debug output again?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Borislav Petkov @ 2023-06-01 13:22 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <fe33476d-0ec1-16f1-5874-8a5ff1650c3f@suse.com>

On Thu, Jun 01, 2023 at 10:19:01AM +0200, Juergen Gross wrote:
> Patch 2 wants this diff on top:

Obviously. :-)

That fixes it, thx.

Now lemme restart testing.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Juergen Gross @ 2023-06-01 13:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230531174857.GDZHeIib57h5lT5Vh1@fat_crate.local>


[-- Attachment #1.1.1: Type: text/plain, Size: 481 bytes --]

On 31.05.23 19:48, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 04:20:08PM +0200, Juergen Gross wrote:
>> One other note: why does mtrr_cleanup() think that using 8 instead of 6
>> variable MTRRs would be an "optimal setting"?
> 
> Maybe the more extensive debug output below would help answer that...
> 
>> IMO it should replace the original setup only in case it is using _less_
>> MTRRs than before.
> 
> Right.

The attached patch will do that.


Juergen


[-- Attachment #1.1.2: v7-0001-x86-mtrr-Let-mtrr_cleanup-not-increase-number-of-.patch --]
[-- Type: text/x-patch, Size: 2367 bytes --]

From 7989ef9822115a708fc2ba3f7740888a350cb40f Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 1 Jun 2023 14:40:58 +0200
Subject: [PATCH v7] x86/mtrr: Let mtrr_cleanup() not increase number of used
 MTRRs

Today mtrr_cleanup() will always use the best found alternative MTRR
setting, even if this setting is using more variable MTRRs than the
BIOS provided setup.

Add a check that only settings with less variable MTRRs are used.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V7:
- new patch
---
 arch/x86/kernel/cpu/mtrr/cleanup.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index a7cb5d32d03d..a5d331722092 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -567,7 +567,7 @@ static int __init mtrr_need_cleanup(void)
 	    num_var_ranges - num[MTRR_NUM_TYPES])
 		return 0;
 
-	return 1;
+	return num_var_ranges - num[MTRR_NUM_TYPES];
 }
 
 static unsigned long __initdata range_sums;
@@ -673,6 +673,7 @@ int __init mtrr_cleanup(void)
 	u64 chunk_size, gran_size;
 	mtrr_type type;
 	int index_good;
+	int num_used;
 	int i;
 
 	if (!cpu_feature_enabled(X86_FEATURE_MTRR) || enable_mtrr_cleanup < 1)
@@ -693,7 +694,8 @@ int __init mtrr_cleanup(void)
 	}
 
 	/* Check if we need handle it and can handle it: */
-	if (!mtrr_need_cleanup())
+	num_used = mtrr_need_cleanup();
+	if (!num_used)
 		return 0;
 
 	/* Print original var MTRRs at first, for debugging: */
@@ -728,6 +730,10 @@ int __init mtrr_cleanup(void)
 		mtrr_print_out_one_result(i);
 
 		if (!result[i].bad) {
+			if (result[i].num_reg >= num_used) {
+				Dprintk("BIOS provided MTRR setting is better than found one\n");
+				return 0;
+			}
 			set_var_mtrr_all();
 			Dprintk("New variable MTRRs\n");
 			print_out_mtrr_range_state();
@@ -762,8 +768,12 @@ int __init mtrr_cleanup(void)
 	index_good = mtrr_search_optimal_index();
 
 	if (index_good != -1) {
-		pr_info("Found optimal setting for mtrr clean up\n");
 		i = index_good;
+		if (result[i].num_reg >= num_used) {
+			Dprintk("BIOS provided MTRR setting is better than found one\n");
+			return 0;
+		}
+		pr_info("Found optimal setting for mtrr clean up\n");
 		mtrr_print_out_one_result(i);
 
 		/* Convert ranges to var ranges state: */
-- 
2.35.3


[-- Attachment #1.1.3: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox