Linux-HyperV List
 help / color / mirror / Atom feed
* Re: [PATCH v4] Drivers: hv: Cosmetic changes for hv.c and balloon.c
From: Wei Liu @ 2024-05-28  5:17 UTC (permalink / raw)
  To: Aditya Nagesh
  Cc: adityanagesh, kys, haiyangz, wei.liu, decui, linux-hyperv,
	linux-kernel
In-Reply-To: <1713842326-25576-1-git-send-email-adityanagesh@linux.microsoft.com>

On Mon, Apr 22, 2024 at 08:18:46PM -0700, Aditya Nagesh wrote:
> Fix issues reported by checkpatch.pl script in hv.c and
> balloon.c
>  - Remove unnecessary parentheses
>  - Remove extra newlines
>  - Remove extra spaces
>  - Add spaces between comparison operators
>  - Remove comparison with NULL in if statements
> 
> No functional changes intended
> 
> Signed-off-by: Aditya Nagesh <adityanagesh@linux.microsoft.com>
> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>

Applied to hyperv-fixes, thanks!

^ permalink raw reply

* [PATCH 1/1] clocksource: use symbolic error names (%pe) to print logs
From: Onkarnarth @ 2024-05-24 13:57 UTC (permalink / raw)
  To: mark.rutland, maz, daniel.lezcano, tglx, patrice.chotard,
	mcoquelin.stm32, alexandre.torgue, kys, haiyangz, wei.liu, decui,
	tsbogend, fancer.lancer, liviu.dudau, sudeep.holla, lpieralisi,
	baruch, verdun, nick.hawkins, shawnguo, s.hauer, kernel, festevam,
	vz, afaerber, manivannan.sadhasivam, paul.walmsley, palmer, aou,
	thierry.reding, jonathanh
  Cc: linux-arm-kernel, linux-kernel, linux-stm32, linux-hyperv,
	linux-mips, imx, linux-actions, linux-riscv, linux-tegra,
	r.thapliyal, Onkarnath, Maninder Singh
In-Reply-To: <CGME20240524135817epcas5p47d24bc69e88b3c44bee0153daa16f148@epcas5p4.samsung.com>

From: Onkarnath <onkarnath.1@samsung.com>

It is better to use %pe instead of %d or such to print logs
for enhanced error logs readbility.

Error print logs format is more style consistent now.

Co-developed-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Onkarnath <onkarnath.1@samsung.com>
---
 drivers/clocksource/arm_arch_timer.c   |  2 +-
 drivers/clocksource/arm_global_timer.c |  4 ++--
 drivers/clocksource/armv7m_systick.c   |  4 ++--
 drivers/clocksource/hyperv_timer.c     |  6 +++---
 drivers/clocksource/jcore-pit.c        |  4 ++--
 drivers/clocksource/mips-gic-timer.c   |  2 +-
 drivers/clocksource/mps2-timer.c       | 18 +++++++++---------
 drivers/clocksource/timer-clint.c      |  6 +++---
 drivers/clocksource/timer-digicolor.c  |  2 +-
 drivers/clocksource/timer-fsl-ftm.c    | 16 ++++++++--------
 drivers/clocksource/timer-gxp.c        |  8 ++++----
 drivers/clocksource/timer-imx-tpm.c    |  2 +-
 drivers/clocksource/timer-lpc32xx.c    | 10 +++++-----
 drivers/clocksource/timer-owl.c        |  4 ++--
 drivers/clocksource/timer-pistachio.c  |  8 ++++----
 drivers/clocksource/timer-probe.c      |  4 ++--
 drivers/clocksource/timer-riscv.c      |  8 ++++----
 drivers/clocksource/timer-sp804.c      |  4 ++--
 drivers/clocksource/timer-tegra.c      |  8 ++++----
 drivers/clocksource/timer-tegra186.c   | 14 +++++++-------
 drivers/clocksource/timer-zevio.c      |  2 +-
 21 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 5bb43cc1a8df..e36cc8e544cf 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -1273,7 +1273,7 @@ static int __init arch_timer_register(void)
 	}
 
 	if (err) {
-		pr_err("can't register interrupt %d (%d)\n", ppi, err);
+		pr_err("can't register interrupt %d: %pe\n", ppi, ERR_PTR(err));
 		goto out_free;
 	}
 
diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c
index ab1c8c2b66b8..9ed1e9564227 100644
--- a/drivers/clocksource/arm_global_timer.c
+++ b/drivers/clocksource/arm_global_timer.c
@@ -399,8 +399,8 @@ static int __init global_timer_of_register(struct device_node *np)
 	err = request_percpu_irq(gt_ppi, gt_clockevent_interrupt,
 				 "gt", gt_evt);
 	if (err) {
-		pr_warn("global-timer: can't register interrupt %d (%d)\n",
-			gt_ppi, err);
+		pr_warn("global-timer: can't register interrupt %d: %pe\n",
+			gt_ppi, ERR_PTR(err));
 		goto out_free;
 	}
 
diff --git a/drivers/clocksource/armv7m_systick.c b/drivers/clocksource/armv7m_systick.c
index 7e78074480e4..15f5dd2ffdae 100644
--- a/drivers/clocksource/armv7m_systick.c
+++ b/drivers/clocksource/armv7m_systick.c
@@ -60,7 +60,7 @@ static int __init system_timer_of_register(struct device_node *np)
 	ret = clocksource_mmio_init(base + SYST_CVR, "arm_system_timer", rate,
 			200, 24, clocksource_mmio_readl_down);
 	if (ret) {
-		pr_err("failed to init clocksource (%d)\n", ret);
+		pr_err("failed to init clocksource: %pe\n", ERR_PTR(ret));
 		if (clk)
 			goto out_clk_disable;
 		else
@@ -77,7 +77,7 @@ static int __init system_timer_of_register(struct device_node *np)
 	clk_put(clk);
 out_unmap:
 	iounmap(base);
-	pr_warn("ARM System timer register failed (%d)\n", ret);
+	pr_warn("ARM System timer register failed: %pe\n", ERR_PTR(ret));
 
 	return ret;
 }
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index b2a080647e41..7d6bb26b2b3c 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -205,7 +205,7 @@ static int hv_setup_stimer0_irq(void)
 	ret = acpi_register_gsi(NULL, HYPERV_STIMER0_VECTOR,
 			ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_HIGH);
 	if (ret < 0) {
-		pr_err("Can't register Hyper-V stimer0 GSI. Error %d", ret);
+		pr_err("Can't register Hyper-V stimer0 GSI. Error: %pe", ERR_PTR(ret));
 		return ret;
 	}
 	stimer0_irq = ret;
@@ -213,8 +213,8 @@ static int hv_setup_stimer0_irq(void)
 	ret = request_percpu_irq(stimer0_irq, hv_stimer0_percpu_isr,
 		"Hyper-V stimer0", &stimer0_evt);
 	if (ret) {
-		pr_err("Can't request Hyper-V stimer0 IRQ %d. Error %d",
-			stimer0_irq, ret);
+		pr_err("Can't request Hyper-V stimer0 IRQ %d. Error: %pe",
+			stimer0_irq, ERR_PTR(ret));
 		acpi_unregister_gsi(stimer0_irq);
 		stimer0_irq = -1;
 	}
diff --git a/drivers/clocksource/jcore-pit.c b/drivers/clocksource/jcore-pit.c
index a4a991101fa3..dfd1e77377ee 100644
--- a/drivers/clocksource/jcore-pit.c
+++ b/drivers/clocksource/jcore-pit.c
@@ -156,7 +156,7 @@ static int __init jcore_pit_init(struct device_node *node)
 				    NSEC_PER_SEC, 400, 32,
 				    jcore_clocksource_read);
 	if (err) {
-		pr_err("Error registering clocksource device: %d\n", err);
+		pr_err("Error registering clocksource device: %pe\n", ERR_PTR(err));
 		return err;
 	}
 
@@ -172,7 +172,7 @@ static int __init jcore_pit_init(struct device_node *node)
 			  IRQF_TIMER | IRQF_PERCPU,
 			  "jcore_pit", jcore_pit_percpu);
 	if (err) {
-		pr_err("pit irq request failed: %d\n", err);
+		pr_err("pit irq request failed: %pe\n", ERR_PTR(err));
 		free_percpu(jcore_pit_percpu);
 		return err;
 	}
diff --git a/drivers/clocksource/mips-gic-timer.c b/drivers/clocksource/mips-gic-timer.c
index b3ae38f36720..628b3aec2b45 100644
--- a/drivers/clocksource/mips-gic-timer.c
+++ b/drivers/clocksource/mips-gic-timer.c
@@ -150,7 +150,7 @@ static int gic_clockevent_init(void)
 
 	ret = setup_percpu_irq(gic_timer_irq, &gic_compare_irqaction);
 	if (ret < 0) {
-		pr_err("IRQ %d setup failed (%d)\n", gic_timer_irq, ret);
+		pr_err("IRQ %d setup failed: %pe\n", gic_timer_irq, ERR_PTR(ret));
 		return ret;
 	}
 
diff --git a/drivers/clocksource/mps2-timer.c b/drivers/clocksource/mps2-timer.c
index efe8cad8f2a5..5e2dcb792741 100644
--- a/drivers/clocksource/mps2-timer.c
+++ b/drivers/clocksource/mps2-timer.c
@@ -109,13 +109,13 @@ static int __init mps2_clockevent_init(struct device_node *np)
 		clk = of_clk_get(np, 0);
 		if (IS_ERR(clk)) {
 			ret = PTR_ERR(clk);
-			pr_err("failed to get clock for clockevent: %d\n", ret);
+			pr_err("failed to get clock for clockevent: %pe\n", clk);
 			goto out;
 		}
 
 		ret = clk_prepare_enable(clk);
 		if (ret) {
-			pr_err("failed to enable clock for clockevent: %d\n", ret);
+			pr_err("failed to enable clock for clockevent: %pe\n", ERR_PTR(ret));
 			goto out_clk_put;
 		}
 
@@ -125,14 +125,14 @@ static int __init mps2_clockevent_init(struct device_node *np)
 	base = of_iomap(np, 0);
 	if (!base) {
 		ret = -EADDRNOTAVAIL;
-		pr_err("failed to map register for clockevent: %d\n", ret);
+		pr_err("failed to map register for clockevent: %pe\n", ERR_PTR(ret));
 		goto out_clk_disable;
 	}
 
 	irq = irq_of_parse_and_map(np, 0);
 	if (!irq) {
 		ret = -ENOENT;
-		pr_err("failed to get irq for clockevent: %d\n", ret);
+		pr_err("failed to get irq for clockevent: %pe\n", ERR_PTR(ret));
 		goto out_iounmap;
 	}
 
@@ -159,7 +159,7 @@ static int __init mps2_clockevent_init(struct device_node *np)
 
 	ret = request_irq(irq, mps2_timer_interrupt, IRQF_TIMER, name, ce);
 	if (ret) {
-		pr_err("failed to request irq for clockevent: %d\n", ret);
+		pr_err("failed to request irq for clockevent: %pe\n", ERR_PTR(ret));
 		goto out_kfree;
 	}
 
@@ -193,13 +193,13 @@ static int __init mps2_clocksource_init(struct device_node *np)
 		clk = of_clk_get(np, 0);
 		if (IS_ERR(clk)) {
 			ret = PTR_ERR(clk);
-			pr_err("failed to get clock for clocksource: %d\n", ret);
+			pr_err("failed to get clock for clocksource: %pe\n", clk);
 			goto out;
 		}
 
 		ret = clk_prepare_enable(clk);
 		if (ret) {
-			pr_err("failed to enable clock for clocksource: %d\n", ret);
+			pr_err("failed to enable clock for clocksource: %pe\n", ERR_PTR(ret));
 			goto out_clk_put;
 		}
 
@@ -209,7 +209,7 @@ static int __init mps2_clocksource_init(struct device_node *np)
 	base = of_iomap(np, 0);
 	if (!base) {
 		ret = -EADDRNOTAVAIL;
-		pr_err("failed to map register for clocksource: %d\n", ret);
+		pr_err("failed to map register for clocksource: %pe\n", ERR_PTR(ret));
 		goto out_clk_disable;
 	}
 
@@ -226,7 +226,7 @@ static int __init mps2_clocksource_init(struct device_node *np)
 				    rate, 200, 32,
 				    clocksource_mmio_readl_down);
 	if (ret) {
-		pr_err("failed to init clocksource: %d\n", ret);
+		pr_err("failed to init clocksource: %pe\n", ERR_PTR(ret));
 		goto out_iounmap;
 	}
 
diff --git a/drivers/clocksource/timer-clint.c b/drivers/clocksource/timer-clint.c
index 0bdd9d7ec545..03ce468bf15e 100644
--- a/drivers/clocksource/timer-clint.c
+++ b/drivers/clocksource/timer-clint.c
@@ -229,7 +229,7 @@ static int __init clint_timer_init_dt(struct device_node *np)
 
 	rc = clocksource_register_hz(&clint_clocksource, clint_timer_freq);
 	if (rc) {
-		pr_err("%pOFP: clocksource register failed [%d]\n", np, rc);
+		pr_err("%pOFP: clocksource register failed: %pe\n", np, ERR_PTR(rc));
 		goto fail_iounmap;
 	}
 
@@ -238,7 +238,7 @@ static int __init clint_timer_init_dt(struct device_node *np)
 	rc = request_percpu_irq(clint_timer_irq, clint_timer_interrupt,
 				 "clint-timer", &clint_clock_event);
 	if (rc) {
-		pr_err("registering percpu irq failed [%d]\n", rc);
+		pr_err("registering percpu irq failed: %pe\n", ERR_PTR(rc));
 		goto fail_iounmap;
 	}
 
@@ -260,7 +260,7 @@ static int __init clint_timer_init_dt(struct device_node *np)
 				clint_timer_starting_cpu,
 				clint_timer_dying_cpu);
 	if (rc) {
-		pr_err("%pOFP: cpuhp setup state failed [%d]\n", np, rc);
+		pr_err("%pOFP: cpuhp setup state failed: %pe\n", np, ERR_PTR(rc));
 		goto fail_free_irq;
 	}
 
diff --git a/drivers/clocksource/timer-digicolor.c b/drivers/clocksource/timer-digicolor.c
index 559aa96089c3..7b4991081bb7 100644
--- a/drivers/clocksource/timer-digicolor.c
+++ b/drivers/clocksource/timer-digicolor.c
@@ -189,7 +189,7 @@ static int __init digicolor_timer_init(struct device_node *node)
 			  IRQF_TIMER | IRQF_IRQPOLL, "digicolor_timerC",
 			  &dc_timer_dev.ce);
 	if (ret) {
-		pr_warn("request of timer irq %d failed (%d)\n", irq, ret);
+		pr_warn("request of timer irq %d failed: %pe\n", irq, ERR_PTR(ret));
 		return ret;
 	}
 
diff --git a/drivers/clocksource/timer-fsl-ftm.c b/drivers/clocksource/timer-fsl-ftm.c
index 93f336ec875a..dd709827a823 100644
--- a/drivers/clocksource/timer-fsl-ftm.c
+++ b/drivers/clocksource/timer-fsl-ftm.c
@@ -188,7 +188,7 @@ static int __init ftm_clockevent_init(unsigned long freq, int irq)
 	err = request_irq(irq, ftm_evt_interrupt, IRQF_TIMER | IRQF_IRQPOLL,
 			  "Freescale ftm timer", &ftm_clockevent);
 	if (err) {
-		pr_err("ftm: setup irq failed: %d\n", err);
+		pr_err("ftm: setup irq failed: %pe\n", ERR_PTR(err));
 		return err;
 	}
 
@@ -218,7 +218,7 @@ static int __init ftm_clocksource_init(unsigned long freq)
 				    freq / (1 << priv->ps), 300, 16,
 				    clocksource_mmio_readl_up);
 	if (err) {
-		pr_err("ftm: init clock source mmio failed: %d\n", err);
+		pr_err("ftm: init clock source mmio failed: %pe\n", ERR_PTR(err));
 		return err;
 	}
 
@@ -235,25 +235,25 @@ static int __init __ftm_clk_init(struct device_node *np, char *cnt_name,
 
 	clk = of_clk_get_by_name(np, cnt_name);
 	if (IS_ERR(clk)) {
-		pr_err("ftm: Cannot get \"%s\": %ld\n", cnt_name, PTR_ERR(clk));
+		pr_err("ftm: Cannot get \"%s\": %pe\n", cnt_name, clk);
 		return PTR_ERR(clk);
 	}
 	err = clk_prepare_enable(clk);
 	if (err) {
-		pr_err("ftm: clock failed to prepare+enable \"%s\": %d\n",
-			cnt_name, err);
+		pr_err("ftm: clock failed to prepare+enable \"%s\": %pe\n",
+			cnt_name, ERR_PTR(err));
 		return err;
 	}
 
 	clk = of_clk_get_by_name(np, ftm_name);
 	if (IS_ERR(clk)) {
-		pr_err("ftm: Cannot get \"%s\": %ld\n", ftm_name, PTR_ERR(clk));
+		pr_err("ftm: Cannot get \"%s\": %pe\n", ftm_name, clk);
 		return PTR_ERR(clk);
 	}
 	err = clk_prepare_enable(clk);
 	if (err)
-		pr_err("ftm: clock failed to prepare+enable \"%s\": %d\n",
-			ftm_name, err);
+		pr_err("ftm: clock failed to prepare+enable \"%s\": %pe\n",
+			ftm_name, ERR_PTR(err));
 
 	return clk_get_rate(clk);
 }
diff --git a/drivers/clocksource/timer-gxp.c b/drivers/clocksource/timer-gxp.c
index 57aa2e2cce53..d016fb324d54 100644
--- a/drivers/clocksource/timer-gxp.c
+++ b/drivers/clocksource/timer-gxp.c
@@ -86,13 +86,13 @@ static int __init gxp_timer_init(struct device_node *node)
 	clk = of_clk_get(node, 0);
 	if (IS_ERR(clk)) {
 		ret = (int)PTR_ERR(clk);
-		pr_err("%pOFn clock not found: %d\n", node, ret);
+		pr_err("%pOFn clock not found: %pe\n", node, clk);
 		goto err_free;
 	}
 
 	ret = clk_prepare_enable(clk);
 	if (ret) {
-		pr_err("%pOFn clock enable failed: %d\n", node, ret);
+		pr_err("%pOFn clock enable failed: %pe\n", node, ERR_PTR(ret));
 		goto err_clk_enable;
 	}
 
@@ -126,7 +126,7 @@ static int __init gxp_timer_init(struct device_node *node)
 	ret = clocksource_mmio_init(system_clock, node->name, freq,
 				    300, 32, clocksource_mmio_readl_up);
 	if (ret) {
-		pr_err("%pOFn init clocksource failed: %d", node, ret);
+		pr_err("%pOFn init clocksource failed: %pe", node, ERR_PTR(ret));
 		goto err_exit;
 	}
 
@@ -145,7 +145,7 @@ static int __init gxp_timer_init(struct device_node *node)
 	ret = request_irq(irq, gxp_timer_interrupt, IRQF_TIMER | IRQF_SHARED,
 			  node->name, gxp_timer);
 	if (ret) {
-		pr_err("%pOFn request_irq() failed: %d", node, ret);
+		pr_err("%pOFn request_irq() failed: %pe", node, ERR_PTR(ret));
 		goto err_exit;
 	}
 
diff --git a/drivers/clocksource/timer-imx-tpm.c b/drivers/clocksource/timer-imx-tpm.c
index bd64a8a8427f..308bcc4e8960 100644
--- a/drivers/clocksource/timer-imx-tpm.c
+++ b/drivers/clocksource/timer-imx-tpm.c
@@ -185,7 +185,7 @@ static int __init tpm_timer_init(struct device_node *np)
 	/* enable clk before accessing registers */
 	ret = clk_prepare_enable(ipg);
 	if (ret) {
-		pr_err("tpm: ipg clock enable failed (%d)\n", ret);
+		pr_err("tpm: ipg clock enable failed: %pe\n", ERR_PTR(ret));
 		clk_put(ipg);
 		return ret;
 	}
diff --git a/drivers/clocksource/timer-lpc32xx.c b/drivers/clocksource/timer-lpc32xx.c
index 68eae6378bf3..1e08e2090fee 100644
--- a/drivers/clocksource/timer-lpc32xx.c
+++ b/drivers/clocksource/timer-lpc32xx.c
@@ -161,13 +161,13 @@ static int __init lpc32xx_clocksource_init(struct device_node *np)
 
 	clk = of_clk_get_by_name(np, "timerclk");
 	if (IS_ERR(clk)) {
-		pr_err("clock get failed (%ld)\n", PTR_ERR(clk));
+		pr_err("clock get failed: %pe\n", clk);
 		return PTR_ERR(clk);
 	}
 
 	ret = clk_prepare_enable(clk);
 	if (ret) {
-		pr_err("clock enable failed (%d)\n", ret);
+		pr_err("clock enable failed: %pe\n", ERR_PTR(ret));
 		goto err_clk_enable;
 	}
 
@@ -193,7 +193,7 @@ static int __init lpc32xx_clocksource_init(struct device_node *np)
 	ret = clocksource_mmio_init(base + LPC32XX_TIMER_TC, "lpc3220 timer",
 				    rate, 300, 32, clocksource_mmio_readl_up);
 	if (ret) {
-		pr_err("failed to init clocksource (%d)\n", ret);
+		pr_err("failed to init clocksource: %pe\n", ERR_PTR(ret));
 		goto err_clocksource_init;
 	}
 
@@ -222,13 +222,13 @@ static int __init lpc32xx_clockevent_init(struct device_node *np)
 
 	clk = of_clk_get_by_name(np, "timerclk");
 	if (IS_ERR(clk)) {
-		pr_err("clock get failed (%ld)\n", PTR_ERR(clk));
+		pr_err("clock get failed: %pe\n", clk);
 		return PTR_ERR(clk);
 	}
 
 	ret = clk_prepare_enable(clk);
 	if (ret) {
-		pr_err("clock enable failed (%d)\n", ret);
+		pr_err("clock enable failed: %pe\n", ERR_PTR(ret));
 		goto err_clk_enable;
 	}
 
diff --git a/drivers/clocksource/timer-owl.c b/drivers/clocksource/timer-owl.c
index ac97420bfa7c..3319d3acb635 100644
--- a/drivers/clocksource/timer-owl.c
+++ b/drivers/clocksource/timer-owl.c
@@ -137,7 +137,7 @@ static int __init owl_timer_init(struct device_node *node)
 	clk = of_clk_get(node, 0);
 	if (IS_ERR(clk)) {
 		ret = PTR_ERR(clk);
-		pr_err("Failed to get clock for clocksource (%d)\n", ret);
+		pr_err("Failed to get clock for clocksource: %pe\n", clk);
 		return ret;
 	}
 
@@ -150,7 +150,7 @@ static int __init owl_timer_init(struct device_node *node)
 	ret = clocksource_mmio_init(owl_clksrc_base + OWL_Tx_VAL, node->name,
 				    rate, 200, 32, clocksource_mmio_readl_up);
 	if (ret) {
-		pr_err("Failed to register clocksource (%d)\n", ret);
+		pr_err("Failed to register clocksource: %pe\n", ERR_PTR(ret));
 		return ret;
 	}
 
diff --git a/drivers/clocksource/timer-pistachio.c b/drivers/clocksource/timer-pistachio.c
index 57b2197a0b67..6b956c3b2f20 100644
--- a/drivers/clocksource/timer-pistachio.c
+++ b/drivers/clocksource/timer-pistachio.c
@@ -174,25 +174,25 @@ static int __init pistachio_clksrc_of_init(struct device_node *node)
 
 	sys_clk = of_clk_get_by_name(node, "sys");
 	if (IS_ERR(sys_clk)) {
-		pr_err("clock get failed (%ld)\n", PTR_ERR(sys_clk));
+		pr_err("clock get failed: %pe\n", sys_clk);
 		return PTR_ERR(sys_clk);
 	}
 
 	fast_clk = of_clk_get_by_name(node, "fast");
 	if (IS_ERR(fast_clk)) {
-		pr_err("clock get failed (%lu)\n", PTR_ERR(fast_clk));
+		pr_err("clock get failed: %pe\n", fast_clk);
 		return PTR_ERR(fast_clk);
 	}
 
 	ret = clk_prepare_enable(sys_clk);
 	if (ret < 0) {
-		pr_err("failed to enable clock (%d)\n", ret);
+		pr_err("failed to enable clock: %pe\n", ERR_PTR(ret));
 		return ret;
 	}
 
 	ret = clk_prepare_enable(fast_clk);
 	if (ret < 0) {
-		pr_err("failed to enable clock (%d)\n", ret);
+		pr_err("failed to enable clock: %pe\n", ERR_PTR(ret));
 		clk_disable_unprepare(sys_clk);
 		return ret;
 	}
diff --git a/drivers/clocksource/timer-probe.c b/drivers/clocksource/timer-probe.c
index b7860bc0db4b..913473950191 100644
--- a/drivers/clocksource/timer-probe.c
+++ b/drivers/clocksource/timer-probe.c
@@ -30,8 +30,8 @@ void __init timer_probe(void)
 		ret = init_func_ret(np);
 		if (ret) {
 			if (ret != -EPROBE_DEFER)
-				pr_err("Failed to initialize '%pOF': %d\n", np,
-				       ret);
+				pr_err("Failed to initialize '%pOF': %pe\n", np,
+				       ERR_PTR(ret));
 			continue;
 		}
 
diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 48ce50c5f5e6..05d2294d5444 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -169,7 +169,7 @@ static int __init riscv_timer_init_common(void)
 
 	error = clocksource_register_hz(&riscv_clocksource, riscv_timebase);
 	if (error) {
-		pr_err("RISCV timer registration failed [%d]\n", error);
+		pr_err("RISCV timer registration failed: %pe\n", ERR_PTR(error));
 		return error;
 	}
 
@@ -179,7 +179,7 @@ static int __init riscv_timer_init_common(void)
 				    riscv_timer_interrupt,
 				    "riscv-timer", &riscv_clock_event);
 	if (error) {
-		pr_err("registering percpu irq failed [%d]\n", error);
+		pr_err("registering percpu irq failed: %pe\n", ERR_PTR(error));
 		return error;
 	}
 
@@ -192,8 +192,8 @@ static int __init riscv_timer_init_common(void)
 			 "clockevents/riscv/timer:starting",
 			 riscv_timer_starting_cpu, riscv_timer_dying_cpu);
 	if (error)
-		pr_err("cpu hp setup state failed for RISCV timer [%d]\n",
-		       error);
+		pr_err("cpu hp setup state failed for RISCV timer: %pe\n",
+		       ERR_PTR(error));
 
 	return error;
 }
diff --git a/drivers/clocksource/timer-sp804.c b/drivers/clocksource/timer-sp804.c
index cd1916c05325..cbb3bc1eac0d 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -66,13 +66,13 @@ static long __init sp804_get_clock_rate(struct clk *clk, const char *name)
 	if (!clk)
 		clk = clk_get_sys("sp804", name);
 	if (IS_ERR(clk)) {
-		pr_err("%s clock not found: %ld\n", name, PTR_ERR(clk));
+		pr_err("%s clock not found: %pe\n", name, clk);
 		return PTR_ERR(clk);
 	}
 
 	err = clk_prepare_enable(clk);
 	if (err) {
-		pr_err("clock failed to enable: %d\n", err);
+		pr_err("clock failed to enable: %pe\n", ERR_PTR(err));
 		clk_put(clk);
 		return err;
 	}
diff --git a/drivers/clocksource/timer-tegra.c b/drivers/clocksource/timer-tegra.c
index e9635c25eef4..2fe79042fbf9 100644
--- a/drivers/clocksource/timer-tegra.c
+++ b/drivers/clocksource/timer-tegra.c
@@ -324,8 +324,8 @@ static int __init tegra_init_timer(struct device_node *np, bool tegra20,
 		ret = request_irq(cpu_to->clkevt.irq, tegra_timer_isr, flags,
 				  cpu_to->clkevt.name, &cpu_to->clkevt);
 		if (ret) {
-			pr_err("failed to set up irq for cpu%d: %d\n",
-			       cpu, ret);
+			pr_err("failed to set up irq for cpu%d: %pe\n",
+			       cpu, ERR_PTR(ret));
 			irq_dispose_mapping(cpu_to->clkevt.irq);
 			cpu_to->clkevt.irq = 0;
 			goto out_irq;
@@ -338,7 +338,7 @@ static int __init tegra_init_timer(struct device_node *np, bool tegra20,
 				    "timer_us", TIMER_1MHz, 300, 32,
 				    clocksource_mmio_readl_up);
 	if (ret)
-		pr_err("failed to register clocksource: %d\n", ret);
+		pr_err("failed to register clocksource: %pe\n", ERR_PTR(ret));
 
 #ifdef CONFIG_ARM
 	register_current_timer_delay(&tegra_delay_timer);
@@ -348,7 +348,7 @@ static int __init tegra_init_timer(struct device_node *np, bool tegra20,
 				"AP_TEGRA_TIMER_STARTING", tegra_timer_setup,
 				tegra_timer_stop);
 	if (ret)
-		pr_err("failed to set up cpu hp state: %d\n", ret);
+		pr_err("failed to set up cpu hp state: %pe\n", ERR_PTR(ret));
 
 	return ret;
 
diff --git a/drivers/clocksource/timer-tegra186.c b/drivers/clocksource/timer-tegra186.c
index 304537dadf2c..927533d98ef7 100644
--- a/drivers/clocksource/timer-tegra186.c
+++ b/drivers/clocksource/timer-tegra186.c
@@ -279,13 +279,13 @@ static struct tegra186_wdt *tegra186_wdt_create(struct tegra186_timer *tegra,
 
 	err = watchdog_init_timeout(&wdt->base, 5, tegra->dev);
 	if (err < 0) {
-		dev_err(tegra->dev, "failed to initialize timeout: %d\n", err);
+		dev_err(tegra->dev, "failed to initialize timeout: %pe\n", ERR_PTR(err));
 		return ERR_PTR(err);
 	}
 
 	err = devm_watchdog_register_device(tegra->dev, &wdt->base);
 	if (err < 0) {
-		dev_err(tegra->dev, "failed to register WDT: %d\n", err);
+		dev_err(tegra->dev, "failed to register WDT: %pe\n", ERR_PTR(err));
 		return ERR_PTR(err);
 	}
 
@@ -406,32 +406,32 @@ static int tegra186_timer_probe(struct platform_device *pdev)
 	tegra->wdt = tegra186_wdt_create(tegra, 0);
 	if (IS_ERR(tegra->wdt)) {
 		err = PTR_ERR(tegra->wdt);
-		dev_err(dev, "failed to create WDT: %d\n", err);
+		dev_err(dev, "failed to create WDT: %pe\n", tegra->wdt);
 		return err;
 	}
 
 	err = tegra186_timer_tsc_init(tegra);
 	if (err < 0) {
-		dev_err(dev, "failed to register TSC counter: %d\n", err);
+		dev_err(dev, "failed to register TSC counter: %pe\n", ERR_PTR(err));
 		return err;
 	}
 
 	err = tegra186_timer_osc_init(tegra);
 	if (err < 0) {
-		dev_err(dev, "failed to register OSC counter: %d\n", err);
+		dev_err(dev, "failed to register OSC counter: %pe\n", ERR_PTR(err));
 		goto unregister_tsc;
 	}
 
 	err = tegra186_timer_usec_init(tegra);
 	if (err < 0) {
-		dev_err(dev, "failed to register USEC counter: %d\n", err);
+		dev_err(dev, "failed to register USEC counter: %pe\n", ERR_PTR(err));
 		goto unregister_osc;
 	}
 
 	err = devm_request_irq(dev, irq, tegra186_timer_irq, 0,
 			       "tegra186-timer", tegra);
 	if (err < 0) {
-		dev_err(dev, "failed to request IRQ#%u: %d\n", irq, err);
+		dev_err(dev, "failed to request IRQ#%u: %pe\n", irq, ERR_PTR(err));
 		goto unregister_usec;
 	}
 
diff --git a/drivers/clocksource/timer-zevio.c b/drivers/clocksource/timer-zevio.c
index ecaa3568841c..b61973a66dc6 100644
--- a/drivers/clocksource/timer-zevio.c
+++ b/drivers/clocksource/timer-zevio.c
@@ -134,7 +134,7 @@ static int __init zevio_timer_add(struct device_node *node)
 	timer->clk = of_clk_get(node, 0);
 	if (IS_ERR(timer->clk)) {
 		ret = PTR_ERR(timer->clk);
-		pr_err("Timer clock not found! (error %d)\n", ret);
+		pr_err("Timer clock not found! error: %pe\n", timer->clk);
 		goto error_unmap;
 	}
 
-- 
2.25.1


^ permalink raw reply related

* RE: [RFC PATCH] clocksource: hyper-v: Enable the tsc_page for a TDX VM in TD mode
From: Michael Kelley @ 2024-05-24 22:44 UTC (permalink / raw)
  To: Dexuan Cui, Dave Hansen, x86@kernel.org,
	linux-coco@lists.linux.dev, bp@alien8.de,
	dave.hansen@linux.intel.com, Haiyang Zhang, hpa@zytor.com,
	kirill.shutemov@linux.intel.com, KY Srinivasan, luto@kernel.org,
	mingo@redhat.com, peterz@infradead.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, tglx@linutronix.de,
	wei.liu@kernel.org, jason, thomas.lendacky@amd.com, tytso@mit.edu,
	ardb@kernel.org
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	Tianyu Lan
In-Reply-To: <SA1PR21MB1317CD997CCD64654B438754BFF52@SA1PR21MB1317.namprd21.prod.outlook.com>

From: Dexuan Cui <decui@microsoft.com> Sent: Friday, May 24, 2024 1:46 AM
> 
> > From: Dave Hansen <dave.hansen@intel.com>
> > Sent: Thursday, May 23, 2024 7:26 AM
> > [...]
> > On 5/22/24 19:24, Dexuan Cui wrote:
> > ...
> > > +static bool noinstr intel_cc_platform_td_l2(enum cc_attr attr)
> > > +{
> > > +	switch (attr) {
> > > +	case CC_ATTR_GUEST_MEM_ENCRYPT:
> > > +	case CC_ATTR_MEM_ENCRYPT:
> > > +		return true;
> > > +	default:
> > > +		return false;
> > > +	}
> > > +}
> > > +
> > >  static bool noinstr intel_cc_platform_has(enum cc_attr attr)
> > >  {
> > > +	if (tdx_partitioned_td_l2)
> > > +		return intel_cc_platform_td_l2(attr);
> > > +
> > >  	switch (attr) {
> > >  	case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > >  	case CC_ATTR_HOTPLUG_DISABLED:
> >
> > On its face, this _looks_ rather troubling.  It just hijacks all of the
> > attributes.  It totally bifurcates the code.  Anything that gets added
> > to intel_cc_platform_has() now needs to be considered for addition to
> > intel_cc_platform_td_l2().
> 
> Maybe the bifurcation is necessary? TD mode is different from
> Partitioned TD mode (L2), after all. Another reason for the bifurcation
> is:  currently online/offline'ing is disallowed for a TD VM, but actually
> Hyper-V is able to support CPU online/offline'ing for a TD VM in
> Partitioned TD mode (L2) -- how can we allow online/offline'ing for such
> a VM?
> 
> BTW, the bifurcation code is copied from amd_cc_platform_has(), where
> an AMD SNP VM may run in the vTOM mode.
> 
> > > --- a/arch/x86/mm/mem_encrypt_amd.c
> > > +++ b/arch/x86/mm/mem_encrypt_amd.c
> > ...
> > > @@ -529,7 +530,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
> > >  	 * CC_ATTR_MEM_ENCRYPT, aren't necessarily equivalent in a Hyper-V VM
> > >  	 * using vTOM, where sme_me_mask is always zero.
> > >  	 */
> > > -	if (sme_me_mask) {
> > > +	if (sme_me_mask || (cc_vendor == CC_VENDOR_INTEL && !tdx_partitioned_td_l2)) {

FWIW, the above won't work in a kernel built with CONFIG_TDX_GUEST=y
but CONFIG_AMD_MEM_ENCRYPT=n. mem_encrypt_free_decrypted_mem()
in arch/x86/mm/mem_encrypt_amd.c won't get built, and an empty stub is used.

> > >  		r = set_memory_encrypted(vaddr, npages);
> > >  		if (r) {
> > >  			pr_warn("failed to free unused decrypted pages\n");
> >
> > If _ever_ there were a place for a new CC_ attribute, this would be it.
> Not sure how to add a new CC attribute for the __bss_decrypted support.
> 
> For the cpu online/offline'ing support, I'm not sure how to add a new
> CC attribute and not introduce the bifurcation.
> 
> > It's also a bit concerning that now we've got a (cc_vendor ==
> > CC_VENDOR_INTEL) check in an amd.c file.
> I agree my change here is ugly...
> Currently the __bss_decrypted support is only used for SNP.
> Not sure if we should get it to work for TDX as well.
> 
> > So all of that on top of Kirill's "why do we need this in the first
> > place" questions leave me really scratching my head on this one.
> Probably I'll just use local APIC timer in such a VM or delay enabling
> Hyper-V TSC page to a later place where set_memory_decrypted()
> works for me. However, I still would like to find out how to allow
> CPU online/offline'ing for a TDX VM in Partitioned TD mode (L2).
> 

My thoughts:

__bss_decrypted is named as if it applies to any CoCo VM, but really
it is specific to AMD SEV. It was originally used for a GHCB page, which
is SEV-specific, and then it proved to be convenient for the Hyper-V TSC
page. Ideally, we could fix __bss_decrypted to work generally in a
TDX VM without any dependency on code specific to a hypervisor. But
looking at some of the details, that may be non-trivial.

A narrower solution is to remove the Hyper-V TSC page from
__bss_decrypted, and use Hyper-V specific code on both TDX and
SEV-SNP to decrypt just that page (not the entire __bss_decrypted), 
based on whether the Hyper-V guest is running with a paravisor.
From Dexuan's patch, it looks like set_memory_decrypted()
works on TDX at the time that ms_hyperv_init_platform() runs.
Does it also work on SEV-SNP? The code in kvm_init_platform()
uses early_set_mem_enc_dec_hypercall() with
kvm_sev_hc_page_enc_status(), which is SEV only.  So maybe
the normal set_memory_decrypted() doesn't work on SEV at
that point, though I'm not at all clear on what kvm_init_platform is
trying to do.  Shouldn't __bss_decrypted already be set up correctly?

The issue of taking CPUs offline is separate. Is the inability to take
a CPU offline with TDX an architectural limitation? Or just a
current Linux implementation limitation? And what about in an
L2 TDX VM?  If the existence of a limitation in a L2 TDX VM is
dependent on the hypervisor/paravisor, then can cc_platform_has()
check some architectural flag (that's independent of the host
hypervisor) to know if it is running in an L2 TDX VM and return false
for CC_ATTR_HOTPLUG_DISABLED? If a host/paravisor combo doesn't
allow taking a L2 TDX VM CPU offline, then it would be up to that
combo to implement the appropriate restriction. It's not hard to add
a CPUHP state that would prevent it.

Michael


^ permalink raw reply

* RE: [RFC PATCH] clocksource: hyper-v: Enable the tsc_page for a TDX VM in TD mode
From: Dexuan Cui @ 2024-05-24  8:45 UTC (permalink / raw)
  To: Dave Hansen, x86@kernel.org, linux-coco@lists.linux.dev,
	bp@alien8.de, dave.hansen@linux.intel.com, Haiyang Zhang,
	hpa@zytor.com, kirill.shutemov@linux.intel.com, KY Srinivasan,
	luto@kernel.org, mingo@redhat.com, peterz@infradead.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, tglx@linutronix.de,
	wei.liu@kernel.org, jason, mhklinux@outlook.com,
	thomas.lendacky@amd.com, tytso@mit.edu, ardb@kernel.org
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	Tianyu Lan
In-Reply-To: <299a83e8-cb13-4599-9737-adb3bb922169@intel.com>

> From: Dave Hansen <dave.hansen@intel.com>
> Sent: Thursday, May 23, 2024 7:26 AM
> [...]
> On 5/22/24 19:24, Dexuan Cui wrote:
> ...
> > +static bool noinstr intel_cc_platform_td_l2(enum cc_attr attr)
> > +{
> > +	switch (attr) {
> > +	case CC_ATTR_GUEST_MEM_ENCRYPT:
> > +	case CC_ATTR_MEM_ENCRYPT:
> > +		return true;
> > +	default:
> > +		return false;
> > +	}
> > +}
> > +
> >  static bool noinstr intel_cc_platform_has(enum cc_attr attr)
> >  {
> > +	if (tdx_partitioned_td_l2)
> > +		return intel_cc_platform_td_l2(attr);
> > +
> >  	switch (attr) {
> >  	case CC_ATTR_GUEST_UNROLL_STRING_IO:
> >  	case CC_ATTR_HOTPLUG_DISABLED:
> 
> On its face, this _looks_ rather troubling.  It just hijacks all of the
> attributes.  It totally bifurcates the code.  Anything that gets added
> to intel_cc_platform_has() now needs to be considered for addition to
> intel_cc_platform_td_l2().

Maybe the bifurcation is necessary? TD mode is different from
Partitioned TD mode (L2), after all. Another reason for the bifurcation
is:  currently online/offline'ing is disallowed for a TD VM, but actually
Hyper-V is able to support CPU online/offline'ing for a TD VM in
Partitioned TD mode (L2) -- how can we allow online/offline'ing for such
a VM?

BTW, the bifurcation code is copied from amd_cc_platform_has(), where
an AMD SNP VM may run in the vTOM mode.

> > --- a/arch/x86/mm/mem_encrypt_amd.c
> > +++ b/arch/x86/mm/mem_encrypt_amd.c
> ...
> > @@ -529,7 +530,7 @@ void __init
> mem_encrypt_free_decrypted_mem(void)
> >  	 * CC_ATTR_MEM_ENCRYPT, aren't necessarily equivalent in a
> Hyper-V VM
> >  	 * using vTOM, where sme_me_mask is always zero.
> >  	 */
> > -	if (sme_me_mask) {
> > +	if (sme_me_mask || (cc_vendor == CC_VENDOR_INTEL
> && !tdx_partitioned_td_l2)) {
> >  		r = set_memory_encrypted(vaddr, npages);
> >  		if (r) {
> >  			pr_warn("failed to free unused decrypted
> pages\n");
> 
> If _ever_ there were a place for a new CC_ attribute, this would be it.
Not sure how to add a new CC attribute for the __bss_decrypted support.

For the cpu online/offline'ing support, I'm not sure how to add a new
CC attribute and not introduce the bifurcation.

> It's also a bit concerning that now we've got a (cc_vendor ==
> CC_VENDOR_INTEL) check in an amd.c file.
I agree my change here is ugly...
Currently the __bss_decrypted support is only used for SNP.
Not sure if we should get it to work for TDX as well.

> So all of that on top of Kirill's "why do we need this in the first
> place" questions leave me really scratching my head on this one.
Probably I'll just use local APIC timer in such a VM or delay enabling
Hyper-V TSC page to a later place where set_memory_decrypted()
works for me. However, I still would like to find out how to allow
CPU online/offline'ing for a TDX VM in Partitioned TD mode (L2).

Thanks,
Dexuan

^ permalink raw reply

* RE: [RFC PATCH] clocksource: hyper-v: Enable the tsc_page for a TDX VM in TD mode
From: Dexuan Cui @ 2024-05-24  8:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: x86@kernel.org, linux-coco@lists.linux.dev, bp@alien8.de,
	dave.hansen@intel.com, dave.hansen@linux.intel.com, Haiyang Zhang,
	hpa@zytor.com, KY Srinivasan, luto@kernel.org, mingo@redhat.com,
	peterz@infradead.org, sathyanarayanan.kuppuswamy@linux.intel.com,
	tglx@linutronix.de, wei.liu@kernel.org, jason,
	mhklinux@outlook.com, thomas.lendacky@amd.com, tytso@mit.edu,
	ardb@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Tianyu Lan, Elena Reshetova
In-Reply-To: <7yos4yh6te7zcwga3swddpyjyxlif2c5vqad2rouwf7euw47df@jvouxfoakct6>

> From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Sent: Thursday, May 23, 2024 5:06 AM
> To: Dexuan Cui <decui@microsoft.com>
> [...]
> On Wed, May 22, 2024 at 07:24:41PM -0700, Dexuan Cui wrote:
> > A TDX VM on Hyper-V may run in TD mode or Partitioned TD mode (L2). For
> > the former, the VM has not enabled the Hyper-V TSC page (which is defined
> > in drivers/clocksource/hyperv_timer.c: "... tsc_pg __bss_decrypted ...")
> > because, for such a VM, the hypervisor requires that the page should be
> > shared, but currently the __bss_decrypted is not working for such a VM
> > yet.
> 
> I don't see how it is safe. It opens guest clock for manipulations form
> VMM. Could you elaborate on security implications?

The intention of the patch is not to enable Hyper-V TSC page as a clocksource
in such a VM. The default clocksource is still TSC.

The intention of the patch is to enable Hyper-V TSC page only for Hyper-V
timer. My understanding is that: "Hyper-V timer + Hyper-V TSC page" should
be as safe as "local APIC timer + TSC" because the VM needs the hypervisor
to emulate the timers, anyway. The guest may get a bad value of Hyper-V
TSC page from a malicious hypervisor, and consequently the Hyper-V timer
may fire too early or too late, but the similar thing can also happen to a local
APIC timer, if a malicious hypervisor decides to deliver the timer interrupt
too early or too late.

> > Hyper-V TSC page can work as a clocksource device similar to KVM pv
> > clock, and it's also used by the Hyper-V timer code to get the current
> > time: see hv_init_tsc_clocksource(), which sets the global function
> > pointer hv_read_reference_counter to read_hv_clock_tsc(); when
> > Hyper-V TSC page is not enabled, hv_read_reference_counter defaults to
> > be drivers/hv/hv_common.c: __hv_read_ref_counter(), which is suboptimal
> > as it uses the slow MSR interface to get the time info.
> 
> Why can't the guest just read the TSC directly? Why do we need the page?
> I am confused.
> 
>   Kiryl Shutsemau / Kirill A. Shutemov

Both Hyper-V TSC page and Hyper-V Reference Counter MSR return the
absolute time in the unit of 0.1us since the VM starts to run, i.e. the "frequency"
is 10M, which has a higher resolution than the emulated local APIC timer. 

Hyper-V timer depends on Hyper-V TSC page or Hyper-V Reference Counter MSR
as it also uses the absolute time in the unit of 0.1us. We could potentially read the
TSC and convert it to the absolute time in the unit of 0.1us, but the required math
calculation is not very easy and can be unreliable (e.g. I suppose an AP's TSC is 0
when it's just being brought up online? But the Hyper-V TSC page value on the AP
should be much bigger than 0) And, there will be an inaccuracy with the Hyper-V
side conversion that may use a slightly different math calculation (e.g. slightly
different TSC frequency or 'multi' or 'shift' values).

It turns out the local APIC timer in such a VM also works! So probably the VM
can just use local APIC timer and doesn't use Hyper-V timer. However, I noticed a
strange thing: in my 128-VP VM, each local APIC timer constantly generates
100 interrupts per second, even if the VM is idle. Trying to find out why. I do
enable tickles in my .config:

CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ=y
CONFIG_HZ_100=y
CONFIG_HZ=100

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH net-next] net: mana: Allow variable size indirection table
From: Jakub Kicinski @ 2024-05-23 19:46 UTC (permalink / raw)
  To: Shradha Gupta
  Cc: linux-hardening, netdev, linux-hyperv, linux-kernel, linux-rdma,
	Colin Ian King, Ahmed Zaki, Pavan Chebbi, Souradeep Chakrabarti,
	Konstantin Taranov, Kees Cook, Paolo Abeni, Eric Dumazet,
	David S. Miller, Dexuan Cui, Wei Liu, Haiyang Zhang,
	K. Y. Srinivasan, Leon Romanovsky, Jason Gunthorpe, Ajay Sharma,
	Long Li, Shradha Gupta
In-Reply-To: <1716483314-19028-1-git-send-email-shradhagupta@linux.microsoft.com>

On Thu, 23 May 2024 09:55:14 -0700 Shradha Gupta wrote:
> Allow variable size indirection table allocation in MANA instead
> of using a constant value MANA_INDIRECT_TABLE_SIZE.
> The size is now derived from the MANA_QUERY_VPORT_CONFIG and the
> indirection table is allocated dynamically.
> 
> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>

## Form letter - net-next-closed

The merge window for v6.10 has begun and we have already posted our pull
request. Therefore net-next is closed for new drivers, features, code
refactoring and optimizations. We are currently accepting bug fixes only.

Please repost when net-next reopens after May 26th.

RFC patches sent for review only are obviously welcome at any time.

See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
-- 
pw-bot: defer


^ permalink raw reply

* [PATCH net-next] net: mana: Allow variable size indirection table
From: Shradha Gupta @ 2024-05-23 16:55 UTC (permalink / raw)
  To: linux-hardening, netdev, linux-hyperv, linux-kernel, linux-rdma
  Cc: Shradha Gupta, Colin Ian King, Ahmed Zaki, Pavan Chebbi,
	Souradeep Chakrabarti, Konstantin Taranov, Kees Cook, Paolo Abeni,
	Jakub Kicinski, Eric Dumazet, David S. Miller, Dexuan Cui,
	Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Leon Romanovsky,
	Jason Gunthorpe, Ajay Sharma, Long Li, Shradha Gupta

Allow variable size indirection table allocation in MANA instead
of using a constant value MANA_INDIRECT_TABLE_SIZE.
The size is now derived from the MANA_QUERY_VPORT_CONFIG and the
indirection table is allocated dynamically.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/infiniband/hw/mana/qp.c               | 11 ++--
 drivers/net/ethernet/microsoft/mana/mana_en.c | 66 ++++++++++++++++---
 .../ethernet/microsoft/mana/mana_ethtool.c    | 20 ++++--
 include/net/mana/gdma.h                       |  4 +-
 include/net/mana/mana.h                       | 10 +--
 5 files changed, 84 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 6e7627745c95..5684738d6551 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -22,8 +22,7 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
 
 	gc = mdev_to_gc(dev);
 
-	req_buf_size =
-		sizeof(*req) + sizeof(mana_handle_t) * MANA_INDIRECT_TABLE_SIZE;
+	req_buf_size = struct_size(req, indir_tab, MANA_INDIRECT_TABLE_DEF_SIZE);
 	req = kzalloc(req_buf_size, GFP_KERNEL);
 	if (!req)
 		return -ENOMEM;
@@ -43,18 +42,18 @@ static int mana_ib_cfg_vport_steering(struct mana_ib_dev *dev,
 	if (log_ind_tbl_size)
 		req->rss_enable = true;
 
-	req->num_indir_entries = MANA_INDIRECT_TABLE_SIZE;
+	req->num_indir_entries = MANA_INDIRECT_TABLE_DEF_SIZE;
 	req->indir_tab_offset = sizeof(*req);
 	req->update_indir_tab = true;
 	req->cqe_coalescing_enable = 1;
 
 	req_indir_tab = (mana_handle_t *)(req + 1);
 	/* The ind table passed to the hardware must have
-	 * MANA_INDIRECT_TABLE_SIZE entries. Adjust the verb
+	 * MANA_INDIRECT_TABLE_DEF_SIZE entries. Adjust the verb
 	 * ind_table to MANA_INDIRECT_TABLE_SIZE if required
 	 */
 	ibdev_dbg(&dev->ib_dev, "ind table size %u\n", 1 << log_ind_tbl_size);
-	for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
+	for (i = 0; i < MANA_INDIRECT_TABLE_DEF_SIZE; i++) {
 		req_indir_tab[i] = ind_table[i % (1 << log_ind_tbl_size)];
 		ibdev_dbg(&dev->ib_dev, "index %u handle 0x%llx\n", i,
 			  req_indir_tab[i]);
@@ -141,7 +140,7 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
 	}
 
 	ind_tbl_size = 1 << ind_tbl->log_ind_tbl_size;
-	if (ind_tbl_size > MANA_INDIRECT_TABLE_SIZE) {
+	if (ind_tbl_size > MANA_INDIRECT_TABLE_DEF_SIZE) {
 		ibdev_dbg(&mdev->ib_dev,
 			  "Indirect table size %d exceeding limit\n",
 			  ind_tbl_size);
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d8af5e7e15b4..cd119349e5a0 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -481,7 +481,7 @@ static int mana_get_tx_queue(struct net_device *ndev, struct sk_buff *skb,
 	struct sock *sk = skb->sk;
 	int txq;
 
-	txq = apc->indir_table[hash & MANA_INDIRECT_TABLE_MASK];
+	txq = apc->indir_table[hash & (apc->indir_table_sz - 1)];
 
 	if (txq != old_q && sk && sk_fullsock(sk) &&
 	    rcu_access_pointer(sk->sk_dst_cache))
@@ -962,7 +962,16 @@ static int mana_query_vport_cfg(struct mana_port_context *apc, u32 vport_index,
 
 	*max_sq = resp.max_num_sq;
 	*max_rq = resp.max_num_rq;
-	*num_indir_entry = resp.num_indirection_ent;
+	if (resp.num_indirection_ent > 0 &&
+	    resp.num_indirection_ent <= MANA_INDIRECT_TABLE_MAX_SIZE &&
+	    is_power_of_2(resp.num_indirection_ent)) {
+		*num_indir_entry = resp.num_indirection_ent;
+	} else {
+		netdev_warn(apc->ndev,
+			    "Setting indirection table size to default %d for vPort %d\n",
+			    MANA_INDIRECT_TABLE_DEF_SIZE, apc->port_idx);
+		*num_indir_entry = MANA_INDIRECT_TABLE_DEF_SIZE;
+	}
 
 	apc->port_handle = resp.vport;
 	ether_addr_copy(apc->mac_addr, resp.mac_addr);
@@ -1054,7 +1063,6 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 				   bool update_default_rxobj, bool update_key,
 				   bool update_tab)
 {
-	u16 num_entries = MANA_INDIRECT_TABLE_SIZE;
 	struct mana_cfg_rx_steer_req_v2 *req;
 	struct mana_cfg_rx_steer_resp resp = {};
 	struct net_device *ndev = apc->ndev;
@@ -1062,7 +1070,7 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 	u32 req_buf_size;
 	int err;
 
-	req_buf_size = sizeof(*req) + sizeof(mana_handle_t) * num_entries;
+	req_buf_size = struct_size(req, indir_tab, apc->indir_table_sz);
 	req = kzalloc(req_buf_size, GFP_KERNEL);
 	if (!req)
 		return -ENOMEM;
@@ -1073,7 +1081,7 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 	req->hdr.req.msg_version = GDMA_MESSAGE_V2;
 
 	req->vport = apc->port_handle;
-	req->num_indir_entries = num_entries;
+	req->num_indir_entries = apc->indir_table_sz;
 	req->indir_tab_offset = sizeof(*req);
 	req->rx_enable = rx;
 	req->rss_enable = apc->rss_state;
@@ -1113,7 +1121,7 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 	}
 
 	netdev_info(ndev, "Configured steering vPort %llu entries %u\n",
-		    apc->port_handle, num_entries);
+		    apc->port_handle, apc->indir_table_sz);
 out:
 	kfree(req);
 	return err;
@@ -2346,11 +2354,34 @@ static int mana_create_vport(struct mana_port_context *apc,
 	return mana_create_txq(apc, net);
 }
 
+static int mana_rss_table_alloc(struct mana_port_context *apc)
+{
+	if (!apc->indir_table_sz) {
+		netdev_err(apc->ndev,
+			   "Indirection table size not set for vPort %d\n",
+			   apc->port_idx);
+		return -EINVAL;
+	}
+
+	apc->indir_table = kcalloc(apc->indir_table_sz, sizeof(u32), GFP_KERNEL);
+	if (!apc->indir_table)
+		return -ENOMEM;
+
+	apc->rxobj_table = kcalloc(apc->indir_table_sz, sizeof(mana_handle_t), GFP_KERNEL);
+
+	if (!apc->rxobj_table) {
+		kfree(apc->indir_table);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 static void mana_rss_table_init(struct mana_port_context *apc)
 {
 	int i;
 
-	for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
+	for (i = 0; i < apc->indir_table_sz; i++)
 		apc->indir_table[i] =
 			ethtool_rxfh_indir_default(i, apc->num_queues);
 }
@@ -2363,7 +2394,7 @@ int mana_config_rss(struct mana_port_context *apc, enum TRI_STATE rx,
 	int i;
 
 	if (update_tab) {
-		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
+		for (i = 0; i < apc->indir_table_sz; i++) {
 			queue_idx = apc->indir_table[i];
 			apc->rxobj_table[i] = apc->rxqs[queue_idx]->rxobj;
 		}
@@ -2468,7 +2499,6 @@ static int mana_init_port(struct net_device *ndev)
 	struct mana_port_context *apc = netdev_priv(ndev);
 	u32 max_txq, max_rxq, max_queues;
 	int port_idx = apc->port_idx;
-	u32 num_indirect_entries;
 	int err;
 
 	err = mana_init_port_context(apc);
@@ -2476,7 +2506,7 @@ static int mana_init_port(struct net_device *ndev)
 		return err;
 
 	err = mana_query_vport_cfg(apc, port_idx, &max_txq, &max_rxq,
-				   &num_indirect_entries);
+				   &apc->indir_table_sz);
 	if (err) {
 		netdev_err(ndev, "Failed to query info for vPort %d\n",
 			   port_idx);
@@ -2725,6 +2755,10 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	if (err)
 		goto free_net;
 
+	err = mana_rss_table_alloc(apc);
+	if (err)
+		goto reset_apc;
+
 	netdev_lockdep_set_classes(ndev);
 
 	ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
@@ -2747,6 +2781,11 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	return 0;
 
 reset_apc:
+	apc->indir_table_sz = 0;
+	kfree(apc->indir_table);
+	apc->indir_table = NULL;
+	kfree(apc->rxobj_table);
+	apc->rxobj_table = NULL;
 	kfree(apc->rxqs);
 	apc->rxqs = NULL;
 free_net:
@@ -2899,6 +2938,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 {
 	struct gdma_context *gc = gd->gdma_context;
 	struct mana_context *ac = gd->driver_data;
+	struct mana_port_context *apc;
 	struct device *dev = gc->dev;
 	struct net_device *ndev;
 	int err;
@@ -2910,6 +2950,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 
 	for (i = 0; i < ac->num_ports; i++) {
 		ndev = ac->ports[i];
+		apc = netdev_priv(ndev);
 		if (!ndev) {
 			if (i == 0)
 				dev_err(dev, "No net device to remove\n");
@@ -2933,6 +2974,11 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 		}
 
 		unregister_netdevice(ndev);
+		apc->indir_table_sz = 0;
+		kfree(apc->indir_table);
+		apc->indir_table = NULL;
+		kfree(apc->rxobj_table);
+		apc->rxobj_table = NULL;
 
 		rtnl_unlock();
 
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index ab2413d71f6c..1667f18046d2 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -245,7 +245,9 @@ static u32 mana_get_rxfh_key_size(struct net_device *ndev)
 
 static u32 mana_rss_indir_size(struct net_device *ndev)
 {
-	return MANA_INDIRECT_TABLE_SIZE;
+	struct mana_port_context *apc = netdev_priv(ndev);
+
+	return apc->indir_table_sz;
 }
 
 static int mana_get_rxfh(struct net_device *ndev,
@@ -257,7 +259,7 @@ static int mana_get_rxfh(struct net_device *ndev,
 	rxfh->hfunc = ETH_RSS_HASH_TOP; /* Toeplitz */
 
 	if (rxfh->indir) {
-		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
+		for (i = 0; i < apc->indir_table_sz; i++)
 			rxfh->indir[i] = apc->indir_table[i];
 	}
 
@@ -273,8 +275,8 @@ static int mana_set_rxfh(struct net_device *ndev,
 {
 	struct mana_port_context *apc = netdev_priv(ndev);
 	bool update_hash = false, update_table = false;
-	u32 save_table[MANA_INDIRECT_TABLE_SIZE];
 	u8 save_key[MANA_HASH_KEY_SIZE];
+	u32 *save_table;
 	int i, err;
 
 	if (!apc->port_is_up)
@@ -284,13 +286,17 @@ static int mana_set_rxfh(struct net_device *ndev,
 	    rxfh->hfunc != ETH_RSS_HASH_TOP)
 		return -EOPNOTSUPP;
 
+	save_table = kcalloc(apc->indir_table_sz, sizeof(u32), GFP_KERNEL);
+	if (!save_table)
+		return -ENOMEM;
+
 	if (rxfh->indir) {
-		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
+		for (i = 0; i < apc->indir_table_sz; i++)
 			if (rxfh->indir[i] >= apc->num_queues)
 				return -EINVAL;
 
 		update_table = true;
-		for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++) {
+		for (i = 0; i < apc->indir_table_sz; i++) {
 			save_table[i] = apc->indir_table[i];
 			apc->indir_table[i] = rxfh->indir[i];
 		}
@@ -306,7 +312,7 @@ static int mana_set_rxfh(struct net_device *ndev,
 
 	if (err) { /* recover to original values */
 		if (update_table) {
-			for (i = 0; i < MANA_INDIRECT_TABLE_SIZE; i++)
+			for (i = 0; i < apc->indir_table_sz; i++)
 				apc->indir_table[i] = save_table[i];
 		}
 
@@ -316,6 +322,8 @@ static int mana_set_rxfh(struct net_device *ndev,
 		mana_config_rss(apc, TRI_STATE_TRUE, update_hash, update_table);
 	}
 
+	kfree(save_table);
+
 	return err;
 }
 
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 27684135bb4d..3b76d994c3d0 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -543,11 +543,13 @@ enum {
  */
 #define GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX BIT(2)
 #define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
+#define GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT BIT(5)
 
 #define GDMA_DRV_CAP_FLAGS1 \
 	(GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
 	 GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
-	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG)
+	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG |\
+	 GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT)
 
 #define GDMA_DRV_CAP_FLAGS2 0
 
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 4eeedf14711b..59823901b74f 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -30,8 +30,8 @@ enum TRI_STATE {
 };
 
 /* Number of entries for hardware indirection table must be in power of 2 */
-#define MANA_INDIRECT_TABLE_SIZE 64
-#define MANA_INDIRECT_TABLE_MASK (MANA_INDIRECT_TABLE_SIZE - 1)
+#define MANA_INDIRECT_TABLE_MAX_SIZE 512
+#define MANA_INDIRECT_TABLE_DEF_SIZE 64
 
 /* The Toeplitz hash key's length in bytes: should be multiple of 8 */
 #define MANA_HASH_KEY_SIZE 40
@@ -410,10 +410,11 @@ struct mana_port_context {
 	struct mana_tx_qp *tx_qp;
 
 	/* Indirection Table for RX & TX. The values are queue indexes */
-	u32 indir_table[MANA_INDIRECT_TABLE_SIZE];
+	u32 *indir_table;
+	u32 indir_table_sz;
 
 	/* Indirection table containing RxObject Handles */
-	mana_handle_t rxobj_table[MANA_INDIRECT_TABLE_SIZE];
+	mana_handle_t *rxobj_table;
 
 	/*  Hash key used by the NIC */
 	u8 hashkey[MANA_HASH_KEY_SIZE];
@@ -670,6 +671,7 @@ struct mana_cfg_rx_steer_req_v2 {
 	u8 hashkey[MANA_HASH_KEY_SIZE];
 	u8 cqe_coalescing_enable;
 	u8 reserved2[7];
+	mana_handle_t indir_tab[] __counted_by(num_indir_entries);
 }; /* HW DATA */
 
 struct mana_cfg_rx_steer_resp {
-- 
2.34.1


^ permalink raw reply related

* Re: [RFC PATCH] clocksource: hyper-v: Enable the tsc_page for a TDX VM in TD mode
From: Dave Hansen @ 2024-05-23 14:25 UTC (permalink / raw)
  To: Dexuan Cui, x86, linux-coco, bp, dave.hansen, haiyangz, hpa,
	kirill.shutemov, kys, luto, mingo, peterz,
	sathyanarayanan.kuppuswamy, tglx, wei.liu, Jason, mhklinux,
	thomas.lendacky, tytso, ardb
  Cc: linux-hyperv, linux-kernel, Tianyu.Lan
In-Reply-To: <20240523022441.20879-1-decui@microsoft.com>

On 5/22/24 19:24, Dexuan Cui wrote:
...
> +static bool noinstr intel_cc_platform_td_l2(enum cc_attr attr)
> +{
> +	switch (attr) {
> +	case CC_ATTR_GUEST_MEM_ENCRYPT:
> +	case CC_ATTR_MEM_ENCRYPT:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
>  static bool noinstr intel_cc_platform_has(enum cc_attr attr)
>  {
> +	if (tdx_partitioned_td_l2)
> +		return intel_cc_platform_td_l2(attr);
> +
>  	switch (attr) {
>  	case CC_ATTR_GUEST_UNROLL_STRING_IO:
>  	case CC_ATTR_HOTPLUG_DISABLED:

On its face, this _looks_ rather troubling.  It just hijacks all of the
attributes.  It totally bifurcates the code.  Anything that gets added
to intel_cc_platform_has() now needs to be considered for addition to
intel_cc_platform_td_l2().

> --- a/arch/x86/mm/mem_encrypt_amd.c
> +++ b/arch/x86/mm/mem_encrypt_amd.c
...
> @@ -529,7 +530,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>  	 * CC_ATTR_MEM_ENCRYPT, aren't necessarily equivalent in a Hyper-V VM
>  	 * using vTOM, where sme_me_mask is always zero.
>  	 */
> -	if (sme_me_mask) {
> +	if (sme_me_mask || (cc_vendor == CC_VENDOR_INTEL && !tdx_partitioned_td_l2)) {
>  		r = set_memory_encrypted(vaddr, npages);
>  		if (r) {
>  			pr_warn("failed to free unused decrypted pages\n");

If _ever_ there were a place for a new CC_ attribute, this would be it.
It's also a bit concerning that now we've got a (cc_vendor ==
CC_VENDOR_INTEL) check in an amd.c file.

So all of that on top of Kirill's "why do we need this in the first
place" questions leave me really scratching my head on this one.

^ permalink raw reply

* Re: [RFC PATCH] clocksource: hyper-v: Enable the tsc_page for a TDX VM in TD mode
From: Kirill A. Shutemov @ 2024-05-23 12:06 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: x86, linux-coco, bp, dave.hansen, dave.hansen, haiyangz, hpa, kys,
	luto, mingo, peterz, sathyanarayanan.kuppuswamy, tglx, wei.liu,
	Jason, mhklinux, thomas.lendacky, tytso, ardb, linux-hyperv,
	linux-kernel, Tianyu.Lan, Elena Reshetova
In-Reply-To: <20240523022441.20879-1-decui@microsoft.com>

On Wed, May 22, 2024 at 07:24:41PM -0700, Dexuan Cui wrote:
> A TDX VM on Hyper-V may run in TD mode or Partitioned TD mode (L2). For
> the former, the VM has not enabled the Hyper-V TSC page (which is defined
> in drivers/clocksource/hyperv_timer.c: "... tsc_pg __bss_decrypted ...")
> because, for such a VM, the hypervisor requires that the page should be
> shared, but currently the __bss_decrypted is not working for such a VM
> yet.

I don't see how it is safe. It opens guest clock for manipulations form
VMM. Could you elaborate on security implications?

> Hyper-V TSC page can work as a clocksource device similar to KVM pv
> clock, and it's also used by the Hyper-V timer code to get the current
> time: see hv_init_tsc_clocksource(), which sets the global function
> pointer hv_read_reference_counter to read_hv_clock_tsc(); when
> Hyper-V TSC page is not enabled, hv_read_reference_counter defaults to
> be drivers/hv/hv_common.c: __hv_read_ref_counter(), which is suboptimal
> as it uses the slow MSR interface to get the time info.

Why can't the guest just read the TSC directly? Why do we need the page?
I am confused.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply

* [RFC PATCH] clocksource: hyper-v: Enable the tsc_page for a TDX VM in TD mode
From: Dexuan Cui @ 2024-05-23  2:24 UTC (permalink / raw)
  To: x86, linux-coco, bp, dave.hansen, dave.hansen, haiyangz, hpa,
	kirill.shutemov, kys, luto, mingo, peterz,
	sathyanarayanan.kuppuswamy, tglx, wei.liu, Jason, mhklinux,
	thomas.lendacky, tytso, ardb
  Cc: linux-hyperv, linux-kernel, Tianyu.Lan, Dexuan Cui

A TDX VM on Hyper-V may run in TD mode or Partitioned TD mode (L2). For
the former, the VM has not enabled the Hyper-V TSC page (which is defined
in drivers/clocksource/hyperv_timer.c: "... tsc_pg __bss_decrypted ...")
because, for such a VM, the hypervisor requires that the page should be
shared, but currently the __bss_decrypted is not working for such a VM
yet.

Hyper-V TSC page can work as a clocksource device similar to KVM pv
clock, and it's also used by the Hyper-V timer code to get the current
time: see hv_init_tsc_clocksource(), which sets the global function
pointer hv_read_reference_counter to read_hv_clock_tsc(); when
Hyper-V TSC page is not enabled, hv_read_reference_counter defaults to
be drivers/hv/hv_common.c: __hv_read_ref_counter(), which is suboptimal
as it uses the slow MSR interface to get the time info.

The attribute __bss_decrypted was added for a native SNP VM by the
commit 45f46b1ac95e ("clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest")

The attribute works for SNP because of the commit below
commit b3f0907c71e0 ("x86/mm: Add .bss..decrypted section to hold shared variables")

The attribute is not working for TDX because __startup_64() ->
sme_postprocess_startup() is not for TDX; we can't just call
set_memory_decrypted() in sme_postprocess_startup() because
sme_postprocess_startup() runs too early and set_memory_decrypted() is not
working there yet.

This RFC patch calls set_memory_decrypted() in a later place, i.e., in
start_kernel() -> setup_arch() -> init_hypervisor_platform() ->
ms_hyperv_init_platform(), so set_memory_decrypted() works there;
accordingly, mem_encrypt_free_decrypted_mem() -> set_memory_encrypted()
must be called for TDX now.

When a TDX VM runs in Partitioned TD mode (L2), the Hyper-V TSC page should
be a private page, so set_memory_decrypted() should not be called for the
page in such a VM. Introduce a global variable "tdx_partitioned_td_l2" to
handle this type of VM differently.

BTW, the 4KB Hyper-V TSC page is enabled very early in
hv_init_tsc_clocksource(), where set_memory_decrypted() is not working yet,
so we can't simply call set_memory_decrypted() in hv_init_tsc_clocksource()
for a TDX VM in TD mode, and we need to get the attribute __bss_decrypted
to work for such a VM.

The changes to arch/x86/kernel/cpu/mshyperv.c and
arch/x86/mm/mem_encrypt_amd.c are not satisfying to me. I wish there
could be a better way to support __bss_decrypted for a native TDX VM
so that a TDX VM on KVM could also benefit from __bss_decrypted, if
some one wants to use it in future. BTW, kvm_init_platform() has
similar code for SNP.

This is just a RFC patch. I apprecite your insight. Thanks!

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 arch/x86/coco/core.c           | 15 +++++++++++++++
 arch/x86/coco/tdx/tdx.c        |  2 ++
 arch/x86/hyperv/ivm.c          |  3 ++-
 arch/x86/include/asm/tdx.h     |  1 +
 arch/x86/kernel/cpu/mshyperv.c |  8 ++++++--
 arch/x86/mm/mem_encrypt_amd.c  |  3 ++-
 6 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index b31ef2424d194..61cec405f1084 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -15,6 +15,7 @@
 
 #include <asm/archrandom.h>
 #include <asm/coco.h>
+#include <asm/tdx.h>
 #include <asm/processor.h>
 
 enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
@@ -25,8 +26,22 @@ static struct cc_attr_flags {
 	      __resv		: 63;
 } cc_flags;
 
+static bool noinstr intel_cc_platform_td_l2(enum cc_attr attr)
+{
+	switch (attr) {
+	case CC_ATTR_GUEST_MEM_ENCRYPT:
+	case CC_ATTR_MEM_ENCRYPT:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static bool noinstr intel_cc_platform_has(enum cc_attr attr)
 {
+	if (tdx_partitioned_td_l2)
+		return intel_cc_platform_td_l2(attr);
+
 	switch (attr) {
 	case CC_ATTR_GUEST_UNROLL_STRING_IO:
 	case CC_ATTR_HOTPLUG_DISABLED:
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index abf3cd591afd3..8e6ab42add7c0 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -39,6 +39,8 @@
 
 #define TDREPORT_SUBTYPE_0	0
 
+bool tdx_partitioned_td_l2 __ro_after_init;
+
 /* Called from __tdx_hypercall() for unrecoverable failure */
 noinstr void __noreturn __tdx_hypercall_failed(void)
 {
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 768d73de0d098..52cd44e507846 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -626,7 +626,7 @@ static bool hv_is_private_mmio(u64 addr)
 	return false;
 }
 
-void __init hv_vtom_init(void)
+void __init hv_vtom_init(void) /* TODO: rename the function for TDX */
 {
 	enum hv_isolation_type type = hv_get_isolation_type();
 
@@ -650,6 +650,7 @@ void __init hv_vtom_init(void)
 
 	case HV_ISOLATION_TYPE_TDX:
 		cc_vendor = CC_VENDOR_INTEL;
+		tdx_partitioned_td_l2 = true;
 		break;
 
 	default:
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index eba178996d845..ddcc9ef82dc99 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -66,6 +66,7 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport);
 
 u64 tdx_hcall_get_quote(u8 *buf, size_t size);
 
+extern bool tdx_partitioned_td_l2;
 #else
 
 static inline void tdx_early_init(void) { };
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e0fd57a8ba840..7c336bc020c9f 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -18,6 +18,7 @@
 #include <linux/kexec.h>
 #include <linux/i8253.h>
 #include <linux/random.h>
+#include <linux/set_memory.h>
 #include <asm/processor.h>
 #include <asm/hypervisor.h>
 #include <asm/hyperv-tlfs.h>
@@ -449,8 +450,11 @@ static void __init ms_hyperv_init_platform(void)
 			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
 
 			if (!ms_hyperv.paravisor_present) {
-				/* To be supported: more work is required.  */
-				ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
+				unsigned long vaddr = (unsigned long)__start_bss_decrypted;
+				unsigned long vaddr_end = (unsigned long)__end_bss_decrypted;
+
+				for (; vaddr < vaddr_end; vaddr += PMD_SIZE)
+					set_memory_decrypted(vaddr, PMD_SIZE >> PAGE_SHIFT);
 
 				/* HV_MSR_CRASH_CTL is unsupported. */
 				ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index 422602f6039b8..0ddb9e5d222c3 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -25,6 +25,7 @@
 #include <asm/fixmap.h>
 #include <asm/setup.h>
 #include <asm/mem_encrypt.h>
+#include <asm/tdx.h>
 #include <asm/bootparam.h>
 #include <asm/set_memory.h>
 #include <asm/cacheflush.h>
@@ -529,7 +530,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
 	 * CC_ATTR_MEM_ENCRYPT, aren't necessarily equivalent in a Hyper-V VM
 	 * using vTOM, where sme_me_mask is always zero.
 	 */
-	if (sme_me_mask) {
+	if (sme_me_mask || (cc_vendor == CC_VENDOR_INTEL && !tdx_partitioned_td_l2)) {
 		r = set_memory_encrypted(vaddr, npages);
 		if (r) {
 			pr_warn("failed to free unused decrypted pages\n");
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH V2 net] net: mana: Fix the extra HZ in mana_hwc_send_request
From: patchwork-bot+netdevbpf @ 2024-05-22  9:20 UTC (permalink / raw)
  To: Souradeep Chakrabarti
  Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, yury.norov, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma, schakrabarti,
	stable
In-Reply-To: <1716185104-31658-1-git-send-email-schakrabarti@linux.microsoft.com>

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sun, 19 May 2024 23:05:04 -0700 you wrote:
> Commit 62c1bff593b7 added an extra HZ along with msecs_to_jiffies.
> This patch fixes that.
> 
> Cc: stable@vger.kernel.org
> Fixes: 62c1bff593b7 ("net: mana: Configure hwc timeout from hardware")
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> Reviewed-by: Brett Creeley <brett.creeley@amd.com>
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
> 
> [...]

Here is the summary with links:
  - [V2,net] net: mana: Fix the extra HZ in mana_hwc_send_request
    https://git.kernel.org/netdev/net/c/9c91c7fadb17

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [PATCH] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
From: Dexuan Cui @ 2024-05-21  2:12 UTC (permalink / raw)
  To: x86, linux-coco, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, dave.hansen, haiyangz, hpa, jane.chu,
	kirill.shutemov, kys, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	Jason, nik.borisov, mhklinux
  Cc: linux-hyperv, linux-kernel, Tianyu.Lan, rick.p.edgecombe, andavis,
	mheslin, vkuznets, xiaoyao.li, Dexuan Cui, stable

When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
allocates buffers using vzalloc(), and needs to share the buffers with the
host OS by calling set_memory_decrypted(), which is not working for
vmalloc() yet. Add the support by handling the pages one by one.

Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Cc: stable@vger.kernel.org # 6.6+
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

This is basically a repost of the second patch of the 2023 patchset:
https://lwn.net/ml/linux-kernel/20230811214826.9609-3-decui@microsoft.com/

The first patch of the patchset got merged into mainline, but unluckily the
second patch didn't, and I kind of lost track of it. Sorry.

Changes since the previous patchset (please refer to the link above):
  Added Rick's and Dave's Reviewed-by.
  Added Kai's Acked-by.
  Removeda the test "if (offset_in_page(start) != 0)" since we know the
  'start' is page-aligned: see __set_memory_enc_pgtable().

Please review. Thanks!
Dexuan

 arch/x86/coco/tdx/tdx.c | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index c1cb90369915b..abf3cd591afd3 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,6 +7,7 @@
 #include <linux/cpufeature.h>
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
@@ -778,6 +779,19 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
 	return false;
 }
 
+static bool tdx_enc_status_changed_phys(phys_addr_t start, phys_addr_t end,
+					bool enc)
+{
+	if (!tdx_map_gpa(start, end, enc))
+		return false;
+
+	/* shared->private conversion requires memory to be accepted before use */
+	if (enc)
+		return tdx_accept_memory(start, end);
+
+	return true;
+}
+
 /*
  * Inform the VMM of the guest's intent for this physical page: shared with
  * the VMM or private to the guest.  The VMM is expected to change its mapping
@@ -785,15 +799,22 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
  */
 static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 {
-	phys_addr_t start = __pa(vaddr);
-	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+	unsigned long start = vaddr;
+	unsigned long end = start + numpages * PAGE_SIZE;
+	unsigned long step = end - start;
+	unsigned long addr;
 
-	if (!tdx_map_gpa(start, end, enc))
-		return false;
+	/* Step through page-by-page for vmalloc() mappings */
+	if (is_vmalloc_addr((void *)vaddr))
+		step = PAGE_SIZE;
 
-	/* shared->private conversion requires memory to be accepted before use */
-	if (enc)
-		return tdx_accept_memory(start, end);
+	for (addr = start; addr < end; addr += step) {
+		phys_addr_t start_pa = slow_virt_to_phys((void *)addr);
+		phys_addr_t end_pa   = start_pa + step;
+
+		if (!tdx_enc_status_changed_phys(start_pa, end_pa, enc))
+			return false;
+	}
 
 	return true;
 }
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v2 1/6] arm64/hyperv: Support DeviceTree
From: Roman Kisel @ 2024-05-20 20:36 UTC (permalink / raw)
  To: Krzysztof Kozlowski, arnd, bhelgaas, bp, catalin.marinas,
	dave.hansen, decui, haiyangz, hpa, kw, kys, lenb, lpieralisi,
	mingo, mhklinux, rafael, robh, tglx, wei.liu, will, linux-acpi,
	linux-arch, linux-arm-kernel, linux-hyperv, linux-kernel,
	linux-pci, x86
  Cc: ssengar, sunilmut, vdso
In-Reply-To: <6f7ae5e6-d20f-4980-9b6e-25ba6d7b5558@linaro.org>



On 5/19/2024 11:45 PM, Krzysztof Kozlowski wrote:
> On 15/05/2024 19:33, Roman Kisel wrote:
>>>>    static bool hyperv_initialized;
>>>> @@ -27,6 +30,29 @@ int hv_get_hypervisor_version(union hv_hypervisor_version_info *info)
>>>>    	return 0;
>>>>    }
>>>>    
>>>> +static bool hyperv_detect_fdt(void)
>>>> +{
>>>> +#ifdef CONFIG_OF
>>>> +	const unsigned long hyp_node = of_get_flat_dt_subnode_by_name(
>>>> +			of_get_flat_dt_root(), "hypervisor");
>>>
>>> Why do you add an ABI for node name? Although name looks OK, but is it
>>> really described in the spec that you depend on it? I really do not like
>>> name dependencies...
>>
>> Followed the existing DeviceTree's of naming and approaches in the
>> kernel to surprise less and "invent" even less. As for the spec, the
> 
> I am sorry, but there is no approved existing approach of adding ABI for
> node names. There are exceptions or specific cases, but that's not
> "invent less" approach. ABI is defined by compatible.
I should check on the compatible instead of adding a node that is not 
mentioned in the DeviceTree spec as I understand. Appreciate your help!

> 
> Best regards,
> Krzysztof
> 

-- 
Thank you,
Roman

^ permalink raw reply

* Re: [PATCH v2 5/6] drivers/hv/vmbus: Get the irq number from DeviceTree
From: Roman Kisel @ 2024-05-20 19:25 UTC (permalink / raw)
  To: Rob Herring
  Cc: arnd, bhelgaas, bp, catalin.marinas, dave.hansen, decui, haiyangz,
	hpa, kw, kys, lenb, lpieralisi, mingo, mhklinux, rafael, tglx,
	wei.liu, will, linux-acpi, linux-arch, linux-arm-kernel,
	linux-hyperv, linux-kernel, linux-pci, x86, ssengar, sunilmut,
	vdso
In-Reply-To: <CAL_JsqKXejxzixzwQO4U_00WAaV_iaEh8Mndf6R5BhLQsgVwLQ@mail.gmail.com>



On 5/17/2024 10:14 AM, Rob Herring wrote:
> On Tue, May 14, 2024 at 5:45 PM Roman Kisel <romank@linux.microsoft.com> wrote:
>>
>> The vmbus driver uses ACPI for interrupt assignment on
>> arm64 hence it won't function in the VTL mode where only
>> DeviceTree can be used.
>>
>> Update the vmbus driver to discover interrupt configuration
>> via DeviceTree.
>>
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> ---
>>   drivers/hv/vmbus_drv.c | 37 +++++++++++++++++++++++++++++++++++++
>>   1 file changed, 37 insertions(+)
>>
>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>> index e25223cee3ab..52f01bd1c947 100644
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -36,6 +36,7 @@
>>   #include <linux/syscore_ops.h>
>>   #include <linux/dma-map-ops.h>
>>   #include <linux/pci.h>
>> +#include <linux/of_irq.h>
> 
> If you are using this header in a driver, you are doing it wrong. We
> have common functions which work on both ACPI or DT, so use them if
> you have a need to support both.
> 
Understood, thank you! I'll look more for the examples. If you happen to 
have in mind the place where the idiomatic/more preferred approach is 
used, please let me know, would owe you a great debt of gratitude.


> Though my first question on a binding will be the same as on every
> 'hypervisor binding'.  Why can't you make your hypervisor interfaces
> discoverable? It's all s/w, not some h/w device which is fixed.
> 
I've taken a look at the related art. AWS's Firecracker, Intel's Cloud 
Hypervisor, Google's CrosVM, QEmu allow the guest use the 
well-established battle-tested generic approaches (ACPI, 
DeviceTree/OpenFirmware) of describing the virtual hardware and its 
resources rather than making the guest use their own specific 
interfaces. That holds true for the s/w devices like 
"vcpu-stall-detector" and VirtIO that do not have counterparts built as 
hardware, too.

Here, the guest needs to set up VMBus (the intra-partition communication 
transport) to be able to talk to the host partition. Receiving a message 
needs an interrupt service routine attached to the interrupt injected 
into the guest virtual processor, and DeviceTree lets provide the 
interrupt number. If a custom interface were used here, it'd look less 
expected due to others relying on ACPI and DT for configuring virtual 
devices and busses. A specialized interface would add more code (new 
code) instead of relying on the approach that is widely used.


> Rob

-- 
Thank you,
Roman

^ permalink raw reply

* Re: [PATCH 1/6] arm64/hyperv: Support DeviceTree
From: Roman Kisel @ 2024-05-20 17:08 UTC (permalink / raw)
  To: Easwar Hariharan, kys, haiyangz, wei.liu, decui, linux-hyperv,
	rafael, lenb, linux-acpi
  Cc: ssengar, sunilmut
In-Reply-To: <e6f9761a-e542-417a-a2af-ba9f2bd253c4@linux.microsoft.com>



On 5/14/2024 5:00 PM, Easwar Hariharan wrote:
> On 5/14/2024 4:17 PM, Roman Kisel wrote:
>>
>>
>> On 5/14/2024 3:46 PM, Easwar Hariharan wrote:
>>> On 5/10/2024 10:42 AM, Roman Kisel wrote:
>>>>
>>>>
>>>> On 5/10/2024 10:04 AM, Easwar Hariharan wrote:
>>>>> On 5/10/2024 9:05 AM, romank@linux.microsoft.com wrote:
>>>>>> From: Roman Kisel <romank@linux.microsoft.com>
>>>>>>
>>>>>> Update the driver to support DeviceTree boot as well along with ACPI.
>>>>>> This enables the Virtual Trust Level platforms boot up on ARM64.
>>>>>>
>>>>>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>>>>>> ---
>>>>>>     arch/arm64/hyperv/mshyperv.c | 34 +++++++++++++++++++++++++++++-----
>>>>>>     1 file changed, 29 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
>>>>>> index b1a4de4eee29..208a3bcb9686 100644
>>>>>> --- a/arch/arm64/hyperv/mshyperv.c
>>>>>> +++ b/arch/arm64/hyperv/mshyperv.c
>>>>>> @@ -15,6 +15,9 @@
>>>>>>     #include <linux/errno.h>
>>>>>>     #include <linux/version.h>
>>>>>>     #include <linux/cpuhotplug.h>
>>>>>> +#include <linux/libfdt.h>
>>>>>> +#include <linux/of.h>
>>>>>> +#include <linux/of_fdt.h>
>>>>>>     #include <asm/mshyperv.h>
>>>>>>       static bool hyperv_initialized;
>>>>>> @@ -27,6 +30,29 @@ int hv_get_hypervisor_version(union hv_hypervisor_version_info *info)
>>>>>>         return 0;
>>>>>>     }
>>>>>>     +static bool hyperv_detect_fdt(void)
>>>>>> +{
>>>>>> +#ifdef CONFIG_OF
>>>>>> +    const unsigned long hyp_node = of_get_flat_dt_subnode_by_name(
>>>>>> +            of_get_flat_dt_root(), "hypervisor");
>>>>>> +
>>>>>> +    return (hyp_node != -FDT_ERR_NOTFOUND) &&
>>>>>> +            of_flat_dt_is_compatible(hyp_node, "microsoft,hyperv");
>>>>>> +#else
>>>>>> +    return false;
>>>>>> +#endif
>>>>>> +}
>>>>>> +
>>>>>> +static bool hyperv_detect_acpi(void)
>>>>>> +{
>>>>>> +#ifdef CONFIG_ACPI
>>>>>> +    return !acpi_disabled &&
>>>>>> +            !strncmp((char *)&acpi_gbl_FADT.hypervisor_id, "MsHyperV", 8);
>>>>>> +#else
>>>>>> +    return false;
>>>>>> +#endif
>>>>>> +}
>>>>>> +
>>>>>
>>>>> Could using IS_ENABLED() allow collapsing these two functions into one hyperv_detect_fw()?
>>>>> I am wondering if #ifdef was explicitly chosen to allow for the code to be compiled in if CONFIG* is defined
>>>>> v/s IS_ENABLED() only being true if the CONFIG value is true.
>>>>>
>>>> In the hyperv_detect_fdt function, the #ifdef has been chosen due to the functions being declared only when the macro is defined, hence I could not rely on `if IS_ENABLED()` and the run-time detection. For the compile-time option, `#if IS_ENABLED()` would convey the intent better, could update with that.
>>>
>>> In patch 2/6, just under the diff you have, we already `select OF_EARLY_FLATTREE if OF`, so the declarations (and definitions)
>>> of the functions being present is already handled, AFAIK. Are we thinking there may be some weird config where neither OF nor
>>> ACPI is selected? If so, we should set a `depends on ACPI || OF` for config HYPERV to prevent that.
>>>
>>> I don't know if I'm missing something obvious here, so please correct me if I'm wrong.
>>>
>> I just sent out the v2 of the patch set, and (un?)fortunately missed the change I had for the #ifdef's in this chunk (to use #if IS_ENABLED() and remove pre-processor directives from the ACPI-related function).
>>
>> I am making the point that the change you are suggesting (as I understand) is this conditional statement
>>
>> if IS_ENABLED(CONFIG_OF) {
>>      const unsigned long hyp_node = of_get_flat_dt_subnode_by_name(
>>                  of_get_flat_dt_root(), "hypervisor");
>>
>>      return (hyp_node != -FDT_ERR_NOTFOUND) &&
>>                  of_flat_dt_is_compatible(hyp_node, "microsoft,hyperv");
>> }
>>
> 
> That's right, that's the suggestion I'm making.
> 
>> and for it to link successfully, one needs of_get_flat_dt_subnode_by_name defined. From the source code, that needs CONFIG_OF_EARLY_FLATTREE as there is no stub available when CONFIG_OF_EARLY_FLATTREE is not defined.
> 
> We agree that you need of_get_flat_dt_subnode_by_name declared and defined, and it's not available, stub or otherwise, if CONFIG_OF_EARLY_FLATTREE is not defined.
> 
> In my mind, I believed that either ACPI or OF had to be selected for some sort of firmware handoff to occur, but I did some experimentation and ended up with an
> x86_64 kernel config that has neither enabled, but nonetheless has CONFIG_HYPERV enabled. The kernel builds with such a config, whether it's useful for anything
> is another question. ARM64 requires CONFIG_OF so that side of the equation is clear.
> 
> That's where my question above came in: Are we thinking there may be some weird config where neither OF nor ACPI is selected? I thought that was not a reasonable
> config, thus my prescription to set `depends on ACPI || OF` for config HYPERV. If there is a use case for an x86_64 kernel that has neither ACPI nor OpenFirmware/FDT,
> but nevertheless sets CONFIG_HYPERV, feel free to ignore this comment thread entirely.
> 
Thanks for laying out the details so patiently for me! I believe I 
understand your concerns better now. In the V2 of the patch series, 
Michael Kelly suggested rethinking how the Kconfig change is carried 
out. Appears natural to try out what your suggesting alongside with the 
Michael's suggestions.

As for your other point, I wouldn't think, too, the system could boot or 
be useful without some coordinates of the environment passed via ACPI || DT.

> 
> Thanks,
> Easwar

-- 
Thank you,
Roman

^ permalink raw reply

* Re: [PATCH 2/6] drivers/hv: Enable VTL mode for arm64
From: Roman Kisel @ 2024-05-20 17:00 UTC (permalink / raw)
  To: Wei Liu
  Cc: kys, haiyangz, decui, linux-hyperv, rafael, lenb, linux-acpi,
	ssengar, sunilmut
In-Reply-To: <ZkRnnM1f5lUo7OLB@liuwe-devbox-debian-v2>



On 5/15/2024 12:43 AM, Wei Liu wrote:
> On Fri, May 10, 2024 at 09:05:01AM -0700, romank@linux.microsoft.com wrote:
>> From: Roman Kisel <romank@linux.microsoft.com>
>>
>> This change removes dependency on ACPI when buidling the hv drivers to
>> allow Virtual Trust Level boot with DeviceTree.
>>
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> ---
>>   drivers/hv/Kconfig | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>> index 862c47b191af..a5cd1365e248 100644
>> --- a/drivers/hv/Kconfig
>> +++ b/drivers/hv/Kconfig
>> @@ -5,7 +5,7 @@ menu "Microsoft Hyper-V guest support"
>>   config HYPERV
>>   	tristate "Microsoft Hyper-V client drivers"
>>   	depends on (X86 && X86_LOCAL_APIC && HYPERVISOR_GUEST) \
>> -		|| (ACPI && ARM64 && !CPU_BIG_ENDIAN)
>> +		|| (ARM64 && !CPU_BIG_ENDIAN)
>>   	select PARAVIRT
>>   	select X86_HV_CALLBACK_VECTOR if X86
>>   	select OF_EARLY_FLATTREE if OF
>> @@ -15,7 +15,7 @@ config HYPERV
>>   
>>   config HYPERV_VTL_MODE
>>   	bool "Enable Linux to boot in VTL context"
>> -	depends on X86_64 && HYPERV
>> +	depends on HYPERV
> 
> As Ktest bot pointed out, this change broke the build.
> 
Appreciate your help! I've sent out v2 with the base commit set, and so 
far, robots have been fine with it :) I haven't noticed an alert of a 
broken build due to this change.

> Thanks,
> Wei.
> 
>>   	depends on SMP
>>   	default n
>>   	help
>> @@ -31,7 +31,7 @@ config HYPERV_VTL_MODE
>>   
>>   	  Select this option to build a Linux kernel to run at a VTL other than
>>   	  the normal VTL0, which currently is only VTL2.  This option
>> -	  initializes the x86 platform for VTL2, and adds the ability to boot
>> +	  initializes the kernel to run in VTL2, and adds the ability to boot
>>   	  secondary CPUs directly into 64-bit context as required for VTLs other
>>   	  than 0.  A kernel built with this option must run at VTL2, and will
>>   	  not run as a normal guest.
>> -- 
>> 2.45.0
>>

-- 
Thank you,
Roman

^ permalink raw reply

* Re: [PATCH 01/20] x86/tdx: Introduce tdvmcall_trampoline()
From: Dave Hansen @ 2024-05-20 15:49 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Sean Christopherson, Paolo Bonzini, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Josh Poimboeuf, Peter Zijlstra, linux-coco, linux-kernel,
	linux-hyperv
In-Reply-To: <itsl2xhcrwotxrjpm7msfzc7mp73wk47pnwgyfcvbrioj3p3hn@2e7c4p4gtwh4>

On 5/20/24 03:32, Kirill A. Shutemov wrote:
>> In other words, I think this as the foundational justification for the
>> rest of the series leaves a little to be desired.
> See the patch below. Is it what you had in mind?
> 
> This patch saves ~2K of code, comparing to ~3K for my patchset:
> 
> add/remove: 0/0 grow/shrink: 1/17 up/down: 8/-2266 (-2258)
> 
> But it is considerably simpler.
> 
>  arch/x86/boot/compressed/tdx.c    |  32 ++++----
>  arch/x86/coco/tdx/tdx-shared.c    |   3 +-
>  arch/x86/coco/tdx/tdx.c           | 159 +++++++++++++++++++++-----------------
>  arch/x86/hyperv/ivm.c             |  32 ++++----
>  arch/x86/include/asm/shared/tdx.h |  25 ++++--
>  5 files changed, 142 insertions(+), 109 deletions(-)

The diffstat is a bit misleading because those extra lines really add
very little complexity. The only real risk is that folks end up leaving
the args structure uninitialized, but there are a number of ways to
mitigate that risk.

> +static __always_inline void tdx_arg_init(struct tdx_module_args *args)
> +{
> +	asm ("rep stosb"
> +	     : "+D" (args)
> +	     : "c" (sizeof(*args)), "a" (0)
> +	     : "memory");
> +}

There are a bunch of ways to do this.  This one certainly isn't _bad_,
but I'd be open to doing it other ways if folks have more ideas.

Either way, I very much prefer this approach to adding a bunch more
assembly and making things less self-documenting.  I also suspect you
can get some more text size shrinkage from selectively uninlining a few
things.

^ permalink raw reply

* Re: [PATCH 00/20] x86/tdx: Rewrite TDCALL wrappers
From: Huang, Kai @ 2024-05-20 11:56 UTC (permalink / raw)
  To: jpoimboe@kernel.org, Hansen, Dave, dave.hansen@linux.intel.com,
	haiyangz@microsoft.com, kirill.shutemov@linux.intel.com,
	seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com,
	kys@microsoft.com, Cui, Dexuan, tglx@linutronix.de, hpa@zytor.com,
	peterz@infradead.org, wei.liu@kernel.org, bp@alien8.de,
	x86@kernel.org
  Cc: linux-coco@lists.linux.dev, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <a8628846-1d87-4191-92b8-c7a1e70cb196@intel.com>

On Fri, 2024-05-17 at 08:18 -0700, Dave Hansen wrote:
> On 5/17/24 07:19, Kirill A. Shutemov wrote:
> >  arch/x86/boot/compressed/tdx.c    |  32 +---
> >  arch/x86/coco/tdx/tdcall.S        | 145 ++++++++++-----
> >  arch/x86/coco/tdx/tdx-shared.c    |  26 +--
> >  arch/x86/coco/tdx/tdx.c           | 298 ++++++++----------------------
> >  arch/x86/hyperv/ivm.c             |  33 +---
> >  arch/x86/include/asm/shared/tdx.h | 159 +++++++++++-----
> >  arch/x86/include/asm/tdx.h        |   2 +
> >  arch/x86/virt/vmx/tdx/tdxcall.S   |  29 +--
> >  tools/objtool/noreturns.h         |   2 +-
> >  9 files changed, 322 insertions(+), 404 deletions(-)
> 
> I was going to grumble about this being a waste of time, but it looks
> like this gives smaller binaries and less code.  Looks promising so far!
> 

I'll start to work on the SEAMCALL part too.  Thanks Kirill for the work.

^ permalink raw reply

* Re: [PATCH 16/20] x86/tdx: Convert VP_INFO tdcall to use new TDCALL_5() macro
From: Kirill A. Shutemov @ 2024-05-20 11:02 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sean Christopherson, Paolo Bonzini, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Josh Poimboeuf, Peter Zijlstra, linux-coco, linux-kernel,
	linux-hyperv
In-Reply-To: <ca2adcf7-5708-4142-bdd5-8700b98b4a5b@intel.com>

On Fri, May 17, 2024 at 08:57:10AM -0700, Dave Hansen wrote:
> On 5/17/24 07:19, Kirill A. Shutemov wrote:
> > -	/*
> > -	 * TDINFO TDX module call is used to get the TD execution environment
> > -	 * information like GPA width, number of available vcpus, debug mode
> > -	 * information, etc. More details about the ABI can be found in TDX
> > -	 * Guest-Host-Communication Interface (GHCI), section 2.4.2 TDCALL
> > -	 * [TDG.VP.INFO].
> > -	 */
> > -	tdcall(TDG_VP_INFO, &args);
> > +	tdg_vp_info(&gpa_width, &td_attr);
> 
> Why is the comment going away?

By mistake. Will fix.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCH 02/20] x86/tdx: Add macros to generate TDVMCALL wrappers
From: Kirill A. Shutemov @ 2024-05-20 10:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Josh Poimboeuf,
	Peter Zijlstra, linux-coco, linux-kernel, linux-hyperv
In-Reply-To: <58b02adc-7389-4fcd-a443-1856af7886b7@redhat.com>

On Fri, May 17, 2024 at 06:54:15PM +0200, Paolo Bonzini wrote:
> On 5/17/24 16:19, Kirill A. Shutemov wrote:
> > Introduce a set of macros that allow to generate wrappers for TDVMCALL
> > leafs. The macros uses tdvmcall_trmapoline() and provides SYSV-complaint
> > ABI on top of it.
> 
> Not really SYSV-compliant, more like "The macros use asm() to call
> tdvmcall_trampoline with its custom parameter passing convention".

Sounds better, thanks.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCH 01/20] x86/tdx: Introduce tdvmcall_trampoline()
From: Kirill A. Shutemov @ 2024-05-20 10:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Josh Poimboeuf,
	Peter Zijlstra, linux-coco, linux-kernel, linux-hyperv
In-Reply-To: <9cd7f179-673b-4e96-be08-128dc6fb6271@redhat.com>

On Fri, May 17, 2024 at 07:02:25PM +0200, Paolo Bonzini wrote:
> On 5/17/24 16:19, Kirill A. Shutemov wrote:
> > The function will be used from inline assembly to handle most TDVMCALL
> > cases.
> 
> Perhaps add that the calling convention is designed to allow using the asm
> constraints a/b/c/d/S/D and keep the asm blocks simpler?

Sure.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCH 01/20] x86/tdx: Introduce tdvmcall_trampoline()
From: Kirill A. Shutemov @ 2024-05-20 10:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sean Christopherson, Paolo Bonzini, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Josh Poimboeuf, Peter Zijlstra, linux-coco, linux-kernel,
	linux-hyperv
In-Reply-To: <395850c4-f8a3-46ed-9b0c-b1f47386610c@intel.com>

On Fri, May 17, 2024 at 08:21:37AM -0700, Dave Hansen wrote:
> On 5/17/24 07:19, Kirill A. Shutemov wrote:
> > TDCALL calls are centralized into a few megawrappers that take the
> > struct tdx_module_args as input. Most of the call sites only use a few
> > arguments, but they have to zero out unused fields in the structure to
> > avoid data leaks to the VMM. This leads to the compiler generating
> > inefficient code: dozens of instructions per call site to clear unused
> > fields of the structure.
> 
> I agree that this is what the silly compiler does in practice.  But my
> first preference for fixing it would just be an out-of-line memset() or
> a pretty bare REP;MOV.
> 
> In other words, I think this as the foundational justification for the
> rest of the series leaves a little to be desired.

See the patch below. Is it what you had in mind?

This patch saves ~2K of code, comparing to ~3K for my patchset:

add/remove: 0/0 grow/shrink: 1/17 up/down: 8/-2266 (-2258)

But it is considerably simpler.

 arch/x86/boot/compressed/tdx.c    |  32 ++++----
 arch/x86/coco/tdx/tdx-shared.c    |   3 +-
 arch/x86/coco/tdx/tdx.c           | 159 +++++++++++++++++++++-----------------
 arch/x86/hyperv/ivm.c             |  32 ++++----
 arch/x86/include/asm/shared/tdx.h |  25 ++++--
 5 files changed, 142 insertions(+), 109 deletions(-)

diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
index 8451d6a1030c..a6784a9153e4 100644
--- a/arch/x86/boot/compressed/tdx.c
+++ b/arch/x86/boot/compressed/tdx.c
@@ -18,13 +18,14 @@ void __tdx_hypercall_failed(void)
 
 static inline unsigned int tdx_io_in(int size, u16 port)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION),
-		.r12 = size,
-		.r13 = 0,
-		.r14 = port,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION);
+	args.r12 = size;
+	args.r13 = 0;
+	args.r14 = port;
 
 	if (__tdx_hypercall(&args))
 		return UINT_MAX;
@@ -34,14 +35,15 @@ static inline unsigned int tdx_io_in(int size, u16 port)
 
 static inline void tdx_io_out(int size, u16 port, u32 value)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION),
-		.r12 = size,
-		.r13 = 1,
-		.r14 = port,
-		.r15 = value,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION);
+	args.r12 = size;
+	args.r13 = 1;
+	args.r14 = port;
+	args.r15 = value;
 
 	__tdx_hypercall(&args);
 }
diff --git a/arch/x86/coco/tdx/tdx-shared.c b/arch/x86/coco/tdx/tdx-shared.c
index 1655aa56a0a5..b8d1b3d940d2 100644
--- a/arch/x86/coco/tdx/tdx-shared.c
+++ b/arch/x86/coco/tdx/tdx-shared.c
@@ -5,7 +5,7 @@ static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
 				    enum pg_level pg_level)
 {
 	unsigned long accept_size = page_level_size(pg_level);
-	struct tdx_module_args args = {};
+	struct tdx_module_args args;
 	u8 page_size;
 
 	if (!IS_ALIGNED(start, accept_size))
@@ -34,6 +34,7 @@ static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
 		return 0;
 	}
 
+	tdx_arg_init(&args);
 	args.rcx = start | page_size;
 	if (__tdcall(TDG_MEM_PAGE_ACCEPT, &args))
 		return 0;
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index cadd583d6f62..e8bb8afe04a9 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -53,13 +53,14 @@ noinstr void __noreturn __tdx_hypercall_failed(void)
 long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2,
 		       unsigned long p3, unsigned long p4)
 {
-	struct tdx_module_args args = {
-		.r10 = nr,
-		.r11 = p1,
-		.r12 = p2,
-		.r13 = p3,
-		.r14 = p4,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.r10 = nr;
+	args.r11 = p1;
+	args.r12 = p2;
+	args.r13 = p3;
+	args.r14 = p4;
 
 	return __tdx_hypercall(&args);
 }
@@ -80,11 +81,12 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
 /* Read TD-scoped metadata */
 static inline u64 tdg_vm_rd(u64 field, u64 *value)
 {
-	struct tdx_module_args args = {
-		.rdx = field,
-	};
+	struct tdx_module_args args;
 	u64 ret;
 
+	tdx_arg_init(&args);
+	args.rdx = field,
+
 	ret = __tdcall_ret(TDG_VM_RD, &args);
 	*value = args.r8;
 
@@ -94,11 +96,12 @@ static inline u64 tdg_vm_rd(u64 field, u64 *value)
 /* Write TD-scoped metadata */
 static inline u64 tdg_vm_wr(u64 field, u64 value, u64 mask)
 {
-	struct tdx_module_args args = {
-		.rdx = field,
-		.r8 = value,
-		.r9 = mask,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.rdx = field;
+	args.r8 = value;
+	args.r9 = mask;
 
 	return __tdcall(TDG_VM_WR, &args);
 }
@@ -119,13 +122,14 @@ static inline u64 tdg_vm_wr(u64 field, u64 value, u64 mask)
  */
 int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport)
 {
-	struct tdx_module_args args = {
-		.rcx = virt_to_phys(tdreport),
-		.rdx = virt_to_phys(reportdata),
-		.r8 = TDREPORT_SUBTYPE_0,
-	};
+	struct tdx_module_args args;
 	u64 ret;
 
+	tdx_arg_init(&args);
+	args.rcx = virt_to_phys(tdreport);
+	args.rdx = virt_to_phys(reportdata);
+	args.r8 = TDREPORT_SUBTYPE_0;
+
 	ret = __tdcall(TDG_MR_REPORT, &args);
 	if (ret) {
 		if (TDCALL_RETURN_CODE(ret) == TDCALL_INVALID_OPERAND)
@@ -160,11 +164,7 @@ EXPORT_SYMBOL_GPL(tdx_hcall_get_quote);
 
 static void __noreturn tdx_panic(const char *msg)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = TDVMCALL_REPORT_FATAL_ERROR,
-		.r12 = 0, /* Error code: 0 is Panic */
-	};
+	struct tdx_module_args args;
 	union {
 		/* Define register order according to the GHCI */
 		struct { u64 r14, r15, rbx, rdi, rsi, r8, r9, rdx; };
@@ -175,6 +175,11 @@ static void __noreturn tdx_panic(const char *msg)
 	/* VMM assumes '\0' in byte 65, if the message took all 64 bytes */
 	strtomem_pad(message.str, msg, '\0');
 
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = TDVMCALL_REPORT_FATAL_ERROR;
+	args.r12 = 0; /* Error code: 0 is Panic */
+
 	args.r8  = message.r8;
 	args.r9  = message.r9;
 	args.r14 = message.r14;
@@ -277,10 +282,12 @@ static void enable_cpu_topology_enumeration(void)
 
 static void tdx_setup(u64 *cc_mask)
 {
-	struct tdx_module_args args = {};
+	struct tdx_module_args args;
 	unsigned int gpa_width;
 	u64 td_attr;
 
+	tdx_arg_init(&args);
+
 	/*
 	 * TDINFO TDX module call is used to get the TD execution environment
 	 * information like GPA width, number of available vcpus, debug mode
@@ -356,11 +363,12 @@ static int ve_instr_len(struct ve_info *ve)
 
 static u64 __cpuidle __halt(const bool irq_disabled)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = hcall_func(EXIT_REASON_HLT),
-		.r12 = irq_disabled,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = hcall_func(EXIT_REASON_HLT);
+	args.r12 = irq_disabled;
 
 	/*
 	 * Emulate HLT operation via hypercall. More info about ABI
@@ -400,11 +408,12 @@ void __cpuidle tdx_safe_halt(void)
 
 static int read_msr(struct pt_regs *regs, struct ve_info *ve)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = hcall_func(EXIT_REASON_MSR_READ),
-		.r12 = regs->cx,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = hcall_func(EXIT_REASON_MSR_READ);
+	args.r12 = regs->cx;
 
 	/*
 	 * Emulate the MSR read via hypercall. More info about ABI
@@ -421,12 +430,13 @@ static int read_msr(struct pt_regs *regs, struct ve_info *ve)
 
 static int write_msr(struct pt_regs *regs, struct ve_info *ve)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = hcall_func(EXIT_REASON_MSR_WRITE),
-		.r12 = regs->cx,
-		.r13 = (u64)regs->dx << 32 | regs->ax,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = hcall_func(EXIT_REASON_MSR_WRITE);
+	args.r12 = regs->cx;
+	args.r13 = (u64)regs->dx << 32 | regs->ax;
 
 	/*
 	 * Emulate the MSR write via hypercall. More info about ABI
@@ -441,12 +451,13 @@ static int write_msr(struct pt_regs *regs, struct ve_info *ve)
 
 static int handle_cpuid(struct pt_regs *regs, struct ve_info *ve)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = hcall_func(EXIT_REASON_CPUID),
-		.r12 = regs->ax,
-		.r13 = regs->cx,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = hcall_func(EXIT_REASON_CPUID);
+	args.r12 = regs->ax;
+	args.r13 = regs->cx;
 
 	/*
 	 * Only allow VMM to control range reserved for hypervisor
@@ -483,14 +494,15 @@ static int handle_cpuid(struct pt_regs *regs, struct ve_info *ve)
 
 static bool mmio_read(int size, unsigned long addr, unsigned long *val)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = hcall_func(EXIT_REASON_EPT_VIOLATION),
-		.r12 = size,
-		.r13 = EPT_READ,
-		.r14 = addr,
-		.r15 = *val,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = hcall_func(EXIT_REASON_EPT_VIOLATION);
+	args.r12 = size;
+	args.r13 = EPT_READ;
+	args.r14 = addr;
+	args.r15 = *val;
 
 	if (__tdx_hypercall(&args))
 		return false;
@@ -612,16 +624,17 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 
 static bool handle_in(struct pt_regs *regs, int size, int port)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION),
-		.r12 = size,
-		.r13 = PORT_READ,
-		.r14 = port,
-	};
+	struct tdx_module_args args;
 	u64 mask = GENMASK(BITS_PER_BYTE * size, 0);
 	bool success;
 
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION);
+	args.r12 = size;
+	args.r13 = PORT_READ;
+	args.r14 = port;
+
 	/*
 	 * Emulate the I/O read via hypercall. More info about ABI can be found
 	 * in TDX Guest-Host-Communication Interface (GHCI) section titled
@@ -706,7 +719,9 @@ __init bool tdx_early_handle_ve(struct pt_regs *regs)
 
 void tdx_get_ve_info(struct ve_info *ve)
 {
-	struct tdx_module_args args = {};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
 
 	/*
 	 * Called during #VE handling to retrieve the #VE info from the
@@ -849,14 +864,16 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
 	}
 
 	while (retry_count < max_retries_per_page) {
-		struct tdx_module_args args = {
-			.r10 = TDX_HYPERCALL_STANDARD,
-			.r11 = TDVMCALL_MAP_GPA,
-			.r12 = start,
-			.r13 = end - start };
-
+		struct tdx_module_args args;
 		u64 map_fail_paddr;
-		u64 ret = __tdx_hypercall(&args);
+		u64 ret;
+
+		tdx_arg_init(&args);
+		args.r10 = TDX_HYPERCALL_STANDARD;
+		args.r11 = TDVMCALL_MAP_GPA;
+		args.r12 = start;
+		args.r13 = end - start;
+		ret = __tdx_hypercall(&args);
 
 		if (ret != TDVMCALL_STATUS_RETRY)
 			return !ret;
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index b4a851d27c7c..38560b006cdf 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -385,27 +385,30 @@ static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
 #ifdef CONFIG_INTEL_TDX_GUEST
 static void hv_tdx_msr_write(u64 msr, u64 val)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = EXIT_REASON_MSR_WRITE,
-		.r12 = msr,
-		.r13 = val,
-	};
+	struct tdx_module_args args;
+	u64 ret;
 
-	u64 ret = __tdx_hypercall(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = EXIT_REASON_MSR_WRITE;
+	args.r12 = msr;
+	args.r13 = val;
+
+	ret = __tdx_hypercall(&args);
 
 	WARN_ONCE(ret, "Failed to emulate MSR write: %lld\n", ret);
 }
 
 static void hv_tdx_msr_read(u64 msr, u64 *val)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = EXIT_REASON_MSR_READ,
-		.r12 = msr,
-	};
+	struct tdx_module_args args;
+	u64 ret;
 
-	u64 ret = __tdx_hypercall(&args);
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = EXIT_REASON_MSR_READ;
+	args.r12 = msr;
+
+	ret = __tdx_hypercall(&args);
 
 	if (WARN_ONCE(ret, "Failed to emulate MSR read: %lld\n", ret))
 		*val = 0;
@@ -415,8 +418,9 @@ static void hv_tdx_msr_read(u64 msr, u64 *val)
 
 u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
 {
-	struct tdx_module_args args = { };
+	struct tdx_module_args args;
 
+	tdx_arg_init(&args);
 	args.r10 = control;
 	args.rdx = param1;
 	args.r8  = param2;
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 89f7fcade8ae..fc3082f050dc 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -100,6 +100,14 @@ struct tdx_module_args {
 	u64 rsi;
 };
 
+static __always_inline void tdx_arg_init(struct tdx_module_args *args)
+{
+	asm ("rep stosb"
+	     : "+D" (args)
+	     : "c" (sizeof(*args)), "a" (0)
+	     : "memory");
+}
+
 /* Used to communicate with the TDX module */
 u64 __tdcall(u64 fn, struct tdx_module_args *args);
 u64 __tdcall_ret(u64 fn, struct tdx_module_args *args);
@@ -114,14 +122,15 @@ u64 __tdx_hypercall(struct tdx_module_args *args);
  */
 static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
 {
-	struct tdx_module_args args = {
-		.r10 = TDX_HYPERCALL_STANDARD,
-		.r11 = fn,
-		.r12 = r12,
-		.r13 = r13,
-		.r14 = r14,
-		.r15 = r15,
-	};
+	struct tdx_module_args args;
+
+	tdx_arg_init(&args);
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = fn;
+	args.r12 = r12;
+	args.r13 = r13;
+	args.r14 = r14;
+	args.r15 = r15;
 
 	return __tdx_hypercall(&args);
 }
-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply related

* Re: [PATCH v2 1/6] arm64/hyperv: Support DeviceTree
From: Krzysztof Kozlowski @ 2024-05-20  6:45 UTC (permalink / raw)
  To: Roman Kisel, arnd, bhelgaas, bp, catalin.marinas, dave.hansen,
	decui, haiyangz, hpa, kw, kys, lenb, lpieralisi, mingo, mhklinux,
	rafael, robh, tglx, wei.liu, will, linux-acpi, linux-arch,
	linux-arm-kernel, linux-hyperv, linux-kernel, linux-pci, x86
  Cc: ssengar, sunilmut, vdso
In-Reply-To: <267ef0e2-2384-44bd-81f9-f33dda7bb9d2@linux.microsoft.com>

On 15/05/2024 19:33, Roman Kisel wrote:
>>>   static bool hyperv_initialized;
>>> @@ -27,6 +30,29 @@ int hv_get_hypervisor_version(union hv_hypervisor_version_info *info)
>>>   	return 0;
>>>   }
>>>   
>>> +static bool hyperv_detect_fdt(void)
>>> +{
>>> +#ifdef CONFIG_OF
>>> +	const unsigned long hyp_node = of_get_flat_dt_subnode_by_name(
>>> +			of_get_flat_dt_root(), "hypervisor");
>>
>> Why do you add an ABI for node name? Although name looks OK, but is it
>> really described in the spec that you depend on it? I really do not like
>> name dependencies...
> 
> Followed the existing DeviceTree's of naming and approaches in the 
> kernel to surprise less and "invent" even less. As for the spec, the 

I am sorry, but there is no approved existing approach of adding ABI for
node names. There are exceptions or specific cases, but that's not
"invent less" approach. ABI is defined by compatible.



Best regards,
Krzysztof


^ permalink raw reply

* [PATCH V2 net] net: mana: Fix the extra HZ in mana_hwc_send_request
From: Souradeep Chakrabarti @ 2024-05-20  6:05 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
	longli, yury.norov, leon, cai.huoqing, ssengar, vkuznets, tglx,
	linux-hyperv, netdev, linux-kernel, linux-rdma
  Cc: schakrabarti, Souradeep Chakrabarti, stable

Commit 62c1bff593b7 added an extra HZ along with msecs_to_jiffies.
This patch fixes that.

Cc: stable@vger.kernel.org
Fixes: 62c1bff593b7 ("net: mana: Configure hwc timeout from hardware")
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/hw_channel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 2729a2c5acf9..ca814fe8a775 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -848,7 +848,7 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
 	}
 
 	if (!wait_for_completion_timeout(&ctx->comp_event,
-					 (msecs_to_jiffies(hwc->hwc_timeout) * HZ))) {
+					 (msecs_to_jiffies(hwc->hwc_timeout)))) {
 		dev_err(hwc->dev, "HWC: Request timed out!\n");
 		err = -ETIMEDOUT;
 		goto out;
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH] tracing/treewide: Remove second parameter of __assign_str()
From: Guenter Roeck @ 2024-05-18  1:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux trace kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Linus Torvalds, linuxppc-dev, kvm, linux-block, linux-cxl,
	linux-media, dri-devel, amd-gfx, intel-gfx, intel-xe,
	linux-arm-msm, freedreno, virtualization, linux-rdma, linux-pm,
	iommu, linux-tegra, netdev, linux-hyperv, ath10k, linux-wireless,
	ath11k, ath12k, brcm80211, brcm80211-dev-list.pdl, linux-usb,
	linux-bcachefs, linux-nfs, ocfs2-devel, linux-cifs, linux-xfs,
	linux-edac, selinux, linux-btrfs, linux-erofs, linux-f2fs-devel,
	linux-hwmon, io-uring, linux-sound, bpf, linux-wpan, dev,
	linux-s390, tipc-discussion, Julia Lawall
In-Reply-To: <5cff0ff0-48d1-49f8-84f4-bb33571fdf16@roeck-us.net>

On 5/17/24 11:00, Guenter Roeck wrote:
> On 5/17/24 10:48, Steven Rostedt wrote:
>> On Fri, 17 May 2024 10:36:37 -0700
>> Guenter Roeck <linux@roeck-us.net> wrote:
>>
>>> Building csky:allmodconfig (and others) ... failed
>>> --------------
>>> Error log:
>>> In file included from include/trace/trace_events.h:419,
>>>                   from include/trace/define_trace.h:102,
>>>                   from drivers/cxl/core/trace.h:737,
>>>                   from drivers/cxl/core/trace.c:8:
>>> drivers/cxl/core/./trace.h:383:1: error: macro "__assign_str" passed 2 arguments, but takes just 1
>>>
>>> This is with the patch applied on top of v6.9-8410-gff2632d7d08e.
>>> So far that seems to be the only build failure.
>>> Introduced with commit 6aec00139d3a8 ("cxl/core: Add region info to
>>> cxl_general_media and cxl_dram events"). Guess we'll see more of those
>>> towards the end of the commit window.
>>
>> Looks like I made this patch just before this commit was pulled into
>> Linus's tree.
>>
>> Which is why I'll apply and rerun the above again probably on Tuesday of
>> next week against Linus's latest.
>>
>> This patch made it through both an allyesconfig and an allmodconfig, but on
>> the commit I had applied it to, which was:
>>
>>    1b294a1f3561 ("Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")
>>
>> I'll be compiling those two builds after I update it then.
>>
> 
> I am currently repeating my test builds with the above errors fixed.
> That should take a couple of hours. I'll let you know how it goes.
> 

There are no more build failures caused by this patch after fixing the above
errors.

Tested-by: Guenter Roeck <linux@roeck-us.net>

Guenter


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox