Netdev List
 help / color / mirror / Atom feed
* [PATCH 2/8] dmaengine: shdmac: Change platform check to CONFIG_ARCH_RENESAS
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: devel, alsa-devel, Geert Uytterhoeven, netdev, linux-kernel,
	linux-renesas-soc, dmaengine, linux-arm-kernel, linux-media
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

Since commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS")
is CONFIG_ARCH_RENESAS a more appropriate platform check than the legacy
CONFIG_ARCH_SHMOBILE, hence use the former.

Renesas SuperH SH-Mobile SoCs are still covered by the CONFIG_CPU_SH4
check, just like before support for Renesas ARM SoCs was added.

Instead of blindly changing all the #ifdefs, switch the main code block
in sh_dmae_probe() to IS_ENABLED(), as this allows to remove all the
remaining #ifdefs.

This will allow to drop ARCH_SHMOBILE on ARM in the near future.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/dma/sh/shdmac.c | 50 +++++++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 29 deletions(-)

diff --git a/drivers/dma/sh/shdmac.c b/drivers/dma/sh/shdmac.c
index 516f5487cc44cf96..8fcaae482ce0949a 100644
--- a/drivers/dma/sh/shdmac.c
+++ b/drivers/dma/sh/shdmac.c
@@ -440,7 +440,6 @@ static bool sh_dmae_reset(struct sh_dmae_device *shdev)
 	return ret;
 }
 
-#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_SHMOBILE)
 static irqreturn_t sh_dmae_err(int irq, void *data)
 {
 	struct sh_dmae_device *shdev = data;
@@ -451,7 +450,6 @@ static irqreturn_t sh_dmae_err(int irq, void *data)
 	sh_dmae_reset(shdev);
 	return IRQ_HANDLED;
 }
-#endif
 
 static bool sh_dmae_desc_completed(struct shdma_chan *schan,
 				   struct shdma_desc *sdesc)
@@ -683,11 +681,8 @@ static int sh_dmae_probe(struct platform_device *pdev)
 	const struct sh_dmae_pdata *pdata;
 	unsigned long chan_flag[SH_DMAE_MAX_CHANNELS] = {};
 	int chan_irq[SH_DMAE_MAX_CHANNELS];
-#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_SHMOBILE)
 	unsigned long irqflags = 0;
-	int errirq;
-#endif
-	int err, i, irq_cnt = 0, irqres = 0, irq_cap = 0;
+	int err, errirq, i, irq_cnt = 0, irqres = 0, irq_cap = 0;
 	struct sh_dmae_device *shdev;
 	struct dma_device *dma_dev;
 	struct resource *chan, *dmars, *errirq_res, *chanirq_res;
@@ -789,33 +784,32 @@ static int sh_dmae_probe(struct platform_device *pdev)
 	if (err)
 		goto rst_err;
 
-#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_SHMOBILE)
-	chanirq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 1);
+	if (IS_ENABLED(CONFIG_CPU_SH4) || IS_ENABLED(CONFIG_ARCH_RENESAS)) {
+		chanirq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 1);
 
-	if (!chanirq_res)
-		chanirq_res = errirq_res;
-	else
-		irqres++;
+		if (!chanirq_res)
+			chanirq_res = errirq_res;
+		else
+			irqres++;
 
-	if (chanirq_res == errirq_res ||
-	    (errirq_res->flags & IORESOURCE_BITS) == IORESOURCE_IRQ_SHAREABLE)
-		irqflags = IRQF_SHARED;
+		if (chanirq_res == errirq_res ||
+		    (errirq_res->flags & IORESOURCE_BITS) == IORESOURCE_IRQ_SHAREABLE)
+			irqflags = IRQF_SHARED;
 
-	errirq = errirq_res->start;
+		errirq = errirq_res->start;
 
-	err = devm_request_irq(&pdev->dev, errirq, sh_dmae_err, irqflags,
-			       "DMAC Address Error", shdev);
-	if (err) {
-		dev_err(&pdev->dev,
-			"DMA failed requesting irq #%d, error %d\n",
-			errirq, err);
-		goto eirq_err;
+		err = devm_request_irq(&pdev->dev, errirq, sh_dmae_err,
+				       irqflags, "DMAC Address Error", shdev);
+		if (err) {
+			dev_err(&pdev->dev,
+				"DMA failed requesting irq #%d, error %d\n",
+				errirq, err);
+			goto eirq_err;
+		}
+	} else {
+		chanirq_res = errirq_res;
 	}
 
-#else
-	chanirq_res = errirq_res;
-#endif /* CONFIG_CPU_SH4 || CONFIG_ARCH_SHMOBILE */
-
 	if (chanirq_res->start == chanirq_res->end &&
 	    !platform_get_resource(pdev, IORESOURCE_IRQ, 1)) {
 		/* Special case - all multiplexed */
@@ -881,9 +875,7 @@ static int sh_dmae_probe(struct platform_device *pdev)
 chan_probe_err:
 	sh_dmae_chan_remove(shdev);
 
-#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_SHMOBILE)
 eirq_err:
-#endif
 rst_err:
 	spin_lock_irq(&sh_dmae_lock);
 	list_del_rcu(&shdev->node);
-- 
2.7.4

^ permalink raw reply related

* [PATCH 3/8] [media] v4l: rcar_fdp1: Change platform dependency to ARCH_RENESAS
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: devel, alsa-devel, Geert Uytterhoeven, netdev, linux-kernel,
	linux-renesas-soc, dmaengine, linux-arm-kernel, linux-media
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

The Renesas Fine Display Processor driver is used on Renesas R-Car SoCs
only.  Since commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce
ARCH_RENESAS") is ARCH_RENESAS a more appropriate platform dependency
than the legacy ARCH_SHMOBILE, hence use the former.

This will allow to drop ARCH_SHMOBILE on ARM and ARM64 in the near
future.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/media/platform/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index f9235e8f8e962d2e..7ad4725f9d1f9627 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -396,7 +396,7 @@ config VIDEO_SH_VEU
 config VIDEO_RENESAS_FDP1
 	tristate "Renesas Fine Display Processor"
 	depends on VIDEO_DEV && VIDEO_V4L2 && HAS_DMA
-	depends on ARCH_SHMOBILE || COMPILE_TEST
+	depends on ARCH_RENESAS || COMPILE_TEST
 	depends on (!ARCH_RENESAS && !VIDEO_RENESAS_FCP) || VIDEO_RENESAS_FCP
 	select VIDEOBUF2_DMA_CONTIG
 	select V4L2_MEM2MEM_DEV
-- 
2.7.4

^ permalink raw reply related

* [PATCH 4/8] sh_eth: Change platform check to CONFIG_ARCH_RENESAS
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: devel, alsa-devel, Geert Uytterhoeven, netdev, linux-kernel,
	linux-renesas-soc, dmaengine, linux-arm-kernel, linux-media
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

Since commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS")
is CONFIG_ARCH_RENESAS a more appropriate platform check than the legacy
CONFIG_ARCH_SHMOBILE, hence use the former.

Renesas SuperH SH-Mobile SoCs are still covered by the CONFIG_CPU_SH4
check.

This will allow to drop ARCH_SHMOBILE on ARM and ARM64 in the near
future.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/net/ethernet/renesas/sh_eth.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
index a5b792ce2ae7d046..1bf930d4a1e52c18 100644
--- a/drivers/net/ethernet/renesas/sh_eth.h
+++ b/drivers/net/ethernet/renesas/sh_eth.h
@@ -163,7 +163,7 @@ enum {
 };
 
 /* Driver's parameters */
-#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_SHMOBILE)
+#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_RENESAS)
 #define SH_ETH_RX_ALIGN		32
 #else
 #define SH_ETH_RX_ALIGN		2
-- 
2.7.4

^ permalink raw reply related

* [PATCH 5/8] staging: emxx_udc: Change platform dependency to ARCH_RENESAS
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: devel, alsa-devel, Geert Uytterhoeven, netdev, linux-kernel,
	linux-renesas-soc, dmaengine, linux-arm-kernel, linux-media
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

Emma Mobile is a Renesas ARM SoC.  Since commit 9b5ba0df4ea4f940 ("ARM:
shmobile: Introduce ARCH_RENESAS") is ARCH_RENESAS a more appropriate
platform dependency than the legacy ARCH_SHMOBILE, hence use the
former.

This will allow to drop ARCH_SHMOBILE on ARM in the near future.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/staging/emxx_udc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/emxx_udc/Kconfig b/drivers/staging/emxx_udc/Kconfig
index d7577096fb25ae7a..e50e722183648c55 100644
--- a/drivers/staging/emxx_udc/Kconfig
+++ b/drivers/staging/emxx_udc/Kconfig
@@ -1,6 +1,6 @@
 config USB_EMXX
 	tristate "EMXX USB Function Device Controller"
- 	depends on USB_GADGET && (ARCH_SHMOBILE || (ARM && COMPILE_TEST))
+	depends on USB_GADGET && (ARCH_RENESAS || (ARM && COMPILE_TEST))
 	help
 	   The Emma Mobile series of SoCs from Renesas Electronics and
 	   former NEC Electronics include USB Function hardware.
-- 
2.7.4

^ permalink raw reply related

* [PATCH 6/8] ASoC: sh: Update menu title and platform dependency
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: linux-renesas-soc, linux-arm-kernel, dmaengine, linux-media,
	netdev, devel, alsa-devel, linux-kernel, Geert Uytterhoeven
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

Change the menu title to refer to "Renesas SoCs" instead of "SuperH", as
both SuperH and ARM SoCs are supported.

Since commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS")
is ARCH_RENESAS a more appropriate platform dependency for Renesas ARM
SoCs than the legacy ARCH_SHMOBILE, hence use the former.
Renesas SuperH SH-Mobile SoCs are still covered by the SUPERH
dependency.

This will allow to drop ARCH_SHMOBILE on ARM and ARM64 in the near
future.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 sound/soc/sh/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/sh/Kconfig b/sound/soc/sh/Kconfig
index 1aa5cd77ca24a06f..c1b7fb91e3063f2b 100644
--- a/sound/soc/sh/Kconfig
+++ b/sound/soc/sh/Kconfig
@@ -1,5 +1,5 @@
-menu "SoC Audio support for SuperH"
-	depends on SUPERH || ARCH_SHMOBILE || COMPILE_TEST
+menu "SoC Audio support for Renesas SoCs"
+	depends on SUPERH || ARCH_RENESAS || COMPILE_TEST
 
 config SND_SOC_PCM_SH7760
 	tristate "SoC Audio support for Renesas SH7760"
-- 
2.7.4

^ permalink raw reply related

* [PATCH/RFC 7/8] ARM: shmobile: Remove the ARCH_SHMOBILE Kconfig symbol
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: devel, alsa-devel, Geert Uytterhoeven, netdev, linux-kernel,
	linux-renesas-soc, dmaengine, linux-arm-kernel, linux-media
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

All drivers for Renesas ARM SoCs have gained proper ARCH_RENESAS
platform dependencies.  Hence finish the conversion from ARCH_SHMOBILE
to ARCH_RENESAS for Renesas 32-bit ARM SoCs, as started by commit
9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS").

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
This depends on the previous patches in this series, hence the RFC.

JFTR, after this, the following symbols for drivers supporting only
Renesas SuperH "SH-Mobile" SoCs can no longer be selected:
  - CONFIG_KEYBOARD_SH_KEYSC,
  - CONFIG_VIDEO_SH_VOU,
  - CONFIG_VIDEO_SH_MOBILE_CEU,
  - CONFIG_DRM_SHMOBILE[*],
  - CONFIG_FB_SH_MOBILE_MERAM.
(changes for a shmobile_defconfig .config)

[*] CONFIG_DRM_SHMOBILE has a dependency on ARM, but it was never wired
    up.  From the use of sh_mobile_meram, I guess it was meant for
    SH-Mobile AP4 on Mackerel or AP4EVB, which are long gone.
    So the only remaining upstream platforms that could make use of it
    are legacy SuperH SH-Mobile SoCs?
---
 arch/arm/mach-shmobile/Kconfig | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/arm/mach-shmobile/Kconfig b/arch/arm/mach-shmobile/Kconfig
index 96672da02f5f17b9..d892c5b52b6f5627 100644
--- a/arch/arm/mach-shmobile/Kconfig
+++ b/arch/arm/mach-shmobile/Kconfig
@@ -1,6 +1,3 @@
-config ARCH_SHMOBILE
-	bool
-
 config PM_RMOBILE
 	bool
 	select PM
@@ -30,7 +27,6 @@ menuconfig ARCH_RENESAS
 	bool "Renesas ARM SoCs"
 	depends on ARCH_MULTI_V7 && MMU
 	select ARCH_DMA_ADDR_T_64BIT if ARM_LPAE
-	select ARCH_SHMOBILE
 	select ARM_GIC
 	select GPIOLIB
 	select HAVE_ARM_SCU if SMP
-- 
2.7.4

^ permalink raw reply related

* [PATCH/RFC 8/8] arm64: renesas: Remove the ARCH_SHMOBILE Kconfig symbol
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: devel, alsa-devel, Geert Uytterhoeven, netdev, linux-kernel,
	linux-renesas-soc, dmaengine, linux-arm-kernel, linux-media
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

The Kconfig symbol for Renesas 64-bit ARM SoCs has always been
ARCH_RENESAS, with ARCH_SHMOBILE being selected to reuse drivers shared
with Renesas 32-bit ARM and/or Renesas SuperH SH-Mobile SoCs.

Commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS")
started the conversion from ARCH_SHMOBILE to ARCH_RENESAS for Renesas
32-bit SoCs.  Now all drivers for Renesas ARM SoCs have gained proper
ARCH_RENESAS platform dependencies, there is no longer a need to select
ARCH_SHMOBILE.

With ARCH_SHMOBILE gone, move the ARCH_RENESAS section up, to restore
alphabetical sort order.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
This depends on the driver patches in this series, hence the RFC.

JFTR, after this, the following symbols for drivers supporting only
Renesas SuperH "SH-Mobile" SoCs can no longer be selected:
  - CONFIG_KEYBOARD_SH_KEYSC,
  - CONFIG_VIDEO_SH_VOU,
  - CONFIG_VIDEO_RENESAS_CEU,
  - CONFIG_FB_SH_MOBILE_MERAM.
(changes for a renesas_defconfig .config)
---
 arch/arm64/Kconfig.platforms | 42 +++++++++++++++++++-----------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index d5aeac351fc3a776..49d8ed1ab84766dd 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -145,31 +145,8 @@ config ARCH_REALTEK
 	  This enables support for the ARMv8 based Realtek chipsets,
 	  like the RTD1295.
 
-config ARCH_ROCKCHIP
-	bool "Rockchip Platforms"
-	select ARCH_HAS_RESET_CONTROLLER
-	select GPIOLIB
-	select PINCTRL
-	select PINCTRL_ROCKCHIP
-	select ROCKCHIP_TIMER
-	help
-	  This enables support for the ARMv8 based Rockchip chipsets,
-	  like the RK3368.
-
-config ARCH_SEATTLE
-	bool "AMD Seattle SoC Family"
-	help
-	  This enables support for AMD Seattle SOC Family
-
-config ARCH_SHMOBILE
-	bool
-
-config ARCH_SYNQUACER
-	bool "Socionext SynQuacer SoC Family"
-
 config ARCH_RENESAS
 	bool "Renesas SoC Platforms"
-	select ARCH_SHMOBILE
 	select PINCTRL
 	select PM
 	select PM_GENERIC_DOMAINS
@@ -220,6 +197,25 @@ config ARCH_R8A77995
 	help
 	  This enables support for the Renesas R-Car D3 SoC.
 
+config ARCH_ROCKCHIP
+	bool "Rockchip Platforms"
+	select ARCH_HAS_RESET_CONTROLLER
+	select GPIOLIB
+	select PINCTRL
+	select PINCTRL_ROCKCHIP
+	select ROCKCHIP_TIMER
+	help
+	  This enables support for the ARMv8 based Rockchip chipsets,
+	  like the RK3368.
+
+config ARCH_SEATTLE
+	bool "AMD Seattle SoC Family"
+	help
+	  This enables support for AMD Seattle SOC Family
+
+config ARCH_SYNQUACER
+	bool "Socionext SynQuacer SoC Family"
+
 config ARCH_STRATIX10
 	bool "Altera's Stratix 10 SoCFPGA Family"
 	help
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel
From: Eric W. Biederman @ 2018-04-20 13:36 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: Dave Young, netdev@vger.kernel.org, kexec@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Indranil Choudhury, Nirranjan Kirubaharan,
	stephen@networkplumber.org, Ganesh GR, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, davem@davemloft.net,
	viro@zeniv.linux.org.uk
In-Reply-To: <20180420130632.GA32304@chelsio.com>

Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:

> On Thursday, April 04/19/18, 2018 at 20:23:37 +0530, Eric W. Biederman wrote:
>> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
>> 
>> > On Thursday, April 04/19/18, 2018 at 07:10:30 +0530, Dave Young wrote:
>> >> On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
>> >> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
>> >> > > Hi Rahul,
>> >> > > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
>> >> > > > On production servers running variety of workloads over time, kernel
>> >> > > > panic can happen sporadically after days or even months. It is
>> >> > > > important to collect as much debug logs as possible to root cause
>> >> > > > and fix the problem, that may not be easy to reproduce. Snapshot of
>> >> > > > underlying hardware/firmware state (like register dump, firmware
>> >> > > > logs, adapter memory, etc.), at the time of kernel panic will be very
>> >> > > > helpful while debugging the culprit device driver.
>> >> > > > 
>> >> > > > This series of patches add new generic framework that enable device
>> >> > > > drivers to collect device specific snapshot of the hardware/firmware
>> >> > > > state of the underlying device in the crash recovery kernel. In crash
>> >> > > > recovery kernel, the collected logs are added as elf notes to
>> >> > > > /proc/vmcore, which is copied by user space scripts for post-analysis.
>> >> > > > 
>> >> > > > The sequence of actions done by device drivers to append their device
>> >> > > > specific hardware/firmware logs to /proc/vmcore are as follows:
>> >> > > > 
>> >> > > > 1. During probe (before hardware is initialized), device drivers
>> >> > > > register to the vmcore module (via vmcore_add_device_dump()), with
>> >> > > > callback function, along with buffer size and log name needed for
>> >> > > > firmware/hardware log collection.
>> >> > > 
>> >> > > I assumed the elf notes info should be prepared while kexec_[file_]load
>> >> > > phase. But I did not read the old comment, not sure if it has been discussed
>> >> > > or not.
>> >> > > 
>> >> > 
>> >> > We must not collect dumps in crashing kernel. Adding more things in
>> >> > crash dump path risks not collecting vmcore at all. Eric had
>> >> > discussed this in more detail at:
>> >> > 
>> >> > https://lkml.org/lkml/2018/3/24/319
>> >> > 
>> >> > We are safe to collect dumps in the second kernel. Each device dump
>> >> > will be exported as an elf note in /proc/vmcore.
>> >> 
>> >> I understand that we should avoid adding anything in crash path.  And I also
>> >> agree to collect device dump in second kernel.  I just assumed device
>> >> dump use some memory area to store the debug info and the memory
>> >> is persistent so that this can be done in 2 steps, first register the
>> >> address in elf header in kexec_load, then collect the dump in 2nd
>> >> kernel.  But it seems the driver is doing some other logic to collect
>> >> the info instead of just that simple like I thought. 
>> >> 
>> >
>> > It seems simpler, but I'm concerned with waste of memory area, if
>> > there are no device dumps being collected in second kernel. In
>> > approach proposed in these series, we dynamically allocate memory
>> > for the device dumps from second kernel's available memory.
>> 
>> Don't count that kernel having more than about 128MiB.
>> 
>
> If large dump is expected, Administrator can increase the memory
> allocated to the second kernel (using crashkernel boot param), to
> ensure device dumps get collected.

Except 128MiB is already a already a huge amount to reserve.  I
typically have run crash dumps with 16MiB of memory and thought it was
overkill.  Looking below 32MiB seems a bit high but it is small enough
that it is still doable.  I am baffled at how 2GiB can be guaranteed to fit
in 32MiB (sparse register space?) but if it works reliably.

>> For that reason if for no other it would be nice if it was possible to
>> have the driver to not initialize the device and just stand there
>> handing out the data a piece at a time as it is read from /proc/vmcore.
>> 
>
> Since cxgb4 is a network driver, it can be used to transfer the dumps
> over the network. So we must ensure the dumps get collected and
> stored, before device gets initialized to transfer dumps over
> the network.

Good point.  For some reason I was thinking it was an infiniband and not
an 10GiB ethernet device.

>> The 2GiB number I read earlier concerns me for working in a limited
>> environment.
>> 
>
> All dumps, including the 2GB on-chip memory dump, is compressed by
> the cxgb4 driver as they are collected. The overall compressed dump
> comes out at max 32 MB.
>
>> It might even make sense to separate this into a completely separate
>> module (depended upon the main driver if it makes sense to share
>> the functionality) so that people performing crash dumps would not
>> hesitate to include the code in their initramfs images.
>> 
>> I can see splitting a device up into a portion only to be used in case
>> of a crash dump and a normal portion like we do for main memory but I
>> doubt that makes sense in practice.
>> 
>
> This is not required, especially in case of network drivers, which
> must collect underlying device dump and initialize the device to
> transfer dumps over the network.

I have a practical concern.  What happens if the previous kernel left
the device in such a bad stat the driver can not successfully initialize
it.

Does failure to initialize cxgb4 after a crash now mean that you can not
capture the crash dump to see the crazy state the device was in?

Typically the initramfs for a crash dump does not include unnecessary
drivers so that hardware in states the drivers can't handle won't
prevent taking a crash dump.

I understand the issue if you are taking a dump over your 10GiB ethernet
it is a moot point.  But if you are writing your dump to disk, or
writing it over a management gigabit ethernet then it is still an issue.

Is there a decoupling so that a totally b0rked device can't prevent
taking it's own dump?

Eric

^ permalink raw reply

* Re: [PATCH 0/8] arm: renesas: Change platform dependency to ARCH_RENESAS
From: Arnd Bergmann @ 2018-04-20 13:40 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Kuninori Morimoto, Laurent Pinchart, Linux-Renesas, Linux
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

On Fri, Apr 20, 2018 at 3:28 PM, Geert Uytterhoeven
<geert+renesas@glider.be> wrote:
>         Hi all,
>
> Commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS")
> started the conversion from ARCH_SHMOBILE to ARCH_RENESAS for Renesas
> ARM SoCs.  This patch series completes the conversion, by:
>   1. Updating dependencies for drivers that weren't converted yet,
>   2. Removing the ARCH_SHMOBILE Kconfig symbols on ARM and ARM64.
>
> The first 6 patches can be applied independently by subsystem
> maintainers.
> The last two patches depend on the first 6 patches, and are thus marked
> RFC.

This all looks fine to me.

Acked-by: Arnd Bergmann <arnd@arndb.de>

      Arnd

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Matthew Wilcox @ 2018-04-20 13:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mikulas Patocka, David Miller, Andrew Morton, linux-mm,
	eric.dumazet, edumazet, bhutchings, netdev, linux-kernel, mst,
	jasowang, virtualization, dm-devel, Vlastimil Babka
In-Reply-To: <20180420130852.GC16083@dhcp22.suse.cz>

On Fri, Apr 20, 2018 at 03:08:52PM +0200, Michal Hocko wrote:
> > In order to detect these bugs reliably I submit this patch that changes
> > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> 
> No way. This is just wrong! First of all, you will explode most likely
> on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
> enabled quite often.

I think it'll still suit Mikulas' debugging needs if we always use
vmalloc for sizes above PAGE_SIZE?

^ permalink raw reply

* [PATCH] iptables: Per-net ns lock
From: Kirill Tkhai @ 2018-04-20 13:42 UTC (permalink / raw)
  To: fw, netdev, pablo, rstoyanov1, ptikhomirov, avagin, ktkhai

Containers want to restore their own net ns,
while they may have no their own mnt ns.
This case they share host's /run/xtables.lock
file, but they may not have permission to open
it.

Patch makes /run/xtables.lock to be per-namespace,
i.e., to refer to the caller task's net ns.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 iptables/xshared.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/iptables/xshared.c b/iptables/xshared.c
index 06db72d4..b6dbe4e7 100644
--- a/iptables/xshared.c
+++ b/iptables/xshared.c
@@ -254,7 +254,12 @@ static int xtables_lock(int wait, struct timeval *wait_interval)
 	time_left.tv_sec = wait;
 	time_left.tv_usec = 0;
 
-	fd = open(XT_LOCK_NAME, O_CREAT, 0600);
+	if (symlink("/proc/self/ns/net", XT_LOCK_NAME) != 0 &&
+	    errno != EEXIST) {
+		fprintf(stderr, "Fatal: can't create lock file\n");
+		return XT_LOCK_FAILED;
+	}
+	fd = open(XT_LOCK_NAME, O_RDONLY);
 	if (fd < 0) {
 		fprintf(stderr, "Fatal: can't open lock file %s: %s\n",
 			XT_LOCK_NAME, strerror(errno));

^ permalink raw reply related

* Re: [PATCH net-next 2/2] udp: implement and use per cpu rx skbs cache
From: Jesper Dangaard Brouer @ 2018-04-20 13:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: brouer, Paolo Abeni, netdev, David S. Miller, Tariq Toukan
In-Reply-To: <0e3abeb5-8081-f9ea-4de6-cc1a7edfc5a5@gmail.com>


On Thu, 19 Apr 2018 06:47:10 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On 04/19/2018 12:40 AM, Paolo Abeni wrote:
> > On Wed, 2018-04-18 at 12:21 -0700, Eric Dumazet wrote:  
> >> On 04/18/2018 10:15 AM, Paolo Abeni wrote:
[...]
> > 
> > Any suggestions for better results are more than welcome!  
> 
> Yes, remote skb freeing. I mentioned this idea to Jesper and Tariq in
> Seoul (netdev conference). Not tied to UDP, but a generic solution.

Yes, I remember.  I think... was it the idea, where you basically
wanted to queue back SKBs to the CPU that allocated them, right?

Freeing an SKB on the same CPU that allocated it, have multiple
advantages. (1) the SLUB allocator can use a non-atomic
"cpu-local" (double)cmpxchg. (2) the 4 cache-lines memset cleared of
the SKB stay local.  (3) the atomic SKB refcnt/users stay local.

We just have to avoid that queue back SKB's mechanism, doesn't cost
more than the operations we expect to save.  Bulk transfer is an
obvious approach.  For storing SKBs until they are returned, we already
have a fast mechanism see napi_consume_skb calling _kfree_skb_defer,
which SLUB/SLAB-bulk free to amortize cost (1).

I guess, the missing information is that we don't know what CPU the SKB
were created on...

Where to store this CPU info?

(a) In struct sk_buff, in a cache-line that is already read on remote
CPU in UDP code?

(b) In struct page, as SLUB alloc hand-out objects/SKBs on a per page
basis, we could have SLUB store a hint about the CPU it was allocated
on, and bet on returning to that CPU ? (might be bad to read the
struct-page cache-line)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Michal Hocko @ 2018-04-20 13:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mikulas Patocka, David Miller, Andrew Morton, linux-mm,
	eric.dumazet, edumazet, bhutchings, netdev, linux-kernel, mst,
	jasowang, virtualization, dm-devel, Vlastimil Babka
In-Reply-To: <20180420134136.GD10788@bombadil.infradead.org>

On Fri 20-04-18 06:41:36, Matthew Wilcox wrote:
> On Fri, Apr 20, 2018 at 03:08:52PM +0200, Michal Hocko wrote:
> > > In order to detect these bugs reliably I submit this patch that changes
> > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> > 
> > No way. This is just wrong! First of all, you will explode most likely
> > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
> > enabled quite often.
> 
> I think it'll still suit Mikulas' debugging needs if we always use
> vmalloc for sizes above PAGE_SIZE?

Even if that was the case then this doesn't sounds like CONFIG_DEBUG_VM
material. We do not want a completely different behavior when the config
is enabled. If we really need some better fallback testing coverage
then the fault injection, as suggested by Vlastimil, sounds much more
reasonable to me

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH bpf-next 1/5] samples/bpf: Fix typo in comment
From: Jesper Dangaard Brouer @ 2018-04-20 13:52 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Leo Yan, Alexei Starovoitov, Daniel Borkmann, netdev,
	linux-kernel, brouer
In-Reply-To: <20180420132116.uswpqniteogfu4zz@holly.lan>

On Fri, 20 Apr 2018 14:21:16 +0100
Daniel Thompson <daniel.thompson@linaro.org> wrote:

> On Fri, Apr 20, 2018 at 02:10:04PM +0200, Jesper Dangaard Brouer wrote:
> > 
> > On Thu, 19 Apr 2018 09:34:02 +0800 Leo Yan <leo.yan@linaro.org> wrote:
> >   
> > > Fix typo by replacing 'iif' with 'if'.
> > > 
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > >  samples/bpf/bpf_load.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> > > index bebe418..28e4678 100644
> > > --- a/samples/bpf/bpf_load.c
> > > +++ b/samples/bpf/bpf_load.c
> > > @@ -393,7 +393,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
> > >  			continue;
> > >  		if (sym[nr_maps].st_shndx != maps_shndx)
> > >  			continue;
> > > -		/* Only increment iif maps section */
> > > +		/* Only increment if maps section */
> > >  		nr_maps++;
> > >  	}  
> > 
> > This was actually not a typo from my side.
> > 
> > With 'iif' I mean 'if and only if' ... but it doesn't matter much.  
> 
> I think 'if and only if' is more commonly abbreviated 'iff' isn't it?

Ah, yes![1]  -- then it *is* actually a typo! - LOL

I'm fine with changing this to "if" :-)


[1] https://en.wikipedia.org/wiki/If_and_only_if

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH iproute2 1/1] tc: return on invalid smac or dmac in ife action
From: Roman Mashak @ 2018-04-20 13:52 UTC (permalink / raw)
  To: stephen; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak

Return on invalid smac/dmac and use invarg consistently for invalid
arguments report.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 tc/m_ife.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/tc/m_ife.c b/tc/m_ife.c
index d7e61703f666..ed0913a379aa 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -94,9 +94,7 @@ static int parse_ife(struct action_util *a, int *argc_p, char ***argv_p,
 			} else if (matches(*argv, "tcindex") == 0) {
 				ife_tcindex = IFE_META_TCINDEX;
 			} else {
-				fprintf(stderr, "Illegal meta define <%s>\n",
-					*argv);
-				return -1;
+				invarg("Illegal meta define", *argv);
 			}
 		} else if (matches(*argv, "use") == 0) {
 			NEXT_ARG();
@@ -116,9 +114,7 @@ static int parse_ife(struct action_util *a, int *argc_p, char ***argv_p,
 					invarg("ife tcindex val is invalid",
 					       *argv);
 			} else {
-				fprintf(stderr, "Illegal meta use type <%s>\n",
-					*argv);
-				return -1;
+				invarg("Illegal meta use type", *argv);
 			}
 		} else if (matches(*argv, "type") == 0) {
 			NEXT_ARG();
@@ -132,8 +128,7 @@ static int parse_ife(struct action_util *a, int *argc_p, char ***argv_p,
 			if (sscanf(daddr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
 				   dbuf, dbuf + 1, dbuf + 2,
 				   dbuf + 3, dbuf + 4, dbuf + 5) != 6) {
-				fprintf(stderr, "Invalid mac address %s\n",
-					daddr);
+				invarg("Invalid mac address", *argv);
 			}
 			fprintf(stderr, "dst MAC address <%s>\n", daddr);
 
@@ -143,8 +138,7 @@ static int parse_ife(struct action_util *a, int *argc_p, char ***argv_p,
 			if (sscanf(saddr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
 				   sbuf, sbuf + 1, sbuf + 2,
 				   sbuf + 3, sbuf + 4, sbuf + 5) != 6) {
-				fprintf(stderr, "Invalid mac address %s\n",
-					saddr);
+				invarg("Invalid mac address", *argv);
 			}
 			fprintf(stderr, "src MAC address <%s>\n", saddr);
 		} else if (matches(*argv, "help") == 0) {
-- 
2.7.4

^ permalink raw reply related

* Re: [RFC] vhost: introduce mdev based hardware vhost backend
From: Michael S. Tsirkin @ 2018-04-20 13:52 UTC (permalink / raw)
  To: Liang, Cunming
  Cc: Bie, Tiwei, Jason Wang, alex.williamson@redhat.com,
	ddutile@redhat.com, Duyck, Alexander H,
	virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, Daly, Dan, Wang, Zhihong, Tan, Jianfeng,
	Wang, Xiao W, Tian, Kevin
In-Reply-To: <D0158A423229094DA7ABF71CF2FA0DA34E9511D5@SHSMSX104.ccr.corp.intel.com>

On Fri, Apr 20, 2018 at 03:50:41AM +0000, Liang, Cunming wrote:
> 
> 
> > -----Original Message-----
> > From: Bie, Tiwei
> > Sent: Friday, April 20, 2018 11:28 AM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Jason Wang <jasowang@redhat.com>; alex.williamson@redhat.com;
> > ddutile@redhat.com; Duyck, Alexander H <alexander.h.duyck@intel.com>;
> > virtio-dev@lists.oasis-open.org; linux-kernel@vger.kernel.org;
> > kvm@vger.kernel.org; virtualization@lists.linux-foundation.org;
> > netdev@vger.kernel.org; Daly, Dan <dan.daly@intel.com>; Liang, Cunming
> > <cunming.liang@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Tan,
> > Jianfeng <jianfeng.tan@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>;
> > Tian, Kevin <kevin.tian@intel.com>
> > Subject: Re: [RFC] vhost: introduce mdev based hardware vhost backend
> > 
> > On Thu, Apr 19, 2018 at 09:40:23PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Apr 10, 2018 at 03:25:45PM +0800, Jason Wang wrote:
> > > > > > > One problem is that, different virtio ring compatible devices
> > > > > > > may have different device interfaces. That is to say, we will
> > > > > > > need different drivers in QEMU. It could be troublesome. And
> > > > > > > that's what this patch trying to fix. The idea behind this
> > > > > > > patch is very simple: mdev is a standard way to emulate device
> > > > > > > in kernel.
> > > > > > So you just move the abstraction layer from qemu to kernel, and
> > > > > > you still need different drivers in kernel for different device
> > > > > > interfaces of accelerators. This looks even more complex than
> > > > > > leaving it in qemu. As you said, another idea is to implement
> > > > > > userspace vhost backend for accelerators which seems easier and
> > > > > > could co-work with other parts of qemu without inventing new type of
> > messages.
> > > > > I'm not quite sure. Do you think it's acceptable to add various
> > > > > vendor specific hardware drivers in QEMU?
> > > > >
> > > >
> > > > I don't object but we need to figure out the advantages of doing it
> > > > in qemu too.
> > > >
> > > > Thanks
> > >
> > > To be frank kernel is exactly where device drivers belong.  DPDK did
> > > move them to userspace but that's merely a requirement for data path.
> > > *If* you can have them in kernel that is best:
> > > - update kernel and there's no need to rebuild userspace
> > > - apps can be written in any language no need to maintain multiple
> > >   libraries or add wrappers
> > > - security concerns are much smaller (ok people are trying to
> > >   raise the bar with IOMMUs and such, but it's already pretty
> > >   good even without)
> > >
> > > The biggest issue is that you let userspace poke at the device which
> > > is also allowed by the IOMMU to poke at kernel memory (needed for
> > > kernel driver to work).
> > 
> > I think the device won't and shouldn't be allowed to poke at kernel memory. Its
> > kernel driver needs some kernel memory to work. But the device doesn't have
> > the access to them. Instead, the device only has the access to:
> > 
> > (1) the entire memory of the VM (if vIOMMU isn't used) or
> > (2) the memory belongs to the guest virtio device (if
> >     vIOMMU is being used).
> > 
> > Below is the reason:
> > 
> > For the first case, we should program the IOMMU for the hardware device based
> > on the info in the memory table which is the entire memory of the VM.
> > 
> > For the second case, we should program the IOMMU for the hardware device
> > based on the info in the shadow page table of the vIOMMU.
> > 
> > So the memory can be accessed by the device is limited, it should be safe
> > especially for the second case.
> > 
> > My concern is that, in this RFC, we don't program the IOMMU for the mdev
> > device in the userspace via the VFIO API directly. Instead, we pass the memory
> > table to the kernel driver via the mdev device (BAR0) and ask the driver to do the
> > IOMMU programming. Someone may don't like it. The main reason why we don't
> > program IOMMU via VFIO API in userspace directly is that, currently IOMMU
> > drivers don't support mdev bus.
> > 
> > >
> > > Yes, maybe if device is not buggy it's all fine, but it's better if we
> > > do not have to trust the device otherwise the security picture becomes
> > > more murky.
> > >
> > > I suggested attaching a PASID to (some) queues - see my old post
> > > "using PASIDs to enable a safe variant of direct ring access".
> > 
> Ideally we can have a device binding with normal driver in host, meanwhile support to allocate a few queues attaching with PASID on-demand. By vhost mdev transport channel, the data path ability of queues(as a device) can expose to qemu vhost adaptor as a vDPA instance. Then we can avoid VF number limitation, providing vhost data path acceleration in a small granularity.

Exactly my point.

> > It's pretty cool. We also have some similar ideas.
> > Cunming will talk more about this.
> > 
> > Best regards,
> > Tiwei Bie
> > 
> > >
> > > Then using IOMMU with VFIO to limit access through queue to corrent
> > > ranges of memory.
> > >
> > >
> > > --
> > > MST

^ permalink raw reply

* Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
From: Christian Brauner @ 2018-04-20 13:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: davem, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <20180418215246.GA24000@gmail.com>

On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > Christian Brauner <christian.brauner@ubuntu.com> writes:
> > 
> > > Now that it's possible to have a different set of uevents in different
> > > network namespaces, per-network namespace uevent sequence numbers are
> > > introduced. This increases performance as locking is now restricted to the
> > > network namespace affected by the uevent rather than locking
> > > everything.
> > 
> > Numbers please.  I personally expect that the netlink mc_list issues
> > will swamp any benefit you get from this.
> 
> I wouldn't see how this would be the case. The gist of this is:
> Everytime you send a uevent into a network namespace *not* owned by
> init_user_ns you currently *have* to take mutex_lock(uevent_sock_list)
> effectively blocking the host from processing uevents even though
> - the uevent you're receiving might be totally different from the
>   uevent that you're sending
> - the uevent socket of the non-init_user_ns owned network namespace
>   isn't even recorded in the list.
> 
> The other argument is that we now have properly isolated network
> namespaces wrt to uevents such that each netns can have its own set of
> uevents. This can either happen by a sufficiently privileged userspace
> process sending it uevents that are only dedicated to that specific
> netns. Or - and this *has been true for a long time* - because network
> devices are *properly namespaced*. Meaning a uevent for that network
> device is *tied to a network namespace*. For both cases the uevent
> sequence numbering will be absolutely misleading. For example, whenever
> you create e.g. a new veth device in a new network namespace it
> shouldn't be accounted against the initial network namespace but *only*
> against the network namespace that has that device added to it.

Eric, I did the testing. Here's what I did:

I compiled two 4.17-rc1 Kernels:
- one with per netns uevent seqnums with decoupled locking
- one without per netns uevent seqnums with decoupled locking

# Testcase 1:
Only Injecting Uevents into network namespaces not owned by the initial user
namespace.
- created 1000 new user namespace + network namespace pairs
- opened a uevent listener in each of those namespace pairs
- injected uevents into each of those network namespaces 10,000 times meaning
  10,000,000 (10 million) uevents were injected. (The high number of
  uevent injections should get rid of a lot of jitter.)
- Calculated the mean transaction time.
- *without* uevent sequence number namespacing:
  67 μs
- *with* uevent sequence number namespacing:
  55 μs
- makes a difference of 12 μs

# Testcase 2:
Injecting Uevents into network namespaces not owned by the initial user
namespace and network namespaces owned by the initial user namespace.
- created 500 new user namespace + network namespace pairs
- created 500 new network namespace pairs
- opened a uevent listener in each of those namespace pairs
- injected uevents into each of those network namespaces 10,000 times meaning
  10,000,000 (10 million) uevents were injected. (The high number of
  uevent injections should get rid of a lot of jitter.)
- Calculated the mean transaction time.
- *without* uevent sequence number namespacing:
  572 μs
- *with* uevent sequence number namespacing:
  514 μs
- makes a difference of 58 μs

So there's performance gain. The third case would be to create a bunch
of hanging processes that send SIGSTOP to themselves but do not actually
open a uevent socket in their respective namespaces and then inject
uevents into them. I expect there to be an even more performance
benefits since the rtnl_table_lock() isn't hit in this case because
there are no listeners.

Christian

^ permalink raw reply

* [PATCH net] tcp: don't read out-of-bounds opsize
From: Jann Horn @ 2018-04-20 13:57 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, netdev, linux-kernel, jannh

The old code reads the "opsize" variable from out-of-bounds memory (first
byte behind the segment) if a broken TCP segment ends directly after an
opcode that is neither EOL nor NOP.

The result of the read isn't used for anything, so the worst thing that
could theoretically happen is a pagefault; and since the physmap is usually
mostly contiguous, even that seems pretty unlikely.

The following C reproducer triggers the uninitialized read - however, you
can't actually see anything happen unless you put something like a
pr_warn() in tcp_parse_md5sig_option() to print the opsize.

====================================
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <stdlib.h>
#include <errno.h>
#include <stdarg.h>
#include <net/if.h>
#include <linux/if.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/in.h>
#include <linux/if_tun.h>
#include <err.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <assert.h>

void systemf(const char *command, ...) {
  char *full_command;
  va_list ap;
  va_start(ap, command);
  if (vasprintf(&full_command, command, ap) == -1)
    err(1, "vasprintf");
  va_end(ap);
  printf("systemf: <<<%s>>>\n", full_command);
  system(full_command);
}

char *devname;

int tun_alloc(char *name) {
  int fd = open("/dev/net/tun", O_RDWR);
  if (fd == -1)
    err(1, "open tun dev");
  static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI };
  strcpy(req.ifr_name, name);
  if (ioctl(fd, TUNSETIFF, &req))
    err(1, "TUNSETIFF");
  devname = req.ifr_name;
  printf("device name: %s\n", devname);
  return fd;
}

#define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))

void sum_accumulate(unsigned int *sum, void *data, int len) {
  assert((len&2)==0);
  for (int i=0; i<len/2; i++) {
    *sum += ntohs(((unsigned short *)data)[i]);
  }
}

unsigned short sum_final(unsigned int sum) {
  sum = (sum >> 16) + (sum & 0xffff);
  sum = (sum >> 16) + (sum & 0xffff);
  return htons(~sum);
}

void fix_ip_sum(struct iphdr *ip) {
  unsigned int sum = 0;
  sum_accumulate(&sum, ip, sizeof(*ip));
  ip->check = sum_final(sum);
}

void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) {
  unsigned int sum = 0;
  struct {
    unsigned int saddr;
    unsigned int daddr;
    unsigned char pad;
    unsigned char proto_num;
    unsigned short tcp_len;
  } fakehdr = {
    .saddr = ip->saddr,
    .daddr = ip->daddr,
    .proto_num = ip->protocol,
    .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4)
  };
  sum_accumulate(&sum, &fakehdr, sizeof(fakehdr));
  sum_accumulate(&sum, tcp, tcp->doff*4);
  tcp->check = sum_final(sum);
}

int main(void) {
  int tun_fd = tun_alloc("inject_dev%d");
  systemf("ip link set %s up", devname);
  systemf("ip addr add 192.168.42.1/24 dev %s", devname);

  struct {
    struct iphdr ip;
    struct tcphdr tcp;
    unsigned char tcp_opts[20];
  } __attribute__((packed)) syn_packet = {
    .ip = {
      .ihl = sizeof(struct iphdr)/4,
      .version = 4,
      .tot_len = htons(sizeof(syn_packet)),
      .ttl = 30,
      .protocol = IPPROTO_TCP,
      /* FIXUP check */
      .saddr = IPADDR(192,168,42,2),
      .daddr = IPADDR(192,168,42,1)
    },
    .tcp = {
      .source = htons(1),
      .dest = htons(1337),
      .seq = 0x12345678,
      .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4,
      .syn = 1,
      .window = htons(64),
      .check = 0 /*FIXUP*/
    },
    .tcp_opts = {
      /* INVALID: trailing MD5SIG opcode after NOPs */
      1, 1, 1, 1, 1,
      1, 1, 1, 1, 1,
      1, 1, 1, 1, 1,
      1, 1, 1, 1, 19
    }
  };
  fix_ip_sum(&syn_packet.ip);
  fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp);
  while (1) {
    int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet));
    if (write_res != sizeof(syn_packet))
      err(1, "packet write failed");
  }
}
====================================

Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Signed-off-by: Jann Horn <jannh@google.com>
---
 net/ipv4/tcp_input.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 367def6ddeda..e51c644484dc 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3868,11 +3868,8 @@ const u8 *tcp_parse_md5sig_option(const struct tcphdr *th)
 	int length = (th->doff << 2) - sizeof(*th);
 	const u8 *ptr = (const u8 *)(th + 1);
 
-	/* If the TCP option is too short, we can short cut */
-	if (length < TCPOLEN_MD5SIG)
-		return NULL;
-
-	while (length > 0) {
+	/* If not enough data remaining, we can short cut */
+	while (length >= TCPOLEN_MD5SIG) {
 		int opcode = *ptr++;
 		int opsize;
 
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH] [net] ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pkts
From: Ahmed Abdelsalam @ 2018-04-20 13:58 UTC (permalink / raw)
  To: davem, dlebrun, kuznet, yoshfuji, netdev, linux-kernel; +Cc: amsalam20

In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src()
in order to set the src addr of outer IPv6 header.

The net_device is required for set_tun_src(). However calling ip6_dst_idev()
on dst_entry in case of IPv4 traffic results on the following bug.

Using just dst->dev should fix this BUG.

[  196.242461] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  196.242975] PGD 800000010f076067 P4D 800000010f076067 PUD 10f060067 PMD 0
[  196.243329] Oops: 0000 [#1] SMP PTI
[  196.243468] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd input_leds glue_helper led_class pcspkr serio_raw mac_hid video autofs4 hid_generic usbhid hid e1000 i2c_piix4 ahci pata_acpi libahci
[  196.244362] CPU: 2 PID: 1089 Comm: ping Not tainted 4.16.0+ #1
[  196.244606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  196.244968] RIP: 0010:seg6_do_srh_encap+0x1ac/0x300
[  196.245236] RSP: 0018:ffffb2ce00b23a60 EFLAGS: 00010202
[  196.245464] RAX: 0000000000000000 RBX: ffff8c7f53eea300 RCX: 0000000000000000
[  196.245742] RDX: 0000f10000000000 RSI: ffff8c7f52085a6c RDI: ffff8c7f41166850
[  196.246018] RBP: ffffb2ce00b23aa8 R08: 00000000000261e0 R09: ffff8c7f41166800
[  196.246294] R10: ffffdce5040ac780 R11: ffff8c7f41166828 R12: ffff8c7f41166808
[  196.246570] R13: ffff8c7f52085a44 R14: ffffffffb73211c0 R15: ffff8c7e69e44200
[  196.246846] FS:  00007fc448789700(0000) GS:ffff8c7f59d00000(0000) knlGS:0000000000000000
[  196.247286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  196.247526] CR2: 0000000000000000 CR3: 000000010f05a000 CR4: 00000000000406e0
[  196.247804] Call Trace:
[  196.247972]  seg6_do_srh+0x15b/0x1c0
[  196.248156]  seg6_output+0x3c/0x220
[  196.248341]  ? prandom_u32+0x14/0x20
[  196.248526]  ? ip_idents_reserve+0x6c/0x80
[  196.248723]  ? __ip_select_ident+0x90/0x100
[  196.248923]  ? ip_append_data.part.50+0x6c/0xd0
[  196.249133]  lwtunnel_output+0x44/0x70
[  196.249328]  ip_send_skb+0x15/0x40
[  196.249515]  raw_sendmsg+0x8c3/0xac0
[  196.249701]  ? _copy_from_user+0x2e/0x60
[  196.249897]  ? rw_copy_check_uvector+0x53/0x110
[  196.250106]  ? _copy_from_user+0x2e/0x60
[  196.250299]  ? copy_msghdr_from_user+0xce/0x140
[  196.250508]  sock_sendmsg+0x36/0x40
[  196.250690]  ___sys_sendmsg+0x292/0x2a0
[  196.250881]  ? _cond_resched+0x15/0x30
[  196.251074]  ? copy_termios+0x1e/0x70
[  196.251261]  ? _copy_to_user+0x22/0x30
[  196.251575]  ? tty_mode_ioctl+0x1c3/0x4e0
[  196.251782]  ? _cond_resched+0x15/0x30
[  196.251972]  ? mutex_lock+0xe/0x30
[  196.252152]  ? vvar_fault+0xd2/0x110
[  196.252337]  ? __do_fault+0x1f/0xc0
[  196.252521]  ? __handle_mm_fault+0xc1f/0x12d0
[  196.252727]  ? __sys_sendmsg+0x63/0xa0
[  196.252919]  __sys_sendmsg+0x63/0xa0
[  196.253107]  do_syscall_64+0x72/0x200
[  196.253305]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  196.253530] RIP: 0033:0x7fc4480b0690
[  196.253715] RSP: 002b:00007ffde9f252f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  196.254053] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007fc4480b0690
[  196.254331] RDX: 0000000000000000 RSI: 000000000060a360 RDI: 0000000000000003
[  196.254608] RBP: 00007ffde9f253f0 R08: 00000000002d1e81 R09: 0000000000000002
[  196.254884] R10: 00007ffde9f250c0 R11: 0000000000000246 R12: 0000000000b22070
[  196.255205] R13: 20c49ba5e353f7cf R14: 431bde82d7b634db R15: 00007ffde9f278fe
[  196.255484] Code: a5 0f b6 45 c0 41 88 41 28 41 0f b6 41 2c 48 c1 e0 04 49 8b 54 01 38 49 8b 44 01 30 49 89 51 20 49 89 41 18 48 8b 83 b0 00 00 00 <48> 8b 30 49 8b 86 08 0b 00 00 48 8b 40 20 48 8b 50 08 48 0b 10
[  196.256190] RIP: seg6_do_srh_encap+0x1ac/0x300 RSP: ffffb2ce00b23a60
[  196.256445] CR2: 0000000000000000
[  196.256676] ---[ end trace 71af7d093603885c ]---

Fixes: 8936ef7604c11 ipv6: sr: fix NULL pointer dereference when setting encap source address
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
---
I tested the patch for IPv6 and IPv4 traffic 

 net/ipv6/seg6_iptunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index f343e6f..5fe1394 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -136,7 +136,7 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
 	isrh->nexthdr = proto;
 
 	hdr->daddr = isrh->segments[isrh->first_segment];
-	set_tun_src(net, ip6_dst_idev(dst)->dev, &hdr->daddr, &hdr->saddr);
+	set_tun_src(net, dst->dev, &hdr->daddr, &hdr->saddr);
 
 #ifdef CONFIG_IPV6_SEG6_HMAC
 	if (sr_has_hmac(isrh)) {
-- 
2.1.4

^ permalink raw reply related

* Re: Q: force netif ON even when there is no real link ?
From: Ran Shalit @ 2018-04-20 14:01 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <CAJ2oMhKCq73QSPH7zWCBMbxOMxcij9vE+ovdhN=ueEBKXsZabA@mail.gmail.com>

On Fri, Apr 20, 2018 at 3:14 PM, Ran Shalit <ranshalit@gmail.com> wrote:
> On Fri, Apr 20, 2018 at 3:05 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> On Fri, Apr 20, 2018 at 03:01:09PM +0300, Ran Shalit wrote:
>>> On Fri, Apr 20, 2018 at 2:55 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>>> > On Fri, Apr 20, 2018 at 11:44:14AM +0300, Ran Shalit wrote:
>>> >> Hello,
>>> >>
>>> >> We configure external switch in u-boot.
>>> >> The configuration is through mdio (cpu is mac and switch is phy).
>>> >>
>>> >> But in Linux we rather not implement any communication in mdio to
>>> >> switch, but it means that we then don't have the information of link
>>> >> state.
>>> >>
>>> >> Is it possible to force in Linux (by default in startup) Ethernet
>>> >> connectivity (netif_carrier_on, netif_wake_queue) even if there is no
>>> >> information of real link state ?
>>> >
>>> > Hi Ran
>>> >
>>> > Use a fixed-phy.
>>> >
>>>
>>> Hi Andrew,
>>>
>>> I'll check about fixed phy,
>>> but in general, is it a problem to have always netif_carrier_on, even
>>> when there is no link ?
>>
>> The link between the CPU and the switch should be up all the
>> time. That is the point of fixed-link.
>>
>
> I understand.
> But what about the mac driver,  does it just do netif_start_queue ?
>

By saying "mac driver", I mean Ethernet driver with fixed phy.

Regards,
Ranran

> Thanks
>
>
>>     Andrew

^ permalink raw reply

* Re: [RFC] vhost: introduce mdev based hardware vhost backend
From: Michael S. Tsirkin @ 2018-04-20 14:12 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tiwei Bie, alex.williamson, ddutile, alexander.h.duyck,
	virtio-dev, linux-kernel, kvm, virtualization, netdev, dan.daly,
	cunming.liang, zhihong.wang, jianfeng.tan, xiao.w.wang
In-Reply-To: <060e2b5f-2e93-c53f-387b-5baaa33e87cd@redhat.com>

On Fri, Apr 20, 2018 at 11:52:47AM +0800, Jason Wang wrote:
> > The biggest issue is that you let userspace poke at the
> > device which is also allowed by the IOMMU to poke at
> > kernel memory (needed for kernel driver to work).
> 
> I don't quite get. The userspace driver could be built on top of VFIO for
> sure. So kernel memory were perfectly isolated in this case.

VFIO does what it can but it mostly just has the IOMMU to play with.
So don't overestimate what it can do - it assumes a high level
of spec compliance for protections to work. For example,
ATS is enabled by default if device has it, and that
treats translated requests are trusted. FLS is assumed to reset
the device for when VFIO is unbound from the device. etc.


> > 
> > Yes, maybe if device is not buggy it's all fine, but
> > it's better if we do not have to trust the device
> > otherwise the security picture becomes more murky.
> > 
> > I suggested attaching a PASID to (some) queues - see my old post "using
> > PASIDs to enable a safe variant of direct ring access".
> > 
> > Then using IOMMU with VFIO to limit access through queue to corrent
> > ranges of memory.
> 
> Well userspace driver could benefit from this too. And we can even go
> further by using nested IO page tables to share IOVA address space between
> devices and a VM.
> 
> Thanks

Yes I suggested this separately.

-- 
MST

^ permalink raw reply

* Re: [PATCH net] tcp: don't read out-of-bounds opsize
From: Eric Dumazet @ 2018-04-20 14:21 UTC (permalink / raw)
  To: Jann Horn, davem, kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <20180420135730.44921-1-jannh@google.com>



On 04/20/2018 06:57 AM, Jann Horn wrote:
> The old code reads the "opsize" variable from out-of-bounds memory (first
> byte behind the segment) if a broken TCP segment ends directly after an
> opcode that is neither EOL nor NOP.
> 
> The result of the read isn't used for anything, so the worst thing that
> could theoretically happen is a pagefault; and since the physmap is usually
> mostly contiguous, even that seems pretty unlikely.
>

No page fault possible, because tcp headers are in skb->head

And we have 'struct skb_shared_info'  at the end of skb->head anyway.

But, yes, reading some extra bytes with random content is possible.

^ permalink raw reply

* Re: [PATCH net-next] net: phy: mdio-boardinfo: Allow recursive mdiobus_register()
From: David Miller @ 2018-04-20 14:34 UTC (permalink / raw)
  To: andrew; +Cc: netdev, f.fainelli, vivien.didelot
In-Reply-To: <1524096047-16823-1-git-send-email-andrew@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 19 Apr 2018 02:00:47 +0200

> mdiobus_register will search for any mdiobus board info registered for
> the bus being registered. If found, it will probe devices on the bus.
> That device, if for example it is an ethernet switch, may then try to
> register an mdio bus. Thus we need to allow recursive calls to
> mdiobus_register.
> 
> Holding the mdio_board_lock will cause a deadlock during this
> recursion. Release the lock and use list_for_each_entry_safe.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

Applied.

While looking over this code I see that we currently never unregister
mdio boardinfo objects.

If we have drivers that can be unloaded, as it seems the one you plan
to add that needs this change should be, the situation could get more
tricky here.

^ permalink raw reply

* Re: [PATCH] net: net_cls: remove a NULL check for css_cls_state
From: David Miller @ 2018-04-20 14:37 UTC (permalink / raw)
  To: lirongqing; +Cc: netdev
In-Reply-To: <1524113961-30166-1-git-send-email-lirongqing@baidu.com>

From: Li RongQing <lirongqing@baidu.com>
Date: Thu, 19 Apr 2018 12:59:21 +0800

> The input of css_cls_state() is impossible to NULL except
> cgrp_css_online, so simplify it
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>

I don't view this as an improvement.  Just let the helper always check
NULL and that way there are less situations to audit.

And it's not like this is a critical fast path either.

I'm not applying this, sorry.

^ permalink raw reply

* Re: [PATCH] [net] ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pkts
From: David Lebrun @ 2018-04-20 14:38 UTC (permalink / raw)
  To: Ahmed Abdelsalam, davem, dlebrun, kuznet, yoshfuji, netdev,
	linux-kernel
In-Reply-To: <1524232685-1203-1-git-send-email-amsalam20@gmail.com>

On 04/20/2018 02:58 PM, Ahmed Abdelsalam wrote:
> In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src()
> in order to set the src addr of outer IPv6 header.
> 
> The net_device is required for set_tun_src(). However calling ip6_dst_idev()
> on dst_entry in case of IPv4 traffic results on the following bug.
> 
> Using just dst->dev should fix this BUG.
> 

Good catch, thanks for spotting this. If you actually tested your fix 
with IPv4 and IPv6 traffic, you should mention it in the commit message. 
Your current formulation suggests that you just guessed a fix without 
testing.

> 
> Fixes: 8936ef7604c11 ipv6: sr: fix NULL pointer dereference when setting encap source address
> Signed-off-by: Ahmed Abdelsalam<amsalam20@gmail.com>

Acked-by: David Lebrun <dlebrun@google.com>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox