Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 2/8] dmaengine: shdmac: Change platform check to CONFIG_ARCH_RENESAS
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: devel, alsa-devel, Geert Uytterhoeven, netdev, linux-kernel,
	linux-renesas-soc, dmaengine, linux-arm-kernel, linux-media
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

Since commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS")
is CONFIG_ARCH_RENESAS a more appropriate platform check than the legacy
CONFIG_ARCH_SHMOBILE, hence use the former.

Renesas SuperH SH-Mobile SoCs are still covered by the CONFIG_CPU_SH4
check, just like before support for Renesas ARM SoCs was added.

Instead of blindly changing all the #ifdefs, switch the main code block
in sh_dmae_probe() to IS_ENABLED(), as this allows to remove all the
remaining #ifdefs.

This will allow to drop ARCH_SHMOBILE on ARM in the near future.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/dma/sh/shdmac.c | 50 +++++++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 29 deletions(-)

diff --git a/drivers/dma/sh/shdmac.c b/drivers/dma/sh/shdmac.c
index 516f5487cc44cf96..8fcaae482ce0949a 100644
--- a/drivers/dma/sh/shdmac.c
+++ b/drivers/dma/sh/shdmac.c
@@ -440,7 +440,6 @@ static bool sh_dmae_reset(struct sh_dmae_device *shdev)
 	return ret;
 }
 
-#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_SHMOBILE)
 static irqreturn_t sh_dmae_err(int irq, void *data)
 {
 	struct sh_dmae_device *shdev = data;
@@ -451,7 +450,6 @@ static irqreturn_t sh_dmae_err(int irq, void *data)
 	sh_dmae_reset(shdev);
 	return IRQ_HANDLED;
 }
-#endif
 
 static bool sh_dmae_desc_completed(struct shdma_chan *schan,
 				   struct shdma_desc *sdesc)
@@ -683,11 +681,8 @@ static int sh_dmae_probe(struct platform_device *pdev)
 	const struct sh_dmae_pdata *pdata;
 	unsigned long chan_flag[SH_DMAE_MAX_CHANNELS] = {};
 	int chan_irq[SH_DMAE_MAX_CHANNELS];
-#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_SHMOBILE)
 	unsigned long irqflags = 0;
-	int errirq;
-#endif
-	int err, i, irq_cnt = 0, irqres = 0, irq_cap = 0;
+	int err, errirq, i, irq_cnt = 0, irqres = 0, irq_cap = 0;
 	struct sh_dmae_device *shdev;
 	struct dma_device *dma_dev;
 	struct resource *chan, *dmars, *errirq_res, *chanirq_res;
@@ -789,33 +784,32 @@ static int sh_dmae_probe(struct platform_device *pdev)
 	if (err)
 		goto rst_err;
 
-#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_SHMOBILE)
-	chanirq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 1);
+	if (IS_ENABLED(CONFIG_CPU_SH4) || IS_ENABLED(CONFIG_ARCH_RENESAS)) {
+		chanirq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 1);
 
-	if (!chanirq_res)
-		chanirq_res = errirq_res;
-	else
-		irqres++;
+		if (!chanirq_res)
+			chanirq_res = errirq_res;
+		else
+			irqres++;
 
-	if (chanirq_res == errirq_res ||
-	    (errirq_res->flags & IORESOURCE_BITS) == IORESOURCE_IRQ_SHAREABLE)
-		irqflags = IRQF_SHARED;
+		if (chanirq_res == errirq_res ||
+		    (errirq_res->flags & IORESOURCE_BITS) == IORESOURCE_IRQ_SHAREABLE)
+			irqflags = IRQF_SHARED;
 
-	errirq = errirq_res->start;
+		errirq = errirq_res->start;
 
-	err = devm_request_irq(&pdev->dev, errirq, sh_dmae_err, irqflags,
-			       "DMAC Address Error", shdev);
-	if (err) {
-		dev_err(&pdev->dev,
-			"DMA failed requesting irq #%d, error %d\n",
-			errirq, err);
-		goto eirq_err;
+		err = devm_request_irq(&pdev->dev, errirq, sh_dmae_err,
+				       irqflags, "DMAC Address Error", shdev);
+		if (err) {
+			dev_err(&pdev->dev,
+				"DMA failed requesting irq #%d, error %d\n",
+				errirq, err);
+			goto eirq_err;
+		}
+	} else {
+		chanirq_res = errirq_res;
 	}
 
-#else
-	chanirq_res = errirq_res;
-#endif /* CONFIG_CPU_SH4 || CONFIG_ARCH_SHMOBILE */
-
 	if (chanirq_res->start == chanirq_res->end &&
 	    !platform_get_resource(pdev, IORESOURCE_IRQ, 1)) {
 		/* Special case - all multiplexed */
@@ -881,9 +875,7 @@ static int sh_dmae_probe(struct platform_device *pdev)
 chan_probe_err:
 	sh_dmae_chan_remove(shdev);
 
-#if defined(CONFIG_CPU_SH4) || defined(CONFIG_ARCH_SHMOBILE)
 eirq_err:
-#endif
 rst_err:
 	spin_lock_irq(&sh_dmae_lock);
 	list_del_rcu(&shdev->node);
-- 
2.7.4

^ permalink raw reply related

* [PATCH 1/8] arm: shmobile: Change platform dependency to ARCH_RENESAS
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: devel, alsa-devel, Geert Uytterhoeven, netdev, linux-kernel,
	linux-renesas-soc, dmaengine, linux-arm-kernel, linux-media
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>

Since commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS")
is ARCH_RENESAS a more appropriate platform dependency than the legacy
ARCH_SHMOBILE, hence use the former.

This will allow to drop ARCH_SHMOBILE on ARM in the near future.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 arch/arm/Kconfig  | 2 +-
 arch/arm/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a7f8e7f4b88fdd03..2d34c0a44877e85b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1467,7 +1467,7 @@ config ARM_PSCI
 config ARCH_NR_GPIO
 	int
 	default 2048 if ARCH_SOCFPGA
-	default 1024 if ARCH_BRCMSTB || ARCH_SHMOBILE || ARCH_TEGRA || \
+	default 1024 if ARCH_BRCMSTB || ARCH_RENESAS || ARCH_TEGRA || \
 		ARCH_ZYNQ
 	default 512 if ARCH_EXYNOS || ARCH_KEYSTONE || SOC_OMAP5 || \
 		SOC_DRA7XX || ARCH_S3C24XX || ARCH_S3C64XX || ARCH_S5PV210
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index e4e537f27339f7a1..a92f5a876d96839d 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -212,7 +212,7 @@ machine-$(CONFIG_ARCH_S3C24XX)		+= s3c24xx
 machine-$(CONFIG_ARCH_S3C64XX)		+= s3c64xx
 machine-$(CONFIG_ARCH_S5PV210)		+= s5pv210
 machine-$(CONFIG_ARCH_SA1100)		+= sa1100
-machine-$(CONFIG_ARCH_SHMOBILE) 	+= shmobile
+machine-$(CONFIG_ARCH_RENESAS)	 	+= shmobile
 machine-$(CONFIG_ARCH_SIRF)		+= prima2
 machine-$(CONFIG_ARCH_SOCFPGA)		+= socfpga
 machine-$(CONFIG_ARCH_STI)		+= sti
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/8] arm: renesas: Change platform dependency to ARCH_RENESAS
From: Geert Uytterhoeven @ 2018-04-20 13:28 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm, Russell King, Catalin Marinas,
	Will Deacon, Dan Williams, Vinod Koul, Mauro Carvalho Chehab,
	Sergei Shtylyov, David S . Miller, Greg Kroah-Hartman,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	Arnd Bergmann, Kuninori Morimoto, Laurent Pinchart
  Cc: devel, alsa-devel, Geert Uytterhoeven, netdev, linux-kernel,
	linux-renesas-soc, dmaengine, linux-arm-kernel, linux-media

	Hi all,

Commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS")
started the conversion from ARCH_SHMOBILE to ARCH_RENESAS for Renesas
ARM SoCs.  This patch series completes the conversion, by:
  1. Updating dependencies for drivers that weren't converted yet,
  2. Removing the ARCH_SHMOBILE Kconfig symbols on ARM and ARM64.

The first 6 patches can be applied independently by subsystem
maintainers.
The last two patches depend on the first 6 patches, and are thus marked
RFC.

Thanks for your comments!

Geert Uytterhoeven (8):
  arm: shmobile: Change platform dependency to ARCH_RENESAS
  dmaengine: shdmac: Change platform check to CONFIG_ARCH_RENESAS
  [media] v4l: rcar_fdp1: Change platform dependency to ARCH_RENESAS
  sh_eth: Change platform check to CONFIG_ARCH_RENESAS
  staging: emxx_udc: Change platform dependency to ARCH_RENESAS
  ASoC: sh: Update menu title and platform dependency
  [RFC] ARM: shmobile: Remove the ARCH_SHMOBILE Kconfig symbol
  [RFC] arm64: renesas: Remove the ARCH_SHMOBILE Kconfig symbol

 arch/arm/Kconfig                      |  2 +-
 arch/arm/Makefile                     |  2 +-
 arch/arm/mach-shmobile/Kconfig        |  4 ---
 arch/arm64/Kconfig.platforms          | 42 +++++++++++++----------------
 drivers/dma/sh/shdmac.c               | 50 +++++++++++++++--------------------
 drivers/media/platform/Kconfig        |  2 +-
 drivers/net/ethernet/renesas/sh_eth.h |  2 +-
 drivers/staging/emxx_udc/Kconfig      |  2 +-
 sound/soc/sh/Kconfig                  |  4 +--
 9 files changed, 47 insertions(+), 63 deletions(-)

-- 
2.7.4

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH bpf-next 1/5] samples/bpf: Fix typo in comment
From: Daniel Thompson @ 2018-04-20 13:21 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Leo Yan, Alexei Starovoitov, Daniel Borkmann, netdev,
	linux-kernel
In-Reply-To: <20180420141004.259d7d6f@redhat.com>

On Fri, Apr 20, 2018 at 02:10:04PM +0200, Jesper Dangaard Brouer wrote:
> 
> On Thu, 19 Apr 2018 09:34:02 +0800 Leo Yan <leo.yan@linaro.org> wrote:
> 
> > Fix typo by replacing 'iif' with 'if'.
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  samples/bpf/bpf_load.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> > index bebe418..28e4678 100644
> > --- a/samples/bpf/bpf_load.c
> > +++ b/samples/bpf/bpf_load.c
> > @@ -393,7 +393,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
> >  			continue;
> >  		if (sym[nr_maps].st_shndx != maps_shndx)
> >  			continue;
> > -		/* Only increment iif maps section */
> > +		/* Only increment if maps section */
> >  		nr_maps++;
> >  	}
> 
> This was actually not a typo from my side.
> 
> With 'iif' I mean 'if and only if' ... but it doesn't matter much.

I think 'if and only if' is more commonly abbreviated 'iff' isn't it?


Daniel.

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Michal Hocko @ 2018-04-20 13:08 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, virtualization,
	linux-mm, edumazet, bhutchings, Andrew Morton, David Miller,
	Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804191207380.31175@file01.intranet.prod.int.rdu2.redhat.com>

On Thu 19-04-18 12:12:38, Mikulas Patocka wrote:
[...]
> From: Mikulas Patocka <mpatocka@redhat.com>
> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
> 
> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
> 
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
> 
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
> 
> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.

No way. This is just wrong! First of all, you will explode most likely
on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
enabled quite often.

> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Nacked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/util.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6/mm/util.c
> ===================================================================
> --- linux-2.6.orig/mm/util.c	2018-04-18 15:46:23.000000000 +0200
> +++ linux-2.6/mm/util.c	2018-04-18 16:00:43.000000000 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> +#ifndef CONFIG_DEBUG_VM
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
>  
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
>  	 */
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
> +#endif
>  
>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel
From: Rahul Lakkireddy @ 2018-04-20 13:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dave Young, netdev@vger.kernel.org, kexec@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Indranil Choudhury, Nirranjan Kirubaharan,
	stephen@networkplumber.org, Ganesh GR, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, davem@davemloft.net,
	viro@zeniv.linux.org.uk
In-Reply-To: <87lgdjnt72.fsf@xmission.com>

On Thursday, April 04/19/18, 2018 at 20:23:37 +0530, Eric W. Biederman wrote:
> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
> 
> > On Thursday, April 04/19/18, 2018 at 07:10:30 +0530, Dave Young wrote:
> >> On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
> >> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> >> > > Hi Rahul,
> >> > > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> >> > > > On production servers running variety of workloads over time, kernel
> >> > > > panic can happen sporadically after days or even months. It is
> >> > > > important to collect as much debug logs as possible to root cause
> >> > > > and fix the problem, that may not be easy to reproduce. Snapshot of
> >> > > > underlying hardware/firmware state (like register dump, firmware
> >> > > > logs, adapter memory, etc.), at the time of kernel panic will be very
> >> > > > helpful while debugging the culprit device driver.
> >> > > > 
> >> > > > This series of patches add new generic framework that enable device
> >> > > > drivers to collect device specific snapshot of the hardware/firmware
> >> > > > state of the underlying device in the crash recovery kernel. In crash
> >> > > > recovery kernel, the collected logs are added as elf notes to
> >> > > > /proc/vmcore, which is copied by user space scripts for post-analysis.
> >> > > > 
> >> > > > The sequence of actions done by device drivers to append their device
> >> > > > specific hardware/firmware logs to /proc/vmcore are as follows:
> >> > > > 
> >> > > > 1. During probe (before hardware is initialized), device drivers
> >> > > > register to the vmcore module (via vmcore_add_device_dump()), with
> >> > > > callback function, along with buffer size and log name needed for
> >> > > > firmware/hardware log collection.
> >> > > 
> >> > > I assumed the elf notes info should be prepared while kexec_[file_]load
> >> > > phase. But I did not read the old comment, not sure if it has been discussed
> >> > > or not.
> >> > > 
> >> > 
> >> > We must not collect dumps in crashing kernel. Adding more things in
> >> > crash dump path risks not collecting vmcore at all. Eric had
> >> > discussed this in more detail at:
> >> > 
> >> > https://lkml.org/lkml/2018/3/24/319
> >> > 
> >> > We are safe to collect dumps in the second kernel. Each device dump
> >> > will be exported as an elf note in /proc/vmcore.
> >> 
> >> I understand that we should avoid adding anything in crash path.  And I also
> >> agree to collect device dump in second kernel.  I just assumed device
> >> dump use some memory area to store the debug info and the memory
> >> is persistent so that this can be done in 2 steps, first register the
> >> address in elf header in kexec_load, then collect the dump in 2nd
> >> kernel.  But it seems the driver is doing some other logic to collect
> >> the info instead of just that simple like I thought. 
> >> 
> >
> > It seems simpler, but I'm concerned with waste of memory area, if
> > there are no device dumps being collected in second kernel. In
> > approach proposed in these series, we dynamically allocate memory
> > for the device dumps from second kernel's available memory.
> 
> Don't count that kernel having more than about 128MiB.
> 

If large dump is expected, Administrator can increase the memory
allocated to the second kernel (using crashkernel boot param), to
ensure device dumps get collected.

> For that reason if for no other it would be nice if it was possible to
> have the driver to not initialize the device and just stand there
> handing out the data a piece at a time as it is read from /proc/vmcore.
> 

Since cxgb4 is a network driver, it can be used to transfer the dumps
over the network. So we must ensure the dumps get collected and
stored, before device gets initialized to transfer dumps over
the network.

> The 2GiB number I read earlier concerns me for working in a limited
> environment.
> 

All dumps, including the 2GB on-chip memory dump, is compressed by
the cxgb4 driver as they are collected. The overall compressed dump
comes out at max 32 MB.

> It might even make sense to separate this into a completely separate
> module (depended upon the main driver if it makes sense to share
> the functionality) so that people performing crash dumps would not
> hesitate to include the code in their initramfs images.
> 
> I can see splitting a device up into a portion only to be used in case
> of a crash dump and a normal portion like we do for main memory but I
> doubt that makes sense in practice.
> 

This is not required, especially in case of network drivers, which
must collect underlying device dump and initialize the device to
transfer dumps over the network.

> >> > > If do this in 2nd kernel a question is driver can be loaded later than vmcore init.
> >> > 
> >> > Yes, drivers will add their device dumps after vmcore init.
> >> > 
> >> > > How to guarantee the function works if vmcore reading happens before
> >> > > the driver is loaded?
> >> > > 
> >> > > Also it is possible that kdump initramfs does not contains the driver
> >> > > module.
> >> > > 
> >> > > Am I missing something?
> >> > > 
> >> > 
> >> > Yes, driver must be in initramfs if it wants to collect and add device
> >> > dump to /proc/vmcore in second kernel.
> >> 
> >> In RH/Fedora kdump scripts we only add the things are required to
> >> bring up the dump target, so that we can use as less memory as we can.
> >> 
> >> For example, if a net driver panicked, and the dump target is rootfs
> >> which is a scsi disk, then no network related stuff will be added in
> >> initramfs.
> >> 
> >> In this case the device dump info will be not collected..
> >
> > Correct. If the driver is not present in initramfs, it can't collect
> > its underlying device's dump. Administrator is expected to add the 
> > driver to initramfs, if device dump needs to be collected.
> 
> That makes sense, as most people won't have that need.  Still if we can
> find something that can work automatically and safely without the need
> for manual configuration people are more likely to use it.
> 

Thanks,
Rahul

^ permalink raw reply

* Re: [PATCH 2/3] xen netback: add fault injection facility
From: staskins @ 2018-04-20 13:02 UTC (permalink / raw)
  To: Juergen Gross
  Cc: jakub.kicinski, hpa, mcroce, tglx, ggarcia, daniel, x86, mingo,
	xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, linux-block, wei.liu2, netdev,
	linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <5c4e4f3d-03ce-b517-d451-3229c0d542c0@suse.com>

On 04/20/18 15:00, Juergen Gross wrote:
> On 20/04/18 14:52, staskins@amazon.com wrote:
>> On 04/20/18 13:25, Juergen Gross wrote:
>>> On 20/04/18 12:47, Stanislav Kinsburskii wrote:
>>>> +    for (fi = 0; fi < XENVIF_FI_MAX; fi++) {
>>>> +        vfi->faults[fi] = xen_fi_dir_add(vfi->dir,
>>>> +                xenvif_fi_names[fi]);
>>> How does this work? xenvif_fi_names[] is an empty array and this is the
>>> only reference to it. Who is allocating the memory for that array?
>> Well, it works in the way one adds a var to enum (which is used as a key
>> later) and a corresponding string into the array (which is used as a
>> name for the fault directory in sysfs).
> Then you should size the array via XENVIF_FI_MAX.

Makes sense.
Thanks!

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply

* Re: [PATCH 2/3] xen netback: add fault injection facility
From: Juergen Gross @ 2018-04-20 13:00 UTC (permalink / raw)
  To: staskins
  Cc: jakub.kicinski, hpa, mcroce, tglx, ggarcia, daniel, x86, mingo,
	xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, linux-block, wei.liu2, netdev,
	linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <909fd3b2-19e8-8c13-6ede-cfd6051c6f1d@amazon.com>

On 20/04/18 14:52, staskins@amazon.com wrote:
> On 04/20/18 13:25, Juergen Gross wrote:
>> On 20/04/18 12:47, Stanislav Kinsburskii wrote:
>>> +    for (fi = 0; fi < XENVIF_FI_MAX; fi++) {
>>> +        vfi->faults[fi] = xen_fi_dir_add(vfi->dir,
>>> +                xenvif_fi_names[fi]);
>> How does this work? xenvif_fi_names[] is an empty array and this is the
>> only reference to it. Who is allocating the memory for that array?
> 
> Well, it works in the way one adds a var to enum (which is used as a key
> later) and a corresponding string into the array (which is used as a
> name for the fault directory in sysfs).

Then you should size the array via XENVIF_FI_MAX.


Juergen

^ permalink raw reply

* Re: [PATCH 3/3] xen blkback: add fault injection facility
From: staskins @ 2018-04-20 12:53 UTC (permalink / raw)
  To: Juergen Gross
  Cc: jakub.kicinski, hpa, mcroce, tglx, ggarcia, daniel, x86, mingo,
	xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, linux-block, wei.liu2, netdev,
	linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <ef016b7b-6c0e-1684-5714-0a6d5a46504e@suse.com>

On 04/20/18 13:28, Juergen Gross wrote:
> On 20/04/18 12:47, Stanislav Kinsburskii wrote:
>> This patch adds wrapper helpers around generic Xen fault inject
>> facility.
>> The major reason is to keep all the module fault injection directories
>> in a dedicated subdirectory instead of Xen fault inject root.
>>
>> IOW, when using these helpers, per-device and named by device name
>> fault injection control directories will appear under the following
>> directory:
>> - /sys/kernel/debug/xen/fault_inject/xen-blkback/
>> instead of:
>> - /sys/kernel/debug/xen/fault_inject/
>>
>> Signed-off-by: Stanislav Kinsburskii <staskins@amazon.com>
>> CC: Jens Axboe <axboe@kernel.dk>
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> CC: "Roger Pau Monné" <roger.pau@citrix.com>
>> CC: linux-kernel@vger.kernel.org
>> CC: linux-block@vger.kernel.org
>> CC: xen-devel@lists.xenproject.org
>> CC: Stanislav Kinsburskii <staskins@amazon.com>
>> CC: David Woodhouse <dwmw@amazon.co.uk>
> This is an exact copy of the netback patch apart from the names.
>
> I don't like adding multiple copies of the same coding to the tree.

Sure, will dedupplicate this.

>
> Juergen
>

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply

* Re: [PATCH 2/3] xen netback: add fault injection facility
From: staskins @ 2018-04-20 12:52 UTC (permalink / raw)
  To: Juergen Gross
  Cc: jakub.kicinski, hpa, mcroce, tglx, ggarcia, daniel, x86, mingo,
	xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, linux-block, wei.liu2, netdev,
	linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <e33c055c-5a5a-2329-7a2e-faf715bb95fc@suse.com>

On 04/20/18 13:25, Juergen Gross wrote:
> On 20/04/18 12:47, Stanislav Kinsburskii wrote:
>> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>> index 8918466..5cc9acd 100644
>> --- a/drivers/net/Kconfig
>> +++ b/drivers/net/Kconfig
>> @@ -465,6 +465,14 @@ config XEN_NETDEV_BACKEND
>>   	  compile this driver as a module, chose M here: the module
>>   	  will be called xen-netback.
>>   
>> +config XEN_NETDEV_BACKEND_FAULT_INJECTION
>> +	  bool "Xen net-device backend driver fault injection"
>> +	  depends on XEN_NETDEV_BACKEND
>> +	  depends on XEN_FAULT_INJECTION
>> +	  default n
>> +	  help
>> +	    Allow to inject errors to Xen backend network driver
>> +
>>   config VMXNET3
>>   	tristate "VMware VMXNET3 ethernet driver"
>>   	depends on PCI && INET
>> diff --git a/drivers/net/xen-netback/Makefile b/drivers/net/xen-netback/Makefile
>> index d49798a..28abcdc 100644
>> --- a/drivers/net/xen-netback/Makefile
>> +++ b/drivers/net/xen-netback/Makefile
>> @@ -1,3 +1,4 @@
>>   obj-$(CONFIG_XEN_NETDEV_BACKEND) := xen-netback.o
>>   
>>   xen-netback-y := netback.o xenbus.o interface.o hash.o rx.o
>> +xen-netback-$(CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION) += netback_fi.o
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index a46a1e9..30d676d 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -286,6 +286,9 @@ struct xenvif {
>>   
>>   #ifdef CONFIG_DEBUG_FS
>>   	struct dentry *xenvif_dbg_root;
>> +#ifdef CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION
>> +	void *fi_info;
>> +#endif
>>   #endif
>>   
>>   	struct xen_netif_ctrl_back_ring ctrl;
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index a27daa2..ecc416e 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -33,6 +33,7 @@
>>    */
>>   
>>   #include "common.h"
>> +#include "netback_fi.h"
>>   
>>   #include <linux/kthread.h>
>>   #include <linux/if_vlan.h>
>> @@ -1649,6 +1650,7 @@ static int __init netback_init(void)
>>   			PTR_ERR(xen_netback_dbg_root));
>>   #endif /* CONFIG_DEBUG_FS */
>>   
>> +	(void) xen_netbk_fi_init();
> This is the only usage of xen_netbk_fi_init(). Why don't you make it
> return void from the beginning?

Well, I could do so, of course.
My intention was to treat this as an error. But then it doesn't 
correlate to ignored debugfs directory creation error above.

>>   	return 0;
>>   
>>   failed_init:
>> @@ -1659,6 +1661,7 @@ module_init(netback_init);
>>   
>>   static void __exit netback_fini(void)
>>   {
>> +	xen_netbk_fi_fini();
>>   #ifdef CONFIG_DEBUG_FS
>>   	if (!IS_ERR_OR_NULL(xen_netback_dbg_root))
>>   		debugfs_remove_recursive(xen_netback_dbg_root);
>> diff --git a/drivers/net/xen-netback/netback_fi.c b/drivers/net/xen-netback/netback_fi.c
>> new file mode 100644
>> index 0000000..47541d0
>> --- /dev/null
>> +++ b/drivers/net/xen-netback/netback_fi.c
>> @@ -0,0 +1,119 @@
>> +/*
>> + * Fault injection interface for Xen backend network driver
>> + *
>> + * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License version 2
>> + * as published by the Free Software Foundation; or, when distributed
>> + * separately from the Linux kernel or incorporated into other
>> + * software packages, subject to the following license:
> SPDX again.

Will fix.

>
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this source file (the "Software"), to deal in the Software without
>> + * restriction, including without limitation the rights to use, copy, modify,
>> + * merge, publish, distribute, sublicense, and/or sell copies of the Software,
>> + * and to permit persons to whom the Software is furnished to do so, subject to
>> + * the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#include "common.h"
>> +
>> +#include <linux/debugfs.h>
>> +
>> +#include <xen/fault_inject.h>
>> +#include "netback_fi.h"
>> +
>> +static struct dentry *vif_fi_dir;
>> +
>> +static const char *xenvif_fi_names[] = {
>> +};
>> +
>> +struct xenvif_fi {
>> +	struct dentry *dir;
>> +	struct xen_fi *faults[XENVIF_FI_MAX];
>> +};
>> +
>> +int xen_netbk_fi_init(void)
>> +{
>> +	vif_fi_dir = xen_fi_dir_create("xen-netback");
>> +	if (!vif_fi_dir)
>> +		return -ENOMEM;
>> +	return 0;
>> +}
>> +
>> +void xen_netbk_fi_fini(void)
>> +{
>> +	debugfs_remove_recursive(vif_fi_dir);
>> +}
>> +
>> +void xenvif_fi_fini(struct xenvif *vif)
>> +{
>> +	struct xenvif_fi *vfi = vif->fi_info;
>> +	int fi;
>> +
>> +	if (!vif->fi_info)
>> +		return;
>> +
>> +	vif->fi_info = NULL;
>> +
>> +	for (fi = 0; fi < XENVIF_FI_MAX; fi++)
>> +		xen_fi_del(vfi->faults[fi]);
>> +	debugfs_remove_recursive(vfi->dir);
>> +	kfree(vfi);
>> +}
>> +
>> +int xenvif_fi_init(struct xenvif *vif)
>> +{
>> +	struct dentry *parent;
>> +	struct xenvif_fi *vfi;
>> +	int fi, err = -ENOMEM;
>> +
>> +	parent = vif_fi_dir;
>> +	if (!parent)
>> +		return -ENOMEM;
>> +
>> +	vfi = kmalloc(sizeof(*vfi), GFP_KERNEL);
>> +	if (!vfi)
>> +		return -ENOMEM;
>> +
>> +	vfi->dir = debugfs_create_dir(vif->dev->name, parent);
>> +	if (!vfi->dir)
>> +		goto err_dir;
>> +
>> +	for (fi = 0; fi < XENVIF_FI_MAX; fi++) {
>> +		vfi->faults[fi] = xen_fi_dir_add(vfi->dir,
>> +				xenvif_fi_names[fi]);
> How does this work? xenvif_fi_names[] is an empty array and this is the
> only reference to it. Who is allocating the memory for that array?

Well, it works in the way one adds a var to enum (which is used as a key 
later) and a corresponding string into the array (which is used as a 
name for the fault directory in sysfs).

>> +		if (!vfi->faults[fi])
>> +			goto err_fault;
>> +	}
>> +
>> +	vif->fi_info = vfi;
>> +	return 0;
>> +
>> +err_fault:
>> +	for (; fi > 0; fi--)
>> +		xen_fi_del(vfi->faults[fi]);
> What about vfi->faults[0] ?

Thanks! Will fix.


>> +	debugfs_remove_recursive(vfi->dir);
>> +err_dir:
>> +	kfree(vfi);
>> +	return err;
>> +}
>> +
>> +bool xenvif_should_fail(struct xenvif *vif, xenvif_fi_t type)
>> +{
>> +	struct xenvif_fi *vfi = vif->fi_info;
>> +
>> +	return xen_should_fail(vfi->faults[type]);
>> +}
>> diff --git a/drivers/net/xen-netback/netback_fi.h b/drivers/net/xen-netback/netback_fi.h
>> new file mode 100644
>> index 0000000..895c6a6
>> --- /dev/null
>> +++ b/drivers/net/xen-netback/netback_fi.h
>> @@ -0,0 +1,35 @@
>> +#ifndef _XEN_NETBACK_FI_H
>> +#define _XEN_NETBACK_FI_H
>> +
>> +struct xen_fi;
> Why?

Ditto.

>> +
>> +typedef enum {
>> +	XENVIF_FI_MAX
>> +} xenvif_fi_t;
> It would have helped if you had added some users of the stuff you are
> adding here. This enum just looks weird this way.
>
it in pla
Yeah... Probably I should mark this thing as a RFC.

>> +
>> +#ifdef CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION
>> +
>> +int xen_netbk_fi_init(void);
>> +void xen_netbk_fi_fini(void);
>> +
>> +void xenvif_fi_fini(struct xenvif *vif);
>> +int xenvif_fi_init(struct xenvif *vif);
>> +
>> +bool xenvif_should_fail(struct xenvif *vif, xenvif_fi_t type);
>> +
>> +#else
>> +
>> +static inline int xen_netbk_fi_init(void) { return 0; }
>> +static inline void xen_netbk_fi_fini(void) { }
>> +
>> +static inline void xenvif_fi_fini(struct xenvif *vif) { }
>> +static inline int xenvif_fi_init(struct xenvif *vif) { return 0; }
>> +
>> +static inline bool xenvif_should_fail(struct xenvif *vif, xenvif_fi_t type)
>> +{
>> +	return false;
>> +}
>> +
>> +#endif /* CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION */
>> +
>> +#endif /* _XEN_NETBACK_FI_H */
>> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
>> index e1aef25..c775ee0 100644
>> --- a/drivers/net/xen-netback/xenbus.c
>> +++ b/drivers/net/xen-netback/xenbus.c
>> @@ -21,6 +21,7 @@
>>   #include "common.h"
>>   #include <linux/vmalloc.h>
>>   #include <linux/rtnetlink.h>
>> +#include "netback_fi.h"
>>   
>>   struct backend_info {
>>   	struct xenbus_device *dev;
>> @@ -502,6 +503,7 @@ static void backend_disconnect(struct backend_info *be)
>>   #ifdef CONFIG_DEBUG_FS
>>   		xenvif_debugfs_delif(vif);
>>   #endif /* CONFIG_DEBUG_FS */
>> +		xenvif_fi_fini(vif);
>>   		xenvif_disconnect_data(vif);
>>   
>>   		/* At this point some of the handlers may still be active
>> @@ -1024,6 +1026,10 @@ static void connect(struct backend_info *be)
>>   		}
>>   	}
>>   
>> +	err = xenvif_fi_init(be->vif);
>> +	if (err)
>> +		goto err;
>> +
>>   #ifdef CONFIG_DEBUG_FS
>>   	xenvif_debugfs_addif(be->vif);
>>   #endif /* CONFIG_DEBUG_FS */
>>
> Without any user of that infrastructure I really can't say whether I
> want this.
>

The code we are using this faults for is not yet sent (we have it in plans).
Probably I'll send it once again after this code using it is sent.
Thanks anyway!

> Juergen
>

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply

* Re: [PATCH 1/3] xen: add generic fault injection facility
From: staskins @ 2018-04-20 12:45 UTC (permalink / raw)
  To: Juergen Gross
  Cc: jakub.kicinski, hpa, mcroce, tglx, ggarcia, daniel, x86, mingo,
	xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, linux-block, wei.liu2, netdev,
	linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <03a333dc-e505-e254-f39d-e68b1453a6f3@suse.com>

On 04/20/18 12:59, Juergen Gross wrote:
> On 20/04/18 12:47, Stanislav Kinsburskii wrote:
>> diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
>> index c1f98f3..483fc16 100644
>> --- a/arch/x86/xen/Kconfig
>> +++ b/arch/x86/xen/Kconfig
>> @@ -77,3 +77,10 @@ config XEN_PVH
>>   	bool "Support for running as a PVH guest"
>>   	depends on XEN && XEN_PVHVM && ACPI
>>   	def_bool n
>> +
>> +config XEN_FAULT_INJECTION
>> +	bool "Enable Xen fault injection"
>> +	depends on FAULT_INJECTION_DEBUG_FS
>> +	default n
>> +	help
>> +	  Enable Xen fault injection facility
> Why for x86 only? I'd rather add this under drivers/xen

Sure.

>
>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
>> index d83cb54..3158fe1 100644
>> --- a/arch/x86/xen/Makefile
>> +++ b/arch/x86/xen/Makefile
>> @@ -34,3 +34,4 @@ obj-$(CONFIG_XEN_DOM0)		+= vga.o
>>   obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
>>   obj-$(CONFIG_XEN_EFI)		+= efi.o
>>   obj-$(CONFIG_XEN_PVH)	 	+= xen-pvh.o
>> +obj-$(CONFIG_XEN_FAULT_INJECTION)	+= fault_inject.o
>> diff --git a/arch/x86/xen/fault_inject.c b/arch/x86/xen/fault_inject.c
>> new file mode 100644
>> index 0000000..ecf0f7c
>> --- /dev/null
>> +++ b/arch/x86/xen/fault_inject.c
>> @@ -0,0 +1,109 @@
>> +/*
>> + * Fauit injection interface for Xen virtual block devices
>> + *
>> + * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License version 2
>> + * as published by the Free Software Foundation; or, when distributed
>> + * separately from the Linux kernel or incorporated into other
>> + * software packages, subject to the following license:
> Please use the appropriate SPDX header instead of the full GPL2
> boilerplate.

Ditto.

>
>
> Juergen
>

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen-netfront: Fix hang on device removal
From: Jason Andryuk @ 2018-04-20 12:41 UTC (permalink / raw)
  To: Simon Gaiser
  Cc: netdev, xen-devel, Eduardo Otubo, Juergen Gross, Boris Ostrovsky,
	open list
In-Reply-To: <6ad676c8-476f-1181-1acd-985671f6c837@invisiblethingslab.com>

On Thu, Apr 19, 2018 at 4:09 PM, Simon Gaiser
<simon@invisiblethingslab.com> wrote:
> Jason Andryuk:
>> On Thu, Apr 19, 2018 at 2:10 PM, Simon Gaiser
>> <simon@invisiblethingslab.com> wrote:
>>> Jason Andryuk:
>>>> A toolstack may delete the vif frontend and backend xenstore entries
>>>> while xen-netfront is in the removal code path.  In that case, the
>>>> checks for xenbus_read_driver_state would return XenbusStateUnknown, and
>>>> xennet_remove would hang indefinitely.  This hang prevents system
>>>> shutdown.
>>>>
>>>> xennet_remove must be able to handle XenbusStateUnknown, and
>>>> netback_changed must also wake up the wake_queue for that state as well.
>>>>
>>>> Fixes: 5b5971df3bc2 ("xen-netfront: remove warning when unloading module")
>>>
>>> I think this should go into stable since AFAIK the hanging network
>>> device can only be fixed by rebooting the guest. AFAICS this affects all
>>> 4.* branches since 5b5971df3bc2 got backported to them.
>>>
>>> Upstream commit c2d2e6738a209f0f9dffa2dc8e7292fc45360d61.
>>
>> Simon,
>>
>> Yes, I agree.  I actually submitted the request to stable earlier
>> today, so hopefully it gets added soon.
>
> Ok, great. (I checked the stable patch queue, but didn't check the
> mailing list archive).
>
>> Have you experienced this hang?
>
> Yes, it's affecting the kernel shipped by Qubes OS (see [1]).

Ok, interesting.  I tracked down this bug with older xenvm tools, and
I didn't know if libxl tools were also affected.

Greg KH added the patch to the stable queue, so it's in the process.

Regards,
Jason

^ permalink raw reply

* [PATCH 02/02] afs: Implement namespacing
From: David Howells @ 2018-04-20 12:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, Andrew Morton, Alexander Viro, linux-kernel, linux-afs,
	netdev, Alexey Dobriyan
In-Reply-To: <22726.1524227374@warthog.procyon.org.uk>

Implement namespacing within AFS, but don't yet let mounts occur outside
the init namespace.  An additional patch will be required propagate the
network namespace across automounts.
    
Signed-off-by: David Howells <dhowells@redhat.com>
---
 cell.c     |    4 +-
 internal.h |   39 +++++++++++++++-----------
 main.c     |   33 ++++++++++++++++++----
 proc.c     |   89 +++++++++++++++++++++++++++++++++++++++----------------------
 super.c    |   20 ++++++++-----
 5 files changed, 121 insertions(+), 64 deletions(-)

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index fdf4c36cff79..a98a8a3d5544 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -528,7 +528,7 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
 					     NULL, 0,
 					     cell, 0, true);
 #endif
-	ret = afs_proc_cell_setup(net, cell);
+	ret = afs_proc_cell_setup(cell);
 	if (ret < 0)
 		return ret;
 	spin_lock(&net->proc_cells_lock);
@@ -544,7 +544,7 @@ static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell)
 {
 	_enter("%s", cell->name);
 
-	afs_proc_cell_remove(net, cell);
+	afs_proc_cell_remove(cell);
 
 	spin_lock(&net->proc_cells_lock);
 	list_del_init(&cell->proc_link);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index f8086ec95e24..9863dac6ff6a 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -22,6 +22,8 @@
 #include <linux/backing-dev.h>
 #include <linux/uuid.h>
 #include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <net/sock.h>
 #include <net/af_rxrpc.h>
 
 #include "afs.h"
@@ -40,7 +42,8 @@ struct afs_mount_params {
 	afs_voltype_t		type;		/* type of volume requested */
 	int			volnamesz;	/* size of volume name */
 	const char		*volname;	/* name of volume to mount */
-	struct afs_net		*net;		/* Network namespace in effect */
+	struct net		*net_ns;	/* Network namespace in effect */
+	struct afs_net		*net;		/* the AFS net namespace stuff */
 	struct afs_cell		*cell;		/* cell in which to find volume */
 	struct afs_volume	*volume;	/* volume record */
 	struct key		*key;		/* key to use for secure mounting */
@@ -189,7 +192,7 @@ struct afs_read {
  * - there's one superblock per volume
  */
 struct afs_super_info {
-	struct afs_net		*net;		/* Network namespace */
+	struct net		*net_ns;	/* Network namespace */
 	struct afs_cell		*cell;		/* The cell in which the volume resides */
 	struct afs_volume	*volume;	/* volume record */
 	bool			dyn_root;	/* True if dynamic root */
@@ -218,6 +221,7 @@ struct afs_sysnames {
  * AFS network namespace record.
  */
 struct afs_net {
+	struct net		*net;		/* Backpointer to the owning net namespace */
 	struct afs_uuid		uuid;
 	bool			live;		/* F if this namespace is being removed */
 
@@ -280,7 +284,6 @@ struct afs_net {
 };
 
 extern const char afs_init_sysname[];
-extern struct afs_net __afs_net;// Dummy AFS network namespace; TODO: replace with real netns
 
 enum afs_cell_state {
 	AFS_CELL_UNSET,
@@ -787,34 +790,36 @@ extern int afs_drop_inode(struct inode *);
  * main.c
  */
 extern struct workqueue_struct *afs_wq;
+extern int afs_net_id;
 
-static inline struct afs_net *afs_d2net(struct dentry *dentry)
+static inline struct afs_net *afs_net(struct net *net)
 {
-	return &__afs_net;
+	return net_generic(net, afs_net_id);
 }
 
-static inline struct afs_net *afs_i2net(struct inode *inode)
+static inline struct afs_net *afs_sb2net(struct super_block *sb)
 {
-	return &__afs_net;
+	return afs_net(AFS_FS_S(sb)->net_ns);
 }
 
-static inline struct afs_net *afs_v2net(struct afs_vnode *vnode)
+static inline struct afs_net *afs_d2net(struct dentry *dentry)
 {
-	return &__afs_net;
+	return afs_sb2net(dentry->d_sb);
 }
 
-static inline struct afs_net *afs_sock2net(struct sock *sk)
+static inline struct afs_net *afs_i2net(struct inode *inode)
 {
-	return &__afs_net;
+	return afs_sb2net(inode->i_sb);
 }
 
-static inline struct afs_net *afs_get_net(struct afs_net *net)
+static inline struct afs_net *afs_v2net(struct afs_vnode *vnode)
 {
-	return net;
+	return afs_i2net(&vnode->vfs_inode);
 }
 
-static inline void afs_put_net(struct afs_net *net)
+static inline struct afs_net *afs_sock2net(struct sock *sk)
 {
+	return net_generic(sock_net(sk), afs_net_id);
 }
 
 static inline void __afs_stat(atomic_t *s)
@@ -849,8 +854,8 @@ extern int afs_get_ipv4_interfaces(struct afs_interface *, size_t, bool);
  */
 extern int __net_init afs_proc_init(struct afs_net *);
 extern void __net_exit afs_proc_cleanup(struct afs_net *);
-extern int afs_proc_cell_setup(struct afs_net *, struct afs_cell *);
-extern void afs_proc_cell_remove(struct afs_net *, struct afs_cell *);
+extern int afs_proc_cell_setup(struct afs_cell *);
+extern void afs_proc_cell_remove(struct afs_cell *);
 extern void afs_put_sysnames(struct afs_sysnames *);
 
 /*
@@ -983,7 +988,7 @@ extern bool afs_annotate_server_list(struct afs_server_list *, struct afs_server
  * super.c
  */
 extern int __init afs_fs_init(void);
-extern void __exit afs_fs_exit(void);
+extern void afs_fs_exit(void);
 
 /*
  * vlclient.c
diff --git a/fs/afs/main.c b/fs/afs/main.c
index d7560168b3bf..7d2c1354e2ca 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -15,6 +15,7 @@
 #include <linux/completion.h>
 #include <linux/sched.h>
 #include <linux/random.h>
+#include <linux/proc_fs.h>
 #define CREATE_TRACE_POINTS
 #include "internal.h"
 
@@ -32,7 +33,7 @@ module_param(rootcell, charp, 0);
 MODULE_PARM_DESC(rootcell, "root AFS cell name and VL server IP addr list");
 
 struct workqueue_struct *afs_wq;
-struct afs_net __afs_net;
+static struct proc_dir_entry *afs_proc_symlink;
 
 #if defined(CONFIG_ALPHA)
 const char afs_init_sysname[] = "alpha_linux26";
@@ -67,11 +68,13 @@ const char afs_init_sysname[] = "unknown_linux26";
 /*
  * Initialise an AFS network namespace record.
  */
-static int __net_init afs_net_init(struct afs_net *net)
+static int __net_init afs_net_init(struct net *net_ns)
 {
 	struct afs_sysnames *sysnames;
+	struct afs_net *net = afs_net(net_ns);
 	int ret;
 
+	net->net = net_ns;
 	net->live = true;
 	generate_random_uuid((unsigned char *)&net->uuid);
 
@@ -142,8 +145,10 @@ static int __net_init afs_net_init(struct afs_net *net)
 /*
  * Clean up and destroy an AFS network namespace record.
  */
-static void __net_exit afs_net_exit(struct afs_net *net)
+static void __net_exit afs_net_exit(struct net *net_ns)
 {
+	struct afs_net *net = afs_net(net_ns);
+
 	net->live = false;
 	afs_cell_purge(net);
 	afs_purge_servers(net);
@@ -152,6 +157,13 @@ static void __net_exit afs_net_exit(struct afs_net *net)
 	afs_put_sysnames(net->sysnames);
 }
 
+static struct pernet_operations afs_net_ops = {
+	.init	= afs_net_init,
+	.exit	= afs_net_exit,
+	.id	= &afs_net_id,
+	.size	= sizeof(struct afs_net),
+};
+
 /*
  * initialise the AFS client FS module
  */
@@ -178,7 +190,7 @@ static int __init afs_init(void)
 		goto error_cache;
 #endif
 
-	ret = afs_net_init(&__afs_net);
+	ret = register_pernet_subsys(&afs_net_ops);
 	if (ret < 0)
 		goto error_net;
 
@@ -187,10 +199,18 @@ static int __init afs_init(void)
 	if (ret < 0)
 		goto error_fs;
 
+	afs_proc_symlink = proc_symlink("fs/afs", NULL, "../self/net/afs");
+	if (IS_ERR(afs_proc_symlink)) {
+		ret = PTR_ERR(afs_proc_symlink);
+		goto error_proc;
+	}
+
 	return ret;
 
+error_proc:
+	afs_fs_exit();
 error_fs:
-	afs_net_exit(&__afs_net);
+	unregister_pernet_subsys(&afs_net_ops);
 error_net:
 #ifdef CONFIG_AFS_FSCACHE
 	fscache_unregister_netfs(&afs_cache_netfs);
@@ -219,8 +239,9 @@ static void __exit afs_exit(void)
 {
 	printk(KERN_INFO "kAFS: Red Hat AFS client v0.1 unregistering.\n");
 
+	proc_remove(afs_proc_symlink);
 	afs_fs_exit();
-	afs_net_exit(&__afs_net);
+	unregister_pernet_subsys(&afs_net_ops);
 #ifdef CONFIG_AFS_FSCACHE
 	fscache_unregister_netfs(&afs_cache_netfs);
 #endif
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 839a22280606..cc7c48a5b743 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -17,14 +17,16 @@
 #include <linux/uaccess.h>
 #include "internal.h"
 
-static inline struct afs_net *afs_proc2net(struct file *f)
+static inline struct afs_net *afs_proc2net_get(struct file *f)
 {
-	return &__afs_net;
+	struct net *net_ns = get_proc_net(file_inode(f));
+
+	return net_ns ? afs_net(net_ns) : NULL;
 }
 
 static inline struct afs_net *afs_seq2net(struct seq_file *m)
 {
-	return &__afs_net; // TODO: use seq_file_net(m)
+	return afs_net(seq_file_net(m));
 }
 
 static int afs_proc_cells_open(struct inode *inode, struct file *file);
@@ -161,7 +163,7 @@ int afs_proc_init(struct afs_net *net)
 {
 	_enter("");
 
-	net->proc_afs = proc_mkdir("fs/afs", NULL);
+	net->proc_afs = proc_net_mkdir(net->net, "afs", net->net->proc_net);
 	if (!net->proc_afs)
 		goto error_dir;
 
@@ -196,16 +198,8 @@ void afs_proc_cleanup(struct afs_net *net)
  */
 static int afs_proc_cells_open(struct inode *inode, struct file *file)
 {
-	struct seq_file *m;
-	int ret;
-
-	ret = seq_open(file, &afs_proc_cells_ops);
-	if (ret < 0)
-		return ret;
-
-	m = file->private_data;
-	m->private = PDE_DATA(inode);
-	return 0;
+	return seq_open_net(inode, file, &afs_proc_cells_ops,
+			    sizeof(struct seq_net_private));
 }
 
 /*
@@ -266,7 +260,8 @@ static int afs_proc_cells_show(struct seq_file *m, void *v)
 static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf,
 				    size_t size, loff_t *_pos)
 {
-	struct afs_net *net = afs_proc2net(file);
+	struct afs_net *net;
+	struct net *net_ns = NULL;
 	char *kbuf, *name, *args;
 	int ret;
 
@@ -305,6 +300,12 @@ static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf,
 	/* determine command to perform */
 	_debug("cmd=%s name=%s args=%s", kbuf, name, args);
 
+	ret = -ESTALE;
+	net_ns = get_proc_net(file_inode(file));
+	if (!net_ns)
+		goto done;
+	net = afs_net(net_ns);
+
 	if (strcmp(kbuf, "add") == 0) {
 		struct afs_cell *cell;
 
@@ -324,6 +325,7 @@ static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf,
 	ret = size;
 
 done:
+	put_net(net_ns);
 	kfree(kbuf);
 	_leave(" = %d", ret);
 	return ret;
@@ -338,15 +340,24 @@ static ssize_t afs_proc_rootcell_read(struct file *file, char __user *buf,
 				      size_t size, loff_t *_pos)
 {
 	struct afs_cell *cell;
-	struct afs_net *net = afs_proc2net(file);
+	struct afs_net *net;
+	struct net *net_ns = NULL;
 	unsigned int seq = 0;
 	char name[AFS_MAXCELLNAME + 1];
 	int len;
 
 	if (*_pos > 0)
 		return 0;
-	if (!net->ws_cell)
-		return 0;
+
+	net_ns = get_proc_net(file_inode(file));
+	if (!net_ns)
+		return -ESTALE;
+	net = afs_net(net_ns);
+
+	if (!net->ws_cell) {
+		len = 0;
+		goto out;
+	}
 
 	rcu_read_lock();
 	do {
@@ -362,14 +373,18 @@ static ssize_t afs_proc_rootcell_read(struct file *file, char __user *buf,
 	rcu_read_unlock();
 
 	if (!len)
-		return 0;
+		goto out;
 
 	name[len++] = '\n';
 	if (len > size)
 		len = size;
-	if (copy_to_user(buf, name, len) != 0)
-		return -EFAULT;
+	if (copy_to_user(buf, name, len) != 0) {
+		len = -EFAULT;
+		goto out;
+	}
 	*_pos = 1;
+out:
+	put_net(net_ns);
 	return len;
 }
 
@@ -381,7 +396,8 @@ static ssize_t afs_proc_rootcell_write(struct file *file,
 				       const char __user *buf,
 				       size_t size, loff_t *_pos)
 {
-	struct afs_net *net = afs_proc2net(file);
+	struct afs_net *net;
+	struct net *net_ns = NULL;
 	char *kbuf, *s;
 	int ret;
 
@@ -407,6 +423,12 @@ static ssize_t afs_proc_rootcell_write(struct file *file,
 	/* determine command to perform */
 	_debug("rootcell=%s", kbuf);
 
+	ret = -ESTALE;
+	net_ns = get_proc_net(file_inode(file));
+	if (!net_ns)
+		goto out;
+	net = afs_net(net_ns);
+
 	ret = afs_cell_init(net, kbuf);
 	if (ret >= 0)
 		ret = size;	/* consume everything, always */
@@ -420,13 +442,14 @@ static ssize_t afs_proc_rootcell_write(struct file *file,
 /*
  * initialise /proc/fs/afs/<cell>/
  */
-int afs_proc_cell_setup(struct afs_net *net, struct afs_cell *cell)
+int afs_proc_cell_setup(struct afs_cell *cell)
 {
 	struct proc_dir_entry *dir;
+	struct afs_net *net = cell->net;
 
 	_enter("%p{%s},%p", cell, cell->name, net->proc_afs);
 
-	dir = proc_mkdir(cell->name, net->proc_afs);
+	dir = proc_net_mkdir(net->net, cell->name, net->proc_afs);
 	if (!dir)
 		goto error_dir;
 
@@ -449,12 +472,12 @@ int afs_proc_cell_setup(struct afs_net *net, struct afs_cell *cell)
 /*
  * remove /proc/fs/afs/<cell>/
  */
-void afs_proc_cell_remove(struct afs_net *net, struct afs_cell *cell)
+void afs_proc_cell_remove(struct afs_cell *cell)
 {
-	_enter("");
+	struct afs_net *net = cell->net;
 
+	_enter("");
 	remove_proc_subtree(cell->name, net->proc_afs);
-
 	_leave("");
 }
 
@@ -471,7 +494,8 @@ static int afs_proc_cell_volumes_open(struct inode *inode, struct file *file)
 	if (!cell)
 		return -ENOENT;
 
-	ret = seq_open(file, &afs_proc_cell_volumes_ops);
+	ret = seq_open_net(inode, file, &afs_proc_cell_volumes_ops,
+			   sizeof(struct seq_net_private));
 	if (ret < 0)
 		return ret;
 
@@ -560,7 +584,8 @@ static int afs_proc_cell_vlservers_open(struct inode *inode, struct file *file)
 	if (!cell)
 		return -ENOENT;
 
-	ret = seq_open(file, &afs_proc_cell_vlservers_ops);
+	ret = seq_open_net(inode, file, &afs_proc_cell_vlservers_ops,
+			   sizeof(struct seq_net_private));
 	if (ret<0)
 		return ret;
 
@@ -649,7 +674,8 @@ static int afs_proc_cell_vlservers_show(struct seq_file *m, void *v)
  */
 static int afs_proc_servers_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &afs_proc_servers_ops);
+	return seq_open_net(inode, file, &afs_proc_servers_ops,
+			    sizeof(struct seq_net_private));
 }
 
 /*
@@ -729,7 +755,8 @@ static int afs_proc_sysname_open(struct inode *inode, struct file *file)
 	struct seq_file *m;
 	int ret;
 
-	ret = seq_open(file, &afs_proc_sysname_ops);
+	ret = seq_open_net(inode, file, &afs_proc_sysname_ops,
+			   sizeof(struct seq_net_private));
 	if (ret < 0)
 		return ret;
 
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 65081ec3c36e..593820372848 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -48,6 +48,8 @@ struct file_system_type afs_fs_type = {
 };
 MODULE_ALIAS_FS("afs");
 
+int afs_net_id;
+
 static const struct super_operations afs_super_ops = {
 	.statfs		= afs_statfs,
 	.alloc_inode	= afs_alloc_inode,
@@ -117,7 +119,7 @@ int __init afs_fs_init(void)
 /*
  * clean up the filesystem
  */
-void __exit afs_fs_exit(void)
+void afs_fs_exit(void)
 {
 	_enter("");
 
@@ -351,7 +353,7 @@ static int afs_test_super(struct super_block *sb, void *data)
 	struct afs_super_info *as1 = data;
 	struct afs_super_info *as = AFS_FS_S(sb);
 
-	return (as->net == as1->net &&
+	return (as->net_ns == as1->net_ns &&
 		as->volume &&
 		as->volume->vid == as1->volume->vid);
 }
@@ -437,7 +439,7 @@ static struct afs_super_info *afs_alloc_sbi(struct afs_mount_params *params)
 
 	as = kzalloc(sizeof(struct afs_super_info), GFP_KERNEL);
 	if (as) {
-		as->net = afs_get_net(params->net);
+		as->net_ns = get_net(params->net_ns);
 		if (params->dyn_root)
 			as->dyn_root = true;
 		else
@@ -450,8 +452,8 @@ static void afs_destroy_sbi(struct afs_super_info *as)
 {
 	if (as) {
 		afs_put_volume(as->cell, as->volume);
-		afs_put_cell(as->net, as->cell);
-		afs_put_net(as->net);
+		afs_put_cell(afs_net(as->net_ns), as->cell);
+		put_net(as->net_ns);
 		kfree(as);
 	}
 }
@@ -472,12 +474,13 @@ static struct dentry *afs_mount(struct file_system_type *fs_type,
 	_enter(",,%s,%p", dev_name, options);
 
 	memset(&params, 0, sizeof(params));
-	params.net = &__afs_net;
 
 	ret = -EINVAL;
 	if (current->nsproxy->net_ns != &init_net)
 		goto error;
-
+	params.net_ns = current->nsproxy->net_ns;
+	params.net = afs_net(params.net_ns);
+	
 	/* parse the options and device name */
 	if (options) {
 		ret = afs_parse_options(&params, options, &dev_name);
@@ -571,7 +574,8 @@ static void afs_kill_super(struct super_block *sb)
 	 * deactivating the superblock.
 	 */
 	if (as->volume)
-		afs_clear_callback_interests(as->net, as->volume->servers);
+		afs_clear_callback_interests(afs_net(as->net_ns),
+					     as->volume->servers);
 	kill_anon_super(sb);
 	if (as->volume)
 		afs_deactivate_volume(as->volume);

^ permalink raw reply related

* [PATCH 01/02] net: Export get_proc_net()
From: David Howells @ 2018-04-20 12:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, Andrew Morton, Alexander Viro, linux-kernel, linux-afs,
	netdev, Alexey Dobriyan
In-Reply-To: <22726.1524227374@warthog.procyon.org.uk>

Export get_proc_net() so that write() routines attached to net-namespaced
proc files can find the network namespace that they're in.  Currently, this
is only accessible via seqfile routines.
    
This will permit AFS to have a separate cell database per-network namespace
and to manage each one independently of the others.
    
Signed-off-by: David Howells <dhowells@redhat.com>
---
 fs/proc/proc_net.c      |    3 ++-
 include/linux/proc_fs.h |    2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 1763f370489d..23d56146e38f 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -33,10 +33,11 @@ static inline struct net *PDE_NET(struct proc_dir_entry *pde)
 	return pde->parent->data;
 }
 
-static struct net *get_proc_net(const struct inode *inode)
+struct net *get_proc_net(const struct inode *inode)
 {
 	return maybe_get_net(PDE_NET(PDE(inode)));
 }
+EXPORT_SYMBOL_GPL(get_proc_net);
 
 int seq_open_net(struct inode *ino, struct file *f,
 		 const struct seq_operations *ops, int size)
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 928ef9e4d912..47a87121d30c 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -79,6 +79,8 @@ static inline struct proc_dir_entry *proc_net_mkdir(
 	return proc_mkdir_data(name, 0, parent, net);
 }
 
+extern struct net *get_proc_net(const struct inode *inode);
+
 struct ns_common;
 int open_related_ns(struct ns_common *ns,
 		   struct ns_common *(*get_ns)(struct ns_common *ns));

^ permalink raw reply related

* Re: [PATCH 20/39] afs: simplify procfs code
From: David Howells @ 2018-04-20 12:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, Andrew Morton, Alexander Viro, linux-kernel, linux-afs,
	netdev, Alexey Dobriyan
In-Reply-To: <22411.1524142823@warthog.procyon.org.uk>

David Howells <dhowells@redhat.com> wrote:

> > Use remove_proc_subtree to remove the whole subtree on cleanup, and
> > unwind the registration loop into individual calls.  Switch to use
> > proc_create_seq where applicable.
> 
> Note that this is likely going to clash with my patch to net-namespace all of
> the afs proc files:
> 
> 	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=mount-context&id=f60c26c827c073583107ebf19d87bc5c0e71c3d2
> 
> If it helps, I should be able to disentangle this from the mount-api changes
> since the subsequent patch connects the dots to propagate the network
> namespace over automount using the new fs_context to do it.

Okay, I'll follow up this mail with a pair of patches to just use network
namespacing in AFS.  The first exports a function from core code only; the
second is the actual modifications to AFS.

Note that this doesn't actually yet permit AFS to use any other namespace.
The propagation-over-automount issue is handled by a separate patch.

David

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Mikulas Patocka @ 2018-04-20 12:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Miller, Andrew Morton, linux-mm, eric.dumazet, edumazet,
	netdev, linux-kernel, mst, jasowang, virtualization, dm-devel,
	Vlastimil Babka
In-Reply-To: <20180420114712.GB10788@bombadil.infradead.org>



On Fri, 20 Apr 2018, Matthew Wilcox wrote:

> On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote:
> > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> > uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> > code.
> 
> Maybe it's time to have the SG code handle vmalloced pages?  This is
> becoming more and more common with vmapped stacks (and some of our
> workarounds are hideous -- allocate 4 bytes with kmalloc because we can't
> DMA onto the stack any more?).  We already have a few places which do
> handle sgs of vmalloced addresses, such as the nx crypto driver:
> 
>         if (is_vmalloc_addr(start_addr))
>                 sg_addr = page_to_phys(vmalloc_to_page(start_addr))
>                           + offset_in_page(sg_addr);
>         else
>                 sg_addr = __pa(sg_addr);
> 
> and videobuf:
> 
>                 pg = vmalloc_to_page(virt);
>                 if (NULL == pg)
>                         goto err;
>                 BUG_ON(page_to_pfn(pg) >= (1 << (32 - PAGE_SHIFT)));
>                 sg_set_page(&sglist[i], pg, PAGE_SIZE, 0);
> 
> Yes, there's the potential that we have to produce two SG entries for a
> virtually contiguous region if it crosses a page boundary, and our APIs
> aren't set up right to make it happen.  But this is something we should
> consider fixing ... otherwise we'll end up with dozens of driver hacks.
> The videobuf implementation was already copy-and-pasted into the saa7146
> driver, for example.

What if the device requires physically contiguous area and the vmalloc 
area crosses a page? Will you use a bounce buffer? Where do you allocate 
the bounce buffer from? What if you run out of bounce buffers?

Mikulkas

^ permalink raw reply

* Re: [PATCH net-next] tun: do not compute the rxhash, if not needed
From: Jason Wang @ 2018-04-20 12:19 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: David S. Miller
In-Reply-To: <1c43f8bc63407239c91df916b149d4fdbf26bed3.1524222969.git.pabeni@redhat.com>



On 2018年04月20日 19:18, Paolo Abeni wrote:
> Currently, the tun driver, in absence of an eBPF steering program,
> always compute the rxhash in its rx path, even when such value
> is later unused due to additional checks (
>
> This changeset moves the all the related checks just before the
> __skb_get_hash_symmetric(), so that the latter is no more computed
> when unneeded.
>
> Also replace an unneeded RCU section with rcu_access_pointer().
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   drivers/net/tun.c | 15 ++++++---------
>   1 file changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 1e58be152d5c..091ace726763 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -525,11 +525,6 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
>   
>   	rcu_read_lock();
>   
> -	/* We may get a very small possibility of OOO during switching, not
> -	 * worth to optimize.*/
> -	if (tun->numqueues == 1 || tfile->detached)
> -		goto unlock;
> -
>   	e = tun_flow_find(head, rxhash);
>   	if (likely(e)) {
>   		/* TODO: keep queueing to old queue until it's empty? */
> @@ -548,7 +543,6 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
>   		spin_unlock_bh(&tun->lock);
>   	}
>   
> -unlock:
>   	rcu_read_unlock();
>   }
>   
> @@ -1937,10 +1931,13 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>   		rcu_read_unlock();
>   	}
>   
> -	rcu_read_lock();
> -	if (!rcu_dereference(tun->steering_prog))
> +	/* Compute the costly rx hash only if needed for flow updates.
> +	 * We may get a very small possibility of OOO during switching, not
> +	 * worth to optimize.
> +	 */
> +	if (!rcu_access_pointer(tun->steering_prog) && tun->numqueues > 1 &&
> +	    !tfile->detached)
>   		rxhash = __skb_get_hash_symmetric(skb);
> -	rcu_read_unlock();
>   
>   	if (frags) {
>   		/* Exercise flow dissector code path. */

Acked-by: Jason Wang <jasowang@redhat.com>

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Mikulas Patocka @ 2018-04-20 12:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, linux-mm, eric.dumazet, edumazet, bhutchings,
	netdev, linux-kernel, mst, jasowang, virtualization, dm-devel,
	Vlastimil Babka
In-Reply-To: <20180419162250.00bf82e2c40b4960a7e23cdc@linux-foundation.org>



On Thu, 19 Apr 2018, Andrew Morton wrote:

> On Thu, 19 Apr 2018 17:19:20 -0400 (EDT) Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > > > In order to detect these bugs reliably I submit this patch that changes
> > > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> > > > 
> > > > ...
> > > >
> > > > --- linux-2.6.orig/mm/util.c	2018-04-18 15:46:23.000000000 +0200
> > > > +++ linux-2.6/mm/util.c	2018-04-18 16:00:43.000000000 +0200
> > > > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
> > > >   */
> > > >  void *kvmalloc_node(size_t size, gfp_t flags, int node)
> > > >  {
> > > > +#ifndef CONFIG_DEBUG_VM
> > > >  	gfp_t kmalloc_flags = flags;
> > > >  	void *ret;
> > > >  
> > > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
> > > >  	 */
> > > >  	if (ret || size <= PAGE_SIZE)
> > > >  		return ret;
> > > > +#endif
> > > >  
> > > >  	return __vmalloc_node_flags_caller(size, node, flags,
> > > >  			__builtin_return_address(0));
> > > 
> > > Well, it doesn't have to be done at compile-time, does it?  We could
> > > add a knob (in debugfs, presumably) which enables this at runtime. 
> > > That's far more user-friendly.
> > 
> > But who will turn it on in debugfs?
> 
> But who will turn it on in Kconfig?  Just a handful of developers.  We

So, it won't receive much testing.

I've never played with those debugfs files (because I didn't need it), and 
most users also won't play with it. Having a debugfs option is like having 
no option at all.

> could add SONFIG_DEBUG_SG to the list in
> Documentation/process/submit-checklist.rst, but nobody reads that.
> 
> Also, a whole bunch of defconfigs set CONFIG_DEBUG_SG=y and some
> googling indicates that they aren't the only ones...
> 
> > It should be default for debugging 
> > kernels, so that users using them would report the error.
> 
> Well.  This isn't the first time we've wanted to enable expensive (or
> noisy) debugging things in debug kernels, by any means.
> 
> So how could we define a debug kernel in which it's OK to enable such
> things?

Debug kernel is what distributions distribute as debug kernel - i.e. RHEL 
or Fedora have debugging kernels. So it needs to be bound to an option 
that is turned on in these kernels - so that any user who boots the 
debugging kernel triggers the bug.

> - Could be "it's an -rc kernel".  But then we'd be enabling a bunch of
>   untested code when Linus cuts a release.
> 
> - Could be "it's an -rc kernel with SUBLEVEL <= 5".  But then we risk
>   unexpected things happening when Linux cuts -rc6, which still isn't
>   good.
> 
> - How about "it's an -rc kernel with odd-numbered SUBLEVEL and
>   SUBLEVEL <= 5".  That way everybody who runs -rc1, -rc3 and -rc5 will
>   have kvmalloc debugging enabled.  That's potentially nasty because
>   vmalloc is much slower than kmalloc.  But kvmalloc() is only used for
>   large and probably infrequent allocations, so it's probably OK.
> 
> I wonder how we get at SUBLEVEL from within .c.  

Don't bind it to rc level, bind it to some debugging configuration option.

Mikulas

^ permalink raw reply

* Re: Q: force netif ON even when there is no real link ?
From: Ran Shalit @ 2018-04-20 12:14 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20180420120516.GC20256@lunn.ch>

On Fri, Apr 20, 2018 at 3:05 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> On Fri, Apr 20, 2018 at 03:01:09PM +0300, Ran Shalit wrote:
>> On Fri, Apr 20, 2018 at 2:55 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> > On Fri, Apr 20, 2018 at 11:44:14AM +0300, Ran Shalit wrote:
>> >> Hello,
>> >>
>> >> We configure external switch in u-boot.
>> >> The configuration is through mdio (cpu is mac and switch is phy).
>> >>
>> >> But in Linux we rather not implement any communication in mdio to
>> >> switch, but it means that we then don't have the information of link
>> >> state.
>> >>
>> >> Is it possible to force in Linux (by default in startup) Ethernet
>> >> connectivity (netif_carrier_on, netif_wake_queue) even if there is no
>> >> information of real link state ?
>> >
>> > Hi Ran
>> >
>> > Use a fixed-phy.
>> >
>>
>> Hi Andrew,
>>
>> I'll check about fixed phy,
>> but in general, is it a problem to have always netif_carrier_on, even
>> when there is no link ?
>
> The link between the CPU and the switch should be up all the
> time. That is the point of fixed-link.
>

I understand.
But what about the mac driver,  does it just do netif_start_queue ?

Thanks


>     Andrew

^ permalink raw reply

* Re: [PATCH bpf-next 1/5] samples/bpf: Fix typo in comment
From: Jesper Dangaard Brouer @ 2018-04-20 12:10 UTC (permalink / raw)
  To: Leo Yan; +Cc: brouer, Alexei Starovoitov, Daniel Borkmann, netdev, linux-kernel
In-Reply-To: <1524101646-6544-2-git-send-email-leo.yan@linaro.org>


On Thu, 19 Apr 2018 09:34:02 +0800 Leo Yan <leo.yan@linaro.org> wrote:

> Fix typo by replacing 'iif' with 'if'.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  samples/bpf/bpf_load.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> index bebe418..28e4678 100644
> --- a/samples/bpf/bpf_load.c
> +++ b/samples/bpf/bpf_load.c
> @@ -393,7 +393,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
>  			continue;
>  		if (sym[nr_maps].st_shndx != maps_shndx)
>  			continue;
> -		/* Only increment iif maps section */
> +		/* Only increment if maps section */
>  		nr_maps++;
>  	}

This was actually not a typo from my side.

With 'iif' I mean 'if and only if' ... but it doesn't matter much.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: Q: force netif ON even when there is no real link ?
From: Andrew Lunn @ 2018-04-20 12:05 UTC (permalink / raw)
  To: Ran Shalit; +Cc: netdev
In-Reply-To: <CAJ2oMhJKOyFceuNwxVb4zPtF9Dm34s=JGfnp0AnP=c2zu7VWAA@mail.gmail.com>

On Fri, Apr 20, 2018 at 03:01:09PM +0300, Ran Shalit wrote:
> On Fri, Apr 20, 2018 at 2:55 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> > On Fri, Apr 20, 2018 at 11:44:14AM +0300, Ran Shalit wrote:
> >> Hello,
> >>
> >> We configure external switch in u-boot.
> >> The configuration is through mdio (cpu is mac and switch is phy).
> >>
> >> But in Linux we rather not implement any communication in mdio to
> >> switch, but it means that we then don't have the information of link
> >> state.
> >>
> >> Is it possible to force in Linux (by default in startup) Ethernet
> >> connectivity (netif_carrier_on, netif_wake_queue) even if there is no
> >> information of real link state ?
> >
> > Hi Ran
> >
> > Use a fixed-phy.
> >
> 
> Hi Andrew,
> 
> I'll check about fixed phy,
> but in general, is it a problem to have always netif_carrier_on, even
> when there is no link ?

The link between the CPU and the switch should be up all the
time. That is the point of fixed-link.

    Andrew

^ permalink raw reply

* Re: Q: force netif ON even when there is no real link ?
From: Ran Shalit @ 2018-04-20 12:01 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20180420115556.GB20256@lunn.ch>

On Fri, Apr 20, 2018 at 2:55 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> On Fri, Apr 20, 2018 at 11:44:14AM +0300, Ran Shalit wrote:
>> Hello,
>>
>> We configure external switch in u-boot.
>> The configuration is through mdio (cpu is mac and switch is phy).
>>
>> But in Linux we rather not implement any communication in mdio to
>> switch, but it means that we then don't have the information of link
>> state.
>>
>> Is it possible to force in Linux (by default in startup) Ethernet
>> connectivity (netif_carrier_on, netif_wake_queue) even if there is no
>> information of real link state ?
>
> Hi Ran
>
> Use a fixed-phy.
>

Hi Andrew,

I'll check about fixed phy,
but in general, is it a problem to have always netif_carrier_on, even
when there is no link ?

Thank you,
ranran

^ permalink raw reply

* Re: Q: force netif ON even when there is no real link ?
From: Andrew Lunn @ 2018-04-20 11:55 UTC (permalink / raw)
  To: Ran Shalit; +Cc: netdev
In-Reply-To: <CAJ2oMhKJyVFiw_k5eMzQXUbQ8DnUgbgdbCsWsjV+mwHHxPtAag@mail.gmail.com>

On Fri, Apr 20, 2018 at 11:44:14AM +0300, Ran Shalit wrote:
> Hello,
> 
> We configure external switch in u-boot.
> The configuration is through mdio (cpu is mac and switch is phy).
> 
> But in Linux we rather not implement any communication in mdio to
> switch, but it means that we then don't have the information of link
> state.
> 
> Is it possible to force in Linux (by default in startup) Ethernet
> connectivity (netif_carrier_on, netif_wake_queue) even if there is no
> information of real link state ?

Hi Ran

Use a fixed-phy.

    Andrew

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Matthew Wilcox @ 2018-04-20 11:47 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, virtualization,
	linux-mm, edumazet, bhutchings, Andrew Morton, David Miller,
	Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804191207380.31175@file01.intranet.prod.int.rdu2.redhat.com>

On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote:
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.

Maybe it's time to have the SG code handle vmalloced pages?  This is
becoming more and more common with vmapped stacks (and some of our
workarounds are hideous -- allocate 4 bytes with kmalloc because we can't
DMA onto the stack any more?).  We already have a few places which do
handle sgs of vmalloced addresses, such as the nx crypto driver:

        if (is_vmalloc_addr(start_addr))
                sg_addr = page_to_phys(vmalloc_to_page(start_addr))
                          + offset_in_page(sg_addr);
        else
                sg_addr = __pa(sg_addr);

and videobuf:

                pg = vmalloc_to_page(virt);
                if (NULL == pg)
                        goto err;
                BUG_ON(page_to_pfn(pg) >= (1 << (32 - PAGE_SHIFT)));
                sg_set_page(&sglist[i], pg, PAGE_SIZE, 0);

Yes, there's the potential that we have to produce two SG entries for a
virtually contiguous region if it crosses a page boundary, and our APIs
aren't set up right to make it happen.  But this is something we should
consider fixing ... otherwise we'll end up with dozens of driver hacks.
The videobuf implementation was already copy-and-pasted into the saa7146
driver, for example.

^ permalink raw reply

* Re: [PATCH 3/3] xen blkback: add fault injection facility
From: Juergen Gross @ 2018-04-20 11:28 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: jakub.kicinski, hpa, mcroce, tglx, ggarcia, daniel, x86, mingo,
	xen-devel, axboe, konrad.wilk, amir.jer.levy, paul.durrant,
	stefanha, dsa, boris.ostrovsky, linux-block, wei.liu2, netdev,
	linux-kernel, davem, dwmw, roger.pau
In-Reply-To: <20180420104736.17823.42983.stgit@dev-dsk-staskins-1a-ca5afbf2.eu-west-1.amazon.com>

On 20/04/18 12:47, Stanislav Kinsburskii wrote:
> This patch adds wrapper helpers around generic Xen fault inject
> facility.
> The major reason is to keep all the module fault injection directories
> in a dedicated subdirectory instead of Xen fault inject root.
> 
> IOW, when using these helpers, per-device and named by device name
> fault injection control directories will appear under the following
> directory:
> - /sys/kernel/debug/xen/fault_inject/xen-blkback/
> instead of:
> - /sys/kernel/debug/xen/fault_inject/
> 
> Signed-off-by: Stanislav Kinsburskii <staskins@amazon.com>
> CC: Jens Axboe <axboe@kernel.dk>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> CC: "Roger Pau Monné" <roger.pau@citrix.com>
> CC: linux-kernel@vger.kernel.org
> CC: linux-block@vger.kernel.org
> CC: xen-devel@lists.xenproject.org
> CC: Stanislav Kinsburskii <staskins@amazon.com>
> CC: David Woodhouse <dwmw@amazon.co.uk>

This is an exact copy of the netback patch apart from the names.

I don't like adding multiple copies of the same coding to the tree.


Juergen

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox