Netdev List


* Re: [PATCH] coredump: rename umh_pipe_setup() to coredump_pipe_setup()
From: Luis R. Rodriguez @ 2018-05-10 23:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Al Viro
  Cc: Luis R. Rodriguez, ast, linux-fsdevel, linux-kernel,
	David S. Miller, netdev
In-Reply-To: <20180510231907.xbok4h6rjopwdq6e@ast-mbp>

On Thu, May 10, 2018 at 04:19:09PM -0700, Alexei Starovoitov wrote:
> On Mon, May 07, 2018 at 04:30:02PM -0700, Luis R. Rodriguez wrote:
> > This makes it clearer this code is part of the coredump code, and
> > is not an exported generic helper from kernel/umh.c.
> > 
> > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > ---
> >  fs/coredump.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index 1e2c87acac9b..566504781683 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -508,7 +508,7 @@ static void wait_for_dump_helpers(struct file *file)
> >  }
> >  
> >  /*
> > - * umh_pipe_setup
> > + * coredump_pipe_setup
> >   * helper function to customize the process used
> >   * to collect the core in userspace.  Specifically
> >   * it sets up a pipe and installs it as fd 0 (stdin)
> > @@ -518,7 +518,7 @@ static void wait_for_dump_helpers(struct file *file)
> >   * is a special value that we use to trap recursive
> >   * core dumps
> >   */
> > -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> > +static int coredump_pipe_setup(struct subprocess_info *info, struct cred *new)
> 
> I think this renaming makes sense.
> How do we want to proceed?
> I can take it as part of my series and get the whole thing through net-next,
> or do folks want to apply this separately?

I think net-next makes sense if Al Viro is OK with that. This way it could go
in regardless of the state of your series, but it also lines up with your work.

  Luis

^ permalink raw reply

* [PATCH v6 5/6] net: pch_gbe: Allow build on MIPS platforms
From: Paul Burton @ 2018-05-10 23:16 UTC (permalink / raw)
  To: netdev; +Cc: linux-mips, David S . Miller, Andrew Lunn, Paul Burton
In-Reply-To: <20180510231657.28503-1-paul.burton@mips.com>

Allow the pch_gbe driver to be built on MIPS platforms, allowing its use
on the MIPS Boston development board.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-mips@linux-mips.org
Cc: netdev@vger.kernel.org

---

Changes in v6:
- None.

Changes in v5:
- None.

Changes in v4:
- None.

Changes in v3:
- None.

Changes in v2:
- None.

 drivers/net/ethernet/oki-semi/pch_gbe/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
index 045256e99586..bf85c44fb7e5 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
@@ -4,7 +4,7 @@

 config PCH_GBE
 	tristate "OKI SEMICONDUCTOR IOH(ML7223/ML7831) GbE"
-	depends on PCI && (X86_32 || COMPILE_TEST)
+	depends on PCI && (X86_32 || MIPS || COMPILE_TEST)
 	select PTP_1588_CLOCK_PCH
 	select NET_PTP_CLASSIFY
 	select AT803X_PHY
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH bpf-next] selftests/bpf: Fix bash reference in Makefile
From: Daniel Borkmann @ 2018-05-10 23:38 UTC (permalink / raw)
  To: Joe Stringer; +Cc: netdev
In-Reply-To: <20180510222651.4817-1-joe@wand.net.nz>

On 05/11/2018 12:26 AM, Joe Stringer wrote:
> '|& ...' is a bash 4.0+ construct which is not guaranteed to be available
> when using '$(shell ...)' in a Makefile. Fall back to the more portable
> '2>&1 | ...'.
> 
> Fixes the following warning during compilation:
> 
> 	/bin/sh: 1: Syntax error: "&" unexpected
> 
> Signed-off-by: Joe Stringer <joe@wand.net.nz>

Applied to bpf-next, thanks Joe!

^ permalink raw reply

* [PATCH v6 6/6] MIPS: Boston: Adjust DT for pch_gbe PHY support
From: Paul Burton @ 2018-05-10 23:16 UTC (permalink / raw)
  To: netdev; +Cc: linux-mips, David S . Miller, Andrew Lunn, Paul Burton
In-Reply-To: <20180510231657.28503-1-paul.burton@mips.com>

The pch_gbe driver support for PHY reset GPIOs is now provided by the
standard phylib infrastructure, using a standard PHY binding. Adjust the
Boston devicetree to make use of the standard PHY binding.

This is possible because we bundle the DT along with the kernel binary
into a Flattened Image Tree, so the DT and kernel are always shipped
together for the Boston platform.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-mips@linux-mips.org
Cc: netdev@vger.kernel.org

---

Changes in v6:
- New patch.

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 arch/mips/boot/dts/img/boston.dts | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/mips/boot/dts/img/boston.dts b/arch/mips/boot/dts/img/boston.dts
index 65af3f6ba81c..cb55f7ba20c3 100644
--- a/arch/mips/boot/dts/img/boston.dts
+++ b/arch/mips/boot/dts/img/boston.dts
@@ -144,8 +144,17 @@
 				eg20t_mac@2,0,1 {
 					compatible = "pci8086,8802";
 					reg = <0x00020100 0 0 0 0>;
-					phy-reset-gpios = <&eg20t_gpio 6
-							   GPIO_ACTIVE_LOW>;
+
+					#address-cells = <1>;
+					#size-cells = <0>;
+
+					ethernet-phy@0 {
+						compatible = "ethernet-phy-id001c.c915";
+						reg = <0>;
+						reset-gpios = <&eg20t_gpio 6 GPIO_ACTIVE_LOW>;
+						reset-assert-us = <25000>;
+						reset-deassert-us = <25000>;
+					};
 				};

 				eg20t_gpio: eg20t_gpio@2,0,2 {
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Finn Thain @ 2018-05-10 23:55 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Geert Uytterhoeven, David S. Miller, linux-m68k, netdev,
	Linux Kernel Mailing List, Christoph Hellwig
In-Reply-To: <CAOmrzkKNPemq5RySvza+Y8_jgwg2fkUZodR794cgOxpQpfh+SA@mail.gmail.com>

On Fri, 11 May 2018, Michael Schmitz wrote:

> > > Perhaps you can add a new helper 
> > > (platform_device_register_simple_dma()?) that takes the DMA mask, 
> > > too?
...
> >
> > So far, it looks like macmace and macsonic would be the only callers 
> > of this new API call.
> >
> > What's worse, if you do pass a dma_mask in struct 
> > platform_device_info, you end up with this problem in 
> > platform_device_register_full():
> >
> >         if (pdevinfo->dma_mask) {
> >                 /*
> >                  * This memory isn't freed when the device is put,
> >                  * I don't have a nice idea for that though.  Conceptually
> >                  * dma_mask in struct device should not be a pointer.
> >                  * See http://thread.gmane.org/gmane.linux.kernel.pci/9081
> >                  */
> >                 pdev->dev.dma_mask =
> >                         kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);
> 
> Maybe platform_device_register_full() should rather check whether 
> dev.coherent_dma_mask is set, and make dev.dma_mask point to that? This 
> is how we solved the warning issue for the Zorro bus devices... 
> (8614f1b58bd0e920a5859464a500b93152c5f8b1)
> 

The claim in the comment above that a pointer is the wrong solution 
suggests that your proposal won't get far. Also, your proposal doesn't 
address the other issues I raised: a new 
platform_device_register_simple_dma() API would only have two callers, and 
the dma mask setup for device-tree probed platform devices is apparently a 
work-in-progress (which I don't want to churn up).

> > > With people setting the mask to kill the WARNING splat, this may 
> > > become more common.
> >
> > Since the commit which introduced the WARNING, only commits f61e64310b75
> > ("m68k: set dma and coherent masks for platform FEC ethernets") and
> > 7bcfab202ca7 ("powerpc/macio: set a proper dma_coherent_mask") seem to be
> > aimed at squelching that WARNING.
> >
> > (Am I missing any others?)
> 
> Zorro devices :-)

Right, I should add commit 55496d3fe2ac ("zorro: Set up z->dev.dma_mask 
for the DMA API") to that list.

> Which begs the question: why can't you set up all Nubus bus devices' DMA 
> masks in nubus_device_register(), or nubus_add_board()?

I am expecting to see the same WARNING from the nubus sonic driver but it 
hasn't happened yet, so I don't have a patch for it yet. In any case, the 
nubus fix would be a lot like the zorro bus fix, so I don't see a problem.

-- 

^ permalink raw reply

* [RFC PATCH] ipv6: sr: lwt_seg6local_verifier_ops can be static
From: kbuild test robot @ 2018-05-11  0:05 UTC (permalink / raw)
  To: Mathieu Xhonneux; +Cc: kbuild-all, netdev, dlebrun, alexei.starovoitov
In-Reply-To: <d1d96894aaa863de51ecd65efa724342c4e13063.1525898587.git.m.xhonneux@gmail.com>


Fixes: e7d82c64d15a ("ipv6: sr: Add seg6local action End.BPF")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 filter.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index ce10f20..9e47c86 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6195,13 +6195,13 @@ const struct bpf_prog_ops lwt_xmit_prog_ops = {
 	.test_run		= bpf_prog_test_run_skb,
 };
 
-const struct bpf_verifier_ops lwt_seg6local_verifier_ops = {
+static const struct bpf_verifier_ops lwt_seg6local_verifier_ops = {
 	.get_func_proto		= lwt_seg6local_func_proto,
 	.is_valid_access	= lwt_is_valid_access,
 	.convert_ctx_access	= bpf_convert_ctx_access,
 };
 
-const struct bpf_prog_ops lwt_seg6local_prog_ops = {
+static const struct bpf_prog_ops lwt_seg6local_prog_ops = {
 	.test_run		= bpf_prog_test_run_skb,
 };
 

^ permalink raw reply related

* Re: [PATCH bpf-next v4 5/6] ipv6: sr: Add seg6local action End.BPF
From: kbuild test robot @ 2018-05-11  0:05 UTC (permalink / raw)
  To: Mathieu Xhonneux; +Cc: kbuild-all, netdev, dlebrun, alexei.starovoitov
In-Reply-To: <d1d96894aaa863de51ecd65efa724342c4e13063.1525898587.git.m.xhonneux@gmail.com>

Hi Mathieu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Mathieu-Xhonneux/ipv6-sr-introduce-seg6local-End-BPF-action/20180511-032546
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   net/core/filter.c:112:48: sparse: expression using sizeof(void)
   net/core/filter.c:112:48: sparse: expression using sizeof(void)
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:206:32: sparse: cast to restricted __be16
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:233:32: sparse: cast to restricted __be32
   net/core/filter.c:406:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:409:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:412:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:415:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:418:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:481:27: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:484:27: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:487:27: sparse: subtraction of functions? Share your drugs
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   net/core/filter.c:1368:39: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct sock_filter const *filter @@    got struct sockstruct sock_filter const *filter @@
   net/core/filter.c:1368:39:    expected struct sock_filter const *filter
   net/core/filter.c:1368:39:    got struct sock_filter [noderef] <asn:1>*filter
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   net/core/filter.c:1470:39: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct sock_filter const *filter @@    got struct sockstruct sock_filter const *filter @@
   net/core/filter.c:1470:39:    expected struct sock_filter const *filter
   net/core/filter.c:1470:39:    got struct sock_filter [noderef] <asn:1>*filter
   include/linux/filter.h:615:16: sparse: expression using sizeof(void)
   net/core/filter.c:1772:43: sparse: incorrect type in argument 2 (different base types) @@    expected restricted __wsum [usertype] diff @@    got unsigned lonrestricted __wsum [usertype] diff @@
   net/core/filter.c:1772:43:    expected restricted __wsum [usertype] diff
   net/core/filter.c:1772:43:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1775:36: sparse: incorrect type in argument 2 (different base types) @@    expected restricted __be16 [usertype] old @@    got unsigned lonrestricted __be16 [usertype] old @@
   net/core/filter.c:1775:36:    expected restricted __be16 [usertype] old
   net/core/filter.c:1775:36:    got unsigned long long [unsigned] [usertype] from
   net/core/filter.c:1775:42: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __be16 [usertype] new @@    got unsigned lonrestricted __be16 [usertype] new @@
   net/core/filter.c:1775:42:    expected restricted __be16 [usertype] new
   net/core/filter.c:1775:42:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1778:36: sparse: incorrect type in argument 2 (different base types) @@    expected restricted __be32 [usertype] from @@    got unsigned lonrestricted __be32 [usertype] from @@
   net/core/filter.c:1778:36:    expected restricted __be32 [usertype] from
   net/core/filter.c:1778:36:    got unsigned long long [unsigned] [usertype] from
   net/core/filter.c:1778:42: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __be32 [usertype] to @@    got unsigned lonrestricted __be32 [usertype] to @@
   net/core/filter.c:1778:42:    expected restricted __be32 [usertype] to
   net/core/filter.c:1778:42:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1823:59: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __wsum [usertype] diff @@    got unsigned lonrestricted __wsum [usertype] diff @@
   net/core/filter.c:1823:59:    expected restricted __wsum [usertype] diff
   net/core/filter.c:1823:59:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1826:52: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __be16 [usertype] from @@    got unsigned lonrestricted __be16 [usertype] from @@
   net/core/filter.c:1826:52:    expected restricted __be16 [usertype] from
   net/core/filter.c:1826:52:    got unsigned long long [unsigned] [usertype] from
   net/core/filter.c:1826:58: sparse: incorrect type in argument 4 (different base types) @@    expected restricted __be16 [usertype] to @@    got unsigned lonrestricted __be16 [usertype] to @@
   net/core/filter.c:1826:58:    expected restricted __be16 [usertype] to
   net/core/filter.c:1826:58:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1829:52: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __be32 [usertype] from @@    got unsigned lonrestricted __be32 [usertype] from @@
   net/core/filter.c:1829:52:    expected restricted __be32 [usertype] from
   net/core/filter.c:1829:52:    got unsigned long long [unsigned] [usertype] from
   net/core/filter.c:1829:58: sparse: incorrect type in argument 4 (different base types) @@    expected restricted __be32 [usertype] to @@    got unsigned lonrestricted __be32 [usertype] to @@
   net/core/filter.c:1829:58:    expected restricted __be32 [usertype] to
   net/core/filter.c:1829:58:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1875:28: sparse: incorrect type in return expression (different base types) @@    expected unsigned long long @@    got nsigned long long @@
   net/core/filter.c:1875:28:    expected unsigned long long
   net/core/filter.c:1875:28:    got restricted __wsum
   net/core/filter.c:1897:35: sparse: incorrect type in return expression (different base types) @@    expected unsigned long long @@    got restricted unsigned long long @@
   net/core/filter.c:1897:35:    expected unsigned long long
   net/core/filter.c:1897:35:    got restricted __wsum [usertype] csum
   net/core/filter.c:3708:41: sparse: expression using sizeof(void)
   net/core/filter.c:3712:41: sparse: expression using sizeof(void)
   net/core/filter.c:3716:46: sparse: expression using sizeof(void)
   net/core/filter.c:3716:46: sparse: expression using sizeof(void)
   net/core/filter.c:3784:47: sparse: expression using sizeof(void)
   net/core/filter.c:3990:17: sparse: incorrect type in assignment (different base types) @@    expected unsigned int [unsigned] [usertype] spi @@    got unsigned int [unsigned] [usertype] spi @@
   net/core/filter.c:3990:17:    expected unsigned int [unsigned] [usertype] spi
   net/core/filter.c:3990:17:    got restricted __be32 const [usertype] spi
   net/core/filter.c:3996:33: sparse: incorrect type in assignment (different base types) @@    expected unsigned int [unsigned] [usertype] remote_ipv4 @@    got unsigned int [unsigned] [usertype] remote_ipv4 @@
   net/core/filter.c:3996:33:    expected unsigned int [unsigned] [usertype] remote_ipv4
   net/core/filter.c:3996:33:    got restricted __be32 const [usertype] a4
   net/core/filter.c:4835:27: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:4838:27: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:4841:27: sparse: subtraction of functions? Share your drugs
>> net/core/filter.c:6198:31: sparse: symbol 'lwt_seg6local_verifier_ops' was not declared. Should it be static?
>> net/core/filter.c:6204:27: sparse: symbol 'lwt_seg6local_prog_ops' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH bpf-next 0/7] bpf: add perf event reading loop and move samples closer to libbpf
From: Daniel Borkmann @ 2018-05-11  0:06 UTC (permalink / raw)
  To: Jakub Kicinski, alexei.starovoitov; +Cc: oss-drivers, netdev
In-Reply-To: <20180510172443.17238-1-jakub.kicinski@netronome.com>

On 05/10/2018 07:24 PM, Jakub Kicinski wrote:
> Hi!
> 
> This series started out as a follow up to the bpftool perf event dumping
> patches.
> 
> As suggested by Daniel patch 1 makes use of PERF_SAMPLE_TIME to simplify
> code and improve accuracy of timestamps.
> 
> Remaining patches are trying to move perf event loop into libbpf as
> suggested by Alexei.  One user for this new function is bpftool which
> links with libbpf nicely, the other, unfortunately, is in samples/bpf.
> Remaining patches make samples/bpf link against full libbpf.a (not just
> a handful of objects).  Once we have full power of libbpf at our disposal
> we can convert some of XDP samples to use libbpf loader instead of
> bpf_load.c.  My understanding is that this is the desired direction,
> at least for networking code.

Looks good, applied to bpf-next, thanks Jakub!

^ permalink raw reply

* Re: [PATCH] mlx4_core: allocate 4KB ICM chunks
From: Yanjun Zhu @ 2018-05-11  0:13 UTC (permalink / raw)
  To: Qing Huang, tariqt, davem; +Cc: netdev, linux-rdma, linux-kernel
In-Reply-To: <20180510233143.7236-1-qing.huang@oracle.com>



On 2018/5/11 7:31, Qing Huang wrote:
> When a system is under memory pressure (high usage with fragments),
> the original 256KB ICM chunk allocations will likely trigger kernel
> memory management to enter slow path doing memory compact/migration
> ops in order to complete high order memory allocations.
>
> When that happens, user processes calling uverb APIs may get stuck
> for more than 120s easily even though there are a lot of free pages
> in smaller chunks available in the system.
>
> Syslog:
> ...
> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
> oracle_205573_e:205573 blocked for more than 120 seconds.
> ...
>
> With 4KB ICM chunk size, the above issue is fixed.
>
> However in order to support 4KB ICM chunk size, we need to fix another
> issue in large size kcalloc allocations.
>
> E.g.
> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
> entry). So we need a 16MB allocation for a table->icm pointer array to
> hold 2M pointers which can easily cause kcalloc to fail.
>
> The solution is to use vzalloc to replace kcalloc. There is no need
> for contiguous memory pages for a driver meta data structure (no need
Hi,

If contiguous memory pages are replaced with virtual memory, is there any 
performance loss?

Zhu Yanjun
> of DMA ops).
>
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> Acked-by: Daniel Jurgens <danielj@mellanox.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index a822f7a..2b17a4b 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,12 @@
>   #include "fw.h"
>   
>   /*
> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
> - * per chunk.
> + * We allocate in 4KB page size chunks to avoid high order memory
> + * allocations in fragmented/high usage memory situation.
>    */
>   enum {
> -	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> -	MLX4_TABLE_CHUNK_SIZE	= 1 << 18
> +	MLX4_ICM_ALLOC_SIZE	= 1 << 12,
> +	MLX4_TABLE_CHUNK_SIZE	= 1 << 12
>   };
>   
>   static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>   	obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
>   	num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>   
> -	table->icm      = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
> +	table->icm      = vzalloc(num_icm * sizeof(*table->icm));
>   	if (!table->icm)
>   		return -ENOMEM;
>   	table->virt     = virt;
> @@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>   			mlx4_free_icm(dev, table->icm[i], use_coherent);
>   		}
>   
> -	kfree(table->icm);
> +	vfree(table->icm);
>   
>   	return -ENOMEM;
>   }
> @@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table)
>   			mlx4_free_icm(dev, table->icm[i], table->coherent);
>   		}
>   
> -	kfree(table->icm);
> +	vfree(table->icm);
>   }

^ permalink raw reply

* Re:Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Gao Feng @ 2018-05-11  0:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem@davemloft.net, daniel, jakub.kicinski, David Ahern,
	netdev@vger.kernel.org
In-Reply-To: <5ac360c4-0936-9c12-56cb-f81f08c925e6@gmail.com>

At 2018-05-10 21:02:55, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
>
>
>On 05/10/2018 01:28 AM, gfree.wind@vip.163.com wrote:
>> From: Gao Feng <gfree.wind@vip.163.com>
>> 
>> The skb flow limit is implemented for each CPU independently. In the
>> current code, the function skb_flow_limit gets the softnet_data via
>> this_cpu_ptr. But the target CPU of enqueue_to_backlog may not be
>> the current CPU when RPS is enabled. As a result, skb_flow_limit checks
>> the stats of the current CPU, while the skb is appended to the queue of
>> another CPU. This isn't the expected behavior.
>> 
>> Now pass the softnet_data as a param to skb_flow_limit to make them
>> consistent.
>>
>
>Please add a correct Fixes: tag

Thanks Eric.

I have one question about the "Fixes:" tag.
Most patches are bug fixes, but when do we need to add the "Fixes:" tag, and when not?

I'm not clear about it. Could you explain it please?

Best Regards
Feng

>
>By doing so, you will likely add a CC: tag to make sure the author of the code
>will receive your email and give feedback.
>
>Thanks !
>

^ permalink raw reply

* Re: [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()
From: Andrew Lunn @ 2018-05-11  0:26 UTC (permalink / raw)
  To: Paul Burton; +Cc: netdev, linux-mips, David S . Miller
In-Reply-To: <20180510231657.28503-2-paul.burton@mips.com>

On Thu, May 10, 2018 at 04:16:52PM -0700, Paul Burton wrote:
> From: Andrew Lunn <andrew@lunn.ch>
> 
> On some boards, this PHY has a problem when it hibernates. Export this
> function so a board can register a PHY fixup to disable hibernation.

What do you know about the problem?

https://patchwork.ozlabs.org/patch/686371/

I don't remember how it was solved, but you should probably do the
same.

	Andrew

^ permalink raw reply

* Re: [PATCH v6 6/6] MIPS: Boston: Adjust DT for pch_gbe PHY support
From: Andrew Lunn @ 2018-05-11  0:28 UTC (permalink / raw)
  To: Paul Burton; +Cc: netdev, linux-mips, David S . Miller
In-Reply-To: <20180510231657.28503-7-paul.burton@mips.com>

> +					ethernet-phy@0 {
> +						compatible = "ethernet-phy-id001c.c915";

You only need to specify the compatible string like this if the PHY
has its own ID wrong. The AT802x gets this right, so you don't need
this.

	Andrew

^ permalink raw reply

* Re: [PATCH v6 6/6] MIPS: Boston: Adjust DT for pch_gbe PHY support
From: Andrew Lunn @ 2018-05-11  0:35 UTC (permalink / raw)
  To: Paul Burton; +Cc: netdev, linux-mips, David S . Miller
In-Reply-To: <20180510231657.28503-7-paul.burton@mips.com>

>  				eg20t_mac@2,0,1 {
>  					compatible = "pci8086,8802";
>  					reg = <0x00020100 0 0 0 0>;
> -					phy-reset-gpios = <&eg20t_gpio 6
> -							   GPIO_ACTIVE_LOW>;
> +
> +					#address-cells = <1>;
> +					#size-cells = <0>;

It is generally a good idea to put an 'mdio' container which the PHYs
are on. You then pass this container node to of_mdiobus_register().

> +
> +					ethernet-phy@0 {
> +						compatible = "ethernet-phy-id001c.c915";
> +						reg = <0>;
> +						reset-gpios = <&eg20t_gpio 6 GPIO_ACTIVE_LOW>;
> +						reset-assert-us = <25000>;
> +						reset-deassert-us = <25000>;
> +					};

  Andrew

^ permalink raw reply

* [PATCH net-next] udp: Fix kernel panic in UDP GSO path
From: Sean Tranchetti @ 2018-05-11  0:38 UTC (permalink / raw)
  To: willemb, davem, netdev; +Cc: Sean Tranchetti, Subash Abhinov Kasiviswanathan

Using GSO in the UDP path on a device with
scatter-gather netdevice feature disabled will result in a kernel
panic with the following call stack:

kernel BUG at net/core/skbuff.c:104!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
PC is at skb_panic+0x4c/0x54
LR is at skb_panic+0x4c/0x54
Process udpgso_bench_tx (pid: 4078, stack limit = 0xffffff8048de8000)
[<ffffff96e8790378>] skb_panic+0x4c/0x54
[<ffffff96e8788b54>] skb_copy_bits+0x0/0x244
[<ffffff96e8836088>] __ip_append_data+0x230/0x814
[<ffffff96e8837090>] ip_make_skb+0xe4/0x178
[<ffffff96e8865444>] udp_sendmsg+0x828/0x888
[<ffffff96e8872818>] inet_sendmsg+0xe4/0x130
[<ffffff96e877c894>] ___sys_sendmsg+0x1d8/0x2c0
[<ffffff96e877ca0c>] SyS_sendmsg+0x90/0xe0

This panic is the result of allocating SKBs with small size
for the newly segmented SKB. If the scatter-gather feature is
disabled, the code attempts to call skb_put() on the small SKB
with an argument of nearly the entire unsegmented SKB length.

After this patch, attempting to use GSO with scatter-gather
disabled will result in -EINVAL being returned.

Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 net/ipv4/ip_output.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b5e21eb..0d63690 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1054,8 +1054,16 @@ static int __ip_append_data(struct sock *sk,
 			copy = length;
 
 		if (!(rt->dst.dev->features&NETIF_F_SG)) {
+			struct sk_buff *tmp;
 			unsigned int off;
 
+			if (paged) {
+				err = -EINVAL;
+				while ((tmp = __skb_dequeue(queue)) != NULL)
+					kfree(tmp);
+				goto error;
+			}
+
 			off = skb->len;
 			if (getfrag(from, skb_put(skb, copy),
 					offset, copy, off, skb) < 0) {
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next] udp: Fix kernel panic in UDP GSO path
From: Eric Dumazet @ 2018-05-11  0:51 UTC (permalink / raw)
  To: Sean Tranchetti, willemb, davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1525999127-11585-1-git-send-email-stranche@codeaurora.org>



On 05/10/2018 05:38 PM, Sean Tranchetti wrote:
> Using GSO in the UDP path on a device with
> scatter-gather netdevice feature disabled will result in a kernel
> panic with the following call stack:
>
> This panic is the result of allocating SKBs with small size
> for the newly segmented SKB. If the scatter-gather feature is
> disabled, the code attempts to call skb_put() on the small SKB
> with an argument of nearly the entire unsegmented SKB length.
> 
> After this patch, attempting to use GSO with scatter-gather
> disabled will result in -EINVAL being returned.
> 
> Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> ---
>  net/ipv4/ip_output.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index b5e21eb..0d63690 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1054,8 +1054,16 @@ static int __ip_append_data(struct sock *sk,
>  			copy = length;
>  
>  		if (!(rt->dst.dev->features&NETIF_F_SG)) {
> +			struct sk_buff *tmp;
>  			unsigned int off;
>  
> +			if (paged) {
> +				err = -EINVAL;
> +				while ((tmp = __skb_dequeue(queue)) != NULL)
> +					kfree(tmp);
> +				goto error;
> +			}
> +
>  			off = skb->len;
>  			if (getfrag(from, skb_put(skb, copy),
>  					offset, copy, off, skb) < 0) {
> 


Hmm, no, we absolutely need to fix GSO instead.

Think of a bonding device (or any virtual device): your patch won't avoid the crash.

^ permalink raw reply

* Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Eric Dumazet @ 2018-05-11  0:55 UTC (permalink / raw)
  To: Gao Feng, Eric Dumazet
  Cc: davem@davemloft.net, daniel, jakub.kicinski, David Ahern,
	netdev@vger.kernel.org
In-Reply-To: <654af0ff.3e1.1634c90380e.Coremail.gfree.wind@vip.163.com>



On 05/10/2018 05:18 PM, Gao Feng wrote:
> At 2018-05-10 21:02:55, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
>>
>>
>> On 05/10/2018 01:28 AM, gfree.wind@vip.163.com wrote:
>>> From: Gao Feng <gfree.wind@vip.163.com>
>>>
>>> The skb flow limit is implemented for each CPU independently. In the
>>> current codes, the function skb_flow_limit gets the softnet_data by
>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>>> the stats of current CPU, while the skb is going to append the queue of
>>> another CPU. It isn't the expected behavior.
>>>
>>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>>
>>
>> Please add a correct Fixes: tag
> 
> Thanks Eric.
> 
> I have one question about the "Fixes: tag".
> Most of patches are bug fixes, but when need to add the "Fixes: tag", and when not ?
> 
> I'm not clear about it. Could you explain it please?
> 

For this particular patch, since you have not CCed Willem (author of the original patch),
I would find it very useful if you did a search to find out which commit introduced the problem.
Once you have found that commit, simply add the Fixes: tag and CC: the author.

Doing so saves us (stable teams, reviewers, maintainers) a lot of time really.

In my opinion, Fixes: tags should be mandatory when applicable.

> Best Regards
> Feng
> 
>>
>> By doing so, you will likely add a CC: tag to make sure the author of the code
>> will receive your email and give feedback.
>>
>> Thanks !
>>
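
The mismatch described in the quoted commit message can be pictured with a
toy model. This is only a sketch under stated assumptions: softnet_sketch,
per_cpu_sd, FLOW_LIMIT_SKETCH and the function names are hypothetical, and
the real skb_flow_limit() does more than compare a single counter. The point
is only that the limit check should look at the state of the CPU whose queue
receives the skb, not at the state of the CPU running the code.

```c
#include <assert.h>

#define NR_CPUS_SKETCH    4	/* hypothetical CPU count */
#define FLOW_LIMIT_SKETCH 2	/* hypothetical per-CPU packet budget */

/* Toy stand-in for the per-CPU softnet_data; only a counter is kept. */
struct softnet_sketch {
	unsigned int queued;
};

static struct softnet_sketch per_cpu_sd[NR_CPUS_SKETCH];

/* Behaviour described as wrong: the check uses the running CPU's state. */
static int over_limit_this_cpu(int running_cpu)
{
	return per_cpu_sd[running_cpu].queued >= FLOW_LIMIT_SKETCH;
}

/* Behaviour the patch aims for: the caller passes the target queue's state. */
static int over_limit(const struct softnet_sketch *sd)
{
	return sd->queued >= FLOW_LIMIT_SKETCH;
}

/* Enqueue to target_cpu's backlog; returns 0 on success, -1 if dropped. */
static int enqueue_sketch(int target_cpu)
{
	struct softnet_sketch *sd;

	assert(target_cpu >= 0 && target_cpu < NR_CPUS_SKETCH);
	sd = &per_cpu_sd[target_cpu];
	if (over_limit(sd))
		return -1;
	sd->queued++;
	return 0;
}
```

With RPS steering every packet from CPU 0 to CPU 1, over_limit_this_cpu(0)
never trips because CPU 0's own counter stays at zero, while
over_limit(&per_cpu_sd[1]) reports the queue that is actually filling up.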

^ permalink raw reply

* Re: [PATCH net] tun: fix use after free for ptr_ring
From: Jason Wang @ 2018-05-11  1:29 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, LKML, Eric Dumazet,
	Michael S . Tsirkin
In-Reply-To: <CAM_iQpUVFZ-4EFeGM6eKyOrJzc2=5uu7b81d3Rf5Pf7TgZw38Q@mail.gmail.com>



On 2018年05月11日 02:08, Cong Wang wrote:
> On Tue, May 8, 2018 at 11:59 PM, Jason Wang <jasowang@redhat.com> wrote:
>> We used to initialize ptr_ring during TUNSETIFF, this is because its
>> size depends on the tx_queue_len of netdevice. And we try to clean it
>> up when socket were detached from netdevice. A race were spotted when
>> trying to do uninit during a read which will lead a use after free for
>> pointer ring. Solving this by always initialize a zero size ptr_ring
>> in open() and do resizing during TUNSETIFF, and then we can safely do
>> cleanup during close(). With this, there's no need for the workaround
>> that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
>> for tfile->tx_array").
>>
> Ah, I didn't know ptr_ring_init(0) could work... Nice patch!
> Except one thing below.
>
>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index ef33950..298cb96 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -681,15 +681,6 @@ static void tun_queue_purge(struct tun_file *tfile)
>>          skb_queue_purge(&tfile->sk.sk_error_queue);
>>   }
>>
>> -static void tun_cleanup_tx_ring(struct tun_file *tfile)
>> -{
>> -       if (tfile->tx_ring.queue) {
>> -               ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
>> -               xdp_rxq_info_unreg(&tfile->xdp_rxq);
>> -               memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
>> -       }
>> -}
>
> I don't think you can totally remove ptr_ring_cleanup(), it should be
> called unconditionally with your ptr_ring_init(0) trick, right?

Right, my bad. I did intend to clean it up at close(), as the
commit log says.

Will send v2.

Thanks

^ permalink raw reply

* Re: linux-next: Signed-off-by missing for commit in the net tree
From: Hangbin Liu @ 2018-05-11  1:30 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, Networking, Linux-Next Mailing List,
	Linux Kernel Mailing List
In-Reply-To: <20180511071716.038a9095@canb.auug.org.au>

On Fri, May 11, 2018 at 07:17:16AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Commit
> 
>   0e8411e426e2 ("ipv4: reset fnhe_mtu_locked after cache route flushed")
> 
> is missing a Signed-off-by from its author.

Oops, my bad.

> After route cache is flushed via ipv4_sysctl_rtcache_flush(), we forget
> to reset fnhe_mtu_locked in rt_bind_exception(). When pmtu is updated
> in __ip_rt_update_pmtu(), it will return directly since the pmtu is
> still locked. e.g.
>
> + ip netns exec client ping 10.10.1.1 -c 1 -s 1400 -M do
> PING 10.10.1.1 (10.10.1.1) 1400(1428) bytes of data.
> From 10.10.0.254 icmp_seq=1 Frag needed and DF set (mtu = 0)
>
> --- 10.10.1.1 ping statistics ---
> 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

I shouldn't add comments containing '---' lines. David reminded me of this before, but
I didn't notice it when I pasted the ping logs. Another lesson learned...

Thanks Stephen.

Regards
Hangbin

^ permalink raw reply

* Re:Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Gao Feng @ 2018-05-11  1:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem@davemloft.net, daniel@iogearbox.net,
	jakub.kicinski@netronome.com, David Ahern, netdev@vger.kernel.org
In-Reply-To: <721ce144-2470-6124-1edd-cc7a343994a6@gmail.com>

At 2018-05-11 08:55:47, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
>
>
>On 05/10/2018 05:18 PM, Gao Feng wrote:
>> At 2018-05-10 21:02:55, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
>>>
>>>
>>> On 05/10/2018 01:28 AM, gfree.wind@vip.163.com wrote:
>>>> From: Gao Feng <gfree.wind@vip.163.com>
>>>>
>>>> The skb flow limit is implemented for each CPU independently. In the
>>>> current codes, the function skb_flow_limit gets the softnet_data by
>>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>>>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>>>> the stats of current CPU, while the skb is going to append the queue of
>>>> another CPU. It isn't the expected behavior.
>>>>
>>>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>>>
>>>
>>> Please add a correct Fixes: tag
>>
>> Thanks Eric.
>>
>> I have one question about the "Fixes: tag".
>> Most of patches are bug fixes, but when need to add the "Fixes: tag", and when not ?
>>
>> I'm not clear about it. Could you explain it please?
>>
>
>For this particular patch, since you have not CC Willem (author of the patch),
>I found very useful that you did a search to find out.
>Once you found which commit added the problem, simply add the Fixes: tag and CC: the author.
>
>Doing so saves us (stable teams, reviewers, maintainers) a lot of time really.

Normally I get the "to" list by get_maintainer.pl script, now I would save the stable team ASAP.

>
>In my opinion, Fixes: tags should be mandatory when applicable.

Thanks your explanations, I get it.

Best Regards
Feng

>
>> Best Regards
>> Feng
>>
>>>
>>> By doing so, you will likely add a CC: tag to make sure the author of the code
>>> will receive your email and give feed back.
>>>
>>> Thanks !
>>>

^ permalink raw reply

* Re: [PATCH] mlx4_core: allocate 4KB ICM chunks
From: Qing Huang @ 2018-05-11  1:36 UTC (permalink / raw)
  To: Yanjun Zhu, tariqt, davem; +Cc: netdev, linux-rdma, linux-kernel
In-Reply-To: <6768e075-70f5-4de3-a98a-fdffa53e0a2f@oracle.com>

Thank you for reviewing it!


On 5/10/2018 6:23 PM, Yanjun Zhu wrote:
>
>
>
> On 2018/5/11 9:15, Qing Huang wrote:
>>
>>
>>
>> On 5/10/2018 5:13 PM, Yanjun Zhu wrote:
>>>
>>>
>>> On 2018/5/11 7:31, Qing Huang wrote:
>>>> When a system is under memory pressure (high usage with fragments),
>>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>>> memory management to enter slow path doing memory compact/migration
>>>> ops in order to complete high order memory allocations.
>>>>
>>>> When that happens, user processes calling uverb APIs may get stuck
>>>> for more than 120s easily even though there are a lot of free pages
>>>> in smaller chunks available in the system.
>>>>
>>>> Syslog:
>>>> ...
>>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>>> ...
>>>>
>>>> With 4KB ICM chunk size, the above issue is fixed.
>>>>
>>>> However in order to support 4KB ICM chunk size, we need to fix another
>>>> issue in large size kcalloc allocations.
>>>>
>>>> E.g.
>>>> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
>>>> size, each ICM chunk can only hold 512 mtt entries (8 bytes for 
>>>> each mtt
>>>> entry). So we need a 16MB allocation for a table->icm pointer array to
>>>> hold 2M pointers which can easily cause kcalloc to fail.
>>>>
>>>> The solution is to use vzalloc to replace kcalloc. There is no need
>>>> for contiguous memory pages for a driver meta data structure (no need
>>> Hi,
>>>
>>> Replace continuous memory pages with virtual memory, is there any 
>>> performance loss?
>>
>> Not really. "table->icm" will be accessed as individual pointer 
>> variables randomly. Kcalloc
>
> Sure. Thanks. If "table->icm" will be accessed as individual pointer 
> variables randomly, the performance loss
> caused by discontinuous memory will be very trivial.
>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>
>> also returns a virtual address except its mapped pages are guaranteed 
>> to be contiguous
>> which will provide little advantage over vzalloc for individual 
>> pointer variable access.
>>
>> Qing
>>
>>>
>>> Zhu Yanjun
>>>> of DMA ops).
>>>>
>>>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>>>> Acked-by: Daniel Jurgens <danielj@mellanox.com>
>>>> ---
>>>>   drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
>>>>   1 file changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c 
>>>> b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>> index a822f7a..2b17a4b 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>> @@ -43,12 +43,12 @@
>>>>   #include "fw.h"
>>>>     /*
>>>> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
>>>> - * per chunk.
>>>> + * We allocate in 4KB page size chunks to avoid high order memory
>>>> + * allocations in fragmented/high usage memory situation.
>>>>    */
>>>>   enum {
>>>> -    MLX4_ICM_ALLOC_SIZE    = 1 << 18,
>>>> -    MLX4_TABLE_CHUNK_SIZE    = 1 << 18
>>>> +    MLX4_ICM_ALLOC_SIZE    = 1 << 12,
>>>> +    MLX4_TABLE_CHUNK_SIZE    = 1 << 12
>>>>   };
>>>>     static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct 
>>>> mlx4_icm_chunk *chunk)
>>>> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, 
>>>> struct mlx4_icm_table *table,
>>>>       obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
>>>>       num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>>>>   -    table->icm      = kcalloc(num_icm, sizeof(*table->icm), 
>>>> GFP_KERNEL);
>>>> +    table->icm      = vzalloc(num_icm * sizeof(*table->icm));
>>>>       if (!table->icm)
>>>>           return -ENOMEM;
>>>>       table->virt     = virt;
>>>> @@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, 
>>>> struct mlx4_icm_table *table,
>>>>               mlx4_free_icm(dev, table->icm[i], use_coherent);
>>>>           }
>>>>   -    kfree(table->icm);
>>>> +    vfree(table->icm);
>>>>         return -ENOMEM;
>>>>   }
>>>> @@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev 
>>>> *dev, struct mlx4_icm_table *table)
>>>>               mlx4_free_icm(dev, table->icm[i], table->coherent);
>>>>           }
>>>>   -    kfree(table->icm);
>>>> +    vfree(table->icm);
>>>>   }
>>>
>>
>
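
The sizing argument in the quoted commit message can be checked with a few
lines of arithmetic. This is only a sketch: the helper names are hypothetical,
the 8-byte pointer size assumes a 64-bit kernel, and the rounding mirrors the
num_icm computation shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the commit message quoted above. */
#define ICM_CHUNK_SIZE_4K (1u << 12)	/* new MLX4_TABLE_CHUNK_SIZE */
#define MTT_ENTRY_SIZE    8u		/* bytes per mtt entry */

/* Number of ICM chunks needed for nobj objects, rounding up like num_icm. */
static uint64_t icm_chunks_needed(uint64_t nobj, uint32_t obj_size,
				  uint32_t chunk_size)
{
	uint64_t obj_per_chunk;

	assert(obj_size != 0 && chunk_size >= obj_size);
	obj_per_chunk = chunk_size / obj_size;
	return (nobj + obj_per_chunk - 1) / obj_per_chunk;
}

/* Size in bytes of the table->icm pointer array, assuming 8-byte pointers. */
static uint64_t icm_pointer_array_bytes(uint64_t num_icm)
{
	return num_icm * 8u;
}
```

For log_num_mtt=30, icm_chunks_needed(1ULL << 30, MTT_ENTRY_SIZE,
ICM_CHUNK_SIZE_4K) gives 2M chunks and icm_pointer_array_bytes() of that gives
16MB, matching the figures in the commit message. With the old 256KB chunk
size the same calculation gives 32768 chunks and a 256KB pointer array.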

^ permalink raw reply

* [PATCH bpf-next] samples/bpf: xdp_monitor, accept short options
From: Prashant Bhole @ 2018-05-11  1:37 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: Prashant Bhole, Jesper Dangaard Brouer, David S . Miller, netdev

Update the optstring passed to getopt_long() so that the short options
-D, -S and -s are accepted.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 samples/bpf/xdp_monitor_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 894bc64c2cac..668511c77aaf 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -594,7 +594,7 @@ int main(int argc, char **argv)
 	snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]);
 
 	/* Parse commands line args */
-	while ((opt = getopt_long(argc, argv, "h",
+	while ((opt = getopt_long(argc, argv, "hDSs:",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
 		case 'D':
-- 
2.13.6
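
The effect of the optstring change can be tried in a small standalone
program. This is a sketch, not the code of xdp_monitor_user.c: the struct,
the option table and parse_sketch() are illustrative, and the long option
names are assumed. With optstring "h", getopt_long() rejects -D and -s even
though the matching long options work; with "hDSs:" both forms are accepted.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Parsed settings; field names are illustrative only. */
struct opts_sketch {
	int debug;
	int stats;
	int interval;
	int unknown;	/* number of options getopt_long() rejected */
};

/* Long options assumed for this sketch; each maps to a short option letter. */
static const struct option long_options_sketch[] = {
	{ "help",  no_argument,       NULL, 'h' },
	{ "debug", no_argument,       NULL, 'D' },
	{ "stats", no_argument,       NULL, 'S' },
	{ "sec",   required_argument, NULL, 's' },
	{ NULL, 0, NULL, 0 }
};

/* Parse argv with the given optstring and record what was seen. */
static void parse_sketch(int argc, char **argv, const char *optstring,
			 struct opts_sketch *out)
{
	int opt;

	assert(argc >= 1 && argv && optstring && out);
	out->debug = out->stats = out->interval = out->unknown = 0;
	opterr = 0;	/* do not print diagnostics for rejected options */
	optind = 1;	/* restart scanning so the function can be called again */

	while ((opt = getopt_long(argc, argv, optstring,
				  long_options_sketch, NULL)) != -1) {
		switch (opt) {
		case 'D':
			out->debug = 1;
			break;
		case 'S':
			out->stats = 1;
			break;
		case 's':
			out->interval = atoi(optarg);
			break;
		case '?':
			out->unknown++;
			break;
		default:
			break;
		}
	}
}
```

Calling parse_sketch() on "prog -D -s 5" with optstring "h" counts two
rejected options, while the same arguments with "hDSs:" set debug and an
interval of 5.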

^ permalink raw reply related

* [PATCH net-next] udp: avoid refcount_t saturation in __udp_gso_segment()
From: Eric Dumazet @ 2018-05-11  2:07 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Willem de Bruijn,
	Alexander Duyck

For some reason, Willem thought that the issue we fixed for TCP
in commit 7ec318feeed1 ("tcp: gso: avoid refcount_t warning from
tcp_gso_segment()") was not relevant for UDP GSO.

But syzbot found its way.

refcount_t: saturated; leaking memory.
WARNING: CPU: 0 PID: 10261 at lib/refcount.c:78 refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 10261 Comm: syz-executor5 Not tainted 4.17.0-rc3+ #38
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 panic+0x22f/0x4de kernel/panic.c:184
 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
RSP: 0018:ffff880196db6b90 EFLAGS: 00010282
RAX: 0000000000000026 RBX: 00000000ffffff01 RCX: ffffc900040d9000
RDX: 0000000000004a29 RSI: ffffffff8160f6f1 RDI: ffff880196db66f0
RBP: ffff880196db6c78 R08: ffff8801b33d6740 R09: 0000000000000002
R10: ffff8801b33d6740 R11: 0000000000000000 R12: 0000000000000000
R13: 00000000ffffffff R14: ffff880196db6c50 R15: 0000000000020101
 refcount_add+0x1b/0x70 lib/refcount.c:102
 __udp_gso_segment+0xaa5/0xee0 net/ipv4/udp_offload.c:272
 udp4_ufo_fragment+0x592/0x7a0 net/ipv4/udp_offload.c:301
 inet_gso_segment+0x639/0x12b0 net/ipv4/af_inet.c:1342
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
 skb_gso_segment include/linux/netdevice.h:4050 [inline]
 validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3122
 __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3579
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3620
 neigh_direct_output+0x15/0x20 net/core/neighbour.c:1401
 neigh_output include/net/neighbour.h:483 [inline]
 ip_finish_output2+0xa5f/0x1840 net/ipv4/ip_output.c:229
 ip_finish_output+0x828/0xf80 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:277 [inline]
 ip_output+0x21b/0x850 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:444 [inline]
 ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
 ip_send_skb+0x40/0xe0 net/ipv4/ip_output.c:1434
 udp_send_skb.isra.37+0x5eb/0x1000 net/ipv4/udp.c:825
 udp_push_pending_frames+0x5c/0xf0 net/ipv4/udp.c:853
 udp_v6_push_pending_frames+0x380/0x3e0 net/ipv6/udp.c:1105
 udp_lib_setsockopt+0x59a/0x600 net/ipv4/udp.c:2403
 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1447
 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046
 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: ad405857b174 ("udp: better wmem accounting on gso")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv4/udp_offload.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index ede2a7305b90f789c748d911530453ec2cbbfab7..92dc9e5a7ff3d0a7509bfa2a66e9189c8341a5fa 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -268,9 +268,17 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 		uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
 
 	/* update refcount for the packet */
-	if (copy_dtor)
-		refcount_add(sum_truesize - gso_skb->truesize,
-			     &sk->sk_wmem_alloc);
+	if (copy_dtor) {
+		int delta = sum_truesize - gso_skb->truesize;
+
+		/* In some pathological cases, delta can be negative.
+		 * We need to either use refcount_add() or refcount_sub_and_test()
+		 */
+		if (likely(delta >= 0))
+			refcount_add(delta, &sk->sk_wmem_alloc);
+		else
+			WARN_ON_ONCE(refcount_sub_and_test(-delta, &sk->sk_wmem_alloc));
+	}
 	return segs;
 }
 EXPORT_SYMBOL_GPL(__udp_gso_segment);
-- 
2.17.0.441.gb46fe60e1d-goog
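
The failure mode behind this patch can be reproduced in user space. The
sketch below is not kernel code: toy_refcount and its helpers are
hypothetical stand-ins for refcount_t, and the assumption is that the old
call passed the difference of two unsigned values straight to refcount_add().
When the first value is smaller, the difference wraps to a huge number and a
saturating add pins the counter at its maximum. Choosing add or subtract by
the sign of the delta, as the patch does, avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Toy saturating counter; a stand-in for illustration, not refcount_t. */
struct toy_refcount {
	uint32_t refs;
};

/* Unsigned subtraction: wraps to a large value when sum < orig. */
static uint32_t wrapped_delta(uint32_t sum_truesize, uint32_t orig_truesize)
{
	return sum_truesize - orig_truesize;
}

/* Add i, saturating at UINT32_MAX. Returns 0 if the counter saturated. */
static int toy_refcount_add(struct toy_refcount *r, uint32_t i)
{
	if (r->refs > UINT32_MAX - i) {
		r->refs = UINT32_MAX;
		return 0;
	}
	r->refs += i;
	return 1;
}

/* Subtract i. Returns 1 if the counter dropped to zero. */
static int toy_refcount_sub_and_test(struct toy_refcount *r, uint32_t i)
{
	assert(r->refs >= i);
	r->refs -= i;
	return r->refs == 0;
}

/* Apply a signed adjustment: add when the delta is positive, else subtract.
 * Returns 1 when the counter is still healthy afterwards. */
static int toy_adjust(struct toy_refcount *r, uint32_t sum_truesize,
		      uint32_t orig_truesize)
{
	int64_t delta = (int64_t)sum_truesize - (int64_t)orig_truesize;

	if (delta >= 0)
		return toy_refcount_add(r, (uint32_t)delta);
	return !toy_refcount_sub_and_test(r, (uint32_t)(-delta));
}
```

For example, wrapped_delta(1, 256) is 0xffffff01, so adding it to a counter
holding 1000 saturates, whereas toy_adjust() with the same inputs subtracts
255 and leaves 745.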

^ permalink raw reply related

* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Michael Schmitz @ 2018-05-11  2:11 UTC (permalink / raw)
  To: Finn Thain
  Cc: Geert Uytterhoeven, David S. Miller, linux-m68k, netdev,
	Linux Kernel Mailing List, Christoph Hellwig
In-Reply-To: <alpine.LNX.2.21.1805110921020.8@nippy.intranet>

Hi Finn,

On Fri, May 11, 2018 at 11:55 AM, Finn Thain <fthain@telegraphics.com.au> wrote:

>> > What's worse, if you do pass a dma_mask in struct
>> > platform_device_info, you end up with this problem in
>> > platform_device_register_full():
>> >
>> >         if (pdevinfo->dma_mask) {
>> >                 /*
>> >                  * This memory isn't freed when the device is put,
>> >                  * I don't have a nice idea for that though.  Conceptually
>> >                  * dma_mask in struct device should not be a pointer.
>> >                  * See http://thread.gmane.org/gmane.linux.kernel.pci/9081
>> >                  */
>> >                 pdev->dev.dma_mask =
>> >                         kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);
>>
>> Maybe platform_device_register_full() should rather check whether
>> dev.coherent_dma_mask is set, and make dev.dma_mask point to that? This
>> is how we solved the warning issue for the Zorro bus devices...
>> (8614f1b58bd0e920a5859464a500b93152c5f8b1)
>>
>
> The claim in the comment above that a pointer is the wrong solution
> suggests that your proposal won't get far. Also, your proposal doesn't

I read the comment to be mostly concerned about not freeing memory,
and attempted to address that. I won't pretend it's the right thing to
do if the pointer will go away anyway, and I certainly won't submit a
patch. Sorry for muddling the issue.

> address the other issues I raised: a new
> platform_device_register_simple_dma() API would only have two callers, and
> the dma mask setup for device-tree probed platform devices is apparently a
> work-in-progress (which I don't want to churn up).

Yes, and that's why I would prefer your old patch handling this in the
device driver (which Geert didn't like), or alternatively setting
the mask up when registering a device with its bus where appropriate.

I concede this won't help with pure platform devices, but as we can't
test all of these, we should leave the fix for platform devices up to
Christoph.

>
>> > > With people setting the mask to kill the WARNING splat, this may
>> > > become more common.
>> >
>> > Since the commit which introduced the WARNING, only commits f61e64310b75
>> > ("m68k: set dma and coherent masks for platform FEC ethernets") and
>> > 7bcfab202ca7 ("powerpc/macio: set a proper dma_coherent_mask") seem to be
>> > aimed at squelching that WARNING.
>> >
>> > (Am I missing any others?)
>>
>> Zorro devices :-)
>
> Right, I should add commit 55496d3fe2ac ("zorro: Set up z->dev.dma_mask
> for the DMA API") to that list.
>
>> Which begs the question: why can't you set up all Nubus bus devices' DMA
>> masks in nubus_device_register(), or nubus_add_board()?
>
> I am expecting to see the same WARNING from the nubus sonic driver but it
> hasn't happened yet, so I don't have a patch for it yet. In any case, the
> nubus fix would be a lot like the zorro bus fix, so I don't see a problem.

That's odd. But what I meant to say is that by setting up
dma_coherent_mask in nubus_add_board(), and pointing dma_mask to that,
you won't need any patches to Nubus device drivers.

I must be missing something else...

Cheers,

  Michael


>
> --

^ permalink raw reply

* Re: [PATCH] coredump: rename umh_pipe_setup() to coredump_pipe_setup()
From: Al Viro @ 2018-05-11  2:48 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alexei Starovoitov, ast, linux-fsdevel, linux-kernel,
	David S. Miller, netdev
In-Reply-To: <20180510233247.GG27853@wotan.suse.de>

On Thu, May 10, 2018 at 11:32:47PM +0000, Luis R. Rodriguez wrote:

> I think net-next makes sense if Al Viro is OK with that. This way it could go
> in regardless of the state of your series, but it also lines up with your work.

Fine by me...

^ permalink raw reply

* [PATCH net V2] tun: fix use after free for ptr_ring
From: Jason Wang @ 2018-05-11  2:49 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: xiyou.wangcong, eric.dumazet, mst, Jason Wang

We used to initialize ptr_ring during TUNSETIFF because its
size depends on the tx_queue_len of the netdevice, and we tried to clean it
up when the socket was detached from the netdevice. A race was spotted when
the uninit runs during a read, which leads to a use after free of the
pointer ring. Solve this by always initializing a zero-size ptr_ring
in open() and resizing it during TUNSETIFF, so that we can safely do the
cleanup during close(). With this, there's no need for the workaround
that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
for tfile->tx_array").

Reported-by: syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Fixes: 1576d9860599 ("tun: switch to use skb array for tx")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from v1:
- free ptr_ring during close()
- use tun_ptr_free() during resize for safety
---
 drivers/net/tun.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ef33950..9fbbb32 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -681,15 +681,6 @@ static void tun_queue_purge(struct tun_file *tfile)
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
-static void tun_cleanup_tx_ring(struct tun_file *tfile)
-{
-	if (tfile->tx_ring.queue) {
-		ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
-		xdp_rxq_info_unreg(&tfile->xdp_rxq);
-		memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-	}
-}
-
 static void __tun_detach(struct tun_file *tfile, bool clean)
 {
 	struct tun_file *ntfile;
@@ -736,7 +727,8 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 			    tun->dev->reg_state == NETREG_REGISTERED)
 				unregister_netdevice(tun->dev);
 		}
-		tun_cleanup_tx_ring(tfile);
+		if (tun)
+			xdp_rxq_info_unreg(&tfile->xdp_rxq);
 		sock_put(&tfile->sk);
 	}
 }
@@ -783,14 +775,14 @@ static void tun_detach_all(struct net_device *dev)
 		tun_napi_del(tun, tfile);
 		/* Drop read queue */
 		tun_queue_purge(tfile);
+		xdp_rxq_info_unreg(&tfile->xdp_rxq);
 		sock_put(&tfile->sk);
-		tun_cleanup_tx_ring(tfile);
 	}
 	list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
 		tun_enable_queue(tfile);
 		tun_queue_purge(tfile);
+		xdp_rxq_info_unreg(&tfile->xdp_rxq);
 		sock_put(&tfile->sk);
-		tun_cleanup_tx_ring(tfile);
 	}
 	BUG_ON(tun->numdisabled != 0);
 
@@ -834,7 +826,8 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	}
 
 	if (!tfile->detached &&
-	    ptr_ring_init(&tfile->tx_ring, dev->tx_queue_len, GFP_KERNEL)) {
+	    ptr_ring_resize(&tfile->tx_ring, dev->tx_queue_len,
+			    GFP_KERNEL, tun_ptr_free)) {
 		err = -ENOMEM;
 		goto out;
 	}
@@ -3219,6 +3212,11 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 					    &tun_proto, 0);
 	if (!tfile)
 		return -ENOMEM;
+	if (ptr_ring_init(&tfile->tx_ring, 0, GFP_KERNEL)) {
+		sk_free(&tfile->sk);
+		return -ENOMEM;
+	}
+
 	RCU_INIT_POINTER(tfile->tun, NULL);
 	tfile->flags = 0;
 	tfile->ifindex = 0;
@@ -3239,8 +3237,6 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 
 	sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
-	memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-
 	return 0;
 }
 
@@ -3249,6 +3245,7 @@ static int tun_chr_close(struct inode *inode, struct file *file)
 	struct tun_file *tfile = file->private_data;
 
 	tun_detach(tfile, true);
+	ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related
