Netdev List


* Re: [PATCH net] udp: restore UDPlite many-cast delivery
From: David Miller @ 2016-11-16  3:14 UTC (permalink / raw)
  To: pablo; +Cc: netdev, edumazet, drheld
In-Reply-To: <1479163230-8734-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 14 Nov 2016 23:40:30 +0100

> Honor udptable parameter that is passed to __udp*_lib_mcast_deliver(),
> otherwise udplite broadcast/multicast use the wrong table and it breaks.
> 
> Fixes: 2dc41cff7545 ("udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.")
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Applied and queued up for -stable, thanks Pablo.

^ permalink raw reply

* Re: [PATCH net-next v3 2/3] net: fsl: Allow most drivers to be built with COMPILE_TEST
From: kbuild test robot @ 2016-11-16  3:23 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: kbuild-all, netdev, davem, mw, arnd, gregory.clement, Shaohui.Xie,
	Florian Fainelli
In-Reply-To: <20161116004037.20941-4-f.fainelli@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3891 bytes --]

Hi Florian,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-gianfar_ptp-Rename-FS-bit-to-FIPERST/20161116-095805
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/freescale/fsl_pq_mdio.c: In function 'fsl_pq_mdio_remove':
>> drivers/net/ethernet/freescale/fsl_pq_mdio.c:498:27: warning: unused variable 'priv' [-Wunused-variable]
     struct fsl_pq_mdio_priv *priv = bus->priv;
                              ^~~~

vim +/priv +498 drivers/net/ethernet/freescale/fsl_pq_mdio.c

1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  482  	return 0;
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  483  
dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  484  error:
dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  485  	if (priv->map)
b3319b10 drivers/net/fsl_pq_mdio.c                    Anton Vorontsov 2009-12-30  486  		iounmap(priv->map);
dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  487  
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  488  	kfree(new_bus);
dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  489  
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  490  	return err;
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  491  }
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  492  
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  493  
5078ac79 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  494  static int fsl_pq_mdio_remove(struct platform_device *pdev)
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  495  {
5078ac79 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  496  	struct device *device = &pdev->dev;
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  497  	struct mii_bus *bus = dev_get_drvdata(device);
b3319b10 drivers/net/fsl_pq_mdio.c                    Anton Vorontsov 2009-12-30 @498  	struct fsl_pq_mdio_priv *priv = bus->priv;
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  499  
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  500  	mdiobus_unregister(bus);
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  501  
b3319b10 drivers/net/fsl_pq_mdio.c                    Anton Vorontsov 2009-12-30  502  	iounmap(priv->map);
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  503  	mdiobus_free(bus);
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  504  
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  505  	return 0;
1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  506  }

:::::: The code at line 498 was first introduced by commit
:::::: b3319b10523d8dac82b134a05de2a403119abebd fsl_pq_mdio: Fix iomem unmapping for non-eTSEC2.0 controllers

:::::: TO: Anton Vorontsov <avorontsov@ru.mvista.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42369 bytes --]

^ permalink raw reply

* Re: [PATCH net-next V6 0/9] liquidio CN23XX VF support
From: David Miller @ 2016-11-16  3:25 UTC (permalink / raw)
  To: rvatsavayi; +Cc: netdev
In-Reply-To: <1479167687-9904-1-git-send-email-rvatsavayi@caviumnetworks.com>

From: Raghu Vatsavayi <rvatsavayi@caviumnetworks.com>
Date: Mon, 14 Nov 2016 15:54:38 -0800

> Following is the V6 patch series for adding VF support on
> CN23XX devices. This version addressed:
> 1) Your concern for ordering of local variable declarations
>    from longest to shortest line.
> 2) Removed module parameters max_vfs, num_queues_per_{p,v}f.
> 3) Minor changes for fixing new checkpatch script related 
>    errors on pre-existing driver.
> 4) Fixed compilation issues when CONFIG_PCI_IOV/CONFIG_PCI_ATS
>    options are disabled.
> 5) Modified qualifiers for printing mac addresses with pM format.
> 
> I will post remaining VF patches soon after this patchseries is
> applied. Please apply patches in the following order as some of
> the patches depend on earlier patches.

Series applied, thanks.

^ permalink raw reply

* [PATCH] net: dsa: mv88e6xxx: Respect SPEED_UNFORCED, don't set force bit
From: Andrew Lunn @ 2016-11-16  3:26 UTC (permalink / raw)
  To: David Miller; +Cc: Vivien Didelot, netdev, Andrew Lunn

SPEED_UNFORCED indicates that the MAC & PHY should perform
auto-negotiation to determine a speed which works. If this is called
for, don't set the force bit. If it is set, the MAC actually does
10Gbps, which the internal PHYs don't support.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6xxx/port.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index e4978f6367aa..af4772d86086 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -213,7 +213,7 @@ static int mv88e6xxx_port_set_speed(struct mv88e6xxx_chip *chip, int port,
 		reg &= ~PORT_PCS_CTRL_ALTSPEED;
 	if (force_bit) {
 		reg &= ~PORT_PCS_CTRL_FORCE_SPEED;
-		if (speed)
+		if (speed != SPEED_UNFORCED)
 			ctrl |= PORT_PCS_CTRL_FORCE_SPEED;
 	}
 	reg |= ctrl;
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCH net] gro_cells: mark napi struct as not busy poll candidates
From: David Miller @ 2016-11-16  3:29 UTC (permalink / raw)
  To: eric.dumazet
  Cc: ebiederm, paulmck, xiyou.wangcong, rolf.neugebauer, netdev,
	justin.cormack, ian.campbell, edumazet
In-Reply-To: <1479169722.8455.108.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 14 Nov 2016 16:28:42 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Rolf Neugebauer reported very long delays at netns dismantle.
> 
> Eric W. Biederman was kind enough to look at this problem
> and noticed synchronize_net() occurring from netif_napi_del() that was
> added in linux-4.5
> 
> Busy polling makes no sense for tunnels NAPI.
> If busy poll is used for sessions over tunnels, the poller will need to
> poll the physical device queue anyway.
> 
> netif_tx_napi_add() could be used here, but function name is misleading,
> and renaming it is not stable material, so set NAPI_STATE_NO_BUSY_POLL
> bit directly.
> 
> This will avoid inserting gro_cells napi structures in napi_hash[]
> and avoid the problematic synchronize_net() (per possible cpu) that
> Rolf reported.
> 
> Fixes: 93d05d4a320c ("net: provide generic busy polling to all NAPI drivers")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
> Reported-by: Eric W. Biederman <ebiederm@xmission.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] tcp: allow to enable the repair mode for non-listening sockets
From: David Miller @ 2016-11-16  3:29 UTC (permalink / raw)
  To: avagin; +Cc: linux-kernel, netdev, kuznet, jmorris, yoshfuji, kaber, criu
In-Reply-To: <1479176114-12658-1-git-send-email-avagin@openvz.org>

From: Andrei Vagin <avagin@openvz.org>
Date: Mon, 14 Nov 2016 18:15:14 -0800

> The repair mode is used to get and restore sequence numbers and
> data from queues. It is used to checkpoint/restore connections.
> 
> Currently the repair mode can be enabled for sockets in the established
> and closed states, but for other states we have to dump the same socket
> properties, so let's allow enabling repair mode for these sockets.
> 
> The repair mode reveals nothing more for sockets in other states.
> 
> Signed-off-by: Andrei Vagin <avagin@openvz.org>

Applied.

^ permalink raw reply

* Re: [PATCH] net: ioctl SIOCSIFADDR minor cleanup
From: David Miller @ 2016-11-16  3:30 UTC (permalink / raw)
  To: Linyu.Yuan; +Cc: cugyly, netdev
In-Reply-To: <8729016553E3654398EA69218DA29EEF0E42F73A@cnshjmbx02>

From: YUAN Linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Date: Wed, 16 Nov 2016 03:13:31 +0000

> The original code means that when the request name has a colon, the label name will have a colon too.

And that is intentional.

^ permalink raw reply

* Re: [PATCH] net: ioctl SIOCSIFADDR minor cleanup
From: David Miller @ 2016-11-16  3:31 UTC (permalink / raw)
  To: Linyu.Yuan; +Cc: cugyly, netdev
In-Reply-To: <8729016553E3654398EA69218DA29EEF0E42F73A@cnshjmbx02>

From: YUAN Linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Date: Wed, 16 Nov 2016 03:13:31 +0000

> So assigning the request name to the label does the same thing as the original code.

Nope.  dev->name does not have the colon, it was trimmed from the
string for the device lookup.  So the found device's dev->name does
not have the colon character, even if it was in ifr.ifr_name

This was my entire point.

You are changing the behavior of the code in an invalid way.
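
The two cases under discussion can be sketched in userspace C. This is
only an illustration under stated assumptions: lookup_name() and
pick_label() are hypothetical helpers, not kernel functions, and the
branch in pick_label() mirrors the removed lines quoted in the follow-up
message of this thread. It shows that the name used for the lookup and
the request name differ only when the request name carries a colon.

```c
#include <assert.h>
#include <string.h>

#define IFNAMSIZ 16

/* Hypothetical stand-in for the lookup step: the device is looked up by
 * the part of the request name before the first colon. */
static void lookup_name(const char *ifr_name, char *dev_name)
{
	char *colon;

	strncpy(dev_name, ifr_name, IFNAMSIZ - 1);
	dev_name[IFNAMSIZ - 1] = '\0';
	colon = strchr(dev_name, ':');
	if (colon)
		*colon = '\0';
}

/* Hypothetical label selection, shaped like the removed if/else:
 * request name when it has a colon, device name otherwise. */
static void pick_label(const char *ifr_name, char *label)
{
	char dev_name[IFNAMSIZ];

	lookup_name(ifr_name, dev_name);
	if (strchr(ifr_name, ':'))
		strncpy(label, ifr_name, IFNAMSIZ - 1);
	else
		strncpy(label, dev_name, IFNAMSIZ - 1);
	label[IFNAMSIZ - 1] = '\0';
}

/* Walk through both cases: alias name and plain device name. */
static void label_demo(void)
{
	char dev_name[IFNAMSIZ];
	char label[IFNAMSIZ];

	lookup_name("eth0:1", dev_name);
	assert(strcmp(dev_name, "eth0") == 0);	/* colon part trimmed */

	pick_label("eth0:1", label);
	assert(strcmp(label, "eth0:1") == 0);	/* label keeps the colon */

	pick_label("eth0", label);
	assert(strcmp(label, "eth0") == 0);	/* no colon: names agree */
}
```

Whether a single copy from the request name is equivalent therefore
depends on what ifr.ifr_name holds at the point of the copy, which this
sketch does not model.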

^ permalink raw reply

* [PATCH iproute2] ifstat/nstat: fix help output alignment
From: Mike Frysinger @ 2016-11-16  3:34 UTC (permalink / raw)
  To: stephen.hemminger, netdev

Some lines use tabs while others use spaces.  Use spaces everywhere.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 misc/ifstat.c | 24 ++++++++++++------------
 misc/nstat.c  | 22 +++++++++++-----------
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index d55197375e3c..92d67b0c5fd1 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -662,18 +662,18 @@ static void usage(void)
 {
 	fprintf(stderr,
 "Usage: ifstat [OPTION] [ PATTERN [ PATTERN ] ]\n"
-"   -h, --help		this message\n"
-"   -a, --ignore	ignore history\n"
-"   -d, --scan=SECS	sample every statistics every SECS\n"
-"   -e, --errors	show errors\n"
-"   -j, --json          format output in JSON\n"
-"   -n, --nooutput	do history only\n"
-"   -p, --pretty        pretty print\n"
-"   -r, --reset		reset history\n"
-"   -s, --noupdate	don\'t update history\n"
-"   -t, --interval=SECS	report average over the last SECS\n"
-"   -V, --version	output version information\n"
-"   -z, --zeros		show entries with zero activity\n");
+"   -h, --help           this message\n"
+"   -a, --ignore         ignore history\n"
+"   -d, --scan=SECS      sample every statistics every SECS\n"
+"   -e, --errors         show errors\n"
+"   -j, --json           format output in JSON\n"
+"   -n, --nooutput       do history only\n"
+"   -p, --pretty         pretty print\n"
+"   -r, --reset          reset history\n"
+"   -s, --noupdate       don't update history\n"
+"   -t, --interval=SECS  report average over the last SECS\n"
+"   -V, --version        output version information\n"
+"   -z, --zeros          show entries with zero activity\n");
 
 	exit(-1);
 }
diff --git a/misc/nstat.c b/misc/nstat.c
index 1cb6c7eea27a..1212b1f2c812 100644
--- a/misc/nstat.c
+++ b/misc/nstat.c
@@ -526,17 +526,17 @@ static void usage(void)
 {
 	fprintf(stderr,
 "Usage: nstat [OPTION] [ PATTERN [ PATTERN ] ]\n"
-"   -h, --help		this message\n"
-"   -a, --ignore	ignore history\n"
-"   -d, --scan=SECS	sample every statistics every SECS\n"
-"   -j, --json          format output in JSON\n"
-"   -n, --nooutput	do history only\n"
-"   -p, --pretty        pretty print\n"
-"   -r, --reset		reset history\n"
-"   -s, --noupdate	don\'t update history\n"
-"   -t, --interval=SECS	report average over the last SECS\n"
-"   -V, --version	output version information\n"
-"   -z, --zeros		show entries with zero activity\n");
+"   -h, --help           this message\n"
+"   -a, --ignore         ignore history\n"
+"   -d, --scan=SECS      sample every statistics every SECS\n"
+"   -j, --json           format output in JSON\n"
+"   -n, --nooutput       do history only\n"
+"   -p, --pretty         pretty print\n"
+"   -r, --reset          reset history\n"
+"   -s, --noupdate       don't update history\n"
+"   -t, --interval=SECS  report average over the last SECS\n"
+"   -V, --version        output version information\n"
+"   -z, --zeros          show entries with zero activity\n");
 	exit(-1);
 }
 
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCH net-next v8 0/9] dpaa_eth: Add the QorIQ DPAA Ethernet driver
From: David Miller @ 2016-11-16  3:34 UTC (permalink / raw)
  To: madalin.bucur
  Cc: netdev, linuxppc-dev, linux-kernel, oss, ppc, joe, pebolle,
	joakim.tjernlund
In-Reply-To: <1479199269-9748-1-git-send-email-madalin.bucur@nxp.com>

From: Madalin Bucur <madalin.bucur@nxp.com>
Date: Tue, 15 Nov 2016 10:41:00 +0200

> This patch series adds the Ethernet driver for the Freescale
> QorIQ Data Path Acceleration Architecture (DPAA).

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net: marvell: Allow drivers to be built with COMPILE_TEST
From: kbuild test robot @ 2016-11-16  3:35 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: kbuild-all, netdev, davem, mw, arnd, gregory.clement, Shaohui.Xie,
	Igal.Liberman, Florian Fainelli
In-Reply-To: <20161115173548.32567-3-f.fainelli@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3163 bytes --]

Hi Florian,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-ethernet-Allow-Marvell-Freescale-to-COMPILE_TEST/20161116-024633
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `.mvneta_bm_pool_use':
>> (.text+0x19a18cc): undefined reference to `.mvebu_mbus_get_dram_win_info'
   drivers/built-in.o: In function `.brcmf_create_iovar.constprop.1':
   fwil.c:(.text+0x1ff95f0): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.memcpy' defined in .text section in arch/powerpc/lib/built-in.o
   fwil.c:(.text+0x1ff9614): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.memcpy' defined in .text section in arch/powerpc/lib/built-in.o
   drivers/built-in.o: In function `.brcmf_create_bsscfg.constprop.0':
   fwil.c:(.text+0x1ff976c): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.memcpy' defined in .text section in arch/powerpc/lib/built-in.o
   fwil.c:(.text+0x1ff9790): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.memcpy' defined in .text section in arch/powerpc/lib/built-in.o
   drivers/built-in.o: In function `.brcmf_fil_cmd_data_get':
   (.text+0x1ff9c7c): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.debug_lockdep_rcu_enabled' defined in .text section in kernel/built-in.o
   drivers/built-in.o: In function `.brcmf_fil_cmd_data_get':
   (.text+0x1ff9dc4): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.debug_lockdep_rcu_enabled' defined in .text section in kernel/built-in.o
   drivers/built-in.o: In function `.brcmf_fil_iovar_data_set':
   (.text+0x1ffa1ac): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.debug_lockdep_rcu_enabled' defined in .text section in kernel/built-in.o
   drivers/built-in.o: In function `.brcmf_fil_iovar_data_set':
   (.text+0x1ffa2f4): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.debug_lockdep_rcu_enabled' defined in .text section in kernel/built-in.o
   drivers/built-in.o: In function `.brcmf_fil_iovar_data_get':
   (.text+0x1ffa51c): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.memcpy' defined in .text section in arch/powerpc/lib/built-in.o
   drivers/built-in.o: In function `.brcmf_fil_iovar_data_get':
   (.text+0x1ffa5fc): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.debug_lockdep_rcu_enabled' defined in .text section in kernel/built-in.o
   drivers/built-in.o: In function `.brcmf_fil_iovar_data_get':
   (.text+0x1ffa744): additional relocation overflows omitted from the output

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 51094 bytes --]

^ permalink raw reply

* Re: [PATCH net] rtnetlink: fix rtnl_vfinfo_size
From: David Miller @ 2016-11-16  3:42 UTC (permalink / raw)
  To: sd; +Cc: netdev, eranbe, hadarh, ogerlitz, sucheta.chakraborty
In-Reply-To: <188991b3f1ede7ff1231f78411f4716cb5926950.1479202614.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 15 Nov 2016 10:39:03 +0100

> The size reported by rtnl_vfinfo_size doesn't match the space used by
> rtnl_fill_vfinfo.
> 
> rtnl_vfinfo_size currently doesn't account for the nest attributes
> used by statistics (added in commit 3b766cd83232), nor for struct
> ifla_vf_tx_rate (since commit ed616689a3d9, which added ifla_vf_rate
> to the dump without removing ifla_vf_tx_rate, but replaced
> ifla_vf_tx_rate with ifla_vf_rate in the size computation).
> 
> Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
> Fixes: ed616689a3d9 ("net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] rtnetlink: fix rtnl message size computation for XDP
From: David Miller @ 2016-11-16  3:43 UTC (permalink / raw)
  To: sd; +Cc: netdev, bblanco
In-Reply-To: <0169d2419991a4ba533087a4fda70420ccdaca30.1479204870.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 15 Nov 2016 11:16:35 +0100

> rtnl_xdp_size() only considers the size of the actual payload attribute,
> and misses the space taken by the attribute used for nesting (IFLA_XDP).
> 
> Fixes: d1fdd9138682 ("rtnl: add option for setting link xdp prog")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>

Applied and queued up for -stable.
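
The size mismatch described in the commit message can be sketched in
userspace C. The macros below re-implement the netlink attribute
alignment rules (4-byte header, 4-byte alignment); the one-byte payload
is an assumption chosen for illustration, and the two xdp_size_*()
helpers are hypothetical, not the kernel's rtnl_xdp_size().

```c
#include <assert.h>
#include <stddef.h>

#define NLA_ALIGNTO	4
#define NLA_ALIGN(len)	(((len) + NLA_ALIGNTO - 1) & ~(size_t)(NLA_ALIGNTO - 1))
#define NLA_HDRLEN	((size_t)4)	/* aligned size of the attribute header */

/* Space one attribute takes in a message: header plus payload, padded. */
static size_t nla_total_size(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Hypothetical: counts only the payload attribute (assumed 1 byte). */
static size_t xdp_size_payload_only(void)
{
	return nla_total_size(1);
}

/* Hypothetical: also counts the empty attribute that opens the nest. */
static size_t xdp_size_with_nest(void)
{
	return nla_total_size(0) + nla_total_size(1);
}

static void nla_size_demo(void)
{
	assert(nla_total_size(0) == 4);		/* header only */
	assert(nla_total_size(1) == 8);		/* 5 bytes padded to 8 */
	assert(xdp_size_payload_only() == 8);
	assert(xdp_size_with_nest() == 12);	/* nest header adds 4 bytes */
}
```

Under these assumptions the payload-only estimate is 4 bytes short, which
is the space taken by the nesting attribute's header.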

^ permalink raw reply

* Re: [PATCH net-next v3 0/5] Adding PHY-Tunables and downshift support
From: David Miller @ 2016-11-16  3:44 UTC (permalink / raw)
  To: allan.nielsen; +Cc: netdev, andrew, raju.lakkaraju
In-Reply-To: <1479205204-27768-1-git-send-email-allan.nielsen@microsemi.com>

From: "Allan W. Nielsen" <allan.nielsen@microsemi.com>
Date: Tue, 15 Nov 2016 11:19:59 +0100

> Old cover letters included below.

Please do not form your cover letter by quoting older and older
cover letters.

Always write a clean, single, cover letter with a changelog of
changes between each and every version leading up to the
current one.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next V2 0/9] alx: add multi queue support
From: David Miller @ 2016-11-16  3:46 UTC (permalink / raw)
  To: tobias.regnery; +Cc: netdev, jcliburn, chris.snook
In-Reply-To: <cover.1479208627.git.tobias.regnery@gmail.com>

From: Tobias Regnery <tobias.regnery@gmail.com>
Date: Tue, 15 Nov 2016 12:43:07 +0100

> This patchset lays the groundwork for multi queue support in the alx driver
> and enables multi queue support for the tx path by default. The hardware
> supports up to 4 tx queues. 

Series applied, thanks.

^ permalink raw reply

* RE: [PATCH] net: ioctl SIOCSIFADDR minor cleanup
From: YUAN Linyu @ 2016-11-16  3:57 UTC (permalink / raw)
  To: David Miller; +Cc: cugyly@163.com, netdev@vger.kernel.org
In-Reply-To: <20161115.223111.734917440131578613.davem@davemloft.net>

No, this patch does not change dev->name.
It only concerns ifa->ifa_label.
> -			if (colon)
> -				memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ);
> -			else
> -				memcpy(ifa->ifa_label, dev->name, IFNAMSIZ);
When ifr.ifr_name has no colon, dev->name must be equal to ifr.ifr_name.
So the else branch can be changed to:
> -			else
> -				memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ);
Then the if and else branches do the same thing, so a single line is enough:
memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ);

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, November 16, 2016 11:31 AM
> To: YUAN Linyu
> Cc: cugyly@163.com; netdev@vger.kernel.org
> Subject: Re: [PATCH] net: ioctl SIOCSIFADDR minor cleanup
> 
> From: YUAN Linyu <Linyu.Yuan@alcatel-sbell.com.cn>
> Date: Wed, 16 Nov 2016 03:13:31 +0000
> 
> > So assigning the request name to the label does the same thing as the original code.
> 
> Nope.  dev->name does not have the colon, it was trimmed from the
> string for the device lookup.  So the found device's dev->name does
> not have the colon character, even if it was in ifr.ifr_name
> 
> This was my entire point.
> 
> You are changing the behavior of the code in an invalid way.

^ permalink raw reply

* Re: [RFC PATCH 1/2] net: use cmpxchg instead of spinlock in ptr rings
From: John Fastabend @ 2016-11-16  4:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: jasowang, netdev, linux-kernel
In-Reply-To: <20161115002552-mutt-send-email-mst@kernel.org>

On 16-11-14 03:01 PM, Michael S. Tsirkin wrote:
> On Thu, Nov 10, 2016 at 08:44:08PM -0800, John Fastabend wrote:
>>
>> ---
>>  include/linux/ptr_ring_ll.h |  136 +++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/skb_array.h   |   25 ++++++++
>>  2 files changed, 161 insertions(+)
>>  create mode 100644 include/linux/ptr_ring_ll.h
>>
>> diff --git a/include/linux/ptr_ring_ll.h b/include/linux/ptr_ring_ll.h
>> new file mode 100644
>> index 0000000..bcb11f3
>> --- /dev/null
>> +++ b/include/linux/ptr_ring_ll.h
>> @@ -0,0 +1,136 @@
>> +/*
>> + *	Definitions for the 'struct ptr_ring_ll' datastructure.
>> + *
>> + *	Author:
>> + *		John Fastabend <john.r.fastabend@intel.com>
>> + *
>> + *	Copyright (C) 2016 Intel Corp.
>> + *
>> + *	This program is free software; you can redistribute it and/or modify it
>> + *	under the terms of the GNU General Public License as published by the
>> + *	Free Software Foundation; either version 2 of the License, or (at your
>> + *	option) any later version.
>> + *
>> + *	This is a limited-size FIFO maintaining pointers in FIFO order, with
>> + *	one CPU producing entries and another consuming entries from a FIFO.
>> + *	extended from ptr_ring_ll to use cmpxchg over spin lock.
> 
> So when is each one (ptr-ring/ptr-ring-ll) a win? _ll suffix seems to
> imply this gives a better latency, OTOH for a ping/pong I suspect
> ptr-ring would be better as it avoids index cache line bounces.

My observation under qdisc testing with pktgen is that I get better pps
numbers with this code vs ptr_ring using spinlock. I actually wrote
this implementation before the skb_array code was around though and
haven't done a thorough analysis of the two yet, only pktgen benchmarks.

In my pktgen benchmarks I run 1:1 producer/consumer and many-to-one
producer/consumer tests. I'll post some numbers later this week.

[...]

>> + */
>> +static inline int __ptr_ring_ll_produce(struct ptr_ring_ll *r, void *ptr)
>> +{
>> +	u32 ret, head, tail, next, slots, mask;
>> +
>> +	do {
>> +		head = READ_ONCE(r->prod_head);
>> +		mask = READ_ONCE(r->prod_mask);
>> +		tail = READ_ONCE(r->cons_tail);
>> +
>> +		slots = mask + tail - head;
>> +		if (slots < 1)
>> +			return -ENOMEM;
>> +
>> +		next = head + 1;
>> +		ret = cmpxchg(&r->prod_head, head, next);
>> +	} while (ret != head);
> 
> 
> So why is this preferable to a lock?
> 
> I suspect it's nothing else than the qspinlock fairness
> and polling code complexity. It's all not very useful if you
> 1. are just doing a couple of instructions under the lock
> and
> 2. use a finite FIFO which is unfair anyway
> 
> 
> How about this hack (lifted from virt_spin_lock):
> 
> static inline void quick_spin_lock(struct qspinlock *lock)
> {
>         do {
>                 while (atomic_read(&lock->val) != 0)
>                         cpu_relax();
>         } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> }
> 
> Or maybe we should even drop the atomic_read in the middle -
> worth profiling and comparing:
> 
> static inline void quick_spin_lock(struct qspinlock *lock)
> {
>         while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
> 		cpu_relax();
> }
> 
> 
> Then, use quick_spin_lock instead of spin_lock everywhere in
> ptr_ring - will that make it more efficient?
> 

I think this could be the case. I'll give it a test later this week; I
am working on the xdp bits for virtio at the moment. To be honest though,
for my qdisc patchset I first need to resolve a bug and then probably in
the first set just use the existing skb_array implementation. It's fun
to micro-optimize this stuff but really any implementation will show
improvement over existing code.

Thanks,
John
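
The quick_spin_lock() idea quoted above (spin on a plain read, then try a
compare-and-exchange) can be tried outside the kernel with C11 atomics.
This is a minimal userspace sketch, not the kernel qspinlock: struct
quick_lock and its helpers are hypothetical names, the value 1 stands in
for _Q_LOCKED_VAL, and the cpu_relax() hint is left out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace lock word: 0 means free, 1 means held. */
struct quick_lock {
	atomic_int val;
};

static inline void quick_lock_init(struct quick_lock *l)
{
	atomic_init(&l->val, 0);
}

/* Test-and-test-and-set: wait on a relaxed read, then attempt the CAS. */
static inline void quick_lock_acquire(struct quick_lock *l)
{
	int expected;

	do {
		while (atomic_load_explicit(&l->val, memory_order_relaxed) != 0)
			;	/* busy-wait; a real lock would relax the CPU here */
		expected = 0;
	} while (!atomic_compare_exchange_weak_explicit(&l->val, &expected, 1,
							memory_order_acquire,
							memory_order_relaxed));
}

/* Single attempt; returns nonzero when the lock was taken. */
static inline int quick_lock_try(struct quick_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->val, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static inline void quick_lock_release(struct quick_lock *l)
{
	atomic_store_explicit(&l->val, 0, memory_order_release);
}

/* Single-threaded walk through the lock states. */
static void quick_lock_demo(void)
{
	struct quick_lock l;
	int got;

	quick_lock_init(&l);
	quick_lock_acquire(&l);
	got = quick_lock_try(&l);
	assert(!got);		/* already held */
	quick_lock_release(&l);
	got = quick_lock_try(&l);
	assert(got);		/* free again */
	quick_lock_release(&l);
}
```

The second variant in the quoted mail drops the inner read loop and
retries the compare-and-exchange directly; which one performs better
would have to be measured, as the mail itself says.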

^ permalink raw reply

* Re: [RFC PATCH 1/2] net: use cmpxchg instead of spinlock in ptr rings
From: John Fastabend @ 2016-11-16  4:37 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev@vger.kernel.org; +Cc: Michael S. Tsirkin
In-Reply-To: <20161115143258.2c46fc9a@redhat.com>

On 16-11-15 05:32 AM, Jesper Dangaard Brouer wrote:
> 
> (looks like my message didn't reach the netdev list, due to me sending
> from the wrong email, forwarded message again):
> 
> On Thu, 10 Nov 2016 20:44:08 -0800 John Fastabend <john.fastabend@gmail.com> wrote:
> 
>> ---
>>  include/linux/ptr_ring_ll.h |  136 +++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/skb_array.h   |   25 ++++++++
>>  2 files changed, 161 insertions(+)
>>  create mode 100644 include/linux/ptr_ring_ll.h
>>
>> diff --git a/include/linux/ptr_ring_ll.h b/include/linux/ptr_ring_ll.h
>> new file mode 100644
>> index 0000000..bcb11f3
>> --- /dev/null
>> +++ b/include/linux/ptr_ring_ll.h
>> @@ -0,0 +1,136 @@
>> +/*
>> + *	Definitions for the 'struct ptr_ring_ll' datastructure.
>> + *
>> + *	Author:
>> + *		John Fastabend <john.r.fastabend@intel.com>  
> [...]
>> + *
>> + *	This is a limited-size FIFO maintaining pointers in FIFO order, with
>> + *	one CPU producing entries and another consuming entries from a FIFO.
>> + *	extended from ptr_ring_ll to use cmpxchg over spin lock.  
> 
> It sounds like this is Single Producer Single Consumer (SPSC)
> implementation, but your implementation actually is Multi Producer
> Multi Consumer (MPMC) capable.

Correct, qdisc requires an MPMC queue to handle all the OOO cases.

> 
> The implementation looks a lot like my alf_queue[1] implementation:
>  [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/include/linux/alf_queue.h
> 

Sure, I was using that implementation originally.

> If the primary use-case is one CPU producing and another consuming,
> then the normal ptr_ring (skb_array) will actually be faster!
> 
> The reason is ptr_ring avoids bouncing a cache-line between the CPUs on
> every ring access.  This is achieved by having the checks for full
> (__ptr_ring_full) and empty (__ptr_ring_empty) use the contents of the
> array (NULL value).
> 
> I actually implemented two micro-benchmarks to measure the difference
> between skb_array[2] and alf_queue[3]:
>  [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_parallel01.c
>  [3] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/alf_queue_parallel01.c
> 

But :) this doesn't jibe with my experiments where this implementation
was actually giving better numbers with pktgen over pfifo_fast even in
the SPSC case. I'll rerun metrics later this week; it's possible there was
some other issue causing the difference I guess.

As I noted in Michael's email though really I need to fix a bug in my
qdisc code and submit it before I worry too much about this
optimization.

> 
>> + */
>> +
>> +#ifndef _LINUX_PTR_RING_LL_H
>> +#define _LINUX_PTR_RING_LL_H 1
>> +  
> [...]
>> +
>> +struct ptr_ring_ll {
>> +	u32 prod_size;
>> +	u32 prod_mask;
>> +	u32 prod_head;
>> +	u32 prod_tail;
>> +	u32 cons_size;
>> +	u32 cons_mask;
>> +	u32 cons_head;
>> +	u32 cons_tail;
>> +
>> +	void **queue;
>> +};  
> 
> Your implementation doesn't even split the consumer and producer into
> different cachelines (which in practice doesn't help much due to how
> the empty/full checks are performed).

It was just an implementation to get the qdisc patches off the ground. I
expected to follow up with patches to optimize the implementation.


[...]

>> +static inline int ptr_ring_ll_init(struct ptr_ring_ll *r, int size, gfp_t gfp)
>> +{
>> +	r->queue = __ptr_ring_init_queue_alloc(size, gfp);
>> +	if (!r->queue)
>> +		return -ENOMEM;
>> +
>> +	r->prod_size = r->cons_size = size;
>> +	r->prod_mask = r->cons_mask = size - 1;  
> 
> Shouldn't we have some check like is_power_of_2(size), as this code
> looks like it depend on this.
> 

Sure, it is required. I was just ensuring callers do it correctly.

>> +	r->prod_tail = r->prod_head = 0;
>> +	r->cons_tail = r->prod_tail = 0;
>> +
>> +	return 0;
>> +}
>> +  

[...]

>>  
>> +static inline struct sk_buff *skb_array_ll_consume(struct skb_array_ll *a)
>> +{
>> +	return __ptr_ring_ll_consume(&a->ring);
>> +}
>> +  
> 
> Note in the Multi Producer Multi Consumer (MPMC) use-case this type of
> queue can be faster than normal ptr_ring.  And in patch2 you implement
> bulking, which is where the real benefit shows (in the MPMC case) for
> this kind of queue.
> 
> What I would really like to see is a lock-free (locked cmpxchg) queue
> implementation, what like ptr_ring use the array as empty/full check,
> and still (somehow) support bulking.
> 

OK perhaps worth experimenting with after if I can _finally_ get the
qdisc series in.

.John
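
Two points from the review can be sketched together in userspace C: the
full/empty test that looks at the slot contents (NULL means free) instead
of comparing shared head/tail indices, and the power-of-two check on the
ring size that the mask arithmetic depends on. This is a single-threaded
illustration with no memory barriers; struct null_ring and its helpers
are hypothetical names, not the kernel's ptr_ring API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ring: a NULL slot is free, a non-NULL slot is occupied. */
struct null_ring {
	void **queue;
	unsigned int mask;	/* size - 1, valid only for power-of-two sizes */
	unsigned int producer;	/* touched only by the producing side */
	unsigned int consumer;	/* touched only by the consuming side */
};

static bool is_pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

static int null_ring_init(struct null_ring *r, unsigned int size)
{
	if (!is_pow2(size))
		return -EINVAL;	/* index masking would skip or alias slots */
	r->queue = calloc(size, sizeof(*r->queue));
	if (!r->queue)
		return -ENOMEM;
	r->mask = size - 1;
	r->producer = 0;
	r->consumer = 0;
	return 0;
}

/* Full check reads only the producer's own slot, not the consumer index. */
static int null_ring_produce(struct null_ring *r, void *ptr)
{
	if (!ptr)
		return -EINVAL;	/* NULL is reserved as the "free" marker */
	if (r->queue[r->producer])
		return -ENOSPC;
	r->queue[r->producer] = ptr;
	r->producer = (r->producer + 1) & r->mask;
	return 0;
}

/* Empty check reads only the consumer's own slot, not the producer index. */
static void *null_ring_consume(struct null_ring *r)
{
	void *ptr = r->queue[r->consumer];

	if (!ptr)
		return NULL;
	r->queue[r->consumer] = NULL;
	r->consumer = (r->consumer + 1) & r->mask;
	return ptr;
}

static void null_ring_demo(void)
{
	struct null_ring r;
	int items[5] = { 0, 1, 2, 3, 4 };
	void *got;
	int err, i;

	err = null_ring_init(&r, 3);
	assert(err == -EINVAL);		/* not a power of two */
	err = null_ring_init(&r, 4);
	assert(err == 0);

	for (i = 0; i < 4; i++) {
		err = null_ring_produce(&r, &items[i]);
		assert(err == 0);
	}
	err = null_ring_produce(&r, &items[4]);
	assert(err == -ENOSPC);		/* all four slots occupied */

	for (i = 0; i < 4; i++) {
		got = null_ring_consume(&r);
		assert(got == &items[i]);	/* FIFO order */
	}
	got = null_ring_consume(&r);
	assert(got == NULL);		/* empty again */
	free(r.queue);
}
```

Because each side only reads and writes its own index, the only shared
data is the slot array itself, which is the property the review credits
for avoiding a cache-line bounce on every access.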

^ permalink raw reply

* Re: [RFC PATCH 2/2] ptr_ring_ll: pop/push multiple objects at once
From: John Fastabend @ 2016-11-16  4:42 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: jasowang, netdev, linux-kernel
In-Reply-To: <20161115010140-mutt-send-email-mst@kernel.org>

On 16-11-14 03:06 PM, Michael S. Tsirkin wrote:
> On Thu, Nov 10, 2016 at 08:44:32PM -0800, John Fastabend wrote:
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> 
> This will naturally reduce the cache line bounce
> costs, but so will a _many API for ptr-ring,
> doing lock-add many-unlock.
> 
> the number of atomics also scales better with the lock:
> one per push instead of one per queue.
> 
> Also, when can qdisc use a _many operation?
> 

On dequeue we can pull off many skbs instead of one at a time and
then either (a) pass them down as an array to the driver (I started
to write this on top of ixgbe and it seems like a win) or (b) pass
them one by one down to the driver and set the xmit_more bit correctly.

The pass one by one also seems like a win because we avoid the lock
per skb.

On the enqueue qdisc side it's a bit more invasive to start doing this.

[...]

>> +++ b/net/sched/sch_generic.c
>> @@ -571,7 +571,7 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
>>  	struct skb_array_ll *q = band2list(priv, band);
>>  	int err;
>>  
>> -	err = skb_array_ll_produce(q, skb);
>> +	err = skb_array_ll_produce(q, &skb);
>>  
>>  	if (unlikely(err)) {
>>  		net_warn_ratelimited("drop a packet from fast enqueue\n");
> 
> I don't see a pop many operation here.
> 

The patches need a bit of cleanup; it looks like it was part of another patch.

.John

^ permalink raw reply

* Re: [RFC PATCH 2/2] ptr_ring_ll: pop/push multiple objects at once
From: Michael S. Tsirkin @ 2016-11-16  5:23 UTC (permalink / raw)
  To: John Fastabend; +Cc: jasowang, netdev, linux-kernel
In-Reply-To: <582BE39B.9050007@gmail.com>

On Tue, Nov 15, 2016 at 08:42:03PM -0800, John Fastabend wrote:
> On 16-11-14 03:06 PM, Michael S. Tsirkin wrote:
> > On Thu, Nov 10, 2016 at 08:44:32PM -0800, John Fastabend wrote:
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > 
> > This will naturally reduce the cache line bounce
> > costs, but so will a _many API for ptr-ring,
> > doing lock-add many-unlock.
> > 
> > the number of atomics also scales better with the lock:
> > one per push instead of one per queue.
> > 
> > Also, when can qdisc use a _many operation?
> > 
> 
> On dequeue we can pull off many skbs instead of one at a time and
> then either (a) pass them down as an array to the driver (I started
> to write this on top of ixgbe and it seems like a win) or (b) pass
> them one by one down to the driver and set the xmit_more bit correctly.
> 
> The pass one by one also seems like a win because we avoid the lock
> per skb.
> 
> On the enqueue qdisc side it's a bit more invasive to start doing this.
> 
> 
> [...]

I see. So we could wrap __ptr_ring_consume and
implement __skb_array_consume. You can call that
in a loop under a lock. I would limit it to something
small like 16 pointers, to make sure lock contention is
not an issue.
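
A rough userspace sketch of that idea, under stated assumptions: the
ring below is a hypothetical stand-in that, like ptr_ring, treats a NULL
slot as empty; the spinlock is a plain C11 atomic_flag rather than a
kernel spinlock; and ring_consume_one_locked / ring_consume_batch are
made-up names, not the real __ptr_ring_consume / __skb_array_consume.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 64	/* must be a power of two */
#define BATCH_MAX 16	/* small cap, to keep lock hold times short */

/* Hypothetical ring: a NULL slot means "empty". */
struct ring_sketch {
	atomic_flag lock;
	unsigned int consumer;
	void *queue[RING_SIZE];
};

static void ring_lock(struct ring_sketch *r)
{
	while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void ring_unlock(struct ring_sketch *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Unlocked single-entry consume; the caller must hold the lock. */
static void *ring_consume_one_locked(struct ring_sketch *r)
{
	void *ptr = r->queue[r->consumer];

	if (ptr) {
		r->queue[r->consumer] = NULL;
		r->consumer = (r->consumer + 1) & (RING_SIZE - 1);
	}
	return ptr;
}

/* Pop up to n entries (capped at BATCH_MAX) with one lock/unlock pair. */
static size_t ring_consume_batch(struct ring_sketch *r, void **out, size_t n)
{
	size_t i;

	assert(r && out);

	if (n > BATCH_MAX)
		n = BATCH_MAX;

	ring_lock(r);
	for (i = 0; i < n; i++) {
		out[i] = ring_consume_one_locked(r);
		if (!out[i])
			break;	/* ring drained */
	}
	ring_unlock(r);

	return i;
}
```

The caller gets back at most 16 pointers per call and pays for the lock
once per batch instead of once per entry.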

-- 
MST

^ permalink raw reply

* Re: [PATCH] igmp: Make igmp group member RFC 3376 compliant
From: Hangbin Liu @ 2016-11-16  6:20 UTC (permalink / raw)
  To: Michal Tesar; +Cc: David Miller, kuznet, jmorris, kaber, netdev
In-Reply-To: <20161108092625.GA14456@sparky-lenivo.brq.redhat.com>

Hi David,

On Tue, Nov 08, 2016 at 10:26:25AM +0100, Michal Tesar wrote:
> On Mon, Nov 07, 2016 at 08:13:45PM -0500, David Miller wrote:
> 
> > From: Michal Tesar <mtesar@redhat.com>
> > Date: Thu, 3 Nov 2016 10:38:34 +0100
> > 
> > >  2. If the received Query is a General Query, the interface timer is
> > >     used to schedule a response to the General Query after the
> > >     selected delay.  Any previously pending response to a General
> > >     Query is canceled.
> > > --8<--
> > > 
> > > Currently the timer is rearmed with new random expiration time for
> > > every incoming query regardless of possibly already pending report.
> > > Which is not aligned with the above RFC.
> > 
> > I don't read it that way.  #2 says if this is a general query then any
> > pending response to a general query is cancelled.  And that's
> > effectively what the code is doing right now.
> 
> Hi David,
> I think that it is important to notice that the RFC says also 
> that only the first matching rule is applied.
> 
> "
> When new Query with the Router-Alert option arrives on an
> interface, provided the system has state to report, a delay for a
> response is randomly selected in the range (0, [Max Resp Time]) where
> Max Resp Time is derived from Max Resp Code in the received Query
> message.  The following rules are then used to determine if a Report
> needs to be scheduled and the type of Report to schedule.  The rules
> are considered in order and only the first matching rule is applied.

 ^^

Would you reconsider this? I also agree with Michal that we need to
choose the sooner timer. Otherwise, if we receive queries in quick
succession, we will keep refreshing the timer and may never send the report.

Thanks
Hangbin

> 
> 1. If there is a pending response to a previous General Query
> scheduled sooner than the selected delay, no additional response
> needs to be scheduled.
> 
> 2. If the received Query is a General Query, the interface timer is
> used to schedule a response to the General Query after the
> selected delay.  Any previously pending response to a General
> Query is canceled.
> "
> 
> So I would read the above as follows:
> If a general query arrives and there is
> already a pending response scheduled sooner,
> no action is needed.
> That is how I understand rule [1].
> 
> But when a general query arrives and there is another
> response already pending, not sooner as rule one says but later,
> a new report should be scheduled and the already pending one
> needs to be canceled.
> That is how I understand rule [2].
> 
> If there is no pending report scheduled,
> the first part of rule [2] is applied
> and a new report is scheduled after the selected delay.
> 
> So basically we need to compare the already scheduled response exp time
> with the one coming in and choose the sooner one.
> This is exactly what the patch does.
> 
> What do you think?
> Best regards Michal Tesar
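
The reading of rules 1 and 2 described above can be sketched as a small
userspace function. This is only an illustration of the comparison, not
the code from the patch: gq_timer_sketch and gq_schedule_response are
hypothetical names, times are plain tick counts, and counter wraparound
is ignored (kernel code would compare jiffies with time_before()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: is a General Query response pending, and when does it fire? */
struct gq_timer_sketch {
	bool pending;
	unsigned long expires;	/* absolute time in arbitrary ticks */
};

/*
 * Apply rules 1 and 2 from the RFC 3376 text quoted above, in order.
 * Returns true if the timer was (re)armed, false if it was left alone.
 */
static bool gq_schedule_response(struct gq_timer_sketch *t,
				 unsigned long now, unsigned long delay)
{
	unsigned long new_expires = now + delay;

	assert(t);

	/* Rule 1: a pending response already fires sooner, so keep it. */
	if (t->pending && t->expires < new_expires)
		return false;

	/* Rule 2: cancel any pending response and schedule the new one. */
	t->pending = true;
	t->expires = new_expires;
	return true;
}
```

With this ordering a stream of queries can only move the expiry earlier,
so the report is eventually sent even when queries arrive faster than
the selected delays.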

^ permalink raw reply

* [PATCH net] virtio-net: add a missing synchronize_net()
From: Eric Dumazet @ 2016-11-16  6:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jason Wang, Michael S. Tsirkin

From: Eric Dumazet <edumazet@google.com>

It seems many drivers do not respect napi_hash_del() contract.

When napi_hash_del() is used before netif_napi_del(), an RCU grace
period is needed before freeing NAPI object.

Fixes: 91815639d880 ("virtio-net: rx busy polling support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fd8b1e62301f..7276d5a95bd0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1497,6 +1497,11 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 		netif_napi_del(&vi->rq[i].napi);
 	}

+	/* We called napi_hash_del() before netif_napi_del(),
+	 * we need to respect an RCU grace period before freeing vi->rq
+	 */
+	synchronize_net();
+
 	kfree(vi->rq);
 	kfree(vi->sq);
 }

^ permalink raw reply related

* Re: [PATCH net-next 2/5] net: ethoc: Implement ethtool::nway_reset
From: Tobias Klauser @ 2016-11-16  7:38 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, andrew, tremyfr, Colin Ian King, open list
In-Reply-To: <20161115191949.15361-3-f.fainelli@gmail.com>

On 2016-11-15 at 20:19:46 +0100, Florian Fainelli <f.fainelli@gmail.com> wrote:
> Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
> already using dev->phydev all over the place so this comes for free.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Tobias Klauser <tklauser@distanz.ch>

^ permalink raw reply

* Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs
From: Leon Romanovsky @ 2016-11-16  8:36 UTC (permalink / raw)
  To: Salil Mehta
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Huwei (Xavier),
	oulijun, mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Linuxarm,
	Zhangping (ZP)
In-Reply-To: <F4CC6FACFEB3C54C9141D49AD221F7F91A7A2371@lhreml503-mbx>

[-- Attachment #1: Type: text/plain, Size: 3338 bytes --]

On Tue, Nov 15, 2016 at 03:52:46PM +0000, Salil Mehta wrote:
> > -----Original Message-----
> > From: Leon Romanovsky [mailto:leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org]
> > Sent: Wednesday, November 09, 2016 7:22 AM
> > To: Salil Mehta
> > Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; Huwei (Xavier); oulijun;
> > mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> > netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linuxarm;
> > Zhangping (ZP)
> > Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> > allocating memory using APIs
> >
> > On Fri, Nov 04, 2016 at 04:36:25PM +0000, Salil Mehta wrote:
> > > From: "Wei Hu (Xavier)" <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > >
> > > This patch modified the logic of allocating memory using APIs in
> > > hns RoCE driver. We used kcalloc instead of kmalloc_array and
> > > bitmap_zero. And When kcalloc failed, call vzalloc to alloc
> > > memory.
> > >
> > > Signed-off-by: Wei Hu (Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > > Signed-off-by: Ping Zhang <zhangping5-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > > Signed-off-by: Salil Mehta  <salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > > ---
> > >  drivers/infiniband/hw/hns/hns_roce_mr.c |   15 ++++++++-------
> > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > index fb87883..d3dfb5f 100644
> > > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > @@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct
> > hns_roce_buddy *buddy, int max_order)
> > >
> > >  	for (i = 0; i <= buddy->max_order; ++i) {
> > >  		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > > -		buddy->bits[i] = kmalloc_array(s, sizeof(long),
> > GFP_KERNEL);
> > > -		if (!buddy->bits[i])
> > > -			goto err_out_free;
> > > -
> > > -		bitmap_zero(buddy->bits[i], 1 << (buddy->max_order - i));
> > > +		buddy->bits[i] = kcalloc(s, sizeof(long), GFP_KERNEL);
> > > +		if (!buddy->bits[i]) {
> > > +			buddy->bits[i] = vzalloc(s * sizeof(long));
> >
> > I wonder, why don't you use directly vzalloc instead of kcalloc
> > fallback?
> As we know we will have physical contiguous pages if the kcalloc
> call succeeds. This will give us a chance to have better performance
> over the allocations which are just virtually contiguous through the
> function vzalloc(). Therefore, the latter has only been used as a fallback
> when our memory request cannot be satisfied through kcalloc.
>
> Are you suggesting that there will not be much performance penalty
> if we use just vzalloc ?

Not exactly.
I asked because we have similar code in our drivers and this
construction looks strange to me.

1. If performance is critical, we will use kmalloc.
2. If performance is not critical, we will use vmalloc.

But in this case, such a construction tells me that we can live with
vmalloc performance and that kmalloc allocations are not really needed.

In your specific case, I'm not sure that kcalloc will ever fail.

Thanks


>
> >
> > > +			if (!buddy->bits[i])
> > > +				goto err_out_free;
> > > +		}
> > >  	}

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH net] net: nsid cannot be allocated for a dead netns
From: Nicolas Dichtel @ 2016-11-16  8:41 UTC (permalink / raw)
  To: avagin; +Cc: davem, netdev, xiyou.wangcong, Nicolas Dichtel
In-Reply-To: <CANaxB-yaizaZbUi-OaJLzy+k2fNOkTCb8L8uwBCw4L5axOCzdg@mail.gmail.com>

Andrei reports the following kmemleak error:
unreferenced object 0xffff91badb543950 (size 2096):
  comm "kworker/u4:0", pid 6, jiffies 4295152553 (age 28.418s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 cb 5f df ba 91 ff ff  .........._.....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffb1865bea>] kmemleak_alloc+0x4a/0xa0
    [<ffffffffb1243b38>] kmem_cache_alloc+0x128/0x280
    [<ffffffffb142f5ab>] idr_layer_alloc+0x2b/0x90
    [<ffffffffb142f9cd>] idr_get_empty_slot+0x34d/0x370
    [<ffffffffb142fa4e>] idr_alloc+0x5e/0x110
    [<ffffffffb170ac3d>] __peernet2id_alloc+0x6d/0x90
    [<ffffffffb170bda5>] peernet2id_alloc+0x55/0xb0
    [<ffffffffb1731216>] rtnl_fill_ifinfo+0xaa6/0x10a0
    [<ffffffffb1733073>] rtmsg_ifinfo_build_skb+0x73/0xd0
    [<ffffffffb17125d5>] rollback_registered_many+0x295/0x390
    [<ffffffffb1712765>] unregister_netdevice_many+0x25/0x80
    [<ffffffffb17138a5>] default_device_exit_batch+0x145/0x170
    [<ffffffffb170ae52>] ops_exit_list.isra.4+0x52/0x60
    [<ffffffffb170c17f>] cleanup_net+0x1bf/0x2a0
    [<ffffffffb10b616f>] process_one_work+0x1ff/0x660
    [<ffffffffb10b661e>] worker_thread+0x4e/0x480

There is no reason to try to allocate an nsid for a netns which is dying.

Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

Thank you for the report. Can you test this patch?

Regards,
Nicolas

 net/core/net_namespace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f61c0e02a413..f1340ed0d8df 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -159,6 +159,9 @@ static int alloc_netid(struct net *net, struct net *peer, int reqid)
 		max = reqid + 1;
 	}
 
+	if (!atomic_read(&net->count) || !atomic_read(&peer->count))
+		return -EINVAL;
+
 	return idr_alloc(&net->netns_ids, peer, min, max, GFP_ATOMIC);
 }
 
-- 
2.8.1

^ permalink raw reply related
