Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v3 8/8] ethtool: add compat for devlink info
From: David Miller @ 2019-01-31 17:23 UTC (permalink / raw)
  To: lkp
  Cc: jakub.kicinski, kbuild-all, netdev, oss-drivers, jiri, andrew,
	f.fainelli, mkubecek, eugenem, jonathan.lemon
In-Reply-To: <201902010037.2Jsqwihj%fengguang.wu@intel.com>

From: kbuild test robot <lkp@intel.com>
Date: Fri, 1 Feb 2019 00:19:33 +0800

> All errors (new ones prefixed by >>):
> 
>    m68k-linux-gnu-ld: drivers/rtc/proc.o: in function `is_rtc_hctosys.isra.0':
>    proc.c:(.text+0x178): undefined reference to `strcmp'
>    m68k-linux-gnu-ld: net/core/ethtool.o: in function `ethtool_get_drvinfo':
>>> ethtool.c:(.text+0xc08): undefined reference to `devlink_compat_running_version'

Missing string.h include perhaps?

^ permalink raw reply

* Re: [PATCH net] l2tp: copy 4 more bytes to linear part if necessary
From: David Miller @ 2019-01-31 17:20 UTC (permalink / raw)
  To: jian.w.wen; +Cc: netdev, gnault
In-Reply-To: <20190131071856.22120-1-jian.w.wen@oracle.com>

From: Jacob Wen <jian.w.wen@oracle.com>
Date: Thu, 31 Jan 2019 15:18:56 +0800

> The size of L2TPv2 header with all optional fields is 14 bytes.
> l2tp_udp_recv_core only moves 10 bytes to the linear part of a
> skb. This may lead to l2tp_recv_common read data outside of a skb.
> 
> This patch make sure that there is at least 14 bytes in the linear
> part of a skb to meet the maximum need of l2tp_udp_recv_core and
> l2tp_recv_common. The minimum size of both PPP HDLC-like frame and
> Ethernet frame is larger than 14 bytes, so we are safe to do so.
> 
> Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now.
> 
> Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
> Suggested-by: Guillaume Nault <gnault@redhat.com>
> Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net] rds: fix refcount bug in rds_sock_addref
From: Santosh Shilimkar @ 2019-01-31 17:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, syzbot, Sowmini Varadhan,
	rds-devel, Cong Wang
In-Reply-To: <20190131164710.230590-1-edumazet@google.com>

On 1/31/2019 8:47 AM, Eric Dumazet wrote:
> syzbot was able to catch a bug in rds [1]
> 
> The issue here is that the socket might be found in a hash table
> but that its refcount has already be set to 0 by another cpu.
> 
> We need to use refcount_inc_not_zero() to be safe here.
> 
> [1]
> 
> refcount_t: increment on 0; use-after-free.
> WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
> WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
>   panic+0x2cb/0x65c kernel/panic.c:214
>   __warn.cold+0x20/0x48 kernel/panic.c:571
>   report_bug+0x263/0x2b0 lib/bug.c:186
>   fixup_bug arch/x86/kernel/traps.c:178 [inline]
>   fixup_bug arch/x86/kernel/traps.c:173 [inline]
>   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
>   do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
>   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
> RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
> RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
> Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
> RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
> RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
> RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
> R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
> R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
>   sock_hold include/net/sock.h:647 [inline]
>   rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
>   rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
>   rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
>   rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
>   rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
>   rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
>   sock_sendmsg_nosec net/socket.c:621 [inline]
>   sock_sendmsg+0xdd/0x130 net/socket.c:631
>   __sys_sendto+0x387/0x5f0 net/socket.c:1788
>   __do_sys_sendto net/socket.c:1800 [inline]
>   __se_sys_sendto net/socket.c:1796 [inline]
>   __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
>   do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x458089
> Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
> RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
> RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
> R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff
> 
> Fixes: cc4dfb7f70a3 ("rds: fix two RCU related problems")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Cc: rds-devel@oss.oracle.com
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> ---
Looks good.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>


^ permalink raw reply

* Re: [PATCH net-next] nfp: use struct_size() in kzalloc()
From: Gustavo A. R. Silva @ 2019-01-31 17:15 UTC (permalink / raw)
  To: Joe Perches, Jakub Kicinski, David S. Miller
  Cc: oss-drivers, netdev, linux-kernel
In-Reply-To: <dc6d827c187cd228b917f8d9d984bd5ea578d712.camel@perches.com>

Hi Joe,

On 1/31/19 11:11 AM, Joe Perches wrote:
> On Wed, 2019-01-30 at 18:38 -0600, Gustavo A. R. Silva wrote:
>> One of the more common cases of allocation size calculations is finding
>> the size of a structure that has a zero-sized array at the end, along
>> with memory for some number of elements for that array. For example:
>>
>> struct foo {
>>     int stuff;
>>     struct boo entry[];
>> };
>>
>> instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
>>
>> Instead of leaving these open-coded and prone to type mistakes, we can
>> now use the new struct_size() helper:
>>
>> instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
>>
>> This code was detected with the help of Coccinelle.
> 
> Might be useful to augment the script to include cases
> where the computed size is saved to a temporary and
> that temporary is used ala:
> 

Yep. That's already on my list.

> https://patchwork.kernel.org/patch/10782453/
> 
> On Sat, 2019-01-26 at 20:42 +0800, YueHaibing wrote:
>> Use kmemdup rather than duplicating its implementation
> []
>> diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
> []
>> @@ -1196,13 +1196,9 @@ iwl_parse_nvm_mcc_info(struct device *dev, const struct iwl_cfg *cfg,
>>  	regd_to_copy = sizeof(struct ieee80211_regdomain) +
>>  		valid_rules * sizeof(struct ieee80211_reg_rule);
>> -	copy_rd = kzalloc(regd_to_copy, GFP_KERNEL);
>> -	if (!copy_rd) {
>> +	copy_rd = kmemdup(regd, regd_to_copy, GFP_KERNEL);
> 
> This should probably be
> 
> 	copy_rd = kmemdup(regd, struct_size(regd, reg_rules, valid_rules),
> 			  GFP_KERNEL);
> 

I agree.

Thanks
--
Gustavo

^ permalink raw reply

* Re: [PATCH net-next] nfp: use struct_size() in kzalloc()
From: Joe Perches @ 2019-01-31 17:11 UTC (permalink / raw)
  To: Gustavo A. R. Silva, Jakub Kicinski, David S. Miller
  Cc: oss-drivers, netdev, linux-kernel
In-Reply-To: <20190131003859.GA28539@embeddedor>

On Wed, 2019-01-30 at 18:38 -0600, Gustavo A. R. Silva wrote:
> One of the more common cases of allocation size calculations is finding
> the size of a structure that has a zero-sized array at the end, along
> with memory for some number of elements for that array. For example:
> 
> struct foo {
>     int stuff;
>     struct boo entry[];
> };
> 
> instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
> 
> Instead of leaving these open-coded and prone to type mistakes, we can
> now use the new struct_size() helper:
> 
> instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
> 
> This code was detected with the help of Coccinelle.

Might be useful to augment the script to include cases
where the computed size is saved to a temporary and
that temporary is used ala:

https://patchwork.kernel.org/patch/10782453/

On Sat, 2019-01-26 at 20:42 +0800, YueHaibing wrote:
> Use kmemdup rather than duplicating its implementation
[]
> diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
[]
> @@ -1196,13 +1196,9 @@ iwl_parse_nvm_mcc_info(struct device *dev, const struct iwl_cfg *cfg,
>  	regd_to_copy = sizeof(struct ieee80211_regdomain) +
>  		valid_rules * sizeof(struct ieee80211_reg_rule);
> -	copy_rd = kzalloc(regd_to_copy, GFP_KERNEL);
> -	if (!copy_rd) {
> +	copy_rd = kmemdup(regd, regd_to_copy, GFP_KERNEL);

This should probably be

	copy_rd = kmemdup(regd, struct_size(regd, reg_rules, valid_rules),
			  GFP_KERNEL);




^ permalink raw reply

* [PATCH] mlx4_ib: Increase the timeout for CM cache
From: Håkon Bugge @ 2019-01-31 17:09 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, linux-rdma, rds-devel, linux-kernel

Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VMs have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the book keeping
in a cache.

Following the RDMA CM protocol, it is clear when an entry has to
evicted form the cache. But life is not perfect, remote peers may die
or be rebooted. Hence, it's a timeout to wipe out a cache entry, when
the PF driver assumes the remote peer has gone.

We have experienced excessive amount of DREQ retries during fail-over
testing, when running with eight VMs per database server.

The problem has been reproduced in a bare-metal system using one VM
per physical node. In this environment, running 256 processes in each
VM, each process uses RDMA CM to create an RC QP between himself and
all (256) remote processes. All in all 16K QPs.

When tearing down these 16K QPs, excessive DREQ retries (and
duplicates) are observed. With some cat/paste/awk wizardry on the
infiniband_cm sysfs, we observe:

      dreq:       5007
cm_rx_msgs:
      drep:       3838
      dreq:      13018
       rep:       8128
       req:       8256
       rtu:       8256
cm_tx_msgs:
      drep:       8011
      dreq:      68856
       rep:       8256
       req:       8128
       rtu:       8128
cm_tx_retries:
      dreq:      60483

Note that the active/passive side is distributed.

Enabling pr_debug in cm.c gives tons of:

[171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave:
1,sl_cm_id: 0xd393089f} is NULL!

By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
tear-down phase of the application is reduced from 113 to 67
seconds. Retries/duplicates are also significantly reduced:

cm_rx_duplicates:
      dreq:       7726
[]
cm_tx_retries:
      drep:          1
      dreq:       7779

Increasing the timeout further didn't help, as these duplicates and
retries stem from a too short CMA timeout, which was 20 (~4 seconds)
on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
numbers fell down to about one hundred for both of them.

Adjustment of the CMA timeout is _not_ part of this commit.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/hw/mlx4/cm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
index fedaf8260105..8c79a480f2b7 100644
--- a/drivers/infiniband/hw/mlx4/cm.c
+++ b/drivers/infiniband/hw/mlx4/cm.c
@@ -39,7 +39,7 @@
 
 #include "mlx4_ib.h"
 
-#define CM_CLEANUP_CACHE_TIMEOUT  (5 * HZ)
+#define CM_CLEANUP_CACHE_TIMEOUT  (30 * HZ)
 
 struct id_map_entry {
 	struct rb_node node;
-- 
2.20.1


^ permalink raw reply related

* pull-request: ieee802154 for net 2019-01-31
From: Stefan Schmidt @ 2019-01-31 17:00 UTC (permalink / raw)
  To: davem; +Cc: linux-wpan, alex.aring, netdev

Hello Dave.

An update from ieee802154 for your *net* tree.

I waited a while to see if anything else comes up, but it seems this time
we only have one fixup patch for the -rc rounds.
Colin fixed some indentation in the mcr20a drivers. That's about it.

If there are any problems with taking these two before the final 5.0 let
me know.

Greetings from Brussels, FOSDEM ahead. :-)

regards
Stefan Schmidt

The following changes since commit 3aa9179b2dfe06d32d4248f07cf6dd005a964507:

  Merge branch 'stmmac-fixes' (2019-01-30 22:24:49 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan.git ieee802154-for-davem-2019-01-31

for you to fetch changes up to 34aaaac815d166daef361f49529f4c6b77da49f1:

  ieee802154: mcr20a: fix indentation, remove tabs (2019-01-31 17:42:05 +0100)

----------------------------------------------------------------
Colin Ian King (1):
      ieee802154: mcr20a: fix indentation, remove tabs

 drivers/net/ieee802154/mcr20a.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

^ permalink raw reply

* [PATCH v2] Revert "net: phy: marvell: avoid pause mode on SGMII-to-Copper for 88e151x"
From: Russell King @ 2019-01-31 16:59 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Heiner Kallweit; +Cc: David S. Miller, netdev

This reverts commit 6623c0fba10ef45b64ca213ad5dec926f37fa9a0.

The original diagnosis was incorrect: it appears that the NIC had
PHY polling mode enabled, which meant that it overwrote the PHYs
advertisement register during negotiation.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
v2: respun on top of net-next, but I've not yet been able to boot
 test this - which is unlikely to be for a while yet.

The extraneous whitespace change comes from the reversion of the
original patch - I added a "u32 pause" and blank line there.  The
patch converting to link modes removed the "u32 pause" but left
the blank line, causing a coding style issue.  It seems only right
that a reversion of the original commit fixes that too.

 drivers/net/phy/marvell.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 90f44ba8aca7..3ccba37bd6dd 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -842,7 +842,6 @@ static int m88e1510_config_init(struct phy_device *phydev)
 
 	/* SGMII-to-Copper mode initialization */
 	if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
-
 		/* Select page 18 */
 		err = marvell_set_page(phydev, 18);
 		if (err < 0)
@@ -865,21 +864,6 @@ static int m88e1510_config_init(struct phy_device *phydev)
 		err = marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
 		if (err < 0)
 			return err;
-
-		/* There appears to be a bug in the 88e1512 when used in
-		 * SGMII to copper mode, where the AN advertisement register
-		 * clears the pause bits each time a negotiation occurs.
-		 * This means we can never be truely sure what was advertised,
-		 * so disable Pause support.
-		 */
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
-				   phydev->supported);
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_Pause_BIT,
-				   phydev->supported);
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
-				   phydev->advertising);
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_Pause_BIT,
-				   phydev->advertising);
 	}
 
 	return m88e1318_config_init(phydev);
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH v2 bpf] bpf: selftests: handle sparse CPU allocations
From: Y Song @ 2019-01-31 16:48 UTC (permalink / raw)
  To: Martynas Pumputis; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <20190131091933.27764-1-m@lambda.lt>

On Thu, Jan 31, 2019 at 1:18 AM Martynas Pumputis <m@lambda.lt> wrote:
>
> Previously, bpf_num_possible_cpus() had a bug when calculating a
> number of possible CPUs in the case of sparse CPU allocations, as
> it was considering only the first range or element of
> /sys/devices/system/cpu/possible.
>
> E.g. in the case of "0,2-3" (CPU 1 is not available), the function
> returned 1 instead of 3.
>
> This patch fixes the function by making it parse all CPU ranges and
> elements.
>
> Signed-off-by: Martynas Pumputis <m@lambda.lt>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply

* Re: [PATCH net-next] r8169: improve WoL handling
From: David Miller @ 2019-01-31 16:48 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev
In-Reply-To: <ed1c07d6-839d-daad-0672-caa0fdf9b42d@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Thu, 31 Jan 2019 07:43:30 +0100

> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 3e650bd9e..2dab28115 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -1371,6 +1371,8 @@ static void rtl_link_chg_patch(struct rtl8169_private *tp)
>  
>  #define WAKE_ANY (WAKE_PHY | WAKE_MAGIC | WAKE_UCAST | WAKE_BCAST | WAKE_MCAST)
>  
> +/* Don't delete it completely, in case we need to re-enable it */
> +#if 0
>  static u32 __rtl8169_get_wol(struct rtl8169_private *tp)

Please don't do this.

The code lives forever in the GIT repository history and can be brought
back in at any time in the future.

^ permalink raw reply

* [PATCH net] rds: fix refcount bug in rds_sock_addref
From: Eric Dumazet @ 2019-01-31 16:47 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, syzbot, Sowmini Varadhan,
	Santosh Shilimkar, rds-devel, Cong Wang

syzbot was able to catch a bug in rds [1]

The issue here is that the socket might be found in a hash table
but that its refcount has already be set to 0 by another cpu.

We need to use refcount_inc_not_zero() to be safe here.

[1]

refcount_t: increment on 0; use-after-free.
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 panic+0x2cb/0x65c kernel/panic.c:214
 __warn.cold+0x20/0x48 kernel/panic.c:571
 report_bug+0x263/0x2b0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 fixup_bug arch/x86/kernel/traps.c:173 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
 sock_hold include/net/sock.h:647 [inline]
 rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
 rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
 rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
 rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
 rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
 rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xdd/0x130 net/socket.c:631
 __sys_sendto+0x387/0x5f0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto net/socket.c:1796 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458089
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff

Fixes: cc4dfb7f70a3 ("rds: fix two RCU related problems")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Cc: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/rds/bind.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/rds/bind.c b/net/rds/bind.c
index 762d2c6788a385631a312a39625d67a1154ef596..17c9d9f0c8483b4b0a887e69e7caac246c369423 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -78,10 +78,10 @@ struct rds_sock *rds_find_bound(const struct in6_addr *addr, __be16 port,
 	__rds_create_bind_key(key, addr, port, scope_id);
 	rcu_read_lock();
 	rs = rhashtable_lookup(&bind_hash_table, key, ht_parms);
-	if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
-		rds_sock_addref(rs);
-	else
+	if (rs && (sock_flag(rds_rs_to_sk(rs), SOCK_DEAD) ||
+		   !refcount_inc_not_zero(&rds_rs_to_sk(rs)->sk_refcnt)))
 		rs = NULL;
+
 	rcu_read_unlock();
 
 	rdsdebug("returning rs %p for %pI6c:%u\n", rs, addr,
-- 
2.20.1.495.gaa96b0ce6b-goog


^ permalink raw reply related

* Re: [PATCH bpf-next v5 0/5] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap
From: David Ahern @ 2019-01-31 16:38 UTC (permalink / raw)
  To: Peter Oskolkov, Alexei Starovoitov, Daniel Borkmann, netdev
  Cc: Peter Oskolkov
In-Reply-To: <20190130235136.136527-1-posk@google.com>

On 1/30/19 4:51 PM, Peter Oskolkov wrote:
> This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
> BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
> and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
> to packets (e.g. IP/GRE, GUE, IPIP).
> 
> This is useful when thousands of different short-lived flows should be
> encapped, each with different and dynamically determined destination.
> Although lwtunnels can be used in some of these scenarios, the ability
> to dynamically generate encap headers adds more flexibility, e.g.
> when routing depends on the state of the host (reflected in global bpf
> maps).
> 

For the set:
Reviewed-by: David Ahern <dsahern@gmail.com>



^ permalink raw reply

* Re: [RFC 00/14] netlink/hierarchical stats
From: Roopa Prabhu @ 2019-01-31 16:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, maciejromanfijalkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel
In-Reply-To: <CAJieiUjBo6P4Ut-LpqZcMyfSbzgtVqYL48TdV+v=bMMg-LyeAg@mail.gmail.com>

On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>
> On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> >
> > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > > On Mon, Jan 28, 2019 at 3:45 PM Jakub Kicinski wrote:
> > > > Hi!
> > > >
> > > > As I tried to explain in my slides at netconf 2018 we are lacking
> > > > an expressive, standard API to report device statistics.
> > > >
> > > > Networking silicon generally maintains some IEEE 802.3 and/or RMON
> > > > statistics.  Today those all end up in ethtool -S.  Here is a simple
> > > > attempt (admittedly very imprecise) of counting how many names driver
> > > > authors invented for IETF RFC2819 etherStatsPkts512to1023Octets
> > > > statistics (RX and TX):
> > > >
> > > > $ git grep '".*512.*1023.*"' -- drivers/net/ | \
> > > >     sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
> > > > 63
> > > >
> > > > Interestingly only two drivers in the tree use the name the standard
> > > > gave us (etherStatsPkts512to1023, modulo case).
> > > >
> > > > I set out to working on this set in an attempt to give drivers a way
> > > > to express clearly to user space standard-compliant counters.
> > > >
> > > > Second most common use for custom statistics is per-queue counters.
> > > > This is where the "hierarchical" part of this set comes in, as
> > > > groups can be nested, and user space tools can handle the aggregation
> > > > inside the groups if needed.
> > > >
> > > > This set also tries to address the problem of users not knowing if
> > > > a statistic is reported by hardware or the driver.  Many modern drivers
> > > > use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
> > > > glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
> > > > In this set, netlink attributes describe whether a group of statistics
> > > > is RX or TX, maintained by device or driver.
> > > >
> > > > The purpose of this patch set is _not_ to replace ethtool -S.  It is
> > > > an incredibly useful tool, and we will certainly continue using it.
> > > > However, for standard-based and commonly maintained statistics a more
> > > > structured API seems warranted.
> > > >
> > > > There are two things missing from these patches, which I initially
> > > > planned to address as well: filtering, and refresh rate control.
> > > >
> > > > Filtering doesn't need much explanation, users should be able to request
> > > > only a subset of statistics (like only SW stats or only given ID).  The
> > > > bitmap of statistics in each group is there for filtering later on.
> > > >
> > > > By refresh control I mean the ability for user space to indicate how
> > > > "fresh" values it expects.  Sometimes reading the HW counters requires
> > > > slow register reads or FW communication, in such cases drivers may cache
> > > > the result.  (Privileged) user space should be able to add a "not older
> > > > than" timestamp to indicate how fresh statistics it expects.  And vice
> > > > versa, drivers can then also put the timestamp of when the statistics
> > > > were last refreshed in the dump for more precise bandwidth estimation.
> > >
> > > Jakub, Glad to see hw stats in the RTM_*STATS api. I do see you
> > > mention 'partial' support for ethtool stats. I understand the reason
> > > you say its partial.
> > > But while at it, why not also include the ability to have driver
> > > extensible stats here ? ie make it complete. We have talked about
> > > making all hw stats available
> > > via the RTM_*STATS api in the past..., so just want to make sure the
> > > new HSTATS infra you are adding to the RTM_*STATS api
> > > covers or at-least makes it possible to include driver extensible
> > > stats in the future where the driver gets to define the stats id +
> > > value (This is very useful).
> > >  It would be nice if you can account for that in this new HSTATS API.
> >
> > My thinking was that we should leave truly custom/strange stats to
> > ethtool API which works quite well for that and at the same time be
> > very accepting of people adding new IDs to HSTAT (only requirement is
> > basically defining the meaning very clearly).
>
> that sounds reasonable. But the 'defining meaning clearly' gets tricky
> sometimes.
> The vendor who gets their ID or meaning first wins :) and the rest
> will have to live with
> ethtool and explain to rest of the world that ethtool is more reliable
> for their hardware :)
>
> I am also concerned that this getting the ID into common HSTAT ID
> space will  slow down the process of adding new counters
> for vendors. Which will lead to vendors sticking with ethtool API. It
> would be great if people can get all stats in one place and not rely
> on another API for 'more'.
>
> >
> > For the first stab I looked at two drivers and added all the stats that
> > were common.
> >
> > Given this set is identifying statistics by ID - how would we make that
> > extensible to drivers?  Would we go back to strings or have some
> > "driver specific" ID space?
>
> I was looking for ideas from you really, to see if you had considered
> this. agree per driver ID space seems ugly.
> ethtool strings are great today...if we can control the duplication.
> But thinking some more..., i did see some
> patches recently for vendor specific parameter (with ID) space in
> devlink. maybe something like that will be
> reasonable ?
>
> >
> > Is there any particular type of statistic you'd expect drivers to want
> > to add?  For NICs I think IEEE/RMON should pretty much cover the
> > silicon ones, but I don't know much about switches :)
>
> I will have to go through the list. But switch asics do support
> flexible stats/counters that can be attached at various points.
> And new chip versions come with more support. Having that flexibility
> to expose/extend such stats incrementally is very valuable on a per
> hardware/vendor basis.

Just want to clarify that I am suggesting a nested HSTATS extension
infra for drivers (just like ethtool).
'Common stats' stays at the top-level.

^ permalink raw reply

* Re: [PATCH iproute2-next] Introduce ip-brctl shell script
From: Roopa Prabhu @ 2019-01-31 16:28 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Nikolay Aleksandrov, David Ahern, Phil Sutter, Eric Garver,
	Tomas Dolezal, Stephen Hemminger, Lennert Buytenhek, netdev
In-Reply-To: <20190131134606.4be7f205@redhat.com>

On Thu, Jan 31, 2019 at 4:46 AM Stefano Brivio <sbrivio@redhat.com> wrote:
>
> On Wed, 30 Jan 2019 14:30:59 -0800
> Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>
> > On Sun, Jan 27, 2019 at 11:57 PM Stefano Brivio <sbrivio@redhat.com> wrote:
> > >
> > > They can't replace brctl not because they are badly designed or
> > > unusable, but simply because they are different tools with different
> > > purposes (see also my comments to Nikolay).
> >
> > I don't think i understand that they are different tools. The new netlink tools
> > are supposed to deprecate the old tools that use ioctls. this is the same reason
> > we don't have a ip-ifconfig today
>
> It's not just ioctl vs. netlink: brctl operates on bridges, whereas
> ip-link and 'bridge' operate on generic links. If you look at
> cmd_showstp() and cmd_show() from my script (I wouldn't recommend that
> right after breakfast ;)) that should be apparent.
>
> If you want to get basic information about a bridge, with ip-link or
> 'bridge' you need multiple commands. This is very different from
> ifconfig: there's no such ifconfig command that can't be implemented as
> a single ip-link command.
>
> > > > So, I think its better to fix the first two instead of introducing
> > > > another one.
> > >
> > > This is really not the same thing: I'm not introducing a new tool, I'm
> > > effectively replacing a 1794-LoC, non-trivial, ioctl-based
> > > implementation with a trivial, 572-lines shell script, with half
> > > the binary size.
> >
> > you are replacing a ioctl-based tool from another package into
> > iproute2..and maybe there-by deprecating the netlink based tool ? :).
>
> Oh, I see your point here. Well, it's not deprecating anything because
> I'm using the netlink tool in the wrapper (which allows it to be
> rather dumb and simple), but sure, I understand what you mean.
>
> Still, we could very easily get the wrapper to print equivalent ip-link
> and bridge commands before executing them -- and this would actually
> ease and encourage migration, compared to the current situation.
>
> > We are in opposite camps, I strongly want people to move to bridge and
> > ip link which has all the latest support.
>
> I wouldn't say we are in opposite camps, I just think that whatever has
> been done so far to get people to switch hasn't worked.

agreed. I think not much has been done to make output prettier for
humans. The detailed output especially is very tricky to read.
I think we should fix that.

>
> > > > Can we extend 'bridge' tool with extra options to provide a summary
> > > > view of all bridges like brctl ?
> > >
> > > We could, and I initially thought of that approach instead, but that
> > > has a number of fundamental downsides:
> > >
> > > - we can't provide a brctl-compatible syntax, unless we want to
> > >   substantially rewrite the 'bridge' interface, and I think it's a
> > >   bad idea to break 'bridge' syntax for users, while we won't be able to
> > >   replace brctl if we don't provide a similar syntax, history showed
> >
> > I am certainly not suggesting we break existing bridge users. I am
> > talking about new options.
>
> Let's take 'brctl showstp br0'. I wouldn't even know where to start
> adding options to 'bridge' to have a similar functionality (any idea?),
> and still, that brctl command is very convenient.


I don't see a reason not to have 'bridge stp show'  (similar to bridge
vlan show).
Maybe there are better ways to represent the same thing.


>
> > I understand some people are finding it hard to move away from brctl
> > output, but in my experience,
> > these are also the people who want new things in brctl like json
> > output etc. which is already available in the bridge command
>
> Not in my experience: the output of brctl showmacs and showstp commands
> is human-readable, that's what a large category of users need. JSON is
> well suited for other purposes.

For showmacs i can tell you, we have been able to move every one to
'bridge fdb show' because beyond certain point brctl showmacs is not
very useful. bridge fdb show on the other hand covers vxlan etc.

>
> > > - the fdb implementation has a long-dated comment by Stephen in its
> > >   header,
> > >         * TODO: merge/replace this with ip neighbour
> > >   and this is actually the only part of 'bridge' I'm using in ip-brctl.
> > >   Code is conceptually duplicated there, and I think we should actually
> > >   get rid of that -- but then 'bridge' wouldn't even give information
> > >   about the FDB, one would need to use ip neighbour instead.
> >
> > This could be comment from initial days. Today bridge has support for
> > fdb, vlans and vlan tunnels which you
> > cannot get from brctl and any brctl compat tool.
>
> This is unrelated. I'm specifically referring to the fdb information,
> which is similar to what ip neighbour displays, and which you can
> indeed get with brctl, see cmd_showmacs() from my script.

I understood what you are saying. But there is no point moving fdb to
neigh now because there is so much around it 'bridge fdb show, 'bridge
monitor fdb' etc)


>
> > > - 'bridge' doesn't implement settings for basic bridge features (say,
> > >   STP), which are convenient for users, especially if they are used to
> > >   brctl. To get that, even at an interface/syntax level, we would need
> > >   to duplicate some parts of ip-link, which looks like a bad idea per
> > >   se.
> >
> > thats fine IMO. Today ip link set extended bridge attribute support is
> > only for convenience.
> >
> > You can set most attributes both from ip link set and bridge link
> > command. We can see if they can share code.
>
> *This* is exactly what looks confusing to me.
>
> > You can set a vlan on a bridge today via the bridge command. I dont
> > see why we should hesitate about STP here.
> > And you will get the json output for free.
>
> I'm not particularly interested in non-human users here, not in the
> context of this discussion at least.

sure, i am with u for better human readable output.

>
> > > > Its supposed to be the netlink based tool for all bridging and hence
> > > > could be a good replacement for all brctl users.
> > >
> > > I still think the best replacement for users is the one that changes
> > > absolutely nothing, and if that's easily achievable, I'd rather go for
> > > it.
> >
> > That would also mean we add ip-ifconfig and ip-ethtool (if we
> > deprecate ethtool tomorrow. i am not saying its going away....,
> > but just giving you an example of ioctl to netlink based tools).
>
> Again, it's not the same as ifconfig, for the reasons I mentioned
> above. And by the way I think ifconfig doesn't even vaguely have the
> same amount of users of brctl nowadays, and for a reason, but I would
> only have anecdotal experience to support this.
>
> --
> Stefano

^ permalink raw reply

* [RFC PATCH v2 5/5] net: lorawan: Split skb definitions into another header
From: Jian-Hong Pan @ 2019-01-31 16:28 UTC (permalink / raw)
  To: Andreas Färber; +Cc: netdev, linux-lpwan, Ben Whitten, Jian-Hong Pan
In-Reply-To: <20190131162101.18386-1-starnight@g.ncu.edu.tw>

Split LoRaWAN related skb definitions from lora/lorawan_netdev.h into
another header lora/lorawan/skb.h.

Signed-off-by: Jian-Hong Pan <starnight@g.ncu.edu.tw>
---
v2:
- Modify the commit message
- Move lora/lorawan_netdev.h to lora/lorawan/skb.h

 include/linux/lora/lorawan/skb.h    | 33 +++++++++++++++++++++++++++++
 include/linux/lora/lorawan_netdev.h | 20 -----------------
 net/lorawan/socket.c                |  1 +
 3 files changed, 34 insertions(+), 20 deletions(-)
 create mode 100644 include/linux/lora/lorawan/skb.h

diff --git a/include/linux/lora/lorawan/skb.h b/include/linux/lora/lorawan/skb.h
new file mode 100644
index 000000000000..ea4f900b37be
--- /dev/null
+++ b/include/linux/lora/lorawan/skb.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later OR BSD-3-Clause */
+/*
+ * LoRaWAN socket buffer related definitions
+ *
+ * Copyright (c) 2018 Jian-Hong Pan <starnight@g.ncu.edu.tw>
+ */
+
+#ifndef __LORAWAN_SKB_H__
+#define __LORAWAN_SKB_H__
+
+#include <linux/skbuff.h>
+
+/**
+ * lrw_mac_cb - This structure holds the control buffer (cb) of sk_buff
+ *
+ * @devaddr:	the LoRaWAN device address of this LoRaWAN hardware
+ */
+struct lrw_mac_cb {
+	u32 devaddr;
+};
+
+/**
+ * lrw_get_mac_cb - Get the LoRaWAN MAC control buffer of the sk_buff
+ * @skb:	the exchanging sk_buff
+ *
+ * Return:	the pointer of LoRaWAN MAC control buffer
+ */
+static inline struct lrw_mac_cb *lrw_get_mac_cb(struct sk_buff *skb)
+{
+	return (struct lrw_mac_cb *)skb->cb;
+}
+
+#endif
diff --git a/include/linux/lora/lorawan_netdev.h b/include/linux/lora/lorawan_netdev.h
index a6c94cb96bf4..e515ba1ab508 100644
--- a/include/linux/lora/lorawan_netdev.h
+++ b/include/linux/lora/lorawan_netdev.h
@@ -28,24 +28,4 @@ struct sockaddr_lorawan {
 	struct lrw_addr_in addr_in;
 };
 
-/**
- * lrw_mac_cb - This structure holds the control buffer (cb) of sk_buff
- *
- * @devaddr:	the LoRaWAN device address of this LoRaWAN hardware
- */
-struct lrw_mac_cb {
-	u32 devaddr;
-};
-
-/**
- * lrw_get_mac_cb - Get the LoRaWAN MAC control buffer of the sk_buff
- * @skb:	the exchanging sk_buff
- *
- * Return:	the pointer of LoRaWAN MAC control buffer
- */
-static inline struct lrw_mac_cb *lrw_get_mac_cb(struct sk_buff *skb)
-{
-	return (struct lrw_mac_cb *)skb->cb;
-}
-
 #endif
diff --git a/net/lorawan/socket.c b/net/lorawan/socket.c
index c94dc0f3cf82..e7cb45bd93d0 100644
--- a/net/lorawan/socket.c
+++ b/net/lorawan/socket.c
@@ -12,6 +12,7 @@
 #include <linux/if_arp.h>
 #include <linux/init.h>
 #include <linux/list.h>
+#include <linux/lora/lorawan/skb.h>
 #include <linux/lora/lorawan_netdev.h>
 #include <linux/module.h>
 #include <linux/net.h>
-- 
2.20.1


^ permalink raw reply related

* [RFC PATCH v2 4/5] net: lorawan: Fulfill the help text of Kconfig
From: Jian-Hong Pan @ 2019-01-31 16:28 UTC (permalink / raw)
  To: Andreas Färber; +Cc: netdev, linux-lpwan, Ben Whitten, Jian-Hong Pan
In-Reply-To: <20190131162101.18386-1-starnight@g.ncu.edu.tw>

Mention the LoRaWAN network feature to distinguish it from other
Low-Power Wide-Area Network like Sigfox and NB-IoT.

Fixes: 48e5bb6cec79 ("net: Prepare LoRaWAN socket module")
Signed-off-by: Jian-Hong Pan <starnight@g.ncu.edu.tw>
---
v2:
- Modify the commit message
- Fix the help text's space between two sentences

 net/lorawan/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/lorawan/Kconfig b/net/lorawan/Kconfig
index bf6c9b77573b..a55762b1aba0 100644
--- a/net/lorawan/Kconfig
+++ b/net/lorawan/Kconfig
@@ -4,7 +4,8 @@ config LORAWAN
 	  LoRaWAN defines low data rate, low power and long range wireless
 	  wide area networks. It was designed to organize networks of automation
 	  devices, such as sensors, switches and actuators. It can operate
-	  multiple kilometers wide.
+	  multiple kilometers wide. The network is client/server technology
+	  centered around gateways.
 
 	  Say Y here to compile LoRaWAN support into the kernel or say M to
 	  compile it as a module.
-- 
2.20.1


^ permalink raw reply related

* [RFC PATCH v2 3/5] net: lorawan: Fix net device leakage
From: Jian-Hong Pan @ 2019-01-31 16:28 UTC (permalink / raw)
  To: Andreas Färber; +Cc: netdev, linux-lpwan, Ben Whitten, Jian-Hong Pan
In-Reply-To: <20190131162101.18386-1-starnight@g.ncu.edu.tw>

The net device may be missed to be put after error check.  This patch
fixes the issue to prevent the leakage.

Fixes: 48e5bb6cec79 ("net: Prepare LoRaWAN socket module")
Signed-off-by: Jian-Hong Pan <starnight@g.ncu.edu.tw>
---
v2:
- Modify the commit message

 net/lorawan/socket.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/lorawan/socket.c b/net/lorawan/socket.c
index 38cee1ff02af..c94dc0f3cf82 100644
--- a/net/lorawan/socket.c
+++ b/net/lorawan/socket.c
@@ -51,8 +51,10 @@ lrw_get_dev_by_addr(struct net *net, u32 devaddr)
 
 	rcu_read_lock();
 	ndev = dev_getbyhwaddr_rcu(net, ARPHRD_LORAWAN, (char *)&be_addr);
-	if (ndev)
+	if (ndev && ndev->type == ARPHRD_LORAWAN)
 		dev_hold(ndev);
+	else
+		ndev = NULL;
 	rcu_read_unlock();
 
 	return ndev;
@@ -99,11 +101,6 @@ dgram_bind(struct sock *sk, struct sockaddr *uaddr, int len)
 	}
 	netdev_dbg(ndev, "%s: get ndev\n", __func__);
 
-	if (ndev->type != ARPHRD_LORAWAN) {
-		ret = -ENODEV;
-		goto dgram_bind_end;
-	}
-
 	ro->src_devaddr = addr->addr_in.devaddr;
 	ro->bound = 1;
 	ret = 0;
@@ -152,7 +149,7 @@ dgram_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	if (size > ndev->mtu) {
 		netdev_dbg(ndev, "size = %zu, mtu = %u\n", size, ndev->mtu);
 		ret = -EMSGSIZE;
-		goto dgram_sendmsg_end;
+		goto dgram_sendmsg_no_skb;
 	}
 
 	netdev_dbg(ndev, "%s: create skb\n", __func__);
@@ -189,7 +186,6 @@ dgram_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	kfree_skb(skb);
 dgram_sendmsg_no_skb:
 	dev_put(ndev);
-
 dgram_sendmsg_end:
 	return ret;
 }
-- 
2.20.1


^ permalink raw reply related

* [RFC PATCH v2 2/5] net: lorawan: Remove unused lrw_dev_hard_header function
From: Jian-Hong Pan @ 2019-01-31 16:27 UTC (permalink / raw)
  To: Andreas Färber; +Cc: netdev, linux-lpwan, Ben Whitten, Jian-Hong Pan
In-Reply-To: <20190131162101.18386-1-starnight@g.ncu.edu.tw>

The lorawan module is an abastraction layer over the LoRaWAN soft and
hard MAC.  It passes the original buffer to the real MAC layer.  So,
this patch removes the lrw_dev_hard_header function.

Fixes: 48e5bb6cec79 ("net: Prepare LoRaWAN socket module")
Signed-off-by: Jian-Hong Pan <starnight@g.ncu.edu.tw>
---
v2:
- Modify the commit message

 net/lorawan/socket.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/net/lorawan/socket.c b/net/lorawan/socket.c
index 31a77c3e5ee9..38cee1ff02af 100644
--- a/net/lorawan/socket.c
+++ b/net/lorawan/socket.c
@@ -115,14 +115,6 @@ dgram_bind(struct sock *sk, struct sockaddr *uaddr, int len)
 	return ret;
 }
 
-static int
-lrw_dev_hard_header(struct sk_buff *skb, struct net_device *ndev,
-		    const u32 src_devaddr, size_t len)
-{
-	/* TODO: Prepare the LoRaWAN sending header here */
-	return 0;
-}
-
 static int
 dgram_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
@@ -176,10 +168,6 @@ dgram_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	skb_reserve(skb, hlen);
 	skb_reset_network_header(skb);
 
-	ret = lrw_dev_hard_header(skb, ndev, 0, size);
-	if (ret < 0)
-		goto dgram_sendmsg_no_skb;
-
 	ret = memcpy_from_msg(skb_put(skb, size), msg, size);
 	if (ret > 0)
 		goto dgram_sendmsg_err_skb;
-- 
2.20.1


^ permalink raw reply related

* [RFC PATCH v2 1/5] net: lorawan: Refine the coding style
From: Jian-Hong Pan @ 2019-01-31 16:27 UTC (permalink / raw)
  To: Andreas Färber; +Cc: netdev, linux-lpwan, Ben Whitten, Jian-Hong Pan
In-Reply-To: <20190131162101.18386-1-starnight@g.ncu.edu.tw>

Fix the coding style.

Fixes: 48e5bb6cec79 ("net: Prepare LoRaWAN socket module")
Signed-off-by: Jian-Hong Pan <starnight@g.ncu.edu.tw>
---
v2:
- Modify the commit message
- Order the included header files

 include/linux/lora/lorawan_netdev.h |  5 ++--
 net/lorawan/socket.c                | 45 ++++++++++++++---------------
 2 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/include/linux/lora/lorawan_netdev.h b/include/linux/lora/lorawan_netdev.h
index 5bffb5164f95..a6c94cb96bf4 100644
--- a/include/linux/lora/lorawan_netdev.h
+++ b/include/linux/lora/lorawan_netdev.h
@@ -1,9 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later OR BSD-3-Clause */
-/*-
+/*
  * LoRaWAN stack related definitions
  *
- * Copyright (c) 2018 Jian-Hong, Pan <starnight@g.ncu.edu.tw>
- *
+ * Copyright (c) 2018 Jian-Hong Pan <starnight@g.ncu.edu.tw>
  */
 
 #ifndef __LORAWAN_NET_DEVICE_H__
diff --git a/net/lorawan/socket.c b/net/lorawan/socket.c
index 7ef106b877ca..31a77c3e5ee9 100644
--- a/net/lorawan/socket.c
+++ b/net/lorawan/socket.c
@@ -1,36 +1,33 @@
 // SPDX-License-Identifier: GPL-2.0-or-later OR BSD-3-Clause
-/*-
+/*
  * LoRaWAN stack related definitions
  *
- * Copyright (c) 2018 Jian-Hong, Pan <starnight@g.ncu.edu.tw>
- *
+ * Copyright (c) 2018 Jian-Hong Pan <starnight@g.ncu.edu.tw>
  */
 
 #define	LORAWAN_MODULE_NAME	"lorawan"
 
 #define	pr_fmt(fmt)		LORAWAN_MODULE_NAME ": " fmt
 
+#include <linux/if_arp.h>
 #include <linux/init.h>
-#include <linux/module.h>
 #include <linux/list.h>
+#include <linux/lora/lorawan_netdev.h>
+#include <linux/module.h>
 #include <linux/net.h>
-#include <linux/if_arp.h>
 #include <linux/termios.h>		/* For TIOCOUTQ/INQ */
 #include <net/sock.h>
-#include <linux/lora/lorawan_netdev.h>
 
 /**
  * dgram_sock - This structure holds the states of Datagram socket
  *
  * @sk:			network layer representation of the socket
- *			sk must be the first member of dgram_sock
  * @src_devaddr:	the LoRaWAN device address for this connection
  * @bound:		this socket is bound or not
  * @connected:		this socket is connected to the destination or not
- * @want_ack:		this socket needs to ack for the connection or not
  */
 struct dgram_sock {
-	struct sock sk;
+	struct sock sk;	/* sk must be the first member of dgram_sock */
 	u32 src_devaddr;
 
 	u8 bound:1;
@@ -136,7 +133,7 @@ dgram_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	size_t tlen;
 	int ret;
 
-	pr_debug("%s: going to send %zu bytes", __func__, size);
+	pr_debug("%s: going to send %zu bytes\n", __func__, size);
 	if (msg->msg_flags & MSG_OOB) {
 		pr_debug("msg->msg_flags = 0x%x\n", msg->msg_flags);
 		return -EOPNOTSUPP;
@@ -532,7 +529,7 @@ static const struct proto_ops lrw_dgram_ops = {
 };
 
 static int
-lorawan_creat(struct net *net, struct socket *sock, int protocol, int kern)
+lrw_create(struct net *net, struct socket *sock, int protocol, int kern)
 {
 	struct sock *sk;
 	int ret;
@@ -557,7 +554,7 @@ lorawan_creat(struct net *net, struct socket *sock, int protocol, int kern)
 		ret = sk->sk_prot->hash(sk);
 		if (ret) {
 			sk_common_release(sk);
-			goto lorawan_creat_end;
+			goto lrw_create_end;
 		}
 	}
 
@@ -567,14 +564,14 @@ lorawan_creat(struct net *net, struct socket *sock, int protocol, int kern)
 			sk_common_release(sk);
 	}
 
-lorawan_creat_end:
+lrw_create_end:
 	return ret;
 }
 
 static const struct net_proto_family lorawan_family_ops = {
 	.owner		= THIS_MODULE,
 	.family		= PF_LORAWAN,
-	.create		= lorawan_creat,
+	.create		= lrw_create,
 };
 
 static int
@@ -617,29 +614,29 @@ lrw_dgram_deliver(struct net_device *ndev, struct sk_buff *skb)
 }
 
 static int
-lorawan_rcv(struct sk_buff *skb, struct net_device *ndev,
-	    struct packet_type *pt, struct net_device *orig_ndev)
+lrw_rcv(struct sk_buff *skb, struct net_device *ndev,
+	struct packet_type *pt, struct net_device *orig_ndev)
 {
 	if (!netif_running(ndev))
-		goto lorawan_rcv_drop;
+		goto lrw_rcv_drop;
 
 	if (!net_eq(dev_net(ndev), &init_net))
-		goto lorawan_rcv_drop;
+		goto lrw_rcv_drop;
 
 	if (ndev->type != ARPHRD_LORAWAN)
-		goto lorawan_rcv_drop;
+		goto lrw_rcv_drop;
 
 	if (skb->pkt_type != PACKET_OTHERHOST)
 		return lrw_dgram_deliver(ndev, skb);
 
-lorawan_rcv_drop:
+lrw_rcv_drop:
 	kfree_skb(skb);
 	return NET_RX_DROP;
 }
 
 static struct packet_type lorawan_packet_type = {
 	.type		= htons(ETH_P_LORAWAN),
-	.func		= lorawan_rcv,
+	.func		= lrw_rcv,
 };
 
 static int __init
@@ -665,7 +662,7 @@ lrw_sock_init(void)
 	proto_unregister(&lrw_dgram_prot);
 
 lrw_sock_init_end:
-	return 0;
+	return ret;
 }
 
 static void __exit
@@ -680,7 +677,7 @@ lrw_sock_exit(void)
 module_init(lrw_sock_init);
 module_exit(lrw_sock_exit);
 
-MODULE_AUTHOR("Jian-Hong Pan, <starnight@g.ncu.edu.tw>");
-MODULE_DESCRIPTION("LoRaWAN socket kernel module");
+MODULE_AUTHOR("Jian-Hong Pan <starnight@g.ncu.edu.tw>");
+MODULE_DESCRIPTION("LoRaWAN socket protocol");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS_NETPROTO(PF_LORAWAN);
-- 
2.20.1


^ permalink raw reply related

* [RFC PATCH v2 0/5] net: lorawan: Refine the lorawan protocol module
From: Jian-Hong Pan @ 2019-01-31 16:21 UTC (permalink / raw)
  To: Andreas Färber; +Cc: netdev, linux-lpwan, Ben Whitten, Jian-Hong Pan

Hello,

This series refines the commit 48e5bb6cec79 ("net: Prepare LoRaWAN
socket module") for coding style.
Besides, sperates LoRaWAN related skb definitions from
lora/lorawan_netdev.h into another header lora/lorawan/skb.h.

Thanks,
Jian-Hong Pan

Jian-Hong Pan (5):
  net: lorawan: Refine the coding style
  net: lorawan: Remove unused lrw_dev_hard_header function
  net: lorawan: Fix net device leakage
  net: lorawan: Fulfill the help text of Kconfig
  net: lorawan: Split skb definitions into another header

 include/linux/lora/lorawan/skb.h    | 33 ++++++++++++++
 include/linux/lora/lorawan_netdev.h | 25 +----------
 net/lorawan/Kconfig                 |  3 +-
 net/lorawan/socket.c                | 70 +++++++++++------------------
 4 files changed, 63 insertions(+), 68 deletions(-)
 create mode 100644 include/linux/lora/lorawan/skb.h

-- 
2.20.1


^ permalink raw reply

* Re: [PATCH net-next v3 8/8] ethtool: add compat for devlink info
From: kbuild test robot @ 2019-01-31 16:19 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: kbuild-all, davem, netdev, oss-drivers, jiri, andrew, f.fainelli,
	mkubecek, eugenem, jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130234133.4298-9-jakub.kicinski@netronome.com>

[-- Attachment #1: Type: text/plain, Size: 1079 bytes --]

Hi Jakub,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jakub-Kicinski/devlink-add-device-driver-information-API/20190131-224626
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.2.0 make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   m68k-linux-gnu-ld: drivers/rtc/proc.o: in function `is_rtc_hctosys.isra.0':
   proc.c:(.text+0x178): undefined reference to `strcmp'
   m68k-linux-gnu-ld: net/core/ethtool.o: in function `ethtool_get_drvinfo':
>> ethtool.c:(.text+0xc08): undefined reference to `devlink_compat_running_version'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 12133 bytes --]

^ permalink raw reply

* Re: [RFC 00/14] netlink/hierarchical stats
From: Roopa Prabhu @ 2019-01-31 16:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, maciejromanfijalkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel
In-Reply-To: <20190130162408.60f1f5dc@cakuba.hsd1.ca.comcast.net>

On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > On Mon, Jan 28, 2019 at 3:45 PM Jakub Kicinski wrote:
> > > Hi!
> > >
> > > As I tried to explain in my slides at netconf 2018 we are lacking
> > > an expressive, standard API to report device statistics.
> > >
> > > Networking silicon generally maintains some IEEE 802.3 and/or RMON
> > > statistics.  Today those all end up in ethtool -S.  Here is a simple
> > > attempt (admittedly very imprecise) of counting how many names driver
> > > authors invented for IETF RFC2819 etherStatsPkts512to1023Octets
> > > statistics (RX and TX):
> > >
> > > $ git grep '".*512.*1023.*"' -- drivers/net/ | \
> > >     sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
> > > 63
> > >
> > > Interestingly only two drivers in the tree use the name the standard
> > > gave us (etherStatsPkts512to1023, modulo case).
> > >
> > > I set out to working on this set in an attempt to give drivers a way
> > > to express clearly to user space standard-compliant counters.
> > >
> > > Second most common use for custom statistics is per-queue counters.
> > > This is where the "hierarchical" part of this set comes in, as
> > > groups can be nested, and user space tools can handle the aggregation
> > > inside the groups if needed.
> > >
> > > This set also tries to address the problem of users not knowing if
> > > a statistic is reported by hardware or the driver.  Many modern drivers
> > > use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
> > > glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
> > > In this set, netlink attributes describe whether a group of statistics
> > > is RX or TX, maintained by device or driver.
> > >
> > > The purpose of this patch set is _not_ to replace ethtool -S.  It is
> > > an incredibly useful tool, and we will certainly continue using it.
> > > However, for standard-based and commonly maintained statistics a more
> > > structured API seems warranted.
> > >
> > > There are two things missing from these patches, which I initially
> > > planned to address as well: filtering, and refresh rate control.
> > >
> > > Filtering doesn't need much explanation, users should be able to request
> > > only a subset of statistics (like only SW stats or only given ID).  The
> > > bitmap of statistics in each group is there for filtering later on.
> > >
> > > By refresh control I mean the ability for user space to indicate how
> > > "fresh" values it expects.  Sometimes reading the HW counters requires
> > > slow register reads or FW communication, in such cases drivers may cache
> > > the result.  (Privileged) user space should be able to add a "not older
> > > than" timestamp to indicate how fresh statistics it expects.  And vice
> > > versa, drivers can then also put the timestamp of when the statistics
> > > were last refreshed in the dump for more precise bandwidth estimation.
> >
> > Jakub, Glad to see hw stats in the RTM_*STATS api. I do see you
> > mention 'partial' support for ethtool stats. I understand the reason
> > you say its partial.
> > But while at it, why not also include the ability to have driver
> > extensible stats here ? ie make it complete. We have talked about
> > making all hw stats available
> > via the RTM_*STATS api in the past..., so just want to make sure the
> > new HSTATS infra you are adding to the RTM_*STATS api
> > covers or at-least makes it possible to include driver extensible
> > stats in the future where the driver gets to define the stats id +
> > value (This is very useful).
> >  It would be nice if you can account for that in this new HSTATS API.
>
> My thinking was that we should leave truly custom/strange stats to
> ethtool API which works quite well for that and at the same time be
> very accepting of people adding new IDs to HSTAT (only requirement is
> basically defining the meaning very clearly).

that sounds reasonable. But the 'defining meaning clearly' gets tricky
sometimes.
The vendor who gets their ID or meaning first wins :) and the rest
will have to live with
ethtool and explain to rest of the world that ethtool is more reliable
for their hardware :)

I am also concerned that this getting the ID into common HSTAT ID
space will  slow down the process of adding new counters
for vendors. Which will lead to vendors sticking with ethtool API. It
would be great if people can get all stats in one place and not rely
on another API for 'more'.

>
> For the first stab I looked at two drivers and added all the stats that
> were common.
>
> Given this set is identifying statistics by ID - how would we make that
> extensible to drivers?  Would we go back to strings or have some
> "driver specific" ID space?

I was looking for ideas from you really, to see if you had considered
this. agree per driver ID space seems ugly.
ethtool strings are great today...if we can control the duplication.
But thinking some more..., i did see some
patches recently for vendor specific parameter (with ID) space in
devlink. maybe something like that will be
reasonable ?

>
> Is there any particular type of statistic you'd expect drivers to want
> to add?  For NICs I think IEEE/RMON should pretty much cover the
> silicon ones, but I don't know much about switches :)

I will have to go through the list. But switch asics do support
flexible stats/counters that can be attached at various points.
And new chip versions come with more support. Having that flexibility
to expose/extend such stats incrementally is very valuable on a per
hardware/vendor basis.

^ permalink raw reply

* Re: [PATCH] net: check negative value for signed refcnt
From: Kirill Tkhai @ 2019-01-31 15:34 UTC (permalink / raw)
  To: Eric Dumazet, alexandre.besnard, davem, ecree, jiri, petrm,
	alexander.h.duyck, amritha.nambiar, lirongqing
  Cc: netdev, linux-kernel
In-Reply-To: <31736d5e-8e14-67f2-9780-9ba6a6f10f4d@gmail.com>

On 31.01.2019 18:21, Eric Dumazet wrote:
> 
> 
> On 01/31/2019 07:15 AM, Eric Dumazet wrote:
>>
>>
>> On 01/31/2019 05:49 AM, Kirill Tkhai wrote:
>>>
>>> 2)Not related to your patch -- it looks like we have problem in existing
>>> code with this netdev_refcnt_read(). It does not imply a memory ordering
>>> or some guarantees about reading percpu values. For example, in generic
>>> code struct percpu_ref switches a counter into atomic mode before it checks
>>> for the last reference. But there is nothing in netdev_refcnt_read().
>>
>> Well, if we read an old value here, after a full and expensive synchronize_net(),
>> then we would have lot more problems than simply having a second round in
>> netdev_wait_allrefs()
>>  
>>
> 
> percpu_ref was added more recently than the netdev_refcnt stuff, and is
> interesting for users wanting a synchronous wait for the refcnt reaching 0.
> 
> netdev_wait_allrefs() was designed to be asynchronous, so that we at least release
> RTNL (and current cpu) when something is wrong and a device can not be dismantled.

Yeah, they are different, and I think we can't add more synchronize_rcu()-dependent
synchronizations in this code, since network namespaces are already destroyed very slow.

^ permalink raw reply

* Re: [PATCH] tty: Fix WARNING in tty_set_termios
From: Johan Hovold @ 2019-01-31 15:33 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Johan Hovold, shuah, Al Viro, open list:NFC SUBSYSTEM, chris,
	devel, Rob Herring, Samuel Ortiz, open list:SERIAL DRIVERS,
	Jiri Slaby, santhameena13, kirk, Johan Hedberg, Arnd Bergmann,
	samuel.thibault, m.maya.nakamura, zhongjiang, Greg KH, speakup,
	Linux Kernel Mailing List, Bluez mailing list, netdev,
	nishka.dasgupta_ug18, David S. Miller
In-Reply-To: <D0EE5C33-2930-4286-998C-822DD7898B76@holtmann.org>

On Thu, Jan 31, 2019 at 04:18:33PM +0100, Marcel Holtmann wrote:

> > I agree with Al that this change doesn't make much sense. The WARN_ON
> > is there to catch any bugs leading to the termios being changed for a
> > master side pty. Those should bugs should be fixed, and not worked
> > around in order to silence a WARN_ON.
> > 
> > The problem started with 7721383f4199 ("Bluetooth: hci_uart: Support
> > operational speed during setup") which introduced a new way for how
> > tty_set_termios() could end up being called for a master pty.
> > 
> > As Al hinted at, setting these ldiscs for a master pty really makes no
> > sense and perhaps that is what we should prevent unless simply making
> > sure they do not call tty_set_termios() is sufficient for the time
> > being.
> > 
> > Finally, note that serdev never operates on a pty, and that this is only
> > an issue for (the three) line disciplines.
> 
> I think for PTYs we should just fail setting the HCI line discipline.
> Fail early and just move on with life.

Sounds good to me. At least for the pty master. There may be some people
trying to use a bluetooth device connected to a remote serial port (I've
seen descriptions of such setups at least), and maybe we need not prevent
that.

Johan

^ permalink raw reply

* Re: [PATCH] net: check negative value for signed refcnt
From: Kirill Tkhai @ 2019-01-31 15:31 UTC (permalink / raw)
  To: Alexandre BESNARD
  Cc: davem, amritha nambiar, lirongqing, netdev, linux-kernel,
	alexander h duyck, jiri, petrm, ecree
In-Reply-To: <561703429.2347936.1548947660762.JavaMail.zimbra@softathome.com>

On 31.01.2019 18:14, Alexandre BESNARD wrote:
> Hi Kirill, and thanks for your time,
> 
> On 31 Jan 19 14:49, Kirill Tkhai ktkhai@virtuozzo.com wrote :
> 
>> Hi, Alexandre,
> 
>> On 31.01.2019 16:20, alexandre.besnard@softathome.com wrote:
>>> From: Alexandre Besnard <alexandre.besnard@softathome.com>
> 
>>> Device remaining references counter is get as a signed integer.
> 
>>> When unregistering network devices, the loop waiting for this counter
>>> to decrement tests the 0 strict equality. Thus if an error occurs and
>>> two references are given back by a protocol, we are stuck in the loop
>>> forever, with a -1 value.
> 
>>> Robustness is added by checking a negative value: the device is then
>>> considered free of references, and a warning is issued (it should not
>>> happen, one should check that behavior)
> 
>>> Signed-off-by: Alexandre Besnard <alexandre.besnard@softathome.com>
>>> ---
>>> net/core/dev.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
> 
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index ddc551f..e4190ae 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -8687,6 +8687,11 @@ static void netdev_wait_allrefs(struct net_device *dev)
>>> refcnt = netdev_refcnt_read(dev);
> 
>>> while (refcnt != 0) {
>>> + if (refcnt < 0) {
>>> + pr_warn("Device %s refcnt negative: device considered free, but it should not
>>> happen\n",
>>> + dev->name);
>>> + break;
>>> + }
> 
>> 1)I don't think this is a good approach. Negative value does not guarantee
>> there is just a double put of device reference. Negative value is an indicator
>> something goes wrong, and we definitely should not free device memory in
>> this case.
> 
>> 2)Not related to your patch -- it looks like we have problem in existing
>> code with this netdev_refcnt_read(). It does not imply a memory ordering
>> or some guarantees about reading percpu values. For example, in generic
>> code struct percpu_ref switches a counter into atomic mode before it checks
>> for the last reference. But there is nothing in netdev_refcnt_read().
> 
> I agree with you, as it is not a full fix for a bad behavior of the refcnt: many
> wrong things could happen here, and that's why I added a warning (short of a
> more critical flag I could think of).
> 
> However, I think this is a good approach as a global workaround for any critical
> situation caused by a negative refcnt, acting as a failsafe. What I try to avoid
> here is not the bug, but a situation such as a deadlock keeping a system from
> powering off, or way worse in the system life.
> On the other hand, I can't think of a critical situation caused by freeing
> the device memory. Processes or even systems may crash in some cases, but it
> should be an expected behavior in such a case IMHO.
>
> Actually, I think that with the current implementation, most of the systems
> locked in the problem are powered off.
> 
> Do you think of any issue beyond this behavior ? 

The problem is that network devices are destroyed not only during reboot
of the system. For example, this happens every time network namespace dies.
And nobody wants to have network device memory is released in some undefined
condition, while system is continuing to run.

I see two approaches there. First one is to crash the system in case of
refcounter is not released for a long time (which is user-defined parameter).
The second one is to set the counter to INT_MAX/2 or something like this,
and never release this memory (we do especially this in our virtuozzo 7 kernel).

Maybe there are more solutions and another people will say about them.

Kirill




^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox