Netdev List


* Re: [PATCH net 1/2] geneve, vxlan: Don't check skb_dst() twice
From: Nicolas Dichtel @ 2018-10-15 12:24 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: David S. Miller, Xin Long, Sabrina Dubroca, netdev
In-Reply-To: <20181015130830.1c177301@redhat.com>

On 15/10/2018 at 13:08, Stefano Brivio wrote:
> On Mon, 15 Oct 2018 12:19:41 +0200
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> 
>> On 12/10/2018 at 23:53, Stefano Brivio wrote:
>>> Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids
>>> that we try updating PMTU for a non-existent destination, but didn't clean
>>> up cases where the check was already explicit. Drop those redundant checks.  
>> Yes, I leave them to avoid calculating the new mtu value when not needed. We are
>> in the xmit path.
> 
> Before 2/2 of this series, though, we call skb_dst_update_pmtu() (and
> in turn dst->ops->update_pmtu()) for *every* packet with a dst, which
Not if dst is of type md_dst_ops.

> I'd dare saying is by far the most common case. Besides, 2/2 needs
> anyway to calculate the MTU to fix a bug.
> 
> So I think this is a vast improvement overall.
Fair point.

^ permalink raw reply

* Re: [PATCH net-next, v3] hv_netvsc: fix vf serial matching with pci slot info
From: Stephen Hemminger @ 2018-10-15 19:56 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: haiyangz, davem, netdev, olaf, linux-kernel, devel, vkuznets
In-Reply-To: <20181015190615.30628-1-haiyangz@linuxonhyperv.com>

On Mon, 15 Oct 2018 19:06:15 +0000
Haiyang Zhang <haiyangz@linuxonhyperv.com> wrote:

> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> The VF device's serial number is saved as a string in PCI slot's
> kobj name, not the slot->number. This patch corrects the netvsc
> driver, so the VF device can be successfully paired with synthetic
> NIC.
> 
> Fixes: 00d7ddba1143 ("hv_netvsc: pair VF based on serial number")
> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>

^ permalink raw reply

* [PATCH net] sctp: use the pmtu from the icmp packet to update transport pathmtu
From: Xin Long @ 2018-10-15 11:58 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: davem, Marcelo Ricardo Leitner, Neil Horman

Other than asoc pmtu sync from all transports, sctp_assoc_sync_pmtu
is also processing transport pmtu_pending by icmp packets. But it's
meaningless to use sctp_dst_mtu(t->dst) as new pmtu for a transport.

The right pmtu value should come from the icmp packet, and it would
be saved into transport->mtu_info in this patch and used later when
the pmtu sync happens in sctp_sendmsg_to_asoc or sctp_packet_config.

Besides, without this patch, as pmtu can only be updated correctly
when receiving an icmp packet and no place is holding the sock lock, it
will take a long time if the sock is busy sending packets.

Note that it doesn't process transport->mtu_info in .release_cb(),
as there is not enough information for the pmtu update, such as which
asoc or transport it applies to. It is not worth traversing all asocs
to check pmtu_pending. So unlike tcp, sctp does this in the tx path,
for which mtu_info needs to be atomic_t.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/structs.h | 2 ++
 net/sctp/associola.c       | 3 ++-
 net/sctp/input.c           | 1 +
 net/sctp/output.c          | 6 ++++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 28a7c8e..a11f937 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -876,6 +876,8 @@ struct sctp_transport {
 	unsigned long sackdelay;
 	__u32 sackfreq;
 
+	atomic_t mtu_info;
+
 	/* When was the last time that we heard from this transport? We use
 	 * this to pick new active and retran paths.
 	 */
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 297d9cf..a827a1f 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1450,7 +1450,8 @@ void sctp_assoc_sync_pmtu(struct sctp_association *asoc)
 	/* Get the lowest pmtu of all the transports. */
 	list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) {
 		if (t->pmtu_pending && t->dst) {
-			sctp_transport_update_pmtu(t, sctp_dst_mtu(t->dst));
+			sctp_transport_update_pmtu(t,
+						   atomic_read(&t->mtu_info));
 			t->pmtu_pending = 0;
 		}
 		if (!pmtu || (t->pathmtu < pmtu))
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 9bbc5f9..5c36a99 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -395,6 +395,7 @@ void sctp_icmp_frag_needed(struct sock *sk, struct sctp_association *asoc,
 		return;
 
 	if (sock_owned_by_user(sk)) {
+		atomic_set(&t->mtu_info, pmtu);
 		asoc->pmtu_pending = 1;
 		t->pmtu_pending = 1;
 		return;
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 7f849b0..67939ad 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -120,6 +120,12 @@ void sctp_packet_config(struct sctp_packet *packet, __u32 vtag,
 			sctp_assoc_sync_pmtu(asoc);
 	}
 
+	if (asoc->pmtu_pending) {
+		if (asoc->param_flags & SPP_PMTUD_ENABLE)
+			sctp_assoc_sync_pmtu(asoc);
+		asoc->pmtu_pending = 0;
+	}
+
 	/* If there a is a prepend chunk stick it on the list before
 	 * any other chunks get appended.
 	 */
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH net-next] netfilter: cttimeout: remove set but not used variable 'l3num'
From: Pablo Neira Ayuso @ 2018-10-15 11:55 UTC (permalink / raw)
  To: YueHaibing
  Cc: Jozsef Kadlecsik, Florian Westphal, netfilter-devel, coreteam,
	netdev, kernel-janitors
In-Reply-To: <1539137652-64831-1-git-send-email-yuehaibing@huawei.com>

On Wed, Oct 10, 2018 at 02:14:12AM +0000, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> net/netfilter/nfnetlink_cttimeout.c: In function 'cttimeout_default_set':
> net/netfilter/nfnetlink_cttimeout.c:353:8: warning:
>  variable 'l3num' set but not used [-Wunused-but-set-variable]
> 
> It is not used any more after
> commit dd2934a95701 ("netfilter: conntrack: remove l3->l4 mapping information")

Applied.

^ permalink raw reply

* Re: bond: take rcu lock in netpoll_send_skb_on_dev
From: Eran Ben Elisha @ 2018-10-15 11:36 UTC (permalink / raw)
  To: Dave Jones, netdev@vger.kernel.org
  Cc: Cong Wang, Tariq Toukan, Saeed Mahameed
In-Reply-To: <20180928202608.uycdlytob75iphfu@codemonkey.org.uk>



On 9/28/2018 11:26 PM, Dave Jones wrote:
> The bonding driver lacks the rcu lock when it calls down into
> netdev_lower_get_next_private_rcu from bond_poll_controller, which
> results in a trace like:
> 
> WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
> CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
> Workqueue: bond0 bond_mii_monitor
> RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
> Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
> RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0
> RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000
> R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980
> R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff
> FS:  0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0
> Call Trace:
>   bond_poll_controller+0x52/0x170
>   netpoll_poll_dev+0x79/0x290
>   netpoll_send_skb_on_dev+0x158/0x2c0
>   netpoll_send_udp+0x2d5/0x430
>   write_ext_msg+0x1e0/0x210
>   console_unlock+0x3c4/0x630
>   vprintk_emit+0xfa/0x2f0
>   printk+0x52/0x6e
>   ? __netdev_printk+0x12b/0x220
>   netdev_info+0x64/0x80
>   ? bond_3ad_set_carrier+0xe9/0x180
>   bond_select_active_slave+0x1fc/0x310
>   bond_mii_monitor+0x709/0x9b0
>   process_one_work+0x221/0x5e0
>   worker_thread+0x4f/0x3b0
>   kthread+0x100/0x140
>   ? process_one_work+0x5e0/0x5e0
>   ? kthread_delayed_work_timer_fn+0x90/0x90
>   ret_from_fork+0x24/0x30
> 
> We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev
> before we call down into netpoll_poll_dev, so just take the lock there.
> 
> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
> Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
> 
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 3219a2932463..692367d7c280 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
>   	/* It is up to the caller to keep npinfo alive. */
>   	struct netpoll_info *npinfo;
>   
> +	rcu_read_lock_bh();
Hi,

This suggested fix introduced a regression when using the netconsole module
with the mlx5_core module loaded.

During IRQ handling, we hit a warning that this rcu_read_lock_bh() cannot
be taken inside an IRQ.
Isn't a driver allowed to print to the kernel log from inside an IRQ
handler? Or was the lock perhaps taken too high in the call chain of
bond_poll_controller?

Below is the trace we hit once we applied your patch on our systems.

[2018-10-15 10:45:30] mlx5_core 0000:00:09.0: firmware version: 16.22.8010
[2018-10-15 10:45:30] mlx5_core 0000:00:09.0: 63.008 Gb/s available PCIe 
bandwidth, limited by 8 GT/s x8 link at 0000:00:09.0 (capable of 126.016 
Gb/s with 8 GT/s x16 link)
[2018-10-15 10:45:31] (0000:00:09.0): E-Switch: Total vports 1, per 
vport: max uc(1024) max mc(16384)
[2018-10-15 10:45:31] mlx5_core 0000:00:09.0: Port module event: module 
0, Cable plugged
[2018-10-15 10:45:31] WARNING: CPU: 1 PID: 0 at kernel/softirq.c:168 
__local_bh_enable_ip+0x35/0x50
[2018-10-15 10:45:31] Modules linked in: mlx5_core(+) mlxfw bonding 
ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre rdma_ucm ib_uverbs 
ib_ipoib ib_umad nfsv3 nfs_acl nfs lockd grace fscache netconsole 
mlx4_ib mlx4_en ptp pps_core mlx4_core cfg80211 devlink rfkill rpcrdma 
ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp sunrpc 
rdma_cm ib_cm iw_cm ib_core snd_hda_codec_generic snd_hda_intel 
snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm 
snd_timer snd soundcore pcspkr i2c_piix4 sch_fq_codel ip_tables cirrus 
drm_kms_helper ata_generic pata_acpi syscopyarea sysfillrect sysimgblt 
fb_sys_fops ttm drm virtio_net net_failover i2c_core failover serio_raw 
floppy ata_piix [last unloaded: mlxfw]
[2018-10-15 10:45:31] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
4.19.0-rc6-J4083-G9e91d710a170 #1
[2018-10-15 10:45:31] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[2018-10-15 10:45:31] RIP: 0010:__local_bh_enable_ip+0x35/0x50
[2018-10-15 10:45:31] Code: 7e a9 00 00 0f 00 75 22 83 ee 01 f7 de 65 01 
35 91 8c f7 7e 65 8b 05 8a 8c f7 7e a9 00 ff 1f 00 74 0c 65 ff 0d 7c 8c 
f7 7e c3 <0f> 0b eb da 65 66 8b 05 1f 4e f8 7e 66 85 c0 74 e7 e8 55 ff ff ff
[2018-10-15 10:45:31] RSP: 0018:ffff880237a43c10 EFLAGS: 00010006
[2018-10-15 10:45:31] RAX: 0000000080010200 RBX: 0000000000000006 RCX: 
0000000000000001
[2018-10-15 10:45:31] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 
ffffffff817a1321
[2018-10-15 10:45:31] RBP: ffff880237a43c60 R08: 0000000000480020 R09: 
0000000000000000
[2018-10-15 10:45:31] R10: 000000020834c006 R11: 0000000000000000 R12: 
ffff880229963d68
[2018-10-15 10:45:31] R13: ffff88020834c034 R14: 0000000000006b00 R15: 
ffff8802297d8400
[2018-10-15 10:45:31] FS:  0000000000000000(0000) 
GS:ffff880237a40000(0000) knlGS:0000000000000000
[2018-10-15 10:45:31] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2018-10-15 10:45:31] CR2: 00007f96d4f57080 CR3: 00000001a157d000 CR4: 
00000000000006e0
[2018-10-15 10:45:31] Call Trace:
[2018-10-15 10:45:31]
[2018-10-15 10:45:31]  netpoll_send_udp+0x2de/0x410
[2018-10-15 10:45:31]  write_msg+0xdb/0xf0 [netconsole]
[2018-10-15 10:45:31]  console_unlock+0x33e/0x500
[2018-10-15 10:45:31]  vprintk_emit+0x211/0x280
[2018-10-15 10:45:31]  dev_vprintk_emit+0x10b/0x200
[2018-10-15 10:45:31]  dev_printk_emit+0x3b/0x50
[2018-10-15 10:45:31]  ? ttwu_do_wakeup+0x19/0x130
[2018-10-15 10:45:31]  _dev_info+0x55/0x60
[2018-10-15 10:45:31]  mlx5_eq_int+0x27a/0x690 [mlx5_core]
[2018-10-15 10:45:31]  __handle_irq_event_percpu+0x3a/0x190
[2018-10-15 10:45:31]  handle_irq_event_percpu+0x20/0x50
[2018-10-15 10:45:31]  handle_irq_event+0x27/0x50
[2018-10-15 10:45:31]  handle_edge_irq+0x6d/0x180
[2018-10-15 10:45:31]  handle_irq+0xa5/0x110
[2018-10-15 10:45:31]  do_IRQ+0x49/0xd0
[2018-10-15 10:45:31]  common_interrupt+0xf/0xf
[2018-10-15 10:45:31]
[2018-10-15 10:45:31] RIP: 0010:native_safe_halt+0x2/0x10
[2018-10-15 10:45:31] Code: 7e ff ff ff 7f f3 c3 65 48 8b 04 25 80 5b 01 
00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 
90 fb f4  0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
[2018-10-15 10:45:31] RSP: 0018:ffff88023663fed0 EFLAGS: 00000246 
ORIG_RAX: ffffffffffffffda
[2018-10-15 10:45:31] RAX: 0000000080000000 RBX: 0000000000000001 RCX: 
ffff880237a5a880
[2018-10-15 10:45:31] RDX: ffffffff8221cd48 RSI: ffff880237a5a880 RDI: 
0000000000000001
[2018-10-15 10:45:31] RBP: 0000000000000001 R08: 000000200b1d1602 R09: 
0000000000000000
[2018-10-15 10:45:31] R10: ffff880236627d20 R11: 0000000000000000 R12: 
0000000000000000
[2018-10-15 10:45:31] R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000000
[2018-10-15 10:45:31]  default_idle+0x1c/0x140
[2018-10-15 10:45:31]  do_idle+0x194/0x240
[2018-10-15 10:45:31]  cpu_startup_entry+0x19/0x20
[2018-10-15 10:45:31]  start_secondary+0x138/0x170
[2018-10-15 10:45:31]  secondary_startup_64+0xa4/0xb0
[2018-10-15 10:45:31] ---[ end trace 10dfce1a9e88fa01 ]---

>   	lockdep_assert_irqs_disabled();
>   
>   	npinfo = rcu_dereference_bh(np->dev->npinfo);
> @@ -374,6 +375,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
>   		skb_queue_tail(&npinfo->txq, skb);
>   		schedule_delayed_work(&npinfo->tx_work,0);
>   	}
> +	rcu_read_unlock_bh();
>   }
>   EXPORT_SYMBOL(netpoll_send_skb_on_dev);
>   
> 

^ permalink raw reply

* [PATCH net-next,v3] hv_netvsc: fix vf serial matching with pci slot info
From: Haiyang Zhang @ 2018-10-15 19:06 UTC (permalink / raw)
  To: davem, netdev
  Cc: haiyangz, kys, sthemmin, olaf, vkuznets, devel, linux-kernel

From: Haiyang Zhang <haiyangz@microsoft.com>

The VF device's serial number is saved as a string in PCI slot's
kobj name, not the slot->number. This patch corrects the netvsc
driver, so the VF device can be successfully paired with synthetic
NIC.

Fixes: 00d7ddba1143 ("hv_netvsc: pair VF based on serial number")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 9bcaf204a7d4..cf36e7ff3191 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2030,14 +2030,15 @@ static void netvsc_vf_setup(struct work_struct *w)
 	rtnl_unlock();
 }
 
-/* Find netvsc by VMBus serial number.
- * The PCI hyperv controller records the serial number as the slot.
+/* Find netvsc by VF serial number.
+ * The PCI hyperv controller records the serial number as the slot kobj name.
  */
 static struct net_device *get_netvsc_byslot(const struct net_device *vf_netdev)
 {
 	struct device *parent = vf_netdev->dev.parent;
 	struct net_device_context *ndev_ctx;
 	struct pci_dev *pdev;
+	u32 serial;
 
 	if (!parent || !dev_is_pci(parent))
 		return NULL; /* not a PCI device */
@@ -2048,16 +2049,22 @@ static struct net_device *get_netvsc_byslot(const struct net_device *vf_netdev)
 		return NULL;
 	}
 
+	if (kstrtou32(pci_slot_name(pdev->slot), 10, &serial)) {
+		netdev_notice(vf_netdev, "Invalid vf serial:%s\n",
+			      pci_slot_name(pdev->slot));
+		return NULL;
+	}
+
 	list_for_each_entry(ndev_ctx, &netvsc_dev_list, list) {
 		if (!ndev_ctx->vf_alloc)
 			continue;
 
-		if (ndev_ctx->vf_serial == pdev->slot->number)
+		if (ndev_ctx->vf_serial == serial)
 			return hv_get_drvdata(ndev_ctx->device_ctx);
 	}
 
 	netdev_notice(vf_netdev,
-		      "no netdev found for slot %u\n", pdev->slot->number);
+		      "no netdev found for vf serial:%u\n", serial);
 	return NULL;
 }
 
-- 
2.18.0
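
As a rough userspace illustration of the parsing step in this patch (turning the slot's kobj name into a 32-bit serial), the sketch below uses `strtoull()` in place of the kernel's `kstrtou32()`. The helper name `demo_parse_serial()` is hypothetical, and the checks are an approximation: this version is somewhat stricter than the kernel helper, rejecting any input that does not start with a digit or that has trailing characters.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace approximation of kstrtou32(name, 10, &out).
 * Returns 0 on success and -1 if name is not a plain decimal u32.
 */
static int demo_parse_serial(const char *name, uint32_t *out)
{
	char *end;
	unsigned long long v;

	assert(name != NULL && out != NULL);

	/* Reject empty strings, signs and leading whitespace up front. */
	if (*name < '0' || *name > '9')
		return -1;

	errno = 0;
	v = strtoull(name, &end, 10);
	if (errno != 0 || *end != '\0' || v > UINT32_MAX)
		return -1;

	*out = (uint32_t)v;
	return 0;
}
```

A slot name such as "2" yields serial 2, while a non-numeric name is rejected, which corresponds to the "Invalid vf serial" path in the patch.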

^ permalink raw reply related

* Re: [PATCH][net-next][v2] net: bridge: fix a possible memory leak in __vlan_add
From: Nikolay Aleksandrov @ 2018-10-15 11:13 UTC (permalink / raw)
  To: Li RongQing, netdev; +Cc: bridge, roopa
In-Reply-To: <1539601231-32755-1-git-send-email-lirongqing@baidu.com>

On 15/10/2018 14:00, Li RongQing wrote:
> Since per-port vlan stats were added, the vlan stats should be released
> when adding a vlan fails
> 
> Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats")
> CC: bridge@lists.linux-foundation.org
> cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> CC: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  net/bridge/br_vlan.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 9b707234e4ae..8c9297a01947 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -303,6 +303,10 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags)
>  	if (p) {
>  		__vlan_vid_del(dev, br, v->vid);
>  		if (masterv) {
> +			if (v->stats && masterv->stats != v->stats)
> +				free_percpu(v->stats);
> +			v->stats = NULL;
> +
>  			br_vlan_put_master(masterv);
>  			v->brvlan = NULL;
>  		}
> 

Thanks,
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH net-next 11/18] vxlan: Add netif_is_vxlan()
From: Jakub Kicinski @ 2018-10-15 18:57 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: ivecera@redhat.com, andrew@lunn.ch, nikolay@cumulusnetworks.com,
	netdev@vger.kernel.org, roopa@cumulusnetworks.com,
	vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, bridge@lists.linux-foundation.org,
	mlxsw <mlxsw@mellanox.com>, Jiri Pirko <jiri@mellanox.com>,
	Petr Machata <petrm@mellanox.com>
In-Reply-To: <20181013171725.3261-12-idosch@mellanox.com>

On Sat, 13 Oct 2018 17:18:38 +0000, Ido Schimmel wrote:
> Add the ability to determine whether a netdev is a VxLAN netdev by
> calling the above mentioned function that checks the netdev's private
> flags.
> 
> This will allow modules to identify netdev events involving a VxLAN
> netdev and act accordingly. For example, drivers capable of VxLAN
> offload will need to configure the underlying device when a VxLAN netdev
> is being enslaved to an offloaded bridge.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reviewed-by: Petr Machata <petrm@mellanox.com>

Is this preferable over

!strcmp(netdev->rtnl_link_ops->kind, "vxlan")

which is what TC offloads do?
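
To make the two approaches in this question concrete, here is a minimal userspace sketch. Everything in it is a hypothetical stand-in: `struct demo_netdev`, `struct demo_link_ops` and `DEMO_PRIV_FLAG_VXLAN` are not the kernel definitions, and the flag value is arbitrary. It only shows the shape of a private-flag test next to a link-kind string comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel types; NOT the real definitions. */
#define DEMO_PRIV_FLAG_VXLAN (1u << 0)	/* illustrative bit value */

struct demo_link_ops {
	const char *kind;
};

struct demo_netdev {
	unsigned int priv_flags;
	const struct demo_link_ops *link_ops;	/* may be NULL */
};

/* Flag-based test: a single load and mask, no pointer chasing. */
static bool demo_is_vxlan_by_flag(const struct demo_netdev *dev)
{
	assert(dev != NULL);
	return (dev->priv_flags & DEMO_PRIV_FLAG_VXLAN) != 0;
}

/* String-based test: has to guard against devices without link ops. */
static bool demo_is_vxlan_by_kind(const struct demo_netdev *dev)
{
	assert(dev != NULL);
	return dev->link_ops && dev->link_ops->kind &&
	       strcmp(dev->link_ops->kind, "vxlan") == 0;
}
```

In this sketch the string variant needs the extra NULL checks that the bare strcmp() expression above leaves out; whether that matters in practice depends on which devices can reach the caller.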

^ permalink raw reply

* Re: [PATCH net 1/2] geneve, vxlan: Don't check skb_dst() twice
From: Stefano Brivio @ 2018-10-15 11:08 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: David S. Miller, Xin Long, Sabrina Dubroca, netdev
In-Reply-To: <61596775-4b5f-884a-7a0d-d8c134bb7e8a@6wind.com>

On Mon, 15 Oct 2018 12:19:41 +0200
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> On 12/10/2018 at 23:53, Stefano Brivio wrote:
> > Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids
> > that we try updating PMTU for a non-existent destination, but didn't clean
> > up cases where the check was already explicit. Drop those redundant checks.  
> Yes, I leave them to avoid calculating the new mtu value when not needed. We are
> in the xmit path.

Before 2/2 of this series, though, we call skb_dst_update_pmtu() (and
in turn dst->ops->update_pmtu()) for *every* packet with a dst, which
I'd dare saying is by far the most common case. Besides, 2/2 needs
anyway to calculate the MTU to fix a bug.

So I think this is a vast improvement overall.

If we want to improve this further and avoid any indirect calls in the
most common path, we would need to cache the MTU in the dst -- it's
probably doable, but I would fix the specific issue addressed by 2/2
first.

-- 
Stefano

^ permalink raw reply

* [PATCH][net-next][v2] net: bridge: fix a possible memory leak in __vlan_add
From: Li RongQing @ 2018-10-15 11:00 UTC (permalink / raw)
  To: netdev; +Cc: bridge, nikolay, roopa

Since per-port vlan stats were added, the vlan stats should be released
when adding a vlan fails

Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats")
CC: bridge@lists.linux-foundation.org
cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 net/bridge/br_vlan.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 9b707234e4ae..8c9297a01947 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -303,6 +303,10 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags)
 	if (p) {
 		__vlan_vid_del(dev, br, v->vid);
 		if (masterv) {
+			if (v->stats && masterv->stats != v->stats)
+				free_percpu(v->stats);
+			v->stats = NULL;
+
 			br_vlan_put_master(masterv);
 			v->brvlan = NULL;
 		}
-- 
2.16.2
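
The ownership rule behind this fix can be sketched in userspace C: on the error path, free the stats only if they are not the ones borrowed from the master vlan, then clear the pointer either way. The names `struct demo_vlan` and `demo_release_stats()` are hypothetical, and malloc-family allocation stands in for per-CPU allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the stats pointer in struct net_bridge_vlan. */
struct demo_vlan {
	long *stats;	/* stand-in for per-CPU stats */
};

/*
 * Error-path cleanup: release v's stats unless they are shared with the
 * master vlan, and always clear v's pointer. Returns 1 if memory was freed.
 */
static int demo_release_stats(struct demo_vlan *v,
			      const struct demo_vlan *master)
{
	int freed = 0;

	assert(v != NULL && master != NULL);
	if (v->stats && master->stats != v->stats) {
		free(v->stats);
		freed = 1;
	}
	v->stats = NULL;
	return freed;
}
```

Stats shared with the master are left for the master to release, so the cleanup neither leaks a private allocation nor frees a shared one twice.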

^ permalink raw reply related

* Re: [PATCH 2/2] 9p/trans_fd: put worker reqs on destroy
From: Dominique Martinet @ 2018-10-15 10:46 UTC (permalink / raw)
  To: Tomas Bortoli
  Cc: Dominique Martinet, v9fs-developer, netdev, LKML,
	Eric Van Hensbergen, Latchesar Ionkov
In-Reply-To: <CAAHj5qgBteZQOkjF-n3OXvN9e0v6NYEsqg3YOitEKW0jikTpng@mail.gmail.com>

Tomas Bortoli wrote on Tue, Oct 09, 2018:
> On Tue, 9 Oct 2018 at 06:06, Dominique Martinet wrote:
> > Fixes: 728356dedeff8 ("9p: Add refcount to p9_req_t")
> > Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
> > Cc: Eric Van Hensbergen <ericvh@gmail.com>
> > Cc: Latchesar Ionkov <lucho@ionkov.net>
> > Cc: Tomas Bortoli <tomasbortoli@gmail.com>
>
> Reviewed-by: Tomas Bortoli <tomasbortoli@gmail.com>

Thanks Tomas.

Tests seem ok. I've pushed both patches to my next branch, and will submit
them with the rest in a couple of weeks.
Quite a few of my patches haven't had reviews this cycle. If you read
this and have a bit of time, please pull my branch and have a read; each
commit has a link to its lkml post, so replying should be fairly easy.


Reminder git url:
 git://github.com/martinetd/linux 9p-next

-- 
Dominique

^ permalink raw reply

* Re: net/wan: hostess_sv11 + z85230 problems
From: Alan Cox @ 2018-10-15 10:29 UTC (permalink / raw)
  To: Krzysztof Hałasa; +Cc: Randy Dunlap, netdev@vger.kernel.org, LKML
In-Reply-To: <m34ldnd3ju.fsf@t19.piap.pl>

On Mon, 15 Oct 2018 10:20:21 +0200
khalasa@piap.pl (Krzysztof Hałasa) wrote:

> Hi,
> 
> Randy Dunlap <rdunlap@infradead.org> writes:
> 
> > kernel 4.19-rc7, on i386, with NO wan/hdlc/hostess/z85230 hardware:
> >
> > modprobe hostess_sv11 + autoload of z85230 give:  
> 
> BTW Hostess SV11 is apparently an ISA card, with all those problems.

Actually it worked perfectly well in the old days, but people who didn't
have the hardware kept changing it. Please just delete the driver instead
of pretending we can test it.

Nobody has one, even if they did there is no use for it as nobody runs
their internet over an old HDLC 64K link any more.

And since bit-rot has broken it in several ways, we know that nobody uses it.

Alan

^ permalink raw reply

* Re: BBR and TCP internal pacing causing interrupt storm with pfifo_fast
From: Gasper Zejn @ 2018-10-15 10:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Kevin Yang, netdev
In-Reply-To: <CANn89iLOHxOOuGzqNODk0hNWJ7an0bZXs4G-8mQjq3WWhZPo9g@mail.gmail.com>


On 09. 10. 2018 19:26, Eric Dumazet wrote:
> On Tue, Oct 9, 2018 at 10:22 AM Gasper Zejn <zelo.zejn@gmail.com> wrote:
>> On 09. 10. 2018 19:00, Eric Dumazet wrote:
>>> On 10/09/2018 09:38 AM, Gasper Zejn wrote:
>>>> Hello,
>>>>
>>>> I am seeing interrupt storms of over 100k-900k local timer interrupts
>>>> when changing between network devices or networks with open TCP
>>>> connections when not using sch_fq (I was using pfifo_fast). Using sch_fq
>>>> makes the bug with interrupt storm go away.
>>>>
>>> That is for what kind of traffic ?
>>>
>>> If your TCP flows send 100k-3M packets per second, then yes, the pacing timers
>>> could be setup in the 100k-900k range.
>>>
>> Traffic is nowhere in that range, think of having a few browser tabs of
>> javascript rich
>> web pages open, mostly idle, for example slack, gmail or tweetdeck. No
>> significant
>> packet rate is needed, just open connections.
> No idea of what is going on really. A repro would be nice.

I've tried to isolate the issue as best I could. There seems to be an
issue if the TCP socket has keepalive set and send queue is not empty
and the route goes away.

https://github.com/zejn/bbr_pfifo_interrupts_issue

Hope this helps,
Gasper

^ permalink raw reply

* Re: [PATCH net 1/2] geneve, vxlan: Don't check skb_dst() twice
From: Nicolas Dichtel @ 2018-10-15 10:19 UTC (permalink / raw)
  To: Stefano Brivio, David S. Miller; +Cc: Xin Long, Sabrina Dubroca, netdev
In-Reply-To: <f58a94b95460594c31348cc517f3917bbb9cc51e.1539381018.git.sbrivio@redhat.com>

On 12/10/2018 at 23:53, Stefano Brivio wrote:
> Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids
> that we try updating PMTU for a non-existent destination, but didn't clean
> up cases where the check was already explicit. Drop those redundant checks.
Yes, I leave them to avoid calculating the new mtu value when not needed. We are
in the xmit path.
As skb_dst_update_pmtu() is inlined, we probably don't care, but gcc could still
decide to not inline it.

Regards,
Nicolas

^ permalink raw reply

* Re: [PATCH] dt-bindings: Add bindings for aliases node
From: Rob Herring @ 2018-10-15 18:00 UTC (permalink / raw)
  To: Brian Norris
  Cc: Geert Uytterhoeven, Matthias Kaehlcke, Mark Rutland,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Linux Kernel Mailing List, linux-wireless, linux-spi, netdev,
	swboyd, Florian Fainelli
In-Reply-To: <20181009183141.GA126050@ban.mtv.corp.google.com>

On Tue, Oct 09, 2018 at 11:31:42AM -0700, Brian Norris wrote:
> On Tue, Oct 09, 2018 at 09:22:07AM +0200, Geert Uytterhoeven wrote:
> > Please note these aliases become cumbersome once you start considering
> > (dynamic) DT overlays.  That's why I made them optional in the sh-sci
> > serial driver, cfr. commit 7678f4c20fa7670f ("serial: sh-sci: Add support
> > for dynamic instances").
> 
> Note that as I understand it, the entire point of documenting this sort
> of thing is to help solidify the interface between a DT aware boot
> program (e.g., bootloader) and a device tree which is provided
> separately, to avoid memorizing node/path hierarchy. It doesn't need to
> (and doesn't, as I read it) enforce an OS's device naming policy.

I'm all for documenting this primarily to prevent folks from just adding 
whatever they wish in /aliases. Some platforms seem to want to have 
aliases for everything.

> > Relevant parts of the commit description are:
> > 
> >     On DT platforms, the sh-sci driver requires the presence of "serialN"
> >     aliases in DT, from which instance IDs are derived.  If a DT alias is
> >     missing, the driver fails to probe the corresponding serial port.
> > 
> >     This becomes cumbersome when considering DT overlays, as currently
> >     there is no upstream support for dynamically updating the /aliases node
> >     in DT.
> 
> That part is not a DT spec problem :)
> 
> >     Furthermore, even in the presence of such support, hardcoded
> >     instance IDs in independent overlays are prone to conflicts.
> > 
> >     Hence add support for dynamic instance IDs, to be used in the absence of
> >     a DT alias.  This makes serial ports behave similar to I2C and SPI
> >     buses, which already support dynamic instances.
> 
> This seems to be a much different sort of problem. People always love
> having predictable IDs given by the OS (myself included), but that's
> just plain hard to do and impossible in some cases. I don't think that's
> what this document is about though.
> 
> IOW, this document seems pretty consistent with the above: it doesn't
> require the usage of aliases (and it seems silly to have a driver
> *require* an alias) -- it just documents how one should name such an
> alias if you expect multiple independent software components to
> understand it.
> 
> > To clarify my point: R-Car M2-W has 4 different types of serial ports, for a
> > total of 18 ports, and the two ports on a board labeled 0 and 1 may not
> > correspond to the physical first two ports (what's "first" in a collection of
> > 4 different types?).
> > 
> > Aliases may be fine for referring to the main serial console (labeled
> > port 0 on the device, too), and the primary Ethernet interface (so U-Boot
> > knows where to add the "local-mac-address" property), but beyond that,
> > I think they should be avoided.

This basically matches my opinion on aliases.
 
I'd decouple it from board labels a bit. Sometimes the numbering may 
match, but others not. What if a board serial port is labeled "DBG" for 
example? I think 'label' is the right way to handle human-identifiable 
ports (and then we should have something like /dev/serial/by-label/...).

> That's fair enough. Just because the solution isn't an all-purpose tool
> doesn't mean it shouldn't be documented. The general concept is already
> in ePAPR, but it's just not very specific about property names.

Agreed. I guess the question is what to do about aliases that are used
but not recommended. I would put SPI and I2C into that category BTW.

Rob

^ permalink raw reply

* Re: [PATCH stable 4.9 v2 00/29] backport of IP fragmentation fixes
From: Eric Dumazet @ 2018-10-15 17:53 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, David Miller, Greg Kroah-Hartman, stable, sthemmin
In-Reply-To: <e9b8ade3-e52e-d7a5-2238-77b4e9e55b60@gmail.com>

On Mon, Oct 15, 2018 at 10:47 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 10/10/2018 12:29 PM, Florian Fainelli wrote:
> > This is based on Stephen's v4.14 patches, with the necessary merge
> > conflicts, and the lack of timer_setup() on the 4.9 baseline.
> >
> > Perf results on a gigabit capable system, before and after are below.
> >
> > Series can also be found here:
> >
> > https://github.com/ffainelli/linux/commits/fragment-stack-v4.9-v2
> >
> > Changes in v2:
> >
> > - drop "net: sk_buff rbnode reorg"
> > - added original "ip: use rb trees for IP frag queue." commit
>
> Eric, does this look reasonable to you?

Yes, thanks a lot Florian.

>
> >
> > Before patches:
> >
> >    PerfTop:     180 irqs/sec  kernel:78.9%  exact:  0.0% [4000Hz cycles:ppp],  (all, 4 CPUs)
> > -------------------------------------------------------------------------------
> >
> >     34.81%  [kernel]       [k] ip_defrag
> >      4.57%  [kernel]       [k] arch_cpu_idle
> >      2.09%  [kernel]       [k] fib_table_lookup
> >      1.74%  [kernel]       [k] finish_task_switch
> >      1.57%  [kernel]       [k] v7_dma_inv_range
> >      1.47%  [kernel]       [k] __netif_receive_skb_core
> >      1.06%  [kernel]       [k] __slab_free
> >      1.04%  [kernel]       [k] __netdev_alloc_skb
> >      0.99%  [kernel]       [k] ip_route_input_noref
> >      0.96%  [kernel]       [k] dev_gro_receive
> >      0.96%  [kernel]       [k] tick_nohz_idle_enter
> >      0.93%  [kernel]       [k] bcm_sysport_poll
> >      0.92%  [kernel]       [k] skb_release_data
> >      0.91%  [kernel]       [k] __memzero
> >      0.90%  [kernel]       [k] __free_page_frag
> >      0.87%  [kernel]       [k] ip_rcv
> >      0.77%  [kernel]       [k] eth_type_trans
> >      0.71%  [kernel]       [k] _raw_spin_unlock_irqrestore
> >      0.68%  [kernel]       [k] tick_nohz_idle_exit
> >      0.65%  [kernel]       [k] bcm_sysport_rx_refill
> >
> > After patches:
> >
> >    PerfTop:     214 irqs/sec  kernel:80.4%  exact:  0.0% [4000Hz cycles:ppp],  (all, 4 CPUs)
> > -------------------------------------------------------------------------------
> >
> >      6.61%  [kernel]       [k] arch_cpu_idle
> >      3.77%  [kernel]       [k] ip_defrag
> >      3.65%  [kernel]       [k] v7_dma_inv_range
> >      3.18%  [kernel]       [k] fib_table_lookup
> >      3.04%  [kernel]       [k] __netif_receive_skb_core
> >      2.31%  [kernel]       [k] finish_task_switch
> >      2.31%  [kernel]       [k] _raw_spin_unlock_irqrestore
> >      1.65%  [kernel]       [k] bcm_sysport_poll
> >      1.63%  [kernel]       [k] ip_route_input_noref
> >      1.63%  [kernel]       [k] __memzero
> >      1.58%  [kernel]       [k] __netdev_alloc_skb
> >      1.47%  [kernel]       [k] tick_nohz_idle_enter
> >      1.40%  [kernel]       [k] __slab_free
> >      1.32%  [kernel]       [k] ip_rcv
> >      1.32%  [kernel]       [k] __softirqentry_text_start
> >      1.30%  [kernel]       [k] dev_gro_receive
> >      1.23%  [kernel]       [k] bcm_sysport_rx_refill
> >      1.11%  [kernel]       [k] tick_nohz_idle_exit
> >      1.06%  [kernel]       [k] memcmp
> >      1.02%  [kernel]       [k] dma_cache_maint_page
> >
> >
> > Dan Carpenter (1):
> >   ipv4: frags: precedence bug in ip_expire()
> >
> > Eric Dumazet (21):
> >   inet: frags: change inet_frags_init_net() return value
> >   inet: frags: add a pointer to struct netns_frags
> >   inet: frags: refactor ipfrag_init()
> >   inet: frags: refactor ipv6_frag_init()
> >   inet: frags: refactor lowpan_net_frag_init()
> >   ipv6: export ip6 fragments sysctl to unprivileged users
> >   rhashtable: add schedule points
> >   inet: frags: use rhashtables for reassembly units
> >   inet: frags: remove some helpers
> >   inet: frags: get rif of inet_frag_evicting()
> >   inet: frags: remove inet_frag_maybe_warn_overflow()
> >   inet: frags: break the 2GB limit for frags storage
> >   inet: frags: do not clone skb in ip_expire()
> >   ipv6: frags: rewrite ip6_expire_frag_queue()
> >   rhashtable: reorganize struct rhashtable layout
> >   inet: frags: reorganize struct netns_frags
> >   inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
> >   inet: frags: fix ip6frag_low_thresh boundary
> >   net: speed up skb_rbtree_purge()
> >   net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> >   net: add rb_to_skb() and other rb tree helpers
> >
> > Florian Westphal (1):
> >   ipv6: defrag: drop non-last frags smaller than min mtu
> >
> > Peter Oskolkov (5):
> >   ip: discard IPv4 datagrams with overlapping segments.
> >   net: modify skb_rbtree_purge to return the truesize of all purged
> >     skbs.
> >   ip: use rb trees for IP frag queue.
> >   ip: add helpers to process in-order fragments faster.
> >   ip: process in-order fragments efficiently
> >
> > Taehee Yoo (1):
> >   ip: frags: fix crash in ip_do_fragment()
> >
> >  Documentation/networking/ip-sysctl.txt  |  13 +-
> >  include/linux/rhashtable.h              |   4 +-
> >  include/linux/skbuff.h                  |  34 +-
> >  include/net/inet_frag.h                 | 133 +++---
> >  include/net/ip.h                        |   1 -
> >  include/net/ipv6.h                      |  26 +-
> >  include/uapi/linux/snmp.h               |   1 +
> >  lib/rhashtable.c                        |   5 +-
> >  net/core/skbuff.c                       |  31 +-
> >  net/ieee802154/6lowpan/6lowpan_i.h      |  26 +-
> >  net/ieee802154/6lowpan/reassembly.c     | 148 +++---
> >  net/ipv4/inet_fragment.c                | 379 ++++------------
> >  net/ipv4/ip_fragment.c                  | 573 +++++++++++++-----------
> >  net/ipv4/proc.c                         |   7 +-
> >  net/ipv4/tcp_input.c                    |  33 +-
> >  net/ipv6/netfilter/nf_conntrack_reasm.c | 100 ++---
> >  net/ipv6/proc.c                         |   5 +-
> >  net/ipv6/reassembly.c                   | 212 ++++-----
> >  18 files changed, 774 insertions(+), 957 deletions(-)
> >
>
> --
> Florian


* Re: [PATCH stable 4.9 v2 00/29] backport of IP fragmentation fixes
From: Florian Fainelli @ 2018-10-15 17:47 UTC (permalink / raw)
  To: netdev, edumazet; +Cc: davem, gregkh, stable, sthemmin
In-Reply-To: <20181010193017.25221-1-f.fainelli@gmail.com>



On 10/10/2018 12:29 PM, Florian Fainelli wrote:
> This is based on Stephen's v4.14 patches, with the necessary merge
> conflicts resolved and adjusted for the lack of timer_setup() on the
> 4.9 baseline.
> 
> Perf results on a gigabit capable system, before and after are below.
> 
> Series can also be found here:
> 
> https://github.com/ffainelli/linux/commits/fragment-stack-v4.9-v2
> 
> Changes in v2:
> 
> - drop "net: sk_buff rbnode reorg"
> - added original "ip: use rb trees for IP frag queue." commit

Eric, does this look reasonable to you?

> 
> Before patches:
> 
>    PerfTop:     180 irqs/sec  kernel:78.9%  exact:  0.0% [4000Hz cycles:ppp],  (all, 4 CPUs)
> -------------------------------------------------------------------------------
> 
>     34.81%  [kernel]       [k] ip_defrag
>      4.57%  [kernel]       [k] arch_cpu_idle
>      2.09%  [kernel]       [k] fib_table_lookup
>      1.74%  [kernel]       [k] finish_task_switch
>      1.57%  [kernel]       [k] v7_dma_inv_range
>      1.47%  [kernel]       [k] __netif_receive_skb_core
>      1.06%  [kernel]       [k] __slab_free
>      1.04%  [kernel]       [k] __netdev_alloc_skb
>      0.99%  [kernel]       [k] ip_route_input_noref
>      0.96%  [kernel]       [k] dev_gro_receive
>      0.96%  [kernel]       [k] tick_nohz_idle_enter
>      0.93%  [kernel]       [k] bcm_sysport_poll
>      0.92%  [kernel]       [k] skb_release_data
>      0.91%  [kernel]       [k] __memzero
>      0.90%  [kernel]       [k] __free_page_frag
>      0.87%  [kernel]       [k] ip_rcv
>      0.77%  [kernel]       [k] eth_type_trans
>      0.71%  [kernel]       [k] _raw_spin_unlock_irqrestore
>      0.68%  [kernel]       [k] tick_nohz_idle_exit
>      0.65%  [kernel]       [k] bcm_sysport_rx_refill
> 
> After patches:
> 
>    PerfTop:     214 irqs/sec  kernel:80.4%  exact:  0.0% [4000Hz cycles:ppp],  (all, 4 CPUs)
> -------------------------------------------------------------------------------
> 
>      6.61%  [kernel]       [k] arch_cpu_idle
>      3.77%  [kernel]       [k] ip_defrag
>      3.65%  [kernel]       [k] v7_dma_inv_range
>      3.18%  [kernel]       [k] fib_table_lookup
>      3.04%  [kernel]       [k] __netif_receive_skb_core
>      2.31%  [kernel]       [k] finish_task_switch
>      2.31%  [kernel]       [k] _raw_spin_unlock_irqrestore
>      1.65%  [kernel]       [k] bcm_sysport_poll
>      1.63%  [kernel]       [k] ip_route_input_noref
>      1.63%  [kernel]       [k] __memzero
>      1.58%  [kernel]       [k] __netdev_alloc_skb
>      1.47%  [kernel]       [k] tick_nohz_idle_enter
>      1.40%  [kernel]       [k] __slab_free
>      1.32%  [kernel]       [k] ip_rcv
>      1.32%  [kernel]       [k] __softirqentry_text_start
>      1.30%  [kernel]       [k] dev_gro_receive
>      1.23%  [kernel]       [k] bcm_sysport_rx_refill
>      1.11%  [kernel]       [k] tick_nohz_idle_exit
>      1.06%  [kernel]       [k] memcmp
>      1.02%  [kernel]       [k] dma_cache_maint_page
> 
> 
> Dan Carpenter (1):
>   ipv4: frags: precedence bug in ip_expire()
> 
> Eric Dumazet (21):
>   inet: frags: change inet_frags_init_net() return value
>   inet: frags: add a pointer to struct netns_frags
>   inet: frags: refactor ipfrag_init()
>   inet: frags: refactor ipv6_frag_init()
>   inet: frags: refactor lowpan_net_frag_init()
>   ipv6: export ip6 fragments sysctl to unprivileged users
>   rhashtable: add schedule points
>   inet: frags: use rhashtables for reassembly units
>   inet: frags: remove some helpers
>   inet: frags: get rif of inet_frag_evicting()
>   inet: frags: remove inet_frag_maybe_warn_overflow()
>   inet: frags: break the 2GB limit for frags storage
>   inet: frags: do not clone skb in ip_expire()
>   ipv6: frags: rewrite ip6_expire_frag_queue()
>   rhashtable: reorganize struct rhashtable layout
>   inet: frags: reorganize struct netns_frags
>   inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
>   inet: frags: fix ip6frag_low_thresh boundary
>   net: speed up skb_rbtree_purge()
>   net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>   net: add rb_to_skb() and other rb tree helpers
> 
> Florian Westphal (1):
>   ipv6: defrag: drop non-last frags smaller than min mtu
> 
> Peter Oskolkov (5):
>   ip: discard IPv4 datagrams with overlapping segments.
>   net: modify skb_rbtree_purge to return the truesize of all purged
>     skbs.
>   ip: use rb trees for IP frag queue.
>   ip: add helpers to process in-order fragments faster.
>   ip: process in-order fragments efficiently
> 
> Taehee Yoo (1):
>   ip: frags: fix crash in ip_do_fragment()
> 
>  Documentation/networking/ip-sysctl.txt  |  13 +-
>  include/linux/rhashtable.h              |   4 +-
>  include/linux/skbuff.h                  |  34 +-
>  include/net/inet_frag.h                 | 133 +++---
>  include/net/ip.h                        |   1 -
>  include/net/ipv6.h                      |  26 +-
>  include/uapi/linux/snmp.h               |   1 +
>  lib/rhashtable.c                        |   5 +-
>  net/core/skbuff.c                       |  31 +-
>  net/ieee802154/6lowpan/6lowpan_i.h      |  26 +-
>  net/ieee802154/6lowpan/reassembly.c     | 148 +++---
>  net/ipv4/inet_fragment.c                | 379 ++++------------
>  net/ipv4/ip_fragment.c                  | 573 +++++++++++++-----------
>  net/ipv4/proc.c                         |   7 +-
>  net/ipv4/tcp_input.c                    |  33 +-
>  net/ipv6/netfilter/nf_conntrack_reasm.c | 100 ++---
>  net/ipv6/proc.c                         |   5 +-
>  net/ipv6/reassembly.c                   | 212 ++++-----
>  18 files changed, 774 insertions(+), 957 deletions(-)
> 

-- 
Florian


* Re: [Potential Spoof] Re: [PATCH net-next v4] net/ncsi: Add NCSI Broadcom OEM command
From: Vijay Khemka @ 2018-10-15 17:38 UTC (permalink / raw)
  To: Samuel Mendoza-Jonas, David S. Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: openbmc@lists.ozlabs.org, linux-aspeed@lists.ozlabs.org



On 10/15/18, 10:27 AM, "Linux-aspeed on behalf of Vijay Khemka" <linux-aspeed-bounces+vijaykhemka=fb.com@lists.ozlabs.org on behalf of vijaykhemka@fb.com> wrote:

    
    
    On 10/14/18, 8:51 PM, "Samuel Mendoza-Jonas" <sam@mendozajonas.com> wrote:
    
        On Mon, 2018-10-15 at 13:08 +1100, Samuel Mendoza-Jonas wrote:
        > On Fri, 2018-10-12 at 11:20 -0700, Vijay Khemka wrote:
        > > This patch adds OEM Broadcom commands and response handling. It also
        > > defines OEM Get MAC Address handler to get and configure the device.
        > > 
        > > ncsi_oem_gma_handler_bcm: This handler send NCSI broadcom command for
        > > getting mac address.
        > > ncsi_rsp_handler_oem_bcm: This handles response received for all
        > > broadcom OEM commands.
        > > ncsi_rsp_handler_oem_bcm_gma: This handles get mac address response and
        > > set it to device.
        > > 
        > > Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
        > > ---
        > >  v4: updated as per comment from Sam, I was just wondering if I can remove
        > >  NCSI_OEM_CMD_GET_MAC config option and let this code be valid always and
        > >  it will configure mac address if there is get mac address handler for given 
        > >  manufacture id.
        > 
        > Hi Vijay,
        > 
        > We can look at handling this a different way, but I don't think we want
        > to unconditionally set the system's MAC address based on the OEM GMA
        > command. If the user wants to set a custom MAC address, or in the case of
        > OpenBMC for example who have their MAC address saved in flash, this will
        > override that value with whatever the Network Controller has saved. In
        > particular as it is set up it will override any MAC address every time a
        > channel is configured, such as during a failover event.
        > 
        > We *could* always send the GMA command if it is available and move the
        > decision whether to use the resulting address or not into the response
        > handler. That would simplify the ncsi_configure_channel() logic a bit.
        > Another idea may be to have a Netlink command to tell NCSI to ignore the
        > GMA result; then we could drop the config option and the system can
        > safely change the address if desired.
        > 
        > Any thoughts? I'll also ping some of the OpenBMC people and see what
        > their expectations are.
        
        After a bit of a think and an ask around, to quote a colleague:
        > I think we'd want it handled (overall) like any other net device; the MAC
        > address in the device's ROM provides a default, and is overridden by anything
        > specified by userspace 
        
        Which describes what I was thinking pretty well.
        So if we can have it such that the NCSI driver only sets the MAC address
        _once_, and after that does not update it again, we should be able to call
        the OEM GMA command without hiding it behind a config option. So the first time
        a channel was configured we store and set the MAC address given, but then on
        later configure events we don't continue to update it. What do you think?
        
        Cheers,
        Sam
    
      I agree with setting it only once. I gave some thought to the config option and realized 
      that we should allow the user to configure it. If the user wants to set the MAC address 
      through the device tree and not through ROM, then we must not override the MAC address 
      set by the device tree. So my proposal is that setting the MAC address in the response 
      handler should be hidden behind the config option. Getting the MAC address can still 
      happen without the config option. Your thoughts?
        
  Or simply guard the following block with the config option, so that no other function 
  declaration guard is required, and set a static flag variable in ncsi_oem_handler() so 
  that this is done only once:

  #if IS_ENABLED(CONFIG_NCSI_OEM_CMD_GET_MAC)
  	nca.type = NCSI_PKT_CMD_OEM;
  	nca.package = np->id;
  	nca.channel = nc->id;
  	ndp->pending_req_num = 1;
  	ret = ncsi_oem_handler(&nca, nc->version.mf_id);
  #endif /* CONFIG_NCSI_OEM_CMD_GET_MAC */
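
For illustration only, the "set it once" behaviour discussed above could be modelled in plain C roughly as follows. This is a userspace sketch, not NCSI driver code: struct fake_ndev, gma_apply_once() and the mac_set_once flag are hypothetical names. The flag is kept per device here instead of in a function-level static, on the assumption that two interfaces should not share it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for per-device state; not a kernel structure. */
struct fake_ndev {
	unsigned char dev_addr[ETH_ALEN];
	bool mac_set_once;	/* true once the first GMA response was applied */
};

/*
 * Apply a MAC address taken from a (simulated) Get MAC Address response.
 * Only the first response is applied. Later responses, e.g. after a
 * failover reconfigures the channel, are ignored so that an address
 * chosen later by userspace is not overwritten.
 * Returns true if the address was applied.
 */
static bool gma_apply_once(struct fake_ndev *ndev, const unsigned char *mac)
{
	assert(ndev && mac);

	if (ndev->mac_set_once)
		return false;

	memcpy(ndev->dev_addr, mac, ETH_ALEN);
	ndev->mac_set_once = true;
	return true;
}
```

With this shape the controller's address acts as a default, and a second configure event leaves whatever address is currently set untouched.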



* Re: [PATCH net-next v4] net/ncsi: Add NCSI Broadcom OEM command
From: Vijay Khemka @ 2018-10-15 17:27 UTC (permalink / raw)
  To: Samuel Mendoza-Jonas, David S. Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: linux-aspeed@lists.ozlabs.org, openbmc@lists.ozlabs.org
In-Reply-To: <9c82f52a7ce3fb9052600709a10783b23ef88457.camel@mendozajonas.com>



On 10/14/18, 8:51 PM, "Samuel Mendoza-Jonas" <sam@mendozajonas.com> wrote:

    On Mon, 2018-10-15 at 13:08 +1100, Samuel Mendoza-Jonas wrote:
    > On Fri, 2018-10-12 at 11:20 -0700, Vijay Khemka wrote:
    > > This patch adds OEM Broadcom commands and response handling. It also
    > > defines OEM Get MAC Address handler to get and configure the device.
    > > 
    > > ncsi_oem_gma_handler_bcm: This handler send NCSI broadcom command for
    > > getting mac address.
    > > ncsi_rsp_handler_oem_bcm: This handles response received for all
    > > broadcom OEM commands.
    > > ncsi_rsp_handler_oem_bcm_gma: This handles get mac address response and
    > > set it to device.
    > > 
    > > Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
    > > ---
    > >  v4: updated as per comment from Sam, I was just wondering if I can remove
    > >  NCSI_OEM_CMD_GET_MAC config option and let this code be valid always and
    > >  it will configure mac address if there is get mac address handler for given 
    > >  manufacture id.
    > 
    > Hi Vijay,
    > 
    > We can look at handling this a different way, but I don't think we want
    > to unconditionally set the system's MAC address based on the OEM GMA
    > command. If the user wants to set a custom MAC address, or in the case of
    > OpenBMC for example who have their MAC address saved in flash, this will
    > override that value with whatever the Network Controller has saved. In
    > particular as it is set up it will override any MAC address every time a
    > channel is configured, such as during a failover event.
    > 
    > We *could* always send the GMA command if it is available and move the
    > decision whether to use the resulting address or not into the response
    > handler. That would simplify the ncsi_configure_channel() logic a bit.
    > Another idea may be to have a Netlink command to tell NCSI to ignore the
    > GMA result; then we could drop the config option and the system can
    > safely change the address if desired.
    > 
    > Any thoughts? I'll also ping some of the OpenBMC people and see what
    > their expectations are.
    
    After a bit of a think and an ask around, to quote a colleague:
    > I think we'd want it handled (overall) like any other net device; the MAC
    > address in the device's ROM provides a default, and is overridden by anything
    > specified by userspace 
    
    Which describes what I was thinking pretty well.
    So if we can have it such that the NCSI driver only sets the MAC address
    _once_, and after that does not update it again, we should be able to call
    the OEM GMA command without hiding it behind a config option. So the first time
    a channel was configured we store and set the MAC address given, but then on
    later configure events we don't continue to update it. What do you think?
    
    Cheers,
    Sam

  I agree with setting it only once. I gave some thought to the config option and realized 
  that we should allow the user to configure it. If the user wants to set the MAC address 
  through the device tree and not through ROM, then we must not override the MAC address 
  set by the device tree. So my proposal is that setting the MAC address in the response 
  handler should be hidden behind the config option. Getting the MAC address can still 
  happen without the config option. Your thoughts?
    
    > 
    > > +#if IS_ENABLED(CONFIG_NCSI_OEM_CMD_GET_MAC)
    > > +
    > > +/* NCSI OEM Command APIs */
    > > +static void ncsi_oem_gma_handler_bcm(struct ncsi_cmd_arg *nca)
    > > +{
    > > +	unsigned char data[NCSI_OEM_BCM_CMD_GMA_LEN];
    > > +	int ret = 0;
    > > +
    > > +	nca->payload = NCSI_OEM_BCM_CMD_GMA_LEN;
    > > +
    > > +	memset(data, 0, NCSI_OEM_BCM_CMD_GMA_LEN);
    > > +	*(unsigned int *)data = ntohl(NCSI_OEM_MFR_BCM_ID);
    > > +	data[5] = NCSI_OEM_BCM_CMD_GMA;
    > > +
    > > +	nca->data = data;
    > > +
    > > +	ret = ncsi_xmit_cmd(nca);
    > > +	if (ret)
    > > +		netdev_err(nca->ndp->ndev.dev,
    > > +			   "NCSI: Failed to transmit cmd 0x%x during configure\n",
    > > +			   nca->type);
    > > +}
    > 
    > As a side note while unlikely we probably want to propagate the return
    > value of ncsi_xmit_cmd() from here; otherwise we'll miss a failure and
    > the configure process will stall.
    > 
    > Regards,
    > Sam
    > 
  I will take care of this.  
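
As a rough sketch of what propagating the return value could look like, here is a userspace model in plain C. All names (struct fake_cmd_arg, fake_xmit_cmd(), fake_gma_handler(), GMA_LEN) are hypothetical stand-ins and not the NCSI driver's actual definitions. The only point is that the handler returns int instead of void, so the caller can notice a failed transmit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define GMA_LEN 12	/* hypothetical payload length for this sketch */

/* Hypothetical, cut-down command argument; not the kernel's structure. */
struct fake_cmd_arg {
	unsigned short payload;
	unsigned char *data;
	int fail_xmit;		/* test knob: make the fake transmit fail */
};

/* Stand-in transmit routine: 0 on success, negative errno on failure. */
static int fake_xmit_cmd(struct fake_cmd_arg *nca)
{
	return nca->fail_xmit ? -ENOMEM : 0;
}

/* The handler returns int so the caller sees a transmit failure. */
static int fake_gma_handler(struct fake_cmd_arg *nca)
{
	unsigned char data[GMA_LEN];
	int ret;

	assert(nca != NULL);

	memset(data, 0, sizeof(data));
	nca->payload = GMA_LEN;
	nca->data = data;	/* only valid while the transmit call runs */

	ret = fake_xmit_cmd(nca);
	nca->data = NULL;	/* do not leave a pointer to the stack buffer */

	return ret;
}
```

A caller in the configure path could then check the result and abort or retry instead of waiting for a response that will never arrive.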
    
    



* Re: [PATCH net 0/2] geneve, vxlan: Don't set exceptions if skb->len < mtu
From: Xin Long @ 2018-10-15  9:40 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: davem, Sabrina Dubroca, network dev
In-Reply-To: <cover.1539381018.git.sbrivio@redhat.com>

On Sat, Oct 13, 2018 at 6:54 AM Stefano Brivio <sbrivio@redhat.com> wrote:
>
> This series fixes the exception abuse described in 2/2, and 1/2
> is just a preparatory change to make 2/2 less ugly.
>
> Stefano Brivio (2):
>   geneve, vxlan: Don't check skb_dst() twice
>   geneve, vxlan: Don't set exceptions if skb->len < mtu
>
>  drivers/net/geneve.c | 14 +++-----------
>  drivers/net/vxlan.c  | 12 ++----------
>  include/net/dst.h    | 10 ++++++++++
>  3 files changed, 15 insertions(+), 21 deletions(-)
>
> --
> 2.19.1
>
Series Reviewed-by: Xin Long <lucien.xin@gmail.com>


* Re: [PATCH net] ip6_tunnel: Don't update PMTU on tunnels with collect_md
From: Stefano Brivio @ 2018-10-15  9:09 UTC (permalink / raw)
  To: Nicolas Dichtel, David S. Miller; +Cc: Alexei Starovoitov, netdev
In-Reply-To: <eb26ebd1-19e3-e72a-2d90-3793889a635e@6wind.com>

On Mon, 15 Oct 2018 10:48:05 +0200
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> Le 12/10/2018 à 18:34, Stefano Brivio a écrit :
> > On Fri, 12 Oct 2018 17:58:55 +0200
> > Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:  
> [snip]
> >> Could you explain in your commit log which problem your patch fixes?  
> > 
> > Nothing really.
> > 
> > The change in f15ca723c1eb looked accidental and I thought it doesn't
> > make sense to update the PMTU in that case, but I didn't figure out
> > it's not actually done anyway.
> > 
> > Maybe it makes things a bit more readable, in that case I'd target it
> > for net-next. What do you think?
> >   
> I don't think that this patch helps. The purpose of the skb_dst_update_pmtu()
> helper is to hide those things. If one day, update_pmtu is defined for
> md_dst_op, I bet that we won't remove this test.

I see, makes sense.

David, please drop this patch, and sorry for the noise.

-- 
Stefano


* Re: [PATCH net] ip6_tunnel: Don't update PMTU on tunnels with collect_md
From: Nicolas Dichtel @ 2018-10-15  8:48 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: David S. Miller, Alexei Starovoitov, netdev
In-Reply-To: <20181012183438.59f4308e@epycfail>

Le 12/10/2018 à 18:34, Stefano Brivio a écrit :
> On Fri, 12 Oct 2018 17:58:55 +0200
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
[snip]
>> Could you explain in your commit log which problem your patch fixes?
> 
> Nothing really.
> 
> The change in f15ca723c1eb looked accidental and I thought it doesn't
> make sense to update the PMTU in that case, but I didn't figure out
> it's not actually done anyway.
> 
> Maybe it makes things a bit more readable, in that case I'd target it
> for net-next. What do you think?
> 
I don't think that this patch helps. The purpose of the skb_dst_update_pmtu()
helper is to hide those things. If one day, update_pmtu is defined for
md_dst_op, I bet that we won't remove this test.


Regards,
Nicolas
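
To make the point about the helper hiding those checks concrete, here is a small userspace model in plain C. It is only a sketch: the fake_* types and functions are hypothetical and merely mimic the shape of a destination entry with an optional update_pmtu operation. A metadata-style destination is modelled as one whose ops leave update_pmtu unset.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dst;

/* Hypothetical ops table; update_pmtu may be NULL for some dst types. */
struct fake_dst_ops {
	void (*update_pmtu)(struct fake_dst *dst, unsigned int mtu);
};

struct fake_dst {
	const struct fake_dst_ops *ops;
	unsigned int pmtu;
};

struct fake_skb {
	struct fake_dst *dst;	/* may be NULL: packet without a destination */
};

static void fake_route_update_pmtu(struct fake_dst *dst, unsigned int mtu)
{
	dst->pmtu = mtu;
}

/* A routed destination implements update_pmtu; a metadata one does not. */
static const struct fake_dst_ops fake_route_ops = {
	.update_pmtu = fake_route_update_pmtu,
};
static const struct fake_dst_ops fake_md_ops = {
	.update_pmtu = NULL,
};

/*
 * The helper hides both checks from callers: "is there a dst at all?"
 * and "does this dst type support PMTU updates?".
 */
static void fake_skb_dst_update_pmtu(struct fake_skb *skb, unsigned int mtu)
{
	struct fake_dst *dst = skb->dst;

	if (dst && dst->ops->update_pmtu)
		dst->ops->update_pmtu(dst, mtu);
}
```

In this model a caller never needs its own test for the metadata case. If update_pmtu were defined for that type one day, callers would pick it up without any change.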


* Re: [PATCH] net: bridge: fix a memory leak in __vlan_add
From: Li RongQing @ 2018-10-15  8:31 UTC (permalink / raw)
  To: nikolay; +Cc: Li RongQing, netdev, bridge, roopa
In-Reply-To: <2badfb2b-0a50-4d18-cdb4-d894b4ef7bec@cumulusnetworks.com>

> >
>
> Hi,
> Good catch, but the patch doesn't fix the bug entirely. The problem is that masterv can be
> created just for this vlan and the br_vlan_put_master() above can free it, so we can
> check a pointer that's not really up-to-date (and thus again leak memory).
> You should move the new code above the br_vlan_put_master() call.
>
> Also please tag the proper branch, this is for net-next, and CC all bridge
> maintainers (added Roopa).
>

Ok, thanks, I will send v2

-RongQing


> Thank you,
>  Nik


* Re: [PATCH net 2/2] geneve, vxlan: Don't set exceptions if skb->len < mtu
From: Stefano Brivio @ 2018-10-15  8:27 UTC (permalink / raw)
  To: Xin Long; +Cc: davem, Sabrina Dubroca, network dev
In-Reply-To: <CADvbK_fVyZA-MzmESYOQmp_pes+X61iftnYtNNU4Y_uqSg2LhQ@mail.gmail.com>

On Mon, 15 Oct 2018 15:01:31 +0900
Xin Long <lucien.xin@gmail.com> wrote:

> On Sat, Oct 13, 2018 at 6:54 AM Stefano Brivio <sbrivio@redhat.com> wrote:
> >
> > We shouldn't abuse exceptions: if the destination MTU is already higher
> > than what we're transmitting, no exception should be created.  
> makes sense, shouldn't ip(6) tunnels also do this?

I should probably have mentioned this in the cover letter: in theory
yes, but I'm doing this as preparation for ICMP handling in UDP
tunnels, and those will get selftests soon (once I'm done).

Writing extensive selftests for IP tunnels will take significantly
longer, so I'm not confident enough to change this right now. I'd prefer
to address that at a later time.

-- 
Stefano
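
For readers following along, the rule "no exception if the packet already fits" can be sketched in plain C as below. This is a userspace illustration under stated assumptions, not the code from the series: needs_pmtu_exception() is a hypothetical name, and whether and how the tunnel headroom enters the comparison is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a PMTU exception should be recorded for a packet.
 *   pkt_len:   length of the packet about to be encapsulated
 *   encap_mtu: MTU of the route used for the encapsulated packet
 *   headroom:  bytes added by the tunnel headers (assumed parameter)
 * The sum is computed in a wider type so it cannot wrap around.
 */
static bool needs_pmtu_exception(unsigned int pkt_len,
				 unsigned int encap_mtu,
				 unsigned int headroom)
{
	return (unsigned long long)pkt_len + headroom > encap_mtu;
}
```

For example, with an encapsulation MTU of 1500 and 50 bytes of headroom, a 1450-byte packet fits and creates no exception, while a 1451-byte packet does not fit.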
