* Re: net/tcp: warning in tcp_try_coalesce/skb_try_coalesce
From: Eric Dumazet @ 2017-04-26 14:16 UTC (permalink / raw)
To: Andrey Konovalov
Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
Patrick McHardy, netdev, LKML, Dmitry Vyukov, Kostya Serebryany,
syzkaller
In-Reply-To: <CAAeHK+yQJUuuv2XM6tQuaE12Zo1htujz6A7Qd_h+E9qCVEwDSg@mail.gmail.com>
On Wed, Apr 26, 2017 at 5:08 AM, Andrey Konovalov <andreyknvl@google.com> wrote:
> On Tue, Jan 31, 2017 at 2:17 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer.
>>
>> On commit 566cf877a1fcb6d6dc0126b076aad062054c2637 (4.10-rc6).
>>
>> The fuzzer hits this issue quite often, but I don't have a working reproducer.
>
> I still see this on 5a7ad1146caa895ad718a534399e38bd2ba721b7 (4.11-rc8).
>
> I also have a reproducer now, attached.
>
> It takes around 10 seconds for the warning to trigger.
>
It does not trigger for me, but seeing that you use SO_ATTACH_FILTER,
I have a pretty good idea how to remove this splat.
A call to skb_condense() from ___pskb_trim() will likely help, I will
post a patch after my commute.
Thanks.
^ permalink raw reply
* Re: [PATCH] IB/IPoIB: Check the headroom size
From: Honggang LI @ 2017-04-26 14:11 UTC (permalink / raw)
To: Doug Ledford
Cc: Paolo Abeni, Or Gerlitz, Erez Shitrit, Erez Shitrit,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linux Netdev List, David Miller
In-Reply-To: <1493214638.3041.110.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Wed, Apr 26, 2017 at 09:50:38AM -0400, Doug Ledford wrote:
> On Wed, 2017-04-26 at 09:48 -0400, Doug Ledford wrote:
> > On Wed, 2017-04-26 at 21:33 +0800, Honggang LI wrote:
> > >
> > > Yes, it is during the process of removing the final slave. The
> > > reproducer looks like this:
> > >
> > > ping remote_ip_over_bonding_interface &
> > > while 1; do
> > > ifdown bond0
> > > ifup bond0
> > > done
>
> BTW, rerunning your test as:
>
> ping remote_ip_over_bonding_interface &
> while 1; do
> echo -n "Downing interface..."
> ifdown bond0
Confirmed panic in here.
> echo "done."
> sleep 1
> echo -n "Bringing interface up..."
> ifup bond0
> echo "done."
> done
>
> will allow you to more precisely identify whether the failure is during
> the down or the up of the interface.
>
> --
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> GPG KeyID: B826A3330E572FDD
>
> Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net v3 2/3] net: hns: support deferred probe when no mdio
From: Matthias Brugger @ 2017-04-26 14:07 UTC (permalink / raw)
To: Yankejian, davem, salil.mehta, yisen.zhuang, lipeng321, zhouhuiru,
huangdaode
Cc: netdev, linuxarm
In-Reply-To: <1493190022-91343-3-git-send-email-yankejian@huawei.com>
On 26/04/17 09:00, Yankejian wrote:
> From: lipeng <lipeng321@huawei.com>
>
> In the hip06 and hip07 SoCs, phy connect to mdio bus.The mdio
> module is probed with module_init, and, as such,
> is not guaranteed to probe before the HNS driver. So we need
> to support deferred probe.
>
> We check for probe deferral in the mac init, so we not init DSAF
> when there is no mdio, and free all resource, to later learn that
> we need to defer the probe.
>
> Signed-off-by: lipeng <lipeng321@huawei.com>
> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
> ---
> drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 39 +++++++++++++++++++----
> 1 file changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
> index bdd8cdd..8c00e09 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
> @@ -735,6 +735,8 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb)
> rc = phy_device_register(phy);
> if (rc) {
> phy_device_free(phy);
> + dev_err(&mdio->dev, "registered phy fail at address %i\n",
> + addr);
> return -ENODEV;
> }
>
> @@ -745,7 +747,7 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb)
> return 0;
> }
>
> -static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
> +static int hns_mac_register_phy(struct hns_mac_cb *mac_cb)
> {
> struct acpi_reference_args args;
> struct platform_device *pdev;
> @@ -755,24 +757,39 @@ static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
>
> /* Loop over the child nodes and register a phy_device for each one */
> if (!to_acpi_device_node(mac_cb->fw_port))
> - return;
> + return -ENODEV;
>
> rc = acpi_node_get_property_reference(
> mac_cb->fw_port, "mdio-node", 0, &args);
> if (rc)
> - return;
> + return rc;
>
> addr = hns_mac_phy_parse_addr(mac_cb->dev, mac_cb->fw_port);
> if (addr < 0)
> - return;
> + return addr;
>
> /* dev address in adev */
> pdev = hns_dsaf_find_platform_device(acpi_fwnode_handle(args.adev));
> + if (!pdev) {
> + dev_err(mac_cb->dev, "mac%d mdio pdev is NULL\n",
> + mac_cb->mac_id);
> + return -EINVAL;
> + }
> +
> mii_bus = platform_get_drvdata(pdev);
> + if (!mii_bus) {
> + dev_err(mac_cb->dev,
> + "mac%d mdio is NULL, dsaf will probe again later\n",
> + mac_cb->mac_id);
> + return -EPROBE_DEFER;
> + }
> +
> rc = hns_mac_register_phydev(mii_bus, mac_cb, addr);
> if (!rc)
> dev_dbg(mac_cb->dev, "mac%d register phy addr:%d\n",
> mac_cb->mac_id, addr);
> +
> + return rc;
> }
>
> #define MAC_MEDIA_TYPE_MAX_LEN 16
> @@ -793,7 +810,7 @@ static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
> *@np:device node
> * return: 0 --success, negative --fail
> */
> -static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
> +static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
> {
> struct device_node *np;
> struct regmap *syscon;
> @@ -903,7 +920,15 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
> }
> }
> } else if (is_acpi_node(mac_cb->fw_port)) {
> - hns_mac_register_phy(mac_cb);
> + ret = hns_mac_register_phy(mac_cb);
> + /*
> + * Mac can work well if there is phy or not.If the port don't
> + * connect with phy, the return value will be ignored. Only
> + * when there is phy but can't find mdio bus, the return value
> + * will be handled.
> + */
> + if (ret == -EPROBE_DEFER)
> + return ret;
> } else {
> dev_err(mac_cb->dev, "mac%d cannot find phy node\n",
> mac_cb->mac_id);
> @@ -1065,6 +1090,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev)
> dsaf_dev->mac_cb[port_id] = mac_cb;
> }
> }
> +
> /* init mac_cb for all port */
> for (port_id = 0; port_id < max_port_num; port_id++) {
> mac_cb = dsaf_dev->mac_cb[port_id];
> @@ -1074,6 +1100,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev)
> ret = hns_mac_get_cfg(dsaf_dev, mac_cb);
> if (ret)
> return ret;
> +
> ret = hns_mac_init_ex(mac_cb);
> if (ret)
> return ret;
>
^ permalink raw reply
* [PATCH net-next] fib_rules: fix error return code
From: Wei Yongjun @ 2017-04-26 14:03 UTC (permalink / raw)
To: Lorenzo Colitti, Ido Schimmel, David Ahern, Mateusz Bajorski,
Johannes Berg
Cc: Wei Yongjun, netdev
From: Wei Yongjun <weiyongjun1@huawei.com>
Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 622ec2c9d524 ("net: core: add UID to flows, rules, and routes")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
net/core/fib_rules.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index c58c1df..f21c4d3 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -440,6 +440,7 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[FRA_TUN_ID])
rule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
+ err = -EINVAL;
if (tb[FRA_L3MDEV]) {
#ifdef CONFIG_NET_L3_MASTER_DEV
rule->l3mdev = nla_get_u8(tb[FRA_L3MDEV]);
@@ -461,7 +462,6 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh,
else
rule->suppress_ifgroup = -1;
- err = -EINVAL;
if (tb[FRA_GOTO]) {
if (rule->action != FR_ACT_GOTO)
goto errout_free;
@@ -592,8 +592,10 @@ int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[FRA_UID_RANGE]) {
range = nla_get_kuid_range(tb);
- if (!uid_range_set(&range))
+ if (!uid_range_set(&range)) {
+ err = -EINVAL;
goto errout;
+ }
} else {
range = fib_kuid_range_unset;
}
^ permalink raw reply related
* Re: [PATCH net v3 1/3] net: hns: support deferred probe when can not obtain irq
From: Matthias Brugger @ 2017-04-26 14:03 UTC (permalink / raw)
To: Yankejian, davem, salil.mehta, yisen.zhuang, lipeng321, zhouhuiru,
huangdaode
Cc: netdev, linuxarm
In-Reply-To: <1493190022-91343-2-git-send-email-yankejian@huawei.com>
On 26/04/17 09:00, Yankejian wrote:
> From: lipeng <lipeng321@huawei.com>
>
> In the hip06 and hip07 SoCs, the interrupt lines from the
> DSAF controllers are connected to mbigen hw module.
> The mbigen module is probed with module_init, and, as such,
> is not guaranteed to probe before the HNS driver. So we need
> to support deferred probe.
>
> We check for probe deferral in the hw layer probe, so we not
> probe into the main layer and memories, etc., to later learn
> that we need to defer the probe.
This paragraph does not hold any more and should be deleted.
Other then this:
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
>
> Signed-off-by: lipeng <lipeng321@huawei.com>
> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
> ---
> V2 -> V3:
> 1. Check return value when platform_get_irq in hns_rcb_get_cfg;
> ---
> drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 +++-
> drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 +++++++-
> drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +-
> 3 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
> index 6ea8722..a41cf95 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
> @@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)
>
> hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);
>
> - hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
> + ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
> + if (ret)
> + goto get_rcb_cfg_fail;
> }
>
> for (i = 0; i < HNS_PPE_COM_NUM; i++)
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
> index f0ed80d6..673a5d3 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
> @@ -452,7 +452,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb *rcb_common)
> *hns_rcb_get_cfg - get rcb config
> *@rcb_common: rcb common device
> */
> -void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
> +int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
> {
> struct ring_pair_cb *ring_pair_cb;
> u32 i;
> @@ -477,10 +477,16 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
> ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
> is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
> platform_get_irq(pdev, base_irq_idx + i * 3);
> + if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) ||
> + (ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER))
> + return -EPROBE_DEFER;
> +
> ring_pair_cb->q.phy_base =
> RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
> hns_rcb_ring_pair_get_cfg(ring_pair_cb);
> }
> +
> + return 0;
> }
>
> /**
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
> index 99b4e1b..3d7b484 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
> @@ -110,7 +110,7 @@ struct rcb_common_cb {
> void hns_rcb_common_free_cfg(struct dsaf_device *dsaf_dev, u32 comm_index);
> int hns_rcb_common_init_hw(struct rcb_common_cb *rcb_common);
> void hns_rcb_start(struct hnae_queue *q, u32 val);
> -void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
> +int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
> void hns_rcb_get_queue_mode(enum dsaf_mode dsaf_mode,
> u16 *max_vfn, u16 *max_q_per_vf);
>
>
^ permalink raw reply
* Re: net/ipv6: use-after-free in __call_rcu/in6_dev_finish_destroy_rcu
From: Paul E. McKenney @ 2017-04-26 13:59 UTC (permalink / raw)
To: Andrey Konovalov
Cc: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Josh Triplett,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Eric Dumazet,
Cong Wang, Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <CAAeHK+y=YqG0Kb2chQ=_2E8aPttMetQjZPCxFibHKCfktQS2PQ@mail.gmail.com>
On Wed, Apr 26, 2017 at 02:34:15PM +0200, Andrey Konovalov wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit 5a7ad1146caa895ad718a534399e38bd2ba721b7 (4.11-rc8).
>
> Unfortunately it's not reproducible.
>
> I'm not sure whether is is an issue with rcu or ipv6.
>
> ==================================================================
> BUG: KASAN: use-after-free in __call_rcu.constprop.77+0x13be/0x1640
Does building with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y show any splats?
(Yeah, kind of a stupid question if it is not reproducible, but had
to ask!)
Thanx, Paul
> kernel/rcu/tree.c:3269 at addr ffff88003b842280
> Write of size 8 by task kworker/u10:1/180
> CPU: 2 PID: 180 Comm: kworker/u10:1 Not tainted 4.11.0-rc8+ #270
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: events_unbound call_usermodehelper_exec_work
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x192/0x22d lib/dump_stack.c:52
> kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
> print_address_description mm/kasan/report.c:202 [inline]
> kasan_report_error mm/kasan/report.c:291 [inline]
> kasan_report+0x252/0x510 mm/kasan/report.c:347
> __asan_report_store8_noabort+0x17/0x20 mm/kasan/report.c:373
> __call_rcu.constprop.77+0x13be/0x1640 kernel/rcu/tree.c:3269
> call_rcu_sched+0x12/0x20 kernel/rcu/tree.c:3288
> free_pid+0x446/0x5d0 kernel/pid.c:293
> __change_pid+0x2a1/0x3d0 kernel/pid.c:411
> detach_pid+0x1f/0x30 kernel/pid.c:416
> __unhash_process kernel/exit.c:74 [inline]
> __exit_signal kernel/exit.c:155 [inline]
> release_task+0xbb0/0x1d90 kernel/exit.c:199
> wait_task_zombie kernel/exit.c:1230 [inline]
> wait_consider_task+0x11fe/0x3410 kernel/exit.c:1458
> do_wait_thread kernel/exit.c:1521 [inline]
> do_wait+0x3ea/0x8e0 kernel/exit.c:1592
> SYSC_wait4 kernel/exit.c:1720 [inline]
> SyS_wait4+0x208/0x340 kernel/exit.c:1689
> call_usermodehelper_exec_sync kernel/kmod.c:292 [inline]
> call_usermodehelper_exec_work+0x1a7/0x2b0 kernel/kmod.c:329
> process_one_work+0x9f7/0x1580 kernel/workqueue.c:2097
> worker_thread+0x1df/0x14b0 kernel/workqueue.c:2231
> kthread+0x31f/0x3f0 kernel/kthread.c:231
> ret_from_fork+0x2c/0x40 arch/x86/entry/entry_64.S:430
> Object at ffff88003b842018, in cache kmalloc-1024 size: 1024
> Allocated:
> PID = 1
> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> save_stack+0x43/0xd0 mm/kasan/kasan.c:513
> set_track mm/kasan/kasan.c:525 [inline]
> kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
> kmem_cache_alloc_trace+0x61/0x170 mm/slub.c:2745
> kmalloc include/linux/slab.h:490 [inline]
> kzalloc include/linux/slab.h:663 [inline]
> ipv6_add_dev+0x199/0x1380 net/ipv6/addrconf.c:380
> addrconf_init+0xd0/0x29a net/ipv6/addrconf.c:6405
> inet6_init+0x2f6/0x584 net/ipv6/af_inet6.c:962
> do_one_initcall+0xf3/0x380 init/main.c:792
> do_initcall_level init/main.c:858 [inline]
> do_initcalls init/main.c:866 [inline]
> do_basic_setup init/main.c:884 [inline]
> kernel_init_freeable+0x54d/0x622 init/main.c:1035
> kernel_init+0x13/0x180 init/main.c:959
> ret_from_fork+0x2c/0x40 arch/x86/entry/entry_64.S:430
> Freed:
> PID = 6479
> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> save_stack+0x43/0xd0 mm/kasan/kasan.c:513
> set_track mm/kasan/kasan.c:525 [inline]
> kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
> slab_free_hook mm/slub.c:1357 [inline]
> slab_free_freelist_hook mm/slub.c:1379 [inline]
> slab_free mm/slub.c:2961 [inline]
> kfree+0x91/0x190 mm/slub.c:3882
> in6_dev_finish_destroy_rcu+0x97/0xc0 net/ipv6/addrconf_core.c:150
> __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
> rcu_do_batch.isra.65+0x6de/0xbd0 kernel/rcu/tree.c:2879
> invoke_rcu_callbacks kernel/rcu/tree.c:3142 [inline]
> __rcu_process_callbacks kernel/rcu/tree.c:3109 [inline]
> rcu_process_callbacks+0x23f/0x810 kernel/rcu/tree.c:3126
> __do_softirq+0x253/0x78b kernel/softirq.c:284
> Memory state around the buggy address:
> ffff88003b842180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88003b842200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >ffff88003b842280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ^
> ffff88003b842300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88003b842380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
>
^ permalink raw reply
* Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jiri Pirko @ 2017-04-26 13:56 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: davem, xiyou.wangcong, eric.dumazet, netdev, Simon Horman,
Benjamin LaHaise
In-Reply-To: <10fe2c22-8e76-543e-dd24-ddce5813ab69@mojatatu.com>
Wed, Apr 26, 2017 at 03:14:38PM CEST, jhs@mojatatu.com wrote:
>On 17-04-26 08:08 AM, Jiri Pirko wrote:
>> Wed, Apr 26, 2017 at 01:48:29PM CEST, jhs@mojatatu.com wrote:
>> > On 17-04-26 02:19 AM, Jiri Pirko wrote:
>> > > Tue, Apr 25, 2017 at 10:29:40PM CEST, jhs@mojatatu.com wrote:
>> > > > On 17-04-25 12:04 PM, Jiri Pirko wrote:
>
>> > I have experience with dealing with a massive amount of various dumps
>> > and (batch) sets and it always boils down to one thing:
>> > _how much data is exchanged between user and kernel_
>> > 3 flags encoded as bits in a u32 attribute cost 64 bits.
>> > Encoded separately cost 3x that.
>> >
>> > Believe me, it _does make a difference_ in performance.
>> >
>> > My least favorite subsystem is bridge. The bridge code has
>> > tons of flags in those entries that are sent to/from kernel as u8
>> > attributes. It is painful.
>> >
>> > For something more recent, lets look at this commit from Ben on Flower:
>> > + TCA_FLOWER_KEY_MPLS_TTL, /* u8 - 8 bits */
>> > + TCA_FLOWER_KEY_MPLS_BOS, /* u8 - 1 bit */
>> > + TCA_FLOWER_KEY_MPLS_TC, /* u8 - 3 bits */
>> > + TCA_FLOWER_KEY_MPLS_LABEL, /* be32 - 20 bits */
>> >
>> > Yes, that looks pretty, but:
>> > That would have fit in one attribute with a u32. Mask attributes would
>> > be eliminated with a second 32 bit - all in the same singular
>> > attribute.
>> >
>> > Imagine if i have 1M flower entries. If you add up the mask the cost
>> > of these things is about 3*2*64 bits more per entry compared to putting
>> > the mpls info/mask in one attribute.
>> > At 1M entries that is a few MBs of data being exchanged.
>>
>> I can do the math :) Yet still, I would like to see the numbers :)
>> Because I believe that is the only way to end this lenghty converstation
>> once and forever...
>>
>
>Jiri, what are you arguing about if you have done the math? ;->
I can do 3*2*64. What I cannot do is to figure out the real performance
impact.
>You want me to show you that getting or setting less data is good for
>performance?
>Look at the third patch: Why do i think it is necessary to send only
>actions that have changed? Precisely to reduce the amount of data
>being transported. The second patch - to reduce the amount of crossing
>user space to kernel space (which is going to happen more with increased
>data I have to transport between the user and the kernel).
>
>Again: You are looking at this from a manageability point of view which
>is useful but not the only input into a design. If i can squeeze more
>data without killing usability - I am all for it. It just doesnt
>compute that it is ok to use a flag per attribute because it looks
>beautiful.
Hmm. Now that I'm thinking about it, why don't we have NLA_FLAGS with
couple of helpers around it? It will be obvious what the attr is, all
kernel code would use the same helpers. Would be nice.
^ permalink raw reply
* [PATCH v3 net-next]smsc911x: Adding support for Micochip LAN9250 Ethernet controller
From: David.Cai @ 2017-04-26 13:55 UTC (permalink / raw)
To: netdev, davem; +Cc: UNGLinuxDriver, steve.glendinning
From: David Cai <david.cai@microchip.com>
Adding support for Microchip LAN9250 Ethernet controller.
Signed-off-by: David Cai <david.cai@microchip.com>
---
Changes
V2
- email format changed
- remove unnecessary text in commit log Changes
V3
- defined all supported Ethernet controller chip ID.
- removed pdata->sub_generation = 0;
- changed 'if (pdata->sub_generation)' to
if (pdata>generation == 4 && pdata->sub_generation)
drivers/net/ethernet/smsc/smsc911x.c | 55 ++++++++++++++++++++++++------------
drivers/net/ethernet/smsc/smsc911x.h | 19 +++++++++++++
2 files changed, 56 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index fa5ca09..e35fe96 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -25,7 +25,7 @@
* LAN9215, LAN9216, LAN9217, LAN9218
* LAN9210, LAN9211
* LAN9220, LAN9221
- * LAN89218
+ * LAN89218,LAN9250
*
*/
@@ -104,6 +104,9 @@ struct smsc911x_data {
/* used to decide which workarounds apply */
unsigned int generation;
+ /* used to decide which sub generation product work arounds to apply */
+ unsigned int sub_generation;
+
/* device configuration (copied from platform_data during probe) */
struct smsc911x_platform_config config;
@@ -1450,6 +1453,8 @@ static int smsc911x_soft_reset(struct smsc911x_data *pdata)
unsigned int timeout;
unsigned int temp;
int ret;
+ unsigned int reset_offset = HW_CFG;
+ unsigned int reset_mask = HW_CFG_SRST_;
/*
* Make sure to power-up the PHY chip before doing a reset, otherwise @@ -1476,15 +1481,23 @@ static int smsc911x_soft_reset(struct smsc911x_data *pdata)
}
}
+ if (pdata->generation == 4 && pdata->sub_generation) {
+ /* special reset for LAN9250 */
+ reset_offset = RESET_CTL;
+ reset_mask = RESET_CTL_DIGITAL_RST_;
+ }
+
/* Reset the LAN911x */
- smsc911x_reg_write(pdata, HW_CFG, HW_CFG_SRST_);
+ smsc911x_reg_write(pdata, reset_offset, reset_mask);
+
+ /* verify reset bit is cleared */
timeout = 10;
do {
udelay(10);
- temp = smsc911x_reg_read(pdata, HW_CFG);
- } while ((--timeout) && (temp & HW_CFG_SRST_));
+ temp = smsc911x_reg_read(pdata, reset_offset);
+ } while ((--timeout) && (temp & reset_mask));
- if (unlikely(temp & HW_CFG_SRST_)) {
+ if (unlikely(temp & reset_mask)) {
SMSC_WARN(pdata, drv, "Failed to complete reset");
return -EIO;
}
@@ -2253,31 +2266,37 @@ static int smsc911x_init(struct net_device *dev)
pdata->idrev = smsc911x_reg_read(pdata, ID_REV);
switch (pdata->idrev & 0xFFFF0000) {
- case 0x01180000:
- case 0x01170000:
- case 0x01160000:
- case 0x01150000:
- case 0x218A0000:
+ case LAN9118:
+ case LAN9117:
+ case LAN9116:
+ case LAN9115:
+ case LAN89218:
/* LAN911[5678] family */
pdata->generation = pdata->idrev & 0x0000FFFF;
break;
- case 0x118A0000:
- case 0x117A0000:
- case 0x116A0000:
- case 0x115A0000:
+ case LAN9218:
+ case LAN9217:
+ case LAN9216:
+ case LAN9215:
/* LAN921[5678] family */
pdata->generation = 3;
break;
- case 0x92100000:
- case 0x92110000:
- case 0x92200000:
- case 0x92210000:
+ case LAN9210:
+ case LAN9211:
+ case LAN9220:
+ case LAN9221:
/* LAN9210/LAN9211/LAN9220/LAN9221 */
pdata->generation = 4;
break;
+ case LAN9250:
+ /* LAN9250 */
+ pdata->generation = 4;
+ pdata->sub_generation = 1;
+ break;
+
default:
SMSC_WARN(pdata, probe, "LAN911x not identified, idrev: 0x%08X",
pdata->idrev);
diff --git a/drivers/net/ethernet/smsc/smsc911x.h b/drivers/net/ethernet/smsc/smsc911x.h
index 54d6489..8d75508 100644
--- a/drivers/net/ethernet/smsc/smsc911x.h
+++ b/drivers/net/ethernet/smsc/smsc911x.h
@@ -20,6 +20,22 @@
#ifndef __SMSC911X_H__
#define __SMSC911X_H__
+/*Chip ID*/
+#define LAN9115 0x01150000
+#define LAN9116 0x01160000
+#define LAN9117 0x01170000
+#define LAN9118 0x01180000
+#define LAN9215 0x115A0000
+#define LAN9216 0x116A0000
+#define LAN9217 0x117A0000
+#define LAN9218 0x118A0000
+#define LAN9210 0x92100000
+#define LAN9211 0x92110000
+#define LAN9220 0x92200000
+#define LAN9221 0x92210000
+#define LAN9250 0x92500000
+#define LAN89218 0x218A0000
+
#define TX_FIFO_LOW_THRESHOLD ((u32)1600)
#define SMSC911X_EEPROM_SIZE ((u32)128)
#define USE_DEBUG 0
@@ -303,6 +319,9 @@
#define E2P_DATA_EEPROM_DATA_ 0x000000FF
#define LAN_REGISTER_EXTENT 0x00000100
+#define RESET_CTL 0x1F8
+#define RESET_CTL_DIGITAL_RST_ 0x00000001
+
/*
* MAC Control and Status Register (Indirect Address)
* Offset (through the MAC_CSR CMD and DATA port)
^ permalink raw reply related
* Re: [PATCH] IB/IPoIB: Check the headroom size
From: Doug Ledford @ 2017-04-26 13:50 UTC (permalink / raw)
To: Honggang LI
Cc: Paolo Abeni, Or Gerlitz, Erez Shitrit, Erez Shitrit,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linux Netdev List, David Miller
In-Reply-To: <1493214483.3041.108.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Wed, 2017-04-26 at 09:48 -0400, Doug Ledford wrote:
> On Wed, 2017-04-26 at 21:33 +0800, Honggang LI wrote:
> >
> > Yes, it is during the process of removing the final slave. The
> > reproducer looks like this:
> >
> > ping remote_ip_over_bonding_interface &
> > while 1; do
> > ifdown bond0
> > ifup bond0
> > done
BTW, rerunning your test as:
ping remote_ip_over_bonding_interface &
while 1; do
echo -n "Downing interface..."
ifdown bond0
echo "done."
sleep 1
echo -n "Bringing interface up..."
ifup bond0
echo "done."
done
will allow you to more precisely identify whether the failure is during
the down or the up of the interface.
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next v9 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jiri Pirko @ 2017-04-26 13:48 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: davem, xiyou.wangcong, eric.dumazet, simon.horman, netdev
In-Reply-To: <fc75c67b-2413-a9a9-2411-7789eef43203@mojatatu.com>
Wed, Apr 26, 2017 at 03:19:54PM CEST, jhs@mojatatu.com wrote:
>On 17-04-26 09:17 AM, Jamal Hadi Salim wrote:
>> On 17-04-26 09:08 AM, Jiri Pirko wrote:
>> > Wed, Apr 26, 2017 at 02:42:17PM CEST, jhs@mojatatu.com wrote:
>>
>> > > +#define VALID_TCA_ROOT_FLAGS TCA_FLAG_LARGE_DUMP_ON
>> > > +static inline bool tca_flags_valid(u32 act_flags)
>> > > +{
>> > > + u32 invalid_flags_mask = ~VALID_TCA_ROOT_FLAGS;
>> > > +
>> > > + if (act_flags & invalid_flags_mask)
>> > > + return false;
>> > > +
>> > > + return true;
>> >
>> > This dance should either not be here (flag-per-attr) or should be in
>> > netlink generic place. This is not TC specific at all.
>> >
>>
>> So where do you think it should be?
>
>It would also be helpful for you to make comments when these things
>show up. This change was in version 6. I have had to do this back and
>forth a few times.
I still think that this whole thing is wrong, so that is why. Version 6
it is just because you are pushing versions too fast without actually
reaching consensus (I know why you are doing this, you really want to
push this through. I think it is not good).
^ permalink raw reply
* [PATCH v2 net-next] bridge: add per-port broadcast flood flag
From: Mike Manning @ 2017-04-26 13:48 UTC (permalink / raw)
To: netdev; +Cc: Nikolay Aleksandrov
Support for l2 multicast flood control was added in commit b6cb5ac8331b
("net: bridge: add per-port multicast flood flag"). It allows broadcast
as it was introduced specifically for unknown multicast flood control.
But as broadcast is a special case of multicast, this may also need to
be disabled. For this purpose, introduce a flag to disable the flooding
of received l2 broadcasts. This approach is backwards compatible and
provides flexibility in filtering for the desired packet types.
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Mike Manning <mmanning@brocade.com>
---
include/linux/if_bridge.h | 1 +
include/uapi/linux/if_link.h | 1 +
net/bridge/br_forward.c | 24 +++++++++++++++++-------
net/bridge/br_if.c | 2 +-
net/bridge/br_netlink.c | 3 +++
net/bridge/br_sysfs_if.c | 2 ++
6 files changed, 25 insertions(+), 8 deletions(-)
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index c5847dc..0c16866 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -48,6 +48,7 @@ struct br_ip_list {
#define BR_MCAST_FLOOD BIT(11)
#define BR_MULTICAST_TO_UNICAST BIT(12)
#define BR_VLAN_TUNNEL BIT(13)
+#define BR_BCAST_FLOOD BIT(14)
#define BR_DEFAULT_AGEING_TIME (300 * HZ)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 633aa02..8e56ac7 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -323,6 +323,7 @@ enum {
IFLA_BRPORT_MCAST_FLOOD,
IFLA_BRPORT_MCAST_TO_UCAST,
IFLA_BRPORT_VLAN_TUNNEL,
+ IFLA_BRPORT_BCAST_FLOOD,
__IFLA_BRPORT_MAX
};
#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 902af6b..48fb174 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -183,13 +183,23 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
struct net_bridge_port *p;
list_for_each_entry_rcu(p, &br->port_list, list) {
- /* Do not flood unicast traffic to ports that turn it off */
- if (pkt_type == BR_PKT_UNICAST && !(p->flags & BR_FLOOD))
- continue;
- /* Do not flood if mc off, except for traffic we originate */
- if (pkt_type == BR_PKT_MULTICAST &&
- !(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
- continue;
+ /* Do not flood unicast traffic to ports that turn it off, nor
+ * other traffic if flood off, except for traffic we originate
+ */
+ switch (pkt_type) {
+ case BR_PKT_UNICAST:
+ if (!(p->flags & BR_FLOOD))
+ continue;
+ break;
+ case BR_PKT_MULTICAST:
+ if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
+ continue;
+ break;
+ case BR_PKT_BROADCAST:
+ if (!(p->flags & BR_BCAST_FLOOD) && skb->dev != br->dev)
+ continue;
+ break;
+ }
/* Do not flood to ports that enable proxy ARP */
if (p->flags & BR_PROXYARP)
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 6d273ca..b436ea0 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -362,7 +362,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
p->path_cost = port_cost(dev);
p->priority = 0x8000 >> BR_PORT_BITS;
p->port_no = index;
- p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD;
+ p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
br_init_port(p);
br_set_state(p, BR_STATE_DISABLED);
br_stp_port_timer_init(p);
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 6509864..a572db71 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -189,6 +189,8 @@ static int br_port_fill_attrs(struct sk_buff *skb,
!!(p->flags & BR_FLOOD)) ||
nla_put_u8(skb, IFLA_BRPORT_MCAST_FLOOD,
!!(p->flags & BR_MCAST_FLOOD)) ||
+ nla_put_u8(skb, IFLA_BRPORT_BCAST_FLOOD,
+ !!(p->flags & BR_BCAST_FLOOD)) ||
nla_put_u8(skb, IFLA_BRPORT_PROXYARP, !!(p->flags & BR_PROXYARP)) ||
nla_put_u8(skb, IFLA_BRPORT_PROXYARP_WIFI,
!!(p->flags & BR_PROXYARP_WIFI)) ||
@@ -683,6 +685,7 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD);
br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_TO_UCAST, BR_MULTICAST_TO_UNICAST);
+ br_set_port_flag(p, tb, IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD);
br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 79aee75..5d5d413a 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -173,6 +173,7 @@ BRPORT_ATTR_FLAG(unicast_flood, BR_FLOOD);
BRPORT_ATTR_FLAG(proxyarp, BR_PROXYARP);
BRPORT_ATTR_FLAG(proxyarp_wifi, BR_PROXYARP_WIFI);
BRPORT_ATTR_FLAG(multicast_flood, BR_MCAST_FLOOD);
+BRPORT_ATTR_FLAG(broadcast_flood, BR_BCAST_FLOOD);
#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
@@ -221,6 +222,7 @@ static const struct brport_attribute *brport_attrs[] = {
&brport_attr_proxyarp,
&brport_attr_proxyarp_wifi,
&brport_attr_multicast_flood,
+ &brport_attr_broadcast_flood,
NULL
};
--
2.1.4
^ permalink raw reply related
* Re: [PATCH] IB/IPoIB: Check the headroom size
From: Doug Ledford @ 2017-04-26 13:48 UTC (permalink / raw)
To: Honggang LI
Cc: Paolo Abeni, Or Gerlitz, Erez Shitrit, Erez Shitrit,
linux-rdma@vger.kernel.org, Linux Netdev List, David Miller
In-Reply-To: <20170426133353.GA7898@dhcp-13-42.nay.redhat.com>
On Wed, 2017-04-26 at 21:33 +0800, Honggang LI wrote:
> Yes, it is during the process of removing the final slave. The
> reproducer looks like this:
>
> ping remote_ip_over_bonding_interface &
> while 1; do
> ifdown bond0
> ifup bond0
> done
Honestly, I would suspect the problem here is not when removing the
busy interface, but when bringing the interface back up. IIRC, the
bonding driver defaults to assuming it will be used on an Ethernet
interface. Once you attach an IB slave, it reconfigures itself for
running over IB instead. But once it's configured it should stay
configured until after the last IB slave is removed (and once that
slave it removed, the bond should no longer even possess the pointer to
the ipoib_hard_header routine, so we should never call it).
The process, in the bonding driver, for going down and coming back up
needs to look something like this:
ifdown bond0:
stop all queues
remove all slaves
possibly reconfigure to default state which is Ethernet suitable
ifup bond0:
add first slave
configure for IB instead of Ethernet
start queues
add additional slaves
I'm wondering if, when we have a current backlog, we aren't
accidentally doing this instead:
ifup bond0:
add first slave
release backlog queue
configure for IB instead of Ethernet
add additional slaves
Or, it might even be more subtle, such as:
ifup bond0:
add first slave
configure for IB instead of Ethernet
start queues
-> however, there was a backlog item on the queue from prior to
the first slave being added and the port configured for IB mode,
so the backlog skb is still configured for the default Ethernet
mode, hence the failure
add additional slaves
Maybe the real issue is that inside the bonding driver, when we
reconfigure the queue type from IB to Ethernet or Ethernet to IB, we
need to force either a drop or a reconfiguration of any packets already
currently in our waiting backlog queue of skbs. Paolo?
> >
> > but I don't think it can be after the final slave has been fully
> > removed from the bond. Paolo, what should the bond driver be doing
> > once the slaves are gone? Wouldn't it just be dropping every skb
> > on
> > the floor without calling anyone's hard header routine?
> >
> > >
> > > so it is better to immediately
> > > give up and return error.
> > >
> > > >
> > > >
> > > > + if (ret)
> > > > + return ret;
> > > >
> > > > header = (struct ipoib_header *) skb_push(skb, sizeof
> > > > *header);
> > > > ---
> > > >
> > > > Paolo
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe
> > > > linux-
> > > > rdma" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.h
> > > > tml
> > --
> > Doug Ledford <dledford@redhat.com>
> > GPG KeyID: B826A3330E572FDD
> >
> > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57
> > 2FDD
> >
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
^ permalink raw reply
* Re: [PATCH net-next v9 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jiri Pirko @ 2017-04-26 13:47 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: davem, xiyou.wangcong, eric.dumazet, simon.horman, netdev
In-Reply-To: <5723f2e0-7738-09f5-23c0-bfe7432d7528@mojatatu.com>
Wed, Apr 26, 2017 at 03:17:59PM CEST, jhs@mojatatu.com wrote:
>On 17-04-26 09:08 AM, Jiri Pirko wrote:
>> Wed, Apr 26, 2017 at 02:42:17PM CEST, jhs@mojatatu.com wrote:
>
>> > +#define VALID_TCA_ROOT_FLAGS TCA_FLAG_LARGE_DUMP_ON
>> > +static inline bool tca_flags_valid(u32 act_flags)
>> > +{
>> > + u32 invalid_flags_mask = ~VALID_TCA_ROOT_FLAGS;
>> > +
>> > + if (act_flags & invalid_flags_mask)
>> > + return false;
>> > +
>> > + return true;
>>
>> This dance should either not be here (flag-per-attr) or should be in
>> netlink generic place. This is not TC specific at all.
>>
>
>So where do you think it should be?
>
>> I would still like to see the numbers prooving we need this.
>> Thanks
>>
>
>We are going to have to agree to disagree.
No, I don't agree with this. All I'm asking is if the flag dance you do
is really necessary. This is important, UAPI, set in stone in future.
So please avoid the rush this. Don't send another version please.
Your argument is "performance", so I just asked for the proof that this
argument is valid. I believe it is not. That's legit, right?
Please proove me wrong and I'll be happy.
Thanks!
^ permalink raw reply
* Re: [PATCH] IB/IPoIB: Check the headroom size
From: Honggang LI @ 2017-04-26 13:33 UTC (permalink / raw)
To: Doug Ledford
Cc: Paolo Abeni, Or Gerlitz, Erez Shitrit, Erez Shitrit,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linux Netdev List, David Miller
In-Reply-To: <1493213258.3041.98.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Wed, Apr 26, 2017 at 09:27:38AM -0400, Doug Ledford wrote:
> On Wed, 2017-04-26 at 20:52 +0800, Honggang LI wrote:
> > On Wed, Apr 26, 2017 at 09:46:34AM +0200, Paolo Abeni wrote:
> > >
> > > On Wed, 2017-04-26 at 00:50 +0300, Or Gerlitz wrote:
> > > >
> > > > so maybe @ least for the time being, we should be picking Hong's
> > > > patch
> > > > with proper change log and without the giant stack dump till we
> > > > have
> > > > something better. If you agree, can you do the re-write of the
> > > > change
> > > > log?
> > >
> > > I think that Hong's patch is following the correct way to fix the
> > > issue: ipoib_hard_header() can't assume that the skb headroom is at
> > > least IPOIB_HARD_LEN bytes, as wrongly implied by commit
> > > fc791b633515
> > > (my fault, I'm sorry).
> > >
> > > Perhaps we can make the code a little more robust with something
> > > alongside the following (only compile tested):
> > > ---
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > index d1d3fb7..d53d2e1 100644
> > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > @@ -1160,6 +1160,11 @@ static int ipoib_hard_header(struct sk_buff
> > > *skb,
> > > const void *daddr, const void *saddr,
> > > unsigned len)
> > > {
> > > struct ipoib_header *header;
> > > + int ret;
> > > +
> > > + ret = skb_cow_head(skb, IPOIB_HARD_LEN);
> >
> > I don't think so. When this skb_under_panic arise, all slaves had
> > been
> > removed from a busy bonding interface,
>
> I'm not sure this entirely makes sense. If all slaves had been
> removed, then we should never end up in ipoib_hard_header. That should
> only be called as long as there is at least one slave still attached to
> the bond. It might be during the process of removing the final slave,
Yes, it is during the process of removing the final slave. The
reproducer looks like this:
ping remote_ip_over_bonding_interface &
while 1; do
ifdown bond0
ifup bond0
done
> but I don't think it can be after the final slave has been fully
> removed from the bond. Paolo, what should the bond driver be doing
> once the slaves are gone? Wouldn't it just be dropping every skb on
> the floor without calling anyone's hard header routine?
>
> > so it is better to immediately
> > give up and return error.
> >
> > >
> > > + if (ret)
> > > + return ret;
> > >
> > > header = (struct ipoib_header *) skb_push(skb, sizeof
> > > *header);
> > > ---
> > >
> > > Paolo
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-
> > > rdma" in
> > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> GPG KeyID: B826A3330E572FDD
>
> Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [oss-drivers] Re: [RFC 3/4] nfp: make use of extended ack message reporting
From: Daniel Borkmann @ 2017-04-26 13:31 UTC (permalink / raw)
To: Jamal Hadi Salim, Simon Horman, David Miller
Cc: jakub.kicinski, netdev, johannes, dsa, alexei.starovoitov,
bblanco, john.fastabend, kubakici, oss-drivers
In-Reply-To: <646e0fb0-1f10-f4d0-936c-7fdf96b6cc43@mojatatu.com>
On 04/26/2017 03:03 PM, Jamal Hadi Salim wrote:
> On 17-04-26 07:13 AM, Simon Horman wrote:
>
>> I don't feel strongly about this and perhaps it can be revisited at some
>> point but perhaps it would be worth documenting that he strings do not
>> form part of the UAPI as my expectation would have been that they do f.e. to
>> facilitate internationalisation.
>
> I thought people script against what iproute2 outputs today. We
> may have to change habits.
I would strictly treat this kind of auxiliary information as
non-stable error message data, and it should be documented
as such, f.e. in the man page.
Every driver or subsystem using this could have different
restrictions/semantics and thus error messages will not be
the same everywhere anyway, so there's zero point in parsing
this text from an application in the first place, it's just
passing this through to the user to aide debugging.
^ permalink raw reply
* Re: [PATCH] IB/IPoIB: Check the headroom size
From: Doug Ledford @ 2017-04-26 13:27 UTC (permalink / raw)
To: Honggang LI, Paolo Abeni
Cc: Or Gerlitz, Erez Shitrit, Erez Shitrit,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linux Netdev List, David Miller
In-Reply-To: <20170426125230.GB19179-Y5OA6DF/u0nid9cnFhDO8BcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
On Wed, 2017-04-26 at 20:52 +0800, Honggang LI wrote:
> On Wed, Apr 26, 2017 at 09:46:34AM +0200, Paolo Abeni wrote:
> >
> > On Wed, 2017-04-26 at 00:50 +0300, Or Gerlitz wrote:
> > >
> > > so maybe @ least for the time being, we should be picking Hong's
> > > patch
> > > with proper change log and without the giant stack dump till we
> > > have
> > > something better. If you agree, can you do the re-write of the
> > > change
> > > log?
> >
> > I think that Hong's patch is following the correct way to fix the
> > issue: ipoib_hard_header() can't assume that the skb headroom is at
> > least IPOIB_HARD_LEN bytes, as wrongly implied by commit
> > fc791b633515
> > (my fault, I'm sorry).
> >
> > Perhaps we can make the code a little more robust with something
> > alongside the following (only compile tested):
> > ---
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index d1d3fb7..d53d2e1 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -1160,6 +1160,11 @@ static int ipoib_hard_header(struct sk_buff
> > *skb,
> > const void *daddr, const void *saddr,
> > unsigned len)
> > {
> > struct ipoib_header *header;
> > + int ret;
> > +
> > + ret = skb_cow_head(skb, IPOIB_HARD_LEN);
>
> I don't think so. When this skb_under_panic arise, all slaves had
> been
> removed from a busy bonding interface,
I'm not sure this entirely makes sense. If all slaves had been
removed, then we should never end up in ipoib_hard_header. That should
only be called as long as there is at least one slave still attached to
the bond. It might be during the process of removing the final slave,
but I don't think it can be after the final slave has been fully
removed from the bond. Paolo, what should the bond driver be doing
once the slaves are gone? Wouldn't it just be dropping every skb on
the floor without calling anyone's hard header routine?
> so it is better to immediately
> give up and return error.
>
> >
> > + if (ret)
> > + return ret;
> >
> > header = (struct ipoib_header *) skb_push(skb, sizeof
> > *header);
> > ---
> >
> > Paolo
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-
> > rdma" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] IB/IPoIB: Check the headroom size
From: Paolo Abeni @ 2017-04-26 13:25 UTC (permalink / raw)
To: Honggang LI
Cc: Or Gerlitz, Doug Ledford, Erez Shitrit, Erez Shitrit,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linux Netdev List, David Miller
In-Reply-To: <20170426125230.GB19179-Y5OA6DF/u0nid9cnFhDO8BcY2uh10dtjAL8bYrjMMd8@public.gmane.org>
On Wed, 2017-04-26 at 20:52 +0800, Honggang LI wrote:
> On Wed, Apr 26, 2017 at 09:46:34AM +0200, Paolo Abeni wrote:
> > On Wed, 2017-04-26 at 00:50 +0300, Or Gerlitz wrote:
> > > so maybe @ least for the time being, we should be picking Hong's patch
> > > with proper change log and without the giant stack dump till we have
> > > something better. If you agree, can you do the re-write of the change
> > > log?
> >
> > I think that Hong's patch is following the correct way to fix the
> > issue: ipoib_hard_header() can't assume that the skb headroom is at
> > least IPOIB_HARD_LEN bytes, as wrongly implied by commit fc791b633515
> > (my fault, I'm sorry).
> >
> > Perhaps we can make the code a little more robust with something
> > alongside the following (only compile tested):
> > ---
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index d1d3fb7..d53d2e1 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -1160,6 +1160,11 @@ static int ipoib_hard_header(struct sk_buff *skb,
> > const void *daddr, const void *saddr, unsigned len)
> > {
> > struct ipoib_header *header;
> > + int ret;
> > +
> > + ret = skb_cow_head(skb, IPOIB_HARD_LEN);
>
> I don't think so. When this skb_under_panic arise, all slaves had been
> removed from a busy bonding interface, so it is better to immediately
> give up and return error.
I fear we can hit the same condition even with other, more convoluted,
setup (e.g. some kind of ip tunnel on top of ipoib) and the
skb_cow_head() check should be cheap, in the common case.
Anyway I'm fine even with the original fix, which has the advantage to
minimize the side effects.
Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next v9 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-26 13:19 UTC (permalink / raw)
To: Jiri Pirko; +Cc: davem, xiyou.wangcong, eric.dumazet, simon.horman, netdev
In-Reply-To: <5723f2e0-7738-09f5-23c0-bfe7432d7528@mojatatu.com>
On 17-04-26 09:17 AM, Jamal Hadi Salim wrote:
> On 17-04-26 09:08 AM, Jiri Pirko wrote:
>> Wed, Apr 26, 2017 at 02:42:17PM CEST, jhs@mojatatu.com wrote:
>
>>> +#define VALID_TCA_ROOT_FLAGS TCA_FLAG_LARGE_DUMP_ON
>>> +static inline bool tca_flags_valid(u32 act_flags)
>>> +{
>>> + u32 invalid_flags_mask = ~VALID_TCA_ROOT_FLAGS;
>>> +
>>> + if (act_flags & invalid_flags_mask)
>>> + return false;
>>> +
>>> + return true;
>>
>> This dance should either not be here (flag-per-attr) or should be in
>> netlink generic place. This is not TC specific at all.
>>
>
> So where do you think it should be?
It would also be helpful for you to make comments when these things
show up. This change was in version 6. I have had to do this back and
forth a few times.
cheers,
jamal
^ permalink raw reply
* Re: [PATCH net-next v9 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-26 13:17 UTC (permalink / raw)
To: Jiri Pirko; +Cc: davem, xiyou.wangcong, eric.dumazet, simon.horman, netdev
In-Reply-To: <20170426130844.GG1867@nanopsycho.orion>
On 17-04-26 09:08 AM, Jiri Pirko wrote:
> Wed, Apr 26, 2017 at 02:42:17PM CEST, jhs@mojatatu.com wrote:
>> +#define VALID_TCA_ROOT_FLAGS TCA_FLAG_LARGE_DUMP_ON
>> +static inline bool tca_flags_valid(u32 act_flags)
>> +{
>> + u32 invalid_flags_mask = ~VALID_TCA_ROOT_FLAGS;
>> +
>> + if (act_flags & invalid_flags_mask)
>> + return false;
>> +
>> + return true;
>
> This dance should either not be here (flag-per-attr) or should be in
> netlink generic place. This is not TC specific at all.
>
So where do you think it should be?
> I would still like to see the numbers prooving we need this.
> Thanks
>
We are going to have to agree to disagree.
cheers,
jamal
^ permalink raw reply
* Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-26 13:14 UTC (permalink / raw)
To: Jiri Pirko
Cc: davem, xiyou.wangcong, eric.dumazet, netdev, Simon Horman,
Benjamin LaHaise
In-Reply-To: <20170426120851.GE1867@nanopsycho.orion>
On 17-04-26 08:08 AM, Jiri Pirko wrote:
> Wed, Apr 26, 2017 at 01:48:29PM CEST, jhs@mojatatu.com wrote:
>> On 17-04-26 02:19 AM, Jiri Pirko wrote:
>>> Tue, Apr 25, 2017 at 10:29:40PM CEST, jhs@mojatatu.com wrote:
>>>> On 17-04-25 12:04 PM, Jiri Pirko wrote:
>> I have experience with dealing with a massive amount of various dumps
>> and (batch) sets and it always boils down to one thing:
>> _how much data is exchanged between user and kernel_
>> 3 flags encoded as bits in a u32 attribute cost 64 bits.
>> Encoded separately cost 3x that.
>>
>> Believe me, it _does make a difference_ in performance.
>>
>> My least favorite subsystem is bridge. The bridge code has
>> tons of flags in those entries that are sent to/from kernel as u8
>> attributes. It is painful.
>>
>> For something more recent, lets look at this commit from Ben on Flower:
>> + TCA_FLOWER_KEY_MPLS_TTL, /* u8 - 8 bits */
>> + TCA_FLOWER_KEY_MPLS_BOS, /* u8 - 1 bit */
>> + TCA_FLOWER_KEY_MPLS_TC, /* u8 - 3 bits */
>> + TCA_FLOWER_KEY_MPLS_LABEL, /* be32 - 20 bits */
>>
>> Yes, that looks pretty, but:
>> That would have fit in one attribute with a u32. Mask attributes would
>> be eliminated with a second 32 bit - all in the same singular
>> attribute.
>>
>> Imagine if i have 1M flower entries. If you add up the mask the cost
>> of these things is about 3*2*64 bits more per entry compared to putting
>> the mpls info/mask in one attribute.
>> At 1M entries that is a few MBs of data being exchanged.
>
> I can do the math :) Yet still, I would like to see the numbers :)
> Because I believe that is the only way to end this lenghty converstation
> once and forever...
>
Jiri, what are you arguing about if you have done the math? ;->
You want me to show you that getting or setting less data is good for
performance?
Look at the third patch: Why do i think it is necessary to send only
actions that have changed? Precisely to reduce the amount of data
being transported. The second patch - to reduce the amount of crossing
user space to kernel space (which is going to happen more with increased
data I have to transport between the user and the kernel).
Again: You are looking at this from a manageability point of view which
is useful but not the only input into a design. If i can squeeze more
data without killing usability - I am all for it. It just doesnt
compute that it is ok to use a flag per attribute because it looks
beautiful.
cheers,
jamal
^ permalink raw reply
* Re: [PATCH net-next] dt-bindings: mdio: Clarify binding document
From: Roger Quadros @ 2017-04-26 13:09 UTC (permalink / raw)
To: Florian Fainelli, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: andrew-g2DYL2Zd6BY, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
tony-4v6yS6AI5VpBDgjK7y7TUQ, nsekhar-l0cyMroinI0,
jsarha-l0cyMroinI0, linux-omap-u79uwXL29TY76Z2rM5mHXA,
lars-Qo5EllUWu/uELgA04lAiVw, Rob Herring, Mark Rutland,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
open list
In-Reply-To: <20170425183308.26107-1-f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On 25/04/17 21:33, Florian Fainelli wrote:
> The described GPIO reset property is applicable to *all* child PHYs. If
> we have one reset line per PHY present on the MDIO bus, these
> automatically become properties of the child PHY nodes.
>
> Finally, indicate how the RESET pulse width must be defined, which is
> the maximum value of all individual PHYs RESET pulse widths determined
> by reading their datasheets.
>
> Fixes: 69226896ad63 ("mdio_bus: Issue GPIO RESET to PHYs.")
> Signed-off-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Reviewed-by: Roger Quadros <rogerq-l0cyMroinI0@public.gmane.org>
> ---
> Documentation/devicetree/bindings/net/mdio.txt | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/net/mdio.txt b/Documentation/devicetree/bindings/net/mdio.txt
> index 4ffbbacebda1..96a53f89aa6e 100644
> --- a/Documentation/devicetree/bindings/net/mdio.txt
> +++ b/Documentation/devicetree/bindings/net/mdio.txt
> @@ -3,13 +3,17 @@ Common MDIO bus properties.
> These are generic properties that can apply to any MDIO bus.
>
> Optional properties:
> -- reset-gpios: List of one or more GPIOs that control the RESET lines
> - of the PHYs on that MDIO bus.
> -- reset-delay-us: RESET pulse width in microseconds as per PHY datasheet.
> +- reset-gpios: One GPIO that control the RESET lines of all PHYs on that MDIO
> + bus.
> +- reset-delay-us: RESET pulse width in microseconds.
>
> A list of child nodes, one per device on the bus is expected. These
> should follow the generic phy.txt, or a device specific binding document.
>
> +The 'reset-delay-us' indicates the RESET signal pulse width in microseconds and
> +applies to all PHY devices. It must therefore be appropriately determined based
> +on all PHY requirements (maximum value of all per-PHY RESET pulse widths).
> +
> Example :
> This example shows these optional properties, plus other properties
> required for the TI Davinci MDIO driver.
> @@ -21,7 +25,7 @@ required for the TI Davinci MDIO driver.
> #size-cells = <0>;
>
> reset-gpios = <&gpio2 5 GPIO_ACTIVE_LOW>;
> - reset-delay-us = <2>; /* PHY datasheet states 1us min */
> + reset-delay-us = <2>;
>
> ethphy0: ethernet-phy@1 {
> reg = <1>;
>
--
cheers,
-roger
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next v9 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jiri Pirko @ 2017-04-26 13:08 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: davem, xiyou.wangcong, eric.dumazet, simon.horman, netdev
In-Reply-To: <1493210538-21716-3-git-send-email-jhs@emojatatu.com>
Wed, Apr 26, 2017 at 02:42:17PM CEST, jhs@mojatatu.com wrote:
>From: Jamal Hadi Salim <jhs@mojatatu.com>
>
>When you dump hundreds of thousands of actions, getting only 32 per
>dump batch even when the socket buffer and memory allocations allow
>is inefficient.
>
>With this change, the user will get as many as possibly fitting
>within the given constraints available to the kernel.
>
>The top level action TLV space is extended. An attribute
>TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
>is set by the user indicating the user is capable of processing
>these large dumps. Older user space which doesnt set this flag
>doesnt get the large (than 32) batches.
>The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
>actions are put in a single batch. As such user space app knows how long
>to iterate (independent of the type of action being dumped)
>instead of hardcoded maximum of 32.
>
>Some results dumping 1.5M actions, first unpatched tc which the
>kernel doesnt help:
>
>prompt$ time -p tc actions ls action gact | grep index | wc -l
>1500000
>real 1388.43
>user 2.07
>sys 1386.79
>
>Now lets see a patched tc which sets the correct flags when requesting
>a dump:
>
>prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
>1500000
>real 178.13
>user 2.02
>sys 176.96
>
>That is about 8x performance improvement for tc which sets its
>receive buffer to about 32K.
>
>Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
>---
[...]
>+#define VALID_TCA_ROOT_FLAGS TCA_FLAG_LARGE_DUMP_ON
>+static inline bool tca_flags_valid(u32 act_flags)
>+{
>+ u32 invalid_flags_mask = ~VALID_TCA_ROOT_FLAGS;
>+
>+ if (act_flags & invalid_flags_mask)
>+ return false;
>+
>+ return true;
This dance should either not be here (flag-per-attr) or should be in
netlink generic place. This is not TC specific at all.
I would still like to see the numbers prooving we need this.
Thanks
^ permalink raw reply
* Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jiri Pirko @ 2017-04-26 13:05 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Simon Horman, davem, xiyou.wangcong, eric.dumazet, netdev
In-Reply-To: <2e09d117-60f9-97ed-9f63-c94130ddbb0c@mojatatu.com>
Wed, Apr 26, 2017 at 02:46:22PM CEST, jhs@mojatatu.com wrote:
>On 17-04-26 07:02 AM, Simon Horman wrote:
>> On Tue, Apr 25, 2017 at 06:04:45PM +0200, Jiri Pirko wrote:
>[..]
>
>> > So fix iproute2. It is always first kernel, then iproute2.
>>
>> Perhaps I am missing the point or somehow misguided but I would expect that
>> if the UAPI uses BIT() it also provides BIT().
>
>There is a user of BIT() already in iproute2 (devlink). We can move
>the code to be more generally available for other iproute2 users.
>Then this UAPI change makes use of it.
Should be part of UAPI as well
I see that include/uapi/rdma/vmw_pvrdma-abi.h is using BIT macro.
I don't see BIT macro defined in UAPI (I thought it is). So either
define it there (not sure where) or just use "<<"
^ permalink raw reply
* Re: [oss-drivers] Re: [RFC 3/4] nfp: make use of extended ack message reporting
From: Jamal Hadi Salim @ 2017-04-26 13:03 UTC (permalink / raw)
To: Simon Horman, David Miller
Cc: jakub.kicinski, netdev, johannes, dsa, daniel, alexei.starovoitov,
bblanco, john.fastabend, kubakici, oss-drivers
In-Reply-To: <20170426111315.GC28251@vergenet.net>
On 17-04-26 07:13 AM, Simon Horman wrote:
> I don't feel strongly about this and perhaps it can be revisited at some
> point but perhaps it would be worth documenting that he strings do not
> form part of the UAPI as my expectation would have been that they do f.e. to
> facilitate internationalisation.
>
I thought people script against what iproute2 outputs today. We
may have to change habits.
cheers,
jamal
^ permalink raw reply
* Re: [PATCH] ipv6: ensure message length for raw socket is at least sizeof(ipv6hdr)
From: Alexander Potapenko @ 2017-04-26 13:00 UTC (permalink / raw)
To: David Miller
Cc: Dmitriy Vyukov, Kostya Serebryany, Eric Dumazet, Alexey Kuznetsov,
LKML, Networking
In-Reply-To: <CAG_fn=Xw-WTn4AsyNaaLMRaW1xqo1QOyp7JpYOW=z+hqpWdwLQ@mail.gmail.com>
On Wed, Apr 26, 2017 at 10:54 AM, Alexander Potapenko <glider@google.com> wrote:
> On Tue, Apr 25, 2017 at 7:55 PM, David Miller <davem@davemloft.net> wrote:
>> From: Alexander Potapenko <glider@google.com>
>> Date: Tue, 25 Apr 2017 15:18:27 +0200
>>
>>> rawv6_send_hdrinc() expects that the buffer copied from the userspace
>>> contains the IPv6 header, so if too few bytes are copied parts of the
>>> header may remain uninitialized.
>>>
>>> This bug has been detected with KMSAN.
>>>
>>> Signed-off-by: Alexander Potapenko <glider@google.com>
>>
>> Hmmm, ipv4 seems to lack this check as well.
>>
>> I think we need to be careful here and fully understand why KMSAN doesn't
>> seem to be triggering in the ipv4 case but for ipv6 it is before I apply
>> this.
> Maybe I just couldn't come up with a decent test case for ipv4 yet.
> syzkaller generated the equivalent of the following program for ipv6:
>
> =======================================
> #define _GNU_SOURCE
>
> #include <netinet/in.h>
> #include <string.h>
> #include <sys/socket.h>
> #include <error.h>
>
> int main()
> {
> int sock = socket(PF_INET6, SOCK_RAW, IPPROTO_RAW);
> struct sockaddr_in6 dest_addr;
> memset(&dest_addr, 0, sizeof(dest_addr));
> dest_addr.sin6_family = AF_INET6;
> inet_pton(AF_INET6, "ff00::", &dest_addr.sin6_addr);
> int err = sendto(sock, 0, 0, 0, &dest_addr, sizeof(dest_addr));
> if (err == -1)
> perror("sendto");
> return 0;
> }
> =======================================
>
> I attempted to replace INET6 and such with INET and provide a legal
> IPv4 address to inet_pton(), but couldn't trigger the warning.
syzkaller reports the following errors in the wild (although there's
no stable repro):
==================================================================
BUG: KMSAN: use of unitialized memory in raw_send_hdrinc
net/ipv4/raw.c:407 [inline]
BUG: KMSAN: use of unitialized memory in raw_sendmsg+0x2c8b/0x3400
net/ipv4/raw.c:640
inter: 0
CPU: 3 PID: 2853 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2445
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x143/0x1b0 lib/dump_stack.c:52
kmsan_report+0x16b/0x1e0 mm/kmsan/kmsan.c:1078
__kmsan_warning_32+0x5c/0xa0 mm/kmsan/kmsan_instr.c:510
raw_send_hdrinc net/ipv4/raw.c:407 [inline]
raw_sendmsg+0x2c8b/0x3400 net/ipv4/raw.c:640
inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg net/socket.c:643 [inline]
___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
__sys_sendmsg net/socket.c:2031 [inline]
SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
SyS_sendmsg+0x87/0xb0 net/socket.c:2038
entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x44a669
RSP: 002b:00007f08d06edb58 EFLAGS: 00000286 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000708000 RCX: 000000000044a669
RDX: 0000000000000000 RSI: 0000000020007fc8 RDI: 0000000000000017
RBP: 0000000000005080 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000286 R12: 00000000006e4140
R13: 0000000020b22000 R14: 0000000000000000 R15: 0000000000000000
origin: 00000000cb200085
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:362 [inline]
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:257
kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:270
slab_alloc_node mm/slub.c:2735 [inline]
__kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4341
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
alloc_skb include/linux/skbuff.h:933 [inline]
alloc_skb_with_frags+0x209/0xbc0 net/core/skbuff.c:4678
sock_alloc_send_pskb+0x9ff/0xe00 net/core/sock.c:1903
sock_alloc_send_skb+0xe4/0x100 net/core/sock.c:1920
raw_send_hdrinc net/ipv4/raw.c:366 [inline]
raw_sendmsg+0x1db4/0x3400 net/ipv4/raw.c:640
inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg net/socket.c:643 [inline]
___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
__sys_sendmsg net/socket.c:2031 [inline]
SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
SyS_sendmsg+0x87/0xb0 net/socket.c:2038
entry_SYSCALL_64_fastpath+0x13/0x94
==================================================================
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
--
Alexander Potapenko
Software Engineer
Google Germany GmbH
Erika-Mann-Straße, 33
80636 München
Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox