Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
From: Raj, Ashok @ 2017-05-02 16:53 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Casey Leedom, Bjorn Helgaas, leedom, Michael Werner,
	Ganesh Goudar, Arjun V, David Miller, Asit K Mallick,
	Patrick J Cramer, Suravee Suthikulpanit, Bob Shaw, h,
	Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight, Jeff 
In-Reply-To: <CAKgT0UeuW88AHBX14H8xHteeG7vR0AH3apbjOeEDx_6p46x7tA@mail.gmail.com>

On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> > Ordering Attribute should not be used on Transaction Layer Packets destined
> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> 
> So this is a good first step though might I suggest one other change.
> 
> We may want to add logic to the core PCIe code that clears the "Enable
> Relaxed Ordering" bit in the device control register for all devices
> hanging off of this root complex. Assuming the devices conform to the
> PCIe spec doing that should disable relaxed ordering in a device
> agnostic way that then enables us at a driver level to just enable the
> feature always without having to perform any checks for your flag. We
> could probably do that as a part of probing the PCIe interfaces
> hanging off of these devices.

I suppose you don't want to turn off RO completely on the device. When
traffic is targetted to mmio for peer to peer reasons RO has performance
upside. The specific issue with these root ports indicate limitation using 
RO for traffic targetting coherent memory.

> 
> > ---
> >  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci.h  |  2 ++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index f754453..4ae78b3 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
> >                               quirk_tw686x_class);
> >
> >  /*
> > + * Some devices have problems with Transaction Layer Packets with the Relaxed
> > + * Ordering Attribute set.  Such devices should mark themselves and other
> > + * Device Drivers should check before sending TLPs with RO set.
> > + */
> > +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> > +{
> > +       dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> > +}
> > +
> > +/*
> > + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> > + * cause performance problems with Upstream Transaction Layer Packets with
> > + * Relaxed Ordering set.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +
> > +/*
> > + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> > + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> > + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> > + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> > + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> > + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> > + * Ordering for Upstream TLPs.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +
> > +/*
> >   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
> >   * values for the Attribute as were supplied in the header of the
> >   * corresponding Request, except as explicitly allowed when IDO is used."
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index eb3da1a..6764f66 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -178,6 +178,8 @@ enum pci_dev_flags {
> >         PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
> >         /* Get VPD from function 0 VPD */
> >         PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> > +       /* Don't use Relaxed Ordering for TLPs directed at this device */
> > +       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
> >  };
> >
> >  enum pci_irq_reroute_variant {
> > --
> > 1.9.1
> >

^ permalink raw reply

* Re: net/ipv6: use-after-free in __call_rcu/in6_dev_finish_destroy_rcu
From: Andrey Konovalov @ 2017-05-02 16:58 UTC (permalink / raw)
  To: David Ahern
  Cc: Paul E. McKenney, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Eric Dumazet,
	Cong Wang, Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <e133f72f-89ab-bf45-67f5-4ad0c502adc4@cumulusnetworks.com>

On Tue, May 2, 2017 at 4:44 AM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 4/26/17 9:15 AM, Andrey Konovalov wrote:
>> +David
>>
>> I've enabled CONFIG_DEBUG_OBJECTS_RCU_HEAD and this is what I get.
>>
>> Apparently the rcu warning is related to the fib6_del_route bug I've
>> been trying to reproduce:
>> https://groups.google.com/forum/#!msg/syzkaller/3SS80JbVPKA/2tfIAcW7DwAJ
>>
>> Adding David, who provided the fix:
>> https://patchwork.ozlabs.org/patch/754913/
>>
>> I've managed to extract a reproducer, attached together with the
>> .config that I used.
>>
>> On commit 5a7ad1146caa895ad718a534399e38bd2ba721b7 (4.11-rc8) with
>> David's patch applied.
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 5911 at lib/debugobjects.c:289
>> debug_print_object+0x175/0x210
>> ODEBUG: activate active (active state 1) object type: rcu_head hint:
>>         (null)
>> Modules linked in:
>> CPU: 1 PID: 5911 Comm: a.out Not tainted 4.11.0-rc8+ #271
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:16
>>  dump_stack+0x192/0x22d lib/dump_stack.c:52
>>  __warn+0x19f/0x1e0 kernel/panic.c:549
>>  warn_slowpath_fmt+0xe0/0x120 kernel/panic.c:564
>>  debug_print_object+0x175/0x210 lib/debugobjects.c:286
>>  debug_object_activate+0x574/0x7e0 lib/debugobjects.c:442
>>  debug_rcu_head_queue kernel/rcu/rcu.h:75
>>  __call_rcu.constprop.76+0xff/0x9c0 kernel/rcu/tree.c:3229
>>  call_rcu_sched+0x12/0x20 kernel/rcu/tree.c:3288
>>  rt6_rcu_free net/ipv6/ip6_fib.c:158
>>  rt6_release+0x1ea/0x290 net/ipv6/ip6_fib.c:188
>>  fib6_del_route net/ipv6/ip6_fib.c:1461
>
> I think I got to the bottom of this one.
>
> With your config, ip6_tunnel is compiled in.
>
> The program runs in a very tight loop, calling 'unshare -n' and then
> spawns 2 sets of 14 threads running random ioctl calls. The networking
> sequence:
>
> 1. New network namespace created via unshare -n
> - ip6tnl0 device is created in down state
>
> 2. address added to ip6tnl0 (equivalent to ip -6 addr add dev ip6tnl0
> fd00::bb/1)
> - the host route is created and inserted into FIB
>
> 3. ip6tnl0 is brought up - starts DAD on the address
>
> 4. exit namespace
> - teardown / cleanup sequence starts
> - lo teardown appears to happen BEFORE teardown of ip6tunl0
>   + removes host route from FIB
>   + host route added to rcu callback list: call_rcu(&rt->dst.rcu_head,
> dst_rcu_free);
>   + rcu callback has not run yet, so rt is NOT on the gc list so it has
>     NOT been marked obsolete
>
> 5. worker_thread runs addrconf_dad_completed
> - calls ipv6_ifa_notify which inserts the host route
>
> All of that happens very quickly. The result is that a route that has
> been deleted and added to the RCU list is re-inserted into the FIB. What
> happens next depends on order -- in this case the exit namespace
> eventually gets to cleaning up ip6tnl0 which removes the host route from
> the FIB, calls the rcu function for cleanup -- and triggers the double
> rcu trace.
>
> I have a hack that flags this sequence and prevents the re-insertion
> following DAD. That allows the command to run until it consumes all 2G
> of memory the VM has -- about 600+ iterations without triggering any
> stack traces.

Hi David,

Thanks for looking into this!

Do you have a patch that I could test?

I also reported another issue recently, that might also be related to this one:
https://groups.google.com/forum/#!topic/syzkaller/Rt0pgY4wfiw

Thanks!

^ permalink raw reply

* Re: [PATCH net v3] driver: veth: Fix one possbile memleak when fail to register_netdevice
From: Xin Long @ 2017-05-02 16:59 UTC (permalink / raw)
  To: Gao Feng; +Cc: davem, jarod, Stephen Hemminger, dsa, network dev
In-Reply-To: <000601d2c333$abb526d0$031f7470$@vip.163.com>

On Tue, May 2, 2017 at 7:03 PM, Gao Feng <gfree.wind@vip.163.com> wrote:
>> From: Xin Long [mailto:lucien.xin@gmail.com]
>> Sent: Tuesday, May 2, 2017 3:56 PM
>> On Sat, Apr 29, 2017 at 11:51 AM,  <gfree.wind@foxmail.com> wrote:
>> > From: Gao Feng <gfree.wind@foxmail.com>
> [...]
>> > -static void veth_dev_free(struct net_device *dev)
>> > +static void veth_destructor_free(struct net_device *dev)
>> >  {
>> >         free_percpu(dev->vstats);
>> > +}
>> not sure why you needed to add this function.
>> to use free_percpu() directly may be clearer.
>
> Because both of ndo_uninit and destructor need to perform same free statements.
> It is good at maintain the codes with the common function.
>>
>> > +
>> > +static void veth_dev_uninit(struct net_device *dev) {
>> call free_percpu() here, no need to check dev->reg_state.
>> free_percpu will just return if dev->vstats is NULL.
>
> It would break the original design if don't check the reg_state.
> The original logic is that free the resources in the destructor, not in ndo_init.
I got what you're doing now, can you pls try to fix this with:

--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -219,10 +219,9 @@ static int veth_dev_init(struct net_device *dev)
        return 0;
 }

-static void veth_dev_free(struct net_device *dev)
+static void veth_dev_uninit(struct net_device *dev)
 {
        free_percpu(dev->vstats);
-       free_netdev(dev);
 }

 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -279,6 +278,7 @@ static void veth_set_rx_headroom(struct net_device
*dev, int new_hr)

 static const struct net_device_ops veth_netdev_ops = {
        .ndo_init            = veth_dev_init,
+       .ndo_uninit          = veth_dev_uninit,
        .ndo_open            = veth_open,
        .ndo_stop            = veth_close,
        .ndo_start_xmit      = veth_xmit,
@@ -317,7 +317,7 @@ static void veth_setup(struct net_device *dev)
                               NETIF_F_HW_VLAN_STAG_TX |
                               NETIF_F_HW_VLAN_CTAG_RX |
                               NETIF_F_HW_VLAN_STAG_RX);
-       dev->destructor = veth_dev_free;
+       dev->destructor = free_netdev;
        dev->max_mtu = ETH_MAX_MTU;

        dev->hw_features = VETH_FEATURES;


just as what other virtual nic drivers do (vxlan, geneve, macsec, bridge ....)

>
> BTW, because I send multiple patches too fast today, the email server blocks my account.
> So I have to reply you with a different email account. Sorry.
>
> Best Regards
> Feng
>
>>
> [...]
>
>

^ permalink raw reply

* Re: [PATCH net-next iproute2 1/3] netlink: import netlink message parsing from kernel
From: David Miller @ 2017-05-02 17:00 UTC (permalink / raw)
  To: dsa; +Cc: stephen, netdev, jakub.kicinski
In-Reply-To: <7fa44ec5-9a3a-55d1-b058-540d37ef01bd@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Tue, 2 May 2017 10:51:23 -0600

> On 5/2/17 9:25 AM, Stephen Hemminger wrote:
>> Please either use existing netlink attribute code in libnetlink.h
>> (rta_getattr_u32 etc) or use libmnl like devlink.
> 
> All of the existing rta_ functions take a struct rta_attr; netlink
> messages use struct nlattr. It's just wrong to use rta functions for
> netlink messages.

Agreed.

> There is no existing parse function for nlattr. There is no existing
> validate function - for nlattr or rta_attr. So I do not see any overlap
> with existing code.

Also agreed.

^ permalink raw reply

* Re: TPACKET_V3 timeout bug?
From: chetan loke @ 2017-05-02 17:16 UTC (permalink / raw)
  To: Guy Harris; +Cc: Andrew Lunn, Sowmini Varadhan, netdev, tcpdump-workers
In-Reply-To: <CAAsGZS5ir2Ha1swhdGCg7ibgYRORZykZcu089=oJ51NT4GKDuA@mail.gmail.com>

On Tue, May 2, 2017 at 8:04 AM, chetan loke <loke.chetan@gmail.com> wrote:
> On Sat, Apr 15, 2017 at 7:41 PM, Guy Harris <guy@alum.mit.edu> wrote:
>> On Apr 15, 2017, at 7:10 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>>
>>> Do you think this is a kernel problem, libpcap problem, or an
>>> application problem?
>>
>
> Its clearly a kernel regression.
>

Commit that caused it:

https://github.com/torvalds/linux/commit/41a50d621a321b4c15273cc1b5ed41437f4acdfb

Reverting that change is what we need.

When monitoring aggregated links (example: request going out on one
link and response coming in on another) you have to know when to start
mining packets for time-interval[s] to report anomalies etc. And the
block-retirement was designed such that the kernel would fill those
values for the app in the ts_first[/last]_pkt.

We have already amortized the cost by doing a bulk wakeup. And if
user-space holds on to the block even when it is empty (instead of
just using/copying the start/end time interval and releasing the
block) then its a bug in your app.

And if user-space is lagging behind then you will always run out of
buffer space. So I don't buy the entire commit argument of the patch.

^ permalink raw reply

* Re: net/ipv6: use-after-free in __call_rcu/in6_dev_finish_destroy_rcu
From: David Ahern @ 2017-05-02 17:22 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Paul E. McKenney, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Eric Dumazet,
	Cong Wang, Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <CAAeHK+z-LFc1mbg3FKtSnMqLGQf2kr432a-MSeMx8Q7epfnVaw@mail.gmail.com>

On 5/2/17 10:58 AM, Andrey Konovalov wrote:
> Do you have a patch that I could test?

not yet.

> 
> I also reported another issue recently, that might also be related to this one:
> https://groups.google.com/forum/#!topic/syzkaller/Rt0pgY4wfiw

different problem. I can still trigger this one with the reproducer you sent

^ permalink raw reply

* [PATCH 0/3] net/atm: Fine-tuning for three function implementations
From: SF Markus Elfring @ 2017-05-02 17:45 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: LKML, kernel-janitors

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Tue, 2 May 2017 19:37:39 +0200

A few update suggestions were taken into account
from static source code analysis.

Markus Elfring (3):
  Combine four seq_printf() calls in mpc_show()
  Use seq_putc() in mpc_show()
  Add some spaces for better code readability

 net/atm/mpoa_proc.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

-- 
2.12.2

^ permalink raw reply

* [PATCH 1/3] net/atm: Combine four seq_printf() calls in mpc_show()
From: SF Markus Elfring @ 2017-05-02 17:46 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: LKML, kernel-janitors
In-Reply-To: <d9577c1f-07d1-b43a-df16-8626acfb1043@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Tue, 2 May 2017 18:52:58 +0200

Some data were put into a sequence by four separate function calls.
Print the same data by two function calls instead.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 net/atm/mpoa_proc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/atm/mpoa_proc.c b/net/atm/mpoa_proc.c
index 2df34eb5d65f..6ea6028fd865 100644
--- a/net/atm/mpoa_proc.c
+++ b/net/atm/mpoa_proc.c
@@ -145,8 +145,8 @@ static int mpc_show(struct seq_file *m, void *v)
 		return 0;
 	}
 
-	seq_printf(m, "\nInterface %d:\n\n", mpc->dev_num);
-	seq_printf(m, "Ingress Entries:\nIP address      State      Holding time  Packets fwded  VPI  VCI\n");
+	seq_printf(m, "\nInterface %d:\n\nIngress Entries:\nIP address      State      Holding time  Packets fwded  VPI  VCI\n",
+		   mpc->dev_num);
 	do_gettimeofday(&now);
 
 	for (in_entry = mpc->in_cache; in_entry; in_entry = in_entry->next) {
@@ -165,7 +165,7 @@ static int mpc_show(struct seq_file *m, void *v)
 	}
 
-	seq_printf(m, "\n");
-	seq_printf(m, "Egress Entries:\nIngress MPC ATM addr\nCache-id        State      Holding time  Packets recvd  Latest IP addr   VPI VCI\n");
+	seq_printf(m,
+		   "\nEgress Entries:\nIngress MPC ATM addr\nCache-id        State      Holding time  Packets recvd  Latest IP addr   VPI VCI\n");
 	for (eg_entry = mpc->eg_cache; eg_entry; eg_entry = eg_entry->next) {
 		unsigned char *p = eg_entry->ctrl_info.in_MPC_data_ATM_addr;
 		for (i = 0; i < ATM_ESA_LEN; i++)
-- 
2.12.2

^ permalink raw reply related

* [PATCH 2/3] net/atm: Use seq_putc() in mpc_show()
From: SF Markus Elfring @ 2017-05-02 17:48 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: LKML, kernel-janitors
In-Reply-To: <d9577c1f-07d1-b43a-df16-8626acfb1043@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Tue, 2 May 2017 18:58:08 +0200

Single characters (line breaks) should be put into a sequence.
Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 net/atm/mpoa_proc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/atm/mpoa_proc.c b/net/atm/mpoa_proc.c
index 6ea6028fd865..324c4f95f4bf 100644
--- a/net/atm/mpoa_proc.c
+++ b/net/atm/mpoa_proc.c
@@ -161,7 +161,7 @@ static int mpc_show(struct seq_file *m, void *v)
 			seq_printf(m, "   %-3d  %-3d",
 				   in_entry->shortcut->vpi,
 				   in_entry->shortcut->vci);
-		seq_printf(m, "\n");
+		seq_putc(m, '\n');
 	}

 	seq_printf(m,
@@ -185,9 +185,9 @@ static int mpc_show(struct seq_file *m, void *v)
 			seq_printf(m, " %-3d %-3d",
 				   eg_entry->shortcut->vpi,
 				   eg_entry->shortcut->vci);
-		seq_printf(m, "\n");
+		seq_putc(m, '\n');
 	}
-	seq_printf(m, "\n");
+	seq_putc(m, '\n');
 	return 0;
 }

-- 
2.12.2

^ permalink raw reply related

* [PATCH 3/3] net/atm: Add some spaces for better code readability
From: SF Markus Elfring @ 2017-05-02 17:49 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: LKML, kernel-janitors
In-Reply-To: <d9577c1f-07d1-b43a-df16-8626acfb1043@users.sourceforge.net>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Tue, 2 May 2017 19:19:14 +0200

Use space characters at some source code places according to
the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 net/atm/mpoa_proc.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/net/atm/mpoa_proc.c b/net/atm/mpoa_proc.c
index 324c4f95f4bf..6a52606557f0 100644
--- a/net/atm/mpoa_proc.c
+++ b/net/atm/mpoa_proc.c
@@ -154,8 +154,8 @@ static int mpc_show(struct seq_file *m, void *v)
 		seq_printf(m, "%-16s%s%-14lu%-12u",
 			   ip_string,
 			   ingress_state_string(in_entry->entry_state),
-			   in_entry->ctrl_info.holding_time -
-			   (now.tv_sec-in_entry->tv.tv_sec),
+			   in_entry->ctrl_info.holding_time
+			   - (now.tv_sec - in_entry->tv.tv_sec),
 			   in_entry->packets_fwded);
 		if (in_entry->shortcut)
 			seq_printf(m, "   %-3d  %-3d",
@@ -173,8 +173,8 @@ static int mpc_show(struct seq_file *m, void *v)
 		seq_printf(m, "\n%-16lu%s%-14lu%-15u",
 			   (unsigned long)ntohl(eg_entry->ctrl_info.cache_id),
 			   egress_state_string(eg_entry->entry_state),
-			   (eg_entry->ctrl_info.holding_time -
-			    (now.tv_sec-eg_entry->tv.tv_sec)),
+			   eg_entry->ctrl_info.holding_time
+			   - (now.tv_sec - eg_entry->tv.tv_sec),
 			   eg_entry->packets_rcvd);
 
 		/* latest IP address */
@@ -213,7 +213,7 @@ static ssize_t proc_mpc_write(struct file *file, const char __user *buff,
 		return 0;
 
 	if (nbytes >= PAGE_SIZE)
-		nbytes = PAGE_SIZE-1;
+		nbytes = PAGE_SIZE - 1;
 
 	page = (char *)__get_free_page(GFP_KERNEL);
 	if (!page)
@@ -251,18 +251,21 @@ static int parse_qos(const char *buff)
 	memset(&qos, 0, sizeof(struct atm_qos));
 
 	if (sscanf(buff, "del %hhu.%hhu.%hhu.%hhu",
-			ip, ip+1, ip+2, ip+3) == 4) {
+		   ip, ip + 1, ip + 2, ip + 3) == 4) {
 		ipaddr = *(__be32 *)ip;
 		return atm_mpoa_delete_qos(atm_mpoa_search_qos(ipaddr));
 	}
 
 	if (sscanf(buff, "add %hhu.%hhu.%hhu.%hhu tx=%d,%d rx=tx",
-			ip, ip+1, ip+2, ip+3, &tx_pcr, &tx_sdu) == 6) {
+		   ip, ip + 1, ip + 2, ip + 3, &tx_pcr, &tx_sdu) == 6) {
 		rx_pcr = tx_pcr;
 		rx_sdu = tx_sdu;
-	} else if (sscanf(buff, "add %hhu.%hhu.%hhu.%hhu tx=%d,%d rx=%d,%d",
-		ip, ip+1, ip+2, ip+3, &tx_pcr, &tx_sdu, &rx_pcr, &rx_sdu) != 8)
-		return 0;
+	} else {
+		if (sscanf(buff, "add %hhu.%hhu.%hhu.%hhu tx=%d,%d rx=%d,%d",
+			   ip, ip + 1, ip + 2, ip + 3,
+			   &tx_pcr, &tx_sdu, &rx_pcr, &rx_sdu) != 8)
+			return 0;
+	}
 
 	ipaddr = *(__be32 *)ip;
 	qos.txtp.traffic_class = ATM_CBR;
-- 
2.12.2

^ permalink raw reply related

* Re: TPACKET_V3 timeout bug?
From: Guy Harris @ 2017-05-02 17:54 UTC (permalink / raw)
  To: chetan loke; +Cc: Andrew Lunn, Sowmini Varadhan, netdev, tcpdump-workers
In-Reply-To: <CAAsGZS5ir2Ha1swhdGCg7ibgYRORZykZcu089=oJ51NT4GKDuA@mail.gmail.com>

On May 2, 2017, at 8:04 AM, chetan loke <loke.chetan@gmail.com> wrote:

> On Sat, Apr 15, 2017 at 7:41 PM, Guy Harris <guy@alum.mit.edu> wrote:
>> On Apr 15, 2017, at 7:10 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> 
>>> Do you think this is a kernel problem, libpcap problem, or an
>>> application problem?
>> 
> 
> Its clearly a kernel regression.
> 
> If you look at if_packet.h, I have explicitly called out all the cases
> for the return/status codes. When I first merged the functionality in
> 3.11(or 3.12 I think) I had the logic in place to retire the block
> with or without packets in it. I think there was one case where we
> wouldn't wake up userspace. Someone checked in a fix for that. Now I
> am not sure the regression happened as part of that bug fix or
> sometime later. If you diff 3.12 against the latest you will find the
> regression. Look for prb_retire_rx_blk_timer_expired().

Yes, there's a case where user space wasn't being woken up.

As I said in

	https://github.com/the-tcpdump-group/libpcap/issues/335#issuecomment-30280794

It appeared, at the time, that PF_PACKET sockets delivered a wakeup when a packet is put in a buffer block or dropped due to no buffer blocks being empty, but not when a buffer block is handed to userland.

This means that if the kernel's timer expires, and there are no packets in the current buffer block being filled by the kernel, that buffer block will be handed to userland, but userland won't be woken up to tell it to consume that block.

Thus, libpcap will consume that block only if either:

	* a packet is put in a buffer block, meaning it must pass the filter and there must be a current buffer block, belonging to the kernel, into which to put it;
	* a packet arrives and passes the filter, but there are no current buffer blocks belonging to the kernel, so it's dropped;
	* the poll() times out.

So, with a low packet acceptance rate (either because there isn't much network traffic or because there is but most of it is rejected by the packet filter), and with a poll() timeout of -1, meaning "block forever", 1) will happen infrequently, and 3) will never happen. With an in-kernel timeout rate significantly lower than the rate of packet acceptance, the timeout will often occur when there are no packets in the current buffer block, in which case the kernel will hand an empty buffer block to userland and not tell userland about it.

If that happens often enough in sequence to cause all buffer blocks to be handed to userland before any wakeups occur, the kernel now has no buffer blocks into which to put packets, and the next time a packet arrives, it will be dropped, and a wakeup will finally occur. libpcap will drain the ring, handing all buffer blocks to the kernel, but it won't have any packets to process!

So this is ultimately a problem with the TPACKET_V3 code in the kernel. I personally think that it should not deliver empty buffer blocks to userland, and that it also should not deliver a wakeup when a packet is accepted, and should deliver a wakeup whenever a buffer block is handed to userland. I'll report this to somebody and let them decide which of those changes should be done.

If you want to deliver empty buffer blocks to userland, that's fine, but make sure you wake up userland so that it can process those packets rather than leaving them there taking up space in the ring buffer.

And if you insist on delivering a wakeup when a packet is accepted - a wakeup that libpcap, at least, won't do anything with, as there's nothing useful for it to do with that wakeup - also make sure you deliver a wakeup when a buffer block is handed to userland, which is what libpcap cares about.

> I cannot speak on behalf of user-space wrappers developed around
> tpacket_v3 but the intention(from the kernel POV) of the block_timer
> *is* to unblock the capture/user process/thread so that it does NOT
> stay blocked for an indefinite period of time. The header explicitly
> specifies that contract.

That's not part of the contract for libpcap, as it's a question of what the underlying capture mechanism does, and we don't necessarily have any control over that; if a particular capture mechanism used by libpcap has that as part of its contract, that's OK, but libpcap-based applications shouldn't depend on it.

^ permalink raw reply

* Re: [PATCH net-next iproute2 1/3] netlink: import netlink message parsing from kernel
From: Stephen Hemminger @ 2017-05-02 18:03 UTC (permalink / raw)
  To: David Miller; +Cc: dsa, netdev, jakub.kicinski
In-Reply-To: <20170502.130032.1854491688518080191.davem@davemloft.net>

On Tue, 02 May 2017 13:00:32 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Tue, 2 May 2017 10:51:23 -0600
> 
> > On 5/2/17 9:25 AM, Stephen Hemminger wrote:  
> >> Please either use existing netlink attribute code in libnetlink.h
> >> (rta_getattr_u32 etc) or use libmnl like devlink.  
> > 
> > All of the existing rta_ functions take a struct rta_attr; netlink
> > messages use struct nlattr. It's just wrong to use rta functions for
> > netlink messages.  
> 
> Agreed.
> 
> > There is no existing parse function for nlattr. There is no existing
> > validate function - for nlattr or rta_attr. So I do not see any overlap
> > with existing code.  
> 
> Also agreed.

Then use libmnl it is already used in several other places in iproute2.
Eventually, I would like to use it everywhere and get rid of old netlink parser.

^ permalink raw reply

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
From: Alexander Duyck @ 2017-05-02 18:10 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Casey Leedom, Bjorn Helgaas, leedom, Michael Werner,
	Ganesh Goudar, Arjun V, David Miller, Asit K Mallick,
	Patrick J Cramer, Suravee Suthikulpanit, Bob Shaw, h,
	Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight, Jeff 
In-Reply-To: <20170502165329.GB26406@linux.intel.com>

On Tue, May 2, 2017 at 9:53 AM, Raj, Ashok <ashok.raj@intel.com> wrote:
> On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
>> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
>> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
>> > Ordering Attribute should not be used on Transaction Layer Packets destined
>> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
>> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
>> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
>> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
>>
>> So this is a good first step though might I suggest one other change.
>>
>> We may want to add logic to the core PCIe code that clears the "Enable
>> Relaxed Ordering" bit in the device control register for all devices
>> hanging off of this root complex. Assuming the devices conform to the
>> PCIe spec doing that should disable relaxed ordering in a device
>> agnostic way that then enables us at a driver level to just enable the
>> feature always without having to perform any checks for your flag. We
>> could probably do that as a part of probing the PCIe interfaces
>> hanging off of these devices.
>
> I suppose you don't want to turn off RO completely on the device. When
> traffic is targetted to mmio for peer to peer reasons RO has performance
> upside. The specific issue with these root ports indicate limitation using
> RO for traffic targetting coherent memory.

Actually my main concern here is virtualization. If I take the PCIe
function and direct assign it I have no way of seeing the root complex
flag as it is now virtualized away. In the meantime the guest now has
the ability to enable the function and sees nothing that says you
can't enable relaxed ordering which in turn ends up potentially
causing data corruption on the system. I want relaxed ordering
disabled before I even consider assigning it to the guest on the
systems where this would be an issue.

I prefer to err on the side of caution with this. Enabling Relaxed
Ordering is technically a performance enhancement, so we function but
not as well as we would like, while having it enabled when there are
issues can lead to data corruption. I would weigh the risk of data
corruption the thing to be avoided and of much higher priority than
enabling improved performance. As such I say we should default the
relaxed ordering attribute to off in general and look at
"white-listing" it in for various architectures and/or chipsets that
support/need it rather than having it enabled by default and trying to
switch it off after the fact when we find some new issue.

So for example, in the case of x86 it seems like there are multiple
root complexes that have issues, and the gains for enabling it with
standard DMA to host memory are small. As such we may want to default
it to off via the architecture specific PCIe code and then look at
having "white-list" cases where we enable it for things like
peer-to-peer accesses. In the case of SPARC we could look at
defaulting it to on, and only "black-list" any cases where there might
be issues since SPARC relies on this in a significant way for
performance. In the case of ARM and other architectures it is a bit of
a toss-up. I would say we could just default it to on for now and
"black-list" anything that doesn't work, or we could go the other way
like I suggested for x86. It all depends on what the ARM community
would want to agree on for this. I would say unless it makes a
significant difference like it does in the case of SPARC we are
probably better off just defaulting it to off.

- Alex

^ permalink raw reply

* Re: TPACKET_V3 timeout bug?
From: Guy Harris @ 2017-05-02 18:19 UTC (permalink / raw)
  To: chetan loke; +Cc: Andrew Lunn, Sowmini Varadhan, netdev, tcpdump-workers
In-Reply-To: <C289F9B5-A8DE-45B1-9485-AE322850B462@alum.mit.edu>

On May 2, 2017, at 10:54 AM, Guy Harris <guy@alum.mit.edu> wrote:

> Yes, there's a case where user space wasn't being woken up.

See also this thread from almost 3 years ago, beginning with

	http://marc.info/?l=linux-netdev&m=140633612828824&w=2

and this patch thread from almost 2 1/2 years ago:

	https://lkml.org/lkml/2014/12/18/466

^ permalink raw reply

* [PATCH] net: ipv6: Fix warning of freeing alive inet6 address
From: Mike Manning @ 2017-05-02 18:30 UTC (permalink / raw)
  To: netdev

While this is not reproducible manually, Andrey's syzkaller program hit
the warning "IPv6: Freeing alive inet6 address" with this part trace:

inet6_ifa_finish_destroy+0x12e/0x190 c:894
in6_ifa_put ./include/net/addrconf.h:330
addrconf_dad_work+0x4e9/0x1040 net/ipv6/addrconf.c:3963

The fix is to call in6_ifa_put() for the inet6_ifaddr before rather
than after calling addrconf_ifdown(), as the latter may remove it from
the address hash table.

Fixes: 85b51b12115c ("net: ipv6: Remove addresses for failures with strict DAD")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Mike Manning <mmanning@brocade.com>
---
 net/ipv6/addrconf.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 80ce478..361993a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3902,8 +3902,11 @@ static void addrconf_dad_work(struct work_struct *w)
 	} else if (action == DAD_ABORT) {
 		in6_ifa_hold(ifp);
 		addrconf_dad_stop(ifp, 1);
-		if (disable_ipv6)
+		if (disable_ipv6) {
+			in6_ifa_put(ifp);
 			addrconf_ifdown(idev->dev, 0);
+			goto unlock;
+		}
 		goto out;
 	}
 
@@ -3950,6 +3953,7 @@ static void addrconf_dad_work(struct work_struct *w)
 		      ifp->dad_nonce);
 out:
 	in6_ifa_put(ifp);
+unlock:
 	rtnl_unlock();
 }
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH net] bpf, arm64: fix jit branch offset related to ldimm64
From: Daniel Borkmann @ 2017-05-02 18:34 UTC (permalink / raw)
  To: davem
  Cc: ast, netdev, xi.wang, catalin.marinas, zlim.lnx, linux-arm-kernel,
	Daniel Borkmann

When the instruction right before the branch destination is
a 64 bit load immediate, we currently calculate the wrong
jump offset in the ctx->offset[] array as we only account
one instruction slot for the 64 bit load immediate although
it uses two BPF instructions. Fix it up by setting the offset
into the right slot after we incremented the index.

Before (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // #3
  00000028:  d2800041  mov x1, #0x2 // #2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  54ffff82  b.cs 0x00000020
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
  [...]

After (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // #3
  00000028:  d2800041  mov x1, #0x2 // #2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  540000a2  b.cs 0x00000044
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
  [...]

Also, add a couple of test cases to make sure JITs pass
this test. Tested on Cavium ThunderX ARMv8. The added
test cases all pass after the fix.

Fixes: 8eee539ddea0 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Xi Wang <xi.wang@gmail.com>
---
 ( Based against net-next where BPF related patches are usually
   routed, if something else is preferred please let me know. )

 arch/arm64/net/bpf_jit_comp.c |  8 ++++----
 lib/test_bpf.c                | 45 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index a785554..ce8ab04 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -779,14 +779,14 @@ static int build_body(struct jit_ctx *ctx)
 		int ret;
 
 		ret = build_insn(insn, ctx);
-
-		if (ctx->image == NULL)
-			ctx->offset[i] = ctx->idx;
-
 		if (ret > 0) {
 			i++;
+			if (ctx->image == NULL)
+				ctx->offset[i] = ctx->idx;
 			continue;
 		}
+		if (ctx->image == NULL)
+			ctx->offset[i] = ctx->idx;
 		if (ret)
 			return ret;
 	}
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 0362da0..2e38502 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -4656,6 +4656,51 @@ static int bpf_fill_ld_abs_vlan_push_pop(struct bpf_test *self)
 		{ },
 		{ { 0, 1 } },
 	},
+	{
+		/* Mainly testing JIT + imm64 here. */
+		"JMP_JGE_X: ldimm64 test 1",
+		.u.insns_int = {
+			BPF_ALU32_IMM(BPF_MOV, R0, 0),
+			BPF_LD_IMM64(R1, 3),
+			BPF_LD_IMM64(R2, 2),
+			BPF_JMP_REG(BPF_JGE, R1, R2, 2),
+			BPF_LD_IMM64(R0, 0xffffffffffffffffUL),
+			BPF_LD_IMM64(R0, 0xeeeeeeeeeeeeeeeeUL),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 0xeeeeeeeeU } },
+	},
+	{
+		"JMP_JGE_X: ldimm64 test 2",
+		.u.insns_int = {
+			BPF_ALU32_IMM(BPF_MOV, R0, 0),
+			BPF_LD_IMM64(R1, 3),
+			BPF_LD_IMM64(R2, 2),
+			BPF_JMP_REG(BPF_JGE, R1, R2, 0),
+			BPF_LD_IMM64(R0, 0xffffffffffffffffUL),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 0xffffffffU } },
+	},
+	{
+		"JMP_JGE_X: ldimm64 test 3",
+		.u.insns_int = {
+			BPF_ALU32_IMM(BPF_MOV, R0, 1),
+			BPF_LD_IMM64(R1, 3),
+			BPF_LD_IMM64(R2, 2),
+			BPF_JMP_REG(BPF_JGE, R1, R2, 4),
+			BPF_LD_IMM64(R0, 0xffffffffffffffffUL),
+			BPF_LD_IMM64(R0, 0xeeeeeeeeeeeeeeeeUL),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 1 } },
+	},
 	/* BPF_JMP | BPF_JNE | BPF_X */
 	{
 		"JMP_JNE_X: if (3 != 2) return 1",
-- 
1.9.3

^ permalink raw reply related

* Re: net/smc and the RDMA core
From: Bart Van Assche @ 2017-05-02 18:39 UTC (permalink / raw)
  To: hch-jcswGhMUV9g@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <d9214af6-1c6f-9f95-fc00-3e4a316b4f81-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

On Tue, 2017-05-02 at 14:25 +0200, Ursula Braun wrote:
> if you can point out specific issues, we will be happy to work with you
> to get them addressed!

Hello Ursula,

My list of issues that I would like to see addressed can be found below. Doug,
Christoph and others may have additional inputs. The issues that have not yet
been mentioned in other e-mails are:
- The SMC driver only supports one RDMA transport type (RoCE v1) but
  none of the other RDMA transport types (RoCE v2, IB and iWARP). New
  RDMA drivers should support all RDMA transport types transparently.
  The traditional approach to support multiple RDMA transport types is
  by using the RDMA/CM to establish connections.
- The implementation of the SMC driver only supports RoCEv1. This is
  a very unfortunate choice because:
  - RoCEv1 is not routable and hence is limited to a single Ethernet
    broadcast domain.
  - RoCEv1 packets escape a whole bunch of mechanisms that only work
    for IP packets. Firewalls pass all RoCEv1 packets and switches
    do not restrict RoCEv1 packets to a single VLAN. This means that
    if the network configuration is changed after an SMC connection
    has been set up such that IP communication between the endpoints
    of an SMC connection is blocked that the SMC RoCEv1 packets will
    not be blocked by the network equipment of which the configuration
    has just been changed.
- As already mentioned by Christoph, the SMC implementation uses RDMA
  calls that probably will be deprecated soon (ib_create_cq()) and
  should use the RDMA R/W API instead of building sge lists itself.

Bart.--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next iproute2 1/3] netlink: import netlink message parsing from kernel
From: David Ahern @ 2017-05-02 18:39 UTC (permalink / raw)
  To: Stephen Hemminger, David Miller; +Cc: netdev, jakub.kicinski
In-Reply-To: <20170502110314.66d9ee89@xeon-e3>

On 5/2/17 12:03 PM, Stephen Hemminger wrote:
> Then use libmnl it is already used in several other places in iproute2.
> Eventually, I would like to use it everywhere and get rid of old netlink parser.
> 

Why? libmnl is not going to simplify the iproute2 code.

Look at attribute validation. Importing the kernel code into iproute2,
the API is very familiar to anyone hacking on the kernel side:

+       if (nla_parse(tb, NLMSGERR_ATTR_MAX, attr, alen, extack_policy)
!= 0) {
+               fprintf(stderr,
+                       "Failed to parse extended error attributes\n");
+               return 0;
+       }
+

ie., you pass a policy to the parse routine and the checking is part of
nla_parse. The implementation of nla_parse and validate_nla are quite
easy to read.

Now take a look at what devlink has for validation - attr_cb. IMO very
unreadable and puts the burden on the app using the libmnl API.

^ permalink raw reply

* Re: [PATCH net-next iproute2 1/3] netlink: import netlink message parsing from kernel
From: David Miller @ 2017-05-02 18:51 UTC (permalink / raw)
  To: dsa; +Cc: stephen, netdev, jakub.kicinski
In-Reply-To: <3acddce9-5421-637e-8b2b-ae8dd5c7b29a@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Tue, 2 May 2017 12:39:51 -0600

> On 5/2/17 12:03 PM, Stephen Hemminger wrote:
>> Then use libmnl it is already used in several other places in iproute2.
>> Eventually, I would like to use it everywhere and get rid of old netlink parser.
>> 
> 
> Why? libmnl is not going to simplify the iproute2 code.

Agreed.

> Look at attribute validation. Importing the kernel code into iproute2,
> the API is very familiar to anyone hacking on the kernel side:
> 
> +       if (nla_parse(tb, NLMSGERR_ATTR_MAX, attr, alen, extack_policy)
> != 0) {
> +               fprintf(stderr,
> +                       "Failed to parse extended error attributes\n");
> +               return 0;
> +       }
> +
> 
> ie., you pass a policy to the parse routine and the checking is part of
> nla_parse. The implementation of nla_parse and validate_nla are quite
> easy to read.
> 
> Now take a look at what devlink has for validation - attr_cb. IMO very
> unreadable and puts the burden on the app using the libmnl API.

Also agreed.

Stephen I totally agree with David, asking him to use libmnl for this is
not reasonable nor is it even a good idea given the above.

^ permalink raw reply

* [PATCH] bonding: Use seq_putc() in bond_info_show_master()
From: SF Markus Elfring @ 2017-05-02 18:52 UTC (permalink / raw)
  To: netdev, Andy Gospodarek, Jay Vosburgh, Veaceslav Falico
  Cc: LKML, kernel-janitors

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Tue, 2 May 2017 20:48:36 +0200

A few single characters should be put into a sequence.
Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/bonding/bond_procfs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index d8d4ada034b7..bdc4db184d94 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -72,7 +72,7 @@ static void bond_info_show_master(struct seq_file *seq)
 		seq_printf(seq, " (fail_over_mac %s)", optval->string);
 	}
 
-	seq_printf(seq, "\n");
+	seq_putc(seq, '\n');
 
 	if (bond_mode_uses_xmit_hash(bond)) {
 		optval = bond_opt_get_val(BOND_OPT_XMIT_HASH,
@@ -117,11 +117,11 @@ static void bond_info_show_master(struct seq_file *seq)
 			if (!bond->params.arp_targets[i])
 				break;
 			if (printed)
-				seq_printf(seq, ",");
+				seq_putc(seq, ',');
 			seq_printf(seq, " %pI4", &bond->params.arp_targets[i]);
 			printed = 1;
 		}
-		seq_printf(seq, "\n");
+		seq_putc(seq, '\n');
 	}
 
 	if (BOND_MODE(bond) == BOND_MODE_8023AD) {
-- 
2.12.2

^ permalink raw reply related

* Re: [PATCH net-next v2] net: ipv6: make sure multicast packets are not forwarded beyond the different scopes
From: David Miller @ 2017-05-02 18:59 UTC (permalink / raw)
  To: donatas.abraitis; +Cc: netdev, stable
In-Reply-To: <20170427071202.84403-1-donatas.abraitis@gmail.com>

From: Donatas Abraitis <donatas.abraitis@gmail.com>
Date: Thu, 27 Apr 2017 10:12:02 +0300

> 	    RFC4291 2.7 Routers must not forward any multicast packets
> 	    beyond of the scope indicated by the scop field in the
> 	    destination multicast address.
> 
> Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>

I think it's a ">=" test which is needed here, not pure equality.
Scopes are subsets of other scopes and are therefore allowed within
eachother.

Did you actually see misbehavior due to this issue, or see a real
bonafide conformance test fail?

If you're just reading the RFC and sticking tests here and there based
upon what you read, without any testing or real life verification of
the issue, this is _strongly_ discouraged.

It would even be ok if you merely showed how another open source
networking stack makes this test.

^ permalink raw reply

* Re: [PATCH] net: ethernet: stmmac: properly set PS bit in MII configurations during reset
From: David Miller @ 2017-05-02 19:00 UTC (permalink / raw)
  To: thomas.petazzoni; +Cc: peppe.cavallaro, alexandre.torgue, netdev, stable
In-Reply-To: <1493286329-24448-1-git-send-email-thomas.petazzoni@free-electrons.com>


Someone needs to review this patch.

^ permalink raw reply

* Re: [PATCH net-next RFC 1/1] net netlink: Add new type NLA_FLAG_BITS
From: David Miller @ 2017-05-02 19:03 UTC (permalink / raw)
  To: jhs; +Cc: netdev, jiri, xiyou.wangcong
In-Reply-To: <1493562519-15563-1-git-send-email-jhs@emojatatu.com>

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Sun, 30 Apr 2017 10:28:39 -0400

> Generic bitflags attribute content sent to the kernel by user.
> With this type the user can either set or unset a flag in the
> kernel.

You asked for feedback, here it is :-)

I think this is overengineered.

Just define a u32 for the value, and mask which defines which bits are
legitimate and defined.  Any bit outside of the legitimate mask must
be zero.

Simple.

^ permalink raw reply

* Re: [PATCH net] bpf, arm64: fix jit branch offset related to ldimm64
From: David Miller @ 2017-05-02 19:06 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev, xi.wang, catalin.marinas, zlim.lnx, linux-arm-kernel
In-Reply-To: <5f7d105a1830ed789f1b416dc17b02fce0bd070e.1493749880.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue,  2 May 2017 20:34:54 +0200

> When the instruction right before the branch destination is
> a 64 bit load immediate, we currently calculate the wrong
> jump offset in the ctx->offset[] array as we only account
> one instruction slot for the 64 bit load immediate although
> it uses two BPF instructions. Fix it up by setting the offset
> into the right slot after we incremented the index.
 ...
> Also, add a couple of test cases to make sure JITs pass
> this test. Tested on Cavium ThunderX ARMv8. The added
> test cases all pass after the fix.
> 
> Fixes: 8eee539ddea0 ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
> Reported-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Applied and queued up for -stable, thanks!

I also applied your XADD patch as well.

Thanks again.

^ permalink raw reply

* Re: [PATCH net] tcp: fix wraparound issue in tcp_lp
From: David Miller @ 2017-05-02 19:07 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1493677788.31837.25.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 01 May 2017 15:29:48 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Be careful when comparing tcp_time_stamp to some u32 quantity,
> otherwise result can be surprising.
> 
> Fixes: 7c106d7e782b ("[TCP]: TCP Low Priority congestion control")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox