* Re: [patch net v2] net: fec: fix compile with CONFIG_M5272
From: Fabio Estevam @ 2016-12-05 10:02 UTC (permalink / raw)
To: Nikita Yushchenko
Cc: David S. Miller, Fugang Duan, Troy Kisky, Andrew Lunn,
Eric Nelson, Philippe Reynes, Johannes Berg,
netdev@vger.kernel.org, Chris Healy, Fabio Estevam, linux-kernel
In-Reply-To: <1480925763-20254-1-git-send-email-nikita.yoush@cogentembedded.com>
On Mon, Dec 5, 2016 at 6:16 AM, Nikita Yushchenko
<nikita.yoush@cogentembedded.com> wrote:
> Commit 4dfb80d18d05 ("net: fec: cache statistics while device is down")
> introduced unconditional statistics-related actions.
>
> However, when driver is compiled with CONFIG_M5272, staticsics-related
> definitions do not exist, which results into build errors.
>
> Fix that by adding explicit handling of !defined(CONFIG_M5272) case.
>
> Fixes: 4dfb80d18d05 ("net: fec: cache statistics while device is down")
> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Looks better now:
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
^ permalink raw reply
* commit : ppp: add rtnetlink device creation support - breaks netcf on my machine.
From: Brad Campbell @ 2016-12-05 9:06 UTC (permalink / raw)
To: netdev; +Cc: g.nault
G'day All,
I've been chasing an issue on and off for a couple of months now, and
I've managed to finally find a way to reproduce and bisect it.
Machine is an old Debian (7.11) box with a recent compile of netcf. Any
kernel later than 4.6.7 prevents ncftool from running if the ppp
interface is up.
I'm using netcf-0.2.8, and this commit breaks it :
96d934c70db6e1bc135600c57da1285eaf7efb26 is the first bad commit
commit 96d934c70db6e1bc135600c57da1285eaf7efb26
Author: Guillaume Nault <g.nault@alphalink.fr>
Date: Thu Apr 28 17:55:30 2016 +0200
ppp: add rtnetlink device creation support
Define PPP device handler for use with rtnetlink.
The only PPP specific attribute is IFLA_PPP_DEV_FD. It is mandatory and
contains the file descriptor of the associated /dev/ppp instance (the
file descriptor which would have been used for ioctl(PPPIOCNEWUNIT) in
the ioctl-based API). The PPP device is removed when this file
descriptor is released (same behaviour as with ioctl based PPP
devices).
PPP devices created with the rtnetlink API behave like the ones created
with ioctl(PPPIOCNEWUNIT). In particular existing ioctls work the same
way, no matter how the PPP device was created.
The rtnl callbacks are also assigned to ioctl based PPP devices. This
way, rtnl messages have the same effect on any PPP devices.
The immediate effect is that all PPP devices, even ioctl-based
ones, can now be removed with "ip link del".
A minor difference still exists between ioctl and rtnl based PPP
interfaces: in the device name, the number following the "ppp" prefix
corresponds to the PPP unit number for ioctl based devices, while it is
just an unrelated incrementing index for rtnl ones.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
:040000 040000 b26929a9fd7becd3652583d2f56579b8420fc522
dd9e74c76a02cff8387a98821c05509518ca2f48 M drivers
:040000 040000 82eacf2954c38d78ec4ccb7b8bdec2675e96e2e8
7d4b666b5a61b0a23738943b39a984b47bcde3ae M include
The issue is netcf dies if the ppp interface is up with the informative
message "Failed to initialize netcf". No debug output. Enabling
debugging in libnl shows it dies when the data for the ppp interface is
returned.
Take ppp down and ncftool comes up ok.
libnl appears to be version 3.2.24 (Debian libnl-3)
This has been manifesting on a production machine with a pppoe link to
the world, but I managed to reproduce it on a test box using a pppoe
server and a client on the test box which enabled me to bisect it.
I don't know anything about netlink, nor the networking subsystem so I'm
asking for a belt with a cluebat please. I'm using the latest *released*
netcf, and a quick check of the git tree does not seem to indicate
anything that might be related. Have I missed a kernel config option, or
made some catastrophic blunder in setting this up? It works fine on
4.6.4, but anything 4.7 or later just dies if the ppp interface is up.
Regards,
Brad
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Pavel Machek @ 2016-12-05 10:15 UTC (permalink / raw)
To: Giuseppe CAVALLARO; +Cc: alexandre.torgue, David Miller, netdev, linux-kernel
In-Reply-To: <3192a4b6-1e97-048f-a0dd-bfc0f3d96ed8@st.com>
[-- Attachment #1: Type: text/plain, Size: 799 bytes --]
Hi!
> >>In the ring, some descriptors can raise the irq (according to a
> >>threshold) and set the IC bit. In this path, the NAPI poll will be
> >>scheduled.
> >
> >Not NAPI poll but stmmac_tx_timer(), right?
>
> in the xmit according the the threshold the timer is started or the
> interrupt is set inside the descriptor.
> Then stmmac_tx_clean will be always called and, if you see the flow,
> no irqlock protection is needed!
Actually, I was wrong. irqlock protection is needed, since
stmmac_tx_clean() is called from timer, and that's interrupt context,
as you can confirm using BUG_ON(in_interrupt());
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* Re: [RFC PATCH 2/2] Documentation: devictree: Add macb mdio bindings
From: Nicolas Ferre @ 2016-12-05 10:23 UTC (permalink / raw)
To: Harini Katakam, Rob Herring
Cc: Harini Katakam, David Miller, Pawel Moll, Mark Rutland,
ijc+devicetree@hellion.org.uk, Kumar Gala, Boris Brezillon,
alexandre.belloni, netdev, linux-kernel@vger.kernel.org,
devicetree@vger.kernel.org, michals@xilinx.com
In-Reply-To: <CAFcVECJRbgAxQMbu7jABHjkiicD1Ysp7WmDMoUzs2P5qn26uqw@mail.gmail.com>
Le 05/12/2016 à 03:55, Harini Katakam a écrit :
> Hi Rob,
>
>
> Thanks for the review.
> On Sun, Dec 4, 2016 at 3:05 AM, Rob Herring <robh@kernel.org> wrote:
>> On Mon, Nov 28, 2016 at 03:19:27PM +0530, Harini Katakam wrote:
> <snip>
>>> +Required properties:
>>> +- compatible: Should be "cdns,macb-mdio"
>>
>> Only one version ever? This needs more specific compatible strings.
>>
>
> This is part of the Cadence MAC in a way.
> I can use revision number from the Cadence spec I was working with.
> But it is not necessarily specific that version.
Yes it is:
you must specify the first precise SoC needing/implementing it:
"cdns,zynqmp-gem-mdio" or "xlnx,zynq-gem-mdio" or anything looking like
this.
So, if an Atmel implementation is slightly different, we can still use
our own "atmel,sama5d2-gem-mdio" or "cdns,sama5d2-gem-mdio"
compatibility string.
On the other hand, if it's strictly the same, we can use the
"xlnx,zynq-gem-mdio" compatibility without any problem...
> I'll take care of the other comments in the next version.
>
> Regards,
> Harini
>
--
Nicolas Ferre
^ permalink raw reply
* Re: Build regressions/improvements in v4.9-rc8
From: Geert Uytterhoeven @ 2016-12-05 10:24 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org; +Cc: netdev@vger.kernel.org
In-Reply-To: <1480928996-7974-1-git-send-email-geert@linux-m68k.org>
On Mon, Dec 5, 2016 at 10:09 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> JFYI, when comparing v4.9-rc8[1] to v4.9-rc7[3], the summaries are:
> - build errors: +5/-10
+ /home/kisskb/slave/src/drivers/net/ethernet/freescale/fec_main.c:
error: 'fec_stats' undeclared (first use in this function): => 3296:7
+ /home/kisskb/slave/src/drivers/net/ethernet/freescale/fec_main.c:
error: implicit declaration of function
'fec_enet_update_ethtool_stats'
[-Werror=implicit-function-declaration]: => 2887:2
+ /home/kisskb/slave/src/drivers/net/ethernet/freescale/fec_main.c:
error: negative width in bit-field '<anonymous>': => 3296:7
+ /home/kisskb/slave/tmp/ccJlHrOq.s: Error: pcrel too far
BFD_RELOC_BFIN_10: => 912
m68k/m5272c3_defconfig (patch available)
> [1] http://kisskb.ellerman.id.au/kisskb/head/3e5de27e940d00d8d504dfb96625fb654f641509/ (all 267 configs)
> [3] http://kisskb.ellerman.id.au/kisskb/head/e5517c2a5a49ed5e99047008629f1cd60246ea0e/ (all 267 configs)
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: e1000e on Thinkpad x60: gigabit not available due to "SmartSpeed"
From: Pavel Machek @ 2016-12-05 10:44 UTC (permalink / raw)
To: Lennart Sorensen
Cc: Greg, kernel list, jeffrey.t.kirsher, intel-wired-lan, netdev
In-Reply-To: <20160902165735.GJ14311@csclub.uwaterloo.ca>
[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]
On Fri 2016-09-02 12:57:35, Lennart Sorensen wrote:
> On Thu, Sep 01, 2016 at 02:58:13PM -0700, Greg wrote:
> > On Thu, 2016-09-01 at 22:14 +0200, Pavel Machek wrote:
> > > Hi!
> > >
> > > I have trouble getting 1000mbit out of my ethernet card.
> > >
> > > I tried direct connection between two PCs with different cables, and
> > > no luck.
> > >
> > > Today I tried connection to 1000mbit switch, and no luck, either. (Two
> > > cables, one was cat6, both short).
> > >
> > > My computer sees 1000mbit being advertised by the other side, but does
> > > not advertise 1000mbit, "Link Speed was downgraded by SmartSpeed".
> >
> > Check your cables?
> >
> > https://vmxp.wordpress.com/2015/01/06/1gbe-intel-nic-throttled-to-100mbit-by-smartspeed/
>
> Of course if it isn't the cable, then it could even be a broken pin in
> the port. As far as I can tell, anything that causes one of the 3rd
> or 4th pairs of wires to not work will degrade to 100Mbit on just the
> first 2 pairs of wires and give that message. Some badly implemented
> switches can also cause it of course.
Ok, so it seems like broken pin in the port. I realized I had docking
station available for the machine, and when connecting using that
connector (not internal notebook one), I get gigabit speed.
Thanks and best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* RE: [PATCH net-next v6 6/6] samples/bpf: add userspace example for prohibiting sockets
From: David Laight @ 2016-12-05 10:51 UTC (permalink / raw)
To: 'David Ahern', Alexei Starovoitov
Cc: netdev@vger.kernel.org, daniel@zonque.org, ast@fb.com,
daniel@iogearbox.net, maheshb@google.com, tgraf@suug.ch
In-Reply-To: <4afe76e2-d1be-4810-0db7-859bcd5c89f3@cumulusnetworks.com>
From: David Ahern
> Sent: 01 December 2016 15:14
> On 11/30/16 10:59 PM, Alexei Starovoitov wrote:
> > On Wed, Nov 30, 2016 at 10:16:50AM -0800, David Ahern wrote:
> >> Add examples preventing a process in a cgroup from opening a socket
> >> based family, protocol and type.
> >>
> >> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> > ...
> >> +++ b/samples/bpf/sock_flags_kern.c
> >> @@ -0,0 +1,37 @@
> >> +#include <uapi/linux/bpf.h>
> >> +#include <linux/socket.h>
> >> +#include "bpf_helpers.h"
> >> +
> >> +SEC("cgroup/sock1")
> >> +int bpf_prog1(struct bpf_sock *sk)
> >> +{
> >> + char fmt[] = "socket: family %d type %d protocol %d\n";
> >> +
> >> + bpf_trace_printk(fmt, sizeof(fmt), sk->family, sk->type, sk->protocol);
> >> +
> >> + /* block PF_INET6, SOCK_RAW, IPPROTO_ICMPV6 sockets
> >> + * ie., make ping6 fail
> >> + */
> >> + if (sk->family == PF_INET6 && sk->type == 3 && sk->protocol == 58)
> >> + return 0;
> >
> > why not to use SOCK_RAW and IPPROTO_ICMPV6 instead of constants?
>
> header file hell.
You can at least improve the comment.
Or add:
#ifndef SOCK_RAW
#define SPCK_RAW 3
#endif
(etc) at the top of the file.
David
^ permalink raw reply
* Re: net/can: warning in raw_setsockopt/__alloc_pages_slowpath
From: Marc Kleine-Budde @ 2016-12-05 10:55 UTC (permalink / raw)
To: Oliver Hartkopp, Andrey Konovalov, David S. Miller, linux-can,
netdev, LKML
Cc: Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <6fe05efd-eb2c-a5e2-9d45-48f2c3098abb@hartkopp.net>
[-- Attachment #1.1: Type: text/plain, Size: 1929 bytes --]
On 12/02/2016 06:05 PM, Oliver Hartkopp wrote:
>
>
> On 12/02/2016 04:42 PM, Marc Kleine-Budde wrote:
>> On 12/02/2016 04:11 PM, Oliver Hartkopp wrote:
>>>
>>>
>>> On 12/02/2016 02:24 PM, Marc Kleine-Budde wrote:
>>>> On 12/02/2016 01:43 PM, Andrey Konovalov wrote:
>>>
>>>
>>>>> [<ffffffff8369e0de>] raw_setsockopt+0x1be/0x9f0 net/can/raw.c:506
>>>>
>>>> We should add a check for a sensible optlen....
>>>>
>>>>> static int raw_setsockopt(struct socket *sock, int level, int optname,
>>>>> char __user *optval, unsigned int optlen)
>>>>> {
>>>>> struct sock *sk = sock->sk;
>>>>> struct raw_sock *ro = raw_sk(sk);
>>>>> struct can_filter *filter = NULL; /* dyn. alloc'ed filters */
>>>>> struct can_filter sfilter; /* single filter */
>>>>> struct net_device *dev = NULL;
>>>>> can_err_mask_t err_mask = 0;
>>>>> int count = 0;
>>>>> int err = 0;
>>>>>
>>>>> if (level != SOL_CAN_RAW)
>>>>> return -EINVAL;
>>>>>
>>>>> switch (optname) {
>>>>>
>>>>> case CAN_RAW_FILTER:
>>>>> if (optlen % sizeof(struct can_filter) != 0)
>>>>> return -EINVAL;
>>>>
>>>> here...
>>>>
>>>> if (optlen > 64 * sizeof(struct can_filter))
>>>> return -EINVAL;
>>>>
>>>
>>> Agreed.
>>>
>>> But what is sensible here?
>>> 64 filters is way to small IMO.
>>>
>>> When thinking about picking a bunch of single CAN IDs I would tend to
>>> something like 512 filters.
>>
>> Ok - 64 was just an arbitrary chosen value for demonstration purposes :)
>>
>
> :-)
>
> Would you like to send a patch?
Yes, how many Filters? 512? Can you test, as I don't have the setup ready?
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* RE: [patch net v2] net: fec: fix compile with CONFIG_M5272
From: Andy Duan @ 2016-12-05 9:08 UTC (permalink / raw)
To: Nikita Yushchenko, David S. Miller, Troy Kisky, Andrew Lunn,
Eric Nelson, Philippe Reynes, Johannes Berg,
netdev@vger.kernel.org
Cc: Chris Healy, Fabio Estevam, linux-kernel@vger.kernel.org
In-Reply-To: <1480925763-20254-1-git-send-email-nikita.yoush@cogentembedded.com>
From: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Sent: Monday, December 05, 2016 4:16 PM
>To: David S. Miller <davem@davemloft.net>; Andy Duan
><fugang.duan@nxp.com>; Troy Kisky <troy.kisky@boundarydevices.com>;
>Andrew Lunn <andrew@lunn.ch>; Eric Nelson <eric@nelint.com>; Philippe
>Reynes <tremyfr@gmail.com>; Johannes Berg <johannes@sipsolutions.net>;
>netdev@vger.kernel.org
>Cc: Chris Healy <cphealy@gmail.com>; Fabio Estevam
><fabio.estevam@nxp.com>; linux-kernel@vger.kernel.org; Nikita
>Yushchenko <nikita.yoush@cogentembedded.com>
>Subject: [patch net v2] net: fec: fix compile with CONFIG_M5272
>
>Commit 4dfb80d18d05 ("net: fec: cache statistics while device is down")
>introduced unconditional statistics-related actions.
>
>However, when driver is compiled with CONFIG_M5272, staticsics-related
>definitions do not exist, which results into build errors.
>
>Fix that by adding explicit handling of !defined(CONFIG_M5272) case.
>
>Fixes: 4dfb80d18d05 ("net: fec: cache statistics while device is down")
>Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
>---
>Changes since v1:
>- instead of #ifdef'ing calls to fec_enet_update_ethtool_stats(), add
> definition of empty fec_enet_update_ethtool_stats() for CONFIG_M5272
> case,
>- add FEC_STATS_SIZE macro to avoid #ifdef in the middle of
> alloc_etherdev_mqs() args.
>
> drivers/net/ethernet/freescale/fec_main.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
>diff --git a/drivers/net/ethernet/freescale/fec_main.c
>b/drivers/net/ethernet/freescale/fec_main.c
>index 5f77caa59534..741cf4a57bfc 100644
>--- a/drivers/net/ethernet/freescale/fec_main.c
>+++ b/drivers/net/ethernet/freescale/fec_main.c
>@@ -2313,6 +2313,8 @@ static const struct fec_stat {
> { "IEEE_rx_octets_ok", IEEE_R_OCTETS_OK }, };
>
>+#define FEC_STATS_SIZE (ARRAY_SIZE(fec_stats) * sizeof(u64))
>+
> static void fec_enet_update_ethtool_stats(struct net_device *dev) {
> struct fec_enet_private *fep = netdev_priv(dev); @@ -2330,7
>+2332,7 @@ static void fec_enet_get_ethtool_stats(struct net_device *dev,
> if (netif_running(dev))
> fec_enet_update_ethtool_stats(dev);
>
>- memcpy(data, fep->ethtool_stats, ARRAY_SIZE(fec_stats) *
>sizeof(u64));
>+ memcpy(data, fep->ethtool_stats, FEC_STATS_SIZE);
> }
>
> static void fec_enet_get_strings(struct net_device *netdev, @@ -2355,6
>+2357,12 @@ static int fec_enet_get_sset_count(struct net_device *dev, int
>sset)
> return -EOPNOTSUPP;
> }
> }
>+
>+#else /* !defined(CONFIG_M5272) */
>+#define FEC_STATS_SIZE 0
>+static inline void fec_enet_update_ethtool_stats(struct net_device
>+*dev) { }
> #endif /* !defined(CONFIG_M5272) */
>
> static int fec_enet_nway_reset(struct net_device *dev) @@ -3293,8
>+3301,7 @@ fec_probe(struct platform_device *pdev)
>
> /* Init network device */
> ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
>- ARRAY_SIZE(fec_stats) * sizeof(u64),
>- num_tx_qs, num_rx_qs);
>+ FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
> if (!ndev)
> return -ENOMEM;
>
>--
>2.1.4
^ permalink raw reply
* Re: [PATCH iproute2 1/1] tc: updated man page to reflect handle-id use in filter GET command.
From: Phil Sutter @ 2016-12-05 11:07 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Roman Mashak, netdev, sathya.perla
In-Reply-To: <20161202140622.5ecad2d6@xeon-e3>
On Fri, Dec 02, 2016 at 02:06:22PM -0800, Stephen Hemminger wrote:
> On Thu, 1 Dec 2016 15:20:44 -0500
> Roman Mashak <mrv@mojatatu.com> wrote:
>
> > Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> > ---
> > man/man8/tc.8 | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/man/man8/tc.8 b/man/man8/tc.8
> > index 8a47a2b..d957ffa 100644
> > --- a/man/man8/tc.8
> > +++ b/man/man8/tc.8
> > @@ -32,7 +32,9 @@ class-id ] qdisc
> > DEV
> > .B [ parent
> > qdisc-id
> > -.B | root ] protocol
> > +.B | root ] [ handle
> > +handle-id ]
> > +.B protocol
> > protocol
> > .B prio
> > priority filtertype
> > @@ -577,7 +579,7 @@ it is created.
> >
> > .TP
> > get
> > -Displays a single filter given the interface, parent ID, priority, protocol and handle ID.
> > +Displays a single filter given the interface, qdisc-id, priority, protocol and handle-id.
> >
> > .TP
> > show
>
> The proper syntax for man page usage section is to put keywords in bold and any value
> that is variable in italic.
>
> I know this whole man page doesn't do this correctly. But that doesn't mean that new
> additions should continue with the mistake.
>
> Please revise and resubmit. Extra bonus points for fixing the other bits.
+1! Thanks for playing man page style Nazi^Wguardian. :)
Cheers, Phil
^ permalink raw reply
* Re: adm80211: add checks for dma mapping errors
From: Kalle Valo @ 2016-12-05 11:11 UTC (permalink / raw)
To: Alexey Khoroshilov
Cc: Alexey Khoroshilov, linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
ldv-project-tpLiQldItUH5n4uC9ZG1Ww
In-Reply-To: <1480715566-17027-1-git-send-email-khoroshilov-ufN2psIa012HXe+LvDLADg@public.gmane.org>
Alexey Khoroshilov <khoroshilov-ufN2psIa012HXe+LvDLADg@public.gmane.org> wrote:
> The driver does not check if mapping dma memory succeed.
> The patch adds the checks and failure handling.
>
> Found by Linux Driver Verification project (linuxtesting.org).
>
> Signed-off-by: Alexey Khoroshilov <khoroshilov-ufN2psIa012HXe+LvDLADg@public.gmane.org>
Patch applied to wireless-drivers-next.git, thanks.
d15697de60db adm80211: add checks for dma mapping errors
--
https://patchwork.kernel.org/patch/9459295/
Documentation about submitting wireless patches and checking status
from patchwork:
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: [flamebait] xdp, well meaning but pointless
From: Jesper Dangaard Brouer @ 2016-12-05 11:04 UTC (permalink / raw)
To: John Fastabend
Cc: Willem de Bruijn, Florian Westphal, Network Development, brouer
In-Reply-To: <58432186.20006@gmail.com>
On Sat, 3 Dec 2016 11:48:22 -0800
John Fastabend <john.fastabend@gmail.com> wrote:
> On 16-12-03 08:19 AM, Willem de Bruijn wrote:
> > On Fri, Dec 2, 2016 at 12:22 PM, Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:
> >>
> >> On Thu, 1 Dec 2016 10:11:08 +0100 Florian Westphal <fw@strlen.de> wrote:
> >>
> >>> In light of DPDKs existence it make a lot more sense to me to provide
> >>> a). a faster mmap based interface (possibly AF_PACKET based) that allows
> >>> to map nic directly into userspace, detaching tx/rx queue from kernel.
> >>>
> >>> John Fastabend sent something like this last year as a proof of
> >>> concept, iirc it was rejected because register space got exposed directly
> >>> to userspace. I think we should re-consider merging netmap
> >>> (or something conceptually close to its design).
> >>
> >> I'm actually working in this direction, of zero-copy RX mapping packets
> >> into userspace. This work is mostly related to page_pool, and I only
> >> plan to use XDP as a filter for selecting packets going to userspace,
> >> as this choice need to be taken very early.
> >>
> >> My design is here:
> >> https://prototype-kernel.readthedocs.io/en/latest/vm/page_pool/design/memory_model_nic.html
> >>
> >> This is mostly about changing the memory model in the drivers, to allow
> >> for safely mapping pages to userspace. (An efficient queue mechanism is
> >> not covered).
> >
> > Virtio virtqueues are used in various other locations in the stack.
> > With separate memory pools and send + completion descriptor rings,
> > signal moderation, careful avoidance of cacheline bouncing, etc. these
> > seem like a good opportunity for a TPACKET_V4 format.
> >
>
> FWIW. After we rejected exposing the register space to user space due to
> valid security issues we fell back to using VFIO which works nicely for
> mapping virtual functions into userspace and VMs. The main drawback is
> user space has to manage the VF but that is mostly a solved problem at
> this point. Deployment concerns aside.
Using VFs (PCIe SR-IOV Virtual Functions) solves this in a completely
different orthogonal way. To me it is still like taking over the entire
NIC, although you use HW to split the traffic into VFs. Setup for VF
deployment still looks troubling like 1G hugepages and vfio
enable_unsafe_noiommu_mode=1. And generally getting SR-IOV working on
your HW is a task of it's own.
One thing people often seem to miss with SR-IOV VFs is that VM-to-VM
traffic will be limited by PCIe bandwidth and transaction overheads.
Like Stepen Hemminger demonstrated[1] at NetDev 1.2 and Luigi also have
a paper demonstrating this (AFAICR).
[1] http://netdevconf.org/1.2/session.html?stephen-hemminger
A key difference in my design is to, allow the NIC to be shared in a
safe manor. The NIC functions 100% as a normal Linux controlled NIC.
The catch is that once an application request zero-copy RX, then the
NIC might have to reconfigure it's RX-ring usage. As the driver MUST
change into what I call the "read-only packet page" mode, which
actually is the default in many drivers today.
> There was a TPACKET_V4 version we had a prototype of that passed
> buffers down to the hardware to use with the dma engine. This gives
> zero-copy but same as VFs requires the hardware to do all the steering
> of traffic and any expected policy in front of the application. Due to
> requiring user space to kick hardware and vice versa though it was
> somewhat slower so I didn't finish it up. The kick was implemented as a
> syscall iirc. I can maybe look at it a bit more next week and see if its
> worth reviving now in this context.
This is still at the design stage. The target here is that the
page_pool and driver adjustments will provide the basis for building RX
zero-copy solutions in a memory safe manor.
I do see tcpdump/RAW packet access like TPACKET_V4 being one of the
first users of this. Not the only user, as further down the road, I
also imagine RX zero-copy delivery into sockets (and perhaps combined
with a "raw_demux" step that doesn't alloc the SKB, which Tom hinted in
the other thread for UDP delivery).
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Lino Sanfilippo @ 2016-12-05 11:40 UTC (permalink / raw)
To: Pavel Machek
Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
linux-kernel
In-Reply-To: <20161205101516.GA24936@amd>
Hi,
>
> Actually, I was wrong. irqlock protection is needed, since
> stmmac_tx_clean() is called from timer, and that's interrupt context,
> as you can confirm using BUG_ON(in_interrupt());
>
in_interrupt() can mean both softirq and hardirq context. In this case it
means softirq. So I guess you were right before, and no irq locking is needed.
Regards,
Lino
^ permalink raw reply
* Re: stmmac: turn coalescing / NAPI off in stmmac
From: Pavel Machek @ 2016-12-05 11:45 UTC (permalink / raw)
To: Giuseppe CAVALLARO; +Cc: David Miller, alexandre.torgue, netdev, linux-kernel
In-Reply-To: <9b338d9b-a69a-7393-0b48-bc6b92b46cb7@st.com>
[-- Attachment #1: Type: text/plain, Size: 1816 bytes --]
Hi!
> >>>So patch currently looks like this (hand edited, can't be
> >>>applied, got it working few hours ago). Does it look acceptable?
> >>>
> >>>I'd prefer this to go after the patch that pulls common code to single
> >>>place, so that single place needs to be patched. Plus I guess I should
> >>>add ifdefs, so that more advanced NAPI / tx coalescing code can be
> >>>reactivated when it is fixed. Trivial fixes can go on top. Does that
> >>>sound like a plan?
> >>
> >>Hmm, what I find strange is that, just this code is running since a
> >>long time on several platforms and Chip versions. No raise condition
> >>have been found or lock protection problems (also proving look
> >>mechanisms).
> >
> >Well, it works better for me when I disable CONFIG_SMP. It is normal
> >that locking problems are hard to reproduce :-(.
>
> can you share me the test, maybe I can try to reproduce on ARM box.
> Are you using 3.x or 4.x GMAC?
I'm using board similar to altera socfpga. 3.70a, as far as I can
tell.
gmac0: ethernet@ff700000 {
compatible = "altr,socfpga-stmmac", "snps,dwmac-3.70a", "snps,dwmac\
";
> >>Pavel, I ask you sorry if I missed some problems so, if you can
> >>(as D. Miller asked) to send us a cover letter + all patches
> >>I will try to reply soon. I can do also some tests if you ask
> >>me that! I could run on 3.x and 4.x but I cannot promise you
> >>benchmarks.
> >
> >Actually... I have questions here. David normally pulls from you (can
> >I have a address of your git tree?).
>
> No I send the patches to the mailing list.
Aha, ok.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* Re: [v5,1/5] soc: qcom: smem_state: Fix include for ERR_PTR()
From: Valo, Kalle @ 2016-12-05 11:45 UTC (permalink / raw)
To: David Miller
Cc: Bjorn Andersson, k.eugene.e@gmail.com,
wcn36xx@lists.infradead.org, linux-wireless@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-msm@vger.kernel.org, Andy Gross
In-Reply-To: <CAPBZ5Qf8Z7-0Bvsuj88=jY8ZULU9ntCAAOqV_a0OGL4oYSTdWQ@mail.gmail.com>
Hi Dave,
Andy Gross <andy.gross@linaro.org> writes:
> On 1 December 2016 at 04:17, Valo, Kalle <kvalo@qca.qualcomm.com> wrote:
>> Kalle Valo <kvalo@qca.qualcomm.com> writes:
>>
>>> It found the same problem. Interestingly I'm also building x86 with 32
>>> bit, maybe it's related?
>>>
>>> tree: https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git pending
>>> head: 1ea16a1c457939b4564643f7637d5cc639a8d3b7
>>> commit: 5eb09c672b01460804fd49b1c9cc7d1072a102f0 [96/99] wcn36xx: Transition driver to SMD client
>>> config: i386-allmodconfig (attached as .config)
>>> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
>>> reproduce:
>>> git checkout 5eb09c672b01460804fd49b1c9cc7d1072a102f0
>>> # save the attached .config to linux build tree
>>> make ARCH=i386
>>>
>>> All errors (new ones prefixed by >>):
>>>
>>>>> ERROR: "qcom_wcnss_open_channel" [drivers/net/wireless/ath/wcn36xx/wcn36xx.ko] undefined!
>>
>> Bjorn mentioned me on IRC that this is because of a missing commit in my
>> tree:
>>
>> daa6e41ce2b5 soc: qcom: wcnss_ctrl: Stub wcnss_ctrl API
>>
>> When I pull the tag below (which contains the above commit) wcn36xx
>> builds fine for me:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git tags/qcom-drivers-for-4.10
>>
>> Andy, is it ok if I pull your tag also to my ath.git tree to solve the
>> wcn36xx build problem? My trees go to Linus via net-next and I don't
>> know when exactly Dave would send a pull request to Linus, before or
>> after the arm trees, but as the tag seems to contain only few patches I
>> hope it doesn't matter.
>
> The qcom-drivers-for-4.10 tag was already merged into arm-soc. But
> having you pull it as well won't cause issues so long as you are using
> the tag (which you are). I don't see any issues with this approach.
Andy, thanks for the confirmation.
Dave, how do you suggest to handle depency issues like this? I have
pending important wcn36xx patches (converting the driver to use the
recently introduced proper SMD subsystem) which have a build dependency
on a commit which is in Andy's tag qcom-drivers-for-4.10. The commit in
question is currently in arm-soc tree going to 4.10, but not in your
net-next tree. I assume Linus will pull that during the next merge
window.
What I'm planning to do is to pull tag qcom-drivers-for-4.10 to my tree
and then send the patches to you. This will mean that from my pull
request you would get four new qcom-drivers commits which are not in
your tree, yet. Or do you prefer that I wait the qcom-drivers commits
trickle down from Linux until I send you wcn36xx patches? Or something
else?
$ git pull git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git tags/qcom-drivers-for-4.10
>From git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux
* tag qcom-drivers-for-4.10 -> FETCH_HEAD
Auto-merging MAINTAINERS
Merge made by the 'recursive' strategy.
MAINTAINERS | 1 +
drivers/firmware/qcom_scm.c | 4 +++-
include/dt-bindings/pinctrl/qcom,pmic-gpio.h | 4 ++++
include/dt-bindings/pinctrl/qcom,pmic-mpp.h | 6 ++++++
include/linux/soc/qcom/wcnss_ctrl.h | 13 +++++++++++++
5 files changed, 27 insertions(+), 1 deletion(-)
$ git log --oneline ORIG_HEAD..
6d0491261ecc Merge tag 'qcom-drivers-for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into test
bd4760ca0315 firmware: qcom: scm: Use devm_reset_controller_register()
4fb1a4207804 MAINTAINERS: add drivers/pinctrl/qcom to ARM/QUALCOMM SUPPORT
636959fc1232 pinctrl: pm8994: add pad voltage regulator defines
daa6e41ce2b5 soc: qcom: wcnss_ctrl: Stub wcnss_ctrl API
$
--
Kalle Valo
^ permalink raw reply
* Re: [v3 PATCH] netlink: Do not schedule work from sk_destruct
From: Andrey Konovalov @ 2016-12-05 11:51 UTC (permalink / raw)
To: Herbert Xu
Cc: Cong Wang, David S. Miller, Johannes Berg, Florian Westphal,
Eric Dumazet, Bob Copeland, Tom Herbert, David Decotigny, netdev,
LKML, Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <20161205072820.GB10204@gondor.apana.org.au>
On Mon, Dec 5, 2016 at 8:28 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Dec 05, 2016 at 03:26:00PM +0800, Herbert Xu wrote:
>> On Mon, Dec 05, 2016 at 03:19:46PM +0800, Herbert Xu wrote:
>> >
>> > Thanks for the patch. It'll obviously work but I wanted avoid that
>> > because it penalises the common path for the rare case.
>> >
>> > Andrey, please try this patch and let me know if it's any better.
>> >
>> > ---8<---
>> > Subject: netlink: Do not schedule work from sk_destruct
>>
>> Crap, I screwed it up again. Here is a v2 which moves the atomic
>> call into the RCU callback as otherwise the socket can be freed from
>> another path while we await the RCU callback.
>
> With the move it no longer makes sense to rename deferred_put_nlk_sk
> so here is v3 which restores the original name.
>
> ---8<---
> It is wrong to schedule a work from sk_destruct using the socket
> as the memory reserve because the socket will be freed immediately
> after the return from sk_destruct.
>
> Instead we should do the deferral prior to sk_free.
>
> This patch does just that.
>
> Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 602e5eb..246f29d 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -322,11 +322,13 @@ static void netlink_skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
> sk_mem_charge(sk, skb->truesize);
> }
>
> -static void __netlink_sock_destruct(struct sock *sk)
> +static void netlink_sock_destruct(struct sock *sk)
> {
> struct netlink_sock *nlk = nlk_sk(sk);
>
> if (nlk->cb_running) {
> + if (nlk->cb.done)
> + nlk->cb.done(&nlk->cb);
> module_put(nlk->cb.module);
> kfree_skb(nlk->cb.skb);
> }
> @@ -348,21 +350,7 @@ static void netlink_sock_destruct_work(struct work_struct *work)
> struct netlink_sock *nlk = container_of(work, struct netlink_sock,
> work);
>
> - nlk->cb.done(&nlk->cb);
> - __netlink_sock_destruct(&nlk->sk);
> -}
> -
> -static void netlink_sock_destruct(struct sock *sk)
> -{
> - struct netlink_sock *nlk = nlk_sk(sk);
> -
> - if (nlk->cb_running && nlk->cb.done) {
> - INIT_WORK(&nlk->work, netlink_sock_destruct_work);
> - schedule_work(&nlk->work);
> - return;
> - }
> -
> - __netlink_sock_destruct(sk);
> + sk_free(&nlk->sk);
> }
>
> /* This lock without WQ_FLAG_EXCLUSIVE is good on UP and it is _very_ bad on
> @@ -667,8 +655,18 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
> static void deferred_put_nlk_sk(struct rcu_head *head)
> {
> struct netlink_sock *nlk = container_of(head, struct netlink_sock, rcu);
> + struct sock *sk = &nlk->sk;
> +
> + if (!atomic_dec_and_test(&sk->sk_refcnt))
> + return;
> +
> + if (nlk->cb_running && nlk->cb.done) {
> + INIT_WORK(&nlk->work, netlink_sock_destruct_work);
> + schedule_work(&nlk->work);
> + return;
> + }
>
> - sock_put(&nlk->sk);
> + sk_free(sk);
> }
>
> static int netlink_release(struct socket *sock)
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Hi Herbert,
Tested the last version of your patch, the reports go away.
Thanks for the fix!
Tested-by: Andrey Konovalov <andreyknvl@google.com>
^ permalink raw reply
* Re: [PATCH 1/1] NET: usb: cdc_ncm: adding MBIM RESET_FUNCTION request and modifying ncm bind common code
From: Bjørn Mork @ 2016-12-05 10:10 UTC (permalink / raw)
To: Daniele Palmas; +Cc: Oliver Neukum, linux-usb, netdev
In-Reply-To: <CAGRyCJHsf6dFGaoui7fH1WspMhfHVdEdLAh-f4qV7YCZ2HxezA@mail.gmail.com>
Daniele Palmas <dnlplm@gmail.com> writes:
> I went back to this and further checking revealed that MBIM function
> reset is not needed and the only problem is related to commit
> 48906f62c96cc2cd35753e59310cb70eb08cc6a5 cdc_ncm: toggle altsetting to
> force reset before setup, that, however, is mandatory for some Huawei
> modems.
Interesting. That does sound like an odd firmware bug to me. I have a
bit of a hard time understanding how this can be. Using data interface
altsetting 0 to reset the function is documented in the NCM spec:
"
7.2 Using Alternate Settings to Reset an NCM Function
To place the network aspects of a function in a known state, the host
shall:
• select alternate setting 0 of the NCM Data Interface (this is the
setting with no endpoints). This can be done explicitly using
SetInterface, or implicitly using SetConfiguration. See [USB30] for
details.
• select the NCM operational parameters by sending commands to the NCM
Communication Interface, then
• select the second alternate interface setting of the NCM Data
Interface (this is the setting with a bulk IN endpoint and a bulk
OUT endpoint).
Whenever alternate setting 0 is selected by the host, the function
shall:
• flush function buffers
• reset the packet filter to its default state
• clear all multicast address filters
• clear all power filters set using
SetEthernetPowerManagementPatternFilter
• reset statistics counters to zero
• restore its Ethernet address to its default state
• reset its IN NTB size to the value given by field dwNtbInMaxSize
from the NTB Parameter Struc- ture
• reset the NTB format to NTB-16
• reset the current Maximum Datagram Size to a function-specific
default. If SetMaxDatagramSize is not supported, then the maximum
datagram size shall be the same as the value in wMaxSegmentSize of
the Ethernet Networking Functional Descriptor. If SetMaxDatagramSize
is supported by the function, then the host must either query the
function to determine the current effective maximum datagram size,
or must explicitly set the maximum datagram size. If the host wishes
to set the Maximum Datagram Size, it may do so prior to selecting
the second alternate interface set- ting of the data
interface. Doing so will ensure that the change takes effect prior
to send or receiving data.
• reset CRC mode so that the function will not append CRCs to
datagrams sent on the IN pipe
• reset NTB sequence numbers to zero
When the host selects the second alternate interface setting of the
NCM Data Interface, the function shall perform the following actions
in the following order.
• If connected to the network, the function shall send a
ConnectionSpeedChange notification to the host indicating the
current connection speed.
• Whether connected or not, the function shall then send a
NetworkConnection notification to the host, with wValue indicating
the current state of network connectivity
"
The addition of the "RESET" request in MBIM did not change these
requirements AFAICS. Supporting NCM style "altsetting 0 reset" is
required by default inheritance. The description of dual NCM/MBIM
functions in the MBIM spec further emphasizes this:
"
Regardless of the previous sequence of SET_INTERFACE commands targeting
the Communication Interface, the host is able to put the function into
a defined state by the following sequence:
1. SET_INTERFACE(Data Interface, 0)
2. SET_INTERFACE(Communication Interface, desired alternate setting)
3. Sending the required class commands for the targeted alternate
setting
4. SET_INTERFACE(Data Interface, 1 if Communication Interface,0 and
Data Interface,2 Communication Interface 1)
"
This, along with the lack of *any* sort of discussion of the
relation/interfaction between "MBIM RESET" and "data interface
altsetting 0" makes the MBIM RESET control request either ambigious or
redundant. Or both...
FWIW, I've always considered RESET a symptom of the sloppy MBIM spec
development. It did not exactly work to their advantage that the first
(and only?) published errata simply was an excuse to add new features
(the "MBIM extended functional descriptor" and "MBIM loopback testmode")
without updating the revision number. Causing even more confusion,
since we now have two different MBIMv1.0 specs.
Well, whatever. No need to get all frustrated about that. We have to
deal with whatever the firmware developers serve us. And,
unfortunately, it is not surprising that unclear and ambigious specs
leads to incompatible firmware implementations.
I wonder, what happens if you unload the MBIM driver and let the USB
core reset the data interface to altsetting 0? Will the device then
fail when the driver is loaded again? If not, then where is the
difference? And can we possibly reorder the driver set_interface
requests to make them similar to the USB core reset
Until now, I was under the impression that this was the
only documented way to reset both NCM and MBIM functions (since the
RESET is MBIM specific),
> My proposal is to add something like
> CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE quirk to avoid the toggle.
>
> To have a conservative approach, I would add only Telit LE922 to this
> quirk and keep also the usleep_range delay, since I do not have a
> clear view on all the VIDs/PIDs affected by this quirk. Then other
> modems, once tested, can be added to the quirk if the delay is not
> enough.
>
> If this approach, though ugly, has chances to be accepted I can write
> a new patch.
Yes, that sounds like the best approach at the moment. I have
unfortunately not as much time to experiment with this as I seem to
need. And the EM7455 workaround (the delay) is a bit hard to verify,
since it is easily affected by small additional delays caused by the
debugging itself. This can cause bogus test results.
Bjørn
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Pavel Machek @ 2016-12-05 11:56 UTC (permalink / raw)
To: Giuseppe CAVALLARO; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <b6ac7a3c-6774-9568-1cc0-97074eaa928f@st.com>
[-- Attachment #1: Type: text/plain, Size: 1896 bytes --]
Hi!
> the idea behind the TX mitigation is to mix the interrupt and
> timer and this approach gave us real benefit in terms
> of performances and CPU usage (especially on SH4-200/SH4-300 platforms
> based).
> In the ring, some descriptors can raise the irq (according to a
> threshold) and set the IC bit. In this path, the NAPI poll will be
> scheduled.
> But there is a timer that can run (and we experimented that no high
> resolution is needed) to clear the tx resources.
I'm sorry, but it is just broken. It could have never worked. If it
appered it did, you did not test it right.
First, low-res timers have resolution down to one per second (see
David's email). It is not acceptable to delay transmits for 40msec,
and certainly not acceptable to delay them for 1000msec.
Second, the logic is wrong:
if (likely(priv->tx_coal_frames > priv->tx_count_frames))
mod_timer(&priv->txtimer,
STMMAC_COAL_TIMER(priv->tx_coal_timer));
else {
priv->tx_count_frames = 0;
priv->hw->desc->set_tx_ic(desc);
priv->xstats.tx_set_ic_bit++;
}
doing tx_clean() after set number of packets, or set time after the
first packet is transmitted would make sense. But that's not what the
code does. As long as packets are being transmitted, you move the
timer into the future.. so that finally you run out of the place, then
wait for timer (!) and only then you do the cleaning.
Third, times are wrong by order of magnitude. AFAICT cleaning should
be at around 5msec at 100mbit speeds, and at around .5msec at
gigabit. You have 40msec there. (Perhaps that is not too important if
the logic is fixed, as described above).
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Pavel Machek @ 2016-12-05 12:01 UTC (permalink / raw)
To: Giuseppe CAVALLARO; +Cc: alexandre.torgue, David Miller, netdev, linux-kernel
In-Reply-To: <c4748c8f-ae4b-9bbc-7d20-9c7b7ce433c8@st.com>
[-- Attachment #1: Type: text/plain, Size: 1870 bytes --]
Hi!
> >>We had experimented this tuning on STB IP where just datagrams
> >>had to send externally. To be honest, although we had seen
> >>better results w/o any timer, we kept this approach enabled
> >>because the timer was fast enough to cover our tests on SH4 boxes.
> >
> >Please reply to David, and explain how it is supposed to
> >work... because right now it does not. 40 msec delays are not
> >acceptable in default configuration.
>
> I mean, that on UP and SMP system this schema helped
> to improve the performance saving CPU on my side and this has been
> tested since a long time (~4 years).
> I tested something similar to yours where unidirectional traffic
> with limited throughput was needed and I can confirm you that
> tuning/removing coalesce parameters this helped. The tuning I decided
> to keep in the driver was suitable in several user cases and if now
> you have problems or you want to review it I can just confirm that
> there are no problems on my side. If you want to simply the logic
> around the tx process and remove timer on official driver I can accept
> that. I will just ask you uto double check if the throughput and
> CPU usage when request max throughput (better if on GiGa setup) has
> no regressions.
> Otherwise we could start thinking about adaptive schema if feasible.
Ok, so you see the issue. Good.
See the other email to description how it could be fixed... the logic
is broken. How it was not discovered for 4 years is mystery to me.
> >Agreed that no irqlock protection is needed if we rely on napi and timers.
>
> ok
Actually I was wrong there. Another reason to disable tx coalescing
until it is fixed.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Pavel Machek @ 2016-12-05 12:14 UTC (permalink / raw)
To: Giuseppe CAVALLARO; +Cc: Alexandre Torgue, David Miller, netdev, linux-kernel
In-Reply-To: <5627300c-a9b5-8709-bb08-5f0bceddfb05@st.com>
[-- Attachment #1: Type: text/plain, Size: 1588 bytes --]
Hi!
> >Sorry but I'm a little bit confused. I'm dropped in some mails without
> >historic. I see cleanup, coalescence issue and TSO question.
> >What is your main issue? Are you working on gmac4 or 3.x ?
> >Can you refresh a little bit the story please ?
>
> let me try to do a sum, please Pavel feel free to correct me.
>
> There are some open points about the tx mitigation schema
> that we are trying to detail and eventually tune or change
> (but keeping the same performance on other user-case).
>
> In particular, the test case that is raising problem is
> an unicast tx bench.
> I suggested Pavel to tune coalesce (IC bit settings) via
> ethtool and monitor stats but he is getting problems (maybe
> due to lock).
>
> IIUC problems are mainly on new kernel and not on 4.4 where
> the gmac4 is missing. Please Pavel, could you confirm?
Actually no, it is the other way around. I can get 4.9 to work with
some tuning. 4.4 likes to crash when tx coalesce is enabled with
shorter than 40 msec timeout. (It crashes with default settings, too,
but that takes too long to reproduce.)
> Also there are some other discussion about the lock
> protection on NAPI still under discussion. I have not
> clear if in this case Pavel is getting strange behavior.
Yep, locking is broken in more than one place. I believe I understand
what some problems are. Let me prepare the patches.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* [PATCH] stmmac: disable tx coalescing
From: Pavel Machek @ 2016-12-05 12:27 UTC (permalink / raw)
To: David Miller
Cc: peppe.cavallaro, netdev, linux-kernel, alexandre.torgue,
LinoSanfilippo
In-Reply-To: <20161124.110416.198867271899443489.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 1926 bytes --]
Tx coalescing in stmmac is broken in more than one way, so disable it
for now.
First, low-res timers have resolution down to one per second. It is
not acceptable to delay transmits for 40msec, and certainly not
acceptable to delay them for 1000msec.
Second, the logic is wrong:
if (likely(priv->tx_coal_frames > priv->tx_count_frames))
mod_timer(&priv->txtimer,
STMMAC_COAL_TIMER(priv->tx_coal_timer));
...
doing tx_clean() after set number of packets, or set time after the
first packet is transmitted would make sense. But that's not what the
code does. As long as packets are being transmitted, you move the
timer into the future.. so that finally you run out of the place, then
wait for timer (!) and only then you do the cleaning.
Third, tx_cleanup is not safe to call from interrupt (tx_lock is not
irqsave), but that's exactly what coalescing code does.
Signed-off-by: Pavel Machek <pavel@denx.de>
---
This is stable candidate, afaict. It is broken in 4.4, too.
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 3ced2e1..32ce148 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -244,11 +244,11 @@ struct stmmac_extra_stats {
/* Max/Min RI Watchdog Timer count value */
#define MAX_DMA_RIWT 0xff
#define MIN_DMA_RIWT 0x20
-/* Tx coalesce parameters */
+/* Tx coalesce parameters. Set STMMAC_TX_FRAMES to 0 to disable coalescing. */
#define STMMAC_COAL_TX_TIMER 40000
#define STMMAC_MAX_COAL_TX_TICK 100000
#define STMMAC_TX_MAX_FRAMES 256
-#define STMMAC_TX_FRAMES 64
+#define STMMAC_TX_FRAMES 0
/* Rx IPC status */
enum rx_frame_status {
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply related
* Re: [PATCH v2 net-next 7/8] net: reorganize struct sock for better data locality
From: Paolo Abeni @ 2016-12-05 12:36 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S . Miller, netdev, Yuchung Cheng, Eric Dumazet
In-Reply-To: <1480792497-16607-8-git-send-email-edumazet@google.com>
Hi Eric,
On Sat, 2016-12-03 at 11:14 -0800, Eric Dumazet wrote:
> Group fields used in TX path, and keep some cache lines mostly read
> to permit sharing among cpus.
>
> Gained two 4 bytes holes on 64bit arches.
>
> Added a place holder for tcp tsq_flags, next to sk_wmem_alloc
> to speed up tcp_wfree() in the following patch.
>
> I have not added ____cacheline_aligned_in_smp, this might be done later.
> I prefer doing this once inet and tcp/udp sockets reorg is also done.
>
> Tested with both TCP and UDP.
>
> UDP receiver performance under flood increased by ~20 % :
> Accessing sk_filter/sk_wq/sk_napi_id no longer stalls because sk_drops
> was moved away from a critical cache line, now mostly read and shared.
I cherry-picked this patch only for some UDP benchmark. Under flood with
many concurrent flows, I see this 20% improvement and a relevant
decrease in system load.
Nice work, thanks Eric!
Tested-by: Paolo Abeni <pabeni@redhat.com>
^ permalink raw reply
* Good morning
From: Andrew @ 2016-12-05 12:20 UTC (permalink / raw)
Hello ,
I need your help to me discuss a project .
^ permalink raw reply
* RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat
From: David Laight @ 2016-12-05 12:49 UTC (permalink / raw)
To: 'Alexey Dobriyan', davem@davemloft.net
Cc: netdev@vger.kernel.org, xemul@openvz.org
In-Reply-To: <20161202012132.GD31170@avx2>
From: Alexey Dobriyan
> Sent: 02 December 2016 01:22
> net_generic() function is both a) inline and b) used ~600 times.
>
> It has the following code inside
>
> ...
> ptr = ng->ptr[id - 1];
> ...
>
> "id" is never compile time constant so compiler is forced to subtract 1.
> And those decrements or LEA [r32 - 1] instructions add up.
>
> We also start id'ing from 1 to catch bugs where pernet sybsystem id
> is not initialized and 0. This is quite pointless idea (nothing will
> work or immediate interference with first registered subsystem) in
> general but it hints what needs to be done for code size reduction.
>
> Namely, overlaying allocation of pointer array and fixed part of
> structure in the beginning and using usual base-0 addressing.
>
> Ids are just cookies, their exact values do not matter, so lets start
> with 3 on x86_64.
...
> struct net_generic {
> - struct {
> - unsigned int len;
> - struct rcu_head rcu;
> - } s;
> -
> - void *ptr[0];
> + union {
> + struct {
> + unsigned int len;
> + struct rcu_head rcu;
> + } s;
> +
> + void *ptr[0];
> + };
> };
That union is an accident waiting to happen.
What might work is to offset the Ids by
(offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
The subtract from the offset will then counter the structure offset
- which is what you are trying to achieve.
David.
^ permalink raw reply
* Re: [PATCH net-next] net/sched: cls_flower: Set the filter Hardware device for all use-cases
From: Simon Horman @ 2016-12-05 13:00 UTC (permalink / raw)
To: Hadar Hen Zion; +Cc: David S. Miller, netdev, Jiri Pirko, Or Gerlitz
In-Reply-To: <1480857919-25194-1-git-send-email-hadarh@mellanox.com>
Hi Hadar,
On Sun, Dec 04, 2016 at 03:25:19PM +0200, Hadar Hen Zion wrote:
> Check if the returned device from tcf_exts_get_dev function supports tc
> offload and in case the rule can't be offloaded, set the filter hw_dev
> parameter to the original device given by the user.
>
> The filter hw_device parameter should always be set by fl_hw_replace_filter
> function, since this pointer is used by dump stats and destroy
> filter for each flower rule (offloaded or not).
>
> Fixes: 7091d8c7055d ('net/sched: cls_flower: Add offload support using egress Hardware device')
> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
> Reported-by: Simon Horman <horms@verge.net.au>
Thanks for fixing this, it looks good to me now.
I apologise for not reporting this from my netronome.com address,
which was my intention.
Tested-by: Simon Horman <simon.horman@netronome.com>
> ---
> net/sched/cls_flower.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index c5cea78..29a9e6d 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -236,8 +236,11 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
> int err;
>
> if (!tc_can_offload(dev, tp)) {
> - if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev))
> + if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) ||
> + (f->hw_dev && !tc_can_offload(f->hw_dev, tp))) {
> + f->hw_dev = dev;
> return tc_skip_sw(f->flags) ? -EINVAL : 0;
> + }
> dev = f->hw_dev;
> tc->egress_dev = true;
> } else {
> --
> 1.8.3.1
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox