Netdev List
* [PATCH iproute2 1/1] tc: updated man page to reflect handle-id use in filter GET command.
From: Roman Mashak @ 2016-12-01 20:20 UTC (permalink / raw)
  To: stephen; +Cc: netdev, sathya.perla, Roman Mashak

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 man/man8/tc.8 | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/man/man8/tc.8 b/man/man8/tc.8
index 8a47a2b..d957ffa 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -32,7 +32,9 @@ class-id ] qdisc
 DEV
 .B [ parent
 qdisc-id
-.B | root ] protocol
+.B | root ] [ handle
+handle-id ]
+.B protocol
 protocol
 .B prio
 priority filtertype
@@ -577,7 +579,7 @@ it is created.
 
 .TP
 get
-Displays a single filter given the interface, parent ID, priority, protocol and handle ID.
+Displays a single filter given the interface, qdisc-id, priority, protocol and handle-id.
 
 .TP
 show
-- 
1.9.1


* Re: [PATCH] stmmac: simplify flag assignment
From: David Miller @ 2016-12-01 20:23 UTC (permalink / raw)
  To: pavel; +Cc: peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <20161130114431.GB14296@amd>

From: Pavel Machek <pavel@ucw.cz>
Date: Wed, 30 Nov 2016 12:44:31 +0100

> 
> Simplify flag assignment.
>     
> Signed-off-by: Pavel Machek <pavel@denx.de>
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index ed20668..0b706a7 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2771,12 +2771,8 @@ static netdev_features_t stmmac_fix_features(struct net_device *dev,
>  		features &= ~NETIF_F_CSUM_MASK;
>  
>  	/* Disable tso if asked by ethtool */
> -	if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
> -		if (features & NETIF_F_TSO)
> -			priv->tso = true;
> -		else
> -			priv->tso = false;
> -	}
> +	if ((priv->plat->tso_en) && (priv->dma_cap.tsoen))
> +		priv->tso = !!(features & NETIF_F_TSO);
>  

Pavel, this really seems arbitrary.

Whilst I really appreciate you're looking into this driver a bit because
of some issues you are trying to resolve, I'd like to ask that you not
start bombarding me with nit-pick cleanups here and there and instead
concentrate on the real bug or issue.

Thanks in advance.
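
For readers unfamiliar with the idiom in the quoted hunk, the following is a minimal userspace sketch, not driver code. FEATURE_TSO is a hypothetical stand-in for NETIF_F_TSO (the real kernel value differs), and both function names are made up for illustration. It only shows that the branchy form and the `!!` form yield the same boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for NETIF_F_TSO; the real kernel value differs. */
#define FEATURE_TSO (UINT64_C(1) << 16)

/* Original form: explicit branch on the masked bit. */
bool tso_branch(uint64_t features)
{
	if (features & FEATURE_TSO)
		return true;
	else
		return false;
}

/* Simplified form: !! collapses any non-zero masked value to 1. */
bool tso_bang(uint64_t features)
{
	return !!(features & FEATURE_TSO);
}
```

Both functions agree for every input, so the change in the quoted patch is behaviour-preserving for the assignment itself; whether it is worth a separate patch is the point under discussion above.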


* Re: [RFC PATCH net-next] ipv6: implement consistent hashing for equal-cost multipath routing
From: David Miller @ 2016-12-01 20:26 UTC (permalink / raw)
  To: hannes; +Cc: david.lebrun, netdev
In-Reply-To: <1480511568.3649771.803688521.5B47BE8F@webmail.messagingengine.com>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 30 Nov 2016 14:12:48 +0100

> David, one question: do you remember if you measured with linked lists
> at that time or also with arrays? I actually would expect small arrays
> that entirely fit into cachelines to be actually faster than our current
> approach, which also walks a linked list, probably the best algorithm to
> thrash cache lines. I ask because I currently prefer this approach more
> than having large allocations in the O(1) case because of easier code
> and easier management.

I did not try this and I do agree with you that for extremely small table
sizes a list or array would perform better because of the cache behavior.
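
To make the "small array" idea concrete, here is a hedged sketch of one consistent-hashing variant, rendezvous (highest-random-weight) hashing, scanning a contiguous array of next-hop ids. This is not the algorithm from the RFC patch; it is swapped in only because it is short. The names nh_select and mix64, and the use of plain 32-bit ids, are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mixing function (splitmix64-style finalizer); any
 * well-distributed 64-bit hash would do for this sketch. */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= UINT64_C(0xbf58476d1ce4e5b9);
	x ^= x >> 27;
	x *= UINT64_C(0x94d049bb133111eb);
	x ^= x >> 31;
	return x;
}

/* Pick a next hop for a flow by scanning a small contiguous array.
 * Each (flow, next hop) pair gets a score; the highest score wins.
 * Returns the index of the winning entry, or -1 if the array is empty. */
int nh_select(const uint32_t *nh_ids, size_t n, uint32_t flow_hash)
{
	uint64_t best = 0;
	int best_idx = -1;

	for (size_t i = 0; i < n; i++) {
		uint64_t score = mix64(((uint64_t)flow_hash << 32) | nh_ids[i]);

		if (best_idx < 0 || score > best) {
			best = score;
			best_idx = (int)i;
		}
	}
	return best_idx;
}
```

Because each score depends only on the flow and on one next hop, removing an entry remaps only the flows that had selected it. The cost is a linear scan, which is the case where a cacheline-sized array is expected to do well.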


* [PATCH -next] net: ethernet: ti: davinci_cpdma: add missing EXPORTs
From: Paul Gortmaker @ 2016-12-01 20:25 UTC (permalink / raw)
  To: David S. Miller
  Cc: Paul Gortmaker, Ivan Khoronzhuk, Mugunthan V N, Grygorii Strashko,
	linux-omap, netdev

As of commit 8f32b90981dcdb355516fb95953133f8d4e6b11d
("net: ethernet: ti: davinci_cpdma: add set rate for a channel") the
ARM allmodconfig builds would fail modpost with:

ERROR: "cpdma_chan_set_weight" [drivers/net/ethernet/ti/ti_cpsw.ko] undefined!
ERROR: "cpdma_chan_get_rate" [drivers/net/ethernet/ti/ti_cpsw.ko] undefined!
ERROR: "cpdma_chan_get_min_rate" [drivers/net/ethernet/ti/ti_cpsw.ko] undefined!
ERROR: "cpdma_chan_set_rate" [drivers/net/ethernet/ti/ti_cpsw.ko] undefined!

Since these weren't declared as static, it is assumed they were
meant to be shared outside the file, and that modular build testing
was simply overlooked.

Fixes: 8f32b90981dc ("net: ethernet: ti: davinci_cpdma: add set rate for a channel")
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: linux-omap@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index c776e4575d2d..36518fc5c7cc 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -796,6 +796,7 @@ int cpdma_chan_set_weight(struct cpdma_chan *ch, int weight)
 	spin_unlock_irqrestore(&ctlr->lock, flags);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(cpdma_chan_set_weight);
 
 /* cpdma_chan_get_min_rate - get minimum allowed rate for channel
  * Should be called before cpdma_chan_set_rate.
@@ -810,6 +811,7 @@ u32 cpdma_chan_get_min_rate(struct cpdma_ctlr *ctlr)
 
 	return DIV_ROUND_UP(divident, divisor);
 }
+EXPORT_SYMBOL_GPL(cpdma_chan_get_min_rate);
 
 /* cpdma_chan_set_rate - limits bandwidth for transmit channel.
  * The bandwidth * limited channels have to be in order beginning from lowest.
@@ -853,6 +855,7 @@ int cpdma_chan_set_rate(struct cpdma_chan *ch, u32 rate)
 	spin_unlock_irqrestore(&ctlr->lock, flags);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(cpdma_chan_set_rate);
 
 u32 cpdma_chan_get_rate(struct cpdma_chan *ch)
 {
@@ -865,6 +868,7 @@ u32 cpdma_chan_get_rate(struct cpdma_chan *ch)
 
 	return rate;
 }
+EXPORT_SYMBOL_GPL(cpdma_chan_get_rate);
 
 struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
 				     cpdma_handler_fn handler, int rx_type)
-- 
2.11.0


* Re: [PATCH net] tcp: warn on bogus MSS and try to amend it
From: David Miller @ 2016-12-01 20:29 UTC (permalink / raw)
  To: marcelo.leitner
  Cc: netdev, jmaxwell37, alexandre.sidorenko, kuznet, jmorris,
	yoshfuji, kaber, tlfalcon, brking, eric.dumazet
In-Reply-To: <0d41deb00d57206f518e6bffae1b0be355bbc726.1480511277.git.marcelo.leitner@gmail.com>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed, 30 Nov 2016 11:14:32 -0200

> There have been some reports lately about TCP connection stalls caused
> by NIC drivers that aren't setting gso_size on aggregated packets on rx
> path. This causes TCP to assume that the MSS is actually the size of the
> aggregated packet, which is invalid.
> 
> Although the proper fix is to be done at each driver, it's often hard
> and cumbersome for one to debug, come to such root cause and report/fix
> it.
> 
> This patch amends this situation in two ways. First, it adds a warning
> on when this situation occurs, so it gives a hint to those trying to
> debug this. It also limits the maximum probed MSS to the advertised MSS,
> as it should never be any higher than that.
> 
> The result is that the connection may not have the best performance ever
> but it shouldn't stall, and the admin will have a hint on what to look
> for.
> 
> Tested with virtio by forcing gso_size to 0.
> 
> Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

I totally agree with this change, however I think the warning message can
be improved in two ways:

>  	len = skb_shinfo(skb)->gso_size ? : skb->len;
>  	if (len >= icsk->icsk_ack.rcv_mss) {
> -		icsk->icsk_ack.rcv_mss = len;
> +		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> +					       tcp_sk(sk)->advmss);
> +		if (icsk->icsk_ack.rcv_mss != len)
> +			pr_warn_once("Seems your NIC driver is doing bad RX acceleration. TCP performance may be compromised.\n");

We know it's a bad GRO implementation that causes this so let's be specific in the
message, perhaps something like:

	Driver has suspect GRO implementation, TCP performance may be compromised.

Also, we have skb->dev available here most likely, so prefixing the message with
skb->dev->name would make analyzing this situation even easier for someone hitting
this.

I'm not certain if an skb->dev==NULL check is necessary here or not, but it is
definitely something you need to consider.

Thanks!
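
The clamp plus the two suggestions above (a GRO-specific message, prefixed with the device name, with a NULL guard) can be sketched in userspace as follows. This is an illustration under assumptions, not the final patch: struct fake_skb and struct fake_dev are simplified stand-ins for the kernel structures, update_rcv_mss is a made-up name, the "unknown" fallback is one possible choice, and the branch for len below rcv_mss is reduced to "leave the estimate alone".

```c
#include <stdio.h>

/* Simplified stand-ins for the kernel structures touched by the hunk. */
struct fake_dev {
	const char *name;
};

struct fake_skb {
	unsigned int len;
	unsigned int gso_size;
	const struct fake_dev *dev;	/* may be NULL */
};

/* Returns the receive MSS estimate after applying the proposed clamp.
 * Warns once, naming the device when one is attached to the packet. */
unsigned int update_rcv_mss(unsigned int rcv_mss, unsigned int advmss,
			    const struct fake_skb *skb)
{
	static int warned;
	unsigned int len = skb->gso_size ? skb->gso_size : skb->len;

	if (len < rcv_mss)
		return rcv_mss;

	/* Never let the probed MSS exceed what was advertised. */
	rcv_mss = len < advmss ? len : advmss;
	if (rcv_mss != len && !warned) {
		warned = 1;
		fprintf(stderr, "%s: Driver has suspect GRO implementation, "
			"TCP performance may be compromised.\n",
			skb->dev ? skb->dev->name : "unknown");
	}
	return rcv_mss;
}
```

With gso_size left at 0 on an aggregated 65000-byte packet and an advertised MSS of 1460, the estimate is held at 1460 instead of jumping to 65000.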


* Re: [PATCH net-next 5/6] net: dsa: mv88e6xxx: add helper for switch ready
From: Vivien Didelot @ 2016-12-01 20:31 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20161130233810.GT21645@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

> As we have seen in the past, this sort of loop is broken if we end up
> sleeping for a long time. Please take the opportunity to replace it
> with one of our _wait() helpers, e.g. mv88e6xxx_g1_wait()

That won't work. The _wait() helpers are made to wait on self-clear (SC)
bits, i.e. looping until they are cleared to zero.

Here we want the opposite.

I will keep this existing wait loop for the moment and work soon on a
new patchset to rework the wait routines. We need a generic access to
test a given value against a given mask and wrappers for busy bits, etc.

>> +int mv88e6xxx_g1_init_ready(struct mv88e6xxx_chip *chip, bool *ready)
>> +{
>> +	u16 val;
>> +	int err;
>> +
>> +	/* Check the value of the InitReady bit 11 */
>> +	err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &val);
>> +	if (err)
>> +		return err;
>> +
>> +	*ready = !!(val & GLOBAL_STATUS_INIT_READY);
>
> I would actually do the wait here.

That is better indeed.

Thanks,

        Vivien
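
A generic "test a value against a mask" wait routine of the kind mentioned above might look like the sketch below. It is a userspace illustration only: reg_read_fn, wait_mask, wait_bit_set, wait_bit_clear, and the fake register are all hypothetical names, not mv88e6xxx API. The loop is bounded by a number of reads rather than by elapsed time, which is one way to avoid the long-sleep problem Andrew describes; a real driver would also sleep between reads.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register accessor: fills *val, returns 0 on success. */
typedef int (*reg_read_fn)(void *ctx, uint16_t *val);

/* Poll until (reg & mask) == want, for at most max_tries reads.
 * Returns 0 on match, -ETIMEDOUT when the tries run out, or the
 * accessor's error code. */
int wait_mask(reg_read_fn read, void *ctx, uint16_t mask, uint16_t want,
	      int max_tries)
{
	uint16_t val;
	int err;

	for (int i = 0; i < max_tries; i++) {
		err = read(ctx, &val);
		if (err)
			return err;
		if ((val & mask) == want)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Wrapper for "ready" style bits that become set (e.g. InitReady). */
int wait_bit_set(reg_read_fn read, void *ctx, uint16_t bit, int max_tries)
{
	return wait_mask(read, ctx, bit, bit, max_tries);
}

/* Wrapper for self-clearing (SC) busy bits. */
int wait_bit_clear(reg_read_fn read, void *ctx, uint16_t bit, int max_tries)
{
	return wait_mask(read, ctx, bit, 0, max_tries);
}

/* Test double: the ready bit becomes set after `delay` reads. */
struct fake_reg {
	int delay;
	uint16_t ready_bit;
};

int fake_read(void *ctx, uint16_t *val)
{
	struct fake_reg *r = ctx;

	*val = (r->delay-- > 0) ? 0 : r->ready_bit;
	return 0;
}
```

With this shape, both the existing self-clear waits and the opposite "wait until set" case reduce to one-line wrappers around the same loop.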


* Re: [PATCH v3 net-next 3/3] openvswitch: Fix skb->protocol for vlan frames.
From: Pravin Shelar @ 2016-12-01 20:31 UTC (permalink / raw)
  To: Jiri Benc; +Cc: Jarno Rajahalme, Linux Kernel Network Developers, Eric Garver
In-Reply-To: <20161130153041.7a9590ef@griffin>

On Wed, Nov 30, 2016 at 6:30 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Tue, 29 Nov 2016 15:30:53 -0800, Jarno Rajahalme wrote:
>> Do not always set skb->protocol to be the ethertype of the L3 header.
>> For a packet with non-accelerated VLAN tags skb->protocol needs to be
>> the ethertype of the outermost non-accelerated VLAN ethertype.
>
> Well, the current handling of skb->protocol matches what used to be the
> handling of the kernel net stack before Jiri Pirko cleaned up the vlan
> code.
>
> I'm not opposed to changing this but I'm afraid it needs much deeper
> review. Because with this in place, no core kernel functions that
> depend on skb->protocol may be called from within openvswitch.
>
Can you give specific example where it does not work?

>> @@ -361,6 +362,11 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
>>       if (res <= 0)
>>               return res;
>>
>> +     /* If the outer vlan tag was accelerated, skb->protocol should
>> +      * refelect the inner vlan type. */
>> +     if (!eth_type_vlan(skb->protocol))
>> +             skb->protocol = key->eth.cvlan.tpid;
>
> This should not depend on the current value in skb->protocol which
> could be arbitrary at this point (from the point of view of how this
> patch understands the skb->protocol values). It's easy to fix, though -
> just add a local bool variable tracking whether the skb->protocol has
> been set.
>
skb->protocol value is set by the caller, so it should not be
arbitrary. Is it missing in any case?


* pull-request: can-next 2016-12-01
From: Marc Kleine-Budde @ 2016-12-01 20:21 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, kernel@pengutronix.de, linux-can@vger.kernel.org


[-- Attachment #1.1: Type: text/plain, Size: 1907 bytes --]

Hello David,

this is a pull request of 4 patches for net-next/master.

There are two patches by Chris Paterson for the rcar_can and rcar_canfd
device tree binding documentation. And a patch by Geert Uytterhoeven
that corrects the order of interrupt specifiers.

The fourth patch by Colin Ian King fixes a spelling error in the
kvaser_usb driver.


regards,
Marc

---
The following changes since commit 8f679ed88f8860206edddff725e2749b4cdbb0e8:

  driver: ipvlan: Remove useless member mtu_adj of struct ipvl_dev (2016-11-30 15:01:32 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git tags/linux-can-next-for-4.10-20161201

for you to fetch changes up to 0d8f8efd32bace9f222fcc92d4a3132d877e5df6:

  net: can: usb: kvaser_usb: fix spelling mistake of "outstanding" (2016-12-01 14:27:02 +0100)

----------------------------------------------------------------
linux-can-next-for-4.10-20161201

----------------------------------------------------------------
Chris Paterson (2):
      can: rcar_can: Add r8a7796 support
      can: rcar_canfd: Add r8a7796 support

Colin Ian King (1):
      net: can: usb: kvaser_usb: fix spelling mistake of "outstanding"

Geert Uytterhoeven (1):
      can: rcar_canfd: Correct order of interrupt specifiers

 Documentation/devicetree/bindings/net/can/rcar_can.txt   | 12 +++++++-----
 Documentation/devicetree/bindings/net/can/rcar_canfd.txt | 14 ++++++++------
 drivers/net/can/usb/kvaser_usb.c                         |  4 ++--
 3 files changed, 17 insertions(+), 13 deletions(-)

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Initial thoughts on TXDP
From: Tom Herbert @ 2016-12-01 20:39 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: Linux Kernel Network Developers
In-Reply-To: <20161201201324.GJ24547@oracle.com>

On Thu, Dec 1, 2016 at 12:13 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (12/01/16 11:05), Tom Herbert wrote:
>>
>> Polling does not necessarily imply that networking monopolizes the CPU
>> except when the CPU is otherwise idle. Presumably the application
>> drives the polling when it is ready to receive work.
>
> I'm not grokking that- "if the cpu is idle, we want to busy-poll
> and make it 0% idle"?  Keeping CPU 0% idle has all sorts
> of issues, see slide 20 of
>  http://www.slideshare.net/shemminger/dpdk-performance
>
>> > and one other critical difference from the hot-potato-forwarding
>> > model (the sort of OVS model that DPDK etc might arguably be a fit for)
>> > does not apply: in order to figure out the ethernet and IP headers
>> > in the response correctly at all times (in the face of things like VRRP,
>> > gw changes, gw's mac addr changes etc) the application should really
>> > be listening on NETLINK sockets for modifications to the networking
>> > state - again points to needing a select() socket set where you can
>> > have both the I/O fds and the netlink socket,
>> >
>> I would think that that kind of management would not be implemented in a
>> fast path processing thread for an application.
>
> sure, but my point was that *XDP and other stack-bypass methods need
> to provide a select()able socket: when your use-case is not about just
> networking, you have to snoop on changes to the control plane, and update
> your data path. In the OVS case (pure networking) the OVS control plane
> updates are intrinsic to OVS. For the rest of the request/response world,
> we need a select()able socket set to do this elegantly (not really
> possible in DPDK, for example)
>
I'm not sure that TXDP can be reconciled to help OVS. The point of
TXDP is to drive applications closer to bare metal performance, as I
mentioned this is only going to be worth it if the fast path can be
kept simple and not complicated by a requirement for generalization.
It seems like the second we put OVS in we're doubling the data path
and accepting the performance consequences of a complex path anyway.

TXDP can't take over the whole system (any more than DPDK can) and needs to
work in concert with other mechanisms-- the key is how to steer the
work amongst the CPUs. For instance, if a latency critical thread is
running on some CPU we either need a dedicated queue for the connections of
the thread (e.g. ntuple filtering or aRFS support) or we need a fast
way to move unrelated packets received on a queue processed by
that CPU to other CPUs (less efficient, but no special HW support is
needed either).

Tom

>
>> The *SOs are always an interesting question. They make for great
>> benchmarks, but in real life the amount of benefit is somewhat
>> unclear. Under the wrong conditions, like all cwnds have collapsed or
>
> I think Rick's already bringing up this one.
>
> --Sowmini
>
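
The "select()able socket set" point in the quoted text can be illustrated with a small Linux-only sketch that waits on a data descriptor and an rtnetlink descriptor at the same time. It uses poll() rather than select(), purely for brevity. The names open_rtnl_listener, wait_events, EV_DATA and EV_CTRL are hypothetical, and the choice of multicast groups is an assumption about which control-plane changes an application would care about.

```c
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#define EV_DATA 1	/* data descriptor is readable */
#define EV_CTRL 2	/* control (netlink) descriptor is readable */

/* Open an rtnetlink socket subscribed to link, neighbour and IPv4 route
 * change notifications. Returns the fd, or -1 on error. */
int open_rtnl_listener(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = RTMGRP_LINK | RTMGRP_NEIGH | RTMGRP_IPV4_ROUTE;
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Wait on both descriptors at once. Returns a bitmask of EV_DATA and
 * EV_CTRL, 0 on timeout, or -1 on error. */
int wait_events(int data_fd, int ctrl_fd, int timeout_ms)
{
	struct pollfd fds[2] = {
		{ .fd = data_fd, .events = POLLIN },
		{ .fd = ctrl_fd, .events = POLLIN },
	};
	int ready = 0;

	if (poll(fds, 2, timeout_ms) < 0)
		return -1;
	if (fds[0].revents & POLLIN)
		ready |= EV_DATA;
	if (fds[1].revents & POLLIN)
		ready |= EV_CTRL;
	return ready;
}
```

An application would pass the fd from open_rtnl_listener() as ctrl_fd and refresh its cached L2/L3 header state whenever EV_CTRL is reported. Whether a stack-bypass datapath can expose a pollable data_fd at all is exactly the open question in this thread.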


* Re: iproute2 public git outdated?
From: Rami Rosen @ 2016-12-01 20:39 UTC (permalink / raw)
  To: Phil Sutter, Netdev, Stephen Hemminger
In-Reply-To: <20161201121806.GA21576@orbyte.nwl.cc>

Hi Phil,
I suggest that you try again now; it seems that the iproute2 git
repo was updated in the last 2-4 hours, and "git log" in master now
shows a patch from 30 November (actually it is your "Add notes about
dropped IPv4 route cache" patch).

Regards,
Rami Rosen


On 1 December 2016 at 14:18, Phil Sutter <phil@nwl.cc> wrote:
> Hi,
>
> I am using iproute2's public git repo at this URL:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
>
> To my surprise, neither master nor net-next branches have received new
> commits since end of October. Did the repo location change or was it
> just not updated for a while?
>
> Thanks, Phil


* Re: [PATCH net-next 0/3] sfc: defalconisation fixups
From: David Miller @ 2016-12-01 20:39 UTC (permalink / raw)
  To: ecree; +Cc: linux-net-drivers, bkenward, netdev
In-Reply-To: <c52a0276-e379-7841-8d10-d5a834b81c4e@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Thu, 1 Dec 2016 16:59:13 +0000

> A bug fix, the Kconfig change, and cleaning up a bit more unused code.
> 
> Edward Cree (3):
>   sfc: fix debug message format string in efx_farch_handle_rx_not_ok
>   sfc: don't select SFC_FALCON
>   sfc: remove RESET_TYPE_RX_RECOVERY

Series applied, thank you.


* Re: [patch net-next v3 11/12] mlxsw: spectrum_router: Request a dump of FIB tables during init
From: Hannes Frederic Sowa @ 2016-12-01 20:40 UTC (permalink / raw)
  To: David Miller, idosch
  Cc: jiri, netdev, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz,
	roopa, dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
	alexander.h.duyck, kaber
In-Reply-To: <20161201.150445.558407356269727869.davem@davemloft.net>

On 01.12.2016 21:04, David Miller wrote:
> 
> Hannes and Ido,
> 
> It looks like we are very close to having this in mergeable shape, can
> you guys work out this final issue and figure out if it really is
> a merge stopper or not?

Sure, if the fib notification register could be done under protection of
the sequence counter I don't see any more problems.

The sync handler is nice to have and can be done in a later patch series.


* Re: [PATCH net-next 3/6] net: dsa: mv88e6xxx: add a software reset op
From: Vivien Didelot @ 2016-12-01 20:41 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20161130232633.GS21645@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

>> diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
>> index ab52c37..9e51405 100644
>> --- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
>> +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
>> @@ -765,6 +765,9 @@ struct mv88e6xxx_ops {
>>  	int (*phy_write)(struct mv88e6xxx_chip *chip, int addr, int reg,
>>  			 u16 val);
>>  
>> +	/* Switch Software Reset */
>> +	int (*reset)(struct mv88e6xxx_chip *chip);
>> +
>
> Hi Vivien
>
> In my huge patch series of 6390, I've been using a g1_ prefix for
> functionality which is in global 1, g2_ for global 2, etc.  This has
> worked for everything so far with the exception of setting which
> reserved MAC addresses should be sent to the CPU. Most devices have it
> in g2, but 6390 has it in g1.
>
> Please could you add the prefix.

I don't understand. It looks like you are talking about the second part
of the comment I made on your RFC patchset, about the Rsvd2CPU feature:

https://www.mail-archive.com/netdev@vger.kernel.org/msg139837.html

Switch reset routines are implemented in this patch in global1.c as
mv88e6185_g1_reset and mv88e6352_g1_reset.

6185 and 6352 are implementation references for other switches.

Thanks,

        Vivien


* Re: [net PATCH 0/2] Don't use lco_csum to compute IPv4 checksum
From: David Miller @ 2016-12-01 20:41 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: alexander.h.duyck, netdev, intel-wired-lan, sfr
In-Reply-To: <1480540522.2377.18.camel@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 30 Nov 2016 13:15:22 -0800

> On Wed, 2016-11-30 at 09:47 -0500, David Miller wrote:
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>> Date: Mon, 28 Nov 2016 10:42:18 -0500
>> 
>> > When I implemented the GSO partial support in the Intel drivers I was using
>> > lco_csum to compute the checksum that we needed to plug into the IPv4
>> > checksum field in order to cancel out the data that was not a part of the
>> > IPv4 header.  However this didn't take into account that the transport
>> > offset might be pointing to the inner transport header.
>> > 
>> > Instead of using lco_csum I have just coded around it so that we can use
>> > the outer IP header plus the IP header length to determine where we need to
>> > start our checksum and then just call csum_partial ourselves.
>> > 
>> > This should fix the SIT issue reported on igb interfaces as well as similar
>> > issues that would pop up on other Intel NICs.
>> 
>> Jeff, are you going to send me a pull request with this stuff or would
>> you be OK with my applying these directly to 'net'?
> 
> Go ahead and apply those to your net tree, I do not want to hold this up.

Ok, done, thanks Jeff.
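
As background for the checksum discussion in the quoted cover letter, the sketch below shows the ones' complement arithmetic involved and how the header length is derived from the IHL field of the outer IPv4 header. It is a plain userspace illustration in the style of RFC 1071, not the driver fix and not the kernel's csum_partial; the names csum_fold_sum and ipv4_hdr_csum are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum over `len` bytes, added to `sum` and folded to
 * 16 bits. Bytes are taken as big-endian 16-bit words. */
uint16_t csum_fold_sum(const uint8_t *data, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (i < len)			/* odd trailing byte */
		sum += (uint32_t)data[i] << 8;
	while (sum >> 16)		/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* IPv4 header checksum: complement of the sum over the header, which
 * must have its checksum field zeroed. The header length in bytes comes
 * from the IHL nibble of the first byte. */
uint16_t ipv4_hdr_csum(const uint8_t *iph)
{
	size_t ihl = (size_t)(iph[0] & 0x0f) * 4;

	return (uint16_t)~csum_fold_sum(iph, ihl, 0);
}
```

The same IHL computation gives the offset just past the outer IP header, which is where the cover letter says the replacement code starts its sum.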


* Re: [flamebait] xdp, well meaning but pointless
From: Hannes Frederic Sowa @ 2016-12-01 20:44 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Florian Westphal, netdev
In-Reply-To: <20161201162814.GA31300@pox.localdomain>

Hello,

this is a good conversation and I simply want to bring my worries
across. I don't have good solutions for the problems XDP tries to solve
but I fear we could get caught up in maintenance problems in the long
term given the ideas floating around on how to evolve XDP currently.

On 01.12.2016 17:28, Thomas Graf wrote:
> On 12/01/16 at 04:52pm, Hannes Frederic Sowa wrote:
>> First of all, this is a rant targeted at XDP and not at eBPF as a whole.
>> XDP manipulates packets at free will and thus all security guarantees
>> are off as well as in any user space solution.
>>
>> Secondly user space provides policy, acl, more controlled memory
>> protection, restartability and better debugability. If I had multi
>> tenant workloads I would definitely put more complex "business/acl"
>> logic into user space, so I can make use of LSM and other features to
>> especially prevent a network facing service to attack the tenants. If
>> stuff gets put into the kernel you run user controlled code in the
>> kernel exposing a much bigger attack vector.
>>
>> What use case do you see in XDP specifically e.g. for container networking?
> 
> DDOS mitigation to protect distributed applications in large clusters.
> Relying on CDN works to protect API gateways and frontends (as long as
> they don't throw you out of their network) but offers no protection
> beyond that, e.g. a noisy/hostile neighbour. Doing this at the server
> level and allowing the mitigation capability to scale up with the number
> of servers is natural and cheap.

So far we e.g. always considered L2 attacks a problem of the network
admin to correctly protect the environment. Are you talking about
protecting the L3 data plane? Are there custom proprietary protocols in
place which need custom protocol parsers that need involvement of the
kernel before it could verify the packet?

In the past we tried to protect the L3 data plane as well as we can in
Linux to allow the plain old server admin to set an IP address on an
interface and install whatever software in user space. We try not only
to protect it but also try to achieve fairness by adding a lot of
counters everywhere. Are protections missing right now or are we talking
about better performance?

To provide fairness you often have to share validated data within the
kernel and with XDP. This requires consistent lookup methods for sockets
in the lower level. Those can be exported to XDP via external functions
and become part of uAPI which will limit our ability to change those
functions in future. When the discussion started about early demuxing in
XDP I became really nervous, because suddenly the XDP program has to
decide correctly which protocol type it has and look in the correct
socket table for the socket. Different semantics for sockets can apply
here, e.g. some sockets are RCU managed, some end up using reference
counts. A wrong decision here would cause havoc in the kernel (XDP
considers packet as UDP but kernel stack as TCP). Also, who knows whether
we will end up with per-cpu socket tables, and would we then keep that as
uAPI (this is btw. the dragonflyBSD approach to scaling)? Imagine someone
writing a SIP rewriter in XDP and depending on a coherent view of all
sockets even if their hash doesn't fit to the one of the queue. Suddenly something
which was thought of as being only mutable by one CPU becomes global
again and because of XDP we need to add locking because of uAPI.

This discussion is parallel to the discussion about trace points, which
are not considered uAPI. If eBPF functions are not considered uAPI then
eBPF in the network stack will have much less value, because you
suddenly depend on specific kernel versions again and cannot simply load
the code into the kernel. The API checks will become very difficult to
implement, see also the ongoing MODVERSIONS discussions on LKML some
days back.

>>> I agree with you if the LB is a software based appliance in either a
>>> dedicated VM or on dedicated baremetal.
>>>
>>> The reality is turning out to be different in many cases though, LB
>>> needs to be performed not only for north south but east west as well.
>>> So even if I would handle LB for traffic entering my datacenter in user
>>> space, I will need the same LB for packets from my applications and
>>> I definitely don't want to move all of that into user space.
>>
>> The open question to me is why is programmability needed here.
>>
>> Look at the discussion about ECMP and consistent hashing. It is not very
>> easy to actually write this code correctly. Why can't we just put C code
>> into the kernel that implements this once and for all and let user space
>> update the policies?
> 
> Whatever LB logic is put in place with native C code now is unlikely the
> logic we need in two years. We can't really predict the future. If it
> was the case, networking would have been done long ago and we would all
> be working on self eating ice cream now.

Did LB algorithms on the networking layer change that much?

There is a long history of using consistent hashing for load balancing,
as e.g. is done in haproxy or F5.

>> Load balancers have to deal correctly with ICMP packets, e.g. they even
>> have to be duplicated to every ECMP route. This seems to be problematic
>> to do in eBPF programs due to looping constructs so you end up with
>> complicated user space anyway.
> 
> Feel free to implement such complex LBs in user space or natively. It is
> not required for the majority of use cases. The most popular LBs for
> application load balancing have no idea of ECMP and require ECMP aware
> routers to be made redundant itself.

They are already available and e.g. deployed as part of some kubernetes
stacks as I wrote above.

It is a generally available algorithm which fits a lot of use cases,
basically every website that wants to shard its sessions can make use of
it. Also it is independent of ECMP and mostly is implemented in load
balancers due to its need for a lot of memory.

New algorithms outdate old ones but the core principles will be the same
and don't require major changes to the interface, e.g. ipvs scheduler.

If we are talking about security features for early drop inside TCP
streams, like http, you need to have a proper stream reassembly engine.
Snort e.g. dropped a complete stream of TCP packets if you send a RST
with the same quadruple but a wrong sequence number. End system didn't
consider the RST but non synchronized solutions ended up not inspecting
this flow anymore. How do you handle diverging views on meta data in
networking protocols? Also look how hard it is to keep e.g. the fib
table synchronized to the hardware.

In retrospect, I think Tom Herbert's move putting ILA stateless
translation into the XDP hook wasn't that bad after all. ILA will
hopefully become a standard and its implementation is already in the
kernel, so why not make its translator part of the kernel, too?

TLDR; what I'm trying to argue is that evolution of the network stack is
problematic with a programmable backplane in the kernel which locks out
future modifications of the stack in some places. On the other side, if
we don't add those features we will have a half baked solution and
people will simply prefer netmap or DPDK.

Bye,
Hannes


* Re: [PATCH net] tcp: warn on bogus MSS and try to amend it
From: marcelo.leitner @ 2016-12-01 20:46 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, jmaxwell37, alexandre.sidorenko, kuznet, jmorris,
	yoshfuji, kaber, tlfalcon, brking, eric.dumazet
In-Reply-To: <20161201.152949.1953888486413180001.davem@davemloft.net>

On Thu, Dec 01, 2016 at 03:29:49PM -0500, David Miller wrote:
> From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Date: Wed, 30 Nov 2016 11:14:32 -0200
> 
> > There have been some reports lately about TCP connection stalls caused
> > by NIC drivers that aren't setting gso_size on aggregated packets on rx
> > path. This causes TCP to assume that the MSS is actually the size of the
> > aggregated packet, which is invalid.
> > 
> > Although the proper fix is to be done at each driver, it's often hard
> > and cumbersome for one to debug, come to such root cause and report/fix
> > it.
> > 
> > This patch amends this situation in two ways. First, it adds a warning
> > on when this situation occurs, so it gives a hint to those trying to
> > debug this. It also limits the maximum probed MSS to the advertised MSS,
> > as it should never be any higher than that.
> > 
> > The result is that the connection may not have the best performance ever
> > but it shouldn't stall, and the admin will have a hint on what to look
> > for.
> > 
> > Tested with virtio by forcing gso_size to 0.
> > 
> > Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> 
> I totally agree with this change, however I think the warning message can
> be improved in two ways:
> 
> >  	len = skb_shinfo(skb)->gso_size ? : skb->len;
> >  	if (len >= icsk->icsk_ack.rcv_mss) {
> > -		icsk->icsk_ack.rcv_mss = len;
> > +		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> > +					       tcp_sk(sk)->advmss);
> > +		if (icsk->icsk_ack.rcv_mss != len)
> > +			pr_warn_once("Seems your NIC driver is doing bad RX acceleration. TCP performance may be compromised.\n");
> 
> We know it's a bad GRO implementation that causes this so let's be specific in the
> message, perhaps something like:
> 
> 	Driver has suspect GRO implementation, TCP performance may be compromised.

Okay.

> 
> Also, we have skb->dev available here most likely, so prefixing the message with
> skb->dev->name would make analyzing this situation even easier for someone hitting
> this.

Nice, yes.
This skb is mostly non-forwardable, as it's bigger than the MTU. If
someone is using net namespaces and the skb were routed through some
veth interfaces, the name would give a false hint, but that shouldn't
happen unless the skb fits a (larger) veth MTU. Even then, whoever is
debugging this will probably simplify the setup first.

> 
> I'm not certain if an skb->dev==NULL check is necessary here or not, but it is
> definitely something you need to consider.
> 
> Thanks!
> 

Will check. Thanks!

  Marcelo
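
A minimal userspace sketch of the clamp-and-warn logic discussed in this
message, assuming the reworded warning and a NULL-safe device name prefix.
The struct, the function name and the "Unknown" fallback are hypothetical
stand-ins, not kernel API, and the kernel's other branch (len below
rcv_mss) is reduced to "leave rcv_mss unchanged".

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of struct net_device we need. */
struct dev_model {
	const char *name;
};

/*
 * Model of the patched branch: never let the probed MSS exceed the
 * advertised MSS, and warn once (naming the device when known) if the
 * clamp had to kick in.
 */
static unsigned int clamp_rcv_mss(unsigned int len, unsigned int rcv_mss,
				  unsigned int advmss,
				  const struct dev_model *dev, int *warned)
{
	unsigned int new_mss;

	if (len < rcv_mss)
		return rcv_mss;	/* other heuristics omitted in this sketch */

	new_mss = len < advmss ? len : advmss;
	if (new_mss != len && !*warned) {
		/* dev may be NULL, so fall back to a placeholder name */
		fprintf(stderr,
			"%s: Driver has suspect GRO implementation, TCP performance may be compromised.\n",
			dev ? dev->name : "Unknown");
		*warned = 1;
	}
	return new_mss;
}
```

With len = 64000 and advmss = 1460 the function returns 1460 and prints
the warning once; a segment of 1000 bytes is recorded unchanged.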

^ permalink raw reply

* Re: [patch net-next v3 11/12] mlxsw: spectrum_router: Request a dump of FIB tables during init
From: Ido Schimmel @ 2016-12-01 20:54 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Miller, jiri, netdev, idosch, eladr, yotamg, nogahf,
	arkadis, ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot,
	andrew, f.fainelli, alexander.h.duyck, kaber
In-Reply-To: <d70996c1-be4e-136b-6325-1a8c152e44ce@stressinduktion.org>

On Thu, Dec 01, 2016 at 09:40:48PM +0100, Hannes Frederic Sowa wrote:
> On 01.12.2016 21:04, David Miller wrote:
> > 
> > Hannes and Ido,
> > 
> > It looks like we are very close to having this in mergable shape, can
> > you guys work out this final issue and figure out if it really is
> > a merge stopper or not?
> 
> Sure, if the fib notification register could be done under protection of
> the sequence counter I don't see any more problems.

Did you maybe miss my reply yesterday? Because I was trying to
understand what "ordering" you're referring to, but didn't receive a
reply from you.

> The sync handler is nice to have and can be done in a later patch series.

Sync handler?

^ permalink raw reply

* [patch] net: renesas: ravb: unintialized return value
From: Dan Carpenter @ 2016-12-01 20:57 UTC (permalink / raw)
  To: Sergei Shtylyov, Johan Hovold
  Cc: David S. Miller, Yoshihiro Kaneko, Kazuya Mizuguchi, Simon Horman,
	Wolfram Sang, Andrew Lunn, Philippe Reynes, Niklas Söderlund,
	Arnd Bergmann, netdev, linux-renesas-soc, kernel-janitors

We want to set the other "err" variable here so that we can return it
later.  My version of GCC misses this issue but I caught it with a
static checker.

Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
Applies to the net tree for 4.10.

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 2c0357c..92d7692 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1016,8 +1016,6 @@ static int ravb_phy_init(struct net_device *ndev)
 	 * at this time.
 	 */
 	if (priv->chip_id == RCAR_GEN3) {
-		int err;
-
 		err = phy_set_max_speed(phydev, SPEED_100);
 		if (err) {
 			netdev_err(ndev, "failed to limit PHY to 100Mbit/s\n");
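
For readers unfamiliar with this class of bug, here is a reduced,
self-contained sketch of the shadowing pattern the patch removes. All
names are hypothetical stand-ins for the driver code, and the outer "err"
is initialized to 0 here only so that the demonstration has defined
behaviour.

```c
/* Hypothetical stand-in for phy_set_max_speed(): fails with -22 on request. */
static int set_max_speed_model(int fail)
{
	return fail ? -22 : 0;
}

/* Buggy shape: the inner "err" shadows the outer one. */
static int phy_init_shadowed(int fail)
{
	int err = 0;	/* initialized only to keep this demo well defined */

	if (1) {	/* stands in for the chip_id check */
		int err;	/* shadows the outer variable */

		err = set_max_speed_model(fail);
		if (err)
			goto out;
	}
out:
	return err;	/* outer err: the failure code is lost */
}

/* Fixed shape: the inner declaration is gone, so the error propagates. */
static int phy_init_fixed(int fail)
{
	int err = 0;

	if (1) {
		err = set_max_speed_model(fail);
		if (err)
			goto out;
	}
out:
	return err;
}
```

Calling phy_init_shadowed(1) returns 0 even though the helper failed,
while phy_init_fixed(1) returns the error. Building with -Wshadow is one
way to have GCC flag the inner declaration.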

^ permalink raw reply related

* Re: [patch net-next v3 11/12] mlxsw: spectrum_router: Request a dump of FIB tables during init
From: Hannes Frederic Sowa @ 2016-12-01 21:09 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf, arkadis,
	ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
	f.fainelli, alexander.h.duyck, kaber
In-Reply-To: <20161130163229.rkxvuwukgg35ktrx@splinter.mtl.com>

On 30.11.2016 17:32, Ido Schimmel wrote:
> Hi Hannes,
> 
> On Wed, Nov 30, 2016 at 04:37:48PM +0100, Hannes Frederic Sowa wrote:
>> On 30.11.2016 11:09, Jiri Pirko wrote:
>>> From: Ido Schimmel <idosch@mellanox.com>
>>>
>>> Make sure the device has a complete view of the FIB tables by invoking
>>> their dump during module init.
>>>
>>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 23 ++++++++++++++++++++++
>>>  1 file changed, 23 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
>>> index 14bed1d..d176047 100644
>>> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
>>> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
>>> @@ -2027,8 +2027,23 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
>>>  	return NOTIFY_DONE;
>>>  }
>>>  
>>> +static void mlxsw_sp_router_fib_dump_flush(struct notifier_block *nb)
>>> +{
>>> +	struct mlxsw_sp *mlxsw_sp = container_of(nb, struct mlxsw_sp, fib_nb);
>>> +
>>> +	/* Flush pending FIB notifications and then flush the device's
>>> +	 * table before requesting another dump. Do that with RTNL held,
>>> +	 * as FIB notification block is already registered.
>>> +	 */
>>> +	mlxsw_core_flush_owq();
>>> +	rtnl_lock();
>>> +	mlxsw_sp_router_fib_flush(mlxsw_sp);
>>> +	rtnl_unlock();
>>> +}
>>> +
>>>  int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
>>>  {
>>> +	fib_dump_cb_t *cb = mlxsw_sp_router_fib_dump_flush;
>>>  	int err;
>>>  
>>>  	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_neighs_list);
>>> @@ -2048,8 +2063,16 @@ int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
>>>  
>>>  	mlxsw_sp->fib_nb.notifier_call = mlxsw_sp_router_fib_event;
>>>  	register_fib_notifier(&mlxsw_sp->fib_nb);
>>
>> Sorry to pick in here again:
>>
>> There is a race here. You need to protect the registration of the fib
>> notifier as well by the sequence counter. Updates here are not ordered
>> in relation to this code below.
> 
> You mean updates that can be received after you registered the notifier
> and until the dump started? I'm aware of that and that's OK. This
> listener should be able to handle duplicates.

I am not concerned about duplicates, but about the ordering of deletes.
If you get an add from the RCU code, you will add the node to hw while it
is being deleted in the software path. You will probably ignore the
delete because nothing is installed in hw yet, and later add the node
which was actually deleted but just reordered, because the delete
happened on another CPU, no?
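
To illustrate what "protected by the sequence counter" could look like,
here is a single-threaded userspace sketch, not the actual kernel code:
every FIB change bumps a counter, and a dump is only accepted if the
counter did not move while it ran; otherwise the hardware view is flushed
and the dump is retried. All types, names and the retry limit are
hypothetical, and the "race hook" merely simulates an update arriving
from another CPU in the middle of the dump.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROUTES  8
#define MAX_RETRIES 5

struct fib_model { unsigned int seq; int routes[MAX_ROUTES]; size_t n; };
struct hw_model  { int routes[MAX_ROUTES]; size_t n; };

/* Every modification of the software table bumps the sequence counter. */
static void fib_add(struct fib_model *fib, int route)
{
	if (fib->n < MAX_ROUTES) {
		fib->routes[fib->n++] = route;
		fib->seq++;
	}
}

static void fib_del(struct fib_model *fib, int route)
{
	for (size_t i = 0; i < fib->n; i++) {
		if (fib->routes[i] == route) {
			fib->routes[i] = fib->routes[fib->n - 1];
			fib->n--;
			fib->seq++;
			return;
		}
	}
}

typedef void (*race_hook_t)(struct fib_model *fib, int attempt);

/* Example hook: a delete that lands during the first dump attempt only. */
static void delete_route_once(struct fib_model *fib, int attempt)
{
	if (attempt == 0)
		fib_del(fib, 20);
}

/* Dump the table into hw; retry from scratch if the table changed meanwhile. */
static bool dump_consistent(struct fib_model *fib, struct hw_model *hw,
			    race_hook_t hook)
{
	for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
		unsigned int seq = fib->seq;

		hw->n = 0;	/* flush the hardware view */
		for (size_t i = 0; i < fib->n; i++)
			hw->routes[hw->n++] = fib->routes[i];
		if (hook)
			hook(fib, attempt);	/* simulated concurrent update */
		if (seq == fib->seq)
			return true;	/* nothing changed during the dump */
	}
	return false;
}
```

In this model, a delete that races with the first dump changes the
counter, so the stale entry is flushed and the second pass installs only
the routes that still exist.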

> I've a follow up patchset that introduces a new event in switchdev
> notification chain called SWITCHDEV_SYNC, which is sent when port
> netdevs are enslaved / released  from a master device (points in time
> where kernel<->device can get out of sync). It will invoke
> re-propagation of configuration from different parts of the stack
> (e.g. bridge driver, 8021q driver, fib/neigh code), which can result
> in duplicates.

Okay, understood. I wonder how we can actually protect against accidental
abort calls. E.g. if I start to inject routes into my routing domain,
how can I make sure the box doesn't die once I insert enough routes?
Do we need to touch quagga etc?

Thanks,
Hannes

^ permalink raw reply

* Re: [flamebait] xdp, well meaning but pointless
From: Tom Herbert @ 2016-12-01 21:12 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Thomas Graf, Florian Westphal, Linux Kernel Network Developers
In-Reply-To: <583b8947-3395-8529-933b-08e1a86a0778@stressinduktion.org>

On Thu, Dec 1, 2016 at 12:44 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hello,
>
> this is a good conversation and I simply want to bring my worries
> across. I don't have good solutions for the problems XDP tries to solve
> but I fear we could get caught up in maintenance problems in the long
> term given the ideas floating around on how to evolve XDP currently.
>
> On 01.12.2016 17:28, Thomas Graf wrote:
>> On 12/01/16 at 04:52pm, Hannes Frederic Sowa wrote:
>>> First of all, this is a rant targeted at XDP and not at eBPF as a whole.
>>> XDP manipulates packets at free will and thus all security guarantees
>>> are off as well as in any user space solution.
>>>
>>> Secondly user space provides policy, acl, more controlled memory
>>> protection, restartability and better debugability. If I had multi
>>> tenant workloads I would definitely put more complex "business/acl"
>>> logic into user space, so I can make use of LSM and other features to
>>> especially prevent a network facing service to attack the tenants. If
>>> stuff gets put into the kernel you run user controlled code in the
>>> kernel exposing a much bigger attack vector.
>>>
>>> What use case do you see in XDP specifically e.g. for container networking?
>>
>> DDOS mitigation to protect distributed applications in large clusters.
>> Relying on CDN works to protect API gateways and frontends (as long as
>> they don't throw you out of their network) but offers no protection
>> beyond that, e.g. a noisy/hostile neighbour. Doing this at the server
>> level and allowing the mitigation capability to scale up with the number
>> of servers is natural and cheap.
>
> So far we e.g. always considered L2 attacks a problem of the network
> admin to correctly protect the environment. Are you talking about
> protecting the L3 data plane? Are there custom proprietary protocols in
> place which need custom protocol parsers that need involvement of the
> kernel before it could verify the packet?
>
> In the past we tried to protect the L3 data plane as good as we can in
> Linux to allow the plain old server admin to set an IP address on an
> interface and install whatever software in user space. We try not only
> to protect it but also try to achieve fairness by adding a lot of
> counters everywhere. Are protections missing right now or are we talking
> about better performance?
>
The technical plenary at the last IETF in Seoul a couple of weeks ago was
exclusively focused on DDOS in light of the recent attack against
Dyn. There were speakers from Cloudflare and Dyn. The Cloudflare
presentation by Nick Sullivan
(https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
alluded to some implementation of DDOS mitigation. In particular, on
slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"
numbers he gave were based on iptables+BPF and that was a whole
1.2Mpps-- somehow that seems ridiculously low to me (I said so at the mic
and that's also when I introduced XDP to the whole IETF :-) ). If that's
the best we can do the Internet is in a world of hurt. DDOS mitigation
alone is probably a sufficient motivation to look at XDP. We need
something that drops bad packets as quickly as possible when under
attack, we need this to be integrated into the stack, we need it to be
programmable to deal with the increasing savvy of attackers, and we
don't want to be forced to be dependent on HW solutions. This is why
we created XDP!

Tom

> To provide fairness you often have to share validated data within the
> kernel and with XDP. This requires consistent lookup methods for sockets
> in the lower level. Those can be exported to XDP via external functions
> and become part of uAPI which will limit our ability to change those
> functions in future. When the discussion started about early demuxing in
> XDP I became really nervous, because suddenly the XDP program has to
> decide correctly which protocol type it has and look in the correct
> socket table for the socket. Different semantics for sockets can apply
> here, e.g. some sockets are RCU managed, some end up using reference
> counts. A wrong decision here would cause havoc in the kernel (XDP
> considers packet as UDP but kernel stack as TCP). Also, who knows that
> we won't have per-cpu socket tables we would keep that as uAPI (this is
> btw. the dragonflyBSD approach to scaling)? Imagine someone writing a
> SIP rewriter in XDP and depending on a coherent view of all sockets even
> if their hash doesn't fit to the one of the queue? Suddenly something
> which was thought of as being only mutable by one CPU becomes global
> again and because of XDP we need to add locking because of uAPI.
>
> This discussion is parallel to the discussion about trace points, which
> are not considered uAPI. If eBPF functions are not considered uAPI then
> eBPF in the network stack will have much less value, because you
> suddenly depend on specific kernel versions again and cannot simply load
> the code into the kernel. The API checks will become very difficult to
> implement, see also the ongoing MODVERSIONS discussions on LKML some
> days back.
>
>>>> I agree with you if the LB is a software based appliance in either a
>>>> dedicated VM or on dedicated baremetal.
>>>>
>>>> The reality is turning out to be different in many cases though, LB
>>>> needs to be performed not only for north south but east west as well.
>>>> So even if I would handle LB for traffic entering my datacenter in user
>>>> space, I will need the same LB for packets from my applications and
>>>> I definitely don't want to move all of that into user space.
>>>
>>> The open question to me is why is programmability needed here.
>>>
>>> Look at the discussion about ECMP and consistent hashing. It is not very
>>> easy to actually write this code correctly. Why can't we just put C code
>>> into the kernel that implements this once and for all and let user space
>>> update the policies?
>>
>> Whatever LB logic is put in place with native C code now is unlikely the
>> logic we need in two years. We can't really predict the future. If it
>> was the case, networking would have been done long ago and we would all
>> be working on self eating ice cream now.
>
> Did LB algorithms on the networking layer change that much?
>
> There is a long history of using consistent hashing for load balancing,
> as e.g. is done in haproxy or F5.
>
>>> Load balancers have to deal correctly with ICMP packets, e.g. they even
>>> have to be duplicated to every ECMP route. This seems to be problematic
>>> to do in eBPF programs due to looping constructs so you end up with
>>> complicated user space anyway.
>>
>> Feel free to implement such complex LBs in user space or natively. It is
>> not required for the majority of use cases. The most popular LBs for
>> application load balancing have no idea of ECMP and require ECMP aware
>> routers to be made redundant itself.
>
> They are already available and e.g. deployed as part of some kubernetes
> stacks as I wrote above.
>
> It is a generally available algorithm which fits a lot of use cases,
> basically every website that wants to shard its sessions can make use of
> it. Also it is independent of ECMP and mostly is implemented in load
> balancers due to its need for a lot of memory.
>
> New algorithms outdate old ones but the core principles will be the same
> and don't require major changes to the interface, e.g. ipvs scheduler.
>
> If we are talking about security features for early drop inside TCP
> streams, like http, you need to have a proper stream reassembly engine.
> Snort e.g. dropped a complete stream of TCP packets if you send a RST
> with the same quadruple but a wrong sequence number. End system didn't
> consider the RST but non synchronized solutions ended up not inspecting
> this flow anymore. How do you handle diverting views on meta data in
> networking protocols? Also look how hard it is to keep e.g. the fib
> table synchronized to the hardware.
>
> In retrospect, I think Tom Herbert's move putting ILA stateless
> translation into the XDP hook wasn't that bad after all. ILA maybe
> hopefully becomes a standard and its implementation is already in the
> kernel so why keep its translator not part of the kernel, too?
>
> TLDR; what I'm trying to argue is that evolution of the network stack is
> problematic with a programmable backplane in the kernel which locks out
> future modifications of the stack in some places. On the other side, if
> we don't add those features we will have a half baked solution and
> people will simply prefer netmap or DPDK.
>
> Bye,
> Hannes
>

^ permalink raw reply

* Re: [patch] net: renesas: ravb: unintialized return value
From: Sergei Shtylyov @ 2016-12-01 21:13 UTC (permalink / raw)
  To: Dan Carpenter, Johan Hovold
  Cc: David S. Miller, Yoshihiro Kaneko, Kazuya Mizuguchi, Simon Horman,
	Wolfram Sang, Andrew Lunn, Philippe Reynes, Niklas Söderlund,
	Arnd Bergmann, netdev, linux-renesas-soc, kernel-janitors
In-Reply-To: <20161201205744.GB10701@mwanda>

Hello!

On 12/01/2016 11:57 PM, Dan Carpenter wrote:

> We want to set the other "err" variable here so that we can return it
> later.  My version of GCC misses this issue but I caught it with a
> static checker.
>
> Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks")

    Hm, I somehow missed this one, probably due to the horrific CC list. :-(

> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

MBR, Sergei

^ permalink raw reply

* Re: [patch net-next v3 11/12] mlxsw: spectrum_router: Request a dump of FIB tables during init
From: Hannes Frederic Sowa @ 2016-12-01 21:09 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Miller, jiri, netdev, idosch, eladr, yotamg, nogahf,
	arkadis, ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot,
	andrew, f.fainelli, alexander.h.duyck, kaber
In-Reply-To: <20161201205457.xf5evjphjj6mytkf@splinter>

On 01.12.2016 21:54, Ido Schimmel wrote:
> On Thu, Dec 01, 2016 at 09:40:48PM +0100, Hannes Frederic Sowa wrote:
>> On 01.12.2016 21:04, David Miller wrote:
>>>
>>> Hannes and Ido,
>>>
>>> It looks like we are very close to having this in mergable shape, can
>>> you guys work out this final issue and figure out if it really is
>>> a merge stopper or not?
>>
>> Sure, if the fib notification register could be done under protection of
>> the sequence counter I don't see any more problems.
> 
> Did you maybe miss my reply yesterday? Because I was trying to
> understand what "ordering" you're referring to, but didn't receive a
> reply from you.

Oh, strange, I am pretty sure I replied to that. Let me resend it.

>> The sync handler is nice to have and can be done in a later patch series.
> 
> Sync handler?

I was talking about SWITCHDEV_SYNC.

Bye,
Hannes

^ permalink raw reply

* Re: [patch net-next v3 11/12] mlxsw: spectrum_router: Request a dump of FIB tables during init
From: Ido Schimmel @ 2016-12-01 21:21 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Miller, jiri, netdev, idosch, eladr, yotamg, nogahf,
	arkadis, ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot,
	andrew, f.fainelli, alexander.h.duyck, kaber
In-Reply-To: <343eadfa-f872-788d-748c-a195c0c4d03a@stressinduktion.org>

On Thu, Dec 01, 2016 at 10:09:19PM +0100, Hannes Frederic Sowa wrote:
> On 01.12.2016 21:54, Ido Schimmel wrote:
> > On Thu, Dec 01, 2016 at 09:40:48PM +0100, Hannes Frederic Sowa wrote:
> >> On 01.12.2016 21:04, David Miller wrote:
> >>>
> >>> Hannes and Ido,
> >>>
> >>> It looks like we are very close to having this in mergable shape, can
> >>> you guys work out this final issue and figure out if it really is
> >>> a merge stopper or not?
> >>
> >> Sure, if the fib notification register could be done under protection of
> >> the sequence counter I don't see any more problems.
> > 
> > Did you maybe miss my reply yesterday? Because I was trying to
> > understand what "ordering" you're referring to, but didn't receive a
> > reply from you.
> 
> Oh, strange, I am pretty sure I replied to that. Let me resend it.

:)

I did get this reply, and then replied myself here:
https://marc.info/?l=linux-netdev&m=148053017425465&w=2

^ permalink raw reply

* Re: [flamebait] xdp, well meaning but pointless
From: Hannes Frederic Sowa @ 2016-12-01 21:27 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Thomas Graf, Florian Westphal, Linux Kernel Network Developers
In-Reply-To: <CALx6S36=Y0dSux+-omWXZKtxb73_g4+DXhPBawb79+UF6rp9rw@mail.gmail.com>

On 01.12.2016 22:12, Tom Herbert wrote:
> On Thu, Dec 1, 2016 at 12:44 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> Hello,
>>
>> this is a good conversation and I simply want to bring my worries
>> across. I don't have good solutions for the problems XDP tries to solve
>> but I fear we could get caught up in maintenance problems in the long
>> term given the ideas floating around on how to evolve XDP currently.
>>
>> On 01.12.2016 17:28, Thomas Graf wrote:
>>> On 12/01/16 at 04:52pm, Hannes Frederic Sowa wrote:
>>>> First of all, this is a rant targeted at XDP and not at eBPF as a whole.
>>>> XDP manipulates packets at free will and thus all security guarantees
>>>> are off as well as in any user space solution.
>>>>
>>>> Secondly user space provides policy, acl, more controlled memory
>>>> protection, restartability and better debugability. If I had multi
>>>> tenant workloads I would definitely put more complex "business/acl"
>>>> logic into user space, so I can make use of LSM and other features to
>>>> especially prevent a network facing service to attack the tenants. If
>>>> stuff gets put into the kernel you run user controlled code in the
>>>> kernel exposing a much bigger attack vector.
>>>>
>>>> What use case do you see in XDP specifically e.g. for container networking?
>>>
>>> DDOS mitigation to protect distributed applications in large clusters.
>>> Relying on CDN works to protect API gateways and frontends (as long as
>>> they don't throw you out of their network) but offers no protection
>>> beyond that, e.g. a noisy/hostile neighbour. Doing this at the server
>>> level and allowing the mitigation capability to scale up with the number
>>> of servers is natural and cheap.
>>
>> So far we e.g. always considered L2 attacks a problem of the network
>> admin to correctly protect the environment. Are you talking about
>> protecting the L3 data plane? Are there custom proprietary protocols in
>> place which need custom protocol parsers that need involvement of the
>> kernel before it could verify the packet?
>>
>> In the past we tried to protect the L3 data plane as good as we can in
>> Linux to allow the plain old server admin to set an IP address on an
>> interface and install whatever software in user space. We try not only
>> to protect it but also try to achieve fairness by adding a lot of
>> counters everywhere. Are protections missing right now or are we talking
>> about better performance?
>>
> The technical plenary at the last IETF in Seoul a couple of weeks ago was
> exclusively focused on DDOS in light of the recent attack against
> Dyn. There were speakers from Cloudflare and Dyn. The Cloudflare
> presentation by Nick Sullivan
> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
> alluded to some implementation of DDOS mitigation. In particular, on
> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"
> numbers he gave were based on iptables+BPF and that was a whole
> 1.2Mpps-- somehow that seems ridiculously low to me (I said so at the mic
> and that's also when I introduced XDP to the whole IETF :-) ). If that's
> the best we can do the Internet is in a world of hurt. DDOS mitigation
> alone is probably a sufficient motivation to look at XDP. We need
> something that drops bad packets as quickly as possible when under
> attack, we need this to be integrated into the stack, we need it to be
> programmable to deal with the increasing savvy of attackers, and we
> don't want to be forced to be dependent on HW solutions. This is why
> we created XDP!

I totally understand that. But in my reply to David in this thread I
mentioned DNS apex processing as being problematic, which is actually
referred to in your linked slide deck on page 9 ("What do floods look
like"), and the problem of parsing DNS packets in XDP due to string
processing and looping inside eBPF.

Not to mention the fact that you might have to deal with fragments in
the Internet. Some DOS mitigations were already abused to generate
blackholes for other users. Filtering such stuff is quite complicated.
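
To make the string-processing concern concrete, here is a plain C sketch
of walking a DNS name with a fixed iteration bound and explicit bounds
checks, the style an eBPF program is pushed towards when it cannot use
open-ended loops. This is ordinary userspace C, not an XDP program; the
function name and the label limit are hypothetical, and compression
pointers are simply rejected rather than followed.

```c
#include <stddef.h>
#include <stdint.h>

#define DNS_MAX_LABELS 16	/* hypothetical fixed bound on loop iterations */

/*
 * Return the encoded length of the DNS name starting at data (including
 * the terminating zero-length label), or -1 if it is truncated, uses a
 * compression pointer, or has more than DNS_MAX_LABELS labels.
 */
static int dns_name_len(const uint8_t *data, const uint8_t *data_end)
{
	const uint8_t *p = data;

	for (int i = 0; i < DNS_MAX_LABELS; i++) {
		size_t remaining = (size_t)(data_end - p);
		uint8_t len;

		if (remaining < 1)
			return -1;	/* ran off the end of the packet */
		len = *p;
		if (len == 0)
			return (int)(p + 1 - data);	/* root label: done */
		if (len & 0xC0)
			return -1;	/* pointer or reserved bits: not handled */
		if (remaining < (size_t)len + 1)
			return -1;	/* label extends past the packet */
		p += (size_t)len + 1;
	}
	return -1;	/* too many labels for the fixed bound */
}
```

Even this reduced walk needs a bounds check per label and gives up on
compressed names, which hints at why full DNS parsing in such an
environment gets complicated quickly.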

I argued also under the aspect of what Thomas said, that the outside
world of the cluster is already protected by a CDN.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH iproute2] ip: update link types to show 6lowpan and ieee802.15.4 monitor
From: Stefan Schmidt @ 2016-12-01 21:31 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org, linux-wpan@vger.kernel.org
In-Reply-To: <1477647723-14641-1-git-send-email-stefan@datenfreihafen.org>

Hello.

On 28.10.2016 11:42, Stefan Schmidt wrote:
> Both types have been missing here and thus ip always showed
> only the numbers.
> 
> Based on a suggestion from Alexander Aring.
> 
> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>

Did you somehow mangle this patch manually?

Looking at the patch in your git repo it shows no author name but
instead just my patch was sent with git format-patch and git send-email
as usual and shows the right author. Was there something wrong on my side
or yours? Just checking to avoid it in the future.

http://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=8ae2c5382bd9d98a8f7ddcb1faad1a978d773909

regards
Stefan Schmidt

^ permalink raw reply

