Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net 2/2] openvswitch: Clear the L4 portion of the key for "later" fragments.
From: Pravin Shelar @ 2019-08-25 17:00 UTC (permalink / raw)
  To: Justin Pettit; +Cc: Linux Kernel Network Developers, Joe Stringer
In-Reply-To: <20190824165846.79627-2-jpettit@ovn.org>

On Sat, Aug 24, 2019 at 9:58 AM Justin Pettit <jpettit@ovn.org> wrote:
>
> Only the first fragment in a datagram contains the L4 headers.  When the
> Open vSwitch module parses a packet, it always sets the IP protocol
> field in the key, but can only set the L4 fields on the first fragment.
> The original behavior would not clear the L4 portion of the key, so
> garbage values would be sent in the key for "later" fragments.  This
> patch clears the L4 fields in that circumstance to prevent sending those
> garbage values as part of the upcall.
>
> Signed-off-by: Justin Pettit <jpettit@ovn.org>
> ---
>  net/openvswitch/flow.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index bc89e16e0505..0fb2cec08523 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -623,6 +623,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>                 offset = nh->frag_off & htons(IP_OFFSET);
>                 if (offset) {
>                         key->ip.frag = OVS_FRAG_TYPE_LATER;
> +                       memset(&key->tp, 0, sizeof(key->tp));
>                         return 0;
>                 }
>                 if (nh->frag_off & htons(IP_MF) ||
> @@ -740,8 +741,10 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>                         return error;
>                 }
>
> -               if (key->ip.frag == OVS_FRAG_TYPE_LATER)
> +               if (key->ip.frag == OVS_FRAG_TYPE_LATER) {
> +                       memset(&key->tp, 0, sizeof(key->tp));
>                         return 0;
> +               }
>                 if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
>                         key->ip.frag = OVS_FRAG_TYPE_FIRST;
>

Looks good to me.

Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH net 1/2] openvswitch: Properly set L4 keys on "later" IP fragments.
From: Pravin Shelar @ 2019-08-25 16:54 UTC (permalink / raw)
  To: Justin Pettit; +Cc: Linux Kernel Network Developers, Joe Stringer
In-Reply-To: <20190824165846.79627-1-jpettit@ovn.org>

On Sat, Aug 24, 2019 at 9:58 AM Justin Pettit <jpettit@ovn.org> wrote:
>
> When IP fragments are reassembled before being sent to conntrack, the
> key from the last fragment is used.  Unless there are reordering
> issues, the last fragment received will not contain the L4 ports, so the
> key for the reassembled datagram won't contain them.  This patch updates
> the key once we have a reassembled datagram.
>
> Signed-off-by: Justin Pettit <jpettit@ovn.org>
> ---
>  net/openvswitch/conntrack.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 848c6eb55064..f40ad2a42086 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -524,6 +524,10 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
>                 return -EPFNOSUPPORT;
>         }
>
> +       /* The key extracted from the fragment that completed this datagram
> +        * likely didn't have an L4 header, so regenerate it. */
> +       ovs_flow_key_update(skb, key);
> +
>         key->ip.frag = OVS_FRAG_TYPE_NONE;
>         skb_clear_hash(skb);
>         skb->ignore_df = 1;
> --

Looks good to me.

Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH net v2] openvswitch: Fix conntrack cache with timeout
From: Pravin Shelar @ 2019-08-25 16:53 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAG1aQhLkrvVADEtAFcdO+GX03SGx7GMW1YyLVTGrPjiCz1HMyQ@mail.gmail.com>

On Fri, Aug 23, 2019 at 9:40 AM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
>
> On Thu, Aug 22, 2019 at 11:51 PM Pravin Shelar <pshelar@ovn.org> wrote:
> >
> > On Thu, Aug 22, 2019 at 1:28 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> > >
> > > This patch addresses a conntrack cache issue with timeout policy.
> > > Currently, we do not check if the timeout extension is set properly in the
> > > cached conntrack entry.  Thus, after packet recirculate from conntrack
> > > action, the timeout policy is not applied properly.  This patch fixes the
> > > aforementioned issue.
> > >
> > > Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
> > > Reported-by: kbuild test robot <lkp@intel.com>
> > > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> > > ---
> > > v1->v2: Fix rcu dereference issue reported by kbuild test robot.
> > > ---
> > >  net/openvswitch/conntrack.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > > index 848c6eb55064..4d7896135e73 100644
> > > --- a/net/openvswitch/conntrack.c
> > > +++ b/net/openvswitch/conntrack.c
> > > @@ -1657,6 +1666,10 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr,
> > >                                       ct_info.timeout))
> > >                         pr_info_ratelimited("Failed to associated timeout "
> > >                                             "policy `%s'\n", ct_info.timeout);
> > > +               else
> > > +                       ct_info.nf_ct_timeout = rcu_dereference(
> > > +                               nf_ct_timeout_find(ct_info.ct)->timeout);
> > Is this dereference safe from NULL pointer?
>
> Hi Pravin,
>
> Thanks for your review.  I am not sure if
> nf_ct_timeout_find(ct_info.ct) will return NULL in this case.
>
> We only run into this statement when ct_info.timeout[0] is set, and it
> is only set in parse_ct() when CONFIG_NF_CONNTRACK_TIMEOUT is
> configured.  Also, in this else condition the timeout extension is
> supposed to be set properly by nf_ct_set_timeout().
>

Sounds good.
Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH net] openvswitch: Fix log message in ovs conntrack
From: Pravin Shelar @ 2019-08-25 16:48 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers
In-Reply-To: <1566432970-13377-1-git-send-email-yihung.wei@gmail.com>

On Wed, Aug 21, 2019 at 5:27 PM Yi-Hung Wei <yihung.wei@gmail.com> wrote:
>
> Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action")
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> ---
>  net/openvswitch/conntrack.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 45498fcf540d..0d5ab4957ec0 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -1574,7 +1574,7 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
>                 case OVS_CT_ATTR_TIMEOUT:
>                         memcpy(info->timeout, nla_data(a), nla_len(a));
>                         if (!memchr(info->timeout, '\0', nla_len(a))) {
> -                               OVS_NLERR(log, "Invalid conntrack helper");
> +                               OVS_NLERR(log, "Invalid conntrack timeout");
>                                 return -EINVAL;
>                         }
>                         break;
Acked-by: Pravin B Shelar <pshelar@ovn.org>

Thanks,
Pravin.

^ permalink raw reply

* Re: [PATCH net-next v3 4/6] net: dsa: mv88e6xxx: simplify SERDES code for Topaz and Peridot
From: Marek Behun @ 2019-08-25 16:36 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean
In-Reply-To: <20190825120232.GG6729@t480s.localdomain>

On Sun, 25 Aug 2019 12:02:32 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> Aren't you relying on -ENODEV as well?

Vivien, I am not relying o -ENODEV. I changed the serdes_get_lane
semantics:
 - previously:
   - if port has a lane for current cmode, return given lane number
   - otherwise return -ENODEV
   - if other error occured during serdes_get_lane, return that error
     (this never happened, because all implementations only need port
     number and cmode, and cmode is cached, so no function was called
     that could err)
 - after this commit:
   - if port has a lane for current cmode, return 0 and put lane number
     into *lane
   - otherwise return 0 and put -1 into *lane
   - if error occured, return that error number

I removed the -ENODEV semantics for "no lane on port" event.
There are two reasons for this:
  1. once you requested lane number to be put into a place pointed to
     by a pointer, rather than the return value, the code seemed better
     to me (you may of course disagree, this is a personal opinion) when
     I did:
       if (err)
           return err;
       if (lane < 0)
           return 0;
     rather than
       if (err == -ENODEV)
           return 0;
       if (err)
           return err;
  2. some future implementation may actually need to call some MDIO
     read/write functions, which may or may not return -ENODEV. That
     could conflict with the -ENODEV returned when there is no lane.

Marek

^ permalink raw reply

* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Horatiu Vultur @ 2019-08-25 16:30 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Florian Fainelli, roopa, nikolay, davem, UNGLinuxDriver,
	alexandre.belloni, allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <20190824074204.GA15041@nanopsycho.orion>

The 08/24/2019 09:42, Jiri Pirko wrote:
> External E-Mail
> 
> 
> Sat, Aug 24, 2019 at 01:25:02AM CEST, f.fainelli@gmail.com wrote:
> >On 8/22/19 12:07 PM, Horatiu Vultur wrote:
> >> Current implementation of the SW bridge is setting the interfaces in
> >> promisc mode when they are added to bridge if learning of the frames is
> >> enabled.
> >> In case of Ocelot which has HW capabilities to switch frames, it is not
> >> needed to set the ports in promisc mode because the HW already capable of
> >> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
> >> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
> >> the ports in promisc mode to do the switching.
> >
> >Then do not do anything when the ndo_set_rx_mode() for the ocelot
> >network device is called and indicates that IFF_PROMISC is set and that
> >your network port is a bridge port member. That is what mlxsw does AFAICT.

Yes, but then if you want to monitor all the traffic on a bridge port
you will not be able to do that. And this seems to be a limitation.
This is the case for mlxsw and ocelot(it doesn't implement at all
promisc mode) and might be others.

> 
> Correct.
> 
> >
> >As other pointed out, the Linux bridge implements a software bridge by
> >default, and because it needs to operate on a wide variety of network
> >devices, all with different capabilities, the easiest way to make sure
> >that all management (IGMP, BPDU, etc. ) as well as non-management
> >traffic can make it to the bridge ports, is to put the network devices
> >in promiscuous mode.

What if the HW can copy all the management traffic to the SW bridge and
HW knows to learn and flood frames. Then there is no point to set a
network port in promisc mode just because it is a bridge port member.
> >If this is suboptimal for you, you can take
> >shortcuts in your driver that do not hinder the overall functionality.

If I add this check, I don't see how any other network drivers will be
affected by this. If a network driver will start to use this then it
needs to know that the HW should be configure to include CPU in the
flood mask and to know which addresses can be reached through SW bridge.

> >
> >> This optimization takes places only if all the interfaces that are part
> >> of the bridge have this flag and have the same network driver.
> >> 
> >> If the bridge interfaces is added in promisc mode then also the ports part
> >> of the bridge are set in promisc mode.
> >> 
> >> Horatiu Vultur (3):
> >>   net: Add HW_BRIDGE offload feature
> >>   net: mscc: Use NETIF_F_HW_BRIDGE
> >>   net: mscc: Implement promisc mode.
> >> 
> >>  drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
> >>  include/linux/netdev_features.h    |  3 +++
> >>  net/bridge/br_if.c                 | 29 ++++++++++++++++++++++++++++-
> >>  net/core/ethtool.c                 |  1 +
> >>  4 files changed, 56 insertions(+), 3 deletions(-)
> >> 
> >
> >
> >-- 
> >Florian
> 

-- 
/Horatiu

^ permalink raw reply

* Re: [PATCH net] ipv4: mpls: fix mpls_xmit for iptunnel
From: David Ahern @ 2019-08-25 16:23 UTC (permalink / raw)
  To: Alexey Kodanev, netdev; +Cc: David Miller
In-Reply-To: <1566582703-26567-1-git-send-email-alexey.kodanev@oracle.com>

On 8/23/19 11:51 AM, Alexey Kodanev wrote:
> When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
> same for rt->rt_gw_family.  Therefore, when rt->rt_gw_family is checked
> in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
> doesn't work anymore.
> 
> This issue was found with LTP mpls03 tests.
> 
> Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/mpls/mpls_iptunnel.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

ok, I see now. This is a device only route with MPLS encap:

10.23.0.1  encap mpls  60 dev ltp_v0 scope link

and the change reverts to 5.1 behavior unless the gateway is IPv6 (new
behavior). Thanks for the patch.

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next v3 3/6] net: dsa: mv88e6xxx: create serdes_get_lane chip operation
From: Vivien Didelot @ 2019-08-25 16:12 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190825035915.13112-4-marek.behun@nic.cz>

On Sun, 25 Aug 2019 05:59:12 +0200, Marek Behún <marek.behun@nic.cz> wrote:
>  int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on)
>  {
> -	int lane;
> -
> -	lane = mv88e6390_serdes_get_lane(chip, port);
> -	if (lane == -ENODEV)
> -		return 0;
> +	s8 lane;
> +	int err;
>  
> +	err = mv88e6xxx_serdes_get_lane(chip, port, &lane);
> +	if (err)
> +		return err;
>  	if (lane < 0)
> -		return lane;
> +		return 0;

In fact you're also relying on -ENODEV, which is what you return here (and in
other places) instead of 0. So I'm afraid you have to address my comment now...

^ permalink raw reply

* Re: [PATCH net-next v3 4/6] net: dsa: mv88e6xxx: simplify SERDES code for Topaz and Peridot
From: Vivien Didelot @ 2019-08-25 16:02 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190825035915.13112-5-marek.behun@nic.cz>

Hi Marek,

On Sun, 25 Aug 2019 05:59:13 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> +int mv88e6341_serdes_get_lane(struct mv88e6xxx_chip *chip, int port, s8 *lane)
> +{
> +	u8 cmode = chip->ports[port].cmode;
> +
> +	*lane = -1;
> +
> +	if (port != 5)
> +		return 0;

Aren't you relying on -ENODEV as well?

> +
> +	if (cmode == MV88E6XXX_PORT_STS_CMODE_1000BASE_X ||
> +	    cmode == MV88E6XXX_PORT_STS_CMODE_SGMII ||
> +	    cmode == MV88E6XXX_PORT_STS_CMODE_2500BASEX)
> +		*lane = MV88E6341_PORT5_LANE;
> +
> +	return 0;
> +}

^ permalink raw reply

* Re: [PATCH net-next v3 3/6] net: dsa: mv88e6xxx: create serdes_get_lane chip operation
From: Vivien Didelot @ 2019-08-25 15:48 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190825035915.13112-4-marek.behun@nic.cz>

Hi Marek,

On Sun, 25 Aug 2019 05:59:12 +0200, Marek Behún <marek.behun@nic.cz> wrote:
>  void mv88e6390x_serdes_irq_free(struct mv88e6xxx_chip *chip, int port)
>  {
> -	int lane = mv88e6390x_serdes_get_lane(chip, port);
> +	int err;
> +	s8 lane;
>  
> -	if (lane == -ENODEV)
> +	err = mv88e6xxx_serdes_get_lane(chip, port, &lane);
> +	if (err) {
> +		dev_err(chip->dev, "Unable to free SERDES irq: %d\n", err);
>  		return;
> -
> +	}
>  	if (lane < 0)
>  		return;

[...]

> -int mv88e6390x_serdes_get_lane(struct mv88e6xxx_chip *chip, int port);
> +/* Put the SERDES lane address a port is using into *lane. If a port has
> + * multiple lanes, should put the first lane the port is using. If a port does
> + * not have a lane, put -1 into *lane.
> + */
> +static inline int mv88e6xxx_serdes_get_lane(struct mv88e6xxx_chip *chip,
> +					    int port, s8 *lane)
> +{
> +	if (!chip->info->ops->serdes_get_lane)
> +		return -EOPNOTSUPP;
> +
> +	return chip->info->ops->serdes_get_lane(chip, port, lane);
> +}

Using an invalid value is only useful if it can be interpreted by the other
functions of the API. So I would've make lane an u8, assuming it gets modified
only on success, which would result in calls like this one:

    u8 lane;
    int err;

    err = mv88e6xxx_serdes_get_lane(chip, port, &lane);
    if (err) {
        if (err == -ENODEV)
            err = 0;
        return err;
    }

But at least it is well documented, so we can eventually fine tuned later:

Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>


Thank you!

	Vivien

^ permalink raw reply

* Re: [PATCH net-next v3 2/6] net: dsa: mv88e6xxx: update code operating on hidden registers
From: Vivien Didelot @ 2019-08-25 15:07 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190825035915.13112-3-marek.behun@nic.cz>

Hi Marek,

On Sun, 25 Aug 2019 05:59:11 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> This patch moves the functions operating on the hidden debug registers
> into it's own file, port_hidden.c. The functions prefix is renamed from
> mv88e6390_hidden_ to mv88e6xxx_port_hidden_, to be consistent with the
> rest of this driver. The macros are prefixed with MV88E6XXX_ prefix, and
> are changed not to use the BIT() macro nor bit shifts, since the rest of
> the port.h file does not use it.
> 
> We also add the support for setting the Block Address field when
> operating hidden registers. Marvell's mdio examples for SERDES settings
> on Topaz use Block Address 0x7 when reading/writing hidden registers,
> and although the specification says that block must be set to 0xf, those
> settings are reachable only with Block Address 0x7.
> 
> Signed-off-by: Marek Behún <marek.behun@nic.cz>

Even though there are several semantic changes here, bisectability is
preserved, and blaming the new functions will easily show which commit
introduced this new API and why. Much more readable and well documented:

Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Florian Fainelli @ 2019-08-25 15:00 UTC (permalink / raw)
  To: Marek Behun
  Cc: netdev, Andrew Lunn, Vivien Didelot, David Ahern,
	Stephen Hemminger, Chris Healy, Vladimir Oltean
In-Reply-To: <20190825091337.236ee73a@nic.cz>

On 8/25/2019 12:13 AM, Marek Behun wrote:
> On Sat, 24 Aug 2019 13:04:04 -0700
> Florian Fainelli <f.fainelli@gmail.com> wrote:
> 
>> Now, the 4.9 kernel behavior actually works just fine because eth1 is
>> not a special interface, so no tagging is expected, and "wifi", although
>> it supports DSA tagging, represents another side of the CPU/host network
>> stack, so you never have to inject frames into the switch, because you
>> can use eth1 to do that and let MAC learning do its job to forward to
>> the correct port of the switch.
> 
> Hi Florian,
> 
> Sorry, I am having trouble understanding what you mean in the
> paragraph I quoted above (and paragraphs afterwards).
> 
> eth0 and eth1 are interfaces created by an ethernet driver.
> wlan0 is an interface created by wireless driver.
> wifi is a slave interface created by DSA for port 5 on the switch.
> eth1 is DSA slave or a DSA master connected to port 5?

"wifi" is a DSA slave network interface, and as you understood, eth1 is
a standard Ethernet driver (which happens to be the same as eth0, since
on that system there are two identical controllers, driver is
bcmsysport.c). The relevant part of the DTS for that system looks like this:

port@5 {
	reg = <5>;
	label = "wifi";
	ethernet = <&enet_1>;
};

port@8 {
	reg = <8>;
	label = "cpu";
	ethernet = <&enet_0>;
};

So as you can see, the devices are all described correctly, simply the
behavior on the Linux change changes whether you have commit
6d4e5c570c2d66c806ecc6bd851fcf881fe8a38e ("net: dsa: get port type at
parse time") included or not. With this commit, the criteria for
determining what is a DSA/CPU port changed: it used to be based on the
label (e.g.: "cpu", "dsa") and then it changed to be based on whether a
phandle property is either "ethernet" (port is CPU) or "link" (port is
DSA), which is the right thing to do, but it no longer allows us to be
specific about which port is going to be elected as the CPU port.
Fortunately the switch driver is coded with the assumption that either
port 5 or 8 could be used.

Note that when we do not have a network device that represents the
switch"side (e.g.: the CPU port or the DSA port), we had to jump through
many hoops in order to make some information visible to the user, or
even overlay specific operations onto the DSA master, that code is under
net/dsa/master.c.

> 
> How does DSA handle two interfaces with same reg property?

This is not happening, see DTS example above.
-- 
Florian

^ permalink raw reply

* [PATCH net] nexthop: Fix nexthop_num_path for blackhole nexthops
From: David Ahern @ 2019-08-25 14:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, sharpd, David Ahern

From: David Ahern <dsahern@gmail.com>

Donald reported this sequence:
  ip next add id 1 blackhole
  ip next add id 2 blackhole
  ip ro add 1.1.1.1/32 nhid 1
  ip ro add 1.1.1.2/32 nhid 2

would cause a crash. Backtrace is:

[  151.302790] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  151.304043] CPU: 1 PID: 277 Comm: ip Not tainted 5.3.0-rc5+ #37
[  151.305078] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[  151.306526] RIP: 0010:fib_add_nexthop+0x8b/0x2aa
[  151.307343] Code: 35 f7 81 48 8d 14 01 c7 02 f1 f1 f1 f1 c7 42 04 01 f4 f4 f4 48 89 f2 48 c1 ea 03 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9 <80> 3c 02 00 74 08 48 89 f7 e8 1a e8 53 ff be 08 00 00 00 4c 89 e7
[  151.310549] RSP: 0018:ffff888116c27340 EFLAGS: 00010246
[  151.311469] RAX: dffffc0000000000 RBX: ffff8881154ece00 RCX: 0000000000000000
[  151.312713] RDX: 0000000000000004 RSI: 0000000000000020 RDI: ffff888115649b40
[  151.313968] RBP: ffff888116c273d8 R08: ffffed10221e3757 R09: ffff888110f1bab8
[  151.315212] R10: 0000000000000001 R11: ffff888110f1bab3 R12: ffff888115649b40
[  151.316456] R13: 0000000000000020 R14: ffff888116c273b0 R15: ffff888115649b40
[  151.317707] FS:  00007f60b4d8d800(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[  151.319113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  151.320119] CR2: 0000555671ffdc00 CR3: 00000001136ba005 CR4: 0000000000020ee0
[  151.321367] Call Trace:
[  151.321820]  ? fib_nexthop_info+0x635/0x635
[  151.322572]  fib_dump_info+0xaa4/0xde0
[  151.323247]  ? fib_create_info+0x2431/0x2431
[  151.324008]  ? napi_alloc_frag+0x2a/0x2a
[  151.324711]  rtmsg_fib+0x2c4/0x3be
[  151.325339]  fib_table_insert+0xe2f/0xeee
...

fib_dump_info incorrectly has nhs = 0 for blackhole nexthops, so it
believes the nexthop object is a multipath group (nhs != 1) and ends
up down the nexthop_mpath_fill_node() path which is wrong for a
blackhole.

The blackhole check in nexthop_num_path is leftover from early days
of the blackhole implementation which did not initialize the device.
In the end the design was simpler (fewer special case checks) to set
the device to loopback in nh_info, so the check in nexthop_num_path
should have been removed.

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/net/nexthop.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 25f1f9a8419b..95f766c31c90 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -141,12 +141,6 @@ static inline unsigned int nexthop_num_path(const struct nexthop *nh)
 
 		nh_grp = rcu_dereference_rtnl(nh->nh_grp);
 		rc = nh_grp->num_nh;
-	} else {
-		const struct nh_info *nhi;
-
-		nhi = rcu_dereference_rtnl(nh->nh_info);
-		if (nhi->reject_nh)
-			rc = 0;
 	}
 
 	return rc;
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH net-next v2] net: openvswitch: Set OvS recirc_id from tc chain index
From: Paul Blakey @ 2019-08-25 14:11 UTC (permalink / raw)
  To: David Miller
  Cc: pshelar@ovn.org, netdev@vger.kernel.org, jpettit@nicira.com,
	simon.horman@netronome.com, marcelo.leitner@gmail.com,
	Vlad Buslov, Jiri Pirko, Roi Dayan, Yossi Kuperman, Rony Efraim,
	Oz Shlomo
In-Reply-To: <20190821.205735.2069656948701231785.davem@davemloft.net>

On 8/22/2019 6:57 AM, David Miller wrote:

> From: Paul Blakey <paulb@mellanox.com>
> Date: Tue, 20 Aug 2019 15:30:51 +0300
>
>> @@ -4050,6 +4060,9 @@ enum skb_ext_id {
>>   #ifdef CONFIG_XFRM
>>   	SKB_EXT_SEC_PATH,
>>   #endif
>> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
>> +	TC_SKB_EXT,
>> +#endif
>>   	SKB_EXT_NUM, /* must be last */
>>   };
> Sorry, no.
>
> The SKB extensions are not a dumping ground for people to use when they can't
> figure out another more reasonable place to put their values.  Try to use
> the normal cb[], and if you can't you must explain in exhaustive detail
> why you cannot in any way whatsoever make that work.
>
> Again, SKB extensions are not a dumping ground.
Hi,
The general context of this skb extension patch is hardware offload of 
multi chain rules.
This patch shows only one usage of this extension which is by tc -> OvS 
miss path.
But we also plan to reuse this extension to pass information from 
HW/Driver -> tc miss path.

In tc multi chain rules scenarios, some of the rules might be offloaded
and some not (e.g skip_hw, unsupported rules by HW, vxlan encapsulation, 
offload order, etc).
Therefore, HW can miss at any point of the processing chain.
SW will need to continue processing in correct tc chain where the HW 
left off, as HW might have modified the packet and updated stats for it.
This scenario can reuse this tc SKB extension to restore the tc chain.

Skb control block acts a scratchpad area for storing temporary 
information and isn't suppose
to be used to pass around information between different layers of 
processing.
HW/Driver -> tc - >OvS  are different layers, and not necessarily 
processing the packet one after another.
There can be bridges, tunnel devices, VLAN devices, Netfilter 
(Conntrack) and a host of other entities processing the packet in between
so we can't guarantee the control block integrity between this main 
processing entities (HW/Driver, Tc, Ovs).
So if we'll use the control block, it will restrict such use cases.
For example, the napi API which we use, uses the control block and comes 
right after our driver layer.
This will overwrite any usage of CB by us.

Thanks,
Paul B.

^ permalink raw reply

* Re: [PATCHv2 1/1] net: rds: add service level support in rds-info
From: Zhu Yanjun @ 2019-08-25 14:11 UTC (permalink / raw)
  To: David Miller
  Cc: santosh.shilimkar, netdev, linux-rdma, rds-devel, gerd.rausch
In-Reply-To: <20190824.165851.1817456673626840850.davem@davemloft.net>


On 2019/8/25 7:58, David Miller wrote:
> From: Zhu Yanjun <yanjun.zhu@oracle.com>
> Date: Fri, 23 Aug 2019 21:04:16 -0400
>
>> diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
>> index fd6b5f6..cba368e 100644
>> --- a/include/uapi/linux/rds.h
>> +++ b/include/uapi/linux/rds.h
>> @@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
>>   	__u32		rdma_mr_max;
>>   	__u32		rdma_mr_size;
>>   	__u8		tos;
>> +	__u8		sl;
>>   	__u32		cache_allocs;
>>   };
> I'm applying this, but I am once again severely disappointed in how
> RDS development is being handled.
>
> >From the Fixes: commit:
>
> 	Since rds.h in rds-tools is not related with the kernel rds.h,
> 	the change in kernel rds.h does not affect rds-tools.
>
> This is the height of arrogance and shows a lack of understanding of
> what user ABI requirements are all about.
>
> It is possible for other userland components to be built by other
> people, outside of your controlled eco-system and tools, that use
> these interfaces.
>
> And you cannot control that.
>
> Therefore you cannot make arbitrary changes to UABI data strucures
> just because the tool you use and maintain is not effected by it.
>
> Please stop making these incredibly incompatible user interface
> changes in the RDS stack.
>
> I am, from this point forward, going to be extra strict on RDS stack
> changes especially in this area.

OK. It is up to you to decide to merge this commit or not.

Zhu Yanjun

>
>

^ permalink raw reply

* tx_fixup cdc_ether to mimic cdc_ncm tx behavior
From: Roee Kashi @ 2019-08-25 13:17 UTC (permalink / raw)
  To: netdev

Hi,

I ported from Intel based modem chipset (cdc_ncm) to a Qualcomm's one
(cdc_ether), and encountered a major difference between the two.

cdc_ncm had a nice "feature" (which probably wasn't the original
purpose): when trying to transmit more than the module's capacity,
tx_fixup would return NULL skb, hence fd buffer would remain full,
causing sendto/select to block until modem is available.
It's quite useful when sending UDP datagrams through an LTE link for
example, since the module provides a dynamic and reliable information
in real-time regarding its *incapability* of sending the datagram.

For example:
If my LTE link max upload bandwidth is 30Mbps, and i'll try with
cdc_ncm to transmit above that, the send/select would block until
modem is available, so the actual bandwidth would be 30Mbps with ~0%
packet loss.
with cdc_ncm: `iperf -u -c xxx -b 60Mbps` would report a TX bandwidth
~30Mbps with ~0% loss.

with cdc_ether, even though the modem is unable to transmit the
packet, nothing holds the tx flow: select continue and return fd as
available for tx, even though the modem's buffer is full.
with cdc_ether: `iperf -u -c xxx -b 60Mbps` would report a TX
bandwidth ~60Mbps with ~50% loss.

the difference between cdc_ncm and cdc_ether for this matter, is the
'cdc_ncm_tx_fixup' in cdc_ncm, documented as:
 /*
* The Ethernet API we are using does not support transmitting
* multiple Ethernet frames in a single call. This driver will
* accumulate multiple Ethernet frames and send out a larger
* USB frame when the USB buffer is full or when a single jiffies
* timeout happens.
*/

This fixup adds this useful side-effect to cdc_ncm, and I wonder how
to extend this specific behavior to cdc_ether as well, per flag.
What exactly in cdc_ncm: cdc_ncm_fill_tx_frame, causing this behavior,
and what is the community approach about adopting the described
cdc_ncm behavior?

(qmi_wwan behaves the same as cdc_ether)

Cheers,
Roee.

^ permalink raw reply

* Re: [PATCH net-next v2 0/3] net: dsa: mt7530: Convert to PHYLINK and add support for port 5
From: René van Dorst @ 2019-08-25 13:15 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Sean Wang, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S . Miller, Matthias Brugger, Frank Wunderlich, netdev,
	linux-mips, linux-mediatek, John Crispin, linux-arm-kernel
In-Reply-To: <20190824222935.GG13294@shell.armlinux.org.uk>

Hi Russell,

Quoting Russell King - ARM Linux admin <linux@armlinux.org.uk>:

> On Wed, Aug 21, 2019 at 04:45:44PM +0200, René van Dorst wrote:
>> 1. net: dsa: mt7530: Convert to PHYLINK API
>>    This patch converts mt7530 to PHYLINK API.
>> 2. dt-bindings: net: dsa: mt7530: Add support for port 5
>> 3. net: dsa: mt7530: Add support for port 5
>>    These 2 patches adding support for port 5 of the switch.
>>
>> v1->v2:
>>  * Mostly phylink improvements after review.
>> rfc -> v1:
>>  * Mostly phylink improvements after review.
>>  * Drop phy isolation patches. Adds no value for now.
>> René van Dorst (3):
>>   net: dsa: mt7530: Convert to PHYLINK API
>>   dt-bindings: net: dsa: mt7530: Add support for port 5
>>   net: dsa: mt7530: Add support for port 5
>>
>>  .../devicetree/bindings/net/dsa/mt7530.txt    | 218 ++++++++++
>>  drivers/net/dsa/mt7530.c                      | 371 +++++++++++++++---
>>  drivers/net/dsa/mt7530.h                      |  61 ++-
>>  3 files changed, 577 insertions(+), 73 deletions(-)
>
> Having looked through this set of patches, I don't see anything
> from the phylink point of view that concerns me.  So, for the
> series from the phylink perspective:
>
> Acked-by: Russell King <rmk+kernel@armlinux.org.uk>

Thanks and thanks for reviewing.

Greats,

René

>
> Thanks.
>
> I did notice a dev_info() in patch 3 that you may like to consider
> whether they should be printed at info level or debug level.  You
> may keep my ack on the patch when fixing that.
>
> I haven't considered whether the patch passes davem's style
> requirements for networking code; what I spotted did look like
> the declarations were upside-down christmas tree.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up




^ permalink raw reply

* Re: [PATCH net-next v2 3/3] net: dsa: mt7530: Add support for port 5
From: René van Dorst @ 2019-08-25 12:48 UTC (permalink / raw)
  To: David Miller
  Cc: sean.wang, andrew, vivien.didelot, f.fainelli, matthias.bgg,
	netdev, linux-arm-kernel, linux-mediatek, john, linux-mips,
	frank-w
In-Reply-To: <20190824.161912.1377369658338940538.davem@davemloft.net>

Hi David,

Quoting David Miller <davem@davemloft.net>:

> From: René van Dorst <opensource@vdorst.com>
> Date: Wed, 21 Aug 2019 16:45:47 +0200
>
>> +	dev_info(ds->dev, "Setup P5, HWTRAP=0x%x, intf_sel=%s, phy-mode=%s\n",
>> +		 val, p5_intf_modes(priv->p5_intf_sel), phy_modes(interface));
>
> This is debugging, at best.  Please make this a debugging message or
> remove it entirely.

I change it to a debug message.

If there is nothing else I send a new version with this change also
add the tags ack-by Russell King and tested-by Frank Wunderlich.

Greats,

René



^ permalink raw reply

* Re: [PATCH net-next v3 2/3] net: ethernet: mediatek: Re-add support SGMII
From: René van Dorst @ 2019-08-25 11:41 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Frank Wunderlich, Stefan Roese
In-Reply-To: <20190824133225.GE13294@shell.armlinux.org.uk>

Hi Russell,

Quoting Russell King - ARM Linux admin <linux@armlinux.org.uk>:

> Hi René,
>
> On Sat, Aug 24, 2019 at 01:11:17PM +0000, René van Dorst wrote:
>> Hi Russell,
>>
>> Mediatek calls it Turbo RGMII. It is a overclock version of RGMII mode.
>> It is used between first GMAC and port 6 of the mt7530 switch. Can be used
>> with
>> an internal and an external mt7530 switch.
>>
>> TRGMII speed are:
>> * mt7621: 1200Mbit
>> * mt7623: 2000Mbit and 2600Mbit.
>>
>> I think that TRGMII is only used in a fixed-link situation in combination
>> with a
>> mt7530 switch and running and maximum speed/full duplex. So reporting
>> 1000baseT_Full seems to me the right option.
>
> I think we can ignore this one for the purposes of merging this patch
> set, since this seems to be specific to this setup.  Neither 1000BaseT
> nor 1000BaseX fit very well, but we have to choose something.
>
>> PHY_INTERFACE_MODE_GMII:
>> 	  10baseT_Half
>> 	  10baseT_Full
>> 	 100baseT_Half
>> 	 100baseT_Full
>> 	1000baseT_Half
>> 	1000baseT_Full
>
> I think GMII can be connected to a PHY that can convert to 1000BaseX, so
> should probably include that here too.
>

Thanks for reviewing.
I shall add that too.

I send v4 today.

Greats,

René


> Thanks.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up




^ permalink raw reply

* Re: [PATCH 1/3] net: Add HW_BRIDGE offload feature
From: Horatiu Vultur @ 2019-08-25 10:44 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, roopa, nikolay, davem, UNGLinuxDriver,
	alexandre.belloni, allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <afde1b82-2e4c-5b93-ff31-83cb80a0f7bd@gmail.com>

The 08/23/2019 16:30, Florian Fainelli wrote:
> External E-Mail
> 
> 
> On 8/23/19 5:39 AM, Horatiu Vultur wrote:
> > The 08/22/2019 22:08, Andrew Lunn wrote:
> >> External E-Mail
> >>
> >>
> >>> +/* Determin if the SW bridge can be offloaded to HW. Return true if all
> >>> + * the interfaces of the bridge have the feature NETIF_F_HW_SWITCHDEV set
> >>> + * and have the same netdev_ops.
> >>> + */
> >>
> >> Hi Horatiu
> >>
> >> Why do you need these restrictions. The HW bridge should be able to
> >> learn that a destination MAC address can be reached via the SW
> >> bridge. The software bridge can then forward it out the correct
> >> interface.
> >>
> >> Or are you saying your hardware cannot learn from frames which come
> >> from the CPU?
> >>
> >> 	Andrew
> >>
> > Hi Andrew,
> > 
> > I do not believe that our HW can learn from frames which comes from the
> > CPU, at least not in the way they are injected today. But in case of Ocelot
> > (and the next chip we are working on), we have other issues in mixing with
> > foreign interfaces which is why we have the check in
> > ocelot_netdevice_dev_check.
> > 
> > More important, as we responded to Nikolay, we properly introduced this
> > restriction for the wrong reasons.
> > 
> > In SW bridge I will remove all these restrictions and only set ports in
> > promisc mode only if NETIF_F_HW_BRIDGE is not set.
> > Then in the network driver I can see if a foreign interface is added to
> > the bridge, and when that happens I can set the port in promisc mode.
> > Then the frames will be flooded to the SW bridge which eventually will
> > send to the foreign interface.
> 
> Is that really necessary?
From what I see, it seems to be necessary.

> Is not the skb->fwd_offload_mark as well as
> the phys_switch_id supposed to tell that information to the bridge already?
Yes, the bridge is using the fwd_offload_mark to know that it should or
should not forward the frame. But in case that the network driver knows
that the SW bridge will not do anything with the frame, then there is no
point to send the frame to SW bridge just to use CPU cycles for dropping
the frame.
> -- 
> Florian

-- 
/Horatiu

^ permalink raw reply

* Re: Regresion with dsa_port_disable
From: Marek Behun @ 2019-08-25  8:35 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: Andrew Lunn, netdev
In-Reply-To: <20190824205349.GB27859@t480s.localdomain>

On Sat, 24 Aug 2019 20:53:49 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> OK I think you meant info->ops->serdes_irq_free and
> info->ops->serdes_irq_setup, otherwise it's confusing.
> 
> I think I know what's going on, I'll look into it soon.

Either dsa_port_setup is calling dsa_port_disable for
DSA_PORT_TYPE_UNUSED and this causes that, or, as line
  Workqueue: events deferred_probe_work_func
suggests, probe is deferred, dsa_tree_setup calls
dsa_tree_teardown_switches, and thus dsa_port_disable is called.
We should add a check into dsa_port_disable if that port was successfuly
enabled, or something like that.

^ permalink raw reply

* Re: [PATCH net-next] MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT
From: Sergei Shtylyov @ 2019-08-25  8:07 UTC (permalink / raw)
  To: Andrew Lunn, David Miller; +Cc: Russell King, netdev
In-Reply-To: <20190824223454.15932-1-andrew@lunn.ch>

Hello!

On 25.08.2019 1:34, Andrew Lunn wrote:

> Russell king

    King, with capital K. :-)

> maintains phylink, as part of the SFP module support.
> However, much of the review work is about drivers swapping from phylib
> to phylink. Such changes don't make changes to the phylink core, and
> so the F: rules in MAINTAINERS don't match. Add a K:, keywork rule,

    Keyword?

> which hopefully get_maintainers will match against for patches to MAC
> drivers swapping to phylink.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Marek Behun @ 2019-08-25  7:13 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, Andrew Lunn, Vivien Didelot, David Ahern,
	Stephen Hemminger, Chris Healy, Vladimir Oltean
In-Reply-To: <a7fed8ab-60f3-a30c-5634-fd89e4daf44d@gmail.com>

On Sat, 24 Aug 2019 13:04:04 -0700
Florian Fainelli <f.fainelli@gmail.com> wrote:

> Now, the 4.9 kernel behavior actually works just fine because eth1 is
> not a special interface, so no tagging is expected, and "wifi", although
> it supports DSA tagging, represents another side of the CPU/host network
> stack, so you never have to inject frames into the switch, because you
> can use eth1 to do that and let MAC learning do its job to forward to
> the correct port of the switch.

Hi Florian,

Sorry, I am having trouble understanding what you mean in the
paragraph I quoted above (and paragraphs afterwards).

eth0 and eth1 are interfaces created by an ethernet driver.
wlan0 is an interface created by wireless driver.
wifi is a slave interface created by DSA for port 5 on the switch.
eth1 is DSA slave or a DSA master connected to port 5?

How does DSA handle two interfaces with same reg property?

Marek

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Marek Behun @ 2019-08-25  4:19 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Vivien Didelot, Florian Fainelli, David Ahern,
	Stephen Hemminger
In-Reply-To: <20190824152407.GA8251@lunn.ch>

On Sat, 24 Aug 2019 17:24:07 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> That is a new idea. Interesting.
> 
> I would like to look around and see what else uses this "lan1@eth0"
> concept. We need to ensure it is not counter intuitive in general,
> when you consider all possible users.

There are not many users of ndo_get_iflink besides DSA slave:
  ip6_gre, ip6_vti, sit, ip6_tunnel
  ip_gte, ip_vti, ipmr, ipip,
  macsec, macvlan, veth
  ipvlan
  ipoib
and a few other. What these have in common is that all these interfaces
are linked somehow to another interfacem, ie. a macvlan interface eth0.1
is linked to it's master interface eth0. All of these are virtual
interfaces.

Marek

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Marek Behun @ 2019-08-25  4:08 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, Andrew Lunn, Vivien Didelot, David Ahern,
	Stephen Hemminger, Chris Healy, Vladimir Oltean
In-Reply-To: <20190824230121.35a3d59b@nic.cz>

On Sat, 24 Aug 2019 23:01:21 +0200
Marek Behun <marek.behun@nic.cz> wrote:

> the documentation would became weird to users.
... would become weird ...
> 
> We are *already* using the iflink property to report which CPU device
> is used as CPU destination port for a given switch slave interface. So
> why to use that for changing this, also?
... why NOT to use that for chaning this also?
> 
> If you think that iflink should not be used for this, and other agree,
... and others agree with you,

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox