Netdev List
 help / color / mirror / Atom feed
* regression with napi/softirq ?
From: Sudip Mukherjee @ 2019-07-17 20:19 UTC (permalink / raw)
  To: peterz, davem; +Cc: netdev, linux-kernel

Hi All,

I am using v4.14.55 on an Intel Atom based board and I am seeing network
packet drops frequently on wireshark logs. After lots of debugging it
seems that when this happens softirq is taking huge time to start after
it has been raised. This is a small snippet from ftrace:

           <...>-2110  [001] dNH1   466.634916: irq_handler_entry: irq=126 name=eth0-TxRx-0
           <...>-2110  [001] dNH1   466.634917: softirq_raise: vec=3 [action=NET_RX]
           <...>-2110  [001] dNH1   466.634918: irq_handler_exit: irq=126 ret=handled
     ksoftirqd/1-15    [001] ..s.   466.635826: softirq_entry: vec=3 [action=NET_RX]
     ksoftirqd/1-15    [001] ..s.   466.635852: softirq_exit: vec=3 [action=NET_RX]
     ksoftirqd/1-15    [001] d.H.   466.635856: irq_handler_entry: irq=126 name=eth0-TxRx-0
     ksoftirqd/1-15    [001] d.H.   466.635857: softirq_raise: vec=3 [action=NET_RX]
     ksoftirqd/1-15    [001] d.H.   466.635858: irq_handler_exit: irq=126 ret=handled
     ksoftirqd/1-15    [001] ..s.   466.635860: softirq_entry: vec=3 [action=NET_RX]
     ksoftirqd/1-15    [001] ..s.   466.635863: softirq_exit: vec=3 [action=NET_RX]

So, softirq was raised at 466.634917 but it started at 466.635826 almost
909 usec after it was raised.

If I move back to v4.4 kernel I still see similar behaviour but the maximum
delay I get is in the range of 500usec. But if I move back to v3.8 kernel I
can see there is no packet loss and the maximum delay between softirq_raise
and irq_handler_entry is 103usec.

Is this a known issue?
Will really appreciate your help in this problem.


--
Regards
Sudip


^ permalink raw reply

* [PATCH net 0/2] mlxsw: Two fixes
From: Ido Schimmel @ 2019-07-17 20:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

This patchset contains two fixes for mlxsw.

Patch #1 from Petr fixes an issue in which DSCP rewrite can occur even
if the egress port was switched to Trust L2 mode where priority mapping
is based on PCP.

Patch #2 fixes a problem where packets can be learned on a non-existing
FID if a tc filter with a redirect action is configured on a bridged
port. The problem and fix are explained in detail in the commit message.

Please consider both patches for 5.2.y

Ido Schimmel (1):
  mlxsw: spectrum: Do not process learned records with a dummy FID

Petr Machata (1):
  mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed

 drivers/net/ethernet/mellanox/mlxsw/spectrum.h   |  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_dcb.c   | 16 ++++++++--------
 .../net/ethernet/mellanox/mlxsw/spectrum_fid.c   | 10 ++++++++++
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c |  6 ++++++
 4 files changed, 25 insertions(+), 8 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [PATCH net 2/2] mlxsw: spectrum: Do not process learned records with a dummy FID
From: Ido Schimmel @ 2019-07-17 20:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, mlxsw, Ido Schimmel
In-Reply-To: <20190717202908.1547-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

The switch periodically sends notifications about learned FDB entries.
Among other things, the notification includes the FID (Filtering
Identifier) and the port on which the MAC was learned.

In case the driver does not have the FID defined on the relevant port,
the following error will be periodically generated:

mlxsw_spectrum2 0000:06:00.0 swp32: Failed to find a matching {Port, VID} following FDB notification

This is not supposed to happen under normal conditions, but can happen
if an ingress tc filter with a redirect action is installed on a bridged
port. The redirect action will cause the packet's FID to be changed to
the dummy FID and a learning notification will be emitted with this FID
- which is not defined on the bridged port.

Fix this by having the driver ignore learning notifications generated
with the dummy FID and delete them from the device.

Another option is to chain an ignore action after the redirect action
which will cause the device to disable learning, but this means that we
need to consume another action whenever a redirect action is used. In
addition, the scenario described above is merely a corner case.

Fixes: cedbb8b25948 ("mlxsw: spectrum_flower: Set dummy FID before forward action")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h         |  1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c     | 10 ++++++++++
 .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c   |  6 ++++++
 3 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a252b080dda9..131f62ce9297 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -830,6 +830,7 @@ int mlxsw_sp_setup_tc_prio(struct mlxsw_sp_port *mlxsw_sp_port,
 			   struct tc_prio_qopt_offload *p);
 
 /* spectrum_fid.c */
+bool mlxsw_sp_fid_is_dummy(struct mlxsw_sp *mlxsw_sp, u16 fid_index);
 bool mlxsw_sp_fid_lag_vid_valid(const struct mlxsw_sp_fid *fid);
 struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_index(struct mlxsw_sp *mlxsw_sp,
 						  u16 fid_index);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index 46baf3b44309..8df3cb21baa6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -126,6 +126,16 @@ static const int *mlxsw_sp_packet_type_sfgc_types[] = {
 	[MLXSW_SP_FLOOD_TYPE_MC]	= mlxsw_sp_sfgc_mc_packet_types,
 };
 
+bool mlxsw_sp_fid_is_dummy(struct mlxsw_sp *mlxsw_sp, u16 fid_index)
+{
+	enum mlxsw_sp_fid_type fid_type = MLXSW_SP_FID_TYPE_DUMMY;
+	struct mlxsw_sp_fid_family *fid_family;
+
+	fid_family = mlxsw_sp->fid_core->fid_family_arr[fid_type];
+
+	return fid_family->start_index == fid_index;
+}
+
 bool mlxsw_sp_fid_lag_vid_valid(const struct mlxsw_sp_fid *fid)
 {
 	return fid->fid_family->lag_vid_valid;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 50111f228d77..5ecb45118400 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2468,6 +2468,9 @@ static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp,
 		goto just_remove;
 	}
 
+	if (mlxsw_sp_fid_is_dummy(mlxsw_sp, fid))
+		goto just_remove;
+
 	mlxsw_sp_port_vlan = mlxsw_sp_port_vlan_find_by_fid(mlxsw_sp_port, fid);
 	if (!mlxsw_sp_port_vlan) {
 		netdev_err(mlxsw_sp_port->dev, "Failed to find a matching {Port, VID} following FDB notification\n");
@@ -2527,6 +2530,9 @@ static void mlxsw_sp_fdb_notify_mac_lag_process(struct mlxsw_sp *mlxsw_sp,
 		goto just_remove;
 	}
 
+	if (mlxsw_sp_fid_is_dummy(mlxsw_sp, fid))
+		goto just_remove;
+
 	mlxsw_sp_port_vlan = mlxsw_sp_port_vlan_find_by_fid(mlxsw_sp_port, fid);
 	if (!mlxsw_sp_port_vlan) {
 		netdev_err(mlxsw_sp_port->dev, "Failed to find a matching {Port, VID} following FDB notification\n");
-- 
2.21.0


^ permalink raw reply related

* [PATCH net 1/2] mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed
From: Ido Schimmel @ 2019-07-17 20:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, mlxsw, Ido Schimmel
In-Reply-To: <20190717202908.1547-1-idosch@idosch.org>

From: Petr Machata <petrm@mellanox.com>

Spectrum systems use DSCP rewrite map to update DSCP field in egressing
packets to correspond to priority that the packet has. Whether rewriting
will take place is determined at the point when the packet ingresses the
switch: if the port is in Trust L3 mode, packet priority is determined from
the DSCP map at the port, and DSCP rewrite will happen. If the port is in
Trust L2 mode, 802.1p is used for packet prioritization, and no DSCP
rewrite will happen.

The driver determines the port trust mode based on whether any DSCP
prioritization rules are in effect at given port. If there are any, trust
level is L3, otherwise it's L2. When the last DSCP rule is removed, the
port is switched to trust L2. Under that scenario, if DSCP of a packet
should be rewritten, it should be rewritten to 0.

However, when switching to Trust L2, the driver neglects to also update the
DSCP rewrite map. The last DSCP rule thus remains in effect, and packets
egressing through this port, if they have the right priority, will have
their DSCP set according to this rule.

Fix by first configuring the rewrite map, and only then switching to trust
L2 and bailing out.

Fixes: b2b1dab6884e ("mlxsw: spectrum: Support ieee_setapp, ieee_delapp")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_dcb.c   | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
index b25048c6c761..21296fa7f7fb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
@@ -408,14 +408,6 @@ static int mlxsw_sp_port_dcb_app_update(struct mlxsw_sp_port *mlxsw_sp_port)
 	have_dscp = mlxsw_sp_port_dcb_app_prio_dscp_map(mlxsw_sp_port,
 							&prio_map);
 
-	if (!have_dscp) {
-		err = mlxsw_sp_port_dcb_toggle_trust(mlxsw_sp_port,
-					MLXSW_REG_QPTS_TRUST_STATE_PCP);
-		if (err)
-			netdev_err(mlxsw_sp_port->dev, "Couldn't switch to trust L2\n");
-		return err;
-	}
-
 	mlxsw_sp_port_dcb_app_dscp_prio_map(mlxsw_sp_port, default_prio,
 					    &dscp_map);
 	err = mlxsw_sp_port_dcb_app_update_qpdpm(mlxsw_sp_port,
@@ -432,6 +424,14 @@ static int mlxsw_sp_port_dcb_app_update(struct mlxsw_sp_port *mlxsw_sp_port)
 		return err;
 	}
 
+	if (!have_dscp) {
+		err = mlxsw_sp_port_dcb_toggle_trust(mlxsw_sp_port,
+					MLXSW_REG_QPTS_TRUST_STATE_PCP);
+		if (err)
+			netdev_err(mlxsw_sp_port->dev, "Couldn't switch to trust L2\n");
+		return err;
+	}
+
 	err = mlxsw_sp_port_dcb_toggle_trust(mlxsw_sp_port,
 					     MLXSW_REG_QPTS_TRUST_STATE_DSCP);
 	if (err) {
-- 
2.21.0


^ permalink raw reply related

* Re: [Patch net v2] net_sched: unset TCQ_F_CAN_BYPASS when adding filters
From: David Miller @ 2019-07-17 20:34 UTC (permalink / raw)
  To: edumazet; +Cc: xiyou.wangcong, netdev
In-Reply-To: <CANn89iKKidoeN8fwZLcem8BRK1FTJptwfYkO3Jn61ya7PKQLJA@mail.gmail.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 17 Jul 2019 21:19:51 +0200

> On Wed, Jul 17, 2019 at 9:04 PM David Miller <davem@davemloft.net> wrote:
>>
>> From: Cong Wang <xiyou.wangcong@gmail.com>
>> Date: Tue, 16 Jul 2019 13:57:30 -0700
>>
>> > For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
>> > notably fq_codel, it makes no sense to let packets bypass the TC
>> > filters we setup in any scenario, otherwise our packets steering
>> > policy could not be enforced.
>>  ...
>>
>> Eric I think your feedback was addressed, please review to confirm.
> 
> Yes, this seems good to me, thanks.
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Great, applied and queued up for -stable.

^ permalink raw reply

* [PATCH net] ipv6: Unlink sibling route in case of failure
From: Ido Schimmel @ 2019-07-17 20:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, alexpe, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

When a route needs to be appended to an existing multipath route,
fib6_add_rt2node() first appends it to the siblings list and increments
the number of sibling routes on each sibling.

Later, the function notifies the route via call_fib6_entry_notifiers().
In case the notification is vetoed, the route is not unlinked from the
siblings list, which can result in a use-after-free.

Fix this by unlinking the route from the siblings list before returning
an error.

Audited the rest of the call sites from which the FIB notification chain
is called and could not find more problems.

Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
---
Dave, this will not apply cleanly to stable trees due to recent changes
in net-next. I can prepare another patch for stable if needed.
---
 net/ipv6/ip6_fib.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 49884f96232b..87f47bc55c5e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1151,8 +1151,24 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 			err = call_fib6_entry_notifiers(info->nl_net,
 							FIB_EVENT_ENTRY_ADD,
 							rt, extack);
-			if (err)
+			if (err) {
+				struct fib6_info *sibling, *next_sibling;
+
+				/* If the route has siblings, then it first
+				 * needs to be unlinked from them.
+				 */
+				if (!rt->fib6_nsiblings)
+					return err;
+
+				list_for_each_entry_safe(sibling, next_sibling,
+							 &rt->fib6_siblings,
+							 fib6_siblings)
+					sibling->fib6_nsiblings--;
+				rt->fib6_nsiblings = 0;
+				list_del_init(&rt->fib6_siblings);
+				rt6_multipath_rebalance(next_sibling);
 				return err;
+			}
 		}
 
 		rcu_assign_pointer(rt->fib6_next, iter);
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf] bpf: fix narrower loads on s390
From: Ilya Leoshkevich @ 2019-07-17 20:51 UTC (permalink / raw)
  To: Y Song; +Cc: bpf, netdev, gor, heiko.carstens
In-Reply-To: <CAH3MdRV-qsJnyZVV1GnxRZ4=3KXTvKSgETp90fyevxycmAiHmA@mail.gmail.com>

> Am 17.07.2019 um 18:25 schrieb Y Song <ys114321@gmail.com>:
> 
> On Wed, Jul 17, 2019 at 3:36 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>> 
>> 
>> Here is a better one: len=0x11223344 and we would like to do
>> ((u8 *)&len)[3].
>> 
>> len is represented as `11 22 33 44` in memory, so the desired result is
>> 0x44. It can be obtained by doing (*(u32 *)&len) & 0xff, but today the
>> verifier does ((*(u32 *)&len) >> 24) & 0xff instead.
> 
> What you described above for the memory layout all makes sense.
> The root cause is for big endian, we should do *((u8 *)&len + 3).
> This is exactly what macros in test_pkt_md_access.c tries to do.
> 
> if  __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> #define TEST_FIELD(TYPE, FIELD, MASK)                                   \
>        {                                                               \
>                TYPE tmp = *(volatile TYPE *)&skb->FIELD;               \
>                if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))   \
>                        return TC_ACT_SHOT;                             \
>        }
> #else
> #define TEST_FIELD_OFFSET(a, b) ((sizeof(a) - sizeof(b)) / sizeof(b))
> #define TEST_FIELD(TYPE, FIELD, MASK)                                   \
>        {                                                               \
>                TYPE tmp = *((volatile TYPE *)&skb->FIELD +             \
>                              TEST_FIELD_OFFSET(skb->FIELD, TYPE));     \
>                if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))   \
>                        return TC_ACT_SHOT;                             \
>        }
> #endif
> 
> Could you check whether your __BYTE_ORDER__ is set
> correctly or not for this case? You may need to tweak Makefile
> if you are doing cross compilation, I am not sure how as I
> did not have environment.

I’m building natively on s390.

Here is the (formatted) preprocessed C code for the first condition:

{
	__u8 tmp = *((volatile __u8 *)&skb->len +
		((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
	if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
};

So I believe the endianness is chosen correctly.

Here is the clang-generated BPF bytecode for the first condition:

# llvm-objdump -d test_pkt_md_access.o
0000000000000000 process:
       0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
       1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
       2:	57 30 00 00 00 00 00 ff	r3 &= 255
       3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 <LBB0_10>

This also looks good to me.

Finally, here is the verifier-generated BPF bytecode:

# bpftool prog dump xlated id 14
; TEST_FIELD(__u8,  len, 0xFF);
   0: (61) r2 = *(u32 *)(r1 +104)
   1: (bc) w2 = w2
   2: (74) w2 >>= 24
   3: (bc) w2 = w2
   4: (54) w2 &= 255
   5: (bc) w2 = w2

Here we can see the shift that I'm referring to. I believe we should
translate *(u8 *)(r1 + 3) in this case without this shift on big-endian
machines.

Best regards,
Ilya

^ permalink raw reply

* Re: regression with napi/softirq ?
From: Thomas Gleixner @ 2019-07-17 20:53 UTC (permalink / raw)
  To: Sudip Mukherjee; +Cc: peterz, davem, netdev, linux-kernel
In-Reply-To: <20190717201925.fur57qfs2x3ha6aq@debian>

On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> I am using v4.14.55 on an Intel Atom based board and I am seeing network
> packet drops frequently on wireshark logs. After lots of debugging it
> seems that when this happens softirq is taking huge time to start after
> it has been raised. This is a small snippet from ftrace:
> 
>            <...>-2110  [001] dNH1   466.634916: irq_handler_entry: irq=126 name=eth0-TxRx-0
>            <...>-2110  [001] dNH1   466.634917: softirq_raise: vec=3 [action=NET_RX]
>            <...>-2110  [001] dNH1   466.634918: irq_handler_exit: irq=126 ret=handled
>      ksoftirqd/1-15    [001] ..s.   466.635826: softirq_entry: vec=3 [action=NET_RX]
>      ksoftirqd/1-15    [001] ..s.   466.635852: softirq_exit: vec=3 [action=NET_RX]
>      ksoftirqd/1-15    [001] d.H.   466.635856: irq_handler_entry: irq=126 name=eth0-TxRx-0
>      ksoftirqd/1-15    [001] d.H.   466.635857: softirq_raise: vec=3 [action=NET_RX]
>      ksoftirqd/1-15    [001] d.H.   466.635858: irq_handler_exit: irq=126 ret=handled
>      ksoftirqd/1-15    [001] ..s.   466.635860: softirq_entry: vec=3 [action=NET_RX]
>      ksoftirqd/1-15    [001] ..s.   466.635863: softirq_exit: vec=3 [action=NET_RX]
> 
> So, softirq was raised at 466.634917 but it started at 466.635826 almost
> 909 usec after it was raised.

This is a situation where the network softirq decided to delegate softirq
processing to ksoftirqd. That happens when too much work is available while
processing softirqs on return from interrupt.

That means that softirq processing happens under scheduler control. So if
there are other runnable tasks on the same CPU ksoftirqd can be delayed
until their time slice expired. As a consequence ksoftirqd might not be
able to catch up with the incoming packet flood and the NIC starts to drop.

You can hack ksoftirq_running() to return always false to avoid this, but
that might cause application starvation and a huge packet buffer backlog
when the amount of incoming packets makes the CPU do nothing else than
softirq processing.

Thanks,

	tglx




^ permalink raw reply

* phylink: flow control on fixed-link not working.
From: René van Dorst @ 2019-07-17 21:31 UTC (permalink / raw)
  To: netdev; +Cc: Russell King

Hi,

I am trying to enable flow control/pause on PHYLINK and fixed-link.

My setup SOC mac (mt7621) <-> RGMII <-> SWITCH mac (mt7530).

It seems that in fixed-link mode all the flow control/pause bits are  
cleared in
phylink_parse_fixedlink(). If I read phylink_parse_fixedlink() [0] correctly,
I see that pl->link_config.advertising is AND with pl->supprted which has only
the PHY_SETTING() modes bits set. pl->link_config.advertising is losing Pause
bits. pl->link_config.advertising is used in phylink_resolve_flow() to set the
MLO_PAUSE_RX/TX BITS.

I think this is an error.
Because in phylink_start() see this part [1].

  /* Apply the link configuration to the MAC when starting. This allows
   * a fixed-link to start with the correct parameters, and also
   * ensures that we set the appropriate advertisement for Serdes links.
   */
  phylink_resolve_flow(pl, &pl->link_config);
  phylink_mac_config(pl, &pl->link_config);


If I add a this hacky patch below, flow control is enabled on the fixed-link.
         if (s) {
                 __set_bit(s->bit, pl->supported);
+               if (phylink_test(pl->link_config.advertising, Pause))
+                       phylink_set(pl->supported, Pause);
         } else {

So is phylink_parse_fixedlink() broken or should it handled in a other way?

Greats,

René

[0]:  
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phylink.c#L196
[1]:  
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phylink.c#L897


^ permalink raw reply

* Re: regression with napi/softirq ?
From: Sudip Mukherjee @ 2019-07-17 21:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra (Intel), David S. Miller, netdev, linux-kernel
In-Reply-To: <alpine.DEB.2.21.1907172238490.1778@nanos.tec.linutronix.de>

Hi,

On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> > I am using v4.14.55 on an Intel Atom based board and I am seeing network
> > packet drops frequently on wireshark logs. After lots of debugging it
> > seems that when this happens softirq is taking huge time to start after
> > it has been raised. This is a small snippet from ftrace:
> >
> >            <...>-2110  [001] dNH1   466.634916: irq_handler_entry: irq=126 name=eth0-TxRx-0
> >            <...>-2110  [001] dNH1   466.634917: softirq_raise: vec=3 [action=NET_RX]
> >            <...>-2110  [001] dNH1   466.634918: irq_handler_exit: irq=126 ret=handled
> >      ksoftirqd/1-15    [001] ..s.   466.635826: softirq_entry: vec=3 [action=NET_RX]
> >      ksoftirqd/1-15    [001] ..s.   466.635852: softirq_exit: vec=3 [action=NET_RX]
> >      ksoftirqd/1-15    [001] d.H.   466.635856: irq_handler_entry: irq=126 name=eth0-TxRx-0
> >      ksoftirqd/1-15    [001] d.H.   466.635857: softirq_raise: vec=3 [action=NET_RX]
> >      ksoftirqd/1-15    [001] d.H.   466.635858: irq_handler_exit: irq=126 ret=handled
> >      ksoftirqd/1-15    [001] ..s.   466.635860: softirq_entry: vec=3 [action=NET_RX]
> >      ksoftirqd/1-15    [001] ..s.   466.635863: softirq_exit: vec=3 [action=NET_RX]
> >
> > So, softirq was raised at 466.634917 but it started at 466.635826 almost
> > 909 usec after it was raised.
>
> This is a situation where the network softirq decided to delegate softirq
> processing to ksoftirqd. That happens when too much work is available while
> processing softirqs on return from interrupt.
>
> That means that softirq processing happens under scheduler control. So if
> there are other runnable tasks on the same CPU ksoftirqd can be delayed
> until their time slice expired. As a consequence ksoftirqd might not be
> able to catch up with the incoming packet flood and the NIC starts to drop.

Yes, and I see in the ftrace that there are many other userspace processes
getting scheduled in that time.

>
> You can hack ksoftirq_running() to return always false to avoid this, but
> that might cause application starvation and a huge packet buffer backlog
> when the amount of incoming packets makes the CPU do nothing else than
> softirq processing.

I tried that now, it is better but still not as good as v3.8
Now I am getting 375.9usec as the maximum time between raising the softirq
and it starting to execute and packet drops still there.

And just a thought, do you think there should be a CONFIG_ option for
this feature
of ksoftirqd_running() so that it can be disabled if needed by users like us?

Can you please think of anything else that might have changed which I still need
to change to make the time comparable to v3.8..


-- 
Regards
Sudip

^ permalink raw reply

* Re: [PATCH RFC 0/4] Add support to directly attach BPF program to ftrace
From: Alexei Starovoitov @ 2019-07-17 21:40 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, bpf, Daniel Borkmann, Android Kernel Team,
	Network Development, Steven Rostedt
In-Reply-To: <20190717130119.GA138030@google.com>

On Wed, Jul 17, 2019 at 6:01 AM Joel Fernandes <joel@joelfernandes.org> wrote:

I trimmed cc. some emails were bouncing.

> > I think allowing one tracepoint and disallowing another is pointless
> > from security point of view. Tracing bpf program can do bpf_probe_read
> > of anything.
>
> I think the assumption here is the user controls the program instructions at
> runtime, but that's not the case. The BPF program we are loading is not
> dynamically generated, it is built at build time and it is loaded from a
> secure verified partition, so even though it can do bpf_probe_read, it is
> still not something that the user can change.

so you're saying that by having a set of signed bpf programs which
instructions are known to be non-malicious and allowed set of tracepoints
to attach via selinux whitelist, such setup will be safe?
Have you considered how mix and match will behave?

> And, we are planning to make it
> even more secure by making it kernel verify the program at load time as well
> (you were on some discussions about that a few months ago).

It sounds like api decisions for this sticky raw_tp feature are
driven by security choices which are not actually secure.
I'm suggesting to avoid bringing up point of security as a reason for
this api design, since it's making the opposite effect.

^ permalink raw reply

* [Patch net v3 0/2] ipv4: relax source validation check for loopback packets
From: Cong Wang @ 2019-07-17 21:41 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, Cong Wang

This patchset fixes a corner case when loopback packets get dropped
by rp_filter when we route them from veth to lo. Patch 1 is the fix
and patch 2 provides a simplified test case for this scenario.

Cong Wang (2):
  fib: relax source validation check for loopback packets
  selftests: add a test case for rp_filter

---
v3: use dummy1 instead of dummy0 in the test case
v2: remove a redundant if check and add a test case

 net/ipv4/fib_frontend.c                  |  5 ++++
 tools/testing/selftests/net/fib_tests.sh | 35 +++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

-- 
2.21.0


^ permalink raw reply

* [Patch net v3 2/2] selftests: add a test case for rp_filter
From: Cong Wang @ 2019-07-17 21:41 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, Cong Wang
In-Reply-To: <20190717214159.25959-1-xiyou.wangcong@gmail.com>

Add a test case to simulate the loopback packet case fixed
in the previous patch.

This test gets passed after the fix:

IPv4 rp_filter tests
    TEST: rp_filter passes local packets                                [ OK ]
    TEST: rp_filter passes loopback packets                             [ OK ]

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 tools/testing/selftests/net/fib_tests.sh | 35 +++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 9457aaeae092..4465fc2dae14 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -9,12 +9,13 @@ ret=0
 ksft_skip=4
 
 # all tests in this script. Can be overridden with -t option
-TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw"
+TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter"
 
 VERBOSE=0
 PAUSE_ON_FAIL=no
 PAUSE=no
 IP="ip -netns ns1"
+NS_EXEC="ip netns exec ns1"
 
 log_test()
 {
@@ -433,6 +434,37 @@ fib_carrier_test()
 	fib_carrier_unicast_test
 }
 
+fib_rp_filter_test()
+{
+	echo
+	echo "IPv4 rp_filter tests"
+
+	setup
+
+	set -e
+	$IP link set dev lo address 52:54:00:6a:c7:5e
+	$IP link set dummy0 address 52:54:00:6a:c7:5e
+	$IP link add dummy1 type dummy
+	$IP link set dummy1 address 52:54:00:6a:c7:5e
+	$IP link set dev dummy1 up
+	$NS_EXEC sysctl -qw net.ipv4.conf.all.rp_filter=1
+	$NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1
+	$NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1
+
+	$NS_EXEC tc qd add dev dummy1 parent root handle 1: fq_codel
+	$NS_EXEC tc filter add dev dummy1 parent 1: protocol arp basic action mirred egress redirect dev lo
+	$NS_EXEC tc filter add dev dummy1 parent 1: protocol ip basic action mirred egress redirect dev lo
+	set +e
+
+	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 198.51.100.1"
+	log_test $? 0 "rp_filter passes local packets"
+
+	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 127.0.0.1"
+	log_test $? 0 "rp_filter passes loopback packets"
+
+	cleanup
+}
+
 ################################################################################
 # Tests on nexthop spec
 
@@ -1557,6 +1589,7 @@ do
 	fib_unreg_test|unregister)	fib_unreg_test;;
 	fib_down_test|down)		fib_down_test;;
 	fib_carrier_test|carrier)	fib_carrier_test;;
+	fib_rp_filter_test|rp_filter)	fib_rp_filter_test;;
 	fib_nexthop_test|nexthop)	fib_nexthop_test;;
 	ipv6_route_test|ipv6_rt)	ipv6_route_test;;
 	ipv4_route_test|ipv4_rt)	ipv4_route_test;;
-- 
2.21.0


^ permalink raw reply related

* [Patch net v3 1/2] fib: relax source validation check for loopback packets
From: Cong Wang @ 2019-07-17 21:41 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, Cong Wang, Julian Anastasov
In-Reply-To: <20190717214159.25959-1-xiyou.wangcong@gmail.com>

In a rare case where we redirect local packets from veth to lo,
these packets fail to pass the source validation when rp_filter
is turned on, as the tracing shows:

  <...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0
  <...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130

So, the fib table lookup returns eth0 as the nexthop even though
the packets are local and should be routed to loopback nonetheless,
but they can't pass the dev match check in fib_info_nh_uses_dev()
without this patch.

It should be safe to relax this check for this special case, as
normally packets coming out of loopback device still have skb_dst
so they won't even hit this slow path.

Cc: Julian Anastasov <ja@ssi.bg>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/ipv4/fib_frontend.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 317339cd7f03..e8bc939b56dd 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -388,6 +388,11 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	fib_combine_itag(itag, &res);
 
 	dev_match = fib_info_nh_uses_dev(res.fi, dev);
+	/* This is not common, loopback packets retain skb_dst so normally they
+	 * would not even hit this slow path.
+	 */
+	dev_match = dev_match || (res.type == RTN_LOCAL &&
+				  dev == net->loopback_dev);
 	if (dev_match) {
 		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
 		return ret;
-- 
2.21.0


^ permalink raw reply related

* [net  1/1] tipc: initialize 'validated' field of received packets
From: Jon Maloy @ 2019-07-17 21:43 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

The tipc_msg_validate() function leaves a boolean flag 'validated' in
the validated buffer's control block, to avoid performing this action
more than once. However, at reception of new packets, the position of
this field may already have been set by lower layer protocols, so
that the packet is erroneously perceived as already validated by TIPC.

We fix this by initializing the said field to 'false' before performing
the initial validation.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/node.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index 324a1f9..3a5be1d 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1807,6 +1807,7 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
 	__skb_queue_head_init(&xmitq);
 
 	/* Ensure message is well-formed before touching the header */
+	TIPC_SKB_CB(skb)->validated = false;
 	if (unlikely(!tipc_msg_validate(&skb)))
 		goto discard;
 	hdr = buf_msg(skb);
-- 
2.1.4


^ permalink raw reply related

* Re: phylink: flow control on fixed-link not working.
From: Russell King - ARM Linux admin @ 2019-07-17 21:51 UTC (permalink / raw)
  To: René van Dorst; +Cc: netdev
In-Reply-To: <20190717213111.Horde.nir2D5kAJww569fjh8BZgZm@www.vdorst.com>

On Wed, Jul 17, 2019 at 09:31:11PM +0000, René van Dorst wrote:
> Hi,
> 
> I am trying to enable flow control/pause on PHYLINK and fixed-link.
> 
> My setup SOC mac (mt7621) <-> RGMII <-> SWITCH mac (mt7530).
> 
> It seems that in fixed-link mode all the flow control/pause bits are cleared
> in
> phylink_parse_fixedlink(). If I read phylink_parse_fixedlink() [0] correctly,
> I see that pl->link_config.advertising is AND with pl->supprted which has only
> the PHY_SETTING() modes bits set. pl->link_config.advertising is losing Pause
> bits. pl->link_config.advertising is used in phylink_resolve_flow() to set the
> MLO_PAUSE_RX/TX BITS.
> 
> I think this is an error.
> Because in phylink_start() see this part [1].
> 
>  /* Apply the link configuration to the MAC when starting. This allows
>   * a fixed-link to start with the correct parameters, and also
>   * ensures that we set the appropriate advertisement for Serdes links.
>   */
>  phylink_resolve_flow(pl, &pl->link_config);
>  phylink_mac_config(pl, &pl->link_config);
> 
> 
> If I add a this hacky patch below, flow control is enabled on the fixed-link.
>         if (s) {
>                 __set_bit(s->bit, pl->supported);
> +               if (phylink_test(pl->link_config.advertising, Pause))
> +                       phylink_set(pl->supported, Pause);
>         } else {
> 
> So is phylink_parse_fixedlink() broken or should it handled in a other way?

Quite simply, if the MAC says it doesn't support pause modes (i.o.w.
the validate callback clears them) then pause modes aren't supported.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: regression with napi/softirq ?
From: Thomas Gleixner @ 2019-07-17 21:52 UTC (permalink / raw)
  To: Sudip Mukherjee
  Cc: Peter Zijlstra (Intel), David S. Miller, netdev, linux-kernel
In-Reply-To: <CADVatmO_m-NYotb9Htd7gS0d2-o0DeEWeDJ1uYKE+oj_HjoN0Q@mail.gmail.com>

Sudip,

On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > You can hack ksoftirq_running() to return always false to avoid this, but
> > that might cause application starvation and a huge packet buffer backlog
> > when the amount of incoming packets makes the CPU do nothing else than
> > softirq processing.
> 
> I tried that now, it is better but still not as good as v3.8
> Now I am getting 375.9usec as the maximum time between raising the softirq
> and it starting to execute and packet drops still there.
>
> And just a thought, do you think there should be a CONFIG_ option for
> this feature of ksoftirqd_running() so that it can be disabled if needed
> by users like us?

If at all then a sysctl to allow runtime control.
 
> Can you please think of anything else that might have changed which I still need
> to change to make the time comparable to v3.8..

Something with in that small range of:

 63592 files changed, 13783320 insertions(+), 5155492 deletions(-)

:)

Seriously, that can be anything.

Can you please test with Linus' head of tree and add some more
instrumentation, so we can see what holds off softirqs from being
processed. If the ksoftirqd enforcement is disabled, then the only reason
can be a long lasting softirq disabled region. Tracing should tell.

Thanks,

	tglx

^ permalink raw reply

* Re: [Patch net v3 2/2] selftests: add a test case for rp_filter
From: David Ahern @ 2019-07-17 21:54 UTC (permalink / raw)
  To: Cong Wang, netdev
In-Reply-To: <20190717214159.25959-3-xiyou.wangcong@gmail.com>

On 7/17/19 3:41 PM, Cong Wang wrote:
> Add a test case to simulate the loopback packet case fixed
> in the previous patch.
> 
> This test gets passed after the fix:
> 
> IPv4 rp_filter tests
>     TEST: rp_filter passes local packets                                [ OK ]
>     TEST: rp_filter passes loopback packets                             [ OK ]
> 
> Cc: David Ahern <dsahern@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  tools/testing/selftests/net/fib_tests.sh | 35 +++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 

Thanks for adding the test

Reviewed-by: David Ahern <dsahern@gmail.com>


^ permalink raw reply

* Re: [Patch net v3 1/2] fib: relax source validation check for loopback packets
From: David Ahern @ 2019-07-17 21:59 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Julian Anastasov
In-Reply-To: <20190717214159.25959-2-xiyou.wangcong@gmail.com>

On 7/17/19 3:41 PM, Cong Wang wrote:
> In a rare case where we redirect local packets from veth to lo,
> these packets fail to pass the source validation when rp_filter
> is turned on, as the tracing shows:
> 
>   <...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0
>   <...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130
> 
> So, the fib table lookup returns eth0 as the nexthop even though
> the packets are local and should be routed to loopback nonetheless,
> but they can't pass the dev match check in fib_info_nh_uses_dev()
> without this patch.
> 
> It should be safe to relax this check for this special case, as
> normally packets coming out of loopback device still have skb_dst
> so they won't even hit this slow path.
> 
> Cc: Julian Anastasov <ja@ssi.bg>
> Cc: David Ahern <dsahern@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  net/ipv4/fib_frontend.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 

Seems ok to me.
Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* [PATCH] net: bcmgenet: use promisc for unsupported filters
From: justinpopo6 @ 2019-07-17 21:58 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, bcm-kernel-feedback-list, davem, f.fainelli,
	opendmb, Justin Chen

From: Justin Chen <justinpopo6@gmail.com>

Currently we silently ignore filters if we cannot meet the filter
requirements. This will lead to the MAC dropping packets that are
expected to pass. A better solution would be to set the NIC to promisc
mode when the required filters cannot be met.

Also correct the number of MDF filters supported. It should be 17,
not 16.

Signed-off-by: Justin Chen <justinpopo6@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 57 ++++++++++++--------------
 1 file changed, 26 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 34466b8..a2b5780 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -3083,39 +3083,42 @@ static void bcmgenet_timeout(struct net_device *dev)
 	netif_tx_wake_all_queues(dev);
 }
 
-#define MAX_MC_COUNT	16
+#define MAX_MDF_FILTER	17
 
 static inline void bcmgenet_set_mdf_addr(struct bcmgenet_priv *priv,
 					 unsigned char *addr,
-					 int *i,
-					 int *mc)
+					 int *i)
 {
-	u32 reg;
-
 	bcmgenet_umac_writel(priv, addr[0] << 8 | addr[1],
 			     UMAC_MDF_ADDR + (*i * 4));
 	bcmgenet_umac_writel(priv, addr[2] << 24 | addr[3] << 16 |
 			     addr[4] << 8 | addr[5],
 			     UMAC_MDF_ADDR + ((*i + 1) * 4));
-	reg = bcmgenet_umac_readl(priv, UMAC_MDF_CTRL);
-	reg |= (1 << (MAX_MC_COUNT - *mc));
-	bcmgenet_umac_writel(priv, reg, UMAC_MDF_CTRL);
 	*i += 2;
-	(*mc)++;
 }
 
 static void bcmgenet_set_rx_mode(struct net_device *dev)
 {
 	struct bcmgenet_priv *priv = netdev_priv(dev);
 	struct netdev_hw_addr *ha;
-	int i, mc;
+	int i, nfilter;
 	u32 reg;
 
 	netif_dbg(priv, hw, dev, "%s: %08X\n", __func__, dev->flags);
 
-	/* Promiscuous mode */
+	/* Number of filters needed */
+	nfilter = netdev_uc_count(dev) + netdev_mc_count(dev) + 2;
+
+	/*
+	 * Turn on promicuous mode for three scenarios
+	 * 1. IFF_PROMISC flag is set
+	 * 2. IFF_ALLMULTI flag is set
+	 * 3. The number of filters needed exceeds the number filters
+	 *    supported by the hardware.
+	*/
 	reg = bcmgenet_umac_readl(priv, UMAC_CMD);
-	if (dev->flags & IFF_PROMISC) {
+	if ((dev->flags & (IFF_PROMISC | IFF_ALLMULTI)) ||
+	    (nfilter > MAX_MDF_FILTER)) {
 		reg |= CMD_PROMISC;
 		bcmgenet_umac_writel(priv, reg, UMAC_CMD);
 		bcmgenet_umac_writel(priv, 0, UMAC_MDF_CTRL);
@@ -3125,32 +3128,24 @@ static void bcmgenet_set_rx_mode(struct net_device *dev)
 		bcmgenet_umac_writel(priv, reg, UMAC_CMD);
 	}
 
-	/* UniMac doesn't support ALLMULTI */
-	if (dev->flags & IFF_ALLMULTI) {
-		netdev_warn(dev, "ALLMULTI is not supported\n");
-		return;
-	}
-
 	/* update MDF filter */
 	i = 0;
-	mc = 0;
 	/* Broadcast */
-	bcmgenet_set_mdf_addr(priv, dev->broadcast, &i, &mc);
+	bcmgenet_set_mdf_addr(priv, dev->broadcast, &i);
 	/* my own address.*/
-	bcmgenet_set_mdf_addr(priv, dev->dev_addr, &i, &mc);
-	/* Unicast list*/
-	if (netdev_uc_count(dev) > (MAX_MC_COUNT - mc))
-		return;
+	bcmgenet_set_mdf_addr(priv, dev->dev_addr, &i);
 
-	if (!netdev_uc_empty(dev))
-		netdev_for_each_uc_addr(ha, dev)
-			bcmgenet_set_mdf_addr(priv, ha->addr, &i, &mc);
-	/* Multicast */
-	if (netdev_mc_empty(dev) || netdev_mc_count(dev) >= (MAX_MC_COUNT - mc))
-		return;
+	/* Unicast */
+	netdev_for_each_uc_addr(ha, dev)
+		bcmgenet_set_mdf_addr(priv, ha->addr, &i);
 
+	/* Multicast */
 	netdev_for_each_mc_addr(ha, dev)
-		bcmgenet_set_mdf_addr(priv, ha->addr, &i, &mc);
+		bcmgenet_set_mdf_addr(priv, ha->addr, &i);
+
+	/* Enable filters */
+	reg = GENMASK(MAX_MDF_FILTER - 1, MAX_MDF_FILTER - nfilter);
+	bcmgenet_umac_writel(priv, reg, UMAC_MDF_CTRL);
 }
 
 /* Set the hardware MAC address. */
-- 
2.7.4


^ permalink raw reply related

* [PATCH net] ipv6: rt6_check should return NULL if 'from' is NULL
From: David Ahern @ 2019-07-17 22:08 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, David Ahern

From: David Ahern <dsahern@gmail.com>

Paul reported that l2tp sessions were broken after the commit referenced
in the Fixes tag. Prior to this commit rt6_check returned NULL if the
rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB
entry. Restore that behavior.

Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Paul Donohue <linux-kernel@PaulSD.com>
Tested-by: Paul Donohue <linux-kernel@PaulSD.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d2e6b31a8d6..6fe3097b9ab7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2563,7 +2563,7 @@ static struct dst_entry *rt6_check(struct rt6_info *rt,
 {
 	u32 rt_cookie = 0;
 
-	if ((from && !fib6_get_cookie_safe(from, &rt_cookie)) ||
+	if (!from || !fib6_get_cookie_safe(from, &rt_cookie) ||
 	    rt_cookie != cookie)
 		return NULL;
 
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH net,v3 1/4] net: openvswitch: rename flow_stats to sw_flow_stats
From: David Miller @ 2019-07-17 22:16 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, jiri, jakub.kicinski
In-Reply-To: <20190717194248.2522-1-pablo@netfilter.org>


What is this series doing?

Where is your "0/4" cover letter which would tell us this?

Also:

> OVS compilation breaks here after this patchset since flow_stats
> structure is already defined in include/net/flow_offload.h. This patch
> is new in this batch.

You need to explain in more detail and in more context what this
means.  Does the build break when this patch is applies?  If so, why
is that OK?

I'm tossing this series until you submit it properly.

^ permalink raw reply

* Re: [PATCHv2] net: ag71xx: Add missing header
From: David Miller @ 2019-07-17 22:17 UTC (permalink / raw)
  To: rosenp; +Cc: netdev
In-Reply-To: <20190717194645.24239-1-rosenp@gmail.com>

From: Rosen Penev <rosenp@gmail.com>
Date: Wed, 17 Jul 2019 12:46:45 -0700

> ag71xx uses devm_ioremap_nocache. This fixes usage of an implicit function
> 
> Fixes: d51b6ce441d356369387d20bc1de5f2edb0ab71e
> 
> Signed-off-by: Rosen Penev <rosenp@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net 0/2] mlxsw: Two fixes
From: David Miller @ 2019-07-17 22:20 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jiri, petrm, mlxsw, idosch
In-Reply-To: <20190717202908.1547-1-idosch@idosch.org>

From: Ido Schimmel <idosch@idosch.org>
Date: Wed, 17 Jul 2019 23:29:06 +0300

> From: Ido Schimmel <idosch@mellanox.com>
> 
> This patchset contains two fixes for mlxsw.
> 
> Patch #1 from Petr fixes an issue in which DSCP rewrite can occur even
> if the egress port was switched to Trust L2 mode where priority mapping
> is based on PCP.
> 
> Patch #2 fixes a problem where packets can be learned on a non-existing
> FID if a tc filter with a redirect action is configured on a bridged
> port. The problem and fix are explained in detail in the commit message.

Series applied.

> Please consider both patches for 5.2.y

I'll queue them up for -stable, thanks.

^ permalink raw reply

* Re: [Patch net v3 0/2] ipv4: relax source validation check for loopback packets
From: David Miller @ 2019-07-17 22:23 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, dsahern
In-Reply-To: <20190717214159.25959-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 17 Jul 2019 14:41:57 -0700

> This patchset fixes a corner case when loopback packets get dropped
> by rp_filter when we route them from veth to lo. Patch 1 is the fix
> and patch 2 provides a simplified test case for this scenario.

Series applied, thanks Cong.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox