Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] sit: use dst_cache in ipip6_tunnel_xmit
From: Haishuang Yan @ 2019-07-14 13:31 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan

Same as other ip tunnel, use dst_cache in xmit action to avoid
unnecessary fib lookups.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv6/sit.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 8061089..b2ccbc4 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -900,12 +900,17 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 			   RT_TOS(tos), RT_SCOPE_UNIVERSE, IPPROTO_IPV6,
 			   0, dst, tiph->saddr, 0, 0,
 			   sock_net_uid(tunnel->net, NULL));
-	rt = ip_route_output_flow(tunnel->net, &fl4, NULL);
 
-	if (IS_ERR(rt)) {
-		dev->stats.tx_carrier_errors++;
-		goto tx_error_icmp;
+	rt = dst_cache_get_ip4(&tunnel->dst_cache, &fl4.saddr);
+	if (!rt) {
+		rt = ip_route_output_flow(tunnel->net, &fl4, NULL);
+		if (IS_ERR(rt)) {
+			dev->stats.tx_carrier_errors++;
+			goto tx_error_icmp;
+		}
+		dst_cache_set_ip4(&tunnel->dst_cache, &rt->dst, fl4.saddr);
 	}
+
 	if (rt->rt_type != RTN_UNICAST) {
 		ip_rt_put(rt);
 		dev->stats.tx_carrier_errors++;
-- 
1.8.3.1




^ permalink raw reply related

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
From: Ido Schimmel @ 2019-07-14 12:43 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch
In-Reply-To: <20190714112904.GA5082@hmswarspite.think-freely.org>

On Sun, Jul 14, 2019 at 07:29:04AM -0400, Neil Horman wrote:
> On Fri, Jul 12, 2019 at 04:52:30PM +0300, Ido Schimmel wrote:
> > On Thu, Jul 11, 2019 at 07:53:54PM -0400, Neil Horman wrote:
> > > A few things here:
> > > IIRC we don't announce individual hardware drops, drivers record them in
> > > internal structures, and they are retrieved on demand via ethtool calls, so you
> > > will either need to include some polling (probably not a very performant idea),
> > > or some sort of flagging mechanism to indicate that on the next message sent to
> > > user space you should go retrieve hw stats from a given interface.  I certainly
> > > wouldn't mind seeing this happen, but its more work than just adding a new
> > > netlink message.
> > 
> > Neil,
> > 
> > The idea of this series is to pass the dropped packets themselves to
> > user space along with metadata, such as the drop reason and the ingress
> > port. In the future more metadata could be added thanks to the
> > extensible nature of netlink.
> > 
> I had experimented with this idea previously.  Specifically I had investigated
> the possibility of creating a dummy net_device that received only dropped
> packets so that utilities like tcpdump could monitor the interface for said
> packets along with the metadata that described where they dropped.
> 
> The concern I had was, as Dave mentioned, that you would wind up with either a
> head of line blocking issue, or simply lots of lost "dropped" packets due to
> queue overflow on receive, which kind of defeated the purpose of drop monitor.
> 
> That said, I like the idea, and if we can find a way around the fact that we
> could potentially receive way more dropped packets than we could bounce back to
> userspace, it would be a major improvement.

We don't necessarily need the entire packet, so we could add an option
to truncate them and get only the headers. tc-sample does this (see man
tc-sample).

Also, packets that are dropped for the same reason and belong to the
same flow are not necessarily interesting. We could add an eBPF filter
on the netlink socket which will only allow "unique" packets to be
enqueued. Where "unique" is defined as {drop reason, 5-tuple}. In the
case of SW drops "drop reason" is instruction pointer (call site of
kfree_skb()) and in the case of HW drops it's the drop reason provided
by the device. The duplicate copies will be counted in the eBPF map.

> 
> > In v1 these packets were notified to user space as devlink events
> > and my plan for v2 is to send them as drop_monitor events, given it's an
> > existing generic netlink channel used to monitor SW drops. This will
> > allow users to listen on one netlink channel to diagnose potential
> > problems in either SW or HW (and hopefully XDP in the future).
> > 
> Yeah, I'm supportive of that.
> 
> > Please note that the packets I'm talking about are packets users
> > currently do not see. They are dropped - potentially silently - by the
> > underlying device, thereby making it hard to debug whatever issues you
> > might be experiencing in your network.
> > 
> Right I get that, you want the ability to register a listener of sorts to
> monitor drops in hardware and report that back to user space as an drop even
> with a location that (instead of being a kernel address, is a 'special location'
> representing a hardware instance.  Makes sense.  Having that be a location +
> counter tuple would make sense, but I don't think we can pass the skb itself (as
> you mention above), without seeing significant loss.

Getting the skb itself will be an option users will need to enable in
drop_monitor. If it is not enabled, then default behavior is maintained
and you only get notifications that contain the location + counter tuple
you mentioned.

> 
> > The control path that determines if these packets are even sent to the
> > CPU from the HW needs to remain in devlink for the reasons I outlined in
> > my previous reply. However, the monitoring of these drops will be over
> > drop_monitor. This is similar to what we are currently doing with
> > tc-sample, where the control path that triggers the sampling and
> > determines the sampling rate and truncation is done over rtnetlink (tc),
> > but the sampled packets are notified over the generic netlink psample
> > channel.
> > 
> > To make it more real, you can check the example of the dissected devlink
> > message that notifies the drop of a packet due to a multicast source
> > MAC: https://marc.info/?l=linux-netdev&m=156248736710238&w=2
> > 
> > I will obviously have to create another Wireshark dissector for
> > drop_monitor in order to get the same information.
> > 
> yes, Of course.
> > > Thats an interesting idea, but dropwatch certainly isn't currently setup for
> > > that kind of messaging.  It may be worth creating a v2 of the netlink protocol
> > > and really thinking out what you want to communicate.
> > 
> > I don't think we need a v2 of the netlink protocol. My current plan is
> > to extend the existing protocol with: New message type (e.g.,
> > NET_DM_CMD_HW_ALERT), new multicast group and a set of attributes to
> > encode the information that is currently encoded in the example message
> > I pasted above.
> > 
> Ok, that makes sense.  I think we already do some very rudimentary version of
> that (see trace_napi_poll_hit).  Here we check the device we receive frames on
> to see if its rx_dropped count has increased, and if it has, store that as a
> drop count in the NULL location.  Thats obviously insufficient, but I wonder if
> its worth looking at using the dm_hw_stat_delta to encode and record those event
> for sending with your new message type.

Will check.

Thanks, Neil.

^ permalink raw reply

* Re: linux-next: Fixes tag needs some work in the net tree
From: Tariq Toukan @ 2019-07-14 12:17 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, Networking, Linux Next Mailing List,
	Linux Kernel Mailing List, Saeed Mahameed
In-Reply-To: <20190714221511.7717d6de@canb.auug.org.au>



On 7/14/2019 3:15 PM, Stephen Rothwell wrote:
> Hi Tariq,
> 
> On Sun, 14 Jul 2019 07:55:48 +0000 Tariq Toukan <tariqt@mellanox.com> wrote:
>>
>> How do you think we should handle this?
> 
> Dave doesn't rebase his trees, so all you can really do is learn from
> it and not do it again :-)
> 

Sure.
My bad, used the SHA1 I had on my internal branch.

Thanks,
Tariq

^ permalink raw reply

* Re: linux-next: Fixes tag needs some work in the net tree
From: Stephen Rothwell @ 2019-07-14 12:15 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Miller, Networking, Linux Next Mailing List,
	Linux Kernel Mailing List, Saeed Mahameed
In-Reply-To: <4f524361-9ea3-7c04-736d-d14fcb498178@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 272 bytes --]

Hi Tariq,

On Sun, 14 Jul 2019 07:55:48 +0000 Tariq Toukan <tariqt@mellanox.com> wrote:
>
> How do you think we should handle this?

Dave doesn't rebase his trees, so all you can really do is learn from
it and not do it again :-)

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH] gve: Remove the exporting of gve_probe
From: Denis Efremov @ 2019-07-14 12:02 UTC (permalink / raw)
  To: Catherine Sullivan
  Cc: Denis Efremov, Sagi Shahar, Jon Olson, David S. Miller, netdev,
	linux-kernel

The function gve_probe is declared static and marked EXPORT_SYMBOL, which
is at best an odd combination. Because the function is not used outside of
the drivers/net/ethernet/google/gve/gve_main.c file it is defined in, this
commit removes the EXPORT_SYMBOL() marking.

Signed-off-by: Denis Efremov <efremov@linux.com>
---
 drivers/net/ethernet/google/gve/gve_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 24f16e3368cd..e8ee8cac2bbf 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1192,7 +1192,6 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_disable_device(pdev);
 	return -ENXIO;
 }
-EXPORT_SYMBOL(gve_probe);
 
 static void gve_remove(struct pci_dev *pdev)
 {
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
From: Neil Horman @ 2019-07-14 11:29 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch
In-Reply-To: <20190712135230.GA13108@splinter>

On Fri, Jul 12, 2019 at 04:52:30PM +0300, Ido Schimmel wrote:
> On Thu, Jul 11, 2019 at 07:53:54PM -0400, Neil Horman wrote:
> > A few things here:
> > IIRC we don't announce individual hardware drops, drivers record them in
> > internal structures, and they are retrieved on demand via ethtool calls, so you
> > will either need to include some polling (probably not a very performant idea),
> > or some sort of flagging mechanism to indicate that on the next message sent to
> > user space you should go retrieve hw stats from a given interface.  I certainly
> > wouldn't mind seeing this happen, but its more work than just adding a new
> > netlink message.
> 
> Neil,
> 
> The idea of this series is to pass the dropped packets themselves to
> user space along with metadata, such as the drop reason and the ingress
> port. In the future more metadata could be added thanks to the
> extensible nature of netlink.
> 
I had experimented with this idea previously.  Specifically I had investigated
the possibility of creating a dummy net_device that received only dropped
packets so that utilities like tcpdump could monitor the interface for said
packets along with the metadata that described where they dropped.

The concern I had was, as Dave mentioned, that you would wind up with either a
head of line blocking issue, or simply lots of lost "dropped" packets due to
queue overflow on receive, which kind of defeated the purpose of drop monitor.

That said, I like the idea, and if we can find a way around the fact that we
could potentially receive way more dropped packets than we could bounce back to
userspace, it would be a major improvement.

> In v1 these packets were notified to user space as devlink events
> and my plan for v2 is to send them as drop_monitor events, given it's an
> existing generic netlink channel used to monitor SW drops. This will
> allow users to listen on one netlink channel to diagnose potential
> problems in either SW or HW (and hopefully XDP in the future).
> 
Yeah, I'm supportive of that.

> Please note that the packets I'm talking about are packets users
> currently do not see. They are dropped - potentially silently - by the
> underlying device, thereby making it hard to debug whatever issues you
> might be experiencing in your network.
> 
Right I get that, you want the ability to register a listener of sorts to
monitor drops in hardware and report that back to user space as an drop even
with a location that (instead of being a kernel address, is a 'special location'
representing a hardware instance.  Makes sense.  Having that be a location +
counter tuple would make sense, but I don't think we can pass the skb itself (as
you mention above), without seeing significant loss.

> The control path that determines if these packets are even sent to the
> CPU from the HW needs to remain in devlink for the reasons I outlined in
> my previous reply. However, the monitoring of these drops will be over
> drop_monitor. This is similar to what we are currently doing with
> tc-sample, where the control path that triggers the sampling and
> determines the sampling rate and truncation is done over rtnetlink (tc),
> but the sampled packets are notified over the generic netlink psample
> channel.
> 
> To make it more real, you can check the example of the dissected devlink
> message that notifies the drop of a packet due to a multicast source
> MAC: https://marc.info/?l=linux-netdev&m=156248736710238&w=2
> 
> I will obviously have to create another Wireshark dissector for
> drop_monitor in order to get the same information.
> 
yes, Of course.
> > Thats an interesting idea, but dropwatch certainly isn't currently setup for
> > that kind of messaging.  It may be worth creating a v2 of the netlink protocol
> > and really thinking out what you want to communicate.
> 
> I don't think we need a v2 of the netlink protocol. My current plan is
> to extend the existing protocol with: New message type (e.g.,
> NET_DM_CMD_HW_ALERT), new multicast group and a set of attributes to
> encode the information that is currently encoded in the example message
> I pasted above.
> 
Ok, that makes sense.  I think we already do some very rudimentary version of
that (see trace_napi_poll_hit).  Here we check the device we receive frames on
to see if its rx_dropped count has increased, and if it has, store that as a
drop count in the NULL location.  Thats obviously insufficient, but I wonder if
its worth looking at using the dm_hw_stat_delta to encode and record those event
for sending with your new message type.

> Thanks
> 

^ permalink raw reply

* [PATCH] sky2: Disable MSI on P5W DH Deluxe
From: Tasos Sahanidis @ 2019-07-14 10:31 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, davem, stephen, mlindner, tasos

The onboard sky2 NICs send IRQs after S3, resulting in ethernet not
working after resume.
Maskable MSI and MSI-X are also not supported, so fall back to INTx.

Signed-off-by: Tasos Sahanidis <tasos@tasossah.com>
---
 drivers/net/ethernet/marvell/sky2.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index fe518c854..f518312ff 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -4917,6 +4917,13 @@ static const struct dmi_system_id msi_blacklist[] = {
 			DMI_MATCH(DMI_PRODUCT_NAME, "P-79"),
 		},
 	},
+	{
+		.ident = "ASUS P5W DH Deluxe",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "ASUSTEK COMPUTER INC"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "P5W DH Deluxe"),
+		},
+	},
 	{}
 };
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH net v2] net: neigh: fix multiple neigh timer scheduling
From: Lorenzo Bianconi @ 2019-07-14  8:45 UTC (permalink / raw)
  To: davem; +Cc: netdev, dsahern, marek

Neigh timer can be scheduled multiple times from userspace adding
multiple neigh entries and forcing the neigh timer scheduling passing
NTF_USE in the netlink requests.
This will result in a refcount leak and in the following dump stack:

[   32.465295] NEIGH: BUG, double timer add, state is 8
[   32.465308] CPU: 0 PID: 416 Comm: double_timer_ad Not tainted 5.2.0+ #65
[   32.465311] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[   32.465313] Call Trace:
[   32.465318]  dump_stack+0x7c/0xc0
[   32.465323]  __neigh_event_send+0x20c/0x880
[   32.465326]  ? ___neigh_create+0x846/0xfb0
[   32.465329]  ? neigh_lookup+0x2a9/0x410
[   32.465332]  ? neightbl_fill_info.constprop.0+0x800/0x800
[   32.465334]  neigh_add+0x4f8/0x5e0
[   32.465337]  ? neigh_xmit+0x620/0x620
[   32.465341]  ? find_held_lock+0x85/0xa0
[   32.465345]  rtnetlink_rcv_msg+0x204/0x570
[   32.465348]  ? rtnl_dellink+0x450/0x450
[   32.465351]  ? mark_held_locks+0x90/0x90
[   32.465354]  ? match_held_lock+0x1b/0x230
[   32.465357]  netlink_rcv_skb+0xc4/0x1d0
[   32.465360]  ? rtnl_dellink+0x450/0x450
[   32.465363]  ? netlink_ack+0x420/0x420
[   32.465366]  ? netlink_deliver_tap+0x115/0x560
[   32.465369]  ? __alloc_skb+0xc9/0x2f0
[   32.465372]  netlink_unicast+0x270/0x330
[   32.465375]  ? netlink_attachskb+0x2f0/0x2f0
[   32.465378]  netlink_sendmsg+0x34f/0x5a0
[   32.465381]  ? netlink_unicast+0x330/0x330
[   32.465385]  ? move_addr_to_kernel.part.0+0x20/0x20
[   32.465388]  ? netlink_unicast+0x330/0x330
[   32.465391]  sock_sendmsg+0x91/0xa0
[   32.465394]  ___sys_sendmsg+0x407/0x480
[   32.465397]  ? copy_msghdr_from_user+0x200/0x200
[   32.465401]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[   32.465404]  ? lockdep_hardirqs_on+0x17d/0x250
[   32.465407]  ? __wake_up_common_lock+0xcb/0x110
[   32.465410]  ? __wake_up_common+0x230/0x230
[   32.465413]  ? netlink_bind+0x3e1/0x490
[   32.465416]  ? netlink_setsockopt+0x540/0x540
[   32.465420]  ? __fget_light+0x9c/0xf0
[   32.465423]  ? sockfd_lookup_light+0x8c/0xb0
[   32.465426]  __sys_sendmsg+0xa5/0x110
[   32.465429]  ? __ia32_sys_shutdown+0x30/0x30
[   32.465432]  ? __fd_install+0xe1/0x2c0
[   32.465435]  ? lockdep_hardirqs_off+0xb5/0x100
[   32.465438]  ? mark_held_locks+0x24/0x90
[   32.465441]  ? do_syscall_64+0xf/0x270
[   32.465444]  do_syscall_64+0x63/0x270
[   32.465448]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
receiving a netlink request with NTF_USE flag set

Reported-by: Marek Majkowski <marek@cloudflare.com>
Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
---
Changes since v1:
- fix compilation errors defining neigh_event_send_check_timer routine
---
 include/net/neighbour.h | 15 ++++++++++++---
 net/core/neighbour.c    | 11 ++++++++---
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 50a67bd6a434..58bb515cea5d 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -324,7 +324,8 @@ static inline struct neighbour *neigh_create(struct neigh_table *tbl,
 	return __neigh_create(tbl, pkey, dev, true);
 }
 void neigh_destroy(struct neighbour *neigh);
-int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb);
+int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb,
+		       bool check_timer);
 int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags,
 		 u32 nlmsg_pid);
 void __neigh_set_probe_once(struct neighbour *neigh);
@@ -435,17 +436,25 @@ static inline struct neighbour * neigh_clone(struct neighbour *neigh)
 
 #define neigh_hold(n)	refcount_inc(&(n)->refcnt)
 
-static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
+static inline int neigh_event_send_check_timer(struct neighbour *neigh,
+					       struct sk_buff *skb,
+					       bool check_timer)
 {
 	unsigned long now = jiffies;
 	
 	if (neigh->used != now)
 		neigh->used = now;
 	if (!(neigh->nud_state&(NUD_CONNECTED|NUD_DELAY|NUD_PROBE)))
-		return __neigh_event_send(neigh, skb);
+		return __neigh_event_send(neigh, skb, check_timer);
 	return 0;
 }
 
+static inline int neigh_event_send(struct neighbour *neigh,
+				   struct sk_buff *skb)
+{
+	return neigh_event_send_check_timer(neigh, skb, false);
+}
+
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 static inline int neigh_hh_bridge(struct hh_cache *hh, struct sk_buff *skb)
 {
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 742cea4ce72e..0b63df6146f7 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1104,7 +1104,8 @@ static void neigh_timer_handler(struct timer_list *t)
 	neigh_release(neigh);
 }
 
-int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
+int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb,
+		       bool check_timer)
 {
 	int rc;
 	bool immediate_probe = false;
@@ -1124,7 +1125,9 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 
 			atomic_set(&neigh->probes,
 				   NEIGH_VAR(neigh->parms, UCAST_PROBES));
-			neigh->nud_state     = NUD_INCOMPLETE;
+			if (check_timer)
+				neigh_del_timer(neigh);
+			neigh->nud_state = NUD_INCOMPLETE;
 			neigh->updated = now;
 			next = now + max(NEIGH_VAR(neigh->parms, RETRANS_TIME),
 					 HZ/2);
@@ -1140,6 +1143,8 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 		}
 	} else if (neigh->nud_state & NUD_STALE) {
 		neigh_dbg(2, "neigh %p is delayed\n", neigh);
+		if (check_timer)
+			neigh_del_timer(neigh);
 		neigh->nud_state = NUD_DELAY;
 		neigh->updated = jiffies;
 		neigh_add_timer(neigh, jiffies +
@@ -1962,7 +1967,7 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 		flags |= NEIGH_UPDATE_F_ISROUTER;
 
 	if (ndm->ndm_flags & NTF_USE) {
-		neigh_event_send(neigh, NULL);
+		neigh_event_send_check_timer(neigh, NULL, true);
 		err = 0;
 	} else
 		err = __neigh_update(neigh, lladdr, ndm->ndm_state, flags,
-- 
2.21.0


^ permalink raw reply related

* Re: linux-next: Fixes tag needs some work in the net tree
From: Tariq Toukan @ 2019-07-14  7:55 UTC (permalink / raw)
  To: Stephen Rothwell, David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	Saeed Mahameed
In-Reply-To: <20190712165042.01745c65@canb.auug.org.au>



On 7/12/2019 9:50 AM, Stephen Rothwell wrote:
> Hi all,
> 
> In commit
> 
>    c93dfec10f1d ("net/mlx5e: Fix compilation error in TLS code")
> 
> Fixes tag
> 
>    Fixes: 90687e1a9a50 ("net/mlx5: Kconfig, Better organize compilation flags")
> 
> has these problem(s):
> 
>    - Target SHA1 does not exist
> 
> Did you mean
> 
> Fixes: e2869fb2068b ("net/mlx5: Kconfig, Better organize compilation flags")
> 

Right.
Thank you Stephen!
How do you think we should handle this?

Tariq

^ permalink raw reply

* Re: [Patch net] fib: relax source validation check for loopback packets
From: Cong Wang @ 2019-07-14  6:33 UTC (permalink / raw)
  To: David Ahern; +Cc: Linux Kernel Network Developers, Julian Anastasov
In-Reply-To: <CAM_iQpUd44ctMmtGrr4x_uA9UUxUdTzS-3tuySt2-jhM0y950A@mail.gmail.com>

On Sat, Jul 13, 2019 at 11:30 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> It's complicated, Mesos network isolation uses this case:
> https://cgit.twitter.biz/mesos/tree/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp

Oops, please use the open source link instead:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp

^ permalink raw reply

* Re: [Patch net] fib: relax source validation check for loopback packets
From: Cong Wang @ 2019-07-14  6:30 UTC (permalink / raw)
  To: David Ahern; +Cc: Linux Kernel Network Developers, Julian Anastasov
In-Reply-To: <8355af23-100f-a3bb-0759-fca8b0aa583b@gmail.com>

On Sat, Jul 13, 2019 at 3:42 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 7/12/19 2:17 PM, Cong Wang wrote:
> > diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> > index 317339cd7f03..8662a44a28f9 100644
> > --- a/net/ipv4/fib_frontend.c
> > +++ b/net/ipv4/fib_frontend.c
> > @@ -388,6 +388,12 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
> >       fib_combine_itag(itag, &res);
> >
> >       dev_match = fib_info_nh_uses_dev(res.fi, dev);
> > +     /* This is rare, loopback packets retain skb_dst so normally they
> > +      * would not even hit this slow path.
> > +      */
> > +     dev_match = dev_match || (res.type == RTN_LOCAL &&
> > +                               dev == net->loopback_dev &&
>
> The dev should not be needed. res.type == RTN_LOCAL should be enough, no?
>
> > +                               IN_DEV_ACCEPT_LOCAL(idev));
>
> Why is this check needed? Can you give an example use that is fixed -

I am not sure if I should have this check either, my initial version didn't
have it either, later I add it because I find out it is checked for rp_filter=0
case too.

On the other hand, loopback always accepts local traffic, so it may be
redundant to check it. So, I am not sure.

What do you think?

> and add one to selftests/net/fib_tests.sh?

It's complicated, Mesos network isolation uses this case:
https://cgit.twitter.biz/mesos/tree/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp

Even if I use a simplified case, it still has to use TC filters and mirred
action to redirect the packet, which I am not sure they fit in fib_tests.sh.

Thanks.

^ permalink raw reply

* Re: INFO: task hung in unregister_netdevice_notifier (3)
From: syzbot @ 2019-07-14  4:07 UTC (permalink / raw)
  To: davem, linux-can, linux-kernel, mkl, netdev, socketcan,
	syzkaller-bugs
In-Reply-To: <0000000000009d787a0582128cbe@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    a2d79c71 Merge tag 'for-5.3/io_uring-20190711' of git://gi..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3539b1747f03988e
dashboard link: https://syzkaller.appspot.com/bug?extid=0f1827363a305f74996f
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1765c52fa00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f1827363a305f74996f@syzkaller.appspotmail.com

INFO: task syz-executor.4:9527 blocked for more than 143 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.4  D28136  9527   9356 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48  
83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.2:9528 blocked for more than 145 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.2  D28136  9528   9354 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 00  
00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 c9 31  
f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.0:9529 blocked for more than 147 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.0  D28136  9529   9353 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48  
83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.5:9533 blocked for more than 148 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.5  D28136  9533   9358 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 00  
00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 c9 31  
f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.1:9534 blocked for more than 148 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.1  D28136  9534   9359 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48  
83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
INFO: task syz-executor.3:9535 blocked for more than 150 seconds.
       Not tainted 5.2.0+ #80
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.3  D28136  9535   9351 0x00000004
Call Trace:
  context_switch kernel/sched/core.c:3252 [inline]
  __schedule+0x755/0x1580 kernel/sched/core.c:3878
  schedule+0xa8/0x270 kernel/sched/core.c:3942
  rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
  __down_write kernel/locking/rwsem.c:1349 [inline]
  down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
  unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
  bcm_release+0x93/0x5e0 net/can/bcm.c:1525
  __sock_release+0xce/0x280 net/socket.c:586
  sock_close+0x1e/0x30 net/socket.c:1264
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48  
83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff

Showing all locks held in the system:
1 lock held by khungtaskd/1049:
  #0: 00000000ede263b0 (rcu_read_lock){....}, at:  
debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5257
1 lock held by rsyslogd/9208:
  #0: 00000000da20b59a (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110  
fs/file.c:801
2 locks held by getty/9298:
  #0: 00000000e9efae0d (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 0000000007287a12 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9299:
  #0: 00000000ad0733b0 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 0000000094dd5193 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9300:
  #0: 00000000692c340f (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 00000000538c7d7d (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9301:
  #0: 00000000116ea6c7 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 00000000a908a9f7 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9302:
  #0: 0000000042704f01 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 0000000041cc8671 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9303:
  #0: 000000001ef3b293 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
  #1: 000000008b703302 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
2 locks held by getty/9304:
  #0: 0000000095601bb0 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341


^ permalink raw reply

* Re: [PATCH net-next 1/1] Allow 0.0.0.0/8 as a valid address range
From: Paul Marks @ 2019-07-14  3:57 UTC (permalink / raw)
  To: David Miller; +Cc: dave.taht, netdev, gnu
In-Reply-To: <20190626.132003.50827799670386389.davem@davemloft.net>

On Wed, Jun 26, 2019 at 1:20 PM David Miller <davem@davemloft.net> wrote:
>
> From: Dave Taht <dave.taht@gmail.com>
> Date: Sat, 22 Jun 2019 10:07:34 -0700
>
> > The longstanding prohibition against using 0.0.0.0/8 dates back
> > to two issues with the early internet.
> >
> > There was an interoperability problem with BSD 4.2 in 1984, fixed in
> > BSD 4.3 in 1986. BSD 4.2 has long since been retired.
> >
> > Secondly, addresses of the form 0.x.y.z were initially defined only as
> > a source address in an ICMP datagram, indicating "node number x.y.z on
> > this IPv4 network", by nodes that know their address on their local
> > network, but do not yet know their network prefix, in RFC0792 (page
> > 19).  This usage of 0.x.y.z was later repealed in RFC1122 (section
> > 3.2.2.7), because the original ICMP-based mechanism for learning the
> > network prefix was unworkable on many networks such as Ethernet (which
> > have longer addresses that would not fit into the 24 "node number"
> > bits).  Modern networks use reverse ARP (RFC0903) or BOOTP (RFC0951)
> > or DHCP (RFC2131) to find their full 32-bit address and CIDR netmask
> > (and other parameters such as default gateways). 0.x.y.z has had
> > 16,777,215 addresses in 0.0.0.0/8 space left unused and reserved for
> > future use, since 1989.
> >
> > This patch allows for these 16m new IPv4 addresses to appear within
> > a box or on the wire. Layer 2 switches don't care.
> >
> > 0.0.0.0/32 is still prohibited, of course.
> >
> > Signed-off-by: Dave Taht <dave.taht@gmail.com>
> > Signed-off-by: John Gilmore <gnu@toad.com>
> > Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Applied, thanks for following up on this.

This breaks an undocumented feature of Linux:

$ telnet 0.0.0.1 22
Trying 0.0.0.1...
telnet: Unable to connect to remote host: Invalid argument

It's sometimes useful to put 0.x.x.x in command-line flags,
/etc/hosts, or other config files, because it forces connect() to fail
immediately, instead of sending packets and waiting for a timeout.

Given that this has been user-visible for decades, is it a good idea
to pull out the rug?

^ permalink raw reply

* Re: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration
From: Xiaoming Ni @ 2019-07-14  2:45 UTC (permalink / raw)
  To: gregkh@linuxfoundation.org
  Cc: Vasily Averin, adobriyan@gmail.com, akpm@linux-foundation.org,
	anna.schumaker@netapp.com, arjan@linux.intel.com,
	bfields@fieldses.org, chuck.lever@oracle.com, davem@davemloft.net,
	jlayton@kernel.org, luto@kernel.org, mingo@kernel.org,
	Nadia.Derbey@bull.net, paulmck@linux.vnet.ibm.com,
	semen.protsenko@linaro.org, stable@kernel.org,
	stern@rowland.harvard.edu, tglx@linutronix.de,
	torvalds@linux-foundation.org, trond.myklebust@hammerspace.com,
	viresh.kumar@linaro.org, Huangjianhui (Alex), Dailei,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20190712140729.GA11583@kroah.com>

On 2019/7/12 22:07, gregkh@linuxfoundation.org wrote:
> On Fri, Jul 12, 2019 at 09:11:57PM +0800, Xiaoming Ni wrote:
>> On 2019/7/11 21:57, Vasily Averin wrote:
>>> On 7/11/19 4:55 AM, Nixiaoming wrote:
>>>> On Wed, July 10, 2019 1:49 PM Vasily Averin wrote:
>>>>> On 7/10/19 6:09 AM, Xiaoming Ni wrote:
>>>>>> Registering the same notifier to a hook repeatedly can cause the hook
>>>>>> list to form a ring or lose other members of the list.
>>>>>
>>>>> I think is not enough to _prevent_ 2nd register attempt,
>>>>> it's enough to detect just attempt and generate warning to mark host in bad state.
>>>>>
>>>>
>>>> Duplicate registration is prevented in my patch, not just "mark host in bad state"
>>>>
>>>> Duplicate registration is checked and exited in notifier_chain_cond_register()
>>>>
>>>> Duplicate registration was checked in notifier_chain_register() but only 
>>>> the alarm was triggered without exiting. added by commit 831246570d34692e 
>>>> ("kernel/notifier.c: double register detection")
>>>>
>>>> My patch is like a combination of 831246570d34692e and notifier_chain_cond_register(),
>>>>  which triggers an alarm and exits when a duplicate registration is detected.
>>>>
>>>>> Unexpected 2nd register of the same hook most likely will lead to 2nd unregister,
>>>>> and it can lead to host crash in any time: 
>>>>> you can unregister notifier on first attempt it can be too early, it can be still in use.
>>>>> on the other hand you can never call 2nd unregister at all.
>>>>
>>>> Since the member was not added to the linked list at the time of the second registration, 
>>>> no linked list ring was formed. 
>>>> The member is released on the first unregistration and -ENOENT on the second unregistration.
>>>> After patching, the fault has been alleviated
>>>
>>> You are wrong here.
>>> 2nd notifier's registration is a pure bug, this should never happen.
>>> If you know the way to reproduce this situation -- you need to fix it. 
>>>
>>> 2nd registration can happen in 2 cases:
>>> 1) missed rollback, when someone forget to call unregister after successfull registration, 
>>> and then tried to call register again. It can lead to crash for example when according module will be unloaded.
>>> 2) some subsystem is registered twice, for example from  different namespaces.
>>> in this case unregister called during sybsystem cleanup in first namespace will incorrectly remove notifier used 
>>> in second namespace, it also can lead to unexpacted behaviour.
>>>
>> So in these two cases, is it more reasonable to trigger BUG() directly when checking for duplicate registration ?
>> But why does current notifier_chain_register() just trigger WARN() without exiting ?
>> notifier_chain_cond_register() direct exit without triggering WARN() ?
> 
> It should recover from this, if it can be detected.  The main point is
> that not all apis have to be this "robust" when used within the kernel
> as we do allow for the callers to know what they are doing :)
> 
In the notifier_chain_register(), the condition ( (*nl) == n) is the same registration of the same hook.
 We can intercept this situation and avoid forming a linked list ring to make the API more rob

> If this does not cause any additional problems or slow downs, it's
> probably fine to add.
> 
Notifier_chain_register() is not a system hotspot function.
At the same time, there is already a WARN_ONCE judgment. There is no new judgment in the new patch.
It only changes the processing under the condition of (*nl) == n, which will not cause performance problems.
At the same time, avoiding the formation of a link ring can make the system more robust.

> thanks,
> 
> greg k-h
> 
> .
> 
Thanks

Xiaoming Ni



^ permalink raw reply

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
From: David Miller @ 2019-07-14  2:38 UTC (permalink / raw)
  To: idosch
  Cc: nhorman, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch
In-Reply-To: <20190711123909.GA10978@splinter>

From: Ido Schimmel <idosch@idosch.org>
Date: Thu, 11 Jul 2019 15:39:09 +0300

> Before I start working on v2, I would like to get your feedback on the
> high level plan. Also adding Neil who is the maintainer of drop_monitor
> (and counterpart DropWatch tool [1]).
> 
> IIUC, the problem you point out is that users need to use different
> tools to monitor packet drops based on where these drops occur
> (SW/HW/XDP).
> 
> Therefore, my plan is to extend the existing drop_monitor netlink
> channel to also cover HW drops. I will add a new message type and a new
> multicast group for HW drops and encode in the message what is currently
> encoded in the devlink events.

Consolidating the reporting to merge with the drop monitor facilities
is definitely a step in the right direction.

^ permalink raw reply

* Re: [net-next] bonding: add documentation for peer_notif_delay
From: David Miller @ 2019-07-14  2:29 UTC (permalink / raw)
  To: vincent; +Cc: j.vosburgh, vfalico, andy, netdev
In-Reply-To: <20190713143527.27562-1-vincent@bernat.ch>

From: Vincent Bernat <vincent@bernat.ch>
Date: Sat, 13 Jul 2019 16:35:27 +0200

> Ability to tweak the interval between peer notifications has been
> added in 07a4ddec3ce9 ("bonding: add an option to specify a delay
> between peer notifications") but the documentation was not updated.
> 
> Signed-off-by: Vincent Bernat <vincent@bernat.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net] r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
From: David Miller @ 2019-07-14  2:28 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev, ionut.radu
In-Reply-To: <dfc533d0-a90e-37fe-2338-483abc9c1177@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 13 Jul 2019 13:45:47 +0200

> On RTL8411b the RX unit gets confused if the PHY is powered-down.
> This was reported in [0] and confirmed by Realtek. Realtek provided
> a sequence to fix the RX unit after PHY wakeup.
> 
> The issue itself seems to have been there longer, the Fixes tag
> refers to where the fix applies properly.
> 
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075
> 
> Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support")
> Tested-by: Ionut Radu <ionut.radu@gmail.com>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH 2/2] net-next: ag71xx: Rearrange ag711xx struct to remove holes
From: David Miller @ 2019-07-14  2:24 UTC (permalink / raw)
  To: rosenp; +Cc: netdev
In-Reply-To: <20190713020921.18202-2-rosenp@gmail.com>

From: Rosen Penev <rosenp@gmail.com>
Date: Fri, 12 Jul 2019 19:09:21 -0700

> Removed ____cacheline_aligned attribute to ring structs. This actually
> causes holes in the ag71xx struc as well as lower performance.
> 
> Rearranged struct members to fall within respective cachelines. The RX
> ring struct now does not share a cacheline with the TX ring. The NAPI
> atruct now takes up its own cachelines and does not share.
> 
> According to pahole -C ag71xx -c 32
 ...

net-next is closed, therefore it is not appropriate to submit this change
at the current time.

^ permalink raw reply

* Re: [PATCH 1/2] net-next: ag71xx: Add missing header
From: David Miller @ 2019-07-14  2:24 UTC (permalink / raw)
  To: rosenp; +Cc: netdev
In-Reply-To: <20190713020921.18202-1-rosenp@gmail.com>

From: Rosen Penev <rosenp@gmail.com>
Date: Fri, 12 Jul 2019 19:09:20 -0700

> ag71xx uses devm_ioremap_nocache. This fixes usage of an implicit function
> 
> Signed-off-by: Rosen Penev <rosenp@gmail.com>

This is a bug fix and should target 'net'.

^ permalink raw reply

* Re: [PATCH net-next 1/2] net sched: update skbedit action for batched events operations
From: David Miller @ 2019-07-14  2:23 UTC (permalink / raw)
  To: mrv; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1562970120-29517-1-git-send-email-mrv@mojatatu.com>

From: Roman Mashak <mrv@mojatatu.com>
Date: Fri, 12 Jul 2019 18:21:59 -0400

> Add get_fill_size() routine used to calculate the action size
> when building a batch of events.
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>

Please add an appropriate Fixes: tag, and also when you respin this provide the
required "[PATCH 0/N] " header posting explaining what the series is doing at
a high level, how it is doing it, and why it is doing it that way.

Thank you.

^ permalink raw reply

* Re: [Patch net] net_sched: unset TCQ_F_CAN_BYPASS when adding filters
From: Cong Wang @ 2019-07-14  2:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <8733195c-ac70-4a2a-db2f-b9bdfd05a703@gmail.com>

On Sat, Jul 13, 2019 at 5:54 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 7/12/19 10:17 PM, Cong Wang wrote:
> > For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
> > notably fq_codel, it makes no sense to let packets bypass the TC
> > filters we setup in any scenario, otherwise our packets steering
> > policy could not be enforced.
> >
> > This can be easily reproduced with the following script:
> >
> >  ip li add dev dummy0 type dummy
> >  ifconfig dummy0 up
> >  tc qd add dev dummy0 root fq_codel
> >  tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
> >  tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
> >  ping -I dummy0 192.168.112.1
> >
> > Without this patch, packets are sent directly to dummy0 without
> > hitting any of the filters. With this patch, packets are redirected
> > to loopback as expected.
> >
> > This fix is not perfect, it only unsets the flag but does not set it back
> > because we have to save the information somewhere in the qdisc if we
> > really want that.
> >
> > Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
> > Cc: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > ---
> >  net/sched/cls_api.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> > index 638c1bc1ea1b..5c800b0c810b 100644
> > --- a/net/sched/cls_api.c
> > +++ b/net/sched/cls_api.c
> > @@ -2152,6 +2152,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
> >               tfilter_notify(net, skb, n, tp, block, q, parent, fh,
> >                              RTM_NEWTFILTER, false, rtnl_held);
> >               tfilter_put(tp, fh);
> > +             q->flags &= ~TCQ_F_CAN_BYPASS;
> >       }
> >
> >  errout:
> >
>
> Strange, because sfq and fq_codel are roughly the same for TCQ_F_CAN_BYPASS handling.
>
> Why is fq_codel_bind() not effective ?

Because I don't have class id set in the filter.

>
> If not effective, sfq had the same issue, so the Fixes: tag needs to be refined,
> maybe to commit 23624935e0c4 net_sched: TCQ_F_CAN_BYPASS generalization
>

Yeah, I think it is probably a better commit here.

Thanks.

^ permalink raw reply

* Re: [Patch net] fib: relax source validation check for loopback packets
From: David Ahern @ 2019-07-14  0:29 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Julian Anastasov
In-Reply-To: <8355af23-100f-a3bb-0759-fca8b0aa583b@gmail.com>

On 7/13/19 4:42 PM, David Ahern wrote:
> On 7/12/19 2:17 PM, Cong Wang wrote:
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index 317339cd7f03..8662a44a28f9 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -388,6 +388,12 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>>  	fib_combine_itag(itag, &res);
>>  
>>  	dev_match = fib_info_nh_uses_dev(res.fi, dev);
>> +	/* This is rare, loopback packets retain skb_dst so normally they
>> +	 * would not even hit this slow path.
>> +	 */
>> +	dev_match = dev_match || (res.type == RTN_LOCAL &&
>> +				  dev == net->loopback_dev &&
> 
> The dev should not be needed. res.type == RTN_LOCAL should be enough, no?
> 

nevermind, I see why you have the dev check.

^ permalink raw reply

* Re: [GIT] Networking
From: pr-tracker-bot @ 2019-07-13 23:15 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20190712.231704.904072376124323665.davem@davemloft.net>

The pull request you sent on Fri, 12 Jul 2019 23:17:04 -0700 (PDT):

> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git refs/heads/master

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/d12109291ccbef7e879cc0d0733f31685cd80854

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply

* Re: [Patch net] fib: relax source validation check for loopback packets
From: David Ahern @ 2019-07-13 22:42 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Julian Anastasov
In-Reply-To: <20190712201749.28421-2-xiyou.wangcong@gmail.com>

On 7/12/19 2:17 PM, Cong Wang wrote:
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 317339cd7f03..8662a44a28f9 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -388,6 +388,12 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	fib_combine_itag(itag, &res);
>  
>  	dev_match = fib_info_nh_uses_dev(res.fi, dev);
> +	/* This is rare, loopback packets retain skb_dst so normally they
> +	 * would not even hit this slow path.
> +	 */
> +	dev_match = dev_match || (res.type == RTN_LOCAL &&
> +				  dev == net->loopback_dev &&

The dev should not be needed. res.type == RTN_LOCAL should be enough, no?

> +				  IN_DEV_ACCEPT_LOCAL(idev));

Why is this check needed? Can you give an example use that is fixed -
and add one to selftests/net/fib_tests.sh?


>  	if (dev_match) {
>  		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
>  		return ret;
> 


^ permalink raw reply

* Re: [PATCH] Fix dumping vlan rules
From: Florian Westphal @ 2019-07-13 21:43 UTC (permalink / raw)
  To: michael-dev; +Cc: netdev, netfilter-devel
In-Reply-To: <20190713210306.30815-1-michael-dev@fami-braun.de>

michael-dev@fami-braun.de <michael-dev@fami-braun.de> wrote:
> From: "M. Braun" <michael-dev@fami-braun.de>
> 
> Given the following bridge rules:
> 1. ip protocol icmp accept
> 2. ether type vlan vlan type ip ip protocol icmp accept
> 
> The are currently both dumped by "nft list ruleset" as
> 1. ip protocol icmp accept
> 2. ip protocol icmp accept

Yes, thats a bug, the dependency removal is incorrect.

> +++ b/src/payload.c
> @@ -506,6 +506,18 @@ static bool payload_may_dependency_kill(struct payload_dep_ctx *ctx,
>  		     dep->left->payload.desc == &proto_ip6) &&
>  		    expr->payload.base == PROTO_BASE_TRANSPORT_HDR)
>  			return false;
> +		/* Do not kill
> +		 *  ether type vlan and vlan type ip and ip protocol icmp
> +		 * into
> +		 *  ip protocol icmp
> +		 * as this lacks ether type vlan.
> +		 * More generally speaking, do not kill protocol type
> +		 * for stacked protocols if we only have protcol type matches.
> +		 */
> +		if (dep->left->etype == EXPR_PAYLOAD && dep->op == OP_EQ &&
> +		    expr->flags & EXPR_F_PROTOCOL &&
> +		    expr->payload.base == dep->left->payload.base)
> +			return false;

Can you please add a test case for this problem to
tests/py/bridge/vlan.t so we catch this when messing with dependency
handling in the future?

Also, please submit v2 directly to netfilter-devel@.

Thanks!

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox