Netdev List


* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-05-31 12:58 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <274f0b84-07f1-5cd5-e256-ce4b71358c14@intel.com>

On Wed, 30 May 2018 20:03:11 -0700
"Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:

> On 5/30/2018 7:06 PM, Stephen Hemminger wrote:
> > On Thu, 24 May 2018 09:55:14 -0700
> > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >  
> >> Use the registration/notification framework supported by the generic
> >> failover infrastructure.
> >>
> >> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>  
> > Why was this merged? It was never signed off by any of the netvsc maintainers,
> > and there were still issues unresolved.
> >
> > There are also namespace issues I am fixing, and this breaks them.
> > Will start my patch set with a revert for this. Sorry  
> 
> I would appreciate it if you could make the fixes on top of this patch series. I tried hard
> to make sure that netvsc functionality and behavior don't change.
> 
> It is possible that there could be some bugs introduced, but they can be fixed.
> Looks like Wei already found a bug and submitted a fix for that.
> 

Ok, but several of these may clash with what you want for virtio.
Like:
	- VF should be moved to namespace of virt device
	- VF should be associated based on message from host with serial # not
	  registration notifier and MAC address.
	- control operations should use master device reference rather than
	  searching based on MAC.

As you can see these are structural changes.


* [PATCH v2 net] mlx4_core: restore optimal ICM memory allocation
From: Eric Dumazet @ 2018-05-31 12:52 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Qing Huang, Daniel Jurgens, Zhu Yanjun

Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
brought two regressions caught in our regression suite.

The big one is an additional cost of 256 bytes of overhead per 4096 bytes,
or 6.25%, which is unacceptable since ICM can be pretty large.

This comes from having to allocate one struct mlx4_icm_chunk (256 bytes)
per MLX4_TABLE_CHUNK, which the buggy commit shrank to 4KB
(instead of the prior 256KB).

Note that mlx4_alloc_icm() is already able to try high-order allocations
and fall back to low-order allocations under high memory pressure.

Most of these allocations happen right after boot time, when we get
plenty of non-fragmented memory. There is really no point in being so
pessimistic and breaking huge pages into order-0 ones just for fun.

We only have to tweak gfp_mask a bit, to help fall back faster,
without risking OOM killings.

The second regression is a KASAN fault that will need further investigation.

Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Tarick Bedeir <tarick@google.com>
Cc: Qing Huang <qing.huang@oracle.com>
Cc: Daniel Jurgens <danielj@mellanox.com>
Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/icm.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index 685337d58276fc91baeeb64387c52985e1bc6dda..5342bd8a3d0bfaa9e76bb9b6943790606c97b181 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,13 @@
 #include "fw.h"
 
 /*
- * We allocate in page size (default 4KB on many archs) chunks to avoid high
- * order memory allocations in fragmented/high usage memory situation.
+ * We allocate in as big chunks as we can, up to a maximum of 256 KB
+ * per chunk. Note that the chunks are not necessarily in contiguous
+ * physical memory.
  */
 enum {
-	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
-	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
+	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
+	MLX4_TABLE_CHUNK_SIZE	= 1 << 18,
 };
 
 static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
@@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
 	struct mlx4_icm *icm;
 	struct mlx4_icm_chunk *chunk = NULL;
 	int cur_order;
+	gfp_t mask;
 	int ret;
 
 	/* We use sg_set_buf for coherent allocs, which assumes low memory */
@@ -178,13 +180,17 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
 		while (1 << cur_order > npages)
 			--cur_order;
 
+		mask = gfp_mask;
+		if (cur_order)
+			mask &= ~__GFP_DIRECT_RECLAIM;
+
 		if (coherent)
 			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
 						      &chunk->mem[chunk->npages],
-						      cur_order, gfp_mask);
+						      cur_order, mask);
 		else
 			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
-						   cur_order, gfp_mask,
+						   cur_order, mask,
 						   dev->numa_node);
 
 		if (ret) {
-- 
2.17.0.921.gf22659ad46-goog


* Re: [PATCH net-next v4 00/11] Modify action API for implementing lockless actions
From: Vlad Buslov @ 2018-05-31 12:38 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, xiyou.wangcong, jiri, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, marcelo.leitner, kliteyn
In-Reply-To: <262fbd11-401e-90cf-4226-39b1604eb16d@mojatatu.com>


On Thu 31 May 2018 at 10:01, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> Hi Vlad,
>
> Can you try one simple test below with these patches?
>
> #create an action
> sudo $TC actions add action skbedit mark 1 pipe
> #
> sudo $TC qdisc del dev lo parent ffff:
> sudo $TC qdisc add dev lo ingress
> # bind action to filter....
> sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
> u32 match ip dst 127.0.0.1/32 flowid 1:1 action skbedit index 1
>
> #now delete that action multiple times while it is still bound
> sudo $TC actions del action skbedit index 1
> sudo $TC actions del action skbedit index 1
> sudo $TC actions del action skbedit index 1
>
> #check the refcount and bindcount
> sudo $TC -s actions ls action skbedit
>
> #delete the filter (which should remove the bindcnt)
>
> sudo $TC filter del dev lo parent ffff: protocol ip prio 1 \
> u32 match ip dst 127.0.0.1/32 flowid 1:1
>
> #check the refcount and bindcount
> sudo $TC -s actions ls action skbedit
>
> Current behavior: I believe the action is gone in this last step.
> Your patches may change behavior so that the action is still
> around. I don't think this is a big deal, but just wanted to be sure
> it is not something more unexpected.
>
> cheers,
> jamal

Hi Jamal,

On current net-next I still have the action with a single reference after
the last step:
~$ sudo $TC -s actions ls action skbedit                       
total acts 1                                                   
                                                               
        action order 0:  skbedit mark 1 pipe                   
         index 1 ref 2 bind 1 installed 47 sec used 47 sec     
        Action statistics:                                     
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0                               
~$ sudo $TC filter del dev lo parent ffff: protocol ip prio 1 \
> u32 match ip dst 127.0.0.1/32 flowid 1:1                     
~$ sudo $TC -s actions ls action skbedit                       
total acts 1                                                   
                                                               
        action order 0:  skbedit mark 1 pipe                   
         index 1 ref 1 bind 0 installed 80 sec used 80 sec     
        Action statistics:                                     
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0                               

Which branch are you testing on?

Regards,
Vlad


* [PATCH] vlan: use non-archaic spelling of failes
From: Thadeu Lima de Souza Cascardo @ 2018-05-31 12:20 UTC (permalink / raw)
  To: netdev

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---
 include/linux/if_vlan.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 78a5a90b4267..83ea4df6ab81 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -331,7 +331,7 @@ static inline bool vlan_hw_offload_capable(netdev_features_t features,
  * @mac_len: MAC header length including outer vlan headers
  *
  * Inserts the VLAN tag into @skb as part of the payload at offset mac_len
- * Returns error if skb_cow_head failes.
+ * Returns error if skb_cow_head fails.
  *
  * Does not change skb->protocol so this function can be used during receive.
  */
@@ -379,7 +379,7 @@ static inline int __vlan_insert_inner_tag(struct sk_buff *skb,
  * @vlan_tci: VLAN TCI to insert
  *
  * Inserts the VLAN tag into @skb as part of the payload
- * Returns error if skb_cow_head failes.
+ * Returns error if skb_cow_head fails.
  *
  * Does not change skb->protocol so this function can be used during receive.
  */
-- 
2.17.0


* [PATCH] caif: use non-archaic spelling of failes
From: Thadeu Lima de Souza Cascardo @ 2018-05-31 12:18 UTC (permalink / raw)
  To: netdev; +Cc: dmitry.tarnyagin

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---
 include/net/caif/caif_layer.h | 2 +-
 net/caif/cfrfml.c             | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/caif/caif_layer.h b/include/net/caif/caif_layer.h
index 94e5ed64dc6d..8a114e57bcb6 100644
--- a/include/net/caif/caif_layer.h
+++ b/include/net/caif/caif_layer.h
@@ -22,7 +22,7 @@ struct caif_packet_funcs;
  * @assert: expression to evaluate.
  *
  * This function will print a error message and a do WARN_ON if the
- * assertion failes. Normally this will do a stack up at the current location.
+ * assertion fails. Normally this will do a stack up at the current location.
  */
 #define caif_assert(assert)					\
 do {								\
diff --git a/net/caif/cfrfml.c b/net/caif/cfrfml.c
index b82440e1fcb4..3f2c63c78004 100644
--- a/net/caif/cfrfml.c
+++ b/net/caif/cfrfml.c
@@ -85,7 +85,7 @@ static struct cfpkt *rfm_append(struct cfrfml *rfml, char *seghead,
 	tmppkt = cfpkt_append(rfml->incomplete_frm, pkt,
 			rfml->pdu_size + RFM_HEAD_SIZE);
 
-	/* If cfpkt_append failes input pkts are not freed */
+	/* If cfpkt_append fails input pkts are not freed */
 	*err = -ENOMEM;
 	if (tmppkt == NULL)
 		return NULL;
-- 
2.17.0


* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Moshe Shemesh @ 2018-05-31 12:18 UTC (permalink / raw)
  To: Tariq Toukan, Eric Dumazet, Alexander Aring
  Cc: David Miller, edumazet, netdev, fw, herbert, tgraf, brouer,
	alex.aring, stefan, ktkhai, Eran Ben Elisha
In-Reply-To: <11b2baca-c810-3f61-38d1-415099783129@mellanox.com>



On 5/30/2018 10:20 AM, Tariq Toukan wrote:
> 
> 
> On 28/05/2018 7:09 PM, Eric Dumazet wrote:
>>
>>
>> On 05/28/2018 07:52 AM, Alexander Aring wrote:
>>
>>> as somebody who had similar issues with this patch series, I can tell you
>>> what happened for the 6LoWPAN fragmentation.
>>>
>>> The issue sounds similar, but there is too much information missing here
>>> to say whether you have exactly the issue we had.
>>>
>>> Our problem:
>>>
>>> The patch series uses memcmp() to compare hash keys. We had some padding
>>> bytes in our hash key, and sometimes there were random bytes in this
>>> structure when it was put on the stack. We solved it with a struct
>>> foo_key bar = {}, which in the case of gcc _seems_ to do a full
>>> memset(bar, 0, ..) on the structure.
>>>
>>> I asked on the netdev mailing list how to deal with this problem in
>>> general, because = {} works in the case of gcc, but other compilers may
>>> handle it differently, or gcc may even change this behaviour in the future.
>>> I got no reply, so I did what works for me. :-)
>>>
>>> At the least, maybe memcmp() on structures should never be used; they
>>> should be compared field by field. I would recommend this approach if the
>>> compiler is clever enough to optimize it, but I am not enough of a
>>> compiler expert to say anything about that.
>>>
>>> I checked the hash key structures on x86_64 with pahole. So far I didn't
>>> find any padding bytes there, but it might be different on other
>>> architectures or ?compilers?.
>>>
>>> Additional useful information to check whether you are running into the
>>> same problem would be:
>>>
>>>   - Which architecture do you use?
>>>
>>>   - Do you have similar problems with a veth setup?
>>>
>>> You could also try this:
>>>
>>> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
>>> index b939b94e7e91..40ece9ab8b12 100644
>>> --- a/net/ipv6/reassembly.c
>>> +++ b/net/ipv6/reassembly.c
>>> @@ -142,19 +142,19 @@ static void ip6_frag_expire(struct timer_list *t)
>>>   static struct frag_queue *
>>>   fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
>>>   {
>>> -       struct frag_v6_compare_key key = {
>>> -               .id = id,
>>> -               .saddr = hdr->saddr,
>>> -               .daddr = hdr->daddr,
>>> -               .user = IP6_DEFRAG_LOCAL_DELIVER,
>>> -               .iif = iif,
>>> -       };
>>> +       struct frag_v6_compare_key key = {};
>>>          struct inet_frag_queue *q;
>>>          if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST |
>>>                                              IPV6_ADDR_LINKLOCAL)))
>>>                  key.iif = 0;
>>> +       key.id = id;
>>> +       key.saddr = hdr->saddr;
>>> +       key.daddr = hdr->daddr;
>>> +       key.user = IP6_DEFRAG_LOCAL_DELIVER;
>>> +       key.iif = iif;
>>> +
>>>          q = inet_frag_find(&net->ipv6.frags, &key);
>>>          if (!q)
>>>                  return NULL;
>>>
>>> - Alex
>>>
>>
>> Hi Alex.
>>
>> This patch makes no sense, since struct frag_v6_compare_key has no hole.
>>
>> Only 6LoWPAN had a problem really, because of its way of having unions 
>> (and holes).
>>
>> Also note that your patch would break the case when we force key.iif 
>> to be zero.
>>
>>
>> Tariq, here are my test results : No drops for me.
>>
>> # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
>> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2607:f8b0:8099:e18:: () port 0 AF_INET6
>> Socket  Message  Elapsed      Messages
>> Size    Size     Time         Okay Errors   Throughput
>> bytes   bytes    secs            #      #   10^6bits/sec
>>
>> 212992   65507   10.00      202117      0    10592.00
>> 212992           10.00           0              0.00
>>
>> Somehow, you might be sending packets too fast, and the receiver has a
>> problem with that?
> 
> Not sure, the transmit BW you get is higher than what we saw.
> Anyway, we'll check this.
> 
>> For particular needs, you might need to adjust:
>>
>> /proc/sys/net/ipv6/ip6frag_time  (to 2 seconds instead of the default of 60)
>> /proc/sys/net/ipv6/ip6frag_low_thresh
>> /proc/sys/net/ipv6/ip6frag_high_thresh
>>
>> Once your receiver has filled its capacity with frags, the default of
>> 60 seconds to garbage collect might be the reason you notice a problem.
>>
>> Check :
>> grep FRAG6 /proc/net/sockstat6
>>
>> On Google servers we multiply the limits for ipv6 frags memory usage by 25:
>>
>> /proc/sys/net/ipv6/ip6frag_high_thresh:104857600  (instead of 4MB)
>> /proc/sys/net/ipv6/ip6frag_low_thresh:78643200  (instead of 3 MB)
>>
>> When using 64KB datagrams, note that the truesize of the datagram
>> would be about 44 * 2 = 88 KB, so after ~40 lost packets in the network
>> you can no longer accept ipv6 fragments until the garbage collector
>> has evicted old datagrams.
>>
> 
> Great.
> Moshe, please try the settings suggested above.

I do see a big improvement after changing the 3 parameters as Eric suggested:
/proc/sys/net/ipv6/ip6frag_time  set to 2
/proc/sys/net/ipv6/ip6frag_low_thresh set to 104857600
/proc/sys/net/ipv6/ip6frag_high_thresh set to 78643200


[root@reg-l-vrt-67100-104 linux-stable]#  netperf -H fe80::7efe:90ff:fed5:bb48%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::7efe:90ff:fed5:bb48%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      156387      0    8194.60
212992           10.00       76901           4029.57

#kernel
Ip6InReceives                   7107999            0.0
Ip6InDelivers                   114126             0.0
Ip6OutRequests                  47                 0.0
Ip6ReasmTimeout                 5115               0.0
Ip6ReasmReqds                   7107987            0.0
Ip6ReasmOKs                     114114             0.0
Ip6ReasmFails                   1714146            0.0
...
Udp6InDatagrams                 112486             0.0
Udp6InErrors                    1629               0.0
Udp6RcvbufErrors                1629               0.0
...

Before changing these parameters, I got:
[root@reg-l-vrt-67100-104 ~]# netperf -H fe80::e61d:2dff:feca:c7c3%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::e61d:2dff:feca:c7c3%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      145419      0    7620.35
212992           10.00         285             14.93

#kernel
Ip6InReceives                   6665965            0.0
Ip6InDelivers                   300                0.0
Ip6OutRequests                  9                  0.0
Ip6ReasmReqds                   6665950            0.0
Ip6ReasmOKs                     285                0.0
Ip6ReasmFails                   6650890            0.0
...
Udp6InDatagrams                 286                0.0


However, before the patchset I got much better results:
[root@reg-l-vrt-67100-104 linux-stable]#  netperf -H fe80::7efe:90ff:fed5:bb48%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::7efe:90ff:fed5:bb48%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      158935      0    8328.32
212992           10.00      144652           7579.88


#kernel
Ip6InReceives                   7088903            0.0
Ip6InDelivers                   154117             0.0
Ip6OutRequests                  9                  0.0
Ip6ReasmReqds                   7088889            0.0
Ip6ReasmOKs                     154103             0.0
...
Udp6InDatagrams                 144653             0.0
Udp6InErrors                    9451               0.0
Udp6RcvbufErrors                9451               0.0


> 
> In case these values dramatically improve performance, maybe it's time to
> change the default.
> 
> Thanks,
> Tariq


* [PATCH 2/2 net-next] net_failover: fix error code in net_failover_create()
From: Dan Carpenter @ 2018-05-31 12:04 UTC (permalink / raw)
  To: David S. Miller, Sridhar Samudrala; +Cc: netdev, kernel-janitors

We forgot to set the error code on this path.  This function is supposed
to return error pointers, so with this bug it accidentally returns NULL
and the caller doesn't check for that.

Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index ef50158e90a9..881f3fa13e6b 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -761,8 +761,10 @@ struct failover *net_failover_create(struct net_device *standby_dev)
 	netif_carrier_off(failover_dev);

 	failover = failover_register(failover_dev, &net_failover_ops);
-	if (IS_ERR(failover))
+	if (IS_ERR(failover)) {
+		err = PTR_ERR(failover);
 		goto err_failover_register;
+	}

 	return failover;


* [PATCH 1/2 net-next] net_failover: fix net_failover_compute_features()
From: Dan Carpenter @ 2018-05-31 12:01 UTC (permalink / raw)
  To: David S. Miller, Sridhar Samudrala; +Cc: netdev, kernel-janitors

This has an '&' vs '|' typo so it starts with vlan_features set to none.
Also, a u32 type isn't large enough to hold all the feature bits; it
should be netdev_features_t.

Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index 8b508e2cf29b..ef50158e90a9 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -380,7 +380,8 @@ static rx_handler_result_t net_failover_handle_frame(struct sk_buff **pskb)
 
 static void net_failover_compute_features(struct net_device *dev)
 {
-	u32 vlan_features = FAILOVER_VLAN_FEATURES & NETIF_F_ALL_FOR_ALL;
+	netdev_features_t vlan_features = FAILOVER_VLAN_FEATURES |
+					  NETIF_F_ALL_FOR_ALL;
 	netdev_features_t enc_features  = FAILOVER_ENC_FEATURES;
 	unsigned short max_hard_header_len = ETH_HLEN;
 	unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE |


* Re: [PATCH v2] netfilter: properly initialize xt_table_info structure
From: Michal Kubecek @ 2018-05-31 11:55 UTC (permalink / raw)
  To: peter pi
  Cc: Greg Kroah-Hartman, Florian Westphal, Jan Engelhardt,
	Eric Dumazet, Greg Hackmann, Pablo Neira Ayuso, Jozsef Kadlecsik,
	netfilter-devel, coreteam, netdev
In-Reply-To: <20180531113215.sbqqjip2gxvhl2eg@unicorn.suse.cz>

On Thu, May 31, 2018 at 01:32:16PM +0200, Michal Kubecek wrote:
> I think I am starting to understand the problem. IPT_SO_GET_ENTRIES leads to
> calling copy_entries_to_user(), which copies the entries as they are to the
> user-provided buffer. It also copies instances of struct xt_entry_match
> and struct xt_entry_target, which contain kernel pointers. We then
> rewrite them with the match/target name for userspace, but the layout looks
> (on x86_64) like this
> 
> /* offset    |  size */  type = struct xt_entry_match {
> /*    0      |    32 */    union {
> /*                32 */        struct {
> /*    0      |     2 */            __u16 match_size;
> /*    2      |    29 */            char name[29];
> /*   31      |     1 */            __u8 revision;
> 
>                                    /* total size (bytes):   32 */
>                                } user;
> /*                16 */        struct {
> /*    0      |     2 */            __u16 match_size;
> /* XXX  6-byte hole  */
> /*    8      |     8 */            struct xt_match *match;
> 
>                                    /* total size (bytes):   16 */
>                                } kernel;
> /*                 2 */        __u16 match_size;
> 
>                                /* total size (bytes):   32 */
>                            } u;
> /*   32      |     0 */    unsigned char data[];
> 
>                            /* total size (bytes):   32 */
>                          }
> 
> 
> so that if the match name is no longer than five characters (which is often
> the case), writing to .u.user.name leaves .u.kernel.match untouched. The
> same problem exists in struct xt_entry_target.

And this should no longer happen since the series

 f32815d21d4d ("xtables: add xt_match, xt_target and data copy_to_user functions")
 f77bc5b23fb1 ("iptables: use match, target and data copy_to_user helpers")
 e47ddb2c4691 ("ip6tables: use match, target and data copy_to_user helpers")
 244b531bee2b ("arptables: use match, target and data copy_to_user helpers")
 b5040f6c33a5 ("ebtables: use match, target and data copy_to_user helpers")
 4915f7bbc402 ("xtables: use match, target and data copy_to_user helpers in compat")
 ec2318904965 ("xtables: extend matches and targets with .usersize")

changed the logic in 4.11-rc1.

Michal Kubecek


* Re: [PATCH 1/1] net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
From: Daniele Palmas @ 2018-05-31 11:52 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: Oliver Neukum, netdev, linux-usb
In-Reply-To: <87a7sggnch.fsf@miraculix.mork.no>

2018-05-31 11:56 GMT+02:00 Bjørn Mork <bjorn@mork.no>:
> Daniele Palmas <dnlplm@gmail.com> writes:
>
>> Testing Telit LM940 with ICMP packets > 14552 bytes revealed that
>> the modem needs FLAG_SEND_ZLP to work properly, otherwise the cdc
>> mbim data interface will no longer be responsive.
>>
>> Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
>
> Acked-by: Bjørn Mork <bjorn@mork.no>
>
> Should have thought of this... I noticed your discussion, but couldn't
> reproduce the issues myself.  This explains why.
>
> Do you happen to know if the device announces larger buffers than the
> driver wants to use, or if this happens with the max sized buffers too?
>
> You can easily check these values by comparing dwNtbInMaxSize and
> dwNtbOutMaxSize (device maximum values) with rx_max and tx_max
> (negotiated values) using e.g.
>
>  grep . /sys/class/net/wwan0/cdc_ncm/*
>

This seems to happen with the max sized buffers according to the output:

daniele@L2122:/home/daniele$ grep . /sys/class/net/wwp0s20u6i2/cdc_ncm/*

/sys/class/net/wwp0s20u6i2/cdc_ncm/bmNtbFormatsSupported:0x0001
/sys/class/net/wwp0s20u6i2/cdc_ncm/dwNtbInMaxSize:16384
/sys/class/net/wwp0s20u6i2/cdc_ncm/dwNtbOutMaxSize:16384
/sys/class/net/wwp0s20u6i2/cdc_ncm/min_tx_pkt:13312
/sys/class/net/wwp0s20u6i2/cdc_ncm/ndp_to_end:N
/sys/class/net/wwp0s20u6i2/cdc_ncm/rx_max:16384
/sys/class/net/wwp0s20u6i2/cdc_ncm/tx_max:16384
/sys/class/net/wwp0s20u6i2/cdc_ncm/tx_timer_usecs:400
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpInAlignment:4
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpInDivisor:1
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpInPayloadRemainder:0
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpOutAlignment:4
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpOutDivisor:4
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpOutPayloadRemainder:0
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNtbOutMaxDatagrams:16

Thanks,
Daniele

>
> It has never been 100% clear to me whether we should send the ZLP by
> default if we've negotiated a smaller than max buffer. But the ZLP ought
> to be redundant in any case, since the device knows the negotiated
> buffer size. So I do believe our current interpretation makes sense.
>
> Not that it matters.  There are obviously more than enough device
> implementations violating this requirement to make it completely
> pointless.
>
>
> Bjørn


* [PATCH net-next] net: axienet: remove stale comment of axienet_open
From: YueHaibing @ 2018-05-31 11:51 UTC (permalink / raw)
  To: davem, anirudh, John.Linn; +Cc: netdev, linux-kernel, michal.simek, YueHaibing

axienet_open no longer returns -ENODEV when the PHY cannot be connected,
since commit d7cc3163e026 ("net: axienet: Support phy-less mode of operation").

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index e74e1e8..f24f48f 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -900,7 +900,6 @@ static void axienet_dma_err_handler(unsigned long data);
  * @ndev:	Pointer to net_device structure
  *
  * Return: 0, on success.
- *	    -ENODEV, if PHY cannot be connected to
  *	    non-zero error value on failure
  *
  * This is the driver open routine. It calls phy_start to start the PHY device.
-- 
2.7.0


* Re: general protection fault in requeue_rx_msgs
From: Kirill Tkhai @ 2018-05-31 11:34 UTC (permalink / raw)
  To: syzbot, davem, ebiggers, edumazet, linux-kernel, netdev,
	syzkaller-bugs, tom, viro
In-Reply-To: <0000000000000482ce056d7c1436@google.com>

On 31.05.2018 11:16, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    0044cdeb7313 Merge branch 'for-linus' of git://git.kernel...
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15aeff0f800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=968b0b23c7854c0b
> dashboard link: https://syzkaller.appspot.com/bug?extid=554266c04a41d1f9754d
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=131a208f800000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+554266c04a41d1f9754d@syzkaller.appspotmail.com
> 
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 4788 Comm: kworker/u4:3 Not tainted 4.17.0-rc7+ #74
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: kstrp strp_work
> RIP: 0010:__skb_unlink include/linux/skbuff.h:1844 [inline]
> RIP: 0010:__skb_dequeue include/linux/skbuff.h:1861 [inline]
> RIP: 0010:requeue_rx_msgs+0x14d/0x620 net/kcm/kcmsock.c:226
> RSP: 0018:ffff8801aa97f0b8 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: dffffc0000000000 RCX: ffffffff86d54ed3
> RDX: 0000000000000001 RSI: ffffffff86d531e2 RDI: 0000000000000008
> RBP: ffff8801aa97f1b8 R08: ffff8801aaa8e3c0 R09: ffffed0035f0a0e8
> R10: ffffed0035f0a0e8 R11: ffff8801af850743 R12: ffff8801d4407000
> R13: ffffed003552fe22 R14: 0000000000000000 R15: ffff8801a6bb06c0
> FS:  0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fba7c5099a0 CR3: 00000001af6f6000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  unreserve_rx_kcm+0x471/0x520 net/kcm/kcmsock.c:334
>  kcm_rcv_strparser+0x109/0x8d0 net/kcm/kcmsock.c:375
>  __strp_recv+0x34b/0x2130 net/strparser/strparser.c:328
>  strp_recv+0xcf/0x110 net/strparser/strparser.c:362
>  tcp_read_sock+0x2aa/0x810 net/ipv4/tcp.c:1652
>  strp_read_sock+0x1a1/0x2d0 net/strparser/strparser.c:385
>  do_strp_work net/strparser/strparser.c:440 [inline]
>  strp_work+0xcd/0x120 net/strparser/strparser.c:449
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:240
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> Code: 80 3c 1a 00 0f 85 70 04 00 00 48 8d 78 08 4d 8b 74 24 08 49 c7 04 24 00 00 00 00 49 c7 44 24 08 00 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 1a 00 0f 85 a7 04 00 00 4c 89 f2 4c 89 70 08 48 c1 ea 03
> RIP: __skb_unlink include/linux/skbuff.h:1844 [inline] RSP: ffff8801aa97f0b8
> RIP: __skb_dequeue include/linux/skbuff.h:1861 [inline] RSP: ffff8801aa97f0b8
> RIP: requeue_rx_msgs+0x14d/0x620 net/kcm/kcmsock.c:226 RSP: ffff8801aa97f0b8
> ---[ end trace e4c0e45094907eaa ]---

This looks the same as syzbot+5f1a04e374a635efc426@syzkaller.appspotmail.com:
"WARNING in kcm_exit_net (3)". This may confirm the theory. It looks like the race
window became bigger after async pernet_operations, and now the work is more likely
not to have time to complete.

I'm not close to this code. Tom, could you please say whether kcm_done_work()
can be called for in-kernel kcm sockets (created via kcm_clone())?

Also, is it possible to create a !kernel socket in kcm_clone()? I forgot
the reasons why we can't do that in some places.

Thanks,
Kirill

^ permalink raw reply

* Re: [PATCH v2] netfilter: properly initialize xt_table_info structure
From: Michal Kubecek @ 2018-05-31 11:32 UTC (permalink / raw)
  To: peter pi
  Cc: Greg Kroah-Hartman, Florian Westphal, Jan Engelhardt,
	Eric Dumazet, Greg Hackmann, Pablo Neira Ayuso, Jozsef Kadlecsik,
	netfilter-devel, coreteam, netdev
In-Reply-To: <CANZU63VE7fWNL+PJrLp7-5PBS6R6RQPvhw2QgqAK8NhX4uQc9Q@mail.gmail.com>

On Thu, May 31, 2018 at 05:40:40PM +0800, peter pi wrote:
> 
> My test method is very simple:
> 1, In copy_to_user, add a function call like my_examine(from, n) to check
> every 8 bytes. There is an kernel function called  virt_addr_valid which
> can check if the value is a address value.
> 2, Print a kernel log when there is a leak detected in function my_examine
> 3, Run iptables-save or ip6tables-save in shell, it will hit the kernel
> code path of the problem

I think I am starting to understand the problem. IPT_SO_GET_ENTRIES leads to
calling copy_entries_to_user(), which copies the entries as they are to the
user-provided buffer. It also copies instances of struct xt_entry_match
and struct xt_entry_target, which contain kernel pointers. We then
rewrite them with the match/target name for userspace, but the layout looks
(on x86_64) like this

/* offset    |  size */  type = struct xt_entry_match {
/*    0      |    32 */    union {
/*                32 */        struct {
/*    0      |     2 */            __u16 match_size;
/*    2      |    29 */            char name[29];
/*   31      |     1 */            __u8 revision;

                                   /* total size (bytes):   32 */
                               } user;
/*                16 */        struct {
/*    0      |     2 */            __u16 match_size;
/* XXX  6-byte hole  */
/*    8      |     8 */            struct xt_match *match;

                                   /* total size (bytes):   16 */
                               } kernel;
/*                 2 */        __u16 match_size;

                               /* total size (bytes):   32 */
                           } u;
/*   32      |     0 */    unsigned char data[];

                           /* total size (bytes):   32 */
                         }

so that if the match name is no longer than five characters (which is often
the case), writing to .u.user.name leaves .u.kernel.match untouched. The
same problem exists in struct xt_entry_target.

Unless there are other kernel pointers leaked, the solution should be
simple: explicitly zero the copy of .u.kernel.match (.u.kernel.target)
before we copy the name. I haven't checked yet whether the compat_ code path
suffers from the same problem.
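
To make the overlap concrete, here is a small user-space sketch. It assumes
the x86_64 layout quoted above and that only the name plus its terminating
NUL is written over the entry. struct demo_entry_match, struct demo_match and
demo_untouched_pointer_bytes() are hypothetical names for illustration; this
is not the kernel header or the kernel copy routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_match;	/* opaque stand-in for struct xt_match */

/* Hypothetical mirror of the union layout quoted above. */
struct demo_entry_match {
	union {
		struct {
			uint16_t match_size;
			char name[29];
			uint8_t revision;
		} user;
		struct {
			uint16_t match_size;
			struct demo_match *match;	/* kernel pointer */
		} kernel;
		uint16_t match_size;
	} u;
};

/*
 * How many bytes of u.kernel.match are left untouched when only
 * strlen(name) + 1 bytes are written starting at u.user.name.
 */
static size_t demo_untouched_pointer_bytes(const char *name)
{
	size_t name_off = offsetof(struct demo_entry_match, u.user.name);
	size_t ptr_off = offsetof(struct demo_entry_match, u.kernel.match);
	size_t ptr_end = ptr_off + sizeof(struct demo_match *);
	size_t written_end;

	assert(name != NULL && strlen(name) < 29);
	written_end = name_off + strlen(name) + 1;

	if (written_end <= ptr_off)
		return ptr_end - ptr_off;	/* whole pointer survives */
	if (written_end >= ptr_end)
		return 0;			/* pointer fully overwritten */
	return ptr_end - written_end;		/* upper bytes survive */
}
```

Under these assumptions a five-character name such as "state" ends exactly
at offset 8, so all eight pointer bytes survive, and a longer name still
leaves the upper bytes of the pointer in place unless it reaches offset 16.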

Michal Kubecek

^ permalink raw reply

* Re: [PATCH v2] netfilter: properly initialize xt_table_info structure
From: Greg Kroah-Hartman @ 2018-05-31 11:23 UTC (permalink / raw)
  To: peter pi
  Cc: Florian Westphal, Jan Engelhardt, Eric Dumazet, Greg Hackmann,
	Pablo Neira Ayuso, Jozsef Kadlecsik, Michal Kubecek,
	netfilter-devel, coreteam, netdev
In-Reply-To: <CANZU63VE7fWNL+PJrLp7-5PBS6R6RQPvhw2QgqAK8NhX4uQc9Q@mail.gmail.com>

On Thu, May 31, 2018 at 05:40:40PM +0800, peter pi wrote:
> Hi Greg,
> 
> My test method is very simple:
> 1, In copy_to_user, add a function call like my_examine(from, n) to check
> every 8 bytes. There is an kernel function called  virt_addr_valid which
> can check if the value is a address value.
> 2, Print a kernel log when there is a leak detected in function my_examine
> 3, Run iptables-save or ip6tables-save in shell, it will hit the kernel
> code path of the problem
> 
> 
> Because my test code is specified for Pixel 2, so I think you can write the
> test code yourself just about 10 lines code

Any chance you can test this on a more modern kernel, like 4.14 or
newer on a normal system?

thanks,

greg k-h

^ permalink raw reply

* [PATCH v2 06/21] iwlwifi: mvm: use match_string() helper
From: Yisheng Xie @ 2018-05-31 11:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: andy.shevchenko, Yisheng Xie, Kalle Valo, Intel Linux Wireless,
	Johannes Berg, Emmanuel Grumbach, linux-wireless, netdev
In-Reply-To: <1527765086-19873-1-git-send-email-xieyisheng1@huawei.com>

match_string() returns the index of the matching string in an array,
which can be used instead of the open-coded variant.
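
For readers unfamiliar with the helper, below is a minimal user-space sketch
of the semantics this conversion relies on: a linear strcmp() scan that
returns the index on a match and -EINVAL otherwise. It assumes the kernel
helper behaves this way, including stopping at a NULL entry;
demo_match_string() and demo_modes[] are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel helper. */
static int demo_match_string(const char *const *array, size_t n,
			     const char *string)
{
	size_t index;

	assert(array != NULL && string != NULL);

	for (index = 0; index < n; index++) {
		const char *item = array[index];

		if (!item)	/* a NULL entry ends the scan early */
			break;
		if (!strcmp(item, string))
			return (int)index;
	}
	return -EINVAL;		/* no entry matched */
}

/* Hypothetical table, standing in for arrays such as modes_str[]. */
static const char *const demo_modes[] = { "alpha", "beta", "gamma" };
```

Because a miss is already a negative errno in this sketch, a caller can
propagate it unchanged, which is the shape of the "if (ret < 0) return ret;"
check in the diff below.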

Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Intel Linux Wireless <linuxwifi@intel.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
---
v2:
 - let ret get return value of match_string  - per Andy

 drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c
index 0e6401c..d7ac511 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c
@@ -671,16 +671,11 @@ static ssize_t iwl_dbgfs_bt_cmd_read(struct file *file, char __user *user_buf,
 	};
 	int ret, bt_force_ant_mode;
 
-	for (bt_force_ant_mode = 0;
-	     bt_force_ant_mode < ARRAY_SIZE(modes_str);
-	     bt_force_ant_mode++) {
-		if (!strcmp(buf, modes_str[bt_force_ant_mode]))
-			break;
-	}
-
-	if (bt_force_ant_mode >= ARRAY_SIZE(modes_str))
-		return -EINVAL;
+	ret = match_string(modes_str, ARRAY_SIZE(modes_str), buf);
+	if (ret < 0)
+		return ret;
 
+	bt_force_ant_mode = ret;
 	ret = 0;
 	mutex_lock(&mvm->mutex);
 	if (mvm->bt_force_ant_mode == bt_force_ant_mode)
-- 
1.7.12.4

^ permalink raw reply related

* [PATCH v2 05/21] hp100: use match_string() helper
From: Yisheng Xie @ 2018-05-31 11:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: andy.shevchenko, Yisheng Xie, Jaroslav Kysela, netdev
In-Reply-To: <1527765086-19873-1-git-send-email-xieyisheng1@huawei.com>

match_string() returns the index of the matching string in an array,
which can be used instead of the open-coded variant.

Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: netdev@vger.kernel.org
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
---
v2:
 - add Reviewed-by tag.

 drivers/net/ethernet/hp/hp100.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/hp/hp100.c b/drivers/net/ethernet/hp/hp100.c
index c8c7ad2..84501b3 100644
--- a/drivers/net/ethernet/hp/hp100.c
+++ b/drivers/net/ethernet/hp/hp100.c
@@ -335,7 +335,6 @@ static const char *hp100_read_id(int ioaddr)
 static __init int hp100_isa_probe1(struct net_device *dev, int ioaddr)
 {
 	const char *sig;
-	int i;
 
 	if (!request_region(ioaddr, HP100_REGION_SIZE, "hp100"))
 		goto err;
@@ -351,13 +350,7 @@ static __init int hp100_isa_probe1(struct net_device *dev, int ioaddr)
 	if (sig == NULL)
 		goto err;
 
-	for (i = 0; i < ARRAY_SIZE(hp100_isa_tbl); i++) {
-		if (!strcmp(hp100_isa_tbl[i], sig))
-			break;
-
-	}
-
-	if (i < ARRAY_SIZE(hp100_isa_tbl))
+	if (match_string(hp100_isa_tbl, ARRAY_SIZE(hp100_isa_tbl), sig) >= 0)
 		return hp100_probe1(dev, ioaddr, HP100_BUS_ISA, NULL);
  err:
 	return -ENODEV;
-- 
1.7.12.4

^ permalink raw reply related

* [PATCH v2 04/21] cxgb4: use match_string() helper
From: Yisheng Xie @ 2018-05-31 11:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: andy.shevchenko, Yisheng Xie, Ganesh Goudar, netdev
In-Reply-To: <1527765086-19873-1-git-send-email-xieyisheng1@huawei.com>

match_string() returns the index of the matching string in an array,
which can be used instead of the open-coded variant.

Cc: Ganesh Goudar <ganeshgr@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
---
v2:
 - no change from v1.

 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index 9da6f57..bd61610 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -782,17 +782,11 @@ static int cudbg_get_mem_region(struct adapter *padap,
 	if (rc)
 		return rc;
 
-	for (i = 0; i < ARRAY_SIZE(cudbg_region); i++) {
-		if (!strcmp(cudbg_region[i], region_name)) {
-			found = 1;
-			idx = i;
-			break;
-		}
-	}
-	if (!found)
-		return -EINVAL;
+	rc = match_string(cudbg_region, ARRAY_SIZE(cudbg_region), region_name);
+	if (rc < 0)
+		return rc;
 
-	found = 0;
+	idx = rc;
 	for (i = 0; i < meminfo->mem_c; i++) {
 		if (meminfo->mem[i].idx >= ARRAY_SIZE(cudbg_region))
 			continue; /* Skip holes */
-- 
1.7.12.4

^ permalink raw reply related

* Re: [GIT PULL 0/7] perf/urgent fixes
From: Ingo Molnar @ 2018-05-31 10:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Clark Williams, linux-kernel, linux-perf-users, Adrian Hunter,
	Agustin Vega-Frias, Alexander Shishkin, Andi Kleen, coresight,
	Daniel Borkmann, David Ahern, Ganapatrao Kulkarni, Heiko Carstens,
	He Kuang, Hendrik Brueckner, Jin Yao, Jiri Olsa, Jonathan Corbet,
	Kan Liang, kim.phillips, Kim Phillips, Lakshman 
In-Reply-To: <20180531103220.24684-1-acme@kernel.org>


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit f3903c9161f0d636a7b0ff03841628928457e64c:
> 
>   Merge tag 'perf-urgent-for-mingo-4.17-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2018-05-15 08:20:45 +0200)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-4.17-20180531
> 
> for you to fetch changes up to 18a7057420f8b67f15d17087bf5c0863db752c8b:
> 
>   perf tools: Fix perf.data format description of NRCPUS header (2018-05-30 15:40:26 -0300)
> 
> ----------------------------------------------------------------
> perf/urgent fixes:
> 
> - Fix 'perf test Session topology' segfault on s390 (Thomas Richter)
> 
> - Fix NULL return handling in bpf__prepare_load() (YueHaibing)
> 
> - Fix indexing on Coresight ETM packet queue decoder (Mathieu Poirier)
> 
> - Fix perf.data format description of NRCPUS header (Arnaldo Carvalho de Melo)
> 
> - Update perf.data documentation section on cpu topology
> 
> - Handle uncore event aliases in small groups properly (Kan Liang)
> 
> - Add missing perf_sample.addr into python sample dictionary (Leo Yan)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Arnaldo Carvalho de Melo (1):
>       perf tools: Fix perf.data format description of NRCPUS header
> 
> Kan Liang (1):
>       perf parse-events: Handle uncore event aliases in small groups properly
> 
> Leo Yan (1):
>       perf script python: Add addr into perf sample dict
> 
> Mathieu Poirier (1):
>       perf cs-etm: Fix indexing for decoder packet queue
> 
> Thomas Richter (2):
>       perf test: "Session topology" dumps core on s390
>       perf data: Update documentation section on cpu topology
> 
> YueHaibing (1):
>       perf bpf: Fix NULL return handling in bpf__prepare_load()
> 
>  tools/perf/Documentation/perf.data-file-format.txt |  10 +-
>  tools/perf/tests/topology.c                        |  30 ++++-
>  tools/perf/util/bpf-loader.c                       |   6 +-
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  12 +-
>  tools/perf/util/evsel.h                            |   1 +
>  tools/perf/util/parse-events.c                     | 130 ++++++++++++++++++++-
>  tools/perf/util/parse-events.h                     |   7 +-
>  tools/perf/util/parse-events.y                     |   8 +-
>  .../util/scripting-engines/trace-event-python.c    |   2 +
>  9 files changed, 185 insertions(+), 21 deletions(-)

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply

* [PATCH 3/7] perf bpf: Fix NULL return handling in bpf__prepare_load()
From: Arnaldo Carvalho de Melo @ 2018-05-31 10:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Clark Williams, linux-kernel, linux-perf-users, YueHaibing,
	Alexander Shishkin, Namhyung Kim, Peter Zijlstra, netdev,
	Arnaldo Carvalho de Melo
In-Reply-To: <20180531103220.24684-1-acme@kernel.org>

From: YueHaibing <yuehaibing@huawei.com>

bpf_object__open() and bpf_object__open_buffer() can return an error
pointer or NULL, so check the return values with IS_ERR_OR_NULL() in
bpf__prepare_load() and bpf__prepare_load_buffer().
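
As a reminder of why IS_ERR() alone misses the NULL case, here is a minimal
user-space sketch of the error-pointer convention. It assumes the usual
encoding, in which the top 4095 address values carry a negative errno. The
demo_* names are hypothetical stand-ins, not the macros from the tools
headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095	/* assumed to mirror the kernel's MAX_ERRNO */

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long error)
{
	assert(error < 0 && error >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True only for pointers in the reserved top-of-address-space range. */
static bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* True for error pointers and for NULL. */
static bool demo_is_err_or_null(const void *ptr)
{
	return ptr == NULL || demo_is_err(ptr);
}

static int demo_object;	/* any valid object, used as a "good" pointer */
```

In this sketch demo_is_err(NULL) is false, so a NULL return would pass an
IS_ERR()-style check and be used as if it were a valid object; the combined
check rejects both failure forms.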

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: netdev@vger.kernel.org
Link: https://lkml.kernel.org/n/tip-psf4xwc09n62al2cb9s33v9h@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/bpf-loader.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index af7ad814b2c3..cee658733e2c 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -66,7 +66,7 @@ bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz, const char *name)
 	}
 
 	obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, name);
-	if (IS_ERR(obj)) {
+	if (IS_ERR_OR_NULL(obj)) {
 		pr_debug("bpf: failed to load buffer\n");
 		return ERR_PTR(-EINVAL);
 	}
@@ -102,14 +102,14 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
 			pr_debug("bpf: successfull builtin compilation\n");
 		obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename);
 
-		if (!IS_ERR(obj) && llvm_param.dump_obj)
+		if (!IS_ERR_OR_NULL(obj) && llvm_param.dump_obj)
 			llvm__dump_obj(filename, obj_buf, obj_buf_sz);
 
 		free(obj_buf);
 	} else
 		obj = bpf_object__open(filename);
 
-	if (IS_ERR(obj)) {
+	if (IS_ERR_OR_NULL(obj)) {
 		pr_debug("bpf: failed to load %s\n", filename);
 		return obj;
 	}
-- 
2.14.3

^ permalink raw reply related

* [GIT PULL 0/7] perf/urgent fixes
From: Arnaldo Carvalho de Melo @ 2018-05-31 10:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Clark Williams, linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, Adrian Hunter, Agustin Vega-Frias,
	Alexander Shishkin, Andi Kleen, coresight, Daniel Borkmann,
	David Ahern, Ganapatrao Kulkarni, Heiko Carstens, He Kuang,
	Hendrik Brueckner, Jin Yao, Jiri Olsa, Jonathan Corbet, Kan Liang,
	kim.phillips, Kim 

Hi Ingo,

	Please consider pulling,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit f3903c9161f0d636a7b0ff03841628928457e64c:

  Merge tag 'perf-urgent-for-mingo-4.17-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2018-05-15 08:20:45 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-4.17-20180531

for you to fetch changes up to 18a7057420f8b67f15d17087bf5c0863db752c8b:

  perf tools: Fix perf.data format description of NRCPUS header (2018-05-30 15:40:26 -0300)

----------------------------------------------------------------
perf/urgent fixes:

- Fix 'perf test Session topology' segfault on s390 (Thomas Richter)

- Fix NULL return handling in bpf__prepare_load() (YueHaibing)

- Fix indexing on Coresight ETM packet queue decoder (Mathieu Poirier)

- Fix perf.data format description of NRCPUS header (Arnaldo Carvalho de Melo)

- Update perf.data documentation section on cpu topology

- Handle uncore event aliases in small groups properly (Kan Liang)

- Add missing perf_sample.addr into python sample dictionary (Leo Yan)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (1):
      perf tools: Fix perf.data format description of NRCPUS header

Kan Liang (1):
      perf parse-events: Handle uncore event aliases in small groups properly

Leo Yan (1):
      perf script python: Add addr into perf sample dict

Mathieu Poirier (1):
      perf cs-etm: Fix indexing for decoder packet queue

Thomas Richter (2):
      perf test: "Session topology" dumps core on s390
      perf data: Update documentation section on cpu topology

YueHaibing (1):
      perf bpf: Fix NULL return handling in bpf__prepare_load()

 tools/perf/Documentation/perf.data-file-format.txt |  10 +-
 tools/perf/tests/topology.c                        |  30 ++++-
 tools/perf/util/bpf-loader.c                       |   6 +-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  12 +-
 tools/perf/util/evsel.h                            |   1 +
 tools/perf/util/parse-events.c                     | 130 ++++++++++++++++++++-
 tools/perf/util/parse-events.h                     |   7 +-
 tools/perf/util/parse-events.y                     |   8 +-
 .../util/scripting-engines/trace-event-python.c    |   2 +
 9 files changed, 185 insertions(+), 21 deletions(-)

Test results:

The first ones are container (docker) based builds of tools/perf with
and without libelf support.  Where clang is available, it is also used
to build perf with/without libelf, and building with LIBCLANGLLVM=1
(built-in clang) with gcc and clang when clang and its devel libraries
are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container cluster.
Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.

Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping having this in place.

   1 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0
   2 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
   3 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0
   4 alpine:3.7                    : Ok   gcc (Alpine 6.4.0) 6.4.0
   5 alpine:edge                   : Ok   gcc (Alpine 6.4.0) 6.4.0
   6 amazonlinux:1                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
   7 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
   8 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   9 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  10 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  11 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  12 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  13 debian:7                      : Ok   gcc (Debian 4.7.2-5) 4.7.2
  14 debian:8                      : Ok   gcc (Debian 4.9.2-10+deb8u1) 4.9.2
  15 debian:9                      : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
  16 debian:experimental           : Ok   gcc (Debian 7.3.0-19) 7.3.0
  17 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 7.3.0-19) 7.3.0
  18 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 7.3.0-19) 7.3.0
  19 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 7.3.0-18) 7.3.0
  20 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 7.3.0-19) 7.3.0
  21 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  22 fedora:21                     : Ok   gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
  23 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  24 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  25 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
  26 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  27 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
  28 fedora:26                     : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
  29 fedora:27                     : Ok   gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
  30 fedora:28                     : Ok   gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)
  31 fedora:rawhide                : Ok   gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20)
  32 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0
  33 mageia:5                      : Ok   gcc (GCC) 4.9.2
  34 mageia:6                      : Ok   gcc (Mageia 5.5.0-1.mga6) 5.5.0
  35 opensuse:42.1                 : Ok   gcc (SUSE Linux) 4.8.5
  36 opensuse:42.2                 : Ok   gcc (SUSE Linux) 4.8.5
  37 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5
  38 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 7.3.1 20180323 [gcc-7-branch revision 258812]
  39 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18.0.7)
  40 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28.0.1)
  41 ubuntu:12.04.5                : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
  42 ubuntu:14.04.4                : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  43 ubuntu:14.04.4-x-linaro-arm64 : Ok   aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
  44 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  45 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  46 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  47 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  48 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  49 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  50 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
  51 ubuntu:16.10                  : Ok   gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
  52 ubuntu:17.04                  : Ok   gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
  53 ubuntu:17.10                  : Ok   gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
  54 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0

  # git log --oneline -1
  18a7057420f8 (HEAD -> perf/urgent) perf tools: Fix perf.data format description of NRCPUS header
  # perf --version
  perf version 4.17.rc5.g18a7057
  # uname -a
  Linux jouet 4.17.0-rc5 #21 SMP Mon May 14 15:35:35 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms                       : Ok
   2: Detect openat syscall event                           : Ok
   3: Detect openat syscall event on all cpus               : Ok
   4: Read samples using the mmap interface                 : Ok
   5: Test data source output                               : Ok
   6: Parse event definition strings                        : Ok
   7: Simple expression parser                              : Ok
   8: PERF_RECORD_* events & perf_sample fields             : Ok
   9: Parse perf pmu format                                 : Ok
  10: DSO data read                                         : Ok
  11: DSO data cache                                        : Ok
  12: DSO data reopen                                       : Ok
  13: Roundtrip evsel->name                                 : Ok
  14: Parse sched tracepoints fields                        : Ok
  15: syscalls:sys_enter_openat event fields                : Ok
  16: Setup struct perf_event_attr                          : Ok
  17: Match and link multiple hists                         : Ok
  18: 'import perf' in python                               : Ok
  19: Breakpoint overflow signal handler                    : Ok
  20: Breakpoint overflow sampling                          : Ok
  21: Breakpoint accounting                                 : Ok
  22: Number of exit events of a simple workload            : Ok
  23: Software clock events period values                   : Ok
  24: Object code reading                                   : Ok
  25: Sample parsing                                        : Ok
  26: Use a dummy software event to keep tracking           : Ok
  27: Parse with no sample_id_all bit set                   : Ok
  28: Filter hist entries                                   : Ok
  29: Lookup mmap thread                                    : Ok
  30: Share thread mg                                       : Ok
  31: Sort output of hist entries                           : Ok
  32: Cumulate child hist entries                           : Ok
  33: Track with sched_switch                               : Ok
  34: Filter fds with revents mask in a fdarray             : Ok
  35: Add fd to a fdarray, making it autogrow               : Ok
  36: kmod_path__parse                                      : Ok
  37: Thread map                                            : Ok
  38: LLVM search and compile                               :
  38.1: Basic BPF llvm compile                              : Ok
  38.2: kbuild searching                                    : Ok
  38.3: Compile source for BPF prologue generation          : Ok
  38.4: Compile source for BPF relocation                   : Ok
  39: Session topology                                      : Ok
  40: BPF filter                                            :
  40.1: Basic BPF filtering                                 : Ok
  40.2: BPF pinning                                         : Ok
  40.3: BPF prologue generation                             : Ok
  40.4: BPF relocation checker                              : Ok
  41: Synthesize thread map                                 : Ok
  42: Remove thread map                                     : Ok
  43: Synthesize cpu map                                    : Ok
  44: Synthesize stat config                                : Ok
  45: Synthesize stat                                       : Ok
  46: Synthesize stat round                                 : Ok
  47: Synthesize attr update                                : Ok
  48: Event times                                           : Ok
  49: Read backward ring buffer                             : Ok
  50: Print cpu map                                         : Ok
  51: Probe SDT events                                      : Ok
  52: is_printable_array                                    : Ok
  53: Print bitmap                                          : Ok
  54: perf hooks                                            : Ok
  55: builtin clang support                                 : Skip (not compiled in)
  56: unit_number__scnprintf                                : Ok
  57: mem2node                                              : Ok
  58: x86 rdpmc                                             : Ok
  59: Convert perf time to TSC                              : Ok
  60: DWARF unwind                                          : Ok
  61: x86 instruction decoder - new instructions            : Ok
  62: Use vfs_getname probe to get syscall args filenames   : Ok
  63: Check open filename arg using perf trace + vfs_getname: Ok
  64: probe libc's inet_pton & backtrace it with ping       : Ok
  65: Add vfs_getname probe to get syscall args filenames   : Ok
  #

  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
                   make_pure_O: make
                 make_static_O: make LDFLAGS=-static
           make_no_libpython_O: make NO_LIBPYTHON=1
                    make_doc_O: make doc
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
              make_no_libbpf_O: make NO_LIBBPF=1
           make_no_backtrace_O: make NO_BACKTRACE=1
            make_install_bin_O: make install-bin
            make_no_auxtrace_O: make NO_AUXTRACE=1
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
                   make_tags_O: make tags
         make_install_prefix_O: make install prefix=/tmp/krava
              make_clean_all_O: make clean all
             make_no_libperl_O: make NO_LIBPERL=1
             make_util_map_o_O: make util/map.o
        make_with_babeltrace_O: make LIBBABELTRACE=1
                 make_perf_o_O: make perf.o
           make_no_libunwind_O: make NO_LIBUNWIND=1
                make_no_newt_O: make NO_NEWT=1
            make_no_libaudit_O: make NO_LIBAUDIT=1
                make_no_gtk2_O: make NO_GTK2=1
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
         make_with_clangllvm_O: make LIBCLANGLLVM=1
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
             make_no_libnuma_O: make NO_LIBNUMA=1
               make_no_slang_O: make NO_SLANG=1
                make_install_O: make install
              make_no_libelf_O: make NO_LIBELF=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
                   make_help_O: make help
                  make_debug_O: make DEBUG=1
            make_no_demangle_O: make NO_DEMANGLE=1
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $

^ permalink raw reply

* [PATCH 1/2] xfrm6: avoid potential infinite loop in _decode_session6()
From: Steffen Klassert @ 2018-05-31 10:23 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180531102326.5728-1-steffen.klassert@secunet.com>

From: Eric Dumazet <edumazet@google.com>

syzbot found a way to trigger an infinite loop by overflowing the
@offset variable, which has been forced to use u16 for some very
obscure reason in the past.
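
The failure mode can be illustrated with a deliberately simplified model. It
is not the _decode_session6() logic, only a sketch of how a 16-bit offset
that wraps can keep a header walk from ever reaching its end condition. The
function names, the fixed hdr_len step and the iteration cap are all
assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Walk with a 16-bit offset. Returns the number of steps needed to reach
 * `end`, or -1 if `cap` steps were taken without getting there.
 */
static long demo_walk_u16(uint32_t end, uint32_t hdr_len, long cap)
{
	uint16_t offset = 0;
	long n = 0;

	assert(hdr_len > 0);
	while (offset < end) {
		if (n++ >= cap)
			return -1;	/* would spin forever without the cap */
		offset = (uint16_t)(offset + hdr_len);	/* wraps modulo 65536 */
	}
	return n;
}

/* Same walk with a 32-bit offset, which does not wrap in this range. */
static long demo_walk_u32(uint32_t end, uint32_t hdr_len, long cap)
{
	uint32_t offset = 0;
	long n = 0;

	assert(hdr_len > 0);
	while (offset < end) {
		if (n++ >= cap)
			return -1;
		offset += hdr_len;
	}
	return n;
}
```

With an end position beyond 65535 the 16-bit variant only stops because of
the artificial cap, while the wider variant terminates normally.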

We probably want to look at NEXTHDR_FRAGMENT handling which looks
wrong, in a separate patch.

In net-next, we shall try to use skb_header_pointer() instead of
pskb_may_pull().

watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor738:4553]
Modules linked in:
irq event stamp: 13885653
hardirqs last  enabled at (13885652): [<ffffffff878009d5>] restore_regs_and_return_to_kernel+0x0/0x2b
hardirqs last disabled at (13885653): [<ffffffff87800905>] interrupt_entry+0xb5/0xf0 arch/x86/entry/entry_64.S:625
softirqs last  enabled at (13614028): [<ffffffff84df0809>] tun_napi_alloc_frags drivers/net/tun.c:1478 [inline]
softirqs last  enabled at (13614028): [<ffffffff84df0809>] tun_get_user+0x1dd9/0x4290 drivers/net/tun.c:1825
softirqs last disabled at (13614032): [<ffffffff84df1b6f>] tun_get_user+0x313f/0x4290 drivers/net/tun.c:1942
CPU: 1 PID: 4553 Comm: syz-executor738 Not tainted 4.17.0-rc3+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50 kernel/kcov.c:101
RSP: 0018:ffff8801d8cfe250 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffff8801d88a8080 RBX: ffff8801d7389e40 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffffffff868da4ad RDI: ffff8801c8a53277
RBP: ffff8801d8cfe250 R08: ffff8801d88a8080 R09: ffff8801d8cfe3e8
R10: ffffed003b19fc87 R11: ffff8801d8cfe43f R12: ffff8801c8a5327f
R13: 0000000000000000 R14: ffff8801c8a4e5fe R15: ffff8801d8cfe3e8
FS:  0000000000d88940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000001acab3000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 _decode_session6+0xc1d/0x14f0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2368
 xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline]
 icmpv6_route_lookup+0x395/0x6e0 net/ipv6/icmp.c:372
 icmp6_send+0x1982/0x2da0 net/ipv6/icmp.c:551
 icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
 ip6_input_finish+0x14e1/0x1a30 net/ipv6/ip6_input.c:305
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ip6_input+0xe1/0x5e0 net/ipv6/ip6_input.c:327
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x29c/0xa10 net/ipv6/ip6_input.c:71
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ipv6_rcv+0xeb8/0x2040 net/ipv6/ip6_input.c:208
 __netif_receive_skb_core+0x2468/0x3650 net/core/dev.c:4646
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4711
 netif_receive_skb_internal+0x126/0x7b0 net/core/dev.c:4785
 napi_frags_finish net/core/dev.c:5226 [inline]
 napi_gro_frags+0x631/0xc40 net/core/dev.c:5299
 tun_get_user+0x3168/0x4290 drivers/net/tun.c:1951
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1996
 call_write_iter include/linux/fs.h:1784 [inline]
 do_iter_readv_writev+0x859/0xa50 fs/read_write.c:680
 do_iter_write+0x185/0x5f0 fs/read_write.c:959
 vfs_writev+0x1c7/0x330 fs/read_write.c:1004
 do_writev+0x112/0x2f0 fs/read_write.c:1039
 __do_sys_writev fs/read_write.c:1112 [inline]
 __se_sys_writev fs/read_write.c:1109 [inline]
 __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: syzbot+0053c8...@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/xfrm6_policy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 416fe67271a9..86dba282a147 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -126,7 +126,7 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
 	struct flowi6 *fl6 = &fl->u.ip6;
 	int onlyproto = 0;
 	const struct ipv6hdr *hdr = ipv6_hdr(skb);
-	u16 offset = sizeof(*hdr);
+	u32 offset = sizeof(*hdr);
 	struct ipv6_opt_hdr *exthdr;
 	const unsigned char *nh = skb_network_header(skb);
 	u16 nhoff = IP6CB(skb)->nhoff;
-- 
2.14.1

^ permalink raw reply related
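
The wrap-around described in the commit message above can be illustrated in userspace. This is only a sketch under stated assumptions, not the kernel code path: walk_u16() and walk_u32() are hypothetical helpers, the starting value of 40 stands in for sizeof(struct ipv6hdr), and the 2048-byte step is the largest length an IPv6 extension header can encode ((255 + 1) * 8). With a 16-bit accumulator, 32 such steps bring the offset back to its starting value, so a parser keyed on it could keep revisiting the same bytes; a 32-bit accumulator keeps growing.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for sizeof(struct ipv6hdr); the offset starts here. */
#define IPV6_HDR_LEN 40u

/* Advance a 16-bit offset (the pre-patch type) by `step` bytes, `n` times. */
uint16_t walk_u16(uint32_t step, unsigned int n)
{
	uint16_t offset = IPV6_HDR_LEN;
	unsigned int i;

	for (i = 0; i < n; i++)
		offset = (uint16_t)(offset + step);	/* wraps modulo 65536 */
	return offset;
}

/* Same walk with a 32-bit offset (the post-patch type). */
uint32_t walk_u32(uint32_t step, unsigned int n)
{
	uint32_t offset = IPV6_HDR_LEN;
	unsigned int i;

	for (i = 0; i < n; i++)
		offset += step;
	return offset;
}

/* Self-check: 40 + 32 * 2048 = 65576, which truncates to 40 in 16 bits. */
void check_offset_wrap(void)
{
	assert(walk_u16(2048, 32) == IPV6_HDR_LEN);
	assert(walk_u32(2048, 32) == 65576u);
}
```

Calling check_offset_wrap() passes silently; the first assertion is the point of interest, since the u16 offset ends exactly where it began.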

* [PATCH 2/2] xfrm Fix potential error pointer dereference in xfrm_bundle_create.
From: Steffen Klassert @ 2018-05-31 10:23 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180531102326.5728-1-steffen.klassert@secunet.com>

We may dereference an invalid pointer in the error path of
xfrm_bundle_create(). Fix this by returning this error
pointer directly instead of assigning it to xdst0.

Fixes: 45b018beddb6 ("ipsec: Create and use new helpers for dst child access.")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 40b54cc64243..5f48251c1319 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1658,7 +1658,6 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 		trailer_len -= xdst_prev->u.dst.xfrm->props.trailer_len;
 	}

-out:
 	return &xdst0->u.dst;

 put_states:
@@ -1667,8 +1666,8 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 free_dst:
 	if (xdst0)
 		dst_release_immediate(&xdst0->u.dst);
-	xdst0 = ERR_PTR(err);
-	goto out;
+
+	return ERR_PTR(err);
 }

 static int xfrm_expand_policies(const struct flowi *fl, u16 family,
-- 
2.14.1

^ permalink raw reply related
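
For readers unfamiliar with the error-pointer convention this patch relies on, here is a minimal userspace sketch. It assumes simplified re-creations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() helpers; struct dst, struct bundle and bundle_create() are hypothetical stand-ins, not the xfrm types. The point is that the error path returns the encoded errno directly, rather than storing it in the object pointer and then forming a member address from it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace approximations of the kernel helpers (assumption: 4095 errnos). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical types: the embedded member is not at offset 0 on purpose. */
struct dst { int refcnt; };
struct bundle { long pad; struct dst dst; };

static struct bundle the_bundle;

/* Return the embedded dst on success, or an encoded errno on failure. */
struct dst *bundle_create(int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);	/* returned as-is, never offset */
	return &the_bundle.dst;
}

/* Self-check of both paths. */
void check_bundle_create(void)
{
	struct dst *d = bundle_create(1);

	assert(IS_ERR(d));
	assert(PTR_ERR(d) == -ENOMEM);
	assert(!IS_ERR(bundle_create(0)));
}
```

In this sketch, computing the address of the dst member from an error-valued struct bundle pointer would shift the value by the member offset, which is why the error is handed back untouched.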

* pull request (net): ipsec 2018-05-31
From: Steffen Klassert @ 2018-05-31 10:23 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1) Avoid possible overflow of the offset variable
   in _decode_session6(), this fixes an infinite
   loop there. From Eric Dumazet.

2) We may use an error pointer in the error path of
   xfrm_bundle_create(). Fix this by returning this
   pointer directly to the caller.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 2c5d5b13c6eb79f5677e206b8aad59b3a2097f60:

  llc: better deal with too small mtu (2018-05-08 00:11:40 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to 38369f54d97dd7dc50c73a2797bfeb53c2e87d2d:

  xfrm Fix potential error pointer dereference in xfrm_bundle_create. (2018-05-31 09:53:04 +0200)

----------------------------------------------------------------
Eric Dumazet (1):
      xfrm6: avoid potential infinite loop in _decode_session6()

Steffen Klassert (1):
      xfrm Fix potential error pointer dereference in xfrm_bundle_create.

 net/ipv6/xfrm6_policy.c | 2 +-
 net/xfrm/xfrm_policy.c  | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress
From: John Hurley @ 2018-05-31 10:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <20180530202954.GF2010@nanopsycho>

On Wed, May 30, 2018 at 9:29 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Wed, May 30, 2018 at 11:26:23AM CEST, john.hurley@netronome.com wrote:
>>On Tue, May 29, 2018 at 11:09 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Tue, May 29, 2018 at 04:08:48PM CEST, john.hurley@netronome.com wrote:
>>>>On Sat, May 26, 2018 at 3:47 AM, Jakub Kicinski
>>>><jakub.kicinski@netronome.com> wrote:
>>>>> On Fri, 25 May 2018 08:48:09 +0200, Jiri Pirko wrote:
>>>>>> Thu, May 24, 2018 at 04:22:47AM CEST, jakub.kicinski@netronome.com wrote:
>>>>>> >Hi!
>>>>>> >
>>>>>> >This series from John adds bond offload to the nfp driver.  Patch 5
>>>>>> >exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
>>>>>> >hashing matches that of the software LAG.  This may be unnecessarily
>>>>>> >conservative, let's see what LAG maintainers think :)
>>>>>>
>>>>>> So you need to restrict offload to only certain hash algo? In mlxsw, we
>>>>>> just ignore the lag setting and do some hw default hashing. Would not be
>>>>>> enough? Note that there's a good reason for it, as you see, in team, the
>>>>>> hashing is done in a BPF function and could be totally arbitrary.
>>>>>> Your patchset effectively disables team offload for nfp.
>>>>>
>>>>> My understanding is that the project requirements only called for L3/L4
>>>>> hash algorithm offload, hence the temptation to err on the side of
>>>>> caution and not offload all the bond configurations.  John can provide
>>>>> more details.  Not being able to offload team is unfortunate indeed.
>>>>
>>>>Hi Jiri,
>>>>Yes, as Jakub mentions, we restrict ourselves to L3/L4 hash algorithm
>>>>as this is currently what is supported in fw.
>>>
>>> In mlxsw, a default l3/l4 is used always, no matter what the
>>> bonding/team sets. It is not correct, but it works with team as well.
>>> Perhaps we can have NETDEV_LAG_HASH_UNKNOWN to indicate to the driver to
>>> do some default? That would make the "team" offload functional.
>>>
>>
>>yes, I would agree with that.
>>Thanks
>
> Okay, would you please adjust your driver?
>

Will do.
Thanks, Jiri

> I will take care of mlxsw bits.
>
> Thanks!
>
>>
>>>>Hopefully this will change as fw features are expanded.
>>>>I understand the issue this presents with offloading team.
>>>>Perhaps resorting to a default hw hash for team is acceptable.
>>>>John

^ permalink raw reply

* [PATCH] net: core: improve the tx_hash calculating
From: Tonghao Zhang @ 2018-05-31 10:14 UTC (permalink / raw)
  To: netdev; +Cc: Tonghao Zhang

Use the % operator instead of the while loop. This simplifies the
code and may speed up the calculation. real_num_tx_queues is already
checked to be nonzero when it is allocated and set.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/dev.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1844d9b..edc5b75 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2617,15 +2617,13 @@ void netif_device_attach(struct net_device *dev)
  */
 static u16 skb_tx_hash(const struct net_device *dev, struct sk_buff *skb)
 {
-	u32 hash;
 	u16 qoffset = 0;
 	u16 qcount = dev->real_num_tx_queues;
 
 	if (skb_rx_queue_recorded(skb)) {
-		hash = skb_get_rx_queue(skb);
-		while (unlikely(hash >= qcount))
-			hash -= qcount;
-		return hash;
+		/* When setting the real_num_tx_queues, we make sure
+		 * real_num_tx_queues != 0. */
+		return skb_get_rx_queue(skb) % qcount;
 	}
 
 	if (dev->num_tc) {
-- 
1.8.3.1

^ permalink raw reply related
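
The equivalence this patch depends on can be checked in isolation. The sketch below uses hypothetical userspace helpers, reduce_by_subtraction() and reduce_by_modulo(), which mirror the removed loop and the added expression; they are not the kernel functions. Both assume qcount is nonzero, which is the precondition the commit message cites.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the removed loop: subtract qcount until the value is in range. */
uint16_t reduce_by_subtraction(uint32_t hash, uint16_t qcount)
{
	assert(qcount != 0);	/* otherwise the loop would never terminate */
	while (hash >= qcount)
		hash -= qcount;
	return (uint16_t)hash;
}

/* Mirrors the added expression: a single remainder operation. */
uint16_t reduce_by_modulo(uint32_t hash, uint16_t qcount)
{
	assert(qcount != 0);	/* otherwise this is a division by zero */
	return (uint16_t)(hash % qcount);
}

/* Self-check: both reductions agree for small queue counts and inputs. */
void check_reductions_agree(void)
{
	uint32_t hash;
	uint16_t qcount;

	for (qcount = 1; qcount <= 64; qcount++)
		for (hash = 0; hash < 1024; hash++)
			assert(reduce_by_subtraction(hash, qcount) ==
			       reduce_by_modulo(hash, qcount));
}
```

The two forms return the same value but cost differently: when the recorded rx queue is already below qcount the loop body never runs, whereas the remainder is computed on every call.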
