Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Eric Dumazet @ 2018-05-11  0:55 UTC (permalink / raw)
  To: Gao Feng, Eric Dumazet
  Cc: davem@davemloft.net, daniel, jakub.kicinski, David Ahern,
	netdev@vger.kernel.org
In-Reply-To: <654af0ff.3e1.1634c90380e.Coremail.gfree.wind@vip.163.com>



On 05/10/2018 05:18 PM, Gao Feng wrote:
> At 2018-05-10 21:02:55, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
>>
>>
>> On 05/10/2018 01:28 AM, gfree.wind@vip.163.com wrote:
>>> From: Gao Feng <gfree.wind@vip.163.com>
>>>
>>> The skb flow limit is implemented for each CPU independently. In the
>>> current codes, the function skb_flow_limit gets the softnet_data by
>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>>> the stats of current CPU, while the skb is going to append the queue of
>>> another CPU. It isn't the expected behavior.
>>>
>>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>>
>>
>> Please add a correct Fixes: tag
> 
> Thanks Eric.
> 
> I have one question about the "Fixes: tag".
> Most of patches are bug fixes, but when need to add the "Fixes: tag", and when not ?
> 
> I'm not clear about it. Could you explain it please?
> 

For this particular patch, since you have not CC Willem (author of the patch),
I found very useful that you did a search to find out.
Once you found which commit added the problem, simply add the Fixes: tag and CC: the author.

Doing so saves us (stable teams, reviewers, maintainers) a lot of time really.

In my opinion, Fixes: tags should be mandatory when applicable.

> Best Regards
> Feng
> 
>>
>> By doing so, you will likely add a CC: tag to make sure the author of the code
>> will receive your email and give feed back.
>>
>> Thanks !
>>

^ permalink raw reply

* Re: [PATCH net] tun: fix use after free for ptr_ring
From: Jason Wang @ 2018-05-11  1:29 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, LKML, Eric Dumazet,
	Michael S . Tsirkin
In-Reply-To: <CAM_iQpUVFZ-4EFeGM6eKyOrJzc2=5uu7b81d3Rf5Pf7TgZw38Q@mail.gmail.com>



On 2018年05月11日 02:08, Cong Wang wrote:
> On Tue, May 8, 2018 at 11:59 PM, Jason Wang <jasowang@redhat.com> wrote:
>> We used to initialize ptr_ring during TUNSETIFF, this is because its
>> size depends on the tx_queue_len of netdevice. And we try to clean it
>> up when socket were detached from netdevice. A race were spotted when
>> trying to do uninit during a read which will lead a use after free for
>> pointer ring. Solving this by always initialize a zero size ptr_ring
>> in open() and do resizing during TUNSETIFF, and then we can safely do
>> cleanup during close(). With this, there's no need for the workaround
>> that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
>> for tfile->tx_array").
>>
> Ah, I didn't know ptr_ring_init(0) could work... Nice patch!
> Except one thing below.
>
>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index ef33950..298cb96 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -681,15 +681,6 @@ static void tun_queue_purge(struct tun_file *tfile)
>>          skb_queue_purge(&tfile->sk.sk_error_queue);
>>   }
>>
>> -static void tun_cleanup_tx_ring(struct tun_file *tfile)
>> -{
>> -       if (tfile->tx_ring.queue) {
>> -               ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
>> -               xdp_rxq_info_unreg(&tfile->xdp_rxq);
>> -               memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
>> -       }
>> -}
>
> I don't think you can totally remove ptr_ring_cleanup(), it should be
> called unconditionally with your ptr_ring_init(0) trick, right?

Right, my bad. Actually I do intend to cleanup it at close() like what 
commit log said.

Will send v2.

Thanks

^ permalink raw reply

* Re: linux-next: Signed-off-by missing for commit in the net tree
From: Hangbin Liu @ 2018-05-11  1:30 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, Networking, Linux-Next Mailing List,
	Linux Kernel Mailing List
In-Reply-To: <20180511071716.038a9095@canb.auug.org.au>

On Fri, May 11, 2018 at 07:17:16AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Commit
> 
>   0e8411e426e2 ("ipv4: reset fnhe_mtu_locked after cache route flushed")
> 
> is missing a Signed-off-by from its author.

Opps, My bad.

> After route cache is flushed via ipv4_sysctl_rtcache_flush(), we forget
> to reset fnhe_mtu_locked in rt_bind_exception(). When pmtu is updated
> in __ip_rt_update_pmtu(), it will return directly since the pmtu is
> still locked. e.g.
>
> + ip netns exec client ping 10.10.1.1 -c 1 -s 1400 -M do
> PING 10.10.1.1 (10.10.1.1) 1400(1428) bytes of data.
> From 10.10.0.254 icmp_seq=1 Frag needed and DF set (mtu = 0)
>
> --- 10.10.1.1 ping statistics ---
> 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

I shouldn't add comments with the '---' lines. David reminded me before. But
I didn't realise it when pasted the ping logs. Another lesson learned...

Thanks Stephen.

Regards
Hangbin

^ permalink raw reply

* Re:Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Gao Feng @ 2018-05-11  1:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem@davemloft.net, daniel@iogearbox.net,
	jakub.kicinski@netronome.com, David Ahern, netdev@vger.kernel.org
In-Reply-To: <721ce144-2470-6124-1edd-cc7a343994a6@gmail.com>

<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><pre>At 2018-05-11 08:55:47, "Eric Dumazet" &lt;eric.dumazet@gmail.com&gt; wrote:
&gt;
&gt;
&gt;On 05/10/2018 05:18 PM, Gao Feng wrote:
&gt;&gt; At 2018-05-10 21:02:55, "Eric Dumazet" &lt;eric.dumazet@gmail.com&gt; wrote:
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; On 05/10/2018 01:28 AM, gfree.wind@vip.163.com wrote:
&gt;&gt;&gt;&gt; From: Gao Feng &lt;gfree.wind@vip.163.com&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; The skb flow limit is implemented for each CPU independently. In the
&gt;&gt;&gt;&gt; current codes, the function skb_flow_limit gets the softnet_data by
&gt;&gt;&gt;&gt; this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
&gt;&gt;&gt;&gt; the current cpu when enable RPS. As the result, the skb_flow_limit checks
&gt;&gt;&gt;&gt; the stats of current CPU, while the skb is going to append the queue of
&gt;&gt;&gt;&gt; another CPU. It isn't the expected behavior.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; Now pass the softnet_data as a param to softnet_data to make consistent.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; Please add a correct Fixes: tag
&gt;&gt; 
&gt;&gt; Thanks Eric.
&gt;&gt; 
&gt;&gt; I have one question about the "Fixes: tag".
&gt;&gt; Most of patches are bug fixes, but when need to add the "Fixes: tag", and when not ?
&gt;&gt; 
&gt;&gt; I'm not clear about it. Could you explain it please?
&gt;&gt; 
&gt;
&gt;For this particular patch, since you have not CC Willem (author of the patch),
&gt;I found very useful that you did a search to find out.
&gt;Once you found which commit added the problem, simply add the Fixes: tag and CC: the author.
&gt;
<div>&gt;Doing so saves us (stable teams, reviewers, maintainers) a lot of time really.</div><div><br /></div><div> Normally I get the "to" list by get_maintainer.pl script, now I would save the stable team ASAP.</div>&gt;
<div>&gt;In my opinion, Fixes: tags should be mandatory when applicable.</div><div><br /></div><div>Thanks your explanations, I get it.</div><div><br /></div><div>Best Regards</div><div>Feng</div><div><br /></div>&gt;
&gt;&gt; Best Regards
&gt;&gt; Feng
&gt;&gt; 
&gt;&gt;&gt;
&gt;&gt;&gt; By doing so, you will likely add a CC: tag to make sure the author of the code
&gt;&gt;&gt; will receive your email and give feed back.
&gt;&gt;&gt;
&gt;&gt;&gt; Thanks !
&gt;&gt;&gt;
</pre></div>

^ permalink raw reply

* Re: [PATCH] mlx4_core: allocate 4KB ICM chunks
From: Qing Huang @ 2018-05-11  1:36 UTC (permalink / raw)
  To: Yanjun Zhu, tariqt, davem; +Cc: netdev, linux-rdma, linux-kernel
In-Reply-To: <6768e075-70f5-4de3-a98a-fdffa53e0a2f@oracle.com>

Thank you for reviewing it!


On 5/10/2018 6:23 PM, Yanjun Zhu wrote:
>
>
>
> On 2018/5/11 9:15, Qing Huang wrote:
>>
>>
>>
>> On 5/10/2018 5:13 PM, Yanjun Zhu wrote:
>>>
>>>
>>> On 2018/5/11 7:31, Qing Huang wrote:
>>>> When a system is under memory presure (high usage with fragments),
>>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>>> memory management to enter slow path doing memory compact/migration
>>>> ops in order to complete high order memory allocations.
>>>>
>>>> When that happens, user processes calling uverb APIs may get stuck
>>>> for more than 120s easily even though there are a lot of free pages
>>>> in smaller chunks available in the system.
>>>>
>>>> Syslog:
>>>> ...
>>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>>> ...
>>>>
>>>> With 4KB ICM chunk size, the above issue is fixed.
>>>>
>>>> However in order to support 4KB ICM chunk size, we need to fix another
>>>> issue in large size kcalloc allocations.
>>>>
>>>> E.g.
>>>> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
>>>> size, each ICM chunk can only hold 512 mtt entries (8 bytes for 
>>>> each mtt
>>>> entry). So we need a 16MB allocation for a table->icm pointer array to
>>>> hold 2M pointers which can easily cause kcalloc to fail.
>>>>
>>>> The solution is to use vzalloc to replace kcalloc. There is no need
>>>> for contiguous memory pages for a driver meta data structure (no need
>>> Hi,
>>>
>>> Replace continuous memory pages with virtual memory, is there any 
>>> performance loss?
>>
>> Not really. "table->icm" will be accessed as individual pointer 
>> variables randomly. Kcalloc
>
> Sure. Thanks. If "table->icm" will be accessed as individual pointer 
> variables randomly, the performance loss
> caused by discontinuous memory will be very trivial.
>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>
>> also returns a virtual address except its mapped pages are guaranteed 
>> to be contiguous
>> which will provide little advantage over vzalloc for individual 
>> pointer variable access.
>>
>> Qing
>>
>>>
>>> Zhu Yanjun
>>>> of DMA ops).
>>>>
>>>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>>>> Acked-by: Daniel Jurgens <danielj@mellanox.com>
>>>> ---
>>>>   drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
>>>>   1 file changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c 
>>>> b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>> index a822f7a..2b17a4b 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>> @@ -43,12 +43,12 @@
>>>>   #include "fw.h"
>>>>     /*
>>>> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
>>>> - * per chunk.
>>>> + * We allocate in 4KB page size chunks to avoid high order memory
>>>> + * allocations in fragmented/high usage memory situation.
>>>>    */
>>>>   enum {
>>>> -    MLX4_ICM_ALLOC_SIZE    = 1 << 18,
>>>> -    MLX4_TABLE_CHUNK_SIZE    = 1 << 18
>>>> +    MLX4_ICM_ALLOC_SIZE    = 1 << 12,
>>>> +    MLX4_TABLE_CHUNK_SIZE    = 1 << 12
>>>>   };
>>>>     static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct 
>>>> mlx4_icm_chunk *chunk)
>>>> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, 
>>>> struct mlx4_icm_table *table,
>>>>       obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
>>>>       num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>>>>   -    table->icm      = kcalloc(num_icm, sizeof(*table->icm), 
>>>> GFP_KERNEL);
>>>> +    table->icm      = vzalloc(num_icm * sizeof(*table->icm));
>>>>       if (!table->icm)
>>>>           return -ENOMEM;
>>>>       table->virt     = virt;
>>>> @@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, 
>>>> struct mlx4_icm_table *table,
>>>>               mlx4_free_icm(dev, table->icm[i], use_coherent);
>>>>           }
>>>>   -    kfree(table->icm);
>>>> +    vfree(table->icm);
>>>>         return -ENOMEM;
>>>>   }
>>>> @@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev 
>>>> *dev, struct mlx4_icm_table *table)
>>>>               mlx4_free_icm(dev, table->icm[i], table->coherent);
>>>>           }
>>>>   -    kfree(table->icm);
>>>> +    vfree(table->icm);
>>>>   }
>>>
>>
>

^ permalink raw reply

* [PATCH bpf-next] samples/bpf: xdp_monitor, accept short options
From: Prashant Bhole @ 2018-05-11  1:37 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: Prashant Bhole, Jesper Dangaard Brouer, David S . Miller, netdev

updated optstring accept short options

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 samples/bpf/xdp_monitor_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 894bc64c2cac..668511c77aaf 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -594,7 +594,7 @@ int main(int argc, char **argv)
 	snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]);
 
 	/* Parse commands line args */
-	while ((opt = getopt_long(argc, argv, "h",
+	while ((opt = getopt_long(argc, argv, "hDSs:",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
 		case 'D':
-- 
2.13.6

^ permalink raw reply related

* [PATCH net-next] udp: avoid refcount_t saturation in __udp_gso_segment()
From: Eric Dumazet @ 2018-05-11  2:07 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Willem de Bruijn,
	Alexander Duyck

For some reason, Willem thought that the issue we fixed for TCP
in commit 7ec318feeed1 ("tcp: gso: avoid refcount_t warning from
tcp_gso_segment()") was not relevant for UDP GSO.

But syzbot found its way.

refcount_t: saturated; leaking memory.
WARNING: CPU: 0 PID: 10261 at lib/refcount.c:78 refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 10261 Comm: syz-executor5 Not tainted 4.17.0-rc3+ #38
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 panic+0x22f/0x4de kernel/panic.c:184
 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:refcount_add_not_zero+0x2d4/0x320 lib/refcount.c:78
RSP: 0018:ffff880196db6b90 EFLAGS: 00010282
RAX: 0000000000000026 RBX: 00000000ffffff01 RCX: ffffc900040d9000
RDX: 0000000000004a29 RSI: ffffffff8160f6f1 RDI: ffff880196db66f0
RBP: ffff880196db6c78 R08: ffff8801b33d6740 R09: 0000000000000002
R10: ffff8801b33d6740 R11: 0000000000000000 R12: 0000000000000000
R13: 00000000ffffffff R14: ffff880196db6c50 R15: 0000000000020101
 refcount_add+0x1b/0x70 lib/refcount.c:102
 __udp_gso_segment+0xaa5/0xee0 net/ipv4/udp_offload.c:272
 udp4_ufo_fragment+0x592/0x7a0 net/ipv4/udp_offload.c:301
 inet_gso_segment+0x639/0x12b0 net/ipv4/af_inet.c:1342
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
 skb_gso_segment include/linux/netdevice.h:4050 [inline]
 validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3122
 __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3579
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3620
 neigh_direct_output+0x15/0x20 net/core/neighbour.c:1401
 neigh_output include/net/neighbour.h:483 [inline]
 ip_finish_output2+0xa5f/0x1840 net/ipv4/ip_output.c:229
 ip_finish_output+0x828/0xf80 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:277 [inline]
 ip_output+0x21b/0x850 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:444 [inline]
 ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
 ip_send_skb+0x40/0xe0 net/ipv4/ip_output.c:1434
 udp_send_skb.isra.37+0x5eb/0x1000 net/ipv4/udp.c:825
 udp_push_pending_frames+0x5c/0xf0 net/ipv4/udp.c:853
 udp_v6_push_pending_frames+0x380/0x3e0 net/ipv6/udp.c:1105
 udp_lib_setsockopt+0x59a/0x600 net/ipv4/udp.c:2403
 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1447
 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046
 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: ad405857b174 ("udp: better wmem accounting on gso")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv4/udp_offload.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index ede2a7305b90f789c748d911530453ec2cbbfab7..92dc9e5a7ff3d0a7509bfa2a66e9189c8341a5fa 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -268,9 +268,17 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 		uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
 
 	/* update refcount for the packet */
-	if (copy_dtor)
-		refcount_add(sum_truesize - gso_skb->truesize,
-			     &sk->sk_wmem_alloc);
+	if (copy_dtor) {
+		int delta = sum_truesize - gso_skb->truesize;
+
+		/* In some pathological cases, delta can be negative.
+		 * We need to either use refcount_add() or refcount_sub_and_test()
+		 */
+		if (likely(delta >= 0))
+			refcount_add(delta, &sk->sk_wmem_alloc);
+		else
+			WARN_ON_ONCE(refcount_sub_and_test(-delta, &sk->sk_wmem_alloc));
+	}
 	return segs;
 }
 EXPORT_SYMBOL_GPL(__udp_gso_segment);
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related

* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Michael Schmitz @ 2018-05-11  2:11 UTC (permalink / raw)
  To: Finn Thain
  Cc: Geert Uytterhoeven, David S. Miller, linux-m68k, netdev,
	Linux Kernel Mailing List, Christoph Hellwig
In-Reply-To: <alpine.LNX.2.21.1805110921020.8@nippy.intranet>

Hi Finn,

On Fri, May 11, 2018 at 11:55 AM, Finn Thain <fthain@telegraphics.com.au> wrote:

>> > What's worse, if you do pass a dma_mask in struct
>> > platform_device_info, you end up with this problem in
>> > platform_device_register_full():
>> >
>> >         if (pdevinfo->dma_mask) {
>> >                 /*
>> >                  * This memory isn't freed when the device is put,
>> >                  * I don't have a nice idea for that though.  Conceptually
>> >                  * dma_mask in struct device should not be a pointer.
>> >                  * See http://thread.gmane.org/gmane.linux.kernel.pci/9081
>> >                  */
>> >                 pdev->dev.dma_mask =
>> >                         kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);
>>
>> Maybe platform_device_register_full() should rather check whether
>> dev.coherent_dma_mask is set, and make dev.dma_mask point to that? This
>> is how we solved the warning issue for the Zorro bus devices...
>> (8614f1b58bd0e920a5859464a500b93152c5f8b1)
>>
>
> The claim in the comment above that a pointer is the wrong solution
> suggests that your proposal won't get far. Also, your proposal doesn't

I read the comment to be mostly concerned about not freeing memory,
and attempted to address that. I won't pretend it's the right thing to
do if the pointer will go away anyway, and I certainly won't submit a
patch. Sorry for muddling the issue.

> address the other issues I raised: a new
> platform_device_register_simple_dma() API would only have two callers, and
> the dma mask setup for device-tree probed platform devices is apparently a
> work-in-progress (which I don't want to churn up).

Yes, and that's why I would prefer your old patch handling this in the
device driver (which Geert didn't like), or in the alternative to set
the mask up when registering a device with its bus where appropriate.

I concede this won't help with pure platform devices but as we can't
test all these, we should leave the fix for platfoem devices up to
Christoph.

>
>> > > With people setting the mask to kill the WARNING splat, this may
>> > > become more common.
>> >
>> > Since the commit which introduced the WARNING, only commits f61e64310b75
>> > ("m68k: set dma and coherent masks for platform FEC ethernets") and
>> > 7bcfab202ca7 ("powerpc/macio: set a proper dma_coherent_mask") seem to be
>> > aimed at squelching that WARNING.
>> >
>> > (Am I missing any others?)
>>
>> Zorro devices :-)
>
> Right, I should add commit 55496d3fe2ac ("zorro: Set up z->dev.dma_mask
> for the DMA API") to that list.
>
>> Which begs the question: why can' you set up all Nubus bus devices' DMA
>> masks in nubus_device_register(), or nubus_add_board()?
>
> I am expecting to see the same WARNING from the nubus sonic driver but it
> hasn't happened yet, so I don't have a patch for it yet. In anycase, the
> nubus fix would be a lot like the zorro bus fix, so I don't see a problem.

That's odd. But what I meant to say is that by setting up
dma_coherent_mask in nubus_add_board(), and pointing dma_mask to that,
ypu won't need any patches to Nubus device drivers.

I must be missing something else...

Cheers,

  Michael


>
> --

^ permalink raw reply

* Re: [PATCH] coredump: rename umh_pipe_setup() to coredump_pipe_setup()
From: Al Viro @ 2018-05-11  2:48 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alexei Starovoitov, ast, linux-fsdevel, linux-kernel,
	David S. Miller, netdev
In-Reply-To: <20180510233247.GG27853@wotan.suse.de>

On Thu, May 10, 2018 at 11:32:47PM +0000, Luis R. Rodriguez wrote:

> I think net-next makes sense if Al Viro is OK with that. This way it could go
> in regardless of the state of your series, but it also lines up with your work.

Fine by me...

^ permalink raw reply

* [PATCH net V2] tun: fix use after free for ptr_ring
From: Jason Wang @ 2018-05-11  2:49 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: xiyou.wangcong, eric.dumazet, mst, Jason Wang

We used to initialize ptr_ring during TUNSETIFF, this is because its
size depends on the tx_queue_len of netdevice. And we try to clean it
up when socket were detached from netdevice. A race were spotted when
trying to do uninit during a read which will lead a use after free for
pointer ring. Solving this by always initialize a zero size ptr_ring
in open() and do resizing during TUNSETIFF, and then we can safely do
cleanup during close(). With this, there's no need for the workaround
that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
for tfile->tx_array").

Reported-by: syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Fixes: 1576d9860599 ("tun: switch to use skb array for tx")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from v1:
- free ptr_ring during close()
- use tun_ptr_free() during resie for safety
---
 drivers/net/tun.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ef33950..9fbbb32 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -681,15 +681,6 @@ static void tun_queue_purge(struct tun_file *tfile)
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
-static void tun_cleanup_tx_ring(struct tun_file *tfile)
-{
-	if (tfile->tx_ring.queue) {
-		ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
-		xdp_rxq_info_unreg(&tfile->xdp_rxq);
-		memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-	}
-}
-
 static void __tun_detach(struct tun_file *tfile, bool clean)
 {
 	struct tun_file *ntfile;
@@ -736,7 +727,8 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 			    tun->dev->reg_state == NETREG_REGISTERED)
 				unregister_netdevice(tun->dev);
 		}
-		tun_cleanup_tx_ring(tfile);
+		if (tun)
+			xdp_rxq_info_unreg(&tfile->xdp_rxq);
 		sock_put(&tfile->sk);
 	}
 }
@@ -783,14 +775,14 @@ static void tun_detach_all(struct net_device *dev)
 		tun_napi_del(tun, tfile);
 		/* Drop read queue */
 		tun_queue_purge(tfile);
+		xdp_rxq_info_unreg(&tfile->xdp_rxq);
 		sock_put(&tfile->sk);
-		tun_cleanup_tx_ring(tfile);
 	}
 	list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
 		tun_enable_queue(tfile);
 		tun_queue_purge(tfile);
+		xdp_rxq_info_unreg(&tfile->xdp_rxq);
 		sock_put(&tfile->sk);
-		tun_cleanup_tx_ring(tfile);
 	}
 	BUG_ON(tun->numdisabled != 0);
 
@@ -834,7 +826,8 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	}
 
 	if (!tfile->detached &&
-	    ptr_ring_init(&tfile->tx_ring, dev->tx_queue_len, GFP_KERNEL)) {
+	    ptr_ring_resize(&tfile->tx_ring, dev->tx_queue_len,
+			    GFP_KERNEL, tun_ptr_free)) {
 		err = -ENOMEM;
 		goto out;
 	}
@@ -3219,6 +3212,11 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 					    &tun_proto, 0);
 	if (!tfile)
 		return -ENOMEM;
+	if (ptr_ring_init(&tfile->tx_ring, 0, GFP_KERNEL)) {
+		sk_free(&tfile->sk);
+		return -ENOMEM;
+	}
+
 	RCU_INIT_POINTER(tfile->tun, NULL);
 	tfile->flags = 0;
 	tfile->ifindex = 0;
@@ -3239,8 +3237,6 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 
 	sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
-	memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-
 	return 0;
 }
 
@@ -3249,6 +3245,7 @@ static int tun_chr_close(struct inode *inode, struct file *file)
 	struct tun_file *tfile = file->private_data;
 
 	tun_detach(tfile, true);
+	ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* [PATCH] dt-bindings: net: ravb: Add support for r8a77990 SoC
From: Yoshihiro Shimoda @ 2018-05-11  3:18 UTC (permalink / raw)
  To: netdev, linux-renesas-soc, robh+dt, mark.rutland
  Cc: sergei.shtylyov, devicetree, Yoshihiro Shimoda

Add documentation for r8a77990 compatible string to renesas ravb device
tree bindings documentation.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 Documentation/devicetree/bindings/net/renesas,ravb.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index 890526d..fac897d 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -21,6 +21,7 @@ Required properties:
       - "renesas,etheravb-r8a77965" for the R8A77965 SoC.
       - "renesas,etheravb-r8a77970" for the R8A77970 SoC.
       - "renesas,etheravb-r8a77980" for the R8A77980 SoC.
+      - "renesas,etheravb-r8a77990" for the R8A77990 SoC.
       - "renesas,etheravb-r8a77995" for the R8A77995 SoC.
       - "renesas,etheravb-rcar-gen3" as a fallback for the above
 		R-Car Gen3 devices.
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Finn Thain @ 2018-05-11  3:28 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Geert Uytterhoeven, David S. Miller, linux-m68k, netdev,
	Linux Kernel Mailing List, Christoph Hellwig
In-Reply-To: <CAOmrzkLL0ZNzAh7rwiDy=BkNqqXgbqfv3vkN6K8CTF8VGbT8zA@mail.gmail.com>

On Fri, 11 May 2018, Michael Schmitz wrote:

> > > Which begs the question: why can' you set up all Nubus bus devices' 
> > > DMA masks in nubus_device_register(), or nubus_add_board()?
> >
> > I am expecting to see the same WARNING from the nubus sonic driver but 
> > it hasn't happened yet, so I don't have a patch for it yet. In 
> > anycase, the nubus fix would be a lot like the zorro bus fix, so I 
> > don't see a problem.
> 
> That's odd. But what I meant to say is that by setting up 
> dma_coherent_mask in nubus_add_board(), and pointing dma_mask to that, 
> ypu won't need any patches to Nubus device drivers.

Right. I think I've already acknowledged that. But it's off-topic, because 
the patches under review are for platform drivers. Those patches fix an 
actual bug that I've observed. Whereas, the nubus driver dma mask issue 
that you raised is purely theoretical at this stage.

-- 

^ permalink raw reply

* Re: [PATCH net-next] udp: avoid refcount_t saturation in __udp_gso_segment()
From: Willem de Bruijn @ 2018-05-11  3:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, Willem de Bruijn,
	Alexander Duyck
In-Reply-To: <20180511020713.159465-1-edumazet@google.com>

On Thu, May 10, 2018 at 10:07 PM, Eric Dumazet <edumazet@google.com> wrote:
> For some reason, Willem thought that the issue we fixed for TCP
> in commit 7ec318feeed1 ("tcp: gso: avoid refcount_t warning from
> tcp_gso_segment()") was not relevant for UDP GSO.
>
> But syzbot found its way.

[..]

> Fixes: ad405857b174 ("udp: better wmem accounting on gso")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks Eric. Yep, I was naive here. I am quite curious what kind of
gso packet syzkaller was able to cook that exceeds the truesize
of its segments.

^ permalink raw reply

* Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Willem de Bruijn @ 2018-05-11  3:54 UTC (permalink / raw)
  To: gfree.wind
  Cc: David Miller, Daniel Borkmann, jakub.kicinski, David Ahern,
	Network Development
In-Reply-To: <1525940884-21067-1-git-send-email-gfree.wind@vip.163.com>

On Thu, May 10, 2018 at 4:28 AM,  <gfree.wind@vip.163.com> wrote:
> From: Gao Feng <gfree.wind@vip.163.com>
>
> The skb flow limit is implemented for each CPU independently. In the
> current codes, the function skb_flow_limit gets the softnet_data by
> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
> the current cpu when enable RPS. As the result, the skb_flow_limit checks
> the stats of current CPU, while the skb is going to append the queue of
> another CPU. It isn't the expected behavior.
>
> Now pass the softnet_data as a param to softnet_data to make consistent.

The local cpu softnet_data is used on purpose. The operations in
skb_flow_limit() on sd fields could race if not executed on the local cpu.

Flow limit tries to detect large ("elephant") DoS flows with a fixed four-tuple.
These would always hit the same RPS cpu, so that cpu being backlogged
may be an indication that such a flow is active. But the flow will also always
arrive on the same initial cpu courtesy of RSS. So storing the lookup table
on the initial CPU is also fine. There may be false positives on other CPUs
with the same RPS destination, but that is unlikely with a highly concurrent
traffic server mix ("mice").

Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
for the cpus on which traffic initially lands, not the RPS destination cpus.
See also Documentation/networking/scaling.txt

That said, I had to reread the code, as it does seem sensible that the
same softnet_data is intended to be used both when testing qlen and
flow_limit.

^ permalink raw reply

* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Michael Schmitz @ 2018-05-11  4:18 UTC (permalink / raw)
  To: Finn Thain
  Cc: Geert Uytterhoeven, David S. Miller, linux-m68k, netdev,
	Linux Kernel Mailing List, Christoph Hellwig
In-Reply-To: <alpine.LNX.2.21.1805111221180.8@nippy.intranet>

Hi Finn,

Am 11.05.2018 um 15:28 schrieb Finn Thain:
> On Fri, 11 May 2018, Michael Schmitz wrote:
> 
>>>> Which begs the question: why can' you set up all Nubus bus devices' 
>>>> DMA masks in nubus_device_register(), or nubus_add_board()?
>>>
>>> I am expecting to see the same WARNING from the nubus sonic driver but 
>>> it hasn't happened yet, so I don't have a patch for it yet. In 
>>> anycase, the nubus fix would be a lot like the zorro bus fix, so I 
>>> don't see a problem.
>>
>> That's odd. But what I meant to say is that by setting up 
>> dma_coherent_mask in nubus_add_board(), and pointing dma_mask to that, 
>> ypu won't need any patches to Nubus device drivers.
> 
> Right. I think I've already acknowledged that. But it's off-topic, because 
> the patches under review are for platform drivers. Those patches fix an 
> actual bug that I've observed. Whereas, the nubus driver dma mask issue 
> that you raised is purely theoretical at this stage.

I had lost track of the fact that macsonic can be probed as either Nubus
or platform device. Sorry for the noise.

I'm afraid using platform_device_register() (which you already use for
the SCC devices) is the only option handling this on a per-device basis
without touching platform core code, while at the same time keeping the
DMA mask setup out of device drivers (I can see Geert's point there -
device driver code might be shared across implementations of the device
on platforms with different DMA mask requirements,, something the driver
can't be expected to know about).

Cheers,

	Michael

^ permalink raw reply

* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
From: Martin KaFai Lau @ 2018-05-11  5:00 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, netdev, ast, john.fastabend
In-Reply-To: <20180509210709.7201-8-joe@wand.net.nz>

On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
> This patch adds a new BPF helper function, sk_lookup() which allows BPF
> programs to find out if there is a socket listening on this host, and
> returns a socket pointer which the BPF program can then access to
> determine, for instance, whether to forward or drop traffic. sk_lookup()
> takes a reference on the socket, so when a BPF program makes use of this
> function, it must subsequently pass the returned pointer into the newly
> added sk_release() to return the reference.
> 
> By way of example, the following pseudocode would filter inbound
> connections at XDP if there is no corresponding service listening for
> the traffic:
> 
>   struct bpf_sock_tuple tuple;
>   struct bpf_sock_ops *sk;
> 
>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
>   if (!sk) {
>     // Couldn't find a socket listening for this traffic. Drop.
>     return TC_ACT_SHOT;
>   }
>   bpf_sk_release(sk, 0);
>   return TC_ACT_OK;
> 
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
>  include/uapi/linux/bpf.h                  |  39 +++++++++++-
>  kernel/bpf/verifier.c                     |   8 ++-
>  net/core/filter.c                         | 102 ++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h            |  40 +++++++++++-
>  tools/testing/selftests/bpf/bpf_helpers.h |   7 ++
>  5 files changed, 193 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d615c777b573..29f38838dbca 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1828,6 +1828,25 @@ union bpf_attr {
>   * 	Return
>   * 		0 on success, or a negative error in case of failure.
>   *
> + * struct bpf_sock_ops *bpf_sk_lookup(ctx, tuple, tuple_size, netns, flags)
> + * 	Decription
> + * 		Look for socket matching 'tuple'. The return value must be checked,
> + * 		and if non-NULL, released via bpf_sk_release().
> + * 		@ctx: pointer to ctx
> + * 		@tuple: pointer to struct bpf_sock_tuple
> + * 		@tuple_size: size of the tuple
> + * 		@flags: flags value
> + * 	Return
> + * 		pointer to socket ops on success, or
> + * 		NULL in case of failure
> + *
> + *  int bpf_sk_release(sock, flags)
> + * 	Description
> + * 		Release the reference held by 'sock'.
> + * 		@sock: Pointer reference to release. Must be found via bpf_sk_lookup().
> + * 		@flags: flags value
> + * 	Return
> + * 		0 on success, or a negative error in case of failure.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -1898,7 +1917,9 @@ union bpf_attr {
>  	FN(xdp_adjust_tail),		\
>  	FN(skb_get_xfrm_state),		\
>  	FN(get_stack),			\
> -	FN(skb_load_bytes_relative),
> +	FN(skb_load_bytes_relative),	\
> +	FN(sk_lookup),			\
> +	FN(sk_release),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> @@ -2060,6 +2081,22 @@ struct bpf_sock {
>  				 */
>  };
>  
> +struct bpf_sock_tuple {
> +	union {
> +		__be32 ipv6[4];
> +		__be32 ipv4;
> +	} saddr;
> +	union {
> +		__be32 ipv6[4];
> +		__be32 ipv4;
> +	} daddr;
> +	__be16 sport;
> +	__be16 dport;
> +	__u32 dst_if;
> +	__u8 family;
> +	__u8 proto;
> +};
> +
>  #define XDP_PACKET_HEADROOM 256
>  
>  /* User return codes for XDP prog type.
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 92b9a5dc465a..579012c483e4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -153,6 +153,12 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
>   * PTR_TO_MAP_VALUE, PTR_TO_SOCKET_OR_NULL becomes PTR_TO_SOCKET when the type
>   * passes through a NULL-check conditional. For the branch wherein the state is
>   * changed to CONST_IMM, the verifier releases the reference.
> + *
> + * For each helper function that allocates a reference, such as bpf_sk_lookup(),
> + * there is a corresponding release function, such as bpf_sk_release(). When
> + * a reference type passes into the release function, the verifier also releases
> + * the reference. If any unchecked or unreleased reference remains at the end of
> + * the program, the verifier rejects it.
>   */
>  
>  /* verifier_state + insn_idx are pushed to stack when branch is encountered */
> @@ -277,7 +283,7 @@ static bool arg_type_is_refcounted(enum bpf_arg_type type)
>   */
>  static bool is_release_function(enum bpf_func_id func_id)
>  {
> -	return false;
> +	return func_id == BPF_FUNC_sk_release;
>  }
>  
>  /* string representation of 'enum bpf_reg_type' */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 4c35152fb3a8..751c255d17d3 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -58,8 +58,12 @@
>  #include <net/busy_poll.h>
>  #include <net/tcp.h>
>  #include <net/xfrm.h>
> +#include <net/udp.h>
>  #include <linux/bpf_trace.h>
>  #include <net/xdp_sock.h>
> +#include <net/inet_hashtables.h>
> +#include <net/inet6_hashtables.h>
> +#include <net/net_namespace.h>
>  
>  /**
>   *	sk_filter_trim_cap - run a packet through a socket filter
> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
>  };
>  #endif
>  
> +struct sock *
> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
Would it be possible to have another version that
returns a sk without taking its refcnt?
It may have performance benefit.

> +	int dst_if = (int)tuple->dst_if;
> +	struct in6_addr *src6;
> +	struct in6_addr *dst6;
> +
> +	if (tuple->family == AF_INET6) {
> +		src6 = (struct in6_addr *)&tuple->saddr.ipv6;
> +		dst6 = (struct in6_addr *)&tuple->daddr.ipv6;
> +	} else if (tuple->family != AF_INET) {
> +		return ERR_PTR(-EOPNOTSUPP);
> +	}
> +
> +	if (tuple->proto == IPPROTO_TCP) {
> +		if (tuple->family == AF_INET)
> +			return inet_lookup(net, &tcp_hashinfo, NULL, 0,
> +					   tuple->saddr.ipv4, tuple->sport,
> +					   tuple->daddr.ipv4, tuple->dport,
> +					   dst_if);
> +		else
> +			return inet6_lookup(net, &tcp_hashinfo, NULL, 0,
> +					    src6, tuple->sport,
> +					    dst6, tuple->dport, dst_if);
> +	} else if (tuple->proto == IPPROTO_UDP) {
> +		if (tuple->family == AF_INET)
> +			return udp4_lib_lookup(net, tuple->saddr.ipv4,
> +					       tuple->sport, tuple->daddr.ipv4,
> +					       tuple->dport, dst_if);
> +		else
> +			return udp6_lib_lookup(net, src6, tuple->sport,
> +					       dst6, tuple->dport, dst_if);
> +	} else {
> +		return ERR_PTR(-EOPNOTSUPP);
> +	}
> +
> +	return NULL;
> +}
> +
> +BPF_CALL_5(bpf_sk_lookup, struct sk_buff *, skb,
> +	   struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
> +{
> +	struct net *caller_net = dev_net(skb->dev);
> +	struct sock *sk = NULL;
> +	struct net *net;
> +
> +	/* XXX: Perform verification-time checking of tuple size? */
> +	if (unlikely(len != sizeof(struct bpf_sock_tuple) || flags))
> +		goto out;
> +
> +	net = get_net_ns_by_id(caller_net, netns_id);
> +	if (unlikely(!net))
> +		goto out;
> +
> +	sk = sk_lookup(net, tuple);
> +	put_net(net);
> +	if (IS_ERR_OR_NULL(sk))
> +		sk = NULL;
> +	else
> +		sk = sk_to_full_sk(sk);
> +out:
> +	return (unsigned long) sk;
> +}
> +
> +static const struct bpf_func_proto bpf_sk_lookup_proto = {
> +	.func		= bpf_sk_lookup,
> +	.gpl_only	= false,
> +	.ret_type	= RET_PTR_TO_SOCKET_OR_NULL,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_PTR_TO_MEM,
> +	.arg3_type	= ARG_CONST_SIZE,
> +	.arg4_type	= ARG_ANYTHING,
> +	.arg5_type	= ARG_ANYTHING,
> +};
> +
> +BPF_CALL_2(bpf_sk_release, struct sock *, sk, u64, flags)
> +{
> +	sock_gen_put(sk);
> +	if (unlikely(flags))
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static const struct bpf_func_proto bpf_sk_release_proto = {
> +	.func		= bpf_sk_release,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_SOCKET,
> +	.arg2_type	= ARG_ANYTHING,
> +};
> +
>  static const struct bpf_func_proto *
>  bpf_base_func_proto(enum bpf_func_id func_id)
>  {
> @@ -4181,6 +4275,10 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  	case BPF_FUNC_skb_get_xfrm_state:
>  		return &bpf_skb_get_xfrm_state_proto;
>  #endif
> +	case BPF_FUNC_sk_lookup:
> +		return &bpf_sk_lookup_proto;
> +	case BPF_FUNC_sk_release:
> +		return &bpf_sk_release_proto;
>  	default:
>  		return bpf_base_func_proto(func_id);
>  	}
> @@ -4292,6 +4390,10 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_get_socket_uid_proto;
>  	case BPF_FUNC_sk_redirect_map:
>  		return &bpf_sk_redirect_map_proto;
> +	case BPF_FUNC_sk_lookup:
> +		return &bpf_sk_lookup_proto;
> +	case BPF_FUNC_sk_release:
> +		return &bpf_sk_release_proto;
>  	default:
>  		return bpf_base_func_proto(func_id);
>  	}
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index fff51c187d1e..29f38838dbca 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -117,6 +117,7 @@ enum bpf_map_type {
>  	BPF_MAP_TYPE_DEVMAP,
>  	BPF_MAP_TYPE_SOCKMAP,
>  	BPF_MAP_TYPE_CPUMAP,
> +	BPF_MAP_TYPE_XSKMAP,
>  };
>  
>  enum bpf_prog_type {
> @@ -1827,6 +1828,25 @@ union bpf_attr {
>   * 	Return
>   * 		0 on success, or a negative error in case of failure.
>   *
> + * struct bpf_sock_ops *bpf_sk_lookup(ctx, tuple, tuple_size, netns, flags)
> + * 	Decription
> + * 		Look for socket matching 'tuple'. The return value must be checked,
> + * 		and if non-NULL, released via bpf_sk_release().
> + * 		@ctx: pointer to ctx
> + * 		@tuple: pointer to struct bpf_sock_tuple
> + * 		@tuple_size: size of the tuple
> + * 		@flags: flags value
> + * 	Return
> + * 		pointer to socket ops on success, or
> + * 		NULL in case of failure
> + *
> + *  int bpf_sk_release(sock, flags)
> + * 	Description
> + * 		Release the reference held by 'sock'.
> + * 		@sock: Pointer reference to release. Must be found via bpf_sk_lookup().
> + * 		@flags: flags value
> + * 	Return
> + * 		0 on success, or a negative error in case of failure.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -1897,7 +1917,9 @@ union bpf_attr {
>  	FN(xdp_adjust_tail),		\
>  	FN(skb_get_xfrm_state),		\
>  	FN(get_stack),			\
> -	FN(skb_load_bytes_relative),
> +	FN(skb_load_bytes_relative),	\
> +	FN(sk_lookup),			\
> +	FN(sk_release),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> @@ -2059,6 +2081,22 @@ struct bpf_sock {
>  				 */
>  };
>  
> +struct bpf_sock_tuple {
> +	union {
> +		__be32 ipv6[4];
> +		__be32 ipv4;
> +	} saddr;
> +	union {
> +		__be32 ipv6[4];
> +		__be32 ipv4;
> +	} daddr;
> +	__be16 sport;
> +	__be16 dport;
> +	__u32 dst_if;
> +	__u8 family;
> +	__u8 proto;
> +};
> +
>  #define XDP_PACKET_HEADROOM 256
>  
>  /* User return codes for XDP prog type.
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 265f8e0e8ada..4dc311ea0c16 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -103,6 +103,13 @@ static int (*bpf_skb_get_xfrm_state)(void *ctx, int index, void *state,
>  	(void *) BPF_FUNC_skb_get_xfrm_state;
>  static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
>  	(void *) BPF_FUNC_get_stack;
> +static struct bpf_sock *(*bpf_sk_lookup)(void *ctx,
> +					 struct bpf_sock_tuple *tuple,
> +					 int size, unsigned int netns_id,
> +					 unsigned long long flags) =
> +	(void *) BPF_FUNC_sk_lookup;
> +static int (*bpf_sk_release)(struct bpf_sock *sk, unsigned long long flags) =
> +	(void *) BPF_FUNC_sk_release;
>  
>  /* llvm builtin functions that eBPF C program may use to
>   * emit BPF_LD_ABS and BPF_LD_IND instructions
> -- 
> 2.14.1
> 

^ permalink raw reply

* KASAN: null-ptr-deref Read in rds_ib_get_mr
From: DaeRyong Jeong @ 2018-05-11  5:20 UTC (permalink / raw)
  To: santosh.shilimkar, davem
  Cc: netdev, linux-rdma, rds-devel, linux-kernel, byoungyoung, kt0755

We report the crash: KASAN: null-ptr-deref Read in rds_ib_get_mr

Note that this bug is previously reported by syzkaller.
https://syzkaller.appspot.com/bug?id=0bb56a5a48b000b52aa2b0d8dd20b1f545214d91
Nonetheless, this bug has not fixed yet, and we hope that this report and our
analysis, which gets help by the RaceFuzzer's feature, will helpful to fix the
crash.

This crash has been found in v4.17-rc1 using RaceFuzzer (a modified
version of Syzkaller), which we describe more at the end of this
report. Our analysis shows that the race occurs when invoking two
syscalls concurrently, bind$rds and setsockopt$RDS_GET_MR.


Analysis:
We think the concurrent execution of __rds_rdma_map() and rds_bind()
causes the problem. __rds_rdma_map() checks whether rs->rs_bound_addr is 0
or not. But the concurrent execution with rds_bind() can by-pass this
check. Therefore, __rds_rdmap_map() calls rs->rs_transport->get_mr() and
rds_ib_get_mr() causes the null deref at ib_rdma.c:544 in v4.17-rc1, when
dereferencing rs_conn.


Thread interleaving:
CPU0 (__rds_rdma_map)					CPU1 (rds_bind)
							// rds_add_bound() sets rs->bound_addr as none 0
							ret = rds_add_bound(rs, sin->sin_addr.s_addr, &sin->sin_port);
if (rs->rs_bound_addr == 0 || !rs->rs_transport) {
	ret = -ENOTCONN; /* XXX not a great errno */
	goto out;
}
							if (rs->rs_transport) { /* previously bound */
								trans = rs->rs_transport;
								if (trans->laddr_check(sock_net(sock->sk),
										       sin->sin_addr.s_addr) != 0) {
									ret = -ENOPROTOOPT;
									// rds_remove_bound() sets rs->bound_addr as 0
									rds_remove_bound(rs);
...
trans_private = rs->rs_transport->get_mr(sg, nents, rs,
					 &mr->r_key);
(in rds_ib_get_mr())
struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;


Call sequence (v4.17-rc1):
CPU0
rds_setsockopt
	rds_get_mr
		__rds_rdma_map
			rds_ib_get_mr


CPU1
rds_bind
	rds_add_bound
	...
	rds_remove_bound


Crash log:
==================================================================
BUG: KASAN: null-ptr-deref in rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
Read of size 8 at addr 0000000000000068 by task syz-executor0/32067

CPU: 0 PID: 32067 Comm: syz-executor0 Not tainted 4.17.0-rc1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x166/0x21c lib/dump_stack.c:113
 kasan_report_error mm/kasan/report.c:352 [inline]
 kasan_report+0x140/0x360 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 __asan_load8+0x54/0x90 mm/kasan/kasan.c:699
 rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
 __rds_rdma_map+0x521/0x9d0 net/rds/rdma.c:271
 rds_get_mr+0xad/0xf0 net/rds/rdma.c:333
 rds_setsockopt+0x57f/0x720 net/rds/af_rds.c:347
 __sys_setsockopt+0x147/0x230 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0x67/0x80 net/socket.c:1911
 do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4563f9
RSP: 002b:00007f6a2b3c2b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 000000000072bee0 RCX: 00000000004563f9
RDX: 0000000000000002 RSI: 0000000000000114 RDI: 0000000000000015
RBP: 0000000000000575 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000020000140 R11: 0000000000000246 R12: 00007f6a2b3c36d4
R13: 00000000ffffffff R14: 00000000006fd398 R15: 0000000000000000
==================================================================


= About RaceFuzzer

RaceFuzzer is a customized version of Syzkaller, specifically tailored
to find race condition bugs in the Linux kernel. While we leverage
many different technique, the notable feature of RaceFuzzer is in
leveraging a custom hypervisor (QEMU/KVM) to interleave the
scheduling. In particular, we modified the hypervisor to intentionally
stall a per-core execution, which is similar to supporting per-core
breakpoint functionality. This allows RaceFuzzer to force the kernel
to deterministically trigger racy condition (which may rarely happen
in practice due to randomness in scheduling).

RaceFuzzer's C repro always pinpoints two racy syscalls. Since C
repro's scheduling synchronization should be performed at the user
space, its reproducibility is limited (reproduction may take from 1
second to 10 minutes (or even more), depending on a bug). This is
because, while RaceFuzzer precisely interleaves the scheduling at the
kernel's instruction level when finding this bug, C repro cannot fully
utilize such a feature. Please disregard all code related to
"should_hypercall" in the C repro, as this is only for our debugging
purposes using our own hypervisor.

^ permalink raw reply

* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Finn Thain @ 2018-05-11  5:28 UTC (permalink / raw)
  To: Michael Schmitz, Geert Uytterhoeven
  Cc: David S. Miller, linux-m68k, netdev, Linux Kernel Mailing List,
	Christoph Hellwig
In-Reply-To: <0b13a08d-ba2d-b9dc-4fd9-1f6bad5cd1ee@gmail.com>

On Fri, 11 May 2018, Michael Schmitz wrote:

> 
> I'm afraid using platform_device_register() (which you already use for 
> the SCC devices) is the only option handling this on a per-device basis 
> without touching platform core code, while at the same time keeping the 
> DMA mask setup out of device drivers

I don't think that will fly. If you call platform_device_register() and 
follow that with a dma mask assignment, you could race with the bus 
matching and driver probe, and we are back to the same WARNING message.

If you want to use platform_device_register(), you'd have to implement 
arch_setup_pdev_archdata() and use that to set up the dma mask.

> (I can see Geert's point there - device driver code might be shared 
> across implementations of the device on platforms with different DMA 
> mask requirements,, something the driver can't be expected to know 
> about).

As I said, these drivers might be expected to be portable between Macs and 
early PowerMacs, but the same dma mask would apply AFAIK.

If a platform driver isn't expected to be portable, I think either method 
is reasonable: arch_setup_pdev_archdata() or the method in the patch.

Anyway, there is this in arch/powerpc/kernel/setup-common.c:

void arch_setup_pdev_archdata(struct platform_device *pdev)
{
        pdev->archdata.dma_mask = DMA_BIT_MASK(32);
        pdev->dev.dma_mask = &pdev->archdata.dma_mask;
	...

I'm inclined to propose something similar for m68k. That should fix the 
problem, since arch_setup_pdev_archdata() is already in the call chain:

	platform_device_register_simple()
		platform_device_register_resndata()
			platform_device_register_full()
				platform_device_alloc()
					arch_setup_pdev_archdata()

Thoughts? Will this have nasty side effects for m68k platforms that use 
smaller dma masks?

-- 

> 
> Cheers,
> 
> 	Michael
> 

^ permalink raw reply

* Re:Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: Gao Feng @ 2018-05-11  6:20 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem@davemloft.net, daniel@iogearbox.net,
	jakub.kicinski@netronome.com, David Ahern, netdev@vger.kernel.org
In-Reply-To: <CAF=yD-LVQ_hpQaN9tZ_UmJ3YYqipAaHBLhEsusaOXYJiXfcCrw@mail.gmail.com>

At 2018-05-11 11:54:55, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com> wrote:
>On Thu, May 10, 2018 at 4:28 AM,  <gfree.wind@vip.163.com> wrote:
>> From: Gao Feng <gfree.wind@vip.163.com>
>>
>> The skb flow limit is implemented for each CPU independently. In the
>> current codes, the function skb_flow_limit gets the softnet_data by
>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>> the stats of current CPU, while the skb is going to append the queue of
>> another CPU. It isn't the expected behavior.
>>
>> Now pass the softnet_data as a param to softnet_data to make consistent.
>
>The local cpu softnet_data is used on purpose. The operations in
>skb_flow_limit() on sd fields could race if not executed on the local cpu.

I think the race doesn't exist because of the rps_lock.
The enqueue_to_backlog has hold the rps_lock before skb_flow_limit.

>
>Flow limit tries to detect large ("elephant") DoS flows with a fixed four-tuple.
>These would always hit the same RPS cpu, so that cpu being backlogged

They may hit the different target CPU when enable RFS. Because the app could be scheduled
to another CPU, then RFS tries to deliver the skb to latest core which has hot cache.

>may be an indication that such a flow is active. But the flow will also always
>arrive on the same initial cpu courtesy of RSS. So storing the lookup table

The RSS couldn't make sure the irq is handled by same cpu. It would be balanced between
the cpus.

>on the initial CPU is also fine. There may be false positives on other CPUs
>with the same RPS destination, but that is unlikely with a highly concurrent
>traffic server mix ("mice").

If my comment is right, the flow couldn't always arrive one the same initial cpu,  although
it may be sent to one same target cpu.

>
>Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
>for the cpus on which traffic initially lands, not the RPS destination cpus.
>See also Documentation/networking/scaling.txt
>
>That said, I had to reread the code, as it does seem sensible that the
>same softnet_data is intended to be used both when testing qlen and
>flow_limit.

In most cases, user configures the same RPS map with flow_limit like 0xff.
Because user couldn't predict which core the evil flow would arrive on.

Take an example, there are 2 cores, cpu0 and cpu1. 
One flow is the an evil flow, but the irq is sent to cpu0. After RPS/RFS, the target cpu is cpu1.
Now cpu0 invokes enqueue_to_backlog, then the skb_flow_limit checkes the queue length
of cpu0. Certainly it could pass the check of skb_flow_limit because there is no any evil flow on cpu0.
Then the cpu0 inserts the skb into the queue of cpu1.
As a result, the skb_flow_limit doesn't work as expected.

BTW, I have already sent the v2 patch which only adds the "Fixes: tag".

Best Regards
Feng



^ permalink raw reply

* Re: [bpf-next v3 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: David Ahern @ 2018-05-11  6:30 UTC (permalink / raw)
  To: Mathieu Xhonneux
  Cc: netdev, borkmann, ast, David Miller, shm, roopa, brouer, toke,
	john.fastabend
In-Reply-To: <CAKSCvkTQPFZUBfNVfdJYj3PakTMxCwEwoWJWdVfnofzjFtG=3g@mail.gmail.com>

On 5/10/18 12:27 PM, Mathieu Xhonneux wrote:
> I'm quite interested in this helper to implement OAM features (through
> other hooks, e.g. the BPF LWT hook). Do you have an idea about how it
> behaves with ECMP routes (with IPv4 and/or IPv6) ? In IPv6, I'm
> guessing that the returned gateway address is always a link-local
> address ?

ECMP works with BPF and forwarding just like it does with the full stack
-- the traffic should be distributed based on the L3 or L4 hash
algorithm and the number of links.

As for the IPv6 gateway question, it returns the gateway of the route in
the FIB -- ie., link local or global address. No assumptions are
necessary or made for the BPF helper.

^ permalink raw reply

* [PATCH v2 net] stmmac: strip vlan tag on reception only for 8021q tagged frames
From: Elad Nachman @ 2018-05-11  7:31 UTC (permalink / raw)
  To: Toshiaki Makita, davem, netdev; +Cc: eladv6
In-Reply-To: <f56f7073-46ac-9f9e-f8c0-deaa23ab3aa9@lab.ntt.co.jp>

stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before calling napi_gro_receive().

The function assumes VLAN tagged frames are always tagged with 802.1Q protocol,
and assigns ETH_P_8021Q to the skb by hard-coding the parameter on call to __vlan_hwaccel_put_tag() without checking the actual VLAN tag.

This causes packets not to be passed to the VLAN slave if it was created with 802.1AD protocol
(ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).

This fix only strips the VLAN tag for 802.1Q tagged protocols. 802.1AD frames will be handled later by skb_vlan_untag() .

Signed-off-by: Elad Nachman <eladn@gilat.com>

---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index b65e2d1..b230ab5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3293,17 +3293,21 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
 {
-	struct ethhdr *ehdr;
+	struct vlan_ethhdr *veth;
 	u16 vlanid;
+	__be16 vlan_proto;
 
 	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
 	    NETIF_F_HW_VLAN_CTAG_RX &&
 	    !__vlan_get_tag(skb, &vlanid)) {
-		/* pop the vlan tag */
-		ehdr = (struct ethhdr *)skb->data;
-		memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
-		skb_pull(skb, VLAN_HLEN);
-		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
+		veth = (struct vlan_ethhdr *)skb->data;
+		vlan_proto = veth->h_vlan_proto;
+		if (likely(vlan_proto == htons(ETH_P_8021Q))) {
+			/* pop the vlan tag */
+			memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
+			skb_pull(skb, VLAN_HLEN);
+			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
+		}
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related

* Re: KASAN: null-ptr-deref Read in rds_ib_get_mr
From: Yanjun Zhu @ 2018-05-11  7:48 UTC (permalink / raw)
  To: DaeRyong Jeong, santosh.shilimkar, davem
  Cc: netdev, linux-rdma, rds-devel, linux-kernel, byoungyoung, kt0755
In-Reply-To: <20180511052056.GA10547@dragonet.kaist.ac.kr>



On 2018/5/11 13:20, DaeRyong Jeong wrote:
> We report the crash: KASAN: null-ptr-deref Read in rds_ib_get_mr
>
> Note that this bug is previously reported by syzkaller.
> https://syzkaller.appspot.com/bug?id=0bb56a5a48b000b52aa2b0d8dd20b1f545214d91
> Nonetheless, this bug has not fixed yet, and we hope that this report and our
> analysis, which gets help by the RaceFuzzer's feature, will helpful to fix the
> crash.
>
> This crash has been found in v4.17-rc1 using RaceFuzzer (a modified
> version of Syzkaller), which we describe more at the end of this
> report. Our analysis shows that the race occurs when invoking two
> syscalls concurrently, bind$rds and setsockopt$RDS_GET_MR.
>
>
> Analysis:
> We think the concurrent execution of __rds_rdma_map() and rds_bind()
> causes the problem. __rds_rdma_map() checks whether rs->rs_bound_addr is 0
> or not. But the concurrent execution with rds_bind() can by-pass this
> check. Therefore, __rds_rdmap_map() calls rs->rs_transport->get_mr() and
> rds_ib_get_mr() causes the null deref at ib_rdma.c:544 in v4.17-rc1, when
> dereferencing rs_conn.
>
>
> Thread interleaving:
> CPU0 (__rds_rdma_map)					CPU1 (rds_bind)
> 							// rds_add_bound() sets rs->bound_addr as none 0
> 							ret = rds_add_bound(rs, sin->sin_addr.s_addr, &sin->sin_port);
> if (rs->rs_bound_addr == 0 || !rs->rs_transport) {
> 	ret = -ENOTCONN; /* XXX not a great errno */
> 	goto out;
> }
> 							if (rs->rs_transport) { /* previously bound */
> 								trans = rs->rs_transport;
> 								if (trans->laddr_check(sock_net(sock->sk),
> 										       sin->sin_addr.s_addr) != 0) {
> 									ret = -ENOPROTOOPT;
> 									// rds_remove_bound() sets rs->bound_addr as 0
> 									rds_remove_bound(rs);
> ...
> trans_private = rs->rs_transport->get_mr(sg, nents, rs,
> 					 &mr->r_key);
> (in rds_ib_get_mr())
> struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;
>
>
> Call sequence (v4.17-rc1):
> CPU0
> rds_setsockopt
> 	rds_get_mr
> 		__rds_rdma_map
> 			rds_ib_get_mr
>
>
> CPU1
> rds_bind
> 	rds_add_bound
> 	...
> 	rds_remove_bound
>
>
> Crash log:
> ==================================================================
> BUG: KASAN: null-ptr-deref in rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
> Read of size 8 at addr 0000000000000068 by task syz-executor0/32067
>
> CPU: 0 PID: 32067 Comm: syz-executor0 Not tainted 4.17.0-rc1 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x166/0x21c lib/dump_stack.c:113
>   kasan_report_error mm/kasan/report.c:352 [inline]
>   kasan_report+0x140/0x360 mm/kasan/report.c:412
>   check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>   __asan_load8+0x54/0x90 mm/kasan/kasan.c:699
>   rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
>   __rds_rdma_map+0x521/0x9d0 net/rds/rdma.c:271
>   rds_get_mr+0xad/0xf0 net/rds/rdma.c:333
>   rds_setsockopt+0x57f/0x720 net/rds/af_rds.c:347
>   __sys_setsockopt+0x147/0x230 net/socket.c:1903
>   __do_sys_setsockopt net/socket.c:1914 [inline]
>   __se_sys_setsockopt net/socket.c:1911 [inline]
>   __x64_sys_setsockopt+0x67/0x80 net/socket.c:1911
>   do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4563f9
> RSP: 002b:00007f6a2b3c2b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
> RAX: ffffffffffffffda RBX: 000000000072bee0 RCX: 00000000004563f9
> RDX: 0000000000000002 RSI: 0000000000000114 RDI: 0000000000000015
> RBP: 0000000000000575 R08: 0000000000000020 R09: 0000000000000000
> R10: 0000000020000140 R11: 0000000000000246 R12: 00007f6a2b3c36d4
> R13: 00000000ffffffff R14: 00000000006fd398 R15: 0000000000000000
> ==================================================================
diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index e678699..2228b50 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -539,11 +539,17 @@ void rds_ib_flush_mrs(void)
  void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
                     struct rds_sock *rs, u32 *key_ret)
  {
-       struct rds_ib_device *rds_ibdev;
+       struct rds_ib_device *rds_ibdev = NULL;
         struct rds_ib_mr *ibmr = NULL;
-       struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;
+       struct rds_ib_connection *ic = NULL;
         int ret;

+       if (rs->rs_bound_addr == 0) {
+               ret = -EPERM;
+               goto out;
+       }
+
+       ic = rs->rs_conn->c_transport_data;
         rds_ibdev = rds_ib_get_device(rs->rs_bound_addr);
         if (!rds_ibdev) {
                 ret = -ENODEV;

I made this raw patch. If you can reproduce this bug, please make tests 
with it.

Thanks a lot.
>
> = About RaceFuzzer
>
> RaceFuzzer is a customized version of Syzkaller, specifically tailored
> to find race condition bugs in the Linux kernel. While we leverage
> many different technique, the notable feature of RaceFuzzer is in
> leveraging a custom hypervisor (QEMU/KVM) to interleave the
> scheduling. In particular, we modified the hypervisor to intentionally
> stall a per-core execution, which is similar to supporting per-core
> breakpoint functionality. This allows RaceFuzzer to force the kernel
> to deterministically trigger racy condition (which may rarely happen
> in practice due to randomness in scheduling).
>
> RaceFuzzer's C repro always pinpoints two racy syscalls. Since C
> repro's scheduling synchronization should be performed at the user
> space, its reproducibility is limited (reproduction may take from 1
> second to 10 minutes (or even more), depending on a bug). This is
> because, while RaceFuzzer precisely interleaves the scheduling at the
> kernel's instruction level when finding this bug, C repro cannot fully
> utilize such a feature. Please disregard all code related to
> "should_hypercall" in the C repro, as this is only for our debugging
> purposes using our own hypervisor.
>

^ permalink raw reply related

* Re: [RFC] net: Add new LoRaWAN subsystem
From: Jiri Pirko @ 2018-05-11  8:16 UTC (permalink / raw)
  To: Jian-Hong Pan
  Cc: David S. Miller, Alexander Aring, Stefan Schmidt, linux-wpan - ML,
	netdev, linux-kernel
In-Reply-To: <CAC=mGzijVyqEG=DXH4v9WkD0kXR2WOJC4KBPzoT7g5wqVrPGXA@mail.gmail.com>

Tue, May 08, 2018 at 05:33:01PM CEST, starnight@g.ncu.edu.tw wrote:
>A Low-Power Wide-Area Network (LPWAN) is a type of wireless
>telecommunication wide area network designed to allow long range
>communications at a low bit rate among things (connected objects), such
>as sensors operated on a battery.  It can be used widely in IoT area.
>LoRaWAN, which is one kind of implementation of LPWAN, is a medium
>access control (MAC) layer protocol for managing communication between
>LPWAN gateways and end-node devices, maintained by the LoRa Alliance.
>LoRaWAN™ Specification could be downloaded at:
>https://lora-alliance.org/lorawan-for-developers
>
>However, LoRaWAN is not implemented in Linux kernel right now, so I am
>trying to develop it.  Here is my repository:
>https://github.com/starnight/LoRa/tree/lorawan-ndo/LoRaWAN

Link to some out-of-tree module is not enough.
If you want anyone to look at this and comment, you need to base your
work on top of kernel git (net-next for example) and send a patch/patchset.


>
>Because it is a kind of network, the ideal usage in an user space
>program should be like "socket(PF_LORAWAN, SOCK_DGRAM, 0)" and with
>other socket APIs.  Therefore, the definitions like AF_LORAWAN,
>PF_LORAWAN ..., must be listed in the header files of glibc.
>For the driver in kernel space, the definitions also must be listed in
>the corresponding Linux socket header files.
>Especially, both are for the testing programs.
>
>Back to the mentioned "LoRaWAN is not implemented in Linux kernel now".
>Could or should we add the definitions into corresponding kernel header
>files now, if LoRaWAN will be accepted as a subsystem in Linux?
>
>Thanks,
>Jian-Hong Pan

^ permalink raw reply

* Re: [PATCH] dt-bindings: net: ravb: Add support for r8a77990 SoC
From: Sergei Shtylyov @ 2018-05-11  8:42 UTC (permalink / raw)
  To: Yoshihiro Shimoda, netdev, linux-renesas-soc, robh+dt,
	mark.rutland
  Cc: devicetree
In-Reply-To: <1526008736-26496-1-git-send-email-yoshihiro.shimoda.uh@renesas.com>

On 5/11/2018 6:18 AM, Yoshihiro Shimoda wrote:

> Add documentation for r8a77990 compatible string to renesas ravb device
> tree bindings documentation.
> 
> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
[...]

Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

MBR, Sergei

^ permalink raw reply

* [PATCH net-next 0/2] mlxsw: spectrum_span: Two minor adjustments
From: Ido Schimmel @ 2018-05-11  8:57 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, mlxsw, Ido Schimmel

Petr says:

This patch set fixes a couple of nits in mlxsw's SPAN implementation:
two counts of inaccurate variable name and one count of unsuitable error
code, fixed, respectively, in patches #1 and #2.

Petr Machata (2):
  mlxsw: spectrum_span: Rename misnamed variable l3edev
  mlxsw: spectrum_span: Use a more fitting error code

 .../net/ethernet/mellanox/mlxsw/spectrum_span.c    | 34 +++++++++++-----------
 1 file changed, 17 insertions(+), 17 deletions(-)

-- 
2.14.3

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox