* Re: [PATCH 0/3] Add R8A77980 GEther support
From: Sergei Shtylyov @ 2018-05-18 11:00 UTC (permalink / raw)
To: David Miller; +Cc: netdev, devicetree, robh+dt, mark.rutland, linux-renesas-soc
In-Reply-To: <20180517.145357.2037804925428844092.davem@davemloft.net>
On 05/17/2018 09:53 PM, David Miller wrote:
>> Here's a set of 3 patches against DaveM's 'net-next.git' repo. They (gradually)
>> add R8A77980 GEther support to the 'sh_eth' driver, starting with couple new
>> register bits/values introduced with this chip, and ending with adding a new
>> 'struct sh_eth_cpu_data' instance connected to the new DT "compatible" prop
>> value...
>>
>> [1/3] sh_eth: add RGMII support
>> [2/3] sh_eth: add EDMR.NBST support
>> [3/3] sh_eth: add R8A77980 support
>
> Waiting for a respin of this, correcting the RGMII check in patch #1.
Respun yesterday, will repost RSN. :-)
MBR, Sergei
^ permalink raw reply
* Re: [PATCH v2] netfilter: properly initialize xt_table_info structure
From: Greg Kroah-Hartman @ 2018-05-18 11:04 UTC (permalink / raw)
To: Florian Westphal
Cc: Jan Engelhardt, Eric Dumazet, Greg Hackmann, Pablo Neira Ayuso,
Jozsef Kadlecsik, Michal Kubecek, netfilter-devel, coreteam,
netdev
In-Reply-To: <20180518092756.odlyvxcpgbuistqq@breakpoint.cc>
On Fri, May 18, 2018 at 11:27:56AM +0200, Florian Westphal wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > On Thu, May 17, 2018 at 12:42:00PM +0200, Jan Engelhardt wrote:
> > >
> > > On Thursday 2018-05-17 12:09, Greg Kroah-Hartman wrote:
> > > >> > --- a/net/netfilter/x_tables.c
> > > >> > +++ b/net/netfilter/x_tables.c
> > > >> > @@ -1183,11 +1183,10 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
> > > >> > * than shoot all processes down before realizing there is nothing
> > > >> > * more to reclaim.
> > > >> > */
> > > >> > - info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> > > >> > + info = kvzalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> > > >> > if (!info)
> > > >> > return NULL;
> > > >>
> > > >> I am curious, what particular path does not later overwrite the whole zone ?
> > > >
> > > >In do_ipt_get_ctl, the IPT_SO_GET_ENTRIES: option uses a len value that
> > > >can be larger than the size of the structure itself.
> > > >
> > > >Then the data is copied to userspace in copy_entries_to_user() for ipv4
> > > >and v6, and that's where the "bad data"
> > >
> > > If the kernel incorrectly copies more bytes than it should, isn't that
> > > a sign that it may be going past the end of the info buffer?
> > > (And thus, zeroing won't truly fix the issue)
> >
> > No, the buffer size is correct, we just aren't filling up the whole
> > buffer as the data requested is smaller than the buffer size.
>
> I have no objections to the patch but I'd like to understand what
> problem it's fixing.
>
> Normal pattern is:
> newinfo = xt_alloc_table_info(tmp.size);
> copy_from_user(newinfo->entries, user + sizeof(tmp), tmp.size);
>
> So initial value of the rule blob area should not matter.
>
> Furthermore, when copying the rule blob back to userspace,
> the kernel is not supposed to copy any padding back to userspace either,
> since commit f32815d21d4d8287336fb9cef4d2d9e0866214c2 only the
> user-relevant parts should be copied (some matches and targets allocate
> kernel-private data such as pointers, and we did use to leak such pointer
> values back to userspace).
Ah, fun, commit f32815d21d4d ("xtables: add xt_match, xt_target and data
copy_to_user functions") showed up in 4.11 and this was reported in 4.4 :(
However, the "bad" code path seems to be from the IPT_SO_GET_ENTRIES
request, which does not look to use the new functions provided in
f32815d21d4d, or am I mistaken?
Let me go work on a reproducer for this to make it a lot more obvious
what is happening, and if it is still even an issue after f32815d21d4d
is applied to a kernel. Sorry for not providing that in the first
place...
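Until a real reproducer exists, the suspected pattern can be modelled in userspace. The following is only a sketch under stated assumptions, not kernel code: alloc_blob() and leaked_bytes() are hypothetical stand-ins, and the 0xAA marker merely simulates whatever stale contents a kvmalloc()ed buffer might hold. It shows that when only part of a buffer is filled and the whole buffer is later exposed, a zeroing allocator leaves nothing stale behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define STALE 0xAA /* marker that simulates leftover heap contents */

/* Hypothetical stand-in for the allocation in xt_alloc_table_info():
 * 'zero' != 0 models kvzalloc(), 'zero' == 0 models kvmalloc(). */
static unsigned char *alloc_blob(size_t size, int zero)
{
	unsigned char *p = malloc(size);

	if (!p)
		return NULL;
	memset(p, zero ? 0 : STALE, size);
	return p;
}

/* Fill only 'used' bytes, then count how many stale bytes a copy of
 * the full 'size' bytes would expose to the reader. */
static size_t leaked_bytes(size_t size, size_t used, int zero)
{
	unsigned char *blob;
	size_t i, leaked = 0;

	assert(used <= size);
	blob = alloc_blob(size, zero);
	if (!blob)
		return 0;

	memset(blob, 0x11, used); /* the part that really gets written */
	for (i = 0; i < size; i++)
		if (blob[i] == STALE)
			leaked++;

	free(blob);
	return leaked;
}
```

With a 64-byte buffer of which 40 bytes are written, the non-zeroing variant exposes the 24 trailing bytes, while the zeroing variant exposes none. Whether the kernel path in question really copies such a tail is exactly what the reproducer has to establish.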
thanks,
greg k-h
^ permalink raw reply
* Re: [Cake] [PATCH net-next v12 3/7] sch_cake: Add optional ACK filter
From: Kevin Darbyshire-Bryant @ 2018-05-18 11:18 UTC (permalink / raw)
To: Cong Wang
Cc: Toke Høiland-Jørgensen, Cake List,
Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <CAM_iQpUA1cEx5X3mD9Zhs4YqON5Q_SL1T=EjOd2k6Zbj6vzVyA@mail.gmail.com>
> On 18 May 2018, at 05:27, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Thu, May 17, 2018 at 4:23 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>
>>> On 05/16/2018 01:29 PM, Toke Høiland-Jørgensen wrote:
>>>> The ACK filter is an optional feature of CAKE which is designed to improve
>>>> performance on links with very asymmetrical rate limits. On such links
>>>> (which are unfortunately quite prevalent, especially for DSL and cable
>>>> subscribers), the downstream throughput can be limited by the number of
>>>> ACKs capable of being transmitted in the *upstream* direction.
>>>>
>>>
>>> ...
>>>
>>>>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>> ---
>>>> net/sched/sch_cake.c | 260 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 258 insertions(+), 2 deletions(-)
>>>>
>>>>
>>>
>>> I have decided to implement ACK compression in TCP stack itself.
>>
>> Awesome! Will look forward to seeing that!
>
> +1
>
> It is really odd to put into a TC qdisc, TCP stack is a much better
> place.
Speaking as a user of cake’s ACK filtering: although it may be an odd place, it is incredibly useful in my Linux-based home router, a middlebox that extracts extra usable bandwidth from my asymmetric link. And whilst ACK compression/reduction/filtering (call it what you will) will come to the Linux TCP stack, other OS stacks are as yet less enlightened and benefit from the router’s tweaking/meddling/interference.
^ permalink raw reply
* Re: [Cake] [PATCH net-next v12 3/7] sch_cake: Add optional ACK filter
From: Sebastian Moeller @ 2018-05-18 11:23 UTC (permalink / raw)
To: Kevin Darbyshire-Bryant
Cc: Cong Wang, Cake List, Linux Kernel Network Developers,
Eric Dumazet
In-Reply-To: <05E1D675-B73B-4409-8991-EB89D5538EAB@darbyshire-bryant.me.uk>
Hi Kevin,
> On May 18, 2018, at 13:18, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>
>
>
>> On 18 May 2018, at 05:27, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> On Thu, May 17, 2018 at 4:23 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>> Eric Dumazet <eric.dumazet@gmail.com> writes:
>>>
>>>> On 05/16/2018 01:29 PM, Toke Høiland-Jørgensen wrote:
>>>>> The ACK filter is an optional feature of CAKE which is designed to improve
>>>>> performance on links with very asymmetrical rate limits. On such links
>>>>> (which are unfortunately quite prevalent, especially for DSL and cable
>>>>> subscribers), the downstream throughput can be limited by the number of
>>>>> ACKs capable of being transmitted in the *upstream* direction.
>>>>>
>>>>
>>>> ...
>>>>
>>>>>
>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>>> ---
>>>>> net/sched/sch_cake.c | 260 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> 1 file changed, 258 insertions(+), 2 deletions(-)
>>>>>
>>>>>
>>>>
>>>> I have decided to implement ACK compression in TCP stack itself.
>>>
>>> Awesome! Will look forward to seeing that!
>>
>> +1
>>
>> It is really odd to put into a TC qdisc, TCP stack is a much better
>> place.
>
> Speaking as a user of cake’s ack filtering, although it may be an odd place, it is incredibly useful in my linux based home router middle box that usefully extracts extra usable bandwidth from my asymmetric link. And whilst ack compression/reduction/filtering call it what you will, will come to the linux TCP stack, as yet other OS stacks are less enlightened and benefit from the router’s tweaking/meddling/interference.
I believe this is a good point: it is really the asymmetry of the link that makes ACK suppression more or less desirable, and it is quite helpful if the adaptation to that link only needs to be configured on one device. I think this is similar to applying MSS clamping on a router to account for, say, PPPoE overhead, as compared to relying on path MTU discovery or having to configure the MTU on all end-points.
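For the MSS clamping comparison, the arithmetic is small enough to write down. This is an illustrative sketch only: clamped_mss() is a made-up helper, and it assumes a 1500-byte Ethernet MTU, the usual 8 bytes of PPPoE overhead, and IP and TCP headers without options.

```c
#include <assert.h>

enum {
	ETH_MTU        = 1500, /* assumed link MTU */
	PPPOE_OVERHEAD = 8,    /* 6-byte PPPoE header + 2-byte PPP protocol ID */
	IPV4_HDR       = 20,   /* no IP options */
	IPV6_HDR       = 40,   /* no extension headers */
	TCP_HDR        = 20    /* no TCP options */
};

/* Largest TCP payload that fits in one packet once the encapsulation
 * overhead and the IP and TCP headers are subtracted from the MTU. */
static int clamped_mss(int link_mtu, int encap_overhead, int ip_hdr)
{
	int mss = link_mtu - encap_overhead - ip_hdr - TCP_HDR;

	assert(mss > 0);
	return mss;
}
```

Under these assumptions clamped_mss(ETH_MTU, PPPOE_OVERHEAD, IPV4_HDR) gives 1452 and the IPv6 case gives 1432. The router rewrites the MSS once on the SYN, and no end-point needs to know about the PPPoE link.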
Best Regards
^ permalink raw reply
* Re: [RFC v4 3/5] virtio_ring: add packed ring support
From: Tiwei Bie @ 2018-05-18 11:29 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <bc38e5a1-e920-7055-dc22-49ac98455257@redhat.com>
On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> On 2018年05月16日 22:33, Tiwei Bie wrote:
> > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > On 2018年05月16日 21:45, Tiwei Bie wrote:
> > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > [...]
> > > > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > + unsigned int id, void **ctx)
> > > > > > > > +{
> > > > > > > > + struct vring_packed_desc *desc;
> > > > > > > > + unsigned int i, j;
> > > > > > > > +
> > > > > > > > + /* Clear data ptr. */
> > > > > > > > + vq->desc_state[id].data = NULL;
> > > > > > > > +
> > > > > > > > + i = head;
> > > > > > > > +
> > > > > > > > + for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > + desc = &vq->vring_packed.desc[i];
> > > > > > > > + vring_unmap_one_packed(vq, desc);
> > > > > > > As mentioned in previous discussion, this probably won't work for the case
> > > > > > > of out of order completion since it depends on the information in the
> > > > > > > descriptor ring. We probably need to extend ctx to record such information.
> > > > > > Above code doesn't depend on the information in the descriptor
> > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > >
> > > > > > Best regards,
> > > > > > Tiwei Bie
> > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > vring_unmap_one_packed() still depends on the content of descriptor ring?
> > > > >
> > > > I got your point now. I think it makes sense to reserve
> > > > the bits of the addr field. Driver shouldn't try to get
> > > > addrs from the descriptors when cleanup the descriptors
> > > > no matter whether we support out-of-order or not.
> > > Maybe I was wrong, but I remember spec mentioned something like this.
> > You're right. Spec mentioned this. I was just repeating
> > the spec to emphasize that it does make sense. :)
> >
> > > > But combining it with the out-of-order support, it will
> > > > mean that the driver still needs to maintain a desc/ctx
> > > > list that is very similar to the desc ring in the split
> > > > ring. I'm not quite sure whether it's something we want.
> > > > If it is true, I'll do it. So do you think we also want
> > > > to maintain such a desc/ctx list for packed ring?
> > > To make it work for OOO backends I think we need something like this
> > > (hardware NIC drivers are usually have something like this).
> > Which hardware NIC drivers have this?
>
> It's quite common I think, e.g driver track e.g dma addr and page frag
> somewhere. e.g the ring->rx_info in mlx4 driver.
It seems that I misunderstood your previous
comments. I know it's quite common for drivers
to track e.g. DMA addrs somewhere (and I think
one reason behind this is that they want to
reuse the bits of the addr field). But tracking
addrs somewhere doesn't mean supporting OOO.
I thought you were saying it's quite common for
hardware NIC drivers to support OOO (i.e. NICs
will return the descriptors OOO).
I'm not familiar with mlx4, so maybe I'm wrong.
I just had a quick glance, and I found the
comment below in mlx4_en_process_rx_cq():
```
/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
* descriptor offset can be deduced from the CQE index instead of
* reading 'cqe->index' */
index = cq->mcq.cons_index & ring->size_mask;
cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
```
It seems that although they have a completion
queue, they are still using the ring in order.
I guess storage devices may want OOO, though.
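To illustrate the distinction between tracking addresses and supporting OOO, here is a rough userspace sketch. All names (toy_vq, toy_desc_extra, toy_add_buf, toy_detach_buf, toy_ring_index) are hypothetical and this is not the proposed virtio_ring code. The driver keeps its own per-id copy of the DMA address, so cleanup never reads the descriptor ring and works in whatever order buffers come back; the in-order case reduces to masking a free-running counter, as in the mlx4 snippet.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical; must be a power of two */

/* Driver-private copy of what is needed to unmap a buffer later. */
struct toy_desc_extra {
	uint64_t addr;
	uint32_t len;
	int in_use;
};

struct toy_vq {
	struct toy_desc_extra extra[TOY_RING_SIZE];
};

/* Wrap a free-running counter into a ring slot (in-order usage). */
static unsigned int toy_ring_index(unsigned int counter)
{
	return counter & (TOY_RING_SIZE - 1);
}

/* Record the mapping under the buffer id when the buffer is queued. */
static void toy_add_buf(struct toy_vq *vq, unsigned int id,
			uint64_t addr, uint32_t len)
{
	assert(id < TOY_RING_SIZE && !vq->extra[id].in_use);
	vq->extra[id].addr = addr;
	vq->extra[id].len = len;
	vq->extra[id].in_use = 1;
}

/* Return the address to unmap, using only driver-private state, so
 * the result does not depend on what the device left in the ring. */
static uint64_t toy_detach_buf(struct toy_vq *vq, unsigned int id)
{
	assert(id < TOY_RING_SIZE && vq->extra[id].in_use);
	vq->extra[id].in_use = 0;
	return vq->extra[id].addr;
}
```

Queuing ids 0, 1, 2 and detaching them as 2, 0, 1 still yields the right address each time, which is the property an OOO backend would need. The cost is the extra per-id array, similar to what the split ring already carries.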
Best regards,
Tiwei Bie
>
> Thanks
>
> >
> > > Not for the patch, but it looks like having a OUT_OF_ORDER feature bit is
> > > much more simpler to be started with.
> > +1
> >
> > Best regards,
> > Tiwei Bie
>
^ permalink raw reply
* Re: [net-next 3/6] ixgbe: release lock for the duration of ixgbe_suspend_close()
From: Pavel Tatashin @ 2018-05-18 11:37 UTC (permalink / raw)
To: Sergei Shtylyov, Jeff Kirsher, davem; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <c9fbb37d-9213-ff72-aa8c-b998f0b19490@cogentembedded.com>
* parallelized this function, so drop lock for the
>
> Parallelizing? Else the sentence doesn't parse for me. :-)
Hi Sergei,
In a separate series I parallelized device_shutdown(), see:
http://lkml.kernel.org/r/20180516024004.28977-1-pasha.tatashin@oracle.com
But, this particular patch should be dropped, as discussed in this thread:
http://lkml.kernel.org/r/20180503035931.22439-2-pasha.tatashin@oracle.com
Alexander Duyck made the point that a generic RTNL scalability fix should be done instead. This particular patch might introduce a race: it relies on the assumption that RTNL is not needed here because ixgbe_close() does not take it, but Alexander Duyck says that the callers of ixgbe_close() are assumed to own this lock.
Thank you,
Pavel
^ permalink raw reply
* [PATCH net] sock_diag: fix use-after-free read in __sk_free
From: Eric Dumazet @ 2018-05-18 11:47 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet, Craig Gallek
We must not call sock_diag_has_destroy_listeners(sk) on a socket
that has no reference on net structure.
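The fix relies on C's left-to-right short-circuit evaluation of &&. A minimal userspace sketch of that idea follows; toy_sock, toy_net and the helpers are hypothetical names, and the NULL pointer only stands in for a net structure the socket holds no reference on.

```c
#include <assert.h>
#include <stddef.h>

struct toy_net {
	int has_listeners;
};

struct toy_sock {
	int net_refcnt;      /* non-zero: the socket pins its net structure */
	struct toy_net *net; /* NULL here models "may already be freed" */
};

static int listener_checks; /* how often the net structure was touched */

/* Dereferences sk->net, so it is only safe when the net is pinned. */
static int has_destroy_listeners(const struct toy_sock *sk)
{
	listener_checks++;
	assert(sk->net != NULL);
	return sk->net->has_listeners;
}

/* The refcount test comes first, so the right-hand side is not
 * evaluated at all for sockets without a reference on the net. */
static int should_broadcast(const struct toy_sock *sk)
{
	return sk->net_refcnt && has_destroy_listeners(sk);
}
```

For a socket with net_refcnt == 0, should_broadcast() returns 0 and listener_checks stays at 0, i.e. the net structure is never read. With the operands in the opposite order, the helper would run first and touch the possibly freed structure.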
BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609
Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
__sk_free+0x329/0x340 net/core/sock.c:1609
sk_free+0x42/0x50 net/core/sock.c:1623
sock_put include/net/sock.h:1664 [inline]
reqsk_free include/net/request_sock.h:116 [inline]
reqsk_put include/net/request_sock.h:124 [inline]
inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline]
reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739
call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x79e/0xc50 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1d1/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:525 [inline]
smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
</IRQ>
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000
RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680
RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000
arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
cpuidle_idle_call kernel/sched/idle.c:153 [inline]
do_idle+0x395/0x560 kernel/sched/idle.c:262
cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368
start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269
secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
Allocated by task 4557:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
kmem_cache_zalloc include/linux/slab.h:691 [inline]
net_alloc net/core/net_namespace.c:383 [inline]
copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423
create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206
ksys_unshare+0x708/0xf90 kernel/fork.c:2408
__do_sys_unshare kernel/fork.c:2476 [inline]
__se_sys_unshare kernel/fork.c:2474 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2474
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 69:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
net_free net/core/net_namespace.c:399 [inline]
net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406
net_drop_ns net/core/net_namespace.c:405 [inline]
cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541
process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
The buggy address belongs to the object at ffff88018a02c140
which belongs to the cache net_namespace of size 8832
The buggy address is located 8800 bytes inside of
8832-byte region [ffff88018a02c140, ffff88018a02e3c0)
The buggy address belongs to the page:
page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001
raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000
page dumped because: kasan: bad access detected
Fixes: b922622ec6ef ("sock_diag: don't broadcast kernel sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Craig Gallek <kraig@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
net/core/sock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c
index 6444525f610cf8039516744ad26aec58485b9b8a..3b6d02854e57736254975963c45369515f369ddc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1606,7 +1606,7 @@ static void __sk_free(struct sock *sk)
if (likely(sk->sk_net_refcnt))
sock_inuse_add(sock_net(sk), -1);
- if (unlikely(sock_diag_has_destroy_listeners(sk) && sk->sk_net_refcnt))
+ if (unlikely(sk->sk_net_refcnt && sock_diag_has_destroy_listeners(sk)))
sock_diag_broadcast_destroy(sk);
else
sk_destruct(sk);
--
2.17.0.441.gb46fe60e1d-goog
^ permalink raw reply related
* [PATCH net-next] tipc: eliminate complaint of KMSAN uninit-value in tipc_conn_rcv_sub
From: Ying Xue @ 2018-05-18 11:50 UTC (permalink / raw)
To: netdev; +Cc: tipc-discussion, syzkaller-bugs, davem
As variable s of struct tipc_subscr type is not initialized
in tipc_conn_rcv_from_sock() before it is used in tipc_conn_rcv_sub(),
KMSAN reported the following uninit-value type complaint:
==================================================================
BUG: KMSAN: uninit-value in tipc_conn_rcv_sub+0x184/0x950
net/tipc/topsrv.c:373
CPU: 0 PID: 66 Comm: kworker/u4:4 Not tainted 4.17.0-rc3+ #88
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: tipc_rcv tipc_conn_recv_work
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
__msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline]
tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424
process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
kthread+0x539/0x720 kernel/kthread.c:239
ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
Local variable description: ----s.i@tipc_conn_recv_work
Variable was created at:
tipc_conn_recv_work+0x65/0x560 net/tipc/topsrv.c:419
process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
==================================================================
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 66 Comm: kworker/u4:4 Tainted: G B 4.17.0-rc3+
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: tipc_rcv tipc_conn_recv_work
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
panic+0x39d/0x940 kernel/panic.c:184
kmsan_report+0x238/0x240 mm/kmsan/kmsan.c:1083
__msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline]
tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424
process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
kthread+0x539/0x720 kernel/kthread.c:239
ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..
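As a rough userspace illustration of why zeroing first silences this class of report: toy_subscr, toy_recv() and recv_subscr() below are hypothetical, and the sketch assumes the receive may deliver fewer bytes than sizeof(s), which is not confirmed here for the TIPC path. After the memset(), any field the receive did not write reads as 0 instead of stack garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct tipc_subscr; the fields are made up. */
struct toy_subscr {
	unsigned int type;
	unsigned int lower;
	unsigned int upper;
	unsigned int filter;
};

/* Stand-in for a socket receive that may deliver fewer bytes than the
 * buffer can hold. Returns the number of bytes written. */
static size_t toy_recv(void *buf, size_t buflen,
		       const void *wire, size_t wirelen)
{
	size_t n = wirelen < buflen ? wirelen : buflen;

	memcpy(buf, wire, n);
	return n;
}

/* Zero the whole structure before receiving into it, mirroring the
 * memset() added by the patch. */
static size_t recv_subscr(struct toy_subscr *s,
			  const void *wire, size_t wirelen)
{
	assert(s != NULL && wire != NULL);
	memset(s, 0, sizeof(*s));
	return toy_recv(s, sizeof(*s), wire, wirelen);
}
```

If only the first two fields arrive, upper and filter are 0 afterwards, so later code reads defined values. Checking the returned length against sizeof(s) would still be needed to reject a truncated message.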
Reported-by: syzbot+8951a3065ee7fd6d6e23@syzkaller.appspotmail.com
Signed-off-by: Ying Xue <ying.xue@windriver.com>
---
net/tipc/topsrv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c
index c8e34ef..fe47a62 100644
--- a/net/tipc/topsrv.c
+++ b/net/tipc/topsrv.c
@@ -397,6 +397,7 @@ static int tipc_conn_rcv_from_sock(struct tipc_conn *con)
struct kvec iov;
int ret;
+ memset(&s, 0, sizeof(s));
iov.iov_base = &s;
iov.iov_len = sizeof(s);
msg.msg_name = NULL;
--
2.7.4
^ permalink raw reply related
* [PATCH bpf-next 0/4] AF_XDP follow-up patches, cosmetics
From: Björn Töpel @ 2018-05-18 12:00 UTC (permalink / raw)
To: magnus.karlsson, magnus.karlsson, ast, daniel, netdev
Cc: Björn Töpel
From: Björn Töpel <bjorn.topel@intel.com>
This series contains "cosmetics only" follow-up patches for AF_XDP.
Thanks to Daniel for suggesting them!
Björn Töpel (4):
xsk: clean up SPDX headers
xsk: remove newline at end of file
xsk: fixed some cases of unnecessary parentheses
xsk: proper '=' alignment
include/net/xdp_sock.h | 13 ++-----------
include/uapi/linux/if_xdp.h | 13 ++-----------
kernel/bpf/xskmap.c | 9 ---------
net/xdp/Makefile | 1 -
net/xdp/xdp_umem.c | 13 ++-----------
net/xdp/xdp_umem.h | 13 ++-----------
net/xdp/xdp_umem_props.h | 13 ++-----------
net/xdp/xsk.c | 45 ++++++++++++++++++---------------------------
net/xdp/xsk_queue.c | 12 +-----------
net/xdp/xsk_queue.h | 17 ++++-------------
samples/bpf/xdpsock_user.c | 12 +-----------
11 files changed, 34 insertions(+), 127 deletions(-)
--
2.14.1
^ permalink raw reply
* [PATCH bpf-next 1/4] xsk: clean up SPDX headers
From: Björn Töpel @ 2018-05-18 12:00 UTC (permalink / raw)
To: magnus.karlsson, magnus.karlsson, ast, daniel, netdev
Cc: Björn Töpel
In-Reply-To: <20180518120024.8588-1-bjorn.topel@gmail.com>
From: Björn Töpel <bjorn.topel@intel.com>
Clean up the SPDX-License-Identifier lines and remove licensing leftovers.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
include/net/xdp_sock.h | 13 ++-----------
include/uapi/linux/if_xdp.h | 13 ++-----------
kernel/bpf/xskmap.c | 9 ---------
net/xdp/xdp_umem.c | 9 ---------
net/xdp/xdp_umem.h | 13 ++-----------
net/xdp/xdp_umem_props.h | 13 ++-----------
net/xdp/xsk.c | 9 ---------
net/xdp/xsk_queue.c | 9 ---------
net/xdp/xsk_queue.h | 13 ++-----------
samples/bpf/xdpsock_user.c | 12 +-----------
10 files changed, 11 insertions(+), 102 deletions(-)
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 185f4928fbda..7a647c56ec15 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -1,15 +1,6 @@
-/* SPDX-License-Identifier: GPL-2.0
- * AF_XDP internal functions
+/* SPDX-License-Identifier: GPL-2.0 */
+/* AF_XDP internal functions
* Copyright(c) 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
*/
#ifndef _LINUX_XDP_SOCK_H
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 77b88c4efe98..56db977221d2 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -1,17 +1,8 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
- *
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
* if_xdp: XDP socket user-space interface
* Copyright(c) 2018 Intel Corporation.
*
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
* Author(s): Björn Töpel <bjorn.topel@intel.com>
* Magnus Karlsson <magnus.karlsson@intel.com>
*/
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index cb3a12137404..b3c557476a8d 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -1,15 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/* XSKMAP used for AF_XDP sockets
* Copyright(c) 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
*/
#include <linux/bpf.h>
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 2b47a1dd7c6c..df4ea97c433b 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -1,15 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/* XDP user-space packet buffer
* Copyright(c) 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
*/
#include <linux/init.h>
diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
index 7e0b2fab8522..70fe225baa51 100644
--- a/net/xdp/xdp_umem.h
+++ b/net/xdp/xdp_umem.h
@@ -1,15 +1,6 @@
-/* SPDX-License-Identifier: GPL-2.0
- * XDP user-space packet buffer
+/* SPDX-License-Identifier: GPL-2.0 */
+/* XDP user-space packet buffer
* Copyright(c) 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
*/
#ifndef XDP_UMEM_H_
diff --git a/net/xdp/xdp_umem_props.h b/net/xdp/xdp_umem_props.h
index 77fb5daf29f3..2cf8ec485fd2 100644
--- a/net/xdp/xdp_umem_props.h
+++ b/net/xdp/xdp_umem_props.h
@@ -1,15 +1,6 @@
-/* SPDX-License-Identifier: GPL-2.0
- * XDP user-space packet buffer
+/* SPDX-License-Identifier: GPL-2.0 */
+/* XDP user-space packet buffer
* Copyright(c) 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
*/
#ifndef XDP_UMEM_PROPS_H_
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 009c5af5bba5..b8d1cb4d78c0 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -5,15 +5,6 @@
* applications.
* Copyright(c) 2018 Intel Corporation.
*
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
* Author(s): Björn Töpel <bjorn.topel@intel.com>
* Magnus Karlsson <magnus.karlsson@intel.com>
*/
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index d012e5e23591..9f605d22dad4 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -1,15 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/* XDP user-space ring structure
* Copyright(c) 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
*/
#include <linux/slab.h>
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 7aa9a535db0e..928d464e57b9 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -1,15 +1,6 @@
-/* SPDX-License-Identifier: GPL-2.0
- * XDP user-space ring structure
+/* SPDX-License-Identifier: GPL-2.0 */
+/* XDP user-space ring structure
* Copyright(c) 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
*/
#ifndef _LINUX_XSK_QUEUE_H
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 7fe60f6f7d53..60a882a2296c 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1,15 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Copyright(c) 2017 - 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- */
+/* Copyright(c) 2017 - 2018 Intel Corporation. */
#include <assert.h>
#include <errno.h>
--
2.14.1
^ permalink raw reply related
* [PATCH bpf-next 2/4] xsk: remove newline at end of file
From: Björn Töpel @ 2018-05-18 12:00 UTC (permalink / raw)
To: magnus.karlsson, magnus.karlsson, ast, daniel, netdev
Cc: Björn Töpel
In-Reply-To: <20180518120024.8588-1-bjorn.topel@gmail.com>
From: Björn Töpel <bjorn.topel@intel.com>
Minor cleanup, remove newline at end of Makefile.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/Makefile | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/xdp/Makefile b/net/xdp/Makefile
index 074fb2b2d51c..04f073146256 100644
--- a/net/xdp/Makefile
+++ b/net/xdp/Makefile
@@ -1,2 +1 @@
obj-$(CONFIG_XDP_SOCKETS) += xsk.o xdp_umem.o xsk_queue.o
-
--
2.14.1
^ permalink raw reply related
* [PATCH bpf-next 3/4] xsk: fixed some cases of unnecessary parentheses
From: Björn Töpel @ 2018-05-18 12:00 UTC (permalink / raw)
To: magnus.karlsson, magnus.karlsson, ast, daniel, netdev
Cc: Björn Töpel
In-Reply-To: <20180518120024.8588-1-bjorn.topel@gmail.com>
From: Björn Töpel <bjorn.topel@intel.com>
Removed some cases of unnecessary parentheses.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xdp_umem.c | 4 ++--
net/xdp/xsk_queue.c | 3 +--
net/xdp/xsk_queue.h | 4 ++--
3 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index df4ea97c433b..c47909c74899 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -20,7 +20,7 @@ int xdp_umem_create(struct xdp_umem **umem)
{
*umem = kzalloc(sizeof(**umem), GFP_KERNEL);
- if (!(*umem))
+ if (!*umem)
return -ENOMEM;
return 0;
@@ -247,5 +247,5 @@ int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
bool xdp_umem_validate_queues(struct xdp_umem *umem)
{
- return (umem->fq && umem->cq);
+ return umem->fq && umem->cq;
}
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index 9f605d22dad4..ebe85e59507e 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -22,8 +22,7 @@ static u32 xskq_umem_get_ring_size(struct xsk_queue *q)
static u32 xskq_rxtx_get_ring_size(struct xsk_queue *q)
{
- return (sizeof(struct xdp_ring) +
- q->nentries * sizeof(struct xdp_desc));
+ return sizeof(struct xdp_ring) + q->nentries * sizeof(struct xdp_desc);
}
struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 928d464e57b9..62e43be407d8 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -223,12 +223,12 @@ static inline void xskq_produce_flush_desc(struct xsk_queue *q)
static inline bool xskq_full_desc(struct xsk_queue *q)
{
- return (xskq_nb_avail(q, q->nentries) == q->nentries);
+ return xskq_nb_avail(q, q->nentries) == q->nentries;
}
static inline bool xskq_empty_desc(struct xsk_queue *q)
{
- return (xskq_nb_free(q, q->prod_tail, 1) == q->nentries);
+ return xskq_nb_free(q, q->prod_tail, 1) == q->nentries;
}
void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props);
--
2.14.1
^ permalink raw reply related
* [PATCH bpf-next 4/4] xsk: proper '=' alignment
From: Björn Töpel @ 2018-05-18 12:00 UTC (permalink / raw)
To: magnus.karlsson, magnus.karlsson, ast, daniel, netdev
Cc: Björn Töpel
In-Reply-To: <20180518120024.8588-1-bjorn.topel@gmail.com>
From: Björn Töpel <bjorn.topel@intel.com>
Properly align xsk_proto_ops initialization.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
net/xdp/xsk.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b8d1cb4d78c0..817340f7725d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -545,24 +545,24 @@ static struct proto xsk_proto = {
};
static const struct proto_ops xsk_proto_ops = {
- .family = PF_XDP,
- .owner = THIS_MODULE,
- .release = xsk_release,
- .bind = xsk_bind,
- .connect = sock_no_connect,
- .socketpair = sock_no_socketpair,
- .accept = sock_no_accept,
- .getname = sock_no_getname,
- .poll = xsk_poll,
- .ioctl = sock_no_ioctl,
- .listen = sock_no_listen,
- .shutdown = sock_no_shutdown,
- .setsockopt = xsk_setsockopt,
- .getsockopt = xsk_getsockopt,
- .sendmsg = xsk_sendmsg,
- .recvmsg = sock_no_recvmsg,
- .mmap = xsk_mmap,
- .sendpage = sock_no_sendpage,
+ .family = PF_XDP,
+ .owner = THIS_MODULE,
+ .release = xsk_release,
+ .bind = xsk_bind,
+ .connect = sock_no_connect,
+ .socketpair = sock_no_socketpair,
+ .accept = sock_no_accept,
+ .getname = sock_no_getname,
+ .poll = xsk_poll,
+ .ioctl = sock_no_ioctl,
+ .listen = sock_no_listen,
+ .shutdown = sock_no_shutdown,
+ .setsockopt = xsk_setsockopt,
+ .getsockopt = xsk_getsockopt,
+ .sendmsg = xsk_sendmsg,
+ .recvmsg = sock_no_recvmsg,
+ .mmap = xsk_mmap,
+ .sendpage = sock_no_sendpage,
};
static void xsk_destruct(struct sock *sk)
--
2.14.1
^ permalink raw reply related
* (unknown)
From: DaeRyong Jeong @ 2018-05-18 12:04 UTC (permalink / raw)
To: davem, kuznet, yoshfuji
Cc: netdev, linux-kernel, byoungyoung, kt0755, bammanag
Bcc:
Subject: WARNING in ip_recv_error
Reply-To:
We report the crash: WARNING in ip_recv_error
This crash has been found in v4.17-rc1 using RaceFuzzer (a modified
version of Syzkaller), which we describe more at the end of this
report. Our analysis shows that the race occurs when invoking two
syscalls concurrently, do_ipv6_setsockopt and inet_recvmsg.
Diagnosis:
We think the concurrent execution of do_ipv6_setsockopt() with optname
IPV6_ADDRFORM and inet_recvmsg() causes the crash. do_ipv6_setsockopt()
updates sk->sk_prot to &udp_prot and, later, sk->sk_family to PF_INET.
inet_recvmsg() can call sk->sk_prot->recvmsg() after sk->sk_prot has
been updated but before sk->sk_family has been updated by
do_ipv6_setsockopt(). This triggers the WARN_ON_ONCE() in ip_recv_error().
Thread interleaving:
CPU0 (do_ipv6_setsockopt)               CPU1 (inet_recvmsg)
=====                                   =====
struct proto *prot = &udp_prot;
...
sk->sk_prot = prot;
sk->sk_socket->ops = &inet_dgram_ops;
                                        err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
                                                                   flags & ~MSG_DONTWAIT, &addr_len);
                                        (in udp_recvmsg)
                                        if (flags & MSG_ERRQUEUE)
                                                return ip_recv_error(sk, msg, len, addr_len);
                                        (in ip_recv_error)
                                        WARN_ON_ONCE(sk->sk_family == AF_INET6);
sk->sk_family = PF_INET;
Call Sequence:
CPU0
=====
udpv6_setsockopt
ipv6_setsockopt
do_ipv6_setsockopt
CPU1
=====
sock_recvmsg
sock_recvmsg_nosec
inet_recvmsg
udp_recvmsg
==================================================================
WARNING: CPU: 1 PID: 32600 at /home/daeryong/workspace/new-race-fuzzer/kernels_repo/kernel_v4.17-rc1/net/ipv4/ip_sockglue.c:508 ip_recv_error+0x6f2/0x720 net/ipv4/ip_sockglue.c:508
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 32600 Comm: syz-executor0 Not tainted 4.17.0-rc1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x166/0x21c lib/dump_stack.c:113
panic+0x1a0/0x3a7 kernel/panic.c:184
__warn+0x191/0x1a0 kernel/panic.c:536
report_bug+0x132/0x1b0 lib/bug.c:186
fixup_bug.part.11+0x28/0x50 arch/x86/kernel/traps.c:178
fixup_bug arch/x86/kernel/traps.c:247 [inline]
do_error_trap+0x28b/0x2d0 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:ip_recv_error+0x6f2/0x720 net/ipv4/ip_sockglue.c:508
RSP: 0018:ffff8801dadff630 EFLAGS: 00010212
RAX: 0000000000040000 RBX: 0000000000002002 RCX: ffffffff8327de12
RDX: 000000000000008a RSI: ffffc90001a0c000 RDI: ffff8801be615010
RBP: ffff8801dadff720 R08: 0000000000002002 R09: ffff8801dadff918
R10: ffff8801dadff738 R11: ffff8801dadffaff R12: ffff8801be615000
R13: ffff8801dadffd50 R14: 1ffff1003b5bfece R15: ffff8801dadffb90
udp_recvmsg+0x834/0xa10 net/ipv4/udp.c:1571
inet_recvmsg+0x121/0x420 net/ipv4/af_inet.c:830
sock_recvmsg_nosec net/socket.c:802 [inline]
sock_recvmsg+0x7f/0xa0 net/socket.c:809
___sys_recvmsg+0x1f0/0x430 net/socket.c:2279
__sys_recvmsg+0xfc/0x1c0 net/socket.c:2328
__do_sys_recvmsg net/socket.c:2338 [inline]
__se_sys_recvmsg net/socket.c:2335 [inline]
__x64_sys_recvmsg+0x48/0x50 net/socket.c:2335
do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4563f9
RSP: 002b:00007f24f6927b28 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 000000000072bfa0 RCX: 00000000004563f9
RDX: 0000000000002002 RSI: 0000000020000240 RDI: 0000000000000016
RBP: 00000000000004e4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f24f69286d4
R13: 00000000ffffffff R14: 00000000006fc600 R15: 0000000000000000
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..
==================================================================
= About RaceFuzzer
RaceFuzzer is a customized version of Syzkaller, specifically tailored
to find race condition bugs in the Linux kernel. While we leverage
many different technique, the notable feature of RaceFuzzer is in
leveraging a custom hypervisor (QEMU/KVM) to interleave the
scheduling. In particular, we modified the hypervisor to intentionally
stall a per-core execution, which is similar to supporting per-core
breakpoint functionality. This allows RaceFuzzer to force the kernel
to deterministically trigger racy condition (which may rarely happen
in practice due to randomness in scheduling).
RaceFuzzer's C repro always pinpoints two racy syscalls. Since C
repro's scheduling synchronization should be performed at the user
space, its reproducibility is limited (reproduction may take from 1
second to 10 minutes (or even more), depending on a bug). This is
because, while RaceFuzzer precisely interleaves the scheduling at the
kernel's instruction level when finding this bug, C repro cannot fully
utilize such a feature. Please disregard all code related to
"should_hypercall" in the C repro, as this is only for our debugging
purposes using our own hypervisor.
^ permalink raw reply
* WARNING in ip_recv_error
From: DaeRyong Jeong @ 2018-05-18 12:08 UTC (permalink / raw)
To: davem, kuznet, yoshfuji
Cc: netdev, linux-kernel, byoungyoung, kt0755, bammanag
We report the crash: WARNING in ip_recv_error
(I resend the email since I mistakenly missed the subject in my previous
email. I'm sorry.)
This crash has been found in v4.17-rc1 using RaceFuzzer (a modified
version of Syzkaller), which we describe more at the end of this
report. Our analysis shows that the race occurs when invoking two
syscalls concurrently, setsockopt$inet6_IPV6_ADDRFORM and recvmsg.
Diagnosis:
We think the concurrent execution of do_ipv6_setsockopt() with optname
IPV6_ADDRFORM and inet_recvmsg() causes the crash. do_ipv6_setsockopt()
updates sk->sk_prot to &udp_prot and, later, sk->sk_family to PF_INET.
inet_recvmsg() can call sk->sk_prot->recvmsg() after sk->sk_prot has
been updated but before sk->sk_family has been updated by
do_ipv6_setsockopt(). This triggers the WARN_ON_ONCE() in ip_recv_error().
Thread interleaving:
CPU0 (do_ipv6_setsockopt)               CPU1 (inet_recvmsg)
=====                                   =====
struct proto *prot = &udp_prot;
...
sk->sk_prot = prot;
sk->sk_socket->ops = &inet_dgram_ops;
                                        err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
                                                                   flags & ~MSG_DONTWAIT, &addr_len);
                                        (in udp_recvmsg)
                                        if (flags & MSG_ERRQUEUE)
                                                return ip_recv_error(sk, msg, len, addr_len);
                                        (in ip_recv_error)
                                        WARN_ON_ONCE(sk->sk_family == AF_INET6);
sk->sk_family = PF_INET;
Call Sequence:
CPU0
=====
udpv6_setsockopt
ipv6_setsockopt
do_ipv6_setsockopt
CPU1
=====
sock_recvmsg
sock_recvmsg_nosec
inet_recvmsg
udp_recvmsg
==================================================================
WARNING: CPU: 1 PID: 32600 at /home/daeryong/workspace/new-race-fuzzer/kernels_repo/kernel_v4.17-rc1/net/ipv4/ip_sockglue.c:508 ip_recv_error+0x6f2/0x720 net/ipv4/ip_sockglue.c:508
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 32600 Comm: syz-executor0 Not tainted 4.17.0-rc1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x166/0x21c lib/dump_stack.c:113
panic+0x1a0/0x3a7 kernel/panic.c:184
__warn+0x191/0x1a0 kernel/panic.c:536
report_bug+0x132/0x1b0 lib/bug.c:186
fixup_bug.part.11+0x28/0x50 arch/x86/kernel/traps.c:178
fixup_bug arch/x86/kernel/traps.c:247 [inline]
do_error_trap+0x28b/0x2d0 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:ip_recv_error+0x6f2/0x720 net/ipv4/ip_sockglue.c:508
RSP: 0018:ffff8801dadff630 EFLAGS: 00010212
RAX: 0000000000040000 RBX: 0000000000002002 RCX: ffffffff8327de12
RDX: 000000000000008a RSI: ffffc90001a0c000 RDI: ffff8801be615010
RBP: ffff8801dadff720 R08: 0000000000002002 R09: ffff8801dadff918
R10: ffff8801dadff738 R11: ffff8801dadffaff R12: ffff8801be615000
R13: ffff8801dadffd50 R14: 1ffff1003b5bfece R15: ffff8801dadffb90
udp_recvmsg+0x834/0xa10 net/ipv4/udp.c:1571
inet_recvmsg+0x121/0x420 net/ipv4/af_inet.c:830
sock_recvmsg_nosec net/socket.c:802 [inline]
sock_recvmsg+0x7f/0xa0 net/socket.c:809
___sys_recvmsg+0x1f0/0x430 net/socket.c:2279
__sys_recvmsg+0xfc/0x1c0 net/socket.c:2328
__do_sys_recvmsg net/socket.c:2338 [inline]
__se_sys_recvmsg net/socket.c:2335 [inline]
__x64_sys_recvmsg+0x48/0x50 net/socket.c:2335
do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4563f9
RSP: 002b:00007f24f6927b28 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 000000000072bfa0 RCX: 00000000004563f9
RDX: 0000000000002002 RSI: 0000000020000240 RDI: 0000000000000016
RBP: 00000000000004e4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f24f69286d4
R13: 00000000ffffffff R14: 00000000006fc600 R15: 0000000000000000
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..
==================================================================
= About RaceFuzzer
RaceFuzzer is a customized version of Syzkaller, specifically tailored
to find race condition bugs in the Linux kernel. While we leverage
many different technique, the notable feature of RaceFuzzer is in
leveraging a custom hypervisor (QEMU/KVM) to interleave the
scheduling. In particular, we modified the hypervisor to intentionally
stall a per-core execution, which is similar to supporting per-core
breakpoint functionality. This allows RaceFuzzer to force the kernel
to deterministically trigger racy condition (which may rarely happen
in practice due to randomness in scheduling).
RaceFuzzer's C repro always pinpoints two racy syscalls. Since C
repro's scheduling synchronization should be performed at the user
space, its reproducibility is limited (reproduction may take from 1
second to 10 minutes (or even more), depending on a bug). This is
because, while RaceFuzzer precisely interleaves the scheduling at the
kernel's instruction level when finding this bug, C repro cannot fully
utilize such a feature. Please disregard all code related to
"should_hypercall" in the C repro, as this is only for our debugging
purposes using our own hypervisor.
^ permalink raw reply
* Re: [PATCH net-next] tipc: eliminate complaint of KMSAN uninit-value in tipc_conn_rcv_sub
From: Dmitry Vyukov @ 2018-05-18 12:10 UTC (permalink / raw)
To: Ying Xue; +Cc: netdev, David Miller, Jon Maloy, syzkaller-bugs, tipc-discussion
In-Reply-To: <1526644255-9182-1-git-send-email-ying.xue@windriver.com>
On Fri, May 18, 2018 at 1:50 PM, Ying Xue <ying.xue@windriver.com> wrote:
> As variable s of struct tipc_subscr type is not initialized
> in tipc_conn_rcv_from_sock() before it is used in tipc_conn_rcv_sub(),
> KMSAN reported the following uninit-value type complaint:
>
> ==================================================================
> BUG: KMSAN: uninit-value in tipc_conn_rcv_sub+0x184/0x950
> net/tipc/topsrv.c:373
> CPU: 0 PID: 66 Comm: kworker/u4:4 Not tainted 4.17.0-rc3+ #88
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: tipc_rcv tipc_conn_recv_work
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x185/0x1d0 lib/dump_stack.c:113
> kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
> __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
> tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
> tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline]
> tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424
> process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
> worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
> kthread+0x539/0x720 kernel/kthread.c:239
> ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
>
> Local variable description: ----s.i@tipc_conn_recv_work
> Variable was created at:
> tipc_conn_recv_work+0x65/0x560 net/tipc/topsrv.c:419
> process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
> ==================================================================
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 0 PID: 66 Comm: kworker/u4:4 Tainted: G B 4.17.0-rc3+
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: tipc_rcv tipc_conn_recv_work
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x185/0x1d0 lib/dump_stack.c:113
> panic+0x39d/0x940 kernel/panic.c:184
> kmsan_report+0x238/0x240 mm/kmsan/kmsan.c:1083
> __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
> tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373
> tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline]
> tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424
> process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
> worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
> kthread+0x539/0x720 kernel/kthread.c:239
> ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
> Reported-by: syzbot+8951a3065ee7fd6d6e23@syzkaller.appspotmail.com
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> ---
> net/tipc/topsrv.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c
> index c8e34ef..fe47a62 100644
> --- a/net/tipc/topsrv.c
> +++ b/net/tipc/topsrv.c
> @@ -397,6 +397,7 @@ static int tipc_conn_rcv_from_sock(struct tipc_conn *con)
> struct kvec iov;
> int ret;
>
> + memset(&s, 0, sizeof(s));
> iov.iov_base = &s;
> iov.iov_len = sizeof(s);
> msg.msg_name = NULL;
Isn't the kernel bug here that sock_recvmsg does a short read? It
seems that sock_recvmsg should initialize all of s.
^ permalink raw reply
* Re: [PATCH 1/4] arcnet: com20020: Add com20020 io mapped version
From: Andrea Greco @ 2018-05-18 12:18 UTC (permalink / raw)
To: David Miller
Cc: Tobin C. Harding, Andrea Greco, Michael Grzeschik, linux-kernel,
netdev
In-Reply-To: <20180517.163113.2110198960037727630.davem@davemloft.net>
On 05/17/2018 10:31 PM, David Miller wrote:
> From: Andrea Greco <andrea.greco.gapmilano@gmail.com>
> Date: Thu, 17 May 2018 15:05:29 +0200
>
>> + /* Will be set by userspace during if setup */
>> + dev->dev_addr[0] = 0;
>
> Hmmm... really?
>
> Also, every error path from this point forward will leak 'dev'.
>
In com20020.c I found this:
/* FIXME: do this some other way! */
if (!dev->dev_addr[0])
dev->dev_addr[0] = arcnet_inb(ioaddr, 8);
The NODE-ID must be unique across the whole ARCnet network.
My previous idea was to take a random value, but this could create a
collision on the network.
A possible solution is:
In case of a collision, the com20020 sets a bit in the status register.
Then pick a new NODE-ID and repeat until a free NODE-ID is found.
Another idea is to pass it via DTS.
But suppose there are two identical products on the same network: same address, same problem.
For this reason I prefer to keep the standard driver behavior.
Any other ideas for solving this?
The other question, discussed with Tobin in the RFC patch, is:
Right now a devm_ioremap is done by this driver.
The other versions of this driver (PCI, PCMCIA, ISA) do not remap memory.
The other implementations use inb/outb for read/write operations.
I did an ugly #ifndef that redefines arcnet_inb when
CONFIG_ARCNET_COM20020_IO is defined.
My proposal was:
Add callbacks for arcnet_inb, arcnet_outb and friends to the hw struct in
arcdevice.h, so that every driver sets the callbacks to the functions it requires.
Regards, Andrea
^ permalink raw reply
* Re: [PATCH 00/14] Modify action API for implementing lockless actions
From: Jamal Hadi Salim @ 2018-05-18 12:33 UTC (permalink / raw)
To: Vlad Buslov, Jiri Pirko
Cc: Roman Mashak, Linux Kernel Network Developers, David Miller,
Cong Wang, pablo, kadlec, fw, ast, Daniel Borkmann, Eric Dumazet,
kliteyn, Lucas Bates
In-Reply-To: <vbf8t8icsmx.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>
On 17/05/18 09:35 AM, Vlad Buslov wrote:
>
> On Wed 16 May 2018 at 21:51, Jiri Pirko <jiri@resnulli.us> wrote:
>> Wed, May 16, 2018 at 11:23:41PM CEST, vladbu@mellanox.com wrote:
>>>
>>>> Please make sure you have these in your kernel config:
>>>>
>>>> CONFIG_NET_ACT_IFE=y
>>>> CONFIG_NET_IFE_SKBMARK=m
>>>> CONFIG_NET_IFE_SKBPRIO=m
>>>> CONFIG_NET_IFE_SKBTCINDEX=m
>>
>> Roman, could you please add this to some file? Something similar to:
>> tools/testing/selftests/net/forwarding/config
>>
How would putting the file there help?
>> Thanks!
>>
>>>>
>>>> For tdc to run all the tests, it is assumed that all the supported tc
>>>> actions/filters are enabled and compiled.
>>>
>>> Enabling these options allowed all ife tests to pass. Thanks!
>>>
>>> Error in u32 test still appears however:
>>>
>>> Test e9a3: Add u32 with source match
>>>
>>> -----> prepare stage *** Could not execute: "$TC qdisc add dev $DEV1 ingress"
>>>
>>> -----> prepare stage *** Error message: "Cannot find device "v0p1"
>
> I investigated and was able to fix u32 problems.
>
> First of all, u32 test requires having veth interfaces that are not
> created by test infrastructure by default. Following command fixes the
> issue:
>
> sudo ip link add v0p0 type veth peer name v0p1
>
That is documented in the README, I believe. However, we should
be able to detect that a test needs the device and create it via
a plugin. Lucas?
> After executing this command test passes, however looking at test
> definition itself it seems meaningless. It creates filter with match
> source IP 127.0.0.1, then tests if filter with source IP 127.0.0.2
> exists, but passes successfully because it actually expects to match
> zero filters with such IP :)
>
> I fixed it and it passed properly matching single filter with source IP
> 127.0.0.2.
>
Please send a patch.
> After this flower test failed. The flower test expects that user
> explicitly provide "-d" option with interface to use. With -d it failed
> again. This time because it expects action to have 1m references, but
> actual value was 1000001. I investigated it and found out that test
> passed, if executed without running other tests first. So it seemed that
> some other test was leaking reference to gact action. It turned out that
> culprit was mirred test 6fb4, which created pipe action but didn't flush
> it afterward.
>
Hopefully the last patch from Roman fixes that? Otherwise send something
on top.
> With all tests passing on that particular version of net-next, I will
> now rebase my changes on top of it and run them again.
>
Thank you Vlad!
cheers,
jamal
^ permalink raw reply
* [PATCH] net: mvpp2: typo and cosmetic fixes
From: Antoine Tenart @ 2018-05-18 12:34 UTC (permalink / raw)
To: davem
Cc: Antoine Tenart, netdev, linux-kernel, thomas.petazzoni,
maxime.chevallier, gregory.clement, miquel.raynal, nadavh,
stefanc, ymarkman, mw
This patch on the Marvell PPv2 driver is only cosmetic. Two typos are
removed as well as other cosmetic fixes, such as extra new lines or tabs
vs spaces.
Suggested-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
---
drivers/net/ethernet/marvell/mvpp2.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index f8ed983bc767..7f54bb0334f1 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -1757,7 +1757,6 @@ static void mvpp2_prs_tcam_ai_update(struct mvpp2_prs_entry *pe,
int i, ai_idx = MVPP2_PRS_TCAM_AI_BYTE;
for (i = 0; i < MVPP2_PRS_AI_BITS; i++) {
-
if (!(enable & BIT(i)))
continue;
@@ -1841,7 +1840,6 @@ static void mvpp2_prs_sram_ai_update(struct mvpp2_prs_entry *pe,
int ai_off = MVPP2_PRS_SRAM_AI_OFFS;
for (i = 0; i < MVPP2_PRS_SRAM_AI_CTRL_BITS; i++) {
-
if (!(mask & BIT(i)))
continue;
@@ -4937,7 +4935,7 @@ static void mvpp22_gop_mask_irq(struct mvpp2_port *port)
if (port->gop_id == 0) {
val = readl(port->base + MVPP22_XLG_EXT_INT_MASK);
val &= ~(MVPP22_XLG_EXT_INT_MASK_XLG |
- MVPP22_XLG_EXT_INT_MASK_GIG);
+ MVPP22_XLG_EXT_INT_MASK_GIG);
writel(val, port->base + MVPP22_XLG_EXT_INT_MASK);
}
@@ -5471,7 +5469,6 @@ static void mvpp2_aggr_txq_pend_desc_add(struct mvpp2_port *port, int pending)
MVPP2_AGGR_TXQ_UPDATE_REG, pending);
}
-
/* Check if there are enough free descriptors in aggregated txq.
* If not, update the number of occupied descriptors and repeat the check.
*
@@ -5551,7 +5548,7 @@ static int mvpp2_txq_reserved_desc_num_proc(struct mvpp2 *priv,
txq_pcpu->reserved_num += mvpp2_txq_alloc_reserved_desc(priv, txq, req);
- /* OK, the descriptor cound has been updated: check again. */
+ /* OK, the descriptor could have been updated: check again. */
if (txq_pcpu->reserved_num < num)
return -ENOMEM;
return 0;
@@ -6033,7 +6030,7 @@ static int mvpp2_txq_init(struct mvpp2_port *port,
/* Calculate base address in prefetch buffer. We reserve 16 descriptors
* for each existing TXQ.
* TCONTS for PON port must be continuous from 0 to MVPP2_MAX_TCONT
- * GBE ports assumed to be continious from 0 to MVPP2_MAX_PORTS
+ * GBE ports assumed to be continuous from 0 to MVPP2_MAX_PORTS
*/
desc_per_txq = 16;
desc = (port->id * MVPP2_MAX_TXQ * desc_per_txq) +
@@ -6603,8 +6600,7 @@ static int mvpp2_tx_frag_process(struct mvpp2_port *port, struct sk_buff *skb,
mvpp2_txdesc_size_set(port, tx_desc, frag->size);
buf_dma_addr = dma_map_single(port->dev->dev.parent, addr,
- frag->size,
- DMA_TO_DEVICE);
+ frag->size, DMA_TO_DEVICE);
if (dma_mapping_error(port->dev->dev.parent, buf_dma_addr)) {
mvpp2_txq_desc_put(txq);
goto cleanup;
--
2.17.0
^ permalink raw reply related
* Re: pull-request Cavium liquidio firmware v1.7.2
From: Josh Boyer @ 2018-05-18 12:36 UTC (permalink / raw)
To: Felix Manlunas
Cc: Linux Firmware, netdev, raghu.vatsavayi, derek.chickles,
satananda.burla
In-Reply-To: <20180509022157.GA1404@felix-thinkpad.cavium.com>
On Wed, May 9, 2018 at 12:54 AM Felix Manlunas <felix.manlunas@cavium.com> wrote:
> The following changes since commit 8fc2d4e55685bf73b6f7752383da9067404a74bb:
> Merge git://git.marvell.com/mwifiex-firmware (2018-05-07 09:09:40 -0400)
> are available in the git repository at:
> https://github.com/felix-cavium/linux-firmware.git for-upstreaming-v1.7.2
> for you to fetch changes up to d3b6941e1a85cbff895a92aa9e36b50deaeac970:
> linux-firmware: liquidio: update firmware to v1.7.2 (2018-05-08 19:02:41 -0700)
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
> ----------------------------------------------------------------
> Felix Manlunas (1):
> linux-firmware: liquidio: update firmware to v1.7.2
> WHENCE | 8 ++++----
> liquidio/lio_210nv_nic.bin | Bin 1265368 -> 1281464 bytes
> liquidio/lio_210sv_nic.bin | Bin 1163128 -> 1179352 bytes
> liquidio/lio_23xx_nic.bin | Bin 1271456 -> 1287264 bytes
> liquidio/lio_410nv_nic.bin | Bin 1265368 -> 1281464 bytes
> 5 files changed, 4 insertions(+), 4 deletions(-)
Pulled and pushed out. Thanks.
josh
^ permalink raw reply
* [PATCH bpf v2 0/6] bpf: enhancements for multi-function programs
From: Sandipan Das @ 2018-05-18 12:50 UTC (permalink / raw)
To: ast, daniel; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
This patch series introduces the following:
[1] Support for bpf-to-bpf function calls in the powerpc64 JIT compiler.
[2] Provide a way for resolving function calls because of the way JITed
images are allocated in powerpc64.
[3] Fix to get JITed instruction dumps for multi-function programs from
the bpf system call.
v2:
- Incorporate review comments from Jakub
Sandipan Das (6):
bpf: support 64-bit offsets for bpf function calls
bpf: powerpc64: add JIT support for multi-function programs
bpf: get kernel symbol addresses via syscall
tools: bpf: sync bpf uapi header
tools: bpftool: resolve calls without using imm field
bpf: fix JITed dump for multi-function programs via syscall
arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++++-----
include/uapi/linux/bpf.h | 2 +
kernel/bpf/syscall.c | 56 ++++++++++++++++++++++++---
kernel/bpf/verifier.c | 22 +++++++----
tools/bpf/bpftool/prog.c | 29 ++++++++++++++
tools/bpf/bpftool/xlated_dumper.c | 10 ++++-
tools/bpf/bpftool/xlated_dumper.h | 2 +
tools/include/uapi/linux/bpf.h | 2 +
8 files changed, 179 insertions(+), 23 deletions(-)
--
2.14.3
^ permalink raw reply
* [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
From: Sandipan Das @ 2018-05-18 12:50 UTC (permalink / raw)
To: ast, daniel; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-1-sandipan@linux.vnet.ibm.com>
The imm field of a bpf instruction is a signed 32-bit integer.
For JIT bpf-to-bpf function calls, it stores the offset of the
start address of the callee's JITed image from __bpf_call_base.
For some architectures, such as powerpc64, this offset may be
as large as 64 bits and cannot be accommodated in the imm field
without truncation.
We resolve this by:
[1] Additionally using the auxiliary data of each function to
keep a list of start addresses of the JITed images for all
functions determined by the verifier.
[2] Retaining the subprog id inside the off field of the call
instructions and using it to index into the list mentioned
above and look up the callee's address.
To make sure that the existing JIT compilers continue to work
without requiring changes, we keep the imm field as it is.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
kernel/bpf/verifier.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a9e4b1372da6..6c56cce9c4e3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
insn->src_reg != BPF_PSEUDO_CALL)
continue;
subprog = insn->off;
- insn->off = 0;
insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
func[subprog]->bpf_func -
__bpf_call_base;
}
+
+ /* we use the aux data to keep a list of the start addresses
+ * of the JITed images for each function in the program
+ *
+ * for some architectures, such as powerpc64, the imm field
+ * might not be large enough to hold the offset of the start
+ * address of the callee's JITed image from __bpf_call_base
+ *
+ * in such cases, we can lookup the start address of a callee
+ * by using its subprog id, available from the off field of
+ * the call instruction, as an index for this list
+ */
+ func[i]->aux->func = func;
+ func[i]->aux->func_cnt = env->subprog_cnt + 1;
}
for (i = 0; i < env->subprog_cnt; i++) {
old_bpf_func = func[i]->bpf_func;
--
2.14.3
^ permalink raw reply related
* [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
From: Sandipan Das @ 2018-05-18 12:50 UTC (permalink / raw)
To: ast, daniel; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-1-sandipan@linux.vnet.ibm.com>
This adds support for bpf-to-bpf function calls in the powerpc64
JIT compiler. The JIT compiler converts the bpf call instructions
to native branch instructions. After a round of the usual passes,
the start addresses of the JITed images for the callee functions
are known. Finally, to fix up the branch target addresses, we need
to perform an extra pass.
Because of the address range in which JITed images are allocated
on powerpc64, the offsets of the start addresses of these images
from __bpf_call_base are as large as 64 bits. So, for a function
call, we cannot use the imm field of the instruction to determine
the callee's address. Instead, we use the alternative method of
getting it from the list of function addresses in the auxiliary
data of the caller by using the off field as an index.
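The target selection described above can be sketched in plain C as
follows. This is a simplified userspace model, not the JIT itself:
struct sketch_call and sketch_resolve_call_target() are hypothetical
names, and the address list stands in for the function list kept in
the caller's auxiliary data.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fields the JIT inspects for a call. */
struct sketch_call {
	int is_pseudo_call;	/* non-zero for a bpf-to-bpf call */
	int16_t off;		/* subprog id for bpf-to-bpf calls */
	int32_t imm;		/* offset from the call base for helpers */
};

/* Pick the branch target for a call instruction.
 *
 * On the extra pass, a bpf-to-bpf call is resolved through the
 * address list using off as the index; a missing list or an
 * out-of-range index yields -EINVAL. Every other case falls back to
 * base + imm, as for a kernel helper call.
 */
static int sketch_resolve_call_target(const struct sketch_call *c,
				      int extra_pass,
				      const uint64_t *func_addrs,
				      size_t func_cnt,
				      uint64_t call_base,
				      uint64_t *target)
{
	if (c->is_pseudo_call && extra_pass) {
		if (!func_addrs || c->off < 0 ||
		    (size_t)c->off >= func_cnt)
			return -EINVAL;
		*target = func_addrs[c->off];
		return 0;
	}

	/* sign-extend imm before adding it to the 64-bit base */
	*target = call_base + (uint64_t)(int64_t)c->imm;
	return 0;
}
```

Before the extra pass the callee addresses are not yet known, so the
sketch, like the description above, only consults the list once
extra_pass is set.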
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++++-----
1 file changed, 69 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 1bdb1aff0619..25939892d8f7 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
/* Assemble the body code between the prologue & epilogue */
static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
struct codegen_context *ctx,
- u32 *addrs)
+ u32 *addrs, bool extra_pass)
{
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
@@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
break;
/*
- * Call kernel helper
+ * Call kernel helper or bpf function
*/
case BPF_JMP | BPF_CALL:
ctx->seen |= SEEN_FUNC;
- func = (u8 *) __bpf_call_base + imm;
+
+ /* bpf function call */
+ if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
+ if (fp->aux->func && off < fp->aux->func_cnt)
+ /* use the subprog id from the off
+ * field to lookup the callee address
+ */
+ func = (u8 *) fp->aux->func[off]->bpf_func;
+ else
+ return -EINVAL;
+ /* kernel helper call */
+ else
+ func = (u8 *) __bpf_call_base + imm;
bpf_jit_emit_func_call(image, ctx, (u64)func);
@@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
return 0;
}
+struct powerpc64_jit_data {
+ struct bpf_binary_header *header;
+ u32 *addrs;
+ u8 *image;
+ u32 proglen;
+ struct codegen_context ctx;
+};
+
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
{
u32 proglen;
@@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
+ struct powerpc64_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
@@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
+ bool extra_pass = false;
if (!fp->jit_requested)
return org_fp;
@@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
fp = tmp_fp;
}
+ jit_data = fp->aux->jit_data;
+ if (!jit_data) {
+ jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+ if (!jit_data) {
+ fp = org_fp;
+ goto out;
+ }
+ fp->aux->jit_data = jit_data;
+ }
+
flen = fp->len;
+ addrs = jit_data->addrs;
+ if (addrs) {
+ cgctx = jit_data->ctx;
+ image = jit_data->image;
+ bpf_hdr = jit_data->header;
+ proglen = jit_data->proglen;
+ alloclen = proglen + FUNCTION_DESCR_SIZE;
+ extra_pass = true;
+ goto skip_init_ctx;
+ }
+
addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL) {
fp = org_fp;
@@ -904,10 +947,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
/* Scouting faux-generate pass 0 */
- if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+ if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
/* We hit something illegal or unsupported. */
fp = org_fp;
- goto out;
+ goto out_addrs;
}
/*
@@ -925,9 +968,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
bpf_jit_fill_ill_insns);
if (!bpf_hdr) {
fp = org_fp;
- goto out;
+ goto out_addrs;
}
+skip_init_ctx:
code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
/* Code generation passes 1-2 */
@@ -935,7 +979,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
bpf_jit_build_prologue(code_base, &cgctx);
- bpf_jit_build_body(fp, code_base, &cgctx, addrs);
+ bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
bpf_jit_build_epilogue(code_base, &cgctx);
if (bpf_jit_enable > 1)
@@ -956,15 +1000,30 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
((u64 *)image)[1] = local_paca->kernel_toc;
#endif
+ bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
+
+ if (!fp->is_func || extra_pass) {
+ bpf_jit_binary_lock_ro(bpf_hdr);
+ } else {
+ jit_data->addrs = addrs;
+ jit_data->ctx = cgctx;
+ jit_data->proglen = proglen;
+ jit_data->image = image;
+ jit_data->header = bpf_hdr;
+ }
+
fp->bpf_func = (void *)image;
fp->jited = 1;
fp->jited_len = alloclen;
- bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
+ if (!fp->is_func || extra_pass) {
+out_addrs:
+ kfree(addrs);
+ kfree(jit_data);
+ fp->aux->jit_data = NULL;
+ }
out:
- kfree(addrs);
-
if (bpf_blinded)
bpf_jit_prog_release_other(fp, fp == org_fp ? tmp_fp : org_fp);
--
2.14.3
^ permalink raw reply related
* [PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall
From: Sandipan Das @ 2018-05-18 12:50 UTC (permalink / raw)
To: ast, daniel; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-1-sandipan@linux.vnet.ibm.com>
This adds two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of kernel symbol addresses for all functions in a
given program to userspace using the bpf system call
with the BPF_OBJ_GET_INFO_BY_FD command.
When bpf_jit_kallsyms is enabled, we can get the address
of the corresponding kernel symbol for a callee function
and resolve the symbol's name. The address is determined
by adding the value of the call instruction's imm field
to __bpf_call_base. This offset gets assigned to the imm
field by the verifier.
For some architectures, such as powerpc64, the imm field
is not large enough to hold this offset.
We resolve this by:
[1] Assigning the subprog id to the imm field of a call
instruction in the verifier instead of the offset of
the callee's symbol's address from __bpf_call_base.
[2] Determining the address of a callee's corresponding
symbol by using the imm field as an index for the
list of kernel symbol addresses now available from
the program info.
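A minimal userspace model of the two steps above, assuming a 4 KiB
page size purely for illustration (the real page size depends on the
architecture and configuration). The function names are hypothetical
and no bpf system call is made; the arrays stand in for the address
list exported through the program info.

```c
#include <stdint.h>

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Copy at most out_len page-aligned function addresses into out and
 * return the total number of functions, so that a caller can detect
 * a short buffer and retry with a larger one.
 */
static uint32_t sketch_fill_ksyms(uint64_t *out, uint32_t out_len,
				  const uint64_t *func_addrs,
				  uint32_t func_cnt)
{
	uint32_t n = out_len < func_cnt ? out_len : func_cnt;
	uint32_t i;

	for (i = 0; i < n; i++)
		out[i] = func_addrs[i] & SKETCH_PAGE_MASK;
	return func_cnt;
}

/* Resolve the symbol address for a call instruction whose imm field
 * holds a subprog id. Returns 0 if the id is outside the list.
 */
static uint64_t sketch_ksym_for_call(const uint64_t *ksyms, uint32_t nr,
				     int32_t imm)
{
	if (imm < 0 || (uint32_t)imm >= nr)
		return 0;
	return ksyms[imm];
}
```

A tool that dumps JITed programs could then pass the returned address
to whatever symbol lookup it already uses to print the callee's name.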
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
include/uapi/linux/bpf.h | 2 ++
kernel/bpf/syscall.c | 20 ++++++++++++++++++++
kernel/bpf/verifier.c | 7 +------
3 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d94d333a8225..040c9cac7303 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2188,6 +2188,8 @@ struct bpf_prog_info {
__u32 xlated_prog_len;
__aligned_u64 jited_prog_insns;
__aligned_u64 xlated_prog_insns;
+ __aligned_u64 jited_ksyms;
+ __u32 nr_jited_ksyms;
__u64 load_time; /* ns since boottime */
__u32 created_by_uid;
__u32 nr_map_ids;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bfcde949c7f8..54a72fafe57c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
if (!capable(CAP_SYS_ADMIN)) {
info.jited_prog_len = 0;
info.xlated_prog_len = 0;
+ info.nr_jited_ksyms = 0;
goto done;
}
@@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
}
}
+ ulen = info.nr_jited_ksyms;
+ info.nr_jited_ksyms = prog->aux->func_cnt;
+ if (info.nr_jited_ksyms && ulen) {
+ u64 __user *user_jited_ksyms = u64_to_user_ptr(info.jited_ksyms);
+ ulong ksym_addr;
+ u32 i;
+
+ /* copy the address of the kernel symbol corresponding to
+ * each function
+ */
+ ulen = min_t(u32, info.nr_jited_ksyms, ulen);
+ for (i = 0; i < ulen; i++) {
+ ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
+ ksym_addr &= PAGE_MASK;
+ if (put_user((u64) ksym_addr, &user_jited_ksyms[i]))
+ return -EFAULT;
+ }
+ }
+
done:
if (copy_to_user(uinfo, &info, info_len) ||
put_user(info_len, &uattr->info.info_len))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6c56cce9c4e3..e826c396aba2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env)
* later look the same as if they were interpreted only.
*/
for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
- unsigned long addr;
-
if (insn->code != (BPF_JMP | BPF_CALL) ||
insn->src_reg != BPF_PSEUDO_CALL)
continue;
insn->off = env->insn_aux_data[i].call_imm;
subprog = find_subprog(env, i + insn->off + 1);
- addr = (unsigned long)func[subprog]->bpf_func;
- addr &= PAGE_MASK;
- insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
- addr - __bpf_call_base;
+ insn->imm = subprog;
}
prog->jited = 1;
--
2.14.3
^ permalink raw reply related
* [PATCH bpf v2 4/6] tools: bpf: sync bpf uapi header
From: Sandipan Das @ 2018-05-18 12:50 UTC (permalink / raw)
To: ast, daniel; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-1-sandipan@linux.vnet.ibm.com>
Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
addresses of the kernel symbols corresponding to each
function in a JITed program.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
tools/include/uapi/linux/bpf.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d94d333a8225..040c9cac7303 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2188,6 +2188,8 @@ struct bpf_prog_info {
__u32 xlated_prog_len;
__aligned_u64 jited_prog_insns;
__aligned_u64 xlated_prog_insns;
+ __aligned_u64 jited_ksyms;
+ __u32 nr_jited_ksyms;
__u64 load_time; /* ns since boottime */
__u32 created_by_uid;
__u32 nr_map_ids;
--
2.14.3
^ permalink raw reply related