Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
From: Oliver Neukum @ 2018-05-16 10:09 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev@vger.kernel.org, jslaby
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2D2E47EB4@RTITMBSV06.realtek.com.tw>

Am Mittwoch, den 16.05.2018, 10:00 +0000 schrieb Hayes Wang:
> Oliver Neukum [mailto:oneukum@suse.com]
> > Sent: Wednesday, May 16, 2018 4:27 PM
> 
> [...]
> > > 
> > > Would usb_autopm_get_interface() take a long time?
> > > The driver would wake the device if it has suspended.
> > > I have no idea about how usb_autopm_get_interface() works, so I don't know
> > 
> > how to help.
> > 
> > Hi,
> > 
> > it basically calls r8152_resume() and makes a control request to the
> > hub. I think we are spinning in rtl8152_runtime_resume(), but where?
> > It has a lot of NAPI stuff. Any suggestions on how to instrument or
> > trace this?
> 
> Is rtl8152_runtime_resume() called? I don't see the name in the trace.

Good question. I see nothing else that could produce a live lock.
> 
> I guess the relative API in rtl8152_runtime_resume() are
> 		ops->disable		= rtl8153_disable;
> 		ops->autosuspend_en	= rtl8153_runtime_enable;
> 
> And I don't find any possible dead lock in rtl8152_runtime_resume().
> 
> Besides, I find a similar issue as following.
> https://www.spinics.net/lists/netdev/msg493512.html

Well, if we have an imbalance in NAPI it should strike whereever
it is used, not just in suspend(). Is there debugging for NAPI
we could activate?

	Regards
		Oliver

^ permalink raw reply

* Re: [PATCH V2] mlx4_core: allocate ICM memory in page size chunks
From: Gi-Oh Kim @ 2018-05-16 10:10 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Qing Huang, davem, haakon.bugge, yanjun.zhu, netdev, linux-rdma,
	linux-kernel
In-Reply-To: <465cf5c6-46be-2701-1c26-3e90f31f05e4@mellanox.com>

On Wed, May 16, 2018 at 9:04 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>
>
> On 15/05/2018 9:53 PM, Qing Huang wrote:
>>
>>
>>
>> On 5/15/2018 2:19 AM, Tariq Toukan wrote:
>>>
>>>
>>>
>>> On 14/05/2018 7:41 PM, Qing Huang wrote:
>>>>
>>>>
>>>>
>>>> On 5/13/2018 2:00 AM, Tariq Toukan wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 11/05/2018 10:23 PM, Qing Huang wrote:
>>>>>>
>>>>>> When a system is under memory presure (high usage with fragments),
>>>>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>>>>> memory management to enter slow path doing memory compact/migration
>>>>>> ops in order to complete high order memory allocations.
>>>>>>
>>>>>> When that happens, user processes calling uverb APIs may get stuck
>>>>>> for more than 120s easily even though there are a lot of free pages
>>>>>> in smaller chunks available in the system.
>>>>>>
>>>>>> Syslog:
>>>>>> ...
>>>>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>>>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>>>>> ...
>>>>>>
>>>>>> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
>>>>>>
>>>>>> However in order to support smaller ICM chunk size, we need to fix
>>>>>> another issue in large size kcalloc allocations.
>>>>>>
>>>>>> E.g.
>>>>>> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
>>>>>> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each
>>>>>> mtt
>>>>>> entry). So we need a 16MB allocation for a table->icm pointer array to
>>>>>> hold 2M pointers which can easily cause kcalloc to fail.
>>>>>>
>>>>>> The solution is to use vzalloc to replace kcalloc. There is no need
>>>>>> for contiguous memory pages for a driver meta data structure (no need
>>>>>> of DMA ops).
>>>>>>
>>>>>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>>>>>> Acked-by: Daniel Jurgens <danielj@mellanox.com>
>>>>>> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>>>> ---
>>>>>> v2 -> v1: adjusted chunk size to reflect different architectures.
>>>>>>
>>>>>>   drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
>>>>>>   1 file changed, 7 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>>>> b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>>>> index a822f7a..ccb62b8 100644
>>>>>> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>>>> @@ -43,12 +43,12 @@
>>>>>>   #include "fw.h"
>>>>>>     /*
>>>>>> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
>>>>>> - * per chunk.
>>>>>> + * We allocate in page size (default 4KB on many archs) chunks to
>>>>>> avoid high
>>>>>> + * order memory allocations in fragmented/high usage memory
>>>>>> situation.
>>>>>>    */
>>>>>>   enum {
>>>>>> -    MLX4_ICM_ALLOC_SIZE    = 1 << 18,
>>>>>> -    MLX4_TABLE_CHUNK_SIZE    = 1 << 18
>>>>>> +    MLX4_ICM_ALLOC_SIZE    = 1 << PAGE_SHIFT,
>>>>>> +    MLX4_TABLE_CHUNK_SIZE    = 1 << PAGE_SHIFT
>>>>>
>>>>>
>>>>> Which is actually PAGE_SIZE.
>>>>
>>>>
>>>> Yes, we wanted to avoid high order memory allocations.
>>>>
>>>
>>> Then please use PAGE_SIZE instead.
>>
>>
>> PAGE_SIZE is usually defined as 1 << PAGE_SHIFT. So I think PAGE_SHIFT is
>> actually more appropriate here.
>>
>
> Definition of PAGE_SIZE varies among different archs.
> It is not always as simple as 1 << PAGE_SHIFT.
> It might be:
> PAGE_SIZE (1UL << PAGE_SHIFT)
> PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT)
> etc...
>
> Please replace 1 << PAGE_SHIFT with PAGE_SIZE.
>
>>
>>>
>>>>> Also, please add a comma at the end of the last entry.
>>>>
>>>>
>>>> Hmm..., followed the existing code style and checkpatch.pl didn't
>>>> complain about the comma.
>>>>
>>>
>>> I am in favor of having a comma also after the last element, so that when
>>> another enum element is added we do not modify this line again, which would
>>> falsely affect git blame.
>>>
>>> I know it didn't exist before your patch, but once we're here, let's do
>>> it.
>>
>>
>> I'm okay either way. If adding an extra comma is preferred by many people,
>> someone should update checkpatch.pl to enforce it. :)
>>
> I agree.
> Until then, please use an extra comma in this patch.
>
>>>
>>>>>
>>>>>>   };
>>>>>>     static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct
>>>>>> mlx4_icm_chunk *chunk)
>>>>>> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev,
>>>>>> struct mlx4_icm_table *table,
>>>>>>       obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
>>>>>>       num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>>>>>>   -    table->icm      = kcalloc(num_icm, sizeof(*table->icm),
>>>>>> GFP_KERNEL);
>>>>>> +    table->icm      = vzalloc(num_icm * sizeof(*table->icm));
>>>>>
>>>>>
>>>>> Why not kvzalloc ?
>>>>
>>>>
>>>> I think table->icm pointer array doesn't really need physically
>>>> contiguous memory. Sometimes high order
>>>> memory allocation by kmalloc variants may trigger slow path and cause
>>>> tasks to be blocked.
>>>>
>>>
>>> This is control path so it is less latency-sensitive.
>>> Let's not produce unnecessary degradation here, please call kvzalloc so
>>> we maintain a similar behavior when contiguous memory is available, and a
>>> fallback for resiliency.
>>
>>
>> No sure what exactly degradation is caused by vzalloc here. I think it's
>> better to keep physically contiguous pages
>> to other requests which really need them. Besides slow path/mem compacting
>> can be really expensive.
>>
>
> Degradation is expected when you replace a contig memory with non-contig
> memory, without any perf test.
> We agree that when contig memory is not available, we should use non-contig
> instead of simply failing, and for this you can call kvzalloc.

The expected degradation would be little if the data is not very
performance sensitive.
I think vmalloc would be better in general case.

Even if the server has hundreds of gigabyte memory, even 1MB
contiguous memory is often rare.
For example, I attached /proc/pagetypeinfo of my server running for 153 days.
The largest contiguous memory is 2^7=512KB.

Node    0, zone   Normal, type    Unmovable   4499   9418   4817
732    747    567     18      3      0      0      0
Node    0, zone   Normal, type      Movable  38179  40839  10546
1888    491     51      1      0      0      0      0
Node    0, zone   Normal, type  Reclaimable    117     98   1279
35     21     10      8      0      0      0      0
Node    0, zone   Normal, type   HighAtomic      0      0      0
0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0
0      0      0      0      0      0      0      0

So I think vmalloc would be good if it is not very performance critical data.



-- 
GIOH KIM
Linux Kernel Entwickler

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 176 2697 8962
Fax:      +49 30 577 008 299
Email:    gi-oh.kim@profitbricks.com
URL:      https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss, Matthias Steinberg, Christoph Steffens

^ permalink raw reply

* Re: [PATCH 11/14] net: core: add new/replace rate estimator lock parameter
From: Jiri Pirko @ 2018-05-16 10:11 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <vbf603n2a3q.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>

Wed, May 16, 2018 at 12:00:57PM CEST, vladbu@mellanox.com wrote:
>
>On Wed 16 May 2018 at 09:54, Jiri Pirko <jiri@resnulli.us> wrote:
>> Mon, May 14, 2018 at 04:27:12PM CEST, vladbu@mellanox.com wrote:
>>>Extend rate estimator new and replace APIs with additional spinlock
>>>parameter used by lockless actions to protect rate_est pointer from
>>>concurrent modification.
>>>
>>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>
>> [...]
>>
>>
>>> /**
>>>  * gen_new_estimator - create a new rate estimator
>>>  * @bstats: basic statistics
>>>  * @cpu_bstats: bstats per cpu
>>>  * @rate_est: rate estimator statistics
>>>+ * @rate_est_lock: rate_est lock (might be NULL)
>>
>> I cannot find a place you actually use this new arg in this patchset.
>> Did I miss it?
>
>It is used by specific action init function. However, that code was
>moved to next patchset due to patchset size limit.

Please move this patch too.

>
>>
>>
>>>  * @stats_lock: statistics lock
>>>  * @running: qdisc running seqcount
>>>  * @opt: rate estimator configuration TLV
>>
>> [...]
>

^ permalink raw reply

* Re: [RFC v4 5/5] virtio_ring: enable packed ring
From: Sergei Shtylyov @ 2018-05-16 10:15 UTC (permalink / raw)
  To: Tiwei Bie, mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180516083737.26504-6-tiwei.bie@intel.com>

On 5/16/2018 11:37 AM, Tiwei Bie wrote:

> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index de3839f3621a..b158692263b0 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
>   			break;
>   		case VIRTIO_F_IOMMU_PLATFORM:
>   			break;
> +		case VIRTIO_F_RING_PACKED:
> +			break;

    Why not just add this *case* under the previous *case*?

>    		default:
>   			/* We don't understand this bit. */
>   			__virtio_clear_bit(vdev, i);

MBR, Sergei

^ permalink raw reply

* Re: [RFC v4 5/5] virtio_ring: enable packed ring
From: Tiwei Bie @ 2018-05-16 10:21 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: mst, jasowang, virtualization, linux-kernel, netdev, wexu,
	jfreimann
In-Reply-To: <27dfb4e8-6d63-bf7d-0f97-ac51559f8040@cogentembedded.com>

On Wed, May 16, 2018 at 01:15:48PM +0300, Sergei Shtylyov wrote:
> On 5/16/2018 11:37 AM, Tiwei Bie wrote:
> 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> >   drivers/virtio/virtio_ring.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index de3839f3621a..b158692263b0 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
> >   			break;
> >   		case VIRTIO_F_IOMMU_PLATFORM:
> >   			break;
> > +		case VIRTIO_F_RING_PACKED:
> > +			break;
> 
>    Why not just add this *case* under the previous *case*?

Do you mean fallthrough? Something like:

		case VIRTIO_F_IOMMU_PLATFORM:
		case VIRTIO_F_RING_PACKED:
			break;

Best regards,
Tiwei Bie

> 
> >    		default:
> >   			/* We don't understand this bit. */
> >   			__virtio_clear_bit(vdev, i);
> 
> MBR, Sergei

^ permalink raw reply

* Re: mounting NFS on the same host leads to D state
From: maowenan @ 2018-05-16 10:37 UTC (permalink / raw)
  To: netdev@vger.kernel.org, jlayton, linux-nfs, Eric Dumazet
  Cc: yangerkun, weiyongjun (A)
In-Reply-To: <737a947e-a53d-fcd2-bbc3-ca373bea6cb1@huawei.com>

Hi,

I have tested in recent version 4.16 rc7 and find it also has the same issue.
@Eric do you have any comments about this nfs issue?

On 2018/5/15 11:47, maowenan wrote:
> Hi,
> 
> Recently I have tested NFS and exportfs scenario,
> that NFS server and client are in the same host.
> And I found mounting NFS filesystm onto the same host
> can lead to rpc.mountd and related task become D state.
> My kernel version is based on 3.10, and I find 4.15 has the same
> appearance.
> 
> My test step as below:
> 1)create dir.
> mkdir -p /home/test1 /home/test2
> 2)share dir /home/test1
> echo '/home/test1 localhost(rw,all_squash,anonuid=0,anongid=0)' > /etc/exports
> 3)exportfs
> exportfs -vr || echo "Failed to export /home/test1"
> 4)mount NFS.
> mount localhost:/home/test1 /home/test2 -o vers=3,soft
> 5)share dir /home/test2
> echo '/home/test2 *(rw,all_squash,anonuid=0,anongid=0)' >> /etc/exports
> 6)exportfs
> exportfs -vr
> 7) list /home/test2
> ls /home/test2
> then we found ls command is hung, ls and rpc.mountd became "D" state, and after
> 180 second ls command return.
> 
> Another scenario as below:
> 1)create dir.
> mkdir -p /home/test3 /home/test4
> 2)share dir /home/test3
> echo '/home/test3 localhost(rw,sync,no_wdelay,anonuid=0,anongid=0,no_subtree_check)' > /etc/exports
> 3)exportfs
> exportfs -r
> 4)to see NFS status
> showmount -e localhost
> 5)mount NFS
> mount -t nfs4 -o proto=tcp,nolock,soft,timeo=50 localhost:/home/test3 /home/test4
> 6) stop nfs service,and  and check ls task state is D.
> service nfs stop
> ls /home/test4
> ls command is hung and became D state.
> 
> I wonder to know is it reasonable about these test scenario because NFS server and
> client are in the same host? Since some task went into D state, is there any reason about this?
> and is there any patch to fix this issue?
> Here is a link to talk about NFS mounting on the same host,  https://lwn.net/Articles/595652/
> 

^ permalink raw reply

* Re: INFO: rcu detected stall in sctp_packet_transmit
From: Xin Long @ 2018-05-16 10:44 UTC (permalink / raw)
  To: syzbot
  Cc: davem, LKML, linux-sctp, Marcelo Ricardo Leitner, network dev,
	Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <0000000000008149ec056c4e4289@google.com>

On Wed, May 16, 2018 at 4:11 PM, syzbot
<syzbot+ff0b569fb5111dcd1a36@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    961423f9fcbc Merge branch 'sctp-Introduce-sctp_flush_ctx'
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1366aea7800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=51fb0a6913f757db
> dashboard link: https://syzkaller.appspot.com/bug?extid=ff0b569fb5111dcd1a36
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+ff0b569fb5111dcd1a36@syzkaller.appspotmail.com
>
> INFO: rcu_sched self-detected stall on CPU
>         0-....: (1 GPs behind) idle=dae/1/4611686018427387908
> softirq=93090/93091 fqs=30902
>          (t=125000 jiffies g=51107 c=51106 q=972)
> NMI backtrace for cpu 0
> CPU: 0 PID: 24668 Comm: syz-executor6 Not tainted 4.17.0-rc4+ #44
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
> RIP: 0010:sctp_v6_xmit+0x259/0x6b0 net/sctp/ipv6.c:219
> RSP: 0018:ffff8801dae068e8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000007 RBX: ffff8801bb7ec800 RCX: ffffffff86f1b345
> RDX: 0000000000000000 RSI: ffffffff86f1b381 RDI: ffff8801b73d97c4
> RBP: ffff8801dae06988 R08: ffff88019505c300 R09: ffffed003b5c46c2
> R10: ffffed003b5c46c2 R11: ffff8801dae23613 R12: ffff88011fd57300
> R13: ffff8801bb7ecec8 R14: 0000000000000029 R15: 0000000000000002
>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
>  sctp_outq_flush_transports net/sctp/outqueue.c:1164 [inline]
>  sctp_outq_flush+0x5f5/0x3430 net/sctp/outqueue.c:1212
>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
Shocks, this timer event again. Can we try to minimize the repo.syz and
get a short script, not neccessary to reproduce the issue 100%. we need
to know what it was doing when this happened.

Thanks.

>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>  </IRQ>
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
> [inline]
> RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
> [inline]
> RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0
> kernel/locking/spinlock.c:184
> RSP: 0018:ffff880196227328 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
> RAX: dffffc0000000000 RBX: 0000000000000286 RCX: 0000000000000000
> RDX: 1ffffffff11a316d RSI: 0000000000000001 RDI: 0000000000000286
> RBP: ffff880196227338 R08: ffffed003b5c4b81 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801dae25c00
> R13: ffff8801dae25c80 R14: ffff880196227758 R15: ffff8801dae25c00
>  unlock_hrtimer_base kernel/time/hrtimer.c:887 [inline]
>  hrtimer_start_range_ns+0x692/0xd10 kernel/time/hrtimer.c:1118
>  hrtimer_start_expires include/linux/hrtimer.h:412 [inline]
>  futex_wait_queue_me+0x304/0x820 kernel/futex.c:2517
>  futex_wait+0x450/0x9f0 kernel/futex.c:2645
>  do_futex+0x336/0x27d0 kernel/futex.c:3527
>  __do_sys_futex kernel/futex.c:3587 [inline]
>  __se_sys_futex kernel/futex.c:3555 [inline]
>  __x64_sys_futex+0x46a/0x680 kernel/futex.c:3555
>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x455a09
> RSP: 002b:0000000000a3e938 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: ffffffffffffffda RBX: 0000000000045a9b RCX: 0000000000455a09
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072becc
> RBP: 000000000072becc R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000a3e940 R11: 0000000000000246 R12: 0000000000000019
> R13: 0000000000000002 R14: 000000000072bea0 R15: 0000000000045a8f
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> syzbot.

^ permalink raw reply

* Re: [RFC PATCH bpf-next 00/12] AF_XDP, zero-copy support
From: Jesper Dangaard Brouer @ 2018-05-16 10:47 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, magnus.karlsson, alexander.h.duyck,
	alexander.duyck, john.fastabend, ast, willemdebruijn.kernel,
	daniel, mst, netdev, Björn Töpel, michael.lundkvist,
	jesse.brandeburg, anjali.singhai, qi.z.zhang, intel-wired-lan,
	brouer
In-Reply-To: <20180515190615.23099-1-bjorn.topel@gmail.com>

On Tue, 15 May 2018 21:06:03 +0200
Björn Töpel <bjorn.topel@gmail.com> wrote:

> e have run some benchmarks on a dual socket system with two Broadwell
> E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
> cores which gives a total of 28, but only two cores are used in these
> experiments. One for TR/RX and one for the user space application. The
> memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> 8192MB and with 8 of those DIMMs in the system we have 64 GB of total
> memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The
> NIC is Intel I40E 40Gbit/s using the i40e driver.
> 
> Below are the results in Mpps of the I40E NIC benchmark runs for 64
> and 1500 byte packets, generated by a commercial packet generator HW
> outputing packets at full 40 Gbit/s line rate. The results are without
> retpoline so that we can compare against previous numbers. 
> 
> AF_XDP performance 64 byte packets. Results from the AF_XDP V3 patch
> set are also reported for ease of reference.
> 
> Benchmark   XDP_SKB    XDP_DRV    XDP_DRV with zerocopy
> rxdrop       2.9*       9.6*       21.5
> txpush       2.6*       -          21.6
> l2fwd        1.9*       2.5*       15.0

These performance numbers are actually amazing.

When reaching these amazing/crazy speeds, where we are approaching the
speed of light (travel 30 cm in 1 nanosec), we have to view these
numbers differently, because we are actually working on a nanosec scale.

21.5 Mpps is 46.5 nanosec.

If we want to optimize for +1 Mpps, then (1/22.5*10^3=44.44ns) your
actually only have to optimize the code with 2 nanosec, and with this
2.0 GHz CPU it should in theory only be 4 cycles, but likely have more
instructions per cycle (I see around 2.5 ins per cycle), so we are
looking at (2*2*2.5) needing to find 10 cycles for +1Mpps.

Comparing to XDP_DROP of 32.3Mpps vs ZC-rxdrop 21.5Mpps, this is
actually only a "slowdown" of 15.55 ns, for having frame travel through
xdp_do_redirect, do map lookup etc, and queue into userspace, and
return frames back to kernel.  That is rather amazingly fast.

  (1/21.5*10^3)-(1/32.3*10^3) = 15.55 ns

Another performance number which is amazing is your l2fwd number of
15Mpps, because it if faster than xdp_redirect_map on i40e NICs on my
system, which runs at 12.2 Mpps (2.8Mpps slower).  Again looking at the
nanosec scale instead, this correspond to 15.3 ns.
  I expect, this improvement comes from avoiding page_frag_free, and
avoiding the TX dma_map call (as you premap pages DMA for TX). Reverse
calculating based on perf percentage, I find that these should only
cost 7.18 ns.  Maybe the rest is because you are running TX and TX-dma
completion on another CPU.

I notice you are also using the XDP return-API, which still does a
rhashtable_lookup per frame.  I plan to optimize this to do bulking, to
get away from per frame lookup.  Thus, this should get even faster.

> * From AF_XDP V3 patch set and cover letter.
> 
> AF_XDP performance 1500 byte packets:
> Benchmark   XDP_SKB   XDP_DRV     XDP_DRV with zerocopy
> rxdrop       2.1        3.3       3.3
> l2fwd        1.4        1.8       3.1
> 
> So why do we not get higher values for RX similar to the 34 Mpps we
> had in AF_PACKET V4? We made an experiment running the rxdrop
> benchmark without using the xdp_do_redirect/flush infrastructure nor
> using an XDP program (all traffic on a queue goes to one
> socket). Instead the driver acts directly on the AF_XDP socket. With
> this we got 36.9 Mpps, a significant improvement without any change to
> the uapi. So not forcing users to have an XDP program if they do not
> need it, might be a good idea. This measurement is actually higher
> than what we got with AF_PACKET V4.

So, that are you telling me with your number 36.9 Mpps for
direct-socket-rxdrop...

Compared to XDP_DROP at 32.3Mpps, are you saying that it only costs
3.86 nanosec to call the XDP bpf_prog which returns XDP_DROP.  That is
very impressive actually. (1/32.3*10^3)-(1/36.9*10^3)

Compared to ZC-AF_XDP rxdrop 21.5Mpps, are you saying the cost of XDP
redirect infrastructure, map lookups etc (incl. return-API per frame)
cost 19.41 nanosec (1/21.5*10^3)-(1/36.9*10^3).  Which is approx 40
clock-cycles or 100 (speculative) instructions.  That is not too bad,
and we are still optimizing this stuff.

> XDP performance on our system as a base line:
> 
> 64 byte packets:
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      16      32.3M  0
> 
> 1500 byte packets:
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      16      3.3M    0

Overall I'm *very* impressed by the performance of ZC AF_XDP.
Just remember that measuring improvement in +N Mpps, is actually
misleading, when operating at these (light) speeds.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: INFO: rcu detected stall in sctp_packet_transmit
From: Dmitry Vyukov @ 2018-05-16 10:53 UTC (permalink / raw)
  To: Xin Long
  Cc: syzbot, davem, LKML, linux-sctp, Marcelo Ricardo Leitner,
	network dev, Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <CADvbK_dj5nLyzZGUmqfiZuoEwDjNxTy_e-VXsEGrWYoGP4m6PA@mail.gmail.com>

On Wed, May 16, 2018 at 12:44 PM, Xin Long <lucien.xin@gmail.com> wrote:
> On Wed, May 16, 2018 at 4:11 PM, syzbot
> <syzbot+ff0b569fb5111dcd1a36@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    961423f9fcbc Merge branch 'sctp-Introduce-sctp_flush_ctx'
>> git tree:       net-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1366aea7800000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=51fb0a6913f757db
>> dashboard link: https://syzkaller.appspot.com/bug?extid=ff0b569fb5111dcd1a36
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+ff0b569fb5111dcd1a36@syzkaller.appspotmail.com
>>
>> INFO: rcu_sched self-detected stall on CPU
>>         0-....: (1 GPs behind) idle=dae/1/4611686018427387908
>> softirq=93090/93091 fqs=30902
>>          (t=125000 jiffies g=51107 c=51106 q=972)
>> NMI backtrace for cpu 0
>> CPU: 0 PID: 24668 Comm: syz-executor6 Not tainted 4.17.0-rc4+ #44
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  <IRQ>
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
>>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>> RIP: 0010:sctp_v6_xmit+0x259/0x6b0 net/sctp/ipv6.c:219
>> RSP: 0018:ffff8801dae068e8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>> RAX: 0000000000000007 RBX: ffff8801bb7ec800 RCX: ffffffff86f1b345
>> RDX: 0000000000000000 RSI: ffffffff86f1b381 RDI: ffff8801b73d97c4
>> RBP: ffff8801dae06988 R08: ffff88019505c300 R09: ffffed003b5c46c2
>> R10: ffffed003b5c46c2 R11: ffff8801dae23613 R12: ffff88011fd57300
>> R13: ffff8801bb7ecec8 R14: 0000000000000029 R15: 0000000000000002
>>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
>>  sctp_outq_flush_transports net/sctp/outqueue.c:1164 [inline]
>>  sctp_outq_flush+0x5f5/0x3430 net/sctp/outqueue.c:1212
>>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
> Shocks, this timer event again. Can we try to minimize the repo.syz and
> get a short script, not neccessary to reproduce the issue 100%. we need
> to know what it was doing when this happened.
>
> Thanks.

It's possible to reply the whole log from console output following
these instructions:
https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md


>>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>>  expire_timers kernel/time/timer.c:1363 [inline]
>>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>>  invoke_softirq kernel/softirq.c:365 [inline]
>>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>>  </IRQ>
>> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>> [inline]
>> RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
>> [inline]
>> RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0
>> kernel/locking/spinlock.c:184
>> RSP: 0018:ffff880196227328 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
>> RAX: dffffc0000000000 RBX: 0000000000000286 RCX: 0000000000000000
>> RDX: 1ffffffff11a316d RSI: 0000000000000001 RDI: 0000000000000286
>> RBP: ffff880196227338 R08: ffffed003b5c4b81 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801dae25c00
>> R13: ffff8801dae25c80 R14: ffff880196227758 R15: ffff8801dae25c00
>>  unlock_hrtimer_base kernel/time/hrtimer.c:887 [inline]
>>  hrtimer_start_range_ns+0x692/0xd10 kernel/time/hrtimer.c:1118
>>  hrtimer_start_expires include/linux/hrtimer.h:412 [inline]
>>  futex_wait_queue_me+0x304/0x820 kernel/futex.c:2517
>>  futex_wait+0x450/0x9f0 kernel/futex.c:2645
>>  do_futex+0x336/0x27d0 kernel/futex.c:3527
>>  __do_sys_futex kernel/futex.c:3587 [inline]
>>  __se_sys_futex kernel/futex.c:3555 [inline]
>>  __x64_sys_futex+0x46a/0x680 kernel/futex.c:3555
>>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x455a09
>> RSP: 002b:0000000000a3e938 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
>> RAX: ffffffffffffffda RBX: 0000000000045a9b RCX: 0000000000455a09
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072becc
>> RBP: 000000000072becc R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000a3e940 R11: 0000000000000246 R12: 0000000000000019
>> R13: 0000000000000002 R14: 000000000072bea0 R15: 0000000000045a8f
>>
>>
>> ---
>> This bug is generated by a bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for more information about syzbot.
>> syzbot engineers can be reached at syzkaller@googlegroups.com.
>>
>> syzbot will keep track of this bug report. See:
>> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
>> syzbot.

^ permalink raw reply

* [PATCH net] net/sched: fix refcnt leak in the error path of tcf_vlan_init()
From: Davide Caratti @ 2018-05-16 10:54 UTC (permalink / raw)
  To: David S. Miller, netdev, Jiri Pirko; +Cc: Jamal Hadi Salim, Roman Mashak

Similarly to what was done with commit a52956dfc503 ("net sched actions:
fix refcnt leak in skbmod"), fix the error path of tcf_vlan_init() to avoid
refcnt leaks when wrong value of TCA_VLAN_PUSH_VLAN_PROTOCOL is given.

Fixes: 5026c9b1bafc ("net sched: vlan action fix late binding")
CC: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/sched/act_vlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 853604685965..1fb39e1f9d07 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -161,6 +161,8 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 			case htons(ETH_P_8021AD):
 				break;
 			default:
+				if (exists)
+					tcf_idr_release(*a, bind);
 				return -EPROTONOSUPPORT;
 			}
 		} else {
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-next] ath10k: Remove useless test before clk_disable_unprepare
From: YueHaibing @ 2018-05-16 10:54 UTC (permalink / raw)
  To: kvalo, davem; +Cc: netdev, linux-wireless, YueHaibing

clk_disable_unprepare() already checks that the clock pointer is valid.
No need to test it before calling it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/wireless/ath/ath10k/ahb.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ahb.c b/drivers/net/wireless/ath/ath10k/ahb.c
index 35d1049..fa39fff 100644
--- a/drivers/net/wireless/ath/ath10k/ahb.c
+++ b/drivers/net/wireless/ath/ath10k/ahb.c
@@ -180,14 +180,11 @@ static void ath10k_ahb_clock_disable(struct ath10k *ar)
 {
 	struct ath10k_ahb *ar_ahb = ath10k_ahb_priv(ar);
 
-	if (!IS_ERR_OR_NULL(ar_ahb->cmd_clk))
-		clk_disable_unprepare(ar_ahb->cmd_clk);
+	clk_disable_unprepare(ar_ahb->cmd_clk);
 
-	if (!IS_ERR_OR_NULL(ar_ahb->ref_clk))
-		clk_disable_unprepare(ar_ahb->ref_clk);
+	clk_disable_unprepare(ar_ahb->ref_clk);
 
-	if (!IS_ERR_OR_NULL(ar_ahb->rtc_clk))
-		clk_disable_unprepare(ar_ahb->rtc_clk);
+	clk_disable_unprepare(ar_ahb->rtc_clk);
 }
 
 static int ath10k_ahb_rst_ctrl_init(struct ath10k *ar)
-- 
2.7.0

^ permalink raw reply related

* [PATCH net-next] net: stmmac: Remove useless test before clk_disable_unprepare
From: YueHaibing @ 2018-05-16 10:59 UTC (permalink / raw)
  To: peppe.cavallaro, davem; +Cc: netdev, linux-kernel, alexandre.torgue, YueHaibing

clk_disable_unprepare() already checks that the clock pointer is valid.
No need to test it before calling it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 13133b3..f08625a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -1104,30 +1104,20 @@ static int gmac_clk_enable(struct rk_priv_data *bsp_priv, bool enable)
 	} else {
 		if (bsp_priv->clk_enabled) {
 			if (phy_iface == PHY_INTERFACE_MODE_RMII) {
-				if (!IS_ERR(bsp_priv->mac_clk_rx))
-					clk_disable_unprepare(
-						bsp_priv->mac_clk_rx);
+				clk_disable_unprepare(bsp_priv->mac_clk_rx);
 
-				if (!IS_ERR(bsp_priv->clk_mac_ref))
-					clk_disable_unprepare(
-						bsp_priv->clk_mac_ref);
+				clk_disable_unprepare(bsp_priv->clk_mac_ref);
 
-				if (!IS_ERR(bsp_priv->clk_mac_refout))
-					clk_disable_unprepare(
-						bsp_priv->clk_mac_refout);
+				clk_disable_unprepare(bsp_priv->clk_mac_refout);
 			}
 
-			if (!IS_ERR(bsp_priv->clk_phy))
-				clk_disable_unprepare(bsp_priv->clk_phy);
+			clk_disable_unprepare(bsp_priv->clk_phy);
 
-			if (!IS_ERR(bsp_priv->aclk_mac))
-				clk_disable_unprepare(bsp_priv->aclk_mac);
+			clk_disable_unprepare(bsp_priv->aclk_mac);
 
-			if (!IS_ERR(bsp_priv->pclk_mac))
-				clk_disable_unprepare(bsp_priv->pclk_mac);
+			clk_disable_unprepare(bsp_priv->pclk_mac);
 
-			if (!IS_ERR(bsp_priv->mac_clk_tx))
-				clk_disable_unprepare(bsp_priv->mac_clk_tx);
+			clk_disable_unprepare(bsp_priv->mac_clk_tx);
 			/**
 			 * if (!IS_ERR(bsp_priv->clk_mac))
 			 *	clk_disable_unprepare(bsp_priv->clk_mac);
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH net-next v2 2/2] drivers: net: Remove device_node checks with of_mdiobus_register()
From: Jose Abreu @ 2018-05-16 11:01 UTC (permalink / raw)
  To: Florian Fainelli, netdev
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, Nicolas Ferre,
	Fugang Duan, Sergei Shtylyov, Giuseppe Cavallaro,
	Alexandre Torgue, Jose Abreu, Grygorii Strashko, Woojung Huh,
	Microchip Linux Driver Support, Rob Herring, Frank Rowand,
	Antoine Tenart, Tobias Jordan, Russell King, Geert
In-Reply-To: <20180515235619.27773-3-f.fainelli@gmail.com>

On 16-05-2018 00:56, Florian Fainelli wrote:
> A number of drivers have the following pattern:
>
> if (np)
> 	of_mdiobus_register()
> else
> 	mdiobus_register()
>
> which the implementation of of_mdiobus_register() now takes care of.
> Remove that pattern in drivers that strictly adhere to it.
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  drivers/net/dsa/bcm_sf2.c                         |  8 ++------
>  drivers/net/dsa/mv88e6xxx/chip.c                  |  5 +----
>  drivers/net/ethernet/cadence/macb_main.c          | 12 +++---------
>  drivers/net/ethernet/freescale/fec_main.c         |  8 ++------
>  drivers/net/ethernet/marvell/mvmdio.c             |  5 +----
>  drivers/net/ethernet/renesas/sh_eth.c             | 11 +++--------
>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c |  5 +----

For stmmac:

Reviewed-by: Jose Abreu <joabreu@synopsys.com>

Thanks and Best Regards,
Jose Miguel Abreu

>  drivers/net/ethernet/ti/davinci_mdio.c            |  8 +++-----
>  drivers/net/phy/mdio-gpio.c                       |  6 +-----
>  drivers/net/phy/mdio-mscc-miim.c                  |  6 +-----
>  drivers/net/usb/lan78xx.c                         |  7 ++-----
>  11 files changed, 20 insertions(+), 61 deletions(-)
>

^ permalink raw reply

* Re: INFO: rcu detected stall in sctp_packet_transmit
From: Xin Long @ 2018-05-16 11:02 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, davem, LKML, linux-sctp, Marcelo Ricardo Leitner,
	network dev, Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <CACT4Y+YD7KebCZujCW90iSNxPn5GJ2pg=1f7nARA9tphrY_1JQ@mail.gmail.com>

On Wed, May 16, 2018 at 6:53 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Wed, May 16, 2018 at 12:44 PM, Xin Long <lucien.xin@gmail.com> wrote:
>> On Wed, May 16, 2018 at 4:11 PM, syzbot
>> <syzbot+ff0b569fb5111dcd1a36@syzkaller.appspotmail.com> wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit:    961423f9fcbc Merge branch 'sctp-Introduce-sctp_flush_ctx'
>>> git tree:       net-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1366aea7800000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=51fb0a6913f757db
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=ff0b569fb5111dcd1a36
>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>>
>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+ff0b569fb5111dcd1a36@syzkaller.appspotmail.com
>>>
>>> INFO: rcu_sched self-detected stall on CPU
>>>         0-....: (1 GPs behind) idle=dae/1/4611686018427387908
>>> softirq=93090/93091 fqs=30902
>>>          (t=125000 jiffies g=51107 c=51106 q=972)
>>> NMI backtrace for cpu 0
>>> CPU: 0 PID: 24668 Comm: syz-executor6 Not tainted 4.17.0-rc4+ #44
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>>> Google 01/01/2011
>>> Call Trace:
>>>  <IRQ>
>>>  __dump_stack lib/dump_stack.c:77 [inline]
>>>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>>>  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>>>  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>>>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>>  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>>>  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>>>  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>>>  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>>>  rcu_pending kernel/rcu/tree.c:3401 [inline]
>>>  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>>>  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>>>  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
>>>  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
>>>  __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
>>>  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1460
>>>  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1518
>>>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>>  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>>> RIP: 0010:sctp_v6_xmit+0x259/0x6b0 net/sctp/ipv6.c:219
>>> RSP: 0018:ffff8801dae068e8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>>> RAX: 0000000000000007 RBX: ffff8801bb7ec800 RCX: ffffffff86f1b345
>>> RDX: 0000000000000000 RSI: ffffffff86f1b381 RDI: ffff8801b73d97c4
>>> RBP: ffff8801dae06988 R08: ffff88019505c300 R09: ffffed003b5c46c2
>>> R10: ffffed003b5c46c2 R11: ffff8801dae23613 R12: ffff88011fd57300
>>> R13: ffff8801bb7ecec8 R14: 0000000000000029 R15: 0000000000000002
>>>  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
>>>  sctp_outq_flush_transports net/sctp/outqueue.c:1164 [inline]
>>>  sctp_outq_flush+0x5f5/0x3430 net/sctp/outqueue.c:1212
>>>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>>>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>>>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>>>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
>> Shocks, this timer event again. Can we try to minimize the repo.syz and
>> get a short script, not neccessary to reproduce the issue 100%. we need
>> to know what it was doing when this happened.
>>
>> Thanks.
>
> It's possible to reply the whole log from console output following
> these instructions:
> https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md
Thanks, it's running now.
Usually how long will it take to finish running this 5000-line log?

>
>
>>>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>>>  expire_timers kernel/time/timer.c:1363 [inline]
>>>  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>>>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>>>  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>>>  invoke_softirq kernel/softirq.c:365 [inline]
>>>  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>>>  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>>>  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>>>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
>>>  </IRQ>
>>> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>>> [inline]
>>> RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
>>> [inline]
>>> RIP: 0010:_raw_spin_unlock_irqrestore+0xa1/0xc0
>>> kernel/locking/spinlock.c:184
>>> RSP: 0018:ffff880196227328 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
>>> RAX: dffffc0000000000 RBX: 0000000000000286 RCX: 0000000000000000
>>> RDX: 1ffffffff11a316d RSI: 0000000000000001 RDI: 0000000000000286
>>> RBP: ffff880196227338 R08: ffffed003b5c4b81 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801dae25c00
>>> R13: ffff8801dae25c80 R14: ffff880196227758 R15: ffff8801dae25c00
>>>  unlock_hrtimer_base kernel/time/hrtimer.c:887 [inline]
>>>  hrtimer_start_range_ns+0x692/0xd10 kernel/time/hrtimer.c:1118
>>>  hrtimer_start_expires include/linux/hrtimer.h:412 [inline]
>>>  futex_wait_queue_me+0x304/0x820 kernel/futex.c:2517
>>>  futex_wait+0x450/0x9f0 kernel/futex.c:2645
>>>  do_futex+0x336/0x27d0 kernel/futex.c:3527
>>>  __do_sys_futex kernel/futex.c:3587 [inline]
>>>  __se_sys_futex kernel/futex.c:3555 [inline]
>>>  __x64_sys_futex+0x46a/0x680 kernel/futex.c:3555
>>>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> RIP: 0033:0x455a09
>>> RSP: 002b:0000000000a3e938 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
>>> RAX: ffffffffffffffda RBX: 0000000000045a9b RCX: 0000000000455a09
>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072becc
>>> RBP: 000000000072becc R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000a3e940 R11: 0000000000000246 R12: 0000000000000019
>>> R13: 0000000000000002 R14: 000000000072bea0 R15: 0000000000045a8f
>>>
>>>
>>> ---
>>> This bug is generated by a bot. It may contain errors.
>>> See https://goo.gl/tpsmEJ for more information about syzbot.
>>> syzbot engineers can be reached at syzkaller@googlegroups.com.
>>>
>>> syzbot will keep track of this bug report. See:
>>> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
>>> syzbot.

^ permalink raw reply

* [PATCH net-next] net: ethoc: Remove useless test before clk_disable_unprepare
From: YueHaibing @ 2018-05-16 11:18 UTC (permalink / raw)
  To: tklauser, davem; +Cc: netdev, linux-kernel, YueHaibing

clk_disable_unprepare() already checks that the clock pointer is valid.
No need to test it before calling it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/ethoc.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 8bb0db9..00a5727 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -1246,8 +1246,7 @@ static int ethoc_probe(struct platform_device *pdev)
 	mdiobus_unregister(priv->mdio);
 	mdiobus_free(priv->mdio);
 free2:
-	if (priv->clk)
-		clk_disable_unprepare(priv->clk);
+	clk_disable_unprepare(priv->clk);
 free:
 	free_netdev(netdev);
 out:
@@ -1271,8 +1270,7 @@ static int ethoc_remove(struct platform_device *pdev)
 			mdiobus_unregister(priv->mdio);
 			mdiobus_free(priv->mdio);
 		}
-		if (priv->clk)
-			clk_disable_unprepare(priv->clk);
+		clk_disable_unprepare(priv->clk);
 		unregister_netdev(netdev);
 		free_netdev(netdev);
 	}
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH net-next v3 0/7] Microsemi Ocelot Ethernet switch support
From: Alexandre Belloni @ 2018-05-16 11:26 UTC (permalink / raw)
  To: James Hogan
  Cc: Andrew Lunn, David S . Miller, Allan Nielsen, razvan.stefanescu,
	po.liu, Thomas Petazzoni, Florian Fainelli, netdev, linux-kernel,
	linux-mips
In-Reply-To: <20180514214734.GB29541@jamesdev>

On 14/05/2018 22:47:35+0100, James Hogan wrote:
> On Mon, May 14, 2018 at 10:58:44PM +0200, Andrew Lunn wrote:
> > Hi Alexandre
> > > 
> > > The ocelot dts changes are here for reference and should probably go
> > > through the MIPS tree once the bindings are accepted.
> > 
> > For your next version, you probably want to drop those patches, so
> > that David can apply the network patches to net-next.
> 
> Since it sounds like the net patches are ready now, I'll apply the MIPS
> DTS ones for 4.18.
> 

They are in now, tell me if you want me to resend.

Anyway, I'll probably send another patch to enable the driver in
board-ocelot.config


-- 
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH bpf-next 2/7] bpf: introduce bpf subcommand BPF_PERF_EVENT_QUERY
From: Peter Zijlstra @ 2018-05-16 11:27 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180515234521.856763-3-yhs@fb.com>

On Tue, May 15, 2018 at 04:45:16PM -0700, Yonghong Song wrote:
> Currently, suppose a userspace application has loaded a bpf program
> and attached it to a tracepoint/kprobe/uprobe, and a bpf
> introspection tool, e.g., bpftool, wants to show which bpf program
> is attached to which tracepoint/kprobe/uprobe. Such attachment
> information will be really useful to understand the overall bpf
> deployment in the system.
> 
> There is a name field (16 bytes) for each program, which could
> be used to encode the attachment point. There are some drawbacks
> for this approaches. First, bpftool user (e.g., an admin) may not
> really understand the association between the name and the
> attachment point. Second, if one program is attached to multiple
> places, encoding a proper name which can imply all these
> attachments becomes difficult.
> 
> This patch introduces a new bpf subcommand BPF_PERF_EVENT_QUERY.
> Given a pid and fd, if the <pid, fd> is associated with a
> tracepoint/kprobe/uprobea perf event, BPF_PERF_EVENT_QUERY will return
>    . prog_id
>    . tracepoint name, or
>    . k[ret]probe funcname + offset or kernel addr, or
>    . u[ret]probe filename + offset
> to the userspace.
> The user can use "bpftool prog" to find more information about
> bpf program itself with prog_id.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/trace_events.h |  15 ++++++
>  include/uapi/linux/bpf.h     |  25 ++++++++++
>  kernel/bpf/syscall.c         | 113 +++++++++++++++++++++++++++++++++++++++++++
>  kernel/trace/bpf_trace.c     |  53 ++++++++++++++++++++
>  kernel/trace/trace_kprobe.c  |  29 +++++++++++
>  kernel/trace/trace_uprobe.c  |  22 +++++++++
>  6 files changed, 257 insertions(+)

Why is the command called *_PERF_EVENT_* ? Are there not a lot of !perf
places to attach BPF proglets?

^ permalink raw reply

* Re: [PATCH net-next] ath10k: Remove useless test before clk_disable_unprepare
From: Kalle Valo @ 2018-05-16 11:42 UTC (permalink / raw)
  To: YueHaibing; +Cc: davem, netdev, linux-wireless
In-Reply-To: <20180516105454.30212-1-yuehaibing@huawei.com>

YueHaibing <yuehaibing@huawei.com> writes:

> clk_disable_unprepare() already checks that the clock pointer is valid.
> No need to test it before calling it.
>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Just a note that ath10k patches are applied to my ath.git tree, not to
net-next. So adding net-next to the subject is wrong, but no need to
resend just because of that.

-- 
Kalle Valo

^ permalink raw reply

* Re: [RFC v4 5/5] virtio_ring: enable packed ring
From: Sergei Shtylyov @ 2018-05-16 11:42 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180516102159.GA11467@debian>

On 05/16/2018 01:21 PM, Tiwei Bie wrote:

>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>>   drivers/virtio/virtio_ring.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>> index de3839f3621a..b158692263b0 100644
>>> --- a/drivers/virtio/virtio_ring.c
>>> +++ b/drivers/virtio/virtio_ring.c
>>> @@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
>>>   			break;
>>>   		case VIRTIO_F_IOMMU_PLATFORM:
>>>   			break;
>>> +		case VIRTIO_F_RING_PACKED:
>>> +			break;
>>
>>    Why not just add this *case* under the previous *case*?
> 
> Do you mean fallthrough? Something like:
> 
> 		case VIRTIO_F_IOMMU_PLATFORM:
> 		case VIRTIO_F_RING_PACKED:
> 			break;

   Yes, exactly. :-)

> Best regards,
> Tiwei Bie

[...]

MBR, Sergei

^ permalink raw reply

* [PATCH net 0/3] qed: LL2 fixes
From: Michal Kalderon @ 2018-05-16 11:44 UTC (permalink / raw)
  To: michal.kalderon, davem; +Cc: netdev, chad.dupuis, Michal Kalderon, Ariel Elior

This series fixes some issues in ll2 related to synchronization
and resource freeing

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>

Michal Kalderon (3):
  qed: LL2 flush isles when connection is closed
  qed: Fix possibility of list corruption during rmmod flows
  qed: Fix LL2 race during connection terminate

 drivers/net/ethernet/qlogic/qed/qed_ll2.c | 61 +++++++++++++++++++++++++------
 1 file changed, 50 insertions(+), 11 deletions(-)

-- 
2.9.5

^ permalink raw reply

* [PATCH net 1/3] qed: LL2 flush isles when connection is closed
From: Michal Kalderon @ 2018-05-16 11:44 UTC (permalink / raw)
  To: michal.kalderon, davem; +Cc: netdev, chad.dupuis, Michal Kalderon, Ariel Elior
In-Reply-To: <20180516114440.5048-1-Michal.Kalderon@cavium.com>

Driver should free all pending isles once it gets a FLUSH cqe from FW.
Part of iSCSI out of order flow.

Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling")
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_ll2.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index 3850281..b5918bd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -591,6 +591,27 @@ static void qed_ll2_rxq_flush(struct qed_hwfn *p_hwfn, u8 connection_handle)
 	}
 }
 
+static bool
+qed_ll2_lb_rxq_handler_slowpath(struct qed_hwfn *p_hwfn,
+				struct core_rx_slow_path_cqe *p_cqe)
+{
+	struct ooo_opaque *iscsi_ooo;
+	u32 cid;
+
+	if (p_cqe->ramrod_cmd_id != CORE_RAMROD_RX_QUEUE_FLUSH)
+		return false;
+
+	iscsi_ooo = (struct ooo_opaque *)&p_cqe->opaque_data;
+	if (iscsi_ooo->ooo_opcode != TCP_EVENT_DELETE_ISLES)
+		return false;
+
+	/* Need to make a flush */
+	cid = le32_to_cpu(iscsi_ooo->cid);
+	qed_ooo_release_connection_isles(p_hwfn, p_hwfn->p_ooo_info, cid);
+
+	return true;
+}
+
 static int qed_ll2_lb_rxq_handler(struct qed_hwfn *p_hwfn,
 				  struct qed_ll2_info *p_ll2_conn)
 {
@@ -617,6 +638,11 @@ static int qed_ll2_lb_rxq_handler(struct qed_hwfn *p_hwfn,
 		cq_old_idx = qed_chain_get_cons_idx(&p_rx->rcq_chain);
 		cqe_type = cqe->rx_cqe_sp.type;
 
+		if (cqe_type == CORE_RX_CQE_TYPE_SLOW_PATH)
+			if (qed_ll2_lb_rxq_handler_slowpath(p_hwfn,
+							    &cqe->rx_cqe_sp))
+				continue;
+
 		if (cqe_type != CORE_RX_CQE_TYPE_REGULAR) {
 			DP_NOTICE(p_hwfn,
 				  "Got a non-regular LB LL2 completion [type 0x%02x]\n",
-- 
2.9.5

^ permalink raw reply related

* [PATCH net 2/3] qed: Fix possibility of list corruption during rmmod flows
From: Michal Kalderon @ 2018-05-16 11:44 UTC (permalink / raw)
  To: michal.kalderon, davem; +Cc: netdev, chad.dupuis, Michal Kalderon, Ariel Elior
In-Reply-To: <20180516114440.5048-1-Michal.Kalderon@cavium.com>

The ll2 flows of flushing the txq/rxq need to be synchronized with the
regular fp processing. Caused list corruption during load/unload stress
tests.

Fixes: 0a7fb11c23c0f ("qed: Add Light L2 support")
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_ll2.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index b5918bd..851561f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -292,6 +292,7 @@ static void qed_ll2_txq_flush(struct qed_hwfn *p_hwfn, u8 connection_handle)
 	struct qed_ll2_tx_packet *p_pkt = NULL;
 	struct qed_ll2_info *p_ll2_conn;
 	struct qed_ll2_tx_queue *p_tx;
+	unsigned long flags = 0;
 	dma_addr_t tx_frag;
 
 	p_ll2_conn = qed_ll2_handle_sanity_inactive(p_hwfn, connection_handle);
@@ -300,6 +301,7 @@ static void qed_ll2_txq_flush(struct qed_hwfn *p_hwfn, u8 connection_handle)
 
 	p_tx = &p_ll2_conn->tx_queue;
 
+	spin_lock_irqsave(&p_tx->lock, flags);
 	while (!list_empty(&p_tx->active_descq)) {
 		p_pkt = list_first_entry(&p_tx->active_descq,
 					 struct qed_ll2_tx_packet, list_entry);
@@ -309,6 +311,7 @@ static void qed_ll2_txq_flush(struct qed_hwfn *p_hwfn, u8 connection_handle)
 		list_del(&p_pkt->list_entry);
 		b_last_packet = list_empty(&p_tx->active_descq);
 		list_add_tail(&p_pkt->list_entry, &p_tx->free_descq);
+		spin_unlock_irqrestore(&p_tx->lock, flags);
 		if (p_ll2_conn->input.conn_type == QED_LL2_TYPE_OOO) {
 			struct qed_ooo_buffer *p_buffer;
 
@@ -328,7 +331,9 @@ static void qed_ll2_txq_flush(struct qed_hwfn *p_hwfn, u8 connection_handle)
 						      b_last_frag,
 						      b_last_packet);
 		}
+		spin_lock_irqsave(&p_tx->lock, flags);
 	}
+	spin_unlock_irqrestore(&p_tx->lock, flags);
 }
 
 static int qed_ll2_txq_completion(struct qed_hwfn *p_hwfn, void *p_cookie)
@@ -556,6 +561,7 @@ static void qed_ll2_rxq_flush(struct qed_hwfn *p_hwfn, u8 connection_handle)
 	struct qed_ll2_info *p_ll2_conn = NULL;
 	struct qed_ll2_rx_packet *p_pkt = NULL;
 	struct qed_ll2_rx_queue *p_rx;
+	unsigned long flags = 0;
 
 	p_ll2_conn = qed_ll2_handle_sanity_inactive(p_hwfn, connection_handle);
 	if (!p_ll2_conn)
@@ -563,13 +569,14 @@ static void qed_ll2_rxq_flush(struct qed_hwfn *p_hwfn, u8 connection_handle)
 
 	p_rx = &p_ll2_conn->rx_queue;
 
+	spin_lock_irqsave(&p_rx->lock, flags);
 	while (!list_empty(&p_rx->active_descq)) {
 		p_pkt = list_first_entry(&p_rx->active_descq,
 					 struct qed_ll2_rx_packet, list_entry);
 		if (!p_pkt)
 			break;
-
 		list_move_tail(&p_pkt->list_entry, &p_rx->free_descq);
+		spin_unlock_irqrestore(&p_rx->lock, flags);
 
 		if (p_ll2_conn->input.conn_type == QED_LL2_TYPE_OOO) {
 			struct qed_ooo_buffer *p_buffer;
@@ -588,7 +595,9 @@ static void qed_ll2_rxq_flush(struct qed_hwfn *p_hwfn, u8 connection_handle)
 						      cookie,
 						      rx_buf_addr, b_last);
 		}
+		spin_lock_irqsave(&p_rx->lock, flags);
 	}
+	spin_unlock_irqrestore(&p_rx->lock, flags);
 }
 
 static bool
-- 
2.9.5

^ permalink raw reply related

* [PATCH net 3/3] qed: Fix LL2 race during connection terminate
From: Michal Kalderon @ 2018-05-16 11:44 UTC (permalink / raw)
  To: michal.kalderon, davem; +Cc: netdev, chad.dupuis, Michal Kalderon, Ariel Elior
In-Reply-To: <20180516114440.5048-1-Michal.Kalderon@cavium.com>

Stress on qedi/qedr load unload lead to list_del corruption.
This is due to ll2 connection terminate freeing resources without
verifying that no more ll2 processing will occur.

This patch unregisters the ll2 status block before terminating
the connection to assure this race does not occur.

Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling")
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_ll2.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index 851561f..468c59d 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -829,6 +829,9 @@ static int qed_ll2_lb_rxq_completion(struct qed_hwfn *p_hwfn, void *p_cookie)
 	struct qed_ll2_info *p_ll2_conn = (struct qed_ll2_info *)p_cookie;
 	int rc;
 
+	if (!QED_LL2_RX_REGISTERED(p_ll2_conn))
+		return 0;
+
 	rc = qed_ll2_lb_rxq_handler(p_hwfn, p_ll2_conn);
 	if (rc)
 		return rc;
@@ -849,6 +852,9 @@ static int qed_ll2_lb_txq_completion(struct qed_hwfn *p_hwfn, void *p_cookie)
 	u16 new_idx = 0, num_bds = 0;
 	int rc;
 
+	if (!QED_LL2_TX_REGISTERED(p_ll2_conn))
+		return 0;
+
 	new_idx = le16_to_cpu(*p_tx->p_fw_cons);
 	num_bds = ((s16)new_idx - (s16)p_tx->bds_idx);
 
@@ -1902,17 +1908,25 @@ int qed_ll2_terminate_connection(void *cxt, u8 connection_handle)
 
 	/* Stop Tx & Rx of connection, if needed */
 	if (QED_LL2_TX_REGISTERED(p_ll2_conn)) {
+		p_ll2_conn->tx_queue.b_cb_registred = false;
+		smp_wmb(); /* Make sure this is seen by ll2_lb_rxq_completion */
 		rc = qed_sp_ll2_tx_queue_stop(p_hwfn, p_ll2_conn);
 		if (rc)
 			goto out;
+
 		qed_ll2_txq_flush(p_hwfn, connection_handle);
+		qed_int_unregister_cb(p_hwfn, p_ll2_conn->tx_queue.tx_sb_index);
 	}
 
 	if (QED_LL2_RX_REGISTERED(p_ll2_conn)) {
+		p_ll2_conn->rx_queue.b_cb_registred = false;
+		smp_wmb(); /* Make sure this is seen by ll2_lb_rxq_completion */
 		rc = qed_sp_ll2_rx_queue_stop(p_hwfn, p_ll2_conn);
 		if (rc)
 			goto out;
+
 		qed_ll2_rxq_flush(p_hwfn, connection_handle);
+		qed_int_unregister_cb(p_hwfn, p_ll2_conn->rx_queue.rx_sb_index);
 	}
 
 	if (p_ll2_conn->input.conn_type == QED_LL2_TYPE_OOO)
@@ -1960,16 +1974,6 @@ void qed_ll2_release_connection(void *cxt, u8 connection_handle)
 	if (!p_ll2_conn)
 		return;
 
-	if (QED_LL2_RX_REGISTERED(p_ll2_conn)) {
-		p_ll2_conn->rx_queue.b_cb_registred = false;
-		qed_int_unregister_cb(p_hwfn, p_ll2_conn->rx_queue.rx_sb_index);
-	}
-
-	if (QED_LL2_TX_REGISTERED(p_ll2_conn)) {
-		p_ll2_conn->tx_queue.b_cb_registred = false;
-		qed_int_unregister_cb(p_hwfn, p_ll2_conn->tx_queue.tx_sb_index);
-	}
-
 	kfree(p_ll2_conn->tx_queue.descq_mem);
 	qed_chain_free(p_hwfn->cdev, &p_ll2_conn->tx_queue.txq_chain);
 
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH net] net/sched: fix refcnt leak in the error path of tcf_vlan_init()
From: Jamal Hadi Salim @ 2018-05-16 11:45 UTC (permalink / raw)
  To: Davide Caratti, David S. Miller, netdev, Jiri Pirko; +Cc: Roman Mashak
In-Reply-To: <b2e7cfab172461a824959d683d8c868fe04119a3.1526467698.git.dcaratti@redhat.com>

On 16/05/18 06:54 AM, Davide Caratti wrote:
> Similarly to what was done with commit a52956dfc503 ("net sched actions:
> fix refcnt leak in skbmod"), fix the error path of tcf_vlan_init() to avoid
> refcnt leaks when wrong value of TCA_VLAN_PUSH_VLAN_PROTOCOL is given.
> 
> Fixes: 5026c9b1bafc ("net sched: vlan action fix late binding")
> CC: Roman Mashak <mrv@mojatatu.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
>   net/sched/act_vlan.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
> index 853604685965..1fb39e1f9d07 100644
> --- a/net/sched/act_vlan.c
> +++ b/net/sched/act_vlan.c
> @@ -161,6 +161,8 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>   			case htons(ETH_P_8021AD):
>   				break;
>   			default:
> +				if (exists)
> +					tcf_idr_release(*a, bind);
>   				return -EPROTONOSUPPORT;

LGTM.
Note: 5026c9b1bafc fixed the bug that existed. It missed a few
spots like the one here. If there was an "Updates" tag, it would
be more appropriate.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [RFC v4 3/5] virtio_ring: add packed ring support
From: Jason Wang @ 2018-05-16 11:50 UTC (permalink / raw)
  To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu, jfreimann
In-Reply-To: <20180516083737.26504-4-tiwei.bie@intel.com>



On 2018年05月16日 16:37, Tiwei Bie wrote:
> This commit introduces the basic support (without EVENT_IDX)
> for packed ring.
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 491 ++++++++++++++++++++++++++++++++++-
>   1 file changed, 481 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 62d7c407841a..c6c5deb0e3ae 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -58,7 +58,8 @@
>   
>   struct vring_desc_state {
>   	void *data;			/* Data for callback. */
> -	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
> +	void *indir_desc;		/* Indirect descriptor, if any. */
> +	int num;			/* Descriptor list length. */
>   };
>   
>   struct vring_virtqueue {
> @@ -116,6 +117,9 @@ struct vring_virtqueue {
>   			/* Last written value to driver->flags in
>   			 * guest byte order. */
>   			u16 event_flags_shadow;
> +
> +			/* ID allocation. */
> +			struct idr buffer_id;

I'm not sure idr is fit for the performance critical case here. Need to 
measure its performance impact, especially if we have few unused slots.

>   		};
>   	};
>   
> @@ -142,6 +146,16 @@ struct vring_virtqueue {
>   
>   #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>   
> +static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
> +					  unsigned int total_sg)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	/* If the host supports indirect descriptor tables, and we have multiple
> +	 * buffers, then go indirect. FIXME: tune this threshold */
> +	return (vq->indirect && total_sg > 1 && vq->vq.num_free);
> +}
> +
>   /*
>    * Modern virtio devices have feature bits to specify whether they need a
>    * quirk and bypass the IOMMU. If not there, just use the DMA API.
> @@ -327,9 +341,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   
>   	head = vq->free_head;
>   
> -	/* If the host supports indirect descriptor tables, and we have multiple
> -	 * buffers, then go indirect. FIXME: tune this threshold */
> -	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> +	if (virtqueue_use_indirect(_vq, total_sg))
>   		desc = alloc_indirect_split(_vq, total_sg, gfp);
>   	else {
>   		desc = NULL;
> @@ -741,6 +753,63 @@ static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
>   		& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
>   }
>   
> +static void vring_unmap_one_packed(const struct vring_virtqueue *vq,
> +				   struct vring_packed_desc *desc)
> +{
> +	u16 flags;
> +
> +	if (!vring_use_dma_api(vq->vq.vdev))
> +		return;
> +
> +	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +
> +	if (flags & VRING_DESC_F_INDIRECT) {
> +		dma_unmap_single(vring_dma_dev(vq),
> +				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
> +				 virtio32_to_cpu(vq->vq.vdev, desc->len),
> +				 (flags & VRING_DESC_F_WRITE) ?
> +				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	} else {
> +		dma_unmap_page(vring_dma_dev(vq),
> +			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
> +			       virtio32_to_cpu(vq->vq.vdev, desc->len),
> +			       (flags & VRING_DESC_F_WRITE) ?
> +			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	}
> +}
> +
> +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> +						       unsigned int total_sg,
> +						       gfp_t gfp)
> +{
> +	struct vring_packed_desc *desc;
> +
> +	/*
> +	 * We require lowmem mappings for the descriptors because
> +	 * otherwise virt_to_phys will give us bogus addresses in the
> +	 * virtqueue.
> +	 */
> +	gfp &= ~__GFP_HIGHMEM;
> +
> +	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> +
> +	return desc;
> +}
> +
> +static u16 alloc_id_packed(struct vring_virtqueue *vq)
> +{
> +	u16 id;
> +
> +	id = idr_alloc(&vq->buffer_id, NULL, 0, vq->vring_packed.num,
> +		       GFP_KERNEL);
> +	return id;
> +}
> +
> +static void free_id_packed(struct vring_virtqueue *vq, u16 id)
> +{
> +	idr_remove(&vq->buffer_id, id);
> +}
> +
>   static inline int virtqueue_add_packed(struct virtqueue *_vq,
>   				       struct scatterlist *sgs[],
>   				       unsigned int total_sg,
> @@ -750,47 +819,446 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>   				       void *ctx,
>   				       gfp_t gfp)
>   {
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	struct vring_packed_desc *desc;
> +	struct scatterlist *sg;
> +	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> +	__virtio16 uninitialized_var(head_flags), flags;
> +	u16 head, wrap_counter, id;
> +	bool indirect;
> +
> +	START_USE(vq);
> +
> +	BUG_ON(data == NULL);
> +	BUG_ON(ctx && vq->indirect);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return -EIO;
> +	}
> +
> +#ifdef DEBUG
> +	{
> +		ktime_t now = ktime_get();
> +
> +		/* No kick or get, with .1 second between?  Warn. */
> +		if (vq->last_add_time_valid)
> +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> +					    > 100);
> +		vq->last_add_time = now;
> +		vq->last_add_time_valid = true;
> +	}
> +#endif
> +
> +	BUG_ON(total_sg == 0);
> +
> +	head = vq->next_avail_idx;
> +	wrap_counter = vq->wrap_counter;
> +
> +	if (virtqueue_use_indirect(_vq, total_sg))
> +		desc = alloc_indirect_packed(_vq, total_sg, gfp);
> +	else {
> +		desc = NULL;
> +		WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
> +	}
> +
> +	if (desc) {
> +		/* Use a single buffer which doesn't continue */
> +		indirect = true;
> +		/* Set up rest to use this indirect table. */
> +		i = 0;
> +		descs_used = 1;
> +	} else {
> +		indirect = false;
> +		desc = vq->vring_packed.desc;
> +		i = head;
> +		descs_used = total_sg;
> +	}
> +
> +	if (vq->vq.num_free < descs_used) {
> +		pr_debug("Can't add buf len %i - avail = %i\n",
> +			 descs_used, vq->vq.num_free);
> +		/* FIXME: for historical reasons, we force a notify here if
> +		 * there are outgoing parts to the buffer.  Presumably the
> +		 * host should service the ring ASAP. */
> +		if (out_sgs)
> +			vq->notify(&vq->vq);
> +		if (indirect)
> +			kfree(desc);
> +		END_USE(vq);
> +		return -ENOSPC;
> +	}
> +
> +	id = alloc_id_packed(vq);
> +
> +	for (n = 0; n < out_sgs + in_sgs; n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> +					       DMA_TO_DEVICE : DMA_FROM_DEVICE);
> +			if (vring_mapping_error(vq, addr))
> +				goto unmap_release;
> +
> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> +					(n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> +					VRING_DESC_F_USED(!vq->wrap_counter));
> +			if (!indirect && i == head)
> +				head_flags = flags;
> +			else
> +				desc[i].flags = flags;
> +
> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> +			i++;
> +			if (!indirect && i >= vq->vring_packed.num) {
> +				i = 0;
> +				vq->wrap_counter ^= 1;
> +			}
> +		}
> +	}
> +
> +	prev = (i > 0 ? i : vq->vring_packed.num) - 1;
> +	desc[prev].id = cpu_to_virtio16(_vq->vdev, id);
> +
> +	/* Last one doesn't continue. */
> +	if (total_sg == 1)
> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> +	else
> +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev,
> +						~VRING_DESC_F_NEXT);
> +
> +	if (indirect) {
> +		/* Now that the indirect table is filled in, map it. */
> +		dma_addr_t addr = vring_map_single(
> +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> +			DMA_TO_DEVICE);
> +		if (vring_mapping_error(vq, addr))
> +			goto unmap_release;
> +
> +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> +					     VRING_DESC_F_AVAIL(wrap_counter) |
> +					     VRING_DESC_F_USED(!wrap_counter));
> +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev,
> +								   addr);
> +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> +				total_sg * sizeof(struct vring_packed_desc));
> +		vq->vring_packed.desc[head].id = cpu_to_virtio16(_vq->vdev, id);
> +	}
> +
> +	/* We're using some buffers from the free list. */
> +	vq->vq.num_free -= descs_used;
> +
> +	/* Update free pointer */
> +	if (indirect) {
> +		n = head + 1;
> +		if (n >= vq->vring_packed.num) {
> +			n = 0;
> +			vq->wrap_counter ^= 1;
> +		}
> +		vq->next_avail_idx = n;
> +	} else
> +		vq->next_avail_idx = i;
> +
> +	/* Store token and indirect buffer state. */
> +	vq->desc_state[id].num = descs_used;
> +	vq->desc_state[id].data = data;
> +	if (indirect)
> +		vq->desc_state[id].indir_desc = desc;
> +	else
> +		vq->desc_state[id].indir_desc = ctx;
> +
> +	/* A driver MUST NOT make the first descriptor in the list
> +	 * available before all subsequent descriptors comprising
> +	 * the list are made available. */
> +	virtio_wmb(vq->weak_barriers);
> +	vq->vring_packed.desc[head].flags = head_flags;
> +	vq->num_added += descs_used;
> +
> +	pr_debug("Added buffer head %i to %p\n", head, vq);
> +	END_USE(vq);
> +
> +	return 0;
> +
> +unmap_release:
> +	err_idx = i;
> +	i = head;
> +
> +	for (n = 0; n < total_sg; n++) {
> +		if (i == err_idx)
> +			break;
> +		vring_unmap_one_packed(vq, &desc[i]);
> +		i++;
> +		if (!indirect && i >= vq->vring_packed.num)
> +			i = 0;
> +	}
> +
> +	vq->wrap_counter = wrap_counter;
> +
> +	if (indirect)
> +		kfree(desc);
> +
> +	free_id_packed(vq, id);
> +
> +	END_USE(vq);
>   	return -EIO;
>   }
>   
>   static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
>   {
> -	return false;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 flags;
> +	bool needs_kick;
> +	u32 snapshot;
> +
> +	START_USE(vq);
> +	/* We need to expose the new flags value before checking notification
> +	 * suppressions. */
> +	virtio_mb(vq->weak_barriers);
> +
> +	snapshot = *(u32 *)vq->vring_packed.device;
> +	flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
> +
> +#ifdef DEBUG
> +	if (vq->last_add_time_valid) {
> +		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> +					      vq->last_add_time)) > 100);
> +	}
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	needs_kick = (flags != VRING_EVENT_F_DISABLE);
> +	END_USE(vq);
> +	return needs_kick;
> +}
> +
> +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> +			      unsigned int id, void **ctx)
> +{
> +	struct vring_packed_desc *desc;
> +	unsigned int i, j;
> +
> +	/* Clear data ptr. */
> +	vq->desc_state[id].data = NULL;
> +
> +	i = head;
> +
> +	for (j = 0; j < vq->desc_state[id].num; j++) {
> +		desc = &vq->vring_packed.desc[i];
> +		vring_unmap_one_packed(vq, desc);

As mentioned in previous discussion, this probably won't work for the 
case of out of order completion since it depends on the information in 
the descriptor ring. We probably need to extend ctx to record such 
information.

Thanks

> +		i++;
> +		if (i >= vq->vring_packed.num)
> +			i = 0;
> +	}
> +
> +	vq->vq.num_free += vq->desc_state[id].num;
> +
> +	if (vq->indirect) {
> +		u32 len;
> +
> +		/* Free the indirect table, if any, now that it's unmapped. */
> +		desc = vq->desc_state[id].indir_desc;
> +		if (!desc)
> +			goto out;
> +
> +		len = virtio32_to_cpu(vq->vq.vdev,
> +				      vq->vring_packed.desc[head].len);
> +
> +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> +			vring_unmap_one_packed(vq, &desc[j]);
> +
> +		kfree(desc);
> +		vq->desc_state[id].indir_desc = NULL;
> +	} else if (ctx) {
> +		*ctx = vq->desc_state[id].indir_desc;
> +	}
> +
> +out:
> +	free_id_packed(vq, id);
>   }
>   
>   static inline bool more_used_packed(const struct vring_virtqueue *vq)
>   {
> -	return false;
> +	u16 last_used, flags;
> +	bool avail, used;
> +
> +	if (vq->vq.num_free == vq->vring_packed.num)
> +		return false;
> +
> +	last_used = vq->last_used_idx;
> +	flags = virtio16_to_cpu(vq->vq.vdev,
> +				vq->vring_packed.desc[last_used].flags);
> +	avail = flags & VRING_DESC_F_AVAIL(1);
> +	used = flags & VRING_DESC_F_USED(1);
> +
> +	return avail == used;
>   }
>   
>   static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>   					  unsigned int *len,
>   					  void **ctx)
>   {
> -	return NULL;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 last_used, id;
> +	void *ret;
> +
> +	START_USE(vq);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	if (!more_used_packed(vq)) {
> +		pr_debug("No more buffers in queue\n");
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	/* Only get used elements after they have been exposed by host. */
> +	virtio_rmb(vq->weak_barriers);
> +
> +	last_used = vq->last_used_idx;
> +	id = virtio16_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
> +	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
> +
> +	if (unlikely(id >= vq->vring_packed.num)) {
> +		BAD_RING(vq, "id %u out of range\n", id);
> +		return NULL;
> +	}
> +	if (unlikely(!vq->desc_state[id].data)) {
> +		BAD_RING(vq, "id %u is not a head!\n", id);
> +		return NULL;
> +	}
> +
> +	vq->last_used_idx += vq->desc_state[id].num;
> +	if (vq->last_used_idx >= vq->vring_packed.num)
> +		vq->last_used_idx -= vq->vring_packed.num;
> +
> +	/* detach_buf_packed clears data, so grab it now. */
> +	ret = vq->desc_state[id].data;
> +	detach_buf_packed(vq, last_used, id, ctx);
> +
> +#ifdef DEBUG
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	END_USE(vq);
> +	return ret;
>   }
>   
>   static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
>   {
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	if (vq->event_flags_shadow != VRING_EVENT_F_DISABLE) {
> +		vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
> +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->event_flags_shadow);
> +	}
>   }
>   
>   static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
>   {
> -	return 0;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	START_USE(vq);
> +
> +	/* We optimistically turn back on interrupts, then check if there was
> +	 * more to do. */
> +
> +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> +		virtio_wmb(vq->weak_barriers);
> +		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->event_flags_shadow);
> +	}
> +
> +	END_USE(vq);
> +	return vq->last_used_idx;
>   }
>   
>   static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
>   {
> -	return false;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	bool avail, used;
> +	u16 flags;
> +
> +	virtio_mb(vq->weak_barriers);
> +	flags = virtio16_to_cpu(vq->vq.vdev,
> +			vq->vring_packed.desc[last_used_idx].flags);
> +	avail = flags & VRING_DESC_F_AVAIL(1);
> +	used = flags & VRING_DESC_F_USED(1);
> +	return avail == used;
>   }
>   
>   static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>   {
> -	return false;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	START_USE(vq);
> +
> +	/* We optimistically turn back on interrupts, then check if there was
> +	 * more to do. */
> +
> +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> +		virtio_wmb(vq->weak_barriers);
> +		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->event_flags_shadow);
> +	}
> +
> +	if (more_used_packed(vq)) {
> +		END_USE(vq);
> +		return false;
> +	}
> +
> +	END_USE(vq);
> +	return true;
>   }
>   
>   static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
>   {
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 flags, head, id, i;
> +	unsigned int len;
> +	void *buf;
> +
> +	START_USE(vq);
> +
> +	/* Detach the used descriptors. */
> +	if (more_used_packed(vq)) {
> +		buf = virtqueue_get_buf_ctx_packed(_vq, &len, NULL);
> +		END_USE(vq);
> +		return buf;
> +	}
> +
> +	/* Detach the available descriptors. */
> +	for (i = vq->last_used_idx; i != vq->next_avail_idx;
> +			i = (i + 1) % vq->vring_packed.num) {
> +		flags = virtio16_to_cpu(vq->vq.vdev,
> +				vq->vring_packed.desc[i].flags);
> +		while (flags & VRING_DESC_F_NEXT) {
> +			i = (i + 1) % vq->vring_packed.num;
> +			flags = virtio16_to_cpu(vq->vq.vdev,
> +					vq->vring_packed.desc[i].flags);
> +		}
> +		id = virtio16_to_cpu(_vq->vdev, vq->vring_packed.desc[i].id);
> +		if (!vq->desc_state[id].data)
> +			continue;
> +
> +		len = vq->desc_state[id].num - 1;
> +		head = (i < len ? i + vq->vring_packed.num : i) - len;
> +
> +		/* detach_buf clears data, so grab it now. */
> +		buf = vq->desc_state[id].data;
> +		detach_buf_packed(vq, head, id, NULL);
> +		END_USE(vq);
> +		return buf;
> +	}
> +	/* That should have freed everything. */
> +	BUG_ON(vq->vq.num_free != vq->vring_packed.num);
> +
> +	END_USE(vq);
>   	return NULL;
>   }
>   
> @@ -1198,6 +1666,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   		vq->next_avail_idx = 0;
>   		vq->wrap_counter = 1;
>   		vq->event_flags_shadow = 0;
> +		idr_init(&vq->buffer_id);
>   	} else {
>   		vq->vring = vring.vring_split;
>   		vq->avail_flags_shadow = 0;
> @@ -1384,6 +1853,8 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>   					      (void *)vq->vring.desc,
>   				 vq->queue_dma_addr);
>   	}
> +	if (vq->packed)
> +		idr_destroy(&vq->buffer_id);
>   	list_del(&_vq->list);
>   	kfree(vq);
>   }

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox