Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] ipv6: fix general protection fault in fib6_add()
From: David Miller @ 2018-01-04 19:30 UTC (permalink / raw)
  To: weiwan; +Cc: netdev, kafai, dsahern
In-Reply-To: <20180103221159.159648-1-tracywwnj@gmail.com>

From: Wei Wang <weiwan@google.com>
Date: Wed,  3 Jan 2018 14:11:59 -0800

> From: Wei Wang <weiwan@google.com>
> 
> In fib6_add(), pn could be NULL if fib6_add_1() failed to return a fib6
> node. Checking pn != fn before accessing pn->leaf makes sure pn is not
> NULL.
> This fixes the following GPF reported by syzkaller:
 ...
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
> Signed-off-by: Wei Wang <weiwan@google.com>

Applied, thanks.

^ permalink raw reply

* Re: uapi/if_ether.h: prevent redefinition of struct ethhdr
From: David Miller @ 2018-01-04 19:31 UTC (permalink / raw)
  To: hauke; +Cc: netdev, linux-api, musl, felix.janda, f.fainelli
In-Reply-To: <20180103221421.8273-1-hauke@hauke-m.de>

From: Hauke Mehrtens <hauke@hauke-m.de>
Date: Wed,  3 Jan 2018 23:14:21 +0100

> Musl provides its own ethhdr struct definition. Add a guard to prevent
> its definition of the appropriate musl header has already been included.
> 
> glibc does not implement this header, but when glibc will implement this
> they can just define __UAPI_DEF_ETHHDR 0 to make it work with the
> kernel.
> 
> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next v3 1/3] virtio_net: propagate linkspeed/duplex settings from the hypervisor
From: Jason Baron @ 2018-01-04 19:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: davem, jasowang, netdev, virtualization, qemu-devel, virtio-dev
In-Reply-To: <20180104201934-mutt-send-email-mst@kernel.org>



On 01/04/2018 01:22 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 04, 2018 at 01:12:30PM -0500, Jason Baron wrote:
>>
>>
>> On 01/04/2018 12:05 PM, Michael S. Tsirkin wrote:
>>> On Thu, Jan 04, 2018 at 12:16:44AM -0500, Jason Baron wrote:
>>>> The ability to set speed and duplex for virtio_net is useful in various
>>>> scenarios as described here:
>>>>
>>>> 16032be virtio_net: add ethtool support for set and get of settings
>>>>
>>>> However, it would be nice to be able to set this from the hypervisor,
>>>> such that virtio_net doesn't require custom guest ethtool commands.
>>>>
>>>> Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
>>>> the hypervisor to export a linkspeed and duplex setting. The user can
>>>> subsequently overwrite it later if desired via: 'ethtool -s'.
>>>>
>>>> Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention
>>>> is that device feature bits are to grow down from bit 63, since the
>>>> transports are starting from bit 24 and growing up.
>>>>
>>>> Signed-off-by: Jason Baron <jbaron@akamai.com>
>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Cc: virtio-dev@lists.oasis-open.org
>>>> ---
>>>>  drivers/net/virtio_net.c        | 19 ++++++++++++++++++-
>>>>  include/uapi/linux/virtio_net.h | 13 +++++++++++++
>>>>  2 files changed, 31 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 6fb7b65..0b2d314 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -2146,6 +2146,22 @@ static void virtnet_config_changed_work(struct work_struct *work)
>>>>  
>>>>  	vi->status = v;
>>>>  
>>>> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX)) {
>>>
>>> BTW we can avoid this read for when link goes down.
>>> Not a big deal but still.
>>
>> So you are saying that we can just set vi->speed and vi->duplex to
>> 'unknown' when the link goes down and not check for the presence of
>> VIRTIO_NET_F_SPEED_DUPLEX?
>>
>> If so, that could over-write what the user may have configured in the
>> guest via 'ethtool -s' when the link goes down, so that would be a
>> change in behavior, but perhaps that is ok?
> 
> No - what I am saying is that your patch overwrites the values
> set by user when link goes down.
> 
> I suggest limiting this call to when
> 
> if (vi->status & VIRTIO_NET_S_LINK_UP)
> 
> and then the values are overwritten when link goes up
> which seems closer to what a user might expect.
ok, will update this for v4.

Thanks,

-Jason

> 
>>
>> I think I would prefer to have the link down event still check for
>> VIRTIO_NET_F_SPEED_DUPLEX before changing speed/duplex. That way we
>> still have 2 modes for updating the fields:
>>
>> 1) completely guest controlled. Same as we have now and host does not
>> change any values and does not set VIRTIO_NET_F_SPEED_DUPLEX flag (hence
>> don't remove above check).
>>
>> 2) if speed or duplex or speed is set in the qemu command line, then set
>> the VIRTIO_NET_F_SPEED_DUPLEX and have host control the settings of
>> speed/duplex (with ability of guest to over-write if it wanted to).
>>
>>
>>>
>>>> +		u32 speed;
>>>> +		u8 duplex;
>>>> +
>>>> +		speed = virtio_cread32(vi->vdev,
>>>> +				       offsetof(struct virtio_net_config,
>>>> +						speed));
>>>> +		if (ethtool_validate_speed(speed))
>>>> +			vi->speed = speed;
>>>> +		duplex = virtio_cread8(vi->vdev,
>>>> +				       offsetof(struct virtio_net_config,
>>>> +						duplex));
>>>> +		if (ethtool_validate_duplex(duplex))
>>>> +			vi->duplex = duplex;
>>>> +	}
>>>> +
>>>>  	if (vi->status & VIRTIO_NET_S_LINK_UP) {
>>>>  		netif_carrier_on(vi->dev);
>>>>  		netif_tx_wake_all_queues(vi->dev);
>>>
>>> OK so this handles the case when VIRTIO_NET_F_STATUS is set,
>>> but when it's clear we need to call this from virtnet_probe.
>>>
>>> I propose moving this chunk to a function and calling from two places.
>>>
>>
>> good point. will update.
>>
>> Thanks,
>>
>> -Jason
>>
>>>
>>>> @@ -2796,7 +2812,8 @@ static struct virtio_device_id id_table[] = {
>>>>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>>>>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>>>>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>>>> -	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>>>> +	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>>>> +	VIRTIO_NET_F_SPEED_DUPLEX
>>>>  
>>>>  static unsigned int features[] = {
>>>>  	VIRTNET_FEATURES,
>>>> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
>>>> index fc353b5..5de6ed3 100644
>>>> --- a/include/uapi/linux/virtio_net.h
>>>> +++ b/include/uapi/linux/virtio_net.h
>>>> @@ -57,6 +57,8 @@
>>>>  					 * Steering */
>>>>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>>>>  
>>>> +#define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
>>>> +
>>>>  #ifndef VIRTIO_NET_NO_LEGACY
>>>>  #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
>>>>  #endif /* VIRTIO_NET_NO_LEGACY */
>>>> @@ -76,6 +78,17 @@ struct virtio_net_config {
>>>>  	__u16 max_virtqueue_pairs;
>>>>  	/* Default maximum transmit unit advice */
>>>>  	__u16 mtu;
>>>> +	/*
>>>> +	 * speed, in units of 1Mb. All values 0 to INT_MAX are legal.
>>>> +	 * Any other value stands for unknown.
>>>> +	 */
>>>> +	__u32 speed;
>>>> +	/*
>>>> +	 * 0x00 - half duplex
>>>> +	 * 0x01 - full duplex
>>>> +	 * Any other value stands for unknown.
>>>> +	 */
>>>> +	__u8 duplex;
>>>>  } __attribute__((packed));
>>>>  
>>>>  /*
>>>> -- 
>>>> 2.6.1

^ permalink raw reply

* Re: BUG: free active (active state 0) object type: work_struct hint: strp_work
From: Tom Herbert @ 2018-01-04 19:36 UTC (permalink / raw)
  To: syzbot
  Cc: David S . Miller, Eric Biggers, John Fastabend, linux-kernel,
	Linux Kernel Network Developers, syzkaller-bugs, xiyou.wangcong
In-Reply-To: <001a114aca7419fa410561f23992@google.com>

On Thu, Jan 4, 2018 at 4:10 AM, syzbot
<syzbot+3c6c745b0d2f341bbf50@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzkaller hit the following crash on
> 6bb8824732f69de0f233ae6b1a8158e149627b38
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> Unfortunately, I don't have any reproducer for this bug yet.
>
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+3c6c745b0d2f341bbf50@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> Use struct sctp_assoc_value instead
> sctp: [Deprecated]: syz-executor4 (pid 12483) Use of int in maxseg socket
> option.
> Use struct sctp_assoc_value instead
> ------------[ cut here ]------------
> ODEBUG: free active (active state 0) object type: work_struct hint:
> strp_work+0x0/0xf0 net/strparser/strparser.c:381
> WARNING: CPU: 1 PID: 3502 at lib/debugobjects.c:291
> debug_print_object+0x166/0x220 lib/debugobjects.c:288
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 1 PID: 3502 Comm: kworker/u4:4 Not tainted 4.15.0-rc5+ #170
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: kkcmd kcm_tx_work
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1061
> RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
> RSP: 0018:ffff8801c0ee7068 EFLAGS: 00010086
> RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff8159bc3e
> RDX: 0000000000000000 RSI: 1ffff100381dcdc8 RDI: ffff8801db317dd0
> RBP: ffff8801c0ee70a8 R08: 0000000000000000 R09: 1ffff100381dcd9a
> R10: ffffed00381dce3c R11: ffffffff86137ad8 R12: 0000000000000001
> R13: ffffffff86113480 R14: ffffffff8560dc40 R15: ffffffff8146e5f0
>  __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
>  debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
>  kmem_cache_free+0x253/0x2a0 mm/slab.c:3745

I believe we just need to defer kmem_cache_free to call_rcu.

Tom

>  unreserve_psock+0x5a1/0x780 net/kcm/kcmsock.c:547
>  kcm_write_msgs+0xbae/0x1b80 net/kcm/kcmsock.c:590
>  kcm_tx_work+0x2e/0x190 net/kcm/kcmsock.c:731
>  process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:515
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.15.0-rc5+ #170 Not tainted
> ------------------------------------------------------
> kworker/u4:4/3502 is trying to acquire lock:
>  ((console_sem).lock){-.-.}, at: [<0000000091214b42>] down_trylock+0x13/0x70
> kernel/locking/semaphore.c:136
>
> but task is already holding lock:
>  (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> __debug_check_no_obj_freed lib/debugobjects.c:736 [inline]
>  (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> debug_check_no_obj_freed+0x1e9/0xf1f lib/debugobjects.c:774
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&obj_hash[i].lock){-.-.}:
>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>        __debug_object_init+0x109/0x1040 lib/debugobjects.c:343
>        debug_object_init+0x17/0x20 lib/debugobjects.c:391
>        debug_hrtimer_init kernel/time/hrtimer.c:396 [inline]
>        debug_init kernel/time/hrtimer.c:441 [inline]
>        hrtimer_init+0x8c/0x410 kernel/time/hrtimer.c:1122
>        init_dl_task_timer+0x1b/0x50 kernel/sched/deadline.c:1023
>        __sched_fork+0x2c4/0xb70 kernel/sched/core.c:2188
>        init_idle+0x75/0x820 kernel/sched/core.c:5279
>        sched_init+0xb19/0xc43 kernel/sched/core.c:5976
>        start_kernel+0x452/0x819 init/main.c:582
>        x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:378
>        x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:359
>        secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:237
>
> -> #2 (&rq->lock){-.-.}:
>        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>        _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
>        rq_lock kernel/sched/sched.h:1766 [inline]
>        task_fork_fair+0x7a/0x690 kernel/sched/fair.c:9449
>        sched_fork+0x435/0xc00 kernel/sched/core.c:2404
>        copy_process.part.38+0x174b/0x4b20 kernel/fork.c:1722
>        copy_process kernel/fork.c:1565 [inline]
>        _do_fork+0x1f7/0xfe0 kernel/fork.c:2044
>        kernel_thread+0x34/0x40 kernel/fork.c:2106
>        rest_init+0x22/0xf0 init/main.c:401
>        start_kernel+0x7f1/0x819 init/main.c:713
>        x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:378
>        x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:359
>        secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:237
>
> -> #1 (&p->pi_lock){-.-.}:
>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>        try_to_wake_up+0xbc/0x1600 kernel/sched/core.c:1988
>        wake_up_process+0x10/0x20 kernel/sched/core.c:2151
>        __up.isra.0+0x1cc/0x2c0 kernel/locking/semaphore.c:262
>        up+0x13b/0x1d0 kernel/locking/semaphore.c:187
>        __up_console_sem+0xb2/0x1a0 kernel/printk/printk.c:245
>        console_unlock+0x538/0xd80 kernel/printk/printk.c:2248
>        do_con_write+0x106e/0x1f70 drivers/tty/vt/vt.c:2433
>        con_write+0x25/0xb0 drivers/tty/vt/vt.c:2782
>        do_output_char+0x4d9/0x7a0 drivers/tty/n_tty.c:431
>        process_output drivers/tty/n_tty.c:498 [inline]
>        n_tty_write+0x68d/0xec0 drivers/tty/n_tty.c:2314
>        do_tty_write drivers/tty/tty_io.c:949 [inline]
>        tty_write+0x3fa/0x840 drivers/tty/tty_io.c:1033
>        __vfs_write+0xef/0x970 fs/read_write.c:480
>        vfs_write+0x189/0x510 fs/read_write.c:544
>        SYSC_write fs/read_write.c:589 [inline]
>        SyS_write+0xef/0x220 fs/read_write.c:581
>        entry_SYSCALL_64_fastpath+0x1f/0x96
>
> -> #0 ((console_sem).lock){-.-.}:
>        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>        down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
>        __down_trylock_console_sem+0xa2/0x1e0 kernel/printk/printk.c:228
>        console_trylock+0x15/0x100 kernel/printk/printk.c:2065
>        vprintk_emit+0x49b/0x590 kernel/printk/printk.c:1756
>        vprintk_default+0x28/0x30 kernel/printk/printk.c:1796
>        vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
>        printk+0xaa/0xca kernel/printk/printk.c:1829
>        __warn_printk+0x90/0xf0 kernel/panic.c:599
>        debug_print_object+0x166/0x220 lib/debugobjects.c:288
>        __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
>        debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
>        kmem_cache_free+0x253/0x2a0 mm/slab.c:3745
>        unreserve_psock+0x5a1/0x780 net/kcm/kcmsock.c:547
>        kcm_write_msgs+0xbae/0x1b80 net/kcm/kcmsock.c:590
>        kcm_tx_work+0x2e/0x190 net/kcm/kcmsock.c:731
>        process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
>        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>        kthread+0x33c/0x400 kernel/kthread.c:238
>        ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:515
>
> other info that might help us debug this:
>
> Chain exists of:
>   (console_sem).lock --> &rq->lock --> &obj_hash[i].lock
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&obj_hash[i].lock);
>                                lock(&rq->lock);
>                                lock(&obj_hash[i].lock);
>   lock((console_sem).lock);
>
>  *** DEADLOCK ***
>
> 5 locks held by kworker/u4:4/3502:
>  #0:  ((wq_completion)"%s""kkcmd"){+.+.}, at: [<0000000030be6056>]
> process_one_work+0xaaf/0x1b10 kernel/workqueue.c:2083
>  #1:  ((work_completion)(&kcm->tx_work)){+.+.}, at: [<0000000019ffb03c>]
> process_one_work+0xb01/0x1b10 kernel/workqueue.c:2087
>  #2:  (sk_lock-AF_KCM){+.+.}, at: [<0000000077d44615>] lock_sock
> include/net/sock.h:1462 [inline]
>  #2:  (sk_lock-AF_KCM){+.+.}, at: [<0000000077d44615>]
> kcm_tx_work+0x26/0x190 net/kcm/kcmsock.c:726
>  #3:  (&(&mux->lock)->rlock){+...}, at: [<00000000c908a2e7>] spin_lock_bh
> include/linux/spinlock.h:315 [inline]
>  #3:  (&(&mux->lock)->rlock){+...}, at: [<00000000c908a2e7>]
> unreserve_psock+0x9e/0x780 net/kcm/kcmsock.c:521
>  #4:  (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> __debug_check_no_obj_freed lib/debugobjects.c:736 [inline]
>  #4:  (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> debug_check_no_obj_freed+0x1e9/0xf1f lib/debugobjects.c:774
>
> stack backtrace:
> CPU: 1 PID: 3502 Comm: kworker/u4:4 Not tainted 4.15.0-rc5+ #170
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: kkcmd kcm_tx_work
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_circular_bug.isra.37+0x2cd/0x2dc kernel/locking/lockdep.c:1218
>  check_prev_add kernel/locking/lockdep.c:1858 [inline]
>  check_prevs_add kernel/locking/lockdep.c:1971 [inline]
>  validate_chain kernel/locking/lockdep.c:2412 [inline]
>  __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3426
>  lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
>  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>  _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>  down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
>  __down_trylock_console_sem+0xa2/0x1e0 kernel/printk/printk.c:228
>  console_trylock+0x15/0x100 kernel/printk/printk.c:2065
>  vprintk_emit+0x49b/0x590 kernel/printk/printk.c:1756
>  vprintk_default+0x28/0x30 kernel/printk/printk.c:1796
>  vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
>  printk+0xaa/0xca kernel/printk/printk.c:1829
>  __warn_printk+0x90/0xf0 kernel/panic.c:599
>  debug_print_object+0x166/0x220 lib/debugobjects.c:288
>  __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
>  debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
>  kmem_cache_free+0x253/0x2a0 mm/slab.c:3745
>  unreserve_psock+0x5a1/0x780 net/kcm/kcmsock.c:547
>  kcm_write_msgs+0xbae/0x1b80 net/kcm/kcmsock.c:590
>  kcm_tx_work+0x2e/0x190 net/kcm/kcmsock.c:731
>  process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:515
> Shutting down cpus with NMI
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Re: [PATCH v5 1/3] clocksource/drivers/atcpit100: Add andestech atcpit100 timer
From: Daniel Lezcano @ 2018-01-04 19:48 UTC (permalink / raw)
  To: Greentime Hu
  Cc: Greentime, Rick Chen, Rick Chen, Linux Kernel Mailing List,
	Arnd Bergmann, Linus Walleij, linux-arch, Thomas Gleixner,
	Jason Cooper, Marc Zyngier, Rob Herring, netdev, Vincent Chen,
	DTML, Al Viro, David Howells, Will Deacon, linux-serial
In-Reply-To: <CAEbi=3c4yAx7Tejz8aPw3-3vj89t8cg-zNEY-vbF7KaMo8OM0Q@mail.gmail.com>

On 04/01/2018 15:06, Greentime Hu wrote:
> Hi, Daniel:
> 
> 2018-01-04 21:50 GMT+08:00 Daniel Lezcano <daniel.lezcano@linaro.org>:
>>
>> Hi,
>>
>> sorry I missed your answer. Comments below.
>>
>> On 13/12/2017 07:06, Greentime Hu wrote:
>>> Hi, Daniel:
>>>
>>> 2017-12-12 18:05 GMT+08:00 Daniel Lezcano <daniel.lezcano@linaro.org>:
>>>> On 12/12/2017 06:46, Rick Chen wrote:
>>>>> ATCPIT100 is often used on the Andes architecture,
>>>>> This timer provide 4 PIT channels. Each PIT channel is a
>>>>> multi-function timer, can be configured as 32,16,8 bit timers
>>>>> or PWM as well.
>>>>>
>>>>> For system timer it will set channel 1 32-bit timer0 as clock
>>>>> source and count downwards until underflow and restart again.
>>>>
>>>> [ ... ]
>>>>
>>>>> +config CLKSRC_ATCPIT100
>>>>> +     bool "Clocksource for AE3XX platform"
>>>>> +     depends on NDS32 || COMPILE_TEST
>>>>> +     depends on HAS_IOMEM
>>>>> +     help
>>>>> +       This option enables support for the Andestech AE3XX platform timers.
>>>>
>>>> Hi Rick,
>>>>
>>>> the general rule for the Kconfig is:
>>>>
>>>> bool "Clocksource for AE3XX platform" if COMPILE_TEST

BTW, select TIMER_OF is missing.

>>>> and no deps on the platform.
>>>>
>>>> It is up to the platform Kconfig to select the option.
>>>>
>>>> We want here a silent option but make it selectable in case of
>>>> compilation test coverage.
>>>
>>>
>>> The way we like to use it is because
>>> 1. This timer is a basic component to boot an nds32 CPU and it should
>>> be able to select without COMPILE_TEST for nds32 architecture.
>>
>> Yes, so you don't need it to be selectable, you must select it from the
>> platform's Kconfig.
> 
> I am not sure that I get your point or not.
> We don't have a CONFIG_PLAT_AE3XX.
> Do you mean we should create one and select CLKSRC_ATCPIT100 under
> CONFIG_PLAT_AE3XX?

No. Can't you add in arch/ndis32/Kconfig ?

+select TIMER_ATCPIT100

Like:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/Kconfig#n50


>>> 2. It seems conflict with debug info. I am not sure if there is
>>> another way to debug kernel(with debug info) with COMPILE_TEST and
>>> DEBUG_INFO because we need this driver for nds32 architecture.
>>>
>>> Symbol: DEBUG_INFO [=n]
>>> Type  : boolean
>>> Prompt: Compile the kernel with debug info
>>>   Location:
>>>     -> Kernel hacking
>>>       -> Compile-time checks and compiler options
>>>   Defined at lib/Kconfig.debug:140
>>>   Depends on: DEBUG_KERNEL [=y] && !COMPILE_TEST [=n]
>>
>> The COMPILE_TEST option is only there to allow cross-compilation test
>> coverage, it does not select or unselect the driver in usual way.
>>
>> If the COMPILE_TEST is enabled, then the option will appear in the
>> menuconfig, so that gives the opportunity to select/unselect it.
>>
>> Otherwise, the Kconfig's platform selects automatically the driver and
>> the user *can't* unselect it from the menuconfig as it is a silent
>> option and that is certainly what you want.
>>
>>>> Also, this driver is not a CLKSRC but a TIMER. Rename CLKSRC_ATCPIT100
>>>> to TIMER_ATCPIT100.
>>>
>>> Thanks. We will rename it in the next version patch.
>>
>> You just resend an entire series V5 for the architecture. I'm confused,
>> what is the merging path ?
> 
> Sorry. I didn't get your point.
> We sent the timer patch and the architecture patch together because it
> would be easier for reviewer to check the vdso implementations.
> What do you mean about the merging path?

I received a [Vx y/3] series and I received a [Vx y/39].

The former from Rick Chen means to me "please pick them through your tree".

The latter from you means to me "can you ack the patches so I can merge
them through my tree". Note you will have to resend the entire arch
series for every single review/comment (that could end up upset the
Cc'ed people).

Which one should I review ? I can not track different patchset
implementing the same thing. Which one should I comment, review ? Are
the comments I did on [Vx y/3] taken into account in the arch series ?
etc ...

Please clarify, it is confusing and impossible to review in this situation.

I suggest we stick to the x/3 series, so I can comment it and you can
resend a new version without resending the entire arch series. So I can
merge it through my tree, and you get it via eg. a shared immutable
branch. The arch series will be reduced by 3 patches.

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply

* Re: [PATCH] nfp: add basic multicast filtering
From: Simon Horman @ 2018-01-04 19:52 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20180104151019.18398-1-simon.horman@netronome.com>

On Thu, Jan 04, 2018 at 04:10:19PM +0100, Simon Horman wrote:
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> 
> We currently always pass all multicast traffic through.
> Only set L2MC when actually needed.  Since the driver
> was not making use of the capability to filter out mcast
> frames, some FW projects don't implement it any more.
> Don't warn users if capability is not present (like we
> do for promisc flag).  The lack of L2MC capability is
> assumed to mean all multicast traffic goes through.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>

Apologies, I somehow forgot to mark this as being for net-next.

> ---
>  drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> index 0add4870ce2e..29c0947f6d70 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> @@ -2850,6 +2850,11 @@ static void nfp_net_set_rx_mode(struct net_device *netdev)
>  
>  	new_ctrl = nn->dp.ctrl;
>  
> +	if (!netdev_mc_empty(netdev) || netdev->flags & IFF_ALLMULTI)
> +		new_ctrl |= nn->cap & NFP_NET_CFG_CTRL_L2MC;
> +	else
> +		new_ctrl &= ~NFP_NET_CFG_CTRL_L2MC;
> +
>  	if (netdev->flags & IFF_PROMISC) {
>  		if (nn->cap & NFP_NET_CFG_CTRL_PROMISC)
>  			new_ctrl |= NFP_NET_CFG_CTRL_PROMISC;
> @@ -3787,8 +3792,6 @@ int nfp_net_init(struct nfp_net *nn)
>  	/* Allow L2 Broadcast and Multicast through by default, if supported */
>  	if (nn->cap & NFP_NET_CFG_CTRL_L2BC)
>  		nn->dp.ctrl |= NFP_NET_CFG_CTRL_L2BC;
> -	if (nn->cap & NFP_NET_CFG_CTRL_L2MC)
> -		nn->dp.ctrl |= NFP_NET_CFG_CTRL_L2MC;
>  
>  	/* Allow IRQ moderation, if supported */
>  	if (nn->cap & NFP_NET_CFG_CTRL_IRQMOD) {
> -- 
> 2.11.0
> 

^ permalink raw reply

* [RFC -next 0/2] sched: red offload and TC stats
From: Jakub Kicinski @ 2018-01-04 20:18 UTC (permalink / raw)
  To: john.fastabend, jiri, xiyou.wangcong, Nogah Frankel, Yuval Mintz
  Cc: netdev, oss-drivers, edumazet, Jakub Kicinski

Hi!

I've been playing with TC qdisc offloads.  I came to a little snag
with the momentary stats (qlen and backlog).  It seems there is a
little bit of legacy magic around the backlog while qlen needs
changes to be reported.

I would appreciate any comments and guidance.

Jakub Kicinski (2):
  net: sched: add qstats.qlen to qlen
  net: sched: red: don't reset the backlog on every stat dump

 drivers/net/ethernet/mellanox/mlxsw/spectrum.h       | 1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c | 9 +++++++--
 include/net/sch_generic.h                            | 4 ++--
 net/sched/sch_red.c                                  | 1 -
 4 files changed, 10 insertions(+), 5 deletions(-)

-- 
2.15.1

^ permalink raw reply

* [RFC -next 1/2] net: sched: add qstats.qlen to qlen
From: Jakub Kicinski @ 2018-01-04 20:18 UTC (permalink / raw)
  To: john.fastabend, jiri, xiyou.wangcong, Nogah Frankel, Yuval Mintz
  Cc: netdev, oss-drivers, edumazet, Jakub Kicinski
In-Reply-To: <20180104201851.6455-1-jakub.kicinski@netronome.com>

AFAICT struct gnet_stats_queue.qlen is not used in Qdiscs.
It may, however, be useful for offloads to report HW queue
length there.  Add that value to the result of qdisc_qlen_sum().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/sch_generic.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index ac029d5d88e4..5d8735c0af11 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -310,14 +310,14 @@ static inline int qdisc_qlen(const struct Qdisc *q)
 
 static inline int qdisc_qlen_sum(const struct Qdisc *q)
 {
-	__u32 qlen = 0;
+	__u32 qlen = q->qstats.qlen;
 	int i;
 
 	if (q->flags & TCQ_F_NOLOCK) {
 		for_each_possible_cpu(i)
 			qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
 	} else {
-		qlen = q->q.qlen;
+		qlen += q->q.qlen;
 	}
 
 	return qlen;
-- 
2.15.1

^ permalink raw reply related

* [RFC -next 2/2] net: sched: red: don't reset the backlog on every stat dump
From: Jakub Kicinski @ 2018-01-04 20:18 UTC (permalink / raw)
  To: john.fastabend, jiri, xiyou.wangcong, Nogah Frankel, Yuval Mintz
  Cc: netdev, oss-drivers, edumazet, Jakub Kicinski
In-Reply-To: <20180104201851.6455-1-jakub.kicinski@netronome.com>

Commit 0dfb33a0d7e2 ("sch_red: report backlog information") copied
child's backlog into RED's backlog.  Back then RED did not maintain
its own backlog counts.  This has changed after commit 2ccccf5fb43f
("net_sched: update hierarchical backlog too") and commit d7f4f332f082
("sch_red: update backlog as well").  Copying is no longer necessary.

Tested:

$ tc -s qdisc show dev veth0
qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn
 Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 1260b 14p requeues 14
  marked 0 early 0 pdrop 0 other 0
qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
 Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
 backlog 1260b 14p requeues 14

Recently RED offload was added.  We need to make sure drivers don't
depend on resetting the stats.  This means backlog should be treated
like any other statistic:

  total_stat = new_hw_stat - prev_hw_stat;

Unlike for other statistics new_hw_stat < prev_hw_stat can be true.
Adjust mlxsw.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h       | 1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c | 9 +++++++--
 net/sched/sch_red.c                                  | 1 -
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index ff8d32bc852c..6755050e4ee0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -237,6 +237,7 @@ struct mlxsw_sp_qdisc {
 	u64 tx_packets;
 	u64 drops;
 	u64 overlimits;
+	u64 backlog;
 };
 
 /* No need an internal lock; At worse - miss a single periodic iteration */
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
index c33beac5def0..d5091740bd40 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
@@ -212,6 +212,7 @@ mlxsw_sp_qdisc_get_red_stats(struct mlxsw_sp_port *mlxsw_sp_port, u32 handle,
 	u64 tx_bytes, tx_packets, overlimits, drops;
 	struct mlxsw_sp_port_xstats *xstats;
 	struct rtnl_link_stats64 *stats;
+	s64 backlog;
 
 	if (mlxsw_sp_qdisc->handle != handle ||
 	    mlxsw_sp_qdisc->type != MLXSW_SP_QDISC_RED)
@@ -226,17 +227,21 @@ mlxsw_sp_qdisc_get_red_stats(struct mlxsw_sp_port *mlxsw_sp_port, u32 handle,
 		     mlxsw_sp_qdisc->overlimits;
 	drops = xstats->wred_drop[tclass_num] + xstats->tail_drop[tclass_num] -
 		mlxsw_sp_qdisc->drops;
+	backlog = mlxsw_sp_cells_bytes(mlxsw_sp_port->mlxsw_sp,
+				       xstats->backlog[tclass_num]) -
+		mlxsw_sp_qdisc->backlog;
 
 	_bstats_update(res->bstats, tx_bytes, tx_packets);
 	res->qstats->overlimits += overlimits;
 	res->qstats->drops += drops;
-	res->qstats->backlog += mlxsw_sp_cells_bytes(mlxsw_sp_port->mlxsw_sp,
-						xstats->backlog[tclass_num]);
+	res->qstats->backlog +=	backlog;
 
 	mlxsw_sp_qdisc->drops +=  drops;
 	mlxsw_sp_qdisc->overlimits += overlimits;
 	mlxsw_sp_qdisc->tx_bytes += tx_bytes;
 	mlxsw_sp_qdisc->tx_packets += tx_packets;
+	mlxsw_sp_qdisc->backlog += backlog;
+
 	return 0;
 }
 
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index a392eaa4a0b4..caebb7e37551 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -322,7 +322,6 @@ static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
 	};
 	int err;
 
-	sch->qstats.backlog = q->qdisc->qstats.backlog;
 	err = red_dump_offload_stats(sch, &opt);
 	if (err)
 		goto nla_put_failure;
-- 
2.15.1

^ permalink raw reply related

* Re: [net 02/14] Revert "mlx5: move affinity hints assignments to generic code"
From: Saeed Mahameed @ 2018-01-04 20:19 UTC (permalink / raw)
  To: Sagi Grimberg, Saeed Mahameed
  Cc: David S. Miller, Linux Netdev List, Thomas Gleixner, Jes Sorensen
In-Reply-To: <95a5c181-79fa-da60-1c94-766e19e97dd3@grimberg.me>



On 12/26/2017 9:14 AM, Sagi Grimberg wrote:
> 
>> Are you sure it won't get populated at all  ? even if you manually set
>> IRQ affinity via sysfs ?
> 
> Yes, the msi_desc affinity is not initialized without the affinity
> descriptor passed (which is what PCI_IRQ_AFFINITY is for).
> 
>> Anyway we can implement this driver helper function to return the IRQ
>> affinity hint stored in the driver:
>>   "cpumask_first(mdev->priv.irq_info[vector].mask);"
> 
> minus the cpumask_first, but yea. Please send a new patch so we can
> test it out.

Actually using mdev->priv.irq_info[vector].mask is wrong since it only 
gives the initial hint and not the current actual affinity mask.

I found a better way to address this and return the actual dynamic 
affinity of an interrupt vector.

I will send a patch to net soon.

^ permalink raw reply

* [net-next 00/10] net: create dynamic software irq moderation library
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek

From: Andy Gospodarek <gospo@broadcom.com>

This converts the dynamic interrupt moderation library from the mlx5_en driver
into a library so it can be used by any driver.  The penultimatepatch in this
set adds support for interrupt moderation in the bnxt_en driver and the last
patch creates an entry in the MAINTAINERS file.  

The main purpose of this code in the mlx5_en driver is to allow an
administrator to make sure that default coalesce settings are optimized
for low latency, but quickly adapt to handle high throughput traffic and
optimize how many packets are received during each napi poll.

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver

Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch and Tal Gilboa and Or Gerlitz for their
comments, etc on this set.

Andy Gospodarek (10):
  net/mlx5e: move interrupt moderation structs to new file
  net/mlx5e: move interrupt moderation forward declarations
  net/mlx5e: remove rq references in mlx5e_rx_am
  net/mlx5e: move AM logic enums
  net/mlx5e: move generic functions to new file
  net/mlx5e: change Mellanox references in DIM code
  net: move dynamic interrpt coalescing code to include/linux
  net/dim: use struct net_dim_sample as arg to net_dim
  bnxt_en: add support for software dynamic interrupt moderation
  MAINTAINERS: add entry for Dynamic Interrupt Moderation

 MAINTAINERS                                        |   5 +
 drivers/net/ethernet/broadcom/bnxt/Makefile        |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  52 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h          |  34 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c      |  32 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |  12 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  46 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  49 +++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  32 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 341 -------------------
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  10 +-
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 108 ++++++
 include/linux/mlx5/mlx5_ifc.h                      |   6 -
 include/linux/net_dim.h                            | 372 +++++++++++++++++++++
 17 files changed, 693 insertions(+), 426 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
 create mode 100644 include/linux/net_dim.h

-- 
2.7.4

^ permalink raw reply

* [net-next 01/10] net/mlx5e: move interrupt moderation structs to new file
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h     | 33 +----------
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 75 ++++++++++++++++++++++++
 include/linux/mlx5/mlx5_ifc.h                    |  6 --
 3 files changed, 76 insertions(+), 38 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 543060c..ddb5429 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,6 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
+#include "en_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
@@ -226,12 +227,6 @@ enum mlx5e_priv_flag {
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #endif
 
-struct mlx5e_cq_moder {
-	u16 usec;
-	u16 pkts;
-	u8 cq_period_mode;
-};
-
 struct mlx5e_params {
 	u8  log_sq_size;
 	u8  rq_wq_type;
@@ -472,32 +467,6 @@ struct mlx5e_mpw_info {
 	u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE];
 };
 
-struct mlx5e_rx_am_stats {
-	int ppms; /* packets per msec */
-	int bpms; /* bytes per msec */
-	int epms; /* events per msec */
-};
-
-struct mlx5e_rx_am_sample {
-	ktime_t	time;
-	u32	pkt_ctr;
-	u32	byte_ctr;
-	u16	event_ctr;
-};
-
-struct mlx5e_rx_am { /* Adaptive Moderation */
-	u8					state;
-	struct mlx5e_rx_am_stats		prev_stats;
-	struct mlx5e_rx_am_sample		start_sample;
-	struct work_struct			work;
-	u8					profile_ix;
-	u8					mode;
-	u8					tune_state;
-	u8					steps_right;
-	u8					steps_left;
-	u8					tired;
-};
-
 /* a single cache unit is capable to serve one napi call (for non-striding rq)
  * or a MPWQE (for striding rq).
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
new file mode 100644
index 0000000..84b8524
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2017, Broadcom Limited
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+*/
+
+#ifndef MLX5_AM_H
+#define MLX5_AM_H
+
+struct mlx5e_cq_moder {
+	u16 usec;
+	u16 pkts;
+	u8 cq_period_mode;
+};
+
+struct mlx5e_rx_am_sample {
+	ktime_t time;
+	u32     pkt_ctr;
+	u32     byte_ctr;
+	u16     event_ctr;
+};
+
+struct mlx5e_rx_am_stats {
+	int ppms; /* packets per msec */
+	int bpms; /* bytes per msec */
+	int epms; /* events per msec */
+};
+
+struct mlx5e_rx_am { /* Adaptive Moderation */
+	u8                                      state;
+	struct mlx5e_rx_am_stats                prev_stats;
+	struct mlx5e_rx_am_sample               start_sample;
+	struct work_struct                      work;
+	u8                                      profile_ix;
+	u8                                      mode;
+	u8                                      tune_state;
+	u8                                      steps_right;
+	u8                                      steps_left;
+	u8                                      tired;
+};
+
+enum {
+	MLX5_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
+	MLX5_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
+	MLX5_CQ_PERIOD_NUM_MODES
+};
+
+#endif /* MLX5_AM_H */
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index d44ec5f..60d17e5 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2918,12 +2918,6 @@ enum {
 	MLX5_CQC_ST_FIRED                                 = 0xa,
 };
 
-enum {
-	MLX5_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
-	MLX5_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
-	MLX5_CQ_PERIOD_NUM_MODES
-};
-
 struct mlx5_ifc_cqc_bits {
 	u8         status[0x4];
 	u8         reserved_at_4[0x4];
-- 
2.7.4

^ permalink raw reply related

* [net-next 02/10] net/mlx5e: move interrupt moderation forward declarations
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

Move these to newly created file to prepare to move these functions to a
library.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h     | 4 ----
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h | 5 +++++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ddb5429..2ccedf6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -829,10 +829,6 @@ void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi);
 
-void mlx5e_rx_am(struct mlx5e_rq *rq);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-
 void mlx5e_update_stats(struct mlx5e_priv *priv, bool full);
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 84b8524..f5f6535 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -72,4 +72,9 @@ enum {
 	MLX5_CQ_PERIOD_NUM_MODES
 };
 
+struct mlx5e_rq;
+void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+
 #endif /* MLX5_AM_H */
-- 
2.7.4

^ permalink raw reply related

* [net-next 03/10] net/mlx5e: remove rq references in mlx5e_rx_am
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   |  6 ++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 22 +++++++++++++---------
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  5 ++++-
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index f5f6535..b676a057 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -72,8 +72,10 @@ enum {
 	MLX5_CQ_PERIOD_NUM_MODES
 };
 
-struct mlx5e_rq;
-void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+		 u16 event_ctr,
+		 u64 packets,
+		 u64 bytes);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index e401d9d..1630076 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -264,13 +264,15 @@ static bool mlx5e_am_decision(struct mlx5e_rx_am_stats *curr_stats,
 	return am->profile_ix != prev_ix;
 }
 
-static void mlx5e_am_sample(struct mlx5e_rq *rq,
+static void mlx5e_am_sample(u16 event_ctr,
+			    u64 packets,
+			    u64 bytes,
 			    struct mlx5e_rx_am_sample *s)
 {
 	s->time	     = ktime_get();
-	s->pkt_ctr   = rq->stats.packets;
-	s->byte_ctr  = rq->stats.bytes;
-	s->event_ctr = rq->cq.event_ctr;
+	s->pkt_ctr   = packets;
+	s->byte_ctr  = bytes;
+	s->event_ctr = event_ctr;
 }
 
 #define MLX5E_AM_NEVENTS 64
@@ -309,20 +311,22 @@ void mlx5e_rx_am_work(struct work_struct *work)
 	am->state = MLX5E_AM_START_MEASURE;
 }
 
-void mlx5e_rx_am(struct mlx5e_rq *rq)
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+		 u16 event_ctr,
+		 u64 packets,
+		 u64 bytes)
 {
-	struct mlx5e_rx_am *am = &rq->am;
 	struct mlx5e_rx_am_sample end_sample;
 	struct mlx5e_rx_am_stats curr_stats;
 	u16 nevents;
 
 	switch (am->state) {
 	case MLX5E_AM_MEASURE_IN_PROGRESS:
-		nevents = BIT_GAP(BITS_PER_TYPE(u16), rq->cq.event_ctr,
+		nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
 				  am->start_sample.event_ctr);
 		if (nevents < MLX5E_AM_NEVENTS)
 			break;
-		mlx5e_am_sample(rq, &end_sample);
+		mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
 		mlx5e_am_calc_stats(&am->start_sample, &end_sample,
 				    &curr_stats);
 		if (mlx5e_am_decision(&curr_stats, am)) {
@@ -332,7 +336,7 @@ void mlx5e_rx_am(struct mlx5e_rq *rq)
 		}
 		/* fall through */
 	case MLX5E_AM_START_MEASURE:
-		mlx5e_am_sample(rq, &am->start_sample);
+		mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
 		am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
 		break;
 	case MLX5E_AM_APPLY_NEW_PROFILE:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index ab92298..1849169 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -79,7 +79,10 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 		mlx5e_cq_arm(&c->sq[i].cq);
 
 	if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-		mlx5e_rx_am(&c->rq);
+		mlx5e_rx_am(&c->rq.am,
+			    c->rq.cq.event_ctr,
+			    c->rq.stats.packets,
+			    c->rq.stats.bytes);
 
 	mlx5e_cq_arm(&c->rq.cq);
 	mlx5e_cq_arm(&c->icosq.cq);
-- 
2.7.4

^ permalink raw reply related

* [net-next 04/10] net/mlx5e: move AM logic enums
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   | 26 ++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 25 ---------------------
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index b676a057..c9f0d05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -72,6 +72,32 @@ enum {
 	MLX5_CQ_PERIOD_NUM_MODES
 };
 
+/* Adaptive moderation logic */
+enum {
+	MLX5E_AM_START_MEASURE,
+	MLX5E_AM_MEASURE_IN_PROGRESS,
+	MLX5E_AM_APPLY_NEW_PROFILE,
+};
+
+enum {
+	MLX5E_AM_PARKING_ON_TOP,
+	MLX5E_AM_PARKING_TIRED,
+	MLX5E_AM_GOING_RIGHT,
+	MLX5E_AM_GOING_LEFT,
+};
+
+enum {
+	MLX5E_AM_STATS_WORSE,
+	MLX5E_AM_STATS_SAME,
+	MLX5E_AM_STATS_BETTER,
+};
+
+enum {
+	MLX5E_AM_STEPPED,
+	MLX5E_AM_TOO_TIRED,
+	MLX5E_AM_ON_EDGE,
+};
+
 void mlx5e_rx_am(struct mlx5e_rx_am *am,
 		 u16 event_ctr,
 		 u64 packets,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 1630076..337dd60 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -82,31 +82,6 @@ struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
 	return mlx5e_am_get_profile(rx_cq_period_mode, default_profile_ix);
 }
 
-/* Adaptive moderation logic */
-enum {
-	MLX5E_AM_START_MEASURE,
-	MLX5E_AM_MEASURE_IN_PROGRESS,
-	MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-	MLX5E_AM_PARKING_ON_TOP,
-	MLX5E_AM_PARKING_TIRED,
-	MLX5E_AM_GOING_RIGHT,
-	MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-	MLX5E_AM_STATS_WORSE,
-	MLX5E_AM_STATS_SAME,
-	MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-	MLX5E_AM_STEPPED,
-	MLX5E_AM_TOO_TIRED,
-	MLX5E_AM_ON_EDGE,
-};
 
 static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
 {
-- 
2.7.4

^ permalink raw reply related

* [net-next 05/10] net/mlx5e: move generic functions to new file
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_dim.c.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  48 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 320 ---------------------
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 307 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  | 109 +++++++
 6 files changed, 467 insertions(+), 322 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 19b21b4..b46b6de2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -14,8 +14,8 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
 		fpga/ipsec.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
-		en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-		en_arfs.o en_fs_ethtool.o en_selftest.o
+		en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
+		en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
new file mode 100644
index 0000000..b9b434b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "en.h"
+
+void mlx5e_rx_am_work(struct work_struct *work)
+{
+	struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
+					      work);
+	struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
+	struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
+								 am->profile_ix);
+
+	mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
+				       cur_profile.usec, cur_profile.pkts);
+
+	am->state = MLX5E_AM_START_MEASURE;
+}
+
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index c9f0d05..5ce8e54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -104,5 +104,6 @@ void mlx5e_rx_am(struct mlx5e_rx_am *am,
 		 u64 bytes);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix);
 
 #endif /* MLX5_AM_H */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
deleted file mode 100644
index 337dd60..0000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ /dev/null
@@ -1,320 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include "en.h"
-
-/* Adaptive moderation profiles */
-#define MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define MLX5E_RX_AM_DEF_PROFILE_CQE 1
-#define MLX5E_RX_AM_DEF_PROFILE_EQE 1
-#define MLX5E_PARAMS_AM_NUM_PROFILES 5
-
-/* All profiles sizes must be MLX5E_PARAMS_AM_NUM_PROFILES */
-#define MLX5_AM_EQE_PROFILES { \
-	{1,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{8,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{64,  MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{128, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{256, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-}
-
-#define MLX5_AM_CQE_PROFILES { \
-	{2,  256},             \
-	{8,  128},             \
-	{16, 64},              \
-	{32, 64},              \
-	{64, 64}               \
-}
-
-static const struct mlx5e_cq_moder
-profile[MLX5_CQ_PERIOD_NUM_MODES][MLX5E_PARAMS_AM_NUM_PROFILES] = {
-	MLX5_AM_EQE_PROFILES,
-	MLX5_AM_CQE_PROFILES,
-};
-
-static inline struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix)
-{
-	struct mlx5e_cq_moder cq_moder;
-
-	cq_moder = profile[cq_period_mode][ix];
-	cq_moder.cq_period_mode = cq_period_mode;
-	return cq_moder;
-}
-
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
-{
-	int default_profile_ix;
-
-	if (rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
-		default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_CQE;
-	else /* MLX5_CQ_PERIOD_MODE_START_FROM_EQE */
-		default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_EQE;
-
-	return mlx5e_am_get_profile(rx_cq_period_mode, default_profile_ix);
-}
-
-
-static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
-{
-	switch (am->tune_state) {
-	case MLX5E_AM_PARKING_ON_TOP:
-	case MLX5E_AM_PARKING_TIRED:
-		return true;
-	case MLX5E_AM_GOING_RIGHT:
-		return (am->steps_left > 1) && (am->steps_right == 1);
-	default: /* MLX5E_AM_GOING_LEFT */
-		return (am->steps_right > 1) && (am->steps_left == 1);
-	}
-}
-
-static void mlx5e_am_turn(struct mlx5e_rx_am *am)
-{
-	switch (am->tune_state) {
-	case MLX5E_AM_PARKING_ON_TOP:
-	case MLX5E_AM_PARKING_TIRED:
-		break;
-	case MLX5E_AM_GOING_RIGHT:
-		am->tune_state = MLX5E_AM_GOING_LEFT;
-		am->steps_left = 0;
-		break;
-	case MLX5E_AM_GOING_LEFT:
-		am->tune_state = MLX5E_AM_GOING_RIGHT;
-		am->steps_right = 0;
-		break;
-	}
-}
-
-static int mlx5e_am_step(struct mlx5e_rx_am *am)
-{
-	if (am->tired == (MLX5E_PARAMS_AM_NUM_PROFILES * 2))
-		return MLX5E_AM_TOO_TIRED;
-
-	switch (am->tune_state) {
-	case MLX5E_AM_PARKING_ON_TOP:
-	case MLX5E_AM_PARKING_TIRED:
-		break;
-	case MLX5E_AM_GOING_RIGHT:
-		if (am->profile_ix == (MLX5E_PARAMS_AM_NUM_PROFILES - 1))
-			return MLX5E_AM_ON_EDGE;
-		am->profile_ix++;
-		am->steps_right++;
-		break;
-	case MLX5E_AM_GOING_LEFT:
-		if (am->profile_ix == 0)
-			return MLX5E_AM_ON_EDGE;
-		am->profile_ix--;
-		am->steps_left++;
-		break;
-	}
-
-	am->tired++;
-	return MLX5E_AM_STEPPED;
-}
-
-static void mlx5e_am_park_on_top(struct mlx5e_rx_am *am)
-{
-	am->steps_right  = 0;
-	am->steps_left   = 0;
-	am->tired        = 0;
-	am->tune_state   = MLX5E_AM_PARKING_ON_TOP;
-}
-
-static void mlx5e_am_park_tired(struct mlx5e_rx_am *am)
-{
-	am->steps_right  = 0;
-	am->steps_left   = 0;
-	am->tune_state   = MLX5E_AM_PARKING_TIRED;
-}
-
-static void mlx5e_am_exit_parking(struct mlx5e_rx_am *am)
-{
-	am->tune_state = am->profile_ix ? MLX5E_AM_GOING_LEFT :
-					  MLX5E_AM_GOING_RIGHT;
-	mlx5e_am_step(am);
-}
-
-#define IS_SIGNIFICANT_DIFF(val, ref) \
-	(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
-
-static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats *curr,
-				  struct mlx5e_rx_am_stats *prev)
-{
-	if (!prev->bpms)
-		return curr->bpms ? MLX5E_AM_STATS_BETTER :
-				    MLX5E_AM_STATS_SAME;
-
-	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
-		return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER :
-						   MLX5E_AM_STATS_WORSE;
-
-	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
-		return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER :
-						   MLX5E_AM_STATS_WORSE;
-
-	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
-		return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER :
-						   MLX5E_AM_STATS_WORSE;
-
-	return MLX5E_AM_STATS_SAME;
-}
-
-static bool mlx5e_am_decision(struct mlx5e_rx_am_stats *curr_stats,
-			      struct mlx5e_rx_am *am)
-{
-	int prev_state = am->tune_state;
-	int prev_ix = am->profile_ix;
-	int stats_res;
-	int step_res;
-
-	switch (am->tune_state) {
-	case MLX5E_AM_PARKING_ON_TOP:
-		stats_res = mlx5e_am_stats_compare(curr_stats, &am->prev_stats);
-		if (stats_res != MLX5E_AM_STATS_SAME)
-			mlx5e_am_exit_parking(am);
-		break;
-
-	case MLX5E_AM_PARKING_TIRED:
-		am->tired--;
-		if (!am->tired)
-			mlx5e_am_exit_parking(am);
-		break;
-
-	case MLX5E_AM_GOING_RIGHT:
-	case MLX5E_AM_GOING_LEFT:
-		stats_res = mlx5e_am_stats_compare(curr_stats, &am->prev_stats);
-		if (stats_res != MLX5E_AM_STATS_BETTER)
-			mlx5e_am_turn(am);
-
-		if (mlx5e_am_on_top(am)) {
-			mlx5e_am_park_on_top(am);
-			break;
-		}
-
-		step_res = mlx5e_am_step(am);
-		switch (step_res) {
-		case MLX5E_AM_ON_EDGE:
-			mlx5e_am_park_on_top(am);
-			break;
-		case MLX5E_AM_TOO_TIRED:
-			mlx5e_am_park_tired(am);
-			break;
-		}
-
-		break;
-	}
-
-	if ((prev_state     != MLX5E_AM_PARKING_ON_TOP) ||
-	    (am->tune_state != MLX5E_AM_PARKING_ON_TOP))
-		am->prev_stats = *curr_stats;
-
-	return am->profile_ix != prev_ix;
-}
-
-static void mlx5e_am_sample(u16 event_ctr,
-			    u64 packets,
-			    u64 bytes,
-			    struct mlx5e_rx_am_sample *s)
-{
-	s->time	     = ktime_get();
-	s->pkt_ctr   = packets;
-	s->byte_ctr  = bytes;
-	s->event_ctr = event_ctr;
-}
-
-#define MLX5E_AM_NEVENTS 64
-#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
-#define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
-
-static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start,
-				struct mlx5e_rx_am_sample *end,
-				struct mlx5e_rx_am_stats *curr_stats)
-{
-	/* u32 holds up to 71 minutes, should be enough */
-	u32 delta_us = ktime_us_delta(end->time, start->time);
-	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
-	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
-			     start->byte_ctr);
-
-	if (!delta_us)
-		return;
-
-	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
-	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
-	curr_stats->epms = DIV_ROUND_UP(MLX5E_AM_NEVENTS * USEC_PER_MSEC,
-					delta_us);
-}
-
-void mlx5e_rx_am_work(struct work_struct *work)
-{
-	struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
-					      work);
-	struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
-	struct mlx5e_cq_moder cur_profile = profile[am->mode][am->profile_ix];
-
-	mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
-				       cur_profile.usec, cur_profile.pkts);
-
-	am->state = MLX5E_AM_START_MEASURE;
-}
-
-void mlx5e_rx_am(struct mlx5e_rx_am *am,
-		 u16 event_ctr,
-		 u64 packets,
-		 u64 bytes)
-{
-	struct mlx5e_rx_am_sample end_sample;
-	struct mlx5e_rx_am_stats curr_stats;
-	u16 nevents;
-
-	switch (am->state) {
-	case MLX5E_AM_MEASURE_IN_PROGRESS:
-		nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
-				  am->start_sample.event_ctr);
-		if (nevents < MLX5E_AM_NEVENTS)
-			break;
-		mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
-		mlx5e_am_calc_stats(&am->start_sample, &end_sample,
-				    &curr_stats);
-		if (mlx5e_am_decision(&curr_stats, am)) {
-			am->state = MLX5E_AM_APPLY_NEW_PROFILE;
-			schedule_work(&am->work);
-			break;
-		}
-		/* fall through */
-	case MLX5E_AM_START_MEASURE:
-		mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
-		am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
-		break;
-	case MLX5E_AM_APPLY_NEW_PROFILE:
-		break;
-	}
-}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
new file mode 100644
index 0000000..ca05f4e
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
@@ -0,0 +1,307 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2017, Broadcom Limiited. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "en.h"
+
+#define MLX5E_PARAMS_AM_NUM_PROFILES 5
+/* Adaptive moderation profiles */
+#define MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
+#define MLX5E_RX_AM_DEF_PROFILE_CQE 1
+#define MLX5E_RX_AM_DEF_PROFILE_EQE 1
+
+/* All profiles sizes must be MLX5E_PARAMS_AM_NUM_PROFILES */
+#define MLX5_AM_EQE_PROFILES { \
+	{1,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{8,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{64,  MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{128, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{256, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+}
+
+#define MLX5_AM_CQE_PROFILES { \
+	{2,  256},             \
+	{8,  128},             \
+	{16, 64},              \
+	{32, 64},              \
+	{64, 64}               \
+}
+
+static const struct mlx5e_cq_moder
+profile[MLX5_CQ_PERIOD_NUM_MODES][MLX5E_PARAMS_AM_NUM_PROFILES] = {
+	MLX5_AM_EQE_PROFILES,
+	MLX5_AM_CQE_PROFILES,
+};
+
+struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix)
+{
+	struct mlx5e_cq_moder cq_moder;
+
+	cq_moder = profile[cq_period_mode][ix];
+	cq_moder.cq_period_mode = cq_period_mode;
+	return cq_moder;
+}
+
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
+{
+	int default_profile_ix;
+
+	if (rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
+		default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_CQE;
+	else /* MLX5_CQ_PERIOD_MODE_START_FROM_EQE */
+		default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_EQE;
+
+	return mlx5e_am_get_profile(rx_cq_period_mode, default_profile_ix);
+}
+
+static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
+{
+	switch (am->tune_state) {
+	case MLX5E_AM_PARKING_ON_TOP:
+	case MLX5E_AM_PARKING_TIRED:
+		return true;
+	case MLX5E_AM_GOING_RIGHT:
+		return (am->steps_left > 1) && (am->steps_right == 1);
+	default: /* MLX5E_AM_GOING_LEFT */
+		return (am->steps_right > 1) && (am->steps_left == 1);
+	}
+}
+
+static void mlx5e_am_turn(struct mlx5e_rx_am *am)
+{
+	switch (am->tune_state) {
+	case MLX5E_AM_PARKING_ON_TOP:
+	case MLX5E_AM_PARKING_TIRED:
+		break;
+	case MLX5E_AM_GOING_RIGHT:
+		am->tune_state = MLX5E_AM_GOING_LEFT;
+		am->steps_left = 0;
+		break;
+	case MLX5E_AM_GOING_LEFT:
+		am->tune_state = MLX5E_AM_GOING_RIGHT;
+		am->steps_right = 0;
+		break;
+	}
+}
+
+static int mlx5e_am_step(struct mlx5e_rx_am *am)
+{
+	if (am->tired == (MLX5E_PARAMS_AM_NUM_PROFILES * 2))
+		return MLX5E_AM_TOO_TIRED;
+
+	switch (am->tune_state) {
+	case MLX5E_AM_PARKING_ON_TOP:
+	case MLX5E_AM_PARKING_TIRED:
+		break;
+	case MLX5E_AM_GOING_RIGHT:
+		if (am->profile_ix == (MLX5E_PARAMS_AM_NUM_PROFILES - 1))
+			return MLX5E_AM_ON_EDGE;
+		am->profile_ix++;
+		am->steps_right++;
+		break;
+	case MLX5E_AM_GOING_LEFT:
+		if (am->profile_ix == 0)
+			return MLX5E_AM_ON_EDGE;
+		am->profile_ix--;
+		am->steps_left++;
+		break;
+	}
+
+	am->tired++;
+	return MLX5E_AM_STEPPED;
+}
+
+static void mlx5e_am_park_on_top(struct mlx5e_rx_am *am)
+{
+	am->steps_right  = 0;
+	am->steps_left   = 0;
+	am->tired        = 0;
+	am->tune_state   = MLX5E_AM_PARKING_ON_TOP;
+}
+
+static void mlx5e_am_park_tired(struct mlx5e_rx_am *am)
+{
+	am->steps_right  = 0;
+	am->steps_left   = 0;
+	am->tune_state   = MLX5E_AM_PARKING_TIRED;
+}
+
+static void mlx5e_am_exit_parking(struct mlx5e_rx_am *am)
+{
+	am->tune_state = am->profile_ix ? MLX5E_AM_GOING_LEFT :
+					  MLX5E_AM_GOING_RIGHT;
+	mlx5e_am_step(am);
+}
+
+#define IS_SIGNIFICANT_DIFF(val, ref) \
+	(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
+
+static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats *curr,
+				  struct mlx5e_rx_am_stats *prev)
+{
+	if (!prev->bpms)
+		return curr->bpms ? MLX5E_AM_STATS_BETTER :
+				    MLX5E_AM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
+		return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER :
+						   MLX5E_AM_STATS_WORSE;
+
+	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
+		return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER :
+						   MLX5E_AM_STATS_WORSE;
+
+	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
+		return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER :
+						   MLX5E_AM_STATS_WORSE;
+
+	return MLX5E_AM_STATS_SAME;
+}
+
+static bool mlx5e_am_decision(struct mlx5e_rx_am_stats *curr_stats,
+			      struct mlx5e_rx_am *am)
+{
+	int prev_state = am->tune_state;
+	int prev_ix = am->profile_ix;
+	int stats_res;
+	int step_res;
+
+	switch (am->tune_state) {
+	case MLX5E_AM_PARKING_ON_TOP:
+		stats_res = mlx5e_am_stats_compare(curr_stats, &am->prev_stats);
+		if (stats_res != MLX5E_AM_STATS_SAME)
+			mlx5e_am_exit_parking(am);
+		break;
+
+	case MLX5E_AM_PARKING_TIRED:
+		am->tired--;
+		if (!am->tired)
+			mlx5e_am_exit_parking(am);
+		break;
+
+	case MLX5E_AM_GOING_RIGHT:
+	case MLX5E_AM_GOING_LEFT:
+		stats_res = mlx5e_am_stats_compare(curr_stats, &am->prev_stats);
+		if (stats_res != MLX5E_AM_STATS_BETTER)
+			mlx5e_am_turn(am);
+
+		if (mlx5e_am_on_top(am)) {
+			mlx5e_am_park_on_top(am);
+			break;
+		}
+
+		step_res = mlx5e_am_step(am);
+		switch (step_res) {
+		case MLX5E_AM_ON_EDGE:
+			mlx5e_am_park_on_top(am);
+			break;
+		case MLX5E_AM_TOO_TIRED:
+			mlx5e_am_park_tired(am);
+			break;
+		}
+
+		break;
+	}
+
+	if ((prev_state     != MLX5E_AM_PARKING_ON_TOP) ||
+	    (am->tune_state != MLX5E_AM_PARKING_ON_TOP))
+		am->prev_stats = *curr_stats;
+
+	return am->profile_ix != prev_ix;
+}
+
+static void mlx5e_am_sample(u16 event_ctr,
+			    u64 packets,
+			    u64 bytes,
+			    struct mlx5e_rx_am_sample *s)
+{
+	s->time	     = ktime_get();
+	s->pkt_ctr   = packets;
+	s->byte_ctr  = bytes;
+	s->event_ctr = event_ctr;
+}
+
+#define MLX5E_AM_NEVENTS 64
+#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
+#define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
+
+static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start,
+				struct mlx5e_rx_am_sample *end,
+				struct mlx5e_rx_am_stats *curr_stats)
+{
+	/* u32 holds up to 71 minutes, should be enough */
+	u32 delta_us = ktime_us_delta(end->time, start->time);
+	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
+	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
+			     start->byte_ctr);
+
+	if (!delta_us)
+		return;
+
+	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
+	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
+	curr_stats->epms = DIV_ROUND_UP(MLX5E_AM_NEVENTS * USEC_PER_MSEC,
+					delta_us);
+}
+
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+		 u16 event_ctr,
+		 u64 packets,
+		 u64 bytes)
+{
+	struct mlx5e_rx_am_sample end_sample;
+	struct mlx5e_rx_am_stats curr_stats;
+	u16 nevents;
+
+	switch (am->state) {
+	case MLX5E_AM_MEASURE_IN_PROGRESS:
+		nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
+				  am->start_sample.event_ctr);
+		if (nevents < MLX5E_AM_NEVENTS)
+			break;
+		mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
+		mlx5e_am_calc_stats(&am->start_sample, &end_sample,
+				    &curr_stats);
+		if (mlx5e_am_decision(&curr_stats, am)) {
+			am->state = MLX5E_AM_APPLY_NEW_PROFILE;
+			schedule_work(&am->work);
+			break;
+		}
+		/* fall through */
+	case MLX5E_AM_START_MEASURE:
+		mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
+		am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
+		break;
+	case MLX5E_AM_APPLY_NEW_PROFILE:
+		break;
+	}
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
new file mode 100644
index 0000000..5ce8e54
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
@@ -0,0 +1,109 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2017, Broadcom Limited
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+*/
+
+#ifndef MLX5_AM_H
+#define MLX5_AM_H
+
+struct mlx5e_cq_moder {
+	u16 usec;
+	u16 pkts;
+	u8 cq_period_mode;
+};
+
+struct mlx5e_rx_am_sample {
+	ktime_t time;
+	u32     pkt_ctr;
+	u32     byte_ctr;
+	u16     event_ctr;
+};
+
+struct mlx5e_rx_am_stats {
+	int ppms; /* packets per msec */
+	int bpms; /* bytes per msec */
+	int epms; /* events per msec */
+};
+
+struct mlx5e_rx_am { /* Adaptive Moderation */
+	u8                                      state;
+	struct mlx5e_rx_am_stats                prev_stats;
+	struct mlx5e_rx_am_sample               start_sample;
+	struct work_struct                      work;
+	u8                                      profile_ix;
+	u8                                      mode;
+	u8                                      tune_state;
+	u8                                      steps_right;
+	u8                                      steps_left;
+	u8                                      tired;
+};
+
+enum {
+	MLX5_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
+	MLX5_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
+	MLX5_CQ_PERIOD_NUM_MODES
+};
+
+/* Adaptive moderation logic */
+enum {
+	MLX5E_AM_START_MEASURE,
+	MLX5E_AM_MEASURE_IN_PROGRESS,
+	MLX5E_AM_APPLY_NEW_PROFILE,
+};
+
+enum {
+	MLX5E_AM_PARKING_ON_TOP,
+	MLX5E_AM_PARKING_TIRED,
+	MLX5E_AM_GOING_RIGHT,
+	MLX5E_AM_GOING_LEFT,
+};
+
+enum {
+	MLX5E_AM_STATS_WORSE,
+	MLX5E_AM_STATS_SAME,
+	MLX5E_AM_STATS_BETTER,
+};
+
+enum {
+	MLX5E_AM_STEPPED,
+	MLX5E_AM_TOO_TIRED,
+	MLX5E_AM_ON_EDGE,
+};
+
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+		 u16 event_ctr,
+		 u64 packets,
+		 u64 bytes);
+void mlx5e_rx_am_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix);
+
+#endif /* MLX5_AM_H */
-- 
2.7.4

^ permalink raw reply related

* [net-next 06/10] net/mlx5e: change Mellanox references in DIM code
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

Change all mlx5_am* and MLX_AM* references to net_dim and NET_DIM,
respectively, in code that handles dynamic interrupt moderation.  Also
change all references from 'am' to 'dim' when used as local variables.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  10 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c   |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h   |  20 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  32 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c  | 284 ++++++++++-----------
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.h  |  63 +++--
 9 files changed, 219 insertions(+), 222 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2ccedf6..da2d5e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -50,6 +50,7 @@
 #include "mlx5_core.h"
 #include "en_stats.h"
 #include "en_dim.h"
+#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
@@ -237,8 +238,8 @@ struct mlx5e_params {
 	u16 num_channels;
 	u8  num_tc;
 	bool rx_cqe_compress_def;
-	struct mlx5e_cq_moder rx_cq_moderation;
-	struct mlx5e_cq_moder tx_cq_moderation;
+	struct net_dim_cq_moder rx_cq_moderation;
+	struct net_dim_cq_moder tx_cq_moderation;
 	bool lro_en;
 	u32 lro_wqe_sz;
 	u16 tx_max_inline;
@@ -248,7 +249,7 @@ struct mlx5e_params {
 	u32 indirection_rqt[MLX5E_INDIR_RQT_SIZE];
 	bool vlan_strip_disable;
 	bool scatter_fcs_en;
-	bool rx_am_enabled;
+	bool rx_dim_enabled;
 	u32 lro_timeout;
 	u32 pflags;
 	struct bpf_prog *xdp_prog;
@@ -527,7 +528,7 @@ struct mlx5e_rq {
 	unsigned long          state;
 	int                    ix;
 
-	struct mlx5e_rx_am     am; /* Adaptive Moderation */
+	struct net_dim         dim; /* Dynamic Interrupt Moderation */
 
 	/* XDP */
 	struct bpf_prog       *xdp_prog;
@@ -1075,4 +1076,5 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
 			    struct mlx5e_params *params,
 			    u16 max_channels);
 u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev);
+void mlx5e_rx_dim_work(struct work_struct *work);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index b9b434b..f620325 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -32,17 +32,17 @@
 
 #include "en.h"
 
-void mlx5e_rx_am_work(struct work_struct *work)
+void mlx5e_rx_dim_work(struct work_struct *work)
 {
-	struct mlx5e_rx_am *am = container_of(work, struct mlx5e_rx_am,
-					      work);
-	struct mlx5e_rq *rq = container_of(am, struct mlx5e_rq, am);
-	struct mlx5e_cq_moder cur_profile = mlx5e_am_get_profile(am->mode,
-								 am->profile_ix);
+	struct net_dim *dim = container_of(work, struct net_dim,
+					   work);
+	struct mlx5e_rq *rq = container_of(dim, struct mlx5e_rq, dim);
+	struct net_dim_cq_moder cur_profile = net_dim_get_profile(dim->mode,
+								  dim->profile_ix);
 
 	mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
 				       cur_profile.usec, cur_profile.pkts);
 
-	am->state = MLX5E_AM_START_MEASURE;
+	dim->state = NET_DIM_START_MEASURE;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
index 5ce8e54..21219de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
@@ -40,23 +40,23 @@ struct mlx5e_cq_moder {
 	u8 cq_period_mode;
 };
 
-struct mlx5e_rx_am_sample {
+struct mlx5e_rx_dim_sample {
 	ktime_t time;
 	u32     pkt_ctr;
 	u32     byte_ctr;
 	u16     event_ctr;
 };
 
-struct mlx5e_rx_am_stats {
+struct mlx5e_rx_dim_stats {
 	int ppms; /* packets per msec */
 	int bpms; /* bytes per msec */
 	int epms; /* events per msec */
 };
 
-struct mlx5e_rx_am { /* Adaptive Moderation */
+struct mlx5e_rx_dim { /* Adaptive Moderation */
 	u8                                      state;
-	struct mlx5e_rx_am_stats                prev_stats;
-	struct mlx5e_rx_am_sample               start_sample;
+	struct mlx5e_rx_dim_stats               prev_stats;
+	struct mlx5e_rx_dim_sample              start_sample;
 	struct work_struct                      work;
 	u8                                      profile_ix;
 	u8                                      mode;
@@ -98,12 +98,8 @@ enum {
 	MLX5E_AM_ON_EDGE,
 };
 
-void mlx5e_rx_am(struct mlx5e_rx_am *am,
-		 u16 event_ctr,
-		 u64 packets,
-		 u64 bytes);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix);
+void mlx5e_rx_dim_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_dim_get_def_profile(u8 rx_cq_period_mode);
+struct mlx5e_cq_moder mlx5e_dim_get_profile(u8 cq_period_mode, int ix);
 
 #endif /* MLX5_AM_H */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 8f05efa..772d276 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -480,7 +480,7 @@ int mlx5e_ethtool_get_coalesce(struct mlx5e_priv *priv,
 	coal->rx_max_coalesced_frames = priv->channels.params.rx_cq_moderation.pkts;
 	coal->tx_coalesce_usecs       = priv->channels.params.tx_cq_moderation.usec;
 	coal->tx_max_coalesced_frames = priv->channels.params.tx_cq_moderation.pkts;
-	coal->use_adaptive_rx_coalesce = priv->channels.params.rx_am_enabled;
+	coal->use_adaptive_rx_coalesce = priv->channels.params.rx_dim_enabled;
 
 	return 0;
 }
@@ -534,7 +534,7 @@ int mlx5e_ethtool_set_coalesce(struct mlx5e_priv *priv,
 	new_channels.params.tx_cq_moderation.pkts = coal->tx_max_coalesced_frames;
 	new_channels.params.rx_cq_moderation.usec = coal->rx_coalesce_usecs;
 	new_channels.params.rx_cq_moderation.pkts = coal->rx_max_coalesced_frames;
-	new_channels.params.rx_am_enabled         = !!coal->use_adaptive_rx_coalesce;
+	new_channels.params.rx_dim_enabled        = !!coal->use_adaptive_rx_coalesce;
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
 		priv->channels.params = new_channels.params;
@@ -542,7 +542,7 @@ int mlx5e_ethtool_set_coalesce(struct mlx5e_priv *priv,
 	}
 	/* we are opened */
 
-	reset = !!coal->use_adaptive_rx_coalesce != priv->channels.params.rx_am_enabled;
+	reset = !!coal->use_adaptive_rx_coalesce != priv->channels.params.rx_dim_enabled;
 	if (!reset) {
 		mlx5e_set_priv_channels_coalesce(priv, coal);
 		priv->channels.params = new_channels.params;
@@ -1465,14 +1465,14 @@ static int set_pflag_cqe_based_moder(struct net_device *netdev, bool enable,
 	int err = 0;
 
 	cq_period_mode = enable ?
-		MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
-		MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+		NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
+		NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 	current_cq_period_mode = is_rx_cq ?
 		priv->channels.params.rx_cq_moderation.cq_period_mode :
 		priv->channels.params.tx_cq_moderation.cq_period_mode;
 	mode_changed = cq_period_mode != current_cq_period_mode;
 
-	if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE &&
+	if (cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE &&
 	    !MLX5_CAP_GEN(mdev, cq_period_start_from_cqe))
 		return -EOPNOTSUPP;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3aa1c90..edd4077 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -674,8 +674,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 		wqe->data.lkey = rq->mkey_be;
 	}
 
-	INIT_WORK(&rq->am.work, mlx5e_rx_am_work);
-	rq->am.mode = params->rx_cq_moderation.cq_period_mode;
+	INIT_WORK(&rq->dim.work, mlx5e_rx_dim_work);
+	rq->dim.mode = params->rx_cq_moderation.cq_period_mode;
 	rq->page_cache.head = 0;
 	rq->page_cache.tail = 0;
 
@@ -919,7 +919,7 @@ static int mlx5e_open_rq(struct mlx5e_channel *c,
 	if (err)
 		goto err_destroy_rq;
 
-	if (params->rx_am_enabled)
+	if (params->rx_dim_enabled)
 		c->rq.state |= BIT(MLX5E_RQ_STATE_AM);
 
 	return 0;
@@ -952,7 +952,7 @@ static void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
 
 static void mlx5e_close_rq(struct mlx5e_rq *rq)
 {
-	cancel_work_sync(&rq->am.work);
+	cancel_work_sync(&rq->dim.work);
 	mlx5e_destroy_rq(rq);
 	mlx5e_free_rx_descs(rq);
 	mlx5e_free_rq(rq);
@@ -1565,7 +1565,7 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
 }
 
 static int mlx5e_open_cq(struct mlx5e_channel *c,
-			 struct mlx5e_cq_moder moder,
+			 struct net_dim_cq_moder moder,
 			 struct mlx5e_cq_param *param,
 			 struct mlx5e_cq *cq)
 {
@@ -1747,7 +1747,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 			      struct mlx5e_channel_param *cparam,
 			      struct mlx5e_channel **cp)
 {
-	struct mlx5e_cq_moder icocq_moder = {0, 0};
+	struct net_dim_cq_moder icocq_moder = {0, 0};
 	struct net_device *netdev = priv->netdev;
 	int cpu = mlx5e_get_cpu(priv, ix);
 	struct mlx5e_channel *c;
@@ -1999,7 +1999,7 @@ static void mlx5e_build_ico_cq_param(struct mlx5e_priv *priv,
 
 	mlx5e_build_common_cq_param(priv, param);
 
-	param->cq_period_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+	param->cq_period_mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 }
 
 static void mlx5e_build_icosq_param(struct mlx5e_priv *priv,
@@ -4016,13 +4016,13 @@ void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
 	params->tx_cq_moderation.usec =
 		MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC;
 
-	if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
+	if (cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE)
 		params->tx_cq_moderation.usec =
 			MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC_FROM_CQE;
 
 	MLX5E_SET_PFLAG(params, MLX5E_PFLAG_TX_CQE_BASED_MODER,
 			params->tx_cq_moderation.cq_period_mode ==
-				MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
+				NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE);
 }
 
 void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
@@ -4034,17 +4034,17 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
 	params->rx_cq_moderation.usec =
 		MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC;
 
-	if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
+	if (cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE)
 		params->rx_cq_moderation.usec =
 			MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE;
 
-	if (params->rx_am_enabled)
+	if (params->rx_dim_enabled)
 		params->rx_cq_moderation =
-			mlx5e_am_get_def_profile(cq_period_mode);
+			net_dim_get_def_profile(cq_period_mode);
 
 	MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_BASED_MODER,
 			params->rx_cq_moderation.cq_period_mode ==
-				MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
+				NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE);
 }
 
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout)
@@ -4100,9 +4100,9 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
 
 	/* CQ moderation params */
 	cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
-			MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
-			MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
-	params->rx_am_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
+			NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
+			NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+	params->rx_dim_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
 	mlx5e_set_rx_cq_mode_params(params, cq_period_mode);
 	mlx5e_set_tx_cq_mode_params(params, cq_period_mode);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index c6a77f8..ccb038f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -877,8 +877,8 @@ static void mlx5e_build_rep_params(struct mlx5_core_dev *mdev,
 				   struct mlx5e_params *params)
 {
 	u8 cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
-					 MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
-					 MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+					 NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
+					 NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 
 	params->log_sq_size = MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE;
 	params->rq_wq_type  = MLX5_WQ_TYPE_LINKED_LIST;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 1849169..dae77a9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -79,7 +79,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 		mlx5e_cq_arm(&c->sq[i].cq);
 
 	if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-		mlx5e_rx_am(&c->rq.am,
+		net_dim(&c->rq.dim,
 			    c->rq.cq.event_ctr,
 			    c->rq.stats.packets,
 			    c->rq.stats.bytes);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
index ca05f4e..00b9ae3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
@@ -33,22 +33,22 @@
 
 #include "en.h"
 
-#define MLX5E_PARAMS_AM_NUM_PROFILES 5
+#define NET_DIM_PARAMS_NUM_PROFILES 5
 /* Adaptive moderation profiles */
-#define MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define MLX5E_RX_AM_DEF_PROFILE_CQE 1
-#define MLX5E_RX_AM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be MLX5E_PARAMS_AM_NUM_PROFILES */
-#define MLX5_AM_EQE_PROFILES { \
-	{1,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{8,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{64,  MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{128, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{256, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
+#define NET_DIM_DEF_PROFILE_CQE 1
+#define NET_DIM_DEF_PROFILE_EQE 1
+
+/* All profiles sizes must be NET_PARAMS_DIM_NUM_PROFILES */
+#define NET_DIM_EQE_PROFILES { \
+	{1,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{8,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{64,  NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{128, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{256, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
 }
 
-#define MLX5_AM_CQE_PROFILES { \
+#define NET_DIM_CQE_PROFILES { \
 	{2,  256},             \
 	{8,  128},             \
 	{16, 64},              \
@@ -56,193 +56,193 @@
 	{64, 64}               \
 }
 
-static const struct mlx5e_cq_moder
-profile[MLX5_CQ_PERIOD_NUM_MODES][MLX5E_PARAMS_AM_NUM_PROFILES] = {
-	MLX5_AM_EQE_PROFILES,
-	MLX5_AM_CQE_PROFILES,
+static const struct net_dim_cq_moder
+profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+	NET_DIM_EQE_PROFILES,
+	NET_DIM_CQE_PROFILES,
 };
 
-struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix)
+struct net_dim_cq_moder net_dim_get_profile(u8 cq_period_mode, int ix)
 {
-	struct mlx5e_cq_moder cq_moder;
+	struct net_dim_cq_moder cq_moder;
 
 	cq_moder = profile[cq_period_mode][ix];
 	cq_moder.cq_period_mode = cq_period_mode;
 	return cq_moder;
 }
 
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
+struct net_dim_cq_moder net_dim_get_def_profile(u8 rx_cq_period_mode)
 {
 	int default_profile_ix;
 
-	if (rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
-		default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_CQE;
-	else /* MLX5_CQ_PERIOD_MODE_START_FROM_EQE */
-		default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_EQE;
+	if (rx_cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE)
+		default_profile_ix = NET_DIM_DEF_PROFILE_CQE;
+	else /* NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE */
+		default_profile_ix = NET_DIM_DEF_PROFILE_EQE;
 
-	return mlx5e_am_get_profile(rx_cq_period_mode, default_profile_ix);
+	return net_dim_get_profile(rx_cq_period_mode, default_profile_ix);
 }
 
-static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
+static bool net_dim_on_top(struct net_dim *dim)
 {
-	switch (am->tune_state) {
-	case MLX5E_AM_PARKING_ON_TOP:
-	case MLX5E_AM_PARKING_TIRED:
+	switch (dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
 		return true;
-	case MLX5E_AM_GOING_RIGHT:
-		return (am->steps_left > 1) && (am->steps_right == 1);
-	default: /* MLX5E_AM_GOING_LEFT */
-		return (am->steps_right > 1) && (am->steps_left == 1);
+	case NET_DIM_GOING_RIGHT:
+		return (dim->steps_left > 1) && (dim->steps_right == 1);
+	default: /* NET_DIM_GOING_LEFT */
+		return (dim->steps_right > 1) && (dim->steps_left == 1);
 	}
 }
 
-static void mlx5e_am_turn(struct mlx5e_rx_am *am)
+static void net_dim_turn(struct net_dim *dim)
 {
-	switch (am->tune_state) {
-	case MLX5E_AM_PARKING_ON_TOP:
-	case MLX5E_AM_PARKING_TIRED:
+	switch (dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
 		break;
-	case MLX5E_AM_GOING_RIGHT:
-		am->tune_state = MLX5E_AM_GOING_LEFT;
-		am->steps_left = 0;
+	case NET_DIM_GOING_RIGHT:
+		dim->tune_state = NET_DIM_GOING_LEFT;
+		dim->steps_left = 0;
 		break;
-	case MLX5E_AM_GOING_LEFT:
-		am->tune_state = MLX5E_AM_GOING_RIGHT;
-		am->steps_right = 0;
+	case NET_DIM_GOING_LEFT:
+		dim->tune_state = NET_DIM_GOING_RIGHT;
+		dim->steps_right = 0;
 		break;
 	}
 }
 
-static int mlx5e_am_step(struct mlx5e_rx_am *am)
+static int net_dim_step(struct net_dim *dim)
 {
-	if (am->tired == (MLX5E_PARAMS_AM_NUM_PROFILES * 2))
-		return MLX5E_AM_TOO_TIRED;
+	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
+		return NET_DIM_TOO_TIRED;
 
-	switch (am->tune_state) {
-	case MLX5E_AM_PARKING_ON_TOP:
-	case MLX5E_AM_PARKING_TIRED:
+	switch (dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
 		break;
-	case MLX5E_AM_GOING_RIGHT:
-		if (am->profile_ix == (MLX5E_PARAMS_AM_NUM_PROFILES - 1))
-			return MLX5E_AM_ON_EDGE;
-		am->profile_ix++;
-		am->steps_right++;
+	case NET_DIM_GOING_RIGHT:
+		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
+			return NET_DIM_ON_EDGE;
+		dim->profile_ix++;
+		dim->steps_right++;
 		break;
-	case MLX5E_AM_GOING_LEFT:
-		if (am->profile_ix == 0)
-			return MLX5E_AM_ON_EDGE;
-		am->profile_ix--;
-		am->steps_left++;
+	case NET_DIM_GOING_LEFT:
+		if (dim->profile_ix == 0)
+			return NET_DIM_ON_EDGE;
+		dim->profile_ix--;
+		dim->steps_left++;
 		break;
 	}
 
-	am->tired++;
-	return MLX5E_AM_STEPPED;
+	dim->tired++;
+	return NET_DIM_STEPPED;
 }
 
-static void mlx5e_am_park_on_top(struct mlx5e_rx_am *am)
+static void net_dim_park_on_top(struct net_dim *dim)
 {
-	am->steps_right  = 0;
-	am->steps_left   = 0;
-	am->tired        = 0;
-	am->tune_state   = MLX5E_AM_PARKING_ON_TOP;
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tired        = 0;
+	dim->tune_state   = NET_DIM_PARKING_ON_TOP;
 }
 
-static void mlx5e_am_park_tired(struct mlx5e_rx_am *am)
+static void net_dim_park_tired(struct net_dim *dim)
 {
-	am->steps_right  = 0;
-	am->steps_left   = 0;
-	am->tune_state   = MLX5E_AM_PARKING_TIRED;
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tune_state   = NET_DIM_PARKING_TIRED;
 }
 
-static void mlx5e_am_exit_parking(struct mlx5e_rx_am *am)
+static void net_dim_exit_parking(struct net_dim *dim)
 {
-	am->tune_state = am->profile_ix ? MLX5E_AM_GOING_LEFT :
-					  MLX5E_AM_GOING_RIGHT;
-	mlx5e_am_step(am);
+	dim->tune_state = dim->profile_ix ? NET_DIM_GOING_LEFT :
+					  NET_DIM_GOING_RIGHT;
+	net_dim_step(dim);
 }
 
 #define IS_SIGNIFICANT_DIFF(val, ref) \
 	(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
 
-static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats *curr,
-				  struct mlx5e_rx_am_stats *prev)
+static int net_dim_stats_compare(struct net_dim_stats *curr,
+				   struct net_dim_stats *prev)
 {
 	if (!prev->bpms)
-		return curr->bpms ? MLX5E_AM_STATS_BETTER :
-				    MLX5E_AM_STATS_SAME;
+		return curr->bpms ? NET_DIM_STATS_BETTER :
+				    NET_DIM_STATS_SAME;
 
 	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
-		return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER :
-						   MLX5E_AM_STATS_WORSE;
+		return (curr->bpms > prev->bpms) ? NET_DIM_STATS_BETTER :
+						   NET_DIM_STATS_WORSE;
 
 	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
-		return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER :
-						   MLX5E_AM_STATS_WORSE;
+		return (curr->ppms > prev->ppms) ? NET_DIM_STATS_BETTER :
+						   NET_DIM_STATS_WORSE;
 
 	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
-		return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER :
-						   MLX5E_AM_STATS_WORSE;
+		return (curr->epms < prev->epms) ? NET_DIM_STATS_BETTER :
+						   NET_DIM_STATS_WORSE;
 
-	return MLX5E_AM_STATS_SAME;
+	return NET_DIM_STATS_SAME;
 }
 
-static bool mlx5e_am_decision(struct mlx5e_rx_am_stats *curr_stats,
-			      struct mlx5e_rx_am *am)
+static bool net_dim_decision(struct net_dim_stats *curr_stats,
+			       struct net_dim *dim)
 {
-	int prev_state = am->tune_state;
-	int prev_ix = am->profile_ix;
+	int prev_state = dim->tune_state;
+	int prev_ix = dim->profile_ix;
 	int stats_res;
 	int step_res;
 
-	switch (am->tune_state) {
-	case MLX5E_AM_PARKING_ON_TOP:
-		stats_res = mlx5e_am_stats_compare(curr_stats, &am->prev_stats);
-		if (stats_res != MLX5E_AM_STATS_SAME)
-			mlx5e_am_exit_parking(am);
+	switch (dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
+		if (stats_res != NET_DIM_STATS_SAME)
+			net_dim_exit_parking(dim);
 		break;
 
-	case MLX5E_AM_PARKING_TIRED:
-		am->tired--;
-		if (!am->tired)
-			mlx5e_am_exit_parking(am);
+	case NET_DIM_PARKING_TIRED:
+		dim->tired--;
+		if (!dim->tired)
+			net_dim_exit_parking(dim);
 		break;
 
-	case MLX5E_AM_GOING_RIGHT:
-	case MLX5E_AM_GOING_LEFT:
-		stats_res = mlx5e_am_stats_compare(curr_stats, &am->prev_stats);
-		if (stats_res != MLX5E_AM_STATS_BETTER)
-			mlx5e_am_turn(am);
+	case NET_DIM_GOING_RIGHT:
+	case NET_DIM_GOING_LEFT:
+		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
+		if (stats_res != NET_DIM_STATS_BETTER)
+			net_dim_turn(dim);
 
-		if (mlx5e_am_on_top(am)) {
-			mlx5e_am_park_on_top(am);
+		if (net_dim_on_top(dim)) {
+			net_dim_park_on_top(dim);
 			break;
 		}
 
-		step_res = mlx5e_am_step(am);
+		step_res = net_dim_step(dim);
 		switch (step_res) {
-		case MLX5E_AM_ON_EDGE:
-			mlx5e_am_park_on_top(am);
+		case NET_DIM_ON_EDGE:
+			net_dim_park_on_top(dim);
 			break;
-		case MLX5E_AM_TOO_TIRED:
-			mlx5e_am_park_tired(am);
+		case NET_DIM_TOO_TIRED:
+			net_dim_park_tired(dim);
 			break;
 		}
 
 		break;
 	}
 
-	if ((prev_state     != MLX5E_AM_PARKING_ON_TOP) ||
-	    (am->tune_state != MLX5E_AM_PARKING_ON_TOP))
-		am->prev_stats = *curr_stats;
+	if ((prev_state     != NET_DIM_PARKING_ON_TOP) ||
+	    (dim->tune_state != NET_DIM_PARKING_ON_TOP))
+		dim->prev_stats = *curr_stats;
 
-	return am->profile_ix != prev_ix;
+	return dim->profile_ix != prev_ix;
 }
 
-static void mlx5e_am_sample(u16 event_ctr,
-			    u64 packets,
-			    u64 bytes,
-			    struct mlx5e_rx_am_sample *s)
+static void net_dim_sample(u16 event_ctr,
+			   u64 packets,
+			   u64 bytes,
+			   struct net_dim_sample *s)
 {
 	s->time	     = ktime_get();
 	s->pkt_ctr   = packets;
@@ -250,13 +250,13 @@ static void mlx5e_am_sample(u16 event_ctr,
 	s->event_ctr = event_ctr;
 }
 
-#define MLX5E_AM_NEVENTS 64
+#define NET_DIM_NEVENTS 64
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
 #define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
 
-static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start,
-				struct mlx5e_rx_am_sample *end,
-				struct mlx5e_rx_am_stats *curr_stats)
+static void net_dim_calc_stats(struct net_dim_sample *start,
+			       struct net_dim_sample *end,
+			       struct net_dim_stats *curr_stats)
 {
 	/* u32 holds up to 71 minutes, should be enough */
 	u32 delta_us = ktime_us_delta(end->time, start->time);
@@ -269,39 +269,39 @@ static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample *start,
 
 	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
 	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
-	curr_stats->epms = DIV_ROUND_UP(MLX5E_AM_NEVENTS * USEC_PER_MSEC,
+	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
 					delta_us);
 }
 
-void mlx5e_rx_am(struct mlx5e_rx_am *am,
-		 u16 event_ctr,
-		 u64 packets,
-		 u64 bytes)
+void net_dim(struct net_dim *dim,
+	     u16 event_ctr,
+	     u64 packets,
+	     u64 bytes)
 {
-	struct mlx5e_rx_am_sample end_sample;
-	struct mlx5e_rx_am_stats curr_stats;
+	struct net_dim_sample end_sample;
+	struct net_dim_stats curr_stats;
 	u16 nevents;
 
-	switch (am->state) {
-	case MLX5E_AM_MEASURE_IN_PROGRESS:
+	switch (dim->state) {
+	case NET_DIM_MEASURE_IN_PROGRESS:
 		nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
-				  am->start_sample.event_ctr);
-		if (nevents < MLX5E_AM_NEVENTS)
+				  dim->start_sample.event_ctr);
+		if (nevents < NET_DIM_NEVENTS)
 			break;
-		mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
-		mlx5e_am_calc_stats(&am->start_sample, &end_sample,
+		net_dim_sample(event_ctr, packets, bytes, &end_sample);
+		net_dim_calc_stats(&dim->start_sample, &end_sample,
 				    &curr_stats);
-		if (mlx5e_am_decision(&curr_stats, am)) {
-			am->state = MLX5E_AM_APPLY_NEW_PROFILE;
-			schedule_work(&am->work);
+		if (net_dim_decision(&curr_stats, dim)) {
+			dim->state = NET_DIM_APPLY_NEW_PROFILE;
+			schedule_work(&dim->work);
 			break;
 		}
 		/* fall through */
-	case MLX5E_AM_START_MEASURE:
-		mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
-		am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
+	case NET_DIM_START_MEASURE:
+		net_dim_sample(event_ctr, packets, bytes, &dim->start_sample);
+		dim->state = NET_DIM_MEASURE_IN_PROGRESS;
 		break;
-	case MLX5E_AM_APPLY_NEW_PROFILE:
+	case NET_DIM_APPLY_NEW_PROFILE:
 		break;
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
index 5ce8e54..a775c12 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.h
@@ -31,32 +31,32 @@
  * SOFTWARE.
 */
 
-#ifndef MLX5_AM_H
-#define MLX5_AM_H
+#ifndef NET_DIM_H
+#define NET_DIM_H
 
-struct mlx5e_cq_moder {
+struct net_dim_cq_moder {
 	u16 usec;
 	u16 pkts;
 	u8 cq_period_mode;
 };
 
-struct mlx5e_rx_am_sample {
+struct net_dim_sample {
 	ktime_t time;
 	u32     pkt_ctr;
 	u32     byte_ctr;
 	u16     event_ctr;
 };
 
-struct mlx5e_rx_am_stats {
+struct net_dim_stats {
 	int ppms; /* packets per msec */
 	int bpms; /* bytes per msec */
 	int epms; /* events per msec */
 };
 
-struct mlx5e_rx_am { /* Adaptive Moderation */
+struct net_dim { /* Adaptive Moderation */
 	u8                                      state;
-	struct mlx5e_rx_am_stats                prev_stats;
-	struct mlx5e_rx_am_sample               start_sample;
+	struct net_dim_stats                    prev_stats;
+	struct net_dim_sample                   start_sample;
 	struct work_struct                      work;
 	u8                                      profile_ix;
 	u8                                      mode;
@@ -67,43 +67,42 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
 };
 
 enum {
-	MLX5_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
-	MLX5_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
-	MLX5_CQ_PERIOD_NUM_MODES
+	NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
+	NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
+	NET_DIM_CQ_PERIOD_NUM_MODES
 };
 
 /* Adaptive moderation logic */
 enum {
-	MLX5E_AM_START_MEASURE,
-	MLX5E_AM_MEASURE_IN_PROGRESS,
-	MLX5E_AM_APPLY_NEW_PROFILE,
+	NET_DIM_START_MEASURE,
+	NET_DIM_MEASURE_IN_PROGRESS,
+	NET_DIM_APPLY_NEW_PROFILE,
 };
 
 enum {
-	MLX5E_AM_PARKING_ON_TOP,
-	MLX5E_AM_PARKING_TIRED,
-	MLX5E_AM_GOING_RIGHT,
-	MLX5E_AM_GOING_LEFT,
+	NET_DIM_PARKING_ON_TOP,
+	NET_DIM_PARKING_TIRED,
+	NET_DIM_GOING_RIGHT,
+	NET_DIM_GOING_LEFT,
 };
 
 enum {
-	MLX5E_AM_STATS_WORSE,
-	MLX5E_AM_STATS_SAME,
-	MLX5E_AM_STATS_BETTER,
+	NET_DIM_STATS_WORSE,
+	NET_DIM_STATS_SAME,
+	NET_DIM_STATS_BETTER,
 };
 
 enum {
-	MLX5E_AM_STEPPED,
-	MLX5E_AM_TOO_TIRED,
-	MLX5E_AM_ON_EDGE,
+	NET_DIM_STEPPED,
+	NET_DIM_TOO_TIRED,
+	NET_DIM_ON_EDGE,
 };
 
-void mlx5e_rx_am(struct mlx5e_rx_am *am,
-		 u16 event_ctr,
-		 u64 packets,
-		 u64 bytes);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix);
+void net_dim(struct net_dim *dim,
+	     u16 event_ctr,
+	     u64 packets,
+	     u64 bytes);
+struct net_dim_cq_moder net_dim_get_def_profile(u8 rx_cq_period_mode);
+struct net_dim_cq_moder net_dim_get_profile(u8 cq_period_mode, int ix);
 
-#endif /* MLX5_AM_H */
+#endif /* NET_DIM_H */
-- 
2.7.4

^ permalink raw reply related

* [net-next 07/10] net: move dynamic interrupt coalescing code to include/linux
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupts events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.c  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h  | 105 ------
 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c | 307 ------------------
 include/linux/net_dim.h                           | 376 ++++++++++++++++++++++
 6 files changed, 379 insertions(+), 415 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
 create mode 100644 include/linux/net_dim.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index b46b6de2..c805769 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
-		en_arfs.o en_fs_ethtool.o en_selftest.o net_dim.o
+		en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index da2d5e7..41e6783 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -46,11 +46,10 @@
 #include <linux/mlx5/transobj.h>
 #include <linux/rhashtable.h>
 #include <net/switchdev.h>
+#include <linux/net_dim.h>
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "en_dim.h"
-#include "net_dim.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index f620325..2b89951 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include <linux/net_dim.h>
 #include "en.h"
 
 void mlx5e_rx_dim_work(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
deleted file mode 100644
index 21219de..0000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.h
+++ /dev/null
@@ -1,105 +0,0 @@
-/*
- * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
- * Copyright (c) 2017, Broadcom Limited
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
-*/
-
-#ifndef MLX5_AM_H
-#define MLX5_AM_H
-
-struct mlx5e_cq_moder {
-	u16 usec;
-	u16 pkts;
-	u8 cq_period_mode;
-};
-
-struct mlx5e_rx_dim_sample {
-	ktime_t time;
-	u32     pkt_ctr;
-	u32     byte_ctr;
-	u16     event_ctr;
-};
-
-struct mlx5e_rx_dim_stats {
-	int ppms; /* packets per msec */
-	int bpms; /* bytes per msec */
-	int epms; /* events per msec */
-};
-
-struct mlx5e_rx_dim { /* Adaptive Moderation */
-	u8                                      state;
-	struct mlx5e_rx_dim_stats               prev_stats;
-	struct mlx5e_rx_dim_sample              start_sample;
-	struct work_struct                      work;
-	u8                                      profile_ix;
-	u8                                      mode;
-	u8                                      tune_state;
-	u8                                      steps_right;
-	u8                                      steps_left;
-	u8                                      tired;
-};
-
-enum {
-	MLX5_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
-	MLX5_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
-	MLX5_CQ_PERIOD_NUM_MODES
-};
-
-/* Adaptive moderation logic */
-enum {
-	MLX5E_AM_START_MEASURE,
-	MLX5E_AM_MEASURE_IN_PROGRESS,
-	MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-	MLX5E_AM_PARKING_ON_TOP,
-	MLX5E_AM_PARKING_TIRED,
-	MLX5E_AM_GOING_RIGHT,
-	MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-	MLX5E_AM_STATS_WORSE,
-	MLX5E_AM_STATS_SAME,
-	MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-	MLX5E_AM_STEPPED,
-	MLX5E_AM_TOO_TIRED,
-	MLX5E_AM_ON_EDGE,
-};
-
-void mlx5e_rx_dim_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_dim_get_def_profile(u8 rx_cq_period_mode);
-struct mlx5e_cq_moder mlx5e_dim_get_profile(u8 cq_period_mode, int ix);
-
-#endif /* MLX5_AM_H */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
deleted file mode 100644
index 00b9ae3..0000000
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_dim.c
+++ /dev/null
@@ -1,307 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2017, Broadcom Limiited. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include "en.h"
-
-#define NET_DIM_PARAMS_NUM_PROFILES 5
-/* Adaptive moderation profiles */
-#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define NET_DIM_DEF_PROFILE_CQE 1
-#define NET_DIM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be NET_PARAMS_DIM_NUM_PROFILES */
-#define NET_DIM_EQE_PROFILES { \
-	{1,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{8,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{64,  NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{128, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-	{256, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-}
-
-#define NET_DIM_CQE_PROFILES { \
-	{2,  256},             \
-	{8,  128},             \
-	{16, 64},              \
-	{32, 64},              \
-	{64, 64}               \
-}
-
-static const struct net_dim_cq_moder
-profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
-	NET_DIM_EQE_PROFILES,
-	NET_DIM_CQE_PROFILES,
-};
-
-struct net_dim_cq_moder net_dim_get_profile(u8 cq_period_mode, int ix)
-{
-	struct net_dim_cq_moder cq_moder;
-
-	cq_moder = profile[cq_period_mode][ix];
-	cq_moder.cq_period_mode = cq_period_mode;
-	return cq_moder;
-}
-
-struct net_dim_cq_moder net_dim_get_def_profile(u8 rx_cq_period_mode)
-{
-	int default_profile_ix;
-
-	if (rx_cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE)
-		default_profile_ix = NET_DIM_DEF_PROFILE_CQE;
-	else /* NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE */
-		default_profile_ix = NET_DIM_DEF_PROFILE_EQE;
-
-	return net_dim_get_profile(rx_cq_period_mode, default_profile_ix);
-}
-
-static bool net_dim_on_top(struct net_dim *dim)
-{
-	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
-		return true;
-	case NET_DIM_GOING_RIGHT:
-		return (dim->steps_left > 1) && (dim->steps_right == 1);
-	default: /* NET_DIM_GOING_LEFT */
-		return (dim->steps_right > 1) && (dim->steps_left == 1);
-	}
-}
-
-static void net_dim_turn(struct net_dim *dim)
-{
-	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
-		break;
-	case NET_DIM_GOING_RIGHT:
-		dim->tune_state = NET_DIM_GOING_LEFT;
-		dim->steps_left = 0;
-		break;
-	case NET_DIM_GOING_LEFT:
-		dim->tune_state = NET_DIM_GOING_RIGHT;
-		dim->steps_right = 0;
-		break;
-	}
-}
-
-static int net_dim_step(struct net_dim *dim)
-{
-	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
-		return NET_DIM_TOO_TIRED;
-
-	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-	case NET_DIM_PARKING_TIRED:
-		break;
-	case NET_DIM_GOING_RIGHT:
-		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
-			return NET_DIM_ON_EDGE;
-		dim->profile_ix++;
-		dim->steps_right++;
-		break;
-	case NET_DIM_GOING_LEFT:
-		if (dim->profile_ix == 0)
-			return NET_DIM_ON_EDGE;
-		dim->profile_ix--;
-		dim->steps_left++;
-		break;
-	}
-
-	dim->tired++;
-	return NET_DIM_STEPPED;
-}
-
-static void net_dim_park_on_top(struct net_dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tired        = 0;
-	dim->tune_state   = NET_DIM_PARKING_ON_TOP;
-}
-
-static void net_dim_park_tired(struct net_dim *dim)
-{
-	dim->steps_right  = 0;
-	dim->steps_left   = 0;
-	dim->tune_state   = NET_DIM_PARKING_TIRED;
-}
-
-static void net_dim_exit_parking(struct net_dim *dim)
-{
-	dim->tune_state = dim->profile_ix ? NET_DIM_GOING_LEFT :
-					  NET_DIM_GOING_RIGHT;
-	net_dim_step(dim);
-}
-
-#define IS_SIGNIFICANT_DIFF(val, ref) \
-	(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
-
-static int net_dim_stats_compare(struct net_dim_stats *curr,
-				   struct net_dim_stats *prev)
-{
-	if (!prev->bpms)
-		return curr->bpms ? NET_DIM_STATS_BETTER :
-				    NET_DIM_STATS_SAME;
-
-	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
-		return (curr->bpms > prev->bpms) ? NET_DIM_STATS_BETTER :
-						   NET_DIM_STATS_WORSE;
-
-	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
-		return (curr->ppms > prev->ppms) ? NET_DIM_STATS_BETTER :
-						   NET_DIM_STATS_WORSE;
-
-	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
-		return (curr->epms < prev->epms) ? NET_DIM_STATS_BETTER :
-						   NET_DIM_STATS_WORSE;
-
-	return NET_DIM_STATS_SAME;
-}
-
-static bool net_dim_decision(struct net_dim_stats *curr_stats,
-			       struct net_dim *dim)
-{
-	int prev_state = dim->tune_state;
-	int prev_ix = dim->profile_ix;
-	int stats_res;
-	int step_res;
-
-	switch (dim->tune_state) {
-	case NET_DIM_PARKING_ON_TOP:
-		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != NET_DIM_STATS_SAME)
-			net_dim_exit_parking(dim);
-		break;
-
-	case NET_DIM_PARKING_TIRED:
-		dim->tired--;
-		if (!dim->tired)
-			net_dim_exit_parking(dim);
-		break;
-
-	case NET_DIM_GOING_RIGHT:
-	case NET_DIM_GOING_LEFT:
-		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
-		if (stats_res != NET_DIM_STATS_BETTER)
-			net_dim_turn(dim);
-
-		if (net_dim_on_top(dim)) {
-			net_dim_park_on_top(dim);
-			break;
-		}
-
-		step_res = net_dim_step(dim);
-		switch (step_res) {
-		case NET_DIM_ON_EDGE:
-			net_dim_park_on_top(dim);
-			break;
-		case NET_DIM_TOO_TIRED:
-			net_dim_park_tired(dim);
-			break;
-		}
-
-		break;
-	}
-
-	if ((prev_state     != NET_DIM_PARKING_ON_TOP) ||
-	    (dim->tune_state != NET_DIM_PARKING_ON_TOP))
-		dim->prev_stats = *curr_stats;
-
-	return dim->profile_ix != prev_ix;
-}
-
-static void net_dim_sample(u16 event_ctr,
-			   u64 packets,
-			   u64 bytes,
-			   struct net_dim_sample *s)
-{
-	s->time	     = ktime_get();
-	s->pkt_ctr   = packets;
-	s->byte_ctr  = bytes;
-	s->event_ctr = event_ctr;
-}
-
-#define NET_DIM_NEVENTS 64
-#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
-#define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
-
-static void net_dim_calc_stats(struct net_dim_sample *start,
-			       struct net_dim_sample *end,
-			       struct net_dim_stats *curr_stats)
-{
-	/* u32 holds up to 71 minutes, should be enough */
-	u32 delta_us = ktime_us_delta(end->time, start->time);
-	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
-	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
-			     start->byte_ctr);
-
-	if (!delta_us)
-		return;
-
-	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
-	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
-	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
-					delta_us);
-}
-
-void net_dim(struct net_dim *dim,
-	     u16 event_ctr,
-	     u64 packets,
-	     u64 bytes)
-{
-	struct net_dim_sample end_sample;
-	struct net_dim_stats curr_stats;
-	u16 nevents;
-
-	switch (dim->state) {
-	case NET_DIM_MEASURE_IN_PROGRESS:
-		nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
-				  dim->start_sample.event_ctr);
-		if (nevents < NET_DIM_NEVENTS)
-			break;
-		net_dim_sample(event_ctr, packets, bytes, &end_sample);
-		net_dim_calc_stats(&dim->start_sample, &end_sample,
-				    &curr_stats);
-		if (net_dim_decision(&curr_stats, dim)) {
-			dim->state = NET_DIM_APPLY_NEW_PROFILE;
-			schedule_work(&dim->work);
-			break;
-		}
-		/* fall through */
-	case NET_DIM_START_MEASURE:
-		net_dim_sample(event_ctr, packets, bytes, &dim->start_sample);
-		dim->state = NET_DIM_MEASURE_IN_PROGRESS;
-		break;
-	case NET_DIM_APPLY_NEW_PROFILE:
-		break;
-	}
-}
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
new file mode 100644
index 0000000..bb99073
--- /dev/null
+++ b/include/linux/net_dim.h
@@ -0,0 +1,376 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2017, Broadcom Limited
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+*/
+
+#ifndef NET_DIM_H
+#define NET_DIM_H
+
+#include <linux/module.h>
+
+struct net_dim_cq_moder {
+	u16 usec;
+	u16 pkts;
+	u8 cq_period_mode;
+};
+
+struct net_dim_sample {
+	ktime_t time;
+	u32     pkt_ctr;
+	u32     byte_ctr;
+	u16     event_ctr;
+};
+
+struct net_dim_stats {
+	int ppms; /* packets per msec */
+	int bpms; /* bytes per msec */
+	int epms; /* events per msec */
+};
+
+struct net_dim { /* Adaptive Moderation */
+	u8                                      state;
+	struct net_dim_stats                    prev_stats;
+	struct net_dim_sample                   start_sample;
+	struct work_struct                      work;
+	u8                                      profile_ix;
+	u8                                      mode;
+	u8                                      tune_state;
+	u8                                      steps_right;
+	u8                                      steps_left;
+	u8                                      tired;
+};
+
+enum {
+	NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
+	NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
+	NET_DIM_CQ_PERIOD_NUM_MODES
+};
+
+/* Adaptive moderation logic */
+enum {
+	NET_DIM_START_MEASURE,
+	NET_DIM_MEASURE_IN_PROGRESS,
+	NET_DIM_APPLY_NEW_PROFILE,
+};
+
+enum {
+	NET_DIM_PARKING_ON_TOP,
+	NET_DIM_PARKING_TIRED,
+	NET_DIM_GOING_RIGHT,
+	NET_DIM_GOING_LEFT,
+};
+
+enum {
+	NET_DIM_STATS_WORSE,
+	NET_DIM_STATS_SAME,
+	NET_DIM_STATS_BETTER,
+};
+
+enum {
+	NET_DIM_STEPPED,
+	NET_DIM_TOO_TIRED,
+	NET_DIM_ON_EDGE,
+};
+
+#define NET_DIM_PARAMS_NUM_PROFILES 5
+/* Adaptive moderation profiles */
+#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
+#define NET_DIM_DEF_PROFILE_CQE 1
+#define NET_DIM_DEF_PROFILE_EQE 1
+
+/* All profiles sizes must be NET_PARAMS_DIM_NUM_PROFILES */
+#define NET_DIM_EQE_PROFILES { \
+	{1,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{8,   NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{64,  NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{128, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+	{256, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
+}
+
+#define NET_DIM_CQE_PROFILES { \
+	{2,  256},             \
+	{8,  128},             \
+	{16, 64},              \
+	{32, 64},              \
+	{64, 64}               \
+}
+
+static const struct net_dim_cq_moder
+profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+	NET_DIM_EQE_PROFILES,
+	NET_DIM_CQE_PROFILES,
+};
+
+static inline struct net_dim_cq_moder net_dim_get_profile(u8 cq_period_mode, int ix)
+{
+	struct net_dim_cq_moder cq_moder;
+
+	cq_moder = profile[cq_period_mode][ix];
+	cq_moder.cq_period_mode = cq_period_mode;
+	return cq_moder;
+}
+
+static inline struct net_dim_cq_moder net_dim_get_def_profile(u8 rx_cq_period_mode)
+{
+	int default_profile_ix;
+
+	if (rx_cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE)
+		default_profile_ix = NET_DIM_DEF_PROFILE_CQE;
+	else /* NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE */
+		default_profile_ix = NET_DIM_DEF_PROFILE_EQE;
+
+	return net_dim_get_profile(rx_cq_period_mode, default_profile_ix);
+}
+
+static inline bool net_dim_on_top(struct net_dim *dim)
+{
+	switch (dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
+		return true;
+	case NET_DIM_GOING_RIGHT:
+		return (dim->steps_left > 1) && (dim->steps_right == 1);
+	default: /* NET_DIM_GOING_LEFT */
+		return (dim->steps_right > 1) && (dim->steps_left == 1);
+	}
+}
+
+static inline void net_dim_turn(struct net_dim *dim)
+{
+	switch (dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
+		break;
+	case NET_DIM_GOING_RIGHT:
+		dim->tune_state = NET_DIM_GOING_LEFT;
+		dim->steps_left = 0;
+		break;
+	case NET_DIM_GOING_LEFT:
+		dim->tune_state = NET_DIM_GOING_RIGHT;
+		dim->steps_right = 0;
+		break;
+	}
+}
+
+static inline int net_dim_step(struct net_dim *dim)
+{
+	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
+		return NET_DIM_TOO_TIRED;
+
+	switch (dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+	case NET_DIM_PARKING_TIRED:
+		break;
+	case NET_DIM_GOING_RIGHT:
+		if (dim->profile_ix == (NET_DIM_PARAMS_NUM_PROFILES - 1))
+			return NET_DIM_ON_EDGE;
+		dim->profile_ix++;
+		dim->steps_right++;
+		break;
+	case NET_DIM_GOING_LEFT:
+		if (dim->profile_ix == 0)
+			return NET_DIM_ON_EDGE;
+		dim->profile_ix--;
+		dim->steps_left++;
+		break;
+	}
+
+	dim->tired++;
+	return NET_DIM_STEPPED;
+}
+
+static inline void net_dim_park_on_top(struct net_dim *dim)
+{
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tired        = 0;
+	dim->tune_state   = NET_DIM_PARKING_ON_TOP;
+}
+
+static inline void net_dim_park_tired(struct net_dim *dim)
+{
+	dim->steps_right  = 0;
+	dim->steps_left   = 0;
+	dim->tune_state   = NET_DIM_PARKING_TIRED;
+}
+
+static inline void net_dim_exit_parking(struct net_dim *dim)
+{
+	dim->tune_state = dim->profile_ix ? NET_DIM_GOING_LEFT :
+					  NET_DIM_GOING_RIGHT;
+	net_dim_step(dim);
+}
+
+#define IS_SIGNIFICANT_DIFF(val, ref) \
+	(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
+
+static inline int net_dim_stats_compare(struct net_dim_stats *curr,
+				        struct net_dim_stats *prev)
+{
+	if (!prev->bpms)
+		return curr->bpms ? NET_DIM_STATS_BETTER :
+				    NET_DIM_STATS_SAME;
+
+	if (IS_SIGNIFICANT_DIFF(curr->bpms, prev->bpms))
+		return (curr->bpms > prev->bpms) ? NET_DIM_STATS_BETTER :
+						   NET_DIM_STATS_WORSE;
+
+	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
+		return (curr->ppms > prev->ppms) ? NET_DIM_STATS_BETTER :
+						   NET_DIM_STATS_WORSE;
+
+	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
+		return (curr->epms < prev->epms) ? NET_DIM_STATS_BETTER :
+						   NET_DIM_STATS_WORSE;
+
+	return NET_DIM_STATS_SAME;
+}
+
+static inline bool net_dim_decision(struct net_dim_stats *curr_stats,
+				    struct net_dim *dim)
+{
+	int prev_state = dim->tune_state;
+	int prev_ix = dim->profile_ix;
+	int stats_res;
+	int step_res;
+
+	switch (dim->tune_state) {
+	case NET_DIM_PARKING_ON_TOP:
+		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
+		if (stats_res != NET_DIM_STATS_SAME)
+			net_dim_exit_parking(dim);
+		break;
+
+	case NET_DIM_PARKING_TIRED:
+		dim->tired--;
+		if (!dim->tired)
+			net_dim_exit_parking(dim);
+		break;
+
+	case NET_DIM_GOING_RIGHT:
+	case NET_DIM_GOING_LEFT:
+		stats_res = net_dim_stats_compare(curr_stats, &dim->prev_stats);
+		if (stats_res != NET_DIM_STATS_BETTER)
+			net_dim_turn(dim);
+
+		if (net_dim_on_top(dim)) {
+			net_dim_park_on_top(dim);
+			break;
+		}
+
+		step_res = net_dim_step(dim);
+		switch (step_res) {
+		case NET_DIM_ON_EDGE:
+			net_dim_park_on_top(dim);
+			break;
+		case NET_DIM_TOO_TIRED:
+			net_dim_park_tired(dim);
+			break;
+		}
+
+		break;
+	}
+
+	if ((prev_state     != NET_DIM_PARKING_ON_TOP) ||
+	    (dim->tune_state != NET_DIM_PARKING_ON_TOP))
+		dim->prev_stats = *curr_stats;
+
+	return dim->profile_ix != prev_ix;
+}
+
+static inline void net_dim_sample(u16 event_ctr,
+				  u64 packets,
+				  u64 bytes,
+				  struct net_dim_sample *s)
+{
+	s->time	     = ktime_get();
+	s->pkt_ctr   = packets;
+	s->byte_ctr  = bytes;
+	s->event_ctr = event_ctr;
+}
+
+#define NET_DIM_NEVENTS 64
+#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
+#define BIT_GAP(bits, end, start) ((((end) - (start)) + BIT_ULL(bits)) & (BIT_ULL(bits) - 1))
+
+static inline void net_dim_calc_stats(struct net_dim_sample *start,
+				      struct net_dim_sample *end,
+				      struct net_dim_stats *curr_stats)
+{
+	/* u32 holds up to 71 minutes, should be enough */
+	u32 delta_us = ktime_us_delta(end->time, start->time);
+	u32 npkts = BIT_GAP(BITS_PER_TYPE(u32), end->pkt_ctr, start->pkt_ctr);
+	u32 nbytes = BIT_GAP(BITS_PER_TYPE(u32), end->byte_ctr,
+			     start->byte_ctr);
+
+	if (!delta_us)
+		return;
+
+	curr_stats->ppms = DIV_ROUND_UP(npkts * USEC_PER_MSEC, delta_us);
+	curr_stats->bpms = DIV_ROUND_UP(nbytes * USEC_PER_MSEC, delta_us);
+	curr_stats->epms = DIV_ROUND_UP(NET_DIM_NEVENTS * USEC_PER_MSEC,
+					delta_us);
+}
+
+static inline void net_dim(struct net_dim *dim,
+			   u16 event_ctr,
+			   u64 packets,
+			   u64 bytes)
+{
+	struct net_dim_sample end_sample;
+	struct net_dim_stats curr_stats;
+	u16 nevents;
+
+	switch (dim->state) {
+	case NET_DIM_MEASURE_IN_PROGRESS:
+		nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
+				  dim->start_sample.event_ctr);
+		if (nevents < NET_DIM_NEVENTS)
+			break;
+		net_dim_sample(event_ctr, packets, bytes, &end_sample);
+		net_dim_calc_stats(&dim->start_sample, &end_sample,
+				   &curr_stats);
+		if (net_dim_decision(&curr_stats, dim)) {
+			dim->state = NET_DIM_APPLY_NEW_PROFILE;
+			schedule_work(&dim->work);
+			break;
+		}
+		/* fall through */
+	case NET_DIM_START_MEASURE:
+		net_dim_sample(event_ctr, packets, bytes, &dim->start_sample);
+		dim->state = NET_DIM_MEASURE_IN_PROGRESS;
+		break;
+	case NET_DIM_APPLY_NEW_PROFILE:
+		break;
+	}
+}
+
+#endif /* NET_DIM_H */
-- 
2.7.4

^ permalink raw reply related

* [net-next 08/10] net/dim: use struct net_dim_sample as arg to net_dim
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

Simplify the arguments net_dim() by formatting them into a struct
net_dim_sample before calling the function.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Suggested-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 13 ++++++++-----
 include/linux/net_dim.h                           | 10 +++-------
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index dae77a9..f292bb3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -78,11 +78,14 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 	for (i = 0; i < c->num_tc; i++)
 		mlx5e_cq_arm(&c->sq[i].cq);
 
-	if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-		net_dim(&c->rq.dim,
-			    c->rq.cq.event_ctr,
-			    c->rq.stats.packets,
-			    c->rq.stats.bytes);
+	if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM)) {
+		struct net_dim_sample dim_sample;
+		net_dim_sample(c->rq.cq.event_ctr,
+			       c->rq.stats.packets,
+			       c->rq.stats.bytes,
+			       &dim_sample);
+		net_dim(&c->rq.dim, dim_sample);
+	}
 
 	mlx5e_cq_arm(&c->rq.cq);
 	mlx5e_cq_arm(&c->icosq.cq);
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index bb99073..2cceefa 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -341,21 +341,18 @@ static inline void net_dim_calc_stats(struct net_dim_sample *start,
 }
 
 static inline void net_dim(struct net_dim *dim,
-			   u16 event_ctr,
-			   u64 packets,
-			   u64 bytes)
+			   struct net_dim_sample end_sample)
 {
-	struct net_dim_sample end_sample;
 	struct net_dim_stats curr_stats;
 	u16 nevents;
 
 	switch (dim->state) {
 	case NET_DIM_MEASURE_IN_PROGRESS:
-		nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
+		nevents = BIT_GAP(BITS_PER_TYPE(u16),
+				  end_sample.event_ctr,
 				  dim->start_sample.event_ctr);
 		if (nevents < NET_DIM_NEVENTS)
 			break;
-		net_dim_sample(event_ctr, packets, bytes, &end_sample);
 		net_dim_calc_stats(&dim->start_sample, &end_sample,
 				   &curr_stats);
 		if (net_dim_decision(&curr_stats, dim)) {
@@ -365,7 +362,6 @@ static inline void net_dim(struct net_dim *dim,
 		}
 		/* fall through */
 	case NET_DIM_START_MEASURE:
-		net_dim_sample(event_ctr, packets, bytes, &dim->start_sample);
 		dim->state = NET_DIM_MEASURE_IN_PROGRESS;
 		break;
 	case NET_DIM_APPLY_NEW_PROFILE:
-- 
2.7.4

^ permalink raw reply related

* [net-next 09/10] bnxt_en: add support for software dynamic interrupt moderation
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

This implements the changes needed for the bnxt_en driver to add support
for dynamic interrupt moderation per ring.

This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/Makefile       |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 52 +++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         | 34 ++++++++++-----
 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c     | 32 ++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 ++++++
 5 files changed, 120 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c

diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 59c8ec9..7c560d5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o
 bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9efbdc6..3c5d2fa 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1645,6 +1645,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi *bnapi, u32 *raw_cons,
 	rxr->rx_next_cons = NEXT_RX(cons);
 
 next_rx_no_prod:
+	cpr->rx_packets += 1;
+	cpr->rx_bytes += len;
 	*raw_cons = tmp_raw_cons;
 
 	return rc;
@@ -1802,6 +1804,7 @@ static irqreturn_t bnxt_msix(int irq, void *dev_instance)
 	struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
 	u32 cons = RING_CMP(cpr->cp_raw_cons);
 
+	cpr->event_ctr++;
 	prefetch(&cpr->cp_desc_ring[CP_RING(cons)][CP_IDX(cons)]);
 	napi_schedule(&bnapi->napi);
 	return IRQ_HANDLED;
@@ -2025,6 +2028,14 @@ static int bnxt_poll(struct napi_struct *napi, int budget)
 			break;
 		}
 	}
+	if (bp->flags & BNXT_FLAG_DIM) {
+		struct net_dim_sample dim_sample;
+		net_dim_sample(cpr->event_ctr,
+			       cpr->rx_packets,
+			       cpr->rx_bytes,
+			       &dim_sample);
+		net_dim(&cpr->am, dim_sample);
+	}
 	mmiowb();
 	return work_done;
 }
@@ -2610,6 +2621,8 @@ static void bnxt_init_cp_rings(struct bnxt *bp)
 		struct bnxt_ring_struct *ring = &cpr->cp_ring_struct;
 
 		ring->fw_ring_id = INVALID_HW_RING_ID;
+		cpr->rx_ring_coal.coal_ticks = bp->rx_coal.coal_ticks;
+		cpr->rx_ring_coal.coal_bufs = bp->rx_coal.coal_bufs;
 	}
 }
 
@@ -4583,6 +4596,41 @@ static void bnxt_hwrm_set_coal_params(struct bnxt_coal *hw_coal,
 	req->flags = cpu_to_le16(flags);
 }
 
+int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi)
+{
+	struct hwrm_ring_cmpl_ring_cfg_aggint_params_input req_rx = {0};
+	struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+	struct bnxt_coal coal;
+	unsigned int grp_idx;
+	int rc = 0;
+
+        /* Tick values in micro seconds.
+         * 1 coal_buf x bufs_per_record = 1 completion record.
+         */
+	memcpy(&coal, &bp->rx_coal, sizeof(struct bnxt_coal));
+
+	coal.coal_ticks = cpr->rx_ring_coal.coal_ticks;
+	coal.coal_bufs = cpr->rx_ring_coal.coal_bufs;
+
+	if (!bnapi->rx_ring)
+		return -ENODEV;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req_rx,
+			       HWRM_RING_CMPL_RING_CFG_AGGINT_PARAMS, -1, -1);
+
+	bnxt_hwrm_set_coal_params(&coal, &req_rx);
+
+	mutex_lock(&bp->hwrm_cmd_lock);
+	grp_idx = bnapi->index;
+
+	req_rx.ring_id = cpu_to_le16(bp->grp_info[grp_idx].cp_fw_ring_id);
+
+	rc = _hwrm_send_message(bp, &req_rx, sizeof(req_rx),
+				HWRM_CMD_TIMEOUT);
+	mutex_unlock(&bp->hwrm_cmd_lock);
+	return rc;
+}
+
 int bnxt_hwrm_set_coal(struct bnxt *bp)
 {
 	int i, rc = 0;
@@ -5705,7 +5753,11 @@ static void bnxt_enable_napi(struct bnxt *bp)
 	int i;
 
 	for (i = 0; i < bp->cp_nr_rings; i++) {
+		struct bnxt_cp_ring_info *cpr = &bp->bnapi[i]->cp_ring;
 		bp->bnapi[i]->in_reset = false;
+
+		INIT_WORK(&cpr->am.work, bnxt_dim_work);
+		cpr->am.mode = NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 		napi_enable(&bp->bnapi[i]->napi);
 	}
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 5359a1f..b3d5a363 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -23,6 +23,7 @@
 #include <net/devlink.h>
 #include <net/dst_metadata.h>
 #include <net/switchdev.h>
+#include <linux/net_dim.h>
 
 struct tx_bd {
 	__le32 tx_bd_len_flags_type;
@@ -607,6 +608,17 @@ struct bnxt_tx_ring_info {
 	struct bnxt_ring_struct	tx_ring_struct;
 };
 
+struct bnxt_coal {
+	u16			coal_ticks;
+	u16			coal_ticks_irq;
+	u16			coal_bufs;
+	u16			coal_bufs_irq;
+			/* RING_IDLE enabled when coal ticks < idle_thresh  */
+	u16			idle_thresh;
+	u8			bufs_per_record;
+	u8			budget;
+};
+
 struct bnxt_tpa_info {
 	void			*data;
 	u8			*data_ptr;
@@ -670,6 +682,13 @@ struct bnxt_cp_ring_info {
 	u32			cp_raw_cons;
 	void __iomem		*cp_doorbell;
 
+	struct bnxt_coal	rx_ring_coal;
+	u64			rx_packets;
+	u64			rx_bytes;
+	u64			event_ctr;
+
+	struct net_dim		am;
+
 	struct tx_cmp		*cp_desc_ring[MAX_CP_PAGES];
 
 	dma_addr_t		cp_desc_mapping[MAX_CP_PAGES];
@@ -944,17 +963,6 @@ struct bnxt_test_info {
 #define BNXT_CAG_REG_LEGACY_INT_STATUS	0x4014
 #define BNXT_CAG_REG_BASE		0x300000
 
-struct bnxt_coal {
-	u16			coal_ticks;
-	u16			coal_ticks_irq;
-	u16			coal_bufs;
-	u16			coal_bufs_irq;
-			/* RING_IDLE enabled when coal ticks < idle_thresh  */
-	u16			idle_thresh;
-	u8			bufs_per_record;
-	u8			budget;
-};
-
 struct bnxt_tc_flow_stats {
 	u64		packets;
 	u64		bytes;
@@ -1126,6 +1134,7 @@ struct bnxt {
 	#define BNXT_FLAG_DOUBLE_DB	0x400000
 	#define BNXT_FLAG_FW_DCBX_AGENT	0x800000
 	#define BNXT_FLAG_CHIP_NITRO_A0	0x1000000
+	#define BNXT_FLAG_DIM		0x2000000
 
 	#define BNXT_FLAG_ALL_CONFIG_FEATS (BNXT_FLAG_TPA |		\
 					    BNXT_FLAG_RFS |		\
@@ -1423,4 +1432,7 @@ int bnxt_setup_mq_tc(struct net_device *dev, u8 tc);
 int bnxt_get_max_rings(struct bnxt *, int *, int *, bool);
 void bnxt_restore_pf_fw_resources(struct bnxt *bp);
 int bnxt_port_attr_get(struct bnxt *bp, struct switchdev_attr *attr);
+void bnxt_dim_work(struct work_struct *work);
+int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi);
+
 #endif
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
new file mode 100644
index 0000000..0eefe4b
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
@@ -0,0 +1,32 @@
+/*
+ * Copyright (c) 2017 Broadcom Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/net_dim.h>
+#include "bnxt_hsi.h"
+#include "bnxt.h"
+
+void bnxt_dim_work(struct work_struct *work)
+{
+	struct net_dim *am = container_of(work, struct net_dim,
+					 work);
+	struct bnxt_cp_ring_info *cpr = container_of(am,
+						     struct bnxt_cp_ring_info,
+						     am);
+	struct bnxt_napi *bnapi = container_of(cpr,
+					       struct bnxt_napi,
+					       cp_ring);
+	struct net_dim_cq_moder cur_profile = net_dim_get_profile(am->mode,
+							      am->profile_ix);
+
+	cpr->rx_ring_coal.coal_ticks = cur_profile.usec;
+	cpr->rx_ring_coal.coal_bufs = cur_profile.pkts;
+
+	bnxt_hwrm_set_ring_coal(bnapi->bp, bnapi);
+	am->state = NET_DIM_START_MEASURE;
+}
+
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index fe7599f..632e03b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -49,6 +49,8 @@ static int bnxt_get_coalesce(struct net_device *dev,
 
 	memset(coal, 0, sizeof(*coal));
 
+	coal->use_adaptive_rx_coalesce = bp->flags & BNXT_FLAG_DIM;
+
 	hw_coal = &bp->rx_coal;
 	mult = hw_coal->bufs_per_record;
 	coal->rx_coalesce_usecs = hw_coal->coal_ticks;
@@ -77,6 +79,15 @@ static int bnxt_set_coalesce(struct net_device *dev,
 	int rc = 0;
 	u16 mult;
 
+	if (coal->use_adaptive_rx_coalesce)
+		bp->flags |= BNXT_FLAG_DIM;
+	else {
+		if (bp->flags & BNXT_FLAG_DIM) {
+			bp->flags &= ~(BNXT_FLAG_DIM);
+			goto reset_coalesce;
+		}
+	}
+
 	hw_coal = &bp->rx_coal;
 	mult = hw_coal->bufs_per_record;
 	hw_coal->coal_ticks = coal->rx_coalesce_usecs;
@@ -104,6 +115,7 @@ static int bnxt_set_coalesce(struct net_device *dev,
 		update_stats = true;
 	}
 
+reset_coalesce:
 	if (netif_running(dev)) {
 		if (update_stats) {
 			rc = bnxt_close_nic(bp, true, false);
-- 
2.7.4

^ permalink raw reply related

* [net-next 10/10] MAINTAINERS: add entry for Dynamic Interrupt Moderation
From: Andy Gospodarek @ 2018-01-04 20:21 UTC (permalink / raw)
  To: netdev; +Cc: mchan, talgi, ogerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

From: Andy Gospodarek <gospo@broadcom.com>

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
---
 MAINTAINERS | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 753799d..769857b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4944,6 +4944,11 @@ S:	Maintained
 F:	lib/dynamic_debug.c
 F:	include/linux/dynamic_debug.h
 
+DYNAMIC INTERRUPT MODERATION
+M:	Tal Gilboa <talgi@mellanox.com>
+S:	Mainained
+F:	include/linux/net_dim.h
+
 DZ DECSTATION DZ11 SERIAL DRIVER
 M:	"Maciej W. Rozycki" <macro@linux-mips.org>
 S:	Maintained
-- 
2.7.4

^ permalink raw reply related

* Re: [net-next 00/10] net: create dynamic software irq moderation library
From: Or Gerlitz @ 2018-01-04 20:37 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Linux Netdev List, mchan, Tal Gilboa, Or Gerlitz, Andy Gospodarek
In-Reply-To: <1515097290-17470-1-git-send-email-andy@greyhouse.net>

>   net/mlx5e: move interrupt moderation structs to new file
>   net/mlx5e: move interrupt moderation forward declarations
>   net/mlx5e: remove rq references in mlx5e_rx_am
>   net/mlx5e: move AM logic enums
>   net/mlx5e: move generic functions to new file
>   net/mlx5e: change Mellanox references in DIM code

Hi, Andy && happy new 2018 --  this is indeed a nit, but I have
provided it to you twice (...),
please get the commit titles to align with what we do which is capital
letter after the net/mlx5e: prefix

from: net/mlx5e: move interrupt moderation structs to new file
to: net/mlx5e: Move interrupt moderation structs to new file

If you get other comments, just apply this for the next version, if everyone
is happy, that would be a very small effort to just fix and get that in..

^ permalink raw reply

* Re: [PATCH] ps3_gelic_net: Delete an error message for a failed memory allocation in gelic_descr_prepare_rx()
From: Geoff Levand @ 2018-01-04 20:44 UTC (permalink / raw)
  To: SF Markus Elfring, netdev, linuxppc-dev, Benjamin Herrenschmidt,
	Michael Ellerman, Paul Mackerras
  Cc: kernel-janitors, LKML
In-Reply-To: <9613bfbf-11cc-1e66-484a-84fcda022861@users.sourceforge.net>

On 01/03/2018 06:40 AM, SF Markus Elfring wrote:
> Omit an extra message for a memory allocation failure in this function.

I applied this to my ps3-queue branch.

As I mentioned to you several times before, please keep
the commit subject line to less than 50 characters.
Also, in this case the prefix would be 'net/ps3_gelic_net:"

-Geoff

^ permalink raw reply

* Re: [PATCH net-next 0/5] Export SERDES stats via ethtool -S
From: Andrew Lunn @ 2018-01-04 21:07 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: David Miller, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20180103175832.GJ28752@n2100.armlinux.org.uk>

> Hi Andrew,
> 
> Isn't the serdes interface just a normal PHY - the register set in
> Viviens debugfs patch certainly looks like an 88e1545 in one of the
> switches I've here (I think on the dev rev B).  Is there a reason why
> this PHY isn't visible as a normal PHY, just like we export the other
> internal PHYs?

Humm, interesting idea. Yes, it looks like a normal PHY. It is however
not contiguous to the internal PHYs on the 6352. You probably can
however see it, if you look on the MDIO bus, at address 0xf.

Depends on what mode the ports are in, the SERDES can be connected to
either port 4 or 5. Port 5 does not have an internal PHY, but port 4
does. We could determine what port is using the SERDES, and redirect
PHY read/writes to the SERDES interface. But it gets a bit complex for
port 4, since it can use either copper or SERDES depending on which
comes up first. So suppose you need to be able to see both the copper
PHY and the SERDES, so you can initialize them both, in order for one
to actually come up.

   Andrew

^ permalink raw reply

* Re: [PATCH bpf-next 1/2] bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map
From: Jakub Kicinski @ 2018-01-04 21:08 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180104072746.1569033-2-yhs@fb.com>

On Wed, 3 Jan 2018 23:27:45 -0800, Yonghong Song wrote:
> Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not
> supported for stacktrace map. However, there are use cases where
> user space wants to enumerate all stacktrace map entries where
> BPF_MAP_GET_NEXT_KEY command will be really helpful.
> In addition, if user space wants to delete all map entries
> in order to save memory and does not want to close the
> map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve
> performance if map entries are sparsely populated.
> 
> The implementation follows the API specification of existing
> BPF_MAP_GET_NEXT_KEY implementation. If user provides
> an NULL key pointer, the first key is returned. Otherwise,
> the first valid key after the input parameter "key"
> is returned, or -ENOENT if no valid key can be found.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  kernel/bpf/stackmap.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index a15bc63..207b21c 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -226,9 +226,28 @@ int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
>  	return 0;
>  }
>  
> -static int stack_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
> +static int stack_map_get_next_key(struct bpf_map *map, void *key,
> +				  void *next_key)
>  {
> -	return -EINVAL;
> +	struct bpf_stack_map *smap = container_of(map,
> +						  struct bpf_stack_map, map);
> +	u32 id;
> +
> +	WARN_ON_ONCE(!rcu_read_lock_held());
> +
> +	if (!key)
> +		id = 0;
> +	else
> +		id = *(u32 *)key + 1;
> +
> +	while (id < smap->n_buckets && !smap->buckets[id])
> +		id++;
> +
> +	if (id >= smap->n_buckets)
> +		return -ENOENT;

AFAIU for hash maps the semantics of get next are as follows:

get_next(map, key) {
	if (!key)
		return get_first(map);

	elem = lookup(map, key);
	if (!elem)                       // <-- note this branch
		return get_first(map);
	if (elem->next)
		return elem->next->key;
	return -ENOENT;
}

For arrays elements always exist, hence the elem->next check is
omitted.  Here you are, however, testing !smap->buckets[id] so I assume
elements may not exist.  

Is there any precedent for get_next on non-existent key returning
element other than first?  The stacktrace map is a bit special, and
returning id + 1 would defeat what you're trying to do here..  Is there
value in keeping the behaviour consistent across map types?  

Anyway, you said in the commit message that "The implementation follows
the API specification of existing BPF_MAP_GET_NEXT_KEY implementation."
and I find that arguable :)

> +	*(u32 *)next_key = id;
> +	return 0;
>  }
>  
>  static int stack_map_update_elem(struct bpf_map *map, void *key, void *value,

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox