Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH RFC net-next 7/7] netdevsim: Add simple FIB resource controller via devlink
From: Jakub Kicinski @ 2018-03-25  6:35 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, roopa, shm, jiri, idosch, David Ahern
In-Reply-To: <1b5db863-7949-e4a2-7f22-6dd79ecb8089@cumulusnetworks.com>

On Sat, 24 Mar 2018 09:02:45 -0600, David Ahern wrote:
> >> diff --git a/drivers/net/netdevsim/Makefile b/drivers/net/netdevsim/Makefile
> >> index 09388c06171d..449b2a1a1800 100644
> >> --- a/drivers/net/netdevsim/Makefile
> >> +++ b/drivers/net/netdevsim/Makefile
> >> @@ -9,3 +9,7 @@ ifeq ($(CONFIG_BPF_SYSCALL),y)
> >>  netdevsim-objs += \
> >>  	bpf.o
> >>  endif
> >> +
> >> +ifneq ($(CONFIG_NET_DEVLINK),)  
> > 
> > Hm.  Don't you need MAY_USE_DEVLINK dependency perhaps?  
> 
> mlxsw uses CONFIG_NET_DEVLINK in its Makefile.
> 
> MAY_USE_DEVLINK seems to only be used in Kconfig files. Not clear to me
> why it is needed at all.

NETDEVSIM=y && DEVLINK=m

^ permalink raw reply

* [PATCH iproute2-next] rdma: Move RDMA UAPI header file to be under RDMA responsibility
From: Leon Romanovsky @ 2018-03-25  6:38 UTC (permalink / raw)
  To: David Ahern
  Cc: Leon Romanovsky, netdev, RDMA mailing list, Stephen Hemminger,
	Steve Wise

From: Leon Romanovsky <leonro@mellanox.com>

In iproute2 package, the updates of UAPIs files are performed
after the needed feature lands in kernel's net-next tree.

Such development flow created delays to the rdma tool developers,
who uses rdma-next tree as a basis for their work.

Move RDMA UAPI file to be under rdma/ folder, so whole responsibility
of syncing this file will be on them.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 rdma/Makefile                                      | 1 +
 {include => rdma/include}/uapi/rdma/rdma_netlink.h | 0
 2 files changed, 1 insertion(+)
 rename {include => rdma/include}/uapi/rdma/rdma_netlink.h (100%)

diff --git a/rdma/Makefile b/rdma/Makefile
index 360f09b2..819fcbe3 100644
--- a/rdma/Makefile
+++ b/rdma/Makefile
@@ -4,6 +4,7 @@ include ../config.mk
 TARGETS :=

 ifeq ($(HAVE_MNL),y)
+CFLAGS += -I./include/uapi/

 RDMA_OBJ = rdma.o utils.o dev.o link.o res.o

diff --git a/include/uapi/rdma/rdma_netlink.h b/rdma/include/uapi/rdma/rdma_netlink.h
similarity index 100%
rename from include/uapi/rdma/rdma_netlink.h
rename to rdma/include/uapi/rdma/rdma_netlink.h

^ permalink raw reply related

* Re: [PATCH RFC net-next 1/7] net: Fix fib notifer to return errno
From: Ido Schimmel @ 2018-03-25  8:16 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, roopa, shm, jiri, idosch, jakub.kicinski
In-Reply-To: <20180322225757.10377-2-dsa@cumulusnetworks.com>

On Thu, Mar 22, 2018 at 03:57:51PM -0700, David Ahern wrote:
> Notifier handlers use notifier_from_errno to convert any potential error
> to an encoded format. As a consequence the other side, call_fib_notifiers
> in this case, needs to use notifier_to_errno to return the error from
> the handler back to its caller.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
>  net/core/fib_notifier.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/fib_notifier.c b/net/core/fib_notifier.c
> index 5ace0705a3f9..14ba52ebe8c9 100644
> --- a/net/core/fib_notifier.c
> +++ b/net/core/fib_notifier.c
> @@ -21,8 +21,11 @@ EXPORT_SYMBOL(call_fib_notifier);
>  int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
>  		       struct fib_notifier_info *info)

There's another (less interesting case) - call_fib_notifier(), which is
used to dump the FIB tables for the caller registering to the
notification chain.

For example, if you have a non-default FIB rule in the system and you
modprobe mlxsw, you'll get a silent failure and routes will not be
offloaded. On the other hand, I'm not sure we want to fail the module
loading in such cases.

A possible solution is to have the driver emit a warning via extack for
each route/rule being notified after the abort mechanism was triggered.

>  {
> +	int err;
> +
>  	info->net = net;
> -	return atomic_notifier_call_chain(&fib_chain, event_type, info);
> +	err = atomic_notifier_call_chain(&fib_chain, event_type, info);
> +	return notifier_to_errno(err);
>  }
>  EXPORT_SYMBOL(call_fib_notifiers);
>  
> -- 
> 2.11.0
> 

^ permalink raw reply

* Re: [PATCH] of_net: Implement of_get_nvmem_mac_address helper
From: Mike Looijmans @ 2018-03-25  8:17 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, netdev, linux-kernel, devicetree, robh+dt,
	frowand.list
In-Reply-To: <20180324185324.GG31941@lunn.ch>

On 24-03-18 19:53, Andrew Lunn wrote:
>> A quick survey for the of_get_mac_address users learns that most of them do
>> a memcpy (or similar) right after it, so for these drivers the
>> "of_get_nvmem_mac_address" style signature that performs the memcpy (or
>> better, ether_addr_copy) is a better fit, e.g.:
>>
>> int of_get_mac_address(struct device_node *np, void *addr)
> 
> Hi Mike
> 
> This is a nicer solution, but it is quite a lot of work, there are a
> lot of users. Maybe Coccinelle can help?

About 58 of them, yeah. And this looked like such a simple thing when I 
started it...
https://elixir.bootlin.com/linux/v4.16-rc6/ident/of_get_mac_address
I have no experience with Coccinelle though.



-- 
Mike Looijmans

^ permalink raw reply

* KASAN: use-after-free Read in pppol2tp_connect (3)
From: syzbot @ 2018-03-25  9:06 UTC (permalink / raw)
  To: davem, jchapman, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot hit the following crash on upstream commit
bcfc1f4554662d8f2429ac8bd96064a59c149754 (Sat Mar 24 16:50:12 2018 +0000)
Merge tag 'pinctrl-v4.16-3' of  
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=fbeeb5c3b538e8545644

So far this crash happened 5 times on net-next, upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5150723616538624
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=4732053023096832
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5847288391925760
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-5034017172441945317
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+fbeeb5c3b538e8545644@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

l2tp_core: tunl 2: sockfd_lookup(fd=9) returned -9
l2tp_core: tunl 2: sockfd_lookup(fd=8) returned -9
l2tp_core: tunl 2: sockfd_lookup(fd=9) returned -9
==================================================================
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:188  
[inline]
BUG: KASAN: use-after-free in sk_dst_get include/net/sock.h:1767 [inline]
BUG: KASAN: use-after-free in pppol2tp_session_init net/l2tp/l2tp_ppp.c:587  
[inline]
BUG: KASAN: use-after-free in pppol2tp_connect+0x1a98/0x1dd0  
net/l2tp/l2tp_ppp.c:747
Read of size 8 at addr ffff8801d18242a8 by task syzkaller117300/8372

CPU: 1 PID: 8372 Comm: syzkaller117300 Not tainted 4.16.0-rc6+ #365
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x24d lib/dump_stack.c:53
  print_address_description+0x73/0x250 mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report+0x23c/0x360 mm/kasan/report.c:412
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  __read_once_size include/linux/compiler.h:188 [inline]
  sk_dst_get include/net/sock.h:1767 [inline]
  pppol2tp_session_init net/l2tp/l2tp_ppp.c:587 [inline]
  pppol2tp_connect+0x1a98/0x1dd0 net/l2tp/l2tp_ppp.c:747
  SYSC_connect+0x213/0x4a0 net/socket.c:1639
  SyS_connect+0x24/0x30 net/socket.c:1620
  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x449f39
RSP: 002b:00007fa9d92efce8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000700024 RCX: 0000000000449f39
RDX: 000000000000002e RSI: 0000000020e92000 RDI: 0000000000000009
RBP: 0000000000700020 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000080efcf R14: 00007fa9d92f09c0 R15: 0000000000000009

Allocated by task 8381:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
  set_track mm/kasan/kasan.c:459 [inline]
  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552
  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3541
  sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1465
  sk_alloc+0x105/0x1440 net/core/sock.c:1525
  inet_create+0x47c/0xf50 net/ipv4/af_inet.c:320
  __sock_create+0x4d4/0x850 net/socket.c:1285
  sock_create net/socket.c:1325 [inline]
  SYSC_socket net/socket.c:1355 [inline]
  SyS_socket+0xeb/0x1d0 net/socket.c:1335
  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7

Freed by task 21:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
  set_track mm/kasan/kasan.c:459 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527
  __cache_free mm/slab.c:3485 [inline]
  kmem_cache_free+0x83/0x2a0 mm/slab.c:3743
  sk_prot_free net/core/sock.c:1506 [inline]
  __sk_destruct+0x628/0x920 net/core/sock.c:1590
  sk_destruct+0x47/0x80 net/core/sock.c:1598
  __sk_free+0xf1/0x2b0 net/core/sock.c:1609
  sk_free+0x2a/0x40 net/core/sock.c:1620
  sock_put include/net/sock.h:1659 [inline]
  l2tp_tunnel_free net/l2tp/l2tp_core.c:160 [inline]
  l2tp_tunnel_dec_refcount net/l2tp/l2tp_core.h:264 [inline]
  l2tp_tunnel_del_work+0x474/0x6a0 net/l2tp/l2tp_core.c:1307
  process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
  kthread+0x33c/0x400 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406

The buggy address belongs to the object at ffff8801d1824080
  which belongs to the cache UDP of size 1472
The buggy address is located 552 bytes inside of
  1472-byte region [ffff8801d1824080, ffff8801d1824640)
The buggy address belongs to the page:
page:ffffea0007460900 count:1 mapcount:0 mapping:ffff8801d1824080 index:0x0  
compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffff8801d1824080 0000000000000000 0000000100000005
raw: ffffea0006b0fda0 ffffea0006a53820 ffff8801d6f36640 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801d1824180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801d1824200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801d1824280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                   ^
  ffff8801d1824300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801d1824380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Hello Beautiful
From: Jack @ 2018-03-25  8:59 UTC (permalink / raw)


Cześć Drogi, nazywam się Jack i szukam związku, w którym będę czuć się kochany po serii nieudanych związków.

Mam nadzieję, że byłbyś zainteresowany i moglibyśmy się lepiej poznać, jeśli nie masz nic przeciwko. Jestem otwarty na udzielanie odpowiedzi na pytania od ciebie, ponieważ uważam, że moje podejście jest trochę niewłaściwe. Mam nadzieję, że odezwę się od ciebie.

Jacek.

^ permalink raw reply

* Re: [net-next 03/15] net/mlx5e: PFC stall prevention support
From: Gal Pressman @ 2018-03-25 10:28 UTC (permalink / raw)
  To: Andrew Lunn, Saeed Mahameed; +Cc: David S. Miller, netdev, Inbar Karmy
In-Reply-To: <20180324150709.GD31941@lunn.ch>

On 24-Mar-18 18:07, Andrew Lunn wrote:
> On Fri, Mar 23, 2018 at 03:39:13PM -0700, Saeed Mahameed wrote:
>> From: Inbar Karmy <inbark@mellanox.com>
>>
>> Implement set/get functions to configure PFC stall prevention
>> timeout by tunables api through ethtool.
>> By default the stall prevention timeout is configured to 8 sec.
>> Timeout range is: 80-8000 msec.
>> Enabling stall prevention without a specific timeout will set
>> the timeout to 100 msec.
> 
> Hi Inbar
> 
> I think this last sentence should be talking about auto, not without a
> specific timeout.

Hi Andrew,
Good catch, we will fix that.

> 
>>
>> Signed-off-by: Inbar Karmy <inbark@mellanox.com>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>> ---
>>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 57 +++++++++++++++++++
>>  drivers/net/ethernet/mellanox/mlx5/core/port.c     | 64 +++++++++++++++++++---
>>  include/linux/mlx5/mlx5_ifc.h                      | 17 ++++--
>>  include/linux/mlx5/port.h                          |  6 ++
>>  4 files changed, 132 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> index cc8048f68f11..62061fd23143 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> @@ -1066,6 +1066,57 @@ static int mlx5e_get_rxnfc(struct net_device *netdev,
>>  	return err;
>>  }
>>  
>> +#define MLX5E_PFC_PREVEN_AUTO_TOUT_MSEC		100
>> +#define MLX5E_PFC_PREVEN_TOUT_MAX_MSEC		8000
>> +#define MLX5E_PFC_PREVEN_MINOR_PRECENT		85
>> +#define MLX5E_PFC_PREVEN_TOUT_MIN_MSEC		80
>> +#define MLX5E_DEVICE_STALL_MINOR_WATERMARK(critical_tout) \
>> +	max_t(u16, MLX5E_PFC_PREVEN_TOUT_MIN_MSEC, \
>> +	      (critical_tout * MLX5E_PFC_PREVEN_MINOR_PRECENT) / 100)
>> +
>> +static int mlx5e_get_pfc_prevention_tout(struct net_device *netdev,
>> +					 u16 *pfc_prevention_tout)
>> +{
>> +	struct mlx5e_priv *priv    = netdev_priv(netdev);
>> +	struct mlx5_core_dev *mdev = priv->mdev;
>> +
>> +	if (!MLX5_CAP_PCAM_FEATURE((priv)->mdev, pfcc_mask) ||
>> +	    !MLX5_CAP_DEBUG((priv)->mdev, stall_detect))
>> +		return -EOPNOTSUPP;
>> +
>> +	return mlx5_query_port_stall_watermark(mdev, pfc_prevention_tout, NULL);
> 
> Shouldn't you map a value of MLX5E_PFC_PREVEN_AUTO_TOUT_MSEC back to 
> PFC_STORM_PREVENTION_AUTO?

We discussed this point internally, mapping MLX5E_PFC_PREVEN_AUTO_TOUT_MSEC (100) to
PFC_STORM_PREVENTION_AUTO might cause confusion when the user explicitly asks for 100msec timeout
and gets auto in his following query.
Also, this way the "auto" timeout is visible to the user, which might help him get an initial
clue of which values are recommended.

> 
> 	Andrew
> 

^ permalink raw reply

* Re: [net-next 02/15] ethtool: Add support for configuring PFC stall prevention in ethtool
From: Gal Pressman @ 2018-03-25 10:30 UTC (permalink / raw)
  To: Andrew Lunn, Saeed Mahameed
  Cc: David S. Miller, netdev, Inbar Karmy, Michal Kubecek
In-Reply-To: <20180324145703.GC31941@lunn.ch>

On 24-Mar-18 17:57, Andrew Lunn wrote:
> On Fri, Mar 23, 2018 at 03:39:12PM -0700, Saeed Mahameed wrote:
>> From: Inbar Karmy <inbark@mellanox.com>
>>
>> In the event where the device unexpectedly becomes unresponsive
>> for a long period of time, flow control mechanism may propagate
>> pause frames which will cause congestion spreading to the entire
>> network.
>> To prevent this scenario, when the device is stalled for a period
>> longer than a pre-configured timeout, flow control mechanisms are
>> automatically disabled.
>>
>> This patch adds support for the ETHTOOL_PFC_STALL_PREVENTION
>> as a tunable.
>> This API provides support for configuring flow control storm prevention
>> timeout (msec).
>>
>> Signed-off-by: Inbar Karmy <inbark@mellanox.com>
>> Cc: Michal Kubecek <mkubecek@suse.cz>
>> Cc: Andrew Lunn <andrew@lunn.ch>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>> ---
>>  include/uapi/linux/ethtool.h | 4 ++++
>>  net/core/ethtool.c           | 6 ++++++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
>> index 20da156aaf64..9dc63a14a747 100644
>> --- a/include/uapi/linux/ethtool.h
>> +++ b/include/uapi/linux/ethtool.h
>> @@ -217,10 +217,14 @@ struct ethtool_value {
>>  	__u32	data;
>>  };
>>  
>> +#define PFC_STORM_PREVENTION_AUTO	0xffff
>> +#define PFC_STORM_PREVENTION_DISABLE	0
>> +
>>  enum tunable_id {
>>  	ETHTOOL_ID_UNSPEC,
>>  	ETHTOOL_RX_COPYBREAK,
>>  	ETHTOOL_TX_COPYBREAK,
>> +	ETHTOOL_PFC_PREVENTION_TOUT,
> 
> Hi Inbar
> 
> Please could you add a comment here about the units. Ideally we want
> this file to be self documenting.

Thank you for the review, we will fix that.

> 
>      Andrew
> 

^ permalink raw reply

* Hello Beautiful
From: Jack @ 2018-03-25 10:50 UTC (permalink / raw)


Hi Dear, my name is Jack and i am seeking for a relationship in which i will feel loved after a series of failed relationships. 

I am hoping that you would be interested and we could possibly get to know each other more if you do not mind. I am open to answering questions from you as i think my approach is a little inappropriate. Hope to hear back from you.

Jack.

^ permalink raw reply

* Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
From: Thomas Gleixner @ 2018-03-25 11:46 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes, richardcochran,
	anna-maria, henrik, John Stultz, levi.pearson, edumazet, willemb,
	mlichvar
In-Reply-To: <2897b562-06e0-0fcc-4fb1-e8c4469c0faa@intel.com>

On Fri, 23 Mar 2018, Jesus Sanchez-Palencia wrote:
> On 03/22/2018 03:52 PM, Thomas Gleixner wrote:
> > So what's the plan for this? Having TAS as a separate entity or TAS feeding
> > into the proposed 'basic' time transmission thing?
> 
> The second one, I guess.

That's just wrong. It won't work. See below.

> Elaborating, the plan is at some point having TAS as a separate entity,
> but which can use tbs for one of its classes (and cbs for another, and
> strict priority for everything else, etc).
>
> Basically, the design would something along the lines of 'taprio'. A root qdisc
> that is both time and priority aware, and capable of running a schedule for the
> port. That schedule can run inside the kernel with hrtimers, or just be
> offloaded into the controller if Qbv is supported on HW.
> 
> Because it would expose the inner traffic classes in a mq / mqprio / prio style,
> then it would allow for other per-queue qdiscs to be attached to it. On a system
> using the i210, for instance, we could then have tbs installed on traffic class
> 0 just dialing hw offload. The Qbv schedule would be running in SW on the TAS
> entity (i.e. 'taprio') which would be setting the packets' txtime before
> dequeueing packets on a fast path -> tbs -> NIC.
> 
> Similarly, other qdisc, like cbs, could be installed if all that traffic class
> requires is traffic shaping once its 'gate' is allowed to execute the selected
> tx algorithm attached to it.
> 
> > I've not yet seen a convincing argument why this low level stuff with all
> > of its weird flavours is superiour over something which reflects the basic
> > operating principle of TSN.
> 
> 
> As you know, not all TSN systems are designed the same. Take AVB systems, for
> example. These not always are running on networks that are aware of any time
> schedule, or at least not quite like what is described by Qbv.
> 
> On those systems there is usually a certain number of streams with different
> priorities that care mostly about having their bandwidth reserved along the
> network. The applications running on such systems are usually based on AVTP,
> thus they already have to calculate and set the "avtp presentation time"
> per-packet themselves. A Qbv scheduler would probably provide very little
> benefits to this domain, IMHO. For "talkers" of these AVB systems, shaping
> traffic using txtime (i.e. tbs) can provide a low-jitter alternative to cbs, for
> instance.

You're looking at it from particular use cases and try to accomodate for
them in the simplest possible way. I don't think that cuts it.

Let's take a step back and look at it from a more general POV without
trying to make it fit to any of the standards first. I'm deliberately NOT
using any of the standard defined terms.

At the (local) network level you have always an explicit plan. This plan
might range from no plan at all to an very elaborate plan which is strict
about when each node is allowed to TX a particular class of packets.

So lets assume we have the following picture:

   	       	  [NIC]
		    |
	 [ Time slice manager ]

Now in the simplest case, the time slice manager has no constraints and
exposes a single input which allows the application to say: "Send my packet
at time X". There is no restriction on 'time X' except if there is a time
collision with an already queued packet or the requested TX time has
already passed. That's close to what you implemented.

  Is the TX timestamp which you defined in the user space ABI a fixed
  scheduling point or is it a deadline?

  That's an important distinction and for this all to work accross various
  use cases you need a way to express that in the ABI. It might be an
  implicit property of the socket/channel to which the application connects
  to but still you want to express it from the application side to do
  proper sanity checking.

  Just think about stuff like audio/video streaming. The point of
  transmission does not have to be fixed if you have some intelligent
  controller at the receiving end which can buffer stuff. The only relevant
  information is the deadline, i.e. the latest point in time where the
  packet needs to go out on the wire in order to keep the stream steady at
  the consumer side. Having the notion of a deadline and that's the only
  thing the provider knows about allows you proper utilization by using an
  approriate scheduling algorithm like EDF.

  Contrary to that you want very explicit TX points for applications like
  automation control. For this kind of use case there is no wiggle room, it
  has to go out at a fixed time because that's the way control systems
  work.

  This is missing right now and you want to get that right from the very
  beginning. Duct taping it on the interface later on is a bad idea.

Now lets go one step further and create two time slices for whatever
purpose still on the single node (not network wide). You want to do that
because you want temporal separation of services. The reason might be
bandwidth guarantee, collission avoidance or whatever.

  How does the application which was written for the simple manager which
  had no restrictions learn about this?

  Does it learn it the hard way because now the packets which fall into the
  reserved timeslice are rejected? The way you created your interface, the
  answer is yes. That's patently bad as it requires to change the
  application once it runs on a partitioned node.

  So you really want a way for the application to query the timing
  constraints and perhaps other properties of the channel it connects
  to. And you want that now before the first application starts to use the
  new ABI. If the application developer does not use it, you still have to
  fix the application, but you have to fix it because the developer was a
  lazy bastard and not because the design was bad. That's a major
  difference.

Now that we have two time slices, I'm coming back to your idea of having
your proposed qdisc as the entity which sits right at the network
interface. Lets assume the following:

   [Slice 1: Timed traffic ] [Slice 2: Other Traffic]

  Lets assume further that 'Other traffic' has no idea about time slices at
  all. It's just stuff like ssh, http, etc. So if you keep that design

       	         [ NIC ]
  	            |
           [ Time slice manager ]
	       |          |
     [ Timed traffic ]  [ Other traffic ]

  feeding into your proposed TBS thingy, then in case of underutilization
  of the 'Timed traffic' slot you prevent utilization of remaining time by
  pulling 'Other traffic' into the empty slots because 'Other traffic' is
  restricted to Slice 2 and 'Timed traffic' does not know about 'Other
  traffic' at all. And no, you cannot make TBS magically pull packets from
  'Other traffic' just because its not designed for it. So your design
  becomes strictly partitioned and forces underutilization.

  That's becoming even worse, when you switch to the proposed full hardware
  offloading scheme. In that case the only way to do admission control is
  the TX time of the farthest out packet which is already queued. That
  might work for a single application which controls all of the network
  traffic, but it wont ever work for something more flexible. The more I
  think about it the less interesting full hardware offload becomes. It's
  nice if you have a fully strict scheduling plan for everything, but then
  your admission control is bogus once you have more than one channel as
  input. So yes, it can be used when the card supports it and you have
  other ways to enforce admission control w/o hurting utilization or if you
  don't care about utilization at all. It's also useful for channels which
  are strictly isolated and have a defined TX time. Such traffic can be
  directly fed into the hardware.

Coming back to the overall scheme. If you start upfront with a time slice
manager which is designed to:

  - Handle multiple channels

  - Expose the time constraints, properties per channel

then you can fit all kind of use cases, whether designed by committee or
not. You can configure that thing per node or network wide. It does not
make a difference. The only difference are the resulting constraints.

We really want to accomodate everything between the 'no restrictions' and
the 'full network wide explicit plan' case. And it's not rocket science
once you realize that the 'no restrictions' case is just a subset of the
'full network wide explicit plan' simply because it exposes a single
channel where:

	slice period = slice length.

It's that easy, but at the same time you teach the application from the
very beginning to ask for the time constraints so if it runs on a more
sophisticated system/network, then it will see a different slice period and
a different slice length and can accomodate or react in a useful way
instead of just dying on the 17th packet it tries to send because it is
rejected.

We really want to design for this as we want to be able to run the video
stream on the same node and network which does robot control without
changing the video application. That's not a theoretical problem. These use
cases exist today, but they are forced to use different networks for the
two. But if you look at the utilization of both then they very well fit
into one and industry certainly wants to go for that.

That implies that you need constraint aware applications from the very
beginning and that requires a proper ABI in the first place. The proposed
ad hoc mode does not qualify. Please be aware, that you are creating a user
space ABI and not a random in kernel interface which can be changed at any
given time.

So lets look once more at the picture in an abstract way:

     	       [ NIC ]
	          |
	 [ Time slice manager ]
	    |           |
         [ Ch 0 ] ... [ Ch N ]

So you have a bunch of properties here:

1) Number of Channels ranging from 1 to N

2) Start point, slice period and slice length per channel

3) Queueing modes assigned per channel. Again that might be anything from
   'feed through' over FIFO, PRIO to more complex things like EDF.

   The queueing mode can also influence properties like the meaning of the
   TX time, i.e. strict or deadline.

Please sit back and map your use cases, standards or whatever you care
about into the above and I would be very surprised if they don't fit.

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH v4 17/17] net: ena: Eliminate duplicate barriers on weakly-ordered archs
From: Belgazal, Netanel @ 2018-03-25 12:06 UTC (permalink / raw)
  To: Sinan Kaya, netdev@vger.kernel.org, timur@codeaurora.org,
	sulrich@codeaurora.org, Kiyanovski, Arthur
  Cc: linux-arm-msm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Bshara, Saeed,
	Machulsky, Zorik, David S. Miller, Tobias Klauser,
	linux-kernel@vger.kernel.org
In-Reply-To: <1521513753-7325-18-git-send-email-okaya@codeaurora.org>

I think you should either add a parameter to ena_com_write_sq_doorbell() or add ena_com_write_sq_doorbell_rel().
Right now, you have unused function.

On 3/20/18, 4:43 AM, "Sinan Kaya" <okaya@codeaurora.org> wrote:

    Code includes barrier() followed by writel(). writel() already has a
    barrier
    on some architectures like arm64.
    
    This ends up CPU observing two barriers back to back before executing the
    register write.
    
    Create a new wrapper function with relaxed write operator. Use the new
    wrapper when a write is following a barrier().
    
    Since code already has an explicit barrier call, changing writel() to
    writel_relaxed().
    
    Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
    ---
     drivers/net/ethernet/amazon/ena/ena_com.c     |  6 ++++--
     drivers/net/ethernet/amazon/ena/ena_eth_com.h | 22 ++++++++++++++++++++--
     drivers/net/ethernet/amazon/ena/ena_netdev.c  |  4 ++--
     3 files changed, 26 insertions(+), 6 deletions(-)
    
    diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
    index bf2de52..b6e628f 100644
    --- a/drivers/net/ethernet/amazon/ena/ena_com.c
    +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
    @@ -631,7 +631,8 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev *ena_dev, u16 offset)
     	 */
     	wmb();
     
    -	writel(mmio_read_reg, ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
    +	writel_relaxed(mmio_read_reg,
    +		       ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
     
     	for (i = 0; i < timeout; i++) {
     		if (read_resp->req_id == mmio_read->seq_num)
    @@ -1826,7 +1827,8 @@ void ena_com_aenq_intr_handler(struct ena_com_dev *dev, void *data)
     
     	/* write the aenq doorbell after all AENQ descriptors were read */
     	mb();
    -	writel((u32)aenq->head, dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
    +	writel_relaxed((u32)aenq->head,
    +		       dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
     }
     
     int ena_com_dev_reset(struct ena_com_dev *ena_dev,
    diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.h b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
    index 2f76572..09ef7cd 100644
    --- a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
    +++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
    @@ -107,7 +107,8 @@ static inline int ena_com_sq_empty_space(struct ena_com_io_sq *io_sq)
     	return io_sq->q_depth - 1 - cnt;
     }
     
    -static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
    +static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq,
    +					    bool relaxed)
     {
     	u16 tail;
     
    @@ -116,7 +117,24 @@ static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
     	pr_debug("write submission queue doorbell for queue: %d tail: %d\n",
     		 io_sq->qid, tail);
     
    -	writel(tail, io_sq->db_addr);
    +	if (relaxed)
    +		writel_relaxed(tail, io_sq->db_addr);
    +	else
    +		writel(tail, io_sq->db_addr);
    +
    +	return 0;
    +}
    +
    +static inline int ena_com_write_sq_doorbell_rel(struct ena_com_io_sq *io_sq)
    +{
    +	u16 tail;
    +
    +	tail = io_sq->tail;
    +
    +	pr_debug("write submission queue doorbell for queue: %d tail: %d\n",
    +		 io_sq->qid, tail);
    +
    +	writel_relaxed(tail, io_sq->db_addr);
     
     	return 0;
     }
    diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
    index 6975150..0530201 100644
    --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
    +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
    @@ -556,7 +556,7 @@ static int ena_refill_rx_bufs(struct ena_ring *rx_ring, u32 num)
     		 * issue a doorbell
     		 */
     		wmb();
    -		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq);
    +		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq, true);
     	}
     
     	rx_ring->next_to_use = next_to_use;
    @@ -2151,7 +2151,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
     
     	if (netif_xmit_stopped(txq) || !skb->xmit_more) {
     		/* trigger the dma engine */
    -		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq);
    +		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq, false);
     		u64_stats_update_begin(&tx_ring->syncp);
     		tx_ring->tx_stats.doorbells++;
     		u64_stats_update_end(&tx_ring->syncp);
    -- 
    2.7.4
    
    
    


^ permalink raw reply

* Re: [PATCH net-next v2 1/2] fs/crashdd: add API to collect hardware dump in second kernel
From: kbuild test robot @ 2018-03-25 12:43 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: kbuild-all, netdev, linux-fsdevel, kexec, linux-kernel, davem,
	viro, ebiederm, stephen, akpm, torvalds, ganeshgr, nirranjan,
	indranil, Rahul Lakkireddy
In-Reply-To: <296ffbd47fd4f30238689e636bd2480683224227.1521888444.git.rahul.lakkireddy@chelsio.com>

[-- Attachment #1: Type: text/plain, Size: 7182 bytes --]

Hi Rahul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Rahul-Lakkireddy/fs-crashdd-add-API-to-collect-hardware-dump-in-second-kernel/20180325-191308
config: i386-randconfig-s0-03251817 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from fs//crashdd/crashdd.c:8:0:
   fs//crashdd/crashdd_internal.h:13:23: error: field 'bin_attr' has incomplete type
     struct bin_attribute bin_attr; /* Binary dump file's attributes */
                          ^~~~~~~~
   fs//crashdd/crashdd.c: In function 'crashdd_read':
   fs//crashdd/crashdd.c:19:43: error: dereferencing pointer to incomplete type 'struct bin_attribute'
     struct crashdd_dump_node *dump = bin_attr->private;
                                              ^~
   fs//crashdd/crashdd.c: In function 'crashdd_mkdir':
   fs//crashdd/crashdd.c:27:9: error: implicit declaration of function 'kobject_create_and_add' [-Werror=implicit-function-declaration]
     return kobject_create_and_add(name, crashdd_kobj);
            ^~~~~~~~~~~~~~~~~~~~~~
   fs//crashdd/crashdd.c:27:9: warning: return makes pointer from integer without a cast [-Wint-conversion]
     return kobject_create_and_add(name, crashdd_kobj);
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs//crashdd/crashdd.c: In function 'crashdd_add_file':
   fs//crashdd/crashdd.c:39:9: error: implicit declaration of function 'sysfs_create_bin_file' [-Werror=implicit-function-declaration]
     return sysfs_create_bin_file(kobj, &dump->bin_attr);
            ^~~~~~~~~~~~~~~~~~~~~
   fs//crashdd/crashdd.c: In function 'crashdd_rmdir':
   fs//crashdd/crashdd.c:44:2: error: implicit declaration of function 'kobject_put' [-Werror=implicit-function-declaration]
     kobject_put(kobj);
     ^~~~~~~~~~~
   In file included from include/linux/kernel.h:10:0,
                    from include/linux/list.h:9,
                    from include/linux/preempt.h:11,
                    from include/linux/spinlock.h:51,
                    from include/linux/vmalloc.h:5,
                    from fs//crashdd/crashdd.c:4:
   fs//crashdd/crashdd.c: In function 'crashdd_get_driver':
   fs//crashdd/crashdd.c:101:25: error: dereferencing pointer to incomplete type 'struct kobject'
      if (!strcmp(node->kobj->name, name)) {
                            ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> fs//crashdd/crashdd.c:101:3: note: in expansion of macro 'if'
      if (!strcmp(node->kobj->name, name)) {
      ^~
   fs//crashdd/crashdd.c: In function 'crashdd_init':
   fs//crashdd/crashdd.c:227:51: error: 'kernel_kobj' undeclared (first use in this function)
     crashdd_kobj = kobject_create_and_add("crashdd", kernel_kobj);
                                                      ^~~~~~~~~~~
   fs//crashdd/crashdd.c:227:51: note: each undeclared identifier is reported only once for each function it appears in
   fs//crashdd/crashdd.c: In function 'crashdd_add_file':
   fs//crashdd/crashdd.c:40:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
   cc1: some warnings being treated as errors

vim +/if +101 fs//crashdd/crashdd.c

     3	
   > 4	#include <linux/vmalloc.h>
     5	#include <linux/crash_dump.h>
     6	#include <linux/crashdd.h>
     7	
     8	#include "crashdd_internal.h"
     9	
    10	static LIST_HEAD(crashdd_list);
    11	static DEFINE_MUTEX(crashdd_mutex);
    12	
    13	static struct kobject *crashdd_kobj;
    14	
    15	static ssize_t crashdd_read(struct file *filp, struct kobject *kobj,
    16				    struct bin_attribute *bin_attr,
    17				    char *buf, loff_t fpos, size_t count)
    18	{
    19		struct crashdd_dump_node *dump = bin_attr->private;
    20	
    21		memcpy(buf, dump->buf + fpos, count);
    22		return count;
    23	}
    24	
    25	static struct kobject *crashdd_mkdir(const char *name)
    26	{
    27		return kobject_create_and_add(name, crashdd_kobj);
    28	}
    29	
    30	static int crashdd_add_file(struct kobject *kobj, const char *name,
    31				    struct crashdd_dump_node *dump)
    32	{
    33		dump->bin_attr.attr.name = name;
    34		dump->bin_attr.attr.mode = 0444;
    35		dump->bin_attr.size = dump->size;
    36		dump->bin_attr.read = crashdd_read;
    37		dump->bin_attr.private = dump;
    38	
    39		return sysfs_create_bin_file(kobj, &dump->bin_attr);
    40	}
    41	
    42	static void crashdd_rmdir(struct kobject *kobj)
    43	{
    44		kobject_put(kobj);
    45	}
    46	
    47	/**
    48	 * crashdd_init_driver - create a sysfs driver entry.
    49	 * @name: Name of the directory.
    50	 *
    51	 * Creates a directory under /sys/kernel/crashdd/ with @name.  Allocates
    52	 * and saves the sysfs entry.  The sysfs entry is added to the global
    53	 * list and then returned to the caller. On failure, returns NULL.
    54	 */
    55	static struct crashdd_driver_node *crashdd_init_driver(const char *name)
    56	{
    57		struct crashdd_driver_node *node;
    58	
    59		node = vzalloc(sizeof(*node));
    60		if (!node)
    61			return NULL;
    62	
    63		/* Create a driver's directory under /sys/kernel/crashdd/ */
    64		node->kobj = crashdd_mkdir(name);
    65		if (!node->kobj) {
    66			vfree(node);
    67			return NULL;
    68		}
    69	
    70		atomic_set(&node->refcnt, 1);
    71	
    72		/* Initialize the list of dumps that go under this driver's
    73		 * directory.
    74		 */
    75		INIT_LIST_HEAD(&node->dump_list);
    76	
    77		/* Add the driver's entry to global list */
    78		mutex_lock(&crashdd_mutex);
    79		list_add_tail(&node->list, &crashdd_list);
    80		mutex_unlock(&crashdd_mutex);
    81	
    82		return node;
    83	}
    84	
    85	/**
    86	 * crashdd_get_driver - get an exisiting sysfs driver entry.
    87	 * @name: Name of the directory.
    88	 *
    89	 * Searches and fetches a sysfs entry having @name.  If @name is
    90	 * found, then the reference count is incremented and the entry
    91	 * is returned.  If @name is not found, NULL is returned.
    92	 */
    93	static struct crashdd_driver_node *crashdd_get_driver(const char *name)
    94	{
    95		struct crashdd_driver_node *node;
    96		int found = 0;
    97	
    98		/* Search for an existing driver sysfs entry having @name */
    99		mutex_lock(&crashdd_mutex);
   100		list_for_each_entry(node, &crashdd_list, list) {
 > 101			if (!strcmp(node->kobj->name, name)) {
   102				atomic_inc(&node->refcnt);
   103				found = 1;
   104				break;
   105			}
   106		}
   107		mutex_unlock(&crashdd_mutex);
   108	
   109		if (found)
   110			return node;
   111	
   112		/* No driver with @name found */
   113		return NULL;
   114	}
   115	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32369 bytes --]

^ permalink raw reply

* Re: [PATCH v4 17/17] net: ena: Eliminate duplicate barriers on weakly-ordered archs
From: okaya @ 2018-03-25 13:33 UTC (permalink / raw)
  To: Belgazal, Netanel
  Cc: netdev, timur, sulrich, Kiyanovski, Arthur, linux-arm-msm,
	linux-arm-kernel, Bshara, Saeed, Machulsky, Zorik,
	David S. Miller, Tobias Klauser, linux-kernel
In-Reply-To: <7BF039D2-9E0F-45BE-9F40-8E785814C17D@amazon.com>

On 2018-03-25 08:06, Belgazal, Netanel wrote:
> I think you should either add a parameter to
> ena_com_write_sq_doorbell() or add ena_com_write_sq_doorbell_rel().
> Right now, you have unused function.

That is true. I got rid of ena_com_write_sq_doorbell_rel.

> 
> On 3/20/18, 4:43 AM, "Sinan Kaya" <okaya@codeaurora.org> wrote:
> 
>     Code includes barrier() followed by writel(). writel() already has 
> a
>     barrier
>     on some architectures like arm64.
> 
>     This ends up CPU observing two barriers back to back before 
> executing the
>     register write.
> 
>     Create a new wrapper function with relaxed write operator. Use the 
> new
>     wrapper when a write is following a barrier().
> 
>     Since code already has an explicit barrier call, changing writel() 
> to
>     writel_relaxed().
> 
>     Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>     ---
>      drivers/net/ethernet/amazon/ena/ena_com.c     |  6 ++++--
>      drivers/net/ethernet/amazon/ena/ena_eth_com.h | 22 
> ++++++++++++++++++++--
>      drivers/net/ethernet/amazon/ena/ena_netdev.c  |  4 ++--
>      3 files changed, 26 insertions(+), 6 deletions(-)
> 
>     diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c
> b/drivers/net/ethernet/amazon/ena/ena_com.c
>     index bf2de52..b6e628f 100644
>     --- a/drivers/net/ethernet/amazon/ena/ena_com.c
>     +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
>     @@ -631,7 +631,8 @@ static u32 ena_com_reg_bar_read32(struct
> ena_com_dev *ena_dev, u16 offset)
>      	 */
>      	wmb();
> 
>     -	writel(mmio_read_reg, ena_dev->reg_bar + 
> ENA_REGS_MMIO_REG_READ_OFF);
>     +	writel_relaxed(mmio_read_reg,
>     +		       ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
> 
>      	for (i = 0; i < timeout; i++) {
>      		if (read_resp->req_id == mmio_read->seq_num)
>     @@ -1826,7 +1827,8 @@ void ena_com_aenq_intr_handler(struct
> ena_com_dev *dev, void *data)
> 
>      	/* write the aenq doorbell after all AENQ descriptors were read 
> */
>      	mb();
>     -	writel((u32)aenq->head, dev->reg_bar + 
> ENA_REGS_AENQ_HEAD_DB_OFF);
>     +	writel_relaxed((u32)aenq->head,
>     +		       dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
>      }
> 
>      int ena_com_dev_reset(struct ena_com_dev *ena_dev,
>     diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
> b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
>     index 2f76572..09ef7cd 100644
>     --- a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
>     +++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
>     @@ -107,7 +107,8 @@ static inline int
> ena_com_sq_empty_space(struct ena_com_io_sq *io_sq)
>      	return io_sq->q_depth - 1 - cnt;
>      }
> 
>     -static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq 
> *io_sq)
>     +static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq 
> *io_sq,
>     +					    bool relaxed)
>      {
>      	u16 tail;
> 
>     @@ -116,7 +117,24 @@ static inline int
> ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
>      	pr_debug("write submission queue doorbell for queue: %d tail: 
> %d\n",
>      		 io_sq->qid, tail);
> 
>     -	writel(tail, io_sq->db_addr);
>     +	if (relaxed)
>     +		writel_relaxed(tail, io_sq->db_addr);
>     +	else
>     +		writel(tail, io_sq->db_addr);
>     +
>     +	return 0;
>     +}
>     +
>     +static inline int ena_com_write_sq_doorbell_rel(struct
> ena_com_io_sq *io_sq)
>     +{
>     +	u16 tail;
>     +
>     +	tail = io_sq->tail;
>     +
>     +	pr_debug("write submission queue doorbell for queue: %d tail: 
> %d\n",
>     +		 io_sq->qid, tail);
>     +
>     +	writel_relaxed(tail, io_sq->db_addr);
> 
>      	return 0;
>      }
>     diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
>     index 6975150..0530201 100644
>     --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
>     +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
>     @@ -556,7 +556,7 @@ static int ena_refill_rx_bufs(struct ena_ring
> *rx_ring, u32 num)
>      		 * issue a doorbell
>      		 */
>      		wmb();
>     -		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq);
>     +		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq, true);
>      	}
> 
>      	rx_ring->next_to_use = next_to_use;
>     @@ -2151,7 +2151,7 @@ static netdev_tx_t ena_start_xmit(struct
> sk_buff *skb, struct net_device *dev)
> 
>      	if (netif_xmit_stopped(txq) || !skb->xmit_more) {
>      		/* trigger the dma engine */
>     -		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq);
>     +		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq, false);
>      		u64_stats_update_begin(&tx_ring->syncp);
>      		tx_ring->tx_stats.doorbells++;
>      		u64_stats_update_end(&tx_ring->syncp);
>     --
>     2.7.4

^ permalink raw reply

* RE: [PATCH iproute2-next] rdma: Move RDMA UAPI header file to be under RDMA responsibility
From: Steve Wise @ 2018-03-25 13:52 UTC (permalink / raw)
  To: 'Leon Romanovsky', 'David Ahern'
  Cc: 'Leon Romanovsky', 'netdev',
	'RDMA mailing list', 'Stephen Hemminger'
In-Reply-To: <20180325063856.31709-1-leon@kernel.org>

> Subject: [PATCH iproute2-next] rdma: Move RDMA UAPI header file to be
> under RDMA responsibility
> 
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> In iproute2 package, the updates of UAPIs files are performed
> after the needed feature lands in kernel's net-next tree.
> 
> Such development flow created delays to the rdma tool developers,
> who uses rdma-next tree as a basis for their work.
> 
> Move RDMA UAPI file to be under rdma/ folder, so whole responsibility
> of syncing this file will be on them.
> 
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

Reviewed-by: Steve Wise <swise@opengridcomputing.com>

^ permalink raw reply

* Re: [PATCH RFC net-next 1/7] net: Fix fib notifer to return errno
From: David Ahern @ 2018-03-25 14:00 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, roopa, shm, jiri, idosch, jakub.kicinski
In-Reply-To: <20180325081632.GA11327@splinter>

On 3/25/18 2:16 AM, Ido Schimmel wrote:
> On Thu, Mar 22, 2018 at 03:57:51PM -0700, David Ahern wrote:
>> Notifier handlers use notifier_from_errno to convert any potential error
>> to an encoded format. As a consequence the other side, call_fib_notifiers
>> in this case, needs to use notifier_to_errno to return the error from
>> the handler back to its caller.
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
>> ---
>>  net/core/fib_notifier.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/fib_notifier.c b/net/core/fib_notifier.c
>> index 5ace0705a3f9..14ba52ebe8c9 100644
>> --- a/net/core/fib_notifier.c
>> +++ b/net/core/fib_notifier.c
>> @@ -21,8 +21,11 @@ EXPORT_SYMBOL(call_fib_notifier);
>>  int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
>>  		       struct fib_notifier_info *info)
> 
> There's another (less interesting case) - call_fib_notifier(), which is
> used to dump the FIB tables for the caller registering to the
> notification chain.
> 
> For example, if you have a non-default FIB rule in the system and you
> modprobe mlxsw, you'll get a silent failure and routes will not be
> offloaded. On the other hand, I'm not sure we want to fail the module
> loading in such cases.

right. In normal cases the driver is loaded to create the netdevices
long before any networking config is done. So it seems to me the use
case you refer to, some user would have go out of there way to create a
situation where they install config that is not supported by the driver.

> 
> A possible solution is to have the driver emit a warning via extack for
> each route/rule being notified after the abort mechanism was triggered.

extack is not available on module load.

Per past discussions, something you suggested, we need a message for
"out-of-line" cases like this where a driver notifies userspace of a
problem.

^ permalink raw reply

* Re: [PATCH RFC net-next 7/7] netdevsim: Add simple FIB resource controller via devlink
From: David Ahern @ 2018-03-25 14:24 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, roopa, shm, jiri, idosch, jakub.kicinski,
	David Ahern
In-Reply-To: <20180324160238.GE1891@nanopsycho>

On 3/24/18 10:02 AM, Jiri Pirko wrote:
>>>>>>>
>>>>>>> Wait a second. What do you mean by "per-network namespace"? Devlink
>>>>>>> instance is always associated with one physical device. Like an ASIC.
>>>>>>>
>>>>>>>
>>>>>>>> has a net entry, the simplest design is to put it into the namespace of
>>>>>>>> the controller. Without it, controlling resource sizes in namespace
>>>>>>>> 'foobar' has to be done from init_net, which is just wrong.
>>>>>>
>>>>>> you need to look at how netdevsim creates a device per netdevice.
>>>>>
>>>>> That means one devlink instance for each netdevsim device, doesn't it?
>>>>>
>>>>
>>>> yes.
>>>
>>> Still not sure how to handle namespaces in devlink. Originally, I
>>> thought it would be okay to leave all devlink instances in init_ns.
>>> Because what happens if you move netdev to another namespace? Should the
>>> devlink move as well? What if you have multiple ports, each in different
>>> namespace. Can user move devlink instance to another namespace? Etc.
>>>
>>
>> The devlink instance is associated with a 'struct device' and those do
>> not change namespaces AFAIK.
> 
> Yeah. But you put devlink instance into namespace according to struct
> net_device. That is mismatch.
> 

New netdevsim netdevice creates a new 'struct device' which creates a
new devlink instance. The namespace the netdev is created in is then
passed to the devlink instance. Yes, the netdev could change namespaces,
but that is something we can easily prevent if it has a devlink instance.

But really, we are way down a tangent with respect to the intent of this
patch set. I am fine with limiting the example resource controller to
just the init_net; this patch set is really aimed at the bigger idea of
allowing a notifier handler to fail route and rule adds.

We can revisit devlink and namespaces another time, along with moving
switch ports to namespaces - a key part of implementing a feature
equivalent to what Cisco calls a VDC [1].

[1]
https://www.cisco.com/c/en/us/products/collateral/switches/nexus-7000-10-slot-switch/White_Paper_Tech_Overview_Virtual_Device_Contexts.html

^ permalink raw reply

* Re: [PATCH RFC net-next 7/7] netdevsim: Add simple FIB resource controller via devlink
From: David Ahern @ 2018-03-25 14:27 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, roopa, shm, jiri, idosch, David Ahern
In-Reply-To: <20180324233524.41ecd368@cakuba.netronome.com>

On 3/25/18 12:35 AM, Jakub Kicinski wrote:
> On Sat, 24 Mar 2018 09:02:45 -0600, David Ahern wrote:
>>>> diff --git a/drivers/net/netdevsim/Makefile b/drivers/net/netdevsim/Makefile
>>>> index 09388c06171d..449b2a1a1800 100644
>>>> --- a/drivers/net/netdevsim/Makefile
>>>> +++ b/drivers/net/netdevsim/Makefile
>>>> @@ -9,3 +9,7 @@ ifeq ($(CONFIG_BPF_SYSCALL),y)
>>>>  netdevsim-objs += \
>>>>  	bpf.o
>>>>  endif
>>>> +
>>>> +ifneq ($(CONFIG_NET_DEVLINK),)  
>>>
>>> Hm.  Don't you need MAY_USE_DEVLINK dependency perhaps?  
>>
>> mlxsw uses CONFIG_NET_DEVLINK in its Makefile.
>>
>> MAY_USE_DEVLINK seems to only be used in Kconfig files. Not clear to me
>> why it is needed at all.
> 
> NETDEVSIM=y && DEVLINK=m
> 

ok. I purposely did not add DEVLINK as a dependency to netdevsim to make
the resource controller truly optional. Can add it if you prefer.

^ permalink raw reply

* Re: [PATCH net] nfp: use full 40 bits of the NSP buffer address
From: Jakub Kicinski @ 2018-03-25 14:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Dirk van der Merwe
In-Reply-To: <20180324024222.20134-1-jakub.kicinski@netronome.com>

On Fri, 23 Mar 2018 19:42:22 -0700, Jakub Kicinski wrote:
> From: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
> 
> The NSP default buffer is a piece of NFP memory where additional
> command data can be placed.  Its format has been copied from
> host buffer, but the PCIe selection bits do not make sense in
> this case.  If those get masked out from a NFP address - writes
> to random place in the chip memory may be issued and crash the
> device.
> 
> This has never been an issue because the buffer used to be
> allocated in memory with less-than-38-bit-long address but that
> is about to change.
> 
> Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
> index 39abac678b71..b54ab02f5b33 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
> @@ -71,8 +71,7 @@
>  /* CPP address to retrieve the data from */
>  #define NSP_BUFFER		0x10
>  #define   NSP_BUFFER_CPP	GENMASK_ULL(63, 40)
> -#define   NSP_BUFFER_PCIE	GENMASK_ULL(39, 38)
> -#define   NSP_BUFFER_ADDRESS	GENMASK_ULL(37, 0)
> +#define   NSP_BUFFER_ADDRESS	GENMASK_ULL(39, 0)

self-nack, this is changing a shared define for all bufs, adding a new
define of NFP buf will be cleaner.

^ permalink raw reply

* [PATCH v7 1/7] net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Dept-GELinuxNICDev,
	linux-kernel
In-Reply-To: <1521988761-30344-1-git-send-email-okaya@codeaurora.org>

Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing
the register write.

Since code already has an explicit barrier call, changing code to

wmb()
writel_relaxed()
mmiowb()

for multi-arch support.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index 9e5264d..b48f761 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -1858,8 +1858,9 @@ static void ql_update_small_bufq_prod_index(struct ql3_adapter *qdev)
 			qdev->small_buf_release_cnt -= 8;
 		}
 		wmb();
-		writel(qdev->small_buf_q_producer_index,
-			&port_regs->CommonRegs.rxSmallQProducerIndex);
+		writel_relaxed(qdev->small_buf_q_producer_index,
+			       &port_regs->CommonRegs.rxSmallQProducerIndex);
+		mmiowb();
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH v7 2/7] qlcnic: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Harish Patil,
	Manish Chopra, Dept-GELinuxNICDev, linux-kernel
In-Reply-To: <1521988761-30344-1-git-send-email-okaya@codeaurora.org>

Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing
the register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Manish Chopra <manish.chopra@cavium.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
index 46b0372..97c146e7 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -478,7 +478,7 @@ irqreturn_t qlcnic_83xx_clear_legacy_intr(struct qlcnic_adapter *adapter)
 	wmb();
 
 	/* clear the interrupt trigger control register */
-	writel(0, adapter->isr_int_vec);
+	writel_relaxed(0, adapter->isr_int_vec);
 	intr_val = readl(adapter->isr_int_vec);
 	do {
 		intr_val = readl(adapter->tgt_status_reg);
-- 
2.7.4

^ permalink raw reply related

* [PATCH v7 5/7] net: qlge: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Harish Patil,
	Manish Chopra, Dept-GELinuxNICDev, linux-kernel
In-Reply-To: <1521988761-30344-1-git-send-email-okaya@codeaurora.org>

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/qlogic/qlge/qlge.h      | 16 ++++++++++++++++
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |  3 ++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/net/ethernet/qlogic/qlge/qlge.h
index 84ac50f..3e71b65 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -2185,6 +2185,22 @@ static inline void ql_write_db_reg(u32 val, void __iomem *addr)
 }
 
 /*
+ * Doorbell Registers:
+ * Doorbell registers are virtual registers in the PCI memory space.
+ * The space is allocated by the chip during PCI initialization.  The
+ * device driver finds the doorbell address in BAR 3 in PCI config space.
+ * The registers are used to control outbound and inbound queues. For
+ * example, the producer index for an outbound queue.  Each queue uses
+ * 1 4k chunk of memory.  The lower half of the space is for outbound
+ * queues. The upper half is for inbound queues.
+ * Caller has to guarantee ordering.
+ */
+static inline void ql_write_db_reg_relaxed(u32 val, void __iomem *addr)
+{
+	writel_relaxed(val, addr);
+}
+
+/*
  * Shadow Registers:
  * Outbound queues have a consumer index that is maintained by the chip.
  * Inbound queues have a producer index that is maintained by the chip.
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 50038d9..8293c202 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2700,7 +2700,8 @@ static netdev_tx_t qlge_send(struct sk_buff *skb, struct net_device *ndev)
 		tx_ring->prod_idx = 0;
 	wmb();
 
-	ql_write_db_reg(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
+	ql_write_db_reg_relaxed(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
+	mmiowb();
 	netif_printk(qdev, tx_queued, KERN_DEBUG, qdev->ndev,
 		     "tx queued, slot %d, len %d\n",
 		     tx_ring->prod_idx, skb->len);
-- 
2.7.4

^ permalink raw reply related

* [PATCH v7 6/7] bnxt_en: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Michael Chan,
	linux-kernel
In-Reply-To: <1521988761-30344-1-git-send-email-okaya@codeaurora.org>

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Also add mmiowb() so that write code doesn't move outside of scope.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1500243..befb538 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1922,7 +1922,7 @@ static int bnxt_poll_work(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
 		/* Sync BD data before updating doorbell */
 		wmb();
 
-		bnxt_db_write(bp, db, DB_KEY_TX | prod);
+		bnxt_db_write_relaxed(bp, db, DB_KEY_TX | prod);
 	}
 
 	cpr->cp_raw_cons = raw_cons;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1989c47..5e453b9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1401,6 +1401,15 @@ static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
 		((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask);
 }
 
+/* For TX and RX ring doorbells with no ordering guarantee*/
+static inline void bnxt_db_write_relaxed(struct bnxt *bp, void __iomem *db,
+					 u32 val)
+{
+	writel_relaxed(val, db);
+	if (bp->flags & BNXT_FLAG_DOUBLE_DB)
+		writel_relaxed(val, db);
+}
+
 /* For TX and RX ring doorbells */
 static inline void bnxt_db_write(struct bnxt *bp, void __iomem *db, u32 val)
 {
-- 
2.7.4

^ permalink raw reply related

* [PATCH v7 7/7] net: ena: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Netanel Belgazal,
	Saeed Bishara, Zorik Machulsky, David S. Miller, Tobias Klauser,
	linux-kernel
In-Reply-To: <1521988761-30344-1-git-send-email-okaya@codeaurora.org>

Code includes barrier() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a barrier().

Since code already has an explicit barrier call, changing writel() to
writel_relaxed() and adding mmiowb() for ordering protection.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/amazon/ena/ena_com.c     | 8 ++++++--
 drivers/net/ethernet/amazon/ena/ena_eth_com.h | 8 ++++++--
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 5 +++--
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index bf2de52..1b9d3130 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -631,8 +631,10 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev *ena_dev, u16 offset)
 	 */
 	wmb();
 
-	writel(mmio_read_reg, ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
+	writel_relaxed(mmio_read_reg,
+		       ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
 
+	mmiowb();
 	for (i = 0; i < timeout; i++) {
 		if (read_resp->req_id == mmio_read->seq_num)
 			break;
@@ -1826,7 +1828,9 @@ void ena_com_aenq_intr_handler(struct ena_com_dev *dev, void *data)
 
 	/* write the aenq doorbell after all AENQ descriptors were read */
 	mb();
-	writel((u32)aenq->head, dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
+	writel_relaxed((u32)aenq->head,
+		       dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
+	mmiowb();
 }
 
 int ena_com_dev_reset(struct ena_com_dev *ena_dev,
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.h b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
index 2f76572..6fdc753 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
@@ -107,7 +107,8 @@ static inline int ena_com_sq_empty_space(struct ena_com_io_sq *io_sq)
 	return io_sq->q_depth - 1 - cnt;
 }
 
-static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
+static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq,
+					    bool relaxed)
 {
 	u16 tail;
 
@@ -116,7 +117,10 @@ static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
 	pr_debug("write submission queue doorbell for queue: %d tail: %d\n",
 		 io_sq->qid, tail);
 
-	writel(tail, io_sq->db_addr);
+	if (relaxed)
+		writel_relaxed(tail, io_sq->db_addr);
+	else
+		writel(tail, io_sq->db_addr);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 6975150..a822e70 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -556,7 +556,8 @@ static int ena_refill_rx_bufs(struct ena_ring *rx_ring, u32 num)
 		 * issue a doorbell
 		 */
 		wmb();
-		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq);
+		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq, true);
+		mmiowb();
 	}
 
 	rx_ring->next_to_use = next_to_use;
@@ -2151,7 +2152,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (netif_xmit_stopped(txq) || !skb->xmit_more) {
 		/* trigger the dma engine */
-		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq);
+		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq, false);
 		u64_stats_update_begin(&tx_ring->syncp);
 		tx_ring->tx_stats.doorbells++;
 		u64_stats_update_end(&tx_ring->syncp);
-- 
2.7.4

^ permalink raw reply related

* [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: netdev, timur, sulrich; +Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

I did a regex search for wmb() followed by writel() in each drivers
directory.
I scrubbed the ones I care about in this series.

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.

We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers.

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

We start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

Feel free to apply patches individually.

Changes since v6:
- bring back amazon ena and add mmiowb, remove
  ena_com_write_sq_doorbell_rel(). 
- remove extra mmiowb in bnx2x
- correct spelling mistake in  bnx2x: Replace doorbell barrier() with wmb()

Sinan Kaya (7):
  net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs
  qlcnic: Eliminate duplicate barriers on weakly-ordered archs
  bnx2x: Replace doorbell barrier() with wmb()
  bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  net: qlge: Eliminate duplicate barriers on weakly-ordered archs
  bnxt_en: Eliminate duplicate barriers on weakly-ordered archs
  net: ena: Eliminate duplicate barriers on weakly-ordered archs

 drivers/net/ethernet/amazon/ena/ena_com.c           |  8 ++++++--
 drivers/net/ethernet/amazon/ena/ena_eth_com.h       |  8 ++++++--
 drivers/net/ethernet/amazon/ena/ena_netdev.c        |  5 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h         | 12 ++++++++----
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c     |  5 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h     |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c |  5 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c    |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c    |  4 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c           |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h           |  9 +++++++++
 drivers/net/ethernet/qlogic/qla3xxx.c               |  5 +++--
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c |  2 +-
 drivers/net/ethernet/qlogic/qlge/qlge.h             | 16 ++++++++++++++++
 drivers/net/ethernet/qlogic/qlge/qlge_main.c        |  3 ++-
 15 files changed, 68 insertions(+), 24 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb()
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Ariel Elior,
	everest-linux-l2, linux-kernel
In-Reply-To: <1521988761-30344-1-git-send-email-okaya@codeaurora.org>

barrier() doesn't guarantee memory writes to be observed by the hardware on
all architectures. barrier() only tells compiler not to move this code
with respect to other read/writes.

If memory write needs to be observed by the HW, wmb() is the right choice.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c     | 3 ++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index d7c98e8..0f86f18 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4153,7 +4153,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	wmb();
 
 	txdata->tx_db.data.prod += nbd;
-	barrier();
+	/* make sure descriptor update is observed by HW */
+	wmb();
 
 	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 1e33abd..39af4f8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -2591,7 +2591,8 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
 	wmb();
 
 	txdata->tx_db.data.prod += 2;
-	barrier();
+	/* make sure descriptor update is observed by the HW */
+	wmb();
 	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
 
 	mmiowb();
-- 
2.7.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox