Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] net/mlx5: Fix addr's type in mlx5dr_icm_dm
From: Nathan Chancellor @ 2019-09-05  2:14 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, linux-kernel, clang-built-linux,
	Nathan Chancellor

clang errors when CONFIG_PHYS_ADDR_T_64BIT is not set:

drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c:121:8:
error: incompatible pointer types passing 'u64 *' (aka 'unsigned long
long *') to parameter of type 'phys_addr_t *' (aka 'unsigned int *')
[-Werror,-Wincompatible-pointer-types]
                                   &icm_mr->dm.addr, &icm_mr->dm.obj_id);
                                   ^~~~~~~~~~~~~~~~
include/linux/mlx5/driver.h:1092:39: note: passing argument to parameter
'addr' here
                         u64 length, u16 uid, phys_addr_t *addr, u32 *obj_id);
                                                           ^
1 error generated.

Use phys_addr_t for addr's type in mlx5dr_icm_dm, which won't change
anything with 64-bit builds because phys_addr_t is u64 when
CONFIG_PHYS_ADDR_T_64BIT is set, which is always when CONFIG_64BIT is
set.

Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator")
Link: https://github.com/ClangBuiltLinux/linux/issues/653
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
index e76f61e7555e..913f1e5aaaf2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
@@ -53,7 +53,7 @@ struct mlx5dr_icm_pool {
 struct mlx5dr_icm_dm {
 	u32 obj_id;
 	enum mlx5_sw_icm_type type;
-	u64 addr;
+	phys_addr_t addr;
 	size_t length;
 };
 
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH v3] tun: fix use-after-free when register netdev failed
From: Yang Yingliang @ 2019-09-05  2:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: David Miller, netdev, eric dumazet, xiyou wangcong, weiyongjun1
In-Reply-To: <314835944.12221643.1567507811976.JavaMail.zimbra@redhat.com>



On 2019/9/3 18:50, Jason Wang wrote:
>
> ----- Original Message -----
>>
>> On 2019/9/3 14:06, Jason Wang wrote:
>>> On 2019/9/3 下午1:42, Yang Yingliang wrote:
>>>>
>>>> On 2019/9/3 11:03, Jason Wang wrote:
>>>>> On 2019/9/3 上午9:45, Yang Yingliang wrote:
>>>>>>
>>>>>> On 2019/9/2 13:32, Jason Wang wrote:
>>>>>>> On 2019/8/23 下午5:36, Yang Yingliang wrote:
>>>>>>>>
>>>>>>>> On 2019/8/23 11:05, Jason Wang wrote:
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> On 2019/8/22 14:07, Yang Yingliang wrote:
>>>>>>>>>>> On 2019/8/22 10:13, Jason Wang wrote:
>>>>>>>>>>>> On 2019/8/20 上午10:28, Jason Wang wrote:
>>>>>>>>>>>>> On 2019/8/20 上午9:25, David Miller wrote:
>>>>>>>>>>>>>> From: Yang Yingliang <yangyingliang@huawei.com>
>>>>>>>>>>>>>> Date: Mon, 19 Aug 2019 21:31:19 +0800
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Call tun_attach() after register_netdevice() to make sure
>>>>>>>>>>>>>>> tfile->tun
>>>>>>>>>>>>>>> is not published until the netdevice is registered. So the
>>>>>>>>>>>>>>> read/write
>>>>>>>>>>>>>>> thread can not use the tun pointer that may freed by
>>>>>>>>>>>>>>> free_netdev().
>>>>>>>>>>>>>>> (The tun and dev pointer are allocated by
>>>>>>>>>>>>>>> alloc_netdev_mqs(), they
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> be freed by netdev_freemem().)
>>>>>>>>>>>>>> register_netdevice() must always be the last operation in
>>>>>>>>>>>>>> the order of
>>>>>>>>>>>>>> network device setup.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the point register_netdevice() is called, the device is
>>>>>>>>>>>>>> visible
>>>>>>>>>>>>>> globally
>>>>>>>>>>>>>> and therefore all of it's software state must be fully
>>>>>>>>>>>>>> initialized and
>>>>>>>>>>>>>> ready for us.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You're going to have to find another solution to these
>>>>>>>>>>>>>> problems.
>>>>>>>>>>>>> The device is loosely coupled with sockets/queues. Each side is
>>>>>>>>>>>>> allowed to be go away without caring the other side. So in this
>>>>>>>>>>>>> case, there's a small window that network stack think the
>>>>>>>>>>>>> device has
>>>>>>>>>>>>> one queue but actually not, the code can then safely drop them.
>>>>>>>>>>>>> Maybe it's ok here with some comments?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Or if not, we can try to hold the device before tun_attach
>>>>>>>>>>>>> and drop
>>>>>>>>>>>>> it after register_netdevice().
>>>>>>>>>>>> Hi Yang:
>>>>>>>>>>>>
>>>>>>>>>>>> I think maybe we can try to hold refcnt instead of playing
>>>>>>>>>>>> real num
>>>>>>>>>>>> queues here. Do you want to post a V4?
>>>>>>>>>>> I think the refcnt can prevent freeing the memory in this case.
>>>>>>>>>>> When register_netdevice() failed, free_netdev() will be called
>>>>>>>>>>> directly,
>>>>>>>>>>> dev->pcpu_refcnt and dev are freed without checking refcnt of
>>>>>>>>>>> dev.
>>>>>>>>>> How about using patch-v1 that using a flag to check whether the
>>>>>>>>>> device
>>>>>>>>>> registered successfully.
>>>>>>>>>>
>>>>>>>>> As I said, it lacks sufficient locks or barriers. To be clear, I
>>>>>>>>> meant
>>>>>>>>> something like (compile-test only):
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>>>>>> index db16d7a13e00..e52678f9f049 100644
>>>>>>>>> --- a/drivers/net/tun.c
>>>>>>>>> +++ b/drivers/net/tun.c
>>>>>>>>> @@ -2828,6 +2828,7 @@ static int tun_set_iff(struct net *net,
>>>>>>>>> struct file *file, struct ifreq *ifr)
>>>>>>>>>                                 (ifr->ifr_flags & TUN_FEATURES);
>>>>>>>>> INIT_LIST_HEAD(&tun->disabled);
>>>>>>>>> +               dev_hold(dev);
>>>>>>>>>                   err = tun_attach(tun, file, false,
>>>>>>>>> ifr->ifr_flags & IFF_NAPI,
>>>>>>>>>                                    ifr->ifr_flags & IFF_NAPI_FRAGS);
>>>>>>>>>                   if (err < 0)
>>>>>>>>> @@ -2836,6 +2837,7 @@ static int tun_set_iff(struct net *net,
>>>>>>>>> struct file *file, struct ifreq *ifr)
>>>>>>>>>                   err = register_netdevice(tun->dev);
>>>>>>>>>                   if (err < 0)
>>>>>>>>>                           goto err_detach;
>>>>>>>>> +               dev_put(dev);
>>>>>>>>>           }
>>>>>>>>>             netif_carrier_on(tun->dev);
>>>>>>>>> @@ -2852,11 +2854,13 @@ static int tun_set_iff(struct net *net,
>>>>>>>>> struct file *file, struct ifreq *ifr)
>>>>>>>>>           return 0;
>>>>>>>>>      err_detach:
>>>>>>>>> +       dev_put(dev);
>>>>>>>>>           tun_detach_all(dev);
>>>>>>>>>           /* register_netdevice() already called
>>>>>>>>> tun_free_netdev() */
>>>>>>>>>           goto err_free_dev;
>>>>>>>>>      err_free_flow:
>>>>>>>>> +       dev_put(dev);
>>>>>>>>>           tun_flow_uninit(tun);
>>>>>>>>> security_tun_dev_free_security(tun->security);
>>>>>>>>>    err_free_stat:
>>>>>>>>>
>>>>>>>>> What's your thought?
>>>>>>>> The dev pointer are freed without checking the refcount in
>>>>>>>> free_netdev() called by err_free_dev
>>>>>>>>
>>>>>>>> path, so I don't understand how the refcount protects this pointer.
>>>>>>>>
>>>>>>> The refcount are guaranteed to be zero there, isn't it?
>>>>>> No, it's not.
>>>>>>
>>>>>> err_free_dev:
>>>>>>          free_netdev(dev);
>>>>>>
>>>>>> void free_netdev(struct net_device *dev)
>>>>>> {
>>>>>> ...
>>>>>>          /* pcpu_refcnt can be freed without checking refcount */
>>>>>>          free_percpu(dev->pcpu_refcnt);
>>>>>>          dev->pcpu_refcnt = NULL;
>>>>>>
>>>>>>          /*  Compatibility with error handling in drivers */
>>>>>>          if (dev->reg_state == NETREG_UNINITIALIZED) {
>>>>>>                  /* dev can be freed without checking refcount */
>>>>>>                  netdev_freemem(dev);
>>>>>>                  return;
>>>>>>          }
>>>>>> ...
>>>>>> }
>>>>>
>>>>> Right, but what I meant is in my patch, when code reaches
>>>>> free_netdev() the refcnt is zero. What did I miss?
>>>> Yes, but it can't fix the UAF problem.
>>>
>>> Well, it looks to me that the dev_put() in tun_put() won't release the
>>> device in this case.
>> The device is not released in tun_put().
>> This is how the UAF occurs:
>>
>>           CPUA                                           CPUB
>>       tun_set_iff()
>>         alloc_netdev_mqs()
>>         tun_attach()
>>                                                       tun_chr_read_iter()
>>                                                         tun_get()
>>                                                         tun_do_read()
>>                                                           tun_ring_recv()
>>         register_netdevice() <-- inject error
>>         goto err_detach
>>         tun_detach_all() <-- set RCV_SHUTDOWN
>>         free_netdev() <-- called from
>>                          err_free_dev path
>>           netdev_freemem() <-- free the memory
>>                             without check refcount
>>           (In this path, the refcount cannot prevent
>>            freeing the memory of dev, and the memory
>>            will be used by dev_put() called by
>>            tun_chr_read_iter() on CPUB.)
>>                                                          (Break from
>>                                                          tun_ring_recv(),
>>                                                          because RCV_SHUTDOWN
>>                                                          is set)
>>                                                        tun_put()
>>                                                        dev_put() <-- use the
>>                                                        memory freed by
>>                                                        netdev_freemem()
>>
>>
> My bad, thanks for the patience. Since all evil come from the
> tfile->tun, how about delay the publishing of tfile->tun until the
> success of registration to make sure dev_put() and dev_hold() work.
> (Compile test only)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index db16d7a13e00..aab0be40d443 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -787,7 +787,8 @@ static void tun_detach_all(struct net_device *dev)
>   }
>   
>   static int tun_attach(struct tun_struct *tun, struct file *file,
> -		      bool skip_filter, bool napi, bool napi_frags)
> +		      bool skip_filter, bool napi, bool napi_frags,
> +		      bool publish_tun)
>   {
>   	struct tun_file *tfile = file->private_data;
>   	struct net_device *dev = tun->dev;
> @@ -870,7 +871,8 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>   	 * initialized tfile; otherwise we risk using half-initialized
>   	 * object.
>   	 */
> -	rcu_assign_pointer(tfile->tun, tun);
> +	if (publish_tun)
> +		rcu_assign_pointer(tfile->tun, tun);
>   	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>   	tun->numqueues++;
>   	tun_set_real_num_queues(tun);
> @@ -2730,7 +2732,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>   
>   		err = tun_attach(tun, file, ifr->ifr_flags & IFF_NOFILTER,
>   				 ifr->ifr_flags & IFF_NAPI,
> -				 ifr->ifr_flags & IFF_NAPI_FRAGS);
> +				 ifr->ifr_flags & IFF_NAPI_FRAGS, true);
>   		if (err < 0)
>   			return err;
>   
> @@ -2829,13 +2831,17 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>   
>   		INIT_LIST_HEAD(&tun->disabled);
>   		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI,
> -				 ifr->ifr_flags & IFF_NAPI_FRAGS);
> +				 ifr->ifr_flags & IFF_NAPI_FRAGS, false);
>   		if (err < 0)
>   			goto err_free_flow;
>   
>   		err = register_netdevice(tun->dev);
>   		if (err < 0)
>   			goto err_detach;
> +		/* free_netdev() won't check refcnt, to aovid race
> +		 * with dev_put() we need publish tun after registration.
> +		 */
> +		rcu_assign_pointer(tfile->tun, tun);
>   	}
>   
>   	netif_carrier_on(tun->dev);
> @@ -2978,7 +2984,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>   		if (ret < 0)
>   			goto unlock;
>   		ret = tun_attach(tun, file, false, tun->flags & IFF_NAPI,
> -				 tun->flags & IFF_NAPI_FRAGS);
> +				 tun->flags & IFF_NAPI_FRAGS, true);
>   	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
>   		tun = rtnl_dereference(tfile->tun);
>   		if (!tun || !(tun->flags & IFF_MULTI_QUEUE) || tfile->detached)
I tested this patch, it can fix this UAF.

But as Eric replied in my patch v1, tun_get() will return NULL as long 
as tun_set_iff() (TUNSETIFF ioctl())
has not yet been called. This could break some applications, since 
tun_get() is used from poll()
and other syscalls.

I think it should return '-EAGIAN' instead of '-EBADFD' in this way. I 
did some change in patch v1,
if it's OK, I will send a v4.

  drivers/net/tun.c | 34 ++++++++++++++++++++++++++++++----
  1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index db16d7a13e00..0abc654010e3 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -115,6 +115,7 @@ do {                                \
  /* High bits in flags field are unused. */
  #define TUN_VNET_LE     0x80000000
  #define TUN_VNET_BE     0x40000000
+#define TUN_DEV_REGISTERED    0x20000000

  #define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
                IFF_MULTI_QUEUE | IFF_NAPI | IFF_NAPI_FRAGS)
@@ -719,8 +720,10 @@ static void __tun_detach(struct tun_file *tfile, 
bool clean)
              netif_carrier_off(tun->dev);

              if (!(tun->flags & IFF_PERSIST) &&
-                tun->dev->reg_state == NETREG_REGISTERED)
+                tun->dev->reg_state == NETREG_REGISTERED) {
+                tun->flags &= ~TUN_DEV_REGISTERED;
                  unregister_netdevice(tun->dev);
+            }
          }
          if (tun)
              xdp_rxq_info_unreg(&tfile->xdp_rxq);
@@ -884,8 +887,12 @@ static struct tun_struct *tun_get(struct tun_file 
*tfile)

      rcu_read_lock();
      tun = rcu_dereference(tfile->tun);
-    if (tun)
-        dev_hold(tun->dev);
+    if (tun) {
+        if (tun->flags & TUN_DEV_REGISTERED)
+            dev_hold(tun->dev);
+        else
+            tun = ERR_PTR(-EAGAIN);
+    }
      rcu_read_unlock();

      return tun;
@@ -1428,7 +1435,7 @@ static __poll_t tun_chr_poll(struct file *file, 
poll_table *wait)
      struct sock *sk;
      __poll_t mask = 0;

-    if (!tun)
+    if (IS_ERR_OR_NULL(tun))
          return EPOLLERR;

      sk = tfile->socket.sk;
@@ -2017,6 +2024,9 @@ static ssize_t tun_chr_write_iter(struct kiocb 
*iocb, struct iov_iter *from)
      if (!tun)
          return -EBADFD;

+    if (IS_ERR(tun))
+        return PTR_ERR(tun);
+
      result = tun_get_user(tun, tfile, NULL, from,
                    file->f_flags & O_NONBLOCK, false);

@@ -2242,6 +2252,10 @@ static ssize_t tun_chr_read_iter(struct kiocb 
*iocb, struct iov_iter *to)

      if (!tun)
          return -EBADFD;
+
+    if (IS_ERR(tun))
+        return PTR_ERR(tun);
+
      ret = tun_do_read(tun, tfile, to, file->f_flags & O_NONBLOCK, NULL);
      ret = min_t(ssize_t, ret, len);
      if (ret > 0)
@@ -2525,6 +2539,9 @@ static int tun_sendmsg(struct socket *sock, struct 
msghdr *m, size_t total_len)
      if (!tun)
          return -EBADFD;

+    if (IS_ERR(tun))
+        return PTR_ERR(tun);
+
      if (ctl && (ctl->type == TUN_MSG_PTR)) {
          struct tun_page tpage;
          int n = ctl->num;
@@ -2573,6 +2590,11 @@ static int tun_recvmsg(struct socket *sock, 
struct msghdr *m, size_t total_len,
          goto out_free;
      }

+    if (IS_ERR(tun)) {
+        ret = PTR_ERR(tun);
+        goto out_free;
+    }
+
      if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) {
          ret = -EINVAL;
          goto out_put_tun;
@@ -2622,6 +2644,9 @@ static int tun_peek_len(struct socket *sock)
      if (!tun)
          return 0;

+    if (IS_ERR(tun))
+        return PTR_ERR(tun);
+
      ret = PTR_RING_PEEK_CALL(&tfile->tx_ring, tun_ptr_peek_len);
      tun_put(tun);

@@ -2836,6 +2861,7 @@ static int tun_set_iff(struct net *net, struct 
file *file, struct ifreq *ifr)
          err = register_netdevice(tun->dev);
          if (err < 0)
              goto err_detach;
+        tun->flags |= TUN_DEV_REGISTERED;
      }

      netif_carrier_on(tun->dev);
-- 
2.17.1


^ permalink raw reply related

* [PATCH] net/mlx5: Fix rt's type in dr_action_create_reformat_action
From: Nathan Chancellor @ 2019-09-05  1:47 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev, linux-rdma, linux-kernel, clang-built-linux,
	Nathan Chancellor

clang warns:

drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1080:9:
warning: implicit conversion from enumeration type 'enum
mlx5_reformat_ctx_type' to different enumeration type 'enum
mlx5dr_action_type' [-Wenum-conversion]
                        rt = MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
                           ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1082:9:
warning: implicit conversion from enumeration type 'enum
mlx5_reformat_ctx_type' to different enumeration type 'enum
mlx5dr_action_type' [-Wenum-conversion]
                        rt = MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL;
                           ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1084:51:
warning: implicit conversion from enumeration type 'enum
mlx5dr_action_type' to different enumeration type 'enum
mlx5_reformat_ctx_type' [-Wenum-conversion]
                ret = mlx5dr_cmd_create_reformat_ctx(dmn->mdev, rt, data_sz, data,
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~            ^~
3 warnings generated.

Use the right type for rt, which is mlx5_reformat_ctx_type so there are
no warnings about mismatched types.

Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality")
Link: https://github.com/ClangBuiltLinux/linux/issues/652
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
index a02f87f85c17..7d81a7735de5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
@@ -1074,7 +1074,7 @@ dr_action_create_reformat_action(struct mlx5dr_domain *dmn,
 	case DR_ACTION_TYP_L2_TO_TNL_L2:
 	case DR_ACTION_TYP_L2_TO_TNL_L3:
 	{
-		enum mlx5dr_action_type rt;
+		enum mlx5_reformat_ctx_type rt;
 
 		if (action->action_type == DR_ACTION_TYP_L2_TO_TNL_L2)
 			rt = MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
-- 
2.23.0


^ permalink raw reply related

* [PATCH v2 net] net: sonic: return NETDEV_TX_OK if failed to map buffer
From: Mao Wenan @ 2019-09-05  1:57 UTC (permalink / raw)
  To: eric.dumazet, tsbogend, davem
  Cc: netdev, linux-kernel, kernel-janitors, Mao Wenan
In-Reply-To: <960c7d1f-6e80-84fb-8d7a-9c5692605500@huawei.com>

NETDEV_TX_BUSY really should only be used by drivers that call
netif_tx_stop_queue() at the wrong moment. If dma_map_single() is
failed to map tx DMA buffer, it might trigger an infinite loop.
This patch use NETDEV_TX_OK instead of NETDEV_TX_BUSY, and change
printk to pr_err_ratelimited.

Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
 v2: change subject and description of patch, use NETDEV_TX_OK instead of NETDEV_TX_BUSY.
 drivers/net/ethernet/natsemi/sonic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/natsemi/sonic.c b/drivers/net/ethernet/natsemi/sonic.c
index d0a01e8f000a..18fd62fbfb64 100644
--- a/drivers/net/ethernet/natsemi/sonic.c
+++ b/drivers/net/ethernet/natsemi/sonic.c
@@ -232,9 +232,9 @@ static int sonic_send_packet(struct sk_buff *skb, struct net_device *dev)
 
 	laddr = dma_map_single(lp->device, skb->data, length, DMA_TO_DEVICE);
 	if (!laddr) {
-		printk(KERN_ERR "%s: failed to map tx DMA buffer.\n", dev->name);
+		pr_err_ratelimited("%s: failed to map tx DMA buffer.\n", dev->name);
 		dev_kfree_skb(skb);
-		return NETDEV_TX_BUSY;
+		return NETDEV_TX_OK;
 	}
 
 	sonic_tda_put(dev, entry, SONIC_TD_STATUS, 0);       /* clear status */
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v3 bpf-next 2/3] bpf: implement CAP_BPF
From: Alexei Starovoitov @ 2019-09-05  1:32 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, David S . Miller, Daniel Borkmann,
	Peter Zijlstra, Andy Lutomirski, Networking, bpf, Kernel Team,
	linux-api@vger.kernel.org
In-Reply-To: <CE3B644F-D1A5-49F7-96B6-FD663C5F8961@fb.com>

On Thu, Sep 05, 2019 at 12:34:36AM +0000, Song Liu wrote:
> 
> 
> > On Sep 4, 2019, at 11:43 AM, Alexei Starovoitov <ast@kernel.org> wrote:
> > 
> > Implement permissions as stated in uapi/linux/capability.h
> > 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > 
> 
> [...]
> 
> > @@ -1648,11 +1648,11 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
> > 	is_gpl = license_is_gpl_compatible(license);
> > 
> > 	if (attr->insn_cnt == 0 ||
> > -	    attr->insn_cnt > (capable(CAP_SYS_ADMIN) ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
> > +	    attr->insn_cnt > (capable_bpf() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
> > 		return -E2BIG;
> > 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
> > 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
> > -	    !capable(CAP_SYS_ADMIN))
> > +	    !capable_bpf())
> > 		return -EPERM;
> 
> Do we allow load BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB
> without CAP_BPF? If so, maybe highlight in the header?

of course. there is no change in behavior.
'highlight in the header'?
you mean in commit log?
I think it's a bit weird to describe things in commit that patch
is _not_ changing vs things that patch does actually change.
This type of comment would be great in a doc though.
The doc will be coming separately in the follow up assuming
the whole thing lands. I'll remember to note that bit.


^ permalink raw reply

* RE: [PATCH v2 net-next] net: stmmac: Add support for MDIO interrupts
From: Voon, Weifeng @ 2019-09-05  1:23 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Maxime Coquelin, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jose Abreu, Giuseppe Cavallaro,
	Alexandre Torgue, Ong, Boon Leong
In-Reply-To: <20190904145804.GA9068@lunn.ch>

> On Wed, Sep 04, 2019 at 10:02:54PM +0800, Voon Weifeng wrote:
> > From: "Chuah, Kim Tatt" <kim.tatt.chuah@intel.com>
> >
> > DW EQoS v5.xx controllers added capability for interrupt generation
> > when MDIO interface is done (GMII Busy bit is cleared).
> > This patch adds support for this interrupt on supported HW to avoid
> > polling on GMII Busy bit.
> >
> > stmmac_mdio_read() & stmmac_mdio_write() will sleep until wake_up() is
> > called by the interrupt handler.
> >
> > Reviewed-by: Voon Weifeng <weifeng.voon@intel.com>
> > Reviewed-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> > Reviewed-by: Ong Boon Leong <boon.leong.ong@intel.com>
> > Signed-off-by: Chuah, Kim Tatt <kim.tatt.chuah@intel.com>
> > Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
> > Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
> 
> Hi Voon
> 
> It is normal to include a short description of what you changed between
> the previous version and this version.

The change log is near the end of the patch:
/**
--
Changelog v2
*mdio interrupt mode or polling mode will depends on mdio interrupt enable bit
*Disable the mdio interrupt enable bit in stmmac_release
*Remove the condition for initialize wait queues
*Applied reverse Christmas tree
1.9.1

> 
> The formatting of this patch also looks a bit odd. Did you use git
> format-patch ; git send-email?

Yes, I do git format-patch, then ./scripts/checkpatch.pl. 
Lastly git send-email

> 
> Thanks
> 	Andrew

^ permalink raw reply

* [PATCH v2 3/4] spi: spi-fsl-dspi: Implement the PTP system timestamping for TCFQ mode
From: Vladimir Oltean @ 2019-09-05  1:01 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean
In-Reply-To: <20190905010114.26718-1-olteanv@gmail.com>

In this mode, the DSPI controller uses PIO to transfer word by word. In
comparison, in EOQ mode the 4-word deep FIFO is being used, hence the
current logic will need some adaptation for which I do not have the
hardware (Coldfire) to test. It is not clear what is the timing of DMA
transfers and whether timestamping in the driver brings any overall
performance increase compared to regular timestamping done in the core.

Short phc2sys summary after 58 minutes of running on LS1021A-TSN with
interrupts disabled during the critical section:

  offset: min -26251 max 16416 mean -21.8672 std dev 863.416
  delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
  lost servo lock 3 times

Summary of the same phc2sys service running for 120 minutes with
interrupts disabled:

  offset: min -378 max 381 mean -0.0083089 std dev 101.495
  delay: min 4720 max 5920 mean 5129.38 std dev 154.899
  lost servo lock 0 times

The minimum delay (pre to post time) in nanoseconds is the same, but the
maximum delay is quite a bit higher, due to interrupts getting sometimes
executed and interfering with the measurement. Hence set disable_irqs
whenever possible (aka when the driver runs in poll mode - otherwise it
would be a contradiction in terms).

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes in v2:
- Adapted to the newly introduced SPI core API from 02/04.

 drivers/spi/spi-fsl-dspi.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index bec758e978fb..7caea2da4397 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -129,6 +129,7 @@ enum dspi_trans_mode {
 struct fsl_dspi_devtype_data {
 	enum dspi_trans_mode	trans_mode;
 	u8			max_clock_factor;
+	bool			ptp_sts_supported;
 	bool			xspi_mode;
 };
 
@@ -140,12 +141,14 @@ static const struct fsl_dspi_devtype_data vf610_data = {
 static const struct fsl_dspi_devtype_data ls1021a_v1_data = {
 	.trans_mode		= DSPI_TCFQ_MODE,
 	.max_clock_factor	= 8,
+	.ptp_sts_supported	= true,
 	.xspi_mode		= true,
 };
 
 static const struct fsl_dspi_devtype_data ls2085a_data = {
 	.trans_mode		= DSPI_TCFQ_MODE,
 	.max_clock_factor	= 8,
+	.ptp_sts_supported	= true,
 };
 
 static const struct fsl_dspi_devtype_data coldfire_data = {
@@ -654,6 +657,9 @@ static int dspi_rxtx(struct fsl_dspi *dspi)
 	u16 spi_tcnt;
 	u32 spi_tcr;
 
+	spi_take_timestamp_post(dspi->ctlr, dspi->cur_transfer,
+				dspi->tx - dspi->bytes_per_word, !dspi->irq);
+
 	/* Get transfer counter (in number of SPI transfers). It was
 	 * reset to 0 when transfer(s) were started.
 	 */
@@ -672,6 +678,9 @@ static int dspi_rxtx(struct fsl_dspi *dspi)
 		/* Success! */
 		return 0;
 
+	spi_take_timestamp_pre(dspi->ctlr, dspi->cur_transfer,
+			       dspi->tx, !dspi->irq);
+
 	if (trans_mode == DSPI_EOQ_MODE)
 		dspi_eoq_write(dspi);
 	else if (trans_mode == DSPI_TCFQ_MODE)
@@ -779,6 +788,9 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 				     SPI_FRAME_EBITS(transfer->bits_per_word) |
 				     SPI_CTARE_DTCP(1));
 
+		spi_take_timestamp_pre(dspi->ctlr, dspi->cur_transfer,
+				       dspi->tx, !dspi->irq);
+
 		trans_mode = dspi->devtype_data->trans_mode;
 		switch (trans_mode) {
 		case DSPI_EOQ_MODE:
@@ -1132,6 +1144,7 @@ static int dspi_probe(struct platform_device *pdev)
 	init_waitqueue_head(&dspi->waitq);
 
 poll_mode:
+
 	if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
 		ret = dspi_request_dma(dspi, res->start);
 		if (ret < 0) {
@@ -1143,6 +1156,8 @@ static int dspi_probe(struct platform_device *pdev)
 	ctlr->max_speed_hz =
 		clk_get_rate(dspi->clk) / dspi->devtype_data->max_clock_factor;
 
+	ctlr->ptp_sts_supported = dspi->devtype_data->ptp_sts_supported;
+
 	platform_set_drvdata(pdev, ctlr);
 
 	ret = spi_register_controller(ctlr);
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 2/4] spi: Add a PTP system timestamp to the transfer structure
From: Vladimir Oltean @ 2019-09-05  1:01 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean
In-Reply-To: <20190905010114.26718-1-olteanv@gmail.com>

SPI is one of the interfaces used to access devices which have a POSIX
clock driver (real time clocks, 1588 timers etc). The fact that the SPI
bus is slow is not what the main problem is, but rather the fact that
drivers don't take a constant amount of time in transferring data over
SPI. When there is a high delay in the readout of time, there will be
uncertainty in the value that has been read out of the peripheral.
When that delay is constant, the uncertainty can at least be
approximated with a certain accuracy which is fine more often than not.

Timing jitter occurs all over in the kernel code, and is mainly caused
by having to let go of the CPU for various reasons such as preemption,
servicing interrupts, going to sleep, etc. Another major reason is CPU
dynamic frequency scaling.

It turns out that the problem of retrieving time from a SPI peripheral
with high accuracy can be solved by the use of "PTP system
timestamping" - a mechanism to correlate the time when the device has
snapshotted its internal time counter with the Linux system time at that
same moment. This is sufficient for having a precise time measurement -
it is not necessary for the whole SPI transfer to be transmitted "as
fast as possible", or "as low-jitter as possible". The system has to be
low-jitter for a very short amount of time to be effective.

This patch introduces a PTP system timestamping mechanism in struct
spi_transfer. This is to be used by SPI device drivers when they need to
know the exact time at which the underlying device's time was
snapshotted. More often than not, SPI peripherals have a very exact
timing for when their SPI-to-interconnect bridge issues a transaction
for snapshotting and reading the time register, and that will be
dependent on when the SPI-to-interconnect bridge figures out that this
is what it should do, aka as soon as it sees byte N of the SPI transfer.
Since spi_device drivers are the ones who'd know best how the peripheral
behaves in this regard, expose a mechanism in spi_transfer which allows
them to specify which word (or word range) from the transfer should be
timestamped.

Add a default implementation of the PTP system timestamping in the SPI
core. This is not going to be satisfactory performance-wise, but should
at least increase the likelihood that SPI device drivers will use PTP
system timestamping in the future.
There are 3 entry points from the core towards the SPI controller
drivers:

- transfer_one: The driver is passed individual spi_transfers to
  execute. This is the easiest to timestamp.

- transfer_one_message: The core passes the driver an entire spi_message
  (a potential batch of spi_transfers). The core puts the same pre and
  post timestamp to all transfers within a message. This is not ideal,
  but nothing better can be done by default anyway, since the core has
  no insight into how the driver batches the transfers.

- transfer: Like transfer_one_message, but for unqueued drivers (i.e.
  the driver implements its own queue scheduling).

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes in v2:
- Added even more helpful helper functions: spi_take_timestamp_pre and
  spi_take_timestamp_pre, that drivers can simply call without keeping
  any state themselves.
- Updated comments and commit message.

 drivers/spi/spi.c       | 127 ++++++++++++++++++++++++++++++++++++++++
 include/linux/spi/spi.h |  61 +++++++++++++++++++
 2 files changed, 188 insertions(+)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 9ce86f761763..bfc1f1e7ae12 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1171,6 +1171,11 @@ static int spi_transfer_one_message(struct spi_controller *ctlr,
 		spi_statistics_add_transfer_stats(statm, xfer, ctlr);
 		spi_statistics_add_transfer_stats(stats, xfer, ctlr);
 
+		if (!ctlr->ptp_sts_supported) {
+			xfer->ptp_sts_word_pre = 0;
+			ptp_read_system_prets(xfer->ptp_sts);
+		}
+
 		if (xfer->tx_buf || xfer->rx_buf) {
 			reinit_completion(&ctlr->xfer_completion);
 
@@ -1197,6 +1202,11 @@ static int spi_transfer_one_message(struct spi_controller *ctlr,
 					xfer->len);
 		}
 
+		if (!ctlr->ptp_sts_supported) {
+			ptp_read_system_postts(xfer->ptp_sts);
+			xfer->ptp_sts_word_post = xfer->len;
+		}
+
 		trace_spi_transfer_stop(msg, xfer);
 
 		if (msg->status != -EINPROGRESS)
@@ -1265,6 +1275,7 @@ EXPORT_SYMBOL_GPL(spi_finalize_current_transfer);
  */
 static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 {
+	struct spi_transfer *xfer;
 	struct spi_message *msg;
 	bool was_busy = false;
 	unsigned long flags;
@@ -1391,6 +1402,13 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 		goto out;
 	}
 
+	if (!ctlr->ptp_sts_supported && !ctlr->transfer_one) {
+		list_for_each_entry(xfer, &msg->transfers, transfer_list) {
+			xfer->ptp_sts_word_pre = 0;
+			ptp_read_system_prets(xfer->ptp_sts);
+		}
+	}
+
 	ret = ctlr->transfer_one_message(ctlr, msg);
 	if (ret) {
 		dev_err(&ctlr->dev,
@@ -1418,6 +1436,99 @@ static void spi_pump_messages(struct kthread_work *work)
 	__spi_pump_messages(ctlr, true);
 }
 
+/**
+ * spi_take_timestamp_pre - helper for drivers to collect the beginning of the
+ *			    TX timestamp for the requested byte from the SPI
+ *			    transfer. The frequency with which this function
+ *			    must be called (once per word, once for the whole
+ *			    transfer, once per batch of words etc) is arbitrary
+ *			    as long as the @tx buffer offset is greater than or
+ *			    equal to the requested byte at the time of the
+ *			    call. The timestamp is only taken once, at the
+ *			    first such call. It is assumed that the driver
+ *			    advances its @tx buffer pointer monotonically.
+ * @ctlr: Pointer to the spi_controller structure of the driver
+ * @xfer: Pointer to the transfer being timestamped
+ * @tx: Pointer to the current word within the xfer->tx_buf that the driver is
+ *	preparing to transmit right now.
+ * @irqs_off: If true, will disable IRQs and preemption for the duration of the
+ *	      transfer, for less jitter in time measurement. Only compatible
+ *	      with PIO drivers. If true, must follow up with
+ *	      spi_take_timestamp_post or otherwise system will crash.
+ *	      WARNING: for fully predictable results, the CPU frequency must
+ *	      also be under control (governor).
+ */
+void spi_take_timestamp_pre(struct spi_controller *ctlr,
+			    struct spi_transfer *xfer,
+			    const void *tx, bool irqs_off)
+{
+	u8 bytes_per_word = DIV_ROUND_UP(xfer->bits_per_word, 8);
+
+	if (!xfer->ptp_sts)
+		return;
+
+	if (xfer->timestamped_pre)
+		return;
+
+	if (tx < (xfer->tx_buf + xfer->ptp_sts_word_pre * bytes_per_word))
+		return;
+
+	/* Capture the resolution of the timestamp */
+	xfer->ptp_sts_word_pre = (tx - xfer->tx_buf) / bytes_per_word;
+
+	xfer->timestamped_pre = true;
+
+	if (irqs_off) {
+		local_irq_save(ctlr->irq_flags);
+		preempt_disable();
+	}
+
+	ptp_read_system_prets(xfer->ptp_sts);
+}
+EXPORT_SYMBOL_GPL(spi_take_timestamp_pre);
+
+/**
+ * spi_take_timestamp_post - helper for drivers to collect the end of the
+ *			     TX timestamp for the requested byte from the SPI
+ *			     transfer. Can be called with an arbitrary
+ *			     frequency: only the first call where @tx exceeds
+ *			     or is equal to the requested word will be
+ *			     timestamped.
+ * @ctlr: Pointer to the spi_controller structure of the driver
+ * @xfer: Pointer to the transfer being timestamped
+ * @tx: Pointer to the current word within the xfer->tx_buf that the driver has
+ *	just transmitted.
+ * @irqs_off: If true, will re-enable IRQs and preemption for the local CPU.
+ */
+void spi_take_timestamp_post(struct spi_controller *ctlr,
+			     struct spi_transfer *xfer,
+			     const void *tx, bool irqs_off)
+{
+	u8 bytes_per_word = DIV_ROUND_UP(xfer->bits_per_word, 8);
+
+	if (!xfer->ptp_sts)
+		return;
+
+	if (xfer->timestamped_post)
+		return;
+
+	if (tx < (xfer->tx_buf + xfer->ptp_sts_word_post * bytes_per_word))
+		return;
+
+	ptp_read_system_postts(xfer->ptp_sts);
+
+	if (irqs_off) {
+		local_irq_restore(ctlr->irq_flags);
+		preempt_enable();
+	}
+
+	/* Capture the resolution of the timestamp */
+	xfer->ptp_sts_word_post = (tx - xfer->tx_buf) / bytes_per_word;
+
+	xfer->timestamped_post = true;
+}
+EXPORT_SYMBOL_GPL(spi_take_timestamp_post);
+
 /**
  * spi_set_thread_rt - set the controller to pump at realtime priority
  * @ctlr: controller to boost priority of
@@ -1503,6 +1614,7 @@ EXPORT_SYMBOL_GPL(spi_get_next_queued_message);
  */
 void spi_finalize_current_message(struct spi_controller *ctlr)
 {
+	struct spi_transfer *xfer;
 	struct spi_message *mesg;
 	unsigned long flags;
 	int ret;
@@ -1511,6 +1623,13 @@ void spi_finalize_current_message(struct spi_controller *ctlr)
 	mesg = ctlr->cur_msg;
 	spin_unlock_irqrestore(&ctlr->queue_lock, flags);
 
+	if (!ctlr->ptp_sts_supported && !ctlr->transfer_one) {
+		list_for_each_entry(xfer, &mesg->transfers, transfer_list) {
+			ptp_read_system_postts(xfer->ptp_sts);
+			xfer->ptp_sts_word_post = xfer->len;
+		}
+	}
+
 	spi_unmap_msg(ctlr, mesg);
 
 	if (ctlr->cur_msg_prepared && ctlr->unprepare_message) {
@@ -3271,6 +3390,7 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message)
 static int __spi_async(struct spi_device *spi, struct spi_message *message)
 {
 	struct spi_controller *ctlr = spi->controller;
+	struct spi_transfer *xfer;
 
 	/*
 	 * Some controllers do not support doing regular SPI transfers. Return
@@ -3286,6 +3406,13 @@ static int __spi_async(struct spi_device *spi, struct spi_message *message)
 
 	trace_spi_message_submit(message);
 
+	if (!ctlr->ptp_sts_supported) {
+		list_for_each_entry(xfer, &message->transfers, transfer_list) {
+			xfer->ptp_sts_word_pre = 0;
+			ptp_read_system_prets(xfer->ptp_sts);
+		}
+	}
+
 	return ctlr->transfer(spi, message);
 }
 
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index af4f265d0f67..27f6b046cf92 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -13,6 +13,7 @@
 #include <linux/completion.h>
 #include <linux/scatterlist.h>
 #include <linux/gpio/consumer.h>
+#include <linux/ptp_clock_kernel.h>
 
 struct dma_chan;
 struct property_entry;
@@ -409,6 +410,12 @@ static inline void spi_unregister_driver(struct spi_driver *sdrv)
  * @fw_translate_cs: If the boot firmware uses different numbering scheme
  *	what Linux expects, this optional hook can be used to translate
  *	between the two.
+ * @ptp_sts_supported: If the driver sets this to true, it must provide a
+ *	time snapshot in @spi_transfer->ptp_sts as close as possible to the
+ *	moment in time when @spi_transfer->ptp_sts_word_pre and
+ *	@spi_transfer->ptp_sts_word_post were transmitted.
+ *	If the driver does not set this, the SPI core takes the snapshot as
+ *	close to the driver hand-over as possible.
  *
  * Each SPI controller can communicate with one or more @spi_device
  * children.  These make a small bus, sharing MOSI, MISO and SCK signals
@@ -604,6 +611,15 @@ struct spi_controller {
 	void			*dummy_tx;
 
 	int (*fw_translate_cs)(struct spi_controller *ctlr, unsigned cs);
+
+	/*
+	 * Driver sets this field to indicate it is able to snapshot SPI
+	 * transfers (needed e.g. for reading the time of POSIX clocks)
+	 */
+	bool			ptp_sts_supported;
+
+	/* Interrupt enable state during PTP system timestamping */
+	unsigned long		irq_flags;
 };
 
 static inline void *spi_controller_get_devdata(struct spi_controller *ctlr)
@@ -644,6 +660,14 @@ extern struct spi_message *spi_get_next_queued_message(struct spi_controller *ct
 extern void spi_finalize_current_message(struct spi_controller *ctlr);
 extern void spi_finalize_current_transfer(struct spi_controller *ctlr);
 
+/* Helper calls for driver to timestamp transfer */
+void spi_take_timestamp_pre(struct spi_controller *ctlr,
+			    struct spi_transfer *xfer,
+			    const void *tx, bool irqs_off);
+void spi_take_timestamp_post(struct spi_controller *ctlr,
+			     struct spi_transfer *xfer,
+			     const void *tx, bool irqs_off);
+
 /* the spi driver core manages memory for the spi_controller classdev */
 extern struct spi_controller *__spi_alloc_controller(struct device *host,
 						unsigned int size, bool slave);
@@ -753,6 +777,35 @@ extern void spi_res_release(struct spi_controller *ctlr,
  * @transfer_list: transfers are sequenced through @spi_message.transfers
  * @tx_sg: Scatterlist for transmit, currently not for client use
  * @rx_sg: Scatterlist for receive, currently not for client use
+ * @ptp_sts_word_pre: The word (subject to bits_per_word semantics) offset
+ *	within @tx_buf for which the SPI device is requesting that the time
+ *	snapshot for this transfer begins. Upon completing the SPI transfer,
+ *	this value may have changed compared to what was requested, depending
+ *	on the available snapshotting resolution (DMA transfer,
+ *	@ptp_sts_supported is false, etc).
+ * @ptp_sts_word_post: See @ptp_sts_word_post. The two can be equal (meaning
+ *	that a single byte should be snapshotted).
+ *	If the core takes care of the timestamp (if @ptp_sts_supported is false
+ *	for this controller), it will set @ptp_sts_word_pre to 0, and
+ *	@ptp_sts_word_post to the length of the transfer. This is done
+ *	purposefully (instead of setting to spi_transfer->len - 1) to denote
+ *	that a transfer-level snapshot taken from within the driver may still
+ *	be of higher quality.
+ * @ptp_sts: Pointer to a memory location held by the SPI slave device where a
+ *	PTP system timestamp structure may lie. If drivers use PIO or their
+ *	hardware has some sort of assist for retrieving exact transfer timing,
+ *	they can (and should) assert @ptp_sts_supported and populate this
+ *	structure using the ptp_read_system_*ts helper functions.
+ *	The timestamp must represent the time at which the SPI slave device has
+ *	processed the word, i.e. the "pre" timestamp should be taken before
+ *	transmitting the "pre" word, and the "post" timestamp after receiving
+ *	transmit confirmation from the controller for the "post" word.
+ * @timestamped_pre: Set by the SPI controller driver to denote it has acted
+ *	upon the @ptp_sts request. Not set when the SPI core has taken care of
+ *	the task. SPI device drivers are free to print a warning if this comes
+ *	back unset and they need the better resolution.
+ * @timestamped_post: See above. The reason why both exist is that these
+ *	booleans are also used to keep state in the core SPI logic.
  *
  * SPI transfers always write the same number of bytes as they read.
  * Protocol drivers should always provide @rx_buf and/or @tx_buf.
@@ -842,6 +895,14 @@ struct spi_transfer {
 
 	u32		effective_speed_hz;
 
+	unsigned int	ptp_sts_word_pre;
+	unsigned int	ptp_sts_word_post;
+
+	struct ptp_system_timestamp *ptp_sts;
+
+	bool		timestamped_pre;
+	bool		timestamped_post;
+
 	struct list_head transfer_list;
 };
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 4/4] spi: spi-fsl-dspi: Always use the TCFQ devices in poll mode
From: Vladimir Oltean @ 2019-09-05  1:01 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean
In-Reply-To: <20190905010114.26718-1-olteanv@gmail.com>

With this patch, the "interrupts" property from the device tree bindings
is ignored, even if present, if the driver runs in TCFQ mode.

Switching to using the DSPI in poll mode has several distinct
benefits:

- With interrupts, the DSPI driver in TCFQ mode raises an IRQ after each
  transmitted word. There is more time wasted for the "waitq" event than
  for actual I/O. And the DSPI IRQ count can easily get the largest in
  /proc/interrupts on Freescale boards with attached SPI devices.

- The SPI I/O time is both lower, and more consistently so. Attached to
  some Freescale devices are either PTP switches, or SPI RTCs. For
  reading time off of a SPI slave device, it is important that all SPI
  transfers take a deterministic time to complete.

- In poll mode there is much less time spent by the CPU in hardirq
  context, which helps with the response latency of the system, and at
  the same time there is more control over when interrupts must be
  disabled (to get a precise timestamp measurement): win-win.

On the LS1021A-TSN board, where the SPI device is a SJA1105 PTP switch
(with a bits_per_word=8 driver), I created a "benchmark" where I read
its PTP time once per second, for 120 seconds. Each "read PTP time" is a
12-byte SPI transfer. I then recorded the time before putting the first
byte in the TX FIFO, and the time after reading the last byte from the
RX FIFO. That is the transfer delay in nanoseconds.

Interrupt mode:

  delay: min 125120 max 168320 mean 150286 std dev 17675.3

Poll mode:

  delay: min 69440 max 119040 mean 70312.9 std dev 8065.34

Both the mean latency and the standard deviation are more than 50% lower
in poll mode than in interrupt mode. This is with an 'ondemand' governor
on an otherwise idle system - therefore running mostly at 600 MHz out of
a max of 1200 MHz.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes in v2:
- None.

 drivers/spi/spi-fsl-dspi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 7caea2da4397..c30325faa050 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -716,7 +716,7 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
 	regmap_read(dspi->regmap, SPI_SR, &spi_sr);
 	regmap_write(dspi->regmap, SPI_SR, spi_sr);
 
-	if (!(spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF)))
+	if (!(spi_sr & SPI_SR_EOQF))
 		return IRQ_NONE;
 
 	if (dspi_rxtx(dspi) == 0) {
@@ -1126,6 +1126,9 @@ static int dspi_probe(struct platform_device *pdev)
 
 	dspi_init(dspi);
 
+	if (dspi->devtype_data->trans_mode == DSPI_TCFQ_MODE)
+		goto poll_mode;
+
 	dspi->irq = platform_get_irq(pdev, 0);
 	if (dspi->irq <= 0) {
 		dev_info(&pdev->dev,
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 0/4] Deterministic SPI latency with NXP DSPI driver
From: Vladimir Oltean @ 2019-09-05  1:01 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean

Since v1, I noticed that under CPU load, the pre-to-post delay gets
reduced to half (i.e. the code runs twice as fast). After a bit of
debugging, turns out this is the effect of the 'ondemand' governor.
Disabling dynamic frequency scaling (either by switching to the
'performance' governor or by making sure that
/sys/devices/system/cpu/cpu{0,1}/cpufreq/scaling_min_freq is equal to
/sys/devices/system/cpu/cpu{0,1}/cpufreq/scaling_max_freq) is enough for
all readouts to take roughly the same time now. It is an open question
to me: is there any API available to kernel drivers for controlling
this, or is it only a policy for user space to configure?

So here is another test (K) which ran for 36 minutes, with interrupts
disabled during the critical section, poll mode, and with the CPU pegged
at the maximum frequency (1.2 GHz) and SPI controller at 12 MHz.

  offset: min -72 max 64 mean 0.00729594 std dev 21.6964
  delay: min 4400 max 4480 mean 4402.66 std dev 95.8882
  lost servo lock 0 times

While it is clear that the delay is now finally constant, I can't seem
to figure out why it is what it is - 4400 ns, when the bit time on 12
MHz SPI is 83.33 ns, and hence a frame time is 666 ns. Puzzled, I added
a print in dspi_poll and it turns out the status register is always set
on the first access. While this is yet another argument to operate in
poll mode, it is also an indication that the transmission is CPU-bound
and so is the timestamp measurement. Hence it probably makes sense to
revert to an implementation similar to Hubert's - add a fixed timestamp
correction based on frame size and frequency (hardware latencies are
smaller than can be measured from the CPU), and just stop including the
time to poll the TX confirmation status into the measurement.

I haven't done this last change yet, however, mostly to collect comments
and make sure I'm not operating on false assumptions. Having the CPU cap
the measurement at 4400 ns makes it difficult to know when does the 666
ns transfer actually happen, and thus what an appropriate offset
correction for the hardware transfer would be.

Cover letter from v1 below.

===========================================================

This patchset proposes an interface from the SPI subsystem for
software timestamping SPI transfers. There is a default implementation
provided in the core, as well as a mechanism for SPI slave drivers to
check which byte was in fact timestamped post-facto. The patchset also
adds the first user of this interface (the NXP DSPI driver in TCFQ mode).

The interface is somewhat similar to Hubert Feurstein's proposal for the
MDIO subsystem: https://lkml.org/lkml/2019/8/16/638

Original cover letter below. Also provided at the end some results with
an extra test (J - phc2sys using the timestamps taken by the SPI core).

===========================================================

Continuing the discussion created by Hubert Feurstein around the
mv88e6xxx driver for MDIO-controlled switches
(https://lkml.org/lkml/2019/8/2/1364), this patchset takes a similar
approach for the NXP LS1021A-TSN board, which has a SPI-controlled DSA
switch (SJA1105).

The patchset is motivated by some experiments done with a logic
analyzer, trying to understand the source of latency (and especially of
the jitter). SJA1105 SPI messages for reading the PTP clock are 12 bytes
in length: 4 for the SPI header and 8 for the timestamp. When looking at
the messages with a scope, there's jitter basically everywhere: between
bits of a frame and between frames in a transfer. The inter-bit jitter
is hardware and impacts us to a lesser extend (is smaller and caused by
the PVT stability of the oscillators, PLLs, etc). We will focus on the
latency between consecutive SPI frames within a 12-byte transfer.

As a preface, revisions of the DSPI controller IP are integrated in many
Freescale/NXP devices. As a result, the driver has 3 modes of operation:
- TCFQ (Transfer Complete Flag mode): The controller signals software
  that data has been sent/received after each individual word.
- EOQ (End of Queue mode): The driver can implement batching by making
  use of the controller's 4-word deep FIFO.
- DMA (Direct Memory Access mode): The SPI controller's FIFO is no
  longer in direct interaction with the driver, but is used to trigger
  the RX and TX channels of the eDMA module on the SoC.

In LS1021A, the driver works in the least efficient mode of the 3
(TCFQ). There is a well-known errata that the DSPI controller is broken
in conjunction with the eDMA module. As for the EOQ mode, I have tried
unsuccessfully for a few days to make use of the 4 entry FIFO, and the
hardware simply fails to reliably acknowledge the transmission when the
FIFO gets full. So it looks like we're stuck with the TCFQ mode.

The problem with phc2sys on the LS1021A-TSN board is that in order for
the gettime64() call to complete on the sja1105, the system has to
service 12 IRQs. Intuitively that is excessive and is the main source of
jitter, but let's not get ahead of ourselves.

An outline of the experiments that were done (unless otherwise
mentioned, all of these ran for 120 seconds):

A. First I have measured the (poor) performance of phc2sys under current
   conditions. (DSPI driver in IRQ mode, no PTP system timestamping)

   offset: min -53310 max 16107 mean -1737.18 std dev 11444.3
   delay: min 163680 max 237360 mean 201149 std dev 22446.6
   lost servo lock 1 times

B. I switched the .gettime64 callback to .gettimex64, snapshotting the
   PTP system timestamp within the sja1105 driver.

   offset: min -48923 max 64217 mean -904.137 std dev 17358.1
   delay: min 149600 max 203840 mean 169045 std dev 17993.3
   lost servo lock 8 times

C. I patched "struct spi_transfer" to contain the PTP system timestamp,
   and from the sja1105 driver, I passed this structure to be
   snapshotted by the SPI controller's driver (spi-fsl-dspi). This is
   the "transfer-level" snapshot.

   offset: min -64979 max 38979 mean -416.197 std dev 15367.9
   delay: min 125120 max 168320 mean 150286 std dev 17675.3
   lost servo lock 10 times

D. I changed the placement of the transfer snapshotting within the DSPI
   driver, from "transfer-level" to "byte-level".

   offset: min -9021 max 7149 mean -0.418803 std dev 3529.81
   delay: min 7840 max 23920 mean 14493.7 std dev 5982.17
   lost servo lock 0 times

E. I moved the DSPI driver to poll mode. I went back to collecting the
   PTP system timestamps from the sja1105 driver (same as B).

   offset: min -4199 max 46643 mean 418.214 std dev 4554.01
   delay: min 84000 max 194000 mean 99463.2 std dev 12936.5
   lost servo lock 1 times

F. Transfer-level snapshotting in the DSPI driver (same as C), but in
   poll mode.

   offset: min -24244 max 1115 mean -230.478 std dev 2297.28
   delay: min 69440 max 119040 mean 70312.9 std dev 8065.34
   lost servo lock 1 times

G. Byte-level snapshotting (same as D) but in poll mode.

   offset: min -314 max 288 mean -2.48718 std dev 118.045
   delay: min 4880 max 6000 mean 5118.63 std dev 507.258
   lost servo lock 0 times

   This seemed suspiciously good to me, so I let it run for longer
   (58 minutes):

   offset: min -26251 max 16416 mean -21.8672 std dev 863.416
   delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
   lost servo lock 3 times

H. Transfer-level snapshotting (same as F), but with IRQs disabled.
   This ran for 86 minutes.

   offset: min -1927 max 1843 mean -0.209203 std dev 529.398
   delay: min 85440 max 93680 mean 88245 std dev 1454.71
   lost servo lock 0 times

I. Byte-level snapshotting (same as G), but with IRQs disabled.
   This ran for 102 minutes.

   offset: min -378 max 381 mean -0.0083089 std dev 101.495
   delay: min 4720 max 5920 mean 5129.38 std dev 154.899
   lost servo lock 0 times

J. Default snapshotting taken by the SPI core, with the DSPI driver
   running in poll mode, IRQs enabled. This ran for 274 minutes.

   offset: min -42568 max 44576 mean 2.91646 std dev 947.467
   delay: min 58480 max 171040 mean 80750.7 std dev 2001.61
   lost servo lock 3 times

As a result, this patchset proposes the implementation of scenario I.
The others were done through temporary patches which are not presented
here due to the difficulty of presenting a coherent git history without
resorting to reverts etc. The gist of each experiment should be clear
though.

The raw data is available for dissection at
https://drive.google.com/open?id=1r9raU9ZeqOqkqts6Lb-ISf5ubLDLP3wk.
The logic analyzer captures can be opened with a free-as-in-beer program
provided by Saleae: https://www.saleae.com/downloads/.

In the capture data one can find the MOSI, SCK SPI signals, as well as a
debug GPIO which was toggled at the same time as the PTP system
timestamp was taken, to give the viewer an impression of what the
software is capturing compared to the actual timing of the SPI transfer.

Attached are also some close-up screenshots of transfers where there is
a clear and huge delay in-between frames of the same 12-byte SPI
transfer. As it turns out, these were all caused by the CPU getting
interrupted by some other IRQ. Approaches H and I are the only ones that
get rid of these glitches. In theory, the byte-level snapshotting should
be less vulnerable to an IRQ interrupting the SPI transfer (because the
time window is much smaller) but as the 58 minutes experiment shows, it
is not immune.

Vladimir Oltean (4):
  spi: Use an abbreviated pointer to ctlr->cur_msg in
    __spi_pump_messages
  spi: Add a PTP system timestamp to the transfer structure
  spi: spi-fsl-dspi: Implement the PTP system timestamping for TCFQ mode
  spi: spi-fsl-dspi: Always use the TCFQ devices in poll mode

 drivers/spi/spi-fsl-dspi.c |  20 ++++-
 drivers/spi/spi.c          | 150 ++++++++++++++++++++++++++++++++++---
 include/linux/spi/spi.h    |  61 +++++++++++++++
 3 files changed, 219 insertions(+), 12 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH v2 1/4] spi: Use an abbreviated pointer to ctlr->cur_msg in __spi_pump_messages
From: Vladimir Oltean @ 2019-09-05  1:01 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean
In-Reply-To: <20190905010114.26718-1-olteanv@gmail.com>

This helps a bit with line fitting now (the list_first_entry call) as
well as during the next patch which needs to iterate through all
transfers of ctlr->cur_msg so it timestamps them.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes in v2:
- Renamed mesg to msg.

 drivers/spi/spi.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 75ac046cae52..9ce86f761763 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1265,8 +1265,9 @@ EXPORT_SYMBOL_GPL(spi_finalize_current_transfer);
  */
 static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 {
-	unsigned long flags;
+	struct spi_message *msg;
 	bool was_busy = false;
+	unsigned long flags;
 	int ret;
 
 	/* Lock queue */
@@ -1325,10 +1326,10 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 	}
 
 	/* Extract head of queue */
-	ctlr->cur_msg =
-		list_first_entry(&ctlr->queue, struct spi_message, queue);
+	msg = list_first_entry(&ctlr->queue, struct spi_message, queue);
+	ctlr->cur_msg = msg;
 
-	list_del_init(&ctlr->cur_msg->queue);
+	list_del_init(&msg->queue);
 	if (ctlr->busy)
 		was_busy = true;
 	else
@@ -1361,7 +1362,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 			if (ctlr->auto_runtime_pm)
 				pm_runtime_put(ctlr->dev.parent);
 
-			ctlr->cur_msg->status = ret;
+			msg->status = ret;
 			spi_finalize_current_message(ctlr);
 
 			mutex_unlock(&ctlr->io_mutex);
@@ -1369,28 +1370,28 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 		}
 	}
 
-	trace_spi_message_start(ctlr->cur_msg);
+	trace_spi_message_start(msg);
 
 	if (ctlr->prepare_message) {
-		ret = ctlr->prepare_message(ctlr, ctlr->cur_msg);
+		ret = ctlr->prepare_message(ctlr, msg);
 		if (ret) {
 			dev_err(&ctlr->dev, "failed to prepare message: %d\n",
 				ret);
-			ctlr->cur_msg->status = ret;
+			msg->status = ret;
 			spi_finalize_current_message(ctlr);
 			goto out;
 		}
 		ctlr->cur_msg_prepared = true;
 	}
 
-	ret = spi_map_msg(ctlr, ctlr->cur_msg);
+	ret = spi_map_msg(ctlr, msg);
 	if (ret) {
-		ctlr->cur_msg->status = ret;
+		msg->status = ret;
 		spi_finalize_current_message(ctlr);
 		goto out;
 	}
 
-	ret = ctlr->transfer_one_message(ctlr, ctlr->cur_msg);
+	ret = ctlr->transfer_one_message(ctlr, msg);
 	if (ret) {
 		dev_err(&ctlr->dev,
 			"failed to transfer one message from queue\n");
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net] net: sonic: remove dev_kfree_skb before return NETDEV_TX_BUSY
From: maowenan @ 2019-09-05  0:56 UTC (permalink / raw)
  To: Eric Dumazet, tsbogend, davem; +Cc: netdev, linux-kernel, kernel-janitors
In-Reply-To: <c4a4d1b0-d036-7af7-2b30-117f9fdee9ad@gmail.com>



On 2019/9/4 18:19, Eric Dumazet wrote:
> 
> 
> On 9/4/19 11:42 AM, Mao Wenan wrote:
>> When dma_map_single is failed to map buffer, skb can't be freed
>> before sonic driver return to stack with NETDEV_TX_BUSY, because
>> this skb may be requeued to qdisc, it might trigger use-after-free.
>>
>> Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>> ---
>>  drivers/net/ethernet/natsemi/sonic.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/natsemi/sonic.c b/drivers/net/ethernet/natsemi/sonic.c
>> index d0a01e8f000a..248a8f22a33b 100644
>> --- a/drivers/net/ethernet/natsemi/sonic.c
>> +++ b/drivers/net/ethernet/natsemi/sonic.c
>> @@ -233,7 +233,6 @@ static int sonic_send_packet(struct sk_buff *skb, struct net_device *dev)
>>  	laddr = dma_map_single(lp->device, skb->data, length, DMA_TO_DEVICE);
>>  	if (!laddr) {
>>  		printk(KERN_ERR "%s: failed to map tx DMA buffer.\n", dev->name);
>> -		dev_kfree_skb(skb);
>>  		return NETDEV_TX_BUSY;
>>  	}
>>  
>>
> 
> That is the wrong way to fix this bug.
> 
> What guarantee do we have that the mapping operation will succeed next time we attempt
> the transmit (and the dma_map_single() operation) ?
> 
> NETDEV_TX_BUSY is very dangerous, this might trigger an infinite loop.
> 
> I would rather leave the dev_kfree_skb(skb), and return NETDEV_TX_OK
yes, right, it will go to infinite loop if dma_map_single failed always.
v2 will be sent later.
> 
> Also the printk(KERN_ERR ...) should be replaced by pr_err_ratelimited(...)
> 
> NETDEV_TX_BUSY really should only be used by drivers that call netif_tx_stop_queue()
> at the wrong moment.
It will call netif_tx_stop_queue() then return NETDEV_TX_BUSY, and give a chance to
netif_tx_wake_queue(), then stack can be resent packet to driver?

> 
> .
> 


^ permalink raw reply

* Re: [PATCH v3 bpf-next 2/3] bpf: implement CAP_BPF
From: Song Liu @ 2019-09-05  0:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S . Miller, Daniel Borkmann, Peter Zijlstra,
	Andy Lutomirski, Networking, bpf, Kernel Team,
	linux-api@vger.kernel.org
In-Reply-To: <20190904184335.360074-2-ast@kernel.org>



> On Sep 4, 2019, at 11:43 AM, Alexei Starovoitov <ast@kernel.org> wrote:
> 
> Implement permissions as stated in uapi/linux/capability.h
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> 

[...]

> @@ -1648,11 +1648,11 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
> 	is_gpl = license_is_gpl_compatible(license);
> 
> 	if (attr->insn_cnt == 0 ||
> -	    attr->insn_cnt > (capable(CAP_SYS_ADMIN) ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
> +	    attr->insn_cnt > (capable_bpf() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
> 		return -E2BIG;
> 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
> 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
> -	    !capable(CAP_SYS_ADMIN))
> +	    !capable_bpf())
> 		return -EPERM;

Do we allow load BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB
without CAP_BPF? If so, maybe highlight in the header?

Thanks,
Song


^ permalink raw reply

* Re: [PATCH][next] net/mlx5: fix spelling mistake "offlaods" -> "offloads"
From: Saeed Mahameed @ 2019-09-04 23:31 UTC (permalink / raw)
  To: linux-rdma@vger.kernel.org, colin.king@canonical.com,
	davem@davemloft.net, leon@kernel.org, netdev@vger.kernel.org
  Cc: kernel-janitors@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190904193841.20138-1-colin.king@canonical.com>

On Wed, 2019-09-04 at 20:38 +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> There is a spelling mistake in a NL_SET_ERR_MSG_MOD error message.
> Fix it.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> 

Applied to net-next-mlx5.

Thanks,
Saeed.

^ permalink raw reply

* Re: [PATCH][next] net/mlx5: fix missing assignment of variable err
From: Saeed Mahameed @ 2019-09-04 23:30 UTC (permalink / raw)
  To: linux-rdma@vger.kernel.org, Maor Gottlieb,
	colin.king@canonical.com, davem@davemloft.net, leon@kernel.org,
	netdev@vger.kernel.org
  Cc: kernel-janitors@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190904192914.19684-1-colin.king@canonical.com>

On Wed, 2019-09-04 at 20:29 +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> The error return from a call to mlx5_flow_namespace_set_peer is not
> being assigned to variable err and hence the error check following
> the call is currently not working.  Fix this by assigning ret as
> intended.
> 
> Addresses-Coverity: ("Logically dead code")
> Fixes: 8463daf17e80 ("net/mlx5: Add support to use SMFS in switchdev
> mode")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---

Applied to net-next-mlx5.

I have a cleanup series coming up, will send it all together to 
net-next soon. 

Thanks,
Saeed.

^ permalink raw reply

* Re: [PATCH] net/mlx5: Use PTR_ERR_OR_ZERO rather than its implementation
From: Saeed Mahameed @ 2019-09-04 23:29 UTC (permalink / raw)
  To: zhongjiang@huawei.com, davem@davemloft.net
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	leon@kernel.org
In-Reply-To: <797f3807c00a52ea923301b4859f24145f0a291a.camel@mellanox.com>

On Tue, 2019-09-03 at 20:08 +0000, Saeed Mahameed wrote:
> On Tue, 2019-09-03 at 14:56 +0800, zhong jiang wrote:
> > PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR. It is better
> > to use it directly. hence just replace it.
> > 
> > Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > index 5581a80..2e0b467 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > @@ -989,10 +989,7 @@ static void mlx5e_hairpin_flow_del(struct
> > mlx5e_priv *priv,
> >  					    &flow_act, dest, dest_ix);
> >  	mutex_unlock(&priv->fs.tc.t_lock);
> >  
> > -	if (IS_ERR(flow->rule[0]))
> > -		return PTR_ERR(flow->rule[0]);
> > -
> > -	return 0;
> > +	return PTR_ERR_OR_ZERO(flow->rule[0]);
> >  }
> >  
> >  static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
> 
> Acked-by: Saeed Mahameed <saeedm@mellanox.com>

Applied to net-next-mlx5 as i have a cleanup series coming up soon.

Thanks,
Saeed.

^ permalink raw reply

* Re: rtnl_lock() question
From: Saeed Mahameed @ 2019-09-04 23:23 UTC (permalink / raw)
  To: jonathan.lemon@gmail.com, eric.dumazet@gmail.com; +Cc: netdev@vger.kernel.org
In-Reply-To: <C46053D2-6BF5-4CFE-BF76-32DDCAD7BC10@gmail.com>

On Wed, 2019-09-04 at 09:38 -0700, Jonathan Lemon wrote:
> On 4 Sep 2019, at 0:39, Eric Dumazet wrote:
> 
> > On 9/3/19 11:55 PM, Jonathan Lemon wrote:
> > > How appropriate is it to hold the rtnl_lock() across a sleepable
> > > memory allocation?  On one hand it's just a mutex, but it would
> > > seem like it could block quite a few things.
> > > 
> > 
> > Sure, all GFP_KERNEL allocations can sleep for quite a while.
> > 
> > On the other hand, we may want to delay stuff if memory is under 
> > pressure,
> > or complex operations like NEWLINK would fail.
> > 
> > RTNL is mostly taken for control path operations, we prefer them to
> > be
> > mostly reliable, otherwise admins job would be a nightmare.
> > 
> > In some cases, it is relatively easy to pre-allocate memory before 
> > rtnl is taken,
> > but that will only take care of some selected paths.
> 
> The particular code path that I'm looking at is
> mlx5e_tx_timeout_work().
> 
> This is called on TX timeout, and mlx5 wants to move an entire
> channel
> and all the supporting structures elsewhere.  Under the rtnl_lock(),
> it
> calls kvzmalloc() in order to grab a large chunk of contig memory,
> which
> ends up stalling the system.
> 
> I suspect these large allocation should really be done outside the
> lock.

I am afraid that is impossible, at least not for all allocations

some allocations require parameters that should remain valid and
constant across the whole reconfiguration procedure such
params.num_channels, so they must be done inside the lock. 

other allocations are buried deep inside mlx5 that by doing pre
allocations is going to require a lot of refactoring. 

One idea is to use some sort of mem cache specifically for mlx5
reconfiguration that is cheaper to call than raw kvzalloc ? but
different objects  in the mlx5 reconfiguration path requires differnt
memory types, numa affinity etc.. which might make the cache harder to
satisfy all requirements.

^ permalink raw reply

* Re: [PATCH bpf-next 0/6] selftests/bpf: move sockopt tests under test_progs
From: Alexei Starovoitov @ 2019-09-04 23:03 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, davem, ast, daniel
In-Reply-To: <20190904162509.199561-1-sdf@google.com>

On Wed, Sep 04, 2019 at 09:25:03AM -0700, Stanislav Fomichev wrote:
> Now that test_progs is shaping into more generic test framework,
> let's convert sockopt tests to it. This requires adding
> a helper to create and join a cgroup first (test__join_cgroup).
> Since we already hijack stdout/stderr that shouldn't be
> a problem (cgroup helpers log to stderr).
> 
> The rest of the patches just move sockopt tests files under prog_tests/
> and do the required small adjustments.

Looks good. Thank you for working on it.
Could you de-verbose setsockopt test a bit?
#23/32 setsockopt: deny write ctx->retval:OK
#23/33 setsockopt: deny read ctx->retval:OK
#23/34 setsockopt: deny writing to ctx->optval:OK
#23/35 setsockopt: deny writing to ctx->optval_end:OK
#23/36 setsockopt: allow IP_TOS <= 128:OK
#23/37 setsockopt: deny IP_TOS > 128:OK
37 subtests is a bit too much spam.

^ permalink raw reply

* Re: WARNING in hso_free_net_device
From: Stephen Hemminger @ 2019-09-04 22:41 UTC (permalink / raw)
  To: Hui Peng
  Cc: syzbot+44d53c7255bb1aea22d2, alexios.zavras, andreyknvl, davem,
	gregkh, linux-kernel, linux-usb, mathias.payer, netdev, rfontana,
	syzkaller-bugs, tglx
In-Reply-To: <d6e4d2da-66c6-a8fe-2fea-a3435fa7cb54@gmail.com>

On Wed, 4 Sep 2019 16:27:50 -0400
Hui Peng <benquike@gmail.com> wrote:

> Hi, all:
> 
> I looked at the bug a little.
> 
> The issue is that in the error handling code, hso_free_net_device
> unregisters
> 
> the net_device (hso_net->net)  by calling unregister_netdev. In the
> error handling code path,
> 
> hso_net->net has not been registered yet.
> 
> I think there are two ways to solve the issue:
> 
> 1. fix it in drivers/net/usb/hso.c to avoiding unregistering the
> net_device when it is still not registered
> 
> 2. fix it in unregister_netdev. We can add a field in net_device to
> record whether it is registered, and make unregister_netdev return if
> the net_device is not registered yet.
> 
> What do you guys think ?

#1

^ permalink raw reply

* Re: [PATCH v2] net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)
From: David Miller @ 2019-09-04 22:33 UTC (permalink / raw)
  To: zenczykowski; +Cc: maze, netdev, dsahern, lorenzo
In-Reply-To: <20190902162336.240405-1-zenczykowski@gmail.com>

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Mon,  2 Sep 2019 09:23:36 -0700

> From: Maciej Żenczykowski <maze@google.com>
> 
> There is a subtle change in behaviour introduced by:
>   commit c7a1ce397adacaf5d4bb2eab0a738b5f80dc3e43
>   'ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create'
> 
> Before that patch /proc/net/ipv6_route includes:
> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
> 
> Afterwards /proc/net/ipv6_route includes:
> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80240001 lo
> 
> ie. the above commit causes the ::1/128 local (automatic) route to be flagged with RTF_ADDRCONF (0x040000).
> 
> AFAICT, this is incorrect since these routes are *not* coming from RA's.
> 
> As such, this patch restores the old behaviour.
> 
> Fixes: c7a1ce397adacaf5d4bb2eab0a738b5f80dc3e43

Please format your Fixes: tag properly next time, I fixed it up for you.

> Cc: David Ahern <dsahern@gmail.com>
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>

Applied and queued up for -stable.

^ permalink raw reply

* RE: [PATCH v2 net-next 2/2] i40e: Implement debug macro hw_dbg using dev_dbg
From: Bowers, AndrewX @ 2019-09-04 22:30 UTC (permalink / raw)
  To: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <20190903192021.25789-2-maurosr@linux.vnet.ibm.com>

> -----Original Message-----
> From: Mauro S. M. Rodrigues [mailto:maurosr@linux.vnet.ibm.com]
> Sent: Tuesday, September 3, 2019 12:20 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org;
> davem@davemloft.net; Bowers, AndrewX <andrewx.bowers@intel.com>;
> Jakub Kicinski <jakub.kicinski@netronome.com>;
> maurosr@linux.vnet.ibm.com
> Subject: [PATCH v2 net-next 2/2] i40e: Implement debug macro hw_dbg
> using dev_dbg
> 
> There are several uses of hw_dbg in the code, producing no output. This
> patch implments it using dev_debug.
> 
> Initially the intention was to implement it using netdev_dbg, analogously to
> what is done in ixgbe for instance. That approach was avoided due to some
> early usages of hw_dbg, like i40e_pf_reset, before the vsi structure
> initialization causing NULL pointer dereference during the driver probe if the
> dbg messages were turned on as soon as the module is probed.
> 
> v2:
>  - Use dev_dbg instead of pr_debug, and take advantage of dev_name
> instead of crafting pretty much the same device name locally as suggested by
> Jakub Kicinski.
> 
> Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_common.c | 1 +
>  drivers/net/ethernet/intel/i40e/i40e_hmc.c    | 1 +
>  drivers/net/ethernet/intel/i40e/i40e_osdep.h  | 5 ++++-
>  3 files changed, 6 insertions(+), 1 deletion(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* Re: [PATCH net] sctp: use transport pf_retrans in sctp_do_8_2_transport_strike
From: David Miller @ 2019-09-04 22:30 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <41769d6033d27d629798e060671a3b21f22e2a21.1567437861.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon,  2 Sep 2019 23:24:21 +0800

> Transport should use its own pf_retrans to do the error_count
> check, instead of asoc's. Otherwise, it's meaningless to make
> pf_retrans per transport.
> 
> Fixes: 5aa93bcf66f4 ("sctp: Implement quick failover draft from tsvwg")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* RE: [PATCH v2 net-next 1/2] i40e: fix hw_dbg usage in i40e_hmc_get_object_va
From: Bowers, AndrewX @ 2019-09-04 22:29 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: intel-wired-lan@lists.osuosl.org
In-Reply-To: <20190903192021.25789-1-maurosr@linux.vnet.ibm.com>

> -----Original Message-----
> From: Mauro S. M. Rodrigues [mailto:maurosr@linux.vnet.ibm.com]
> Sent: Tuesday, September 3, 2019 12:20 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org;
> davem@davemloft.net; Bowers, AndrewX <andrewx.bowers@intel.com>;
> Jakub Kicinski <jakub.kicinski@netronome.com>;
> maurosr@linux.vnet.ibm.com
> Subject: [PATCH v2 net-next 1/2] i40e: fix hw_dbg usage in
> i40e_hmc_get_object_va
> 
> The mentioned function references a i40e_hw attribute, as parameter for
> hw_dbg, but it doesn't exist in the function scope.
> Fixes it by changing  parameters from i40e_hmc_info to i40e_hw which can
> retrieve the necessary i40e_hmc_info.
> 
> v2:
>  - Fixed reverse xmas tree code style issue as suggested by Jakub Kicinski
> 
> Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* Re: [PATCH net-next v3 0/3] net: dsa: mt7530: Convert to PHYLINK and add support for port 5
From: David Miller @ 2019-09-04 22:28 UTC (permalink / raw)
  To: opensource
  Cc: sean.wang, andrew, vivien.didelot, f.fainelli, matthias.bgg,
	netdev, linux-arm-kernel, linux-mediatek, linux, john, linux-mips,
	frank-w
In-Reply-To: <20190902130226.26845-1-opensource@vdorst.com>

From: René van Dorst <opensource@vdorst.com>
Date: Mon,  2 Sep 2019 15:02:23 +0200

> 1. net: dsa: mt7530: Convert to PHYLINK API
>    This patch converts mt7530 to PHYLINK API.
> 2. dt-bindings: net: dsa: mt7530: Add support for port 5
> 3. net: dsa: mt7530: Add support for port 5
>    These 2 patches adding support for port 5 of the switch.
> 
> v2->v3:
>  * Removed 'status = "okay"' lines in patch #2
>  * Change a port 5 setup message in a debug message in patch #3
>  * Added ack-by and tested-by tags
> v1->v2:
>  * Mostly phylink improvements after review.
> rfc -> v1:
>  * Mostly phylink improvements after review.
>  * Drop phy isolation patches. Adds no value for now.

Series applied.

^ permalink raw reply

* Re: [PATCH net-next] r8152: modify rtl8152_set_speed function
From: David Miller @ 2019-09-04 22:26 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel
In-Reply-To: <1394712342-15778-326-Taiwan-albertk@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Mon, 2 Sep 2019 19:52:28 +0800

> First, for AUTONEG_DISABLE, we only need to modify MII_BMCR.
> 
> Second, add advertising parameter for rtl8152_set_speed(). Add
> RTL_ADVERTISED_xxx for advertising parameter of rtl8152_set_speed().
> Then, the advertising settings from ethtool could be saved.
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>

Applied.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox