Netdev List


* [PATCH net] net/sonic: Use dma_mapping_error()
From: Finn Thain @ 2018-05-30  3:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: Thomas Bogendoerfer, netdev, linux-kernel

With CONFIG_DMA_API_DEBUG=y, calling sonic_open() produces the
message, "DMA-API: device driver failed to check map error".
Add the missing dma_mapping_error() call.

Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/net/ethernet/natsemi/sonic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/natsemi/sonic.c b/drivers/net/ethernet/natsemi/sonic.c
index 7ed08486ae23..c805dcbebd02 100644
--- a/drivers/net/ethernet/natsemi/sonic.c
+++ b/drivers/net/ethernet/natsemi/sonic.c
@@ -84,7 +84,7 @@ static int sonic_open(struct net_device *dev)
 	for (i = 0; i < SONIC_NUM_RRS; i++) {
 		dma_addr_t laddr = dma_map_single(lp->device, skb_put(lp->rx_skb[i], SONIC_RBSIZE),
 		                                  SONIC_RBSIZE, DMA_FROM_DEVICE);
-		if (!laddr) {
+		if (dma_mapping_error(lp->device, laddr)) {
 			while(i > 0) { /* free any that were mapped successfully */
 				i--;
 				dma_unmap_single(lp->device, lp->rx_laddr[i], SONIC_RBSIZE, DMA_FROM_DEVICE);
-- 
2.16.1
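
A minimal userspace sketch of why the old `!laddr` test can miss a failed mapping. It assumes (this thread does not say so) that failure is reported through a non-zero sentinel address. Every `mock_` name and the sentinel value are hypothetical stand-ins, not the kernel DMA API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for dma_addr_t; not the kernel type. */
typedef uint64_t mock_dma_addr_t;

/* Assumed error sentinel: all bits set, so it is never zero. */
#define MOCK_DMA_MAPPING_ERROR (~(mock_dma_addr_t)0)

/* Mock mapping routine: returns the sentinel on failure, else the address. */
static mock_dma_addr_t mock_dma_map_single(mock_dma_addr_t bus_addr, bool fail)
{
	return fail ? MOCK_DMA_MAPPING_ERROR : bus_addr;
}

/* Mock of a dedicated error test, in the spirit of dma_mapping_error(). */
static bool mock_dma_mapping_error(mock_dma_addr_t addr)
{
	return addr == MOCK_DMA_MAPPING_ERROR;
}

/* The old-style check: only address zero is treated as a failure. */
static bool old_check_detects_failure(mock_dma_addr_t addr)
{
	return !addr;
}
```

Under that assumption the old check lets a failed mapping through, while the dedicated helper catches it and still accepts a good address.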


* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Michael Chan @ 2018-05-30  3:19 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: David Miller, Netdev
In-Reply-To: <1de4ef51-dbe8-4d7e-0e43-9e92773cf52b@intel.com>

On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:

>
> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> extended?

I didn't know about that.

> Shouldn't we enable this via ethtool on the port representor netdev?
>
>

We discussed this.  ethtool on the VF representor will only work
in switchdev mode and also will not support min/max values.


* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: Eric Dumazet @ 2018-05-30  3:34 UTC (permalink / raw)
  To: David Miller, qing.huang
  Cc: tariqt, haakon.bugge, yanjun.zhu, netdev, linux-rdma,
	linux-kernel, gi-oh.kim
In-Reply-To: <20180525.102321.858995452200286788.davem@davemloft.net>



On 05/25/2018 10:23 AM, David Miller wrote:
> From: Qing Huang <qing.huang@oracle.com>
> Date: Wed, 23 May 2018 16:22:46 -0700
> 
>> When a system is under memory pressure (high usage with fragments),
>> the original 256KB ICM chunk allocations will likely trigger kernel
>> memory management to enter slow path doing memory compact/migration
>> ops in order to complete high order memory allocations.
>>
>> When that happens, user processes calling uverb APIs may get stuck
>> for more than 120s easily even though there are a lot of free pages
>> in smaller chunks available in the system.
>>
>> Syslog:
>> ...
>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>> oracle_205573_e:205573 blocked for more than 120 seconds.
>> ...
>>
>> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
>>
>> However in order to support smaller ICM chunk size, we need to fix
>> another issue in large size kcalloc allocations.
>>
>> E.g.
>> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
>> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
>> entry). So we need a 16MB allocation for a table->icm pointer array to
>> hold 2M pointers which can easily cause kcalloc to fail.
>>
>> The solution is to use kvzalloc to replace kcalloc which will fall back
>> to vmalloc automatically if kmalloc fails.
>>
>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>> Acked-by: Daniel Jurgens <danielj@mellanox.com>
>> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> 
> Applied, thanks.
> 

I must say this patch causes regressions here.

KASAN is not happy.

It looks like you guys did not really look at mlx4_alloc_icm().

This function properly handles high-order allocations, with fallbacks to order-0 pages
under high memory pressure.

BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
Read of size 4 at addr ffff8817df584f68 by task qp_listing_test/92585

CPU: 38 PID: 92585 Comm: qp_listing_test Tainted: G           O     
Call Trace:
 [<ffffffffba80d7bb>] dump_stack+0x4d/0x72
 [<ffffffffb951dc5f>] print_address_description+0x6f/0x260
 [<ffffffffb951e1c7>] kasan_report+0x257/0x370
 [<ffffffffb951e339>] __asan_report_load4_noabort+0x19/0x20
 [<ffffffffc0256d28>] to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
 [<ffffffffc02785b3>] mlx4_ib_query_qp+0x1213/0x1660 [mlx4_ib]
 [<ffffffffc02dbfdb>] qpstat_print_qp+0x13b/0x500 [ib_uverbs]
 [<ffffffffc02dc3ea>] qpstat_seq_show+0x4a/0xb0 [ib_uverbs]
 [<ffffffffb95f125c>] seq_read+0xa9c/0x1230
 [<ffffffffb96e0821>] proc_reg_read+0xc1/0x180
 [<ffffffffb9577918>] __vfs_read+0xe8/0x730
 [<ffffffffb9578057>] vfs_read+0xf7/0x300
 [<ffffffffb95794d2>] SyS_read+0xd2/0x1b0
 [<ffffffffb8e06b16>] do_syscall_64+0x186/0x420
 [<ffffffffbaa00071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f851a7bb30d
RSP: 002b:00007ffd09a758c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007f84ff959440 RCX: 00007f851a7bb30d
RDX: 000000000003fc00 RSI: 00007f84ff60a000 RDI: 000000000000000b
RBP: 00007ffd09a75900 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000293 R12: 0000000000000000
R13: 000000000003ffff R14: 000000000003ffff R15: 00007f84ff60a000

Allocated by task 4488: 
 save_stack+0x46/0xd0
 kasan_kmalloc+0xad/0xe0
 __kmalloc+0x101/0x5e0
 ib_register_device+0xc03/0x1250 [ib_core]
 mlx4_ib_add+0x27d6/0x4dd0 [mlx4_ib]
 mlx4_add_device+0xa9/0x340 [mlx4_core]
 mlx4_register_interface+0x16e/0x390 [mlx4_core]
 xhci_pci_remove+0x7a/0x180 [xhci_pci]
 do_one_initcall+0xa0/0x230
 do_init_module+0x1b9/0x5a4
 load_module+0x63e6/0x94c0
 SYSC_init_module+0x1a4/0x1c0
 SyS_init_module+0xe/0x10
 do_syscall_64+0x186/0x420
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff8817df584f40
 which belongs to the cache kmalloc-32 of size 32
The buggy address is located 8 bytes to the right of
 32-byte region [ffff8817df584f40, ffff8817df584f60)
The buggy address belongs to the page:
page:ffffea005f7d6100 count:1 mapcount:0 mapping:ffff8817df584000 index:0xffff8817df584fc1
flags: 0x880000000000100(slab)
raw: 0880000000000100 ffff8817df584000 ffff8817df584fc1 000000010000003f
raw: ffffea005f3ac0a0 ffffea005c476760 ffff8817fec00900 ffff883ff78d26c0
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff883ff78d26c0

Memory state around the buggy address:
 ffff8817df584e00: 00 03 fc fc fc fc fc fc 00 03 fc fc fc fc fc fc
 ffff8817df584e80: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
>ffff8817df584f00: fb fb fb fb fc fc fc fc 00 00 00 00 fc fc fc fc
                                                          ^
 ffff8817df584f80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8817df585000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

I will test :

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index 685337d58276fc91baeeb64387c52985e1bc6dda..4d2a71381acb739585d662175e86caef72338097 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,13 @@
 #include "fw.h"
 
 /*
- * We allocate in page size (default 4KB on many archs) chunks to avoid high
- * order memory allocations in fragmented/high usage memory situation.
+ * We allocate in as big chunks as we can, up to a maximum of 256 KB
+ * per chunk. Note that the chunks are not necessarily in contiguous
+ * physical memory.
  */
 enum {
-       MLX4_ICM_ALLOC_SIZE     = PAGE_SIZE,
-       MLX4_TABLE_CHUNK_SIZE   = PAGE_SIZE,
+       MLX4_ICM_ALLOC_SIZE     = 1 << 18,
+       MLX4_TABLE_CHUNK_SIZE   = 1 << 18
 };
 
 static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
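
The sizing argument in the quoted commit message (log_num_mtt=30, 4KB chunks, 8-byte mtt entries) can be checked with a few lines of arithmetic. This is only a worked calculation; the helper names are hypothetical and the 8-byte pointer size assumes a 64-bit kernel:

```c
#include <stdint.h>

/* Sizes taken from the quoted commit message; names are illustrative only. */
enum {
	ICM_CHUNK_BYTES = 4096, /* 4KB ICM chunk (PAGE_SIZE on x86_64) */
	MTT_ENTRY_BYTES = 8,    /* 8 bytes per mtt entry */
	POINTER_BYTES   = 8,    /* assumed pointer size on a 64-bit kernel */
};

/* Number of mtt entries requested by log_num_mtt. */
static uint64_t mtt_entries(unsigned int log_num_mtt)
{
	return (uint64_t)1 << log_num_mtt;
}

/* How many mtt entries fit in one ICM chunk. */
static uint64_t entries_per_chunk(void)
{
	return ICM_CHUNK_BYTES / MTT_ENTRY_BYTES;
}

/* Number of chunks, hence pointers, needed for the whole table. */
static uint64_t chunks_needed(unsigned int log_num_mtt)
{
	return mtt_entries(log_num_mtt) / entries_per_chunk();
}

/* Size of the pointer array that tracks those chunks. */
static uint64_t pointer_array_bytes(unsigned int log_num_mtt)
{
	return chunks_needed(log_num_mtt) * POINTER_BYTES;
}
```

With log_num_mtt=30 this gives 512 entries per chunk, 2M chunks and a 16MB pointer array, matching the figures in the commit message.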


* Re: [net] vhost: Use kzalloc() to allocate vhost_msg_node
From: Guenter Roeck @ 2018-05-30  3:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Easton, Jason Wang, kvm, virtualization, netdev,
	linux-kernel, syzkaller-bugs
In-Reply-To: <20180530055704-mutt-send-email-mst@kernel.org>

On 05/29/2018 08:01 PM, Michael S. Tsirkin wrote:
> On Tue, May 29, 2018 at 03:19:08PM -0700, Guenter Roeck wrote:
>> On Fri, Apr 27, 2018 at 11:45:02AM -0400, Kevin Easton wrote:
>>> The struct vhost_msg within struct vhost_msg_node is copied to userspace,
>>> so it should be allocated with kzalloc() to ensure all structure padding
>>> is zeroed.
>>>
>>> Signed-off-by: Kevin Easton <kevin@guarana.org>
>>> Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
>>
>> Is this patch going anywhere ?
>>
>> The patch fixes CVE-2018-1118. It would be useful to understand if and when
>> this problem is going to be fixed.
>>
>> Thanks,
>> Guenter
>>> ---
>>>   drivers/vhost/vhost.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>> index f3bd8e9..1b84dcff 100644
>>> --- a/drivers/vhost/vhost.c
>>> +++ b/drivers/vhost/vhost.c
>>> @@ -2339,7 +2339,7 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
>>>   /* Create a new message. */
>>>   struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
>>>   {
>>> -	struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
>>> +	struct vhost_msg_node *node = kzalloc(sizeof *node, GFP_KERNEL);
>>>   	if (!node)
>>>   		return NULL;
>>>   	node->vq = vq;
> 
> As I pointed out, we don't need to init the whole structure. The proper
> fix is thus (I think) below.
> 
> Could you use your testing infrastructure to confirm this fixes the issue?
> 

Sorry, I don't have the means to test the fix.

Guenter

> Thanks!
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f3bd8e941224..58d9aec90afb 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2342,6 +2342,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
>   	struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
>   	if (!node)
>   		return NULL;
> +
> +	/* Make sure all padding within the structure is initialized. */
> +	memset(&node->msg, 0, sizeof node->msg);
>   	node->vq = vq;
>   	node->msg.type = type;
>   	return node;
> 
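
A userspace sketch of the approach in the second patch: zero only the sub-structure that is later copied out, so its padding bytes carry no stale heap data. The struct layout and all `demo_` names are hypothetical, and malloc() stands in for kmalloc():

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: compilers typically pad between 'type' and 'payload'. */
struct demo_msg {
	uint8_t  type;
	uint64_t payload;
};

/* Hypothetical container; only 'msg' would be copied to the consumer. */
struct demo_node {
	void *owner;
	struct demo_msg msg;
};

static struct demo_node *demo_new_msg(void *owner, uint8_t type)
{
	struct demo_node *node = malloc(sizeof *node);

	if (!node)
		return NULL;

	/* Clear the whole message, padding included, before filling it in. */
	memset(&node->msg, 0, sizeof node->msg);
	node->owner = owner;
	node->msg.type = type;
	return node;
}

/* Returns 1 if every byte of msg after 'type' (offset 0, size 1) is zero. */
static int demo_msg_is_clean(const struct demo_node *node)
{
	const unsigned char *p = (const unsigned char *)&node->msg;
	size_t i;

	for (i = 1; i < sizeof node->msg; i++)
		if (p[i])
			return 0;
	return 1;
}
```

Compared with zeroing the whole node, this leaves fields that are assigned immediately afterwards untouched, which is the trade-off discussed in the thread.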


* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: Eric Dumazet @ 2018-05-30  3:44 UTC (permalink / raw)
  To: Eric Dumazet, David Miller, qing.huang
  Cc: tariqt, haakon.bugge, yanjun.zhu, netdev, linux-rdma,
	linux-kernel, gi-oh.kim
In-Reply-To: <7a353b65-6b7f-1aee-1c48-e83c8e02f693@gmail.com>



On 05/29/2018 11:34 PM, Eric Dumazet wrote:

> I will test :
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index 685337d58276fc91baeeb64387c52985e1bc6dda..4d2a71381acb739585d662175e86caef72338097 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,13 @@
>  #include "fw.h"
>  
>  /*
> - * We allocate in page size (default 4KB on many archs) chunks to avoid high
> - * order memory allocations in fragmented/high usage memory situation.
> + * We allocate in as big chunks as we can, up to a maximum of 256 KB
> + * per chunk. Note that the chunks are not necessarily in contiguous
> + * physical memory.
>   */
>  enum {
> -       MLX4_ICM_ALLOC_SIZE     = PAGE_SIZE,
> -       MLX4_TABLE_CHUNK_SIZE   = PAGE_SIZE,
> +       MLX4_ICM_ALLOC_SIZE     = 1 << 18,
> +       MLX4_TABLE_CHUNK_SIZE   = 1 << 18
>  };
>  
>  static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> 

And I will add this simple fix; this really should address your initial concern much better.

@@ -99,6 +100,8 @@ static int mlx4_alloc_icm_pages(struct scatterlist *mem, int order,
 {
        struct page *page;
 
+       if (order)
+               gfp_mask |= __GFP_NORETRY;
        page = alloc_pages_node(node, gfp_mask, order);
        if (!page) {
                page = alloc_pages(gfp_mask, order);


* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: Eric Dumazet @ 2018-05-30  3:49 UTC (permalink / raw)
  To: Eric Dumazet, David Miller, qing.huang
  Cc: tariqt, haakon.bugge, yanjun.zhu, netdev, linux-rdma,
	linux-kernel, gi-oh.kim
In-Reply-To: <0e11e0fc-6ccf-aa93-9c4f-b9eae1b90643@gmail.com>



On 05/29/2018 11:44 PM, Eric Dumazet wrote:

> 
> And I will add this simple fix, this really should address your initial concern much better.
> 
> @@ -99,6 +100,8 @@ static int mlx4_alloc_icm_pages(struct scatterlist *mem, int order,
>  {
>         struct page *page;
>  
> +       if (order)
> +               gfp_mask |= __GFP_NORETRY;

    and also      gfp_mask &= ~__GFP_DIRECT_RECLAIM


>         page = alloc_pages_node(node, gfp_mask, order);
>         if (!page) {
>                 page = alloc_pages(gfp_mask, order);
> 


* [PATCH net] mlx4_core: restore optimal ICM memory allocation
From: Eric Dumazet @ 2018-05-30  4:11 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Qing Huang, Daniel Jurgens, Zhu Yanjun, Tariq Toukan

Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
brought a regression caught in our regression suite, thanks to KASAN.

Note that mlx4_alloc_icm() is already able to try high-order allocations
and fall back to low-order allocations under high memory pressure.

We only have to tweak gfp_mask a bit, to help it fall back faster,
without risking OOM killings.

BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
Read of size 4 at addr ffff8817df584f68 by task qp_listing_test/92585

CPU: 38 PID: 92585 Comm: qp_listing_test Tainted: G           O
Call Trace:
 [<ffffffffba80d7bb>] dump_stack+0x4d/0x72
 [<ffffffffb951dc5f>] print_address_description+0x6f/0x260
 [<ffffffffb951e1c7>] kasan_report+0x257/0x370
 [<ffffffffb951e339>] __asan_report_load4_noabort+0x19/0x20
 [<ffffffffc0256d28>] to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
 [<ffffffffc02785b3>] mlx4_ib_query_qp+0x1213/0x1660 [mlx4_ib]
 [<ffffffffc02dbfdb>] qpstat_print_qp+0x13b/0x500 [ib_uverbs]
 [<ffffffffc02dc3ea>] qpstat_seq_show+0x4a/0xb0 [ib_uverbs]
 [<ffffffffb95f125c>] seq_read+0xa9c/0x1230
 [<ffffffffb96e0821>] proc_reg_read+0xc1/0x180
 [<ffffffffb9577918>] __vfs_read+0xe8/0x730
 [<ffffffffb9578057>] vfs_read+0xf7/0x300
 [<ffffffffb95794d2>] SyS_read+0xd2/0x1b0
 [<ffffffffb8e06b16>] do_syscall_64+0x186/0x420
 [<ffffffffbaa00071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f851a7bb30d
RSP: 002b:00007ffd09a758c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007f84ff959440 RCX: 00007f851a7bb30d
RDX: 000000000003fc00 RSI: 00007f84ff60a000 RDI: 000000000000000b
RBP: 00007ffd09a75900 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000293 R12: 0000000000000000
R13: 000000000003ffff R14: 000000000003ffff R15: 00007f84ff60a000

Allocated by task 4488:
 save_stack+0x46/0xd0
 kasan_kmalloc+0xad/0xe0
 __kmalloc+0x101/0x5e0
 ib_register_device+0xc03/0x1250 [ib_core]
 mlx4_ib_add+0x27d6/0x4dd0 [mlx4_ib]
 mlx4_add_device+0xa9/0x340 [mlx4_core]
 mlx4_register_interface+0x16e/0x390 [mlx4_core]
 xhci_pci_remove+0x7a/0x180 [xhci_pci]
 do_one_initcall+0xa0/0x230
 do_init_module+0x1b9/0x5a4
 load_module+0x63e6/0x94c0
 SYSC_init_module+0x1a4/0x1c0
 SyS_init_module+0xe/0x10
 do_syscall_64+0x186/0x420
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff8817df584f40
 which belongs to the cache kmalloc-32 of size 32
The buggy address is located 8 bytes to the right of
 32-byte region [ffff8817df584f40, ffff8817df584f60)
The buggy address belongs to the page:
page:ffffea005f7d6100 count:1 mapcount:0 mapping:ffff8817df584000 index:0xffff8817df584fc1
flags: 0x880000000000100(slab)
raw: 0880000000000100 ffff8817df584000 ffff8817df584fc1 000000010000003f
raw: ffffea005f3ac0a0 ffffea005c476760 ffff8817fec00900 ffff883ff78d26c0
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff883ff78d26c0

Memory state around the buggy address:
 ffff8817df584e00: 00 03 fc fc fc fc fc fc 00 03 fc fc fc fc fc fc
 ffff8817df584e80: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
> ffff8817df584f00: fb fb fb fb fc fc fc fc 00 00 00 00 fc fc fc fc
                                                          ^
 ffff8817df584f80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8817df585000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Tarick Bedeir <tarick@google.com>
Cc: Qing Huang <qing.huang@oracle.com>
Cc: Daniel Jurgens <danielj@mellanox.com>
Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/icm.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index 685337d58276fc91baeeb64387c52985e1bc6dda..cae33d5c7dbd9ba7929adcf2127b104f6796fa5a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,13 @@
 #include "fw.h"
 
 /*
- * We allocate in page size (default 4KB on many archs) chunks to avoid high
- * order memory allocations in fragmented/high usage memory situation.
+ * We allocate in as big chunks as we can, up to a maximum of 256 KB
+ * per chunk. Note that the chunks are not necessarily in contiguous
+ * physical memory.
  */
 enum {
-	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
-	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
+	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
+	MLX4_TABLE_CHUNK_SIZE	= 1 << 18,
 };
 
 static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
@@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
 	struct mlx4_icm *icm;
 	struct mlx4_icm_chunk *chunk = NULL;
 	int cur_order;
+	gfp_t mask;
 	int ret;
 
 	/* We use sg_set_buf for coherent allocs, which assumes low memory */
@@ -178,13 +180,16 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
 		while (1 << cur_order > npages)
 			--cur_order;
 
+		mask = gfp_mask;
+		if (cur_order)
+			mask = (mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
 		if (coherent)
 			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
 						      &chunk->mem[chunk->npages],
-						      cur_order, gfp_mask);
+						      cur_order, mask);
 		else
 			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
-						   cur_order, gfp_mask,
+						   cur_order, mask,
 						   dev->numa_node);
 
 		if (ret) {
-- 
2.17.0.921.gf22659ad46-goog
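
The effect of the mask adjustment can be modelled in userspace. This is a simplified sketch, not the driver code: the flag bits are hypothetical values, the allocator is a mock that pretends memory is fragmented, and the step-down loop stands in for the fallback behaviour described in the commit message:

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real kernel values differ. */
#define MOCK_GFP_DIRECT_RECLAIM 0x1u
#define MOCK_GFP_NORETRY        0x2u
#define MOCK_GFP_BASE           0x4u

/* Same shape as the patch: only high-order attempts are made cheap to fail. */
static unsigned int mock_adjust_mask(unsigned int gfp_mask, int order)
{
	if (order)
		return (gfp_mask & ~MOCK_GFP_DIRECT_RECLAIM) | MOCK_GFP_NORETRY;
	return gfp_mask;
}

/* Mock allocator: only orders up to max_ok can be satisfied. */
static bool mock_alloc(int order, int max_ok)
{
	return order <= max_ok;
}

/*
 * Try the largest order first and step down on failure.
 * Returns the order that succeeded, or -1 if even order 0 failed.
 * On success, *mask_used holds the mask used for that attempt.
 */
static int mock_alloc_with_fallback(int start_order, int max_ok,
				    unsigned int gfp_mask,
				    unsigned int *mask_used)
{
	int order;

	for (order = start_order; order >= 0; order--) {
		unsigned int mask = mock_adjust_mask(gfp_mask, order);

		if (mock_alloc(order, max_ok)) {
			*mask_used = mask;
			return order;
		}
	}
	return -1;
}
```

In this model a 256KB request (order 6 with 4KB pages) that cannot be satisfied steps down quickly, and only the final order-0 attempt keeps the caller's original mask, direct reclaim included.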


* WARNING in iov_iter_revert
From: Gerb Stralko @ 2018-05-30  4:58 UTC (permalink / raw)
  To: aviadye, davejwatson, davem, ilyal, linux-kernel, netdev,
	syzkaller-bugs

Hello,

I'm in the process of fixing this bug 
(https://syzkaller.appspot.com/bug?id=1339e0a805a4ddb11eaee6fb6b1bc905493ded77)

However, there are a couple things I'm trying to wrap my head around to understand and fix this issue properly.

I'm using the C reproducer from syzkaller[1] to trigger this issue which has been a huge help in learning the Linux kernel.

First, in tls_sw.c in tls_sw_sendmsg we have the following:

`do_tcp_sendpages
   `tls_push_sg
      `tls_push_record 

returns -EPIPE (-32) because sk->sk_shutdown is set.  See do_tcp_sendpages
in net/ipv4/tcp.c

Do we need to check for -EPIPE before continuing in tls_sw.c after 
tls_push_record, for example:

if (ret == -EAGAIN || ret == -EPIPE)
                     ^^^^^^^^^^^^^^
  goto send_end;

If -EPIPE is valid and we don't need to check for that particular
condition, then calling iov_iter_revert might be wrong because of the
second argument:

ctx->sg_plaintext_size - orig_size

That expression evaluates to -1 because ctx->sg_plaintext_size
was set to zero earlier when calling:

`free_sg
  `tls_push_sg  

And orig_size is set to 1 from the beginning of the while loop:

orig_size = ctx->sg_plaintext_size

So calling iov_iter_revert with the second argument being -1 will trigger 
the warning in lib/iov_iter.c:iov_iter_revert:

if (WARN_ON(unroll > MAX_RW_COUNT))
     return;

Am I correct in thinking EPIPE should be checked after returning from 
tls_push_record?  Or do we need to do some sanity checks on 
sg_plaintext_size and orig_size before calling iov_iter_revert?

Any help would be greatly appreciated.

[1] https://syzkaller.appspot.com/text?tag=ReproC&x=141d5417800000
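
The wrap-around described above can be reproduced in userspace. This sketch assumes unsigned 32-bit size fields (consistent with RBX holding 00000000ffffffff in the report below) and 4KB pages. The `MOCK_` names are hypothetical and only model the kernel's limit as INT_MAX rounded down to a page boundary:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size and a model of the limit checked by the WARN_ON. */
#define MOCK_PAGE_SIZE    4096UL
#define MOCK_MAX_RW_COUNT ((size_t)INT_MAX & ~(MOCK_PAGE_SIZE - 1))

/*
 * Amount passed to the revert call. With unsigned operands, 0 - 1 does
 * not become -1; it wraps to the largest unsigned int value.
 */
static size_t unroll_amount(unsigned int sg_plaintext_size,
			    unsigned int orig_size)
{
	return (size_t)(sg_plaintext_size - orig_size);
}

/* True when the modelled sanity check would fire. */
static bool revert_would_warn(size_t unroll)
{
	return unroll > MOCK_MAX_RW_COUNT;
}
```

With a current size of 0 and a saved size of 1, the computed amount is far above the limit, so the check fires. A normal case such as 5 and 1 stays well below it.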

On Sunday, May 13, 2018 at 9:28:03 AM UTC-7, syzbot wrote:
>
> Hello, 
>
> syzbot found the following crash on: 
>
> HEAD commit:    427fbe89261d Merge branch 'next' of git://git.kernel.org/p..
>
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16b33477800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=fcce42b221691ff9
> dashboard link: https://syzkaller.appspot.com/bug?extid=c226690f7b3126c5ee04
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=144f1997800000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=141d5417800000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+c22669...@syzkaller.appspotmail.com
>
> random: sshd: uninitialized urandom read (32 bytes read) 
> random: sshd: uninitialized urandom read (32 bytes read) 
> random: sshd: uninitialized urandom read (32 bytes read) 
> random: sshd: uninitialized urandom read (32 bytes read) 
> random: sshd: uninitialized urandom read (32 bytes read) 
> WARNING: CPU: 1 PID: 4542 at lib/iov_iter.c:857 iov_iter_revert+0x2ee/0xaa0 lib/iov_iter.c:857
> Kernel panic - not syncing: panic_on_warn set ... 
>
> CPU: 1 PID: 4542 Comm: syz-executor650 Not tainted 4.17.0-rc4+ #44 
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace: 
>   __dump_stack lib/dump_stack.c:77 [inline] 
>   dump_stack+0x1b9/0x294 lib/dump_stack.c:113 
>   panic+0x22f/0x4de kernel/panic.c:184 
>   __warn.cold.8+0x163/0x1b3 kernel/panic.c:536 
>   report_bug+0x252/0x2d0 lib/bug.c:186 
>   fixup_bug arch/x86/kernel/traps.c:178 [inline] 
>   do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296 
>   do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 
>   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992 
> RIP: 0010:iov_iter_revert+0x2ee/0xaa0 lib/iov_iter.c:857 
> RSP: 0018:ffff8801ad1bf700 EFLAGS: 00010293 
> RAX: ffff8801ac55e6c0 RBX: 00000000ffffffff RCX: ffffffff835104a1 
> RDX: 0000000000000000 RSI: ffffffff8351074e RDI: 0000000000000007 
> RBP: ffff8801ad1bf760 R08: ffff8801ac55e6c0 R09: ffffed003b5e46c2 
> R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000001 
> R13: ffff8801ad1bfd60 R14: 0000000000000011 R15: ffff8801ae9ac040 
>   tls_sw_sendmsg+0xf1c/0x12d0 net/tls/tls_sw.c:448 
>   inet_sendmsg+0x19f/0x690 net/ipv4/af_inet.c:798 
>   sock_sendmsg_nosec net/socket.c:629 [inline] 
>   sock_sendmsg+0xd5/0x120 net/socket.c:639 
>   ___sys_sendmsg+0x805/0x940 net/socket.c:2117 
>   __sys_sendmsg+0x115/0x270 net/socket.c:2155 
>   __do_sys_sendmsg net/socket.c:2164 [inline] 
>   __se_sys_sendmsg net/socket.c:2162 [inline] 
>   __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162 
>   do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe 
> RIP: 0033:0x4403a9 
> RSP: 002b:00007ffdcdfbd6c8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e 
> RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403a9 
> RDX: 0000000000000000 RSI: 0000000020001340 RDI: 0000000000000003 
> RBP: 00000000006ca018 R08: 000000000000001c R09: 000000000000001c 
> R10: 0000000020000180 R11: 0000000000000207 R12: 0000000000401cd0 
> R13: 0000000000401d60 R14: 0000000000000000 R15: 0000000000000000 
> Dumping ftrace buffer: 
>     (ftrace buffer empty) 
> Kernel Offset: disabled 
> Rebooting in 86400 seconds.. 
>
>
> --- 
> This bug is generated by a bot. It may contain errors. 
> See https://goo.gl/tpsmEJ for more information about syzbot. 
> syzbot engineers can be reached at syzk...@googlegroups.com.
>
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see: 
> https://goo.gl/tpsmEJ#testing-patches 
>


* Re: [PATCH bpf v2 0/5] fix test_sockmap
From: John Fastabend @ 2018-05-30  5:12 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev, linux-kselftest
In-Reply-To: <95bb9c9a-25ed-4b10-a181-987a61e0a24c@lab.ntt.co.jp>

On 05/29/2018 05:44 PM, Prashant Bhole wrote:
> 
> 
> On 5/30/2018 12:48 AM, John Fastabend wrote:
>> On 05/27/2018 09:37 PM, Prashant Bhole wrote:
>>> This series fixes error handling, timeout and data verification in
>>> test_sockmap. Previously it was not able to detect failure/timeout in
>>> RX/TX thread because error was not notified to the main thread.
>>>
>>> Also slightly improved test output by printing parameter values (cork,
>>> apply, start, end) so that parameters for all tests are displayed.
>>>
>>> Prashant Bhole (5):
>>>    selftests/bpf: test_sockmap, check test failure
>>>    selftests/bpf: test_sockmap, join cgroup in selftest mode
>>>    selftests/bpf: test_sockmap, fix test timeout
>>>    selftests/bpf: test_sockmap, fix data verification
>>>    selftests/bpf: test_sockmap, print additional test options
>>>
>>>   tools/testing/selftests/bpf/test_sockmap.c | 76 +++++++++++++++++-----
>>>   1 file changed, 58 insertions(+), 18 deletions(-)
>>>
>>
>> After first patch "check test failure" how do we handle the case
>> where test is known to cause timeouts because we are specifically testing
>> these cases. This is the 'cork' parameter we discussed in the last
>> series. It looks like with this series the test may still throw an
>> error?
> 
> Sorry. In your comment in last series, did you mean to skip error
> checking only for all cork tests (for now)?
> 
> -Prashant
> 

Hi, after this is applied, are any errors returned from test_sockmap?
When I read the first patch it looked like timeouts from the cork
tests may result in "FAILED" tests. If this is the case then
yes, we need to skip error checking on all tests or just the corked
tests.

.John


* Re: [PATCH bpf v2 0/5] fix test_sockmap
From: Prashant Bhole @ 2018-05-30  5:31 UTC (permalink / raw)
  To: John Fastabend, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev, linux-kselftest
In-Reply-To: <f477f42d-f3fb-6138-b0af-589d049790a0@gmail.com>



On 5/30/2018 2:12 PM, John Fastabend wrote:
> On 05/29/2018 05:44 PM, Prashant Bhole wrote:
>>
>>
>> On 5/30/2018 12:48 AM, John Fastabend wrote:
>>> On 05/27/2018 09:37 PM, Prashant Bhole wrote:
>>>> This series fixes error handling, timeout and data verification in
>>>> test_sockmap. Previously it was not able to detect failure/timeout in
>>>> RX/TX thread because error was not notified to the main thread.
>>>>
>>>> Also slightly improved test output by printing parameter values (cork,
>>>> apply, start, end) so that parameters for all tests are displayed.
>>>>
>>>> Prashant Bhole (5):
>>>>     selftests/bpf: test_sockmap, check test failure
>>>>     selftests/bpf: test_sockmap, join cgroup in selftest mode
>>>>     selftests/bpf: test_sockmap, fix test timeout
>>>>     selftests/bpf: test_sockmap, fix data verification
>>>>     selftests/bpf: test_sockmap, print additional test options
>>>>
>>>>    tools/testing/selftests/bpf/test_sockmap.c | 76 +++++++++++++++++-----
>>>>    1 file changed, 58 insertions(+), 18 deletions(-)
>>>>
>>>
>>> After first patch "check test failure" how do we handle the case
>>> where test is known to cause timeouts because we are specifically testing
>>> these cases. This is the 'cork' parameter we discussed in the last
>>> series. It looks like with this series the test may still throw an
>>> error?
>>
>> Sorry. In your comment in last series, did you mean to skip error
>> checking only for all cork tests (for now)?
>>
>> -Prashant
>>
> 
> Hi, after this is applied, are any errors returned from test_sockmap?
> When I read the first patch it looked like timeouts from the cork
> tests may result in "FAILED" tests. If this is the case then
> yes, we need to skip error checking on all tests or just the corked
> tests.

Yes, errors are returned after applying this series. I will skip error
checking on just the corked tests.

-Prashant


* [PATCH v5 net] stmmac: 802.1ad tag stripping fix
From: Elad Nachman @ 2018-05-30  5:48 UTC (permalink / raw)
  To: Toshiaki Makita, David Miller
  Cc: Jose Abreu, Florian Fainelli, netdev, peppe.cavallaro,
	alexandre.torgue, eladv6
In-Reply-To: <8fcb3661-40f5-dc40-0800-d47494e021c1@lab.ntt.co.jp>

stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before calling napi_gro_receive().

The function assumes VLAN tagged frames are always tagged with 802.1Q protocol,
and assigns ETH_P_8021Q to the skb by hard-coding the parameter on call to __vlan_hwaccel_put_tag() .

This causes packets not to be passed to the VLAN slave if it was created with 802.1AD protocol
(ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).

This fix passes the protocol from the VLAN header into __vlan_hwaccel_put_tag()
instead of using the hard-coded value of ETH_P_8021Q.
The NETIF_F_HW_VLAN_CTAG_RX check was removed and the NETIF_F_HW_VLAN_STAG_RX feature
was added, to be in line with the driver's actual capabilities.

Signed-off-by: Elad Nachman <eladn@gilat.com>

---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index b65e2d1..f680bcf 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3293,17 +3293,17 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
 {
-	struct ethhdr *ehdr;
+	struct vlan_ethhdr *veth;
 	u16 vlanid;
+	__be16 vlan_proto;
 
-	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
-	    NETIF_F_HW_VLAN_CTAG_RX &&
-	    !__vlan_get_tag(skb, &vlanid)) {
+	if (!__vlan_get_tag(skb, &vlanid)) {
 		/* pop the vlan tag */
-		ehdr = (struct ethhdr *)skb->data;
-		memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
+		veth = (struct vlan_ethhdr *)skb->data;
+		vlan_proto = veth->h_vlan_proto;
+		memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
 		skb_pull(skb, VLAN_HLEN);
-		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
+		__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
 	}
 }
 
@@ -4344,7 +4344,7 @@ int stmmac_dvr_probe(struct device *device,
 	ndev->watchdog_timeo = msecs_to_jiffies(watchdog);
 #ifdef STMMAC_VLAN_TAG_USED
 	/* Both mac100 and gmac support receive VLAN tag detection */
-	ndev->features |= NETIF_F_HW_VLAN_CTAG_RX;
+	ndev->features |= NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_STAG_RX;
 #endif
 	priv->msg_enable = netif_msg_init(debug, default_msg_level);
 
-- 
2.7.4

^ permalink raw reply related
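
A minimal userspace sketch of the tag-pop logic described in the patch above, assuming a raw Ethernet frame in a byte buffer rather than an skb. The names pop_vlan_tag(), struct vlan_tag and the TPID_* macros are hypothetical and are not part of the stmmac driver. It only illustrates the point of the fix: the protocol (TPID) is read from the frame itself instead of being assumed to be 802.1Q.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6
#define VLAN_HLEN   4
#define TPID_8021Q  0x8100	/* C-tag, 802.1Q */
#define TPID_8021AD 0x88A8	/* S-tag, 802.1ad */

/* Hypothetical result type: the popped tag's protocol and VLAN ID. */
struct vlan_tag {
	uint16_t proto;		/* host byte order */
	uint16_t vid;
};

/*
 * Pop the outer VLAN tag from a raw Ethernet frame in place.
 * Returns VLAN_HLEN (the offset of the new frame start) on success,
 * or 0 if the frame is too short or carries no VLAN tag.
 */
static size_t pop_vlan_tag(uint8_t *frame, size_t len, struct vlan_tag *tag)
{
	uint16_t proto, tci;

	assert(frame != NULL && tag != NULL);

	if (len < 2 * ETH_ALEN + VLAN_HLEN + 2)
		return 0;

	/* The TPID sits right after the two MAC addresses. */
	proto = (uint16_t)(frame[12] << 8 | frame[13]);
	if (proto != TPID_8021Q && proto != TPID_8021AD)
		return 0;

	tci = (uint16_t)(frame[14] << 8 | frame[15]);
	tag->proto = proto;		/* keep the protocol found in the frame */
	tag->vid = tci & 0x0FFF;

	/* Slide both MAC addresses forward over the 4-byte tag. */
	memmove(frame + VLAN_HLEN, frame, 2 * ETH_ALEN);
	return VLAN_HLEN;
}
```

A caller would advance its frame pointer by the returned offset and hand tag.proto and tag.vid to whatever consumes the tag, much as the patch passes vlan_proto and vlanid to __vlan_hwaccel_put_tag().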

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Jakub Kicinski @ 2018-05-30  5:56 UTC (permalink / raw)
  To: Michael Chan, Samudrala, Sridhar; +Cc: David Miller, Netdev, Or Gerlitz

On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
> > Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> > extended?

+1 it's painful to see this feature being added to the legacy
API :(  Another duplicated configuration knob.

> I didn't know about that.
>
> > Shouldn't we enable this via ethtool on the port representor netdev?
>
> We discussed about this.  ethtool on the VF representor will only work
> in switchdev mode and also will not support min/max values.

Ethtool channel API may be overdue a rewrite in devlink anyway, but I
feel like implementing switchdev mode and rewriting features in devlink
may be too much to ask.

^ permalink raw reply

* [PATCH bpf v3 0/5] fix test_sockmap
From: Prashant Bhole @ 2018-05-30  5:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest

This series fixes error handling, timeout and data verification in
test_sockmap. Previously it was not able to detect a failure/timeout in
the RX/TX threads because the error was not reported to the main thread.

It also slightly improves the test output by printing parameter values
(cork, apply, start, end) so that the parameters for all tests are displayed.

Changes in v3:
  - Skipped error checking for corked tests

Prashant Bhole (5):
  selftests/bpf: test_sockmap, check test failure
  selftests/bpf: test_sockmap, join cgroup in selftest mode
  selftests/bpf: test_sockmap, fix test timeout
  selftests/bpf: test_sockmap, fix data verification
  selftests/bpf: test_sockmap, print additional test options

 tools/testing/selftests/bpf/test_sockmap.c | 76 +++++++++++++++++-----
 1 file changed, 58 insertions(+), 18 deletions(-)

-- 
2.17.0

^ permalink raw reply

* [PATCH bpf v3 1/5] selftests/bpf: test_sockmap, check test failure
From: Prashant Bhole @ 2018-05-30  5:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180530055611.10216-1-bhole_prashant_q7@lab.ntt.co.jp>

Test failures are not identified because the exit code of the RX/TX threads
is not checked. Also, the threads do not return the correct exit code.

- Return an exit code from the threads depending on test execution status
- In the main thread, check the exit code of the RX/TX threads
- Skip error checking for corked tests as they are expected to time out

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 25 ++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index eb17fae458e6..01bc9c6745e8 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt)
 	struct msg_stats s = {0};
 	int iov_count = opt->iov_count;
 	int iov_buf = opt->iov_length;
+	int rx_status, tx_status;
 	int cnt = opt->rate;
-	int status;
 
 	errno = 0;
 
@@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt)
 	rxpid = fork();
 	if (rxpid == 0) {
 		if (opt->drop_expected)
-			exit(1);
+			exit(0);
 
 		if (opt->sendpage)
 			iov_count = 1;
@@ -463,7 +463,7 @@ static int sendmsg_test(struct sockmap_options *opt)
 				"rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n",
 				s.bytes_sent, sent_Bps, sent_Bps/giga,
 				s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-		exit(1);
+		exit(err ? 1 : 0);
 	} else if (rxpid == -1) {
 		perror("msg_loop_rx: ");
 		return errno;
@@ -491,14 +491,27 @@ static int sendmsg_test(struct sockmap_options *opt)
 				"tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n",
 				s.bytes_sent, sent_Bps, sent_Bps/giga,
 				s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-		exit(1);
+		exit(err ? 1 : 0);
 	} else if (txpid == -1) {
 		perror("msg_loop_tx: ");
 		return errno;
 	}
 
-	assert(waitpid(rxpid, &status, 0) == rxpid);
-	assert(waitpid(txpid, &status, 0) == txpid);
+	assert(waitpid(rxpid, &rx_status, 0) == rxpid);
+	assert(waitpid(txpid, &tx_status, 0) == txpid);
+	if (WIFEXITED(rx_status)) {
+		err = WEXITSTATUS(rx_status);
+		if (err && !txmsg_cork) {
+			fprintf(stderr, "rx thread exited with err %d. ", err);
+			goto out;
+		}
+	}
+	if (WIFEXITED(tx_status)) {
+		err = WEXITSTATUS(tx_status);
+		if (err)
+			fprintf(stderr, "tx thread exited with err %d. ", err);
+	}
+out:
 	return err;
 }
 
-- 
2.17.0

^ permalink raw reply related
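
The exit-status handling in the patch above can be sketched in isolation as follows. This is a simplified illustration, not the test_sockmap code: run_child() and the two worker functions are hypothetical names, and the sketch treats any abnormal termination as an error, which the patch itself does not do.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run worker() in a child process and return its exit code,
 * or -1 if the child could not be created or did not exit normally.
 */
static int run_child(int (*worker)(void))
{
	int status;
	pid_t pid;

	assert(worker != NULL);

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(worker() ? 1 : 0);	/* child: map any error to exit code 1 */

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status))
		return -1;			/* e.g. killed by a signal */
	return WEXITSTATUS(status);
}

/* Hypothetical workers standing in for the RX/TX loops. */
static int worker_ok(void)   { return 0; }
static int worker_fail(void) { return -5; }
```

With these helpers, run_child(worker_ok) yields 0 and run_child(worker_fail) yields 1, so the parent can tell a failed child from a successful one instead of discarding the status.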

* [PATCH bpf v3 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode
From: Prashant Bhole @ 2018-05-30  5:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180530055611.10216-1-bhole_prashant_q7@lab.ntt.co.jp>

In selftest mode, a temporary cgroup environment is created but the
cgroup is not joined, which causes test failures. Fix this by joining the
cgroup.

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 01bc9c6745e8..64f9e25c451f 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1342,6 +1342,11 @@ static int __test_suite(char *bpf_file)
 		return cg_fd;
 	}
 
+	if (join_cgroup(CG_PATH)) {
+		fprintf(stderr, "ERROR: failed to join cgroup\n");
+		return -EINVAL;
+	}
+
 	/* Tests basic commands and APIs with range of iov values */
 	txmsg_start = txmsg_end = 0;
 	err = test_txmsg(cg_fd);
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf v3 3/5] selftests/bpf: test_sockmap, fix test timeout
From: Prashant Bhole @ 2018-05-30  5:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180530055611.10216-1-bhole_prashant_q7@lab.ntt.co.jp>

To reduce the runtime of the tests, the timeout for the select() call
was recently reduced from 1sec to 10usec. This caused many test failures,
which were caught by the failure handling commits in this series.

Restore the timeout from 10usec to 1sec.

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 64f9e25c451f..9d01f5c2abe2 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		if (err < 0)
 			perror("recv start time: ");
 		while (s->bytes_recvd < total_bytes) {
-			timeout.tv_sec = 0;
-			timeout.tv_usec = 10;
+			timeout.tv_sec = 1;
+			timeout.tv_usec = 0;
 
 			/* FD sets */
 			FD_ZERO(&w);
-- 
2.17.0

^ permalink raw reply related
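
For readers unfamiliar with the select() pattern touched by the patch above, here is a small standalone sketch. The helper name wait_readable() is hypothetical. It shows the timeout and the fd set being initialised before every call, which matters because select() may modify both.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_sec seconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, long timeout_sec)
{
	struct timeval timeout;
	fd_set rfds;
	int ret;

	assert(fd >= 0 && fd < FD_SETSIZE);

	/* select() may modify both arguments, so set them up on every call. */
	timeout.tv_sec = timeout_sec;
	timeout.tv_usec = 0;
	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);

	ret = select(fd + 1, &rfds, NULL, NULL, &timeout);
	if (ret < 0)
		return -1;
	return ret > 0 && FD_ISSET(fd, &rfds);
}
```

A very short timeout such as 10usec can expire before the sender has produced any data, in which case a receiver using this kind of helper would see a timeout rather than readable data.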

* [PATCH bpf v3 4/5] selftests/bpf: test_sockmap, fix data verification
From: Prashant Bhole @ 2018-05-30  5:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180530055611.10216-1-bhole_prashant_q7@lab.ntt.co.jp>

When data verification is enabled, some tests fail because verification is done
incorrectly. The following changes fix it.

- Identify the size of the data block to be verified
- Reset the verification counter when the data block size is reached
- Fix the value printed in case of verification failure

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 9d01f5c2abe2..664f268dc02a 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		int fd_flags = O_NONBLOCK;
 		struct timeval timeout;
 		float total_bytes;
+		int bytes_cnt = 0;
+		int chunk_sz;
 		fd_set w;
 
+		if (opt->sendpage)
+			chunk_sz = iov_length * cnt;
+		else
+			chunk_sz = iov_length * iov_count;
+
 		fcntl(fd, fd_flags);
 		total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
 		err = clock_gettime(CLOCK_MONOTONIC, &s->start);
@@ -388,9 +395,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 							errno = -EIO;
 							fprintf(stderr,
 								"detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
-								i, j, d[j], k - 1, d[j+1], k + 1);
+								i, j, d[j], k - 1, d[j+1], k);
 							goto out_errno;
 						}
+						bytes_cnt++;
+						if (bytes_cnt == chunk_sz) {
+							k = 0;
+							bytes_cnt = 0;
+						}
 						recv--;
 					}
 				}
-- 
2.17.0

^ permalink raw reply related
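
The counter reset described in the patch above can be illustrated with a standalone checker. This is a sketch under an assumption: that the sender fills each chunk with the byte sequence 0, 1, 2, ... and restarts the sequence at every chunk boundary. The function name verify_chunks() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Check that buf consists of back-to-back chunks of chunk_sz bytes,
 * each chunk holding the byte sequence 0, 1, 2, ... (wrapping at 256).
 * Returns -1 if everything matches, else the index of the first bad byte.
 */
static long verify_chunks(const unsigned char *buf, size_t len, size_t chunk_sz)
{
	unsigned char k = 0;	/* expected value of the next byte */
	size_t bytes_cnt = 0;	/* bytes verified in the current chunk */
	size_t i;

	assert(buf != NULL && chunk_sz > 0);

	for (i = 0; i < len; i++) {
		if (buf[i] != k++)
			return (long)i;
		if (++bytes_cnt == chunk_sz) {
			/* chunk boundary: the sender restarts its pattern */
			k = 0;
			bytes_cnt = 0;
		}
	}
	return -1;
}
```

Without the reset at the chunk boundary, the expected value keeps counting past the end of the first chunk and the checker reports corruption in data that is actually intact.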

* [PATCH bpf v3 5/5] selftests/bpf: test_sockmap, print additional test options
From: Prashant Bhole @ 2018-05-30  5:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, John Fastabend
  Cc: Prashant Bhole, David S . Miller, Shuah Khan, netdev,
	linux-kselftest
In-Reply-To: <20180530055611.10216-1-bhole_prashant_q7@lab.ntt.co.jp>

Print the values of test options like apply, cork, start and end so that
individual failed tests can be identified for a manual run.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 28 +++++++++++++++-------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 664f268dc02a..637c6585ff80 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -869,6 +869,8 @@ static char *test_to_str(int test)
 #define OPTSTRING 60
 static void test_options(char *options)
 {
+	char tstr[OPTSTRING];
+
 	memset(options, 0, OPTSTRING);
 
 	if (txmsg_pass)
@@ -881,14 +883,22 @@ static void test_options(char *options)
 		strncat(options, "redir_noisy,", OPTSTRING);
 	if (txmsg_drop)
 		strncat(options, "drop,", OPTSTRING);
-	if (txmsg_apply)
-		strncat(options, "apply,", OPTSTRING);
-	if (txmsg_cork)
-		strncat(options, "cork,", OPTSTRING);
-	if (txmsg_start)
-		strncat(options, "start,", OPTSTRING);
-	if (txmsg_end)
-		strncat(options, "end,", OPTSTRING);
+	if (txmsg_apply) {
+		snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply);
+		strncat(options, tstr, OPTSTRING);
+	}
+	if (txmsg_cork) {
+		snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork);
+		strncat(options, tstr, OPTSTRING);
+	}
+	if (txmsg_start) {
+		snprintf(tstr, OPTSTRING, "start %d,", txmsg_start);
+		strncat(options, tstr, OPTSTRING);
+	}
+	if (txmsg_end) {
+		snprintf(tstr, OPTSTRING, "end %d,", txmsg_end);
+		strncat(options, tstr, OPTSTRING);
+	}
 	if (txmsg_ingress)
 		strncat(options, "ingress,", OPTSTRING);
 	if (txmsg_skb)
@@ -897,7 +907,7 @@ static void test_options(char *options)
 
 static int __test_exec(int cgrp, int test, struct sockmap_options *opt)
 {
-	char *options = calloc(60, sizeof(char));
+	char *options = calloc(OPTSTRING, sizeof(char));
 	int err;
 
 	if (test == SENDPAGE)
-- 
2.17.0

^ permalink raw reply related
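
One detail of the pattern in the diff above: the third argument of strncat() limits how many characters are taken from the source string, not the total size of the destination. The following sketch shows one way to append "name value," while tracking the space actually left in the buffer. The helper append_option() is hypothetical and is not part of test_sockmap.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OPTSTRING 60

/*
 * Append "name value," to options, a NUL-terminated buffer of OPTSTRING bytes.
 * Returns 0 on success, -1 if the result would not fit (options is unchanged).
 */
static int append_option(char *options, const char *name, int value)
{
	size_t used;
	int n;

	assert(options != NULL && name != NULL);

	used = strlen(options);
	if (used >= OPTSTRING)
		return -1;

	/* snprintf() is given only the space that is actually left. */
	n = snprintf(options + used, OPTSTRING - used, "%s %d,", name, value);
	if (n < 0 || (size_t)n >= OPTSTRING - used) {
		options[used] = '\0';	/* undo a truncated append */
		return -1;
	}
	return 0;
}
```

Starting from an empty buffer, appending ("apply", 1) and then ("cork", 512) leaves the string "apply 1,cork 512," in options.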

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Michael Chan @ 2018-05-30  6:08 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Samudrala, Sridhar, David Miller, Netdev, Or Gerlitz
In-Reply-To: <CAJpBn1xTCsu9upLJP1e6d0vOwKfR=XygJQb_XwPX93ynF9-ppQ@mail.gmail.com>

On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
>> > Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
>> > extended?
>
> +1 it's painful to see this feature being added to the legacy
> API :(  Another duplicated configuration knob.
>
>> I didn't know about that.
>>
>> > Shouldn't we enable this via ethtool on the port representor netdev?
>>
>> We discussed about this.  ethtool on the VF representor will only work
>> in switchdev mode and also will not support min/max values.
>
> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
> feel like implementing switchdev mode and rewriting features in devlink
> may be too much to ask.

Totally agreed.  And switchdev mode doesn't seem to be that widely
used at the moment.  Do you have other suggestions besides NDO?

^ permalink raw reply

* Re: [PATCH v5 net] stmmac: 802.1ad tag stripping fix
From: Toshiaki Makita @ 2018-05-30  6:08 UTC (permalink / raw)
  To: Elad Nachman
  Cc: David Miller, Jose Abreu, Florian Fainelli, netdev,
	peppe.cavallaro, alexandre.torgue
In-Reply-To: <113191f7-ad35-151f-3414-a2342ff0e13c@gmail.com>

On 2018/05/30 14:48, Elad Nachman wrote:
> stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before calling napi_gro_receive().
> 
> The function assumes VLAN tagged frames are always tagged with 802.1Q protocol,
> and assigns ETH_P_8021Q to the skb by hard-coding the parameter on call to __vlan_hwaccel_put_tag() .
> 
> This causes packets not to be passed to the VLAN slave if it was created with 802.1AD protocol
> (ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).
> 
> This fix passes the protocol from the VLAN header into __vlan_hwaccel_put_tag()
> instead of using the hard-coded value of ETH_P_8021Q.
> NETIF_F_HW_VLAN_CTAG_RX check was removed and NETIF_F_HW_VLAN_STAG_RX feature was added to be in line with the driver actual abilities.
> 
> Signed-off-by: Elad Nachman <eladn@gilat.com>
> 
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index b65e2d1..f680bcf 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3293,17 +3293,17 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
>  {
> -	struct ethhdr *ehdr;
> +	struct vlan_ethhdr *veth;
>  	u16 vlanid;
> +	__be16 vlan_proto;
>  
> -	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
> -	    NETIF_F_HW_VLAN_CTAG_RX &&
> -	    !__vlan_get_tag(skb, &vlanid)) {
> +	if (!__vlan_get_tag(skb, &vlanid)) {
>  		/* pop the vlan tag */
> -		ehdr = (struct ethhdr *)skb->data;
> -		memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
> +		veth = (struct vlan_ethhdr *)skb->data;
> +		vlan_proto = veth->h_vlan_proto;
> +		memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
>  		skb_pull(skb, VLAN_HLEN);
> -		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
> +		__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
>  	}

Should the contents of this function be surrounded by
#ifdef STMMAC_VLAN_TAG_USED, since the feature is enabled only when it
is defined?
Otherwise looks good to me from the perspective of vlan features.

>  }
>  
> @@ -4344,7 +4344,7 @@ int stmmac_dvr_probe(struct device *device,
>  	ndev->watchdog_timeo = msecs_to_jiffies(watchdog);
>  #ifdef STMMAC_VLAN_TAG_USED
>  	/* Both mac100 and gmac support receive VLAN tag detection */
> -	ndev->features |= NETIF_F_HW_VLAN_CTAG_RX;
> +	ndev->features |= NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_STAG_RX;
>  #endif
>  	priv->msg_enable = netif_msg_init(debug, default_msg_level);
>  
> 

-- 
Toshiaki Makita

^ permalink raw reply

* Re: [PATCH net-next v16 4/8] netfilter: Add nf_ct_get_tuple_skb callback
From: kbuild test robot @ 2018-05-30  6:11 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: kbuild-all, netdev, cake, netfilter-devel
In-Reply-To: <152751766686.30935.14644567905547700823.stgit@alrua-kau>

[-- Attachment #1: Type: text/plain, Size: 2192 bytes --]

Hi Toke,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net/master]
[also build test WARNING on v4.17-rc7]
[cannot apply to net-next/master next-20180529]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/sched-Add-Common-Applications-Kept-Enhanced-cake-qdisc/20180530-125240
config: i386-randconfig-a0-05291352 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   net/netfilter/core.c: In function 'nf_ct_get_tuple_skb':
>> net/netfilter/core.c:586:12: warning: assignment from incompatible pointer type
     get_tuple = rcu_dereference(skb_ct_get_tuple);
               ^
>> net/netfilter/core.c:589:18: warning: passing argument 1 of 'get_tuple' from incompatible pointer type
     ret = get_tuple(dst_tuple, skb);
                     ^
   net/netfilter/core.c:589:18: note: expected 'const struct sk_buff *' but argument is of type 'struct nf_conntrack_tuple *'
   net/netfilter/core.c:589:29: warning: passing argument 2 of 'get_tuple' from incompatible pointer type
     ret = get_tuple(dst_tuple, skb);
                                ^
   net/netfilter/core.c:589:29: note: expected 'struct nf_conntrack_tuple *' but argument is of type 'const struct sk_buff *'

vim +586 net/netfilter/core.c

   578	
   579	bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple,
   580				 const struct sk_buff *skb)
   581	{
   582		bool (*get_tuple)(const struct sk_buff *, struct nf_conntrack_tuple *);
   583		bool ret = false;
   584	
   585		rcu_read_lock();
 > 586		get_tuple = rcu_dereference(skb_ct_get_tuple);
   587		if (!get_tuple)
   588			goto out;
 > 589		ret = get_tuple(dst_tuple, skb);
   590	out:
   591		rcu_read_unlock();
   592		return ret;
   593	}
   594	EXPORT_SYMBOL(nf_ct_get_tuple_skb);
   595	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28923 bytes --]

^ permalink raw reply
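
The warnings in the report above come from a local function pointer whose parameter order differs from that of the hook it is assigned from and from the way it is called. A userspace sketch of one way to avoid that class of mismatch is to share a single typedef between the hook variable and its callers. The struct definitions here are hypothetical stand-ins, the names get_tuple_fn, tuple_hook, demo_get_tuple() and get_tuple_skb() are invented for illustration, and RCU protection is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types named in the warning. */
struct sk_buff { int mark; };
struct nf_conntrack_tuple { int id; };

/* One typedef shared by the hook variable and every caller. */
typedef bool (*get_tuple_fn)(struct nf_conntrack_tuple *dst,
			     const struct sk_buff *skb);

static get_tuple_fn tuple_hook;	/* NULL until a provider registers */

/* Example provider with exactly the typedef's signature. */
static bool demo_get_tuple(struct nf_conntrack_tuple *dst,
			   const struct sk_buff *skb)
{
	dst->id = skb->mark;
	return true;
}

static bool get_tuple_skb(struct nf_conntrack_tuple *dst,
			  const struct sk_buff *skb)
{
	get_tuple_fn fn = tuple_hook;	/* same type, so no cast and no warning */

	assert(dst != NULL && skb != NULL);

	if (!fn)
		return false;
	return fn(dst, skb);		/* argument order matches the typedef */
}
```

Because the local variable and the hook share one type, swapping the parameters in either place would be reported at the declaration rather than at each call site.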

* Re: [PATCH v5 net] stmmac: 802.1ad tag stripping fix
From: Elad Nachman @ 2018-05-30  6:16 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: David Miller, Jose Abreu, Florian Fainelli, netdev,
	peppe.cavallaro, alexandre.torgue, eladv6
In-Reply-To: <3d8a01cf-34c5-f7f7-1cb4-6c9187ef88a1@lab.ntt.co.jp>

Interesting question. That's the way the driver was originally written and I tried to minimize the changes in the patch.
Anyway, common.h (included by stmmac_main.c) contains the following:

#if IS_ENABLED(CONFIG_VLAN_8021Q)
#define STMMAC_VLAN_TAG_USED
#include <linux/if_vlan.h>
#endif

So the define in question kicks in only once you enable 802.1Q support in the kernel .config.

Thanks,

Elad.

On 30/05/18 09:08, Toshiaki Makita wrote:
> On 2018/05/30 14:48, Elad Nachman wrote:
>> stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before calling napi_gro_receive().
>>
>> The function assumes VLAN tagged frames are always tagged with 802.1Q protocol,
>> and assigns ETH_P_8021Q to the skb by hard-coding the parameter on call to __vlan_hwaccel_put_tag() .
>>
>> This causes packets not to be passed to the VLAN slave if it was created with 802.1AD protocol
>> (ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).
>>
>> This fix passes the protocol from the VLAN header into __vlan_hwaccel_put_tag()
>> instead of using the hard-coded value of ETH_P_8021Q.
>> NETIF_F_HW_VLAN_CTAG_RX check was removed and NETIF_F_HW_VLAN_STAG_RX feature was added to be in line with the driver actual abilities.
>>
>> Signed-off-by: Elad Nachman <eladn@gilat.com>
>>
>> ---
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 16 ++++++++--------
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index b65e2d1..f680bcf 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -3293,17 +3293,17 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>>  
>>  static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
>>  {
>> -	struct ethhdr *ehdr;
>> +	struct vlan_ethhdr *veth;
>>  	u16 vlanid;
>> +	__be16 vlan_proto;
>>  
>> -	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
>> -	    NETIF_F_HW_VLAN_CTAG_RX &&
>> -	    !__vlan_get_tag(skb, &vlanid)) {
>> +	if (!__vlan_get_tag(skb, &vlanid)) {
>>  		/* pop the vlan tag */
>> -		ehdr = (struct ethhdr *)skb->data;
>> -		memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
>> +		veth = (struct vlan_ethhdr *)skb->data;
>> +		vlan_proto = veth->h_vlan_proto;
>> +		memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
>>  		skb_pull(skb, VLAN_HLEN);
>> -		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
>> +		__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
>>  	}
> 
> Should this function contents be surrounded by
> #ifdef STMMAC_VLAN_TAG_USED, since the features is enabled only when it
> is defined?
> Otherwise looks good to me from the perspective of vlan features.
> 
>>  }
>>  
>> @@ -4344,7 +4344,7 @@ int stmmac_dvr_probe(struct device *device,
>>  	ndev->watchdog_timeo = msecs_to_jiffies(watchdog);
>>  #ifdef STMMAC_VLAN_TAG_USED
>>  	/* Both mac100 and gmac support receive VLAN tag detection */
>> -	ndev->features |= NETIF_F_HW_VLAN_CTAG_RX;
>> +	ndev->features |= NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_STAG_RX;
>>  #endif
>>  	priv->msg_enable = netif_msg_init(debug, default_msg_level);
>>  
>>
> 

^ permalink raw reply

* Re: [PATCH v5 net] stmmac: 802.1ad tag stripping fix
From: Toshiaki Makita @ 2018-05-30  6:26 UTC (permalink / raw)
  To: Elad Nachman
  Cc: David Miller, Jose Abreu, Florian Fainelli, netdev,
	peppe.cavallaro, alexandre.torgue
In-Reply-To: <6fccf717-93a1-9154-d618-70556d7e5854@gmail.com>

On 2018/05/30 15:16, Elad Nachman wrote:
> Interesting question. That's the way the driver was originally written and I tried to minimize the changes in the patch.
> Anyway, common.h (included by stmmac_main.c) contains the following:
> 
> #if IS_ENABLED(CONFIG_VLAN_8021Q)
> #define STMMAC_VLAN_TAG_USED
> #include <linux/if_vlan.h>
> #endif
> 
> So the define in question kicks in only once you enable 802.1Q support in the kernel .config .

So we end up stripping the VLAN tag even though the device does not have
HW_VLAN_CTAG/STAG_RX when CONFIG_VLAN_8021Q is disabled, since the patch
removed the original condition anyway...

-- 
Toshiaki Makita

^ permalink raw reply

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
From: Jakub Kicinski @ 2018-05-30  6:33 UTC (permalink / raw)
  To: Michael Chan; +Cc: Samudrala, Sridhar, David Miller, Netdev, Or Gerlitz
In-Reply-To: <CACKFLikZ8t11Fp-0LLqLRHwBS4O6U04zuMhb8nGNS9E3-abzRg@mail.gmail.com>

On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:
> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:
> > On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
> >> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
> >> > Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> >> > extended?
> >
> > +1 it's painful to see this feature being added to the legacy
> > API :(  Another duplicated configuration knob.
> >
> >> I didn't know about that.
> >>
> >> > Shouldn't we enable this via ethtool on the port representor netdev?
> >>
> >> We discussed about this.  ethtool on the VF representor will only work
> >> in switchdev mode and also will not support min/max values.
> >
> > Ethtool channel API may be overdue a rewrite in devlink anyway, but I
> > feel like implementing switchdev mode and rewriting features in devlink
> > may be too much to ask.
>
> Totally agreed.  And switchdev mode doesn't seem to be that widely
> used at the moment.  Do you have other suggestions besides NDO?

At some point you (Broadcom) were working on a whole bunch of devlink
configuration options for the PCIe side of the ASIC.  The number of
queues relates to things like number of allocated MSI-X vectors, which
if memory serves me was in your devlink patch set.  In an ideal world
we would try to keep all those in one place :)

For PCIe config there is always the question of what can be configured
at runtime, and what requires a HW reset.  Therefore that devlink API
which could configure current as well as persistent device settings was
quite nice.  I'm not sure if reallocating queues would ever require
PCIe block reset but maybe...  Certainly it seems the notion of min
queues would make more sense in PCIe configuration devlink API than
ethtool channel API to me as well.

Queues are in the grey area between netdev and non-netdev constructs.
They make sense both from PCIe resource allocation perspective (i.e.
devlink PCIe settings) and netdev perspective (ethtool) because they
feed into things like qdisc offloads, maybe per-queue stats etc.

So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
API and/or switchdev representors.  But neither of those are really an
option for you today so IDK :)

^ permalink raw reply

* [PATCH rdma-next] net/mlx5: Use flow counter pointer as input to the query function
From: Or Gerlitz @ 2018-05-30  6:35 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, raeds, linux-rdma, netdev, Or Gerlitz

This allows us to un-expose the details of struct mlx5_fc and keep
it internal to the core driver as it used to be.

Change-Id: I780cd74863fa2beccdd52e7d0cdd1e117a5aa353
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---

Jason,

As you asked, I am sending a fixup in case you intend to apply
V2 of the flow counter series [1]. If there's going to be a V3,
Leon, please apply it from the beginning.

Or.

[1] https://marc.info/?l=linux-netdev&m=152759937829994&w=2

 drivers/infiniband/hw/mlx5/main.c                  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 15 ++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  | 22 +++++++++++++++++---
 .../net/ethernet/mellanox/mlx5/core/fs_counters.c  |  4 ++--
 include/linux/mlx5/fs.h                            | 24 ++++------------------
 5 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index ac99125..4b09dcd 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3151,7 +3151,7 @@ static int read_flow_counters(struct ib_device *ibdev,
 	struct mlx5_fc *fc = read_attr->hw_cntrs_hndl;
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 
-	return mlx5_fc_query(dev->mdev, fc->id,
+	return mlx5_fc_query(dev->mdev, fc,
 			     &read_attr->out[IB_COUNTER_PACKETS],
 			     &read_attr->out[IB_COUNTER_BYTES]);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 6cab1dd..f63dfbc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -2104,21 +2104,18 @@ static int mlx5_eswitch_query_vport_drop_stats(struct mlx5_core_dev *dev,
 	struct mlx5_vport *vport = &esw->vports[vport_idx];
 	u64 rx_discard_vport_down, tx_discard_vport_down;
 	u64 bytes = 0;
-	u16 idx = 0;
 	int err = 0;
 
 	if (!vport->enabled || esw->mode != SRIOV_LEGACY)
 		return 0;
 
-	if (vport->egress.drop_counter) {
-		idx = vport->egress.drop_counter->id;
-		mlx5_fc_query(dev, idx, &stats->rx_dropped, &bytes);
-	}
+	if (vport->egress.drop_counter)
+		mlx5_fc_query(dev, vport->egress.drop_counter,
+			      &stats->rx_dropped, &bytes);
 
-	if (vport->ingress.drop_counter) {
-		idx = vport->ingress.drop_counter->id;
-		mlx5_fc_query(dev, idx, &stats->tx_dropped, &bytes);
-	}
+	if (vport->ingress.drop_counter)
+		mlx5_fc_query(dev, vport->ingress.drop_counter,
+			      &stats->tx_dropped, &bytes);
 
 	if (!MLX5_CAP_GEN(dev, receive_discard_vport_down) &&
 	    !MLX5_CAP_GEN(dev, transmit_discard_vport_down))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index 40992ae..0211d77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -131,6 +131,25 @@ struct mlx5_flow_table {
 	struct rhltable			fgs_hash;
 };
 
+struct mlx5_fc_cache {
+	u64 packets;
+	u64 bytes;
+	u64 lastuse;
+};
+
+struct mlx5_fc {
+	struct rb_node node;
+	struct list_head list;
+
+	u64 lastpackets;
+	u64 lastbytes;
+
+	u32 id;
+	bool deleted;
+	bool aging;
+	struct mlx5_fc_cache cache ____cacheline_aligned_in_smp;
+};
+
 struct mlx5_ft_underlay_qp {
 	struct list_head list;
 	u32 qpn;
@@ -210,9 +229,6 @@ void mlx5_fc_queue_stats_work(struct mlx5_core_dev *dev,
 			      unsigned long delay);
 void mlx5_fc_update_sampling_interval(struct mlx5_core_dev *dev,
 				      unsigned long interval);
-int mlx5_fc_query(struct mlx5_core_dev *dev, u16 id,
-		  u64 *packets, u64 *bytes);
-
 int mlx5_init_fs(struct mlx5_core_dev *dev);
 void mlx5_cleanup_fs(struct mlx5_core_dev *dev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index 10f4078..58af6be 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -314,10 +314,10 @@ void mlx5_cleanup_fc_stats(struct mlx5_core_dev *dev)
 	}
 }
 
-int mlx5_fc_query(struct mlx5_core_dev *dev, u16 id,
+int mlx5_fc_query(struct mlx5_core_dev *dev, struct mlx5_fc *counter,
 		  u64 *packets, u64 *bytes)
 {
-	return mlx5_cmd_fc_query(dev, id, packets, bytes);
+	return mlx5_cmd_fc_query(dev, counter->id, packets, bytes);
 }
 EXPORT_SYMBOL(mlx5_fc_query);
 
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 4612e0a..ef2f3bf 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -185,30 +185,14 @@ int mlx5_modify_rule_destination(struct mlx5_flow_handle *handler,
 struct mlx5_fc *mlx5_flow_rule_counter(struct mlx5_flow_handle *handler);
 struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging);
 void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter);
+
+struct mlx5_fc;
+
 void mlx5_fc_query_cached(struct mlx5_fc *counter,
 			  u64 *bytes, u64 *packets, u64 *lastuse);
-int mlx5_fc_query(struct mlx5_core_dev *dev, u16 id,
+int mlx5_fc_query(struct mlx5_core_dev *dev, struct mlx5_fc *counter,
 		  u64 *packets, u64 *bytes);
 
-struct mlx5_fc_cache {
-	u64 packets;
-	u64 bytes;
-	u64 lastuse;
-};
-
-struct mlx5_fc {
-	struct rb_node node;
-	struct list_head list;
-
-	u64 lastpackets;
-	u64 lastbytes;
-
-	u32 id;
-	bool deleted;
-	bool aging;
-	struct mlx5_fc_cache cache ____cacheline_aligned_in_smp;
-};
-
 int mlx5_fs_add_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
 int mlx5_fs_remove_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
 
-- 
2.3.7

^ permalink raw reply related
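
The idea behind the patch above, passing a pointer to the counter so that the struct layout can stay private, is the usual opaque-handle pattern in C. A generic sketch follows. All names in it (struct flow_counter, fc_create(), fc_destroy(), fc_query()) are hypothetical and are not the mlx5 API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* --- what a public header would expose --- */
struct flow_counter;			/* opaque: layout not visible to users */
struct flow_counter *fc_create(uint32_t id);
void fc_destroy(struct flow_counter *fc);
int fc_query(const struct flow_counter *fc, uint64_t *packets, uint64_t *bytes);

/* --- what stays private to the implementation file --- */
struct flow_counter {
	uint32_t id;
	uint64_t packets;
	uint64_t bytes;
};

struct flow_counter *fc_create(uint32_t id)
{
	struct flow_counter *fc = calloc(1, sizeof(*fc));

	if (fc)
		fc->id = id;
	return fc;
}

void fc_destroy(struct flow_counter *fc)
{
	free(fc);
}

/* Callers pass the handle; only this file ever dereferences it. */
int fc_query(const struct flow_counter *fc, uint64_t *packets, uint64_t *bytes)
{
	assert(packets != NULL && bytes != NULL);

	if (!fc)
		return -1;
	*packets = fc->packets;
	*bytes = fc->bytes;
	return 0;
}
```

Since users of the header never see the members, a field such as id can change type or move without touching any caller, which is the benefit the commit message describes.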
