Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2] netfilter: properly initialize xt_table_info structure
From: Michal Kubecek @ 2018-05-31 11:32 UTC (permalink / raw)
  To: peter pi
  Cc: Greg Kroah-Hartman, Florian Westphal, Jan Engelhardt,
	Eric Dumazet, Greg Hackmann, Pablo Neira Ayuso, Jozsef Kadlecsik,
	netfilter-devel, coreteam, netdev
In-Reply-To: <CANZU63VE7fWNL+PJrLp7-5PBS6R6RQPvhw2QgqAK8NhX4uQc9Q@mail.gmail.com>

On Thu, May 31, 2018 at 05:40:40PM +0800, peter pi wrote:
> 
> My test method is very simple:
> 1, In copy_to_user, add a function call like my_examine(from, n) to check
> every 8 bytes. There is an kernel function called  virt_addr_valid which
> can check if the value is a address value.
> 2, Print a kernel log when there is a leak detected in function my_examine
> 3, Run iptables-save or ip6tables-save in shell, it will hit the kernel
> code path of the problem

I think I start to understand the problem. IPT_SO_GET_ENTRIES leads to
calling copy_entries_to_user() which copies the entries as they are to
user provided buffer. It also copies instances of struct xt_entry_match
and struct xt_entry_target which contain kernel pointers. We then
rewrite them with match/target name for userspace but the layout looks
(on x86_64) like this

/* offset    |  size */  type = struct xt_entry_match {
/*    0      |    32 */    union {
/*                32 */        struct {
/*    0      |     2 */            __u16 match_size;
/*    2      |    29 */            char name[29];
/*   31      |     1 */            __u8 revision;

                                   /* total size (bytes):   32 */
                               } user;
/*                16 */        struct {
/*    0      |     2 */            __u16 match_size;
/* XXX  6-byte hole  */
/*    8      |     8 */            struct xt_match *match;

                                   /* total size (bytes):   16 */
                               } kernel;
/*                 2 */        __u16 match_size;

                               /* total size (bytes):   32 */
                           } u;
/*   32      |     0 */    unsigned char data[];

                           /* total size (bytes):   32 */
                         }


so that if match name is no longer than five characters (which is often
the case), writing to .u.user.name leaves .u.kernel.match untouched. The
same problem exists in struct xt_entry_target.

Unless there are other kernel pointers leaked, the solution should be
simple: explicitly zero the copy of .u.kernel.match (.u.kernel.target)
before we copy the name. I haven't checked yet if compat_ code path
suffers from the same problem.

Michal Kubecek

^ permalink raw reply

* Re: general protection fault in requeue_rx_msgs
From: Kirill Tkhai @ 2018-05-31 11:34 UTC (permalink / raw)
  To: syzbot, davem, ebiggers, edumazet, linux-kernel, netdev,
	syzkaller-bugs, tom, viro
In-Reply-To: <0000000000000482ce056d7c1436@google.com>

On 31.05.2018 11:16, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    0044cdeb7313 Merge branch 'for-linus' of git://git.kernel...
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15aeff0f800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=968b0b23c7854c0b
> dashboard link: https://syzkaller.appspot.com/bug?extid=554266c04a41d1f9754d
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=131a208f800000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+554266c04a41d1f9754d@syzkaller.appspotmail.com
> 
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 4788 Comm: kworker/u4:3 Not tainted 4.17.0-rc7+ #74
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: kstrp strp_work
> RIP: 0010:__skb_unlink include/linux/skbuff.h:1844 [inline]
> RIP: 0010:__skb_dequeue include/linux/skbuff.h:1861 [inline]
> RIP: 0010:requeue_rx_msgs+0x14d/0x620 net/kcm/kcmsock.c:226
> RSP: 0018:ffff8801aa97f0b8 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: dffffc0000000000 RCX: ffffffff86d54ed3
> RDX: 0000000000000001 RSI: ffffffff86d531e2 RDI: 0000000000000008
> RBP: ffff8801aa97f1b8 R08: ffff8801aaa8e3c0 R09: ffffed0035f0a0e8
> R10: ffffed0035f0a0e8 R11: ffff8801af850743 R12: ffff8801d4407000
> R13: ffffed003552fe22 R14: 0000000000000000 R15: ffff8801a6bb06c0
> FS:  0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fba7c5099a0 CR3: 00000001af6f6000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  unreserve_rx_kcm+0x471/0x520 net/kcm/kcmsock.c:334
>  kcm_rcv_strparser+0x109/0x8d0 net/kcm/kcmsock.c:375
>  __strp_recv+0x34b/0x2130 net/strparser/strparser.c:328
>  strp_recv+0xcf/0x110 net/strparser/strparser.c:362
>  tcp_read_sock+0x2aa/0x810 net/ipv4/tcp.c:1652
>  strp_read_sock+0x1a1/0x2d0 net/strparser/strparser.c:385
>  do_strp_work net/strparser/strparser.c:440 [inline]
>  strp_work+0xcd/0x120 net/strparser/strparser.c:449
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:240
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> Code: 80 3c 1a 00 0f 85 70 04 00 00 48 8d 78 08 4d 8b 74 24 08 49 c7 04 24 00 00 00 00 49 c7 44 24 08 00 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 1a 00 0f 85 a7 04 00 00 4c 89 f2 4c 89 70 08 48 c1 ea 03
> RIP: __skb_unlink include/linux/skbuff.h:1844 [inline] RSP: ffff8801aa97f0b8
> RIP: __skb_dequeue include/linux/skbuff.h:1861 [inline] RSP: ffff8801aa97f0b8
> RIP: requeue_rx_msgs+0x14d/0x620 net/kcm/kcmsock.c:226 RSP: ffff8801aa97f0b8
> ---[ end trace e4c0e45094907eaa ]---

This looks like the same as syzbot+5f1a04e374a635efc426@syzkaller.appspotmail.com:
"WARNING in kcm_exit_net (3)". This may confirm the theory. It looks like after async
pernet_operations the race window became bigger, and now the work has more chances
to have no a time to complete.

I'm not close to this code. Tom, could you please to say, whether kcm_done_work()
can be called for in-kernel kcm sockets (created via kcm_clone())?

Also, is there a possibility to create !kernel socket in kcm_clone()? I forgot
the reasons, why we can't do that in some places.

Thanks,
Kirill

^ permalink raw reply

* [PATCH net-next] net: axienet: remove stale comment of axienet_open
From: YueHaibing @ 2018-05-31 11:51 UTC (permalink / raw)
  To: davem, anirudh, John.Linn; +Cc: netdev, linux-kernel, michal.simek, YueHaibing

axienet_open no longer return -ENODEV when PHY cannot be connected to
since commit d7cc3163e026 ("net: axienet: Support phy-less mode of operation")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index e74e1e8..f24f48f 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -900,7 +900,6 @@ static void axienet_dma_err_handler(unsigned long data);
  * @ndev:	Pointer to net_device structure
  *
  * Return: 0, on success.
- *	    -ENODEV, if PHY cannot be connected to
  *	    non-zero error value on failure
  *
  * This is the driver open routine. It calls phy_start to start the PHY device.
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH 1/1] net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
From: Daniele Palmas @ 2018-05-31 11:52 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: Oliver Neukum, netdev, linux-usb
In-Reply-To: <87a7sggnch.fsf@miraculix.mork.no>

2018-05-31 11:56 GMT+02:00 Bjørn Mork <bjorn@mork.no>:
> Daniele Palmas <dnlplm@gmail.com> writes:
>
>> Testing Telit LM940 with ICMP packets > 14552 bytes revealed that
>> the modem needs FLAG_SEND_ZLP to properly work, otherwise the cdc
>> mbim data interface won't be anymore responsive.
>>
>> Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
>
> Acked-by: Bjørn Mork <bjorn@mork.no>
>
> Should have thought of this... I noticed your discussion, but couldn't
> reproduce the issues myself.  This explains why.
>
> Do you happen to know if the device announces larger buffers than the
> driver wants to use, or if this happens with the max sized buffers too?
>
> You can easily check these values by comparing dwNtbInMaxSize and
> dwNtbOutMaxSize (device maximum values) with rx_max and tx_max
> (neogtiated values) using e.g
>
>  grep . /sys/class/net/wwan0/cdc_ncm/*
>

This seems to happen with the max sized buffers according to the output:

daniele@L2122:/home/daniele$ grep . /sys/class/net/wwp0s20u6i2/cdc_ncm/*

/sys/class/net/wwp0s20u6i2/cdc_ncm/bmNtbFormatsSupported:0x0001
/sys/class/net/wwp0s20u6i2/cdc_ncm/dwNtbInMaxSize:16384
/sys/class/net/wwp0s20u6i2/cdc_ncm/dwNtbOutMaxSize:16384
/sys/class/net/wwp0s20u6i2/cdc_ncm/min_tx_pkt:13312
/sys/class/net/wwp0s20u6i2/cdc_ncm/ndp_to_end:N
/sys/class/net/wwp0s20u6i2/cdc_ncm/rx_max:16384
/sys/class/net/wwp0s20u6i2/cdc_ncm/tx_max:16384
/sys/class/net/wwp0s20u6i2/cdc_ncm/tx_timer_usecs:400
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpInAlignment:4
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpInDivisor:1
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpInPayloadRemainder:0
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpOutAlignment:4
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpOutDivisor:4
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNdpOutPayloadRemainder:0
/sys/class/net/wwp0s20u6i2/cdc_ncm/wNtbOutMaxDatagrams:16

Thanks,
Daniele

>
> It has never been 100% clear to me whether we should send the ZLP by
> default if we've negotiated a smaller than max buffer. But the ZLP ought
> to be redundant in any case, since the device knows the negotiated
> buffer size. So I do believe our current interpretation makes sense.
>
> Not that it matters.  There are obviously more than enough device
> implementations violating this requirement to make it completely
> pointless.
>
>
> Bjørn

^ permalink raw reply

* Re: [PATCH v2] netfilter: properly initialize xt_table_info structure
From: Michal Kubecek @ 2018-05-31 11:55 UTC (permalink / raw)
  To: peter pi
  Cc: Greg Kroah-Hartman, Florian Westphal, Jan Engelhardt,
	Eric Dumazet, Greg Hackmann, Pablo Neira Ayuso, Jozsef Kadlecsik,
	netfilter-devel, coreteam, netdev
In-Reply-To: <20180531113215.sbqqjip2gxvhl2eg@unicorn.suse.cz>

On Thu, May 31, 2018 at 01:32:16PM +0200, Michal Kubecek wrote:
> I think I start to understand the problem. IPT_SO_GET_ENTRIES leads to
> calling copy_entries_to_user() which copies the entries as they are to
> user provided buffer. It also copies instances of struct xt_entry_match
> and struct xt_entry_target which contain kernel pointers. We then
> rewrite them with match/target name for userspace but the layout looks
> (on x86_64) like this
> 
> /* offset    |  size */  type = struct xt_entry_match {
> /*    0      |    32 */    union {
> /*                32 */        struct {
> /*    0      |     2 */            __u16 match_size;
> /*    2      |    29 */            char name[29];
> /*   31      |     1 */            __u8 revision;
> 
>                                    /* total size (bytes):   32 */
>                                } user;
> /*                16 */        struct {
> /*    0      |     2 */            __u16 match_size;
> /* XXX  6-byte hole  */
> /*    8      |     8 */            struct xt_match *match;
> 
>                                    /* total size (bytes):   16 */
>                                } kernel;
> /*                 2 */        __u16 match_size;
> 
>                                /* total size (bytes):   32 */
>                            } u;
> /*   32      |     0 */    unsigned char data[];
> 
>                            /* total size (bytes):   32 */
>                          }
> 
> 
> so that if match name is no longer than five characters (which is often
> the case), writing to .u.user.name leaves .u.kernel.match untouched. The
> same problem exists in struct xt_entry_target.

And this should no longer happen since the series

 f32815d21d4d ("xtables: add xt_match, xt_target and data copy_to_user functions")
 f77bc5b23fb1 ("iptables: use match, target and data copy_to_user helpers")
 e47ddb2c4691 ("ip6tables: use match, target and data copy_to_user helpers")
 244b531bee2b ("arptables: use match, target and data copy_to_user helpers")
 b5040f6c33a5 ("ebtables: use match, target and data copy_to_user helpers")
 4915f7bbc402 ("xtables: use match, target and data copy_to_user helpers in compat")
 ec2318904965 ("xtables: extend matches and targets with .usersize")

changed the logic in 4.11-rc1.

Michal Kubecek

^ permalink raw reply

* [PATCH 1/2 net-next] net_failover: fix net_failover_compute_features()
From: Dan Carpenter @ 2018-05-31 12:01 UTC (permalink / raw)
  To: David S. Miller, Sridhar Samudrala; +Cc: netdev, kernel-janitors

This has an '&' vs '|' typo so it starts with vlan_features set to none.
Also a u32 type isn't large enough to hold all the feature bits, it
should be netdev_features_t.

Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index 8b508e2cf29b..ef50158e90a9 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -380,7 +380,8 @@ static rx_handler_result_t net_failover_handle_frame(struct sk_buff **pskb)
 
 static void net_failover_compute_features(struct net_device *dev)
 {
-	u32 vlan_features = FAILOVER_VLAN_FEATURES & NETIF_F_ALL_FOR_ALL;
+	netdev_features_t vlan_features = FAILOVER_VLAN_FEATURES |
+					  NETIF_F_ALL_FOR_ALL;
 	netdev_features_t enc_features  = FAILOVER_ENC_FEATURES;
 	unsigned short max_hard_header_len = ETH_HLEN;
 	unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE |

^ permalink raw reply related

* [PATCH 2/2 net-next] net_failover: fix error code in net_failover_create()
From: Dan Carpenter @ 2018-05-31 12:04 UTC (permalink / raw)
  To: David S. Miller, Sridhar Samudrala; +Cc: netdev, kernel-janitors

We forgot to set the error code on this path.  This function is supposed
to return error pointers, so with this bug it accidentally returns NULL
and the caller doesn't check for that.

Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index ef50158e90a9..881f3fa13e6b 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -761,8 +761,10 @@ struct failover *net_failover_create(struct net_device *standby_dev)
 	netif_carrier_off(failover_dev);
 
 	failover = failover_register(failover_dev, &net_failover_ops);
-	if (IS_ERR(failover))
+	if (IS_ERR(failover)) {
+		err = PTR_ERR(failover);
 		goto err_failover_register;
+	}
 
 	return failover;
 

^ permalink raw reply related

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Moshe Shemesh @ 2018-05-31 12:18 UTC (permalink / raw)
  To: Tariq Toukan, Eric Dumazet, Alexander Aring
  Cc: David Miller, edumazet, netdev, fw, herbert, tgraf, brouer,
	alex.aring, stefan, ktkhai, Eran Ben Elisha
In-Reply-To: <11b2baca-c810-3f61-38d1-415099783129@mellanox.com>



On 5/30/2018 10:20 AM, Tariq Toukan wrote:
> 
> 
> On 28/05/2018 7:09 PM, Eric Dumazet wrote:
>>
>>
>> On 05/28/2018 07:52 AM, Alexander Aring wrote:
>>
>>> as somebody who had similar issues with this patch series I can tell you
>>> about what happened for the 6LoWPAN fragmentation.
>>>
>>> The issue sounds similar, but there is too much missing information here
>>> to say something about if you have exactly the issue which we had.
>>>
>>> Our problem:
>>>
>>> The patch series uses memcmp() to compare hash keys, we had some padding
>>> bytes in our hash key and it occurs that we had sometimes random bytes
>>> in this structure when it's put on stack. We solved it by a struct
>>> foo_key bar = {}, which in case of gcc it _seems_ it makes a whole
>>> memset(bar, 0, ..) on the structure.
>>>
>>> I asked on the netdev mailinglist how to deal with this problem in
>>> general, because = {} works in case of gcc, others compilers may have a
>>> different handling or even gcc will changes this behaviour in future.
>>> I got no reply so I did what it works for me. :-)
>>>
>>> At least maybe a memcmp() on structures should never be used, it should
>>> be compared by field. I would recommend this way when the compiler is
>>> always clever enough to optimize it in some cases, but I am not so a
>>> compiler expert to say anything about that.
>>>
>>> I checked the hash key structures for x86_64 and pahole, so far I didn't
>>> find any padding bytes there, but it might be different on
>>> architectures or ?compiler?.
>>>
>>> Additional useful information to check if you running into the same 
>>> problem
>>> would be:
>>>
>>>   - Which architecture do you use?
>>>
>>>   - Do you have similar problems with a veth setup?
>>>
>>> You could also try this:
>>>
>>> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
>>> index b939b94e7e91..40ece9ab8b12 100644
>>> --- a/net/ipv6/reassembly.c
>>> +++ b/net/ipv6/reassembly.c
>>> @@ -142,19 +142,19 @@ static void ip6_frag_expire(struct timer_list *t)
>>>   static struct frag_queue *
>>>   fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int 
>>> iif)
>>>   {
>>> -       struct frag_v6_compare_key key = {
>>> -               .id = id,
>>> -               .saddr = hdr->saddr,
>>> -               .daddr = hdr->daddr,
>>> -               .user = IP6_DEFRAG_LOCAL_DELIVER,
>>> -               .iif = iif,
>>> -       };
>>> +       struct frag_v6_compare_key key = {};
>>>          struct inet_frag_queue *q;
>>>          if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST |
>>>                                              IPV6_ADDR_LINKLOCAL)))
>>>                  key.iif = 0;
>>> +       key.id = id;
>>> +       key.saddr = hdr->saddr;
>>> +       key.daddr = hdr->daddr;
>>> +       key.user = IP6_DEFRAG_LOCAL_DELIVER;
>>> +       key.iif = iif;
>>> +
>>>          q = inet_frag_find(&net->ipv6.frags, &key);
>>>          if (!q)
>>>                  return NULL;
>>>
>>> - Alex
>>>
>>
>> Hi Alex.
>>
>> This patch makes no sense, since struct frag_v6_compare_key has no hole.
>>
>> Only 6LoWPAN had a problem really, because of its way of having unions 
>> (and holes).
>>
>> Also note that your patch would break the case when we force key.iif 
>> to be zero.
>>
>>
>> Tariq, here are my test results : No drops for me.
>>
>> # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
>> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 
>> 2607:f8b0:8099:e18:: () port 0 AF_INET6
>> Socket  Message  Elapsed      Messages
>> Size    Size     Time         Okay Errors   Throughput
>> bytes   bytes    secs            #      #   10^6bits/sec
>>
>> 212992   65507   10.00      202117      0    10592.00
>> 212992           10.00           0              0.00
>>
>> Somehow, you might send packets too fast and receiver has a problem 
>> with that ?
> 
> Not sure, the transmit BW you get is higher than what we saw.
> Anyway, we'll check this.
> 
>> For particular needs, you might need to adjust :
>>
>> /proc/sys/net/ipv6/ip6frag_time  (to 2 seconds instead of the default 
>> of 60)
>> /proc/sys/net/ipv6/ip6frag_low_thresh
>> /proc/sys/net/ipv6/ip6frag_high_thresh
>>
>> Once your receiver has filled its capacity with frags, the default of 
>> 60 seconds to garbage collect
>> might be the reason you notice a problem.
>>
>> Check :
>> grep FRAG6 /proc/net/sockstat6
>>
>> On Google servers we multiply by 25 the limits for ipv6 frags memory 
>> usage :
>>
>> /proc/sys/net/ipv6/ip6frag_high_thresh:104857600  (instead of 4MB)
>> /proc/sys/net/ipv6/ip6frag_low_thresh:78643200  (instead of 3 MB)
>>
>> When using 64KB datagrams, note that the truesize of the datagram 
>> would be about 44 * 2 = 88 KB,
>> so after ~40 lost packets in the network, you no longer can accept 
>> ipv6 fragments, until garbage
>> collector evicted old datagrams.
>>
> 
> Great.
> Moshe, please try the suggested above.

I do see big improvement after changing the 3 parameters as Eric suggested:
/proc/sys/net/ipv6/ip6frag_time  set to 2
/proc/sys/net/ipv6/ip6frag_low_thresh set to 104857600
/proc/sys/net/ipv6/ip6frag_high_thresh set to 78643200


[root@reg-l-vrt-67100-104 linux-stable]#  netperf -H 
fe80::7efe:90ff:fed5:bb48%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 
fe80::7efe:90ff:fed5:bb48%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      156387      0    8194.60
212992           10.00       76901           4029.57

#kernel
Ip6InReceives                   7107999            0.0
Ip6InDelivers                   114126             0.0
Ip6OutRequests                  47                 0.0
Ip6ReasmTimeout                 5115               0.0
Ip6ReasmReqds                   7107987            0.0
Ip6ReasmOKs                     114114             0.0
Ip6ReasmFails                   1714146            0.0
...
Udp6InDatagrams                 112486             0.0
Udp6InErrors                    1629               0.0
Udp6RcvbufErrors                1629               0.0
...

While before these parameters settings I got:
[root@reg-l-vrt-67100-104 ~]# netperf -H 
fe80::e61d:2dff:feca:c7c3%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 
fe80::e61d:2dff:feca:c7c3%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      145419      0    7620.35
212992           10.00         285             14.93

#kernel
Ip6InReceives                   6665965            0.0
Ip6InDelivers                   300                0.0
Ip6OutRequests                  9                  0.0
Ip6ReasmReqds                   6665950            0.0
Ip6ReasmOKs                     285                0.0
Ip6ReasmFails                   6650890            0.0
...
Udp6InDatagrams                 286                0.0


however, before the patchset, I got much better results:
[root@reg-l-vrt-67100-104 linux-stable]#  netperf -H 
fe80::7efe:90ff:fed5:bb48%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 
fe80::7efe:90ff:fed5:bb48%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      158935      0    8328.32
212992           10.00      144652           7579.88


#kernel
Ip6InReceives                   7088903            0.0
Ip6InDelivers                   154117             0.0
Ip6OutRequests                  9                  0.0
Ip6ReasmReqds                   7088889            0.0
Ip6ReasmOKs                     154103             0.0
...
Udp6InDatagrams                 144653             0.0
Udp6InErrors                    9451               0.0
Udp6RcvbufErrors                9451               0.0


> 
> In case these values dramatically improve performance, maybe its time to 
> change the default.
> 
> Thanks,
> Tariq
> 
>>
>>
>>
>>
>>
>>

^ permalink raw reply

* [PATCH] caif: use non-archaic spelling of failes
From: Thadeu Lima de Souza Cascardo @ 2018-05-31 12:18 UTC (permalink / raw)
  To: netdev; +Cc: dmitry.tarnyagin

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---
 include/net/caif/caif_layer.h | 2 +-
 net/caif/cfrfml.c             | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/caif/caif_layer.h b/include/net/caif/caif_layer.h
index 94e5ed64dc6d..8a114e57bcb6 100644
--- a/include/net/caif/caif_layer.h
+++ b/include/net/caif/caif_layer.h
@@ -22,7 +22,7 @@ struct caif_packet_funcs;
  * @assert: expression to evaluate.
  *
  * This function will print a error message and a do WARN_ON if the
- * assertion failes. Normally this will do a stack up at the current location.
+ * assertion fails. Normally this will do a stack up at the current location.
  */
 #define caif_assert(assert)					\
 do {								\
diff --git a/net/caif/cfrfml.c b/net/caif/cfrfml.c
index b82440e1fcb4..3f2c63c78004 100644
--- a/net/caif/cfrfml.c
+++ b/net/caif/cfrfml.c
@@ -85,7 +85,7 @@ static struct cfpkt *rfm_append(struct cfrfml *rfml, char *seghead,
 	tmppkt = cfpkt_append(rfml->incomplete_frm, pkt,
 			rfml->pdu_size + RFM_HEAD_SIZE);
 
-	/* If cfpkt_append failes input pkts are not freed */
+	/* If cfpkt_append fails input pkts are not freed */
 	*err = -ENOMEM;
 	if (tmppkt == NULL)
 		return NULL;
-- 
2.17.0

^ permalink raw reply related

* [PATCH] vlan: use non-archaic spelling of failes
From: Thadeu Lima de Souza Cascardo @ 2018-05-31 12:20 UTC (permalink / raw)
  To: netdev

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---
 include/linux/if_vlan.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 78a5a90b4267..83ea4df6ab81 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -331,7 +331,7 @@ static inline bool vlan_hw_offload_capable(netdev_features_t features,
  * @mac_len: MAC header length including outer vlan headers
  *
  * Inserts the VLAN tag into @skb as part of the payload at offset mac_len
- * Returns error if skb_cow_head failes.
+ * Returns error if skb_cow_head fails.
  *
  * Does not change skb->protocol so this function can be used during receive.
  */
@@ -379,7 +379,7 @@ static inline int __vlan_insert_inner_tag(struct sk_buff *skb,
  * @vlan_tci: VLAN TCI to insert
  *
  * Inserts the VLAN tag into @skb as part of the payload
- * Returns error if skb_cow_head failes.
+ * Returns error if skb_cow_head fails.
  *
  * Does not change skb->protocol so this function can be used during receive.
  */
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next v4 00/11] Modify action API for implementing lockless actions
From: Vlad Buslov @ 2018-05-31 12:38 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, xiyou.wangcong, jiri, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, marcelo.leitner, kliteyn
In-Reply-To: <262fbd11-401e-90cf-4226-39b1604eb16d@mojatatu.com>


On Thu 31 May 2018 at 10:01, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> Hi Vlad,
>
> Can you try one simple test below with these patches?
>
> #create an action
> sudo $TC actions add action skbedit mark 1 pipe
> #
> sudo $TC qdisc del dev lo parent ffff:
> sudo $TC qdisc add dev lo ingress
> # bind action to filter....
> sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
> u32 match ip dst 127.0.0.1/32 flowid 1:1 action skbedit index 1
>
> #now delete that action multiple times while it is still bound
> sudo $TC actions del action skbedit index 1
> sudo $TC actions del action skbedit index 1
> sudo $TC actions del action skbedit index 1
>
> #check the refcount and bindcount
> sudo $TC -s actions ls action skbedit
>
> #delete the filter (which should remove the bindcnt)
>
> sudo $TC filter del dev lo parent ffff: protocol ip prio 1 \
> u32 match ip dst 127.0.0.1/32 flowid 1:1
>
> #check the refcount and bindcount
> sudo $TC -s actions ls action skbedit
>
> Current behavior: i believe the action is gone in this last step.
> Your patches may change behavior so that the action action is still
> around. I dont think this is a big deal, but just wanted to be sure
> it is not something more unexpected.
>
> cheers,
> jamal

Hi Jamal,

On current net-next I still have action with single reference after last
step:
~$ sudo $TC -s actions ls action skbedit                       
total acts 1                                                   
                                                               
        action order 0:  skbedit mark 1 pipe                   
         index 1 ref 2 bind 1 installed 47 sec used 47 sec     
        Action statistics:                                     
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0                               
~$ sudo $TC filter del dev lo parent ffff: protocol ip prio 1 \
> u32 match ip dst 127.0.0.1/32 flowid 1:1                     
~$ sudo $TC -s actions ls action skbedit                       
total acts 1                                                   
                                                               
        action order 0:  skbedit mark 1 pipe                   
         index 1 ref 1 bind 0 installed 80 sec used 80 sec     
        Action statistics:                                     
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0                               

Which branch are you testing on?

Regards,
Vlad

^ permalink raw reply

* [PATCH v2 net] mlx4_core: restore optimal ICM memory allocation
From: Eric Dumazet @ 2018-05-31 12:52 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Qing Huang, Daniel Jurgens, Zhu Yanjun

Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
brought two regressions caught in our regression suite.

The big one is an additional cost of 256 bytes of overhead per 4096 bytes,
or 6.25 % which is unacceptable since ICM can be pretty large.

This comes from having to allocate one struct mlx4_icm_chunk (256 bytes)
per MLX4_TABLE_CHUNK, which the buggy commit shrank to 4KB
(instead of prior 256KB)

Note that mlx4_alloc_icm() is already able to try high order allocations
and fallback to low-order allocations under high memory pressure.

Most of these allocations happen right after boot time, when we get
plenty of non fragmented memory, there is really no point being so
pessimistic and break huge pages into order-0 ones just for fun.

We only have to tweak gfp_mask a bit, to help falling back faster,
without risking OOM killings.

Second regression is an KASAN fault, that will need further investigations.

Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Tarick Bedeir <tarick@google.com>
Cc: Qing Huang <qing.huang@oracle.com>
Cc: Daniel Jurgens <danielj@mellanox.com>
Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/icm.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index 685337d58276fc91baeeb64387c52985e1bc6dda..5342bd8a3d0bfaa9e76bb9b6943790606c97b181 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,13 @@
 #include "fw.h"
 
 /*
- * We allocate in page size (default 4KB on many archs) chunks to avoid high
- * order memory allocations in fragmented/high usage memory situation.
+ * We allocate in as big chunks as we can, up to a maximum of 256 KB
+ * per chunk. Note that the chunks are not necessarily in contiguous
+ * physical memory.
  */
 enum {
-	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
-	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
+	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
+	MLX4_TABLE_CHUNK_SIZE	= 1 << 18,
 };
 
 static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
@@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
 	struct mlx4_icm *icm;
 	struct mlx4_icm_chunk *chunk = NULL;
 	int cur_order;
+	gfp_t mask;
 	int ret;
 
 	/* We use sg_set_buf for coherent allocs, which assumes low memory */
@@ -178,13 +180,17 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
 		while (1 << cur_order > npages)
 			--cur_order;
 
+		mask = gfp_mask;
+		if (cur_order)
+			mask &= ~__GFP_DIRECT_RECLAIM;
+
 		if (coherent)
 			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
 						      &chunk->mem[chunk->npages],
-						      cur_order, gfp_mask);
+						      cur_order, mask);
 		else
 			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
-						   cur_order, gfp_mask,
+						   cur_order, mask,
 						   dev->numa_node);
 
 		if (ret) {
-- 
2.17.0.921.gf22659ad46-goog

^ permalink raw reply related

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-05-31 12:58 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
	alexander.h.duyck, kubakici, jasowang, loseweigh, jiri,
	aaron.f.brown, anjali.singhai
In-Reply-To: <274f0b84-07f1-5cd5-e256-ce4b71358c14@intel.com>

On Wed, 30 May 2018 20:03:11 -0700
"Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:

> On 5/30/2018 7:06 PM, Stephen Hemminger wrote:
> > On Thu, 24 May 2018 09:55:14 -0700
> > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >  
> >> Use the registration/notification framework supported by the generic
> >> failover infrastructure.
> >>
> >> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>  
> > Why was this merged? It was never signed off by any of the netvsc maintainers,
> > and there were still issues unresolved.
> >
> > There are also namespaces issues I am fixing and this breaks them.
> > Will start my patch set with a revert for this. Sorry  
> 
> I would appreciate if you can make the fixes on top of this patch series. I tried hard
> to make sure that netvsc functionality and behavior doesn't change.
> 
> It is possible that there could be some bugs introduced, but they can be fixed.
> Looks like Wei already found a bug and submitted a fix for that.
> 

Ok, but several of these may clash with what you want for virtio.
Like:
	- VF should be moved to namespace of virt device
	- VF should be associated based on message from host with serial # not
	  registration notifier and MAC address.
	- control operations should use master device reference rather than
	  searching based on MAC.

As you can see these are structural changes.

^ permalink raw reply

* [PATCH rdma-next v3 00/14] Verbs flow counters support
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev

From: Leon Romanovsky <leonro@mellanox.com>

Changelog:
v2->v3:
 * Change function mlx5_fc_query signature to hide the details of
   internal core driver struct mlx5_fc
 * Add commen to data[] field at struct mlx5_ib_flow_counters_data (mlx5-abi.h)
 * Use array of struct mlx5_ib_flow_counters_desc to clarify the output
v1->v2:
 * Removed conversion from struct mlx5_fc* to void*
 * Fixed one place with double space in it
 * Balanced release of hardware handler in case of counters allocation failure
 * Added Tested-by
 * Minimize time spent holding mutex lock
 * Fixed deadlock caused by nested lock in error path
 * Protect from handler pointer derefence in the error paths

Not changed: mlx5-abi.h

v0->v1:
 * Decouple from DevX submission
 * Use uverbs_attr_get_obj at counters read method
 * Added define for max read buffer size (MAX_COUNTERS_BUFF_SIZE)
 * Removed the struct mlx5_ib_flow_counter basic_flow_cnts and
   the related structs used, used define instead
 * Took Matan's patch from DevX
 * uverbs_free_counters removed void* casting
 * Added check to bound ncounters value (added define
 * Changed user supplied data buffer structure to be array of
   struct <desc,index> pair (applied this change to user space also)

Not changed:
 * UAPI files
 * Addition of uhw to flow

Thanks

----------------------------------------------------------------------
>From Raed:

This series comes to allow user space applications to monitor real time
traffic activity and events of the verbs objects it manages, e.g.:
ibv_qp, ibv_wq, ibv_flow.

This API enables generic counters creation and define mapping
to association with a verbs object, current mlx5 driver using
this API for flow counters.

With this API, an application can monitor the entire life cycle of
object activity, defined here as a static counters attachment.
This API also allows dynamic counters monitoring of measurement points
for a partial period in the verbs object life cycle.

In addition it presents the implementation of the generic counters interface.

This will be achieved by extending flow creation by adding a new flow count
specification type which allows the user to associate a previously created
flow counters using the generic verbs counters interface to the created flow,
once associated the user could read statistics by using the read function of
the generic counters interface.

The API includes:
1. create and destroyed API of a new counters objects
2. read the counters values from HW

Note:
Attaching API to allow application to define the measurement points per objects
is a user space only API and this data is passed to kernel when the counted
object (e.g. flow) is created with the counters object.

Thanks

Matan Barak (2):
  IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure
  IB/core: Support passing uhw for create_flow

Or Gerlitz (1):
  net/mlx5: Use flow counter pointer as input to the query function

Raed Salem (11):
  net/mlx5: Export flow counter related API
  IB/core: Introduce counters object and its create/destroy
  IB/uverbs: Add create/destroy counters support
  IB/core: Introduce counters read verb
  IB/uverbs: Add read counters support
  IB/core: Add support for flow counters
  IB/uverbs: Add support for flow counters
  IB/mlx5: Add counters create and destroy support
  IB/mlx5: Add flow counters binding support
  IB/mlx5: Add flow counters read support
  IB/mlx5: Add counters read support

 drivers/infiniband/core/Makefile                   |   2 +-
 drivers/infiniband/core/uverbs.h                   |   2 +
 drivers/infiniband/core/uverbs_cmd.c               |  88 +++++-
 drivers/infiniband/core/uverbs_std_types.c         |   3 +-
 .../infiniband/core/uverbs_std_types_counters.c    | 157 +++++++++++
 drivers/infiniband/core/uverbs_std_types_cq.c      |  23 +-
 .../infiniband/core/uverbs_std_types_flow_action.c |   4 +-
 drivers/infiniband/core/verbs.c                    |   2 +-
 drivers/infiniband/hw/mlx4/main.c                  |   6 +-
 drivers/infiniband/hw/mlx5/main.c                  | 305 ++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |  36 +++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  15 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   2 -
 .../net/ethernet/mellanox/mlx5/core/fs_counters.c  |   7 +-
 include/linux/mlx5/fs.h                            |   4 +
 include/rdma/ib_verbs.h                            |  43 ++-
 include/rdma/uverbs_ioctl.h                        |  11 +
 include/uapi/rdma/ib_user_ioctl_cmds.h             |  21 ++
 include/uapi/rdma/ib_user_verbs.h                  |  13 +
 include/uapi/rdma/mlx5-abi.h                       |  24 ++
 20 files changed, 712 insertions(+), 56 deletions(-)
 create mode 100644 drivers/infiniband/core/uverbs_std_types_counters.c

^ permalink raw reply

* [PATCH rdma-next v3 01/14] IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Matan Barak <matanb@mellanox.com>

Previously, the user had to dig inside the attribute to get the uobject.
Add a helper function that correctly extract it (and do the required
checks) for him/her.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/uverbs_std_types_cq.c      | 23 +++++++++++-----------
 .../infiniband/core/uverbs_std_types_flow_action.c |  4 ++--
 include/rdma/uverbs_ioctl.h                        | 11 +++++++++++
 3 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index b0dbae9dd0d7..3d293d01afea 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -65,7 +65,6 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(struct ib_device *ib_dev,
 	struct ib_cq_init_attr attr = {};
 	struct ib_cq                   *cq;
 	struct ib_uverbs_completion_event_file    *ev_file = NULL;
-	const struct uverbs_attr *ev_file_attr;
 	struct ib_uobject *ev_file_uobj;
 
 	if (!(ib_dev->uverbs_cmd_mask & 1ULL << IB_USER_VERBS_CMD_CREATE_CQ))
@@ -87,10 +86,8 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(struct ib_device *ib_dev,
 						UVERBS_ATTR_CREATE_CQ_FLAGS)))
 		return -EFAULT;
 
-	ev_file_attr = uverbs_attr_get(attrs, UVERBS_ATTR_CREATE_CQ_COMP_CHANNEL);
-	if (!IS_ERR(ev_file_attr)) {
-		ev_file_uobj = ev_file_attr->obj_attr.uobject;
-
+	ev_file_uobj = uverbs_attr_get_uobject(attrs, UVERBS_ATTR_CREATE_CQ_COMP_CHANNEL);
+	if (!IS_ERR(ev_file_uobj)) {
 		ev_file = container_of(ev_file_uobj,
 				       struct ib_uverbs_completion_event_file,
 				       uobj_file.uobj);
@@ -102,8 +99,8 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(struct ib_device *ib_dev,
 		goto err_event_file;
 	}
 
-	obj = container_of(uverbs_attr_get(attrs,
-					   UVERBS_ATTR_CREATE_CQ_HANDLE)->obj_attr.uobject,
+	obj = container_of(uverbs_attr_get_uobject(attrs,
+						   UVERBS_ATTR_CREATE_CQ_HANDLE),
 			   typeof(*obj), uobject);
 	obj->uverbs_file	   = ucontext->ufile;
 	obj->comp_events_reported  = 0;
@@ -170,13 +167,17 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_DESTROY)(struct ib_device *ib_dev,
 						    struct ib_uverbs_file *file,
 						    struct uverbs_attr_bundle *attrs)
 {
-	struct ib_uverbs_destroy_cq_resp resp;
 	struct ib_uobject *uobj =
-		uverbs_attr_get(attrs, UVERBS_ATTR_DESTROY_CQ_HANDLE)->obj_attr.uobject;
-	struct ib_ucq_object *obj = container_of(uobj, struct ib_ucq_object,
-						 uobject);
+		uverbs_attr_get_uobject(attrs, UVERBS_ATTR_DESTROY_CQ_HANDLE);
+	struct ib_uverbs_destroy_cq_resp resp;
+	struct ib_ucq_object *obj;
 	int ret;
 
+	if (IS_ERR(uobj))
+		return PTR_ERR(uobj);
+
+	obj = container_of(uobj, struct ib_ucq_object, uobject);
+
 	if (!(ib_dev->uverbs_cmd_mask & 1ULL << IB_USER_VERBS_CMD_DESTROY_CQ))
 		return -EOPNOTSUPP;
 
diff --git a/drivers/infiniband/core/uverbs_std_types_flow_action.c b/drivers/infiniband/core/uverbs_std_types_flow_action.c
index b4f016dfa23d..a7be51cf2e42 100644
--- a/drivers/infiniband/core/uverbs_std_types_flow_action.c
+++ b/drivers/infiniband/core/uverbs_std_types_flow_action.c
@@ -320,7 +320,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_FLOW_ACTION_ESP_CREATE)(struct ib_device
 		return ret;
 
 	/* No need to check as this attribute is marked as MANDATORY */
-	uobj = uverbs_attr_get(attrs, UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE)->obj_attr.uobject;
+	uobj = uverbs_attr_get_uobject(attrs, UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE);
 	action = ib_dev->create_flow_action_esp(ib_dev, &esp_attr.hdr, attrs);
 	if (IS_ERR(action))
 		return PTR_ERR(action);
@@ -350,7 +350,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_FLOW_ACTION_ESP_MODIFY)(struct ib_device
 	if (ret)
 		return ret;
 
-	uobj = uverbs_attr_get(attrs, UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE)->obj_attr.uobject;
+	uobj = uverbs_attr_get_uobject(attrs, UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE);
 	action = uobj->object;
 
 	if (action->type != IB_FLOW_ACTION_ESP)
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
index 095383a4bd1a..bd6bba3a6e04 100644
--- a/include/rdma/uverbs_ioctl.h
+++ b/include/rdma/uverbs_ioctl.h
@@ -420,6 +420,17 @@ static inline void *uverbs_attr_get_obj(const struct uverbs_attr_bundle *attrs_b
 	return attr->obj_attr.uobject->object;
 }
 
+static inline struct ib_uobject *uverbs_attr_get_uobject(const struct uverbs_attr_bundle *attrs_bundle,
+							 u16 idx)
+{
+	const struct uverbs_attr *attr = uverbs_attr_get(attrs_bundle, idx);
+
+	if (IS_ERR(attr))
+		return ERR_CAST(attr);
+
+	return attr->obj_attr.uobject;
+}
+
 static inline int uverbs_copy_to(const struct uverbs_attr_bundle *attrs_bundle,
 				 size_t idx, const void *from, size_t size)
 {
-- 
2.14.3

^ permalink raw reply related

* [PATCH mlx5-next v3 02/14] net/mlx5: Use flow counter pointer as input to the query function
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Or Gerlitz <ogerlitz@mellanox.com>

This allows to un-expose the details of struct mlx5_fc and keep
it internal to the core driver as it used to be.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c     | 15 ++++++---------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h     |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c |  4 ++--
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 9a24314b817a..bb9665b7e8e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -2104,21 +2104,18 @@ static int mlx5_eswitch_query_vport_drop_stats(struct mlx5_core_dev *dev,
 	struct mlx5_vport *vport = &esw->vports[vport_idx];
 	u64 rx_discard_vport_down, tx_discard_vport_down;
 	u64 bytes = 0;
-	u16 idx = 0;
 	int err = 0;
 
 	if (!vport->enabled || esw->mode != SRIOV_LEGACY)
 		return 0;
 
-	if (vport->egress.drop_counter) {
-		idx = vport->egress.drop_counter->id;
-		mlx5_fc_query(dev, idx, &stats->rx_dropped, &bytes);
-	}
+	if (vport->egress.drop_counter)
+		mlx5_fc_query(dev, vport->egress.drop_counter,
+			      &stats->rx_dropped, &bytes);
 
-	if (vport->ingress.drop_counter) {
-		idx = vport->ingress.drop_counter->id;
-		mlx5_fc_query(dev, idx, &stats->tx_dropped, &bytes);
-	}
+	if (vport->ingress.drop_counter)
+		mlx5_fc_query(dev, vport->ingress.drop_counter,
+			      &stats->tx_dropped, &bytes);
 
 	if (!MLX5_CAP_GEN(dev, receive_discard_vport_down) &&
 	    !MLX5_CAP_GEN(dev, transmit_discard_vport_down))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index b6da322a8016..d589dbbc72a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -233,7 +233,7 @@ void mlx5_fc_queue_stats_work(struct mlx5_core_dev *dev,
 			      unsigned long delay);
 void mlx5_fc_update_sampling_interval(struct mlx5_core_dev *dev,
 				      unsigned long interval);
-int mlx5_fc_query(struct mlx5_core_dev *dev, u16 id,
+int mlx5_fc_query(struct mlx5_core_dev *dev, struct mlx5_fc *counter,
 		  u64 *packets, u64 *bytes);
 
 int mlx5_init_fs(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index b7ab929d5f8e..f5cb3fa5d8cf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -312,10 +312,10 @@ void mlx5_cleanup_fc_stats(struct mlx5_core_dev *dev)
 	}
 }
 
-int mlx5_fc_query(struct mlx5_core_dev *dev, u16 id,
+int mlx5_fc_query(struct mlx5_core_dev *dev, struct mlx5_fc *counter,
 		  u64 *packets, u64 *bytes)
 {
-	return mlx5_cmd_fc_query(dev, id, packets, bytes);
+	return mlx5_cmd_fc_query(dev, counter->id, packets, bytes);
 }
 
 void mlx5_fc_query_cached(struct mlx5_fc *counter,
-- 
2.14.3

^ permalink raw reply related

* [PATCH rdma-next v3 04/14] IB/core: Introduce counters object and its create/destroy
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Raed Salem <raeds@mellanox.com>

A verbs application may need to get statistics and info on various
aspects of a verb object (e.g. Flow, QP, ...), in general case the
application will state which object's counters its interested in
(we refer to this action as attach), bind this new counters object
to the appropriate verb object and on later stage read their values
using the counters object.

This series introduces a general API for counters object that may
accumulate any ib object counters type, bound and read on demand.

Counters instance is allocated on an IB context and belongs to
that context.
Upon successful creation the counters can be bound to a verbs
object so that hardware counter instances can be created and read.

Downstream patches in this series will introduce the attach, bind
and the read functionality.

Counters instance can be de-allocated, upon successful
destruction the related hardware resources are released.

Prior to destroy call the user must first make sure that the counters
is not being used by any IB object, e.g. not attached to any of its
counted type otherwise an EBUSY error is invoked.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/rdma/ib_verbs.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 406c98d7a09a..90c548437c8a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2212,6 +2212,13 @@ struct ib_port_pkey_list {
 	struct list_head              pkey_list;
 };
 
+struct ib_counters {
+	struct ib_device	*device;
+	struct ib_uobject	*uobject;
+	/* num of objects attached */
+	atomic_t	usecnt;
+};
+
 struct uverbs_attr_bundle;
 
 struct ib_device {
@@ -2483,6 +2490,10 @@ struct ib_device {
 	struct ib_mr *             (*reg_dm_mr)(struct ib_pd *pd, struct ib_dm *dm,
 						struct ib_dm_mr_attr *attr,
 						struct uverbs_attr_bundle *attrs);
+	struct ib_counters *	(*create_counters)(struct ib_device *device,
+						   struct uverbs_attr_bundle *attrs);
+	int	(*destroy_counters)(struct ib_counters	*counters);
+
 	/**
 	 * rdma netdev operation
 	 *
-- 
2.14.3

^ permalink raw reply related

* [PATCH rdma-next v3 05/14] IB/uverbs: Add create/destroy counters support
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Raed Salem <raeds@mellanox.com>

User space application which uses counters functionality,
is expected to allocate/release the counters resources by
calling create/destroy verbs and in turn get a unique handle
that can be used to attach the counters to its counted type.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/Makefile                   |   2 +-
 drivers/infiniband/core/uverbs.h                   |   1 +
 drivers/infiniband/core/uverbs_std_types.c         |   3 +-
 .../infiniband/core/uverbs_std_types_counters.c    | 100 +++++++++++++++++++++
 include/uapi/rdma/ib_user_ioctl_cmds.h             |  14 +++
 5 files changed, 118 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/uverbs_std_types_counters.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 8d42373a2d8a..61667705d746 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -37,4 +37,4 @@ ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
 				rdma_core.o uverbs_std_types.o uverbs_ioctl.o \
 				uverbs_ioctl_merge.o uverbs_std_types_cq.o \
 				uverbs_std_types_flow_action.o uverbs_std_types_dm.o \
-				uverbs_std_types_mr.o
+				uverbs_std_types_mr.o uverbs_std_types_counters.o
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index cfb51618ab7a..5b2461fa634d 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -287,6 +287,7 @@ extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_RWQ_IND_TBL);
 extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_XRCD);
 extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_FLOW_ACTION);
 extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_DM);
+extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_COUNTERS);
 
 #define IB_UVERBS_DECLARE_CMD(name)					\
 	ssize_t ib_uverbs_##name(struct ib_uverbs_file *file,		\
diff --git a/drivers/infiniband/core/uverbs_std_types.c b/drivers/infiniband/core/uverbs_std_types.c
index 569f48bd821e..b570acbd94af 100644
--- a/drivers/infiniband/core/uverbs_std_types.c
+++ b/drivers/infiniband/core/uverbs_std_types.c
@@ -302,7 +302,8 @@ static DECLARE_UVERBS_OBJECT_TREE(uverbs_default_objects,
 				  &UVERBS_OBJECT(UVERBS_OBJECT_RWQ_IND_TBL),
 				  &UVERBS_OBJECT(UVERBS_OBJECT_XRCD),
 				  &UVERBS_OBJECT(UVERBS_OBJECT_FLOW_ACTION),
-				  &UVERBS_OBJECT(UVERBS_OBJECT_DM));
+				  &UVERBS_OBJECT(UVERBS_OBJECT_DM),
+				  &UVERBS_OBJECT(UVERBS_OBJECT_COUNTERS));
 
 const struct uverbs_object_tree_def *uverbs_default_get_objects(void)
 {
diff --git a/drivers/infiniband/core/uverbs_std_types_counters.c b/drivers/infiniband/core/uverbs_std_types_counters.c
new file mode 100644
index 000000000000..a5bc50ceee13
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_std_types_counters.c
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/*
+ * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "uverbs.h"
+#include <rdma/uverbs_std_types.h>
+
+static int uverbs_free_counters(struct ib_uobject *uobject,
+				enum rdma_remove_reason why)
+{
+	struct ib_counters *counters = uobject->object;
+
+	if (why == RDMA_REMOVE_DESTROY &&
+	    atomic_read(&counters->usecnt))
+		return -EBUSY;
+
+	return counters->device->destroy_counters(counters);
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_COUNTERS_CREATE)(struct ib_device *ib_dev,
+							 struct ib_uverbs_file *file,
+							 struct uverbs_attr_bundle *attrs)
+{
+	struct ib_counters *counters;
+	struct ib_uobject *uobj;
+	int ret;
+
+	/*
+	 * This check should be removed once the infrastructure
+	 * have the ability to remove methods from parse tree once
+	 * such condition is met.
+	 */
+	if (!ib_dev->create_counters)
+		return -EOPNOTSUPP;
+
+	uobj = uverbs_attr_get_uobject(attrs, UVERBS_ATTR_CREATE_COUNTERS_HANDLE);
+	counters = ib_dev->create_counters(ib_dev, attrs);
+	if (IS_ERR(counters)) {
+		ret = PTR_ERR(counters);
+		goto err_create_counters;
+	}
+
+	counters->device = ib_dev;
+	counters->uobject = uobj;
+	uobj->object = counters;
+	atomic_set(&counters->usecnt, 0);
+
+	return 0;
+
+err_create_counters:
+	return ret;
+}
+
+static DECLARE_UVERBS_NAMED_METHOD(UVERBS_METHOD_COUNTERS_CREATE,
+	&UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
+			 UVERBS_OBJECT_COUNTERS,
+			 UVERBS_ACCESS_NEW,
+			 UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)));
+
+static DECLARE_UVERBS_NAMED_METHOD_WITH_HANDLER(UVERBS_METHOD_COUNTERS_DESTROY,
+	uverbs_destroy_def_handler,
+	&UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_COUNTERS_HANDLE,
+			 UVERBS_OBJECT_COUNTERS,
+			 UVERBS_ACCESS_DESTROY,
+			 UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)));
+
+DECLARE_UVERBS_NAMED_OBJECT(UVERBS_OBJECT_COUNTERS,
+			    &UVERBS_TYPE_ALLOC_IDR(0, uverbs_free_counters),
+			    &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_CREATE),
+			    &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_DESTROY));
+
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 83e3890eef20..c28ce62d2e40 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -55,6 +55,7 @@ enum uverbs_default_objects {
 	UVERBS_OBJECT_WQ,
 	UVERBS_OBJECT_FLOW_ACTION,
 	UVERBS_OBJECT_DM,
+	UVERBS_OBJECT_COUNTERS,
 };
 
 enum {
@@ -131,4 +132,17 @@ enum uverbs_methods_mr {
 	UVERBS_METHOD_DM_MR_REG,
 };
 
+enum uverbs_attrs_create_counters_cmd_attr_ids {
+	UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
+};
+
+enum uverbs_attrs_destroy_counters_cmd_attr_ids {
+	UVERBS_ATTR_DESTROY_COUNTERS_HANDLE,
+};
+
+enum uverbs_methods_actions_counters_ops {
+	UVERBS_METHOD_COUNTERS_CREATE,
+	UVERBS_METHOD_COUNTERS_DESTROY,
+};
+
 #endif
-- 
2.14.3

^ permalink raw reply related

* [PATCH rdma-next v3 06/14] IB/core: Introduce counters read verb
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Raed Salem <raeds@mellanox.com>

The user supplies counters instance and a reference to an output
array of uint64_t.
The driver reads the hardware counters values and writes them to
the output index location in the user supplied array.
All counters values are represented as uint64_t types.

To be able to successfully read the data the counters must be
first bound to an IB object.

Downstream patches will present binding method for
flow counters.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/rdma/ib_verbs.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 90c548437c8a..ba49e874c841 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2219,6 +2219,17 @@ struct ib_counters {
 	atomic_t	usecnt;
 };
 
+enum ib_read_counters_flags {
+	/* prefer read values from driver cache */
+	IB_READ_COUNTERS_ATTR_PREFER_CACHED = 1 << 0,
+};
+
+struct ib_counters_read_attr {
+	u64	*counters_buff;
+	u32	ncounters;
+	u32	flags; /* use enum ib_read_counters_flags */
+};
+
 struct uverbs_attr_bundle;
 
 struct ib_device {
@@ -2493,6 +2504,9 @@ struct ib_device {
 	struct ib_counters *	(*create_counters)(struct ib_device *device,
 						   struct uverbs_attr_bundle *attrs);
 	int	(*destroy_counters)(struct ib_counters	*counters);
+	int	(*read_counters)(struct ib_counters *counters,
+				 struct ib_counters_read_attr *counters_read_attr,
+				 struct uverbs_attr_bundle *attrs);
 
 	/**
 	 * rdma netdev operation
-- 
2.14.3

^ permalink raw reply related

* [PATCH rdma-next v3 07/14] IB/uverbs: Add read counters support
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Raed Salem <raeds@mellanox.com>

This patch exposes the read counters verb to user space
applications.
By that verb the user can read the hardware counters which
are associated with the counters object.

The application needs to provide a sufficient memory to
hold the statistics.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 .../infiniband/core/uverbs_std_types_counters.c    | 59 +++++++++++++++++++++-
 include/uapi/rdma/ib_user_ioctl_cmds.h             |  7 +++
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_counters.c b/drivers/infiniband/core/uverbs_std_types_counters.c
index a5bc50ceee13..b35fcd3718c8 100644
--- a/drivers/infiniband/core/uverbs_std_types_counters.c
+++ b/drivers/infiniband/core/uverbs_std_types_counters.c
@@ -80,6 +80,49 @@ static int UVERBS_HANDLER(UVERBS_METHOD_COUNTERS_CREATE)(struct ib_device *ib_de
 	return ret;
 }
 
+static int UVERBS_HANDLER(UVERBS_METHOD_COUNTERS_READ)(struct ib_device *ib_dev,
+						       struct ib_uverbs_file *file,
+						       struct uverbs_attr_bundle *attrs)
+{
+	struct ib_counters_read_attr read_attr = {};
+	const struct uverbs_attr *uattr;
+	struct ib_counters *counters =
+		uverbs_attr_get_obj(attrs, UVERBS_ATTR_READ_COUNTERS_HANDLE);
+	int ret;
+
+	if (!ib_dev->read_counters)
+		return -EOPNOTSUPP;
+
+	if (!atomic_read(&counters->usecnt))
+		return -EINVAL;
+
+	ret = uverbs_copy_from(&read_attr.flags, attrs,
+			       UVERBS_ATTR_READ_COUNTERS_FLAGS);
+	if (ret)
+		return ret;
+
+	uattr = uverbs_attr_get(attrs, UVERBS_ATTR_READ_COUNTERS_BUFF);
+	read_attr.ncounters = uattr->ptr_attr.len / sizeof(u64);
+	read_attr.counters_buff = kcalloc(read_attr.ncounters,
+					  sizeof(u64), GFP_KERNEL);
+	if (!read_attr.counters_buff)
+		return -ENOMEM;
+
+	ret = ib_dev->read_counters(counters,
+				    &read_attr,
+				    attrs);
+	if (ret)
+		goto err_read;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_READ_COUNTERS_BUFF,
+			     read_attr.counters_buff,
+			     read_attr.ncounters * sizeof(u64));
+
+err_read:
+	kfree(read_attr.counters_buff);
+	return ret;
+}
+
 static DECLARE_UVERBS_NAMED_METHOD(UVERBS_METHOD_COUNTERS_CREATE,
 	&UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
 			 UVERBS_OBJECT_COUNTERS,
@@ -93,8 +136,22 @@ static DECLARE_UVERBS_NAMED_METHOD_WITH_HANDLER(UVERBS_METHOD_COUNTERS_DESTROY,
 			 UVERBS_ACCESS_DESTROY,
 			 UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)));
 
+#define MAX_COUNTERS_BUFF_SIZE USHRT_MAX
+static DECLARE_UVERBS_NAMED_METHOD(UVERBS_METHOD_COUNTERS_READ,
+	&UVERBS_ATTR_IDR(UVERBS_ATTR_READ_COUNTERS_HANDLE,
+			 UVERBS_OBJECT_COUNTERS,
+			 UVERBS_ACCESS_READ,
+			 UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
+	&UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_READ_COUNTERS_BUFF,
+			     UVERBS_ATTR_SIZE(0, MAX_COUNTERS_BUFF_SIZE),
+			     UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
+	&UVERBS_ATTR_PTR_IN(UVERBS_ATTR_READ_COUNTERS_FLAGS,
+			    UVERBS_ATTR_TYPE(__u32),
+			    UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)));
+
 DECLARE_UVERBS_NAMED_OBJECT(UVERBS_OBJECT_COUNTERS,
 			    &UVERBS_TYPE_ALLOC_IDR(0, uverbs_free_counters),
 			    &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_CREATE),
-			    &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_DESTROY));
+			    &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_DESTROY),
+			    &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_READ));
 
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index c28ce62d2e40..888ac5975a6c 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -140,9 +140,16 @@ enum uverbs_attrs_destroy_counters_cmd_attr_ids {
 	UVERBS_ATTR_DESTROY_COUNTERS_HANDLE,
 };
 
+enum uverbs_attrs_read_counters_cmd_attr_ids {
+	UVERBS_ATTR_READ_COUNTERS_HANDLE,
+	UVERBS_ATTR_READ_COUNTERS_BUFF,
+	UVERBS_ATTR_READ_COUNTERS_FLAGS,
+};
+
 enum uverbs_methods_actions_counters_ops {
 	UVERBS_METHOD_COUNTERS_CREATE,
 	UVERBS_METHOD_COUNTERS_DESTROY,
+	UVERBS_METHOD_COUNTERS_READ,
 };
 
 #endif
-- 
2.14.3

^ permalink raw reply related

* [PATCH rdma-next v3 08/14] IB/core: Support passing uhw for create_flow
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Matan Barak <matanb@mellanox.com>

This is required when user-space drivers need to pass extra information
regarding how to handle this flow steering specification.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/uverbs_cmd.c | 7 ++++++-
 drivers/infiniband/core/verbs.c      | 2 +-
 drivers/infiniband/hw/mlx4/main.c    | 6 +++++-
 drivers/infiniband/hw/mlx5/main.c    | 7 ++++++-
 include/rdma/ib_verbs.h              | 3 ++-
 5 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index e74262ee104c..ddb9d79691be 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3542,11 +3542,16 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 		err = -EINVAL;
 		goto err_free;
 	}
-	flow_id = ib_create_flow(qp, flow_attr, IB_FLOW_DOMAIN_USER);
+
+	flow_id = qp->device->create_flow(qp, flow_attr,
+					  IB_FLOW_DOMAIN_USER, uhw);
+
 	if (IS_ERR(flow_id)) {
 		err = PTR_ERR(flow_id);
 		goto err_free;
 	}
+	atomic_inc(&qp->usecnt);
+	flow_id->qp = qp;
 	flow_id->uobject = uobj;
 	uobj->object = flow_id;
 	uflow = container_of(uobj, typeof(*uflow), uobject);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 6ddfb1fade79..0b56828c1319 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1983,7 +1983,7 @@ struct ib_flow *ib_create_flow(struct ib_qp *qp,
 	if (!qp->device->create_flow)
 		return ERR_PTR(-EOPNOTSUPP);
 
-	flow_id = qp->device->create_flow(qp, flow_attr, domain);
+	flow_id = qp->device->create_flow(qp, flow_attr, domain, NULL);
 	if (!IS_ERR(flow_id)) {
 		atomic_inc(&qp->usecnt);
 		flow_id->qp = qp;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bf12394c13c1..6fe5d5d1d1d9 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1848,7 +1848,7 @@ static int mlx4_ib_add_dont_trap_rule(struct mlx4_dev *dev,
 
 static struct ib_flow *mlx4_ib_create_flow(struct ib_qp *qp,
 				    struct ib_flow_attr *flow_attr,
-				    int domain)
+				    int domain, struct ib_udata *udata)
 {
 	int err = 0, i = 0, j = 0;
 	struct mlx4_ib_flow *mflow;
@@ -1866,6 +1866,10 @@ static struct ib_flow *mlx4_ib_create_flow(struct ib_qp *qp,
 	    (flow_attr->type != IB_FLOW_ATTR_NORMAL))
 		return ERR_PTR(-EOPNOTSUPP);
 
+	if (udata &&
+	    udata->inlen && !ib_is_udata_cleared(udata, 0, udata->inlen))
+		return ERR_PTR(-EOPNOTSUPP);
+
 	memset(type, 0, sizeof(type));
 
 	mflow = kzalloc(sizeof(*mflow), GFP_KERNEL);
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 92879d2d3026..fb31a719ee25 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3371,7 +3371,8 @@ static struct mlx5_ib_flow_handler *create_sniffer_rule(struct mlx5_ib_dev *dev,
 
 static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
 					   struct ib_flow_attr *flow_attr,
-					   int domain)
+					   int domain,
+					   struct ib_udata *udata)
 {
 	struct mlx5_ib_dev *dev = to_mdev(qp->device);
 	struct mlx5_ib_qp *mqp = to_mqp(qp);
@@ -3383,6 +3384,10 @@ static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
 	int err;
 	int underlay_qpn;
 
+	if (udata &&
+	    udata->inlen && !ib_is_udata_cleared(udata, 0, udata->inlen))
+		return ERR_PTR(-EOPNOTSUPP);
+
 	if (flow_attr->priority > MLX5_IB_FLOW_LAST_PRIO)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ba49e874c841..84f412f7b8f3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2459,7 +2459,8 @@ struct ib_device {
 	struct ib_flow *	   (*create_flow)(struct ib_qp *qp,
 						  struct ib_flow_attr
 						  *flow_attr,
-						  int domain);
+						  int domain,
+						  struct ib_udata *udata);
 	int			   (*destroy_flow)(struct ib_flow *flow_id);
 	int			   (*check_mr_status)(struct ib_mr *mr, u32 check_mask,
 						      struct ib_mr_status *mr_status);
-- 
2.14.3

^ permalink raw reply related

* [PATCH rdma-next v3 09/14] IB/core: Add support for flow counters
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Raed Salem <raeds@mellanox.com>

A counters object could be attached to flow on creation
by providing the counter specification action.

General counters description which count packets and bytes are
introduced, downstream patches from this series will use them
as part of flow counters binding.

In addition, increase number of flow specifications supported
layers to 10 upon adding count specification and for the
previously added drop specification.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/rdma/ib_verbs.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 84f412f7b8f3..2cc04abb6df8 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1859,9 +1859,10 @@ enum ib_flow_spec_type {
 	IB_FLOW_SPEC_ACTION_TAG         = 0x1000,
 	IB_FLOW_SPEC_ACTION_DROP        = 0x1001,
 	IB_FLOW_SPEC_ACTION_HANDLE	= 0x1002,
+	IB_FLOW_SPEC_ACTION_COUNT       = 0x1003,
 };
 #define IB_FLOW_SPEC_LAYER_MASK	0xF0
-#define IB_FLOW_SPEC_SUPPORT_LAYERS 8
+#define IB_FLOW_SPEC_SUPPORT_LAYERS 10
 
 /* Flow steering rule priority is set according to it's domain.
  * Lower domain value means higher priority.
@@ -2041,6 +2042,17 @@ struct ib_flow_spec_action_handle {
 	struct ib_flow_action	     *act;
 };
 
+enum ib_counters_description {
+	IB_COUNTER_PACKETS,
+	IB_COUNTER_BYTES,
+};
+
+struct ib_flow_spec_action_count {
+	enum ib_flow_spec_type type;
+	u16 size;
+	struct ib_counters *counters;
+};
+
 union ib_flow_spec {
 	struct {
 		u32			type;
@@ -2058,6 +2070,7 @@ union ib_flow_spec {
 	struct ib_flow_spec_action_tag  flow_tag;
 	struct ib_flow_spec_action_drop drop;
 	struct ib_flow_spec_action_handle action;
+	struct ib_flow_spec_action_count flow_count;
 };
 
 struct ib_flow_attr {
-- 
2.14.3

^ permalink raw reply related

* [PATCH rdma-next v3 10/14] IB/uverbs: Add support for flow counters
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Raed Salem <raeds@mellanox.com>

The struct ib_uverbs_flow_spec_action_count associates
a counters object with the flow.

Post this association the flow counters can be read via
the counters object.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/uverbs.h     |  1 +
 drivers/infiniband/core/uverbs_cmd.c | 81 +++++++++++++++++++++++++++++++-----
 include/uapi/rdma/ib_user_verbs.h    | 13 ++++++
 3 files changed, 84 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 5b2461fa634d..c0d40fc3a53a 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -263,6 +263,7 @@ struct ib_uverbs_flow_spec {
 		struct ib_uverbs_flow_spec_action_tag	flow_tag;
 		struct ib_uverbs_flow_spec_action_drop	drop;
 		struct ib_uverbs_flow_spec_action_handle action;
+		struct ib_uverbs_flow_spec_action_count flow_count;
 	};
 };
 
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index ddb9d79691be..3179a95c6f5e 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2748,43 +2748,82 @@ ssize_t ib_uverbs_detach_mcast(struct ib_uverbs_file *file,
 struct ib_uflow_resources {
 	size_t			max;
 	size_t			num;
-	struct ib_flow_action	*collection[0];
+	size_t			collection_num;
+	size_t			counters_num;
+	struct ib_counters	**counters;
+	struct ib_flow_action	**collection;
 };
 
 static struct ib_uflow_resources *flow_resources_alloc(size_t num_specs)
 {
 	struct ib_uflow_resources *resources;
 
-	resources =
-		kmalloc(sizeof(*resources) +
-			num_specs * sizeof(*resources->collection), GFP_KERNEL);
+	resources = kzalloc(sizeof(*resources), GFP_KERNEL);
 
 	if (!resources)
-		return NULL;
+		goto err_res;
+
+	resources->counters =
+		kcalloc(num_specs, sizeof(*resources->counters), GFP_KERNEL);
+
+	if (!resources->counters)
+		goto err_cnt;
+
+	resources->collection =
+		kcalloc(num_specs, sizeof(*resources->collection), GFP_KERNEL);
+
+	if (!resources->collection)
+		goto err_collection;
 
-	resources->num = 0;
 	resources->max = num_specs;
 
 	return resources;
+
+err_collection:
+	kfree(resources->counters);
+err_cnt:
+	kfree(resources);
+err_res:
+	return NULL;
 }
 
 void ib_uverbs_flow_resources_free(struct ib_uflow_resources *uflow_res)
 {
 	unsigned int i;
 
-	for (i = 0; i < uflow_res->num; i++)
+	for (i = 0; i < uflow_res->collection_num; i++)
 		atomic_dec(&uflow_res->collection[i]->usecnt);
 
+	for (i = 0; i < uflow_res->counters_num; i++)
+		atomic_dec(&uflow_res->counters[i]->usecnt);
+
+	kfree(uflow_res->collection);
+	kfree(uflow_res->counters);
 	kfree(uflow_res);
 }
 
 static void flow_resources_add(struct ib_uflow_resources *uflow_res,
-			       struct ib_flow_action *action)
+			       enum ib_flow_spec_type type,
+			       void *ibobj)
 {
 	WARN_ON(uflow_res->num >= uflow_res->max);
 
-	atomic_inc(&action->usecnt);
-	uflow_res->collection[uflow_res->num++] = action;
+	switch (type) {
+	case IB_FLOW_SPEC_ACTION_HANDLE:
+		atomic_inc(&((struct ib_flow_action *)ibobj)->usecnt);
+		uflow_res->collection[uflow_res->collection_num++] =
+			(struct ib_flow_action *)ibobj;
+		break;
+	case IB_FLOW_SPEC_ACTION_COUNT:
+		atomic_inc(&((struct ib_counters *)ibobj)->usecnt);
+		uflow_res->counters[uflow_res->counters_num++] =
+			(struct ib_counters *)ibobj;
+		break;
+	default:
+		WARN_ON(1);
+	}
+
+	uflow_res->num++;
 }
 
 static int kern_spec_to_ib_spec_action(struct ib_ucontext *ucontext,
@@ -2821,9 +2860,29 @@ static int kern_spec_to_ib_spec_action(struct ib_ucontext *ucontext,
 			return -EINVAL;
 		ib_spec->action.size =
 			sizeof(struct ib_flow_spec_action_handle);
-		flow_resources_add(uflow_res, ib_spec->action.act);
+		flow_resources_add(uflow_res,
+				   IB_FLOW_SPEC_ACTION_HANDLE,
+				   ib_spec->action.act);
 		uobj_put_obj_read(ib_spec->action.act);
 		break;
+	case IB_FLOW_SPEC_ACTION_COUNT:
+		if (kern_spec->flow_count.size !=
+			sizeof(struct ib_uverbs_flow_spec_action_count))
+			return -EINVAL;
+		ib_spec->flow_count.counters =
+			uobj_get_obj_read(counters,
+					  UVERBS_OBJECT_COUNTERS,
+					  kern_spec->flow_count.handle,
+					  ucontext);
+		if (!ib_spec->flow_count.counters)
+			return -EINVAL;
+		ib_spec->flow_count.size =
+				sizeof(struct ib_flow_spec_action_count);
+		flow_resources_add(uflow_res,
+				   IB_FLOW_SPEC_ACTION_COUNT,
+				   ib_spec->flow_count.counters);
+		uobj_put_obj_read(ib_spec->flow_count.counters);
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 409507f83b91..4f9991de8e3a 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -998,6 +998,19 @@ struct ib_uverbs_flow_spec_action_handle {
 	__u32			      reserved1;
 };
 
+struct ib_uverbs_flow_spec_action_count {
+	union {
+		struct ib_uverbs_flow_spec_hdr hdr;
+		struct {
+			__u32 type;
+			__u16 size;
+			__u16 reserved;
+		};
+	};
+	__u32			      handle;
+	__u32			      reserved1;
+};
+
 struct ib_uverbs_flow_tunnel_filter {
 	__be32 tunnel_id;
 };
-- 
2.14.3

^ permalink raw reply related

* [PATCH rdma-next v3 11/14] IB/mlx5: Add counters create and destroy support
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Raed Salem <raeds@mellanox.com>

This patch implements the device counters create and destroy APIs
and introducing some internal management structures.

Downstream patches in this series will add the functionality to
support flow counters binding and reading.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c    | 23 +++++++++++++++++++++++
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 10 ++++++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index fb31a719ee25..4d0f53566854 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5132,6 +5132,27 @@ static void depopulate_specs_root(struct mlx5_ib_dev *dev)
 	uverbs_free_spec_tree(dev->ib_dev.specs_root);
 }
 
+static int mlx5_ib_destroy_counters(struct ib_counters *counters)
+{
+	struct mlx5_ib_mcounters *mcounters = to_mcounters(counters);
+
+	kfree(mcounters);
+
+	return 0;
+}
+
+static struct ib_counters *mlx5_ib_create_counters(struct ib_device *device,
+						   struct uverbs_attr_bundle *attrs)
+{
+	struct mlx5_ib_mcounters *mcounters;
+
+	mcounters = kzalloc(sizeof(*mcounters), GFP_KERNEL);
+	if (!mcounters)
+		return ERR_PTR(-ENOMEM);
+
+	return &mcounters->ibcntrs;
+}
+
 void mlx5_ib_stage_init_cleanup(struct mlx5_ib_dev *dev)
 {
 	mlx5_ib_cleanup_multiport_master(dev);
@@ -5375,6 +5396,8 @@ int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
 	dev->ib_dev.destroy_flow_action = mlx5_ib_destroy_flow_action;
 	dev->ib_dev.modify_flow_action_esp = mlx5_ib_modify_flow_action_esp;
 	dev->ib_dev.driver_id = RDMA_DRIVER_MLX5;
+	dev->ib_dev.create_counters = mlx5_ib_create_counters;
+	dev->ib_dev.destroy_counters = mlx5_ib_destroy_counters;
 
 	err = init_node_data(dev);
 	if (err)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 49a1aa0ff429..fd27ec1aed08 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -813,6 +813,16 @@ struct mlx5_memic {
 	DECLARE_BITMAP(memic_alloc_pages, MLX5_MAX_MEMIC_PAGES);
 };
 
+struct mlx5_ib_mcounters {
+	struct ib_counters ibcntrs;
+};
+
+static inline struct mlx5_ib_mcounters *
+to_mcounters(struct ib_counters *ibcntrs)
+{
+	return container_of(ibcntrs, struct mlx5_ib_mcounters, ibcntrs);
+}
+
 struct mlx5_ib_dev {
 	struct ib_device		ib_dev;
 	struct mlx5_core_dev		*mdev;
-- 
2.14.3

^ permalink raw reply related

* [PATCH mlx5-next v3 03/14] net/mlx5: Export flow counter related API
From: Leon Romanovsky @ 2018-05-31 13:43 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
	Michael J . Ruhl, Or Gerlitz, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20180531134341.18441-1-leon@kernel.org>

From: Raed Salem <raeds@mellanox.com>

Exports counters API to be used in both IB and EN.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h     | 2 --
 drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c | 3 +++
 include/linux/mlx5/fs.h                               | 3 +++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index d589dbbc72a7..32070e5d993d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -233,8 +233,6 @@ void mlx5_fc_queue_stats_work(struct mlx5_core_dev *dev,
 			      unsigned long delay);
 void mlx5_fc_update_sampling_interval(struct mlx5_core_dev *dev,
 				      unsigned long interval);
-int mlx5_fc_query(struct mlx5_core_dev *dev, struct mlx5_fc *counter,
-		  u64 *packets, u64 *bytes);
 
 int mlx5_init_fs(struct mlx5_core_dev *dev);
 void mlx5_cleanup_fs(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index f5cb3fa5d8cf..58af6be13dfa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -243,6 +243,7 @@ struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging)
 
 	return ERR_PTR(err);
 }
+EXPORT_SYMBOL(mlx5_fc_create);
 
 void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter)
 {
@@ -260,6 +261,7 @@ void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter)
 	mlx5_cmd_fc_free(dev, counter->id);
 	kfree(counter);
 }
+EXPORT_SYMBOL(mlx5_fc_destroy);
 
 int mlx5_init_fc_stats(struct mlx5_core_dev *dev)
 {
@@ -317,6 +319,7 @@ int mlx5_fc_query(struct mlx5_core_dev *dev, struct mlx5_fc *counter,
 {
 	return mlx5_cmd_fc_query(dev, counter->id, packets, bytes);
 }
+EXPORT_SYMBOL(mlx5_fc_query);
 
 void mlx5_fc_query_cached(struct mlx5_fc *counter,
 			  u64 *bytes, u64 *packets, u64 *lastuse)
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 9f4d32e41c06..3b4c3298061c 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -186,6 +186,9 @@ struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging);
 void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter);
 void mlx5_fc_query_cached(struct mlx5_fc *counter,
 			  u64 *bytes, u64 *packets, u64 *lastuse);
+int mlx5_fc_query(struct mlx5_core_dev *dev, struct mlx5_fc *counter,
+		  u64 *packets, u64 *bytes);
+
 int mlx5_fs_add_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
 int mlx5_fs_remove_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
 
-- 
2.14.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox