Netdev List


* Re: [PATCH] net/sched: Make NET_ACT_CT depends on NF_NAT
From: David Miller @ 2019-07-17 19:02 UTC (permalink / raw)
  To: yuehaibing; +Cc: jhs, xiyou.wangcong, jiri, linux-kernel, netdev
In-Reply-To: <20190716071602.27276-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Tue, 16 Jul 2019 15:16:02 +0800

> If NF_NAT is m and NET_ACT_CT is y, build fails:
> 
> net/sched/act_ct.o: In function `tcf_ct_act':
> act_ct.c:(.text+0x21ac): undefined reference to `nf_ct_nat_ext_add'
> act_ct.c:(.text+0x229a): undefined reference to `nf_nat_icmp_reply_translation'
> act_ct.c:(.text+0x233a): undefined reference to `nf_nat_setup_info'
> act_ct.c:(.text+0x234a): undefined reference to `nf_nat_alloc_null_binding'
> act_ct.c:(.text+0x237c): undefined reference to `nf_nat_packet'
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.


* Re: [PATCH] qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/
From: David Miller @ 2019-07-17 19:02 UTC (permalink / raw)
  To: bpoirier; +Cc: gregkh, manishc, GR-Linux-NIC-Dev, netdev
In-Reply-To: <20190716023459.23266-1-bpoirier@suse.com>

From: Benjamin Poirier <bpoirier@suse.com>
Date: Tue, 16 Jul 2019 11:34:59 +0900

> The hardware was declared EOL by the vendor more than 5 years ago.
> What's more relevant to the Linux kernel is that the quality of this driver
> is not on par with many other mainline drivers.
> 
> Cc: Manish Chopra <manishc@marvell.com>
> Message-id: <20190617074858.32467-1-bpoirier@suse.com>
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

Please resubmit this when the net-next tree opens back up.


* Re: [PATCH] net: sctp: fix warning "NULL check before some freeing functions is not needed"
From: David Miller @ 2019-07-17 19:01 UTC (permalink / raw)
  To: hariprasad.kelam
  Cc: vyasevich, nhorman, marcelo.leitner, linux-sctp, netdev,
	linux-kernel
In-Reply-To: <20190716022002.GA19592@hari-Inspiron-1545>

From: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Date: Tue, 16 Jul 2019 07:50:02 +0530

> This patch removes NULL checks before calling kfree.
> 
> It fixes the issues below, reported by coccicheck:
> net/sctp/sm_make_chunk.c:2586:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2652:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2667:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2684:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> 
> Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>

Applied.


* Re: [PATCH] bnx2x: Prevent load reordering in tx completion processing
From: David Miller @ 2019-07-17 19:00 UTC (permalink / raw)
  To: brking; +Cc: GR-everest-linux-l2, skalluru, aelior, netdev
In-Reply-To: <1563226910-21660-1-git-send-email-brking@linux.vnet.ibm.com>

From: Brian King <brking@linux.vnet.ibm.com>
Date: Mon, 15 Jul 2019 16:41:50 -0500

> This patch fixes an issue seen on Power systems with bnx2x, where the
> "skb is NULL" WARN_ON in bnx2x_free_tx_pkt fires because the skb pointer
> is loaded in bnx2x_free_tx_pkt before the hw_cons load in bnx2x_tx_int.
> Adding a read memory barrier resolves the issue.
> 
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>

Marvell folks, please review.
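
The ordering problem described in the quoted commit message can be
illustrated with a small userspace sketch. This is only an analogy under
stated assumptions, not the bnx2x code: the ring layout and the names
tx_ring, ring_publish and ring_complete_one are hypothetical, and a C11
acquire fence stands in for the kernel's read memory barrier.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical ring: the producer publishes hw_cons after filling a slot. */
struct tx_ring {
	void *skb[RING_SIZE];          /* per-slot buffer pointers */
	_Atomic unsigned int hw_cons;  /* completion index seen by the consumer */
	unsigned int sw_cons;          /* next slot the consumer will reap */
};

/* Producer side: store the pointer first, then publish the new index. */
static void ring_publish(struct tx_ring *r, unsigned int slot, void *skb)
{
	r->skb[slot % RING_SIZE] = skb;
	atomic_store_explicit(&r->hw_cons, slot + 1, memory_order_release);
}

/* Consumer side: returns the completed pointer, or NULL if nothing is done. */
static void *ring_complete_one(struct tx_ring *r)
{
	unsigned int hw = atomic_load_explicit(&r->hw_cons,
					       memory_order_relaxed);
	void *skb;

	if (r->sw_cons == hw)
		return NULL;

	/*
	 * Stand-in for the read barrier: the slot load below must not be
	 * satisfied before the hw_cons load above.
	 */
	atomic_thread_fence(memory_order_acquire);

	skb = r->skb[r->sw_cons % RING_SIZE];
	r->skb[r->sw_cons % RING_SIZE] = NULL;
	r->sw_cons++;
	return skb;
}
```

Without the fence, a weakly ordered CPU may read the slot before the index
and observe a stale NULL, which is the symptom the commit message reports.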


* Re: [PATCH net] caif-hsi: fix possible deadlock in cfhsi_exit_module()
From: David Miller @ 2019-07-17 18:59 UTC (permalink / raw)
  To: ap420073; +Cc: netdev
In-Reply-To: <20190715051017.7514-1-ap420073@gmail.com>

From: Taehee Yoo <ap420073@gmail.com>
Date: Mon, 15 Jul 2019 14:10:17 +0900

> cfhsi_exit_module() calls unregister_netdev() under rtnl_lock(),
> but unregister_netdev() internally calls rtnl_lock(),
> so a deadlock would occur.
> 
> Fixes: c41254006377 ("caif-hsi: Add rtnl support")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>

Applied and queued up for -stable, thank you.
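
The self-deadlock described in the quoted commit message can be sketched
with a toy non-recursive lock. Everything below is hypothetical and for
illustration only: the toy_* names are not kernel APIs, a failed trylock
stands in for "would block forever", and the fixed variant shows one common
way out (calling a helper that expects the lock to be held already), which
is not necessarily what the applied patch does.

```c
#include <stdbool.h>

/* Toy non-recursive lock; a second acquire by the holder cannot succeed. */
struct toy_lock {
	bool held;
};

static struct toy_lock toy_rtnl;

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

/* Models a helper that must be called with the lock already held. */
static void toy_unregister_locked(void)
{
	/* ... teardown work would go here ... */
}

/* Models a helper that takes the lock itself. */
static bool toy_unregister(void)
{
	if (!toy_trylock(&toy_rtnl))
		return false;	/* a blocking lock would hang here */
	toy_unregister_locked();
	toy_unlock(&toy_rtnl);
	return true;
}

/* Buggy shape: the caller holds the lock and calls the self-locking helper. */
static bool exit_module_buggy(void)
{
	bool ok;

	(void)toy_trylock(&toy_rtnl);
	ok = toy_unregister();
	toy_unlock(&toy_rtnl);
	return ok;
}

/* One possible fix: keep the lock and call the "locked" helper instead. */
static bool exit_module_fixed(void)
{
	if (!toy_trylock(&toy_rtnl))
		return false;
	toy_unregister_locked();
	toy_unlock(&toy_rtnl);
	return true;
}
```

exit_module_buggy() reports failure where real code would hang, while
exit_module_fixed() completes and leaves the lock released.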


* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-17 18:58 UTC (permalink / raw)
  To: Jose Abreu, linux-kernel, netdev, linux-stm32, linux-arm-kernel
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <1b254bb7fc6044c5e6e2fdd9e00088d1d13a808b.1562149883.git.joabreu@synopsys.com>


On 03/07/2019 11:37, Jose Abreu wrote:
> Mapping and unmapping DMA regions is a major bottleneck in the stmmac
> driver, especially in the RX path.
>
> This commit introduces support for the Page Pool API and uses it in all RX
> queues. With this change, we get more stable throughput and some increase
> in bandwidth with iperf:
> 	- MAC1000: 950 Mbps
> 	- XGMAC: 9.22 Gbps

I am seeing a boot regression on one of our Tegra boards with both
mainline and -next. Bisecting is pointing to this commit and reverting
this commit on top of mainline fixes the problem. Unfortunately, there
is not much of a backtrace but what I have captured is below. 

Please note that this is seen on a system that is using NFS to mount
the rootfs and the crash occurs right around the point the rootfs is
mounted.

Let me know if you have any thoughts.

Cheers
Jon 

[   12.221843] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   12.229485] CPU: 5 PID: 1 Comm: init Tainted: G S                5.2.0-11500-g916f562fb28a #18
[   12.238076] Hardware name: NVIDIA Tegra186 P2771-0000 Development Board (DT)
[   12.245105] Call trace:
[   12.247548]  dump_backtrace+0x0/0x150
[   12.251199]  show_stack+0x14/0x20
[   12.254505]  dump_stack+0x9c/0xc4
[   12.257809]  panic+0x13c/0x32c
[   12.260853]  complete_and_exit+0x0/0x20
[   12.264676]  do_group_exit+0x34/0x98
[   12.268241]  get_signal+0x104/0x668
[   12.271718]  do_notify_resume+0x2ac/0x380
[   12.275716]  work_pending+0x8/0x10
[   12.279109] SMP: stopping secondary CPUs
[   12.283025] Kernel Offset: disabled
[   12.286502] CPU features: 0x0002,20806000
[   12.290499] Memory Limit: none
[   12.293548] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

-- 
nvpublic


* Re: [PULL] virtio, vhost: fixes, features, performance
From: pr-tracker-bot @ 2019-07-17 18:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linus Torvalds, kvm, virtualization, netdev, linux-kernel,
	aarcange, bharat.bhushan, bhelgaas, linux-arm-kernel, linux-mm,
	linux-parisc, davem, eric.auger, gustavo, hch, ihor.matushchak,
	James.Bottomley, jasowang, jean-philippe.brucker, jglisse, mst,
	natechancellor
In-Reply-To: <20190716113151-mutt-send-email-mst@kernel.org>

The pull request you sent on Tue, 16 Jul 2019 11:31:51 -0400:

> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3a1d5384b7decbff6519daa9c65a35665e227323

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


* Re: [PATCH] net: dsa: sja1105: Add missing spin_unlock
From: Vladimir Oltean @ 2019-07-17 18:00 UTC (permalink / raw)
  To: YueHaibing
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
	lkml, netdev
In-Reply-To: <20190717141200.46604-1-yuehaibing@huawei.com>

On Wed, 17 Jul 2019 at 17:12, YueHaibing <yuehaibing@huawei.com> wrote:
>
> It should call spin_unlock() before returning NULL.
> Detected by Coccinelle.
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---

Hi Yue,

Thanks for the patch. Wei Yongjun submitted an identical one a few
hours before yours: https://patchwork.ozlabs.org/patch/1133135/
Let's go with that version this time.

>  net/dsa/tag_sja1105.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
> index 1d96c9d..26363d7 100644
> --- a/net/dsa/tag_sja1105.c
> +++ b/net/dsa/tag_sja1105.c
> @@ -216,6 +216,7 @@ static struct sk_buff
>                 if (!skb) {
>                         dev_err_ratelimited(dp->ds->dev,
>                                             "Failed to copy stampable skb\n");
> +                       spin_unlock(&sp->data->meta_lock);
>                         return NULL;
>                 }
>                 sja1105_transfer_meta(skb, meta);
> --
> 2.7.4
>
>

Regards,
-Vladimir
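
The class of bug fixed by the quoted patch (an early return that skips the
unlock) can also be avoided structurally with a single exit label. The
sketch below is a hypothetical userspace illustration of that alternative
shape, not the tag_sja1105.c code: the toy_* names are invented, and a depth
counter stands in for a real spinlock so that balance can be checked.

```c
#include <stddef.h>

/* Toy lock: depth goes up on lock and down on unlock. */
struct toy_spinlock {
	int depth;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
	l->depth++;
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	l->depth--;
}

/*
 * Returns buf on success and NULL on failure. Both paths leave through
 * the "out" label, so the lock is released exactly once either way.
 */
static const char *toy_rcv(struct toy_spinlock *lock, const char *buf)
{
	const char *ret = NULL;

	toy_spin_lock(lock);

	if (!buf)
		goto out;	/* error path still reaches the unlock */

	ret = buf;
out:
	toy_spin_unlock(lock);
	return ret;
}
```

The quoted patch instead adds the unlock directly before the early return;
both forms keep lock and unlock balanced.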


* Re: [PATCH] net: dsa: sja1105: Fix missing unlock on error in sk_buff()
From: Vladimir Oltean @ 2019-07-17 17:58 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev,
	kernel-janitors
In-Reply-To: <20190717062956.127446-1-weiyongjun1@huawei.com>

On Wed, 17 Jul 2019 at 09:24, Wei Yongjun <weiyongjun1@huawei.com> wrote:
>
> Add the missing unlock before returning from function sk_buff()
> in the error handling case.
>
> Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>

>  net/dsa/tag_sja1105.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
> index 1d96c9d4a8e9..26363d72d25b 100644
> --- a/net/dsa/tag_sja1105.c
> +++ b/net/dsa/tag_sja1105.c
> @@ -216,6 +216,7 @@ static struct sk_buff
>                 if (!skb) {
>                         dev_err_ratelimited(dp->ds->dev,
>                                             "Failed to copy stampable skb\n");
> +                       spin_unlock(&sp->data->meta_lock);
>                         return NULL;
>                 }
>                 sja1105_transfer_meta(skb, meta);
>
>
>


* Re: [PATCH net-next 00/12] mlx5 TLS TX HW offload support
From: Jakub Kicinski @ 2019-07-17 17:41 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Miller, netdev@vger.kernel.org, Eran Ben Elisha,
	Saeed Mahameed, Moshe Shemesh
In-Reply-To: <d5d5324e-b62a-ed90-603f-b30c7eea67ea@mellanox.com>

On Sun, 7 Jul 2019 06:44:27 +0000, Tariq Toukan wrote:
> On 7/6/2019 2:29 AM, David Miller wrote:
> > From: Tariq Toukan <tariqt@mellanox.com>
> > Date: Fri,  5 Jul 2019 18:30:10 +0300
> >   
> >> This series from Eran and me, adds TLS TX HW offload support to
> >> the mlx5 driver.  
> > 
> > Series applied, please deal with any further feedback you get from
> > Jakub et al.
> 
> I will follow up with patches addressing Jakub's feedback.

Ping.


* Re: [RFC PATCH 0/5] PTP: add support for Intel's TGPIO controller
From: Richard Cochran @ 2019-07-17 17:39 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <87ef2p2lvc.fsf@linux.intel.com>

On Wed, Jul 17, 2019 at 09:52:55AM +0300, Felipe Balbi wrote:
> 
> It's just a pin, like a GPIO. So it would be a PCB trace, flat flex,
> copper wire... Anything, really.

Cool.  Are there any Intel CPUs available that have this feature?

Thanks,
Richard


* Re: [RFC PATCH 4/5] PTP: Add flag for non-periodic output
From: Richard Cochran @ 2019-07-17 17:36 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <87k1ch2m1i.fsf@linux.intel.com>

On Wed, Jul 17, 2019 at 09:49:13AM +0300, Felipe Balbi wrote:
> No worries, I'll work on this after vacations (I'll be off for 2 weeks
> starting next week). I thought about adding a new IOCTL until I saw that
> rsv field. Oh well :-)

It would be great if you could fix up the PTP ioctls as a preface to
your series.

Thanks,
Richard


* [PATCH net-next v2 2/2] tc-testing: updated skbedit action tests with batch create/delete
From: Roman Mashak @ 2019-07-17 17:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1563384992-9430-1-git-send-email-mrv@mojatatu.com>

Update TDC tests with cases verifying the ability of TC to install or
delete batches of skbedit actions.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 .../tc-testing/tc-tests/actions/skbedit.json       | 47 ++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json b/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
index bf5ebf59c2d4..9cdd2e31ac2c 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
@@ -670,5 +670,52 @@
         "teardown": [
             "$TC actions flush action skbedit"
         ]
+    },
+    {
+        "id": "630c",
+        "name": "Add batch of 32 skbedit actions with all parameters and cookie",
+        "category": [
+            "actions",
+            "skbedit"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action skbedit",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd ptype host inheritdsfield index \\$i cookie aabbccddeeff112233445566778800a1 \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action skbedit",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "32",
+        "teardown": [
+            "$TC actions flush action skbedit"
+        ]
+    },
+    {
+        "id": "706d",
+        "name": "Delete batch of 32 skbedit actions with all parameters",
+        "category": [
+            "actions",
+            "skbedit"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action skbedit",
+                0,
+                1,
+                255
+            ],
+            "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd ptype host inheritdsfield index \\$i \\\"; args=\\\"\\$args\\$cmd\\\"; done && $TC actions add \\$args\""
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action skbedit index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action skbedit",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "0",
+        "teardown": []
     }
 ]
-- 
2.7.4



* [PATCH net-next v2 1/2] net sched: update skbedit action for batched events operations
From: Roman Mashak @ 2019-07-17 17:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1563384992-9430-1-git-send-email-mrv@mojatatu.com>

Add get_fill_size() routine used to calculate the action size
when building a batch of events.

Fixes: ca9b0e27e ("pkt_action: add new action skbedit")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 net/sched/act_skbedit.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 215a06705cef..dc3c653ec45e 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -306,6 +306,17 @@ static int tcf_skbedit_search(struct net *net, struct tc_action **a, u32 index)
 	return tcf_idr_search(tn, a, index);
 }
 
+static size_t tcf_skbedit_get_fill_size(const struct tc_action *act)
+{
+	return nla_total_size(sizeof(struct tc_skbedit))
+		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_PRIORITY */
+		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_QUEUE_MAPPING */
+		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MARK */
+		+ nla_total_size(sizeof(u16)) /* TCA_SKBEDIT_PTYPE */
+		+ nla_total_size(sizeof(u32)) /* TCA_SKBEDIT_MASK */
+		+ nla_total_size_64bit(sizeof(u64)); /* TCA_SKBEDIT_FLAGS */
+}
+
 static struct tc_action_ops act_skbedit_ops = {
 	.kind		=	"skbedit",
 	.id		=	TCA_ID_SKBEDIT,
@@ -315,6 +326,7 @@ static struct tc_action_ops act_skbedit_ops = {
 	.init		=	tcf_skbedit_init,
 	.cleanup	=	tcf_skbedit_cleanup,
 	.walk		=	tcf_skbedit_walker,
+	.get_fill_size	=	tcf_skbedit_get_fill_size,
 	.lookup		=	tcf_skbedit_search,
 	.size		=	sizeof(struct tcf_skbedit),
 };
-- 
2.7.4
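
The arithmetic behind tcf_skbedit_get_fill_size() in the patch above can be
sketched in userspace. This is a rough illustration under stated
assumptions, not the kernel helpers themselves: the toy_* names are
hypothetical, attributes are assumed to carry a 4-byte header and to be
padded to 4 bytes, and the optional extra padding attribute for 64-bit
values is modelled by the pad64 flag. The size of struct tc_skbedit is left
as a parameter rather than guessed.

```c
#include <stddef.h>

#define TOY_NLA_ALIGNTO ((size_t)4)
#define TOY_NLA_ALIGN(len) \
	(((len) + TOY_NLA_ALIGNTO - 1) & ~(TOY_NLA_ALIGNTO - 1))
#define TOY_NLA_HDRLEN ((size_t)4)	/* assumed: u16 length + u16 type */

/* Header plus payload, rounded up to the attribute alignment. */
static size_t toy_nla_total_size(size_t payload)
{
	return TOY_NLA_ALIGN(TOY_NLA_HDRLEN + payload);
}

/* 64-bit attributes may need one extra empty attribute as padding. */
static size_t toy_nla_total_size_64bit(size_t payload, int pad64)
{
	return toy_nla_total_size(payload) +
	       (pad64 ? toy_nla_total_size(0) : 0);
}

/* Mirrors the sum in the patch; parms_size stands for struct tc_skbedit. */
static size_t toy_skbedit_fill_size(size_t parms_size, int pad64)
{
	return toy_nla_total_size(parms_size)
		+ toy_nla_total_size(4)	/* priority, u32 */
		+ toy_nla_total_size(2)	/* queue mapping, u16 */
		+ toy_nla_total_size(4)	/* mark, u32 */
		+ toy_nla_total_size(2)	/* ptype, u16 */
		+ toy_nla_total_size(4)	/* mask, u32 */
		+ toy_nla_total_size_64bit(8, pad64);	/* flags, u64 */
}
```

Note that a u16 attribute costs as much as a u32 one once the header and
padding are counted, so per-action overhead adds up quickly across a batch
of 32 actions.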



* [PATCH net-next v2 0/2] Fix batched event generation for skbedit action
From: Roman Mashak @ 2019-07-17 17:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak

When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However, it does not consider that the action sizes may vary and
require different skb sizes.

For example, consider the following script adding 32 entries with all
supported skbedit parameters (in order to maximize netlink messages size):

% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"

$TC actions flush action skbedit
for i in `seq 1 $1`;
do
   cmd="action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd ptype host inheritdsfield index $i "
   args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%

Patch 1 adds a callback to the tc_action_ops of the skbedit action that
calculates the action size and passes it to tcf_add_notify()/tcf_del_notify().

Patch 2 updates the TDC test suite with relevant skbedit test cases.

v2:
   Added Fixes: tag
   Added cover letter with details on the patchset

Roman Mashak (2):
  net sched: update skbedit action for batched events operations
  tc-testing: updated skbedit action tests with batch create/delete

 net/sched/act_skbedit.c                            | 12 ++++++
 .../tc-testing/tc-tests/actions/skbedit.json       | 47 ++++++++++++++++++++++
 2 files changed, 59 insertions(+)

-- 
2.7.4


* Re: [PATCH net-next v1] fix: taprio: Change type of txtime-delay parameter to u32
From: Patel, Vedang @ 2019-07-17 17:32 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Kirsher, Jeffrey T, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, intel-wired-lan@lists.osuosl.org,
	Gomes, Vinicius, l@dorileo.org, Jakub Kicinski, Murali Karicheri,
	Sergei Shtylyov, Eric Dumazet, Brown, Aaron F, Stephen Hemminger
In-Reply-To: <20190716.141904.308520366333461345.davem@davemloft.net>



> On Jul 16, 2019, at 2:19 PM, David Miller <davem@davemloft.net> wrote:
> 
> From: Vedang Patel <vedang.patel@intel.com>
> Date: Tue, 16 Jul 2019 12:52:18 -0700
> 
>> During the review of the iproute2 patches for txtime-assist mode, it was
>> pointed out that it does not make sense for the txtime-delay parameter to
>> be negative. So, change the type of the parameter from s32 to u32.
>> 
>> Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
>> Reported-by: Stephen Hemminger <stephen@networkplumber.org>
>> Signed-off-by: Vedang Patel <vedang.patel@intel.com>
> 
> You should have targeted this at 'net' as that's the only tree open
> right now.
> 
> I'll apply this.

Sorry about that.

I will keep this in mind next time.

Thanks,
Vedang


* Re: [PATCH bpf] bpf: fix narrower loads on s390
From: Y Song @ 2019-07-17 16:25 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, netdev, gor, heiko.carstens
In-Reply-To: <4311B5C3-8D1B-4958-9CDE-450662A7851D@linux.ibm.com>

On Wed, Jul 17, 2019 at 3:36 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 17.07.2019 um 11:21 schrieb Ilya Leoshkevich <iii@linux.ibm.com>:
> >
> >> Am 17.07.2019 um 07:11 schrieb Y Song <ys114321@gmail.com>:
> >>
> >> [sorry, resending as the previous one had some text messed up due to
> >> networking issues]
> >>
> >> On Tue, Jul 16, 2019 at 10:08 PM Y Song <ys114321@gmail.com> wrote:
> >>>
> >>> On Tue, Jul 16, 2019 at 4:59 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>>>
> >>>> test_pkt_md_access is failing on s390, since the associated eBPF prog
> >>>> returns TC_ACT_SHOT, which in turn happens because loading a part of a
> >>>> struct __sk_buff field produces an incorrect result.
> >>>>
> >>>> The problem is that when verifier emits the code to replace partial load
> >>>> of a field with a full load, a shift and a bitwise AND, it assumes that
> >>>> the machine is little endian.
> >>>>
> >>>> Adjust shift count calculation to account for endianness.
> >>>>
> >>>> Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
> >>>> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> >>>> ---
> >>>> kernel/bpf/verifier.c | 8 ++++++--
> >>>> 1 file changed, 6 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>>> index 5900cbb966b1..3f9353653558 100644
> >>>> --- a/kernel/bpf/verifier.c
> >>>> +++ b/kernel/bpf/verifier.c
> >>>> @@ -8616,8 +8616,12 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> >>>>               }
> >>>>
> >>>>               if (is_narrower_load && size < target_size) {
> >>>> -                       u8 shift = (off & (size_default - 1)) * 8;
> >>>> -
> >>>> +                       u8 load_off = off & (size_default - 1);
> >>>> +#ifdef __LITTLE_ENDIAN
> >>>> +                       u8 shift = load_off * 8;
> >>>> +#else
> >>>> +                       u8 shift = (size_default - (load_off + size)) * 8;
> >>>> +#endif
> >>>
> >> All the values are in register. The shifting operations should be the
> >> same for big endian and little endian, e.g., value 64 >> 2 = 16 when
> >> value "64" is in register. So I did not see a problem here.
> >>
> >> Could you elaborate which field access in test_pkt_md_access
> >> caused problem?
> >
> > The very first one: TEST_FIELD(__u8,  len, 0xFF);
> >
> >> It would be good if you can give detailed memory layout and register values
> >> to illustrate the problem.
> >
> > Suppose len = 0x11223344. On a big endian system, this would be
> >
> > 11 22 33 44
> >
> > Now, we would like to do *(u8 *)&len, the desired result is 0x11.
> > Verifier should emit the following: ((*(u32 *)&len) >> 24) & 0xff, but as
> > of today it misses the shift.
> >
> > On a little endian system the layout is:
> >
> > 44 33 22 11
> >
> > and the desired result is different - 0x44. Verifier correctly emits
> > (*(u32 *)&len) & 0xff.
>
> I’ve just realized, that this example does not reflect what the test is
> doing on big-endian systems (there is an #ifdef for those).
>
> Here is a better one: len=0x11223344 and we would like to do
> ((u8 *)&len)[3].
>
> len is represented as `11 22 33 44` in memory, so the desired result is
> 0x44. It can be obtained by doing (*(u32 *)&len) & 0xff, but today the
> verifier does ((*(u32 *)&len) >> 24) & 0xff instead.

What you described above for the memory layout all makes sense.
The root cause is that, for big endian, we should do *((u8 *)&len + 3).
This is exactly what the macros in test_pkt_md_access.c try to do.

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define TEST_FIELD(TYPE, FIELD, MASK)                                   \
        {                                                               \
                TYPE tmp = *(volatile TYPE *)&skb->FIELD;               \
                if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))   \
                        return TC_ACT_SHOT;                             \
        }
#else
#define TEST_FIELD_OFFSET(a, b) ((sizeof(a) - sizeof(b)) / sizeof(b))
#define TEST_FIELD(TYPE, FIELD, MASK)                                   \
        {                                                               \
                TYPE tmp = *((volatile TYPE *)&skb->FIELD +             \
                              TEST_FIELD_OFFSET(skb->FIELD, TYPE));     \
                if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))   \
                        return TC_ACT_SHOT;                             \
        }
#endif

Could you check whether your __BYTE_ORDER__ is set
correctly for this case? You may need to tweak the Makefile
if you are cross-compiling; I am not sure how, as I
do not have such an environment.
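
The shift calculation from the patch quoted in this thread can be tried out
in isolation. The sketch below is a hypothetical userspace model, not the
verifier code: narrow_load_shift and narrow_load are invented names,
endianness is passed as a flag instead of being taken from the build, and
the field is modelled as an already loaded 32-bit register value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Shift needed to bring the bytes at [off, off + size) of a
 * size_default-byte field down to bit 0 of the fully loaded value.
 */
static uint8_t narrow_load_shift(uint32_t off, uint32_t size,
				 uint32_t size_default, bool little_endian)
{
	uint32_t load_off = off & (size_default - 1);

	if (little_endian)
		return (uint8_t)(load_off * 8);
	return (uint8_t)((size_default - (load_off + size)) * 8);
}

/* Emulates "full load, shift, mask" for a 32-bit field. */
static uint32_t narrow_load(uint32_t full, uint32_t off, uint32_t size,
			    bool little_endian)
{
	uint8_t shift = narrow_load_shift(off, size, sizeof(full),
					  little_endian);
	uint32_t mask = size >= 4 ? 0xffffffffu : (1u << (size * 8)) - 1;

	return (full >> shift) & mask;
}
```

With len = 0x11223344, narrow_load(len, 3, 1, false) yields 0x44, matching
the big-endian example given above, while the little-endian rule at the same
offset yields 0x11.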


* Re: [PATCH v4 13/15] docs: ABI: testing: make the files compatible with ReST output
From: Jonathan Cameron @ 2019-07-17 16:13 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: gregkh, Rafael J. Wysocki, Len Brown, Jonathan Cameron,
	Hartmut Knaack, Lars-Peter Clausen, Peter Meerwald-Stadler,
	Peter Rosin, Benson Leung, Enric Balletbo i Serra, Guenter Roeck,
	Maxime Coquelin, Alexandre Torgue, Fabrice Gasnier,
	Frederic Barrat, Andrew Donnellan, Sebastian Reichel,
	Heikki Krogerus, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Mike Kravetz, Nicolas Ferre,
	Alexandre Belloni, Ludovic Desroches, Richard Cochran,
	Jonathan Corbet, linux-acpi, linux-iio, linux-stm32,
	linux-arm-kernel, linuxppc-dev, linux-pm, linux-usb, xen-devel,
	linux-mm, netdev, linux-doc
In-Reply-To: <88d15fa38167e3f2e73e65e1c1a1f39bca0267b4.1563365880.git.mchehab+samsung@kernel.org>

On Wed, 17 Jul 2019 09:28:17 -0300
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:

> Some files over there won't be parsed well by Sphinx.
> 
> Fix them.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Hi Mauro,

Does feel like this one should perhaps have been broken up a touch!

For the IIO ones I've eyeballed it rather than testing the results

Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>




* Re: KASAN: use-after-free Read in nr_release
From: syzbot @ 2019-07-17 16:11 UTC (permalink / raw)
  To: davem, hdanton, linux-hams, linux-kernel, netdev, ralf,
	syzkaller-bugs
In-Reply-To: <0000000000007e8b70058acbd60f@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    192f0f8e Merge tag 'powerpc-5.3-1' of git://git.kernel.org..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=171bde00600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=87305c3ca9c25c70
dashboard link: https://syzkaller.appspot.com/bug?extid=6eaef7158b19e3fec3a0
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15882cd0600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+6eaef7158b19e3fec3a0@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in atomic_read /./include/asm-generic/atomic-instrumented.h:26 [inline]
BUG: KASAN: use-after-free in refcount_inc_not_zero_checked+0x81/0x200 /lib/refcount.c:123
Read of size 4 at addr ffff88807be6b6c0 by task syz-executor.0/11548

CPU: 0 PID: 11548 Comm: syz-executor.0 Not tainted 5.2.0+ #66
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack /lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 /lib/dump_stack.c:113
  print_address_description.cold+0xd4/0x306 /mm/kasan/report.c:351
  __kasan_report.cold+0x1b/0x36 /mm/kasan/report.c:482
  kasan_report+0x12/0x20 /mm/kasan/common.c:612
  check_memory_region_inline /mm/kasan/generic.c:185 [inline]
  check_memory_region+0x134/0x1a0 /mm/kasan/generic.c:192
  __kasan_check_read+0x11/0x20 /mm/kasan/common.c:92
  atomic_read /./include/asm-generic/atomic-instrumented.h:26 [inline]
  refcount_inc_not_zero_checked+0x81/0x200 /lib/refcount.c:123
  refcount_inc_checked+0x17/0x70 /lib/refcount.c:156
  sock_hold /./include/net/sock.h:649 [inline]
  nr_release+0x62/0x3e0 /net/netrom/af_netrom.c:520
  __sock_release+0xce/0x280 /net/socket.c:586
  sock_close+0x1e/0x30 /net/socket.c:1264
  __fput+0x2ff/0x890 /fs/file_table.c:280
  ____fput+0x16/0x20 /fs/file_table.c:313
  task_work_run+0x145/0x1c0 /kernel/task_work.c:113
  tracehook_notify_resume /./include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 /arch/x86/entry/common.c:163
  prepare_exit_to_usermode /arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath /arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 /arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007ffe5eb40550 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000413501
RDX: 0000001b2be20000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 00007ffe5eb40630 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 0000000000760a68 R15: ffffffffffffffff

Allocated by task 0:
  save_stack+0x23/0x90 /mm/kasan/common.c:69
  set_track /mm/kasan/common.c:77 [inline]
  __kasan_kmalloc /mm/kasan/common.c:487 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 /mm/kasan/common.c:460
  kasan_kmalloc+0x9/0x10 /mm/kasan/common.c:501
  __do_kmalloc /mm/slab.c:3655 [inline]
  __kmalloc+0x163/0x780 /mm/slab.c:3664
  kmalloc /./include/linux/slab.h:557 [inline]
  sk_prot_alloc+0x23a/0x310 /net/core/sock.c:1603
  sk_alloc+0x39/0xf70 /net/core/sock.c:1657
  nr_make_new /net/netrom/af_netrom.c:476 [inline]
  nr_rx_frame+0x733/0x1e80 /net/netrom/af_netrom.c:959
  nr_loopback_timer+0x7b/0x170 /net/netrom/nr_loopback.c:59
  call_timer_fn+0x1ac/0x780 /kernel/time/timer.c:1322
  expire_timers /kernel/time/timer.c:1366 [inline]
  __run_timers /kernel/time/timer.c:1685 [inline]
  __run_timers /kernel/time/timer.c:1653 [inline]
  run_timer_softirq+0x697/0x17a0 /kernel/time/timer.c:1698
  __do_softirq+0x262/0x98c /kernel/softirq.c:292

Freed by task 11551:
  save_stack+0x23/0x90 /mm/kasan/common.c:69
  set_track /mm/kasan/common.c:77 [inline]
  __kasan_slab_free+0x102/0x150 /mm/kasan/common.c:449
  kasan_slab_free+0xe/0x10 /mm/kasan/common.c:457
  __cache_free /mm/slab.c:3425 [inline]
  kfree+0x10a/0x2c0 /mm/slab.c:3756
  sk_prot_free /net/core/sock.c:1640 [inline]
  __sk_destruct+0x4f7/0x6e0 /net/core/sock.c:1726
  sk_destruct+0x86/0xa0 /net/core/sock.c:1734
  __sk_free+0xfb/0x360 /net/core/sock.c:1745
  sk_free+0x42/0x50 /net/core/sock.c:1756
  sock_put /./include/net/sock.h:1725 [inline]
  sock_efree+0x61/0x80 /net/core/sock.c:2042
  skb_release_head_state+0xeb/0x260 /net/core/skbuff.c:652
  skb_release_all+0x16/0x60 /net/core/skbuff.c:663
  __kfree_skb /net/core/skbuff.c:679 [inline]
  kfree_skb /net/core/skbuff.c:697 [inline]
  kfree_skb+0x101/0x3c0 /net/core/skbuff.c:691
  nr_accept+0x570/0x720 /net/netrom/af_netrom.c:819
  __sys_accept4+0x34e/0x6a0 /net/socket.c:1750
  __do_sys_accept4 /net/socket.c:1785 [inline]
  __se_sys_accept4 /net/socket.c:1782 [inline]
  __x64_sys_accept4+0x97/0xf0 /net/socket.c:1782
  do_syscall_64+0xfd/0x6a0 /arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88807be6b640
  which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 128 bytes inside of
  2048-byte region [ffff88807be6b640, ffff88807be6be40)
The buggy address belongs to the page:
page:ffffea0001ef9a80 refcount:1 mapcount:0 mapping:ffff8880aa400e00 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea0001ef9708 ffffea0002453708 ffff8880aa400e00
raw: 0000000000000000 ffff88807be6a540 0000000100000003 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88807be6b580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88807be6b600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
> ffff88807be6b680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                            ^
  ffff88807be6b700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88807be6b780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
------------[ cut here ]------------
ODEBUG: activate not available (active state 0) object type: timer_list  
hint: nr_t1timer_expiry+0x0/0x340 /net/netrom/nr_timer.c:157
WARNING: CPU: 0 PID: 11548 at lib/debugobjects.c:481  
debug_print_object+0x168/0x250 /lib/debugobjects.c:481
Modules linked in:
CPU: 0 PID: 11548 Comm: syz-executor.0 Tainted: G    B             5.2.0+  
#66
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:debug_print_object+0x168/0x250 /lib/debugobjects.c:481
Code: dd a0 48 c5 87 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 b5 00 00 00 48  
8b 14 dd a0 48 c5 87 48 c7 c7 00 3e c5 87 e8 f0 b1 07 fe <0f> 0b 83 05 13  
86 66 06 01 48 83 c4 20 5b 41 5c 41 5d 41 5e 5d c3
RSP: 0018:ffff88809151faf0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815c1016 RDI: ffffed10122a3f50
RBP: ffff88809151fb30 R08: ffff8880943fe300 R09: ffffed1015d040f1
R10: ffffed1015d040f0 R11: ffff8880ae820787 R12: 0000000000000001
R13: ffffffff88db4ca0 R14: ffffffff8161a860 R15: 1ffff110122a3f6c
FS:  0000555555737940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fada90cddb8 CR3: 00000000a7f80000 CR4: 00000000001406f0
Call Trace:
  debug_object_activate+0x2e5/0x470 /lib/debugobjects.c:680
  debug_timer_activate /kernel/time/timer.c:710 [inline]
  __mod_timer /kernel/time/timer.c:1035 [inline]
  mod_timer+0x452/0xc10 /kernel/time/timer.c:1096
  sk_reset_timer+0x24/0x60 /net/core/sock.c:2821
  nr_start_t1timer+0x6e/0xa0 /net/netrom/nr_timer.c:52
  nr_release+0x1de/0x3e0 /net/netrom/af_netrom.c:537
  __sock_release+0xce/0x280 /net/socket.c:586
  sock_close+0x1e/0x30 /net/socket.c:1264
  __fput+0x2ff/0x890 /fs/file_table.c:280
  ____fput+0x16/0x20 /fs/file_table.c:313
  task_work_run+0x145/0x1c0 /kernel/task_work.c:113
  tracehook_notify_resume /./include/linux/tracehook.h:185 [inline]
  exit_to_usermode_loop+0x316/0x380 /arch/x86/entry/common.c:163
  prepare_exit_to_usermode /arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath /arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 /arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413501
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48  
83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007ffe5eb40550 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000413501
RDX: 0000001b2be20000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
R10: 00007ffe5eb40630 R11: 0000000000000293 R12: 000000000075c9a0
R13: 000000000075c9a0 R14: 0000000000760a68 R15: ffffffffffffffff
irq event stamp: 1316
hardirqs last  enabled at (1315): [<ffffffff873119e8>]  
__raw_spin_unlock_irq /./include/linux/spinlock_api_smp.h:168 [inline]
hardirqs last  enabled at (1315): [<ffffffff873119e8>]  
_raw_spin_unlock_irq+0x28/0x90 /kernel/locking/spinlock.c:199
hardirqs last disabled at (1316): [<ffffffff8731216f>]  
__raw_spin_lock_irqsave /./include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (1316): [<ffffffff8731216f>]  
_raw_spin_lock_irqsave+0x6f/0xcd /kernel/locking/spinlock.c:159
softirqs last  enabled at (1168): [<ffffffff812923fe>] memcpy  
/./include/linux/string.h:359 [inline]
softirqs last  enabled at (1168): [<ffffffff812923fe>]  
fpu__copy+0x17e/0x8c0 /arch/x86/kernel/fpu/core.c:195
softirqs last disabled at (1166): [<ffffffff81292327>] fpu__copy+0xa7/0x8c0  
/arch/x86/kernel/fpu/core.c:183
---[ end trace c9359faa0df5eab0 ]---


^ permalink raw reply

* Re: IPv6 L2TP issues related to 93531c67
From: Paul Donohue @ 2019-07-17 15:37 UTC (permalink / raw)
  To: David Ahern; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev
In-Reply-To: <22e3eabc-526d-8265-ac39-a6aefc9ef7db@gmail.com>

On Wed, Jul 17, 2019 at 05:11:21AM -0600, David Ahern wrote:
> This fixes the test script (whitespace damaged but simple enough to
> manually patch). See if it fixes the problem with your more complex
> setup. If so I will send a formal patch.

Yes! I applied this on top of f632a8170a6b667ee4e3f552087588f0fe13c4bb (master branch), and it fixes the problem on my systems.

Thank you very much!  I really appreciate all of your work on Linux networking!

^ permalink raw reply

* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
From: Michael S. Tsirkin @ 2019-07-17 14:59 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717113030.163499-6-sgarzare@redhat.com>

On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> Since now we are able to split packets, we can avoid limiting
> their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> packet size.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>


OK, so this is kind of like GSO: we pass 64K packets
to the vsock and then split them at the
low level.


> ---
>  net/vmw_vsock/virtio_transport_common.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 56fab3f03d0e..94cc0fa3e848 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -181,8 +181,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	vvs = vsk->trans;
>  
>  	/* we can send less than pkt_len bytes */
> -	if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> -		pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> +		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>  
>  	/* virtio_transport_get_credit might return less than pkt_len credit */
>  	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> -- 
> 2.20.1
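
For readers following the thread, here is a minimal user-space sketch of the clamp-then-credit step touched by this hunk. It is not the kernel code: the sketch_* names are hypothetical, the 4K and 64K values are assumptions standing in for the two VIRTIO_VSOCK_* constants, and the credit formula (peer_buf_alloc minus bytes in flight) is an assumption based on the field names in struct virtio_vsock_sock.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants; the values are assumptions. */
#define SKETCH_DEFAULT_RX_BUF_SIZE (4u * 1024u)
#define SKETCH_MAX_PKT_BUF_SIZE    (64u * 1024u)

/* Simplified TX-side credit state (no locking, illustration only). */
struct sketch_credit {
	uint32_t tx_cnt;         /* bytes sent so far */
	uint32_t peer_fwd_cnt;   /* bytes the peer reports as consumed */
	uint32_t peer_buf_alloc; /* receive space the peer advertises */
};

/* Grant at most `want` bytes of credit and account for them as sent. */
static uint32_t sketch_get_credit(struct sketch_credit *c, uint32_t want)
{
	uint32_t in_flight = c->tx_cnt - c->peer_fwd_cnt;
	uint32_t avail = c->peer_buf_alloc > in_flight ?
			 c->peer_buf_alloc - in_flight : 0;
	uint32_t granted = want < avail ? want : avail;

	c->tx_cnt += granted;
	return granted;
}

/* Clamp the requested length to `max_pkt`, then to the available credit. */
static uint32_t sketch_send_len(struct sketch_credit *c, uint32_t pkt_len,
				uint32_t max_pkt)
{
	if (pkt_len > max_pkt)
		pkt_len = max_pkt;

	return sketch_get_credit(c, pkt_len);
}
```

Under these assumptions, raising the clamp from the default RX buffer size to the maximum packet size only changes the first limit; the credit check still bounds what is actually sent.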

^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Michael S. Tsirkin @ 2019-07-17 14:54 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717113030.163499-5-sgarzare@redhat.com>

On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> If the packets to send to the guest are bigger than the buffer
> available, we can split them, using multiple buffers and fixing
> the length in the packet header.
> This is safe since virtio-vsock supports only stream sockets.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

So how does it work right now? If an app
does sendmsg with a 64K buffer and the other
side publishes 4K buffers - does it just stall?


> ---
>  drivers/vhost/vsock.c                   | 66 ++++++++++++++++++-------
>  net/vmw_vsock/virtio_transport_common.c | 15 ++++--
>  2 files changed, 60 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 6c8390a2af52..9f57736fe15e 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -102,7 +102,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  		struct iov_iter iov_iter;
>  		unsigned out, in;
>  		size_t nbytes;
> -		size_t len;
> +		size_t iov_len, payload_len;
>  		int head;
>  
>  		spin_lock_bh(&vsock->send_pkt_list_lock);
> @@ -147,8 +147,24 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  			break;
>  		}
>  
> -		len = iov_length(&vq->iov[out], in);
> -		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> +		iov_len = iov_length(&vq->iov[out], in);
> +		if (iov_len < sizeof(pkt->hdr)) {
> +			virtio_transport_free_pkt(pkt);
> +			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
> +			break;
> +		}
> +
> +		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
> +		payload_len = pkt->len - pkt->off;
> +
> +		/* If the packet is greater than the space available in the
> +		 * buffer, we split it using multiple buffers.
> +		 */
> +		if (payload_len > iov_len - sizeof(pkt->hdr))
> +			payload_len = iov_len - sizeof(pkt->hdr);
> +
> +		/* Set the correct length in the header */
> +		pkt->hdr.len = cpu_to_le32(payload_len);
>  
>  		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
>  		if (nbytes != sizeof(pkt->hdr)) {
> @@ -157,33 +173,47 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  			break;
>  		}
>  
> -		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> -		if (nbytes != pkt->len) {
> +		nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
> +				      &iov_iter);
> +		if (nbytes != payload_len) {
>  			virtio_transport_free_pkt(pkt);
>  			vq_err(vq, "Faulted on copying pkt buf\n");
>  			break;
>  		}
>  
> -		vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
> +		vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
>  		added = true;
>  
> -		if (pkt->reply) {
> -			int val;
> -
> -			val = atomic_dec_return(&vsock->queued_replies);
> -
> -			/* Do we have resources to resume tx processing? */
> -			if (val + 1 == tx_vq->num)
> -				restart_tx = true;
> -		}
> -
>  		/* Deliver to monitoring devices all correctly transmitted
>  		 * packets.
>  		 */
>  		virtio_transport_deliver_tap_pkt(pkt);
>  
> -		total_len += pkt->len;
> -		virtio_transport_free_pkt(pkt);
> +		pkt->off += payload_len;
> +		total_len += payload_len;
> +
> +		/* If we didn't send all the payload we can requeue the packet
> +		 * to send it with the next available buffer.
> +		 */
> +		if (pkt->off < pkt->len) {
> +			spin_lock_bh(&vsock->send_pkt_list_lock);
> +			list_add(&pkt->list, &vsock->send_pkt_list);
> +			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +		} else {
> +			if (pkt->reply) {
> +				int val;
> +
> +				val = atomic_dec_return(&vsock->queued_replies);
> +
> +				/* Do we have resources to resume tx
> +				 * processing?
> +				 */
> +				if (val + 1 == tx_vq->num)
> +					restart_tx = true;
> +			}
> +
> +			virtio_transport_free_pkt(pkt);
> +		}
>  	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
>  	if (added)
>  		vhost_signal(&vsock->dev, vq);
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 34a2b42313b7..56fab3f03d0e 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -97,8 +97,17 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
>  	struct virtio_vsock_pkt *pkt = opaque;
>  	struct af_vsockmon_hdr *hdr;
>  	struct sk_buff *skb;
> +	size_t payload_len;
> +	void *payload_buf;
>  
> -	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
> +	/* A packet could be split to fit the RX buffer, so we can retrieve
> +	 * the payload length from the header and the buffer pointer taking
> +	 * care of the offset in the original packet.
> +	 */
> +	payload_len = le32_to_cpu(pkt->hdr.len);
> +	payload_buf = pkt->buf + pkt->off;
> +
> +	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
>  			GFP_ATOMIC);
>  	if (!skb)
>  		return NULL;
> @@ -138,8 +147,8 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
>  
>  	skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));
>  
> -	if (pkt->len) {
> -		skb_put_data(skb, pkt->buf, pkt->len);
> +	if (payload_len) {
> +		skb_put_data(skb, payload_buf, payload_len);
>  	}
>  
>  	return skb;
> -- 
> 2.20.1
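
To make the 64K-versus-4K question above concrete, here is a rough user-space sketch of the splitting loop the patch describes. It is only an illustration: the sketch_* names are hypothetical, the 44-byte header size is an assumption, and the header is written as zeroed placeholder bytes rather than a real virtio-vsock header.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_HDR_LEN 44u /* assumed header size, for illustration only */

struct sketch_pkt {
	const unsigned char *buf;
	size_t len; /* total payload length */
	size_t off; /* payload bytes already sent */
};

/*
 * Write one placeholder header plus as much payload as fits into `dst`.
 * Returns the number of payload bytes consumed; 0 means no progress
 * (the buffer has no room for any payload after the header).
 */
static size_t sketch_fill_buffer(struct sketch_pkt *pkt, unsigned char *dst,
				 size_t dst_len)
{
	size_t payload_len;

	if (dst_len < SKETCH_HDR_LEN)
		return 0;

	payload_len = pkt->len - pkt->off;
	if (payload_len > dst_len - SKETCH_HDR_LEN)
		payload_len = dst_len - SKETCH_HDR_LEN;

	memset(dst, 0, SKETCH_HDR_LEN); /* stand-in for the real header */
	memcpy(dst + SKETCH_HDR_LEN, pkt->buf + pkt->off, payload_len);
	pkt->off += payload_len;

	return payload_len;
}

/* Count how many equally sized buffers are needed to drain the packet. */
static size_t sketch_buffers_needed(struct sketch_pkt *pkt,
				    unsigned char *scratch, size_t buf_len)
{
	size_t n = 0;

	while (pkt->off < pkt->len) {
		if (sketch_fill_buffer(pkt, scratch, buf_len) == 0)
			break;
		n++;
	}

	return n;
}
```

With these assumed sizes, each 4096-byte buffer carries 4052 payload bytes, so a 65536-byte payload takes 17 buffers instead of requiring one buffer large enough for the whole packet.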

^ permalink raw reply

* Re: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
From: Michael S. Tsirkin @ 2019-07-17 14:51 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717113030.163499-4-sgarzare@redhat.com>

On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:
> fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
> the same spinlock also if we are in the TX path.
> 
> Move also buf_alloc under the same lock.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

Wait a second, is this a bugfix?
If it's used under the wrong lock, won't values get corrupted?
Won't traffic then stall, or more data get sent than
credits allow?

> ---
>  include/linux/virtio_vsock.h            | 2 +-
>  net/vmw_vsock/virtio_transport_common.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 49fc9d20bc43..4c7781f4b29b 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -35,7 +35,6 @@ struct virtio_vsock_sock {
>  
>  	/* Protected by tx_lock */
>  	u32 tx_cnt;
> -	u32 buf_alloc;
>  	u32 peer_fwd_cnt;
>  	u32 peer_buf_alloc;
>  
> @@ -43,6 +42,7 @@ struct virtio_vsock_sock {
>  	u32 fwd_cnt;
>  	u32 last_fwd_cnt;
>  	u32 rx_bytes;
> +	u32 buf_alloc;
>  	struct list_head rx_queue;
>  };
>  
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index a85559d4d974..34a2b42313b7 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
>  
>  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
>  {
> -	spin_lock_bh(&vvs->tx_lock);
> +	spin_lock_bh(&vvs->rx_lock);
>  	vvs->last_fwd_cnt = vvs->fwd_cnt;
>  	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
>  	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> -	spin_unlock_bh(&vvs->tx_lock);
> +	spin_unlock_bh(&vvs->rx_lock);
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
>  
> -- 
> 2.20.1
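
As a user-space illustration of the locking rule under discussion (every access to fwd_cnt, last_fwd_cnt and buf_alloc goes through the same lock), here is a small sketch using pthread mutexes in place of spinlocks. The sketch_* names are hypothetical and the endianness conversion done by the real code is left out.

```c
#include <pthread.h>
#include <stdint.h>

/* Simplified socket state; each group of fields has exactly one lock. */
struct sketch_vsock {
	pthread_mutex_t tx_lock; /* protects tx_cnt */
	uint32_t tx_cnt;

	pthread_mutex_t rx_lock; /* protects the three fields below */
	uint32_t fwd_cnt;
	uint32_t last_fwd_cnt;
	uint32_t buf_alloc;
};

/* Subset of the packet header that carries the credit update. */
struct sketch_hdr {
	uint32_t fwd_cnt;
	uint32_t buf_alloc;
};

/* RX path: account for bytes handed to the application. */
static void sketch_rx_consume(struct sketch_vsock *vvs, uint32_t len)
{
	pthread_mutex_lock(&vvs->rx_lock);
	vvs->fwd_cnt += len;
	pthread_mutex_unlock(&vvs->rx_lock);
}

/* TX path: snapshot the RX-side counters under rx_lock, not tx_lock. */
static void sketch_inc_tx_pkt(struct sketch_vsock *vvs, struct sketch_hdr *hdr)
{
	pthread_mutex_lock(&vvs->rx_lock);
	vvs->last_fwd_cnt = vvs->fwd_cnt;
	hdr->fwd_cnt = vvs->fwd_cnt;
	hdr->buf_alloc = vvs->buf_alloc;
	pthread_mutex_unlock(&vvs->rx_lock);
}
```

If the TX path took tx_lock here instead, nothing would stop sketch_rx_consume() from updating fwd_cnt between the two reads, so the header and last_fwd_cnt could disagree.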

^ permalink raw reply

* [PATCH iproute2-rc v1 5/7] rdma: Add stat manual mode support
From: Leon Romanovsky @ 2019-07-17 14:31 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Leon Romanovsky, netdev, David Ahern, Mark Zhang,
	RDMA mailing list
In-Reply-To: <20190717143157.27205-1-leon@kernel.org>

From: Mark Zhang <markz@mellanox.com>

In manual mode a QP can be manually bound to a counter. If the counter
id (cntn) is not specified, the kernel will allocate one. After a
successful bind, the cntn can be seen through "rdma statistic qp show".
On unbind, if lqpn is not specified, all QPs on this counter will
be unbound.
The manual and auto modes are mutually exclusive.

Examples:
$ rdma statistic qp bind link mlx5_2/1 lqpn 178
$ rdma statistic qp bind link mlx5_2/1 lqpn 178 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4
$ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 rdma/stat.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 192 insertions(+)

diff --git a/rdma/stat.c b/rdma/stat.c
index ad1cc063..942c1ac3 100644
--- a/rdma/stat.c
+++ b/rdma/stat.c
@@ -15,6 +15,8 @@ static int stat_help(struct rd *rd)
 	pr_out("       %s statistic OBJECT show link [ DEV/PORT_INDEX ] [ FILTER-NAME FILTER-VALUE ]\n", rd->filename);
 	pr_out("       %s statistic OBJECT mode\n", rd->filename);
 	pr_out("       %s statistic OBJECT set COUNTER_SCOPE [DEV/PORT_INDEX] auto {CRITERIA | off}\n", rd->filename);
+	pr_out("       %s statistic OBJECT bind COUNTER_SCOPE [DEV/PORT_INDEX] [OBJECT-ID] [COUNTER-ID]\n", rd->filename);
+	pr_out("       %s statistic OBJECT unbind COUNTER_SCOPE [DEV/PORT_INDEX] [COUNTER-ID]\n", rd->filename);
 	pr_out("where  OBJECT: = { qp }\n");
 	pr_out("       CRITERIA : = { type }\n");
 	pr_out("       COUNTER_SCOPE: = { link | dev }\n");
@@ -25,6 +27,10 @@ static int stat_help(struct rd *rd)
 	pr_out("       %s statistic qp mode link mlx5_0\n", rd->filename);
 	pr_out("       %s statistic qp set link mlx5_2/1 auto type on\n", rd->filename);
 	pr_out("       %s statistic qp set link mlx5_2/1 auto off\n", rd->filename);
+	pr_out("       %s statistic qp bind link mlx5_2/1 lqpn 178\n", rd->filename);
+	pr_out("       %s statistic qp bind link mlx5_2/1 lqpn 178 cntn 4\n", rd->filename);
+	pr_out("       %s statistic qp unbind link mlx5_2/1 cntn 4\n", rd->filename);
+	pr_out("       %s statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178\n", rd->filename);
 
 	return 0;
 }
@@ -467,6 +473,190 @@ static int stat_qp_set(struct rd *rd)
 	return rd_exec_cmd(rd, cmds, "parameter");
 }
 
+static int stat_get_arg(struct rd *rd, const char *arg)
+{
+	int value = 0;
+	char *endp;
+
+	if (strcmpx(rd_argv(rd), arg) != 0)
+		return -EINVAL;
+
+	rd_arg_inc(rd);
+	value = strtol(rd_argv(rd), &endp, 10);
+	rd_arg_inc(rd);
+
+	return value;
+}
+
+static int stat_one_qp_bind(struct rd *rd)
+{
+	int lqpn = 0, cntn = 0, ret;
+	uint32_t seq;
+
+	if (rd_no_arg(rd)) {
+		stat_help(rd);
+		return -EINVAL;
+	}
+
+	ret = rd_build_filter(rd, stat_valid_filters);
+	if (ret)
+		return ret;
+
+	lqpn = stat_get_arg(rd, "lqpn");
+
+	rd_prepare_msg(rd, RDMA_NLDEV_CMD_STAT_SET,
+		       &seq, (NLM_F_REQUEST | NLM_F_ACK));
+
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_STAT_MODE,
+			 RDMA_COUNTER_MODE_MANUAL);
+
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_STAT_RES, RDMA_NLDEV_ATTR_RES_QP);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd->dev_idx);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_PORT_INDEX, rd->port_idx);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_RES_LQPN, lqpn);
+
+	if (rd_argc(rd)) {
+		cntn = stat_get_arg(rd, "cntn");
+		mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_STAT_COUNTER_ID,
+				 cntn);
+	}
+
+	return rd_sendrecv_msg(rd, seq);
+}
+
+static int do_stat_qp_unbind_lqpn(struct rd *rd, uint32_t cntn, uint32_t lqpn)
+{
+	uint32_t seq;
+
+	rd_prepare_msg(rd, RDMA_NLDEV_CMD_STAT_DEL,
+		       &seq, (NLM_F_REQUEST | NLM_F_ACK));
+
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_STAT_MODE,
+			 RDMA_COUNTER_MODE_MANUAL);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_STAT_RES, RDMA_NLDEV_ATTR_RES_QP);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd->dev_idx);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_PORT_INDEX, rd->port_idx);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_STAT_COUNTER_ID, cntn);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_RES_LQPN, lqpn);
+
+	return rd_sendrecv_msg(rd, seq);
+}
+
+static int stat_get_counter_parse_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX] = {};
+	struct nlattr *nla_table, *nla_entry;
+	struct rd *rd = data;
+	uint32_t lqpn, cntn;
+	int err;
+
+	mnl_attr_parse(nlh, 0, rd_attr_cb, tb);
+
+	if (!tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID])
+		return MNL_CB_ERROR;
+	cntn = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID]);
+
+	nla_table = tb[RDMA_NLDEV_ATTR_RES_QP];
+	if (!nla_table)
+		return MNL_CB_ERROR;
+
+	mnl_attr_for_each_nested(nla_entry, nla_table) {
+		struct nlattr *nla_line[RDMA_NLDEV_ATTR_MAX] = {};
+
+		err = mnl_attr_parse_nested(nla_entry, rd_attr_cb, nla_line);
+		if (err != MNL_CB_OK)
+			return -EINVAL;
+
+		if (!nla_line[RDMA_NLDEV_ATTR_RES_LQPN])
+			return -EINVAL;
+
+		lqpn = mnl_attr_get_u32(nla_line[RDMA_NLDEV_ATTR_RES_LQPN]);
+		err = do_stat_qp_unbind_lqpn(rd, cntn, lqpn);
+		if (err)
+			return MNL_CB_ERROR;
+	}
+
+	return MNL_CB_OK;
+}
+
+static int stat_one_qp_unbind(struct rd *rd)
+{
+	int flags = NLM_F_REQUEST | NLM_F_ACK, ret;
+	char buf[MNL_SOCKET_BUFFER_SIZE];
+	int lqpn = 0, cntn = 0;
+	unsigned int portid;
+	uint32_t seq;
+
+	ret = rd_build_filter(rd, stat_valid_filters);
+	if (ret)
+		return ret;
+
+	cntn = stat_get_arg(rd, "cntn");
+	if (rd_argc(rd)) {
+		lqpn = stat_get_arg(rd, "lqpn");
+		return do_stat_qp_unbind_lqpn(rd, cntn, lqpn);
+	}
+
+	rd_prepare_msg(rd, RDMA_NLDEV_CMD_STAT_GET, &seq, flags);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd->dev_idx);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_PORT_INDEX, rd->port_idx);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_STAT_RES, RDMA_NLDEV_ATTR_RES_QP);
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_STAT_COUNTER_ID, cntn);
+	ret = rd_send_msg(rd);
+	if (ret)
+		return ret;
+
+
+	/* Can't use rd_recv_msg() since the callback also calls it (recursively),
+	 * so rd_recv_msg() would always return -1 here
+	 */
+	portid = mnl_socket_get_portid(rd->nl);
+	ret = mnl_socket_recvfrom(rd->nl, buf, sizeof(buf));
+	if (ret <= 0)
+		return ret;
+
+	ret = mnl_cb_run(buf, ret, seq, portid, stat_get_counter_parse_cb, rd);
+	mnl_socket_close(rd->nl);
+	if (ret != MNL_CB_OK)
+		return ret;
+
+	return 0;
+}
+
+static int stat_qp_bind_link(struct rd *rd)
+{
+	return rd_exec_link(rd, stat_one_qp_bind, true);
+}
+
+static int stat_qp_bind(struct rd *rd)
+{
+	const struct rd_cmd cmds[] = {
+		{ NULL,		stat_help },
+		{ "link",	stat_qp_bind_link },
+		{ "help",	stat_help },
+		{ 0 },
+	};
+
+	return rd_exec_cmd(rd, cmds, "parameter");
+}
+
+static int stat_qp_unbind_link(struct rd *rd)
+{
+	return rd_exec_link(rd, stat_one_qp_unbind, true);
+}
+
+static int stat_qp_unbind(struct rd *rd)
+{
+	const struct rd_cmd cmds[] = {
+		{ NULL,		stat_help },
+		{ "link",	stat_qp_unbind_link },
+		{ "help",	stat_help },
+		{ 0 },
+	};
+
+	return rd_exec_cmd(rd, cmds, "parameter");
+}
+
 static int stat_qp(struct rd *rd)
 {
 	const struct rd_cmd cmds[] =  {
@@ -475,6 +665,8 @@ static int stat_qp(struct rd *rd)
 		{ "list",	stat_qp_show },
 		{ "mode",	stat_qp_get_mode },
 		{ "set",	stat_qp_set },
+		{ "bind",	stat_qp_bind },
+		{ "unbind",	stat_qp_unbind },
 		{ "help",	stat_help },
 		{ 0 }
 	};
-- 
2.20.1
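
A note on the numeric arguments: stat_get_arg() above passes the value straight to strtol() and does not look at endp or at a missing argument. The following is a minimal sketch of a stricter parser, not part of this patch; sketch_parse_u32() is a hypothetical helper name, and it only uses standard C library calls.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a base-10 unsigned 32-bit value.
 * Returns 0 on success, -EINVAL for missing or malformed input,
 * -ERANGE if the value does not fit in 32 bits.
 */
static int sketch_parse_u32(const char *str, uint32_t *out)
{
	unsigned long val;
	char *endp;

	/* Require a leading digit: rejects NULL, "", signs and whitespace. */
	if (!str || *str < '0' || *str > '9')
		return -EINVAL;

	errno = 0;
	val = strtoul(str, &endp, 10);
	if (errno == ERANGE || val > UINT32_MAX)
		return -ERANGE;

	/* Trailing garbage such as "12abc" is an error. */
	if (*endp != '\0')
		return -EINVAL;

	*out = (uint32_t)val;
	return 0;
}
```

A caller could then reject "lqpn abc" or a missing value before building the netlink message, rather than sending 0 or a negative errno as the QP number.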


^ permalink raw reply related

* [PATCH iproute2-rc v1 7/7] rdma: Document counter statistic
From: Leon Romanovsky @ 2019-07-17 14:31 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Leon Romanovsky, netdev, David Ahern, Mark Zhang,
	RDMA mailing list
In-Reply-To: <20190717143157.27205-1-leon@kernel.org>

From: Mark Zhang <markz@mellanox.com>

Add documentation for accessing the QP counter, including binding/unbinding
a QP to a counter manually or automatically, and dumping counter statistics.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 man/man8/rdma-dev.8       |   1 +
 man/man8/rdma-link.8      |   1 +
 man/man8/rdma-resource.8  |   1 +
 man/man8/rdma-statistic.8 | 167 ++++++++++++++++++++++++++++++++++++++
 man/man8/rdma.8           |   7 +-
 5 files changed, 176 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/rdma-statistic.8

diff --git a/man/man8/rdma-dev.8 b/man/man8/rdma-dev.8
index 38e34b3b..e77e7cd0 100644
--- a/man/man8/rdma-dev.8
+++ b/man/man8/rdma-dev.8
@@ -77,6 +77,7 @@ previously created using iproute2 ip command.
 .BR rdma-link (8),
 .BR rdma-resource (8),
 .BR rdma-system (8),
+.BR rdma-statistic (8),
 .br
 
 .SH AUTHOR
diff --git a/man/man8/rdma-link.8 b/man/man8/rdma-link.8
index b3b40de7..32f80228 100644
--- a/man/man8/rdma-link.8
+++ b/man/man8/rdma-link.8
@@ -97,6 +97,7 @@ Removes RXE link rxe_eth0
 .BR rdma (8),
 .BR rdma-dev (8),
 .BR rdma-resource (8),
+.BR rdma-statistic (8),
 .br
 
 .SH AUTHOR
diff --git a/man/man8/rdma-resource.8 b/man/man8/rdma-resource.8
index 40b073db..05030d0a 100644
--- a/man/man8/rdma-resource.8
+++ b/man/man8/rdma-resource.8
@@ -103,6 +103,7 @@ Show CQs belonging to pid 30489
 .BR rdma (8),
 .BR rdma-dev (8),
 .BR rdma-link (8),
+.BR rdma-statistic (8),
 .br
 
 .SH AUTHOR
diff --git a/man/man8/rdma-statistic.8 b/man/man8/rdma-statistic.8
new file mode 100644
index 00000000..dea6ff24
--- /dev/null
+++ b/man/man8/rdma-statistic.8
@@ -0,0 +1,167 @@
+.TH RDMA\-STATISTIC 8 "17 Mar 2019" "iproute2" "Linux"
+.SH NAME
+rdma-statistic \- RDMA statistic counter configuration
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B rdma
+.RI "[ " OPTIONS " ]"
+.B statistic
+.RI  " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.B rdma statistic
+.RI "[ " OBJECT " ]"
+.B show
+
+.ti -8
+.B rdma statistic
+.RI "[ " OBJECT " ]"
+.B show link
+.RI "[ " DEV/PORT_INDEX " ]"
+
+.ti -8
+.B rdma statistic
+.IR OBJECT
+.B mode
+
+.ti -8
+.B rdma statistic
+.IR OBJECT
+.B set
+.IR COUNTER_SCOPE
+.RI "[ " DEV/PORT_INDEX "]"
+.B auto
+.RI "{ " CRITERIA " | "
+.BR off " }"
+
+.ti -8
+.B rdma statistic
+.IR OBJECT
+.B bind
+.IR COUNTER_SCOPE
+.RI "[ " DEV/PORT_INDEX "]"
+.RI "[ " OBJECT-ID " ]"
+.RI "[ " COUNTER-ID " ]"
+
+.ti -8
+.B rdma statistic
+.IR OBJECT
+.B unbind
+.IR COUNTER_SCOPE
+.RI "[ " DEV/PORT_INDEX "]"
+.RI "[ " COUNTER-ID " ]"
+.RI "[ " OBJECT-ID " ]"
+
+.ti -8
+.IR COUNTER_SCOPE " := "
+.RB "{ " link " | " dev " }"
+
+.ti -8
+.IR OBJECT " := "
+.RB "{ " qp " }"
+
+.ti -8
+.IR CRITERIA " := "
+.RB "{ " type " }"
+
+.SH "DESCRIPTION"
+.SS rdma statistic [object] show - Queries the specified RDMA device for RDMA and driver-specific statistics. Show the default hw counters if object is not specified
+
+.PP
+.I "DEV"
+- specifies counters on this RDMA device to show.
+
+.I "PORT_INDEX"
+- specifies counters on this RDMA port to show.
+
+.SS rdma statistic <object> set - configure counter statistic auto-mode for a specific device/port
+In auto mode all objects belonging to one category are bound automatically to a single counter set.
+
+.SS rdma statistic <object> bind - manually bind an object (e.g., a qp) with a counter
+When bound the statistics of this object are available in this counter.
+
+.SS rdma statistic <object> unbind - manually unbind an object (e.g., a qp) from the counter previously bound
+When unbound the statistics of this object are no longer available in this counter. If the object id is not specified, all objects on this counter will be unbound.
+
+.I "COUNTER-ID"
+- specifies the id of the counter to be bound.
+If this argument is omitted then a new counter will be allocated.
+
+.SH "EXAMPLES"
+.PP
+rdma statistic show
+.RS 4
+Shows the state of the default counter of all RDMA devices on the system.
+.RE
+.PP
+rdma statistic show link mlx5_2/1
+.RS 4
+Shows the state of the default counter of specified RDMA port
+.RE
+.PP
+rdma statistic qp show
+.RS 4
+Shows the state of all qp counters of all RDMA devices on the system.
+.RE
+.PP
+rdma statistic qp show link mlx5_2/1
+.RS 4
+Shows the state of all qp counters of specified RDMA port.
+.RE
+.PP
+rdma statistic qp show link mlx5_2 pid 30489
+.RS 4
+Shows the state of all qp counters of specified RDMA port and belonging to pid 30489
+.RE
+.PP
+rdma statistic qp mode
+.RS 4
+List current counter mode on all devices
+.RE
+.PP
+rdma statistic qp mode link mlx5_2/1
+.RS 4
+List current counter mode of device mlx5_2 port 1
+.RE
+.PP
+rdma statistic qp set link mlx5_2/1 auto type on
+.RS 4
+On device mlx5_2 port 1, bind each new QP to a counter automatically. One counter is used for QPs with the same qp type in each process. Currently only "type" is supported.
+.RE
+.PP
+rdma statistic qp set link mlx5_2/1 auto off
+.RS 4
+Turn off auto mode on device mlx5_2 port 1. The allocated counters can be manually accessed.
+.RE
+.PP
+rdma statistic qp bind link mlx5_2/1 lqpn 178
+.RS 4
+On device mlx5_2 port 1, allocate a counter and bind the specified qp on it
+.RE
+.PP
+rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178
+.RS 4
+On device mlx5_2 port 1, unbind the specified qp from the specified counter
+.RE
+.PP
+rdma statistic qp unbind link mlx5_2/1 cntn 4
+.RS 4
+On device mlx5_2 port 1, unbind all QPs on the specified counter. After that this counter will be released automatically by the kernel.
+
+.RE
+.PP
+
+.SH SEE ALSO
+.BR rdma (8),
+.BR rdma-dev (8),
+.BR rdma-link (8),
+.BR rdma-resource (8),
+.br
+
+.SH AUTHOR
+Mark Zhang <markz@mellanox.com>
diff --git a/man/man8/rdma.8 b/man/man8/rdma.8
index 3ae33987..ef29b1c6 100644
--- a/man/man8/rdma.8
+++ b/man/man8/rdma.8
@@ -19,7 +19,7 @@ rdma \- RDMA tool
 
 .ti -8
 .IR OBJECT " := { "
-.BR dev " | " link " | " system " }"
+.BR dev " | " link " | " system " | " statistic " }"
 .sp
 
 .ti -8
@@ -74,6 +74,10 @@ Generate JSON output.
 .B sys
 - RDMA subsystem related.
 
+.TP
+.B statistic
+- RDMA counter statistic related.
+
 .PP
 The names of all objects may be written in full or
 abbreviated form, for example
@@ -112,6 +116,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR rdma-link (8),
 .BR rdma-resource (8),
 .BR rdma-system (8),
+.BR rdma-statistic (8),
 .br
 
 .SH REPORTING BUGS
-- 
2.20.1


^ permalink raw reply related
