From: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: dumping queue state
Date: Wed, 20 Dec 2017 18:36:55 +0200 [thread overview]
Message-ID: <20171220163655.GW2942@mtr-leonro.local> (raw)
In-Reply-To: <00a701d379aa$b098ee10$11caca30$@opengridcomputing.com>
[-- Attachment #1: Type: text/plain, Size: 4308 bytes --]
On Wed, Dec 20, 2017 at 09:53:33AM -0600, Steve Wise wrote:
> Hey,
>
> I have a need to provide tools for customers to gather runtime state for an
> rdma device. Say, when an application is stuck waiting for some completion
> or other rdma event. This includes hw/fw state of course, and equally as
> important, rdma object sw state. Is debugfs the correct way to export this
> sw state? The data is quite large potentially; each QP, its structures, the
> dma queue memory, etc. Ditto for CQs. Also MR state, etc etc. It seems
> that would be overloading debugfs to me. Currently the hw/fw state is being
> gathered via ethtool dump commands (--get-dump, --register-dump,
> --eeprom-dump). I am considering using the ethtool --get-dump method for
> the low level driver to also include dumping the rdma queue state for the
> device. Is that a reasonable approach?
In this cycle, I'm going to submit RDMA resource tracking feature, which
will give an infrastructure to do it in RDMA:
The kernel code, it is based on RCU and not final, because my call to
synchronize_rcu is not effective.
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/restrack-rcu
There is supplementary part in RDMAtool, which presents global
information, QP information with options to filter.
It is initial stage.
+ /mnt/iproute2/rdma/rdma res
1: mlx5_0: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
2: mlx5_1: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
3: mlx5_2: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
4: mlx5_3: curr/max: pd 2/16777216 cq 3/16777216 qp 2/262144
5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
+ /mnt/iproute2/rdma/rdma res show mlx5_4
5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4
DEV/PORT LQPN TYPE STATE PID COMM
mlx5_4/- 8 UD RESET 0 [ipoib-verbs]
mlx5_4/1 7 UD RTS 0 [mlx5-gsi]
mlx5_4/1 1 GSI RTS 0 [rdma-mad]
mlx5_4/1 0 SMI RTS 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/
DEV/PORT LQPN TYPE STATE PID COMM
mlx5_4/- 8 UD RESET 0 [ipoib-verbs]
mlx5_4/1 7 UD RTS 0 [mlx5-gsi]
mlx5_4/1 1 GSI RTS 0 [rdma-mad]
mlx5_4/1 0 SMI RTS 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/0
Wrong device name
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1
DEV/PORT LQPN TYPE STATE PID COMM
mlx5_4/1 7 UD RTS 0 [mlx5-gsi]
mlx5_4/1 1 GSI RTS 0 [rdma-mad]
mlx5_4/1 0 SMI RTS 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/-
DEV/PORT LQPN TYPE STATE PID COMM
mlx5_4/- 8 UD RESET 0 [ipoib-verbs]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/- -d
DEV/PORT LQPN RQPN TYPE STATE PID COMM SQ-PSN RQ-PSN PATH-MIG
mlx5_4/- 8 --- UD RESET 0 [ipoib-verbs] 0 --- ---
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm
DEV/PORT LQPN PID COMM
mlx5_4/1 7 0 [mlx5-gsi]
mlx5_4/1 1 0 [rdma-mad]
mlx5_4/1 0 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm -d
DEV/PORT LQPN PID COMM
mlx5_4/1 7 0 [mlx5-gsi]
mlx5_4/1 1 0 [rdma-mad]
mlx5_4/1 0 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm pid 0-2000
DEV/PORT LQPN PID COMM
mlx5_4/1 7 0 [mlx5-gsi]
mlx5_4/1 1 0 [rdma-mad]
mlx5_4/1 0 0 [rdma-mad]
>
> Any thoughts/suggestions?
Care to try?
>
> Thanks in advance,
>
> Steve.
>
>
> ---
> This email has been checked for viruses by AVG.
> http://www.avg.com
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2017-12-20 16:36 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-20 15:53 dumping queue state Steve Wise
2017-12-20 16:14 ` Bart Van Assche
[not found] ` <1513786487.2603.4.camel-Sjgp3cTcYWE@public.gmane.org>
2017-12-20 16:59 ` Steve Wise
2017-12-20 16:36 ` Leon Romanovsky [this message]
[not found] ` <20171220163655.GW2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-12-20 17:00 ` Steve Wise
2017-12-20 17:11 ` Leon Romanovsky
[not found] ` <20171220171146.GX2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-12-20 17:42 ` Steve Wise
2017-12-20 18:05 ` Leon Romanovsky
[not found] ` <20171220180549.GY2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-12-20 19:01 ` Steve Wise
2017-12-20 20:01 ` Leon Romanovsky
[not found] ` <20171220200115.GC2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-12-20 21:42 ` Steve Wise
2017-12-21 5:15 ` Leon Romanovsky
2017-12-20 19:31 ` Steve Wise
2017-12-20 19:58 ` Leon Romanovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171220163655.GW2942@mtr-leonro.local \
--to=leon-dgejt+ai2ygdnm+yrofe0a@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox