From: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: dumping queue state
Date: Wed, 20 Dec 2017 18:36:55 +0200
Message-ID: <20171220163655.GW2942@mtr-leonro.local>
In-Reply-To: <00a701d379aa$b098ee10$11caca30$@opengridcomputing.com>
On Wed, Dec 20, 2017 at 09:53:33AM -0600, Steve Wise wrote:
> Hey,
>
> I have a need to provide tools for customers to gather runtime state for an
> rdma device. Say, when an application is stuck waiting for some completion
> or other rdma event. This includes hw/fw state of course, and equally as
> important, rdma object sw state. Is debugfs the correct way to export this
> sw state? The data is quite large potentially; each QP, its structures, the
> dma queue memory, etc. Ditto for CQs. Also MR state, etc etc. It seems
> that would be overloading debugfs to me. Currently the hw/fw state is being
> gathered via ethtool dump commands (--get-dump, --register-dump,
> --eeprom-dump). I am considering using the ethtool --get-dump method for
> the low level driver to also include dumping the rdma queue state for the
> device. Is that a reasonable approach?
In this cycle, I'm going to submit the RDMA resource tracking feature, which
will provide the infrastructure to do this in RDMA.
The kernel code is based on RCU and is not final, because my call to
synchronize_rcu is not effective:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/restrack-rcu
There is a supplementary part in RDMAtool, which presents global
information and QP information, with options to filter.
It is at an initial stage.
+ /mnt/iproute2/rdma/rdma res
1: mlx5_0: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
2: mlx5_1: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
3: mlx5_2: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
4: mlx5_3: curr/max: pd 2/16777216 cq 3/16777216 qp 2/262144
5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
+ /mnt/iproute2/rdma/rdma res show mlx5_4
5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4
DEV/PORT LQPN TYPE STATE PID COMM
mlx5_4/- 8 UD RESET 0 [ipoib-verbs]
mlx5_4/1 7 UD RTS 0 [mlx5-gsi]
mlx5_4/1 1 GSI RTS 0 [rdma-mad]
mlx5_4/1 0 SMI RTS 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/
DEV/PORT LQPN TYPE STATE PID COMM
mlx5_4/- 8 UD RESET 0 [ipoib-verbs]
mlx5_4/1 7 UD RTS 0 [mlx5-gsi]
mlx5_4/1 1 GSI RTS 0 [rdma-mad]
mlx5_4/1 0 SMI RTS 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/0
Wrong device name
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1
DEV/PORT LQPN TYPE STATE PID COMM
mlx5_4/1 7 UD RTS 0 [mlx5-gsi]
mlx5_4/1 1 GSI RTS 0 [rdma-mad]
mlx5_4/1 0 SMI RTS 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/-
DEV/PORT LQPN TYPE STATE PID COMM
mlx5_4/- 8 UD RESET 0 [ipoib-verbs]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/- -d
DEV/PORT LQPN RQPN TYPE STATE PID COMM SQ-PSN RQ-PSN PATH-MIG
mlx5_4/- 8 --- UD RESET 0 [ipoib-verbs] 0 --- ---
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm
DEV/PORT LQPN PID COMM
mlx5_4/1 7 0 [mlx5-gsi]
mlx5_4/1 1 0 [rdma-mad]
mlx5_4/1 0 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm -d
DEV/PORT LQPN PID COMM
mlx5_4/1 7 0 [mlx5-gsi]
mlx5_4/1 1 0 [rdma-mad]
mlx5_4/1 0 0 [rdma-mad]
+ /mnt/iproute2/rdma/rdma res show qp link mlx5_4/1 display pid,lqpn,comm pid 0-2000
DEV/PORT LQPN PID COMM
mlx5_4/1 7 0 [mlx5-gsi]
mlx5_4/1 1 0 [rdma-mad]
mlx5_4/1 0 0 [rdma-mad]
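For anyone who wants to script against this output, here is a minimal Python sketch of how the text above could be parsed. It is only an illustration under assumptions: the function names (parse_summary, parse_qp_table, filter_by_pid) are hypothetical and not part of RDMAtool, the columns are assumed to be whitespace-separated, COMM is assumed to contain no spaces, and the output format itself is at an initial stage and may change.

```python
import re
from typing import Dict, List, Tuple

# Matches summary lines such as:
#   5: mlx5_4: curr/max: pd 3/16777216 cq 5/16777216 qp 4/262144
SUMMARY_RE = re.compile(r"^(\d+):\s+(\S+):\s+curr/max:\s+(.+)$")


def parse_summary(line: str) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Return (device, {resource: (current, maximum)}) for one summary line."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    _, dev, rest = match.groups()
    tokens = rest.split()
    counters = {}
    # Tokens alternate between a resource name and a "curr/max" pair.
    for name, pair in zip(tokens[0::2], tokens[1::2]):
        curr, maximum = pair.split("/")
        counters[name] = (int(curr), int(maximum))
    return dev, counters


def parse_qp_table(text: str) -> List[Dict[str, str]]:
    """Turn a QP table (header line plus rows) into a list of dicts.

    Assumption: every row has as many whitespace-separated fields as the
    header, which holds for the samples shown because empty cells are
    printed as '---'.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            raise ValueError(f"unexpected column count: {ln!r}")
        rows.append(dict(zip(header, fields)))
    return rows


def filter_by_pid(rows: List[Dict[str, str]], low: int, high: int) -> List[Dict[str, str]]:
    """Keep rows whose PID lies in [low, high], like the 'pid 0-2000' filter."""
    return [row for row in rows if low <= int(row["PID"]) <= high]


if __name__ == "__main__":
    sample = (
        "DEV/PORT LQPN TYPE STATE PID COMM\n"
        "mlx5_4/1 7 UD RTS 0 [mlx5-gsi]\n"
        "mlx5_4/1 1 GSI RTS 0 [rdma-mad]\n"
        "mlx5_4/1 0 SMI RTS 0 [rdma-mad]\n"
    )
    for row in filter_by_pid(parse_qp_table(sample), 0, 2000):
        print(row["DEV/PORT"], row["LQPN"], row["COMM"])
```

The filtering is done here in user space only for illustration; the tool's own "pid" and "display" options shown above already do the same on the command line.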
>
> Any thoughts/suggestions?
Care to try?
>
> Thanks in advance,
>
> Steve.