Linux RDMA and InfiniBand development

Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed

* Re: [PATCH 3/5] mm/notifier: add migration invalidation type
From: Jason Gunthorpe @ 2020-07-10 19:39 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao
In-Reply-To: <20200706222347.32290-4-rcampbell@nvidia.com>

On Mon, Jul 06, 2020 at 03:23:45PM -0700, Ralph Campbell wrote:
> Currently migrate_vma_setup() calls mmu_notifier_invalidate_range_start()
> which flushes all device private page mappings whether or not a page
> is being migrated to/from device private memory. In order to not disrupt
> device mappings that are not being migrated, shift the responsibility
> for clearing device private mappings to the device driver and leave
> CPU page table unmapping handled by migrate_vma_setup(). To support
> this, the caller of migrate_vma_setup() should always set struct
> migrate_vma::src_owner to a non NULL value that matches the device
> private page->pgmap->owner. This value is then passed to the struct
> mmu_notifier_range with a new event type which the driver's invalidation
> function can use to avoid device MMU invalidations.
> 
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>  include/linux/mmu_notifier.h | 7 +++++++
>  mm/migrate.c                 | 8 +++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index fc68f3570e19..bd0b34dbe4de 100644
> +++ b/include/linux/mmu_notifier.h
> @@ -38,6 +38,10 @@ struct mmu_interval_notifier;
>   *
>   * @MMU_NOTIFY_RELEASE: used during mmu_interval_notifier invalidate to signal
>   * that the mm refcount is zero and the range is no longer accessible.
> + *
> + * @MMU_NOTIFY_MIGRATE: used during migrate_vma_collect() invalidate to signal
> + * a device driver to possibly ignore the invalidation if the src_own
> + * field matches.
>   */
>  enum mmu_notifier_event {
>  	MMU_NOTIFY_UNMAP = 0,
> @@ -46,6 +50,7 @@ enum mmu_notifier_event {
>  	MMU_NOTIFY_PROTECTION_PAGE,
>  	MMU_NOTIFY_SOFT_DIRTY,
>  	MMU_NOTIFY_RELEASE,
> +	MMU_NOTIFY_MIGRATE,
>  };
>  
>  #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
> @@ -264,6 +269,7 @@ struct mmu_notifier_range {
>  	unsigned long end;
>  	unsigned flags;
>  	enum mmu_notifier_event event;
> +	void *data;
>  };

This generic member usually ends up a bit ugly, can we do a tagged
union instead?

union
{
     void *migrate_pgmap_owner;
};

and probably drop the union until we actually need two things here.

Jason

^ permalink raw reply

* Re: [PATCH for-next] RDMA/mlx5: Remove unused to_mibmr function
From: Jason Gunthorpe @ 2020-07-10 19:40 UTC (permalink / raw)
  To: Gal Pressman; +Cc: Doug Ledford, Leon Romanovsky, linux-rdma
In-Reply-To: <20200705141143.47303-1-galpress@amazon.com>

On Sun, Jul 05, 2020 at 05:11:43PM +0300, Gal Pressman wrote:
> The to_mibmr function is unused, remove it.
> 
> Signed-off-by: Gal Pressman <galpress@amazon.com>
> Acked-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 -----
>  1 file changed, 5 deletions(-)

Applied to for-next, thanks

Jason

^ permalink raw reply

* Re: [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping
From: Jason Gunthorpe @ 2020-07-10 19:27 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, linux-kselftest, linux-kernel,
	Jerome Glisse, John Hubbard, Christoph Hellwig, Andrew Morton,
	Shuah Khan, Ben Skeggs
In-Reply-To: <20200701225352.9649-1-rcampbell@nvidia.com>

On Wed, Jul 01, 2020 at 03:53:47PM -0700, Ralph Campbell wrote:
> The goal for this series is to introduce the hmm_pfn_to_map_order()
> function. This allows a device driver to know that a given 4K PFN is
> actually mapped by the CPU using a larger sized CPU page table entry and
> therefore the device driver can safely map system memory using larger
> device MMU PTEs.
> The series is based on 5.8.0-rc3 and is intended for Jason Gunthorpe's
> hmm tree. These were originally part of a larger series:
> https://lore.kernel.org/linux-mm/20200619215649.32297-1-rcampbell@nvidia.com/
> 
> Changes in v3:
> Replaced the HMM_PFN_P[MU]D flags with hmm_pfn_to_map_order() to
> indicate the size of the CPU mapping.
> 
> Changes in v2:
> Make the hmm_range_fault() API changes into a separate series and add
>   two output flags for PMD/PUD instead of a single compund page flag as
>   suggested by Jason Gunthorpe.
> Make the nouveau page table changes a separate patch as suggested by
>   Ben Skeggs.
> Only add support for 2MB nouveau mappings initially since changing the
> 1:1 CPU/GPU page table size assumptions requires a bigger set of changes.
> Rebase to 5.8.0-rc3.
> 
> Ralph Campbell (5):
>   nouveau/hmm: fault one page at a time
>   mm/hmm: add hmm_mapping order
>   nouveau: fix mapping 2MB sysmem pages
>   nouveau/hmm: support mapping large sysmem pages
>   hmm: add tests for HMM_PFN_PMD flag

Applied to hmm.git. 

I edited the comment for hmm_pfn_to_map_order() and added a function
to compute the field.

Thanks,
Jason

^ permalink raw reply

* Re: [PATCH] SCSI RDMA PROTOCOL (SRP) TARGET: Replace HTTP links with HTTPS ones
From: Alexander A. Klimov @ 2020-07-10 18:12 UTC (permalink / raw)
  To: Bart Van Assche, dledford, jgg, linux-rdma, target-devel,
	linux-kernel
In-Reply-To: <3d230abd-752e-8ac1-e18d-b64561b409ff@acm.org>



Am 10.07.20 um 16:22 schrieb Bart Van Assche:
> On 2020-07-09 12:48, Alexander A. Klimov wrote:
>> diff --git a/drivers/infiniband/ulp/srpt/Kconfig b/drivers/infiniband/ulp/srpt/Kconfig
>> index 4b5d9b792cfa..f63b34d9ae32 100644
>> --- a/drivers/infiniband/ulp/srpt/Kconfig
>> +++ b/drivers/infiniband/ulp/srpt/Kconfig
>> @@ -10,4 +10,4 @@ config INFINIBAND_SRPT
>>   	  that supports the RDMA protocol. Currently the RDMA protocol is
>>   	  supported by InfiniBand and by iWarp network hardware. More
>>   	  information about the SRP protocol can be found on the website
>> -	  of the INCITS T10 technical committee (http://www.t10.org/).
>> +	  of the INCITS T10 technical committee (https://www.t10.org/).
> 
> It is not clear to me how modifying an URL in a Kconfig file helps to
> reduce the attack surface on kernel devs?
Not on all, just on the ones who open it.

> 
> Thanks,
> 
> Bart.
> 
> 

^ permalink raw reply

* [GIT PULL] Please pull RDMA subsystem changes
From: Jason Gunthorpe @ 2020-07-10 17:58 UTC (permalink / raw)
  To: Linus Torvalds, Doug Ledford; +Cc: linux-rdma, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2118 bytes --]

Hi Linus,

Second rc pull request

Fairly normal here, things are actually somewhat quiet when look at
the next release.

Thanks,
Jason

The following changes since commit 9ebcfadb0610322ac537dd7aa5d9cbc2b2894c68:

  Linux 5.8-rc3 (2020-06-28 15:00:24 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus

for you to fetch changes up to 0a03715068794e4b524f66ebbf412ab1f2933f3f:

  RDMA/mlx5: Set PD pointers for the error flow unwind (2020-07-08 20:15:59 -0300)

----------------------------------------------------------------
RDMA second 5.8 rc pull request

Small update, a few more merge window bugs and normal driver bug fixes:

- Two merge window regressions in mlx5: a error path bug found by
  syzkaller and some lost code during a rework preventing ipoib from
  working in some configurations

- Silence clang compilation warning in OPA related code

- Fix a long standing race condition in ib_nl for ACM

- Resolve when the HFI1 is shutdown

----------------------------------------------------------------
Aya Levin (1):
      IB/mlx5: Fix 50G per lane indication

Divya Indi (1):
      IB/sa: Resolv use-after-free in ib_nl_make_request()

Kaike Wan (2):
      IB/hfi1: Do not destroy hfi1_wq when the device is shut down
      IB/hfi1: Do not destroy link_wq when the device is shut down

Kamal Heib (1):
      RDMA/siw: Fix reporting vendor_part_id

Leon Romanovsky (2):
      RDMA/mlx5: Fix legacy IPoIB QP initialization
      RDMA/mlx5: Set PD pointers for the error flow unwind

Nathan Chancellor (1):
      IB/hfi1: Add explicit cast OPA_MTU_8192 to 'enum ib_mtu'

 drivers/infiniband/core/sa_query.c    | 38 ++++++++++++++++-------------------
 drivers/infiniband/hw/hfi1/init.c     | 37 +++++++++++++++++++++++++---------
 drivers/infiniband/hw/hfi1/qp.c       |  7 +++++--
 drivers/infiniband/hw/hfi1/tid_rdma.c |  5 ++++-
 drivers/infiniband/hw/mlx5/main.c     |  2 +-
 drivers/infiniband/hw/mlx5/qp.c       |  7 ++++++-
 drivers/infiniband/sw/siw/siw_main.c  |  3 ++-
 7 files changed, 63 insertions(+), 36 deletions(-)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: mmotm 2020-07-09-21-00 uploaded (drivers/net/ethernet/mellanox/mlx5/core/en_main.c)
From: Randy Dunlap @ 2020-07-10 17:40 UTC (permalink / raw)
  To: Andrew Morton, broonie, linux-fsdevel, linux-kernel, linux-mm,
	linux-next, mhocko, mm-commits, sfr, netdev@vger.kernel.org,
	Saeed Mahameed, Leon Romanovsky, linux-rdma@vger.kernel.org
In-Reply-To: <20200710040047.md-jEb0TK%akpm@linux-foundation.org>

On 7/9/20 9:00 PM, Andrew Morton wrote:
> The mm-of-the-moment snapshot 2020-07-09-21-00 has been uploaded to
> 
>    http://www.ozlabs.org/~akpm/mmotm/
> 
> mmotm-readme.txt says
> 
> README for mm-of-the-moment:
> 
> http://www.ozlabs.org/~akpm/mmotm/
> 
> This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
> more than once a week.
> 
> You will need quilt to apply these patches to the latest Linus release (5.x
> or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
> http://ozlabs.org/~akpm/mmotm/series
> 

on i386:

In file included from ../drivers/net/ethernet/mellanox/mlx5/core/en_main.c:49:0:
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h: In function ‘mlx5e_accel_sk_get_rxq’:
../drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h:153:12: error: implicit declaration of function ‘sk_rx_queue_get’; did you mean ‘sk_rx_queue_set’? [-Werror=implicit-function-declaration]
  int rxq = sk_rx_queue_get(sk);
            ^~~~~~~~~~~~~~~
            sk_rx_queue_set



-- 
~Randy
Reported-by: Randy Dunlap <rdunlap@infradead.org>

^ permalink raw reply

* Re: [PATCH RFC 0/3] IB CM tracepoints
From: Chuck Lever @ 2020-07-10 16:32 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, aron.silverton
In-Reply-To: <20200710151737.GZ25301@ziepe.ca>



> On Jul 10, 2020, at 11:17 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Fri, Jul 10, 2020 at 10:06:01AM -0400, Chuck Lever wrote:
>> Hi-
>> 
>> This is a Request For Comments.
>> 
>> Oracle has an interest in a common observability infrastructure in
>> the RDMA core and ULPs. One alternative for this infrastructure is
>> to introduce static tracepoints that can also be used as hooks for
>> eBPF scripts, replacing infrastructure that is based on printk.
> 
> Don't we already have tracepoints in CM, why is adding more RFC?

One of these patches _replaces_ printk calls with tracepoints.
Is that OK?

--
Chuck Lever




^ permalink raw reply

* Re: [PATCH RFC 0/3] IB CM tracepoints
From: Jason Gunthorpe @ 2020-07-10 15:17 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, aron.silverton
In-Reply-To: <20200710135812.14749.4630.stgit@klimt.1015granger.net>

On Fri, Jul 10, 2020 at 10:06:01AM -0400, Chuck Lever wrote:
> Hi-
> 
> This is a Request For Comments.
> 
> Oracle has an interest in a common observability infrastructure in
> the RDMA core and ULPs. One alternative for this infrastructure is
> to introduce static tracepoints that can also be used as hooks for
> eBPF scripts, replacing infrastructure that is based on printk.

Don't we already have tracepoints in CM, why is adding more RFC?

Jason

^ permalink raw reply

* Re: [PATCH 02/25] dma-fence: prime lockdep annotations
From: Jason Gunthorpe @ 2020-07-10 14:23 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Christian König, DRI Development, Intel Graphics Development,
	linux-rdma, Felix Kuehling, kernel test robot,
	Thomas Hellström, Mika Kuoppala,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Chris Wilson, Maarten Lankhorst, Daniel Vetter
In-Reply-To: <CAKMK7uGSUULTmL=bDXty6U4e37dzLHzu7wgUyOxo2CvR9KvXGg@mail.gmail.com>

On Fri, Jul 10, 2020 at 04:02:35PM +0200, Daniel Vetter wrote:

> > dma_fence only possibly makes some sense if you intend to expose the
> > completion outside a single driver.
> >
> > The prefered kernel design pattern for this is to connect things with
> > a function callback.
> >
> > So the actual use case of dma_fence is quite narrow and tightly linked
> > to DRM.
> >
> > I don't think we should spread this beyond DRM, I can't see a reason.
> 
> Yeah v4l has a legit reason to use dma_fence, android wants that
> there. 

'legit' in the sense the v4l is supposed to trigger stuff in DRM when
V4L DMA completes? I would still see that as part of DRM

Or is it building a parallel DRM like DMA completion graph?

> > Trying to improve performance of limited HW by using sketchy
> > techniques at the cost of general system stability should be a NAK.
>
> Well that's pretty much gpu drivers, all the horrors for a bit more speed :-)
> 
> On the text itself, should I upgrade to "must not" instead of "should
> not"? Or more needed?

Fundamentally having some unknowable graph of dependencies where parts
of the graph can be placed in critical regions like notifiers is a
complete maintenance nightmare.

I think building systems like this should be discouraged :\

Jason

^ permalink raw reply

* Re: [PATCH] SCSI RDMA PROTOCOL (SRP) TARGET: Replace HTTP links with HTTPS ones
From: Bart Van Assche @ 2020-07-10 14:22 UTC (permalink / raw)
  To: Alexander A. Klimov, dledford, jgg, linux-rdma, target-devel,
	linux-kernel
In-Reply-To: <20200709194820.27032-1-grandmaster@al2klimov.de>

On 2020-07-09 12:48, Alexander A. Klimov wrote:
> diff --git a/drivers/infiniband/ulp/srpt/Kconfig b/drivers/infiniband/ulp/srpt/Kconfig
> index 4b5d9b792cfa..f63b34d9ae32 100644
> --- a/drivers/infiniband/ulp/srpt/Kconfig
> +++ b/drivers/infiniband/ulp/srpt/Kconfig
> @@ -10,4 +10,4 @@ config INFINIBAND_SRPT
>  	  that supports the RDMA protocol. Currently the RDMA protocol is
>  	  supported by InfiniBand and by iWarp network hardware. More
>  	  information about the SRP protocol can be found on the website
> -	  of the INCITS T10 technical committee (http://www.t10.org/).
> +	  of the INCITS T10 technical committee (https://www.t10.org/).

It is not clear to me how modifying an URL in a Kconfig file helps to
reduce the attack surface on kernel devs?

Thanks,

Bart.



^ permalink raw reply

* Re: [PATCH RFC 2/3] RDMA/cm: Replace pr_debug() call sites with tracepoints
From: Chuck Lever @ 2020-07-10 14:10 UTC (permalink / raw)
  To: linux-rdma; +Cc: aron.silverton
In-Reply-To: <20200710140612.14749.47766.stgit@klimt.1015granger.net>



> On Jul 10, 2020, at 10:06 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> In the interest of converging on a common instrumentation
> infrastructure, modernize the pr_debug() call sites added by commit
> 119bf81793ea ("IB/cm: Add debug prints to ib_cm"). The new
> tracepoints appear in a new "ib_cma" subsystem.
> 
> The conversion is somewhat mechanical. Someone more familiar with
> the semantics of the recorded information might suggest additional
> data capture.
> 
> Some benefits include:
> - Tracepoints enable "always on" reporting of these errors
> - The error records are structured and compact
> - Tracepoints provide hooks for eBPF scripts
> 
> Sample output:
> 
>            nfsd-1954  [003]    62.017901: cm_send_dreq_err:     local_id=1998890974 remote_id=1129750393 state=DREQ_RCVD lap_state=LAP_UNINIT

Oops. I renamed this trace point cm_dreq_skipped and neglected
to update the patch description.


> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> drivers/infiniband/core/Makefile   |    2 
> drivers/infiniband/core/cm.c       |   80 +++------
> drivers/infiniband/core/cm_trace.c |   15 ++
> drivers/infiniband/core/cm_trace.h |  309 ++++++++++++++++++++++++++++++++++++
> 4 files changed, 351 insertions(+), 55 deletions(-)
> create mode 100644 drivers/infiniband/core/cm_trace.c
> create mode 100644 drivers/infiniband/core/cm_trace.h
> 
> diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> index 24cb71a16a28..ccf2670ef45e 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -17,7 +17,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
> ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
> ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
> 
> -ib_cm-y :=			cm.o
> +ib_cm-y :=			cm.o cm_trace.o
> 
> iw_cm-y :=			iwcm.o iwpm_util.o iwpm_msg.o
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 0d1377232933..8dd8039e1a02 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -27,6 +27,7 @@
> #include <rdma/ib_cm.h>
> #include "cm_msgs.h"
> #include "core_priv.h"
> +#include "cm_trace.h"
> 
> MODULE_AUTHOR("Sean Hefty");
> MODULE_DESCRIPTION("InfiniBand CM");
> @@ -2124,8 +2125,7 @@ static int cm_req_handler(struct cm_work *work)
> 
> 	listen_cm_id_priv = cm_match_req(work, cm_id_priv);
> 	if (!listen_cm_id_priv) {
> -		pr_debug("%s: local_id %d, no listen_cm_id_priv\n", __func__,
> -			 be32_to_cpu(cm_id_priv->id.local_id));
> +		trace_cm_no_listener_err(&cm_id_priv->id);
> 		cm_id_priv->id.state = IB_CM_IDLE;
> 		ret = -EINVAL;
> 		goto destroy;
> @@ -2274,8 +2274,7 @@ int ib_send_cm_rep(struct ib_cm_id *cm_id,
> 	spin_lock_irqsave(&cm_id_priv->lock, flags);
> 	if (cm_id->state != IB_CM_REQ_RCVD &&
> 	    cm_id->state != IB_CM_MRA_REQ_SENT) {
> -		pr_debug("%s: local_comm_id %d, cm_id->state: %d\n", __func__,
> -			 be32_to_cpu(cm_id_priv->id.local_id), cm_id->state);
> +		trace_cm_send_rep_err(cm_id_priv->id.local_id, cm_id->state);
> 		ret = -EINVAL;
> 		goto out;
> 	}
> @@ -2348,8 +2347,7 @@ int ib_send_cm_rtu(struct ib_cm_id *cm_id,
> 	spin_lock_irqsave(&cm_id_priv->lock, flags);
> 	if (cm_id->state != IB_CM_REP_RCVD &&
> 	    cm_id->state != IB_CM_MRA_REP_SENT) {
> -		pr_debug("%s: local_id %d, cm_id->state %d\n", __func__,
> -			 be32_to_cpu(cm_id->local_id), cm_id->state);
> +		trace_cm_send_cm_rtu_err(cm_id);
> 		ret = -EINVAL;
> 		goto error;
> 	}
> @@ -2465,7 +2463,7 @@ static int cm_rep_handler(struct cm_work *work)
> 		cpu_to_be32(IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg)), 0);
> 	if (!cm_id_priv) {
> 		cm_dup_rep_handler(work);
> -		pr_debug("%s: remote_comm_id %d, no cm_id_priv\n", __func__,
> +		trace_cm_remote_no_priv_err(
> 			 IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
> 		return -EINVAL;
> 	}
> @@ -2479,11 +2477,10 @@ static int cm_rep_handler(struct cm_work *work)
> 		break;
> 	default:
> 		ret = -EINVAL;
> -		pr_debug(
> -			"%s: cm_id_priv->id.state: %d, local_comm_id %d, remote_comm_id %d\n",
> -			__func__, cm_id_priv->id.state,
> +		trace_cm_rep_unknown_err(
> 			IBA_GET(CM_REP_LOCAL_COMM_ID, rep_msg),
> -			IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
> +			IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg),
> +			cm_id_priv->id.state);
> 		spin_unlock_irq(&cm_id_priv->lock);
> 		goto error;
> 	}
> @@ -2500,7 +2497,7 @@ static int cm_rep_handler(struct cm_work *work)
> 		spin_unlock(&cm.lock);
> 		spin_unlock_irq(&cm_id_priv->lock);
> 		ret = -EINVAL;
> -		pr_debug("%s: Failed to insert remote id %d\n", __func__,
> +		trace_cm_insert_failed_err(
> 			 IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
> 		goto error;
> 	}
> @@ -2517,9 +2514,8 @@ static int cm_rep_handler(struct cm_work *work)
> 			     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REP,
> 			     NULL, 0);
> 		ret = -EINVAL;
> -		pr_debug(
> -			"%s: Stale connection. local_comm_id %d, remote_comm_id %d\n",
> -			__func__, IBA_GET(CM_REP_LOCAL_COMM_ID, rep_msg),
> +		trace_cm_staleconn_err(
> +			IBA_GET(CM_REP_LOCAL_COMM_ID, rep_msg),
> 			IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
> 
> 		if (cur_cm_id_priv) {
> @@ -2646,9 +2642,7 @@ static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
> 		return -EINVAL;
> 
> 	if (cm_id_priv->id.state != IB_CM_ESTABLISHED) {
> -		pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
> -			 be32_to_cpu(cm_id_priv->id.local_id),
> -			 cm_id_priv->id.state);
> +		trace_cm_dreq_skipped(&cm_id_priv->id);
> 		return -EINVAL;
> 	}
> 
> @@ -2722,10 +2716,7 @@ static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
> 		return -EINVAL;
> 
> 	if (cm_id_priv->id.state != IB_CM_DREQ_RCVD) {
> -		pr_debug(
> -			"%s: local_id %d, cm_idcm_id->state(%d) != IB_CM_DREQ_RCVD\n",
> -			__func__, be32_to_cpu(cm_id_priv->id.local_id),
> -			cm_id_priv->id.state);
> +		trace_cm_send_drep_err(&cm_id_priv->id);
> 		kfree(private_data);
> 		return -EINVAL;
> 	}
> @@ -2810,9 +2801,8 @@ static int cm_dreq_handler(struct cm_work *work)
> 		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
> 				counter[CM_DREQ_COUNTER]);
> 		cm_issue_drep(work->port, work->mad_recv_wc);
> -		pr_debug(
> -			"%s: no cm_id_priv, local_comm_id %d, remote_comm_id %d\n",
> -			__func__, IBA_GET(CM_DREQ_LOCAL_COMM_ID, dreq_msg),
> +		trace_cm_no_priv_err(
> +			IBA_GET(CM_DREQ_LOCAL_COMM_ID, dreq_msg),
> 			IBA_GET(CM_DREQ_REMOTE_COMM_ID, dreq_msg));
> 		return -EINVAL;
> 	}
> @@ -2858,9 +2848,7 @@ static int cm_dreq_handler(struct cm_work *work)
> 				counter[CM_DREQ_COUNTER]);
> 		goto unlock;
> 	default:
> -		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
> -			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
> -			 cm_id_priv->id.state);
> +		trace_cm_dreq_unknown_err(&cm_id_priv->id);
> 		goto unlock;
> 	}
> 	cm_id_priv->id.state = IB_CM_DREQ_RCVD;
> @@ -2945,9 +2933,7 @@ static int cm_send_rej_locked(struct cm_id_private *cm_id_priv,
> 			      state);
> 		break;
> 	default:
> -		pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
> -			 be32_to_cpu(cm_id_priv->id.local_id),
> -			 cm_id_priv->id.state);
> +		trace_cm_send_unknown_rej_err(&cm_id_priv->id);
> 		return -EINVAL;
> 	}
> 
> @@ -3060,9 +3046,7 @@ static int cm_rej_handler(struct cm_work *work)
> 		}
> 		/* fall through */
> 	default:
> -		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
> -			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
> -			 cm_id_priv->id.state);
> +		trace_cm_rej_unknown_err(&cm_id_priv->id);
> 		spin_unlock_irq(&cm_id_priv->lock);
> 		goto out;
> 	}
> @@ -3118,9 +3102,7 @@ int ib_send_cm_mra(struct ib_cm_id *cm_id,
> 		}
> 		/* fall through */
> 	default:
> -		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
> -			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
> -			 cm_id_priv->id.state);
> +		trace_cm_send_mra_unknown_err(&cm_id_priv->id);
> 		ret = -EINVAL;
> 		goto error1;
> 	}
> @@ -3229,9 +3211,7 @@ static int cm_mra_handler(struct cm_work *work)
> 				counter[CM_MRA_COUNTER]);
> 		/* fall through */
> 	default:
> -		pr_debug("%s local_id %d, cm_id_priv->id.state: %d\n",
> -			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
> -			 cm_id_priv->id.state);
> +		trace_cm_mra_unknown_err(&cm_id_priv->id);
> 		goto out;
> 	}
> 
> @@ -3765,8 +3745,7 @@ static void cm_process_send_error(struct ib_mad_send_buf *msg,
> 	if (msg != cm_id_priv->msg || state != cm_id_priv->id.state)
> 		goto discard;
> 
> -	pr_debug_ratelimited("CM: failed sending MAD in state %d. (%s)\n",
> -			     state, ib_wc_status_msg(wc_status));
> +	trace_cm_mad_send_err(state, wc_status);
> 	switch (state) {
> 	case IB_CM_REQ_SENT:
> 	case IB_CM_MRA_REQ_RCVD:
> @@ -3889,7 +3868,7 @@ static void cm_work_handler(struct work_struct *_work)
> 		ret = cm_timewait_handler(work);
> 		break;
> 	default:
> -		pr_debug("cm_event.event: 0x%x\n", work->cm_event.event);
> +		trace_cm_handler_err(work->cm_event.event);
> 		ret = -EINVAL;
> 		break;
> 	}
> @@ -3925,8 +3904,7 @@ static int cm_establish(struct ib_cm_id *cm_id)
> 		ret = -EISCONN;
> 		break;
> 	default:
> -		pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
> -			 be32_to_cpu(cm_id->local_id), cm_id->state);
> +		trace_cm_establish_err(cm_id);
> 		ret = -EINVAL;
> 		break;
> 	}
> @@ -4123,9 +4101,7 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
> 		ret = 0;
> 		break;
> 	default:
> -		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
> -			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
> -			 cm_id_priv->id.state);
> +		trace_cm_qp_init_err(&cm_id_priv->id);
> 		ret = -EINVAL;
> 		break;
> 	}
> @@ -4173,9 +4149,7 @@ static int cm_init_qp_rtr_attr(struct cm_id_private *cm_id_priv,
> 		ret = 0;
> 		break;
> 	default:
> -		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
> -			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
> -			 cm_id_priv->id.state);
> +		trace_cm_qp_rtr_err(&cm_id_priv->id);
> 		ret = -EINVAL;
> 		break;
> 	}
> @@ -4235,9 +4209,7 @@ static int cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv,
> 		ret = 0;
> 		break;
> 	default:
> -		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
> -			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
> -			 cm_id_priv->id.state);
> +		trace_cm_qp_rts_err(&cm_id_priv->id);
> 		ret = -EINVAL;
> 		break;
> 	}
> diff --git a/drivers/infiniband/core/cm_trace.c b/drivers/infiniband/core/cm_trace.c
> new file mode 100644
> index 000000000000..8f3482f66338
> --- /dev/null
> +++ b/drivers/infiniband/core/cm_trace.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Trace points for the IB Connection Manager.
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2020, Oracle and/or its affiliates.
> + */
> +
> +#include <rdma/rdma_cm.h>
> +#include "cma_priv.h"
> +
> +#define CREATE_TRACE_POINTS
> +
> +#include "cm_trace.h"
> diff --git a/drivers/infiniband/core/cm_trace.h b/drivers/infiniband/core/cm_trace.h
> new file mode 100644
> index 000000000000..84f65f597e34
> --- /dev/null
> +++ b/drivers/infiniband/core/cm_trace.h
> @@ -0,0 +1,309 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Trace point definitions for the RDMA Connect Manager.
> + *
> + * Author: Chuck Lever <chuck.lever@oracle.com>
> + *
> + * Copyright (c) 2020 Oracle and/or its affiliates.
> + */
> +
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM ib_cma
> +
> +#if !defined(_TRACE_IB_CMA_H) || defined(TRACE_HEADER_MULTI_READ)
> +
> +#define _TRACE_IB_CMA_H
> +
> +#include <linux/tracepoint.h>
> +#include <rdma/ib_cm.h>
> +#include <trace/events/rdma.h>
> +
> +/*
> + * enum ib_cm_state, from include/rdma/ib_cm.h
> + */
> +#define IB_CM_STATE_LIST					\
> +	ib_cm_state(IDLE)					\
> +	ib_cm_state(LISTEN)					\
> +	ib_cm_state(REQ_SENT)					\
> +	ib_cm_state(REQ_RCVD)					\
> +	ib_cm_state(MRA_REQ_SENT)				\
> +	ib_cm_state(MRA_REQ_RCVD)				\
> +	ib_cm_state(REP_SENT)					\
> +	ib_cm_state(REP_RCVD)					\
> +	ib_cm_state(MRA_REP_SENT)				\
> +	ib_cm_state(MRA_REP_RCVD)				\
> +	ib_cm_state(ESTABLISHED)				\
> +	ib_cm_state(DREQ_SENT)					\
> +	ib_cm_state(DREQ_RCVD)					\
> +	ib_cm_state(TIMEWAIT)					\
> +	ib_cm_state(SIDR_REQ_SENT)				\
> +	ib_cm_state_end(SIDR_REQ_RCVD)
> +
> +#undef  ib_cm_state
> +#undef  ib_cm_state_end
> +#define ib_cm_state(x)		TRACE_DEFINE_ENUM(IB_CM_##x);
> +#define ib_cm_state_end(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
> +
> +IB_CM_STATE_LIST
> +
> +#undef  ib_cm_state
> +#undef  ib_cm_state_end
> +#define ib_cm_state(x)		{ IB_CM_##x, #x },
> +#define ib_cm_state_end(x)	{ IB_CM_##x, #x }
> +
> +#define show_ib_cm_state(x) \
> +		__print_symbolic(x, IB_CM_STATE_LIST)
> +
> +/*
> + * enum ib_cm_lap_state, from include/rdma/ib_cm.h
> + */
> +#define IB_CM_LAP_STATE_LIST					\
> +	ib_cm_lap_state(LAP_UNINIT)				\
> +	ib_cm_lap_state(LAP_IDLE)				\
> +	ib_cm_lap_state(LAP_SENT)				\
> +	ib_cm_lap_state(LAP_RCVD)				\
> +	ib_cm_lap_state(MRA_LAP_SENT)				\
> +	ib_cm_lap_state_end(MRA_LAP_RCVD)
> +
> +#undef  ib_cm_lap_state
> +#undef  ib_cm_lap_state_end
> +#define ib_cm_lap_state(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
> +#define ib_cm_lap_state_end(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
> +
> +IB_CM_LAP_STATE_LIST
> +
> +#undef  ib_cm_lap_state
> +#undef  ib_cm_lap_state_end
> +#define ib_cm_lap_state(x)	{ IB_CM_##x, #x },
> +#define ib_cm_lap_state_end(x)	{ IB_CM_##x, #x }
> +
> +#define show_ib_cm_lap_state(x) \
> +		__print_symbolic(x, IB_CM_LAP_STATE_LIST)
> +
> +
> +DECLARE_EVENT_CLASS(cm_id_class,
> +	TP_PROTO(
> +		const struct ib_cm_id *cm_id
> +	),
> +
> +	TP_ARGS(cm_id),
> +
> +	TP_STRUCT__entry(
> +		__field(const void *, cm_id)	/* for eBPF scripts */
> +		__field(unsigned int, local_id)
> +		__field(unsigned int, remote_id)
> +		__field(unsigned long, state)
> +		__field(unsigned long, lap_state)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->cm_id = cm_id;
> +		__entry->local_id = be32_to_cpu(cm_id->local_id);
> +		__entry->remote_id = be32_to_cpu(cm_id->remote_id);
> +		__entry->state = cm_id->state;
> +		__entry->lap_state = cm_id->lap_state;
> +	),
> +
> +	TP_printk("local_id=%u remote_id=%u state=%s lap_state=%s",
> +		__entry->local_id, __entry->remote_id,
> +		show_ib_cm_state(__entry->state),
> +		show_ib_cm_lap_state(__entry->lap_state)
> +	)
> +);
> +
> +#define DEFINE_CM_ERR_EVENT(name)					\
> +		DEFINE_EVENT(cm_id_class,				\
> +				cm_##name##_err,			\
> +				TP_PROTO(				\
> +					const struct ib_cm_id *cm_id	\
> +				),					\
> +				TP_ARGS(cm_id))
> +
> +DEFINE_CM_ERR_EVENT(send_cm_rtu);
> +DEFINE_CM_ERR_EVENT(establish);
> +DEFINE_CM_ERR_EVENT(no_listener);
> +DEFINE_CM_ERR_EVENT(send_drep);
> +DEFINE_CM_ERR_EVENT(dreq_unknown);
> +DEFINE_CM_ERR_EVENT(send_unknown_rej);
> +DEFINE_CM_ERR_EVENT(rej_unknown);
> +DEFINE_CM_ERR_EVENT(send_mra_unknown);
> +DEFINE_CM_ERR_EVENT(mra_unknown);
> +DEFINE_CM_ERR_EVENT(qp_init);
> +DEFINE_CM_ERR_EVENT(qp_rtr);
> +DEFINE_CM_ERR_EVENT(qp_rts);
> +
> +DEFINE_EVENT(cm_id_class,						\
> +	cm_dreq_skipped,						\
> +	TP_PROTO(							\
> +		const struct ib_cm_id *cm_id				\
> +	),								\
> +	TP_ARGS(cm_id)							\
> +);
> +
> +DECLARE_EVENT_CLASS(cm_local_class,
> +	TP_PROTO(
> +		unsigned int local_id,
> +		unsigned int remote_id
> +	),
> +
> +	TP_ARGS(local_id, remote_id),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned int, local_id)
> +		__field(unsigned int, remote_id)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->local_id = local_id;
> +		__entry->remote_id = remote_id;
> +	),
> +
> +	TP_printk("local_id=%u remote_id=%u",
> +		__entry->local_id, __entry->remote_id
> +	)
> +);
> +
> +#define DEFINE_CM_LOCAL_EVENT(name)					\
> +		DEFINE_EVENT(cm_local_class,				\
> +				cm_##name,				\
> +				TP_PROTO(				\
> +					unsigned int local_id,			\
> +					unsigned int remote_id			\
> +				),					\
> +				TP_ARGS(local_id, remote_id))
> +
> +DEFINE_CM_LOCAL_EVENT(staleconn_err);
> +DEFINE_CM_LOCAL_EVENT(no_priv_err);
> +
> +DECLARE_EVENT_CLASS(cm_remote_class,
> +	TP_PROTO(
> +		u32 remote_id
> +	),
> +
> +	TP_ARGS(remote_id),
> +
> +	TP_STRUCT__entry(
> +		__field(u32, remote_id)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->remote_id = remote_id;
> +	),
> +
> +	TP_printk("remote_id=%u",
> +		__entry->remote_id
> +	)
> +);
> +
> +#define DEFINE_CM_REMOTE_EVENT(name)					\
> +		DEFINE_EVENT(cm_remote_class,				\
> +				cm_##name,				\
> +				TP_PROTO(				\
> +					u32 remote_id			\
> +				),					\
> +				TP_ARGS(remote_id))
> +
> +DEFINE_CM_REMOTE_EVENT(remote_no_priv_err);
> +DEFINE_CM_REMOTE_EVENT(insert_failed_err);
> +
> +TRACE_EVENT(cm_send_rep_err,
> +	TP_PROTO(
> +		__be32 local_id,
> +		enum ib_cm_state state
> +	),
> +
> +	TP_ARGS(local_id, state),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned int, local_id)
> +		__field(unsigned long, state)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->local_id = be32_to_cpu(local_id);
> +		__entry->state = state;
> +	),
> +
> +	TP_printk("local_id=%u state=%s",
> +		__entry->local_id, show_ib_cm_state(__entry->state)
> +	)
> +);
> +
> +TRACE_EVENT(cm_rep_unknown_err,
> +	TP_PROTO(
> +		unsigned int local_id,
> +		unsigned int remote_id,
> +		enum ib_cm_state state
> +	),
> +
> +	TP_ARGS(local_id, remote_id, state),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned int, local_id)
> +		__field(unsigned int, remote_id)
> +		__field(unsigned long, state)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->local_id = local_id;
> +		__entry->remote_id = remote_id;
> +		__entry->state = state;
> +	),
> +
> +	TP_printk("local_id=%u remote_id=%u state=%s",
> +		__entry->local_id, __entry->remote_id,
> +		show_ib_cm_state(__entry->state)
> +	)
> +);
> +
> +TRACE_EVENT(cm_handler_err,
> +	TP_PROTO(
> +		enum ib_event_type event
> +	),
> +
> +	TP_ARGS(event),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, event)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->event = event;
> +	),
> +
> +	TP_printk("unhandled event=%s",
> +		rdma_show_ib_event(__entry->event)
> +	)
> +);
> +
> +TRACE_EVENT(cm_mad_send_err,
> +	TP_PROTO(
> +		enum ib_cm_state state,
> +		enum ib_wc_status wc_status
> +	),
> +
> +	TP_ARGS(state, wc_status),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, state)
> +		__field(unsigned long, wc_status)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->state = state;
> +		__entry->wc_status = wc_status;
> +	),
> +
> +	TP_printk("state=%s completion status=%s",
> +		show_ib_cm_state(__entry->state),
> +		rdma_show_wc_status(__entry->wc_status)
> +	)
> +);
> +
> +#endif /* _TRACE_IB_CMA_H */
> +
> +#undef TRACE_INCLUDE_PATH
> +#define TRACE_INCLUDE_PATH .
> +#define TRACE_INCLUDE_FILE cm_trace
> +
> +#include <trace/define_trace.h>
> 

--
Chuck Lever




^ permalink raw reply

* [PATCH RFC 3/3] RDMA/cm: Add tracepoints to track MAD send operations
From: Chuck Lever @ 2020-07-10 14:06 UTC (permalink / raw)
  To: linux-rdma; +Cc: aron.silverton
In-Reply-To: <20200710135812.14749.4630.stgit@klimt.1015granger.net>

Surface the operation of MAD exchanges during connection
establishment. Some samples:

[root@klimt ~]# trace-cmd report -F ib_cma
cpus=4
     kworker/0:4-123   [000]    60.677388: cm_send_rep:          local_id=1965336542 remote_id=1096195961 state=REQ_RCVD lap_state=LAP_UNINIT
   kworker/u8:11-391   [002]    60.678808: cm_send_req:          local_id=1982113758 remote_id=0 state=IDLE lap_state=LAP_UNINIT
     kworker/0:4-123   [000]    60.679652: cm_send_rtu:          local_id=1982113758 remote_id=1079418745 state=REP_RCVD lap_state=LAP_UNINIT
            nfsd-1954  [001]    60.691350: cm_send_rep:          local_id=1998890974 remote_id=1129750393 state=MRA_REQ_SENT lap_state=LAP_UNINIT
            nfsd-1954  [003]    62.017931: cm_send_drep:         local_id=1998890974 remote_id=1129750393 state=TIMEWAIT lap_state=LAP_UNINIT

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/infiniband/core/cm.c       |   22 +++++++-
 drivers/infiniband/core/cm_trace.h |  105 ++++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 8dd8039e1a02..ace3835da0cd 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1564,6 +1564,7 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
 	cm_id_priv->local_qpn = cpu_to_be32(IBA_GET(CM_REQ_LOCAL_QPN, req_msg));
 	cm_id_priv->rq_psn = cpu_to_be32(IBA_GET(CM_REQ_STARTING_PSN, req_msg));
 
+	trace_cm_send_req(&cm_id_priv->id);
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
 	ret = ib_post_send_mad(cm_id_priv->msg, NULL);
 	if (ret) {
@@ -1611,6 +1612,9 @@ static int cm_issue_rej(struct cm_port *port,
 		IBA_SET_MEM(CM_REJ_ARI, rej_msg, ari, ari_length);
 	}
 
+	trace_cm_issue_rej(
+		IBA_GET(CM_REJ_LOCAL_COMM_ID, rcv_msg),
+		IBA_GET(CM_REJ_REMOTE_COMM_ID, rcv_msg));
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret)
 		cm_free_msg(msg);
@@ -1962,6 +1966,7 @@ static void cm_dup_req_handler(struct cm_work *work,
 	}
 	spin_unlock_irq(&cm_id_priv->lock);
 
+	trace_cm_send_dup_req(&cm_id_priv->id);
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret)
 		goto free;
@@ -2288,6 +2293,7 @@ int ib_send_cm_rep(struct ib_cm_id *cm_id,
 	msg->timeout_ms = cm_id_priv->timeout_ms;
 	msg->context[1] = (void *) (unsigned long) IB_CM_REP_SENT;
 
+	trace_cm_send_rep(cm_id);
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
@@ -2359,6 +2365,7 @@ int ib_send_cm_rtu(struct ib_cm_id *cm_id,
 	cm_format_rtu((struct cm_rtu_msg *) msg->mad, cm_id_priv,
 		      private_data, private_data_len);
 
+	trace_cm_send_rtu(cm_id);
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
@@ -2440,6 +2447,7 @@ static void cm_dup_rep_handler(struct cm_work *work)
 		goto unlock;
 	spin_unlock_irq(&cm_id_priv->lock);
 
+	trace_cm_send_dup_rep(&cm_id_priv->id);
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret)
 		goto free;
@@ -2661,6 +2669,7 @@ static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
 	msg->timeout_ms = cm_id_priv->timeout_ms;
 	msg->context[1] = (void *) (unsigned long) IB_CM_DREQ_SENT;
 
+	trace_cm_send_dreq(&cm_id_priv->id);
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
 		cm_enter_timewait(cm_id_priv);
@@ -2731,6 +2740,7 @@ static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
 	cm_format_drep((struct cm_drep_msg *) msg->mad, cm_id_priv,
 		       private_data, private_data_len);
 
+	trace_cm_send_drep(&cm_id_priv->id);
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
 		cm_free_msg(msg);
@@ -2780,6 +2790,9 @@ static int cm_issue_drep(struct cm_port *port,
 	IBA_SET(CM_DREP_LOCAL_COMM_ID, drep_msg,
 		IBA_GET(CM_DREQ_REMOTE_COMM_ID, dreq_msg));
 
+	trace_cm_issue_drep(
+		IBA_GET(CM_DREQ_LOCAL_COMM_ID, dreq_msg),
+		IBA_GET(CM_DREQ_REMOTE_COMM_ID, dreq_msg));
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret)
 		cm_free_msg(msg);
@@ -2937,6 +2950,7 @@ static int cm_send_rej_locked(struct cm_id_private *cm_id_priv,
 		return -EINVAL;
 	}
 
+	trace_cm_send_rej(&cm_id_priv->id, reason);
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
 		cm_free_msg(msg);
@@ -3115,6 +3129,7 @@ int ib_send_cm_mra(struct ib_cm_id *cm_id,
 		cm_format_mra((struct cm_mra_msg *) msg->mad, cm_id_priv,
 			      msg_response, service_timeout,
 			      private_data, private_data_len);
+		trace_cm_send_mra(cm_id);
 		ret = ib_post_send_mad(msg, NULL);
 		if (ret)
 			goto error2;
@@ -3485,10 +3500,12 @@ int ib_send_cm_sidr_req(struct ib_cm_id *cm_id,
 	msg->context[1] = (void *) (unsigned long) IB_CM_SIDR_REQ_SENT;
 
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
-	if (cm_id->state == IB_CM_IDLE)
+	if (cm_id->state == IB_CM_IDLE) {
+		trace_cm_send_sidr_req(&cm_id_priv->id);
 		ret = ib_post_send_mad(msg, NULL);
-	else
+	} else {
 		ret = -EINVAL;
+	}
 
 	if (ret) {
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
@@ -3650,6 +3667,7 @@ static int cm_send_sidr_rep_locked(struct cm_id_private *cm_id_priv,
 
 	cm_format_sidr_rep((struct cm_sidr_rep_msg *) msg->mad, cm_id_priv,
 			   param);
+	trace_cm_send_sidr_rep(&cm_id_priv->id);
 	ret = ib_post_send_mad(msg, NULL);
 	if (ret) {
 		cm_free_msg(msg);
diff --git a/drivers/infiniband/core/cm_trace.h b/drivers/infiniband/core/cm_trace.h
index 84f65f597e34..100e38a3d489 100644
--- a/drivers/infiniband/core/cm_trace.h
+++ b/drivers/infiniband/core/cm_trace.h
@@ -80,6 +80,59 @@ IB_CM_LAP_STATE_LIST
 #define show_ib_cm_lap_state(x) \
 		__print_symbolic(x, IB_CM_LAP_STATE_LIST)
 
+/*
+ * enum ib_cm_rej_reason, from include/rdma/ib_cm.h
+ */
+#define IB_CM_REJ_REASON_LIST					\
+	ib_cm_rej_reason(REJ_NO_QP)				\
+	ib_cm_rej_reason(REJ_NO_EEC)				\
+	ib_cm_rej_reason(REJ_NO_RESOURCES)			\
+	ib_cm_rej_reason(REJ_TIMEOUT)				\
+	ib_cm_rej_reason(REJ_UNSUPPORTED)			\
+	ib_cm_rej_reason(REJ_INVALID_COMM_ID)			\
+	ib_cm_rej_reason(REJ_INVALID_COMM_INSTANCE)		\
+	ib_cm_rej_reason(REJ_INVALID_SERVICE_ID)		\
+	ib_cm_rej_reason(REJ_INVALID_TRANSPORT_TYPE)		\
+	ib_cm_rej_reason(REJ_STALE_CONN)			\
+	ib_cm_rej_reason(REJ_RDC_NOT_EXIST)			\
+	ib_cm_rej_reason(REJ_INVALID_GID)			\
+	ib_cm_rej_reason(REJ_INVALID_LID)			\
+	ib_cm_rej_reason(REJ_INVALID_SL)			\
+	ib_cm_rej_reason(REJ_INVALID_TRAFFIC_CLASS)		\
+	ib_cm_rej_reason(REJ_INVALID_HOP_LIMIT)			\
+	ib_cm_rej_reason(REJ_INVALID_PACKET_RATE)		\
+	ib_cm_rej_reason(REJ_INVALID_ALT_GID)			\
+	ib_cm_rej_reason(REJ_INVALID_ALT_LID)			\
+	ib_cm_rej_reason(REJ_INVALID_ALT_SL)			\
+	ib_cm_rej_reason(REJ_INVALID_ALT_TRAFFIC_CLASS)		\
+	ib_cm_rej_reason(REJ_INVALID_ALT_HOP_LIMIT)		\
+	ib_cm_rej_reason(REJ_INVALID_ALT_PACKET_RATE)		\
+	ib_cm_rej_reason(REJ_PORT_CM_REDIRECT)			\
+	ib_cm_rej_reason(REJ_PORT_REDIRECT)			\
+	ib_cm_rej_reason(REJ_INVALID_MTU)			\
+	ib_cm_rej_reason(REJ_INSUFFICIENT_RESP_RESOURCES)	\
+	ib_cm_rej_reason(REJ_CONSUMER_DEFINED)			\
+	ib_cm_rej_reason(REJ_INVALID_RNR_RETRY)			\
+	ib_cm_rej_reason(REJ_DUPLICATE_LOCAL_COMM_ID)		\
+	ib_cm_rej_reason(REJ_INVALID_CLASS_VERSION)		\
+	ib_cm_rej_reason(REJ_INVALID_FLOW_LABEL)		\
+	ib_cm_rej_reason(REJ_INVALID_ALT_FLOW_LABEL)		\
+	ib_cm_rej_reason_end(REJ_VENDOR_OPTION_NOT_SUPPORTED)
+
+#undef  ib_cm_rej_reason
+#undef  ib_cm_rej_reason_end
+#define ib_cm_rej_reason(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
+#define ib_cm_rej_reason_end(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
+
+IB_CM_REJ_REASON_LIST
+
+#undef  ib_cm_rej_reason
+#undef  ib_cm_rej_reason_end
+#define ib_cm_rej_reason(x)	{ IB_CM_##x, #x },
+#define ib_cm_rej_reason_end(x)	{ IB_CM_##x, #x }
+
+#define show_ib_cm_rej_reason(x) \
+		__print_symbolic(x, IB_CM_REJ_REASON_LIST)
 
 DECLARE_EVENT_CLASS(cm_id_class,
 	TP_PROTO(
@@ -111,6 +164,56 @@ DECLARE_EVENT_CLASS(cm_id_class,
 	)
 );
 
+#define DEFINE_CM_SEND_EVENT(name)					\
+		DEFINE_EVENT(cm_id_class,				\
+				cm_send_##name,				\
+				TP_PROTO(				\
+					const struct ib_cm_id *cm_id	\
+				),					\
+				TP_ARGS(cm_id))
+
+DEFINE_CM_SEND_EVENT(req);
+DEFINE_CM_SEND_EVENT(rep);
+DEFINE_CM_SEND_EVENT(dup_req);
+DEFINE_CM_SEND_EVENT(dup_rep);
+DEFINE_CM_SEND_EVENT(rtu);
+DEFINE_CM_SEND_EVENT(mra);
+DEFINE_CM_SEND_EVENT(sidr_req);
+DEFINE_CM_SEND_EVENT(sidr_rep);
+DEFINE_CM_SEND_EVENT(dreq);
+DEFINE_CM_SEND_EVENT(drep);
+
+TRACE_EVENT(cm_send_rej,
+	TP_PROTO(
+		const struct ib_cm_id *cm_id,
+		enum ib_cm_rej_reason reason
+	),
+
+	TP_ARGS(cm_id, reason),
+
+	TP_STRUCT__entry(
+		__field(const void *, cm_id)
+		__field(u32, local_id)
+		__field(u32, remote_id)
+		__field(unsigned long, state)
+		__field(unsigned long, reason)
+	),
+
+	TP_fast_assign(
+		__entry->cm_id = cm_id;
+		__entry->local_id = be32_to_cpu(cm_id->local_id);
+		__entry->remote_id = be32_to_cpu(cm_id->remote_id);
+		__entry->state = cm_id->state;
+		__entry->reason = reason;
+	),
+
+	TP_printk("local_id=%u remote_id=%u state=%s reason=%s",
+		__entry->local_id, __entry->remote_id,
+		show_ib_cm_state(__entry->state),
+		show_ib_cm_rej_reason(__entry->reason)
+	)
+);
+
 #define DEFINE_CM_ERR_EVENT(name)					\
 		DEFINE_EVENT(cm_id_class,				\
 				cm_##name##_err,			\
@@ -172,6 +275,8 @@ DECLARE_EVENT_CLASS(cm_local_class,
 				),					\
 				TP_ARGS(local_id, remote_id))
 
+DEFINE_CM_LOCAL_EVENT(issue_rej);
+DEFINE_CM_LOCAL_EVENT(issue_drep);
 DEFINE_CM_LOCAL_EVENT(staleconn_err);
 DEFINE_CM_LOCAL_EVENT(no_priv_err);
 


^ permalink raw reply related

* [PATCH RFC 2/3] RDMA/cm: Replace pr_debug() call sites with tracepoints
From: Chuck Lever @ 2020-07-10 14:06 UTC (permalink / raw)
  To: linux-rdma; +Cc: aron.silverton
In-Reply-To: <20200710135812.14749.4630.stgit@klimt.1015granger.net>

In the interest of converging on a common instrumentation
infrastructure, modernize the pr_debug() call sites added by commit
119bf81793ea ("IB/cm: Add debug prints to ib_cm"). The new
tracepoints appear in a new "ib_cma" subsystem.

The conversion is somewhat mechanical. Someone more familiar with
the semantics of the recorded information might suggest additional
data capture.

Some benefits include:
- Tracepoints enable "always on" reporting of these errors
- The error records are structured and compact
- Tracepoints provide hooks for eBPF scripts

Sample output:

            nfsd-1954  [003]    62.017901: cm_send_dreq_err:     local_id=1998890974 remote_id=1129750393 state=DREQ_RCVD lap_state=LAP_UNINIT

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/infiniband/core/Makefile   |    2 
 drivers/infiniband/core/cm.c       |   80 +++------
 drivers/infiniband/core/cm_trace.c |   15 ++
 drivers/infiniband/core/cm_trace.h |  309 ++++++++++++++++++++++++++++++++++++
 4 files changed, 351 insertions(+), 55 deletions(-)
 create mode 100644 drivers/infiniband/core/cm_trace.c
 create mode 100644 drivers/infiniband/core/cm_trace.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 24cb71a16a28..ccf2670ef45e 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -17,7 +17,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
 
-ib_cm-y :=			cm.o
+ib_cm-y :=			cm.o cm_trace.o
 
 iw_cm-y :=			iwcm.o iwpm_util.o iwpm_msg.o
 
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 0d1377232933..8dd8039e1a02 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -27,6 +27,7 @@
 #include <rdma/ib_cm.h>
 #include "cm_msgs.h"
 #include "core_priv.h"
+#include "cm_trace.h"
 
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("InfiniBand CM");
@@ -2124,8 +2125,7 @@ static int cm_req_handler(struct cm_work *work)
 
 	listen_cm_id_priv = cm_match_req(work, cm_id_priv);
 	if (!listen_cm_id_priv) {
-		pr_debug("%s: local_id %d, no listen_cm_id_priv\n", __func__,
-			 be32_to_cpu(cm_id_priv->id.local_id));
+		trace_cm_no_listener_err(&cm_id_priv->id);
 		cm_id_priv->id.state = IB_CM_IDLE;
 		ret = -EINVAL;
 		goto destroy;
@@ -2274,8 +2274,7 @@ int ib_send_cm_rep(struct ib_cm_id *cm_id,
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
 	if (cm_id->state != IB_CM_REQ_RCVD &&
 	    cm_id->state != IB_CM_MRA_REQ_SENT) {
-		pr_debug("%s: local_comm_id %d, cm_id->state: %d\n", __func__,
-			 be32_to_cpu(cm_id_priv->id.local_id), cm_id->state);
+		trace_cm_send_rep_err(cm_id_priv->id.local_id, cm_id->state);
 		ret = -EINVAL;
 		goto out;
 	}
@@ -2348,8 +2347,7 @@ int ib_send_cm_rtu(struct ib_cm_id *cm_id,
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
 	if (cm_id->state != IB_CM_REP_RCVD &&
 	    cm_id->state != IB_CM_MRA_REP_SENT) {
-		pr_debug("%s: local_id %d, cm_id->state %d\n", __func__,
-			 be32_to_cpu(cm_id->local_id), cm_id->state);
+		trace_cm_send_cm_rtu_err(cm_id);
 		ret = -EINVAL;
 		goto error;
 	}
@@ -2465,7 +2463,7 @@ static int cm_rep_handler(struct cm_work *work)
 		cpu_to_be32(IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg)), 0);
 	if (!cm_id_priv) {
 		cm_dup_rep_handler(work);
-		pr_debug("%s: remote_comm_id %d, no cm_id_priv\n", __func__,
+		trace_cm_remote_no_priv_err(
 			 IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
 		return -EINVAL;
 	}
@@ -2479,11 +2477,10 @@ static int cm_rep_handler(struct cm_work *work)
 		break;
 	default:
 		ret = -EINVAL;
-		pr_debug(
-			"%s: cm_id_priv->id.state: %d, local_comm_id %d, remote_comm_id %d\n",
-			__func__, cm_id_priv->id.state,
+		trace_cm_rep_unknown_err(
 			IBA_GET(CM_REP_LOCAL_COMM_ID, rep_msg),
-			IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
+			IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg),
+			cm_id_priv->id.state);
 		spin_unlock_irq(&cm_id_priv->lock);
 		goto error;
 	}
@@ -2500,7 +2497,7 @@ static int cm_rep_handler(struct cm_work *work)
 		spin_unlock(&cm.lock);
 		spin_unlock_irq(&cm_id_priv->lock);
 		ret = -EINVAL;
-		pr_debug("%s: Failed to insert remote id %d\n", __func__,
+		trace_cm_insert_failed_err(
 			 IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
 		goto error;
 	}
@@ -2517,9 +2514,8 @@ static int cm_rep_handler(struct cm_work *work)
 			     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REP,
 			     NULL, 0);
 		ret = -EINVAL;
-		pr_debug(
-			"%s: Stale connection. local_comm_id %d, remote_comm_id %d\n",
-			__func__, IBA_GET(CM_REP_LOCAL_COMM_ID, rep_msg),
+		trace_cm_staleconn_err(
+			IBA_GET(CM_REP_LOCAL_COMM_ID, rep_msg),
 			IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
 
 		if (cur_cm_id_priv) {
@@ -2646,9 +2642,7 @@ static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
 		return -EINVAL;
 
 	if (cm_id_priv->id.state != IB_CM_ESTABLISHED) {
-		pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
-			 be32_to_cpu(cm_id_priv->id.local_id),
-			 cm_id_priv->id.state);
+		trace_cm_dreq_skipped(&cm_id_priv->id);
 		return -EINVAL;
 	}
 
@@ -2722,10 +2716,7 @@ static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
 		return -EINVAL;
 
 	if (cm_id_priv->id.state != IB_CM_DREQ_RCVD) {
-		pr_debug(
-			"%s: local_id %d, cm_idcm_id->state(%d) != IB_CM_DREQ_RCVD\n",
-			__func__, be32_to_cpu(cm_id_priv->id.local_id),
-			cm_id_priv->id.state);
+		trace_cm_send_drep_err(&cm_id_priv->id);
 		kfree(private_data);
 		return -EINVAL;
 	}
@@ -2810,9 +2801,8 @@ static int cm_dreq_handler(struct cm_work *work)
 		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
 				counter[CM_DREQ_COUNTER]);
 		cm_issue_drep(work->port, work->mad_recv_wc);
-		pr_debug(
-			"%s: no cm_id_priv, local_comm_id %d, remote_comm_id %d\n",
-			__func__, IBA_GET(CM_DREQ_LOCAL_COMM_ID, dreq_msg),
+		trace_cm_no_priv_err(
+			IBA_GET(CM_DREQ_LOCAL_COMM_ID, dreq_msg),
 			IBA_GET(CM_DREQ_REMOTE_COMM_ID, dreq_msg));
 		return -EINVAL;
 	}
@@ -2858,9 +2848,7 @@ static int cm_dreq_handler(struct cm_work *work)
 				counter[CM_DREQ_COUNTER]);
 		goto unlock;
 	default:
-		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
-			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
-			 cm_id_priv->id.state);
+		trace_cm_dreq_unknown_err(&cm_id_priv->id);
 		goto unlock;
 	}
 	cm_id_priv->id.state = IB_CM_DREQ_RCVD;
@@ -2945,9 +2933,7 @@ static int cm_send_rej_locked(struct cm_id_private *cm_id_priv,
 			      state);
 		break;
 	default:
-		pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
-			 be32_to_cpu(cm_id_priv->id.local_id),
-			 cm_id_priv->id.state);
+		trace_cm_send_unknown_rej_err(&cm_id_priv->id);
 		return -EINVAL;
 	}
 
@@ -3060,9 +3046,7 @@ static int cm_rej_handler(struct cm_work *work)
 		}
 		/* fall through */
 	default:
-		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
-			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
-			 cm_id_priv->id.state);
+		trace_cm_rej_unknown_err(&cm_id_priv->id);
 		spin_unlock_irq(&cm_id_priv->lock);
 		goto out;
 	}
@@ -3118,9 +3102,7 @@ int ib_send_cm_mra(struct ib_cm_id *cm_id,
 		}
 		/* fall through */
 	default:
-		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
-			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
-			 cm_id_priv->id.state);
+		trace_cm_send_mra_unknown_err(&cm_id_priv->id);
 		ret = -EINVAL;
 		goto error1;
 	}
@@ -3229,9 +3211,7 @@ static int cm_mra_handler(struct cm_work *work)
 				counter[CM_MRA_COUNTER]);
 		/* fall through */
 	default:
-		pr_debug("%s local_id %d, cm_id_priv->id.state: %d\n",
-			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
-			 cm_id_priv->id.state);
+		trace_cm_mra_unknown_err(&cm_id_priv->id);
 		goto out;
 	}
 
@@ -3765,8 +3745,7 @@ static void cm_process_send_error(struct ib_mad_send_buf *msg,
 	if (msg != cm_id_priv->msg || state != cm_id_priv->id.state)
 		goto discard;
 
-	pr_debug_ratelimited("CM: failed sending MAD in state %d. (%s)\n",
-			     state, ib_wc_status_msg(wc_status));
+	trace_cm_mad_send_err(state, wc_status);
 	switch (state) {
 	case IB_CM_REQ_SENT:
 	case IB_CM_MRA_REQ_RCVD:
@@ -3889,7 +3868,7 @@ static void cm_work_handler(struct work_struct *_work)
 		ret = cm_timewait_handler(work);
 		break;
 	default:
-		pr_debug("cm_event.event: 0x%x\n", work->cm_event.event);
+		trace_cm_handler_err(work->cm_event.event);
 		ret = -EINVAL;
 		break;
 	}
@@ -3925,8 +3904,7 @@ static int cm_establish(struct ib_cm_id *cm_id)
 		ret = -EISCONN;
 		break;
 	default:
-		pr_debug("%s: local_id %d, cm_id->state: %d\n", __func__,
-			 be32_to_cpu(cm_id->local_id), cm_id->state);
+		trace_cm_establish_err(cm_id);
 		ret = -EINVAL;
 		break;
 	}
@@ -4123,9 +4101,7 @@ static int cm_init_qp_init_attr(struct cm_id_private *cm_id_priv,
 		ret = 0;
 		break;
 	default:
-		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
-			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
-			 cm_id_priv->id.state);
+		trace_cm_qp_init_err(&cm_id_priv->id);
 		ret = -EINVAL;
 		break;
 	}
@@ -4173,9 +4149,7 @@ static int cm_init_qp_rtr_attr(struct cm_id_private *cm_id_priv,
 		ret = 0;
 		break;
 	default:
-		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
-			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
-			 cm_id_priv->id.state);
+		trace_cm_qp_rtr_err(&cm_id_priv->id);
 		ret = -EINVAL;
 		break;
 	}
@@ -4235,9 +4209,7 @@ static int cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv,
 		ret = 0;
 		break;
 	default:
-		pr_debug("%s: local_id %d, cm_id_priv->id.state: %d\n",
-			 __func__, be32_to_cpu(cm_id_priv->id.local_id),
-			 cm_id_priv->id.state);
+		trace_cm_qp_rts_err(&cm_id_priv->id);
 		ret = -EINVAL;
 		break;
 	}
diff --git a/drivers/infiniband/core/cm_trace.c b/drivers/infiniband/core/cm_trace.c
new file mode 100644
index 000000000000..8f3482f66338
--- /dev/null
+++ b/drivers/infiniband/core/cm_trace.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Trace points for the IB Connection Manager.
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2020, Oracle and/or its affiliates.
+ */
+
+#include <rdma/rdma_cm.h>
+#include "cma_priv.h"
+
+#define CREATE_TRACE_POINTS
+
+#include "cm_trace.h"
diff --git a/drivers/infiniband/core/cm_trace.h b/drivers/infiniband/core/cm_trace.h
new file mode 100644
index 000000000000..84f65f597e34
--- /dev/null
+++ b/drivers/infiniband/core/cm_trace.h
@@ -0,0 +1,309 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Trace point definitions for the RDMA Connect Manager.
+ *
+ * Author: Chuck Lever <chuck.lever@oracle.com>
+ *
+ * Copyright (c) 2020 Oracle and/or its affiliates.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ib_cma
+
+#if !defined(_TRACE_IB_CMA_H) || defined(TRACE_HEADER_MULTI_READ)
+
+#define _TRACE_IB_CMA_H
+
+#include <linux/tracepoint.h>
+#include <rdma/ib_cm.h>
+#include <trace/events/rdma.h>
+
+/*
+ * enum ib_cm_state, from include/rdma/ib_cm.h
+ */
+#define IB_CM_STATE_LIST					\
+	ib_cm_state(IDLE)					\
+	ib_cm_state(LISTEN)					\
+	ib_cm_state(REQ_SENT)					\
+	ib_cm_state(REQ_RCVD)					\
+	ib_cm_state(MRA_REQ_SENT)				\
+	ib_cm_state(MRA_REQ_RCVD)				\
+	ib_cm_state(REP_SENT)					\
+	ib_cm_state(REP_RCVD)					\
+	ib_cm_state(MRA_REP_SENT)				\
+	ib_cm_state(MRA_REP_RCVD)				\
+	ib_cm_state(ESTABLISHED)				\
+	ib_cm_state(DREQ_SENT)					\
+	ib_cm_state(DREQ_RCVD)					\
+	ib_cm_state(TIMEWAIT)					\
+	ib_cm_state(SIDR_REQ_SENT)				\
+	ib_cm_state_end(SIDR_REQ_RCVD)
+
+#undef  ib_cm_state
+#undef  ib_cm_state_end
+#define ib_cm_state(x)		TRACE_DEFINE_ENUM(IB_CM_##x);
+#define ib_cm_state_end(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
+
+IB_CM_STATE_LIST
+
+#undef  ib_cm_state
+#undef  ib_cm_state_end
+#define ib_cm_state(x)		{ IB_CM_##x, #x },
+#define ib_cm_state_end(x)	{ IB_CM_##x, #x }
+
+#define show_ib_cm_state(x) \
+		__print_symbolic(x, IB_CM_STATE_LIST)
+
+/*
+ * enum ib_cm_lap_state, from include/rdma/ib_cm.h
+ */
+#define IB_CM_LAP_STATE_LIST					\
+	ib_cm_lap_state(LAP_UNINIT)				\
+	ib_cm_lap_state(LAP_IDLE)				\
+	ib_cm_lap_state(LAP_SENT)				\
+	ib_cm_lap_state(LAP_RCVD)				\
+	ib_cm_lap_state(MRA_LAP_SENT)				\
+	ib_cm_lap_state_end(MRA_LAP_RCVD)
+
+#undef  ib_cm_lap_state
+#undef  ib_cm_lap_state_end
+#define ib_cm_lap_state(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
+#define ib_cm_lap_state_end(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
+
+IB_CM_LAP_STATE_LIST
+
+#undef  ib_cm_lap_state
+#undef  ib_cm_lap_state_end
+#define ib_cm_lap_state(x)	{ IB_CM_##x, #x },
+#define ib_cm_lap_state_end(x)	{ IB_CM_##x, #x }
+
+#define show_ib_cm_lap_state(x) \
+		__print_symbolic(x, IB_CM_LAP_STATE_LIST)
+
+
+DECLARE_EVENT_CLASS(cm_id_class,
+	TP_PROTO(
+		const struct ib_cm_id *cm_id
+	),
+
+	TP_ARGS(cm_id),
+
+	TP_STRUCT__entry(
+		__field(const void *, cm_id)	/* for eBPF scripts */
+		__field(unsigned int, local_id)
+		__field(unsigned int, remote_id)
+		__field(unsigned long, state)
+		__field(unsigned long, lap_state)
+	),
+
+	TP_fast_assign(
+		__entry->cm_id = cm_id;
+		__entry->local_id = be32_to_cpu(cm_id->local_id);
+		__entry->remote_id = be32_to_cpu(cm_id->remote_id);
+		__entry->state = cm_id->state;
+		__entry->lap_state = cm_id->lap_state;
+	),
+
+	TP_printk("local_id=%u remote_id=%u state=%s lap_state=%s",
+		__entry->local_id, __entry->remote_id,
+		show_ib_cm_state(__entry->state),
+		show_ib_cm_lap_state(__entry->lap_state)
+	)
+);
+
+#define DEFINE_CM_ERR_EVENT(name)					\
+		DEFINE_EVENT(cm_id_class,				\
+				cm_##name##_err,			\
+				TP_PROTO(				\
+					const struct ib_cm_id *cm_id	\
+				),					\
+				TP_ARGS(cm_id))
+
+DEFINE_CM_ERR_EVENT(send_cm_rtu);
+DEFINE_CM_ERR_EVENT(establish);
+DEFINE_CM_ERR_EVENT(no_listener);
+DEFINE_CM_ERR_EVENT(send_drep);
+DEFINE_CM_ERR_EVENT(dreq_unknown);
+DEFINE_CM_ERR_EVENT(send_unknown_rej);
+DEFINE_CM_ERR_EVENT(rej_unknown);
+DEFINE_CM_ERR_EVENT(send_mra_unknown);
+DEFINE_CM_ERR_EVENT(mra_unknown);
+DEFINE_CM_ERR_EVENT(qp_init);
+DEFINE_CM_ERR_EVENT(qp_rtr);
+DEFINE_CM_ERR_EVENT(qp_rts);
+
+DEFINE_EVENT(cm_id_class,						\
+	cm_dreq_skipped,						\
+	TP_PROTO(							\
+		const struct ib_cm_id *cm_id				\
+	),								\
+	TP_ARGS(cm_id)							\
+);
+
+DECLARE_EVENT_CLASS(cm_local_class,
+	TP_PROTO(
+		unsigned int local_id,
+		unsigned int remote_id
+	),
+
+	TP_ARGS(local_id, remote_id),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, local_id)
+		__field(unsigned int, remote_id)
+	),
+
+	TP_fast_assign(
+		__entry->local_id = local_id;
+		__entry->remote_id = remote_id;
+	),
+
+	TP_printk("local_id=%u remote_id=%u",
+		__entry->local_id, __entry->remote_id
+	)
+);
+
+#define DEFINE_CM_LOCAL_EVENT(name)					\
+		DEFINE_EVENT(cm_local_class,				\
+				cm_##name,				\
+				TP_PROTO(				\
+					unsigned int local_id,			\
+					unsigned int remote_id			\
+				),					\
+				TP_ARGS(local_id, remote_id))
+
+DEFINE_CM_LOCAL_EVENT(staleconn_err);
+DEFINE_CM_LOCAL_EVENT(no_priv_err);
+
+DECLARE_EVENT_CLASS(cm_remote_class,
+	TP_PROTO(
+		u32 remote_id
+	),
+
+	TP_ARGS(remote_id),
+
+	TP_STRUCT__entry(
+		__field(u32, remote_id)
+	),
+
+	TP_fast_assign(
+		__entry->remote_id = remote_id;
+	),
+
+	TP_printk("remote_id=%u",
+		__entry->remote_id
+	)
+);
+
+#define DEFINE_CM_REMOTE_EVENT(name)					\
+		DEFINE_EVENT(cm_remote_class,				\
+				cm_##name,				\
+				TP_PROTO(				\
+					u32 remote_id			\
+				),					\
+				TP_ARGS(remote_id))
+
+DEFINE_CM_REMOTE_EVENT(remote_no_priv_err);
+DEFINE_CM_REMOTE_EVENT(insert_failed_err);
+
+TRACE_EVENT(cm_send_rep_err,
+	TP_PROTO(
+		__be32 local_id,
+		enum ib_cm_state state
+	),
+
+	TP_ARGS(local_id, state),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, local_id)
+		__field(unsigned long, state)
+	),
+
+	TP_fast_assign(
+		__entry->local_id = be32_to_cpu(local_id);
+		__entry->state = state;
+	),
+
+	TP_printk("local_id=%u state=%s",
+		__entry->local_id, show_ib_cm_state(__entry->state)
+	)
+);
+
+TRACE_EVENT(cm_rep_unknown_err,
+	TP_PROTO(
+		unsigned int local_id,
+		unsigned int remote_id,
+		enum ib_cm_state state
+	),
+
+	TP_ARGS(local_id, remote_id, state),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, local_id)
+		__field(unsigned int, remote_id)
+		__field(unsigned long, state)
+	),
+
+	TP_fast_assign(
+		__entry->local_id = local_id;
+		__entry->remote_id = remote_id;
+		__entry->state = state;
+	),
+
+	TP_printk("local_id=%u remote_id=%u state=%s",
+		__entry->local_id, __entry->remote_id,
+		show_ib_cm_state(__entry->state)
+	)
+);
+
+TRACE_EVENT(cm_handler_err,
+	TP_PROTO(
+		enum ib_event_type event
+	),
+
+	TP_ARGS(event),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, event)
+	),
+
+	TP_fast_assign(
+		__entry->event = event;
+	),
+
+	TP_printk("unhandled event=%s",
+		rdma_show_ib_event(__entry->event)
+	)
+);
+
+TRACE_EVENT(cm_mad_send_err,
+	TP_PROTO(
+		enum ib_cm_state state,
+		enum ib_wc_status wc_status
+	),
+
+	TP_ARGS(state, wc_status),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, state)
+		__field(unsigned long, wc_status)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->wc_status = wc_status;
+	),
+
+	TP_printk("state=%s completion status=%s",
+		show_ib_cm_state(__entry->state),
+		rdma_show_wc_status(__entry->wc_status)
+	)
+);
+
+#endif /* _TRACE_IB_CMA_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE cm_trace
+
+#include <trace/define_trace.h>


^ permalink raw reply related

* [PATCH RFC 1/3] RDMA/core: Move the rdma_show_ib_cm_event() macro
From: Chuck Lever @ 2020-07-10 14:06 UTC (permalink / raw)
  To: linux-rdma; +Cc: aron.silverton
In-Reply-To: <20200710135812.14749.4630.stgit@klimt.1015granger.net>

Refactor: Make it globally available in the utilities header.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/infiniband/core/cma_trace.h |   40 ----------------------------------
 include/trace/events/rdma.h         |   41 ++++++++++++++++++++++++++++++++++-
 include/trace/events/rpcrdma.h      |    1 +
 3 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/drivers/infiniband/core/cma_trace.h b/drivers/infiniband/core/cma_trace.h
index e6e20c36c538..e45264267bcc 100644
--- a/drivers/infiniband/core/cma_trace.h
+++ b/drivers/infiniband/core/cma_trace.h
@@ -17,46 +17,6 @@
 #include <linux/tracepoint.h>
 #include <trace/events/rdma.h>
 
-/*
- * enum ib_cm_event_type, from include/rdma/ib_cm.h
- */
-#define IB_CM_EVENT_LIST			\
-	ib_cm_event(REQ_ERROR)			\
-	ib_cm_event(REQ_RECEIVED)		\
-	ib_cm_event(REP_ERROR)			\
-	ib_cm_event(REP_RECEIVED)		\
-	ib_cm_event(RTU_RECEIVED)		\
-	ib_cm_event(USER_ESTABLISHED)		\
-	ib_cm_event(DREQ_ERROR)			\
-	ib_cm_event(DREQ_RECEIVED)		\
-	ib_cm_event(DREP_RECEIVED)		\
-	ib_cm_event(TIMEWAIT_EXIT)		\
-	ib_cm_event(MRA_RECEIVED)		\
-	ib_cm_event(REJ_RECEIVED)		\
-	ib_cm_event(LAP_ERROR)			\
-	ib_cm_event(LAP_RECEIVED)		\
-	ib_cm_event(APR_RECEIVED)		\
-	ib_cm_event(SIDR_REQ_ERROR)		\
-	ib_cm_event(SIDR_REQ_RECEIVED)		\
-	ib_cm_event_end(SIDR_REP_RECEIVED)
-
-#undef ib_cm_event
-#undef ib_cm_event_end
-
-#define ib_cm_event(x)		TRACE_DEFINE_ENUM(IB_CM_##x);
-#define ib_cm_event_end(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
-
-IB_CM_EVENT_LIST
-
-#undef ib_cm_event
-#undef ib_cm_event_end
-
-#define ib_cm_event(x)		{ IB_CM_##x, #x },
-#define ib_cm_event_end(x)	{ IB_CM_##x, #x }
-
-#define rdma_show_ib_cm_event(x) \
-		__print_symbolic(x, IB_CM_EVENT_LIST)
-
 
 DECLARE_EVENT_CLASS(cma_fsm_class,
 	TP_PROTO(
diff --git a/include/trace/events/rdma.h b/include/trace/events/rdma.h
index aa19afc73a4e..81bb454fc288 100644
--- a/include/trace/events/rdma.h
+++ b/include/trace/events/rdma.h
@@ -6,7 +6,6 @@
 /*
  * enum ib_event_type, from include/rdma/ib_verbs.h
  */
-
 #define IB_EVENT_LIST				\
 	ib_event(CQ_ERR)			\
 	ib_event(QP_FATAL)			\
@@ -90,6 +89,46 @@ IB_WC_STATUS_LIST
 #define rdma_show_wc_status(x) \
 		__print_symbolic(x, IB_WC_STATUS_LIST)
 
+/*
+ * enum ib_cm_event_type, from include/rdma/ib_cm.h
+ */
+#define IB_CM_EVENT_LIST			\
+	ib_cm_event(REQ_ERROR)			\
+	ib_cm_event(REQ_RECEIVED)		\
+	ib_cm_event(REP_ERROR)			\
+	ib_cm_event(REP_RECEIVED)		\
+	ib_cm_event(RTU_RECEIVED)		\
+	ib_cm_event(USER_ESTABLISHED)		\
+	ib_cm_event(DREQ_ERROR)			\
+	ib_cm_event(DREQ_RECEIVED)		\
+	ib_cm_event(DREP_RECEIVED)		\
+	ib_cm_event(TIMEWAIT_EXIT)		\
+	ib_cm_event(MRA_RECEIVED)		\
+	ib_cm_event(REJ_RECEIVED)		\
+	ib_cm_event(LAP_ERROR)			\
+	ib_cm_event(LAP_RECEIVED)		\
+	ib_cm_event(APR_RECEIVED)		\
+	ib_cm_event(SIDR_REQ_ERROR)		\
+	ib_cm_event(SIDR_REQ_RECEIVED)		\
+	ib_cm_event_end(SIDR_REP_RECEIVED)
+
+#undef ib_cm_event
+#undef ib_cm_event_end
+
+#define ib_cm_event(x)		TRACE_DEFINE_ENUM(IB_CM_##x);
+#define ib_cm_event_end(x)	TRACE_DEFINE_ENUM(IB_CM_##x);
+
+IB_CM_EVENT_LIST
+
+#undef ib_cm_event
+#undef ib_cm_event_end
+
+#define ib_cm_event(x)		{ IB_CM_##x, #x },
+#define ib_cm_event_end(x)	{ IB_CM_##x, #x }
+
+#define rdma_show_ib_cm_event(x) \
+		__print_symbolic(x, IB_CM_EVENT_LIST)
+
 /*
  * enum rdma_cm_event_type, from include/rdma/rdma_cm.h
  */
diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h
index abe942225637..b6aad52beb62 100644
--- a/include/trace/events/rpcrdma.h
+++ b/include/trace/events/rpcrdma.h
@@ -13,6 +13,7 @@
 #include <linux/scatterlist.h>
 #include <linux/sunrpc/rpc_rdma_cid.h>
 #include <linux/tracepoint.h>
+#include <rdma/ib_cm.h>
 #include <trace/events/rdma.h>
 
 /**


^ permalink raw reply related

* [PATCH RFC 0/3] IB CM tracepoints
From: Chuck Lever @ 2020-07-10 14:06 UTC (permalink / raw)
  To: linux-rdma; +Cc: aron.silverton

Hi-

This is a Request For Comments.

Oracle has an interest in a common observability infrastructure in
the RDMA core and ULPs. One alternative for this infrastructure is
to introduce static tracepoints that can also be used as hooks for
eBPF scripts, replacing infrastructure that is based on printk.

As an addendum to tracepoints already in NFS/RDMA and parts of the
RDMA core, this series takes that approach as a strawman. Feedback
is welcome!

---

Chuck Lever (3):
      RDMA/core: Move the rdma_show_ib_cm_event() macro
      RDMA/cm: Replace pr_debug() call sites with tracepoints
      RDMA/cm: Add tracepoints to track MAD send operations

 drivers/infiniband/core/Makefile   |   2 +-
 drivers/infiniband/core/cm.c       | 102 ++++---
 drivers/infiniband/core/cm_trace.c |  15 ++
 drivers/infiniband/core/cm_trace.h | 414 +++++++++++++++++++++++++++++
 4 files changed, 476 insertions(+), 57 deletions(-)
 create mode 100644 drivers/infiniband/core/cm_trace.c
 create mode 100644 drivers/infiniband/core/cm_trace.h

--
Chuck Lever

^ permalink raw reply

* Re: [PATCH for-rc v2 1/2] IB/hfi1: Do not destroy hfi1_wq when the device is shut down
From: Sasha Levin @ 2020-07-10 14:03 UTC (permalink / raw)
  To: Sasha Levin, Dennis Dalessandro, Kaike Wan, jgg, dledford
  Cc: linux-rdma, stable, stable
In-Reply-To: <20200623204047.107638.77646.stgit@awfm-01.aw.intel.com>

Hi

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag
fixing commit: 8d3e71136a08 ("IB/{hfi1, qib}: Add handling of kernel restart").

The bot has tested the following trees: v5.7.6, v5.4.49, v4.19.130, v4.14.186, v4.9.228.

v5.7.6: Build OK!
v5.4.49: Build OK!
v4.19.130: Failed to apply! Possible dependencies:
    09e71899b9cf5 ("IB/hfi1: Prepare for new HFI1 MSIx API")
    24c5bfeaf1e66 ("IB/hfi1: Add the TID second leg ACK packet builder")
    3ca633f1ff7b1 ("IB/hfi1: Right size user_sdma sequence numbers and related variables")
    572f0c3301138 ("IB/hfi1: Add the dual leg code")
    57f97e96625fe ("IB/hfi1: Get the hfi1_devdata structure as early as possible")
    5da0fc9dbf891 ("IB/hfi1: Prepare resource waits for dual leg")
    6eb4eb10fb0d1 ("IB/hfi1: Make the MSIx resource allocation a bit more flexible")
    829eaee5d09a7 ("IB/hfi1: Add TID RDMA retry timer")
    a2f7bbdc2dba0 ("IB/hfi1: Rework the IRQ API to be more flexible")
    c54a73d8202a3 ("IB/hfi1: Rework file list in Makefile")
    f01b4d5a43da4 ("IB/hfi1: OPFN interface")

v4.14.186: Failed to apply! Possible dependencies:
    05cb18fda926d ("IB/hfi1: Update HFI to use the latest PCI API")
    09e71899b9cf5 ("IB/hfi1: Prepare for new HFI1 MSIx API")
    1b311f8931cfe ("IB/hfi1: Add tx_opcode_stats like the opcode_stats")
    2d9544aacf9e6 ("IB/hfi1: Insure int mask for in-kernel receive contexts is clear")
    442e55661db1d ("IB/hfi1: Extend input hdr tracing for packet type")
    473291b3ea0e1 ("IB/hfi1: Fix for early release of sdma context")
    5d18ee67d4c17 ("IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support")
    6eb4eb10fb0d1 ("IB/hfi1: Make the MSIx resource allocation a bit more flexible")
    70324739ac5e0 ("IB/hfi1: Remove INTx support and simplify MSIx usage")
    a2f7bbdc2dba0 ("IB/hfi1: Rework the IRQ API to be more flexible")
    a74d5307caba4 ("IB/hfi1: Rework fault injection machinery")
    b5de809ef6f6c ("IB/hfi1: Show fault stats in both TX and RX directions")
    c54a73d8202a3 ("IB/hfi1: Rework file list in Makefile")
    cc9a97ea2c74e ("IB/hfi1: Do not allocate PIO send contexts for VNIC")
    d108c60d3d55e ("IB/hfi1: Set in_use_ctxts bits for user ctxts only")
    e9777ad4399c2 ("IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure")

v4.9.228: Failed to apply! Possible dependencies:
    0181ce31b2602 ("IB/hfi1: Add receive fault injection feature")
    1bb0d7b781b1c ("IB/hfi1: Code reuse with memdup_copy")
    2280740f01aee ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
    6e768f0682e26 ("IB/hfi1: Optimize devdata cachelines")
    a2f7bbdc2dba0 ("IB/hfi1: Rework the IRQ API to be more flexible")
    b7481944b06e9 ("IB/hfi1: Show statistics counters under IB stats interface")
    cc9a97ea2c74e ("IB/hfi1: Do not allocate PIO send contexts for VNIC")
    d108c60d3d55e ("IB/hfi1: Set in_use_ctxts bits for user ctxts only")
    d295dbeb2a0c9 ("IB/hf1: User context locking is inconsistent")
    d4829ea6035b8 ("IB/hfi1: OPA_VNIC RDMA netdev support")
    ec8a142327f85 ("IB/hfi1: Force logical link down")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha

^ permalink raw reply

* Re: [PATCH 02/25] dma-fence: prime lockdep annotations
From: Daniel Vetter @ 2020-07-10 14:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, DRI Development, Intel Graphics Development,
	linux-rdma, Felix Kuehling, kernel test robot,
	Thomas Hellström, Mika Kuoppala,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, amd-gfx list,
	Chris Wilson, Maarten Lankhorst, Daniel Vetter
In-Reply-To: <20200710134826.GD23821@mellanox.com>

On Fri, Jul 10, 2020 at 3:48 PM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Fri, Jul 10, 2020 at 03:01:10PM +0200, Christian König wrote:
> > Am 10.07.20 um 14:54 schrieb Jason Gunthorpe:
> > > On Fri, Jul 10, 2020 at 02:48:16PM +0200, Christian König wrote:
> > > > Am 10.07.20 um 14:43 schrieb Jason Gunthorpe:
> > > > > On Thu, Jul 09, 2020 at 10:09:11AM +0200, Daniel Vetter wrote:
> > > > > > Hi Jason,
> > > > > >
> > > > > > Below the paragraph I've added after our discussions around dma-fences
> > > > > > outside of drivers/gpu. Good enough for an ack on this, or want something
> > > > > > changed?
> > > > > >
> > > > > > Thanks, Daniel
> > > > > >
> > > > > > > + * Note that only GPU drivers have a reasonable excuse for both requiring
> > > > > > > + * &mmu_interval_notifier and &shrinker callbacks at the same time as having to
> > > > > > > + * track asynchronous compute work using &dma_fence. No driver outside of
> > > > > > > + * drivers/gpu should ever call dma_fence_wait() in such contexts.
> > > > > I was hoping we'd get to 'no driver outside GPU should even use
> > > > > dma_fence()'
> > > > My last status was that V4L could come use dma_fences as well.
> > > I'm sure lots of places *could* use it, but I think I understood that
> > > it is a bad idea unless you have to fit into the DRM uAPI?
> >
> > It would be a bit questionable if you use the container objects we came up
> > with in the DRM subsystem outside of it.
> >
> > But using the dma_fence itself makes sense for everything which could do
> > async DMA in general.
>
> dma_fence only possibly makes some sense if you intend to expose the
> completion outside a single driver.
>
> The prefered kernel design pattern for this is to connect things with
> a function callback.
>
> So the actual use case of dma_fence is quite narrow and tightly linked
> to DRM.
>
> I don't think we should spread this beyond DRM, I can't see a reason.

Yeah v4l has a legit reason to use dma_fence, android wants that
there. There's even been patches proposed years ago, but never landed
because android is using some vendor hack horror show for camera
drivers right now.

But there is an effort going on to fix that (under the libcamera
heading), and I expect that once we have that, it'll want dma_fence
support. So outright excluding everyone from dma_fence is a bit too
much. They definitely shouldn't be used though for entirely
independent stuff.

> > > You are better to do something contained in the single driver where
> > > locking can be analyzed.
> > >
> > > > I'm not 100% sure, but wouldn't MMU notifier + dma_fence be a valid use case
> > > > for things like custom FPGA interfaces as well?
> > > I don't think we should expand the list of drivers that use this
> > > technique.
> > > Drivers that can't suspend should pin memory, not use blocked
> > > notifiers to created pinned memory.
> >
> > Agreed totally, it's a complete pain to maintain even for the GPU drivers.
> >
> > Unfortunately that doesn't change users from requesting it. So I'm pretty
> > sure we are going to see more of this.
>
> Kernel maintainers need to say no.
>
> The proper way to do DMA on no-faulting hardware is page pinning.
>
> Trying to improve performance of limited HW by using sketchy
> techniques at the cost of general system stability should be a NAK.

Well that's pretty much gpu drivers, all the horrors for a bit more speed :-)

On the text itself, should I upgrade to "must not" instead of "should
not"? Or more needed?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply

* Re: [PATCH 02/25] dma-fence: prime lockdep annotations
From: Jason Gunthorpe @ 2020-07-10 13:48 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, DRI Development, Intel Graphics Development,
	linux-rdma, Daniel Vetter, Felix Kuehling, kernel test robot,
	Thomas Hellström, Mika Kuoppala, linux-media, linaro-mm-sig,
	amd-gfx, Chris Wilson, Maarten Lankhorst, Daniel Vetter
In-Reply-To: <4f4a2cf7-f505-8494-1461-bd467870481e@amd.com>

On Fri, Jul 10, 2020 at 03:01:10PM +0200, Christian König wrote:
> Am 10.07.20 um 14:54 schrieb Jason Gunthorpe:
> > On Fri, Jul 10, 2020 at 02:48:16PM +0200, Christian König wrote:
> > > Am 10.07.20 um 14:43 schrieb Jason Gunthorpe:
> > > > On Thu, Jul 09, 2020 at 10:09:11AM +0200, Daniel Vetter wrote:
> > > > > Hi Jason,
> > > > > 
> > > > > Below the paragraph I've added after our discussions around dma-fences
> > > > > outside of drivers/gpu. Good enough for an ack on this, or want something
> > > > > changed?
> > > > > 
> > > > > Thanks, Daniel
> > > > > 
> > > > > > + * Note that only GPU drivers have a reasonable excuse for both requiring
> > > > > > + * &mmu_interval_notifier and &shrinker callbacks at the same time as having to
> > > > > > + * track asynchronous compute work using &dma_fence. No driver outside of
> > > > > > + * drivers/gpu should ever call dma_fence_wait() in such contexts.
> > > > I was hoping we'd get to 'no driver outside GPU should even use
> > > > dma_fence()'
> > > My last status was that V4L could come use dma_fences as well.
> > I'm sure lots of places *could* use it, but I think I understood that
> > it is a bad idea unless you have to fit into the DRM uAPI?
> 
> It would be a bit questionable if you use the container objects we came up
> with in the DRM subsystem outside of it.
> 
> But using the dma_fence itself makes sense for everything which could do
> async DMA in general.

dma_fence only possibly makes some sense if you intend to expose the
completion outside a single driver. 

The prefered kernel design pattern for this is to connect things with
a function callback.

So the actual use case of dma_fence is quite narrow and tightly linked
to DRM.

I don't think we should spread this beyond DRM, I can't see a reason.

> > You are better to do something contained in the single driver where
> > locking can be analyzed.
> > 
> > > I'm not 100% sure, but wouldn't MMU notifier + dma_fence be a valid use case
> > > for things like custom FPGA interfaces as well?
> > I don't think we should expand the list of drivers that use this
> > technique.
> > Drivers that can't suspend should pin memory, not use blocked
> > notifiers to created pinned memory.
> 
> Agreed totally, it's a complete pain to maintain even for the GPU drivers.
> 
> Unfortunately that doesn't change users from requesting it. So I'm pretty
> sure we are going to see more of this.

Kernel maintainers need to say no.

The proper way to do DMA on no-faulting hardware is page pinning.

Trying to improve performance of limited HW by using sketchy
techniques at the cost of general system stability should be a NAK.

Jason

^ permalink raw reply

* Re: [PATCH 02/25] dma-fence: prime lockdep annotations
From: Christian König @ 2020-07-10 13:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, DRI Development, Intel Graphics Development,
	linux-rdma, Daniel Vetter, Felix Kuehling, kernel test robot,
	Thomas Hellström, Mika Kuoppala, linux-media, linaro-mm-sig,
	amd-gfx, Chris Wilson, Maarten Lankhorst, Daniel Vetter
In-Reply-To: <20200710125453.GC23821@mellanox.com>

Am 10.07.20 um 14:54 schrieb Jason Gunthorpe:
> On Fri, Jul 10, 2020 at 02:48:16PM +0200, Christian König wrote:
>> Am 10.07.20 um 14:43 schrieb Jason Gunthorpe:
>>> On Thu, Jul 09, 2020 at 10:09:11AM +0200, Daniel Vetter wrote:
>>>> Hi Jason,
>>>>
>>>> Below the paragraph I've added after our discussions around dma-fences
>>>> outside of drivers/gpu. Good enough for an ack on this, or want something
>>>> changed?
>>>>
>>>> Thanks, Daniel
>>>>
>>>>> + * Note that only GPU drivers have a reasonable excuse for both requiring
>>>>> + * &mmu_interval_notifier and &shrinker callbacks at the same time as having to
>>>>> + * track asynchronous compute work using &dma_fence. No driver outside of
>>>>> + * drivers/gpu should ever call dma_fence_wait() in such contexts.
>>> I was hoping we'd get to 'no driver outside GPU should even use
>>> dma_fence()'
>> My last status was that V4L could come use dma_fences as well.
> I'm sure lots of places *could* use it, but I think I understood that
> it is a bad idea unless you have to fit into the DRM uAPI?

It would be a bit questionable if you use the container objects we came 
up with in the DRM subsystem outside of it.

But using the dma_fence itself makes sense for everything which could do 
async DMA in general.

> You are better to do something contained in the single driver where
> locking can be analyzed.
>
>> I'm not 100% sure, but wouldn't MMU notifier + dma_fence be a valid use case
>> for things like custom FPGA interfaces as well?
> I don't think we should expand the list of drivers that use this
> technique.
> Drivers that can't suspend should pin memory, not use blocked
> notifiers to created pinned memory.

Agreed totally, it's a complete pain to maintain even for the GPU drivers.

Unfortunately that doesn't change users from requesting it. So I'm 
pretty sure we are going to see more of this.

Christian.

>
> Jason


^ permalink raw reply

* Re: [PATCH 02/25] dma-fence: prime lockdep annotations
From: Jason Gunthorpe @ 2020-07-10 12:54 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, DRI Development, Intel Graphics Development,
	linux-rdma, Daniel Vetter, Felix Kuehling, kernel test robot,
	Thomas Hellström, Mika Kuoppala, linux-media, linaro-mm-sig,
	amd-gfx, Chris Wilson, Maarten Lankhorst, Daniel Vetter
In-Reply-To: <5c163d74-4a28-1d74-be86-099b4729a2e0@amd.com>

On Fri, Jul 10, 2020 at 02:48:16PM +0200, Christian König wrote:
> Am 10.07.20 um 14:43 schrieb Jason Gunthorpe:
> > On Thu, Jul 09, 2020 at 10:09:11AM +0200, Daniel Vetter wrote:
> > > Hi Jason,
> > > 
> > > Below the paragraph I've added after our discussions around dma-fences
> > > outside of drivers/gpu. Good enough for an ack on this, or want something
> > > changed?
> > > 
> > > Thanks, Daniel
> > > 
> > > > + * Note that only GPU drivers have a reasonable excuse for both requiring
> > > > + * &mmu_interval_notifier and &shrinker callbacks at the same time as having to
> > > > + * track asynchronous compute work using &dma_fence. No driver outside of
> > > > + * drivers/gpu should ever call dma_fence_wait() in such contexts.
> > I was hoping we'd get to 'no driver outside GPU should even use
> > dma_fence()'
> 
> My last status was that V4L could come use dma_fences as well.

I'm sure lots of places *could* use it, but I think I understood that
it is a bad idea unless you have to fit into the DRM uAPI?

You are better to do something contained in the single driver where
locking can be analyzed.

> I'm not 100% sure, but wouldn't MMU notifier + dma_fence be a valid use case
> for things like custom FPGA interfaces as well?

I don't think we should expand the list of drivers that use this
technique. 

Drivers that can't suspend should pin memory, not use blocked
notifiers to created pinned memory.

Jason

^ permalink raw reply

* Re: [PATCH 02/25] dma-fence: prime lockdep annotations
From: Christian König @ 2020-07-10 12:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Daniel Vetter
  Cc: DRI Development, Intel Graphics Development, linux-rdma,
	Daniel Vetter, Felix Kuehling, kernel test robot,
	Thomas Hellström, Mika Kuoppala, linux-media, linaro-mm-sig,
	amd-gfx, Chris Wilson, Maarten Lankhorst, Daniel Vetter
In-Reply-To: <20200710124357.GB23821@mellanox.com>

Am 10.07.20 um 14:43 schrieb Jason Gunthorpe:
> On Thu, Jul 09, 2020 at 10:09:11AM +0200, Daniel Vetter wrote:
>> Hi Jason,
>>
>> Below the paragraph I've added after our discussions around dma-fences
>> outside of drivers/gpu. Good enough for an ack on this, or want something
>> changed?
>>
>> Thanks, Daniel
>>
>>> + * Note that only GPU drivers have a reasonable excuse for both requiring
>>> + * &mmu_interval_notifier and &shrinker callbacks at the same time as having to
>>> + * track asynchronous compute work using &dma_fence. No driver outside of
>>> + * drivers/gpu should ever call dma_fence_wait() in such contexts.
> I was hoping we'd get to 'no driver outside GPU should even use
> dma_fence()'

My last status was that V4L could come use dma_fences as well.

I'm not 100% sure, but wouldn't MMU notifier + dma_fence be a valid use 
case for things like custom FPGA interfaces as well?

> Is that not reasonable?
>
> When your annotations once anything uses dma_fence it has to assume
> the worst cases, right?

Well a defensive approach is usually the best idea, yes.

Christian.

>
> Jason


^ permalink raw reply

* Re: [PATCH 02/25] dma-fence: prime lockdep annotations
From: Jason Gunthorpe @ 2020-07-10 12:43 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Intel Graphics Development, linux-rdma,
	Daniel Vetter, Felix Kuehling, kernel test robot,
	Thomas Hellström, Mika Kuoppala, linux-media, linaro-mm-sig,
	amd-gfx, Chris Wilson, Maarten Lankhorst, Christian König,
	Daniel Vetter
In-Reply-To: <20200709080911.GP3278063@phenom.ffwll.local>

On Thu, Jul 09, 2020 at 10:09:11AM +0200, Daniel Vetter wrote:
> Hi Jason,
> 
> Below the paragraph I've added after our discussions around dma-fences
> outside of drivers/gpu. Good enough for an ack on this, or want something
> changed?
> 
> Thanks, Daniel
> 
> > + * Note that only GPU drivers have a reasonable excuse for both requiring
> > + * &mmu_interval_notifier and &shrinker callbacks at the same time as having to
> > + * track asynchronous compute work using &dma_fence. No driver outside of
> > + * drivers/gpu should ever call dma_fence_wait() in such contexts.

I was hoping we'd get to 'no driver outside GPU should even use
dma_fence()'

Is that not reasonable?

When your annotations once anything uses dma_fence it has to assume
the worst cases, right?

Jason

^ permalink raw reply

* Re: [PATCH 1/2] dma-buf.rst: Document why indefinite fences are a bad idea
From: Maarten Lankhorst @ 2020-07-10 12:30 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development
  Cc: Intel Graphics Development, linux-rdma, Christian König,
	Daniel Stone, Jesse Natalie, Steve Pronovost, Jason Ekstrand,
	Felix Kuehling, Mika Kuoppala, Thomas Hellstrom, linux-media,
	linaro-mm-sig, amd-gfx, Chris Wilson, Daniel Vetter
In-Reply-To: <20200709123339.547390-1-daniel.vetter@ffwll.ch>

Op 09-07-2020 om 14:33 schreef Daniel Vetter:
> Comes up every few years, gets somewhat tedious to discuss, let's
> write this down once and for all.
>
> What I'm not sure about is whether the text should be more explicit in
> flat out mandating the amdkfd eviction fences for long running compute
> workloads or workloads where userspace fencing is allowed.
>
> v2: Now with dot graph!
>
> v3: Typo (Dave Airlie)

For first 5 patches, and patch 16, 17:

Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

> Acked-by: Christian König <christian.koenig@amd.com>
> Acked-by: Daniel Stone <daniels@collabora.com>
> Cc: Jesse Natalie <jenatali@microsoft.com>
> Cc: Steve Pronovost <spronovo@microsoft.com>
> Cc: Jason Ekstrand <jason@jlekstrand.net>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Thomas Hellstrom <thomas.hellstrom@intel.com>
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org
> Cc: linux-rdma@vger.kernel.org
> Cc: amd-gfx@lists.freedesktop.org
> Cc: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  Documentation/driver-api/dma-buf.rst | 70 ++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
>
> diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> index f8f6decde359..100bfd227265 100644
> --- a/Documentation/driver-api/dma-buf.rst
> +++ b/Documentation/driver-api/dma-buf.rst
> @@ -178,3 +178,73 @@ DMA Fence uABI/Sync File
>  .. kernel-doc:: include/linux/sync_file.h
>     :internal:
>  
> +Indefinite DMA Fences
> +~~~~~~~~~~~~~~~~~~~~
> +
> +At various times &dma_fence with an indefinite time until dma_fence_wait()
> +finishes have been proposed. Examples include:
> +
> +* Future fences, used in HWC1 to signal when a buffer isn't used by the display
> +  any longer, and created with the screen update that makes the buffer visible.
> +  The time this fence completes is entirely under userspace's control.
> +
> +* Proxy fences, proposed to handle &drm_syncobj for which the fence has not yet
> +  been set. Used to asynchronously delay command submission.
> +
> +* Userspace fences or gpu futexes, fine-grained locking within a command buffer
> +  that userspace uses for synchronization across engines or with the CPU, which
> +  are then imported as a DMA fence for integration into existing winsys
> +  protocols.
> +
> +* Long-running compute command buffers, while still using traditional end of
> +  batch DMA fences for memory management instead of context preemption DMA
> +  fences which get reattached when the compute job is rescheduled.
> +
> +Common to all these schemes is that userspace controls the dependencies of these
> +fences and controls when they fire. Mixing indefinite fences with normal
> +in-kernel DMA fences does not work, even when a fallback timeout is included to
> +protect against malicious userspace:
> +
> +* Only the kernel knows about all DMA fence dependencies, userspace is not aware
> +  of dependencies injected due to memory management or scheduler decisions.
> +
> +* Only userspace knows about all dependencies in indefinite fences and when
> +  exactly they will complete, the kernel has no visibility.
> +
> +Furthermore the kernel has to be able to hold up userspace command submission
> +for memory management needs, which means we must support indefinite fences being
> +dependent upon DMA fences. If the kernel also support indefinite fences in the
> +kernel like a DMA fence, like any of the above proposal would, there is the
> +potential for deadlocks.
> +
> +.. kernel-render:: DOT
> +   :alt: Indefinite Fencing Dependency Cycle
> +   :caption: Indefinite Fencing Dependency Cycle
> +
> +   digraph "Fencing Cycle" {
> +      node [shape=box bgcolor=grey style=filled]
> +      kernel [label="Kernel DMA Fences"]
> +      userspace [label="userspace controlled fences"]
> +      kernel -> userspace [label="memory management"]
> +      userspace -> kernel [label="Future fence, fence proxy, ..."]
> +
> +      { rank=same; kernel userspace }
> +   }
> +
> +This means that the kernel might accidentally create deadlocks
> +through memory management dependencies which userspace is unaware of, which
> +randomly hangs workloads until the timeout kicks in. Workloads, which from
> +userspace's perspective, do not contain a deadlock.  In such a mixed fencing
> +architecture there is no single entity with knowledge of all dependencies.
> +Thefore preventing such deadlocks from within the kernel is not possible.
> +
> +The only solution to avoid dependencies loops is by not allowing indefinite
> +fences in the kernel. This means:
> +
> +* No future fences, proxy fences or userspace fences imported as DMA fences,
> +  with or without a timeout.
> +
> +* No DMA fences that signal end of batchbuffer for command submission where
> +  userspace is allowed to use userspace fencing or long running compute
> +  workloads. This also means no implicit fencing for shared buffers in these
> +  cases.



^ permalink raw reply

* Re: [PATCH] drm/tilcdc: Use standard drm_atomic_helper_commit
From: Jyri Sarha @ 2020-07-10 11:16 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development
  Cc: Intel Graphics Development, linux-rdma, Daniel Vetter,
	Tomi Valkeinen
In-Reply-To: <20200708142050.530240-1-daniel.vetter@ffwll.ch>

Thank you Daniel,
Now this works perfectly, all while I was on vacation.

On 08/07/2020 17:20, Daniel Vetter wrote:
> Gives us proper nonblocking support for free, and a pile of other
> things. The tilcdc code is simply old enough that it was never
> converted over, but was stuck forever with the copypasta from when it
> was initially merged.
> 
> The riskiest thing with this conversion is maybe that there's an issue
> with the vblank handling or vblank event handling, which will upset
> the modern commit support in atomic helpers. But from a cursory review
> drm_crtc_vblank_on/off is called in the right places, and the event
> handling also seems to exist (albeit with much hand-rolling and
> probably some races, could perhaps be converted over to
> drm_crtc_arm_vblank_event without any real loss).
> 
> Motivated by me not having to hand-roll the dma-fence annotations for
> this.
> 
> v2: Clear out crtc_state->event when we're handling the event, to
> avoid upsetting the helpers (reported by Jyri).
> 
> v3: Also send out even whent the crtc is getting disabled. Tilcdc looks a
> bit like conversion to simple display helpers would work out really
> nice.
> 

Probably. Should take a closer looks some day when I have time.

> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jyri Sarha <jsarha@ti.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>

Tested-by: Jyri Sarha <jsarha@ti.com>
Reviewed-by: Jyri Sarha <jsarha@ti.com>

> --
> From logs looks like we're not stuck when disabling the display, so I
> hacked in a bit of code for that too. Like mentioned above, tilcdc
> looks like a perfect candidate for simple display helpers, I think
> that would simplify a _lot_ of code here.
> -Daniel
> ---
>  drivers/gpu/drm/tilcdc/tilcdc_crtc.c  | 13 ++++++++
>  drivers/gpu/drm/tilcdc/tilcdc_drv.c   | 47 +--------------------------
>  drivers/gpu/drm/tilcdc/tilcdc_plane.c |  8 +++--
>  3 files changed, 19 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
> index e9dd5e5cb4e7..1856962411c7 100644
> --- a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
> +++ b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
> @@ -537,6 +537,18 @@ static void tilcdc_crtc_atomic_disable(struct drm_crtc *crtc,
>  	tilcdc_crtc_disable(crtc);
>  }
>  
> +static void tilcdc_crtc_atomic_flush(struct drm_crtc *crtc,
> +				     struct drm_crtc_state *old_state)
> +{
> +	if (!crtc->state->event)
> +		return;
> +
> +	spin_lock_irq(&crtc->dev->event_lock);
> +	drm_crtc_send_vblank_event(crtc, crtc->state->event);
> +	crtc->state->event = NULL;
> +	spin_unlock_irq(&crtc->dev->event_lock);
> +}
> +
>  void tilcdc_crtc_shutdown(struct drm_crtc *crtc)
>  {
>  	tilcdc_crtc_off(crtc, true);
> @@ -822,6 +834,7 @@ static const struct drm_crtc_helper_funcs tilcdc_crtc_helper_funcs = {
>  	.atomic_check	= tilcdc_crtc_atomic_check,
>  	.atomic_enable	= tilcdc_crtc_atomic_enable,
>  	.atomic_disable	= tilcdc_crtc_atomic_disable,
> +	.atomic_flush	= tilcdc_crtc_atomic_flush,
>  };
>  
>  void tilcdc_crtc_set_panel_info(struct drm_crtc *crtc,
> diff --git a/drivers/gpu/drm/tilcdc/tilcdc_drv.c b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
> index 0d74a6443263..4f5fc3e87383 100644
> --- a/drivers/gpu/drm/tilcdc/tilcdc_drv.c
> +++ b/drivers/gpu/drm/tilcdc/tilcdc_drv.c
> @@ -87,55 +87,10 @@ static int tilcdc_atomic_check(struct drm_device *dev,
>  	return ret;
>  }
>  
> -static int tilcdc_commit(struct drm_device *dev,
> -		  struct drm_atomic_state *state,
> -		  bool async)
> -{
> -	int ret;
> -
> -	ret = drm_atomic_helper_prepare_planes(dev, state);
> -	if (ret)
> -		return ret;
> -
> -	ret = drm_atomic_helper_swap_state(state, true);
> -	if (ret) {
> -		drm_atomic_helper_cleanup_planes(dev, state);
> -		return ret;
> -	}
> -
> -	/*
> -	 * Everything below can be run asynchronously without the need to grab
> -	 * any modeset locks at all under one condition: It must be guaranteed
> -	 * that the asynchronous work has either been cancelled (if the driver
> -	 * supports it, which at least requires that the framebuffers get
> -	 * cleaned up with drm_atomic_helper_cleanup_planes()) or completed
> -	 * before the new state gets committed on the software side with
> -	 * drm_atomic_helper_swap_state().
> -	 *
> -	 * This scheme allows new atomic state updates to be prepared and
> -	 * checked in parallel to the asynchronous completion of the previous
> -	 * update. Which is important since compositors need to figure out the
> -	 * composition of the next frame right after having submitted the
> -	 * current layout.
> -	 */
> -
> -	drm_atomic_helper_commit_modeset_disables(dev, state);
> -
> -	drm_atomic_helper_commit_planes(dev, state, 0);
> -
> -	drm_atomic_helper_commit_modeset_enables(dev, state);
> -
> -	drm_atomic_helper_wait_for_vblanks(dev, state);
> -
> -	drm_atomic_helper_cleanup_planes(dev, state);
> -
> -	return 0;
> -}
> -
>  static const struct drm_mode_config_funcs mode_config_funcs = {
>  	.fb_create = drm_gem_fb_create,
>  	.atomic_check = tilcdc_atomic_check,
> -	.atomic_commit = tilcdc_commit,
> +	.atomic_commit = drm_atomic_helper_commit,
>  };
>  
>  static void modeset_init(struct drm_device *dev)
> diff --git a/drivers/gpu/drm/tilcdc/tilcdc_plane.c b/drivers/gpu/drm/tilcdc/tilcdc_plane.c
> index 0d09b31ae759..2f681a713815 100644
> --- a/drivers/gpu/drm/tilcdc/tilcdc_plane.c
> +++ b/drivers/gpu/drm/tilcdc/tilcdc_plane.c
> @@ -83,9 +83,11 @@ static void tilcdc_plane_atomic_update(struct drm_plane *plane,
>  	if (WARN_ON(!state->fb || !state->crtc->state))
>  		return;
>  
> -	tilcdc_crtc_update_fb(state->crtc,
> -			      state->fb,
> -			      state->crtc->state->event);
> +	if (tilcdc_crtc_update_fb(state->crtc,
> +				  state->fb,
> +				  state->crtc->state->event) == 0) {
> +		state->crtc->state->event = NULL;
> +	}
>  }
>  
>  static const struct drm_plane_helper_funcs plane_helper_funcs = {
> 


-- 
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply

* Re: Question about IB_QP_CUR_STATE
From: liweihang @ 2020-07-10 10:11 UTC (permalink / raw)
  To: Gal Pressman, Leon Romanovsky; +Cc: linux-rdma@vger.kernel.org, Linuxarm
In-Reply-To: <a738e3f3-e2bc-a670-7292-8025c78e210d@amazon.com>

On 2020/7/9 14:14, Gal Pressman wrote:
> On 08/07/2020 18:44, Leon Romanovsky wrote:
>> On Wed, Jul 08, 2020 at 03:42:34PM +0300, Gal Pressman wrote:
>>> On 08/07/2020 12:41, liweihang wrote:
>>>> Hi all,
>>>>
>>>> I'm a little confused about the role of IB_QP_CUR_STATE in the enumeration
>>>> ib_qp_attr_mask.
>>>>
>>>> In manual page of ibv_modify_qp(), comments of cur_qp_state is "Assume this
>>>> is the current QP state". Why we need to get current qp state from users
>>>> instead of drivers?
>>>>
>>>> For example, why the users are allowed to modify qp from RTR to RTS again
>>>> even if the qp's state in driver and hardware has already been RTS.
>>>>
>>>> I would be appretiate it if someone can help with this.
>>>>
>>>> Weihang
>>>>
>>>
>>> Talking about IB_QP_CUR_STATE, I see many drivers filling it in their query QP
>>> callback although it should only be used in modify operations.. Is there a
>>> reason not to remove it?
>>
>> IBTA section "11.2.5.3 QUERY QUEUE PAIR" has line about IB_QP_CUR_STATE.
>> It is one of output modifiers.
>>
>> Thanks
>>
> 
> It says the current QP state should be returned, that's what qp_state field is used for.
> According to the man pages:
> 
> libibverbs/man/ibv_query_qp.3:
>                enum ibv_qp_state       qp_state;            /* Current QP state */
>                enum ibv_qp_state       cur_qp_state;        /* Current QP state - irrelevant for ibv_query_qp */
> 

Hi Gal and Leon,

Thanks for your reply, as Gal said, I can't find the reason why the IBTA use
cur_qp_state either. I'll take a closer look at the protocol.

Weihang

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox