From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-184.mta0.migadu.com (out-184.mta0.migadu.com [91.218.175.184]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F32B24071D2 for ; Mon, 8 Jun 2026 05:01:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.184 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780894910; cv=none; b=XbZr0JhOOq95UXpi0GD87Pxw210GLr0mLhqenfxObiY+WWSMO9aFfQoXTxu+NhF0wbybEGUfu4pqGcJOYPL56mrSPfOQEJvqTdz5/Rn5w0uZvEkMpWvfgNy7Ly1LbHBV/HZUFXPLaDdVGpNTqNn2XJ7+Ur+I4V0f/nPEsWcdlGU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780894910; c=relaxed/simple; bh=nIiUyXMfv9m6eg4y8jOaUh98HGNKNHZIHSW4KVZLGRI=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=oFX4c8RyyGBdSZT6hnjgJnu2lnFVl1PzsCiMeFTpzEg72N5GWWZg7eWXVkigIaOjgDuESIkAuASTJnhM8WiRDXKw0U31JTlEoepiJR6Tcl+9eJlaVlrdZt5vYrJXb0HceD9pEidu+fYW4oM/QnALHEeSDbqAgQJ5uVFTMC7PNEc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=Sti62uUL; arc=none smtp.client-ip=91.218.175.184 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="Sti62uUL" Message-ID: <1c0463ef-79bf-4e59-9b95-fe5f5f67612a@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1780894907; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pHRIWIB6WnJdKFsoNBqsFLpoKXzO68Xg+i+Fr44DLP4=; b=Sti62uULGVfJXgZuPFl9iowvZON5GQc4PW3GRjIUBFeCFwLdj/fKDzIashEwn46vhtM1Jo 2u29k3untkLjz2o+Fmxuqajhr90bZ6HCqx2hmRZ0QF/Jp98RQmulORSAZ31dx6MuTErnIW yXQbtyKFPyliv0mSN6DG7yxUlHk1EYI= Date: Sun, 7 Jun 2026 22:00:58 -0700 Precedence: bulk X-Mailing-List: linux-rdma@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH for-next v3 2/2] RDMA/efa: Add AH cache handling on create and destroy AH To: Yonatan Nachum , jgg@nvidia.com, leon@kernel.org, linux-rdma@vger.kernel.org, "yanjun.zhu@linux.dev" Cc: mrgolin@amazon.com, sleybo@amazon.com, matua@amazon.com, gal.pressman@linux.dev, Firas Jahjah References: <20260607161753.1607559-1-ynachum@amazon.com> <20260607161753.1607559-3-ynachum@amazon.com> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Zhu Yanjun In-Reply-To: <20260607161753.1607559-3-ynachum@amazon.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT 在 2026/6/7 9:17, Yonatan Nachum 写道: > On create AH, first check if the AH cache entry already exists and if > so, returns the already stored AH number. If the entry doesn't exist, > the driver creates it and calls the device to create the AH. A per-entry > mutex serializes concurrent device commands on the same AH cache entry, > ensuring only one thread issues the device create while others wait and > reuse the result. If the device create fails, the entry remains > uninitialized so subsequent threads calls can create the AH. > > On destroy AH, the refcount is checked and if it's the last reference, > the driver issues the device destroy command while holding the entry > mutex. The entry remains in the hashtable during destroy to allow > concurrent create threads to find it and wait on the entry mutex, > preventing create-before-destroy races on the device. After the device > destroy completes, the entry is either recycled if new users arrived or > removed and freed. > > Reviewed-by: Firas Jahjah > Reviewed-by: Michael Margolin > Signed-off-by: Yonatan Nachum > --- > drivers/infiniband/hw/efa/efa_ah_cache.c | 122 +++++++++++++++++++++++ > drivers/infiniband/hw/efa/efa_ah_cache.h | 5 + > drivers/infiniband/hw/efa/efa_com_cmd.c | 73 ++++++++++---- > drivers/infiniband/hw/efa/efa_com_cmd.h | 1 + > drivers/infiniband/hw/efa/efa_verbs.c | 9 +- > 5 files changed, 187 insertions(+), 23 deletions(-) > > diff --git a/drivers/infiniband/hw/efa/efa_ah_cache.c b/drivers/infiniband/hw/efa/efa_ah_cache.c > index ab763b06b9bb..b8314cdbf054 100644 > --- a/drivers/infiniband/hw/efa/efa_ah_cache.c > +++ b/drivers/infiniband/hw/efa/efa_ah_cache.c > @@ -39,3 +39,125 @@ void efa_ah_cache_destroy(struct efa_ah_cache *ah_cache) > rhashtable_free_and_destroy(&ah_cache->hashtable, efa_ah_cache_entry_free, NULL); > mutex_destroy(&ah_cache->lock); > } > + > +static struct efa_ah_cache_entry *efa_ah_cache_lookup(struct efa_ah_cache *ah_cache, u16 pd, > + u8 *gid) > + __must_hold(&ah_cache->lock) > +{ > + struct efa_ah_cache_key key = {}; > + > + memcpy(key.gid, gid, sizeof(key.gid)); > + key.pd = pd; > + > + return rhashtable_lookup_fast(&ah_cache->hashtable, &key, ah_cache_params); > +} > + > +/** > + * efa_ah_cache_get_or_create - Get or create an AH cache entry > + * @ah_cache: AH cache > + * @pd: Protection domain number > + * @gid: GID address > + * > + * Look up an AH cache entry by PD and GID. If found, increment the refcount and > + * return it. If not found, allocate a new entry and insert it into the > + * hashtable. The entry is returned unlocked. > + * > + * Return: Pointer to the entry on success, ERR_PTR on failure. > + */ > +struct efa_ah_cache_entry *efa_ah_cache_get_or_create(struct efa_ah_cache *ah_cache, u16 pd, > + u8 *gid) > +{ > + struct efa_ah_cache_entry *entry; > + int err; > + > + mutex_lock(&ah_cache->lock); > + > + entry = efa_ah_cache_lookup(ah_cache, pd, gid); > + if (entry) { > + refcount_inc(&entry->refcount); > + mutex_unlock(&ah_cache->lock); > + return entry; > + } > + > + entry = kzalloc(sizeof(*entry), GFP_KERNEL); kzalloc_obj Zhu Yanjun > + if (!entry) { > + mutex_unlock(&ah_cache->lock); > + return ERR_PTR(-ENOMEM); > + } > + > + memcpy(entry->key.gid, gid, sizeof(entry->key.gid)); > + entry->key.pd = pd; > + refcount_set(&entry->refcount, 1); > + mutex_init(&entry->lock); > + > + err = rhashtable_insert_fast(&ah_cache->hashtable, &entry->linkage, ah_cache_params); > + if (err) { > + mutex_destroy(&entry->lock); > + kfree(entry); > + mutex_unlock(&ah_cache->lock); > + return ERR_PTR(err); > + } > + > + mutex_unlock(&ah_cache->lock); > + return entry; > +} > + > +/** > + * efa_ah_cache_put_unless_last - Release a reference to an AH cache entry > + * @ah_cache: AH cache > + * @pd: Protection domain number > + * @gid: GID address > + * > + * If this is not the last reference, decrement the refcount and return NULL. > + * If this is the last reference, return the entry with its mutex locked > + * without decrementing. > + * > + * Return: Pointer to the locked entry if last reference, NULL otherwise. > + */ > +struct efa_ah_cache_entry *efa_ah_cache_put_unless_last(struct efa_ah_cache *ah_cache, u16 pd, > + u8 *gid) > +{ > + struct efa_ah_cache_entry *entry; > + > + mutex_lock(&ah_cache->lock); > + entry = efa_ah_cache_lookup(ah_cache, pd, gid); > + if (!entry) { > + mutex_unlock(&ah_cache->lock); > + return NULL; > + } > + > + if (refcount_dec_not_one(&entry->refcount)) { > + mutex_unlock(&ah_cache->lock); > + return NULL; > + } > + > + mutex_lock(&entry->lock); > + mutex_unlock(&ah_cache->lock); > + return entry; > +} > + > +/** > + * efa_ah_cache_put - Release the final reference to an AH cache entry > + * @ah_cache: AH cache > + * @entry: AH cache entry > + * > + * Decrement the refcount. If it reaches zero, the entry is removed from the > + * hashtable and freed. Otherwise, the entry is kept for reuse. > + * > + * Called after the device destroy completes or on a failed create to release > + * the caller's reference. > + */ > +void efa_ah_cache_put(struct efa_ah_cache *ah_cache, struct efa_ah_cache_entry *entry) > +{ > + mutex_lock(&ah_cache->lock); > + if (!refcount_dec_and_test(&entry->refcount)) { > + mutex_unlock(&ah_cache->lock); > + return; > + } > + > + rhashtable_remove_fast(&ah_cache->hashtable, &entry->linkage, ah_cache_params); > + mutex_unlock(&ah_cache->lock); > + > + mutex_destroy(&entry->lock); > + kfree_rcu(entry, rcu_head); > +} > diff --git a/drivers/infiniband/hw/efa/efa_ah_cache.h b/drivers/infiniband/hw/efa/efa_ah_cache.h > index 133181b4466d..573fd29bb416 100644 > --- a/drivers/infiniband/hw/efa/efa_ah_cache.h > +++ b/drivers/infiniband/hw/efa/efa_ah_cache.h > @@ -33,5 +33,10 @@ struct efa_ah_cache { > > int efa_ah_cache_init(struct efa_ah_cache *ah_cache); > void efa_ah_cache_destroy(struct efa_ah_cache *ah_cache); > +struct efa_ah_cache_entry *efa_ah_cache_get_or_create(struct efa_ah_cache *ah_cache, u16 pd, > + u8 *gid); > +struct efa_ah_cache_entry *efa_ah_cache_put_unless_last(struct efa_ah_cache *ah_cache, u16 pd, > + u8 *gid); > +void efa_ah_cache_put(struct efa_ah_cache *ah_cache, struct efa_ah_cache_entry *entry); > > #endif /* _EFA_AH_CACHE_H_ */ > diff --git a/drivers/infiniband/hw/efa/efa_com_cmd.c b/drivers/infiniband/hw/efa/efa_com_cmd.c > index 63c7f07806a8..9eafcba5e028 100644 > --- a/drivers/infiniband/hw/efa/efa_com_cmd.c > +++ b/drivers/infiniband/hw/efa/efa_com_cmd.c > @@ -313,19 +313,25 @@ int efa_com_dereg_mr(struct efa_com_dev *edev, > return 0; > } > > -int efa_com_create_ah(struct efa_com_dev *edev, > - struct efa_com_create_ah_params *params, > - struct efa_com_create_ah_result *result) > +int efa_com_destroy_ah(struct efa_com_dev *edev, > + struct efa_com_destroy_ah_params *params) > { > - struct efa_admin_create_ah_resp cmd_completion; > + struct efa_admin_destroy_ah_resp cmd_completion; > + struct efa_admin_destroy_ah_cmd ah_cmd = {}; > struct efa_com_admin_queue *aq = &edev->aq; > - struct efa_admin_create_ah_cmd ah_cmd = {}; > + struct efa_ah_cache_entry *entry; > int err; > > - ah_cmd.aq_common_desc.opcode = EFA_ADMIN_CREATE_AH; > + entry = efa_ah_cache_put_unless_last(&edev->ah_cache, params->pdn, params->gid); > + if (!entry) > + return 0; > > - memcpy(ah_cmd.dest_addr, params->dest_addr, sizeof(ah_cmd.dest_addr)); > - ah_cmd.pd = params->pdn; > + if (!entry->initialized) > + goto out; > + > + ah_cmd.aq_common_desc.opcode = EFA_ADMIN_DESTROY_AH; > + ah_cmd.ah = entry->ah; > + ah_cmd.pd = entry->key.pd; > > err = efa_com_cmd_exec(aq, > (struct efa_admin_aq_entry *)&ah_cmd, > @@ -333,27 +339,47 @@ int efa_com_create_ah(struct efa_com_dev *edev, > (struct efa_admin_acq_entry *)&cmd_completion, > sizeof(cmd_completion)); > if (err) { > + mutex_unlock(&entry->lock); > ibdev_err_ratelimited(edev->efa_dev, > - "Failed to create ah for %pI6 [%d]\n", > - ah_cmd.dest_addr, err); > + "Failed to destroy ah-%d pd-%d [%d]\n", > + ah_cmd.ah, ah_cmd.pd, err); > return err; > } > > - result->ah = cmd_completion.ah; > + entry->initialized = false; > + > +out: > + mutex_unlock(&entry->lock); > + efa_ah_cache_put(&edev->ah_cache, entry); > > return 0; > } > > -int efa_com_destroy_ah(struct efa_com_dev *edev, > - struct efa_com_destroy_ah_params *params) > +int efa_com_create_ah(struct efa_com_dev *edev, > + struct efa_com_create_ah_params *params, > + struct efa_com_create_ah_result *result) > { > - struct efa_admin_destroy_ah_resp cmd_completion; > - struct efa_admin_destroy_ah_cmd ah_cmd = {}; > + struct efa_com_destroy_ah_params destroy_params = {}; > + struct efa_admin_create_ah_resp cmd_completion; > struct efa_com_admin_queue *aq = &edev->aq; > + struct efa_admin_create_ah_cmd ah_cmd = {}; > + struct efa_ah_cache_entry *entry; > int err; > > - ah_cmd.aq_common_desc.opcode = EFA_ADMIN_DESTROY_AH; > - ah_cmd.ah = params->ah; > + entry = efa_ah_cache_get_or_create(&edev->ah_cache, params->pdn, params->dest_addr); > + if (IS_ERR(entry)) > + return PTR_ERR(entry); > + > + mutex_lock(&entry->lock); > + if (entry->initialized) { > + result->ah = entry->ah; > + mutex_unlock(&entry->lock); > + return 0; > + } > + > + ah_cmd.aq_common_desc.opcode = EFA_ADMIN_CREATE_AH; > + > + memcpy(ah_cmd.dest_addr, params->dest_addr, sizeof(ah_cmd.dest_addr)); > ah_cmd.pd = params->pdn; > > err = efa_com_cmd_exec(aq, > @@ -362,12 +388,21 @@ int efa_com_destroy_ah(struct efa_com_dev *edev, > (struct efa_admin_acq_entry *)&cmd_completion, > sizeof(cmd_completion)); > if (err) { > + mutex_unlock(&entry->lock); > + memcpy(destroy_params.gid, params->dest_addr, sizeof(destroy_params.gid)); > + destroy_params.pdn = params->pdn; > + efa_com_destroy_ah(edev, &destroy_params); > ibdev_err_ratelimited(edev->efa_dev, > - "Failed to destroy ah-%d pd-%d [%d]\n", > - ah_cmd.ah, ah_cmd.pd, err); > + "Failed to create ah for %pI6 [%d]\n", > + ah_cmd.dest_addr, err); > return err; > } > > + entry->ah = cmd_completion.ah; > + entry->initialized = true; > + result->ah = cmd_completion.ah; > + mutex_unlock(&entry->lock); > + > return 0; > } > > diff --git a/drivers/infiniband/hw/efa/efa_com_cmd.h b/drivers/infiniband/hw/efa/efa_com_cmd.h > index ef15b3c38429..39bd4e06684a 100644 > --- a/drivers/infiniband/hw/efa/efa_com_cmd.h > +++ b/drivers/infiniband/hw/efa/efa_com_cmd.h > @@ -106,6 +106,7 @@ struct efa_com_create_ah_result { > > struct efa_com_destroy_ah_params { > u16 ah; > + u8 gid[EFA_GID_SIZE]; > u16 pdn; > }; > > diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c > index 434d60235945..6742a4037888 100644 > --- a/drivers/infiniband/hw/efa/efa_verbs.c > +++ b/drivers/infiniband/hw/efa/efa_verbs.c > @@ -2032,10 +2032,11 @@ int efa_mmap(struct ib_ucontext *ibucontext, > > static int efa_ah_destroy(struct efa_dev *dev, struct efa_ah *ah) > { > - struct efa_com_destroy_ah_params params = { > - .ah = ah->ah, > - .pdn = to_epd(ah->ibah.pd)->pdn, > - }; > + struct efa_com_destroy_ah_params params = {}; > + > + params.ah = ah->ah; > + memcpy(params.gid, ah->id, sizeof(params.gid)); > + params.pdn = to_epd(ah->ibah.pd)->pdn; > > return efa_com_destroy_ah(&dev->edev, ¶ms); > }