From: Leon Romanovsky <leon@kernel.org>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Long Li <longli@microsoft.com>,
	Konstantin Taranov <kotaranov@microsoft.com>,
	Jakub Kicinski <kuba@kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"K . Y . Srinivasan" <kys@microsoft.com>,
	Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources
Date: Mon, 16 Mar 2026 22:08:43 +0200
Message-ID: <20260316200843.GK61385@unreal>
In-Reply-To: <20260313165928.GH1704121@ziepe.ca>

On Fri, Mar 13, 2026 at 01:59:28PM -0300, Jason Gunthorpe wrote:
> On Sat, Mar 07, 2026 at 07:38:14PM +0200, Leon Romanovsky wrote:
> > On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> > > When the MANA hardware undergoes a service reset, the ETH auxiliary device
> > > (mana.eth) used by DPDK persists across the reset cycle — it is not removed
> > > and re-added like RC/UD/GSI QPs. This means userspace RDMA consumers such
> > > as DPDK have no way of knowing that firmware handles for their PD, CQ, WQ,
> > > QP and MR resources have become stale.
> > 
> > NAK to any of this.
> > 
> > In case of hardware reset, mana_ib AUX device needs to be destroyed and
> > recreated later.
> 
> Yeah, that is our general model for any serious RAS event where the
> driver's view of resources becomes out of sync with the HW.
> 
> You have to tear down the ib_device by removing the aux and then bring
> back a new one.
> 
> There is an IB_EVENT_DEVICE_FATAL, but the purpose of that event is to
> tell userspace to close and re-open their uverbs FD.
> 
> We don't have a model where a uverbs FD in userspace can continue to
> work after the device has a catastrophic RAS event.
> 
> There may be room to have a model where the ib device doesn't fully
> unplug/replug so it retains its name and things, but that is core code
> not driver stuff.

Good luck with that model. It is going to break RDMA-CM hotplug support.

Thanks

> 
> Jason
> 

Thread overview: 16+ messages
2026-03-07  1:47 [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Long Li
2026-03-07  1:47 ` [PATCH rdma-next 1/8] RDMA/mana_ib: Track ucontext per device Long Li
2026-03-07  1:47 ` [PATCH rdma-next 2/8] RDMA/mana_ib: Track PD per ucontext Long Li
2026-03-07  1:47 ` [PATCH rdma-next 3/8] RDMA/mana_ib: Track CQ " Long Li
2026-03-07  1:47 ` [PATCH rdma-next 4/8] RDMA/mana_ib: Track WQ " Long Li
2026-03-07  1:47 ` [PATCH rdma-next 5/8] RDMA/mana_ib: Track QP " Long Li
2026-03-07  1:47 ` [PATCH rdma-next 6/8] RDMA/mana_ib: Track MR " Long Li
2026-03-07  1:47 ` [PATCH rdma-next 7/8] RDMA/mana_ib: Notify service reset events to RDMA devices Long Li
2026-03-07  1:47 ` [PATCH rdma-next 8/8] RDMA/mana_ib: Skip firmware commands for invalidated handles Long Li
2026-03-07 17:38 ` [PATCH rdma-next 0/8] RDMA/mana_ib: Handle service reset for RDMA resources Leon Romanovsky
2026-03-13 16:59   ` Jason Gunthorpe
2026-03-16 20:08     ` Leon Romanovsky [this message]
2026-03-17 23:43       ` [EXTERNAL] " Long Li
2026-03-18 14:49         ` Leon Romanovsky
2026-03-21  0:49           ` Long Li
2026-04-10 15:49         ` Jason Gunthorpe
